Version: current

Record Extractor

info

The following content is for the new DocSearch infrastructure. If you haven't received an email to migrate your account yet, please refer to the legacy documentation.

Introduction

info

This documentation only covers the helpers.docsearch method. See the Algolia Crawler documentation for more information on the Algolia Crawler.

Records are extracted from pages by a recordExtractor. Extractors are assigned to actions through the recordExtractor parameter, which points to a function that returns the data you want to index as an array of JSON objects.
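To make that return value concrete, here is a minimal sketch of an extractor written without the helpers. It assumes the crawler calls the function with the page URL as a `URL` object (`url`) and the Cheerio instance (`$`). The record attributes `objectID`, `url`, `title`, and `description` are illustrative choices rather than a required schema, and `mock$` is a hypothetical stand-in for Cheerio so the snippet runs on its own.

```javascript
// Sketch: a hand-written extractor that returns one record per page.
// Assumption: `url` is a URL object and `$` behaves like a Cheerio instance.
const recordExtractor = ({ url, $ }) => {
  const title = $("head > title").text().trim();
  const description = $('meta[name="description"]').attr("content") || "";
  // The return value is always an array; each element becomes one record.
  return [
    {
      objectID: url.href,
      url: url.href,
      title,
      description,
    },
  ];
};

// Hypothetical stand-in for Cheerio, just enough to run this sketch.
const mock$ = (selector) => ({
  text: () => (selector === "head > title" ? " Record Extractor " : ""),
  attr: (name) =>
    selector === 'meta[name="description"]' && name === "content"
      ? "How records are built"
      : undefined,
});

const records = recordExtractor({
  url: new URL("https://example.com/docs/record-extractor/"),
  $: mock$,
});
console.log(records);
```

In practice the helpers described below do this work for you and produce records suited to DocSearch.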

The helpers are a collection of functions to help you extract content and generate Algolia records.

Useful links

Usage

The most common way to use the DocSearch helper is to return its result from the recordExtractor function:

recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},

Complex extractors

Using the Cheerio instance ($)

You can also use the provided Cheerio instance ($) to exclude content from the DOM:

recordExtractor: ({ $, helpers }) => {
  // Removing DOM elements we don't want to crawl
  $(".my-warning-message").remove();

  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},

With fallback DOM selectors

Each lvlX and content property accepts fallback selectors as an array of strings. The first selector that matches is used, which makes configurations more robust:

recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      // `.exists h1` will be selected if `.exists-probably h1` does not exist.
      lvl0: {
        selectors: [".exists-probably h1", ".exists h1"],
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      // `.exists p, .exists li` will be selected, since the first entry matches nothing.
      content: [
        ".does-not-exists p, .does-not-exists li",
        ".exists p, .exists li",
      ],
    },
  });
},
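The fallback behavior can be pictured as a loop over the selectors in order that stops at the first one with matches. The sketch below assumes exactly that; `resolveFallback` and the `query` lookup are hypothetical names, and a plain object plays the role of the page, so it illustrates the idea rather than the helper's actual implementation.

```javascript
// Hypothetical sketch of fallback-selector resolution (not the helper's real code).
// `query` stands in for a DOM lookup: it returns the texts of all matching nodes.
function resolveFallback(selectors, query) {
  // Accept a single selector or an ordered list of fallbacks.
  const candidates = Array.isArray(selectors) ? selectors : [selectors];
  for (const selector of candidates) {
    const matches = query(selector);
    if (matches.length > 0) {
      // First selector with at least one match wins.
      return { selector, matches };
    }
  }
  // No selector matched anything on the page.
  return { selector: null, matches: [] };
}

// A toy page: selector -> texts of the nodes it matches.
const page = {
  ".exists h1": ["Getting started"],
  ".exists p, .exists li": ["Install the package.", "Run the crawler."],
};
const query = (selector) => page[selector] ?? [];

const lvl0 = resolveFallback([".exists-probably h1", ".exists h1"], query);
const content = resolveFallback(
  [".does-not-exists p, .does-not-exists li", ".exists p, .exists li"],
  query,
);
console.log(lvl0.selector, lvl0.matches);
console.log(content.selector, content.matches);
```

Ordering the array from most specific to most generic selector keeps the preferred markup in charge while still covering pages that lack it.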

With custom variables

Custom variables are useful for filtering content in the frontend (by version, language, etc.).

These selectors also support defaultValue and fallback selectors:

recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      foo: ".bar",
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".does-not-exists",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
      version: {
        // You can send raw values without `selectors`
        defaultValue: ["latest", "stable"],
      },
    },
  });
},

The foo, language, and version attributes of these records will be:

foo: "valueFromBarSelector",
language: ["en", "en-US"],
version: ["latest", "stable"]

You can now use them to filter your search in the frontend.
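Algolia's facetFilters parameter takes an array whose entries are combined with AND, while values grouped in a nested array are combined with OR. The hypothetical `buildFacetFilters` helper below turns the values the current page should match into that structure. How you pass the result to your frontend (for example through the search parameters your DocSearch integration sends) is left open, and the attributes must be declared as facets (attributesForFaceting) in the index settings for the filters to apply.

```javascript
// Hypothetical helper: builds Algolia facetFilters from attribute -> value(s).
// Outer array entries are ANDed; values inside a nested array are ORed.
function buildFacetFilters(selection) {
  return Object.entries(selection).map(([attribute, value]) => {
    const values = Array.isArray(value) ? value : [value];
    const filters = values.map((v) => `${attribute}:${v}`);
    // A single value does not need an OR group.
    return filters.length === 1 ? filters[0] : filters;
  });
}

// Match English records from either the "latest" or the "stable" version.
const facetFilters = buildFacetFilters({
  language: "en",
  version: ["latest", "stable"],
});
console.log(JSON.stringify(facetFilters));
// → ["language:en",["version:latest","version:stable"]]
```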

With raw text (defaultValue)

The lvl0 and custom variable selectors also accept a raw fallback value, defaultValue, which is used when none of the selectors match:

recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably h1",
        defaultValue: "myRawTextIfDoesNotExists",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably .language",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
    },
  });
},
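The precedence used in these examples (selectors and their fallbacks first, defaultValue only when nothing matches) can be sketched as below. `resolveVariable` is a hypothetical function, not part of helpers.docsearch, and returning a single string for one match versus an array for several is an assumption chosen so the sketch reproduces the foo, language, and version values listed for the custom variables example.

```javascript
// Hypothetical sketch: resolve a lvl0 or custom variable definition to a value.
// Assumed precedence: matched selector text first, then defaultValue.
function resolveVariable(definition, query) {
  // Shorthand: a bare string or array is treated as selectors only.
  const spec =
    typeof definition === "string" || Array.isArray(definition)
      ? { selectors: definition }
      : definition;

  if (spec.selectors !== undefined) {
    const candidates = Array.isArray(spec.selectors) ? spec.selectors : [spec.selectors];
    for (const selector of candidates) {
      const texts = query(selector);
      if (texts.length > 0) {
        // Assumption: one match -> a string; several matches -> an array.
        return texts.length === 1 ? texts[0] : texts;
      }
    }
  }
  // Nothing matched (or no selectors given): use the raw value(s), if any.
  return spec.defaultValue;
}

// Toy page lookup: only `.bar` matches.
const page = { ".bar": ["valueFromBarSelector"] };
const query = (selector) => page[selector] ?? [];

const variables = {
  foo: ".bar",
  language: { selectors: ".does-not-exists", defaultValue: ["en", "en-US"] },
  version: { defaultValue: ["latest", "stable"] },
};
const record = Object.fromEntries(
  Object.entries(variables).map(([name, def]) => [name, resolveVariable(def, query)]),
);

// lvl0 from this section: the selector misses, so the raw text is used.
const lvl0 = resolveVariable(
  { selectors: ".exists-probably h1", defaultValue: "myRawTextIfDoesNotExists" },
  query,
);
console.log(record, lvl0);
```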

recordProps API Reference

lvl0

type: Lvl0 | required

type Lvl0 = {
  selectors: string | string[];
  defaultValue?: string;
};

lvl1, content

type: string | string[] | required

lvl2, lvl3, lvl4, lvl5, lvl6

type: string | string[] | optional

pageRank

type: string | optional

Custom variables ([k: string])

type: string | string[] | CustomVariable | optional

type CustomVariable =
  | {
      defaultValue: string | string[];
    }
  | {
      selectors: string | string[];
      defaultValue?: string | string[];
    };

Contains values that can be used as facetFilters.
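The reference above can be summarized as a shape check. The hypothetical `validateRecordProps` below is a sketch that only verifies the types as listed (an lvl0 object with selectors, required lvl1 and content, optional lvl2 to lvl6 and pageRank, and custom variables). It cannot tell whether a selector matches anything on a page, and rejecting empty arrays is an extra assumption.

```javascript
// Hypothetical validator mirroring the recordProps reference (shape checks only).
// Assumption: empty arrays are rejected, since they select nothing.
const isStringOrStringArray = (v) =>
  typeof v === "string" ||
  (Array.isArray(v) && v.length > 0 && v.every((s) => typeof s === "string"));

// Custom variable: string | string[] | { defaultValue } | { selectors, defaultValue? }
function isValidCustomVariable(value) {
  if (isStringOrStringArray(value)) return true; // shorthand: selectors only
  if (!value || typeof value !== "object" || Array.isArray(value)) return false;
  if (value.selectors === undefined) return isStringOrStringArray(value.defaultValue);
  return (
    isStringOrStringArray(value.selectors) &&
    (value.defaultValue === undefined || isStringOrStringArray(value.defaultValue))
  );
}

function validateRecordProps(props) {
  const errors = [];
  const { lvl0 } = props;
  if (!lvl0 || typeof lvl0 !== "object" || !isStringOrStringArray(lvl0.selectors)) {
    errors.push("lvl0.selectors is required (string or string[])");
  } else if (lvl0.defaultValue !== undefined && typeof lvl0.defaultValue !== "string") {
    errors.push("lvl0.defaultValue must be a string");
  }
  for (const key of ["lvl1", "content"]) {
    if (!isStringOrStringArray(props[key])) errors.push(`${key} is required (string or string[])`);
  }
  for (const key of ["lvl2", "lvl3", "lvl4", "lvl5", "lvl6"]) {
    if (props[key] !== undefined && !isStringOrStringArray(props[key])) {
      errors.push(`${key} must be a string or string[]`);
    }
  }
  if (props.pageRank !== undefined && typeof props.pageRank !== "string") {
    errors.push("pageRank must be a string");
  }
  const known = new Set(["lvl0", "lvl1", "lvl2", "lvl3", "lvl4", "lvl5", "lvl6", "content", "pageRank"]);
  for (const [key, value] of Object.entries(props)) {
    // Every other key is treated as a custom variable.
    if (!known.has(key) && !isValidCustomVariable(value)) {
      errors.push(`custom variable ${key} has an invalid shape`);
    }
  }
  return errors;
}

const errors = validateRecordProps({
  lvl0: { selectors: "header h1" },
  lvl1: "article h2",
  content: "main p, main li",
  version: { defaultValue: ["latest", "stable"] },
});
// lvl0 given as a bare string and lvl1 missing: both are reported.
const bad = validateRecordProps({ lvl0: "header h1", content: "main p" });
console.log(errors, bad);
```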