Record Extractor
Info: The following content is for the new DocSearch infrastructure. If you haven't received an email to migrate your account yet, please refer to the legacy documentation.
Introduction
Info: This documentation only covers the helpers.docsearch method. See the Algolia Crawler documentation for more information on the Algolia Crawler.
Pages are extracted by a recordExtractor. These extractors are assigned to actions via the recordExtractor parameter. This parameter links to a function that returns the data you want to index, organized as an array of JSON objects.
The helpers are a collection of functions to help you extract content and generate Algolia records.
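To make the "array of JSON objects" return shape concrete, here is a minimal hand-written extractor sketch that does not use the helpers. The simplified `page` argument (`url`, `title`, `paragraphs`) and the record fields (`objectID`, `title`, `content`) are assumptions chosen for illustration, not the crawler's actual input or the record format that helpers.docsearch produces.

```javascript
// Hypothetical hand-written extractor: takes a simplified page description
// (assumed shape, for illustration only) and returns an array of records.
function recordExtractor({ url, title, paragraphs }) {
  return paragraphs.map((text, index) => ({
    objectID: `${url}#${index}`, // assumed ID scheme, one record per paragraph
    url,
    title,
    content: text,
  }));
}

const records = recordExtractor({
  url: "https://example.com/docs/intro",
  title: "Introduction",
  paragraphs: ["First paragraph.", "Second paragraph."],
});

console.log(JSON.stringify(records, null, 2));
```

In practice the DocSearch helper builds these records for you, which is why most configurations simply return its result.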
Usage
The most common way to use the DocSearch helper is to return its result from the recordExtractor function.
recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},
Complex extractors
Using the Cheerio instance ($)
You can also use the provided Cheerio instance ($) to exclude content from the DOM:
recordExtractor: ({ $, helpers }) => {
  // Remove DOM elements we don't want to crawl
  $(".my-warning-message").remove();

  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},
With fallback DOM selectors
Each lvlX and content property supports fallback selectors as an array of strings, which allows for robust config files:
recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      // `.exists h1` will be selected if `.exists-probably h1` does not exist.
      lvl0: {
        selectors: [".exists-probably h1", ".exists h1"],
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      // `.exists p, .exists li` will be selected.
      content: [
        ".does-not-exists p, .does-not-exists li",
        ".exists p, .exists li",
      ],
    },
  });
},
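The fallback order described above can be pictured with a small standalone sketch. This is not the helper's implementation: `firstMatchingSelector` and the `fakeDom` lookup table are hypothetical names that stand in for real DOM queries, and the sketch only assumes that the first selector with at least one match wins.

```javascript
// Illustrative only: `matches` stands in for the DOM, mapping a selector
// to the texts it would match (a missing entry means no match).
function firstMatchingSelector(selectors, matches) {
  // Accept both a single string and an array of fallback selectors.
  const list = Array.isArray(selectors) ? selectors : [selectors];
  for (const selector of list) {
    const found = matches[selector] || [];
    if (found.length > 0) {
      return { selector, values: found };
    }
  }
  return null; // no selector matched
}

const fakeDom = {
  ".exists h1": ["Getting started"],
  ".exists p, .exists li": ["Some text", "A list item"],
};

const lvl0 = firstMatchingSelector([".exists-probably h1", ".exists h1"], fakeDom);
const content = firstMatchingSelector(
  [".does-not-exists p, .does-not-exists li", ".exists p, .exists li"],
  fakeDom
);

console.log(lvl0.selector);
console.log(content.selector);
```

Listing the most specific selector first and the most general last keeps one config usable across pages with slightly different layouts.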
With custom variables
Custom variables are useful to filter content in the frontend (version, lang, etc.).
These selectors also support defaultValue and fallback selectors:
recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      foo: ".bar",
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".does-not-exists",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
      version: {
        // You can send raw values without `selectors`
        defaultValue: ["latest", "stable"],
      },
    },
  });
},
The foo, language, and version attributes of these records will be:
foo: "valueFromBarSelector",
language: ["en", "en-US"],
version: ["latest", "stable"]
You can now use them to filter your search in the frontend.
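As a rough sketch of the frontend side, the snippet below turns a set of selected values into Algolia `facetFilters` entries, which use the `attribute:value` string form. The `buildFacetFilters` function and the `selection` object are hypothetical names used for illustration; how you pass the result to your search UI depends on your frontend setup.

```javascript
// Builds Algolia `facetFilters` entries ("attribute:value") from the
// values selected in the UI. `selection` is a hypothetical object.
function buildFacetFilters(selection) {
  return Object.entries(selection).map(([attribute, value]) =>
    Array.isArray(value)
      ? value.map((v) => `${attribute}:${v}`) // nested array: any of these values
      : `${attribute}:${value}`
  );
}

const facetFilters = buildFacetFilters({
  language: ["en", "en-US"],
  version: "latest",
});

console.log(JSON.stringify(facetFilters));
// → [["language:en","language:en-US"],"version:latest"]
```

In Algolia's filter syntax, top-level entries are combined with AND and entries inside a nested array with OR, so this example matches records in either language that are tagged with the latest version.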
With raw text (defaultValue)
The lvl0 and custom variable selectors also accept a fallback raw value:
recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably h1",
        defaultValue: "myRawTextIfDoesNotExists",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably .language",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
    },
  });
},
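One way to read the config above: the raw value is used only when none of the selectors match. The sketch below models that reading under stated assumptions. `resolveWithDefault` and `queryText` are hypothetical names, and the real helper may differ in details such as how it treats empty matches.

```javascript
// Illustrative only: `queryText` is a hypothetical stand-in for a DOM lookup
// that returns the matched text, or undefined when nothing matches.
function resolveWithDefault(prop, queryText) {
  // `selectors` may be absent, a single string, or an array of strings.
  const selectors = prop.selectors === undefined ? [] : [].concat(prop.selectors);
  for (const selector of selectors) {
    const text = queryText(selector);
    if (text !== undefined) return text;
  }
  return prop.defaultValue; // undefined when no default was configured
}

const page = { "header h1": "Installation" };
const queryText = (selector) => page[selector];

const matched = resolveWithDefault(
  { selectors: "header h1", defaultValue: "myRawTextIfDoesNotExists" },
  queryText
);
const fallback = resolveWithDefault(
  { selectors: ".exists-probably h1", defaultValue: "myRawTextIfDoesNotExists" },
  queryText
);
const rawOnly = resolveWithDefault({ defaultValue: ["latest", "stable"] }, queryText);

console.log(matched, fallback, rawOnly);
```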
recordProps API Reference
lvl0
type: Lvl0 | required
type Lvl0 = {
  selectors: string | string[];
  defaultValue?: string;
};
lvl1, content
type: string | string[] | required
lvl2, lvl3, lvl4, lvl5, lvl6
type: string | string[] | optional
pageRank
type: string | optional
Custom variables ([k: string])
type: string | string[] | CustomVariable | optional
type CustomVariable =
  | {
      defaultValue: string | string[];
    }
  | {
      selectors: string | string[];
      defaultValue?: string | string[];
    };
Contains values that can be used as facetFilters.