Migrating from legacy

Introduction

With this new version of the DocSearch UI, we wanted to go further and provide better tooling for you to create and maintain your config file, along with some extra Algolia features that you have been requesting for a long time!

What's new?

Scraper

The DocSearch infrastructure now leverages the Algolia Crawler. We've teamed up with our friends and created a new DocSearch helper that extracts records the same way our beloved DocSearch scraper did!

The best part is that you no longer need to install any tooling on your side if you want to maintain or update your index!

We now provide a web interface that will allow you to:

Start, schedule and monitor your crawls
Edit your config file from our live editor
Test your results directly with DocSearch v3

Algolia application and credentials

We've received a lot of requests asking for a way to:

Manage team members
Browse and see how Algolia records are indexed
See and subscribe to other Algolia features

They are now all available, in your own Algolia application, for free :D

Config file key mapping

Below are the keys that can be found in legacy DocSearch configs, with their equivalents in an Algolia Crawler config. More detailed documentation of the Algolia Crawler can be found in the official documentation.

`legacy`	`current`	description
`start_urls`	`startUrls`	Now accepts URLs only, see `helpers.docsearch` to handle custom variables
`page_rank`	`pageRank`	Can be added to the `recordProps` in `helpers.docsearch`
`js_render`	`renderJavaScript`	Unchanged
`js_wait`	`renderJavaScript.waitTime`	See documentation of `renderJavaScript`
`index_name`	removed, see `actions`	Handled directly in the `actions`
`sitemap_urls`	`sitemaps`	Unchanged
`stop_urls`	`exclusionPatterns`	Supports `micromatch`
`selectors_exclude`	removed	Should be handled in the `recordExtractor` and `helpers.docsearch`
`custom_settings`	`initialIndexSettings`	Unchanged
`scrape_start_urls`	removed	Can be handled with `exclusionPatterns`
`strip_chars`	removed	`#` is removed automatically from anchor links; edge cases should be handled in the `recordExtractor` and `helpers.docsearch`
`conversation_id`	removed	Not needed anymore
`nb_hits`	removed	Not needed anymore
`sitemap_alternate_links`	removed	Not needed anymore
`stop_content`	removed	Should be handled in the `recordExtractor` and `helpers.docsearch`
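
To make the table more concrete, here is a minimal TypeScript sketch that translates a few of the legacy keys into their Crawler counterparts. It is not the tool we use for the actual migration: the `LegacyConfig` and `CrawlerConfigSketch` types and the `toCrawlerConfig` function are hypothetical names made up for this example, only a handful of keys are modelled, and deriving `pathsToMatch` from the start URLs is an assumption.

```typescript
// Hypothetical shapes: only a few keys from the mapping table are modelled.
type LegacyStartUrl = string | { url: string };

interface LegacyConfig {
  index_name: string;
  start_urls: LegacyStartUrl[];
  sitemap_urls?: string[];
  stop_urls?: string[];
  js_render?: boolean;
}

interface CrawlerConfigSketch {
  startUrls: string[];
  sitemaps: string[];
  exclusionPatterns: string[];
  renderJavaScript: boolean;
  actions: { indexName: string; pathsToMatch: string[] }[];
}

function toCrawlerConfig(legacy: LegacyConfig): CrawlerConfigSketch {
  // `startUrls` accepts URLs only, so object entries are reduced to their URL.
  const startUrls = legacy.start_urls.map((entry) =>
    typeof entry === "string" ? entry : entry.url
  );

  return {
    startUrls,
    sitemaps: legacy.sitemap_urls ?? [],
    // Copied as-is: review each pattern, since `exclusionPatterns`
    // follows micromatch syntax.
    exclusionPatterns: legacy.stop_urls ?? [],
    renderJavaScript: legacy.js_render ?? false,
    // `index_name` is no longer a top-level key: it moves into the action.
    actions: [
      {
        indexName: legacy.index_name,
        // Assumption: crawl everything located under each start URL.
        pathsToMatch: startUrls.map((url) => url.replace(/\/?$/, "/**")),
      },
    ],
  };
}

const crawlerConfig = toCrawlerConfig({
  index_name: "my-docs",
  start_urls: ["https://example.com/docs/", { url: "https://example.com/blog" }],
  sitemap_urls: ["https://example.com/sitemap.xml"],
});
console.log(JSON.stringify(crawlerConfig, null, 2));
```

Keys marked as removed in the table, such as `selectors_exclude` or `stop_content`, have no direct equivalent and are deliberately left out of this sketch, since they belong in the `recordExtractor`.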

FAQ

Migration seems to have started, but I haven't received any emails

Due to the large number of indices DocSearch has, we need to migrate configs in small incremental batches.

If you have not received a migration email yet, don't worry: your turn will come!

What do I need to do to migrate?

Nothing!

We handle the whole migration on our side: your existing config file will be migrated to an Algolia Crawler config, crawls will be started and scheduled for you, your Algolia application will be ready to go, and your Algolia index will be populated with your website content!

What do I need to update to make the migration work?

We've tried to make the migration as seamless as possible for you, so all you need to update is your frontend integration, using the new credentials you've received by email or found directly in the Algolia dashboard!
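
As a rough illustration of that frontend update, the sketch below groups the values that change. The option names (`container`, `appId`, `apiKey`, `indexName`) are assumed to follow the DocSearch v3 JavaScript client, the `DocSearchOptions` interface is a hypothetical type written for this example, and every value is a placeholder.

```typescript
// Hypothetical type: option names assumed to match the DocSearch v3 client.
interface DocSearchOptions {
  container: string;
  appId: string;
  apiKey: string;
  indexName: string;
}

// Placeholders: substitute the values from your migration email
// or from the Algolia dashboard.
const docsearchOptions: DocSearchOptions = {
  container: "#docsearch",
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_SEARCH_API_KEY",
  indexName: "YOUR_INDEX_NAME",
};

console.log(Object.keys(docsearchOptions).join(", "));
```

In a real integration, an object like this would typically be passed to the `docsearch` function of the DocSearch client rather than logged.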

What should I do with my legacy config and credentials?

Your legacy config will be converted to a Crawler config. If you have already received your access, please use the dedicated web interface to make any changes!

Your credentials will remain available, but once all the existing configs have been migrated, we will stop the daily crawl jobs.

Are the `docsearch-scraper` and `docsearch-configs` repositories still maintained?

At the time you are reading this, the migration hasn't been completed, so yes, they are still maintained.

Once the migration has been completed:

The docsearch-scraper will be archived and no longer maintained, in favor of our Algolia Crawler. You'll still be able to use our "run your own" solution if you want!
The docsearch-configs repository will be archived and will host all of the existing and active legacy DocSearch config files, along with their parsed versions. You can get a preview on this branch.