# Pushing data to the Crawler
Before you start pushing URLs to the Crawler, you first need to create a new Crawler here.
To push URLs to be crawled by the Crawler, you must use the Crawling API with two additional parameters:
- You must append &callback=true
- You must append &crawler=YourCrawlerName, using the name of the crawler you created here.
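For illustration, here is a minimal sketch of a push request in Python. The endpoint, token parameter, and crawler name below are placeholders rather than values from this page; substitute the ones from your own Crawling API setup.

```python
import urllib.parse
import urllib.request

# Placeholders (assumptions): use your own Crawling API endpoint, token, and crawler name.
CRAWLING_API_ENDPOINT = "https://api.crawlbase.com/"
TOKEN = "YOUR_TOKEN"
CRAWLER_NAME = "YourCrawlerName"

def push_url(url_to_crawl: str) -> str:
    """Push a single URL to the Crawler and return the raw JSON response body."""
    params = urllib.parse.urlencode({
        "token": TOKEN,
        "url": url_to_crawl,
        "callback": "true",       # required for Crawler pushes
        "crawler": CRAWLER_NAME,  # the name of the crawler you created
    })
    with urllib.request.urlopen(f"{CRAWLING_API_ENDPOINT}?{params}") as response:
        return response.read().decode("utf-8")

print(push_url("https://www.example.com/product/123"))
```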
In response to your Crawler push, the API will send back a JSON response containing a unique request identifier, the RID. The RID will help you identify the request at any point in the future.
Example of push response:
{ "rid": "1e92e8bff32c31c2728714d4" }
By default, you can push up to 30 URLs per second to the Crawler.
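If you push large batches, a simple client-side throttle keeps you under that limit. The sketch below is illustrative only; push_url is a stand-in for the request shown earlier.

```python
import time

def push_url(url: str) -> None:
    """Stand-in for the push request sketched above."""
    ...

URLS = [f"https://www.example.com/page/{i}" for i in range(100)]
MAX_PER_SECOND = 30  # default Crawler push limit

for start in range(0, len(URLS), MAX_PER_SECOND):
    batch = URLS[start:start + MAX_PER_SECOND]
    batch_started = time.monotonic()
    for url in batch:
        push_url(url)
    # Wait out the remainder of the second before sending the next batch.
    elapsed = time.monotonic() - batch_started
    if elapsed < 1.0:
        time.sleep(1.0 - elapsed)
```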
# Crawler waiting queue limit
The combined total across all Crawler waiting queues is capped at 1M pages. If any single queue, or all queues combined, exceeds 1M pages, your Crawler push will be temporarily paused, and we will notify you via email. Crawler push will automatically resume once the waiting queue(s) drop below 1M pages.
# Sending additional data
Optionally, you can receive custom headers in your callback by using the callback_headers parameter. This is useful for passing additional data for identification purposes on your side.
The format is the following: HEADER-NAME:VALUE|HEADER-NAME2:VALUE2|etc.
The value must be URL-encoded.
Example for the headers and values MY-ID 1234 and some-other 4321:
&callback_headers=MY-ID%3A1234%7Csome-other%3A4321
Those headers will come back in the webhook POST request.
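If you want to inspect them on your side, the sketch below shows one way a webhook endpoint could read those headers off the incoming POST. It assumes the custom headers arrive as plain HTTP request headers; the handler and port are illustrative, not part of this API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class CrawlerCallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical webhook endpoint for Crawler deliveries."""

    def do_POST(self):
        # The values passed via callback_headers arrive as regular request headers.
        my_id = self.headers.get("MY-ID")
        some_other = self.headers.get("some-other")
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))

        print(f"MY-ID={my_id} some-other={some_other} body={len(body)} bytes")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CrawlerCallbackHandler).serve_forever()
```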