# Pushing data to the Enterprise Crawler
Before starting to push URLs to the Crawler, you first need to create a new Crawler.
To push URLs to be crawled by the Crawler, you must use the Crawling API with two additional parameters:
- You must append `&callback=true`.
- You must append `&crawler=YourCrawlerName`, using the name of the crawler you created.
In response to your Crawler push, the API sends back a JSON object containing a unique request identifier, `rid`. This `rid` will help you identify the request at any point in the future.
Example of a push response:

```json
{ "rid": "1e92e8bff32c31c2728714d4" }
```
By default, you can push up to 30 URLs each second to the Crawler.
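The push flow described above can be sketched as a small helper that builds the push URL and extracts the `rid` from the response. This is a minimal sketch: the function names are hypothetical, and it assumes the Crawling API accepts the page to crawl via a `url` query parameter alongside the `token`, `callback`, and `crawler` parameters shown in this document.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.crawlbase.com"  # base host shown in the examples below


def build_push_url(token: str, url: str, crawler: str) -> str:
    """Build a Crawler push URL with the two extra parameters required."""
    params = {
        "token": token,
        "callback": "true",   # required to push to the Crawler
        "crawler": crawler,   # the Crawler you created beforehand
        "url": url,           # page to crawl (assumed parameter name)
    }
    return f"{API_BASE}/?{urlencode(params)}"


def parse_push_response(body: str) -> str:
    """Extract the unique request identifier (rid) from a push response."""
    return json.loads(body)["rid"]


push_url = build_push_url("YOUR_TOKEN", "https://example.com", "YourCrawlerName")
rid = parse_push_response('{ "rid": "1e92e8bff32c31c2728714d4" }')
```

Storing the returned `rid` alongside your own record of the pushed URL lets you correlate the eventual webhook delivery with the original request.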
# The Enterprise Crawler waiting queue limit
The combined total for all Crawler waiting queues is capped at 1M pages. If any single queue, or all queues combined, exceeds 1M pages, your Crawler pushes will temporarily pause, and we will notify you by email. Pushes resume automatically once the waiting queue(s) drop below 1M pages.
# Sending additional data
Optionally, you can receive custom headers in your callback by using the `callback_headers` parameter. This is useful for passing additional data to identify requests on your side.
The format is `HEADER-NAME:VALUE|HEADER-NAME2:VALUE2|...`, and it must be URL-encoded.
Example for the headers and values `MY-ID: 1234` and `some-other: 4321`:

```
&callback_headers=MY-ID%3A1234%7Csome-other%3A4321
```
These headers are sent back in the webhook POST request.
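Building the encoded `callback_headers` value by hand is error-prone, so it can help to generate it from a dictionary. A minimal sketch (the function name is hypothetical) using only the standard library:

```python
from urllib.parse import quote


def encode_callback_headers(headers: dict) -> str:
    """Join headers as HEADER-NAME:VALUE|HEADER-NAME2:VALUE2, then URL-encode."""
    raw = "|".join(f"{name}:{value}" for name, value in headers.items())
    # safe="" forces ':' and '|' to be percent-encoded as %3A and %7C
    return quote(raw, safe="")


param = encode_callback_headers({"MY-ID": "1234", "some-other": "4321"})
# → "MY-ID%3A1234%7Csome-other%3A4321"
```

The resulting string is what you append after `&callback_headers=` in the push URL.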
# Per-request queue timeout
You can control how long a specific request is allowed to remain in the queue before it is processed by using the `queue_timeout` parameter. This is useful for time-sensitive crawls where a result is only valuable if delivered within a certain window.
| Parameter | Type | Description |
|---|---|---|
| `queue_timeout` | Integer | Maximum time in minutes a request may wait in the queue before being processed. Accepted values: 1 to 10080 (1 minute to 7 days). If the request is not picked up by a worker within this time, it is marked as failed. If not provided or set to 0, no per-request queue timeout is enforced. |
Important notes:
- The timeout is evaluated against the time the request spends in the queue before processing begins. Once a worker picks up the request, the `queue_timeout` no longer applies.
- Setting an aggressive timeout may increase the rate of failed requests. Choose a value that reflects how long the result remains useful for your use case.
- When a request expires due to `queue_timeout`, you will receive a callback with an HTTP status of `504` and a Crawlbase status of `699`.
Example: push a URL with a queue timeout of 30 minutes:

```
https://api.crawlbase.com/scraper?token=YOUR_TOKEN&callback=true&crawler=YourCrawlerName&queue_timeout=30
```
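On the receiving side, your webhook can use the `504`/`699` pair described above to distinguish an expired request from a successful crawl. A minimal sketch (function names are hypothetical; how the two statuses reach your handler depends on your webhook framework):

```python
def expired_in_queue(http_status: int, crawlbase_status: int) -> bool:
    """True when a pushed request expired before a worker picked it up.

    Per the docs, an expired queue_timeout produces an HTTP status of 504
    together with a Crawlbase status of 699 in the callback.
    """
    return http_status == 504 and crawlbase_status == 699


def handle_callback(http_status: int, crawlbase_status: int) -> str:
    """Decide what to do with an incoming Crawler callback."""
    if expired_in_queue(http_status, crawlbase_status):
        # e.g. push the URL again, possibly with a longer queue_timeout
        return "requeue"
    return "process"
```

Whether to requeue or drop an expired request depends on how long the result stays useful for your use case, as noted above.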