# Parameters

The API accepts the following parameters. Only token and url are mandatory; the rest are optional.

# token

Required
Type string

This parameter is required for all calls

This is your authentication token. You have two tokens: one for normal requests and one for JavaScript requests.

Use the JavaScript token when the content you need to crawl is generated via JavaScript, either because the page is built with JavaScript (React, Angular, etc.) or because the content is generated dynamically in the browser.

Normal token

_USER_TOKEN_

JavaScript token

_JS_TOKEN_

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# url

Required
Type string

This parameter is required for all calls

You will need a URL to crawl. Make sure it starts with http or https and that it is fully encoded.

For example, the URL https://github.com/crawlbase?tab=repositories should be encoded as follows when calling the API: https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"
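As a minimal sketch of the encoding step described above, the following Python builds the same request URL with the standard library. The helper names `encode_target_url` and `build_request_url` are hypothetical and not part of any Crawlbase library; only the endpoint and the token and url parameters come from this page.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://api.crawlbase.com/"


def encode_target_url(target_url: str) -> str:
    """Percent-encode the URL to crawl so it is safe as a query value."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("URL must start with http or https")
    # safe="" also encodes ":", "/", "?" and "=", as in the example above.
    return quote(target_url, safe="")


def build_request_url(token: str, target_url: str, **params: str) -> str:
    """Assemble an API request URL; extra keyword arguments become parameters."""
    encode_target_url(target_url)  # reuse the scheme check
    query = {"token": token, "url": target_url, **params}
    return API_ENDPOINT + "?" + urlencode(query, quote_via=quote, safe="")


if __name__ == "__main__":
    print(build_request_url("_USER_TOKEN_", "https://github.com/crawlbase?tab=repositories"))
```

Optional parameters can be passed as keyword arguments, for example `build_request_url(token, url, format="json")`.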

# format

Optional
Type string

Indicates the response format, either json or html. Defaults to html.

If format html is used, Crawlbase will send you back the response parameters in the headers (see HTML response below).

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories&format=json"

# pretty

Optional
Type boolean

If you're expecting a JSON response, you can make it easier to read by passing &pretty=true.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories&format=json&pretty=true"

# user_agent

Optional
Type string

If you want to make the request with a custom user agent, you can pass it here and our servers will forward it to the requested URL.

We recommend NOT using this parameter and letting our artificial intelligence handle this.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&user_agent=Mozilla%2F5.0+%28Macintosh%3B+Intel+Mac+OS+X+10_12_5%29+AppleWebKit%2F603.2.4+%28KHTML%2C+like+Gecko%29+Version%2F10.1.1+Safari%2F603.2.4&url=https%3A%2F%2Fpostman-echo.com%2Fheaders"
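The user agent value in the example above is form-encoded (spaces become `+`). A small sketch of producing that value in Python, assuming the standard `urllib.parse.quote_plus` encoding is what you want; the function name `encode_user_agent` is hypothetical:

```python
from urllib.parse import quote_plus


def encode_user_agent(user_agent: str) -> str:
    """Encode a user agent string for use as the user_agent query value."""
    # quote_plus turns spaces into "+" and percent-encodes "/", "(", ")", ";" and ",".
    return quote_plus(user_agent)


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 "
          "(KHTML, like Gecko) Version/10.1.1 Safari/603.2.4")
    print(encode_user_agent(ua))
```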

# page_wait

Optional
Type number

If you are using the JavaScript token, you can optionally pass the page_wait parameter to wait a number of milliseconds before the browser captures the resulting HTML code.

This is useful in cases where the page takes some seconds to render or some ajax needs to be loaded before the HTML is captured.

curl "https://api.crawlbase.com/?token=_JS_TOKEN_&page_wait=1000&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# ajax_wait

Optional
Type boolean

If you are using the JavaScript token, you can optionally pass the ajax_wait parameter to wait for the ajax requests to finish before getting the HTML response.

curl "https://api.crawlbase.com/?token=_JS_TOKEN_&ajax_wait=true&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# css_click_selector

Optional
Type string

# Single CSS Selector

If you are using the JavaScript token, you can optionally pass the css_click_selector parameter to click an element on the page before the browser captures the resulting HTML code.

This parameter accepts a fully specified and valid CSS selector. For example, you can use an ID selector such as #some-button, a class selector like .some-other-button, or an attribute selector such as [data-tab-item="tab1"]. It is important to ensure that the CSS selector is properly encoded to avoid errors.

Please note, if the selector is not found on the page, the request will fail with pc_status 595. To receive a response even when a selector is not found, you can append a universally found selector, like body, as a fallback. For example: #some-button,body.

# Multiple CSS Selectors

To accommodate scenarios where multiple elements may need to be clicked sequentially before capturing the page content, the css_click_selector parameter can now accept multiple CSS selectors. Separate each selector with a pipe (|) character. Ensure the entire value, including separators, is URL-encoded to avoid any parsing issues.

Suppose you want to click a button with the ID start-button and then a link with the class next-page-link. You would construct your css_click_selector parameter like this:

Original selectors: #start-button|.next-page-link
URL-encoded: %23start-button%7C.next-page-link

Append this parameter to your API request to ensure both elements are clicked in the order specified.

Please ensure all selectors provided are valid and present on the page to avoid errors. If any selector is not found, the request will adhere to the error handling specified above, failing with pc_status 595 unless a fallback selector is included.

curl "https://api.crawlbase.com/?token=_JS_TOKEN_&css_click_selector=%5Bdata-tab-item%3D%22overview%22%5D&page_wait=1000&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"
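The selector rules above (pipe-separated selectors, URL-encoded as a whole, with an optional fallback) could be sketched in Python as follows. The helper `encode_click_selectors` is hypothetical, and appending the fallback to every selector in the list is an assumption about how you might apply the `#some-button,body` pattern to multiple selectors.

```python
from urllib.parse import quote
from typing import Optional, Sequence


def encode_click_selectors(selectors: Sequence[str], fallback: Optional[str] = None) -> str:
    """Join CSS selectors with "|" and URL-encode the whole value."""
    if not selectors:
        raise ValueError("at least one CSS selector is required")
    if fallback:
        # Assumption: add the fallback to each selector, e.g. "#some-button,body".
        selectors = [f"{selector},{fallback}" for selector in selectors]
    # Encode separators too, so "|" becomes %7C and "#" becomes %23.
    return quote("|".join(selectors), safe="")


if __name__ == "__main__":
    print(encode_click_selectors(["#start-button", ".next-page-link"]))
    print(encode_click_selectors(["#some-button"], fallback="body"))
```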

# device

Optional
Type string

Optionally, if you don't want to specify a user_agent but you want the request to come from a specific type of device, you can use this parameter.

There are two options available: desktop and mobile.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&device=mobile&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# get_cookies

Optional
Type boolean

Optionally, if you need to get the cookies that the original website sets on the response, you can use the &get_cookies=true parameter.

The cookies will come back in the header (or in the json response if you use &format=json) as original_set_cookie.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&get_cookies=true&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# get_headers

Optional
Type boolean

Optionally, if you need to get the headers that the original website sets on the response, you can use the &get_headers=true parameter.

The headers will come back in the response as original_header_name by default. When &format=json is passed, the headers will come back as original_headers.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&get_headers=true&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# request_headers

Optional
Type string

Optionally, if you need to send request headers to the original website, you can use the &request_headers=EncodedRequestHeaders parameter.

Example request headers: accept-language:en-GB|accept-encoding:gzip

Example encoded: &request_headers=accept-language%3Aen-GB%7Caccept-encoding%3Agzip

Please note that not all request headers are allowed by the API. We recommend that you test the headers sent using this testing URL: https://postman-echo.com/headers

If you need to send some additional headers which are not allowed by the API, please let us know the header names and we will authorize them for your token.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&request_headers=accept-language%3Aen-GB%7Caccept-encoding%3Agzip&url=https%3A%2F%2Fpostman-echo.com%2Fheaders"
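To illustrate the header format above (name:value pairs joined with `|`, then URL-encoded), here is a short Python sketch. The function name `encode_request_headers` is hypothetical; which headers the API actually allows is not checked here.

```python
from urllib.parse import quote
from typing import Mapping


def encode_request_headers(headers: Mapping[str, str]) -> str:
    """Build the request_headers value from a mapping of header names to values."""
    raw = "|".join(f"{name}:{value}" for name, value in headers.items())
    # ":" becomes %3A and "|" becomes %7C, matching the encoded example above.
    return quote(raw, safe="")


if __name__ == "__main__":
    print(encode_request_headers({"accept-language": "en-GB", "accept-encoding": "gzip"}))
```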

# set_cookies

Optional
Type string

Optionally, if you need to send cookies to the original website, you can use the &cookies=EncodedCookies parameter.

Example cookies: key1=value1; key2=value2; key3=value3

Example encoded: &cookies=key1%3Dvalue1%3B%20key2%3Dvalue2%3B%20key3%3Dvalue3

We recommend that you test the cookies sent using this testing url: https://postman-echo.com/cookies

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&cookies=key1%3Dvalue1%3B%20key2%3Dvalue2%3B%20key3%3Dvalue3&url=https%3A%2F%2Fpostman-echo.com%2Fcookies"
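The cookie string above can be produced from key/value pairs in the same way. This is a sketch only; `encode_cookies` is a hypothetical helper name, and it assumes cookie names and values need no escaping beyond URL-encoding.

```python
from urllib.parse import quote
from typing import Mapping


def encode_cookies(cookies: Mapping[str, str]) -> str:
    """Build the cookies value: "k1=v1; k2=v2" pairs, URL-encoded."""
    raw = "; ".join(f"{name}={value}" for name, value in cookies.items())
    # quote() encodes "=" as %3D, ";" as %3B and the space as %20.
    return quote(raw, safe="")


if __name__ == "__main__":
    print(encode_cookies({"key1": "value1", "key2": "value2", "key3": "value3"}))
```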

# cookies_session

Optional
Type string

If you need the cookies returned by each request to be sent on all subsequent calls, you can use the &cookies_session= parameter.

The value of &cookies_session= can be any string of up to 32 characters. Send a new value to create a new cookies session; the cookies returned by each call are then sent along with the next API calls that use the same session value. Sessions expire 300 seconds after the last API call.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&cookies_session=1234abcd&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"
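One way to manage session values on the client side is sketched below. All names are hypothetical, and the expiry check is only a local estimate based on the 300-second limit stated above; the API remains the source of truth for whether a session is still alive.

```python
import uuid

MAX_SESSION_LENGTH = 32     # maximum length stated above
SESSION_TTL_SECONDS = 300   # sessions expire this long after the last API call


def new_cookies_session() -> str:
    """Return a fresh session value; uuid4().hex is exactly 32 characters."""
    return uuid.uuid4().hex


def validate_cookies_session(value: str) -> str:
    """Reject empty values or values longer than the documented maximum."""
    if not value or len(value) > MAX_SESSION_LENGTH:
        raise ValueError("cookies_session must be 1 to 32 characters long")
    return value


def session_probably_expired(last_call_ts: float, now_ts: float) -> bool:
    """Client-side guess: True once 300 seconds have passed since the last call."""
    return now_ts - last_call_ts >= SESSION_TTL_SECONDS


if __name__ == "__main__":
    print(validate_cookies_session(new_cookies_session()))
```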

# screenshot

Optional
Type boolean

If you are using the JavaScript token, you can optionally pass the &screenshot=true parameter to get a screenshot of the whole crawled page in JPEG format.

Crawlbase will send you back the screenshot_url in the response headers (or in the json response if you use &format=json). The screenshot_url expires in one hour.

Note: When using the screenshot=true parameter, you can customize the screenshot output with these additional parameters:

- mode: Set to viewport to capture only the viewport instead of the full page. Default is fullpage.
- width: Specify maximum width in pixels (only works with mode=viewport). Default is screen width.
- height: Specify maximum height in pixels (only works with mode=viewport). Default is screen height.

Example: &screenshot=true&mode=viewport&width=1200&height=800

curl "https://api.crawlbase.com/?token=_JS_TOKEN_&screenshot=true&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"
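The screenshot options can be assembled with a small helper that enforces the rule that width and height only apply with mode=viewport. This is a sketch under that reading of the notes above; `screenshot_params` is a hypothetical name.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def screenshot_params(mode: str = "fullpage",
                      width: Optional[int] = None,
                      height: Optional[int] = None) -> Dict[str, str]:
    """Return the query parameters for a screenshot request."""
    if mode not in ("fullpage", "viewport"):
        raise ValueError("mode must be 'fullpage' or 'viewport'")
    if mode != "viewport" and (width is not None or height is not None):
        raise ValueError("width and height only work with mode=viewport")
    params = {"screenshot": "true"}
    if mode == "viewport":
        params["mode"] = "viewport"  # fullpage is the default, so it is omitted
    if width is not None:
        params["width"] = str(width)
    if height is not None:
        params["height"] = str(height)
    return params


if __name__ == "__main__":
    print("&" + urlencode(screenshot_params("viewport", width=1200, height=800)))
```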

# store

Optional
Type boolean

Optionally pass the &store=true parameter to store a copy of the API response in the Crawlbase Cloud Storage.

Crawlbase will send you back the storage_url in the response headers (or in the json response if you use &format=json).

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&store=true&url=https%3A%2F%2Fgithub.com%2Fcrawlbase%3Ftab%3Drepositories"

# scraper

Optional
Type string

Returns the information parsed according to the specified scraper. Check the list of all the available data scrapers to see which one to choose.

The response will come back as JSON.

Please note: Scraper is an optional parameter. If you don't use it, you will receive back the full HTML of the page so you can scrape it freely.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&scraper=amazon-product-details&url=https%3A%2F%2Fwww.amazon.com%2Fdp%2FB0B7CBZZ16"

# async

Optional
Type boolean

Currently, only linkedin.com is supported with this parameter. Talk to us if you require other domains in async mode.

Optionally pass the &async=true parameter to crawl the requested URL asynchronously. Crawlbase will store the resulting page in the Crawlbase Cloud Storage.

When you make a call with async=true, Crawlbase sends back the request identifier (rid) in the JSON response. Store the RID, because you need it to retrieve the resulting page from the Cloud Storage.

You can use the async=true parameter in combination with other API parameters, for example &async=true&autoparse=true.

Example of request with async=true call:

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&async=true&url=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fcrawlbase"

Example of response with async=true call:

{ "rid": "1e92e8bff32c31c2728714d4" }
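A minimal sketch of pulling the rid out of a response body like the one above, so it can be stored for later retrieval. The function name `extract_rid` is hypothetical, and the sketch assumes the body is the JSON object shown in the example.

```python
import json


def extract_rid(response_body: str) -> str:
    """Parse the async JSON response and return the request identifier."""
    data = json.loads(response_body)
    rid = data.get("rid")
    if not rid:
        raise ValueError("response does not contain a rid")
    return rid


if __name__ == "__main__":
    print(extract_rid('{ "rid": "1e92e8bff32c31c2728714d4" }'))
```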

# autoparse

Optional
Type boolean

Optionally, if you need to get the scraped data of the page that you requested, you can pass the &autoparse=true parameter.

The response will come back as JSON. The structure of the response varies depending on the URL that you sent.

Please note: &autoparse=true is an optional parameter. If you don't use it, you will receive back the full HTML of the page so you can scrape it freely.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&autoparse=true&url=https%3A%2F%2Fwww.amazon.com%2Fdp%2FB0B7CBZZ16"

# country

Optional
Type string

If you want your requests to be geolocated from a specific country, you can use the &country= parameter, like &country=US (two-character country code).

Please take into account that specifying a country can reduce the number of successful requests you get back, so use it wisely and only when geolocation crawls are required.

Also note that some websites like Amazon are routed via different special proxies and all countries are allowed regardless of being in the list or not.

You have access to the following countries

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&country=US&url=https%3A%2F%2Fpostman-echo.com%2Fip"

# tor_network

Optional
Type boolean

If you want to crawl onion websites over the Tor network, you can pass the &tor_network=true parameter.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&tor_network=true&url=https%3A%2F%2Fwww.facebookcorewwwi.onion%2F"

# scroll

Optional
Type boolean

Enables automated scrolling to load dynamic page content using a real browser session. Used with the JavaScript token.

Parameters

- scroll=true: Enables scrolling.
- scroll_interval: Integer (seconds). Sets the scrolling duration after page load. Default: 10. Maximum: 60.

Example: &scroll=true&scroll_interval=20

Behavior

- When scroll=true is set, the API loads the URL in a real browser and programmatically scrolls the page for up to scroll_interval seconds to trigger dynamic content loading (e.g., infinite scroll).
- After scrolling, the content is captured and returned.
- If scroll_interval is not set, the default is 10 seconds.

Billing

Scroll-enabled requests are billed based on total server-side processing time:

- Initial billing unit: each scroll=true API call is billed as 1 request, covering the first 8 seconds of total processing time (including page load and scrolling).
- Additional billing units: for every additional 5 seconds of processing time beyond the first 8 seconds, 1 extra billed request is added.
- Example calculation for a processing time of 20 seconds:
  - 1 billed request for the first 8 seconds
  - +1 billed request for seconds 9–13
  - +1 billed request for seconds 14–18
  - +1 billed request for seconds 19–20 (this fraction is billed as a full block)
  - Total billed: 4 requests
- If the process completes before the set scroll_interval, only the actual processing time is billed.
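The billing rule above can be expressed as a short calculation: one request for the first 8 seconds, plus one request for each started 5-second block after that. The function `billed_requests` is a hypothetical helper for estimating cost, not an official calculator.

```python
import math

INITIAL_BLOCK_SECONDS = 8     # covered by the first billed request
ADDITIONAL_BLOCK_SECONDS = 5  # each further block adds one billed request


def billed_requests(processing_seconds: float) -> int:
    """Estimate billed requests for a scroll=true call from its processing time."""
    if processing_seconds < 0:
        raise ValueError("processing time cannot be negative")
    extra_seconds = max(0.0, processing_seconds - INITIAL_BLOCK_SECONDS)
    # A partial block is billed as a full block, hence the ceiling.
    return 1 + math.ceil(extra_seconds / ADDITIONAL_BLOCK_SECONDS)


if __name__ == "__main__":
    print(billed_requests(20))
```

For the 20-second example above, this gives 1 + ceil(12 / 5) = 4 billed requests.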

Notes

- Maximum allowed scroll_interval is 60 seconds. After 60 seconds, scrolling stops and data is returned.
- Connection time: If using scroll_interval=60, keep your client connection open for up to 90 seconds.
- Site-specific timeouts: Some domains may require longer server timeouts, handled automatically. Combining scroll with page_wait can increase total processing time and affect billing.

curl "https://api.crawlbase.com/?token=_JS_TOKEN_&scroll=true&scroll_interval=20&url=https%3A%2F%2Fwww.reddit.com%2Fsearch%2F%3Fq%3Dcrawlbase"

# custom_success_codes

Optional
Type string

Allows you to specify custom HTTP status codes that should be treated as successful responses, preventing unnecessary retries while still preserving the original status code in the response.

Usage: custom_success_codes=403,429,503

This parameter is useful when targeting domains that return non-standard success codes (like 403 or 500) that should be considered successful for your specific use case.

Note: By using this parameter, you take responsibility for defining what constitutes a successful response for your requests.

curl "https://api.crawlbase.com/?token=_USER_TOKEN_&custom_success_codes=403%2C429%2C503&url=https%3A%2F%2Fexample.com%2Fapi"
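The comma-separated value in the example above could be built from a list of status codes like this. It is a sketch only: `encode_success_codes` is a hypothetical name, and the 100 to 599 range check is an assumption about what counts as a valid HTTP status code, not a documented API rule.

```python
from urllib.parse import quote
from typing import Iterable


def encode_success_codes(codes: Iterable[int]) -> str:
    """Build the custom_success_codes value, e.g. [403, 429, 503] -> 403%2C429%2C503."""
    codes = list(codes)
    for code in codes:
        # Assumption: only three-digit HTTP status codes make sense here.
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
    # The comma separator is encoded as %2C, as in the curl example above.
    return quote(",".join(str(code) for code in codes), safe="")


if __name__ == "__main__":
    print(encode_success_codes([403, 429, 503]))
```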
