# How does it work?

If you want to use the Crawling API behind a proxy, please refer to the documentation of the Smart Proxy product. If you do not want to purchase a Smart Proxy subscription, or you want to use all the features of the Crawling API without limitations and with a higher rate limit, please continue reading below.

All Crawling API proxy mode calls should go to smartproxy.crawlbase.com, using your access token as the proxy username. You can use either HTTP (port 8000) or HTTPS (port 8001), depending on your needs. Note that these ports are different from the ones used by the Smart Proxy, so make sure to use the correct ports. Everything else described in the Smart Proxy documentation stays the same.

Therefore, making your first call is as easy as running one of the following lines in your terminal. Go ahead and try it!

For HTTP:

curl -x "http://[email protected]:8000" -k "http://httpbin.org/ip"

For HTTPS:

curl -x "https://[email protected]:8001" -k "http://httpbin.org/ip"

To do JavaScript requests (headless browser) instead of normal requests, go ahead and try the following in your terminal:

For HTTP:

curl -x "http://[email protected]:8000" -k "http://httpbin.org/ip"

For HTTPS:

curl -x "https://[email protected]:8001" -k "http://httpbin.org/ip"

# Rate Limit

By default, the Crawling API in proxy mode is rate limited to 20 requests per second (1.728M requests/day). If your proxy management solution works with concurrent requests/threads instead of requests per second, it's important to note that 20 requests per second generally translates into a much larger number of concurrent requests. For example, if you are crawling Amazon with Crawlbase, the average request takes about 4 seconds, so 20 requests per second corresponds to roughly 80 concurrent threads. If the website you are crawling responds quickly, you need fewer concurrent requests. If you hit the limit of concurrent requests, please contact support with your use case to increase your concurrency.
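
As a rough illustration of the conversion above, concurrency is approximately the request rate multiplied by the average response time: 20 requests per second × 4 seconds ≈ 80 requests in flight. The Python sketch below caps a thread pool at that number; the token placeholder and the URL list are assumptions for the example, and the per-second limit itself is enforced on the API side.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder; use your own Crawling API token

# ~20 requests per second with an average response time of ~4 seconds
# means roughly 20 * 4 = 80 requests in flight at any given moment.
RATE_LIMIT_PER_SECOND = 20
AVG_RESPONSE_SECONDS = 4
MAX_CONCURRENT = RATE_LIMIT_PER_SECOND * AVG_RESPONSE_SECONDS  # 80

proxies = {
    "http": f"http://{ACCESS_TOKEN}@smartproxy.crawlbase.com:8000",
    "https": f"http://{ACCESS_TOKEN}@smartproxy.crawlbase.com:8000",
}

def fetch(url):
    # Each worker holds one in-flight request; the pool size caps concurrency.
    return requests.get(url, proxies=proxies).status_code

urls = ["http://httpbin.org/ip"] * 100  # replace with your own list of URLs

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    for status in pool.map(fetch, urls):
        print(status)
```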