
Prerequisites

You need just two things: a Crawlbase account and your API tokens.

Two tokens, one account

Each account comes with two tokens: a Normal token (plain TCP requests, fastest) and a JavaScript token (full Chrome rendering). Pick based on the site — most APIs and static pages work with the Normal token.
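That choice can live in one tiny helper. A sketch in Python (the token values are the usual placeholders, and `pick_token` is our illustration, not part of any SDK):

```python
def pick_token(needs_js_rendering: bool) -> str:
    """Return the right token for a request (placeholder values, not real tokens)."""
    NORMAL_TOKEN = "YOUR_TOKEN"    # plain TCP fetch: APIs, static pages
    JS_TOKEN = "YOUR_JS_TOKEN"     # full Chrome rendering: SPAs, empty-shell sites
    return JS_TOKEN if needs_js_rendering else NORMAL_TOKEN

print(pick_token(False))  # YOUR_TOKEN
```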

Your first request

The Crawling API takes a single required parameter — url — fully URL-encoded. Drop in your token and you're crawling.

GET https://api.crawlbase.com/?token=YOUR_TOKEN&url=ENCODED_URL

curl 'https://api.crawlbase.com/?token=YOUR_TOKEN&url=https%3A%2F%2Fhttpbin.org%2Fheaders'
Python

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})
res = api.get('https://httpbin.org/headers')

print(res['status_code'])
print(res['body'])
Node.js

const { CrawlingAPI } = require('crawlbase');
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

// await is only valid inside an async function (or an ES module),
// so resolve the promise explicitly here.
api.get('https://httpbin.org/headers').then((res) => {
  console.log(res.statusCode, res.body);
});
Ruby

require 'crawlbase'

api = Crawlbase::API.new(token: 'YOUR_TOKEN')
res = api.get('https://httpbin.org/headers')

puts res.status_code
puts res.body
PHP

<?php
use Crawlbase\CrawlingAPI;

$api = new CrawlingAPI(['token' => 'YOUR_TOKEN']);
$res = $api->get('https://httpbin.org/headers');

echo $res->statusCode . PHP_EOL;
echo $res->body;
Go

package main

import (
    "fmt"
    "github.com/crawlbase/crawlbase-go"
)

func main() {
    api := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
    res, err := api.Get("https://httpbin.org/headers")
    if err != nil {
        panic(err) // don't silently discard the request error
    }
    fmt.Println(res.StatusCode)
    fmt.Println(res.Body)
}
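One detail worth automating: the url value must be percent-encoded, as in the curl example above (the SDKs do this for you). A sketch using Python's standard library; the helper name `crawlbase_url` is ours, not the SDK's:

```python
from urllib.parse import quote

def crawlbase_url(token: str, target: str) -> str:
    """Build a Crawling API request URL with the target percent-encoded."""
    # safe='' forces ':' and '/' to be encoded as well.
    return f"https://api.crawlbase.com/?token={token}&url={quote(target, safe='')}"

print(crawlbase_url("YOUR_TOKEN", "https://httpbin.org/headers"))
# https://api.crawlbase.com/?token=YOUR_TOKEN&url=https%3A%2F%2Fhttpbin.org%2Fheaders
```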

You'll get back the page HTML, plus a few headers describing what happened upstream. The most important ones:

  • original_status (int) — the HTTP status the target site returned to us. Useful for distinguishing "site says 404" from "we couldn't reach the site".
  • pc_status (int) — the Crawlbase status code. 200 means success. See status codes for the full list.
  • url (string) — the final URL after any redirects. Useful when you want to know where you actually landed.
  • rid (string, optional) — a request identifier returned when you use &async=true or &store=true. Use it to look up the page in Cloud Storage.
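If you call the endpoint with a bare HTTP client instead of an SDK, these values arrive as response headers. A sketch of pulling them out; the sample dict below is a hand-made stand-in, not a live response:

```python
def parse_crawl_headers(headers: dict) -> dict:
    """Extract the Crawlbase response headers documented above."""
    return {
        "original_status": int(headers["original_status"]),
        "pc_status": int(headers["pc_status"]),
        "url": headers["url"],
        "rid": headers.get("rid"),  # only set with &async=true or &store=true
    }

# Stand-in for headers a real response might carry: the target 404'd,
# but Crawlbase itself succeeded in fetching it.
sample = {"original_status": "404", "pc_status": "200",
          "url": "https://httpbin.org/headers"}
parsed = parse_crawl_headers(sample)
print(parsed["original_status"], parsed["pc_status"])  # 404 200
```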

Need JavaScript rendering?

Sites built with React, Vue, Angular, or anything that ships an empty HTML shell need a real browser. Switch to your JavaScript token — same endpoint, different token.

curl 'https://api.crawlbase.com/?token=YOUR_JS_TOKEN&url=https%3A%2F%2Freact-app.example.com&page_wait=2000'
Python

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_JS_TOKEN'})
res = api.get('https://react-app.example.com', {
    'page_wait': 2000,
    'ajax_wait': True,
})
print(res['body'])

Useful JS-rendering parameters:

  • page_wait — wait N milliseconds after load (defaults to 0).
  • ajax_wait — wait until network is idle.
  • css_click_selector — click an element before capturing.

See the full list in Crawling API parameters.
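These rendering options travel as ordinary query-string entries alongside token and url, as the curl example above shows. A sketch of composing such a request URL in Python (the target URL is illustrative, and `js_crawl_url` is our helper, not an SDK function):

```python
from urllib.parse import urlencode, quote

def js_crawl_url(token: str, target: str, **params) -> str:
    """Compose a JS-token request with extra rendering parameters."""
    query = {"token": token, "url": target, **params}
    # quote_via=quote percent-encodes ':' and '/' inside the url value.
    return "https://api.crawlbase.com/?" + urlencode(query, quote_via=quote)

print(js_crawl_url("YOUR_JS_TOKEN", "https://react-app.example.com",
                   page_wait=2000, ajax_wait="true"))
```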

Next steps

You're crawling. Now pick a path: