Two statuses, two questions

Most HTTP APIs give you a single status code. Crawlbase gives you two because crawling involves two layers: Crawlbase's infrastructure and the target site behind it. A third field, original_status, passes through the raw code the target site returned.

HTTP status (int)
The status of your request to Crawlbase. 200 means we processed it; 4xx/5xx means we couldn't.

pc_status (int, header)
The status of Crawlbase's request to the target site. 200 means we got a clean page; other codes describe what went wrong upstream.

original_status (int, header)
The raw HTTP status the target site returned. Useful when the site itself returns a non-200 you need to handle (404, 403, etc.).

The mental model

Always check HTTP status first. If it's 200, then check pc_status. If that's 200, then check original_status for site-side errors.

HTTP status codes

What Crawlbase itself returned to your client.

Code | Meaning | Action
200 | Request processed. Check pc_status for outcome. | Continue to pc_status
401 | Token missing or invalid. | Verify token; check it hasn't been reset
402 | Out of credits or trial expired. | Top up account
403 | Token doesn't have access to this product. | Use the right token type (Normal vs JS)
422 | Malformed request - usually missing or unencoded URL. | URL-encode the url parameter
429 | Concurrency limit reached. | Back off and retry; see Rate Limits
500 | Crawlbase internal error. Rare and transient. | Retry with backoff; check status page
503 | Service temporarily unavailable. | Retry with backoff
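
The Action column splits into two groups: codes worth retrying unchanged and codes that need a fix on your side first. A minimal sketch of that split, with exponential backoff and jitter for the retryable group. The names RETRYABLE, FIX_BEFORE_RETRY, should_retry, and backoff_delays are illustrative and not part of any Crawlbase library, and the delay values are assumptions to tune against your own limits.

```python
import random

# HTTP statuses from the table whose suggested action is "retry with backoff".
RETRYABLE = {429, 500, 503}

# Statuses that need a fix on the caller's side; retrying unchanged will not help.
FIX_BEFORE_RETRY = {
    401: "verify the token",
    402: "top up the account",
    403: "use the right token type (Normal vs JS)",
    422: "URL-encode the url parameter",
}


def should_retry(http_status: int) -> bool:
    """Return True when the table suggests retrying the same request."""
    return http_status in RETRYABLE


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Yield one randomized delay per attempt (exponential backoff, full jitter).

    base and cap are assumed values in seconds, not Crawlbase recommendations.
    """
    for attempt in range(attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for each yielded delay between attempts and stop as soon as should_retry returns False.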

pc_status codes

What happened during the actual crawl. Returned as the pc_status response header on every response whose HTTP status is 200.

Success

Code | Meaning
200 | Page crawled successfully. Body is the target page's HTML or JSON.
201 | Async request accepted. Result will be delivered to your webhook or stored under the rid.

Target site responded with an error

Crawlbase reached the site, but the site itself returned a non-2xx. The body contains whatever the site sent back.

Code | Meaning
404 | Target page does not exist.
410 | Target page has been permanently removed.
451 | Page blocked for legal reasons in the target geography.

Blocked or filtered

Code | Meaning | What to try
520 | Target site returned an empty or invalid response. | Retry; switch to JS token if not already
521 | Target site refused the connection. | Check URL is correct; site may be down
522 | Crawlbase couldn't reach the target site (timeout). | Retry; consider page_wait tuning
523 | Target site sent a TLS handshake error. | Site may have certificate issues; report to support
525 | Bot challenge couldn't be solved automatically. | Switch to JS token; some sites may need custom handling
599 | Generic upstream failure. | Retry with backoff
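
The three pc_status tables above can be collapsed into one lookup that tells calling code what to do next. This is a rough sketch only: the function name classify_pc_status and the outcome labels it returns are hypothetical, chosen for illustration, and each code is mapped to the first suggestion listed for it.

```python
def classify_pc_status(pc_status: int) -> str:
    """Map a pc_status code to a coarse next step (labels are illustrative)."""
    if pc_status in (200, 201):
        return "ok"                  # crawled, or async request accepted
    if pc_status in (404, 410, 451):
        return "site_error"          # the target site itself returned an error
    if pc_status in (520, 522, 599):
        return "retry"               # first suggestion in the table is to retry
    if pc_status == 521:
        return "check_url"           # connection refused; URL may be wrong or site down
    if pc_status == 523:
        return "report_to_support"   # TLS handshake error on the target site
    if pc_status == 525:
        return "switch_to_js_token"  # bot challenge not solved automatically
    return "unknown"                 # code not listed in the tables above


for code in (200, 404, 522, 525):
    print(code, classify_pc_status(code))
```

Returning a label rather than raising keeps the retry policy in one place, so a worker can decide per label whether to requeue, change token, or drop the URL.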

Reading the response

curl -i 'https://api.crawlbase.com/?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com'

# HTTP/1.1 200 OK
# pc_status: 200
# original_status: 200
# url: https://example.com/
# content-type: text/html
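
The curl call above passes the target as a percent-encoded url parameter, and an unencoded URL is the usual cause of a 422. A small sketch of building that query string with the Python standard library; build_request_url is a hypothetical helper name, and the endpoint and parameter names are taken from the curl example.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.crawlbase.com/"


def build_request_url(token: str, target_url: str) -> str:
    """Return the API URL with the target percent-encoded in the url parameter."""
    # urlencode percent-encodes each value, so ':' and '/' in the target URL
    # become %3A and %2F, and any '&' or '=' in it cannot leak into the outer query.
    query = urlencode({"token": token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


print(build_request_url("YOUR_TOKEN", "https://example.com"))
# https://api.crawlbase.com/?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com
```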

In Python, with the crawlbase library:

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})
res = api.get('https://example.com')

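# Layer 1: did Crawlbase accept the request?
if res['status_code'] != 200:
    raise RuntimeError(f"Crawlbase: {res['status_code']}")

# pc_status and original_status are response headers. This assumes the client
# exposes them under res['headers'] as strings, so convert before comparing.
pc_status = int(res['headers']['pc_status'])
original_status = int(res['headers']['original_status'])

# Layer 2: did Crawlbase succeed in fetching the page?
if pc_status != 200:
    raise RuntimeError(f"Crawl failed: {pc_status}")

# Layer 3: did the target site return content?
if original_status != 200:
    print(f"Site returned {original_status}")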

print(res['body'])

Next steps

Patterns for retries, dead-letter queues, and observability.
Specific guidance on 429s and concurrency.