Status Codes
Crawlbase returns two status signals on every response: the standard HTTP status, and a pc_status header describing what Crawlbase did. Here's what each combination means.
Two statuses, two questions
Most HTTP APIs give you a single status code. Crawlbase gives you two because crawling involves two layers: Crawlbase's infrastructure and the target site behind it.
| Signal | What it reports | How to read it |
|---|---|---|
| HTTP status | The status of your request to Crawlbase. | 200 means we processed it; 4xx/5xx means we couldn't. |
| pc_status | The status of Crawlbase's request to the target site. | 200 means we got a clean page; other codes describe what went wrong upstream. |
| original_status | The raw HTTP status the target site returned. | Useful when the site itself returns a non-200 you need to handle (404, 403, etc.). |
The mental model
Always check HTTP status first. If it's 200, then check pc_status. If that's 200, then check original_status for site-side errors.
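The check order above can be sketched as a small function that works on raw status values, for instance when you call the API with a generic HTTP client rather than the SDK. This is a minimal sketch under stated assumptions: `first_failure` and its return labels are hypothetical names, header keys are assumed to be lowercase, and header values are assumed to arrive as strings, as HTTP header values normally do.

```python
from typing import Mapping, Optional


def first_failure(http_status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the first layer that did not report 200, or None if all did.

    Hypothetical helper: the labels "http", "pc_status" and
    "original_status" are illustrative, not part of any Crawlbase API.
    """
    # Layer 1: Crawlbase's answer to your client.
    if http_status != 200:
        return "http"

    # Header values are strings, so convert before comparing.
    # Assumption: a missing header counts as a failure of that layer.
    if int(headers.get("pc_status", "0")) != 200:
        return "pc_status"

    # Layer 3: what the target site itself returned.
    if int(headers.get("original_status", "0")) != 200:
        return "original_status"

    return None
```

Asynchronous requests, which report a pc_status of 201, would need their own branch; this sketch only follows the 200-based order described above.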
HTTP status codes
What Crawlbase itself returned to your client.
| Code | Meaning | Action |
|---|---|---|
| 200 | Request processed. Check pc_status for outcome. | Continue to pc_status |
| 401 | Token missing or invalid. | Verify token; check it hasn't been reset |
| 402 | Out of credits or trial expired. | Top up account |
| 403 | Token doesn't have access to this product. | Use the right token type (Normal vs JS) |
| 422 | Malformed request - usually missing or unencoded URL. | URL-encode the url parameter |
| 429 | Concurrency limit reached. | Back off and retry; see Rate Limits |
| 500 | Crawlbase internal error. Rare and transient. | Retry with backoff; check status page |
| 503 | Service temporarily unavailable. | Retry with backoff |
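The "retry with backoff" action listed for 429, 500, and 503 could look roughly like the following. It is a sketch, not Crawlbase's recommended policy: `call_with_backoff` and the `send` callable are hypothetical, and the attempt count, base delay, and jitter are assumed values you would tune for your own workload.

```python
import random
import time
from typing import Callable

# HTTP codes the table above marks as worth retrying.
RETRYABLE_HTTP = {429, 500, 503}


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 4,        # assumed default, not a Crawlbase value
    base_delay: float = 1.0,      # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical zero-argument callable that performs one
    request and returns its HTTP status code. The last status is returned.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_HTTP:
            break
        # Exponential backoff (1x, 2x, 4x the base delay) plus up to 50% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
        status = send()
    return status
```

Codes such as 401, 402, 403, and 422 are returned immediately, since the table's suggested fix for them is a change to the request or account rather than a retry.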
pc_status codes
What happened during the actual crawl. Returned as the pc_status response header on every response whose HTTP status is 200.
Success
| Code | Meaning |
|---|---|
| 200 | Page crawled successfully. Body is the target page's HTML or JSON. |
| 201 | Async request accepted. Result will be delivered to your webhook or stored under the rid. |
Target site responded with an error
Crawlbase reached the site, but the site itself returned a non-2xx. The body contains whatever the site sent back.
| Code | Meaning |
|---|---|
| 404 | Target page does not exist. |
| 410 | Target page has been permanently removed. |
| 451 | Page blocked for legal reasons in the target geography. |
Blocked or filtered
| Code | Meaning | What to try |
|---|---|---|
| 520 | Target site returned an empty or invalid response. | Retry; switch to JS token if not already |
| 521 | Target site refused the connection. | Check URL is correct; site may be down |
| 522 | Crawlbase couldn't reach the target site (timeout). | Retry; consider page_wait tuning |
| 523 | Target site sent a TLS handshake error. | Site may have certificate issues; report to support |
| 525 | Bot challenge couldn't be solved automatically. | Switch to JS token; some sites may need custom handling |
| 599 | Generic upstream failure. | Retry with backoff |
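One way to act on these tables in code is a lookup from pc_status to a suggested next step. The sketch below transcribes only the first suggestion from each "What to try" cell. The action labels and the `next_action` helper are hypothetical, and treating 404, 410, and 451 as "do not retry" is an assumption, since the tables list no action for them.

```python
from typing import Dict

# Hypothetical action labels used only in this sketch.
OK = "ok"
RETRY = "retry"
SWITCH_TO_JS_TOKEN = "switch_to_js_token"
CHECK_URL = "check_url"
REPORT_TO_SUPPORT = "report_to_support"
DO_NOT_RETRY = "do_not_retry"   # assumption for site-side errors
UNKNOWN = "unknown"             # any code not listed in the tables

PC_STATUS_ACTIONS: Dict[int, str] = {
    # Success
    200: OK,
    201: OK,
    # Target site responded with an error
    404: DO_NOT_RETRY,
    410: DO_NOT_RETRY,
    451: DO_NOT_RETRY,
    # Blocked or filtered
    520: RETRY,
    521: CHECK_URL,
    522: RETRY,
    523: REPORT_TO_SUPPORT,
    525: SWITCH_TO_JS_TOKEN,
    599: RETRY,
}


def next_action(pc_status: int) -> str:
    """Return the suggested next step for a pc_status code."""
    return PC_STATUS_ACTIONS.get(pc_status, UNKNOWN)
```

Keeping the mapping as data rather than a chain of if-statements makes it easy to adjust when a particular site needs different handling.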
Reading the response
curl -i 'https://api.crawlbase.com/?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com'
# HTTP/1.1 200 OK
# pc_status: 200
# original_status: 200
# url: https://example.com/
# content-type: text/html

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})
res = api.get('https://example.com')

# Layer 1: did Crawlbase accept the request?
if res['status_code'] != 200:
    raise RuntimeError(f"Crawlbase: {res['status_code']}")

# Layer 2: did Crawlbase succeed in fetching the page?
if res['pc_status'] != 200:
    raise RuntimeError(f"Crawl failed: {res['pc_status']}")

# Layer 3: did the target site return content?
if res['original_status'] != 200:
    print(f"Site returned {res['original_status']}")

print(res['body'])

Next steps
Patterns for retries, dead-letter queues, and observability.
Specific guidance on 429s and concurrency.

