PHP
Official PHP client for the Crawlbase platform. PSR-compatible, Composer-installable, works with PHP 7.4+ — same package, every API, sensible defaults.
How the SDK is shaped
The PHP SDK is a thin wrapper around the same HTTP API documented in API Reference. Every Crawling API parameter you'd append as a query string in a raw HTTP call is reachable from the SDK as a key in the options array — names, defaults, and behavior all map one-to-one. There is no parameter the SDK adds; there is no parameter it hides.
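Concretely, a raw call like https://api.crawlbase.com/?token=TOKEN&url=URL&country=DE&page_wait=1000 maps to the options array below (the values are illustrative; the keys are the same documented query-parameter names):

$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_JS_TOKEN']);
// Each key mirrors the query-string parameter of the raw HTTP call:
$res = $api->get('https://example.com', [
    'country' => 'DE',    // &country=DE
    'page_wait' => 1000,  // &page_wait=1000 (JavaScript token required)
]);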
What you get for using it instead of cURL or Guzzle directly:
- URL encoding, parameter validation, and response parsing handled out of the box.
- PSR-4 autoloading — drop into any modern PHP framework (Laravel, Symfony, Slim) without ceremony.
- A single client class per Crawlbase API, all sharing the same constructor / call shape.
- Sensible defaults (90-second timeout, automatic JSON parsing of format=json responses, UTF-8-encoded bodies).
Source on github.com/crawlbase/crawlbase-php. Issues + PRs welcome.
Install
Latest version on Packagist. Requires PHP 7.4+; tested through PHP 8.3.
composer require crawlbase/crawlbase
# Or add to composer.json directly:
# "crawlbase/crawlbase": "^1.0"Authentication
Every Crawlbase API authenticates with the same token model. Two token types live on a single account:
- Normal Token (TCP) — for static HTML, JSON endpoints, anything that doesn't need a browser. Faster + cheaper.
- JavaScript Token — for SPAs, lazy-loaded feeds, anything that hides content behind client-side rendering. Required to use page_wait, ajax_wait, scroll, and css_click_selector.
Use environment variables (or your framework's config — Laravel config(), Symfony parameters) in production. The SDK doesn't read env vars itself — that's deliberate so you stay in control of where credentials come from. Pattern:
<?php
require 'vendor/autoload.php';
use Crawlbase\CrawlingAPI;
// Pick the right token at instantiation; the SDK doesn't switch
// tokens per-call, so keep two clients if you alternate.
$api = new CrawlingAPI(['token' => getenv('CRAWLBASE_TOKEN')]);
$js = new CrawlingAPI(['token' => getenv('CRAWLBASE_JS_TOKEN')]);
$api->get('https://github.com/anthropic');
$js->get('https://feed.example.com', ['page_wait' => 2000]);

Full token model + dashboard locations on the Authentication page.
Quickstart
Three lines from autoload to crawled HTML:
<?php
require 'vendor/autoload.php';
$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_TOKEN']);
$res = $api->get('https://github.com/anthropic');
if ($res->statusCode === 200) {
    echo $res->body;
}

Branch on ->statusCode (the HTTP status of the request to Crawlbase itself) and ->headers->pc_status (the Crawlbase verdict — see Errors below) when deciding whether to retry. Pass ['format' => 'json'] to receive a JSON envelope instead of raw page content.
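A minimal sketch of that JSON path, assuming the documented envelope fields (original_status, pc_status, body):

$res = $api->get('https://github.com/anthropic', ['format' => 'json']);
$envelope = json_decode($res->body, true);
// The verdicts that otherwise travel as headers sit inside the envelope:
if ((int) $envelope['pc_status'] === 200) {
    echo $envelope['body']; // the crawled page itself
}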
All APIs in one package
Every Crawlbase API has a matching client class. Same constructor, same get / post verbs.
<?php
use Crawlbase\{CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI, StorageAPI};
$token = ['token' => 'YOUR_TOKEN'];
$crawl = new CrawlingAPI($token); // general-purpose page fetch
$scraper = new ScraperAPI($token); // parsed JSON for supported sites
$leads = new LeadsAPI($token); // domain-scoped email extraction (legacy)
$shots = new ScreenshotsAPI($token); // screenshots of any URL
$storage = new StorageAPI($token); // Cloud Storage CRUD
// Push high-volume async jobs to the Enterprise Crawler via the Crawling API:
// $api->get($url, ['async' => true, 'callback' => '...', 'crawler' => 'YourCrawler']).
// See /docs/crawler for the queue workflow.

Common patterns
JavaScript rendering
For SPAs, lazy-loaded feeds, and pages where the initial HTML is empty, instantiate with the JavaScript token and pass any combination of page_wait, ajax_wait, scroll, and css_click_selector. A useful order of operations: a fixed wait first, then network-idle, then a scroll for lazy-loaded content, then a click for any gating UI element.
$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_JS_TOKEN']);
$res = $api->get('https://spa.example.com', [
    'page_wait' => 2000,
    'ajax_wait' => true,
    'scroll' => true,
]);
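When content hides behind a gating element (a cookie wall, a "Load more" button), add css_click_selector. The selector below is hypothetical:

$res = $api->get('https://spa.example.com/feed', [
    'page_wait' => 1000,
    'css_click_selector' => 'button.load-more', // hypothetical selector for the gating button
]);

Use a built-in scraper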
Skip the parser entirely on supported sites. Pass 'scraper' => 'NAME' and the response body becomes a JSON string with the structured fields documented on the per-scraper page.
<?php
use Crawlbase\ScraperAPI;
$api = new ScraperAPI(['token' => 'YOUR_TOKEN']);
$res = $api->get('https://www.amazon.com/dp/B08N5WRWNW',
    ['scraper' => 'amazon-product-details']);
$data = json_decode($res->body, true);
echo $data['name'] . ' - ' . $data['price'];

Geo-routing
Pass 'country' => 'ISO' to route the crawl through that country's exit nodes. Use it any time the target serves localized content based on IP.
$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_TOKEN']);
// Fetch the listing as Amazon serves it to a German residential IP
$res = $api->get('https://www.amazon.com/dp/B08N5WRWNW', ['country' => 'DE']);

Retry with backoff
The recommended retry shape: exponential backoff capped at 3-5 attempts, retry on transient errors only (5xx or empty body), don't retry on 4xx.
<?php
use Crawlbase\CrawlingAPI;
function crawl(CrawlingAPI $api, string $url, int $attempts = 5) {
    for ($i = 0; $i < $attempts; $i++) {
        $res = $api->get($url);
        if ($res->statusCode === 200 && (int) $res->headers->pc_status === 200) {
            return $res;
        }
        if ($res->statusCode >= 400 && $res->statusCode < 500) {
            // Client errors won't heal on retry.
            throw new RuntimeException("client error {$res->statusCode}: $url");
        }
        // Full jitter: sleep a random interval between 0 and 2^i seconds.
        usleep((int) (mt_rand() / mt_getrandmax() * pow(2, $i) * 1_000_000));
    }
    throw new RuntimeException("Failed: $url");
}

Async crawls + webhooks
Fire-and-forget mode. The SDK call returns immediately with an rid; Crawlbase POSTs the result to your callback URL when the page is ready. Useful for batch jobs and slow targets.
$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_TOKEN']);
$res = $api->get('https://example.com', [
    'async' => true,
    'callback' => 'https://your-app.com/webhook',
]);
// With async, the immediate response body is a small JSON document carrying the request id:
$rid = json_decode($res->body, true)['rid']; // correlate the eventual webhook delivery
// Your Laravel / Symfony / Slim webhook receives a POST with:
// { rid, url, original_status, pc_status, body }

For very high volumes (millions of URLs), use the Enterprise Crawler, which sits in front of this same async pipeline.
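On the receiving side, a minimal plain-PHP sketch of that webhook endpoint, assuming the payload shape above; store_result() stands in for your own persistence hook:

<?php
// webhook.php: registered as the 'callback' URL
$payload = json_decode(file_get_contents('php://input'), true);
if (!isset($payload['rid'])) {
    http_response_code(400); // not a delivery we can correlate
    exit;
}
store_result($payload['rid'], $payload['body']); // persist, keyed by rid
http_response_code(200); // ack fast; do heavy processing out of band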
Sticky sessions
Some flows need the same residential IP across multiple calls. Pass cookies_session with a stable identifier and Crawlbase reuses the same exit node for ~30 minutes.
$api = new \Crawlbase\CrawlingAPI(['token' => 'YOUR_JS_TOKEN']);
$session = "checkout-{$userId}";
$api->get('https://shop.example.com/cart', ['cookies_session' => $session]);
$api->get('https://shop.example.com/checkout', ['cookies_session' => $session]);
$api->get('https://shop.example.com/confirm', ['cookies_session' => $session]);

Errors & retries
The platform surfaces two status codes on every response: the SDK's own ->statusCode (HTTP status of the request to Crawlbase itself) and ->headers->pc_status (Crawlbase's verdict on the target — see the Crawling API errors table for the full list). Always branch on ->headers->pc_status when deciding whether to retry — a target can return 200 with empty body, in which case ->statusCode is 200 but ->headers->pc_status is 520.
$res = $api->get($url);
$pc = (int) $res->headers->pc_status;
// use_body(), retry_with_js_token(), schedule_retry(), and $logger are your own application hooks.
switch (true) {
    case $pc === 200:
        use_body($res->body);
        break;
    case in_array($pc, [520, 525], true):
        // 520 = empty body, 525 = anti-bot couldn't be solved.
        // Switch to JS token and retry.
        retry_with_js_token($url);
        break;
    case in_array($pc, [521, 522, 523], true):
        // Target unreachable or timed out. Retry with backoff.
        schedule_retry($url);
        break;
    default:
        $logger->error('crawl failed', ['url' => $url, 'pc_status' => $pc]);
}

All retries against the platform are free — only successful responses (pc_status: 200) count against your quota.
Performance & best practices
- Reuse a single client per token. Build it once at app boot (Laravel service provider, Symfony service container) and inject everywhere — each instance opens its own connection. See the provider sketch after this list.
- Use the cheapest token that works. Don't default to the JavaScript token "just in case" — Normal-token requests are faster and use less concurrency. Promote to JS only when the Normal response is empty or anti-bot-blocked.
- Prefer ajax_wait over page_wait. Fixed delays burn concurrency on every request, even fast ones.
- For batch jobs: async + webhook, or push to the Enterprise Crawler. Queue workers calling the SDK synchronously will saturate your concurrency cap; async + webhook releases the slot the moment a request is queued.
- Watch the remaining response header. It carries the number of concurrency slots you have left.
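A minimal sketch of the boot-once pattern as a Laravel service provider. The config keys (services.crawlbase.token, services.crawlbase.js_token) are illustrative; wire them to the env vars from the Authentication section:

<?php
namespace App\Providers;

use Crawlbase\CrawlingAPI;
use Illuminate\Support\ServiceProvider;

class CrawlbaseServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // One client per token, built once and injected everywhere.
        $this->app->singleton('crawlbase.normal', fn () =>
            new CrawlingAPI(['token' => config('services.crawlbase.token')]));
        $this->app->singleton('crawlbase.js', fn () =>
            new CrawlingAPI(['token' => config('services.crawlbase.js_token')]));
    }
}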
Method reference
All client classes share the same surface. Constructor takes an options array; verbs mirror the underlying HTTP methods.
Constructor options:
- 'token' — required; your Normal or JavaScript token.
- 'timeout' — in seconds (default 90).

Verb arguments:
- $options maps any Crawling API parameter to its value.
- $data is the body — pass an array for form-encoded, a string for raw.

Response shape — public properties on the response object returned from each verb:
- ->statusCode — HTTP status of the request to Crawlbase itself.
- ->body — the returned content (a JSON string when format=json / scraper= was used).
- ->headers->pc_status — Crawlbase verdict on the target (branch on this for retry decisions).
- ->headers->original_status — HTTP status the target site returned to Crawlbase.
- ->headers->storage_url / ->headers->rid — set when the call carried 'store' => true.
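A short sketch of the post verb under the shape above; the target URL is illustrative, and the exact signature is worth checking against the repo if in doubt:

<?php
use Crawlbase\CrawlingAPI;

$api = new CrawlingAPI(['token' => 'YOUR_TOKEN', 'timeout' => 120]);

// Array $data goes form-encoded; a string would be sent raw.
// httpbin.org/post is an illustrative target that echoes what it receives.
$res = $api->post('https://httpbin.org/post', ['query' => 'crawlbase']);
if ($res->statusCode === 200 && (int) $res->headers->pc_status === 200) {
    echo $res->body;
}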

