How the SDK is shaped

The .NET SDK is a thin wrapper around the same HTTP API documented in API Reference. Every Crawling API parameter you'd append as a query string in a raw HTTP call is reachable as a Dictionary<string, object> option — names, defaults, and behavior all map one-to-one.
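
The mapping is literal. A raw query-string call and its SDK equivalent side by side (URL and parameter values here are illustrative):

// Raw HTTP:
//   https://api.crawlbase.com/?token=TOKEN&country=DE&format=json&url=https%3A%2F%2Fexample.com
var api = new Crawlbase.API("TOKEN");
await api.GetAsync("https://example.com", new Dictionary<string, object> {
    ["country"] = "DE",
    ["format"] = "json",
});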

One quirk worth knowing up front: the .NET SDK exposes the response state on the API instance itself, not on a returned value object. Calls like api.Get(url) return void; you read the result via api.StatusCode, api.Body, and so on. This is different from the Python / Node / Ruby / PHP SDKs (which return a response object). The Storage API is the exception — its methods return a response object you read directly.

What you get for using it instead of HttpClient directly:

  • URL encoding, parameter validation, and response parsing handled out of the box.
  • Sync + async pair on every verb — pick whichever fits your call site.
  • A single client class per Crawlbase API, all sharing the same constructor / call shape.
  • Sensible defaults (90-second timeout, automatic JSON parsing of format=json responses).

Source on github.com/crawlbase/crawlbase-net.

Install

Latest version on NuGet. Targets .NET 6+; tested through .NET 9.

# .NET CLI
dotnet add package CrawlbaseAPI

# Package Manager Console
Install-Package CrawlbaseAPI

# Or in csproj:
# <PackageReference Include="CrawlbaseAPI" Version="1.1.0" />

Authentication

Every Crawlbase API authenticates with the same token model. Two token types live on a single account:

  • Normal Token (TCP) — for static HTML, JSON endpoints, anything that doesn't need a browser. Faster + cheaper.
  • JavaScript Token — for SPAs, lazy-loaded feeds, anything that hides content behind client-side rendering. Required to use page_wait, ajax_wait, scroll, and css_click_selector.

Use environment variables or your DI container's configuration in production. Pattern:

// Pick the right token at instantiation; the SDK doesn't switch
// tokens per-call, so keep two clients if you alternate.
var api = new Crawlbase.API(Environment.GetEnvironmentVariable("CRAWLBASE_TOKEN"));
var js = new Crawlbase.API(Environment.GetEnvironmentVariable("CRAWLBASE_JS_TOKEN"));

await api.GetAsync("https://github.com/anthropic");

var opts = new Dictionary<string, object> { ["page_wait"] = 2000 };
await js.GetAsync("https://feed.example.com", opts);

Full token model + dashboard locations on the Authentication page.

Quickstart

A few lines take you from constructor to crawled response. Note that response state lives on the api instance:

var api = new Crawlbase.API("YOUR_TOKEN");
await api.GetAsync("https://github.com/anthropic");

if (api.StatusCode == 200) {
    Console.WriteLine(api.Body);
}

Branch on api.StatusCode (the HTTP status of the SDK's request to Crawlbase) and api.CrawlbaseStatus (the Crawlbase verdict on the target — see Errors below) when deciding whether to retry. Pass new Dictionary<string, object> { ["format"] = "json" } to receive a JSON envelope instead of raw page content.
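
A sketch of that format=json path, assuming the envelope carries the same url / original_status / pc_status / body fields as the webhook payload documented further down:

using System.Text.Json;

var api = new Crawlbase.API("YOUR_TOKEN");
await api.GetAsync("https://example.com", new Dictionary<string, object> {
    ["format"] = "json",
});

// api.Body is now a JSON envelope instead of raw HTML.
var envelope = JsonSerializer.Deserialize<JsonElement>(api.Body);
Console.WriteLine(envelope.GetProperty("url").GetString());
Console.WriteLine(envelope.GetProperty("body").GetString());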

All APIs in one package

Each Crawlbase product has a matching client class. Same constructor (single token string), same Get / GetAsync / Post / PostAsync shape.

string token = "YOUR_TOKEN";

var crawl = new Crawlbase.API(token); // Crawling API
var scraper = new Crawlbase.ScraperAPI(token); // parsed JSON for supported sites
var leads = new Crawlbase.LeadsAPI(token); // domain-scoped email extraction (legacy)
var shots = new Crawlbase.ScreenshotsAPI(token); // body is base64-encoded image
var storage = new Crawlbase.StorageAPI(token); // Cloud Storage CRUD

// Push high-volume async jobs to the Enterprise Crawler via the Crawling API:
// api.Get(url, options) where options carries `callback=true` + `crawler=YourCrawler`.
// See /docs/crawler for the queue-management workflow.

Common patterns

JavaScript rendering

For SPAs, lazy-loaded feeds, and pages where the initial HTML is empty, instantiate with the JavaScript token and pass any combination of page_wait, ajax_wait, scroll, and css_click_selector. A useful order to reason in: a fixed wait (page_wait), then network idle (ajax_wait), then scroll for lazy-loading, then css_click_selector for any gating UI element.

var api = new Crawlbase.API("YOUR_JS_TOKEN");

await api.GetAsync("https://spa.example.com", new Dictionary<string, object> {
 ["page_wait"] = 2000,
 ["ajax_wait"] = true,
 ["scroll"] = true,
});

Use a built-in scraper

Skip the parser entirely on supported sites. Pass ["scraper"] = "NAME" and the body becomes a JSON string with the structured fields documented on the per-scraper page.

using System.Text.Json;

var api = new Crawlbase.ScraperAPI("YOUR_TOKEN");
await api.GetAsync(
    "https://www.amazon.com/dp/B08N5WRWNW",
    new Dictionary<string, object> { ["scraper"] = "amazon-product-details" }
);

var data = JsonSerializer.Deserialize<JsonElement>(api.Body);
Console.WriteLine($"{data.GetProperty("name")} - {data.GetProperty("price")}");

Geo-routing

Pass ["country"] = "ISO" to route the crawl through that country's exit nodes. Use it any time the target serves localized content based on IP.

var api = new Crawlbase.API("YOUR_TOKEN");

// Route the request through a German exit node to get the DE-localized page
await api.GetAsync(
    "https://www.amazon.com/dp/B08N5WRWNW",
    new Dictionary<string, object> { ["country"] = "DE" }
);

Retry with backoff

The recommended retry shape: exponential backoff capped at 3-5 attempts, retry on transient errors only (5xx or empty body), don't retry on 4xx.

public async Task<bool> CrawlAsync(Crawlbase.API api, string url, int attempts = 5) {
 var rand = new Random();
 for (int i = 0; i < attempts; i++) {
 try {
 await api.GetAsync(url);
 } catch (Exception) {
 // SDK throws on transport failures — fall through to retry
 }
 if (api.StatusCode == 200 && api.CrawlbaseStatus == 200) {
 return true;
 }
 if (api.StatusCode is >= 400 and < 500) {
 throw new InvalidOperationException($"client error {api.StatusCode}: {url}");
 }
 // Exponential backoff with jitter
 var ms = (int) (rand.NextDouble() * Math.Pow(2, i) * 1000);
 await Task.Delay(ms);
 }
 return false;
}

Async crawls + webhooks

Fire-and-forget mode. Pass ["async"] = true with a ["callback"] URL; the call returns immediately and Crawlbase POSTs the result to your webhook when the page is ready. Useful for batch jobs and slow targets.

var api = new Crawlbase.API("YOUR_TOKEN");

await api.GetAsync("https://example.com", new Dictionary<string, object> {
 ["async"] = true,
 ["callback"] = "https://your-app.com/webhook",
});

// api.Body is a JSON envelope { rid: ... } — use that to correlate
// the eventual webhook delivery.
//
// Your ASP.NET / Minimal API endpoint receives a POST with:
// { rid, url, original_status, pc_status, body }
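
A minimal receiving endpoint as an ASP.NET Minimal API sketch — the /webhook route and the immediate 200 ack are this example's choices, assuming the JSON payload shape shown above:

// Program.cs
using System.Text.Json;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/webhook", async (HttpRequest request) => {
    // Parse the documented envelope: { rid, url, original_status, pc_status, body }
    using var doc = await JsonDocument.ParseAsync(request.Body);
    var rid = doc.RootElement.GetProperty("rid").GetString();
    // Persist or enqueue by rid, then ack fast so the delivery isn't retried.
    return Results.Ok();
});

app.Run();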

For very high volumes (millions of URLs), use the Enterprise Crawler which sits in front of this same async pipeline.

Sticky sessions

Some flows need the same residential IP across multiple calls. Pass cookies_session with a stable identifier and Crawlbase reuses the same exit node for ~30 minutes.

var api = new Crawlbase.API("YOUR_JS_TOKEN");

var session = $"checkout-{userId}";
var opts = new Dictionary<string, object> { ["cookies_session"] = session };

await api.GetAsync("https://shop.example.com/cart", opts);
await api.GetAsync("https://shop.example.com/checkout", opts);
await api.GetAsync("https://shop.example.com/confirm", opts);

Cloud Storage CRUD

The Storage API is the exception to the "response on api instance" pattern — its methods return a response object you read directly. Useful when reading back results stored from a previous Crawling API call (store=true).

var storage = new Crawlbase.StorageAPI("YOUR_TOKEN");

// Fetch by URL
var response = storage.GetByUrl("https://www.apple.com");
Console.WriteLine(response.OriginalStatus);
Console.WriteLine(response.CrawlbaseStatus);
Console.WriteLine(response.URL);
Console.WriteLine(response.RID);
Console.WriteLine(response.StoredAt);

// Or fetch by RID, delete, bulk-fetch, list RIDs, total count
var item = storage.GetByRID(rid);
bool deleted = storage.Delete(rid);
var items = storage.Bulk(new List<string> { rid1, rid2 });
var rids = storage.RIDs(100); // optional limit
var total = storage.TotalCount();

Errors & retries

The platform surfaces two status codes on every response: the SDK's own api.StatusCode (HTTP status of the request to Crawlbase itself) and api.CrawlbaseStatus (Crawlbase's verdict on the target — see the Crawling API errors table for the full list). Always branch on api.CrawlbaseStatus when deciding whether to retry — a target can return 200 with empty body, in which case StatusCode is 200 but CrawlbaseStatus is 520.

try {
    await api.GetAsync(url);
} catch (Exception ex) {
    log.LogError(ex, "transport error");
    return;
}

int pc = api.CrawlbaseStatus;

switch (pc) {
    case 200:
        UseBody(api.Body);
        break;
    case 520 or 525:
        // 520 = empty body, 525 = anti-bot couldn't be solved.
        // Switch to JS token and retry.
        await RetryWithJsTokenAsync(url);
        break;
    case 521 or 522 or 523:
        // Target unreachable or timed out. Retry with backoff.
        ScheduleRetry(url);
        break;
    default:
        log.LogError("crawl failed url={Url} crawlbase_status={CrawlbaseStatus}", url, pc);
        break;
}

All retries against the platform are free — only successful responses (CrawlbaseStatus: 200) count against your quota.

Performance & best practices

  • Reuse a single client per token. Register it as a singleton in your DI container (see the sketch after this list) — each instance opens its own underlying HttpClient, so don't construct one per request.
  • Use the cheapest token that works. Don't default to the JavaScript token "just in case" — Normal-token requests are faster and use less concurrency.
  • Prefer ajax_wait over page_wait. Fixed delays burn concurrency on every request, even fast ones.
  • Mind shared state on the API instance. Because the Crawling/Scraper/Leads/Screenshots clients write response state onto the api object (not a return value), don't share one instance across concurrent Tasks — a second GetAsync() will overwrite the first task's response state mid-read. Pool one instance per worker, or use the StorageAPI's return-object methods, which are safe to interleave.
  • For batch jobs: async + webhook, or push to the Enterprise Crawler. Tasks that await synchronous-style crawls hold a concurrency slot for the entire fetch; async + webhook releases the slot the moment a request is queued.
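
A registration sketch for the per-token singleton guidance above — keyed services need .NET 8+, and the environment-variable names are this example's choice, not the SDK's:

using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// One long-lived client per token; give each concurrent worker its own
// resolved instance if you crawl in parallel (see the shared-state note).
services.AddKeyedSingleton("crawlbase",
    new Crawlbase.API(Environment.GetEnvironmentVariable("CRAWLBASE_TOKEN")));
services.AddKeyedSingleton("crawlbase-js",
    new Crawlbase.API(Environment.GetEnvironmentVariable("CRAWLBASE_JS_TOKEN")));

var provider = services.BuildServiceProvider();
var api = provider.GetRequiredKeyedService<Crawlbase.API>("crawlbase");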

Method reference

All non-Storage client classes share the same surface. Constructors take a token string; verbs come in sync + async pairs and write response state onto the api instance.

new Crawlbase.API(string token)
constructor
Initialize a Crawling API client. Same shape for Crawlbase.ScraperAPI, Crawlbase.LeadsAPI, Crawlbase.ScreenshotsAPI, Crawlbase.StorageAPI.
api.Get(string url, Dictionary<string, object> options = null)
method
Send a GET (synchronous). Returns void; read the response via properties on api.
api.GetAsync(string url, Dictionary<string, object> options = null)
method
Send a GET (async). Returns Task; same response model.
api.Post(...) / api.PostAsync(...)
method
Send a POST. The data argument is the request body — pass a Dictionary<string, object> for form-encoded fields, or a string for a raw body.
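
A quick POST sketch following the entry above — the endpoint is a placeholder, and the exact argument order may differ by SDK version, so treat this as illustrative:

var api = new Crawlbase.API("YOUR_TOKEN");

// Dictionary data is sent form-encoded; pass a string for a raw body instead.
await api.PostAsync("https://httpbin.org/post", new Dictionary<string, object> {
    ["q"] = "crawlbase",
    ["page"] = 2,
});

Console.WriteLine(api.StatusCode);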

Response state — properties on the api instance after a call:

api.StatusCode
int
HTTP status of the SDK's request to Crawlbase.
api.CrawlbaseStatus
int
Crawlbase verdict on the target. Branch on this for retry decisions.
api.OriginalStatus
int
HTTP status the target returned to Crawlbase.
api.Body
string
Page content (or JSON string when format=json / scraper= was used). For ScreenshotsAPI, this is base64-encoded — convert with Convert.FromBase64String(api.Body).
api.StorageURL / api.StorageRID
string
Set when the call carried store=true. Use these to fetch the stored response back via StorageAPI.
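
Decoding a ScreenshotsAPI body to disk, for completeness — a sketch assuming the JavaScript token (which screenshot rendering typically requires) and an arbitrary output path:

var shots = new Crawlbase.ScreenshotsAPI("YOUR_JS_TOKEN");
await shots.GetAsync("https://example.com");

if (shots.StatusCode == 200) {
    File.WriteAllBytes("example.png", Convert.FromBase64String(shots.Body));
}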