How the SDK is shaped

The Go SDK is intentionally lean. One client — CrawlingAPI — covers every Crawlbase product through the unified Crawling API endpoint:

Use case                      Pass in options
Plain crawl                   nothing (the default)
Built-in scraper              "scraper": "amazon-product-details" (and the rest of the catalog)
Screenshot                    "screenshot": "true"
Email extraction              "scraper": "email-extractor"
Async + webhook               "async": "true" + "callback": "https://..."
Push to Enterprise Crawler    "async": "true" + "callback" + "crawler": "YourCrawler"

The standalone /scraper, /leads, and /screenshots endpoints (which the older Crawlbase SDKs wrap with separate client classes) have been closed to new sign-ups since 2024. The Go SDK ships only the modern path — one client, every product, no vestigial classes.

What you get for using it instead of net/http directly:

  • URL encoding, parameter validation, and response parsing handled out of the box.
  • Idiomatic Go surface — (result, error) returns, named struct fields, no panics on transport failures.
  • context.Context support on every verb via *WithContext variants for cancellation / deadlines / trace propagation.
  • Sensible defaults (90-second timeout, transparent gzip decompression, automatic JSON parsing of format=json / scraper= responses).

Source on github.com/crawlbase/crawlbase-go. Reference on pkg.go.dev. Issues + PRs welcome.

Install

Latest version on pkg.go.dev. Requires Go 1.21+.

go get github.com/crawlbase/crawlbase-go@latest

# Or pin a specific version
go get github.com/crawlbase/crawlbase-go@vX.Y.Z

Authentication

Every Crawlbase API authenticates with the same token model. Two token types live on a single account:

  • Normal Token (TCP) — for static HTML, JSON endpoints, anything that doesn't need a browser. Faster + cheaper.
  • JavaScript Token — for SPAs, lazy-loaded feeds, anything that hides content behind client-side rendering. Required to use page_wait, ajax_wait, scroll, and css_click_selector.

Use environment variables in production. The SDK doesn't read env vars itself — that's deliberate so you stay in control of where credentials come from. Pattern:

package main

import (
    "log"
    "os"

    "github.com/crawlbase/crawlbase-go"
)

func main() {
    // Pick the right token at instantiation; the SDK doesn't switch
    // tokens per-call, so keep two clients if you alternate.
    api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
    if err != nil {
        log.Fatal(err)
    }
    js, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))
    if err != nil {
        log.Fatal(err)
    }

    api.Get("https://github.com/anthropic", nil)
    js.Get("https://feed.example.com", map[string]string{"page_wait": "2000"})
}

The constructor returns crawlbase.ErrTokenRequired if the token string is empty. Full token model + dashboard locations on the Authentication page.
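To fail fast on a misconfigured environment, compare against that sentinel. A sketch, assuming ErrTokenRequired is a plain sentinel error so errors.Is applies:

api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if errors.Is(err, crawlbase.ErrTokenRequired) {
    log.Fatal("CRAWLBASE_TOKEN is unset or empty") // misconfigured environment
} else if err != nil {
    log.Fatal(err)
}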

Quickstart

From import to crawled HTML in three calls:

package main

import (
    "fmt"
    "log"

    "github.com/crawlbase/crawlbase-go"
)

func main() {
    api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
    if err != nil {
        log.Fatal(err)
    }
    res, err := api.Get("https://github.com/anthropic", nil)
    if err != nil {
        log.Fatal(err)
    }
    if res.StatusCode == 200 {
        fmt.Println(res.Body)
    }
}

Branch on res.StatusCode (the HTTP status of the SDK's request to Crawlbase) and res.PCStatus (the Crawlbase verdict on the target — see Errors below) when deciding whether to retry. Pass map[string]string{"format": "json"} to receive a JSON envelope instead of raw page content (auto-parsed into res.JSON).
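A sketch of reading that envelope; the "body" key follows the Crawling API's documented JSON envelope shape:

res, err := api.Get("https://github.com/anthropic", map[string]string{"format": "json"})
if err != nil {
    log.Fatal(err)
}
// The envelope carries status metadata plus the page itself under "body".
if body, ok := res.JSON["body"].(string); ok {
    fmt.Printf("crawled %d bytes of HTML\n", len(body))
}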

Common patterns

JavaScript rendering

For SPAs, lazy-loaded feeds, and pages where the initial HTML is empty, instantiate with the JavaScript token and pass any combination of page_wait, ajax_wait, scroll, and css_click_selector. Order to think about: a fixed wait, then network-idle, then scroll for lazy-load, then click for any gating UI element.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://spa.example.com", map[string]string{
 "page_wait": "2000",
 "ajax_wait": "true",
 "scroll": "true",
})

Use a built-in scraper

Skip the parser entirely on supported sites. Pass "scraper": "NAME" and the response Body becomes a JSON string with the structured fields documented on the per-scraper page. The body is also pre-decoded into res.JSON so you can read fields directly.

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, err := api.Get(
 "https://www.amazon.com/dp/B08N5WRWNW",
 map[string]string{"scraper": "amazon-product-details"},
)
if err != nil {
 log.Fatal(err)
}

if name, ok := res.JSON["name"].(string); ok {
 fmt.Println(name)
}

Geo-routing

Pass "country": "ISO" to route the crawl through that country's exit nodes. Use it any time the target serves localized content based on IP.

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")

// Fetch the listing as it renders for a German residential IP
res, _ := api.Get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    map[string]string{"country": "DE"},
)

Retry with backoff

The recommended retry shape: exponential backoff capped at 3-5 attempts, retry on transient errors only (5xx or empty body), don't retry on 4xx.

import (
    "fmt"
    "math"
    "math/rand"
    "time"

    "github.com/crawlbase/crawlbase-go"
)

func Crawl(api *crawlbase.CrawlingAPI, url string, attempts int) (*crawlbase.Response, error) {
    for i := 0; i < attempts; i++ {
        res, err := api.Get(url, nil)
        if err != nil {
            return nil, err
        }
        if res.StatusCode == 200 && res.PCStatus == 200 {
            return res, nil
        }
        if res.StatusCode >= 400 && res.StatusCode < 500 {
            return nil, fmt.Errorf("client error %d: %s", res.StatusCode, url)
        }
        // Exponential backoff with full jitter: a random sleep in [0, 2^i) seconds.
        d := time.Duration(rand.Float64() * math.Pow(2, float64(i)) * float64(time.Second))
        time.Sleep(d)
    }
    return nil, fmt.Errorf("failed after %d attempts: %s", attempts, url)
}

Async crawls + webhooks

Fire-and-forget mode. The SDK call returns immediately with an RID; Crawlbase POSTs the result to your callback URL when the page is ready. Useful for batch jobs and slow targets.

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, _ := api.Get("https://example.com", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
})
rid := res.RID // correlate the eventual webhook delivery

// Your net/http handler receives a POST with:
// { rid, url, original_status, pc_status, body }
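On the receiving side, a small net/http handler can decode that payload. A sketch, assuming the POST body arrives as the JSON object shown above (verify the exact content type in the webhook docs); process stands in for your own logic:

type crawlResult struct {
    RID            string `json:"rid"`
    URL            string `json:"url"`
    OriginalStatus int    `json:"original_status"`
    PCStatus       int    `json:"pc_status"`
    Body           string `json:"body"`
}

http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
    var cr crawlResult
    if err := json.NewDecoder(r.Body).Decode(&cr); err != nil {
        http.Error(w, "bad payload", http.StatusBadRequest)
        return
    }
    go process(cr) // acknowledge fast; do the real work off the request path
    w.WriteHeader(http.StatusOK)
})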

For very high volumes (millions of URLs), push to the Enterprise Crawler by adding "crawler": "YourCrawlerName" alongside the async + callback options.
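The same call shape with the crawler name added:

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, _ := api.Get("https://example.com", map[string]string{
    "async":    "true",
    "callback": "https://your-app.com/webhook",
    "crawler":  "YourCrawlerName",
})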

Sticky sessions

Some flows need the same residential IP across multiple calls. Pass cookies_session with a stable identifier and Crawlbase reuses the same exit node for ~30 minutes.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")

session := fmt.Sprintf("checkout-%d", userID)
opts := map[string]string{"cookies_session": session}

api.Get("https://shop.example.com/cart", opts)
api.Get("https://shop.example.com/checkout", opts)
api.Get("https://shop.example.com/confirm", opts)

Screenshots

Pass "screenshot": "true" to capture a full-page screenshot. The body comes back as a base64-encoded image; use crawlbase.ImageBytes(res) to decode into raw bytes for os.WriteFile / image.Decode.

api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://www.apple.com", map[string]string{
    "screenshot": "true",
})

img, err := crawlbase.ImageBytes(res)
if err != nil {
    log.Fatal(err)
}
os.WriteFile("apple.png", img, 0o644)

Context for cancellation

Every verb has a *WithContext variant for use with context.Context — useful any time the call should respect upstream cancellation, deadlines, or trace propagation (HTTP handlers, gRPC servers, anything in a request loop).

import (
    "context"
    "time"
)

api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

res, err := api.GetWithContext(ctx, "https://example.com", nil)

Errors & retries

The platform surfaces two status codes on every response: the SDK's own res.StatusCode (HTTP status of the request to Crawlbase itself) and res.PCStatus (Crawlbase's verdict on the target — lifted out of the pc_status response header for typed access; see the Crawling API errors table for the full list). Always branch on PCStatus when deciding whether to retry — a target can return 200 with empty body, in which case StatusCode is 200 but PCStatus is 520.

res, err := api.Get(url, nil)
if err != nil {
    return err
}

switch res.PCStatus {
case 200:
    use(res.Body)
case 520, 525:
    // 520 = empty body, 525 = anti-bot couldn't be solved.
    // Switch to the JS token and retry.
    retryWithJSToken(url)
case 521, 522, 523:
    // Target unreachable or timed out. Retry with backoff.
    scheduleRetry(url)
default:
    log.Printf("crawl failed: url=%s pc_status=%d", url, res.PCStatus)
}

All retries against the platform are free — only successful responses (PCStatus: 200) count against your quota.

Performance & best practices

  • Reuse a single client per token. The constructor is cheap, but each *CrawlingAPI instance has its own underlying http.Client with its own connection pool. Build it once at service init and share it across goroutines (the SDK is goroutine-safe); see the sketch after this list.
  • Use the cheapest token that works. Don't default to the JavaScript token "just in case" — Normal-token requests are faster and use less concurrency. Promote to the JavaScript token when PCStatus comes back 520 or 525.
  • Prefer ajax_wait over page_wait. Fixed delays burn concurrency on every request, even fast ones.
  • For batch jobs: async + webhook, or push to the Enterprise Crawler. Goroutine pools blocking on synchronous calls saturate concurrency caps quickly; async + webhook releases the slot the moment a request is queued.
  • Use GetWithContext / PostWithContext in server code. A request-scoped context propagates cancellation when the caller goes away — without it, a hung crawl will continue past the caller's deadline.
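A minimal shape for the first two points: both clients built once at startup, plus a cheap-token-first fetch that promotes on 520/525. The Crawler type and Fetch helper are illustrative, not part of the SDK:

import (
    "context"
    "os"

    "github.com/crawlbase/crawlbase-go"
)

// Crawler holds one shared client per token; both are goroutine-safe.
type Crawler struct {
    api *crawlbase.CrawlingAPI // Normal token: static HTML, JSON endpoints
    js  *crawlbase.CrawlingAPI // JavaScript token: rendered pages
}

func NewCrawler() (*Crawler, error) {
    api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
    if err != nil {
        return nil, err
    }
    js, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))
    if err != nil {
        return nil, err
    }
    return &Crawler{api: api, js: js}, nil
}

// Fetch tries the cheap token first and retries with the JS token when
// Crawlbase reports an empty body (520) or an unsolved anti-bot wall (525).
func (c *Crawler) Fetch(ctx context.Context, url string) (*crawlbase.Response, error) {
    res, err := c.api.GetWithContext(ctx, url, nil)
    if err != nil {
        return nil, err
    }
    if res.PCStatus == 520 || res.PCStatus == 525 {
        return c.js.GetWithContext(ctx, url, nil)
    }
    return res, nil
}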

Response fields

Full method signatures, godoc, and per-method examples live on pkg.go.dev. The fields below are the ones Crawlbase users reach for most, returned on every *crawlbase.Response:

  • StatusCode (int): HTTP status of the SDK's request to Crawlbase.
  • PCStatus (int): Crawlbase verdict on the target, lifted from the pc_status (or cb_status) response header for typed access. Branch on this for retry decisions.
  • OriginalStatus (int): HTTP status the target site returned to Crawlbase.
  • URL (string): final URL after target-side redirects.
  • Body (string): page content (or a JSON string when format=json / scraper= was used; or a base64-encoded image when screenshot=true).
  • Headers (map[string]string): lower-cased response headers.
  • RID (string): request ID, set when the call carried "async": "true" or "store": "true".
  • JSON (map[string]any): pre-parsed JSON when the response Content-Type is JSON. Saves a json.Unmarshal step on scraper / format=json calls.