Go
Official Go client for the Crawlbase platform. Idiomatic Go — error returns instead of exceptions, context.Context support on every verb, zero external dependencies (only net/http + stdlib).
How the SDK is shaped
The Go SDK is intentionally lean. One client — CrawlingAPI — covers every Crawlbase product through the unified Crawling API endpoint:
| Use case | Pass in options |
|---|---|
| Plain crawl | (nothing — the default) |
| Built-in scraper | "scraper": "amazon-product-details" (and the rest of the catalog) |
| Screenshot | "screenshot": "true" |
| Email extraction | "scraper": "email-extractor" |
| Async + webhook | "async": "true" + "callback": "https://..." |
| Push to Enterprise Crawler | "async": "true" + "callback" + "crawler": "YourCrawler" |
The standalone /scraper, /leads, and /screenshots endpoints (which the older Crawlbase SDKs wrap with separate client classes) have been closed to new sign-ups since 2024. The Go SDK ships only the modern path — one client, every product, no vestigial classes.
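To make the option model concrete, a quick sketch, using only option names from the table above and the Get call shown in the Quickstart:

```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")

// Plain crawl (the default).
api.Get("https://example.com", nil)

// Built-in scraper: same call, one extra option.
api.Get("https://www.amazon.com/dp/B08N5WRWNW",
	map[string]string{"scraper": "amazon-product-details"})

// Async + webhook: queues the crawl and returns an RID immediately.
api.Get("https://example.com", map[string]string{
	"async":    "true",
	"callback": "https://your-app.com/webhook",
})
```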
What you get for using it instead of net/http directly:
- URL encoding, parameter validation, and response parsing handled out of the box.
- Idiomatic Go surface — `(result, error)` returns, named struct fields, no panics on transport failures.
- `context.Context` support on every verb via `*WithContext` variants for cancellation / deadlines / trace propagation.
- Sensible defaults (90-second timeout, transparent gzip decompression, automatic JSON parsing of `format=json` / `scraper=` responses).
Source on github.com/crawlbase/crawlbase-go. Reference on pkg.go.dev. Issues + PRs welcome.
Install
Latest version on pkg.go.dev. Requires Go 1.21+.
```bash
go get github.com/crawlbase/crawlbase-go@latest

# Or pin a specific version
go get github.com/crawlbase/crawlbase-go@vX.Y.Z
```

Authentication
Every Crawlbase API authenticates with the same token model. Two token types live on a single account:
- Normal Token (TCP) — for static HTML, JSON endpoints, anything that doesn't need a browser. Faster + cheaper.
- JavaScript Token — for SPAs, lazy-loaded feeds, anything that hides content behind client-side rendering. Required to use `page_wait`, `ajax_wait`, `scroll`, and `css_click_selector`.
Use environment variables in production. The SDK doesn't read env vars itself — that's deliberate so you stay in control of where credentials come from. Pattern:
```go
package main
import (
"log"
"os"
"github.com/crawlbase/crawlbase-go"
)
func main() {
// Pick the right token at instantiation; the SDK doesn't switch
// tokens per-call, so keep two clients if you alternate.
api, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_TOKEN"))
if err != nil {
log.Fatal(err)
}
js, err := crawlbase.NewCrawlingAPI(os.Getenv("CRAWLBASE_JS_TOKEN"))
if err != nil {
log.Fatal(err)
}
api.Get("https://github.com/anthropic", nil)
js.Get("https://feed.example.com", map[string]string{"page_wait": "2000"})
}
```

The constructor returns crawlbase.ErrTokenRequired if the token string is empty. Full token model + dashboard locations on the Authentication page.
Quickstart
Three lines from import to crawled HTML:
```go
package main
import (
"fmt"
"log"
"github.com/crawlbase/crawlbase-go"
)
func main() {
api, err := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
if err != nil {
log.Fatal(err)
}
res, err := api.Get("https://github.com/anthropic", nil)
if err != nil {
log.Fatal(err)
}
if res.StatusCode == 200 {
fmt.Println(res.Body)
}
}
```

Branch on res.StatusCode (the HTTP status of the request to Crawlbase itself) and res.PCStatus (the Crawlbase verdict on the target — see Errors below) when deciding whether to retry. Pass map[string]string{"format": "json"} to receive a JSON envelope instead of raw page content (auto-parsed into res.JSON).
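For instance, a minimal sketch; the envelope's field names here ("body") are an assumption based on the webhook payload shown under Async crawls below:

```go
res, err := api.Get("https://github.com/anthropic", map[string]string{"format": "json"})
if err != nil {
	log.Fatal(err)
}
// res.JSON holds the auto-parsed envelope. The "body" key is an
// assumption mirroring the async payload documented below.
if body, ok := res.JSON["body"].(string); ok {
	fmt.Println(len(body))
}
```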
Common patterns
JavaScript rendering
For SPAs, lazy-loaded feeds, and pages where the initial HTML is empty, instantiate with the JavaScript token and pass any combination of page_wait, ajax_wait, scroll, and css_click_selector. Order to think about: a fixed wait, then network-idle, then scroll for lazy-load, then click for any gating UI element.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, err := api.Get("https://spa.example.com", map[string]string{
"page_wait": "2000",
"ajax_wait": "true",
"scroll": "true",
})
```

Use a built-in scraper
Skip the parser entirely on supported sites. Pass "scraper": "NAME" and the response Body becomes a JSON string with the structured fields documented on the per-scraper page. The body is also pre-decoded into res.JSON so you can read fields directly.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, err := api.Get(
"https://www.amazon.com/dp/B08N5WRWNW",
map[string]string{"scraper": "amazon-product-details"},
)
if err != nil {
log.Fatal(err)
}
if name, ok := res.JSON["name"].(string); ok {
fmt.Println(name)
}
```

Geo-routing
Pass "country": "ISO" to route the crawl through that country's exit nodes. Use it any time the target serves localized content based on IP.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
// Hit the German Amazon catalog from a German residential IP
res, _ := api.Get(
"https://www.amazon.com/dp/B08N5WRWNW",
map[string]string{"country": "DE"},
)
```

Retry with backoff
The recommended retry shape: exponential backoff capped at 3-5 attempts, retry on transient errors only (5xx or empty body), don't retry on 4xx.
```go
import (
"fmt"
"math"
"math/rand"
"time"
"github.com/crawlbase/crawlbase-go"
)
func Crawl(api *crawlbase.CrawlingAPI, url string, attempts int) (*crawlbase.Response, error) {
for i := 0; i < attempts; i++ {
res, err := api.Get(url, nil)
if err != nil {
return nil, err
}
if res.StatusCode == 200 && res.PCStatus == 200 {
return res, nil
}
if res.StatusCode >= 400 && res.StatusCode < 500 {
return nil, fmt.Errorf("client error %d: %s", res.StatusCode, url)
}
// Exponential backoff with jitter
d := time.Duration(rand.Float64() * math.Pow(2, float64(i)) * float64(time.Second))
time.Sleep(d)
}
return nil, fmt.Errorf("failed: %s", url)
}
```
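Calling it is then a one-liner (a usage sketch):

```go
res, err := Crawl(api, "https://example.com", 4)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.PCStatus)
```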
Async crawls + webhooks
Fire-and-forget mode. The SDK call returns immediately with an RID; Crawlbase POSTs the result to your callback URL when the page is ready. Useful for batch jobs and slow targets.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
res, _ := api.Get("https://example.com", map[string]string{
"async": "true",
"callback": "https://your-app.com/webhook",
})
rid := res.RID // correlate the eventual webhook delivery
// Your net/http handler receives a POST with:
// { rid, url, original_status, pc_status, body }
```

For very high volumes (millions of URLs), push to the Enterprise Crawler by adding "crawler": "YourCrawlerName" alongside the async + callback options.
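On the receiving side, a minimal net/http handler could look like the sketch below; the struct fields mirror the payload comment above, and webhook authentication is omitted for brevity:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Payload assumes the JSON field names from the comment above.
type Payload struct {
	RID            string `json:"rid"`
	URL            string `json:"url"`
	OriginalStatus int    `json:"original_status"`
	PCStatus       int    `json:"pc_status"`
	Body           string `json:"body"`
}

func main() {
	http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
		var p Payload
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, "bad payload", http.StatusBadRequest)
			return
		}
		log.Printf("rid=%s url=%s pc_status=%d", p.RID, p.URL, p.PCStatus)
		w.WriteHeader(http.StatusOK) // ack fast; do heavy work elsewhere
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```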
Sticky sessions
Some flows need the same residential IP across multiple calls. Pass cookies_session with a stable identifier and Crawlbase reuses the same exit node for ~30 minutes.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
session := fmt.Sprintf("checkout-%d", userID) // userID: any stable identifier from your own session state
opts := map[string]string{"cookies_session": session}
api.Get("https://shop.example.com/cart", opts)
api.Get("https://shop.example.com/checkout", opts)
api.Get("https://shop.example.com/confirm", opts)Screenshots
Pass "screenshot": "true" to capture a full-page screenshot. The body comes back as a base64-encoded image; use crawlbase.ImageBytes(res) to decode into raw bytes for os.WriteFile / image.Decode.
```go
api, _ := crawlbase.NewCrawlingAPI("YOUR_JS_TOKEN")
res, _ := api.Get("https://www.apple.com", map[string]string{
"screenshot": "true",
})
img, err := crawlbase.ImageBytes(res)
if err != nil {
log.Fatal(err)
}
os.WriteFile("apple.png", img, 0o644)
```

Context for cancellation
Every verb has a *WithContext variant for use with context.Context — useful any time the call should respect upstream cancellation, deadlines, or trace propagation (HTTP handlers, gRPC servers, anything in a request loop).
```go
import (
"context"
"time"
)
api, _ := crawlbase.NewCrawlingAPI("YOUR_TOKEN")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := api.GetWithContext(ctx, "https://example.com", nil)
```

Errors & retries
The platform surfaces two status codes on every response: the SDK's own res.StatusCode (HTTP status of the request to Crawlbase itself) and res.PCStatus (Crawlbase's verdict on the target — lifted out of the pc_status response header for typed access; see the Crawling API errors table for the full list). Always branch on PCStatus when deciding whether to retry — a target can return 200 with empty body, in which case StatusCode is 200 but PCStatus is 520.
```go
res, err := api.Get(url, nil)
if err != nil {
return err
}
switch res.PCStatus {
case 200:
use(res.Body)
case 520, 525:
// 520 = empty body, 525 = anti-bot couldn't be solved.
// Switch to JS token and retry.
retryWithJSToken(url)
case 521, 522, 523:
// Target unreachable or timed out. Retry with backoff.
scheduleRetry(url)
default:
log.Printf("crawl failed: url=%s pc_status=%d", url, res.PCStatus)
}
```

All retries against the platform are free — only successful responses (PCStatus: 200) count against your quota.
Performance & best practices
- Reuse a single client per token. The constructor is cheap, but each `*CrawlingAPI` instance has its own underlying `http.Client` with its own connection pool. Build it once at service init and share it across goroutines (the SDK is goroutine-safe); a worker-pool sketch follows this list.
- Use the cheapest token that works. Don't default to the JavaScript token "just in case" — Normal-token requests are faster and use less concurrency. Promote to the JavaScript token on `PCStatus == 520` or `525`.
- Prefer `ajax_wait` over `page_wait`. Fixed delays burn concurrency on every request, even fast ones.
- For batch jobs: async + webhook, or push to the Enterprise Crawler. Goroutine pools blocking on synchronous calls saturate concurrency caps quickly; async + webhook releases the slot the moment a request is queued.
- Use `GetWithContext` / `PostWithContext` in server code. A request-scoped context propagates cancellation when the caller goes away — without it, a hung crawl will continue past the caller's deadline.
Response fields
Full method signatures, godoc, and per-method examples live on pkg.go.dev. The fields below are the ones Crawlbase users reach for most, returned on every *crawlbase.Response:
| Field | What it holds |
|---|---|
| `StatusCode` | HTTP status of the SDK's own request to Crawlbase. |
| `PCStatus` | Crawlbase's verdict on the target, lifted from the `pc_status` (or `cb_status`) response header for typed access. Branch on this for retry decisions. |
| `Body` | Raw page content; a JSON string when `format=json` / `scraper=` was used, or a base64-encoded image when `screenshot=true`. |
| `RID` | Request ID for correlating async deliveries; returned when `"async": "true"` or `"store": "true"`. |
| `JSON` | Pre-parsed body; skips the `json.Unmarshal` step on `scraper` / `format=json` calls. |
