Coming soon: a preview of how it will work

The dedicated Crawlbase Airbyte source connector is in development. The setup and streams below are a preview of how the shipped connector will work. Email us to be notified when it lands.

Need it today? Use Airbyte's HTTP API source against the Crawling API, or push results to Cloud Storage and ingest the bucket via Airbyte's S3 source. Both paths work end-to-end without the dedicated connector.
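For the first path, the request the HTTP API source would issue looks roughly like the sketch below, shown here with Python's requests for clarity. The token and target URL are placeholders; check the Crawling API docs for the full parameter list.

```python
import requests

# Placeholder values; substitute your Crawlbase token and target URL.
CRAWLBASE_TOKEN = "YOUR_CRAWLBASE_TOKEN"
TARGET_URL = "https://example.com/product/123"

# The Crawling API takes the token and the (URL-encoded) target as query
# parameters; requests handles the encoding.
response = requests.get(
    "https://api.crawlbase.com/",
    params={"token": CRAWLBASE_TOKEN, "url": TARGET_URL},
    timeout=60,
)
response.raise_for_status()

# Crawlbase reports crawl metadata in response headers (e.g. pc_status,
# original_status) and returns the crawled page in the body.
print(response.headers.get("pc_status"), response.headers.get("original_status"))
print(response.text[:500])
```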

Setup

  1. In your Airbyte instance, go to Sources → New Source.
  2. Search for Crawlbase and select it.
  3. Configure: paste your token, choose a Crawler (the queue you push URLs to), and pick which streams to sync (a sketch of the expected fields follows this list).
  4. Test the connection, save, and connect to a destination.
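Until the connector ships, the exact configuration spec isn't final. The sketch below shows the kind of source configuration the steps above imply, created through Airbyte's Configuration API instead of the UI; the field names (token, crawler, streams) and the source definition id are assumptions, not the final spec.

```python
import requests

# Hypothetical connection configuration implied by steps 1-4 above;
# field names are illustrative, not the final connector spec.
connection_configuration = {
    "token": "YOUR_CRAWLBASE_TOKEN",   # pasted in step 3
    "crawler": "my-product-crawler",   # the Crawler (queue) you push URLs to
    "streams": ["crawl_results", "scraper_outputs", "crawler_status"],
}

# Equivalent of the UI flow via the Airbyte Configuration API.
# The base URL, workspace id, and source definition id depend on your instance;
# the Crawlbase definition id won't exist until the connector lands.
AIRBYTE_API = "http://localhost:8000/api/v1"
payload = {
    "workspaceId": "<your-workspace-id>",
    "sourceDefinitionId": "<crawlbase-source-definition-id>",
    "connectionConfiguration": connection_configuration,
    "name": "Crawlbase",
}

resp = requests.post(f"{AIRBYTE_API}/sources/create", json=payload, timeout=30)
resp.raise_for_status()
print("Created source:", resp.json()["sourceId"])
```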

Streams

crawl_results (incremental)
Every completed crawl, one row per URL. Columns: rid, url, pc_status, original_status, completed_at, body, headers.

scraper_outputs (incremental)
Structured scraper results, with per-scraper schemas (Amazon, Google, etc.) automatically inferred and exposed as nested columns.

crawler_status (full refresh)
Snapshot of crawler queue health: queued, in-progress, and completed/failed counts per crawler.
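To make the stream schemas concrete, here are illustrative records assembled from the columns described above. Values are placeholders, and the nested fields under scraper_outputs are assumptions; the shipped connector's exact types and nesting may differ.

```python
# Illustrative records only; exact types and nesting may differ in the shipped connector.

crawl_results_record = {
    "rid": "a1b2c3d4",                       # Crawlbase request id
    "url": "https://example.com/item/42",
    "pc_status": 200,                        # Crawlbase processing status
    "original_status": 200,                  # status returned by the target site
    "completed_at": "2024-05-01T12:00:00Z",  # natural cursor for incremental syncs
    "body": "<html>...</html>",
    "headers": {"content-type": "text/html"},
}

# scraper_outputs: per-scraper schema exposed as nested columns (fields assumed here).
scraper_outputs_record = {
    "rid": "a1b2c3d4",
    "url": "https://www.amazon.com/dp/B000000000",
    "scraper": "amazon-product-details",
    "data": {"name": "Example product", "price": 19.99, "currency": "USD"},
}

# crawler_status: snapshot counts per crawler, refreshed in full each sync.
crawler_status_record = {
    "crawler": "my-product-crawler",
    "queued": 120,
    "in_progress": 8,
    "completed": 950,
    "failed": 3,
}
```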

Patterns

  • Hourly product price warehouse: push product URLs to a Crawler with the Amazon scraper (see the push sketch after this list). Sync every hour. Build a dbt model on top to flag price drops.
  • Compliance archive: daily full-page crawls of regulated sites, synced to S3 via Airbyte. Time-stamped, with an enforced schema, and queryable.
  • SEO competitive monitoring: SERPs scraped weekly, synced to BigQuery, dashboarded in Looker.
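For the producer side of the first pattern, pushing product URLs to a Crawler can look like the sketch below. The crawler and callback parameters come from the Crawler push flow of the Crawling API; the crawler name and the scraper parameter are assumptions for illustration, so verify them against the current Crawlbase docs and your plan.

```python
import requests

TOKEN = "YOUR_CRAWLBASE_TOKEN"
CRAWLER = "amazon-price-watch"   # a Crawler (queue) created in the Crawlbase dashboard

product_urls = [
    "https://www.amazon.com/dp/B000000000",
    "https://www.amazon.com/dp/B000000001",
]

for url in product_urls:
    # Push each URL onto the Crawler queue. Requesting the Amazon scraper here
    # (scraper=amazon-product-details) is an assumption made for this sketch.
    resp = requests.get(
        "https://api.crawlbase.com/",
        params={
            "token": TOKEN,
            "url": url,
            "crawler": CRAWLER,
            "callback": "true",
            "scraper": "amazon-product-details",
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(url, resp.json())  # push returns a JSON acknowledgement with the request id (rid)
```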