Coming soon: a preview of how it will work

The dedicated Crawlbase Airbyte source connector is in development. The setup and streams below are a preview of how the shipped connector will work. Email us to be notified when it lands.

Need it today? Use Airbyte's HTTP API source against the Crawling API, or push results to Cloud Storage and ingest the bucket via Airbyte's S3 source. Both paths work end-to-end without the dedicated connector.
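For the first path, the request the HTTP API source would issue looks roughly like the sketch below, shown here with Python's requests for clarity. The token and target URL are placeholders; check the Crawling API docs for the full parameter list.

```python
import requests

# Placeholder values; substitute your Crawlbase token and target URL.
CRAWLBASE_TOKEN = "YOUR_CRAWLBASE_TOKEN"
TARGET_URL = "https://example.com/product/123"

# The Crawling API takes the token and the (URL-encoded) target as query
# parameters; requests handles the encoding.
response = requests.get(
    "https://api.crawlbase.com/",
    params={"token": CRAWLBASE_TOKEN, "url": TARGET_URL},
    timeout=60,
)
response.raise_for_status()

# Crawlbase reports crawl metadata in response headers (e.g. pc_status,
# original_status) and returns the crawled page in the body.
print(response.headers.get("pc_status"), response.headers.get("original_status"))
print(response.text[:500])
```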

Setup

  1. In your Airbyte instance, go to Sources → New Source.
  2. Search for Crawlbase and select it.
  3. Configure: paste your token, choose a Crawler (the queue you push URLs to), and pick which streams to sync (a sketch of the expected fields follows this list).
  4. Test the connection, save, and connect to a destination.
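Until the connector ships, the exact configuration spec isn't final. The sketch below shows the kind of source configuration the steps above imply, created through Airbyte's Configuration API instead of the UI; the field names (token, crawler, streams) and the source definition id are assumptions, not the final spec.

```python
import requests

# Hypothetical connection configuration implied by steps 1-4 above;
# field names are illustrative, not the final connector spec.
connection_configuration = {
    "token": "YOUR_CRAWLBASE_TOKEN",   # pasted in step 3
    "crawler": "my-product-crawler",   # the Crawler (queue) you push URLs to
    "streams": ["crawl_results", "scraper_outputs", "crawler_status"],
}

# Equivalent of the UI flow via the Airbyte Configuration API.
# The base URL, workspace id, and source definition id depend on your instance;
# the Crawlbase definition id won't exist until the connector lands.
AIRBYTE_API = "http://localhost:8000/api/v1"
payload = {
    "workspaceId": "<your-workspace-id>",
    "sourceDefinitionId": "<crawlbase-source-definition-id>",
    "connectionConfiguration": connection_configuration,
    "name": "Crawlbase",
}

resp = requests.post(f"{AIRBYTE_API}/sources/create", json=payload, timeout=30)
resp.raise_for_status()
print("Created source:", resp.json()["sourceId"])
```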

Streams

crawl_results (incremental)
Every completed crawl, one row per URL. Columns: rid, url, pc_status, original_status, completed_at, body, headers.

scraper_outputs (incremental)
Structured scraper results, with per-scraper schemas (Amazon, Google, etc.) automatically inferred and exposed as nested columns.

crawler_status (full refresh)
Snapshot of crawler queue health: queued, in-progress, and completed/failed counts per crawler.
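To make the stream schemas concrete, here are illustrative records assembled from the columns described above. Values are placeholders, and the nested fields under scraper_outputs are assumptions; the shipped connector's exact types and nesting may differ.

```python
# Illustrative records only; exact types and nesting may differ in the shipped connector.

crawl_results_record = {
    "rid": "a1b2c3d4",                       # Crawlbase request id
    "url": "https://example.com/item/42",
    "pc_status": 200,                        # Crawlbase processing status
    "original_status": 200,                  # status returned by the target site
    "completed_at": "2024-05-01T12:00:00Z",  # natural cursor for incremental syncs
    "body": "<html>...</html>",
    "headers": {"content-type": "text/html"},
}

# scraper_outputs: per-scraper schema exposed as nested columns (fields assumed here).
scraper_outputs_record = {
    "rid": "a1b2c3d4",
    "url": "https://www.amazon.com/dp/B000000000",
    "scraper": "amazon-product-details",
    "data": {"name": "Example product", "price": 19.99, "currency": "USD"},
}

# crawler_status: snapshot counts per crawler, refreshed in full each sync.
crawler_status_record = {
    "crawler": "my-product-crawler",
    "queued": 120,
    "in_progress": 8,
    "completed": 950,
    "failed": 3,
}
```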

Patterns

  • Hourly product price warehouse: push product URLs to a Crawler with the Amazon scraper (see the push sketch after this list). Sync every hour. Build a dbt model on top to flag price drops.
  • Compliance archive: daily full-page crawls of regulated sites, synced to S3 via Airbyte. Time-stamped, with an enforced schema, and queryable.
  • SEO competitive monitoring: SERPs scraped weekly, synced to BigQuery, dashboarded in Looker.
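For the producer side of the first pattern, pushing product URLs to a Crawler can look like the sketch below. The crawler and callback parameters come from the Crawler push flow of the Crawling API; the crawler name and the scraper parameter are assumptions for illustration, so verify them against the current Crawlbase docs and your plan.

```python
import requests

TOKEN = "YOUR_CRAWLBASE_TOKEN"
CRAWLER = "amazon-price-watch"   # a Crawler (queue) created in the Crawlbase dashboard

product_urls = [
    "https://www.amazon.com/dp/B000000000",
    "https://www.amazon.com/dp/B000000001",
]

for url in product_urls:
    # Push each URL onto the Crawler queue. Requesting the Amazon scraper here
    # (scraper=amazon-product-details) is an assumption made for this sketch.
    resp = requests.get(
        "https://api.crawlbase.com/",
        params={
            "token": TOKEN,
            "url": url,
            "crawler": CRAWLER,
            "callback": "true",
            "scraper": "amazon-product-details",
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(url, resp.json())  # push returns a JSON acknowledgement with the request id (rid)
```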