What it does

The Crawlbase Codex plugin wraps Crawlbase MCP as a Codex-native plugin. Once installed, you can ask Codex to crawl a page, extract its content, or capture a screenshot in plain English — Codex picks the right tool, calls Crawlbase, and returns the result.

The plugin is powered by Crawlbase's infrastructure: JavaScript rendering, automatic proxy rotation, and built-in anti-bot bypass. You get the same reliability you rely on in production, with a conversational interface in Codex.

Source

The plugin is open source: github.com/crawlbase/crawlbase-codex-plugin. Issues and PRs welcome.

Prerequisites

You need a Crawlbase account and two API tokens:

CRAWLBASE_TOKEN (required): Normal token — used for static pages.
CRAWLBASE_JS_TOKEN (required): JavaScript token — used for JS-rendered pages and all screenshots.

Grab both from your dashboard. See Authentication for the difference.

Install from Codex Marketplace

  1. Open Codex and go to Plugins → Browse Marketplace.
  2. Search for Crawlbase Web Scraper.
  3. Click Install.
  4. Add your CRAWLBASE_TOKEN and CRAWLBASE_JS_TOKEN when prompted.
Marketplace listing coming soon

The marketplace listing is still in review. Use manual installation below in the meantime.

Manual installation

Clone into your Codex plugins directory and set environment variables:

# Clone the plugin into Codex's plugins directory
git clone https://github.com/crawlbase/crawlbase-codex-plugin \
  ~/.codex/plugins/crawlbase-mcp

# Set your tokens
export CRAWLBASE_TOKEN=YOUR_TOKEN
export CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN

# Restart Codex — the plugin is discovered automatically
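Before restarting Codex, it can help to confirm both tokens are actually visible to the shell you launch it from. A minimal sanity check (variable names from this page; the token values are placeholders):

```shell
# Set your tokens (placeholder values shown), then verify both are visible
export CRAWLBASE_TOKEN=YOUR_TOKEN
export CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN

for var in CRAWLBASE_TOKEN CRAWLBASE_JS_TOKEN; do
  # printenv prints nothing if the variable is unset or empty
  [ -n "$(printenv "$var")" ] && echo "$var is set" || echo "$var is MISSING"
done
```

If either line reports MISSING, re-run the export commands in the same shell session before starting Codex.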

Usage

Once installed, ask Codex naturally. It will pick the right tool and call Crawlbase under the hood.

# Crawling
"Crawl https://example.com and return the HTML"
"Get the markdown content of https://example.com/article"
"Take a screenshot of https://example.com"

# Device emulation
"Fetch the page at https://example.com using a mobile browser"
"Take a full-page screenshot of https://example.com and describe what you see"

Tools exposed

The plugin registers three crawl tools and six storage tools.

Crawl tools

crawl: Fetch any URL and return the raw HTML. Accepts store: true to push the page to Cloud Storage instead of returning it inline.
crawl_markdown: Crawl a URL and return clean Markdown — content extracted from HTML noise, optimized for LLM consumption. Supports store: true.
crawl_screenshot: Render the URL as a PNG. The screenshot is returned ephemerally via screenshot_url — the underlying HTML can be persisted with store: true, but the image itself is not stored.

Storage tools

storage_get: Fetch one stored page by rid or url. Pass as: "json", "html", or "markdown" to choose the response shape.
storage_bulk_get: Fetch up to 100 RIDs in one call. Optional delete_after flag for fire-and-forget pipelines.
storage_list: Enumerate stored RIDs with scroll pagination, up to 1,000 per call.
storage_count: Total document count in your storage silo.
storage_delete: Delete a single stored page by RID.
storage_bulk_delete: Delete up to 100 RIDs in one call.
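The 100-RID ceiling on the bulk tools means longer RID lists have to be chunked before calling storage_bulk_get or storage_bulk_delete. A minimal sketch of that batching in shell (the RID values are placeholders, and the tool call itself is stubbed out with echo):

```shell
# Split 250 placeholder RIDs into batches of at most 100,
# matching the per-call limit on storage_bulk_delete
seq 1 250 | sed 's/^/rid_/' | xargs -n 100 | while read -r batch; do
  set -- $batch                        # word-split the batch to count RIDs
  echo "storage_bulk_delete: $# RIDs"  # stand-in for the actual tool call
done
```

This yields three batches of 100, 100, and 50 RIDs; each one stays under the bulk-tool limit.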

Storage usage examples

"Crawl https://example.com and store it in Crawlbase Cloud Storage"
"List all stored pages in Crawlbase"
"Fetch rid abc123 from storage as markdown"
"Bulk-retrieve these 50 rids and delete them afterward"
"How many pages do I have in Crawlbase storage?"

Per-token storage silos

Storage is partitioned per token. Pages crawled with CRAWLBASE_TOKEN live in a separate silo from pages crawled with CRAWLBASE_JS_TOKEN (which covers JS-rendered pages and all screenshots).

Every crawl response includes a token_type field — "normal" or "js" — that tells you which silo a result landed in. When calling any storage tool, pass use_js_token: true if the item lives in the JS silo. Otherwise omit it.

Querying the wrong silo returns "Not found"

If storage_get returns a not-found error for a RID you know exists, you're probably querying the wrong silo. Try again with use_js_token: true (or remove it if you had it set).
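The token_type field maps directly onto the use_js_token flag described above. A small sketch of that mapping (the field and flag names come from this page; the silo_flag helper is hypothetical):

```shell
# Map a crawl response's token_type to the flag storage tools expect:
# "js"     -> pass use_js_token: true
# "normal" -> omit the flag entirely
silo_flag() {
  case "$1" in
    js)     echo 'use_js_token: true' ;;
    normal) echo '(omit use_js_token)' ;;
    *)      echo "unknown token_type: $1" >&2; return 1 ;;
  esac
}

silo_flag js       # prints: use_js_token: true
silo_flag normal   # prints: (omit use_js_token)
```

Keeping this mapping in one place makes it harder to query the wrong silo by accident.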