What is MCP?

The Model Context Protocol is an open standard for connecting AI assistants to external tools. The Crawlbase MCP server speaks MCP, so any compatible client — Claude Desktop, Cursor, Zed, Continue, the OpenAI Agents SDK — can use Crawlbase as a native capability.

The result: your AI can fetch a page, parse a product, take a screenshot, or search the web during a conversation. No glue code, no copy-paste between windows, no proxy server.

Same APIs, conversational interface

The MCP server is a thin wrapper over the same APIs documented in AI & MCP. Your token, your concurrency limits, your usage. The only thing that changes is who's calling — your code, or your AI.

Install

The server runs as a small Node process. Most clients launch it on demand via npx — no global install required.

# No install — let your client launch it
npx @crawlbase/mcp@latest

# Or install globally if you prefer
npm install -g @crawlbase/mcp
crawlbase-mcp

# Or run the Docker image
docker run -i --rm \
  -e CRAWLBASE_TOKEN=YOUR_TOKEN \
  -e CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN \
  crawlbase/mcp

Source on GitHub. Requires Node 18+ if running directly.

Configure your client

Every MCP client uses the same config shape — server name, command to run, environment variables. Drop this into your client's config file.

{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "npx",
      "args": ["@crawlbase/mcp@latest"],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}
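
The same config shape also works if you prefer the Docker image: point command at docker and forward the tokens into the container. A sketch, assuming your client exports the env block to the spawned process so the bare -e flags pick the values up (not verified against every client):

{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "CRAWLBASE_TOKEN",
        "-e", "CRAWLBASE_JS_TOKEN",
        "crawlbase/mcp"
      ],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}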

Per-client setup guides:

  • Claude Desktop & Claude Code — config goes in claude_desktop_config.json / claude.json
  • Cursor — Settings → Tools and Integrations → Add Custom MCP
  • VS Code & Windsurf — via Continue, Cline, or Windsurf's built-in MCP support
  • Codex plugin — wraps this server as a native Codex plugin

Tools exposed

The server registers three crawl tools and six storage tools. Your AI sees each as a callable function.

Crawl tools

crawl
Fetch any URL and return raw HTML. Maps to the Crawling API. Accepts store: true to push results to Cloud Storage (see the example after this list).

crawl_markdown
Crawl a URL and return clean Markdown — content extracted from the HTML, optimized for LLM consumption.

crawl_screenshot
Render the URL as a PNG. Returned as image content the model can see directly. Accepts store: true to persist the underlying HTML page to Cloud Storage (the screenshot image itself is not stored — only the rendered HTML).
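
For instance, a single crawl call that also persists the page to Cloud Storage looks like this in a session. The URL is a placeholder and the result comment is illustrative; the exact response shape may differ:

tool_use: crawl(
  url="https://example.com/product/123",
  store=true
)
# → raw HTML comes back, and the page lands in your storage silo for later retrieval by RID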

Storage tools

Six tools for retrieving and managing pages stored via store: true:

storage_get
Fetch one stored page by rid or url. Choose the response shape with as: "json" | "html" | "markdown".

storage_bulk_get
Fetch up to 100 RIDs in one call. Pass as: "metadata_only" (default) to keep context lean — returns RID/URL/timestamps only — or as: "json" | "html" | "markdown" to include bodies. Optional auto_delete: true for fire-and-forget pipelines that drain the silo as they read (see the pipeline sketch after this list).

storage_list
Enumerate stored RIDs with scroll pagination, up to 1,000 per call.

storage_count
Total document count in your storage silo.

storage_delete
Delete one stored page by RID.

storage_bulk_delete
Delete up to 100 stored pages by RID in a single call. Useful for cleaning out the silo at the end of a pipeline.
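
A typical drain pattern chains these tools: list the stored RIDs, fetch their bodies, and let auto_delete clear them as you read. A sketch in the style of the example session below; the rids parameter name and the result lines are illustrative rather than the exact wire format:

tool_use: storage_list()
# → rid_001, rid_002, ... (up to 1,000 RIDs per call)

tool_use: storage_bulk_get(
  rids=["rid_001", "rid_002"],
  as="markdown",
  auto_delete=true
)
# → returns the stored pages as Markdown and removes them from the silo
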
Per-token storage silos

Storage is partitioned per token. Pages crawled with CRAWLBASE_TOKEN live in a different silo from pages crawled with CRAWLBASE_JS_TOKEN. The token_type field in crawl responses ("normal" or "js") tells you which. Pass use_js_token: true to storage tools when retrieving items from the JS silo.
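
For example, to read back a page that was crawled with the JavaScript token (the RID is a placeholder):

tool_use: storage_get(
  rid="rid_abc123",
  as="markdown",
  use_js_token=true
)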

Example session

Once configured, your AI calls these tools naturally during conversation. A typical turn looks like:

# You
What's the current price of Echo Dot 4th gen on Amazon US, UK, and DE?

# AI (calls crawl_markdown three times in parallel)
tool_use: crawl_markdown(
  url="https://www.amazon.com/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.co.uk/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.de/dp/B08N5WRWNW"
)

# AI
The Echo Dot 4th gen prices right now:
- US: $49.99 (in stock)
- UK: £49.99 (in stock)
- DE: €54.99 (in stock)
The US price is lowest after currency conversion (~£40).

Environment variables

CRAWLBASE_TOKEN (required)
Your Normal token. Used by default for the crawl, crawl_markdown, and storage tools.

CRAWLBASE_JS_TOKEN (recommended)
Your JavaScript token. Used for crawl_screenshot and any tool call that needs JS rendering (SPAs, client-rendered pages).

CRAWLBASE_DEFAULT_COUNTRY (optional)
Default country for geo-routing (ISO code). Tools can override per call (see the example after this list).

CRAWLBASE_LOG_LEVEL (default: info)
One of error, warn, info, debug. Logs go to stderr so they don't interfere with MCP stdio.
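
The optional variables slot into the same env block shown in the client config above. For example, to geo-route through Germany by default and turn on verbose logging (values are illustrative):

"env": {
  "CRAWLBASE_TOKEN": "YOUR_TOKEN",
  "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN",
  "CRAWLBASE_DEFAULT_COUNTRY": "DE",
  "CRAWLBASE_LOG_LEVEL": "debug"
}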

Security notes

  • Tokens never leave the server process. The MCP client sees tool definitions and results, not your credentials.
  • The model can request any URL. If you're concerned about prompt injection driving outbound requests, run with CRAWLBASE_ALLOWED_DOMAINS set to an allowlist (see the sketch after this list).
  • Run locally. The server is designed for local stdio transport. Don't expose it over the network without an auth layer.
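
If you want that guardrail, set the variable alongside the tokens in your client config. The comma-separated list below is an assumed format; check the server README for the exact syntax:

"CRAWLBASE_ALLOWED_DOMAINS": "example.com,docs.example.com"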