What is MCP?

The Model Context Protocol is an open standard for connecting AI assistants to external tools. The Crawlbase MCP server speaks MCP, so any compatible client — Claude Desktop, Cursor, Zed, Continue, the OpenAI Agents SDK — can use Crawlbase as a native capability.

The result: your AI can fetch a page, parse a product, take a screenshot, or search the web during a conversation. No glue code, no copy-paste between windows, no proxy server.

Same APIs, conversational interface

The MCP server is a thin wrapper over the same APIs documented in AI & MCP. Your token, your concurrency limits, your usage. The only thing that changes is who's calling — your code, or your AI.

Install

The server runs as a small Node process. Most clients launch it on demand via npx — no global install required.

# No install — let your client launch it
npx @crawlbase/mcp@latest

# Or install globally if you prefer
npm install -g @crawlbase/mcp
crawlbase-mcp

# Or run the Docker image
docker run -i --rm \
  -e CRAWLBASE_TOKEN=YOUR_TOKEN \
  -e CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN \
  crawlbase/mcp

Source on GitHub. Requires Node 18+ if running directly.

Configure your client

Every MCP client uses the same config shape — server name, command to run, environment variables. Drop this into your client's config file.

{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "npx",
      "args": ["@crawlbase/mcp@latest"],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}
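
The same config shape also works if you prefer the Docker image: point command at docker and forward the tokens into the container. A sketch, assuming your client exports the env block to the spawned process so the bare -e flags pick the values up (not verified against every client):

{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "CRAWLBASE_TOKEN",
        "-e", "CRAWLBASE_JS_TOKEN",
        "crawlbase/mcp"
      ],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}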

Per-client setup guides:

  • Claude Desktop & Claude Code — config goes in claude_desktop_config.json / claude.json
  • Cursor — Settings → Tools and Integrations → Add Custom MCP
  • VS Code & Windsurf — via Continue, Cline, or Windsurf's built-in MCP support
  • Codex plugin — wraps this server as a native Codex plugin

Tools exposed

The server registers three crawl tools and six storage tools. Your AI sees each as a callable function.

Crawl tools

crawl
Fetch any URL and return raw HTML. Maps to the Crawling API. Accepts store: true to push results to Cloud Storage (see the example after this list).

crawl_markdown
Crawl a URL and return clean Markdown — content extracted from the HTML, optimized for LLM consumption.

crawl_screenshot
Render the URL as a PNG. Returned as image content the model can see directly. Accepts store: true to persist the underlying HTML page to Cloud Storage (the screenshot image itself is not stored — only the rendered HTML).
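
For instance, a single crawl call that also persists the page to Cloud Storage looks like this in a session. The URL is a placeholder and the result comment is illustrative; the exact response shape may differ:

tool_use: crawl(
  url="https://example.com/product/123",
  store=true
)
# → raw HTML comes back, and the page lands in your storage silo for later retrieval by RID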

Storage tools

Six tools for retrieving and managing pages stored via store: true:

storage_get
Fetch one stored page by rid or url. Choose the response shape with as: "json" | "html" | "markdown".

storage_bulk_get
Fetch up to 100 RIDs in one call. Pass as: "metadata_only" (default) to keep context lean — returns RID/URL/timestamps only — or as: "json" | "html" | "markdown" to include bodies. Optional auto_delete: true for fire-and-forget pipelines that drain the silo as they read (see the pipeline sketch after this list).

storage_list
Enumerate stored RIDs with scroll pagination, up to 1,000 per call.

storage_count
Total document count in your storage silo.

storage_delete
Delete one stored page by RID.

storage_bulk_delete
Delete up to 100 stored pages by RID in a single call. Useful for cleaning out the silo at the end of a pipeline.
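
A typical drain pattern chains these tools: list the stored RIDs, fetch their bodies, and let auto_delete clear them as you read. A sketch in the style of the example session below; the rids parameter name and the result lines are illustrative rather than the exact wire format:

tool_use: storage_list()
# → rid_001, rid_002, ... (up to 1,000 RIDs per call)

tool_use: storage_bulk_get(
  rids=["rid_001", "rid_002"],
  as="markdown",
  auto_delete=true
)
# → returns the stored pages as Markdown and removes them from the silo
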
Per-token storage silos

Storage is partitioned per token. Pages crawled with CRAWLBASE_TOKEN live in a different silo from pages crawled with CRAWLBASE_JS_TOKEN. The token_type field in crawl responses ("normal" or "js") tells you which. Pass use_js_token: true to storage tools when retrieving items from the JS silo.
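
For example, to read back a page that was crawled with the JavaScript token (the RID is a placeholder):

tool_use: storage_get(
  rid="rid_abc123",
  as="markdown",
  use_js_token=true
)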

Example session

Once configured, your AI calls these tools naturally during conversation. A typical turn looks like:

# You
What's the current price of Echo Dot 4th gen on Amazon US, UK, and DE?

# AI (calls crawl_markdown three times in parallel)
tool_use: crawl_markdown(
  url="https://www.amazon.com/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.co.uk/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.de/dp/B08N5WRWNW"
)

# AI
The Echo Dot 4th gen prices right now:
- US: $49.99 (in stock)
- UK: £49.99 (in stock)
- DE: €54.99 (in stock)
The US price is lowest after currency conversion (~£40).

Environment variables

CRAWLBASE_TOKEN (required)
Your Normal token. Used by default for the crawl, crawl_markdown, and storage tools.

CRAWLBASE_JS_TOKEN (recommended)
Your JavaScript token. Used for crawl_screenshot and any tool call that needs JS rendering (SPAs, client-rendered pages).

CRAWLBASE_DEFAULT_COUNTRY (optional)
Default country for geo-routing (ISO code). Tools can override per call (see the example after this list).

CRAWLBASE_LOG_LEVEL (default: info)
One of error, warn, info, debug. Logs go to stderr so they don't interfere with MCP stdio.
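
The optional variables slot into the same env block shown in the client config above. For example, to geo-route through Germany by default and turn on verbose logging (values are illustrative):

"env": {
  "CRAWLBASE_TOKEN": "YOUR_TOKEN",
  "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN",
  "CRAWLBASE_DEFAULT_COUNTRY": "DE",
  "CRAWLBASE_LOG_LEVEL": "debug"
}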

Security notes

  • Tokens never leave the server process. The MCP client sees tool definitions and results, not your credentials.
  • The model can request any URL. If you're concerned about prompt injection driving outbound requests, run with CRAWLBASE_ALLOWED_DOMAINS set to an allowlist (see the sketch after this list).
  • Run locally. The server is designed for local stdio transport. Don't expose it over the network without an auth layer.
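
If you want that guardrail, set the variable alongside the tokens in your client config. The comma-separated list below is an assumed format; check the server README for the exact syntax:

"CRAWLBASE_ALLOWED_DOMAINS": "example.com,docs.example.com"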