
What is MCP?

The Model Context Protocol is an open standard for connecting AI assistants to external tools. The Crawlbase MCP server speaks MCP, so any compatible client - Claude Desktop, Cursor, Zed, Continue, the OpenAI Agents SDK - can use Crawlbase as a native capability.

The result: your AI can fetch a page, parse a product, take a screenshot, or search the web during a conversation. No glue code, no copy-paste between windows, no proxy server.

Same APIs, conversational interface

The MCP server is a thin wrapper over the same APIs documented in AI & MCP. Your token, your concurrency limits, your usage. The only thing that changes is who's calling - your code, or your AI.
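For comparison, here is a minimal sketch of the kind of direct request the crawl tool stands in for. It assumes the Crawling API's GET form `https://api.crawlbase.com/?token=...&url=...`. `build_crawl_request` is a hypothetical helper, and the sketch only builds the URL; it does not send it.

```python
from urllib.parse import urlencode

# Crawling API endpoint (assumed GET form: ?token=...&url=...).
CRAWLING_API = "https://api.crawlbase.com/"


def build_crawl_request(token: str, target_url: str) -> str:
    """Hypothetical helper: build the Crawling API URL for one page.

    urlencode percent-encodes the target URL (including "/", "?" and "&"),
    so the target's own query string cannot collide with the API's parameters.
    """
    return CRAWLING_API + "?" + urlencode({"token": token, "url": target_url})


if __name__ == "__main__":
    print(build_crawl_request("YOUR_TOKEN", "https://example.com/?q=a&b=1"))
```

The MCP server makes this kind of call on your behalf, using the token from its environment, so the model never handles the URL-encoding or the credential.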

Install

The server runs as a small Node process. Most clients launch it on demand via npx: no global install required.

# No install - let your client launch it
npx @crawlbase/mcp@latest
# Or install globally if you prefer
npm install -g @crawlbase/mcp
crawlbase-mcp
# Or run the Docker image (-i keeps stdin open for the stdio transport)
docker run -i --rm \
  -e CRAWLBASE_TOKEN=YOUR_TOKEN \
  -e CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN \
  crawlbase/mcp

Source on GitHub. Requires Node 18+ if running directly.

Configure your client

Most MCP clients use the same config shape: a server name, the command to run, and its environment variables. Drop this into your client's config file and replace the placeholder tokens with your own.

{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "npx",
      "args": ["@crawlbase/mcp@latest"],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}
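If your client's config already lists other servers, pasting over the whole file would drop them. Below is a minimal sketch of merging the `crawlbase` entry into an existing config instead. The function name and the choice to read tokens from the environment are illustrative assumptions; this is not part of any client's tooling.

```python
import json
import os
from typing import Any, Dict, Mapping


def add_crawlbase_server(config: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of an MCP client config with a
    "crawlbase" entry added. Other entries under "mcpServers" are preserved;
    only "crawlbase" is created or replaced. The input dict is not mutated."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    server_env = {"CRAWLBASE_TOKEN": env["CRAWLBASE_TOKEN"]}
    # The JS token is only written if one is available.
    if "CRAWLBASE_JS_TOKEN" in env:
        server_env["CRAWLBASE_JS_TOKEN"] = env["CRAWLBASE_JS_TOKEN"]
    servers["crawlbase"] = {
        "type": "stdio",
        "command": "npx",
        "args": ["@crawlbase/mcp@latest"],
        "env": server_env,
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    env = {"CRAWLBASE_TOKEN": os.environ.get("CRAWLBASE_TOKEN", "YOUR_TOKEN")}
    if "CRAWLBASE_JS_TOKEN" in os.environ:
        env["CRAWLBASE_JS_TOKEN"] = os.environ["CRAWLBASE_JS_TOKEN"]
    print(json.dumps(add_crawlbase_server(existing, env), indent=2))
```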

Per-client setup guides:

  • Claude Desktop & Claude Code - config goes in claude_desktop_config.json / claude.json
  • Cursor - Settings → Tools and Integrations → Add Custom MCP
  • VS Code & Windsurf - via Continue, Cline, or Windsurf's built-in MCP support
  • Codex plugin - wraps this server as a native Codex plugin

Tools exposed

The server registers three crawl tools and six storage tools. Your AI sees each as a callable function.

Crawl tools

  • crawl - Fetch any URL and return raw HTML. Maps to the Crawling API. Accepts store: true to push results to Cloud Storage.
  • crawl_markdown - Crawl a URL and return clean Markdown: content extracted from the HTML, optimized for LLM consumption.
  • crawl_screenshot - Render the URL as PNG. Returned as image content the model can see directly. Accepts store: true to persist the underlying HTML page to Cloud Storage (the screenshot image itself is not stored, only the rendered HTML).
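Under the hood, each tool call travels as an MCP `tools/call` request: a JSON-RPC 2.0 message written as one line of JSON over the stdio transport. The sketch below shows only the message shape for a `crawl` call with `store: true`. A real client first performs MCP's `initialize` handshake, which is omitted here, and `tools_call_message` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def tools_call_message(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Hypothetical helper: serialize an MCP tools/call request for stdio.

    MCP messages are JSON-RPC 2.0; over stdio each message is a single line
    of JSON terminated by a newline.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no embedded newlines.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    # A crawl call that also pushes the page to Cloud Storage.
    line = tools_call_message(1, "crawl", {"url": "https://example.com", "store": True})
    print(line, end="")
```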

Storage tools

Six tools for retrieving and managing pages stored via store: true:

  • storage_get - Fetch one stored page by rid or url. Choose response shape with as: "json" | "html" | "markdown".
  • storage_bulk_get - Fetch up to 100 RIDs in one call. Pass as: "metadata_only" (default) to keep context lean (returns RID/URL/timestamps only) or as: "json" | "html" | "markdown" to include bodies. Optional auto_delete: true for fire-and-forget pipelines that drain the silo as they read.
  • storage_list - Enumerate stored RIDs with scroll pagination, up to 1,000 per call.
  • storage_count - Total document count in your storage silo.
  • storage_delete - Delete one stored page by RID.
  • storage_bulk_delete - Delete up to 100 stored pages by RID in a single call. Useful for cleaning out the silo at the end of a pipeline.
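When a pipeline has more than 100 RIDs to fetch or delete, the work has to be split across several storage_bulk_get or storage_bulk_delete calls. A minimal sketch of that batching follows. The "rids" key is an assumed parameter name (check the tool schema your client reports), and `bulk_arguments` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, List, Optional

# Documented per-call limit for storage_bulk_get / storage_bulk_delete.
MAX_BULK_RIDS = 100


def bulk_arguments(
    rids: List[str], options: Optional[Dict[str, Any]] = None
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: split RIDs into per-call argument dicts.

    Each dict covers at most MAX_BULK_RIDS RIDs. 'options' carries the other
    documented arguments, e.g. {"as": "metadata_only"} or {"auto_delete": True}.
    A plain dict is used because "as" is a Python keyword.
    The "rids" key is an assumed parameter name.
    """
    options = options or {}
    for start in range(0, len(rids), MAX_BULK_RIDS):
        yield {"rids": rids[start:start + MAX_BULK_RIDS], **options}


if __name__ == "__main__":
    all_rids = [f"rid-{i}" for i in range(250)]
    for args in bulk_arguments(all_rids, {"as": "metadata_only"}):
        # Each dict would become the arguments of one storage_bulk_get call.
        print(len(args["rids"]), args["as"])
```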

Per-token storage silos

Storage is partitioned per token. Pages crawled with CRAWLBASE_TOKEN live in a different silo from pages crawled with CRAWLBASE_JS_TOKEN. The token_type field in crawl responses ("normal" or "js") tells you which. Pass use_js_token: true to storage tools when retrieving items from the JS silo.
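A minimal sketch of routing a storage lookup to the right silo from a crawl response. It assumes the response is a flat object carrying a rid (when the page was crawled with store: true) alongside the token_type field described above. `storage_get_arguments` is a hypothetical helper.

```python
from typing import Any, Dict


def storage_get_arguments(crawl_result: Dict[str, Any], as_format: str = "markdown") -> Dict[str, Any]:
    """Hypothetical helper: build storage_get arguments for the right silo.

    Assumes crawl_result has "rid" and "token_type" keys, where token_type
    is "normal" or "js".
    """
    args: Dict[str, Any] = {"rid": crawl_result["rid"], "as": as_format}
    if crawl_result.get("token_type") == "js":
        # Pages crawled with CRAWLBASE_JS_TOKEN live in the JS silo.
        args["use_js_token"] = True
    return args


if __name__ == "__main__":
    print(storage_get_arguments({"rid": "abc123", "token_type": "js"}))
```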

Example session

Once configured, your AI calls these tools naturally during conversation. A typical turn looks like:

# You
What's the current price of "Web Scraping with Python" (3rd ed.) on Amazon US, UK, and DE?

# AI (calls crawl_markdown three times in parallel)
tool_use: crawl_markdown(
  url="https://www.amazon.com/dp/1098145356"
)
tool_use: crawl_markdown(
  url="https://www.amazon.co.uk/dp/1098145356"
)
tool_use: crawl_markdown(
  url="https://www.amazon.de/dp/1098145356"
)

# AI
"Web Scraping with Python" (3rd ed.) prices right now:
- US: $59.99 (in stock)
- UK: £52.99 (in stock)
- DE: €57.99 (in stock)
The US price is the lowest after currency conversion (~£47).
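In a chat client, the model emits the three tool calls and the client runs them. If you drive the tools from your own code, the same fan-out can be sketched with asyncio. `call_tool` is a hypothetical stand-in for your MCP client's tool-invocation method, and `fake_call_tool` is a stub so the sketch runs offline without crawling anything.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Stand-in type for whatever your MCP client exposes to invoke a tool.
ToolCaller = Callable[[str, Dict[str, str]], Awaitable[str]]

MARKETPLACES = {
    "US": "https://www.amazon.com",
    "UK": "https://www.amazon.co.uk",
    "DE": "https://www.amazon.de",
}


async def fetch_product_pages(call_tool: ToolCaller, asin: str) -> Dict[str, str]:
    """Issue one crawl_markdown call per marketplace concurrently."""
    regions: List[str] = list(MARKETPLACES)
    pages = await asyncio.gather(
        *(call_tool("crawl_markdown", {"url": f"{MARKETPLACES[r]}/dp/{asin}"}) for r in regions)
    )
    # gather returns results in input order, so zip pairs each page with its region.
    return dict(zip(regions, pages))


async def fake_call_tool(name: str, arguments: Dict[str, str]) -> str:
    """Hypothetical stub that echoes the request instead of crawling."""
    await asyncio.sleep(0)
    return f"{name}:{arguments['url']}"


if __name__ == "__main__":
    result = asyncio.run(fetch_product_pages(fake_call_tool, "1098145356"))
    for region, page in result.items():
        print(region, page)
```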

Environment variables

  • CRAWLBASE_TOKEN (required) - Your Normal token. Used by default for the crawl, crawl_markdown, and storage tools.
  • CRAWLBASE_JS_TOKEN (recommended) - Your JavaScript token. Used for crawl_screenshot and any tool call that needs JS rendering (SPAs, client-rendered pages).
  • CRAWLBASE_DEFAULT_COUNTRY (optional) - Default country for geo-routing, as an ISO country code. Tools can override it per call.
  • CRAWLBASE_LOG_LEVEL (optional, default: info) - One of error, warn, info, debug. Logs go to stderr so they don't interfere with MCP's stdio transport.
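Before wiring the server into a client, a quick pre-flight check of these variables can save a confusing first session. A minimal sketch follows, assuming info is the default log level. `check_crawlbase_env` is a hypothetical helper, not part of the server, which may do its own validation.

```python
import os
from typing import List, Mapping

LOG_LEVELS = ("error", "warn", "info", "debug")


def check_crawlbase_env(env: Mapping[str, str]) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems (empty if fine).

    CRAWLBASE_TOKEN is required, CRAWLBASE_JS_TOKEN is recommended, and
    CRAWLBASE_LOG_LEVEL is assumed to default to "info" when unset.
    """
    problems: List[str] = []
    if not env.get("CRAWLBASE_TOKEN"):
        problems.append("CRAWLBASE_TOKEN is required")
    if not env.get("CRAWLBASE_JS_TOKEN"):
        problems.append("CRAWLBASE_JS_TOKEN is not set; crawl_screenshot and JS-rendered pages need it")
    level = env.get("CRAWLBASE_LOG_LEVEL", "info")
    if level not in LOG_LEVELS:
        problems.append(f"CRAWLBASE_LOG_LEVEL must be one of {', '.join(LOG_LEVELS)}, got {level!r}")
    return problems


if __name__ == "__main__":
    for problem in check_crawlbase_env(os.environ):
        print("warning:", problem)
```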

Security notes

  • Tokens never leave the server process. The MCP client sees tool definitions and results, not your credentials.
  • The model can request any URL. If you're concerned about prompt injection driving outbound requests, run the server with CRAWLBASE_ALLOWED_DOMAINS set to an allowlist of domains.
  • Run locally. The server is designed for local stdio transport. Don't expose it over the network without an auth layer.
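An allowlist is only as strong as its matching rule. The sketch below shows suffix matching on a label boundary, so that "evilamazon.com" does not slip past an entry of "amazon.com". It is an illustration, not the server's actual check: the comma-separated format for CRAWLBASE_ALLOWED_DOMAINS is an assumption, and `is_allowed` is a hypothetical helper.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Hypothetical check: True if url's host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    # Only plain web fetches are considered; file:, ftp:, etc. are rejected outright.
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases hostname; strip a trailing root dot ("example.com.").
    host = (parts.hostname or "").rstrip(".")
    for entry in allowed_domains:
        domain = entry.strip().lower().rstrip(".")
        # Match the domain itself or any subdomain, on a "." label boundary.
        if domain and (host == domain or host.endswith("." + domain)):
            return True
    return False


if __name__ == "__main__":
    import os

    # Assumed format: comma-separated, e.g. "amazon.com,amazon.co.uk".
    raw = os.environ.get("CRAWLBASE_ALLOWED_DOMAINS", "amazon.com,amazon.co.uk")
    print(is_allowed("https://www.amazon.com/dp/1098145356", raw.split(",")))
```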