MCP Server
Expose every Crawlbase tool to AI assistants through the Model Context Protocol. One install and your AI can crawl, scrape, screenshot, and search the web with the same reliability you use in production.
What is MCP?
The Model Context Protocol is an open standard for connecting AI assistants to external tools. The Crawlbase MCP server speaks MCP, so any compatible client — Claude Desktop, Cursor, Zed, Continue, the OpenAI Agents SDK — can use Crawlbase as a native capability.
The result: your AI can fetch a page, parse a product, take a screenshot, or search the web during a conversation. No glue code, no copy-paste between windows, no proxy server.
The MCP server is a thin wrapper over the same APIs documented in AI & MCP. Your token, your concurrency limits, your usage. The only thing that changes is who's calling — your code, or your AI.
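Under the hood, every tool invocation is a JSON-RPC 2.0 message over stdio. A minimal sketch of the message a client sends to invoke the crawl_markdown tool (the envelope shape follows the MCP specification; the url argument is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "crawl_markdown",
    "arguments": { "url": "https://example.com" }
  }
}
```

Your client builds and sends these messages for you; you never write them by hand.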
Install
The server runs as a small Node process. Most clients launch it on demand via npx — no global install required.
```shell
# No install — let your client launch it
npx @crawlbase/mcp@latest
```

```shell
# Or install globally if you prefer
npm install -g @crawlbase/mcp
crawlbase-mcp
```

```shell
# Or run via Docker
docker run -i --rm \
  -e CRAWLBASE_TOKEN=YOUR_TOKEN \
  -e CRAWLBASE_JS_TOKEN=YOUR_JS_TOKEN \
  crawlbase/mcp
```

Source on GitHub. Requires Node 18+ if running directly.
Configure your client
Every MCP client uses the same config shape — server name, command to run, environment variables. Drop this into your client's config file.
```json
{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "npx",
      "args": ["@crawlbase/mcp@latest"],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}
```

Per-client setup guides:
- Claude Desktop & Claude Code — config goes in claude_desktop_config.json / claude.json
- Cursor — Settings → Tools and Integrations → Add Custom MCP
- VS Code & Windsurf — via Continue, Cline, or Windsurf's built-in MCP support
- Codex plugin — wraps this server as a native Codex plugin
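If you installed globally, point your client at the binary instead of npx. Same config shape; crawlbase-mcp is the executable installed by npm install -g:

```json
{
  "mcpServers": {
    "crawlbase": {
      "type": "stdio",
      "command": "crawlbase-mcp",
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_TOKEN",
        "CRAWLBASE_JS_TOKEN": "YOUR_JS_TOKEN"
      }
    }
  }
}
```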
Tools exposed
The server registers three crawl tools and six storage tools. Your AI sees each as a callable function.
Crawl tools
The three crawl tools are crawl, crawl_markdown, and crawl_screenshot. Each accepts store: true to push results to Cloud Storage. Note that crawl_screenshot's store: true persists the underlying HTML page to Cloud Storage (the screenshot image itself is not stored — only the rendered HTML).

Storage tools
Six tools for retrieving and managing pages stored via store: true:
- Fetching a stored page: pass rid or url. Choose the response shape with as: "json" | "html" | "markdown".
- Listing stored pages: use as: "metadata_only" (default) to keep context lean — returns RID/URL/timestamps only — or as: "json" | "html" | "markdown" to include bodies. Optional auto_delete: true for fire-and-forget pipelines that drain the silo as they read.

Storage is partitioned per token. Pages crawled with CRAWLBASE_TOKEN live in a different silo from pages crawled with CRAWLBASE_JS_TOKEN. The token_type field in crawl responses ("normal" or "js") tells you which. Pass use_js_token: true to storage tools when retrieving items from the JS silo.
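The silo routing follows mechanically from token_type. A sketch of that decision in Python — an illustrative helper, not part of the server; it only assumes the token_type field and use_js_token parameter described above:

```python
def storage_args_for(crawl_response: dict, rid: str) -> dict:
    """Build arguments for a storage-tool call targeting the same
    silo the page was crawled into. Illustrative only."""
    args = {"rid": rid}
    if crawl_response.get("token_type") == "js":
        # Page lives in the CRAWLBASE_JS_TOKEN silo
        args["use_js_token"] = True
    return args

# A page crawled with the JS token must be fetched from the JS silo
assert storage_args_for({"token_type": "js"}, "abc123") == {
    "rid": "abc123",
    "use_js_token": True,
}
# Pages from the normal silo need no extra flag
assert storage_args_for({"token_type": "normal"}, "abc123") == {"rid": "abc123"}
```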
Example session
Once configured, your AI calls these tools naturally during conversation. A typical turn looks like:
```
# You
What's the current price of Echo Dot 4th gen on Amazon US, UK, and DE?

# AI (calls crawl_markdown three times in parallel)
tool_use: crawl_markdown(
  url="https://www.amazon.com/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.co.uk/dp/B08N5WRWNW"
)
tool_use: crawl_markdown(
  url="https://www.amazon.de/dp/B08N5WRWNW"
)

# AI
The Echo Dot 4th gen prices right now:
- US: $49.99 (in stock)
- UK: £49.99 (in stock)
- DE: €54.99 (in stock)
The US price is lowest after currency conversion (~£40).
```

Environment variables
- CRAWLBASE_TOKEN — used by crawl, crawl_markdown, and the storage tools.
- CRAWLBASE_JS_TOKEN — used by crawl_screenshot and any tool call that needs JS rendering (SPAs, client-rendered pages).
- The log level can be set to error, warn, info, or debug. Logs go to stderr so they don't interfere with MCP stdio.

Security notes
- Tokens never leave the server process. The MCP client sees tool definitions and results, not your credentials.
- The model can request any URL. If you're concerned about prompt injection driving outbound requests, run with CRAWLBASE_ALLOWED_DOMAINS set to an allowlist.
- Run locally. The server is designed for local stdio transport. Don't expose it over the network without an auth layer.
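How an allowlist check like this might behave, sketched in Python. This is hypothetical logic for illustration — the server's actual matching rules aren't documented here; the sketch assumes exact-host and subdomain matches are allowed:

```python
from urllib.parse import urlparse

def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if url's host is one of the allowed domains or a
    subdomain of one. Illustrative only, not the server's real logic."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == d or host.endswith("." + d)
        for d in (d.lower() for d in allowed_domains)
    )

allowed = ["example.com", "amazon.com"]
assert is_allowed("https://www.amazon.com/dp/B08N5WRWNW", allowed)
assert not is_allowed("https://evil.test/phish", allowed)
# Suffix tricks like notamazon.com must not match amazon.com
assert not is_allowed("https://notamazon.com/", allowed)
```

Matching on the parsed hostname rather than the raw URL string is the important design choice: substring checks on the full URL are trivially bypassed with paths or userinfo tricks.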

