If you have ever wired an AI agent up to the live web, you already know where it falls apart. The agent reasons fine, but the moment it needs real page content it hits a wall: the site renders client-side, the HTML is a tangle, or the request gets challenged before any data comes back. The fix is not a smarter prompt. It is giving the agent a tool that returns clean, structured web data on demand, and letting the agent decide when to call it.
That is exactly what the Crawlbase Web MCP server provides. This guide shows you how to build AI agent workflows around the Crawlbase Web MCP: the planning loop the agent runs, the MCP tool calls it makes to scrape and crawl, and a concrete end-to-end example that takes a URL, fetches the rendered page, and returns a structured answer. No custom scraping code, no proxy pool to babysit, no parsing rules baked into the agent.
What the Crawlbase Web MCP adds to an agent
MCP, the Model Context Protocol, is the open standard that lets a language model call external tools through a consistent interface. An MCP server publishes a set of tools, and any MCP-aware client (Claude Desktop, Cursor, n8n, or your own agent) can discover and invoke them. The Crawlbase MCP server publishes web-access tools, so the agent gains the ability to read any public URL the way a real browser would.
Under the hood, those tools are backed by the same Crawling API that powers the rest of Crawlbase. That means the agent inherits JavaScript rendering, residential IP rotation, anti-bot handling, retries, and clean output without knowing any of it exists. From the agent's point of view it just called a tool and got back readable content. For a fuller tour of what the server exposes, see our introduction to the Crawlbase MCP.
The Web MCP typically exposes two tools your agent will reach for:
- crawl fetches a single URL and returns the rendered page as clean markdown or HTML, ready for the model to read.
- crawl_markdown (or a screenshot/structured variant, depending on your server build) returns the same content trimmed to readable text, which keeps token usage down on long pages.
You could hand the agent a plain HTTP-request tool instead. On modern sites it rarely holds up: most pages render client-side and challenge automated traffic, so raw fetches return empty shells or blocks. The MCP tool routes through the Crawling API, which renders the page behind a trusted IP and returns finished content, so the agent gets real data on the first call rather than a retry loop.
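To make the "empty shell" failure concrete, here is a minimal sketch of a heuristic that flags HTML with almost no visible text. It uses only the standard library. The function name `looks_like_empty_shell` and the 200-character threshold are assumptions for illustration, not part of Crawlbase or MCP.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the bodies of script/style/noscript tags."""

    SKIPPED = ("script", "style", "noscript")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_empty_shell(html: str, min_chars: int = 200) -> bool:
    """Return True when the page carries less visible text than min_chars."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars
```

A check like this is only a rough signal. A page can be short and still complete, so treat the threshold as something to tune per target.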
The agent loop, step by step
An agent workflow is a loop, not a straight line. The model plans, picks a tool, reads the result, and decides whether it has enough to answer or needs another call. With the Web MCP wired in, that loop looks like this:
- Receive the task. The agent gets an instruction that usually contains a URL or a topic to research.
- Plan. It reasons about whether it can answer from what it knows or whether it needs live web data.
- Call the MCP tool. When it needs the page, it invokes crawl with the target URL.
- Read the result. Crawlbase returns clean, rendered content, which the model ingests as tool output.
- Decide. Enough to answer? It writes the structured response. Not yet? It loops back, crawling another URL or refining the query.
- Return. It hands back a clean, structured result in whatever shape you asked for.
The important shift is that the decision to scrape is made by the agent, not hardcoded by you. You describe the goal; the agent figures out which pages it needs and when to fetch them.
Step 1: Run the Crawlbase Web MCP server
Any MCP client connects to the server through a small config block. You point the client at the Crawlbase MCP package and pass your token through the environment. Here is a typical configuration for a desktop MCP client.
```json
{
  "mcpServers": {
    "crawlbase": {
      "command": "npx",
      "args": ["-y", "@crawlbase/mcp"],
      "env": {
        "CRAWLBASE_TOKEN": "YOUR_CRAWLBASE_JS_TOKEN"
      }
    }
  }
}
```
Use your JavaScript (JS) token here. Crawlbase issues two token types: the normal token fetches static HTML, while the JS token renders the page in a real browser first. Because most sites worth crawling are client-side rendered, the JS token is the safe default for agent work. You get both tokens from the dashboard after signing up.
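If you would rather not paste the token into a file by hand, you can generate the same config block from an environment variable. This is a small sketch, assuming your shell exports the JS token as `CRAWLBASE_JS_TOKEN`. That variable name and the `build_mcp_config` helper are illustrative choices, not something the server requires.

```python
import json
import os


def build_mcp_config(env=None, token_var="CRAWLBASE_JS_TOKEN"):
    """Return the client config dict, reading the token from the environment.

    token_var is a hypothetical name for wherever you keep the JS token.
    """
    env = os.environ if env is None else env
    token = env.get(token_var)
    if not token:
        raise RuntimeError(f"Set {token_var} before generating the config")
    return {
        "mcpServers": {
            "crawlbase": {
                "command": "npx",
                "args": ["-y", "@crawlbase/mcp"],
                "env": {"CRAWLBASE_TOKEN": token},
            }
        }
    }


def render_mcp_config(env=None):
    """Serialize the config as indented JSON, ready to write to a file."""
    return json.dumps(build_mcp_config(env), indent=2)
```

Where the client expects the file to live depends on the client, so check its documentation before writing the output anywhere.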
If you are running an agent platform like n8n instead of a desktop client, you connect to a hosted MCP endpoint over HTTP rather than spawning the process locally. The full n8n setup is covered in connecting n8n with the Crawlbase Web MCP; the rest of this guide builds the agent in code so you can see the loop directly.
Step 2: Build the agent that calls the MCP tools
Now wire a real agent to the server. The pattern below uses Python with an MCP client library and a tool-calling model. The agent connects to the Crawlbase MCP server, discovers the available tools, and hands them to the model so it can decide when to crawl.
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the Crawlbase MCP server as a subprocess and talk to it over stdio.
server = StdioServerParameters(
    command="npx",
    args=["-y", "@crawlbase/mcp"],
    env={"CRAWLBASE_TOKEN": "YOUR_CRAWLBASE_JS_TOKEN"},
)


async def connect():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            names = [t.name for t in tools.tools]
            print(names)
            # The session closes when these blocks exit, so do all agent
            # work in here and return plain data, not the session itself.
            return names


asyncio.run(connect())
```
Running this prints the tool names the server exposes, confirming the agent can see crawl and its siblings before you ask the model to use them. That discovery step is what makes the workflow portable: swap in a different MCP server later and the agent adapts to whatever tools it finds.
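Most tool-calling models expect tools in their own schema rather than as raw MCP descriptors, so a small adapter is usually needed between discovery and the model call. The sketch below maps an MCP tool's `name`, `description`, and `inputSchema` onto the common "function" tool shape. The target format is an assumption about your model API, and `fake_tool` is a stand-in with a made-up schema, not the real Crawlbase tool definition.

```python
from types import SimpleNamespace


def to_function_schema(tool) -> dict:
    """Map one MCP tool descriptor onto a function-calling tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description or "",
            # MCP tools describe their arguments with a JSON Schema object.
            "parameters": tool.inputSchema or {"type": "object", "properties": {}},
        },
    }


# Hypothetical descriptor for illustration; list the real tools to see theirs.
fake_tool = SimpleNamespace(
    name="crawl",
    description="Fetch a URL and return the rendered page",
    inputSchema={
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
)

model_tools = [to_function_schema(fake_tool)]
```

In a real run you would build `model_tools` from `(await session.list_tools()).tools` instead of the stand-in.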
Step 3: Give the model a tool-calling loop
With the session live, the loop is straightforward. You give the model the task and the tool list, let it emit a tool call, run that call against the MCP server, feed the result back, and repeat until the model stops calling tools and writes its answer.
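```python
# `session` must be an initialized ClientSession that is still open.
# `model` is a placeholder for your own tool-calling model wrapper.
async def run_agent(session, model, task):
    messages = [{"role": "user", "content": task}]
    tools = (await session.list_tools()).tools
    while True:
        reply = await model.chat(messages, tools=tools)
        if not reply.tool_calls:
            return reply.content
        # Record the assistant turn so each tool result has a call to answer.
        messages.append({
            "role": "assistant",
            "content": reply.content,
            "tool_calls": reply.tool_calls,
        })
        for call in reply.tool_calls:
            result = await session.call_tool(call.name, call.arguments)
            # call_tool returns a list of content blocks; keep the text parts.
            text = "\n".join(getattr(block, "text", "") for block in result.content)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": text,
            })
```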
That while loop is the entire agent. The model plans, calls crawl when it wants the page, reads the markdown Crawlbase returns, and either answers or crawls again. You never tell it which URL to fetch or when to fetch it; you describe the outcome and it routes itself there.
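An unbounded loop is fine for a demo, but a model that keeps calling tools will never return. One common safeguard is a cap on model turns. The sketch below shows the same loop shape with a `max_steps` limit. `FakeSession`, `FakeModel`, and `StuckModel` are offline stand-ins invented for this example so the control flow can run without a server or a model.

```python
import asyncio
from types import SimpleNamespace


async def run_agent_bounded(session, model, task, max_steps=5):
    """Run the tool-calling loop, giving up after max_steps model turns."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = await model.chat(messages)
        if not reply.tool_calls:
            return reply.content
        messages.append({"role": "assistant", "content": reply.content})
        for call in reply.tool_calls:
            result = await session.call_tool(call.name, call.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    raise RuntimeError(f"No final answer after {max_steps} model turns")


def _crawl_call():
    return SimpleNamespace(
        id="call-1", name="crawl", arguments={"url": "https://example.com"}
    )


class FakeSession:
    """Stand-in for an MCP session; returns canned page text."""

    async def call_tool(self, name, arguments):
        return f"# page content for {arguments['url']}"


class FakeModel:
    """Asks for one crawl, then answers once it has seen a tool result."""

    async def chat(self, messages):
        if any(m["role"] == "tool" for m in messages):
            return SimpleNamespace(content="done", tool_calls=[])
        return SimpleNamespace(content="", tool_calls=[_crawl_call()])


class StuckModel:
    """Never stops calling tools, which is the case the cap protects against."""

    async def chat(self, messages):
        return SimpleNamespace(content="", tool_calls=[_crawl_call()])
```

Raising on the cap, rather than returning a partial answer, keeps a runaway run visible to whatever is calling the agent.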
Step 4: Steer the agent with a system prompt
The one place ambiguity creeps in is whether the model trusts that it should use the tool at all. A short, explicit system message removes the doubt and locks in a consistent output shape.
```python
SYSTEM = """You are a web research assistant with crawl tools.
Always use the crawl tool to read a URL before answering about it.
Never guess page contents from memory. After crawling, extract only
the fields requested and return them as structured JSON."""

# Send SYSTEM as the first message (role "system"), ahead of the user task.
task = (
    "Crawl https://www.example-store.com/product/123 and return "
    "the product name, price, rating, and a one-line summary."
)
```
With that in place, a single run produces a clean object: the agent crawls the page, reads the rendered content Crawlbase returns, and emits exactly the fields you asked for. This is the same idea behind structured AI data extraction, except the model decides for itself when to reach for the page.
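Even with a firm system prompt, it is worth validating the reply before you trust it downstream. Here is a minimal sketch that pulls the JSON object out of the model's text and checks for the requested fields. The key names in `REQUIRED` are assumptions about how the model labels the four fields, so adjust them to whatever you ask for in the prompt.

```python
import json

# Hypothetical key names for the four fields requested in the task.
REQUIRED = ("name", "price", "rating", "summary")


def parse_product(reply_text: str, required=REQUIRED) -> dict:
    """Extract the JSON object from a model reply and verify its fields."""
    # Models sometimes wrap JSON in prose or a code fence, so take the
    # span from the first opening brace to the last closing brace.
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply_text[start:end + 1])
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Failing loudly here is deliberate: a reply that lacks a field is more useful as an error than as a silently incomplete row.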
A concrete workflow: competitor price watch
Tie the pieces together with a workflow you would actually run. Say you want a daily check on a handful of competitor product pages: current price, availability, and any promo banner. You give the agent the list and let it work through it.
```python
urls = [
    "https://competitor-a.com/p/widget",
    "https://competitor-b.com/p/widget",
]


async def price_watch(session, model):
    rows = []
    for url in urls:
        task = f"Crawl {url}. Return price, in_stock, promo as JSON."
        rows.append(await run_agent(session, model, task))
    return rows
```
Each iteration runs the full agent loop: the model crawls the URL through the MCP tool, Crawlbase renders it and rotates the IP, and the agent returns a structured row. The output is a tidy array you can diff against yesterday's run, push to a sheet, or alert on when a price moves.
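Diffing against yesterday's run can be as simple as a dictionary lookup. The sketch below assumes each row has already been parsed into a dict and tagged with its `url`, which the loop above does not do on its own. The `diff_prices` name and the row shape are illustrative.

```python
def diff_prices(yesterday: list, today: list) -> list:
    """Return one change record per URL whose price differs between runs."""
    previous = {row["url"]: row for row in yesterday}
    changes = []
    for row in today:
        old = previous.get(row["url"])
        # URLs that are new today have nothing to compare against, so skip them.
        if old is not None and old.get("price") != row.get("price"):
            changes.append(
                {"url": row["url"], "old": old.get("price"), "new": row.get("price")}
            )
    return changes
```

Comparing raw values works when the agent returns prices in a consistent format. If it sometimes returns "$19.99" and sometimes 19.99, normalize before diffing.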
The same skeleton flexes to other jobs without rewrites. Swap the task string and you have a news monitor, a research assistant that gathers notes across several sources, or a lead-enrichment step over public company pages. Because the agent talks to Crawlbase through one stable tool, pointing it at a new site needs no new API wiring. For more on where this fits, the AI proxy use cases roundup walks through adjacent patterns.
Tuning crawls for tough pages
Most pages crawl cleanly with defaults, but heavy single-page apps sometimes need a hint. The MCP tools accept the same waiting options the Crawling API uses, so you can pass them in the tool arguments when a page renders late. Two matter most: an ajax-wait flag that holds for asynchronous content, and a page-wait value in milliseconds for a fixed pause after load.
```json
{
  "url": "https://www.example-store.com/product/123",
  "ajax_wait": true,
  "page_wait": 5000
}
```
If results come back thin, raise page_wait before reaching for anything else. You can let the agent set these itself by describing the page in the system prompt ("for slow single-page apps, wait for ajax content"), or hardcode them in a wrapper when you know the target is heavy. Either way the rendering, rotation, and retry behavior stays on the Crawlbase side; the agent just reads the result.
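The "raise page_wait first" advice can be wrapped in a small retry helper. This sketch tries the default crawl, then longer waits, and stops once the page text passes a length threshold. `crawl_until_full`, the wait ladder, and the 500-character threshold are assumptions, and `fake_call_tool` is an offline stand-in for a wrapper around `session.call_tool`.

```python
import asyncio


def crawl_args(url, ajax_wait=False, page_wait=None):
    """Build tool arguments, adding wait options only when requested."""
    args = {"url": url}
    if ajax_wait:
        args["ajax_wait"] = True
    if page_wait is not None:
        args["page_wait"] = page_wait  # milliseconds
    return args


async def crawl_until_full(call_tool, url, min_chars=500, waits=(None, 2000, 5000)):
    """Retry with longer waits until the page text reaches min_chars.

    call_tool is any coroutine function taking (tool_name, arguments) and
    returning page text; in a real agent it would wrap session.call_tool.
    """
    text = ""
    for wait in waits:
        args = crawl_args(url, ajax_wait=wait is not None, page_wait=wait)
        text = await call_tool("crawl", args)
        if len(text) >= min_chars:
            break
    return text


async def fake_call_tool(name, args):
    """Stand-in: pretends the page only fills in after a 5000 ms wait."""
    return "x" * 800 if args.get("page_wait", 0) >= 5000 else "loading"
```

Starting with no wait keeps the common case fast, and only slow pages pay for the longer pauses.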
If a site is so hostile that even rendered crawls struggle, the Smart AI Proxy gives you a single rotating endpoint to route requests through. And when you would rather not have the model parse the page at all, the Crawling API returns pre-parsed JSON for popular sites. Both share the same infrastructure the MCP tools sit on.
Keeping the workflow reliable
A few habits keep an agent workflow healthy in production. Add a check after each run so a failed crawl is visible instead of silently producing an empty row. Pace your requests when looping many URLs rather than firing them all at once. Persist the structured output somewhere, a database or even a spreadsheet, so you can look back and diff over time. And tune the prompt per target when needed: one generic instruction across very different sites usually gives weaker results than a few site-specific lines.
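Two of those habits, pacing and surfacing failures, fit in one small wrapper. The sketch below runs URLs one at a time with a pause between them and records any crawl that raised or came back empty. `paced_run` and the one-second default are assumptions, and `fake_fetch_row` is a stand-in for a call into your agent.

```python
import asyncio


async def paced_run(urls, fetch_row, delay_s=1.0):
    """Fetch rows sequentially with a pause, collecting failures separately."""
    rows, failures = [], []
    for index, url in enumerate(urls):
        if index:
            await asyncio.sleep(delay_s)  # pause between requests, not before the first
        try:
            row = await fetch_row(url)
        except Exception as exc:
            failures.append((url, repr(exc)))
            continue
        if not row:
            failures.append((url, "empty result"))
        else:
            rows.append(row)
    return rows, failures


async def fake_fetch_row(url):
    """Stand-in for the agent: fails or returns nothing for certain URLs."""
    if "bad" in url:
        raise TimeoutError("crawl timed out")
    return {} if "empty" in url else {"url": url, "price": 10}
```

Returning failures alongside rows lets the caller decide whether to retry, alert, or skip, instead of discovering gaps later in the stored data.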
When an agent reports that "no tools were used," it almost always means the model was not confident it should crawl. Tightening the system message and making sure the URL is clearly in the task resolves it. For connection problems, check that the MCP server is running, confirm the token is set in the environment, and list the tools first to prove the handshake works before debugging the model.
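Before debugging the model, a quick preflight can rule out the two most basic connection problems. This sketch checks that `npx` is on the PATH and that a token variable is set. The `preflight` helper is hypothetical, and `CRAWLBASE_TOKEN` is used only because it is the variable name in the config shown earlier.

```python
import os
import shutil


def preflight(env=None, token_var="CRAWLBASE_TOKEN", which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if which("npx") is None:
        problems.append("npx not found on PATH; the server cannot be spawned")
    if not env.get(token_var):
        problems.append(f"{token_var} is not set")
    return problems
```

If this comes back clean and the tool listing still fails, the problem is more likely in the server launch or the network than in your agent code.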
Key takeaways
- The MCP server is the agent's web access. It publishes crawl tools any MCP-aware client can discover and call, backed by the Crawling API.
- The agent owns the decision to scrape. You describe the goal; the model plans, calls the tool when it needs a page, reads the result, and loops or answers.
- Use the JS token. It renders client-side pages in a real browser, which is what most modern sites require to return real content.
- The loop is portable. Discover tools at runtime and the same agent adapts to new sites with no new API wiring.
- Tune with wait options. Pass ajax_wait and page_wait for heavy single-page apps; raise page_wait first when results come back thin.
- Add guardrails. Check for failed crawls, pace requests, persist output, and tune prompts per target.
Frequently Asked Questions (FAQs)
What is the Crawlbase Web MCP and how does an agent use it?
The Crawlbase Web MCP is a Model Context Protocol server that publishes web-access tools, chiefly a crawl tool, to any MCP-aware AI agent. The agent connects to the server, discovers the tools, and calls them when it needs live page content. Each call is backed by the Crawling API, so the agent receives rendered, clean content without writing any scraping code.
Do I need the normal token or the JS token for agent workflows?
Use the JS token for agent work. The normal token fetches static HTML, which on modern client-side sites is an empty shell. The JS token renders the page in a real browser before returning it, so the content the agent reads actually contains the data. You get both tokens from the Crawlbase dashboard after signing up.
Which AI agents and platforms work with the Crawlbase Web MCP?
Any MCP-compatible client works, including Claude Desktop, Cursor, Windsurf, and agent platforms like n8n, as well as custom agents you build with an MCP client library. As long as the client can connect to the server and call tools, it can use the Crawlbase crawl tools.
Can the agent scrape JavaScript-heavy sites without extra setup?
Yes. The crawl tool renders JavaScript automatically through the Crawling API, so the agent receives fully rendered content without you running Puppeteer or Selenium. For pages that load late, pass ajax_wait and a larger page_wait in the tool arguments and the API holds until the content appears.
How does this avoid getting blocked?
The MCP tools route through the Crawling API, which rotates residential IPs, manages browser fingerprinting, and handles anti-bot challenges and retries server-side. The agent never sees that machinery; it just gets clean content back. Keep your request rate reasonable when looping many URLs and the workflow stays healthy.
How is this different from giving the agent a plain HTTP request tool?
A raw HTTP tool returns whatever the server sends, which on most modern sites is an unrendered shell or a block. The Crawlbase MCP tool renders the page behind a trusted IP and returns finished content on the first call, so the agent spends its turns reasoning about real data instead of retrying failed fetches.

