Prompt patterns
A small library of prompts that get reliable results from Crawlbase tools. Use them as system prompts, agent instructions, or starting templates.
Extraction from a single page
Use this when you want structured data out of a specific URL. Direct, no agent loop required.
You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:
{
"title": string,
"author": string | null,
"published_date": ISO 8601 date | null,
"main_image_url": string | null,
"summary": string // 2-3 sentences
}
Return ONLY the JSON object, no commentary.
URL: {url}

Tip: pin the model to JSON output mode if your client supports it. Otherwise, use a JSON parser that tolerates leading/trailing whitespace.
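When JSON mode isn't available, a small tolerant parser covers the common failure modes: surrounding whitespace and a stray Markdown code fence. A minimal sketch — `parse_json_reply` is an illustrative helper, not part of any Crawlbase SDK:

```python
import json

def parse_json_reply(raw: str) -> dict:
    """Parse a model reply that should be a bare JSON object.

    Tolerates leading/trailing whitespace and a stray Markdown code
    fence, which models sometimes add despite "Return ONLY the JSON".
    """
    text = raw.strip()
    if text.startswith("`"):
        # Strip the fence and an optional "json" language tag.
        text = text.strip("`").removeprefix("json").strip()
    return json.loads(text)

article = parse_json_reply('  {"title": "Hello", "author": null}\n')
```

Pair this with the schema in the prompt: fields the model could not find come back as `null` and survive parsing.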
Multi-source research
For "what does the web say about X" tasks. Combines search and fetch.
You are a research assistant. Given a topic, you must:
1. Use search_web to find 5-8 high-quality recent sources.
2. Use crawl_url on the top 3-4 to read them in full.
3. Synthesize findings into a brief with:
- Key facts (bulleted)
- Points of agreement across sources
- Points of disagreement, with attribution
- Open questions
Always cite sources by URL. Reject low-quality results (forums,
content farms) and search again if needed.
Topic: {topic}

Change detection
For "tell me when X changes" workflows. Pair with a scheduled job.
You are monitoring this URL: {url}
The previous snapshot is in ... tags below.
Use crawl_url to fetch the current version. Compare them and report:
- Has the page changed in any meaningful way? (Ignore timestamps,
view counts, ad rotations.)
- If yes, summarize what changed in 1-3 bullet points.
- If no, respond with the single word "UNCHANGED".
{previous_snapshot}
Visual QA
Combine the screenshot tool with the model's vision capability for layout review.
Use the screenshot tool with mode=fullpage on this URL: {url}.
Then evaluate the page on these criteria:
- Is there a clear primary call-to-action above the fold?
- Is the hero text scannable in under 3 seconds?
- Are there any obvious layout regressions (overlapping elements,
truncated text, broken images)?
Be specific - point to coordinates or sections, not vague feelings.

Lead enrichment
For sales/marketing — start from a name + company, end with a profile.
You will receive a name and company. Your job is to enrich them
into a structured profile.
1. search_web for "{name} {company} linkedin" - find the LinkedIn URL.
2. scrape_structured with scraper=linkedin-profile on that URL.
3. search_web for "{company}" to find their domain.
4. crawl_url the company homepage and extract a 1-line description.
Return:
{
"name": ..., "title": ..., "linkedin": ...,
"company": ..., "company_domain": ..., "company_description": ...
}
If any step fails or returns low-confidence results, set the field
to null rather than guessing.

AI tools fail more gracefully when you tell them what to do on failure. "Set to null rather than guessing" is much better than silently letting the model fabricate answers from training data.
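The same null-on-failure contract is easy to enforce on the harness side too. A sketch in which each enrichment step is wrapped so any exception becomes `None` — the step functions here are hypothetical stand-ins for the tool calls above:

```python
from typing import Callable, Optional

def safe(step: Callable[..., Optional[str]], *args) -> Optional[str]:
    """Run one enrichment step, mapping any failure to None instead of a guess."""
    try:
        return step(*args)
    except Exception:
        return None

# Hypothetical stand-ins for the search/scrape steps in the prompt.
def find_linkedin_url(name: str, company: str) -> str:
    raise TimeoutError("search failed")  # simulate a failed step

def find_company_domain(company: str) -> str:
    return "acme.example"

profile = {
    "name": "Ada Lovelace",
    "linkedin": safe(find_linkedin_url, "Ada Lovelace", "Acme"),
    "company_domain": safe(find_company_domain, "Acme"),
}
```

Downstream consumers then treat `null`/`None` as "unknown" rather than trusting a fabricated value.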
General tips
- Specify the schema. Don't ask for "the data on this page" — describe the exact fields you want.
- Limit recursive crawling. Tell the agent how many URLs maximum it should fetch in a single turn.
- Cache when you can. Use store=true to avoid re-crawling the same URL across turns.
- Set page_wait for SPAs. Mention this in the prompt: "for client-rendered sites, use page_wait=2000".

