Google AI Mode has changed what a search result looks like. Instead of ten blue links, the AI answer surface returns a written response to your query, with a set of cited sources underneath it and a wider band of related links off to the side. For anyone tracking what Google says about a topic, that is a new and valuable dataset: not just who ranks, but the actual answer tied to a query and the pages Google chose to back it up.

This guide shows you how to scrape Google AI Mode the reliable way. You will build a small, runnable Python script that fetches the public AI Mode result through the Crawling API, which renders the JavaScript-heavy page behind a trusted IP, then extracts the AI answer, its citations, and the related links into clean structured JSON. Rendering and IP rotation are handled for you, so you spend your time on the data, not on keeping a headless browser and a proxy pool alive.

What Google AI Mode is and why scrape it

AI Mode is the generative answer layer inside Google Search. You reach it by adding the udm=50 parameter to a normal search URL, which tells Google to return its AI-written response instead of the classic results page. The response is composed of three parts that matter for scraping: the answer text itself, the citations Google attaches to that answer, and a broader set of reference links related to the query.

That structure is what makes it worth collecting. You can track how Google's answer to a query shifts over time, see which domains it cites and how often, and feed that into SEO research or content workflows. It is the same kind of problem as building any search engine data tool, except the payload is an answer plus its sources rather than a ranked list.

What udm=50 does

The udm parameter selects a Google search vertical. udm=50 is the one that triggers AI Mode, the same way other values switch to images or news. Append it to a standard google.com/search URL along with your query and region, and Google returns the AI answer surface for that query.
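Because udm is an ordinary query parameter, you can also turn an existing classic search URL into an AI Mode URL by setting udm=50 and leaving every other parameter alone. A standard-library sketch; to_ai_mode_url is a hypothetical helper name, and it replaces any udm value already present.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def to_ai_mode_url(search_url: str) -> str:
    """Return a copy of a google.com/search URL with udm=50 set."""
    parts = urlparse(search_url)
    if not parts.path.startswith("/search"):
        raise ValueError("expected a Google /search URL")
    # parse_qs returns {name: [values]}; keep every existing parameter as-is.
    params = parse_qs(parts.query, keep_blank_values=True)
    params["udm"] = ["50"]  # select the AI Mode vertical, replacing any prior udm
    return urlunparse(parts._replace(query=urlencode(params, doseq=True)))

print(to_ai_mode_url("https://www.google.com/search?q=web+scraping&hl=en"))
# https://www.google.com/search?q=web+scraping&hl=en&udm=50
```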

What you will build

A short Python script that takes a search query, builds the AI Mode URL, sends it through the Crawling API with JavaScript rendering on, and parses the rendered result into a JSON object with three fields: the answer text, a list of citations, and a list of related links. Every snippet runs once you drop in your token (the parsing selectors are a template you confirm against a live page), and the same code works whether you point it at one query or loop it over hundreds.

Why a plain fetch fails here

If you request an AI Mode URL with a bare HTTP client, you get a response with status 200 and none of the answer in the body. Two things work against you. First, AI Mode is rendered in the browser: the answer and its citations stream in after the page's JavaScript runs, so the initial HTML is an empty shell. Second, Google challenges automated traffic fast. Datacenter IPs and request patterns that do not look like a real visitor get a CAPTCHA or a block before they ever see the rendered answer.

So a working AI Mode scraper needs two things in one request: a browser that actually renders the page, and an IP that Google reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the job. The Crawling API folds both into a single call: send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it hands back the finished HTML for you to parse.

Why a JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. AI Mode is rendered client-side, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would, with no answer text to extract.
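Whichever fetch path you use, the status code alone will not tell you whether you got the answer, since the empty shell also comes back as 200. One rough check is to measure how much visible text the HTML holds once scripts and styles are stripped. This is a heuristic sketch, not a Google-specific test: the helper names and the 500-character threshold are assumptions you would tune against real AI Mode pages.

```python
from bs4 import BeautifulSoup

def visible_text_length(html: str) -> int:
    """Count characters of human-visible text, ignoring script, style, and noscript."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # drop code and styling so only rendered text is counted
    return len(soup.get_text(" ", strip=True))

def looks_like_empty_shell(html: str, min_chars: int = 500) -> bool:
    # Assumed threshold: a rendered AI answer should clear it, a bare shell should not.
    return visible_text_length(html) < min_chars

shell = "<html><head><script>render()</script></head><body><div id='app'></div></body></html>"
print(looks_like_empty_shell(shell))  # True
```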

Step 1: Build the AI Mode URL

The entry point of the pipeline is a function that turns a query into a valid AI Mode URL. You set udm=50 to select AI Mode, pass the query in q, and add gl and hl to control the country and language so results are consistent run to run.

python
from urllib.parse import urlencode, quote_plus

def build_ai_mode_url(query, gl="us", hl="en"):
    if not query or not query.strip():
        raise ValueError("query must be non-empty")
    params = {
        "udm": "50",
        "q": query.strip(),
        "gl": gl,
        "hl": hl,
    }
    query_string = urlencode(params, quote_via=quote_plus)
    return f"https://www.google.com/search?{query_string}"

print(build_ai_mode_url("best ai tools for developers"))

Pass in a query and you get a consistent AI Mode URL back. From here the rest of the workflow is one straight line: send the URL through the Crawling API, then normalize the rendered result into structured fields your system can use.

Step 2: Fetch the rendered page with the Crawling API

With the URL built, fetch the rendered page. You pass two options that matter for an AI surface: ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the streamed answer and citations have time to appear before the page is captured. Five seconds is a reasonable starting point; raise it if the answer comes back thin.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

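def fetch_html(target_url):
    # ajax_wait waits for async requests; page_wait (ms) gives the streamed answer time to render.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(target_url, options)
    # Fail loudly instead of parsing an error page as if it were an empty answer.
    if response["status_code"] != 200:
        raise RuntimeError(f"Crawling API returned status {response['status_code']}")
    return response["body"].decode("utf-8")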

url = build_ai_mode_url("best ai tools for developers")
html = fetch_html(url)
print(html[:500])

Run this and you should see real markup with the AI answer block in it, not the empty shell a plain fetch returns. That confirms rendering is working before you write a single selector. You are no longer dealing with browser state or anti-bot challenges; you have finished HTML ready for the parser.
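The advice to raise page_wait when the answer comes back thin can be automated: retry with a longer wait only when the result looks incomplete. A sketch under assumptions: the wait schedule (5, 8, then 12 seconds) is arbitrary, the helper names are hypothetical, and the completeness check is yours to supply. Every retry is a separate API request, so keep the schedule short.

```python
def crawlbase_fetch(api, target_url, page_wait_ms):
    """Fetch one rendered page with a given page_wait, using the client from Step 2."""
    options = {"ajax_wait": "true", "page_wait": page_wait_ms}
    response = api.get(target_url, options)
    return response["body"].decode("utf-8")

def fetch_with_rising_wait(target_url, fetch, is_complete, waits_ms=(5000, 8000, 12000)):
    """Retry with longer page_wait values until is_complete(html) is True.

    fetch(url, page_wait_ms) returns HTML; is_complete(html) decides if it is usable.
    """
    html = ""
    for wait_ms in waits_ms:
        html = fetch(target_url, wait_ms)
        if is_complete(html):
            return html
    return html  # best effort: the last attempt, even if it is still thin

# Usage with the Step 2 client (the completeness check is a placeholder):
# html = fetch_with_rising_wait(
#     url,
#     lambda u, w: crawlbase_fetch(api, u, w),
#     lambda h: "ai-response" in h,
# )
```

Passing fetch and is_complete in as functions keeps the retry logic testable without network calls, and lets you swap in a stricter completeness check once you know what a full answer looks like.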


Step 3: Extract the answer, citations, and links

With the rendered HTML in hand, load it into a parser and pull the three fields that matter. Use BeautifulSoup to read the answer text, then walk the citation and link blocks to collect their URLs and titles. AI Mode markup is not a public contract, so treat the selectors below as a starting template: inspect a live AI Mode page in your browser's dev tools and confirm them against the current layout.

python
from bs4 import BeautifulSoup

def extract_ai_mode(html):
    soup = BeautifulSoup(html, "html.parser")

    answer_block = soup.select_one('div[data-rl="ai-mode"], div.ai-response')
    response_text = answer_block.get_text(" ", strip=True) if answer_block else ""

    citations = []
    for cite in soup.select('a.ai-citation, div.citation a[href]'):
        href = cite.get("href", "")
        if href.startswith("http"):
            citations.append({
                "url": href,
                "title": cite.get_text(strip=True),
            })

    links = []
    for link in soup.select('div.related-links a[href^="http"]'):
        links.append({
            "url": link["href"],
            "title": link.get_text(strip=True),
        })

    return {
        "response_text": response_text,
        "citations": citations,
        "links": links,
    }

Each field maps directly to how AI Mode presents an answer. response_text is the generated answer you can store, compare, or display. citations are the sources Google attached to back that answer, each with its URL and title. links are the broader set of related results around the query. Keeping the schema this small is deliberate: three stable fields cover most dashboards and pipelines without locking you to markup that shifts.
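In practice the citation and link lists can repeat a URL, and some hrefs may arrive relative or redirect-wrapped. The sketch below cleans a list after extraction. It assumes Google may wrap outbound links as /url?q=<target>, a redirect format Google has used on classic result pages; whether AI Mode uses it on the page you scrape is something to confirm in dev tools. normalize_href and dedupe_links are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlparse

GOOGLE_BASE = "https://www.google.com"

def normalize_href(href: str) -> Optional[str]:
    """Make an href absolute and unwrap a Google /url?q=... redirect (assumed format)."""
    if not href:
        return None
    absolute = urljoin(GOOGLE_BASE, href)
    parts = urlparse(absolute)
    if parts.netloc.endswith("google.com") and parts.path == "/url":
        target = parse_qs(parts.query).get("q")
        if target:
            absolute = target[0]
    # Drop javascript:, mailto: and other non-web links.
    return absolute if absolute.startswith(("http://", "https://")) else None

def dedupe_links(items):
    """Normalize each item's URL and keep only the first occurrence, in order."""
    seen, cleaned = set(), []
    for item in items:
        url = normalize_href(item.get("url", ""))
        if url and url not in seen:
            seen.add(url)
            cleaned.append({**item, "url": url})
    return cleaned
```

To use it, you would relax the startswith("http") filter in extract_ai_mode so relative links reach the cleaner, then run data["citations"] = dedupe_links(data["citations"]) and the same for links.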

Selectors drift

Google does not publish stable class names for AI Mode, and the markup changes without notice. When extraction starts returning empty strings, re-inspect a live AI Mode page and update the selectors. This is normal maintenance for any production scraper, not a sign that the approach is wrong. Keeping a short raw-HTML sample alongside your output makes it easy to tell whether the issue is your parser or an upstream change.
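One way to absorb drift with less churn is to keep an ordered list of candidate selectors and record which one matched. A single comma-joined selector returns the first match in document order; an ordered list lets you rank newer selectors ahead of older ones and log when a fallback kicks in. The selector strings below are the same placeholders used earlier, not confirmed Google markup, and first_match is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Candidate answer selectors, highest priority first (placeholders: confirm on a live page).
ANSWER_SELECTORS = ['div[data-rl="ai-mode"]', "div.ai-response"]

def first_match(soup, selectors):
    """Return (selector, element) for the first selector that matches, else (None, None)."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return selector, element
    return None, None

soup = BeautifulSoup('<div class="ai-response">Hi there</div>', "html.parser")
selector, block = first_match(soup, ANSWER_SELECTORS)
print(selector)  # div.ai-response
```

Logging the matched selector with each result gives you an early warning: when runs start falling through to an older selector, or to none at all, the markup has moved.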

Step 4: Wire it together and save the result

Now combine the three steps into one runnable script. It builds the URL, fetches the rendered page, extracts the fields, and writes the structured result to a JSON file you can inspect or load downstream.

python
import json
from urllib.parse import urlencode, quote_plus
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def build_ai_mode_url(query, gl="us", hl="en"):
    params = {"udm": "50", "q": query.strip(), "gl": gl, "hl": hl}
    return f"https://www.google.com/search?{urlencode(params, quote_via=quote_plus)}"

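def fetch_html(target_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(target_url, options)
    # Stop on a failed crawl rather than parsing an error page.
    if response["status_code"] != 200:
        raise RuntimeError(f"Crawling API returned status {response['status_code']}")
    return response["body"].decode("utf-8")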

def extract_ai_mode(html):
    soup = BeautifulSoup(html, "html.parser")
    answer_block = soup.select_one('div[data-rl="ai-mode"], div.ai-response')
    response_text = answer_block.get_text(" ", strip=True) if answer_block else ""
    # Require an absolute href on both selectors so c["href"] cannot raise KeyError.
    citations = [
        {"url": c["href"], "title": c.get_text(strip=True)}
        for c in soup.select('a.ai-citation[href^="http"], div.citation a[href^="http"]')
    ]
    links = [
        {"url": a["href"], "title": a.get_text(strip=True)}
        for a in soup.select('div.related-links a[href^="http"]')
    ]
    return {"response_text": response_text, "citations": citations, "links": links}

def scrape_ai_mode(query, gl="us", hl="en"):
    url = build_ai_mode_url(query, gl, hl)
    html = fetch_html(url)
    data = extract_ai_mode(html)
    data["query"] = query
    return data

if __name__ == "__main__":
    result = scrape_ai_mode("best ai tools for developers")
    with open("ai_mode.json", "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)
    print(json.dumps(result, indent=2))

Save the script as scraper.py and run python scraper.py: the structured answer is written to ai_mode.json and echoed to the console. The scrape_ai_mode function is also the one place to import if you want to drop this into your own app: call it with a query and you get the same structured dict back, no extra parsing needed.

What the output looks like

The result is a single object with the answer text and two lists. A trimmed sample:

json
{
  "query": "best ai tools for developers",
  "response_text": "Popular AI tools for developers include code assistants, ...",
  "citations": [
    { "url": "https://example.com/ai-dev-tools", "title": "10 AI Tools for Developers" },
    { "url": "https://example.org/coding-assistants", "title": "Best Coding Assistants" }
  ],
  "links": [
    { "url": "https://example.net/dev-productivity", "title": "Developer Productivity Guide" }
  ]
}

That shape is enough to track an answer over time, count which domains Google cites most, or compare answers across regions by varying gl and hl. Store the answer text alongside a timestamp and you have a simple history of how Google's response to a query changes.
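Both ideas fit in a few lines of standard-library Python. A sketch, assuming the result dicts returned by scrape_ai_mode; cited_domain_counts and append_history are hypothetical helper names, and JSON Lines is just one convenient history format.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlparse

def cited_domain_counts(results):
    """Count how often each domain appears in citations across scraped results."""
    counts = Counter()
    for result in results:
        for cite in result.get("citations", []):
            host = urlparse(cite.get("url", "")).netloc.lower()
            if host.startswith("www."):
                host = host[len("www."):]  # count www.example.com and example.com together
            if host:
                counts[host] += 1
    return counts

def append_history(path, result):
    """Append one timestamped result as a JSON Lines row, building a history over time."""
    row = {"scraped_at": datetime.now(timezone.utc).isoformat(), **result}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Usage after each scrape_ai_mode(query) call:
# append_history("ai_mode_history.jsonl", result)
# print(cited_domain_counts(results).most_common(10))
```

Appending one line per run means a crash mid-batch never corrupts earlier history, and the file loads straight into most analysis tools.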

Scaling to many queries

One query is a demo; a real job runs over a list. The shape stays the same: loop the queries, scrape each one, and collect the rows. The thing to get right at volume is staying unblocked, since Google watches for scraper-shaped traffic. The Crawling API rotates IPs for you on every call, so the loop below does not have to manage proxies.

python
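import json
import time

from scraper import scrape_ai_mode  # the Step 4 script, saved as scraper.py

queries = [
    "best ai tools for developers",
    "how does web scraping work",
    "python vs javascript for automation",
]

results = []
for q in queries:
    try:
        results.append(scrape_ai_mode(q))
    except Exception as err:
        # One failed query should not sink the batch; log it and move on.
        print(f"Skipped {q}: {err}")
    time.sleep(2)  # assumed modest pause between requests to keep the rate low

with open("ai_mode_batch.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)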

If you would rather route your own traffic through a rotating pool instead of using the Crawling API end to end, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint. For the broader playbook on keeping any Google scrape healthy, see how to rotate proxies for scraping Google search results and how to scrape websites without getting blocked.

Common issues and fixes

Scraping a Google surface is not a set-and-forget job. Payloads change and your parser needs room to flex. The issues you will hit most are straightforward:

  • Empty answer text. If response_text comes back blank, the page likely had not finished rendering. Raise page_wait, confirm you are using the JS token, and re-inspect the answer-block selector against a live page.
  • Auth or credit errors. A request that fails before rendering usually means a bad token or an empty balance. Check that your JS token is correct and your account has credits, and check the status code the API returns before parsing, so a failed or challenged request does not masquerade as an empty answer.
  • Citations or links missing. Google moves these blocks around. When they stop matching, update the citation and link selectors, and keep a raw-HTML sample so you can see exactly what shifted.
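The three checks above can be folded into a small triage helper you run after each scrape. It is a sketch that only restates the rules of thumb from this list; diagnose is a hypothetical name, and status_code is whatever your client returned (response["status_code"] in the Crawlbase Python client used earlier).

```python
def diagnose(status_code, result):
    """Map a fetch status and an extracted result to the likely issue, as hint strings."""
    hints = []
    if status_code != 200:
        # Nothing was rendered, so there is no point inspecting the parsed fields.
        hints.append("request failed: check the JS token and account credits")
        return hints
    if not result.get("response_text"):
        hints.append("empty answer: raise page_wait, confirm the JS token, re-check the answer selector")
    if not result.get("citations") and not result.get("links"):
        hints.append("no citations or links: update those selectors and compare a raw-HTML sample")
    return hints

print(diagnose(200, {"response_text": "", "citations": [], "links": []}))
```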

When results suddenly drop or look different, compare a recent raw-HTML sample against an older one. That usually tells you straight away whether the change is in your parsing logic or upstream in the rendered page.
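Python's standard difflib module is enough for that comparison. The sketch splits each sample after every closing angle bracket so the diff lines up tag by tag, which makes a renamed class or a moved block easy to spot. compare_samples is a hypothetical helper, and how you store the samples is up to you.

```python
import difflib
import re

def compare_samples(old_html: str, new_html: str, context: int = 1) -> str:
    """Return a unified diff of two raw-HTML samples, one tag per line."""
    def split_tags(html):
        # Split after every '>' (zero-width lookbehind) and drop blank pieces.
        return [part for part in re.split(r"(?<=>)", html) if part.strip()]

    diff = difflib.unified_diff(
        split_tags(old_html),
        split_tags(new_html),
        fromfile="older_sample.html",
        tofile="recent_sample.html",
        n=context,
        lineterm="",
    )
    return "\n".join(diff)

old = '<div class="ai-response">Answer</div>'
new = '<div class="ai-answer-v2">Answer</div>'
print(compare_samples(old, new))
```

An empty diff with empty extraction points at your parser or the wait settings; a diff full of changed class names points upstream.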

Is scraping Google AI Mode legal?

Whether scraping Google AI Mode is allowed depends on Google's terms of service, your jurisdiction, and what you do with the data. Google's terms restrict automated access to its services, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Google's Terms of Service and its robots.txt before you point this at production volume, and decide accordingly.

A few rules are worth holding to. Collect only public AI Mode results: the answer and the sources anyone can see without logging in. Keep your request rate low enough that you are not straining Google's systems, and respect the rules its robots.txt sets out. Never collect personal or private data, and do not try to reach anything behind a login. If you plan to reuse the data commercially, get permission or an official data agreement rather than assuming silence is consent.
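Python's standard library can check a URL against robots.txt rules with urllib.robotparser. The sketch parses rules passed in as text so it runs offline; the sample rules are illustrative and are not a copy of Google's file, so read the live robots.txt before relying on the answer. allowed_by_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules supplied as text.

    For a live check you would instead call
    rp.set_url("https://www.google.com/robots.txt") followed by rp.read().
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

sample_rules = "User-agent: *\nDisallow: /search\n"  # illustrative rules only
print(allowed_by_robots(sample_rules, "https://www.google.com/search?q=x&udm=50"))  # False
```

RobotFileParser applies rules in file order and the first matching line wins, which can differ from how Google's own crawler resolves conflicting Allow and Disallow lines, so treat its answer as a first check rather than the final word.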

This guide is deliberately scoped to public AI Mode results because that is the line that keeps the work defensible. It does not cover login-walled or private data, account or profile information, or any attempt to bypass authentication. If your project needs more than the public answer surface, the right move is an official Google data agreement, not a cleverer scraper.

Recap

Key takeaways

  • AI Mode is a new dataset. The udm=50 URL returns an AI answer plus its citations and related links, not a ranked list, so you collect answers tied to queries.
  • A plain fetch returns an empty shell. AI Mode is rendered client-side and Google challenges bots, so you must render the page behind a trusted IP before you parse it.
  • One call handles both. The Crawling API with a JS token renders the page and rotates IPs server-side; ajax_wait and page_wait control how long it waits for the answer.
  • Three fields are enough. Normalize the result into response_text, citations, and links, and keep a raw-HTML sample so you can tell parser issues from upstream changes.
  • Stay on public data. Respect Google's ToS and robots.txt; no login-walled or personal data, no auth bypass.

Frequently Asked Questions (FAQs)

What is udm=50 in a Google search URL?

udm=50 is the search parameter that triggers Google AI Mode. The udm value selects a search vertical, and 50 is the one that returns the AI answer surface instead of the classic list of links. Append it to a standard google.com/search URL along with your query, country, and language, and Google returns the AI Mode result for that query.

Why does a plain fetch return no answer from Google AI Mode?

Because AI Mode is rendered client-side. The answer and its citations stream in after the page's JavaScript runs, so a raw HTTP request returns status 200 with an empty shell. Google also challenges automated traffic quickly. To get the real answer you have to render the page behind a trusted IP, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token?

The JS token. The normal token fetches static HTML, which for AI Mode is the same empty shell a plain fetch returns. The JS token renders the page in a real browser first, so the answer, citations, and links are present when your parser reads them.

What data can I extract from a Google AI Mode result?

Three stable fields cover most needs: the generated answer text, the citations Google attaches to that answer (each with a URL and title), and the broader set of related links around the query. Keeping the schema small makes the pipeline resilient, since you are not locked to markup that Google changes without notice.

My selectors return empty strings. What changed?

Almost certainly Google's markup. AI Mode has no published, stable class names, so selectors that worked last month can break. Re-inspect a live AI Mode page in your browser's dev tools and update the answer, citation, and link selectors. Keeping a short raw-HTML sample alongside your output makes it easy to see what moved.

Is it legal to scrape Google AI Mode?

It depends on Google's terms of service, your jurisdiction, and your purpose, and Google's terms restrict automated access. Keep strictly to public AI Mode results, respect robots.txt and rate expectations, and never touch login-walled content, personal data, or authentication. For commercial reuse, get permission or an official data agreement rather than relying on a scraper.

Start Building

Crawl any site at scale, without fighting infrastructure.

Crawlbase handles proxies, fingerprints, and CAPTCHAs so your team ships data pipelines instead of maintaining crawl plumbing. 1,000 requests free, no card required.

Self-serve · No sales call required · Enterprise crawl volumes available