Google is the front door to most of the web's public information, and the page it returns for a query is far more than a list of blue links. A single search results page can carry organic listings, paid ads, a "People Also Ask" box, a knowledge panel, a local pack with a map, and a row of related searches, each one a distinct block of structured data. For anyone doing keyword research, rank tracking, competitor analysis, or market intelligence, that page is one of the richest public datasets on the internet.

This guide is the broad pillar on how to scrape Google search pages: what a SERP is actually made of, why a plain HTTP request gets blocked, the realistic methods for collecting the data, and a runnable example that parses the feature blocks and paginates. If you want a tight, copy-paste Python walkthrough instead, our focused tutorial on how to scrape Google search results with Python is the companion piece. Everything here stays scoped to public SERP data that anyone can see without an account, and the legality section near the end is not boilerplate, so read it before you point anything at real volume.

What you will build

By the end you will have a small Python script that sends a public Google search URL to a SERP-capable API, receives parsed JSON, and reads each feature block into clean records. We use the query "data science" as the running example and pull these fields from the response:

  • Search results. The organic listings, each with a title, destination URL, displayed snippet, and position on the page.
  • Ads. The sponsored listings shown above or below the organic results when the query is commercial.
  • People Also Ask. The expandable related-question box, each entry with its question, a short answer, and a source link.
  • Related searches. The row of suggested follow-up queries Google shows at the bottom of the page.
  • Local pack. The map-backed block of nearby places, when the query has local intent.
  • Result count. The reported total number of results, which drives pagination.

The anatomy of a Google search results page

To scrape a SERP well you first need a mental model of how it is assembled. Google does not return one uniform list; it composes the page from several distinct blocks, and which blocks appear depends on the query's intent. The same parser has to recognize each one. Here are the parts that matter.

  • Organic results. The core list of ranked web pages. Each carries a title, the destination URL, a snippet, and sometimes a date or sitelinks. This is the block most SEO work cares about, because position here is what "ranking" means.
  • Ads. Paid listings marked "Sponsored," shown at the top and bottom of the page for commercial queries. They look similar to organic results but are bought, so competitive-intelligence work treats them as a separate stream.
  • People Also Ask (PAA). An expandable list of related questions. Each one opens to reveal a short answer drawn from a source page, plus a link to that source. PAA is a useful map of the questions searchers attach to a topic.
  • Knowledge panel. The information box on the right (people, companies, places, things) built from Google's Knowledge Graph. It surfaces key facts, images, and related entities without the user clicking through.
  • Local pack. A map plus a short list of nearby businesses, shown when the query has local intent ("coffee shops near me"). It carries names, ratings, and addresses, and differs from full Google Maps results.
  • Related searches. A row of suggested queries at the foot of the page, useful for keyword expansion.
  • Pagination. The controls that move you deeper into the results, driven by a start offset in the URL rather than a page number.

The practical takeaway is that "scraping Google" is really scraping a handful of independent blocks that happen to share a page. A good extractor returns each one as its own list so you can use them separately downstream. If you want a deeper tour of the question box specifically, see how to scrape Google's People Also Ask.
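
One way to picture "each block as its own list" is a small container type. The sketch below is purely illustrative: the class and field names are hypothetical and are not tied to any particular scraper's output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrganicResult:
    position: int
    title: str
    url: str
    snippet: str = ""


@dataclass
class Question:
    question: str
    answer: str = ""
    source_url: Optional[str] = None


@dataclass
class Serp:
    """Hypothetical container: one independent list per feature block."""
    organic: List[OrganicResult] = field(default_factory=list)
    ads: List[OrganicResult] = field(default_factory=list)
    people_also_ask: List[Question] = field(default_factory=list)
    related_searches: List[str] = field(default_factory=list)

    def present_blocks(self) -> List[str]:
        """Names of the blocks that actually appeared for this query."""
        return [name for name, items in vars(self).items() if items]


# An informational query might fill only the organic block
page = Serp(organic=[OrganicResult(1, "What is Data Science?", "https://www.ibm.com/topics/data-science")])
print(page.present_blocks())
```

Keeping the blocks separate means downstream code can consume one list (say, organic results for rank tracking) without caring whether the others were present.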

Why a plain request fails on Google

If you fire a bare HTTP request at https://www.google.com/search?q=... from a script, you almost never get the page you see in your own browser. Google is one of the most aggressively defended targets on the web, for two reasons that compound each other.

First, the page is dynamic. Much of a modern SERP, especially the feature blocks, is assembled with JavaScript after the initial HTML lands. A raw requests.get sees a skeleton, not the rendered result, so the data you want is simply not in the bytes you receive. Second, Google watches hard for automation. A request that lacks real browser headers, comes from a datacenter IP, or arrives faster than a human could type gets a consent interstitial, a CAPTCHA, or an outright block. Sustained scraping from a single address gets that address rate-limited quickly.

So a working Google scraper needs two things in one request: an IP the platform reads as an ordinary visitor, and, where the page leans on scripts, a browser that renders it. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that fleet healthy is most of the work and most of the cost. The alternative is a managed SERP-capable API that folds rendering, IP rotation, and CAPTCHA handling into a single call and hands you finished data. We cover the trade-off between the two next.
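
The failure modes described above can be told apart mechanically. The following is a minimal sketch rather than a complete detector: classify_response is a hypothetical helper, and the markers it checks (a /sorry path for the CAPTCHA page, a consent. host for the consent interstitial, a 429 status for rate limiting) are assumptions based on commonly observed behavior that Google can change at any time.

```python
from urllib.parse import urlparse


def classify_response(status_code: int, final_url: str, body: str) -> str:
    """Roughly label a raw response to a Google search request.

    Hypothetical helper; the markers are assumptions, not a documented contract.
    """
    parsed = urlparse(final_url)
    if parsed.path.startswith("/sorry"):
        return "captcha"  # assumed CAPTCHA challenge path
    if parsed.netloc.startswith("consent."):
        return "consent_interstitial"  # assumed consent host prefix
    if status_code == 429:
        return "rate_limited"
    if status_code != 200:
        return "blocked"
    if not body.strip():
        return "empty"
    return "ok"
```

A DIY scraper could call something like this after every fetch and decide whether to retry, rotate the IP, or stop, instead of passing a challenge page to the parser.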

Methods to scrape Google: managed API vs DIY

There are two honest ways to do this at any real scale, and it is worth being clear-eyed about both.

The DIY stack. You run a headless browser (Playwright or Selenium) to render the page, route it through a pool of rotating residential proxies so no single IP burns out, add header and fingerprint management so the traffic looks human, and build retry and CAPTCHA-solving logic for the requests that still get challenged. This works, and for a one-off academic project it can be the right call. The catch is maintenance: proxies expire, fingerprints get stale, Google ships layout changes, and your CAPTCHA solver needs feeding. The scraper is the easy part; keeping it unblocked is the job. Our guides on scraping without getting blocked and bypassing CAPTCHAs go deep on this stack.

The managed API. You send a URL to an endpoint that already owns the residential IP pool, the rendering layer, and the CAPTCHA handling, and it returns either the finished HTML or, better, the SERP parsed into JSON. You skip the infrastructure entirely and pay per successful request. The trade-off is that you depend on a provider and its parser. For most production rank-tracking and research work this is the pragmatic choice, because the time you would spend babysitting a proxy fleet is worth more than the per-request fee. The example below uses the Crawlbase Crawling API, which has a built-in Google SERP parser.

The rest of this guide takes the managed-API path because it lets us show real parsed output without 200 lines of proxy plumbing. The parsing concepts (which block is which, how the fields map) transfer to the DIY stack unchanged if you go that route.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you want to brush up on the parsing side for a DIY approach, our guide to using BeautifulSoup in Python covers the basics.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and token. Sign up, open your dashboard, and copy your request token. Your first 1,000 requests are free with no credit card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
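
A simple way to keep the token out of version control is to read it from an environment variable rather than hard-coding it. This is a small sketch; the variable name CRAWLBASE_TOKEN and the load_token helper are our own choices, not something the API requires.

```python
import os


def load_token(var_name: str = "CRAWLBASE_TOKEN") -> str:
    """Read the API token from an environment variable (the name is an assumption)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable before running")
    return token
```

You would then pass load_token() wherever the examples in this guide show "YOUR_CRAWLBASE_TOKEN".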

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the lightweight Crawlbase Python wrapper, which sends requests to the API and reads the responses.

bash
python --version

python -m venv serp_env
source serp_env/bin/activate

pip install crawlbase

On Windows, activate the environment with serp_env\Scripts\activate instead of the source line. The crawlbase package is a small, dependency-free wrapper around the API: you initialize it with your token and call get with a URL. SQLite, the storage option suggested in the pagination section, ships with Python, so there is nothing extra to install for that.

Step 1: Fetch a SERP through the Crawling API

Google's search pages serve fine through the Crawling API's Normal token, since the parser reads the server-rendered markup. Initialize the client with your token, point it at a search URL, and request the JSON response format so you get a clean envelope instead of raw HTML.

python
from crawlbase import CrawlingAPI
import json

# Initialize the Crawling API with your Crawlbase token
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

# URL of the Google search page you want to scrape
google_search_url = "https://www.google.com/search?q=data+science"

# Ask the API for a JSON response envelope
options = {"format": "json"}

response = api.get(google_search_url, options)

if response["headers"]["pc_status"] == "200":
    body = json.loads(response["body"].decode("latin1"))
    print(json.dumps(body, indent=4, sort_keys=True))
else:
    print("Failed to retrieve the page. Status:", response["status_code"])

The API returns a JSON envelope with two status fields and a body. pc_status is the Crawlbase status, and original_status is what Google itself returned; guarding on pc_status == "200" means a block or interstitial surfaces as a failure instead of feeding garbage downstream. The body comes back as bytes, so you decode it with latin1 before json.loads. Running this prints the raw page envelope, which confirms the fetch works before you add parsing.
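
If you repeat this check in several places, it can help to pull it into one function. The sketch below assumes the response is a dict shaped like the one in the snippet above (a headers mapping with pc_status and original_status, and a body that may be bytes); parse_envelope is a hypothetical name, not part of the crawlbase package.

```python
import json


def parse_envelope(response: dict) -> dict:
    """Return the decoded JSON body, or raise if pc_status is not 200."""
    headers = response.get("headers", {})
    pc_status = str(headers.get("pc_status", ""))
    original_status = str(headers.get("original_status", ""))
    if pc_status != "200":
        # Include both statuses so the error shows who rejected the request
        raise RuntimeError(
            f"Crawl failed: pc_status={pc_status!r}, original_status={original_status!r}"
        )
    raw = response["body"]
    # The body may arrive as bytes; decode it before parsing
    text = raw.decode("latin1") if isinstance(raw, bytes) else raw
    return json.loads(text)
```

Reporting original_status alongside pc_status makes it easier to tell a Google-side rejection from a problem on the API side.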

Crawlbase Google Scraper

That pc_status check only ever reads 200 because the request reached Google as an ordinary visitor in the first place. The Crawling API fetches the SERP from a rotating residential IP, renders it when a feature block needs a browser, and handles the CAPTCHAs that would otherwise stop a raw request, so you skip running a headless fleet and a residential proxy pool yourself. Point it at a public search URL on the free tier first.

Step 2: Parse the SERP feature blocks

Reading the raw HTML and writing selectors for every feature block by hand is possible, but Google ships layout changes often and the class names are obfuscated, so hand-written selectors break constantly. The Crawling API ships a built-in google-serp scraper that does this parsing for you and returns each block as a clean list. You turn it on with one option.

python
from crawlbase import CrawlingAPI
import json

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

google_search_url = "https://www.google.com/search?q=data+science"

# The google-serp scraper parses the page into structured JSON
options = {"scraper": "google-serp"}

response = api.get(google_search_url, options)

if response["headers"]["pc_status"] == "200":
    parsed = json.loads(response["body"].decode("latin1"))
    serp = parsed["body"]

    # Each feature block comes back as its own key
    organic = serp.get("searchResults", [])
    ads = serp.get("ads", [])
    paa = serp.get("peopleAlsoAsk", [])
    related = serp.get("relatedSearches", [])
    local = serp.get("snackPack", {})
    total = serp.get("numberOfResults", 0)

    print(f"Organic results: {len(organic)}")
    print(f"Ads: {len(ads)}")
    print(f"People Also Ask: {len(paa)}")
    print(f"Related searches: {len(related)}")
    print(f"Total results reported: {total}")
else:
    print("Failed to retrieve the page. Status:", response["status_code"])

The google-serp scraper returns a body object whose keys map directly to the feature blocks from the anatomy section. searchResults is the organic list, ads the sponsored listings, peopleAlsoAsk the question box, relatedSearches the suggestion row, and snackPack the local pack with its map link and place results. numberOfResults is the reported total, which you will use to drive pagination. Each entry in searchResults carries a title, url, description, and position, so the records are ready to use without any selector work.

Which blocks appear depends on the query

A commercial query surfaces ads; an informational one fills the People Also Ask box; a local query populates the snack pack. Always read each key with .get(key, default) so a missing block returns an empty default (a list, or a dict for snackPack) rather than raising a KeyError. The same query can also return different blocks on different days as Google tunes its layout.
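
The default-read habit can be gathered into one place. This is a minimal sketch that assumes the key names used in this guide (searchResults, ads, peopleAlsoAsk, relatedSearches, snackPack, numberOfResults); read_blocks is a hypothetical helper, and the output keys are our own naming.

```python
def read_blocks(serp: dict) -> dict:
    """Read every feature block with a safe default so absent blocks never raise."""
    # "or" also covers keys that are present but set to None
    snack_pack = serp.get("snackPack") or {}
    return {
        "organic": serp.get("searchResults") or [],
        "ads": serp.get("ads") or [],
        "questions": serp.get("peopleAlsoAsk") or [],
        "related": serp.get("relatedSearches") or [],
        "local": snack_pack.get("results") or [],
        "total": serp.get("numberOfResults") or 0,
    }
```

Downstream code can then loop over any block without first checking whether Google showed it for this query.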

Step 3: Assemble the full script

Now wire the fetch and the parse into one runnable script that pulls the organic results and the question box into flat records and writes them to JSON. This is the shape you would extend for a real job.

python
from crawlbase import CrawlingAPI
import json

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
options = {"scraper": "google-serp"}

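def fetch_serp(url):
    """Fetch one SERP through the API and return the parsed feature blocks."""
    response = api.get(url, options)
    pc_status = response["headers"]["pc_status"]
    if pc_status != "200":
        # Fail loudly so a block or interstitial never reaches the parser
        raise RuntimeError(f"Unable to crawl '{url}' (pc_status={pc_status})")
    parsed = json.loads(response["body"].decode("latin1"))
    return parsed["body"]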

def extract(serp):
    results = []
    for item in serp.get("searchResults", []):
        results.append({
            "position": item.get("position"),
            "title": item.get("title"),
            "url": item.get("url"),
            "description": item.get("description"),
        })
    questions = [q.get("title") for q in serp.get("peopleAlsoAsk", [])]
    related = [r.get("title") for r in serp.get("relatedSearches", [])]
    return {
        "searchResults": results,
        "peopleAlsoAsk": questions,
        "relatedSearches": related,
        "numberOfResults": serp.get("numberOfResults", 0),
    }

def main():
    url = "https://www.google.com/search?q=data+science"
    serp = fetch_serp(url)
    data = extract(serp)
    with open("google_results.json", "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Saved {len(data['searchResults'])} organic results")

if __name__ == "__main__":
    main()

Save the script as main.py and run it with python main.py. The fetch_serp function gets the parsed SERP and raises on any non-200 status so failures are loud, and extract flattens the blocks you care about into a single object. Swap the query in the URL and the same two functions handle whatever comes back. Each People Also Ask entry already carries its description and destination fields in the response, so you can widen extract to capture the full answer and source link without making another request.
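
As one possible way to widen the extraction, the sketch below flattens each People Also Ask entry into a record with its answer and source. It assumes each entry carries title, description, and a destination object with text and url, as in the parser's sample output; extract_questions and the output field names are hypothetical.

```python
def extract_questions(serp: dict) -> list:
    """Flatten each People Also Ask entry into question, answer, and source."""
    rows = []
    for entry in serp.get("peopleAlsoAsk", []):
        # destination may be missing, so fall back to an empty dict
        destination = entry.get("destination") or {}
        rows.append({
            "position": entry.get("position"),
            "question": entry.get("title"),
            "answer": entry.get("description"),
            "source_name": destination.get("text"),
            "source_url": destination.get("url"),
        })
    return rows
```

You could call this from extract in place of the one-line list comprehension if you want the answers as well as the questions.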

What the output looks like

The google-serp scraper returns a structured object keyed by feature block. Here is a trimmed sample of what comes back for the "data science" query, so you can see the field names before you write code against them.

json
{
  "numberOfResults": 2520000000,
  "ads": [],
  "peopleAlsoAsk": [
    {
      "position": 1,
      "title": "What exactly does a data scientist do?",
      "description": "A data scientist uses data to understand and explain the phenomena around them...",
      "destination": { "text": "Coursera", "url": "https://www.coursera.org/articles/what-is-a-data-scientist" }
    }
  ],
  "relatedSearches": [
    { "title": "data science jobs", "url": "https://google.com/search?q=Data+science+jobs" }
  ],
  "searchResults": [
    {
      "position": 1,
      "title": "What is Data Science?",
      "url": "https://www.ibm.com/topics/data-science",
      "description": "Data science combines math and statistics, specialized programming..."
    }
  ],
  "snackPack": { "mapLink": "", "results": [] }
}

Every block is its own key, so you can take just the organic searchResults for rank tracking, just peopleAlsoAsk for content research, or just ads for competitive analysis. Because the ad and local blocks are empty for an informational query like this one, your code should always default-read each key rather than assume it is present.
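
To make the rank-tracking case concrete, here is a small sketch that looks up where a domain first appears in the organic block. It assumes the searchResults entries carry url and position as in the sample above; find_rank is a hypothetical helper.

```python
from urllib.parse import urlparse


def find_rank(serp: dict, domain: str):
    """Return the position of the first organic result on `domain`, or None."""
    domain = domain.lower()
    for item in serp.get("searchResults", []):
        host = (urlparse(item.get("url") or "").hostname or "").lower()
        # Match the bare domain and any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return item.get("position")
    return None
```

Because it default-reads searchResults, the function simply returns None when the block is missing or the domain does not rank on that page.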

Pagination across the results

One page is a demo; real SERP work goes deeper. Google paginates with the start query parameter, an offset into the results list rather than a page number. With roughly nine to ten organic results per page, start=10 gives you the second page, start=20 the third, and so on. The numberOfResults field from the parser is your upper bound, though in practice Google rarely serves more than a few hundred results for a single query.

python
import time

base = "https://www.google.com/search?q=data+science"
all_results = []

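# Reuses fetch_serp() and extract() from the Step 3 script
for page in range(3):
    start = page * 10  # offset into the results, not a page number
    url = f"{base}&start={start}"
    page_results = extract(fetch_serp(url))["searchResults"]
    if not page_results:
        break  # no more results for this query
    all_results.extend(page_results)
    time.sleep(2)  # pace the crawl

print(f"Collected {len(all_results)} results")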

Build each page URL with the offset, fetch it through the API, and parse it with the same function. The one habit that keeps a long run healthy is pacing: a short sleep between requests spreads your traffic out instead of firing it in a tight loop. To persist what you collect, write each record's title, URL, description, and position into a database; SQLite is the simplest option since it ships with Python and needs no server. From there you can also save the same rows to CSV for a spreadsheet, or load them into whatever analysis tool you use.
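
The SQLite step can be sketched in a few lines with the standard library. This is one possible layout, not a required schema: the table name serp_results, its columns, and the save_results helper are all our own choices, and the records are assumed to have the position, title, url, and description keys produced by extract.

```python
import sqlite3


def save_results(db_path: str, query: str, results: list) -> int:
    """Store organic results in SQLite and return the table's row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS serp_results (
                       query TEXT NOT NULL,
                       position INTEGER,
                       title TEXT,
                       url TEXT NOT NULL,
                       description TEXT,
                       UNIQUE (query, url)
                   )"""
            )
            # OR IGNORE skips rows already stored for this query and URL
            conn.executemany(
                "INSERT OR IGNORE INTO serp_results VALUES (?, ?, ?, ?, ?)",
                [
                    (query, r.get("position"), r.get("title"), r.get("url"), r.get("description"))
                    for r in results
                ],
            )
        return conn.execute("SELECT COUNT(*) FROM serp_results").fetchone()[0]
    finally:
        conn.close()
```

A call such as save_results("serp.db", "data science", all_results) after the pagination loop would persist the run; the UNIQUE constraint keeps a repeated crawl from duplicating rows.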

Scaling and staying unblocked

Moving from a handful of queries to thousands changes the problem from "parse the page" to "stay unblocked over time." A few habits make a long-running job durable.

  • Pace and vary. Spread requests out and rotate across many queries rather than paging one term at full speed. Bursty, repetitive traffic is the fastest way to get challenged.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right, and our guide on rotating proxies for Google search results covers it in depth.
  • Retry on failure for free. Crawlbase does not charge for failed requests, so a blocked or unavailable URL costs you nothing to retry. Build a small retry with backoff rather than dropping pages.
  • Watch for layout drift. Google ships SERP changes often. A managed parser absorbs most of these for you, but if a field starts coming back empty, the layout likely moved, so check the response shape before assuming your code is wrong.
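
The "small retry with backoff" from the list above might look like the following. It is a generic sketch with assumed defaults (four attempts, delays that double from one second, up to half a second of jitter); with_retries is a hypothetical helper that wraps any fetch function that raises on failure, such as the fetch_serp function from Step 3.

```python
import random
import time


def with_retries(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1s, 2s, 4s, ... plus up to 0.5s of random jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

The sleep parameter exists only so the waiting can be swapped out in tests; in normal use you would call with_retries(fetch_serp, url) and leave the defaults alone.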

If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential IP rotation as a drop-in proxy endpoint. For jobs that need to fan out across many queries at once, an asynchronous crawler lets you queue URLs and collect results as they finish rather than waiting on each request in series. And for the specific problem of consent and verification pages, our walkthrough on bypassing CAPTCHAs while scraping Google goes deeper than we can here.

Legal and ethical considerations

Whether scraping Google is allowed depends on Google's terms of service, your jurisdiction, and what you do with the data. Public search-results data (the titles, links, snippets, and positions that anyone can see without logging in) is the kind of information that courts in several jurisdictions have treated as collectible. That said, Google's terms of service restrict automated access to its services, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Google's terms and its robots.txt, and treat both as the boundary for what you collect and how fast.

A few lines worth holding to. Collect only public SERP data and stay away from anything behind a login, from personal data about individuals, and from copyrighted media you would redistribute from the linked destinations. Keep your request volume modest enough that you are not straining Google's infrastructure, and pace your crawl rather than running it flat out. The goal is structured public data for research, SEO, and market analysis, not bulk re-publication of Google's index.

Where an official API exists, prefer it. Google offers sanctioned products for specific needs: the Custom Search JSON API for programmatic search over a defined scope, and Search Console for your own site's performance data. Those give you structured access within clear terms. They do not cover every use case a SERP scraper does, which is why public-data scraping persists, but when your project fits an official endpoint, that endpoint is the cleaner path. If you are weighing the broader trade-offs, our piece on the challenges of scraping Google search results is worth a read.
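
For comparison, a request to the Custom Search JSON API is an ordinary GET with a few query parameters. The sketch below only builds the URL and reads a decoded response; it assumes you already have an API key and a search engine ID (the cx parameter), and the helper names are hypothetical. Check Google's current documentation for quotas and availability before relying on it.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key: str, engine_id: str, query: str, start: int = 1) -> str:
    """Build a Custom Search JSON API request URL (start is a 1-based index)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def read_items(payload: dict) -> list:
    """Pull title, link, and snippet from a decoded response payload."""
    return [
        {"title": i.get("title"), "url": i.get("link"), "snippet": i.get("snippet")}
        for i in payload.get("items", [])
    ]
```

Note that start here is a 1-based result index, unlike the zero-based offset used in the search page URL earlier in this guide.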

Recap

Key takeaways

  • A SERP is many blocks, not one list. Organic results, ads, People Also Ask, the knowledge panel, and the local pack are independent blocks; parse each one separately.
  • Plain requests get blocked. Google renders feature blocks with JavaScript and challenges datacenter traffic, so you need a rendered page from a trusted residential IP.
  • Managed API or DIY. A SERP-capable API folds rendering, rotation, and CAPTCHA handling into one call; the DIY stack works but the maintenance is the real cost.
  • Paginate with the start offset. Increase start in steps of ten to walk deeper, use numberOfResults as the bound, and pace requests with a sleep.
  • Stay on public data. Respect Google's ToS and robots.txt, avoid logins and personal data, keep volume modest, and prefer an official API where one fits.

Frequently Asked Questions (FAQs)

What is a Google SERP made of?

A search engine results page is assembled from several distinct blocks: organic results (the ranked web listings), ads (paid listings on commercial queries), People Also Ask (an expandable question box), the knowledge panel (an entity box from Google's Knowledge Graph), the local pack (a map plus nearby businesses), and related searches. Which blocks appear depends on the query's intent, so a good scraper reads each block independently and tolerates the ones that are absent.

Why does a plain request fail on Google?

Two reasons compound. Much of a modern SERP, especially the feature blocks, is rendered with JavaScript, so a raw HTTP request sees a skeleton rather than the finished page. Google also watches hard for automation and challenges traffic from datacenter IPs or requests without real browser headers with a consent page or CAPTCHA. Fetching through a SERP-capable API that uses rotating residential IPs and rendering makes the request look like an ordinary visitor.

Should I use a managed API or build my own scraper?

For a one-off project the DIY stack (a headless browser plus rotating proxies) can be fine. For production rank tracking or research at scale, a managed API is usually the pragmatic choice because the ongoing work is keeping the scraper unblocked, not writing it: proxies expire, fingerprints go stale, and Google ships layout changes. The parsing concepts transfer between the two, so the choice is mostly about who maintains the infrastructure.

How do I scrape Google search results with Python specifically?

The example here uses the Crawlbase Python wrapper with the google-serp scraper to get parsed JSON without writing selectors. For a tight, end-to-end Python walkthrough that builds the request and parsing step by step, see our focused tutorial on how to scrape Google search results with Python, which is the companion to this broader pillar.

How do I paginate through more Google results?

Use the start query parameter, an offset into the results rather than a page number. With roughly ten organic results per page, start=10 is the second page, start=20 the third, and so on. Build each URL with the offset, fetch it through the API, parse with the same function, and pause a couple of seconds between requests so you are pacing the crawl rather than hammering it.

Is it legal to scrape Google search results?

Public SERP data, the titles, links, snippets, and positions anyone can see without an account, is generally treated as collectible, but Google's terms of service restrict automated access, so scraping can conflict with those terms. Read Google's terms and robots.txt, avoid logins and personal data, keep volume modest, and prefer an official endpoint such as the Custom Search JSON API or Search Console where it fits your use case.
