Scrape Google Search Results with Python

Google handles billions of searches a day, and the results page behind each query is a structured snapshot of what ranks for a term: the organic titles, the links they point to, and the snippet text under each one. That makes the public Google SERP a useful signal for keyword research, rank tracking, and competitor analysis. The data sits in plain view on the results page; the work is getting at it reliably.

This is the focused, runnable version of that job. You will set up Python, fetch a rendered Google results page through the Crawling API, parse each organic result into a title, link, and snippet, page through deeper results, and export to JSON and CSV. The walkthrough stays scoped to public search-results data that anyone can see without an account. For the wider tour of SERP structure, multiple approaches, and scale, read the broader pillar, how to scrape Google search pages; this post keeps its head down and ships a working scraper.

What you will build

A Python script that takes a search query, retrieves the rendered Google results page through the Crawling API, and returns a clean record for every organic result on the page. We will use a sample query as the running example and pull these fields from each result, plus the page-level related searches:

Position: the rank of the result on the page, counted from the top.
Title: the headline text of the result, as shown in the listing.
Link: the destination URL the result points to.
Snippet: the displayed description or summary under the title.
Related searches: the suggested follow-up queries Google lists at the foot of the page.

Why a plain request fails on Google

Fire a bare HTTP request at google.com/search from a script and you almost never get the page you see in your own browser. Two things work against you. First, Google now leans on JavaScript to render the results page: as of 2025 the SERP requires scripts to be enabled, so a plain fetch that does not execute JavaScript comes back with an empty shell instead of the listings. Second, Google watches for automated traffic. Too many requests from one IP, requests that come too fast, or requests that do not look like a real browser get met with a CAPTCHA, a rate limit, or a block.

So a working Google scraper needs two things in one request: an IP the search engine reads as an ordinary visitor, and a browser that renders the page. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that fleet healthy is most of the work. The Crawling API folds both into a single call. It also ships a built-in google-serp scraper that fetches the rendered page from a trusted rotating IP and returns the organic results already parsed into JSON, so you skip writing brittle selectors against Google's frequently changing markup.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are newer to scraping in general, our guide to scraping websites with Python covers the groundwork this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda. Any editor works; VS Code, PyCharm, and Jupyter are all fine for this.

A Crawlbase account and token. Sign up, open your dashboard, and copy your request token. You get up to 20,000 free requests, no credit card required, and the built-in google-serp scraper works with the normal token. Treat the token like a password: it authenticates your requests, so keep it out of version control.
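
One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named CRAWLBASE_TOKEN, which is a name chosen for this example and not something the Crawlbase library looks up on its own.

```python
import os


def load_token(env_var="CRAWLBASE_TOKEN"):
    """Return the API token from the environment, or fail with a clear message.

    CRAWLBASE_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return token
```

Set the variable in your shell before running the script, then build the client with CrawlingAPI({"token": load_token()}) in place of the hard-coded string.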

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the Crawlbase Python library, which wraps the Crawling API in a small client.

bash

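# Confirm Python 3.8 or later is installed
python --version

# Create and activate an isolated virtual environment
python -m venv google_env
source google_env/bin/activate

# Install the Crawlbase client library
pip install crawlbase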

On Windows, activate the environment with google_env\Scripts\activate instead of the source line. The crawlbase package gives you a CrawlingAPI client so you do not have to build the request URL by hand. Because the google-serp scraper returns parsed JSON, you do not even need an HTML parsing library for the organic results in this tutorial.

Step 1: Fetch the rendered SERP through the Crawling API

Start by getting the data. Write a small scrape_google_results() function that builds the search URL, sends it to the Crawling API with the google-serp scraper enabled, checks the status, and returns the parsed body. Google paginates in sets of 10, so the function takes a page number and translates it into the start offset Google expects in the URL.

python

import json
from urllib.parse import quote_plus
from crawlbase import CrawlingAPI

# Replace with your token from the Crawlbase dashboard
crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def scrape_google_results(query, page=0):
    encoded = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded}&start={page * 10}"
    options = {"scraper": "google-serp"}
    response = crawling_api.get(url, options)

    if response["headers"]["cb_status"] == "200":
        data = json.loads(response["body"].decode("latin1"))
        return data.get("body", {})

    print(f"Failed to fetch results for '{query}' (page {page}).")
    return {}

if __name__ == "__main__":
    results = scrape_google_results("web scraping tools", page=0)
    print(json.dumps(results.get("searchResults", [])[:2], indent=2))

The options dict is the whole trick: passing {"scraper": "google-serp"} tells the Crawling API to render the page, bypass the bot checks, and return the SERP already parsed. The client gives you back a response whose headers["cb_status"] is Crawlbase's own status for the crawl, so guarding on "200" before parsing keeps a block or a failed render loud instead of feeding garbage downstream. The body comes back as bytes, which you decode with latin1 and load as JSON, then read the parsed SERP out of its body key. Run the script and you should see the first two organic results print as JSON, which confirms the fetch and parse work before you build anything on top.
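
To see the status guard and the decode step in isolation, here is a small offline sketch that runs them against hand-built response dicts instead of a live call. The response shape mirrors what the code above reads (a cb_status header and a bytes body whose JSON has a body key); the helper name extract_serp and the sample payloads are made up for illustration.

```python
import json


def extract_serp(response):
    """Return the parsed SERP dict, or {} when the crawl status is not "200"."""
    # .get() avoids a KeyError if the status header is missing entirely.
    status = response.get("headers", {}).get("cb_status")
    if status != "200":
        return {}
    payload = json.loads(response["body"].decode("latin1"))
    return payload.get("body", {})


# Hand-built stand-ins for a real response (illustrative data only).
ok = {
    "headers": {"cb_status": "200"},
    "body": json.dumps({"body": {"searchResults": [{"position": 1}]}}).encode("latin1"),
}
blocked = {"headers": {"cb_status": "503"}, "body": b""}

print(extract_serp(ok))       # {'searchResults': [{'position': 1}]}
print(extract_serp(blocked))  # {} because the status guard fails
```

Separating the guard from the network call like this makes it easy to exercise the failure path without spending a request.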

Crawlbase Google Scraper

That cb_status (legacy pc_status) reads 200 because the request reached Google as a real visitor, with the page rendered and the bot checks handled, before the google-serp scraper turned it into the JSON you just printed. The Crawling API does the JavaScript rendering and rotates residential IPs server-side, so you skip running a headless browser fleet and sourcing a proxy pool yourself. Start on the free tier and point it at a public results URL first.

Step 2: Parse organic results into clean records

The google-serp scraper returns the SERP already structured, so parsing is a matter of reading the fields you want rather than writing CSS selectors. The organic listings live under the searchResults key, each one carrying its position, title, url, and description. The page also hands you relatedSearches, and the scraper exposes ads, peopleAlsoAsk, and a local snackPack when the query triggers them. Write a small function that picks out the fields you care about and normalizes the snippet field name.

python

def parse_results(serp):
    organic = []
    for item in serp.get("searchResults", []):
        organic.append({
            "position": item.get("position"),
            "title": item.get("title"),
            "link": item.get("url"),
            "snippet": item.get("description"),
        })

    related = [r.get("title") for r in serp.get("relatedSearches", [])]

    return {"organic": organic, "relatedSearches": related}

This keeps the four fields most rank-tracking and research jobs need: the position straight from Google, the title, the destination link (read from the result's url field), and the snippet (read from description). Pulling relatedSearches alongside gives you the suggested follow-up queries for free, which is handy for keyword expansion. Because the scraper does the field extraction, you are insulated from Google's markup changes; you only ever touch these stable JSON keys.
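
As a quick offline check, the sketch below runs the same field mapping over a hand-written payload. The sample values are invented for illustration, and the second item deliberately omits description to show that .get() yields None instead of raising.

```python
def parse_results(serp):
    """Map the scraper's keys (url, description) onto link and snippet."""
    organic = [
        {
            "position": item.get("position"),
            "title": item.get("title"),
            "link": item.get("url"),
            "snippet": item.get("description"),
        }
        for item in serp.get("searchResults", [])
    ]
    related = [r.get("title") for r in serp.get("relatedSearches", [])]
    return {"organic": organic, "relatedSearches": related}


# Invented sample payload shaped like the keys described above.
sample = {
    "searchResults": [
        {"position": 1, "title": "Example A", "url": "https://example.com/a",
         "description": "First snippet."},
        {"position": 2, "title": "Example B", "url": "https://example.com/b"},
    ],
    "relatedSearches": [{"title": "example query"}],
}

parsed = parse_results(sample)
print(parsed["organic"][1]["snippet"])  # None: the field was missing
print(parsed["relatedSearches"])        # ['example query']
print(parse_results({}))                # {'organic': [], 'relatedSearches': []}
```

The last call matters for the pagination step: an empty or failed page parses to empty lists, so downstream code can test for "no results" without special cases.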

Why the parsed scraper, not raw HTML

You could fetch the raw rendered HTML and parse it with BeautifulSoup, but Google's result-container class names are obfuscated and change often, so hand-written selectors break regularly. The google-serp scraper returns stable JSON keys like searchResults, position, and relatedSearches instead, which is far less maintenance. If you do want the selector approach for other sites, our guide to XPath and CSS selectors covers it.

Step 3: Handle pagination

One page is a demo; a real job walks deeper into the results. Google paginates in sets of 10 through the start parameter, so page two is start=10, page three is start=20, and so on, which the page-to-offset math in scrape_google_results already handles. Loop over a page range, stop early when a page comes back empty, and collect every organic record into one list. Pacing the loop with a short sleep keeps a long run healthy.

python

import time

def scrape_all_pages(query, max_pages=3):
    all_organic = []
    for page in range(max_pages):
        print(f"Scraping page {page + 1}...")
        serp = scrape_google_results(query, page)
        parsed = parse_results(serp)

        if not parsed["organic"]:
            print("No more results, stopping.")
            break

        all_organic.extend(parsed["organic"])
        time.sleep(2)

    return all_organic

The loop runs over pages 0 through max_pages - 1, fetches each page, parses it, and extends the running list. The if not parsed["organic"]: break guard stops the moment a page has no organic results, so you do not keep paying for empty pages past the end of the listings. Because scrape_google_results returns an empty dict when a fetch fails, the same guard also ends the run on a failed request. The time.sleep(2) between pages spreads the requests out instead of firing them in a tight loop, which is the single best habit for staying unblocked over a long run.
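
The page-to-offset math is small enough to check by hand. This sketch pulls the URL construction into a helper called build_search_url, a name introduced here for illustration, and prints the URLs the loop would request for the first three pages.

```python
from urllib.parse import quote_plus


def build_search_url(query, page=0, per_page=10):
    """Build a Google search URL for a zero-based page number."""
    if page < 0:
        raise ValueError("page must be zero or greater")
    # Page 0 -> start=0, page 1 -> start=10, page 2 -> start=20, ...
    return f"https://www.google.com/search?q={quote_plus(query)}&start={page * per_page}"


for page in range(3):
    print(build_search_url("web scraping tools", page))
# https://www.google.com/search?q=web+scraping+tools&start=0
# https://www.google.com/search?q=web+scraping+tools&start=10
# https://www.google.com/search?q=web+scraping+tools&start=20
```

Note that quote_plus also escapes characters such as & in the query, so a search term cannot accidentally inject extra URL parameters.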

Step 4: Assemble the full script and export

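Now wire the fetch, parse, and pagination into one runnable script, and write the output to both JSON and CSV. In this version parse_results is trimmed to return only the list of organic records, since that is all the export needs. JSON keeps each record as a typed object, so position stays a number; CSV gives you a flat table you can open in a spreadsheet for quick rank checks.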

python

import csv
import json
import time
from urllib.parse import quote_plus
from crawlbase import CrawlingAPI

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def scrape_google_results(query, page=0):
    encoded = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded}&start={page * 10}"
    options = {"scraper": "google-serp"}
    response = crawling_api.get(url, options)
    if response["headers"]["cb_status"] == "200":
        data = json.loads(response["body"].decode("latin1"))
        return data.get("body", {})
    print(f"Failed to fetch results for '{query}' (page {page}).")
    return {}

def parse_results(serp):
    organic = []
    for item in serp.get("searchResults", []):
        organic.append({
            "position": item.get("position"),
            "title": item.get("title"),
            "link": item.get("url"),
            "snippet": item.get("description"),
        })
    return organic

def scrape_all_pages(query, max_pages=3):
    all_organic = []
    for page in range(max_pages):
        print(f"Scraping page {page + 1}...")
        organic = parse_results(scrape_google_results(query, page))
        if not organic:
            print("No more results, stopping.")
            break
        all_organic.extend(organic)
        time.sleep(2)
    return all_organic

def save_json(rows, filename):
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

def save_csv(rows, filename):
    fields = ["position", "title", "link", "snippet"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    query = "web scraping tools"
    rows = scrape_all_pages(query, max_pages=2)
    save_json(rows, "google_results.json")
    save_csv(rows, "google_results.csv")
    print(f"Saved {len(rows)} results to JSON and CSV")

Save the script as main.py and run it with python main.py. It scrapes two pages for "web scraping tools", flattens every organic result into one list, and writes both google_results.json and google_results.csv. To scrape a different term, change the query string; to go deeper, raise max_pages. Setting ensure_ascii=False keeps non-Latin titles readable in the JSON file rather than escaping them into \u sequences.

What the output looks like

The JSON file is an ordered list of organic records, each with its position, title, link, and snippet, ready to drop into a database or a rank-tracking sheet.

json

[
  {
    "position": 1,
    "title": "Web Scraper - The #1 web scraping extension",
    "link": "https://webscraper.io/",
    "snippet": "The most popular web scraping extension. Start scraping in minutes."
  },
  {
    "position": 2,
    "title": "ParseHub | Free web scraping - The most powerful web scraper",
    "link": "https://www.parsehub.com/",
    "snippet": "ParseHub is a free web scraping tool. Turn any site into a spreadsheet or API."
  }
]

The CSV holds the same records as a flat table with one row per result and the columns position, title, link, and snippet, which is the format most spreadsheet-based rank tracking expects.
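
One detail to watch when you read the CSV back: the csv module returns every value as a string, so position comes back as "1" and not 1. A minimal sketch, using an in-memory buffer and invented rows in place of the real file:

```python
import csv
import io

FIELDS = ["position", "title", "link", "snippet"]

# Invented rows standing in for scraped results.
rows = [
    {"position": 1, "title": "Example A", "link": "https://example.com/a", "snippet": "First."},
    {"position": 2, "title": "Example B", "link": "https://example.com/b", "snippet": "Second."},
]

# Write to an in-memory buffer the same way save_csv writes to a file.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# Read it back and convert position to int so numeric sorting works.
buffer.seek(0)
loaded = [dict(r, position=int(r["position"])) for r in csv.DictReader(buffer)]
print(loaded[0]["position"] + 1)  # 2
```

Without the int() conversion, sorting by position would compare strings and place "10" before "2".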

Scaling across queries and staying unblocked

The script above scrapes one query across a few pages. A production job usually runs many queries and may track them over time, but the shape does not change: loop your list of queries, call scrape_all_pages for each, and tag every record with the query it came from before you save. The Crawling API handles the rendering and IP rotation per request, so scaling is mostly about pacing yourself and watching the status. A few habits keep a long run healthy.

Pace your requests. Keep the sleep between pages and queries rather than firing in a tight loop. Spreading traffic out is the single best defense against being challenged.
Read the status. Watch the cb_status on each response. A run that starts failing is telling you to back off, not noise to ignore.
Retry on transient failures. Any 5XX from the API is free of charge, so retrying a failed crawl costs you nothing; a short backoff before the retry usually clears it.
Lean on rotation. The built-in residential rotation spreads requests across many real-user addresses. If you would rather route your own traffic, the Smart AI Proxy gives you the same rotation as a drop-in endpoint.
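
Those habits can be sketched as a small driver. The version below is an illustration under assumptions, not part of the Crawlbase library: fetch_rows is any callable you supply (for example scrape_all_pages), an empty result is treated as a failure worth retrying, and the names run_queries, attempts, and base_delay are invented for this example.

```python
import time


def run_queries(queries, fetch_rows, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Fetch rows for each query, retrying empty results with exponential backoff.

    fetch_rows(query) must return a list of dicts; an empty list counts as a
    failed attempt here, which is an assumption of this sketch.
    """
    tagged = []
    for query in queries:
        rows = []
        for attempt in range(attempts):
            rows = fetch_rows(query)
            if rows:
                break
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before trying again.
                sleep(base_delay * (2 ** attempt))
        # Tag each record with the query that produced it.
        tagged.extend({"query": query, **row} for row in rows)
    return tagged


# Stand-in fetcher: fails once for "flaky", then succeeds (illustrative only).
calls = {}


def fake_fetch(query):
    calls[query] = calls.get(query, 0) + 1
    if query == "flaky" and calls[query] == 1:
        return []
    return [{"position": 1, "title": f"Result for {query}"}]


delays = []
rows = run_queries(["steady", "flaky"], fake_fetch, sleep=delays.append)
print(len(rows), delays)  # 2 [1.0]
```

Passing sleep as a parameter keeps the backoff testable without real waiting. A query that genuinely has no results will also be retried under this design, so keep attempts small.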

For the broader playbook, see how to scrape websites without getting blocked. If you also want to pull the question boxes Google shows mid-page, our focused guide on scraping Google's People Also Ask covers that block specifically.

Is it legal to scrape Google search results?

Whether scraping Google is allowed depends on Google's terms of service, your jurisdiction, and what you do with the data. Google's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Google's terms and its robots.txt, and treat both as the boundary for what you collect.

A few lines are worth holding to. Collect only public search-results data: the titles, links, snippets, and positions that anyone can see on a results page without an account. Keep your request volume modest so you are not straining Google's servers, and pace your crawl rather than running it flat out. Do not scrape personal data, do not pull content from behind a login, and do not redistribute copyrighted media you reach through the result links. These are the lines that keep the work defensible.

If your project needs sanctioned, high-volume access, Google offers official products for it, including the Custom Search JSON API and programmatic search through Google Cloud. At that scale those products are the right path, not a cleverer scraper. This guide is deliberately scoped to public SERP pages because that is the line that keeps the work clean. For more on the specific obstacles Google puts in front of scrapers and how to handle them responsibly, see our write-up on the challenges of scraping Google search results.

Recap

Key takeaways

A plain request fails on Google. The 2025 SERP needs JavaScript to render and Google blocks scraper-shaped traffic, so you need rendering plus a trusted IP in one request.
The google-serp scraper does the heavy lifting. Pass {"scraper": "google-serp"} and the Crawling API renders the page, handles the bot checks, and returns the SERP as parsed JSON.
Read stable JSON keys, not selectors. Pull position, title, url, and description from searchResults, which insulates you from Google's changing markup.
Paginate with the start offset. Increase start in multiples of 10 to walk deeper, stop when a page comes back empty, and sleep between pages.
Stay on public data. Respect Google's ToS and robots.txt, keep volume modest, prefer the official Custom Search API for high volume, and never touch personal or logged-in data.

Frequently Asked Questions (FAQs)

Why does a plain request return an empty page from Google?

As of 2025 Google's results page requires JavaScript to render, so a bare HTTP request that does not execute scripts comes back as an empty or skeletal shell. Google also blocks traffic that looks automated. Fetching through the Crawling API with the google-serp scraper renders the page from a trusted rotating IP and returns the listings as parsed JSON, so you get real results instead of a blank page.

Can I scrape Google search results with Python?

Yes. With the Crawlbase Python library you call the Crawling API, enable the built-in google-serp scraper, and read titles, links, snippets, and positions straight out of the returned JSON. The API sends your request to Google from a trusted rotating IP with the page rendered, so you get parsed results back instead of a CAPTCHA or a block. For a wider view of the SERP and other approaches, see the broader guide to scraping Google search pages.

How is this different from the broader Google scraping guide?

This post is the focused, runnable Python tutorial: set up, fetch, parse organic results, paginate, and export, with copy-pasteable code. The broader pillar, how to scrape Google search pages, covers the full SERP structure (ads, People Also Ask, knowledge panel, local pack), multiple approaches, and scaling strategy. Start here if you want code now; read the pillar for the wider map.

How do I paginate through more Google results?

Google paginates in sets of 10 through the start query parameter: page two is start=10, page three is start=20, and so on. The scrape_google_results function turns a page number into that offset, and scrape_all_pages loops over a page range, stopping when a page returns no organic results. Keep a short sleep between pages to pace the crawl.

Do I need to handle JavaScript myself when scraping Google?

No. Google's SERP needs JavaScript to render, but the google-serp scraper renders the page for you server-side, so you do not run a headless browser or manage a render fleet. You send the search URL with the scraper enabled and get back fully rendered, parsed results.

How do I store the scraped Google results?

Any structured format works. The full script in this guide writes both a JSON file, which keeps each record as a typed object, and a CSV file, which gives you a flat table of position, title, link, and snippet for spreadsheets. From there you can load the records into a database or a rank-tracking pipeline. Stay within public search-results data and avoid anything behind a login.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.
