Google Flights is where most travellers start when they compare fares before booking. It lays out competing airlines side by side with prices, departure and arrival times, durations, and stop counts, which makes its public results a useful feed for anyone tracking fares over time, researching a route, or watching how prices move between airlines. The same numbers a traveller reads off the page are the numbers a fare monitor wants in a structured table.

This guide shows you how to scrape Google Flights with Python the reliable way. You build a small, runnable scraper that fetches a rendered Google Flights results page through the Crawling API, parses each flight with BeautifulSoup, and exports the records to JSON and CSV for fare monitoring. The whole walkthrough stays scoped to public flight-results data that anyone can see without an account, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a public Google Flights search URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every flight listing on the page. We will use a sample route as the running example and pull these fields from each listing:

  • Airline: the operating carrier (or carriers) shown on the listing.
  • Price: the displayed fare for the itinerary.
  • Departure time: the local departure time of the flight.
  • Arrival time: the local arrival time, including any +1 / +2 day markers.
  • Duration: the total travel time for the itinerary.
  • Stops: whether the flight is nonstop or how many stops it has.

Why a plain request fails on Google Flights

If you fire a bare HTTP request at a Google Flights URL from a script, you do not get the page you see in your browser. Google Flights builds its result list client-side: the initial HTML is mostly an empty shell, and the flight cards are filled in by JavaScript after the page loads. A plain requests.get never runs that JavaScript, so it returns markup with no flights in it, and every selector you write comes back empty.

On top of that, Google watches for automated traffic. Requests that do not look like a real browser get challenged, fed a verification page, or blocked before they reach the listings. So a working Google Flights scraper needs two things in one request: a browser that actually renders the page, and an IP that Google reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page in a real browser from a trusted residential IP, and returns finished HTML for you to parse.

Rendering is not optional here

Unlike a classic server-rendered results page, Google Flights has almost nothing useful in its raw HTML. The flight cards exist only after JavaScript runs. That is why this tutorial uses the JavaScript-rendering token from the start, not as an afterthought. If your fetch returns a page with zero listings, rendering is almost always the missing piece.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to using BeautifulSoup in Python covers the parsing basics this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and token. Sign up, open your dashboard, and copy your tokens from the account docs page. Google Flights needs rendering, so you will use the JavaScript token rather than the normal one. Your first 1,000 requests are free, and adding billing details before you spend them unlocks an extra 9,000 free requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
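
One way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN. That name is chosen for this example and is not something the crawlbase library looks up on its own.

```python
import os

# Hypothetical variable name chosen for this example; the crawlbase
# library does not read it automatically.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"


def load_token(env=None):
    """Return the token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running the scraper")
    return token
```

With this in place, the client could be created as CrawlingAPI({"token": load_token()}) instead of pasting the token into the script.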

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.

bash
python --version

python -m venv google_flights_env
source google_flights_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with google_flights_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official Python library that calls the Crawling API and handles the rendering, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.

Step 1: Fetch the rendered results

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request the search URL with rendering enabled. Reading the status code before you parse keeps failures loud instead of silent, which matters a lot on a target that throttles.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

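def crawl(page_url):
    # ajax_wait waits for async content; page_wait holds 5000 ms after load
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    # pc_status is Crawlbase's status for the underlying request
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    # Fail loudly so a blocked or throttled fetch is never parsed as empty
    print(f"Request failed: {status}")
    return None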

if __name__ == "__main__":
    page_url = "https://www.google.com/travel/flights/search?tfs=CBwQAhopEgoyMDI0LTA3LTE0ag0IAxIJL20vMDFmMDhycgwIAxIIL20vMDZ5NTcaKRIKMjAyNC0wNy0yMGoMCAMSCC9tLzA2eTU3cg0IAxIJL20vMDFmMDhy&hl=en-US&curr=EUR"
    html = crawl(page_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering flight cards appear before the page is captured. Five seconds is a reasonable start; raise it if the flight list comes back short or empty. Note the pc_status header: that is Crawlbase's own status for the underlying request, and it is the value to trust over the outer HTTP code when you decide whether a fetch succeeded. Run the script and you should see real listing markup, not the empty shell a plain fetch returns.
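
The advice to raise page_wait when the list comes back short can be automated. The sketch below tries progressively longer waits until enough listings appear. It assumes two helpers you would supply yourself: fetch(url, page_wait), a variant of crawl that accepts the wait as an argument, and count_listings(html), which counts flight cards in the HTML. Both names and the wait values are illustrative, not part of the crawlbase library.

```python
def fetch_until_rendered(fetch, count_listings, url,
                         waits=(5000, 8000, 12000), min_listings=1):
    """Try increasing page_wait values until enough listings appear.

    fetch(url, page_wait) -> HTML string or None   (hypothetical wrapper)
    count_listings(html)  -> int                   (hypothetical helper)
    Returns (html, page_wait_used), or (None, None) if every attempt fails.
    """
    for page_wait in waits:
        html = fetch(url, page_wait)
        if html and count_listings(html) >= min_listings:
            return html, page_wait
    return None, None
```

Each attempt is a separate request, so keep the list of waits short.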

Crawlbase Crawling API

Google Flights needs a rendered page behind a trusted IP, in one call. That is exactly what the JavaScript token in the snippet above buys you: the Crawling API runs the page in a real browser, waits for the flight cards to fill in, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public search on the free tier first.

Step 2: Parse each flight with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each flight from its listing element. Google wraps every flight in a list item, so you find all the items, then read airline, duration, price, departure and arrival times, and stops out of each one. The selectors below matched the live layout when this guide was written; inspect a page in your browser's dev tools (right-click, then Inspect) to confirm the current class names before a long run.

python
from bs4 import BeautifulSoup

def text_or_none(listing, selector):
    el = listing.select_one(selector)
    return el.get_text(strip=True) if el else None

def scrape_dates(listing):
    dep = listing.select_one(
        'span.mv1WYe span:first-child [jscontroller="cNtv4b"] span')
    arr = listing.select_one(
        'span.mv1WYe span:last-child [jscontroller="cNtv4b"] span')
    departure = dep.get_text(strip=True) if dep else None
    arrival = arr.get_text(strip=True) if arr else None
    return departure, arrival

def scrape_flights(html):
    soup = BeautifulSoup(html, "html.parser")
    flights = []
    for listing in soup.select("li.pIav2d"):
        departure, arrival = scrape_dates(listing)
        flights.append({
            "airline": text_or_none(listing, "div.Ir0Voe div.sSHqwe"),
            "price": text_or_none(listing, "div.U3gSDe div.FpEdX span"),
            "departure": departure,
            "arrival": arrival,
            "duration": text_or_none(listing, "div.AdWm1c.gvkrdb"),
            "stops": text_or_none(listing, "div.EfT7Ae span.ogfYpf"),
        })
    return flights

The wrapper selector li.pIav2d matches each flight listing on the page, and every field is read from a selector inside that listing. div.Ir0Voe div.sSHqwe holds the airline name, div.U3gSDe div.FpEdX span holds the price, div.AdWm1c.gvkrdb holds the duration, and div.EfT7Ae span.ogfYpf holds the stops text (for example "Nonstop" or "1 stop"). Departure and arrival times live in a shared span.mv1WYe block, where the first child is the departure and the last child is the arrival, each wrapped in a [jscontroller="cNtv4b"] span. The small text_or_none helper means a missing field returns None instead of crashing the loop, which is what you want on a page where not every card carries every field.

Selectors drift

Google's class names, like pIav2d and sSHqwe, are obfuscated and rotate when Google redeploys its front end. Treat the selectors above as a starting template, not a contract. When a field comes back empty for every flight, re-inspect a live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
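
A small check can flag selector drift as soon as it happens instead of leaving it to be found in a spreadsheet days later. The sketch below reports which fields are empty in every record returned by scrape_flights. The function name empty_fields is an assumption made for this example.

```python
FIELDS = ["airline", "price", "departure", "arrival", "duration", "stops"]


def empty_fields(flights, fields=FIELDS):
    """Return the fields that are None or blank in every record."""
    if not flights:
        # No listings at all: the wrapper selector or the rendering failed.
        return list(fields)
    return [
        name for name in fields
        if all(not (record.get(name) or "").strip() for record in flights)
    ]
```

If the returned list is non-empty after a run, that is the cue to re-inspect a live page before trusting the output.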

Step 3: Put it together and export

Now wire the fetch and the parse into one runnable script, then write the structured output to both JSON and CSV. JSON is convenient for further processing; CSV drops straight into a spreadsheet or a fare-monitoring sheet that you diff day over day.

python
import csv
import json
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["headers"].get("pc_status") == "200":
        return response["body"].decode("utf-8")
    return None

def text_or_none(listing, selector):
    el = listing.select_one(selector)
    return el.get_text(strip=True) if el else None

def scrape_dates(listing):
    dep = listing.select_one(
        'span.mv1WYe span:first-child [jscontroller="cNtv4b"] span')
    arr = listing.select_one(
        'span.mv1WYe span:last-child [jscontroller="cNtv4b"] span')
    departure = dep.get_text(strip=True) if dep else None
    arrival = arr.get_text(strip=True) if arr else None
    return departure, arrival

def scrape_flights(html):
    soup = BeautifulSoup(html, "html.parser")
    flights = []
    for listing in soup.select("li.pIav2d"):
        departure, arrival = scrape_dates(listing)
        flights.append({
            "airline": text_or_none(listing, "div.Ir0Voe div.sSHqwe"),
            "price": text_or_none(listing, "div.U3gSDe div.FpEdX span"),
            "departure": departure,
            "arrival": arrival,
            "duration": text_or_none(listing, "div.AdWm1c.gvkrdb"),
            "stops": text_or_none(listing, "div.EfT7Ae span.ogfYpf"),
        })
    return flights

def save_json(flights, path="google_flights.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(flights, f, ensure_ascii=False, indent=2)

def save_csv(flights, path="google_flights.csv"):
    fields = ["airline", "price", "departure",
              "arrival", "duration", "stops"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(flights)

def main():
    page_url = "https://www.google.com/travel/flights/search?tfs=CBwQAhopEgoyMDI0LTA3LTE0ag0IAxIJL20vMDFmMDhycgwIAxIIL20vMDZ5NTcaKRIKMjAyNC0wNy0yMGoMCAMSCC9tLzA2eTU3cg0IAxIJL20vMDFmMDhy&hl=en-US&curr=EUR"
    html = crawl(page_url)
    if not html:
        print("No HTML returned")
        return
    flights = scrape_flights(html)
    save_json(flights)
    save_csv(flights)
    print(f"Saved {len(flights)} flights")

if __name__ == "__main__":
    main()

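Save the full script as main.py and run it with python main.py. It fetches the rendered results page for the sample route, extracts a record for each flight, and writes both google_flights.json and google_flights.csv. To monitor a fare, run it on a schedule and append each run's rows, with a timestamp, to a history file (save_csv as written overwrites its output on every run): the price column over time is your fare curve. To scan a different route or date, run the search in your browser and copy the resulting URL, because the route and dates are encoded inside the tfs parameter rather than exposed as separate query fields. The sample URL encodes July 2024 dates, so replace it with a current search before you run the script; the parser handles whatever comes back.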
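
Because save_csv opens its file in write mode, building a history needs a separate append step. Here is a minimal sketch, assuming a history file named fare_history.csv and an extra scraped_at column. Both are choices made for this example.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ["airline", "price", "departure", "arrival", "duration", "stops"]


def append_history(flights, path="fare_history.csv", scraped_at=None):
    """Append one run's flights to a history CSV, adding a UTC timestamp."""
    scraped_at = scraped_at or datetime.now(timezone.utc).isoformat(
        timespec="seconds")
    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["scraped_at"] + FIELDS)
        if write_header:
            writer.writeheader()
        for flight in flights:
            writer.writerow({"scraped_at": scraped_at, **flight})
    return path
```

Calling append_history(flights) at the end of main, after scrape_flights, would add one timestamped row per flight on every scheduled run.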

What the output looks like

You get a clean list of flight records, one object per listing, ready to write to JSON, CSV, or a database. Here is a trimmed JSON sample for a long-haul route:

json
[
  {
    "airline": "Cebu Pacific",
    "price": "€924",
    "departure": "10:10 PM",
    "arrival": "9:45 AM+2",
    "duration": "29 hr 35 min",
    "stops": "1 stop"
  },
  {
    "airline": "Etihad",
    "price": "€2,038",
    "departure": "10:25 PM",
    "arrival": "6:10 PM+1",
    "duration": "13 hr 45 min",
    "stops": "Nonstop"
  },
  {
    "airline": "Emirates",
    "price": "€2,215",
    "departure": "9:30 PM",
    "arrival": "5:20 PM+1",
    "duration": "13 hr 50 min",
    "stops": "Nonstop"
  }
]

The same records land in google_flights.csv with one header row (airline,price,departure,arrival,duration,stops) and one row per flight, which is the shape you want for fare monitoring: append each day's run and the price column becomes a time series you can chart or alert on.
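
The scraped values are display strings, so charting or alerting usually needs them converted to numbers first. The helpers below are a sketch based only on the formats in the sample output above: a leading currency symbol with comma thousands separators, durations written as "hr" and "min", and arrival times with an optional +N day marker. The function names are illustrative, and other locales or currencies may need different rules.

```python
import re


def parse_price(text):
    """Split a displayed fare like '€2,038' into (currency, amount)."""
    match = re.match(r"^\s*([^\d\s]*)\s*(\d[\d,]*(?:\.\d+)?)", text or "")
    if not match:
        return None, None
    currency = match.group(1) or None
    amount = float(match.group(2).replace(",", ""))
    return currency, amount


def parse_duration(text):
    """Convert '13 hr 45 min' (or '45 min', '2 hr') to total minutes."""
    hours = re.search(r"(\d+)\s*hr", text or "")
    minutes = re.search(r"(\d+)\s*min", text or "")
    if not hours and not minutes:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def parse_arrival(text):
    """Split '9:45 AM+2' into ('9:45 AM', 2); no marker means offset 0."""
    if not text:
        return None, None
    match = re.match(r"^(.*?)(?:\+(\d+))?$", text.strip())
    return match.group(1).strip(), int(match.group(2) or 0)
```

Keeping the original strings alongside the parsed values makes it easier to spot a format change later.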

Scaling across routes and dates

One route on one day is a demo; a fare monitor runs the same parse over several routes and a range of departure dates. The shape stays the same: build each search URL, fetch it through the Crawling API with rendering on, and parse it with the same scrape_flights function. The one habit that keeps a long run healthy is pacing, so pause between requests rather than firing them in a tight loop.

python
import time

route_urls = [
    "https://www.google.com/travel/flights/search?tfs=...&curr=EUR",
    "https://www.google.com/travel/flights/search?tfs=...&curr=USD",
]

all_flights = []
for url in route_urls:
    html = crawl(url)
    if html:
        all_flights.extend(scrape_flights(html))
    time.sleep(3)

print(f"Collected {len(all_flights)} flights across {len(route_urls)} routes")

Any 5XX response from the API is free of charge, so retrying a blocked or rendering-timeout URL costs you nothing. Because every Google Flights fetch needs rendering, each request uses the JavaScript token tier; check the pricing page for how rendered requests are counted before you scale up a monitor. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
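
A retry loop for those 5XX responses might look like the sketch below. It assumes a hypothetical fetch(url) wrapper that returns the pc_status string together with the decoded body, which differs from the crawl function above (that one returns only the HTML). The attempt count and delays are arbitrary starting points.

```python
import time


def get_with_retry(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry a fetch while it reports a 5XX status, backing off each time.

    fetch(url) -> (status, html) is a hypothetical wrapper that returns the
    pc_status string alongside the decoded body.
    """
    for attempt in range(attempts):
        status, html = fetch(url)
        if status == "200":
            return html
        if not str(status).startswith("5"):
            break  # not a server-side failure, so a retry is unlikely to help
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 2 s, then 4 s, then 8 s ...
    return None
```

The sleep parameter exists so the function can be exercised in tests without real delays.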

Staying unblocked

Even with rendering and a trusted IP handled, Google watches for scraper-shaped traffic harder than most. A few habits keep a run healthy.

  • Pace your requests. Hammering search pages in a tight loop is the fastest way to get challenged. Spread requests out and vary your routes instead of paging one search at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Give the page time to render. If flights come back empty, raise page_wait before you assume the selectors broke. The cards need a moment to fill in.
  • Re-inspect when fields go empty. Google rotates its obfuscated class names. If results stop parsing, open a live page in dev tools and update the selectors.
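
The pacing habit can be wrapped in a small generator so every loop over URLs gets a randomized pause for free. This is a sketch. The name paced and the 3 to 8 second range are assumptions for illustration, not recommended values from Google or Crawlbase.

```python
import random
import time


def paced(items, min_delay=3.0, max_delay=8.0, sleep=time.sleep, rng=random):
    """Yield items one at a time, sleeping a random interval between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(rng.uniform(min_delay, max_delay))
        yield item
```

Looping with for url in paced(route_urls): then replaces a fixed time.sleep call and avoids a perfectly regular request rhythm.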

For the broader playbook, see how to scrape websites without getting blocked. Since Google Flights is fully client-rendered, our guide on crawling JavaScript websites explains why rendering matters and how to turn it on, and if you also track lodging prices, scraping Google Hotels with Python uses the same rendered-fetch pattern.

Is scraping Google Flights legal?

Whether scraping Google Flights is allowed depends on Google's terms of service, your jurisdiction, and what you do with the data. Google's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Google's terms and its robots.txt, and treat both as the boundary for what you collect.
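
For the robots.txt part, Python's standard library can evaluate a path against rules you have already downloaded. The sketch below uses made-up rules purely for illustration. They are not Google's actual rules, and a passing check says nothing about the terms of service, which you still need to read yourself.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; these are NOT Google's actual rules.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Passing the real file's text and the search URL to is_allowed gives one input to the decision, not the decision itself.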

A few lines are worth holding to. Collect only public flight-results data: the airlines, prices, times, durations, and stop counts that anyone can see on a results page without an account. Keep your request volume low enough that you are not straining Google's servers, and pace your crawl rather than running it flat out. This guide is deliberately scoped to those public listings because that is the line that keeps the work defensible. It does not cover anything behind a login, account or personal data, payment or booking flows, or fare content you would redistribute as your own.

It is also worth knowing that the underlying fares come from airlines and global distribution systems, and many airlines publish official fare APIs or partner programmes for exactly this kind of access. If your project needs guaranteed, high-volume, redistribution-grade fare data, an official data agreement is the correct path, not a cleverer scraper. Use scraping for monitoring and research on public listings, and reach for a sanctioned API when you outgrow that.

Recap

Key takeaways

  • Google Flights is fully client-rendered. A plain fetch returns an empty shell, so rendering is essential, not optional, to see any flight cards at all.
  • You need rendering and a trusted IP together. The Crawling API takes a JavaScript token, runs the page in a real browser, rotates residential IPs server-side, and returns finished HTML.
  • BeautifulSoup does the extraction. Select each li.pIav2d listing, then read airline, price, departure, arrival, duration, and stops from it, and expect the obfuscated class names to drift.
  • Export to JSON and CSV. CSV with one row per flight is the shape for fare monitoring: append each run and the price column becomes a time series.
  • Stay on public data. Respect Google's ToS and robots.txt, keep volume low, and prefer an official airline or GDS fare API for high-volume, redistribution-grade access.

Frequently Asked Questions (FAQs)

Why does a plain request return no flights on Google Flights?

Google Flights builds its result cards client-side with JavaScript, so the raw HTML a plain requests.get downloads is mostly an empty shell with no flights in it. To see the listings you have to render the page in a real browser. Fetching through the Crawling API with the JavaScript token runs that rendering for you and returns the finished HTML, which is why every selector in this guide assumes a rendered page.

Can I scrape Google Flights with Python?

Yes. With the crawlbase library and BeautifulSoup you can fetch a rendered results page and pull out airline, price, departure and arrival times, duration, and stops. The Crawling API acts as the bridge that renders the page and sends your request to Google from a trusted IP, so requests are far less likely to be challenged or blocked. For a broader Python primer, see our guide on scraping websites with Python.

What fields can I extract from a Google Flights listing?

This tutorial pulls six fields from each flight: the airline, the price, the departure time, the arrival time (including +1 / +2 day markers), the total duration, and the stops text such as "Nonstop" or "1 stop". You can extend it to other visible fields like layover airports or carbon estimates by adding selectors. Stay within public flight-results data and avoid anything behind a login or a booking flow.

Do I need JavaScript rendering to scrape Google Flights?

Yes, and that is the key difference from many other targets. Google Flights has almost nothing in its raw HTML; the flight cards only appear after JavaScript runs. The Crawling API offers a JavaScript-rendering token plus ajax_wait and page_wait options that fetch the page the way a real browser would. Our guide to crawling JavaScript websites covers when rendering is necessary and how it works.

How do I monitor a fare over time?

Run the scraper on a schedule for a fixed route and date, and append each run's CSV rows (with a timestamp) to a history file. The price column across runs is your fare curve, which you can chart or wire to an alert when it drops below a threshold. Keep the cadence modest so you stay within the pacing guidance above; a few checks a day per route is plenty.
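
The threshold alert can be as simple as a filter over the latest rows. The sketch below assumes every row carries a price string in the same currency, as in the sample output. The function names are illustrative, and what you do with the hits (email, chat message, log line) is up to you.

```python
import re


def price_amount(text):
    """Pull the numeric part out of a displayed fare such as '€2,038'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def fares_below(rows, threshold):
    """Return the rows whose price parses to a value under the threshold."""
    hits = []
    for row in rows:
        amount = price_amount(row.get("price"))
        if amount is not None and amount < threshold:
            hits.append(row)
    return hits
```

Rows with a missing or unparseable price are skipped rather than treated as zero, so a selector failure does not trigger a false alert.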

My selectors return nothing. What changed?

Almost certainly Google's markup. Class names like pIav2d and sSHqwe are obfuscated and rotate when Google redeploys its front end, so selectors that worked last month can break. First confirm the page actually rendered by raising page_wait; if it did and fields are still empty, re-inspect a live results page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
