Google Hotels is where most travellers compare properties before they book: prices, ratings, photos, and availability sit side by side in one place. For anyone working on price monitoring, competitor analysis, or travel-demand models, that surface is a rich source of structured data. The catch is that Google Hotels renders its listings client-side and defends itself hard against automated traffic, so a plain HTTP request hands you a near-empty page instead of the hotels you came for.

This guide shows you how to scrape Google Hotels with Python the reliable way. You build a small, runnable scraper that fetches the rendered results through the Crawling API, parses them with BeautifulSoup, and extracts four fields per hotel: name, price, rating, and link. The whole walkthrough stays scoped to public results data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a Google Hotels search URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for each hotel on the page. We will use a city search as the running example and pull these fields:

  • Name: the hotel's name as shown on the card.
  • Price: the nightly price for the selected dates.
  • Rating: the average guest rating, where one is shown.
  • Link: the URL to that hotel's result on Google.
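
If you plan to run the scraper for more than one city, it can help to build the search URL from its parts. The sketch below assumes the URL shape of the example search used in Step 1 of this guide (a q parameter of the form "hotels in <city>" plus a currency parameter). build_search_url is a hypothetical helper, not part of any library, and Google may change or ignore these parameters.

```python
from urllib.parse import urlencode

# Base path taken from the example search URL used in this guide.
SEARCH_BASE = "https://www.google.com/travel/search"


def build_search_url(city, currency="USD"):
    """Return a Google Hotels search URL for a city (hypothetical helper)."""
    # urlencode turns spaces into "+" and escapes reserved characters.
    query = urlencode({"q": f"hotels in {city}", "currency": currency})
    return f"{SEARCH_BASE}?{query}"
```

Calling build_search_url("New York") reproduces the example URL from Step 1, and swapping the city name gives you the equivalent search for another location.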

Why a plain fetch fails on Google Hotels

Request a Google Hotels search URL with a bare HTTP client and you get a response with status 200 and almost none of the listing data in the body. Two forces work against you. First, Google Hotels builds its result cards in the browser with JavaScript, so the initial HTML is a shell that only fills in after the page's scripts run. Second, Google flags automated traffic fast: datacenter IPs and request patterns that do not look like a real browser get challenged or blocked before they ever reach the rendered content.

So a working Google Hotels scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Google Hotels is client-side rendered, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would, and there is nothing to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs and any beginner course will get you to the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. New accounts include free requests with no card required. Treat the token like a password: it authenticates your requests, so keep it out of version control.
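
One common way to keep the token out of version control is to read it from an environment variable instead of pasting it into the script. This is a minimal sketch; the variable name CRAWLBASE_JS_TOKEN and the load_token helper are assumptions made for illustration, not something Crawlbase prescribes.

```python
import os


def load_token(var_name="CRAWLBASE_JS_TOKEN"):
    """Read the JS token from an environment variable (the name is an assumption)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

You would then pass load_token() wherever the scripts in this guide use the "YOUR_CRAWLBASE_JS_TOKEN" placeholder.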

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.

bash
python --version

python -m venv hotels_env
source hotels_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with hotels_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.

Step 1: Fetch the rendered results

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the search URL. Reading the status code before you parse keeps failures loud instead of silent, which matters a lot on a target that throttles.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

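def crawl(page_url):
    # ajax_wait waits for async content; page_wait holds 5000 ms after load.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    # pc_status is Crawlbase's own status for the request; "200" means success.
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {status}")
    return None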

if __name__ == "__main__":
    page_url = "https://www.google.com/travel/search?q=hotels+in+New+York&currency=USD"
    html = crawl(page_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Five seconds is a reasonable start; raise it if the result list comes back short or empty. Note the pc_status header: that is Crawlbase's own status for the underlying request, and it is the value to trust over the outer HTTP code when you decide whether a fetch succeeded. Run the script and you should see real listing markup, not the empty shell a plain fetch returns.
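
The advice to raise page_wait when the list comes back short can be automated. The sketch below retries with progressively longer waits until the card class appears in the HTML. It assumes a fetch callable that accepts an options dict (for example, a variant of crawl() that takes options as a second argument), and it uses the card class name from this guide as the marker. The specific wait values are illustrative, and fetch_until_rendered is a hypothetical helper.

```python
def fetch_until_rendered(fetch, page_url, waits=(5000, 8000, 12000),
                         marker="BcKagd"):
    """Retry with longer page_wait values until the card marker appears.

    `fetch(page_url, options)` is any callable returning HTML or None.
    Returns (html, wait_ms) on success, or (None, None) if every try fails.
    """
    for wait_ms in waits:
        options = {"ajax_wait": "true", "page_wait": wait_ms}
        html = fetch(page_url, options)
        # A rendered page should contain the card class at least once.
        if html and marker in html:
            return html, wait_ms
    return None, None
```

Every retry is another rendered request, so keep the list of waits short. If the marker never appears at any wait, the card class itself may have changed.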

Crawlbase Crawling API

Google Hotels needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public search on the free tier first.

Step 2: Parse the hotel cards with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each hotel from its result card. Google lays the listings out as repeating card elements, so you find all the cards, then read name, price, rating, and link out of each one. Guard every field so a missing rating or price does not crash the run.

python
from bs4 import BeautifulSoup

def parse_hotels(html):
    soup = BeautifulSoup(html, "html.parser")
    hotels = []

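    # Each result card is a div; the class names are obfuscated and may change.
    for card in soup.find_all("div", class_="BcKagd"):
        name = card.find("h2", class_="BgYkof")
        # A space-separated class_ string matches the exact class attribute value.
        price = card.find("span", class_="qQOQpe prxS3d")
        rating = card.find("span", class_="KFi5wf lA0BZ")
        link = card.find("a", class_="PVOOXe")
        # Guard against an anchor that has no href attribute.
        href = link.get("href") if link else None

        hotels.append({
            "name": name.text.strip() if name else None,
            "price": price.text.strip() if price else None,
            "rating": rating.text.strip() if rating else None,
            "link": "https://www.google.com" + href if href else None,
        })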

    return hotels

Each if name else None guard does the same job: it returns None when an element is missing instead of throwing on a .text call against nothing. That keeps the extraction resilient when one card lacks a price or has no rating yet, which is common across a result page. The link is read from the anchor's href and prefixed with the Google host, since the markup uses relative paths.
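
The parser returns price and rating as display strings such as "$90" and "4.4". If you want to sort or compare them, a small normalization step helps. This is a sketch under stated assumptions: prices use a comma as the thousands separator and a dot for decimals, as in the USD examples in this guide. Other locale formats would need their own handling. parse_price and parse_rating are hypothetical helpers.

```python
import re


def parse_price(text):
    """Convert a price string such as "$1,234" to a float, or None."""
    if not text:
        return None
    # Digits with optional comma separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group().replace(",", ""))


def parse_rating(text):
    """Convert a rating string such as "4.4" to a float, or None."""
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None
```

Both functions pass None through unchanged, so they compose with the guards in parse_hotels without extra checks.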

Selectors drift

Google's class names (BcKagd, BgYkof, qQOQpe prxS3d, and the rest) are obfuscated and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back as None across every card, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
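
The "None across every card" symptom is easy to check for automatically after each run. The sketch below reports which fields were missing on every record. drifted_fields is a hypothetical helper that works on the list of dicts parse_hotels returns.

```python
FIELDS = ("name", "price", "rating", "link")


def drifted_fields(hotels, fields=FIELDS):
    """Return the fields that are None on every record."""
    if not hotels:
        # No cards at all: the card selector itself may have drifted.
        return list(fields)
    return [f for f in fields if all(h.get(f) is None for h in hotels)]
```

An empty result means every field was found on at least one card. A non-empty result tells you which selectors to re-inspect first.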

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and write the structured records to a JSON file.

python
import json
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {status}")
    return None

def parse_hotels(html):
    soup = BeautifulSoup(html, "html.parser")
    hotels = []
    # Each result card is a div; the class names are obfuscated and may change.
    for card in soup.find_all("div", class_="BcKagd"):
        name = card.find("h2", class_="BgYkof")
        price = card.find("span", class_="qQOQpe prxS3d")
        rating = card.find("span", class_="KFi5wf lA0BZ")
        link = card.find("a", class_="PVOOXe")
        # Guard against an anchor that has no href attribute.
        href = link.get("href") if link else None
        hotels.append({
            "name": name.text.strip() if name else None,
            "price": price.text.strip() if price else None,
            "rating": rating.text.strip() if rating else None,
            "link": "https://www.google.com" + href if href else None,
        })
    return hotels

def main():
    page_url = "https://www.google.com/travel/search?q=hotels+in+New+York&currency=USD"
    html = crawl(page_url)
    if not html:
        return
    hotels = parse_hotels(html)
    with open("google_hotels.json", "w", encoding="utf-8") as f:
        json.dump(hotels, f, ensure_ascii=False, indent=2)
    print(f"Saved {len(hotels)} hotels to google_hotels.json")

if __name__ == "__main__":
    main()

What the output looks like

Run the full script and you get a clean list of structured records, ready to write to CSV or a database or feed into a price-tracking job.

json
[
  {
    "name": "The One Boutique Hotel",
    "price": "$90",
    "rating": "3.3",
    "link": "https://www.google.com/travel/search?q=New+York&..."
  },
  {
    "name": "Ly New York Hotel",
    "price": "$153",
    "rating": "4.4",
    "link": "https://www.google.com/travel/search?q=New+York&..."
  }
]
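
If CSV suits your pipeline better than JSON, the same records can be written with the standard library. This is a minimal sketch. write_hotels_csv is a hypothetical helper, and the column order simply follows the four fields used in this guide.

```python
import csv

FIELDNAMES = ["name", "price", "rating", "link"]


def write_hotels_csv(hotels, file_obj):
    """Write hotel records to an open text file object as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for hotel in hotels:
        # Missing values (None) are written as empty cells.
        writer.writerow({key: hotel.get(key) for key in FIELDNAMES})
```

To write a file, open it with open("google_hotels.csv", "w", newline="", encoding="utf-8") and pass the file object in. The newline="" argument is what the csv module expects so that it controls line endings itself.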

Loading more results

One search page shows a limited slice of hotels; Google reveals the rest behind a "more results" button that loads further cards in place. The Crawling API can drive that interaction for you. Pass a css_click_selector for the button and set ajax_wait so the API clicks, waits for the new cards to render, and then returns the expanded HTML. The same parse_hotels function reads the longer page without any changes.

python
def crawl_expanded(page_url, click_selector):
    options = {
        "ajax_wait": "true",
        "page_wait": 5000,
        "css_click_selector": click_selector,
    }
    response = api.get(page_url, options)
    if response["headers"].get("pc_status") == "200":
        return response["body"].decode("utf-8")
    return None

The selector for that button is itself an obfuscated value that Google rotates, so inspect the live page and confirm it before relying on it. Keep the number of expansions modest. Each click is another rendered request against a sensitive target, and stacking many of them in a tight loop is exactly the pattern that gets a run throttled.
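
Since crawl and crawl_expanded differ only in their options, one way to avoid duplicating them is a small options builder. The sketch below uses only the three parameters already shown in this guide. build_options is a hypothetical helper, not part of the crawlbase client.

```python
def build_options(page_wait=5000, click_selector=None):
    """Build the Crawling API options dict used by the crawl functions."""
    options = {"ajax_wait": "true", "page_wait": page_wait}
    if click_selector:
        # Only ask the API to click when a selector was supplied.
        options["css_click_selector"] = click_selector
    return options
```

With this in place, a single fetch function can call api.get(page_url, build_options(click_selector=selector)) and cover both the plain and the expanded case.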

Staying unblocked

Even with rendering handled, Google watches for scraper-shaped traffic harder than almost any other target. A few habits keep a run healthy, and they apply to any aggressive commercial site.

  • Pace your requests. Hammering searches in a tight loop is the fastest way to draw a challenge. Spread requests out and vary your queries instead of crawling one location at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning a non-200 pc_status or visible challenge markup is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
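
The pacing and back-off habits above can be sketched as a simple loop. This is an illustration under stated assumptions, not a tuned policy: the delay values are arbitrary starting points, fetch is any callable that returns HTML or None (like crawl() in this guide), and crawl_paced is a hypothetical helper. The sleep and jitter parameters exist so the loop can be tested without real waiting.

```python
import random
import time


def crawl_paced(urls, fetch, base_delay=5.0, max_delay=120.0,
                sleep=time.sleep, jitter=random.uniform):
    """Fetch URLs one at a time, pausing between requests and backing off on failure."""
    urls = list(urls)
    results = {}
    delay = base_delay
    for index, url in enumerate(urls):
        html = fetch(url)
        results[url] = html
        if html is None:
            delay = min(delay * 2, max_delay)  # failure: slow down
        else:
            delay = base_delay                 # success: reset the pace
        if index < len(urls) - 1:
            # Random jitter keeps the request rhythm from being perfectly regular.
            sleep(delay + jitter(0, delay / 2))
    return results
```

The delay doubles after each failed fetch, up to max_delay, and resets after a success, so a run slows down when the target starts pushing back.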

For the broader playbook, see the guides on how to bypass captchas while scraping Google and how to rotate proxies for scraping Google search results; the guide on how to bypass captchas while web scraping covers challenge handling in the general case. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.

Legality and responsible use

Whether scraping Google Hotels is allowed depends on Google's terms of service, your jurisdiction, and what you do with the data. Google's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the Google Terms of Service and the robots.txt for the travel surface, and treat both as the boundary for what you collect.
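
Checking a URL against a robots.txt file can be done with Python's standard library. The sketch below takes the robots.txt text as a string so you can download it once and reuse it. is_allowed is a hypothetical helper, and the result only reflects how Python's parser reads the rules; it is not a legal determination.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Pass in the contents of the live robots.txt for the site you are targeting, together with the exact URL you intend to request, and skip any URL for which this returns False.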

A few lines are worth holding to. Collect only public results data: the hotel name, price, rating, and link that anyone sees on a search without signing in. Respect Google's stated rate expectations and keep your request volume low enough that you are not straining its servers. Do not scrape anything behind a login, anything personalized to a signed-in account, or any data gated by authentication, and never attempt to bypass authentication to reach it. Avoid personal data, including anything tied to identifiable individuals beyond what is publicly listed.

This guide is deliberately scoped to public search results because that is the line that keeps the work defensible. Public results only, reasonable volume. If your project needs more than the public surface, a licensed travel-data feed or an official partnership is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Google Hotels is client-side rendered. A plain fetch returns an empty shell, so you must render the page before you parse it.
  • You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
  • BeautifulSoup does the extraction. Loop the result cards and read name, price, rating, and link from each, with a guard on every field, and expect the obfuscated selectors to drift.
  • Pace and rotate. Google throttles hard, so spread requests, rotate residential IPs, and read the pc_status header to know when to back off.
  • Stay on public data. Respect Google's ToS and robots.txt, collect only public results, keep volume reasonable, and never touch login-walled, personalized, or authenticated data.

Frequently Asked Questions (FAQs)

Why does a plain fetch return no hotels from Google Hotels?

Because Google Hotels builds its result cards client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the listings blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Google Hotels?

The JS token. The normal token fetches static HTML, which on Google Hotels is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the hotel cards are present when BeautifulSoup parses them.

My selectors return None on every card. What changed?

Almost certainly Google's markup. The class names on the cards (BcKagd, BgYkof, and the rest) are obfuscated and rotate without notice, so selectors that worked last month can break. Re-inspect a live results page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

How do I get more than one page of results?

Google reveals additional hotels behind a "more results" button rather than classic pagination. Pass that button's selector to the Crawling API with css_click_selector and set ajax_wait so the API clicks it, waits for the new cards to render, and returns the expanded HTML. Keep the number of expansions modest so you do not draw a challenge.

How do I avoid getting blocked while scraping Google Hotels?

Keep your per-IP request rate low, vary your queries instead of looping one location, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, the Smart AI Proxy gives you that rotation as a drop-in endpoint. Watch the pc_status header and back off when you start seeing non-200 responses.

Can I scrape booking or account data from Google Hotels?

No, and this guide does not cover it. Anything tied to a signed-in account, a booking flow, or personalized pricing sits behind authentication, so it is not public results data. Scraping login-walled or personalized content, or bypassing authentication to reach it, is out of scope here and runs against Google's terms. Stick to the public search surface and keep volume reasonable.
