TripAdvisor sits on one of the largest pools of user-generated travel data on the web: hundreds of millions of reviews, ratings, and listings for hotels, restaurants, and attractions. That makes it a tempting source for market research, competitor benchmarking, and reputation tracking. The catch is that TripAdvisor renders most of its content with JavaScript and defends hard against automated traffic, so a plain HTTP request hands you a near-empty page and, often, a block.

This guide shows you how to scrape TripAdvisor the reliable way with Python: route your requests through the Crawlbase Smart AI Proxy, let it render the page and rotate IPs for you, then parse the returned HTML with BeautifulSoup to pull names, ratings, review counts, and locations from public search listings. Everything here is runnable, and the whole walkthrough is scoped to public data only.

Scraping a large commercial platform sits in a legal gray area, and whether it is allowed depends on TripAdvisor's terms of use, your jurisdiction, and what you do with the data. TripAdvisor's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work.

A few lines worth holding to. Collect only public data: listing names, ratings, review counts, and locations that anyone can see without an account. Respect TripAdvisor's robots.txt and its stated rate expectations, and keep your request volume low enough that you are not straining anyone's servers. Reviews are user-generated content tied to real people, so do not republish them wholesale or scrape anything that identifies an individual. If you plan to reuse the data commercially, get permission or an official data agreement rather than assuming silence is consent.

This guide is deliberately scoped to public listing data because that is the line that keeps the work defensible. It does not cover anything behind a login, account or profile data, private messages, or any attempt to bypass authentication. If your project needs more than public listings, the right move is an official API or a data agreement with TripAdvisor, not a cleverer scraper.

Why scrape TripAdvisor

Public travel sentiment moves constantly, and a single page view only tells you what a listing looks like right now. Scraping TripAdvisor's public listings lets you track ratings, review counts, and rankings over time, which is what most research jobs actually depend on. A few concrete uses:

  • Market research. Spot popular destinations and shifting customer preferences by watching review volume and ratings across a category.
  • Competitive analysis. Benchmark your hotel, restaurant, or attraction against rivals on rating and review count in the same market.
  • Reputation monitoring. Track how your own listing's rating trends over time so you can react to dips early.
  • Trend discovery. Aggregate sentiment across many listings to surface emerging patterns a single page would never reveal.

For an engineer, the value is structured data: turning a rendered search page into clean rows you can query, chart, or feed into a model.

Why a plain fetch fails here

If you request a TripAdvisor URL with a bare HTTP client, you typically get a response with status 200 and almost none of the data you came for. Two things work against you. First, TripAdvisor renders its listings in the browser with JavaScript, so the initial HTML is a shell that fills in only after the page's scripts run. Second, TripAdvisor flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged or blocked before they ever reach the rendered content.

So a working TripAdvisor scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawlbase Smart AI Proxy folds both into a single endpoint: you point your normal HTTP client at it, pass a couple of parameters, and it renders the page behind a trusted, rotating IP and returns the finished HTML.

What the Smart AI Proxy is

The Smart AI Proxy is a drop-in proxy endpoint backed by Crawlbase's Crawling API. You keep using requests (or any client) as normal and just route through it. Under the hood it rotates a large pool of datacenter and residential IPs and can render JavaScript, so you get the power of the Crawling API without rewriting your code around a new SDK.

Set up the environment

You need Python 3.8 or later. Confirm your version, create a virtual environment so project dependencies stay isolated, then install the three libraries this guide uses.

bash
python --version

python -m venv tripadvisor_env
source tripadvisor_env/bin/activate

pip install requests beautifulsoup4 pandas

On Windows, activate the environment with tripadvisor_env\Scripts\activate instead of the source line. The three dependencies do the work: requests makes the HTTP calls, beautifulsoup4 parses the returned HTML, and pandas organizes your results and writes them to a file.

You also need a Crawlbase account and an access token, which you get from the dashboard after signing up. Drop it into the code wherever you see YOUR_ACCESS_TOKEN.

Send a request through the Smart AI Proxy

The Smart AI Proxy is just an HTTP proxy endpoint, so you configure it the same way you would any proxy in requests. Put your access token in the proxy URL, point both the http and https entries at it, and make a normal GET request. Because the proxy terminates TLS for you, pass verify=False to skip local certificate checks.

python
import requests

proxy_url = "http://YOUR_ACCESS_TOKEN:@smartproxy.crawlbase.com:8012"
target_url = "https://www.tripadvisor.com/Search?q=london"

proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get(target_url, proxies=proxies, verify=False)

print("Status:", response.status_code)
print(response.content[:500])

That single call is the whole integration. Your code stays plain requests; the proxy handles IP rotation and the trusted-visitor reputation that keeps you from getting blocked on the first hit.
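
If you would rather not hard-code the token, one option is to read it from an environment variable and build the proxies dict from it. This is a minimal sketch, not part of Crawlbase's documented setup: the variable name CRAWLBASE_TOKEN and the helper build_proxies are assumptions made for illustration, while the host and port are the ones used above.

```python
import os


def build_proxies(token, host="smartproxy.crawlbase.com", port=8012):
    """Return a requests-style proxies dict for the given access token."""
    if not token:
        raise ValueError("access token is empty")
    # The token goes in the username slot; the password slot stays empty.
    proxy_url = f"http://{token}:@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# CRAWLBASE_TOKEN is a hypothetical variable name; fall back to the placeholder.
token = os.environ.get("CRAWLBASE_TOKEN", "YOUR_ACCESS_TOKEN")
proxies = build_proxies(token)
```

The resulting proxies dict can be passed to requests.get exactly as in the snippet above.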

Pass Crawling API parameters

The Smart AI Proxy exposes the same controls as the Crawling API through a single request header, CrawlbaseAPI-Parameters. You pass options as a query-string-style value. For example, to geolocate the request to the United States, set the country parameter:

python
import requests

proxy_url = "http://YOUR_ACCESS_TOKEN:@smartproxy.crawlbase.com:8012"
target_url = "https://www.tripadvisor.com/Search?q=london"

headers = {"CrawlbaseAPI-Parameters": "country=US"}
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get(target_url, headers=headers, proxies=proxies, verify=False)
print(response.content[:500])

You can chain several options with & in the same header value, which is exactly how you turn on JavaScript rendering next.
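
Since the header value is just a query string, you can assemble it from keyword arguments instead of writing it by hand. The helper below is a hypothetical convenience for this guide, not a Crawlbase API; it only uses the standard library's urlencode and the parameter names already mentioned here.

```python
from urllib.parse import urlencode


def build_params_header(**options):
    """Encode keyword options into a CrawlbaseAPI-Parameters header dict."""
    normalized = {}
    for key, value in options.items():
        # Python booleans become lowercase strings, matching javascript=true.
        if isinstance(value, bool):
            value = "true" if value else "false"
        normalized[key] = value
    return {"CrawlbaseAPI-Parameters": urlencode(normalized)}


headers = build_params_header(javascript=True, page_wait=5000)
print(headers)
# {'CrawlbaseAPI-Parameters': 'javascript=true&page_wait=5000'}
```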

Render JavaScript-heavy pages

TripAdvisor loads its listings client-side, so without rendering you get the empty shell again. Turn on a headless browser with javascript=true, and add page_wait to hold for a fixed number of milliseconds after load so late-rendering elements appear before the page is captured. Five seconds is a reasonable start; raise it if results come back thin.

python
import requests

proxy_url = "http://YOUR_ACCESS_TOKEN:@smartproxy.crawlbase.com:8012"
target_url = "https://www.tripadvisor.com/Search?q=london"

headers = {"CrawlbaseAPI-Parameters": "javascript=true&page_wait=5000"}
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get(target_url, headers=headers, proxies=proxies, verify=False)
html_content = response.content.decode("utf-8")

print("Fetched", len(html_content), "characters of rendered HTML")

With JavaScript rendering on, the response body now contains the populated listings instead of a blank scaffold. That is the HTML you will hand to BeautifulSoup.
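
Because an unrendered page can still come back with status 200, it can help to check the body before parsing. The sketch below simply counts occurrences of a marker string; the function name looks_rendered and the choice of result-title as the marker are assumptions, and the marker should be swapped for whatever class the live page currently uses.

```python
def looks_rendered(html, marker="result-title", min_hits=1):
    """Return True when the marker appears at least min_hits times in html."""
    return html.count(marker) >= min_hits


# Two tiny stand-in documents: an empty shell and a page with one listing.
shell = "<html><body><div id='root'></div></body></html>"
rendered = "<div class='result'><div class='result-title'><span>London Eye</span></div></div>"

print(looks_rendered(shell))     # False
print(looks_rendered(rendered))  # True
```

If the check fails on a real response, raising page_wait is the first thing to try.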

Parse the search listings with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and walk the result cards. The first job is finding the container that holds the search results. Open the page in your browser, right-click a result, and choose Inspect to read the live markup.

In TripAdvisor's search view, each listing sits in a div with the class result, and those live inside a results list keyed by data-widget-type="LOCATIONS". Select that and loop the cards:

python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, "html.parser")

search_results = soup.select(
    'div.search-results-list[data-widget-type="LOCATIONS"] div.result'
)

for result in search_results:
    # extract fields from each result here
    pass

Now map each field to its selector. Inspecting a card shows where the name, rating, review count, and location live:

  • Name sits in a <span> inside a div.result-title.
  • Rating is in a span.ui_bubble_rating, with the value stored in its alt attribute.
  • Review count is the text of an a.review_count link.
  • Location is the text of a div.address-text.

python
# Each lookup guards against a missing element so one absent field does not crash the loop.
name_el = result.select_one("div.result-title span")
name = name_el.text.strip() if name_el else None

rating_el = result.select_one("span.ui_bubble_rating")
# .get() returns None instead of raising KeyError when the alt attribute is absent.
rating = rating_el.get("alt") if rating_el else None

reviews_el = result.select_one("a.review_count")
reviews = reviews_el.text.strip() if reviews_el else None

location_el = result.select_one("div.address-text")
location = location_el.text.strip() if location_el else None

Selectors drift

TripAdvisor updates its markup without notice, so class names like result-title and ui_bubble_rating can change. Treat the selectors above as a starting template, not a contract. When a field comes back as None, re-inspect a live search page in your browser's dev tools and update the selector. This is normal maintenance for any production scraper, not a sign something is broken.
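
One way to notice drift early is to measure how often each field comes back empty across a batch of parsed rows. This is a rough sketch: the function name missing_rates and the 50% threshold are arbitrary choices for illustration, and the sample rows are made up.

```python
def missing_rates(rows, fields=("name", "rating", "reviews", "location")):
    """Return the fraction of rows where each field is None or absent."""
    if not rows:
        # No rows at all is treated as every field missing.
        return {field: 1.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / len(rows)
        for field in fields
    }


# Made-up rows in the same shape the parser produces.
sample = [
    {"name": "London Eye", "rating": None, "reviews": "89,766 reviews", "location": "London"},
    {"name": "Big Bus", "rating": None, "reviews": None, "location": "London"},
]
for field, rate in missing_rates(sample).items():
    if rate > 0.5:  # hypothetical threshold
        print(f"{field}: {rate:.0%} missing, selector may have drifted")
```

A field that is missing in most rows points at a changed selector rather than at genuinely absent data.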

The full scraper

Here is everything wired into one runnable file. It fetches the rendered search page through the Smart AI Proxy, parses each result, collects the rows, and prints them as JSON.

python
import json
import requests
from bs4 import BeautifulSoup

proxy_url = "http://YOUR_ACCESS_TOKEN:@smartproxy.crawlbase.com:8012"
target_url = "https://www.tripadvisor.com/Search?q=london"

headers = {"CrawlbaseAPI-Parameters": "javascript=true&page_wait=5000"}
proxies = {"http": proxy_url, "https": proxy_url}


def parse_results(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select(
        'div.search-results-list[data-widget-type="LOCATIONS"] div.result'
    )
    rows = []
    for result in cards:
        name_el = result.select_one("div.result-title span")
        rating_el = result.select_one("span.ui_bubble_rating")
        reviews_el = result.select_one("a.review_count")
        location_el = result.select_one("div.address-text")
        rows.append({
            "name": name_el.text.strip() if name_el else None,
            # .get() avoids a KeyError when the rating span has no alt attribute
            "rating": rating_el.get("alt") if rating_el else None,
            "reviews": reviews_el.text.strip() if reviews_el else None,
            "location": location_el.text.strip() if location_el else None,
        })
    return rows


def main():
    response = requests.get(target_url, headers=headers, proxies=proxies, verify=False)
    html = response.content.decode("utf-8")
    rows = parse_results(html)
    print(json.dumps(rows, indent=2))


if __name__ == "__main__":
    main()

What the output looks like

Save the script as tripadvisor_scraper.py, run it with python tripadvisor_scraper.py, and you get an array of structured listing objects. A trimmed sample:

json
[
  {
    "name": "London Eye",
    "rating": "4.5 of 5 bubbles",
    "reviews": "89,766 reviews",
    "location": "Westminster Bridge Road, London, England, United Kingdom"
  },
  {
    "name": "Big Bus London Hop-On Hop-Off Tour and River Cruise",
    "rating": "4 of 5 bubbles",
    "reviews": "8,656 reviews",
    "location": "London, England, United Kingdom"
  },
  {
    "name": "North London Skydiving Centre",
    "rating": "5 of 5 bubbles",
    "reviews": "2,889 reviews",
    "location": "Block Fen Drove, Wimblington, Cambridgeshire, England, United Kingdom"
  }
]
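
The rating and reviews values arrive as display strings, which are awkward to sort or chart. A small sketch of turning them into numbers follows; it assumes the English formats shown in the sample ("4.5 of 5 bubbles", "89,766 reviews") with commas as thousands separators, and the helper names are invented for this example.

```python
import re


def parse_rating(text):
    """Turn '4.5 of 5 bubbles' into 4.5; return None if no number leads the text."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text or "")
    return float(match.group(1)) if match else None


def parse_review_count(text):
    """Turn '89,766 reviews' into 89766; return None if no digits are found."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group(0).replace(",", "")) if match else None


print(parse_rating("4.5 of 5 bubbles"))      # 4.5
print(parse_review_count("89,766 reviews"))  # 89766
```

Both helpers return None for missing input, so they can be applied directly to rows where a selector found nothing.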

Handle pagination

One page of results is a demo; a real job walks several. TripAdvisor uses the o (offset) query parameter to page through search results, advancing by the number of results per page. Loop over a range, bump the offset each time, and collect the rows into one list.

python
import time

# Reuses headers, proxies, and parse_results from the full script above.
base_url = "https://www.tripadvisor.com/Search?q=london"
page_size = 30  # results per page
all_rows = []

for page in range(5):  # adjust to the number of pages you want
    target_url = f"{base_url}&o={page * page_size}"
    response = requests.get(target_url, headers=headers, proxies=proxies, verify=False)
    html = response.content.decode("utf-8")
    all_rows.extend(parse_results(html))
    time.sleep(2)  # pause between pages instead of firing requests back to back

print(f"Collected {len(all_rows)} listings")

Reusing parse_results from the full script keeps the extraction logic in one place, so each page goes through the same selectors. Pace the loop so you are not firing requests back to back; the proxy rotates IPs for you, but a steady rhythm keeps a run healthy on a defensive site.
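
If you do not know in advance how many pages a search has, a variation is to stop as soon as a page yields no rows. The sketch below takes the fetch and parse steps as arguments so it can be tried without network access; collect_pages, fetch_html, and the default delay are illustrative names and values, and treating an empty page as the end of the results is an assumption.

```python
import time


def collect_pages(fetch_html, parse, base_url, max_pages=5, page_size=30,
                  delay=2.0, sleep=time.sleep):
    """Walk offset pages, stopping early when a page yields no rows."""
    all_rows = []
    for page in range(max_pages):
        url = f"{base_url}&o={page * page_size}"
        rows = parse(fetch_html(url))
        if not rows:
            break  # assumed to mean we ran past the last page of results
        all_rows.extend(rows)
        if page < max_pages - 1:
            sleep(delay)  # pause between requests instead of firing back to back
    return all_rows
```

In the scraper, fetch_html would wrap the requests.get call through the proxy and return the decoded HTML, and parse would be parse_results.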

Save the data to Excel

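Logging to the console is fine while you iterate, but you want the data on disk. With pandas, turning your rows into an Excel file that anyone can open in a spreadsheet is two lines. pandas writes .xlsx files through the openpyxl package, which the install command earlier does not include, so run pip install openpyxl first.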

python
import pandas as pd

df = pd.DataFrame(all_rows)
df.to_excel("tripadvisor_data.xlsx", index=False)
print("Saved tripadvisor_data.xlsx")

Each object key becomes a column, so you end up with a tidy sheet of name, rating, reviews, and location. If you would rather query the data with SQL, write the same rows into a SQLite table instead; the parsing stays identical.
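
For the SQLite route, the standard library's sqlite3 module is enough. A minimal sketch, assuming rows shaped like the parser output above; the table name listings, the all-TEXT columns, and the file name tripadvisor_data.db are choices made for this example.

```python
import sqlite3


def save_rows_sqlite(rows, db_path="tripadvisor_data.db"):
    """Insert listing dicts into a 'listings' table and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS listings "
            "(name TEXT, rating TEXT, reviews TEXT, location TEXT)"
        )
        # Named placeholders pull each value from the matching dict key.
        conn.executemany(
            "INSERT INTO listings VALUES (:name, :rating, :reviews, :location)",
            rows,
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
    finally:
        conn.close()
```

Calling save_rows_sqlite(all_rows) after the pagination loop writes every collected listing; running it again appends rather than replaces, so clear the table first if you want a fresh snapshot.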

Staying unblocked

Even with rendering and IP rotation handled by the Smart AI Proxy, TripAdvisor watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering the same search in a tight loop is the fastest way to get throttled. Spread requests out and vary your search queries.
  • Lean on rotation. A pool of residential proxies spreads requests across many real-user IPs so no single address trips a rate limit. The Smart AI Proxy handles this for you; if you roll your own, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat those responses as signal, back off, and retry later.
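
The back-off-and-retry habit from the list above can be sketched as a small wrapper. It takes the fetch step as a callable returning a (status_code, body) pair so the logic can be exercised without a network call; the function name, the retry count, the doubling delays, and treating any non-200 status as a reason to wait are all assumptions for illustration.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200 or retries run out.

    fetch must return a (status_code, body) tuple.
    """
    status = None
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if attempt < retries:
            # Wait longer after each failure: 2s, 4s, 8s with the defaults.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"gave up on {url} after {retries + 1} attempts (last status {status})"
    )
```

With requests, fetch could be a small function that performs the proxied GET and returns response.status_code together with the decoded body.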

For the broader playbook, see how to scrape websites without getting blocked.

Recap

Key takeaways

  • TripAdvisor is client-side rendered. A plain fetch returns a near-empty shell, so you must render the page before you parse it.
  • The Smart AI Proxy is a drop-in fix. Route plain requests through it, pass javascript=true, and you get rendering plus IP rotation in one endpoint, no new SDK.
  • Tune with parameters. Send options like country and page_wait through the CrawlbaseAPI-Parameters header to geolocate and wait for content.
  • BeautifulSoup does the extraction. Map name, rating, reviews, and location to current selectors, and expect those selectors to drift over time.
  • Stay on public data. Respect TripAdvisor's terms of use and robots.txt; no accounts, no personal data, and no republishing reviews wholesale.

Frequently Asked Questions (FAQs)

Is it legal to scrape TripAdvisor?

It depends on TripAdvisor's terms of use, your jurisdiction, and your purpose, and their terms restrict automated access. Keep strictly to public listing data such as names, ratings, review counts, and locations, respect robots.txt and stated rate expectations, and never touch accounts, personal data, or login-walled content. Reviews are user content tied to real people, so do not republish them wholesale. For commercial reuse, get permission or an official data agreement rather than relying on a scraper.

Why does a plain fetch return no data from TripAdvisor?

Because TripAdvisor renders its listings client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the fields blank. To get real data you have to render the page first, which is what the Smart AI Proxy's javascript=true parameter handles for you.

How do I handle dynamic content loading on TripAdvisor?

Enable JavaScript rendering through the Smart AI Proxy by passing javascript=true in the CrawlbaseAPI-Parameters header, and add page_wait to hold a few seconds after load so late-rendering elements appear. That combination runs the page in a real headless browser before returning the HTML, so the dynamic listings are present when BeautifulSoup parses them.

Can I scrape multiple pages of TripAdvisor search results?

Yes. TripAdvisor uses the o offset parameter to page through search results, so you loop over a range, bump the offset by the number of results per page each time, and collect the rows into one list. Pace the loop instead of firing requests back to back, and reuse the same parsing function for every page.

My selectors return None. What changed?

Almost certainly TripAdvisor's markup. Class names like result-title and ui_bubble_rating change without notice, so selectors that worked last month can break. Re-inspect a live search page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Does TripAdvisor allow web scraping?

TripAdvisor's terms restrict automated access, so it does not broadly permit scraping. That said, collecting publicly visible data like listing names, ratings, review counts, and locations through a tool such as the Smart AI Proxy is the lower-risk path when done responsibly: stay on public data, respect robots.txt and rate limits, and avoid anything tied to individual users.
