Rotten Tomatoes is one of the most-referenced sources of movie ratings on the public web. Its movie pages carry the Tomatometer critics score, the audience score, the genre, and a link to the full title page, all of it visible to anyone without an account. For entertainment research, trend analysis, or a personal film database, that public rating data is genuinely useful to collect in a structured form.

This guide shows you how to scrape Rotten Tomatoes movie ratings with Python. The whole walkthrough stays scoped to public, non-personal data: the title, scores, genre, and page link that the site shows openly. It does not touch reviewer identities, full review text, or anything behind a login. Because Rotten Tomatoes renders its scores client-side with JavaScript, we route requests through the Crawlbase Crawling API so the page is fully rendered before we parse it.

What you will build

A small Python scraper that takes one or more public Rotten Tomatoes movie pages, fetches each rendered page through the Crawling API, parses a handful of public fields, and exports the result to JSON and CSV:

  • Movie title: the name of the film as shown on the page.
  • Tomatometer score: the critics score aggregated from approved critics.
  • Audience score: the aggregate public score for the film.
  • Genre: the category the film is classified under, such as comedy or drama.
  • Link: the canonical URL of the movie page on Rotten Tomatoes.

These are all public, aggregate facts about the film itself. The scraper handles multiple movies in one run and writes a clean dataset you can load into a notebook or a spreadsheet.

Why a plain request fails on Rotten Tomatoes

Request a Rotten Tomatoes movie URL with a bare HTTP client and the scores will not be there. The Tomatometer and audience numbers, along with much of the rating metadata, load dynamically through JavaScript after the initial HTML arrives. A library like requests only sees the first shell of markup, so the fields you care about come back empty. On top of that, repetitive automated traffic from a single datacenter IP tends to get challenged before the content ever renders.

A working scraper therefore needs two things in the same request: a browser that runs the page's JavaScript, and an IP address the site reads as an ordinary visitor. You can assemble that yourself with a headless browser and a pool of residential proxies, but maintaining that stack is most of the effort. The Crawling API folds both into one call. You send it a URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML you can hand straight to BeautifulSoup. For more background, see our guide on how to crawl JavaScript websites.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Rotten Tomatoes scores are injected client-side, so you need the JS token here. The normal token returns the same incomplete shell a plain fetch would.

Prerequisites

A few things to have in place first. None take long.

Basic Python. You should be comfortable running a script and installing packages with pip. If parsing HTML is new to you, our primer on how to use BeautifulSoup in Python covers the extraction side, and scrape a website with Python walks the full loop end to end.

Python 3.8 or later. Confirm with python --version. If you do not have it, install it from python.org.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. Crawlbase gives you 1,000 free requests to start, and you pay only for successful requests. Treat the token like a password and keep it out of version control.
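
One way to keep the token out of version control is to read it from an environment variable when the script starts. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN. That name is a hypothetical choice for this example, not something the Crawlbase client reads on its own.

```python
import os


def load_token(env=None, name="CRAWLBASE_JS_TOKEN"):
    """Return the token stored under `name`, or raise if it is unset.

    `name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable first")
    return token
```

You could then pass load_token() as the "token" value when you create the CrawlingAPI client, instead of pasting the token into the source file.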

Set up the project

Create an isolated virtual environment, then install the two libraries the scraper needs.

bash
python --version

python -m venv rt_env
source rt_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate with rt_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by selector.

Step 1: Fetch the rendered movie page

Start by getting the finished page. Import CrawlingAPI, initialize it with your JS token, and request a public movie URL. Two wait options matter for a client-rendered target: ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the scores appear before the page is captured. Check the status before parsing so failures stay loud instead of silent.

python
from crawlbase import CrawlingAPI

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

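def fetch_html(url):
    # ajax_wait waits for asynchronous content; page_wait adds a fixed pause in milliseconds.
    options = {"ajax_wait": "true", "page_wait": "5000"}
    response = crawling_api.get(url, options)
    # Read the status once; .get() avoids a KeyError if the header is absent.
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    print(f"Failed to fetch the page. Status code: {status}")
    return None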

if __name__ == "__main__":
    url = "https://www.rottentomatoes.com/m/beetlejuice_beetlejuice"
    html = fetch_html(url)
    print(html[:500] if html else "No HTML returned")

Five seconds is a reasonable starting point for page_wait; raise it if scores come back empty. The example uses a public movie page. Run the script and you should see real markup from the title page, which confirms rendering works before you write a single selector.
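
If you would rather not tune page_wait by hand, one option is to retry with progressively longer waits until the score markup shows up. The sketch below is only an illustration. It assumes a hypothetical callable fetch(url, page_wait), for example a thin wrapper around the request above that accepts the wait as an argument. The presence check is deliberately rough, and the wait values are arbitrary starting points.

```python
def has_scores(html):
    """Rough check: does the markup contain the critics score slot?"""
    return bool(html) and 'slot="criticsScore"' in html


def fetch_with_longer_waits(fetch, url, waits=(5000, 8000, 12000)):
    """Try each page_wait value in turn until the scores show up.

    `fetch(url, page_wait)` is a hypothetical callable that returns
    HTML on success and None on failure.
    Returns (html, page_wait_used); page_wait_used is None if no attempt
    produced score markup.
    """
    html = None
    for page_wait in waits:
        html = fetch(url, page_wait)
        if has_scores(html):
            return html, page_wait
    return html, None
```

Each attempt is a separate API request, so keep the list of waits short.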

Step 2: Inspect the page and parse the public fields

Before writing selectors, open a movie page in your browser and use the developer tools to find where each field lives. On a Rotten Tomatoes movie page the structure is stable enough to target directly:

  • Title sits in an <h1> element with a slot="titleIntro" attribute.
  • Tomatometer (critics) score is inside an rt-text element with slot="criticsScore".
  • Audience score is also in an rt-text element, with slot="audienceScore".
  • Genre appears in the movie details list, under a <dt> labelled Genre with its values in the matching <dd>.

With rendered HTML in hand, load it into BeautifulSoup and pull each field. The helper below reads the title and both scores by their slot selectors, then walks the details list to find the genre. Each lookup is guarded so a missing field returns an empty string instead of raising.

python
from bs4 import BeautifulSoup

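def text_or_blank(node):
    # Return the stripped text of a node, or "" when the selector matched nothing.
    return node.text.strip() if node else ""

def find_genre(soup):
    # Each label in the details list is an rt-text inside a dt.key element.
    for label in soup.select("dt.key rt-text"):
        if label.text.strip() == "Genre":
            # The values live in the dd that follows the label's dt.
            dd = label.find_parent("dt").find_next_sibling("dd")
            if dd:
                values = [v.text.strip() for v in dd.find_all(["rt-link", "rt-text"]) if v.text.strip()]
                return ", ".join(values)
    return ""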

def parse_movie(html, url):
    soup = BeautifulSoup(html, "html.parser")

    title = text_or_blank(soup.select_one('h1[slot="titleIntro"]'))
    critics_score = text_or_blank(soup.select_one('rt-text[slot="criticsScore"]'))
    audience_score = text_or_blank(soup.select_one('rt-text[slot="audienceScore"]'))
    genre = find_genre(soup)

    return {
        "title": title,
        "tomatometer_score": critics_score,
        "audience_score": audience_score,
        "genre": genre,
        "link": url,
    }

The two scores come straight from the criticsScore and audienceScore slots. The genre comes from the details list, where each label sits in a dt.key and the values in the matching dd. Joining the rt-link and rt-text values handles films tagged with more than one genre, such as comedy and fantasy.
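
The parser returns scores as display strings such as "77%", which is fine for export but awkward for sorting or arithmetic. A small converter like the one below turns them into integers. The name parse_percent is just a label chosen for this sketch, and returning None for anything that is not a plain number is one possible convention.

```python
def parse_percent(value):
    """Convert a score string such as "77%" to an int, or None if it is not a number."""
    cleaned = (value or "").strip().rstrip("%").strip()
    return int(cleaned) if cleaned.isdecimal() else None
```

Keeping the original string in the record and adding the numeric value as a separate field lets you see exactly what the page displayed if a conversion ever returns None.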

Selectors drift

Rotten Tomatoes changes its markup from time to time. The slot attributes used here are more stable than deeply nested class names, but if a field comes back blank, re-inspect the live page in your browser's dev tools and update the selector. Periodic maintenance is normal for any production scraper.
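
A cheap way to notice drift early is to check each parsed record for blank fields before saving it. The helper below is a sketch with hypothetical names. It works on the dictionary shape that parse_movie returns and reports which fields came back empty.

```python
REQUIRED_FIELDS = ("title", "tomatometer_score", "audience_score", "genre")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the names of required fields that are absent or blank."""
    return [name for name in required if not str(record.get(name) or "").strip()]
```

A blank score does not always mean a broken selector, since a film may simply not have a score yet. If the same field is blank across every movie in a run, though, the selector is the first thing to check.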

Step 3: Handle multiple movies and export

Most research starts from a list of films, not a single page. Wire the fetch and parse steps into one loop that walks a list of movie URLs, pace the requests, and write the collected rows to both JSON and CSV. JSON keeps the structure for a notebook; CSV drops straight into a spreadsheet. If you plan to feed this into analysis later, our guide on structuring and cleaning scraped data for AI and ML covers the next step.

python
import csv
import json
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

MOVIE_URLS = [
    "https://www.rottentomatoes.com/m/beetlejuice_beetlejuice",
    "https://www.rottentomatoes.com/m/deadpool_and_wolverine",
    "https://www.rottentomatoes.com/m/twisters",
]

def save_to_json(rows, filename="movies.json"):
    # An explicit encoding keeps the output identical across platforms.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=4)
    print(f"Saved {len(rows)} movies to {filename}")

def save_to_csv(rows, filename="movies.csv"):
    fields = ["title", "tomatometer_score", "audience_score", "genre", "link"]
    # newline="" lets the csv module control line endings; UTF-8 avoids
    # encoding errors for titles with non-ASCII characters.
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} movies to {filename}")

def main():
    movies = []
    for url in MOVIE_URLS:
        html = fetch_html(url)
        if html:
            movies.append(parse_movie(html, url))
        time.sleep(3)

    save_to_json(movies)
    save_to_csv(movies)

if __name__ == "__main__":
    main()

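This reuses the fetch_html and parse_movie helpers from the previous steps, so paste all three blocks into one file. When you combine them, keep a single set of imports and one crawling_api client, and remove the if __name__ == "__main__": test block from Step 1 so that only main() runs. The time.sleep(3) between requests is not decoration: pacing is the single biggest factor in whether a run stays healthy. Drop your own movie URLs into MOVIE_URLS and the script collects each one in turn.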

What the output looks like

Run the full script and you get a clean record of public fields per movie, ready to load into a notebook or a spreadsheet. The sample below shows two of those records.

json
[
    {
        "title": "Beetlejuice Beetlejuice",
        "tomatometer_score": "77%",
        "audience_score": "81%",
        "genre": "Comedy, Fantasy",
        "link": "https://www.rottentomatoes.com/m/beetlejuice_beetlejuice"
    },
    {
        "title": "Deadpool & Wolverine",
        "tomatometer_score": "79%",
        "audience_score": "95%",
        "genre": "Action, Comedy",
        "link": "https://www.rottentomatoes.com/m/deadpool_and_wolverine"
    }
]

The CSV mirror carries the same columns, one row per film, with a header line. From there you can sort by Tomatometer, compare critics against audience scores, or filter by genre for whatever entertainment research you have in mind.
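
As one illustration of that kind of comparison, the sketch below reads the CSV text and ranks films by how far the audience score sits above the critics score. It assumes the column names written by save_to_csv. The function name score_gaps is hypothetical, and rows with a blank or non-numeric score are skipped.

```python
import csv
import io


def score_gaps(csv_text):
    """Return (title, audience minus critics) pairs, largest gap first."""
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            critics = int((row.get("tomatometer_score") or "").rstrip("%"))
            audience = int((row.get("audience_score") or "").rstrip("%"))
        except ValueError:
            continue  # skip rows where either score is blank or non-numeric
        gaps.append((row["title"], audience - critics))
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)
```

Pass it the contents of movies.csv. With the two sample records shown above, Deadpool & Wolverine comes first with a gap of 16 points, followed by Beetlejuice Beetlejuice with 4.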

Scaling to more movies and staying unblocked

The pattern above scales in two ways: extend MOVIE_URLS with more titles, or add a discovery step that first collects movie links from a public browse page, such as the top box office list, and then visits each title page. A few habits keep a larger run healthy, and they apply to any defended target.

  • Pace your requests. Keep the delay between calls, and resist the urge to parallelize aggressively. Throttling your own traffic is the most reliable way to get a clean run.
  • Lean on rotation. The Crawling API spreads requests across residential IPs for you, so no single address trips a rate limit. If you build your own stack, this is the part to get right.
  • Read the status codes. When a run starts returning non-200 statuses, back off rather than pushing harder.
  • Keep volume sensible. Public-rating research rarely needs the whole catalogue. Sample the films you care about and stop.
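
The "back off rather than pushing harder" habit can be sketched as a retry wrapper that doubles its pause after each failure. This is a minimal illustration, not a tuned policy. It assumes fetch is any callable that returns HTML on success and None otherwise, as fetch_html from Step 1 does, and the retry count and delays are arbitrary defaults.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=5.0, sleep=time.sleep):
    """Call `fetch(url)` until it returns HTML, doubling the pause after each failure.

    Returns the HTML, or None if every attempt failed.
    `sleep` is injectable so the function can be exercised without real waiting.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < retries:
            sleep(delay)
            delay *= 2
    return None
```

In the main loop you could call fetch_with_backoff(fetch_html, url) in place of fetch_html(url) and keep the fixed pause between different movies.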

For the broader playbook, see our guide on how to scrape websites without getting blocked.

Legal and ethical considerations

This is the section to read before you write production code. The approach here is scoped to public, non-personal rating data: the movie title, the Tomatometer and audience scores, the genre, and the page link. Those are aggregate facts about a film, not personal data about an individual, which keeps this firmly on the educational, public-data side. Even so, collecting them responsibly means respecting Rotten Tomatoes' Terms of Service and its robots.txt, and pacing your requests so you do not put load on the site.
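
Part of that check can be automated. The sketch below uses Python's standard urllib.robotparser to test a URL against the text of a robots.txt file. The function name is hypothetical, and it takes the rules as a string so you can fetch the live file however you prefer. It covers robots.txt only and says nothing about the Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only; these are not Rotten Tomatoes' actual rules.
EXAMPLE_RULES = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(EXAMPLE_RULES, "https://example.com/m/some_film"))
print(allowed_by_robots(EXAMPLE_RULES, "https://example.com/private/page"))
```

Run the same function against the site's current robots.txt before you start a crawl, and skip any URL it rejects.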

There are clear lines not to cross. Do not republish copyrighted material: the full text of individual critic reviews, editorial write-ups, images, and video are protected content, and aggregating someone's review prose or tying it to a named critic is a different activity from recording a public score. Stay with the numbers, the genre, and the link. Do not attempt to reach anything behind a login, and do not collect personal data about reviewers or users. Where personal data is ever involved, privacy regimes such as GDPR and CCPA apply, including a lawful basis for collection and honoring deletion requests.

If you need richer or large-scale movie data for a real project, the sanctioned route is a licensed data source. Rotten Tomatoes data is available through official partnerships and the Fandango family of services, and there are licensed movie databases built for programmatic access. For anything ongoing or commercial, an official agreement gives you guaranteed structure and keeps you inside the terms, which a scraper cannot promise. Treat this walkthrough as a technical exercise in reading public ratings, not as a license to mirror the site.

Recap

Key takeaways

  • Rotten Tomatoes is JavaScript-rendered. Scores load client-side, so a plain request returns an incomplete shell; you must render the page before you parse it.
  • Rendering and a trusted IP belong in one call. The Crawling API with a JS token does both, and ajax_wait plus page_wait control how long it waits for the scores.
  • Target stable slots. The titleIntro, criticsScore, and audienceScore slots, plus the details list, are more durable than nested class names.
  • Export to JSON and CSV. JSON keeps the structure for analysis; CSV drops into a spreadsheet, both with the same public fields per film.
  • Public ratings only. Collect titles, scores, genre, and links; never republish copyrighted review text, and respect the ToS and robots.txt.

Frequently Asked Questions (FAQs)

Why does a plain request return no scores from Rotten Tomatoes?

Because the Tomatometer and audience scores load client-side with JavaScript after the initial HTML arrives. A raw HTTP request with a library like requests only sees the first shell of markup, so those fields come back empty. Rendering the page first, which the Crawling API's JS token handles, is what makes the scores available to parse.

Do I need the normal token or the JS token?

The JS token. The normal token fetches static HTML, which on Rotten Tomatoes is the same incomplete shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the scores and genre are present when BeautifulSoup parses them.

What Rotten Tomatoes data is safe to scrape?

Public, non-personal facts about a film: the title, the Tomatometer score, the audience score, the genre, and the page link. Avoid republishing copyrighted material such as the full text of critic reviews, and do not collect personal data about reviewers or users. Stay with the aggregate ratings and respect the site's terms and robots.txt.

How do I scrape ratings for many movies at once?

Put the movie URLs in a list and loop over them, calling the fetch and parse helpers for each, with a short delay between requests. You can also add a discovery step that first collects links from a public browse page, then visits each title page. Keep the volume sensible and write the results to JSON or CSV as you go.
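
A discovery step could look roughly like the sketch below. This guide does not document the markup of any browse page, so the code makes one loud assumption: title links are ordinary anchors whose path has the form /m/<slug>, matching the movie URLs used earlier. It uses the standard library's html.parser to stay dependency-free, though BeautifulSoup would work just as well. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.rottentomatoes.com"


class MovieLinkCollector(HTMLParser):
    """Collect unique /m/<slug> links in the order they appear."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Drop query strings, fragments, and trailing slashes before matching.
        path = href.split("?")[0].split("#")[0].rstrip("/")
        parts = path.replace(BASE_URL, "").split("/")
        # Assumption: a title page path is exactly /m/<slug>.
        if len(parts) == 3 and parts[0] == "" and parts[1] == "m" and parts[2]:
            url = urljoin(BASE_URL, "/m/" + parts[2])
            if url not in self.links:
                self.links.append(url)


def collect_movie_links(html):
    """Return the de-duplicated movie page URLs found in rendered HTML."""
    parser = MovieLinkCollector()
    parser.feed(html)
    return parser.links
```

Feed it the rendered HTML of a browse page and pass the resulting list to the same loop that walks MOVIE_URLS.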

What happens if Rotten Tomatoes changes its layout?

Your selectors may stop matching and fields will come back blank. Re-inspect the live page in your browser's dev tools, find the new attribute or element for the field, and update the selector. Leaning on the slot attributes rather than deeply nested class names reduces how often this happens, but periodic maintenance is normal for any scraper.

Should I use an official source instead of scraping?

For anything ongoing or commercial, yes. Rotten Tomatoes data is available through official partnerships and the Fandango family of services, and licensed movie databases exist for programmatic access. An official agreement gives you guaranteed structure and keeps you inside the terms. The scraping approach here fits lightweight, public-data research where no licensed access is in place.
