Realtor.com is one of the largest property portals in the United States, and its listing pages carry exactly the structured data that powers price tracking, market research, and investment analysis: the list price, beds, baths, square footage, the street address, and a link back to each listing. That public data is the raw material behind any serious read on a local housing market, but the pages render client-side and the site defends hard against automated traffic, so a plain HTTP request hands you a thin shell instead of the listings you came for.

This guide shows you how to scrape Realtor.com with Python the reliable way. You build a small, runnable scraper that fetches a rendered search page through the Crawling API, reads the listing data Realtor.com embeds in a hidden __NEXT_DATA__ script, extracts the fields you want, handles pagination, and exports clean JSON and CSV. We keep the whole walkthrough scoped to public listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a public Realtor.com search URL for a city and state, retrieves the rendered HTML through the Crawling API, parses the embedded listing dataset, and writes one structured record per property. We will use a single city as the running example and pull these fields:

  • Price: the list price shown on the listing.
  • Beds: the number of bedrooms.
  • Baths: the consolidated bathroom count.
  • Sqft: the interior square footage of the home.
  • Address: the street, city, state, and postal code.
  • Link: the canonical URL of the listing, rebuilt from its permalink.

Why a plain request fails on Realtor.com

If you request a Realtor.com search URL with a bare HTTP client, you get a response with status 200 and almost none of the listing data in the visible markup. Two things work against you. First, Realtor.com is a Next.js application that hydrates its listings in the browser, so the data lives inside a JSON blob in a hidden <script id="__NEXT_DATA__"> tag rather than in rendered HTML elements you can read directly. Second, the site flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged or served a captcha before they ever reach a complete page.

So a working Realtor.com scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML with the __NEXT_DATA__ payload intact for you to parse.

Where the data lives

Right-click a Realtor.com page and choose Inspect, then search the HTML for __NEXT_DATA__. That single hidden script holds the full listing dataset the page renders from, including fields the visible layout never shows. Reading it is far more stable than scraping individual DOM elements, because the JSON keys change less often than the CSS class names around them.
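The same check can be done in code before any parsing. The sketch below is a minimal illustration: has_next_data is a hypothetical helper name, and the two sample strings are made up for the example rather than taken from real Realtor.com responses.

```python
import re

# Matches the opening tag of the hidden script regardless of attribute order.
NEXT_DATA_TAG = re.compile(r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>', re.IGNORECASE)


def has_next_data(html):
    """Return True if the HTML contains a __NEXT_DATA__ script tag."""
    return bool(html) and NEXT_DATA_TAG.search(html) is not None


# Illustrative inputs only (assumed shapes, not captured responses).
shell = "<html><body><div id='__next'></div></body></html>"
rendered = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {}}</script></html>'
)

print(has_next_data(shell))
print(has_next_data(rendered))
```

A check like this is cheap to run on every response, so an empty shell is caught before it reaches the parser.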

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the Python web scraping guide covers the groundwork this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. The first 1,000 requests are free and no card is required. Treat the token like a password: it authenticates your requests, so keep it out of version control.
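One common way to keep the token out of version control is to read it from an environment variable. This is a minimal sketch under that assumption: the variable name CRAWLBASE_JS_TOKEN and the helper load_token are illustrative choices, not names the Crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Read the JS token from the environment instead of hard-coding it.

    The variable name is an assumption for this example; use whatever
    name fits your deployment.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

With this in place, the client in the later steps could be created as CrawlingAPI({"token": load_token()}) instead of pasting the token into the script.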

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the one library the scraper needs.

bash
python --version

python -m venv realtor_env
source realtor_env/bin/activate

pip install crawlbase

On Windows, activate the environment with realtor_env\Scripts\activate instead of the source line. We only need one third-party dependency here: crawlbase is the official client for the Crawling API. Because the listing data arrives as JSON inside the __NEXT_DATA__ script, Python's built-in json and re modules handle the parsing without a separate HTML library.

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a Realtor.com search URL. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    search_url = "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/pg-1"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered Next.js target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the hydrated data is in place before the page is captured. Five seconds is a reasonable start; raise it if the dataset comes back empty. Save the script as realtor_scraper.py, run it with python realtor_scraper.py, and you should see real page markup that contains the __NEXT_DATA__ script, not the empty shell a plain fetch returns. That confirms rendering works before you write a single line of parsing.
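If the payload is sometimes missing at five seconds, one option is to retry with progressively longer waits. The sketch below assumes a fetch callable that accepts a URL and an options dict (for example, a variant of crawl that takes options as a parameter); fetch_with_longer_waits and the specific wait values are illustrative, not part of the Crawling API.

```python
def fetch_with_longer_waits(fetch, url, waits=(5000, 8000, 12000)):
    """Try each page_wait value in turn until the payload shows up.

    `fetch` is any callable taking (url, options) and returning HTML or None.
    Returns (html, wait_ms) on success, or (None, None) if every wait failed.
    """
    for wait_ms in waits:
        options = {"ajax_wait": "true", "page_wait": wait_ms}
        html = fetch(url, options)
        # Only accept the page once the hidden script is actually present.
        if html and "__NEXT_DATA__" in html:
            return html, wait_ms
    return None, None
```

Each attempt is a separate request, so keep the list of waits short; the returned wait value tells you which setting to start from next time.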

Step 2: Extract the embedded listing dataset

With rendered HTML in hand, pull the JSON out of the __NEXT_DATA__ script and walk to the part of it that holds the search results. Realtor.com nests its results under props.pageProps, with a fallback path under searchResults.home_search for pages that use the alternate shape. Wrap the lookups so a missing key returns None instead of crashing the run.

python
import re
import json

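def extract_next_data(html):
    # The listing dataset lives in a hidden __NEXT_DATA__ script.
    # Match on the id alone so extra or reordered attributes do not break it.
    match = re.search(
        r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    if not match:
        print("No hidden web data found.")
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        # A truncated or challenge page can carry a malformed payload.
        print(f"Could not parse __NEXT_DATA__: {exc}")
        return None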

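def get_results(data):
    # Prefer the pageProps path, fall back to the home_search shape.
    # The "or {}" guards also cover keys that are present but set to null.
    page_props = (data.get("props") or {}).get("pageProps") or {}
    results = page_props.get("properties")
    if results:
        return results
    # Look for searchResults under pageProps first, then at the top level.
    search_results = page_props.get("searchResults") or data.get("searchResults") or {}
    home_search = search_results.get("home_search") or {}
    return home_search.get("results") or []

The extract_next_data function uses a single regular expression to grab the contents of the __NEXT_DATA__ script and parse it as JSON, which avoids pulling in an HTML parser just to read a JSON blob. The get_results helper then tries the properties array under pageProps first and falls back to searchResults.home_search.results, because Realtor.com serves both shapes depending on how the page was reached. The fallback checks for searchResults under pageProps and then at the top level of the payload, since the exact nesting is worth verifying against a live page. Every lookup uses dict.get with an or {} guard, so a missing or null key returns an empty list rather than raising a KeyError or AttributeError.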

Step 3: Parse each property into a flat record

Each item in the results array carries a description block (beds, baths, sqft), a location.address block, the list_price, and a permalink you can turn back into a full listing URL. Map those into a flat dictionary so the output is easy to write to JSON or CSV.

python
def parse_property(item):
    description = item.get("description") or {}
    location = item.get("location") or {}
    address = location.get("address") or {}

    parts = [
        address.get("line"),
        address.get("city"),
        address.get("state_code"),
        address.get("postal_code"),
    ]
    full_address = ", ".join(p for p in parts if p)

    permalink = item.get("permalink")
    link = (
        f"https://www.realtor.com/realestateandhomes-detail/{permalink}"
        if permalink
        else None
    )

    return {
        "price": item.get("list_price"),
        "beds": description.get("beds"),
        "baths": description.get("baths_consolidated"),
        "sqft": description.get("sqft"),
        "address": full_address or None,
        "link": link,
    }

The or {} guards matter because Realtor.com sets some of these nested objects to null on listings that lack the data, and calling .get on None would raise an AttributeError. Beds, baths, and sqft come straight from the description block, where baths_consolidated is the field Realtor.com uses to roll full and half baths into one figure. The address is built by joining the street line, city, state code, and postal code, skipping any part that is missing, and the link is reconstructed from the permalink Realtor.com assigns each property. The result is one flat record per listing, the shape you want for export.

JSON keys drift too

The __NEXT_DATA__ structure is more stable than CSS selectors, but it is not frozen. If price or sqft starts coming back as None across every record, dump the raw JSON for one property and re-check the key names. Reading the dataset defensively with .get means a key rename degrades to empty fields rather than a crash, which is exactly what you want for an unattended run.
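When the keys do move, a small diagnostic can show where the listing arrays now sit. The sketch below is one possible approach: find_record_lists is a hypothetical helper that walks any parsed JSON and reports every list made up entirely of objects, and the sample dictionary is a made-up stand-in for the real payload.

```python
def find_record_lists(node, path=""):
    """Yield (path, length, sample_keys) for every non-empty list of dicts."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from find_record_lists(value, child)
    elif isinstance(node, list):
        if node and all(isinstance(item, dict) for item in node):
            # Report the list itself, using the first item as a key sample.
            yield path, len(node), sorted(node[0].keys())
        for index, value in enumerate(node):
            yield from find_record_lists(value, f"{path}[{index}]")


# Made-up payload in the shape described above, for illustration only.
sample = {
    "props": {
        "pageProps": {
            "properties": [{"list_price": 1, "permalink": "a"}],
        }
    }
}

for path, count, keys in find_record_lists(sample):
    print(path, count, keys)
```

Running it against the output of extract_next_data for one page lists the candidate paths, which makes a renamed or relocated results array quick to spot.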

Step 4: Put it together with pagination

One page is a demo; a real job runs across a city's full result set. Realtor.com paginates its search with a clean /pg-<PAGE> suffix, so you build the URL for each page from a city and state, crawl it, extract the dataset, and parse every property. A short sleep between pages paces the run.

python
import re
import json
import time
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

def find_properties(city, state, max_pages=1):
    listings = []
    for page in range(1, max_pages + 1):
        url = (
            "https://www.realtor.com/realestateandhomes-search/"
            f"{city}_{state.upper()}/pg-{page}"
        )
        html = crawl(url)
        if not html:
            continue
        data = extract_next_data(html)
        if not data:
            continue
        for item in get_results(data):
            listings.append(parse_property(item))
        print(f"Page {page}: {len(listings)} listings so far")
        time.sleep(2)
    return listings

def main():
    listings = find_properties("Los-Angeles", "CA", max_pages=3)
    print(json.dumps(listings, indent=2))

if __name__ == "__main__":
    main()

The find_properties function is a plain loop: it walks the page range, builds the {city}_{state}/pg-{page} URL for each one, and appends every parsed property to a running list. The time.sleep(2) between pages is deliberate: it paces the run so you are not hammering the site, which is one of the most effective habits for staying unblocked. Note that a failed page hits continue and skips the sleep, so move the pause above those checks if you expect frequent failures. Drop in the extract_next_data, get_results, and parse_property functions from the previous steps and this is a complete, runnable scraper.
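A fixed max_pages can overshoot a small city or collect the same listing twice. The sketch below shows one way to stop early and drop duplicates, assuming a fetch_page callable that returns the parsed records for a page number (for example, a wrapper around the crawl, extract, and parse steps). The name collect_listings and the stop rule are illustrative choices, and the pause between pages is left out here for brevity.

```python
def collect_listings(fetch_page, max_pages=3):
    """Walk pages 1..max_pages, stopping early when a page adds nothing new.

    `fetch_page` takes a page number and returns a list of flat records
    (dicts with a "link" key), or None/[] when the page has no results.
    """
    seen = set()
    listings = []
    for page in range(1, max_pages + 1):
        new_count = 0
        for record in fetch_page(page) or []:
            link = record.get("link")
            if link is not None and link in seen:
                continue  # Already collected on an earlier page.
            if link is not None:
                seen.add(link)
            listings.append(record)
            new_count += 1
        if new_count == 0:
            break  # Empty or fully repeated page: treat as the end.
    return listings
```

The trade-off is that a single failed page also ends the run, so this fits best when failures are retried inside fetch_page rather than skipped.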

What the output looks like

Run the full script with python realtor_scraper.py and you get a clean list of structured records, one per listing.

json
[
  {
    "price": 139000000,
    "beds": 12,
    "baths": "17",
    "sqft": null,
    "address": "1200 Bel Air Rd, Los Angeles, CA, 90077",
    "link": "https://www.realtor.com/realestateandhomes-detail/1200-Bel-Air-Rd_Los-Angeles_CA_90077_M17839-35941"
  }
]
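In the sample record, baths arrives as the string "17" while price and beds are numbers. If you want consistent numeric types before export, a small coercion step is one option. The helpers below, to_number and normalize_record, are hypothetical names for this sketch, and values that do not parse cleanly are turned into None rather than guessed at.

```python
def to_number(value):
    """Convert values like "17" or "2.5" to a number; return None otherwise."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return value
    try:
        number = float(str(value).replace(",", "").strip())
    except ValueError:
        return None
    # Keep whole numbers as int so 17.0 is stored as 17.
    return int(number) if number.is_integer() else number


def normalize_record(record):
    """Return a copy of a flat listing record with numeric fields coerced."""
    cleaned = dict(record)
    for field in ("price", "beds", "baths", "sqft"):
        cleaned[field] = to_number(record.get(field))
    return cleaned
```

Applying normalize_record to each parsed listing keeps the address and link untouched and leaves missing values as None, which exports as null in JSON and an empty cell in CSV.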

Export to JSON and CSV

Once each listing is a flat dictionary, exporting is two short functions. JSON preserves value types such as numbers and null and can hold nested data if you extend the record later; CSV gives you a spreadsheet-ready table with one column per field.

python
import csv
import json

def save_json(listings, path="realtor_listings.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(listings, f, indent=2)

def save_csv(listings, path="realtor_listings.csv"):
    if not listings:
        return
    fields = ["price", "beds", "baths", "sqft", "address", "link"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(listings)

Call save_json(listings) and save_csv(listings) at the end of main and you have both formats on disk. Both functions open the file in write mode, so each run overwrites the previous output. The explicit fields list keeps the CSV column order stable across runs, which matters if you combine results from several runs or load them into a tool that expects a fixed header. From here the data is ready for a notebook, a database, or a pricing model.
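If you would rather accumulate runs in a single CSV than overwrite it, an append-mode variant is one way to do that. This is a sketch, not part of the scraper above: append_csv is a hypothetical helper that writes the header only when the file is new or empty, using the same column list.

```python
import csv
import os

FIELDS = ["price", "beds", "baths", "sqft", "address", "link"]


def append_csv(listings, path="realtor_listings.csv"):
    """Append records to a CSV, writing the header only for a new or empty file."""
    if not listings:
        return
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops any keys that are not in FIELDS.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if needs_header:
            writer.writeheader()
        writer.writerows(listings)
```

Appending means the same listing can appear once per run, so add a run date column or deduplicate downstream if you need one row per property.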

Staying unblocked at scale

Even with rendering handled, Realtor.com watches for scraper-shaped traffic, and its layout and defenses shift over time. A few habits keep a longer run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled or served a captcha. Keep the sleep between pages and avoid crawling one city at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
  • Expect structure changes. Realtor.com refreshes its site regularly, so re-check the __NEXT_DATA__ keys when fields go empty rather than assuming the scraper is broken.
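The pacing and status-code habits above can be combined into a retry wrapper with exponential backoff. The sketch below assumes a fetch callable that returns a (status_code, body) tuple; fetch_with_backoff, the attempt count, and the delay values are illustrative assumptions rather than recommended settings for Realtor.com.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry a fetch with exponential backoff and jitter on non-200 responses.

    `fetch` is any callable taking a URL and returning (status_code, body).
    Returns the body on success, or None once the attempts are used up.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if attempt < max_attempts - 1:
            # Delay doubles each attempt (2s, 4s, 8s...) plus up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    return None
```

The sleep parameter exists so the wait can be replaced in tests; in a real run the default time.sleep applies, and a None result is the signal to stop and investigate rather than push harder.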

For the broader playbook, see how to scrape websites without getting blocked and the guide to crawling JavaScript websites. For scale beyond a single city, batch your search URLs and feed them through the same find_properties loop.
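Batching starts with generating the search URLs. The sketch below builds them from (city, state) pairs using the same URL pattern as the scraper; build_search_url and batch_urls are hypothetical helpers, the second city is only an example, and turning spaces into hyphens is an assumption based on the Los-Angeles slug used earlier.

```python
BASE = "https://www.realtor.com/realestateandhomes-search"


def build_search_url(city, state, page=1):
    """Build a search URL such as .../Los-Angeles_CA/pg-1.

    Assumes multi-word city names are hyphenated in the slug.
    """
    slug = "-".join(city.strip().split())
    return f"{BASE}/{slug}_{state.strip().upper()}/pg-{page}"


def batch_urls(targets, max_pages=1):
    """Expand (city, state) pairs into one URL per page."""
    return [
        build_search_url(city, state, page)
        for city, state in targets
        for page in range(1, max_pages + 1)
    ]


urls = batch_urls([("Los Angeles", "CA"), ("San Diego", "ca")], max_pages=2)
for url in urls:
    print(url)
```

Generating the list up front makes it easy to log progress, resume after a failure, and keep the same pause between requests across cities.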

Is scraping Realtor.com legal?

Whether scraping Realtor.com is allowed depends on Realtor.com's terms of service, your jurisdiction, and what you do with the data. The terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the Realtor.com Terms of Service and its robots.txt, respect any stated rate limits, and treat both as the boundary for what you collect.
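Checking a URL against robots.txt rules can be automated with Python's standard urllib.robotparser module. The sketch below parses rules supplied as text so it runs offline; is_allowed is a hypothetical helper, and the sample rules are invented for the example and are not Realtor.com's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented rules for illustration; fetch the real file before relying on this.
sample_rules = """
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://example.com/private/page"))
print(is_allowed(sample_rules, "https://example.com/listings"))
```

For a live check, RobotFileParser also offers set_url and read to download the file directly. A passing robots.txt check does not settle the terms-of-service question, so treat it as one input rather than clearance.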

A few lines worth holding to. Collect only public listing data: the list price, beds, baths, square footage, address, and listing link that anyone can see without an account. A large share of Realtor.com's underlying data comes from MLS feeds that are licensed, not freely reusable, so a property record may carry usage restrictions even when the page is public; treat MLS-sourced fields as licensed content rather than open data. Avoid anything tied to identifiable individuals, including the names and contact details of agents, brokers, or owners shown on a page, which are personal data under regimes like the GDPR and CCPA. If you plan to reuse the data commercially or in bulk, get permission or a licensed feed rather than assuming silence is consent.

This guide is deliberately scoped to public listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, saved-search or account data, the personal or contact details of agents and owners, or any attempt to bypass authentication. Public listing data only. If your project needs more than that, Realtor.com and the MLS systems behind it offer official data partnerships and licensed feeds, which are the correct path for production volume, not a cleverer scraper.

Recap

Key takeaways

  • Realtor.com hides its data in __NEXT_DATA__. The listings live in a JSON blob in a hidden script, so you read that payload instead of scraping DOM elements.
  • You need rendering and a trusted IP together. The Crawling API with a JS token renders the Next.js page and rotates IPs in one call; ajax_wait and page_wait control how long it waits.
  • Parse defensively. Map price, beds, baths, sqft, address, and link with .get and or {} guards so a null field or a key rename degrades to empty values, not a crash.
  • Paginate with /pg-N and export both formats. Walk the page suffix, parse each property, then write JSON and CSV from the same flat records.
  • Stay on public data. Respect Realtor.com's ToS and robots.txt, treat MLS-sourced fields as licensed, and never collect the personal details of agents or owners.

Frequently Asked Questions (FAQs)

Why does a plain request return no listings from Realtor.com?

Because Realtor.com is a Next.js app that hydrates its listings in the browser. The data is not in the static HTML elements a raw request returns; it sits inside a hidden __NEXT_DATA__ JSON script that only appears once the page renders. To get it you have to render the page first, which is what the Crawling API's JS token handles, then read the JSON out of that script.

What is the __NEXT_DATA__ script and why scrape it?

It is the JSON payload Next.js sites embed so the page can hydrate in the browser. On Realtor.com it holds the full search-results dataset, including price, beds, baths, sqft, address, and the permalink for each listing. Reading it is more stable than parsing visible HTML, because the JSON keys change less often than the CSS class names around them.

Do I need the normal token or the JS token for Realtor.com?

The JS token. The normal token fetches static HTML, which on Realtor.com does not include the hydrated __NEXT_DATA__ content you need. The JS token renders the page in a real browser first, so the embedded dataset is present when you extract and parse it.

How do I handle pagination across a city's listings?

Realtor.com uses a /pg-<PAGE> suffix on its search URLs, so you build {city}_{state}/pg-{page} for each page and loop the page number. The find_properties function above does exactly that: it crawls each page, extracts the dataset, parses every property, and pauses between pages so you stay within a polite request rate.

What fields can I extract from a Realtor.com listing?

Public listing fields: the list price, the number of beds and baths, the square footage, the full address, and the listing link rebuilt from the permalink. Stay on data visible to any visitor without an account, treat MLS-sourced fields as licensed content, and avoid the personal details of agents, brokers, or owners, which fall outside the public-listing scope this guide covers.

Can I track price and listing changes over time?

Yes. Run the scraper on a schedule, key each listing by its permalink, and diff the price between runs to capture new listings and price drops. The records in this guide do not include a status field, so add one to parse_property if you also want to track pending or sold properties. Keep the request rate modest and store only the public listing fields you need. For related real-estate targets, see the guides on how to scrape Zillow and how to scrape Redfin.
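A comparison between two runs can be sketched in a few lines. The example below keys records by the link field, which the scraper rebuilds from the permalink; diff_runs is a hypothetical helper, and it assumes both runs use the flat record shape from this guide.

```python
def diff_runs(previous, current):
    """Compare two runs of flat listing records keyed by their link."""
    old = {r["link"]: r for r in previous if r.get("link")}
    new = {r["link"]: r for r in current if r.get("link")}
    added = [new[key] for key in new if key not in old]
    removed = [old[key] for key in old if key not in new]
    price_changes = [
        {
            "link": key,
            "old_price": old[key].get("price"),
            "new_price": new[key].get("price"),
        }
        for key in new
        if key in old and old[key].get("price") != new[key].get("price")
    ]
    return {"added": added, "removed": removed, "price_changes": price_changes}
```

A listing that appears under removed has only dropped out of the pages you crawled; it may be sold, delisted, or simply pushed past your page limit, so confirm before labeling it.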
