Redfin lists homes for sale and for rent across the United States and Canada, and each listing page carries exactly the structured data that powers price tracking, market research, and real-estate analysis: the asking price, beds, baths, square footage, the street address, and the link back to the listing. Pull that across a city or a region and you have a dataset you can compare, chart, and watch over time.

This guide shows you how to scrape Redfin property data with Python the reliable way. You build a small, runnable scraper that fetches a rendered Redfin listing through the Crawling API, parses the fields you want with BeautifulSoup, walks search pagination, and exports the result to JSON and CSV. The whole walkthrough stays scoped to public listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a public Redfin listing URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for the property. We use a single sales listing as the running example and pull these fields:

  • Price: the listed sale price shown on the property page.
  • Beds: the number of bedrooms.
  • Baths: the number of bathrooms.
  • Sqft: the finished square footage of the home.
  • Address: the street address, city, state, and zip.
  • Link: the canonical URL of the listing itself.

Why a plain request fails on Redfin

If you request a Redfin listing URL with a bare HTTP client, you usually get one of two disappointing results: a thin HTML shell with the price, beds, and baths fields still empty, or a block page before you reach the listing at all. Two things work against you. First, Redfin renders much of its listing detail in the browser with JavaScript, so the initial HTML that a plain fetch sees does not yet contain the numbers you came for. Second, Redfin runs anti-scraping measures, including IP rate limiting, CAPTCHAs, and user-agent detection, so datacenter IPs and request patterns that do not look like a real browser get challenged quickly.

So a working Redfin scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Redfin fills its listing fields client-side, so you need the JS token here. The normal token returns the same partial shell a plain fetch would, and there is little useful to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the Python web scraping guide covers the groundwork this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. The first 1,000 Crawling API requests are free and no card is required. Treat the token like a password: it authenticates your requests, so keep it out of version control.
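
One way to keep the token out of source files is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is chosen for this example and is not something the Crawlbase client requires.

```python
import os


def load_token(env=None, name="CRAWLBASE_JS_TOKEN"):
    """Return the token stored in the environment, or raise if it is missing.

    `name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable first")
    return token
```

With this in place, the client could be created as CrawlingAPI({"token": load_token()}) instead of pasting the token into the script.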

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.

bash
python --version

python -m venv redfin_env
source redfin_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with redfin_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. If you have not used the parser before, the BeautifulSoup guide is a good companion to this tutorial.

Step 1: Fetch the rendered listing

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the listing URL. Checking the status before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

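def crawl(page_url):
    # ajax_wait waits for asynchronous content; page_wait holds 5,000 ms after load.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = crawling_api.get(page_url, options)
    # pc_status is the status Crawlbase reports for the fetch; "200" means success.
    pc_status = response["headers"]["pc_status"]
    if pc_status == "200":
        return response["body"].decode("utf-8")
    print(f"Failed to fetch the page. Crawlbase status: {pc_status}")
    return None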

if __name__ == "__main__":
    listing_url = "https://www.redfin.com/CA/North-Hollywood/6225-Coldwater-Canyon-Ave-91606/unit-106/home/5104172"
    html = crawl(listing_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering elements appear before the page is captured. Five seconds is a reasonable start; raise it if the price or detail fields come back empty. The Crawlbase library returns the status under response["headers"]["pc_status"]; a value of "200" means the page was fetched and rendered. Run the script and you should see real listing markup, not a block page, which confirms rendering works before you write a single selector.
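
The advice to raise page_wait when fields come back empty can be automated. The following is a minimal sketch, not part of the Crawlbase client: fetch and is_complete are hypothetical hooks you supply, and the wait values are arbitrary starting points. Note that crawl() in this guide takes only a URL, so passing options through to it is an assumed change.

```python
def fetch_with_longer_waits(fetch, url, is_complete, waits=(5000, 8000, 12000)):
    """Try fetch(url, options) with growing page_wait values.

    `fetch` is any callable returning HTML or None, and `is_complete` is a
    caller-supplied check on that HTML. Both are hypothetical hooks.
    """
    for page_wait in waits:
        options = {"ajax_wait": "true", "page_wait": page_wait}
        html = fetch(url, options)
        # Stop at the first wait that yields a page passing the check.
        if html and is_complete(html):
            return html, page_wait
    return None, None
```

A simple is_complete check might look for a dollar sign or the price element in the returned HTML. Each retry is a separate API request, so keep the list of waits short.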

Step 2: Parse the listing fields with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each field by its selector. Redfin lays the core listing details out in a predictable structure, so you can map price, address, and the key facts to individual selectors. Redfin has no public API for sale pages, so this is CSS-selector work (BeautifulSoup does not evaluate XPath): the price sits in a data-rf-test-id="abp-price" block, the address splits across .street-address and .bp-cityStateZip, and the headline facts (beds, baths, sqft) live in the .keyDetails-value rows. Wrap the extraction in a helper that returns None when an element is missing, so one absent field does not crash the run.

python
from bs4 import BeautifulSoup

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def find_detail(details, keyword):
    for value in details:
        if keyword in value.lower():
            return value
    return None

def parse_property(html, listing_url):
    soup = BeautifulSoup(html, "html.parser")

    price = text_of(soup, 'div[data-rf-test-id="abp-price"] div')

    street = text_of(soup, ".street-address")
    city_state_zip = text_of(soup, ".bp-cityStateZip")
    address = " ".join(filter(None, [street, city_state_zip]))

    details = [d.get_text(strip=True) for d in soup.select(".keyDetails-value")]

    return {
        "address": address or None,
        "price": price,
        "beds": find_detail(details, "bed"),
        "baths": find_detail(details, "bath"),
        "sqft": find_detail(details, "sq ft"),
        "link": listing_url,
    }

The text_of helper queries an element and returns None when it is missing instead of throwing on a call against nothing. The address is rebuilt from the two pieces Redfin renders separately, the street line and the city/state/zip line, joined with a space. The headline facts come back as a flat list of .keyDetails-value strings; find_detail picks out the bed, bath, and square-footage entries by keyword rather than relying on a fixed position, which survives small reorderings in the layout. That structure keeps the extraction resilient when one field is absent on a given listing.

Selectors drift

Redfin class names and test IDs (the abp-price block, .street-address, .bp-cityStateZip, the .keyDetails-value rows) change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back as None, re-inspect the live listing in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
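
Selector drift is easier to catch when the scraper reports it. The sketch below is one possible check, assuming records shaped like the dictionary parse_property returns; the helper names are made up for this example.

```python
from collections import Counter

EXPECTED_FIELDS = ("address", "price", "beds", "baths", "sqft", "link")


def missing_fields(record, expected=EXPECTED_FIELDS):
    """List the expected fields that are absent, None, or empty in a record."""
    return [name for name in expected if not record.get(name)]


def drift_report(records, expected=EXPECTED_FIELDS):
    """Count how often each expected field is missing across many records."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record, expected))
    return dict(counts)
```

A field that is missing on one listing may simply not apply to that property; a field that is missing on every listing in a run is a strong hint that its selector needs updating.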

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured record.

python
import json
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = crawling_api.get(page_url, options)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Failed to fetch the page. Crawlbase status: {response['headers']['pc_status']}")
    return None

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def find_detail(details, keyword):
    for value in details:
        if keyword in value.lower():
            return value
    return None

def parse_property(html, listing_url):
    soup = BeautifulSoup(html, "html.parser")
    price = text_of(soup, 'div[data-rf-test-id="abp-price"] div')
    street = text_of(soup, ".street-address")
    city_state_zip = text_of(soup, ".bp-cityStateZip")
    address = " ".join(filter(None, [street, city_state_zip]))
    details = [d.get_text(strip=True) for d in soup.select(".keyDetails-value")]
    return {
        "address": address or None,
        "price": price,
        "beds": find_detail(details, "bed"),
        "baths": find_detail(details, "bath"),
        "sqft": find_detail(details, "sq ft"),
        "link": listing_url,
    }

def main():
    listing_url = "https://www.redfin.com/CA/North-Hollywood/6225-Coldwater-Canyon-Ave-91606/unit-106/home/5104172"
    html = crawl(listing_url)
    if not html:
        return
    data = parse_property(html, listing_url)
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    main()

What the output looks like

Run the full script and you get a clean structured record for the listing, ready to write to JSON, CSV, or a database.

json
{
  "address": "6225 Coldwater Canyon Ave #106 Valley Glen, CA 91606",
  "price": "$627,000",
  "beds": "2 beds",
  "baths": "2 baths",
  "sqft": "1,209 sq ft",
  "link": "https://www.redfin.com/CA/North-Hollywood/6225-Coldwater-Canyon-Ave-91606/unit-106/home/5104172"
}
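
Every value in that record is a string, which is awkward for charting or comparing prices. As a rough sketch of one way to convert them, the helpers below pull the first number out of each field; to_number and normalize are names invented for this example, and the regular expression assumes US-style formatting such as "$627,000".

```python
import re


def to_number(text):
    """Pull the first number out of a string such as "$627,000" or "2.5 baths"."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    value = float(match.group().replace(",", ""))
    # Return whole numbers as int so 627000.0 becomes 627000.
    return int(value) if value.is_integer() else value


def normalize(record):
    """Return a copy of the record with the numeric fields converted."""
    cleaned = dict(record)
    for field in ("price", "beds", "baths", "sqft"):
        cleaned[field] = to_number(record.get(field))
    return cleaned
```

Keeping the original strings alongside the converted values is a reasonable choice if you want to audit the parsing later.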

Scaling across search pages and pagination

One listing is a demo; a real job runs over a whole search. Redfin paginates its search results, and each city or region has a results path you can append a page number to, such as /page-2. The pattern is two layers: crawl each search-results page to collect the listing URLs, then fetch each listing through the same parse_property function you already wrote. Listing cards on a Redfin search page expose their URL through an anchor element; the code below assumes that anchor carries the bp-Homecard class, so it collects those hrefs and resolves them against the Redfin domain. Confirm the class name against a live results page before relying on it.

python
import time
from urllib.parse import urljoin

BASE = "https://www.redfin.com"

def collect_listing_urls(search_html):
    soup = BeautifulSoup(search_html, "html.parser")
    cards = soup.select("a.bp-Homecard")
    urls = [urljoin(BASE, a["href"]) for a in cards if a.get("href")]
    return list(dict.fromkeys(urls))

def scrape_search(search_url, pages):
    listings = []
    for page in range(1, pages + 1):
        # Strip any trailing slash so the page segment is not appended as "//page-2".
        page_url = search_url if page == 1 else f"{search_url.rstrip('/')}/page-{page}"
        search_html = crawl(page_url)
        if not search_html:
            continue
        for url in collect_listing_urls(search_html):
            html = crawl(url)
            if html:
                listings.append(parse_property(html, url))
            time.sleep(2)
    print(f"Scraped {len(listings)} listings")
    return listings

The dict.fromkeys step removes duplicate URLs that appear when a card links to the same listing more than once, and it keeps the remaining URLs in their original order. The time.sleep(2) between listing fetches is deliberate: it paces the run so you are not hammering Redfin, which is one of the simplest habits for staying unblocked. Adjust the page count and the search slug to fit your target region.
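
If you do not know the page count in advance, an alternative is to keep going until a results page adds no unseen listing URLs. The sketch below shows that stopping rule in isolation. The fetch_urls callable is a hypothetical stand-in for "crawl the page, then collect its listing URLs", and max_pages is an arbitrary safety cap.

```python
def page_url(search_url, page):
    """Build the URL for a results page; page 1 is the bare search URL."""
    base = search_url.rstrip("/")
    return base if page == 1 else f"{base}/page-{page}"


def collect_until_exhausted(search_url, fetch_urls, max_pages=10):
    """Walk result pages until one adds no unseen listing URLs.

    `fetch_urls` is a hypothetical callable that returns the listing URLs
    found on one results page.
    """
    seen = {}  # dict keys keep insertion order and drop duplicates
    for page in range(1, max_pages + 1):
        before = len(seen)
        for url in fetch_urls(page_url(search_url, page)):
            seen.setdefault(url, None)
        if len(seen) == before:
            break  # nothing new on this page, so stop paginating
    return list(seen)
```

In the scraper, fetch_urls could wrap crawl and collect_listing_urls and return an empty list when the fetch fails.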

Export to JSON and CSV

Once you have a list of records, writing them out is two short functions. JSON keeps the full structure for downstream code; CSV is the format every spreadsheet and BI tool reads.

python
import csv
import json

def save_json(records, path="redfin_listings.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

def save_csv(records, path="redfin_listings.csv"):
    if not records:
        return
    fields = ["address", "price", "beds", "baths", "sqft", "link"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    results = scrape_search("https://www.redfin.com/city/11203/CA/Los-Angeles", pages=3)
    save_json(results)
    save_csv(results)

The CSV writer pins an explicit column order so the header is stable run to run, and both writers use UTF-8 so addresses with accented characters survive the round trip. With JSON and CSV on disk you can feed the data straight into a notebook, a dashboard, or a database load.
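
Pinning the columns has one side effect worth knowing: csv.DictWriter raises ValueError by default if a record contains a key that is not in fieldnames. If your parser later grows extra fields, a more tolerant writer is one option. The sketch below writes to an in-memory buffer so it can be tried without touching disk; records_to_csv_text is a name invented for this example.

```python
import csv
import io

FIELDS = ["address", "price", "beds", "baths", "sqft", "link"]


def records_to_csv_text(records, fields=FIELDS):
    """Render records as CSV text, ignoring unknown keys and blanking missing ones."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # drop keys that are not in `fields`
        restval="",             # fill keys that are missing from a record
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Silently ignoring extra keys trades strictness for convenience, so the stricter default may still be the better fit if you want schema changes to fail loudly.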

Staying unblocked

Even with rendering handled, Redfin watches for scraper-shaped traffic with IP rate limiting, CAPTCHAs, and user-agent checks. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering listings in a tight loop is the fastest way to get throttled or served a CAPTCHA. Spread requests out, as the sleep above does, and vary your targets instead of crawling one path at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or non-200 pc_status values is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
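
Backing off on failures can be sketched as exponential delays with random jitter. This is a generic pattern rather than anything specific to Crawlbase or Redfin; the function names, retry count, and delay values are assumptions to tune for your own run.

```python
import random
import time


def backoff_delays(retries=4, base=2.0, cap=60.0, rng=random.random):
    """Yield exponentially growing delays (in seconds) with random jitter."""
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": pick a random point between 0 and the ceiling.
        yield ceiling * rng()


def fetch_with_backoff(fetch, url, retries=4, sleep=time.sleep, rng=random.random):
    """Call fetch(url) until it returns HTML, pausing longer after each failure.

    `fetch` is expected to return None on failure, as crawl() does.
    """
    html = fetch(url)
    for delay in backoff_delays(retries, rng=rng):
        if html:
            break
        sleep(delay)
        html = fetch(url)
    return html
```

Calling fetch_with_backoff(crawl, listing_url) would retry a failed listing up to four more times. The sleep and rng parameters exist only so the timing can be replaced in tests.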

For the broader playbook, see how to scrape websites without getting blocked. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential IP rotation as a drop-in proxy endpoint. Working other real-estate portals next? The same render-then-parse pattern carries over to scraping Zillow and scraping Realtor.com, only the selectors change.

Is scraping Redfin legal?

Whether scraping Redfin is allowed depends on Redfin's terms of service, your jurisdiction, and what you do with the data. Redfin's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the Redfin Terms of Service and its robots.txt, respect the rate expectations they imply, and treat both as the boundary for what you collect.
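
Checking a URL against robots.txt rules can be done with Python's standard library. The sketch below parses a made-up rule set so it runs offline; the rules are illustrative only and are not Redfin's actual robots.txt, and a robots.txt check does not replace reading the terms of service.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; these are NOT Redfin's actual robots.txt.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Return True if the parsed rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)
```

Against a live site you would instead call set_url with the site's robots.txt address and then read() on the parser before asking can_fetch.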

A few lines worth holding to. Collect only public listing data: the price, beds, baths, square footage, address, and listing link that anyone can see without an account. Keep your request volume low enough that you are not straining Redfin's servers. Stay away from anything tied to identifiable individuals, including the names and contact details of listing agents or owners that appear on a page; a public property listing is not a license to build profiles of the people attached to it. Much of the underlying property data on real-estate portals comes from MLS feeds that are licensed under specific terms, so republishing or reselling it in bulk can run into licensing restrictions even when the page itself is public.

This guide is deliberately scoped to public listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, saved-search or account data, agent or owner personal data, or any attempt to bypass authentication. Public property data only. If your project needs a steady, sanctioned feed, Redfin and other portals offer data partnerships, and licensed MLS feeds exist for exactly this purpose; a licensing arrangement or a real-estate data provider is the correct path for production volume, not a cleverer scraper.

Recap

Key takeaways

  • Redfin is client-side rendered and well-defended. A plain fetch returns a partial shell or a block page, so you must render the page behind a trusted IP before you parse it.
  • The Crawling API does both in one call. A JS token renders the page and rotates residential IPs server-side; ajax_wait and page_wait control how long it waits for content, and pc_status tells you whether the fetch worked.
  • BeautifulSoup does the extraction. Map price, address, beds, baths, sqft, and the listing link to current selectors like abp-price and .keyDetails-value, and expect those selectors to drift.
  • Scale by paginating search, then looping listings. Collect URLs from each results page, fetch each listing with the same parser, pace the run with a short sleep, and export to JSON and CSV.
  • Stay on public data. Respect Redfin's ToS and robots.txt, collect only public property fields, remember that MLS data is often licensed, and never touch accounts, logins, or the personal details of agents and owners.

Frequently Asked Questions (FAQs)

Why does a plain request return no data from Redfin?

Because Redfin renders much of its listing detail client-side with JavaScript, and it challenges automated traffic. A raw HTTP request often returns a thin shell with the price, beds, and baths fields blank, or a block page before the listing loads at all. To get real data you have to render the page behind a trusted IP, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Redfin?

The JS token. The normal token fetches static HTML, which on Redfin is the same partial shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the listing fields are present when BeautifulSoup parses them.

What data can I scrape from a Redfin listing?

Public listing fields: the price, the number of beds and baths, the square footage, the street address, and the listing link. Stay on data that is visible to any visitor without an account, and avoid the names and contact details of agents or owners, which fall outside the public-listing scope this guide covers.

My selectors return None. What changed?

Almost certainly Redfin's markup. The abp-price block, the .street-address and .bp-cityStateZip elements, and the .keyDetails-value rows change without notice, so selectors that worked last month can break. Re-inspect a live listing in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

How do I handle pagination on Redfin search results?

Redfin appends a page segment such as /page-2 to a region's search path, so you crawl each results page in turn, collect the listing links from the cards on it, and fetch each listing with the same parser. Keep a short sleep between requests and stop when a page returns no new cards. The scrape_search function above shows the basic loop over a fixed number of pages; it does not yet include that stopping check.

Can I scrape Redfin rental and sale pages with the same approach?

Yes. The render-then-parse pattern is the same for both; only the selectors differ, since rental and sale pages lay their fields out differently. Fetch the rendered HTML through the Crawling API, then point BeautifulSoup at the selectors that match the page type you are scraping. For other portals, the same flow carries over to scraping Trulia and other real-estate sites.
