Zillow is one of the most-visited real estate platforms on the web, and its listing pages hold exactly the structured data that drives price tracking, market research, and investment analysis: the asking price, beds, baths, square footage, property type, and the street address. For anyone studying a local market, that public listing data is the raw material, and pulling it by hand across dozens of properties is slow and error-prone.
This guide shows you how to scrape Zillow with Python the reliable way. You build a small, runnable scraper that fetches rendered Zillow pages through the Crawling API, collects property links from a search page, parses the fields you want with BeautifulSoup, handles pagination, and exports clean JSON and CSV. The whole walkthrough stays scoped to public listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public Zillow search URL for a location, collects the property page links, fetches each rendered listing through the Crawling API, and extracts a structured record per property. The running example is properties for sale in Columbia Heights, Washington, DC. We pull these fields:
- Price: the listed asking price for the property.
- Beds: the number of bedrooms.
- Baths: the number of bathrooms.
- Size: the interior square footage of the home.
- Address: the street address shown on the listing.
- Type: the property type, such as condominium, townhouse, or single-family residence.
- Link: the canonical URL of the property page.
Why a plain request fails on Zillow
If you request a Zillow search or listing URL with a bare HTTP client, you get a response with status 200 and only a fraction of the data in the body. Two things work against you. First, Zillow loads most of its search results and listing details in the browser through JavaScript and Ajax, so the initial HTML is a thin shell that fills in only after the page's scripts run. Pull the property links out of that first response and you capture a handful of cards instead of the full result set. Second, Zillow flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get rate-limited, IP-blocked, or challenged before they ever reach the rendered content.
So a working Zillow scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Zillow fills its search results and listing fields client-side, so you need the JS token here. The normal token returns the same thin shell a plain fetch would, and there is little useful to parse out of it.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the parsing side, the BeautifulSoup guide is a good companion to this tutorial.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda, and make sure Python is on your PATH.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase includes 1,000 free requests to start, which is plenty for working through this guide. Treat the token like a password: it authenticates your requests, so keep it out of version control.
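One way to keep the token out of version control is to read it from an environment variable rather than pasting it into the script. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is a placeholder chosen for this example, not something Crawlbase requires.

```python
import os

# Hypothetical variable name; use whatever fits your deployment.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"


def load_token(env=None):
    """Return the JS token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = (env.get(TOKEN_ENV_VAR) or "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running the scraper")
    return token
```

With this in place, the client could be created as CrawlingAPI({"token": load_token()}), and the script fails early with a readable message when the variable is missing.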
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.
    python --version
    python -m venv zillow_env
    source zillow_env/bin/activate
    pip install crawlbase beautifulsoup4
On Windows, activate the environment with zillow_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. Both json and csv ship with the standard library, so there is nothing more to install for the export step.
Step 1: Fetch a rendered Zillow page
Start by getting a finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a Zillow search URL. Zillow loads results asynchronously, so pass ajax_wait and page_wait to hold for the dynamic content before the page is captured. Checking the Crawlbase pc_status before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    # Initialize the client with your JS token (keep the real value out of version control).
    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

    OPTIONS = {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
        "ajax_wait": "true",  # wait for asynchronous content to finish loading
        "page_wait": 5000,    # then hold 5,000 ms before the page is captured
    }


    def crawl(page_url):
        """Fetch a rendered page and return its HTML, or None on failure."""
        response = api.get(page_url, OPTIONS)
        if response["headers"]["pc_status"] == "200":
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['headers']['pc_status']}")
        return None


    if __name__ == "__main__":
        serp_url = "https://www.zillow.com/columbia-heights-washington-dc/sale/"
        html = crawl(serp_url)
        print(html[:500] if html else "No HTML returned")
The two wait options matter for a client-rendered target like Zillow. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Five seconds is a reasonable start; raise it if the results come back thin. Save the code as zillow_scraper.py, run it with python zillow_scraper.py, and you should see real Zillow search markup, not the shell a plain request returns. That confirms rendering works before you write a single selector.
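The advice to raise page_wait when results come back thin can be sketched as a small escalation loop. Everything here is an assumption made for illustration: the marker string, the five-card threshold, the step sizes, and the fetch callable, which stands in for a function that accepts a URL and a page_wait value in milliseconds.

```python
# Marker and threshold are illustrative guesses, not Crawlbase or Zillow constants.
CARD_MARKER = 'data-test="property-card-link"'
MIN_CARDS = 5


def looks_thin(html, min_cards=MIN_CARDS):
    """Return True when the HTML holds fewer card links than expected."""
    return html is None or html.count(CARD_MARKER) < min_cards


def wait_schedule(start_ms=5000, step_ms=2500, cap_ms=15000):
    """Yield page_wait values from start_ms up to cap_ms in fixed steps."""
    wait = start_ms
    while wait <= cap_ms:
        yield wait
        wait += step_ms


def fetch_with_longer_waits(fetch, url):
    """Call fetch(url, page_wait_ms) with growing waits until the page looks full."""
    html = None
    for wait in wait_schedule():
        html = fetch(url, wait)
        if not looks_thin(html):
            break
    return html
```

Each extra attempt is another request against your quota, so a short schedule with a hard cap is usually the safer default.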
Zillow needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. The ajax_wait and page_wait options above only control how long it waits before capturing the page. Point it at a public search page on the free tier first.
Step 2: Collect property links from the search page
A Zillow search page is a grid of property cards, each linking to a full listing. Load the rendered HTML into BeautifulSoup and pull the href from each card's link. Zillow nests these inside its results grid, so the selector walks from the grid container down to the property card link.
    from bs4 import BeautifulSoup

    # Walk from the results grid down to each property card's link.
    CARD_SELECTOR = (
        'div[id="grid-search-results"] > ul > li[class^="ListItem-"] '
        'article[data-test="property-card"] a[data-test="property-card-link"]'
    )


    def get_property_urls(html):
        """Return the href of every property card link on a search page."""
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.select(CARD_SELECTOR) if a.get("href")]
The class^="ListItem-" matcher is a prefix selector: Zillow appends a hash to its generated class names, so ListItem- matches every list item regardless of the suffix. Running this against the rendered search HTML returns a clean list of property page URLs:
    [
      "https://www.zillow.com/homedetails/1429-Girard-St-NW-101-Washington-DC-20009/2053968963_zpid/",
      "https://www.zillow.com/homedetails/1439-Euclid-St-NW-APT-301-Washington-DC-20009/68081615_zpid/",
      "https://www.zillow.com/homedetails/1362-Newton-St-NW-Washington-DC-20010/472850_zpid/",
      "https://www.zillow.com/homedetails/1458-Columbia-Rd-NW-APT-300-Washington-DC-20009/82293130_zpid/"
    ]
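The sample above shows absolute URLs, but card links can also come back relative, duplicated, or with tracking parameters attached. A minimal cleanup sketch follows, under two assumptions: links resolve against https://www.zillow.com, and the listing pages of interest contain /homedetails/ in their path, as the sample URLs do. The function name is made up for this example.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE = "https://www.zillow.com"


def normalize_urls(hrefs, base=BASE):
    """Make hrefs absolute, drop query strings and fragments, de-duplicate in order."""
    seen = set()
    result = []
    for href in hrefs:
        parts = urlsplit(urljoin(base, href.strip()))
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        # Assumption: only /homedetails/ pages match the listing selectors used later.
        if "/homedetails/" in parts.path and clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result
```

Passing the output of get_property_urls through a helper like this keeps the later per-listing loop from fetching the same property twice.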
Zillow's generated class names and data-test attributes change without notice. Treat the selectors here as a starting template, not a contract. When a list comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
Step 3: Handle pagination across search pages
One search page is a slice of the result set. Zillow paginates with a {pageNo}_p path segment, so you fetch the first page to read the total page count, then walk each page collecting links. A small retry wrapper around the fetch keeps a single slow page from ending the run.
    import time


    def fetch_html(page_url, max_retries=2):
        """Fetch a page, retrying up to max_retries times before giving up."""
        for attempt in range(max_retries + 1):
            html = crawl(page_url)
            if html:
                return html
            if attempt < max_retries:
                print(f"Retrying ({attempt + 1}/{max_retries})...")
                time.sleep(1)
        print(f"Unable to fetch {page_url}")
        return None


    def collect_all_urls(base_url, max_pages):
        """Collect property links from up to max_pages search pages."""
        first_html = fetch_html(f"{base_url}1_p/")
        if not first_html:
            return []

        # Read the highest page number from the pagination nav.
        soup = BeautifulSoup(first_html, "html.parser")
        last = soup.select_one("div.search-pagination > nav > li:nth-last-child(3)")
        total_pages = int(last.text) if last else 1
        pages = min(total_pages, max_pages)

        all_urls = get_property_urls(first_html)
        for page in range(2, pages + 1):
            html = fetch_html(f"{base_url}{page}_p/")
            if html:
                all_urls.extend(get_property_urls(html))
            time.sleep(2)  # pace the run between pages
        return all_urls
fetch_html retries a failed fetch up to twice with a short pause, returning the HTML on success and None once it gives up. collect_all_urls reads the highest page number from the pagination nav (Zillow puts it near the end of the list, hence nth-last-child(3)), caps the crawl at your max_pages ceiling so a large market does not run away, and gathers links from every page. The time.sleep(2) between pages paces the run so you are not hammering the site.
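The paging arithmetic can be pulled into small pure functions that are easy to check without touching the network. This is a sketch with made-up helper names; the regex-based parse is a defensive variation for the case where the pagination element holds non-numeric text (an arrow or label, say), where a bare int() call would raise ValueError.

```python
import re


def page_url(base_url, page_no):
    """Build the URL for one results page using the {pageNo}_p path segment."""
    return f"{base_url.rstrip('/')}/{page_no}_p/"


def parse_total_pages(text, default=1):
    """Read a page count from pagination text, falling back to default."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default


def pages_to_crawl(total_pages, max_pages):
    """Return the page numbers to visit, capped at max_pages."""
    return list(range(1, min(total_pages, max_pages) + 1))
```

For example, pages_to_crawl(parse_total_pages("20"), 3) yields [1, 2, 3], which matches the ceiling behaviour described above.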
Step 4: Parse each property page
With a full list of property URLs, fetch each listing and extract the fields. Zillow groups the headline details inside its macro-data-view block, so the selectors below map price, beds, baths, size, address, and type to individual elements. Each lookup is guarded so a missing field returns None instead of crashing the run.
    # Container that holds the headline details on a listing page.
    VIEW = 'div[data-testid="macro-data-view"]'
    # Shared container for the beds, baths, and size facts.
    FACTS = (
        f'{VIEW} > div[data-renderstrat="inline"]:nth-child(2) '
        'div[data-testid="bed-bath-sqft-facts"]'
    )


    def text_of(soup, selector):
        """Return the stripped text of the first match, or None if absent."""
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None


    def scrape_property(html, url):
        """Parse one listing page into a structured record."""
        soup = BeautifulSoup(html, "html.parser")
        return {
            "link": url,
            "price": text_of(soup, f'{VIEW} span[data-testid="price"] > span'),
            "address": text_of(soup, f'{VIEW} div[class^="styles__AddressWrapper-"] > h1'),
            "beds": text_of(soup, f'{FACTS} > div[data-testid="bed-bath-sqft-fact-container"]:first-child > span:first-child'),
            "baths": text_of(soup, f'{FACTS} > button > div[data-testid="bed-bath-sqft-fact-container"] > span:first-child'),
            "size": text_of(soup, f'{FACTS} > div[data-testid="bed-bath-sqft-fact-container"]:last-child > span:first-child'),
            "type": text_of(soup, f'{VIEW} > div[data-renderstrat="inline"]:nth-child(3) div.dBmBNo:first-child > span'),
        }
The text_of helper queries one element and returns its stripped text, or None when the element is absent, so a listing that omits a field does not break the loop. The selectors come straight from Zillow's listing layout: price reads the headline price span, address reads the H1 inside the address wrapper, and beds, baths, and size all live in the shared bed-bath-sqft-facts container, distinguished by their position. Baths sit inside a button on Zillow's markup, which is why that selector differs slightly from the other two.
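Every field comes back as display text, so a cleaning pass is usually needed before any numeric analysis. The sketch below is one possible approach with hypothetical helper names; it handles values like "$850,000" and "1,801" but makes no attempt at abbreviated forms such as "$1.2M".

```python
import re


def to_number(text):
    """Convert strings like '$850,000' or '1,801' to int or float; None if no digits."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    value = match.group().replace(",", "")
    return float(value) if "." in value else int(value)


def clean_record(record):
    """Return a copy of a scraped record with its numeric fields converted."""
    cleaned = dict(record)
    for key in ("price", "beds", "baths", "size"):
        cleaned[key] = to_number(record.get(key))
    return cleaned
```

Keeping the raw record and the cleaned copy separate means a parsing surprise never destroys the original text.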
Step 5: Assemble the full script
Now wire the pieces into one runnable script: collect URLs across pages, scrape each property, and export the records to both JSON and CSV.
    import csv
    import json
    import time

    from bs4 import BeautifulSoup
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

    OPTIONS = {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
        "ajax_wait": "true",
        "page_wait": 5000,
    }

    CARD_SELECTOR = (
        'div[id="grid-search-results"] > ul > li[class^="ListItem-"] '
        'article[data-test="property-card"] a[data-test="property-card-link"]'
    )
    VIEW = 'div[data-testid="macro-data-view"]'
    FACTS = (
        f'{VIEW} > div[data-renderstrat="inline"]:nth-child(2) '
        'div[data-testid="bed-bath-sqft-facts"]'
    )


    def crawl(page_url):
        """Fetch a rendered page and return its HTML, or None on failure."""
        response = api.get(page_url, OPTIONS)
        if response["headers"]["pc_status"] == "200":
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['headers']['pc_status']}")
        return None


    def fetch_html(page_url, max_retries=2):
        """Fetch a page, retrying up to max_retries times before giving up."""
        for attempt in range(max_retries + 1):
            html = crawl(page_url)
            if html:
                return html
            if attempt < max_retries:
                time.sleep(1)
        return None


    def text_of(soup, selector):
        """Return the stripped text of the first match, or None if absent."""
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None


    def get_property_urls(html):
        """Return the href of every property card link on a search page."""
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.select(CARD_SELECTOR) if a.get("href")]


    def collect_all_urls(base_url, max_pages):
        """Collect property links from up to max_pages search pages."""
        first_html = fetch_html(f"{base_url}1_p/")
        if not first_html:
            return []
        soup = BeautifulSoup(first_html, "html.parser")
        last = soup.select_one("div.search-pagination > nav > li:nth-last-child(3)")
        total_pages = int(last.text) if last else 1
        pages = min(total_pages, max_pages)
        all_urls = get_property_urls(first_html)
        for page in range(2, pages + 1):
            html = fetch_html(f"{base_url}{page}_p/")
            if html:
                all_urls.extend(get_property_urls(html))
            time.sleep(2)
        return all_urls


    def scrape_property(html, url):
        """Parse one listing page into a structured record."""
        soup = BeautifulSoup(html, "html.parser")
        return {
            "link": url,
            "price": text_of(soup, f'{VIEW} span[data-testid="price"] > span'),
            "address": text_of(soup, f'{VIEW} div[class^="styles__AddressWrapper-"] > h1'),
            "beds": text_of(soup, f'{FACTS} > div[data-testid="bed-bath-sqft-fact-container"]:first-child > span:first-child'),
            "baths": text_of(soup, f'{FACTS} > button > div[data-testid="bed-bath-sqft-fact-container"] > span:first-child'),
            "size": text_of(soup, f'{FACTS} > div[data-testid="bed-bath-sqft-fact-container"]:last-child > span:first-child'),
            "type": text_of(soup, f'{VIEW} > div[data-renderstrat="inline"]:nth-child(3) div.dBmBNo:first-child > span'),
        }


    def save_outputs(records):
        """Write the records to JSON, and to CSV when there is at least one."""
        with open("zillow_properties.json", "w") as f:
            json.dump(records, f, indent=2)
        if not records:
            return
        with open("zillow_properties.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)


    def main():
        serp_url = "https://www.zillow.com/columbia-heights-washington-dc/sale/"
        urls = collect_all_urls(serp_url, max_pages=2)
        records = []
        for url in urls:
            html = fetch_html(url)
            if html:
                records.append(scrape_property(html, url))
            time.sleep(2)  # pace the run between listings
        save_outputs(records)
        print(f"Saved {len(records)} properties")


    if __name__ == "__main__":
        main()
The script collects property links across up to two search pages, fetches each listing with the retry wrapper, parses it into a record, and paces the loop with a two-second sleep. save_outputs writes both a JSON file and a CSV using the keys of the first record as the header, so you have the data in whichever shape your downstream tool wants. Adjust max_pages and the search URL to fit your target location.
What the output looks like
Run the full script with python zillow_scraper.py and you get a clean structured record per property, ready for analysis, a database, or a spreadsheet.
    [
      {
        "link": "https://www.zillow.com/homedetails/1008-Fairmont-St-NW-Washington-DC-20001/473889_zpid/",
        "price": "$850,000",
        "address": "1008 Fairmont St NW, Washington, DC 20001",
        "beds": "3",
        "baths": "4",
        "size": "1,801",
        "type": "Townhouse"
      },
      {
        "link": "https://www.zillow.com/homedetails/1438-Meridian-Pl-NW-APT-106-Washington-DC-20010/467942_zpid/",
        "price": "$385,000",
        "address": "1438 Meridian Pl NW APT 106, Washington, DC 20010",
        "beds": "2",
        "baths": "2",
        "size": "634",
        "type": "Condominium"
      }
    ]
The matching CSV carries the same columns, one row per property, which drops straight into pandas or any spreadsheet for filtering by price band, beds, or property type.
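Filtering by price band does not strictly need pandas; the standard library is enough for a quick look. The sketch below reuses the two sample records shown above as inline CSV text, and the function names are invented for this example.

```python
import csv
import io

# The two sample records from the output above, in the CSV layout the script writes.
SAMPLE_CSV = '''link,price,address,beds,baths,size,type
https://www.zillow.com/homedetails/1008-Fairmont-St-NW-Washington-DC-20001/473889_zpid/,"$850,000","1008 Fairmont St NW, Washington, DC 20001",3,4,"1,801",Townhouse
https://www.zillow.com/homedetails/1438-Meridian-Pl-NW-APT-106-Washington-DC-20010/467942_zpid/,"$385,000","1438 Meridian Pl NW APT 106, Washington, DC 20010",2,2,634,Condominium
'''


def price_to_int(text):
    """Strip '$' and ',' from a price string; return None if it is not numeric."""
    digits = (text or "").replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def filter_by_price(csv_file, low, high):
    """Return the rows whose price falls within [low, high]."""
    rows = []
    for row in csv.DictReader(csv_file):
        price = price_to_int(row.get("price"))
        if price is not None and low <= price <= high:
            rows.append(row)
    return rows


matches = filter_by_price(io.StringIO(SAMPLE_CSV), 300_000, 500_000)
```

Against the real export, the same function would take an open file handle, for instance open("zillow_properties.csv", newline="").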
Staying unblocked at scale
Even with rendering handled, Zillow watches for scraper-shaped traffic. A few habits keep a longer run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering listings in a tight loop is the fastest way to get throttled or challenged. The two-second sleeps above are the floor, not the ceiling; widen them for larger jobs and vary your targets instead of crawling one path at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning non-200 pc_status values is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
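The pacing and back-off habits above can be combined into one delay calculation. This is a minimal sketch, assuming a two-second base pace, doubling after each consecutive failure, random jitter, and a 60-second cap; all of those numbers are illustrative choices, not Crawlbase recommendations.

```python
import random


def should_back_off(pc_status):
    """Treat any non-200 pc_status as a signal to slow down."""
    return str(pc_status) != "200"


def next_delay(consecutive_failures, base=2.0, cap=60.0, rng=random):
    """Return a sleep time in seconds that grows with consecutive failures."""
    delay = base * (2 ** consecutive_failures)
    # Add up to 50% jitter so requests do not settle into a fixed rhythm.
    jittered = delay + rng.uniform(0, delay / 2)
    return min(cap, jittered)
```

In a crawl loop, you would reset the failure counter after each successful fetch and pass the running count to next_delay before sleeping.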
For larger crawls, the async Crawler queues requests and delivers results to a webhook, which suits running many search pages without holding open connections. For the broader playbook, see how to scrape websites without getting blocked. And if you want to compare market data across portals, the same approach carries over to scraping Redfin, Realtor.com, and Trulia.
Is it legal to scrape Zillow?
Whether scraping Zillow is allowed depends on Zillow's terms of service, your jurisdiction, and what you do with the data. Zillow's terms restrict automated access and data collection, so scraping can run against those terms regardless of how careful your tooling is. Zillow has also historically been litigious about scraping, having pursued legal action against parties that harvested its listings at scale, so this is not a hypothetical risk. None of the code here changes any of that; it only makes the technical part work. Read Zillow's Terms of Use and its robots.txt, and treat both as the boundary for what you collect.
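Checking a URL against robots.txt rules can be automated with the standard library's urllib.robotparser. The rules in this sketch are invented for demonstration and are not Zillow's actual robots.txt; the helper names are likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; NOT Zillow's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Return True when the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

For a live site you would load the real file instead, for example with set_url followed by read on the parser, and the result only covers robots.txt; it says nothing about the Terms of Use.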
A few lines worth holding to. Collect only public listing data: the asking price, beds, baths, square footage, property type, and street address that anyone can see without an account. Avoid anything tied to identifiable individuals, including the contact details of agents, owners, or other people named on a page, which is personal data and falls outside public-listing scope. Respect Zillow's stated rate expectations and keep your request volume low enough that you are not straining its servers. Be aware too that much of the underlying property and sales data on real estate portals originates from MLS feeds, which are typically licensed and carry their own redistribution restrictions, so collecting it does not grant you the right to republish it.
This guide is deliberately scoped to public listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, saved-search or account data, the personal or contact details of individuals, or any attempt to bypass authentication. Public listing data only. If your project needs more than that, the right path is a licensing arrangement: Zillow offers official APIs and partner programs for permitted use cases, and licensed MLS or real-estate data providers cover the rest. That is the correct route for commercial or bulk use, not a cleverer scraper.
Key takeaways
- Zillow is client-side rendered. A plain request returns a thin shell with only part of the results, so you must render the page before you parse it.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
- Work in two layers. Collect property links from each search page with the property-card-link selector, then fetch and parse each listing for price, beds, baths, size, address, and type.
- Paginate and export. Walk Zillow's {pageNo}_p pages up to a ceiling, pace the run with short sleeps, and write the records to JSON and CSV.
- Stay on public data. Respect Zillow's ToS and robots.txt, note that it has litigated over scraping and that MLS data is often licensed, and never touch logins, accounts, or the personal details of individuals.
Frequently Asked Questions (FAQs)
Why does a plain request return only part of the Zillow results?
Because Zillow loads its search results and listing details client-side with JavaScript and Ajax. The initial HTML is a shell that fills in only after the page's scripts run in a browser, so a raw HTTP request returns status 200 with most cards and listing fields missing. To get the full set you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Zillow?
The JS token. The normal token fetches static HTML, which on Zillow is the same thin shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the search cards and listing fields are present when BeautifulSoup parses them.
What data can I scrape from a Zillow listing?
Public listing fields: the asking price, the number of beds and baths, the square footage, the property type, the street address, and the listing link. Stay on data that is visible to any visitor without an account, and avoid the personal contact details of agents or owners, which fall outside the public-listing scope this guide covers.
My selectors return None. What changed?
Almost certainly Zillow's markup. Its generated class names and data-test attributes (the ListItem- prefix, the macro-data-view block, the bed-bath-sqft-facts container) change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
How do I handle pagination across a location's listings?
Zillow appends a {pageNo}_p segment to the search path. Fetch the first page to read the total page count from the pagination nav, cap the crawl at a max_pages ceiling, then walk each page collecting property links. The collect_all_urls function above shows the full loop, with a short sleep between pages.
Can I use scraped Zillow data commercially?
Treat that as a legal question, not a technical one. Much of Zillow's property data originates from licensed MLS feeds with their own redistribution terms, and Zillow's own Terms of Use restrict reuse, so commercial or bulk use generally needs permission. Review the terms, consider Zillow's official API or partner program, and seek legal advice before building a product on top of the data.
