Houzz is one of the largest platforms for home design, furniture, and renovation, pairing a deep product catalog with editorial inspiration. Its public product and listing pages hold exactly the structured data that drives price tracking, competitor research, and trend analysis: the product name, the price, the rating and review count, the seller or brand, the category, and the link back to each product page. Pulling that by hand across a category of hundreds of products is slow and error-prone.
This guide shows you how to scrape Houzz data with Python the reliable way. You build a small, runnable scraper that fetches rendered Houzz pages through the Crawling API, collects product links from a category listing, parses the fields you want with BeautifulSoup, handles pagination, and exports clean JSON and CSV. The whole walkthrough stays scoped to public product data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public Houzz category URL, collects the product page links, fetches each rendered page through the Crawling API, and extracts a structured record per product. The running example is the bathroom vanities and sink consoles category. We pull these fields:
- Name the product name as shown on the card and product page.
- Price the listed price for the product.
- Rating the average star rating.
- Reviews the number of customer reviews behind that rating.
- Seller the seller, store, or brand offering the product.
- Category the category the product sits in.
- Link the canonical URL of the product page.
Why a plain request fails on Houzz
If you request a Houzz category or product URL with a bare HTTP client, you get a response with status 200 and only a fraction of the data in the body. Two things work against you. First, Houzz renders most of its product grid and product details in the browser through JavaScript, so the initial HTML is a thin shell that fills in only after the page's scripts run. Pull the product cards out of that first response and you capture a handful of items, or none at all. Second, Houzz flags automated traffic quickly: datacenter IPs and patterns that do not look like a real browser get rate-limited, IP-blocked, or challenged before they reach the rendered content.
So a working Houzz scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping those healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. If JavaScript-heavy targets are new to you, the guide to crawling JavaScript websites covers the why in more depth.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Houzz fills its product grid and product fields client-side, so you need the JS token here. The normal token returns the same thin shell a plain fetch would, and there is little useful to parse out of it.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the parsing side, the BeautifulSoup guide is a good companion to this tutorial.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda, and make sure Python is on your PATH.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase includes 1,000 free requests to start, which is plenty for working through this guide. Treat the token like a password: it authenticates your requests, so keep it out of version control.
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.
python --version python -m venv houzz_env source houzz_env/bin/activate pip install crawlbase beautifulsoup4
On Windows, activate the environment with houzz_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. Both json and csv ship with the standard library, so there is nothing more to install for the export step.
Step 1: Fetch a rendered Houzz page
Start by getting a finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a Houzz category URL. Houzz loads its grid asynchronously, so pass ajax_wait and page_wait to hold for the dynamic content before the page is captured. Checking the Crawlbase pc_status before you parse keeps failures loud instead of silent.
from crawlbase import CrawlingAPI api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"}) OPTIONS = { "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0", "ajax_wait": "true", "page_wait": 5000, } def crawl(page_url): response = api.get(page_url, OPTIONS) if response["headers"]["pc_status"] == "200": return response["body"].decode("utf-8") print(f"Request failed: {response['headers']['pc_status']}") return None if __name__ == "__main__": listing_url = "https://www.houzz.com/products/bathroom-vanities-and-sink-consoles/best-sellers--best-sellers" html = crawl(listing_url) print(html[:500] if html else "No HTML returned")
The two wait options matter for a client-rendered target like Houzz. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Five seconds is a reasonable start; raise it if the results come back thin. Run the script with python houzz_scraper.py and you should see real Houzz category markup, not the shell a plain request returns. That confirms rendering works before you write a single selector.
Houzz needs a rendered page behind a trusted IP, in one call, which is exactly what the ajax_wait and page_wait options above set up. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public category page on the free tier first.
Step 2: Collect product links from the category page
A Houzz category page is a grid of product cards, each linking to a full product page. Load the rendered HTML into BeautifulSoup and pull the href from each card's title link. Houzz nests these inside its product list container, so the selector walks from the list down to the card and its title anchor.
from bs4 import BeautifulSoup CARD_SELECTOR = ( 'div[data-container="Product List"] > div.hz-product-card ' 'a.hz-product-card__product-title' ) def get_product_urls(html): soup = BeautifulSoup(html, "html.parser") return [a["href"] for a in soup.select(CARD_SELECTOR) if a.get("href")]
Each product card carries the class hz-product-card, and the title link inside it carries hz-product-card__product-title with the product page URL in its href. Running this against the rendered category HTML returns a clean list of product page URLs:
[ "https://www.houzz.com/products/the-sequoia-bathroom-vanity-acacia-30-single-sink-freestanding-prvw-vr~170329010", "https://www.houzz.com/products/bosque-bath-vanity-driftwood-42-single-sink-undermount-freestanding-prvw-vr~107752516", "https://www.houzz.com/products/render-bathroom-vanity-oak-white-prvw-vr~176775440", "https://www.houzz.com/products/the-wailea-bathroom-vanity-single-sink-42-weathered-fir-freestanding-prvw-vr~188522678" ]
Houzz's class names and container attributes change without notice. Treat the selectors here as a starting template, not a contract. When a list comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
Step 3: Handle pagination across category pages
One category page is a slice of the catalog. Houzz exposes a "next page" link in its pagination control, so you fetch a page, collect its links, then follow the next-page link until there is no more. A small retry wrapper around the fetch keeps a single slow page from ending the run.
import time BASE = "https://www.houzz.com" def fetch_html(page_url, max_retries=2): for attempt in range(max_retries + 1): html = crawl(page_url) if html: return html if attempt < max_retries: print(f"Retrying ({attempt + 1}/{max_retries})...") time.sleep(1) print(f"Unable to fetch {page_url}") return None def get_next_page_url(soup): nxt = soup.select_one("a.hz-pagination-link--next") return BASE + nxt["href"] if nxt and nxt.get("href") else None def collect_all_urls(start_url, max_pages): all_urls = [] url = start_url page = 0 while url and page < max_pages: html = fetch_html(url) if not html: break all_urls.extend(get_product_urls(html)) soup = BeautifulSoup(html, "html.parser") url = get_next_page_url(soup) page += 1 time.sleep(2) return all_urls
fetch_html retries a failed fetch up to twice with a short pause, returning the HTML on success and None once it gives up. get_next_page_url reads the hz-pagination-link--next anchor and joins its relative href to the Houzz host. collect_all_urls follows that next-page link from one page to the next, capping the crawl at your max_pages ceiling so a large category does not run away, and gathers links from every page. The time.sleep(2) between pages paces the run so you are not hammering the site.
Step 4: Parse each product page
With a full list of product URLs, fetch each page and extract the fields. Houzz groups the headline details around its pricing block, so the selectors below map name, price, rating, review count, seller, and category to individual elements. Each lookup is guarded so a missing field returns None instead of crashing the run.
def text_of(soup, selector): el = soup.select_one(selector) return el.get_text(strip=True) if el else None def get_rating(soup): star = soup.select_one("span.star-rating") if star and star.get("aria-label"): return star["aria-label"].replace("Average rating: ", "") return None def scrape_product(html, url): soup = BeautifulSoup(html, "html.parser") return { "link": url, "name": text_of(soup, "span.view-product-title"), "price": text_of(soup, "span.pricing-info__price"), "rating": get_rating(soup), "reviews": text_of(soup, "span.review-count"), "seller": text_of(soup, "a.seller-name"), "category": text_of(soup, "nav.breadcrumb li:last-child"), }
The text_of helper queries one element and returns its stripped text, or None when the element is absent, so a product that omits a field does not break the loop. The selectors come straight from Houzz's product layout: name reads the view-product-title span, price reads the pricing-info__price span, and the rating lives in a star-rating span whose aria-label reads like "4.9 out of 5 stars", which get_rating cleans up. The review count, seller, and category sit in the surrounding metadata; re-inspect the live page if any of these comes back None, since Houzz revises these wrappers periodically.
Step 5: Assemble the full script
Now wire the pieces into one runnable script: collect URLs across pages, scrape each product, and export the records to both JSON and CSV.
import csv import json import time from crawlbase import CrawlingAPI from bs4 import BeautifulSoup api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"}) BASE = "https://www.houzz.com" OPTIONS = { "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0", "ajax_wait": "true", "page_wait": 5000, } CARD_SELECTOR = ( 'div[data-container="Product List"] > div.hz-product-card ' 'a.hz-product-card__product-title' ) def crawl(page_url): response = api.get(page_url, OPTIONS) if response["headers"]["pc_status"] == "200": return response["body"].decode("utf-8") print(f"Request failed: {response['headers']['pc_status']}") return None def fetch_html(page_url, max_retries=2): for attempt in range(max_retries + 1): html = crawl(page_url) if html: return html if attempt < max_retries: time.sleep(1) return None def text_of(soup, selector): el = soup.select_one(selector) return el.get_text(strip=True) if el else None def get_rating(soup): star = soup.select_one("span.star-rating") if star and star.get("aria-label"): return star["aria-label"].replace("Average rating: ", "") return None def get_product_urls(html): soup = BeautifulSoup(html, "html.parser") return [a["href"] for a in soup.select(CARD_SELECTOR) if a.get("href")] def get_next_page_url(soup): nxt = soup.select_one("a.hz-pagination-link--next") return BASE + nxt["href"] if nxt and nxt.get("href") else None def collect_all_urls(start_url, max_pages): all_urls = [] url = start_url page = 0 while url and page < max_pages: html = fetch_html(url) if not html: break all_urls.extend(get_product_urls(html)) soup = BeautifulSoup(html, "html.parser") url = get_next_page_url(soup) page += 1 time.sleep(2) return all_urls def scrape_product(html, url): soup = BeautifulSoup(html, "html.parser") return { "link": url, "name": text_of(soup, "span.view-product-title"), "price": text_of(soup, "span.pricing-info__price"), "rating": get_rating(soup), "reviews": text_of(soup, "span.review-count"), "seller": text_of(soup, "a.seller-name"), "category": text_of(soup, "nav.breadcrumb li:last-child"), } def save_outputs(records): with open("houzz_products.json", "w") as f: json.dump(records, f, indent=2) if not records: return with open("houzz_products.csv", "w", newline="") as f: writer = csv.DictWriter(f, fieldnames=records[0].keys()) writer.writeheader() writer.writerows(records) def main(): listing_url = "https://www.houzz.com/products/bathroom-vanities-and-sink-consoles/best-sellers--best-sellers" urls = collect_all_urls(listing_url, max_pages=2) records = [] for url in urls: html = fetch_html(url) if html: records.append(scrape_product(html, url)) time.sleep(2) save_outputs(records) print(f"Saved {len(records)} products") if __name__ == "__main__": main()
The script collects product links across up to two category pages, fetches each product page with the retry wrapper, parses it into a record, and paces the loop with a two-second sleep. save_outputs writes both a JSON file and a CSV using the keys of the first record as the header, so you have the data in whichever shape your downstream tool wants. Adjust max_pages and the category URL to fit your target.
What the output looks like
Run the full script with python houzz_scraper.py and you get a clean structured record per product, ready for analysis, a database, or a spreadsheet.
[ { "link": "https://www.houzz.com/products/the-sequoia-bathroom-vanity-acacia-30-single-sink-freestanding-prvw-vr~170329010", "name": "The Sequoia Bathroom Vanity, Acacia, 30\", Single Sink, Freestanding", "price": "$948", "rating": "4.9 out of 5 stars", "reviews": "128 Reviews", "seller": "Cambridge Plumbing", "category": "Bathroom Vanities" }, { "link": "https://www.houzz.com/products/render-bathroom-vanity-oak-white-prvw-vr~176775440", "name": "Render Bathroom Vanity, Oak White", "price": "$295", "rating": "4.5 out of 5 stars", "reviews": "43 Reviews", "seller": "Modway", "category": "Bathroom Vanities" } ]
The matching CSV carries the same columns, one row per product, which drops straight into pandas or any spreadsheet for filtering by price band, rating, or seller. If your destination is a worksheet rather than a script, the same records feed an ecommerce scraping pipeline without further reshaping.
Staying unblocked at scale
Even with rendering handled, Houzz watches for scraper-shaped traffic. A few habits keep a longer run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering product pages in a tight loop is the fastest way to get throttled or challenged. The two-second sleeps above are the floor, not the ceiling; widen them for larger jobs and vary your targets instead of crawling one category at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
-
Read the status codes. A run that starts returning non-200
pc_statusvalues is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
For larger crawls, the async Crawler queues requests and delivers results to a webhook, which suits running many category pages without holding open connections. For the broader playbook, see how to scrape websites without getting blocked.
Is it legal to scrape Houzz?
Whether scraping Houzz is allowed depends on Houzz's terms of service, your jurisdiction, and what you do with the data. Houzz's Terms of Use restrict automated access and bulk data collection, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Houzz's Terms of Use and its robots.txt, respect any rate expectations and disallowed paths they set, and treat both as the boundary for what you collect. Keep your request volume low enough that you are not straining its servers.
A few lines worth holding to. Collect only public product and listing data: the product name, price, rating, review count, seller or brand, and category that anyone can see without an account. Houzz also hosts a large body of user-generated content, including project photos, design ideas, and reviews tied to named individuals; treat anything attached to a real person as personal data that falls outside this scope, and note that GDPR and CCPA apply the moment personal data enters the picture. Product photography and design images on Houzz are copyrighted by the sellers, designers, and Houzz itself, so collecting an image URL does not give you the right to redistribute, reuse, or republish the image. Stay on the factual product fields, not the media.
This guide is deliberately scoped to public product pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or saved-idea data, the personal details of homeowners or pros, or copyrighted photos you would redistribute. If your project needs more than public product fields, the right path is a licensing arrangement: Houzz operates a public API and partner programs for permitted use cases such as its Trade Program and seller integrations, and that is the correct route for commercial or bulk use, not a cleverer scraper.
Key takeaways
- Houzz is client-side rendered. A plain request returns a thin shell with little of the product grid, so you must render the page before you parse it.
-
You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call;
ajax_waitandpage_waitcontrol how long it waits for content. -
Work in two layers. Collect product links from each category page with the
hz-product-card__product-titleselector, then fetch and parse each product for name, price, rating, reviews, seller, and category. -
Paginate and export. Follow Houzz's
hz-pagination-link--nextlink up to a ceiling, pace the run with short sleeps, and write the records to JSON and CSV. - Stay on public data. Respect Houzz's ToS and robots.txt, keep to factual product fields, never touch logins, personal data, or copyrighted images you would redistribute, and use the official API for bulk or commercial use.
Frequently Asked Questions (FAQs)
Why does a plain request return almost no Houzz products?
Because Houzz loads its product grid and product details client-side with JavaScript. The initial HTML is a shell that fills in only after the page's scripts run in a browser, so a raw HTTP request returns status 200 with most cards and product fields missing. To get the full set you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Houzz?
The JS token. The normal token fetches static HTML, which on Houzz is the same thin shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the product cards and product fields are present when BeautifulSoup parses them.
What data can I scrape from a Houzz product page?
Public product fields: the product name, the price, the average rating and review count, the seller or brand, the category, and the product link. Stay on data that is visible to any visitor without an account, and avoid project photos, reviews, or any content tied to identifiable individuals, which falls outside the public-product scope this guide covers.
My selectors return None. What changed?
Almost certainly Houzz's markup. Class names like hz-product-card, pricing-info__price, and star-rating and container attributes change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
How do I scrape more than one page of a Houzz category?
Houzz exposes a next-page anchor with the class hz-pagination-link--next. Read its href, join it to the Houzz host, and follow it from one page to the next until the link is absent or you hit your max_pages ceiling. The collect_all_urls function above shows the full loop, with a short sleep between pages.
Can I use scraped Houzz data commercially?
Treat that as a legal question, not a technical one. Houzz's Terms of Use restrict reuse, much of the product imagery is copyrighted by sellers and designers, and any user content is personal data, so commercial or bulk use generally needs permission. Review the terms, consider Houzz's official API and partner programs, and seek legal advice before building a product on top of the data.
Crawl any site at scale, without fighting infrastructure.
Crawlbase handles proxies, fingerprints, and CAPTCHAs so your team ships data pipelines instead of maintaining crawl plumbing. 1,000 requests free, no card required.
