Best Buy's search and product pages are a public window into one of the largest consumer-electronics catalogs in North America. Every search result and product page carries the same handful of fields anyone can see: the product title, its price, the model and SKU numbers, a star rating, and whether the item is in stock. Those fields are exactly the signal that pricing analysts, market researchers, and tech buyers track when they want to watch electronics prices move or compare availability across a category.
This guide shows you how to scrape Best Buy product data with Python. You build a small, runnable scraper that fetches Best Buy search and product pages through the Crawling API, parses a clean record for each listing, handles pagination across result pages, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public catalog data: the titles, prices, models, ratings, and availability anyone can read on Best Buy without logging in.
What you will build
A Python script that takes a Best Buy search URL, retrieves the rendered page through the Crawling API, and extracts a structured record per product. We use a search for "i phone" as the running example and pull these fields from each listing card:
- Title: the product name shown on the listing card.
- Price: the current customer price, when the product shows one.
- Model / SKU: the manufacturer model number and Best Buy's own SKU identifier.
- Rating: the average star rating, with the review count alongside it.
- Availability: whether the item is in stock, sold out, or available for shipping.
- Product URL: the absolute link to the product's own detail page.
Why a plain request fails on Best Buy
If you point a bare HTTP client at a Best Buy search URL, you almost never get the product list you came for. Best Buy renders its search results client-side: the server ships a lightweight shell and the JavaScript on the page fills in the product cards afterward. So the raw HTML you get back from a simple requests.get() is missing the listings entirely, and your parser walks away with an empty list.
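A quick way to see the difference is to count product cards in whatever HTML you received. The sketch below uses only the standard library and assumes the cards are li elements with the sku-item class, which is the container this guide targets later. Both sample strings are made-up stand-ins, not real Best Buy responses.

```python
from html.parser import HTMLParser


class CardCounter(HTMLParser):
    """Counts <li> elements whose class list includes 'sku-item'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "sku-item" in classes:
            self.count += 1


def count_cards(html):
    """Return how many product-card containers appear in the HTML."""
    counter = CardCounter()
    counter.feed(html)
    return counter.count


# Hypothetical samples: an unrendered shell and a rendered result list.
shell = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<ol class="sku-item-list"><li class="sku-item">A</li><li class="sku-item">B</li></ol>'

print(count_cards(shell))     # 0
print(count_cards(rendered))  # 2
```

A count of zero on a page you expected to be full is the signal that you received the shell rather than the rendered list.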
The second problem is bot detection. Best Buy flags automated traffic quickly. Datacenter IP ranges and request patterns that do not look like a real browser get met with a rate limit, a challenge page, or an outright block before you reach the products. So a working Best Buy scraper needs two things in one request: a browser that renders the page, and an IP that Best Buy reads as a real shopper. You can build that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the search URL, it renders the page behind a trusted residential IP, handles the rotation and CAPTCHA solving, and returns finished HTML for you to parse.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs or any beginner course covers the level this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.
A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token from the account docs page. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Best Buy is a JavaScript-rendered site, so you use the JavaScript request token here. Treat the token like a password and keep it out of version control.
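One common way to keep the token out of version control is to read it from an environment variable instead of hardcoding it. This is a minimal sketch: the variable name CRAWLBASE_TOKEN and the load_token helper are assumptions made for illustration, not something the Crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_TOKEN"):
    """Return the token from the environment, or raise a clear error.

    The variable name is a hypothetical convention; use whatever name
    fits your own deployment.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

With this in place, the scripts in this guide could pass load_token() wherever they show the "YOUR_CRAWLBASE_TOKEN" placeholder.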
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the listing cards by CSS selector.
```bash
python --version
python -m venv bestbuy_env
source bestbuy_env/bin/activate
pip install crawlbase beautifulsoup4
```
On Windows, activate the environment with bestbuy_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:
touch bestbuy_scraper.py
Understanding the Best Buy search page
A Best Buy search lives at a stable URL on the searchpage.jsp endpoint, with your query in the st parameter. A search for "i phone" is https://www.bestbuy.com/site/searchpage.jsp?st=i+phone. The page lays out an ordered list of product cards, one per item, each carrying the same fields: a title, a price, a model and SKU, a star rating with a review count, and an availability state.
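As a small illustration of that URL shape, the sketch below builds a search URL from a query using the standard library. The build_search_url helper is a hypothetical name, and the optional page argument maps to the cp parameter covered in the pagination section of this guide.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.bestbuy.com/site/searchpage.jsp"


def build_search_url(term, page=None):
    """Build a search URL; urlencode turns spaces into '+' for us."""
    params = {"st": term}
    if page is not None:
        params["cp"] = page  # result page number
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("i phone"))
# https://www.bestbuy.com/site/searchpage.jsp?st=i+phone
print(build_search_url("usb-c hub", page=2))
# https://www.bestbuy.com/site/searchpage.jsp?st=usb-c+hub&cp=2
```

Letting urlencode handle the escaping avoids hand-built query strings that break on spaces or ampersands in the search term.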
Before writing selectors, open a search page in your browser, right-click a product card, and choose Inspect. Best Buy wraps the whole result set in an ol.sku-item-list and each product in an li.sku-item container, then splits each card into a column-middle (title, model, SKU, rating) and a column-right (price, availability). Those are the elements you target. Best Buy's class names shift over time, so treat the selectors below as a starting template you re-check against the live page, not a permanent contract.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, build the search URL, and request it. Checking Crawlbase's pc_status before you parse keeps failures loud instead of silent.
```python
from crawlbase import CrawlingAPI
from urllib.parse import quote_plus

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


def crawl(page_url):
    # Wait for async content, then hold 5 seconds so late cards render.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    # pc_status arrives as a string in the response headers.
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed. Crawlbase status: {response['headers']['pc_status']}")
    return None


if __name__ == "__main__":
    search_term = "i phone"
    search_url = f"https://www.bestbuy.com/site/searchpage.jsp?st={quote_plus(search_term)}"
    html = crawl(search_url)
    # Print a short preview to confirm real markup came back.
    print(html[:500] if html else "No HTML returned")
```
The two wait options matter for a list that fills in after load. ajax_wait tells the API to wait for the asynchronous product grid to finish loading, and page_wait holds for a fixed 5,000 milliseconds after load so the late-rendering cards appear before the page is captured. Crawlbase reports the outcome of its own crawl in the pc_status response header, separately from the HTTP status of your API call, so you check that header rather than a top-level status field. Run the script and you should see real product markup, not a challenge-page shell. That confirms rendering works before you write a single selector.
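Because crawl() returns None when pc_status is not "200", a thin retry wrapper can give transient failures another chance before you give up. The sketch below is one possible approach, not part of the Crawlbase client: fetch_with_retry is a hypothetical helper, and the attempt count and exponential delays are illustrative defaults.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML or the attempts run out.

    fetch is any callable that returns a string on success and None on
    failure, which matches how crawl() behaves in this guide.
    """
    for attempt in range(attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < attempts - 1:
            # Back off exponentially: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

You would call it as fetch_with_retry(crawl, search_url). The sleep argument is injectable so the function can be tested without real waiting.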
Step 2: Parse the product cards with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Best Buy lists each item as an li.sku-item inside the ol.sku-item-list, with the title and rating in the column-middle and the price and availability in the column-right. Wrap each card in a try/except so one malformed listing does not crash the whole run.
```python
from bs4 import BeautifulSoup

BASE = "https://www.bestbuy.com"


def text_of(card, selector):
    # Return the stripped text of the first match, or None if absent.
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_card(card):
    # The title anchor carries both the product name and its link.
    title_el = card.select_one("div.column-middle h4.sku-title > a")
    href = title_el["href"] if title_el and title_el.has_attr("href") else None
    return {
        "title": title_el.get_text(strip=True) if title_el else None,
        "price": text_of(card, 'div.column-right div.sku-list-item-price div[data-testid="customer-price"] > span'),
        "model": text_of(card, "div.column-middle div.sku-model span.sku-value"),
        "sku": card.get("data-sku-id"),
        "rating": text_of(card, "div.column-middle div.ratings-reviews div.c-ratings-reviews > p"),
        "review_count": text_of(card, "div.column-middle div.ratings-reviews span.c-reviews"),
        "availability": text_of(card, "div.column-right div.fulfillment-add-to-cart-button button"),
        "product_url": BASE + href if href else None,
    }


def scrape_bestbuy_listing(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("ol.sku-item-list li.sku-item")
    results = []
    for card in cards:
        # One malformed card should not abort the whole page.
        try:
            results.append(parse_card(card))
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results
```
The text_of helper queries one element inside a card and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every listing shows a price or rating. The title and product link both come from the h4.sku-title > a anchor, the price from the customer-price test-id span, and the SKU from the card's own data-sku-id attribute. Availability is read from the fulfillment button's text, which reads "Add to Cart" for an in-stock item and "Sold Out" or "Coming Soon" otherwise, so the button label doubles as a stock signal.
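If you want a stable stock field rather than raw button text, you can map the labels named above to a small set of states. This is a sketch under the assumption that those three labels are the ones you see; the stock_state helper and its state names are hypothetical, and any other label falls through to "other" so you notice it.

```python
def stock_state(button_label):
    """Map the fulfillment button text to a coarse stock state."""
    if not button_label:
        return "unknown"  # the button was missing from the card
    label = button_label.strip().lower()
    if label == "add to cart":
        return "in_stock"
    if label == "sold out":
        return "sold_out"
    if label == "coming soon":
        return "coming_soon"
    return "other"  # a label this sketch does not recognize
```

Comparing case-insensitively keeps the mapping from breaking if the button text changes capitalization.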
Best Buy's class names and data-testid values change without notice. The structural markers like ol.sku-item-list, li.sku-item, and the column-middle / column-right split tend to be more durable than deep class chains. When a field comes back as None for every card, re-inspect the live search page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
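That "None for every card" symptom is easy to check automatically after each run. The sketch below is one way to do it; dead_fields is a hypothetical helper that takes the parsed rows and the list of field names and reports which fields never produced a value.

```python
def dead_fields(rows, fields):
    """Return the fields that are None (or missing) in every row.

    An empty rows list means the card selector itself matched nothing,
    so every field is reported.
    """
    if not rows:
        return list(fields)
    return [f for f in fields if all(row.get(f) is None for row in rows)]


sample = [
    {"title": "Phone A", "price": None, "sku": "1"},
    {"title": "Phone B", "price": None, "sku": "2"},
]
print(dead_fields(sample, ["title", "price", "sku"]))  # ['price']
```

A field that is None on some cards is normal; a field that is None on all of them usually means its selector needs updating.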
Step 3: Assemble the script and export JSON and CSV
Now wire the fetch and the parse into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet. Fetch the rendered search page, hand it to the parser, and dump the structured rows.
```python
import csv
import json
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.bestbuy.com"
# Shared by the parser output and the CSV header so they stay aligned.
FIELDS = ["title", "price", "model", "sku", "rating", "review_count", "availability", "product_url"]


def crawl(page_url):
    # Wait for async content, then hold 5 seconds so late cards render.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed. Crawlbase status: {response['headers']['pc_status']}")
    return None


def text_of(card, selector):
    # Return the stripped text of the first match, or None if absent.
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_card(card):
    title_el = card.select_one("div.column-middle h4.sku-title > a")
    href = title_el["href"] if title_el and title_el.has_attr("href") else None
    return {
        "title": title_el.get_text(strip=True) if title_el else None,
        "price": text_of(card, 'div.column-right div.sku-list-item-price div[data-testid="customer-price"] > span'),
        "model": text_of(card, "div.column-middle div.sku-model span.sku-value"),
        "sku": card.get("data-sku-id"),
        "rating": text_of(card, "div.column-middle div.ratings-reviews div.c-ratings-reviews > p"),
        "review_count": text_of(card, "div.column-middle div.ratings-reviews span.c-reviews"),
        "availability": text_of(card, "div.column-right div.fulfillment-add-to-cart-button button"),
        "product_url": BASE + href if href else None,
    }


def scrape_bestbuy_listing(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("ol.sku-item-list li.sku-item")
    results = []
    for card in cards:
        # One malformed card should not abort the whole page.
        try:
            results.append(parse_card(card))
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results


def export(rows, name="bestbuy_products"):
    # Write the same rows to JSON and CSV.
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} products to {name}.json and {name}.csv")


def main():
    search_term = "i phone"
    url = f"{BASE}/site/searchpage.jsp?st={quote_plus(search_term)}"
    html = crawl(url)
    if not html:
        return
    rows = scrape_bestbuy_listing(html)
    export(rows)


if __name__ == "__main__":
    main()
```
Run the full script with python bestbuy_scraper.py. It fetches the rendered search page, parses one row per product, and writes both bestbuy_products.json and bestbuy_products.csv. The shared FIELDS list keeps the CSV column order in step with the dictionary keys, so the two exports never drift apart.
What the output looks like
You get a clean list of product records, in search order, ready to write to JSON, CSV, or a database.
```json
[
  {
    "title": "Apple - iPhone 14 128GB (Unlocked) - Midnight",
    "price": "$729.99",
    "model": "MPUA3LL/A",
    "sku": "6507555",
    "rating": "Rating 4.9 out of 5 stars with 155 reviews",
    "review_count": "(155)",
    "availability": "Add to Cart",
    "product_url": "https://www.bestbuy.com/site/apple-iphone-14-128gb-unlocked-midnight/6507555.p?skuId=6507555"
  },
  {
    "title": "Apple - iPhone SE (3rd Generation) 64GB (Unlocked)",
    "price": "$429.99",
    "model": "MMX73LL/A",
    "sku": "6507470",
    "rating": "Rating 4.5 out of 5 stars with 111 reviews",
    "review_count": "(111)",
    "availability": "Add to Cart",
    "product_url": "https://www.bestbuy.com/site/apple-iphone-se-3rd-generation-64gb-unlocked/6507470.p?skuId=6507470"
  }
]
```
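The price and rating arrive as display strings, so numeric analysis needs a normalization pass. The sketch below assumes the string formats match the sample records above ("$729.99" and "Rating 4.9 out of 5 stars with 155 reviews"); parse_price and parse_rating are hypothetical helpers, and a different on-page format would need a different pattern.

```python
import re


def parse_price(text):
    """Turn a price string like '$729.99' into a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """Split the rating sentence into (stars, review_count).

    Returns (None, None) when the text does not match the assumed format.
    """
    match = re.search(
        r"Rating\s+([\d.]+)\s+out of 5 stars with\s+([\d,]+)\s+review",
        text or "",
    )
    if not match:
        return None, None
    return float(match.group(1)), int(match.group(2).replace(",", ""))


print(parse_price("$729.99"))  # 729.99
print(parse_rating("Rating 4.9 out of 5 stars with 155 reviews"))  # (4.9, 155)
```

Keeping the raw strings alongside the parsed numbers makes it easier to spot a format change later.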
Handling pagination across result pages
One search page is a demo; a real research job runs across the whole result set. Best Buy splits search results over several pages and tracks the current page with the cp URL parameter: &cp=1 is the first page, &cp=2 the second, and so on. To collect a full dataset, walk the pages in order, stop when a page returns no products, and pace the requests so you are not hammering Best Buy in a tight loop.
```python
import time

# Reuses BASE, quote_plus, crawl, scrape_bestbuy_listing, and export
# from the full script above.


def scrape_all_pages(search_term, max_pages=5):
    base_url = f"{BASE}/site/searchpage.jsp?st={quote_plus(search_term)}"
    all_rows = []
    for page_number in range(1, max_pages + 1):
        page_url = f"{base_url}&cp={page_number}"
        html = crawl(page_url)
        if not html:
            break  # request failed, stop rather than loop on errors
        rows = scrape_bestbuy_listing(html)
        if not rows:
            print(f"No products on page {page_number}, stopping.")
            break
        all_rows.extend(rows)
        print(f"Page {page_number}: {len(rows)} products")
        time.sleep(2)  # pause between pages
    return all_rows


if __name__ == "__main__":
    rows = scrape_all_pages("i phone", max_pages=5)
    export(rows)
```
The empty-results break stops you early when a search runs out of pages, and the time.sleep(2) between requests paces the run so you are not flagged for rapid-fire traffic. Swap the search term for any query you want and you can extend this into a price-tracking pipeline. To track a single product page rather than a search URL, reuse crawl() but write selectors for that page's markup, because the card selectors here only match the search list. For the wider picture on turning this kind of feed into a monitoring tool, see web scraping for price intelligence and the guide on building a price comparison tool.
Staying unblocked
Even with rendering handled, Best Buy watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Spread requests out with a delay between pages rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Best Buy's servers.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Retain only what you need. Store the product fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
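For the pacing habit in the list above, one option is to add a small random component to the delay so requests do not land on a perfectly regular beat. This is a sketch of that idea, not a guarantee against blocking; polite_sleep is a hypothetical helper and the default timings are illustrative.

```python
import random
import time


def polite_sleep(base=2.0, jitter=1.0, sleep=time.sleep, rng=random.uniform):
    """Sleep for base seconds plus a random extra of up to jitter seconds.

    Returns the delay that was used so callers can log it.
    """
    delay = base + rng(0, jitter)
    sleep(delay)
    return delay
```

You could call polite_sleep() in place of a fixed time.sleep(2) between pages. The sleep and rng arguments are injectable so the helper can be tested without waiting.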
For the broader playbook, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. If your project sits in a wider retail context, the overview of ecommerce web scraping covers the patterns shared across stores like Best Buy and Amazon.
Is it legal to scrape Best Buy?
Whether scraping Best Buy is allowed depends on Best Buy's Terms and Conditions, your jurisdiction, and what you do with the data. Best Buy's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Best Buy's Terms and Conditions and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
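To make the robots.txt check concrete, Python's standard library can evaluate a URL against a set of robots rules. The sketch below parses rules from a string so it runs offline. The rules and the example.com URLs are invented for illustration and are not Best Buy's actual robots.txt, so fetch and read the real file before relying on any result.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; NOT Best Buy's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://www.example.com/site/searchpage.jsp?st=i+phone"))  # True
print(is_allowed(SAMPLE_ROBOTS, "https://www.example.com/cart"))  # False
```

A robots.txt check is only one input. The site's Terms and Conditions still apply even where robots.txt is silent.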
A few lines worth holding to. Collect only public data: the product titles, prices, models, SKUs, ratings, and availability that anyone can see on a Best Buy search or product page without an account. Keep your request volume low enough that you are not straining Best Buy's servers, and avoid personal data, including anything tied to identifiable shoppers, reviewers, or store staff beyond what is publicly listed. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public search and product pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. Best Buy runs an affiliate and partner program with official product feeds for licensed use, and that is the right path when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public catalog data, an official feed or a data agreement is the correct route, not a cleverer scraper.
Key takeaways
- Best Buy is a public electronics catalog. Its search and product pages expose title, price, model, SKU, rating, and availability, which is why they are useful for price tracking and market research.
- You need rendering and a trusted IP together. Best Buy fills its search grid client-side and blocks bot traffic, so the Crawling API renders the page behind a residential IP in one call with ajax_wait and page_wait.
- BeautifulSoup does the extraction. Loop over the ol.sku-item-list li.sku-item cards and map each field to its selector, and expect those selectors to drift as Best Buy's markup changes.
- Paginate with the cp parameter. Walk pages with &cp=N, stop on an empty page, and pace requests with a short delay between pages.
- Stay on public data. Respect Best Buy's Terms and Conditions and robots.txt, prefer an official feed for licensed or bulk data, and never touch accounts, orders, or personal information.
Frequently Asked Questions (FAQs)
Why does a plain request return no products from Best Buy?
Best Buy renders its search grid client-side, so a raw requests.get() gets back a shell with no product cards in it, which is why a plain parser returns an empty list. On top of that, Best Buy challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it with the ajax_wait and page_wait options set.
What fields can I scrape from a Best Buy listing?
From each search result card you can read the product title, the current customer price, the manufacturer model number, Best Buy's own SKU, the star rating and review count, an availability state from the fulfillment button, and the link to the product's detail page. The scraper in this guide pulls all of those into one record per product, then writes them to JSON and CSV.
How do I handle pagination on Best Buy?
Best Buy tracks the current search page with the cp URL parameter, so &cp=2 requests the second page and so on. Loop over the page numbers, append &cp=N to the search URL each time, and stop when a page returns no products. Add a short delay between requests so you are pacing the run rather than firing pages back to back.
How does the Crawling API handle Best Buy's JavaScript content?
The Crawling API renders the page in a real browser before returning it, and the ajax_wait and page_wait options control how long it waits for asynchronous content. Setting ajax_wait to true waits for the asynchronous product grid to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering cards appear in the HTML you get back.
How do I avoid getting blocked while scraping Best Buy?
Keep your per-IP request rate low, add a delay between pages, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the pc_status values and back off when you start seeing challenges.
Can I track Best Buy price changes over time?
Yes. Run the scraper on a schedule, stamp each export with the date, and store the snapshots. Comparing successive runs shows which products changed price or moved in or out of stock, which is the basis for price monitoring and competitive analysis. To narrow a tracker to the exact SKUs you care about, filter each snapshot by SKU, or fetch those product pages with crawl() and parse them with selectors written for the product-page markup.
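Comparing two snapshots can be as simple as keying the rows by SKU and checking the fields you care about. The sketch below assumes rows shaped like the records this guide produces; diff_snapshots is a hypothetical helper, and the $699.99 price in the example is invented to demonstrate a change.

```python
def diff_snapshots(old_rows, new_rows):
    """Compare two scrapes keyed by SKU and report price or stock changes."""
    old_by_sku = {row["sku"]: row for row in old_rows if row.get("sku")}
    changes = []
    for row in new_rows:
        before = old_by_sku.get(row.get("sku"))
        if before is None:
            continue  # new listing, nothing to compare against
        for field in ("price", "availability"):
            if before.get(field) != row.get(field):
                changes.append({
                    "sku": row["sku"],
                    "field": field,
                    "old": before.get(field),
                    "new": row.get(field),
                })
    return changes


# Invented example data: the second snapshot shows a lower price.
yesterday = [{"sku": "6507555", "price": "$729.99", "availability": "Add to Cart"}]
today = [{"sku": "6507555", "price": "$699.99", "availability": "Add to Cart"}]
print(diff_snapshots(yesterday, today))
# [{'sku': '6507555', 'field': 'price', 'old': '$729.99', 'new': '$699.99'}]
```

Rows without a SKU are skipped, since there is no reliable key to match them on between runs.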
