Amazon's Best Sellers pages are a live, public ranking of what is actually selling across the catalog. Every category, from Electronics to Computers and Accessories, gets its own ordered list of the top products, refreshed hourly by Amazon, with each item's rank, title, price, and rating sitting right there in the page. That ranking is one of the cleanest demand signals on the open web, which is why product researchers, sellers, and analysts track it for trend spotting, competitive analysis, and pricing decisions.
This guide shows you how to scrape an Amazon Best Sellers list with Python. You build a small, runnable scraper that fetches a category's Best Sellers page through the Crawling API, parses a clean record for each ranked product, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public ranking data: the names, prices, ratings, and links anyone can see on a Best Sellers page without logging in.
What you will build
A Python script that takes an Amazon Best Sellers category URL, retrieves the rendered page through the Crawling API, and extracts a structured record per ranked product. The running example is the Best Sellers in Computers and Accessories page, and the script pulls these fields from each ranked card:
- Rank: the product's position in the list, for example 1 for the top seller.
- Title: the product name shown on the ranking card.
- Price: the listed price, when the product shows one.
- Rating: the average star rating, for example "4.7 out of 5 stars".
- Link: the URL to the product's own detail page.
Why a plain request fails on Amazon
If you point a bare HTTP client at an Amazon Best Sellers URL, you rarely get the ranked list you came for. Two things work against you. First, much of the ranking grid renders client-side: Amazon ships a lightweight shell and fills the cards in as the page's JavaScript runs and as you scroll, so the initial HTML is often missing the lower-ranked items. Second, Amazon flags automated traffic fast. Datacenter IP ranges and request patterns that do not look like a real browser get met with a CAPTCHA, a "robot check" interstitial, or an outright block before you ever reach the list.
So a working Best Sellers scraper needs two things in one request: a browser that renders the page, and an IP that Amazon reads as a real shopper. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the category URL, it renders the page behind a trusted residential IP, handles the rotation and CAPTCHA solving, and returns finished HTML for you to parse.
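One way to make that failure mode visible in your own code is to check returned HTML for signs of an interstitial before you parse it. The following is a minimal sketch, not a definitive detector: looks_blocked is a hypothetical helper name, and the marker strings are illustrative assumptions rather than a list Amazon publishes, so adjust them to what you actually observe.

```python
# Heuristic check for a block or CAPTCHA interstitial.
# BLOCK_MARKERS is an assumption for illustration; tune it to real responses.
BLOCK_MARKERS = (
    "robot check",
    "enter the characters you see below",
    "captcha",
)


def looks_blocked(html):
    """Return True when the HTML is empty or contains a known block marker."""
    if not html:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    print(looks_blocked("<title>Robot Check</title>"))
    print(looks_blocked("<div id='gridItemRoot'>ranking card</div>"))
```

Because this is plain substring matching, a legitimate page that happens to mention one of the markers would be flagged too. Treat a positive result as a prompt to inspect the response, not as proof of a block.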
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs or any beginner course covers the level this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.
A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token from the account docs page. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Treat the token like a password and keep it out of version control.
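To keep the token out of version control in practice, one common approach is to read it from an environment variable instead of hardcoding it. This is a sketch under assumptions: the variable name CRAWLBASE_TOKEN and the helper load_token are hypothetical choices for this guide, not something the crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_TOKEN"):
    """Read the API token from the environment and fail loudly if it is missing.

    The variable name is an assumption for this sketch; use any name you like.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

With this in place, the client setup used later in the guide could become CrawlingAPI({"token": load_token()}), and the script file stays safe to commit.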
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the ranking cards by CSS selector.
    python --version
    python -m venv amazon_env
    source amazon_env/bin/activate
    pip install crawlbase beautifulsoup4
On Windows, activate the environment with amazon_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:
touch amazon_best_sellers.py
Understanding the Best Sellers page
Each Amazon Best Sellers category lives at a stable /zgbs/ URL. The Computers and Accessories list, for example, is https://www.amazon.com/Best-Sellers-Computers-Accessories/zgbs/pc, and Electronics is https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics. The page lays out an ordered grid of ranking cards, one per product, each carrying the same handful of fields: a rank badge, a title, a price, a star rating, and a link into the product's detail page.
Before writing selectors, open a Best Sellers page in your browser, right-click a ranking card, and choose Inspect. Amazon wraps each ranked item in a container marked with an id="gridItemRoot" attribute, shows the rank in a numbered badge near the top of the card, and exposes the title, price, and rating inside that container. Those are the elements you target. Amazon's utility class names change often, so where you can, lean on the more durable structural attributes like id="gridItemRoot" rather than a brittle class chain.
Step 1: Fetch the rendered Best Sellers page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the category URL, and request it. Checking the status code before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


    def crawl(page_url):
        # Wait for async content, then hold 4 seconds so late cards can render.
        options = {"ajax_wait": "true", "page_wait": 4000}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            return response["body"].decode("latin1")
        # Surface the failure instead of returning an empty page silently.
        print(f"Request failed: {response['status_code']}")
        return None


    if __name__ == "__main__":
        bestsellers_url = "https://www.amazon.com/Best-Sellers-Computers-Accessories/zgbs/pc"
        html = crawl(bestsellers_url)
        # Print a short preview to confirm real ranking markup came back.
        print(html[:500] if html else "No HTML returned")
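The two wait options matter for a list that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering cards appear before the page is captured. Both are JavaScript-rendering options, so they take effect only when the request uses your JavaScript token rather than the normal one. The body is decoded as latin1 because that codec maps every byte to a character and so never raises, whereas strict UTF-8 decoding can fail on a stray byte. The trade-off is that multi-byte characters such as accented letters can come out garbled; if you see that in titles, decoding as UTF-8 with errors="replace" is the usual alternative. Run the script and you should see real ranking markup, not a robot-check shell. That confirms rendering works before you write a single selector.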
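The decoding choice is easier to judge on a small sample. The sketch below is an illustration only, and decode_body is a hypothetical helper, not part of the crawlbase client. It shows that latin1 never raises but splits multi-byte UTF-8 characters, while UTF-8 with errors="replace" keeps valid characters and substitutes only the bytes it cannot read.

```python
def decode_body(body):
    """Decode response bytes as UTF-8, replacing undecodable bytes instead of raising."""
    return body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "Café".encode("utf-8")
    # latin1 reads each byte as its own character, so the two-byte "é" splits in two.
    print(sample.decode("latin1"))
    # UTF-8 decoding keeps the character intact.
    print(decode_body(sample))
    # An invalid byte becomes U+FFFD instead of raising UnicodeDecodeError.
    print(decode_body(b"\xff") == "\ufffd")
```

If you switch decoders, do it in every place the script decodes a response body so the fetch and the built-in scraper path behave the same way.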
The Best Sellers list needs a rendered page behind a trusted IP, in one call. The Crawling API takes your token, runs the category page in a real browser, rotates through residential IPs server-side, and handles the CAPTCHA solving, then hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Point it at a /zgbs/ category on the free 1,000-request tier first.
Step 2: Parse the ranking cards with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup, find every ranking card, and pull each field by its selector. Each ranked item sits in a div#gridItemRoot container, with the rank in a numbered badge and the title, price, and rating inside the card. Read the product link from the card's anchor. Wrap each card's extraction in a try/except so one malformed listing does not crash the run.
    from bs4 import BeautifulSoup

    BASE = "https://www.amazon.com"


    def text_of(card, selector):
        # Return the stripped text of the first match, or None when it is absent.
        el = card.select_one(selector)
        return el.get_text(strip=True) if el else None


    def parse_link(card):
        # Amazon often serves relative hrefs, so normalize to an absolute URL.
        a = card.select_one("a.a-link-normal[href]")
        if not a:
            return None
        href = a["href"]
        return href if href.startswith("http") else BASE + href


    def scrape_best_sellers(html):
        soup = BeautifulSoup(html, "html.parser")
        cards = soup.select("div#gridItemRoot")
        results = []
        for card in cards:
            try:
                rank = text_of(card, "span.zg-bdg-text")
                results.append({
                    "rank": rank.lstrip("#") if rank else None,
                    "title": text_of(card, "div._cDEzb_p13n-sc-css-line-clamp-3_g3dy1"),
                    "price": text_of(card, "span._cDEzb_p13n-sc-price_3mJ9Z"),
                    "rating": text_of(card, "span.a-icon-alt"),
                    "link": parse_link(card),
                })
            except Exception as e:
                # Skip the malformed card and keep parsing the rest.
                print(f"Skipped a card: {e}")
        return results
The text_of helper queries one element inside a card and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every ranked item shows a price. The rank comes from the numbered badge (span.zg-bdg-text), with the leading # stripped so you store a clean number. The rating string lives in the hidden span.a-icon-alt that holds the "4.7 out of 5 stars" text Amazon renders behind the star graphic, and the link is normalized to an absolute URL since Amazon often serves a relative href.
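The parser stores every field as display text. If you plan to sort or compare values, a small conversion step can turn them into numbers. This is a sketch under assumptions: the helper names are hypothetical, and the patterns assume US-style strings like "$1,299.99" and "4.7 out of 5 stars"; other marketplaces may format prices differently.

```python
import re


def to_int(text):
    """'#12' or '12' -> 12; None when no digits are present."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def to_price(text):
    """'$1,299.99' -> 1299.99; None when the card shows no price.

    For a range such as '$6.99 - $12.99' this keeps only the first amount.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def to_rating(text):
    """'4.7 out of 5 stars' -> 4.7; None when the rating is missing."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of", text or "")
    return float(match.group(1)) if match else None


def clean(row):
    """Return a copy of a scraped row with numeric rank, price, and rating."""
    return {
        **row,
        "rank": to_int(row.get("rank")),
        "price": to_price(row.get("price")),
        "rating": to_rating(row.get("rating")),
    }
```

Keeping the raw strings in one export and the cleaned values in another is a reasonable design choice, since the raw text is what you need when a pattern stops matching.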
Amazon's p13n-sc utility class names (the title clamp, the price span) are generated and change without notice, while structural markers like div#gridItemRoot and span.zg-bdg-text are more durable. Treat the class-based selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect the live Best Sellers page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
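The "None for every card" signal described above can be checked automatically after each run. A minimal sketch, where missing_fields is a hypothetical helper and the default field names mirror the record shape this guide uses:

```python
def missing_fields(rows, fields=("rank", "title", "price", "rating", "link")):
    """Return the fields that are None or empty in every row.

    A field that is empty across the whole list suggests its selector has drifted.
    An empty row list reports every field, since nothing was extracted at all.
    """
    if not rows:
        return list(fields)
    return [field for field in fields if all(not row.get(field) for row in rows)]
```

Calling missing_fields(rows) right after parsing and logging a warning when it returns anything gives you an early alert without failing the whole job. A field that is only sometimes empty, such as price, will not be reported.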
Step 3: Assemble the script and export JSON and CSV
Now wire the fetch and the parse into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet. Fetch the rendered category page, hand it to the parser, and dump the structured rows.
    import csv
    import json

    from crawlbase import CrawlingAPI
    from bs4 import BeautifulSoup

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
    BASE = "https://www.amazon.com"
    # One field list drives both the record keys and the CSV column order.
    FIELDS = ["rank", "title", "price", "rating", "link"]


    def crawl(page_url):
        # Wait for async content, then hold 4 seconds so late cards can render.
        options = {"ajax_wait": "true", "page_wait": 4000}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            return response["body"].decode("latin1")
        print(f"Request failed: {response['status_code']}")
        return None


    def text_of(card, selector):
        # Return the stripped text of the first match, or None when it is absent.
        el = card.select_one(selector)
        return el.get_text(strip=True) if el else None


    def parse_link(card):
        # Normalize relative hrefs to absolute URLs.
        a = card.select_one("a.a-link-normal[href]")
        if not a:
            return None
        href = a["href"]
        return href if href.startswith("http") else BASE + href


    def scrape_best_sellers(html):
        soup = BeautifulSoup(html, "html.parser")
        cards = soup.select("div#gridItemRoot")
        results = []
        for card in cards:
            try:
                rank = text_of(card, "span.zg-bdg-text")
                results.append({
                    "rank": rank.lstrip("#") if rank else None,
                    "title": text_of(card, "div._cDEzb_p13n-sc-css-line-clamp-3_g3dy1"),
                    "price": text_of(card, "span._cDEzb_p13n-sc-price_3mJ9Z"),
                    "rating": text_of(card, "span.a-icon-alt"),
                    "link": parse_link(card),
                })
            except Exception as e:
                # Skip the malformed card and keep parsing the rest.
                print(f"Skipped a card: {e}")
        return results


    def export(rows, name="amazon_best_sellers"):
        # Write the same rows to JSON and CSV.
        with open(f"{name}.json", "w", encoding="utf-8") as f:
            json.dump(rows, f, indent=2, ensure_ascii=False)
        with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        print(f"Saved {len(rows)} products to {name}.json and {name}.csv")


    def main():
        url = "https://www.amazon.com/Best-Sellers-Computers-Accessories/zgbs/pc"
        html = crawl(url)
        if not html:
            return
        rows = scrape_best_sellers(html)
        export(rows)


    if __name__ == "__main__":
        main()
Run the full script with python amazon_best_sellers.py. It fetches the rendered category page, parses one row per ranked product, and writes both amazon_best_sellers.json and amazon_best_sellers.csv. The shared FIELDS list keeps the CSV column order in step with the dictionary keys, so the two exports never drift apart.
What the output looks like
You get a clean list of ranked records, in list order, ready to write to JSON, CSV, or a database.
    [
      {
        "rank": "1",
        "title": "Amazon Basics High-Speed HDMI Cable, 6 Feet",
        "price": "$6.99",
        "rating": "4.7 out of 5 stars",
        "link": "https://www.amazon.com/dp/B014I8SSD0"
      },
      {
        "rank": "2",
        "title": "Apple AirTag",
        "price": "$24.99",
        "rating": "4.7 out of 5 stars",
        "link": "https://www.amazon.com/dp/B0D54JZTHY"
      }
    ]
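If you want a stable identifier per product, the ASIN embedded in the link is a natural key, since titles and link suffixes can vary between runs. The sketch below assumes links follow the /dp/ shape shown in the sample output, followed by a ten-character alphanumeric ASIN; asin_from_link is a hypothetical helper name.

```python
import re

# Matches the 10-character ASIN that follows "/dp/" in a product link.
# Assumption: links use the /dp/<ASIN> shape seen in the sample output.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def asin_from_link(link):
    """Return the ASIN from a product URL, or None when the pattern is absent."""
    match = ASIN_PATTERN.search(link or "")
    return match.group(1) if match else None


if __name__ == "__main__":
    print(asin_from_link("https://www.amazon.com/dp/B014I8SSD0"))
```

Adding an "asin" key to each record makes later joins and comparisons simpler than matching on full URLs.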
Faster path: the built-in Best Sellers scraper
Writing your own selectors gives you full control, but it also means maintaining them as Amazon's markup shifts. If you would rather not, the Crawling API ships a built-in amazon-best-sellers scraper that returns the ranked list as structured JSON directly, no BeautifulSoup needed. You pass a scraper option and read the parsed body.
    import json

    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
    url = "https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics"

    # Ask the API to parse the page with its built-in Best Sellers scraper.
    response = api.get(url, {"scraper": "amazon-best-sellers"})

    if response.get("status_code") == 200:
        data = json.loads(response["body"].decode("latin1"))
        # The parsed ranking sits under the "body" key of the JSON response.
        result = data.get("body", {})
        with open("amazon_best_sellers.json", "w", encoding="utf-8") as f:
            json.dump(result, f, indent=4, ensure_ascii=False)
        print("Scraper response saved to amazon_best_sellers.json")
    else:
        print(f"Request failed: {response.get('status_code', 0)}")
The parsed response includes a pageTitle, a products array (each with title, price, customerReview, customerReviewCount, asin, image, url, and position), the list of sibling categories, and a pagination block. It is the quickest way to get a maintained ranking feed; the hand-rolled parser above is the right call when you need fields the built-in scraper does not return or want to learn the page structure. The Scraper API documented in auto-parse web data drives the same auto-parsing.
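If you use both paths, it can help to map the built-in scraper's products onto the same record shape the hand-rolled parser produces, so downstream code sees one format. This is a sketch under assumptions: the key names come from the field list above, but the value formats are not verified here, and normalize_product and normalize_products are hypothetical helpers.

```python
def normalize_product(product):
    """Map one built-in scraper product onto the rank/title/price/rating/link shape."""
    position = product.get("position")
    return {
        # Stored as a string to match the hand-rolled parser's output.
        "rank": str(position) if position is not None else None,
        "title": product.get("title"),
        "price": product.get("price"),
        "rating": product.get("customerReview"),
        "link": product.get("url"),
    }


def normalize_products(result):
    """Normalize every entry in the parsed body's products array."""
    return [normalize_product(product) for product in result.get("products", [])]
```

Passing the result dictionary from the built-in scraper call to normalize_products would give rows you can hand to the same JSON and CSV export.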
Scaling across categories and pages
One category is a demo; a real research job runs across many. Amazon exposes a Best Sellers list for nearly every department and subcategory, each at its own /zgbs/ URL, and each list paginates (typically the top 100 split over two pages of 50, with a ?pg=2 parameter for the second page). Walk a set of category URLs, and pace the requests so you are not hammering Amazon in a tight loop.
    import time

    CATEGORIES = {
        "computers": "https://www.amazon.com/Best-Sellers-Computers-Accessories/zgbs/pc",
        "electronics": "https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics",
    }


    def scrape_categories(categories):
        # Reuses crawl() and scrape_best_sellers() from the full script above.
        everything = {}
        for name, url in categories.items():
            rows = []
            for page in (1, 2):
                page_url = url if page == 1 else f"{url}?pg={page}"
                html = crawl(page_url)
                if not html:
                    break
                found = scrape_best_sellers(html)
                if not found:
                    # No cards parsed: the category has run out of pages.
                    break
                rows.extend(found)
                print(f"{name} page {page}: {len(found)} products")
                # Pause between requests to pace the run.
                time.sleep(2)
            everything[name] = rows
        return everything
The empty-results break stops you early when a category runs out of pages, and the time.sleep(2) between requests paces the run so you are not flagged for rapid-fire traffic. Keying the output by category name keeps each list separate, which is what you want when comparing rankings across departments. To track trends over time, run the job on a schedule and stamp each export with the date, then diff successive snapshots to see what moved up or down the list.
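The date-stamping and diffing idea can be sketched with two small helpers. Both names are hypothetical, the comparison key defaults to the link field from this guide's records, and the code assumes ranks are digit strings such as "1" (a non-numeric rank would raise ValueError).

```python
from datetime import date


def snapshot_name(category, day=None):
    """Build a date-stamped base name such as 'computers_2024-01-05'."""
    day = day or date.today()
    return f"{category}_{day.isoformat()}"


def rank_changes(previous, current, key="link"):
    """Compare two snapshots; a positive delta means the product moved up the list."""
    old = {
        row[key]: int(row["rank"])
        for row in previous
        if row.get(key) and row.get("rank")
    }
    changes = []
    for row in current:
        if not row.get(key) or not row.get("rank"):
            continue  # Cannot compare rows that lack a key or a rank.
        new_rank = int(row["rank"])
        old_rank = old.get(row[key])
        changes.append({
            "key": row[key],
            "rank": new_rank,
            "previous": old_rank,
            # None marks a product that was not in the earlier snapshot.
            "delta": None if old_rank is None else old_rank - new_rank,
        })
    return changes
```

The export function from Step 3 already accepts a base name, so export(rows, snapshot_name("computers")) would write one dated pair of files per run. Full links can differ between runs when they carry tracking suffixes, so a more stable key such as the ASIN is worth considering.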
Staying unblocked
Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Spread requests out with a delay between pages and categories rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Amazon's servers.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Retain only what you need. Store the ranking fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
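The pacing habit extends naturally to retries: when a request fails, wait longer before each new attempt instead of retrying immediately. A minimal sketch of exponential backoff with jitter follows. fetch_with_backoff is a hypothetical helper that accepts any fetch function returning a falsy value on failure, which matches how the crawl function in this guide returns None; the delay values are illustrative.

```python
import random
import time


def fetch_with_backoff(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns a truthy value, waiting longer after each failure.

    The sleep parameter exists so tests can substitute a recorder for time.sleep.
    """
    for attempt in range(attempts):
        result = fetch(url)
        if result:
            return result
        if attempt < attempts - 1:
            # Exponential delay (2s, 4s, 8s, ...) plus up to 1s of jitter
            # so that retries from parallel jobs do not line up.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

In the category loop, fetch_with_backoff(crawl, page_url) could stand in for the direct crawl call. Keep the attempt count small so a persistent block ends the run rather than producing a long stream of retries.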
For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. If you want to deepen the BeautifulSoup side, the guide on using BeautifulSoup in Python covers the parsing library in detail.
Is it legal to scrape Amazon?
Whether scraping Amazon is allowed depends on Amazon's Conditions of Use, your jurisdiction, and what you do with the data. Amazon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
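Reading robots.txt can be done programmatically with Python's standard urllib.robotparser module. The sketch below parses rules from a list of lines so it runs offline. The rules and the example.com URLs are hypothetical placeholders, not Amazon's actual file, and a robots.txt check is only one input alongside the Conditions of Use.

```python
from urllib.robotparser import RobotFileParser


def build_checker(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


# Hypothetical rules for illustration only; fetch the real file before relying on it.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

checker = build_checker(SAMPLE_RULES)

if __name__ == "__main__":
    print(checker.can_fetch("*", "https://www.example.com/private/page"))
    print(checker.can_fetch("*", "https://www.example.com/public/page"))
```

To check a live site, the same class can load the file over the network with set_url() followed by read(), after which can_fetch() applies the published rules to each URL you plan to request.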
A few lines worth holding to. Collect only public data: the ranks, product titles, prices, ratings, and listing links that anyone can see on a Best Sellers page without an account. Keep your request volume low enough that you are not straining Amazon's servers, and avoid personal data, including anything tied to identifiable shoppers, reviewers, or sellers beyond what is publicly listed. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public Best Sellers ranking pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. For licensed or bulk access, Amazon offers the Product Advertising API and other official programs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public ranking data, an official API or a data agreement is the correct path, not a cleverer scraper.
Key takeaways
- Best Sellers is a live demand signal. Each category's /zgbs/ page ranks the current top products, which is why it is so useful for product research and trend tracking.
- You need rendering and a trusted IP together. Amazon fills the ranking grid client-side and blocks bot traffic, so the Crawling API renders the page behind a residential IP in one call.
- BeautifulSoup does the extraction. Loop div#gridItemRoot cards and map rank, title, price, rating, and link to current selectors, and expect those selectors to drift.
- Export to JSON and CSV. A shared field list keeps both files in sync; the built-in amazon-best-sellers scraper is the lower-maintenance alternative when you want structured JSON directly.
- Stay on public data. Respect Amazon's Conditions of Use and robots.txt, prefer the official Product Advertising API for licensed or bulk data, and never touch accounts, orders, or personal information.
Frequently Asked Questions (FAQs)
Why does a plain request return no Best Sellers from Amazon?
Two reasons. Amazon fills much of the ranking grid client-side as the page loads and scrolls, so a raw request often gets a shell missing the lower-ranked items. On top of that, Amazon challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it.
How do I scrape Best Sellers for a specific category?
Every Best Sellers list has its own stable /zgbs/ URL, for example /Best-Sellers-Computers-Accessories/zgbs/pc for Computers and Accessories or /Best-Sellers-Electronics/zgbs/electronics for Electronics. Point the scraper at the category URL you want. To cover many categories, keep a map of names to URLs and loop over it, pacing the requests with a short delay.
Can I scrape Best Sellers for any product category?
You can scrape most categories, since Amazon publishes Best Sellers for nearly every department and many subcategories, each with its own list. The page layout is consistent across categories, so the same selectors usually carry over. Keep volume reasonable and respect Amazon's terms and rate expectations when you fan out across many categories.
What is the difference between the BeautifulSoup parser and the built-in scraper?
The BeautifulSoup parser gives you full control over which fields you extract and how, at the cost of maintaining selectors as Amazon's markup changes. The built-in amazon-best-sellers scraper returns the ranked list as structured JSON directly, including ASIN, image, and position, with no selectors to maintain. Use the built-in scraper for speed and the hand-rolled parser when you need a field it does not return.
How can I track Best Sellers trends over time?
Run the scraper on a schedule, stamp each export with the date, and store the snapshots. Comparing successive runs shows which products climbed or fell in the ranking and how prices shifted. That time series is what turns a one-off list into trend data you can use for product research, pricing, and competitive analysis.
How do I avoid getting blocked while scraping Amazon?
Keep your per-IP request rate low, add a delay between pages and categories, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
