Amazon's search results are one of the densest public datasets in retail. Every query returns a ranked grid of listings with titles, prices, star ratings, review counts, ASINs, and links into each product page. That data feeds competitor price tracking, catalog research, demand signals, and assortment analysis. The catch is that Amazon renders those listings with JavaScript and defends hard against automated traffic, so a plain HTTP request hands you a near-empty shell instead of the products you came for.

This guide shows you how to scrape Amazon search pages with Python the reliable way. You build a small, runnable scraper that fetches a rendered search results page through the Crawling API, parses each result card into a clean record, handles pagination across result pages, and exports the data to JSON and CSV. The whole walkthrough stays scoped to public search and listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes an Amazon search query, retrieves the rendered results page through the Crawling API, and extracts a structured record per product. We will use the search term "games" as the running example and pull these fields from each result card:

  • Title: the product name as it appears on the search card.
  • Price: the listed price shown on the card.
  • Rating: the average customer review, for example "4.5 out of 5 stars".
  • ASIN: Amazon's ten-character product identifier for each listing.
  • Link: the URL to the product's own detail page.

This post is scoped to the search results page (the SERP). If you want to drill into a single item once you have its link or ASIN, the companion guides on scraping Amazon product data and scraping by ASIN pick up where this one leaves off.

Why a plain request fails on Amazon

If you request an Amazon search URL with a bare HTTP client, you get a response with status 200 and almost none of the product data in the body. Two things work against you. First, Amazon loads much of its search grid client-side: the initial HTML is a shell, and the results, filters, and prices fill in only after the page's JavaScript and Ajax calls run in a browser. Second, Amazon flags automated traffic quickly. Datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA, rate-limited, or blocked before they ever reach the rendered listings.

So a working Amazon scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished data for you to parse.

Why the JS token

Crawlbase offers two token types. The normal token (TCP) fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon's search grid is JavaScript-driven, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would, and there is nothing useful to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the guide to web scraping with Python covers the fundamentals this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up for a free account, open your dashboard, and copy your JavaScript (JS) token. The free tier includes 1,000 requests with no card required, and you only pay for successful requests after that. Treat the token like a password: it authenticates your requests, so keep it out of version control.
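
One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below is only an illustration: the variable name CRAWLBASE_JS_TOKEN and the load_token helper are assumptions made for this example, not part of the Crawlbase client.

```python
import os

# Hypothetical variable name chosen for this example; use whatever fits your setup.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"

def load_token(env=None):
    """Return the JS token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running the scraper")
    return token
```

With that in place, the client could be created as CrawlingAPI({"token": load_token()}), and the token itself never lands in a commit.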

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.

bash
python --version

python -m venv amazon_env
source amazon_env/bin/activate

pip install crawlbase pandas

On Windows, activate the environment with amazon_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official, dependency-free client for the Crawling API, and pandas organizes the scraped records and writes them out to CSV. Because we use the Crawling API's built-in amazon-serp scraper, the parsed fields come back as structured JSON, so you do not need a separate HTML parser like BeautifulSoup for this post.

Understanding the Amazon search page

Amazon exposes search through a simple URL pattern. The query goes in the k parameter:

bash
https://www.amazon.com/s?k=games

Each search page lays out a grid of product cards, one per listing. A card carries a title, a price, a star rating with a review count, an image, a link into the detail page, and the product's ASIN. Amazon also seeds the top and middle of results with sponsored listings, which the parser flags separately so you can keep or drop them. Below the grid sit pagination controls that walk you through more result pages for the same query. The data is there, but it loads dynamically, which is exactly why the rendering step in the next section matters.
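
Queries with spaces or symbols need URL encoding before they go into the k parameter. As a minimal sketch, assuming the k and page parameters described in this post, a small helper can build the URL with the standard library. The build_search_url name is made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.com/s"

def build_search_url(query, page=1):
    """Build a search URL, encoding the query and adding page only past page one."""
    params = {"k": query}
    if page > 1:
        params["page"] = page
    # urlencode turns spaces into "+" and escapes characters such as "&".
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling build_search_url("board games") yields a URL ending in ?k=board+games, and passing page=3 appends &page=3.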

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, build the search URL from a query, and request it with the wait options that let the dynamic content settle. Checking the status code before you parse keeps failures loud instead of silent.

python
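from urllib.parse import quote_plus

from crawlbase import CrawlingAPI

# Replace the placeholder with your JS token; keep the real value out of version control.
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    # Wait for Ajax calls, then hold 2,000 ms so the result grid can render.
    options = {"page_wait": 2000, "ajax_wait": "true"}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    query = "games"
    # quote_plus keeps multi-word queries such as "board games" valid in the URL.
    search_url = f"https://www.amazon.com/s?k={quote_plus(query)}"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")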

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous requests to finish before capturing the page, and page_wait holds for a fixed number of milliseconds after load so the late-rendering grid appears first. Two seconds is a reasonable start; raise it if cards come back missing. The body is decoded as latin1 because Amazon pages mix in characters that strict UTF-8 decoding can choke on. Run the script and you should see real product markup, not the empty shell a plain fetch returns. That confirms rendering works before you ask the API to parse anything.
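
The advice to raise page_wait when cards come back missing can be automated. The sketch below is one possible approach, not a Crawlbase feature. It takes any callable shaped like api.get and retries with longer waits until the HTML looks rendered. The check for a data-asin attribute is an assumption about Amazon's result-card markup, and both helper names are invented for this example.

```python
def has_result_cards(html):
    # Assumption: rendered result cards carry a data-asin attribute.
    return bool(html) and "data-asin=" in html

def fetch_until_rendered(fetch, url, waits=(2000, 4000, 8000)):
    """Call fetch(url, options) with a growing page_wait until cards appear.

    Returns (html, wait_ms) on success or (None, None) if every attempt fails.
    """
    for wait_ms in waits:
        options = {"page_wait": wait_ms, "ajax_wait": "true"}
        response = fetch(url, options)
        if response["status_code"] != 200:
            continue  # try again with the next, longer wait
        html = response["body"].decode("latin1")
        if has_result_cards(html):
            return html, wait_ms
    return None, None
```

You would call it as fetch_until_rendered(api.get, search_url). Each attempt is a separate API request, so keep the list of waits short.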

Step 2: Parse results with the amazon-serp scraper

You could pull the rendered HTML and write your own selectors, but the Crawling API ships built-in scrapers for common sites, including Amazon. Passing the scraper option tells the API to parse the page server-side and return structured JSON instead of raw HTML. The scraper built for search result pages is amazon-serp. It returns an array of products, each with fields like name, price, customer review, ASIN, image, and link, plus a pagination block.

Here is a trimmed shape of what one product entry from amazon-serp looks like, so you know which keys to read:

json
{
  "name": "Product Name",
  "price": "$19.99",
  "rawPrice": 19.99,
  "currency": "$",
  "offer": "Offer Details",
  "customerReview": "4.5 out of 5 stars",
  "customerReviewCount": "1,234",
  "asin": "B0XXXXXXXX",
  "image": "Product Image URL",
  "url": "Product URL",
  "isPrime": true,
  "sponsoredAd": false
}

Add the scraper option to the request and the response body becomes JSON. The parsed payload sits under the body key, with the products under products and a pagination block alongside it. The function below extracts the five fields we care about per product, mapping name to title and customerReview to rating and reading asin and url straight through. It also keeps the review count and the isPrime and sponsoredAd flags.

python
import json
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

options = {
    "page_wait": 2000,
    "ajax_wait": "true",
    "scraper": "amazon-serp",
}

def parse_products(scraper_result):
    products = scraper_result.get("products", [])
    results = []
    for product in products:
        results.append({
            "title": product.get("name", ""),
            "price": product.get("price", ""),
            "rating": product.get("customerReview", ""),
            "reviews": product.get("customerReviewCount", ""),
            "asin": product.get("asin", ""),
            "link": product.get("url", ""),
            "isPrime": product.get("isPrime", False),
            "sponsored": product.get("sponsoredAd", False),
        })
    return results

Every field is read with .get() and a default, so a listing missing a price or rating yields an empty string rather than crashing the run. Missing values are common, since not every result has reviews yet. We keep isPrime and sponsored in the record too: the sponsored flag lets you filter paid placements out of organic analysis later. Because the scraper does the extraction server-side, you are not chasing Amazon's CSS class names yourself, which is the part that breaks most often in a hand-rolled scraper.
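
The rating and review count arrive as display strings, which makes sorting and filtering awkward. As a rough sketch, assuming the formats in the sample entry above ("4.5 out of 5 stars" and "1,234"), the helpers below add numeric versions next to the original strings. The function names and the rating_value and review_count keys are made up for this example.

```python
import re

def parse_rating(text):
    """'4.5 out of 5 stars' -> 4.5; returns None when no number is present."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None

def parse_count(text):
    """'1,234' -> 1234; returns None for empty or non-numeric input."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None

def normalize(record):
    """Return a copy of the record with numeric rating and review count added."""
    cleaned = dict(record)
    cleaned["rating_value"] = parse_rating(record.get("rating", ""))
    cleaned["review_count"] = parse_count(record.get("reviews", ""))
    return cleaned
```

The original strings stay in place so nothing is lost if a value arrives in an unexpected format. An abbreviated count such as "1.2K" would need extra handling that this sketch does not attempt.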

Step 3: Put it together and export

Now wire the fetch and the parse into one runnable script. Request the search page with the scraper option, load the JSON body, pull the products, then write them to both JSON and CSV so the data is ready for a spreadsheet or a downstream job.

python
import json
from urllib.parse import quote_plus

import pandas as pd
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

# Render the page, wait for Ajax, and let amazon-serp parse it server-side.
options = {
    "page_wait": 2000,
    "ajax_wait": "true",
    "scraper": "amazon-serp",
}

def scrape_search(page_url):
    """Return (products, pagination) for one search results page."""
    response = api.get(page_url, options)
    if response["status_code"] != 200:
        print(f"Request failed: {response['status_code']}")
        return [], None
    payload = json.loads(response["body"].decode("latin1"))
    # The parsed result sits under "body"; fall back to an empty dict if it is absent.
    scraper_result = payload.get("body") or {}
    return parse_products(scraper_result), scraper_result.get("pagination")

def parse_products(scraper_result):
    """Map each amazon-serp product entry to the record shape used in this post."""
    products = scraper_result.get("products", [])
    results = []
    for product in products:
        results.append({
            "title": product.get("name", ""),
            "price": product.get("price", ""),
            "rating": product.get("customerReview", ""),
            "reviews": product.get("customerReviewCount", ""),
            "asin": product.get("asin", ""),
            "link": product.get("url", ""),
            "isPrime": product.get("isPrime", False),
            "sponsored": product.get("sponsoredAd", False),
        })
    return results

def main():
    query = "games"
    # quote_plus keeps multi-word queries valid in the URL.
    search_url = f"https://www.amazon.com/s?k={quote_plus(query)}"
    products, _ = scrape_search(search_url)
    print(f"Found {len(products)} products")

    # JSON for downstream scripts.
    with open("amazon_products.json", "w", encoding="utf-8") as f:
        json.dump(products, f, indent=2)

    # CSV for spreadsheets.
    pd.DataFrame(products).to_csv("amazon_products.csv", index=False)

if __name__ == "__main__":
    main()

The scrape_search helper returns two things: the parsed product list and the pagination block, which the next section uses to walk every page. Writing both JSON and CSV covers the two common downstream needs: JSON keeps the records easy to feed into another script, and the pandas to_csv call hands you a spreadsheet-ready table in one line.
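
If pandas is more than you need for the export alone, the standard library's csv module can write the same table. This is an optional alternative sketched under the assumption that records have the eight keys used in this post. The write_csv helper is invented for this example.

```python
import csv

FIELDS = ["title", "price", "rating", "reviews", "asin", "link", "isPrime", "sponsored"]

def write_csv(records, handle):
    """Write records to an open text handle, one row per product."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells instead of raising.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
```

Open the target file with newline="" and encoding="utf-8", as the csv module documentation recommends, then pass the handle and the product list to write_csv.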

What the output looks like

Save the complete script as scraper.py, run it with python scraper.py, and you get a clean list of records, one per result.

json
[
  {
    "title": "Catan Board Game (Base Game)",
    "price": "$43.99",
    "rating": "4.8 out of 5 stars",
    "reviews": "32,114",
    "asin": "B00U26V4VQ",
    "link": "https://www.amazon.com/dp/B00U26V4VQ",
    "isPrime": true,
    "sponsored": false
  },
  {
    "title": "Ticket to Ride Board Game",
    "price": "$34.99",
    "rating": "4.8 out of 5 stars",
    "reviews": "28,907",
    "asin": "B0001WN5LE",
    "link": "https://www.amazon.com/dp/B0001WN5LE",
    "isPrime": true,
    "sponsored": false
  }
]

The CSV version of the same data has one column per field (title, price, rating, reviews, asin, link, isPrime, sponsored) and one row per product, which is what you want for a quick scan in a spreadsheet or a price comparison sheet.

Handling pagination across result pages

One page is a demo; a real job runs across every page of results for a query. The amazon-serp scraper returns a pagination block with the current page, the next page, and the total page count:

json
"pagination": {
  "currentPage": 1,
  "nextPage": 2,
  "totalPages": 20
}

Amazon paginates search results with a page query parameter, the same one you see in the browser's address bar: ?k=games&page=1, ?k=games&page=2, and so on. Read totalPages from the first response, then loop from page two to the last page, collecting products as you go.

python
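import time
from urllib.parse import quote_plus

def scrape_all_pages(query, max_pages=5):
    # Page one carries the pagination block that tells us how far to go.
    base = f"https://www.amazon.com/s?k={quote_plus(query)}"
    all_results, pagination = scrape_search(base)
    if not pagination:
        return all_results

    # Treat a missing or null totalPages as a single page, then apply the cap.
    total_pages = min(int(pagination.get("totalPages") or 1), max_pages)
    for page in range(2, total_pages + 1):
        page_url = f"{base}&page={page}"
        products, _ = scrape_search(page_url)
        if not products:
            break  # stop on an empty page or a failed request
        all_results.extend(products)
        print(f"Page {page}: {len(products)} products")
        time.sleep(2)  # pause between pages
    return all_results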

The max_pages cap keeps a run bounded so a broad query like "games" does not spin through all twenty pages by accident, and the empty-results break stops you early when a query has only a few pages. The time.sleep(2) between pages paces requests so you are not hammering search in a tight loop, which is the fastest way to get throttled. Swap the single-page call in main for scrape_all_pages(query) and the same JSON and CSV export handles the combined dataset.
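
When results from several pages are merged, the same listing can show up more than once, for instance as a sponsored placement on one page and an organic result on another. That is a plausible scenario rather than a documented guarantee. A minimal sketch for cleaning the combined list, assuming the record shape used in this post, is shown below. The dedupe_by_asin name is made up for this example.

```python
def dedupe_by_asin(records, drop_sponsored=False):
    """Keep the first record per ASIN, optionally skipping sponsored listings."""
    seen = set()
    unique = []
    for record in records:
        if drop_sponsored and record.get("sponsored"):
            continue
        asin = record.get("asin")
        if not asin:
            unique.append(record)  # nothing to compare on, so keep it
            continue
        if asin in seen:
            continue
        seen.add(asin)
        unique.append(record)
    return unique
```

Running dedupe_by_asin(scrape_all_pages(query), drop_sponsored=True) before the export would leave one organic record per product.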

Choosing the scraper or rolling your own

The scraper option is what makes this short. Drop it and the API returns the full rendered HTML, which you would then parse yourself with a library like BeautifulSoup, targeting Amazon's result-card selectors by hand. That path gives you total control but moves the maintenance burden onto you: Amazon's CSS class names drift, so a hand-rolled parser needs periodic upkeep. The amazon-serp scraper absorbs that work, which is why it is the default approach here. If you do want the manual route for a custom field the scraper does not expose, the BeautifulSoup guide covers the parsing technique, and the auto-parse Crawling API exposes the same Amazon scrapers through a dedicated endpoint if you prefer to call it that way.

Staying unblocked

Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread requests out with a delay between pages and vary your queries instead of crawling one term at full speed. The time.sleep in the pagination loop is the floor, not the ceiling.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
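
Backing off on bad status codes can be sketched as a small wrapper around the request call. The version below is an illustration under stated assumptions: fetch is any callable shaped like api.get that returns a dict with a status_code key, and the retry count and delays are arbitrary starting points, not values recommended by Crawlbase or Amazon.

```python
import time

def fetch_with_backoff(fetch, url, options, retries=3, base_delay=2.0, sleep=time.sleep):
    """Retry a request with a doubling delay; return the response or None."""
    delay = base_delay
    for attempt in range(retries + 1):
        response = fetch(url, options)
        if response["status_code"] == 200:
            return response
        if attempt < retries:
            sleep(delay)  # wait before the next attempt
            delay *= 2    # back off further each time
    return None
```

The sleep parameter exists so the wrapper can be tested without waiting. In a real run you would leave it at its default and call fetch_with_backoff(api.get, page_url, options).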

For the broader playbook on keeping a scraper alive against defended sites, see how to scrape websites without getting blocked.

Legality and responsible use

Whether scraping Amazon is allowed depends on Amazon's terms of service, your jurisdiction, and what you do with the data. Amazon's Conditions of Use restrict automated access and data extraction, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.
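
Checking a URL against robots.txt rules can be done with Python's standard library. The sketch below parses robots.txt text you have already downloaded and asks whether a path is allowed. The sample rules and the example.com URLs are made up for illustration and are not Amazon's actual rules, and is_allowed is a name invented for this example.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Illustrative rules only; fetch and read the real file for any site you target.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

A robots.txt check covers only part of the picture. It does not replace reading the Conditions of Use.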

A few lines worth holding to. Collect only public data: the product titles, prices, ratings, review counts, ASINs, and listing links that anyone can see on a search results page without an account. Respect Amazon's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable shoppers, reviewers, or third-party sellers beyond what is publicly listed on a results page. Do not redistribute copyrighted media such as product images or review text as if it were your own. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

This guide is deliberately scoped to public search and listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, payment or checkout flows, or any attempt to bypass authentication. For licensed or bulk access, Amazon offers the Product Advertising API and other official programs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public listings, an official API or a data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Amazon search is JavaScript-rendered. A plain fetch returns an empty shell, so you must render the page before you can read any products.
  • The JS token does rendering and a trusted IP together. One Crawling API call with page_wait and ajax_wait waits for the dynamic grid and hands back finished output.
  • The amazon-serp scraper parses for you. Pass scraper and you get structured JSON with name, price, customerReview, asin, and url per product, so you skip writing selectors.
  • Paginate with the page parameter. Read totalPages from the pagination block, walk &page= with a cap and a delay, and merge the results.
  • Stay on public data. Respect Amazon's Conditions of Use and robots.txt, prefer the official Product Advertising API for licensed or bulk data, and never touch accounts, orders, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain request return no products from Amazon?

Because Amazon loads much of its search grid client-side with JavaScript and Ajax. The initial HTML is a shell that fills in only after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the listings blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Amazon?

The JS token. The normal token fetches static HTML, which on Amazon is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before parsing, so the product cards are present when the amazon-serp scraper extracts them.

What does the amazon-serp scraper return?

It returns a JSON object with a products array and a pagination block. Each product carries fields like name, price, rawPrice, currency, customerReview, customerReviewCount, asin, image, url, isPrime, and sponsoredAd. This post maps a handful of those to a clean title, price, rating, ASIN, and link record.

How do I scrape multiple pages of Amazon search results?

Read totalPages from the pagination block in the first response, then loop the search URL with an incrementing &page= parameter from page two onward. Scrape each page with the same function, cap the page count so a broad query stays bounded, and add a short delay between requests so you pace the run and do not get throttled.

Can I scrape order, account, or checkout data from Amazon?

No, and this guide does not cover it. Order history, account details, and checkout flows sit behind a login, so they are not public data. Scraping login-walled content, or bypassing authentication to reach it, is out of scope here and runs against Amazon's terms. For sanctioned access to richer data, the correct route is the official Product Advertising API or a partner agreement.

How do I avoid getting blocked while scraping Amazon?

Keep your per-IP request rate low, add a delay between pages, vary your queries instead of looping one term, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
