Scrape Amazon ASIN Data at Scale

Every product on Amazon carries a hidden primary key: its ASIN. For price monitoring, catalog research, or competitor analysis, the ASIN is the handle you build everything around, because it points at exactly one product page no matter how the listing's title or URL changes. The hard part is not the ASIN itself; it is getting Amazon's public product pages to load at all when you request them at scale.

This guide shows you how to scrape Amazon ASIN data in Python the reliable way. We cover what an ASIN is and how to find one, then build a small scraper that pulls public product data by ASIN and routes every request through the Crawlbase Smart AI Proxy so pages come back rendered instead of blocked. The whole walkthrough stays scoped to public product data, and the legality section near the end is worth reading before you point this at real volume.

What is an Amazon ASIN?

An ASIN, short for Amazon Standard Identification Number, is a unique 10-character alphanumeric code that Amazon assigns to every product in its catalog. It is the identifier the platform uses internally to tell one listing from another, and for most products it begins with B0. Books are the common exception: their ASIN is usually the ISBN-10.

The ASIN matters for scraping because it is stable. A product's title can be edited, its URL can carry tracking parameters, and its search position can move every hour, but the ASIN stays fixed for the life of the listing. That makes it the right key to anchor a dataset on: collect ASINs once and you can re-fetch each product's current price, availability, and rating on a schedule without re-discovering the listing every time.

How to find an ASIN

There are two easy ways to get a product's ASIN. The first is the URL. Open any product page and look at the path: the ASIN sits right after the /dp/ segment.

text

https://www.amazon.com/dp/B0B7CH8DMR
                            ^^^^^^^^^^
                            this is the ASIN

The same code appears in longer, SEO-style URLs too, again directly after /dp/:

text

https://www.amazon.com/OtterBox-COMMUTER-iPhone-Pro-ONLY/dp/B0B7CH8DMR/

The second way is the page body. Scroll to the "Product information" or "Additional information" table on most listings and the ASIN is listed as its own row. When you scrape the rendered HTML, the value also appears in an embedded field called currentAsin, which is a reliable place to read it programmatically.
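Both lookups can be sketched in a few lines of Python. This is a minimal sketch, not a definition of Amazon's format. The function names extract_asin_from_url and extract_current_asin are hypothetical. The pattern assumes a 10-character code of uppercase letters and digits. The exact shape of the embedded currentAsin field (a quoted key, a colon, a quoted value) is an assumption to check against the HTML you actually receive.

```python
import re
from typing import Optional

# Assumption: an ASIN is exactly 10 uppercase letters or digits.
ASIN_IN_URL = re.compile(r"/dp/([A-Z0-9]{10})(?=[/?#]|$)")

# Assumption: the page embeds the value as "currentAsin": "XXXXXXXXXX".
CURRENT_ASIN_FIELD = re.compile(r'"currentAsin"\s*:\s*"([A-Z0-9]{10})"')


def extract_asin_from_url(url: str) -> Optional[str]:
    """Return the code that follows /dp/ in a product URL, or None."""
    match = ASIN_IN_URL.search(url)
    return match.group(1) if match else None


def extract_current_asin(html: str) -> Optional[str]:
    """Return the embedded currentAsin value from page HTML, or None."""
    match = CURRENT_ASIN_FIELD.search(html)
    return match.group(1) if match else None


print(extract_asin_from_url(
    "https://www.amazon.com/OtterBox-COMMUTER-iPhone-Pro-ONLY/dp/B0B7CH8DMR/"
))
```

Returning None instead of raising keeps the helpers easy to use in a loop over mixed input, where some URLs will not be product pages at all.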

ASIN vs SKU

Do not confuse an ASIN with a SKU. An ASIN is Amazon's catalog identifier, the same for every seller offering that product. A SKU (Stock Keeping Unit) is a private code a seller assigns to track their own inventory, and it can differ from one seller to the next and across other sales channels. For cross-listing analysis you key on the ASIN, not the SKU.
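To make the distinction concrete, here is a small sketch that groups seller offers by ASIN. The seller names, SKUs, and prices are invented for illustration, and group_by_asin is a hypothetical helper. The point is only that two sellers' different SKUs collapse onto one ASIN.

```python
from collections import defaultdict

# Hypothetical offers: sellers, SKUs, and prices are made up for this example.
offers = [
    {"seller": "Seller A", "sku": "OTB-14PM-BLK", "asin": "B0B7CH8DMR", "price": 32.99},
    {"seller": "Seller B", "sku": "case_00417", "asin": "B0B7CH8DMR", "price": 31.49},
    {"seller": "Seller A", "sku": "OTB-13-BLU", "asin": "B09V3HN1KC", "price": 27.00},
]


def group_by_asin(rows):
    """Bucket offer rows by ASIN, ignoring each seller's private SKU."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["asin"]].append(row)
    return dict(grouped)


by_asin = group_by_asin(offers)

# Cheapest seller per product, comparable only because the key is the ASIN.
cheapest = {
    asin: min(rows, key=lambda r: r["price"])["seller"]
    for asin, rows in by_asin.items()
}
print(cheapest)
```

Grouping the same rows by sku would put every offer in its own bucket, which is why the SKU is the wrong key for cross-seller comparison.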

Why a plain request to Amazon fails

If you point a bare HTTP client at an Amazon product page, you will rarely get the page you expected. Amazon defends aggressively against automated traffic: datacenter IPs and request patterns that do not look like a real browser get a CAPTCHA, a "Sorry, something went wrong" interstitial, or an outright block long before you reach the product fields, and the blocks arrive faster the more you send from one IP.

So a working Amazon scraper needs two things on every request: an IP the platform reads as a genuine visitor, and request handling that gets past CAPTCHAs and bot challenges. You can assemble that yourself with a pool of rotating residential proxies plus your own anti-block logic, but keeping that stack healthy is most of the work. The Smart AI Proxy folds it into a single endpoint: point your normal HTTP client at it as a proxy, and it routes the request through Crawlbase's Crawling API behind a trusted IP, returning the rendered page.
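Whichever route you take, it helps to recognize a blocked response before you try to parse it. The check below is a rough heuristic, not a documented Amazon contract. The function name looks_blocked is hypothetical and the marker strings are assumptions based on the challenge pages described above. A real product page that happens to contain one of these strings would be flagged by mistake, so tune the list against your own responses.

```python
# Assumption: these lowercase substrings show up on challenge or error pages.
BLOCK_MARKERS = (
    "captcha",
    "sorry, something went wrong",
)


def looks_blocked(status_code: int, html: str, markers=BLOCK_MARKERS) -> bool:
    """Heuristically decide whether a response is a block rather than a product page."""
    if status_code != 200:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


print(looks_blocked(200, "<title>Sorry, something went wrong</title>"))
```

A check like this is most useful as a gate in front of the parser: a flagged response gets retried or logged instead of producing a row of empty fields.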

Why the Smart AI Proxy

The Smart AI Proxy is a drop-in proxy endpoint backed by the Crawling API. Anything that can send traffic through an HTTP proxy (curl, Python requests, a headless browser) can use it without an SDK. Behind the endpoint you get rotating residential IPs, CAPTCHA handling, and block avoidance, so the same code that fails against Amazon directly starts returning real pages.

A first request with curl

Create a free Crawlbase account and open the Smart AI Proxy section of your dashboard, where the connection details list your access token. That token is the only credential you need: the proxy authenticates on the username alone, so the password stays empty. Before writing any Python, confirm it works with a one-line curl that sends a product page request through the Smart AI Proxy and prints the rendered HTML.

bash

curl -x "http://YOUR_TOKEN@smartproxy.crawlbase.com:8012" -k \
  "https://www.amazon.com/dp/B0B7CH8DMR"

Two flags carry the request. -x sets the proxy, in the form http://TOKEN@host:port; replace YOUR_TOKEN with your access token. The -k flag (long form --insecure) lets curl connect without verifying the proxy's TLS certificate, which is required because the Smart AI Proxy terminates the connection to handle forwarding and block avoidance before it reaches Amazon. Sending requests to the Smart AI Proxy without -k will fail, so it is not optional. When it works, you get back HTML with the product content in it rather than a CAPTCHA page.

Scrape Amazon ASIN data with Python

curl proves the path; Python turns it into something you can automate. You need Python 3.8 or later, and the standard requests library is enough, because the Smart AI Proxy plugs in as an ordinary HTTP proxy with no special SDK required.

bash

python --version

python -m venv amazon_env
source amazon_env/bin/activate

pip install requests

On Windows, activate the environment with amazon_env\Scripts\activate instead of the source line. Now create amazon_asin_scraper.py. The script builds the proxy URL from your token, sends the request through it, and prints the status and the body. Note verify=False: it is the Python equivalent of curl's -k and is required for the Smart AI Proxy.

python

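import os
import requests

# Read the access token from the environment rather than hard-coding it.
token = os.environ["CRAWLBASE_TOKEN"]

# The token is the proxy username; the password stays empty.
proxy = f"http://{token}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy, "https": proxy}

asin = "B0B7CH8DMR"
url = f"https://www.amazon.com/dp/{asin}"

# verify=False mirrors curl's -k. The timeout (an arbitrary 90 seconds here)
# keeps a stalled request from hanging the script forever.
response = requests.get(url, proxies=proxies, verify=False, timeout=90)

print("Status:", response.status_code)
print(response.text[:500])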

Set your token first with export CRAWLBASE_TOKEN=your_token_here, then run python amazon_asin_scraper.py. A 200 status and the opening of the product HTML mean the page came back rendered. From here you could hand the HTML to a parser like BeautifulSoup and pull out the title, price, and currentAsin field, but there is a cleaner option that skips the parsing entirely.

Get structured JSON with autoparse

Parsing Amazon's HTML by hand is brittle: the markup is dense and the selectors drift. Because the Smart AI Proxy runs on top of the Crawling API, you can pass Crawling API parameters as a request header and let Crawlbase do the parsing for you. Send autoparse=true in a header named CrawlbaseAPI-Parameters and the proxy returns structured JSON for the product instead of raw HTML.

python

import os
import json
import requests

token = os.environ["CRAWLBASE_TOKEN"]
proxy = f"http://{token}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy, "https": proxy}

# Crawling API parameters travel in this request header.
headers = {"CrawlbaseAPI-Parameters": "autoparse=true"}

asin = "B0B7CH8DMR"
url = f"https://www.amazon.com/dp/{asin}"

# The timeout (an arbitrary 90 seconds) keeps a stalled request from hanging.
response = requests.get(url, proxies=proxies, headers=headers,
                        verify=False, timeout=90)
print("Status:", response.status_code)

# With autoparse=true the body is JSON rather than HTML.
data = json.loads(response.text)
print(json.dumps(data, indent=4))

Run it and the response is structured fields rather than a wall of HTML: name, price, availability, rating, the ASIN itself, and more, ready to store without writing a single selector. A trimmed sample looks like this:

json

{
  "name": "OtterBox Commuter Series Case for iPhone 14 Pro Max",
  "asin": "B0B7CH8DMR",
  "price": "$32.99",
  "availability": "In Stock",
  "rating": "4.6 out of 5 stars",
  "reviewsCount": 2841
}
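The fields in that sample arrive as display strings, so a pricing job will usually want to normalize them before storing. The sketch below works on the trimmed sample above. The helper names are hypothetical, the real response may carry different or additional keys, and the price parser assumes US-style formatting (comma as thousands separator, dot as decimal point), so it would misread a price such as "32,99".

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Pull the first number out of a price string such as "$32.99"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))


def parse_rating(text: str) -> Optional[float]:
    """Read the leading score from a string such as "4.6 out of 5 stars"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None


def normalize(product: dict) -> dict:
    """Map the display strings in a parsed product to typed values."""
    availability = (product.get("availability") or "").strip().lower()
    return {
        "asin": product.get("asin"),
        "name": product.get("name"),
        "price": parse_price(product.get("price") or ""),
        "in_stock": availability == "in stock",
        "rating": parse_rating(product.get("rating") or ""),
        "reviews_count": product.get("reviewsCount"),
    }


sample = {
    "name": "OtterBox Commuter Series Case for iPhone 14 Pro Max",
    "asin": "B0B7CH8DMR",
    "price": "$32.99",
    "availability": "In Stock",
    "rating": "4.6 out of 5 stars",
    "reviewsCount": 2841,
}
print(normalize(sample))
```

Decimal is used for the price so that repeated comparisons and sums do not pick up binary floating-point rounding.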

Target a specific country

Prices and availability on Amazon vary by region, so a price-research job often needs results as a shopper in a particular country would see them. Pass the country parameter with a two-letter code in the same CrawlbaseAPI-Parameters header and the request routes from that region. Combine it with autoparse by separating the parameters with an ampersand.

python

headers = {
    "CrawlbaseAPI-Parameters": "autoparse=true&country=GB"
}

response = requests.get(url, proxies=proxies, headers=headers, verify=False)

With country=GB the page comes back as a visitor in the United Kingdom would see it, down to the "Deliver to" location. Swap in any valid two-letter code, such as US, DE, or JP, to pull region-specific pricing for the same ASIN.
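Once you are combining several parameters, building the header value by hand gets error-prone. One option is to assemble it from keyword arguments with the standard library. The helper name crawlbase_headers is hypothetical, and whether the proxy expects percent-encoded values is an assumption; for simple values such as true or GB the encoding step changes nothing.

```python
from urllib.parse import urlencode

HEADER_NAME = "CrawlbaseAPI-Parameters"


def crawlbase_headers(**params: str) -> dict:
    """Join parameters as key=value pairs separated by ampersands."""
    return {HEADER_NAME: urlencode(params)}


headers = crawlbase_headers(autoparse="true", country="GB")
print(headers)
```

The resulting dict can be passed straight to requests.get as the headers argument, in place of the hand-written one.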

Scaling to many ASINs

One product is a demo; a real job runs over a list of ASINs. The shape stays the same: loop the codes, request each through the Smart AI Proxy with autoparse, and collect the rows. Pace the loop so you are not hammering Amazon; the proxy's IP rotation keeps any single address from tripping a rate limit.

python

import os
import json
import time
import requests

token = os.environ["CRAWLBASE_TOKEN"]
proxy = f"http://{token}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy, "https": proxy}
headers = {"CrawlbaseAPI-Parameters": "autoparse=true"}

asins = ["B0B7CH8DMR", "B09V3HN1KC", "B0BDHWDR12"]
results = []

for asin in asins:
    url = f"https://www.amazon.com/dp/{asin}"
    try:
        # The timeout (an arbitrary 90 seconds) stops one stalled request
        # from freezing the whole run.
        r = requests.get(url, proxies=proxies, headers=headers,
                         verify=False, timeout=90)
        r.raise_for_status()  # treat non-2xx responses as failures, not data
        results.append(json.loads(r.text))
    except (requests.RequestException, json.JSONDecodeError) as err:
        # One bad ASIN should not stop the job; log it and move on.
        print(f"Skipped {asin}: {err}")
    time.sleep(2)  # fixed pause between requests

with open("amazon_products.json", "w") as f:
    json.dump(results, f, indent=2)

print(f"Saved {len(results)} products")

This writes a tidy amazon_products.json you can load into a spreadsheet, a database, or a pricing model. Swap the hard-coded list for ASINs read from a file and you have a repeatable catalog job. For a broader treatment of avoiding blocks across any site, see how to scrape websites without getting blocked.
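Reading the ASINs from a file could look like the sketch below. The load_asins helper and the one-code-per-line file layout are assumptions for illustration: blank lines and lines starting with # are skipped, codes are upper-cased, duplicates are dropped while keeping first-seen order, and anything that is not 10 letters or digits is reported and ignored.

```python
import re
from pathlib import Path

# Assumption: an ASIN is exactly 10 uppercase letters or digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def load_asins(path) -> list:
    """Read one ASIN per line, skipping comments, blanks, and duplicates."""
    seen = set()
    asins = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        candidate = line.strip().upper()
        if not candidate or candidate.startswith("#"):
            continue
        if not ASIN_PATTERN.fullmatch(candidate):
            print(f"Ignoring malformed entry: {line!r}")
            continue
        if candidate not in seen:
            seen.add(candidate)
            asins.append(candidate)
    return asins
```

In the loop above, asins = load_asins("asins.txt") would then replace the hard-coded list, where asins.txt is whatever file your job maintains.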

The Scraper API alternative

The Smart AI Proxy with autoparse is the flexible path, because it works as a proxy for any client and any site. If Amazon is your main target and you would rather call a plain REST endpoint than configure a proxy, the Scraper API is the purpose-built alternative. It returns pre-parsed JSON for supported sites like Amazon across product, search, and other page types without setting up a proxy at all. The trade-off is scope: the Smart AI Proxy handles any URL through a familiar proxy interface, while the Scraper API gives the cleanest output for the specific sites it supports. For an Amazon-heavy pipeline, compare both against your own workload.

Is it legal to scrape Amazon?

Scraping public Amazon data sits in a legal gray area, and whether a given project is allowed depends on Amazon's terms of service, your jurisdiction, and what you do with the data. Amazon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Amazon's Conditions of Use and its robots.txt before you build, and treat the rate expectations there as a hard limit, not a suggestion.

A few lines worth holding to. Collect only public product data: titles, prices, availability, ratings, and review counts that anyone can see without an account. Respect robots.txt and keep request volume low enough that you are not straining anyone's servers. For commercial reuse, seek permission or an official data agreement rather than assuming silence is consent. And never collect personal data, including anything tied to individual customer accounts.
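Respecting robots.txt can be checked in code with Python's standard urllib.robotparser module. The sketch below parses an inline sample so it runs offline. The rules, the example.com host, and the my-bot user agent are hypothetical and are not Amazon's actual robots.txt. In a real job you would load the live file with the parser's set_url and read methods instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not Amazon's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /gp/cart
Allow: /dp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

for url in (
    "https://www.example.com/dp/B0B7CH8DMR",
    "https://www.example.com/gp/cart/view.html",
):
    print(url, parser.can_fetch("my-bot", url))
```

Calling can_fetch before each request, and skipping any URL it rejects, turns the robots.txt guideline into something the scraper enforces rather than something you have to remember.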

This guide is deliberately scoped to public product pages, because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, seller dashboards, or reviews tied to identifiable people, and it does not bypass authentication of any kind. If your project needs more than public product data, the right move is an official Amazon API or a data agreement, not a cleverer scraper.

Recap

Key takeaways

The ASIN is the key. It is a stable 10-character code (usually starting with B0) found after /dp/ in any product URL, so anchor your dataset on it rather than on titles or URLs.
Plain requests get blocked. Amazon challenges datacenter IPs and bot-shaped traffic fast, so you need a trusted IP and CAPTCHA handling on every request.
The Smart AI Proxy is a drop-in. Point any HTTP client at it as a proxy and your traffic routes through rotating residential IPs with block avoidance built in; remember -k in curl and verify=False in Python.
autoparse skips the parsing. Send autoparse=true in the CrawlbaseAPI-Parameters header to get structured JSON, and add country=XX for region-specific pricing.
The Scraper API is the Amazon-native option. For an Amazon-heavy job it returns pre-parsed JSON from a plain REST endpoint without configuring a proxy.
Stay on public data. Respect Amazon's ToS and robots.txt, pace your requests, and never touch accounts, personal data, or anything behind a login.

Frequently Asked Questions (FAQs)

What is an Amazon ASIN?

An ASIN (Amazon Standard Identification Number) is a unique 10-character alphanumeric code Amazon assigns to every product in its catalog. It is the platform's internal identifier for a listing, and for most products it begins with B0; books are the usual exception, where the ASIN is the ISBN-10. Because the code stays fixed for the life of the listing, it is the right key to anchor a scraped dataset on.

How do I find a product's ASIN?

The fastest way is the URL: the ASIN appears right after the /dp/ segment, for example B0B7CH8DMR in amazon.com/dp/B0B7CH8DMR. It also shows up in the "Product information" table on most listings, and when you scrape the page it is present in an embedded currentAsin field you can read programmatically.

Why do my requests to Amazon get blocked?

Amazon defends hard against automated traffic. Datacenter IPs and request patterns that do not look like a real browser get a CAPTCHA, an interstitial, or an outright block, and the blocks come faster the more you send from one IP. To get real pages you need a trusted IP and request handling that gets past challenges, which is what routing through the Smart AI Proxy provides.

What is the difference between an ASIN and a SKU?

An ASIN is Amazon's catalog identifier, the same for every seller offering a given product. A SKU (Stock Keeping Unit) is a private code a seller assigns to track their own inventory; it can differ from one seller to the next and is used across other sales channels too. For analysis that compares the same product across sellers, key on the ASIN, not the SKU.

Should I use the Smart AI Proxy or the Scraper API for Amazon?

Both work. The Smart AI Proxy is a drop-in proxy endpoint that fits any HTTP client and any site, and with autoparse=true it returns structured JSON for Amazon products. The Scraper API is purpose-built for supported sites like Amazon and returns pre-parsed JSON from a plain REST call without configuring a proxy. For an Amazon-heavy pipeline, compare both against your workload; the Scraper API often gives the cleanest output for that specific target.

Is it legal to scrape Amazon?

It depends on Amazon's terms of service, your jurisdiction, and your purpose, and Amazon's terms restrict automated access. Keep strictly to public product data (titles, prices, availability, and ratings), respect robots.txt and the rate expectations there, and never touch accounts, personal data, or anything behind a login. For commercial reuse, get permission or an official data agreement rather than relying on a scraper.

Hamza Ikhlaq

Software Developer · Crawlbase

Software developer at Crawlbase writing hands-on guides on scraping target sites, proxies, and the Crawling API.
