How to Scrape Amazon Product Data


Every public Amazon product page is a dense, structured record: a title, a current price, a star rating, a review count, an availability line, and a gallery of images. That is exactly the data that powers price tracking, competitor monitoring, catalog enrichment, and market research. The problem is that fetching it at any volume is harder than it looks, because Amazon renders parts of the page in the browser and challenges automated traffic before it ever reaches the content.

This guide shows you how to scrape Amazon product data with JavaScript and Node.js the reliable way. You build a small, runnable scraper that fetches a public product page through the Crawling API, pulls the product title, price, rating, review count, availability, and images, and exports a clean JSON record. Two approaches are covered: the built-in auto-parser that returns structured fields directly, and a cheerio fallback that reads the same fields from raw HTML by CSS selector. The whole walkthrough stays scoped to public product data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public Amazon product URL, retrieves the page through the Crawling API, and produces a structured record for that product. We use the PHILIPS A4216 Wireless Sports Headphones page as the running example and extract these fields:

Title: the product name, for example "PHILIPS A4216 Wireless Sports Headphones".
Price: the current listed price as shown, like "$24.99".
Rating: the average customer review, for instance "4.3 out of 5 stars".
Review count: the number of ratings behind that average.
Availability: the stock line, such as "In Stock" or "Currently unavailable".
Images: the main image plus any additional gallery image URLs.

Why a plain request fails on Amazon

If you point a bare HTTP client at an Amazon product URL, you rarely get the clean page you see in a browser. Two things work against you. First, Amazon renders parts of the listing, including some pricing and gallery elements, with JavaScript, so the initial HTML can be incomplete until the page's scripts run. Second, Amazon flags automated traffic aggressively: datacenter IPs and request patterns that do not look like a real browser get served a CAPTCHA, a robot check, or an outright block long before you reach the product data.

So a working Amazon scraper needs two things in one request: a page that actually renders, and an IP that the platform reads as a genuine visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the product URL, it fetches the page behind a trusted, rotating IP, and it returns either finished HTML or, with one extra option, the product fields already parsed into JSON.
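One practical consequence: a 200 status alone does not prove you received the product page. As a rough sketch, a check like the one below can flag a probable challenge page. The helper name looksBlocked and the marker strings are assumptions made for illustration; tune them against the pages you actually receive.

```javascript
// Hypothetical helper: guess whether returned HTML is a challenge page
// rather than a product page. The marker strings are assumptions and will
// need adjusting against real responses.
const BLOCK_MARKERS = [
  'captcha',
  'robot check',
  'enter the characters you see below',
];

function looksBlocked(html) {
  // An empty or non-string body is treated as a failed fetch.
  if (typeof html !== 'string' || html.trim() === '') return true;

  const lower = html.toLowerCase();
  const hasMarker = BLOCK_MARKERS.some((marker) => lower.includes(marker));
  const hasTitle = lower.includes('id="producttitle"');

  // A genuine product page can mention "captcha" inside a script, so only
  // flag the page when a marker is present and the title element is missing.
  return hasMarker && !hasTitle;
}
```

This is a heuristic, not a detector: it can miss new challenge layouts, so treat a true result as a reason to back off and a false result as "probably fine".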

Two ways to parse

You have a choice once the page comes back. Pass the scraper option and Crawlbase runs its built-in Amazon parser server-side, handing you structured fields with no selectors to maintain. Omit it and you get raw HTML to parse yourself with cheerio. This guide shows both: the auto-parser first because it is the least brittle, then the manual path so you understand what is happening underneath.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs and any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, see our guide on how to build a web scraper with Node.js.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account docs page. You get up to 20,000 free requests with no card required, so you can run every example here on the free tier. Treat the token like a password: it authenticates your requests, so keep it out of version control.
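One way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named CRAWLBASE_TOKEN; that name is chosen for this example and is not something the client library requires.

```javascript
// Read the token from the environment instead of hard-coding it.
// CRAWLBASE_TOKEN is a name chosen for this sketch, not a required one.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set the CRAWLBASE_TOKEN environment variable before running the scraper.');
  }
  return token;
}
```

With this in place you would start the script as CRAWLBASE_TOKEN=your_token node scraper.js and pass getToken() wherever the examples below show 'YOUR_CRAWLBASE_TOKEN'.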

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash

node --version

mkdir amazon-scraper && cd amazon-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses returned HTML with a jQuery-style API so you can pull fields out by CSS selector. The auto-parser path needs only crawlbase; the manual fallback uses cheerio as well. If selectors are new to you, the primer on XPath and CSS selectors is a good companion.

Step 1: Fetch the product page with the auto-parser

Start with the least brittle approach. Import the CrawlingAPI class, initialize it with your token, and request the product URL with the scraper option set to amazon-product-details. That tells Crawlbase to parse the page server-side and return structured product fields as JSON instead of raw HTML.

javascript

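const { CrawlingAPI } = require('crawlbase');

// Replace the placeholder with your own token.
const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const productUrl = 'https://www.amazon.com/dp/B099MPWPRY';

api
  .get(productUrl, { scraper: 'amazon-product-details' })
  .then((response) => {
    if (response.statusCode !== 200) {
      // Report failures instead of silently printing nothing.
      console.error(`Request failed: ${response.statusCode}`);
      return;
    }
    // With the scraper option set, the body is a JSON string.
    const data = JSON.parse(response.body);
    console.log(data.body);
  })
  .catch((error) => console.error('Request error:', error));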

Save the file as scraper.js and run it with node scraper.js. Because the scraper option is set, the response body is JSON rather than HTML, so you parse it with JSON.parse and read the parsed product under data.body. The Crawlbase Amazon parser returns the product's name, price, currency, rating, stock information, image URLs, and more as named fields, so you skip writing and maintaining selectors entirely. Checking the status code before you parse, and logging it when it is not 200, keeps failures loud instead of silent.
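If you prefer the check-then-parse logic in one reusable place, a small helper along these lines is one option. It is a sketch: parseScraperResponse is a hypothetical name, and it assumes only the response shape described above, a statusCode plus a JSON string body whose body key holds the product.

```javascript
// Hypothetical helper: turn a Crawling API response into the parsed product,
// throwing a descriptive error instead of failing silently.
function parseScraperResponse(response) {
  if (!response || response.statusCode !== 200) {
    const status = response ? response.statusCode : 'no response';
    throw new Error(`Request failed with status ${status}`);
  }

  let data;
  try {
    data = JSON.parse(response.body);
  } catch (error) {
    // Typically means the body is HTML, e.g. the scraper option was left out.
    throw new Error(`Response body is not JSON: ${error.message}`);
  }

  if (!data || typeof data.body !== 'object' || data.body === null) {
    throw new Error('Parsed response has no "body" object');
  }
  return data.body;
}
```

Throwing here, rather than returning null, lets the caller decide in one catch block whether to retry, skip the product, or stop the run.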


Step 2: Pull the fields you care about

The auto-parser returns a rich object, but most jobs only need a handful of fields. Map the parsed response down to the title, price, rating, review count, availability, and images, falling back to null when a field is absent so one missing value never crashes the run.

javascript

function extractProduct(parsed) {
  return {
    title: parsed.name || null,
    price: parsed.price || null,
    rating: parsed.customerReview || null,
    reviewCount: parsed.customerReviewCount || null,
    availability: parsed.inStock ? 'In Stock' : 'Unavailable',
    mainImage: parsed.mainImage || null,
    images: parsed.images || [],
  };
}

The field names here, name, price, customerReview, mainImage, and the rest, are the keys the Amazon parser returns inside data.body. The availability line is derived from the parser's stock flag rather than copied verbatim, so you get a consistent string regardless of how Amazon phrased it on the page. One caveat: because the flag is read with a truthiness check, a response with no inStock key at all is also reported as 'Unavailable' rather than null. Keep this mapping in one small function: when you want a new field later, you add one line here instead of touching the fetch logic.
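For price tracking you usually want numbers rather than display strings. The sketch below adds numeric companions to the mapped record. It assumes US-style formatting such as "$1,299.99" and "4.3 out of 5 stars", as in this guide's examples; firstNumber and normalizeProduct are hypothetical helper names, and other locales (for example a comma as the decimal separator) would need different handling.

```javascript
// Hypothetical helper: pull the first number out of a display string.
// Assumes commas are thousands separators (US-style formatting).
function firstNumber(value) {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value !== 'string') return null;
  const match = value.replace(/,/g, '').match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Adds numeric fields next to the original strings so nothing is lost.
function normalizeProduct(product) {
  return {
    ...product,
    priceValue: firstNumber(product.price),
    ratingValue: firstNumber(product.rating),
    reviewCountValue: firstNumber(product.reviewCount),
  };
}
```

Keeping the original strings alongside the parsed values makes it easy to spot a parsing mistake later, since you can always compare the two.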

Step 3: Parse the raw HTML with cheerio (fallback)

Sometimes you want the raw page instead of the parsed object, either to grab a field the auto-parser does not expose or to understand exactly where each value lives. Drop the scraper option and the Crawling API returns the rendered HTML, which you load into cheerio and read by CSS selector. The selectors below match a standard product page at the time of writing.

javascript

const cheerio = require('cheerio');

function parseHtml(html) {
  const $ = cheerio.load(html);

  const images = [];
  $('#altImages img').each((_, el) => {
    const src = $(el).attr('src');
    if (src) images.push(src);
  });

  return {
    title: $('#productTitle').text().trim() || null,
    price: $('.a-price .a-offscreen').first().text().trim() || null,
    rating: $('#acrPopover').attr('title') || null,
    reviewCount: $('#acrCustomerReviewText').text().trim() || null,
    availability: $('#availability').text().trim() || null,
    mainImage: $('#landingImage').attr('src') || null,
    images,
  };
}

Each field maps to a real element on the page. The title sits in #productTitle; the visible price is the offscreen text inside the first .a-price block, which Amazon keeps as a clean, currency-formatted string; the rating lives in the title attribute of #acrPopover ("4.3 out of 5 stars"); the rating count is the text of #acrCustomerReviewText; the stock line is in #availability; and the gallery thumbnails are the img tags under #altImages, with the hero shot at #landingImage. Reading each field defensively, with a null fallback, keeps a missing element from breaking the whole parse.

Selectors drift

Amazon's element IDs and class names (#productTitle, .a-price, #acrPopover, and the rest) change between layouts, regions, and product categories. Treat the selectors above as a starting template, not a contract. When a field comes back null, re-inspect the live page in your browser's dev tools and update the selector. This is exactly the maintenance the auto-parser saves you, which is why it is the default path above.
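Two small helpers can soften the drift. The first tries a list of selectors in order, the second reports which fields came back empty so a broken selector shows up in your logs. Both are sketches with hypothetical names, and any alternate selectors you pass in are ones you would find by inspecting the live page, not values this guide can promise.

```javascript
// Hypothetical helper: try several selectors in order and return the first
// non-empty text. `$` is the function returned by cheerio.load(html).
function firstText($, selectors) {
  for (const selector of selectors) {
    const text = $(selector).first().text().trim();
    if (text) return text;
  }
  return null;
}

// Hypothetical helper: list the fields that are null, undefined, or an
// empty array, so a drifting selector does not pass unnoticed.
function missingFields(record) {
  return Object.keys(record).filter((key) => {
    const value = record[key];
    return value === null || value === undefined || (Array.isArray(value) && value.length === 0);
  });
}
```

Inside parseHtml you could then write title: firstText($, ['#productTitle', '#your-alternate-selector']), where the second entry is a placeholder, and log missingFields(record) after each parse.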

Step 4: Put it together and export JSON

Now wire the fetch, the extraction, and a JSON export into one runnable script. This version uses the auto-parser as the primary path and writes the final record to a file so you can feed it into a database, a comparison engine, or a price tracker.

javascript

const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

function extractProduct(parsed) {
  return {
    title: parsed.name || null,
    price: parsed.price || null,
    rating: parsed.customerReview || null,
    reviewCount: parsed.customerReviewCount || null,
    availability: parsed.inStock ? 'In Stock' : 'Unavailable',
    mainImage: parsed.mainImage || null,
    images: parsed.images || [],
  };
}

async function main() {
  const productUrl = 'https://www.amazon.com/dp/B099MPWPRY';
  const response = await api.get(productUrl, { scraper: 'amazon-product-details' });

  if (response.statusCode !== 200) {
    console.error(`Request failed: ${response.statusCode}`);
    return;
  }

  const parsed = JSON.parse(response.body).body;
  const product = extractProduct(parsed);

  fs.writeFileSync('product.json', JSON.stringify(product, null, 2));
  console.log('Saved product.json');
  console.log(product);
}

main().catch((error) => console.error('Error:', error));

Run the full script with node scraper.js. It fetches the page, maps the parsed fields with extractProduct, writes a tidy product.json, and logs the record to the console. To use the cheerio fallback instead, drop the scraper option from the api.get call and pass response.body straight to the parseHtml function from Step 3.
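If you want both paths in one run, the control flow can be sketched as below: try the auto-parser, and fall back to raw HTML only when it fails or returns no title. The function scrapeWithFallback and its injected parameters are hypothetical; in practice fetchPage would wrap api.get, and extractProduct and parseHtml are the functions from Steps 2 and 3.

```javascript
// Sketch: auto-parser first, cheerio fallback second. Dependencies are
// injected so the control flow can be read and tested on its own:
//   fetchPage(url, options) -> resolves to { statusCode, body }
//   extractProduct(parsed)  -> the Step 2 mapping
//   parseHtml(html)         -> the Step 3 cheerio parser
async function scrapeWithFallback(url, { fetchPage, extractProduct, parseHtml }) {
  const parsedResponse = await fetchPage(url, { scraper: 'amazon-product-details' });
  if (parsedResponse.statusCode === 200) {
    try {
      const product = extractProduct(JSON.parse(parsedResponse.body).body);
      if (product.title) return { source: 'auto-parser', product };
    } catch (error) {
      // Unparseable or unexpected body: fall through to the raw HTML path.
    }
  }

  // Second request, this time without the scraper option.
  const rawResponse = await fetchPage(url, {});
  if (rawResponse.statusCode !== 200) {
    throw new Error(`Both paths failed, last status: ${rawResponse.statusCode}`);
  }
  return { source: 'cheerio', product: parseHtml(rawResponse.body) };
}
```

A call would look like scrapeWithFallback(productUrl, { fetchPage: (u, o) => api.get(u, o), extractProduct, parseHtml }). Note that the fallback issues a second request for the same product, so it is worth logging how often it triggers.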

What the output looks like

The exported product.json is a single clean record you can store, compare against a previous run, or load into a tracker.

json

{
  "title": "PHILIPS A4216 Wireless Sports Headphones",
  "price": "$24.99",
  "rating": "4.3 out of 5 stars",
  "reviewCount": "2,184 ratings",
  "availability": "In Stock",
  "mainImage": "https://m.media-amazon.com/images/I/61abc123.jpg",
  "images": [
    "https://m.media-amazon.com/images/I/41def456.jpg",
    "https://m.media-amazon.com/images/I/51ghi789.jpg"
  ]
}
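Comparing a record against a previous run is a short function. The sketch below reports which tracked fields changed; diffRecords and its default field list are assumptions made for illustration, and it compares the display strings exactly as stored.

```javascript
// Hypothetical helper: report which fields differ between two runs.
// Compares values with strict equality, so "$24.99" vs "$24.99 " counts as a change.
function diffRecords(previous, current, fields = ['price', 'rating', 'reviewCount', 'availability']) {
  const changes = {};
  for (const field of fields) {
    if (previous[field] !== current[field]) {
      changes[field] = { from: previous[field], to: current[field] };
    }
  }
  return changes;
}
```

You would load yesterday's product.json with JSON.parse(fs.readFileSync(...)), pass it in as previous, and alert only when the returned object has keys.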

Scale to many products

One product is a demo; a real job runs a list. Because every standard product page shares the same parser and the same selectors, you collect a set of product URLs (or Amazon ASINs, which map directly to /dp/<ASIN> URLs) and loop over them, reusing the exact extractProduct logic for each.

javascript

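// Assumes `api` and `extractProduct` from Step 4 are in scope.
async function scrapeMany(asins) {
  const records = [];
  for (const asin of asins) {
    const url = `https://www.amazon.com/dp/${asin}`;
    try {
      const response = await api.get(url, { scraper: 'amazon-product-details' });
      if (response.statusCode === 200) {
        const parsed = JSON.parse(response.body).body;
        records.push(extractProduct(parsed));
      } else {
        console.error(`Skipping ${asin}: status ${response.statusCode}`);
      }
    } catch (error) {
      // One failed product should not abort the rest of the list.
      console.error(`Skipping ${asin}:`, error.message);
    }
  }
  return records;
}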

scrapeMany(['B099MPWPRY', 'B08PZHYWJS']).then((rows) => {
  console.log(`Collected ${rows.length} products`);
});

To discover the URLs in the first place, scrape an Amazon search or category page for product links and feed those into this loop. That search-page step is its own topic, covered in our guide on how to scrape Amazon search pages with the Crawling API. If you would rather skip selectors and pricing logic entirely, our walkthrough on how to scrape prices from Amazon with AI shows a model-driven approach to the same data.
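Links collected from a search page tend to carry slugs and tracking parameters, so it helps to reduce them to bare ASINs before looping. The sketch below assumes a 10-character alphanumeric ASIN following /dp/ or /gp/product/ in the path; asinFromUrl and uniqueAsins are hypothetical helper names.

```javascript
// Hypothetical helper: pull the ASIN out of a product link.
// Assumes a 10-character alphanumeric ASIN after /dp/ or /gp/product/.
function asinFromUrl(url) {
  const match = String(url).match(/\/(?:dp|gp\/product)\/([A-Za-z0-9]{10})(?=[/?#]|$)/);
  return match ? match[1].toUpperCase() : null;
}

// Reduces a list of links to distinct ASINs, preserving first-seen order.
function uniqueAsins(urls) {
  const seen = new Set();
  for (const url of urls) {
    const asin = asinFromUrl(url);
    if (asin) seen.add(asin);
  }
  return [...seen];
}
```

The result of uniqueAsins(links) can go straight into scrapeMany, which also avoids spending requests on the same product twice.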

Staying unblocked

The Crawling API handles rendering and IP rotation for you, but a few habits keep large runs healthy, and they apply to any hard commercial target.

Pace your requests. Hammering Amazon in a tight loop is the fastest way to get throttled. Spread requests out and add a short delay between products rather than crawling at full speed.
Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API does this for you; if you build your own stack, the Smart AI Proxy gives you the same rotation as a drop-in endpoint.
Watch the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
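The pacing and back-off habits above can be sketched as a small retry wrapper. Everything here is illustrative: the helper names are hypothetical and the delay values are placeholders, not limits published by Amazon or Crawlbase.

```javascript
// Exponential back-off: attempt 0 -> baseMs, attempt 1 -> 2x, attempt 2 -> 4x,
// capped at maxMs. The defaults are placeholder values.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchOnce() until it returns a 200 response or attempts run out.
// `wait` is injectable so the pacing can be tested without real delays.
async function fetchWithBackoff(fetchOnce, { maxAttempts = 4, baseMs = 1000, wait = sleep } = {}) {
  let last;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    last = await fetchOnce();
    if (last.statusCode === 200) return last;
    // Wait longer after each failure, but not after the final attempt.
    if (attempt < maxAttempts - 1) await wait(backoffDelay(attempt, baseMs));
  }
  return last; // the caller decides what to do with a final non-200 response
}
```

In the scrapeMany loop you could call fetchWithBackoff(() => api.get(url, { scraper: 'amazon-product-details' })) and add a fixed await sleep(...) between products for baseline pacing.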

For the broader playbook, see how to scrape websites without getting blocked. Amazon is also a frequent target for wider ecommerce web scraping work, where this same fetch-then-extract pattern carries across other marketplaces.

Is it legal to scrape Amazon?

Whether scraping Amazon is allowed depends on Amazon's Conditions of Use, your jurisdiction, and what you do with the data. Amazon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. Collect only public product data: the title, price, rating, review count, availability, and images that anyone can see without an account. Respect Amazon's stated rate expectations and keep your request volume low enough that you are not straining its servers. Do not scrape anything behind a login, and do not collect personal data about reviewers beyond the public review text already shown on the page. Copyrighted media, including product images, belongs to its owner: reference it, but do not redistribute it as your own. If you plan to reuse the data commercially, get permission or a data agreement rather than assuming silence is consent.

For sanctioned, large-scale access, Amazon offers official channels such as the Product Advertising API for affiliates and the Selling Partner API for sellers, and those are the right tools when you need guaranteed structure, volume, or commercial rights. This guide is deliberately scoped to public product pages because that is the line that keeps the work defensible. It does not cover anything behind a sign-in, buyer or seller account data, or any attempt to bypass authentication. If your project needs more than public product data, Amazon's official APIs or a licensing agreement are the correct path, not a cleverer scraper.

Recap

Key takeaways

Plain requests get blocked. Amazon renders parts of the page client-side and challenges automated traffic, so you need rendering and a trusted IP together, which the Crawling API provides in one call.
The auto-parser is the least brittle path. Passing scraper: 'amazon-product-details' returns title, price, rating, stock, and images as structured JSON with no selectors to maintain.
cheerio is the fallback. Drop the scraper option to get raw HTML, then read #productTitle, .a-price, #acrPopover, #availability, and the image elements yourself when you need a field the parser does not expose.
Scale by looping URLs or ASINs. The same extractProduct function runs over a list, and a search-page scraper feeds it the URLs to begin with.
Stay on public data. Respect Amazon's Conditions of Use and robots.txt, prefer the official Product Advertising or Selling Partner API for volume or commercial use, and never touch logins or personal reviewer data.

Frequently Asked Questions (FAQs)

Why does a plain request fail on Amazon?

Amazon renders parts of the listing with JavaScript and aggressively flags automated traffic, so a bare HTTP client often gets an incomplete page, a CAPTCHA, or a block instead of the product data. To get a complete page reliably you need it rendered and fetched behind a trusted, rotating IP, which is what the Crawling API handles for you.

Should I use the auto-parser or cheerio?

Use the auto-parser (scraper: 'amazon-product-details') by default: it returns structured fields and there are no selectors to maintain when Amazon changes its layout. Reach for the cheerio fallback only when you need a specific field the parser does not expose, or when you want to see exactly where a value lives in the raw HTML.

What product fields can I extract?

From a standard public product page you can pull the title, current price, average rating, review count, availability, and the main plus gallery image URLs. The auto-parser also returns extras like currency, seller name, and parent ASIN. You can collect all of it in structured JSON or CSV for price tracking, comparison engines, or competitive research.

Why do my selectors keep returning null?

That usually means Amazon's markup changed. Its element IDs and class names (#productTitle, .a-price, #acrPopover, and the rest) differ between layouts, regions, and categories, so a selector that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selector, or switch to the auto-parser, which absorbs those changes for you.

How do I avoid getting blocked while scraping Amazon?

Keep your per-IP request rate low, add a delay between products, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.

Can I scrape reviewer or seller personal data?

No, and this guide does not cover it. Stick to the public product fields shown on the page. Account data, anything behind a sign-in, and personal details about reviewers or sellers beyond the public review text are out of scope and run against Amazon's terms. For sanctioned access, the correct route is Amazon's official API or a data agreement.

Hamza Ikhlaq

Software Developer · Crawlbase

Software developer at Crawlbase writing hands-on guides on scraping target sites, proxies, and the Crawling API.
