Amazon's search results page is one of the richest public datasets in ecommerce: every query returns a ranked grid of products with titles, prices, ratings, review counts, and a link back to each listing. Tracking how that grid shifts over time tells you what is selling, where a competitor sits for a keyword, and how prices move across a category. The data is public, but pulling it reliably from a script is the hard part, because Amazon renders much of the page with JavaScript and challenges automated traffic fast.

This guide shows you how to scrape the Amazon SERP with Next.js, the full-stack way. You build a small, runnable Next.js app whose server action calls the Crawlbase Crawling API to fetch a rendered Amazon search page, parses each product with Cheerio, and hands the results to a React grid. The Crawlbase token stays server-side the whole time, never exposed to the browser. We keep the walkthrough scoped to public search data, and the legality section near the end is not boilerplate, so read it before you point this at real volume. If you want the plain-script version instead, see how to scrape Amazon search pages with the Crawling API.

What you will build

A Next.js app (App Router) with a server action that takes a search keyword, fetches the rendered Amazon SERP through the Crawling API, parses it with Cheerio, and returns a structured record per product to a client component that renders the grid. We pull these fields per item:

  • Name: the product title as shown on the card, for example "Apple iPhone 15 Pro Max 256GB".
  • Price: the listed price as displayed, like "$1,199.00".
  • Image: the product thumbnail URL for rendering the card.
  • Rating: the star rating text when present, such as "4.7 out of 5 stars".
  • Reviews: the customer review count shown beside the rating.
  • URL: the absolute link to the individual product page.

Why a plain request fails on Amazon

If you request an Amazon search URL with a bare HTTP client, you rarely get the clean product grid you see in a browser. Two things work against you. First, Amazon renders prices, ratings, and parts of each result card in the browser with JavaScript, so the raw HTML can come back incomplete. Second, Amazon flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get a CAPTCHA, a "robot check" interstitial, or an outright block long before you reach the products.

So a working Amazon SERP scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. Doing this from a Next.js server action keeps your token off the client and your fetch on the server, which is exactly where it belongs.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon loads key result fields client-side, so the JS token gives you the most complete page here. Using the normal token can return a partial grid with prices or ratings missing, leaving you nothing reliable to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

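Node.js 18.17 or later. Next.js needs a recent Node runtime, and newer Next.js releases may raise this minimum, so check the installation requirements for the version that create-next-app@latest installs. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.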

Basic React and Next.js. You should be comfortable with React components, hooks, and running a Next.js app. If the App Router and server actions are new to you, the official Next.js docs cover both, and our guide on how to build a web scraper with Node.js is a good companion for the scraping half.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it server-side and out of version control. We will read it from an environment variable, never hardcode it in a client component.

Set up the project

Scaffold a Next.js app, then install the two libraries the scraper needs. When the create wizard asks, choose the App Router; the rest of the defaults are fine.

bash
node --version

npx create-next-app@latest amazon-serp-scraper
cd amazon-serp-scraper

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. If selectors are new to you, the primer on XPath and CSS selectors is a good companion. Now put your token in an env file at the project root so it stays server-side:

bash
# .env.local (never commit this file)
CRAWLBASE_JS_TOKEN=YOUR_CRAWLBASE_TOKEN

Because CRAWLBASE_JS_TOKEN has no NEXT_PUBLIC_ prefix, Next.js keeps it on the server and never bundles it into client JavaScript. That is the whole point of doing the fetch in a server action.
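
A small guard can make a missing token fail loudly instead of surfacing later as an opaque API error. The sketch below is only an illustration: requireEnv is a hypothetical helper name, not part of Next.js or the Crawlbase client, and it takes the environment object as a parameter so it is easy to exercise in isolation.

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// `env` defaults to process.env; passing a plain object makes it easy to test.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value.trim();
}

// Possible use in app/actions.js (assumption, adapt to your own setup):
// const api = new CrawlingAPI({ token: requireEnv('CRAWLBASE_JS_TOKEN') });
```

The error message names the variable but never prints its value, so the token does not end up in logs.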

Step 1: Fetch the rendered SERP in a server action

Create app/actions.js. The 'use server' directive at the top of the file marks every exported async function in it as a server action, and the module itself only ever runs on the server, where the token is safe. Import the CrawlingAPI client, read the token from the environment, and request the Amazon search URL built from the keyword. The fetch helper below is not exported, so it stays a private server-side function. Checking the status code before you parse keeps failures loud instead of silent.

javascript
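'use server';

// Official Crawlbase Node client.
const { CrawlingAPI } = require('crawlbase');

// The token comes from the server environment, so it never reaches the browser.
const api = new CrawlingAPI({ token: process.env.CRAWLBASE_JS_TOKEN });

// Fetch the rendered Amazon search page for a keyword; returns HTML or null.
async function fetchSerpHtml(keyword) {
  const query = encodeURIComponent(keyword.trim());
  const pageUrl = `https://www.amazon.com/s?k=${query}`;
  // Wait for async content, then hold 5 seconds so late elements can render.
  const options = { ajax_wait: 'true', page_wait: 5000 };
  const response = await api.get(pageUrl, options);
  if (response.statusCode === 200) {
    return response.body;
  }
  // Log the failure so it shows up in the server console.
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}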

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering elements appear before the page is captured. Five seconds is a reasonable start; raise it if prices or ratings come back empty. At this point you have rendered Amazon HTML in hand, fetched server-side with the token kept out of the browser. Log its length, or search it for a product title you expect, to confirm rendering works before you write a single selector.
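
One rough way to check the returned HTML before writing selectors is to count result-card markers in the raw string. This is a sketch, not part of the Crawlbase client: summarizeSerpHtml is a hypothetical helper, and it assumes result cards carry the data-component-type="s-search-result" attribute that the parser in Step 2 selects on.

```javascript
// Hypothetical helper: summarize raw SERP HTML without parsing it.
// Assumes each result card carries the marker attribute below.
function summarizeSerpHtml(html) {
  const marker = 'data-component-type="s-search-result"';
  const text = typeof html === 'string' ? html : '';
  // Splitting on the marker yields one more piece than there are occurrences.
  const cardCount = text.split(marker).length - 1;
  return {
    bytes: Buffer.byteLength(text, 'utf8'),
    cardCount,
    looksRendered: cardCount > 0,
  };
}

// Possible use inside fetchSerpHtml, before returning the body:
// console.log(summarizeSerpHtml(response.body));
```

A card count of zero on a 200 response is a hint that you received a challenge page or that the wait values need raising.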

Crawlbase Amazon Scraper

That api.get call handles the hard parts of scraping Amazon: it takes your JS token, runs the search page in a real browser so the prices and ratings render, rotates through residential IPs server-side so the request reads as a real visitor, and hands your server action finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Point it at a public search page on the free tier first.

Step 2: Parse each product with Cheerio

With rendered HTML in hand, load it into Cheerio and walk the result cards. Amazon lays each search result out in a repeating block tagged div[data-component-type="s-search-result"], so you select every card, then read name, price, image, rating, review count, and the product link from inside it. Reading each field defensively keeps one missing value from crashing the run. Add this function to the same app/actions.js file.

javascript
const cheerio = require('cheerio');

// Turn rendered SERP HTML into an array of product records.
function parseProducts(html) {
  const $ = cheerio.load(html);
  const products = [];

  // Each search result sits in its own card element.
  $('div[data-component-type="s-search-result"]').each((_, el) => {
    const card = $(el);
    const name = card.find('h2 span').first().text().trim();
    if (!name) return; // skip cards without a title

    const path = card.find('h2 a').attr('href');

    // Missing fields fall back to null instead of throwing.
    products.push({
      name,
      price: card.find('.a-price .a-offscreen').first().text().trim() || null,
      image: card.find('img.s-image').attr('src') || null,
      rating: card.find('.a-icon-alt').first().text().trim() || null,
      reviews: card.find('.a-size-base.s-underline-text').first().text().trim() || null,
      url: path ? `https://www.amazon.com${path}` : null,
    });
  });

  return products;
}

A couple of details keep this resilient. The price lives in a hidden .a-offscreen span inside the .a-price block, which is the cleanest single source for the displayed amount, so we read that rather than stitching the symbol and digits together. Each field falls back to null when the element is missing, which is common since sponsored cards and some listings omit a rating or a review count. The product link is a relative path, so we prefix it with the Amazon origin to get an absolute, clickable URL.
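
If a card ever carries an href that is already absolute, plain string prefixing would produce a broken link. A slightly more defensive variant resolves the href with the standard URL constructor instead. This is a sketch under that assumption; toAbsoluteUrl and AMAZON_ORIGIN are hypothetical names introduced here for illustration.

```javascript
// Hypothetical constant: the origin that relative hrefs are resolved against.
const AMAZON_ORIGIN = 'https://www.amazon.com';

// Resolve an href against the Amazon origin.
// Relative paths gain the origin; absolute URLs pass through unchanged.
function toAbsoluteUrl(href, origin = AMAZON_ORIGIN) {
  if (!href) return null;
  try {
    return new URL(href, origin).href;
  } catch {
    return null; // href could not be parsed as a URL
  }
}

// Possible use in parseProducts:
// url: toAbsoluteUrl(card.find('h2 a').attr('href')),
```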

Selectors drift

Amazon's class names and layout (s-search-result, a-price, a-offscreen, s-image, and the rest) change without notice, and they vary by region and query type. Treat the selectors above as a starting template, not a contract. When a field comes back as null, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
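
One way to notice drift early is to measure how often each field comes back filled. The sketch below is an illustration rather than a prescribed method: fieldCoverage and driftedFields are hypothetical helpers, and the 0.5 threshold is an arbitrary example value to tune against your own runs.

```javascript
// Share of records (0..1) with a non-null value for each field.
function fieldCoverage(
  products,
  fields = ['name', 'price', 'image', 'rating', 'reviews', 'url']
) {
  const coverage = {};
  for (const field of fields) {
    const filled = products.filter(
      (p) => p[field] !== null && p[field] !== undefined
    ).length;
    coverage[field] = products.length === 0 ? 0 : filled / products.length;
  }
  return coverage;
}

// Names of fields whose coverage falls below the threshold.
function driftedFields(products, threshold = 0.5) {
  return Object.entries(fieldCoverage(products))
    .filter(([, share]) => share < threshold)
    .map(([field]) => field);
}
```

Some nulls are expected, since not every card has a rating or review count, so a sudden drop in one field's coverage across a whole page is the signal worth investigating.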

Step 3: Expose one server action and the data shape

Now wire the fetch and the parse into a single exported server action your UI can call. It takes a keyword, returns the parsed products, and never lets the token leak to the client because all of this stays in the 'use server' file. Add this to the bottom of app/actions.js.

javascript
export async function scrapeAmazonProducts(keyword) {
  if (!keyword || !keyword.trim()) return { products: [], error: 'Empty search' };
  try {
    const html = await fetchSerpHtml(keyword);
    if (!html) return { products: [], error: 'Fetch failed' };
    const products = parseProducts(html);
    return { products, error: null };
  } catch (err) {
    console.error(err);
    return { products: [], error: 'Something went wrong' };
  }
}

Returning a plain { products, error } object instead of throwing keeps the client component simple: it can render the grid when products has rows and show a message when error is set, with no try/catch scattered through the UI. Everything above runs on the server, so the token, the Crawling API call, and the Cheerio parse all stay out of the browser bundle.
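
Because a server action can be invoked by any client that can reach your app, it is worth treating its arguments as untrusted input. The sketch below shows one possible way to bound them before any request is sent. cleanKeyword, clampPages, and both limits are hypothetical names and values, not part of Next.js or the Crawlbase client.

```javascript
// Hypothetical limits; tune them for your own app.
const MAX_KEYWORD_LENGTH = 100;
const MAX_PAGES = 5;

// Normalize and bound a search keyword. Returns null when the input is unusable.
function cleanKeyword(input) {
  if (typeof input !== 'string') return null;
  const keyword = input.replace(/\s+/g, ' ').trim();
  if (keyword.length === 0 || keyword.length > MAX_KEYWORD_LENGTH) return null;
  return keyword;
}

// Clamp a requested page count into the range 1..MAX_PAGES.
function clampPages(input) {
  const pages = Number.parseInt(input, 10);
  if (Number.isNaN(pages) || pages < 1) return 1;
  return Math.min(pages, MAX_PAGES);
}
```

Calling cleanKeyword at the top of the action and returning the 'Empty search' error when it yields null keeps the { products, error } contract unchanged. clampPages only matters if your action accepts a page count.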

Step 4: Render the results in a React grid

The client side is small. A single client component holds the search state, calls the server action on submit, and maps the returned products into cards. Create app/page.jsx with the 'use client' directive so it can use React hooks, then import and call the action directly; Next.js handles the server round-trip for you.

javascript
'use client';

import { useState } from 'react';
import { scrapeAmazonProducts } from './actions';

export default function Home() {
  const [keyword, setKeyword] = useState('');
  const [products, setProducts] = useState([]);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState(null);

  async function handleSubmit(event) {
    event.preventDefault();
    setLoading(true);
    setError(null);
    const result = await scrapeAmazonProducts(keyword);
    setProducts(result.products);
    setError(result.error);
    setLoading(false);
  }

  return (
    <main className="container">
      <form onSubmit={handleSubmit}>
        <input
          value={keyword}
          onChange={(e) => setKeyword(e.target.value)}
          placeholder="Search Amazon (iPhone, laptop, headphones...)"
        />
        <button type="submit" disabled={loading}>
          {loading ? 'Searching...' : 'Search'}
        </button>
      </form>

      {error && <p className="error">{error}</p>}

      <div className="grid">
        {products.map((product, i) => (
          <a key={i} href={product.url} target="_blank" rel="noopener noreferrer" className="card">
            {product.image && <img src={product.image} alt={product.name} />}
            <h3>{product.name}</h3>
            <p className="price">{product.price}</p>
            <p className="rating">{product.rating} ({product.reviews || 0})</p>
          </a>
        ))}
      </div>
    </main>
  );
}

This is the whole front end. The form updates keyword on every keystroke; on submit it flips a loading flag, awaits the server action, and stores the returned products and error. The grid maps each product into a card that links out to the Amazon listing in a new tab. Style it with Tailwind or plain CSS to taste; the data flow is what matters here. Start the app with npm run dev, open localhost:3000, search for something like "iPhone 15 Pro Max", and the rendered, parsed products appear in the grid.

What the output looks like

The server action returns a clean array of records, one per product, before it ever reaches the grid. Log result.products and you get a structure like this, ready to render, write to JSON, or persist to a database.

json
[
  {
    "name": "Apple iPhone 15 Pro Max 256GB Natural Titanium",
    "price": "$1,199.00",
    "image": "https://m.media-amazon.com/images/I/81fxjeu8fdL._AC_UL320_.jpg",
    "rating": "4.7 out of 5 stars",
    "reviews": "1,284",
    "url": "https://www.amazon.com/dp/B0CHX1W1XY"
  },
  {
    "name": "Samsung Galaxy S24 Ultra 256GB Unlocked",
    "price": "$1,099.99",
    "image": "https://m.media-amazon.com/images/I/71CXi9gZ4mL._AC_UL320_.jpg",
    "rating": "4.5 out of 5 stars",
    "reviews": "912",
    "url": "https://www.amazon.com/dp/B0CMDRCZBP"
  }
]
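
Every field in these records is a display string. If you plan to sort or store the data, a small normalization step can add numeric values alongside them. The sketch below assumes US-style formatting as in the sample above (comma thousands separators, "N out of 5 stars" rating text); the function names and the added priceValue, ratingValue, and reviewCount fields are hypothetical.

```javascript
// Parse a displayed price such as "$1,199.00" into a number, or null.
function parsePrice(text) {
  if (!text) return null;
  const match = text.replace(/,/g, '').match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Parse "4.7 out of 5 stars" into 4.7, or null.
function parseRating(text) {
  if (!text) return null;
  const match = text.match(/^\s*(\d+(?:\.\d+)?)\s+out of/);
  return match ? Number(match[1]) : null;
}

// Parse a plain count such as "1,284" into 1284.
// Anything else (for example an abbreviated "1.2K") returns null rather than a guess.
function parseCount(text) {
  if (!text) return null;
  const match = text.match(/^\(?\s*([\d,]+)\s*\)?$/);
  if (!match) return null;
  const digits = match[1].replace(/,/g, '');
  return digits ? Number(digits) : null;
}

// Return a copy of the record with numeric fields added next to the strings.
function normalizeProduct(product) {
  return {
    ...product,
    priceValue: parsePrice(product.price),
    ratingValue: parseRating(product.rating),
    reviewCount: parseCount(product.reviews),
  };
}
```

Keeping the original strings next to the parsed numbers makes it easy to spot a format the parsers do not handle.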

Scale across result pages

One page of results is a demo; a real job walks the pagination. Amazon exposes the page number through the page query parameter, so you can fetch each page through the same API client, parse it with the same function, and collect the rows. Because every results page shares the same card structure, the parser you already wrote works across all of them without changes. Add a paged variant to app/actions.js.

javascript
// Fetch and parse result pages 1..totalPages for a keyword.
export async function scrapeAmazonPages(keyword, totalPages) {
  const all = [];
  const query = encodeURIComponent(keyword.trim());
  for (let page = 1; page <= totalPages; page++) {
    const url = `https://www.amazon.com/s?k=${query}&page=${page}`;
    const response = await api.get(url, { ajax_wait: 'true', page_wait: 5000 });
    if (response.statusCode === 200) {
      all.push(...parseProducts(response.body));
    } else {
      // Keep failures visible rather than silently skipping the page.
      console.error(`Page ${page} failed: ${response.statusCode}`);
    }
  }
  return all;
}

To enrich each row with full product detail (the full description, every image, the buy box, complete review data), take the url from each card and fetch that individual product page through the same API client, then write a small parser for the product layout. The pattern is identical: render, then parse. For more on rendering-heavy targets, see how to crawl JavaScript websites.
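
When you collect several pages, the same product can show up more than once, so it can help to deduplicate before enriching each row. The sketch below assumes product URLs contain a /dp/<ASIN> segment with a 10-character ASIN, as in the sample output above. extractAsin and dedupeProducts are hypothetical helpers.

```javascript
// Pull the ASIN out of a URL containing /dp/<ASIN>, or return null.
function extractAsin(url) {
  if (!url) return null;
  const match = url.match(/\/dp\/([A-Z0-9]{10})(?:[/?]|$)/);
  return match ? match[1] : null;
}

// Keep the first record seen for each product.
// The key is the ASIN when available, then the URL, then the name.
function dedupeProducts(products) {
  const seen = new Set();
  const unique = [];
  for (const product of products) {
    const key = extractAsin(product.url) || product.url || product.name;
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(product);
  }
  return unique;
}
```

Deduplicating first also cuts the number of product-page requests you send in the enrichment step.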

Staying unblocked

Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get a CAPTCHA. Spread requests out and vary your keywords instead of crawling one path at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
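
The pacing and back-off habits above can be sketched as a small loop around whatever function fetches a page. This is an illustration under stated assumptions, not a tuned policy: runPaced, backoffDelay, and withJitter are hypothetical helpers, the delay values are arbitrary examples, and task stands in for your own fetch-and-parse function that resolves to true on success and false on a challenge or non-200 response.

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Exponential backoff with a cap: attempt 0 -> baseMs, 1 -> 2x, 2 -> 4x, ...
function backoffDelay(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Stretch a delay by up to `spread` (0..1) so requests do not land on a fixed beat.
function withJitter(ms, spread = 0.5, random = Math.random) {
  return Math.round(ms * (1 + spread * random()));
}

// Run async `task(page)` for pages 1..totalPages, pausing after every attempt.
// Returns the list of page numbers that still failed after all retries.
async function runPaced(task, totalPages, { maxRetries = 3, baseMs = 2000 } = {}) {
  const failed = [];
  for (let page = 1; page <= totalPages; page++) {
    let ok = false;
    for (let attempt = 0; attempt <= maxRetries && !ok; attempt++) {
      ok = await task(page);
      // Normal pause after a success; a longer pause after each failure.
      await sleep(withJitter(ok ? baseMs : backoffDelay(attempt, baseMs)));
    }
    if (!ok) failed.push(page);
  }
  return failed;
}
```

Returning the failed page numbers, rather than throwing, lets the caller decide whether to retry later or stop the run.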

For the broader playbook, see how to scrape websites without getting blocked. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential IP rotation as a drop-in proxy endpoint.

Legality and responsible use

Whether scraping Amazon is allowed depends on Amazon's conditions of use, your jurisdiction, and what you do with the data. Amazon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. Collect only public search data: product names, prices, ratings, review counts, the thumbnail, and the product link that anyone can see without an account. Respect Amazon's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable reviewers beyond the public review text and counts shown on a results page. Do not redistribute copyrighted product images or descriptions wholesale; reference them, do not republish them as your own. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

For volume or commercial use, Amazon offers official channels, including the Product Advertising API for affiliates and Amazon's seller and advertising APIs for registered businesses, and those are the right tools when you need large volumes, guaranteed structure, or commercial rights. This guide is deliberately scoped to public search and listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, buyer or seller account data, order history, private messages, or any attempt to bypass authentication or a CAPTCHA challenge as a means of access. If your project needs more than public search data, Amazon's official APIs or a data agreement are the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Amazon renders parts of the SERP client-side. A plain request returns an incomplete or challenged page, so you must render it before you parse it.
  • Do the fetch in a server action. A 'use server' file keeps the Crawlbase token in process.env on the server and out of the browser bundle entirely.
  • One call handles rendering and a trusted IP. The Crawling API with a JS token does both; ajax_wait and page_wait control how long it waits for content.
  • Cheerio does the extraction. Select every s-search-result card, then map name, price, image, rating, reviews, and URL to current selectors, and expect those selectors to drift.
  • Stay on public data. Respect Amazon's terms and robots.txt, prefer the official APIs for volume or commercial use, and never touch logins, personal data, or order history.

Frequently Asked Questions (FAQs)

Why use a Next.js server action instead of a client-side fetch?

Because the Crawlbase token must never reach the browser. A server action runs only on the server, so you can read the token from process.env.CRAWLBASE_JS_TOKEN, make the Crawling API call, and parse with Cheerio without any of it being bundled into client JavaScript. The client component just calls the action and renders what comes back. It also keeps Cheerio and the parsing work out of the client bundle, so the browser downloads only the UI code.

Why does a plain request return incomplete data from Amazon?

Because Amazon renders prices, ratings, and parts of each result card client-side with JavaScript, and it challenges automated traffic with CAPTCHAs and robot checks. A raw HTTP request can come back with key fields missing or get blocked outright. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Amazon?

Use the JS token. The normal token fetches static HTML, which on Amazon can come back with prices or ratings missing. The JS token renders the page in a real browser before handing back the HTML, so the result fields are present when Cheerio parses them.

My selectors return null. What changed?

Almost certainly Amazon's markup. Its s-search-result cards, a-price blocks, and a-offscreen spans change without notice and vary by region and query type, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

How is this different from the plain Python or Node script approach?

The data and the Crawling API call are the same; the framing is full-stack. Here the fetch and parse live in a Next.js server action, and a React grid renders the results in a browser UI with the token safely server-side. If you only need a script that prints JSON, the simpler route is to scrape Amazon search pages with the Crawling API directly, or to build a web scraper with Node.js without the framework around it.

Can I scrape buyer or seller personal data from Amazon?

No, and this guide does not cover it. Account data, order history, and private messages sit behind a login, so they are not public data. Scraping login-walled content, personal data, or bypassing authentication or a CAPTCHA to reach it is out of scope here and runs against Amazon's terms. For sanctioned access the correct route is Amazon's official APIs or a licensing agreement.
