On any Amazon product page with more than one seller, exactly one offer wins the prime spot next to the "Add to Cart" and "Buy Now" buttons. That spot is the Buy Box, and it carries the overwhelming majority of a listing's sales. Commonly cited industry estimates put roughly 90% of Amazon purchases through the Buy Box rather than the "Other sellers" list buried further down the page. For a seller competing on a shared listing, winning or losing the Buy Box is the difference between a steady stream of orders and near silence.

This guide shows you how to scrape Amazon Buy Box data with Python so you can track the winning offer over time. You build a small, runnable scraper that fetches a rendered product page through the Crawling API, parses it with BeautifulSoup, and pulls the fields that define the current Buy Box: the winning price, the seller, who fulfills the order, and availability. The whole walkthrough stays scoped to public product-page data, and the legality section near the end is worth reading before you point this at any real volume.

What you will build

A Python script that takes an Amazon product URL, retrieves the rendered page through the Crawling API, and extracts a structured record describing the current Buy Box winner. We use a single Motorola phone listing as the running example and pull these fields:

  • Product Title: the name of the product featured in the Buy Box.
  • Price: the winning offer's current price, which fluctuates as sellers compete.
  • Seller Name: who currently holds the Buy Box, whether that is Amazon itself or a third-party seller.
  • Shipper Name: who fulfills the order, which tells you whether the offer is FBA (Fulfillment by Amazon) or merchant-fulfilled.
  • Availability: the in-stock status shown alongside the offer.
  • Buy Now / Add to Cart: whether the purchase buttons are present, confirming a live Buy Box rather than an unavailable listing.

What the Amazon Buy Box is and why sellers track it

The Buy Box is the boxed offer on the right side of a product page that holds the price, the seller and fulfillment details, the availability line, and the "Add to Cart" and "Buy Now" buttons. When several merchants sell the same product on one listing, Amazon picks a single offer to feature there. Most shoppers click "Add to Cart" without ever scrolling to "Other sellers on Amazon" or "Compare with similar items," so the featured offer captures the sale. Amazon now formally calls this the Featured Offer, though most sellers still say Buy Box.

Which offer wins is decided by an algorithm that weighs price, shipping speed and cost, seller performance metrics, fulfillment method, and stock, and it re-evaluates continuously. A competitor undercutting your price by a few cents, or switching to faster fulfillment, can take the Buy Box from you within the hour. Because the position shifts in real time, sellers monitor it programmatically rather than by hand:

  • Real-time monitoring. The Buy Box changes constantly. A scraper that samples a listing on a schedule tells you who holds it now and how often it flips, which is impossible to track manually across a catalog.
  • Price intelligence. Price is one of the heaviest factors in the decision, so knowing the current winning price lets you adjust yours to compete. See our guide on web scraping for price intelligence for the broader pattern.
  • Competitor analysis. Tracking which sellers win, at what price, and with which fulfillment method shows you what it takes to compete on a given listing.
  • Strategy at scale. Across hundreds of listings, automated collection is the only practical way to watch every product and react fast enough to matter.

Why a plain request fails on Amazon

If you request an Amazon product URL with a bare HTTP client, you get a status 200 and a page with almost none of the Buy Box data in it. Two things work against you. First, Amazon loads much of the offer block, including the seller, fulfillment, and availability details, dynamically through JavaScript and AJAX after the initial HTML arrives. A raw fetch captures the shell before those parts render. Second, Amazon flags automated traffic quickly: datacenter IPs and non-browser request patterns get met with a CAPTCHA, a rate limit, or an outright IP block before they reach the rendered offer.
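To make those failure modes concrete, here is a minimal sketch of a helper that sorts a fetched page into "challenge", "rendered", or "shell". The function name and the marker strings are assumptions for illustration: the offer-block IDs come from the selectors used later in this guide, and the CAPTCHA phrases are typical of Amazon's robot-check page but should be verified against the responses you actually receive.

```python
# Marker strings are illustrative assumptions; verify them against real responses.
CAPTCHA_MARKERS = (
    "Enter the characters you see below",
    "api-services-support@amazon.com",
)
OFFER_MARKERS = (
    "merchantInfoFeature_feature_div",
    "fulfillerInfoFeature_feature_div",
)


def classify_response(html):
    """Return 'challenge', 'rendered', or 'shell' for a fetched page."""
    if any(marker in html for marker in CAPTCHA_MARKERS):
        return "challenge"  # bot check served instead of the product page
    if any(marker in html for marker in OFFER_MARKERS):
        return "rendered"   # seller/shipper containers made it into the HTML
    return "shell"          # a page came back, but without the offer block
```

A status 200 that classifies as "shell" is the case a bare HTTP client tends to produce, which is why checking the status code alone is not enough.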

So a working Amazon scraper needs two things in one request: a real browser that renders the page, and an IP the platform reads as an ordinary shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon loads its offer block client-side, so you need the JS token here. The normal token returns the same partial shell a plain fetch would, and the Buy Box fields are not in it to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, our intro to scraping with Python and the official docs will get you to the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. Crawlbase gives you 1,000 free Crawling API requests to start, no card required. Treat the token like a password: it authenticates your requests, so keep it out of version control.
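One common way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name and the load_token helper are hypothetical, so pick whatever fits your setup.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Read the token from the environment and fail clearly if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your JS token")
    return token
```

With this in place you could initialize the client as CrawlingAPI({"token": load_token()}) rather than pasting the token into the script.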

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the three libraries the scraper needs.

bash
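# Confirm Python 3.8 or later is installed
python --version

# Create and activate an isolated environment
python -m venv buybox_env
source buybox_env/bin/activate

# Install the Crawling API client, the HTML parser, and pandas
pip install crawlbase beautifulsoup4 pandas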

On Windows, activate the environment with buybox_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client for the Crawling API, beautifulsoup4 parses the returned HTML so you can pull each field out by CSS selector, and pandas writes the records to CSV at the end so you can append samples over time.

Step 1: Fetch the rendered product page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the product URL. The two wait options matter here because the offer block renders late. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(product_url):
    options = {"page_wait": 2000, "ajax_wait": "true"}
    response = api.get(product_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    product_url = "https://www.amazon.com/Motorola-Stylus-Battery-Unlocked-Emerald/dp/B0BFYRV4CD"
    html = crawl(product_url)
    print(html[:500] if html else "No HTML returned")

The two wait options are what make a dynamic target like this work. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering offer block appears before the page is captured. The body is decoded as latin1 because latin1 maps every byte to a character and so never raises, whereas strict UTF-8 decoding fails on any invalid byte sequence. Run this and you should see real product markup, not the partial shell a plain fetch returns. That confirms rendering works before you write a single selector.
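The decoding choice has a trade-off that is easier to see on a small example. The snippet below uses made-up byte strings, not real Amazon output, to compare latin1 with UTF-8 decoding that substitutes a replacement character for bad bytes. Treat it as a sketch for deciding which behavior you prefer.

```python
# Bytes as a UTF-8 encoded page would send them (sample text, not real output).
raw = "Café".encode("utf-8")

as_latin1 = raw.decode("latin1")                 # never raises, but splits multi-byte characters
as_utf8 = raw.decode("utf-8", errors="replace")  # keeps valid text, marks bad bytes with U+FFFD

print(as_latin1)
print(as_utf8)

# A byte that is invalid in UTF-8 decodes without an exception under both approaches.
broken = b"Price: \xff 149.99"
print(broken.decode("latin1"))
print(broken.decode("utf-8", errors="replace"))
```

If non-ASCII characters in titles or seller names matter to you, the second form may be the safer default. The fields this guide extracts are mostly ASCII, so either works for the walkthrough.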

Step 2: Inspect the page and map the selectors

Before writing the parser, open a product page in your browser, right-click the offer block, and choose Inspect. Each Buy Box field lives in a stable element you can target by CSS selector. Hover over elements in the dev tools panel to see which part of the page they cover, then copy the selector. These are the durable selectors for the Buy Box fields, ported from a live Amazon product page:

  • Product Title: #productTitle
  • Price: .a-price .a-offscreen (the accessibility copy of the price string)
  • Availability: #availability span
  • Shipper Name: #fulfillerInfoFeature_feature_div span.offer-display-feature-text-message
  • Seller Name: #merchantInfoFeature_feature_div span.offer-display-feature-text-message
  • Buy Now button: span#submit\.buy-now, and Add to Cart button: span#submit\.add-to-cart (the dot is part of the ID, so it is escaped)

The shipper and seller selectors are the two that tell you how an offer wins. When both read "Amazon.com" the order is sold and shipped by Amazon. When the seller is a third party but the shipper is Amazon, that is an FBA (Fulfillment by Amazon) offer. When the third-party seller also ships, the offer is merchant-fulfilled. Tracking those two fields over time shows you not just the price war on a listing but the fulfillment strategy of whoever is winning it.
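Those three cases translate directly into a small classifier. This is a sketch under assumptions: the classify_fulfillment name, the output labels, and the set of strings treated as "Amazon" are all illustrative, and the "Not found" placeholder stands for a field the parser could not locate.

```python
# Assumption: the labels Amazon uses for itself in the seller and shipper fields.
AMAZON_NAMES = {"amazon.com", "amazon"}


def classify_fulfillment(seller, shipper, missing="Not found"):
    """Map a seller/shipper pair to 'amazon', 'fba', 'merchant_fulfilled', or 'unknown'."""
    if seller == missing or shipper == missing:
        return "unknown"
    seller_key = seller.strip().lower()
    shipper_key = shipper.strip().lower()
    if seller_key in AMAZON_NAMES and shipper_key in AMAZON_NAMES:
        return "amazon"              # sold and shipped by Amazon
    if shipper_key in AMAZON_NAMES:
        return "fba"                 # third-party seller, Amazon ships
    if seller_key == shipper_key:
        return "merchant_fulfilled"  # the same third party sells and ships
    return "unknown"                 # any combination the three cases do not cover
```

Storing this label alongside each capture makes it easy to filter a history for, say, every time an FBA offer took the Buy Box.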

Step 3: Parse the Buy Box data with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each field by its selector. Two small helpers keep the extraction clean: one returns an element's text or a default when it is missing, and one reports whether a button element is present. Note that the button IDs contain a literal dot (submit.buy-now and submit.add-to-cart), so the dot is escaped in the CSS selector as submit\.buy-now; a bare dot would be read as the start of a class name. Inside a regular Python string the backslash is itself doubled, which is why the code writes "span#submit\\.buy-now".

python
from bs4 import BeautifulSoup

def scrape_buy_box(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    buy_box = {}

    def text_or_default(selector, default="Not found"):
        el = soup.select_one(selector)
        return el.text.strip() if el else default

    def is_present(selector):
        return "Present" if soup.select_one(selector) else "Not Present"

    buy_box["Buy Now Button"] = is_present("span#submit\\.buy-now")
    buy_box["Add to Cart Button"] = is_present("span#submit\\.add-to-cart")
    buy_box["Availability"] = text_or_default("#availability span")
    buy_box["Product Title"] = text_or_default("#productTitle")
    buy_box["Price"] = text_or_default(".a-price .a-offscreen")
    buy_box["Shipper Name"] = text_or_default(
        "#fulfillerInfoFeature_feature_div span.offer-display-feature-text-message"
    )
    buy_box["Seller Name"] = text_or_default(
        "#merchantInfoFeature_feature_div span.offer-display-feature-text-message"
    )

    return buy_box

Each lookup goes through text_or_default, so a missing element returns "Not found" rather than throwing on a .text call against nothing. That resilience matters here: when a listing has no current Buy Box, or is out of stock, several of these fields will be absent, and you want a clean record that says so instead of a crash. The is_present helper turns the presence of the purchase buttons into a simple flag, which is your fastest signal that an offer is actually buyable right now.
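The parser returns the price as display text such as "$149.99". If you plan to compare prices across captures, converting that text to a number helps. The sketch below is one way to do it, assuming US-style formatting (comma thousands separators and a dot decimal point); parse_price is a hypothetical helper, not part of the script above.

```python
import re
from decimal import Decimal


def parse_price(price_text):
    """Convert a display price such as '$1,149.99' to Decimal, or None if no number is found."""
    # Assumes US-style formatting: commas group thousands, a dot marks decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_text or "")
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))
```

Decimal avoids the rounding surprises of float when you later subtract one captured price from another, and the "Not found" placeholder simply comes back as None.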

Step 4: Assemble the full script

Now wire the fetch, the parse, and storage into one runnable script. It crawls the product page, extracts the Buy Box record, prints it as JSON, and appends it to a CSV so you can build a history of samples over time.

python
import json
import os
from datetime import datetime, timezone
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import pandas as pd

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(product_url):
    options = {"page_wait": 2000, "ajax_wait": "true"}
    response = api.get(product_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None

def scrape_buy_box(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    buy_box = {}

    def text_or_default(selector, default="Not found"):
        el = soup.select_one(selector)
        return el.text.strip() if el else default

    def is_present(selector):
        return "Present" if soup.select_one(selector) else "Not Present"

    buy_box["Buy Now Button"] = is_present("span#submit\\.buy-now")
    buy_box["Add to Cart Button"] = is_present("span#submit\\.add-to-cart")
    buy_box["Availability"] = text_or_default("#availability span")
    buy_box["Product Title"] = text_or_default("#productTitle")
    buy_box["Price"] = text_or_default(".a-price .a-offscreen")
    buy_box["Shipper Name"] = text_or_default(
        "#fulfillerInfoFeature_feature_div span.offer-display-feature-text-message"
    )
    buy_box["Seller Name"] = text_or_default(
        "#merchantInfoFeature_feature_div span.offer-display-feature-text-message"
    )

    return buy_box

def save_to_csv(record, path="buy_box_history.csv"):
    record["Captured At"] = datetime.now(timezone.utc).isoformat()
    df = pd.DataFrame([record])
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False)

def main():
    product_url = "https://www.amazon.com/Motorola-Stylus-Battery-Unlocked-Emerald/dp/B0BFYRV4CD"
    html = crawl(product_url)
    if not html:
        return
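    buy_box = scrape_buy_box(html)
    # Save first: save_to_csv stamps "Captured At" onto the record,
    # so the JSON printed below includes the timestamp as well.
    save_to_csv(buy_box)
    print(json.dumps(buy_box, indent=2))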

if __name__ == "__main__":
    main()

The script crawls the page, parses the Buy Box, and appends one timestamped row per run to buy_box_history.csv. The Captured At field is what turns a single snapshot into tracking: run this on a schedule (cron, a task runner, or an async Crawler callback) and each row records who held the Buy Box, at what price, and with which fulfillment, at a known moment. Diffing those rows over days or weeks shows you how often the winner flips and what price it takes to hold the spot.
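Diffing the history can be sketched with the standard csv module. The example below assumes the rows belong to a single listing and are in capture order. The find_changes helper, the watched fields, and the two sample rows (including the "Acme Phones" seller) are made up for illustration.

```python
import csv
import io

WATCHED = ("Seller Name", "Price", "Shipper Name")


def find_changes(rows, watched=WATCHED):
    """Yield (timestamp, field, old, new) whenever a watched field differs from the previous row."""
    previous = None
    for row in rows:
        if previous is not None:
            for field in watched:
                if row.get(field) != previous.get(field):
                    yield (row.get("Captured At"), field, previous.get(field), row.get(field))
        previous = row


# Made-up sample history standing in for buy_box_history.csv.
sample = io.StringIO(
    "Seller Name,Price,Shipper Name,Captured At\n"
    "Amazon.com,$149.99,Amazon.com,2024-01-15T09:30:00+00:00\n"
    "Acme Phones,$147.50,Amazon.com,2024-01-15T10:30:00+00:00\n"
)
changes = list(find_changes(csv.DictReader(sample)))
for change in changes:
    print(change)
```

To run it against real data, open buy_box_history.csv with newline="" and pass csv.DictReader(file) to find_changes in place of the sample.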

What the output looks like

Save the full script as scraper.py, run it with python scraper.py, and you get a structured record for the current Buy Box, printed as JSON and appended to the CSV.

json
{
  "Buy Now Button": "Present",
  "Add to Cart Button": "Present",
  "Availability": "In Stock",
  "Product Title": "Motorola Moto G Stylus 5G | 2021 | 2-Day Battery | Unlocked | Made for US 4/128GB | 48MP Camera | Cosmic Emerald",
  "Price": "$149.99",
  "Shipper Name": "Amazon.com",
  "Seller Name": "Amazon.com",
  "Captured At": "2024-01-15T09:30:00+00:00"
}

In this sample, both the seller and the shipper read "Amazon.com," so Amazon itself holds the Buy Box and fulfills the order. On a listing won by a third party you would see a merchant name in Seller Name and either "Amazon.com" in Shipper Name (an FBA offer) or the same merchant again (merchant-fulfilled). Comparing those two fields across captures is what tells you the fulfillment story behind each Buy Box win.

Scaling to a catalog of listings

One URL is a demo; a real tracker watches many products. Because each Buy Box capture is one independent page fetch, you scale by looping a list of product URLs and pacing the requests so you do not hammer Amazon in a tight loop.

python
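import time

def track_listings(product_urls):
    for url in product_urls:
        html = crawl(url)
        if html:
            buy_box = scrape_buy_box(html)
            # Record which listing the row belongs to. This adds a column, so
            # start a fresh CSV rather than appending to a single-URL history.
            buy_box["Product URL"] = url
            save_to_csv(buy_box)
            print(f"{buy_box['Seller Name']} @ {buy_box['Price']} -> {url}")
        # Pause after every fetch, failed ones included, so a run of
        # errors never turns into a tight request loop.
        time.sleep(2)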

The time.sleep(2) between requests paces the run so you are not firing back-to-back fetches at one target, which is the fastest way to get throttled. For larger catalogs, the async Crawler lets you push many URLs and receive results on a callback instead of blocking on each one. If you only need a handful of fields and want to skip writing selectors entirely, the Crawling API can auto-parse Amazon product pages into structured JSON for you.
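When a catalog list is assembled from several sources, the same product often shows up under different URLs. A sketch of deduplicating by ASIN before the loop follows. It assumes ASINs are ten uppercase alphanumeric characters appearing after /dp/ or /gp/product/ in the URL, which matches the example listing here but is worth checking against your own list. Both helper names are hypothetical.

```python
import re

# Assumption: a 10-character uppercase alphanumeric ASIN follows /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(product_url):
    """Return the ASIN embedded in a product URL, or None if none is recognized."""
    match = ASIN_PATTERN.search(product_url)
    return match.group(1) if match else None


def dedupe_by_asin(product_urls):
    """Keep the first URL seen per ASIN; URLs without a recognizable ASIN are kept as-is."""
    seen = set()
    unique = []
    for url in product_urls:
        key = extract_asin(url) or url
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Passing dedupe_by_asin(product_urls) to the tracking loop avoids spending requests on the same listing twice in one run.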

Staying unblocked

Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread fetches out with a delay between listings rather than crawling at full speed. The time.sleep in the loop is the floor, not the ceiling.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or non-200 codes is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
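Backing off on failures can be sketched as a small wrapper around any fetch function that returns a falsy value when a request fails, as the crawl function in this guide does. The fetch_with_backoff name, the attempt count, and the delays are assumptions to tune, not recommended values.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns a truthy result, waiting longer after each failure."""
    for attempt in range(max_attempts):
        result = fetch(url)
        if result:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff with up to one second of jitter:
            # roughly 2s, 4s, 8s with the default base_delay.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    return None  # every attempt failed; let the caller decide what to do
```

In the tracking loop you could call fetch_with_backoff(crawl, url) in place of crawl(url). The sleep parameter exists so the waiting can be swapped out in tests.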

For the broader playbook, see how to scrape websites without getting blocked. If you want to scrape the full listing rather than just the Buy Box, the companion guide on scraping Amazon product data covers the rest of the page, and our Amazon best sellers walkthrough shows how to collect listings to feed into a tracker like this one.

Legality and responsible use

Whether scraping Amazon is allowed depends on Amazon's terms of service, your jurisdiction, and what you do with the data. Amazon's Conditions of Use restrict automated access and data collection, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.

A few lines are worth holding to. Collect only public data: the product title, the offer price, the seller and fulfillment labels, and the availability that anyone can see on a product page without an account. Keep your request volume low enough that you are not straining Amazon's servers, and pace the run as shown above. Avoid personal data, including anything tied to identifiable shoppers or to reviewers beyond the public review text. Do not redistribute copyrighted media such as product images or descriptions as if they were your own. This guide is deliberately scoped to the public offer block, so it does not cover anything behind a login: account data, order history, Seller Central dashboards, or payment and checkout flows are all out of scope, and bypassing authentication to reach them is not something a scraper should do.

If you are an Amazon seller, the sanctioned path to your own Buy Box and pricing data is Amazon's official Selling Partner API, which gives you structured, licensed access without scraping at all. For competitive monitoring at scale, or any use with commercial redistribution, an official API or a data agreement is the correct tool when you need guaranteed structure, volume, or rights. Use scraping for what it is good at: lightweight, public, read-only sampling of pages you could open in a browser yourself.

Recap

Key takeaways

  • The Buy Box is the offer that wins the sale. Around 90% of Amazon purchases go through it, so sellers track who holds it, at what price, and with which fulfillment.
  • Amazon renders the offer block client-side. A plain fetch returns a partial shell, so you must render the page with a JS token before the seller, shipper, and availability fields exist to parse.
  • Seller plus shipper tells the fulfillment story. Both reading "Amazon.com" means Amazon-fulfilled; a third-party seller with Amazon as shipper is FBA; the same merchant in both is merchant-fulfilled.
  • Timestamps turn snapshots into tracking. Appending a Captured At row per run, on a schedule, lets you diff Buy Box wins and price moves over time.
  • Stay on public data. Respect Amazon's terms and robots.txt, prefer the official Selling Partner API for licensed or seller data, and never touch accounts, orders, or anything behind a login.

Frequently Asked Questions (FAQs)

What is the Amazon Buy Box and why does it matter?

The Buy Box is the featured offer on a product page that holds the price, the seller and fulfillment details, and the "Add to Cart" and "Buy Now" buttons. When several merchants sell the same product, Amazon features one of them there, and most shoppers buy from that offer without comparing the others. Around 90% of Amazon sales go through the Buy Box, so winning it is the single biggest lever a seller on a shared listing has over their order volume.

Why does a plain request return no Buy Box data?

Because Amazon loads much of the offer block, including the seller, shipper, and availability fields, dynamically through JavaScript and AJAX after the initial HTML arrives. A raw HTTP request captures the shell before those parts render, so the fields you want are blank. Rendering the page first, which the Crawling API's JS token does, is what makes the Buy Box data appear in the HTML you parse.

How do I tell FBA from a merchant-fulfilled offer?

Compare the Seller Name and Shipper Name fields. If both read "Amazon.com," Amazon sells and ships the item. If the seller is a third party but the shipper is "Amazon.com," the offer is FBA (Fulfillment by Amazon). If the same third-party name appears in both, the seller fulfills the order themselves. The two selectors for these fields are #merchantInfoFeature_feature_div and #fulfillerInfoFeature_feature_div respectively.

How do I track the Buy Box over time instead of taking one snapshot?

Append each capture as a timestamped row to a CSV, as the full script does with the Captured At field, and run the scraper on a schedule. Cron, a task scheduler, or the async Crawler callback all work. Once you have a history, diff consecutive rows to see when the winning seller, price, or fulfillment method changed for a listing.

Do I need the normal token or the JS token for Amazon?

The JS token. The normal token fetches static HTML, which on Amazon omits the dynamically loaded offer block. The JS token renders the page in a real browser first, so the Buy Box fields are present when BeautifulSoup parses them. Amazon is JavaScript-heavy, so the JS token is the right default for its product pages.

My selectors return "Not found." What changed?

Usually one of two things. Either Amazon updated its markup, in which case you re-inspect a live product page in your browser's dev tools and update the selector, or the offer block did not finish rendering, in which case raise the page_wait value so the API holds longer before capturing the HTML. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
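A rough way to tell those two situations apart is to look at which fields came back empty. The helper below is a heuristic sketch rather than a definitive diagnosis: it assumes the record keys and the "Not found" placeholder used by the parser in this guide, and the diagnose name and hint strings are made up.

```python
TEXT_FIELDS = ("Product Title", "Price", "Availability", "Seller Name", "Shipper Name")


def diagnose(record, placeholder="Not found"):
    """Return (hint, missing_fields) for a parsed Buy Box record."""
    missing = [name for name in TEXT_FIELDS if record.get(name, placeholder) == placeholder]
    if not missing:
        return "ok", missing
    if len(missing) == len(TEXT_FIELDS):
        # Nothing parsed at all: the page may be a block page, or the markup changed.
        return "nothing parsed: check the raw HTML and re-inspect the selectors", missing
    # Some fields parsed: the late-loading parts may not have rendered yet.
    return "partial: try a longer page_wait, then re-inspect the missing selectors", missing
```

Logging the hint next to each capture makes it easier to spot when failures start clustering on the same fields. Keep in mind that a listing with no current Buy Box also produces a partial record.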
