How to Scrape Amazon PPC Ad Data

Q: How do I scrape only the sponsored ads and not the organic results?

Amazon wraps sponsored cards in containers carrying an AdHolder class. Selecting on .AdHolder div[data-asin] matches only those cards, so your parser collects PPC placements and skips organic listings entirely. From each ad card you then read the title from its heading anchor, the price from the a-price block, and the href for the product link.

Search any popular keyword on Amazon and the first results you see are rarely the most organic ones. They are Sponsored Products: pay-per-click (PPC) ads that brands bid on to sit at the top of a results page, each tagged with a small "Sponsored" or "Ad" label. Those placements are public, and the pattern of which products advertise on which keywords is a direct window into competitor strategy, ad spend priorities, and which terms a category is fighting over.

This guide shows you how to scrape Amazon PPC ad data with Python. You build a small, runnable scraper that fetches a rendered Amazon search results page through the Crawling API, parses only the sponsored placements out of the grid with BeautifulSoup, and pulls a clean record for each ad: title, price, on-page position, and product link. The whole walkthrough stays scoped to public search-page data that any shopper sees, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a search keyword, retrieves the rendered Amazon results page through the Crawling API, isolates the sponsored cards from the organic ones, and extracts a structured record per ad. We will use a headphones search as the running example and pull these fields from each sponsored card:

Title the advertised product's title text.
Price the listed price shown on the sponsored card.
Position where the ad sits in the result order, so you can see top-of-page versus lower placements.
Link the URL to the advertised product's own detail page.
Keyword the search term the ad appeared for, carried through so you can group results by query.

Why a plain request fails on Amazon

If you request an Amazon search URL with a bare HTTP client, you get a response with status 200 and almost none of the product data in the body. Two things work against you. First, Amazon loads its search grid dynamically: the initial HTML is largely a shell, and the important content fills in through JavaScript and Ajax once the page runs in a browser. Save a raw fetch to a file and you will find the sponsored cards simply are not there. Second, Amazon flags automated traffic fast. Datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA or blocked before they reach the rendered listings.

So a working Amazon ad scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse. For more on why client-rendered sites resist plain requests, see how to crawl JavaScript websites.

Why the JS token

Crawlbase offers two token types. The normal token (TCP) fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon relies heavily on JavaScript for dynamic content, so you need the JS token here. The normal token returns the same empty shell a plain fetch would, and there is nothing to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs and any beginner course will get you to the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up for a free account, open your dashboard, and copy your JavaScript (JS) token from the account docs page. The free tier includes 1,000 requests with no card, which is plenty to follow this guide. Treat the token like a password: it authenticates your requests, so keep it out of version control.

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.

bash

python --version

python -m venv amazon_env
source amazon_env/bin/activate

pip install crawlbase beautifulsoup4 pandas

On Windows, activate the environment with amazon_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client for the Crawling API, beautifulsoup4 parses the returned HTML so you can pull each field out by CSS selector, and pandas helps you organize the scraped ads for the keyword analysis at the end.

Understanding Amazon sponsored placements

An Amazon search results page is a grid of product cards. Most are organic listings ranked by relevance, but interspersed among them are Sponsored Products: the PPC ads. Visually they look like any other card, with a title, a price, an image, and a "Sponsored" label. Structurally, Amazon wraps sponsored cards so you can tell them apart from organic ones, which is exactly what lets you scrape only the ads.

Before writing selectors, open a search page in your browser, right-click a sponsored card, and choose Inspect. Sponsored cards live inside containers that carry an AdHolder class alongside the usual data-asin result attributes. The title sits in an anchor inside the card's heading, and the price is exposed through Amazon's a-price markup. Those are the hooks you target. Amazon's class names shift over time, but the AdHolder marker and the a-price structure have been durable, so lean on them.

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, build the search URL from a keyword, and request it with the wait options Amazon's dynamic grid needs. Checking the status code before you parse keeps failures loud instead of silent.

python

from crawlbase import CrawlingAPI

# Amazon is JavaScript-rendered, so use your JS token here
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"page_wait": 2000, "ajax_wait": "true"}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    keyword = "headphones"
    search_url = f"https://www.amazon.com/s?k={keyword}"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a dynamic target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering grid appears before the page is captured. Two seconds is a reasonable start; raise it if cards come back missing. The body is decoded as latin1 because Amazon pages mix in characters that strict UTF-8 decoding can choke on, and latin1 handles them without errors. Run the script and you should see real product markup, not the empty shell a plain fetch returns. That confirms rendering works before you write a single selector.

Crawlbase Amazon Scraper

Amazon needs a rendered page behind a trusted IP, in one call, which is why the plain fetch above returns an empty shell. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless browser fleet and a proxy pool yourself. Point it at a keyword on the free tier first.

Start free

Step 2: Parse the sponsored cards with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and select only the sponsored cards. This is the key difference from a plain search scrape: instead of grabbing every result, you target the AdHolder containers so you collect ads and nothing else. Then pull each field by its selector, ported straight from the live page markup. Wrap each card in a try/except so one malformed listing does not crash the run.

python

from bs4 import BeautifulSoup

def parse_ads(html, keyword):
    soup = BeautifulSoup(html, "html.parser")

    # Select only sponsored (PPC) cards, not organic results
    ads = soup.select(
        '.AdHolder div[data-asin], '
        'div[data-asin][data-component-type="s-search-result"].AdHolder'
    )

    results = []
    for position, ad in enumerate(ads, start=1):
        try:
            # Price inside the ad card
            price_el = ad.select_one("span.a-price span.a-offscreen")
            price = price_el.text.strip() if price_el else "Price not found"

            # Title inside the ad card
            title_el = ad.select_one(
                "div.a-section h2 a.a-link-normal span, "
                "div.a-section a.a-link-normal span.a-offscreen"
            )
            title = title_el.text.strip() if title_el else "Title not found"

            # Link to the advertised product page
            link_el = ad.select_one("h2 a.a-link-normal, a.a-link-normal")
            link = None
            if link_el and link_el.get("href"):
                link = "https://www.amazon.com" + link_el["href"]

            results.append({
                "keyword": keyword,
                "position": position,
                "title": title,
                "price": price,
                "link": link,
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results

The selector that isolates the ads is the heart of this scraper. .AdHolder div[data-asin] matches the sponsored cards while leaving organic results untouched, so parse_ads only ever sees PPC placements. The price comes from the hidden span.a-offscreen inside Amazon's a-price block, which holds the clean numeric value, and the title is read from the card's heading anchor with a fallback selector for layouts that expose it differently. enumerate(..., start=1) records each ad's position in the result order, which is what turns a flat list into placement intelligence: position 1 is a top-of-page bid, lower numbers are more expensive real estate. Each missing field degrades to a sentinel string or None rather than crashing the loop. If you want a refresher on the selector syntax itself, the guide on how to use BeautifulSoup in Python covers it in depth.

Selectors drift

Amazon's class names and card structure change without notice. The AdHolder marker, the a-price price block, and the heading anchor above are a starting template, not a contract. When titles or prices come back as the "not found" sentinel for every ad, re-inspect a live sponsored card in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.

Step 3: Put it together and analyze the keyword

Now wire the fetch and the parse into one runnable script, then add the payoff: a small analysis that turns the raw ad list into competitor intelligence. Fetch the rendered search page, isolate the sponsored cards, and load the records into a pandas DataFrame so you can see which products are advertising on the keyword and at what price points.

python

import json
import pandas as pd
from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"page_wait": 2000, "ajax_wait": "true"}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None

def parse_ads(html, keyword):
    soup = BeautifulSoup(html, "html.parser")
    ads = soup.select(
        '.AdHolder div[data-asin], '
        'div[data-asin][data-component-type="s-search-result"].AdHolder'
    )
    results = []
    for position, ad in enumerate(ads, start=1):
        try:
            price_el = ad.select_one("span.a-price span.a-offscreen")
            price = price_el.text.strip() if price_el else "Price not found"
            title_el = ad.select_one(
                "div.a-section h2 a.a-link-normal span, "
                "div.a-section a.a-link-normal span.a-offscreen"
            )
            title = title_el.text.strip() if title_el else "Title not found"
            link_el = ad.select_one("h2 a.a-link-normal, a.a-link-normal")
            link = None
            if link_el and link_el.get("href"):
                link = "https://www.amazon.com" + link_el["href"]
            results.append({
                "keyword": keyword,
                "position": position,
                "title": title,
                "price": price,
                "link": link,
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results

def analyze(ads):
    if not ads:
        print("No sponsored ads found for this keyword.")
        return
    df = pd.DataFrame(ads)
    print(f"Sponsored placements found: {len(df)}")
    print("\nTop-of-page advertisers:")
    print(df[["position", "title", "price"]].head())
    return df

def main():
    keyword = "headphones"
    search_url = f"https://www.amazon.com/s?k={keyword}"
    html = crawl(search_url)
    if not html:
        return
    ads = parse_ads(html, keyword)
    print(json.dumps(ads, indent=2))
    df = analyze(ads)
    if df is not None:
        df.to_csv("amazon_ppc_ads.csv", index=False)

if __name__ == "__main__":
    main()

The analyze step is what makes this an ad-intelligence tool rather than a raw dump. Loading the records into a DataFrame lets you sort by position to see who is winning the top bids, group by price to read the competitive price band, and write everything to amazon_ppc_ads.csv for tracking over time. Run the same keyword on a schedule and the diffs tell you when a new competitor enters the auction or an existing one changes its bid priority. If you want to feed those prices into a wider model, the guide on using web scraping for price intelligence shows where this kind of data goes next.

What the output looks like

Run the full script with python scraper.py and you get a clean list of sponsored records, one per ad, plus the CSV and a short summary. The JSON shape looks like this:

json

[
  {
    "keyword": "headphones",
    "position": 1,
    "title": "Wireless Bluetooth Headphones, Over-Ear, 40H Playtime",
    "price": "$39.99",
    "link": "https://www.amazon.com/dp/..."
  },
  {
    "keyword": "headphones",
    "position": 2,
    "title": "Noise Cancelling Headphones, Wired and Wireless",
    "price": "$59.95",
    "link": "https://www.amazon.com/dp/..."
  }
]

Each record carries the keyword, the on-page position, the advertised title and price, and a link into the product page. That is enough to answer the core competitor-intelligence question: for any term you care about, which products are paying to sit at the top, and what price band are they competing in.

Scaling across keywords and pages

One keyword on one page is a demo. The real value comes from running a list of keywords on a schedule and tracking how the advertiser set shifts. Loop your target terms, scrape the sponsored cards for each, and concatenate the results so a single CSV holds the full competitive picture.

python

import time

def scrape_keywords(keywords):
    all_ads = []
    for keyword in keywords:
        url = f"https://www.amazon.com/s?k={keyword}"
        html = crawl(url)
        if not html:
            continue
        ads = parse_ads(html, keyword)
        all_ads.extend(ads)
        print(f"{keyword}: {len(ads)} sponsored placements")
        time.sleep(2)
    return all_ads

keywords = ["headphones", "wireless earbuds", "bluetooth speaker"]
ads = scrape_keywords(keywords)

The time.sleep(2) between keywords paces the run so you are not hammering search in a tight loop, which is the fastest way to get throttled. To walk additional result pages for a single keyword, Amazon paginates search with a &page= parameter on the URL, so you increment it the same way and stop when a page returns no sponsored cards. Keep the cap modest: the most valuable ad placements sit on the first one or two pages anyway.

Staying unblocked

Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

Pace your requests. Spread requests out with a delay between keywords and pages instead of crawling at full speed. The time.sleep in the loop is the floor, not the ceiling.
Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.

For the broader playbook on keeping a scraper healthy against defended sites, see how to scrape websites without getting blocked.

Is it legal to scrape Amazon ad data?

Whether scraping Amazon is allowed depends on Amazon's Conditions of Use, your jurisdiction, and what you do with the data. Amazon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. The sponsored title, price, position, and product link this scraper pulls are public data: anyone running the same search sees the same ads, with no account required. Stay on that public, ad-facing surface. Respect Amazon's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable shoppers, reviewers, or sellers beyond what is publicly listed on a results page, and do not redistribute copyrighted product images or descriptions wholesale. Using ad data for your own competitive analysis is a very different thing from republishing Amazon's content.

This guide is deliberately scoped to public sponsored placements on search pages because that is the line that keeps the work defensible. It does not cover anything behind a login, seller-central or campaign-management data, another advertiser's private account metrics, payment flows, or any attempt to bypass authentication. The on-page "Sponsored" data you collect here is the competitor's public ad footprint, not their internal campaign numbers. For licensed or bulk access, Amazon offers official advertising and product APIs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public ad placements, an official API or a data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

Sponsored placements are public competitor intelligence. Which products advertise on a keyword, at which position and price, is visible to any shopper and reveals real ad strategy.
Amazon search is JavaScript-rendered. A plain fetch returns an empty shell, so you must render the page with the JS token before any sponsored card exists to parse.
The AdHolder selector isolates the ads. Targeting .AdHolder div[data-asin] collects only PPC cards, and the a-price and heading-anchor selectors pull title, price, and link from each one.
Position turns a list into placement data. Recording each ad's order with enumerate, then loading into pandas, lets you track top-of-page bids and price bands across keywords over time.
Stay on public data. Respect Amazon's Conditions of Use and robots.txt, prefer an official Amazon API for licensed or bulk data, and never touch accounts, campaign internals, or personal information.

Frequently Asked Questions (FAQs)

What is Amazon PPC advertising?

Amazon PPC (pay-per-click) advertising lets sellers and brands promote products inside Amazon's search results and product pages. These Sponsored Products are tagged "Sponsored" or "Ad," and the advertiser pays only when a shopper clicks. Because the ads are bid-driven, the set of products advertising on a given keyword is a live read on which brands are spending to compete for that term.

Why does a plain request return no ads from Amazon?

Because Amazon loads its search grid dynamically with JavaScript and Ajax. The initial HTML is largely a shell, so a raw HTTP request returns status 200 with the sponsored and organic cards both blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you before BeautifulSoup parses it.

How do I scrape only the sponsored ads and not the organic results?

Amazon wraps sponsored cards in containers carrying an AdHolder class. Selecting on .AdHolder div[data-asin] matches only those cards, so your parser collects PPC placements and skips organic listings entirely. From each ad card you then read the title from its heading anchor, the price from the a-price block, and the href for the product link.

My title or price comes back as "not found." What changed?

Almost certainly Amazon's markup. Its class names and card structure change without notice, and the a-price or heading-anchor selectors above can stop matching. Re-inspect a live sponsored card in your browser's dev tools, confirm the AdHolder wrapper and the inner price and title selectors, and update them. Periodic selector maintenance is normal for any production scraper.

What can I do with scraped Amazon ad data?

Track competitor advertising over time: which products enter or leave the sponsored set for a keyword, where they place, and the price band they compete in. Run the same keywords on a schedule and the diffs surface new entrants and bid-priority shifts. The position and price fields feed directly into ad and price intelligence dashboards.

Do I need the normal token or the JS token for Amazon?

The JS token. The normal token fetches static HTML, which on Amazon is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the sponsored cards are present when your parser runs. Pair it with the page_wait and ajax_wait options so the dynamic grid has time to finish loading.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.

Start Building

Crawl any site at scale, without fighting infrastructure.

Crawlbase handles proxies, fingerprints, and CAPTCHAs so your team ships data pipelines instead of maintaining crawl plumbing. 1,000 requests free, no card required.

Get a free API key →Read the docs

Self-serve · No sales call required · Enterprise crawl volumes available

What you will build

Why a plain request fails on Amazon

Prerequisites

Set up the project

Understanding Amazon sponsored placements

Step 1: Fetch the rendered search page

Step 2: Parse the sponsored cards with BeautifulSoup

Step 3: Put it together and analyze the keyword

What the output looks like

Scaling across keywords and pages

Staying unblocked

Is it legal to scrape Amazon ad data?

Key takeaways

Frequently Asked Questions (FAQs)

What is Amazon PPC advertising?

Why does a plain request return no ads from Amazon?

How do I scrape only the sponsored ads and not the organic results?

My title or price comes back as "not found." What changed?

What can I do with scraped Amazon ad data?

Do I need the normal token or the JS token for Amazon?

Crawl any site at scale, without fighting infrastructure.

Continue Reading

How to Scrape Google People Also Ask: full PAA extraction guide

Introducing the New Crawlbase Dashboard: a cleaner control center

13 Tips to Master Data Crawling: crawls that do not break

The infrastructure brief, in your inbox.