Costco runs more than 800 warehouses worldwide, and the same catalog sits on costco.com as public product pages and search results. Prices, item numbers, ratings, and stock status are all right there on the page for anyone to read without an account. That public data is a clean signal for price tracking, competitive research, and inventory monitoring, which is why retailers, analysts, and developers pull it on a schedule.

This guide shows you how to scrape Costco product data with Python. You build a small, runnable scraper that fetches Costco search and product pages through the Crawling API, parses a clean record for each item, handles pagination across search results, and exports the data to JSON and CSV. The whole walkthrough stays scoped to public product data: the titles, prices, item numbers, ratings, and availability anyone can see on a listing without logging in.

What you will build

A Python script that takes a Costco search URL or a product URL, retrieves the rendered page through the Crawling API, and extracts a structured record per product. We use a sofas search as the running example and pull these fields:

  • Title: the product name shown on the listing or product page.
  • Price: the listed price, when the product shows one.
  • Item number: Costco's unique product identifier, useful for tracking inventory across runs.
  • Rating: the average star rating, read from the product card or page.
  • Availability: the in-stock or out-of-stock signal on the product page.
  • Product URL: the link to the item's own detail page.
  • Image URL: the product image source.

Why a plain request fails on Costco

If you point a bare HTTP client at a Costco search or product URL, you rarely get the data you came for. Two things work against you. First, Costco's pages render a lot of their content client-side: the site ships a lightweight shell and fills the product grid and price blocks in as the page's JavaScript runs, so the initial HTML is often missing the fields you want. Second, Costco flags automated traffic quickly. Datacenter IP ranges and request patterns that do not look like a real browser get met with a challenge, an interstitial, or an outright block before you ever reach the listings.

So a working Costco scraper needs two things in one request: a browser that renders the page, and an IP that the site reads as a real shopper. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted residential IP, handles the rotation and CAPTCHA solving, and returns finished HTML for you to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs or any beginner course covers the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.

A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token. Costco's pages are JavaScript-heavy, so you want the JavaScript token for these requests. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Treat the token like a password and keep it out of version control.
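
One way to keep the token out of version control is to read it from an environment variable instead of pasting it into the script. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name and the load_token helper are illustrative choices, not something the Crawlbase client requires.

```python
import os

# Hypothetical variable name; use whatever your environment already defines.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"


def load_token(env=os.environ):
    """Return the token from the environment, or raise with a clear message."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running the scraper")
    return token
```

You would then build the client with CrawlingAPI({"token": load_token()}), so the script fails fast with a readable error when the variable is missing.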

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the page by CSS selector.

bash
python --version

python -m venv costco_env
source costco_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with costco_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:

bash
touch costco_scraper.py

Understanding the Costco pages

There are two page types worth scraping, and they carry different markup. A search results page like https://www.costco.com/s?dept=All&keyword=sofas lays out a grid of product cards. A product page like https://www.costco.com/coddle-aria-fabric-sleeper-sofa.product.4000223041.html shows the full detail view for one item, including the description, specifications, and availability.

Before writing selectors, open each page in your browser, right-click a product, and choose Inspect. On the search page, Costco wraps the listing in div[id="productList"] and groups items under div[data-testid="Grid"]. Each card exposes the title in a div whose data-testid starts with Text_ProductTile_, the price in one starting with Text_Price_, the rating under Rating_ProductTile_, the link in an a[data-testid="Link"] anchor, and the image in an img tag. On the product page, the title is an h1[automation-id="productName"], the price a span[automation-id="productPriceOutput"], and the rating a div[itemprop="ratingValue"]. These are the elements you target.

Step 1: Fetch a rendered Costco page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the search URL, and request it. Costco loads its grid asynchronously, so pass the ajax_wait and page_wait options to give the page time to finish before it is captured. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

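def crawl(page_url):
    """Fetch a rendered page and return its HTML, or None on failure."""
    # Wait for async content, then hold 5000 ms so late cards finish rendering.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    # Surface the failure instead of handing empty HTML to the parser.
    print(f"Request failed: {response['status_code']}")
    return None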

if __name__ == "__main__":
    search_url = "https://www.costco.com/s?dept=All&keyword=sofas"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering cards appear before the page is captured. Run the script and you should see real product markup, not a loading shell. That confirms rendering works before you write a single selector.
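
If you would rather check for the grid in code than eyeball the first 500 characters, a rough test is to look for the container markers this guide targets. This is a heuristic sketch: looks_rendered is a hypothetical helper, and it assumes the rendered HTML serializes those attributes with double quotes.

```python
def looks_rendered(html):
    """Heuristic: True when the HTML carries both search-grid markers."""
    if not html:
        return False
    # Markers taken from the selectors used in this guide; they can drift.
    markers = ('id="productList"', 'data-testid="Grid"')
    return all(marker in html for marker in markers)
```

A False result usually means the page was captured before the grid filled in, so raising page_wait is a reasonable first thing to try.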

Crawlbase Crawling API

That sofas grid only appears after Costco's JavaScript runs, and it only loads at all from an IP the site trusts. The Crawling API takes your token, runs the page in a real browser, rotates through residential IPs server-side, and handles the CAPTCHA solving, then hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself, which is why the ajax_wait and page_wait options above are all it takes. Point it at a Costco search on the free 1,000-request tier first.

Step 2: Parse the search listings with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Costco groups items under div[id="productList"] > div[data-testid="Grid"], and each card exposes its title, price, rating, link, and image. Wrap the loop so a missing field returns a clean fallback instead of crashing the run.

python
from bs4 import BeautifulSoup
import re

def item_number_from_url(url):
    match = re.search(r"\.product\.(\d+)\.html", url or "")
    return match.group(1) if match else "N/A"

def scrape_search_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    products = []
    items = soup.select('div[id="productList"] > div[data-testid="Grid"]')
    for item in items:
        title_el = item.select_one('div[data-testid^="Text_ProductTile_"]')
        price_el = item.select_one('div[data-testid^="Text_Price_"]')
        rating_el = item.select_one('div[data-testid^="Rating_ProductTile_"] > div')
        link_el = item.select_one('a[data-testid="Link"]')
        image_el = item.find("img")

        # .get avoids a KeyError if the anchor has no href attribute.
        product_url = link_el.get("href", "N/A") if link_el else "N/A"
        products.append({
            "title": title_el.get_text(strip=True) if title_el else "N/A",
            "price": price_el.get_text(strip=True) if price_el else "N/A",
            "item_number": item_number_from_url(product_url),
            "rating": rating_el["aria-label"] if rating_el and rating_el.has_attr("aria-label") else "N/A",
            "product_url": product_url,
            "image_url": image_el["src"] if image_el and image_el.has_attr("src") else "N/A",
        })
    return products

Each card maps to a clean dictionary. The title comes from div[data-testid^="Text_ProductTile_"] and the price from div[data-testid^="Text_Price_"], both using the ^= starts-with match because Costco suffixes those test IDs per product. The rating is read from the aria-label on the nested rating div, which holds the readable "Average rating is 4.65 out of 5 stars" text. The item number is parsed out of the product URL, since Costco encodes it in the .product.NNNN.html segment, so you get a stable identifier without a separate selector. Every field falls back to "N/A" when absent, which keeps extraction resilient since not every card carries a price or rating.
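
The price and rating arrive as display strings, which is fine for export but awkward for sorting or comparing. The sketch below shows one possible way to turn them into numbers. parse_price and parse_rating are hypothetical helpers, and the rating pattern assumes the aria-label wording quoted above.

```python
import re


def parse_price(text):
    """'$1,299.99' -> 1299.99; returns None when no number is present."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group(0).replace(",", "")) if match else None


def parse_rating(text):
    """Return (average, review_count) from the aria-label text, or Nones."""
    avg = re.search(r"rating is (\d+(?:\.\d+)?) out of", text or "")
    count = re.search(r"Based on ([\d,]+) reviews?", text or "")
    return (
        float(avg.group(1)) if avg else None,
        int(count.group(1).replace(",", "")) if count else None,
    )
```

Both helpers return None for the "N/A" fallback, so missing fields stay distinguishable from a real zero.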

Selectors drift

Costco's per-product data-testid suffixes and class names change as the site ships updates, while the structural markers like div[id="productList"] and the Text_ProductTile_ prefix are more durable. Treat the selectors above as a starting template, not a contract. When a field comes back as "N/A" for every card, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
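
The "N/A for every card" check is easy to automate. As a minimal sketch, the hypothetical dead_fields helper below reports which fields never resolved across a run, which is a reasonable signal that a selector needs another look.

```python
def dead_fields(records, placeholder="N/A"):
    """Return the field names that equal the placeholder in every record."""
    if not records:
        return []
    return [
        field
        for field in records[0]
        if all(record.get(field) == placeholder for record in records)
    ]
```

Run it on the list returned by scrape_search_listings and log or alert when the result is not empty.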

Step 3: Handle pagination across search pages

Costco search results span multiple pages, and the site loads each one with a &currentPage= parameter on the URL. To collect a full category, append that parameter and walk the pages in order, pacing the requests so you are not hammering the site in a tight loop.

python
import time

def scrape_all_pages(base_url, total_pages):
    all_products = []
    for page_num in range(1, total_pages + 1):
        paginated_url = f"{base_url}&currentPage={page_num}"
        print(f"Scraping page {page_num}")
        html = crawl(paginated_url)
        if not html:
            break
        found = scrape_search_listings(html)
        if not found:
            break
        all_products.extend(found)
        time.sleep(2)
    return all_products

The empty-results break stops you early when a category runs out of pages, and the time.sleep(2) between requests paces the run so you are not flagged for rapid-fire traffic. Set total_pages to however many result pages the search spans for your keyword.
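
The f-string above works because the base URL already carries a query string. If you pass in URLs of mixed shape, one option is to build the paginated URL with urllib.parse instead. with_page is a hypothetical helper, sketched under the assumption that currentPage is the only paging parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page_num):
    """Return url with currentPage set to page_num, replacing any old value."""
    parts = urlsplit(url)
    # Keep every existing parameter except a stale currentPage.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "currentPage"
    ]
    query.append(("currentPage", str(page_num)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This handles a base URL with no query string, and one that already includes a currentPage value, without producing a malformed or duplicated parameter.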

Step 4: Scrape an individual product page

Search cards give you the headline fields, but the product page carries the rest: the full description, the structured specifications, and the availability signal. The page renders the same way the search grid does, so reuse the crawl helper and parse the detail selectors.

python
def scrape_product_page(html, url):
    soup = BeautifulSoup(html, "html.parser")

    title_el = soup.select_one('h1[automation-id="productName"]')
    price_el = soup.select_one('span[automation-id="productPriceOutput"]')
    rating_el = soup.select_one('div[itemprop="ratingValue"]')
    desc_el = soup.select_one('div[id="product-tab1-espotdetails"]')
    image_el = soup.find("img", class_="thumbnail-image")
    stock_el = soup.select_one('div[automation-id="productInventoryStatus"]')

    specifications = {}
    for row in soup.select("div.product-info-description .row"):
        name = row.select_one(".spec-name")
        value = row.select_one("div:not(.spec-name)")
        if name and value:
            specifications[name.get_text(strip=True)] = value.get_text(strip=True)

    return {
        "title": title_el.get_text(strip=True) if title_el else "N/A",
        "price": price_el.get_text(strip=True) if price_el else "N/A",
        "item_number": item_number_from_url(url),
        "rating": rating_el.get_text(strip=True) if rating_el else "N/A",
        "availability": stock_el.get_text(strip=True) if stock_el else "N/A",
        "description": desc_el.get_text(strip=True) if desc_el else "N/A",
        "image_url": image_el["src"] if image_el and image_el.has_attr("src") else "N/A",
        "specifications": specifications,
    }

The product-page selectors come straight from Costco's detail markup. The title is an h1[automation-id="productName"], the price a span[automation-id="productPriceOutput"], the rating a div[itemprop="ratingValue"], and the description block lives under div[id="product-tab1-espotdetails"]. The specifications loop walks each .row in the description table, reads the label from .spec-name, and pairs it with the sibling value cell, so you end up with a clean dictionary of attributes like frame material and dimensions. Availability is read from the inventory status block; reuse the same item_number_from_url helper so the identifier stays consistent with the search records.

Step 5: Assemble the script and export JSON and CSV

Now wire the fetch, the pagination, and the parsers into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet. A shared field list keeps the CSV columns in step with the dictionary keys.

python
import csv
import json
import re
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
FIELDS = ["title", "price", "item_number", "rating", "product_url", "image_url"]

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

def item_number_from_url(url):
    match = re.search(r"\.product\.(\d+)\.html", url or "")
    return match.group(1) if match else "N/A"

def scrape_search_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    products = []
    items = soup.select('div[id="productList"] > div[data-testid="Grid"]')
    for item in items:
        title_el = item.select_one('div[data-testid^="Text_ProductTile_"]')
        price_el = item.select_one('div[data-testid^="Text_Price_"]')
        rating_el = item.select_one('div[data-testid^="Rating_ProductTile_"] > div')
        link_el = item.select_one('a[data-testid="Link"]')
        image_el = item.find("img")
        # .get avoids a KeyError if the anchor has no href attribute.
        product_url = link_el.get("href", "N/A") if link_el else "N/A"
        products.append({
            "title": title_el.get_text(strip=True) if title_el else "N/A",
            "price": price_el.get_text(strip=True) if price_el else "N/A",
            "item_number": item_number_from_url(product_url),
            "rating": rating_el["aria-label"] if rating_el and rating_el.has_attr("aria-label") else "N/A",
            "product_url": product_url,
            "image_url": image_el["src"] if image_el and image_el.has_attr("src") else "N/A",
        })
    return products

def scrape_all_pages(base_url, total_pages):
    all_products = []
    for page_num in range(1, total_pages + 1):
        paginated_url = f"{base_url}&currentPage={page_num}"
        print(f"Scraping page {page_num}")
        html = crawl(paginated_url)
        if not html:
            break
        found = scrape_search_listings(html)
        if not found:
            break
        all_products.extend(found)
        time.sleep(2)
    return all_products

def export(rows, name="costco_products"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} products to {name}.json and {name}.csv")

def main():
    base_url = "https://www.costco.com/s?dept=All&keyword=sofas"
    products = scrape_all_pages(base_url, total_pages=5)
    export(products)

if __name__ == "__main__":
    main()

Run the full script with python costco_scraper.py. It walks the search pages, parses one row per product, and writes both costco_products.json and costco_products.csv. The shared FIELDS list keeps the CSV column order in step with the dictionary keys, so the two exports never drift apart. To enrich the data with descriptions, specifications, and availability, feed each product_url to crawl and scrape_product_page from Step 4.
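
The enrichment step can be sketched as a small loop that takes the fetch and parse functions as arguments. enrich_products is a hypothetical helper; in this guide's script you would pass crawl and scrape_product_page. The merge rule (search-card values win unless they are "N/A") is an assumption you may want to change.

```python
import time


def enrich_products(products, fetch, parse, delay=2.0, sleep=time.sleep):
    """Merge product-page details into each search record.

    fetch(url) returns HTML or None; parse(html, url) returns a dict.
    """
    enriched = []
    for product in products:
        url = product.get("product_url", "N/A")
        details = {}
        if url != "N/A":
            html = fetch(url)
            if html:
                details = parse(html, url)
            sleep(delay)  # pace detail requests like the search pages
        # Start from the detail record, then let real search values override.
        merged = dict(details)
        merged.update({k: v for k, v in product.items() if v != "N/A"})
        # Keep "N/A" for fields that neither source could fill.
        for key, value in product.items():
            merged.setdefault(key, value)
        enriched.append(merged)
    return enriched
```

Enriched rows carry keys that are not in FIELDS, and csv.DictWriter raises ValueError on unexpected keys by default. Extend FIELDS or pass extrasaction="ignore" before sending these rows through export.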

What the output looks like

You get a clean list of product records, ready to write to JSON, CSV, or a database.

json
[
  {
    "title": "Coddle Aria Fabric Sleeper Sofa with Reversible Chaise Gray",
    "price": "$1,299.99",
    "item_number": "4000223041",
    "rating": "Average rating is 4.65 out of 5 stars. Based on 1668 reviews.",
    "product_url": "https://www.costco.com/coddle-aria-fabric-sleeper-sofa-with-reversible-chaise-gray.product.4000223041.html",
    "image_url": "https://cdn.bfldr.com/U447IH35/at/nx2pbmjk76t8c5k4h3qpsg6/4000223041-847_gray_1.jpg"
  },
  {
    "title": "Larissa Fabric Chaise Sofa",
    "price": "$1,899.99",
    "item_number": "4000052035",
    "rating": "Average rating is 4.03 out of 5 stars. Based on 87 reviews.",
    "product_url": "https://www.costco.com/larissa-fabric-chaise-sofa.product.4000052035.html",
    "image_url": "https://cdn.bfldr.com/U447IH35/as/ck2h3n29gz2j6m7c9f7x4rhm/4000052035-847_gray_1"
  }
]
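
Because every record carries an item number, two exports from different days can be compared directly. As a rough sketch of the price-tracking use case, the hypothetical price_changes helper below lists items whose price string differs between runs; it assumes both inputs are lists shaped like the JSON above.

```python
def price_changes(previous, current, placeholder="N/A"):
    """Compare two runs keyed by item_number; return items whose price moved."""
    old_prices = {
        row["item_number"]: row["price"]
        for row in previous
        if row["item_number"] != placeholder  # skip rows with no identifier
    }
    changes = []
    for row in current:
        old = old_prices.get(row["item_number"])
        if old is not None and old != row["price"]:
            changes.append({
                "item_number": row["item_number"],
                "title": row["title"],
                "old_price": old,
                "new_price": row["price"],
            })
    return changes
```

Load each run with json.load and pass the two lists in. Items that appear only in the newer run are ignored here, so handle new listings separately if you need them.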

A product-page run returns a richer record, with the description, the availability signal, and a specifications dictionary of attributes like back style, frame material, and overall dimensions pulled straight from the detail table.
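
The nested specifications dictionary serializes to JSON as is, but a CSV row needs flat columns. One possible approach, sketched with two hypothetical helpers, is to lift each specification into its own prefixed key and then collect the union of keys as the column list.

```python
def flatten_record(record, prefix="spec_"):
    """Copy a record, lifting each specification into its own top-level key."""
    flat = {k: v for k, v in record.items() if k != "specifications"}
    for name, value in record.get("specifications", {}).items():
        # "Frame Material" -> "spec_frame_material"
        flat[prefix + name.strip().lower().replace(" ", "_")] = value
    return flat


def csv_columns(rows):
    """Union of keys across rows, in first-seen order, for csv.DictWriter."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns
```

Pass csv_columns(rows) as fieldnames; csv.DictWriter fills columns a row lacks with an empty string by default, so products with different specification sets can share one file.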

Staying unblocked

Even with rendering handled, Costco watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread requests out with a delay between pages rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Costco's servers.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Retain only what you need. Store the product fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
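
Pacing also applies to failures: retrying a failed page immediately tends to repeat the failure. A minimal sketch of retry with exponential backoff follows. fetch_with_backoff is a hypothetical wrapper around any fetch function that returns HTML or None, such as the crawl helper in this guide.

```python
import time


def fetch_with_backoff(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML, doubling the wait after each miss."""
    for attempt in range(attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < attempts - 1:
            # Waits grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

Calling fetch_with_backoff(crawl, paginated_url) inside the pagination loop lets one bad response cost a few seconds instead of ending the run.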

For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. The data you collect feeds straight into price intelligence work, and the broader patterns carry over to other stores covered in this ecommerce web scraping guide.

Whether scraping Costco is allowed depends on Costco's Terms of Use, your jurisdiction, and what you do with the data. Costco's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Costco's Terms of Use and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
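
The robots.txt check can be scripted with the standard library's urllib.robotparser. The sketch below works on robots.txt text you have already downloaded. is_allowed is a hypothetical helper, and the sample rules are invented for illustration; they are not Costco's actual file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only, not Costco's real robots.txt.
SAMPLE_RULES = "User-agent: *\nDisallow: /checkout\nAllow: /\n"
```

With the sample rules, is_allowed(SAMPLE_RULES, "https://www.example.com/checkout/cart") comes back False. A True result only says the path is not disallowed by robots.txt; it does not settle what the Terms of Use permit.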

A few lines worth holding to. Collect only public data: the titles, prices, item numbers, ratings, availability, and listing links that anyone can see on a Costco page without an account. Keep your request volume low enough that you are not straining Costco's servers, and avoid personal data, including anything tied to identifiable members, reviewers, or sellers beyond what is publicly listed. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

This guide is deliberately scoped to public product and search pages because that is the line that keeps the work defensible. It does not cover anything behind a login, member or order data, pricing shown only to signed-in members, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. If your project needs more than public product data, an official data agreement or partner program is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Costco data is a clean retail signal. Public titles, prices, item numbers, ratings, and availability feed price tracking, market research, and inventory monitoring.
  • You need rendering and a trusted IP together. Costco loads its grid client-side and blocks bot traffic, so the Crawling API renders the page behind a residential IP in one call with ajax_wait and page_wait.
  • BeautifulSoup does the extraction. Loop div[id="productList"] > div[data-testid="Grid"] cards on search and the automation-id selectors on product pages, and expect those selectors to drift.
  • Paginate with currentPage and export both formats. Walk the &currentPage= parameter to cover a full category, then write JSON and CSV from a shared field list so they stay in sync.
  • Stay on public data. Respect Costco's Terms of Use and robots.txt, pace your requests, and never touch member accounts, orders, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain request return no products from Costco?

Costco renders much of its product grid and price blocks client-side as the page loads, so a raw request often gets a shell missing the fields you want. On top of that, Costco challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP, with the ajax_wait and page_wait options set, solves both, which is why the scraper here routes its request through it.

What data can I extract from Costco with this scraper?

From search listings you get the title, price, item number, rating, product URL, and image URL for each card. From an individual product page you also get the description, the full specifications table, and the availability signal. All of it is public product data you can store in JSON or CSV for analysis.

How do I scrape multiple pages of Costco search results?

Costco loads each search page with a &currentPage= parameter on the URL. Append that parameter and loop over the page numbers, parsing each page in turn and breaking early when a page returns no products. Add a short delay between requests so you pace the run rather than hammering the site.

How do I get the Costco item number?

Costco encodes the item number in the product URL, in the .product.NNNN.html segment. The scraper parses it out with a small regex helper, so you get a stable identifier from both search cards and product pages without relying on a separate on-page selector that might change.

Do I need the JavaScript token to scrape Costco?

Yes. Costco's pages depend on JavaScript to render the product grid, prices, and availability, so you want the JavaScript token and the wait options when you call the Crawling API. The free tier includes 1,000 requests to build and test with, and you can check current rates on the pricing page.

How do I avoid getting blocked while scraping Costco?

Keep your per-IP request rate low, add a delay between pages, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
