Office Depot is one of the largest office-supply retailers in the United States. The company runs more than 1,400 stores, employs over 38,000 people, and owns the OfficeMax and Grand & Toy brands alongside its main catalog of chairs, desks, stationery, printers, and school supplies. Its public storefront lists prices, ratings, review counts, and stock status across thousands of SKUs, which makes it a useful source for competitor price tracking, assortment research, and inventory monitoring.

This guide shows you how to scrape Office Depot with Python the reliable way. You build a small, runnable scraper that fetches a rendered product page through the Crawling API, parses it with BeautifulSoup, and pulls a clean record for each item: product name, price, SKU or model number, rating, review count, availability, and the product URL. The whole walkthrough stays scoped to public catalog data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes an Office Depot product URL, retrieves the rendered page through the Crawling API, and extracts a structured record. We will use the Epson Expression Home XP-4200 wireless printer as the running example and pull these fields:

  • Product name: the listing title, for example "Epson Expression Home XP-4200 Wireless Color Inkjet All-In-One Printer".
  • Price: the listed price shown on the page.
  • SKU / model: the item number that identifies the product in Office Depot's catalog.
  • Rating: the average star rating, when the product has one.
  • Review count: the number of customer reviews behind that rating.
  • Availability: whether the item reads as in stock or out of stock.
  • Product URL: the canonical link to the product's detail page.

Why a plain request fails on Office Depot

If you request an Office Depot URL with a bare HTTP client, you usually get a response with status 200 and almost none of the product data in the body. Two things work against you. First, the site builds much of its product and price content client-side: the price block and several other fields are filled in by JavaScript after the initial HTML loads, so a raw fetch sees placeholders instead of values. Second, Office Depot flags automated traffic. Datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA, rate limited, or blocked before they reach the finished page.

So a working Office Depot scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse. For more on why client-rendered pages defeat a plain fetch, see how to crawl JavaScript websites.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Office Depot fills in its price and several other fields client-side, so you want the JS token here. The normal token returns the pre-render shell, which leaves the price and stock fields empty when BeautifulSoup goes looking for them.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, our guide on web scraping with Python covers the groundwork this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
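One minimal way to keep the token out of your source is to export it as an environment variable and read it at startup. The variable name CRAWLBASE_JS_TOKEN and the helper load_js_token are our own choices for this sketch, not something Crawlbase requires:

```python
import os


def load_js_token(env_var: str = "CRAWLBASE_JS_TOKEN") -> str:
    """Read the Crawlbase JS token from an environment variable (name is our own choice)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Set the {env_var} environment variable to your Crawlbase JS token.")
    return token
```

After `export CRAWLBASE_JS_TOKEN=...` in your shell, you could pass `load_js_token()` to `CrawlingAPI({"token": ...})` in place of the hard-coded placeholder used in the steps below.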

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.

bash
python --version

python -m venv office_depot_env
source office_depot_env/bin/activate

pip install crawlbase beautifulsoup4 pandas

On Windows, activate the environment with office_depot_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client for the Crawling API, beautifulsoup4 parses the returned HTML so you can pull each field out by CSS selector, and pandas writes the records to CSV at the end.

Understanding the Office Depot product page

An Office Depot product page exposes the same handful of fields you care about: a title in a heading, a price in a dedicated price element, an item number that doubles as the SKU, a star rating with a review count, and a delivery message that tells you whether the item is in stock. Below those sit a description and a specifications table.

Before writing selectors, open a product page in your browser, right-click the title, and choose Inspect. Office Depot prefixes most of its class names with od-, for example od-heading on the title and od-graphql-price-big-price on the price. Those class names are what you target. They change from time to time, so treat the selectors below as a current snapshot rather than a permanent contract.
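While you work out selectors, it can help to fetch a rendered page once and iterate against a saved copy instead of spending a request on every tweak. This is a minimal sketch, assuming you pass in any function that returns rendered HTML or None (such as the crawl() helper from Step 1); the cache folder name, the cached_html helper, and the hashed file names are arbitrary choices for illustration:

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

CACHE_DIR = Path("html_cache")  # arbitrary local folder for saved pages (our choice)


def cached_html(
    url: str,
    fetch: Callable[[str], Optional[str]],
    cache_dir: Path = CACHE_DIR,
) -> Optional[str]:
    """Return saved HTML for url if present; otherwise fetch it once and save it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, stable file name.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    if html:  # cache only successful fetches so failures are retried next time
        path.write_text(html, encoding="utf-8")
    return html
```

Calling `cached_html(url, crawl)` repeatedly then costs one request in total; delete the folder when you want a fresh copy of the live page.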

Step 1: Fetch the rendered product page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the product URL. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    url = "https://www.officedepot.com/a/products/8761287/Epson-Expression-Home-XP-4200-Wireless/"
    html = crawl(url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering price and stock fields appear before the page is captured. Five seconds is a reasonable start; raise it if fields come back empty. Run the script and you should see real product markup, not the pre-render shell a plain fetch returns. That confirms rendering works before you write a single selector.
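If you would rather not guess the wait, one option is to start shorter and retry with a longer page_wait only while a field you rely on is still missing. This is a minimal sketch, assuming a fetch function that takes the URL and an options dict and returns HTML or None (for example, a thin wrapper that calls api.get(url, options) and decodes the body as crawl() does); fetch_until_rendered, the wait schedule, and the required-selector check are our own choices, not Crawling API features:

```python
from typing import Callable, Iterable, Optional

from bs4 import BeautifulSoup


def fetch_until_rendered(
    url: str,
    fetch: Callable[[str, dict], Optional[str]],
    required_selectors: Iterable[str],
    waits_ms: Iterable[int] = (3000, 5000, 8000),  # illustrative schedule, not tuned
) -> Optional[str]:
    """Retry with progressively longer page_wait until every required selector matches."""
    required = list(required_selectors)
    html = None
    for wait in waits_ms:
        html = fetch(url, {"ajax_wait": "true", "page_wait": wait})
        if not html:
            continue  # failed request; try again with a longer wait
        soup = BeautifulSoup(html, "html.parser")
        if all(soup.select_one(sel) is not None for sel in required):
            return html  # every field we care about is present
    return html  # last attempt's result, possibly incomplete or None
```

Each extra attempt is another billed request, so keep the required list to the fields you actually need, such as the price selector.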

Crawlbase Crawling API

Office Depot needs a rendered page behind a trusted IP, in one call, so the price and stock fields are actually present when you parse. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a single product URL on the free tier first.

Step 2: Parse the fields with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each field by its selector. Office Depot exposes the title in h1.od-heading.sku-heading, the price in span.od-graphql-price-big-price, and the delivery message in span.od-delivery-message-text. A small helper that returns a fallback when an element is missing keeps extraction from crashing on a field that is absent on a given page.

python
import re
from bs4 import BeautifulSoup

def text_of(soup, selector, default="N/A"):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else default

def parse_rating(soup):
    el = soup.select_one(".od-stars-inner")
    if not el or not el.get("style"):
        return None
    # style is like "width: 86%" (possibly with other rules or a trailing ";");
    # the width is the filled share of 5 stars, so divide the percent by 20
    match = re.search(r"(?:^|;)\s*width:\s*([\d.]+)%", el["style"])
    if not match:
        return None
    try:
        return round(float(match.group(1)) / 20, 1)
    except ValueError:
        return None

def extract_product(html, url):
    soup = BeautifulSoup(html, "html.parser")
    reviews = text_of(soup, ".od-reviews-count-number").strip("()")
    # Keep the text after "#" and drop the whitespace left around the number
    sku = text_of(soup, ".od-product-card-region-product-number").split("#")[-1].strip()
    delivery = text_of(soup, "span.od-delivery-message-text", "").lower()
    if not delivery:
        availability = "Unknown"  # no delivery message found; don't guess
    elif "in stock" in delivery:
        availability = "In Stock"
    else:
        availability = "Out of Stock"
    return {
        "name": text_of(soup, "h1.od-heading.sku-heading"),
        "price": text_of(soup, "span.od-graphql-price-big-price"),
        "sku": sku,
        "rating": parse_rating(soup),
        "review_count": reviews,
        "availability": availability,
        "product_url": url,
    }

The text_of helper returns a fallback instead of raising an AttributeError when select_one finds nothing and returns None. The SKU comes from .od-product-card-region-product-number, split on the # so you keep just the item number. Office Depot encodes the star rating as a CSS width percentage on .od-stars-inner, so parse_rating reads that percent and divides by 20 to recover a value out of five (86% becomes 4.3). The review count lives in .od-reviews-count-number wrapped in parentheses, which strip("()") removes, and availability is derived from the delivery message text. For a refresher on selecting elements, see how to use BeautifulSoup in Python.

One thing to expect: Office Depot's od- class names (the od-graphql-price-big-price price element, the od-stars-inner rating bar) change without notice, and the page layout shifts occasionally too. Treat the selectors above as a starting template, not a contract. When a field comes back as N/A or None on a page you know has the value, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
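A quick way to catch drift before it quietly fills your CSV with N/A values is to run each selector against a freshly rendered page and list the ones that match nothing. This is a minimal sketch; the SELECTORS dictionary and missing_selectors helper are our own, and the dictionary simply restates the selectors used above, so keep it in sync with your extractor:

```python
from bs4 import BeautifulSoup

# Selectors from the extractor above; update both places when the markup changes.
SELECTORS = {
    "name": "h1.od-heading.sku-heading",
    "price": "span.od-graphql-price-big-price",
    "sku": ".od-product-card-region-product-number",
    "rating": ".od-stars-inner",
    "review_count": ".od-reviews-count-number",
    "availability": "span.od-delivery-message-text",
}


def missing_selectors(html: str, selectors: dict = SELECTORS) -> list:
    """Return the field names whose selector matches no element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [field for field, sel in selectors.items() if soup.select_one(sel) is None]
```

Run it on a page you know shows every field; a non-empty list tells you which selector to re-inspect in dev tools.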

Step 3: Put it together

Now wire the fetch, the parse, and a CSV write into one runnable script. Fetch the rendered page, hand it to the extractor, print the record, and save it with pandas.

python
import json
import re
import pandas as pd
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    # Wait for async requests, then 5 s more so late price/stock fields render
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

def text_of(soup, selector, default="N/A"):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else default

def parse_rating(soup):
    el = soup.select_one(".od-stars-inner")
    if not el or not el.get("style"):
        return None
    # Read the width declaration (e.g. "width: 86%"), ignoring other style rules
    match = re.search(r"(?:^|;)\s*width:\s*([\d.]+)%", el["style"])
    if not match:
        return None
    try:
        return round(float(match.group(1)) / 20, 1)
    except ValueError:
        return None

def extract_product(html, url):
    soup = BeautifulSoup(html, "html.parser")
    reviews = text_of(soup, ".od-reviews-count-number").strip("()")
    sku = text_of(soup, ".od-product-card-region-product-number").split("#")[-1].strip()
    delivery = text_of(soup, "span.od-delivery-message-text", "").lower()
    if not delivery:
        availability = "Unknown"  # no delivery message found; don't guess
    elif "in stock" in delivery:
        availability = "In Stock"
    else:
        availability = "Out of Stock"
    return {
        "name": text_of(soup, "h1.od-heading.sku-heading"),
        "price": text_of(soup, "span.od-graphql-price-big-price"),
        "sku": sku,
        "rating": parse_rating(soup),
        "review_count": reviews,
        "availability": availability,
        "product_url": url,
    }

def main():
    url = "https://www.officedepot.com/a/products/8761287/Epson-Expression-Home-XP-4200-Wireless/"
    html = crawl(url)
    if not html:
        return
    product = extract_product(html, url)
    print(json.dumps(product, indent=2))
    pd.DataFrame([product]).to_csv("office_depot_products.csv", index=False)
    print("Saved to office_depot_products.csv")

if __name__ == "__main__":
    main()
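The script above overwrites office_depot_products.csv on every run. For the price-tracking use case from the introduction, you may prefer to append each run as a dated snapshot. This is a minimal sketch, assuming you call it with a list of records from extract_product; append_snapshot, the scraped_at column, and the file name are our own choices:

```python
import os
from datetime import datetime, timezone

import pandas as pd


def append_snapshot(records: list, path: str = "office_depot_price_history.csv") -> None:
    """Append records to a CSV, stamping each row with the UTC time of the run."""
    df = pd.DataFrame(records)
    df["scraped_at"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Write the header only on the first run, when the file does not exist yet.
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False)
```

In main(), you could call `append_snapshot([product])` alongside or instead of the to_csv line; the file then grows one row per product per run, which is the shape most price-history charts expect.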

What the output looks like

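Save the full script as scraper.py and run it with python scraper.py. You get a clean record, ready to write to JSON, CSV, or a database. The JSON printout looks something like this; prices, ratings, and review counts change over time, so expect your values to differ: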

json
{
  "name": "Epson Expression Home XP-4200 Wireless Color Inkjet All-In-One Printer",
  "price": "$99.99",
  "sku": "8761287",
  "rating": 4.3,
  "review_count": "512",
  "availability": "In Stock",
  "product_url": "https://www.officedepot.com/a/products/8761287/Epson-Expression-Home-XP-4200-Wireless/"
}
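The price and review count come back as display strings ("$99.99", "512"). Before loading records into a database or comparing prices across runs, you will probably want numbers. This is a minimal sketch, assuming US-style formatting with a dollar sign and comma thousands separators; the helper names are our own, anything they cannot parse becomes None, and a price range would keep only its first number:

```python
import re
from typing import Optional


def to_price(text: str) -> Optional[float]:
    """Convert a display price like "$1,299.99" to 1299.99; None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def to_count(text: str) -> Optional[int]:
    """Convert a count like "1,024" or "(512)" to an int; None for "N/A" or empty text."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def normalize(record: dict) -> dict:
    """Return a copy of record with numeric price and review_count fields."""
    clean = dict(record)
    clean["price"] = to_price(record.get("price", ""))
    clean["review_count"] = to_count(record.get("review_count", ""))
    return clean
```

Applied to the record above, normalize would turn "$99.99" into 99.99 and "512" into 512 while leaving the other fields untouched.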

Scaling to a search page and pagination

One product is a demo; a real job collects many. Office Depot search results lay out a grid of product cards, each carrying a title, price, rating, review count, item number, and a link into the product page. You scrape that grid the same way: select every card, pull the same fields per card, then follow the links to product pages when you need the deeper detail. Search paginates with a &page= parameter, so you walk pages by incrementing it and stopping when a page returns no cards.

python
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Reuses crawl() and text_of() from the Step 3 script.

def extract_cards(html):
    soup = BeautifulSoup(html, "html.parser")
    grid = ".od-search-browse-products-vertical-grid-product"
    products = []
    for card in soup.select(grid):
        link_el = card.select_one(".od-product-card-region-description a")
        href = link_el.get("href", "") if link_el else ""
        products.append({
            "name": text_of(card, ".od-product-card-region-description a"),
            "price": text_of(card, ".od-graphql-price-big-price"),
            "review_count": text_of(card, ".od-reviews-count-number").strip("()"),
            # urljoin copes with both relative and absolute hrefs
            "product_url": urljoin("https://www.officedepot.com/", href) if href else "N/A",
        })
    return products

def scrape_all_pages(base_url, max_pages=5):
    all_products = []
    # Use "&" if the search URL already has a query string, otherwise "?"
    sep = "&" if "?" in base_url else "?"
    for page in range(1, max_pages + 1):
        html = crawl(f"{base_url}{sep}page={page}")
        if not html:
            break
        products = extract_cards(html)
        if not products:
            break  # past the last page of results
        all_products.extend(products)
        print(f"Page {page}: {len(products)} products")
        time.sleep(2)  # pause so page requests are not back to back
    return all_products

The max_pages cap keeps a run bounded so a broad query does not spin forever, and the empty-results break stops you early when search runs out of pages. The time.sleep(2) between pages paces requests so you are not hammering search in a tight loop, which is the fastest way to get throttled. For more on structuring multi-page retail crawls, our guide on ecommerce web scraping covers the patterns in depth.
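One gap in that loop: it only stops on an empty page. Some search backends answer an out-of-range page number by serving the last page again, which would repeat products until max_pages runs out. We have not confirmed how Office Depot behaves here, so treat this as a defensive sketch under that assumption: keep only product URLs you have not seen, and stop when a page adds nothing new. The add_new_products helper is our own:

```python
from typing import Iterable, List, Set


def add_new_products(seen_urls: Set[str], products: Iterable[dict]) -> List[dict]:
    """Return products whose product_url is not in seen_urls, recording them as seen."""
    fresh = []
    for product in products:
        url = product.get("product_url", "N/A")
        # Cards without a link can't be deduplicated, so they are dropped here.
        if url == "N/A" or url in seen_urls:
            continue
        seen_urls.add(url)
        fresh.append(product)
    return fresh
```

To use it, create `seen = set()` before the loop in scrape_all_pages and change `products = extract_cards(html)` to `products = add_new_products(seen, extract_cards(html))`; the existing empty-page check then also stops the run on a repeated page.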

Prefer structured output?

If you would rather not maintain CSS selectors at all, the Crawling API with auto-parse returns ready-made structured JSON for common ecommerce pages, so you get fields like name, price, and rating without writing extraction code. It is a good fit when you want the data shape handled for you and the selector drift handled upstream.
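Switching is mostly a matter of request options. This is a minimal sketch, assuming the Crawling API's autoparse option covers the Office Depot page you request (check Crawlbase's documentation for supported page types) and that the body then comes back as JSON. The fetch_autoparsed helper is our own, and the keys in the result depend on Crawlbase's parser, so inspect them before relying on any field:

```python
import json
from typing import Optional


def fetch_autoparsed(api, url: str) -> Optional[dict]:
    """Request url with autoparse enabled and return the decoded JSON body, or None."""
    response = api.get(url, {"autoparse": "true"})
    if response["status_code"] != 200:
        return None
    try:
        return json.loads(response["body"])
    except json.JSONDecodeError:
        # The body was not JSON, for example because this page type is not auto-parsed.
        return None
```

With the `api` object from Step 3, `data = fetch_autoparsed(api, url)` followed by `print(list(data))` shows which fields came back.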

Staying unblocked

Even with rendering handled, Office Depot watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread requests out with a delay between pages and vary your queries instead of crawling one term at full speed. The time.sleep in the pagination loop is the floor, not the ceiling.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
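Backing off can be as simple as retrying a failed request after a growing, slightly randomized delay and giving up after a few attempts. This is a minimal sketch, assuming a fetch function that returns HTML or None on a non-200 response (as crawl() does); fetch_with_backoff, the attempt count, and the delays are illustrative choices, not values tuned for Office Depot:

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], Optional[str]],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url), waiting base_delay * 2**n plus jitter between failed attempts."""
    for attempt in range(attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < attempts - 1:
            # Exponential backoff: about 2 s, 4 s, 8 s, each with up to 1 s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

The jitter keeps parallel workers from retrying in lockstep, and the sleep parameter lets you swap in a no-op when testing.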

For the broader playbook, see how to scrape websites without getting blocked. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.

Is it legal to scrape Office Depot?

Whether scraping Office Depot is allowed depends on Office Depot's terms of service, your jurisdiction, and what you do with the data. Office Depot's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Office Depot's terms of use and its robots.txt, and treat both as the boundary for what you collect.
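Python's standard library can read robots.txt for you. This is a minimal sketch using urllib.robotparser; the robots_allows helper and its default user agent are our own choices, and a path that robots.txt allows is still subject to the terms of use:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_allows(url: str, user_agent: str = "*", robots_lines=None) -> bool:
    """Check url against the site's robots.txt, or against robots_lines if given."""
    parts = urlsplit(url)
    parser = RobotFileParser()
    if robots_lines is None:
        parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        parser.read()  # network request for the live robots.txt
    else:
        parser.parse(robots_lines)  # offline: parse rules you already have as lines
    return parser.can_fetch(user_agent, url)
```

Calling `robots_allows(url)` before queuing each URL checks it against the live rules; the `robots_lines` argument is only there so you can test the logic offline with rules you supply.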

A few rules are worth holding to. Collect only public catalog data: the product names, prices, SKUs, ratings, review counts, availability, and listing links that anyone can see without an account. Keep your request volume low enough that you are not straining Office Depot's servers. Avoid personal data, including anything tied to identifiable shoppers, reviewers, or store employees, and do not redistribute copyrighted product photography or descriptions as if they were yours. Scraping for internal research or price comparison is a far easier case to defend than scraping to republish or resell the catalog wholesale.

This guide is deliberately scoped to public product and search pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, payment or checkout flows, or any attempt to bypass authentication. For licensed or high-volume access, the correct path is an official feed, API, or a data agreement with Office Depot rather than a cleverer scraper. If your project needs guaranteed structure, large volumes, or commercial rights, ask for sanctioned access first; a managed crawling tool only handles the fetch, not the permission to use the data.

Recap

Key takeaways

  • Office Depot fills key fields client-side. A plain fetch returns a pre-render shell with the price and stock blank, so you must render the page before you parse it.
  • You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for the late fields.
  • BeautifulSoup does the extraction. Map name, price, SKU, rating, review count, availability, and the product URL to current od- selectors, and expect those selectors to drift.
  • Scale with search and pagination. Loop the result-card grid, walk &page= until a page returns no cards, pace requests with a delay, and cap the page count.
  • Stay on public data. Respect Office Depot's ToS and robots.txt, prefer an official feed or API for licensed or bulk data, and never touch accounts, orders, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain request return no price from Office Depot?

Because Office Depot fills its price and several other fields client-side with JavaScript. The initial HTML is a pre-render shell, so a raw HTTP request returns status 200 with the price and stock fields empty. To get real values you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Office Depot?

The JS token. The normal token fetches static HTML, which on Office Depot leaves the price and availability fields unfilled. The JS token renders the page in a real browser before handing back the HTML, so those fields are present when BeautifulSoup parses them.

What fields can I extract from an Office Depot product page?

The public ones shown on the page: product name, price, the item number that serves as the SKU or model, the star rating, the review count, availability, and the canonical product URL. Search result cards expose most of the same fields per listing, which is how you collect many products before drilling into individual pages.

How do I scrape multiple pages of Office Depot search results?

Search paginates with a &page= parameter on the URL. Increment it in a loop, scrape each page with the same card parser, and stop when a page returns no products. Cap the page count and add a short delay between requests so you pace the run and do not get throttled.

My selectors return N/A. What changed?

Almost certainly Office Depot's markup. Its od- class names like od-graphql-price-big-price for price and od-stars-inner for the rating bar change without notice, and the page layout shifts occasionally too. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic maintenance is normal for any production scraper.

Can I scrape account or order data from Office Depot?

No, and this guide does not cover it. Account details, order history, and checkout flows sit behind a login, so they are not public data. Scraping login-walled content, or bypassing authentication to reach it, is out of scope here and runs against Office Depot's terms. For sanctioned access to richer data, the correct route is an official feed, API, or a data agreement.
