Zoro.com is a large industrial-supply retailer with millions of products: tools, equipment, fasteners, safety gear, and MRO parts, each listing carrying a brand, a price, a Zoro number, and a stock status. Those listings are a clean, public signal for anyone doing price tracking, catalog research, or competitive analysis in the industrial space, which is why distributors, procurement teams, and analysts watch them.

This guide shows you how to scrape Zoro product data with Python. You will build a small, runnable scraper that fetches Zoro search and product pages through the Crawling API, parses a clean record for each item, handles pagination, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public catalog data: the brands, titles, prices, Zoro numbers, and availability anyone can see on a search or product page without logging in.

What you will build

A Python script that takes a Zoro search URL, retrieves the rendered page through the Crawling API, and extracts a structured record per product. We use a tool box search as the running example and pull these fields from each listing:

  • Brand: the manufacturer name shown on the product card.
  • Title: the product name as listed.
  • Price: the listed price, when the product shows one.
  • Zoro number: the SKU-style identifier from the product URL, for example G6893443.
  • Availability: the stock status, such as in stock or backordered.
  • Link: the absolute URL to the product's own detail page.

Why a plain request fails on Zoro

If you point a bare HTTP client at a Zoro search URL, you rarely get the product grid you came for. Two things work against you. First, Zoro renders much of the listing client-side: the page ships a lightweight shell and fills the product cards in as its JavaScript runs, so the initial HTML is often missing the items you want to parse. Second, Zoro flags automated traffic. Datacenter IP ranges and request patterns that do not look like a real browser get met with a challenge page or an outright block before you ever reach the listings.

So a working Zoro scraper needs two things in one request: a browser that renders the page, and an IP that Zoro reads as a real shopper. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted residential IP, handles the rotation and CAPTCHA solving, and returns finished HTML for you to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the guide on web scraping with Python covers the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.

A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token from the account docs page. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Because Zoro is JavaScript-rendered, use your JavaScript token for these requests. Treat the token like a password and keep it out of version control.
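
One way to keep the token out of the source file is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_TOKEN; that name is chosen here for illustration and is not something the Crawlbase client requires.

```python
import os

def load_token(env_var="CRAWLBASE_TOKEN"):
    """Return the token stored in an environment variable (the name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable before running the scraper.")
    return token
```

You could then pass load_token() to CrawlingAPI({"token": ...}) in place of the hard-coded placeholder used in the examples.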

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the product cards by CSS selector.

bash
python --version

python -m venv zoro_env
source zoro_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with zoro_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:

bash
touch zoro_scraper.py

Understanding the Zoro search page

A Zoro search lives at a stable URL built from the q query parameter, for example https://www.zoro.com/search?q=tool+box, and it paginates with a page parameter appended as &page=2. The page lays out a grid of product cards, one per item, each carrying the same handful of fields: a brand name, a title, a price, a thumbnail image, and a link into the product's detail page.
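
As a small illustration of that URL shape, the sketch below builds a search URL from a plain query string with the standard library. The build_search_url name is a hypothetical helper, not part of the original script; it only encodes the q and page parameters described above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.zoro.com/search"

def build_search_url(query, page=None):
    """Compose a search URL from a query string and an optional page number."""
    params = {"q": query}
    if page is not None:
        params["page"] = page
    # urlencode turns spaces into "+", matching the q=tool+box form.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

Calling build_search_url("tool box", page=2) produces the same q=tool+box and page=2 pair the guide appends by hand.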

Before writing selectors, open a Zoro search page in your browser, right-click a product card, and choose Inspect. From the DOM explorer you can read off the CSS selectors for each field:

  • Brand name: inside a <span> with the class brand-name.
  • Product title: inside a <div> with the class product-title.
  • Price: inside a <div> with the class price.
  • Availability: inside a <div> with the class availability.
  • Product URL: in the href of an <a> nested inside div.product-title.
  • Product image: in the src of an <img> carrying data-za="product-image".

The product cards themselves sit inside a section[data-za="product-cards-list"] container, each one a div.search-product-card, which is the loop target. The Zoro number is not its own element on the card: it is the SKU segment in the product URL (the G6893443 in /i/G6893443/), so you derive it from the link rather than selecting it directly.

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the search URL, and request it. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

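def crawl(page_url):
    # Wait for async content, then hold 5 seconds before the page is captured.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    # Only a 200 carries usable HTML; anything else is reported and skipped.
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None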

if __name__ == "__main__":
    search_url = "https://www.zoro.com/search?q=tool+box"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds (5000 here, since Zoro's listing pages can be slow) so the late-rendering cards appear before the page is captured. Run the script and you should see real listing markup, not a challenge shell. That confirms rendering works before you write a single selector.
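
If you would rather check this programmatically than eyeball the first 500 characters, a rough heuristic is to look for the grid markers in the returned HTML. The looks_rendered helper below is a hypothetical addition, and it assumes the data-za="product-cards-list" container and search-product-card class described earlier are still present in Zoro's markup.

```python
def looks_rendered(html):
    """Heuristic: True when the HTML contains the markers of a rendered product grid."""
    if not html:
        return False
    # Both markers come from the card structure described in this guide;
    # they are assumptions about the current markup and may drift.
    markers = ('data-za="product-cards-list"', "search-product-card")
    return all(marker in html for marker in markers)
```

A False result usually means the page was captured too early or a challenge page came back, so it is a reasonable signal to retry before parsing.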

Crawlbase Crawling API

Zoro's product grid needs a rendered page behind a trusted IP, in one call. The Crawling API takes your token, runs the search page in a real browser with the ajax_wait and page_wait options you just set, rotates through residential IPs server-side, and handles the CAPTCHA solving, then hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Point it at a search URL on the free 1,000-request tier first.

Step 2: Parse the product cards with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Each card lives inside the section[data-za="product-cards-list"] container as a div.search-product-card. Read the brand, title, price, availability, image, and link off the card, then derive the Zoro number from the link. Wrap each card in a try/except so one malformed listing does not crash the run.

python
import re
from bs4 import BeautifulSoup

BASE = "https://www.zoro.com"

def text_of(card, selector):
    # Return the stripped text of the first match, or None if it is absent.
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None

def zoro_number(url):
    # The Zoro number is the G-prefixed segment after /i/ in the product URL.
    match = re.search(r"/i/(G\d+)/", url)
    return match.group(1) if match else None

def scrape_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('section[data-za="product-cards-list"] > div.search-product-card')
    results = []
    for card in cards:
        try:
            anchor = card.select_one("div.product-title a")
            href = anchor["href"] if anchor else ""
            # Relative hrefs are prefixed with the site root to make them absolute.
            link = BASE + href if href.startswith("/") else href
            img = card.select_one('img[data-za="product-image"]')
            results.append({
                "brand": text_of(card, "span.brand-name"),
                "title": text_of(card, "div.product-title"),
                "price": text_of(card, "div.price"),
                "zoro_number": zoro_number(link),
                "availability": text_of(card, "div.availability"),
                "image_url": img["src"] if img else None,
                "link": link,
            })
        except Exception as e:
            # One malformed card is logged and skipped, not fatal.
            print(f"Skipped a card: {e}")
    return results

The text_of helper queries one element inside a card and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every card shows a price or an availability badge. The zoro_number helper pulls the G-prefixed SKU out of the product URL with a small regex, and the link is normalized to an absolute URL since Zoro serves a relative href.
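
A slightly more defensive variant of the link handling, as a sketch: it uses urljoin from the standard library to normalize the href and a regex that also accepts a URL without the trailing slash. The absolute_link and zoro_number_from names are hypothetical, and the missing-slash case is an assumption about URL shapes you might encounter, not something observed in the examples above.

```python
import re
from urllib.parse import urljoin

BASE = "https://www.zoro.com"
# Accept /i/G123/ as well as /i/G123 at the end of the URL or before ? or #.
ZORO_NUMBER = re.compile(r"/i/(G\d+)(?:[/?#]|$)")

def absolute_link(href):
    """Resolve a relative or absolute href against the site root."""
    return urljoin(BASE, href) if href else None

def zoro_number_from(url):
    """Extract the G-prefixed identifier from a product URL, or None."""
    match = ZORO_NUMBER.search(url or "")
    return match.group(1) if match else None
```

urljoin leaves an already-absolute URL untouched, so the same call covers both relative and absolute hrefs.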

Selectors drift

Class names like brand-name, product-title, and price can change when Zoro reworks its markup, while structural markers like the data-za="product-cards-list" attribute tend to be more durable. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect the live search page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
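
To spot that situation automatically, you could check the parsed rows for fields that are empty everywhere. The dead_fields helper below is a hypothetical sketch; it works on the list of dictionaries that scrape_listings returns and needs nothing beyond the standard library.

```python
def dead_fields(rows, fields):
    """Return the fields that are None (or missing) in every row."""
    if not rows:
        return []
    # A field that is None on every card usually points at a stale selector.
    return [name for name in fields if all(row.get(name) is None for row in rows)]
```

A non-empty result after a run is a hint to re-inspect the page for those fields; an occasional None in a single row is expected and is not flagged.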

Step 3: Handle pagination and export JSON and CSV

One search page is a demo; a real job walks the result set. Zoro paginates with the page parameter, so you append &page=N and loop until a page returns no cards. Now wire the fetch, the parse, and the pagination loop into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet.

python
import csv
import json
import re
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.zoro.com"
FIELDS = ["brand", "title", "price", "zoro_number", "availability", "image_url", "link"]

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None

def zoro_number(url):
    match = re.search(r"/i/(G\d+)/", url)
    return match.group(1) if match else None

def scrape_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('section[data-za="product-cards-list"] > div.search-product-card')
    results = []
    for card in cards:
        try:
            anchor = card.select_one("div.product-title a")
            href = anchor["href"] if anchor else ""
            link = BASE + href if href.startswith("/") else href
            img = card.select_one('img[data-za="product-image"]')
            results.append({
                "brand": text_of(card, "span.brand-name"),
                "title": text_of(card, "div.product-title"),
                "price": text_of(card, "div.price"),
                "zoro_number": zoro_number(link),
                "availability": text_of(card, "div.availability"),
                "image_url": img["src"] if img else None,
                "link": link,
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results

def scrape_all_pages(search_url, max_pages=5):
    all_rows = []
    for page in range(1, max_pages + 1):
        page_url = f"{search_url}&page={page}"
        print(f"Scraping page {page}...")
        html = crawl(page_url)
        if not html:
            break
        rows = scrape_listings(html)
        if not rows:
            print("No more products. Stopping.")
            break
        all_rows.extend(rows)
        time.sleep(2)
    return all_rows

def export(rows, name="zoro_listings"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} products to {name}.json and {name}.csv")

def main():
    search_url = "https://www.zoro.com/search?q=tool+box"
    rows = scrape_all_pages(search_url, max_pages=3)
    export(rows)

if __name__ == "__main__":
    main()

Run the full script with python zoro_scraper.py. It walks up to max_pages search pages, parses one row per product, and writes both zoro_listings.json and zoro_listings.csv. The scrape_all_pages loop stops early when a request fails or a page returns no cards, and the time.sleep(2) between requests paces the run. The shared FIELDS list fixes the CSV header and column order to match the dictionary keys. If you add a key to the records, add it to FIELDS as well, because csv.DictWriter raises a ValueError on keys it does not know about.
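
To see how the field list drives the CSV layout without writing to disk, the sketch below renders rows into an in-memory string. The rows_to_csv_text name is a hypothetical helper; the restval and lineterminator arguments are standard csv.DictWriter options, used here so missing keys become blank cells and the output is easy to compare.

```python
import csv
import io

FIELDS = ["brand", "title", "price", "zoro_number", "availability", "image_url", "link"]

def rows_to_csv_text(rows, fields=FIELDS):
    """Render rows as CSV text, with the header and column order taken from fields."""
    buffer = io.StringIO()
    # restval fills in keys a row lacks; None values are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A row that lacks a price or an availability value still produces a full-width CSV line, so downstream tools see a consistent column count.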

What the output looks like

You get a clean list of product records, in page order, ready to write to JSON, CSV, or a database.

json
[
  {
    "brand": "Apex Tool Group",
    "title": "3 Drawer Tool Box",
    "price": "$149.99 /ea",
    "zoro_number": "G6893443",
    "availability": "In Stock",
    "image_url": "https://www.zoro.com/static/cms/product/prev/KDT83151xx1200.jpg",
    "link": "https://www.zoro.com/apex-tool-group-3-drawer-tool-box-83151/i/G6893443/"
  },
  {
    "brand": "Dewalt",
    "title": "Rolling Tool Box, Plastic, Black, 28 in W",
    "price": "$43.19 /ea",
    "zoro_number": "G3778857",
    "availability": "In Stock",
    "image_url": "https://www.zoro.com/static/cms/product/prev/Z1wK0zqcpEx-.JPG",
    "link": "https://www.zoro.com/dewalt-rolling-tool-box-plastic-black-28-in-w-dwst28100/i/G3778857/"
  }
]
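
The price arrives as display text such as "$149.99 /ea". For price tracking you will likely want a number, so here is a minimal sketch that splits that text into an amount and a unit. The parse_price helper is hypothetical and assumes prices follow the dollar-amount-slash-unit shape shown in the sample output; other formats return (None, None).

```python
import re
from decimal import Decimal

# Dollar sign, an amount with optional thousands separators, optional "/unit".
PRICE_PATTERN = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)(?:\s*/\s*(\w+))?")

def parse_price(text):
    """Return (amount, unit) from text like '$149.99 /ea', or (None, None)."""
    if not text:
        return None, None
    match = PRICE_PATTERN.search(text)
    if not match:
        return None, None
    # Decimal avoids the rounding surprises of float for money values.
    amount = Decimal(match.group(1).replace(",", ""))
    return amount, match.group(2)
```

Keeping the original string alongside the parsed amount is a reasonable choice, since it lets you re-parse later if the display format changes.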

Scraping individual product pages

The search scraper gives you the listing grid. When you need deeper detail (a full description, the specification table, every image), follow the link from each record to its product page and parse that. Zoro exposes these elements on a product page:

  • Product title: in an <h1> with data-za="product-name".
  • Price: in a <div> with data-za="product-price".
  • Description: in div.product-description div.description-text.
  • Specifications: rows in a <table> inside div.product-details-info, two <td> cells per row.
  • Product images: the <img> tags with class product-image inside div.product-images.

python
import re
from bs4 import BeautifulSoup

def clean(value):
    return re.sub(r"\s+", " ", value).strip() if value else None

def scrape_product_page(product_url):
    html = crawl(product_url)
    if not html:
        return {}
    soup = BeautifulSoup(html, "html.parser")

    title = soup.select_one('h1[data-za="product-name"]')
    price = soup.select_one('div[data-za="product-price"]')
    desc = soup.select_one("div.product-description div.description-text")

    specs = {}
    for row in soup.select("div.product-details-info table tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            specs[clean(cells[0].text)] = clean(cells[1].text)

    images = [img["src"] for img in soup.select("div.product-images img.product-image") if img.get("src")]

    return {
        "title": clean(title.text) if title else None,
        "price": clean(price.text) if price else None,
        "zoro_number": zoro_number(product_url),
        "description": clean(desc.text) if desc else None,
        "specifications": specs,
        "image_urls": images,
        "url": product_url,
    }

This reuses the same crawl and zoro_number helpers from the listing scraper, so you can feed it any link from the search results. The clean helper collapses runs of whitespace into single spaces, which matters for Zoro's specification tables and multi-line descriptions. The specifications come back as a flat key-value dictionary, one entry per two-cell table row, which is the shape that loads cleanly into a DataFrame or a database column set.
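
If you want product-page records in the same CSV as everything else, the nested specifications and image list need flattening first. The sketch below is one possible approach; flatten_product, the spec_ prefix, and the " | " separator are illustrative choices, not part of the original script.

```python
import re

def flatten_product(record, prefix="spec_"):
    """Turn a product-page record into a single flat dictionary."""
    flat = {k: v for k, v in record.items() if k not in ("specifications", "image_urls")}
    # Join image URLs into one cell; the separator is an arbitrary choice.
    flat["image_urls"] = " | ".join(record.get("image_urls") or [])
    for name, value in (record.get("specifications") or {}).items():
        if not name:
            continue  # skip rows whose label cell was empty
        # "Overall Width" becomes "spec_overall_width".
        key = prefix + re.sub(r"\W+", "_", name.strip().lower()).strip("_")
        flat[key] = value
    return flat
```

Because different products carry different specification rows, the flattened dictionaries will not share one fixed set of keys; collect the union of keys before writing a CSV header.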

Staying unblocked

Even with rendering handled, Zoro watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread requests out with a delay between pages rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Zoro's servers.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Retain only what you need. Store the catalog fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
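
The pacing advice can be extended to failed requests with a simple exponential backoff. The sketch below is a generic wrapper, not something the Crawlbase client provides: fetch_with_backoff is a hypothetical helper that takes any fetch function returning HTML or None (such as the crawl function from this guide) and waits longer after each failure.

```python
import time

def fetch_with_backoff(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns a truthy value, doubling the delay each time."""
    for attempt in range(attempts):
        result = fetch(url)
        if result:
            return result
        # Wait 2s, 4s, 8s, ... between tries; no wait after the final attempt.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None
```

The sleep parameter exists so the delays can be swapped out in tests; in normal use you would call fetch_with_backoff(crawl, page_url) and treat a None result as a reason to stop the run.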

For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked. If price tracking is your goal, the guide on web scraping for price intelligence shows how to turn these snapshots into a trend feed, and the overview of ecommerce web scraping covers the wider pattern across retail sites. For a similar MRO and office-supply target, see how to scrape Office Depot.

Whether scraping Zoro is allowed depends on Zoro's terms of service, your jurisdiction, and what you do with the data. Zoro's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Zoro's terms of service and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.

A few lines worth holding to. Collect only public data: the brands, titles, prices, Zoro numbers, availability, and listing links that anyone can see on a search or product page without an account. Keep your request volume low enough that you are not straining Zoro's servers, and avoid personal data, including anything tied to identifiable buyers, reviewers, or sellers beyond what is publicly listed. Do not redistribute Zoro's copyrighted product photography or descriptions wholesale. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

This guide is deliberately scoped to public catalog pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. If Zoro offers a business data feed, a partner program, or an official API for your use case, that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public catalog data, an official channel or a data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Zoro listings are public catalog data. Each search and product page carries a brand, title, price, Zoro number, and availability, which is why they are so useful for price tracking and industrial-supply research.
  • You need rendering and a trusted IP together. Zoro fills its product grid client-side and challenges bot traffic, so the Crawling API renders the page behind a residential IP in one call with ajax_wait and page_wait.
  • BeautifulSoup does the extraction. Loop div.search-product-card cards, map brand, title, price, availability, and link to current selectors, and derive the Zoro number from the URL with a small regex.
  • Paginate and export to JSON and CSV. Append &page=N and loop until a page returns no cards; a shared field list keeps the JSON and CSV exports in sync.
  • Stay on public data. Respect Zoro's terms of service and robots.txt, pace your requests, prefer an official feed for licensed or bulk data, and never touch accounts, orders, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain request return no products from Zoro?

Two reasons. Zoro fills much of its product grid client-side as the page loads, so a raw request often gets a shell missing the listings. On top of that, Zoro challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it with ajax_wait and a page_wait of 5000 milliseconds.

What fields can I extract from a Zoro search page?

From each product card you can read the brand (span.brand-name), title (div.product-title), price (div.price), thumbnail image (img[data-za="product-image"]), and the product link. The Zoro number is the G-prefixed SKU in that link, which you pull out with a short regex. Product pages add a full description, a specification table, and the larger image set.

How do I handle pagination on Zoro?

Zoro paginates with the page query parameter. Append &page=2, &page=3, and so on to the search URL and loop, stopping when a page returns no product cards or when you hit a page cap you set. Add a short delay between requests so you are not crawling at full speed.

How do I get the Zoro number for a product?

The Zoro number is the SKU segment in the product URL, the G6893443 in /i/G6893443/. The scraper derives it with a regex against the link rather than selecting a separate element, so it is available for both search-result records and individual product pages.

Can I store the scraped data as CSV instead of JSON?

Yes. The script writes both: a JSON file that keeps each record as an object, and a CSV for spreadsheets. A shared FIELDS list drives the CSV header and column order so it stays aligned with the dictionary keys. You can also load the records straight into a database if you prefer.

How do I avoid getting blocked while scraping Zoro?

Keep your per-IP request rate low, add a delay between pages, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
