Zalando is one of the largest fashion retailers in Europe, and its public product pages carry exactly the structured data that powers price monitoring, assortment tracking, and fashion-trend research: product names, brands, prices, available sizes, and colors. The problem is that Zalando renders those pages with JavaScript and defends hard against bots, so a plain HTTP request hands you an empty shell or a challenge page instead of the catalogue you came for.

This guide shows you how to build a Zalando scraper in Python the reliable way. You fetch fully rendered product pages through the Crawling API using its JavaScript token, parse the markup with BeautifulSoup, and pull out the fields that matter. We keep the whole walkthrough scoped to public product data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a public Zalando product URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record of the item. We will use a single product page as the running example and pull these fields:

  • Product name: the title of the item, for example "Leather Handbag".
  • Brand: the label behind the product, like "Zign".
  • Price: the current selling price, including the currency.
  • Available sizes: the sizes shown as in stock on the page.
  • Color: the colorway of the listed variant.

Once one page works, we loop over a list of product URLs and pace the requests so a larger job stays healthy. The fashion catalogue is a classic ecommerce web scraping target, and the pattern here transfers to most retailers.

Why a plain fetch fails on Zalando

If you request a Zalando product URL with a bare HTTP client, you get one of two unhelpful responses: a status 200 with almost none of the product data in the body, or a bot challenge. Two things work against you. First, Zalando renders its product content in the browser with JavaScript, so the initial HTML is a shell that only fills in after the page's scripts run. Second, Zalando flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged or blocked before they ever reach the rendered content.

So a working Zalando scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. If you want the background on why client-side rendering breaks naive scrapers, see how to crawl JavaScript websites.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Zalando is client-side rendered, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would, and there is nothing useful to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to BeautifulSoup in Python covers the parsing basics this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
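One simple way to keep the token out of the script is to read it from an environment variable. This is a minimal sketch. It assumes you export the token under the name CRAWLBASE_JS_TOKEN, which is a name chosen for this example and not one the crawlbase client reads on its own:

```python
import os


def load_token(var_name: str = "CRAWLBASE_JS_TOKEN") -> str:
    """Read the Crawlbase JS token from an environment variable.

    CRAWLBASE_JS_TOKEN is an assumed name for this sketch; the crawlbase
    client does not look it up automatically.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail loudly instead of sending requests with an empty token.
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Crawlbase JS token."
        )
    return token
```

After running export CRAWLBASE_JS_TOKEN=... in your shell, you can write CrawlingAPI({"token": load_token()}) anywhere the examples below hardcode "YOUR_CRAWLBASE_JS_TOKEN".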

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.

bash
python --version

python -m venv zalando_env
source zalando_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with zalando_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.

Step 1: Fetch the rendered product page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the product URL. Checking the status code before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    page_url = "https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html"
    html = crawl(page_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering elements appear before the page is captured. Five seconds is a reasonable start; raise it if the product fields come back empty. Run the script with python scraper.py and you should see real product markup, not the empty shell a plain fetch returns. That confirms rendering works before you write a single selector.
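If you would rather not tune page_wait by hand, you can retry with progressively longer waits until a field you need appears. This is a minimal sketch with a few assumptions. It expects fetch to behave like api.get(page_url, options) in the code above, which returns a dict with "status_code" and a bytes "body". The is_ready function is a check you supply yourself. The 5, 8, and 12 second ladder is an arbitrary starting point:

```python
from typing import Callable, Optional, Tuple


def crawl_with_escalating_wait(
    fetch: Callable[[str, dict], dict],
    page_url: str,
    is_ready: Callable[[str], bool],
    waits_ms: Tuple[int, ...] = (5000, 8000, 12000),
) -> Optional[str]:
    """Render a page, lengthening page_wait until is_ready(html) passes."""
    for wait in waits_ms:
        options = {"ajax_wait": "true", "page_wait": wait}
        response = fetch(page_url, options)
        if response["status_code"] != 200:
            # A failed request is not a timing problem; stop and report it.
            print(f"Request failed: {response['status_code']}")
            return None
        html = response["body"].decode("utf-8")
        if is_ready(html):
            return html
        print(f"Fields missing at page_wait={wait}; retrying with a longer wait")
    return None
```

For example, crawl_with_escalating_wait(api.get, page_url, lambda h: "pdp-size-picker" in h) returns as soon as the size picker's marker shows up in the HTML. Each retry is a separate API request, so keep the ladder short.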

Crawlbase Crawling API

Zalando needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public product page on the free tier first.

Step 2: Parse the product fields with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup and pull each field by its selector. Zalando product pages lay the core details out in a fairly predictable structure, so you can map name, brand, price, sizes, and color to individual selectors. Wrap the extraction in helpers so one missing field does not crash the run.

python
from bs4 import BeautifulSoup

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def all_text(soup, selector):
    return [el.get_text(strip=True) for el in soup.select(selector)]

def scrape_product(html):
    soup = BeautifulSoup(html, "html.parser")

    sizes = all_text(soup, "[data-testid='pdp-size-picker'] [role='option']")

    return {
        "name": text_of(soup, "span.EKabf7.R_QwOV"),
        "brand": text_of(soup, "span.OBkCPz.Z82GLX"),
        "price": text_of(soup, "span.voFjEy.Km7l2y"),
        "color": text_of(soup, "[data-testid='color-name']"),
        "sizes": [s for s in sizes if s],
    }

The text_of helper queries a single element and returns None when that element is missing. Without it, calling .get_text() on None would raise an AttributeError. The all_text helper collects every match for fields that repeat, which is what you want for the size options. Together they keep the extraction resilient when a field is absent on a given page, and that happens often because not every product lists every detail.
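You can see this behavior, and test selectors without spending API requests, by running the helpers against a small saved HTML snippet. The fixture below was written for illustration and is not Zalando's real markup:

```python
from bs4 import BeautifulSoup


def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def all_text(soup, selector):
    return [el.get_text(strip=True) for el in soup.select(selector)]


# Hand-written fixture, NOT real Zalando markup: just enough structure
# to show how the helpers behave when a field is present or missing.
FIXTURE = """
<div data-testid="pdp-size-picker">
  <div role="option">S</div>
  <div role="option">M</div>
  <div role="option"> </div>
</div>
"""

soup = BeautifulSoup(FIXTURE, "html.parser")
print(text_of(soup, "[data-testid='color-name']"))  # None: element absent, no exception
sizes = all_text(soup, "[data-testid='pdp-size-picker'] [role='option']")
print([s for s in sizes if s])  # ['S', 'M']: the blank option is filtered out
```

If you save a rendered page to disk once, you can run the same check against it every time you change a selector.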

Selectors drift

Zalando's class names (the hashed tokens like EKabf7 and voFjEy) are generated and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back as None or an empty list, re-inspect the live page in your browser's dev tools and update the selector. The stable data-testid attributes tend to survive longer than the hashed class names, so prefer them where they exist. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
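One way to act on that advice is to give each field an ordered list of selectors, with stable attributes first and hashed classes as a fallback. This is a sketch only. The "[data-testid='pdp-price']" selector below is a hypothetical attribute name, so confirm or drop it after checking the live page in dev tools:

```python
from typing import Iterable, Optional

from bs4 import BeautifulSoup


def first_text(soup: BeautifulSoup, selectors: Iterable[str]) -> Optional[str]:
    """Return the text of the first selector that matches with non-empty text.

    Put stable data-testid selectors first and hashed class names last, so a
    class-name change falls through to another option before returning None.
    """
    for selector in selectors:
        el = soup.select_one(selector)
        if el is not None:
            text = el.get_text(strip=True)
            if text:
                return text
    return None


# Hypothetical selector chain for the price field; the data-testid is assumed.
PRICE_SELECTORS = ["[data-testid='pdp-price']", "span.voFjEy.Km7l2y"]

html = '<p><span class="voFjEy Km7l2y">49,99 €</span></p>'
soup = BeautifulSoup(html, "html.parser")
print(first_text(soup, PRICE_SELECTORS))  # falls back to the class selector
```

Swapping text_of(soup, "span.voFjEy.Km7l2y") for first_text(soup, PRICE_SELECTORS) in scrape_product lets you add a new selector when the markup changes, without deleting the old one until you are sure it is dead.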

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured record.

python
import json
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def all_text(soup, selector):
    return [el.get_text(strip=True) for el in soup.select(selector)]

def scrape_product(html):
    soup = BeautifulSoup(html, "html.parser")
    sizes = all_text(soup, "[data-testid='pdp-size-picker'] [role='option']")
    return {
        "name": text_of(soup, "span.EKabf7.R_QwOV"),
        "brand": text_of(soup, "span.OBkCPz.Z82GLX"),
        "price": text_of(soup, "span.voFjEy.Km7l2y"),
        "color": text_of(soup, "[data-testid='color-name']"),
        "sizes": [s for s in sizes if s],
    }

def main():
    page_url = "https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html"
    html = crawl(page_url)
    if not html:
        return
    data = scrape_product(html)
    print(json.dumps(data, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    main()

Setting ensure_ascii=False on the JSON dump keeps the Euro sign and any accented characters readable instead of escaping them. Run the full script with python scraper.py and you get a clean structured record for the product.

What the output looks like

The script prints a single record ready to write to JSON, CSV, or a database.

json
{
  "name": "LEATHER - Handbag",
  "brand": "Zign",
  "price": "49,99 €",
  "color": "black",
  "sizes": ["One Size"]
}
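The price in that record is display text, which makes it awkward to sort or compare. This minimal normalization sketch assumes the German-style format in the sample: a period groups thousands, a comma marks decimals, and the currency symbol follows the number. Other Zalando country sites may format prices differently and need a different rule:

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# A leading digit, optional "." thousands groups, an optional ",dd" decimal
# part, then whatever non-space token follows (the currency symbol).
PRICE_RE = re.compile(r"(\d[\d.]*(?:,\d+)?)\s*(\S*)")


def parse_eu_price(raw: Optional[str]) -> Optional[Tuple[Decimal, str]]:
    """Split a string like "49,99 €" into (Decimal("49.99"), "€")."""
    if not raw:
        return None
    match = PRICE_RE.search(raw)
    if not match:
        return None
    number, currency = match.groups()
    # Drop thousands separators, then turn the decimal comma into a point.
    amount = Decimal(number.replace(".", "").replace(",", "."))
    return amount, currency
```

Decimal avoids floating-point rounding surprises when you later compare prices or compute discounts.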

For an apparel item the sizes list would carry the real size run, for example ["XS", "S", "M", "L"], with sold-out sizes dropped if you filter on the size picker's availability state.
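Here is a sketch of that availability filter. It assumes sold-out options carry aria-disabled="true". That is a common accessibility pattern, but this guide has not confirmed it on Zalando's markup, so inspect the live size picker and adjust the check to match:

```python
from bs4 import BeautifulSoup


def available_sizes(soup, selector="[data-testid='pdp-size-picker'] [role='option']"):
    """Return size labels, skipping options marked as unavailable.

    ASSUMPTION: unavailable options have aria-disabled="true"; verify this
    against the real page before trusting the result.
    """
    sizes = []
    for option in soup.select(selector):
        if option.get("aria-disabled") == "true":
            continue  # Treated as sold out under the assumption above.
        label = option.get_text(strip=True)
        if label:
            sizes.append(label)
    return sizes


html = """
<div data-testid="pdp-size-picker">
  <div role="option">XS</div>
  <div role="option" aria-disabled="true">S</div>
  <div role="option">M</div>
</div>
"""
print(available_sizes(BeautifulSoup(html, "html.parser")))  # ['XS', 'M']
```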

Scaling to many products

One page is a demo; a real job runs over a list of products. The shape stays the same: keep a list of product URLs, fetch each through the Crawling API, parse it with the same function, and collect the rows. Because every product page shares the same structure, the parser you already wrote works across all of them without changes. The key addition is pacing: a short pause between requests keeps the run from looking like a burst of bot traffic.

python
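import json
import time

# crawl() and scrape_product() come from the Step 3 script above.
products = [
    "https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html",
    "https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html",
]

results = []
for url in products:
    html = crawl(url)
    if html:
        # Keep the source URL on each row so it can be traced back to its page.
        results.append({"url": url, **scrape_product(html)})
    time.sleep(2)  # Pause between requests so the run does not arrive as a burst.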

with open("zalando_products.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)
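If the destination is a spreadsheet rather than JSON, the records need one small change: sizes is a list, and a CSV cell holds a single value. This sketch flattens that field. It assumes rows shaped like the results list built by the loop above, and the "|" separator is an arbitrary choice:

```python
import csv


def write_csv(rows, path="zalando_products.csv"):
    """Write scraped records to CSV, joining the sizes list into one cell."""
    fieldnames = ["url", "name", "brand", "price", "color", "sizes"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" skips any keys outside fieldnames
        # instead of raising ValueError.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            flat = dict(row)
            flat["sizes"] = "|".join(row.get("sizes") or [])
            writer.writerow(flat)
```

Calling write_csv(results) after the loop writes zalando_products.csv next to the JSON file. Missing fields become empty cells.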

To find product URLs at scale, scrape Zalando's public catalogue pages (for example https://en.zalando.de/catalogue/?q=handbags) with the same fetch-then-parse pattern, collect the product links, and then visit each one. Just keep the volume reasonable and respect the rate limits covered below. If price is your end goal, the collected records feed straight into a price intelligence pipeline.
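The link-collection step can be sketched as a small parser over a rendered catalogue page. It assumes product pages are on the same host and have paths ending in ".html", like the example URLs in this guide. That is an inference from those URLs, not a documented rule, so check it against the live catalogue:

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def product_links(catalogue_html, base_url="https://en.zalando.de/"):
    """Collect unique product-page URLs, in page order, from catalogue HTML."""
    soup = BeautifulSoup(catalogue_html, "html.parser")
    host = urlparse(base_url).netloc
    seen = set()
    links = []
    for a in soup.select("a[href]"):
        # Resolve relative links and strip query strings and fragments.
        url = urljoin(base_url, a["href"]).split("#")[0].split("?")[0]
        parsed = urlparse(url)
        # ASSUMPTION: product pages share the host and end in ".html".
        if parsed.netloc == host and parsed.path.endswith(".html") and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Feed it the output of crawl("https://en.zalando.de/catalogue/?q=handbags"), after checking that the result is not None, and pass the returned list in as products.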

Staying unblocked

Even with rendering handled, Zalando watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. The time.sleep above spreads requests out; vary your targets instead of crawling one path at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
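Backing off on failures and varying your pacing can be combined in a small retry wrapper. This is a minimal sketch of exponential backoff with random jitter. It assumes crawl() from Step 3 as the fetch function, since crawl() returns None on any non-200 response. The attempt count and base delay are illustrative values, not tuned ones:

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until it returns HTML, waiting longer after each failure.

    Delays are base_delay * 2**attempt (2s, 4s, 8s, ...) plus up to 1s of
    random jitter, so retries do not fire on a fixed rhythm.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

In the scaling loop, html = fetch_with_backoff(crawl, url) replaces html = crawl(url). The sleep parameter exists so the delays can be checked without actually waiting.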

For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint. The same approach extends to neighboring catalogues like AliExpress and Walmart.

Is scraping Zalando legal?

Whether scraping Zalando is allowed depends on Zalando's terms of service, your jurisdiction, and what you do with the data. Zalando's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Zalando's Terms of Service and its robots.txt, and treat both as the boundary for what you collect.
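Python's standard library can check a URL against robots.txt rules before you request it. This sketch parses the rules from text so it runs offline. The sample rules are invented for illustration and are not Zalando's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented rules for illustration only, NOT Zalando's real robots.txt.
SAMPLE = """
User-agent: *
Disallow: /myaccount/
"""

print(allowed_by_robots(SAMPLE, "https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html"))  # True
print(allowed_by_robots(SAMPLE, "https://en.zalando.de/myaccount/orders"))  # False
```

To check against the live file, call RobotFileParser's set_url("https://en.zalando.de/robots.txt") and read() once at startup, then run can_fetch on each URL before adding it to your list. Passing robots.txt does not override the Terms of Service.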

A few lines worth holding to. Collect only public product data: product name, brand, price, available sizes, and color that anyone can see without an account. Respect Zalando's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable individuals such as reviewer profiles or account information. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

This guide is deliberately scoped to public product pages because that is the line that keeps the work defensible. It does not cover login-walled pages, account or order data, personal data, or any attempt to bypass authentication. For large-scale or commercial use, prefer an official API or a data agreement with Zalando over a cleverer scraper. Collecting public product data only is the rule that keeps you on the right side of both the terms and the law.

Recap

Key takeaways

  • Zalando is client-side rendered. A plain fetch returns an empty shell or a challenge, so you must render the page before you parse it.
  • You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
  • BeautifulSoup does the extraction. Map product name, brand, price, sizes, and color to current selectors, prefer stable data-testid attributes, and expect the hashed class names to drift.
  • Scale by looping URLs with pacing. The same parser works across every product, so a real job is just a list of links plus a short pause between requests.
  • Stay on public data. Respect Zalando's ToS and robots.txt, prefer an official API for bulk or commercial use, and never touch accounts, login-walled pages, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain fetch return no data from Zalando?

Because Zalando renders its product content client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the product fields blank, or a bot challenge instead of the page. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Zalando?

The JS token. The normal token fetches static HTML, which on Zalando is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the product name, brand, price, sizes, and color are present when BeautifulSoup parses them.

My selectors return None or empty lists. What changed?

Almost certainly Zalando's markup. Its hashed class names like EKabf7 and voFjEy are generated and change without notice, so selectors that worked last month can break. Re-inspect a live product page in your browser's dev tools and update the selectors. Prefer the stable data-testid attributes where they exist, since they survive redesigns longer than the hashed classes.

Can I use Python instead of Puppeteer or Node.js to scrape Zalando?

Yes. Because the Crawling API renders the page server-side and returns finished HTML, you do not need to run a headless browser locally at all. A small Python script with the crawlbase client and BeautifulSoup is enough, which is the whole approach in this guide. That keeps the scraper light and easy to schedule.

How do I scrape many Zalando products without getting blocked?

Keep your per-IP request rate low, add a short pause between requests, and vary your targets instead of looping one path. Route through rotating residential IPs so no single address trips a rate limit; the Crawling API manages rotation and a trusted IP pool for you. Watch the status codes and back off when you start seeing challenges.

Is it legal to scrape Zalando?

It depends on Zalando's terms of service, your jurisdiction, and your use of the data. This guide stays on public product data (name, brand, price, sizes, color) and asks you to respect robots.txt and the ToS. It does not cover account data, personal data, login-walled pages, or bypassing authentication. For bulk or commercial use, an official API or a data agreement is the right path, not a scraper.
