Shein is one of the largest fast-fashion retailers on the web, with a sprawling catalog of men's and women's wear, accessories, and home goods that turns over constantly. Each category and search page is a public grid of products: a title, a price, a star rating, and a link into the product's own page, all visible to anyone browsing without an account. That public listing data is what dropshippers, price trackers, and market researchers watch to spot deals, follow pricing, and size up what is trending.
This guide shows you how to scrape Shein listings with Python. You build a small, runnable scraper that fetches a Shein category or search page through the Crawling API, parses a clean record for each product, handles pagination, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public listing data: the product names, prices, ratings, and links anyone can see on a Shein listing page without logging in.
What you will build
A Python script that takes a Shein category or search URL, retrieves the rendered page through the Crawling API, and extracts a structured record per product on the grid. We use a women's clothing category page as the running example. From each product card the scraper pulls these fields:
- Title: the product name shown on the listing card.
- Price: the current selling price, when the card shows one.
- Rating: the average star rating displayed on the card.
- Link: the absolute URL to the product's own detail page.
- Image: the product image URL from the card's thumbnail.
These are the fields a listing grid actually exposes: product name, pricing, ratings, image URLs, and the link to each product. Per-product detail like SKU, full description, colors, and sizes lives on each product's own page, which you reach by following the link field and parsing that page the same way.
Why a plain request fails on Shein
If you point a bare HTTP client at a Shein category URL, you rarely get the product grid you came for. Two things work against you. First, Shein renders its listings client-side: the server ships a lightweight shell and fills the product cards in as the page's JavaScript runs and as you scroll, so the initial HTML is usually missing most of the products. A library that only reads the first HTTP response, with no browser to execute the scripts, sees an almost empty page.
Second, Shein flags automated traffic. Datacenter IP ranges and request patterns that do not look like a real shopper get met with a challenge page, a regional redirect, or an outright block before you ever reach the grid. So a working Shein scraper needs two things in one request: a browser that renders the page, and an IP that Shein reads as a real visitor. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the listing URL, it renders the page behind a trusted residential IP, handles the rotation and any challenge, and returns finished HTML for you to parse.
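To see the first problem for yourself, you can count card-like markers in whatever HTML a fetch returns. The sketch below is a rough diagnostic only: the `product-card` marker and the `min_cards` threshold are assumptions based on the markup this guide describes, not a stable Shein contract, and the helper names are made up for illustration.

```python
import re

# Assumption: rendered cards carry a double-quoted class attribute that
# contains "product-card". Nested elements such as a price span may match
# too, so treat the count as a rough signal rather than a product count.
CARD_MARKER = re.compile(r'class="[^"]*product-card[^"]*"')


def count_card_markers(html):
    """Count class attributes that mention product-card."""
    return len(CARD_MARKER.findall(html or ""))


def looks_rendered(html, min_cards=5):
    """Rough check: a rendered grid should hold at least min_cards markers."""
    return count_card_markers(html) >= min_cards
```

If `looks_rendered` returns False for the body of a plain HTTP request but True for the rendered page, you are looking at the client-side rendering gap described above.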
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs or any beginner course covers the level this tutorial assumes, and our guide on using BeautifulSoup in Python walks through the parsing library in depth.
Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.
A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token from the account docs page. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Shein needs the JavaScript token because the listing renders client-side. Treat the token like a password and keep it out of version control.
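One common way to keep the token out of version control is to read it from an environment variable instead of hardcoding it. This is a minimal sketch; the variable name `CRAWLBASE_TOKEN` and the `load_token` helper are illustrative choices, not something the Crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_TOKEN"):
    """Return the token from the environment, or raise a clear error."""
    # env_var is a hypothetical name; use whatever your deployment prefers.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

In the scripts that follow, you could then pass `load_token()` wherever the placeholder string `"YOUR_CRAWLBASE_TOKEN"` appears.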
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the product cards by CSS selector.
python --version
python -m venv shein_env
source shein_env/bin/activate
pip install crawlbase beautifulsoup4
On Windows, activate the environment with shein_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:
touch shein_listing.py
Understanding the listing page
A Shein listing lives at a category URL such as https://www.shein.com/Women-Clothing-c-2030.html, or at a search URL like https://www.shein.com/pdsearch/dress/. Both lay out a grid of product cards, one per product, each carrying the same handful of fields: a thumbnail image, a title, a price, an optional star rating, and a link into the product's detail page.
Before writing selectors, open a listing page in your browser, right-click a product card, and choose Inspect. Shein wraps each product in a section marked with a class that contains product-card, holds the title in a link or name element, shows the price in a dedicated price element, and exposes the thumbnail and the product link inside that container. Those are the elements you target. Shein's exact class names change over time, so where you can, lean on partial matches like a product-card substring rather than a brittle full class chain.
Step 1: Fetch the rendered listing page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the listing URL, and request it with rendering turned on. Checking the status code before you parse keeps failures loud instead of silent.
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


def crawl(page_url):
    # Wait for async content, then hold 5 seconds so late cards render.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None


if __name__ == "__main__":
    listing_url = "https://www.shein.com/Women-Clothing-c-2030.html"
    html = crawl(listing_url)
    # Print a short preview to confirm real product markup came back.
    print(html[:500] if html else "No HTML returned")
The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering cards appear before the page is captured. Run the script and you should see real product markup, not a near-empty shell. That confirms rendering works before you write a single selector.
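If a capture still looks like a shell, one option is to retry with a longer `page_wait`. The sketch below assumes a hypothetical `fetch(page_url, options)` callable that returns HTML or None (in this guide it would wrap `api.get`) and an `is_complete(html)` check you supply. The wait values are illustrative, and every retry spends another request.

```python
def fetch_with_longer_waits(fetch, page_url, is_complete,
                            waits=(5000, 8000, 12000)):
    """Try increasing page_wait values until is_complete(html) is true.

    fetch and is_complete are injected, so this helper makes no network
    calls of its own. Returns (html, wait_ms), or (last_html, None) when
    no attempt passed the check.
    """
    html = None
    for wait_ms in waits:
        options = {"ajax_wait": "true", "page_wait": wait_ms}
        html = fetch(page_url, options)
        if html and is_complete(html):
            return html, wait_ms
    # Nothing passed the check; hand back the last attempt for inspection.
    return html, None
```

Returning the wait that worked lets you tune the default later instead of always paying for the longest delay.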
That product grid only appears once the page's JavaScript runs, and only if Shein reads the request as a real shopper. The Crawling API takes your token, runs the listing page in a real browser, rotates through residential IPs server-side, and handles any challenge, then hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Point it at a category or search URL on the free 1,000-request tier first.
Step 2: Parse the product cards with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Shein wraps each product in a card container you select on, holds the title in a name link, shows the price in a price element, and exposes the thumbnail image and the product link inside the card. Wrap each card in a try/except so one malformed listing does not crash the run.
from bs4 import BeautifulSoup

BASE = "https://www.shein.com"


def text_of(card, selector):
    # Return the stripped text of the first match, or None if it is missing.
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def absolute(href):
    # Normalize protocol-relative and relative URLs to full URLs.
    if not href:
        return None
    if href.startswith("//"):
        return "https:" + href
    return href if href.startswith("http") else BASE + href


def parse_link(card):
    a = card.select_one("a[href]")
    return absolute(a["href"]) if a else None


def parse_image(card):
    img = card.select_one("img")
    if not img:
        return None
    # Lazy-loaded thumbnails may keep the real URL in data-src.
    src = img.get("src") or img.get("data-src")
    return absolute(src)


def scrape_listing(html):
    soup = BeautifulSoup(html, "html.parser")
    # Partial class match, so a renamed suffix does not break the selector.
    cards = soup.select("section[class*='product-card']")
    results = []
    for card in cards:
        try:
            title = text_of(card, "a.goods-title-link")
            if not title:
                continue  # Skip empty placeholder slots.
            results.append({
                "title": title,
                "price": text_of(card, "span.product-card__price"),
                "rating": text_of(card, "span.rate-num"),
                "link": parse_link(card),
                "image": parse_image(card),
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results
The text_of helper queries one element inside a card and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every product shows a rating. The absolute helper normalizes links and image sources to full URLs, because Shein often serves a protocol-relative // source or a relative href. Skipping any card without a title drops the empty placeholder slots Shein leaves in the grid.
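As an alternative to hand-rolled prefix checks, the standard library's `urljoin` resolves protocol-relative, root-relative, and bare relative references against a base URL. This is a sketch of a drop-in variant, not a claim about which form Shein serves on any given card.

```python
from urllib.parse import urljoin

BASE = "https://www.shein.com"


def absolute(href, base=BASE):
    """Resolve a possibly relative URL against base; None stays None."""
    if not href:
        return None
    # urljoin keeps full URLs as-is and borrows the scheme for //host/path.
    return urljoin(base, href.strip())
```

Unlike string concatenation, this also handles an href with no leading slash, which would otherwise be glued directly onto the hostname.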
Shein's class names (the card container, the price span, the rating element) shift as the site updates, while structural patterns like a product-card substring and an inner anchor are more durable. Treat the class-based selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect the live listing page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
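A small check can surface selector drift automatically rather than waiting for someone to notice empty columns. The sketch below computes, for each field, the share of parsed rows that have a value. The helper names and the default threshold are illustrative assumptions.

```python
FIELDS = ["title", "price", "rating", "link", "image"]


def fill_rates(rows, fields=FIELDS):
    """Return the fraction of rows with a non-empty value for each field."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field)) / len(rows)
        for field in fields
    }


def stale_fields(rows, fields=FIELDS, threshold=0.0):
    """List fields whose fill rate is at or below threshold."""
    rates = fill_rates(rows, fields)
    return [field for field in fields if rates[field] <= threshold]
```

A field such as rating can legitimately be sparse, so a fill rate of exactly zero is the clearer signal that a selector needs another look.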
Step 3: Assemble the script and export JSON and CSV
Now wire the fetch and the parse into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet. Fetch the rendered listing page, hand it to the parser, and dump the structured rows.
import csv
import json

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.shein.com"
FIELDS = ["title", "price", "rating", "link", "image"]


def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None


def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def absolute(href):
    if not href:
        return None
    if href.startswith("//"):
        return "https:" + href
    return href if href.startswith("http") else BASE + href


def parse_link(card):
    a = card.select_one("a[href]")
    return absolute(a["href"]) if a else None


def parse_image(card):
    img = card.select_one("img")
    if not img:
        return None
    src = img.get("src") or img.get("data-src")
    return absolute(src)


def scrape_listing(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("section[class*='product-card']")
    results = []
    for card in cards:
        try:
            title = text_of(card, "a.goods-title-link")
            if not title:
                continue
            results.append({
                "title": title,
                "price": text_of(card, "span.product-card__price"),
                "rating": text_of(card, "span.rate-num"),
                "link": parse_link(card),
                "image": parse_image(card),
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results


def export(rows, name="shein_listing"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} products to {name}.json and {name}.csv")


def main():
    url = "https://www.shein.com/Women-Clothing-c-2030.html"
    html = crawl(url)
    if not html:
        return
    rows = scrape_listing(html)
    export(rows)


if __name__ == "__main__":
    main()
Run the full script with python shein_listing.py. It fetches the rendered listing page, parses one row per product, and writes both shein_listing.json and shein_listing.csv. The shared FIELDS list keeps the CSV column order in step with the dictionary keys, so the two exports never drift apart.
What the output looks like
You get a clean list of product records, in grid order, ready to write to JSON, CSV, or a database.
[
  {
    "title": "Women's Ribbed Knit Long Sleeve Bodycon Dress",
    "price": "$12.49",
    "rating": "4.92",
    "link": "https://www.shein.com/Ribbed-Knit-Bodycon-Dress-p-12345678.html",
    "image": "https://img.ltwebstatic.com/images/ribbed-dress.jpg"
  },
  {
    "title": "Floral Print Tie Front Blouse",
    "price": "$9.00",
    "rating": "4.78",
    "link": "https://www.shein.com/Floral-Print-Blouse-p-87654321.html",
    "image": "https://img.ltwebstatic.com/images/floral-blouse.jpg"
  }
]
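The price arrives as display text such as "$12.49", so comparing or sorting it needs a numeric value. The sketch below pulls the first number out of a price string. It assumes US-style formatting (comma as thousands separator, dot as decimal point), which may not hold on other regional storefronts, and `parse_price` is a helper name invented for this example.

```python
import re
from decimal import Decimal

# Assumption: digits with optional comma grouping and an optional
# dot-separated fraction, as in "$1,299.00".
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number in text as a Decimal, or None if absent."""
    if not text:
        return None
    match = NUMBER.search(text)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))
```

`Decimal` avoids binary floating-point rounding, which matters once you start computing price differences between snapshots.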
Scaling across pages and categories
One page is a demo; a real research job runs across the whole category. Shein paginates its listings with a ?page= query parameter, so you walk the pages in order until a page comes back empty, then move on to the next category. Pace the requests so you are not hammering Shein in a tight loop.
import time

CATEGORIES = {
    "women-clothing": "https://www.shein.com/Women-Clothing-c-2030.html",
    "dresses": "https://www.shein.com/Dresses-c-1727.html",
}


def scrape_categories(categories, max_pages=5):
    everything = {}
    for name, base_url in categories.items():
        rows = []
        for page in range(1, max_pages + 1):
            # Append page= with the right separator for this URL.
            sep = "&" if "?" in base_url else "?"
            page_url = f"{base_url}{sep}page={page}"
            html = crawl(page_url)
            if not html:
                break  # Request failed; stop this category.
            found = scrape_listing(html)
            if not found:
                break  # Empty page; the category has run out.
            rows.extend(found)
            print(f"{name} page {page}: {len(found)} products")
            time.sleep(2)  # Pace requests between pages.
        everything[name] = rows
    return everything
The empty-results break stops you early when a category runs out of products, the max_pages cap keeps a run bounded, and the time.sleep(2) between requests paces the run so you are not flagged for rapid-fire traffic. Keying the output by category name keeps each list separate, which is what you want when comparing prices across categories. To track price changes over time, run the job on a schedule, stamp each export with the date, and diff successive snapshots. That feeds straight into a price intelligence workflow, where the listing fields you collect here become the raw signal.
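The date-stamp-and-diff idea can be sketched in a few lines. This example assumes the link field is a stable identifier for a product across runs, which may not hold if links carry tracking parameters. `snapshot_name` and `diff_prices` are illustrative helpers.

```python
from datetime import date


def snapshot_name(prefix="shein_listing", day=None):
    """Build a date-stamped export name such as shein_listing_2024-01-05."""
    day = day or date.today()
    return f"{prefix}_{day.isoformat()}"


def diff_prices(old_rows, new_rows):
    """Return products present in both snapshots whose price text changed."""
    # Assumption: the link uniquely identifies a product between runs.
    old = {row["link"]: row.get("price") for row in old_rows if row.get("link")}
    changes = []
    for row in new_rows:
        link = row.get("link")
        if not link or link not in old:
            continue
        if old[link] != row.get("price"):
            changes.append({
                "link": link,
                "title": row.get("title"),
                "old": old[link],
                "new": row.get("price"),
            })
    return changes
```

Passing `snapshot_name()` as the `name` argument of an export function gives one file pair per day, and `diff_prices` can then compare any two days after loading their JSON.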
Staying unblocked
Even with rendering handled, Shein watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Spread requests out with a delay between pages and categories rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Shein's servers.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Retain only what you need. Store the listing fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
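The pacing habit above can go beyond a fixed sleep. One widely used pattern is exponential backoff with jitter: delays grow after each failed attempt and vary slightly so the traffic has no fixed rhythm. The parameter values below are illustrative defaults, not tuned recommendations for Shein.

```python
import random


def backoff_delays(base=2.0, factor=2.0, max_delay=60.0, attempts=5,
                   jitter=0.25, rng=random.random):
    """Yield one delay in seconds per attempt, growing by factor each time.

    Each delay is varied by up to +/- jitter (a fraction of the delay).
    rng is injectable so the sequence can be made deterministic in tests.
    """
    delay = base
    for _ in range(attempts):
        spread = delay * jitter
        yield max(0.0, delay - spread + 2 * spread * rng())
        delay = min(delay * factor, max_delay)
```

In a retry loop you would call `time.sleep(delay)` for each value yielded, retry the request, and stop as soon as one attempt succeeds.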
For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. For the wider context of pulling structured product data from online stores, our overview of ecommerce web scraping covers the patterns this scraper follows.
Is it legal to scrape Shein?
Whether scraping Shein is allowed depends on Shein's Terms of Service, your jurisdiction, and what you do with the data. Shein's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Shein's Terms of Service and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
A few lines worth holding to. Collect only public data: the product titles, prices, ratings, listing links, and thumbnail images that anyone can see on a Shein listing page without an account. Keep your request volume low enough that you are not straining Shein's servers, and avoid personal data, including anything tied to identifiable shoppers or reviewers beyond what is publicly listed. Do not redistribute Shein's product photography as your own; an image URL is a reference, not a license to republish the media. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public category and search listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, or any attempt to bypass authentication or a challenge you are not entitled to pass. If Shein offers an affiliate or partner data program that fits your use case, that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public listing data, an official channel or a data agreement is the correct path, not a cleverer scraper.
Key takeaways
- Listings are public product signal. Each Shein category and search page lays out titles, prices, ratings, and links, which is why it is so useful for price tracking and product research.
- You need rendering and a trusted IP together. Shein fills the product grid client-side and blocks bot traffic, so the Crawling API renders the page behind a residential IP in one call.
- BeautifulSoup does the extraction. Loop the product-card containers and map title, price, rating, link, and image to current selectors, and expect those selectors to drift.
- Export to JSON and CSV. A shared field list keeps both files in sync, and paginating with ?page= until a page is empty scales the run across a whole category.
- Stay on public data. Respect Shein's Terms of Service and robots.txt, do not republish product imagery, and never touch accounts, orders, or personal information.
Frequently Asked Questions (FAQs)
Why does a plain request return no products from Shein?
Shein renders its listings client-side, so a raw HTTP request gets a near-empty shell with no product cards in it. On top of that, Shein challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it rather than fetching the URL directly.
How do I scrape a specific Shein category or search?
Point the scraper at the category URL, for example /Women-Clothing-c-2030.html, or at a search URL like /pdsearch/dress/. Both render the same product-card grid, so the same selectors carry over. To cover several categories, keep a map of names to URLs and loop over it, pacing the requests with a short delay between pages.
Can I get the SKU, colors, and sizes for each product?
Those fields live on each product's own detail page, not on the listing grid. Collect the link field from the listing first, then fetch each product page through the same crawl function and parse the SKU, full description, color swatches, and size options from that page. That makes it a two-stage scrape: the listing for discovery, the detail page for depth.
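A minimal sketch of the second stage might look like the following. Both `crawl` and `parse_detail` are passed in as callables, because this guide does not cover detail-page markup: `parse_detail` is a hypothetical function you would write after inspecting a product page yourself.

```python
import time


def scrape_details(rows, crawl, parse_detail, delay=2.0, limit=None):
    """Fetch each unique product link once and merge in the detail fields.

    crawl(url) returns HTML or None; parse_detail(html) returns a dict.
    limit caps how many product pages are requested in one run.
    """
    seen = set()
    detailed = []
    for row in rows:
        link = row.get("link")
        if not link or link in seen:
            continue  # Skip rows without a link and repeated products.
        if limit is not None and len(seen) >= limit:
            break
        seen.add(link)
        html = crawl(link)
        if html:
            # Detail fields are layered on top of the listing fields.
            detailed.append({**row, **parse_detail(html)})
        time.sleep(delay)  # Pace requests between product pages.
    return detailed
```

Deduplicating by link and capping with `limit` keeps the request count predictable, since a detail scrape costs one request per product rather than one per page.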
How do I handle pagination on Shein?
Shein paginates listings with a ?page= query parameter. Walk the pages in order, stop when a page returns no products, and cap the loop with a maximum page count so a run stays bounded. The scrape_categories helper above does exactly this, and pacing each request with a short sleep keeps the run from looking like rapid-fire traffic.
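If your listing URLs already carry query parameters, building the page URL with `urllib.parse` is a bit more robust than string concatenation, because it replaces an existing page value instead of appending a second one. This is a sketch, and `page_url` is a helper name chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page):
    """Return base_url with its page query parameter set to page."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a previous page value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The other query parameters pass through untouched, so a filtered or sorted listing URL keeps its settings from page to page.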
Should I export to JSON or CSV?
Both, and the script writes both from the same records. JSON keeps the nested structure and is easy to load back into Python, while CSV opens directly in a spreadsheet for quick filtering and charting. The shared FIELDS list keeps the CSV columns in the same order as the dictionary keys, so the two files never drift apart.
How do I avoid getting blocked while scraping Shein?
Keep your per-IP request rate low, add a delay between pages and categories, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and challenge handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
