AliExpress is one of the largest consumer marketplaces on the web, and every product page carries the kind of structured data that drives price tracking, market research, and competitor analysis: a title, a current price, a rating, how many units have sold, the store behind the listing, and the shipping terms. The problem is that AliExpress renders those pages in the browser and pushes back hard on automated traffic, so a plain HTTP request hands you a near-empty shell instead of the fields you came for.
This guide shows you how to scrape AliExpress with Python the reliable way. You will build a small, runnable scraper that fetches a fully rendered product page through the Crawlbase Crawling API, parses the fields you want with BeautifulSoup, and prints a clean structured record. The whole walkthrough stays scoped to public product data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public AliExpress product URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record of the listing. We use a single product page as the running example and pull these fields:
- Title: the full product name as shown on the listing.
- Price: the current price, for example "US $3.45".
- Rating: the average customer rating, like "4.8".
- Orders sold: how many units the listing reports as sold.
- Store name: the seller's store behind the product.
- Shipping: the shipping cost or terms shown on the page.
Why a plain fetch fails on AliExpress
If you request an AliExpress item URL with a bare HTTP client, you usually get a response with status 200 and almost none of the product detail in the body. Two forces work against you. First, AliExpress builds its product content in the browser with JavaScript, so the initial HTML is a skeleton that only fills in after the page's scripts run. Second, AliExpress flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged, geo-redirected, or blocked before they ever reach the rendered content.
So a working AliExpress scraper needs two things in a single request: a browser that actually renders the page, and an IP the platform reads as a genuine shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into one call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. If you want the wider e-commerce context first, see our overview of e-commerce web scraping.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. AliExpress is client-side rendered, so you need the JS token here. Using the normal token returns much the same skeleton a plain fetch would, and there is little to parse out of it.
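A quick way to tell a skeleton from a rendered page is to look for a marker that only exists once the product content has loaded. The sketch below is illustrative only: the helper name is hypothetical, and the marker strings are assumptions borrowed from the selectors used later in this guide, so they may not match the live page.

```python
# Assumed markers, taken from the selectors used later in this guide.
# They are illustrative and may not match AliExpress's current markup.
RENDERED_MARKERS = ("product-title", "product-price-current")


def looks_rendered(html, markers=RENDERED_MARKERS):
    """Return True if the HTML contains at least one rendered-content marker."""
    if not html:
        return False
    return any(marker in html for marker in markers)


# Two tiny stand-in documents: an empty app shell and a rendered page.
skeleton = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
rendered = '<html><body><h1 data-pl="product-title">Earbuds</h1></body></html>'

print(looks_rendered(skeleton))  # False
print(looks_rendered(rendered))  # True
```

A check like this is cheap to run on every response, and it fails fast when a request came back without product content.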
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a script and installing packages with pip. If you are new to parsing HTML, our primer on how to use BeautifulSoup in Python covers the selector basics this tutorial leans on.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
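One common way to keep the token out of version control is to read it from an environment variable at startup. This is a minimal sketch under that assumption: the variable name CRAWLBASE_JS_TOKEN and the load_token helper are made up for this example, and the crawlbase library does not read the variable on its own.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Read the token from an environment variable instead of the source file.

    The variable name is an assumption for this example, not a name the
    crawlbase library looks up by itself.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

You would then pass load_token() wherever the examples in this guide use the "YOUR_CRAWLBASE_JS_TOKEN" placeholder.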
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.
    python --version
    python -m venv aliexpress_env
    source aliexpress_env/bin/activate
    pip install crawlbase beautifulsoup4
On Windows, activate the environment with aliexpress_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.
Step 1: Fetch the rendered product page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the item URL. Checking the status code before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    # Replace the placeholder with your own JS token.
    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


    def crawl(page_url):
        """Fetch the rendered HTML for a product page, or None on failure."""
        options = {"ajax_wait": "true", "page_wait": 8000, "country": "US"}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            # The body arrives as bytes, so decode it before parsing.
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None


    if __name__ == "__main__":
        page_url = "https://www.aliexpress.com/item/1005006597796136.html"
        html = crawl(page_url)
        # Print only the first 500 characters as a quick sanity check.
        print(html[:500] if html else "No HTML returned")
The options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering price and shipping blocks appear before the page is captured. Eight seconds is a reasonable start for AliExpress; raise it if fields come back empty. The country option pins the request to a region so prices and shipping render in a stable locale instead of shifting per IP. Save the script as scraper.py, run it with python scraper.py, and you should see real product markup, not the skeleton a plain fetch returns. That confirms rendering works before you write a single selector.
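The advice to raise page_wait when fields come back empty can be automated. The sketch below is one possible shape, not part of the Crawling API: it assumes a fetch callable that accepts a page_wait argument (the crawl function in this guide hard-codes it, so you would need a small variant), and the wait steps and required fields are arbitrary choices for illustration.

```python
def fetch_until_complete(fetch, parse, url, waits=(8000, 12000, 16000),
                         required=("title", "price")):
    """Retry with a longer page_wait until the required fields are present.

    fetch(url, page_wait) returns HTML or None; parse(html) returns a dict.
    Both are passed in, so stand-ins can replace the real network call.
    """
    record = {}
    for page_wait in waits:
        html = fetch(url, page_wait)
        if not html:
            continue
        record = parse(html)
        if all(record.get(field) for field in required):
            return record, page_wait
    # Nothing passed the check: return the last attempt for inspection.
    return record, None


def fake_fetch(url, page_wait):
    # Stand-in for the real request: pretends the price needs 12 seconds.
    return "full" if page_wait >= 12000 else "partial"


def fake_parse(html):
    # Stand-in parser: the price is only present in the "full" page.
    return {"title": "Earbuds", "price": "US $12.74" if html == "full" else None}


print(fetch_until_complete(fake_fetch, fake_parse, "https://example.com/item"))
# ({'title': 'Earbuds', 'price': 'US $12.74'}, 12000)
```

Returning the wait that finally worked makes it easy to see whether your default is too low for the listings you care about.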
AliExpress needs a rendered page behind a trusted, region-pinned IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public product page on the free tier first.
Step 2: Parse the product fields with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup and pull each field by its selector. AliExpress lays the core product details out in a predictable structure, so you can map title, price, rating, orders sold, store name, and shipping to individual selectors. Wrap the extraction in a helper so one missing field does not crash the run.
    from bs4 import BeautifulSoup


    def text_of(soup, selector):
        """Return the stripped text of the first match, or None if absent."""
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None


    def scrape_product(html):
        """Map each product field to its CSS selector and extract the text."""
        soup = BeautifulSoup(html, "html.parser")
        return {
            "title": text_of(soup, "h1[data-pl='product-title']"),
            "price": text_of(soup, ".product-price-current"),
            "rating": text_of(soup, "[data-pl='product-reviewer'] strong"),
            "orders_sold": text_of(soup, "[data-pl='product-reviewer'] span"),
            "store_name": text_of(soup, "a[data-pl='store-name']"),
            "shipping": text_of(soup, ".dynamic-shipping-line strong"),
        }
The text_of helper does two useful things at once: it queries a single element and returns None when the element is missing, instead of throwing on a .get_text() call against nothing. That keeps the extraction resilient when one field is absent on a given listing, which is common since not every product reports an order count or a shipping line. Orders sold and rating often share the same reviewer block, so they are read from sibling elements inside it.
AliExpress class names and data-pl markers change without notice, and they differ across product templates and locales. Treat the selectors above as a starting template, not a contract. When a field comes back as None, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
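Selector drift is easier to catch when the run reports which fields came back empty. The helpers below are a small sketch of that idea; the function names are hypothetical and the sample records are invented for illustration.

```python
from collections import Counter


def missing_fields(record):
    """Return the names of fields that came back as None or empty."""
    return [name for name, value in record.items() if not value]


def drift_report(records):
    """Count how often each field is missing across a batch of records."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record))
    return dict(counts)


# Invented sample record with two empty fields.
record = {"title": "Wireless Earbuds", "price": None, "rating": "4.7", "shipping": ""}
print(missing_fields(record))  # ['price', 'shipping']
```

A field that is missing on one listing is often just absent from that page; a field that is missing on every listing in a batch usually points at a selector that needs updating.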
Step 3: Put it together
Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured record.
    import json

    from bs4 import BeautifulSoup
    from crawlbase import CrawlingAPI

    # Replace the placeholder with your own JS token.
    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


    def crawl(page_url):
        """Fetch the rendered HTML for a product page, or None on failure."""
        options = {"ajax_wait": "true", "page_wait": 8000, "country": "US"}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None


    def text_of(soup, selector):
        """Return the stripped text of the first match, or None if absent."""
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None


    def scrape_product(html):
        """Map each product field to its CSS selector and extract the text."""
        soup = BeautifulSoup(html, "html.parser")
        return {
            "title": text_of(soup, "h1[data-pl='product-title']"),
            "price": text_of(soup, ".product-price-current"),
            "rating": text_of(soup, "[data-pl='product-reviewer'] strong"),
            "orders_sold": text_of(soup, "[data-pl='product-reviewer'] span"),
            "store_name": text_of(soup, "a[data-pl='store-name']"),
            "shipping": text_of(soup, ".dynamic-shipping-line strong"),
        }


    def main():
        page_url = "https://www.aliexpress.com/item/1005006597796136.html"
        html = crawl(page_url)
        if not html:
            return
        data = scrape_product(html)
        print(json.dumps(data, indent=2))


    if __name__ == "__main__":
        main()
What the output looks like
Run the full script with python scraper.py and you get a clean structured record for the product, ready to write to JSON, CSV, or a database.
    {
      "title": "Wireless Bluetooth Earbuds Noise Cancelling Touch Control",
      "price": "US $12.74",
      "rating": "4.7",
      "orders_sold": "3,210 sold",
      "store_name": "TechGear Official Store",
      "shipping": "Free shipping"
    }
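Every value in that record is still a display string, which is awkward for price tracking or sorting. The sketch below shows one way to derive numeric fields from strings shaped like the sample output. The helper names and the added keys are hypothetical, and the parsing assumes a US-style format (comma as thousands separator, dot as decimal point), which will not hold in every locale.

```python
import re


def parse_price(text):
    """Split a string like 'US $12.74' into (label, amount)."""
    if not text:
        return None, None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None, None
    # Assumes commas are thousands separators and the dot is the decimal point.
    amount = float(match.group(0).replace(",", ""))
    label = text[:match.start()].strip() or None
    return label, amount


def parse_count(text):
    """Pull the first integer out of a string like '3,210 sold'."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group(0).replace(",", "")) if match else None


def normalize(record):
    """Return a copy of the record with numeric fields added alongside the text."""
    label, amount = parse_price(record.get("price"))
    return {
        **record,
        "price_label": label,
        "price_amount": amount,
        "rating_value": parse_price(record.get("rating"))[1],
        "orders_count": parse_count(record.get("orders_sold")),
    }


sample = {"price": "US $12.74", "rating": "4.7", "orders_sold": "3,210 sold"}
clean = normalize(sample)
print(clean["price_amount"], clean["rating_value"], clean["orders_count"])
# 12.74 4.7 3210
```

Keeping the original strings next to the parsed numbers means a parsing mistake never destroys the raw value.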
Scaling to many products
One product is a demo; a real job runs over a list of items. The shape stays the same: keep a list of product URLs, fetch each through the Crawling API, parse it with the same function, and collect the rows. Because every product page shares the same structure, the parser you already wrote works across all of them without changes. The one habit to add is pacing, so you do not march through the list in a tight loop.
    import json
    import random
    import time

    # Reuses crawl() and scrape_product() from the full script above.
    products = [
        "https://www.aliexpress.com/item/1005006597796136.html",
        "https://www.aliexpress.com/item/1005005899876543.html",
    ]

    results = []
    for url in products:
        html = crawl(url)
        if html:
            results.append(scrape_product(html))
        # Random pause so requests do not arrive in a fixed rhythm.
        time.sleep(random.uniform(2, 5))

    with open("products.json", "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
The randomized sleep between requests spreads your traffic out instead of firing in a predictable rhythm. To gather product URLs at scale, scrape AliExpress search and category pages with the same fetch-then-parse pattern, collect the item links, then visit each one. Keep the volume reasonable and respect the rate limits covered below.
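Collecting item links from a rendered search or category page can be as simple as a pattern match. The sketch below assumes product links follow the /item/<id>.html shape used by the example URLs in this guide. The regular expression and helper name are illustrative, and the sample HTML is invented, so check a live page before relying on it.

```python
import re

# Assumption: product links look like the example URLs in this guide.
ITEM_LINK = re.compile(r"aliexpress\.com/item/(\d+)\.html")


def item_urls(html):
    """Return unique product URLs found in the HTML, in first-seen order."""
    seen = {}
    for item_id in ITEM_LINK.findall(html or ""):
        seen.setdefault(item_id, f"https://www.aliexpress.com/item/{item_id}.html")
    return list(seen.values())


# Invented listing markup with one duplicate link.
sample_html = """
<a href="//www.aliexpress.com/item/1005006597796136.html?spm=a2g0o">A</a>
<a href="https://www.aliexpress.com/item/1005005899876543.html">B</a>
<a href="//www.aliexpress.com/item/1005006597796136.html">A again</a>
"""

for link in item_urls(sample_html):
    print(link)
```

Rebuilding each URL from the numeric ID drops tracking parameters and removes duplicates, so the same product is not fetched twice.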
Staying unblocked
Even with rendering handled, AliExpress watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering item pages in a tight loop is the fastest way to get throttled. Spread requests out, add jitter, and vary your targets instead of crawling one path at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Pin the region. AliExpress changes prices, currency, and shipping by location. Setting a consistent country keeps your records comparable and avoids the redirect loops that hit mismatched IPs.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
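Backing off on bad status codes is often implemented as exponential backoff with jitter. The sketch below is a generic version of that technique, not a Crawling API feature. It assumes a fetch callable that returns a (status_code, body) pair, and the base delay, cap, and attempt count are arbitrary starting values.

```python
import random
import time


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Seconds to wait before the next retry: exponential growth with full jitter."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def fetch_with_backoff(fetch, url, max_attempts=4, sleep=time.sleep):
    """Call fetch(url) until it returns status 200 or attempts run out.

    fetch is any callable returning (status_code, body). The sleep function
    is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return None
```

With the defaults, the wait ceiling doubles from 2 to 4 to 8 seconds across retries, and the random draw below each ceiling keeps retries from lining up in lockstep.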
For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. There is also a focused walkthrough on AliExpress proxy scraping if you want more on the proxy side. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
Is it legal to scrape AliExpress?
Whether scraping AliExpress is allowed depends on AliExpress's terms of service, your jurisdiction, and what you do with the data. AliExpress's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the AliExpress Terms of Service and its robots.txt, and treat both as the boundary for what you collect.
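Checking a URL against robots.txt rules can be scripted with Python's standard library. The sketch below parses rules from a string so it runs offline. The sample rules and the shop.example.com URLs are invented for illustration and are not AliExpress's actual robots.txt, which you should fetch and read yourself.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration only; not a copy of any real robots.txt.
SAMPLE_RULES = """\
User-agent: *
Disallow: /account/
Allow: /item/
"""


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed_by_robots(SAMPLE_RULES, "https://shop.example.com/item/123.html"))    # True
print(allowed_by_robots(SAMPLE_RULES, "https://shop.example.com/account/orders"))   # False
```

A passing check here only covers robots.txt; it says nothing about the terms of service, which still need to be read separately.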
A few lines are worth holding to. Collect only public product data: the title, current price, rating, orders sold, store name, and shipping terms that anyone can see without an account. Respect AliExpress's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid anything tied to identifiable people, including buyer or seller personal data beyond the public store name on a listing. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public product pages because that is the line that keeps the work defensible. It does not cover anything behind a login, buyer or seller personal data, private order or account data, or any attempt to bypass authentication. For licensed or bulk access, AliExpress and the wider Alibaba platform offer official APIs and data agreements, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public product data, an official API or a data agreement is the correct path, not a cleverer scraper.
Key takeaways
- AliExpress is client-side rendered. A plain fetch returns a near-empty skeleton, so you must render the page before you parse it.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait, page_wait, and country control how it waits and where it renders.
- BeautifulSoup does the extraction. Map title, price, rating, orders sold, store name, and shipping to current selectors, and expect those selectors to drift.
- Scale by looping URLs with pacing. The same parser works across every product, so a real job is just a list of item links plus jittered delays and rotation.
- Stay on public data. Respect AliExpress's ToS and robots.txt, prefer an official API for licensed or bulk data, and never touch logins, order data, or personal information.
Frequently Asked Questions (FAQs)
Why does a plain fetch return no product data from AliExpress?
Because AliExpress builds its product content in the browser with JavaScript. The initial HTML is a skeleton that only fills in after the page's scripts run, so a raw HTTP request returns status 200 with the price, rating, and shipping blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for AliExpress?
The JS token. The normal token fetches static HTML, which on AliExpress is much the same skeleton a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the product fields are present when BeautifulSoup parses them.
My selectors return None. What changed?
Almost certainly AliExpress's markup. Its class names and data-pl markers change without notice and differ across product templates and locales, so selectors that worked last month can break. Re-inspect a live product page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
How do I keep prices and shipping consistent across runs?
Pin the region with the country option on every request. AliExpress varies price, currency, and shipping by location, so an unpinned run that rotates across countries produces records you cannot compare. Setting a consistent country also avoids the geo-redirect loops that hit mismatched IPs.
How do I avoid getting blocked while scraping AliExpress?
Keep your per-IP request rate low, add jitter between requests, vary your targets instead of looping one path, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
Can I scrape AliExpress prices to price my own products?
Using public price data for market analysis is a common use case, but it is still subject to AliExpress's terms of service and your local regulations, so confirm both before you build on it. Stick to public product data, keep your volume modest, and for licensed or bulk access use an official API or a data agreement rather than scraping at scale.
