AliExpress is one of the largest marketplaces on the web, and the product, price, and search data it surfaces is genuinely useful if you are doing market research, tracking competitor pricing, or building a sourcing tool. The catch is that AliExpress defends hard against automated traffic: hit it from a single datacenter IP at any real volume and you collect rate limits, CAPTCHAs, and IP bans instead of data. This guide shows you how to do AliExpress proxy scraping in Python by routing requests through the Crawlbase Smart AI Proxy so collection stays anonymous and unblocked.

The whole walkthrough is scoped to public data: product titles, prices, images, shipping messages, and sold counts that anyone can see on a search results page without logging in. It does not touch user accounts, login-walled content, checkout actions, or personal data. The legality section near the end is not boilerplate, so read it before you point this at production volume.

Why you need a proxy to scrape AliExpress

AliExpress, like most large e-commerce platforms, treats automated traffic as a threat. Three defenses get in the way of a naive scraper, and a rotating proxy is what neutralizes all three.

IP blocks. Send too many requests from one address and AliExpress throttles or bans it outright. A single static IP simply cannot sustain meaningful scraping volume.

CAPTCHAs. When traffic looks scripted, AliExpress challenges it with a CAPTCHA designed to separate humans from bots. A flagged IP keeps getting challenged until you rotate away from it.

Bot detection. Beyond raw request counts, the platform fingerprints request patterns to spot automation. Predictable, high-frequency traffic from the same origin is the easiest pattern to flag.

A pool of rotating residential proxies answers all three: each request can leave from a different real-user IP, so no single address trips a rate limit, gets stuck behind a CAPTCHA, or stands out as a bot. You can assemble that pool yourself, but sourcing healthy residential IPs and keeping the rotation logic working is most of the job. The Crawlbase Smart AI Proxy folds it into a single endpoint you point your existing HTTP client at.

How the Crawlbase Smart AI Proxy fits in

The Smart AI Proxy is a single proxy endpoint that sits in front of Crawlbase's full crawling infrastructure. Instead of rewriting your code around a new API, you set one proxy URL and your normal requests calls flow through it. Behind that endpoint it does the work a hand-rolled proxy stack would:

  • Automatic IP rotation. The origin IP changes across requests, drawing from a large pool of residential and datacenter addresses, so you bypass rate limits and bans without managing the rotation yourself. The mechanics are the same as any rotating residential proxy setup, just handled server-side.
  • Traffic routing and load balancing. Requests are spread across many proxy servers so no single one is overloaded, which keeps throughput steady during a long run.
  • It forwards to the Crawling API. The Smart AI Proxy passes your requests to the Crawling API, so you inherit most of that API's capabilities, including server-side rendering and the parameter controls covered below, without calling the API directly.

The practical payoff for AliExpress is that you stop fighting the anti-bot layer. Your scraper sends a plain request to a single proxy URL; the proxy handles the IP, the rotation, and the routing so the page comes back instead of a block.

Smart AI Proxy vs Crawling API

Both reach the same crawling backend. The difference is integration shape. The Smart AI Proxy is a proxy endpoint you drop into any tool that speaks HTTP proxies, so existing code works unchanged. The Crawling API is an HTTP API you call directly when you are writing fresh code. This guide uses the Smart AI Proxy because the AliExpress scraper is a standard requests script.

Test the proxy with a single curl command

Before writing any Python, confirm the proxy works with one command. First create a Crawlbase account and open your Smart AI Proxy dashboard to copy your proxy authentication token. Then run the following, replacing USER_TOKEN with that token.

bash
curl -x "http://USER_TOKEN:@smartproxy.crawlbase.com:8012" -k "https://www.aliexpress.com/w/wholesale-macbook-pro.html"

A few things are happening here. The -x flag routes the request through the Smart AI Proxy, which listens on smartproxy.crawlbase.com at port 8012. Your token is the username in the proxy URL and the password is left empty. The -k flag tells curl to skip SSL certificate verification, which is required when tunneling HTTPS through the proxy; without it the handshake between curl and the proxy can fail. On success you get back the raw HTML of the AliExpress search page.

Add parameters to fine-tune the request

Because the Smart AI Proxy forwards to the Crawling API, you can send Crawling API parameters through a special header named CrawlbaseAPI-Parameters. This is how you tell the proxy exactly how to handle a request rather than accepting the defaults.

The most useful one for this job is scraper=aliexpress-serp. Instead of returning raw HTML, it runs Crawlbase's built-in AliExpress search-results parser and hands you clean, structured JSON, so you skip writing and maintaining selectors entirely.

bash
curl -H "CrawlbaseAPI-Parameters: scraper=aliexpress-serp" \
  -x "http://USER_TOKEN:@smartproxy.crawlbase.com:8012" -k \
  "https://www.aliexpress.com/w/wholesale-macbook-pro.html"

That single header turns a messy HTML dump into organized product data. We will use the same parameter from Python in a moment.

Set up the Python project

You need Python 3 installed. If this is your first time, install it from python.org and confirm the version. Then create a project folder and add a script file.

bash
python --version

mkdir aliexpress-scraper && cd aliexpress-scraper
touch aliexpress.py

You only need one third-party package. The requests library handles HTTP and supports proxies natively, and json ships with the Python standard library, so there is nothing to install for it.

bash
pip install requests

Fetch a page through the Smart AI Proxy

Start by confirming the proxy works from Python before parsing anything. Open aliexpress.py and add the code below, swapping in your token. The pattern mirrors the curl command: your token and an empty password become the proxy credentials, and both the http and https schemes route through the same Smart AI Proxy URL.

python
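import requests

# The proxy token is the username; the password stays empty.
username = "USER_TOKEN"
password = ""
proxy_auth = f"{username}:{password}"

# Public search results page to fetch.
url = "https://www.aliexpress.com/w/wholesale-macbook-pro.html"

# Route both http and https traffic through the same Smart AI Proxy endpoint.
proxy_url = f"http://{proxy_auth}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy_url, "https": proxy_url}

# verify=False skips certificate verification, like -k in the curl command.
response = requests.get(url=url, proxies=proxies, verify=False)

# Show the status code and the first 500 characters of the page.
print("Status:", response.status_code)
print(response.text[:500])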

Two details matter. The proxies dictionary tells requests to send every request through the Smart AI Proxy. The verify=False argument disables SSL verification for the same reason -k did in curl; for a long-lived production service you would handle certificates properly rather than disabling the check. Run it with python aliexpress.py and you should see a 200 status and the first chunk of AliExpress HTML, which confirms the proxy is routing correctly.
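Hardcoding the token in the script is fine for a first test, but it is easy to leak into version control. One option, sketched below, is to read it from an environment variable and build the proxies dictionary in a small helper. The build_proxies function and the CRAWLBASE_TOKEN variable name are hypothetical names chosen for this example, not part of Crawlbase's tooling; the host, port, and empty password follow the curl command above.

```python
import os


def build_proxies(token, host="smartproxy.crawlbase.com", port=8012):
    """Return a requests-style proxies mapping for the Smart AI Proxy."""
    # The token is the proxy username; the password is left empty,
    # which is why a colon sits directly before the @.
    proxy_url = f"http://{token}:@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# CRAWLBASE_TOKEN is an assumed variable name; fall back to the placeholder.
token = os.environ.get("CRAWLBASE_TOKEN", "USER_TOKEN")
proxies = build_proxies(token)
```

The resulting proxies value can be passed to requests.get exactly as in the script above.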

Parse the results into structured JSON

Raw HTML is hard to work with. Rather than writing CSS selectors against AliExpress markup that changes without notice, lean on the built-in parser by sending the scraper=aliexpress-serp parameter as a header. The proxy returns clean JSON instead of HTML, and json.loads turns that into a Python dictionary you can walk.

python
import requests
import json

username = "USER_TOKEN"
password = ""
proxy_auth = f"{username}:{password}"

url = "https://www.aliexpress.com/w/wholesale-macbook-pro.html"
proxy_url = f"http://{proxy_auth}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy_url, "https": proxy_url}

headers = {"CrawlbaseAPI-Parameters": "scraper=aliexpress-serp"}

response = requests.get(url=url, proxies=proxies, headers=headers, verify=False)

data = json.loads(response.text)
print(json.dumps(data, indent=4))

The response is a JSON object whose body.products array holds one entry per listing. A trimmed sample looks like this:

json
{
  "original_status": 200,
  "pc_status": 200,
  "body": {
    "products": [
      {
        "title": "5 In 1 USB C Hub Type C to 4K HD Adapter for Macbook Pro",
        "price": { "current": "$1.27" },
        "url": "https://www.aliexpress.com/item/1005005653517644.html",
        "image": "https://ae04.alicdn.com/kf/Sbffa8b7a90564cff8.jpg",
        "shippingMessage": "Free shipping",
        "soldCount": 207
      }
    ],
    "relatedSearches": []
  }
}

Each product carries the fields you actually want for research: title, current price, product URL, image, shipping message, and sold count. The parser also returns a relatedSearches list you can use to discover adjacent queries.
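To work with those fields, it can help to flatten each product into a simple row. The sketch below assumes the response has the shape of the trimmed sample above and that a pc_status of 200 signals a successful crawl; that reading of pc_status is an assumption drawn from the sample, and extract_products is a hypothetical helper name.

```python
def extract_products(data):
    """Flatten the parser response into one plain dict per product."""
    # Assumption: pc_status == 200 means the crawl succeeded.
    if data.get("pc_status") != 200:
        return []
    rows = []
    for product in (data.get("body") or {}).get("products", []):
        rows.append({
            "title": product.get("title"),
            # price is a nested object in the sample; keep only the current price.
            "price": (product.get("price") or {}).get("current"),
            "url": product.get("url"),
            "image": product.get("image"),
            "shipping": product.get("shippingMessage"),
            "sold": product.get("soldCount"),
        })
    return rows


# The trimmed sample response from above, used as offline test data.
sample = {
    "original_status": 200,
    "pc_status": 200,
    "body": {
        "products": [
            {
                "title": "5 In 1 USB C Hub Type C to 4K HD Adapter for Macbook Pro",
                "price": {"current": "$1.27"},
                "url": "https://www.aliexpress.com/item/1005005653517644.html",
                "image": "https://ae04.alicdn.com/kf/Sbffa8b7a90564cff8.jpg",
                "shippingMessage": "Free shipping",
                "soldCount": 207,
            }
        ],
        "relatedSearches": [],
    },
}

rows = extract_products(sample)
print(rows[0]["title"], rows[0]["price"], rows[0]["sold"])
```

Using .get throughout means a listing with a missing field produces None for that column instead of raising a KeyError.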

Pick the right results URL

AliExpress search results live at /w/wholesale-<keyword>.html. Swap the keyword in that path to scrape a different query, and use a regional subdomain (for example a country-specific one) if you want prices and shipping in a particular locale. Keep the slug URL-safe by joining multi-word keywords with hyphens.
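A small helper can build that URL from a free-text keyword. This is a minimal sketch: build_search_url is a hypothetical name, and lowercasing the keyword and dropping non-ASCII characters are simplifying assumptions made for the example, not documented AliExpress rules.

```python
import re

BASE_URL = "https://www.aliexpress.com"


def build_search_url(keyword, base_url=BASE_URL):
    """Turn a free-text keyword into a /w/wholesale-<keyword>.html URL."""
    # Lowercase, then collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")
    if not slug:
        raise ValueError("keyword must contain at least one ASCII letter or digit")
    return f"{base_url}/w/wholesale-{slug}.html"


print(build_search_url("MacBook Pro"))
```

Pass a different base_url to target a regional subdomain.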

Save the data to a JSON file

Printing to the console is fine while you iterate, but you want the results on disk. Add a few lines to write the parsed dictionary to a file so you can analyze it later or load it into a pipeline.

python
import requests
import json

username = "USER_TOKEN"
password = ""
proxy_auth = f"{username}:{password}"

url = "https://www.aliexpress.com/w/wholesale-macbook-pro.html"
proxy_url = f"http://{proxy_auth}@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy_url, "https": proxy_url}
headers = {"CrawlbaseAPI-Parameters": "scraper=aliexpress-serp"}

response = requests.get(url=url, proxies=proxies, headers=headers, verify=False)
data = json.loads(response.text)

with open("scraped_data.json", "w") as json_file:
    json.dump(data, json_file, indent=4)

print("Saved scraped_data.json")

The two new lines do the work: open(...) creates scraped_data.json in write mode, and json.dump serializes the parsed dictionary into it. That is a complete, runnable AliExpress scraper: it fetches a search page through a rotating proxy, parses it into structured data, and persists the result.

Scale the scraper up

One search page is a demo. A real job runs across many keywords or pages and has to stay reliable and within the site's tolerance. A few practices carry the script from a single run to a production job.

  • Loop over many queries. Wrap the fetch in a function that takes a keyword, build the /w/wholesale-<keyword>.html URL from it, and iterate a list of search terms, collecting the rows as you go.
  • Go concurrent. Fetching one URL at a time is slow. Send requests in parallel with concurrent.futures, or move to asyncio and aiohttp for higher throughput. The Smart AI Proxy rotates IPs per request, so concurrency does not concentrate load on one address.
  • Handle errors gracefully. Wrap each request in a try/except, check response.status_code before parsing, and add a retry with backoff for transient failures so one bad response does not crash the whole run.
  • Use a real store at volume. Flat JSON files are fine to start, but for large datasets write rows into a database like PostgreSQL or MySQL with proper indexing.
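The loop, concurrency, and error-handling points above can be sketched together with the standard library. The example assumes you supply a fetch function that takes a keyword, performs the proxied request shown earlier, and raises an exception on failure; fetch_with_retry and scrape_keywords are hypothetical names, and the retry count and delays are illustrative values rather than recommendations from Crawlbase.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(fetch, keyword, retries=3, base_delay=1.0):
    """Call fetch(keyword), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fetch(keyword)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            time.sleep(base_delay * 2 ** attempt)


def scrape_keywords(fetch, keywords, max_workers=4, **retry_options):
    """Fetch all keywords in parallel; map each keyword to its result or None."""
    keywords = list(keywords)

    def safe(keyword):
        # One failing keyword must not abort the whole run.
        try:
            return fetch_with_retry(fetch, keyword, **retry_options)
        except Exception as exc:
            print(f"giving up on {keyword!r}: {exc}")
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keywords.
        return dict(zip(keywords, pool.map(safe, keywords)))
```

In practice fetch would wrap the requests.get call from the earlier script, check response.status_code, and raise when the status is not 200 so the retry logic can react. Keep max_workers modest so the run stays within the site's tolerance.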

Rotation is the one piece you do not have to build, since the Smart AI Proxy gives each request a fresh IP from its pool. The broader playbook for staying unblocked at scale lives in how to scrape websites without getting blocked.
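For the storage point in the list above, here is a minimal sketch of writing products into a database. It uses SQLite from the Python standard library as a stand-in for PostgreSQL or MySQL so the example runs without a server; the table layout and the save_products name are assumptions made for illustration.

```python
import sqlite3


def save_products(conn, keyword, products):
    """Upsert parsed products into a products table keyed by product URL."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               url TEXT PRIMARY KEY,
               keyword TEXT,
               title TEXT,
               price TEXT,
               sold INTEGER
           )"""
    )
    rows = [
        (
            p.get("url"),
            keyword,
            p.get("title"),
            (p.get("price") or {}).get("current"),
            p.get("soldCount"),
        )
        for p in products
    ]
    # INSERT OR REPLACE keeps one row per URL, so a re-run updates existing rows.
    conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
products = [
    {
        "title": "5 In 1 USB C Hub Type C to 4K HD Adapter for Macbook Pro",
        "price": {"current": "$1.27"},
        "url": "https://www.aliexpress.com/item/1005005653517644.html",
        "soldCount": 207,
    }
]
save_products(conn, "macbook pro", products)
print(conn.execute("SELECT title, price, sold FROM products").fetchall())
```

Keying on the product URL is one way to avoid duplicates when the same listing shows up under several keywords; a production schema would likely also record when each row was scraped.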

Is scraping AliExpress legal?

Whether scraping AliExpress is allowed depends on the platform's terms of service, your jurisdiction, and what you do with the data. AliExpress's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work.

A few lines are worth holding to. Collect only public data: product titles, prices, images, shipping messages, and sold counts that anyone can see on a search page without an account. Respect the site's robots.txt and its stated rate expectations, and keep your request volume low enough that you are not straining anyone's servers. If you plan to reuse the data commercially, get permission or an official data agreement rather than assuming silence is consent. And never collect personal data, including anything tied to individual buyer or seller accounts.

This guide is deliberately scoped to public listing data because that is the line that keeps the work defensible. It does not cover anything behind a login, account or profile data, messages, order history, or any action that bypasses authentication. If your project needs more than public listings, the right move is an official API or a data agreement with the platform, not a cleverer scraper.

Recap

Key takeaways

  • AliExpress blocks scrapers fast. IP bans, CAPTCHAs, and bot detection make a single static IP unworkable, so you route requests through a rotating proxy.
  • The Smart AI Proxy is a drop-in endpoint. Point your requests calls at one proxy URL and it rotates IPs and forwards to the Crawling API server-side, no code rewrite needed.
  • Skip selectors with the built-in parser. Send scraper=aliexpress-serp in the CrawlbaseAPI-Parameters header and the proxy returns clean structured JSON instead of raw HTML.
  • Save and scale. Persist results to JSON or a database, then loop keywords, add concurrency, and handle errors to move from a demo to a production job.
  • Stay on public data. Respect AliExpress's ToS and robots.txt; no accounts, no personal data, no auth-bypassing actions.

Frequently Asked Questions (FAQs)

Why do I need a proxy to scrape AliExpress?

AliExpress blocks automated traffic with IP bans, CAPTCHAs, and bot detection, so a single static IP gets throttled or banned almost immediately at any real volume. A rotating proxy gives each request a different real-user IP, so no single address trips a rate limit or gets stuck behind a challenge. The Crawlbase Smart AI Proxy handles that rotation for you behind one endpoint.

What is the difference between the Smart AI Proxy and the Crawling API?

Both reach the same Crawlbase crawling backend; they differ in how you integrate. The Smart AI Proxy is a proxy endpoint you drop into any tool that speaks HTTP proxies, so existing code works unchanged. The Crawling API is an HTTP API you call directly when writing fresh code. This guide uses the Smart AI Proxy because the scraper is a standard Python requests script.

How do I get structured data instead of raw HTML?

Send the scraper=aliexpress-serp parameter in a CrawlbaseAPI-Parameters request header. Because the Smart AI Proxy forwards to the Crawling API, that runs Crawlbase's built-in AliExpress search parser and returns clean JSON with each product's title, price, URL, image, shipping message, and sold count, so you never write or maintain CSS selectors.

Why do I have to disable SSL verification with the proxy?

Tunneling HTTPS traffic through the Smart AI Proxy requires skipping the local certificate check, which is the -k flag in curl and verify=False in Python requests. Without it the handshake between your client and the proxy can fail. For a long-lived production service, configure certificate handling properly rather than disabling the check outright.

Can I use the Smart AI Proxy for sites other than AliExpress?

Yes. The Smart AI Proxy is a general-purpose rotating proxy endpoint, so it works for most public websites, not just AliExpress. The scraper=aliexpress-serp parser is AliExpress-specific, but the proxy itself routes requests to any target. Other sites have their own parsers, or you can fetch raw HTML and parse it yourself.

Is it legal to scrape AliExpress?

It depends on AliExpress's terms of service, your jurisdiction, and your purpose, and their terms restrict automated access. Keep strictly to public listing data, respect robots.txt and rate expectations, and never touch accounts, personal data, or actions that bypass authentication. For commercial reuse, get permission or an official data agreement rather than relying on a scraper.
