Alibaba.com is one of the largest B2B marketplaces in the world, listing products across more than 40 categories from suppliers in dozens of countries. For anyone doing sourcing, price research, or competitor analysis, the public search results are a dense signal: every query returns product titles, price ranges, minimum order quantities, supplier names, and the links that tie them together. Pulling that into a structured dataset turns a manual browse into something you can sort, compare, and track over time.

This guide shows you how to scrape Alibaba search results with Node.js the reliable way. You build a small, runnable scraper that fetches a rendered SERP through the Crawling API, parses each product card with Cheerio, handles pagination, and exports clean JSON and CSV. The whole walkthrough stays scoped to public search-results data that anyone can see without an account, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public Alibaba search URL, retrieves the HTML through the Crawling API, and extracts a structured record for every product card on the page. We will use a sample query as the running example and pull these fields from each result:

  • Title: the product headline as shown in the listing card.
  • Price: the displayed price or price range for the product.
  • Minimum order: the minimum order quantity or sale-feature text the supplier sets.
  • Supplier: the store or company name behind the listing.
  • Link: the destination URL of the product detail page.
  • Store link: the supplier's company profile URL.

Why a plain request fails on Alibaba

If you fire a bare HTTP request at an Alibaba search URL from a script, you rarely get the clean page you see in your own browser. Two things work against you. First, much of the SERP is assembled by JavaScript after the initial HTML loads, so a raw fetch can return a shell with the product cards missing. Second, Alibaba watches for automated traffic: requests that do not look like a real browser get challenged, fed a CAPTCHA, or blocked before they reach the listings.

So a working Alibaba scraper needs two things in one request: an IP the platform reads as a real visitor, and a browser that renders the page when it leans on scripts. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping those healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it fetches from a trusted IP and renders when needed, and it returns finished HTML for you to parse with a lightweight library like Cheerio.

Why rotation matters here

Alibaba ranks among the more aggressively protected marketplaces, so a single datacenter IP firing repeated SERP requests is an immediate tell. The Crawling API rotates through datacenter and residential addresses server-side and handles CAPTCHAs for you, so you do not have to source and maintain that pool yourself. You can start with 1,000 free requests, no credit card needed.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to scraping in this stack, our guide on how to build a web scraper with Node.js covers the basics this tutorial assumes.

Node.js 14 or later. Confirm your version with node -v. If you do not have it, install it from nodejs.org.

A Crawlbase account and token. Sign up, open your dashboard, and copy your request token from the account docs page. Your first 1,000 requests are free. Treat the token like a password: it authenticates your requests, so keep it out of version control.
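
One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named CRAWLBASE_TOKEN, which is a name chosen for this example rather than anything the Crawlbase client looks for on its own.

```javascript
// Read the Crawlbase token from the environment instead of hard-coding it.
// CRAWLBASE_TOKEN is a hypothetical variable name picked for this example.
function loadToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set the CRAWLBASE_TOKEN environment variable first');
  }
  return token;
}

// Usage, once the crawlbase package is installed:
// const { CrawlingAPI } = require('crawlbase');
// const api = new CrawlingAPI({ token: loadToken() });
```

With this in place you would start the script as CRAWLBASE_TOKEN=your_token node index.js, and the token never has to appear in a committed file.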

Set up the project

Create a project directory and install the two libraries the scraper needs.

bash
mkdir alibaba-serp-scraper
cd alibaba-serp-scraper

npm init -y
npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official client that sends the request to the Crawling API, and cheerio is a fast, jQuery-style HTML parser that lets you pull out individual fields by CSS selector. Create a file named index.js in that directory; that is where the scraper code goes.

Step 1: Fetch the page through the Crawling API

Start by getting the HTML. Initialize the Crawling API client with your token, point it at a public Alibaba search URL, and confirm the page comes back before you write a single selector. The CrawlingAPI client returns a response whose body holds the rendered HTML.

javascript
const { CrawlingAPI } = require('crawlbase');

// Replace with your token from the Crawlbase dashboard
const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const alibabaSerpURL =
  'https://www.alibaba.com/trade/search?SearchText=samsung+s24+ultra';

api
  .get(alibabaSerpURL)
  .then((response) => {
    // originalStatus is the status Alibaba itself returned
    console.log('Status:', response.originalStatus);
    console.log(response.body.slice(0, 500));
  })
  .catch((error) => {
    console.error('Request failed:', error);
  });

The sample query is samsung+s24+ultra, carried in the SearchText parameter, which is how Alibaba's trade search passes a search term. Run the script with node index.js and you should see a 200 status and real product markup in the first 500 characters. That confirms the fetch works, with the page rendered and the request accepted, before you commit to parsing logic. Checking originalStatus keeps a region gate or a block loud instead of silently feeding an error page into the parser.
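
If you plan to swap queries often, it can help to build the search URL from a plain string rather than editing it by hand. The helper below is a small sketch using Node's built-in URL class. buildSearchUrl is a name made up for this example, and the optional page argument assumes the page query parameter that Alibaba's trade search uses for pagination.

```javascript
// Build a public Alibaba search URL from a plain-text query.
// buildSearchUrl is a hypothetical helper, not part of any library.
function buildSearchUrl(searchText, page = 1) {
  const url = new URL('https://www.alibaba.com/trade/search');
  // URLSearchParams encodes spaces as "+", matching the sample URL above
  url.searchParams.set('SearchText', searchText);
  if (page > 1) url.searchParams.set('page', String(page));
  return url.toString();
}

console.log(buildSearchUrl('samsung s24 ultra'));
// https://www.alibaba.com/trade/search?SearchText=samsung+s24+ultra
```

Letting the URL class do the encoding also covers characters such as & or # in a query, which would otherwise break a hand-built string.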

Crawlbase Crawling API

That 200 status only comes back because the request reached Alibaba as a real visitor in the first place. The Crawling API fetches the SERP from a rotating IP, renders the JavaScript-built product cards, and clears CAPTCHAs server-side, then hands you finished HTML, so you skip running a headless browser fleet and sourcing a residential proxy pool yourself. Point it at a public search URL on the free tier first.

Step 2: Parse the product cards with Cheerio

With HTML in hand, load it into Cheerio and pull each product by its selector. Alibaba wraps each listing in a card under the offer-list container, with the title, price, minimum order, and supplier each in their own element inside the card. Inspect the live page in your browser's dev tools (right-click, then Inspect) to confirm the current class names; the selectors below match the layout at the time of writing.

javascript
const cheerio = require('cheerio');

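// Alibaba serves protocol-relative URLs (//...); normalise them to https.
// Checking the prefix avoids misreading a "//" link that merely contains
// "http" somewhere in its query string.
function toHttps(href) {
  if (!href) return null;
  return href.startsWith('//') ? `https:${href}` : href;
}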

function parseSerp(html) {
  const $ = cheerio.load(html);
  const results = [];

  const numberOfResults = $('.seb-refine-result_all').text().trim();

  $('.offer-list-wrapper .J-search-card-wrapper').each((index, element) => {
    const card = $(element);

    const title = card.find("[data-spm='d_title']").text().trim();
    const url = card.find("[data-spm='d_title']").attr('href');
    const price = card.find('.search-card-e-price-main').text().trim();
    const minItem = card.find('.search-card-m-sale-features__item').text().trim();
    const storeName = card.find('.search-card-e-company').text().trim();
    const storeLink = card.find('.search-card-e-company').attr('href');
    const image = card.find('.search-card-e-slider__img').attr('src');
    const reviews = card.find('.search-card-e-review').text().trim();

    if (!title) return;

    results.push({
      position: index + 1,
      title,
      price,
      minItem,
      storeName,
      reviews,
      url: toHttps(url),
      storeLink: toHttps(storeLink),
      image: toHttps(image),
    });
  });

  return { numberOfResults, results };
}

module.exports = { parseSerp };

The wrapper .offer-list-wrapper .J-search-card-wrapper selects each product card. Inside each card, the title and its product URL come from the [data-spm='d_title'] anchor, the price from .search-card-e-price-main, the minimum order text from .search-card-m-sale-features__item, and the supplier name and profile link from .search-card-e-company. Alibaba returns many of its links protocol-relative (starting with //), so the small toHttps helper prepends https: when the value is not already absolute. The if (!title) return; guard skips empty or non-product cards so promo tiles do not pollute the output. The .seb-refine-result_all element holds the total result count shown at the top of the page.

Selectors drift

Alibaba's class names, like J-search-card-wrapper and search-card-e-price-main, change when the front end is redeployed. Treat the selectors above as a starting template, not a contract. When a field comes back empty for every card, re-inspect a live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
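
A quick way to notice drift is to check whether any field is empty on every parsed card. The function below is a minimal sketch of that check. findDeadFields is a hypothetical helper, and the sample rows are made-up data shaped like the parser's output.

```javascript
// Report which fields are empty on every parsed card, a common sign that
// a selector no longer matches. findDeadFields is a hypothetical helper.
function findDeadFields(results, fields) {
  if (results.length === 0) return fields.slice(); // nothing parsed at all
  return fields.filter((field) =>
    results.every((row) => !row[field] || String(row[field]).trim() === '')
  );
}

// Made-up rows for illustration only
const sample = [
  { title: 'Phone case', price: '', storeName: 'Acme Co.' },
  { title: 'Screen protector', price: '', storeName: '' },
];
console.log(findDeadFields(sample, ['title', 'price', 'storeName']));
// [ 'price' ]
```

A field that is empty on some cards is usually just missing data, so the check only flags fields that are empty across the whole page.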

Step 3: Put it together and export

Now wire the fetch and the parse into one runnable script. Crawl the rendered SERP, hand the HTML to the parser, then write the structured output to both JSON and CSV so the data is ready for a spreadsheet or a database.

javascript
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');
const fs = require('fs');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// Prefix protocol-relative URLs (//...) with https:, leave others untouched
function toHttps(href) {
  if (!href) return null;
  return href.startsWith('//') ? `https:${href}` : href;
}

function parseSerp(html) {
  const $ = cheerio.load(html);
  const results = [];
  const numberOfResults = $('.seb-refine-result_all').text().trim();

  $('.offer-list-wrapper .J-search-card-wrapper').each((index, element) => {
    const card = $(element);
    const title = card.find("[data-spm='d_title']").text().trim();
    if (!title) return;

    results.push({
      position: index + 1,
      title,
      price: card.find('.search-card-e-price-main').text().trim(),
      minItem: card.find('.search-card-m-sale-features__item').text().trim(),
      storeName: card.find('.search-card-e-company').text().trim(),
      reviews: card.find('.search-card-e-review').text().trim(),
      url: toHttps(card.find("[data-spm='d_title']").attr('href')),
      storeLink: toHttps(card.find('.search-card-e-company').attr('href')),
      image: toHttps(card.find('.search-card-e-slider__img').attr('src')),
    });
  });

  return { numberOfResults, results };
}

function toCsv(results) {
  const headers = ['position', 'title', 'price', 'minItem', 'storeName', 'url'];
  const escape = (v) => `"${(v || '').toString().replace(/"/g, '""')}"`;
  const rows = results.map((r) =>
    headers.map((h) => escape(r[h])).join(',')
  );
  return [headers.join(','), ...rows].join('\n');
}

async function main() {
  const url =
    'https://www.alibaba.com/trade/search?SearchText=samsung+s24+ultra';
  const response = await api.get(url);

  if (response.originalStatus !== 200) {
    throw new Error(`Unable to crawl, status ${response.originalStatus}`);
  }

  const data = parseSerp(response.body);
  fs.writeFileSync('alibaba-serp.json', JSON.stringify(data, null, 2));
  fs.writeFileSync('alibaba-serp.csv', toCsv(data.results));
  console.log(`Saved ${data.results.length} products`);
}

main().catch((error) => console.error(error));

Run the full script with node index.js. It fetches the SERP for "samsung s24 ultra", extracts a record for each product card, and writes everything to alibaba-serp.json and alibaba-serp.csv. The JSON keeps the nested structure with the result count, while the CSV flattens the core sourcing fields into rows you can drop straight into a spreadsheet. Swap the query in the URL and the same two functions handle whatever comes back.

What the output looks like

You get a clean object with the total result count and an ordered list of products, each carrying the title, price, minimum order, supplier, review string, links, and image.

json
{
  "numberOfResults": "3,000+ products found",
  "results": [
    {
      "position": 1,
      "title": "Mobile Phone Case For Samsung Galaxy S24 Ultra Plus Tpu Pc Shockproof Covers",
      "price": "US$1.29 - US$1.69",
      "minItem": "Min. order: 50 pieces",
      "storeName": "Guangzhou Junbo Electronic Co., Ltd.",
      "reviews": "4.9/5.0 (68)",
      "url": "https://www.alibaba.com/product-detail/Mobile-Phone-Case_1600969904884.html",
      "storeLink": "https://gzjunbo.en.alibaba.com/company_profile.html",
      "image": "https://s.alicdn.com/@sc04/kf/Hcdcc7db446e9420f9378c0ec3482037bk.png_300x300.png"
    },
    {
      "position": 2,
      "title": "Cellphone Original S24 Ultra 16GB+512GB Smartphone 7inch Unlocked 5G",
      "price": "US$43.42 - US$54.47",
      "minItem": "Min. order: 1 piece",
      "storeName": "Dongguan Zhongfu Electronic Technology Co., Ltd.",
      "reviews": "3.3/5.0 (197)",
      "url": "https://www.alibaba.com/product-detail/Hot-selling-S24-Ultra_1600969407142.html",
      "storeLink": "https://fukadi.en.alibaba.com/company_profile.html",
      "image": "https://s.alicdn.com/@sc04/kf/H771126c0475c4a3d9ee7842740b0cf4an.jpg_300x300.jpg"
    }
  ]
}

That structure is what makes Alibaba data useful for B2B work: the price ranges and minimum order quantities feed sourcing comparisons, the supplier names and store links let you shortlist vendors, and the review strings give you a rough quality signal per listing. For more on turning marketplace data into pricing decisions, see our guide on web scraping for price intelligence.
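
The price and minimum-order values arrive as display strings, so sorting or comparing them needs a numeric form first. The helpers below are a rough sketch that only covers the formats shown in the sample output. The function names are invented for this example, and other currencies or layouts may need their own handling.

```javascript
// Pull numbers out of display strings such as "US$1.29 - US$1.69".
// These helpers are hypothetical and only cover the formats shown above.
function extractNumbers(text) {
  const matches = (text || '').match(/\d[\d,]*(?:\.\d+)?/g) || [];
  return matches.map((m) => Number(m.replace(/,/g, '')));
}

// Turn a price or price range into { min, max }; null when no number is found
function parsePriceRange(price) {
  const nums = extractNumbers(price);
  if (nums.length === 0) return null;
  return { min: Math.min(...nums), max: Math.max(...nums) };
}

// Take the first number in the minimum-order text; null when none is present
function parseMinOrder(minItem) {
  const nums = extractNumbers(minItem);
  return nums.length ? nums[0] : null;
}

console.log(parsePriceRange('US$1.29 - US$1.69')); // { min: 1.29, max: 1.69 }
console.log(parseMinOrder('Min. order: 50 pieces')); // 50
```

Keeping the original strings alongside the parsed numbers is a sensible default, since the display text carries the currency and unit that the numbers alone drop.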

Scaling across pages and queries

One query on one page is a demo; a real sourcing job runs over several searches and deeper into the results. Alibaba's trade search paginates with the page query parameter, so page=2 is the second page, page=3 the third, and so on. The shape stays the same: build each URL, fetch it through the Crawling API, and parse it with the same function. The one habit that keeps a long run healthy is pacing, so pause between requests rather than firing them in a tight loop.

javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function scrapeQuery(searchText, pages = 3) {
  const encoded = encodeURIComponent(searchText).replace(/%20/g, '+');
  const allResults = [];

  for (let page = 1; page <= pages; page++) {
    const url =
      `https://www.alibaba.com/trade/search?SearchText=${encoded}&page=${page}`;
    const response = await api.get(url);
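    if (response.originalStatus === 200) {
      allResults.push(...parseSerp(response.body).results);
    } else {
      // Keep failures loud so a blocked page is not mistaken for an empty one
      console.warn(`Page ${page} skipped, status ${response.originalStatus}`);
    }
    // Pause between pages; no need to wait after the last one
    if (page < pages) await sleep(3000);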
  }

  console.log(`Collected ${allResults.length} products across ${pages} pages`);
  return allResults;
}

Crawlbase serves up to 20 requests per second by default, which is plenty of headroom for a scraper that paces itself; if you genuinely need more, support can raise it. Any 5XX response from the API is free of charge, so retrying a blocked or unavailable URL costs you nothing. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same IP rotation as a drop-in proxy endpoint. For the broader marketplace playbook, our guide on ecommerce web scraping covers the patterns that carry across sites.
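
Since failed 5XX calls are not billed, a small retry wrapper is a natural fit. The sketch below retries with exponential backoff. fetchWithRetry and wait are names invented for this example, and it assumes the response object exposes the API's HTTP status as statusCode, so check that against the client version you have installed.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a fetch while the API answers with a 5XX status.
// fetchPage is any async function that resolves to an object with a
// statusCode property; the name and shape are assumptions of this sketch.
async function fetchWithRetry(fetchPage, url, { retries = 3, baseDelayMs = 1000 } = {}) {
  let response;
  for (let attempt = 0; attempt <= retries; attempt++) {
    response = await fetchPage(url);
    if (response.statusCode < 500) return response;
    if (attempt < retries) {
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
  return response; // still failing after the final attempt
}

// Usage with the client from earlier:
// const response = await fetchWithRetry((u) => api.get(u), url);
```

Passing the fetch function in as an argument keeps the wrapper independent of the client, which also makes it easy to exercise with a stub before spending real requests.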

Staying unblocked

Even with a trusted IP handled, Alibaba watches for scraper-shaped traffic, and its marketplace pages render heavily, so a few habits keep a run healthy.

  • Pace your requests. Hammering search pages in a tight loop is the fastest way to get challenged. Spread requests out and vary your queries instead of paging one term at full speed.
  • Lean on rotation. A pool of rotating IPs spreads requests across many addresses so no single one trips a limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Render when cards are missing. If the product cards come back empty, the page needed JavaScript to build them. Turn on the Crawling API's rendering option so the SERP is fetched the way a real browser would load it. Our guide on crawling JavaScript websites explains when that matters.
  • Re-inspect when fields go empty. Alibaba changes its markup periodically. If cards stop parsing, open a live page in dev tools and update the selectors.
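
As one illustration of the pacing advice above, a fixed pause can be replaced with one that varies slightly on each request. This is a minimal sketch. jitteredDelay is a hypothetical helper, and the 3000 and 1000 millisecond values in the usage comment are arbitrary starting points rather than recommended settings.

```javascript
// Return a pause length that varies around a base value so requests do not
// arrive on a fixed beat. jitteredDelay is a hypothetical helper; the
// random source is injectable to keep the function easy to test.
function jitteredDelay(baseMs, spreadMs, random = Math.random) {
  // random() is in [0, 1), so the offset falls in [-spreadMs, +spreadMs)
  const offset = (random() * 2 - 1) * spreadMs;
  return Math.max(0, Math.round(baseMs + offset));
}

// Example: wait somewhere between 2 and 4 seconds
// await new Promise((resolve) => setTimeout(resolve, jitteredDelay(3000, 1000)));
```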

For the broader playbook, see how to scrape websites without getting blocked.

Legal and ethical considerations

Whether scraping Alibaba is allowed depends on Alibaba's terms of service, your jurisdiction, and what you do with the data. Alibaba's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Alibaba's terms and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. Collect only public search-results data: the product titles, prices, minimum order quantities, supplier names, and links that anyone can see on a SERP without an account. Keep your request volume low enough that you are not straining Alibaba's servers, and pace your crawl rather than running it flat out. Do not scrape buyer or supplier contact details, messages, or anything behind a login, and do not redistribute product images or descriptions as if they were your own.

This guide is deliberately scoped to public search-results pages because that is the line that keeps the work defensible. Alibaba operates an Open Platform with official APIs for partners who need sanctioned, higher-volume access to product and supplier data, and that is the correct path when a project outgrows public-page scraping. For anything involving personal data, account-gated content, or copyrighted media you plan to republish, an official data agreement is the right route, not a cleverer scraper.

Recap

Key takeaways

  • Plain requests fall short. Alibaba builds its SERP with JavaScript and challenges bot-shaped traffic, so you need rendering plus a trusted, rotating IP to get the real product cards.
  • The Crawling API fetches behind a real IP. Send it the URL, it rotates IPs server-side, renders the cards, and clears CAPTCHAs, then returns finished HTML for you to parse.
  • Cheerio does the extraction. Select each .J-search-card-wrapper, then read title, price, minimum order, supplier, and links from it, and expect the class names to drift.
  • Paginate with the page parameter and export both formats. Increment page to walk deeper, pace with a sleep, and write JSON for structure and CSV for spreadsheets.
  • Stay on public data. Respect Alibaba's ToS and robots.txt, keep volume low, never touch contact or account data, and use the official Open Platform API when you outgrow public-page scraping.

Frequently Asked Questions (FAQs)

Why does a plain request fail or return an empty page on Alibaba?

Alibaba assembles much of its search results page with JavaScript after the initial HTML loads, so a raw fetch can return a shell with the product cards missing. It also flags traffic that does not look like a real browser and can serve a CAPTCHA or block. Fetching through the Crawling API, which renders the page and uses rotating IPs, makes the request look like an ordinary visitor so you get the real product listings.

Can I scrape Alibaba search results with Node.js?

Yes. With the Crawlbase client and Cheerio you can fetch a SERP and pull out titles, prices, minimum order quantities, suppliers, and links. The Crawling API acts as the bridge that gets your request to Alibaba from a trusted IP and renders the page, so requests are processed smoothly instead of being blocked. For a broader primer, see our guide on how to build a web scraper with Node.js.

What fields can I extract from an Alibaba SERP?

This tutorial pulls the product title, price, minimum order quantity, supplier name, product link, store link, image, and review string from each card, plus the total result count for the query. Stay within public search-results data and avoid anything behind a login, including supplier contact details and account-gated content.

Do I need JavaScript rendering to scrape Alibaba?

Often yes, because the product cards are built by scripts after the page loads. If a basic fetch returns an empty offer list, turn on the Crawling API's JavaScript rendering option so the page is fetched the way a real browser would load it. Our guide on crawling JavaScript websites covers when that is necessary.

How do I paginate through more Alibaba results?

Use the page query parameter: page=2 is the second page, page=3 the third, and so on. Build each page URL, fetch it through the Crawling API, parse it with the same function, and pause a few seconds between requests so you are pacing the crawl rather than hammering it.

My selectors return nothing. What changed?

Almost certainly Alibaba's markup. Class names like J-search-card-wrapper and search-card-e-price-main change when Alibaba redeploys its front end, so selectors that worked last month can break. Re-inspect a live SERP in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
