Tokopedia is one of Indonesia's largest online marketplaces, with tens of millions of active buyers and hundreds of thousands of merchants listing products across electronics, fashion, groceries, and home goods. Its public search and product pages are a rich demand signal: the titles, prices, sellers, ratings, and sold counts that show up for any query are exactly what teams use for price tracking, competitor research, and category trend analysis.
This guide shows you how to scrape Tokopedia data with JavaScript and Node.js using cheerio. You build a small, runnable scraper that fetches Tokopedia search listings and product pages through the Crawling API, parses each product's title, price, shop, rating, sold count, and link, handles pagination, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public product-listing data, and the legality section near the end is worth reading before you point this at any real volume.
What you will build
A Node.js script that takes a public Tokopedia search URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every product on the results grid. We use a headset search as the running example and pull these fields per item:
- Title: the product name shown on the card.
- Price: the price as displayed, for example "Rp178.000".
- Shop: the store or seller name listing the product.
- Rating: the star rating text when the card shows one.
- Sold: the units-sold count when present, for instance "60+ terjual".
- Link: the URL to the individual product page.
Why a plain request fails on Tokopedia
If you request a Tokopedia search URL with a bare HTTP client, you almost never get the product grid back. Two things work against you. First, Tokopedia renders its listing cards in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run and fetch the product data. Second, the platform flags automated traffic: datacenter IPs and request patterns that do not look like a real browser get rate-limited or blocked before they reach the rendered listings.
So a working Tokopedia scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with cheerio.
Crawlbase issues two tokens: a normal token for static sites and a JavaScript token for browser-rendered content. Tokopedia loads its products through JavaScript, so use the JavaScript token here. The free tier gives you 1,000 requests with no card so you can test the whole flow before paying for anything.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the guide to building a web scraper with Node.js covers the ground this tutorial assumes.
Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token. Treat the token like a password: it authenticates your requests, so keep it out of version control.
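One common way to keep the token out of version control is to read it from an environment variable at startup. The snippet below is a minimal sketch of that idea. The variable name CRAWLBASE_TOKEN and the getToken helper are chosen here for illustration; the crawlbase package does not look either of them up on its own.

```javascript
// Read the Crawlbase token from the environment instead of hard-coding it.
// CRAWLBASE_TOKEN is a hypothetical variable name picked for this sketch.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_TOKEN before running the scraper');
  }
  return token;
}

// Usage in the scraper file:
//   const api = new CrawlingAPI({ token: getToken() });
```

On macOS or Linux you would then start the script with CRAWLBASE_TOKEN=your_js_token node tokopedia-scraper.js, so the token never appears in a committed file.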
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
node --version
mkdir tokopedia-scraper && cd tokopedia-scraper
npm init -y
npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named tokopedia-scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, and request the search URL. Tokopedia loads results lazily, so pass ajax_wait and a page_wait delay to give the page time to render before the API captures it. Checking the status code before you parse keeps failures loud instead of silent.
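const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// Wait for in-page requests to settle, then hold five more seconds
const options = { ajax_wait: 'true', page_wait: '5000' };

const searchURL = 'https://www.tokopedia.com/search?q=headset';

api
  .get(searchURL, options)
  .then((response) => {
    if (response.statusCode === 200) {
      // Print the first 500 characters as a quick sanity check
      console.log(response.body.slice(0, 500));
    } else {
      // Report non-200 responses instead of failing silently
      console.error(`Unexpected status code: ${response.statusCode}`);
    }
  })
  .catch((error) => console.error('API request error:', error));
Run the script with node tokopedia-scraper.js and the first 500 characters of the returned HTML print to the console. That slice is mostly the document head, so to confirm the listings actually rendered, check that the body contains the divSRPContentProducts test id the parser relies on in Step 2. Doing this confirms rendering works before you write a single selector. The ajax_wait flag tells the API to wait for in-page requests to settle, and page_wait holds for an extra five seconds so lazily loaded cards have time to appear.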
That first request just returned a fully rendered Tokopedia search page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, waits for the JavaScript-loaded cards to settle, and rotates through residential IPs server-side, so you get finished HTML from one call. Point it at any public search query on the free tier first.
Step 2: Parse each product with cheerio
With rendered HTML in hand, load it into cheerio and walk the product cards. Tokopedia lays each result out in a repeating container inside the search-results region, so you select every card, then read the title, price, shop, rating, sold count, and link from inside it. Reading each field defensively keeps one missing value from crashing the run.
const cheerio = require('cheerio');

function parseSearchListings(html) {
  const $ = cheerio.load(html);
  const products = [];

  // Every product card inside the search-results container
  const cards = $(
    'div[data-testid="divSRPContentProducts"] div.css-5wh65g'
  );

  cards.each((index, element) => {
    const card = $(element);
    const title = card
      .find('span.OWkG6oHwAppMn1hIBsC3pQ\\=\\=')
      .text()
      .trim();
    const price = card
      .find('div.ELhJqP-Bfiud3i5eBR8NWg\\=\\=')
      .text()
      .trim();
    const shop = card
      .find('span.X6c-fdwuofj6zGvLKVUaNQ\\=\\=')
      .text()
      .trim();
    // Rating and sold count sit in small text rows under the price
    const rating = card.find('span.nBBbPk2cBpbZJ2nFPN8jKA\\=\\=').text().trim();
    const sold = card.find('span.eLNb-rRDe6X9p64ZsQAx9w\\=\\=').text().trim();
    const link = card.find('a.Nq8NlC5Hk9KgVBJzMYBUsg\\=\\=').attr('href');

    // Skip cards without a title; fall back to 'N/A' for other missing fields
    if (title) {
      products.push({
        title: title,
        price: price || 'N/A',
        shop: shop || 'N/A',
        rating: rating || 'N/A',
        sold: sold || 'N/A',
        link: link || 'N/A',
      });
    }
  });

  return products;
}
A few details keep this faithful to the page. The cards live under the divSRPContentProducts test id, and each card carries the title in a span with class OWkG6oHwAppMn1hIBsC3pQ==, the price in a div with class ELhJqP-Bfiud3i5eBR8NWg==, and the shop name in a span with class X6c-fdwuofj6zGvLKVUaNQ==. The product link is read from the card's anchor href. Those class names contain literal == characters, so they are escaped as \\=\\= inside the cheerio selector strings. The rating and sold-count selectors follow the same small-text pattern under the price.
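Hand-escaping each == is easy to get wrong when you update selectors later. As a small sketch, a helper can build the escaped selector from the raw class name you copy out of dev tools. The classSelector function is hypothetical, not part of cheerio, and it assumes the class name does not start with a digit, which would need a different escape.

```javascript
// Build "tag.class" with every non-identifier character backslash-escaped.
// Assumption: the class name does not begin with a digit.
function classSelector(tag, className) {
  const escaped = className.replace(/[^a-zA-Z0-9_-]/g, '\\$&');
  return `${tag}.${escaped}`;
}

const titleSelector = classSelector('span', 'OWkG6oHwAppMn1hIBsC3pQ==');
console.log(titleSelector); // span.OWkG6oHwAppMn1hIBsC3pQ\=\=
```

The returned string is what the selector engine sees at runtime, which is the same value the doubled backslashes in the source code produce.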
Tokopedia's hashed class names (OWkG6oHwAppMn1hIBsC3pQ== and the rest) are generated and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
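Selector drift is easier to catch when the scraper measures it. The sketch below computes, for each field, the share of records that hold a real value instead of the 'N/A' fallback, and lists the fields that fall under a threshold. The function names and the 0.5 default are assumptions for illustration. Rating and sold count are legitimately missing on some cards, so tune the threshold per field.

```javascript
// Share of records where each field holds a real value rather than 'N/A'.
function fieldFillRates(products, fields) {
  const rates = {};
  for (const field of fields) {
    const filled = products.filter(
      (p) => p[field] && p[field] !== 'N/A'
    ).length;
    rates[field] = products.length ? filled / products.length : 0;
  }
  return rates;
}

// Names of fields whose fill rate fell below the threshold.
function staleSelectors(products, fields, threshold = 0.5) {
  const rates = fieldFillRates(products, fields);
  return fields.filter((f) => rates[f] < threshold);
}

// Hypothetical sample records
const sample = [
  { title: 'Headset A', price: 'Rp178.000', rating: 'N/A' },
  { title: 'Headset B', price: 'N/A', rating: 'N/A' },
];
console.log(staleSelectors(sample, ['title', 'price', 'rating']));
// [ 'rating' ]
```

An empty result set reports every field as stale, which is the right alarm when the card or title selector itself has broken.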
Step 3: Handle pagination
Tokopedia spreads search results across multiple pages, and each one is reachable by appending a page parameter to the URL, such as &page=2. Loop from the first page to the last you want, fetch each through the Crawling API, parse it with the function from Step 2, and collect everything into one array. If a page fails to fetch, stop early rather than pushing empty results.
// Fetch one URL through the Crawling API; return the HTML or null on failure
async function fetchHtml(url) {
  const response = await api.get(url, options);
  if (response.statusCode === 200) return response.body;
  console.error(`Failed to fetch page: ${response.statusCode}`);
  return null;
}

async function scrapeMultiplePages(baseUrl, maxPages) {
  const allProducts = [];
  for (let page = 1; page <= maxPages; page++) {
    const paginatedUrl = `${baseUrl}&page=${page}`;
    const html = await fetchHtml(paginatedUrl);
    // Stop early instead of collecting empty results
    if (!html) break;
    const products = parseSearchListings(html);
    allProducts.push(...products);
    console.log(`Page ${page}: ${products.length} products`);
  }
  return allProducts;
}
This loops through the pages you ask for, scrapes the listings from each, and aggregates the results. Because every search page shares the same card structure, the parser you wrote in Step 2 works across all of them without changes.
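Two small refinements can make the pagination loop sturdier. The sketch below builds each page URL with Node's built-in URL class, so the separator is correct whether or not the base URL already carries a query string, and it drops records whose link already appeared on an earlier page. Both helper names are hypothetical, and whether Tokopedia actually repeats items across pages is an assumption you should check against your own results.

```javascript
// Set the page parameter with the WHATWG URL API so "?" vs "&" is handled.
function buildPageUrl(baseUrl, page) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Keep the first record for each link; records without a link are kept as-is.
function dedupeByLink(products) {
  const seen = new Set();
  return products.filter((p) => {
    if (!p.link || p.link === 'N/A') return true;
    if (seen.has(p.link)) return false;
    seen.add(p.link);
    return true;
  });
}

console.log(buildPageUrl('https://www.tokopedia.com/search?q=headset', 2));
// https://www.tokopedia.com/search?q=headset&page=2
```

Because searchParams.set replaces an existing page value, calling buildPageUrl on an already paginated URL does not stack a second page parameter.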
Step 4: Assemble the full script with JSON and CSV export
Now wire the fetch, the parser, and pagination into one runnable script, then write the records to disk as both JSON and CSV. The JSON keeps each record as an object with named fields; the CSV drops straight into a spreadsheet.
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
const options = { ajax_wait: 'true', page_wait: '5000' };

// Paste parseSearchListings, fetchHtml, and scrapeMultiplePages here

// Serialize the records as CSV, quoting every field
function toCsv(rows) {
  const headers = ['title', 'price', 'shop', 'rating', 'sold', 'link'];
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const baseUrl = 'https://www.tokopedia.com/search?q=headset';
  const maxPages = 5;
  const products = await scrapeMultiplePages(baseUrl, maxPages);

  fs.writeFileSync(
    'tokopedia_search_results.json',
    JSON.stringify(products, null, 2)
  );
  fs.writeFileSync('tokopedia_search_results.csv', toCsv(products));
  console.log(`Saved ${products.length} products to JSON and CSV`);
}

// Surface any unhandled error instead of letting the promise reject silently
main().catch((error) => console.error('Scrape failed:', error));
Paste the parseSearchListings, fetchHtml, and scrapeMultiplePages functions into the same file so main can call them. Run it with node tokopedia-scraper.js and you get two files: tokopedia_search_results.json with the full structured records and tokopedia_search_results.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because Tokopedia titles are long and frequently contain commas.
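To see the quoting rule in isolation, here is a short worked example that applies the same escape expression to one row. The product title is made up for the demonstration, and escapeCsv is just a standalone copy of the escape helper inside toCsv.

```javascript
// Same rule as toCsv: wrap in quotes and double any embedded quotes.
const escapeCsv = (value) => `"${String(value).replace(/"/g, '""')}"`;

// Hypothetical title containing both quotes and a comma
const row = ['Headset "Pro" - Stereo, Putih', 'Rp13.000'];
console.log(row.map(escapeCsv).join(','));
// "Headset ""Pro"" - Stereo, Putih","Rp13.000"
```

The comma inside the title stays within the quoted field, so a spreadsheet reads the line as two columns rather than three.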
What the output looks like
The JSON file holds one object per product in search order, each with the title, price, shop, rating, sold count, and link.
[
  {
    "title": "Ipega PG-R008 Gaming Headset for P4 /X1 series/N-Switch Lite/Mobile",
    "price": "Rp178.000",
    "shop": "ipegaofficial",
    "rating": "4.9",
    "sold": "250+ terjual",
    "link": "https://www.tokopedia.com/ipegaofficial/ipega-pg-r008-gaming-headset"
  },
  {
    "title": "Hippo Toraz Handsfree Earphone Stereo Sound - headset, Putih",
    "price": "Rp13.000",
    "shop": "HippoCenter",
    "rating": "4.8",
    "sold": "1rb+ terjual",
    "link": "https://www.tokopedia.com/hippocenter88/hippo-toraz-handsfree-earphone"
  }
]
The CSV mirrors the same rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.
title,price,shop,rating,sold,link
"Ipega PG-R008 Gaming Headset for P4 /X1 series/N-Switch Lite/Mobile","Rp178.000","ipegaofficial","4.9","250+ terjual","https://www.tokopedia.com/ipegaofficial/ipega-pg-r008-gaming-headset"
"Hippo Toraz Handsfree Earphone Stereo Sound - headset, Putih","Rp13.000","HippoCenter","4.8","1rb+ terjual","https://www.tokopedia.com/hippocenter88/hippo-toraz-handsfree-earphone"
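Price and sold count arrive as display strings, which is awkward for price tracking or sorting. The sketch below converts them to numbers under a few stated assumptions: the dot in a rupiah price is a thousands separator, "rb" (ribu) means thousand, "jt" (juta) means million, and a comma is the decimal mark. Both function names are hypothetical, and a price range such as two prices on one card would need extra handling.

```javascript
// "Rp178.000" -> 178000. Assumes "." separates thousands and no decimals.
function parseRupiah(text) {
  const digits = String(text).replace(/\D/g, '');
  return digits ? Number(digits) : null;
}

// "60+ terjual" -> 60, "1rb+ terjual" -> 1000.
// Assumes "rb" = thousand, "jt" = million, "," = decimal mark.
function parseSold(text) {
  const match = String(text).match(/(\d+(?:,\d+)?)\s*(rb|jt)?/i);
  if (!match) return null;
  const value = parseFloat(match[1].replace(',', '.'));
  const unit = (match[2] || '').toLowerCase();
  const scale = unit === 'rb' ? 1e3 : unit === 'jt' ? 1e6 : 1;
  return Math.round(value * scale);
}

console.log(parseRupiah('Rp178.000')); // 178000
console.log(parseSold('1rb+ terjual')); // 1000
```

The trailing "+" means the displayed figure is a lower bound, so treat the parsed sold count as "at least this many" rather than an exact total.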
Scraping individual product pages
Search listings give you breadth; product pages give you depth. Once you have a link from the search scraper, you can fetch the product page through the same Crawling API and pull richer fields. On a Tokopedia product page the name sits in an h1 with data-testid="lblPDPDetailProductName", the price in a div with data-testid="lblPDPDetailProductPrice", the shop in an a with data-testid="llbPDPFooterShopName", the description in a div with data-testid="lblPDPDescriptionProduk", and the thumbnail images in img tags inside the data-testid="PDPImageThumbnail" buttons.
async function scrapeProductPage(url) {
  const html = await fetchHtml(url);
  if (!html) return null;

  const $ = cheerio.load(html);

  // Collect the src of every thumbnail image
  const images = $('button[data-testid="PDPImageThumbnail"] img')
    .map((i, el) => $(el).attr('src'))
    .get();

  return {
    name: $('h1[data-testid="lblPDPDetailProductName"]').text().trim(),
    price: $('div[data-testid="lblPDPDetailProductPrice"]').text().trim(),
    shop: $('a[data-testid="llbPDPFooterShopName"]').text().trim(),
    description: $('div[data-testid="lblPDPDescriptionProduk"]').text().trim(),
    images: images,
  };
}
This reuses the same fetchHtml helper from the search scraper, so the product-page scraper inherits the rendering and IP rotation without any extra setup. Feed it any link the search scraper collected and it returns a single structured object you can store the same way, with JSON.stringify or a row in your CSV.
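To connect the two scrapers, you can walk the collected search records and attach the product-page fields to each one. The sketch below does that sequentially with a pause between requests. The enrichProducts and pause names, the details property, and the two-second default are all assumptions for illustration; the scrape function is passed in so the loop can be tried with a stub before it spends real API requests.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit each collected link in turn and attach the product-page fields.
// scrapeFn is injected: pass scrapeProductPage in the real script.
async function enrichProducts(products, scrapeFn, delayMs = 2000) {
  const enriched = [];
  for (const product of products) {
    // Records without a usable link are kept, with no details attached
    if (!product.link || product.link === 'N/A') {
      enriched.push({ ...product, details: null });
      continue;
    }
    const details = await scrapeFn(product.link);
    enriched.push({ ...product, details });
    await pause(delayMs);
  }
  return enriched;
}

// Usage inside main(), limited to the first ten records:
//   const full = await enrichProducts(products.slice(0, 10), scrapeProductPage);
```

Each enriched record costs one extra Crawling API request, so cap the slice while you are testing.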
Staying unblocked
Even with rendering handled, Tokopedia watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Introduce a delay between page fetches rather than hammering the search results in a tight loop. Spreading requests out is the single biggest factor in staying under the platform's rate limits.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
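The pacing and status-code habits above can be sketched as a retry wrapper with exponential backoff. The helper names, the retry count, and the delay values are assumptions, not Crawlbase recommendations. The wrapper expects a fetch function that resolves to HTML on success and null on a non-200 response, which is how the fetchHtml helper from Step 3 behaves.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call fetchFn(url) until it returns HTML or the retries run out.
async function fetchWithBackoff(fetchFn, url, retries = 3, baseMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const html = await fetchFn(url);
    if (html) return html;
    // Wait longer after each failure before trying again
    if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
  }
  return null;
}

// Usage inside scrapeMultiplePages:
//   const html = await fetchWithBackoff(fetchHtml, paginatedUrl);
//   await sleep(2000); // fixed pause between pages
```

With the defaults, a page that keeps failing is retried after roughly 1, 2, and 4 seconds before the wrapper gives up and returns null, at which point the pagination loop stops as before.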
For the broader playbook, see how to scrape websites without getting blocked and the deeper guide to crawling JavaScript websites, which both cover the rendering and rotation side in more detail.
Is it legal to scrape Tokopedia?
Whether scraping Tokopedia is allowed depends on Tokopedia's terms of service, your jurisdiction, and what you do with the data. Tokopedia's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Tokopedia's Terms and Conditions and its robots.txt, and treat both as the boundary for what you collect.
A few lines are worth holding to. Collect only public product data: the title, price, shop name, rating, sold count, and product link that anyone can see on a search or product page without an account. Respect Tokopedia's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable buyers or reviewers beyond the public review text and star counts shown on the page. Do not redistribute Tokopedia's copyrighted media, such as merchant product photography, as if it were your own.
This guide is deliberately scoped to public search and product listings because that is the line that keeps the work defensible. It does not cover anything behind a login, buyer or seller personal data, order history, chat, or any attempt to bypass authentication or a CAPTCHA you were not meant to pass. If your project needs more than public listings, the correct path is a sanctioned data agreement or any official API channel Tokopedia offers, not a cleverer scraper. When in doubt, prefer permission over assuming silence is consent.
Key takeaways
- Tokopedia renders listings client-side and rate-limits hard. A plain request returns an empty shell, so you must render the page behind a trusted IP before you parse it, using the JavaScript token.
- The Crawling API does both in one call. It renders the page with ajax_wait and page_wait, rotates residential IPs, and hands you finished HTML to parse with cheerio.
- cheerio extracts the fields. Select every product card, then read title, price, shop, rating, sold count, and link, escaping the hashed == class names and expecting them to drift over time.
- Pagination and product pages extend the scraper. Loop the page parameter for breadth, then reuse the same fetch helper to pull name, price, shop, description, and images from individual product pages.
- Stay on public data. Respect Tokopedia's ToS and robots.txt, pace your requests, export to JSON and CSV, and avoid anything behind a login or any personal data.
Frequently Asked Questions (FAQs)
Is it legal to scrape data from Tokopedia?
Scraping Tokopedia can be permissible when you stay on public product data and follow Tokopedia's terms of service. Review the site's rules and robots.txt first, keep your request volume reasonable, and avoid personal or login-walled data. Use what you collect for legitimate purposes like research, price tracking, or analysis, not for anything that violates Tokopedia's policies or local law.
Why do I need the Crawling API to scrape Tokopedia?
Tokopedia loads its products through JavaScript, so a raw HTTP request usually returns an empty shell rather than the product cards. The Crawling API renders the page in a real browser, waits for the lazily loaded content with ajax_wait and page_wait, and rotates residential IPs so you reach the finished listings without managing a browser fleet or a proxy pool.
What data points can I extract from Tokopedia?
From search listings you can pull the product title, price, shop name, rating, sold count, and link. From individual product pages you can extract the name, price, shop, full description, and image URLs. Together these cover the public fields most teams need for price monitoring, competitor research, and product trend analysis.
Why are my Tokopedia selectors returning empty values?
Almost certainly the markup changed. Tokopedia's hashed class names like OWkG6oHwAppMn1hIBsC3pQ== are generated and change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
How do I handle pagination on Tokopedia search results?
Append a page parameter to the search URL, for example &page=2, and loop from the first page to the last you want. Fetch each page through the Crawling API, parse it with the same cheerio function, and collect the results into one array before you export. Stop early if a page fails to fetch so you do not pad your data with empty results.
Can I scrape buyer or seller personal data from Tokopedia?
No, and this guide does not cover it. Buyer accounts, order history, chat, and anything behind a login are not public data. Scraping login-walled content, personal data about buyers or reviewers beyond the public review text, or bypassing authentication is out of scope here and runs against Tokopedia's terms. For broader needs, see the ecommerce web scraping guide or the overview of web scraping for price intelligence, and prefer a sanctioned data agreement for anything beyond public listings.
