Samsung's storefront is one of the cleaner public catalogs of consumer electronics on the web. Each smartphone listing carries a model name, a price, customer ratings, and a short spec sheet, all visible to anyone who opens the page. That makes it a useful source for price tracking, competitive research, and trend analysis: the public catalog tells you what Samsung is selling, at what price, and how shoppers rate it.
This guide shows you how to scrape Samsung products with Node.js, using the Crawlbase Crawling API to fetch the rendered page and cheerio to pull each field out of the HTML. The whole walkthrough stays scoped to public catalog data: the product name, model, price, rating, review count, key specs, and product URL that the storefront shows every visitor. Nothing here touches accounts, orders, or anything behind a sign-in.
What you will build
A small Node.js script that requests a Samsung listing page, gets back fully rendered HTML, loops over each product card, and writes a clean record per phone to JSON. Each record captures:
- Product name. The model line as shown on the card, for example Galaxy S24 Ultra.
- Model and variant. The storage and configuration options listed for that product.
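- Price. The catalog price the storefront displays for the listing. The parser in this guide does not include a price selector, so add one after inspecting the price element on a live card.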
- Rating. The average customer rating score.
- Review count. How many ratings that score is based on.
- Key specs. The short spec bullets on the card, such as display size, camera, and battery.
- Product URL. The link to the full product page.
Why a plain request fails on Samsung
If you point a bare HTTP client at a Samsung listing URL, you rarely get the catalog you see in a browser. The storefront renders much of its content client-side: the product grid, prices, and ratings load through JavaScript after the initial HTML arrives. A raw request returns the page shell with the data slots still empty, so cheerio finds nothing to parse.
On top of that, large retail sites watch for automated traffic. Datacenter IPs and request patterns that do not look like a real browser get challenged or throttled before they reach the useful markup. So a working Samsung scraper needs two things in one request: a browser that actually renders the page, and an IP the storefront reads as a real visitor. You can build that yourself with a headless browser like Puppeteer plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Samsung's catalog relies on client-side rendering, so the JS token is the right choice here. If fields come back empty with the normal token, switch to the JS token and the rendered markup will include them.
Prerequisites
A few things need to be in place before writing any code. None of them take long.
Node.js 18 or later. Check your version with node --version. If you do not have it, install the current LTS from nodejs.org. Node lets you run JavaScript outside the browser and gives you npm for installing packages.
Basic JavaScript. You should be comfortable writing a script, running it from the terminal, and installing packages with npm. A working knowledge of promises and .then() chains will make the code easier to follow.
A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account docs page. Use the JavaScript (JS) token for the Samsung storefront. Treat the token like a password: it authenticates your requests, so keep it out of version control.
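One way to keep the token out of version control is to read it from an environment variable instead of hardcoding it. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is chosen for this example and is not something the crawlbase package looks for on its own.

```javascript
// Read the Crawlbase token from an environment-style object.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this sketch.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || "").trim();
  if (!token) {
    throw new Error("Set CRAWLBASE_JS_TOKEN before running the scraper.");
  }
  return token;
}

// Usage (assumes the variable is exported in your shell):
//   const api = new CrawlingAPI({ token: getToken() });
```

Failing early with a clear message is a deliberate choice here: a missing token otherwise surfaces later as a less obvious API error.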
Set up the project
Create a folder, initialize a project, and install the two dependencies the scraper needs.
    mkdir scrape-samsung-products && cd scrape-samsung-products
    npm init -y
    npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official client for the Crawling API, and cheerio is a fast, lightweight parser that gives you jQuery-style selectors over the returned HTML with no browser attached. Create an index.js file in the folder; that is where the code below goes.
Step 1: Fetch the rendered page
Start by getting the finished catalog. Import CrawlingAPI from the crawlbase package, initialize it with your JS token, and request the Samsung listing URL. The two wait options matter here: ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering grid appears before the page is captured. Checking the status code before parsing keeps failures loud instead of silent.
    const { CrawlingAPI } = require("crawlbase");

    const api = new CrawlingAPI({ token: "YOUR_CRAWLBASE_JS_TOKEN" });
    const samsungProductsURL =
      "https://www.samsung.com/levant/smartphones/all-smartphones/";

    api
      // Wait for async content, then hold 10 seconds so the grid finishes rendering.
      .get(samsungProductsURL, { ajax_wait: true, page_wait: 10000 })
      .then((response) => {
        if (response.statusCode === 200) {
          // Print the first 500 characters as a quick rendering check.
          console.log(response.body.slice(0, 500));
        } else {
          throw new Error(`Failed to fetch HTML. Status: ${response.statusCode}`);
        }
      })
      .catch(console.error);
Run it with node index.js and you should see real catalog markup, not a stripped-down shell. That output confirms rendering works before you write a single selector. Ten seconds of page_wait is generous, even for a grid this heavy; lower it once you confirm the products are present. If you want more background on why client-side pages need this step, see how to crawl JavaScript websites.
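Printing the first 500 characters shows that something rendered, but not that the grid did. A rough way to check for the products themselves while you tune page_wait is to count the card markers in the returned HTML. This sketch assumes the card class js-pfv2-product-card, which is the class the parser in this guide targets, and assumes class attributes are written with double quotes; countCardMarkers is a helper name invented for the example.

```javascript
// Count class attributes in the HTML that contain the product-card class.
// The class name is taken from this guide's parser and may change over time.
function countCardMarkers(html, className = "js-pfv2-product-card") {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lookarounds ensure the name is a whole class token, not a prefix of another.
  const pattern = new RegExp(
    `class="[^"]*(?<![\\w-])${escaped}(?![\\w-])[^"]*"`,
    "g"
  );
  return (html.match(pattern) || []).length;
}

// Usage inside the .then() handler:
//   console.log(`cards in HTML: ${countCardMarkers(response.body)}`);
```

If the count stays at zero with a long wait, the class name has probably changed; if it drops as you lower page_wait, you have gone too low.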
That single api.get call you just ran did two jobs at once: a Samsung listing needs a rendered page behind a trusted IP, and the Crawling API takes your token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Point it at a public listing on the free tier first.
Step 2: Parse the product fields with cheerio
With rendered HTML in hand, load it into cheerio and walk the product grid. The pattern is: find the repeating product card, then for each card pull the fields you want by CSS selector. Samsung's storefront lays each card out predictably, so you can map the product name, color, variants, rating, specs, and URL to individual selectors. The selectors below come straight from the live catalog markup.
    const cheerio = require("cheerio");

    function scrapeProducts(html) {
      const $ = cheerio.load(html);
      const products = [];

      // Each product card in the grid becomes one record.
      $(".js-pfv2-content-wrap .js-pfv2-product-card").each((_, element) => {
        const card = $(element);

        const name = card
          .find(".pd03-product-card__product-name-text")
          .text()
          .trim();
        const color = card
          .find(".option-selector-v2__color-name-text-in")
          .text()
          .trim();
        // Storage options, one array entry per option.
        const variants = card
          .find(".option-selector-v2__size-text")
          .map((_, el) => $(el).text().trim())
          .get();
        const rating = card
          .find(".rating__point span:last-child")
          .text()
          .trim();
        // Keep only the digits of the review count label.
        const reviewCount = card
          .find(".rating__count")
          .text()
          .replace(/[^0-9]/g, "");
        const specifications = card
          .find(".pd03-product-card__spec-list .pd03-product-card__spec-item")
          .map((_, el) => $(el).text().trim())
          .get();
        const url = card
          .find(".pd03-product-card__product-image-link")
          .attr("href");

        products.push({
          name,
          model: [color, ...variants].filter(Boolean).join(", "),
          rating,
          reviewCount,
          specifications: specifications.join(", "),
          // Prefix the domain when the href is relative.
          url: url && (url.startsWith("http") ? url : `https://www.samsung.com${url}`),
        });
      });

      return products;
    }
Each field maps to a selector you can verify yourself: right-click the element on a live Samsung listing, choose Inspect, and read the class off the highlighted node. .text() reads the visible label; .attr("href") reads the link. The .map().get() pair turns a set of matched elements (the variants and spec bullets) into a plain array. Because hrefs on the storefront are sometimes relative, the code prefixes the domain when the URL does not already start with http. For more on building robust selectors, see the guide to XPath and CSS selectors.
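The startsWith("http") check covers root-relative hrefs such as /levant/..., but it would produce a malformed address for a protocol-relative href such as //www.samsung.com/.... A possible alternative, sketched here with Node's built-in URL class, resolves every form against the storefront origin. The helper name toAbsoluteUrl is invented for this example.

```javascript
// Resolve a card href against the storefront origin with the built-in URL class.
function toAbsoluteUrl(href, base = "https://www.samsung.com") {
  if (!href) return null; // the card had no link
  try {
    return new URL(href, base).href;
  } catch {
    return null; // the href could not be parsed as a URL
  }
}

// Usage inside the card loop:
//   url: toAbsoluteUrl(card.find(".pd03-product-card__product-image-link").attr("href")),
```

Returning null for a missing link keeps the url key in every record, whereas the original expression leaves it undefined and JSON.stringify drops it.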
Step 3: Assemble the full script
Now wire the fetch and the parser together: request the page, hand the body to scrapeProducts, log the result, and write it to a JSON file. This is the complete, copy-pasteable script.
    const { CrawlingAPI } = require("crawlbase");
    const cheerio = require("cheerio");
    const fs = require("fs");

    const api = new CrawlingAPI({ token: "YOUR_CRAWLBASE_JS_TOKEN" });
    const samsungProductsURL =
      "https://www.samsung.com/levant/smartphones/all-smartphones/";

    function scrapeProducts(html) {
      const $ = cheerio.load(html);
      const products = [];
      // The result count the storefront prints above the grid.
      const totalResults = $(".js-pfv2-result-total-count").text().trim();

      $(".js-pfv2-content-wrap .js-pfv2-product-card").each((_, element) => {
        const card = $(element);
        const color = card
          .find(".option-selector-v2__color-name-text-in")
          .text()
          .trim();
        const variants = card
          .find(".option-selector-v2__size-text")
          .map((_, el) => $(el).text().trim())
          .get();
        const url = card
          .find(".pd03-product-card__product-image-link")
          .attr("href");

        products.push({
          name: card.find(".pd03-product-card__product-name-text").text().trim(),
          model: [color, ...variants].filter(Boolean).join(", "),
          rating: card.find(".rating__point span:last-child").text().trim(),
          reviewCount: card.find(".rating__count").text().replace(/[^0-9]/g, ""),
          specifications: card
            .find(".pd03-product-card__spec-list .pd03-product-card__spec-item")
            .map((_, el) => $(el).text().trim())
            .get()
            .join(", "),
          // Prefix the domain when the href is relative.
          url: url && (url.startsWith("http") ? url : `https://www.samsung.com${url}`),
        });
      });

      return { totalResults, products };
    }

    api
      .get(samsungProductsURL, { ajax_wait: true, page_wait: 10000 })
      .then((response) => {
        if (response.statusCode !== 200) {
          throw new Error(`Failed to fetch HTML. Status: ${response.statusCode}`);
        }
        const data = scrapeProducts(response.body);
        console.log(data);
        // Save the parsed catalog as pretty-printed JSON.
        fs.writeFileSync("samsung-scraped.json", JSON.stringify(data, null, 2));
      })
      .catch(console.error);
Run node index.js again. The script fetches the rendered catalog, parses every product card, prints the result, and saves it to samsung-scraped.json. The totalResults field captures the count the storefront shows so you can sanity-check how many products the page reported against how many cards you parsed.
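That sanity check can be automated. The sketch below assumes totalResults contains a single number somewhere in its text (the exact label format on the live page may differ); checkCount is a helper name invented for this example.

```javascript
// Compare the count the page reports with the number of cards parsed.
// Assumes the first run of digits in totalResults is the product count.
function checkCount(data) {
  const match = String(data.totalResults || "").match(/\d+/);
  const reported = match ? Number(match[0]) : null;
  const parsed = data.products.length;
  return { reported, parsed, matches: reported !== null && reported === parsed };
}

// Usage after scrapeProducts:
//   const { reported, parsed, matches } = checkCount(data);
//   if (!matches) console.warn(`Page reports ${reported}, parsed ${parsed}`);
```

A mismatch is not necessarily a parser bug: a listing that loads more cards on scroll can report more products than the captured HTML contains.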
What the output looks like
Each entry is one phone with the fields mapped to clean keys. A trimmed sample of the JSON file:
    {
      "totalResults": "42 results",
      "products": [
        {
          "name": "Galaxy S24 Ultra",
          "model": "Titanium Yellow, 256 GB, 512 GB, 1 TB",
          "rating": "4.3",
          "reviewCount": "128",
          "specifications": "Industry-leading hardware meets world-changing AI, Made with titanium. Built to last, 200MP high-resolution photography",
          "url": "https://www.samsung.com/levant/smartphones/galaxy-s24-ultra/"
        },
        {
          "name": "Galaxy A04s",
          "model": "Black, 64 GB, 128 GB",
          "rating": "5.0",
          "reviewCount": "17",
          "specifications": "6.5\" Infinity-V Display, 50MP Camera, 5000mAh Battery",
          "url": "https://www.samsung.com/levant/smartphones/galaxy-a/galaxy-a04s-black-64gb-sm-a047fzkgmeb/"
        }
      ]
    }
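In this sample, rating and reviewCount are strings, because the parser reads them as text. If you plan to sort or average them, converting to numbers first may help. A minimal sketch, with normalizeProduct as a helper name invented for this example:

```javascript
// Convert the text rating and review count into numbers; use null when absent.
function normalizeProduct(product) {
  const rating = Number.parseFloat(product.rating);
  const reviewCount = Number.parseInt(product.reviewCount, 10);
  return {
    ...product,
    rating: Number.isNaN(rating) ? null : rating,
    reviewCount: Number.isNaN(reviewCount) ? null : reviewCount,
  };
}

// Usage: const cleaned = data.products.map(normalizeProduct);
```

Using null instead of 0 for a missing value keeps unrated products from dragging down an average.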
From here, JSON drops straight into a database or a notebook. If you would rather have a flat file for a spreadsheet, map the array to rows and write a CSV instead; the field set stays the same.
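A CSV export needs some care because the model and specifications fields contain commas and, in places, double quotes. The sketch below quotes such cells in the usual CSV way; the column order and the helper names are choices made for this example.

```javascript
// Column order for the flat file; matches the keys the parser emits.
const COLUMNS = ["name", "model", "rating", "reviewCount", "specifications", "url"];

// Quote a cell when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function csvCell(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the CSV text: one header row, then one row per product.
function toCsv(products) {
  const header = COLUMNS.join(",");
  const rows = products.map((p) => COLUMNS.map((c) => csvCell(p[c])).join(","));
  return [header, ...rows].join("\r\n");
}

// Usage (the output file name is arbitrary):
//   fs.writeFileSync("samsung-scraped.csv", toCsv(data.products));
```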
Scaling across categories and pages
One catalog page is the building block. A real job collects several. Samsung publishes separate listing URLs per region and category (all-smartphones, foldables, tablets, and so on), and longer lists paginate or load more cards as you scroll. To scale, keep a list of listing URLs and run the same fetch-and-parse over each one, then concatenate the results.
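    // Reuses api and scrapeProducts from the full script above.
    const listingURLs = [
      "https://www.samsung.com/levant/smartphones/all-smartphones/",
      "https://www.samsung.com/levant/smartphones/galaxy-z/",
    ];

    async function scrapeAll(urls) {
      const all = [];
      for (const url of urls) {
        const response = await api.get(url, { ajax_wait: true, page_wait: 10000 });
        if (response.statusCode === 200) {
          all.push(...scrapeProducts(response.body).products);
        } else {
          // Report skipped listings instead of dropping them silently.
          console.warn(`Skipping ${url}: status ${response.statusCode}`);
        }
        // Pause between requests to pace the run.
        await new Promise((r) => setTimeout(r, 2000));
      }
      return all;
    }

    scrapeAll(listingURLs)
      .then((all) => console.log(`${all.length} products collected`))
      .catch(console.error);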
The short setTimeout between requests paces the run so you are not hammering the storefront. Because every listing shares the same card layout, the one scrapeProducts parser works across all of them. If a category uses an infinite-scroll grid rather than discrete pages, the Crawling API can scroll the page for you before capture, so the additional cards land in the HTML you parse.
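When listings overlap, concatenating their results can repeat a product. Whether the two example listings overlap is an assumption here, but a category page is often a subset of an "all" page. A small sketch that keeps the first record per product URL, with dedupeByUrl as an invented helper name:

```javascript
// Keep the first record seen for each product URL.
// Records without a URL cannot be compared, so they are all kept.
function dedupeByUrl(products) {
  const seen = new Set();
  return products.filter((product) => {
    if (!product.url) return true;
    if (seen.has(product.url)) return false;
    seen.add(product.url);
    return true;
  });
}

// Usage: const unique = dedupeByUrl(await scrapeAll(listingURLs));
```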
Staying unblocked
The Crawling API handles the hard part of not getting blocked: it renders each page in a real browser and routes the request through rotating residential IPs, so your traffic looks like ordinary visitors rather than a single hammering address. Keep your own request rate reasonable, pace batches with a short delay as shown above, and watch the status codes so you back off when challenges appear. If you build your own stack with Puppeteer instead, the proxy rotation and IP health are the parts to invest in. For a broader treatment, read how to scrape websites without getting blocked.
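Backing off can be as simple as retrying with a delay that doubles each time. The sketch below is one way to do it, not a Crawlbase feature: fetchPage stands for any function that resolves to an object with a statusCode, and the retry count and delays are arbitrary starting values.

```javascript
// Delay before retry number `attempt` (0-based): baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fetchPage(url) until it returns status 200 or the retries run out.
// fetchPage could be, for example:
//   (url) => api.get(url, { ajax_wait: true, page_wait: 10000 })
async function fetchWithBackoff(fetchPage, url, { retries = 3, baseMs = 2000 } = {}) {
  let response;
  for (let attempt = 0; attempt <= retries; attempt++) {
    response = await fetchPage(url);
    if (response.statusCode === 200) return response;
    // Wait longer after each failed attempt, except after the last one.
    if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
  }
  throw new Error(
    `Gave up on ${url} after ${retries + 1} attempts (last status: ${response.statusCode})`
  );
}
```

Passing the fetch function in as an argument keeps the retry logic independent of the API client, so it can be exercised with a stub before it touches the network.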
Is it legal to scrape Samsung?
Scraping public catalog data carries far less risk than scraping content behind a login, but the answer still depends on what you collect, how you use it, and the rules of the site. This guide is deliberately scoped to public product information: the model name, price, rating, review count, key specs, and product URL that Samsung's storefront shows every visitor without an account. That is the data a shopper sees, and collecting it for price tracking or research is the defensible case.
Before you run anything at volume, read Samsung's terms of service and check its robots.txt to see which paths the site asks crawlers to leave alone, and honor those requests. Keep your request rate modest so you are not straining its servers. This walkthrough does not cover anything behind a sign-in, personal or account data, order history, or copyrighted media such as product photography that you would redistribute. Those are out of scope here, and reaching them runs against Samsung's terms.
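To make the robots.txt check concrete, here is a deliberately simplified sketch that collects the Disallow rules for all crawlers and tests a path against them by prefix. It ignores Allow rules and wildcard patterns, so treat a "not disallowed" answer as provisional and read the file yourself. The helper names are invented, and the robots.txt text in the usage comment is a made-up sample, not Samsung's actual file.

```javascript
// Collect Disallow values from the "User-agent: *" group of a robots.txt body.
// Simplification: Allow rules and wildcard patterns are not handled.
function disallowedPaths(robotsTxt) {
  const rules = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (!line || colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      inStarGroup = lastWasAgent ? inStarGroup || value === "*" : value === "*";
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && inStarGroup && value) rules.push(value);
    }
  }
  return rules;
}

// Prefix match of a URL path against the collected rules.
function isDisallowed(pathname, rules) {
  return rules.some((rule) => pathname.startsWith(rule));
}

// Usage with a made-up sample:
//   const rules = disallowedPaths("User-agent: *\nDisallow: /search/\n");
//   isDisallowed(new URL(listingUrl).pathname, rules);
```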
For commercial or high-volume use, prefer official channels. If you need bulk catalog or pricing data for resale or distribution, Samsung's partner and developer programs are the right route, and reaching out for permission or a data feed is cleaner than scraping at scale. When the public catalog is genuinely the only source for what you need, keep the work scoped to that public data, respect the site's stated rules, and consult a legal professional if your use case is commercial or unclear.
Key takeaways
- Render before you parse. Samsung's catalog loads client-side, so a plain fetch returns an empty shell; the Crawling API with the JS token renders the grid first.
- One call covers rendering and a trusted IP. api.get with ajax_wait and page_wait waits for the late-loading grid and routes through residential IPs server-side.
- cheerio does the extraction. Map name, model, rating, review count, specs, and URL to the storefront's card selectors, and expect those classes to drift over time.
- Scale by looping listing URLs with pacing. The same parser works across every category, so a real job is a list of listing links plus a short delay between requests.
- Stay on public data. Collect only catalog fields anyone can see, respect Samsung's ToS and robots.txt, and prefer official channels for commercial or high-volume needs.
Frequently Asked Questions (FAQs)
Why does a plain request return no products from Samsung?
Because the storefront renders its product grid client-side with JavaScript. A raw HTTP request comes back with the page shell but the catalog slots still empty, since the products, prices, and ratings fill in only after the page's scripts run in a browser. The Crawling API's JS token renders the page first, so the cards are present when cheerio parses them.
Do I need the normal token or the JS token for Samsung?
Use the JS token. The normal token fetches static HTML, which on Samsung's storefront leaves the product grid empty. The JS token renders the page in a real browser before handing back the HTML, so the name, model, price, rating, review count, specs, and URL are all present when you parse.
What data can I scrape from a Samsung listing page?
Public catalog fields: the product name, model and variant options, price, average rating, review count, the key spec bullets on each card, and the product URL. This guide stays scoped to that public data. It does not cover account information, order history, or anything behind a sign-in.
My cheerio selectors return empty strings. What changed?
Almost certainly Samsung's markup. Class names like js-pfv2-product-card, pd03-product-card__product-name-text, and rating__point change without notice, so selectors that worked last month can break. Re-inspect a live listing in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
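A small check after each run can surface selector drift before it pollutes your data. This sketch counts, per field, how many records came back empty; the default field list and the helper name emptyFieldReport are choices made for this example.

```javascript
// Report, per field, how many records have an empty or missing value.
function emptyFieldReport(products, fields = ["name", "rating", "reviewCount", "url"]) {
  const report = {};
  for (const field of fields) {
    report[field] = products.filter((p) => !String(p[field] ?? "").trim()).length;
  }
  return report;
}

// Usage:
//   const report = emptyFieldReport(data.products);
//   console.log(report);
```

A field that is empty on every card points to a changed selector; a field that is empty on only some cards may be genuine, for example a product with no ratings yet.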
Can I scrape many Samsung pages without getting blocked?
Yes, within reason. The Crawling API rotates through residential IPs and renders each page in a real browser, so individual requests look like genuine visits. Because the IP rotation happens on the API side, the part you control is your overall request rate: keep it low, add a short delay between requests as shown in the scaling section, and vary your targets instead of looping one path. Watch the status codes and back off when you start seeing challenges.
Does Samsung offer an official way to get product data?
For commercial or high-volume use, Samsung's partner and developer programs are the right channel, and requesting permission or a data feed is cleaner than scraping at scale. Reach for scraping only when the public catalog is genuinely the only source for the public fields you need, and keep the work scoped to that public data.