Noon is one of the largest e-commerce marketplaces in the Middle East, serving millions of shoppers across the UAE, Saudi Arabia, and Egypt. Its catalog spans electronics, fashion, beauty, and groceries, and the prices and ratings on those listing pages are a clean public signal for anyone tracking a competitor, studying a category, or building a price monitor. The data sits right there on each search page: product titles, prices, star ratings, brands, and a link to every item.
This guide shows you how to scrape Noon data with JavaScript and Node.js using cheerio. You build a small, runnable scraper that fetches a Noon search listing through the Crawling API, parses the title, price, rating, brand, and link for each product, handles pagination across result pages, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public product-listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Node.js script that takes a public Noon search URL plus a query, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every product on the listing. We use the UAE storefront and the query "smartphones" as the running example and pull these fields per item:
- Title: the product name, read from the data-qa="product-name" element.
- Price: the numeric amount shown on the card, for example "1,799".
- Currency: the currency label that sits next to the amount, like "AED".
- Rating: the star rating text when the product has reviews.
- Brand: the brand label shown above the product name on the card.
- Link: the absolute URL to the individual product page.
Why a plain request fails on Noon
If you request a Noon search URL with a bare HTTP client, you rarely get the product grid back. Two things work against you. First, Noon renders its listing cards in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run and the product data loads over AJAX. Second, Noon flags automated traffic: datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA, rate-limited, or blocked before they ever reach the rendered listings.
So a working Noon scraper needs two things in one request: a browser that actually renders the page and waits for the AJAX content, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, waits for the dynamic content, and returns finished HTML for you to parse with cheerio.
Noon loads its product cards asynchronously, so the parameters that matter most are ajax_wait and page_wait. Setting ajax_wait: 'true' tells the Crawling API to hold the request until background fetches settle, and page_wait adds a fixed delay (in milliseconds) so slow cards have time to paint before the HTML is captured. Both options apply to JavaScript-rendered requests, so use the JavaScript token from your Crawlbase dashboard rather than the normal one.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs or any beginner course will get you to the level this tutorial assumes. The companion guide on how to build a web scraper with Node.js covers the basics if you want a refresher.
Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account docs page. The free tier gives you 1,000 requests with no card. Treat the token like a password: it authenticates your requests, so keep it out of version control.
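One way to keep the token out of source files is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_TOKEN, which is a naming choice for this example and not something the crawlbase client requires; getToken is likewise a hypothetical helper.

```javascript
// Read the Crawlbase token from the environment instead of hard-coding it.
// CRAWLBASE_TOKEN is an assumed variable name chosen for this example.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set the CRAWLBASE_TOKEN environment variable first.');
  }
  return token;
}

// Usage, once the crawlbase package is installed:
// const { CrawlingAPI } = require('crawlbase');
// const api = new CrawlingAPI({ token: getToken() });
```

Set the variable in your shell before running the script, and the token never needs to appear in a committed file.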
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
node --version
mkdir noon-scraper && cd noon-scraper
npm init -y
npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named noon-scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, and request a Noon search URL with the AJAX-wait options set. Checking the status code before you parse keeps failures loud instead of silent.
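const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// Build the search URL; encode the query so spaces and symbols survive.
function searchUrl(query, page) {
  return `https://www.noon.com/uae-en/search/?q=${encodeURIComponent(query)}&page=${page}`;
}

// Wait for AJAX content, then pause 5 seconds before the HTML is captured.
const options = { ajax_wait: 'true', page_wait: '5000' };

api
  .get(searchUrl('smartphones', 1), options)
  .then((response) => {
    if (response.statusCode === 200) {
      // Print the start of the body to confirm real listing markup came back.
      console.log(response.body.slice(0, 500));
    } else {
      // Report non-200 responses instead of failing silently.
      console.error(`Request failed with status ${response.statusCode}`);
    }
  })
  .catch((error) => console.error('API request error:', error));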
Run the script with node noon-scraper.js and you should see real Noon listing markup at the top of the body, not a stripped-down shell. That confirms rendering and the AJAX wait worked before you write a single selector. If you do not need custom fields and would rather have structured JSON back without writing a parser, pass autoparse: 'true' in the options and the API returns parsed data directly.
That first request just returned a fully rendered Noon search page, AJAX cards and all, without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, waits for the dynamic content with ajax_wait and page_wait, rotates through residential IPs server-side, and handles the CAPTCHAs Noon throws at scrapers, so you get finished HTML from one call. Point it at the smartphones search on the free tier first.
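The searchUrl helper in Step 1 builds the address by string interpolation. An alternative sketch using Node's built-in URL class is shown here, which handles the encoding of spaces and symbols in the query for you. The buildSearchUrl name and the storefront argument are assumptions for illustration; only the uae-en path comes from this guide.

```javascript
// Build a Noon search URL with the query encoded by the URL class.
// buildSearchUrl and the storefront parameter are illustrative names,
// not part of any library.
function buildSearchUrl(query, page = 1, storefront = 'uae-en') {
  const url = new URL(`https://www.noon.com/${storefront}/search/`);
  url.searchParams.set('q', query);
  url.searchParams.set('page', String(page));
  return url.href;
}

console.log(buildSearchUrl('smartphones', 1));
// https://www.noon.com/uae-en/search/?q=smartphones&page=1
console.log(buildSearchUrl('wireless earbuds', 2));
// https://www.noon.com/uae-en/search/?q=wireless+earbuds&page=2
```

The space in the second query is serialized as a plus sign, which is the standard form encoding for query strings.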
Step 2: Parse each product with cheerio
With rendered HTML in hand, load it into cheerio and walk the product cards. Noon lays each search result out in a repeating container, so you select every card, then read the title, price, currency, rating, brand, and link from inside it. Reading each field defensively keeps one missing value from crashing the run.
const cheerio = require('cheerio');

function extractProducts(html) {
  const $ = cheerio.load(html);
  const products = [];

  // Each search result sits in its own repeating card container.
  $('div.grid > span.productContainer').each((index, element) => {
    const card = $(element);
    const title = card.find('div[data-qa="product-name"]').text().trim();
    const price = card.find('strong.amount').text().trim();
    const currency = card.find('span.currency').text().trim();
    const rating = card.find('div.dGLdNc').text().trim();
    const brand = card.find('div[data-qa="product-brand"]').text().trim();

    // Resolve the relative href against the site root to get an absolute URL.
    const href = card.find('a').attr('href');
    const link = href ? new URL(href, 'https://www.noon.com').href : '';

    // Skip ad slots and placeholder cards that lack a title or a price.
    if (title && price) {
      products.push({ title, price, currency, rating, brand, link });
    }
  });

  return products;
}
A few details keep this faithful to the page. The title comes from the data-qa="product-name" element, the numeric price sits in strong.amount, and the currency label is read separately from span.currency so you can keep "AED" out of the amount. The rating text lives in div.dGLdNc and is empty for products with no reviews, the brand sits in the data-qa="product-brand" element, and the link is read from the card's anchor href and resolved to an absolute URL so it works outside the page. The guard at the end only pushes a record when both a title and a price are present, which drops ad slots and empty placeholder cards.
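The parser keeps price and rating as the strings shown on the card. If you plan to sort or compare them, a small normalization pass helps. The sketch below assumes prices use a comma as the thousands separator, as in the "1,799" example, and that the rating text begins with a decimal number; parsePrice, parseRating, and normalizeProduct are hypothetical helper names.

```javascript
// Convert a "1,799"-style price string to a number, or null if unparseable.
// Assumes a comma is the thousands separator.
function parsePrice(text) {
  const cleaned = String(text).replace(/,/g, '').trim();
  const value = Number.parseFloat(cleaned);
  return Number.isFinite(value) ? value : null;
}

// Pull the first decimal number out of the rating text, or null if absent.
function parseRating(text) {
  const match = String(text).match(/\d+(\.\d+)?/);
  return match ? Number.parseFloat(match[0]) : null;
}

// Keep the original strings and add numeric companions next to them.
function normalizeProduct(product) {
  return {
    ...product,
    priceValue: parsePrice(product.price),
    ratingValue: parseRating(product.rating),
  };
}

const sample = {
  title: 'Example phone',
  price: '3,199',
  currency: 'AED',
  rating: '4.5',
  brand: 'Example',
  link: '',
};
console.log(normalizeProduct(sample));
```

Keeping the raw strings alongside the parsed numbers means a parsing mistake never destroys what the page actually showed.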
Noon's hashed class names (dGLdNc and similar) are generated by its build and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. The stable data-qa attributes tend to outlast the hashed classes, so prefer them where the page offers one. Periodic selector maintenance is normal for any production scraper.
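A rough way to notice selector drift early is to measure how often each field comes back non-empty. The following is a minimal sketch under that assumption; fieldFillRates and suspectFields are hypothetical helpers, and the 10% threshold is an arbitrary starting point, not a recommendation from Noon or Crawlbase.

```javascript
// Report what fraction of records have a non-empty value for each field.
function fieldFillRates(products, fields) {
  const rates = {};
  for (const field of fields) {
    const filled = products.filter(
      (p) => String(p[field] ?? '').trim() !== ''
    ).length;
    rates[field] = products.length === 0 ? 0 : filled / products.length;
  }
  return rates;
}

// List the fields whose fill rate falls below the threshold.
function suspectFields(products, fields, threshold = 0.1) {
  const rates = fieldFillRates(products, fields);
  return fields.filter((field) => rates[field] < threshold);
}

const records = [
  { title: 'A', price: '10', rating: '' },
  { title: 'B', price: '20', rating: '' },
];
console.log(fieldFillRates(records, ['title', 'price', 'rating']));
console.log(suspectFields(records, ['title', 'price', 'rating']));
```

Ratings are legitimately empty for products with no reviews, so a low rate there is only suspicious when it drops to zero across a whole run. Because extractProducts drops cards without a title or price, a broken title or price selector shows up as zero records, so check the record count as well.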
Step 3: Handle pagination across result pages
One search page is a demo; a real run walks the full result set. Noon paginates its search with a page query parameter, so you fetch each page in turn, parse it with the function from Step 2, and stop when a page comes back with no products. That empty-page check is what keeps you from looping past the end of the results.
// Fetch one rendered search page; return null on a non-200 status.
async function fetchPage(query, page) {
  const options = { ajax_wait: 'true', page_wait: '5000' };
  const response = await api.get(searchUrl(query, page), options);
  if (response.statusCode === 200) return response.body;
  console.error(`Failed to fetch page ${page}: ${response.statusCode}`);
  return null;
}

// Walk the result pages until maxPages, a failed fetch, or an empty page.
async function scrapeAllPages(query, maxPages) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    console.log(`Scraping page ${page}...`);
    const html = await fetchPage(query, page);
    if (!html) break;

    const products = extractProducts(html);
    if (products.length === 0) {
      console.log('No more results found. Stopping.');
      break;
    }
    all.push(...products);
  }
  return all;
}
The fetchPage helper wraps a single request and returns null on a non-200 status, and scrapeAllPages loops from page 1 up to maxPages, breaking early when a fetch fails or a page yields zero products. Each JavaScript-rendered request costs more credits than a plain one, so keep maxPages modest while you test: confirm the parser on one or two pages before you turn the count up.
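Because every rendered request spends credits, it can help to check the stopping rules offline first. The sketch below restates the loop with the fetch and parse steps passed in as arguments, then runs it against stub pages. The names paginate, fakeFetch, and fakeParse are hypothetical and exist only for this illustration.

```javascript
// Pagination loop with its fetch and parse steps injected, so the stopping
// rules can be exercised without any network calls.
async function paginate(fetchPage, parsePage, maxPages) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const html = await fetchPage(page);
    if (!html) break; // failed request: stop instead of skipping ahead
    const items = parsePage(html);
    if (items.length === 0) break; // empty page: end of the results
    all.push(...items);
  }
  return all;
}

// Stub data standing in for result pages; page 3 loads but holds no items.
const fakePages = { 1: 'items:a,b', 2: 'items:c', 3: 'items:' };
const fakeFetch = async (page) => fakePages[page] ?? null;
const fakeParse = (html) =>
  html.slice('items:'.length).split(',').filter(Boolean);

paginate(fakeFetch, fakeParse, 10).then((items) => {
  console.log(items); // [ 'a', 'b', 'c' ]
});
```

Even with a limit of 10, the loop stops at page 3 because that page parses to zero items, which is the same rule scrapeAllPages relies on.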
Step 4: Assemble the full script with JSON and CSV export
Now wire the fetch, the parse, and the pagination into one runnable script, then write the records to disk as both JSON and CSV.
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// Build the search URL for a query and page number.
function searchUrl(query, page) {
  return `https://www.noon.com/uae-en/search/?q=${encodeURIComponent(query)}&page=${page}`;
}

// Parse every product card on a rendered search page.
function extractProducts(html) {
  const $ = cheerio.load(html);
  const products = [];
  $('div.grid > span.productContainer').each((index, element) => {
    const card = $(element);
    const title = card.find('div[data-qa="product-name"]').text().trim();
    const price = card.find('strong.amount').text().trim();
    const currency = card.find('span.currency').text().trim();
    const rating = card.find('div.dGLdNc').text().trim();
    const brand = card.find('div[data-qa="product-brand"]').text().trim();
    const href = card.find('a').attr('href');
    const link = href ? new URL(href, 'https://www.noon.com').href : '';
    if (title && price) {
      products.push({ title, price, currency, rating, brand, link });
    }
  });
  return products;
}

// Fetch one rendered page; return null on a non-200 status.
async function fetchPage(query, page) {
  const options = { ajax_wait: 'true', page_wait: '5000' };
  const response = await api.get(searchUrl(query, page), options);
  if (response.statusCode === 200) return response.body;
  console.error(`Failed to fetch page ${page}: ${response.statusCode}`);
  return null;
}

// Walk the result pages until maxPages, a failed fetch, or an empty page.
async function scrapeAllPages(query, maxPages) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    console.log(`Scraping page ${page}...`);
    const html = await fetchPage(query, page);
    if (!html) break;
    const products = extractProducts(html);
    if (products.length === 0) break;
    all.push(...products);
  }
  return all;
}

// Serialize records to CSV, quoting every field and doubling embedded quotes.
function toCsv(rows) {
  const headers = ['title', 'price', 'currency', 'rating', 'brand', 'link'];
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const query = 'smartphones';
  const maxPages = 3;
  const products = await scrapeAllPages(query, maxPages);
  if (products.length === 0) {
    console.log('No products found; nothing written.');
    return;
  }
  fs.writeFileSync('noon-products.json', JSON.stringify(products, null, 2));
  fs.writeFileSync('noon-products.csv', toCsv(products));
  console.log(`Saved ${products.length} products to JSON and CSV`);
}

// Surface request or file errors instead of leaving an unhandled rejection.
main().catch((error) => console.error('Scrape failed:', error));
Run it with node noon-scraper.js and you get two files: noon-products.json with the full structured records and noon-products.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because product titles are long and frequently contain commas. Change the query to any search term and bump maxPages when you are ready for a wider pull.
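To see the quoting rule in action, here is a standalone sketch of the same escaping applied to a title that contains both a comma and a double quote. It also maps null or undefined to an empty cell, which is an addition for illustration and not part of the script in this guide; escapeCell and toCsvRow are hypothetical names.

```javascript
// Quote every cell and double any embedded quotes.
// Mapping null/undefined to an empty string is an addition for this sketch.
function escapeCell(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Join the escaped cells of one record in header order.
function toCsvRow(record, headers) {
  return headers.map((h) => escapeCell(record[h])).join(',');
}

const row = toCsvRow(
  { title: 'Case for 6.7" phone, black', price: '1,799', rating: undefined },
  ['title', 'price', 'rating']
);
console.log(row);
// "Case for 6.7"" phone, black","1,799",""
```

The comma inside the title and inside the price stays within its quoted cell, so a spreadsheet still reads three columns.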
What the output looks like
The JSON file holds one object per product in the order Noon returned them, each with the title, price, currency, rating, brand, and link.
[
  {
    "title": "Galaxy S25 AI Dual SIM Silver Shadow 12GB RAM 256GB 5G",
    "price": "3,199",
    "currency": "AED",
    "rating": "4.5",
    "brand": "Samsung",
    "link": "https://www.noon.com/uae-en/galaxy-s25-ai-dual-sim-silver-shadow-12gb-ram-256gb-5g/N70140511V/p/"
  },
  {
    "title": "A78 5G Dual SIM Glowing Black 8GB RAM 256GB",
    "price": "899",
    "currency": "AED",
    "rating": "4.3",
    "brand": "OPPO",
    "link": "https://www.noon.com/uae-en/a78-5g-dual-sim-glowing-black-8gb-ram-256gb/N70115717V/p/"
  }
]
The CSV mirrors the same rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.
title,price,currency,rating,brand,link
"Galaxy S25 AI Dual SIM Silver Shadow 12GB RAM 256GB 5G","3,199","AED","4.5","Samsung","https://www.noon.com/uae-en/galaxy-s25-ai-dual-sim-silver-shadow-12gb-ram-256gb-5g/N70140511V/p/"
"A78 5G Dual SIM Glowing Black 8GB RAM 256GB","899","AED","4.3","OPPO","https://www.noon.com/uae-en/a78-5g-dual-sim-glowing-black-8gb-ram-256gb/N70115717V/p/"
From here the records feed straight into price tracking and research work. For a wider view of turning listing data into decisions, see how to use web scraping for price intelligence, and the general guide to ecommerce web scraping covers the patterns that carry across marketplaces.
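As one illustration of price tracking, the sketch below compares two saved snapshots and lists the products whose price changed. It assumes records are matched on their link and that prices are "1,799"-style strings; diffPrices and toNumber are hypothetical helpers, and the sample records use placeholder links, not real listings.

```javascript
// Convert a "1,799"-style price string to a number (NaN if unparseable).
function toNumber(price) {
  return Number.parseFloat(String(price).replace(/,/g, ''));
}

// Compare two snapshots and return the products whose price changed.
// Records are matched on their link.
function diffPrices(previous, current) {
  const before = new Map(previous.map((p) => [p.link, p]));
  const changes = [];
  for (const item of current) {
    const old = before.get(item.link);
    if (!old) continue; // new listing, nothing to compare against
    const oldPrice = toNumber(old.price);
    const newPrice = toNumber(item.price);
    const comparable = Number.isFinite(oldPrice) && Number.isFinite(newPrice);
    if (comparable && oldPrice !== newPrice) {
      changes.push({
        title: item.title,
        link: item.link,
        oldPrice,
        newPrice,
        delta: newPrice - oldPrice,
      });
    }
  }
  return changes;
}

// Placeholder snapshots; in practice, load two saved noon-products.json files.
const yesterday = [{ title: 'Phone A', price: '3,199', link: 'https://example.com/a' }];
const today = [{ title: 'Phone A', price: '2,999', link: 'https://example.com/a' }];
console.log(diffPrices(yesterday, today));
```

In a real run you would read each snapshot with JSON.parse(fs.readFileSync(...)) from files saved on different days.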
Staying unblocked
Even with rendering handled, Noon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Introduce a delay between page fetches rather than hammering the search in a tight loop. Spreading requests out is the single biggest factor in staying under Noon's rate limits.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a CAPTCHA. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
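The pacing and back-off habits above can be sketched as a pair of small helpers. This is a minimal illustration, not a tuned policy: sleep and withBackoff are hypothetical names, the delays are placeholders and do not reflect Noon's documented limits, and requestFn is assumed to resolve to an object with a statusCode property, as the responses in this guide do.

```javascript
// Wait for a given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async request with exponential backoff on non-200 responses.
// requestFn is assumed to resolve to an object with a statusCode property.
async function withBackoff(requestFn, { retries = 3, baseDelayMs = 1000 } = {}) {
  let response = null;
  for (let attempt = 0; attempt <= retries; attempt++) {
    response = await requestFn();
    if (response.statusCode === 200) return response;
    if (attempt < retries) {
      // The delay doubles on each retry: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  return response; // last non-200 response, left for the caller to inspect
}

// Stub request that fails twice with 429, then succeeds.
let calls = 0;
const flaky = async () => ({ statusCode: ++calls < 3 ? 429 : 200 });

withBackoff(flaky, { baseDelayMs: 10 }).then((response) => {
  console.log(`Status ${response.statusCode} after ${calls} attempts`);
});
```

For pacing between pages, a call such as await sleep(2000) inside the pagination loop spaces the requests out; the two-second figure is only an example value.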
Because Noon renders client-side, the same render-then-parse approach applies to other dynamic stores. The broader playbook lives in how to scrape websites without getting blocked, and if you want the underlying technique on its own, how to crawl JavaScript websites walks through rendering in more depth.
Is it legal to scrape Noon?
Whether scraping Noon is allowed depends on Noon's terms of service, your jurisdiction, and what you do with the data. Noon's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it only makes the technical part work. Read Noon's Terms of Use and its robots.txt, and treat both as the boundary for what you collect.
A few lines worth holding to. Collect only public product data: the title, price, currency, rating, brand, and product link that anyone can see on a search page without an account. Respect Noon's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable reviewers beyond the public review text and star counts shown on the page. Do not redistribute Noon's copyrighted media, such as product photography, as if it were your own. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public search and listing data because that is the line that keeps the work defensible. It does not cover anything behind a login, customer or seller personal data, order history, or any attempt to bypass authentication or a CAPTCHA you were not meant to pass. If Noon or one of its partners offers a sanctioned data feed or an official API for your use case, that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public listings, an official agreement is the correct path, not a cleverer scraper.
Key takeaways
- Noon renders listings client-side and blocks hard. A plain request returns an empty shell or a CAPTCHA, so you must render the page and wait for its AJAX content behind a trusted IP before you parse it.
- The Crawling API does it in one call. Pass ajax_wait: 'true' and page_wait so the dynamic cards load, and the API rotates residential IPs and handles CAPTCHAs server-side; add autoparse: 'true' if you want JSON instead of raw HTML.
- cheerio extracts the fields. Select every span.productContainer, then read title, price, currency, rating, brand, and link, preferring the stable data-qa attributes since the hashed class names drift.
- Pagination is a page parameter. Loop the page query value and stop when a page returns zero products, keeping maxPages modest while you test because each rendered request costs more credits.
- Stay on public data. Respect Noon's ToS and robots.txt, pace your requests, avoid personal data and login-walled content, and prefer an official feed for volume or commercial use.
Frequently Asked Questions (FAQs)
What data can I scrape from a Noon search page?
The public fields on each search card: the product title, the numeric price, the currency, the star rating when the item has reviews, the brand, and the link to the product page. This guide reads exactly those fields from every span.productContainer in the grid. Anything behind a login, such as account details or order history, is out of scope and is not public data.
Why does a plain request return incomplete data from Noon?
Because Noon renders its product grid client-side with JavaScript and loads the cards over AJAX, then challenges automated traffic with CAPTCHAs. A raw HTTP request from a datacenter IP usually returns an empty shell or a block page rather than the product cards. To get a complete page you have to render it, wait for the AJAX content, and request it behind a trusted IP, which is what the Crawling API handles for you.
What do ajax_wait and page_wait do?
They control how long the Crawling API waits before capturing the HTML. Setting ajax_wait: 'true' holds the request until background fetches settle, and page_wait adds a fixed delay in milliseconds so slow-loading cards have time to paint. Both matter on Noon because the product data arrives after the initial page load, not in the first response.
How do I handle pagination on Noon?
Noon paginates its search with a page query parameter, so you increment it and fetch each page in turn. The scrapeAllPages function in this guide loops from page 1 up to a maxPages limit and stops the moment a page returns zero products, which is how it detects the end of the results without guessing the page count in advance.
My selectors return empty values. What changed?
Almost certainly Noon's markup. Its hashed class names like dGLdNc are generated by the build and change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selector, and prefer the stable data-qa attributes where the page offers one. Periodic selector maintenance is normal for any production scraper.
Can I scrape customer personal data from Noon?
No, and this guide does not cover it. Customer account details, order history, and anything behind a login are not public data. Scraping login-walled content, personal data about reviewers beyond the public review text, or bypassing authentication is out of scope here and runs against Noon's terms. For sanctioned access at volume, an official data agreement is the correct route.
