Walmart runs one of the largest retail catalogs on the web, and almost all of it is public: search results, product detail pages, customer reviews, best-seller rankings, and the sponsored placements scattered through both search and category pages. That data feeds price tracking, competitor research, assortment analysis, and demand forecasting, which is why developers keep reaching for it. The problem is getting it out cleanly, because Walmart renders most of a page in the browser and challenges automated traffic hard.

This guide is the roadmap for the whole Walmart cluster. It maps the data surfaces you can scrape, explains why a plain HTTP request comes back nearly empty, and walks through one runnable end-to-end example in JavaScript and Node.js: a Walmart search results page parsed into structured JSON with cheerio, fetched through the Crawling API. It stays scoped to public product data, points you to the deeper per-topic guides for prices, reviews, best sellers, and ads, and ends with an honest read on the legal side. Read that section before you point this at real volume.

The Walmart data surfaces you can scrape

Before any code, it helps to know what is actually on the table. Walmart exposes several distinct public surfaces, and each one is a different scraping job with its own layout and its own selectors. Treat these as the map for the cluster:

  • Search results (SERP): the list view you get from a query like "iphone 14 pro". Each card carries a title, price, rating, review count, delivery message, and a sponsored flag. This is the running example below, and the deeper walkthrough lives in scrape Walmart search with Python.
  • Product pages: the full detail page for a single item, with long description, specifications, every image, seller, and the live price. The dedicated guide is scrape a Walmart product page with Selenium.
  • Prices: the single field most teams want on a schedule, for competitive monitoring and repricing. See how to scrape Walmart prices easily.
  • Reviews: the public rating distribution and individual review text on a product page, useful for sentiment and quality signals. See the Walmart reviews scraping guide.
  • Best sellers: Walmart's ranked category lists, a fast read on what is moving. See scrape Walmart best sellers.
  • Sponsored ads: the paid placements inside search and category results, worth isolating for ad-intelligence work. See Walmart sponsored ads scraping.

The technique is the same across all of them: render the page, then parse the fields you want with CSS selectors. The example here uses search results because the card layout shows almost every field type Walmart has, so once you can read a SERP card you can read the rest.

What you will build

A Node.js script that takes a public Walmart search URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for each product on the results page. We use the query "iphone 14 pro" as the running example and pull these fields per item:

  • Title: the product name, for example "Straight Talk Apple iPhone 14 Pro Max, 128GB, Silver".
  • Price: the listed price as shown, like "1,099".
  • Currency: the currency symbol, for instance "$".
  • Review star: the rating string, such as "4.4 out of 5 Stars. 31 reviews".
  • Reviews count: the number of reviews on the card.
  • Delivery message: the fulfillment line, like "Free shipping, arrives in 3+ days".
  • Product badge: a card badge such as "Popular pick" when present.
  • Inventory status: "In Stock" or "Out of stock".
  • Is sponsored: a boolean flag for paid placements.
  • Image: the product thumbnail URL.

Why a plain request fails on Walmart

If you request a Walmart search URL with a bare HTTP client, you get a response that is missing almost everything you came for. Two forces work against you. First, Walmart renders prices, ratings, delivery messages, and most of the card detail in the browser with JavaScript, so the initial HTML is a thin shell until the page's scripts run. Second, Walmart flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged, rate-limited, or blocked before they ever reach the rendered content.

So a working Walmart scraper needs two things in a single request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into one call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. For the proxy side of this trade-off specifically, the Walmart scraping proxies benchmark goes deeper.

Normal vs JavaScript requests

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript token renders the page in a real browser first. Walmart loads key card fields client-side, so the JavaScript token gives you the most complete page. The normal token can return a partial result with prices or ratings missing, leaving you nothing reliable to parse. JavaScript requests cost more credits, so reach for them where the page genuinely needs rendering.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs and any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, see our guide on how to build a web scraper with Node.js.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account docs page. You get 1,000 free requests on signup with no card, which is plenty for this project. Treat the token like a password: it authenticates your requests, so keep it out of version control.
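
One way to keep the token out of version control is to read it from an environment variable rather than pasting it into the script. The sketch below assumes a variable named CRAWLBASE_TOKEN and a helper called getToken; both names are chosen for this example and are not something the crawlbase client requires.

```javascript
// Read the Crawlbase token from the environment instead of hard-coding it.
// CRAWLBASE_TOKEN and getToken are hypothetical names chosen for this example.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_TOKEN before running the scraper');
  }
  return token;
}

// Usage: pass the result wherever the examples show 'YOUR_CRAWLBASE_TOKEN'.
// const api = new CrawlingAPI({ token: getToken() });
```

On macOS or Linux you would then start the script with CRAWLBASE_TOKEN=your_token node scraper.js, so the secret never lands in a committed file.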

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash
node --version

mkdir walmart-scraper && cd walmart-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. If selectors are new to you, the primer on XPath and CSS selectors is a good companion. The legacy version of this project also used Express to expose a /scrape endpoint, which is a fine pattern if you want to call the scraper over HTTP, but it is optional and we keep the core script self-contained here.

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, and request the search URL. Checking the status code before you parse keeps failures loud instead of silent.

javascript
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const response = await api.get(pageUrl);
  if (response.statusCode === 200) {
    return response.body;
  }
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

const searchUrl = 'https://www.walmart.com/search?q=iphone+14+pro';
crawl(searchUrl).then((html) => {
  console.log(html ? html.slice(0, 500) : 'No HTML returned');
});

Save the code as scraper.js, run it with node scraper.js, and you should see real Walmart markup, not a stripped-down shell. That confirms rendering and IP rotation are working before you write a single selector. If the body comes back thin, the page needed a JavaScript request: switch to your JavaScript token, which renders the page in a real browser before returning the HTML.
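
Eyeballing the first 500 characters works once, but a quick programmatic check makes "thin" easier to spot on every run. The sketch below is a heuristic under one assumption: that a rendered results page contains the product-title attribute the parser in Step 2 keys on. The helper names looksRendered and countCards are invented for this example.

```javascript
// Heuristic check: does the returned HTML look like a rendered results page?
// The marker is the product-title attribute used by the parser in Step 2.
// It is an assumption and may need updating when Walmart's markup changes.
const CARD_MARKER = 'data-automation-id="product-title"';

function looksRendered(html) {
  if (typeof html !== 'string' || html.length === 0) return false;
  return html.includes(CARD_MARKER);
}

// Rough card count: how many times the marker appears in the HTML.
function countCards(html) {
  if (typeof html !== 'string') return 0;
  return html.split(CARD_MARKER).length - 1;
}
```

If looksRendered returns false for a 200 response, that is a reasonable cue to retry the same URL with the JavaScript token before touching any selectors.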

Crawlbase Walmart Scraper

That first call just rendered a Walmart search page behind a trusted IP without you touching a browser or a proxy. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and hands back finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public search page on the free tier first.

Step 2: Parse each product with cheerio

With rendered HTML in hand, load it into cheerio and walk the result cards. Walmart lays each search result out in a repeating list-view block, so you select every card, then read the fields from inside it. The selectors below come straight from a live Walmart SERP. Reading each field defensively keeps one missing value from crashing the run.

javascript
const cheerio = require('cheerio');

function parseProductsFromHTML(html) {
  const $ = cheerio.load(html);
  const products = [];

  $('div[role="group"] div[data-testid="list-view"]').each((_, element) => {
    const el = $(element);
    const title = el.find('[data-automation-id="product-title"]').text();
    if (!title) return;

    products.push({
      title,
      image: el.find('[data-testid="productTileImage"]').attr('src'),
      price: el.find('[data-automation-id="product-price"] .lh-copy span.f2').text(),
      // .first() replaces the jQuery-only :first pseudo-selector, which older cheerio versions reject
      currency: el.find('[data-automation-id="product-price"] .f6.f5-l').first().text(),
      reviewsCount: el.find('[aria-hidden=true].f7').text(),
      reviewStar: el.find('.flex.items-center.mt2 .w_iUH7').text(),
      deliveryMessage: el.find('[data-automation-id="fulfillment-badge"]').text().trim(),
      productBadge: el.find('.tag-leading-badge').text(),
      inventoryStatus: el.find('[data-automation-id="inventory-status"]').text() || 'In Stock',
      isSponsored: Boolean(el.find('.lh-title > .gray.f7').text()),
    });
  });

  return { products, productsCount: products.length };
}

This walks every list-view card on the page and reads each field by its selector: title, image, price, currency, reviewsCount, reviewStar, deliveryMessage, productBadge, inventoryStatus, and the isSponsored flag. The inventory status falls back to "In Stock" when no out-of-stock marker is present, and isSponsored resolves to a boolean by checking for the small sponsored label inside the card. The same loop is what you would adapt to isolate paid placements for ad work, since the sponsored flag is already in the record.
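
Every field the parser returns is a string, including price ("1,099") and reviewsCount ("31"), which is awkward for sorting or price comparisons. A minimal sketch of a post-processing step follows. The helper names and the added priceValue, reviewsCountValue, and ratingValue fields are hypothetical, and the rating pattern assumes the "4.4 out of 5 Stars. 31 reviews" format shown in this guide.

```javascript
// Convert a display string such as "1,099" into a number.
// Returns null when nothing numeric can be parsed, instead of guessing.
function toNumber(text) {
  const cleaned = String(text ?? '').replace(/[^0-9.]/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

// Add numeric copies of the string fields; the original strings are kept.
function normalizeProduct(product) {
  // reviewStar looks like "4.4 out of 5 Stars. 31 reviews"; take the leading number.
  const ratingMatch = /^\s*(\d+(?:\.\d+)?)/.exec(product.reviewStar || '');
  return {
    ...product,
    priceValue: toNumber(product.price),
    reviewsCountValue: toNumber(product.reviewsCount),
    ratingValue: ratingMatch ? Number(ratingMatch[1]) : null,
  };
}
```

Keeping the raw strings alongside the parsed numbers makes it easier to debug a record when a selector starts returning something unexpected.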

Selectors drift

Walmart's class names and data-testid markers change without notice, and they differ between search, product, and category pages. Treat the selectors above as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
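
Drift is easier to catch if each run reports how many records came back with an empty field. The sketch below is one way to do that over the products array the parser returns; emptyFieldReport and suspectFields are names invented for this example, and the "empty on every card" rule is a rough heuristic, not a guarantee that a selector is broken.

```javascript
// Count, per field, how many parsed records came back empty.
function emptyFieldReport(products) {
  const report = {};
  for (const product of products) {
    for (const [field, value] of Object.entries(product)) {
      if (!(field in report)) report[field] = 0;
      const isEmpty =
        value === undefined || value === null || String(value).trim() === '';
      if (isEmpty) report[field] += 1;
    }
  }
  return report;
}

// Fields that are empty on every record are the likeliest drifted selectors.
// Some fields, such as productBadge, are legitimately empty on many cards,
// so review this list rather than treating it as proof.
function suspectFields(products) {
  const report = emptyFieldReport(products);
  return Object.keys(report).filter(
    (field) => products.length > 0 && report[field] === products.length
  );
}
```

Logging suspectFields(data.products) at the end of a run gives you a short list of selectors to re-inspect in dev tools.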

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured records as JSON.

javascript
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const response = await api.get(pageUrl);
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

function parseProductsFromHTML(html) {
  const $ = cheerio.load(html);
  const products = [];
  $('div[role="group"] div[data-testid="list-view"]').each((_, element) => {
    const el = $(element);
    const title = el.find('[data-automation-id="product-title"]').text();
    if (!title) return;
    products.push({
      title,
      image: el.find('[data-testid="productTileImage"]').attr('src'),
      price: el.find('[data-automation-id="product-price"] .lh-copy span.f2').text(),
      // .first() replaces the jQuery-only :first pseudo-selector, which older cheerio versions reject
      currency: el.find('[data-automation-id="product-price"] .f6.f5-l').first().text(),
      reviewsCount: el.find('[aria-hidden=true].f7').text(),
      reviewStar: el.find('.flex.items-center.mt2 .w_iUH7').text(),
      deliveryMessage: el.find('[data-automation-id="fulfillment-badge"]').text().trim(),
      productBadge: el.find('.tag-leading-badge').text(),
      inventoryStatus: el.find('[data-automation-id="inventory-status"]').text() || 'In Stock',
      isSponsored: Boolean(el.find('.lh-title > .gray.f7').text()),
    });
  });
  return { products, productsCount: products.length };
}

async function main() {
  const searchUrl = 'https://www.walmart.com/search?q=iphone+14+pro';
  const html = await crawl(searchUrl);
  if (!html) return;
  const data = parseProductsFromHTML(html);
  console.log(JSON.stringify(data, null, 2));
}

main();

Once a URL is provided, the script renders the page through the Crawling API, parses every product card, and returns a clean object with the product list and a count. That is the whole loop, from URL to structured data, in one file.

What the output looks like

Run the full script with node scraper.js and you get a clean object: a products array, one entry per card, plus a productsCount. It is ready to write to JSON, CSV, or a database.

json
{
  "products": [
    {
      "title": "Straight Talk Apple iPhone 14 Pro Max, 128GB, Silver- Prepaid Smartphone",
      "image": "https://i5.walmartimages.com/seo/Straight-Talk-Apple-iPhone-14-Pro-Max.jpeg",
      "price": "1,099",
      "currency": "$",
      "reviewsCount": "31",
      "reviewStar": "4.4 out of 5 Stars. 31 reviews",
      "deliveryMessage": "Free shipping, arrives in 3+ days",
      "productBadge": "Popular pick",
      "inventoryStatus": "In Stock",
      "isSponsored": true
    },
    {
      "title": "Restored Apple iPhone 14 Pro 128GB Deep Purple (Unlocked) Used Excellent",
      "image": "https://i5.walmartimages.com/asr/1385d15c-17b0-4392-8fc1-414cae1a51ed.jpeg",
      "price": "899",
      "currency": "$",
      "reviewsCount": "5",
      "reviewStar": "4.2 out of 5 Stars. 5 reviews",
      "deliveryMessage": "Free shipping, arrives in 3+ days",
      "productBadge": "",
      "inventoryStatus": "In Stock",
      "isSponsored": false
    }
  ],
  "productsCount": 2
}
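
Because titles and prices contain commas, writing this output to CSV needs proper quoting rather than a plain join. A minimal sketch using only built-in JavaScript follows; csvCell and toCsv are hypothetical helper names, and the column order simply follows the keys of the first record unless you pass your own.

```javascript
// Quote a value when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function csvCell(value) {
  const text = value === undefined || value === null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize an array of product records to CSV text with a header row.
function toCsv(products, columns) {
  if (products.length === 0) return '';
  const header = columns || Object.keys(products[0]);
  const lines = [header.map(csvCell).join(',')];
  for (const product of products) {
    lines.push(header.map((key) => csvCell(product[key])).join(','));
  }
  return lines.join('\n');
}

// Usage, assuming `data` is the object returned by parseProductsFromHTML:
// require('fs').writeFileSync('walmart.csv', toCsv(data.products));
```

For larger jobs a dedicated CSV library or a database insert is likely a better fit; this is only meant to show the quoting that the sample data requires.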

Scale across result pages

One page of results is a demo; a real job walks the pagination. Walmart structures its search URLs with a page parameter, so you can build each page URL in a loop, fetch it through the Crawling API, parse it with the same function, and collect the rows. A typical sequence looks like this:

  • https://www.walmart.com/search?q=iphone+14+pro
  • https://www.walmart.com/search?q=iphone+14+pro&page=2
  • https://www.walmart.com/search?q=iphone+14+pro&page=3

javascript
async function scrapePages(query, totalPages) {
  const all = [];
  for (let page = 1; page <= totalPages; page++) {
    const url =
      `https://www.walmart.com/search?q=${encodeURIComponent(query)}&page=${page}`;
    const html = await crawl(url);
    if (html) all.push(...parseProductsFromHTML(html).products);
  }
  return all;
}

scrapePages('iphone 14 pro', 3).then((rows) => {
  console.log(`Collected ${rows.length} products`);
});

From here you can scale to thousands of search pages and store the output in a database or the cloud. To enrich each row with full detail (long description, every image, the complete review breakdown), take a product's link and fetch that individual product page through the same crawl function, then write a small parser for the product layout. The pattern is identical: render, then parse. The product page guide and the reviews guide pick up exactly there.

Staying unblocked

Even with rendering handled, Walmart watches for scraper-shaped traffic. There is a real risk of blocks if you run without a large, rotating IP pool, and building one yourself is both time-consuming and costly. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. Spread requests out and vary your queries instead of crawling one path at full speed.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
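
The pacing and back-off habits above can be sketched as a small retry wrapper with exponential backoff and jitter. The function names, the 1-second base, the 30-second cap, and the four-attempt limit are all illustrative assumptions, not limits published by Walmart or Crawlbase. The wrapper expects a fetch function that resolves to HTML or null, which is the shape of the crawl function used earlier.

```javascript
// Exponential backoff with jitter: wait longer after each failed attempt.
// baseMs and maxMs are illustrative defaults, not documented limits.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000, random = Math.random) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const jitter = Math.floor(random() * baseMs); // adds less than one extra baseMs
  return exponential + jitter;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a fetch function that resolves to HTML or null.
async function fetchWithBackoff(fetchPage, url, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const html = await fetchPage(url);
    if (html) return html;
    // Sleep only if another attempt is coming.
    if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
  }
  return null; // give up so the caller can log the URL and move on
}
```

In the pagination loop you could call fetchWithBackoff(crawl, url) in place of crawl(url). The jitter keeps retries from landing on a fixed rhythm, which is the "vary your pattern" advice in code form.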

For the broader playbook, see how to scrape websites without getting blocked. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential IP rotation as a drop-in endpoint. Walmart is also a frequent target for broader ecommerce web scraping, where the same render-then-parse pattern carries across sites, and the resulting price feeds plug straight into price intelligence work.

The legal side

Whether scraping Walmart is allowed depends on Walmart's terms of service, your jurisdiction, and what you do with the data. Walmart's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Walmart's Terms of Use and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. Collect only public product data: title, price, rating, review count, delivery message, the badge, inventory status, and the sponsored flag that anyone can see without an account. Respect Walmart's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable reviewers beyond the public review text itself, and do not redistribute copyrighted media such as product images as if they were your own.

For volume or commercial use, prefer an official channel. Walmart runs a developer platform and affiliate and marketplace APIs, and those are the right tools when you need large volumes, guaranteed structure, or commercial rights. This guide is deliberately scoped to public search and product pages because that is the line that keeps the work defensible. It does not cover anything behind a login, customer or seller account data, order history, private messages, or any attempt to bypass authentication. If your project needs more than public product listings, Walmart's official APIs or a data agreement are the correct path, not a cleverer scraper.

Recap

Key takeaways

  • Walmart has several public surfaces. Search, product pages, prices, reviews, best sellers, and sponsored ads all follow the same approach (render, then read fields by selector), and each has its own deeper guide in this cluster.
  • A plain request returns a shell. Walmart renders cards client-side and challenges bots, so you must render the page behind a trusted IP before you parse it.
  • The Crawling API does both in one call. It renders the page and rotates residential IPs server-side, so you skip running a headless fleet and a proxy pool; JavaScript requests cost more credits, so use them where the page needs them.
  • cheerio does the extraction. Select every list-view card, then map title, price, currency, rating, delivery, badge, inventory, and the sponsored flag to current selectors, and expect those selectors to drift.
  • Stay on public data. Respect Walmart's ToS and robots.txt, pace your requests, prefer official APIs for volume or commercial use, and never touch logins, account data, or personal information.

Frequently Asked Questions (FAQs)

Why does a plain request return incomplete data from Walmart?

Because Walmart renders prices, ratings, delivery messages, and most of the card detail client-side with JavaScript. The initial HTML is a thin shell until the page's scripts run in a browser, so a raw HTTP request returns key fields missing or blank. To get a complete page you have to render it first, which is what the Crawling API handles, and the JavaScript token forces a full browser render when a page needs it.

Do I need the normal token or the JavaScript token for Walmart?

Start with the normal token and check the body. If prices, ratings, or other card fields come back empty, switch to the JavaScript token, which renders the page in a real browser before returning the HTML. JavaScript requests cost more credits, so use the normal token where it is enough and reserve the JavaScript token for pages that genuinely need rendering.

How do I scrape the next pages of Walmart search results?

Walmart structures its search URLs with a page parameter, so appending &page=2, &page=3, and so on walks through the result pages. Build each page URL in a loop, fetch it through the Crawling API, and run the same parser on each page. Because every results page shares the same card structure, the parser you already wrote works across all of them without changes.

My selectors return empty strings. What changed?

Almost certainly Walmart's markup. Its class names and data-testid markers change without notice, and they differ between search, product, and category pages, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Can I scrape customer or seller personal data from Walmart?

No, and this guide does not cover it. Account details, order history, and private messages sit behind a login, so they are not public data. Public review text is visible on a product page, but you should still avoid tying it to identifiable individuals. Scraping login-walled content, personal data, or bypassing authentication to reach it is out of scope here and runs against Walmart's terms.

Where do I go next for prices, reviews, best sellers, or ads?

This roadmap covers search results end to end; the cluster has a dedicated guide for each other surface. See scraping Walmart prices, the reviews scraping guide, best sellers, and sponsored ads. Each uses the same render-then-parse approach on a different Walmart page.
