Walmart's catalog is one of the richest public sources of retail data on the web: every search query returns a grid of product listings with names, prices, star ratings, review counts, and links into each product page. That data drives competitor price tracking, market research, assortment analysis, and trend monitoring. The problem is that Walmart renders those listings in the browser and defends hard against automated traffic, so a plain HTTP request hands you a JavaScript shell instead of the products you came for.
This guide shows you how to scrape Walmart search results with Python the reliable way. You build a small, runnable scraper that fetches a rendered search page through the Crawling API, parses the result grid with BeautifulSoup, and pulls a clean record for each listing: product name, price, rating, review count, and product link. We keep the walkthrough scoped to public search and listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a Walmart search query, retrieves the rendered results page through the Crawling API, and extracts a structured record per product. We will use an iPhone search as the running example and pull these fields from each result card:
- Name: the product title, for example "Apple iPhone 15, 128GB, Black".
- Price: the listed price shown on the card.
- Rating: the average star rating, when the product has one.
- Reviews: the number of customer reviews behind that rating.
- Link: the URL to the product's own detail page.
Why a plain fetch fails on Walmart
If you request a Walmart search URL with a bare HTTP client, you get a response with status 200 and almost none of the product data in the body. Two things work against you. First, Walmart builds its search grid client-side: the initial HTML is a shell that only fills in after the page's JavaScript runs in a browser. Second, Walmart flags automated traffic quickly. Datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA or blocked before they ever reach the rendered listings.
So a working Walmart scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Walmart's search grid is client-side rendered, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would, and there is nothing to parse out of it.
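A quick way to tell an empty shell from a rendered page is to count product-card markers in the raw HTML before parsing anything. The sketch below assumes rendered cards carry a data-item-id attribute, the same marker the parser later in this guide selects on. The function names and sample strings are illustrative, not part of any library.

```python
import re

# Assumption: rendered result cards carry a data-item-id attribute,
# matching the selector the parser in this guide targets.
CARD_MARKER = re.compile(r'data-item-id="[^"]+"')


def count_card_markers(html):
    """Return how many product-card markers appear in the raw HTML."""
    return len(CARD_MARKER.findall(html or ""))


def looks_like_empty_shell(html, min_cards=1):
    """True when the HTML has fewer card markers than expected."""
    return count_card_markers(html) < min_cards


# Illustrative inputs: a JavaScript shell and a fragment with two cards.
shell = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
rendered = '<div data-item-id="123"></div><div data-item-id="456"></div>'

print(looks_like_empty_shell(shell))     # True
print(looks_like_empty_shell(rendered))  # False
```

If the marker attribute changes on the live site, this check reports a shell even for a rendered page, so treat a positive result as a prompt to inspect the HTML rather than as proof.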
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs and any beginner course will get you to the level this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
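One common way to keep the token out of version control is to read it from an environment variable at startup. This is a minimal sketch; the variable name CRAWLBASE_JS_TOKEN and the load_token helper are assumptions chosen for illustration, not something the Crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Read the token from the environment; fail loudly if it is missing."""
    # env_var is a hypothetical name; pick whatever fits your setup.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage in the scraper, in place of a hardcoded string:
# api = CrawlingAPI({"token": load_token()})
```

Set the variable in your shell before running the script, and the token never needs to appear in a committed file.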
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.
```
python --version
python -m venv walmart_env
source walmart_env/bin/activate
pip install crawlbase beautifulsoup4
```
On Windows, activate the environment with walmart_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the result grid by CSS selector.
Understanding the Walmart search page
A Walmart search results page lays out a grid of product cards, one per listing. Each card carries the same handful of fields: a title, a price, a star rating with a review count, and a link to the product's detail page. Below the grid sit pagination controls that let you walk through additional result pages for the same query.
Before writing selectors, open a search page in your browser, right-click a product card, and choose Inspect. You will see each card wrapped in a container with stable data attributes, and the title, price, and rating exposed through data-automation-id markers. Those attributes are what you target. Walmart's utility class names change often, but the automation IDs are more durable, so lean on them where you can.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, build the search URL from a query, and request it. Checking the status code before you parse keeps failures loud instead of silent.
```python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


def crawl(page_url):
    # Wait for async content, then hold 5 seconds so the grid can render.
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None


if __name__ == "__main__":
    query = "iPhone"
    search_url = f"https://www.walmart.com/search?q={query}"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")
```
The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering grid appears before the page is captured. Five seconds is a reasonable start; raise it if cards come back missing. The body is decoded as latin1 because Walmart pages mix in characters that strict UTF-8 decoding can choke on. latin1 maps every byte to a character, so the decode never raises, but non-ASCII characters in titles can come out garbled; if that matters for your data, decode as UTF-8 with errors="replace" instead. Run the script and you should see real product markup, not the empty shell a plain fetch returns. That confirms rendering works before you write a single selector.
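The f-string URL works for a single-word query like "iPhone", but a query with spaces or an ampersand needs percent-encoding first. A small sketch using the standard library follows; build_search_url is a hypothetical helper name, and the optional page argument follows the &page= parameter this guide uses for pagination.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.walmart.com/search"


def build_search_url(query, page=None):
    """Build a search URL with the query safely percent-encoded."""
    params = {"q": query}
    if page is not None:
        params["page"] = page
    # urlencode escapes spaces as "+" and reserved characters as %XX.
    return f"{SEARCH_BASE}?{urlencode(params)}"


print(build_search_url("iPhone"))
# https://www.walmart.com/search?q=iPhone
print(build_search_url("iphone 15 case", page=2))
# https://www.walmart.com/search?q=iphone+15+case&page=2
```

Passing the result to crawl() in place of the hand-built string keeps multi-word queries from producing a malformed URL.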
Walmart needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a search query on the free tier first.
Step 2: Parse the result grid with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Walmart wraps each listing in a container you can select on, and exposes the title and price through data-automation-id attributes. Read the link from the card's anchor, and parse the combined rating and review string into two values. Wrap each card in a try/except so one malformed listing does not crash the run.
```python
from bs4 import BeautifulSoup


def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_rating(card):
    el = card.select_one("span.w_iUH7")
    if not el:
        return None, None
    text = el.get_text(strip=True)
    rating = text.split(" out of")[0] if "out of" in text else None
    reviews = None
    if "reviews" in text:
        reviews = text.split("Stars.")[-1].replace("reviews", "").strip()
    return rating, reviews


def scrape_results(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("div[data-item-id]")
    results = []
    for card in cards:
        try:
            rating, reviews = parse_rating(card)
            link = card.select_one("a[href]")
            results.append({
                "name": text_of(card, 'span[data-automation-id="product-title"]'),
                "price": text_of(card, 'div[data-automation-id="product-price"] span.f2'),
                "rating": rating,
                "reviews": reviews,
                "link": link["href"] if link else None,
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results
```
The text_of helper queries a single element inside one card and returns None when the element is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every listing carries a rating. The price is read from the whole-number span inside the price block, and the product link comes from the card's anchor href rather than its text. The parse_rating helper splits Walmart's combined "4.2 out of 5 Stars. 3244 reviews" string into a numeric rating and a review count so you store them as separate fields.
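The parse_rating helper returns both values as strings. If you would rather store numbers, a regex-based variant can convert them in the same pass. The sketch below assumes the combined string keeps the "4.2 out of 5 Stars. 3244 reviews" shape described above; parse_rating_text is a hypothetical name, and it takes the text rather than the card so it can be tried without any HTML.

```python
import re

# Assumption: the rating string looks like "4.2 out of 5 Stars. 3244 reviews".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s+out of\s+5", re.IGNORECASE)
REVIEWS_RE = re.compile(r"(\d[\d,]*)\s+reviews?", re.IGNORECASE)


def parse_rating_text(text):
    """Split a combined rating string into (float rating, int review count)."""
    if not text:
        return None, None
    rating_match = RATING_RE.search(text)
    reviews_match = REVIEWS_RE.search(text)
    rating = float(rating_match.group(1)) if rating_match else None
    # Strip thousands separators before converting the count.
    reviews = int(reviews_match.group(1).replace(",", "")) if reviews_match else None
    return rating, reviews


print(parse_rating_text("4.2 out of 5 Stars. 3244 reviews"))  # (4.2, 3244)
print(parse_rating_text(""))                                  # (None, None)
```

Either value falls back to None when its pattern is absent, which matches how the rest of the parser treats missing fields.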
Walmart's utility class names (the span.f2 price span, the span.w_iUH7 rating string) change without notice, and the result-card container attributes shift occasionally too. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect the live search page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
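The "None for every card" signal can be checked automatically after each run. This is one possible sketch: dead_fields is a hypothetical helper that takes the list of records the parser returns and reports which fields never held a value.

```python
FIELDS = ("name", "price", "rating", "reviews", "link")


def dead_fields(records, fields=FIELDS):
    """Return the fields that are None in every record, a hint of selector drift."""
    if not records:
        # No cards at all suggests the card container selector itself broke.
        return list(fields)
    return [f for f in fields if all(r.get(f) is None for r in records)]


# Illustrative records: the price selector has stopped matching.
sample = [
    {"name": "A", "price": None, "rating": "4.2", "reviews": "10", "link": "/ip/1"},
    {"name": "B", "price": None, "rating": None, "reviews": None, "link": "/ip/2"},
]
print(dead_fields(sample))  # ['price']
```

A rating that is missing on some cards is normal, so the check only flags a field when it is empty across the whole page.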
Step 3: Put it together
Now wire the fetch and the parse into one runnable script. Fetch the rendered search page, hand it to the parser, and print the structured records.
```python
import json

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("latin1")
    print(f"Request failed: {response['status_code']}")
    return None


def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_rating(card):
    el = card.select_one("span.w_iUH7")
    if not el:
        return None, None
    text = el.get_text(strip=True)
    rating = text.split(" out of")[0] if "out of" in text else None
    reviews = None
    if "reviews" in text:
        reviews = text.split("Stars.")[-1].replace("reviews", "").strip()
    return rating, reviews


def scrape_results(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("div[data-item-id]")
    results = []
    for card in cards:
        try:
            rating, reviews = parse_rating(card)
            link = card.select_one("a[href]")
            results.append({
                "name": text_of(card, 'span[data-automation-id="product-title"]'),
                "price": text_of(card, 'div[data-automation-id="product-price"] span.f2'),
                "rating": rating,
                "reviews": reviews,
                "link": link["href"] if link else None,
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results


def main():
    query = "iPhone"
    search_url = f"https://www.walmart.com/search?q={query}"
    html = crawl(search_url)
    if not html:
        return
    data = scrape_results(html)
    print(json.dumps(data, indent=2))


if __name__ == "__main__":
    main()
```
What the output looks like
Run the full script with python scraper.py and you get a clean list of records, one per result, ready to write to JSON, CSV, or a database.
```json
[
  {
    "name": "AT&T Apple iPhone 11, 64GB, Black - Prepaid Smartphone",
    "price": "399",
    "rating": "3.8",
    "reviews": "202",
    "link": "https://www.walmart.com/ip/..."
  },
  {
    "name": "Straight Talk Apple iPhone 11, 64GB, Black",
    "price": "249",
    "rating": "4.2",
    "reviews": "3244",
    "link": "https://www.walmart.com/ip/..."
  }
]
```
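For the CSV route, the standard library's csv module handles quoting, which matters because product titles routinely contain commas. A minimal sketch follows, assuming records shaped like the output above; records_to_csv is a hypothetical helper.

```python
import csv
import io

FIELDS = ["name", "price", "rating", "reviews", "link"]


def records_to_csv(records, fields=FIELDS):
    """Serialize records to CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for record in records:
        # Keep only the known fields and turn None into an empty cell.
        writer.writerow({f: "" if record.get(f) is None else record[f] for f in fields})
    return buffer.getvalue()


# Illustrative record with a comma in the title and no rating.
sample = [{
    "name": "Straight Talk Apple iPhone 11, 64GB, Black",
    "price": "249",
    "rating": None,
    "reviews": None,
    "link": "https://www.walmart.com/ip/...",
}]
print(records_to_csv(sample))
```

To write a file instead, open it with newline="" and encoding="utf-8" and hand that file object to csv.DictWriter in place of the StringIO buffer.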
Handling pagination across result pages
One page is a demo; a real job runs across every page of results for a query. Walmart paginates search with a &page= parameter, so you walk pages by incrementing it and stopping when a page returns no cards. That avoids hardcoding a page count and naturally handles queries with only a few results.
```python
import time


def scrape_all_pages(query, max_pages=5):
    base = f"https://www.walmart.com/search?q={query}"
    all_results = []
    for page in range(1, max_pages + 1):
        page_url = f"{base}&page={page}"
        html = crawl(page_url)
        if not html:
            break
        results = scrape_results(html)
        if not results:
            break
        all_results.extend(results)
        print(f"Page {page}: {len(results)} products")
        time.sleep(2)
    return all_results
```
The max_pages cap keeps a run bounded so a broad query does not spin forever, and the empty-results break stops you early when Walmart runs out of pages. The time.sleep(2) between pages paces requests so you are not hammering search in a tight loop, which is the fastest way to get throttled. Tune both to your volume and to the pacing guidance in the next section.
Staying unblocked
Even with rendering handled, Walmart watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Spread requests out with a delay between pages and vary your queries instead of crawling one term at full speed. The time.sleep in the pagination loop is the floor, not the ceiling.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
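Backing off can be as simple as retrying a failed fetch with a growing, slightly randomized delay. The sketch below is one way to do that. fetch_with_backoff is a hypothetical wrapper, the delay values are assumptions to tune, and fetch can be any callable that returns HTML or None, such as the crawl() function in this guide.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML, waiting longer after each failure.

    sleep is injectable so the logic can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: base 2s, 4s, 8s, plus up to 1s.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    return None


# Usage inside the pagination loop, in place of a direct crawl() call:
# html = fetch_with_backoff(crawl, page_url)
```

The jitter keeps retries from landing on a fixed rhythm, and the attempt cap means a persistently failing page ends the retries instead of looping.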
For the broader playbook, see how to bypass captchas while web scraping, and if you want to scrape a single listing in depth after collecting search links, the companion guide on scraping a Walmart product page with Selenium picks up there. For numbers on how a managed API holds up against a raw proxy pool on this exact target, the Walmart scraping proxies benchmark is worth a read. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
Is it legal to scrape Walmart?
Whether scraping Walmart is allowed depends on Walmart's terms of service, your jurisdiction, and what you do with the data. Walmart's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Walmart's Terms of Use and its robots.txt, and treat both as the boundary for what you collect.
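Checking a URL against robots.txt rules can be scripted with the standard library's urllib.robotparser. The sketch below parses a made-up rule set so it runs offline. The rules and the example.com URLs are hypothetical and are not Walmart's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Walmart's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /account/
Disallow: /checkout/
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://example.com/search?q=iphone"))  # True
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/account/orders"))   # False
```

To check the live file, call set_url() with the site's robots.txt address and then read() in place of parse(). A passing check covers only robots.txt; the terms of service still apply.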
A few lines worth holding to. Collect only public data: the product names, prices, ratings, review counts, and listing links that anyone can see on a search results page without an account. Respect Walmart's stated rate expectations and keep your request volume low enough that you are not straining its servers. Avoid personal data, including anything tied to identifiable shoppers, reviewers, or sellers beyond what is publicly listed on a results page. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public search and listing pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, payment or checkout flows, or any attempt to bypass authentication. For licensed or bulk access, Walmart offers official APIs and partner programs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public listings, an official API or a data agreement is the correct path, not a cleverer scraper.
Key takeaways
- Walmart search is client-side rendered. A plain fetch returns an empty shell, so you must render the page before you parse it.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for the grid.
- BeautifulSoup does the extraction. Loop the result cards and map name, price, rating, reviews, and link to current selectors, and expect those selectors to drift.
- Paginate with the page parameter. Walk &page= until a page returns no cards, pace requests with a delay, and cap the page count.
- Stay on public data. Respect Walmart's ToS and robots.txt, prefer an official Walmart API for licensed or bulk data, and never touch accounts, orders, or personal information.
Frequently Asked Questions (FAQs)
Why does a plain fetch return no products from Walmart?
Because Walmart builds its search grid client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the listings blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Walmart?
The JS token. The normal token fetches static HTML, which on Walmart is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the product cards are present when BeautifulSoup parses them.
How do I scrape every page of a Walmart search?
Walmart paginates with a &page= parameter on the search URL. Increment it in a loop, scrape each page with the same parser, and stop when a page returns no cards. Cap the page count and add a short delay between requests so you pace the run and do not get throttled.
My selectors return None. What changed?
Almost certainly Walmart's markup. Its utility class names like span.f2 for price and span.w_iUH7 for the rating string change without notice, and the card container attributes shift occasionally too. Re-inspect a live search page in your browser's dev tools, prefer the more durable data-automation-id markers where you can, and update the selectors. Periodic maintenance is normal for any production scraper.
Can I scrape order, account, or checkout data from Walmart?
No, and this guide does not cover it. Order history, account details, and checkout flows sit behind a login, so they are not public data. Scraping login-walled content, or bypassing authentication to reach it, is out of scope here and runs against Walmart's terms. For sanctioned access to richer data, the correct route is an official Walmart API or a partner agreement.
How do I avoid getting blocked while scraping Walmart?
Keep your per-IP request rate low, add a delay between pages, vary your queries instead of looping one term, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
