Walmart is one of the largest online retailers in the world, and the product data it surfaces (titles, prices, ratings, and availability) is genuinely useful if you are doing price research, monitoring a market, or building a retail product. The catch is that Walmart renders much of its content client-side and defends hard against bots, so a plain HTTP fetch hands you an empty shell or a block page. This guide shows you how to scrape a Walmart product page with Selenium: a small, runnable Python build that drives headless Firefox, routes its traffic through the Crawlbase Smart AI Proxy so the request reads as a real visitor, extracts the public product fields, and writes them to disk.
To keep this honest and defensible, the whole walkthrough is scoped to public data: the product title, price, rating, and review count that anyone can see without logging in. It does not touch user accounts, login-walled content, checkout actions, or personal data. The legality section near the end is not boilerplate, so read it before you point this at production volume.
Why use Selenium and a proxy together
Selenium is a browser automation tool. It drives a real browser programmatically, so it runs the page's JavaScript and sees the same rendered DOM a human would. That solves the rendering half of the problem: Walmart populates its product details client-side, and Selenium waits for those elements to appear before you read them. What Selenium does not solve is the network half. By default it sends requests from your own IP, and Walmart flags datacenter and repeat-visitor traffic quickly, challenging or blocking it before the page ever finishes loading.
That is the gap a proxy fills. The Crawlbase Smart AI Proxy is a single proxy endpoint that rotates requests across a pool of residential IPs server-side. You point Firefox at it once, and every request Selenium makes goes out through a fresh, real-user address. You get the rendering from Selenium and the unblocking from the proxy, each tool doing the part it is actually good at. You could assemble IP rotation yourself with a list of rotating residential proxies, but keeping that pool healthy and rotating it correctly is most of the work the Smart AI Proxy already does for you.
Keep the boundary clear in your head. Selenium renders and reads the page: it runs the JavaScript, waits for elements, and pulls the fields. The Smart AI Proxy handles the network: it rotates residential IPs so the request looks like a real visitor instead of a bot. Mixing those responsibilities up, or skipping the proxy entirely, is the most common reason a Walmart scraper returns empty fields or a block page.
What you will build
A small, runnable Python script that takes a Walmart product URL, launches headless Firefox configured to route through the Smart AI Proxy, waits for the title and price to render, extracts the public fields, retries on timeout, writes the structured result to a JSON file, and prints it. You can run every snippet as written; just swap in your own access token and product URL.
Set up Firefox, Python, and geckodriver
Selenium needs three things on your machine: a browser to drive, the Python bindings, and the driver that connects them. For Firefox that driver is geckodriver.
First install Mozilla Firefox from the official site if you do not already have it. Then confirm you have Python 3.8 or later.
python --version
Next download geckodriver. It is the bridge between Selenium and Firefox: head to the geckodriver releases page on GitHub, grab the build for your operating system, and extract it somewhere you can reference. Note the path, because the script needs it. Modern Selenium can often find geckodriver automatically if it is on your PATH, but passing an explicit path is the reliable default and the approach this guide uses.
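The "explicit path first, PATH as a fallback" advice can be sketched as a small resolver. This is a minimal illustration rather than part of the guide's script: the function name resolve_geckodriver is hypothetical, and it only locates the binary without starting a browser.

```python
import os
import shutil
from typing import Optional


def resolve_geckodriver(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable geckodriver path, or None if none can be found."""
    # Prefer the explicit path when it points at an existing file.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Otherwise fall back to whatever `geckodriver` resolves to on PATH.
    return shutil.which("geckodriver")
```

If the function returns None, neither location has the driver, which is a clearer failure than a Selenium stack trace at launch time.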
Now create a virtual environment so project dependencies stay isolated, then install the libraries.
    python -m venv walmart_env
    source walmart_env/bin/activate
    pip install selenium random-user-agent
On Windows, activate the environment with walmart_env\Scripts\activate instead of the source line. Two dependencies do the work: selenium drives Firefox, and random-user-agent generates realistic user-agent strings so each session looks a little different. The user agent is a small touch; the proxy does the heavy lifting on staying unblocked.
Get your Smart AI Proxy endpoint
Create a Crawlbase account and open the dashboard to find your Smart AI Proxy access token. The proxy is a single endpoint you point Firefox at, and it takes the form below.
http://YOUR_ACCESS_TOKEN@smartproxy.crawlbase.com:8012
The host is smartproxy.crawlbase.com, the port is 8012, and your token goes in the user position before the @. Every request Firefox sends through this endpoint gets a rotating residential IP, so you do not manage a proxy list yourself. The free tier is enough to run this whole tutorial against a public page before you commit to a plan.
The examples below inline the token for readability, but in real code load it from an environment variable or a .env file rather than committing it. A leaked proxy token is a leaked credential, and anyone who has it can spend your quota.
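One way to keep the token out of source control is to read it from the environment and fail loudly when it is missing. The sketch below assumes the variable is named CRAWLBASE_PROXY_TOKEN (the name the full script uses); the helper names load_token and proxy_endpoint are illustrative.

```python
import os

PROXY_HOST = "smartproxy.crawlbase.com"
PROXY_PORT = 8012


def load_token(env_var: str = "CRAWLBASE_PROXY_TOKEN") -> str:
    """Read the proxy token from the environment, refusing to run without it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running the scraper.")
    return token


def proxy_endpoint(token: str) -> str:
    """Build the endpoint in the token-before-@ form described above."""
    return f"http://{token}@{PROXY_HOST}:{PROXY_PORT}"
```

Raising instead of falling back to a placeholder means a misconfigured machine stops immediately rather than sending unauthenticated requests.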
Configure headless Firefox to use the Smart AI Proxy
This is the core of the setup: build a Firefox options object that runs headless, carries a random user agent, and routes every request through the Smart AI Proxy. Firefox takes proxy settings as browser preferences, so you set the proxy type to manual and point each protocol at the proxy host and port.
    import selenium.webdriver as webdriver
    from selenium.webdriver.firefox.service import Service
    from selenium.webdriver.firefox.options import Options
    from random_user_agent.user_agent import UserAgent
    from random_user_agent.params import SoftwareName, OperatingSystem

    # Pick a random Firefox user agent for this session.
    user_agent_rotator = UserAgent(
        software_names=[SoftwareName.FIREFOX.value],
        operating_systems=[OperatingSystem.WINDOWS.value, OperatingSystem.LINUX.value],
        limit=100,
    )
    user_agent = user_agent_rotator.get_random_user_agent()

    firefox_options = Options()
    firefox_options.add_argument("--headless")
    # Firefox sizes its window with --width/--height rather than --window-size.
    firefox_options.add_argument("--width=1420")
    firefox_options.add_argument("--height=1080")
    # Firefox reads its user agent from a preference, not a command-line argument.
    firefox_options.set_preference("general.useragent.override", user_agent)

    proxy_host = "http://YOUR_ACCESS_TOKEN@smartproxy.crawlbase.com"
    proxy_port = 8012

    # 1 = manual proxy configuration. Firefox defines network.proxy.http/ssl as
    # host-name preferences, so confirm this token-in-URL form with the IP check
    # in the next section before relying on it.
    firefox_options.set_preference("network.proxy.type", 1)
    firefox_options.set_preference("network.proxy.http", proxy_host)
    firefox_options.set_preference("network.proxy.http_port", proxy_port)
    firefox_options.set_preference("network.proxy.ssl", proxy_host)
    firefox_options.set_preference("network.proxy.ssl_port", proxy_port)
    # Always fetch a fresh page instead of serving it from cache.
    firefox_options.set_preference("network.http.use-cache", False)
The --headless flag runs Firefox without a visible window, which is what you want on a server and what keeps resource use low. The network.proxy.type set to 1 means "manual proxy configuration," and the lines that follow route HTTP and HTTPS (SSL) traffic through the Smart AI Proxy host and port. Disabling the cache makes sure each run fetches a fresh page rather than serving stale content. The legacy version of this setup also configured FTP and SOCKS preferences, but a Walmart product page is plain HTTPS, so those are noise you can drop.
Verify the proxy is working
Before you point anything at Walmart, confirm the traffic is actually leaving through the proxy. The simplest check is to hit a service that echoes back the requesting IP. Load httpbin.org/ip and print the body: if rendering and routing are working, you see one of the proxy's residential addresses rather than your own.
    import os
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    # Path to the geckodriver binary you extracted earlier.
    driver_path = os.path.join(os.getcwd(), "drivers", "geckodriver")
    service = Service(driver_path)
    driver = webdriver.Firefox(service=service, options=firefox_options)

    try:
        driver.get("https://httpbin.org/ip")
        # Hold until the body element exists, up to ten seconds.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        print(driver.find_element(By.TAG_NAME, "body").text)
    finally:
        # Close the browser even if the page load or the wait fails.
        driver.quit()
Here WebDriverWait paired with presence_of_element_located holds until the page's body element appears, up to ten seconds, so you do not read the DOM before it exists. The finally block always closes the browser session, even if the wait times out, which keeps stray Firefox processes from piling up. A successful run prints something like the following.
{ "origin": "51.15.242.202" }
If you see a residential-looking IP that is not your own, routing works and you are ready to point this at a real target. If you see your own address, the proxy preferences did not take, so re-check the host, port, and token before going further.
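The eyeball check can also be automated. The sketch below parses the echoed body and compares it with an address you already know to be your own; both function names are hypothetical, and own_ip is something you supply yourself. It slices out the outermost braces first because Firefox may render JSON through its built-in viewer, in which case the body text can carry extra labels around the payload.

```python
import json
from typing import Optional


def extract_origin(body_text: str) -> Optional[str]:
    """Pull the "origin" value out of an httpbin.org/ip response body."""
    # Isolate the outermost {...} in case the browser added surrounding text.
    start, end = body_text.find("{"), body_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(body_text[start:end + 1]).get("origin")
    except json.JSONDecodeError:
        return None


def proxy_is_active(body_text: str, own_ip: str) -> bool:
    """True when an origin was echoed back and it is not your own address."""
    origin = extract_origin(body_text)
    if origin is None:
        return False
    # httpbin can report several comma-separated addresses.
    seen = [part.strip() for part in origin.split(",")]
    return own_ip not in seen
```

A False result covers both failure modes: the page did not return an address at all, or it returned yours.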
Understand the Walmart product page
To pull fields off a Walmart product page you need to know where they live in the rendered DOM. The cleanest way to find current selectors is to open a product page in your browser, right-click the value you want, and choose Inspect. The fields this guide extracts and reasonable selectors for them are below.
- Product title: the most prominent element on the page, an h1 that carries an itemprop="name" attribute, so the XPath //h1[@itemprop="name"] targets it.
- Product price: rendered inside the buy box, commonly within an element marked itemprop="price", reachable with //span[@itemprop="price"].
- Rating: the average star rating, usually exposed in an aria label or a dedicated rating element near the title.
- Review count: the number of customer reviews, typically a link or span next to the rating.
Walmart changes its markup and attribute names without notice, so treat the selectors above as a starting template, not a contract. When extraction returns empty strings, re-inspect the live page in your browser's dev tools and update the selectors. This is normal maintenance for any production scraper, not a sign something is broken.
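Because selectors drift, one defensive pattern is to keep an ordered list of candidate XPaths per field and take the first that yields text. The sketch below is an assumption-laden illustration: first_text is a hypothetical helper, and it takes a lookup callable instead of a driver so it can be exercised without a browser.

```python
from typing import Callable, Iterable, Optional


def first_text(find_text: Callable[[str], str], xpaths: Iterable[str]) -> Optional[str]:
    """Return the first non-empty text among the candidate XPaths, else None."""
    for xpath in xpaths:
        try:
            text = find_text(xpath).strip()
        except Exception:
            # A missing element is expected here; try the next candidate.
            continue
        if text:
            return text
    return None
```

With Selenium, find_text would be something like lambda xp: driver.find_element(By.XPATH, xp).text, and the candidate list might start with //h1[@itemprop="name"] followed by a looser fallback such as //h1 that you have confirmed on the live page.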
Extract the product fields
With routing verified and selectors in hand, write the function that navigates to a product URL, waits for the title and price to render, and reads the fields. A retry loop wraps the whole thing so a single timeout or transient block does not kill the run; it tries again with a fresh browser session up to a configurable limit.
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from time import sleep

    TITLE_XPATH = '//h1[@itemprop="name"]'
    PRICE_XPATH = '//span[@itemprop="price"]'


    def read_optional(driver, xpath):
        """Return the element's text, or None when the element is absent."""
        try:
            return driver.find_element(By.XPATH, xpath).text.strip()
        except NoSuchElementException:
            return None


    def scrape_walmart_product(url, max_retries=3, retry_delay=5):
        for attempt in range(1, max_retries + 1):
            # Each attempt gets its own browser session.
            driver = webdriver.Firefox(service=service, options=firefox_options)
            try:
                driver.get(url)
                # The title signals that the product page has rendered.
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located((By.XPATH, TITLE_XPATH))
                )
                title = driver.find_element(By.XPATH, TITLE_XPATH).text.strip()
                price = read_optional(driver, PRICE_XPATH)
                return {"url": url, "title": title, "price": price}
            except TimeoutException:
                print(f"Timeout on attempt {attempt} for {url}")
            except Exception as error:
                print(f"Error on attempt {attempt}: {error}")
            finally:
                driver.quit()
            if attempt < max_retries:
                print(f"Retrying in {retry_delay}s...")
                sleep(retry_delay)
        return None
A few decisions make this robust. The script waits on the title before reading anything, because the title is reliably present on every product page and signals the page has rendered. Price is read through a small read_optional helper that returns None when the element is missing rather than throwing, since not every product carries every field; rating and review count can go through the same helper once you have their selectors. And each attempt creates and quits its own browser, so a retry starts from a clean session with a fresh proxy IP instead of reusing a poisoned one.
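The retry pattern can also be pulled out into a reusable wrapper. The sketch below is a variation, not the guide's function: with_retries is a hypothetical name, the sleep function is injectable so the logic can be tested without waiting, and the pause grows linearly with the attempt number instead of staying fixed as it does in the function above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    task: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run task until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception as error:
            print(f"Attempt {attempt} failed: {error}")
        if attempt < max_retries:
            # Linear backoff: wait longer after each failed attempt.
            sleep(retry_delay * attempt)
    return None
```

The task you pass in should own its browser lifecycle (create the driver, scrape, quit in a finally block) so that each retry still starts from a clean session.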
The full script
Here is everything wired together into one runnable file. Fill in your access token, set the geckodriver path, change the product URL, and run it.
    import os
    import json
    from time import sleep

    import selenium.webdriver as webdriver
    from selenium.webdriver.firefox.service import Service
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from random_user_agent.user_agent import UserAgent
    from random_user_agent.params import SoftwareName, OperatingSystem

    # Read the token from the environment; the placeholder is only a fallback.
    ACCESS_TOKEN = os.getenv("CRAWLBASE_PROXY_TOKEN", "YOUR_ACCESS_TOKEN")
    PROXY_HOST = f"http://{ACCESS_TOKEN}@smartproxy.crawlbase.com"
    PROXY_PORT = 8012

    TITLE_XPATH = '//h1[@itemprop="name"]'
    PRICE_XPATH = '//span[@itemprop="price"]'


    def build_options():
        """Headless Firefox options with a random user agent and the proxy set."""
        rotator = UserAgent(
            software_names=[SoftwareName.FIREFOX.value],
            operating_systems=[OperatingSystem.WINDOWS.value, OperatingSystem.LINUX.value],
            limit=100,
        )
        user_agent = rotator.get_random_user_agent()

        options = Options()
        options.add_argument("--headless")
        # Firefox sizes its window with --width/--height rather than --window-size.
        options.add_argument("--width=1420")
        options.add_argument("--height=1080")
        # Firefox reads its user agent from a preference, not a command-line argument.
        options.set_preference("general.useragent.override", user_agent)
        # 1 = manual proxy configuration.
        options.set_preference("network.proxy.type", 1)
        options.set_preference("network.proxy.http", PROXY_HOST)
        options.set_preference("network.proxy.http_port", PROXY_PORT)
        options.set_preference("network.proxy.ssl", PROXY_HOST)
        options.set_preference("network.proxy.ssl_port", PROXY_PORT)
        options.set_preference("network.http.use-cache", False)
        return options


    def read_optional(driver, xpath):
        """Return the element's text, or None when the element is absent."""
        try:
            return driver.find_element(By.XPATH, xpath).text.strip()
        except NoSuchElementException:
            return None


    def scrape_walmart_product(url, service, options, max_retries=3, retry_delay=5):
        for attempt in range(1, max_retries + 1):
            # Each attempt gets its own browser session.
            driver = webdriver.Firefox(service=service, options=options)
            try:
                driver.get(url)
                # The title signals that the product page has rendered.
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located((By.XPATH, TITLE_XPATH))
                )
                return {
                    "url": url,
                    "title": driver.find_element(By.XPATH, TITLE_XPATH).text.strip(),
                    "price": read_optional(driver, PRICE_XPATH),
                }
            except TimeoutException:
                print(f"Timeout on attempt {attempt} for {url}")
            except Exception as error:
                print(f"Error on attempt {attempt}: {error}")
            finally:
                driver.quit()
            if attempt < max_retries:
                print(f"Retrying in {retry_delay}s...")
                sleep(retry_delay)
        return None


    def main():
        driver_path = os.path.join(os.getcwd(), "drivers", "geckodriver")
        service = Service(driver_path)
        options = build_options()
        product_url = "https://www.walmart.com/ip/Ozark-Trail-Basic-Mesh-Chair-Blue-Adult/577309300"

        result = scrape_walmart_product(product_url, service, options)
        if result:
            with open("walmart_product.json", "w") as f:
                json.dump(result, f, indent=2)
            print(json.dumps(result, indent=2))
        else:
            print("Could not scrape the product after all retries.")


    if __name__ == "__main__":
        main()
What the output looks like
Run it with python walmart_scraper.py and you get clean structured data written to walmart_product.json and echoed to the console.
{ "url": "https://www.walmart.com/ip/Ozark-Trail-Basic-Mesh-Chair-Blue-Adult/577309300", "title": "Ozark Trail Basic Mesh Chair, Blue, Adult", "price": "$12.98" }
Add the rating and review-count selectors to the result dict the same way once you have inspected them on the live page, and each run captures the full public field set for the product. To turn this into a price-monitoring job, loop a list of product URLs through scrape_walmart_product and append each result to a list before writing the file.
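A minimal sketch of that batch loop follows. The names parse_price, scrape_many, and save_results are hypothetical, as is the price_value field and the walmart_products.json filename. The scrape function is passed in as a callable so the loop can be tried without a browser. parse_price assumes US-style formatting such as "$1,299.00".

```python
import json
import re
from decimal import Decimal
from typing import Callable, Iterable, List, Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn a display price such as "$12.98" into a Decimal, or None."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))


def scrape_many(urls: Iterable[str], scrape: Callable[[str], Optional[dict]]) -> List[dict]:
    """Scrape each URL, keeping successful records and a numeric price."""
    results = []
    for url in urls:
        record = scrape(url)
        if record is None:
            continue  # the product failed after all retries
        amount = parse_price(record.get("price"))
        # Stored as a string so the record stays JSON-serializable.
        record["price_value"] = str(amount) if amount is not None else None
        results.append(record)
    return results


def save_results(results: List[dict], path: str = "walmart_products.json") -> None:
    """Write all records to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
```

With the full script's function, the call would look like scrape_many(urls, lambda u: scrape_walmart_product(u, service, options)) followed by save_results(...).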
Staying unblocked at volume
The Smart AI Proxy handles IP rotation for you, but a few habits keep a larger run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering Walmart in a tight loop is the fastest way to get throttled. Spread requests out and vary the products you hit instead of looping the same URL.
- Lean on rotation. The Smart AI Proxy spreads your traffic across many real-user IPs so no single address trips a rate limit. That is the part you would otherwise have to build and maintain yourself.
- Read the status codes. A run that starts returning challenges or empty pages is telling you the current rate is too high. Treat proxy status error codes as signal, not noise, and back off when you see them.
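The pacing habit can be sketched as a generator that shuffles the work list and pauses a random interval between items. The name paced and the default delays are assumptions chosen for illustration, not limits published by Walmart or Crawlbase.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional


def paced(
    urls: Iterable[str],
    min_delay: float = 4.0,
    max_delay: float = 9.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[str]:
    """Yield URLs in shuffled order with a random pause between them."""
    rng = rng or random.Random()
    queue = list(urls)
    rng.shuffle(queue)
    for index, url in enumerate(queue):
        if index:
            # No pause before the first URL, a jittered one before the rest.
            sleep(rng.uniform(min_delay, max_delay))
        yield url
```

Iterating with for url in paced(product_urls): keeps the scraping code unchanged while spreading requests out; raise the delays when challenges or empty pages start to appear.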
For the broader playbook, see how to scrape websites without getting blocked. If you would rather skip the headless browser entirely and let an API return parsed product data, compare this build with the Scraper API, which returns pre-parsed JSON for supported sites, or the Crawling API for rendered HTML without running Selenium yourself. The same Selenium pattern works on other retailers too; scraping Amazon by ASIN walks through a closely related job.
Is it legal to scrape Walmart?
Scraping a large commercial retailer sits in a legal gray area, and the answer to "is it allowed" depends on Walmart's terms of service, your jurisdiction, and what you do with the data. Walmart's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work.
A few lines worth holding to. Collect only public data: the title, price, rating, and review count that anyone can see without an account. Respect Walmart's robots.txt and its stated rate expectations, and keep your request volume low enough that you are not straining anyone's servers. If you plan to reuse the data commercially, get permission or an official data agreement rather than assuming silence is consent. And never collect personal data, including anything tied to individual customer accounts or reviews attributable to real people.
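Respecting robots.txt can be checked in code with Python's standard urllib.robotparser. The sketch below parses rules from a string so it runs offline. build_robots_checker is a hypothetical helper, and the rules in the usage note are made up for illustration, not Walmart's actual file.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "*") -> Callable[[str], bool]:
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)
```

For example, with the illustrative rules "User-agent: *", "Disallow: /account/", and "Disallow: /checkout", the checker allows a product path and rejects the other two. Against the live site you would call parser.set_url(...) with the site's robots.txt address and parser.read() instead of parse().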
This guide is deliberately scoped to public product data because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, payment or checkout actions, or any attempt to bypass authentication or a CAPTCHA tied to an account. If your project needs more than public product fields, the right move is an official API or a data agreement with Walmart, not a cleverer scraper.
Key takeaways
- Split the job. Selenium renders and reads the page; the Smart AI Proxy handles the network. Each tool does one part, and that separation is what makes the scraper reliable.
- Route Firefox through the proxy. Set network.proxy.type to manual and point HTTP and SSL at smartproxy.crawlbase.com:8012 with your token, then verify against httpbin.org/ip before scraping.
- Wait, then read. Use WebDriverWait on the title before extracting, and read optional fields through a helper that returns None when an element is missing.
- Expect selectors to drift. Walmart changes its markup without notice, so re-inspect and update XPaths when extraction returns empty strings.
- Stay on public data. Respect Walmart's ToS and robots.txt; no accounts, no personal data, no checkout or auth-bypass actions.
Frequently Asked Questions (FAQs)
Why does a plain request return no data from a Walmart product page?
Two things work against a bare HTTP request. First, Walmart renders much of its product detail client-side, so the initial HTML is a shell that only fills in after the page's JavaScript runs in a real browser. Second, Walmart flags automated traffic quickly and challenges or blocks it. Selenium solves the rendering by driving a real Firefox, and routing that browser through the Smart AI Proxy gives you an IP the site reads as a real visitor.
Do I need a proxy to scrape Walmart with Selenium?
For anything beyond a single test request, yes. Selenium renders the page, but it sends requests from your own IP by default, and Walmart throttles or blocks repeat automated traffic fast. Routing Firefox through the Smart AI Proxy rotates your requests across residential IPs server-side, so no single address trips a rate limit. It is the difference between a demo that works once and a scraper that keeps working.
How do I point Firefox at the Smart AI Proxy in Selenium?
Set the Firefox preference network.proxy.type to 1 for manual configuration, then set network.proxy.http and network.proxy.ssl to http://YOUR_ACCESS_TOKEN@smartproxy.crawlbase.com with the matching _port preferences set to 8012. Pass those options when you create the driver, and every request Firefox makes goes out through the proxy. Verify it by loading httpbin.org/ip and checking the returned address is not your own.
My XPath selectors return empty strings. What changed?
Almost certainly Walmart's markup. Its attribute names and class structure change without notice, so selectors that worked last month can break. Re-inspect a live product page in your browser's dev tools, find the current attribute or element for the field you want, and update the XPath. Periodic selector maintenance is normal for any production scraper, not a sign the approach is broken.
Should I use Selenium or an API to scrape Walmart?
Use Selenium when you want full control of a real browser, need to interact with the page, or are learning how rendering and proxies fit together. If you would rather skip the headless browser, the Scraper API returns pre-parsed product JSON for supported sites, and the Crawling API returns rendered HTML in a single call without running a browser fleet yourself. For one-off or unusual layouts, the Selenium build in this guide is the flexible option.
Is it legal to scrape Walmart?
It depends on Walmart's terms of service, your jurisdiction, and your purpose, and their terms restrict automated access. Keep strictly to public product data, respect robots.txt and rate expectations, and never touch accounts, personal data, or checkout and authentication flows. For commercial reuse, get permission or an official data agreement rather than relying on a scraper.
