An Amazon product listing is built around its images: a main hero shot, a gallery of alternate angles, lifestyle photos, and infographics that spell out features and sizing. Those images are public, and being able to pull them in bulk is useful for catalog building, product research, competitive analysis, and feeding a dataset into image classification or visual search work.
This guide shows you how to download images from an Amazon product page with Python the reliable way. You build a small, runnable scraper that fetches a rendered product page through the Crawling API, extracts the image URLs (the main image plus the gallery and thumbnails), and saves each one to a local file. The whole walkthrough stays scoped to public product images, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes an Amazon product URL, retrieves the rendered page through the Crawling API, collects every product image URL, and downloads each to a local folder. We will use a single product page as the running example and extract these outputs:
- Main image: the primary hero image shown at the top of the listing.
- Gallery images: the alternate angles and lifestyle shots behind the thumbnail strip.
- Thumbnails: the small preview images that map to each gallery shot.
- Image URLs: the direct, full-resolution link to each image file.
- Local files: each image saved to disk under a per-product folder.
Why a plain request fails on Amazon
If you request an Amazon product URL with a bare HTTP client, you rarely get the gallery you came for. Two things work against you. First, Amazon loads much of the image gallery client-side: the high-resolution variants and the thumbnail-to-main mapping are populated by JavaScript after the initial HTML lands, so a raw request often sees only a low-resolution placeholder or an incomplete set. Second, Amazon flags automated traffic quickly. Datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA or blocked before they ever reach the rendered listing.
So a working Amazon image scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real shopper. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon hydrates the high-resolution gallery client-side, so the JS token gives you the complete image set. Using the normal token can return a thinner page where some gallery variants never loaded.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs and any beginner course will get you to the level this tutorial assumes. The companion guide on how to download images using Python covers the file-writing mechanics in more depth.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Your first 1,000 requests are free, and you pay only for successful ones. Treat the token like a password: it authenticates your requests, so keep it out of version control.
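One way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name and the load_token helper are choices made for this example, not something Crawlbase requires.

```python
import os


def load_token(var_name="CRAWLBASE_JS_TOKEN"):
    """Return the token stored in an environment variable.

    The variable name is an assumption for this example; use whatever
    name fits your setup.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

You could then build the client with CrawlingAPI({"token": load_token()}) instead of pasting the token into the script.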
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.
    python --version
    python -m venv amazon_env
    source amazon_env/bin/activate
    pip install crawlbase beautifulsoup4 requests
On Windows, activate the environment with amazon_env\Scripts\activate instead of the source line. The scraper relies on two of these packages: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull image URLs out by selector. requests is a general-purpose HTTP client that is handy if you later download files directly, although the script in this guide routes image downloads through the Crawling API instead.
Understanding the Amazon image gallery
Amazon shows several types of images on a product page, and it helps to know them before writing selectors. The main image is the primary product shot at the top of the listing. Alternate images show different angles or features. Lifestyle images show the product in real-world use, and infographics spell out features, sizing, or specifications. All of them sit in the same gallery and share the same image host.
To find the hooks, open a product page in your browser, right-click the main image, and choose Inspect. The hero image is an img element with a stable id of landingImage (also reachable as #imgTagWrapperId img), and the thumbnail strip is a list of small images inside a container with the id altImages. Amazon serves a sized variant of each image by encoding the dimensions into the filename, so a thumbnail URL and its full-resolution counterpart differ only by that size token. That detail is what lets you turn a small preview link into a high-resolution download.
Step 1: Fetch the rendered product page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the product URL. Checking the status code before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


    def crawl(page_url):
        # Wait for async content, then hold 4 seconds so late image variants render.
        options = {"ajax_wait": "true", "page_wait": 4000}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            return response["body"].decode("latin1")
        print(f"Request failed: {response['status_code']}")
        return None


    if __name__ == "__main__":
        product_url = "https://www.amazon.com/dp/B0CHX3QBCH"
        html = crawl(product_url)
        print(html[:500] if html else "No HTML returned")
The two wait options matter for a client-rendered gallery like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the late-rendering image variants appear before the page is captured. Four seconds is a reasonable start; raise it if the gallery comes back thin. The body is decoded as latin1 because Amazon pages mix in characters that strict UTF-8 decoding can choke on. Run the script and you should see real product markup, which confirms rendering works before you write a single selector.
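The advice to raise page_wait when the gallery comes back thin can be automated. A minimal sketch, assuming you pass in your own fetch and count functions: fetch_with_longer_waits, its parameters, the list of wait values, and the threshold of three images are all hypothetical choices made for illustration.

```python
def fetch_with_longer_waits(fetch, count_images, url,
                            waits=(4000, 8000, 12000), minimum=3):
    """Retry a page fetch with longer page_wait values until enough images appear.

    fetch(url, page_wait) -> HTML string or None   (caller-supplied)
    count_images(html)    -> number of image URLs  (caller-supplied)
    Returns the best HTML seen, or None if every attempt failed.
    """
    best_html, best_count = None, -1
    for page_wait in waits:
        html = fetch(url, page_wait)
        if html is None:
            continue  # failed request; try the next, longer wait
        found = count_images(html)
        if found > best_count:
            best_html, best_count = html, found
        if found >= minimum:
            break  # gallery looks complete, stop spending requests
    return best_html
```

Here fetch would wrap api.get with the given page_wait, and count_images could return the length of the URL list your parser produces. Each retry is another API request, so keep the list of waits short.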
Amazon needs a rendered page behind a trusted IP, in one call, before its full-resolution gallery is even in the HTML. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at one product URL on the free tier first.
Step 2: Extract the image URLs
With rendered HTML in hand, load it into BeautifulSoup and pull the image links. Read the main image from #landingImage, then collect the gallery from the thumbnail strip under #altImages. Amazon stores thumbnails at a small size, so you rewrite each thumbnail URL to its full-resolution form by stripping the size token Amazon encodes into the filename.
    import re
    from bs4 import BeautifulSoup


    def full_res(url):
        # Amazon encodes a size token like _SX466_ or _AC_US40_ into the
        # filename; dropping it returns the full-resolution image.
        return re.sub(r"\._[^.]+_\.", ".", url)


    def extract_image_urls(html):
        soup = BeautifulSoup(html, "html.parser")
        urls = []

        main = soup.select_one("#landingImage, #imgTagWrapperId img")
        if main and main.get("src"):
            urls.append(full_res(main["src"]))

        for thumb in soup.select("#altImages img"):
            src = thumb.get("src")
            if src and "images/I/" in src:
                urls.append(full_res(src))

        # De-duplicate while preserving order.
        seen = set()
        unique = []
        for u in urls:
            if u not in seen:
                seen.add(u)
                unique.append(u)
        return unique
The full_res helper is the key step. Amazon serves the same image at many sizes by inserting a token such as ._SX466_ or ._AC_US40_ between the image id and the file extension; the regex removes that token so you keep the original, full-resolution file instead of a thumbnail. The main image comes from #landingImage, with #imgTagWrapperId img as a fallback, and the gallery comes from the thumbnail images under #altImages. Filtering on images/I/ in the source skips Amazon sprite and icon assets that are not product photos, and the final de-duplication pass drops the case where the main image also appears as the first thumbnail.
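To see the rewrite in isolation, the sketch below applies the same regex to three sample URLs. The URLs are illustrative: they pair an image id with the size tokens mentioned above and are not taken from a live listing.

```python
import re


def full_res(url):
    # Drop the ._TOKEN_. segment that sits between the image id and
    # the file extension.
    return re.sub(r"\._[^.]+_\.", ".", url)


samples = [
    "https://m.media-amazon.com/images/I/71xKDtMfaZL._SX466_.jpg",
    "https://m.media-amazon.com/images/I/71xKDtMfaZL._AC_US40_.jpg",
    "https://m.media-amazon.com/images/I/71xKDtMfaZL.jpg",  # no token
]

for url in samples:
    print(full_res(url))
```

All three inputs collapse to the same token-free URL, which is why the extractor de-duplicates after rewriting rather than before.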
Amazon's gallery markup changes between page variants and over time. The #landingImage and #altImages ids are among the more durable hooks, but a given listing may render its gallery differently. When the list comes back empty, re-inspect the live product page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
Step 3: Download each image to a local file
Now turn the URLs into files. For each image URL, request the bytes and write them to a per-product folder, deriving the filename from the image id in the URL. Routing the download through the Crawling API keeps it on the same trusted IP path as the page fetch, which avoids a separate request from your own IP getting blocked.
    import os
    from urllib.parse import urlparse


    def download_images(urls, folder="amazon_images"):
        os.makedirs(folder, exist_ok=True)
        saved = []
        for i, url in enumerate(urls):
            # Use the last URL path segment (the image id) as the filename.
            name = os.path.basename(urlparse(url).path) or f"image_{i}.jpg"
            path = os.path.join(folder, name)
            response = api.get(url)
            if response["status_code"] == 200:
                with open(path, "wb") as f:
                    f.write(response["body"])
                saved.append(path)
                print(f"Saved {path}")
            else:
                print(f"Skipped {url}: {response['status_code']}")
        return saved
The function creates the output folder once with exist_ok=True so a re-run does not error, then walks the URL list. The filename comes from the last path segment of the URL, which for Amazon is the unique image id, so two different gallery shots never collide on disk. Each image is written in binary mode, and a non-200 response is reported and skipped rather than crashing the run. Because the request goes back through api.get, the image fetch reuses the same trusted-IP path as the page fetch.
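The filename rule described above can be pulled into a small helper, which also makes it easy to skip files a previous run already saved. A sketch under stated assumptions: target_path and already_saved are hypothetical helpers, not part of the script in this guide.

```python
import os
from urllib.parse import urlparse


def target_path(url, folder, index):
    """Build the local path for an image URL.

    Uses the last path segment of the URL as the filename and falls
    back to image_<index>.jpg when the URL has no usable segment.
    """
    name = os.path.basename(urlparse(url).path) or f"image_{index}.jpg"
    return os.path.join(folder, name)


def already_saved(path):
    """True when a non-empty file already exists at path."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

Inside the download loop you could call already_saved(path) before api.get(url) and continue when it returns True, so a re-run only fetches what is missing.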
Step 4: Put it together
Now wire the fetch, the extraction, and the download into one runnable script.
    import json
    import os
    import re
    from urllib.parse import urlparse

    from bs4 import BeautifulSoup
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


    def crawl(page_url):
        options = {"ajax_wait": "true", "page_wait": 4000}
        response = api.get(page_url, options)
        if response["status_code"] == 200:
            return response["body"].decode("latin1")
        print(f"Request failed: {response['status_code']}")
        return None


    def full_res(url):
        return re.sub(r"\._[^.]+_\.", ".", url)


    def extract_image_urls(html):
        soup = BeautifulSoup(html, "html.parser")
        urls = []
        main = soup.select_one("#landingImage, #imgTagWrapperId img")
        if main and main.get("src"):
            urls.append(full_res(main["src"]))
        for thumb in soup.select("#altImages img"):
            src = thumb.get("src")
            if src and "images/I/" in src:
                urls.append(full_res(src))
        seen, unique = set(), []
        for u in urls:
            if u not in seen:
                seen.add(u)
                unique.append(u)
        return unique


    def download_images(urls, folder="amazon_images"):
        os.makedirs(folder, exist_ok=True)
        saved = []
        for i, url in enumerate(urls):
            name = os.path.basename(urlparse(url).path) or f"image_{i}.jpg"
            path = os.path.join(folder, name)
            response = api.get(url)
            if response["status_code"] == 200:
                with open(path, "wb") as f:
                    f.write(response["body"])
                saved.append(path)
                print(f"Saved {path}")
            else:
                # Report and skip failed downloads, as in Step 3.
                print(f"Skipped {url}: {response['status_code']}")
        return saved


    def main():
        product_url = "https://www.amazon.com/dp/B0CHX3QBCH"
        html = crawl(product_url)
        if not html:
            return
        urls = extract_image_urls(html)
        print(json.dumps(urls, indent=2))
        download_images(urls)


    if __name__ == "__main__":
        main()
Save this as amazon_images.py, drop in your token, and run python amazon_images.py. The script prints the list of full-resolution image URLs, then downloads each into an amazon_images folder beside the script. If you are scraping the structured product fields too (title, price, ASIN, and so on), the companion guide on how to scrape Amazon product data extends this same fetch into a full record.
What the output looks like
The printed list is the set of image URLs the parser found, in gallery order, and the folder fills with one file per URL.
    [
      "https://m.media-amazon.com/images/I/71xKDtMfaZL.jpg",
      "https://m.media-amazon.com/images/I/61hr19MzN3L.jpg",
      "https://m.media-amazon.com/images/I/71f7tsamQ4L.jpg",
      "https://m.media-amazon.com/images/I/81Qj0nFLanL.jpg"
    ]
On disk you get the matching files, each named by its image id:
    amazon_images/
      71xKDtMfaZL.jpg
      61hr19MzN3L.jpg
      71f7tsamQ4L.jpg
      81Qj0nFLanL.jpg
Scaling to many products
One product is a demo; a real job runs across a list of products. Because every step is a plain function, you scale by looping over a list of product URLs and giving each its own output folder so the files never mix. Pace the loop with a short delay so you are not hammering Amazon in a tight cycle, which is the fastest way to get throttled.
    import time


    def scrape_many(product_urls):
        for url in product_urls:
            # The ASIN is the last path segment of a /dp/ASIN URL.
            asin = url.rstrip("/").split("/")[-1]
            html = crawl(url)
            if not html:
                continue
            urls = extract_image_urls(html)
            download_images(urls, folder=f"amazon_images/{asin}")
            print(f"{asin}: {len(urls)} images")
            time.sleep(2)
Each product gets a folder named by its ASIN, the last path segment of a /dp/ASIN URL, so a batch of products stays neatly separated on disk. The time.sleep(2) between products paces the run. For deriving and validating that ASIN from any Amazon URL, the guide on how to find and scrape the Amazon ASIN goes deeper.
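Taking the last path segment works for a clean /dp/ASIN URL but not for links that carry a product slug, a ref segment, or a query string. A slightly more tolerant sketch follows; extract_asin is a hypothetical helper, and it assumes the ASIN is the ten-character uppercase alphanumeric code that follows /dp/ or /gp/product/ in the URL path.

```python
import re
from urllib.parse import urlparse

# Ten uppercase letters or digits right after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url):
    """Return the ASIN found in an Amazon product URL, or None."""
    match = ASIN_PATTERN.search(urlparse(url).path)
    return match.group(1) if match else None
```

In the batch loop you could call extract_asin(url) in place of the split and skip any URL where it returns None.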
Staying unblocked
Even with rendering handled, Amazon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Spread requests out with a delay between products and avoid scraping the same listing on a tight loop. The time.sleep in the batch loop is the floor, not the ceiling.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
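The back-off habit in the list above can be sketched as a retry wrapper with an exponentially growing delay. This is an illustration, not a Crawlbase feature: fetch_with_backoff, its parameters, and the delay values are assumptions, and fetch stands in for any function that returns HTML on success or None on failure, such as the crawl function from Step 1.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0,
                       sleep=time.sleep):
    """Call fetch(url) until it returns a value or attempts run out.

    Waits base_delay * 2**attempt seconds (plus up to 1s of random
    jitter) between failed attempts. Returns None if all attempts fail.
    """
    for attempt in range(max_attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)  # back off before the next try
    return None
```

The sleep parameter exists only so the delay can be replaced in tests; in normal use you would leave it at its default.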
For the broader playbook, see the guide on how to scrape websites without getting blocked. The same fetch-then-download pattern works on other image-heavy sites too; the companion walkthrough on scraping images from DeviantArt applies it to a very different gallery layout.
Is it legal to scrape Amazon images?
Whether downloading Amazon images is allowed depends on Amazon's terms of service, your jurisdiction, and what you do with the images. Amazon's Conditions of Use restrict automated access and place clear limits on reuse of its content, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Amazon's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect.
The most important point is downstream use. Product images on Amazon are almost always copyrighted, owned by the brand, the seller, or the photographer, not by you and not by Amazon to license onward. Collecting public images for analysis is one thing: feeding them into image classification, building an internal catalog reference, or comparing variants across listings. Redistributing or republishing those images, passing them off as your own, or reusing them in your own storefront is a different matter and can infringe copyright. Do not redistribute copyrighted product images, and do not strip or alter watermarks. When in doubt, get permission from the rights holder.
This guide is deliberately scoped to public product images because that is the line that keeps the work defensible. It does not cover anything behind a login, account or order data, personal information, or any attempt to bypass authentication. For licensed or bulk access, Amazon offers the Product Advertising API and seller programs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights to the media. If your project needs more than public images for your own analysis, an official API or a direct agreement with the rights holder is the correct path, not a cleverer scraper.
Key takeaways
- Amazon hydrates its gallery client-side. A plain fetch often returns a thin image set, so you render the page with a JS token before you parse it.
- One call gives you rendering and a trusted IP. The Crawling API with ajax_wait and page_wait waits for the gallery to load, then returns finished HTML.
- The main image and gallery have stable hooks. Read #landingImage for the hero and #altImages img for the thumbnails, then strip Amazon's size token to get the full-resolution file.
- Download through the same API path. Routing image fetches back through api.get keeps them on the trusted IP path, and each file is named by its unique image id.
- Use public images for analysis only. Respect Amazon's ToS and robots.txt, do not redistribute copyrighted product images, and prefer an official API for licensed or bulk access.
Frequently Asked Questions (FAQs)
Why does a plain request return a low-resolution or incomplete gallery?
Because Amazon populates the high-resolution image variants and the thumbnail mapping with JavaScript after the initial HTML loads. A raw HTTP request sees the page before those scripts run, so it often gets a placeholder or a partial set. Rendering the page first, which is what the Crawling API's JS token does, gives you the complete gallery before you parse it.
How do I get the full-resolution image instead of a thumbnail?
Amazon encodes the requested size into the filename with a token like ._SX466_ or ._AC_US40_ between the image id and the extension. Removing that token with a small regex returns the original, full-resolution file. The full_res helper in this guide does exactly that for both the main image and every thumbnail.
Which selectors hold the Amazon product images?
The main image sits at #landingImage (also reachable through #imgTagWrapperId img), and the alternate, lifestyle, and infographic shots come from the thumbnail strip under #altImages. Filtering on images/I/ in the source skips sprites and icons. These ids are durable but not permanent, so re-inspect the live page if the list comes back empty.
Can I scrape Amazon images without getting blocked?
Keep your request rate low, add a delay between products, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
Is it legal to download images from Amazon?
Collecting public product images for your own analysis is generally low risk, but the images themselves are usually copyrighted by the brand, seller, or photographer. Redistributing, republishing, or reusing them commercially can infringe that copyright, and automated access can run against Amazon's terms regardless. Read Amazon's Conditions of Use and robots.txt, do not redistribute copyrighted images, and prefer the official Product Advertising API for licensed access.
Can I download images for a whole list of products at once?
Yes. Because each step is a plain function, you loop over a list of product URLs, give each its own folder named by its ASIN so the files do not mix, and add a short delay between products to pace the run. The scrape_many helper in this guide is a working starting point for that batch job.
