Scraping with Python is a popular method for automating data extraction from Amazon, allowing users to efficiently gather large volumes of product information. However, scraping Amazon’s search results at scale is a challenging feat, mainly due to the vast data, intricate web layouts, rate limitations, CAPTCHAs, and other security measures in play.
This comprehensive guide will explore how to scrape Amazon search pages at scale using Python and the Crawlbase Crawling API. In this tutorial, we will focus on scraping Amazon product data and walk through the full workflow, from setting up the necessary tools to extracting the results. By leveraging this powerful combination, you can efficiently extract and analyze product data from Amazon.
Our ready-to-use Amazon scraper is a comprehensive solution for extracting key data from Amazon. You can try it now.
Table of Contents
- Why Scrape Amazon Search Pages?
- Prerequisites
  - Setting Up Your Development Environment
  - Installing Required Libraries
  - Creating a Crawlbase Account
- Understanding Amazon Search Page Structure
  - Dynamic Content and JavaScript Rendering
- How to Scrape Amazon Search Pages
  - Getting the Correct Crawlbase Token
  - Setting up Crawlbase Crawling API
  - Handling Dynamic Content
  - Choosing an Amazon Scraper
  - Handling Pagination
- Final Thoughts
- Frequently Asked Questions
1. Why Scrape Amazon Search Pages?
In the vast world of online shopping, Amazon stands as a giant. It boasts an unparalleled selection of products across an extensive range of categories, making it a go-to destination for shoppers worldwide. Data gathering from Amazon is crucial for business intelligence, as it enables companies to analyze trends, monitor competitors, and make data-driven decisions.

Crawlbase Python Library
To harness the power of Crawlbase Crawling API, you can use the Crawlbase Python library. This library simplifies the integration of Crawlbase into your Python projects, making it accessible to Python developers of all levels of expertise.
First, initialize the Crawling API class.
from crawlbase import CrawlingAPI
api = CrawlingAPI({ 'token': 'YOUR_CRAWLBASE_TOKEN' })
Pass the URL that you want to scrape by using the following function.
api.get(url, options = {})
You can pass any options from the ones available in the API documentation.
Example:
response = api.get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/',
                   { 'user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' })
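Putting these pieces together, a minimal end-to-end request looks roughly like the sketch below. The status and body handling reflects how the library usually returns results, so double-check it against the version you have installed:

from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_CRAWLBASE_TOKEN' })
response = api.get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/')

# The library returns a dict with the HTTP status code, headers, and page body
if response['status_code'] == 200:
    print(response['body'][:200])  # preview the beginning of the returned HTML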
There are many other functionalities provided by Crawlbase Python library. You can read more about it here.
In the following sections, we will guide you through harnessing the capabilities of the Crawlbase Crawling API to scrape Amazon search pages effectively. We’ll use Python, a versatile programming language, to demonstrate the process step by step. Let’s explore Amazon’s wealth of information and learn how to unlock its potential.
2. Prerequisites
Before we embark on our web scraping journey, let’s ensure that you have all the necessary tools and resources ready. In this chapter, we’ll cover the prerequisites needed for successful web scraping of Amazon search pages using the Crawlbase Crawling API.
Setting Up Your Development Environment
You’ll need a suitable development environment to get started with web scraping. Here’s what you’ll require:
Python:
Python is a versatile programming language widely used in web scraping. Ensure that you have Python installed on your system. You can download the latest version of Python from the official website here.
Code Editor or IDE:
Choose a code editor or integrated development environment (IDE) for writing and running your Python code. Popular options include PyCharm and Jupyter Notebook. You can also use Google Colab. Select the one that best suits your preferences and workflow.
Installing Required Libraries
Web scraping in Python is made more accessible by libraries that simplify tasks like making HTTP requests, parsing HTML, and handling data. Install the following libraries using pip, Python’s package manager:
pip install pandas
pip install crawlbase
Pandas: Pandas is a powerful data manipulation library that will help you organize and analyze the scraped data efficiently.
Crawlbase: A lightweight, dependency-free Python class that acts as a wrapper for the Crawlbase API.
Creating a Crawlbase Account
To access the Crawlbase Crawling API, you’ll need a Crawlbase account. If you don’t have one, follow these steps to create an account: click here to create a new Crawlbase account and fill in the necessary details.
3. Understanding Amazon Search Page Structure
Before we embark on our web scraping journey, it’s essential to understand the structure of an Amazon search page. Amazon’s web pages are meticulously designed to provide a seamless shopping experience, but a wealth of valuable data lies beneath the user-friendly interface. Amazon provides the following URL for the search queries.
# Replace search_query with your desired query
https://www.amazon.com/s?k=search_query
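If you want to build this URL in code, a small helper does the trick. In the sketch below, the function name build_search_url is our own, not part of any library:

from urllib.parse import quote_plus

def build_search_url(query, page=1):
    # Amazon search results live under /s, with the query in the "k" parameter
    return f"https://www.amazon.com/s?k={quote_plus(query)}&page={page}"

print(build_search_url("games"))        # https://www.amazon.com/s?k=games&page=1
print(build_search_url("board games"))  # spaces are encoded as '+'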
Identifying Data Points of Interest
To scrape Amazon search pages effectively, you need to identify the specific data points you want to extract. Depending on your objectives, you might be interested in various pieces of information, including:
- Product Title: The name of the product being sold.
- Price: The current price of the product.
- Seller Information: Details about the seller, such as their name and ratings.
- Product Availability: Information about whether the product is in stock or out of stock.
- Product URL: The URL that leads to the product’s page on Amazon.
- Customer Ratings: Ratings and reviews provided by customers who have purchased the product.
- Product Features: Key features or attributes of the product.
- Shipping Information: Details about shipping options, including delivery times and costs.
- Sponsored Listings: Amazon often includes sponsored listings at the top of search results. These are paid advertisements.
Like many modern websites, Amazon employs dynamic loading of content using JavaScript rendering and Ajax calls. This means some parts of the page, such as search results and filters, may not be present in the initial HTML source code. Instead, they are loaded dynamically after the page is initially loaded in the user’s browser.
4. How to Scrape Amazon Search Pages
In this section, we embark on an exciting journey to scrape Amazon search pages at scale using the Crawlbase Crawling API. For example, we will gather essential information about products related to the search query “games” on Amazon. To accomplish this, we’ll employ the Crawlbase Python library, which offers seamless integration with the Crawling API. Let’s dive into the process:
Getting the Correct Crawlbase Token
We must obtain an API token before we can unleash the power of the Crawlbase Crawling API. Crawlbase provides two types of tokens: the Normal Token (TCP) for static websites and the JavaScript Token (JS) for dynamic or JavaScript-driven websites. Given that Amazon relies heavily on JavaScript for dynamic content loading, we will opt for the JavaScript Token.
from crawlbase import CrawlingAPI
api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })
You can get your Crawlbase token here after creating an account.
Setting up Crawlbase Crawling API
With our JavaScript token in hand, we are ready to configure the Crawlbase Crawling API. Before going further, we need to understand the structure of the output response. You can receive the output response in two formats: HTML or JSON. By default, the Crawling API uses the HTML format.
HTML response:

Headers:
  url: The URL that was crawled
  original_status: 200
  pb_status: 200
Body:
  The HTML of the page

To get the response in JSON format, pass the parameter "format" with the value "json".

JSON Response:

{
  "original_status": 200,
  "pb_status": 200,
  "url": "The crawled URL",
  "body": "The HTML of the page"
}
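For completeness, requesting the JSON format from Python only requires adding that parameter to the request options; a minimal sketch might look like this:

from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

# Ask the Crawling API to wrap the crawled page in a JSON envelope
response = api.get('https://www.amazon.com/s?k=games', { 'format': 'json' })
print(response['status_code'])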
We can read more about Crawling API response here. For the example, we will go with the default option. We’ll utilize the initialized API object to make requests. Specify the URL you intend to scrape using the api.get(url, options={}) function.
from crawlbase import CrawlingAPI
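Starting from that import, a minimal sketch of this step might look like the following. It assumes the JavaScript token, uses the example query "games", and writes the raw HTML to a file so we can inspect it; error handling is kept to a bare minimum:

from crawlbase import CrawlingAPI

# Use the JavaScript (JS) token because Amazon renders its content dynamically
api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

search_url = 'https://www.amazon.com/s?k=games'
response = api.get(search_url)

if response['status_code'] == 200:
    body = response['body']
    # The body may come back as bytes; decode it before writing to a text file
    if isinstance(body, bytes):
        body = body.decode('utf-8', errors='replace')
    with open('output.html', 'w', encoding='utf-8') as f:
        f.write(body)
else:
    print('Request failed with status', response['status_code'])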
In the code snippet above, we are preserving the retrieved HTML content by saving it to an HTML file. This step is essential for verifying that we have successfully obtained the desired HTML data. We can preview the file and see which content is included in the crawled HTML.
Preview of output.html
As you can see above, no useful information is present in the crawled HTML. This is because Amazon loads its important content dynamically using JavaScript and Ajax.
Handling Dynamic Content
Like many modern websites, Amazon search pages employ dynamic loading of content using JavaScript rendering and Ajax calls. This dynamic behavior can pose challenges when attempting to scrape data from these pages. However, with the Crawlbase Crawling API, you can effectively overcome these challenges. We can use the following query parameters provided by Crawling API to overcome this problem.
Adding Parameters
When using the JavaScript token with the Crawlbase API, you can specify some special parameters to ensure that you capture the dynamically rendered content accurately. Here are some crucial parameters:
- page_wait: This optional parameter allows you to specify the number of milliseconds to wait before the browser captures the resulting HTML code. Use this parameter in situations where a page takes time to render or when AJAX requests need to be loaded before capturing the HTML.
- ajax_wait: Another optional parameter for the JavaScript token. It lets you specify whether to wait for AJAX requests to finish before receiving the HTML response. This is important when the content relies on AJAX requests.
To use these parameters in our example, we can update our code like this:
from crawlbase import CrawlingAPI
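Building on the same import, a sketch of the updated request could look like this; the wait values are illustrative and should be tuned to how long the page actually takes to render:

from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

options = {
    'page_wait': 5000,    # wait 5 seconds before capturing the rendered HTML
    'ajax_wait': 'true',  # wait for AJAX requests to finish
}

response = api.get('https://www.amazon.com/s?k=games', options)

if response['status_code'] == 200:
    body = response['body']
    if isinstance(body, bytes):
        body = body.decode('utf-8', errors='replace')
    with open('output.html', 'w', encoding='utf-8') as f:
        f.write(body)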
Preview of output.html
Crawling API provides many other important parameters. You can read about them here.
Choosing an Amazon Scraper
Crawling API provides multiple built-in scrapers for different important websites, including Amazon. You can read about the available scrapers here. The "scraper" parameter is used to parse the retrieved data according to a specific scraper provided by the Crawlbase API. It’s optional; if not specified, you will receive the full HTML of the page for manual scraping. If you use this parameter, the response will be returned as JSON containing the information parsed according to the specified scraper.
Example:
# Example using a specific scraper
response = api.get('https://www.amazon.com/s?k=games', { 'scraper': 'amazon-serp' })
One of the available scrapers is “amazon-serp”, designed for Amazon search result pages. It returns an array of products with details like name, price, customer reviews, and more. Here’s an example of the output from the “amazon-serp” scraper:
{
  "products": [
    { "name": "...", "price": "...", "customerReview": "...", "url": "...", ... },
    ...
  ],
  "pagination": { "currentPage": 1, "nextPage": 2, "totalPages": 20 }
}
This includes all the information we want. Since the response will be JSON this time, we will store the key fields of every product object in a CSV file. Let’s add this parameter to our example and adjust the code to handle the JSON response:
from crawlbase import CrawlingAPI
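Fleshing that out, the sketch below shows one way to do it. It assumes the scraper’s parsed result is returned in the response’s "body" field, and the product keys used here (name, price, customerReview, url) are illustrative, so check them against the JSON you actually receive:

import json

import pandas as pd
from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

options = {
    'page_wait': 5000,
    'ajax_wait': 'true',
    'scraper': 'amazon-serp',  # parse the page with the built-in Amazon SERP scraper
}

response = api.get('https://www.amazon.com/s?k=games', options)

products = []
if response['status_code'] == 200:
    data = json.loads(response['body'])
    # With a scraper, the parsed result sits in the "body" field of the JSON response
    for product in data.get('body', {}).get('products', []):
        products.append({
            'Title': product.get('name'),
            'Price': product.get('price'),
            'Rating': product.get('customerReview'),
            'URL': product.get('url'),
        })

# Save the collected rows to a CSV file using a Pandas DataFrame
df = pd.DataFrame(products)
df.to_csv('amazon_products.csv', index=False)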
In the above code, we added the scraper to the options and then collected the information we wanted from each product JSON object in the response. Finally, we create a Pandas DataFrame and use its "to_csv" function to save the data to a CSV file.
Preview of amazon_products.csv
Handling Pagination
When scraping Amazon search pages, it’s crucial to handle pagination correctly to collect all the products you need. The Crawlbase "amazon-serp" scraper provides pagination information in the JSON response, including the current page, the next page, and the total number of pages.
// Example
"pagination": { "currentPage": 1, "nextPage": 2, "totalPages": 20 }
As you can see, the “currentPage” indicates the page you are currently on, the “nextPage” shows the page number of the next set of results, and “totalPages” tells you how many pages are available in total.
To scrape all the products, you’ll want to iterate through these pages, sending requests with the appropriate page number appended to the URL, just as Amazon does:
- Page 1: https://www.amazon.com/s?k=games&page=1
- Page 2: https://www.amazon.com/s?k=games&page=2
- … and so on, until the last page.
Let’s update the example code to handle pagination and scrape all the products:
from crawlbase import CrawlingAPI
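Continuing from that import, one way to structure the loop is sketched below. As before, the product and pagination keys follow the shapes shown earlier, so treat them as placeholders to verify against a real response:

import json

import pandas as pd
from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

options = { 'page_wait': 5000, 'ajax_wait': 'true', 'scraper': 'amazon-serp' }
base_url = 'https://www.amazon.com/s?k=games'

def scrape_page(page_number):
    # Fetch one search results page and return the scraper's parsed "body" (or None)
    response = api.get(f'{base_url}&page={page_number}', options)
    if response['status_code'] == 200:
        return json.loads(response['body']).get('body', {})
    return None

products = []
first_page = scrape_page(1)
if first_page:
    products.extend(first_page.get('products', []))
    # Use the pagination info ("currentPage", "nextPage", "totalPages") to walk the rest
    total_pages = first_page.get('pagination', {}).get('totalPages', 1)
    for page in range(2, total_pages + 1):
        data = scrape_page(page)
        if data:
            products.extend(data.get('products', []))

# Keep a few illustrative fields per product and save everything to a CSV file
df = pd.DataFrame([
    { 'Title': p.get('name'), 'Price': p.get('price'), 'URL': p.get('url') }
    for p in products
])
df.to_csv('amazon_products.csv', index=False)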
In this code section, we initiate the web scraping process. First, we define the Amazon search URL we want to scrape. Then, the code checks for pagination information on the initial page. If pagination is present, meaning there are multiple result pages, the code iterates through subsequent pages to scrape additional product data.
Finally, the extracted data is organized into a Pandas DataFrame, allowing easy data manipulation, and the DataFrame is saved to a CSV file. This code ensures you can gather a comprehensive dataset of Amazon products from search results, even if they span multiple pages.
Preview of amazon_products.csv
5. Final Thoughts
As professionals further engage in web scraping activities, adhering to ethical and technical standards remains paramount. Compliance with a website’s terms of service and robots.txt guidelines is mandatory. Moreover, ensure web scraping is used for legitimate, constructive purposes. With the knowledge acquired from this guide, professionals are well-equipped to harness the extensive data potential that Amazon’s product listings offer, driving analytical and business outcomes.
Sign up on Crawlbase and get 1,000 free credits
6. Frequently Asked Questions
Q: Can I scrape Amazon search pages for personal research or analysis?
Scraping Amazon search pages for personal research or analysis is generally acceptable, provided you comply with Amazon’s terms of service and respect their website’s rules. However, it’s essential to be mindful of the volume of requests you send to the website, as excessive scraping can lead to IP blocking or other measures to prevent scraping. To tackle this problem, you can consider using the Crawlbase Crawling API, which allows you to scrape data from websites in a more structured and controlled manner, helping you avoid potential issues associated with excessive requests. This approach can enable you to conduct research and analysis while staying within the bounds of Amazon’s policies.
Q: Are there any rate limitations or CAPTCHAs when scraping Amazon?
Yes, Amazon employs rate limiting and CAPTCHAs to protect its website from excessive or automated access. When scraping Amazon, it’s essential to send requests at a reasonable rate and implement mechanisms to handle CAPTCHAs if they are encountered. Using a service like the Crawlbase Crawling API can help you navigate these challenges effectively.
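Whichever route you take, pacing your requests and retrying failed ones goes a long way. A minimal sketch (the delay values are arbitrary) might look like this:

import time

from crawlbase import CrawlingAPI

api = CrawlingAPI({ 'token': 'YOUR_JAVASCRIPT_TOKEN' })

def polite_get(url, options=None, retries=3, delay=5):
    # Retry a request a few times, pausing between attempts to keep the rate reasonable
    for attempt in range(retries):
        response = api.get(url, options or {})
        if response['status_code'] == 200:
            return response
        time.sleep(delay * (attempt + 1))  # back off a little more after each failure
    return None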
Q: Can I scrape other e-commerce websites using similar techniques?
Yes, many e-commerce websites employ similar web scraping techniques, and the principles discussed in this guide can be applied to scrape data from other e-commerce platforms. However, keep in mind that each website may have its own policies and challenges, so it’s essential to review their terms of service and adapt your scraping approach accordingly.
Q: What are some common use cases for scraping Amazon search pages?
Common use cases for scraping Amazon search pages include market research, competitor analysis, pricing optimization, content aggregation for product review websites, and making informed investment decisions. Web scraping can provide valuable insights for e-commerce businesses, data analysts, researchers, and entrepreneurs.