How to Scrape Audible Audiobook Data

Audible is Amazon's audiobook service, and its public catalog runs to hundreds of thousands of titles across every genre, each listed with its author, narrator, runtime, star rating, and price. Those search and category pages are one of the cleanest views of the audiobook market on the open web, which is why analysts, librarians, and hobbyists pull from them to track pricing, build reading lists, and study what is selling.

This guide shows you how to scrape Audible audiobook data with Python and assemble the results into a small "mini library" file. You build a runnable scraper that fetches an Audible search or category page through the Crawling API, parses a clean record per audiobook, handles pagination, and exports the set to JSON and CSV. The whole walkthrough stays scoped to public catalog data: the titles, authors, narrators, runtimes, ratings, and prices anyone can see on a results page without signing in. It never touches the audio itself.

What you will build

A Python script that takes an Audible search or category URL, retrieves the rendered listing through the Crawling API, and extracts a structured record per audiobook. We use the science fiction keyword search as the running example and pull these fields from each product card:

Title: the audiobook's name as shown on the listing card.
Author: the writer credited on the title, shown as "By: ..." on the card.
Narrator: the voice performer, shown as "Narrated by: ..." on the card.
Length: the total runtime, for example "10 hrs and 32 mins".
Rating: the average star rating, when the title has one.
Price: the listed price, when the card shows one.
Link: the URL of the audiobook's own detail page.

Why a plain request fails on Audible

If you point a bare HTTP client at an Audible results URL, you rarely get the listing you came for. Two things work against you. First, Audible renders much of the product grid client-side: it ships a lightweight shell and fills the cards in as the page's JavaScript runs, so the initial HTML is often missing the runtimes, prices, and ratings you want. Second, as an Amazon property, Audible flags automated traffic quickly. Datacenter IP ranges and request patterns that do not look like a real browser get met with a CAPTCHA, a "robot check" interstitial, or an outright block before you reach the list.

So a working Audible scraper needs two things in one request: a browser that renders the page, and an IP that Audible reads as a real visitor. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the results URL, it renders the page behind a trusted residential IP, handles the rotation and CAPTCHA solving, and returns finished HTML for you to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the official Python docs or any beginner course covers the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.

A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your JavaScript token: the rendering options used below (ajax_wait and page_wait) only take effect on JavaScript requests. The free tier includes up to 20,000 requests with no card, which is plenty to build and test this scraper. Treat the token like a password and keep it out of version control.
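One way to keep the token out of the script is to read it from an environment variable. A minimal sketch, assuming you export a variable named CRAWLBASE_TOKEN; both that name and the load_token helper are hypothetical choices for illustration:

```python
import os


def load_token(var_name="CRAWLBASE_TOKEN"):
    """Read the Crawlbase token from an environment variable.

    Hypothetical helper: CRAWLBASE_TOKEN is an illustrative name, so use
    whatever variable your shell or deployment already sets.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail loudly instead of sending requests with an empty token.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# In the scraper, instead of pasting the token into the source:
# api = CrawlingAPI({"token": load_token()})
```

Set it once per shell session (export CRAWLBASE_TOKEN=... on macOS or Linux) and the script never has to contain the secret.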

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the listing cards by CSS selector.

bash

python --version

python -m venv audible_env
source audible_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with audible_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:

bash

touch audible_library.py

Understanding the Audible results page

Audible organizes its catalog the way a library does, with overlapping categories, so a single title can sit in several at once. Each category and each search lives at a stable URL, for example https://www.audible.com/search?node=18573211011 for a category node or https://www.audible.com/search?keywords=science+fiction for a keyword search. Both render the same kind of result grid: an ordered list of product cards, one per audiobook, each carrying a title, an author line, a narrator line, a runtime, a rating, and a price.
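If you build these URLs in code rather than pasting them, urllib.parse.urlencode handles the escaping for you. A small sketch; search_url is a hypothetical helper name, and the node value is simply the example ID from the paragraph above:

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.audible.com/search"


def search_url(**params):
    """Hypothetical helper: build an Audible search or category URL.

    urlencode() turns spaces into '+', matching the
    keywords=science+fiction form shown above.
    """
    return f"{SEARCH_BASE}?{urlencode(params)}"


print(search_url(keywords="science fiction"))
# https://www.audible.com/search?keywords=science+fiction
print(search_url(node="18573211011"))
# https://www.audible.com/search?node=18573211011
```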

Before writing selectors, open a results page in your browser, right-click a product card, and choose Inspect. Audible wraps each result in a list item marked li.productListItem, puts the title in an h3.bc-heading link, and labels the author and narrator lines with li.authorLabel and li.narratorLabel. The runtime sits in li.runtimeLabel, the rating in li.ratingsLabel, and the price, when a card shows one, in p.buybox-regular-price. Those are the elements you target. Audible's utility class names shift over time, so lean on the more durable label classes rather than a brittle chain of generated names.

Step 1: Fetch the rendered results page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the results URL, and request it. Checking the status code before you parse keeps failures loud instead of silent.

python

from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 4000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        # The body is bytes; errors="replace" swaps any malformed byte for U+FFFD instead of raising.
        return response["body"].decode("utf-8", errors="replace")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    results_url = "https://www.audible.com/search?keywords=science+fiction"
    html = crawl(results_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering cards appear before the page is captured. The body comes back as bytes, and decoding it as UTF-8 with errors="replace" keeps accented author names and curly quotes in titles intact while turning any stray malformed byte into a placeholder character instead of an exception. Run the script and you should see real listing markup, not a robot-check shell. That confirms rendering works before you write a single selector.
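While you iterate on selectors, re-requesting the same page spends a request every time. A sketch of a small disk cache, assuming you are happy to reuse a saved copy during development; fetch_cached and the html_cache folder are hypothetical names, and fetch can be the crawl() function above:

```python
import hashlib
from pathlib import Path


def fetch_cached(page_url, fetch, cache_dir="html_cache"):
    """Hypothetical helper: return HTML for page_url, reusing a saved copy.

    fetch is any callable that takes a URL and returns HTML or None,
    such as crawl() from Step 1.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL so query strings become safe, fixed-length file names.
    name = hashlib.sha256(page_url.encode("utf-8")).hexdigest() + ".html"
    path = cache / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(page_url)
    if html is not None:
        # Only cache successful fetches, so a failure is retried next time.
        path.write_text(html, encoding="utf-8")
    return html
```

Calling fetch_cached(results_url, crawl) hits the Crawling API once and reads from disk afterwards. Delete the folder when you want fresh data.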

Crawlbase Crawling API

An Audible results page needs a rendered grid and a trusted IP in a single request. The Crawling API takes your token, runs the search page in a real browser, rotates through residential IPs server-side, and handles CAPTCHA solving, then hands you finished HTML. You skip running a headless browser fleet and a proxy pool yourself. Try it on a search or category URL on the free tier first.


Step 2: Parse the audiobook cards with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup, find every product card, and pull each field by its selector. Audible wraps each result in li.productListItem, puts the title in the heading link, and labels the author, narrator, runtime, and rating lines with their own classes. Read the detail-page link from the title's anchor. Wrap each card in a try/except so one malformed listing does not crash the run.

python

from bs4 import BeautifulSoup

BASE = "https://www.audible.com"

def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None

def strip_label(value, label):
    if value and value.startswith(label):
        return value[len(label):].strip()
    return value

def parse_link(card):
    a = card.select_one("h3.bc-heading a[href]")
    if not a:
        return None
    href = a["href"].split("?")[0]
    return href if href.startswith("http") else BASE + href

def scrape_audiobooks(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("li.productListItem")
    results = []
    for card in cards:
        try:
            author = text_of(card, "li.authorLabel")
            narrator = text_of(card, "li.narratorLabel")
            results.append({
                "title": text_of(card, "h3.bc-heading a"),
                "author": strip_label(author, "By:"),
                "narrator": strip_label(narrator, "Narrated by:"),
                "length": strip_label(text_of(card, "li.runtimeLabel"), "Length:"),
                "rating": text_of(card, "li.ratingsLabel span.bc-text"),
                "price": text_of(card, "p.buybox-regular-price span.bc-text"),
                "link": parse_link(card),
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results

The text_of helper queries one element inside a card and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every title shows a price or a rating. The strip_label helper trims the By:, Narrated by:, and Length: prefixes Audible prints inside those label rows so you store clean values. The link is read from the title anchor, has its query string dropped, and is normalized to an absolute URL since Audible serves a relative href.
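The BASE + href step assumes every relative link starts with a slash. If you want link handling that also copes with protocol-relative or already-absolute hrefs, urllib.parse.urljoin covers those cases. A hedged alternative for the tail of parse_link; normalize_link is a hypothetical name:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE = "https://www.audible.com"


def normalize_link(href, base=BASE):
    """Hypothetical helper: resolve href against base, then drop query and fragment."""
    absolute = urljoin(base, href)
    scheme, netloc, path, _query, _fragment = urlsplit(absolute)
    # Rebuild the URL with an empty query string and fragment.
    return urlunsplit((scheme, netloc, path, "", ""))
```

Inside parse_link, the last two lines would become return normalize_link(a["href"]).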

Selectors drift

Audible's generated utility class names change without notice, while the semantic label classes (li.authorLabel, li.narratorLabel, li.runtimeLabel, li.ratingsLabel) are more durable. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect the live results page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
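You can catch that drift automatically instead of eyeballing the output. A sketch that reports, for each field, the share of parsed records that actually have a value; field_coverage and dead_fields are hypothetical helpers, and dead_fields flags any field that is empty on every card, the symptom described above:

```python
def field_coverage(records, fields):
    """Hypothetical helper: share of records (0.0 to 1.0) with a non-empty value per field."""
    total = len(records)
    return {
        field: (sum(1 for r in records if r.get(field)) / total) if total else 0.0
        for field in fields
    }


def dead_fields(records, fields):
    """Hypothetical helper: fields empty on every record, the usual sign of a drifted selector."""
    if not records:
        # Nothing parsed at all points at the fetch or the card selector, not one field.
        return []
    return [field for field, share in field_coverage(records, fields).items() if share == 0.0]
```

For example, print(dead_fields(scrape_audiobooks(html), FIELDS)) after a fetch, using the FIELDS list from Step 3, names the selectors to re-inspect. Expect price and rating to sit below 1.0 normally, since not every card shows them.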

Step 3: Handle pagination, assemble, and export

One page of results is a demo; a real mini library spans the whole category. Audible paginates its search and category pages with a ?page= parameter, so you walk the pages in order, parse each, and stop when a page returns no cards. Then wire the fetch and the parse into one runnable script and write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet.

python

import csv
import json
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.audible.com"
FIELDS = ["title", "author", "narrator", "length", "rating", "price", "link"]

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 4000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", errors="replace")
    print(f"Request failed: {response['status_code']}")
    return None

def text_of(card, selector):
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None

def strip_label(value, label):
    if value and value.startswith(label):
        return value[len(label):].strip()
    return value

def parse_link(card):
    a = card.select_one("h3.bc-heading a[href]")
    if not a:
        return None
    href = a["href"].split("?")[0]
    return href if href.startswith("http") else BASE + href

def scrape_audiobooks(html):
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("li.productListItem")
    results = []
    for card in cards:
        try:
            author = text_of(card, "li.authorLabel")
            narrator = text_of(card, "li.narratorLabel")
            results.append({
                "title": text_of(card, "h3.bc-heading a"),
                "author": strip_label(author, "By:"),
                "narrator": strip_label(narrator, "Narrated by:"),
                "length": strip_label(text_of(card, "li.runtimeLabel"), "Length:"),
                "rating": text_of(card, "li.ratingsLabel span.bc-text"),
                "price": text_of(card, "p.buybox-regular-price span.bc-text"),
                "link": parse_link(card),
            })
        except Exception as e:
            print(f"Skipped a card: {e}")
    return results

def build_library(search_url, max_pages=5):
    library = []
    for page in range(1, max_pages + 1):
        sep = "&" if "?" in search_url else "?"
        page_url = f"{search_url}{sep}page={page}"
        html = crawl(page_url)
        if not html:
            break
        found = scrape_audiobooks(html)
        if not found:
            break
        library.extend(found)
        print(f"Page {page}: {len(found)} audiobooks")
        time.sleep(2)
    return library

def export(rows, name="audible_library"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} audiobooks to {name}.json and {name}.csv")

def main():
    url = "https://www.audible.com/search?keywords=science+fiction"
    library = build_library(url, max_pages=5)
    if library:
        export(library)

if __name__ == "__main__":
    main()

Run the full script with python audible_library.py. It walks up to five pages of results, parses one row per audiobook, and writes both audible_library.json and audible_library.csv. The empty-results break stops you early when the category runs out of pages, the time.sleep(2) between requests paces the run so you are not flagged for rapid-fire traffic, and the shared FIELDS list keeps the CSV column order in step with the dictionary keys so the two exports never drift apart.
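One edge case in build_library: the sep check appends page= to whatever URL you pass, so a starting URL that already contains page=3 ends up with two page parameters. A hedged alternative using urllib.parse; with_page is a hypothetical helper you could call in place of the sep and page_url lines:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Hypothetical helper: return url with its page query parameter set to page.

    Unlike appending '&page=N', this replaces an existing page value
    instead of adding a second one.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep every existing parameter except page, then add the new page value.
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k != "page"]
    params.append(("page", str(page)))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

In the loop, page_url = with_page(search_url, page) then works the same whether the URL starts with a query string, without one, or with a stale page value.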

What the output looks like

You get a clean list of audiobook records, in listing order, ready to write to JSON, CSV, or a database. This is your mini library.

json

[
  {
    "title": "Project Hail Mary",
    "author": "Andy Weir",
    "narrator": "Ray Porter",
    "length": "16 hrs and 10 mins",
    "rating": "4.9 out of 5 stars",
    "price": "$24.49",
    "link": "https://www.audible.com/pd/Project-Hail-Mary-Audiobook/B08G9PRS1K"
  },
  {
    "title": "Dune",
    "author": "Frank Herbert",
    "narrator": "Scott Brick, Orlagh Cassidy, Euan Morton",
    "length": "21 hrs and 2 mins",
    "rating": "4.6 out of 5 stars",
    "price": "$29.65",
    "link": "https://www.audible.com/pd/Dune-Audiobook/B002V1OF70"
  }
]

From here the mini library is yours to use. Load the CSV into a spreadsheet to sort by length or rating, filter the JSON for titles under a price ceiling, or feed the set into a recommendation list. Because each record carries the detail-page link, you can follow up later and scrape a single title's page for the full description or publisher when you need more than the listing fields.
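The length, rating, and price fields are stored as display strings, so sorting or filtering needs numbers first. A sketch of parsers for the formats in the sample output; the regular expressions assume exactly those shapes ("16 hrs and 10 mins", "$24.49", "4.9 out of 5 stars") and return None for anything else, so check them against your own data:

```python
import re


def length_minutes(length):
    """Convert a runtime like '16 hrs and 10 mins' into total minutes."""
    if not length:
        return None
    hours = re.search(r"(\d+)\s*hrs?\b", length)
    mins = re.search(r"(\d+)\s*mins?\b", length)
    if not hours and not mins:
        return None
    return (int(hours.group(1)) * 60 if hours else 0) + (int(mins.group(1)) if mins else 0)


def price_value(price):
    """Pull the number out of a price string like '$24.49' (commas allowed)."""
    if not price:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price)
    return float(match.group().replace(",", "")) if match else None


def rating_value(rating):
    """Take the leading number from a rating like '4.9 out of 5 stars'."""
    if not rating:
        return None
    match = re.search(r"\d+(?:\.\d+)?", rating)
    return float(match.group()) if match else None


# Usage with records loaded from audible_library.json:
# longest_first = sorted(library, key=lambda r: length_minutes(r["length"]) or 0, reverse=True)
# cheap = [r for r in library if (p := price_value(r["price"])) is not None and p < 25]
```

Missing values stay None rather than becoming 0, so a title without a price is not mistaken for a free one.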

Scaling across categories

One search is a starting point; a fuller library spans several categories. Audible exposes a results page for every category node and every keyword search, each at its own URL, so you can keep a map of names to URLs and run the same build_library routine over each, keying the output by category. Pace the requests with the delay already in the loop, and cap max_pages so a broad run does not balloon. To track price and rating trends over time, run the job on a schedule, stamp each export with the date, and diff successive snapshots to see what moved. The same approach powers any ecommerce scraping project that watches a catalog over time.
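A sketch of that category loop and the snapshot diff, under a few stated assumptions: build is passed in so the code runs with build_library from Step 3 (or a stub), records are matched across snapshots by their link field, and run_categories, diff_snapshots, and the snapshots folder are hypothetical names:

```python
import json
from datetime import date
from pathlib import Path


def run_categories(categories, build, out_dir="snapshots", max_pages=3, today=None):
    """Hypothetical helper: build one library per category, save date-stamped JSON.

    categories maps a short name to a search or category URL; build has the
    signature of build_library(url, max_pages=...), which already sleeps
    between pages.
    """
    stamp = (today or date.today()).isoformat()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, url in categories.items():
        rows = build(url, max_pages=max_pages)
        path = out / f"{name}_{stamp}.json"
        path.write_text(json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8")
        written[name] = path
    return written


def diff_snapshots(old_rows, new_rows, fields=("price", "rating")):
    """Hypothetical helper: compare two snapshots keyed by link."""
    old = {r["link"]: r for r in old_rows if r.get("link")}
    new = {r["link"]: r for r in new_rows if r.get("link")}
    changes = {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": [],
    }
    for link in sorted(new.keys() & old.keys()):
        for field in fields:
            before, after = old[link].get(field), new[link].get(field)
            if before != after:
                changes["changed"].append((link, field, before, after))
    return changes
```

Something like run_categories({"scifi": "https://www.audible.com/search?keywords=science+fiction"}, build_library) writes scifi_YYYY-MM-DD.json; load two of those files with json.load and pass them to diff_snapshots to see which titles appeared, disappeared, or changed price or rating.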

Staying unblocked

Even with rendering handled, Audible watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

Pace your requests. Spread requests out with a delay between pages and categories rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Audible's servers.
Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
Retain only what you need. Store the catalog fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
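When a request fails anyway, waiting longer before each retry is gentler than hammering the same page. A sketch of exponential backoff with jitter, assuming fetch returns None on failure the way crawl() does; because crawl() does not tell a challenge apart from any other non-200 response, this retries on every failure. fetch_with_backoff is a hypothetical name:

```python
import random
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(url), backing off after each failure.

    The wait doubles on every retry (2s, 4s, 8s with the defaults), plus up
    to a second of random jitter so parallel jobs do not retry in lockstep.
    sleep is injectable so the function can be tested without waiting.
    """
    for attempt in range(retries + 1):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < retries:
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

In build_library, html = fetch_with_backoff(crawl, page_url) would replace the direct crawl(page_url) call.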

For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked. If you want to deepen the parsing side, the guide on using BeautifulSoup in Python covers the library in detail, and for a related catalog target, the walkthrough on scraping Goodreads ratings and comments pairs naturally with an audiobook set.

Is it legal to scrape Audible?

Whether scraping Audible is allowed depends on Audible's and Amazon's Conditions of Use, your jurisdiction, and what you do with the data. Those terms restrict automated access, so scraping can run against them regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Audible's Conditions of Use and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
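Python's standard library can evaluate robots.txt rules, which makes it easy to check a URL before adding it to a run. A minimal sketch using urllib.robotparser; allowed_by_robots is a hypothetical helper, and any rules you test it with should come from the live file, since a robots.txt allowance is only one input alongside the Conditions of Use:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Hypothetical helper: check url against a robots.txt body you already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# RobotFileParser can also download the live file itself:
# rp = RobotFileParser("https://www.audible.com/robots.txt")
# rp.read()
# rp.can_fetch("*", "https://www.audible.com/search?keywords=science+fiction")
```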

A few lines worth holding to. Collect only public catalog data: the titles, authors, narrators, runtimes, ratings, prices, and listing links that anyone can see on a results page without an account. This guide never touches the audio itself, which is copyrighted content you are not entitled to download or redistribute, and it stays away from anything behind a login, account or purchase data, and personal information about identifiable listeners or reviewers. Keep your request volume low enough that you are not straining Audible's servers, and if you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

This guide is deliberately scoped to public search and category pages because that is the line that keeps the work defensible. For licensed or bulk access, Amazon offers the Product Advertising API and other official programs, and that is the right tool when you need large volumes, guaranteed structure, or commercial rights. If your project needs more than public catalog metadata, the correct path is an official API or a data agreement, not a cleverer scraper.

Recap

Key takeaways

Audible's catalog is rich public metadata. Each results page lists title, author, narrator, runtime, rating, and price, which is why it is so useful for building a reading list or studying the audiobook market.
You need rendering and a trusted IP together. Audible fills the product grid client-side and blocks bot traffic, so the Crawling API renders the page behind a residential IP in one call.
BeautifulSoup does the extraction. Loop li.productListItem cards and map title, author, narrator, length, rating, price, and link to current selectors, and expect those selectors to drift.
Pagination builds the library. Walk the ?page= pages until one returns no cards, then export to JSON and CSV with a shared field list so the two files stay in sync.
Stay on public metadata. Respect Audible's Conditions of Use and robots.txt, never touch the copyrighted audio or anything behind a login, and prefer Amazon's official API for licensed or bulk data.

Frequently Asked Questions (FAQs)

Why does a plain request return no audiobooks from Audible?

Two reasons. Audible fills much of the product grid client-side as the page loads, so a raw request often gets a shell missing the runtimes, ratings, and prices. On top of that, as an Amazon property, Audible challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it.

What fields can I scrape from an Audible listing?

From a search or category results card you can pull the title, author, narrator, total length, average rating, price, and the link to the audiobook's detail page. Those are the public listing fields. If you need the full description, publisher, or release date, follow each card's link and scrape the individual detail page, which exposes more metadata.

How do I scrape a specific Audible category?

Every category and search has its own URL, for example /search?keywords=science+fiction for a keyword search or /search?node=... for a category node. Point the scraper at the URL you want. To cover many categories, keep a map of names to URLs and loop over it, pacing the requests with a short delay.

How does pagination work on Audible?

Audible appends a ?page= parameter (or &page= when the URL already has a query string) to walk through result pages. The script increments the page number, parses each page, and stops when a page returns no product cards, which is the signal you have reached the end of the category.

Can I download the audiobooks themselves?

No, and you should not try. The audio files are copyrighted content tied to Audible accounts and purchases, and this guide deliberately stays away from them. It collects only public catalog metadata: the titles, authors, narrators, runtimes, ratings, and prices that appear on listing pages.

How do I avoid getting blocked while scraping Audible?

Keep your per-IP request rate low, add a delay between pages and categories, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.

Bilal Ahmed

Software Engineer · Crawlbase

Software engineer who wrote some of the most-read pieces on the Crawlbase blog, covering web scraping, proxies, and data tooling.

