Cryptocurrency markets run around the clock, and public market pages on sites like CoinGecko publish exactly the structured numbers that drive price tracking, portfolio dashboards, and research: each coin's name and symbol, its current price, the 24-hour change, the 24-hour trading volume, and the market capitalization. That table refreshes constantly, and reading it by hand across dozens of coins is slow and error-prone.

This guide shows you how to extract crypto market data with Python the reliable way. You build a small, runnable scraper that fetches the rendered market page through the Crawling API, parses each row with BeautifulSoup, handles pagination, and exports clean JSON and CSV. The whole walkthrough stays scoped to public market data, the kind any visitor sees without an account, and the legality section near the end is not boilerplate.

What you will build

A Python script that fetches a public crypto market page, walks each coin row in the rendered table, and extracts a structured record per coin. The running example is the main markets table on CoinGecko. We pull these fields:

  • Name: the full coin name, such as Bitcoin or Ethereum.
  • Symbol: the ticker symbol, such as BTC or ETH.
  • Price: the current price in your chosen quote currency.
  • Change 24h: the percentage price move over the last 24 hours.
  • Volume 24h: the total traded volume over the last 24 hours.
  • Market cap: the total market capitalization of the coin.

Why a plain request fails on a crypto market page

Request a crypto market URL with a bare HTTP client and you get status 200 but almost none of the data in the body. Two things work against you. First, these market tables load their rows in the browser through JavaScript, so the initial HTML is a thin shell that fills in only after the page's scripts run. Parse that first response and you capture an empty table instead of the full coin list. Second, high-traffic market sites flag automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get rate-limited or challenged before they ever reach the rendered content.
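One quick way to tell a thin shell from a rendered page is to count the table body rows in whatever HTML came back. The sketch below uses only the standard library; the names RowCounter and count_body_rows are hypothetical helpers for this illustration, and the two HTML strings are made-up stand-ins, not real CoinGecko markup.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags that appear inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_body_rows(html):
    """Return how many body rows the HTML contains (0 suggests a thin shell)."""
    counter = RowCounter()
    counter.feed(html)
    return counter.rows


# Made-up examples: an unrendered shell and a rendered table.
shell = "<table><thead><tr><th>Coin</th></tr></thead><tbody></tbody></table>"
rendered = (
    "<table><tbody><tr><td>Bitcoin</td></tr>"
    "<tr><td>Ethereum</td></tr></tbody></table>"
)
print(count_body_rows(shell), count_body_rows(rendered))
```

A count of zero on a page you expected to hold dozens of coins is a strong hint that you parsed the pre-render shell.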

So a working market-data scraper needs two things in one request: a browser that renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. A crypto market table fills its rows client-side, so you need the JS token here. The normal token returns the same thin shell a plain fetch would, and there is little useful to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the parsing side, the BeautifulSoup guide is a good companion to this tutorial.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda, and make sure Python is on your PATH.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase includes 1,000 free requests to start, which is plenty for working through this guide. Treat the token like a password: it authenticates your requests, so keep it out of version control.
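One common way to keep the token out of version control is to read it from an environment variable instead of pasting it into the script. This is a minimal sketch: the variable name CRAWLBASE_JS_TOKEN and the helper load_token are assumptions chosen for this guide, not something Crawlbase prescribes.

```python
import os


def load_token(env=None, var_name="CRAWLBASE_JS_TOKEN"):
    """Read the token from the environment and fail loudly if it is missing.

    var_name is a hypothetical variable name; use whatever your team prefers.
    """
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the scraper"
        )
    return token
```

With this in place you would export the variable in your shell and pass load_token() wherever the examples show the "YOUR_CRAWLBASE_JS_TOKEN" placeholder.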

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.

bash
python --version

python -m venv crypto_env
source crypto_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with crypto_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. Both json and csv ship with the standard library, so there is nothing more to install for the export step.

Step 1: Fetch a rendered market page

Start by getting a finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the market URL. The table loads asynchronously, so pass ajax_wait and page_wait to hold for the dynamic content before the page is captured. Checking the Crawlbase pc_status before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",
    "page_wait": 5000,
}

def crawl(page_url):
    response = api.get(page_url, OPTIONS)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['headers']['pc_status']}")
    return None

if __name__ == "__main__":
    market_url = "https://www.coingecko.com/"
    html = crawl(market_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a client-rendered target. ajax_wait tells the API to wait for asynchronous content to load, and page_wait holds for a fixed number of milliseconds after load so the live-updating rows settle before the page is captured. Five seconds is a reasonable start; raise it if the table comes back thin. Save the code as crypto_scraper.py, run python crypto_scraper.py, and you should see real market markup, not the shell a plain request returns, which confirms rendering works before you write a single selector.
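If the table keeps coming back thin, you can automate the "raise it" advice by trying a few page_wait values in turn and keeping the fullest result. The sketch below is deliberately decoupled from the Crawling API: fetch and count_rows are hypothetical callables you supply (for example, a wrapper around api.get that sets page_wait, and a row counter), and the wait values and min_rows threshold are arbitrary starting points.

```python
def fetch_until_full(fetch, page_url, count_rows,
                     waits=(5000, 8000, 12000), min_rows=10):
    """Try increasing page_wait values until the page has enough rows.

    fetch(page_url, page_wait) -> html string or None   (you supply this)
    count_rows(html) -> int                             (you supply this)
    Returns (best_html, best_row_count); best_html is None if every fetch failed.
    """
    best_html, best_rows = None, -1
    for wait in waits:
        html = fetch(page_url, wait)
        if not html:
            continue  # failed fetch: move on to the next, longer wait
        rows = count_rows(html)
        if rows > best_rows:
            best_html, best_rows = html, rows
        if rows >= min_rows:
            break  # the table looks full, no need to wait longer
    return best_html, best_rows
```

Each extra attempt is another billed request, so keep the list of waits short.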

Crawlbase Crawling API

A crypto market page needs a rendered table behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. The ajax_wait and page_wait options above control how long it waits before capturing the page. Point it at the public market page on the free tier first.

Step 2: Parse one coin row

The market page is a table where each tr is one coin. Load the rendered HTML into BeautifulSoup and read the cells you care about. The columns sit in a known order, and the name and symbol live together in the coin cell, so a small helper that maps each field to its cell keeps the parsing readable. Each lookup is guarded so a missing field returns None instead of crashing the run.

python
from bs4 import BeautifulSoup

ROW_SELECTOR = 'table[data-coin-table-target="table"] > tbody > tr'

def text_of(node, selector):
    el = node.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_row(row):
    return {
        "name": text_of(row, 'td[data-view-component="true"] a span.tw-text-gray-700'),
        "symbol": text_of(row, 'td[data-view-component="true"] a span.tw-text-gray-500'),
        "price": text_of(row, 'td[data-target="price.price"]'),
        "change_24h": text_of(row, 'td.tw-text-right span[data-target="price-change-percentage-24h"]'),
        "volume_24h": text_of(row, 'td.tw-text-right span[data-coin-table-target="totalVolume"]'),
        "market_cap": text_of(row, 'td.tw-text-right span[data-coin-table-target="marketCap"]'),
    }

The text_of helper queries one element inside a row and returns its stripped text, or None when the element is absent, so a coin that omits a field does not break the loop. The coin cell holds both the full name and the symbol in two nested spans, while price, 24-hour change, volume, and market cap each sit in their own right-aligned cell. Reading each row as a unit keeps every field aligned to the correct coin even when a column shifts.

Selectors drift

A market site's generated class names and data-target attributes change without notice. Treat the selectors here as a starting template, not a contract. When a field comes back None, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
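To notice drift early rather than after a week of half-empty exports, you can measure how often each field comes back empty in a batch of parsed records. This is a small sketch, assuming records shaped like the dictionaries parse_row returns; missing_field_report is a hypothetical helper name.

```python
from collections import Counter

FIELDS = ("name", "symbol", "price", "change_24h", "volume_24h", "market_cap")


def missing_field_report(records, fields=FIELDS):
    """Return {field: share of records where it is None or empty}.

    Fields that are always present are left out of the report.
    """
    total = len(records)
    if total == 0:
        return {}
    missing = Counter()
    for record in records:
        for field in fields:
            if record.get(field) in (None, ""):
                missing[field] += 1
    return {field: missing[field] / total for field in fields if missing[field]}
```

A field that is missing in every record almost always means its selector broke; a field missing in a handful of rows is more likely a coin that genuinely lacks that value.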

Step 3: Walk every row on the page

With a row parser in hand, select all the rows in the table and map each one to a record. A small retry wrapper around the fetch keeps a single slow request from ending the run.

python
import time

def fetch_html(page_url, max_retries=2):
    for attempt in range(max_retries + 1):
        html = crawl(page_url)
        if html:
            return html
        if attempt < max_retries:
            print(f"Retrying ({attempt + 1}/{max_retries})...")
            time.sleep(1)
    print(f"Unable to fetch {page_url}")
    return None

def parse_market(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select(ROW_SELECTOR)
    return [parse_row(r) for r in rows if r.select_one('td[data-target="price.price"]')]

fetch_html retries a failed fetch up to twice with a short pause, returning the HTML on success and None once it gives up. parse_market selects every coin row and maps each to a record, skipping any row with no price cell so spacer or header rows do not produce empty entries. The result is a clean list of coin dictionaries from one page.

Step 4: Handle pagination across the market

One page is a slice of the full market. CoinGecko paginates with a ?page= query parameter, so you walk each page up to a ceiling and gather rows from all of them. Capping the crawl with a max_pages argument keeps a large market from running away, and a short sleep between pages paces the run so you are not hammering the site.

python
def collect_all_coins(base_url, max_pages):
    records = []
    for page in range(1, max_pages + 1):
        page_url = f"{base_url}?page={page}"
        html = fetch_html(page_url)
        if not html:
            continue
        page_coins = parse_market(html)
        if not page_coins:
            break
        records.extend(page_coins)
        print(f"Page {page}: {len(page_coins)} coins")
        time.sleep(2)
    return records

collect_all_coins requests each page in turn, parses its rows, and stops early if a page returns no coins, which happens once you pass the last populated page. The time.sleep(2) between pages spreads the requests out. Adjust max_pages to control how deep into the market ranking you go; the first page alone already covers the largest coins by market cap.
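The f-string that appends ?page= works for a bare URL, but it produces a second question mark if the base URL already carries a query string. A slightly more defensive sketch builds the page URL with urllib.parse; with_page is a hypothetical helper, and the example.com URL with its items parameter is invented purely to show the behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(base_url, page):
    """Return base_url with its page query parameter set to the given page."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a stale page value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_page("https://www.coingecko.com/", 2))
print(with_page("https://example.com/markets?items=100", 3))
```

For the bare CoinGecko URL used in this guide the result is the same as the f-string, so this only matters once you start passing URLs that already have parameters.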

Step 5: Assemble the full script

Now wire the pieces into one runnable script: collect coins across pages, then export the records to both JSON and CSV.

python
import csv
import json
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",
    "page_wait": 5000,
}

ROW_SELECTOR = 'table[data-coin-table-target="table"] > tbody > tr'

def crawl(page_url):
    response = api.get(page_url, OPTIONS)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['headers']['pc_status']}")
    return None

def fetch_html(page_url, max_retries=2):
    for attempt in range(max_retries + 1):
        html = crawl(page_url)
        if html:
            return html
        if attempt < max_retries:
            time.sleep(1)
    return None

def text_of(node, selector):
    el = node.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_row(row):
    return {
        "name": text_of(row, 'td[data-view-component="true"] a span.tw-text-gray-700'),
        "symbol": text_of(row, 'td[data-view-component="true"] a span.tw-text-gray-500'),
        "price": text_of(row, 'td[data-target="price.price"]'),
        "change_24h": text_of(row, 'td.tw-text-right span[data-target="price-change-percentage-24h"]'),
        "volume_24h": text_of(row, 'td.tw-text-right span[data-coin-table-target="totalVolume"]'),
        "market_cap": text_of(row, 'td.tw-text-right span[data-coin-table-target="marketCap"]'),
    }

def parse_market(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select(ROW_SELECTOR)
    return [parse_row(r) for r in rows if r.select_one('td[data-target="price.price"]')]

def collect_all_coins(base_url, max_pages):
    records = []
    for page in range(1, max_pages + 1):
        html = fetch_html(f"{base_url}?page={page}")
        if not html:
            continue
        page_coins = parse_market(html)
        if not page_coins:
            break
        records.extend(page_coins)
        time.sleep(2)
    return records

def save_outputs(records):
    # Explicit UTF-8 keeps non-ASCII coin names intact on every platform.
    with open("crypto_market.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    if not records:
        return
    with open("crypto_market.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

def main():
    market_url = "https://www.coingecko.com/"
    coins = collect_all_coins(market_url, max_pages=2)
    save_outputs(coins)
    print(f"Saved {len(coins)} coins")

if __name__ == "__main__":
    main()

The script collects coin records across up to two market pages and paces the loop with a two-second sleep. save_outputs writes both a JSON file and a CSV using the keys of the first record as the header, so you have the data in whichever shape your downstream tool wants. Adjust max_pages and the market URL to fit how much of the ranking you want.

What the output looks like

Run the full script with python crypto_scraper.py and you get a clean structured record per coin, ready for analysis, a database, or a spreadsheet.

json
[
  {
    "name": "Bitcoin",
    "symbol": "BTC",
    "price": "$86,650.00",
    "change_24h": "2.4%",
    "volume_24h": "$28,540,118,233",
    "market_cap": "$1,712,884,991,402"
  },
  {
    "name": "Ethereum",
    "symbol": "ETH",
    "price": "$2,015.42",
    "change_24h": "1.1%",
    "volume_24h": "$14,902,551,870",
    "market_cap": "$243,118,440,905"
  }
]

The matching CSV carries the same columns, one row per coin, which drops straight into pandas or any spreadsheet for sorting by market cap, filtering by 24-hour change, or charting volume. The values above are illustrative; live prices and percentages shift constantly, which is the whole reason you pull them on a schedule rather than once.
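The scraped values are display strings, so sorting or charting them usually means converting them to numbers first. Here is a minimal sketch, assuming values formatted like the sample output (a currency symbol, thousands separators, an optional percent sign); to_number is a hypothetical helper, and it does not handle abbreviated values such as 1.2M.

```python
import re


def to_number(text):
    """Convert a display string like "$86,650.00" or "2.4%" to a float.

    Returns None for missing or non-numeric values.
    """
    if text is None:
        return None
    # Normalize a Unicode minus sign, then drop everything except digits, dot, minus.
    cleaned = re.sub(r"[^0-9.\-]", "", text.replace("\u2212", "-"))
    try:
        return float(cleaned)
    except ValueError:
        return None


print(to_number("$86,650.00"), to_number("2.4%"), to_number("N/A"))
```

One caveat: if the page signals a negative 24-hour change with a color or an arrow icon rather than a minus sign, the sign is not in the text and this conversion cannot recover it.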

Staying unblocked at scale

Even with rendering handled, a high-traffic market site watches for scraper-shaped traffic. A few habits keep a longer run healthy, and they apply to any hard commercial target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled or challenged. The two-second sleeps above are the floor, not the ceiling; widen them for larger jobs and avoid pulling the same page on a tight cycle.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning non-200 pc_status values is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
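Backing off can be as simple as growing the pause after each failed attempt instead of sleeping a fixed second. The sketch below computes exponential delays with jitter; backoff_delays and its default base and cap are assumptions to tune for your own runs, not values recommended by any provider.

```python
import random


def backoff_delays(max_retries, base=1.0, cap=30.0, rng=random.random):
    """Return one delay (in seconds) per retry, doubling each time up to cap.

    Each delay is scaled by a random factor between 0.5 and 1.0 so that
    parallel workers do not all retry at the same instant.
    """
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(ceiling * (0.5 + 0.5 * rng()))
    return delays


# With the jitter pinned to its maximum, the schedule is 1, 2, 4, 8 seconds.
print(backoff_delays(4, rng=lambda: 1.0))
```

In a retry loop like fetch_html, you could sleep for delays[attempt] instead of a constant one second.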

For larger crawls, the async Crawler queues requests and delivers results to a webhook, which suits running many market pages without holding open connections. For the broader playbook, see how to scrape websites without getting blocked. If you want to feed this data into a monitoring workflow, the same approach carries over to price intelligence, and a sibling guide covers scraping crypto prices from CoinMarketCap if you want a second source.

Is it legal to scrape crypto market data?

Whether scraping a crypto market site is allowed depends on that site's terms of service, your jurisdiction, and what you do with the data. Sites like CoinGecko publish their market tables openly, and the numbers themselves (price, 24-hour change, volume, market cap) are factual public market data rather than personal data, which makes the field set in this guide low-risk on the privacy front. It does not exempt you from the site's terms, though: most market sites restrict heavy automated access in their Terms of Use, so read those terms and the site's robots.txt and treat both as the boundary for what and how much you collect.

A few lines worth holding to. Collect only public market data, the figures any visitor sees without an account, and keep your request volume low enough that you are not straining the site's servers. Do not scrape anything behind a login, a paywall, or an account, and do not collect or build profiles around personal data, which is out of scope here and would bring GDPR and CCPA obligations into play. Respect copyright on any editorial content a market site publishes alongside its tables: the price numbers are facts, but a written market analysis is not yours to republish.
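The robots.txt check mentioned above can be done in code with the standard library's urllib.robotparser. This sketch parses a robots.txt body you already have as text; the SAMPLE rules, the my-research-bot user agent, and the example.com URLs are invented for illustration and say nothing about what any real site allows.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, not copied from any real site.
SAMPLE = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE, "my-research-bot", "https://example.com/markets"))
print(is_allowed(SAMPLE, "my-research-bot", "https://example.com/private/x"))
```

For a live site you would point the parser at the real file with set_url() and read() instead of passing text. A robots.txt pass is not permission under the Terms of Use; it is only one of the two boundaries.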

For production use, the cleaner path is the official API. Most crypto market sites, CoinGecko included, offer a public API that returns the same price, volume, and market-cap fields as structured JSON under clear rate limits and terms. An API is more stable than HTML selectors, it does not break when the page layout changes, and it keeps you inside the provider's permitted-use rules. Scrape the rendered page for a quick one-off or a field the API does not expose; reach for the official API or a licensed data feed when you are building something durable or commercial.
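As a rough illustration of the API route, the sketch below builds a request for CoinGecko's public markets endpoint and maps each item onto the same record shape the scraper produces. The endpoint path, query parameters, and response field names reflect my understanding of the v3 API and should be treated as assumptions: check the current API documentation, including whether your plan requires an API key, before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the provider's current API docs.
API_URL = "https://api.coingecko.com/api/v3/coins/markets"


def markets_url(vs_currency="usd", per_page=100, page=1):
    """Build the request URL for one page of market data."""
    params = {
        "vs_currency": vs_currency,
        "order": "market_cap_desc",
        "per_page": per_page,
        "page": page,
    }
    return f"{API_URL}?{urlencode(params)}"


def to_record(item):
    """Map one API item (assumed field names) to this guide's record shape."""
    symbol = item.get("symbol")
    return {
        "name": item.get("name"),
        "symbol": symbol.upper() if symbol else None,
        "price": item.get("current_price"),
        "change_24h": item.get("price_change_percentage_24h"),
        "volume_24h": item.get("total_volume"),
        "market_cap": item.get("market_cap"),
    }


def fetch_markets(page=1):
    """Fetch one page from the API and return normalized records."""
    with urlopen(markets_url(page=page), timeout=30) as resp:
        return [to_record(item) for item in json.load(resp)]
```

Unlike the scraped strings, these values arrive as numbers, so no currency or percent parsing is needed downstream.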

Recap

Key takeaways

  • Market tables are client-side rendered. A plain request returns a thin shell with an empty table, so you must render the page before you parse it.
  • You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
  • Parse row by row. Read each tr as one coin and map its cells to name, symbol, price, 24-hour change, 24-hour volume, and market cap, so every field stays aligned to the right coin.
  • Paginate and export. Walk the ?page= query up to a ceiling, pace the run with short sleeps, and write the records to JSON and CSV.
  • Prefer the official API for production. Stay on public market data, respect the site's ToS and robots.txt, and use the provider's public API or a licensed feed for anything durable or commercial.

Frequently Asked Questions (FAQs)

Why does a plain request return an empty crypto table?

Because the market page loads its rows client-side with JavaScript. The initial HTML is a shell that fills in only after the scripts run in a browser, so a raw HTTP request returns status 200 with the table empty. To get the rows you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token here?

The JS token. The normal token fetches static HTML, which on a crypto market page is the same empty-table shell a plain fetch returns. The JS token renders the page in a real browser first, so the coin rows are present when BeautifulSoup parses them.

What fields can I extract from a market page?

The public per-coin figures: the name and ticker symbol, the current price, the 24-hour change percentage, the 24-hour trading volume, and the market capitalization. Stay on data that is visible to any visitor without an account, and treat written market analysis or editorial content as copyrighted, not as something to republish.

My selectors return None. What changed?

Almost certainly the site's markup. Generated class names and data-target attributes (the data-coin-table-target hooks, the right-aligned cell classes) change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Should I scrape the page or use the official API?

For anything durable or commercial, prefer the official API. Most crypto market sites, including CoinGecko, expose a public API that returns price, volume, and market-cap data as structured JSON under clear terms and rate limits, which is more stable than parsing HTML. Scraping the rendered page is best for a quick one-off or for a field the API does not surface.

How often should I collect crypto market data?

It depends on your use case. For a near-real-time dashboard you might pull every few minutes; for trend research, hourly or daily snapshots are usually enough. Whatever the cadence, pace your requests and respect the site's rate limits so a frequent schedule does not turn into scraper-shaped traffic the site blocks.
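Whatever cadence you pick, snapshots are only comparable if each record says when it was collected. A small sketch, assuming records shaped like the ones in this guide: stamp is a hypothetical helper and collected_at is a field name chosen for this example.

```python
from datetime import datetime, timezone


def stamp(records, now=None):
    """Return copies of the records with a UTC collected_at timestamp added."""
    now = now or datetime.now(timezone.utc)
    collected_at = now.isoformat(timespec="seconds")
    return [{**record, "collected_at": collected_at} for record in records]


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(stamp([{"symbol": "BTC", "price": "$86,650.00"}], now=fixed))
```

Stamping the whole batch with one timestamp keeps every coin in a snapshot aligned to the same moment, which makes later grouping by run straightforward.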
