Financial decisions run on data. Whether you are tracking a stock you hold, building a market dashboard, or feeding a research model, the public numbers that move the market (quotes, index levels, reported company financials) are the raw material. Most of that data sits in plain sight on finance sites, but pulling it by hand across dozens of tickers is slow, and the live pages that show it are built to render in a browser, not to hand a clean answer to a bare script.

This guide shows you how to scrape financial data with Python the reliable way. It covers which public sources matter, then walks through a small, runnable scraper that fetches a rendered finance quote page through the Crawling API, parses the headline fields (price, change, volume, and a few neighbours), and exports clean JSON and CSV. Everything here stays scoped to public market data: factual numbers anyone can see without a login, never personal or paywalled content. The legality section near the end is not boilerplate, so read it before you point this at real volume.

What public financial data is worth collecting

Before writing code, it helps to know what data is both useful and fair to collect. Public financial data falls into a few broad buckets, each driving a different kind of analysis.

  • Market quotes. The live price, intraday change, percent change, day range, and trading volume for a stock, ETF, or fund. This is the most commonly tracked data and the focus of the worked example below.
  • Indices. Aggregate levels for benchmarks such as broad-market or sector indices, useful for measuring a position against the wider market.
  • Public company financials. Figures that public companies are required to disclose: revenue, earnings, market capitalisation, and the summary ratios shown on a company profile page.
  • Public market news and headlines. Editorial headlines and timestamps that feed sentiment analysis. Collect the factual headline and link, not the full copyrighted article body.

General business sources like Forbes, Reuters, Bloomberg, MarketWatch, and the Financial Times are good reading, but for a programmatic feed you want a page that exposes structured quote fields, which is what we target below. For a wider view of the landscape, our roundup of the best financial data providers compares the major sources.

What you will build

A Python script that takes a public finance quote URL for a ticker, fetches the rendered page through the Crawling API, and extracts a structured record. The fields below are representative public market fields; the exact CSS selectors depend on the source page you target, so treat them as a template to adapt.

  • Symbol: the ticker the quote belongs to.
  • Price: the current quoted price.
  • Change: the absolute price change on the day.
  • Change percent: the day's move expressed as a percentage.
  • Volume: the number of shares traded.
  • Day range: the intraday low and high.
  • Market cap: the company's market capitalisation, where the page shows it.

Why a plain request fails on finance pages

Request a modern finance quote URL with a bare HTTP client and you usually get status 200 with almost none of the numbers in the body. Two things work against you. First, most finance sites render quotes client-side: the price, change, and volume are streamed in by JavaScript after the initial HTML loads, so the first response is a thin shell, and parsing it captures page chrome instead of the figures you came for. Second, finance sites watch for automated traffic closely, because live quote data is valuable. Datacenter IPs and non-browser request patterns get rate-limited, IP-blocked, or challenged before they reach the rendered content.

So a working financial scraper needs two things in one request: a browser that actually renders the page, and an IP the site reads as a real visitor. You can build that yourself with a headless browser plus rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. For more on why client-rendered pages need this, see scraping JavaScript pages with Python.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Finance pages fill their quote fields client-side, so you need the JS token here. The normal token returns the same thin shell a plain fetch would, with little useful to parse out of it.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the parsing side, the BeautifulSoup guide is a good companion to this tutorial.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda, and make sure Python is on your PATH.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase includes 1,000 free requests to start, which is plenty for working through this guide, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
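One way to keep the token out of your source files is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN, a name chosen here for illustration; it is not something the Crawlbase client looks up on its own.

```python
import os

# CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this sketch.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"

def load_token(env=None):
    """Return the token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set the {TOKEN_ENV_VAR} environment variable first")
    return token
```

With something like this in place, the placeholder string in the scripts that follow could be swapped for load_token(), so the real token never lands in a commit.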

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.

bash
python --version

python -m venv finance_env
source finance_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with finance_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. Both json and csv ship with the standard library, so there is nothing more to install for the export step.

Step 1: Fetch a rendered quote page

Start by getting a finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a public quote URL. Finance pages stream their numbers in asynchronously, so pass ajax_wait and page_wait to hold for the dynamic content before the page is captured. Checking the Crawlbase pc_status before you parse keeps failures loud instead of silent.

python
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",
    "page_wait": 5000,
}

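def crawl(page_url):
    """Fetch a rendered page; return its HTML, or None on failure."""
    response = api.get(page_url, OPTIONS)
    # pc_status is Crawlbase's own status for the request. Using .get avoids
    # a KeyError if the header is missing from a failed response.
    pc_status = response["headers"].get("pc_status")
    if pc_status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {pc_status}")
    return None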

if __name__ == "__main__":
    quote_url = "https://www.example-finance.com/quote/AAPL"
    html = crawl(quote_url)
    print(html[:500] if html else "No HTML returned")

Swap quote_url for the public quote page you intend to track. The two wait options matter for a client-rendered finance page: ajax_wait waits for asynchronous content to finish loading, and page_wait holds a fixed number of milliseconds after load so late-streaming numbers appear before capture. Five seconds is a reasonable start; raise it if figures come back missing. Run the script and you should see real quote markup, not the shell a plain request returns, which confirms rendering works before you write a single selector.
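Eyeballing the first 500 characters works once, but a small programmatic check is easier to reuse. The sketch below tests whether the returned HTML contains a marker string you expect only on a rendered page. The default marker is an assumption for illustration; substitute any attribute or text unique to the rendered quote on your target page.

```python
# The default marker is illustrative; pick one that exists on your target page.
DEFAULT_MARKERS = ('data-field="regularMarketPrice"',)

def looks_rendered(html, markers=DEFAULT_MARKERS):
    """Return True when every marker string appears in the HTML."""
    if not html:
        return False
    return all(marker in html for marker in markers)
```

If this returns False for a page that fetched with a 200 pc_status, the likely causes are a wait that is too short or a marker that does not match the source's markup.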

Crawlbase Crawling API

A finance quote page needs two things in one call: a rendered page and a trusted IP. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. The ajax_wait and page_wait options above control how long it waits before capturing the page. Point it at a public quote page on the free tier first.

Step 2: Parse the quote fields

With finished HTML in hand, load it into BeautifulSoup and pull the headline numbers. Many finance pages mark their key fields with data-* attributes or field names, which tend to be more durable than generated class names. Each lookup is guarded so a missing field returns None instead of crashing the run.

python
from bs4 import BeautifulSoup

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_quote(html, symbol, url):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "symbol": symbol,
        "price": text_of(soup, '[data-field="regularMarketPrice"]'),
        "change": text_of(soup, '[data-field="regularMarketChange"]'),
        "change_percent": text_of(soup, '[data-field="regularMarketChangePercent"]'),
        "volume": text_of(soup, '[data-field="regularMarketVolume"]'),
        "day_range": text_of(soup, '[data-field="regularMarketDayRange"]'),
        "market_cap": text_of(soup, '[data-field="marketCap"]'),
        "link": url,
    }

The text_of helper queries one element and returns its stripped text, or None when the element is absent, so a quote page that omits a field does not break the loop. The data-field values here (regularMarketPrice, regularMarketChange, regularMarketVolume, and friends) are typical of how finance pages tag their live numbers. Open your target page in dev tools, find the attribute or class wrapping each figure, and substitute the real selectors; keeping the lookups in one dictionary makes that a one-line edit per field.
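Because text_of returns strings, values such as "48,210,300" or "3.42T" will sort and compare as text unless you convert them. The sketch below is one possible normaliser, assuming the K/M/B/T suffixes and the "low - high" range format used in this guide's illustrative output; real pages may format numbers differently (for example with a Unicode minus sign), so treat it as a starting point.

```python
import re

# Multipliers for magnitude suffixes; the set of suffixes is an assumption.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
NUMBER_RE = re.compile(r"([+-]?\d+(?:\.\d+)?)([KMBT]?)", re.IGNORECASE)

def to_number(text):
    """Convert '48,210,300', '+0.83%', or '3.42T' to a float, else None."""
    if text is None:
        return None
    cleaned = text.strip().replace(",", "").rstrip("%")
    match = NUMBER_RE.fullmatch(cleaned)
    if not match:
        return None
    value, suffix = match.groups()
    return float(value) * SUFFIXES.get(suffix.upper(), 1)

def to_range(text):
    """Convert '223.10 - 226.05' to a (low, high) tuple, else None."""
    if not text:
        return None
    parts = [to_number(part) for part in text.split(" - ")]
    if len(parts) != 2 or None in parts:
        return None
    return tuple(parts)
```

Keeping the raw strings in the record and converting in a separate step means a formatting change on the source page shows up as a None you can investigate, not as a crash in the parser.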

Selectors drift

Finance sites change their markup without notice, and the exact data-field names vary from one source to another. Treat the selectors here as a starting template, not a contract. When a field comes back None, re-inspect the live page and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
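A light check on each parsed record can flag drift early. The sketch below is a hypothetical helper, not part of any library: it lists which parsed fields came back None and distinguishes "everything is missing" (the page probably did not render) from "some fields are missing" (a selector has probably changed). It assumes the record shape produced by parse_quote, where symbol and link are supplied by the caller.

```python
# Fields supplied by the caller, not parsed from the HTML.
CALLER_FIELDS = ("symbol", "link")

def missing_fields(record):
    """Return the parsed field names whose value is None."""
    return [
        key for key, value in record.items()
        if value is None and key not in CALLER_FIELDS
    ]

def diagnose(record):
    """Summarise a record as ok, not rendered, or likely selector drift."""
    parsed = [key for key in record if key not in CALLER_FIELDS]
    missing = missing_fields(record)
    if not missing:
        return "ok"
    if len(missing) == len(parsed):
        return "all fields missing: the page may not have rendered"
    return "selector drift suspected: " + ", ".join(missing)
```

Logging the result of diagnose for each ticker gives you a quick list of which selectors to re-inspect after a run.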

Step 3: Assemble the full script

Now wire the pieces into one runnable script: loop over a list of tickers, fetch and parse each quote with a small retry wrapper, and export the records to both JSON and CSV.

python
import csv
import json
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",
    "page_wait": 5000,
}

# Map each ticker to its public quote page URL.
QUOTE_BASE = "https://www.example-finance.com/quote/"
SYMBOLS = ["AAPL", "MSFT", "GOOGL"]

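def crawl(page_url):
    """Fetch a rendered page; return its HTML, or None on failure."""
    response = api.get(page_url, OPTIONS)
    # pc_status is Crawlbase's own status for the request. Using .get avoids
    # a KeyError if the header is missing from a failed response.
    pc_status = response["headers"].get("pc_status")
    if pc_status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {pc_status}")
    return None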

def fetch_html(page_url, max_retries=2):
    for attempt in range(max_retries + 1):
        html = crawl(page_url)
        if html:
            return html
        if attempt < max_retries:
            time.sleep(1)
    return None

def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_quote(html, symbol, url):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "symbol": symbol,
        "price": text_of(soup, '[data-field="regularMarketPrice"]'),
        "change": text_of(soup, '[data-field="regularMarketChange"]'),
        "change_percent": text_of(soup, '[data-field="regularMarketChangePercent"]'),
        "volume": text_of(soup, '[data-field="regularMarketVolume"]'),
        "day_range": text_of(soup, '[data-field="regularMarketDayRange"]'),
        "market_cap": text_of(soup, '[data-field="marketCap"]'),
        "link": url,
    }

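def save_outputs(records):
    """Write the records to quotes.json and, if any exist, quotes.csv."""
    with open("quotes.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    if not records:
        return
    # newline="" stops the csv module from writing blank rows on Windows.
    with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)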

def main():
    records = []
    for symbol in SYMBOLS:
        url = QUOTE_BASE + symbol
        html = fetch_html(url)
        if html:
            records.append(parse_quote(html, symbol, url))
        time.sleep(2)

    save_outputs(records)
    print(f"Saved {len(records)} quotes")

if __name__ == "__main__":
    main()

The script loops over SYMBOLS, builds each quote URL, fetches it with the retry wrapper, parses it into a record, and paces the loop with a two-second sleep. save_outputs writes both JSON and CSV using the first record's keys as the header, so you have the data in whichever shape your downstream tool wants. Add tickers to SYMBOLS and adjust QUOTE_BASE to fit the public source you target.

What the output looks like

Run the full script and you get a clean structured record per ticker, ready for analysis, a database, or a spreadsheet. The values below are illustrative, not live figures.

json
[
  {
    "symbol": "AAPL",
    "price": "225.40",
    "change": "+1.85",
    "change_percent": "+0.83%",
    "volume": "48,210,300",
    "day_range": "223.10 - 226.05",
    "market_cap": "3.42T",
    "link": "https://www.example-finance.com/quote/AAPL"
  },
  {
    "symbol": "MSFT",
    "price": "418.20",
    "change": "-2.40",
    "change_percent": "-0.57%",
    "volume": "19,840,100",
    "day_range": "416.00 - 421.30",
    "market_cap": "3.11T",
    "link": "https://www.example-finance.com/quote/MSFT"
  }
]

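The matching CSV carries the same columns, one row per ticker, which drops straight into pandas or any spreadsheet for sorting by change, filtering by volume, or charting price over repeated runs. Note that the values are strings, so convert them to numbers before sorting or filtering. The script as written overwrites its output files on every run, so to build a history, stamp each record with its fetch time and append to the file instead. Schedule that with cron and you have a simple time series of public quotes for the tickers you follow.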
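For repeated runs, one option is to append timestamped rows to a history file. The sketch below is a minimal version of that idea; the function name, the fetched_at column, and the quotes_history.csv filename are all assumptions introduced here, and it expects the same column set on every run.

```python
import csv
import os
from datetime import datetime, timezone

def append_snapshot(records, path="quotes_history.csv", now=None):
    """Append timestamped rows; write the header only for a new file."""
    if not records:
        return 0
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    # Put the timestamp first so each row reads as "when, then what".
    rows = [{"fetched_at": stamp, **record} for record in records]
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        if new_file:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling append_snapshot(records) at the end of main() would add one row per ticker per run, which is the shape most charting and analysis tools expect for a time series.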

Scaling and staying unblocked

Tracking hundreds of symbols, or polling the same set on a tight schedule, is where finance scraping gets harder, because that is exactly the traffic these sites watch for. A few habits keep a longer run healthy.

  • Pace your requests. Polling a watchlist in a tight loop is the fastest way to get throttled or challenged. The two-second sleeps above are a floor, not a ceiling; widen them for larger jobs and stagger your runs rather than firing every ticker at once.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this server-side; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning non-200 pc_status values is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
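The pacing and back-off habits above can be sketched as a retry helper that waits longer after each failure and adds random jitter so retries do not land in lockstep. This is an illustrative pattern, not a Crawlbase feature: fetch is any function that returns HTML or None (such as the crawl function in the full script), and the base and cap values are arbitrary starting points.

```python
import random
import time

def backoff_delays(max_retries=4, base=2.0, cap=60.0):
    """Yield growing delays with jitter, each at most `cap` seconds."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        # Pick a random wait in the upper half of the current window.
        yield random.uniform(ceiling / 2, ceiling)

def fetch_with_backoff(fetch, url, max_retries=4, sleep=time.sleep):
    """Call fetch(url) until it returns a truthy value or retries run out."""
    result = fetch(url)
    for delay in backoff_delays(max_retries):
        if result:
            break
        sleep(delay)
        result = fetch(url)
    return result or None
```

The sleep parameter exists so the helper can be tested without real waiting. In a real run you would leave it at the default and call fetch_with_backoff(crawl, url).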

For larger jobs, the async Crawler queues requests and delivers results to a webhook, which suits polling many symbols without holding open connections. For the broader playbook, see large-scale finance scraping and our general guide to scraping without getting blocked. If your end goal is competitive or pricing analysis, our guide to web scraping for price intelligence shows how teams turn this kind of feed into decisions.

Legality and responsible use

Whether scraping financial data is allowed depends on the source's terms of service, your jurisdiction, and what you do with the data. Market quotes, index levels, and the financials a public company must disclose are factual, public information, and collecting public facts for analysis is broadly defensible. But the page that serves them belongs to someone, and most finance sites restrict automated access and bulk collection in their terms. None of the code here changes that; it only makes the technical part work. Read the target site's Terms of Service and its robots.txt, respect any stated rate limits, and treat both as the boundary for what you collect.
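Checking robots.txt can be automated with Python's standard library. The sketch below uses urllib.robotparser to test a URL against robots.txt text you have already downloaded; the helper name is introduced here for illustration, and passing this check says nothing about the site's Terms of Service, which you still need to read yourself.

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Running this once per source before a job, and skipping any URL it rejects, keeps the scraper inside the boundary the site has published.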

A few lines worth holding to. Collect only public, non-paywalled data: the quote fields, index levels, and disclosed financials anyone can see without an account. Do not scrape anything behind a login or paywall, and never bypass authentication or metering to reach premium data, which crosses from public collection into unauthorized access. Keep your volume low enough that you are not straining the source's servers. And remember that even factual data carries licensing terms when you redistribute it: many sites source quotes from exchanges and data vendors under agreements that restrict republishing, so collecting a number for your own analysis is not the same as having the right to resell or publish it. Editorial content like full article text is copyrighted; take the headline and link for sentiment work, not the body.

This guide is scoped to public market data because that is the line that keeps the work defensible. For anything beyond personal research, and especially for production or commercial products, use an official market-data API or a licensed feed rather than a cleverer scraper. Most major finance platforms offer their own data API or partner with established market-data vendors, and exchanges license real-time and historical data directly. Those routes give you a stable contract, documented fields, and redistribution rights that scraping a public page never grants. Use scraping to prototype and fill gaps; license a feed when the data drives something real.

Recap

Key takeaways

  • Know what is fair to collect. Public market data (quotes, indices, disclosed company financials, headlines) is factual and broadly defensible to collect within each source's terms; personal and paywalled data is off limits.
  • Finance pages are client-rendered. A plain request returns a thin shell with the numbers missing, so you must render the page before you parse it.
  • You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
  • Parse, loop, and export. Map each field to a guarded selector, loop over your tickers with a retry wrapper, pace the run with short sleeps, and write JSON and CSV.
  • License for production. Respect each source's ToS and robots.txt, note that quote data carries redistribution terms, and move to an official market-data API or licensed feed for anything commercial.

Frequently Asked Questions (FAQs)

Why does a plain request return a finance page with no numbers?

Because most finance sites load their quote fields client-side with JavaScript. The initial HTML is a shell that fills in only after the page's scripts run, so a raw HTTP request returns status 200 with the price, change, and volume missing. To get the figures you have to render the page first, which the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for finance pages?

The JS token. The normal token fetches static HTML, which on a client-rendered finance page is the same thin shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the quote fields are present when BeautifulSoup parses them.

What financial data can I scrape safely?

Public, non-paywalled market data: live quotes, index levels, the financials a public company is required to disclose, and factual headlines with their links. Stay on data visible to any visitor without an account, do not collect anything behind a login or paywall, and avoid republishing copyrighted article text or licensed data feeds.

My selectors return None. What changed?

Almost certainly the source's markup. Finance sites change their generated class names and data-field attributes without notice, and those attribute names differ from one source to another, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Should I scrape or use an official market-data API for production?

For anything commercial or production-grade, use an official API or a licensed feed. Scraping a public page is excellent for prototyping and for filling gaps, but official APIs give you documented fields, stable contracts, and redistribution rights that a public page never grants. Most major finance platforms offer a data API or partner with established market-data vendors.

How do I scrape many tickers without getting blocked?

Pace your requests with sleeps, stagger runs rather than polling every symbol at once, and rely on rotating residential IPs so no single address trips a rate limit. The Crawling API rotates IPs server-side, and the async Crawler queues large jobs and posts results to a webhook. Watch the pc_status codes and back off when they stop returning 200.
