Yahoo Finance is one of the most-visited platforms for tracking stocks, indices, and crypto, and its quote pages hold exactly the structured market data that drives price tracking, screening, and research: the last price, the day's change, previous close, market cap, volume, and the day range. For anyone watching a watchlist of tickers, that public quote data is the raw material, and reading it by hand across dozens of symbols is slow and easy to get wrong.
This guide shows you how to scrape Yahoo Finance with Python the reliable way. You build a small, runnable scraper that fetches rendered quote pages through the Crawling API, parses the fields you want with BeautifulSoup, loops over a list of tickers, and exports clean JSON and CSV. The whole walkthrough stays scoped to public market data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a list of ticker symbols, fetches each rendered Yahoo Finance quote page through the Crawling API, and extracts a structured record per symbol. The running example uses AAPL, TSLA, and BTC-USD. We pull these fields:
- Price: the most recent trading price for the symbol.
- Change: the change in price from the previous close, as both an absolute value and a percentage.
- Previous close: the prior session's closing price.
- Market cap: the total market capitalization shown in the quote statistics.
- Volume: the number of shares or units traded in the current session.
- Day range: the low-to-high price band for the current trading day.
Why a plain request fails on Yahoo Finance
If you request a Yahoo Finance quote URL with a bare HTTP client, you get a response with status 200 and only a fraction of the numbers in the body. Two things work against you. First, the quote page renders its live price, change, and statistics table in the browser through JavaScript, so the initial HTML is a thin shell that fills in only after the page's scripts run. Parse that first response and you capture placeholders or empty nodes instead of the real figures. Second, Yahoo flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get rate-limited, redirected to a consent wall, or challenged before they ever reach the rendered content.
So a working Yahoo Finance scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse. If you want the background on rendering targets like this, the guide to scraping JavaScript pages with Python is a good companion.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Yahoo Finance fills its price and statistics fields client-side, so you need the JS token here. The normal token returns the same thin shell a plain fetch would, and there is little useful to parse out of it.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the parsing side, the BeautifulSoup guide is a good companion to this tutorial.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda, and make sure Python is on your PATH.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase includes 1,000 free requests to start, which is plenty for working through this guide. Treat the token like a password: it authenticates your requests, so keep it out of version control.
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.
python --version
python -m venv yahoo_env
source yahoo_env/bin/activate
pip install crawlbase beautifulsoup4
On Windows, activate the environment with yahoo_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector. Both json and csv ship with the standard library, so there is nothing more to install for the export step.
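The prerequisites ask you to keep the token out of version control. A minimal way to do that, assuming you export the token as an environment variable, is to read it at startup. The variable name CRAWLBASE_JS_TOKEN and the load_token helper below are hypothetical choices for this sketch and are not part of the Crawlbase client.

```python
import os

# Hypothetical variable name; use whatever name your shell or CI defines.
TOKEN_ENV_VAR = "CRAWLBASE_JS_TOKEN"


def load_token(env_var=TOKEN_ENV_VAR):
    """Read the Crawlbase JS token from the environment instead of the source file."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message rather than sending unauthenticated requests.
        raise RuntimeError(
            f"Set {env_var} to your Crawlbase JS token before running the scraper."
        )
    return token
```

You would then initialize the client with CrawlingAPI({"token": load_token()}) wherever the examples in this guide use the YOUR_CRAWLBASE_TOKEN placeholder. Set the variable in your shell before running, for example with export CRAWLBASE_JS_TOKEN=your_token_here.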
Step 1: Fetch a rendered quote page
Start by getting a finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a Yahoo Finance quote URL. Yahoo loads its price and statistics asynchronously, so pass ajax_wait and page_wait to hold for the dynamic content before the page is captured. Checking the Crawlbase pc_status before you parse keeps failures loud instead of silent.
from crawlbase import CrawlingAPI

# Replace the placeholder with your Crawlbase JavaScript (JS) token.
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",  # wait for asynchronous content to finish loading
    "page_wait": 5000,    # then hold 5,000 ms before capturing the page
}


def crawl(page_url):
    response = api.get(page_url, OPTIONS)
    # Crawlbase reports the outcome of the crawl in the pc_status header.
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {status}")
    return None


if __name__ == "__main__":
    quote_url = "https://finance.yahoo.com/quote/AAPL"
    html = crawl(quote_url)
    print(html[:500] if html else "No HTML returned")
The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so the live price settles before the page is captured. Five seconds (5000 ms) is a reasonable start; raise it if the figures come back empty. Save the script as yahoo_scraper.py and run it with python yahoo_scraper.py. You should see real quote markup, not the shell a plain request returns, which confirms rendering works before you write a single selector.
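If the figures do come back empty, one option is to retry with a longer page_wait instead of raising it for every request. The sketch below assumes you supply two callables. fetch(url, options) wraps api.get and returns HTML or None, and is_complete(html) decides whether the live numbers are present. The function name fetch_until_rendered and the wait schedule are illustrative assumptions, not Crawlbase defaults.

```python
def fetch_until_rendered(fetch, url, is_complete, waits=(5000, 8000, 12000)):
    """Retry a rendered fetch with a longer page_wait until the page looks complete.

    fetch(url, options) must return HTML or None; is_complete(html) decides
    whether the live figures made it into the page. Both are supplied by you.
    Returns (html, page_wait_used), or (None, None) if every attempt falls short.
    """
    for wait in waits:
        # Only the wait-related options change between attempts.
        options = {"ajax_wait": "true", "page_wait": wait}
        html = fetch(url, options)
        if html and is_complete(html):
            return html, wait
    return None, None
```

In practice, fetch could merge these options into the OPTIONS dict from Step 1 (api.get(url, {**OPTIONS, **options})) and apply the same pc_status check. is_complete could test that the Step 2 parser found a price. Keep in mind that every attempt is a separate API call.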
A Yahoo Finance quote needs a rendered page behind a trusted IP, in one call. The JS token handles the rendering, ajax_wait and page_wait control how long it waits for the live data, and the Crawling API rotates through residential IPs server-side before handing you finished HTML. You skip running a headless fleet and a proxy pool yourself. Point it at a public quote page on the free tier first.
Step 2: Parse the headline price and change
Yahoo Finance exposes its live numbers through stable data-testid attributes on the quote header, which makes them reliable parsing targets. Load the rendered HTML into BeautifulSoup and read the title, the price, and the change off those attributes. Each lookup is guarded so a missing field returns None instead of crashing the run.
from bs4 import BeautifulSoup


def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def scrape_header(html):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": text_of(soup, "div.hdr h1"),
        "price": text_of(soup, '.livePrice[data-testid="qsp-price"]'),
        "change": text_of(soup, '.priceChange[data-testid="qsp-price-change"]'),
        "change_percent": text_of(soup, '[data-testid="qsp-price-change-percent"]'),
    }
The text_of helper queries one element and returns its stripped text, or None when the element is absent, so a symbol that omits a field does not break the loop. The selectors come straight from Yahoo's quote header: title reads the company name and ticker from the div.hdr h1 heading, price reads the live price node tagged qsp-price, and the two change selectors read the absolute move (qsp-price-change) and the percentage move (qsp-price-change-percent) that sit next to it.
Step 3: Parse the statistics table
Below the header, Yahoo Finance renders a small statistics block with previous close, market cap, volume, day range, and more. Each metric sits in a list item tagged with a data-field attribute, so you read the value node by field name rather than by brittle position. That keeps the parse stable even when Yahoo reorders the grid.
STAT_FIELDS = {
    "previous_close": "regularMarketPreviousClose",
    "market_cap": "marketCap",
    "volume": "regularMarketVolume",
    "day_range": "regularMarketDayRange",
}


def scrape_stats(soup):
    stats = {}
    for key, field in STAT_FIELDS.items():
        stats[key] = text_of(
            soup,
            f'li[data-field="{field}"] span.value, li[data-field="{field}"] fin-streamer',
        )
    return stats
The STAT_FIELDS map keys each output name to Yahoo's internal field name. Yahoo wraps live values in a fin-streamer element and static values in a span.value, so the selector tries both and takes whichever is present. To find the exact field name for a metric, open a quote page in your browser, right-click the value, and read the data-field attribute on its list item. Day range comes back as a single string like 168.49 - 171.05, which you can split on the dash later if you want separate low and high numbers.
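The split itself is simple, but a small guard helps when a field is missing or shows a placeholder. This sketch uses a hypothetical split_day_range helper. It assumes the value looks like the 168.49 - 171.05 example above and strips thousands separators before converting.

```python
def split_day_range(day_range):
    """Split a range string such as '168.49 - 171.05' into (low, high) floats.

    Returns (None, None) if the value is missing or does not have exactly
    two dash-separated numeric parts (a '--' placeholder, for example).
    """
    if not day_range:
        return None, None
    parts = [p.strip().replace(",", "") for p in day_range.split("-")]
    if len(parts) != 2:
        return None, None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None, None
```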
Yahoo Finance reworks its quote markup periodically, and the generated class names change with it. The data-testid and data-field attributes used here are more stable than class names, but treat every selector as a starting template, not a contract. When a field comes back None, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
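One way to make that maintenance cheaper is to keep a short list of fallback selectors per field and try them in order. A single comma-joined selector, as in Step 3, returns whichever match comes first in the document, not the one you list first. The hypothetical first_text helper below enforces your priority order instead. The looser fallback in PRICE_SELECTORS is an illustrative guess, not a selector confirmed against Yahoo's current markup.

```python
from bs4 import BeautifulSoup


def first_text(soup, selectors):
    """Return stripped text from the first selector in the list that matches.

    Selectors are tried in the order given; a match with empty text is
    skipped so an unfilled placeholder node does not win.
    """
    for selector in selectors:
        el = soup.select_one(selector)
        if el is not None:
            text = el.get_text(strip=True)
            if text:
                return text
    return None


# Hypothetical fallbacks for the price: the Step 2 selector first,
# then a looser attribute-only guess to try after a markup change.
PRICE_SELECTORS = [
    '.livePrice[data-testid="qsp-price"]',
    '[data-testid="qsp-price"]',
]
```

When every selector in a list misses, the field still comes back None, which is your cue to re-inspect the page.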
Step 4: Assemble the full script
Now wire the pieces into one runnable script: loop over a list of tickers, fetch each rendered quote page with a small retry wrapper, parse the header and statistics into a single record, and export the records to both JSON and CSV.
import csv
import json
import time

from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

OPTIONS = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
    "ajax_wait": "true",
    "page_wait": 5000,
}

# Output column name -> Yahoo data-field name.
STAT_FIELDS = {
    "previous_close": "regularMarketPreviousClose",
    "market_cap": "marketCap",
    "volume": "regularMarketVolume",
    "day_range": "regularMarketDayRange",
}


def crawl(page_url):
    response = api.get(page_url, OPTIONS)
    status = response["headers"].get("pc_status")
    if status == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed: {status}")
    return None


def fetch_html(page_url, max_retries=2):
    # One initial attempt plus up to max_retries retries, one second apart.
    for attempt in range(max_retries + 1):
        html = crawl(page_url)
        if html:
            return html
        if attempt < max_retries:
            time.sleep(1)
    return None


def text_of(soup, selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def scrape_quote(html, symbol):
    soup = BeautifulSoup(html, "html.parser")
    record = {
        "symbol": symbol,
        "title": text_of(soup, "div.hdr h1"),
        "price": text_of(soup, '.livePrice[data-testid="qsp-price"]'),
        "change": text_of(soup, '.priceChange[data-testid="qsp-price-change"]'),
        "change_percent": text_of(soup, '[data-testid="qsp-price-change-percent"]'),
    }
    for key, field in STAT_FIELDS.items():
        record[key] = text_of(
            soup,
            f'li[data-field="{field}"] span.value, li[data-field="{field}"] fin-streamer',
        )
    return record


def save_outputs(records):
    with open("yahoo_quotes.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    if not records:
        return  # no rows, so there is no CSV header to write
    with open("yahoo_quotes.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)


def main():
    symbols = ["AAPL", "TSLA", "BTC-USD"]
    records = []
    for symbol in symbols:
        url = f"https://finance.yahoo.com/quote/{symbol}"
        html = fetch_html(url)
        if html:
            records.append(scrape_quote(html, symbol))
        time.sleep(2)  # pace requests between tickers
    save_outputs(records)
    print(f"Saved {len(records)} quotes")


if __name__ == "__main__":
    main()
The script loops over the ticker list, fetches each quote page with the retry wrapper, merges the header fields and the statistics into one record, and paces the loop with a two-second sleep. save_outputs writes both a JSON file and a CSV using the keys of the first record as the header, so you have the data in whichever shape your downstream tool wants. Add or remove symbols in the symbols list to fit your own watchlist.
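For a watchlist that changes often, editing the list in code gets tedious. A minimal alternative reads and normalizes the symbols from a file before the loop. It assumes a plain text file with one ticker per line. The watchlist.txt name and the load_symbols helper are hypothetical.

```python
from pathlib import Path


def load_symbols(path):
    """Read one ticker per line, skipping blank lines and '#' comments.

    Symbols are upper-cased and de-duplicated while keeping file order.
    """
    symbols = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        symbol = line.split("#", 1)[0].strip().upper()
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols
```

Inside main you would then replace the hard-coded list with symbols = load_symbols("watchlist.txt"). Upper-casing and de-duplicating up front avoids fetching the same quote page twice.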
What the output looks like
Run the full script with python yahoo_scraper.py and you get a clean structured record per symbol, ready for analysis, a database, or a spreadsheet. The values below are illustrative; live figures move every session.
[
  {
    "symbol": "AAPL",
    "title": "Apple Inc. (AAPL)",
    "price": "168.99",
    "change": "-3.70",
    "change_percent": "(-2.14%)",
    "previous_close": "172.69",
    "market_cap": "2.61T",
    "volume": "54,318,920",
    "day_range": "168.49 - 171.05"
  },
  {
    "symbol": "TSLA",
    "title": "Tesla, Inc. (TSLA)",
    "price": "156.90",
    "change": "-4.58",
    "change_percent": "(-2.84%)",
    "previous_close": "161.48",
    "market_cap": "499.81B",
    "volume": "112,045,300",
    "day_range": "155.41 - 160.39"
  }
]
The matching CSV carries the same columns, one row per symbol, which drops straight into pandas or any spreadsheet for charting a watchlist or comparing day moves across tickers.
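The scraped values are display strings, so the numbers need converting before you can sort or chart them. The sketch below handles the formats in the sample output: thousands separators, parenthesized percentages, and K/M/B/T suffixes on market cap. The to_number helper and its suffix table are assumptions about how Yahoo abbreviates figures, so check them against the pages you actually scrape.

```python
import re

# Assumed suffix multipliers for abbreviated figures such as 2.61T or 499.81B.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def to_number(text):
    """Convert display strings like '2.61T', '54,318,920', '(-2.14%)' to floats.

    Percentages come back as plain numbers (-2.14, not -0.0214).
    Returns None for missing or unparseable values such as '--'.
    """
    if not text:
        return None
    # Drop surrounding parentheses, thousands separators, and a trailing percent sign.
    cleaned = text.strip().strip("()").replace(",", "").rstrip("%")
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)([KMBT]?)", cleaned)
    if not match:
        return None
    value, suffix = match.groups()
    return float(value) * SUFFIXES.get(suffix, 1)
```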
Scaling to many tickers and staying unblocked
Even with rendering handled, Yahoo watches for scraper-shaped traffic. A few habits keep a longer run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering quote pages in a tight loop is the fastest way to get throttled or challenged. The two-second sleeps above are the floor, not the ceiling; widen them for larger watchlists and vary your tickers instead of crawling one path at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning non-200 pc_status values is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
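Backing off can be as simple as an exponential delay with some random jitter. This is a generic sketch, not something Crawlbase prescribes. It assumes a fetch(url) callable you write yourself that returns a (pc_status, html) pair, for example by returning response["headers"].get("pc_status") alongside the decoded body from the crawl function in Step 1. The names, the 2-second base, and the 60-second cap are illustrative.

```python
import random
import time


def backoff_delays(retries, base=2.0, cap=60.0):
    """Exponential delays (base, 2*base, 4*base, ...) capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch, url, retries=4, base=2.0, cap=60.0, sleep=time.sleep):
    """Call fetch(url) -> (status, html) and back off while status is not "200".

    Makes at most 1 + retries calls. Each wait adds up to 50% random jitter
    so parallel runs do not retry in lockstep. Returns the HTML or None.
    """
    status, html = fetch(url)
    for delay in backoff_delays(retries, base, cap):
        if status == "200":
            return html
        sleep(delay * (1 + random.random() * 0.5))
        status, html = fetch(url)
    return html if status == "200" else None
```

The sleep parameter defaults to time.sleep and exists so the function can be exercised without actually waiting.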
For a large watchlist, the async Crawler queues requests and delivers results to a webhook, which suits fetching hundreds of quote pages without holding open connections. For the broader playbook, see how to scrape websites without getting blocked and the notes on large-scale finance scraping. The same parse pattern carries over to other market sources, such as scraping crypto prices from CoinMarketCap.
Is it legal to scrape Yahoo Finance?
Whether scraping Yahoo Finance is allowed depends on Yahoo's terms of service, your jurisdiction, and what you do with the data. Yahoo's terms restrict automated access and bulk data collection, so scraping can run against those terms regardless of how careful your tooling is. The figures shown on a quote page are factual market data rather than personal data, which lowers the privacy stakes, but it does not put you outside the site's terms. None of the code here changes any of that; it only makes the technical part work. Read Yahoo Finance's Terms of Service and its robots.txt, and treat both as the boundary for what you collect.
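If you want the robots.txt check to be part of the script rather than a manual step, Python's standard library can evaluate the rules for you. This sketch assumes you have already downloaded the robots.txt text, for example from https://finance.yahoo.com/robots.txt, and the allowed_by_robots name is hypothetical. It only tells you what that file permits. It says nothing about the Terms of Service, which you still need to read yourself.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules you have already downloaded."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```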
The harder constraint on financial data is licensing, not privacy. The prices, market cap, and volume Yahoo displays are not Yahoo's own readings: they come from exchanges and third-party market-data providers, and those feeds carry their own redistribution restrictions. Collecting a number off a public page does not grant you a license to republish it, resell it, or build a commercial product on top of it. Stay on public quote and statistics pages, keep your request volume modest enough that you are not straining Yahoo's servers, and do not scrape anything behind a login, a paywall, or a premium tier such as Yahoo Finance Plus.
This guide is deliberately scoped to public market data because that is the line that keeps the work defensible. For anything beyond light research, ad hoc analysis, or a personal watchlist, the right path is a licensed feed: Yahoo and its data partners, along with dedicated market-data vendors and exchange APIs, offer official, terms-compliant access for production use. If you are weighing options, our roundup of the best financial data providers is a good place to start. A licensed feed, not a cleverer scraper, is the correct route for commercial or high-volume use.
Key takeaways
- Yahoo Finance is client-side rendered. A plain request returns a thin shell with placeholder values, so you must render the page before you parse the price and statistics.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for the live numbers.
- Parse by stable attributes. Read the header off data-testid values like qsp-price and the statistics off data-field names like marketCap, which survive markup changes better than class names.
- Loop and export. Iterate a ticker list, pace the run with short sleeps, and write the records to JSON and CSV so the data drops into pandas or a spreadsheet.
- Mind the licensing. The numbers come from exchanges and data providers with their own redistribution terms; stay on public pages and use a licensed feed for any commercial or high-volume use.
Frequently Asked Questions (FAQs)
Why does a plain request return empty prices from Yahoo Finance?
Because Yahoo loads its live price, change, and statistics table client-side with JavaScript. The initial HTML is a shell that fills in only after the page's scripts run in a browser, so a raw HTTP request returns status 200 with placeholder or empty value nodes. To get the real figures you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Yahoo Finance?
The JS token. The normal token fetches static HTML, which on a Yahoo quote page is the same thin shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the price, change, and statistics are present when BeautifulSoup parses them.
Which fields can I extract from a Yahoo Finance quote page?
The public market data on the page: the last price, the absolute and percentage change from the previous close, the previous close itself, market cap, volume, and the day range. These are factual quote and statistics fields, not personal data. Stay on the public quote pages and avoid anything behind a login or a premium tier.
My selectors return None. What changed?
Most likely Yahoo's markup. The site reworks its quote layout periodically, and generated class names change with it. The data-testid attributes (qsp-price, qsp-price-change) and data-field names (marketCap, regularMarketVolume) used here are more stable than classes, but they can still move. Re-inspect a live page in your browser's dev tools and update the selectors; periodic maintenance is normal for any production scraper.
How do I scrape many tickers without getting blocked?
Loop over your symbol list, keep a short sleep between requests, and let the Crawling API rotate residential IPs so no single address trips a rate limit. For a large watchlist, move to the async Crawler, which queues requests and posts results to a webhook instead of holding connections open. Watch the pc_status header and back off if it starts returning non-200 values.
Can I use scraped Yahoo Finance data commercially?
Treat that as a legal question, not a technical one. The prices and statistics on Yahoo come from exchanges and licensed market-data providers with their own redistribution terms, and Yahoo's own Terms of Service restrict automated collection and reuse. For commercial or high-volume use, the correct route is a licensed market-data feed or an official API, not a scraper. Review the terms and seek legal advice before building a product on top of the data.