Bloomberg is one of the most influential financial news and markets platforms on the open web. Its section and topic pages publish a steady stream of headlines on technology, markets, economics, and politics, each with a link, a published timestamp, and the section it belongs to. Analysts track what is moving a market, researchers map coverage trends over time, and product teams monitor which stories surface on which desks. All of that signal sits in the public listing layout of a section page, before you ever open an article.
This guide shows you how to scrape Bloomberg with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a public Bloomberg section page through the Crawling API, parses the headline, article link, published timestamp, and section for each story on the page, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public headline and link metadata. It does not touch full article text, and the legality section near the end is not boilerplate, because Bloomberg's editorial content is copyrighted. Read it before you point this at any real volume.
What you will build
A Node.js script that takes a public Bloomberg section URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every story link on the listing page. We use the technology section as the running example and collect these public listing fields per story:
- Headline: the article headline shown on the listing card.
- Link: the URL of the individual article on bloomberg.com.
- Published: the publication timestamp from the card's time element, as an ISO date.
- Section: the section or topic the story is filed under, for example "Technology".
Note what is deliberately absent: the article body, the abstract, and any media. This scraper collects links and metadata only, never the copyrighted text that sits behind each headline.
Why a plain request fails on Bloomberg
If you request a Bloomberg section URL with a bare HTTP client, you rarely get a usable listing back. Two things work against you. First, Bloomberg builds its section pages in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run and the story cards populate. Second, Bloomberg flags automated traffic aggressively: datacenter IPs and request patterns that do not look like a real browser get challenged, rate-limited, or blocked before they ever reach the rendered headlines.
So a working Bloomberg scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with Cheerio.
The Crawling API gives you two tokens: a normal one and a JavaScript one. Bloomberg needs the page rendered in a real browser, so use your JavaScript token for every request in this guide. The normal token returns the unrendered shell and your selectors will come back empty.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script, working with the DOM in concept, and installing packages with npm. If you are new to Node, the official docs and any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, our guide to building a web scraper with Node.js covers the basics.
Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token from the account docs page. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
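One way to keep the token out of version control is to read it from an environment variable instead of pasting it into scraper.js. The sketch below is a minimal illustration, not part of the Crawlbase client: the variable name CRAWLBASE_JS_TOKEN and the getToken helper are assumptions made for this example.

```javascript
// Hypothetical variable name; pick whatever fits your deployment.
const TOKEN_ENV_VAR = 'CRAWLBASE_JS_TOKEN';

// Return the token from the given environment object, or fail loudly
// so a missing token never turns into a confusing API error later.
function getToken(env = process.env) {
  const token = (env[TOKEN_ENV_VAR] || '').trim();
  if (!token) {
    throw new Error(`Set ${TOKEN_ENV_VAR} before running the scraper`);
  }
  return token;
}

// Usage (assumes the variable is exported in your shell):
// const api = new CrawlingAPI({ token: getToken() });
```

Passing the environment object in as a parameter keeps the helper easy to test without touching your real shell settings.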
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
node --version
mkdir bloomberg-scraper && cd bloomberg-scraper
npm init -y
npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the rendered section page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request a public Bloomberg section URL. The legacy tutorial used the technology section, so we do the same here. Checking the status code before you parse keeps failures loud instead of silent.
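const { CrawlingAPI } = require('crawlbase');

// Use your JavaScript token here so the page is rendered before it is returned
const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
const bloombergPageURL = 'https://www.bloomberg.com/technology';

api
  .get(bloombergPageURL)
  .then((response) => {
    if (response.statusCode === 200) {
      // Print the first 500 characters to confirm real markup came back
      console.log(response.body.slice(0, 500));
    } else {
      // Surface non-200 responses instead of ignoring them
      console.error(`Request failed with status ${response.statusCode}`);
    }
  })
  .catch((error) => console.error('API request error:', error));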
Run the script with node scraper.js and you should see real Bloomberg section markup at the top of the body, not a stripped-down shell. That confirms rendering works before you write a single selector. The Crawling API uses the JavaScript token you supplied to render the page in a real browser, so the story cards are present in the HTML you get back.
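Eyeballing the first 500 characters works for a one-off check, but a small programmatic test is easier to repeat. The sketch below counts article-style links in the returned HTML and treats a page with too few of them as an unrendered shell. It assumes story links contain /news/articles/ in their href, which is the pattern Step 2 matches on; the function names and the threshold of three links are arbitrary choices for this example.

```javascript
// Count hrefs that contain the article path this guide matches on.
// Works on the raw HTML string, so it needs no parser.
function countArticleLinks(html) {
  const matches = String(html).match(/href=["'][^"']*\/news\/articles\//g);
  return matches ? matches.length : 0;
}

// Heuristic only: the minimum of 3 links is an assumed threshold,
// not a documented property of Bloomberg pages.
function looksRendered(html, minLinks = 3) {
  return countArticleLinks(html) >= minLinks;
}

// Usage inside the .then() handler above:
// if (!looksRendered(response.body)) console.warn('Page looks unrendered');
```

If this check fails on a 200 response, the first thing to verify is that you are using the JavaScript token rather than the normal one.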
That first request just returned a fully rendered Bloomberg technology page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and handles the challenges Bloomberg throws at scrapers, so you get finished HTML from one call. Point it at a public section page on the free tier first, then add your parser.
Step 2: Parse each headline with Cheerio
With rendered HTML in hand, load it into Cheerio and walk the story links. On a Bloomberg section page, each article is reachable through an anchor whose href points at a /news/articles/ path, and the section title sits in an eyebrow element above the listing. The legacy tutorial read the section from .Eyebrow_sectionTitle-Wew2fboZsjA- a and the publication date from the page's time element via its datetime attribute, so we reuse those and collect one record per story link. Reading each field defensively keeps one missing value from crashing the run.
const cheerio = require('cheerio');

function parseDataFromHTML(html) {
  const $ = cheerio.load(html);
  const seen = new Set();
  const results = {
    section: '',
    articles: [],
  };

  // Section / topic title from the eyebrow element
  results.section =
    $('.Eyebrow_sectionTitle-Wew2fboZsjA- a').first().text().trim() || 'Technology';

  // One record per public article link on the listing
  $('a[href*="/news/articles/"]').each((_, element) => {
    const anchor = $(element);
    const headline = anchor.text().replace(/\n\s+/g, ' ').trim();
    let link = anchor.attr('href');
    if (!headline || !link) return;

    if (link.startsWith('/')) {
      link = new URL(link, 'https://www.bloomberg.com').href;
    }
    if (seen.has(link)) return; // skip duplicate links
    seen.add(link);

    // Published timestamp from a nearby time element, if present
    const timeAttr = anchor
      .closest('article')
      .find('time')
      .attr('datetime');
    const published = timeAttr ? timeAttr.split('T')[0] : '';

    results.articles.push({
      headline,
      link,
      published: published || 'Date not available',
      section: results.section,
    });
  });

  return results;
}
A few details keep this faithful to the page. The section title comes from the .Eyebrow_sectionTitle-Wew2fboZsjA- anchor, exactly as the legacy parser read it. Each story is matched by an anchor pointing at /news/articles/, and we resolve relative paths to absolute bloomberg.com URLs so the link works outside the page. A Set drops duplicate links, since the same article often appears in more than one module on a section page. The published date is read from a nearby time element's datetime attribute and trimmed to its ISO date with split('T')[0], the same approach the original used for the article timestamp.
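Deduplicating by link only works if two links to the same article compare equal as strings. The sketch below is one possible tightening: it resolves the href against the site root and drops the query string and fragment before the Set comparison. The helper name normalizeLink is hypothetical, and the idea that query strings carry nothing you need is an assumption you should check against live links before relying on it.

```javascript
const BASE_URL = 'https://www.bloomberg.com';

// Resolve an href against the site root and strip the query string and
// fragment, so two links to the same article produce the same string.
// Returns null when the href cannot be parsed as a URL.
function normalizeLink(href, base = BASE_URL) {
  try {
    const url = new URL(href, base);
    url.search = '';
    url.hash = '';
    return url.href;
  } catch {
    return null;
  }
}

// Usage inside the .each() loop, replacing the startsWith('/') branch:
// const link = normalizeLink(anchor.attr('href'));
// if (!link || seen.has(link)) return;
```

Using the URL constructor for every href also covers links that are already absolute, which the startsWith('/') check passes through untouched.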
Bloomberg's generated class names (the Eyebrow_* suffix above) change without notice, and section pages reshuffle their modules often. Treat the selectors as a starting template, not a contract. When the headline or section comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
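One way to soften selector drift is to try several selectors in order and take the first that yields text. The sketch below assumes $ is a loaded Cheerio instance, as in parseDataFromHTML. The helper name firstText is made up for this example, and the alternative selectors in the usage comment are hypothetical guesses, not selectors confirmed against the live page.

```javascript
// Try each selector in order and return the first non-empty text.
// `$` is a loaded Cheerio instance, as in parseDataFromHTML.
function firstText($, selectors, fallback = '') {
  for (const selector of selectors) {
    const text = $(selector).first().text().trim();
    if (text) return text;
  }
  return fallback;
}

// Usage; the second and third selectors are hypothetical alternatives:
// results.section = firstText($, [
//   '.Eyebrow_sectionTitle-Wew2fboZsjA- a',
//   '[class*="Eyebrow_sectionTitle"] a',
//   'h1',
// ], 'Technology');
```

A fallback chain buys time when a class suffix changes, but it does not replace re-inspecting the page: a loose selector can match the wrong element just as quietly as a stale one matches nothing.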
Step 3: Assemble the full script with JSON and CSV export
Now wire the fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV. The legacy guide saved the raw HTML to a file and parsed it in a second pass; a single script keeps the moving parts down and does the same job end to end.
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// parseDataFromHTML from Step 2 goes here

// Fetch one page and return its HTML, or null on a non-200 response
async function crawl(pageUrl) {
  const response = await api.get(pageUrl);
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

// Build a CSV string with every field quoted and embedded quotes doubled
function toCsv(rows) {
  const headers = ['headline', 'link', 'published', 'section'];
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const url = 'https://www.bloomberg.com/technology';
  const html = await crawl(url);
  if (!html) return;

  const data = parseDataFromHTML(html);
  fs.writeFileSync('bloomberg.json', JSON.stringify(data, null, 2));
  fs.writeFileSync('bloomberg.csv', toCsv(data.articles));
  console.log(`Saved ${data.articles.length} headlines to JSON and CSV`);
}

// Report network or file errors instead of leaving an unhandled rejection
main().catch((error) => console.error('Scrape failed:', error));
Paste the parseDataFromHTML function from Step 2 into the same file so main can call it. Run it with node scraper.js and you get two files: bloomberg.json with the full structured records and bloomberg.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because headlines frequently contain commas.
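To see what the quoting does on an awkward headline, here is a standalone variant of the CSV helper. It is a sketch, not a replacement you must adopt: the names csvField and rowsToCsv are invented for this example, and the one behavioral difference from toCsv is that missing values become empty fields rather than the literal text "undefined".

```javascript
const CSV_HEADERS = ['headline', 'link', 'published', 'section'];

// Quote a single value and double any embedded quotes.
// null and undefined become an empty quoted field.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Build a CSV string: one header line, then one line per row.
function rowsToCsv(rows, headers = CSV_HEADERS) {
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => csvField(row[h])).join(','));
  }
  return lines.join('\n');
}

// Illustrative row (placeholder data) with a comma, quotes, and no date:
// rowsToCsv([{ headline: 'Fed Says "Wait", Markets Slip',
//              link: 'https://example.com/a', section: 'Markets' }]);
```

Quoting every field is slightly wasteful but means you never have to decide which values need quoting.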
What the output looks like
The JSON file holds the section plus one object per headline, each with the headline text, the article link, the published date, and the section it came from. Headlines and links shown below are illustrative placeholders, not live data.
{
  "section": "Technology",
  "articles": [
    {
      "headline": "Chipmaker Delays Second Plant as Subsidies Stay in Flux",
      "link": "https://www.bloomberg.com/news/articles/example-chip-delay",
      "published": "2024-01-18",
      "section": "Technology"
    },
    {
      "headline": "Cloud Provider Posts Record Quarter on AI Demand",
      "link": "https://www.bloomberg.com/news/articles/example-cloud-quarter",
      "published": "2024-01-17",
      "section": "Technology"
    }
  ]
}
The CSV mirrors the same headline rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.
headline,link,published,section
"Chipmaker Delays Second Plant as Subsidies Stay in Flux","https://www.bloomberg.com/news/articles/example-chip-delay","2024-01-18","Technology"
"Cloud Provider Posts Record Quarter on AI Demand","https://www.bloomberg.com/news/articles/example-cloud-quarter","2024-01-17","Technology"
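Because the record shape is fixed, a quick post-run check can tell you when a field has started coming back empty. The sketch below counts records whose fields are missing or hold the 'Date not available' placeholder that parseDataFromHTML writes. The function name summarizeGaps and the exact checks are assumptions for illustration; tighten or loosen them to fit your data.

```javascript
// Count records with a missing or malformed value, per field.
// A published value passes only if it looks like YYYY-MM-DD, so the
// 'Date not available' placeholder is counted as a gap.
function summarizeGaps(articles) {
  const gaps = { headline: 0, link: 0, published: 0, section: 0 };
  for (const article of articles) {
    if (!article.headline) gaps.headline += 1;
    if (!/^https?:\/\//.test(article.link || '')) gaps.link += 1;
    if (!/^\d{4}-\d{2}-\d{2}$/.test(article.published || '')) gaps.published += 1;
    if (!article.section) gaps.section += 1;
  }
  return { total: articles.length, gaps };
}

// Usage after parsing:
// console.log(summarizeGaps(data.articles));
```

A sudden jump in one counter between runs is a reasonable cue to re-inspect the page markup.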
Scale across sections and pages
One section page is a demo; a real job tracks several desks over time. Bloomberg exposes a set of public section URLs (technology, markets, economics, politics, and more), so you can loop over them, fetch each through the Crawling API, parse it with the same function, and merge the results. As long as the section pages share the same listing structure, the parser you already wrote should carry over. One caveat: parseDataFromHTML falls back to 'Technology' when the eyebrow selector finds nothing, so change that fallback, or pass the expected section name in, before you trust the section field on other desks.
async function scrapeSections(sections) {
  const all = [];
  for (const path of sections) {
    const url = `https://www.bloomberg.com/${path}`;
    const html = await crawl(url);
    if (!html) continue;

    const { articles } = parseDataFromHTML(html);
    all.push(...articles);
    console.log(`${path}: ${articles.length} headlines`);

    // Pace requests so you stay under the rate limit
    await new Promise((r) => setTimeout(r, 2000));
  }
  return all;
}

// scrapeSections(['technology', 'markets', 'economics']);
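The loop above concatenates results, so a story that appears on two section pages shows up twice in the merged list. The sketch below is one way to merge by link while remembering every section a story was seen under. The helper name mergeByLink and the added sections array are assumptions for this example, not part of the original record shape.

```javascript
// Merge article records, keeping the first record seen for each link
// and collecting every section the link appeared under.
function mergeByLink(articles) {
  const byLink = new Map();
  for (const article of articles) {
    const existing = byLink.get(article.link);
    if (!existing) {
      byLink.set(article.link, { ...article, sections: [article.section] });
    } else if (!existing.sections.includes(article.section)) {
      existing.sections.push(article.section);
    }
  }
  return [...byLink.values()];
}

// Usage:
// const merged = mergeByLink(await scrapeSections(['technology', 'markets']));
```

A Map preserves insertion order, so the merged list keeps stories in the order they were first encountered.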
For large or repeated runs, the same fetch-then-parse pattern carries straight over to the async Crawler, which queues many URLs and pushes results back to you instead of blocking on each request. For more on rendered, JavaScript-heavy pages like these, see our guide to crawling JavaScript websites, and for the wider context, our notes on large-scale finance scraping.
Staying unblocked
Even with rendering handled, Bloomberg watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Introduce a delay between section fetches rather than hammering the site in a tight loop. Spreading requests out is the single biggest factor in staying under Bloomberg's rate limits.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a challenge. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
For the broader playbook, see how to scrape websites without getting blocked.
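The pacing and status-code habits above can be combined into a retry wrapper that waits longer after each failed attempt. The sketch below is a generic exponential backoff, not a documented Crawlbase feature: the function names, the 2-second base delay, and the 60-second cap are all assumptions chosen for illustration. It takes any async function that resolves to an object with statusCode and body, which matches how this guide uses api.get.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 2000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fetchPage` until it returns status 200 or the retries run out.
// Returns the response body on success, or null after the last failure.
async function fetchWithBackoff(
  fetchPage,
  { retries = 3, baseMs = 2000, capMs = 60000 } = {}
) {
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    const response = await fetchPage();
    if (response.statusCode === 200) return response.body;
    if (attempt < retries) await sleep(backoffDelay(attempt, baseMs, capMs));
  }
  return null;
}

// Usage with the client from this guide:
// const html = await fetchWithBackoff(() => api.get(url));
```

With the defaults, a page that keeps failing is tried four times with waits of 2, 4, and 8 seconds in between, after which the wrapper gives up rather than pushing harder.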
Is it legal to scrape Bloomberg?
Whether scraping Bloomberg is allowed depends on Bloomberg's terms of service, your jurisdiction, and what you do with the data. Bloomberg's terms restrict automated access and reuse of its content, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Bloomberg's Terms of Service and its robots.txt, respect any rate expectations they state, and treat both as the boundary for what you collect. Critically, never scrape anything behind a login or a paywall: a large share of Bloomberg's coverage is gated, and bypassing that gate is both a terms violation and, in many places, a legal one.
This guide is deliberately scoped to public headline and link metadata: the headline text, the article URL, the publication timestamp, and the section a story is filed under, all visible on a section page without logging in. That is very different from the article itself. Bloomberg's reporting, analysis, and media are copyrighted works. Do not scrape, store, or redistribute full article text, abstracts, or images, and do not assemble a derivative archive that reproduces Bloomberg's editorial output. Collecting a headline and a link so a reader can click through to the original on bloomberg.com is a normal, link-led use; copying the body behind that link is not. If any field you touch ever involves identifiable individuals, privacy law such as GDPR and CCPA applies on top of copyright, which is another reason to stay on factual listing metadata.
If your project needs more than public headlines, the right path is a sanctioned one, not a cleverer scraper. Bloomberg offers official and licensed data products, including its Terminal and enterprise data feeds, that deliver market data and content under clear commercial terms, attribution rules, and reuse rights. Those are the correct tools when you need full content, guaranteed structure, high volume, or the right to redistribute. When you are unsure whether a use is allowed, get a license or a data agreement rather than assuming silence is consent. For a broader survey of sanctioned options, see our overview of the best financial data providers.
Key takeaways
- Bloomberg renders listings client-side and blocks hard. A plain request returns an empty shell or a challenge, so you must render the page behind a trusted IP, using the JavaScript token, before you parse it.
- The Crawling API does both in one call. It renders the page in a real browser, rotates residential IPs, and handles challenges, returning finished HTML you parse with Cheerio.
- Cheerio extracts the public fields. Match every /news/articles/ anchor, read the headline, link, published timestamp, and section, dedupe by link, and expect the generated class names to drift.
- Scale and export. Loop over Bloomberg's public section URLs, pace your requests, and write structured records to both JSON and CSV; reach for the async Crawler when volume grows.
- Headlines and links only. Bloomberg's article text and media are copyrighted, so never scrape or redistribute the body, never touch anything behind a login or paywall, respect ToS and robots.txt, and prefer Bloomberg's official or licensed feeds for production use.
Frequently Asked Questions (FAQs)
What data can I collect from Bloomberg with this scraper?
This guide collects public listing metadata only: the article headline, the article link, the publication timestamp, and the section the story is filed under, all visible on a Bloomberg section page without logging in. It does not collect the article body, abstracts, or media, because that content is copyrighted. The output is a set of links and metadata you can use to monitor coverage and click through to the original story.
Why does a plain request return incomplete data from Bloomberg?
Because Bloomberg builds its section pages client-side with JavaScript and challenges automated traffic. A raw HTTP request from a datacenter IP usually returns an empty shell or a block page rather than the story cards. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API handles for you when you use the JavaScript token.
Can I scrape Bloomberg articles that are behind a paywall?
No. This guide is strict on that point: never scrape anything behind a login or a paywall. Gated content is restricted by Bloomberg's terms and bypassing the gate can carry legal exposure as well. Stay on the public headlines and links that anyone can see on a section page, and use Bloomberg's official or licensed products if you need the full, gated content.
My selectors return empty values. What changed?
Almost certainly Bloomberg's markup. Generated class names like Eyebrow_sectionTitle-Wew2fboZsjA- change without notice, and section pages reshuffle their modules often, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools, update the selectors in parseDataFromHTML, and you are back in business. Periodic selector maintenance is normal for any production scraper.
Can I build a Bloomberg scraper in a language other than JavaScript?
Yes. This guide uses JavaScript with Cheerio, but the same approach works in any language. The Crawling API has libraries and SDKs for several languages, so you fetch the rendered HTML the same way and parse it with whatever HTML parser your stack prefers, such as BeautifulSoup in Python. The selectors and fields stay the same; only the parsing syntax changes.
Does Bloomberg offer an official data feed?
Yes. Bloomberg provides official and licensed data products, including its Terminal and enterprise data feeds, that deliver market data and content under clear commercial terms and reuse rights. If you need full content, high volume, guaranteed structure, or the right to redistribute, that sanctioned route is the correct one. This public-metadata scraper is best for research, monitoring, and link collection where an official agreement is not warranted.
