How to Scrape Forbes Data: articles and lists with Node


Forbes is one of the largest business and financial news sites on the open web, covering industries, companies, markets, and the people behind them. Its public section and topic pages carry a lot of structured signal: article headlines, who wrote them, when they were published, and which topic each one sits under. Analysts watch those listings to track what a sector is talking about, content teams use them to map coverage and trends, and researchers index headlines to follow a story over time. All of that metadata sits on the public listing page in a predictable layout, before you ever open an article.

This guide shows you how to scrape Forbes data with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a public Forbes topic or section page through the Crawlbase Crawling API, parses the headline, author byline, article link, published date, and section for each story, handles pagination, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public listing metadata (headlines, bylines, dates, and links), never the full article text or media. The legality section near the end is not boilerplate: Forbes content is copyrighted, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public Forbes section or topic URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every article card on the listing page. We use a technology section as the running example and pull these fields per story:

Headline: the article title shown on the listing card.
Author: the byline, the contributor or staff writer credited on the card.
Link: the public URL to the individual article.
Published date: the date or relative time the listing shows for the story.
Section: the topic or channel the listing belongs to, such as technology or business.

Note what is deliberately absent: the article body, images, and any media. This scraper collects links and metadata so you can index and track coverage, not the copyrighted content itself.

Why a plain request fails on Forbes

If you request a Forbes section URL with a bare HTTP client, you rarely get the article cards back. Two things work against you. First, Forbes loads much of its listing content in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run. Second, Forbes flags automated traffic: datacenter IPs and request patterns that do not look like a real browser get rate-limited, challenged, or blocked before they ever reach the rendered listings.

So a working Forbes scraper needs two things in one request: a browser that actually renders the page, and an IP the site reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with Cheerio.

Use the JavaScript token

The Crawling API gives you two tokens: a normal one and a JavaScript one. Forbes needs the page rendered in a real browser, so use your JavaScript token for every request in this guide. The normal token returns the unrendered shell and your selectors will come back empty.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, our guide to building a web scraper with Node.js covers the basics this tutorial assumes.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
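
One way to keep the token out of version control is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN and a helper called loadToken; both names are made up for this guide and are not part of the crawlbase package.

```javascript
// Hypothetical helper: loadToken and CRAWLBASE_JS_TOKEN are names chosen
// for this example, not something the crawlbase package defines.
function loadToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    // Fail early with a clear message instead of sending an empty token
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper');
  }
  return token;
}

// Usage in scraper.js, in place of a hard-coded token string:
// const api = new CrawlingAPI({ token: loadToken() });
```

On macOS or Linux you would then start the script with something like CRAWLBASE_JS_TOKEN=your_token node scraper.js, so the token never lands in a committed file.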

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash

node --version

mkdir forbes-scraper && cd forbes-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.

Step 1: Fetch the rendered listing page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request a public Forbes section URL. Forbes loads cards in as you scroll, so passing ajax_wait and a short page_wait tells the API to wait for the dynamic content before returning. Checking the status code before you parse keeps failures loud instead of silent.

javascript

const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const forbesPageURL = 'https://www.forbes.com/technology/';

api
  .get(forbesPageURL, { ajax_wait: 'true', page_wait: '5000' })
  .then((response) => {
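    if (response.statusCode === 200) {
      // Print the start of the rendered HTML as a quick sanity check
      console.log(response.body.slice(0, 500));
    } else {
      // Surface failures instead of silently printing nothing
      console.error(`Request failed: ${response.statusCode}`);
    }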
  })
  .catch((error) => console.error('API request error:', error));

Run the script with node scraper.js and you should see real Forbes listing markup at the top of the body, not a stripped-down shell. That confirms rendering works before you write a single selector. The Crawling API uses the JavaScript token you supplied to render the page in a real browser, and ajax_wait with page_wait give the lazy-loaded cards time to populate, so the article links are present in the HTML you get back.
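
The first 500 characters of a page are mostly head markup, so a quick programmatic check can be more telling than eyeballing the output. The sketch below counts article tags that carry the stream-item class in the returned HTML. The helper names countCards and looksRendered are hypothetical, and the class name is the one this guide uses later, which may not match the live site.

```javascript
// Hypothetical sanity check, not part of the crawlbase client.
// Counts <article> tags whose class attribute contains the stream-item token.
// Assumes double-quoted attributes, which is how rendered HTML is usually serialized.
function countCards(html) {
  if (typeof html !== 'string') return 0;
  const cardTag = /<article\b[^>]*\bclass="[^"]*\bstream-item\b[^"]*"/g;
  return (html.match(cardTag) || []).length;
}

// Treat the page as rendered once at least minCards cards are present
function looksRendered(html, minCards = 1) {
  return countCards(html) >= minCards;
}

// Usage inside the .then() handler:
// if (!looksRendered(response.body)) console.warn('No cards found: shell or changed markup');
```

A count of zero does not say which problem you have. It can mean the page was not rendered, or that the class name has drifted, so inspect the live page before changing the request options.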

Crawlbase Crawling API

That first request just returned a fully rendered Forbes section page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and handles the rate limits and challenges Forbes throws at scrapers, so you get finished HTML from one call. Point it at a public section on the free tier first, then add your parser.


Step 2: Parse each article card with Cheerio

With rendered HTML in hand, load it into Cheerio and walk the article cards. Forbes lists each story inside a stream item, and the headline link is the same kind of a.color-link anchor the site uses throughout its listings. You select every card, then read the headline and link from that anchor and the author byline and published date from the elements inside the card; the section is passed in by the caller. Reading every field defensively keeps one missing value from crashing the run.

javascript

const cheerio = require('cheerio');

function parseForbesData(html, section) {
  const $ = cheerio.load(html);
  const articles = [];
  const seen = new Set();

  // One record per article card in the listing stream
  $('article.stream-item').each((_, element) => {
    const card = $(element);

    const titleLink = card.find('a.color-link').first();
    const headline = titleLink.text().trim();
    let link = titleLink.attr('href') || '';
    if (link && link.startsWith('/')) {
      link = new URL(link, 'https://www.forbes.com').href;
    }

    const author = card.find('.stream-item__author').text().trim();
    const date = card.find('.stream-item__date').text().trim();

    // Skip empty cards and de-duplicate repeated links
    if (!headline || !link || seen.has(link)) return;
    seen.add(link);

    articles.push({
      headline,
      author: author || 'N/A',
      date: date || 'N/A',
      section: section || 'N/A',
      link,
    });
  });

  return articles;
}

A few details keep this faithful to the page. Each story lives in an article.stream-item, and the headline link is the a.color-link anchor inside it, which is the same link class the legacy Forbes listings use. The author comes from .stream-item__author and the published date from .stream-item__date. The link is read from the anchor's href and resolved to an absolute URL so it works outside the page, and a Set drops the duplicate links a listing stream often repeats. The section is passed in from the caller, since you already know which channel URL you requested.
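
The parser above only rewrites links that start with a slash. If a listing ever mixes relative, protocol-relative, and absolute hrefs, Node's built-in URL class can resolve all three against a base. This is a minimal sketch: toAbsolute is a hypothetical helper name, and the base URL is the Forbes origin already used in the parser.

```javascript
// Hypothetical helper: resolve any href against the Forbes origin.
// Uses only the WHATWG URL class that ships with Node.
function toAbsolute(href, base = 'https://www.forbes.com') {
  if (!href) return ''; // a missing href stays empty so the caller can skip the card
  try {
    return new URL(href, base).href;
  } catch {
    return ''; // unparseable hrefs are treated the same as missing ones
  }
}

// Usage inside parseForbesData, in place of the startsWith('/') branch:
// const link = toAbsolute(titleLink.attr('href'));
```

Returning an empty string for bad input fits the existing guard in the parser, which already skips any card without a link.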

Selectors drift

Forbes ships markup changes regularly, so the class names above are a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
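
Drift is easier to catch when the script reports it rather than waiting for someone to open the CSV. One possible approach, sketched below, measures how many records have a real value for each field and flags the fields that fall under a threshold. The function names and the 0.5 threshold are assumptions for illustration; the record shape and the 'N/A' placeholder come from parseForbesData above.

```javascript
// Hypothetical drift check for records shaped like parseForbesData output.
const FIELDS = ['headline', 'author', 'date', 'section', 'link'];

// Share of records (0 to 1) that hold a real value for each field
function fieldFillRates(articles) {
  const rates = {};
  for (const field of FIELDS) {
    const filled = articles.filter((a) => a[field] && a[field] !== 'N/A').length;
    rates[field] = articles.length ? filled / articles.length : 0;
  }
  return rates;
}

// Fields whose fill rate is below the threshold, which suggests a stale selector
function driftedFields(articles, threshold = 0.5) {
  const rates = fieldFillRates(articles);
  return FIELDS.filter((field) => rates[field] < threshold);
}

// Usage after parsing:
// const stale = driftedFields(articles);
// if (stale.length) console.warn(`Check selectors for: ${stale.join(', ')}`);
```

An empty result set flags every field, which is the right signal when the card selector itself has stopped matching.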

Step 3: Assemble the full script with JSON and CSV export

Now wire the fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV.

javascript

const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const response = await api.get(pageUrl, { ajax_wait: 'true', page_wait: '5000' });
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

function toCsv(rows) {
  const headers = ['headline', 'author', 'date', 'section', 'link'];
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const section = 'technology';
  const url = `https://www.forbes.com/${section}/`;
  const html = await crawl(url);
  if (!html) return;

  const articles = parseForbesData(html, section);
  fs.writeFileSync('forbes.json', JSON.stringify(articles, null, 2));
  fs.writeFileSync('forbes.csv', toCsv(articles));
  console.log(`Saved ${articles.length} articles to JSON and CSV`);
}

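// Report unexpected errors (network failures, write errors) and exit non-zero
main().catch((error) => {
  console.error('Scraper failed:', error);
  process.exitCode = 1;
});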

Paste the parseForbesData function from Step 2 into the same file so main can call it. Run it with node scraper.js and you get two files: forbes.json with the full structured records and forbes.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because headlines frequently contain commas.

What the output looks like

The JSON file holds one object per article, each with the headline, author byline, published date, section, and link. The values below are illustrative placeholders, not live Forbes data.

json

[
  {
    "headline": "How AI Startups Are Reshaping Enterprise Software",
    "author": "Jane Doe",
    "date": "Jun 10, 2026",
    "section": "technology",
    "link": "https://www.forbes.com/sites/example/2026/06/10/ai-startups-enterprise/"
  },
  {
    "headline": "The Quiet Rise Of Mid-Market Cloud Providers",
    "author": "John Smith",
    "date": "Jun 9, 2026",
    "section": "technology",
    "link": "https://www.forbes.com/sites/example/2026/06/09/mid-market-cloud/"
  }
]

The CSV mirrors the same rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.

csv

headline,author,date,section,link
"How AI Startups Are Reshaping Enterprise Software","Jane Doe","Jun 10, 2026","technology","https://www.forbes.com/sites/example/2026/06/10/ai-startups-enterprise/"
"The Quiet Rise Of Mid-Market Cloud Providers","John Smith","Jun 9, 2026","technology","https://www.forbes.com/sites/example/2026/06/09/mid-market-cloud/"

Handle pagination

One section page is a demo; a real job pulls more than the first batch of headlines. Forbes section pages expose additional results through a page query parameter, so you can loop over page numbers, fetch each through the Crawling API, parse it with the same function, and stop when a page returns no new articles. Because every listing page shares the same card structure, the parser you already wrote works across all of them without changes.

javascript

async function scrapeAllPages(section, maxPages) {
  const all = [];
  const seen = new Set();

  for (let page = 1; page <= maxPages; page++) {
    const url = `https://www.forbes.com/${section}/?page=${page}`;
    const html = await crawl(url);
    if (!html) break;

    const batch = parseForbesData(html, section)
      .filter((a) => !seen.has(a.link));
    if (batch.length === 0) break; // no new articles

    batch.forEach((a) => seen.add(a.link));
    all.push(...batch);
    console.log(`Page ${page}: ${batch.length} new articles`);

    // Pace requests so you stay under the rate limit
    await new Promise((r) => setTimeout(r, 2000));
  }

  return all;
}

The exact pagination parameter can change, so check a couple of real "next page" links in your browser and match the pattern. The important habits carry over to any target: loop until the results run out, de-duplicate by link so you do not store the same headline twice, and put a short delay between requests so you are not hammering the site. For more on rendered, JavaScript-heavy pages like this one, see our guide to crawling JavaScript websites.
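
Because the pagination parameter may differ from the page name used above, it can help to build listing URLs in one place so there is a single line to change. The sketch below is one way to do that with Node's URL class. buildPageUrl is a hypothetical helper, and 'page' as the default parameter name simply mirrors the loop above rather than anything Forbes documents.

```javascript
// Hypothetical helper: build a listing URL for a section and page number.
// The parameter name defaults to 'page' to match scrapeAllPages; pass a
// different name if the live "next page" links use one.
function buildPageUrl(section, page, param = 'page') {
  const url = new URL(`/${section}/`, 'https://www.forbes.com');
  url.searchParams.set(param, String(page));
  return url.href;
}

// Usage inside scrapeAllPages:
// const url = buildPageUrl(section, page);
```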

Staying unblocked

Even with rendering handled, Forbes watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

Pace your requests. Introduce a delay between page fetches rather than hammering the section in a tight loop. Spreading requests out is the single biggest factor in staying under Forbes' rate limits.
Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a challenge. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
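
The "back off" habit can be sketched as a retry wrapper with exponentially growing delays. This is an illustration under stated assumptions, not a Crawlbase feature: backoffDelay and withRetry are hypothetical names, the 2-second base and 30-second cap are arbitrary starting values, and the wrapper relies on the task returning null on failure, which is what the crawl function in Step 3 does.

```javascript
// Hypothetical helpers; the delay values are illustrative, not recommended limits.

// Delay before the next attempt: 2s, 4s, 8s, ... capped at capMs
function backoffDelay(attempt, baseMs = 2000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task that resolves to null on failure, retrying with backoff
async function withRetry(task, maxAttempts = 4) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await task();
    if (result !== null) return result;
    // Wait longer after each failure, but not after the final attempt
    if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
  }
  return null; // the caller decides whether to stop the run
}

// Usage with the crawl function from Step 3:
// const html = await withRetry(() => crawl(url));
```

Capping the delay keeps a long run from stalling indefinitely, while the growing gaps give a rate limit time to reset between attempts.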

For the broader playbook, see how to scrape websites without getting blocked. If you want similar headline metadata from other news sources, the same fetch-then-parse pattern carries straight over to scraping Google News.

Is it legal to scrape Forbes?

Whether scraping Forbes is allowed depends on Forbes' terms of service, your jurisdiction, and what you do with the data. Forbes' terms restrict automated access and reuse of its content, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Forbes' Terms of Service and its robots.txt, respect any rate expectations they state, and treat both as the boundary for what you collect. Never scrape anything behind a login or a paywall: paywalled and subscriber-only articles are off limits, and bypassing access controls to reach them is a separate and more serious line to cross.

This guide is deliberately scoped to public listing metadata: the headline, author byline, public article link, published date, and section that anyone can see on a section page without an account. Crucially, it does not collect the article body, photos, charts, or video. Forbes articles and media are copyrighted works, and copying or republishing full text or media is copyright infringement, not fair scraping. Collecting headlines and links to index, monitor, or point back to the original is very different from reproducing the content itself. Keep your stored data to facts and pointers, attribute and link back to the source, and do not present Forbes' editorial work as your own or behind your own wall.

If your project needs more than public headlines and links, the right path is a sanctioned one, not a cleverer scraper. Forbes offers official feeds and licensing and syndication arrangements for reusing its content with clear terms, attribution rules, and commercial rights. Those are the correct tools when you need full text, large volumes, guaranteed structure, or the right to republish. When you are unsure whether a use is allowed, get a license or a data agreement rather than assuming silence is consent.

Recap

Key takeaways

Forbes renders listings client-side and guards its traffic. A plain request returns an empty shell or a challenge, so you must render the page behind a trusted IP, using the JavaScript token, before you parse it.
The Crawling API does both in one call. It renders the page in a real browser, rotates residential IPs, and waits for the lazy-loaded cards, returning finished HTML you parse with Cheerio.
Cheerio extracts the metadata. Select every article.stream-item, read the headline and link from the a.color-link anchor, plus author, date, and section, and expect the class names to drift.
Paginate and export. Loop over Forbes' section pages until no new articles appear, de-duplicate by link, pace your requests, and write structured records to both JSON and CSV.
Stay on public metadata and respect copyright. Collect headlines, links, bylines, and dates only, never the article body or media, respect ToS, robots.txt, and the paywall, and prefer Forbes' official feeds or licensing for reuse.

Frequently Asked Questions (FAQs)

Can I extract data from Forbes?

You can collect public listing metadata such as headlines, article links, bylines, and published dates, as long as you respect Forbes' Terms of Service and robots.txt and do not copy the full article text or media. Keep collection to public pages, never anything behind a login or paywall, and prefer Forbes' official feeds or a licensing agreement when you need to reuse content. This guide stays inside those lines by storing only headlines and links, not article bodies.

Why does a plain request return incomplete data from Forbes?

Because Forbes loads much of its listing content client-side with JavaScript and challenges automated traffic. A raw HTTP request from a datacenter IP usually returns an empty shell or a block page rather than the article cards. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API handles for you when you use the JavaScript token together with ajax_wait and page_wait.

Why use the Crawling API instead of running Puppeteer myself?

A headless browser like Puppeteer can render Forbes, but you then have to run and maintain the browser fleet, attach a rotating residential proxy pool, and handle challenges yourself, which is most of the work and gets slow at volume. The Crawling API folds rendering, IP rotation, and challenge handling into one call, so you send a URL and get finished HTML back. You spend your time on parsing, not on keeping infrastructure healthy.

My selectors return empty values. What changed?

Almost certainly Forbes' markup. Class names like stream-item__author and the a.color-link headline anchors change over time, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools, update the selectors in parseForbesData, and you are back in business. Periodic selector maintenance is normal for any production scraper.

Can I scrape the full text of Forbes articles?

No, and you should not. Forbes articles and media are copyrighted, so copying and republishing the body text, images, or video is copyright infringement rather than fair scraping. This guide collects only the public headline, link, byline, date, and section so you can index and point back to the original. If you need full content, use Forbes' official feeds or a licensing and syndication agreement, which grant the reuse rights a scraper never can.

Will I get blocked while scraping Forbes?

You can, if you send too many requests too fast from one address. The Crawling API reduces that risk by rotating through residential IPs and handling challenges for you, but you should still pace your requests, add delays between pages, and watch the status codes so you can back off when challenges appear. Those habits matter on any hard commercial target.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.
