Product Hunt is where new software, hardware, and side projects get launched and ranked every day, and the public side of that ranking is a clean signal for anyone tracking what makers are shipping. Founders watch it to size up competitors, researchers use it to spot category trends, and growth teams study which launches earn the most upvotes. The data is right there on each category page: product names, taglines, upvote counts, and a link to every listing.

This guide shows you how to scrape Product Hunt with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a Product Hunt category page through the Crawling API, parses the product name, tagline, upvotes, and link for each listing, optionally pulls basic public maker details from a profile page, handles pagination, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public product data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public Product Hunt category URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every product on the list. We use the engineering and development category page as the running example and pull these fields per item:

  • Name: the product name as shown on the listing card.
  • Tagline: the short description line that sits under the name.
  • Upvotes: the count read from the card's upvote and rating row, stored as a number.
  • Reviews: the review-count text, such as "151 reviews".
  • Link: the URL to the individual product page.

Later in the guide we add a separate pass that reads basic public profile fields for a maker, such as name, headline, follower and following counts, and points, so you can see how the same approach extends from listings to a single public profile.

Why a plain request fails on Product Hunt

If you request a Product Hunt category URL with a bare HTTP client, you rarely get the product list back. Two things work against you. First, Product Hunt renders its listing cards in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run. Second, the platform watches for automated traffic: datacenter IPs and request patterns that do not look like a real browser get rate-limited or blocked before they reach the rendered product data.

So a working Product Hunt scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with Cheerio.

Why the JavaScript token

Because Product Hunt builds its listings client-side, you request these pages with the Crawling API's JavaScript rendering enabled. In the official Node client that means initializing it with your JavaScript token. Rendered requests cost more credits than plain ones, but the free tier still lets you test the whole flow end to end before you scale up.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script, installing packages with npm, and working with the DOM concepts Cheerio mirrors. If you are new to Node, the guide to build a web scraper with Node.js covers the groundwork this tutorial assumes.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token from the account docs page. The free tier gives you 1,000 requests with no card. Treat the token like a password: it authenticates your requests, so keep it out of version control.
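
One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is chosen for this example and is not something the Crawlbase client requires.

```javascript
// Read the Crawlbase JavaScript token from the environment.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this sketch.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper');
  }
  return token;
}

// Usage (in scraper.js):
//   const api = new CrawlingAPI({ token: getToken() });
```

With this in place you would start the script as CRAWLBASE_JS_TOKEN=your_token node scraper.js, and the token never lands in a committed file.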

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash
node --version

mkdir producthunt-scraper && cd producthunt-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.

Step 1: Fetch the rendered category page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request the category URL. Saving the body to disk lets you inspect the real markup once, then iterate on selectors without spending a request each time.

javascript
const { CrawlingAPI } = require('crawlbase');
const fs = require('fs');

// Use your JavaScript token here, since Product Hunt needs rendering
const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const producthuntPageURL =
  'https://www.producthunt.com/categories/engineering-development';

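api
  .get(producthuntPageURL)
  .then((response) => {
    if (response.statusCode === 200) {
      // Save the rendered markup so selectors can be tested offline
      fs.writeFileSync('response.html', response.body);
      console.log('HTML saved to response.html');
    } else {
      // Surface non-200 responses instead of exiting silently
      console.error(`Request failed: ${response.statusCode}`);
    }
  })
  .catch((error) => console.error('API request error:', error));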

Run the script with node scraper.js and you should see real Product Hunt markup written to response.html, not a stripped-down shell. That confirms rendering works before you write a single selector. Replace the example URL with whatever category page you want; every category lives under /categories/ on the same domain.
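
A quick check on the saved file can confirm that you received rendered markup rather than a shell. The sketch below counts links whose href contains /products/, which is the path pattern the product links in this guide's sample output follow. The function names and the threshold of 5 are assumptions made for illustration.

```javascript
// Count anchors that point at product pages in a chunk of HTML.
// Assumes product links contain "/products/", as in this guide's sample output.
function countProductLinks(html) {
  const matches = html.match(/href="[^"]*\/products\/[^"]+"/g);
  return matches ? matches.length : 0;
}

// Treat the page as rendered when it has at least `minLinks` product links.
// The default of 5 is an arbitrary threshold for this sketch.
function looksRendered(html, minLinks = 5) {
  return countProductLinks(html) >= minLinks;
}

// Usage:
//   const html = require('fs').readFileSync('response.html', 'utf8');
//   console.log(looksRendered(html) ? 'Rendered page' : 'Looks like an empty shell');
```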

Crawlbase Crawling API

That first request just returned a fully rendered Product Hunt category page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and handles the rate limits the platform throws at scrapers, so you get finished HTML from one call. Point it at the engineering and development category on the free tier first.

Step 2: Parse each product with Cheerio

With rendered HTML in hand, load it into Cheerio and walk the product cards. Product Hunt lays each listing out in a repeating container, so you select every card, then read the name, tagline, upvotes, reviews, and link from inside it. Reading each field defensively keeps one missing value from crashing the run.

javascript
const fs = require('fs');
const cheerio = require('cheerio');

function parseProducts(html) {
  const $ = cheerio.load(html);
  const products = [];

  const containers = $(
    'div.flex.direction-column.mb-mobile-10.mb-tablet-15.mb-desktop-15.mb-widescreen-15'
  );

  containers.each((index, element) => {
    const card = $(element);

    const name = card
      .find('div.color-blue.fontSize-18.fontWeight-600')
      .text()
      .trim();

    // Count the label elements in the rating row (one per filled marker).
    // This is a marker count, not a vote tally parsed from text.
    const upvotes = card.find(
      'div.flex.direction-row.align-center label'
    ).length;

    const reviews = card
      .find('div.ml-3.color-lighter-grey.fontSize-14.fontWeight-400')
      .text()
      .trim();

    const tagline = card
      .find(
        'div.color-lighter-grey.fontSize-mobile-14.fontSize-tablet-16.fontSize-desktop-16.fontSize-widescreen-16.fontWeight-400'
      )
      .text()
      .trim();

    const href = card.find('a').first().attr('href');
    const link = href
      ? new URL(href, 'https://www.producthunt.com').href
      : '';

    if (name) {
      products.push({ rank: index + 1, name, tagline, upvotes, reviews, link });
    }
  });

  return products;
}

A few details keep this faithful to the page. The product name comes from the color-blue fontSize-18 fontWeight-600 block, and the tagline from the long color-lighter-grey font-size class chain that Product Hunt uses for the description line. The upvotes value comes from the rating row, which renders one label element per filled marker, so counting those label elements yields a number without parsing any text. Because this is a marker count rather than a tally read from text, check it against the live page to confirm it is the metric you want. The review-count text sits in its own ml-3 color-lighter-grey block, and the link is read from the card's first anchor and resolved to an absolute URL so it works outside the page.

Selectors drift

Product Hunt's class names (the long fontSize-* chains and the rest) are generated and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken. For the wider technique, see how to crawl JavaScript websites.
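
Selector drift is easier to catch when the scraper reports it. The following is a minimal sketch, not part of the original script, that counts empty values per field across the parsed records. The helper names emptyFieldReport and driftedFields are made up for this example.

```javascript
// Count how many records have an empty value for each field.
// A field that is empty in every record usually means its selector has drifted.
function emptyFieldReport(records, fields) {
  const report = {};
  for (const field of fields) {
    report[field] = records.filter(
      (r) => r[field] === undefined || r[field] === null || r[field] === ''
    ).length;
  }
  return report;
}

// Return the fields that are empty in every record.
// With no records at all, every field is reported as drifted.
function driftedFields(records, fields) {
  if (records.length === 0) return [...fields];
  const report = emptyFieldReport(records, fields);
  return fields.filter((f) => report[f] === records.length);
}

// Usage:
//   const suspect = driftedFields(products, ['name', 'tagline', 'reviews', 'link']);
//   if (suspect.length) console.warn('Check selectors for:', suspect.join(', '));
```

Running this after each scrape turns a silent empty column into a warning you can act on.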

Step 3: Assemble the full script with JSON and CSV export

Now wire the fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV.

javascript
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const response = await api.get(pageUrl);
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

function toCsv(rows) {
  const headers = ['rank', 'name', 'tagline', 'upvotes', 'reviews', 'link'];
  // Quote every field, double embedded quotes, and write missing values as empty strings
  const escape = (value) =>
    `"${String(value ?? '').replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const url =
    'https://www.producthunt.com/categories/engineering-development';
  const html = await crawl(url);
  if (!html) return;

  const products = parseProducts(html);
  fs.writeFileSync('products.json', JSON.stringify(products, null, 2));
  fs.writeFileSync('products.csv', toCsv(products));
  console.log(`Saved ${products.length} products to JSON and CSV`);
}

main().catch((error) => console.error('Scrape failed:', error));

Paste the parseProducts function from Step 2 into the same file so main can call it. Run it with node scraper.js and you get two files: products.json with the full structured records and products.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because product taglines are long and frequently contain commas.
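
To see why the quoting matters, here is a small standalone sketch of the same rule applied to a tagline that contains both commas and quotes. The record is invented for illustration, and csvField and csvLine are names used only in this example.

```javascript
// Quote one CSV field: wrap it in double quotes and double any embedded quotes.
function csvField(value) {
  return `"${String(value ?? '').replace(/"/g, '""')}"`;
}

// Join one record into a CSV line in a fixed column order.
function csvLine(record, headers) {
  return headers.map((h) => csvField(record[h])).join(',');
}

// Hypothetical record used only to demonstrate the escaping
const line = csvLine(
  { name: 'Acme', tagline: 'Fast, simple, and "free"' },
  ['name', 'tagline']
);
console.log(line);
// prints: "Acme","Fast, simple, and ""free"""
```

A spreadsheet reads that line back as two cells, with the commas and the quoted word intact.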

What the output looks like

The JSON file holds one object per product in listing order, each with the rank, name, tagline, upvote count, review-count text, and link.

json
[
  {
    "rank": 1,
    "name": "The Free Website Guys",
    "tagline": "A free website program that has helped over 10,000 entrepreneurs.",
    "upvotes": 5,
    "reviews": "151 reviews",
    "link": "https://www.producthunt.com/products/the-free-website-guys"
  },
  {
    "rank": 2,
    "name": "Zipy",
    "tagline": "A debugging platform with session replay and network monitoring.",
    "upvotes": 5,
    "reviews": "132 reviews",
    "link": "https://www.producthunt.com/products/zipy"
  }
]

The CSV mirrors the same rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.

csv
rank,name,tagline,upvotes,reviews,link
"1","The Free Website Guys","A free website program that has helped over 10,000 entrepreneurs.","5","151 reviews","https://www.producthunt.com/products/the-free-website-guys"
"2","Zipy","A debugging platform with session replay and network monitoring.","5","132 reviews","https://www.producthunt.com/products/zipy"

Handle pagination across the category

One page is a demo; a real job walks the whole category. Product Hunt loads more listings as you scroll, and category pages also accept a page parameter you can step through. The simplest reliable pattern is to fetch successive pages, parse each with the same function, and stop when a page returns no products or the request fails. Pacing the loop with a short delay keeps your request rate low.

javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function scrapeCategory(slug, maxPages = 5) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const url =
      `https://www.producthunt.com/categories/${slug}?page=${page}`;
    const html = await crawl(url);
    if (!html) break;

    const rows = parseProducts(html);
    if (rows.length === 0) break;

    all.push(...rows.map((p) => ({ category: slug, ...p })));
    await sleep(2000);
  }
  return all;
}

scrapeCategory('engineering-development').then((rows) => {
  console.log(`Collected ${rows.length} products`);
});

Because every category page shares the same card structure, the parser you already wrote works across all of them without changes. Tag each row with its category before export and you can compare what is shipping across, say, engineering, design, and AI in one dataset. This pattern carries straight into product research; for a deeper look at turning ranked listings into decisions, see how to automate ecommerce product research.
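
The pagination loop stops only on an empty page. If a page parameter were ignored and the same listings came back again, the loop would run to maxPages and collect duplicates. A minimal sketch of a guard follows, assuming the product link is a stable unique key. The helper name appendNewRows is hypothetical.

```javascript
// Append only rows whose link has not been seen yet.
// Returns how many rows were new, so the caller can stop when it is 0.
function appendNewRows(all, seenLinks, rows) {
  let added = 0;
  for (const row of rows) {
    const key = row.link || row.name; // fall back to the name if the link is missing
    if (seenLinks.has(key)) continue;
    seenLinks.add(key);
    all.push(row);
    added++;
  }
  return added;
}

// Inside the pagination loop, this could replace the plain push:
//   const seen = new Set();   // declared once, before the loop
//   ...
//   if (appendNewRows(all, seen, rows) === 0) break;
```

Stopping on "nothing new" rather than "nothing returned" also protects the dataset if two pages overlap.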

Scrape basic public profile data

The same fetch-then-parse approach works on a single public maker profile. Profiles expose basic public fields such as a display name, headline, follower and following counts, points, and the products a maker has shipped. The legacy version of this guide read those from the profile page; the snippet below ports the same fields. Keep this scoped to public profile basics only, and read the legality section before running it.

javascript
function parseProfile(html, handle) {
  const $ = cheerio.load(html);
  const profile = {};

  profile.name = $(
    'h1.color-darker-grey.fontSize-24.fontWeight-600'
  ).text().trim();
  profile.headline = $(
    'div.color-lighter-grey.fontSize-18.fontWeight-300'
  ).text().trim();
  profile.followers = $(
    `a[href="/@${handle}/followers"]`
  ).text().trim();
  profile.following = $(
    `a[href="/@${handle}/following"]`
  ).text().replace(/\n\s+/g, ' ').trim();
  profile.points = $(
    'span.color-lighter-grey.fontSize-14.fontWeight-400:contains("points")'
  ).text().trim();

  // Public list of products the maker has shipped
  profile.products = [];
  $('.styles_even__Qeyum, .styles_odd__wazk7').each((i, el) => {
    profile.products.push({
      name: $(el).find('img.styles_thumbnail__Y9ZpZ').attr('alt'),
    });
  });

  return profile;
}

async function scrapeProfile(handle) {
  const html = await crawl(`https://www.producthunt.com/@${handle}`);
  return html ? parseProfile(html, handle) : null;
}

The selectors mirror the legacy fields: the display name from the h1.color-darker-grey heading, the headline from the fontSize-18 fontWeight-300 line, the follower and following counts from the anchors that link to the handle's own followers and following pages, and the points total from the span that contains the word "points". The products list reads each thumbnail's alt text. Pull only these public basics, and do not go further into anything that identifies a private individual.
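
The follower, following, points, and review fields come back as text. If you want numbers for sorting or charts, a small converter helps. The sketch below handles plain and comma-separated digits such as "151 reviews". Its handling of K and M suffixes is an assumption about how large counts might be abbreviated, not something confirmed from the page.

```javascript
// Convert count text such as "151 reviews" or "1,234 followers" to a number.
// Support for K/M suffixes ("1.2K") is an assumption about how large counts
// may be abbreviated; verify it against real profile pages.
function parseCount(text) {
  const match = String(text || '').match(/(\d[\d,]*(?:\.\d+)?)([kKmM])?\b/);
  if (!match) return null; // no digits found
  const value = Number(match[1].replace(/,/g, ''));
  const suffix = (match[2] || '').toLowerCase();
  const multiplier = suffix === 'k' ? 1e3 : suffix === 'm' ? 1e6 : 1;
  return Math.round(value * multiplier);
}

// Usage:
//   profile.followerCount = parseCount(profile.followers);
```

Keeping the original text alongside the parsed number makes it easy to spot cases the converter misreads.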

Staying unblocked

Even with rendering handled, Product Hunt watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any commercial target.

  • Pace your requests. Introduce a delay between page fetches rather than hammering pages in a tight loop. Spreading requests out is the single biggest factor in staying under the platform's rate limits.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.

For the broader playbook, see how to scrape websites without getting blocked.
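
The pacing and back-off habits above can be sketched as a retry wrapper with exponential delays. This is an illustration under stated assumptions: fetchPage stands for any async function that resolves to HTML or null (the crawl helper from Step 3 fits), and the 2-second base and 60-second cap are arbitrary defaults.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 2000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fetchPage(url) until it returns a truthy body or retries run out.
// Returns the body, or null if every attempt failed.
async function fetchWithBackoff(fetchPage, url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const body = await fetchPage(url);
    if (body) return body;
    // Wait longer after each failure before trying again
    if (attempt < maxRetries) await wait(backoffDelay(attempt));
  }
  return null;
}

// Usage:
//   const html = await fetchWithBackoff(crawl, url);
```

Doubling the delay after each failure gives the target time to recover instead of retrying at the same rate that triggered the block.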

Is scraping Product Hunt legal?

Whether scraping Product Hunt is allowed depends on Product Hunt's terms of service, your jurisdiction, and what you do with the data. Product Hunt's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Product Hunt's Terms of Service and its robots.txt, and treat both as the boundary for what you collect.

A few lines worth holding to. Collect only public product data: the names, taglines, upvote counts, review counts, and links that anyone can see on a category page without an account. Do not harvest personal data about makers beyond the public profile basics shown on the page, and never assemble profiles into a dataset that targets or identifies private individuals. Respect Product Hunt's stated rate expectations and keep your request volume low enough that you are not straining its servers. Do not redistribute Product Hunt's copyrighted media, such as product imagery or maker avatars, as if it were your own. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.

For volume or commercial use, Product Hunt runs an official API. It uses OAuth2 token authentication and applies its own rate limits, and commercial use is restricted by default, so you contact Product Hunt for approval before building a business on it. That official API is the right tool when you need large volumes, guaranteed structure, or commercial rights. This guide is deliberately scoped to public listings and public profile basics because that is the line that keeps the work defensible. It does not cover anything behind a login, private user data, or any attempt to bypass authentication. If your project needs more than public data, the official Product Hunt API or a data agreement is the correct path, not a cleverer scraper.
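
For orientation, here is a rough sketch of what a request to the official API might look like. Everything specific in it is an assumption to verify against Product Hunt's API documentation before use: the GraphQL endpoint URL, the posts query, and the field names. The PH_ACCESS_TOKEN variable name is hypothetical.

```javascript
// Build a request for the Product Hunt API.
// The endpoint, query shape, and field names are assumptions for this sketch;
// confirm them in the official API documentation.
const PH_GRAPHQL_ENDPOINT = 'https://api.producthunt.com/v2/api/graphql';

function buildPostsRequest(accessToken, first = 5) {
  const query = `
    query {
      posts(first: ${Number(first)}) {
        edges { node { name tagline votesCount url } }
      }
    }`;
  return {
    url: PH_GRAPHQL_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${accessToken}`, // OAuth2 bearer token
      },
      body: JSON.stringify({ query }),
    },
  };
}

// Usage with the global fetch available in Node 18+:
//   const { url, options } = buildPostsRequest(process.env.PH_ACCESS_TOKEN);
//   const data = await (await fetch(url, options)).json();
```

The function only builds the request, so you can inspect it or pass it to whichever HTTP client you already use.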

Recap

Key takeaways

  • Product Hunt renders listings client-side. A plain request returns a near-empty shell, so you must render the page behind a trusted IP before you parse it.
  • The Crawling API does both in one call. It renders the page with the JavaScript token, rotates residential IPs, and returns finished HTML for you to parse with Cheerio.
  • Cheerio extracts the fields. Select every product container, then read name, tagline, upvotes, reviews, and link, deriving rank from the loop index, and expect the generated class names to drift.
  • Paginate and export. Step through category pages until one comes back empty, then write structured records to JSON and CSV, quoting CSV fields so comma-heavy taglines stay intact.
  • Stay on public data. Respect Product Hunt's ToS and robots.txt, take only public profile basics, never personal data, and prefer the official Product Hunt API for volume or commercial use.

Frequently Asked Questions (FAQs)

What data can I scrape from Product Hunt?

From a public category page you can extract each product's name, tagline, upvote count, review-count text, and the link to its listing. From a public maker profile you can read basic public fields such as the display name, headline, follower and following counts, points, and the products that maker has shipped. Keep collection to these public basics and avoid anything that identifies a private individual beyond what the page openly shows.

Why does a plain request return incomplete data from Product Hunt?

Because Product Hunt renders its listings client-side with JavaScript and watches for automated traffic. A raw HTTP request from a datacenter IP usually returns a near-empty shell rather than the product cards. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API handles for you when you use its JavaScript token.

Do I need the JavaScript token for Product Hunt?

Yes. Product Hunt builds its category and profile pages in the browser, so you request them with JavaScript rendering enabled, which in the official Node client means initializing the Crawling API with your JavaScript token. Rendered requests cost more credits than plain ones, but the free tier still lets you test the full flow before scaling.

My selectors return empty values. What changed?

Almost certainly Product Hunt's markup. Its generated class names, the long fontSize-* chains and the hashed styles_* classes, change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Is there an official Product Hunt API?

Yes. Product Hunt offers an official API that uses OAuth2 token authentication and applies rate limits, with commercial use restricted by default, so you request approval before using it for business purposes. When you need large volumes, guaranteed structure, or commercial rights, the official API is the correct path rather than scraping.

How do I avoid getting blocked while scraping Product Hunt?

Keep your per-IP request rate low, add delays between page fetches, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
