G2 is where software buyers go to read what real users think before they commit. Its review pages carry the kind of signal product, sales, and competitive-intelligence teams want at scale: star ratings, review titles, the actual prose of each review, the reviewer's role or company segment, and when the review was posted. The catch is that G2 renders those pages dynamically and sits behind heavy, Cloudflare-style bot defenses, so a plain HTTP request from Node gets challenged long before it ever sees a review.

This guide shows you how to scrape G2 reviews with JavaScript the reliable way. You build a small Node.js script that fetches a rendered review page through the Crawling API, parses it with cheerio, and pulls a clean record per review. We keep the whole walkthrough scoped to public review data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public G2 product reviews URL, retrieves the rendered HTML through the Crawling API, and extracts a structured list of reviews. We will use a public product's reviews page as the running example and pull these fields per review:

  • Review title: the short headline a reviewer gives their review.
  • Star rating: the numeric score, for example "4.5".
  • Review text: the body of the review.
  • Reviewer role or segment: the public job role or company-size label shown on the review.
  • Date: when the review was posted.

Alongside those, we will also grab a little page-level context: the product name and its headline average star rating. That gives each batch of reviews a reference point.

Why a plain fetch fails on G2

If you request a G2 reviews URL with a bare HTTP client like fetch or axios, you do not get review data back. Two things work against you. First, G2 renders much of its review content in the browser, so the raw HTML you receive is incomplete until the page's scripts run. Second, and more importantly, G2 runs aggressive anti-bot protection: datacenter IPs, missing browser fingerprints, and scraper-shaped request patterns get served a challenge page or an outright block instead of the content. You will see a 403, a CAPTCHA interstitial, or a "checking your browser" holding page rather than reviews.
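Those symptoms are recognizable, so it pays to check for them explicitly instead of trusting the status code alone. Below is a minimal sketch of that check. The marker strings are assumptions drawn from common Cloudflare-style interstitials (the "Just a moment..." title, the "checking your browser" text). They are not a documented G2 contract, so inspect a real blocked response and adjust them.

```javascript
// Hypothetical marker strings seen on Cloudflare-style challenge pages.
// These are assumptions, not a documented G2 contract; verify them
// against a real blocked response before relying on them.
const CHALLENGE_MARKERS = ["just a moment...", "checking your browser", "cf-chl"];

/**
 * Classify a fetched page as "ok", "blocked", or "challenge".
 * @param {number} statusCode HTTP status code of the response
 * @param {string} body Response body (HTML)
 * @returns {"ok" | "blocked" | "challenge"}
 */
function classifyResponse(statusCode, body) {
  // Outright refusals, rate limits, and typical challenge statuses.
  if (statusCode === 403 || statusCode === 429 || statusCode === 503) {
    return "blocked";
  }
  const text = String(body || "").toLowerCase();
  // A 200 can still be an interstitial, so look inside the body too.
  if (CHALLENGE_MARKERS.some((marker) => text.includes(marker))) {
    return "challenge";
  }
  return statusCode === 200 ? "ok" : "blocked";
}
```

Run it on every response before parsing. Anything other than "ok" means you should not feed the body to your selectors.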

So a working G2 scraper needs two things in one request: an IP the platform reads as a real visitor, and rendering when the page needs it. You can try to assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that stack healthy against a target that actively fights bots is most of the work. The Crawling API folds both into a single call: you send it the URL, it fetches the page behind a trusted residential IP with the right handling for the target, and it returns usable HTML for you to parse.

G2 is a hard target

G2's bot defenses are stronger than a typical site, so the Crawling API uses a tailored path for it rather than a generic fetch. If you sign up and your requests to G2 still come back as challenges, contact support to have the G2-specific handling enabled on your account. Once it is on, the code below works unchanged.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node. You should be comfortable writing and running a Node.js script and installing packages with npm. If you are newer to building scrapers in this stack, our walkthrough on building a web scraper with Node.js is a gentle starting point.

Node.js installed. Confirm with node --version. If you do not have it, download the LTS build from nodejs.org and run the installer for your operating system.

A Crawlbase account and token. Sign up, open your dashboard, and copy your request token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control and out of any committed file.
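One simple way to keep the token out of your code is to read it from an environment variable and fail fast when it is missing. Here is a minimal sketch. The variable name CRAWLBASE_TOKEN and the getToken helper are choices made for this example, not something Crawlbase requires.

```javascript
// Read the Crawlbase token from an environment variable instead of
// hard-coding it. CRAWLBASE_TOKEN is a name chosen for this example.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || "").trim();
  if (!token) {
    // Fail fast with a clear message rather than sending unauthenticated requests.
    throw new Error("Set the CRAWLBASE_TOKEN environment variable first.");
  }
  return token;
}
```

On macOS or Linux, run the script as CRAWLBASE_TOKEN=your_token node scraper.js and pass getToken() wherever the examples show the token string.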

Set up the project

Create a fresh project folder, initialize it, and install the two libraries the scraper needs: the official Crawlbase client and cheerio for parsing.

```bash
mkdir g2-reviews-scraper
cd g2-reviews-scraper
npm init --yes

npm install crawlbase cheerio
```

Two dependencies do the work: crawlbase is the official client for the Crawling API, and cheerio gives you a jQuery-style API for querying the returned HTML so you can pull out individual fields by CSS selector. You do not need Express, a database, or any web server to extract the data; those belong in whatever you build around the scraper, not in the scraper itself.

Step 1: Fetch the rendered reviews page

Start by getting the page back at all. Import the CrawlingAPI class, initialize it with your token, and request the product's reviews URL. Check the status code before you parse so that failures are loud instead of silent. On a target like G2, a status check alone is not enough, because a challenge page can come back with a 200 status. Glance at the body too before you trust it.

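```javascript
// Run with: CRAWLBASE_TOKEN=your_token node scraper.js
const { CrawlingAPI } = require("crawlbase");

// Read the token from the environment so it never lands in a committed file.
const api = new CrawlingAPI({ token: process.env.CRAWLBASE_TOKEN });

// Fetch a page through the Crawling API; return the HTML or null on failure.
async function crawl(pageUrl) {
  try {
    const response = await api.get(pageUrl);
    if (response.statusCode === 200) {
      return response.body;
    }
    console.error(`Request failed: ${response.statusCode}`);
  } catch (error) {
    // Network errors and client exceptions land here instead of crashing the script.
    console.error(`Request error: ${error.message}`);
  }
  return null;
}

(async () => {
  const pageUrl = "https://www.g2.com/products/xcode/reviews";
  const html = await crawl(pageUrl);
  // Print the first 500 characters as a quick sanity check.
  console.log(html ? html.slice(0, 500) : "No HTML returned");
})();
```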

Save this as scraper.js and run it with node scraper.js. If everything is wired up, the first 500 characters show real review markup, not a challenge page. That one check confirms the hard part, getting past G2's defenses, before you write a single selector. If G2 ever serves a page whose reviews only appear after client-side rendering, switch to your JavaScript token and add the ajax_wait and page_wait options to the request. For the standard reviews page, the default fetch through the API is usually enough.
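To see exactly which parameters a rendered request carries, you can also call the Crawling API as a plain HTTPS GET. Here is a minimal sketch that builds that request URL. It assumes the https://api.crawlbase.com/ endpoint and the token, url, page_wait, and ajax_wait parameter names from the Crawling API documentation. The buildCrawlUrl helper and the CRAWLBASE_JS_TOKEN variable name are hypothetical, and page_wait is in milliseconds.

```javascript
// Hypothetical helper: build a Crawling API request URL by hand.
// Endpoint and parameter names follow the Crawling API docs; use your
// JavaScript token when you ask for rendering options.
function buildCrawlUrl(token, targetUrl, { pageWait, ajaxWait } = {}) {
  const url = new URL("https://api.crawlbase.com/");
  url.searchParams.set("token", token);
  // URLSearchParams percent-encodes the target URL, including its own query string.
  url.searchParams.set("url", targetUrl);
  if (pageWait !== undefined) url.searchParams.set("page_wait", String(pageWait));
  if (ajaxWait) url.searchParams.set("ajax_wait", "true");
  return url.toString();
}

// Usage sketch (Node 18+ has a global fetch):
// const res = await fetch(buildCrawlUrl(process.env.CRAWLBASE_JS_TOKEN,
//   "https://www.g2.com/products/xcode/reviews", { pageWait: 5000, ajaxWait: true }));
```

Letting URL and URLSearchParams handle the encoding matters here. A page URL such as .../reviews?page=2 contains its own ?, which would corrupt the request if you concatenated it by hand.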

Crawlbase Crawling API

G2 fights bots hard, so the real value is not in parsing HTML but in getting clean HTML back at all. The Crawling API fetches the page behind rotating residential IPs with G2-specific handling, absorbs the challenges and CAPTCHAs, and hands you the markup, so you skip running a headless fleet and a proxy pool yourself. Try it on a single public reviews page on the free tier first.

Step 2: Parse the reviews with cheerio

With usable HTML in hand, load it into cheerio and walk the review list. G2 lays each review out as a repeating card, so the pattern is: select the page-level context once, then iterate the review elements and pull the same fields out of each one. Wrap the extraction in a try/catch so one malformed card does not abort the whole run.

```javascript
const cheerio = require("cheerio");

// Extract page-level context and one record per review card.
function parseReviews(html) {
  const $ = cheerio.load(html);
  const data = {
    productName: $(".product-head [itemprop=name]").text().trim(),
    averageStars: $("#products-dropdown .fw-semibold").first().text().trim(),
    reviews: [],
  };

  $(".nested-ajax-loading > div.paper").each((_, el) => {
    // Guard each card separately so one malformed card is skipped
    // instead of aborting the whole page.
    try {
      const card = $(el);
      const title = card.find("[itemprop=name]").first().text().trim();
      // Numeric score from the content attribute; empty string if missing.
      const stars = card.find("[itemprop='ratingValue']").attr("content") || "";
      const text = card.find(".pjax").text().trim();
      // A card can carry several public labels (job role, company size).
      const role = card.find("[ue=tooltip]")
        .map((_, label) => $(label).text().trim())
        .get()
        .join(", ");
      const date = card.find(".x-current-review-date").text().trim();

      data.reviews.push({ title, stars, text, role, date });
    } catch (error) {
      console.error("Skipping malformed review card:", error.message);
    }
  });

  return data;
}
```

A few details are worth calling out. The star rating comes from the content attribute of the ratingValue element rather than its visible text, because that is where G2 exposes the numeric score cleanly. The reviewer role or segment comes from the tooltip labels G2 attaches to each card. A review can carry more than one public label, for example a job role plus a company-size band, so the labels are joined into a single comma-separated string. Every text field is trimmed so you do not store reviews padded with whitespace.
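The parser deliberately keeps every field as a string. If you plan to sort or filter by rating or date, a small normalization pass helps. Here is a minimal sketch. It assumes dates keep the "Aug 12, 2025" form shown in the sample output later in this guide, and that labels are separated by ", " exactly as the parser joins them. normalizeReview and toIsoDate are hypothetical helpers, not part of cheerio or Crawlbase.

```javascript
// Month abbreviations as they appear in dates like "Aug 12, 2025".
const MONTHS = { jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
  jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12 };

// Turn "Aug 12, 2025" (or "August 12, 2025") into "2025-08-12"; null if no match.
function toIsoDate(text) {
  const m = /^([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s*(\d{4})$/.exec(String(text || "").trim());
  if (!m) return null;
  const month = MONTHS[m[1].toLowerCase()];
  if (!month) return null;
  return `${m[3]}-${String(month).padStart(2, "0")}-${m[2].padStart(2, "0")}`;
}

// Add typed fields to one raw review record, keeping the original strings.
function normalizeReview(raw) {
  const stars = Number.parseFloat(raw.stars);
  return {
    ...raw,
    stars: Number.isFinite(stars) ? stars : null,
    // Assumes no single label contains a comma of its own.
    roles: raw.role ? raw.role.split(",").map((s) => s.trim()).filter(Boolean) : [],
    dateIso: toIsoDate(raw.date),
  };
}
```

The date is parsed by hand rather than with new Date(...), because JavaScript's handling of non-ISO date strings is implementation-dependent and runs in local time.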

Selectors drift

G2's class names and markup change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back empty across every review, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
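You can make that "empty across every review" signal automatic, so drift gets flagged on the run where it happens. Here is a minimal sketch. It assumes the five field names the parser produces, and findBrokenSelectors is a hypothetical helper.

```javascript
// Fields the parser extracts for every review card.
const REVIEW_FIELDS = ["title", "stars", "text", "role", "date"];

// Return the fields that came back empty on every review: a strong hint
// that the selector for that field no longer matches G2's markup.
function findBrokenSelectors(reviews, fields = REVIEW_FIELDS) {
  // No cards at all: the card selector itself may be stale (or the page is empty).
  if (!reviews.length) return [...fields];
  return fields.filter((field) =>
    reviews.every((review) => {
      const value = review[field];
      return value === undefined || value === null || String(value).trim() === "";
    })
  );
}
```

After parsing, a check like `const broken = findBrokenSelectors(data.reviews); if (broken.length) console.warn("Check selectors for:", broken.join(", "));` turns silent empty columns into a visible warning.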

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured result. This is the whole scraper in one file.

```javascript
// Run with: CRAWLBASE_TOKEN=your_token node scraper.js
const { CrawlingAPI } = require("crawlbase");
const cheerio = require("cheerio");

const api = new CrawlingAPI({ token: process.env.CRAWLBASE_TOKEN });

// Fetch a page through the Crawling API; return the HTML or null on failure.
async function crawl(pageUrl) {
  try {
    const response = await api.get(pageUrl);
    if (response.statusCode === 200) {
      return response.body;
    }
    console.error(`Request failed: ${response.statusCode}`);
  } catch (error) {
    console.error(`Request error: ${error.message}`);
  }
  return null;
}

// Extract page-level context and one record per review card.
function parseReviews(html) {
  const $ = cheerio.load(html);
  const data = {
    productName: $(".product-head [itemprop=name]").text().trim(),
    averageStars: $("#products-dropdown .fw-semibold").first().text().trim(),
    reviews: [],
  };

  $(".nested-ajax-loading > div.paper").each((_, el) => {
    const card = $(el);
    data.reviews.push({
      title: card.find("[itemprop=name]").first().text().trim(),
      stars: card.find("[itemprop='ratingValue']").attr("content") || "",
      text: card.find(".pjax").text().trim(),
      role: card.find("[ue=tooltip]")
        .map((_, label) => $(label).text().trim()).get().join(", "),
      date: card.find(".x-current-review-date").text().trim(),
    });
  });

  return data;
}

(async () => {
  const pageUrl = "https://www.g2.com/products/xcode/reviews";
  const html = await crawl(pageUrl);
  if (!html) return;
  const data = parseReviews(html);
  console.log(JSON.stringify(data, null, 2));
})();
```

What the output looks like

Run the full script with node scraper.js and you get a clean structured object: the product, its average rating, and an array of reviews, each ready to write to JSON, CSV, or a database.

```json
{
  "productName": "Xcode",
  "averageStars": "4.5",
  "reviews": [
    {
      "title": "Solid IDE for native Apple development",
      "stars": "5",
      "text": "The integration with the Apple toolchain is seamless...",
      "role": "Software Engineer, Small-Business",
      "date": "Aug 12, 2025"
    }
  ]
}
```
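For the CSV route, the only subtle part is quoting, because review text routinely contains commas, quotes, and line breaks. Here is a minimal sketch. It assumes the result object shape shown above, and reviewsToCsv and csvCell are hypothetical helpers. Every cell is wrapped in double quotes, and embedded quotes are doubled, which is the usual CSV escaping rule.

```javascript
// Quote a value for CSV: wrap it in double quotes and double any embedded
// quotes, so commas and newlines inside review text do not break the row.
function csvCell(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Convert the parsed result into CSV text, one row per review.
function reviewsToCsv(data) {
  const header = ["product", "title", "stars", "text", "role", "date"];
  const rows = data.reviews.map((r) =>
    [data.productName, r.title, r.stars, r.text, r.role, r.date].map(csvCell).join(",")
  );
  return [header.join(","), ...rows].join("\n") + "\n";
}
```

To save it, call `require("fs").writeFileSync("reviews.csv", reviewsToCsv(data));` after parsing.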

Scaling across review pages

One page is a demo; a real job runs across every page of reviews for a product, and often across several products. G2 paginates its reviews, so later pages are reachable by adding a page query parameter to the reviews URL, for example ?page=2. The shape stays the same: build the page URL, fetch it through the Crawling API, parse it with the same function, and collect the rows until a page comes back empty.

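```javascript
// Assumes crawl() and parseReviews() from the full script above are in scope.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function scrapeAllPages(productSlug, maxPages) {
  const base = `https://www.g2.com/products/${productSlug}/reviews`;
  const all = [];

  for (let page = 1; page <= maxPages; page++) {
    const url = page === 1 ? base : `${base}?page=${page}`;
    const html = await crawl(url);
    if (!html) break; // request failed: stop rather than keep hitting the site

    const parsed = parseReviews(html);
    // Stop at the first page with no reviews (or a parse failure).
    if (!parsed || !parsed.reviews.length) break;

    all.push(...parsed.reviews);
    await sleep(2000); // pace requests: at least two seconds between pages
  }

  return all;
}
```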

Two details keep this healthy. The loop stops as soon as a page returns no reviews, so you do not request past the end of the list. And the sleep call between requests paces the run; hammering G2 in a tight loop is the fastest way to get throttled even with a managed API in front of you. Two seconds is a reasonable floor. If you are pulling many products, the asynchronous Crawler lets you push URLs and collect results via webhook instead of blocking on each fetch.
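Two refinements are worth considering if you run this often. A randomized delay avoids a fixed request rhythm. A "no new reviews" check also stops the loop if a page past the end ever repeats earlier reviews instead of coming back empty; whether G2 does that is an assumption, so treat the check as a safety net. Here is a minimal sketch. scrapeUntilNoNewReviews, reviewKey, and the injected fetchReviews callback are hypothetical names, and the dedupe key is a guess at fields that together identify a review.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Identify a review by fields that together should be unique.
function reviewKey(review) {
  return [review.title, review.date, (review.text || "").slice(0, 80)].join("|");
}

/**
 * Paginate until a page is empty, fails, or adds no new reviews.
 * @param {string} baseUrl Reviews URL for page 1
 * @param {(url: string) => Promise<object[] | null>} fetchReviews
 *   Hypothetical callback: fetch and parse one page, return its reviews or null
 * @param {{ maxPages?: number, baseDelayMs?: number, jitterMs?: number }} [options]
 */
async function scrapeUntilNoNewReviews(baseUrl, fetchReviews, options = {}) {
  const { maxPages = 50, baseDelayMs = 2000, jitterMs = 1000 } = options;
  const seen = new Set();
  const all = [];

  for (let page = 1; page <= maxPages; page++) {
    const url = page === 1 ? baseUrl : `${baseUrl}?page=${page}`;
    const reviews = await fetchReviews(url);
    if (!reviews || !reviews.length) break; // failure or empty page

    const fresh = reviews.filter((r) => !seen.has(reviewKey(r)));
    if (!fresh.length) break; // page only repeated earlier results: treat as the end
    fresh.forEach((r) => seen.add(reviewKey(r)));
    all.push(...fresh);

    // Randomized delay: base pause plus up to jitterMs extra.
    await sleep(baseDelayMs + Math.random() * jitterMs);
  }
  return all;
}
```

Injecting fetchReviews keeps the pagination logic testable without network access. In the real script, it would wrap crawl() and parseReviews() and return `parsed.reviews`, or null when the fetch fails.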

Staying unblocked

Even with the Crawling API absorbing most of G2's defenses, a few habits keep a long run healthy, and they apply to any hard commercial target.

  • Pace your requests. Spread requests out with a delay between pages rather than firing them as fast as the loop allows. Steady and slow finishes; fast and greedy gets challenged.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you ever roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning 403s or challenge pages is telling you the current rate is too high. Back off and slow down rather than retrying harder.
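The "back off and slow down" habit can be built into the request itself. Here is a minimal sketch of retrying with exponential backoff. It assumes the request function resolves to an object with a statusCode field, as the Crawlbase client's response does in the examples above. Treating 403, 429, and 503 as "slow down" signals is an assumption, and withBackoff is a hypothetical helper.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Retry an async request with exponential backoff on "slow down" statuses.
 * @param {() => Promise<{ statusCode: number, body?: string }>} doRequest
 * @param {{ retries?: number, baseDelayMs?: number }} [options]
 */
async function withBackoff(doRequest, { retries = 4, baseDelayMs = 5000 } = {}) {
  let response;
  for (let attempt = 0; attempt <= retries; attempt++) {
    response = await doRequest();
    // Only back off on signals that mean "slow down"; return anything else as-is.
    if (![403, 429, 503].includes(response.statusCode)) return response;
    if (attempt < retries) {
      await sleep(baseDelayMs * 2 ** attempt); // 5s, 10s, 20s, 40s with the defaults
    }
  }
  return response; // still blocked after every retry: let the caller decide
}
```

Wrap the client call as `const response = await withBackoff(() => api.get(url));`. The delay doubles on each attempt, so a run that keeps getting pushed back slows down sharply instead of retrying harder.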

For the broader playbook, see how to bypass Cloudflare and avoid bot detection and the deeper dive on how to bypass CAPTCHAs while web scraping. If you would rather route your own Node traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential IP rotation as a drop-in proxy endpoint. And if reviews on other sites are next on your list, our guide to scraping customer reviews covers the general pattern.

Is it legal to scrape G2?

Whether scraping G2 is allowed depends on G2's terms of service, your jurisdiction, and what you do with the data. G2's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read G2's Terms of Service and its robots.txt, and treat both as the boundary for what you collect.
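You can check robots.txt from the script as well as by reading it once by hand. Here is a simplified sketch. It assumes you have already downloaded the robots.txt text (for example from https://www.g2.com/robots.txt). It reads only the "User-agent: *" group, lets the longest matching Allow or Disallow prefix win, and ignores wildcards and $ anchors. Treat it as a rough guard, not a complete robots.txt parser. parseRobots and isPathAllowed are hypothetical helpers.

```javascript
// Parse the "User-agent: *" group of a robots.txt file into prefix rules.
// Simplified: wildcards (*) and end anchors ($) inside paths are not expanded.
function parseRobots(robotsTxt) {
  const rules = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    if (!line) continue;
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      inStarGroup = lastWasAgent ? inStarGroup || value === "*" : value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow means "allow everything", so it adds no rule.
    if (inStarGroup && (field === "allow" || field === "disallow") && value) {
      rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

// Longest matching prefix wins; on a tie, Allow wins. No match means allowed.
function isPathAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (!best || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

Call isPathAllowed(rules, "/products/xcode/reviews") before a run and skip any path that comes back false.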

A few rules are worth holding to. Collect only publicly displayed review data: the review title, star rating, review body, public role or segment label, and date that anyone can see on a public reviews page without signing in. Respect G2's stated expectations and keep your request volume low enough that you are not straining its servers. Do not gather reviewer personal or contact data beyond what is publicly shown, and do not try to enrich a public review with a private identity.
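One way to enforce that scope is an explicit allowlist applied before anything is stored. A selector that later starts matching extra markup then cannot leak more than the public fields. Here is a minimal sketch. It assumes the five field names the parser produces, and keepPublicFields is a hypothetical helper.

```javascript
// Only these publicly displayed fields are kept; anything else a selector
// happens to pick up is dropped before storage.
const PUBLIC_REVIEW_FIELDS = ["title", "stars", "text", "role", "date"];

function keepPublicFields(review, allowed = PUBLIC_REVIEW_FIELDS) {
  const clean = {};
  for (const field of allowed) {
    if (Object.prototype.hasOwnProperty.call(review, field)) {
      clean[field] = review[field];
    }
  }
  return clean;
}
```

Map every parsed review through it before writing: `data.reviews.map((r) => keepPublicFields(r))`.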

This guide is deliberately scoped to public, login-free reviews pages because that is the line that keeps the work defensible. It does not cover anything behind a login, any data gated by a G2 account or a paid tier, reviewer personal or contact information beyond what is public, or any attempt to bypass authentication or G2's protections to reach gated content. G2 runs strong bot defenses for a reason; the right posture is to read only what it shows the public, at a polite rate. If your project needs more than public review data, an official data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • G2 blocks plain fetches. Its Cloudflare-style defenses serve challenges and 403s to bare HTTP clients, so getting clean HTML back is the hard part, not parsing it.
  • The Crawling API does the heavy lifting. It fetches the page behind rotating residential IPs with G2-specific handling in one call, so you skip running a headless fleet and a proxy pool yourself.
  • cheerio does the extraction. Iterate the review cards and map title, stars, text, public role or segment, and date to current selectors, and expect those selectors to drift.
  • Scale by looping pages. Append a page number, fetch, parse, and pace each request with a delay so a long run does not trip throttling.
  • Stay on public data. Respect G2's ToS and robots.txt, collect only publicly displayed review fields, and never touch logins, gated data, or reviewer personal information.

Frequently Asked Questions (FAQs)

Why does a plain fetch return no reviews from G2?

Because G2 runs aggressive anti-bot protection and renders part of its review content client-side. A bare HTTP request from Node hits a challenge page, a CAPTCHA, or a 403 long before it reaches any review markup. To get real data you have to fetch the page behind an IP G2 reads as a real visitor, with rendering when the page needs it, which is what the Crawling API handles for you.

Do I need a special setup for G2 specifically?

Yes. G2's defenses are stronger than a typical site, so the Crawling API uses a tailored path for it. If you sign up and your G2 requests still come back as challenges, contact support to have the G2-specific handling enabled on your account. Once it is on, the code in this guide works without changes.

How do I handle pagination across G2 review pages?

G2 paginates reviews, so the next page is reachable by appending a page number to the reviews URL, for example ?page=2. Loop from page one upward, fetch and parse each page with the same function, and stop as soon as a page returns no reviews. Put a delay between requests so the run does not get throttled.

My selectors return empty strings. What changed?

Almost certainly G2's markup. Its class names and card structure change without notice, so selectors that worked last month can break. Re-inspect a live reviews page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper, not a sign the approach is wrong.

Can I scrape reviewer names, emails, or other personal data from G2?

No, and this guide does not cover it. Scope your collection to publicly displayed review fields: title, star rating, review text, the public role or segment label, and the date. Reviewer personal or contact data beyond what is publicly shown, anything behind a login, and any attempt to bypass authentication are all out of scope here and run against G2's terms. For more than public review data, the right route is an official data agreement.

Which database should I store the reviews in?

Whatever fits your stack. The scraper returns plain JSON objects, so they drop cleanly into PostgreSQL, MySQL, MongoDB, a cloud store, or even a flat JSON or CSV file for a small run. The extraction is deliberately decoupled from storage so you can choose later without touching the parser.
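For a small or paginated run, JSON Lines (one JSON object per line) is a convenient middle ground between a flat file and a database. You can append each page as it arrives and load the file into most databases later. Here is a minimal sketch using only Node's built-in fs module. appendJsonLines and readJsonLines are hypothetical helpers.

```javascript
const fs = require("fs");

// Append each review as one JSON object per line (JSON Lines). Appending means
// a long paginated run keeps what it has already saved if it stops early.
function appendJsonLines(filePath, reviews) {
  if (!reviews.length) return 0;
  // JSON.stringify escapes newlines inside strings, so each record stays on one line.
  const lines = reviews.map((review) => JSON.stringify(review)).join("\n") + "\n";
  fs.appendFileSync(filePath, lines, "utf8");
  return reviews.length;
}

// Read a JSON Lines file back into an array of objects.
function readJsonLines(filePath) {
  return fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```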
