Quora is one of the larger public question-and-answer sites on the web, and the public questions and answers it hosts are a useful signal for content and topic research. The phrasing of a real question, how many answers it drew, and which answers earned the most upvotes together tell you what people actually ask about a subject and which framings resonate. That makes public Quora threads a practical input for SEO planning, content ideation, and audience research.

This guide shows you how to scrape Quora with JavaScript and Node.js using cheerio. You build a small, runnable scraper that fetches a public Quora question page through the Crawling API with rendering enabled, parses the question text, the answer bodies, and the answer and upvote counts, then exports the result to JSON and CSV. The whole walkthrough stays scoped to public Q&A. We treat author names as personal data and aggregate rather than profile, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public Quora question URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for the question and its visible answers. We pull these fields, ported from the original Quora scraper:

  • Question text: the actual question, for example "How do I start playing video games?".
  • Question link: the canonical URL of the question page.
  • Answer count: how many answers were present on the page you scraped (the answerCountScraped field).
  • Answer text: the body of each visible answer, captured for aggregate topic analysis rather than republication.
  • Upvote count: the upvote total shown on each answer, your main popularity signal.
  • Answer position: the order the answer appeared in, so you can weight by ranking.

Author names appear in the markup, and the legacy scraper captured them. We deliberately do not build a per-author profile out of them. The privacy section explains how to keep names aggregate, and why that matters here.

Why a plain request fails on Quora

If you request a Quora question URL with a bare HTTP client, you get back a thin shell rather than the thread. Quora renders the question, the answers, and the answer counts in the browser with JavaScript, so the initial HTML arrives largely empty until the page's scripts run. On top of that, Quora challenges automated traffic: datacenter IPs and request patterns that do not look like a real browser get redirected to a login or content wall before they ever reach the answers.

So a working Quora scraper needs two things in one request: a browser that actually renders the thread, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with cheerio.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Quora loads the question body and every answer client-side, so the JS token is what gives you a complete page here. The normal token tends to return an empty frame with no answers to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script, installing packages with npm, and working with promises and async functions. If selectors and the DOM are new to you, any beginner JavaScript resource covers the ground this tutorial assumes. For a fuller walkthrough of the workflow, see our guide on how to build a web scraper with Node.js.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Crawlbase gives you 1,000 free requests to start, and you pay only for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
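One way to keep the token out of version control is to read it from an environment variable rather than typing it into the script. The helper below is a minimal sketch of that idea; the variable name CRAWLBASE_JS_TOKEN and the function name readToken are our own choices for this guide, not something Crawlbase prescribes.

```javascript
// Read the Crawlbase JS token from the environment instead of hard-coding it.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this guide.
function readToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    // Fail early and loudly rather than sending unauthenticated requests.
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper');
  }
  return token;
}
```

Export the variable in your shell before running the script, then pass readToken() wherever the examples in this guide show 'YOUR_CRAWLBASE_TOKEN'.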

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash
node --version

mkdir quora-scraper && cd quora-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. If selectors are new to you, the primer on how to crawl JavaScript websites is a good companion for rendering-heavy targets like this one.

Step 1: Fetch the rendered question page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the question URL. Checking the status code before you parse keeps failures loud instead of silent.

javascript
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const options = { ajax_wait: 'true', page_wait: 6000 };
  const response = await api.get(pageUrl, options);
  if (response.statusCode === 200) {
    return response.body;
  }
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

const quoraUrl = 'https://www.quora.com/How-do-I-start-playing-video-games';
crawl(quoraUrl).then((html) => {
  console.log(html ? html.slice(0, 500) : 'No HTML returned');
});

The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering answers appear before the page is captured. Six seconds is a reasonable start; raise it if the answer list comes back short. Save the code as scraper.js, run it with node scraper.js, and you should see real question markup, not a stripped-down shell. That confirms rendering works before you write a single selector.
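If you would rather not tune page_wait by hand, you can try a short list of increasing values and stop at the first one that yields enough answers. The sketch below assumes you supply two functions of your own: fetchHtml(url, options), which resolves to HTML or null, and countAnswers(html), which returns how many answer blocks it found. The function name, the wait values, and the minAnswers threshold are all illustrative.

```javascript
// Try progressively longer page_wait values until enough answers render.
// fetchHtml and countAnswers are supplied by the caller (hypothetical names):
// in this guide they would wrap api.get and a cheerio selector count.
async function fetchWithLongerWaits(fetchHtml, countAnswers, url, options = {}) {
  const waits = options.waits || [6000, 9000, 12000]; // illustrative values
  const minAnswers = options.minAnswers || 1;
  let best = { html: null, answers: 0, pageWait: null };

  for (const pageWait of waits) {
    const html = await fetchHtml(url, { ajax_wait: 'true', page_wait: pageWait });
    if (!html) continue; // failed request, try the next wait value

    const answers = countAnswers(html);
    // Keep the richest page seen so far in case no attempt reaches the target.
    if (best.html === null || answers > best.answers) {
      best = { html, answers, pageWait };
    }
    if (answers >= minAnswers) break;
  }
  return best;
}
```

Each attempt is a separate API request, so keep the list short. With the pieces from this guide, fetchHtml could wrap api.get and return response.body on a 200 status, and countAnswers could count the matches for the answer-block selector used in Step 2.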

Crawlbase Quora Scraper

Quora needs a rendered thread behind a trusted IP, in one call, which is exactly what you just saw the crawl function do. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public question page on the free tier first.

Step 2: Parse the question and answers with cheerio

With rendered HTML in hand, load it into cheerio and read the fields. The question text and link sit at the top of the page; each answer is a repeating block further down. Quora lays answers out in div.q-box containers, so you select the answer blocks, then read the body text and the upvote count from inside each one. Reading every field defensively keeps one missing value from crashing the run.

javascript
const cheerio = require('cheerio');

function parseQuestion(html, pageUrl) {
  const $ = cheerio.load(html);

  const questionText = $('div.puppeteer_test_question_title')
    .first()
    .text()
    .trim() || $('title').text().trim();

  const answers = [];
  $('div.q-box.qu-borderAll').each((i, el) => {
    const block = $(el);
    const answerText = block.find('.q-text').first().text().trim();
    if (!answerText) return;

    const upvoteRaw = block
      .find('.q-click-wrapper')
      .first()
      .text()
      .trim();

    answers.push({
      answerText,
      answerUpvoteCount: upvoteRaw || null,
      answerPosition: answers.length + 1, // rank among kept answers, so skipped blocks leave no gaps
    });
  });

  return {
    question: {
      text: questionText,
      link: pageUrl,
      answerCountScraped: answers.length,
      answers,
    },
  };
}

The field names here are ported straight from the original Quora scraper output: question.text, question.link, answerCountScraped, answers, answerText, answerUpvoteCount, and answerPosition. We read the answer body from .q-text and the upvote total from the vote control, then index each answer by position so you can weight popular answers later. We are deliberately not capturing the author name or profile link into the record; the privacy discussion near the end of this guide explains that choice.

Selectors drift

Quora's class names (q-box, q-text, q-click-wrapper, and the puppeteer_test_ markers) are obfuscated and change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
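A cheap way to notice drift early is to check each parsed record for empty fields before you save it. The function below is a sketch of that check, written against the record shape parseQuestion returns; the name findEmptyFields and the rule for flagging upvotes are our own assumptions.

```javascript
// Report which fields of a parsed record came back empty, a hint that a
// selector no longer matches. The record shape follows parseQuestion above.
function findEmptyFields(record) {
  const problems = [];
  const question = (record && record.question) || {};

  if (!question.text) problems.push('question.text');

  const answers = Array.isArray(question.answers) ? question.answers : [];
  if (answers.length === 0) problems.push('question.answers');

  // Some answers may legitimately show no upvotes, so only flag the field
  // when every answer on the page is missing it.
  const missingVotes = answers.filter((a) => !a.answerUpvoteCount).length;
  if (answers.length > 0 && missingVotes === answers.length) {
    problems.push('answers[].answerUpvoteCount');
  }
  return problems;
}
```

An empty array means the record looks complete. A non-empty one tells you which selector to re-inspect first.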

If you would rather skip selector maintenance entirely, the Crawling API also ships a ready-made quora-question data scraper. Pass { scraper: 'quora-question' } in the options object and the API returns parsed JSON in response.json.body instead of raw HTML, so you do not write cheerio at all. The manual cheerio path above is worth learning because it puts you in control of exactly which fields you keep, which is what the privacy guidance below is about.

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the rendered HTML, hand it to the parser, and print the structured record.

javascript
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

async function crawl(pageUrl) {
  const options = { ajax_wait: 'true', page_wait: 6000 };
  const response = await api.get(pageUrl, options);
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

function parseQuestion(html, pageUrl) {
  const $ = cheerio.load(html);
  const questionText = $('div.puppeteer_test_question_title')
    .first().text().trim() || $('title').text().trim();

  const answers = [];
  $('div.q-box.qu-borderAll').each((i, el) => {
    const block = $(el);
    const answerText = block.find('.q-text').first().text().trim();
    if (!answerText) return;
    const upvoteRaw = block.find('.q-click-wrapper').first().text().trim();
    answers.push({
      answerText,
      answerUpvoteCount: upvoteRaw || null,
      answerPosition: answers.length + 1, // rank among kept answers, so skipped blocks leave no gaps
    });
  });

  return {
    question: {
      text: questionText,
      link: pageUrl,
      answerCountScraped: answers.length,
      answers,
    },
  };
}

async function main() {
  const quoraUrl = 'https://www.quora.com/How-do-I-start-playing-video-games';
  const html = await crawl(quoraUrl);
  if (!html) return;
  const data = parseQuestion(html, quoraUrl);
  console.log(JSON.stringify(data, null, 2));
}

main();

What the output looks like

Run the full script with node scraper.js and you get a structured record for the question and its visible answers, ready to write to JSON or CSV.

json
{
  "question": {
    "text": "How do I start playing video games?",
    "link": "https://www.quora.com/How-do-I-start-playing-video-games",
    "answerCountScraped": 2,
    "answers": [
      {
        "answerText": "Playing video games is simple, the game will give you some rules, and you play by them.",
        "answerUpvoteCount": "7",
        "answerPosition": 1
      },
      {
        "answerText": "Start with a genre you already enjoy, then pick a beginner-friendly title and learn the controls slowly.",
        "answerUpvoteCount": "3.7K",
        "answerPosition": 2
      }
    ]
  }
}

Notice the record carries no author names or profile links. The upvote count stays as a string because Quora abbreviates large numbers ("3.7K"), and keeping the raw label avoids a lossy conversion. For topic research the question text plus the upvote ranking is usually all you need.
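When you do need to sort or rank by upvotes, you can derive an approximate number next to the raw label instead of replacing it. The helper below is a sketch: the source only shows the "K" suffix, so the "M" and "B" multipliers are assumptions, and the function name upvotesToNumber is ours.

```javascript
// Convert an abbreviated upvote label such as "3.7K" into an approximate
// number for sorting. Returns null when the label is not recognised.
// Only "K" appears in this guide's sample output; "M" and "B" are assumed.
function upvotesToNumber(label) {
  if (label == null) return null;
  const cleaned = String(label).trim().replace(/,/g, '');
  const match = cleaned.match(/^(\d+(?:\.\d+)?)\s*([KMB])?$/i);
  if (!match) return null;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  // Round to absorb floating-point noise such as 3.7 * 1000.
  return Math.round(parseFloat(match[1]) * factor);
}
```

Because the abbreviation has already dropped precision, treat the result as a sort key, not an exact count. If the vote control's text contains extra words, the helper returns null, and you can fall back to the raw label.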

Export to JSON and CSV

For content research you usually want the data on disk, not just in the console. Node's built-in fs module writes JSON in one line, and a small helper flattens the answers into CSV rows so you can open them in a spreadsheet and sort by upvotes. Each CSV row is one answer, with the question text repeated as context.

javascript
const fs = require('fs');

function saveJson(data, file) {
  fs.writeFileSync(file, JSON.stringify(data, null, 2));
}

function csvCell(value) {
  const text = (value == null ? '' : String(value)).replace(/"/g, '""');
  return `"${text}"`;
}

function saveCsv(data, file) {
  const header = ['question', 'answerText', 'answerUpvoteCount', 'answerPosition'];
  const rows = data.question.answers.map((a) =>
    [data.question.text, a.answerText, a.answerUpvoteCount, a.answerPosition]
      .map(csvCell)
      .join(','),
  );
  fs.writeFileSync(file, [header.join(','), ...rows].join('\n'));
}

// In main(), after building `data`:
saveJson(data, 'quora_scraped.json');
saveCsv(data, 'quora_scraped.csv');

The CSV columns are the question, the answer text, the upvote count, and the position, exactly the fields a content or topic analysis needs. Author identity is intentionally absent from both exports.

Scaling to many questions

One question is a demo; topic research usually means a list of questions on a theme. Collect the question URLs you care about (from a Quora search, a sitemap, or your own list), then loop over them, fetch each through the Crawling API, parse with the same function, and concatenate the records. Because every question page shares the same structure, the parser you already wrote works across all of them without changes.

javascript
async function scrapeMany(urls) {
  const all = [];
  for (const url of urls) {
    const html = await crawl(url);
    if (html) all.push(parseQuestion(html, url).question);
  }
  return all;
}

const questions = [
  'https://www.quora.com/How-do-I-start-playing-video-games',
  'https://www.quora.com/What-is-Quora',
];

scrapeMany(questions).then((rows) => {
  console.log(`Collected ${rows.length} questions`);
});

Pace the loop and keep your volume modest. Quora watches for scraper-shaped traffic, so spacing requests out and routing them through rotating residential IPs, which the Crawling API handles for you, keeps a run healthy. For the broader playbook, see how to scrape websites without getting blocked. If you are turning the upvote rankings into a keyword or topic map, the workflow in how to extract and analyze Google SEO data pairs well with this output.
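Pacing can be as simple as a randomized pause between requests. The sketch below runs one task per URL in sequence and sleeps between them; the names sleep and scrapePaced and the delay bounds are illustrative, and neither Quora nor Crawlbase prescribes these values.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `task(url)` for each URL one at a time, pausing between requests.
// The delay bounds are illustrative assumptions, not documented limits.
async function scrapePaced(urls, task, { minDelayMs = 2000, maxDelayMs = 5000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i += 1) {
    try {
      const result = await task(urls[i]);
      if (result) results.push(result);
    } catch (err) {
      // One bad page should not end the whole run.
      console.error(`Skipping ${urls[i]}: ${err.message}`);
    }
    if (i < urls.length - 1) {
      // Random jitter keeps the request rhythm from looking mechanical.
      const delay = minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
      await sleep(delay);
    }
  }
  return results;
}
```

To use it with the functions from this guide, pass a task that calls crawl(url) and returns parseQuestion(html, url).question when HTML comes back, or null when it does not.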

Legality and privacy

Whether scraping Quora is allowed depends on Quora's Terms of Service, your jurisdiction, and what you do with the data. Quora's terms restrict automated access and bulk collection, so scraping can run against those terms regardless of how careful your tooling is. Read Quora's Terms of Service and its robots.txt, respect the rate limits they imply, and treat both as the boundary for what you collect. None of the code here changes that; it just makes the technical part work, and only on public question pages that anyone can read without an account.

The bigger issue on a platform like Quora is personal data. Author names, profile links, and the credentials people attach to their answers are personal data, and a user's written answer is their content. That is why the scraper in this guide keeps only the question text, the answer bodies for aggregate analysis, the upvote counts, and the answer position, and deliberately drops author names and profile links. Use the output for trends, topic frequency, and which framings draw engagement. Do not build profiles of identifiable people, do not republish an individual's answer tied to their name, and do not assemble a dataset that singles someone out. If you are in the EU or California, the GDPR and CCPA apply the moment personal data is involved: you need a lawful basis to process it and you must honor deletion requests, which is a strong reason to aggregate and discard names rather than store them.
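As a concrete picture of aggregate output, the sketch below counts how often each term appears across question texts, which is one simple form of topic frequency. It works on the array of question records that scrapeMany returns and never touches author data. The stop-word list is a tiny illustrative sample, the tokenizer assumes English text, and the name termFrequency is ours.

```javascript
// A small illustrative stop-word list; extend it for real analysis.
const STOP_WORDS = new Set([
  'a', 'an', 'the', 'is', 'are', 'do', 'how', 'what', 'to', 'of', 'in', 'for', 'and',
]);

// Count term occurrences across question texts and return the `top` terms
// as [term, count] pairs, most frequent first (ties broken alphabetically).
function termFrequency(records, top = 10) {
  const counts = new Map();
  for (const record of records) {
    const words = String(record.text || '').toLowerCase().match(/[a-z0-9']+/g) || [];
    for (const word of words) {
      if (word.length < 2 || STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, top);
}
```

The result is a ranked list of terms with no link back to any individual answer or person.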

For sanctioned, structured access, prefer an official route. This guide is deliberately scoped to public question and answer pages because that is the line that keeps the work defensible. It does not cover anything behind a login, private or anonymous-author details, direct messages, or any attempt to bypass authentication or a content wall. If your project needs identifiable user data or volume beyond light public research, the right path is a formal data agreement or a partnership with the platform, not a cleverer scraper.

Recap

Key takeaways

  • Quora renders threads client-side. A plain fetch returns an empty frame, so you must render the page with the JS token before you parse it.
  • One call does rendering and a trusted IP. The Crawling API with a JS token handles both; ajax_wait and page_wait control how long it waits for answers to load.
  • cheerio extracts the fields. Read the question text, then map each answer block to its body, upvote count, and position, and expect the obfuscated selectors to drift.
  • Aggregate, do not profile. Keep question text, answer bodies, and upvote counts for topic research; drop author names and profile links so you are not building profiles of identifiable people.
  • Stay on public data. Respect Quora's ToS and robots.txt, keep volume modest, mind GDPR and CCPA when personal data is involved, and prefer an official agreement for anything beyond light public research.

Frequently Asked Questions (FAQs)

Why does a plain fetch return an empty page from Quora?

Because Quora renders the question and every answer client-side with JavaScript. The initial HTML is almost empty until the page's scripts run in a browser, and unauthenticated automated requests are often redirected to a login or content wall. To get a complete thread you have to render it behind a trusted IP, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for Quora?

Use the JS token. The normal token fetches static HTML, which on Quora comes back as an empty frame with no answers. The JS token renders the page in a real browser before handing back the HTML, so the question body, the answers, and the upvote counts are present when cheerio parses them.

Can I use the ready-made Quora data scraper instead of writing cheerio?

Yes. The Crawling API offers a quora-question data scraper. Pass { scraper: 'quora-question' } in the options object and the API returns parsed JSON in response.json.body instead of raw HTML, so you skip cheerio entirely. Writing your own parser is still worth it when you want tight control over which fields you keep, which matters for keeping author personal data out of your dataset.

My selectors return empty values. What changed?

Almost certainly Quora's markup. Its class names are obfuscated (q-box, q-text, the puppeteer_test_ markers) and change without notice, so selectors that worked last month can break. Re-inspect a live question page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Is it OK to store the author names I see in answers?

Treat author names, profile links, and credentials as personal data and avoid storing them. The scraper in this guide drops them on purpose and keeps only the question, the answer text for aggregate analysis, and the upvote counts. If you must touch personal data, the GDPR and CCPA apply: you need a lawful basis and must honor deletion requests, so aggregating and discarding identities is the safer default.

How do I avoid getting blocked while scraping Quora?

Keep your request rate low, space requests out instead of looping at full speed, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing redirects or challenges.
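Backing off can be sketched as a small retry wrapper that waits longer after each failed attempt. The version below assumes the attempt function resolves to HTML on success and null on failure, as the crawl function in this guide does; the name withBackoff, the retry count, and the delays are illustrative.

```javascript
// Retry `attempt()` with exponentially growing pauses while it returns null.
// `attempt` is any async function resolving to HTML or null, for example
// () => crawl(url). Retry count and delays are illustrative assumptions.
async function withBackoff(attempt, { retries = 3, baseDelayMs = 2000, wait } = {}) {
  // `wait` can be injected for testing; by default it is a real timer.
  const pause = wait || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));

  for (let tryIndex = 0; tryIndex <= retries; tryIndex += 1) {
    const result = await attempt();
    if (result) return result;
    if (tryIndex < retries) {
      // Delay doubles each time: 2s, 4s, 8s with the defaults.
      await pause(baseDelayMs * 2 ** tryIndex);
    }
  }
  return null; // every attempt failed; let the caller decide what to do
}
```

If a URL still fails after the last retry, skip it and move on rather than retrying in a tight loop.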
