How to Extract Facebook Data

A public Facebook business Page is a useful signal source: a brand publishes posts there, customers react to them, and the page-level metadata (name, about text, recent public posts, and the public reaction and comment counts) is the kind of data that feeds competitor research, content benchmarking, and brand monitoring. The problem is that Facebook renders almost everything client-side with JavaScript and AJAX, and it challenges automated traffic aggressively, so a plain HTTP request returns a near-empty loader shell instead of the content you can see in a browser.

This guide shows you how to extract data from a public Facebook Page using JavaScript and Node.js through the Crawling API. You will build a small, runnable script that fetches a rendered public business Page and pulls page-level fields: the page name, the text of public posts, and public engagement counts. The whole walkthrough stays scoped to public business and brand pages only. It does not touch personal profiles, private groups, comments tied to identifiable people, or anything behind a login. The "Read this first" note sits directly below this introduction for a reason, and the full "Is it legal to scrape Facebook?" section follows the walkthrough, so read both before you point this at anything.

Read this first

Facebook's Terms of Service strongly restrict automated collection, and most of the platform is personal data. Treat this as an educational, public-data-only walkthrough. For any real project, the sanctioned path is the official Facebook Graph API, not scraping. The "Is it legal to scrape Facebook?" section below is not boilerplate.

What you will build

A Node.js script that takes the URL of a public Facebook business Page, retrieves the rendered HTML through the Crawling API, and returns a structured record of public page-level data. We will use a well-known brand Page as the running example and pull these fields:

Page name: the public display name of the business or brand Page, for example "Alibaba.com".
Post text: the public text body of recent posts the Page itself published.
Reaction count: the public total of reactions shown on each post.
Comment count: the public number of comments shown on each post, as an aggregate count only.
Share count: the public number of shares shown on each post.

Note what is deliberately absent: no commenter names, no individual comment text, no profile details, no follower lists. Those are personal data and out of scope. We aggregate at the page and post level and stop there.

Why a plain request fails on Facebook

If you request a Facebook Page URL with a bare HTTP client, you get a response that is technically successful but practically empty. Two forces work against you.

First, Facebook builds the page in the browser. The page name, the about section, and every post are loaded dynamically through JavaScript and AJAX calls after the initial document arrives, and more posts appear only as you scroll. Fetch the raw URL and you mostly get the markup for loading spinners, not the content rendered around them. Capturing the real data means waiting for those AJAX calls to resolve and simulating the scroll that triggers additional content.

Second, Facebook actively defends against automated traffic. It watches IP addresses, flags request patterns that do not look like a real browser, and enforces strict rate limits that can lead to temporary or permanent blocks. A datacenter IP firing requests in a tight loop is exactly the shape it is built to stop.

So a working approach needs two things in one request: a real browser that renders the page and waits for its async content, and an IP the platform reads as an ordinary visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send the URL with a JavaScript token and the right wait options, it renders the page behind a trusted IP, and it returns finished HTML or parsed JSON. For background on rendering-heavy targets, see how to crawl JavaScript websites.
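
One way to tell a loader shell from a rendered page is to measure how much readable text is left once scripts, styles, and tags are stripped. The sketch below is a rough heuristic, not part of the Crawling API: the function names and the 200-character threshold are assumptions made for illustration, and the regex stripping is an approximation, not a real HTML parser.

```javascript
// Heuristic sketch: estimate how much human-readable text an HTML string holds.
// Function names and the default threshold are illustrative assumptions.
function visibleTextLength(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ') // drop inline styles
    .replace(/<[^>]+>/g, ' ') // drop the remaining tags
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim().length;
}

// Returns true when the document carries almost no readable text,
// which is what a bare request to a client-rendered page tends to produce.
function looksLikeLoaderShell(html, minTextLength = 200) {
  return visibleTextLength(html) < minTextLength;
}
```

Running a check like this on a response before parsing gives an early signal that rendering did not happen, so you can stop instead of extracting from empty markup.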

Prerequisites

You need a few things in place before writing any code. None take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are newer to this, our guide on how to build a web scraper with Node.js covers the foundations this tutorial assumes.

Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.

A Crawlbase account and JS token. Sign up for a free account, open your dashboard, and copy your JavaScript (JS) token. Crawlbase gives you up to 20,000 free requests to start, and you pay only for successful requests. Facebook is client-rendered, so you need the JavaScript token here, not the normal one. Treat the token like a password and keep it out of version control.
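
Keeping the token out of version control usually means reading it from the environment instead of hardcoding it. A minimal sketch, assuming you export the token under a variable named CRAWLBASE_JS_TOKEN (a name chosen for this example, not something the Crawlbase client requires):

```javascript
// Sketch: load the JS token from an environment variable.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name used only in this example.
function getJsToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    // Fail early with a clear message instead of sending a request without a token
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the script.');
  }
  return token;
}
```

In the scripts that follow, you could then pass getJsToken() wherever the 'YOUR_CRAWLBASE_TOKEN' placeholder appears.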

Set up the project

Create a project folder, initialize it, and install the Crawlbase Node client.

bash

node --version

mkdir facebook-page-scraper && cd facebook-page-scraper
npm init -y

npm install crawlbase

The crawlbase package is the official Node client for the Crawling API. For the page-level demo we lean on Crawlbase's built-in Facebook Page scraper, which returns structured JSON, so we do not need a separate HTML parser for the main example.

Step 1: Fetch the rendered public Page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request a public business Page URL. The wait options are what make a Facebook fetch work at all, so they matter more here than on a static site.

javascript

const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// A PUBLIC business/brand Page only, never a personal profile or private group
const pageUrl = 'https://www.facebook.com/Alibaba.comGlobal/';

async function fetchPage(url) {
  const options = {
    format: 'json',
    ajax_wait: 'true',
    scroll: 'true',
    scroll_interval: 30,
  };
  const response = await api.get(url, options);
  if (response.statusCode === 200) {
    return JSON.parse(response.body);
  }
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

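fetchPage(pageUrl)
  .then((data) => {
    // Print only the first 500 characters to confirm real markup came back
    if (data && typeof data.body === 'string') console.log(data.body.slice(0, 500));
  })
  .catch((err) => console.error(`Request error: ${err.message}`));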

Each option earns its place. format: 'json' asks for a structured response so the rendered HTML arrives in the body field rather than as a raw document. ajax_wait: 'true' tells the API to hold until the page's AJAX calls resolve, which is essential because Facebook loads its real content that way; skip it and you capture loader markup. scroll: 'true' simulates a user scrolling so additional posts load, and scroll_interval sets how long to scroll in seconds (the maximum is 60). Run the script with node script.js and you should see real page markup in the slice, not a stripped-down shell. That confirms rendering works before you parse anything.
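
Since scroll_interval has a ceiling, it can help to build the options in one place and clamp the value there. The helper below is a sketch: buildFetchOptions and its argument names are assumptions made for this example, while the option keys and the 60-second maximum come from the explanation above.

```javascript
// Maximum scroll duration in seconds, as stated in this guide
const MAX_SCROLL_SECONDS = 60;

// Hypothetical helper: assemble the Step 1 options and keep scroll_interval in range.
function buildFetchOptions({ scroll = true, scrollSeconds = 30 } = {}) {
  const options = { format: 'json', ajax_wait: 'true' };
  if (scroll) {
    // Fall back to 30 seconds when the input is not a usable number
    const requested = Number.isFinite(scrollSeconds) ? Math.round(scrollSeconds) : 30;
    options.scroll = 'true';
    options.scroll_interval = Math.min(Math.max(requested, 1), MAX_SCROLL_SECONDS);
  }
  return options;
}
```

Calling buildFetchOptions({ scrollSeconds: 120 }) yields a scroll_interval of 60, so an over-eager setting degrades to the maximum instead of sending an out-of-range value.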

Step 2: Get structured page data with the built-in scraper

Raw HTML is workable, but you would have to write and maintain selectors against Facebook's frequently changing markup. The Crawling API ships a built-in facebook-page scraper that returns the public page data already parsed as JSON, which is the right tool for page-level extraction. You enable it with the scraper parameter.

javascript

const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
const pageUrl = 'https://www.facebook.com/Alibaba.comGlobal/';

async function scrapePage(url) {
  const options = {
    ajax_wait: 'true',
    scraper: 'facebook-page',
  };
  const response = await api.get(url, options);
  if (response.statusCode === 200) {
    return JSON.parse(response.body);
  }
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

scrapePage(pageUrl).then((data) => {
  if (data) console.log(JSON.stringify(data.body, null, 2));
});

The scraper response carries page-level fields such as the page name and about text, along with an array of the public posts the Page published, where each post includes its text and its public reaction, comment, and share counts. Because the scraper handles parsing, you do not chase CSS selectors against markup that changes frequently. Several scrapers ship with the Crawling API; the facebook-page one is purpose-built for public Page layouts.
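
The scripts in this guide call JSON.parse on the response and then read its body field. A slightly more defensive version of that step is sketched below; unwrapScraperBody is a hypothetical helper name, and the envelope shape (a JSON object whose body key holds the parsed page data) is assumed from how the examples here use it, so verify it against the live response.

```javascript
// Hypothetical helper: parse the raw response body and return the nested page object.
// Assumes the envelope shape used in this guide: { ..., body: { ...page data... } }.
function unwrapScraperBody(rawBody) {
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch (err) {
    return null; // not JSON, for example an HTML error page
  }
  const page = parsed && parsed.body;
  // Only accept a plain object; strings, arrays, and missing values are treated as failures
  return page && typeof page === 'object' && !Array.isArray(page) ? page : null;
}
```

Returning null for anything unexpected lets the caller treat a malformed response the same way as a failed request.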

Step 3: Extract only the public page-level fields

Now narrow the scraper output to exactly the public fields we want and drop everything else. This is where we enforce the scope: page name, post text, and the three aggregate counts per post. We do not read or store commenter identities or individual comment bodies.

javascript

const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
const pageUrl = 'https://www.facebook.com/Alibaba.comGlobal/';

async function scrapePage(url) {
  const options = { ajax_wait: 'true', scraper: 'facebook-page' };
  const response = await api.get(url, options);
  if (response.statusCode !== 200) {
    console.error(`Request failed: ${response.statusCode}`);
    return null;
  }
  return JSON.parse(response.body).body;
}

function extractPublicData(page) {
  const posts = (page.posts || []).map((post) => ({
    text: post.text || null,
    reactionCount: post.reactionCounts || 0,
    commentCount: post.commentsCount || 0,
    shareCount: post.sharesCount || 0,
  }));

  return {
    pageName: page.pageName || page.title || null,
    postCount: posts.length,
    posts,
  };
}

async function main() {
  const page = await scrapePage(pageUrl);
  if (!page) return;
  const publicData = extractPublicData(page);
  console.log(JSON.stringify(publicData, null, 2));
}

main();

The extractPublicData function does the scoping work. It keeps the page name, the post text, and the three public counts per post, and it reads nothing tied to an identifiable person. Each field falls back to a safe default when the scraper omits it, since not every post shows shares or comments. Field names map to the scraper's response keys (pageName, reactionCounts, commentsCount, sharesCount); if a key returns empty, check the live scraper output and adjust, because Page layouts evolve.
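
The || 0 fallbacks in extractPublicData work when counts arrive as numbers. If the live output instead carries display strings such as "1,280" or "1.2K" (an assumption to check against your own responses, not documented behavior), a small normalizer keeps the record numeric. toCount is an illustrative name:

```javascript
// Hypothetical normalizer: turn a count that may be a number or a display
// string ("1,280", "1.2K", "3M") into a non-negative integer. Unknown input becomes 0.
function toCount(value) {
  if (typeof value === 'number') {
    return Number.isFinite(value) && value >= 0 ? Math.round(value) : 0;
  }
  if (typeof value !== 'string') return 0;

  // Strip thousands separators, then match digits with an optional K/M/B suffix
  const match = value.trim().replace(/,/g, '').match(/^(\d+(?:\.\d+)?)\s*([KMB])?$/i);
  if (!match) return 0;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  return Math.round(parseFloat(match[1]) * factor);
}
```

Inside the map callback you would then write reactionCount: toCount(post.reactionCounts), and likewise for the other two counts.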

What the output looks like

Run the full script with node script.js and you get a compact JSON object: the page name and a list of public posts with their text and aggregate counts, ready to write to a file or database.

json

{
  "pageName": "Alibaba.com",
  "postCount": 2,
  "posts": [
    {
      "text": "Source smarter this season with verified suppliers.",
      "reactionCount": 1280,
      "commentCount": 94,
      "shareCount": 37
    },
    {
      "text": "New buyer guide: how to vet a manufacturer in 5 steps.",
      "reactionCount": 863,
      "commentCount": 51,
      "shareCount": 22
    }
  ]
}

Every value here is public and page-level. There are no usernames, no individual comment text, and no profile data, which is exactly the line this tutorial holds.
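
For benchmarking, the record reduces naturally to a per-Page summary. A minimal sketch, assuming the output shape shown above; summarizeEngagement and avgEngagementPerPost are illustrative names, not part of the Crawlbase client.

```javascript
// Sketch: roll the per-post counts up into page-level totals and an average.
function summarizeEngagement(publicData) {
  const posts = publicData.posts || [];
  const totals = posts.reduce(
    (acc, post) => ({
      reactions: acc.reactions + (post.reactionCount || 0),
      comments: acc.comments + (post.commentCount || 0),
      shares: acc.shares + (post.shareCount || 0),
    }),
    { reactions: 0, comments: 0, shares: 0 }
  );
  const totalEngagement = totals.reactions + totals.comments + totals.shares;

  return {
    pageName: publicData.pageName || null,
    postCount: posts.length,
    ...totals,
    // Guard against division by zero when a Page returned no posts
    avgEngagementPerPost: posts.length ? totalEngagement / posts.length : 0,
  };
}
```

For the two sample posts above, this gives 2143 reactions, 145 comments, 59 shares, and an average of 1173.5 engagements per post.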

Handling scroll and AJAX content

Two parameters from Step 1 are what make Facebook tractable, and they are worth understanding before you run this at any volume.

ajax_wait. Facebook hydrates its content through AJAX after the document loads. Without ajax_wait: 'true' you capture the page before that content arrives and get loader markup. With it, the API returns the HTML only once the async calls have resolved.
scroll and scroll_interval. Posts load progressively as a user scrolls. scroll: 'true' simulates that, and scroll_interval controls how many seconds to scroll, up to a maximum of 60. A longer interval surfaces more posts at the cost of a slower request, so tune it to how many recent posts you actually need.

Beyond data, the Crawling API can also return a screenshot of the rendered Page with the screenshot parameter, handing back a screenshot_url in the response that expires after about an hour. That is useful for visually confirming what was captured, but the structured fields above are what you build on.

Staying within rate limits

Even with rendering handled, Facebook enforces strict rate limits, and crossing them risks temporary or permanent blocks. A few habits keep a run healthy and respectful.

Pace your requests. Do not hammer Pages in a tight loop. Space requests out and keep total volume low; this is both a courtesy and the fastest way to avoid being flagged.
Lean on rotation. Requests spread across a pool of residential IPs are far less likely to trip a limit than a single datacenter address. The Crawling API handles rotation for you; if you build your own stack, this is the part to invest in. See how to scrape websites without getting blocked for the broader playbook.
Watch the status codes. When responses start coming back as challenges or errors, that is a signal to back off, not noise to push through.
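
The three habits above can be combined into one small loop: a fixed pause between Pages and an exponentially growing wait after each failed attempt. This is a sketch under stated assumptions. fetchPolitely, backoffDelay, and the default delays are illustrative choices, and fetchFn stands in for whatever function performs the request and resolves to an object with statusCode and body, as the api.get calls in this guide do.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs
function backoffDelay(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Fetch a short list of public Page URLs one at a time.
// fetchFn is injected by the caller, for example (url) => api.get(url, options).
async function fetchPolitely(urls, fetchFn, { pauseMs = 5000, maxRetries = 3, baseMs = 2000 } = {}) {
  const results = [];
  for (const url of urls) {
    let data = null;
    for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
      const response = await fetchFn(url);
      if (response.statusCode === 200) {
        data = response.body;
        break;
      }
      // A non-200 status is a signal to slow down before trying again
      if (attempt < maxRetries) await sleep(backoffDelay(attempt, baseMs));
    }
    results.push({ url, data }); // data stays null when every attempt failed
    await sleep(pauseMs); // pause after each Page so requests never run back to back
  }
  return results;
}
```

Keeping the loop sequential is deliberate: it caps concurrency at one request, which is the simplest way to keep volume low.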

For larger, scheduled jobs across many public Pages, an asynchronous queue fits better than a synchronous loop. Our guide on how to extract data using the Crawlbase Crawler covers that pattern, where requests are queued and delivered to a webhook instead of blocking your script.

Is it legal to scrape Facebook?

Read this section before you run anything. Facebook's Terms of Service strongly restrict automated collection. Its terms and its automated-access policies prohibit scraping in broad terms, and that restriction stands regardless of how careful your tooling is. None of the code in this guide overrides Facebook's terms; it only makes the technical part work. Before collecting anything, read Facebook's Terms of Service, its robots.txt, and its developer and platform policies, and treat all three as the boundary for what you may touch.

If you proceed for research or educational purposes, stay strictly inside a narrow public lane. Collect only public data from public business or brand Pages: the page name, the text of posts the Page itself published, and aggregate public engagement counts. Do not collect personal data. That means no personal profiles, no private groups, no follower or member lists, no private messages, and no individual comments tied to identifiable people. Usernames, handles, profile details, and user-written comments are personal data, and building a profile of an identifiable individual from them is exactly what to avoid. Aggregate at the page and post level, as the code above does, and stop there.
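
Part of that scope can be enforced in code before any request goes out. The guard below is a heuristic sketch: the function name and the blocked path list are assumptions for illustration, and passing the check does not prove a URL is a business Page (a personal profile with a custom URL would pass), so it complements a manual review of your target list and does not replace it.

```javascript
// Path prefixes that are clearly out of scope for this guide (illustrative, not exhaustive)
const BLOCKED_PATH_PREFIXES = ['/groups', '/people', '/profile.php', '/friends', '/messages', '/login'];

// Heuristic guard: reject URLs that are obviously not a public Page.
function isCandidatePublicPageUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return false; // not a valid absolute URL
  }
  const host = url.hostname.toLowerCase();
  if (url.protocol !== 'https:') return false;
  if (host !== 'www.facebook.com' && host !== 'facebook.com') return false;

  const path = url.pathname.toLowerCase();
  if (path === '/') return false; // the home page is not a Page
  return !BLOCKED_PATH_PREFIXES.some(
    (prefix) => path === prefix || path.startsWith(`${prefix}/`)
  );
}
```

Calling this at the top of scrapePage and returning early on false keeps an accidental group or profile URL from ever reaching the API.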

Where personal data is involved at all, privacy law applies. The GDPR requires a lawful basis to process personal data, and both the GDPR and the CCPA give people rights, such as deletion and opt-out, that you must honor. That is a heavy obligation, and limiting collection to aggregate, page-level counts is meant to keep you clear of it. For any production use, the sanctioned and far safer route is the official Facebook Graph API, which provides authorized, rate-limited access to the data a Page owner or app is permitted to see, with clear terms attached. Strongly prefer the Graph API. Use the public-data approach in this guide only for small, educational, public-page work, and never as a way around a login, a privacy setting, or the platform's terms.

Recap

Key takeaways

Facebook renders client-side. A plain request returns loader markup, so you must render the page, wait for AJAX, and simulate scroll before any content appears.
Rendering and a trusted IP, in one call. The Crawling API with a JS token does both; ajax_wait, scroll, and scroll_interval control how the page is captured.
Use the built-in facebook-page scraper. It returns public page data as JSON, so you avoid maintaining selectors against markup that changes constantly.
Scope to public page-level data. Page name, post text, and aggregate reaction, comment, and share counts only; never commenter identities, profiles, private groups, or individual comments.
Prefer the official API. Facebook's ToS strongly restricts scraping and GDPR/CCPA apply to personal data, so the Facebook Graph API is the sanctioned path for anything beyond small educational use.

Frequently Asked Questions (FAQs)

Why does a plain request return no real content from a Facebook Page?

Because Facebook loads its content client-side. The page name, about section, and posts arrive through JavaScript and AJAX after the initial document, and more posts appear only on scroll. A bare HTTP request captures the page before that happens, so you get loader markup instead of data. Rendering the page and waiting for AJAX with the Crawling API's JS token is what returns the actual content.

Do I need the normal token or the JS token for Facebook?

Use the JavaScript (JS) token. Facebook builds its pages with client-side rendering, so the normal token, which fetches static HTML, comes back with loader markup and no meaningful content. The JS token renders the page in a real browser first, which is what makes the data appear.

What public data can I safely extract from a Facebook business Page?

Stick to page-level public fields: the page name, the text of posts the Page itself published, and aggregate engagement counts (reactions, comments, shares) as numbers. Avoid anything personal, including commenter names, individual comment text, profile details, follower lists, private groups, and anything behind a login. Aggregate counts at the page and post level are the defensible scope.

Can I scrape personal profiles or private groups?

No, and this guide does not cover it. Personal profiles, private groups, member lists, and private messages are personal and non-public data, and collecting them runs against Facebook's terms and privacy law. This walkthrough is limited to public business and brand Pages on purpose. For sanctioned access to more, use the Facebook Graph API with proper authorization.

Should I use the Facebook Graph API instead?

For any production or commercial use, yes. The Facebook Graph API is the official, authorized path, with rate limits and clear terms, and it is the right tool when a Page owner or app needs reliable access. The public-data scraping approach here is suited only to small, educational, public-page work where no API access is in place, and it must still respect Facebook's terms.

How do I avoid getting blocked or rate-limited?

Keep request volume low, pace requests rather than looping tightly, and route through rotating residential IPs so no single address trips Facebook's limits. The Crawling API manages rotation and a trusted IP pool for you. Watch the status codes and back off the moment you start seeing challenges or errors instead of pushing through them.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.
