The Apple App Store is one of the largest public catalogs of software on the open web. Each app page carries a consistent block of structured detail: the app name, who built it, what category it sits in, its public star rating and rating count, the price, and the canonical URL. Developers track that data to benchmark competitors, analysts study category trends, and product teams watch how ratings move over time. All of it sits on the public product page in a predictable layout that anyone can open without signing in.

This guide shows you how to crawl Apple App Store data with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a public app page through the Crawling API, parses the public metadata fields, and exports the result as JSON. The whole walkthrough stays scoped to public app metadata. It does not collect or profile individual reviewers, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Node.js script that takes a public App Store product URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record of the app's public metadata. We use Google Authenticator as the running example and pull these fields:

  • App name: the product title shown at the top of the page.
  • Developer: the name of the seller or studio that publishes the app.
  • Category: the App Store category the app is listed under, for example "Utilities".
  • Rating: the public average star rating Apple displays for the app.
  • Rating count: the public count of ratings behind that average.
  • Price: the listed price, or "Free" when the app has no upfront cost.
  • App URL: the canonical public URL of the product page.

Why a plain request fails on the App Store

If you request an App Store product URL with a bare HTTP client, you rarely get usable markup back. Two things work against you. First, Apple renders much of the product page in the browser, so the initial HTML is a thin shell until the page's scripts run and populate the header, ratings, and metadata blocks. Second, the App Store flags automated traffic: datacenter IPs and request patterns that do not look like a real browser get throttled or blocked before they reach the rendered content.

So a working App Store scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with Cheerio.

Use the JavaScript token

The Crawling API gives you two tokens: a normal one and a JavaScript one. App Store product pages need the content rendered in a real browser, so use your JavaScript token for every request in this guide. The normal token returns the unrendered shell and your selectors will come back empty.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs and any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, our guide to building a web scraper with Node.js covers the basics.

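Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm. A current LTS release is the safer choice, since recent Cheerio releases list Node.js 18.17 as their minimum and an older runtime may need an older Cheerio version.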

A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token from the account docs page. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
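
One way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN, which is a name chosen for this example and not something the Crawlbase client looks for on its own; getToken is likewise a hypothetical helper.

```javascript
// Hypothetical helper: reads the token from an environment variable.
// CRAWLBASE_JS_TOKEN is a name picked for this sketch, not a Crawlbase convention.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper');
  }
  return token;
}

// Usage with the client from this guide:
// const api = new CrawlingAPI({ token: getToken() });
```

You would then start the script with something like CRAWLBASE_JS_TOKEN=your_token node scraper.js, so the token never appears in a committed file.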

Set up the project

Create a project folder, initialize it, and install the two libraries the scraper needs.

bash
node --version

mkdir appstore-scraper && cd appstore-scraper
npm init -y

npm install crawlbase cheerio

Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.

Step 1: Fetch the rendered app page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request a public App Store product URL. Checking the status code before you parse keeps failures loud instead of silent.

javascript
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const appURL =
  'https://apps.apple.com/us/app/google-authenticator/id388497605';

api
  .get(appURL)
  .then((response) => {
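    if (response.statusCode === 200) {
      console.log(response.body.slice(0, 500));
    } else {
      // Surface non-200 responses instead of exiting silently
      console.error(`Request failed: ${response.statusCode}`);
    }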
  })
  .catch((error) => console.error('API request error:', error));

Run the script with node scraper.js and you should see real App Store product markup at the top of the body, not a stripped-down shell. That confirms rendering works before you write a single selector. The Crawling API uses the JavaScript token you supplied to render the page in a real browser, so the header, ratings, and metadata blocks are present in the HTML you get back.
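
If you would rather check this in code than by eye, a rough sketch is to look for a few of the class names that the parser in Step 2 relies on. The marker list and the looksRendered and missingMarkers names are assumptions made for illustration; adjust the markers to whatever the live page uses.

```javascript
// Class names this guide's parser depends on. Treat the list as an assumption
// and update it when Apple's markup changes.
const RENDER_MARKERS = ['app-header__title', 'we-star-rating'];

// Returns the markers that do not appear anywhere in the HTML string.
function missingMarkers(html, markers = RENDER_MARKERS) {
  return markers.filter((marker) => !html.includes(marker));
}

// True when every marker is present, which suggests the page was rendered.
function looksRendered(html) {
  return typeof html === 'string' && missingMarkers(html).length === 0;
}
```

A plain substring check like this is deliberately crude. It only tells you the expected blocks exist somewhere in the response, which is enough to separate a rendered page from an empty shell.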

Crawlbase Crawling API

That first request just returned a fully rendered App Store product page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and handles the blocks the App Store throws at scrapers, so you get finished HTML from one call. Point it at a public app page on the free tier first, then add your parser.

Step 2: Parse the public metadata with Cheerio

With rendered HTML in hand, load it into Cheerio and read the fields out of the header and ratings blocks. The product header holds the app name, developer, category, and price; the ratings widget holds the average star rating and the rating count. Reading each field defensively keeps one missing value from crashing the run.

javascript
const cheerio = require('cheerio');

function parseAppMetadata(html, sourceUrl) {
  const $ = cheerio.load(html);

  // App name lives in the product header title
  let name = $('.app-header__title').text().trim();
  const titleBadge = $('.badge--product-title').text().trim();
  if (titleBadge) name = name.replace(titleBadge, '').trim();

  // Developer / seller
  const developer = $('.app-header__identity').text().trim();

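  // Category, parsed from the "... in <Category>" header item.
  // Match the standalone word "in" so names like "Finance" are not split mid-word.
  const categoryText = $('.product-header__list__item a.inline-list__item')
    .text()
    .trim();
  const categoryMatch = categoryText.match(/\bin\s+(.+)$/);
  const category = categoryMatch ? categoryMatch[1].trim() : null;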

  // Price, or "Free" when there is no upfront cost; null if the element is missing
  const price = $('.app-header__list__item--price').text().trim() || null;

  // Public average rating from the star widget's aria-label
  const rating = $('.we-star-rating').attr('aria-label') || null;

  // Public rating count, after the "•" separator
  let ratingCount = null;
  try {
    ratingCount = $('.we-rating-count')
      .text()
      .trim()
      .split('•')[1]
      .trim();
  } catch {
    ratingCount = null;
  }

  return {
    name,
    developer,
    category,
    rating,
    ratingCount,
    price,
    appUrl: sourceUrl,
  };
}

A few details keep this faithful to the page. The app name comes from .app-header__title, with the small product-title badge stripped off so you keep just the name. The developer reads from .app-header__identity, and the category is parsed out of the .product-header__list__item a.inline-list__item text by taking whatever follows the word "in", for example "Utilities". The price comes from .app-header__list__item--price. For the ratings block, the average is read from the .we-star-rating widget's aria-label, and the public rating count is taken from .we-rating-count after its separator. Every field is read defensively, so a missing value comes back empty (null or an empty string) instead of throwing.
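
The rating and rating count come back as display strings. If you plan to chart or compare them, a small normalizer can turn them into numbers. This is a sketch that assumes strings shaped like "4.7 out of 5" and "1.2M Ratings"; other storefronts or layouts may format these differently, and the parseRating and parseRatingCount names are hypothetical.

```javascript
// Multipliers for the abbreviated suffixes assumed here (K, M, B).
const SUFFIX = { K: 1e3, M: 1e6, B: 1e9 };

// Reads the leading number from a label such as "4.7 out of 5".
// Returns null when the label does not start with a number.
function parseRating(label) {
  const match = /^\s*(\d+(?:\.\d+)?)/.exec(label || '');
  return match ? Number(match[1]) : null;
}

// Converts text such as "1.2M Ratings" or "12,345 Ratings" to an integer.
// Returns null when no number is found.
function parseRatingCount(text) {
  const match = /(\d[\d,]*(?:\.\d+)?)([KMB])?\b/i.exec(text || '');
  if (!match) return null;
  const value = Number(match[1].replace(/,/g, ''));
  const multiplier = match[2] ? SUFFIX[match[2].toUpperCase()] : 1;
  return Math.round(value * multiplier);
}
```

Keep the original strings alongside the numbers if you can. An abbreviated count like "1.2M" is already rounded by the page, so the numeric value is an approximation.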

Selectors drift

Apple's class names (the app-header__*, product-header__*, and we-* selectors above) are part of a layout that changes over time. Treat the selectors as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
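
One hedged way to notice drift early is to check each parsed record for empty fields before saving it. The findEmptyFields helper below is hypothetical and assumes only the record shape returned by parseAppMetadata.

```javascript
// Fields the parser in this guide returns.
const EXPECTED_FIELDS = [
  'name', 'developer', 'category', 'rating', 'ratingCount', 'price', 'appUrl',
];

// Lists the fields that are missing, null, or blank in a parsed record.
function findEmptyFields(record, fields = EXPECTED_FIELDS) {
  return fields.filter((field) => {
    const value = record[field];
    return value === undefined || value === null || String(value).trim() === '';
  });
}

// Example: warn instead of silently saving a half-empty record.
const sample = { name: 'Google Authenticator', developer: '', category: null };
const empty = findEmptyFields(sample);
if (empty.length > 0) {
  console.warn(`Possible selector drift, empty fields: ${empty.join(', ')}`);
}
```

A single empty field on one app may just mean the page lacks that value. The same field coming back empty across many apps is the stronger hint that a selector needs updating.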

Step 3: Assemble the full script with JSON export

Now wire the fetch and the parse into one runnable script, then write the record to disk as JSON. A plain script keeps the moving parts down; you can wrap it in an endpoint later if you want one.

javascript
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

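async function crawl(appUrl) {
  try {
    const response = await api.get(appUrl);
    if (response.statusCode === 200) return response.body;
    console.error(`Request failed: ${response.statusCode}`);
  } catch (error) {
    // A rejected request (network or client error) is treated like a failed one
    console.error('API request error:', error);
  }
  return null;
}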

async function main() {
  const appUrl =
    'https://apps.apple.com/us/app/google-authenticator/id388497605';
  const html = await crawl(appUrl);
  if (!html) return;

  const app = parseAppMetadata(html, appUrl);
  fs.writeFileSync('app.json', JSON.stringify(app, null, 2));
  console.log(`Saved metadata for ${app.name}`);
}

main().catch((error) => console.error('Scrape failed:', error));

Paste the parseAppMetadata function from Step 2 into the same file so main can call it. Run it with node scraper.js and you get an app.json file with the full structured record. The crawl helper checks the status code and returns null on a failed request, so main stops cleanly rather than parsing a broken page.

What the output looks like

The JSON file holds one object with the app's public metadata: its name, developer, category, average rating, rating count, price, and canonical URL.

json
{
  "name": "Google Authenticator",
  "developer": "Google LLC",
  "category": "Utilities",
  "rating": "4.7 out of 5",
  "ratingCount": "1.2M Ratings",
  "price": "Free",
  "appUrl": "https://apps.apple.com/us/app/google-authenticator/id388497605"
}

That single record is a solid foundation for further analysis, reporting, or visualization. If you store records like this over time, you can track how an app's rating and rating count move, which is often the point of the exercise.
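
To track movement over time, one option is to append each run's record to a JSON Lines file with a timestamp. This is a minimal sketch; the appendSnapshot and readSnapshots helpers, the scrapedAt field, and the idea of one line per run are choices made for this example.

```javascript
const fs = require('fs');

// Appends one timestamped record per line so each run adds to the history.
function appendSnapshot(filePath, record, now = new Date()) {
  const snapshot = { ...record, scrapedAt: now.toISOString() };
  fs.appendFileSync(filePath, JSON.stringify(snapshot) + '\n');
  return snapshot;
}

// Reads the history back as an array of objects, skipping blank lines.
function readSnapshots(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

If you add this to scraper.js, reuse the fs import that is already at the top of the file. Calling appendSnapshot('history.jsonl', app) in place of the writeFileSync call would keep every run instead of overwriting the last one.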

Scale to many apps

One app page is a demo; a real job pulls a list of apps. The App Store does not expose a single public index of every app, so you build your own list of product URLs, then loop over it, fetch each through the Crawling API, parse it with the same function, and collect the records. Because every product page shares the same header and ratings structure, the parser you already wrote works across all of them without changes.

javascript
async function scrapeMany(appUrls) {
  const records = [];

  for (const url of appUrls) {
    const html = await crawl(url);
    if (!html) continue;

    records.push(parseAppMetadata(html, url));
    console.log(`Parsed ${url}`);

    // Pace requests so you stay under the rate limit
    await new Promise((r) => setTimeout(r, 2000));
  }

  return records;
}

The important habits carry over to any target: collect a clean list of URLs first, parse each with the same function, and put a short delay between requests so you are not hammering the site. For more on rendered, JavaScript-heavy pages like this one, see our guide to crawling JavaScript websites.
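
For the first habit, collecting a clean list, a small sketch like the one below can drop duplicates and malformed entries before the loop runs. It assumes product URLs live on apps.apple.com and carry the numeric app ID in an /id<digits> path segment, as the example URL in this guide does; extractAppId and cleanUrlList are illustrative names.

```javascript
// Pulls the numeric app ID out of a product URL, or returns null.
function extractAppId(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.hostname !== 'apps.apple.com') return null;
  const match = /\/id(\d+)(?:\/|$)/.exec(parsed.pathname);
  return match ? match[1] : null;
}

// Keeps the first URL seen for each app ID and drops everything else.
function cleanUrlList(urls) {
  const seen = new Set();
  const clean = [];
  for (const url of urls) {
    const id = extractAppId(url);
    if (id === null || seen.has(id)) continue;
    seen.add(id);
    clean.push(url);
  }
  return clean;
}
```

Note that this treats the same app in different storefronts (for example /us/ and /gb/) as one entry. Key on the full path instead if you want a separate record per storefront.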

Staying unblocked

Even with rendering handled, the App Store watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.

  • Pace your requests. Introduce a delay between page fetches rather than hammering the store in a tight loop. Spreading requests out is the single biggest factor in staying under rate limits.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a block. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as a signal to back off, not noise to ignore.
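
The pacing and back-off habits above can be sketched as a retry wrapper that waits longer after each failed attempt. This is an illustration under assumptions: fetchHtml stands in for any function that resolves to HTML or null (such as the crawl helper from Step 3), and the attempt count and delays are placeholder values, not recommendations from Crawlbase or Apple.

```javascript
// Delay before the next try after failed attempt number `attempt` (0-based):
// baseMs, then 2x, then 4x, and so on, capped at maxMs.
function backoffDelay(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchHtml(url) up to maxAttempts times, waiting longer after each failure.
// Resolves to the HTML on success, or null if every attempt fails.
async function fetchWithBackoff(fetchHtml, url, { maxAttempts = 4, baseMs = 2000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const html = await fetchHtml(url);
    if (html) return html;
    if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt, baseMs));
  }
  return null;
}
```

In the scrapeMany loop you could swap await crawl(url) for await fetchWithBackoff(crawl, url). If a whole batch of URLs exhausts its retries, that is usually a reason to stop the run rather than keep pushing.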

For the broader playbook, see how to scrape websites without getting blocked. If you want a ready-made tool for this exact target, our Apple App Store scraper walkthrough covers the same ground from a different angle.

Legal and ethical considerations

Whether scraping the App Store is allowed depends on Apple's terms, your jurisdiction, and what you do with the data. Apple's terms of service restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Apple's terms and the App Store's robots.txt, respect any rate expectations they state, keep your request volume reasonable, and treat both as the boundary for what you collect.

This guide is deliberately scoped to public app metadata: the app name, developer, category, public average rating, public rating count, price, and the canonical URL that anyone can see on a product page without signing in. That is different from the personal data on the platform. Individual reviews and the people who wrote them are personal data. Use rating counts and averages as aggregate signal about an app, never assemble profiles of individual reviewers, and do not republish a person's review tied to their identity. Anything behind an Apple account, scraped at scale, or involving identifiable individuals pulls in privacy law such as GDPR and CCPA, and that is squarely out of scope here. Treat copyrighted screenshots, icons, and description text as Apple's and the developer's property rather than yours to redistribute.
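
One way to hold that scope in code is an allowlist applied just before a record is saved, so a later parser change cannot quietly start storing extra fields. The PUBLIC_FIELDS list mirrors the fields this guide collects; toPublicRecord is a hypothetical helper name.

```javascript
// The only fields this guide stores. Anything else is dropped on purpose.
const PUBLIC_FIELDS = [
  'name', 'developer', 'category', 'rating', 'ratingCount', 'price', 'appUrl',
];

// Copies only allowlisted fields; missing ones are set to null.
function toPublicRecord(record) {
  const out = {};
  for (const field of PUBLIC_FIELDS) {
    const present = Object.prototype.hasOwnProperty.call(record, field);
    out[field] = present ? record[field] : null;
  }
  return out;
}
```

An allowlist does not make a use lawful on its own. It simply keeps the stored data aligned with the scope described above.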

If your project needs more than public metadata, the right path is a sanctioned one, not a cleverer scraper. Apple runs official programs for this data. App Store Connect exposes your own app's data to you as a developer, and the public iTunes Search API returns structured app metadata, including name, developer, category, price, and ratings, under documented terms. Those official APIs are the correct tools when you need large volumes, guaranteed structure, or the right to reuse the data commercially. When you are unsure whether a use is allowed, get permission or use the sanctioned API rather than assuming silence is consent.
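
As a rough illustration of the sanctioned route, the sketch below builds a lookup URL for the iTunes Search API and maps one result onto the record shape used in this guide. The endpoint and the response field names (trackName, sellerName, artistName, primaryGenreName, averageUserRating, userRatingCount, formattedPrice, trackViewUrl) are stated here as assumptions to verify against Apple's documentation before you rely on them, and the helper names are hypothetical.

```javascript
// Builds a lookup URL for one app ID. Assumes the public lookup endpoint.
function buildLookupUrl(appId, country = 'us') {
  const params = new URLSearchParams({ id: String(appId), country });
  return `https://itunes.apple.com/lookup?${params.toString()}`;
}

// Maps one lookup result onto this guide's record shape.
// Field names are assumptions about the response; missing ones become null.
function mapLookupResult(result) {
  return {
    name: result.trackName ?? null,
    developer: result.sellerName ?? result.artistName ?? null,
    category: result.primaryGenreName ?? null,
    rating: result.averageUserRating ?? null,
    ratingCount: result.userRatingCount ?? null,
    price: result.formattedPrice ?? null,
    appUrl: result.trackViewUrl ?? null,
  };
}

// Fetches and maps one app. Relies on the global fetch in Node.js 18 or later.
async function lookupApp(appId, country = 'us') {
  const response = await fetch(buildLookupUrl(appId, country));
  if (!response.ok) throw new Error(`Lookup failed: ${response.status}`);
  const data = await response.json();
  const first = data.results && data.results[0];
  return first ? mapLookupResult(first) : null;
}
```

Calling lookupApp('388497605') would request the same app used throughout this guide. Because the response is already structured JSON, there are no selectors to maintain.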

Recap

Key takeaways

  • The App Store renders content client-side and blocks automated traffic. A plain request returns a thin shell or a block, so you must render the page behind a trusted IP, using the JavaScript token, before you parse it.
  • The Crawling API does both in one call. It renders the page in a real browser, rotates residential IPs, and handles blocks, returning finished HTML you parse with Cheerio.
  • Cheerio extracts the public fields. Read app name, developer, category, rating, rating count, price, and the app URL from the header and ratings blocks, and expect the class names to drift over time.
  • Scale by looping over a URL list. Build your own list of product URLs, parse each with the same function, pace your requests, and write structured records to JSON.
  • Stay on public metadata. Collect public app metadata only, treat individual reviews and reviewers as personal data, respect Apple's terms and robots.txt, and prefer Apple's App Store Connect and iTunes Search API for volume or commercial use.

Frequently Asked Questions (FAQs)

Can I scrape any app on the App Store?

You can fetch any app's public product page as long as you have its URL. Apple does not publish a complete public index of every app, so you build your own list of product URLs from search results, charts, or links you already have, then loop over that list. Keep your volume reasonable and stay on the public metadata fields covered here.

Why does a plain request return incomplete data from the App Store?

Because Apple renders much of the product page in the browser and challenges automated traffic. A raw HTTP request from a datacenter IP usually returns a thin shell rather than the header and ratings content. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API handles for you when you use the JavaScript token.

My selectors return empty values. What changed?

Almost certainly Apple's markup. Class names like app-header__title and we-rating-count are part of a layout that changes over time, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools, update the selectors in parseAppMetadata, and you are back in business. Periodic selector maintenance is normal for any production scraper.

Can I scrape individual App Store reviews and reviewer names?

That is out of scope for this guide, and for good reason. Individual reviews and the people who wrote them are personal data, which pulls in privacy law like GDPR and CCPA. Use the public rating count and average as an aggregate signal about an app, do not build profiles of individual reviewers, and do not republish a person's review tied to their identity. For anything beyond public metadata, use Apple's official APIs.

Does Apple have an official API for app data?

Yes. App Store Connect gives developers access to their own app's data, and the public iTunes Search API returns structured metadata for apps, including name, developer, category, price, and ratings, under documented terms. If you need large volumes, guaranteed structure, or the right to reuse the data commercially, those sanctioned routes are the correct choice. This public-metadata scraper is best for research, prototyping, and smaller-scale analysis where an official agreement is not warranted.

Can I build an App Store scraper in a language other than JavaScript?

Yes. This guide uses JavaScript with Cheerio, but the same approach works in any language. The Crawling API has libraries and SDKs for several languages, so you fetch the rendered HTML the same way and parse it with whatever HTML parser your stack prefers, such as BeautifulSoup in Python. The selectors and fields stay the same; only the parsing syntax changes.
