Glassdoor is one of the largest job and company-research platforms on the open web, and its public listing pages carry a lot of structured signal: job titles, the companies hiring, where the roles are based, and the star ratings employees give those companies. Recruiters track it to benchmark demand, analysts use it to study hiring trends across industries, and job seekers compare openings and employer ratings side by side. All of that sits on the public search-results page in a predictable layout.
This guide shows you how to scrape Glassdoor with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a public Glassdoor job-search page through the Crawling API, parses the job title, company, location, company rating, salary estimate, and link for each posting, handles pagination, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public job and company listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Node.js script that takes a public Glassdoor job-search URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every job card on the results page. We use a developer-jobs search as the running example and pull these fields per posting:
- Title: the job title shown on the card, for example "React Native Developer".
- Company: the name of the hiring employer.
- Rating: the public company star rating, when Glassdoor shows one for that employer.
- Location: the city and region where the role is based.
- Salary: the estimated salary range, when present on the card.
- Post date: how long ago the listing went up, like "30d+".
- Link: the URL to the individual job posting.
Why a plain request fails on Glassdoor
If you request a Glassdoor search URL with a bare HTTP client, you rarely get the job cards back. Two things work against you. First, Glassdoor renders the results list in the browser with JavaScript, so the initial HTML is a near-empty shell until the page's scripts run. Second, Glassdoor flags automated traffic aggressively: datacenter IPs and request patterns that do not look like a real browser get challenged with a CAPTCHA, rate-limited, or blocked before they ever reach the rendered listings.
So a working Glassdoor scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it renders the page behind a trusted IP, and it returns finished HTML for you to parse with Cheerio.
The Crawling API gives you two tokens: a normal one and a JavaScript one. Glassdoor needs the page rendered in a real browser, so use your JavaScript token for every request in this guide. The normal token returns the unrendered shell and your selectors will come back empty.
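Under the hood, the crawlbase client turns each api.get() call into one HTTP request to the Crawling API, and the token you send is what decides whether the page is rendered. The sketch below builds that request URL by hand. It assumes the documented https://api.crawlbase.com/ endpoint with token and url query parameters. The function name buildCrawlingApiUrl is ours and is not part of the library. The main thing it shows is that the target URL must be percent-encoded, which matters for Glassdoor search URLs full of commas.

```javascript
// Sketch of the raw request the crawlbase client sends for you.
// Assumes the endpoint https://api.crawlbase.com/ with `token` and `url` query params.
function buildCrawlingApiUrl(token, targetUrl) {
  if (!token) throw new Error('Missing Crawlbase token');
  const endpoint = new URL('https://api.crawlbase.com/');
  endpoint.searchParams.set('token', token); // use the JavaScript token for rendering
  // URLSearchParams percent-encodes the target, including ':' '/' and ','
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

const requestUrl = buildCrawlingApiUrl(
  'YOUR_JS_TOKEN',
  'https://www.glassdoor.com/Job/new-york-ny-react-native-developer-jobs-SRCH_IL.0,11_IC1132348_KO12,34.htm'
);
console.log(requestUrl);
```

On Node 18 or later you could pass requestUrl to the built-in fetch. On Node 16, the crawlbase client used in the rest of this guide is the simpler route.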
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs and any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, our guide to building a web scraper with Node.js covers the basics.
Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and token. Sign up, open your dashboard, and copy your JavaScript token from the account docs page. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
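One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it in scraper.js. This is a minimal sketch. CRAWLBASE_JS_TOKEN and readJsToken are hypothetical names, so use whatever naming your team has standardized on.

```javascript
// Reads the Crawlbase JavaScript token from the environment.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this sketch.
function readJsToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    // Fail fast instead of sending unauthenticated requests
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper.');
  }
  return token;
}
```

In a POSIX shell, run the scraper with CRAWLBASE_JS_TOKEN=your_token node scraper.js. Then create the client with new CrawlingAPI({ token: readJsToken() }) instead of passing the placeholder string.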
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
```shell
node --version
mkdir glassdoor-scraper && cd glassdoor-scraper
npm init -y
npm install crawlbase cheerio
```
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request a public Glassdoor search URL. Checking the status code before you parse keeps failures loud instead of silent.
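```javascript
const { CrawlingAPI } = require('crawlbase');

// Use your JavaScript token so the Crawling API renders the page in a real browser
const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

const glassdoorPageURL =
  'https://www.glassdoor.com/Job/new-york-ny-react-native-developer-jobs-SRCH_IL.0,11_IC1132348_KO12,34.htm';

api
  .get(glassdoorPageURL)
  .then((response) => {
    if (response.statusCode === 200) {
      // Print the start of the body to confirm you received a full HTML document
      console.log(response.body.slice(0, 500));
    } else {
      // Report non-200 responses instead of silently doing nothing
      console.error(`Request failed with status ${response.statusCode}`);
    }
  })
  .catch((error) => console.error('API request error:', error));
```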
Run the script with node scraper.js. You should get back the start of a full HTML document, not an error message or a block page. The first 500 characters are usually just the page head, so search the full body for the Jobs List container before you write any selectors. That confirms the job cards actually arrived. They are present because the Crawling API uses the JavaScript token you supplied to render the page in a real browser.
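You can make that check more than a visual scan with a quick heuristic: look for the job list container and count the card elements in the raw body. This is a sketch under stated assumptions. checkRendered is a hypothetical helper, and its marker strings come from the selectors used later in this guide, so they will drift with Glassdoor's markup just as the selectors do.

```javascript
// Heuristic: does the HTML contain the rendered job list, or a shell/challenge page?
// The marker strings are assumptions taken from this guide's selectors.
function checkRendered(html) {
  const hasJobList = html.includes('aria-label="Jobs List"');
  // Count class attributes that contain the standalone class name "jobCard"
  const cardCount = (html.match(/class="[^"]*\bjobCard\b/g) || []).length;
  return { hasJobList, cardCount, ok: hasJobList && cardCount > 0 };
}
```

Inside the .then callback, log checkRendered(response.body). If a 200 response comes back with ok: false, suspect the normal token or a challenge page before you suspect your selectors.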
That first request just returned a fully rendered Glassdoor search page without a headless browser or a proxy on your side. The Crawling API runs the page in a real browser, rotates through residential IPs server-side, and handles the CAPTCHAs Glassdoor throws at scrapers, so you get finished HTML from one call. Point it at a public job search on the free tier first, then add your parser.
Step 2: Parse each job card with Cheerio
With rendered HTML in hand, load it into Cheerio and walk the job cards. Glassdoor lists results inside a "Jobs List" container, with each posting in its own card, so you select every card, then read the title, company, rating, location, salary, post date, and link from inside it. Reading each field defensively keeps one missing value from crashing the run.
```javascript
const cheerio = require('cheerio');

function parseDataFromHTML(html) {
  const $ = cheerio.load(html);
  const searchResults = {
    resultInfo: '',
    jobs: [],
  };

  // Result summary (for example "1,200 React Native Developer Jobs")
  searchResults.resultInfo = $('.SearchResultsHeader_jobCount__12dWB').text().trim();

  // One record per job card
  $('ul[aria-label="Jobs List"] .jobCard').each((_, element) => {
    const card = $(element);
    // .text() returns '' when a selector matches nothing, so a missing field never throws
    const title = card.find('.JobCard_seoLink__WdqHZ').text().trim();
    const company = card.find('.EmployerProfile_employerName__Xemli').text().trim();
    const rating = card.find('.EmployerProfile_ratingContainer__N4hxE').text().trim();
    const location = card.find('.JobCard_location__N_iYE').text().trim();
    const postDate = card.find('.JobCard_listingAge__KuaxZ').text().trim();
    const salary = card.find('.JobCard_salaryEstimate___m9kY').text().trim();

    // Resolve relative hrefs so the link works outside the page
    let link = card.find('.JobCard_seoLink__WdqHZ').attr('href');
    if (link && link.startsWith('/')) {
      link = new URL(link, 'https://www.glassdoor.com').href;
    }

    // Skip cards without a title so partial cards don't produce empty rows
    if (title) {
      searchResults.jobs.push({
        title,
        company,
        rating: rating || 'Rating not available',
        location,
        salary,
        postDate,
        link: link || '',
      });
    }
  });

  return searchResults;
}
```
A few details keep this faithful to the page. The result summary comes from .SearchResultsHeader_jobCount__12dWB, and each posting lives in a .jobCard inside the ul[aria-label="Jobs List"] container. Inside a card, the title and its link both come from the .JobCard_seoLink__WdqHZ anchor, the company from .EmployerProfile_employerName__Xemli, the public company rating from .EmployerProfile_ratingContainer__N4hxE, the location from .JobCard_location__N_iYE, the post date from .JobCard_listingAge__KuaxZ, and the salary estimate from .JobCard_salaryEstimate___m9kY. The link is read from the anchor's href and resolved to an absolute URL so it works outside the page.
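Here is a standalone variant that shows what the link resolution does. It is slightly broader than the check in parseDataFromHTML: it resolves any relative href, not only hrefs that start with /, and it lets the WHATWG URL constructor pass absolute links through. toAbsoluteLink is a hypothetical helper name.

```javascript
// Resolve an href against the Glassdoor origin; a missing href becomes ''.
// Note: new URL() also normalizes absolute URLs (e.g. adds '/' to a bare origin).
function toAbsoluteLink(href, base = 'https://www.glassdoor.com') {
  if (!href) return '';
  return new URL(href, base).href;
}

console.log(toAbsoluteLink('/job-listing/react-native-developer?jl=123'));
// → https://www.glassdoor.com/job-listing/react-native-developer?jl=123
```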
Glassdoor's class names (the JobCard_* and EmployerProfile_* suffixes above) are generated and change without notice. Treat the selectors as a starting template, not a contract. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
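A renamed class usually shows up as a field that is empty on every card, so you can flag that case automatically after each run. This sketch assumes the job records produced by parseDataFromHTML and treats the "Rating not available" default as empty. findDriftedFields is a hypothetical helper. Salary can legitimately be missing on every card of a page, so read the result as a prompt to re-inspect the page, not as proof.

```javascript
// Flags fields that are empty on every parsed card, a typical sign of a stale selector.
const FIELDS = ['title', 'company', 'rating', 'location', 'salary', 'postDate', 'link'];
const PLACEHOLDERS = new Set(['', 'Rating not available']);

function findDriftedFields(jobs, fields = FIELDS) {
  // No cards at all: the card selector itself may be stale
  if (jobs.length === 0) return [...fields];
  return fields.filter((field) =>
    jobs.every((job) => PLACEHOLDERS.has(String(job[field] ?? '').trim()))
  );
}
```

For example, after parsing you could run const drifted = findDriftedFields(data.jobs) and print a warning naming those fields whenever the list is not empty.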
Step 3: Assemble the full script with JSON and CSV export
Now wire the fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV. An earlier version of this guide used an Express endpoint to trigger the crawl. A plain script has fewer moving parts, and you can wrap it in an endpoint later if you want one.
```javascript
const fs = require('fs');
const { CrawlingAPI } = require('crawlbase');
const cheerio = require('cheerio');

const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

// Fetch one page through the Crawling API; returns the HTML, or null on a non-200 response
async function crawl(pageUrl) {
  const response = await api.get(pageUrl);
  if (response.statusCode === 200) return response.body;
  console.error(`Request failed: ${response.statusCode}`);
  return null;
}

// Paste the parseDataFromHTML function from Step 2 here (this file already imports cheerio)

// Convert job records to CSV, quoting every field and doubling embedded quotes
function toCsv(rows) {
  const headers = ['title', 'company', 'rating', 'location', 'salary', 'postDate', 'link'];
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

async function main() {
  const url =
    'https://www.glassdoor.com/Job/new-york-ny-react-native-developer-jobs-SRCH_IL.0,11_IC1132348_KO12,34.htm';
  const html = await crawl(url);
  if (!html) return;

  const data = parseDataFromHTML(html);
  fs.writeFileSync('glassdoor.json', JSON.stringify(data, null, 2));
  fs.writeFileSync('glassdoor.csv', toCsv(data.jobs));
  console.log(`Saved ${data.jobs.length} jobs to JSON and CSV`);
}

// Surface network or parsing errors instead of leaving an unhandled rejection
main().catch((error) => {
  console.error('Scrape failed:', error);
  process.exitCode = 1;
});
```
Paste only the parseDataFromHTML function from Step 2 into the same file so main can call it. Leave out its const cheerio = require('cheerio') line: this script already imports Cheerio, and declaring it twice is a syntax error. Run it with node scraper.js and you get two files: glassdoor.json with the full structured records and glassdoor.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes. That matters here because job titles and locations frequently contain commas.
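Here is a worked example of that quoting rule, using the same one-line escape as toCsv. A location like "New York, NY" stays in a single cell, and embedded quotes survive the round trip.

```javascript
// Same rule as the escape helper inside toCsv: wrap every field in double quotes
// and double any embedded quote so commas and quotes stay inside one cell.
const escapeCsv = (value) => `"${String(value).replace(/"/g, '""')}"`;

const row = ['Senior "Mobile" Engineer', 'New York, NY', '$130K - $160K'];
console.log(row.map(escapeCsv).join(','));
// → "Senior ""Mobile"" Engineer","New York, NY","$130K - $160K"
```

String(undefined) produces the literal text undefined, which is one reason parseDataFromHTML defaults a missing link to an empty string before the record reaches toCsv.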
What the output looks like
The JSON file holds the result summary plus one object per job, each with the title, company, public rating, location, salary estimate, post date, and link. The values below are illustrative placeholders, not real listings.
```json
{
  "resultInfo": "React Native Developer Jobs in New York, NY",
  "jobs": [
    {
      "title": "React Native Developer",
      "company": "Example Tech Inc",
      "rating": "4.1",
      "location": "New York, NY",
      "salary": "$110K - $140K (Employer est.)",
      "postDate": "30d+",
      "link": "https://www.glassdoor.com/job-listing/react-native-developer"
    },
    {
      "title": "Senior Mobile Engineer",
      "company": "Acme Software",
      "rating": "3.8",
      "location": "Remote",
      "salary": "$130K - $160K (Glassdoor est.)",
      "postDate": "5d",
      "link": "https://www.glassdoor.com/job-listing/senior-mobile-engineer"
    }
  ]
}
```
The CSV mirrors the same job rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.
```csv
title,company,rating,location,salary,postDate,link
"React Native Developer","Example Tech Inc","4.1","New York, NY","$110K - $140K (Employer est.)","30d+","https://www.glassdoor.com/job-listing/react-native-developer"
"Senior Mobile Engineer","Acme Software","3.8","Remote","$130K - $160K (Glassdoor est.)","5d","https://www.glassdoor.com/job-listing/senior-mobile-engineer"
```
Handle pagination
One search page is a demo; a real job pulls every page of results. Glassdoor paginates its search URLs by appending a page segment, so you can loop over page numbers, fetch each through the Crawling API, parse it with the same function, and stop when a page returns no cards. Because every results page shares the same card structure, the parser you already wrote works across all of them without changes.
```javascript
async function scrapeAllPages(baseUrl, maxPages) {
  const allJobs = [];
  for (let page = 1; page <= maxPages; page++) {
    // Glassdoor adds the page number before the .htm extension
    const pageUrl = baseUrl.replace('.htm', `_IP${page}.htm`);
    const html = await crawl(pageUrl);
    if (!html) break;

    const { jobs } = parseDataFromHTML(html);
    if (jobs.length === 0) break; // no more results

    allJobs.push(...jobs);
    console.log(`Page ${page}: ${jobs.length} jobs`);

    // Pace requests so you stay under the rate limit
    await new Promise((r) => setTimeout(r, 2000));
  }
  return allJobs;
}
```
The exact pagination token in the URL can change, so check a couple of real "next page" links in your browser and match the pattern. The important habits carry over to any target: loop until the results run out, and put a short delay between requests so you are not hammering the site. For more on rendered, JavaScript-heavy pages like this one, see our guide to crawling JavaScript websites.
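If you want the URL pattern in one place so it is easy to update when Glassdoor changes it, you can pull the page-URL step into its own helper. This sketch assumes the _IP&lt;n&gt;.htm suffix used in the pagination loop. It leaves page 1 as the unmodified search URL and replaces an existing page segment instead of stacking a second one. pageUrlFor is a hypothetical name. Search URLs that carry a query string after .htm are rejected and would need extra handling.

```javascript
// Build the URL for a given results page, assuming the `_IP<n>.htm` suffix.
// Verify the pattern against real "next page" links before relying on it.
function pageUrlFor(baseUrl, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  if (page === 1) return baseUrl; // first page: the search URL as-is
  // Strip any existing _IP<n> segment, then append the requested page
  const match = baseUrl.match(/^(.*?)(?:_IP\d+)?\.htm$/);
  if (!match) throw new Error(`Unexpected Glassdoor search URL: ${baseUrl}`);
  return `${match[1]}_IP${page}.htm`;
}
```

In scrapeAllPages, you would then compute each page's address with const pageUrl = pageUrlFor(baseUrl, page).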
Staying unblocked
Even with rendering handled, Glassdoor watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Introduce a delay between page fetches rather than hammering the search in a tight loop. Spreading requests out is the single biggest factor in staying under Glassdoor's rate limits.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a CAPTCHA. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
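Status-code handling is easier to apply consistently when the decision lives in one function. This sketch is one possible policy, not documented Glassdoor or Crawlbase behavior. It assumes 429 and 5xx responses are worth retrying with exponential backoff, and that any other non-200 response means stop. nextAction, baseDelayMs, and maxAttempts are hypothetical names.

```javascript
// Decide what to do after a response: parse it, retry after a delay, or stop.
// `attempt` counts requests made so far for this page, starting at 1.
function nextAction(statusCode, attempt, { baseDelayMs = 2000, maxAttempts = 5 } = {}) {
  if (statusCode === 200) return { action: 'parse' };
  const retryable = statusCode === 429 || (statusCode >= 500 && statusCode < 600);
  if (!retryable || attempt >= maxAttempts) return { action: 'stop' };
  // Exponential backoff: 2s, 4s, 8s, ... with the default base delay
  return { action: 'retry', delayMs: baseDelayMs * 2 ** (attempt - 1) };
}
```

In a loop around api.get(), call nextAction(response.statusCode, attempt) after each response. Before retrying, wait with await new Promise((r) => setTimeout(r, delayMs)). Adding a little random jitter to delayMs keeps parallel workers from retrying in lockstep.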
For the broader playbook, see how to scrape websites without getting blocked. If you want similar job data from another platform, the same fetch-then-parse pattern carries straight over to scraping Indeed job posts and to scraping Monster jobs with Python.
Is it legal to scrape Glassdoor?
Whether scraping Glassdoor is allowed depends on Glassdoor's terms of service, your jurisdiction, and what you do with the data. Glassdoor's terms restrict automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Glassdoor's Terms of Use and its robots.txt, respect any rate expectations they state, and treat both as the boundary for what you collect.
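You can script a first-pass robots.txt check, but it does not replace reading the file and the Terms of Use yourself. The sketch below is a deliberately small parser. It reads only the User-agent: * groups and matches plain path prefixes. The longest matching rule wins, and Allow beats Disallow on a tie, following RFC 9309. It has no support for * wildcards or $ anchors. isPathAllowed is a hypothetical helper. You would fetch https://www.glassdoor.com/robots.txt yourself and pass in its text. The sample in the test is made up, not Glassdoor's actual file.

```javascript
// Minimal robots.txt check for the `User-agent: *` group with prefix rules only.
function isPathAllowed(robotsTxt, path) {
  const rules = [];
  let inStarGroup = false;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group
      inStarGroup = (lastWasAgent && inStarGroup) || value === '*';
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "allow everything", so it adds no rule
    if (inStarGroup && (key === 'allow' || key === 'disallow') && value) {
      rules.push({ allow: key === 'allow', prefix: value });
    }
  }

  // Longest matching prefix wins; on equal length, Allow wins
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = !best || rule.prefix.length > best.prefix.length;
    const tieAllow = best && rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule: allowed
}
```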
This guide is deliberately scoped to public job and company listing data: the job title, hiring company, location, salary estimate, post date, link, and the company's public star rating that anyone can see on a search page without logging in. That is different from the personal data on the platform. Individual employee reviews, interview reports, and the people who wrote them are personal data. Treat company ratings as aggregate signal about an employer, never assemble profiles of individual reviewers, and do not republish a person's review tied to their identity. Anything behind a login, scraped at scale, or involving identifiable individuals pulls in privacy law such as GDPR and CCPA, and that is squarely out of scope here.
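You can also enforce that scope in code, so a future selector change or parser tweak cannot quietly add fields you did not mean to collect. This is a minimal sketch. PUBLIC_FIELDS and toPublicRecord are hypothetical names, and the field list matches the public listing fields this guide extracts.

```javascript
// Allowlist of the public listing fields this guide collects; everything else is dropped.
const PUBLIC_FIELDS = ['title', 'company', 'rating', 'location', 'salary', 'postDate', 'link'];

function toPublicRecord(job) {
  const record = {};
  for (const field of PUBLIC_FIELDS) {
    record[field] = job[field] ?? ''; // missing fields become empty strings
  }
  return record;
}
```

Apply it just before export, for example toCsv(data.jobs.map(toPublicRecord)). Only the allowlisted fields then reach disk.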
If your project needs more than public listings, the right path is a sanctioned one, not a cleverer scraper. Glassdoor and its parent operate official partner and API programs that expose employer and listing data with clear usage terms, attribution rules, and commercial rights. Those are the correct tools when you need large volumes, guaranteed structure, or the right to reuse the data commercially. When you are unsure whether a use is allowed, get permission or a data agreement rather than assuming silence is consent.
Key takeaways
- Glassdoor renders listings client-side and blocks hard. A plain request returns an empty shell or a CAPTCHA, so you must render the page behind a trusted IP, using the JavaScript token, before you parse it.
- The Crawling API does both in one call. It renders the page in a real browser, rotates residential IPs, and handles CAPTCHAs, returning finished HTML you parse with Cheerio.
- Cheerio extracts the fields. Select every .jobCard in the Jobs List, then read title, company, rating, location, salary, post date, and link, and expect the generated class names to drift.
- Paginate and export. Loop over Glassdoor's page segments until results run out, pace your requests, and write structured records to both JSON and CSV.
- Stay on public data. Collect public job and company listings only, treat individual reviews and reviewers as personal data, respect ToS and robots.txt, and prefer Glassdoor's official API or partner program for volume or commercial use.
Frequently Asked Questions (FAQs)
Can I build a Glassdoor scraper in a language other than JavaScript?
Yes. This guide uses JavaScript with Cheerio, but the same approach works in any language. The Crawling API has libraries and SDKs for several languages, so you fetch the rendered HTML the same way and parse it with whatever HTML parser your stack prefers, such as BeautifulSoup in Python. The selectors and fields stay the same; only the parsing syntax changes.
Why does a plain request return incomplete data from Glassdoor?
Because Glassdoor renders its job list client-side with JavaScript and challenges automated traffic with CAPTCHAs. A raw HTTP request from a datacenter IP usually returns an empty shell or a block page rather than the job cards. To get a complete page you have to render it behind a trusted IP, which is what the Crawling API handles for you when you use the JavaScript token.
My selectors return empty values. What changed?
Almost certainly Glassdoor's markup. Its generated class names like JobCard_seoLink__WdqHZ change without notice, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools, update the selectors in parseDataFromHTML, and you are back in business. Periodic selector maintenance is normal for any production scraper.
Will I get blocked while scraping Glassdoor?
You can, if you send too many requests too fast from one address. The Crawling API reduces that risk by rotating through residential IPs and handling CAPTCHAs for you, but you should still pace your requests, add delays between pages, and watch the status codes so you can back off when challenges appear. Those habits matter on any hard commercial target.
Can I scrape individual employee reviews and reviewer names?
That is out of scope for this guide, and for good reason. Individual reviews and the people who wrote them are personal data, which pulls in privacy law like GDPR and CCPA. Use the public company rating as an aggregate signal about an employer, do not build profiles of individual reviewers, and do not republish a person's review tied to their identity. For anything beyond public listings, use Glassdoor's official API or partner program.
Does Glassdoor have an official API?
Glassdoor and its parent run official partner and API programs that expose employer and listing data under clear terms, with attribution rules and defined commercial rights. If you need large volumes, guaranteed structure, or the right to reuse the data commercially, that sanctioned route is the correct one. This public-data scraper is best for research, prototyping, and smaller-scale analysis where an official agreement is not warranted.