Craigslist has been the United States' biggest classifieds platform since the late 1990s, and it still carries millions of public postings across housing, items for sale, services, and community categories. Those listing pages are a useful source of public market signal: how rents move across neighborhoods, what used goods sell for in a metro, where supply is thin. Each search result sits on the page in a predictable, server-rendered layout, which makes the public fields straightforward to read.
This guide shows you how to scrape Craigslist with JavaScript and Node.js using Cheerio. You build a small, runnable scraper that fetches a public Craigslist search-results page through the Crawling API, parses the title, price, location, post date, and link for each listing, and exports the result as JSON and CSV. The whole walkthrough stays scoped to public, non-personal listing data. Seller contact details and the free-text content of a person's post are personal data, and the legality section near the end explains why this scraper deliberately leaves them out, so read it before you point this at any real volume.
What you will build
A Node.js script that takes a public Craigslist search URL, retrieves the HTML through the Crawling API, and extracts a structured record for every listing on the results page. We use a real-estate search as the running example and pull these fields per posting:
- Title: the listing title shown on the result card, for example "2 bedroom trailer for rent".
- Price: the asking price as displayed, like "$675", kept as text because Craigslist prices carry currency symbols and commas.
- Location: the neighborhood or area hint Craigslist shows next to the listing.
- Post date: the publish date for the listing, when the result card exposes it.
- Link: the absolute URL to the individual listing page.
Why a plain request can fail on Craigslist
A bare HTTP request to a Craigslist search URL can work for a single hit, but it does not hold up once you scrape at any volume. Craigslist watches for automated traffic and enforces against it: datacenter IPs and request patterns that do not look like a real browser get rate-limited, served a CAPTCHA, or blocked outright. Run a tight loop from one address and you will see non-200 responses and challenge pages instead of listings fairly quickly.
So a durable Craigslist scraper needs an IP the platform reads as a real visitor, and it needs to behave politely. You can assemble that yourself with a pool of rotating residential proxies, but keeping that pool healthy and unblocked is most of the work. The Crawling API folds it into a single call: you send it the URL, it fetches the page behind a trusted IP with CAPTCHA handling built in, and it returns the HTML for you to parse with Cheerio.
Everything in this guide reads fields that any visitor sees on a public search-results page: title, price, location hint, post date, and the link. It does not open individual posts to harvest a seller's phone number or email, and it does not build a profile of any poster. That boundary is intentional, and the legality section below explains it.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are new to Node, the official docs or any beginner course will get you to the level this tutorial assumes. For a fuller walkthrough, our guide to building a web scraper with Node.js covers the basics.
Node.js 18 or later. Recent Cheerio releases require Node 18.17 or newer, so confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account docs page. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
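One way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named CRAWLBASE_TOKEN and a helper named getToken; both are names chosen for this example, not something the client library requires.

```javascript
// Minimal sketch: resolve the API token from the environment instead of
// hard-coding it. CRAWLBASE_TOKEN is an assumed variable name.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_TOKEN before running the scraper');
  }
  return token;
}

// Usage in scraper.js:
//   const api = new CrawlingAPI({ token: getToken() });
```

With this in place you would start the script as CRAWLBASE_TOKEN=your_token node scraper.js, and a missing token fails immediately with a readable message.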
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
    node --version
    mkdir craigslist-scraper && cd craigslist-scraper
    npm init -y
    npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out individual fields by CSS selector. The original version of this tutorial used jsdom to parse the saved HTML; Cheerio does the same job with a lighter, faster API and is a better fit for a scraping pipeline. Create a file named scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the search-results page
Start by getting the page HTML. Import the CrawlingAPI class, initialize it with your token, and request a public Craigslist search URL. Pick the search-results page you want to scrape, for example a real-estate search filtered to postings with pictures (hasPic=1), and check the status code before you parse so failures stay loud instead of silent.
    const { CrawlingAPI } = require('crawlbase');
    const fs = require('fs');

    const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
    const craigslistPageURL = 'https://chicago.craigslist.org/search/rea?hasPic=1';

    api
      .get(craigslistPageURL)
      .then((response) => {
        if (response.statusCode === 200) {
          fs.writeFileSync('response.html', response.body);
          console.log('HTML saved to response.html');
        } else {
          console.error(`Request failed: ${response.statusCode}`);
        }
      })
      .catch((error) => console.error('API request error:', error));
Run the script with node scraper.js. On success it writes the page to response.html, which lets you inspect the markup and develop selectors against a stable copy instead of hitting the network on every change. The Crawling API fetches the page behind a trusted IP, so the listings are present in the HTML you get back rather than a block page.
That first request just returned a real Craigslist results page without a proxy pool or CAPTCHA-solving on your side. The Crawling API fetches the page behind rotating residential IPs server-side and handles the challenges Craigslist throws at scrapers, so you get usable HTML from one call. Point it at a public search on the free tier first, then add your parser.
Step 2: Parse each listing with Cheerio
With the saved HTML in hand, load it into Cheerio and walk the listings. Craigslist renders its static search results inside an ol.cl-static-search-results list, with each posting in its own li.cl-static-search-result item, so you select every item and read the title, price, location, post date, and link from inside it. Reading each field defensively keeps one missing value from crashing the run.
    const cheerio = require('cheerio');

    function parseListings(html) {
      const $ = cheerio.load(html);
      const listings = [];

      $('ol.cl-static-search-results li.cl-static-search-result').each((_, el) => {
        const item = $(el);
        const title = item.find('.title').text().trim();
        const price = item.find('.price').text().trim();
        const location = item.find('.location').text().trim();
        const postDate = item.find('.meta time').attr('datetime') || '';
        const link = item.find('a').attr('href') || '';

        if (title) {
          listings.push({
            title,
            price: price || 'N/A',
            location: location || 'N/A',
            postDate,
            url: link,
          });
        }
      });

      return listings;
    }
The selectors map directly to the page. Each listing's title comes from .title, the asking price from .price, the neighborhood hint from .location, and the link from the item's anchor href. The post date is read from the datetime attribute of the time element in the listing's .meta row, which gives you a clean machine-readable date rather than relative text. Price stays a string on purpose, because Craigslist values include the currency symbol and thousands separators; convert to a number later if your analysis needs it.
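If your analysis does need numeric prices, a conversion along these lines may be enough. It is a sketch that assumes US-style formatting (a dollar sign, commas as thousands separators); parsePrice is a hypothetical helper name and not part of the scraper itself.

```javascript
// Minimal sketch: turn a displayed price such as "$439,000" into a number.
// Assumes US-style formatting. Returns null when the text holds no usable
// number (for example "N/A" or an empty string).
function parsePrice(text) {
  const cleaned = String(text ?? '').replace(/[^0-9.]/g, '');
  if (cleaned === '' || cleaned === '.') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

console.log(parsePrice('$675'));     // 675
console.log(parsePrice('$439,000')); // 439000
console.log(parsePrice('N/A'));      // null
```

Returning null instead of 0 keeps missing prices from skewing averages later on.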
Craigslist adjusts its markup from time to time, and individual city subdomains can differ slightly. Treat these selectors as a starting template, not a contract. When a field comes back empty, open response.html or the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
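A cheap way to notice selector drift early is to measure how many records carry a usable value for each field after every run. The sketch below is one possible check; fieldCoverage is a hypothetical helper, and treating "N/A" as missing assumes the placeholder used by parseListings in this guide.

```javascript
// Minimal sketch: report the share of records with a usable value per field.
// A sudden drop for one field usually points at a stale selector.
function fieldCoverage(
  listings,
  fields = ['title', 'price', 'location', 'postDate', 'url']
) {
  const coverage = {};
  for (const field of fields) {
    const filled = listings.filter((row) => {
      const value = row[field];
      return value !== undefined && value !== null && value !== '' && value !== 'N/A';
    }).length;
    // Guard against division by zero when the page had no listings at all.
    coverage[field] = listings.length ? filled / listings.length : 0;
  }
  return coverage;
}
```

Logging fieldCoverage(listings) at the end of a run gives you a ratio between 0 and 1 per field, so a selector that silently stopped matching shows up as a value near 0 rather than as a pile of empty cells you only find later.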
Step 3: Assemble the full script with JSON and CSV export
Now wire the fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV.
    const fs = require('fs');
    const { CrawlingAPI } = require('crawlbase');
    const cheerio = require('cheerio');

    const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

    // Fetch one page through the Crawling API; returns the HTML, or null on a non-200 status.
    async function crawl(pageUrl) {
      const response = await api.get(pageUrl);
      if (response.statusCode === 200) return response.body;
      console.error(`Request failed: ${response.statusCode}`);
      return null;
    }

    // parseListings(html) from Step 2 goes here.

    // Serialize records as CSV, quoting every field and doubling embedded quotes.
    function toCsv(rows) {
      const headers = ['title', 'price', 'location', 'postDate', 'url'];
      const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
      const lines = [headers.join(',')];
      for (const row of rows) {
        lines.push(headers.map((h) => escape(row[h])).join(','));
      }
      return lines.join('\n');
    }

    async function main() {
      const url = 'https://chicago.craigslist.org/search/rea?hasPic=1';
      const html = await crawl(url);
      if (!html) return;

      const listings = parseListings(html);
      fs.writeFileSync('listings.json', JSON.stringify(listings, null, 2));
      fs.writeFileSync('listings.csv', toCsv(listings));
      console.log(`Saved ${listings.length} listings to JSON and CSV`);
    }

    // Surface network or API errors instead of leaving an unhandled rejection.
    main().catch((error) => console.error('Scrape failed:', error));
Paste the parseListings function from Step 2 into the same file so main can call it. Run it with node scraper.js and you get two files: listings.json with the full structured records and listings.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters here because listing titles frequently contain commas.
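To see the quoting rule on a concrete value, here is a small standalone sketch. csvField is a hypothetical variant of the escape helper inside toCsv; the one deliberate difference is that it writes null or undefined as an empty field instead of the text "undefined". The sample row is made up for illustration.

```javascript
// Same quoting rule as toCsv: wrap the value in quotes and double any
// embedded quotes. Null and undefined become an empty quoted field.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Illustrative row: the title contains both a comma and a double quote.
const row = ['Oak desk, 60" wide', '$120', 'north side'].map(csvField).join(',');
console.log(row); // "Oak desk, 60"" wide","$120","north side"
```

The comma inside the title stays inside its quoted field, so a spreadsheet still reads the row as three columns.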
What the output looks like
The JSON file holds one object per listing, each with the title, price, location, post date, and link. The values below are illustrative, drawn from a real-estate search.
    [
      {
        "title": "2 bedroom trailer for rent",
        "price": "$675",
        "location": "165th & Kennedy",
        "postDate": "2024-04-05 09:12",
        "url": "https://chicago.craigslist.org/nwi/reo/d/hammond-bedroom-trailer-for-rent/7732856568.html"
      },
      {
        "title": "Barrington Village Home",
        "price": "$439,000",
        "location": "northwest suburbs",
        "postDate": "2024-04-04 16:48",
        "url": "https://chicago.craigslist.org/nwc/reo/d/barrington-barrington-village-home/7734168844.html"
      }
    ]
The CSV mirrors the same rows with a header line, so it drops straight into Excel, Google Sheets, or any data pipeline that reads delimited files.
    title,price,location,postDate,url
    "2 bedroom trailer for rent","$675","165th & Kennedy","2024-04-05 09:12","https://chicago.craigslist.org/nwi/reo/d/hammond-bedroom-trailer-for-rent/7732856568.html"
    "Barrington Village Home","$439,000","northwest suburbs","2024-04-04 16:48","https://chicago.craigslist.org/nwc/reo/d/barrington-barrington-village-home/7734168844.html"
Handle pagination
One search page is a demo; a real job pulls every page of results. Craigslist paginates its search URLs with a numeric offset, advancing 120 results at a time, so you can loop over offsets, fetch each page through the Crawling API, parse it with the same function, and stop when a page returns no listings. Because every results page shares the same item structure, the parser you already wrote works across all of them without changes.
    async function scrapeAllPages(baseUrl, maxPages) {
      const all = [];

      for (let page = 0; page < maxPages; page++) {
        // Craigslist pages search results in steps of 120
        const offset = page * 120;
        // Note: appending with "&" assumes baseUrl already has a query string
        // (as the ?hasPic=1 example does)
        const pageUrl = `${baseUrl}&s=${offset}`;

        const html = await crawl(pageUrl);
        if (!html) break;

        const listings = parseListings(html);
        if (listings.length === 0) break; // no more results

        all.push(...listings);
        console.log(`Page ${page + 1}: ${listings.length} listings`);

        // Pace requests so you stay under the rate limit
        await new Promise((r) => setTimeout(r, 2000));
      }

      return all;
    }
The exact pagination parameter can change, so check a couple of real "next page" links in your browser and match the pattern. The important habits carry over to any target: loop until the results run out, and put a short delay between requests so you are not hammering the site. For more on this style of work, see our guide to crawling JavaScript websites, and if you are tracking prices over time, our notes on web scraping for price intelligence.
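Because the parameter can change, it may help to isolate URL construction in one small function so there is a single place to update. The sketch below uses Node's built-in URL class; buildPageUrl is a hypothetical helper, and the parameter name s and page size 120 are simply the assumptions described above, to be confirmed against real "next page" links.

```javascript
// Minimal sketch: build the URL for a given zero-based results page.
// The defaults (param "s", 120 results per page) are assumptions; adjust
// them to whatever pattern the live "next page" links actually use.
function buildPageUrl(baseUrl, page, { param = 's', pageSize = 120 } = {}) {
  const url = new URL(baseUrl);
  // searchParams.set works whether or not baseUrl already has a query string.
  url.searchParams.set(param, String(page * pageSize));
  return url.toString();
}

console.log(buildPageUrl('https://chicago.craigslist.org/search/rea?hasPic=1', 2));
// https://chicago.craigslist.org/search/rea?hasPic=1&s=240
```

Unlike string concatenation with "&", this also handles a base URL that has no query string yet, and it replaces an existing offset instead of appending a second one.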
Staying unblocked
Craigslist enforces against scrapers, so a few habits keep a run healthy. They apply to any hard target.
- Pace your requests. Introduce a delay between page fetches rather than hammering the search in a tight loop. Spreading requests out is the single biggest factor in staying under Craigslist's rate limits.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a limit or a CAPTCHA. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or non-200 responses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
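The "back off" habit above can be sketched as a retry loop with exponential delays. This is an illustration under stated assumptions, not a tuned policy: fetchPage is a hypothetical callback that resolves to HTML or null (the crawl helper in this guide has that shape), and the 2-second base and 60-second cap are arbitrary starting values.

```javascript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Minimal sketch: retry a page fetch, waiting longer after each failure.
// fetchPage is a hypothetical async callback returning HTML or null.
async function fetchWithBackoff(fetchPage, url, { retries = 3, baseMs = 2000 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    let html = null;
    try {
      html = await fetchPage(url);
    } catch (error) {
      // Treat a thrown error like a failed response and retry.
      console.error(`Attempt ${attempt + 1} failed:`, error.message);
    }
    if (html) return html;
    if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
  }
  return null; // give up after the last retry
}
```

In the pagination loop you could call fetchWithBackoff(crawl, pageUrl) in place of crawl(pageUrl). With the defaults, the waits between attempts are 2, 4, and 8 seconds.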
For the broader playbook, see how to scrape websites without getting blocked. If you want similar listing data from other classifieds and rental sites, the same fetch-then-parse pattern carries over to scraping Apartments.com.
Is it legal to scrape Craigslist?
Whether scraping Craigslist is allowed depends on Craigslist's terms of service, your jurisdiction, and what you do with the data. This matters more on Craigslist than on most sites: Craigslist actively enforces against automated access and has a long record of pursuing scrapers in court. Its Terms of Use prohibit automated collection, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Craigslist's Terms of Use and its robots.txt, respect the rate limits they imply, and treat both as the boundary for what you collect.
This guide is deliberately scoped to public, non-personal listing data: the title, price, location hint, post date, and link that anyone sees on a search-results page without logging in. That is different from the personal data on the platform. A seller's name, phone number, email, or the free-text content they wrote inside a post are personal data. Do not harvest seller contact details, do not assemble profiles of posters, and do not republish a post tied to an identifiable person. The moment a project touches identifiable individuals, privacy law such as GDPR and CCPA applies, and that is squarely out of scope here. Aggregate facts like "two-bedroom rents in this neighborhood cluster around X" are fine; a list of who is selling what, with their contact details, is not.
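One way to enforce that scope in code is an allowlist applied just before records are written to disk, so a field added by accident during later edits never reaches the output. The sketch below is illustrative; PUBLIC_FIELDS and pickPublicFields are hypothetical names, and the field list mirrors the five fields this guide collects.

```javascript
// The only fields this guide collects; anything else is dropped on export.
const PUBLIC_FIELDS = ['title', 'price', 'location', 'postDate', 'url'];

// Minimal sketch: copy allowlisted fields only, ignoring everything else.
function pickPublicFields(record) {
  const clean = {};
  for (const field of PUBLIC_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      clean[field] = record[field];
    }
  }
  return clean;
}

// Example: a stray "phone" property does not survive the filter.
console.log(pickPublicFields({ title: 'Desk', phone: '555-0100', url: 'https://example.org/1' }));
// { title: 'Desk', url: 'https://example.org/1' }
```

Applying listings.map(pickPublicFields) before the JSON and CSV writes keeps the exported files limited to the listing fields described here.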
Craigslist does not publish a general-purpose public API, though some categories expose RSS feeds for limited, sanctioned access. Where a feed or an explicit data agreement exists, prefer it: a sanctioned route comes with clear usage terms rather than the legal and technical risk of scraping a site that fights it. When you are unsure whether a use is allowed, get permission or a data agreement rather than assuming silence is consent, and keep both the volume and the scope of what you collect proportionate to a legitimate, non-personal research purpose.
Key takeaways
- Craigslist enforces against scrapers. A tight loop from a datacenter IP gets rate-limited, challenged, or blocked, so fetch the page behind a trusted, rotating IP and pace your requests.
- The Crawling API does the hard part in one call. It fetches the page behind residential IPs and handles CAPTCHAs server-side, returning HTML you parse with Cheerio.
- Cheerio extracts the fields. Select every li.cl-static-search-result inside ol.cl-static-search-results, then read title, price, location, post date, and link, and expect markup to drift across cities and over time.
- Paginate and export. Loop over Craigslist's offset parameter until results run out, pace your requests, and write structured records to both JSON and CSV.
- Stay on public, non-personal data. Collect listing fields only, never seller contact info or post bodies tied to a person, respect ToS and robots.txt, and remember GDPR and CCPA apply the moment personal data is involved.
Frequently Asked Questions (FAQs)
Does Craigslist have an official API?
Craigslist does not provide a general-purpose public API for accessing its data. Some sections offer RSS feeds for limited access, but there is no comprehensive API. Where a sanctioned feed or a data agreement exists for what you need, use it in preference to scraping, since it comes with clear, permitted-use terms.
Can I build a Craigslist scraper in a language other than JavaScript?
Yes. This guide uses JavaScript with Cheerio, but the same approach works in any language. The Crawling API has libraries and SDKs for several languages, so you fetch the HTML the same way and parse it with whatever HTML parser your stack prefers, such as BeautifulSoup in Python. The selectors and fields stay the same; only the parsing syntax changes.
My selectors return empty values. What changed?
Almost certainly Craigslist's markup, or a difference between city subdomains. Open the saved response.html or a live page in your browser's dev tools, confirm the listing container is still ol.cl-static-search-results with li.cl-static-search-result items, and update the inner selectors in parseListings. Periodic selector maintenance is normal for any production scraper.
Will I get blocked while scraping Craigslist?
You can, especially on Craigslist, if you send too many requests too fast from one address. The Crawling API reduces that risk by rotating through residential IPs and handling CAPTCHAs for you, but you should still pace your requests, add delays between pages, and watch the status codes so you can back off when challenges appear.
Can I scrape seller phone numbers and contact details from posts?
No, and this scraper is built not to. A seller's name, phone number, email, and the free-text body they wrote are personal data. Harvesting them, building profiles of posters, or republishing a post tied to a person pulls in privacy law like GDPR and CCPA and runs against Craigslist's terms. Keep your collection to the public, non-personal listing fields covered here.
What is Craigslist data useful for?
Public listing data supports market research and pricing analysis: tracking how rents and used-goods prices move across neighborhoods and metros, spotting supply gaps, and studying local demand over time. The value is in the aggregate, non-personal signal across many listings, not in any single poster's identity or contact details.
