Foursquare holds public location data on millions of places: restaurants, cafes, parks, bars, and museums, each with a name, a category, an address, and a public rating. For local business research, market analysis, or building a recommendation feature, that public venue data is genuinely useful. The catch is that Foursquare renders its pages with JavaScript, so a plain HTTP request returns a near-empty shell rather than the venue list you can see in a browser.
This guide shows you how to extract public Foursquare venue data with Python through the Crawling API, which renders the page and routes the request through a trusted IP in one call. Everything here stays scoped to public venue and place data: names, categories, addresses, and public ratings. It does not cover anything behind a login, and it does not touch personal data about individual users or their check-ins. For production use, the official Foursquare Places API is the right tool, and the legality section near the end explains why.
What you will build
A small Python scraper that takes a public Foursquare search URL or a single venue URL, fetches the fully rendered page through the Crawling API, and parses a handful of public venue fields:
- Venue name: the business or place name shown on the listing.
- Category: the type of venue, such as Thai, Bakery, or Bar.
- Address: the public street address of the venue.
- Rating: the aggregate public rating the venue displays.
- Link: the permalink to the venue's public detail page.
The script handles multiple results from a search page, walks each listing, and exports the collected records to JSON and CSV so the data is ready for local business research. Notice what is deliberately absent: no individual user profiles, no check-in histories, no personal data tied to a named person. Those are out of scope here on purpose.
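The five fields above can be pinned down as a small record type. This is only a sketch: the `Venue` class name is an illustration and not part of the scraper that follows, which uses plain dictionaries with the same keys. Every field defaults to an empty string, matching how the parser later in this guide treats a missing value.

```python
from dataclasses import dataclass, asdict


@dataclass
class Venue:
    """One public venue record; every field is public, place-level data."""
    name: str = ""
    category: str = ""
    address: str = ""
    rating: str = ""   # kept as text, exactly as the page displays it
    link: str = ""


# asdict() returns the fields in declaration order, which is handy for CSV export.
record = asdict(Venue(name="Thai Diner", category="Thai"))
```

Nothing in the record type has a slot for a user name or a check-in, which keeps the out-of-scope data out by construction.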
Why a plain request fails on Foursquare
Request a Foursquare search page with a bare HTTP client and you get a response that is technically successful and practically useless. The venue content loads dynamically: the real listings only appear after the page's scripts run in a browser and fetch data from internal endpoints. A raw request captures the page before any of that happens, so there is nothing to parse.
On top of rendering, Foursquare watches for automated traffic. Datacenter IP ranges and repetitive request patterns get challenged or rate-limited before the interesting content loads. So a working scraper needs two things in the same request: a real browser that renders the page, and an IP address the platform reads as an ordinary visitor. You can build that with a headless browser and a pool of residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into one call. You send it a URL, it renders the page behind a trusted residential IP, and it returns finished HTML you can parse. For the deeper background, see our guide on how to crawl JavaScript websites.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Foursquare is client-side rendered, so you need the JS token here. The normal token returns the same shell a plain fetch would, with nothing useful to parse out of it.
Prerequisites
A few things to have in place first. None take long.
Basic Python. You should be comfortable running a script and installing packages with pip. If you are new to parsing HTML, our primer on how to use BeautifulSoup in Python covers the extraction side.
Python 3.8 or later. Confirm with python --version. If you do not have it, install it from python.org.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. Crawlbase gives you 1,000 free requests to start, and you pay only for successful requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
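One common way to keep the token out of version control is to read it from an environment variable instead of hardcoding it. The sketch below assumes a variable named `CRAWLBASE_JS_TOKEN`; that name is a hypothetical choice for this guide, not something Crawlbase requires.

```python
import os


def load_token(env=None, var_name="CRAWLBASE_JS_TOKEN"):
    """Return the JS token from the environment, or raise if it is missing.

    `var_name` is an assumed, illustrative name; use whatever your setup defines.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token
```

You would then pass `load_token()` wherever the examples in this guide show `"YOUR_CRAWLBASE_TOKEN"`.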
Set up the project
Create an isolated virtual environment, then install the two libraries the scraper needs.
python --version
python -m venv foursquare_env
source foursquare_env/bin/activate
pip install crawlbase beautifulsoup4
On Windows, activate with foursquare_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by selector.
Step 1: Fetch the rendered search page
Start by getting the finished page. Import CrawlingAPI, initialize it with your JS token, and request a public search URL. Because Foursquare loads listings asynchronously, pass the ajax_wait and page_wait options so the API holds until the content has rendered. Check the status before parsing so failures stay loud instead of silent.
from crawlbase import CrawlingAPI

# Initialize the client with your JavaScript (JS) token.
crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


def make_crawlbase_request(url):
    # Wait for asynchronous content, then hold 5 seconds before capture.
    options = {
        "ajax_wait": "true",
        "page_wait": "5000",
    }
    response = crawling_api.get(url, options)

    # pc_status is the Crawlbase status for the crawl itself.
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")

    print(f"Failed to fetch the page. Crawlbase status: {response['headers']['pc_status']}")
    return None


if __name__ == "__main__":
    url = "https://foursquare.com/explore?near=New%20York&q=Food"
    html = make_crawlbase_request(url)
    print(html[:500] if html else "No HTML returned")
The two wait options matter for a client-rendered target. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering listings appear before the page is captured. Five seconds is a reasonable starting point; raise it if listings come back empty. The status check reads pc_status from the response headers, which is the Crawlbase status for the crawl itself. Run the script and you should see real venue markup, which confirms rendering works before you write a single selector.
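The advice to raise the wait when listings come back empty can be automated. The sketch below is a minimal illustration under some assumptions: `fetch` is a function you supply that takes a URL and a `page_wait` value in milliseconds and returns HTML or `None`, the specific wait values are arbitrary, and the `singleRecommendation` marker is the listing class this guide uses in Step 2, which may differ on the live page.

```python
def fetch_until_rendered(fetch, url, waits=(5000, 8000, 12000),
                         marker="singleRecommendation"):
    """Retry with longer page_wait values until the listing marker appears.

    Returns (html, wait_ms) for the first attempt whose HTML contains `marker`,
    or (None, None) if none of the attempts did.
    """
    for wait_ms in waits:
        html = fetch(url, wait_ms)
        if html and marker in html:
            return html, wait_ms
    return None, None
```

Each attempt is a separate request, so keep the list of waits short rather than probing many values.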
Foursquare needs a rendered page behind a trusted IP, in one call. With a JS token, the Crawling API runs the page in a real browser so the asynchronously loaded listings actually appear, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public search page on the free tier first.
Step 2: Inspect the markup and parse listings
Before writing selectors, open a Foursquare search results page in your browser, right-click a listing, and choose Inspect. You are looking for the elements that wrap each venue and hold the fields you want. On the search results page, each place sits inside a list item, and the public fields map to these selectors:
- Venue name lives in an <a> tag inside a div.venueName.
- Address lives in a div.venueAddress.
- Category lives in a span.categoryName.
- Link is the href of that same div.venueName a anchor.
With rendered HTML in hand, load it into BeautifulSoup and walk every listing. Each result row matches ul.recommendationList > li.singleRecommendation. Guarding every field with an existence check keeps the parser from crashing when a venue is missing one of them.
from bs4 import BeautifulSoup


def scrape_foursquare_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    venues = []

    # Each search result is one list item inside the recommendation list.
    listings = soup.select("ul.recommendationList > li.singleRecommendation")
    for listing in listings:
        name_el = listing.select_one("div.venueName a")
        address_el = listing.select_one("div.venueAddress")
        category_el = listing.select_one("span.categoryName")
        rating_el = listing.select_one("span.venueScore")

        # Guard every field so a missing element yields "" instead of a crash.
        href = name_el["href"] if name_el and name_el.has_attr("href") else ""
        venues.append({
            "name": name_el.text.strip() if name_el else "",
            "category": category_el.text.strip() if category_el else "",
            "address": address_el.text.strip() if address_el else "",
            "rating": rating_el.text.strip() if rating_el else "",
            "link": f"https://foursquare.com{href}" if href else "",
        })

    return venues
The function returns a list of dictionaries, one per venue, with the five public fields. The link is built by joining the relative href onto the https://foursquare.com origin so each record carries a usable permalink. The rating reads from span.venueScore; if Foursquare has renamed that class on the page you inspect, swap in whatever class wraps the visible score. Treat the rating as a public aggregate, not a signal about any individual reviewer.
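If you want the rating as a number, or want link building that tolerates an href that is already absolute, a small normalization step helps. This is a sketch rather than part of the scraper above: the helper names `absolute_link` and `parse_rating` are invented here, and it assumes the rating text is a plain decimal such as "9.5".

```python
from urllib.parse import urljoin

BASE_URL = "https://foursquare.com"


def absolute_link(href):
    """Join a relative href onto the origin; absolute URLs pass through unchanged."""
    return urljoin(BASE_URL, href) if href else ""


def parse_rating(text):
    """Convert rating text to a float, or None when it is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None
```

Returning `None` for a missing rating keeps it distinguishable from a real score of zero when you aggregate later.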
Foursquare changes its markup and class names without notice. When a field comes back empty, re-inspect the live page in your browser's dev tools and update the selector. Periodic maintenance is normal for any production scraper, not a sign something is broken. The guarded extraction above means a renamed class yields an empty string rather than a crash.
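Because a renamed class fails quietly as an empty string, it is worth checking a run for fields that came back blank everywhere. The helper below is a hypothetical sketch (the name `empty_fields` is invented for this guide) that works on the list of dictionaries returned by scrape_foursquare_listings.

```python
def empty_fields(venues, fields=("name", "category", "address", "rating", "link")):
    """Return the fields that are blank in every record.

    A field that is empty across the whole run usually means its selector
    no longer matches the live markup. An empty run flags every field.
    """
    if not venues:
        return list(fields)
    return [field for field in fields if not any(v.get(field) for v in venues)]
```

A field that is blank in only some records is normal, since not every venue displays every value; a field that is blank in all of them is the signal to re-inspect the page.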
Step 3: Handle multiple pages of results
Foursquare search results use button-based pagination: a "See more results" button loads the next batch of venues in place rather than navigating to a new URL. The Crawling API can click that button for you with the css_click_selector option, so the rendered HTML you receive already contains the expanded list. Point the selector at the button responsible for loading more.
def make_request_with_pagination(url):
    options = {
        "ajax_wait": "true",
        "page_wait": "5000",
        # Click the "See more results" button before the page is captured.
        "css_click_selector": "li.moreResults > button",
    }
    response = crawling_api.get(url, options)

    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")

    print(f"Failed to fetch the page. Crawlbase status: {response['headers']['pc_status']}")
    return None
The css_click_selector value targets the button inside li.moreResults. If the button class differs on the page you inspect, update the selector to match. Keep the volume modest: public-data research does not require loading an entire city's listings in one run. Sample what you need and stop.
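"Sample what you need and stop" can be enforced in code with a hard cap on the parsed records. The sketch below is illustrative: `sample_venues` and its default limit of 50 are assumptions for this guide, and the de-duplication by link is a precaution in case an expanded list repeats a venue, not documented Foursquare behavior.

```python
def sample_venues(venues, limit=50):
    """Return at most `limit` unique venues, preserving their original order."""
    if limit <= 0:
        return []
    seen = set()
    sample = []
    for venue in venues:
        # Prefer the permalink as the identity; fall back to name plus address.
        key = venue.get("link") or (venue.get("name"), venue.get("address"))
        if key in seen:
            continue
        seen.add(key)
        sample.append(venue)
        if len(sample) >= limit:
            break
    return sample
```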
Step 4: Assemble the full scraper and export
Now wire fetch, parse, and export into one runnable script. The script fetches a search page with pagination, parses every listing, and writes the records to both JSON and CSV. JSON keeps the structure intact for further processing; CSV drops straight into a spreadsheet for quick local business research.
import json
import csv

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


def make_request_with_pagination(url):
    options = {
        "ajax_wait": "true",
        "page_wait": "5000",
        "css_click_selector": "li.moreResults > button",
    }
    response = crawling_api.get(url, options)

    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")

    print(f"Failed to fetch the page. Crawlbase status: {response['headers']['pc_status']}")
    return None


def scrape_foursquare_listings(html):
    soup = BeautifulSoup(html, "html.parser")
    venues = []

    listings = soup.select("ul.recommendationList > li.singleRecommendation")
    for listing in listings:
        name_el = listing.select_one("div.venueName a")
        address_el = listing.select_one("div.venueAddress")
        category_el = listing.select_one("span.categoryName")
        rating_el = listing.select_one("span.venueScore")

        href = name_el["href"] if name_el and name_el.has_attr("href") else ""
        venues.append({
            "name": name_el.text.strip() if name_el else "",
            "category": category_el.text.strip() if category_el else "",
            "address": address_el.text.strip() if address_el else "",
            "rating": rating_el.text.strip() if rating_el else "",
            "link": f"https://foursquare.com{href}" if href else "",
        })

    return venues


def save_to_json(data, filename="foursquare_data.json"):
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)
    print(f"Saved {len(data)} venues to {filename}")


def save_to_csv(data, filename="foursquare_data.csv"):
    if not data:
        return
    # Pin the column order so the output is stable across runs.
    fields = ["name", "category", "address", "rating", "link"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(data)
    print(f"Saved {len(data)} venues to {filename}")


if __name__ == "__main__":
    url = "https://foursquare.com/explore?near=New%20York&q=Food"
    html = make_request_with_pagination(url)
    if html:
        venues = scrape_foursquare_listings(html)
        save_to_json(venues)
        save_to_csv(venues)
This is the whole pipeline in one file: fetch with pagination, parse the listings, and export to both formats. The CSV writer pins the column order to the five public fields so the output is stable across runs. Swap the url for any public Foursquare search to retarget the scraper at a different city or category.
What the output looks like
Run the full script and you get a clean list of public venue records. Here is a trimmed JSON sample of what foursquare_data.json holds.
[
    {
        "name": "Thai Diner",
        "category": "Thai",
        "address": "186 Mott St (at Kenmare), New York",
        "rating": "9.5",
        "link": "https://foursquare.com/v/thai-diner/5e46e2ec5791a10008c55728"
    },
    {
        "name": "Mah-Ze-Dahr Bakery",
        "category": "Bakery",
        "address": "28 Greenwich Ave (Charles Street), New York",
        "rating": "9.1",
        "link": "https://foursquare.com/v/mahzedahr-bakery/568c0ce238fafac5f5ffe631"
    }
]
The CSV version carries the same fields as one row per venue with a header line, which opens directly in any spreadsheet tool. From here you can filter by category, group by neighborhood, or join the addresses against another dataset for local business research. If price signals are part of your analysis, our guide on web scraping for price intelligence covers how aggregated public data feeds that kind of work.
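As one example of that kind of follow-up analysis, the sketch below groups the exported records by category and averages the ratings that parse as numbers. The function name `summarize_by_category` and the "Unknown" bucket for blank categories are illustrative choices, not part of the scraper.

```python
from collections import defaultdict


def summarize_by_category(venues):
    """Return {category: {"count": n, "avg_rating": float or None}}."""
    groups = defaultdict(list)
    for venue in venues:
        groups[venue.get("category") or "Unknown"].append(venue)

    summary = {}
    for category, items in groups.items():
        ratings = []
        for item in items:
            try:
                ratings.append(float(item.get("rating", "")))
            except (TypeError, ValueError):
                continue  # skip blank or non-numeric ratings
        average = round(sum(ratings) / len(ratings), 2) if ratings else None
        summary[category] = {"count": len(items), "avg_rating": average}
    return summary
```

You could feed it the list loaded back from foursquare_data.json with `json.load`.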
Scaling to venue detail pages
The search scraper gives you a list and a link per venue. To enrich each record, feed the link field back through the same fetch function and parse the venue's own page, where Foursquare exposes more structured public detail. On a venue page the public fields map to these selectors: the name sits in an h1.venueName, the address in a div.venueAddress, the rating in a span[itemprop="ratingValue"], and the public review count in a div.numRatings. Reuse the guarded extraction pattern from Step 2, pace your requests between detail-page fetches, and keep the run scoped to the venues you actually need rather than crawling everything a search returns.
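The enrichment loop described above might be sketched like this. It makes several assumptions: `fetch` is a function such as make_crawlbase_request that returns HTML or `None`, `parse_detail` is a parser you write for the venue page that returns a dictionary of extra fields, and the two-second default delay is an arbitrary starting point rather than a documented limit.

```python
import time


def enrich_venues(venues, fetch, parse_detail, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each venue's detail page in turn and merge the parsed fields in.

    Pauses between fetches (not before the first one). A venue whose page
    cannot be fetched is kept unchanged rather than dropped.
    """
    enriched = []
    for index, venue in enumerate(venues):
        if index > 0:
            sleep(delay_seconds)
        html = fetch(venue["link"]) if venue.get("link") else None
        details = parse_detail(html) if html else {}
        enriched.append({**venue, **details})
    return enriched
```

Pass it a short, deliberately chosen list of venues rather than everything a search returned, so the run stays scoped to what you need.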
Foursquare is a useful entry point, but venue and local business data lives on many surfaces. For neighboring techniques, see our guides on how to scrape data from Google Maps and scraping local business listings, both of which apply the same render-plus-trusted-IP approach to other map and directory sources.
Staying unblocked
Even with rendering handled by the Crawling API, Foursquare watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any defended target.
- Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. Add real delays between fetches and resist the urge to parallelize aggressively.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you build your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Back off rather than pushing harder.
- Keep volume low and targets varied. Public-data research does not require crawling a whole city. Sample what you need and stop.
For the broader playbook, see our guide on how to scrape websites without getting blocked.
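Backing off when a run starts failing can be sketched as exponential delays with a little random jitter. The example assumes `fetch` returns `None` on a failed request, as make_crawlbase_request does in this guide; the attempt count and base delay are illustrative values, not recommendations from Crawlbase or Foursquare.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry a failed fetch with exponentially growing pauses plus jitter.

    Waits roughly base_delay, 2x, 4x, ... seconds between attempts and
    gives up (returning None) after `max_attempts` tries.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

If the function returns `None` after all attempts, treat that as a signal to stop the run rather than to raise the attempt count.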
Is it legal to scrape Foursquare?
This is the section to read before you write production code. Scraping public venue data sits in a gray area that depends heavily on how you do it and what you collect. Foursquare's Terms of Service restrict automated access, so read them along with the site's robots.txt, and treat both as the boundary for what you collect and how fast you collect it. The code above makes the technical part work; it does not change what the terms permit.
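Checking a path against robots.txt can be done with Python's standard library. The sketch below parses robots.txt text you have already downloaded and asks whether a URL is allowed. The sample rules are invented for illustration and are not Foursquare's actual robots.txt, so read the live file yourself; passing this check also says nothing about what the Terms of Service permit.

```python
from urllib.robotparser import RobotFileParser

# Invented sample rules for illustration only; not Foursquare's real file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Run the check before each new kind of URL you plan to request, and skip any path the file disallows.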
Keep to public, non-personal venue data. Venue names, categories, public addresses, and aggregate ratings describe places, not people, and that is the safe lane for this kind of research. What you should not touch: anything behind a login, individual user profiles, check-in histories, or any personal data about identifiable users. A venue's aggregate rating is a public number about a place; the individuals who left reviews or checked in are not yours to harvest. When personal data is involved at all, privacy laws such as GDPR and CCPA apply, which means you need a lawful basis to process it and you must honor deletion requests. The simplest way to stay clear of that burden is to not collect personal data in the first place, which is exactly what this guide does.
For any real, ongoing, or commercial use, the right tool is the official Foursquare Places API. It is the sanctioned route, gives you structured venue and category data with a clear usage license, and keeps you inside Foursquare's terms. This article is a technical walkthrough scoped narrowly to public venue data, not an endorsement of large-scale collection or any handling of personal user data. If your project needs more than a sample of public venue fields, the Places API or a formal data agreement is the correct path, not a cleverer scraper.
Key takeaways
- Foursquare is client-side rendered. A plain request returns an empty shell, so you must render the page before you parse it, which the Crawling API's JS token handles.
- Port the real selectors. Listings live in li.singleRecommendation, with name, category, address, and link in venueName, categoryName, and venueAddress.
- Handle pagination server-side. The css_click_selector option clicks the "See more results" button so the rendered HTML already holds the expanded list.
- Export for research. Write the venue records to JSON for structure and CSV for spreadsheets, then filter or join them for local business analysis.
- Public venues only, prefer the official API. Collect place data, never personal user data or check-ins, and use the Foursquare Places API for anything real or commercial.
Frequently Asked Questions (FAQs)
Why does a plain request return no data from Foursquare?
Because Foursquare loads its venue listings client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns a near-empty body. To get real venue data you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Foursquare?
The JS token. The normal token fetches static HTML, which on Foursquare is the same empty shell a plain request returns. The JS token renders the page in a real browser before handing back the HTML, so the venue fields are present when BeautifulSoup parses them.
What Foursquare data is safe to scrape?
Public, non-personal venue data: venue names, categories, public addresses, aggregate ratings, and the public links to venue pages. Anything behind a login, individual user profiles, and check-in histories are off limits. Those are personal data, and collecting them runs against Foursquare's terms and, in many places, privacy law.
How do I handle pagination while scraping Foursquare?
Foursquare search uses a "See more results" button rather than separate URLs. Pass the Crawling API the css_click_selector option pointed at that button (for example li.moreResults > button), and the API clicks it during rendering so the HTML you receive already contains the expanded list of venues.
Should I use the official Foursquare Places API or scrape the site?
For any real, ongoing, or commercial use, use the official Foursquare Places API. It is the sanctioned route, gives structured venue and category data with a clear license, and keeps you inside Foursquare's terms. Scraping a small sample of public venue fields with the approach here fits lightweight research where no API access is in place, as long as you respect the terms, robots.txt, and rate limits.
How do I avoid getting blocked while scraping Foursquare?
Keep your per-IP request rate low, add real delays between requests, vary your targets instead of crawling a whole city, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you. Watch the status codes and back off the moment you start seeing challenges.
