Groupon runs thousands of local deals a day across food, travel, wellness, and retail, and each listing carries the kind of structured data a deal-tracking tool, a price-watcher, or a competitive-research dashboard wants: a title, the discounted price, the original price, the saving, where the deal applies, and a link to the offer. The catch is that Groupon builds its category pages in the browser with JavaScript, so a plain HTTP request hands you a near-empty shell instead of the deals you came for.
This guide shows you how to scrape Groupon with Python the reliable way. You build a small, runnable scraper that fetches a rendered category page through the Crawling API, parses each deal card with BeautifulSoup, and prints clean structured output. The whole walkthrough stays scoped to public deal data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public Groupon category URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every deal on the page. We will use a city deals page as the running example and pull these fields from each card:
- Title: the deal headline, for example "55-Minute Couples Massage".
- Current price: the discounted price you actually pay.
- Original price: the struck-through list price.
- Discount: the saving, derived from the two prices.
- Location: the address or area the deal applies to.
- Link: the URL of the individual deal page.
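If you want the record shape written down before any scraping happens, a dataclass is one lightweight option. This sketch is not part of the scraper that follows. The class name Deal is hypothetical, prices stay as the raw strings scraped from the page, and every field is optional because a given card may omit it.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Deal:
    """Hypothetical record type for one Groupon deal card."""
    title: Optional[str] = None           # deal headline
    current_price: Optional[str] = None   # discounted price, raw text such as "$65"
    original_price: Optional[str] = None  # struck-through list price, raw text
    discount: Optional[int] = None        # whole-number percentage saved
    location: Optional[str] = None        # address or area the deal applies to
    link: Optional[str] = None            # URL of the individual deal page


if __name__ == "__main__":
    deal = Deal(title="55-Minute Couples Massage", current_price="$65",
                original_price="$95", discount=32)
    # asdict() produces the same plain-dict shape that json.dump can write
    print(asdict(deal))
```

The scraper in this guide builds plain dicts with these same keys, so Deal(**row) converts one of them if you prefer typed records.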
Why a plain fetch fails on Groupon
If you request a Groupon category URL with a bare HTTP client, you get a response with status 200 and almost none of the deal data in the body. Two things work against you. First, Groupon renders its deal cards in the browser with JavaScript, so the initial HTML is a frame that only fills in after the page's scripts run. Second, Groupon flags automated traffic quickly: datacenter IPs and request patterns that do not look like a real browser get challenged or blocked before they ever reach the rendered cards.
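One way to tell a rendered page from the empty frame is to count deal cards before parsing anything else. The sketch below uses only the standard library's html.parser. The marker it looks for, data-item-type="card", is the container attribute that the selectors later in this guide rely on. It is an assumption that this marker still matches the live markup.

```python
from html.parser import HTMLParser


class CardCounter(HTMLParser):
    """Counts start tags carrying data-item-type="card" (assumed deal-card marker)."""

    def __init__(self):
        super().__init__()
        self.cards = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if ("data-item-type", "card") in attrs:
            self.cards += 1


def count_cards(html: str) -> int:
    """Return how many card containers appear in the HTML."""
    parser = CardCounter()
    parser.feed(html)
    parser.close()
    return parser.cards
```

A status of 200 combined with a count of zero is the signature of the empty frame described above.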
So a working Groupon scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Groupon is client-side rendered, so you need the JS token here. Using the normal token returns the same empty frame a plain fetch would, and there is nothing to parse out of it. You can start with 1,000 free requests, no credit card needed.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to using BeautifulSoup in Python covers the parsing basics this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
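A common way to keep the token out of source files is to read it from an environment variable. The variable name CRAWLBASE_JS_TOKEN below is an arbitrary choice for this sketch, not something Crawlbase requires. The code samples in this guide use a placeholder string, which you could replace with this function's return value.

```python
import os


def load_token(var_name: str = "CRAWLBASE_JS_TOKEN") -> str:
    """Read the Crawlbase JS token from the environment (variable name is hypothetical)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail loudly rather than sending requests without a valid token
        raise RuntimeError(f"Set the {var_name} environment variable to your JS token")
    return token
```

Set the variable in your shell, then initialize the client with CrawlingAPI({"token": load_token()}).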
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.
```shell
python --version
python -m venv groupon_env
source groupon_env/bin/activate
pip install crawlbase beautifulsoup4
```
On Windows, activate the environment with groupon_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.
Step 1: Fetch the rendered category page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the category URL. Checking the status before you parse keeps failures loud instead of silent.
```python
from crawlbase import CrawlingAPI

# Placeholder: replace with your JS token and keep the real value out of version control
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


def crawl(page_url):
    # Wait for async content, then hold 5 seconds so late-rendering cards appear
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None


if __name__ == "__main__":
    page_url = "https://www.groupon.com/local/washington-dc"
    html = crawl(page_url)
    # Print the first 500 characters as a quick sanity check
    print(html[:500] if html else "No HTML returned")
```
The two wait options matter for a client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Five seconds is a reasonable start; raise it if the deal cards come back empty. Run the script with python scraper.py and you should see real deal markup, not the empty frame a plain fetch returns. That confirms rendering works before you write a single selector.
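You can automate the advice to raise page_wait when cards come back empty. This sketch is hypothetical. fetch_html stands in for a crawl-style function that accepts the options dict; the crawl() above hardcodes its options, so you would need to adapt it. has_cards is any check that confirms the deal cards arrived. Only ajax_wait and page_wait are real Crawling API option names here, and the wait schedule is an arbitrary starting point.

```python
from typing import Callable, Iterable, Optional


def crawl_until_cards(
    fetch_html: Callable[[str, dict], Optional[str]],
    url: str,
    has_cards: Callable[[str], bool],
    waits_ms: Iterable[int] = (5000, 8000, 12000),
) -> Optional[str]:
    """Retry a rendered fetch with progressively longer page_wait values."""
    for wait in waits_ms:
        options = {"ajax_wait": "true", "page_wait": wait}
        html = fetch_html(url, options)
        if html and has_cards(html):
            return html  # cards rendered within this wait
    return None  # still empty after the longest wait
```

For example, a variant of crawl() that takes options as a second argument and passes them to api.get plugs straight in as fetch_html.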
Groupon needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public city deals page on the free tier first.
Step 2: Parse the deal cards with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup and pull each deal by its selector. Groupon lays its deal cards out in a repeated structure, so you select all the cards once and then read the same fields from each one. Inspect the live page in your browser's dev tools to confirm the current class names; the selectors below match the layout at the time of writing.
```python
from bs4 import BeautifulSoup


def text_of(card, selector):
    # Stripped text of the first match inside the card, or None if the element is missing
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_deals(html):
    soup = BeautifulSoup(html, "html.parser")
    # Each deal is an <article> inside a card container; re-check these selectors in dev tools
    cards = soup.select('div[data-item-type="card"] > article')
    deals = []
    for card in cards:
        link = card.select_one("a[href]")
        deals.append({
            "title": text_of(card, "h2.text-dealCardTitle"),
            "current_price": text_of(card, 'div[data-testid="green-price"]'),
            "original_price": text_of(card, 'div[data-testid="strike-through-price"]'),
            "location": text_of(card, "h2 + div span"),
            "link": link["href"] if link else None,
        })
    return deals
```
The text_of helper does two things at once. It queries a single element inside the card, and it returns None when that element is missing, which avoids the AttributeError you would get from calling .get_text() on None. That keeps the extraction resilient when one field is absent from a given card, which is common because not every deal lists a struck-through original price or a location. The link comes from the anchor's href attribute rather than its text, so it is handled separately.
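A card's href may turn out to be relative, for example /deals/spa-logic-12. In that case the standard library's urljoin resolves it against the site root. Check in dev tools whether Groupon currently serves relative or absolute hrefs. The helper name absolute_link is hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www.groupon.com"


def absolute_link(href: Optional[str], base: str = BASE_URL) -> Optional[str]:
    """Resolve a possibly relative href against the Groupon origin; pass None through."""
    if not href:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones against base
    return urljoin(base, href)
```

To use it inside parse_deals, write the link field as absolute_link(link["href"]) if link else None.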
Groupon's class names and data-testid attributes change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every card, re-inspect a live deal in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
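A broken selector is easier to spot if you check the parsed rows for fields that are empty on every card. This is a minimal sketch that assumes the list-of-dicts output of parse_deals(). The function name dead_fields is hypothetical.

```python
from typing import Dict, List, Optional


def dead_fields(deals: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return field names that are None on every card, a hint that a selector drifted."""
    if not deals:
        return []  # no cards at all points at the card selector or rendering instead
    fields = deals[0].keys()
    return [f for f in fields if all(d.get(f) is None for d in deals)]
```

Run it after each crawl and log any names it returns. That turns silent breakage into a visible warning.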
Step 3: Compute the discount
Groupon sometimes prints a discount badge, but the most reliable saving is the one you derive from the two prices you already extracted. Parse the numeric value out of each price string, then compute the percentage. Doing the math yourself means the field is consistent across every card, whether or not the page shows a badge.
```python
import re


def to_amount(price):
    if not price:
        return None
    match = re.search(r"[\d,.]+", price)
    return float(match.group().replace(",", "")) if match else None


def discount_percent(original, current):
    o, c = to_amount(original), to_amount(current)
    if not o or not c or o <= 0:
        return None
    return round((o - c) / o * 100)
```
The to_amount helper strips currency symbols and thousands separators, so "$45" and "$1,299.00" both parse cleanly. It returns None rather than crashing when a price is missing. discount_percent checks for a zero or absent original price before dividing, so a card with no struck-through price yields a None discount instead of raising an error. To fold the result into each record, read both prices into variables inside the loop, then add "discount": discount_percent(original, current). The full script in Step 4 does exactly this.
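Written out, discount_percent() computes the saving as a share of the original price and rounds it to a whole percent. Let o be the original price and c the current price. For the first sample record in the output section ($95 original, $65 current), the calculation is:

$$
\text{discount} = \operatorname{round}\!\left(\frac{o - c}{o} \times 100\right) = \operatorname{round}\!\left(\frac{95 - 65}{95} \times 100\right) = \operatorname{round}(31.578\ldots) = 32
$$

When o is missing or zero, the guard in discount_percent() returns None before this division is reached.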
Step 4: Put it together
Now wire the fetch, the parse, and the discount math into one runnable script. Fetch the rendered HTML, hand it to the parser, enrich each record with the computed discount, and write the result to JSON.
```python
import json
import re

from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI

# Placeholder: replace with your JS token and keep the real value out of version control
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})


def crawl(page_url):
    """Fetch a rendered page through the Crawling API; return HTML or None."""
    options = {"ajax_wait": "true", "page_wait": 5000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8")
    print(f"Request failed: {response['status_code']}")
    return None


def text_of(card, selector):
    """Stripped text of the first match inside card, or None if absent."""
    el = card.select_one(selector)
    return el.get_text(strip=True) if el else None


def to_amount(price):
    """Turn a price string such as "$1,299.00" into a float, or None."""
    if not price:
        return None
    match = re.search(r"[\d,.]+", price)
    return float(match.group().replace(",", "")) if match else None


def discount_percent(original, current):
    """Whole-number percentage saved, or None if either price is unusable."""
    o, c = to_amount(original), to_amount(current)
    if not o or not c or o <= 0:
        return None
    return round((o - c) / o * 100)


def parse_deals(html):
    """Extract one record per deal card."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('div[data-item-type="card"] > article')
    deals = []
    for card in cards:
        link = card.select_one("a[href]")
        current = text_of(card, 'div[data-testid="green-price"]')
        original = text_of(card, 'div[data-testid="strike-through-price"]')
        deals.append({
            "title": text_of(card, "h2.text-dealCardTitle"),
            "current_price": current,
            "original_price": original,
            "discount": discount_percent(original, current),
            "location": text_of(card, "h2 + div span"),
            "link": link["href"] if link else None,
        })
    return deals


def main():
    page_url = "https://www.groupon.com/local/washington-dc"
    html = crawl(page_url)
    if not html:
        return
    deals = parse_deals(html)
    with open("groupon_deals.json", "w", encoding="utf-8") as f:
        json.dump(deals, f, indent=2)
    print(f"Saved {len(deals)} deals")


if __name__ == "__main__":
    main()
```
What the output looks like
Run the full script with python scraper.py and you get a clean structured record for each deal, ready to write to JSON, CSV, or a database.
```json
[
  {
    "title": "55-Minute Couples Massage or 50-Minute Deep Tissue Massage",
    "current_price": "$65",
    "original_price": "$95",
    "discount": 32,
    "location": "4238 Wilson Boulevard, Arlington",
    "link": "https://www.groupon.com/deals/refresh-therapeutic-massage-3"
  },
  {
    "title": "Spa Package with Glass of Wine at Spa Logic",
    "current_price": "$219",
    "original_price": "$330",
    "discount": 34,
    "location": "1721 Connecticut Avenue Northwest, Washington",
    "link": "https://www.groupon.com/deals/spa-logic-12"
  }
]
```
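The records are flat dicts, so the standard library's csv.DictWriter handles the CSV case without extra dependencies. This sketch assumes the list returned by parse_deals(). The function name and output filename are illustrative.

```python
import csv
from typing import Dict, List

FIELDS = ["title", "current_price", "original_price", "discount", "location", "link"]


def write_deals_csv(deals: List[Dict], path: str = "groupon_deals.csv") -> None:
    """Write deal records to CSV, one row per card; None values become empty cells."""
    # newline="" is what the csv module expects for files it writes
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(deals)
```

If you add fields to the records, add them to FIELDS too. By default, DictWriter raises ValueError on keys it does not expect.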
Looping category pages and pacing requests
One category is a demo; a real job runs over several cities or verticals. The shape stays the same: keep a list of category URLs, fetch each through the Crawling API, parse it with the same function, and collect the rows. Because every category page shares the same card structure, the parser you already wrote works across all of them without changes. The one habit that keeps a long run healthy is pacing: pause between requests so you are not hammering Groupon in a tight loop.
```python
import json
import time

# Assumes crawl() and parse_deals() from the full script above are in scope
categories = [
    "https://www.groupon.com/local/washington-dc",
    "https://www.groupon.com/local/new-york-city",
    "https://www.groupon.com/local/chicago",
]

results = []
for url in categories:
    html = crawl(url)
    if html:
        results.extend(parse_deals(html))
    time.sleep(3)  # pause between requests instead of looping tightly

with open("groupon_deals.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)
```
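If the same deal can appear on more than one category page, the combined list may contain duplicates. This small sketch keeps the first record for each link. It assumes the link field identifies a deal reliably, which is worth checking against your own data. The function name is hypothetical.

```python
from typing import Dict, List


def dedupe_by_link(rows: List[Dict]) -> List[Dict]:
    """Keep the first record seen for each link; rows without a link are kept as-is."""
    seen = set()
    unique = []
    for row in rows:
        link = row.get("link")
        if link is None:
            unique.append(row)  # nothing to compare on, so do not drop it
            continue
        if link not in seen:
            seen.add(link)
            unique.append(row)
    return unique
```

Apply it just before writing, for example json.dump(dedupe_by_link(results), f, indent=2).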
Groupon also loads more deals behind a "Load more" button rather than a numbered page. If you want the cards below the fold, pass a css_click_selector option to the Crawling API pointing at that button, and the API clicks it before capturing the page. Inspect the live button in dev tools to read its current selector, since that attribute drifts like the rest of the markup.
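In practice, the click option goes into the same dictionary as the wait settings. css_click_selector, ajax_wait, and page_wait are the Crawling API option names this guide already uses. The helper name render_options is hypothetical, and any selector you pass in must come from the live page.

```python
from typing import Optional


def render_options(click_selector: Optional[str] = None, page_wait_ms: int = 5000) -> dict:
    """Build Crawling API options; add a click before capture when a selector is given."""
    options = {"ajax_wait": "true", "page_wait": page_wait_ms}
    if click_selector:
        # The API clicks this element before it captures the HTML
        options["css_click_selector"] = click_selector
    return options
```

Pass the result as the second argument to api.get(page_url, options). Read the selector from the "Load more" button in dev tools instead of hardcoding a guess.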
Staying unblocked
Even with rendering handled, Groupon watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering category pages in a tight loop is the fastest way to get throttled. Spread requests out and vary your targets instead of crawling one city at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
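Backing off when status codes turn bad can be as simple as exponential delays between retries. This sketch shows a generic pattern, not a Crawlbase feature, and the name fetch_with_backoff is hypothetical. fetch is any callable that returns a (status_code, body) pair. sleep is injectable so you can test the logic without waiting. The delay values are arbitrary starting points.

```python
import random
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    url: str,
    retries: int = 4,
    base_delay: float = 5.0,
    jitter: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry non-200 responses, doubling the pause each time plus random jitter."""
    for attempt in range(retries):
        status, body = fetch(url)
        if status == 200:
            return body
        if attempt < retries - 1:
            # 5s, 10s, 20s, ... plus a little randomness so retries do not line up
            delay = base_delay * (2 ** attempt) + random.uniform(0, jitter)
            sleep(delay)
    return None  # give up and treat the run as a signal to slow down
```

To use it with the scraper, wrap api.get in a small function that returns response["status_code"] and the decoded body as a pair.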
For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. If your target is client-rendered like Groupon, our guide on crawling JavaScript websites explains why rendering matters. And if you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
Is it legal to scrape Groupon?
Whether scraping Groupon is allowed depends on Groupon's terms of service, your jurisdiction, and what you do with the data. Groupon's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the Groupon Terms of Service and its robots.txt, and treat both as the boundary for what you collect.
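Python's standard library can check a path against robots.txt rules before you queue it. The rules below are made up for illustration and are not Groupon's actual robots.txt. In practice, you would point RobotFileParser at https://www.groupon.com/robots.txt with set_url() and then call read(). The user-agent string is also a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only, not Groupon's real robots.txt
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /mygroupons
""".splitlines()


def allowed(url: str, rules=SAMPLE_ROBOTS, user_agent: str = "my-deal-tracker") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules)
    parser.modified()  # mark the rules as loaded so can_fetch() evaluates them
    return parser.can_fetch(user_agent, url)
```

robots.txt is only part of the picture. The terms of service still apply, even to paths that robots.txt allows.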
A few lines worth holding to. Collect only public deal data: the title, current price, original price, discount, location, and link that anyone can see on a category page without an account. Respect Groupon's stated rate expectations and keep your request volume low enough that you are not straining its servers. If you plan to reuse the data commercially, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public deal and category pages, because that line keeps the work defensible. It does not cover account or personal data, customer details, login-walled pages, or any attempt to bypass authentication. Public deal data only. If your project needs more than that, the correct path is an official data agreement, not a cleverer scraper.
Key takeaways
- Groupon is client-side rendered. A plain fetch returns an empty frame, so you must render the page before you parse it.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
- BeautifulSoup does the extraction. Select all deal cards, then read title, current price, original price, location, and link from each, and expect the selectors to drift.
- Derive the discount yourself. Parsing both prices and computing the saving is more consistent than scraping a badge that is not always present.
- Scale by looping categories and pacing. The same parser works across every category page, so a real job is a list of URLs plus a sleep between requests.
- Stay on public data. Respect Groupon's ToS and robots.txt, and never touch accounts, personal data, or login-walled pages.
Frequently Asked Questions (FAQs)
Why does a plain fetch return no deals from Groupon?
Because Groupon renders its deal cards client-side with JavaScript. The initial HTML is a frame that only fills in after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the cards empty. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Groupon?
The JS token. The normal token fetches static HTML, which on Groupon is the same empty frame a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the deal cards are present when BeautifulSoup parses them.
How do I scrape all the deals below the "Load more" button?
Groupon loads extra deals behind a button rather than a numbered page. Pass a css_click_selector option to the Crawling API with the button's CSS selector, and the API clicks it before capturing the page so the additional cards are in the HTML you parse. Confirm the button's current selector in your browser's dev tools, since that attribute changes over time.
My selectors return None for every card. What changed?
Almost certainly Groupon's markup. Its class names and data-testid attributes change without notice, so selectors that worked last month can break. Re-inspect a live deal in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
Can I scrape account or personal data from Groupon?
No, and this guide does not cover it. Account details, personal data, and anything behind a login sit outside public deal pages, so they are not in scope here and run against Groupon's terms. Scraping login-walled content, or bypassing authentication to reach it, is not part of this approach. Stick to public deal and category data.
How do I avoid getting blocked while scraping Groupon?
Keep your per-IP request rate low, vary your targets instead of looping one city, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
