Gumtree is one of the most-visited online classifieds sites in the UK, a single marketplace where people list cars, furniture, property, electronics, and jobs for their local area. Every search results page is a public, structured feed of what is for sale right now: each tile carries a title, an asking price, a location, and a link through to the full ad. That makes it a clean source for price comparison, regional demand research, and tracking how listings move over time.
This guide shows you how to scrape Gumtree classified listings with Python. You build a small, runnable scraper that fetches a Gumtree search page through the Crawling API, parses a clean record for each listing tile, handles pagination, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public listing data: the titles, prices, locations, and links anyone can see on a search results page without logging in.
What you will build
A Python script that takes a Gumtree search URL, retrieves the rendered results page through the Crawling API, and extracts a structured record per listing tile. The running example is a headset search, and the script pulls these fields from each tile:
- Title: the listing title shown on the tile, for example the brand and model of a headset.
- Price: the seller's asking price, when the tile shows one.
- Location: the area the item is listed in, useful for regional demand analysis.
- Link: the absolute URL to the listing's own ad page.
Why a plain request fails on Gumtree
If you point a bare HTTP client at a Gumtree search URL, you rarely get the tiles you came for. Two things work against you. First, Gumtree renders much of the results grid client-side: the server ships a lightweight shell and the listing tiles fill in as the page's JavaScript runs, so the initial HTML you get from a plain requests.get is often missing the very tiles you want to parse. Second, Gumtree watches for automated traffic. Datacenter IP ranges and request patterns that do not look like a real browser get met with a challenge page, a rate limit, or an outright block before you ever reach the listings.
So a working Gumtree scraper needs two things in one request: a browser that renders the page, and an IP that Gumtree reads as a real visitor. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the search URL, it renders the page behind a trusted residential IP, handles the rotation and any CAPTCHA, and returns finished HTML for you to parse.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the guide on web scraping with Python covers the level this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.
A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token. The free tier includes 1,000 requests with no card, which is plenty to build and test this scraper. Treat the token like a password and keep it out of version control.
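One way to keep the token out of version control is to read it from an environment variable when the script starts. This is a minimal sketch: the helper name load_token and the variable name CRAWLBASE_TOKEN are our own choices for illustration, not something the client library requires.

```python
import os


def load_token(var_name="CRAWLBASE_TOKEN"):
    """Return the API token from the environment, or raise a clear error.

    The variable name is an assumption for this sketch; any name works.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the scraper."
        )
    return token
```

With this in place, the client could be created as CrawlingAPI({"token": load_token()}) instead of pasting the token into the source file.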
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the listing tiles by CSS selector.
python --version
python -m venv gumtree_env
source gumtree_env/bin/activate
pip install crawlbase beautifulsoup4
On Windows, activate the environment with gumtree_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:
touch gumtree_scraper.py
Inspect the search page for selectors
Before writing any selectors, open a Gumtree search results page in your browser, right-click a listing tile, and choose Inspect. Gumtree wraps each result in an article element marked with a data-q="search-result" attribute, and exposes the fields inside that container with their own stable data-* attributes. Lean on those structural attributes rather than a brittle chain of generated class names: they survive cosmetic redesigns far better.
From inspecting a results page, these selectors are the right starting point:
- Title: a <div> with data-q="tile-title".
- Price: a <div> with data-testid="price".
- Location: a <div> with data-q="tile-location".
- Link: the href on the <a> tag with data-q="search-result-anchor", which is relative and needs the https://www.gumtree.com host prepended.
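The host-prepending step for the link can be sketched with the standard library's urljoin, which resolves a relative href against the site root and leaves an already-absolute URL untouched. The helper name absolute_link and the sample path in the usage note are illustrative, not taken from a live Gumtree page.

```python
from urllib.parse import urljoin

BASE = "https://www.gumtree.com"


def absolute_link(href, base=BASE):
    """Resolve a tile href against the site root; a missing href stays None."""
    if not href:
        return None
    return urljoin(base, href)
```

For example, absolute_link("/p/headphones/example-ad/123") resolves to a full https://www.gumtree.com/... URL, while a value that already starts with https:// comes back unchanged.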
Step 1: Fetch the rendered search page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the search URL, and request it. Checking the status code before you parse keeps failures loud instead of silent.
from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})


def crawl(page_url):
    # Wait for asynchronous content, then hold 3000 ms so late tiles render.
    options = {"ajax_wait": "true", "page_wait": 3000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None


if __name__ == "__main__":
    search_url = "https://www.gumtree.com/search?q=headset"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")
The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering tiles appear before the page is captured. Run the script and you should see real listing markup, not a challenge shell. That confirms rendering works before you write a single selector.
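Eyeballing the first 500 characters works, but a quick count of the tile attribute in the raw HTML is a more direct check that the grid rendered. This is a rough sketch: it assumes the attribute is serialized with double quotes exactly as shown in the selectors section, and the function name is our own.

```python
TILE_MARKER = 'data-q="search-result"'


def count_tile_markers(html):
    """Count occurrences of the result-tile attribute in raw HTML.

    A crude string check, not a parse: it returns 0 for an empty shell
    or for None, and does not match the longer search-result-anchor value.
    """
    return html.count(TILE_MARKER) if html else 0
```

If count_tile_markers(html) comes back as 0 on a 200 response, the page was probably captured before the tiles loaded, and a longer page_wait is worth trying before you touch the parser.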
Step 2: Parse the listing tiles with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup, find every result tile, and pull each field by its selector. Gumtree wraps each result in an article[data-q="search-result"] container and exposes the title, price, and location inside it on their own data-* attributes. Read the listing link from the tile's anchor and normalize it to an absolute URL. Wrap each tile in a try/except so one malformed listing does not crash the run.
from bs4 import BeautifulSoup

BASE = "https://www.gumtree.com"


def text_of(tile, selector):
    # Return the stripped text of the first match, or None if it is missing.
    el = tile.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_link(tile):
    a = tile.select_one('a[data-q="search-result-anchor"]')
    if not a or not a.get("href"):
        return None
    href = a["href"]
    # Gumtree serves relative hrefs, so prepend the host when needed.
    return href if href.startswith("http") else BASE + href


def scrape_gumtree_search(html):
    soup = BeautifulSoup(html, "html.parser")
    tiles = soup.select('article[data-q="search-result"]')
    results = []
    for tile in tiles:
        try:
            results.append({
                "title": text_of(tile, 'div[data-q="tile-title"]'),
                "price": text_of(tile, 'div[data-testid="price"]'),
                "location": text_of(tile, 'div[data-q="tile-location"]'),
                "link": parse_link(tile),
            })
        except Exception as e:
            print(f"Skipped a tile: {e}")
    return results
The text_of helper queries one element inside a tile and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every listing shows a clean price. The link comes from the a[data-q="search-result-anchor"] anchor and is normalized to an absolute URL, because Gumtree serves a relative href that points to the ad page.
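Because the price arrives as display text and is sometimes missing, sorting or averaging needs a numeric value. The sketch below assumes prices are written in pounds with an optional thousands separator and optional pence, which matches the sample output in this guide but is not a guarantee about every tile. The function name parse_price is our own.

```python
import re

# A pound sign, optional space, digits with optional commas, optional pence.
_PRICE_RE = re.compile(r"£\s*(\d[\d,]*(?:\.\d{1,2})?)")


def parse_price(text):
    """Convert a tile price string such as '£1,250.00' to a float.

    Returns None when the text is missing or holds no pound amount.
    """
    if not text:
        return None
    match = _PRICE_RE.search(text)
    if not match:
        return None
    return float(match.group(1).replace(",", ""))
```

Keeping the original string in the record and storing the parsed number in a separate field means nothing is lost when a tile shows an unexpected format.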
Gumtree's data-q and data-testid attributes are more durable than generated class names, but they are not permanent: a redesign can rename or remove them. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every tile, re-inspect the live search page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
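The "None for every tile" signal can be checked automatically after each run. A minimal sketch, assuming rows are the dictionaries produced by the parser in this guide; the function name empty_fields is hypothetical.

```python
FIELDS = ["title", "price", "location", "link"]


def empty_fields(rows, fields=FIELDS):
    """Return the fields that are missing (None or '') in every row.

    An empty result set flags every field, since nothing was extracted.
    """
    return [f for f in fields if all(not row.get(f) for row in rows)]
```

A non-empty return value is a prompt to re-inspect the live page: a single missing price is normal, but a field that is empty across a whole page usually means its selector has drifted.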
Step 3: Handle pagination and export JSON and CSV
One results page is a demo; a real job walks several. Gumtree paginates its search results with a page=N query parameter, so you can loop over a fixed number of pages, fetch each one, and collect the tiles. The script below wires the fetch, the parse, and the pagination loop into one runnable file, then writes the records to both JSON and CSV so you can load them into a notebook or a spreadsheet.
import csv
import json
import time

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.gumtree.com"
FIELDS = ["title", "price", "location", "link"]


def crawl(page_url):
    # Wait for asynchronous content, then hold 3000 ms so late tiles render.
    options = {"ajax_wait": "true", "page_wait": 3000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None


def text_of(tile, selector):
    el = tile.select_one(selector)
    return el.get_text(strip=True) if el else None


def parse_link(tile):
    a = tile.select_one('a[data-q="search-result-anchor"]')
    if not a or not a.get("href"):
        return None
    href = a["href"]
    return href if href.startswith("http") else BASE + href


def scrape_gumtree_search(html):
    soup = BeautifulSoup(html, "html.parser")
    tiles = soup.select('article[data-q="search-result"]')
    results = []
    for tile in tiles:
        try:
            results.append({
                "title": text_of(tile, 'div[data-q="tile-title"]'),
                "price": text_of(tile, 'div[data-testid="price"]'),
                "location": text_of(tile, 'div[data-q="tile-location"]'),
                "link": parse_link(tile),
            })
        except Exception as e:
            print(f"Skipped a tile: {e}")
    return results


def scrape_pages(base_url, max_pages):
    all_listings = []
    for page in range(1, max_pages + 1):
        # base_url already carries ?q=..., so the page number is appended with &.
        page_url = f"{base_url}&page={page}"
        html = crawl(page_url)
        if not html:
            break
        found = scrape_gumtree_search(html)
        if not found:
            break  # no tiles means we ran past the last page of results
        all_listings.extend(found)
        print(f"Page {page}: {len(found)} listings")
        time.sleep(2)
    return all_listings


def export(rows, name="gumtree_listings"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} listings to {name}.json and {name}.csv")


def main():
    base_url = "https://www.gumtree.com/search?q=headset"
    rows = scrape_pages(base_url, max_pages=5)
    export(rows)


if __name__ == "__main__":
    main()
Run the full script with python gumtree_scraper.py. It walks up to five pages of the headset search, parses one row per listing tile, and writes both gumtree_listings.json and gumtree_listings.csv. The scrape_pages loop appends &page=N to the search URL and breaks early when a request fails or a page returns no tiles, so you stop at the end of the results instead of fetching empty pages. The time.sleep(2) between requests paces the run. The shared FIELDS list fixes the CSV column order to match the dictionary keys, so the two exports never drift apart.
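Appending &page=N as a string only works when the search URL already has a query string. If you expect to pass in URLs of other shapes, a variant that sets the parameter through urllib.parse is a little safer. This is a sketch; the helper name with_page is our own, and it assumes the parameter is called page as described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its page query parameter set to the given number."""
    parts = urlsplit(url)
    # Drop any existing page value so repeated calls do not stack parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside scrape_pages, the f-string could then be swapped for page_url = with_page(base_url, page) without changing anything else in the loop.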
What the output looks like
You get a clean list of listing records, in page order, ready to write to JSON, CSV, or a database.
[
  {
    "title": "SteelSeries Arctis 7 Wireless Gaming Headset",
    "price": "£65.00",
    "location": "Manchester, Greater Manchester",
    "link": "https://www.gumtree.com/p/headphones/steelseries-arctis-7/1488114476"
  },
  {
    "title": "Sony WH-1000XM4 Noise Cancelling Headphones",
    "price": "£180.00",
    "location": "Leeds, West Yorkshire",
    "link": "https://www.gumtree.com/p/headphones/sony-wh-1000xm4/1483456978"
  }
]
The CSV mirrors these rows with one line per listing under a title,price,location,link header. From there you can sort by price, group by location to see where a given item is most common, or diff successive runs to track how listings and asking prices move over time. For a worked example of turning listing exports into a comparison view, see e-commerce web scraping.
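Diffing successive runs can be as simple as keying each run by the listing link. The sketch below assumes the link is a stable identifier for an ad between runs, which is a reasonable working assumption rather than something Gumtree documents; the function name diff_runs is hypothetical.

```python
def diff_runs(previous, current):
    """Compare two lists of listing dicts, matching records on their link."""
    old = {row["link"]: row for row in previous if row.get("link")}
    new = {row["link"]: row for row in current if row.get("link")}
    return {
        # Links seen only in the newer run.
        "added": [new[k] for k in new if k not in old],
        # Links that dropped out since the older run.
        "removed": [old[k] for k in old if k not in new],
        # Links present in both runs whose price text differs.
        "price_changed": [
            {"link": k, "old": old[k].get("price"), "new": new[k].get("price")}
            for k in new
            if k in old and old[k].get("price") != new[k].get("price")
        ],
    }
```

Load two exported JSON files with json.load, pass the lists in oldest first, and the result tells you which ads appeared, which disappeared, and which changed their asking price.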
Scaling beyond one search
The same pattern extends to as many searches as you need. Keep a map of query names to search URLs, loop over it, and run the paginated scraper for each one, keying the output by query so the lists stay separate. If you want richer records, follow each tile's link to its ad page and parse the fuller fields there (the full description, the date listed, image URLs), then merge those into the listing record. Either way, keep the request rate modest and lean on the Crawling API's rotation so a fan-out across many searches does not trip a rate limit.
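The map-of-searches idea might look like the sketch below. To keep it self-contained, the scraping function is passed in as an argument; in practice that would be scrape_pages from the full script. The name scrape_many and the example labels are our own.

```python
def scrape_many(searches, scrape_fn, max_pages=3):
    """Run scrape_fn(url, max_pages) for each named search.

    searches maps a label to a search URL. The result maps the same label
    to whatever list scrape_fn returns, so the queries stay separate.
    """
    results = {}
    for name, url in searches.items():
        try:
            results[name] = scrape_fn(url, max_pages)
        except Exception as exc:
            # One failing search should not abort the remaining ones.
            print(f"{name} failed: {exc}")
            results[name] = []
    return results
```

A call such as scrape_many({"headset": "https://www.gumtree.com/search?q=headset"}, scrape_pages) returns a dictionary with one list of listings per label, which you can export to one file per query.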
For local market research the location field is the lever: filtering by area, or scraping the same query across several city searches, lets you compare prices and supply region by region. That is exactly the kind of regional signal Gumtree's public listings are well suited to, and it is why classifieds data is worth collecting in a structured form rather than eyeballing the site.
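As a rough illustration of the region-by-region comparison, the sketch below groups records by their location string and reports the listing count and median price for each. It assumes prices are plain amounts like the ones in the sample output and skips rows without a usable price or location; the function name is hypothetical.

```python
import re
from collections import defaultdict
from statistics import median


def median_price_by_location(rows):
    """Group listings by location and return {location: (count, median_price)}."""
    buckets = defaultdict(list)
    for row in rows:
        raw = row.get("price") or ""
        # Pull the first number out of the price text, e.g. '£1,250.00'.
        match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
        if row.get("location") and match:
            buckets[row["location"]].append(float(match.group().replace(",", "")))
    return {loc: (len(prices), median(prices)) for loc, prices in buckets.items()}
```

The median is used rather than the mean because a single mispriced or placeholder listing can skew an average badly on small samples.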
Staying unblocked
Even with rendering handled, Gumtree watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any classifieds or marketplace target.
- Pace your requests. Spread requests out with a delay between pages and searches rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Gumtree's servers.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Retain only what you need. Store the public listing fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
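Pacing can also cover failures: rather than retrying a failed page immediately, wait longer after each attempt. The sketch below wraps any fetch function that returns HTML on success and None on failure, such as crawl in this guide. The retry count and delays are illustrative defaults, not values recommended by Gumtree or Crawlbase.

```python
import random
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns a truthy value or retries run out.

    fetch is any callable that returns HTML on success and None on failure.
    sleep is injectable so the wait can be stubbed out in tests.
    """
    for attempt in range(retries + 1):
        html = fetch(url)
        if html:
            return html
        if attempt < retries:
            # Double the wait each time and add jitter so retries do not align.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

In the pagination loop, html = fetch_with_backoff(crawl, page_url) would give a transient failure a few chances before the loop gives up on that page.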
For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. If you want to deepen the BeautifulSoup side, the guide on using BeautifulSoup in Python covers the parsing library in detail.
Is it legal to scrape Gumtree?
Whether scraping Gumtree is allowed depends on Gumtree's Terms of Service, your jurisdiction, and what you do with the data. Gumtree's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Gumtree's Terms of Service and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.
A few lines worth holding to. Collect only public listing data: the titles, prices, locations, and ad links that anyone can see on a search results page without an account. Do not collect personal data of private sellers: names, phone numbers, email addresses, or anything that could identify an individual, even when it appears on a public ad. Keep your request volume low enough that you are not straining Gumtree's servers, and never scrape anything behind a login or a contact-reveal step that exists to gate a seller's details.
This guide is deliberately scoped to public search-listing fields because that is the line that keeps the work defensible. It does not cover account or messaging data, a seller's personal contact details, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. If your project needs more than public listing data, the right path is permission or a data agreement, not a cleverer scraper. Where a site offers an official API or data feed, prefer it for licensed or bulk access.
Key takeaways
- Gumtree search pages are a public classifieds feed. Each tile carries a title, price, location, and ad link, which is why the data is useful for price comparison and regional demand research.
- You need rendering and a trusted IP together. Gumtree fills the results grid client-side and challenges bot traffic, so the Crawling API renders the page behind a residential IP in one call.
- BeautifulSoup does the extraction. Loop article[data-q="search-result"] tiles and map title, price, location, and link from their data-* attributes, and expect those attributes to drift.
- Pagination is a query parameter. Append &page=N to walk multiple pages, break when a page returns no tiles, and pace requests with a short delay.
- Stay on public listing data. Respect Gumtree's Terms of Service and robots.txt, and never collect a private seller's personal contact details or anything behind a login.
Frequently Asked Questions (FAQs)
Why does a plain request return no listings from Gumtree?
Two reasons. Gumtree fills much of the results grid client-side as the page loads, so a raw requests.get often gets a shell missing the listing tiles. On top of that, Gumtree challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it.
What data can I scrape from a Gumtree search page?
From each result tile you can read the public fields the listing shows: the title, the asking price, the location, and the link to the full ad. This walkthrough extracts exactly those four. Following the link to the ad page gives you fuller fields such as the description, date listed, and image URLs, but keep collection to public listing content and avoid a private seller's personal contact details.
How do I handle pagination on Gumtree?
Gumtree paginates search results with a page query parameter, so you append &page=2, &page=3, and so on to the search URL and fetch each one. The scrape_pages function in this guide loops over a page range, stops early when a page returns no tiles, and adds a short delay between requests so you are not hammering the site.
Which selectors does the scraper use?
Each result is an article[data-q="search-result"]. Inside it, the title is div[data-q="tile-title"], the price is div[data-testid="price"], the location is div[data-q="tile-location"], and the ad link is the href on a[data-q="search-result-anchor"]. These data-* attributes are more durable than generated class names, but re-inspect the live page if a field starts returning empty.
Can I export the data to a spreadsheet?
Yes. The export function writes both a JSON file and a CSV with a title,price,location,link header, one row per listing. Open the CSV directly in Excel, Google Sheets, or any spreadsheet tool, or load the JSON into a pandas DataFrame or a notebook for analysis and charting.
How do I avoid getting blocked while scraping Gumtree?
Keep your per-IP request rate low, add a delay between pages and searches, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
