How to Scrape Gumtree Data: classified listings, prices, and locations

Gumtree is one of the most-visited online classifieds sites in the UK, a single marketplace where people list cars, furniture, property, electronics, and jobs for their local area. Every search results page is a public, structured feed of what is for sale right now: each tile carries a title, an asking price, a location, and a link through to the full ad. That makes it a clean source for price comparison, regional demand research, and tracking how listings move over time.

This guide shows you how to scrape Gumtree classified listings with Python. You build a small, runnable scraper that fetches a Gumtree search page through the Crawling API, parses a clean record for each listing tile, handles pagination, and exports the results to JSON and CSV. The whole walkthrough stays scoped to public listing data: the titles, prices, locations, and links anyone can see on a search results page without logging in.

What you will build

A Python script that takes a Gumtree search URL, retrieves the rendered results page through the Crawling API, and extracts a structured record per listing tile. We use a headset search as the running example, the same query the legacy walkthrough used, and pull these fields from each tile:

Title: the listing title shown on the tile, for example the make and model of a headset.
Price: the seller's asking price, when the tile shows one.
Location: the area the item is listed in, useful for regional demand analysis.
Link: the absolute URL to the listing's own ad page.

Why a plain request fails on Gumtree

If you point a bare HTTP client at a Gumtree search URL, you rarely get the tiles you came for. Two things work against you. First, Gumtree renders much of the results grid client-side: the server ships a lightweight shell and the listing tiles fill in as the page's JavaScript runs, so the initial HTML you get from a plain requests.get is often missing the very tiles you want to parse. Second, Gumtree watches for automated traffic. Datacenter IP ranges and request patterns that do not look like a real browser get met with a challenge page, a rate limit, or an outright block before you ever reach the listings.

So a working Gumtree scraper needs two things in one request: a browser that renders the page, and an IP that Gumtree reads as a real visitor. You can assemble that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into a single call: you send it the search URL, it renders the page behind a trusted residential IP, handles the rotation and any CAPTCHA, and returns finished HTML for you to parse.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to the language, the guide on web scraping with Python covers the level this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version (or python3 --version). If you do not have it, install it from python.org and make sure Python is on your system PATH.

A Crawlbase account and token. Sign up for a free account, open your dashboard, and copy your token. The free tier includes up to 20,000 requests with no card, which is plenty to build and test this scraper. Treat the token like a password and keep it out of version control.
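
One way to keep the token out of version control is to read it from an environment variable when the script starts. The sketch below assumes a variable named CRAWLBASE_TOKEN; that name and the load_token helper are choices made for this example, not something the crawlbase library requires.

```python
import os


def load_token(var_name="CRAWLBASE_TOKEN"):
    """Return the token stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name fits your own setup.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

You would export the variable in your shell before running the script and pass load_token() to the client in place of a hard-coded string.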

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs. crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull each field out of the listing tiles by CSS selector.

bash

python --version

python -m venv gumtree_env
source gumtree_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate the environment with gumtree_env\Scripts\activate instead of the source line. With both libraries installed, create the script file the rest of the guide builds up:

bash

touch gumtree_scraper.py

Inspect the search page for selectors

Before writing any selectors, open a Gumtree search results page in your browser, right-click a listing tile, and choose Inspect. Gumtree wraps each result in an article element marked with a data-q="search-result" attribute, and exposes the fields inside that container with their own stable data-* attributes. Lean on those structural attributes rather than a brittle chain of generated class names: they survive cosmetic redesigns far better.

From inspecting a results page, these are the selectors the legacy walkthrough relied on, and they remain the right starting point:

Title: a <div> with data-q="tile-title".
Price: a <div> with data-testid="price".
Location: a <div> with data-q="tile-location".
Link: the href on the <a> tag with data-q="search-result-anchor", which is relative and needs the https://www.gumtree.com host prepended.

Step 1: Fetch the rendered search page

Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, set the search URL, and request it. Checking the status code before you parse keeps failures loud instead of silent.

python

from crawlbase import CrawlingAPI

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 3000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None

if __name__ == "__main__":
    search_url = "https://www.gumtree.com/search?q=headset"
    html = crawl(search_url)
    print(html[:500] if html else "No HTML returned")

The two wait options matter for a grid that fills in as the page loads. ajax_wait tells the API to wait for asynchronous content to finish, and page_wait holds for a fixed number of milliseconds after load so the late-rendering tiles appear before the page is captured. Crawlbase documents both as options for its JavaScript token, so use that token here rather than the normal one. Run the script and you should see real listing markup, not a challenge shell. That confirms rendering works before you write a single selector.
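
If you would rather not judge the markup by eye, you can count the result containers in the returned HTML. This is a minimal sketch using only the standard library. It assumes the article[data-q="search-result"] marker described in this guide is still current, and count_result_tiles is a helper name introduced here for illustration.

```python
from html.parser import HTMLParser


class TileCounter(HTMLParser):
    """Count <article data-q="search-result"> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "article" and dict(attrs).get("data-q") == "search-result":
            self.count += 1


def count_result_tiles(html):
    """Return how many result tiles the HTML contains (0 for empty input)."""
    counter = TileCounter()
    counter.feed(html or "")
    counter.close()
    return counter.count
```

A count of zero on a page you expected to be full suggests either a challenge page or a grid that had not finished rendering when the page was captured.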

Step 2: Parse the listing tiles with BeautifulSoup

With rendered HTML in hand, load it into BeautifulSoup, find every result tile, and pull each field by its selector. Gumtree wraps each result in an article[data-q="search-result"] container and exposes the title, price, and location inside it on their own data-* attributes. Read the listing link from the tile's anchor and normalize it to an absolute URL. Wrap each tile in a try/except so one malformed listing does not crash the run.

python

from bs4 import BeautifulSoup

BASE = "https://www.gumtree.com"

def text_of(tile, selector):
    el = tile.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_link(tile):
    a = tile.select_one('a[data-q="search-result-anchor"]')
    if not a or not a.get("href"):
        return None
    href = a["href"]
    return href if href.startswith("http") else BASE + href

def scrape_gumtree_search(html):
    soup = BeautifulSoup(html, "html.parser")
    tiles = soup.select('article[data-q="search-result"]')
    results = []
    for tile in tiles:
        try:
            results.append({
                "title": text_of(tile, 'div[data-q="tile-title"]'),
                "price": text_of(tile, 'div[data-testid="price"]'),
                "location": text_of(tile, 'div[data-q="tile-location"]'),
                "link": parse_link(tile),
            })
        except Exception as e:
            print(f"Skipped a tile: {e}")
    return results

The text_of helper queries one element inside a tile and returns None when it is missing, instead of throwing on a .get_text() call against nothing. That keeps extraction resilient when a field is absent, which is common since not every listing shows a clean price. The link comes from the a[data-q="search-result-anchor"] anchor and is normalized to an absolute URL, because Gumtree serves a relative href that points to the ad page.
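
Because the price is returned as display text, sorting or averaging needs a numeric value. The sketch below is one possible normalizer, assuming prices are shown in pounds in a form like "£65.00" (the format used in this guide's sample output); parse_price is a hypothetical helper, and any text without a pound amount comes back as None.

```python
import re
from decimal import Decimal

# Matches a pound amount such as "£65", "£65.00" or "£1,250.50".
PRICE_RE = re.compile(r"£\s*(\d[\d,]*(?:\.\d{1,2})?)")


def parse_price(text):
    """Return the first pound amount in text as a Decimal, or None."""
    if not text:
        return None
    match = PRICE_RE.search(text)
    if not match:
        return None
    # Strip thousands separators before converting.
    return Decimal(match.group(1).replace(",", ""))
```

Decimal is used instead of float so that currency values compare and sum without rounding surprises.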

Selectors drift

Gumtree's data-q and data-testid attributes are more durable than generated class names, but they are not permanent: a redesign can rename or remove them. Treat the selectors above as a starting template, not a contract. When a field comes back as None for every tile, re-inspect the live search page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper.
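
A small check can flag that situation automatically at the end of a run. This sketch reports any field that is None in every record; empty_fields is a name introduced for this example, and the default field list simply mirrors the four keys used in this guide.

```python
def empty_fields(records, fields=("title", "price", "location", "link")):
    """Return the fields that are missing or None in every record."""
    if not records:
        # No records at all: every field is effectively empty.
        return list(fields)
    return [f for f in fields if all(r.get(f) is None for r in records)]
```

A field that is empty across a whole page of results is a stronger hint of a renamed selector than a field missing from one or two tiles.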

Step 3: Handle pagination and export JSON and CSV

One results page is a demo; a real job walks several. Gumtree paginates its search results with a page=N query parameter. The search URL already carries a ?q= query string, so you append &page=N, loop over a fixed number of pages, fetch each one, and collect the tiles. Now wire the fetch, the parse, and the pagination loop into one runnable script, then write the records to both JSON and CSV so you can load them into a notebook or a spreadsheet.

python

import csv
import json
import time
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})
BASE = "https://www.gumtree.com"
FIELDS = ["title", "price", "location", "link"]

def crawl(page_url):
    options = {"ajax_wait": "true", "page_wait": 3000}
    response = api.get(page_url, options)
    if response["status_code"] == 200:
        return response["body"].decode("utf-8", "ignore")
    print(f"Request failed: {response['status_code']}")
    return None

def text_of(tile, selector):
    el = tile.select_one(selector)
    return el.get_text(strip=True) if el else None

def parse_link(tile):
    a = tile.select_one('a[data-q="search-result-anchor"]')
    if not a or not a.get("href"):
        return None
    href = a["href"]
    return href if href.startswith("http") else BASE + href

def scrape_gumtree_search(html):
    soup = BeautifulSoup(html, "html.parser")
    tiles = soup.select('article[data-q="search-result"]')
    results = []
    for tile in tiles:
        try:
            results.append({
                "title": text_of(tile, 'div[data-q="tile-title"]'),
                "price": text_of(tile, 'div[data-testid="price"]'),
                "location": text_of(tile, 'div[data-q="tile-location"]'),
                "link": parse_link(tile),
            })
        except Exception as e:
            print(f"Skipped a tile: {e}")
    return results

def scrape_pages(base_url, max_pages):
    all_listings = []
    for page in range(1, max_pages + 1):
        page_url = f"{base_url}&page={page}"
        html = crawl(page_url)
        if not html:
            break
        found = scrape_gumtree_search(html)
        if not found:
            break
        all_listings.extend(found)
        print(f"Page {page}: {len(found)} listings")
        time.sleep(2)
    return all_listings

def export(rows, name="gumtree_listings"):
    with open(f"{name}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)
    with open(f"{name}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} listings to {name}.json and {name}.csv")

def main():
    base_url = "https://www.gumtree.com/search?q=headset"
    rows = scrape_pages(base_url, max_pages=5)
    export(rows)

if __name__ == "__main__":
    main()

Run the full script with python gumtree_scraper.py. It walks up to five pages of the headset search, parses one row per listing tile, and writes both gumtree_listings.json and gumtree_listings.csv. The scrape_pages loop appends &page=N to the search URL, breaks early when a page returns no tiles (so you stop at the end of the results instead of fetching empty pages), and pauses for two seconds between requests to pace the run. The shared FIELDS list keeps the CSV column order in step with the dictionary keys, so the two exports never drift apart.
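
Appending &page=N works because the example URL already has a ?q= query string. If you pass in a search URL with no query string, or one that already contains a page value, plain string concatenation produces a malformed or duplicated parameter. The sketch below shows one way to set the parameter with the standard library; with_page is a hypothetical helper, not part of the script above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(search_url, page):
    """Return search_url with its page query parameter set to page."""
    parts = urlsplit(search_url)
    # Drop any existing page value so the parameter is never duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside scrape_pages you could then build page_url with with_page(base_url, page) instead of the f-string.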

What the output looks like

You get a clean list of listing records, in page order, ready to write to JSON, CSV, or a database.

json

[
  {
    "title": "SteelSeries Arctis 7 Wireless Gaming Headset",
    "price": "£65.00",
    "location": "Manchester, Greater Manchester",
    "link": "https://www.gumtree.com/p/headphones/steelseries-arctis-7/1488114476"
  },
  {
    "title": "Sony WH-1000XM4 Noise Cancelling Headphones",
    "price": "£180.00",
    "location": "Leeds, West Yorkshire",
    "link": "https://www.gumtree.com/p/headphones/sony-wh-1000xm4/1483456978"
  }
]

The CSV mirrors these rows with one line per listing under a title,price,location,link header. From there you can sort by price, group by location to see where a given item is most common, or diff successive runs to track how listings and asking prices move over time. For a worked example of turning listing exports into a comparison view, see e-commerce web scraping.
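
Diffing successive runs can be sketched in a few lines. The example assumes each run is a list of records like the ones above and treats the link as the stable identity of a listing; diff_runs is a name introduced here, and the price comparison is on the raw display text.

```python
def diff_runs(previous, current):
    """Compare two lists of listing records, keyed by their link."""
    old = {r["link"]: r for r in previous if r.get("link")}
    new = {r["link"]: r for r in current if r.get("link")}
    return {
        # Listings that appear only in the newer run.
        "added": [new[k] for k in new if k not in old],
        # Listings that have dropped out since the older run.
        "removed": [old[k] for k in old if k not in new],
        # Listings present in both runs whose price text differs.
        "price_changed": [
            {"link": k, "old": old[k].get("price"), "new": new[k].get("price")}
            for k in new
            if k in old and old[k].get("price") != new[k].get("price")
        ],
    }
```

To use it, load two saved JSON exports with json.load and pass the older list first.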

Scaling beyond one search

The same pattern extends to as many searches as you need. Keep a map of query names to search URLs, loop over it, and run the paginated scraper for each one, keying the output by query so the lists stay separate. If you want richer records, follow each tile's link to its ad page and parse the fuller fields there (the full description, the date listed, image URLs), then merge those into the listing record. Either way, keep the request rate modest and lean on the Crawling API's rotation so a fan-out across many searches does not trip a rate limit.
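
The query map can be as simple as a dictionary. The sketch below takes the scraping function as an argument so it works with any callable shaped like scrape_pages(url, max_pages); run_searches, the pause length, and the labels in the usage note are assumptions made for illustration.

```python
import time


def run_searches(searches, scrape, max_pages=3, pause=2.0):
    """Run scrape(url, max_pages) for each named search URL.

    searches maps a label to a search URL; results are keyed by label
    so the lists for different queries stay separate.
    """
    results = {}
    for index, (name, url) in enumerate(searches.items()):
        if index:
            time.sleep(pause)  # pause between searches, not before the first
        results[name] = scrape(url, max_pages)
    return results
```

With the script from this guide you might call run_searches({"headset": "https://www.gumtree.com/search?q=headset"}, scrape_pages) and add further entries to the dictionary as needed.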

For local market research the location field is the lever: filtering by area, or scraping the same query across several city searches, lets you compare prices and supply region by region. That is exactly the kind of regional signal Gumtree's public listings are well suited to, and it is why classifieds data is worth collecting in a structured form rather than eyeballing the site.
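
A regional comparison can start with a simple group-by. This sketch assumes each record already has a numeric price stored under a hypothetical price_gbp key (the scraper in this guide stores the price as text, so you would convert it first) and reports the listing count and median price per location.

```python
from collections import defaultdict
from statistics import median


def median_price_by_location(records, price_key="price_gbp"):
    """Map each location to a (listing count, median price) pair."""
    buckets = defaultdict(list)
    for record in records:
        price = record.get(price_key)
        # Skip records without a location or without a usable price.
        if record.get("location") and price is not None:
            buckets[record["location"]].append(price)
    return {loc: (len(prices), median(prices)) for loc, prices in buckets.items()}
```

The median is used rather than the mean because a single mispriced or placeholder listing can skew an average badly on small samples.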

Staying unblocked

Even with rendering handled, Gumtree watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any classifieds or marketplace target.

Pace your requests. Spread requests out with a delay between pages and searches rather than crawling everything at full speed. Schedule heavier jobs during off-peak hours to ease load on Gumtree's servers.
Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
Retain only what you need. Store the public listing fields your project uses and discard the rest. Re-check your selectors periodically so the scraper keeps pace with markup changes.
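
Pacing also applies to failures: retrying immediately after a failed request tends to make things worse. The sketch below wraps any fetch function that returns HTML on success and None on failure, as the crawl function in this guide does, and waits longer after each failed attempt. fetch_with_backoff and its default values are assumptions chosen for this example.

```python
import time


def fetch_with_backoff(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML, waiting longer after each failure."""
    for attempt in range(attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < attempts - 1:
            # The delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

The sleep parameter exists so the delay can be replaced in tests; in normal use you would leave it at its default and call fetch_with_backoff(crawl, page_url).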

For the broader playbook on avoiding blocks, see how to scrape websites without getting blocked, and for more on why rendering matters here, how to crawl JavaScript websites. If you want to deepen the BeautifulSoup side, the guide on using BeautifulSoup in Python covers the parsing library in detail.

Is it legal to scrape Gumtree?

Whether scraping Gumtree is allowed depends on Gumtree's Terms of Service, your jurisdiction, and what you do with the data. Gumtree's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Gumtree's Terms of Service and its robots.txt, and treat both as the boundary for what you collect. For commercial or competitive use, the legal picture gets more complex, and consulting a legal expert about your specific case is the sensible move.

A few lines worth holding to. Collect only public listing data: the titles, prices, locations, and ad links that anyone can see on a search results page without an account. Do not collect personal data of private sellers: names, phone numbers, email addresses, or anything that could identify an individual, even when it appears on a public ad. Keep your request volume low enough that you are not straining Gumtree's servers, and never scrape anything behind a login or a contact-reveal step that exists to gate a seller's details.

This guide is deliberately scoped to public search-listing fields because that is the line that keeps the work defensible. It does not cover account or messaging data, a seller's personal contact details, or any attempt to bypass authentication or a CAPTCHA you are not entitled to pass. If your project needs more than public listing data, the right path is permission or a data agreement, not a cleverer scraper. Where a site offers an official API or data feed, prefer it for licensed or bulk access.

Recap

Key takeaways

Gumtree search pages are a public classifieds feed. Each tile carries a title, price, location, and ad link, which is why the data is useful for price comparison and regional demand research.
You need rendering and a trusted IP together. Gumtree fills the results grid client-side and challenges bot traffic, so the Crawling API renders the page behind a residential IP in one call.
BeautifulSoup does the extraction. Loop article[data-q="search-result"] tiles and map title, price, location, and link from their data-* attributes, and expect those attributes to drift.
Pagination is a query parameter. Append &page=N to walk multiple pages, break when a page returns no tiles, and pace requests with a short delay.
Stay on public listing data. Respect Gumtree's Terms of Service and robots.txt, and never collect a private seller's personal contact details or anything behind a login.

Frequently Asked Questions (FAQs)

Why does a plain request return no listings from Gumtree?

Two reasons. Gumtree fills much of the results grid client-side as the page loads, so a raw requests.get often gets a shell missing the listing tiles. On top of that, Gumtree challenges or blocks traffic that does not look like a real browser. Rendering the page through the Crawling API behind a trusted IP solves both, which is why the scraper here routes its request through it.

What data can I scrape from a Gumtree search page?

From each result tile you can read the public fields the listing shows: the title, the asking price, the location, and the link to the full ad. This walkthrough extracts exactly those four. Following the link to the ad page gives you fuller fields such as the description, date listed, and image URLs, but keep collection to public listing content and avoid a private seller's personal contact details.

How do I handle pagination on Gumtree?

Gumtree paginates search results with a page=N query parameter. The search URL already has a ?q= query string, so you append &page=2, &page=3, and so on, and fetch each one. The scrape_pages function in this guide loops over a page range, stops early when a page returns no tiles, and adds a short delay between requests so you are not hammering the site.

Which selectors does the scraper use?

Each result is an article[data-q="search-result"]. Inside it, the title is div[data-q="tile-title"], the price is div[data-testid="price"], the location is div[data-q="tile-location"], and the ad link is the href on a[data-q="search-result-anchor"]. These data-* attributes are more durable than generated class names, but re-inspect the live page if a field starts returning empty.

Can I export the data to a spreadsheet?

Yes. The export function writes both a JSON file and a CSV with a title,price,location,link header, one row per listing. Open the CSV directly in Excel, Google Sheets, or any spreadsheet tool, or load the JSON into a pandas DataFrame or a notebook for analysis and charting.

How do I avoid getting blocked while scraping Gumtree?

Keep your per-IP request rate low, add a delay between pages and searches, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation, a trusted IP pool, and CAPTCHA handling for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.
