The Ultimate Fighting Championship publishes a deep archive of public fight data on ufcstats.com: every completed event, every bout, and a per-fighter breakdown of strikes, takedowns, and results. For anyone building a fight-prediction model, a betting-research dataset, or a simple "who beat whom" reference, that public statistics page is a goldmine, and copying numbers out of it by hand stops working the moment you care about more than one card.
This guide shows the best way to scrape UFC stats with Python: you will fetch the rendered stats pages through the Crawling API, parse the tables with BeautifulSoup, and assemble a clean record for each fighter and bout. The whole walkthrough stays scoped to public sports statistics (fighter names, records, results, and fight metrics). It does not touch social media, personal data, or anything behind a login, and the legality section near the end is not boilerplate, so read it before you point this at real volume.
What you will build
A Python script that takes a public ufcstats.com fighter or event URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record. We will use a fighter's public details page as the running example and pull these fields:
- Fighter name: the fighter the page belongs to, for example "Jon Jones".
- Record: the career win-loss-draw line, such as "27-1-0".
- Event and date: the card each listed bout was part of, with its date.
- Opponent: the other fighter in each bout.
- Result: the outcome from this fighter's perspective: win, loss, or draw.
- Key stats: significant strikes and takedowns recorded for the bout.
Why a plain request struggles on ufcstats.com
You can sometimes pull the static markup from ufcstats.com with a bare HTTP client, but a raw scraper runs into two recurring problems on any public sports site. First, the layout leans on tables whose rows load alongside scripts and styling, so a naive fetch can hand you a partial page or a stripped-down shell that is missing the stat columns you came for. Second, sites watch for automated traffic: a stream of requests from a single datacenter IP, all hitting the same path in a tight loop, gets throttled or challenged long before you finish a season of events.
So a working UFC stats scraper needs two things in one request: a fetch that returns the fully formed page, and an IP the site reads as a normal visitor. You can build that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it fetches the page behind a trusted IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Many ufcstats.com pages return their tables in the static HTML, so start with the normal token. If a stat column comes back empty, switch to the JS token and the rendered markup will include the late-loading content.
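The normal-first, JS-second approach can be written as a small wrapper. This is a minimal sketch under stated assumptions, not part of the Crawlbase client: `fetch_normal` and `fetch_js` are hypothetical callables that fetch a URL with the normal and the JS token respectively, and `has_stats` is a hypothetical check you would write against the columns you need.

```python
def fetch_with_fallback(url, fetch_normal, fetch_js, has_stats):
    """Try the static fetch first; fall back to the rendered one.

    fetch_normal / fetch_js: hypothetical callables that take a URL and
    return HTML text (or None on failure), one per token type.
    has_stats: hypothetical predicate that returns True when the HTML
    contains the stat columns you came for.
    """
    html = fetch_normal(url)
    if html and has_stats(html):
        return html, "normal"
    # Static HTML was missing or incomplete, so retry with browser rendering.
    html = fetch_js(url)
    if html and has_stats(html):
        return html, "js"
    return None, None
```

Returning which token produced the page makes it easy to log how often the fallback fires, which tells you whether the normal token is enough for most of your run.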
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If you are new to parsing HTML, the guide to BeautifulSoup in Python covers the selector work this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and token. Sign up, open your dashboard, and copy your token from the account page. Treat the token like a password: it authenticates your requests, so keep it out of version control.
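One common way to keep the token out of version control is to read it from an environment variable at startup. The sketch below assumes a variable named `CRAWLBASE_TOKEN`; that name and the `load_token` helper are illustrative choices, not something the Crawlbase client requires.

```python
import os

def load_token(var_name="CRAWLBASE_TOKEN"):
    """Read the API token from an environment variable instead of source code."""
    # var_name is an assumed convention; pick any name you like.
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the scraper."
        )
    return token
```

You would then pass `load_token()` wherever the examples show the "YOUR_CRAWLBASE_TOKEN" placeholder.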
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the two libraries the scraper needs.
    python --version
    python -m venv ufc_env
    source ufc_env/bin/activate
    pip install crawlbase beautifulsoup4
On Windows, activate the environment with ufc_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by CSS selector.
Step 1: Fetch the rendered stats page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your token, and request a public stats URL (the completed-events index in this first test). Checking the status code before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

    def crawl(page_url):
        # Fetch the page through the Crawling API and return its HTML as text.
        response = api.get(page_url)
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        # Report the failure instead of passing an error page to the parser.
        print(f"Request failed: {response['status_code']}")
        return None

    if __name__ == "__main__":
        page_url = "http://www.ufcstats.com/statistics/events/completed"
        html = crawl(page_url)
        print(html[:500] if html else "No HTML returned")
Rather than hand-building the request URL and URL-encoding the target yourself, you pass the page URL to api.get and let the official client handle the encoding and the token. Run the script with python scraper.py and you should see real stats markup (the completed-events table), not an empty shell. That confirms the fetch works before you write a single selector. Swap the example URL for a fighter details URL like http://www.ufcstats.com/fighter-details/<id> to target one fighter's page.
That single api.get call is doing more than it looks. Behind it, the Crawling API fetches the ufcstats.com page through a trusted, rotating residential IP and hands you finished HTML, so you skip running a headless browser fleet and maintaining a proxy pool just to read a public stats table. Point it at a fighter page on the free tier first.
Step 2: Parse the fighter fields with BeautifulSoup
With the HTML in hand, load it into BeautifulSoup and pull each field. A ufcstats.com fighter page lays the name and the career record near the top, then lists every bout in a results table where each row carries the opponent, the event, the date, the outcome, and the per-bout metrics. Wrap the extraction in small helpers so one missing field does not crash the run.
    from bs4 import BeautifulSoup

    def text_of(node):
        # Trimmed text of a node, or None when the node is missing.
        return node.get_text(strip=True) if node else None

    def scrape_fighter(html):
        soup = BeautifulSoup(html, "html.parser")
        # Name and record share the title block but sit in separate elements.
        name = text_of(soup.select_one(".b-content__title-highlight"))
        record = text_of(soup.select_one(".b-content__title-record"))

        bouts = []
        rows = soup.select("tr.b-fight-details__table-row")
        for row in rows:
            cells = row.select("td.b-fight-details__table-col")
            if len(cells) < 10:
                continue  # header, spacer, or otherwise incomplete row
            # Multi-value cells hold one <p> per fighter; the page's fighter is first.
            fighters = [text_of(p) for p in cells[1].select("p")]
            opponent = fighters[1] if len(fighters) > 1 else None
            strikes = [text_of(p) for p in cells[3].select("p")]
            takedowns = [text_of(p) for p in cells[4].select("p")]
            # The event cell holds the event name first, then the date.
            event = [text_of(p) for p in cells[6].select("p")]
            bouts.append({
                "result": text_of(cells[0]),
                "opponent": opponent,
                "sig_strikes": strikes[0] if strikes else None,
                "takedowns": takedowns[0] if takedowns else None,
                "event": event[0] if event else None,
                "date": event[1] if len(event) > 1 else None,
            })

        return {"name": name, "record": record, "bouts": bouts}

The text_of helper does one useful thing: it returns the trimmed text of a node, or None when the node is missing, so a .get_text() call never throws against nothing. The fighter page puts the name and the win-loss-draw record in the same title block but in separate elements, so the name reads from .b-content__title-highlight and the record from .b-content__title-record. The record arrives with its label attached ("Record: 27-1-0"), which you can strip downstream. Each results row lives in a tr.b-fight-details__table-row, and its cells follow a stable order: outcome, the two fighters, knockdowns, significant strikes, takedowns, submission attempts, then the event and date. That is why the parser reads strikes from cells[3], takedowns from cells[4], and the event from cells[6]. Indexing the cells by position keeps the parser readable while pulling the result, opponent, key stats, and event for every bout. If you want a refresher on choosing robust selectors, the guide to XPath and CSS selectors is a good companion.
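The record field arrives as text, not numbers. A minimal sketch for turning a string like "Record: 27-1-0" into integer counts follows. The `parse_record` name is hypothetical, and the optional "(1 NC)" no-contest suffix is an assumption about how some records are written, so check it against the pages you actually scrape.

```python
import re

# Matches "27-1-0" and, optionally, a trailing "(1 NC)" no-contest count.
# The NC suffix is an assumed format; adjust if the live pages differ.
RECORD_PATTERN = re.compile(r"(\d+)-(\d+)-(\d+)(?:\s*\((\d+)\s*NC\))?")

def parse_record(text):
    """Turn 'Record: 27-1-0' into integer counts; return None if no match."""
    match = RECORD_PATTERN.search(text or "")
    if not match:
        return None
    wins, losses, draws, no_contests = match.groups()
    return {
        "wins": int(wins),
        "losses": int(losses),
        "draws": int(draws),
        "no_contests": int(no_contests) if no_contests else 0,
    }
```

Returning None for unparseable input keeps the same "missing means None" convention the text_of helper uses.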
The b-content__title-record and b-fight-details__table class names reflect ufcstats.com's current markup. Public sites re-skin without notice, so treat the selectors above as a starting template, not a contract. When a field comes back as None, re-inspect the live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
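To notice selector drift early, you can check every parsed record for empty fields before saving it. The sketch below assumes the record shape this guide builds (name, record, and a bouts list with six keys). `missing_fields` and `BOUT_KEYS` are hypothetical names introduced for illustration.

```python
# Keys each bout dict is expected to carry, per the parser in this guide.
BOUT_KEYS = ("result", "opponent", "sig_strikes", "takedowns", "event", "date")

def missing_fields(record):
    """List the fields that came back empty so selector drift is easy to spot."""
    problems = [key for key in ("name", "record") if record.get(key) in (None, "")]
    bouts = record.get("bouts") or []
    if not bouts:
        # No rows at all usually means the table selector stopped matching.
        problems.append("bouts")
    for index, bout in enumerate(bouts):
        for key in BOUT_KEYS:
            if bout.get(key) in (None, ""):
                problems.append(f"bouts[{index}].{key}")
    return problems
```

An empty list means the record is complete. Anything else names the field to re-inspect in dev tools.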
Step 3: Put it together
Now wire the fetch and the parse into one runnable script. Fetch the HTML, hand it to the parser, and print the structured record.
    import json

    from bs4 import BeautifulSoup
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

    def crawl(page_url):
        # Fetch the page through the Crawling API and return its HTML as text.
        response = api.get(page_url)
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None

    def text_of(node):
        # Trimmed text of a node, or None when the node is missing.
        return node.get_text(strip=True) if node else None

    def scrape_fighter(html):
        soup = BeautifulSoup(html, "html.parser")
        # Name and record share the title block but sit in separate elements.
        name = text_of(soup.select_one(".b-content__title-highlight"))
        record = text_of(soup.select_one(".b-content__title-record"))

        bouts = []
        for row in soup.select("tr.b-fight-details__table-row"):
            cells = row.select("td.b-fight-details__table-col")
            if len(cells) < 10:
                continue  # header, spacer, or otherwise incomplete row
            # Cell order: outcome, fighters, knockdowns, strikes, takedowns,
            # submission attempts, event and date.
            fighters = [text_of(p) for p in cells[1].select("p")]
            strikes = [text_of(p) for p in cells[3].select("p")]
            takedowns = [text_of(p) for p in cells[4].select("p")]
            event = [text_of(p) for p in cells[6].select("p")]
            bouts.append({
                "result": text_of(cells[0]),
                "opponent": fighters[1] if len(fighters) > 1 else None,
                "sig_strikes": strikes[0] if strikes else None,
                "takedowns": takedowns[0] if takedowns else None,
                "event": event[0] if event else None,
                "date": event[1] if len(event) > 1 else None,
            })

        return {"name": name, "record": record, "bouts": bouts}

    def main():
        page_url = "http://www.ufcstats.com/fighter-details/f4c49976c75c5ab2"
        html = crawl(page_url)
        if not html:
            return
        data = scrape_fighter(html)
        print(json.dumps(data, indent=2))

    if __name__ == "__main__":
        main()
What the output looks like
Run the full script with python scraper.py and you get a clean structured record for the fighter, ready to write to JSON, CSV, or a database. The exact numbers depend on the fighter and the bout; the shape stays the same.
    {
      "name": "Jon Jones",
      "record": "Record: 27-1-0",
      "bouts": [
        {
          "result": "win",
          "opponent": "Ciryl Gane",
          "sig_strikes": "9",
          "takedowns": "1",
          "event": "UFC 285: Jones vs. Gane",
          "date": "Mar. 04, 2023"
        }
      ]
    }
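If CSV is the target format, one way to flatten records shaped like the one above is to write one row per bout with the fighter name attached. This is a sketch using only the standard library. `bouts_to_csv` and the column order in `BOUT_COLUMNS` are illustrative choices.

```python
import csv
import io

# Column order for the flat table; "fighter" comes from the record's name.
BOUT_COLUMNS = ["fighter", "result", "opponent", "sig_strikes", "takedowns", "event", "date"]

def bouts_to_csv(records):
    """Flatten fighter records into CSV text, one row per bout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=BOUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        for bout in record.get("bouts", []):
            row = {"fighter": record.get("name")}
            # Missing bout fields become empty CSV cells.
            row.update({col: bout.get(col) for col in BOUT_COLUMNS[1:]})
            writer.writerow(row)
    return buffer.getvalue()
```

The function returns text, so you can write it to a file or feed it to another tool. The csv module quotes values that contain commas, such as the date.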
Scaling to many fighters and events
One fighter page is a demo; a real dataset runs over many of them. The shape stays the same: keep a list of fighter URLs, fetch each through the Crawling API, parse it with the same function, and collect the rows. Because every fighter page shares the same table structure, the parser you already wrote works across all of them without changes. A common pattern is to start from the completed-events index, walk to each event's bout list, and follow the fighter links from there, then write everything out at the end.
    # Continues the script above: reuses crawl(), scrape_fighter(), and the json import.
    import time

    fighters = [
        "http://www.ufcstats.com/fighter-details/f4c49976c75c5ab2",
        "http://www.ufcstats.com/fighter-details/07f72a2a7591b409",
    ]

    results = []
    for url in fighters:
        html = crawl(url)
        if html:
            results.append(scrape_fighter(html))
        time.sleep(2)  # pause between requests to keep the run polite

    with open("ufc_stats.json", "w") as f:
        json.dump(results, f, indent=2)
The time.sleep call between requests is deliberate. Even though the Crawling API rotates IPs for you, a short pause keeps your request rate steady and your run polite and predictable. To turn the per-fighter records into a flat table, loop the bouts list and write one CSV row per bout with the fighter name attached; the same fetch-then-parse pattern extends to event pages if you would rather index by card than by fighter.
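The index-to-fighter walk described in this section comes down to collecting links from each page you fetch. The sketch below assumes event and fighter URLs can be recognised by the path segments "/event-details/" and "/fighter-details/". The second matches the fighter URLs used in this guide; the first is an assumption to verify against the live site. `collect_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def collect_links(html, base_url, path_marker):
    """Return unique absolute links whose URL contains path_marker, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen, links = set(), []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page they were found on.
        url = urljoin(base_url, anchor["href"].strip())
        if path_marker in url and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Filtering on the URL path instead of a CSS class means a restyle is less likely to break the crawl. Deduplicating matters because the same fighter appears on many cards.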
Staying unblocked
Even on a public stats site, a stream of automated requests can get throttled. A few habits keep a run healthy, and they apply to any site you scrape at scale.
- Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. Spread requests out and vary your targets instead of crawling one path at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
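The "back off when the status codes turn bad" habit can be sketched as a retry wrapper with exponential delays. It assumes a hypothetical `fetch` callable that returns HTML text on success and None on failure, like the crawl helper in this guide. The attempt count and delays are illustrative defaults, not recommended values for any particular site.

```python
import time

def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML, doubling the wait after each failure.

    fetch: hypothetical callable returning HTML text, or None when the
    request failed or was challenged.
    sleep: injectable so the wait can be replaced in tests.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < max_attempts - 1:
            # With the defaults this waits 2s, 4s, then 8s between attempts.
            sleep(base_delay * (2 ** attempt))
    return None
```

A None result after the final attempt is the signal to stop the run and investigate, not to retry harder.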
For the broader playbook, see the deep dive on scraping websites without getting blocked and the general guide to scraping a website with Python. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
Is it legal to scrape UFC stats?
Whether scraping UFC stats is allowed depends on the stats site's terms of service, your jurisdiction, and what you do with the data. Scraping public information that anyone can view without an account is generally on safer ground than collecting gated or personal data, but "public" is not a blanket permission slip. Read the terms of service and the robots.txt of whatever site you point this at, including ufcstats.com, and treat both as the boundary for what you collect and how fast. None of the code here changes those terms; it just makes the technical part work.
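Checking robots.txt can be automated with Python's standard library. The sketch below takes the robots.txt content as text, which you would download from the target site yourself. The `is_allowed` helper name is hypothetical, and the rules in the usage example are made up for illustration, not copied from ufcstats.com.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules, for illustration only.
example_rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(example_rules, "http://example.com/fighter-details/abc"))
print(is_allowed(example_rules, "http://example.com/private/page"))
```

A passing check covers robots.txt only; the terms of service still need a human read.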
Keep the work narrow. This guide is deliberately scoped to public fight and fighter statistics: names, win-loss-draw records, events and dates, results, and per-bout metrics like significant strikes and takedowns. It does not cover social media data, personal data about fighters or fans, login-walled content, or any attempt to bypass authentication, and it does not redistribute copyrighted media such as event footage or photos. Those are out of scope here on purpose, because the public statistics are what keep the work defensible. If you are tempted to widen the net to social posts or personal profiles, that is exactly where this tutorial stops.
For anything commercial, prefer an official or licensed source. The UFC and its data partners license fight statistics through official feeds, and broader sports-data vendors offer paid APIs with clear usage rights and a stable schema you do not have to re-discover every time a page is re-skinned. Scraping the public stats page is the right tool for research, a personal project, or a one-off dataset where no licensed feed covers your need; when you are building a product on top of this data, a licensed sports-data API is the cleaner and safer route.
Key takeaways
- Fetch through a trusted IP. A bare request to ufcstats.com can return a partial page or get throttled; the Crawling API fetches it behind a rotating residential IP in one call.
- Parse the tables with BeautifulSoup. The fighter page exposes name and record in a title block and every bout in a b-fight-details__table-row, so index the cells to pull result, opponent, strikes, takedowns, and event.
- Expect selectors to drift. The b-content__title-record and table class names reflect the current markup; re-inspect and update them when a field returns None.
- Scale by looping URLs with pacing. The same parser works across every fighter page, so a real dataset is a list of links plus a short sleep between requests.
- Stay on public stats, license for commercial use. Keep to public fight and fighter statistics, respect ToS and robots.txt, and prefer an official or licensed sports-data feed when you are building a product.
Frequently Asked Questions (FAQs)
What is the best way to scrape UFC stats with Python?
Fetch the public stats page through the Crawling API so you get the full HTML behind a trusted, rotating IP, then parse it with BeautifulSoup. That combination handles the two hard parts, retrieval that returns the complete page and an IP the site reads as a normal visitor, while you focus on selecting the fields you want from the tables.
Which UFC stats can I extract?
From a public fighter or event page you can pull the fighter name, the win-loss-draw record, each bout's opponent, the event and date, the result, and per-bout metrics like significant strikes and takedowns. This guide stays on public sports statistics only; it does not cover social media, personal data, or anything behind a login.
Do I need the normal token or the JS token for ufcstats.com?
Start with the normal token. Many ufcstats.com pages return their tables in the static HTML, so the normal token is usually enough. If a stat column comes back empty, switch to the JavaScript (JS) token, which renders the page in a real browser before handing back the HTML, and the late-loading content will be present when you parse.
My selectors return None. What changed?
Almost certainly the site's markup. Class names like b-content__title-record and the b-fight-details__table rows change when a page is re-skinned, so selectors that worked last month can break. Re-inspect a live page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
Does the UFC offer an official stats API?
The UFC and its data partners license fight statistics through official feeds, and broader sports-data vendors sell paid APIs with clear usage rights and a stable schema. For anything commercial, prefer one of those licensed sources. Scraping the public stats page is best for research, a personal project, or a one-off dataset where no licensed feed covers your need.
How do I avoid getting blocked while scraping UFC stats?
Keep your per-IP request rate low, vary your targets instead of looping one path, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing challenges.
