Clutch.co lists more than 150,000 service providers across IT, marketing, design, and development, and each company profile carries the kind of structured B2B data a lead-generation pipeline, a competitive-research dashboard, or a market study actually wants: a company name, a star rating, a review count, a minimum project size, an hourly rate band, a location, and a link to the full profile. The catch is that Clutch sits behind heavy bot protection, so a plain HTTP request rarely reaches the listing at all.
This guide shows you how to build a Python Clutch.co scraper that holds up against that protection. You will build a small, runnable script that fetches a rendered category page through the Crawling API, parses each company card with BeautifulSoup, and writes clean structured rows. The whole walkthrough stays scoped to public company-listing data, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public Clutch.co category URL, retrieves the rendered HTML through the Crawling API, and extracts a structured record for every company on the page. We will use the IT services directory as the running example and pull these fields from each card:
- Company name: the provider's listed business name.
- Rating: the aggregate star score shown on the card.
- Number of reviews: how many client reviews back that rating.
- Min project size: the smallest engagement the provider takes, for example "$5,000+".
- Hourly rate: the listed hourly rate band, for example "$50 - $99 / hr".
- Location: the provider's primary city or region.
- Profile URL: the link to the company's full Clutch profile.
Why a plain request fails on Clutch.co
If you request a Clutch.co category URL with a bare HTTP client, you usually do not get a listing back at all. Clutch runs aggressive bot protection, and a datacenter IP making an obviously automated request gets challenged or returned a 403 before any company data reaches you. Even when a request slips through, parts of the page fill in client-side, so the raw HTML can be missing the very cards you came for.
So a working Clutch scraper needs two things in one request: a browser that actually renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but stitching those together and keeping them healthy is most of the work. The Crawling API folds both into a single call: you send it the URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML for you to parse.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first and routes the request through rotating residential IPs. Clutch is well defended and partly client-rendered, so you want the JS token here. The first 1,000 requests are free to get you started, with no credit card required.
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to using BeautifulSoup in Python covers the parsing basics this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org and make sure Python is on your system PATH.
A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. Treat the token like a password: it authenticates your requests, so keep it out of version control.
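One way to keep the token out of version control is to read it from an environment variable instead of pasting it into the script. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is a choice made for this example, not something Crawlbase requires.

```python
import os


def load_token(env=os.environ, name="CRAWLBASE_JS_TOKEN"):
    """Return the token stored in an environment variable, or fail loudly.

    The variable name is an assumption made for this example.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before running the scraper")
    return token
```

With this in place, the client can be created as CrawlingAPI({"token": load_token()}), and the token itself never appears in the file you commit.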
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the libraries the scraper needs.
    python --version
    python -m venv clutch_env
    source clutch_env/bin/activate
    pip install crawlbase beautifulsoup4 pandas
On Windows, activate the environment with clutch_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client for the Crawling API, beautifulsoup4 parses the returned HTML so you can pull out fields by CSS selector, and pandas turns the collected rows into a CSV at the end.
Step 1: Fetch the rendered category page
Start by getting the finished page. Import the CrawlingAPI class, initialize it with your JS token, and request the category URL. Checking the status before you parse keeps failures loud instead of silent.
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

    def fetch_html(url):
        response = api.get(url, {"ajax_wait": "true", "page_wait": 5000})
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None

    if __name__ == "__main__":
        base_url = "https://clutch.co/it-services"
        html = fetch_html(base_url)
        print(html[:500] if html else "No HTML returned")
The two wait options matter for a defended, partly client-rendered target like this. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Five seconds is a reasonable start; raise it if cards come back empty. Save the code as scraper.py, run it with python scraper.py, and you should see real provider markup, not the 403 page a plain requests.get returns. That confirms the request is getting through before you write a single selector.
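If you would rather not tune page_wait by hand, the "raise it if cards come back empty" advice can be sketched as a small loop. The helper below is hypothetical: it assumes you pass in a fetch callable that accepts (url, page_wait) and returns HTML or None, and it treats the substring "provider" as a rough sign that cards were rendered. Both are assumptions for illustration.

```python
def fetch_with_longer_waits(fetch, url, waits=(5000, 8000, 12000), marker="provider"):
    """Try each page_wait in turn until the HTML mentions the card marker.

    `fetch` is any callable taking (url, page_wait) and returning HTML or None.
    Returns (html, page_wait_used), or (None, None) if every attempt came back empty.
    """
    for page_wait in waits:
        html = fetch(url, page_wait)
        if html and marker in html:
            return html, page_wait
    return None, None
```

Each attempt is a separate API request, so keep the list of waits short. In practice fetch would be a thin wrapper around api.get that forwards page_wait in the options dictionary.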
Clutch.co answers a bare request with a 403, so you need a rendered page behind a trusted IP in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at the public IT services directory on the free tier first.
Step 2: Parse the company cards with BeautifulSoup
With rendered HTML in hand, load it into BeautifulSoup and pull each provider by its selector. Clutch lays its listings out in a repeated structure: every provider is a li.provider inside ul.providers__list, so you select all the cards once and then read the same fields from each one. Inspect the live page in your browser's dev tools (usually F12) to confirm the current class names; the selectors below match the layout at the time of writing.
    import re
    from bs4 import BeautifulSoup

    def text_of(card, selector):
        el = card.select_one(selector)
        return re.sub(r"\s+", " ", el.get_text(strip=True)) if el else "N/A"

    def parse_html(html):
        soup = BeautifulSoup(html, "html.parser")
        data = []
        companies = soup.select("ul.providers__list > li.provider")
        for company in companies:
            profile = company.select_one("h3.provider__title a")
            profile_url = profile["href"] if profile else "N/A"
            data.append({
                "Company Name": text_of(company, "h3.provider__title"),
                "Rating": text_of(company, "span.sg-rating__number"),
                "Number of Reviews": text_of(company, "a.sg-rating__reviews"),
                "Min Project Size": text_of(company, "li.provider__highlights-item.min-project-size span"),
                "Hourly Rate": text_of(company, "li.provider__highlights-item.hourly-rate span"),
                "Location": text_of(company, "li.provider__highlights-item.location span.locality"),
                "Profile URL": profile_url,
            })
        return data
The text_of helper does two useful things at once: it returns "N/A" when an element is missing, instead of throwing on a .get_text() call against nothing, and it collapses runs of whitespace with re.sub(r"\s+", " ", ...) so a review count like "\n 128 reviews " comes back clean. That keeps extraction resilient when a field is absent, which is common since not every provider lists a minimum project size or an hourly rate. The profile URL is read from the anchor's href rather than its text, so it is handled separately.
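One detail worth guarding against: an href can be relative (for example "/profile/...") rather than a full URL. Whether Clutch emits relative links on a given page is an assumption here, so the sketch below normalizes either form with the standard library's urljoin. The helper name absolute_profile_url is made up for this example.

```python
from urllib.parse import urljoin

BASE = "https://clutch.co"


def absolute_profile_url(href, base=BASE):
    """Return an absolute URL for a card's href, leaving the "N/A" placeholder alone."""
    if not href or href == "N/A":
        return "N/A"
    # urljoin keeps an already-absolute URL unchanged and resolves a relative one.
    return urljoin(base, href)
```

Wrapping the value read from the anchor in this function keeps the Profile URL column consistent whichever form the page uses.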
Clutch's class names change without notice. Treat the selectors above as a starting template, not a contract. When a field comes back as "N/A" for every card, re-inspect a live listing in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
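The "N/A for every card" symptom is easy to detect automatically. As a minimal sketch, assuming rows shaped like the dictionaries parse_html returns, the hypothetical helper below lists the fields that never produced a value, which is a reasonable hint that their selectors have drifted.

```python
def dead_fields(rows, missing="N/A"):
    """Return the field names whose value is `missing` in every row."""
    if not rows:
        return []
    return [
        field
        for field in rows[0]
        if all(row.get(field) == missing for row in rows)
    ]
```

Running it after each page and printing a warning when the list is non-empty turns silent selector drift into something you notice on the first run rather than in the finished CSV.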
Step 3: Handle pagination
Clutch lists providers across many pages, and it uses a page query parameter to move between them. To collect a whole directory you walk the pages in a loop, fetch each one through the same function, and gather the rows. Because every page shares the same card structure, the parser you already wrote works across all of them without changes.
    import time

    def scrape_clutch_data(base_url, pages):
        all_data = []
        for page in range(1, pages + 1):
            url = f"{base_url}?page={page}"
            html = fetch_html(url)
            if html:
                all_data.extend(parse_html(html))
            time.sleep(3)
        return all_data
The time.sleep(3) between requests is deliberate. Pacing keeps you from hammering Clutch in a tight loop, which is the fastest way to get throttled even when each request is rendered through a trusted IP. Start with a handful of pages while you confirm the selectors hold, then raise the count once the output looks right.
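Building the URL with an f-string works as long as base_url carries no query string of its own. If you ever start from a URL that already has parameters, appending "?page=N" would produce a malformed address. The sketch below is one way to handle both cases with the standard library; page_url is a hypothetical helper, and the sort_by parameter in the usage note is invented purely to illustrate.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`.

    Existing query parameters are kept; an existing `page` value is replaced.
    """
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))
```

For a plain category URL this gives the same result as the f-string, while a URL such as https://clutch.co/it-services?sort_by=rating keeps its existing parameter and gains page alongside it.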
Step 4: Put it together and save to CSV
Now wire the fetch, the parse, and the pagination loop into one runnable script, then hand the collected rows to pandas to write a CSV. A flat CSV is the most portable output for B2B data: it opens in any spreadsheet, loads into a database, and feeds a CRM import without further work.
    import re
    import time

    import pandas as pd
    from bs4 import BeautifulSoup
    from crawlbase import CrawlingAPI

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

    def fetch_html(url):
        response = api.get(url, {"ajax_wait": "true", "page_wait": 5000})
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None

    def text_of(card, selector):
        el = card.select_one(selector)
        return re.sub(r"\s+", " ", el.get_text(strip=True)) if el else "N/A"

    def parse_html(html):
        soup = BeautifulSoup(html, "html.parser")
        data = []
        for company in soup.select("ul.providers__list > li.provider"):
            profile = company.select_one("h3.provider__title a")
            data.append({
                "Company Name": text_of(company, "h3.provider__title"),
                "Rating": text_of(company, "span.sg-rating__number"),
                "Number of Reviews": text_of(company, "a.sg-rating__reviews"),
                "Min Project Size": text_of(company, "li.provider__highlights-item.min-project-size span"),
                "Hourly Rate": text_of(company, "li.provider__highlights-item.hourly-rate span"),
                "Location": text_of(company, "li.provider__highlights-item.location span.locality"),
                "Profile URL": profile["href"] if profile else "N/A",
            })
        return data

    def scrape_clutch_data(base_url, pages):
        all_data = []
        for page in range(1, pages + 1):
            html = fetch_html(f"{base_url}?page={page}")
            if html:
                all_data.extend(parse_html(html))
            time.sleep(3)
        return all_data

    def main():
        base_url = "https://clutch.co/it-services"
        data = scrape_clutch_data(base_url, pages=5)
        df = pd.DataFrame(data)
        df.to_csv("clutch_data.csv", index=False)
        print(f"Saved {len(data)} companies to clutch_data.csv")

    if __name__ == "__main__":
        main()
Run the full script with python scraper.py and it walks five pages of the IT services directory, parses every provider on each, and writes one CSV. Swap base_url for any other public category, such as https://clutch.co/agencies/digital, and change pages to control how deep you go.
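If you want several categories in one run, a loop over a list of URLs is enough. The sketch below is hypothetical: it assumes a scrape callable with the same (base_url, pages) signature as scrape_clutch_data, adds a "Category URL" column that the original script does not have, and skips providers whose profile URL has already been seen, since one company can plausibly appear in more than one directory.

```python
def scrape_categories(category_urls, scrape, pages=2):
    """Run `scrape(url, pages)` per category and merge the rows.

    Rows whose "Profile URL" was already collected are skipped; rows with the
    "N/A" placeholder are always kept because they cannot be compared.
    """
    seen = set()
    merged = []
    for url in category_urls:
        for row in scrape(url, pages):
            key = row.get("Profile URL", "N/A")
            if key != "N/A" and key in seen:
                continue
            seen.add(key)
            merged.append({**row, "Category URL": url})
    return merged
```

Calling scrape_categories(["https://clutch.co/it-services", "https://clutch.co/agencies/digital"], scrape_clutch_data) returns one list of rows that can go straight into pd.DataFrame(...).to_csv(...).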
What the output looks like
Each row is a clean structured record, ready to open in a spreadsheet, load into a database, or feed a CRM. Here is a sample of the data the script produces, shown as JSON for readability.
    [
      {
        "Company Name": "Lorem Software Group",
        "Rating": "4.9",
        "Number of Reviews": "128 reviews",
        "Min Project Size": "$25,000+",
        "Hourly Rate": "$50 - $99 / hr",
        "Location": "Austin, TX",
        "Profile URL": "https://clutch.co/profile/lorem-software-group"
      },
      {
        "Company Name": "Ipsum Digital Labs",
        "Rating": "4.7",
        "Number of Reviews": "54 reviews",
        "Min Project Size": "$10,000+",
        "Hourly Rate": "$100 - $149 / hr",
        "Location": "London, England",
        "Profile URL": "https://clutch.co/profile/ipsum-digital-labs"
      }
    ]
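Every value in that sample is a string, which is fine for a spreadsheet but awkward if you want to sort by rating or filter by minimum project size. As a rough sketch, the hypothetical helper below pulls the first number out of a field; for a range such as an hourly rate band it returns the lower bound only, which is a simplification you may not want.

```python
import re


def to_number(text):
    """Pull the first number out of a string like "128 reviews" or "$25,000+".

    Returns an int, a float when the number has a decimal part, or None when
    the text holds no digits (for example the "N/A" placeholder).
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    value = match.group().replace(",", "")
    return float(value) if "." in value else int(value)
```

Applied to the sample above, "128 reviews" becomes 128, "4.9" becomes 4.9, and "$25,000+" becomes 25000.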
Scaling across categories and staying unblocked
One directory is a demo; a real job runs over many categories. The shape stays the same: keep a list of category URLs, run scrape_clutch_data on each, and concatenate the rows before writing the CSV. The parser works across all of them without changes. Clutch is a hard commercial target, though, so a few habits keep a long run healthy.
- Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. Keep the sleep between requests and spread a large job out over time rather than crawling a whole directory at full speed.
- Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning 403 or other challenges is telling you the current rate is too aggressive. Treat that as signal to back off, not noise to ignore.
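Backing off on bad status codes can be sketched as a retry loop whose pause doubles after each failure. The helper below is hypothetical: it assumes a fetch_status callable that returns a (status_code, html) pair, which differs from the fetch_html function in this guide (that one returns only the HTML or None), and the delays are illustrative starting values rather than tuned numbers.

```python
import time


def fetch_with_backoff(fetch_status, url, retries=3, base_delay=5, sleep=time.sleep):
    """Call fetch_status(url) and wait longer after each non-200 response.

    `fetch_status` must return (status_code, html). The pause doubles each
    time: base_delay, 2 * base_delay, 4 * base_delay, and so on.
    Returns the HTML on success, or None once the retries are used up.
    """
    for attempt in range(retries + 1):
        status, html = fetch_status(url)
        if status == 200:
            return html
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)
    return None
```

The sleep parameter exists so the function can be exercised without real waiting; in normal use you leave it at time.sleep.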
For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. Because Clutch renders parts of its pages client-side, our guide on scraping JavaScript pages with Python explains why rendering matters. And if you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy (also called the AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
Is it legal to scrape Clutch.co?
Whether scraping Clutch.co is allowed depends on Clutch's terms of service, your jurisdiction, and what you do with the data. Clutch's terms place limits on automated access, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read the Clutch Terms of Service and its robots.txt, and treat both as the boundary for what you collect. Clutch does not publish a public, open API for its directory, so there is no sanctioned endpoint to prefer over the page; that makes respecting the stated limits more important, not less.
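Checking a URL against robots.txt can be done with Python's standard library. The sketch below parses robots.txt text you have already downloaded and asks whether a given path is allowed. The sample rules are invented for illustration and are not Clutch's actual robots.txt; always read the live file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, NOT Clutch's real robots.txt.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /example-private/
"""


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A robots.txt check answers only the narrow question of what the file permits. It does not settle the terms-of-service question discussed here.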
A few lines worth holding to. Collect only public company-listing data: the company name, rating, review count, minimum project size, hourly rate, location, and profile link that anyone can see on a category page without an account. Keep your request volume modest so you are not straining Clutch's servers, and pace the run rather than pulling an entire directory at once. If you plan to reuse the data commercially, for outreach, resale, or a product, get permission or an official agreement rather than assuming silence is consent.
This guide is deliberately scoped to public directory and category pages because that is the line that keeps the work defensible. It does not cover anything behind a login, reviewer personal data, contact details that are not publicly listed, or copyrighted review text you would redistribute as your own. Public company-listing data only. If your project needs more than that, a data partnership with Clutch is the correct path, not a cleverer scraper.
Key takeaways
- Clutch blocks plain requests. A bare requests.get usually returns a 403, so you have to render the page behind a trusted IP before you can parse it.
- Use the JS token through the Crawling API. One call renders the page in a real browser and rotates residential IPs; ajax_wait and page_wait control how long it waits for content.
- BeautifulSoup does the extraction. Select every li.provider in ul.providers__list, then read company name, rating, reviews, min project size, hourly rate, location, and profile URL, and expect the selectors to drift.
- Paginate with the page parameter. Clutch walks pages via ?page=N, so a real job loops the pages, reuses the same parser, and sleeps between requests.
- Stay on public data. Respect Clutch's ToS and robots.txt, keep volume modest, and get permission before any commercial reuse.
Frequently Asked Questions (FAQs)
Why does a plain request to Clutch.co return a 403?
Clutch runs aggressive bot protection. A datacenter IP making an obviously automated request gets challenged or blocked with a 403 before any listing data reaches you. To get real data you need a request that renders the page and comes from an IP the platform reads as a real visitor, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for Clutch.co?
The JS token. The normal token fetches static HTML and does not run a browser, so on a defended, partly client-rendered site like Clutch it tends to come back empty or blocked. The JS token renders the page in a real browser and routes through rotating residential IPs before handing back the HTML, so the company cards are present when BeautifulSoup parses them.
What data can I scrape from a Clutch.co listing?
From a public category page you can read each provider's company name, aggregate rating, number of reviews, minimum project size, hourly rate band, location, and the link to its full profile. This guide pulls exactly those fields. Anything behind a login, reviewer personal data, or private contact details is out of scope and out of bounds.
How do I scrape multiple pages of Clutch.co results?
Clutch paginates with a page query parameter, so you build each URL as f"{base_url}?page={page}" and loop the page numbers. Fetch each page through the same function, run the same parser, and collect the rows. Add a short time.sleep between requests so you are not hammering the site, and raise the page count once the output looks right.
My selectors return "N/A" for every card. What changed?
Almost certainly Clutch's markup. Its class names change without notice, so selectors that worked last month can break. Re-inspect a live listing in your browser's dev tools and update the selectors, for example h3.provider__title or span.sg-rating__number. Periodic selector maintenance is normal for any production scraper.
How do I avoid getting blocked while scraping Clutch.co?
Keep your per-IP request rate low, pace the run with a sleep between requests, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, that is the part to invest in. Watch the status codes and back off when you start seeing 403 responses or other challenges.
