GoodFirms is a B2B platform that connects businesses with IT service providers and software companies. With thousands of verified company profiles, user reviews, and service details, GoodFirms is a one-stop shop for decision-makers, researchers, and marketers to research industry trends, analyze competitors, or find partners.
As of 2023, GoodFirms has over 60,000 company profiles and detailed listings across categories such as software development, digital marketing, mobile app development, and more. The platform is known for its powerful search functionality and transparent review system, which is why businesses worldwide rely on it.
In this blog, we’ll guide you on how to scrape data from the GoodFirms website using Python and the Crawlbase Crawling API. Let’s begin!
Scraping GoodFirms data can help businesses, researchers, and developers. As a trusted platform with thousands of company profiles and detailed reviews, GoodFirms has data that can be used in many ways:
Competitor Analysis: Get insights into competitors' services, pricing, and customer reviews.
Lead Generation: Extract company details like contact information, services, and industries served.
Market Research: Analyze industry trends by looking at top-performing companies and services.
Building Databases: Create a structured repository of IT service providers for applications like recommendation engines or CRMs.
Talent Acquisition: Get company team size, industries served, and operational expertise to find collaboration or hiring opportunities.
Portfolio Benchmarking: See what makes successful projects in company portfolios and case studies.
Trend Tracking: Study customer preferences and demand patterns from detailed reviews and services offered.
Business Expansion: Find potential markets or regions to target by analyzing company locations and regional services.
With GoodFirms' vast dataset, you have a treasure trove of material for data-driven decisions. By automating the extraction process, you save time and gain access to comprehensive, up-to-date information.
Key Data Points to Extract from GoodFirms
When scraping GoodFirms, focus on the data points that give real insight into companies and their services. Key fields include the company name, location, service category (tagline), rating, and profile URL from search listings, plus the description, hourly rate, number of employees, year founded, and services offered from individual profile pages.
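For example, a single scraped listing record might look like the following (the field names follow the JSON output shown later in this tutorial):

# Example of one record a GoodFirms listings scraper might produce
company_record = {
    "name": "Unified Infotech",
    "location": "London, United Kingdom",
    "category": "Driving Digital Transformation with Advanced Tech",
    "rating": "5.0",
    "profile_url": "https://www.goodfirms.co/company/unified-infotech"
}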
Crawlbase Crawling API for GoodFirms Scraping
The Crawlbase Crawling API is well suited for scraping static websites like GoodFirms. It handles proxies and HTTP headers for you and ensures high data accuracy. With Crawlbase, you can extract data from GoodFirms search listings and company profile pages without worrying about IP blocks or CAPTCHA challenges.
First, install the Crawlbase Python library. Use the following command:
pip install crawlbase
Once installed, you’ll need an API token. You can get your token by signing up on the Crawlbase website. This token will authenticate your requests.
Here’s how to initialize the Crawlbase Crawling API in Python:
from crawlbase import CrawlingAPI

# Initialize the Crawlbase Crawling API
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

# Test your setup
response = crawling_api.get('https://www.goodfirms.co/companies')

if response['headers']['pc_status'] == '200':
    print("Setup successful! You are ready to scrape GoodFirms.")
else:
    print(f"Failed to connect. Status code: {response['headers']['pc_status']}")
Note: Crawlbase provides two types of tokens: a Normal token for static sites and a JS token for JavaScript-rendered sites. For GoodFirms, the Normal token works fine. Crawlbase offers 1,000 free requests for its Crawling API; check the documentation for more details.
With Crawlbase, you can scrape data without worrying about technical hurdles like blocked IPs or complex headers. In the next section, we’ll go through setting up the Python environment to scrape data.
Preparing for GoodFirms Scraping
Before you start scraping GoodFirms, you need to set up the right tools and libraries. This section will walk you through the process of installing the required libraries and setting up your Python environment for scraping.
Tools and Libraries Required
To scrape GoodFirms, you will need:
Python: Due to its ease of use and robust libraries, Python is among the best languages for web scraping.
Crawlbase Python Library: This will facilitate your Crawlbase Crawling API calls.
BeautifulSoup: A Python library for parsing HTML and extracting data from it.
Installing Python and Required Libraries
If you don’t have Python installed, download it from here: Python.org. Once installed, you can use pip to install the libraries. Run the following commands in your terminal:
pip install crawlbase
pip install beautifulsoup4
These libraries will allow you to interact with the Crawlbase Crawling API, parse the HTML content from GoodFirms, and handle requests effectively.
How to Choose the Right IDE for Web Scraping
For writing your scraping script, you can use any Integrated Development Environment (IDE) or text editor. Popular options include VS Code, PyCharm, and Jupyter Notebook.
Scraping GoodFirms Search Listings
Here we will go through how to scrape search listings from GoodFirms. This includes inspecting the HTML, writing the scraper, handling pagination, and storing the scraped data in a JSON file.
Inspecting HTML to Identify Selectors
Before we start scraping, we need to inspect the HTML structure of the GoodFirms search listings page to get the CSS selectors we will use to extract the data. Follow these steps:
Open a GoodFirms Search Listing Page: Go to the search results for a category on GoodFirms.
Inspect the Page: Right-click on the page and select “Inspect” (or press Ctrl + Shift + I).
Identify the Relevant Data: Find the HTML elements that contain the company information. Common data points include:
Company Name: Typically found within an <h3> tag with the class firm-name.
Location: Often in a <div> element with the class firm-location.
Service Category: Usually in a <div> element nested within firm-content and under the class tagline.
Rating: Displayed in a <span> tag with the class rating-number.
Company Profile URL: Found in an <a> tag with the class visit-profile.
Once you’ve identified the relevant CSS selectors, we can proceed with writing the scraper.
Writing the GoodFirms Search Listings Scraper
Let’s now write the scraper to extract the company data from the search listings page.
from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def make_crawlbase_request(url):
    """Fetch the HTML content of a page using Crawlbase."""
    response = crawling_api.get(url)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        return html_content
    else:
        print(f"Failed to fetch the page. Crawlbase status code: {response['headers']['pc_status']}")
        return None

def extract_company_details(company):
    """Extract key details from a single company card."""
    name = company.select_one('h3.firm-name').text.strip() if company.select_one('h3.firm-name') else ''
    location = company.select_one('div.firm-location').text.strip() if company.select_one('div.firm-location') else ''
    category = company.select_one('div.firm-content > div.tagline').text.strip() if company.select_one('div.firm-content > div.tagline') else ''
    rating = company.select_one('span.rating-number').text.strip() if company.select_one('span.rating-number') else 'No rating'
    link = company.select_one('div.firm-urls > a.visit-profile')['href'] if company.select_one('div.firm-urls > a.visit-profile') else ''

    return {'name': name, 'location': location, 'category': category, 'rating': rating, 'profile_url': link}

def scrape_goodfirms_search_listings(url):
    """Scrape all company cards from a single search listings page."""
    html_content = make_crawlbase_request(url)
    if not html_content:
        return []

    soup = BeautifulSoup(html_content, 'html.parser')
    # Selector for each company card; verify against the current page markup
    companies = soup.select('div.firm-wrapper')

    company_data = []
    for company in companies:
        details = extract_company_details(company)
        company_data.append(details)

    return company_data

# Example usage
url = "https://www.goodfirms.co/companies/web-development-agency/london"
data = scrape_goodfirms_search_listings(url)
Handling Pagination
GoodFirms uses a page parameter in the URL to navigate through search listings. To scrape all pages, we need to handle pagination by adjusting the page parameter.
Here’s how we can modify the scraper to handle pagination:
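Below is a minimal sketch of such a pagination loop. It builds on the scrape_goodfirms_search_listings function from the previous section and assumes GoodFirms accepts a ?page= query parameter; verify the exact parameter name and the number of available pages against the live site.

def scrape_all_pages(base_url, max_pages=5):
    """Scrape multiple search listing pages by incrementing the page parameter."""
    all_data = []
    for page in range(1, max_pages + 1):
        # Assumption: listings are paginated via a ?page= query parameter
        page_url = f"{base_url}?page={page}"
        print(f"Scraping page {page}: {page_url}")
        page_data = scrape_goodfirms_search_listings(page_url)
        if not page_data:  # stop early when a page returns no companies
            break
        all_data.extend(page_data)
    return all_data

# Example usage
all_data = scrape_all_pages("https://www.goodfirms.co/companies/web-development-agency/london", max_pages=3)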
Storing Data in a JSON File

After scraping the data, it's essential to store it in a format that's easy to work with. In this case, we'll save the data to a JSON file for later use or analysis.
import json

def save_data_to_json(data, filename='goodfirms_search_data.json'):
    """Save the scraped data to a JSON file."""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Data saved to {filename}")

# Example usage
save_data_to_json(all_data)
The save_data_to_json function will store the data as a JSON file, making it easy to load into a database or process further.
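For instance, you can reload the saved file later and filter it in a few lines (a small sketch, assuming the field names used above):

import json

# Load the saved JSON file back into Python for further processing
with open('goodfirms_search_data.json', 'r') as f:
    companies = json.load(f)

# Example: list all companies with a 5.0 rating
top_rated = [c['name'] for c in companies if c.get('rating') == '5.0']
print(top_rated)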
Complete Code Example
Here’s the complete data scraper that combines everything from making requests to handling pagination and storing the data in a JSON file:
from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI
import json

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def make_crawlbase_request(url):
    """Fetch the HTML content of a page using Crawlbase."""
    response = crawling_api.get(url)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        return html_content
    else:
        print(f"Failed to fetch the page. Crawlbase status code: {response['headers']['pc_status']}")
        return None

def extract_company_details(company):
    """Extract key details from a single company card."""
    name = company.select_one('h3.firm-name').text.strip() if company.select_one('h3.firm-name') else ''
    location = company.select_one('div.firm-location').text.strip() if company.select_one('div.firm-location') else ''
    category = company.select_one('div.firm-content > div.tagline').text.strip() if company.select_one('div.firm-content > div.tagline') else ''
    rating = company.select_one('span.rating-number').text.strip() if company.select_one('span.rating-number') else 'No rating'
    link = company.select_one('div.firm-urls > a.visit-profile')['href'] if company.select_one('div.firm-urls > a.visit-profile') else ''

    return {'name': name, 'location': location, 'category': category, 'rating': rating, 'profile_url': link}
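The rest of the script ties together the page-level scraper, the pagination loop, and the JSON export. The scrape_all_pages helper and the ?page= parameter follow the sketch from the pagination section, and the div.firm-wrapper card selector is an assumption to verify against the live page markup:

def scrape_goodfirms_search_listings(url):
    """Scrape all company cards from a single search listings page."""
    html_content = make_crawlbase_request(url)
    if not html_content:
        return []

    soup = BeautifulSoup(html_content, 'html.parser')
    # Selector for each company card; verify against the current page markup
    companies = soup.select('div.firm-wrapper')

    return [extract_company_details(company) for company in companies]

def scrape_all_pages(base_url, max_pages=5):
    """Scrape multiple search listing pages by incrementing the page parameter."""
    all_data = []
    for page in range(1, max_pages + 1):
        page_data = scrape_goodfirms_search_listings(f"{base_url}?page={page}")
        if not page_data:  # stop early when a page returns no companies
            break
        all_data.extend(page_data)
    return all_data

def save_data_to_json(data, filename='goodfirms_search_data.json'):
    """Save the scraped data to a JSON file."""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Data saved to {filename}")

# Example usage
all_data = scrape_all_pages("https://www.goodfirms.co/companies/web-development-agency/london", max_pages=3)
save_data_to_json(all_data)

Running this produces a JSON file like the following (truncated):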
[ { "name":"Unified Infotech", "location":"London, United Kingdom", "category":"Driving Digital Transformation with Advanced Tech", "rating":"5.0", "profile_url":"https://www.goodfirms.co/company/unified-infotech" }, { "name":"Sigli", "location":"London, United Kingdom", "category":"Signature Quality", "rating":"5.0", "profile_url":"https://www.goodfirms.co/company/sigli" }, { "name":"Closeloop Technologies", "location":"London, United Kingdom", "category":"Bringing Awesome Ideas to Life", "rating":"5.0", "profile_url":"https://www.goodfirms.co/company/closeloop-technologies" }, { "name":"instinctools", "location":"London, United Kingdom", "category":"Building Custom Software Solutions", "rating":"4.9", "profile_url":"https://www.goodfirms.co/company/instinctools" }, { "name":"Salt Technologies", "location":"London, United Kingdom", "category":"Developers by Choice", "rating":"5.0", "profile_url":"https://www.goodfirms.co/company/salt-technologies" }, .... more ]
In the next section, we’ll cover scraping company profile pages in detail.
Scraping GoodFirms Company Profile Pages
Scraping company profile pages from GoodFirms gives you more information about a company’s services, expertise, hourly rates, number of employees, and more. In this section, we will break down how to extract such data, store it, and write a complete code example for you to implement.
Inspecting HTML to Identify Selectors
The first step in scraping company profiles is to understand the structure of the profile page. Follow these steps to inspect the page:
Open a Company Profile Page: Click on the company link from the search results.
Inspect the Page: Right-click on the page and select “Inspect” (or press Ctrl + Shift + I).
Identify the Relevant Data: Find the HTML elements that contain the company information. Common data points include:
Company Name: Located in an <h1> tag with the attribute itemprop="name".
Description: Found in a <div> with the class profile-summary-text.
Hourly Rate: Located in a <div> with the class profile-pricing and a nested <span> tag.
Number of Employees: Found in a <div> with the class profile-employees and a nested <span> tag.
Year Founded: Part of a <div> with the class profile-founded and a nested <span> tag.
Services Offered: Extracted from <ul> with the class services-chart-list, where <button> tags contain the data-name attribute.
Extracting Key Details from GoodFirms Profiles
Here’s how you can extract the essential details from a GoodFirms company profile page with Python using the Crawlbase Crawling API and BeautifulSoup:
from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI
import re

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def make_crawlbase_request(url):
    """Fetch the HTML content of a page using Crawlbase."""
    response = crawling_api.get(url)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        return html_content
    else:
        print(f"Failed to fetch the page. Crawlbase status code: {response['headers']['pc_status']}")
        return None

def extract_profile_details(html_content):
    """Extract detailed information from a company profile page."""
    soup = BeautifulSoup(html_content, 'html.parser')

    name = soup.select_one('h1[itemprop="name"]').text.strip() if soup.select_one('h1[itemprop="name"]') else 'N/A'
    description = re.sub(r'\s+', ' ', soup.select_one('div.profile-summary-text').text.strip()) if soup.select_one('div.profile-summary-text') else 'N/A'
    hourly_rate = soup.select_one('div.profile-pricing > span').text.strip() if soup.select_one('div.profile-pricing > span') else 'N/A'
    no_of_employees = soup.select_one('div.profile-employees > span').text.strip() if soup.select_one('div.profile-employees > span') else 'N/A'
    year_founded = soup.select_one('div.profile-founded > span').text.strip() if soup.select_one('div.profile-founded > span') else 'N/A'
    services = [item['data-name'] for item in soup.select('ul.services-chart-list button')]

    return {
        'name': name,
        'description': description,
        'hourly_rate': hourly_rate,
        'no_of_employees': no_of_employees,
        'year_founded': year_founded,
        'services': services
    }
Storing Profile Data in a JSON File

The extracted data can be stored in a structured format, such as JSON, for easy processing and future use.
import json
def save_profiles_to_json(data, filename='goodfirms_profiles.json'):
    """Save company profile data to a JSON file."""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Profile data saved to {filename}")

# Example usage
save_profiles_to_json(profiles_data)
Complete Code Example
Here is the complete implementation, including fetching, extracting, and storing profile data:
from bs4 import BeautifulSoup
import json
from crawlbase import CrawlingAPI
import re

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def make_crawlbase_request(url):
    """Fetch the HTML content of a page using Crawlbase."""
    response = crawling_api.get(url)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        return html_content
    else:
        print(f"Failed to fetch the page. Crawlbase status code: {response['headers']['pc_status']}")
        return None

def extract_profile_details(html_content):
    """Extract detailed information from a company profile page."""
    soup = BeautifulSoup(html_content, 'html.parser')

    name = soup.select_one('h1[itemprop="name"]').text.strip() if soup.select_one('h1[itemprop="name"]') else 'N/A'
    description = re.sub(r'\s+', ' ', soup.select_one('div.profile-summary-text').text.strip()) if soup.select_one('div.profile-summary-text') else 'N/A'
    hourly_rate = soup.select_one('div.profile-pricing > span').text.strip() if soup.select_one('div.profile-pricing > span') else 'N/A'
    no_of_employees = soup.select_one('div.profile-employees > span').text.strip() if soup.select_one('div.profile-employees > span') else 'N/A'
    year_founded = soup.select_one('div.profile-founded > span').text.strip() if soup.select_one('div.profile-founded > span') else 'N/A'
    services = [item['data-name'] for item in soup.select('ul.services-chart-list button')]

    return {
        'name': name,
        'description': description,
        'hourly_rate': hourly_rate,
        'no_of_employees': no_of_employees,
        'year_founded': year_founded,
        'services': services
    }
def scrape_company_profiles(profile_urls):
    """Scrape multiple company profiles."""
    profiles_data = []

    for url in profile_urls:
        print(f"Scraping profile: {url}")
        html_content = make_crawlbase_request(url)
        if html_content:
            details = extract_profile_details(html_content)
            profiles_data.append(details)

    return profiles_data
def save_profiles_to_json(data, filename='goodfirms_profiles.json'):
    """Save company profile data to a JSON file."""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Profile data saved to {filename}")
# Example usage
profile_urls = [
    "https://www.goodfirms.co/company/unified-infotech",
    "https://www.goodfirms.co/company/sigli"
]

profiles_data = scrape_company_profiles(profile_urls)
save_profiles_to_json(profiles_data)
[ { "name":"Unified Infotech", "description":"Unified Infotech is a 14-year-old, multi-award winning digital transformation partner. We\u2019re dedicated to turbocharging business growth with emerging technologies and streamlined digital processes. We serve Fortune 500 companies, multinational corporations (MNCs), small and medium-sized enterprises (SMEs), and Startups, serving as their comprehensive technology allies for bespoke web, mobile, and custom software solutions. Our organization prides itself on its consulting-oriented approach, defining itself as a \"Trusted Digital Transformation Partner\". We embody the values of being Unified, Confident, Reliable, and Transformative. Unified in vision and execution, we specialize in cutting-edge software solutions that drive seamless integration and success in the digital age. Our confidence fuels every project, ensuring robust and innovative outcomes. Reliability is at the heart of our digital transformation services, delivering end-to-end solutions that foster resilience and growth. With a transformative ethos, we revolutionize industries, empowering businesses to achieve unparalleled growth and innovation. We\u2019re your go-to partner for: Digital Transformation, Custom Web, Mobile, and Desktop Software Development Digital Customer Experience - UX/UI Research & Design SaaS and Software Product Development IT Consulting and Staff Augmentation Software Modernization and Cloud Migration Data and Analytics Cloud Engineering We serve the following industries: SaaS & Digital Platforms Education and Publication Pharma, Healthcare, and Life Sciences Fintech, Banking, Financial Services Insurance Retail, E-Commerce Supply Chain Speech and Translation Construction & Real Estate Automotive Media and Entertainment Travel and Hospitality Why choose Unified Infotech? We\u2019re a multi-award winning, global digital transformation company. We help enterprises dramatically improve business outcomes with bespoke Digital Experience, Software Development, Cloud Engineering, Data Analytics, IT Consulting & Advisory services. Deloitte Technology Fast 50 BBB Accreditation & A+ Rating (2023) 5 Rating on GoodFirms 4.6 Rating on Clutch Certified Great Place to Work", "hourly_rate":"$50 - $99/hr", "no_of_employees":"50 - 249", "year_founded":"2010", "services":[ "Web Development", "Software Development", "Web Designing (UI/UX)", "Mobile App Development", "E-commerce Development" ] }, { "name":"Sigli", "description":"Sigli is a dynamic software development company specializing in Digital Product Development and Digital Transformation. We excel at turning innovative ideas into reality, providing end-to-end solutions that cover the entire development cycle, from concept to deployment. Our keen focus on delivering high-quality, scalable, and future-proof digital products enables us to navigate the complexities of digital transformation effectively. In addition to our core development services, we place a strong emphasis on AI and Data solutions, helping businesses leverage advanced analytics and intelligent systems to drive innovation and efficiency. By modernizing operations and enhancing customer experiences, we ensure our clients remain competitive in the ever-evolving digital landscape. Our team's expertise, combined with a commitment to utilizing the latest technologies and best practices, positions Sigli as a reliable partner for businesses aiming to thrive in the digital age. 
We are proud to be ISO/IEC 27001 certified, ensuring that our clients' data and operations are secure and compliant. At Sigli, we believe that a successful project requires a harmonious blend of cutting-edge technology, a professional and adaptable team, mutual respect, and meticulous planning. Our client-centric approach is underpinned by transparency, ownership, and a relentless commitment to service. We communicate openly, take responsibility for our work, and always prioritize client satisfaction.", "hourly_rate":"$50 - $99/hr", "no_of_employees":"50 - 249", "year_founded":"2015", "services":[ "Software Development", "Web Development", "Big Data & BI", "Artificial Intelligence", "Mobile App Development", "Testing Services", "DevOps" ] } ]
Final Thoughts
Scraping GoodFirms data can give you insights into companies, services, and industry trends. Use Crawlbase Crawling API and Python to gather and organize search listings and company profiles. This data can be used for market research, competitor analysis, or data-driven solutions.
But always follow ethical web scraping practices by reading GoodFirms' terms of service and using the data responsibly. With the right approach and tools, you can turn GoodFirms into a treasure trove of business insights.
If you have any questions or want to give feedback, our support team can help you with web scraping. Happy scraping!
Frequently Asked Questions (FAQs)
Q. What are the ethical considerations when scraping GoodFirms?
Scraping is a powerful tool, but always follow GoodFirms' terms of service and ethical guidelines. Don't make excessive requests that could overload their servers, respect their robots.txt file, and use the data without violating intellectual property rights.
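For example, adding a short pause between requests keeps your request rate modest. This is a minimal sketch; the helper name and delay are illustrative, and make_crawlbase_request is the function defined earlier in this guide:

import time

def polite_scrape(urls, delay_seconds=2):
    """Fetch a list of URLs with a pause between requests to avoid overloading the server."""
    results = []
    for url in urls:
        html = make_crawlbase_request(url)
        if html:
            results.append(html)
        time.sleep(delay_seconds)  # throttle the request rate
    return results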
Q. How do I handle potential challenges like captchas or blocks while scraping GoodFirms?
GoodFirms is a static website, so captchas and blocks are rare. To keep scraping running smoothly, you can rely on the Crawlbase Crawling API for features like IP rotation and request retries, which help bypass rate limiting and maintain consistent access.
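If a request does fail, a small retry wrapper around the make_crawlbase_request function from the earlier sections is usually enough. This is a hypothetical sketch, not part of the Crawlbase API itself:

import time

def fetch_with_retries(url, max_retries=3, backoff_seconds=5):
    """Retry a failed fetch a few times with a fixed delay between attempts."""
    for attempt in range(1, max_retries + 1):
        html = make_crawlbase_request(url)
        if html:
            return html
        print(f"Attempt {attempt} failed, retrying in {backoff_seconds} seconds...")
        time.sleep(backoff_seconds)
    return None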
Q. What insights can businesses gain from scraping GoodFirms?
Businesses can extract company profiles, services offered, client reviews, hourly rates, team sizes, and founding years. This data can be used to benchmark competitors, track industry trends, and build targeted outreach.