Identifying your target web page, inspecting the full HTML, locating the data you need, using parsing tools to extract it, manually managing your proxies, and hoping you don’t get blocked for doing it repeatedly. It’s a tedious process, but that’s what web scraping looked like before API-based scraping came along.

Today, services like Crawlbase make the whole process so much easier. They let you skip all the complicated steps and focus on what actually matters: getting the data you need.

This article will walk you through the key differences between traditional and API-based scraping and show you how to get started with a more efficient approach to web data extraction through Crawlbase.

Table of Contents

  1. The Limitations of Traditional Scrapers
  2. Traditional Scraping Examples
  3. Key Benefits of API-Based Scraping
  4. Crawlbase API-Based Approach
  5. Why is API-based Scraping Preferred Over Traditional Web Scraping?
  6. Frequently Asked Questions (FAQs)

The Limitations of Traditional Scrapers

Building your web scraper from scratch is easier said than done. For starters, you need a solid understanding of how HTML works. You have to inspect the page’s structure, figure out which tags (like <div>, <span>, or <a>) hold the data you’re after, and know exactly how to extract it. And that’s just the beginning. Several other challenges come with traditional scraping:

Handling JavaScript-Rendered Pages

Solving this on your own takes a lot of effort. You’ll likely need tools like Selenium or Playwright to run a headless browser, since the data you’re after doesn’t always appear in the page’s initial HTML. It’s often generated dynamically after the page loads. If you rely on a simple GET request, your scraper will probably return an empty response.
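
For a sense of what that looks like in practice, here is a minimal sketch using Playwright’s headless Chromium (an illustration only; it assumes you have run pip install playwright and playwright install chromium, and the target URL is just a placeholder for a JavaScript-heavy page):

from playwright.sync_api import sync_playwright

TARGET_URL = "https://www.instagram.com/leomessi"  # placeholder JavaScript-heavy page

with sync_playwright() as playwright:
    # Launch a headless Chromium instance and open the page
    browser = playwright.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(TARGET_URL, wait_until="networkidle")  # wait for dynamic content to load

    rendered_html = page.content()  # HTML after JavaScript has executed
    browser.close()

print(rendered_html)

Even this simple version adds a browser binary to install, update, and run on every scrape, which is exactly the kind of overhead API-based scraping removes.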

IP Bans and Rate Limiting

This is one of the biggest challenges in traditional scraping, as it’s how websites detect and block automated crawling and scraping activities. Bypassing these defenses often means writing custom code to rotate proxies or IP addresses, and adding logic to mimic human-like browsing behavior. All of this requires advanced coding skills and adds a lot of complexity to your scraper.
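
To give a rough idea of the extra code involved, here is a minimal, simplified sketch of proxy rotation with requests (the proxy addresses are placeholders, and a real setup would also need health checks, randomized delays, and header rotation):

import random
import requests
from requests.exceptions import RequestException

TARGET_URL = "https://www.example.com"  # placeholder target

# Placeholder proxy pool; a real scraper needs a large pool of healthy proxies
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

for attempt in range(3):
    proxy = random.choice(PROXY_POOL)
    try:
        response = requests.get(
            TARGET_URL,
            proxies={"http": proxy, "https": proxy},
            timeout=10,
        )
        response.raise_for_status()
        print(response.text[:200])  # preview of the fetched HTML
        break
    except RequestException as error:
        print(f"Proxy {proxy} failed: {error}")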

Maintenance Cost

Traditional scrapers will almost always cost you more, not just in money, but also in development time and effort. Manually coded scrapers tend to break often and need constant updates. Managing healthy IPs or rotating proxies adds even more maintenance overhead. Failed scrapes or incomplete data also lead to wasted computing resources. Most of these problems are avoidable when you use modern, well-supported APIs.

Lack of Scalability

With all of the above issues combined, it is no surprise that scaling a traditional scraper is a huge problem. The high costs and low reliability make it a poor choice, especially if you’re aiming to scale your project to enterprise volumes. If growth and efficiency matter, sticking with traditional scraping doesn’t make sense, especially today, when API-based tools like Crawlbase exist.

Traditional Scraping Examples

This method is fairly straightforward. In this example, we’ll use Python’s requests library to demonstrate the most basic form of crawling and scraping a website.

Set Up the Coding Environment

  1. Install Python 3 on your computer
  2. Open your terminal and run
python -m pip install requests

Basic (non-JavaScript) page

import requests
from requests.exceptions import RequestException

# Configuration
TARGET_URL = "https://www.google.com/search?q=Mike+Tyson"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/122.0.0.0 Safari/537.36"
    )
}

# Fetch the HTML content of the page
try:
    response = requests.get(TARGET_URL, headers=HEADERS)
    response.raise_for_status()

    html_content = response.text
    print(html_content)  # Output the raw HTML content

    # To extract structured data (e.g., search results),
    # use a parser like Beautiful Soup on `html_content`.

except RequestException as error:
    print(f"\nFailed to fetch the page: {error}\n")

Save the code above in a file named basic_page.py, then execute it from the command line using:

python basic_page.py

Output:

Screenshot of example output after traditional scraping

As you can see from the output, this method returns the raw HTML of the page. While it works for basic or static pages, it falls short when dealing with modern websites that rely heavily on JavaScript to render content, which you will see in the next example.
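
If you do want structured data out of that raw HTML, you have to layer a parser on top yourself. Here is a minimal sketch with Beautiful Soup (assuming pip install beautifulsoup4; the <h3> selector is only an illustrative guess at where result titles live, and Google changes its markup frequently):

from bs4 import BeautifulSoup

# `html_content` is the raw HTML fetched in the previous example
soup = BeautifulSoup(html_content, "html.parser")

# Illustrative only: result titles are often, but not reliably, inside <h3> tags
for heading in soup.find_all("h3"):
    print(heading.get_text(strip=True))

Every time the markup changes, selectors like this break, which is a big part of the maintenance cost described earlier.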

JavaScript page

import requests
from requests.exceptions import RequestException

# Configuration
TARGET_URL = "https://www.instagram.com/leomessi"
OUTPUT_FILE_NAME = "output.html"
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/122.0.0.0 Safari/537.36"
    )
}

# Fetch and save page
try:
    response = requests.get(TARGET_URL, headers=HEADERS)
    response.raise_for_status()

    with open(OUTPUT_FILE_NAME, "w", encoding="utf-8") as file:
        file.write(response.text)

    print(f"\nPage successfully saved to '{OUTPUT_FILE_NAME}'\n")

except RequestException as error:
    print(f"\nFailed to fetch the page: {error}\n")

Save the code above in a file named javascript_page.py, then execute it from the command line using:

python javascript_page.py

Here is the terminal console output:

Screenshot of terminal console output after traditional scraping

And when you open the file output.html on a browser:

Screenshot of browser after traditional scraping

The browser renders a blank Instagram page because the JavaScript responsible for loading the content was not executed during the crawling process.

In such cases, you’d need to implement additional tools or switch to more advanced solutions, like using a headless browser or, better yet, an API-based scraper to save time and effort.

Key Benefits of API-Based Scraping

In the context of scraping, “API-based” means collecting data by making requests to official endpoints provided by a website or service. This makes the entire process faster, more reliable, and far less complicated.
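
For example, fetching repository details from the official GitHub API is a single HTTP request that returns clean JSON, with no HTML parsing involved:

import requests

# Public GitHub API endpoint for repository metadata
response = requests.get("https://api.github.com/repos/python/cpython")
response.raise_for_status()

repo = response.json()
print(repo["full_name"], repo["stargazers_count"])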

While official APIs like GitHub API are a good alternative to traditional scraping, Crawlbase offers an even more powerful solution. Its generalized approach allows you to scrape almost any publicly available website, and it can also be used alongside official APIs to significantly enhance your scraping workflow. Here are some key advantages:

IP Management and CAPTCHA Handling

Crawlbase provides an API that acts as middleware to simplify web scraping. Instead of leaving you to wrangle each target site directly, it handles complex tasks such as IP rotation, bot detection, and CAPTCHA solving. The API utilizes massive IP pools, AI-based behavior simulation, and built-in automation features to avoid bans and blocks. You simply send a target URL to the endpoint and receive accurate data, with no need to manage proxies, avoid CAPTCHAs, or simulate browser behavior manually.

Built-in Data Scrapers

Crawlbase doesn’t just provide the complete HTML code of your target page; it can also deliver clean, structured data, eliminating the need to constantly adjust your code every time a website changes something on its side.

It has built-in scrapers for major platforms like Facebook, Instagram, Amazon, eBay, and many others. This saves developers a ton of time and hassle, letting them focus on using the data rather than figuring out how to extract it.

Efficient and Reliable

Whether you’re planning to crawl small or large volumes of data, reliability and speed are key factors in deciding which approach to use for your project. Crawlbase is known for having one of the most stable and reliable services in the market. A quick look at the Crawlbase status page shows an almost 100% uptime for its API.

Fast Integration and Scalability

With a single API endpoint, you can access Crawlbase’s main product, the Crawling API, for scraping and data extraction. Any programming language that supports HTTP or HTTPS requests can work with this API, making it easy to use across different platforms. To simplify integration even further, Crawlbase also offers free libraries and SDKs for various languages. Using this API as the foundation for your scraper is a big reason why scaling your projects becomes much simpler.
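
As an illustration, a request through the Crawlbase Python library looks roughly like this (a sketch assuming the crawlbase package from PyPI and its CrawlingAPI class; check the official documentation for the exact interface):

from crawlbase import CrawlingAPI

# Placeholder token; use the one from your Crawlbase dashboard
api = CrawlingAPI({"token": "<Normal requests token>"})

response = api.get("https://www.google.com/search?q=Mike+Tyson")
if response["status_code"] == 200:
    print(response["body"])  # raw HTML, or JSON when a scraper parameter is used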

Crawlbase API-Based Approach

You can spend time learning headless browsers, managing proxies, and parsing HTML, or you can skip all that complexity and use the Crawling API instead. Here’s how easy it is to get started:

Signup and Quickstart Guide

Crawling API (Basic page)

import requests
import json
from requests.exceptions import RequestException

# Configuration
API_TOKEN = "<Normal requests token>"
TARGET_URL = "https://www.google.com/search?q=Mike+Tyson"
API_ENDPOINT = "https://api.crawlbase.com/"

params = {
    "token": API_TOKEN,
    "url": TARGET_URL,
    "scraper": "google-serp",
    "country": "US"
}

# Fetch the content of the page as structured JSON
try:
    response = requests.get(API_ENDPOINT, params=params)
    response.raise_for_status()

    json_string_content = response.text
    json_data = json.loads(json_string_content)
    pretty_json = json.dumps(json_data, indent=2)
    print(pretty_json)

except RequestException as error:
    print(f"\nFailed to fetch the page: {error}\n")

Note:

  • Make sure to replace <Normal requests token> with your actual token.
  • The "scraper": "google-serp" is optional. Remove it if you wish to get the complete HTML response.

Save the script as basic_page_using_crawling_api.py, then run it from the command line using:

python basic_page_using_crawling_api.py

Response

{
  "original_status": 200,
  "pc_status": 200,
  "url": "https://www.google.com/search?q=Mike+Tyson",
  "domain_complexity": "complex",
  "body": {
    "ads": [],
    "peopleAlsoAsk": [],
    "snackPack": {
      "mapLink": "",
      "moreLocationsLink": "",
      "results": []
    },
    "searchResults": [
      {
        "position": 1,
        "title": "Mike Tyson - Wikipedia",
        "postDate": "",
        "url": "https://en.wikipedia.org/wiki/Mike_Tyson",
        "destination": "en.wikipedia.org › wiki › Mike_Tyson",
        "description": "Michael Gerard Tyson (born June 30, 1966) is an American former professional boxer who competed primarily between 1985 and 2005. Nicknamed \"Iron Mike\" and ... Vs. Buster Douglas · Mike Tyson Mysteries · Mike Tyson (disambiguation) · Girls 2"
      },
      // Note: some results have been omitted for brevity.
      {
        "position": 11,
        "title": "is mike tyson still alive",
        "postDate": "",
        "url": "",
        "destination": "Related searches",
        "description": "is mike tyson still alive mike tyson net worth mike tyson children mike tyson stats mike tyson movie mike tyson height mike tyson daughter mike tyson record"
      }
    ],
    "relatedSearches": [
      {
        "title": "Mike Tyson returns to Riviera Beach for Boxing for Cause event at JFK Middle School WTVX · 3 hours ago",
        "url": "https://google.com/url?q=https://cw34.com/news/local/mike-tyson-returns-to-riviera-beach-for-boxing-for-cause-event-at-jfk-middle-school-florida-may-19-2025&sa=U&ved=2ahUKEwi5_u2asLGNAxURVkEAHZfXAiQQvOMEegQIAhAC&usg=AOvVaw2yO_XM1BxlG5lQ5SFYqrcx"
      },
      // Note: some results have been omitted for brevity.
      {
        "title": "mike tyson record",
        "url": "https://google.com/search?sca_esv=c77914c67f84fb9a&q=mike+tyson+record&sa=X&ved=2ahUKEwi5_u2asLGNAxURVkEAHZfXAiQQ1QJ6BAgBEAg"
      }
    ],
    "numberOfResults": 11
  }
}

Crawling API (JavaScript page)

import json
import requests
from requests.exceptions import RequestException

# Configuration
API_TOKEN = "<JavaScript requests token>"
TARGET_URL = "https://www.instagram.com/leomessi"
API_ENDPOINT = "https://api.crawlbase.com/"
OUTPUT_FILE_NAME = "output.html"

params = {
    "token": API_TOKEN,
    "url": TARGET_URL,
    ## Uncomment the line below to receive structured JSON instead of HTML
    # "scraper": "instagram-profile"
}

# Fetch and save page
try:
    response = requests.get(API_ENDPOINT, params=params)
    response.raise_for_status()

    ## START: Output to file
    with open(OUTPUT_FILE_NAME, "w", encoding="utf-8") as file:
        file.write(response.text)
    ## END: Output to file

    print(f"\nPage successfully saved to '{OUTPUT_FILE_NAME}'\n")

    ## Uncomment the code below to print the JSON response to the console
    ## START: Output to console
    # json_string_content = response.text
    # json_data = json.loads(json_string_content)
    # pretty_json = json.dumps(json_data, indent=2)
    # print(pretty_json)
    ## END: Output to console

except RequestException as error:
    print(f"\nFailed to fetch the page: {error}\n")

As with the previous example, save this script and execute it from your terminal.

Once it runs successfully, you should see output similar to the following:

Screenshot of terminal console output after Crawlbase scraping

When you open output.html, you’ll see that the page is no longer blank, as the Crawling API runs your request through a headless browser infrastructure.

Screenshot of browser output after Crawlbase scraping

If you want clean, structured JSON response data that’s ready to use, simply add the "scraper": "instagram-profile" parameter to your request. This tells Crawlbase to automatically parse the Instagram profile page and return only the relevant data, saving you the effort of manually extracting the whole HTML page.

{
  "original_status": 200,
  "pc_status": 200,
  "url": "https://www.instagram.com/leomessi",
  "domain_complexity": "standard",
  "body": {
    "username": "leomessi",
    "verified": true,
    "postsCount": {
      "value": "1,352 posts",
      "text": "1,352 posts"
    },
    "followersCount": {
      "value": "1,352",
      "text": "1,352"
    },
    "followingCount": {
      "value": "505M followers",
      "text": "505M followers"
    },
    "picture": "",
    "name": "leomessi",
    "bio": {
      "text": "Bienvenidos a la cuenta oficial de Instagram de Leo Messi / Welcome to the official Leo Messi Instagram account",
      "tags": []
    },
    "openStories": [
      {
        "image": "https://instagram.fdac5-1.fna.fbcdn.net/v/t51.12442-15/29087606_126595214845908_6406382890979950592_n.jpg?stp=c0.398.1024.1024a_dst-jpg_e35_s150x150_tt6&_nc_ht=instagram.fdac5-1.fna.fbcdn.net&_nc_cat=1&_nc_oc=Q6cZ2QH6EqvaVyfXNk8zSys32rW4yL8DZ4rc2YnAOPfML_oniyB2vNF-QkDP6ODCwR-S1RA&_nc_ohc=r0nEuFs6-HsQ7kNvwFu5CEg&_nc_gid=yagnghB9KYY63NmgzUZcwA&edm=AGW0Xe4BAAAA&ccb=7-5&oh=00_AfI539_HwS461-oFMMMRcfZRsGHpm9g9dK4ZnAzTuy2OLg&oe=6831F937&_nc_sid=94fea1",
        "text": "Selección's profile picture"
      }
      // Note: some results have been omitted for brevity.
    ],
    "posts": [
      {
        "link": "https://www.instagram.com/leomessi/p/DHwD6QfNjtM/",
        "image": "https://instagram.fdac5-2.fna.fbcdn.net/v/t51.2885-15/487279743_18564110437033891_6646105334131093181_n.jpg?stp=dst-jpg_e35_p640x640_sh0.08_tt6&_nc_ht=instagram.fdac5-2.fna.fbcdn.net&_nc_cat=107&_nc_oc=Q6cZ2QEQESi6ZBcLSC7mzApMy8pkVFjaMzqMN3LHMBymIMNTLgW-O5pkV7NYRmMMPm-OXUk&_nc_ohc=2syeyScYoDgQ7kNvwF29WUn&_nc_gid=7sozkWOc6vQySL1gR5H2pQ&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfLT72_fv6olEKMMljFOlP-rthEnep23at8tiMxiSV9NvA&oe=6831F3EB&_nc_sid=8b3546",
        "imageData": "Photo shared by Leo Messi on March 28, 2025 tagging @masbymessi. May be an image of 1 person, playing soccer, playing football, cleats, ball, sports equipment, sportswear and text.",
        "images": [
          "https://instagram.fdac5-2.fna.fbcdn.net/v/t51.2885-15/487279743_18564110437033891_6646105334131093181_n.jpg?stp=c0.169.1350.1350a_dst-jpg_e35_s150x150_tt6&efg=eyJ2ZW5jb2RlX3RhZyI6ImltYWdlX3VybGdlbi4xMzUweDE2ODguc2RyLmY3NTc2MS5kZWZhdWx0X2ltYWdlIn0&_nc_ht=instagram.fdac5-2.fna.fbcdn.net&_nc_cat=107&_nc_oc=Q6cZ2QEQESi6ZBcLSC7mzApMy8pkVFjaMzqMN3LHMBymIMNTLgW-O5pkV7NYRmMMPm-OXUk&_nc_ohc=2syeyScYoDgQ7kNvwF29WUn&_nc_gid=7sozkWOc6vQySL1gR5H2pQ&edm=AOQ1c0wBAAAA&ccb=7-5&oh=00_AfJssBLRDQJbI-ACa2Iq1WwpRv3WwgSTEwlYgZrgOpiIWA&oe=6831F3EB&_nc_sid=8b3546"
          // Note: some results have been omitted for brevity.
        ]
      },
      // Note: some results have been omitted for brevity.
      {
        "link": "https://www.instagram.comhttps://privacycenter.instagram.com/policy/",
        "image": "",
        "imageData": "",
        "images": []
      }
    ],
    "igtvs": []
  }
}

You can also visit Crawlbase’s GitHub repository to download the complete example code used in this guide.

Why is API-based Scraping Preferred Over Traditional Web Scraping?

As you can see in our demonstration above, using an API-based solution like Crawlbase’s Crawling API offers clear advantages over traditional scraping methods when it comes to collecting data from websites. Let’s take a closer look at why it’s a winning choice for both developers and businesses.

Reduced Dev Time and Costs

Instead of spending time developing a scraper that constantly needs updates whenever a website changes its HTML, handling JavaScript pages, or maintaining proxies to avoid getting blocked, you can simply use the Crawling API. Traditional scraping comes with too many time-consuming challenges. By letting Crawlbase take care of the heavy lifting, you’ll lower your overall project costs and reduce the need for extra manpower.

Scalable Infrastructure

Crawlbase products are built with scalability in mind. From simple HTTP/HTTPS requests to ready-to-use libraries and SDKs for various programming languages, integration is quick and easy.

The Crawling API is designed to scale with your needs. Crawlbase uses a pay-as-you-go payment model, giving you the flexibility to use as much or as little as you need each month. You’re not locked into a subscription, and you only pay for what you use, making it ideal for projects of any size.

Higher Success Rate

Crawlbase is built to maximize success rates with features like healthy IP pools, AI-powered logic to avoid CAPTCHAs, and a well-maintained proxy network. A higher success rate means faster data collection and lower operational costs. Even in the rare case of a failed request, Crawlbase doesn’t charge you, making it a highly cost-effective solution for web scraping.

Give Crawlbase a try today and see how much faster and more efficient web scraping can be. Sign up for a free account to receive your 1,000 free API requests!

Frequently Asked Questions (FAQs)

Q: Why should I switch to an API-based solution like Crawlbase?

A: Traditional scraping is slow, complex, and hard to scale. Crawlbase handles IP rotation, JavaScript rendering, and CAPTCHA avoidance, so you get reliable data faster with less code and maintenance. Even if there’s an upfront cost, the overall expense is usually lower than building and maintaining your own scrapers.

Q: What are the limitations of Crawlbase?

A: Crawlbase is designed for flexibility and scalability, but like any API-based platform, it has certain operational limits depending on the crawling method being used. Below is a breakdown of the default limits:

Crawling API (Synchronous)

  • Bandwidth per Request: Unlimited
  • Rate Limit:
    • 20 requests per second for most websites
    • 1 request per second for Google domain
    • 5 requests per second for LinkedIn (Async Mode)

Note: Rate limits can be increased upon request. If you’re unsure which product suits your use case or want to request higher limits, the Crawlbase customer support is available to help tailor the setup for your project.
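
If you also want to stay under those limits on the client side, a simple approach is to pace your own requests. Here is a minimal sketch (the URLs and token are placeholders; a real pipeline would typically use a proper rate limiter or the Crawling API in asynchronous mode instead):

import time
import requests

API_ENDPOINT = "https://api.crawlbase.com/"
MAX_REQUESTS_PER_SECOND = 20  # default Crawling API limit for most websites

urls = ["https://www.example.com/page-1", "https://www.example.com/page-2"]  # placeholders

for url in urls:
    start = time.time()
    response = requests.get(API_ENDPOINT, params={"token": "<Normal requests token>", "url": url})
    print(url, response.status_code)

    # Sleep just long enough to keep the request rate under the limit
    elapsed = time.time() - start
    time.sleep(max(0.0, 1 / MAX_REQUESTS_PER_SECOND - elapsed))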

Q: What are the main differences between web scraping and API-based data collection?

A: API-based data collection uses a structured and authorized interface provided by the data source to get information in a clean, predictable format like JSON or XML.

Key differences:

  • Structure: APIs return structured data, while scraping requires parsing raw HTML.
  • Reliability: APIs are more stable and less likely to break due to design changes; scraping can break whenever the layout or code is updated.
  • Access: APIs require authentication and have usage limits; scraping can access any publicly visible content (though it may raise ethical or legal issues).
  • Speed and Efficiency: API calls are generally faster and more efficient, especially for large-scale data collection.
  • Compliance: API usage is governed by clear terms of service; scraping may violate a site’s policies if not done correctly.

An API is usually the preferred method when one is available, but scraping is useful when APIs are limited, unavailable, or too restrictive.