How to Scrape LinkedIn: Complete 2025 Guide

LinkedIn scraping unlocks valuable data for recruitment, sales, and market research. This guide shows you how to extract LinkedIn profiles, company pages, and feeds using Python and Crawlbase’s Crawling API.

Why Scrape LinkedIn?
What Data Can We Scrape from LinkedIn?
Scraping Challenges & Solutions
Getting Started with Crawlbase
Scraping LinkedIn Profiles
- Retrieving Data from Crawlbase Cloud Storage
Scraping Company Pages
- Retrieving Data from Crawlbase Cloud Storage
Scraping LinkedIn Feeds
- Retrieving Data from Crawlbase Cloud Storage
Supercharge Your Career Goals with Crawlbase
Frequently Asked Questions (FAQs)

Why Scrape LinkedIn?

LinkedIn data extraction offers powerful advantages:

Talent Acquisition: Automate candidate sourcing and find qualified professionals faster.
Sales & Lead Generation: Gather leads from LinkedIn profiles, track prospects for cold-calling campaigns, and develop targeted outreach strategies.
Market Research: Monitor competitors, industry trends, and market benchmarks.
Job Market Analysis: Track hiring patterns, salary trends, and in-demand skills.
Academic Research: Gather datasets on professional networking and career trajectories.

What Data Can We Scrape from LinkedIn?

LinkedIn Profiles:

Personal Information: Names, job titles, current and past positions, education, skills, endorsements, and recommendations.
Contact Details: Emails, phone numbers (if publicly available), and social media profiles.
Engagement: Posts, articles, and other content shared or liked by the user.

Company Pages:

Company Details: Name, industry, size, location, website, and company description.
Job Postings: Current openings, job descriptions, requirements, and application links.
Employee Information: List of employees, their roles, and connections within the company.
Updates and News: Company posts, articles, and updates shared on their page.

LinkedIn Feeds:

Activity Feed: Latest updates, posts, and articles from users and companies you are interested in.
Engagement Metrics: Likes, comments, shares, and the overall engagement of posts.
Content Analysis: Types of content being shared, trending topics, and user engagement patterns.

Scraping Challenges & Solutions

Scraping LinkedIn can provide valuable data, but it also comes with its challenges.

Anti-Scraping Measures

Challenge: IP blocking and CAPTCHAs
Solution: Crawlbase provides rotating proxies and CAPTCHA handling
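
Crawlbase rotates proxies on its side, but an individual request can still fail transiently. A common client-side complement is to retry with exponential backoff. The sketch below is generic Python, not part of the Crawlbase library: fetch_with_retries is a hypothetical helper, and fetch is assumed to be any callable that returns a dict with a status_code key, like the crawling_api.get responses used later in this guide.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns status_code 200 or attempts run out.

    fetch: any callable returning a dict with a 'status_code' key
    (for example, a hypothetical wrapper around crawling_api.get(url, options)).
    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Returns the successful response, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            response = fetch(url)
            if response.get("status_code") == 200:
                return response
            print(f"Attempt {attempt + 1}: status {response.get('status_code')}")
        except Exception as exc:  # timeouts, connection resets, etc.
            print(f"Attempt {attempt + 1} raised: {exc}")
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    return None
```

For example, fetch_with_retries(lambda u: crawling_api.get(u, options), URL). The sleep parameter is injectable so the backoff can be tested without actually waiting.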

Dynamic Content

Challenge: JavaScript-rendered pages
Solution: Use headless browsers or Crawlbase’s rendering engine

Legal Compliance

Challenge: LinkedIn’s Terms of Service restrictions
Solution: Focus on public data only and respect privacy laws
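
Data minimization, a principle found in privacy laws such as the GDPR, means keeping only the fields you actually need. One simple way to apply it is an allowlist filter that runs before anything is stored. A minimal sketch: minimize and the COMPANY_FIELDS allowlist are hypothetical, with key names borrowed from the sample outputs later in this guide.

```python
# Hypothetical allowlist: only the top-level keys this project needs.
COMPANY_FIELDS = {"title", "headline", "url"}


def minimize(record, allowed):
    """Return a copy of `record` that keeps only allowlisted top-level keys.

    Everything else (e.g. lists of employee profiles) is dropped before storage.
    """
    return {key: value for key, value in record.items() if key in allowed}
```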

Data Volume

Challenge: Processing large datasets
Solution: Asynchronous requests and structured storage
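
Concurrent requests and structured storage can be combined in a few lines of standard-library Python. The sketch below submits requests from a thread pool and appends each result to a JSON Lines file (one JSON object per line), which is easy to stream and append to. scrape_many is a hypothetical helper; fetch is assumed to be a function like scrape_profile from later in this guide, which returns parsed JSON or None and handles its own exceptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def scrape_many(urls, fetch, output_path, max_workers=5):
    """Fetch URLs concurrently and append one JSON object per line.

    fetch: callable taking a URL and returning JSON-serializable data or None.
    It should catch its own exceptions; pool.map re-raises them otherwise.
    Returns the number of records written.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            open(output_path, "a", encoding="utf-8") as out:
        # pool.map yields results in input order, so they line up with urls.
        # Only this (main) thread writes to the file, so no locking is needed.
        for url, result in zip(urls, pool.map(fetch, urls)):
            if result is None:
                continue  # failed request: skip it
            record = {"url": url, "data": result}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

With 'async': 'true', each result is a small {"rid": ...} object, so the file doubles as a record of the rids you still need to retrieve. Keep max_workers within your plan's rate limit.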

Getting Started with Crawlbase

To scrape LinkedIn using Crawlbase’s Crawling API, you need to set up your Python environment. Before getting started, review Crawlbase’s pricing for LinkedIn scraping.

1. Install Python:

Download and install Python from the official website. Ensure that you add Python to your system’s PATH during installation.

2. Create a Virtual Environment:

Open your terminal or command prompt and navigate to your project directory. Create a virtual environment by running:

python -m venv venv

Activate the virtual environment:

On Windows:
.\venv\Scripts\activate
On macOS/Linux:
source venv/bin/activate

3. Install Crawlbase Library:

With the virtual environment activated, install the Crawlbase library using pip:

pip install crawlbase
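
The scripts below hardcode 'YOUR_API_TOKEN'. A common alternative is to read the token from an environment variable so it never lands in version control. A minimal sketch: the variable name CRAWLBASE_TOKEN and the helper load_crawlbase_token are hypothetical choices, not part of the Crawlbase library.

```python
import os


def load_crawlbase_token(var_name="CRAWLBASE_TOKEN", env=None):
    """Return the API token stored in an environment variable.

    CRAWLBASE_TOKEN is a hypothetical name chosen for this sketch.
    `env` defaults to os.environ; pass a dict to test without touching it.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your Crawlbase token.")
    return token
```

After export CRAWLBASE_TOKEN=... on macOS/Linux (or set on Windows), you could write crawling_api = CrawlingAPI({ 'token': load_crawlbase_token() }).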

Scraping LinkedIn Profiles:

Start by importing the necessary libraries and initializing the Crawlbase API with your access token. Define the URL of the LinkedIn profile you want to scrape and set the scraping options.

from crawlbase import CrawlingAPI
import json

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({ 'token': 'YOUR_API_TOKEN' })

URL = 'https://www.linkedin.com/in/kaitlyn-owen'

options = {
    'scraper': 'linkedin-profile',
    'async': 'true'
}

# Function to make a request using Crawlbase API
def make_crawlbase_request(url):
    response = crawling_api.get(url, options)
    if response['status_code'] == 200:
        # The body arrives as raw bytes; JSON text is UTF-8 encoded
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to fetch the page. Status code:", response['status_code'])
        return None

def scrape_profile(url):
    try:
        return make_crawlbase_request(url)
    except Exception as e:
        # Network errors and malformed JSON bodies end up here
        print(f"Request failed: {e}")
        return None

if __name__ == '__main__':
    scraped_data = scrape_profile(URL)
    print(json.dumps(scraped_data, indent=2))

This script initializes the Crawlbase API, defines the URL of the LinkedIn profile to scrape, and selects the linkedin-profile scraper. Because async is set to true, the API does not return the profile data directly; it returns a request identifier (rid), which the script prints.

Example Output:

{
  "rid": "1dd4453c6f6bd93baf1d7e03"
}

Retrieving Data from Crawlbase Cloud Storage:

With asynchronous requests, the Crawling API responds right away with a request identifier (rid) and saves the scraped result to Crawlbase Cloud Storage. Use this rid to retrieve the data.

from crawlbase import StorageAPI
import json

# Initialize Crawlbase Cloud Storage with your access token
storage_api = StorageAPI({ 'token': 'YOUR_API_TOKEN' })

RID = 'your_request_identifier'

# Function to retrieve data from Crawlbase storage
def retrieve_data(rid):
    response = storage_api.get(rid)
    if response['status_code'] == 200:
        # The stored body is raw bytes; decode it as UTF-8 so non-ASCII text survives
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to retrieve the data. Status code:", response['status_code'])
        return None

if __name__ == '__main__':
    retrieved_data = retrieve_data(RID)
    print(json.dumps(retrieved_data, indent=2))

This script retrieves the stored response using the rid and prints the JSON data.

Example Output:

{
  "summary": ["I am a self-motivated professional who is passionate about helping surgeons personally…"],
  "activities": [
    {
      "title": "With permission - 4 years after explantation of an infected aortic graft placed at another local institution. Playing golf and loving life. Best…",
      "link": "https://www.linkedin.com/posts/peter-rossi-md-facs-dfsvs-9393b934_aorta-aortaed-activity-7185799259269525504-DI5k?trk=public_profile",
      "image": "https://media.licdn.com/dms/image/D5622AQFKrMD3lTsK3w/feedshare-shrink_2048_1536/0/1713228047686?e=2147483647&v=beta&t=eZ4Blo9-IEPoDaF7TgUQbm-gFtDmRGTaW1uZOqLWEM4",
      "attributions": {
        "title": "Liked by Kaitlyn Owen",
        "link": "https://www.linkedin.com/in/kaitlyn-owen?trk=public_profile_actor-name"
      }
    },
    {
      "title": "Kaitlyn Owen",
      "position": "",
      "link": "https://www.linkedin.com/in/kaitlyn-owen-1a469575?trk=public_profile_samename-profile",
      "image": null,
      "location": "Redmond, WA"
    }
  ],
  "similarProfiles": []
}
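
The retrieval scripts in this guide call the Storage API once. Because asynchronous crawls run in the background, the result may not be stored yet when you first ask for it. A minimal polling sketch, assuming any non-200 status means "not ready yet" (check Crawlbase’s Storage API documentation for the exact status codes). wait_for_result is a hypothetical helper, and get_stored stands in for storage_api.get.

```python
import json
import time


def wait_for_result(get_stored, rid, timeout=120.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll storage until the async result for `rid` is available.

    get_stored: callable taking a rid and returning a dict with 'status_code'
    and 'body' (bytes) keys, e.g. storage_api.get.
    Assumption: any non-200 status means the result is not stored yet.
    Returns the parsed JSON, or None once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        response = get_stored(rid)
        if response["status_code"] == 200:
            return json.loads(response["body"].decode("utf-8"))
        if clock() + interval > deadline:
            return None  # give up rather than sleeping past the deadline
        sleep(interval)
```

Usage might look like retrieved_data = wait_for_result(storage_api.get, RID). The clock and sleep parameters exist only so the loop can be tested without real waiting.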

Scraping Company Pages

Use the linkedin-company scraper to gather organizational data:

from crawlbase import CrawlingAPI
import json

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({ 'token': 'YOUR_API_TOKEN' })

URL = 'https://www.linkedin.com/company/amazon'

options = {
    'scraper': 'linkedin-company',
    'async': 'true'
}

# Function to make a request using Crawlbase API
def make_crawlbase_request(url):
    response = crawling_api.get(url, options)
    if response['status_code'] == 200:
        # The body arrives as raw bytes; JSON text is UTF-8 encoded
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to fetch the page. Status code:", response['status_code'])
        return None

def scrape_company(url):
    try:
        return make_crawlbase_request(url)
    except Exception as e:
        # Network errors and malformed JSON bodies end up here
        print(f"Request failed: {e}")
        return None

if __name__ == '__main__':
    scraped_data = scrape_company(URL)
    print(json.dumps(scraped_data, indent=2))

This script initializes the Crawlbase API, sets the URL of the LinkedIn company page you want to scrape, and specifies the linkedin-company scraper. Because the request is asynchronous, the response contains a rid rather than the company data, and the script prints that response.

Example Output:

{
  "rid": "f270321bbebe203b43cebedd"
}

Retrieving Data from Crawlbase Cloud Storage

As with profile scraping, asynchronous requests will return a rid. You can use this rid to retrieve the stored data.

from crawlbase import StorageAPI
import json

# Initialize Crawlbase Cloud Storage with your access token
storage_api = StorageAPI({ 'token': 'YOUR_API_TOKEN' })

RID = 'your_request_identifier'

# Function to retrieve data from Crawlbase storage
def retrieve_data(rid):
    response = storage_api.get(rid)
    if response['status_code'] == 200:
        # The stored body is raw bytes; decode it as UTF-8 so non-ASCII text survives
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to retrieve the data. Status code:", response['status_code'])
        return None

if __name__ == '__main__':
    retrieved_data = retrieve_data(RID)
    print(json.dumps(retrieved_data, indent=2))

This script retrieves and prints the stored company data using the rid.

Example Output:

{
  "title": "Amazon",
  "headline": "Software Development",
  "cover_image": "https://media.licdn.com/dms/image/D4D3DAQGri_YWxYb-GQ/image-scale_191_1128/0/1681945878609/amazon_cover?e=2147483647&v=beta&t=DEHImsFhQdlARMSTcY2AmdImxdLxIyvDncPmPQEpebY",
  "company_image": "https://media.licdn.com/dms/image/C560BAQHTvZwCx4p2Qg/company-logo_200_200/0/1630640869849/amazon_logo?e=2147483647&v=beta&t=2vRB20XZOYNtXSr5GHAUUQXXII4lvgcotA2QTMcRHOI",
  "url": "https://www.linkedin.com/company/amazon",
  "employees": {
    "numberOfEmployees": 737833,
    "link": "https://www.linkedin.com/search/results/people/?facetCurrentCompany=%5B15218805%2C+2649984%2C+17411%2C+78392228%2C+208137%2C+61712%2C+2382910%2C+49318%2C+16551%2C+80073065%2C+47157%2C+21433%2C+71099%2C+860467%2C+12227%2C+167364%2C+4787585%2C+11091426%2C+451028%2C+111446%2C+14951%2C+46825%2C+2320329%2C+34924%2C+1586%5D"
  },
  ...
      {
        "address": "Marcel-Breuer-Straße 12Munich, Bavaria 80807, DE",
        "link": "https://www.bing.com/maps?where=Marcel-Breuer-Stra%C3%9Fe+12+Munich+80807+Bavaria+DE&trk=org-locations_url"
      }
    ]
  },
  "employeesAtCompany": [
    {
      "title": "Steven Hatch",
      "position": "Experienced Amazon Engineering Leader | Generative AI at Amazon",
      "link": "https://www.linkedin.com/in/hatch?trk=org-employees",
      "image": "https://media.licdn.com/dms/image/D4E03AQG823Q38d3Igg/profile-displayphoto-shrink_100_100/0/1673281011530?e=2147483647&v=beta&t=sK2PKC8tMDWU5koa0DpKxZzhQ1Zofs1shi941xNscrQ",
      "location": ""
    },
    ...
  ]
}
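
To hand the employee list to a spreadsheet or CRM, you can flatten employeesAtCompany into CSV with Python’s built-in csv module. A minimal sketch: employees_to_csv is a hypothetical helper, and the column list mirrors the keys in the sample output above (note that title holds the employee’s name there).

```python
import csv

# Columns taken from the "employeesAtCompany" entries in the sample output.
EMPLOYEE_COLUMNS = ["title", "position", "link", "location"]


def employees_to_csv(company_data, stream):
    """Write company_data['employeesAtCompany'] to `stream` as CSV rows.

    Missing keys become empty cells; extra keys (such as 'image') are ignored.
    Returns the number of employee rows written.
    """
    employees = company_data.get("employeesAtCompany", [])
    writer = csv.DictWriter(stream, fieldnames=EMPLOYEE_COLUMNS,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(employees)
    return len(employees)
```

For a file, open it with newline="" as the csv module recommends: with open("amazon_employees.csv", "w", newline="", encoding="utf-8") as f: employees_to_csv(retrieved_data, f).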

Scraping LinkedIn Feeds

Monitor activity streams with the linkedin-feed scraper:

from crawlbase import CrawlingAPI
import json

# Initialize Crawlbase API with your access token
crawling_api = CrawlingAPI({ 'token': 'YOUR_API_TOKEN' })

URL = 'https://www.linkedin.com/feed/update/urn:li:activity:7022155503770251267'

options = {
    'scraper': 'linkedin-feed',
    'async': 'true'
}

# Function to make a request using Crawlbase API
def make_crawlbase_request(url):
    response = crawling_api.get(url, options)
    if response['status_code'] == 200:
        # The body arrives as raw bytes; JSON text is UTF-8 encoded
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to fetch the page. Status code:", response['status_code'])
        return None

def scrape_feed(url):
    try:
        return make_crawlbase_request(url)
    except Exception as e:
        # Network errors and malformed JSON bodies end up here
        print(f"Request failed: {e}")
        return None

if __name__ == '__main__':
    scraped_data = scrape_feed(URL)
    print(json.dumps(scraped_data, indent=2))

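This script initializes the Crawlbase API, sets the URL of the LinkedIn post to scrape, and specifies the linkedin-feed scraper. The asynchronous request returns a rid, which the script prints.

Example Output: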

{
  "rid": "977b3381ab11f938d6522775"
}
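
The profile, company, and feed scripts differ only in the URL and the scraper name, so one option is to pick the scraper from the URL path. A minimal sketch: options_for and the prefix mapping are hypothetical, inferred from the example URLs in this guide rather than from Crawlbase documentation.

```python
from urllib.parse import urlparse

# Path prefixes mapped to the scraper names used in this guide (assumed mapping).
SCRAPERS_BY_PREFIX = {
    "/in/": "linkedin-profile",
    "/company/": "linkedin-company",
    "/feed/update/": "linkedin-feed",
}


def options_for(url):
    """Return Crawling API options for a LinkedIn URL, or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Accept linkedin.com and its subdomains (www., in., ...), nothing else.
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        raise ValueError(f"Not a LinkedIn URL: {url}")
    for prefix, scraper in SCRAPERS_BY_PREFIX.items():
        if parsed.path.startswith(prefix):
            return {"scraper": scraper, "async": "true"}
    raise ValueError(f"No scraper configured for path: {parsed.path}")
```

A single request function could then call crawling_api.get(url, options_for(url)) for any of the three page types.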

Retrieving Data from Crawlbase Cloud Storage

As with profile and company page scraping, asynchronous requests will return a rid. You can use this rid to retrieve the stored data.

from crawlbase import StorageAPI
import json

# Initialize Crawlbase Cloud Storage with your access token
storage_api = StorageAPI({ 'token': 'YOUR_API_TOKEN' })

RID = 'your_request_identifier'

# Function to retrieve data from Crawlbase storage
def retrieve_data(rid):
    response = storage_api.get(rid)
    if response['status_code'] == 200:
        # The stored body is raw bytes; decode it as UTF-8 so non-ASCII text survives
        return json.loads(response['body'].decode('utf-8'))
    else:
        print("Failed to retrieve the data. Status code:", response['status_code'])
        return None

if __name__ == '__main__':
    retrieved_data = retrieve_data(RID)
    print(json.dumps(retrieved_data, indent=2))

This script retrieves and prints the stored feed data using the rid.

Example Output:

{
  "feeds": [
    {
      "text": "#AlphabetInc is eliminating 12,000 jobs, its chief executive said in a staff memo The cuts mark the latest to shake the #technology sector and come days after rival Microsoft Corp said it would lay off 10,000 workers. Full report - https://lnkd.in/dfxXc2N4",
      "images": [
        "https://media.licdn.com/dms/image/C4D22AQHvTzTp5mnMcg/feedshare-shrink_2048_1536/0/1674212335928?e=2147483647&v=beta&t=Aq3WKkxF1Q5ZwGB6ax6OOWRtCW7Vlz8KDdpBvvK4K_0"
      ],
      "videos": [],
      "datetime": "1y",
      "postUrl": "https://in.linkedin.com/company/hindustantimes?trk=public_post_feed-actor-image",
      "userName": "Hindustan Times",
      "reactionCount": 1177,
      "commentsCount": 13,
      "links": [
        {
          "text": "#AlphabetInc",
          "url": "https://www.linkedin.com/signup/cold-join?session_redirect=https%3A%2F%2Fwww.linkedin.com%2Ffeed%2Fhashtag%2Falphabetinc&trk=public_post-text"
        },
        {
          "text": "#technology",
          "url": "https://www.linkedin.com/signup/cold-join?session_redirect=https%3A%2F%2Fwww.linkedin.com%2Ffeed%2Fhashtag%2Ftechnology&trk=public_post-text"
        },
        ...
      ]
    }
  ]
}
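
Once the feed JSON is retrieved, a few lines of Python turn it into the engagement and topic signals listed earlier. A minimal sketch: the helper names are hypothetical, the key names (feeds, reactionCount, commentsCount, links) come from the sample output above, and scoring engagement as reactions plus comments is just one simple choice.

```python
def engagement(post):
    """Reactions plus comments; missing counts are treated as 0."""
    return (post.get("reactionCount") or 0) + (post.get("commentsCount") or 0)


def top_posts(feed_data, n=3):
    """Return up to n posts from feed_data['feeds'], most engaged first."""
    return sorted(feed_data.get("feeds", []), key=engagement, reverse=True)[:n]


def hashtags(post):
    """Collect hashtag link texts (those starting with '#') from a post."""
    return [link["text"] for link in post.get("links", [])
            if (link.get("text") or "").startswith("#")]
```

Counting hashtags across many retrieved posts (for example with collections.Counter) gives a rough view of trending topics.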

Supercharge Your Career Goals with Crawlbase

Scraping LinkedIn data can provide valuable insights for applications ranging from job market analysis to competitive research. Crawlbase automates the process of gathering LinkedIn data, so you can focus on analyzing and using the information. With Crawlbase’s Crawling API and Python, you can efficiently scrape LinkedIn profiles, company pages, and feeds.

If you’re looking to expand your web scraping capabilities, explore the following guides on scraping other popular websites.

📜 How to Scrape Indeed Job Posts
📜 How to Scrape Emails from LinkedIn
📜 How to Scrape Airbnb
📜 How to Scrape Realtor.com
📜 How to Scrape Expedia

If you have any questions or feedback, our support team is always available to assist you on your web scraping journey. Happy Scraping!

Frequently Asked Questions (FAQs)

Q. Is scraping LinkedIn legal?

Whether scraping LinkedIn is legal depends on your jurisdiction, the data you collect, and how you use it. LinkedIn’s User Agreement restricts automated data collection, so review LinkedIn’s policies and make sure your scraping activities comply with legal and ethical guidelines. Always respect privacy and data protection laws, and consider using officially provided APIs when available.

Q. How to scrape LinkedIn?

To scrape LinkedIn, you can use Crawlbase’s Crawling API. First, set up your Python environment and install the Crawlbase library. Choose the appropriate scraper for your needs (linkedin-profile, linkedin-company, or linkedin-feed) and make asynchronous requests to gather data. Then retrieve the results from Crawlbase Cloud Storage using the request identifier (rid) returned for each request.

Q. What are the challenges in scraping LinkedIn?

Scraping LinkedIn involves several challenges. LinkedIn has strong anti-scraping measures that can block your activities. The dynamic nature of LinkedIn’s content makes it difficult to extract data consistently. Additionally, you must ensure compliance with legal and ethical standards, as violating LinkedIn’s terms of service can lead to account bans or legal action. Using a reliable tool like Crawlbase can help mitigate some of these challenges by providing robust scraping capabilities and adhering to best practices.

Q. What’s the best scraper for recruitment?

The LinkedIn profile scraper is ideal for recruitment, letting you extract candidate information such as work history, skills, and education. Combine it with the company scraper to research potential employers.

Q. Can I scrape multiple profiles at once?

Yes, use asynchronous requests to scrape multiple profiles efficiently. Crawlbase supports up to 20 requests per second, and the Storage API lets you retrieve all results using their unique request identifiers (rid).
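
To stay under a per-second limit such as the 20 requests per second mentioned above, worker threads can share a simple throttle. A minimal sketch: RateLimiter is a hypothetical class (not part of the Crawlbase library) that spaces calls evenly, and the limit you pass in should be checked against your own plan.

```python
import threading
import time


class RateLimiter:
    """Allow at most `rate` acquire() calls per second by spacing them evenly.

    Thread-safe: each caller reserves the next free time slot under a lock,
    then sleeps (outside the lock) until that slot arrives.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = 0.0
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            now = self.clock()
            slot = max(now, self.next_slot)
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self.sleep(delay)
```

Create one limiter = RateLimiter(20) and call limiter.acquire() right before each crawling_api.get(...) in your worker threads.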

Hassan Rehan
