Growing a business is no easy task. You will run into many challenges, and if you are not well equipped, you may have a hard time succeeding. One obvious way to grow your business is to advertise your product to as many potential customers as you can at minimal cost, and one proven way of doing so is by generating leads.

In marketing, lead generation is the process of turning prospects into people who have shown interest in your business or product. There are various ways to generate leads, as well as different types of leads. However, before you can start generating leads, you need a way to reach your prospects. One of the best ways to reach them is by collecting emails through social media sites, especially LinkedIn.

So, why LinkedIn? Simply put, it is where you will find professionals growing their businesses. It is also a platform of choice for Fortune 500 companies, which means there is a high chance that you will find your target audience here. In short, if your business requires company emails for targeted campaigns, this is a good place to start.

An easy-to-follow guide for extracting Emails

The main objective of this article is to help you acquire company leads through LinkedIn. We will show you, step by step, how to use the Crawlbase (formerly ProxyCrawl) products to crawl user profiles and their respective company pages, and then extract emails from the company domain.

To help you get started, allow us to briefly discuss the two main APIs that we will be using to accomplish our goal:

Crawling API – This will be the main tool for our project. It will allow us to crawl and scrape publicly available profiles in LinkedIn efficiently and without getting blocked.

Leads API – Once we have obtained the domain of a company using our Crawling API, we will be able to use the Leads API to crawl the company domain for fresh leads. Fresh means the information that will be extracted is live and not cached or stored from any database, which will eliminate any worries about getting invalid or outdated emails.

The two API products mentioned above will serve as the backbone of our scraper. To demonstrate the effectiveness and flexibility of these APIs, we will be coding in Node.js and using the Crawlbase (formerly ProxyCrawl) library together with Cheerio to scrape the data needed for our project.

Use Node.js to crawl user profile and company pages on LinkedIn

For the purpose of this article, we will be using Visual Studio Code, as it is one of the most popular and accessible editors and runs on most operating systems.

Before we dive into coding, let us prepare our project structure and be sure to install all prerequisites.

  1. Create a new Node.js project (example name: LINKEDIN)
  2. Install the Crawlbase (formerly ProxyCrawl) library for Node.js: open the terminal and run npm i proxycrawl
  3. Install the Cheerio library: run npm i cheerio
  4. Create a js file for the Crawling API. (example: Start.js)
  5. Create a secondary js file for the Leads API. (example: Leads.js)

Once done, let us start writing our code in the first .js file that we have created (Start.js). Our first two lines will declare all the constants and require the necessary API class in this project.

const { CrawlingAPI } = require('proxycrawl');
const cheerio = require('cheerio');

The next line will be important as it will hold the value of your Crawlbase (formerly ProxyCrawl) token:

const api = new CrawlingAPI({ token: 'normal_token' });

Now, we can write a simple API call with the Crawlbase (formerly ProxyCrawl) library to crawl a LinkedIn user profile of your choice. We will also use Cheerio in this part to parse the returned HTML, scrape the user's most recent company, and log the URL of the company profile to the console.

api.get('https://www.linkedin.com/in/williamhgates').then((response) => {
  if (response.statusCode === 200) {
    const $ = cheerio.load(response.body);
    const companyURL = $('.experience__list a');
    const output = companyURL.attr('href');
    console.log('Company page: ', output);
  }
});
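
The call above does nothing when the status code is not 200. As a minimal sketch, assuming that api.get resolves with a statusCode for failed requests (as the check above suggests), the request could be retried a few times before giving up. The helper name getWithRetry and the default of three attempts are hypothetical and not part of the Crawlbase library:

```javascript
// Hypothetical helper (not part of the Crawlbase library): call api.get and
// retry when the status code is not 200. `api` is any object whose get(url)
// resolves to { statusCode, body }, which is how the responses are used above.
async function getWithRetry(api, url, attempts = 3) {
  let lastStatus;
  for (let i = 0; i < attempts; i += 1) {
    const response = await api.get(url);
    if (response.statusCode === 200) return response;
    lastStatus = response.statusCode;
  }
  // Every attempt came back with a non-200 status
  throw new Error(
    `Request to ${url} failed after ${attempts} attempts (last status: ${lastStatus})`
  );
}
```

In Start.js, this would be used as getWithRetry(api, 'https://www.linkedin.com/in/williamhgates') in place of the direct api.get call, with a .catch to log the final error.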

The next part of our code takes the “output” of the previous part, which is the URL of the company's LinkedIn page, and passes it to the Crawling API again. Cheerio then parses the returned HTML a second time and displays the company's actual website.

// Previous code
api.get('https://www.linkedin.com/in/williamhgates').then((response) => {
  if (response.statusCode === 200) {
    const $ = cheerio.load(response.body);
    const companyURL = $('.experience__list a');
    const output = companyURL.attr('href');
    console.log('Company page: ', output);

    // New code
    api.get(output).then((response) => {
      if (response.statusCode === 200) {
        const $ = cheerio.load(response.body);
        const website = $('.basic-info-item__description a');
        console.log('website: ', website.text());
      }
    });
  }
});

This .js file is now complete. Once executed, the program will scrape your targeted LinkedIn profile and return the LinkedIn page URL and the website of the user's company, as our example shows below:

Output:

[Image: JavaScript file output]
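
The nested callbacks in Start.js can also be written with async/await, which makes the two crawl steps easier to follow. This is a minimal sketch under stated assumptions: it reuses the same api.get call and the same two selectors as Start.js, the helper name findCompanyWebsite is hypothetical, and api and load (cheerio.load) are passed in as parameters so the function can be tried with stand-ins before spending API requests.

```javascript
// Hypothetical helper: the same two-step crawl as Start.js, with async/await.
// `api` is a CrawlingAPI instance and `load` is cheerio.load; both are
// injected rather than required here. Returns null when a step fails.
async function findCompanyWebsite(api, load, profileUrl) {
  // Step 1: crawl the user profile and read the company page link
  const profile = await api.get(profileUrl);
  if (profile.statusCode !== 200) return null;

  const companyPage = load(profile.body)('.experience__list a').attr('href');
  if (!companyPage) return null; // no company link found on the profile

  // Step 2: crawl the company page and read the website link text
  const company = await api.get(companyPage);
  if (company.statusCode !== 200) return null;

  const website = load(company.body)('.basic-info-item__description a')
    .text()
    .trim();
  return { companyPage, website: website || null };
}
```

With the constants from Start.js in scope, it could be called as findCompanyWebsite(api, cheerio.load, 'https://www.linkedin.com/in/williamhgates').then(console.log). Unlike the original, this version stops early if the profile has no company link instead of requesting an undefined URL.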

Use Crawlbase (formerly ProxyCrawl)’s Leads API to get fresh Emails

Now that we have obtained the company’s website, our next step is to use Crawlbase (formerly ProxyCrawl)’s Leads API to scrape any email from the company domain. Go ahead and open the second .js file you have created at the start of this guide and utilize the Leads API as you can see below:

const { LeadsAPI } = require('proxycrawl');
const api = new LeadsAPI({ token: 'private_token' });

api
  .getFromDomain('gatesfoundation.org')
  .then((response) => {
    console.log(response.leads);
  })
  // Log the error if the request fails
  .catch((error) => console.error(error));

Please note that you must omit the http:// and enter a valid domain to get a successful API response.
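
Since the website scraped from the company page will usually include a protocol and possibly a path, it helps to reduce it to a bare domain before calling getFromDomain. Here is a minimal sketch using Node's built-in URL class; the helper name toDomain is hypothetical, and stripping a leading "www." is an assumption about what you want to pass to the API:

```javascript
// Hypothetical helper: reduce a scraped website URL to the bare domain
// expected by the Leads API call above (no protocol, no path).
function toDomain(website) {
  const trimmed = website.trim();
  // The URL parser needs a protocol, so add one if it is missing
  const withProtocol = /^https?:\/\//i.test(trimmed) ? trimmed : `http://${trimmed}`;
  // new URL() throws a TypeError when the input is not a valid URL
  const { hostname } = new URL(withProtocol);
  // Assumption: drop a leading "www." so only the registered domain remains
  return hostname.replace(/^www\./i, '');
}

console.log(toDomain('https://www.gatesfoundation.org/about')); // gatesfoundation.org
```

The result can then be passed directly to api.getFromDomain. Because new URL() throws on invalid input, wrap the call in try/catch if the scraped value may be empty or malformed.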

[Image: Crawlbase (formerly ProxyCrawl) Leads API]

At this point, we have completed our scraper. As you can see, using the API is very straightforward. With just a few lines of code, we are able to scrape LinkedIn and get exactly what we are looking for. However, it does not end here.

Crawlbase (formerly ProxyCrawl) cares for every client, developer or not. So, we have developed a tool that anyone can use without writing any code. The Leads Finder is an easy-to-use tool with a simple user interface that finds emails quickly: you just enter the target company's domain. It works just as well as the API, as you can see in this example:

[Image: Leads Finder]

Conclusion

On LinkedIn, as you may already know, public information is the data shown to anyone who visits the site without logging in; it is also visible on public search engines. Users can customize their profile settings and limit how much of their information is displayed publicly. When generating leads, this public information is crucial if you wish to run a successful marketing campaign.

In summary, if your strategy is to reach out to more companies, then Crawlbase (formerly ProxyCrawl) can play a big role for you and your marketing team. By using our API combined with Artificial Intelligence, you can easily crawl and scrape publicly available data on LinkedIn, as well as extract fresh emails from company domains with the help of the Leads API, all while protecting your crawler against CAPTCHAs and blocked requests.

So, if you want easy-to-use, fast, and reliable scraping tools for your future projects, be sure to check out Crawlbase (formerly ProxyCrawl)’s Crawling API and Scraper API.