So, you have a business and you want to crawl LinkedIn to gather data for your marketing team. You want to scrape hundreds of company pages, or even user profiles, to achieve your goals. Will you do it manually and spend precious hours, days, and resources on it? Here at Crawlbase (formerly ProxyCrawl), we say you don't have to. Our Crawling API, with its built-in LinkedIn data scraper, can help you accomplish your goals faster with exceptional results.

In this guide, we will build a simple scraper in Ruby that crawls Amazon's LinkedIn company profile; the same approach can then be applied to any company of your choice.

Setting up the scraper

Before we start writing our Ruby code, we will need to prepare the following:

  • The API URL https://api.crawlbase.com
  • The scraper parameter (scraper=linkedin-company)
  • Your Crawlbase (formerly ProxyCrawl) token
  • LinkedIn Company URL
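
Put together, these pieces form a single GET request to the Crawling API. As a rough illustration (the token is a placeholder and the LinkedIn URL is percent-encoded), the assembled request looks like this:

https://api.crawlbase.com/?token=YOUR_TOKEN&scraper=linkedin-company&url=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Famazon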

Crawling LinkedIn with Crawlbase (formerly ProxyCrawl)

Now, let’s create a file and name it linkedin.rb, which will contain our Ruby code.

To start coding, open the file that you have created. We will first require the HTTP library and build the request URI with our API endpoint, token, and target URL. Don’t forget to include the linkedin-company scraper as well.

You can use your normal token for LinkedIn; just make sure to replace YOUR_TOKEN with the actual token found in your Crawlbase account.

require 'net/http'

uri = URI('https://api.crawlbase.com')
uri.query = URI.encode_www_form({
  token: 'YOUR_TOKEN',
  scraper: 'linkedin-company',
  url: 'https://www.linkedin.com/company/amazon'
})

Now that we’re done with the first part, let’s write the rest of the code. Here we will send the request, print the HTTP status code of the response, and use JSON.pretty_generate so that the returned JSON body is easier to read.

The full code should now look like this:

require 'net/http'
require 'json'

uri = URI('https://api.crawlbase.com')
uri.query = URI.encode_www_form({
  token: 'YOUR_TOKEN',
  scraper: 'linkedin-company',
  url: 'https://www.linkedin.com/company/amazon'
})

res = Net::HTTP.get_response(uri)
puts "Response HTTP Status Code: #{res.code}"
puts JSON.pretty_generate(JSON.parse(res.body))

Now we just need to save the file and run the code. It will return parsed data like the following:

(Example output: parsed JSON data from the Ruby crawling example)
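
If you want to keep the result around instead of only printing it, one simple option is to write the parsed body to a file. Here is a minimal sketch that builds on the res variable from the code above (the output file name is just an example):

# Persist the parsed response for later use.
parsed = JSON.parse(res.body)
File.write('amazon_linkedin.json', JSON.pretty_generate(parsed))
puts 'Saved response to amazon_linkedin.json'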

We’re all set! Now it’s your turn to use this code however you like; just be sure to replace the LinkedIn URL with the one you want to scrape (see the sketch below). Alternatively, you can freely use our Ruby library for the Crawling API. Also, remember that we have two scrapers for LinkedIn, linkedin-profile and linkedin-company, which are pretty self-explanatory.
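
As a sketch of how you might make this reusable, here is one way to wrap the request in a small method that accepts any LinkedIn URL and either of the two scrapers. The method name and structure are just an example, not part of the Crawlbase library:

require 'net/http'
require 'json'

# Fetch a LinkedIn page through the Crawling API and return the parsed JSON.
# scraper should be 'linkedin-company' or 'linkedin-profile'.
def crawl_linkedin(url, scraper: 'linkedin-company', token: 'YOUR_TOKEN')
  uri = URI('https://api.crawlbase.com')
  uri.query = URI.encode_www_form(token: token, scraper: scraper, url: url)

  res = Net::HTTP.get_response(uri)
  raise "Request failed with status #{res.code}" unless res.code == '200'

  JSON.parse(res.body)
end

# Example usage with the company page from this tutorial:
company = crawl_linkedin('https://www.linkedin.com/company/amazon')
puts JSON.pretty_generate(company)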

How and where you use the extracted information is entirely up to you.
We hope you enjoyed this tutorial, and we hope to see you again soon at Crawlbase (formerly ProxyCrawl). Happy crawling!