LinkedIn's public job listings are a rich source of hiring data: titles, companies, locations, posting dates, and the search facets that tell you where demand is heading. The problem is getting at that data programmatically. LinkedIn renders its pages client-side and challenges automated traffic hard, so a plain HTTP request hands you an empty shell instead of jobs. This guide shows you how to scrape LinkedIn public job listings in Python: a small, runnable script that fetches the rendered page through the Crawling API, pulls the fields you want, and writes them to disk.
To keep this honest, the whole walkthrough is scoped to public data: job listings anyone can see on LinkedIn's public job-search pages without logging in. It does not touch accounts, login-walled content, or personal profiles. The legal section below names that line clearly, and it is not boilerplate, so read it before you point this at real volume.
Is it legal to scrape LinkedIn?
Read this section first, because LinkedIn is one of the most sensitive targets on the web and the honest answer is "it depends, and the scope matters enormously." This guide is deliberately limited to public, non-authenticated job listings: the pages LinkedIn serves to anyone, logged in or not, on its public job-search surface. Within that scope, a few rules are non-negotiable:
- Do not scrape behind authentication. If a page requires a login to view, it is out of bounds for this guide. No session cookies, no credential reuse, no auth bypass of any kind.
- Do not collect personal data or personal profiles. Names, contact details, connection graphs, and individual profile pages are off the table. This walkthrough collects job-posting metadata, not data about identifiable people.
- Do not violate LinkedIn's User Agreement. Read LinkedIn's Terms of Service and its robots.txt before you start, and honor what they say. Respect the stated rate expectations and keep your request volume low enough that you are not straining anyone's servers.
On the legal backdrop: in hiQ Labs v. LinkedIn, U.S. courts addressed the scraping of publicly available data and, at a high level, found that accessing public pages did not by itself violate the Computer Fraud and Abuse Act. That case is often cited as cover for scraping LinkedIn, but it is not blanket permission. Contract terms (the User Agreement), privacy law, and how you use the data all still apply, and the landscape keeps shifting. Treat the public-data line as the floor, not a loophole.
Everything in this guide collects public job-listing metadata: titles, companies, locations, and posting dates that anyone can see without an account. It does not cover login-walled data, personal profiles, connection data, messaging, or any auth bypass. If your project needs more than public listings, the right move is an official partnership or data agreement with LinkedIn, not a cleverer scraper.
Why a plain fetch fails on LinkedIn
Request a LinkedIn job-search URL with a bare HTTP client and you get a 200 response with almost no job data in the body. Two things work against you. First, LinkedIn renders its listings in the browser with JavaScript, so the initial HTML is a shell that fills in only after the page's scripts run. Second, LinkedIn flags automated traffic fast: datacenter IPs and request patterns that do not look like a real visitor get challenged or blocked before they reach the rendered content.
So a working LinkedIn job scraper needs two things in a single request: a browser that renders the page, and an IP the platform reads as a real visitor. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into one call: send it the URL with a JavaScript token, it renders the page behind a trusted IP, and returns the finished HTML.
Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. LinkedIn is client-side rendered, so you need the JS token here. Using the normal token returns the same empty shell a plain fetch would.
Understand the target: LinkedIn's public job-search URL
LinkedIn's public job search lives at a predictable URL whose query parameters map directly to the search form, so you can construct any search programmatically without driving the UI. Here is a concrete example: Python developer roles in London.
https://www.linkedin.com/jobs/search?keywords=Python%20Developer&location=London
The parameters that matter:
- keywords: the role or skill you are searching for, URL-encoded.
- location: the city, region, or country to search within.
Build the URL with the parameters you care about and you have a repeatable target. Vary the keywords and location in a loop and you have a hiring-trends job that watches public demand over time.
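Building the URL by hand gets error-prone once keywords contain spaces or punctuation. The sketch below assembles the search URL with the standard library instead. The helper name build_search_url and the sample search list are illustrative, not part of any LinkedIn or Crawlbase API, and the optional start offset is the pagination parameter covered later in this guide.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://www.linkedin.com/jobs/search"


def build_search_url(keywords, location, start=None):
    """Build a public job-search URL from its query parameters."""
    params = {"keywords": keywords, "location": location}
    if start is not None:
        params["start"] = start  # result offset, used for pagination
    # quote_via=quote encodes spaces as %20 rather than "+"
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


# Illustrative searches for a small hiring-trends job
searches = [("Python Developer", "London"), ("Data Engineer", "Berlin")]
urls = [build_search_url(keywords, location) for keywords, location in searches]
```

Keeping the encoding in one helper means every search in the loop is escaped the same way, so a keyword with an ampersand or a slash cannot silently break the query string.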
Set up your environment
You need Python 3.8 or later. Confirm your version, create a virtual environment so project dependencies stay isolated, then install the libraries.
    python --version
    python -m venv linkedin_env
    source linkedin_env/bin/activate
    pip install crawlbase beautifulsoup4
On Windows, activate the environment with linkedin_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull fields out of it. You also need a Crawlbase account and a JS token, which you get from the dashboard after signing up. Drop it into the code wherever you see YOUR_CRAWLBASE_JS_TOKEN.
Fetch the rendered job-search page
Start by getting the finished page. You pass two options that matter for a site like LinkedIn: ajax_wait tells the API to wait for asynchronous content to load, and page_wait holds for a fixed number of milliseconds after load so late-rendering listings have time to appear. Five seconds is a reasonable starting point; raise it if results come back thin.
    from crawlbase import CrawlingAPI

    # The JS token renders the page in a real browser before returning HTML.
    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

    # ajax_wait waits for async content; page_wait is in milliseconds.
    options = {"ajax_wait": "true", "page_wait": 5000}
    jobs_url = "https://www.linkedin.com/jobs/search?keywords=Python%20Developer&location=London"

    def fetch_jobs_html(url):
        response = api.get(url, options)
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print("Failed to fetch the page. Status code:", response["status_code"])
        return None

    html = fetch_jobs_html(jobs_url)
    if html:
        print(html[:500])  # preview only when the fetch succeeded
Run it and you should see real markup with job cards in it, not the empty shell a plain fetch returns. That confirms rendering is working before you write a single selector.
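Eyeballing the first 500 characters works once, but a scripted check is easier to repeat. The sketch below counts how often a card-level class name appears in the returned HTML. It assumes the class name base-search-card__title, the same one used in the selectors later in this guide, and that name may have changed by the time you read this. The helper names are illustrative.

```python
def count_card_markers(html, marker="base-search-card__title"):
    """Count occurrences of a class name expected once per job card."""
    if not html:
        return 0
    return html.count(marker)


def looks_rendered(html, minimum=1):
    """Return True when the HTML contains at least `minimum` card markers."""
    return count_card_markers(html) >= minimum
```

A count of zero on a 200 response usually points to one of two things: the page came back as the unrendered shell, or the marker class is out of date. Either way it is cheaper to find out here than after writing selectors.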
LinkedIn needs a rendered page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, rotates through residential IPs server-side, and hands you finished HTML, so you skip running a headless fleet and a proxy pool yourself. Point it at a public job search on the free tier first.
Parse the job listings
With the HTML in hand, load it into BeautifulSoup and walk the job cards. Each card on the public search page carries the fields you want: job title, company name, location, and the posting date. Inspect the live page in your browser's dev tools to find the current selectors, then map each field to one.
    from bs4 import BeautifulSoup

    def extract_jobs(html):
        soup = BeautifulSoup(html, "html.parser")
        jobs = []
        # Each job card on the public search page is one div.base-card.
        for card in soup.select("div.base-card"):
            title = card.select_one("h3.base-search-card__title")
            company = card.select_one("h4.base-search-card__subtitle")
            location = card.select_one("span.job-search-card__location")
            posted = card.select_one("time")
            jobs.append({
                "title": title.get_text(strip=True) if title else "",
                "company": company.get_text(strip=True) if company else "",
                "location": location.get_text(strip=True) if location else "",
                # .get avoids a KeyError if the time tag has no datetime attribute
                "posted": posted.get("datetime", "") if posted else "",
            })
        return jobs
LinkedIn's class names change without notice. Treat the selectors above as a starting template, not a contract. When extraction returns empty strings, re-inspect a live public job-search page and update the selectors. This is normal maintenance for any production scraper, not a sign something is broken.
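One way to notice selector drift early is to measure how many rows come back with empty fields. The sketch below is a minimal version of that check. It assumes rows shaped like the dictionaries produced by the extraction step, and the function name empty_field_report is illustrative.

```python
from collections import Counter


def empty_field_report(jobs):
    """Return, per field, the fraction of rows where that field is empty."""
    if not jobs:
        return {}
    empty = Counter()
    for job in jobs:
        for field, value in job.items():
            if not value:
                empty[field] += 1
    # Use the first row's keys as the field list; all rows share one shape.
    return {field: empty[field] / len(jobs) for field in jobs[0].keys()}
```

A field that is empty in every row suggests its selector no longer matches, while an occasional blank is more likely a listing that simply lacks that value.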
Wire the fetch and the parse together in a main function so you have one runnable script.
    def main():
        html = fetch_jobs_html(jobs_url)
        if not html:
            return
        jobs = extract_jobs(html)
        for job in jobs:
            print(job)

    if __name__ == "__main__":
        main()
What the output looks like
Run the full script and you get a list of structured job objects. A trimmed sample:
    [
      {
        "title": "Senior Python Developer",
        "company": "Monzo Bank",
        "location": "London, England, United Kingdom",
        "posted": "2026-01-14"
      },
      {
        "title": "Python Backend Engineer",
        "company": "Deliveroo",
        "location": "London, England, United Kingdom",
        "posted": "2026-01-12"
      }
    ]
Handle pagination
The public job search shows a first batch of listings and reveals more as you scroll or step through pages. The cleanest way to walk them is the start query parameter, which offsets the results: start=0 is the first page, start=25 the next, and so on. Loop the offset, fetch each page through the Crawling API, and collect the rows.
    all_jobs = []
    base = "https://www.linkedin.com/jobs/search?keywords=Python%20Developer&location=London"

    # Offsets 0, 25, 50: three pages of results.
    for start in range(0, 75, 25):
        page_url = f"{base}&start={start}"
        html = fetch_jobs_html(page_url)
        if not html:
            break
        all_jobs.extend(extract_jobs(html))

    print(f"Collected {len(all_jobs)} listings")
Keep the page count modest and pace the loop. Walking three pages to sample a search is very different from sweeping thousands of offsets in a tight loop, and the second pattern is exactly what gets a scraper throttled.
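Pacing can be built into the loop itself. The sketch below is one way to do it: it pauses for a base delay plus random jitter before every request after the first. The function name walk_pages, the two-second default, and the one-second jitter are assumptions chosen for illustration, not values published by LinkedIn or Crawlbase. The fetch and parse steps are passed in as functions so the pacing logic stays independent of any one client.

```python
import random
import time


def walk_pages(fetch_page, parse_page, base_url, pages=3, page_size=25,
               delay=2.0, jitter=1.0, sleep=time.sleep):
    """Fetch `pages` result pages, pausing between requests."""
    rows = []
    for index in range(pages):
        if index > 0:
            # Pause before every request after the first.
            sleep(delay + random.uniform(0, jitter))
        html = fetch_page(f"{base_url}&start={index * page_size}")
        if not html:
            break  # stop on the first failed page
        rows.extend(parse_page(html))
    return rows
```

With the functions defined earlier in this guide, the call would look like walk_pages(fetch_jobs_html, extract_jobs, base). Raising delay is the first knob to turn if a run starts coming back thin.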
Save the results to CSV
Printing to the console is fine while you iterate, but you want the data on disk. Python's built-in csv module maps each object key to a column and writes your rows in a few lines, no extra dependency needed.
    import csv

    def save_to_csv(jobs, path="linkedin_jobs.csv"):
        fields = ["title", "company", "location", "posted"]
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()  # column names from the fields list
            writer.writerows(jobs)
        print(f"Saved {path}")
Call save_to_csv(all_jobs) after the pagination loop and each run writes a tidy linkedin_jobs.csv you can open in any spreadsheet or load into a pipeline. If you would rather query the data with SQL, write the same rows into a SQLite table instead; the parsing stays identical.
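For the SQLite route, a minimal sketch using Python's built-in sqlite3 module might look like the following. The table name jobs, the default file name linkedin_jobs.db, and the all-text column types are assumptions made for illustration. It expects the same row dictionaries the CSV writer takes.

```python
import sqlite3


def save_to_sqlite(jobs, path="linkedin_jobs.db"):
    """Append job rows to a `jobs` table, creating it on first use."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs "
            "(title TEXT, company TEXT, location TEXT, posted TEXT)"
        )
        # Named placeholders pull values straight from each row dict.
        conn.executemany(
            "INSERT INTO jobs (title, company, location, posted) "
            "VALUES (:title, :company, :location, :posted)",
            jobs,
        )
        conn.commit()
    finally:
        conn.close()
```

Unlike the CSV writer, which overwrites its file on every run, this version appends, so repeated runs build up a history you can query by posting date. It does not deduplicate listings that appear in more than one run.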
Staying unblocked
Even with rendering handled, LinkedIn watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard commercial target.
- Pace your requests. Hammering the same search in a tight loop is the fastest way to get throttled. Spread requests out and vary your keywords and location.
- Lean on rotation. A pool of residential proxies spreads requests across many real-user IPs so no single address trips a rate limit. The Crawling API handles this for you; if you would rather route your own traffic, the Smart Proxy (also called the Smart AI Proxy) gives you the same residential IP rotation as a drop-in proxy endpoint.
- Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Treat proxy status error codes as signal, not noise, and back off when you see them.
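The back-off habit can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not a Crawlbase feature: the name fetch_with_backoff, the four-attempt cap, and the two-second base delay are all hypothetical, and the wrapped fetch function is assumed to return None on failure, as the fetch helper in this guide does.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0,
                       sleep=time.sleep):
    """Call fetch(url) until it returns content, doubling the wait each time."""
    for attempt in range(max_attempts):
        html = fetch(url)
        if html:
            return html
        if attempt < max_attempts - 1:
            # Waits of 2s, 4s, 8s with the default settings.
            sleep(base_delay * 2 ** attempt)
    return None
```

Capping the attempts matters as much as the delay: a target that keeps failing after a few spaced retries is signalling that the run should stop, not that it should try harder.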
For the broader playbook, see how to scrape websites without getting blocked.
Key takeaways
- Stay on public job data. This guide collects public job-listing metadata only. No login-walled pages, no personal profiles, no auth bypass; respect LinkedIn's User Agreement and robots.txt.
- LinkedIn is client-side rendered. A plain fetch returns an empty shell, so you must render the page before you parse it.
- You need rendering and a trusted IP together. The Crawling API with a JS token does both in one call; ajax_wait and page_wait control how long it waits for content.
- Pagination is an offset. Step the start parameter in increments of 25 to walk additional pages of public results.
- Rotate and pace to stay unblocked. The Crawling API rotates IPs for you; the Smart Proxy is the drop-in option if you route your own traffic.
Frequently Asked Questions (FAQs)
Is it legal to scrape LinkedIn?
It depends on what you scrape and how you use it. This guide stays strictly on public, non-authenticated job listings and avoids personal profiles, login-walled content, and auth bypass. Even there, LinkedIn's User Agreement, robots.txt, and applicable privacy law all apply. The hiQ v. LinkedIn case addressed public-data access at a high level, but it is not blanket permission. Read LinkedIn's terms first and keep your scope to public data.
Why does a plain fetch return no job data from LinkedIn?
Because LinkedIn renders its listings client-side with JavaScript. The initial HTML is a shell that fills in only after the page's scripts run in a browser, so a raw HTTP request returns status 200 with the job fields blank. To get real data you have to render the page first, which is what the Crawling API's JS token handles for you.
Do I need the normal token or the JS token for LinkedIn?
The JS token. The normal token fetches static HTML, which on LinkedIn is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the listings are present when BeautifulSoup parses them.
My selectors return empty strings. What changed?
Almost certainly LinkedIn's markup. Its class names change without notice, so selectors that worked last month can break. Re-inspect a live public job-search page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
How do I avoid getting blocked while scraping LinkedIn?
Keep your per-IP request rate low, vary your keywords and location instead of looping the same URL, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you; if you build your own stack, the Smart Proxy gives you that rotation as a drop-in endpoint. Watch the status codes and back off when challenges appear.
Can I scrape LinkedIn profiles or messages with this?
No, and you should not try. This guide is scoped to public job listings on purpose. Personal profiles, connection data, and messaging sit behind authentication and involve personal data, which is out of bounds here. If your project needs more than public job listings, pursue an official LinkedIn partnership or data agreement rather than scraping authenticated pages.
