How to scrape LinkedIn Jobs with Python

Oscar Rojas
3 min read · Jan 4, 2020

Getting a job is itself a job. And if you are like me, a time-constrained MBA student who is tired of clicking and would much rather dissect data through the magic of spreadsheets, this solution is for you:

Now, before we proceed: LinkedIn probably doesn’t like this, but as long as you don’t distribute, republish, sell, or store the data for the long term, you should be fine. You might even impress your recruiters with your data-scraping capabilities.

What we are going to use is Selenium with Python to control an instance of Chrome as if we were the ones clicking and researching; instead, we will be watching Paw Patrol and changing diapers. Think of it as Excel macros, but for the web. Awesome!

What you will need (and I assume you have a working knowledge of Python) is:

Python -> at least version 3.7.4

Chromedriver -> Download here, unzip and save to a folder.

Selenium -> pip install selenium

Pandas -> pip install pandas

The information that can be extracted is limited to around 750 jobs per search. We will use random wait times to appear more human (let it cook slowly), but there is still some chance that your account gets blocked. I’m kidding. Not really. I would use a secondary account.
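
To make that idea concrete, here is a tiny sketch of the kind of randomized pause we will be sprinkling in. The human_pause helper is just for illustration; the actual script below simply inlines time.sleep with random values.

import random
import time

def human_pause(low=1, high=5):
    # Sleep a random number of seconds so the clicks don't look machine-timed
    time.sleep(random.uniform(low, high))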

We import all the stuff we are going to need.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
import random

Then we invoke the zombie Chrome

#Settings
USER_NAME = '' #Insert Account User here
PASS_WORD = '' #Insert Account Password here
url = 'https://www.linkedin.com/login'
#Driver location
driver = webdriver.Chrome('C:\\Users\\oscar\\Desktop\\chromedriver.exe')
#Open the zombie Chrome
driver.get(url)
user_element = driver.find_element_by_id("username")
pass_element = driver.find_element_by_id("password")
user_element.send_keys(USER_NAME)
pass_element.send_keys(PASS_WORD)
pass_element.send_keys(Keys.RETURN)
time.sleep(5)
driver.refresh()

By now, a zombie Chrome should be open with LinkedIn logged in. We then need to define what type of search and results we would like to scrape. Try this link so you get a sense of where we are going. Because this is a quick-and-dirty approach, we are just going to build a list of all the links we are going to visit; LinkedIn serves the results in batches of 25.

url2 = ['https://www.linkedin.com/jobs/search/?geoId=103035651&location=Berlin%2C%20Germany']
for x in range(25, 1000, 25):
    url2.append(url2[0] + "&start=" + str(x))

If your search returns fewer than 1,000 jobs or so, you might need to adjust the number of links to visit.
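
For example, a minimal sketch of how you could cap the list yourself; the total_results value here is hypothetical, read the real number off the top of your results page:

total_results = 300  # hypothetical: whatever LinkedIn reports for your search
url2 = ['https://www.linkedin.com/jobs/search/?geoId=103035651&location=Berlin%2C%20Germany']
for x in range(25, min(total_results, 1000), 25):
    url2.append(url2[0] + "&start=" + str(x))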

We are now ready to scrape. We create an empty pandas data frame and get the data. One thing I noticed is that not all the job links get loaded when you visit each page; it seems you need to scroll for them to become retrievable.

try:
    df_full = pd.DataFrame()
    for url in url2:
        driver.get(url)
        time.sleep(random.randrange(1, 6, 1))
        # First pass: grab the job cards already in the DOM
        lists2 = driver.find_elements_by_css_selector('li.occludable-update.artdeco-list__item--offset-4.artdeco-list__item.p0.ember-view')
        # Scroll each card into view so the rest of the list loads
        for b in lists2:
            b.location_once_scrolled_into_view
            time.sleep(random.randrange(1, 5, 1) / 10)
        # Second pass: re-query now that everything is loaded
        lists2 = driver.find_elements_by_css_selector('li.occludable-update.artdeco-list__item--offset-4.artdeco-list__item.p0.ember-view')

        for l in lists2:
            try:
                job_title = l.find_element_by_css_selector('h3.job-card-search__title').text
                company = l.find_element_by_css_selector('artdeco-entity-lockup-subtitle.job-card-search__company-name').text
                location = l.find_element_by_css_selector('span.job-card-search__location').text
                date_posted = l.find_element_by_css_selector('time.job-card-search__time-badge').get_attribute('datetime')
                link = l.find_element_by_tag_name('a').get_attribute('href').split("?", 1)[0]
                df = pd.DataFrame({'job_name': job_title,
                                   'company': company,
                                   'city': location,
                                   'posted': date_posted,
                                   'link': link}, index=[0])
                # Note: DataFrame.append was removed in pandas 2.0; use pd.concat there
                df_full = df_full.append(df, ignore_index=True)
            except Exception as e:
                print(e)

except Exception as e:
    print(e)

And last but not least, we take the data frame to a good old Excel sheet:

df_full.to_excel('jobs_berlin.xlsx')
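
A small caveat: to_excel needs an Excel writer backend such as openpyxl installed (pip install openpyxl). If you would rather skip that, a plain CSV works just as well:

# Alternative export if you don't have an Excel writer installed
df_full.to_csv('jobs_berlin.csv', index=False)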

And the result looks just like this. Now, this is something I can work with: filter companies, make remarks, discover those that seem to be on a hiring spree, etc.

Result
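
Once the data sits in a DataFrame, a few quick pandas slices already go a long way. A minimal sketch, using the column names built above (the filter values are just examples):

# Companies with the most open positions (a hint of a hiring spree)
print(df_full['company'].value_counts().head(10))

# Jobs posted since the start of the year (the 'posted' column holds ISO dates)
recent = df_full[df_full['posted'] >= '2020-01-01']

# Roles whose title mentions "product"
product_roles = df_full[df_full['job_name'].str.contains('product', case=False, na=False)]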

As a next step, I would like to capture the job description. This can be done right in this step or on a second run after filtering out some jobs. Just click on each element with Selenium and get the info on the right.
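
If you go for the second-run route, a rough sketch could look like the one below. Note that the div.jobs-description selector is an assumption on my part; LinkedIn renames these classes all the time, so inspect the page and adjust it:

# Second run: visit each saved link and pull the description text.
# NOTE: 'div.jobs-description' is an assumed selector -- inspect the page and adjust.
descriptions = []
for link in df_full['link']:
    driver.get(link)
    time.sleep(random.randrange(2, 6, 1))
    try:
        desc = driver.find_element_by_css_selector('div.jobs-description').text
    except Exception as e:
        print(e)
        desc = ''
    descriptions.append(desc)
df_full['description'] = descriptions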

A note: there is one bug where not all the fields are found. It happens to a minor subset of jobs and can be fixed by hand later. If you find a solution, do let me know.

Happy scraping! Best of luck in your New Year job hunt.

Oscar Rojas
Product @ Vanguard, ex N26 | UBS. Passion for Investments & Technology.