
Hacking for Sales, Part 5

Until now, we’ve played mostly with APIs and page scrapes. In this lesson, we dive a little deeper into the latter in order to solve a simple sales problem: knowing when your customer changes jobs.

I love JobChangeNotifier. It’s a simple web application that sends me an email each week showing which of my LinkedIn connections have new jobs, and where they are now. It’s fun to watch my friends’ careers evolve.

But more importantly, when a customer changes jobs, I suddenly have a sales opportunity. If they liked Scripted at their old company, they’ll probably promote us at their new one. We’ve closed a dozen new deals this year alone from customer job changes.

However, JobChangeNotifier can’t help me if I’m not connected to my customer on LinkedIn. Some friends and I decided to hack a fix.

Step 1. Start with a CSV.

Every legit CRM will have an export tool. I’m going to start with a CSV rather than explain how to use the Salesforce API to pull contacts. If you’ve read my previous posts on using APIs, you should be able to figure that out. Getting your contacts into a spreadsheet is a trivial step.

All we need is first name, last name, and company name in this three-column CSV. It should look something like this:

Jen,Brian,Band Digital
David,Skinner,Band Digital
Russel,Evans,Experian
Tim,Titus,Experian

Easy enough, right?
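If you want a quick sanity check before moving on, a few lines of Python will confirm the file parses into three columns. A minimal sketch, assuming the file is saved as linkedin_test.csv:

import csv

# print each contact's first name, last name, and company
for row in csv.reader(open('linkedin_test.csv', 'rU')):
	print row[0], row[1], row[2]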

Step 2. Get the LinkedIn profile.

There are at least two ways to get your contact’s LinkedIn profile. The first and most direct (and perhaps most terms-of-service-compliant) approach is to use LinkedIn’s People Search API. But it requires complex authentication, and even once you’re authenticated, building the right query is not easy.

I decided to skip the API approach in favor of a less direct but still very effective one: the internet search.

The code that follows is admittedly on the sketchier side of hacky. It uses a Python library to emulate a browser, so search engines treat your page loads as if you were clicking around with a mouse. If you’re not comfortable with that, you can use urllib or urllib2 instead, which make no attempt to disguise the fact that your requests come from a Python script.
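For comparison, the undisguised version is only a couple of lines. A minimal sketch (urllib2 announces itself with a default Python-urllib User-Agent, so Bing knows exactly what it’s talking to):

import urllib2

# fetch the Bing results page with no browser disguise
html = urllib2.urlopen('http://www.bing.com/search?q=site:linkedin.com%20ryan%20buckley%20scripted').read()

But for those who want to see this hack in action, here you go.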

from bs4 import BeautifulSoup
import urllib2, re, csv, sys, urllib, json, mechanize, os, time, random
import cookielib
#http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
# Browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but doesn't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17')]

The Python library I use here is called Mechanize, and once the options are set, it’s very easy to use. I create a browser instance called br with this line:

br = mechanize.Browser()

And you’ll see it’s used later in the script to load Bing queries and open public LinkedIn profile pages.

Now, let’s build a Bing query to find the LinkedIn profile.

burl = 'http://www.bing.com/search?q=site:linkedin.com %s %s %s' % (first, last, company)

I found that querying for “site:linkedin.com ryan buckley scripted” yields the target LinkedIn profile as the top search result about 80% of the time. That’s good enough for me, and it makes it easy to parse the results page and grab that first result.

Step 3. Parse the Bing results page.

Here’s some code you should understand from my previous posts.

r = br.open(burl.replace(' ', '%20'))
html = r.read()
soup = BeautifulSoup(html)
result = soup.find('div', { "class" : "sb_tlst" } )
if result is None or result.a is None:
	print "No link?", result
	continue
if not result.a.has_key('href'):
	print "No href?", result
	continue
link = result.a['href']

The first line replaces all spaces with their URL-encoded equivalent, “%20”. This helps mechanize open the page. With the page loaded, I read it into a variable called html, which I pass to BeautifulSoup for parsing. Again, all of this should be familiar (well, except the %20 thing, but you’d have discovered that need the first time you ran the code without it).
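As an aside, urllib (already in our import list) can handle this encoding more generally than a straight replace. A sketch, in case your contact or company names ever contain characters beyond spaces:

import urllib

# urllib.quote percent-encodes spaces and other unsafe characters
query = 'site:linkedin.com %s %s %s' % (first, last, company)
burl = 'http://www.bing.com/search?q=' + urllib.quote(query)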

The “soup.find” command looks for the first div with class=”sb_tlst”. If I had used find_all, it would have returned an array, but find returns only the first match.

The next lines are some error testing. I need to make sure of two things:

1. The result object has an “a” attribute (meaning, it found a link within this div), and
2. The anchor tag has an “href” attribute (this has the LinkedIn URL that I’m looking for).

If these tests pass, then I store the link into a variable. And I’m off to the next step.

Step 4. Parse the LinkedIn profile.

Alright! We’re almost there, but LinkedIn makes this last part tricky.

There are two types of public profiles: one displays a headline near the top (in the form “Title at Company”) and one doesn’t. We have to account for this difference in our code:

r = br.open(link)
html = r.read()
soup = BeautifulSoup(html)
result = soup.find('p', { "class" : "headline-title title" }) or soup.find('ul', { "class" : "current" })
if not result:
	continue
if result.string:
	if ' at ' in result.string:
		new_title, new_company = result.string.strip().split(' at ')
elif result.li:
	if ' at ' in result.find("li").get_text():
		new_title, new_company = result.find("li").get_text().strip().split(' at ')
if not new_company:
	continue

This little bit of code opens the LinkedIn profile URL we just saved from Bing, finds the profile headline using BeautifulSoup, and then parses the headline based on the word “at” into the title and company and stores them into two new variables. With some error handling, all that is just 15 lines of code. Python is amazing!
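One caveat before moving on: the split assumes “ at ” appears exactly once in the headline. A hypothetical headline like “VP of Sales at Work at Home Inc.” contains it twice, so unpacking three pieces into two variables raises a ValueError (which the full script’s try/except quietly swallows). Limiting the split to the first occurrence is a reasonable guard. A sketch:

headline = 'VP of Sales at Work at Home Inc.'  # hypothetical headline with two ' at 's
# split only on the first ' at ' so the two-variable unpack always succeeds
new_title, new_company = headline.strip().split(' at ', 1)
# new_title is 'VP of Sales'; new_company is 'Work at Home Inc.'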

Step 5. Print changes to screen and/or write the CSV.

To recap, we read the contents of a three-column CSV including first name, last name, and company name. We used Bing to get the URL for the public LinkedIn profile. We loaded that page, parsed out the current job info and stored it.

Now, we compare the LinkedIn data to our spreadsheet and print people whose company names don’t match.

if company not in new_company.strip():
	ins['First'], ins['Last'], ins['Company'], ins['New Title'], ins['New Company'], ins['Link'] = first, last, company, new_title.strip(), new_company.strip(), link
	f = open(outpath,'a')
	dw = csv.DictWriter(f, fieldnames=fields)
	dw.writerow(ins)
	f.close()
	print "%s - %s %s: was at %s, now at %s (%s)" % (i, ins['First'], ins['Last'], ins['Company'], ins['New Company'], ins['Link'])

You won’t be able to run the excerpt above as-is, so let’s just read it. The dictionary “ins” isn’t defined in the excerpt; I do that earlier in the script, but haven’t shown it yet in this post. Before I can insert these data into the dictionary, I have to define it (which also empties it at the top of each loop) like this:

ins = {}

The lines that follow the creation of the dictionary are the standard process you’ll copy every time you need to write a CSV. I’m a huge fan of the DictWriter now. The code is cleaner, it makes sense, and it’s quick. The print command at the end shows me the data I’m writing to the CSV.
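If you’d like column names at the top of the output file, DictWriter can also write a header row for you (Python 2.7+). A minimal sketch, run once before the loop, using the same outpath and fields as the full script below:

import csv, os

outpath = os.getenv("HOME")+"/data/finished/linkedin_test.csv"
fields = ['First', 'Last', 'Company', 'New Title', 'New Company', 'Link']
f = open(outpath, 'w')  # 'w' so the header lands at the top of a fresh file
dw = csv.DictWriter(f, fieldnames=fields)
dw.writeheader()
f.close()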

Here’s the full script in all of its glory. Note that you should set inpath and outpath to match your file structure, and you’ll need that three-column CSV to start with. For your convenience, I’m sharing one here.

from bs4 import BeautifulSoup
import urllib2, re, csv, sys, urllib, json, mechanize, os, time, random
import cookielib
#http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
# Browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but doesn't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17')]

# Assumes list is in ~/data/starting/linkedin_test.csv
inpath = os.getenv("HOME")+"/data/starting/linkedin_test.csv"
outpath = os.getenv("HOME")+"/data/finished/linkedin_test.csv"
# field names for the output CSV
fields = ['First', 'Last', 'Company', 'New Title', 'New Company', 'Link']

cos = csv.reader(open(inpath, 'rU'))
i = 0
for co in cos:
	try: 
		i += 1
		if i < 6:  # skip the first five records (adjust or remove to fit your CSV)
			continue
		ins, new_title, new_company = {}, '', ''
		first, last, company = co[0].strip(), co[1].strip(), co[2].strip()
		burl = 'http://www.bing.com/search?q=%s %s %s site:linkedin.com' % (first, last, company)
		r = br.open(burl.replace(' ', '%20'))
		html = r.read()
		soup = BeautifulSoup(html)
		result = soup.find('div', { "class" : "sb_tlst" } )
		if result is None or result.a is None:
			print "No link?", result
			continue
		if not result.a.has_key('href'):
			print "No href?", result
			continue
		link = result.a['href']
		r = br.open(link)
		html = r.read()
		soup = BeautifulSoup(html)
		result = soup.find('p', { "class" : "headline-title title" }) or soup.find('ul', { "class" : "current" })
		if not result:
			continue
		if result.string:
			if ' at ' in result.string:
				new_title, new_company = result.string.strip().split(' at ')
		elif result.li:
			if ' at ' in result.find("li").get_text():
				new_title, new_company = result.find("li").get_text().strip().split(' at ')
		if not new_company:
			continue
		if company not in new_company.strip():
			ins['First'], ins['Last'], ins['Company'], ins['New Title'], ins['New Company'], ins['Link'] = first, last, company, new_title.strip(), new_company.strip(), link
			f = open(outpath,'a')
			dw = csv.DictWriter(f, fieldnames=fields)
			dw.writerow(ins)
			f.close()
			print "%s - %s %s: was at %s, now at %s (%s)" % (i, ins['First'], ins['Last'], ins['Company'], ins['New Company'], ins['Link'])
	except Exception as detail:
		print "Failed on record %s with error %s" % (i, detail)

 
