Hacking for Sales, Part 4

Recently, I got curious what other companies received funding, and when the funding event happened. There’s no better resource for this data than Crunchbase, and it’s another great little Python exercise to get that data out and into a spreadsheet.

First, let’s start with the imports and data sources.

import json, urllib, os, csv
inpath = os.getenv("HOME")+"/data/starting/permalinks.csv"
outpath = os.getenv("HOME")+"/data/finished/funding_data.csv"

I like to define the source of data and its resting place upfront. You’ll see if you keep following me on these, that I’ll always call them inpath and outpath. I also usually run them on my Mac, but sometimes if it’s a big file I’ll shoot it up to my server and let it run there. The os library allows me to define a home path without providing absolute folders, so paths work the same on my Mac as they do on the server.

To make this clearer, my home directory on my MacBook is /Users/rbucks. I keep all of my scripts in /Users/rbucks/scripts/, and my data is all in /Users/rbucks/data/starting/ and /Users/rbucks/data/finished/.

I keep this same file structure on an Ubuntu server, but of course the home directory is different. On Ubuntu, it’s /home/rbucks/. In order to save myself time, and make the same script run locally and on my server, I like to stick os.getenv(“HOME”) in all of my script file paths.

Next, let’s make a function to write a csv.

def write(sales_data):
	f = open(outpath,'a')
	fields = ['permalink', 'total_money_raised', 'dates_raised']
	dw = csv.DictWriter(f, fieldnames=fields)

We covered functions in an earlier post. This one takes a list of dictionaries and writes it beautifully into a csv. All you need to do is tell it which keys to look for (that’s the fields list).

There’s another function called csv.writer but I’m sold now on dictionary writing. Without the keys, you need to offset empty fields when you create the list to write the csv, and the code is much messier. But csv.writer works well enough, and it was my first csv writing tool.

But don’t bother with it. DictWriter is way cooler.

Finally, loop and write.

Here’s the code for the rest of it.

key = "get your key from crunchbase"
cos = csv.reader(open(inpath, 'rU'))
for co in cos:
		sales_data = {}
		sales_data['permalink'] = co[0]
		qry_url = 'http://api.crunchbase.com/v/1/company/%s.js?api_key=%s' % (co[0], key)
		qry_response = urllib.urlopen(qry_url).read()
		qry_result = json.loads(qry_response)
		if qry_result.has_key('error'):
		if qry_result.has_key('total_money_raised'):
			sales_data['total_money_raised'] = qry_result['total_money_raised']
		if qry_result.has_key('funding_rounds'):
			dates = []
			for r in qry_result['funding_rounds']:
				funded_date = "%s/%s/%s" % (r['funded_month'], r['funded_day'], r['funded_year'])				
			sales_data['dates_raised'] = ", ".join(dates)
		print "Ran "+co[0]
	except Exception as detail:
		print "Gah", detail
		print qry_url

I’m going to breeze through this, and I’m not trying to be lazy. If you followed the first three of these, the Python code here should make sense, and if it doesn’t, you should know how to use the Python console to make it make sense.

I load a list of permalinks into a variable called cos via the csv.reader function that we imported in the first step. This csv is little more than a single field with a bunch of permalinks I pulled in Step 3, like this:


If you want to play along, take that list and save it into a file called “permalinks.csv” in your home’s data/starting folder. The csv.reader function loads this into an iterable csv object that I’m calling cos. At each iteration of the for loop, it’s pulling a row into a variable I’m calling “co,” which is a list of row values. To get the first value, I use co[0] (also covered earlier).

Now watch this closely. sales_data is a dictionary, defined by the empty curly brackets. Dictionaries take keys and values, unlike lists which just take values. So in the third line of the loop, I’m adding a key-value pair for ‘permalink’ in my dictionary. I’m defining permalink as that row’s value in the csv. I want to record this permalink in my new csv.

Also note that all of the fields in my write function get defined in this loop. So I create a key-value pair also for ‘total_money_raised’ and ‘dates_raised.’ This is very important! The write function will break if it sees a key in sales_data that’s not defined in the fields list.

The rest of this is same as last time. To learn how this works, take it line by line in the Python console. Go ahead and start with “scripted-com” as your permalink, and play with the JSON response.

qry_url = 'http://api.crunchbase.com/v/1/company/%s.js?api_key=%s' % ('scripted-com', key)
qry_response = urllib.urlopen(qry_url).read()
qry_result = json.loads(qry_response)

(Note: you need to have key defined with your own Crunchbase API key.)

Type qry_result in the Python console and look at the data. There’s a lot. Try iterating through the first level like this:

for r in qry_result:
	print r

Then try printing out the keys in r. Type r[‘total_money_raised’]. Or r[‘funding_rounds’]. You should start to see how this script works, and how awesome JSON data is for salespeople to work with.