***Special thanks go to Michael Schwartzman who runs his own scraping business, specializing in recovering websites from the Wayback Machine***
I recently came across some startup data from CrunchBase (complete through Q4 of 2014), which is the most comprehensive online database for startups. Kudos to them for sharing their data. They even have an API that you can use to create startup-oriented web frameworks (more on that at the end of the post…)
Whenever I find an interesting data set on the web you know how itchy I get… Dying to get insights, squeeze the data in my analysis machines and get out useful results…So let’s get started…
I used Python’s pandas for data importing, munging and analysis, and matplotlib for creating the graphs. All of this ran in a nice IPython --pylab environment, including scraping the data. For those interested, I can share my IPython notebooks, which were kindly provided to me by the developer of Wayback Machine Downloader.
Moreover, everything here is solely my own opinion. Feel free to discuss with me in the comments section.
This is a graph showing the total number of startups that got funded (timeframe: 1900 – now) per region. San Francisco is the king, with more than twice as many funded startups as the next best region. London comes 4th and is the only European city among the first ten. In my experience, though, the competition in San Francisco is huge. So if you want to give it a shot there, you’d better be very well prepared before you visit. A perfect example of this is Gate Keeper, a SaaS for managing contracts with clients.
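For readers who want to reproduce a per-region count like this, here is a minimal pandas sketch. The tiny DataFrame and the column names `name`, `region` and `funding_total_usd` are assumptions made for illustration; they are not taken from my notebooks, and a real CrunchBase export may be laid out differently.

```python
import pandas as pd

# Hypothetical rows standing in for a CrunchBase companies export.
# Column names are assumptions, not the confirmed CrunchBase schema.
companies = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "region": ["SF Bay Area", "SF Bay Area", "New York City",
               "London", "SF Bay Area", "London"],
    "funding_total_usd": [1_000_000, 250_000, 0, 500_000, 3_000_000, 750_000],
})

# Keep only companies that actually received funding.
funded = companies[companies["funding_total_usd"] > 0]

# Count funded companies per region, largest first, and keep the top ten.
funded_per_region = funded["region"].value_counts().head(10)
print(funded_per_region)
```

With matplotlib installed, `funded_per_region.plot(kind="bar")` would turn the same Series into a bar chart.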
Want an impressive quick calculation? The total number of funded companies in the SF Bay Area is 6,660 and its population is 825,863. Say that we have a minimum of 2 founders per company. Then there is one founder of a funded company for every 825,863/(6,660*2) ≈ 62 people in the SF Bay Area. Gosh, this place is boiling with entrepreneurship….
One founder of a funded company in every 62 people in SF Bay Area!!!
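The back-of-the-envelope calculation can be checked in a few lines of Python. The figures are the ones used in the formula above (6660 companies, a population of 825,863), and the two-founders-per-company figure is the same rough assumption as in the text.

```python
# Figures quoted in the post; they are not independently verified here.
funded_companies = 6660    # funded companies in the SF Bay Area
population = 825_863       # population figure used in the post
founders_per_company = 2   # assumed minimum number of founders per company

founders = funded_companies * founders_per_company
people_per_founder = population / founders

print(f"One founder per {people_per_founder:.0f} people")
```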
So you want to create a startup. Which market should you choose? Of course the one you are passionate about…. But before you start let’s see what the data suggest. Which are the 10 most funded markets ever?
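A ranking like this can be computed with a group-by and a sum. The sketch below uses made-up numbers and assumed column names (`market`, `funding_total_usd`), so treat it as an outline of the approach rather than the exact code behind the table.

```python
import pandas as pd

# Toy data; both the values and the column names are illustrative assumptions.
companies = pd.DataFrame({
    "market": ["Biotechnology", "Mobile", "Software",
               "Biotechnology", "Clean Technology", "Mobile"],
    "funding_total_usd": [5e6, 2e6, 1e6, 4e6, 3e6, 1.5e6],
})

# Total funding per market, sorted from most to least funded, top ten only.
top_markets = (
    companies.groupby("market")["funding_total_usd"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_markets)
```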
It seems that if you are not a biotechnology company, the second best thing you could do is build for mobile or software or, why not, mobile software. If this is the case you might want to test your product fast (hint: google Lean Startup). Try using Twitter Bootstrap to create a mobile-friendly website. Add Google Analytics to your website. If most of your users come from mobile, then rewrite your software as a native Android or iOS app according to your traffic data. That’s the good thing with data. You always have a sense of direction….
Another point of interest in this table is the huge amount of investment into cleantech. It seems that there is hope for humanity after all. 😉
Below you can see a graph of the four most funded market sectors and their total funding per year for the period 1985 – 2013.
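One way to build such a per-year view is to pick the most funded markets first and then pivot the funding rounds by year. This is only a sketch: the `rounds` table, its column names (`market`, `funded_year`, `raised_amount_usd`) and the numbers are assumptions, and the toy example keeps two markets instead of four to stay small.

```python
import pandas as pd

# Hypothetical funding rounds; names and values are illustrative only.
rounds = pd.DataFrame({
    "market": ["Biotechnology", "Software", "Biotechnology",
               "Mobile", "Software", "News"],
    "funded_year": [2005, 2005, 2006, 2006, 2006, 2006],
    "raised_amount_usd": [4e6, 1e6, 6e6, 2e6, 3e6, 0.5e6],
})

# The post uses the four most funded markets; two keeps this toy example small.
n_top = 2
top = rounds.groupby("market")["raised_amount_usd"].sum().nlargest(n_top).index

# One row per year, one column per top market, total funding in each cell.
yearly = rounds[rounds["market"].isin(top)].pivot_table(
    index="funded_year",
    columns="market",
    values="raised_amount_usd",
    aggfunc="sum",
    fill_value=0,
)
print(yearly)
```

With matplotlib installed, `yearly.plot()` would draw one line per market.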
Here again you can see the two impressive peaks of funding in clean energy, one in 2001 and one in 2006, along with the impressive peaks of software funding in 1998 (just before the dotcom bubble) and again in 2006.
The graph below displays the average funding for all startups and all the years for SF Bay Area, New York, Boston and London, stratified for acquired companies, companies that have closed and those still in operation.
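The stratification described here boils down to grouping by two keys and taking the mean. Below is a minimal sketch under assumed column names (`region`, `status`, `funding_total_usd`) with invented values; it shows the shape of the computation, not the actual figures.

```python
import pandas as pd

# Toy data; values and column names are assumptions for illustration.
companies = pd.DataFrame({
    "region": ["SF Bay Area", "SF Bay Area", "New York City",
               "New York City", "London", "London"],
    "status": ["acquired", "operating", "closed",
               "closed", "acquired", "operating"],
    "funding_total_usd": [10e6, 4e6, 1e6, 3e6, 2e6, 5e6],
})

# Mean funding per (region, status) pair, with statuses spread into columns.
# Combinations that do not occur in the data show up as NaN.
avg_funding = (
    companies.groupby(["region", "status"])["funding_total_usd"]
    .mean()
    .unstack("status")
)
print(avg_funding)
```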
On the other hand, New York City seems to have lower average funding for companies that have closed. Investors there probably weed out failing companies more quickly. If you want to iterate faster, then search for investors in New York City.
London also seems to have the lowest average funding before acquisition, which suggests that acquisitions there happen faster and with less overall funding than in the other cities.
It seems that biotechnology got the lion’s share in 3 out of the 5 years. Cleantech also seems to have been revived, at least for 2014. Mobile was super trendy in 2010 and 2011 and, what a surprise, the news market seems to be coming first for 2014.
Did you know that Cardinal Health along with Verizon Communications are the most funded companies ever? That Facebook is the 8th most funded company ever? That Uber ranks 14th in this list? Groupon 24th? Tesla Motors, which has been criticized so often for heavy funding, is below Zynga and Survey Monkey. Take a look at the table below. The best-known companies (at least to me) are highlighted in red.
The possibilities are limitless when you analyse data. For example, we could go on to analyse the time lapse between a company’s founding and its first round of funding to find the average time to funding, then stratify that time by market, by type of company, by region, and so on. In general you can ask your data every possible question and get answers.
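As an illustration of the time-to-funding idea, here is a minimal sketch. The column names `founded_at` and `first_funding_at` and the dates are assumptions chosen for the example, so adjust them to whatever your copy of the data actually contains.

```python
import pandas as pd

# Toy data; dates and column names are illustrative assumptions.
companies = pd.DataFrame({
    "market": ["Software", "Software", "Biotechnology"],
    "founded_at": ["2010-01-01", "2011-06-01", "2008-03-15"],
    "first_funding_at": ["2010-07-01", "2012-06-01", "2010-03-15"],
})

# Parse the date strings so they can be subtracted.
for col in ["founded_at", "first_funding_at"]:
    companies[col] = pd.to_datetime(companies[col])

# Days between founding and the first funding round.
companies["days_to_funding"] = (
    companies["first_funding_at"] - companies["founded_at"]
).dt.days

# Average time to funding, stratified by market.
avg_days = companies.groupby("market")["days_to_funding"].mean()
print(avg_days)
```

Swapping `"market"` for a region or company-type column gives the other stratifications mentioned above.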
And then a general pattern emerges. How about modelling this pattern? Think in reverse: since you can get insights from data, how about creating models with machine learning and feeding them new data to get predictions? How about scraping more data about these companies from the web to create more data points for more realistic models?
What other questions could we ask the CrunchBase data? Throw me some ideas in the comments below and I will do my best to include the answers in the future….