Back in 2011 we caused something of a stir, to say the least, when we covered the trend towards permissive licensing at the expense of reciprocal copyleft licenses.
Since some people were dubious of Black Duck’s statistics, to put it mildly, we also validated our initial findings, at Bradley M Kuhn’s suggestion, using a selection of data from FLOSSmole, which confirmed the rate of decline in the proportion of projects using the GPL family of licenses between October 2008 and May 2011.
Returning to Black Duck’s figures, we later projected that if the rate of decline continued the GPL family of licenses (including the LGPL and AGPL) would account for only 50% of all open source software by September 2012.
As 2012 draws to a close it seems like a good time to revisit that projection and check the latest statistics.
I will preface this with an admission that yes, we know these figures only provide a very limited perspective on the open source projects in question. A more rounded study would look at other aspects such as how many lines of code a project has, how often it is downloaded, its popularity in terms of number of users or developers, how often the project is being updated, how many of the developers are employed by a single vendor, and what proportion of the codebase is contributed by developers other than the core committers. Since that would involve checking all these for more than 300,000 projects I’m going to pass on that.
Additionally, while all that is true, it does not mean that there is no value in examining the proportion of projects using a certain license. I am more interested in what the data does tell us, than what it doesn’t.
We analysed two distinct data sources for our previous analysis: Black Duck’s license data and a selection of data collected by FLOSSmole. Specifically we chose data from Rubyforge, Freecode (fka Freshmeat), ObjectWeb and the Free Software Foundation because those were the only sets for which historical (October 2008) data was available in mid 2011. For this update we have to use FLOSSmole’s data from September 2012 since the November 2012 dataset for the Free Software Foundation is incomplete. It is not possible to get a picture of GPLv2 traction using this FLOSSmole data since the majority of projects on Freecode are labelled “GPL” with no version number. In addition, for this update we have also looked at FLOSSmole data from Google Code, comparing datasets for November 2011 and November 2012. to get a sense of the trends on a newer project hosting site.
Black Duck’s data
According to Black Duck’s data the proportion of projects using the GNU GPL family of licenses declined from 70% in June 2008 to 53.24% today. The first thing to note therefore is that the rate of decline seen a year ago did not continue, and that the GNU GPL family of licenses continues to account for more than 50% of all open source software. The rate of the decline of the GNU GPLv2 has actually accelerated over the past year, however, and its usage is now almost the same as the combination of permissive licenses (I went with MIT/Apache/BSD/Ms-PL, you can argue about that last one if you like, but I’ve got to stick with it for consistency) at around 32%.
Also in the interests of consistency I should clarify that we made a slight error in our previous calculations relating to the data from FLOSSmole. When we looked at the FLOSSmole data in June 2011 we reported a decline from 70.77% in October 2008 to 59.31% in May 2011. In calculating the data for this update I identified an error and that the figure should have been 62.8% in 2011. So less of a decline, but a decline nonetheless. The figures show that despite the total number of projects increasing from 54,000 in 2011 to 57,069 in September 2012, the proportion of projects using the GNU GPL family of licenses has remained steady at 62.8%. However, the proportion of projects using permissive licenses has grown, from 10.9% in 2008 to 13.4% in 2011 and 13.7% in September 2012.
Google Code data
The data from Google Code involves a much larger data set: 237,810 projects in 2011 and 300,465 in 2012. It also presents something problem since one of the choices on Google Code is dual-licensing using the Artistic License/GPL. Including these projects in the GNU GPL family count we see that the proportion of projects hosted on Google Code using the GNU GPL family of licenses declines from 54.7% in November 2011 to 52.7% in November 2011. Interestingly though the proportion of projects using permissive licenses also fell, from 38% in 2011 to 37.1% today. As a side note, the use of “other open source licenses” grew from 2.0% in 2011 to 4.3% in 2012.
What does it all mean? You can read as much or as little into the statistics as you wish. Since I am fed up with being accused of being a shill for providing analysis of the numbers I won’t bother to do so on this occasion – you are perfectly free to figure it out for yourselves.
Here’s everything in a single chart: