On the rise and fall of the GNU GPL

Back in 2011 we caused something of a stir, to say the least, when we covered the trend towards permissive licensing at the expense of reciprocal copyleft licenses.

Since some people were dubious of Black Duck’s statistics, to put it mildly, we also validated our initial findings, at Bradley M Kuhn’s suggestion, using a selection of data from FLOSSmole, which confirmed the rate of decline in the proportion of projects using the GPL family of licenses between October 2008 and May 2011.

Returning to Black Duck’s figures, we later projected that if the rate of decline continued, the GPL family of licenses (including the LGPL and AGPL) would account for only 50% of all open source software by September 2012.

As 2012 draws to a close it seems like a good time to revisit that projection and check the latest statistics.

I will preface this with an admission that yes, we know these figures only provide a very limited perspective on the open source projects in question. A more rounded study would look at other aspects such as how many lines of code a project has, how often it is downloaded, its popularity in terms of number of users or developers, how often the project is being updated, how many of the developers are employed by a single vendor, and what proportion of the codebase is contributed by developers other than the core committers. Since that would involve checking all these for more than 300,000 projects I’m going to pass on that.

Additionally, while all that is true, it does not mean that there is no value in examining the proportion of projects using a certain license. I am more interested in what the data does tell us, than what it doesn’t.

Data sources:
We analysed two distinct data sources for our previous analysis: Black Duck’s license data and a selection of data collected by FLOSSmole. Specifically we chose data from Rubyforge, Freecode (fka Freshmeat), ObjectWeb and the Free Software Foundation because those were the only sets for which historical (October 2008) data was available in mid-2011. For this update we have to use FLOSSmole’s data from September 2012 since the November 2012 dataset for the Free Software Foundation is incomplete. It is not possible to get a picture of GPLv2 traction using this FLOSSmole data since the majority of projects on Freecode are labelled “GPL” with no version number. In addition, for this update we have also looked at FLOSSmole data from Google Code, comparing datasets for November 2011 and November 2012 to get a sense of the trends on a newer project hosting site.

Black Duck’s data
According to Black Duck’s data the proportion of projects using the GNU GPL family of licenses declined from 70% in June 2008 to 53.24% today. The first thing to note therefore is that the rate of decline seen a year ago did not continue, and that the GNU GPL family of licenses continues to account for more than 50% of all open source software. The rate of the decline of the GNU GPLv2 has actually accelerated over the past year, however, and its usage is now almost the same as the combination of permissive licenses (I went with MIT/Apache/BSD/Ms-PL, you can argue about that last one if you like, but I’ve got to stick with it for consistency) at around 32%.

FLOSSmole’s data
Also in the interests of consistency I should clarify that we made a slight error in our previous calculations relating to the data from FLOSSmole. When we looked at the FLOSSmole data in June 2011 we reported a decline from 70.77% in October 2008 to 59.31% in May 2011. In calculating the data for this update I identified an error: the figure for 2011 should have been 62.8%. So less of a decline, but a decline nonetheless. The figures show that despite the total number of projects increasing from 54,000 in 2011 to 57,069 in September 2012, the proportion of projects using the GNU GPL family of licenses has remained steady at 62.8%. However, the proportion of projects using permissive licenses has grown, from 10.9% in 2008 to 13.4% in 2011 and 13.7% in September 2012.

Google Code data
The data from Google Code involves a much larger data set: 237,810 projects in 2011 and 300,465 in 2012. It also presents something of a problem since one of the choices on Google Code is dual-licensing using the Artistic License/GPL. Including these projects in the GNU GPL family count we see that the proportion of projects hosted on Google Code using the GNU GPL family of licenses declined from 54.7% in November 2011 to 52.7% in November 2012. Interestingly though the proportion of projects using permissive licenses also fell, from 38% in 2011 to 37.1% today. As a side note, the use of “other open source licenses” grew from 2.0% in 2011 to 4.3% in 2012.
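For anyone who wants to turn the Google Code percentages back into approximate project counts, here is a minimal sketch. It assumes the GNU GPL family shares for November 2011 and November 2012 apply to the project totals quoted above. The function name `implied_count` is ours, and because the published percentages are rounded the results are only approximate.

```python
def implied_count(total_projects, share_pct):
    """Approximate number of projects implied by a rounded percentage share."""
    return round(total_projects * share_pct / 100)

# Totals and GNU GPL family shares for Google Code, as quoted in the text.
gpl_2011 = implied_count(237_810, 54.7)
gpl_2012 = implied_count(300_465, 52.7)

print(gpl_2011, gpl_2012)
```

A falling share on a growing base is not the same thing as a falling count, which is why both numbers are worth having side by side.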

What does it all mean? You can read as much or as little into the statistics as you wish. Since I am fed up with being accused of being a shill for providing analysis of the numbers I won’t bother to do so on this occasion – you are perfectly free to figure it out for yourselves.

Here’s everything in a single chart:

FLOSSmole data confirms declining GPL usage

Last week we published a post looking at some statistics suggesting a decline in the usage of the GNU GPL.

The post sparked some interesting debate, not least about the validity of Black Duck Software’s numbers, which we had used to compare usage of the various FLOSS licenses over recent years.

While we have no specific reason to doubt Black Duck’s figures, Bradley M Kuhn, in particular, suggested that Black Duck’s data should be “ignored by serious researchers” since the company doesn’t disclose enough detail about its data collection methods.

He added that “AFAICT, FLOSSmole is the only project attempting to generate this kind of data and analysis thereof in a scientifically verifiable way”.

You can probably guess where this is going…

Started in 2004, FLOSSmole* collects data on open source software projects. FLOSSmole’s data is freely available via Google Code.

In order to test Black Duck’s data we downloaded FLOSSmole data from four sources for which both current (May 2011) and historical (October 2008) data was available: Rubyforge, Freshmeat, ObjectWeb and the Free Software Foundation.

We then sorted each data set and generated subtotals for each license type, checking the data manually to make sure we had combined all the relevant data (data tagged GPL2, GPLv2 and GNU GPLv2 for example).

Given the wide variety of ways in which the various GNU General Public Licenses have been tagged across the four data sources (a huge number of Freshmeat projects are tagged simply “General Public License” with no version number) it also made sense to group the licenses together into the GPL family (including LGPL and AGPL).
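The grouping step described above can be sketched roughly as follows. This is an illustration only, not the spreadsheet process we actually used: the function names, the matching rule and the sample tags are all assumptions, and real FLOSSmole exports would need manual checking for tags this simple rule gets wrong.

```python
from collections import Counter

def license_family(tag):
    """Map a raw license tag to a coarse family (hypothetical matching rule)."""
    t = tag.strip().lower()
    if not t:
        return "unknown"
    # "gpl" also matches LGPL and AGPL; the long form also matches the
    # Lesser and Affero General Public Licenses.
    if "gpl" in t or "general public license" in t:
        return "GPL family"
    return "other"

def family_shares(tags):
    """Return each family's percentage share of all projects in the sample."""
    counts = Counter(license_family(t) for t in tags)
    total = sum(counts.values())
    return {family: 100 * n / total for family, n in counts.items()}

# Made-up sample tags, one per project, purely for illustration.
sample = ["GPL2", "GPLv2", "GNU GPLv2", "General Public License", "LGPL",
          "AGPLv3", "MIT", "BSD", "", "Apache 2.0"]
shares = family_shares(sample)
print(shares["GPL family"])  # 60.0
```

Projects with an empty tag are kept in the total here, which matches the headline figures; excluding them changes the denominator and therefore the share.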

The results show that the GPL family of licenses accounted for 70.77% of all 53,914 projects in the sample in October 2008. In May 2011 that figure had declined to 59.31% of 54,800.

As a reminder, the figures from Black Duck showed the proportion of projects using the GPL family of licenses had declined from 70% in June 2008 to 61% today. So the FLOSSmole figures actually show a more rapid decline in GPL usage than Black Duck’s.

One important point to note is that a significant number of projects (5,775) in the 2011 Freshmeat data do not have license details. Removing these projects from the sample would result in the GPL family of licenses representing 66.3% of 49,025 projects in 2011.
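The adjustment above is simple arithmetic, sketched here for anyone who wants to check it. It assumes the GPL family count can be recovered from the rounded 59.31% figure, so the result is approximate. The function name is ours.

```python
def share_excluding_unlicensed(total, gpl_share_pct, unlicensed):
    """GPL family share after dropping projects with no license details."""
    gpl_projects = total * gpl_share_pct / 100  # approx. count of GPL family projects
    licensed = total - unlicensed               # projects that do list a license
    return 100 * gpl_projects / licensed

licensed_total = 54_800 - 5_775
adjusted = share_excluding_unlicensed(54_800, 59.31, 5_775)
print(licensed_total, round(adjusted, 1))  # 49025 66.3
```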

Either way, the FLOSSmole results confirm a decline in GPL usage.

UPDATE: Just to be clear, the figures for ‘GPL family’ above include both LGPL and AGPL as well. FLOSSmole’s figures show both increased from 2008-2011, from 6.22% to 7.21% and 0.11% to 0.36% respectively.

2ND UPDATE: Of course, the % of total projects is only one way to measure adoption, and some people will argue it’s not a particularly good one. Certainly we’re not going to get carried away with the fact that the % of projects hosted by the Free Software Foundation using the GPL family has declined from 81.2% to 76.7%. Although it is kind of interesting.

*Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.