451 CAOS Theory 
A blog for the enterprise open source community
FLOSSmole data confirms declining GPL usage
Matthew Aslett, June 13, 2011 @ 10:45 am ET
Last week we published a post looking at some statistics suggesting a decline in the usage of the GNU GPL.
The post sparked some interesting debate, not least about the validity of Black Duck Software’s numbers, which we had used to compare usage of the various FLOSS licenses over recent years.
While we have no specific reason to doubt Black Duck’s figures, Bradley M Kuhn, in particular, suggested that Black Duck’s data should be “ignored by serious researchers” since the company doesn’t disclose enough detail about its data collection methods.
He added that “AFAICT, FLOSSmole is the only project attempting to generate this kind of data and analysis thereof in a scientifically verifiable way”.
You can probably guess where this is going…
Started in 2004, FLOSSmole* collects data on open source software projects. FLOSSmole’s data is freely available via Google Code.
In order to test Black Duck’s data we downloaded FLOSSmole data from four sources for which both current (May 2011) and historical (October 2008) data was available: Rubyforge, Freshmeat, ObjectWeb and the Free Software Foundation.
We then sorted each data set and generated subtotals for each license type, checking the data manually to make sure we had combined all the relevant data (data tagged GPL2, GPLv2 and GNU GPLv2 for example).
Given the wide variety of ways in which the various GNU General Public Licenses have been tagged across the four data sources (a huge number of Freshmeat projects are tagged simply “General Public License” with no version number), it also made sense to group the licenses together into the GPL family (including LGPL and AGPL).
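The grouping step described above can be sketched roughly as follows. This is not the spreadsheet process we actually used, just an illustration of the idea: the matching rules, the `classify` and `gpl_family_share` names, and the sample tags are all made up for the example and are not FLOSSmole data.

```python
import re
from collections import Counter

# Hypothetical matching rules. Order matters: LGPL and AGPL are checked
# before plain GPL, because "Lesser General Public License" also contains
# the phrase "General Public License".
_RULES = [
    ("AGPL", re.compile(r"\bAGPL|affero", re.I)),
    ("LGPL", re.compile(r"\bLGPL|lesser general public|library general public", re.I)),
    ("GPL", re.compile(r"\bGPL|general public licen[sc]e", re.I)),
]


def classify(tag):
    """Map a free-form license tag to GPL, LGPL, AGPL, other or unspecified."""
    tag = (tag or "").strip()
    if not tag:
        return "unspecified"
    for name, pattern in _RULES:
        if pattern.search(tag):
            return name
    return "other"


def gpl_family_share(tags, include_unspecified=True):
    """Percentage of projects whose tag falls in the GPL family."""
    counts = Counter(classify(t) for t in tags)
    total = sum(counts.values())
    if not include_unspecified:
        total -= counts["unspecified"]
    family = counts["GPL"] + counts["LGPL"] + counts["AGPL"]
    return 100.0 * family / total if total else 0.0


# Toy tags for illustration only.
sample = ["GPL2", "GPLv2", "GNU GPLv2", "General Public License",
          "LGPLv2.1", "GNU AGPLv3", "BSD", "MIT", "", None]
print(gpl_family_share(sample))
print(gpl_family_share(sample, include_unspecified=False))
```

Any automated rules like these would still need the manual check we describe, since real tags are messier than a handful of patterns can anticipate.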
The results show that the GPL family of licenses accounted for 70.77% of all 53,914 projects in the sample in October 2008. In May 2011 that figure had declined to 59.31% of 54,800.
As a reminder, the figures from Black Duck showed the proportion of projects using the GPL family of licenses had declined from 70% in June 2008 to 61% today. So the FLOSSmole figures actually show a more rapid decline in GPL usage than Black Duck’s.
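Since the two series cover slightly different periods, one rough way to check the “more rapid decline” claim is to compare the average drop in percentage points per month. The sketch below assumes that “today” in the Black Duck series means June 2011 (the date of this post); the function name is only illustrative.

```python
def decline_per_month(start_pct, end_pct, months):
    """Average drop in percentage points per month."""
    return (start_pct - end_pct) / months


# FLOSSmole sample: October 2008 to May 2011 is 31 months.
flossmole = decline_per_month(70.77, 59.31, 31)
# Black Duck: June 2008 to June 2011 is 36 months (end date assumed).
black_duck = decline_per_month(70.0, 61.0, 36)

print(f"FLOSSmole:  {flossmole:.2f} points/month")   # 0.37
print(f"Black Duck: {black_duck:.2f} points/month")  # 0.25
```

On that basis the FLOSSmole sample declines faster both in total and per month, although the two data sets cover different projects and are not directly comparable.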
One important point to note is that a significant number of projects (5,775) in the 2011 Freshmeat data do not have license details. Removing these projects from the sample would result in the GPL family of licenses representing 66.3% of 49,025 projects in 2011.
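For anyone who wants to check the adjustment for unlicensed projects, the arithmetic can be reproduced from the published figures alone. The GPL family project count below is reconstructed from the rounded 59.31% share, so it is approximate.

```python
total_2011 = 54_800          # all projects in the May 2011 sample
unlicensed = 5_775           # Freshmeat projects with no license details
family_share_2011 = 59.31    # GPL family share of all projects, in percent

# Approximate GPL family project count, derived from the rounded share.
family_projects = total_2011 * family_share_2011 / 100

licensed_total = total_2011 - unlicensed
adjusted_share = 100 * family_projects / licensed_total

print(licensed_total)            # 49025
print(round(adjusted_share, 1))  # 66.3
```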
Either way, the FLOSSmole results confirm a decline in GPL usage.
UPDATE: Just to be clear, the figures for ‘GPL family’ above include both LGPL and AGPL. FLOSSmole’s figures show that both increased between 2008 and 2011, from 6.22% to 7.21% and from 0.11% to 0.36% respectively.
2ND UPDATE: Of course, the % of total projects is only one way to measure adoption, and some people will argue it’s not a particularly good one. Certainly we’re not going to get carried away with the fact that the % of projects hosted by the Free Software Foundation using the GPL family has declined from 81.2% to 76.7%. Although it is kind of interesting.
*Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.
Comments (8) Categories: Licensing




I wanted to take a moment to thank you for doing the work of using an open and verifiable data source.
I’m not particularly worried to see a decline in GPL’d software, because there’s been an overall increase in Free Software creation, so probably in absolute numbers, I’d guess that GPL has increased but is lower in percentage of projects.
I’d be curious to see a lines of code analysis of how many lines of code are licensed under a GPL license. I would suspect that a lines of code analysis, rather than mere projects, would not show such a sharp decrease, but I have no way of verifying this (ISTR FLOSS Mole doesn’t have LoC data, but maybe I’m misremembering.)
Finally, I don’t see you mentioning LGPL and AGPL anywhere, and was wondering if you included them in the “family” of GNU licenses. It seems that maybe you didn’t, which would further skew the numbers.
Thanks Bradley, I didn’t see LoC data but I wasn’t looking for it specifically, so it may be there.
Yes, “GPL family” includes LGPL and AGPL. I’ll add a quick note about that.
In absolute terms, the number of both GPLv2 and GPLv3 projects did increase, while the total for all GPL projects (excluding AGPL and LGPL) declined, but those figures are not reliable due to the large number of projects tagged “General Public License” with no version number.
Thanks
Matt
Matthew,
Corroboration of data across multiple sources always makes sense. Thanks Matthew for taking an extra step here and also for pointing to subtleties. For instance, as you point out, should projects that don’t contain license details be counted in the sample? Recently when running a language use query regarding “commits” we had to decide whether to include commits that didn’t include languages in the total. While there are reasons for and against, a decision like this is often made by internal experts who need to decide whether their choice passes the “sniff test” in addition to whether the data was derived in a scientifically accurate way. We’d like to think that we both provide the data and that its usefulness is enhanced due to understanding the context. As members of the OSS community, we try to do both.
[...] people did not trust Black Duck’s data I also took a look at data collected by FLOSSmole. The results are remarkably similar. – [...]
[...] went to a lot of effort to analyze data collected and published by FLOSSmole to find that it confirmed the trend suggested by Black Duck’s figures, I am confident that the trends are an accurate reflection [...]
[...] and the Free Software Foundation collected and published by FLOSSmole, only to find that it confirmed the trend suggested by Black Duck’s figures. I was personally therefore happy to use Black [...]
[...] some people were dubious of Black Duck’s statistics, to put it mildly, we also validated our initial findings, at Bradley M Kuhn’s suggestion, using a selection of data from FLOSSmole, which confirmed [...]