On the rise and fall of the GNU GPL

Back in 2011 we caused something of a stir, to say the least, when we covered the trend towards permissive licensing at the expense of reciprocal copyleft licenses.

Since some people were dubious of Black Duck’s statistics, to put it mildly, we also validated our initial findings, at Bradley M Kuhn’s suggestion, using a selection of data from FLOSSmole, which confirmed the rate of decline in the proportion of projects using the GPL family of licenses between October 2008 and May 2011.

Returning to Black Duck’s figures, we later projected that if the rate of decline continued the GPL family of licenses (including the LGPL and AGPL) would account for only 50% of all open source software by September 2012.

As 2012 draws to a close it seems like a good time to revisit that projection and check the latest statistics.

I will preface this with an admission that yes, we know these figures only provide a very limited perspective on the open source projects in question. A more rounded study would look at other aspects such as how many lines of code a project has, how often it is downloaded, its popularity in terms of number of users or developers, how often the project is being updated, how many of the developers are employed by a single vendor, and what proportion of the codebase is contributed by developers other than the core committers. Since that would involve checking all these for more than 300,000 projects I’m going to pass on that.

Additionally, while all that is true, it does not mean that there is no value in examining the proportion of projects using a certain license. I am more interested in what the data does tell us, than what it doesn’t.

Data sources:
We analysed two distinct data sources for our previous analysis: Black Duck’s license data and a selection of data collected by FLOSSmole. Specifically we chose data from Rubyforge, Freecode (fka Freshmeat), ObjectWeb and the Free Software Foundation because those were the only sets for which historical (October 2008) data was available in mid-2011. For this update we have to use FLOSSmole’s data from September 2012 since the November 2012 dataset for the Free Software Foundation is incomplete. It is not possible to get a picture of GPLv2 traction using this FLOSSmole data since the majority of projects on Freecode are labelled “GPL” with no version number. In addition, for this update we have also looked at FLOSSmole data from Google Code, comparing datasets for November 2011 and November 2012 to get a sense of the trends on a newer project hosting site.

Black Duck’s data
According to Black Duck’s data the proportion of projects using the GNU GPL family of licenses declined from 70% in June 2008 to 53.24% today. The first thing to note therefore is that the rate of decline seen a year ago did not continue, and that the GNU GPL family of licenses continues to account for more than 50% of all open source software. The rate of the decline of the GNU GPLv2 has actually accelerated over the past year, however, and its usage is now almost the same as the combination of permissive licenses (I went with MIT/Apache/BSD/Ms-PL, you can argue about that last one if you like, but I’ve got to stick with it for consistency) at around 32%.

FLOSSmole’s data
Also in the interests of consistency I should clarify that we made a slight error in our previous calculations relating to the data from FLOSSmole. When we looked at the FLOSSmole data in June 2011 we reported a decline from 70.77% in October 2008 to 59.31% in May 2011. In calculating the data for this update I identified an error: the figure should have been 62.8% in 2011. So less of a decline, but a decline nonetheless. The figures show that despite the total number of projects increasing from 54,000 in 2011 to 57,069 in September 2012, the proportion of projects using the GNU GPL family of licenses has remained steady at 62.8%. However, the proportion of projects using permissive licenses has grown, from 10.9% in 2008 to 13.4% in 2011 and 13.7% in September 2012.

Google Code data
The data from Google Code involves a much larger data set: 237,810 projects in 2011 and 300,465 in 2012. It also presents something of a problem, since one of the choices on Google Code is dual-licensing using the Artistic License/GPL. Including these projects in the GNU GPL family count, we see that the proportion of projects hosted on Google Code using the GNU GPL family of licenses declined from 54.7% in November 2011 to 52.7% in November 2012. Interestingly though the proportion of projects using permissive licenses also fell, from 38% in 2011 to 37.1% today. As a side note, the use of “other open source licenses” grew from 2.0% in 2011 to 4.3% in 2012.

What does it all mean? You can read as much or as little into the statistics as you wish. Since I am fed up with being accused of being a shill for providing analysis of the numbers I won’t bother to do so on this occasion – you are perfectly free to figure it out for yourselves.

Here’s everything in a single chart:

On the continuing decline of the GPL

Our most popular CAOS blog post of the year, by some margin, was this one, from early June, looking at the trend towards permissive licensing, and the decline in the usage of the GNU GPL family of licenses.

Prompted by this post by Bruce Byfield, I thought it might be interesting to bring that post up to date with a look at the latest figures.

NB: I am relying on the current set of figures published by Black Duck Software for this post, combined with our previous posts on the topic. I am aware that some people are distrustful of Black Duck’s figures given the lack of transparency on the methodology for collecting them. Since I previously went to a lot of effort to analyze data collected and published by FLOSSmole to find that it confirmed the trend suggested by Black Duck’s figures, I am confident that the trends are an accurate reflection of the situation.

The figures indicate that not only has the usage of the GNU GPL family of licenses (GPL2+3, LGPL2+3, AGPL) continued to decline since June, but that the decline has accelerated. The GPL family now accounts for about 57% of all open source software, compared to 61% in June.

As you can see from the chart below, if the current rate of decline continues, we project that the GPL family of licenses will account for only 50% of all open source software by September 2012.

That is still a significant proportion of course, but would be down from 70% in June 2008. Our projection also suggests that permissive licenses (specifically in this case, MIT/Apache/BSD/Ms-PL) will account for close to 30% of all open source software by September 2012, up from 15% in June 2009 (we don’t have a figure for June 2008 unfortunately).

Of course, there is no guarantee that the current rate of decline will continue – as the chart indicates the rate of decline slowed between June 2009 and June 2011, and it may well do so again. Or it could accelerate further.

Interestingly, however, while the more rapid rate of decline prior to June 2009 was clearly driven by the declining use of the GPLv2 in particular, Black Duck’s data suggests that the usage of the GPL family declined at a faster rate between June 2011 and December 2011 (6.7%) than the usage of the GPLv2 specifically (6.2%).

UPDATE – It has been rightly noted that this decline relates to the proportion of all open source software, while the number of projects using the GPL family has increased in real terms. Using Black Duck’s figures we can calculate that in fact the number of projects using the GPL family of licenses grew 15% between June 2009 and December 2011, from 105,822 to 121,928. However, in the same time period the total number of open source projects grew 31% in real terms, while the number of projects using permissive licenses grew 117%. – UPDATE
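The difference between growth in real terms and a falling share is simple arithmetic. As a rough sketch (the function names are ours, purely for illustration), the 15% figure follows from the two project counts quoted above, and a category loses share whenever it grows more slowly than the total:

```python
def percent_growth(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def share_multiplier(category_growth_pct: float, total_growth_pct: float) -> float:
    """Factor by which a category's share changes when the category and the
    total grow at the given percentage rates (below 1.0 means a falling share)."""
    return (1 + category_growth_pct / 100.0) / (1 + total_growth_pct / 100.0)


# GPL family project counts quoted above (June 2009 and December 2011).
gpl_growth = percent_growth(105_822, 121_928)

# GPL family grew about 15% while all projects grew about 31%.
gpl_share_factor = share_multiplier(15, 31)

print(f"GPL family growth in real terms: {gpl_growth:.1f}%")
print(f"GPL family share multiplied by:  {gpl_share_factor:.3f}")
```

A multiplier below 1.0 is exactly the situation described: more GPL projects in absolute terms, but a smaller slice of a faster-growing whole.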

As indicated in June, we believe there are some wider trends that need to be discussed in relation to license usage, particularly with regards to vendor engagement with open source projects and a decline in the number of vendors engaging with strong copyleft licensed software.

The analysis indicated that the previous dominance of strong copyleft licenses was achieved and maintained to a significant degree due to vendor-led open source projects, and that the ongoing shift away from projects controlled by a single vendor toward community projects was in part driving a shift towards more permissive non-copyleft licenses.

We will update this analysis over the next few days with a look at the latest trends regarding the engagement of vendors with open source projects, and venture funding for open source-related vendors, providing some additional context for the trends related to licensing.

FLOSSmole data confirms declining GPL usage

Last week we published a post looking at some statistics suggesting a decline in the usage of the GNU GPL.

The post sparked some interesting debate, not least about the validity of Black Duck Software’s numbers, which we had used to compare usage of the various FLOSS licenses over recent years.

While we have no specific reason to doubt Black Duck’s figures, Bradley M Kuhn, in particular, suggested that Black Duck’s data should be “ignored by serious researchers” since the company doesn’t disclose enough detail about its data collection methods.

He added that “AFAICT, FLOSSmole is the only project attempting to generate this kind of data and analysis thereof in a scientifically verifiable way”.

You can probably guess where this is going…

Started in 2004, FLOSSmole* collects data on open source software projects. FLOSSmole’s data is freely available via Google Code.

In order to test Black Duck’s data we downloaded FLOSSmole data from four sources for which both current (May 2011) and historical (October 2008) data was available: Rubyforge, Freshmeat, ObjectWeb and the Free Software Foundation.

We then sorted each data set and generated subtotals for each license type, checking the data manually to make sure we had combined all the relevant data (data tagged GPL2, GPLv2 and GNU GPLv2 for example).

Given the wide variety of ways in which the various GNU General Public Licenses have been tagged across the four data sources (a huge number of Freshmeat projects are tagged simply “General Public License” with no version number) it also made sense to group the licenses together into the GPL family (including LGPL and AGPL).
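For anyone wanting to repeat the exercise, the tag clean-up and grouping described above might look something like the following sketch. We did this by sorting and checking the data manually, so the mapping table, the function names and the sample tags here are all hypothetical, not FLOSSmole's schema or our actual working.

```python
from collections import Counter

# Hypothetical mapping from raw tags (lower-cased) to a canonical license name.
# A real mapping would need many more entries, built by inspecting the data.
CANONICAL = {
    "gpl2": "GPLv2",
    "gplv2": "GPLv2",
    "gnu gplv2": "GPLv2",
    "general public license": "GPL (unversioned)",
    "lgpl": "LGPL",
    "agpl": "AGPL",
    "mit": "MIT",
}

# Canonical names counted as part of the GPL family (including LGPL and AGPL).
GPL_FAMILY = {"GPLv2", "GPL (unversioned)", "LGPL", "AGPL"}


def normalize_tag(tag: str) -> str:
    """Collapse case and whitespace, then look up the canonical name."""
    key = " ".join(tag.lower().split())
    return CANONICAL.get(key, "Other/unknown")


def gpl_family_share(tags):
    """Return per-license subtotals and the GPL family share in percent."""
    counts = Counter(normalize_tag(t) for t in tags)
    total = sum(counts.values())
    family = sum(n for name, n in counts.items() if name in GPL_FAMILY)
    share = family / total * 100.0 if total else 0.0
    return counts, share


# Made-up sample: one tag per project, including one with no license details.
sample = ["GPL2", "GPLv2", "GNU GPLv2", "General Public License", "MIT", "LGPL", ""]
counts, share = gpl_family_share(sample)
print(counts)
print(f"GPL family share: {share:.1f}%")
```

Grouping into a family sidesteps the unversioned tags: a project labelled only “General Public License” cannot be assigned to GPLv2 or GPLv3, but it can still be counted as GPL family.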

The results show that the GPL family of licenses accounted for 70.77% of all 53,914 projects in the sample in October 2008. In May 2011 that figure had declined to 59.31% of 54,800.

As a reminder, the figures from Black Duck showed the proportion of projects using the GPL family of licenses had declined from 70% in June 2008 to 61% today. So the FLOSSmole figures actually show a more rapid decline in GPL usage than Black Duck’s.

One important point to note is that a significant number of projects (5,775) in the 2011 Freshmeat data do not have license details. Removing these projects from the sample would result in the GPL family of licenses representing 66.3% of 49,025 projects in 2011.
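The adjustment is easy to reproduce from the figures quoted above. This is a minimal sketch with a function name of our own choosing, and it assumes that none of the projects without license details had been counted in the GPL family:

```python
def share_excluding_unknown(share_pct: float, total: int, unknown: int) -> float:
    """Recompute a license share after dropping projects with no license details.

    Assumes the `unknown` projects were not counted towards the share.
    """
    family_projects = share_pct / 100.0 * total
    return family_projects / (total - unknown) * 100.0


# 59.31% of 54,800 projects, with 5,775 Freshmeat projects lacking license details.
adjusted = share_excluding_unknown(59.31, 54_800, 5_775)
print(f"GPL family share excluding unlicensed projects: {adjusted:.1f}%")
```

The numerator stays the same and only the denominator shrinks, which is why the share rises from 59.31% to about 66.3%.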

Either way, the FLOSSmole results confirm a decline in GPL usage.

UPDATE: Just to be clear, the figures for ‘GPL family’ above include both LGPL and AGPL as well. FLOSSmole’s figures show both increased from 2008-2011, from 6.22% to 7.21% and 0.11% to 0.36% respectively.

2ND UPDATE: Of course, the % of total projects is only one way to measure adoption, and some people will argue it’s not a particularly good one. Certainly we’re not going to get carried away with the fact that the % of projects hosted by the Free Software Foundation using the GPL family has declined from 81.2% to 76.7%. Although it is kind of interesting.

*Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.

The trend towards permissive licensing

Ian Skerrett last week suggested that there is a growing trend in favour of permissive non-copyleft licenses at the expense of reciprocal copyleft licenses. Ian asked “name one popular community open source project created in the last 5 years that uses the AGPL or GPL?”

The responses didn’t exactly come thick and fast. I certainly couldn’t think of one. But the question did prompt me to look for some evidence for the trend away from copyleft licenses.

License usage
The first port of call for evidence of trends related to open source license use is Black Duck’s Open Source Resource Center. The latest figures show that GPLv2 is used for 45.33% of projects in Black Duck’s KnowledgeBase, while the GPL family accounts for roughly 61% of all projects.

While the GPL family is dominant, comparing the latest figures with those provided in June 2008, June 2009, and some previous CAOS research from March 2010 indicates a steady decline in the use of the GPL family and the GPLv2 in particular.

According to Black Duck’s figures the proportion of open source projects using the GPL family of licenses has fallen to 61% today from 70% in June 2008, while the GPLv2 has fallen to 45% from 58% three years ago.

It is worth noting that the number of projects using the GPL licenses has increased in real terms over the past few years. According to our calculations based on Black Duck’s figures, the number of GPLv2 projects rose 5.5% between June 2009 and June 2011, while the total number of open source projects grew over 16%.

We should expect to see slower growth for the GPLv2 given it has been superseded but even though the number of AGPLv3 and GPLv3 projects grew 90% and 85% respectively over the past two years, that only resulted in 29% growth for the GPL family overall (while A/L/GPLv3 adoption appears to be slowing).

In comparison the number of Apache licensed projects grew 46% over the past two years, while the number of MIT licensed projects grew 152%. Indeed Black Duck’s figures indicate that the MIT License has been the biggest gainer in the last two years, jumping from 3.8% of all projects in June 2009 to 8.23% today, leapfrogging Apache, BSD, GPLv3 and LGPLv2.1 in the process.
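Since Black Duck publishes shares rather than project counts, growth in the number of projects has to be derived from the change in share combined with the growth in the total. The sketch below shows the general idea using the rounded figures quoted above. The function name is ours, and the 16% input is a rounded assumption, since the total grew by "over 16%".

```python
def count_growth_pct(share_before: float, share_after: float,
                     total_growth_pct: float) -> float:
    """Growth in a license's project count, derived from its share of all
    projects before and after, and the growth in the total number of projects."""
    ratio = (share_after / share_before) * (1 + total_growth_pct / 100.0)
    return (ratio - 1.0) * 100.0


# MIT License: 3.8% of all projects in June 2009, 8.23% in June 2011,
# with the total number of projects assumed to have grown by 16%.
mit_growth = count_growth_pct(3.8, 8.23, 16.0)
print(f"Estimated growth in MIT licensed projects: {mit_growth:.0f}%")
```

With the total rounded down to 16% this comes out at roughly 151%, slightly under the 152% quoted above, which is what you would expect if the total grew a little more than 16%.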

While the level of adoption of copyleft licenses remains dominant, and continues to rise in terms of the number of projects, there is no escaping the continuing overall decline in terms of ‘license share’.

UPDATE – Since some people did not trust Black Duck’s data I also took a look at data collected by FLOSSmole. The results are remarkably similar. – UPDATE

Vendor formation
Black Duck’s data is not the only indication that the importance of copyleft licenses has decreased in recent years. The research we conducted as part of our Control and Community report also indicated a decline in the number of vendors engaging with strong copyleft licensed software.

Specifically, we evaluated the open source-related strategies of 300 software vendors and subsidiaries, including the license choice, development model, copyright strategy and revenue generator.

By plotting the results of this analysis against the year in which the companies were founded (for open source specialists) or began to engage with open source (for complementary vendors) we are able to gain a perspective on the changing popularity of the individual strategies*.

Having updated the results to the end of 2010, our analysis now covers 321 vendors and shows that 2010 was the first year in which there were more companies formed around projects with non-copyleft licences than with strong copyleft licences.

The formation of vendors around open source software with strong copyleft licenses peaked in 2006, having risen steadily between 1997 and 2006 – although there have been gains since 2007. By comparison, the formation of vendors around open source software with non-copyleft licences has been steadily increasing since 2002.

The results get even more interesting in terms of Ian’s question if we filter them by development model. Looking at community-led development projects, we see that there have been significantly more companies formed around community-led projects with non-copyleft licenses than with strong copyleft licenses since 2007.

In fact, strong copyleft licenses have been much more popular for vendor-led development projects, but even here there was an increase in the use of non-copyleft licenses in 2010.

This last chart illustrates something significant about the previous dominance of strong copyleft licenses: that it was achieved and maintained to a significant degree due to the vendor-led open source projects, rather than community-led projects.

One of the main findings of our Control and Community report was the ongoing shift away from projects controlled by a single vendor and back toward community and collaboration. While some might expect that to mean increased adoption of strong copyleft licenses – given that they are associated with collaborative development projects such as GNU and the Linux kernel – the charts above indicate a shift towards non copyleft.

As previously noted, while free software projects utilize strong copyleft to ensure that the software in question remains open (or as Bradley M Kuhn recently put it, to keep developers “honest”), vendors using the open core licensing strategy use strong copyleft licenses, along with copyright ownership, to ensure that only they have the opportunity to take it closed.

Either way, strong copyleft is used as a means of control on the code and the project, and our analysis backs up Ian’s contention that there is a trend away from control and towards more permissive non-copyleft licenses.

This is part of what we called the fourth stage of commercial open source business strategies and is being driven by the increased engagement of previously closed-source vendors with open source projects.

The fourth stage is about balancing the ability to create closed source derivatives with collaborative development through multi-vendor open source projects and permissive licensing, and as such it not only avoids the need to control a project through licensing, it actively discourages control through licensing.

That is why, in my opinion, the decline of the copyleft licenses has only just begun.

*The method is not perfect, since it plots the license being used today against the year of formation, and as such does not reflect licensing changes in the interim. It does provide us with an overview of general historical trends, however.

