451 Research Hadoop survey is now live

If you’re using or considering using Hadoop, please help shape our understanding of global Hadoop usage by taking our 2013 Hadoop survey, which can be found at http://www.surveymonkey.com/s/451Hadoop

The aim of this survey is to identify trends in Hadoop usage, as well as attitudes to Hadoop as it relates to data warehousing.

There are a minimum of 15 questions to answer, and a maximum of 24 (including three optional questions) depending on your organisation’s level of adoption, and the entire survey should take no longer than fifteen minutes to complete.

Some of the specific aspects covered by the survey are:

  • Current and planned Hadoop usage
  • Responsibility for managing Hadoop clusters
  • Preferred infrastructure for Hadoop deployments
  • Hadoop and the data warehouse
  • Potential Hadoop improvements
  • Hadoop-as-a-Service
  • Hadoop hardware
  • Alternative file systems
  • SQL-on/in-Hadoop

All individual responses are of course confidential. The results will be published as part of a major research report due during Q4, which will include market sizing estimates for the analytic database sector, as well as Hadoop. The full report will be available to 451 Research clients, while the results of the survey will also be made freely available.

Thank you in advance for your participation.

http://www.surveymonkey.com/s/451Hadoop

Your chance to define the “state of MySQL”

We are very honoured to have been asked to give a “state of the MySQL” keynote presentation at the Percona Live MySQL Conference and Expo in April.

While this will not be in any way an official “state of the dolphin” presentation, I think it is fitting given the expansion of the MySQL ecosystem that the Percona Live event includes an independent perspective on the state of MySQL. The full title of the presentation – MySQL, YourSQL, NoSQL, NewSQL – the state of the MySQL ecosystem – reflects that.

We want to present an independent perspective on the health of the MySQL ecosystem in 2013, drawing on our research and analysis, as well as the views of the participants in that ecosystem.

You have a chance to directly influence the content of the presentation by taking part in our 2013 Database survey.

The aim of this survey is to identify trends in database usage, as well as changing attitudes to MySQL following its acquisition by Oracle, and the competitive dynamic between MySQL and other databases, including NoSQL and NewSQL technologies, as well as MariaDB, Percona Server and other MySQL variants.

There are just 15 questions to answer, spread over five pages, and the entire survey should take less than ten minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due during Q2.

The full report will be available to 451 Research clients, while the results of the survey will also be made freely available via the keynote presentation.

Thanks in advance for your participation. We’re looking forward to analyzing and presenting the results. Once again, you can find the survey at http://bit.ly/451db13

Our 2013 Database survey is now live

451 Research’s 2013 Database survey is now live at http://bit.ly/451db13, investigating the current use of database technologies, including MySQL, NoSQL and NewSQL, as well as traditional relational and non-relational databases.

The aim of this survey is to identify trends in database usage, as well as changing attitudes to MySQL following its acquisition by Oracle, and the competitive dynamic between MySQL and other databases, including NoSQL and NewSQL technologies.

There are just 15 questions to answer, spread over five pages, and the entire survey should take less than ten minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due during Q2.

The full report will be available to 451 Research clients, while the results of the survey will also be made freely available via a presentation at the Percona Live MySQL Conference and Expo in April.

Last year’s results have been viewed nearly 55,000 times on SlideShare so we are hoping for a good response to this year’s survey.

One of the most interesting aspects of the 2012 survey results was the extent to which MySQL users were testing and adopting PostgreSQL. Will that trend continue or accelerate in 2013? And what of the adoption of cloud-based database services such as Amazon RDS and Google Cloud SQL?

Are the new breed of NewSQL vendors having any impact on the relational database incumbents such as Oracle, Microsoft and IBM? And how is SAP HANA adoption driving interest in other in-memory databases such as VoltDB and MemSQL?

We will also be interested to see how well NoSQL databases fare in this year’s survey results. Last year MongoDB was the most popular, followed by Apache Cassandra/DataStax and Redis. Are these now making a bigger impact on the wider market, and what of Basho’s Riak, CouchDB, Neo4j, Couchbase et al?

Additionally, we have been tracking attitudes to Oracle’s ownership of MySQL since the deal to acquire Sun was announced. Have MySQL users’ attitudes towards Oracle improved or declined in the last 12 months, and what impact will the formation of the MariaDB Foundation have on MariaDB adoption?

We’re looking forward to analyzing the results and providing answers to these and other questions. Please help us to get the most representative result set by taking part in the survey at http://bit.ly/451db13

On the rise and fall of the GNU GPL

Back in 2011 we caused something of a stir, to say the least, when we covered the trend towards permissive licensing at the expense of reciprocal copyleft licenses.

Since some people were dubious of Black Duck’s statistics, to put it mildly, we also validated our initial findings, at Bradley M Kuhn’s suggestion, using a selection of data from FLOSSmole, which confirmed the rate of decline in the proportion of projects using the GPL family of licenses between October 2008 and May 2011.

Returning to Black Duck’s figures, we later projected that if the rate of decline continued the GPL family of licenses (including the LGPL and AGPL) would account for only 50% of all open source software by September 2012.
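That “if the rate of decline continued” projection is a simple linear extrapolation from two data points. A minimal sketch follows; the 70% June 2008 figure is from Black Duck, but the mid-2011 share used here (57%) is an illustrative assumption, not a figure from the original analysis:

```python
# Linear extrapolation of a license share, assuming a constant
# per-month rate of decline between two observations.
# The 57% mid-2011 figure below is an illustrative assumption.
def extrapolate(share_start, share_end, months_between, months_ahead):
    """Project a share forward at the observed per-month rate of change."""
    rate = (share_end - share_start) / months_between
    return share_end + rate * months_ahead

# 70% in June 2008 -> assumed 57% in June 2011 (36 months later),
# projected a further 15 months ahead to September 2012.
projected = extrapolate(70.0, 57.0, 36, 15)
print(f"{projected:.1f}%")
```

With these illustrative inputs the projection comes out at roughly 51.6%.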

As 2012 draws to a close it seems like a good time to revisit that projection and check the latest statistics.

I will preface this with an admission that yes, we know these figures only provide a very limited perspective on the open source projects in question. A more rounded study would look at other aspects such as how many lines of code a project has, how often it is downloaded, its popularity in terms of number of users or developers, how often the project is being updated, how many of the developers are employed by a single vendor, and what proportion of the codebase is contributed by developers other than the core committers. Since that would involve checking all these for more than 300,000 projects I’m going to pass on that.

Additionally, while all that is true, it does not mean that there is no value in examining the proportion of projects using a certain license. I am more interested in what the data does tell us, than what it doesn’t.

Data sources:
We analysed two distinct data sources for our previous analysis: Black Duck’s license data and a selection of data collected by FLOSSmole. Specifically we chose data from Rubyforge, Freecode (fka Freshmeat), ObjectWeb and the Free Software Foundation because those were the only sets for which historical (October 2008) data was available in mid 2011. For this update we have to use FLOSSmole’s data from September 2012 since the November 2012 dataset for the Free Software Foundation is incomplete. It is not possible to get a picture of GPLv2 traction using this FLOSSmole data since the majority of projects on Freecode are labelled “GPL” with no version number. In addition, for this update we have also looked at FLOSSmole data from Google Code, comparing datasets for November 2011 and November 2012, to get a sense of the trends on a newer project hosting site.

Black Duck’s data
According to Black Duck’s data the proportion of projects using the GNU GPL family of licenses declined from 70% in June 2008 to 53.24% today. The first thing to note therefore is that the rate of decline seen a year ago did not continue, and that the GNU GPL family of licenses continues to account for more than 50% of all open source software. The rate of the decline of the GNU GPLv2 has actually accelerated over the past year, however, and its usage is now almost the same as the combination of permissive licenses (I went with MIT/Apache/BSD/Ms-PL, you can argue about that last one if you like, but I’ve got to stick with it for consistency) at around 32%.

FLOSSmole’s data
Also in the interests of consistency I should clarify that we made a slight error in our previous calculations relating to the data from FLOSSmole. When we looked at the FLOSSmole data in June 2011 we reported a decline from 70.77% in October 2008 to 59.31% in May 2011. In calculating the data for this update I identified an error: the figure for 2011 should have been 62.8%. So less of a decline, but a decline nonetheless. The figures show that despite the total number of projects increasing from 54,000 in 2011 to 57,069 in September 2012, the proportion of projects using the GNU GPL family of licenses has remained steady at 62.8%. However, the proportion of projects using permissive licenses has grown, from 10.9% in 2008 to 13.4% in 2011 and 13.7% in September 2012.

Google Code data
The data from Google Code involves a much larger data set: 237,810 projects in 2011 and 300,465 in 2012. It also presents something of a problem, since one of the choices on Google Code is dual-licensing using the Artistic License/GPL. Including these projects in the GNU GPL family count, we see that the proportion of projects hosted on Google Code using the GNU GPL family of licenses declined from 54.7% in November 2011 to 52.7% in November 2012. Interestingly, though, the proportion of projects using permissive licenses also fell, from 38% in 2011 to 37.1% today. As a side note, the use of “other open source licenses” grew from 2.0% in 2011 to 4.3% in 2012.
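One way to read the Google Code numbers: a back-of-the-envelope sketch converting the shares back into approximate project counts, using the totals quoted above, shows that the absolute number of GPL-family projects grew even as their proportion fell:

```python
# Approximate GPL-family project counts on Google Code, derived from
# the totals and percentage shares quoted in the text above.
totals = {2011: 237_810, 2012: 300_465}     # all hosted projects
gpl_share = {2011: 0.547, 2012: 0.527}      # GNU GPL family share

gpl_counts = {year: round(totals[year] * gpl_share[year]) for year in totals}
print(gpl_counts)  # absolute counts rose despite the falling share
```

So a two-point fall in share still corresponds to roughly 28,000 additional GPL-family projects, a useful reminder that proportions and absolute adoption can move in opposite directions.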

What does it all mean? You can read as much or as little into the statistics as you wish. Since I am fed up with being accused of being a shill for providing analysis of the numbers I won’t bother to do so on this occasion – you are perfectly free to figure it out for yourselves.

Here’s everything in a single chart:

CouchDB – sink or swim?


CouchDB – up a creek without a paddle? Image source: bobbyfeind on Flickr

Almost a year ago Apache CouchDB creator Damien Katz announced that he would no longer be contributing to the CouchDB document database project he had created, choosing instead to focus on the development of Couchbase Server 2.0, which united CouchDB with Membase Server.

While the abandonment of an open source project by the person that created it is by no means unprecedented it is still unusual enough to warrant a look at what has happened to CouchDB in the year that followed.

Surviving or thriving?

The first point to make is that the survival of CouchDB following Katz’s departure was never in doubt, thanks to the fact that it is an Apache Foundation project. One of the benefits of the foundation model is that it doesn’t depend on a dominant developer or vendor to keep a project moving forward.

Although it briefly appeared that Cloudant would fulfil the role of the major corporate backer of CouchDB with its BigCouch clustered CouchDB technology after Couchbase discontinued its own CouchDB distribution, the company instead refocused its attention on its CouchDB- and BigCouch-based managed service.

While developers from both Couchbase and Cloudant continue to contribute to the project, Apache CouchDB doesn’t have a lead corporate backer, nor does it need one. According to factoids gathered by Ohloh, there were 30 contributors to the Apache CouchDB project in the past 12 months, up from 18 in the prior 12 months, placing CouchDB in the top 2% of all project teams on Ohloh.

The question is not whether CouchDB is surviving, however, but whether it is thriving. That increase in contributor count would suggest so, but that’s by no means the full story. In contrast, the number of commits per month has declined in the past 12 months, representing, as Ohloh describes it, “a substantial decrease in development activity”. As the related chart illustrates, in fact, activity has pretty much flatlined since the beginning of the year.


Source: Ohloh

This should not be altogether surprising since the latest release went GA in April.

In response to a request for comment, a spokesperson on behalf of the Apache CouchDB PMC stated:

“Despite an unsettled start to the year, the CouchDB project and the surrounding community continue to grow and evolve, with the release of 1.2.0 earlier this year, and the forthcoming 1.3.0, currently being prepared for release. 1.3.0 includes, in the last year alone, over 221 commits on just the master branch, comprising 167 files changed, 5745 insertions, 2248 deletions — solid progress for a project with 22,000 lines of code total.”

Additionally, while the start of that flatline coincides with Katz’s departure from the project, it is not clear that the two are actually related. Ohloh figures indicate that Katz hadn’t actually committed code to the project since August 2010 and is only the eighth all-time most active committer to the project.

It is clear that there is still a lot of activity ongoing in the Apache CouchDB community, with the PMC citing rcouch, bigcouch, PouchDB, TouchDB frameworks for both iOS and Android, a Mac OS X binary installation, and GeoCouch.

The PMC spokesperson added:

“Structurally, the project has added both committers and grown the project management committee, and has been having regular meetings through the last 2 months to improve communication within the team, and help steer the community. A roadmap has been put together, and Ubuntu-style time-scheduled releases are planned for 2013 to keep the good oil flowing.”

However, in assessing the health of Apache CouchDB, we must look at adoption trends, as well as project activity.

Waving or drowning?

Searching mailing list archives using MarkMail indicates that there has been a decline in the number of messages to the developer, user and commits mailing lists in the past 12 months, although with increased activity on the latter since July.

Additionally, figures from Indeed.com suggest that job activity related to CouchDB saw a sharp decline in the early months of the year, although also a recovery in recent months.


couchdb job trends (source: Indeed.com)

However, that activity is perhaps best viewed in the context of a comparison with another major NoSQL project – MongoDB for instance – which reveals that CouchDB job postings have more or less levelled off since the start of the year.


couchdb and mongodb job trends (source: Indeed.com)

We have also been tracking the traction of NoSQL projects via searches of LinkedIn member profiles. The latest figures, due to be published later this week, show that mentions of CouchDB in LinkedIn member profiles grew over 139% between December 2011 and today.

That sounds good, but again must be viewed in the context of the rest of the NoSQL ecosystem. The statistics show that mentions of a selection of other major NoSQL databases grew significantly faster in the same period.

So what are we to make of all the evidence? Clearly the Apache CouchDB project will survive, and the lack of updates in 2012 is not a major concern, although the level of interest in the project is not growing as fast as it is for other NoSQL technologies. My personal gut feel is that Apache CouchDB has the potential to become the PostgreSQL of the NoSQL generation: a solid, mature project with a large community of developers and an ecosystem of associated vendors that is often over-shadowed by more commercially-oriented alternatives but has a loyal and committed user-base.

Key to this comparison bearing up under long-term scrutiny will be the ability of the Apache CouchDB project to increase and maintain its level of development so that the lines of code chart, above, better resembles that of PostgreSQL, below:

The comparison with PostgreSQL is also apt given the departure from the project of its creator. While many people do know the origins of the PostgreSQL project, given that the original project leader is one of the most famous database experts in the world, I am sure a lot of PostgreSQL users wouldn’t know or care whether the project’s creator continued to be involved. Similarly, Katz’s departure from Apache CouchDB, while undoubtedly a short-term challenge, appears not to have had a significant impact on the project’s ongoing development.

Who doesn’t love Hadoop?

I tweeted recently that I had received a query from a journalist about whether Hadoop needs to go closed source to be fit for the enterprise.

Now that the resulting report has been published we can see who was behind that suggestion, with Brian Christian, Zettaset chief technology officer, arguing that “The community serves its needs, not the needs of the enterprise.”

The report also includes some, although naturally not all, of the response I provided to this suggestion, and since the report leaves a few misconceptions unanswered I thought I’d publish my more detailed response.

Hadoop is ‘free like a puppy’
Hadoop currently requires a degree of expertise to configure, manage and operate, but that statement is true for any serious data management technology. Apache Hadoop is relatively immature compared to some other established data management technologies, particularly in areas such as high availability, security and manageability. However, the development community is well-aware of its shortcomings and advances in all areas are currently in early access and should be ready for production deployment later this year.

Hadoop does require a degree of expertise to operate, and that expertise is currently at a premium and comes at a cost. However, all the major Hadoop supporters are working to train up a larger pool of Hadoop developers and administrators. Cloudera alone has trained more than 12,000 people to use Hadoop.

Apache Hadoop is a complex combination of data management technologies and is not without its challenges, which have arguably led to some enterprises taking longer to move from development and testing to deployment than they might have initially expected. However, the Hadoop development community is clearly committed to making Hadoop more suitable for enterprise adoption.

Hadoop is ‘driven by enthusiasts’
The idea that the open source community is populated by individual developers with no concern for enterprise requirements is completely bogus. The Apache Software Foundation has a proven history of developing enterprise-grade software projects through a collaborative development process that combines vendors, users and other interested parties.

The biggest contributors to Apache Hadoop include vendors such as Hortonworks, Cloudera, MapR and IBM, all of which have a vested interest in driving greater enterprise adoption, as well as users such as Yahoo, Facebook and eBay, all of which stand to gain from its improved capabilities.

On a broader note, open source development in general has a proven track record of producing enterprise-grade software. You only have to look at the success of Linux to see how rapidly open source software can be adopted by enterprises once it reaches a suitable level of maturity and has the support of commercial vendors. Hadoop is no exception, and is likely to follow in the footsteps of Linux as it matures.

Additionally, we see the open source nature of Hadoop as one of the adoption drivers – as users know that they can avoid vendor lock-in and have a choice of providers for their Hadoop training, support and services.

Hadoop may need to be ‘taken out of open source’
There is no reason to believe that a closed source Hadoop would deliver any functionality that could not be developed by the Apache Hadoop community. While a number of vendors offer closed source alternatives for individual components in the Hadoop stack, anyone offering a fully closed source alternative would suffer by not being able to compete with the collaborative development process and competitive commercial ecosystem that the open source development process enables.

In addition it is worth noting that Hadoop, along with other distributed data management projects including many of the NoSQL databases, was initiated by organizations like Google, Amazon and Yahoo in response to the inability of the established data management vendors to fulfil their data management requirements.

The established closed source data management vendors have had plenty of time to develop a ‘better’ Hadoop than Hadoop, and do not lack development resources, but have chosen to collaborate with Hadoop distributors and contribute to Hadoop instead.

A prime example is Microsoft, which in late 2011 abandoned its own Dryad distributed computing project in favour of contributing to Apache Hadoop. This is a sign that Hadoop has already won enough attention to make it difficult for any competing product to gain traction.

While we see vendors offering closed source alternatives for individual components in the Hadoop stack we do not believe that a full closed source alternative would be viable, or desirable from a customer’s perspective. There is no reason to believe that enterprise-grade improvements to Hadoop cannot be delivered by the Apache Hadoop community and the open source development process.

MySQL vs. NoSQL and NewSQL – survey results

Back in January we launched a survey of database users to explore the competitive dynamic between MySQL, NoSQL and NewSQL databases, and to discover if MySQL usage is really declining – as had been indicated by the results of a prior survey.

The publication of the associated report took longer than expected, mostly because we expanded its scope to include revenue and growth estimates for the MySQL ecosystem, NoSQL and NewSQL sectors respectively, and with that report now published I am pleased to fulfil our promise to share the survey results.

We seem to be having some random embedding issues so for now the results can be found on SlideShare, adapted from the presentation given at OSBC earlier this week. For greater context, we have also included an explanation of each slide, below:

Slide 2: Provides an overview of the associated report – MySQL vs NoSQL and NewSQL 2011:2015, which is available here.

Slide 3: Explains why we launched the report. We once described MySQL as the crown jewel of the open source database world, since its focus on Web-based applications, its lightweight architecture and fast-read capabilities, and its brand differentiated it from all of the established database vendors and made for a potentially complementary acquisition. Today, the competitive situation is very different.

Slide 4: Oracle’s MySQL business faces competition from the rest of the MySQL ecosystem, as illustrated in Slide 5, much of which has emerged following Oracle’s acquisition of Sun/MySQL.

Slide 6: The emergence of these alternatives was triggered, in part, by concern about the future of MySQL. A previous 451 survey, conducted in November 2009, showed that there was real concern about the acquisition, with only 17% of MySQL users believing Oracle should be allowed to acquire MySQL.

Slide 7: The 2009 survey also showed that while 82.1% of respondents were already using MySQL, that figure was expected to drop to 72.3% by 2014. That survey was conducted amid a climate of fear, uncertainty and doubt regarding the future of MySQL, and one of the drivers for our current report was to see if that predicted decline occurred.

Slide 8: To put this in context, we asked the current survey sample (which included 205 database users) about their reaction to the acquisition. While the vast majority of MySQL users reported that they continued to use MySQL where appropriate, 5% indicated that they were more inclined to use MySQL, and 26% said they were less inclined to use MySQL. Not surprisingly the proportion of users less inclined to use MySQL was much higher amongst those abandoning MySQL than those sticking with MySQL.

Slide 9: We also asked respondents to rate Oracle’s ownership of MySQL on a range of very good to very bad. Overall, the balance tipped in favour of a negative perception of Oracle’s track record, while there was naturally a more negative perception of Oracle amongst those abandoning MySQL compared to MySQL mainstays. However, the results showed that the percentage of respondents rating the company’s performance ‘very good’ and ‘very bad’ was actually quite similar for both abandoners and mainstays. While those abandoning MySQL are more likely to have a negative perception of Oracle, it is not necessarily safe to assume that Oracle’s actions and strategy are the cause of the abandonment. Clearly there are other competitive forces at work.

Slide 10: Not least the emergence of NoSQL, as illustrated in Slide 11, and NewSQL, as illustrated in Slide 12.

Slide 13: Based on some very high profile examples of projects migrating from MySQL to NoSQL, there is a common assumption that NoSQL and NewSQL pose a direct, immediate threat to MySQL. We believe the competitive dynamic is more complex.

Slide 14: While 49% of those survey respondents abandoning MySQL planned on retaining or adopting NoSQL databases, only 12.7% said they had actually deployed NoSQL databases as a *direct replacement* for MySQL.

Slide 15: In comparison, there is much greater overlap between NewSQL and MySQL, but of a complementary nature. 33% of respondents retaining MySQL had considered, tested or deployed NewSQL database technologies, while approximately 75% of the NewSQL revenue for 2011 is from vendors that we also consider part of the MySQL ecosystem.

Slide 16: The results of our 2012 survey show that MySQL is currently the most popular database amongst our survey sample, used by 80.5% of respondents today.

Slide 17: However, its popularity is again expected to decline by 2014 and 2017. This indicates an accelerated decline in the use of MySQL compared to the findings of our 2009 survey. While that survey was conducted amid a climate of fear, uncertainty and doubt regarding the future of MySQL, we are not aware of any specific reason why the 2012 sample, which was self-selecting, should have a disproportionately negative attitude to MySQL or Oracle.

Slide 18: MySQL’s predicted decline of 26.4 percentage points between 2012 and 2017 compares to a predicted decline of just 9.3 percentage points for Microsoft SQL Server, and only 5.9 percentage points for Oracle Database. In comparison, MariaDB, Apache Cassandra and Apache CouchDB are expected to increase in usage by 3.0 percentage points or greater between 2011 and 2017.

Slide 19: Although alternative MySQL distributions including MariaDB, Drizzle and Percona Server are expected to see increased adoption over the next five years, they are not growing at the same rate that MySQL is declining.

Slide 20: So where are those abandoning MySQL going? Looking specifically at the 55 MySQL users who expect to abandon it by 2017 (which is admittedly a small sample, and therefore not to be considered statistically significant) we see that PostgreSQL is the most popular database being retained or adopted over the same period, followed by Microsoft SQL Server, Oracle, MongoDB, and MariaDB.

Slide 21: This only tells part of the story, however. Just because a company is retaining Oracle Database, for example, does not necessarily mean that Oracle Database is being used as a replacement for the abandoned MySQL. We therefore also specifically asked survey respondents which databases they had considered, tested or deployed as a direct replacement for MySQL. The response from the 55 respondents planning to abandon MySQL again saw PostgreSQL, MariaDB and MongoDB as the most popular answers, followed by Apache CouchDB and Apache HBase.

Slide 22: While NoSQL databases were well-represented in this list, we saw that anyone considering NoSQL considered multiple NoSQL databases. Per respondent, NoSQL databases were the least considered of all alternatives by existing MySQL users.

Slide 23: The survey results suggest that MongoDB is the most often considered, tested or deployed as a replacement or complement for MySQL, followed by Apache CouchDB, Apache HBase, Apache Cassandra/DataStax, and Redis.

Slide 24: NewSQL technologies that improve the scalability and performance of MySQL scored well, with eight of the top 10 most considered NewSQL technologies directly complementing MySQL. Of the other two, one (Drizzle) is a derivative of MySQL, and the other (Clustrix) can also be used in a complementary manner as part of a MySQL cluster, although in the long term it is positioned as a direct alternative.

Slide 25: MariaDB is the member of the MySQL ecosystem most often considered, tested or deployed as a replacement or complement for MySQL, followed by Continuent Tungsten, Percona Server, MySQL Cluster, and Amazon RDS.

Slide 26: More than half of all MySQL users had considered, tested or deployed another relational database as a direct replacement, while over 40% had considered, tested or deployed a caching technology to complement MySQL. The memcached caching technology was the most widely-deployed of all the technologies we asked about, followed closely by PostgreSQL, which supported anecdotal evidence that a number of MySQL users are migrating to the other major open source transactional database.

Slide 27: For the record, the survey had 205 respondents. Primary job roles among respondents included: director/manager of IT infrastructure (18.0%); architect/engineer (17.6%); developer/programmer (15.6%); database/systems administrator (14.6%); consultant (14.1%); VP level or above (13.7%); analyst (3.4%); and line-of-business manager (2.9%).

Further survey analysis and perspective on the competitive dynamic between MySQL, NoSQL and NewSQL is available in the MySQL vs NoSQL and NewSQL report, which also includes market sizing and growth predictions for the three segments.

451 Research delivers market sizing estimates for NoSQL, NewSQL and MySQL ecosystem

NoSQL and NewSQL database technologies pose a long-term competitive threat to MySQL’s position as the default database for Web applications, according to a new report published by 451 Research.

The report, MySQL vs. NoSQL and NewSQL: 2011-2015, examines the competitive dynamic between MySQL and the emerging NoSQL non-relational, and NewSQL relational database technologies.

It concludes that while the current impact of NoSQL and NewSQL database technologies on MySQL is minimal, they pose a long-term competitive threat due to their adoption for new development projects. The report includes market sizing and growth estimates, with the key findings as follows:

• NoSQL software vendors generated revenue* of $20m in 2011. NoSQL software revenue is expected to rapidly grow at a CAGR of 82% to reach $215m by 2015.

• NewSQL software vendors generated revenue* of $12m in 2011 (of which $9m is also considered MySQL ecosystem revenue). NewSQL revenue is also expected to grow rapidly at a CAGR of 75% to reach $112m by 2015 (including $56m in MySQL ecosystem revenue).

• The MySQL support ecosystem generated revenue* of $171m in 2011 (including $9m from NewSQL technologies). MySQL ecosystem revenue is expected to grow at a CAGR of 40% to reach $664m by 2015 (including $56m in NewSQL revenue).
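As a quick sanity check, the growth rates quoted above can be recovered from the 2011 and 2015 figures. A minimal sketch: the implied CAGRs come out at roughly 81%, 75% and 40%, within a point of the quoted estimates, with the small differences down to rounding in the underlying numbers:

```python
# Implied compound annual growth rate (CAGR) over four years,
# from the 2011 and 2015 revenue estimates quoted above ($m).
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

estimates = {
    "NoSQL": (20, 215),
    "NewSQL": (12, 112),
    "MySQL ecosystem": (171, 664),
}
for sector, (start, end) in estimates.items():
    print(f"{sector}: {cagr(start, end, 4):.0%}")
```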

“The MySQL ecosystem is now arguably more healthy and vibrant than it has ever been, with a strong vendor committed to the core product, and a wealth of alternative and complementary products and services on offer to maintain competitive pressure on Oracle,” commented report author Matthew Aslett, research manager, data management and analytics, 451 Research.

“However, the options for MySQL users have never been greater, and there is a significant element of the MySQL user base that is ready and willing to look elsewhere for alternatives.”

As well as revenue and growth estimates, the report also includes a survey of over 200 database administrators, developers, engineers and managers. The survey findings include:

• While the majority of MySQL users continue to use MySQL where appropriate, the use of MySQL is expected to decline from 80.5% of survey respondents today to 62.4% by 2014 and just 54.1% by 2017.

• Despite the emergence of NoSQL and NewSQL database products, the most common direct replacement for MySQL among survey respondents today is PostgreSQL, which is also the focus of a recent burst of commercial activity.

• While 49% of those survey respondents abandoning MySQL planned on retaining or adopting NoSQL databases, only 12.7% of MySQL abandoners said they had actually deployed NoSQL databases as a direct replacement for MySQL.

“While there have been some high-profile examples of users migrating from MySQL to NoSQL databases, the huge size of the MySQL installed base means that these projects are comparatively rare,” commented Aslett.

The report describes how NoSQL database technologies are largely being adopted for new projects that require additional scalability, performance, relaxed consistency and agility, while NewSQL database technologies are, at this stage, largely being adopted to improve the performance and scalability of existing databases, particularly MySQL.

“NoSQL and NewSQL have not made a significant impact on the MySQL installed base at this stage but MySQL is no longer the de facto standard for new application development projects,” said Aslett. “As a result, NoSQL and NewSQL pose a significant long-term competitive threat to MySQL’s dominance.”

MySQL vs. NoSQL and NewSQL: 2011-2015 is now available to existing 451 Research subscribers. Non-clients can apply for trial access to 451 Research’s content.

*451 Research’s analysis of MySQL, NoSQL and NewSQL revenue is based on a bottom-up analysis of each participating vendor’s current revenue and growth expectations, and includes software license and subscription support revenue only. Revenue line items not included in these figures include hardware associated with the delivery of these services, revenue related to applications deployed on these databases, traditional hosting services, or systems integration performed by the vendors or other third parties.

The revenue estimates do not take into account unpaid usage of open source licensed MySQL, NoSQL and NewSQL software, and therefore represent only a fraction of the total addressable market. Based on the above revenue figures and other analysis, 451 Research estimates that the total value of the MySQL ecosystem in terms of ‘displaced’ proprietary software might equate to $1.7bn in 2011, while the NoSQL market had a displaced value of $195.7m and the NewSQL sector a displaced value of $99.4m.

Back to the future of commercial open source

It’s been tempting to write a post about open source licensing trends and how they relate to commercial business strategies, given ongoing interest in our previous posts about the relative decline of the GPL.

Every time I start to write a post though I realise that I’d just be repeating myself, most notably The future of commercial open source business strategies from December 2011, but also Control and Community – and the future of commercial open source strategies from late 2010.

You can trace the origins of the theories and research in those posts back to The golden age of open source? in August 2010, and even further to Commercial open source business strategies in 2009 and beyond from early 2009.

That post in particular contains the core elements about why we believed we were at a tipping point with regards to commercial open source strategies, prompting the shift from vendor-led strategies that emphasised control via copyleft licenses, to community-led strategies that emphasised collaboration via permissive licenses.

The one aspect that those posts didn’t cover is what happens after this shift. That is a question that has recently been addressed by Simon Phipps, who predicts that the pendulum will swing to the centre, towards weak-copyleft licenses and specifically the recently released MPLv2.

While I don’t dispute the logic of that prediction, I can see nothing in the data that we have previously collected and analysed that indicates a shift to weak-copyleft. As you can see, while there was a strong shift from vendors towards non-copyleft licenses from 2007 onwards, we have seen no such shift with regards to weak-copyleft.

Which is not to say that it won’t happen – just that we see no evidence of it right now, and that we would have to see an enormous swing towards weak-copyleft licenses in the next couple of years for the prediction to hold. It will be interesting to see whether the release of MPLv2 will be the event that triggers that swing.

Announcing the Sixth Annual Future of Open Source Survey

Black Duck Software and North Bridge Venture Partners, in partnership with 451 Research, yesterday announced a collaboration to conduct the sixth annual Future of Open Source Survey.

The survey, an annual bellwether of the state of the open source industry, is supported by more than 20 open source software (OSS) industry leaders and is open to participation from the entire open source community.

The survey results point out market opportunities, identify issues affecting the enterprise adoption of open source, and foreshadow industry trends for 2012 and beyond. Open to the general public today, the survey closes at the end of April.

Survey results will be presented at the Open Source Business Conference (OSBC, May 20 – 21, 2012) at the Hyatt Regency San Francisco – Embarcadero during the keynote panel on opening day. Moderating the panel will be Tim Yeaton, CEO and President, Black Duck Software and Michael Skok, general partner at North Bridge Venture Partners. Yeaton and Skok will be joined by several industry executives including Tom Erickson, CEO, Acquia.

Take the survey here: http://www.zoomerang.com/Survey/WEB22F4B845DQ5

See results of last year’s survey here.

That’s not science: the FSF’s analysis of GPL usage

The Free Software Foundation has responded to our analysis of figures that indicate that the proportion of open source projects using the GPL is in decline.

Specifically, FSF executive director John Sullivan gave a presentation at FOSDEM which asked “Is copyleft being framed?”. You can find his slides here, a write-up about the presentation here, and Slashdot discussion here.

Most of the opposition to the earlier posts on this subject addressed perceived problems with the underlying data, specifically that it comes from Black Duck, which does not publish details of its methodology. John’s response is no exception. “That’s not science,” he asserts, with regards to the lack of clarity.

This is a valid criticism, which is why – prompted by Bradley M Kuhn – I previously went to a lot of effort to analyze data from Rubyforge, Freshmeat, ObjectWeb and the Free Software Foundation collected and published by FLOSSmole, only to find that it confirmed the trend suggested by Black Duck’s figures. I was personally therefore happy to use Black Duck’s figures for our update.

John Sullivan is not overly impressed with the FLOSSmole numbers either, noting that while they are verifiable, they do leave a number of questions related to the breadth and depth of the sample, the relative activity of the projects, whether all lines of code and applications should be treated equally, and how packages with multiple licenses are treated.

These are all also valid questions. As we previously noted, a study that *might* satisfy all questions related to license usage would have to take into account how many lines of code a project has; how often it is downloaded; its popularity in terms of number of users or developers; how often the project is being updated; how many of the developers are employed by a single vendor; and what proportion of the codebase is contributed by developers other than the core committers.

John offers some evidence of his own that suggests that the use of the GPL is in fact growing. Anyone hoping for the all-encompassing study mentioned above is in for some disappointment, however. It is based on a script-based analysis of the Debian GNU/Linux distribution codebase.

Nothing wrong with the script-based analysis – but can a single GNU/Linux distribution be considered a representative sample of all free and open source software?

That’s not science.

Last chance to take part in our MySQL/NoSQL/NewSQL survey

Thanks to everyone who has already taken part in our survey exploring changing attitudes to MySQL following its acquisition by Oracle and examining the competitive dynamic between MySQL and other database technologies, including NoSQL and NewSQL.

The response has been great and even a quick look at the results makes for interesting reading, particularly in the light of our previous findings which indicated declining MySQL usage.

I am really looking forward to diving deep into the results and breaking out the figures to get a better understanding of the potential impact of alternative MySQL distribution and support providers, as well as NoSQL and NewSQL, on continued usage of MySQL.

The survey results will be made freely available on our blogs, as well as being included in a long format report containing our additional analysis and research related to the MySQL ecosystem and competitive dynamic.

Right now, however, is your last chance to contribute to the survey and get your voice heard. There are just 12 questions to answer, spread over four pages, and the entire survey should take no longer than five minutes to complete. All individual responses are of course confidential.

The survey will close in 24 hours.

Is MySQL usage really declining?

If you’re a MySQL user, tell us about your adoption plans by taking our current survey.

Back in late 2009, at the height of the concern about Oracle’s imminent acquisition of Sun Microsystems and MySQL, 451 Research conducted a survey of open source software users to assess their database usage and attitudes towards Oracle.

The results provided an interesting snapshot of the potential implications of the acquisition and the concerns of MySQL users and even, so I am told, became part of the European Commission’s hearing into the proposed acquisition (used by both sides, apparently, which says something about both our independence and the malleability of data).

One of the most interesting aspects concerned the apparently imminent decline in the usage of MySQL. Of the 285 MySQL users in our 2009 survey, only 90.2% still expected to be using it two years later, and only 81.8% in 2014.

Other non-MySQL users expected to adopt the open source database after 2009, but the overall prediction was decline. While 82.1% of our sample of 347 open source users were using MySQL in 2009, only 78.7% expected to be using it in 2011, declining to 72.3% in 2014.

This represented an interesting snapshot of sentiment towards MySQL, but the result also had to be taken with a pinch of salt given the significant level of concern regarding MySQL’s future at the time the survey was conducted.

The survey also showed that only 17% of MySQL users thought that Oracle should be allowed to keep MySQL, while 14% of MySQL users were less likely to use MySQL if Oracle completed the acquisition.

That is why we are asking similar questions again, in our recently launched MySQL/NoSQL/NewSQL survey.

More than two years later Oracle has demonstrated that it did not have nefarious plans for MySQL. While its stewardship has not been without controversial moments, Oracle has also invested in the MySQL development process and improved the performance of the core product significantly. There are undoubtedly users that have turned away from MySQL because of Oracle but we also hear of others that have adopted the open source database specifically because of Oracle’s backing.

That is why we are now asking MySQL users to again tell us about their database usage, as well as attitudes to MySQL following its acquisition by Oracle. Since the database landscape has changed considerably since late 2009, we are now also asking about NoSQL and NewSQL adoption plans.

Is MySQL usage really in decline, or was the dip suggested by our 2009 survey the result of a frenzy of uncertainty and doubt given the imminent acquisition? Will our current survey confirm or contradict that result? If you’re a MySQL user, tell us about your adoption plans by taking our current survey.

451 Research MySQL/NoSQL/NewSQL survey

I’ve just launched a new survey that should be of interest if you are currently using or actively considering MySQL or any of the NoSQL or NewSQL offerings.

The aim of the survey is threefold:

– identify trends in database usage over time
– explore changing attitudes to MySQL following its acquisition by Oracle
– examine the competitive dynamic between MySQL and other database technologies, including NoSQL and NewSQL

There are just 12 questions to answer, spread over four pages, and the entire survey should take no longer than five minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due at the end of Q1. Thanks in advance for your participation.

The survey can be found at: http://www.surveymonkey.com/s/MySQLNoSQLNewSQL

451 CAOS Links 2011.12.20

Red Hat revenue hits $290m. New CEOs for Cloudant and Lucid Imagination. And more.

# Red Hat announced Q3 revenue of $290m, up 23%, and net income of $38.2m, compared with $26.0m a year ago.

# Cloudant raised $2.1m in equity and stock funding and named Derek Schoettle as its new chief executive officer.

# The Apache Software Foundation published an open letter explaining the progress of Apache OpenOffice (Incubating) and reinforcing its position on trademarks and fundraising.

# Lucid Imagination named Paul Doscher CEO.

# The founder of the ownCloud project, Frank Karlitschek, formed a commercial entity, ownCloud Inc, with former SUSE/Novell executive Markus Rex.

# Adobe published the proposal for Flex to become an Apache Incubator project.

# Actuate launched BIRT Performance Analytics.

# Uhuru Software introduced Uhuru .NET Services for Cloud Foundry.

# Palantir released its first open source code with the launch of two projects: Cinch and Sysmon.

# Quest Software introduced Quest One Privilege Manager for Sudo.

# CollabNet announced that Git is now available as a hosted offering on its Codesion cloud development platform.

# The Outercurve Foundation published the results of a survey of software developers about their open source coding habits.

# Basho Technologies introduced an early version of Riaknostic, a diagnostic system for Riak.

The future of commercial open source business strategies

The reason we are confident that the comparative decline in the use of the GNU GPL family of licenses and the increasing significance of complementary vendors in relation to funding for open source software-related vendors will continue is due to the analysis of our database of more than 400 open source software-related vendors, past and present.

We previously used the database to analyze the engagement of vendors with open source projects for our Control and Community report, plotting the strategies used by the vendors against the year in which they first began to engage with open source projects to get an approximate view of open source-related strategy changes over time.

For example, we found that the engagement of vendors with projects that used strong copyleft licenses peaked in 2006, while the engagement of vendors with projects using non-copyleft licenses had been rising steadily since 2002.

Analysis of our updated database shows that the number of new vendors engaging with open source projects in each year has risen steadily in recent years, from 26 in 2008 to 44 in 2011. However, as noted last week, we have also seen a shift towards ‘complementary vendors’ – those that are dependent on open source software to build their products and services, even though those products and services may not themselves be open source.

2010 was the first year in which we saw more complementary vendors engage with open source projects than open source specialists, and that trend accelerated in 2011.

As previously explained, complementary vendors were responsible for over 30% of open source software-related funding raised in 2011, and we should expect that proportion to remain high given that over 57% of the vendors engaging with open source in 2011 were complementary vendors.

We have also seen that complementary vendors are more likely to engage with projects with non-copyleft licenses (38% of complementary vendors have engaged with projects with non-copyleft licenses, compared to 24% that have engaged with projects with strong copyleft licenses).

If we look at all 400+ vendors in our database in terms of open source software license preference, the trend towards new vendors engaging with non-copyleft licenses is clear.

There has been a strong shift from vendors towards non-copyleft licenses in recent years, accelerated in 2011 by the likes of Apache Hadoop and OpenStack in particular. This does not mean that the number of projects using strong copyleft licensing has decreased (although as we previously saw the proportion of projects using the GPL family of licenses has declined).

It is indicative, we believe, of the shift away from specialist open source vendors using vendor-led projects and strong copyleft licenses towards multi-vendor collaborative projects and proprietary implementations of open source code, however.

This trend should not really surprise anyone. For some time we have seen open source becoming part of the fabric of modern software development and licensing strategies, rather than a competitive differentiator. Back in 2009 we predicted the increased importance of business strategies that relied on vendor-led development communities, rather than projects dominated by a single vendor.

We called this “open source 4.0” and later suggested that it might be considered the golden age of open source, based on our belief that vendors had learned that they stand to gain more from collaborating on open source projects and differentiating at another stage in the software stack than they do from attempting to control open source projects.

Updating the results of our analysis to the end of 2011 and 400+ vendors indicates that, from the perspective of the commercial adoption of open source business strategies at least, we were not far off.

Some might not consider the proliferation of multi-vendor open source communities and proprietary distributions of open source software as the peak of achievement for open source. Each is of course entitled to come to their own conclusions about the implications.

Our perspective, as always, is that open source methodologies present a potentially disruptive, and also valuable, asset that complements the way both vendors and enterprise IT organizations conduct their businesses.

Our analysis indicates, however, that open source methodologies are increasingly being employed by ‘complementary vendors’ with a leaning towards more permissive licensing.

Our Total Data report is now totally available

…and it’s totally awesome. For more details of our Total Data report, and how to get it, see our Too Much Information blog.

VC funding for OSS hits new high. Or does it?

One of the favourite blog topics on CAOS Theory blog over the years has been our quarterly and annual updates on venture capital funding for open source-related businesses, based on our database of over 600 funding deals since January 1997 involving nearly 250 companies, and over $4.8bn.

There are still a few days left for funding deals to be announced in 2011 but it is already clear that 2011 will be a record year. $672.8m has been invested in open source-related vendors in 2011, according to our preliminary figures, an increase of over 48% on 2010, and the highest total amount invested in any year, beating the previous best of $623.6m, raised in 2006.

Following the largest-ever single quarter of funding for open source-related vendors in Q3, Q4 was the second largest, as $230.4m was invested in companies including Cloudera, Hortonworks, and Rapid7.

As with Q3, however, the list of vendors presents us with something of an existential dilemma, as we see an increasing amount of activity by what we have referred to as ‘complementary vendors’ – those that are dependent on open source software to build their products and services, even though those products and services may not themselves be open source – as opposed to open source specialists.

The list of complementary vendors has grown rapidly in 2011, particularly around projects such as OpenStack and Apache Hadoop. If we examine the figures in more detail we find that over 30% of the funding raised in 2011 was raised by complementary vendors, compared to just 4% in 2006.

In fact, as the chart below indicates, VC funding for specialist open source vendors in 2011 was actually less than that in 2006 and 2008, and only marginally up on 2010, when again just 4% of funding went to complementary vendors.

The low level of funding for complementary vendors in 2010 shows that their significance is not growing at a constant rate. However, for reasons that will become clear when we publish a follow-up post on the latest trends regarding the engagement of vendors with open source projects, we expect the proportion of funding related to complementary vendors to increase in the future, rather than decline.

This has implications for the ongoing trends related to open source software licensing, as covered yesterday. Examining our database of over 400 open source-related vendors – funded and unfunded, complementary and specialist – indicates that specialist vendors are much more likely to engage with projects using strong copyleft licenses than complementary vendors.

Specifically, our data indicates that 55% of open source specialists have engaged with projects that use strong copyleft licenses, while just 20% have engaged with projects with non-copyleft licenses. In comparison, 38% of complementary vendors have engaged with projects with non-copyleft licenses, compared to 24% that have engaged with projects with strong copyleft licenses.

We will take a more detailed look at the trends related to the engagement of vendors with open source projects in the concluding part of this series of posts.

On the continuing decline of the GPL

Our most popular CAOS blog post of the year, by some margin, was this one, from early June, looking at the trend towards permissive licensing, and the decline in the usage of the GNU GPL family of licenses.

Prompted by this post by Bruce Byfield, I thought it might be interesting to bring that post up to date with a look at the latest figures.

NB: I am relying on the current set of figures published by Black Duck Software for this post, combined with our previous posts on the topic. I am aware that some people are distrustful of Black Duck’s figures given the lack of transparency on the methodology for collecting them. Since I previously went to a lot of effort to analyze data collected and published by FLOSSmole to find that it confirmed the trend suggested by Black Duck’s figures, I am confident that the trends are an accurate reflection of the situation.

The figures indicate that not only has the usage of the GNU GPL family of licenses (GPL2+3, LGPL2+3, AGPL) continued to decline since June, but that the decline has accelerated. The GPL family now accounts for about 57% of all open source software, compared to 61% in June.

As you can see from the chart below, if the current rate of decline continues, we project that the GPL family of licenses will account for only 50% of all open source software by September 2012.

That is still a significant proportion of course, but would be down from 70% in June 2008. Our projection also suggests that permissive licenses (specifically in this case, MIT/Apache/BSD/Ms-PL) will account for close to 30% of all open source software by September 2012, up from 15% in June 2009 (we don’t have a figure for June 2008 unfortunately).
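The projection itself amounts to a straight-line extrapolation of the rate of decline between June and December 2011. A minimal sketch using the percentages quoted above (the chart’s exact method is assumed here, not published):

```python
# GPL-family share of all open source projects (Black Duck figures)
june_2011, dec_2011 = 61.0, 57.0               # percent
monthly_decline = (june_2011 - dec_2011) / 6   # ~0.67 points per month

# Extrapolate 9 months forward, December 2011 -> September 2012
projected_sept_2012 = dec_2011 - 9 * monthly_decline
print(f"{projected_sept_2012:.0f}%")           # ~51%, in line with the ~50% projection
```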

Of course, there is no guarantee that the current rate of decline will continue – as the chart indicates the rate of decline slowed between June 2009 and June 2011, and it may well do so again. Or it could accelerate further.

Interestingly, however, while the more rapid rate of decline prior to June 2009 was clearly driven by the declining use of the GPLv2 in particular, Black Duck’s data suggests that the usage of the GPL family declined at a faster rate between June 2011 and December 2011 (6.7%) than the usage of the GPLv2 specifically (6.2%).

UPDATE – It has been rightly noted that this decline relates to the proportion of all open source software, while the number of projects using the GPL family has increased in real terms. Using Black Duck’s figures we can calculate that in fact the number of projects using the GPL family of licenses grew 15% between June 2009 and December 2011, from 105,822 to 121,928. However, in the same time period the total number of open source projects grew 31% in real terms, while the number of projects using permissive licenses grew 117%. – UPDATE
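A few lines of arithmetic check the update’s figures, and illustrate how an absolute increase can coexist with a shrinking share (the 31% total-growth figure is taken from the text above):

```python
gpl_2009, gpl_2011 = 105_822, 121_928
gpl_growth = gpl_2011 / gpl_2009 - 1
print(f"GPL-family projects grew {gpl_growth:.0%}")        # 15%

# If the total project count grew 31% over the same period, the GPL
# family's *share* still falls even as its absolute count rises:
total_growth = 0.31
share_ratio = (1 + gpl_growth) / (1 + total_growth)
print(f"GPL share shrinks to {share_ratio:.0%} of its 2009 level")
```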

As indicated in June, we believe there are some wider trends that need to be discussed in relation to license usage, particularly with regards to vendor engagement with open source projects and a decline in the number of vendors engaging with strong copyleft licensed software.

The analysis indicated that the previous dominance of strong copyleft licenses was achieved and maintained to a significant degree due to vendor-led open source projects, and that the ongoing shift away from projects controlled by a single vendor toward community projects was in part driving a shift towards more permissive non-copyleft licenses.

We will update this analysis over the next few days with a look at the latest trends regarding the engagement of vendors with open source projects, and venture funding for open source-related vendors, providing some additional context for the trends related to licensing.

451 CAOS Links 2011.12.14

Jive goes public. webOS goes open source. Cloud Foundry goes .NET. And more.

# Jive Software started IPO at $12 a share, closing the day up nearly 30%.

# HP announced that it plans to release webOS under an open source license. Details are thin on the ground, although Fedora is reportedly an inspiration. Joel West’s post pretty much summed up my thoughts.

# Tier 3 announced that it has created Iron Foundry, an open source .NET Framework implementation of Cloud Foundry.

# Xeround raised $9m funding for its MySQL-as-a-service cloud database.

# Microsoft released the Windows Azure SDK for Node.js as open source and made available a preview of the Apache Hadoop on Windows Azure, amongst a slew of other open source-related announcements.

# Red Hat, Canonical, Cisco, IBM, Intel, NetApp, and SUSE created the oVirt project, based around Red Hat’s Enterprise Virtualization technology for managing KVM environments.

# Nuxeo announced the availability of Nuxeo Platform 5.5.

# Joyent launched its SmartMachine Appliance for MongoDB.

# Red Hat announced JBoss Enterprise Portal Platform 5.2 and JBoss Operations Network 3.0.

# Novell announced the availability of Novell Open Enterprise Server 11.

# Couchbase claimed thousands of open source deployments and 150 commercial deployments, but has rethought its product line-up for 2012, having “confused the heck” out of potential users in 2011.

# Univention released Univention Corporate Server 3.0.

# SuccessBricks announced that its ClearDB distributed MySQL-based database service is now available through Heroku.

# Ember.js is the new name for the SproutCore 2.0 JavaScript framework.

# Henrik Ingo examined the recent spate of MySQL authentication plug-ins.