451 CAOS Theory *
A blog for the enterprise open source community

How soon is now? Corporate contributions and open source innovation in the context of NoSQL

April 26, 2010 @ 10:54 am ET

In my role as part of The 451 Group’s Information Management practice I have recently initiated coverage on the various “NoSQL” databases, which are providing a fresh challenge to conventional relational databases (clients can get a good introduction to our coverage here, while non-clients can also see some of my thinking aloud over at our Too Much Information blog).

The rise of the NoSQL movement is also highly relevant to open source software, however, especially in relation to two key issues.

1/ The (lack of) corporate user contributions
2/ Open source as a source of innovation (as opposed to disruption)

NoSQL is very much a user-led phenomenon and has occurred as the likes of Google, Amazon, Facebook, LinkedIn and Twitter have created their own distributed data management technologies to overcome the fact that traditional database products were not able to match their performance and scalability requirements.

Not all NoSQL databases are the product of companies that we would traditionally think of as users rather than developers, and not all NoSQL databases are open source, but there are a large number of projects that fulfill both criteria, such as Apache Cassandra (which originated at Facebook), Apache HBase (Yahoo), Hypertable (Zvents), Voldemort (LinkedIn) and FlockDB (Twitter).

Meanwhile there are a number of vendors and projects focused on adding persistence, replication, index and query capabilities to memcached, which was originally created by Danga Interactive to solve its database scalability issues.

This is also (mostly) not a matter of businesses creating projects in house and then simply throwing the code over the wall. At last week’s NoSQL EU event in London, Twitter’s analytics lead, Kevin Weil, discussed how Twitter is working with Digg to create real-time analytics for Cassandra. Kevin also recently tweeted (naturally enough) about Hadoop-LZO, a project to bring splittable LZO compression to Hadoop, on which Twitter is collaborating with Cloudera and Facebook.

There are plenty of other examples of contributions being made by Twitter, Facebook, Digg and LinkedIn on their own open source pages, but in many ways the biggest thing here is not the individual contributions but the commitment to the overall culture of contribution and collaboration.

It is often said that open source developers begin by scratching their own itch, and that is most definitely true when we look at the motivations behind the creation of projects by the companies above, but there is also a culture and clear understanding that there is much to gain from collaboration.

The NoSQL technologies also undermine the suggestion that while open source can be used to commoditize established markets it is not good at innovation. While the likes of Cassandra and Voldemort – not to mention Neo4j, Redis, CouchDB, Riak and MongoDB – are undoubtedly operating within a larger established market, the longer we look at NoSQL the clearer it is that, far from commoditizing an established market, these technologies are being used to innovate beyond the realms of the established relational database and establish new database market segments.


2 Responses to “How soon is now? Corporate contributions and open source innovation in the context of NoSQL”

  1. I think of our work on the CouchDB project as more akin to the teams that developed the original web-browser and http servers. The rise of the web is another example of open-source that was innovative and focussed on creating new markets, not commoditizing old ones. CouchDB (or something inspired by CouchDB) will fundamentally reshape the way people interact online, over the next decade, or at least that is our hope.

  2. [...] to me – not for the first time – that these issues might be related. I previously noted that in the data management space we are seeing the Apache Hadoop ecosystem the various NoSQL [...]