Who doesn’t love Hadoop?

I tweeted recently that I had received a query from a journalist about whether Hadoop needs to go closed source to be fit for the enterprise.

Now that the resulting report has been published we can see who was behind that suggestion, with Brian Christian, Zettaset chief technology officer, arguing that “The community serves its needs, not the needs of the enterprise.”

The report also includes some, although naturally not all, of the response I provided to this suggestion, and since the report leaves a few misconceptions unanswered I thought I’d publish my more detailed response.

Hadoop is ‘free like a puppy’
Hadoop currently requires a degree of expertise to configure, manage and operate, but that statement is true for any serious data management technology. Apache Hadoop is relatively immature compared with some established data management technologies, particularly in areas such as high availability, security and manageability. However, the development community is well aware of these shortcomings, and advances in all of these areas are currently in early access and should be ready for production deployment later this year.

Hadoop does require a degree of expertise to operate, and that expertise is currently at a premium and comes at a cost. However, all the major Hadoop supporters are working to train up a larger pool of Hadoop developers and administrators. Cloudera alone has trained more than 12,000 people to use Hadoop.

Apache Hadoop is a complex combination of data management technologies and is not without its challenges, which have arguably led to some enterprises taking longer to move from development and testing to deployment than they might have initially expected. However, the Hadoop development community is clearly committed to making Hadoop more suitable for enterprise adoption.

Hadoop is ‘driven by enthusiasts’
The idea that the open source community is populated by individual developers with no concern for enterprise requirements is completely bogus. The Apache Software Foundation has a proven history of developing enterprise-grade software projects through a collaborative development process that combines vendors, users and other interested parties.

The biggest contributors to Apache Hadoop include vendors such as Hortonworks, Cloudera, MapR and IBM, all of which have a vested interest in driving greater enterprise adoption, as well as users such as Yahoo, Facebook and eBay, all of which stand to gain from its improved capabilities.

On a broader note, open source development in general has a proven track record of producing enterprise-grade software. You only have to look at the success of Linux to see how rapidly open source software can be adopted by enterprises once it reaches a suitable level of maturity and has the support of commercial vendors. Hadoop is no exception, and is likely to follow in the footsteps of Linux as it matures.

Additionally, we see the open source nature of Hadoop as one of the adoption drivers, since users know that they can avoid vendor lock-in and have a choice of providers for their Hadoop training, support and services.

Hadoop may need to be ‘taken out of open source’
There is no reason to believe that a closed source Hadoop would deliver any functionality that could not be developed by the Apache Hadoop community. While a number of vendors offer closed source alternatives for individual components in the Hadoop stack, anyone offering a fully closed source alternative would suffer by not being able to compete with the collaborative development process and competitive commercial ecosystem that the open source development process enables.

In addition, it is worth noting that Hadoop and other distributed data management projects, including many of the NoSQL databases, were initiated by organizations like Google, Amazon and Yahoo in response to the inability of the established data management vendors to fulfil their data management requirements.

The established closed source data management vendors have had plenty of time to develop a ‘better’ Hadoop than Hadoop, and do not lack development resources, but have chosen to collaborate with Hadoop distributors and contribute to Hadoop instead.

A prime example is Microsoft, which in late 2011 abandoned its own Dryad distributed computing project in favour of contributing to Apache Hadoop. This is a sign that Hadoop has already won enough attention to make it difficult for any competing product to gain traction.

While we see vendors offering closed source alternatives for individual components in the Hadoop stack, we do not believe that a fully closed source alternative would be viable, or desirable from a customer’s perspective. There is no reason to believe that enterprise-grade improvements to Hadoop cannot be delivered by the Apache Hadoop community and the open source development process.