Introducing Hypertable – a new open source database project

Yesterday I had a chance to catch up with Doug Judd, principal search architect at local search specialist Zvents, to get some insight into Hypertable, the new open source massively parallel database software project.

Hypertable is an open source (GPLv2) implementation of Google’s Bigtable, an internally deployed database that serves the company’s web indexing, Google Earth and Google Finance services. According to Judd, the best way to think about Hypertable is as a traditional database, but one that trades advanced features like transactional capabilities for scalability, specifically the ability to run across thousands of commodity PCs.

While Hypertable is not designed to support transactional applications, it is designed to power any high-traffic website. As an example of a potential application, Judd cited capturing click information and feedback for analytics applications. He added that despite being based on a Google design, Hypertable is not just for the likes of Google and Amazon: one of the advantages of the architecture is that users can start small and add new machines as traffic increases.

Of course a distributed database needs a distributed file system, and just as Bigtable has the Google File System and MapReduce, so Hypertable requires a GFS-like file system (Zvents is using Hadoop, but Kosmos or any other file system based on GFS would also be supported by Hypertable’s file system broker architecture).

Judd was previously involved in the HBase project, which builds Bigtable-like capabilities on the Hadoop Core, and admits that there was a difference of opinion on the best implementation language to use. While HBase, like Hadoop, is written in Java, the Hypertable developers are convinced that Java is inappropriate for a Bigtable implementation due to the memory-intensive nature of the software. Hypertable is written in C++.

Talking of Hypertable developers, the community stands at two Zvents employees right now, but version 0.9 has only been available since February 4, and there has already been interest in contributions and further sponsorship. It is thought that the project will enter beta in the next month and reach general availability (GA) a month or two after that.

The long-term plan is to put together a non-profit organization that can take the project forward. While it is sponsoring the initial effort and expects to benefit from the project thanks to its early adoption and in-house expertise, Zvents certainly does not want to become a distributed database company.

“Zvents isn’t in the business of building databases and infrastructure, and this kind of infrastructure is something that everybody’s going to need,” commented Judd on the decision to go down the open source project route. “The secret sauce is going to be what we do on top of the technology.”

6 comments ↓

#1 John on 02.19.08 at 2:13 pm

You guys get to have all the fun. This has been something that I was planning to research and you just gave me a good jumpstart. Great article.

Thanks
johnmwillis.com

#2 Introducing Hypertable - a new open source database project | John M Willis ESM Blog on 02.19.08 at 2:16 pm

[…] Introducing Hypertable – a new open source database project Of course a distributed database needs a distributed file system, and just as Bigtable has the Google File System and MapReduce, so Hypertable requires a GFS-like file system (Zvents is using Hadoop, but Kosmos or any other file system based on GFS would also be supported by the Hypertable’s file system broker architecture. […]

#3 Introducing Hypertable - a new open source database project | open source business applications on 02.19.08 at 6:51 pm

[…] Read the rest of this great post here […]

#4 Log Buffer #85: a Carnival of the Vanities for DBAs on 02.22.08 at 1:03 pm

[…] The 451 CAOS Theory blog introduces Hypertable. “Hypertable is an open source (GPLv2) implementation of Google’s Bigtable, an internally-deployed database that serves the company’s web indexing, Google Earth and Google Finance services. According to Judd, the best way to think about Hypertable is as a traditional database, but one that trades advanced features like transactional capabilities for scalability: specifically thousands of commodity PCs.” […]

#5 Joe Del Torro on 02.27.08 at 4:04 pm

OPEN SORES is the name for useless source code released to the open community by a company in its death throes as a desperate attempt to have other people do the work for them because they’ve run out of money and talent. Zvents R.I.P.

#6 Mark Applebaum on 03.04.08 at 3:02 pm

Funny how both Zvents and Kosmix are so generous that they release source code that doesn’t work and remains perpetually in alpha. Why not release all of their source (preferably code that works)? Now they are trying to sucker Facebook into picking up the tab. Let’s see if Zuckerberg is dumb enough to pay to develop software that the rest of the world can use to destroy him. I’m sure Microsoft will be real happy with that plan. If I were VantagePoint I’d fire Ethan for squandering their money…