Saturday, September 6, 2008

Hypertable (beta)

Massively scalable database - open source too

Based on Google's well-known BigTable project, the new entrant into the scalable database space is Hypertable. Currently in beta (release 0.9.x), Hypertable is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. According to its website, Hypertable is out to set the open source standard for highly available, petabyte-scale database systems. The goal is nothing less than for Hypertable to become one of the world's most massively parallel, high-performance database platforms.
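To get a feel for what "based on BigTable" means, here is a minimal in-memory sketch of the BigTable-style data model that Hypertable inherits: a sparse, sorted map from (row key, column family:qualifier, timestamp) to value, with cells versioned by timestamp and range scans over sorted row keys as the core access pattern. The class and method names below are purely illustrative, not Hypertable's actual API.

```python
class ToyBigtableModel:
    """Illustrative sketch of a BigTable-style table: a sparse, sorted map of
    (row key, column family:qualifier, timestamp) -> value. Not Hypertable's API."""

    def __init__(self):
        # row key -> {column -> [(timestamp, value), ...] newest first}
        self._rows = {}

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; versions are kept, newest first."""
        cells = self._rows.setdefault(row, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column):
        """Return the most recent version of a cell, or None if absent."""
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start_row, end_row):
        """Yield (row key, columns) in sorted key order for keys in
        [start_row, end_row) -- sorted range scans are the key access pattern."""
        for key in sorted(self._rows):
            if start_row <= key < end_row:
                yield key, self._rows[key]


# Usage: row keys sort lexicographically, so related rows scan together.
table = ToyBigtableModel()
table.put("com.example/index", "content:html", "<html>v1</html>", timestamp=1)
table.put("com.example/index", "content:html", "<html>v2</html>", timestamp=2)
latest = table.get("com.example/index", "content:html")  # newest version wins
```

In a real deployment this sorted map is split into contiguous row ranges and distributed across the cluster's range servers, which is what makes the model scale horizontally.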

Hypertable uses the Apache Hadoop HDFS distributed file system, which Hypertable refers to as a third-party file system. The Hypertable website contains a great high-level architectural overview of how Hypertable is constructed, as well as more detailed documentation on the dependencies and structure of a Hypertable database implementation. Currently the project has a few formidable deficiencies to overcome, the most critical being that the master and Hyperspace servers run as single instances with no cluster takeover capability. These issues, among others, are currently being addressed by the development team.

Given that Hypertable is based on a Google project, this is one to keep your eye on if you are in the market for a massively scalable database. I would also venture to say that you should consider it for ANY database deployment that needs resilience, even if it's implemented in a single rack rather than geographically distributed. It looks like Oracle's Real Application Clusters database architecture may soon have some open source, scalable competition.
