January 10th, 2012 — Data management
Oracle OEMs Cloudera. The future of Apache CouchDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Oracle announced the general availability of Big Data Appliance, and an OEM agreement with Cloudera for CDH and Cloudera Manager.
* The Future of Apache CouchDB Cloudant confirms intention to integrate the core capabilities of BigCouch into Apache CouchDB.
* Reinforcing Couchbase’s Commitment to Open Source and CouchDB Couchbase CEO Bob Wiederhold attempts to clear up any confusion.
* Hortonworks Appoints Shaun Connolly to Vice President of Corporate Strategy Former vice president of product strategy at VMware.
* Splunk even more data with 4.3 Introducing the latest Splunk release.
* Announcement of Percona XtraDB Cluster (alpha release) Based on Galera.
* Bringing Value of Big Data to Business: SAP’s Integrated Strategy Forbes interview with Sanjay Poonen, President and corporate officer of SAP Global Solutions.
* New Release of Oracle Database Firewall Extends Support to MySQL and Enhances Reporting Capabilities Self-explanatory.
* Big data and the disruption curve “Many efforts are being funded by business units and not the IT department and money is increasingly being diverted from large enterprise vendors.”
* Get your SQL Server database ready for SQL Azure Microsoft “codename” SQL Azure Compatibility Assessment.
* An update on Apache Hadoop 1.0 Cloudera’s Charles Zedlewski helpfully explains Apache Hadoop branch numbering.
* Xeround and the CAP Theorem So where does Xeround fit in the CAP Theorem?
* Can Yahoo’s new CEO Thompson harness big data, analytics? Larry Dignan thinks Scott Thompson might just be the right guy for the job.
* US Companies Face Big Hurdles in ‘Big Data’ Use “21% of respondents were unsure how to best define Big Data”
* Schedule Your Agenda for 2012 NoSQL Events Alex Popescu updates his list of the year’s key NoSQL events.
* DataStax take Apache Cassandra Mainstream in 2011; Poised for Growth and Innovation in 2012 The usual momentum round-up from DataStax.
* Objectivity claimed significant growth in adoption of its graph database, InfiniteGraph and flagship object database, Objectivity/DB.
* Cloudera Connector for Teradata 1.0.0 Self-explanatory.
* For 451 Research clients
# SAS delivers in-memory analytics for Teradata and Greenplum Market Development report
# With $84m in funding, Opera sets out predictive-analytics plans Market Development report
* Google News Search outlier of the day: First Dagger Fencing Competition in the World Scheduled for January 14, 2012
And that’s the Data Day, today.
January 6th, 2012 — Data management
As I mentioned earlier this week, a major research focus for Q1 is the MySQL ecosystem, the positives and negatives of Oracle’s MySQL strategy, and the competitive overlap between MySQL, NoSQL and NewSQL.
It is impossible to think about this without reconsidering the commitments made by Oracle to customers, developers and users of MySQL in late December 2009, which played a significant part in satisfying European Commission concerns about Oracle’s acquisition of Sun.
While the commitments were both welcomed and derided when they were announced, it is worth considering today whether those commitments have been as significant in practice as they appeared to be two years ago.
For example, Oracle’s commitment to and investment in InnoDB – while positive for MySQL users – has arguably diminished the relevance of some of the storage engine-related commitments.
We will be coming to our own conclusions based on our research over the coming weeks, but I am interested in any feedback from MySQL customers, developers and users about how well Oracle has kept to its commitments and their significance in hindsight.
You can find a full list of the commitments here but the edited highlights are below:
1. Continued Availability of Storage Engine APIs.
2. Non-assertion of copyright and no requirement for a commercial license related to implementing the storage engine APIs.
3. Extension of any existing commercial storage engine licenses until December 10, 2014.
4. Commitment to continue licensing MySQL using the GNU GPL.
5. Customers would not be required to purchase support services from Oracle as a condition of obtaining a commercial license to MySQL.
6. Increase spending on MySQL research and development.
7. Commitment to create and fund a customer advisory board.
8. Commitment to create and fund a MySQL Storage Engine Vendor Advisory Board.
9. Commitment to retain the free MySQL Reference Manual.
10. Retention of annual or multi-year subscription renewals for end-users and embedded customers.
October 3rd, 2011 — Data management
We have previously speculated at The 451 Group about Oracle’s potential to respond to the growing adoption of NoSQL databases, noting that the company had a number of options at its disposal, including Berkeley DB and projects like HandlerSocket.
While some may wonder about the potential impact of Oracle NoSQL (based indeed on Berkeley DB) on the existing NoSQL vendors, I believe the launch says something very significant about NoSQL itself: specifically that its adoption is driven by more than the nature of the query language.
To get a sense of why Oracle NoSQL is significant, think about the way Oracle has traditionally responded to alternative approaches that threaten the relational model and its dominance. Oracle’s approach has traditionally been to subsume the alternative approach, at least in part, into Oracle Database, nullifying the competitive threat.
Oracle CEO Larry Ellison explained the approach himself on a recent call with investors:
“We think that data should be integrated with a single database technology. That’s always been our strategy for Oracle. And it started as a relational database then we added objects, then we added text and then we’ve added a variety of other things like video and audio to the Oracle Database. We think that should be unified and that’s how we’re approaching the problem.”
As we recently covered (451 clients only), Oracle is in the process of replicating this strategy with MySQL, adding support for the ability to directly access MySQL’s InnoDB and MySQL’s Cluster’s NDB storage engines using the memcached API.
This ability to perform non-SQL querying of the database is part of the agility benefit of NoSQL, and if the term NoSQL were to be taken literally would perhaps be enough to discourage would-be NoSQL adopters from turning away from MySQL.
As our NoSQL, NewSQL and Beyond report highlighted, however, agility is just one of six key trends we see driving adoption of NoSQL databases. Scalability, performance, relaxed consistency, intricacy and necessity will not be solved by the ability to query MySQL or MySQL Cluster using the memcached API.
The launch of Oracle NoSQL is therefore a clear indication that there are trends at work here that cannot be solved by adding non-SQL querying to existing relational databases.
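To make the idea of non-SQL access to a relational engine concrete, here is a minimal sketch of a key-value facade sitting beside ordinary SQL access to the same table. It is an illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL, the `KeyValueFacade` class and the `kv` table are hypothetical names, and the facade still issues SQL under the hood, whereas the memcached API integration described above is intended to reach the storage engine directly.

```python
import sqlite3


class KeyValueFacade:
    """Hypothetical get/set wrapper over a relational table (illustration only)."""

    def __init__(self, conn):
        self.conn = conn
        # Assumed schema: one key column, one value column.
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def set(self, key, value):
        # Upsert so that set() behaves like a cache write.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )

    def get(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here
store = KeyValueFacade(conn)
store.set("user:42", "alice")

# The same record is reachable through the key-value call and through SQL.
kv_result = store.get("user:42")
sql_result = conn.execute("SELECT v FROM kv WHERE k = 'user:42'").fetchone()[0]
```

The point of the sketch is that a simpler access path changes how the data is queried, not how it is scaled or kept consistent, which is why it addresses only the agility driver.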
There is another significant factor here, which is the fact that Oracle has chosen to name the product NoSQL. In one simple naming move the company has effectively disarmed the NoSQL ‘movement’.
We have previously noted that existing NoSQL vendors were turning away from the term in favor of emphasizing their individual strengths. How many of them are going to want to self-identify with an Oracle product? I’m not convinced any of them believe the brand is worth fighting for.
September 23rd, 2010 — Content management, Search
Document filters. There’s a phrase to conjure up excitement in any technologist eh? No? Didn’t think so. But look more carefully at what is going on and it does get more interesting, trust me.
I was moved to expand on this by Isys Search Software’s recent attempt at guerrilla marketing at Oracle Open World, which it tweeted about here:
isyssearch: ISYS goes guerrilla; kicked out of Oracle Open World party after projecting our branding on the Metreon http://tinyurl.com/272fync #oow10
Quite apart from what it says about Isys and how much it’s changed in the last two years – a bit like the nerdy guy in the playground trying to act tough – it shows how important some people – including me – think these filters have become.
There are two main companies selling products that enable the opening and viewing of myriad file formats (400 is a common number cited by both the vendors and their customers). So when a search engine comes across a Word 1997 or even something like Wordstar 4 file, how does it open it? Usually using one of two products: Oracle’s OutsideIn or Autonomy’s IDOL KeyView.
Both products came to these companies via acquisitions: Autonomy buying Verity in November 2005 and Oracle buying Stellent in 2007, (and Stellent, as it wasn’t known then, buying Inso in 2000). It’s also interesting to note that Isys still refers to them as Inso in its marketing even though the product has been called something else for years.
Like all OEM technology, these filters aren’t easily ripped out and replaced. And that’s what these two vendors like about them. It gives them a foot in the door at software companies that they can try to expand upon, and quite often they do. The temptation of course is to use the difficulty of removing them as a point of leverage to crank up prices.
And that’s what we’re hearing Autonomy is doing from a number of vendors. We haven’t heard anything similar regarding Oracle, it should be noted. Autonomy has a reasonably significant OEM technology stream and as we have mentioned previously Autonomy regularly brags about its OEM wins, without specifying whether it’s KeyView or the full IDOL engine being OEMd. Incidentally after that earlier post Autonomy contacted us to say that KeyView isn’t the result of the acquisition of Verity and all it bought was the name. That’s despite what was said at the time, including its own press release shortly after the acquisition bragging about its features. But then Autonomy’s marketing these days increasingly requires a willing suspension of disbelief.
Isys has had this technology for a while but never sold it separately. But now it is finding quite a bit of success among software vendors nervous about having a key piece of technology owned by Autonomy or Oracle because they’re often search and/or content management companies; two markets in which both companies play. dtSearch, another veteran OEM provider, also provides similar filters.
So for the first time in a long time, ISVs have a choice beyond the main two in filters and in their close relatives, connectors, the software to connect search engines to databases, content management systems and other repositories. In the often incestuous world of information management software, where vendors both compete and sell to one another, these have become points of leverage that customers may not notice in terms of functionality, but they certainly do in terms of the price they have to pay for their software.
May 20th, 2010 — Data management, M&A
SAP faces a number of challenges to make the most of its proposed $5.8bn acquisition of Sybase, not the least of which being that the company’s core enterprise applications do not currently run on Sybase’s database software.
As we suggested last week, that should be pretty easy to fix technically, but even if SAP gets its applications, BI software and data warehousing products up and running on Sybase ASE and IQ in short order, it still faces a challenge to persuade the estimated two-thirds of SAP users that run on an Oracle database to deploy Sybase for new workloads, let alone migrate existing deployments.
Even if SAP were to bundle ASE and IQ at highly competitive rates (which we expect it to do) it will have a hard time convincing die-hard Oracle users to give up on their investments in Oracle database administration skills and tools. As Hasso Plattner noted yesterday, “they do not want to risk what they already have.”
Hasso was talking about the migration from disk-based to in-memory databases, and that is clearly SAP’s long-term goal, but even if we “assume for a minute that it really works” as Hasso advised, there is going to be a long period where SAP’s customers remain on disk-based databases, and SAP is going to need to move at least some of those to Sybase to prove the wisdom of the acquisition.
A solution may have appeared today from an unlikely source, with IBM’s release of DB2 SQL Skin for Sybase ASE, a new feature for its DB2 database product that provides compatibility with applications developed for Sybase’s Adaptive Server Enterprise (ASE) database. Most Sybase applications should be able to run on DB2 unchanged, according to the companies, while users are also able to retain their Sybase database tools, as well as their administration skills.
That may not sound like particularly good news for SAP or Sybase, but the underlying technology could be an answer to its problems. DB2 SQL Skin for Sybase ASE was developed with ANTs Software and is based on its ANTs Compatibility Server (ACS).
ACS is not specific to DB2. It is designed to support the API language of an application written for one database and translate to the language of the new database – and ANTs maintains that re-purposing the technology to support other databases is a matter of metadata changes. In fact the first version of ACS, released in 2008, targeted migration from Sybase to Oracle databases.
Sybase should be pretty familiar with ANTs. In 2008 it licensed components of the company’s ANTs Data Server (ADS) real-time database product (now FourJ’s Genero db), while also entering into a partnership agreement to create a version of ACS that would enable migrations from Microsoft’s SQL Server to Sybase Adaptive Server Enterprise and Sybase IQ (451 Group coverage).
That agreement was put on hold when ANTs’ IBM opportunity arose, and while ANTs is likely to have its hands full dealing with IBM migration projects, we would not be surprised to see Sybase reviving its interest in a version that targets Oracle.
It might not reduce the time it takes to port SAP to Sybase – it would take time to create a version of ACS for Oracle-Sybase migrations (DB2 SQL Skin for Sybase was in development and testing for most of 2009) – but it would potentially enable SAP to deploy Sybase databases for new workloads without asking its users to retool and re-train.
May 14th, 2010 — Data management, M&A
The 451 Group has published its take on the proposed acquisition of Sybase by SAP. The full report provides details on the deal, valuation and timing, as well as assessing the rationale and competitive impact in three core areas: data management, mobility, and applications.
As a taster, here’s an excerpt from our view of the deal from a database perspective:
The acquisition of Sybase significantly expands SAP’s interests in database technology, and the improved ability of the vendor to provide customers with an alternative to rival Oracle’s database products is, alongside mobile computing, a significant driver for the deal. Oracle and SAP have long been rivals in the enterprise application space, but Oracle’s dominance in the database market has enabled it to wield significant influence over SAP accounts. For instance, Oracle claims to be the most popular database for deploying SAP, and that two-thirds of all SAP customers run on Oracle Database. Buying a database platform of its own will enable SAP to break any perceived dependence on its rival, although this is very much a long-term play: Sybase’s database business is tiny compared to Oracle, which reported revenue from new licenses for database and middleware products of $1.2bn in the third quarter alone.
The long-term acquisition focus is on the potential for in-memory database technology, which has been a pet project for SAP cofounder and supervisory board chairman Hasso Plattner for some time. As the performance of systems hardware has improved, it is now possible to run more enterprise workloads in memory, rather than on disk. By using in-memory database technology, SAP is aiming to improve the performance of its transactional applications and BI software while also hoping to leapfrog rival Oracle, which has its disk-based database installed base to protect. Sybase also has a disk-based database installed base, but has been actively exploring in-memory database technology, and SAP can arguably afford to be much more aggressive about a long-term in-memory vision since its reliance on that installed base is much less than Sybase’s or Oracle’s.
SAP has already delivered columnar in-memory database technology to market via its Business Warehouse Accelerator (BWA) hardware-based acceleration engine and the SAP BusinessObjects Explorer data-exploration tool. Sybase has also delivered in-memory database technology for its transactional ASE database with the release of version 15.5 earlier this year. By acquiring Sybase, SAP has effectively delivered on Plattner’s vision of in-memory databases for both analytical and transaction processing, albeit with two different products. At this stage, it appears that SAP’s in-memory functionality will quickly be applied to the IQ analytic database while ASE will retain its own in-memory database features. Over time, expect R&D to focus on delivering column-based in-memory database technology for both operational and analytic workloads.
In addition, SAP touted the applicability of its in-memory database technology to Sybase’s complex-event-processing (CEP) technology and Risk Analytics Platform (RAP). Sybase was already planning to replicate the success of RAP in other verticals following its acquisition of CEP vendor Aleri in February, and we would expect SAP to accelerate that.
Meanwhile, SAP intends to continue to support databases from other vendors. In the short term, this will be a necessity since SAP’s application software does not currently run on Sybase’s databases. Technically, this should be easy to overcome, although clearly it will take time, and we would expect SAP to encourage its application and BI customers to move to Sybase ASE and IQ for new deployments in the long term. One of the first SAP products we would expect to see ported to Sybase IQ is the NetWeaver Business Warehouse (BW) model-driven data-warehouse environment. SAP’s own MaxDB is currently the default database for BW, although it enables deployment to Oracle, IBM DB2, Microsoft SQL Server, MaxDB, Teradata and Hewlett-Packard’s Neoview. Expect IQ to be added to that list sooner rather than later, and to potentially replace MaxDB as the default database.
I have some views on how SAP could accelerate the migration of its technology and users to Sybase’s databases but – for reasons that will become apparent – they will have to wait until next week.
October 28th, 2009 — Data management
In our recent report on the data warehousing market we speculated that there would soon be a change in the number of vendors operating in what is a crowded market. We were anticipating that the number of vendors would go down, rather than up, but – in the short term at least – we have been proved wrong, as two new open source analytical databases emerged this week.
First came the formation of Dynamo Business Intelligence Corp (aka Dynamo BI), which will sponsor LucidDB and offer a new commercially supported distribution of it. Then came the launch of InfiniDB Community Edition, a new open source analytic database based on MySQL from Calpont.
We actually included Calpont in our report but its product plans at that time looked precarious to say the least as the company found that its plans to launch a data warehousing platform based on MySQL were overshadowed by Oracle’s acquisition of Sun.
We were somewhat sceptical about whether Calpont – which has had a couple of false starts in the past – would find a way to bring something to market and we are impressed that the company has reached a licensing agreement with Sun that supports its open source and commercial aims.
Specifically the company has arranged an OEM agreement with Sun for the MySQL Community Server version that enables it to be used with both Calpont’s open source and commercially licensed products. The first of those is InfiniDB Community Edition, a column-oriented, multi-threaded data warehouse platform which acts as a storage engine for MySQL.
The GPLv2 Community Edition will only be available for deployment on a single-server and without any formal support from Calpont and is primarily aimed at raising interest among MySQL developers. A fully certified and supported commercial version will follow, although Calpont is reticent about providing details on that at the moment other than that it will make use of Calpont’s massively parallel processing capabilities and modular architecture to scale out as well as up.
Calpont faces some competition in the MySQL segment from Kickfire and Infobright, particularly the latter given their similar open source software strategies (Kickfire is a MySQL appliance). Infobright has grown rapidly since going open source and now boasts more than 100 customers, although Calpont maintains that leaves plenty of opportunities amongst MySQL users.
We would agree with that, and also with the company’s claim to offer something different from Infobright technologically. Infobright also offers column-based storage but not massively parallel processing (although it is working on a shared-everything, peer-to-peer architecture). We should note that InfiniDB Community Edition is also restricted to a single server but this is the result of a strategic decision, rather than a technical limitation. The commercial version will be fully MPP.
We recently noted that LucidDB is another open source database that is often overlooked since the LucidDB code is not commercially supported.
Any concern over the future of LucidDB following the demise of LucidEra should be put to bed by the formation of Dynamo BI with the intention to provide a commercially supported distribution of LucidDB.
As LucidDB project lead John Sichi wrote:
“This is an offering which has been completely missing up until now, and which I and others such as Julian Hyde believe to be essential for accelerating adoption of LucidDB. LucidEra provided much of the critical development effort, but never offered commercial support on LucidDB since that was not part of its software-as-a-service business model. Eigenbase provides community infrastructure and development coordination, but a commercial offering is not part of its non-profit charter. So in the past, when individuals and companies have asked me whom they should talk to in order to purchase support for LucidDB, I have never had a good answer. “
Meanwhile Nicholas Goodman revealed that the company has acquired the commercial rights to LucidDB and plans to offer DynamoDB as a prepackaged, assembled distribution. It will also be fully open source and all new features will be contributed to LucidDB.
It is very early days for Dynamo BI, which doesn’t even have a website as yet, so it’s difficult to judge the company’s plans, but with some of the lead LucidDB developers involved and a solid starting project – “the best database no one ever told you about” – it has every chance. We’ll be looking to catch up with the company just as soon as it gets up and running.
The data warehousing sector is extremely crowded and we continue to believe that there will be a shakeout in the near future, but there are opportunities for companies that are able to differentiate themselves from the pack. Starting a data warehousing company is generally not something that we would recommend right now, but both Calpont and Dynamo BI have opportunities to establish themselves.
September 2nd, 2009 — Data management
Oracle has introduced a hybrid column-oriented storage option for Exadata with the release of Oracle Database 11g Release 2.
Ever since Mike Stonebraker and fellow researchers at MIT, Brandeis University, the University of Massachusetts and Brown University presented (PDF) C-Store, a column-oriented database at the 31st VLDB Conference, in 2005, the database industry has debated the relative merits of row- and column-store databases.
While row-based databases dominated the operational database market, column-based databases have made inroads in the analytic database space, with Vertica (based on C-Store) as well as Sybase, Calpont, Infobright, Kickfire, ParAccel and SenSage pushing column-based data warehousing products based on the argument that column-based storage favors the read performance required for query processing.
The debate took a fresh twist recently as former SAP chief executive, Hasso Plattner, presented a paper (PDF) calling for the use of in-memory column-based storage databases for both analytical and transaction processing.
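The row-versus-column trade-off can be illustrated with a small sketch. It is a toy model rather than how any of the products named here is implemented: the trade data, the function names and the "values touched" counter are all assumptions made for illustration, with the counter standing in for I/O.

```python
# Toy trade records; every field name and value here is made up for illustration.
rows = [
    {"trade_id": 1, "symbol": "ORCL", "price": 25.0, "qty": 100},
    {"trade_id": 2, "symbol": "SAP", "price": 30.0, "qty": 50},
    {"trade_id": 3, "symbol": "ORCL", "price": 27.0, "qty": 75},
    {"trade_id": 4, "symbol": "TDC", "price": 18.0, "qty": 20},
]

# Column layout: one list per attribute instead of one record per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def avg_price_row_store(rows):
    """Average price when every row must be read in full."""
    touched = 0
    total = 0.0
    for r in rows:
        touched += len(r)  # all fields of the row are read
        total += r["price"]
    return total / len(rows), touched


def avg_price_column_store(columns):
    """Average price when only the price column is read."""
    prices = columns["price"]
    return sum(prices) / len(prices), len(prices)


row_avg, row_touched = avg_price_row_store(rows)
col_avg, col_touched = avg_price_column_store(columns)
```

Both functions return the same average, but the column layout touches 4 values where the row layout touches 16. Inserting a new trade is the reverse case: a single append in the row layout, one append per column in the column layout.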
As interesting as that is in theory, of more immediate interest is the fact that Oracle – so often the target of column-based database vendors – has introduced a hybrid column-oriented storage option with the release of Oracle Database 11g Release 2.
As Curt Monash recently noted there are a couple of approaches emerging to hybrid row/column stores.
Oracle’s approach, as revealed in a white paper (PDF) has been to add new hybrid columnar compression capabilities in its Exadata Storage servers.
This approach maintains row-based storage in the Oracle Database itself while enabling the use of column-storage to improve compression rates in Exadata, claiming a compression ratio of up to 10 without any loss of query performance and up to 40 for historical data.
As Oracle’s Kevin Closson explains in a blog post: “The technology, available only with Exadata storage, is called Hybrid Columnar Compression. The word hybrid is important. Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.”
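A rough sketch of the idea Closson describes, that rows are grouped into a unit, like values are stored together, and metadata maps them back to rows, might look like the following. This is not Oracle's format or algorithm: the JSON layout, the use of zlib and the function names are assumptions chosen purely to keep the example self-contained.

```python
import json
import zlib


def build_compression_unit(rows, column_names):
    """Pivot a batch of rows into per-column lists, then compress the batch."""
    by_column = {name: [row[name] for row in rows] for name in column_names}
    payload = json.dumps(
        # Column order and row count are the metadata that map back to rows.
        {"columns": column_names, "n_rows": len(rows), "values": by_column}
    ).encode("utf-8")
    return zlib.compress(payload)


def read_rows(unit):
    """Rebuild the original rows from a compression unit."""
    data = json.loads(zlib.decompress(unit).decode("utf-8"))
    names = data["columns"]
    return [
        {name: data["values"][name][i] for name in names}
        for i in range(data["n_rows"])
    ]


# Hypothetical order data with low-cardinality columns.
rows = [
    {
        "order_id": i,
        "region": ["EMEA", "APAC", "AMER"][i % 3],
        "status": "shipped" if i % 10 else "returned",
    }
    for i in range(1000)
]

unit = build_compression_unit(rows, ["order_id", "region", "status"])
raw_size = len(json.dumps(rows).encode("utf-8"))
restored = read_rows(unit)
print(f"raw bytes: {raw_size}, compression unit bytes: {len(unit)}")
```

The rows survive the round trip unchanged, which is the sense in which the approach remains a hybrid: queries still see rows, while storage sees runs of similar values. Real compression ratios depend on the data and the codec.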
Vertica took a different hybrid approach with the release of Vertica Database, 3.5, which introduced FlexStore, a new version of the column-store engine, including the ability to group a small number of columns or rows together to reduce input/output bottlenecks. Grouping can be done automatically based on data size (grouped rows can use up to 1MB) to improve query performance of whole rows or specified based on the nature of the column data (for example, bid, ask and date columns for a financial application) to improve query performance.
Likewise, the Ingres VectorWise project (previously mentioned here) will create a new storage engine for the Ingres Database, positioned as a platform for data-warehouse and analytic workloads, that makes use of vectorized execution, which sees operations applied to batches (vectors) of values at a time. The VectorWise architecture makes use of Partition Attributes Across (PAX), which similarly groups multiple rows into blocks to improve processing, while storing the data in columns.
Update – Daniel Abadi has provided an overview of the different approaches to hybrid row-column architectures and suggests something I had suspected, that Oracle is also using the PAX approach, except outside the core database, while Vertica is using what he calls a fine-grained hybrid approach. He also speculates that Microsoft may end up going the third route, fractured mirrors – Update
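As a minimal sketch of the PAX idea, assuming a page is simply a dictionary of per-column lists (real page formats are considerably more involved), rows can be grouped into fixed-size blocks and stored column by column within each block. The function names and the bid/ask data are hypothetical.

```python
def build_pax_pages(rows, column_names, rows_per_page):
    """Group rows into pages; inside each page, keep one list per column."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        block = rows[start:start + rows_per_page]
        pages.append({name: [r[name] for r in block] for name in column_names})
    return pages


def get_row(pages, row_index, rows_per_page):
    """Rebuild a single row; all of its values live in the same page."""
    page = pages[row_index // rows_per_page]
    offset = row_index % rows_per_page
    return {name: values[offset] for name, values in page.items()}


def sum_column(pages, name):
    """Aggregate one column, consuming a whole per-page vector at a time."""
    return sum(sum(page[name]) for page in pages)


# Hypothetical bid/ask data for illustration.
rows = [{"bid": 100 + i, "ask": 101 + i, "day": i % 5} for i in range(10)]
pages = build_pax_pages(rows, ["bid", "ask", "day"], rows_per_page=4)

row_six = get_row(pages, 6, rows_per_page=4)
total_bid = sum_column(pages, "bid")
```

A whole row can be rebuilt from one page, while a column scan reads contiguous values within each page, which is the combination PAX aims for.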
Perhaps the future of the database may not be row- or column-based, but plaid.
August 6th, 2009 — Data management
Since the start of this year I’ve been covering data warehousing as part of The 451 Group’s information management practice, adding to my ongoing coverage of databases, data caching, and CEP, and contributing to the CAOS research practice.
I’ve covered data warehousing before but taking a fresh look at this space in recent months it’s been fascinating to see the variety of technologies and strategies that vendors are applying to the data warehousing problem. It’s also been interesting to compare the role that open source has played in the data warehousing market, compared to the database market.
I’m preparing a major report on the data warehousing sector, for publication in the next couple of months. In preparation for that I’ve published a rough outline of the role open source has played in the sector over on our CAOS Theory blog. Any comments or corrections much appreciated.
June 8th, 2009 — Data management
At last year’s 451 Group client event I presented on the topic of database management trends and databases in the cloud.
At the time there was a lot of interest in cloud-based data management as Oracle and Microsoft had recently made their database management systems available on Amazon Web Services and Microsoft was about to launch the Azure platform.
In the presentation I made the distinction between online distributed databases (BigTable, HBase, Hypertable), simple data query services (SimpleDB, Microsoft SSDS as was), and relational databases in the cloud (Oracle, MySQL, SQL Server on AWS etc) and cautioned that although relational databases were being made available on cloud platforms, there were a number of issues to be overcome, such as licensing, pricing, provisioning and administration.
Since then we have seen very little activity from the major database players with regards to cloud computing (although Microsoft has evolved SQL Data Services to be a full-blown relational database as a service for the cloud, see the 451’s take on that here).
In comparison there has been a lot more activity in the data warehousing space with regards to cloud computing. On the one hand the data warehousing players are later to the cloud, but on the other they are more advanced, and for a couple of reasons I believe data warehousing is better suited to cloud deployments than the general purpose database.
For one thing most analytical databases are better suited to deployment in the cloud thanks to their massively parallel architectures being a better fit for clustered and virtualized cloud environments.
And for another, (some) analytics applications are perhaps better suited to cloud environments since they require large amounts of data to be stored for long periods but processed infrequently.
We have therefore seen more progress from analytical than transactional database vendors this year with regards to cloud computing. Vertica Systems launched its Vertica Analytic Database for the Cloud on EC2 in May 2008 (and is working on cloud computing services from Sun and Rackspace). Aster Data followed suit with the launch of Aster nCluster Cloud Edition for Amazon and AppNexus in February this year, and February also saw Netezza partner with AppNexus on a data warehouse cloud service. The likes of Teradata and illuminate are also thinking about, if not talking about, cloud deployments.
To be clear, the early interest in cloud-based data warehousing appears to be in development and test rather than mission critical analytics applications, although there are early adopters: ShareThis, the online information-sharing service, is up and running on Amazon Web Services’ EC2 with Aster Data, while search marketing firm Didit is running nCluster Cloud Edition on AppNexus’ PrivateScale, and Sonian is using the Vertica Analytic Database for the Cloud on EC2.
Greenplum today launched its take on data warehousing in the cloud, focusing its attention initially on private cloud deployments with its Enterprise Data Cloud initiative and plans to deliver “a new vision for bringing the power of self-service to data warehousing and analytics”.
That may sound a bit woolly (and we do see the EDC as the first step towards private cloud deployments) but the plan to enable the Greenplum Database to act as a flexible pool of warehoused data from which business users will be able to provision data marts makes sense as enterprises look to replicate the potential benefits of cloud computing in their datacenters.
Functionality including self-service provisioning and elastic scalability are still to come but version 3.3 does include online data-warehouse expansion capabilities and is available now. Greenplum also notes that it has customers using the Greenplum Database in private cloud environments, including Fox Interactive Media’s MySpace, Zions Bancorporation and Future Group.
The initiative will also focus on agile development methodologies and an ecosystem of partners, and while we were somewhat surprised by the lack of virtualization and cloud provisioning vendors involved in today’s announcement, we are told they are in the works.
In the meantime we are confident that Greenplum’s won’t be the last announcement from a data management vendor aimed at enabling private cloud computing deployments. While much of the initial focus around cloud-based data management was naturally on the likes of SimpleDB, the ability to deliver flexible access to, and processing of, enterprise data is more likely to be taking place behind the firewall while users consider what data and which applications are suitable for the public cloud.
Also worth mentioning while we’re on the subject is RainStor, the new cloud archive service recently launched by Clearpace Software, which enables users to retire data from legacy applications to Amazon S3 while ensuring that the data is available for querying on an ad hoc basis using EC2. It’s an idea that resonates thanks to compliance-driven requirements for long-term data storage, combined with the cost of storing and accessing that data.
451 Group subscribers should stay tuned for our formal take on RainStor, which should be published any day now, while I think it’s probably fair to say you can expect more of this discussion at this year’s client event.