The Data Day, A few days: November 15-21 2013

Storage and big data: what went wrong? And more

And that’s the data day, today.

The Data Day, Two days: December 4/5 2012

EMC/VMware make Pivotal move. Funding for ClearStory. And more

And that’s the Data Day, today.

The Data Day, Today: Apr 25 2012

Splunk soars on IPO. VMware acquires Cetas. Vertica retain autonomy. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* For 451 Research clients

# Splunk IPO: $3bn and counting M&A Insight

# VMware snaps up Cetas Software for ‘big data’ analytics Deal Analysis

# HP’s Vertica retains its autonomy, continues integration with Autonomy Impact Report

# SAP makes long-awaited predictive analytics move of its own Impact Report

# Sanbolic pitches data management platform for server, desktop and database consolidation Impact Report

* Splunk IPO kills, lives up to expectations

* VMware acquires Cetas Software for Cloud and Big Data Analytics

* Opera Solutions Acquires Procurement Analytics Tools and Services from BIQ and Lexington Analytics

* Terascala Announces $14M Series B Funding Round Led by Strategic Partner Consortium

* Ravel Acquired by W2O Group To Expand Big Data Client Services And Enrich In-House Analytics and Insights Technology

* Teradata Active Data Warehouses Provide Private Cloud Benefits

* Pentaho Introduces New Interactive Visualization and Expanded Big Data Analytics

* Teradata Unveils New Purpose-Built Appliance for SAS High-Performance Analytics

* SAP Establishes Global Managing Board to Lead Company

* Oracle to Hadoop Under OneAppliance: GridIron Introduces First All-Flash Appliance Line With Unprecedented Performance to Tackle Unified Big Data Processing

* Lucid Imagination Technology Integration with SugarCRM Lets Customers Enjoy Improved Global Search Capabilities with Apache Lucene/Solr

* The Apache Software Foundation Announces Apache Cassandra v1.1

* Miso project: how it will help you make your own Guardian-style infographics and data visualisations

And that’s the Data Day, today.

The Data Day, Today: Apr 2 2012

Basho launches cloud storage play. Opera acquisitions. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Basho Unveils Riak CS, Multi-Tenant Cloud Storage Software for Public and Private Clouds

* InsightsOne Secures $4.3 Million in Series A Round of Funding Led by Norwest Venture Partners

* Opera buys Commendo to create predictive analytics powerhouse

* Opera Solutions Increases Procurement Capabilities with Acquisition of Lexington Analytics

* How federal money will spur a new breed of big data

* Another HP org change Vertica no longer under the purview of Autonomy boss Mike Lynch?

* New SAS Visual Analytics Helps Organizations Analyze, Visualize Big Data

* Citrusleaf Delivers Real-Time NoSQL Replication

* NuoDB Launches Open Source Initiative on Github

* Actian Teams up With FlyingBinary and Tableau to Unleash Big Data Potential

* DH2i Launches and Unveils DxConsole Next Generation Virtualization Solution to Enable the Agile, Always-On Enterprise

* Acunu Analytics Ready to Preview!

* SAND Technology Announces Second Quarter Results for Fiscal Year 2012

* Idera Announces VMware Database Performance Monitoring Solution

* Idera Announces SQL Compliance Manager 3.6

* WalmartLabs is building big data tools — and will then open source them

* The three waves of opportunities in big data

* 4 Big Data Myths – Part I

* For 451 Research clients

# Drawn to Scale raises funds for Hadoop-based real-time database Impact report

# ParElastic brings elastic parallelism to relational databases Impact report

# DH2i launches with PolyServe-inspired database-virtualization software Impact report

# Tape industry pins future on ‘big data,’ active archiving and LTFS Spotlight report

# Lucid Imagination dreams up new strategy for enterprise search Market development report

# Pentaho identifies ‘big data’ analytics as investment priority, hooks into DataStax Market development report

# GridGain positions in-memory data grid for real-time analytics Market development report

# Having earned its stripes in HPC, Panasas heads for ‘big data’ Market development report

* Google News Search outlier of the day: Top 10 Dog and Cat Medical Conditions of 2011

And that’s the Data Day, today.

The Data Day, Today: Mar 22 2012

Oracle reports Q3. EMC acquires Pivotal Labs. ClearStory launches. And much, much more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Oracle Reports Q3 GAAP EPS Up 20% to 49 Cents; Q3 Non-GAAP EPS Up 15% to 62 Cents Database and middleware revenue up 10%.

* EMC Goes Social, Open and Agile With Big Data EMC acquires Pivotal Labs, plans to release Chorus as an open source project

* ClearStory Data Launches With Investment From Google Ventures, Andreessen Horowitz and Khosla Ventures

* HP Lead Big Data Exec Chris Lynch Resigns

* Hortonworks Names Ari Zilka Chief Products Officer

* DataStax Enterprise 2.0 Adds Enterprise Search Capabilities to Smart Big Data Platform

* MapR Unveils Most Comprehensive Data Connection Options for Hadoop

* New Web-Based Alpine Illuminator Integrates with EMC Greenplum Chorus, The Social Data Science Platform

* RainStor and IBM InfoSphere BigInsights to Address Growing Big Data Challenges

* IBM Introduces New Predictive Analytics Services and Software to Reduce Fraud, Manage Financial Performance and Deliver Next Best Action

* Datameer Releases Major New Version of Analytics Platform

* Kognitio Announces Formation of “Kognitio Cloud” Business Unit

* HStreaming Announces Free Community Edition of Its Real-Time Analytics Platform for Hadoop

* Talend and MapR Announce Certification of Big Data Integration and Big Data Quality

* Schooner Information Technology Releases Membrain 4.0

* Gazzang Launches Big Data Encryption and Key Management Platform

* Logicworks Solves Big Data Hosting Challenges With New Infrastructure Services for Hadoop

* “Big Data” Among Most Confusing Tech Buzzwords

* For 451 Research clients

# Infochimps launches Chef-based platform for Hadoop deployment Impact Report

# Big-data security, or SIEM buzzword parity? Spotlight report

# DataStax adds enterprise search and elastic reprovisioning to database platform Market Development report

# With a new CEO and IBM as a reseller, Revolution Analytics charts next growth phase Market Development report

# Cray branches out, offering storage and a ‘big data’ appliance Market Development report

# CodeFutures sees a future beyond database sharding Market Development report

# Third time lucky for ScaleOut StateServer 5.0? Market Development report

# Attunity looks to 2012 for turnaround; up to the cloud and ‘big data’ movement Market Development report

# Panorama rides Microsoft’s coattails into in-memory social BI using SQL Server 2012 Market Development report

And that’s the Data Day, today.

The Data Day, Today: Feb 17 2012

Rob Bearden is new Hortonworks CEO. Oracle updates MySQL Cluster. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* With a new CEO, will Hortonworks get serious about Hadoop? Rob Bearden is Hortonworks’ new chief elephant-herder.

* Oracle Announces General Availability of MySQL Cluster 7.2. Claims 1 billion queries per minute.

* Vertica Extends Manageability and Ease-of-Use for the Vertica Analytics Platform.

* NGDATA Raises Capital to Accelerate Growth Belgian data management company confirms that it recently acquired Outerthought.

* QlikTech Announces Fourth Quarter and Full Year 2011 Financial Results

* Quest Business Intelligence Studio 1.0 is now Generally Available

* Jaspersoft Delivers Big Data Integration into Jaspersoft ETL

* Hortonworks University Launches to Deliver Comprehensive Apache Hadoop Training and Certification.

* Schema in Cassandra 1.1. “as systems deployed on Cassandra grew and matured, lack of schema became a pain point”

* GigaSpaces Announces New Cloudify Free Product Edition.

* Announcing Reduced Pricing on SQL Azure and New 100MB Database Option.

* Composite Software Continues Innovation with Release of Version 6.1 of its Data Virtualization Platform.

* Multi-Tenant Cloudant in Europe.

* NuvolaBase has launched its hosted graph database offering.

* How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. Big data’s beer and diapers equivalent.

* Beyond “Big Data”. “I have a theory that buzzwords are usually helpful in general, in that they usher in new concepts before they end up as meaningless marketing fluff–and, eventually, punchlines. I think this is in the process of happening right now with the term “big data”.”

* For 451 Research clients

# Akiban prepares to launch ‘table grouping’ NewSQL database Impact Report

# IBM sheds light on the Big Blue business of information governance Market Development Report

And that’s the Data Day, today.

The Data Day, Today: Jan 24 2012

Thoughts on Splunk’s IPO and DynamoDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Thoughts on the Splunk IPO and S-1 By Dave Kellogg.

* Thoughts on SimpleDB, DynamoDB and Cassandra By Adrian Cockcroft.

* Recommind’s Revenue Leaps 95% in Record-Setting 2011 Predictable.

* Hewlett-Packard Expands to Cambridge via Vertica’s “Big Data” Center Moving.

* Announcing SkySQL Enterprise HA for the MariaDB & MySQL databases

* Membase Server is Now Couchbase Server But not *the* Couchbase Server.

* Cloudera Teams With O’Reilly Media to Merge Hadoop World and Strata Conferences

* Survey results: How businesses are adopting and dealing with data 100 Strata Online Conference attendees.

* Big data market survey: Hadoop solutions

* LinkedIn released SenseiDB, an open source distributed, realtime, semi-structured database.

* For 451 Research clients

# VMware: not your father’s database company Impact Report

# Sparsity Technologies draws up plans for graph database adoption Impact Report

# Amazon launches DynamoDB, an auto-configuring database as a service Market Development report

# NuoDB targets Q2 release for elastic relational database Market Development report

# ADVIZOR illuminates growth strategy, roadmap in data discovery and analysis Market Development report

# Birst adds own analytic engine for BI, OEM agreement with ParAccel Market Development report

* Google News Search outlier of the day: RentAGrandma.com Recruiting Wonderful Grandmas

And that’s the Data Day, today.

Job trends highlight post-M&A analytic database investment

When I was messing around with Indeed.com job trends the other day I was struck by an interesting trend relating to the five recent major M&A deals involving analytic database vendors: Netezza, Sybase, Greenplum, Vertica and Aster Data.


[Chart: Indeed.com job trends for Netezza, Sybase IQ, Greenplum, Vertica and Aster Data]

The trends aren’t immediately obvious from that chart, but if we break them out individually and add a black dot to indicate the approximate date of the acquisition announcement it all becomes clear.


(Note: scale varies from chart to chart)

While the acquisitions have accelerated job postings for all acquired analytic databases, Greenplum has clearly been the biggest beneficiary. Indeed.com’s data also explains why this might be: EMC/Greenplum is responsible for over 50% of the current Greenplum-related job postings on the site (excluding recruiter postings).

Greenplum had 140 employees when it was acquired in July 2010. Based on the hiring growth illustrated above, EMC’s Data Computing Products Division is set to reach 650 by the end of the year.

Netezza started with a much larger base, but IBM is expected to increase headcount at Netezza from 500 in September 2010 to a target of 800 by year-end. Thanks, no doubt, to Netezza’s larger installed base, IBM is responsible for just 7.7% of Netezza job postings.

This highlights something we recently noted in a 451 Group M&A Insight report: in order to make a considerable dent in the dominance of the big four, any acquiring company will not only have to buy a data-warehousing player but also invest in its growth.

While Vertica and Aster Data are both heading in the right direction, we believe that HP and Teradata will have to accelerate their investment in the Vertica subsidiary and the new Aster Data ‘center of excellence’ respectively.

HP recently told us headcount has grown about 40% since the acquisition (it wasn’t being specific, but Vertica reported 100 employees in January). HP/Vertica is currently responsible for 13.9% of Vertica-related job postings on Indeed.com.

We had speculated that Teradata would need to similarly boost the headcount at Aster Data beyond the estimated 100 employees. Teradata/Aster Data is responsible for 24% of job postings for Aster Data.

But what of Sybase? While Sybase IQ also has a larger installed base, SAP/Sybase are responsible for just 6.4% of the Sybase IQ-related job postings on Indeed.com. The Sybase IQ chart illustrates some common sense investment advice: the value of your investment can go down as well as up.

The future of the database is… plaid?

Oracle has introduced a hybrid column-oriented storage option for Exadata with the release of Oracle Database 11g Release 2.

Ever since Mike Stonebraker and fellow researchers at MIT, Brandeis University, the University of Massachusetts and Brown University presented C-Store (PDF), a column-oriented database, at the 31st VLDB Conference in 2005, the database industry has debated the relative merits of row- and column-store databases.

While row-based databases have dominated the operational database market, column-based databases have made inroads in the analytic database space, with Vertica (based on C-Store) as well as Sybase, Calpont, Infobright, Kickfire, ParAccel and SenSage pushing column-based data warehousing products based on the argument that column-based storage favors the read performance required for query processing.

The debate took a fresh twist recently when former SAP chief executive Hasso Plattner presented a paper (PDF) calling for the use of in-memory column-based storage databases for both analytical and transaction processing.

As interesting as that is in theory, of more immediate interest is the fact that Oracle – so often the target of column-based database vendors – has introduced a hybrid column-oriented storage option with the release of Oracle Database 11g Release 2.

As Curt Monash recently noted, a couple of approaches to hybrid row/column stores are emerging.

Oracle’s approach, as revealed in a white paper (PDF), has been to add new hybrid columnar compression capabilities in its Exadata Storage servers.

This approach maintains row-based storage in the Oracle Database itself while enabling the use of column storage to improve compression rates in Exadata. Oracle claims a compression ratio of up to 10 without any loss of query performance, and up to 40 for historical data.

As Oracle’s Kevin Closson explains in a blog post: “The technology, available only with Exadata storage, is called Hybrid Columnar Compression. The word hybrid is important. Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.”
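To make the "like values are stored in the compression unit" idea concrete, here is a toy sketch. It is not Oracle's format: the sample rows, the function names and the use of run-length encoding are all assumptions chosen purely for illustration. It groups a handful of rows into one unit, stores each column's values together, and keeps enough information (the run lengths) to map back to the original rows.

```python
from itertools import groupby

# Hypothetical sample rows: (trade_date, exchange, side).
rows = [
    ("2009-09-01", "NYSE", "buy"),
    ("2009-09-01", "NYSE", "buy"),
    ("2009-09-01", "NYSE", "sell"),
    ("2009-09-01", "LSE", "sell"),
    ("2009-09-02", "LSE", "sell"),
    ("2009-09-02", "LSE", "buy"),
]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def build_compression_unit(rows):
    """Store like values together: one run-length-encoded list per column."""
    columns = zip(*rows)
    return [run_length_encode(column) for column in columns]


def decompress_rows(unit):
    """Expand each column's runs and zip the columns back into rows."""
    columns = [
        [value for value, count in column for _ in range(count)]
        for column in unit
    ]
    return list(zip(*columns))


unit = build_compression_unit(rows)

# Number of runs when like values sit next to each other (column-wise)...
column_runs = sum(len(column) for column in unit)
# ...versus when values are interleaved row by row.
row_runs = len(run_length_encode([value for row in rows for value in row]))

print(column_runs, row_runs)
```

With this sample data the column-wise layout needs far fewer runs than the row-wise one, because repeated dates and exchange names end up adjacent. Real systems use more sophisticated encodings, but the reason grouping like values helps compression is the same.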

Vertica took a different hybrid approach with the release of Vertica Database 3.5, which introduced FlexStore, a new version of the column-store engine, including the ability to group a small number of columns or rows together to reduce input/output bottlenecks. Grouping can be done automatically based on data size (grouped rows can use up to 1MB) to improve query performance of whole rows, or it can be specified based on the nature of the column data (for example, bid, ask and date columns for a financial application) to improve query performance.

Likewise, the Ingres VectorWise project (previously mentioned here) will create a new storage engine for the Ingres Database, positioned as a platform for data-warehouse and analytic workloads, that makes use of vectorized execution, which sees multiple instructions processed simultaneously. The VectorWise architecture makes use of Partition Attributes Across (PAX), which similarly groups multiple rows into blocks to improve processing, while storing the data in columns.
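The PAX idea described above (rows grouped into blocks, data stored column by column inside each block) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how VectorWise or any other product implements it: the function names, the in-memory list representation and the sample quotes are all hypothetical.

```python
def to_pax_blocks(rows, rows_per_block):
    """Split rows into fixed-size blocks; inside each block, store by column."""
    blocks = []
    for start in range(0, len(rows), rows_per_block):
        chunk = rows[start:start + rows_per_block]
        # Transpose the chunk: one list per column ("mini-page") in the block.
        blocks.append([list(column) for column in zip(*chunk)])
    return blocks


def read_row(blocks, rows_per_block, row_index):
    """Rebuild one whole row; all of its values live in a single block."""
    block = blocks[row_index // rows_per_block]
    offset = row_index % rows_per_block
    return tuple(column[offset] for column in block)


def scan_column(blocks, column_index):
    """Read one column across all blocks without touching the other columns."""
    for block in blocks:
        yield from block[column_index]


# Hypothetical sample data: (symbol, bid, ask).
quotes = [
    ("AAA", 10.0, 10.1),
    ("BBB", 20.0, 20.2),
    ("CCC", 30.0, 30.3),
    ("DDD", 40.0, 40.4),
    ("EEE", 50.0, 50.5),
]

blocks = to_pax_blocks(quotes, rows_per_block=2)
bids = list(scan_column(blocks, 1))
last_row = read_row(blocks, 2, 4)
```

The design trade-off is visible in the two read paths: `scan_column` touches only one column's values per block, as a column store would, while `read_row` finds every value of a row inside one block rather than in widely separated column files.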

Update – Daniel Abadi has provided an overview of the different approaches to hybrid row-column architectures and suggests something I had suspected: that Oracle is also using the PAX approach, except outside the core database, while Vertica is using what he calls a fine-grained hybrid approach. He also speculates that Microsoft may end up going the third route, fractured mirrors.

Perhaps the future of the database may not be row- or column-based, but plaid.

Lowering barriers to data warehousing adoption with open source

Since the start of this year I’ve been covering data warehousing as part of The 451 Group’s information management practice, adding to my ongoing coverage of databases, data caching, and CEP, and contributing to the CAOS research practice.

I’ve covered data warehousing before, but taking a fresh look at this space in recent months it’s been fascinating to see the variety of technologies and strategies that vendors are applying to the data warehousing problem. It’s also been interesting to compare the role that open source has played in the data warehousing market with its role in the database market.

I’m preparing a major report on the data warehousing sector, for publication in the next couple of months. In preparation for that I’ve published a rough outline of the role open source has played in the sector over on our CAOS Theory blog. Any comments or corrections much appreciated.