February 3rd, 2012 — Matthew Aslett
Data management
New CEO at Revolution. Pentaho goes big data. EMC Hadoop gets Isilon. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Revolution Analytics Names David Rich New CEO
* Pentaho Open Sources Big Data Capabilities to Further Fuel Widespread Adoption
* EMC Isilon is Industry’s First Scale-Out NAS System with Native Hadoop Support
* Actuate Reports Fourth Quarter and Fiscal Year 2011 Financial Results
* Sumo Logic Raises $15M Series B Round for Next Generation Log Management and Analytics
* Announcing Oracle R Enterprise 1.0
* Paul Cormier Joins Hortonworks’ Board of Directors
* DataStax Launches First Complete Solution for Cassandra Development on Windows and Mac
* Latest Release of Kalido Information Engine Eliminates Data Mart Migration and Consolidation Hassles
* Karmasphere Brings More Power, Collaboration, and Faster Insights to Big Data Analytics Teams on Hadoop
* Why Big Data Won’t Make You Smart, Rich, Or Pretty
* SAP HANA – slowly moving out of hype into actual projects
* For 451 Research clients
# Actuate gets ready to go shopping in the ‘big data’ mall (Acquirer IQ)
# Couchbase cites enterprise adoption, clarifies distributed NoSQL database strategy (Impact report)
# SpagoBI illuminates 2012 roadmap, takes open source model to US, Latin America (Impact report)
# Customer data analysis provider nPario combines big data and smart segmentation (Impact report)
# Tableau details 2012 growth strategy, gets semantic for visual analytics (Market development report)
# EMC integrates re-branded Hadoop distribution with Isilon NAS (Market development report)
# Quiterian seeks funding for new customer analytics in the cloud focus (Market development report)
# Hortonworks refines its commercial strategy for Apache Hadoop (Market development report)
# Digital Reasoning pledges to automate the analysis of complex data (Market development report)
And that’s the Data Day, today.
Tags: Actuate, Big Data, Hadoop, cormier, couchbase, DataStax, Digital Reasoning, EMC, greenplum, HANA, hortonworks, isilon, Kalido, Karmasphere, nPario, Oracle R Enterprise, pentaho, Quiterian, revolution analytics, SAP, SpagoBI, Sumo Logic, Tableau
February 2nd, 2012 — Matthew Aslett
Data management
Thanks to everyone who has already taken part in our survey exploring changing attitudes to MySQL following its acquisition by Oracle and examining the competitive dynamic between MySQL and other database technologies, including NoSQL and NewSQL.
The response has been great and even a quick look at the results makes for interesting reading, particularly in the light of our previous findings which indicated declining MySQL usage.
I am really looking forward to the opportunity to dive deep into the results and break out the figures, to get a better understanding of the potential impact of alternative MySQL distribution and support providers, as well as NoSQL and NewSQL, on continued usage of MySQL.
The survey results will be made freely available on our blogs, as well as being included in a long-form report containing our additional analysis and research related to the MySQL ecosystem and competitive dynamic.
Right now, however, is your last chance to contribute to the survey and get your voice heard. There are just 12 questions to answer, spread over four pages, and the entire survey should take no longer than five minutes to complete. All individual responses are of course confidential.
The survey will close in 24 hours.
Tags: MySQL, NewSQL, noSQL
January 31st, 2012 — Matthew Aslett
Data management
As expected, EMC has announced that it is integrating its Greenplum HD distribution of Apache Hadoop with its Isilon scale-out NAS technology. The move coincides with a re-branding of the company’s Hadoop distributions that, while slight, could prove significant.
Specifically, EMC has added the Hadoop Distributed File System (HDFS) as a native protocol supported by OneFS, alongside Network File System (NFS) and Common Internet File System (CIFS), enabling Isilon systems to provide the underlying storage layer for Hadoop processing as well as a common storage pool for Hadoop and other systems.
EMC is talking up the benefits of combining Isilon with Greenplum HD. For the record, that’s the Hadoop distribution previously known as Greenplum HD Community Edition, based on the Apache Hadoop 0.20.1 code branch.
Greenplum HD Enterprise Edition, based on MapR Technologies’ M5 distribution, is now known as Greenplum MR, and is not supported by Isilon because it replaces HDFS with Direct Access NFS.
EMC notes that Greenplum MR is being positioned as a high-performance Hadoop offering for customers that have failed to achieve their required performance from other distributions.
While EMC is quick to stress that it remains happy with the MapR relationship and committed to Greenplum MR, it’s clear that tight integration with Isilon, particularly in the EMC Greenplum DCA, will result in an expanded role for Greenplum HD.
Additionally, while the company’s Greenplum Command Center provides unified management for the Greenplum Database, Greenplum HD and Greenplum Chorus as part of the recently announced Unified Analytics Platform (UAP), MapR has its own management and monitoring functionality.
Given that we expect EMC to pitch the benefits of integrated software in UAP, and of integrated software and hardware in DCA, it is now clear that Greenplum HD, rather than Greenplum MR, is considered the company’s primary Hadoop distribution.
Given Greenplum HD’s starring role in the Unified Analytics Platform (UAP), Data Computing Appliance (DCA) and integration with Isilon, Greenplum MR’s role is likely to become increasingly niche.
Tags: EMC, greenplum, hadoop, isilon, MapR
January 30th, 2012 — Matthew Aslett
Data management
I put this slide together for my own benefit as I was trying to keep track of the various incarnations of Couchbase’s brands. Looks like I wasn’t the only one, so I thought I’d also make our perspective available.
There are a couple of differences between our slide and Koji Kawamura’s:
Ours contains an extra layer of names (e.g. “Elastic Couchbase”) that Couchbase briefly used in discussions and, I believe, in marketing, although never for a shipping product.
Also, ours doesn’t mention memcached. It could be on there given that Membase is based on it, and Couchbase Server can still be deployed in “memcached only mode”, but in that sense it is a feature of Membase/Couchbase Server. And anyway, I couldn’t fit it on.

Tags: couchdb, couchbase, membase, noSQL
January 27th, 2012 — Matthew Aslett
Data management
Tags: amazon, AWS Storage Gateway, big data, bigcouch, cloud foundry, Continuuity, Davos, enterprisedb, gooddata, Hadoop Summit, HBase, jaspersoft, MassTC, netezza, noSQL, Postgres Plus, RJMetrics, Seismic, Tenzing, Zimory
January 27th, 2012 — Matthew Aslett
Data management
451 Research yesterday announced that it has published its 2012 Previews report, an all-encompassing report highlighting the most disruptive and significant trends that our analysts expect to dominate and drive the enterprise IT industry agenda over the coming year.
The 93-page report provides an outlook and assessment across all 451 Research technology sectors and practice areas – including software infrastructure, cloud enablement, hosting, security, datacenter technologies, hardware, information management, mobility, networking and eco-efficient IT – with input from our team of 40+ analysts. The 2012 Previews report is available upon request here.
IM research director Simon Robinson has already provided a taster of our predictions as they relate to the information-centric landscape. Below I have outlined some of our core predictions related to the data-centric ecosystem:
The overall trend predicted for 2012 could best be described as the shifting focus from volume, velocity and variety to delivering value. Our concept of Total Data reflects the path from the velocity and variety of information sources to the all-important endgame of deriving value from data. We expect to see increased interest in data integration and analytics technologies and approaches designed specifically to exploit the potential benefits of ‘big data’, as well as mainstream adoption of Hadoop and other new sources of data.
We also anticipate, and are beginning to see, increased focus on technologies that enable access to data in different storage platforms without requiring data movement. We believe there is an emerging role for what we are calling the ‘data hub’ – an independent platform that is responsible for managing access to data on the various data storage and processing technologies.
Increased understanding of the value of analytics will also increase interest in the integration of analytics into operational applications. Embedded analytics is nothing new, but has the potential to achieve mainstream adoption this year as the dominant purveyors of applications used to run operations are increasingly focused on serving up embedded analytics as a key component within their product portfolios. Equally importantly, many of them now have database platforms capable of uniting previously disparate technologies to deliver true embedded analysis.
There has been a growing recognition over the past year or so that any type of data management project, whether focused on master data management (MDM), data or application integration, or data quality, needs to bring real benefits to business processes. Some may see this assertion as obvious and easy to achieve, but that’s not necessarily the case. It is, however, likely to become easier in the next 12-18 months as companies realize that a process-driven approach makes sense for most data management programs and vendors deliver capabilities to meet this demand.
While ‘big data’ presents a number of opportunities, it also poses many challenges, not the least of which is the lack of developers, managers, analysts and scientists with analytics skills. The users and investors placing a bet on the opportunities offered by new data management products are unlikely to be laughing if it turns out that they cannot employ people to deploy, manage and run those products, or analysts to make sense of the data they produce. It is therefore not surprising that the vendors that supply those technologies are investing in ensuring that there is a competent workforce to support existing and new projects.
Finally, while cloud computing may be one of the technology industry’s hot topics, it has had relatively little impact on the data management sector to date. That is not to say that databases are not available on cloud computing platforms, but we must make a distinction between databases that are deployed in public clouds and ‘cloud databases’ that have the potential to fulfil the role of emerging databases in building private and hybrid clouds. The former have been available for many years. The latter are just beginning to come to fruition based on NoSQL databases, as well as a new breed of NewSQL relational databases, designed to meet the performance, scalability and flexibility needs of large-scale data processing.
451 Research clients can get more details of these specific predictions via our 2012 preview – Information Management, Part 2. Non-clients can apply for trial access at the same link, while the entire 2012 Previews report is available here.
Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
Tags: big data, cloud database, data hub, embedded analytics, hadoop, process-driven, skills crisis, total data
January 27th, 2012 — Simon Robinson
Archiving, Collaboration, Content management, Data management, eDiscovery, Search, Text analysis
Every New Year affords us the opportunity to dust down our collective crystal balls and predict what we think will be the key trends and technologies dominating our respective coverage areas over the coming 12 months. We at 451 Research just published our 2012 Preview report; at almost 100 pages it’s a monster, but it offers some great insights across twelve technology subsectors, spanning from managed hosting and the future of cloud to the emergence of software-defined networking and solid state storage, and everything in between. The report is available to both 451 Research clients and non-clients (in return for a few details); access the landing page here. There’s a press release of highlights here.
Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
Here is a selection of key takeaways from the first part of the Information Management preview, which focuses on information governance, ediscovery, search, collaboration and file sharing. (Matt Aslett will be posting highlights of part 2, which focuses more on data management and analytics, shortly.)
- One of the most obvious common themes that will continue to influence technology spending decisions in the coming year is the impact of continued explosive data and information growth. This continues to shape new legal frameworks and technology stacks around information governance and e-discovery, as well as to drive a new breed of applications growing up around what we term the ‘Total Data’ landscape.
- Growing data volumes and increasingly distributed data are driving the need for more automation, and auto-classification capabilities will continue to emerge more successfully in the e-discovery, information governance and data protection veins. Indeed, we expect to see more intersection between these, as we noted in a recent post.
- The maturing of the cloud model – especially as it relates to file sharing and collaboration, but also from a more structured database perspective – will drive new opportunities and challenges for IT professionals in the coming year. Looks like 2012 may be the year of ‘Dropbox for the enterprise.’
- One of the big emerging issues that rose to the fore in 2011, and is bound to get more attention as the New Year proceeds, concerns the dearth of IT and business skills in some of these areas, without which the industry at large will struggle to harness and truly exploit the attendant opportunities.
- The changes in information management in recent years have encouraged (or forced) collaboration between IT departments, as well as between IT and other functions. Although this highlights that many of the issues here are as much about people and processes as they are about technology, the organizations able to leap ahead in 2012 will be those that can most effectively manage the interaction of all three.
- We also see more movement of underlying information management infrastructures into the applications arena. This is true with search-based applications, as well as in the Web-experience management vein, which moves beyond pure Web content management. And while Microsoft SharePoint continues to gain adoption as a base layer of content-management infrastructure, there is also growth in the ISV community that can extend SharePoint into different areas at the application-level.
There is a lot more in the report about proposed changes in the e-discovery arena, advances in the cloud, enterprise search, and the impact of mobile devices and bring-your-own-device (BYOD) on information management.
Tags: 451, big data, cloud, eDiscovery, information governance, SharePoint, total data
January 24th, 2012 — Matthew Aslett
Data management
Tags: ADVIZOR, amazon, big data, birst, couchbase, dynamoDB, hadoop, hadoop world, HP, MariaDB, membase, MySQL, NuoDB, paraccel, Recommind, SenseiDB, SkySQL, Sparsity, Splunk, Strata, vertica, vmware
January 23rd, 2012 — Matthew Aslett
Data management
If you’re a MySQL user, tell us about your adoption plans by taking our current survey.
Back in late 2009, at the height of the concern about Oracle’s imminent acquisition of Sun Microsystems and MySQL, 451 Research conducted a survey of open source software users to assess their database usage and attitudes towards Oracle.
The results provided an interesting snapshot of the potential implications of the acquisition and the concerns of MySQL users and even, so I am told, became part of the European Commission’s hearing into the proposed acquisition (used by both sides, apparently, which says something about both our independence and the malleability of data).
One of the most interesting aspects concerned the apparently imminent decline in the usage of MySQL. Of the 285 MySQL users in our 2009 survey, only 90.2% still expected to be using it two years later, and only 81.8% in 2014.
Some non-MySQL users expected to adopt the open source database after 2009, but the overall prediction was one of decline. While 82.1% of our sample of 347 open source users were using MySQL in 2009, only 78.7% expected to be using it in 2011, declining to 72.3% in 2014.
This represented an interesting snapshot of sentiment towards MySQL, but the result also had to be taken with a pinch of salt given the significant level of concern regarding MySQL’s future at the time the survey was conducted.
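The 2009 percentages can be cross-checked with some simple arithmetic. What follows is a minimal Python sketch, assuming each percentage applies to the base it is quoted against (the 285 MySQL users, or all 347 respondents) and rounding to whole respondents, so every count is approximate. The names `count` and `implied_new_adopters` are illustrative only. Under those assumptions, 82.1% of 347 reproduces the 285 MySQL users, and the gap between the overall figures and the retained-user figures points to roughly 16 new adopters by 2011 and 18 by 2014.

```python
"""Rough consistency check of the 2009 MySQL survey percentages.

Assumption: each percentage applies to the base it is quoted against,
and counts are rounded to whole respondents (so they are approximate).
"""

RESPONDENTS = 347        # open source users in the 2009 sample
MYSQL_USERS_2009 = 285   # of whom were using MySQL in 2009

# Share of the 285 existing MySQL users expecting to still use it.
retained_share = {2011: 0.902, 2014: 0.818}
# Share of all 347 respondents using (or expecting to use) MySQL.
overall_share = {2009: 0.821, 2011: 0.787, 2014: 0.723}


def count(share: float, base: int) -> int:
    """Convert a share of a base into an approximate head count."""
    return round(share * base)


def implied_new_adopters(year: int) -> int:
    """Expected MySQL users in `year` who were not using it in 2009."""
    total = count(overall_share[year], RESPONDENTS)
    retained = count(retained_share[year], MYSQL_USERS_2009)
    return total - retained


if __name__ == "__main__":
    # 82.1% of 347 should reproduce the 285 MySQL users quoted for 2009.
    print("2009 MySQL users:", count(overall_share[2009], RESPONDENTS))
    for year in (2011, 2014):
        print(year, "new adopters ~", implied_new_adopters(year))
```

Under the same assumptions, about 28 existing users expected to drop MySQL by 2011 (52 by 2014), outweighing the new adopters, which is why the overall prediction was one of decline.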

The survey also showed that only 17% of MySQL users thought that Oracle should be allowed to keep MySQL, while 14% of MySQL users were less likely to use MySQL if Oracle completed the acquisition.
That is why we are asking similar questions again, in our recently launched MySQL/NoSQL/NewSQL survey.
More than two years later Oracle has demonstrated that it did not have nefarious plans for MySQL. While its stewardship has not been without controversial moments, Oracle has also invested in the MySQL development process and improved the performance of the core product significantly. There are undoubtedly users that have turned away from MySQL because of Oracle but we also hear of others that have adopted the open source database specifically because of Oracle’s backing.
That is why we are now asking MySQL users to again tell us about their database usage, as well as attitudes to MySQL following its acquisition by Oracle. Since the database landscape has changed considerably since late 2009, we are now also asking about NoSQL and NewSQL adoption plans.
Is MySQL usage really in decline, or was the dip suggested by our 2009 survey the result of a frenzy of uncertainty and doubt over the imminent acquisition? Will our current survey confirm or contradict that result? If you’re a MySQL user, tell us about your adoption plans by taking our current survey.
Tags: MySQL, NewSQL, noSQL, Oracle, Survey
January 20th, 2012 — Simon Robinson
Archiving, eDiscovery, M&A
We commented recently on Symantec’s acquisition of cloud archiving specialist LiveOffice. The announcement also afforded Big Yellow an opportunity to unveil what it calls “Intelligent Information Governance,” an over-arching theme that provides the context for some of the product-level integrations it has been working on. For example, it just announced improved integration between its Clearwell eDiscovery suite and its on-premise archive software, Enterprise Vault (stay tuned for more on this following LegalTech later this month).
There’s clearly an opportunity to go deeper than product-level ‘integration,’ however. In a blog post, Symantec VP Brian Dye raised an issue that we have been seeing for a while, especially among some of our larger end-user clients. In the post, Brian discusses the fundamental contention that all of us – from individuals to corporations to governments — face around information governance — striking the right balance between control of information and freedom of information.
Software has emerged to help us manage this contention, most typically through data loss prevention (DLP) tools – to control what data does and doesn’t leave the organization — and eDiscovery and records management tools, to control what data is retained, and for how long. Brian noted that there is an opportunity to do much more here by linking the two sides of what is in many ways the same coin, for example by sharing the classification schemes used to define and manage critical and confidential information.
This is an idea that we have discussed at length internally, with some of our larger end-user clients, and with a good few security and IM vendors. Notably, many vendors responded by telling us that, though a good idea in principle, in reality organizations are too siloed to get value from such capabilities; DLP is owned and operated by the security team, while eDiscovery is managed by legal, records management and technology teams. While some of the end-users we have discussed this with are certainly siloed to a point, they are also working to address this issue by developing a more collaborative approach, establishing cross-functional teams, and so on.
A cynic would point out that some self-interest might be at play here too from a vendor perspective: why sell one integrated product to a company when you can sell it essentially the same technology twice? But of course, we’re not the remotest bit cynical (!) There is also the reality that at most large vendors, product portfolios have been put together at least in part by acquisitions. Security and e-discovery products may be sold separately because they are, in fact, separate products with little to no integration in terms of products or sales organizations. And vendors may not yet be motivated to do the hard integration work (technically, organizationally) if they are not seeing consistent enough demand from consolidated buying teams at large organizations.
Wendy Nather, Research Director of our security practice, notes that such integration is desirable:
- Users don’t WANT to have meta-thoughts about their data; they just want to get their work done, which is why it’s hard to implement a user-driven classification process for DLP or for governance. The alternative is a top-down implementation, and that would work even better with only one ‘top’ — that is, the security and legal teams working from the same integrated page.
However, Wendy also notes that such an approach is itself not without complexity:
- Confidential data can be highly contextual in nature (for example, when data samples get small enough to identify individuals, triggering HIPAA or FERPA); you need advanced analytics on top of your DLP to trigger a re-classification when this happens. Why, you might even call this Data Event Management (DEM).
It’s notable that Symantec is now starting to talk up the notion of a unified, or converged, approach to data classification. Of course, it is one of the better-positioned vendors to take advantage here, given its acquisitions in both DLP (Vontu in 2007) and eDiscovery (Clearwell in 2011), while LiveOffice adds some intriguing options for doing some of this in the cloud (especially if merged with its hosted security offerings from MessageLabs).
Nonetheless, we look forward to hearing more from Symantec — and others — about progress here through 2012. Indeed, if you are attending LegalTech in New York in a couple of weeks, then our eDiscovery analyst David Horrigan would love to hear your thoughts. Additionally, senior security analyst Steve Coplan will be taking a longer look at the convergence of data management and security in his upcoming report on “The Identities of Data.”
In other words, this is a topic that we’re expending a fair amount of energy on ourselves; watch this space!
Tags: cloud, DLP, eDiscovery, information governance, M&A