March 19th, 2012 — Data management
I’m gearing up for a busy few weeks of international travel with presentations in Europe and on both the east and west coasts of the US.
It all starts on March 28 when I’ll be heading to London for Cassandra Europe 2012 where I’m looking forward to attending a packed schedule of Apache Cassandra case studies. Later in the day I’ll be essentially improvising a presentation combining our view of the state of the NoSQL market with an overview of highlights from the case studies stream for those who have attended the workshop stream.
The following week is HCTS EU, 451 Research’s own event in London, which takes place on April 2-3 and is Europe’s go-to convergence event for CIOs, cloud decision makers, vendors and investors. On April 3 I’ll be presenting our ‘Big Data’ Survival Guide, explaining the importance of ‘big data’ (what it is, what it isn’t and why you should care), as well as 451 Research’s associated concept of Total Data, designed to enable the realisation of valuable business intelligence from ‘big data’.
After a quick trip to California for an analyst event I’ll be heading for Zurich for a couple of events where I’ll be explaining our perspective on the development and adoption of NoSQL and NewSQL databases, including some insights from our forthcoming long format report on the competitive dynamic between MySQL, NoSQL and NewSQL. Specifically, I’ll be presenting at the ESE Conference on April 25, followed by the NoSQL Road Show on April 26.
Then I’m off to Washington DC to attend MarkLogic World, where I’ll be appearing on a panel with other analysts on May 2 to discuss the impact and implications of ‘big data’.
At some point during all this traveling I’ll be completing the forthcoming long format report on the competitive dynamic between MySQL, NoSQL and NewSQL, hopefully before I’m back in California for OSBC, where I’m scheduled to present our findings on May 21.
Look out also for details of a couple of webinars currently being scheduled for between now and the end of May.
And then I’m going on holiday.
March 13th, 2012 — Data management
March 8th, 2012 — Data management
Microsoft launches SQL Server 2012. MapR integrates with Informatica. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Microsoft Releases SQL Server 2012 to Help Customers Manage “Any Data, Any Size, Anywhere”
* SQL Server 2012 Released to Manufacturing
* SAS Access to Hadoop Links Leading Analytics, Big Data
* MapR And Informatica Announce Joint Support To Deliver High Performance Big Data Integration And Analysis
* Teradata Expands Integrated Analytics Portfolio
* New Teradata Platform Reshapes Business Intelligence Industry
* Microsoft’s Trinity: A graph database with web-scale potential
* KXEN Announces Availability of InfiniteInsight Version 6, a Predictive Analytics Solution with Unprecedented Agility, Productivity, and Ease of Use
* Software AG Announces its Strategy for the In-memory Management of Big Data
* Attunity and Hortonworks Announce Partnership to Simplify Big Data Integration with Apache Hadoop
* Schooner Information Technology and Ispirer Systems Partner to Deliver SQLWays for SchoonerSQL
* Big Data & Search-Based Applications
* Namenode HA Reaches a Major Milestone
* How Twitter is doing its part to democratize big data
* Dropping Prices Again– EC2, RDS, EMR and ElastiCache
* For 451 Research clients
# SAS outlines Hadoop strategy, previews Hadoop-based in-memory analytics (Market Development report)
# Pervasive rides the elephant into ‘big data’ predictive analytics (Market Development report)
# IBM makes desktop discovery and analysis play, shares business analytics priorities (Market Development report)
# Clustrix launches SDK to tap developer interest in new databases (Market Development report)
# Continuent and SkySQL team up for clustered MySQL support (Analyst note)
# MapR gets a boost from Cisco and Informatica (Analyst note)
And that’s the Data Day, today.
March 8th, 2012 — Data management
We recently speculated that EMC Greenplum’s focus on the integration of its Greenplum HD Hadoop distribution with its Data Computing Appliance (DCA) and Isilon storage technology would mean an increasingly niche role for Greenplum MR, the Hadoop distribution based on MapR’s M5.
Two recent announcements indicate that niche might continue to be a lucrative one for MapR, however. First, Cisco released details of a reference architecture for deploying Greenplum MR on Cisco’s UCS servers. Then Informatica announced a partnership with MapR to jointly support Informatica’s Data Integration Platform running on MapR’s distribution for Hadoop.
The Informatica relationship also covers bi-directional data integration with Informatica PowerCenter and Informatica PowerExchange, snapshot replication using Informatica FastClone, and data streaming into MapR’s distribution via NFS using Informatica Ultra Messaging. In addition, the free Informatica HParser Community Edition will be available for download as part of the MapR distribution.
While the partnership with Informatica is a direct one for MapR, the Cisco reference architecture announcement illustrates that the benefit MapR gains from its relationship with EMC Greenplum includes exploiting EMC’s leverage with potential partners.
March 2nd, 2012 — Data management
February 29th, 2012 — Data management
February 28th, 2012 — Data management
During my first trip to Oracle OpenWorld as an analyst a few years ago I asked a room full of Oracle data-warehousing users whether any of them had explored the use-cases for other Oracle data management assets, such as the TimesTen in-memory database.
The question was met with complete silence before Ken Jacobs kindly suggested that perhaps this wasn’t the right crowd for that sort of question.
It was one of those moments that really haunt you. It was my first industry event as an analyst, and I had embarrassed myself by asking an apparently stupid question in front of a room of more experienced colleagues and potential clients.
Doesn’t seem like such a stupid question now though, does it?
I have exorcised the demons! This house is clear.
February 24th, 2012 — Data management
February 22nd, 2012 — Data management
In late 2010 I published a post discussing the problems involved in trying to size the ‘big data’ market, given the lack of clarity about the definition of the term and the technologies it applies to.
In that post we discussed a 2010 Bank of America Merrill Lynch report that estimated that ‘big data’ represented a total addressable market worth $64bn. This week Wikibon estimated that the big data market stands at just over $5bn in factory revenue growing to over $50bn by 2017, while Deloitte estimated that industry revenues will likely be in the range of $1-1.5bn this year.
To put that in perspective, Bank of America Merrill Lynch’s estimate of the total addressable market for ‘big data’ in 2010 ($64bn) is more than 12 times Wikibon’s estimate of the 2012 market (just over $5bn), and between about 43 and 64 times Deloitte’s 2012 estimate ($1-1.5bn).
UPDATE – IDC has become the first of the big analyst vendors to break out its big data abacuses (abaci?). IDC thinks the ‘big data’ market in 2010 was $3.2bn, one-twentieth of the Bank of America Merrill Lynch figure for the same year.
Not surprisingly, they arrived at their numbers by different means. BoA added up market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.
Wikibon came to its estimate by adding up revenue associated with a select group of technologies and a select group of vendors, while Deloitte added up revenue estimates for database, ERP and BI software, reduced the total by 90% to reflect the proportion of data warehouses with more than five terabytes of data, and reduced that total by 80-85% to reflect the low level of current adoption.
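Deloitte’s two reductions compound: cutting a total by 90% keeps 10% of it, and cutting that by 80-85% keeps 15-20% of the remainder, so the final figure is only 1.5-2% of the starting revenue total. A minimal sketch of that arithmetic follows; the function name and the $100bn input are hypothetical, since Deloitte’s actual base figure is not given here.

```python
from typing import Tuple


def deloitte_style_estimate(base_revenue_bn: float) -> Tuple[float, float]:
    """Apply the two reductions described above to a base revenue total.

    base_revenue_bn is a hypothetical combined database, ERP and BI
    software revenue figure in $bn; it is not Deloitte's actual input.
    Returns the (low, high) estimate in $bn.
    """
    # Step 1: reduce by 90%, keeping the 10% attributed to
    # data warehouses holding more than five terabytes.
    large_dw_share = base_revenue_bn * (1 - 0.90)
    # Step 2: reduce by a further 80-85% for low current adoption,
    # keeping 15% (low end) to 20% (high end) of the step-1 figure.
    low = large_dw_share * (1 - 0.85)
    high = large_dw_share * (1 - 0.80)
    return low, high


low, high = deloitte_style_estimate(100.0)
print(f"${low:.1f}bn to ${high:.1f}bn")  # 1.5% to 2% of the base
```

Whatever base total goes in, the output is 1.5-2% of it, which is why a methodology built on a large revenue pool can still produce a small market estimate.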
IDC, meanwhile, went through a slightly tortuous route of defining the market based on the volume of data collected, OR deployments of ultra-high-speed messaging technology, OR rapidly growing data sets, AND the use of scale-out architecture, AND the use of two or more data types OR high-speed data sources.
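Read as a boolean test, IDC’s definition appears to group into three clauses that must all hold. The sketch below reflects our reading of that grouping, not IDC’s published criteria: IDC’s quantitative thresholds are not given here, so each condition is reduced to a yes/no flag, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical yes/no flags describing a single deployment."""
    large_data_volume: bool
    ultra_high_speed_messaging: bool
    rapidly_growing_data: bool
    scale_out_architecture: bool
    two_or_more_data_types: bool
    high_speed_data_sources: bool


def counts_as_big_data(d: Deployment) -> bool:
    # Clause 1: any one of volume, high-speed messaging or rapid growth.
    volume_or_velocity = (
        d.large_data_volume
        or d.ultra_high_speed_messaging
        or d.rapidly_growing_data
    )
    # Clause 3: either two or more data types, or high-speed data sources.
    variety_or_speed = d.two_or_more_data_types or d.high_speed_data_sources
    # All three clauses must hold, with clause 2 being scale-out architecture.
    return volume_or_velocity and d.scale_out_architecture and variety_or_speed


# A large, scale-out deployment with a single, slow data source does not qualify.
print(counts_as_big_data(Deployment(True, False, False, True, False, False)))
```

Under this reading, a deployment can hold very large volumes of data on a scale-out architecture and still fall outside the market if it handles only one data type from slow sources.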
There is something to be said for each of these definitions. But equally each can be easily dismissed. We previously described our issues with the all-inclusive nature of the BoA numbers, and while we find Wikibon’s process much more agreeable, some of the individual numbers they have come up with are highly questionable. Deloitte’s methodology is surreal, but defensible. IDC’s simply illustrates the problem.
What this highlights is that the essential problem is the lack of definition for ‘big data’. As we stated in 2010: “The biggest problem with ‘big data’… is that the term has not been – and arguably cannot be – defined in any measurable way. How big is the ‘big data’ market? You may as well ask ‘how long is a piece of string?’”
February 17th, 2012 — Data management