The Data Day, Today: Feb 29 2012

Microsoft and Hortonworks expand Hadoop partnership. Oracle ships Exalytics. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Hortonworks to Bring Apache Hadoop to Millions of New Users. Hortonworks and Microsoft expanded their relationship around Apache Hadoop.

* Oracle Announces Availability of Oracle Exalytics In-Memory Machine

* Fujitsu Releases “Interstage Big Data Parallel Processing Server V1.0” to Help Enterprises Utilize Big Data

* Pentaho and DataStax announce strategic partnership delivering the first complete Apache Cassandra-based big data analytics solution to the market

* Cloudant Names Andy Palmer to its Board of Directors

* R integrated throughout the enterprise analytics stack

* Jaspersoft Announces Big Data Index to Track Demand for Big Data Analytics

* 1010data Enables Companies to Rapidly Model and Predict Individual Consumer Behavior and Social Network Relationships

* Tableau Software Teams with Attivio to Tap Unstructured Content and Deliver Deeper Insight to Business Users

* Infochimps and the Future of Data Marketplaces “This is the clearest indication yet that data marketplaces may be the latest ‘Application Service Provider’ cycle, as in right idea, wrong time.”

* HStreaming and RainStor Partner to Lower the Cost of Big-Data Analytics on Hadoop

* JustOne Database Sets the Stage for Accelerated Growth in 2012 and Beyond

* Big Data investment map

* A group of Google Engineers released “vitess” – a project to help scale MySQL databases.

And that’s the Data Day, today.

A stupid question about in-memory analytics

During my first trip to Oracle OpenWorld as an analyst a few years ago I asked a room full of Oracle data-warehousing users whether any of them had explored the use-cases for other Oracle data management assets, such as the TimesTen in-memory database.

The question was met with complete silence before Ken Jacobs kindly suggested that perhaps this wasn’t the right crowd for that sort of question.

It was one of those moments that really haunts you. My first industry event as an analyst and I had embarrassed myself by asking an apparently stupid question in front of a room of more experienced colleagues and potential clients.

Doesn’t seem like such a stupid question now though, does it?

I have exorcised the demons! This house is clear.

The Data Day, Today: Feb 24 2012

Teradata partners with Hortonworks. New CEOs for Zettaset and VoltDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Teradata-Hortonworks Partnership to Accelerate Business Value from Big Data Technologies

* Skytree Unlocks the Advanced Analytics Power of Big Data with Unprecedented Performance, Scalability and Accuracy

* Big Data Innovator Zettaset Appoints Jim Vogt as New President and CEO

* Zettaset to Create Secure Hadoop with ‘SHadoop’ Initiative

* VoltDB Names Bruce Reading President and Chief Executive Officer

* Basho Unveils New Graphical Operations Dashboard, Diagnostics With Release of Riak 1.1

* Pervasive RushAnalyzer Launches ‘No Compromise’ Predictive Analytics for Hadoop and Big Data

* QlikTech Reveals Pricing for its QlikView Business Discovery Platform

* Kognitio Announces Completely Memory-Based Pricing

* Objectivity Adds New Plugin Framework, Integrated Visualizer And Support For Tinkerpop Blueprints To InfiniteGraph

* Announcing the Infochimps Platform for Big Data

* Big Data, Hadoop and StreamInsight

* Three New Cloud Providers join the MongoDB ecosystem

* Hadoop Has Promise but Also Problems

* Hortonworks: Reaffirming our Commitment to 100% Pure Open Source. Despite speculation to the contrary.

* WhySQL? Evernote explains why it continues to use SQL databases.

* More on database consistency. Anders Karlsson explains the different definitions of database consistency.

* Graphic proof of big demand for big data talent. Or just graphic proof of the use of the phrase ‘big data’ in job ads?

* Will ‘big data’ transform your industry?

And that's the Data Day, today.

Updated: sizing the big data problem: ‘big data’ is *still* the problem

In late 2010 I published a post discussing the problems with trying to size the ‘big data’ market, given the lack of clarity on the definition of the term and the technologies it applies to.

In that post we discussed a 2010 Bank of America Merrill Lynch report that estimated that ‘big data’ represented a total addressable market worth $64bn. This week Wikibon estimated that the big data market stands at just over $5bn in factory revenue growing to over $50bn by 2017, while Deloitte estimated that industry revenues will likely be in the range of $1-1.5bn this year.

To put that in perspective, Bank of America Merrill Lynch estimated that the total addressable market for ‘big data’ in 2010 was this

Wikibon estimates that the ‘big data’ market in 2012 is this

and Deloitte estimates that the ‘big data’ market in 2012 is this

UPDATE – IDC has become the first of the big analyst vendors to break out its big data abacuses (abaci?). IDC thinks the ‘big data’ market in 2010 was $3.2bn. That’s this

Not surprisingly they came to their numbers by different means. BoA added up market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.

Wikibon came to its estimate by adding up revenue associated with a select group of technologies and a select group of vendors, while Deloitte added up revenue estimates for database, ERP and BI software, reduced the total by 90% to reflect the proportion of data warehouses with more than five terabytes of data, and reduced that total by 80-85% to reflect the low level of current adoption.
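To see how steep Deloitte’s two reductions are, here is a minimal Python sketch of the arithmetic as described above. The function name is ours, and it assumes the second reduction is applied to whatever is left after the first. Deloitte’s starting total is not given in this post, so the sketch works in fractions of that total only.

```python
def remaining_share(first_cut, second_cut):
    """Fraction of a starting total left after two successive percentage cuts.

    Assumption: the second cut applies to what remains after the first.
    """
    return (1 - first_cut) * (1 - second_cut)


# 90% cut for data warehouse size, then an 80-85% cut for low adoption.
low = remaining_share(0.90, 0.85)
high = remaining_share(0.90, 0.80)

print(f"{low:.1%} to {high:.1%} of the starting total")
```

In other words, roughly 1.5% to 2% of the combined database, ERP and BI software estimate ends up counted as ‘big data’.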

IDC, meanwhile, went through a slightly tortuous route of defining the market based on the volume of data collected, OR deployments of ultra-high-speed messaging technology, OR rapidly growing data sets, AND the use of scale-out architecture, AND the use of two or more data types OR high-speed data sources.

There is something to be said for each of these definitions. But equally each can be easily dismissed. We previously described our issues with the all-inclusive nature of the BoA numbers, and while we find Wikibon’s process much more agreeable, some of the individual numbers they have come up with are highly questionable. Deloitte’s methodology is surreal, but defensible. IDC’s just illustrates the problem:

What this highlights is that the essential problem is the lack of definition for ‘big data’. As we stated in 2010: “The biggest problem with ‘big data’… is that the term has not been – and arguably cannot be – defined in any measurable way. How big is the ‘big data’ market? You may as well ask ‘how long is a piece of string?’”

The Data Day, Today: Feb 17 2012

Rob Bearden is new Hortonworks CEO. Oracle updates MySQL Cluster. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* With a new CEO, will Hortonworks get serious about Hadoop? Rob Bearden is Hortonworks’ new chief elephant-herder.

* Oracle Announces General Availability of MySQL Cluster 7.2. Claims 1 billion queries per minute.

* Vertica Extends Manageability and Ease-of-Use for the Vertica Analytics Platform.

* NGDATA Raises Capital to Accelerate Growth. Belgian data management company confirms that it recently acquired Outerthought.

* QlikTech Announces Fourth Quarter and Full Year 2011 Financial Results

* Quest Business Intelligence Studio 1.0 is now Generally Available

* Jaspersoft Delivers Big Data Integration into Jaspersoft ETL

* Hortonworks University Launches to Deliver Comprehensive Apache Hadoop Training and Certification.

* Schema in Cassandra 1.1. “as systems deployed on Cassandra grew and matured, lack of schema became a pain point”

* GigaSpaces Announces New Cloudify Free Product Edition.

* Announcing Reduced Pricing on SQL Azure and New 100MB Database Option.

* Composite Software Continues Innovation with Release of Version 6.1 of its Data Virtualization Platform.

* Multi-Tenant Cloudant in Europe.

* NuvolaBase has launched its hosted graph database offering.

* How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. Big data’s beer and diapers equivalent.

* Beyond “Big Data”. “I have a theory that buzzwords are usually helpful in general, in that they usher in new concepts before they end up as meaningless marketing fluff–and, eventually, punchlines. I think this is in the process of happening right now with the term ‘big data’.”

And that's the Data Day, today.

The Data Day, Today: Feb 14 2012

Teradata closes best year ever. NetApp and EMC propose big data forum. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Teradata Announces 2011 Fourth Quarter and Full-Year Results (PDF)

* Hell Has Not Frozen Over: NetApp and EMC Combine to Educate for Big Data Standards

* Cray Forms New Big Data Division, Hires New General Manager

* Privacy in the Age of Big Data

* ScaleBase Unveils New Elastic Load Balancing Feature at Cloud Connect

* Introducing CDH4

* Lucid Imagination “Search-as-a-Service” Powers Flexible, Cost-Effective Enterprise-Wide Data Discovery

* Couchbase Survey Shows Accelerated Adoption of NoSQL in 2012

* Open Source OData Tools for MySQL and PHP Developers

* New Release of WhereScape’s Data Warehouse Development Environment Enables Cross-Platform Database Appliance Support

* On MongoDB, SQL and ACID

And that's the Data Day, today.

Webinar: Scaling Big-Data Applications with NewSQL Database Solutions

Next week – February 15, 2012, at 10am PT to be precise – I’ll be taking part in a webinar with Clustrix to discuss scaling big database applications with NewSQL databases.

I’ll be providing an overview of the origins of the term NewSQL, its adoption, and a discussion of the core technologies that we see fitting into the category, as well as an explanation of the key considerations for choosing between NoSQL and NewSQL databases.

Other webinar participants include Robin Purohit, Clustrix president and CEO, and Aaron Passey, Clustrix CTO. There will also be a presentation by Massive Media, which recently announced that it has used Clustrix to build and grow Twoo, a social networking site, to more than four million users in only six months without sharding the application and without any downtime.

Register for the event here.

The Data Day, Today: Feb 8 2012

SAP targets HANA at SMEs. WibiData raises $5m. Zimory acquires Sones devs. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* SAP to Arm Small and Midsize Enterprises With Real-Time Analytics Powered by SAP HANA

* Hadoop startup WibiData raises $5M to power web analytics

* Zimory Acquires Database Development Team from Sones

* Oracle Announces Availability of Oracle Advanced Analytics for Big Data

* Kalido Fuels Growth with New Customers, Market Leading Data Governance Capabilities in 2011

* Xeround Announces Free Version of Popular Cloud Database

* Hypertable Inc. Announces New Products and Services for Next Generation Hadoop NoSQL Database Deployments

* Cloudera Connector for Tableau Has Been Released

* Information Builders Launches WebFOCUS Hyperstage to Speed Performance and Delivery of Business Intelligence

* Actian Releases Vectorwise Workgroup Edition, Claims Best in Affordable Big Data Analytics to Mid-Market

* 10gen and Carahsoft Partner to Bring Leading NoSQL Solution to Government Sector

* MySQL progress in a year

* Endeca CEO: We wanted IPO, but Oracle acquisition gave peace of mind

And that's the Data Day, today.

The Data Day, Today: Feb 3 2012

New CEO at Revolution. Pentaho goes big data. EMC Hadoop gets Isilon. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Revolution Analytics Names David Rich New CEO

* Pentaho Open Sources Big Data Capabilities to Further Fuel Widespread Adoption

* EMC Isilon is Industry’s First Scale-Out NAS System with Native Hadoop Support

* Actuate Reports Fourth Quarter and Fiscal Year 2011 Financial Results

* Sumo Logic Raises $15M Series B Round for Next Generation Log Management and Analytics

* Announcing Oracle R Enterprise 1.0

* Paul Cormier Joins Hortonworks’ Board of Directors

* DataStax Launches First Complete Solution for Cassandra Development on Windows and Mac

* Latest Release of Kalido Information Engine Eliminates Data Mart Migration and Consolidation Hassles

* Karmasphere Brings More Power, Collaboration, and Faster Insights to Big Data Analytics Teams on Hadoop

* Why Big Data Won’t Make You Smart, Rich, Or Pretty

* SAP HANA – slowly moving out of hype into actual projects

And that's the Data Day, today.

Last chance to take part in our MySQL/NoSQL/NewSQL survey

Thanks to everyone who has already taken part in our survey exploring changing attitudes to MySQL following its acquisition by Oracle and examining the competitive dynamic between MySQL and other database technologies, including NoSQL and NewSQL.

The response has been great and even a quick look at the results makes for interesting reading, particularly in the light of our previous findings which indicated declining MySQL usage.

I am really looking forward to taking a deep dive into the results and breaking out the figures to get a better understanding of the potential impact of alternative MySQL distribution and support providers, as well as NoSQL and NewSQL, on continued usage of MySQL.

The survey results will be made freely available on our blogs, as well as being included in a long format report containing our additional analysis and research related to the MySQL ecosystem and competitive dynamic.

Right now, however, is your last chance to contribute to the survey and get your voice heard. There are just 12 questions to answer, spread over four pages, and the entire survey should take no longer than five minutes to complete. All individual responses are of course confidential.

The survey will close in 24 hours.