Entries from June 2012 ↓

Hadoop’s potential to revolutionise the IT industry

Platfora’s CEO Ben Werther recently wrote a great post explaining the benefits of Apache Hadoop and its potential to play a major role in a modern-day equivalent of the industrial revolution.

Ben highlights one of the important aspects of our Total Data concept, that generating value from data is about more than just the volume, variety, and velocity of ‘big data’, but also the way in which the user wants to interact with their data.

“What has changed – the heart of the ‘big data’ shift – is only peripherally about the volume of data. Companies are realizing that there is surprising value locked up in their data, but in unanticipated ways that will only emerge down the road.”

He also rightly points out that while Hadoop provides what is fast-becoming the platform of choice for storing all of this data, from an industrial revolution perspective we are still reliant on the equivalent of expert blacksmiths to make sense of all the data.

“Since every company of any scale is going to need to leverage big data, as an industry we either need to train up hundreds of thousands of expert blacksmiths (aka data scientists) or find a way into the industrialized world (aka better tools and technology that dramatically lower the bar to harnessing big data).”

This is a point that Cloudera CEO Mike Olson has been making in recent months. As he stated during his presentation at last month’s OSBC: “we need to see a new class of applications that exploit the benefits and architecture of Hadoop.”

There has been a tremendous amount of effort in the past 12-18 months to integrate Hadoop into the existing data management landscape, via the development of uni- and bi-directional connectors and translators that enable the co-existence of Hadoop with existing relational and non-relational databases and SQL analytics and reporting tools.

This is extremely valuable – especially for enterprises with a heavy investment in SQL tools and skills. As Larry Feinsmith, Managing Director, Office of the CIO, JPMorgan Chase pointed out at last year’s Hadoop World: “it is vitally important that new big data tools integrate with existing products and tools”.

This is why ‘dependency’ (on existing tools/skills) is an integral element of the Total Data concept alongside totality, exploration and frequency.

However, this integration of Hadoop into the established data management market really only gets the industry so far, and in doing-so maintains the SQL-centric view of the world that has dominated for decades.

As Ben suggests, the true start of the ‘industrial revolution’ will begin with the delivery of tools that are specifically designed to take advantage of Hadoop and other technologies and that bring the benefits of big data to the masses.

We are just beginning to see the delivery of these tools and to think beyond the SQL-centric perspective with analytics approaches specifically designed to take advantage of MapReduce and/or the Hadoop Distributed File System. This again though, signals only the end of the beginning of the revolution.

‘Big data’ describes the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies.

The true impact of ‘big data’ will only be realised once people and companies begin to change their behaviour, using this greater business intelligence gained from using tools specifically designed to exploit the benefits and architecture of Hadoop and other emerging data processing technologies, to alter business processes and practices.

Forthcoming Webinar: Embracing Big Data with Pentaho & 451 Research

One June 21, 2012 at 10:00 am PT I’ll be taking part in a live webinar with Pentaho to discuss practical approaches to tacking ‘big data’ challenges.

With Richard Daley, Founder and Chief Strategy Officer, Pentaho, we’ll be exploring a successful, agile approach that business users and data scientists are using to analyze and visualize data from Hadoop, NoSQL and relational sources.

In particular. I’ll be examining the approaches taken by a number of real companies to choosing between data management technologies for particular projects, and increasingly important decision for which there is no simple answer.

Attend this webinar to understand:

  • How big data fits into the total data fabric
  • Why there is a reliance on existing technologies and skills
  • The need to balance investment in those existing technologies and skills with the adoption of new techniques

Full registration details for the event can be found here.