7 Hadoop questions. Q6: Hadoop’s shortcomings

What are the major shortcomings of Hadoop? The answer to that question looks set to shape the future development roadmap for the open source data processing framework, which is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.


The limitations of Hadoop have been widely reported over the years, but as the Apache Hadoop community and related vendors have responded to issues such as reliability and high availability – not least via the now generally available Apache Hadoop 2 – so attention turns to other areas such as security, administration and performance, as well as more advanced functionality requirements, including graph processing, stream processing, improved SQL support and virtualization support.


The list of potential improvements is therefore fairly long, and as we near the end of our survey it is interesting to see that the key advances respondents are looking for in order to increase adoption of Hadoop are spread fairly widely across that list.

So far the responses to our Hadoop survey suggest administration tooling and performance top the list, followed by reliability, SQL support, and backup and recovery, but development tools and authentication and access control are not far behind.

EMC World redux, and EMC’s own ‘journey’

We recently attended EMC’s annual user confab – EMC World – in Boston. The 451 Group was there in force, with Kathleen Reidy and Katey Wood representing our Information Management agenda, as well as Henry Baltazar and myself on the storage side. Yes, it’s taken me longer than I thought to put some thoughts together – which I am attributing to the fact that I have been involved in the Uptime Institute’s Symposium in New York this week; an excuse that I am sticking to!

For our take on some of the specific product announcements that EMC made at the show, I would refer you to the reports we have already published (on V-Plex, Atmos, SourceOne, mid-range storage and Backup and Recovery Systems). But aside from these, I was struck by a few other broader themes at EMC that I think are worth commenting on further.

First, the unavoidable, and even overwhelming, high-level message at EMC World revolved around the ‘journey to the private cloud’ – in other words, how EMC is claiming to help customers move from where they are now to a future where their IT is more efficient, flexible and responsive to the business. Whether or not you believe the ‘private cloud’ message is the right one – and I talked with as many ‘believers’ as I did ‘skeptics’ – there’s no doubt that EMC has the proverbial ball and is running with it. I can’t think of many other single-vendor conferences that are as fully committed to cloud as EMC’s, and given EMC’s established position in the enterprise datacenter and a portfolio that spans virtualization, security and information management, you can understand why it has cloud religion.

But there undoubtedly is risk associated with such a committed position; I don’t believe ‘cloud’ will necessarily go the way of ‘ILM,’ for example, but EMC needs to start delivering tangible evidence that it really is helping customers achieve results in new and innovative ways.

Another issue EMC has to be careful about is its characterization of ‘virtualization’ versus ‘verticalization.’ This framing is designed to position EMC’s ‘virtualization’ approach as a more flexible and dynamic way of deploying a range of IT services and apps across best-of-breed ‘pools’ of infrastructure, in contrast to the vertical stacks that are being espoused by Oracle in particular.

Though I believe that a fascinating, even ideological, battle is shaping up here, it’s not quite so clear-cut as EMC would have you believe. What is a vBlock if not a vertically integrated and highly optimized storage, server, network and virtualization stack? And doesn’t the new vBlock announcement with SAP offer an alternative that is in many ways comparable with the Oracle ‘stack’ (especially if you throw in Sybase as well)? I get the difference between an Oracle-only stack and a more partner-driven alternative, but I think the characterization of virtualization as ‘good’ and verticalization as ‘bad’ is overly simplistic; the reality is much more nuanced, and EMC itself is embracing elements of both.

Speaking of journeys, it’s also clear to me that EMC is on a journey of its own, both in terms of the products it offers (and the way it is building them), and in terms of how it positions itself. EMC has always been a technology company that lets its products do the talking; but in an era where larger enterprises are looking to do business with fewer strategic partners, this isn’t always enough. Hence, the ‘journey to the private cloud’ is designed to help EMC position itself as a more strategic partner for its customers, while efforts such as the VCE (VMware, Cisco and EMC) coalition bring in the other infrastructure elements that EMC itself doesn’t offer. At the conference itself, much of the messaging was focused on how EMC can help deliver value to customers, and not just on the products themselves.

This approach is a rather radical change for EMC. Though it remains at its core a conservative organization, I think this more ‘holistic’ approach is evidence that two senior management hires EMC has made recently are starting to make their presence felt.

The first hire was that of COO Pat Gelsinger, an ex-Intel exec who has been brought in to assemble a plan to execute on the private cloud strategy. As well as a very strong technical pedigree, Gelsinger’s strength is the combination of an ability to conceive and articulate the big picture with an understanding of the tactical steps that are required to realize it, including product development, customer satisfaction and M&A. It seems to me that Gelsinger is already immensely respected within EMC, and is already regarded by some as CEO-in-waiting; a transition that would be a shoo-in should this strategy pay off.

The other key addition is that of ex-Veritas and Symantec CMO Jeremy Burton as EMC’s first chief marketing officer. To me, this appointment underscores EMC’s need to market itself both more aggressively and differently in order to maintain and grow its position in the market. Though Burton has only been in the job for a few weeks, we got a sense at EMC World of how he may reshape EMC’s public image; a more light-hearted approach to keynotes (some of which worked better than others, but you have to start somewhere!) bore Burton’s hallmarks, for example.

But if Burton came to EMC for a challenge, I think he has one; EMC’s reputation and brand in the large datacenter is solid, but it has work to do to build its image in the lower reaches of the market, an area that CEO Joe Tucci has highlighted as a major growth opportunity.

Although this is as much a product challenge as anything else, EMC must also carefully consider how it brands itself to this audience. Will an existing EMC brand – Clariion, Iomega or even Mozy – appeal to a smaller storage buyer, or should it come up with something entirely new? Given its disparate product set here, could an acquisition of an established smaller-end player provide it with an instant presence?

Then there’s the issue of direct marketing; today, EMC spends a fraction of what its rivals spend on advertising in the trade and business press. Given Burton’s background at Oracle and Symantec, plus the growing imperative for IT companies to appeal to the C-suite to reinforce their strategic relevance, could EMC soon be featuring on the back page of the Economist?

Upcoming presentation on virtualization and storage

I’m going to be presenting the introductory session at a BrightTalk virtual conference on March 25 on the role and impact of the virtual server revolution on the storage infrastructure. Although it’s been evident for some time that the emergence of server virtualization has had — and continues to have — a meaningful impact on the storage world, the sheer pace of change here makes this a worthwhile topic to revisit. As the first presenter of the event — the conference runs all day — it’s my job to set the scene; as well as introducing the topic within the context of the challenges that IT and storage managers face, I’ll outline a few issues that will hopefully serve as discussion points throughout the day.

Deciding which issues to focus on is actually a lot harder than it sounds, especially with only 45 minutes, because when you start digging into it, the impact of virtualization on storage is profound on just about every level: performance, capacity (and more importantly, capacity utilization), data protection and reliability, and management.

I’ll aim to touch on as many of these points as time allows, as well as provide some thoughts on the questions that IT and storage managers should be asking when considering how to improve their storage infrastructure to get the most out of an increasingly virtualized datacenter.

The idea is to make this a thought-provoking and interactive session. Register for the live presentation here: http://www.brighttalk.com/webcast/6907. After registering you will receive a confirmation email as well as a 24-hour reminder email. As a live attendee you will be able to interact with me by posing questions, which I will be able to answer on air. If you are unable to watch live, the presentation will remain available via the link above for on-demand viewing.