Entries from September 2011
September 29th, 2011 — Data management
The 451 Group is conducting a survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.
Created in conjunction with TheInfoPro, a division of The 451 Group focused on real-world perspectives on the IT customer, the survey contains fewer than 20 questions and does not ask for details of specific projects. It does cover data volumes and complexity, as well as attitudes towards emerging data management technologies for certain workloads, such as Hadoop and exploratory analytics, NoSQL and NewSQL.
In return for your participation, you will receive a copy of a forthcoming long-format report introducing Total Data, The 451 Group’s concept for explaining the changing data management landscape, which will include the results. Respondents will also have the opportunity to become members of TheInfoPro’s peer network.
The survey is expected to close in late October, and we also plan to provide a snapshot of the results in our presentation, The Blind Men and the Elephant, at Hadoop World in early November.
Many thanks in advance for your participation in this survey. We look forward to sharing the results with you. The survey can be found at http://bit.ly/451data
September 27th, 2011 — Data management
I’ll be taking our data management research out on the road in the next few months with a number of events, webinars and presentations.
On October 12 I’m taking part in the NoSQL Road Show Amsterdam, with Basho, Trifork and Erlang Solutions, where I’ll be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.
The following week, October 18, I’m taking part in the Hadoop Tuesdays series of webinars, presented by Cloudera and Informatica, specifically talking about the Hadoop Ecosystem.
The Apache Hadoop ecosystem will again be the focus of attention on November 8 and 9, when I’ll be in New York for Hadoop World, presenting The Blind Men and the Elephant.
Then it’s back to NoSQL with two more stops on the NoSQL Road Show, in London on November 29 and Stockholm on December 1, where I’ll once again be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.
I hope you can join us for at least one of these events, and I look forward to learning a lot about NoSQL and Apache Hadoop adoption, interest and concerns.
September 22nd, 2011 — Search
Customers of The 451 Group will have seen my report on the enterprise search market, published September 15. If you are a client, you can view it here. I thought it would be useful to provide a condensed version of the report to a wider audience, as I think the market is at an important point in its development and merits a broader discussion.
The enterprise search market is morphing before our eyes into something new. Portions of it are disappearing and others are moving into adjacent markets, but a core part of it will remain intact. We think a few key factors have caused this. Some are historical, by which we mean they had their largest effect in the past, although that effect is still being felt. Others are contemporary: they are having their largest impact now and will continue to do so in the short-term future (the next 12-18 months).
- Over-promising and under-delivery of intranet search between the last two US recessions, roughly between 2002 and 2007, resulting in a lot of failed projects.
- A lack of market awareness and understanding of the value and risk inherent in unstructured data.
- The entrance of Google into the market in 2002.
- The lack of vision by certain closely related players in enterprise content management (ECM) and business intelligence (BI).
- The lack of a clear value proposition for enterprise search.
- The rise of open source, in particular Apache Lucene/Solr.
- The emergence of big data, or total data.
- The social media explosion.
- The rapid spread of SharePoint.
- The acquisitive growth of Autonomy Corp.
- Acquisition of fast-growing players by major software vendors, notably Dassault Systemes, Hewlett-Packard and Microsoft.
The result of all this has been a split into roughly four markets, which we refer to as low-end, midmarket, OEM and high-end search-based applications.
The low-end, or entry-level, enterprise search market has become, if not commoditized, then pretty close to it. It is dominated by Google and open source. Most of the other commercial vendors that once played in it have left the market.
The result is that potential entry-level enterprise search customers are left with two very different choices: on the one hand, Google’s yellow search appliances, which have two-year-term licenses and somewhat limited configurability (but are truly plug-and-play); on the other, open source. It is a closed box versus a very open one, and each has its own equally enthusiastic customer base. Google is a very popular department-level choice, often purchased by line-of-business knowledge workers frustrated with obsolete and over-engineered search engines. Open source is, of course, popular with those who want to configure their search engine themselves, or have a service provider do it, and thus have a lot of control over how the engine works and the results it delivers. Apache Lucene is also part of many commercial, high-end enterprise search products, including those of IBM.
Mid-market search is a somewhat loosely defined area, in which vendors are succeeding with deals of roughly $75,000-250,000 selling intranet search. This area has thinned out as some vendors have tried to move upmarket into the world of search-based applications, but many vendors are still making a decent living here. SharePoint has had a major effect on this part of the market, though: if an enterprise already has SharePoint (and Microsoft reckons more than 70% have at least bought a license at some point), then it can be tough to offer a viable alternative. If SharePoint isn’t the main focus, however, there is still a decent business to be had offering effective enterprise search, often in specific verticals, albeit without a huge amount of vertical customization.
The OEM search business has become a lot more interesting recently, in part because of the vendors that have left it, leaving space for others. Microsoft’s acquisition of FAST in early 2008 meant that one of the two major OEM vendors of the time had essentially left the market, since its focus moved almost entirely to SharePoint, as we recently documented. The other major OEM vendor at the time was Autonomy, and while it would still describe itself as one, we think much of its OEM business in fact comes from document filters rather than from OEMing the IDOL search engine. Autonomy would strongly dispute that, but the point might soon be moot anyway: Autonomy now looks set to become part of Hewlett-Packard, following HP’s August 18 announcement that it would acquire the company at a huge valuation.
Those exits have left room for the rise of other vendors in the space. Key markets here include archiving, data-loss prevention and e-discovery. Many tools in these areas have old or quite basic search and text analysis functionality embedded in them, and vendors are looking for more powerful alternatives.
The high end of the enterprise search market has become, in effect, the market for search-based applications (SBAs), that is, applications built on top of a search engine rather than solely on a relational database (although they often work alongside a database). These were touted back in the early 2000s by FAST, but it was too early, and FAST was too complex a set of tools to give the notion widespread acceptance. In the latter part of the last decade and into this one, however, SBAs have emerged as an answer to the problem of generic intranet search engines getting short shrift from users, who are dissatisfied that those engines don’t deliver what they want, when they want it.
Until recently, SBAs have mainly been a case of vendors and their implementation partners building one-off custom applications for customers. But they are now moving to the stage where out-of-the-box user interfaces are being supplied for common tasks. In other words, the market is maturing in a similar way to the application software industry 20 years ago, which was built on the explosion in the use of relational databases.
We’ve seen examples in manufacturing, banking and customer service, and one of the key characteristics of SBAs is their ability to combine structured and unstructured data in a single interface. That was also the goal of earlier efforts to combine search with business intelligence tools, which often simply took the form of adding a search engine to a BI tool. That approach was too simplistic, and the idea didn’t really take off, in part because search vendors hadn’t paid enough attention to structured data.
But SBAs, which put much more focus on the indexing process than earlier efforts did, appear to be gaining traction. If search indexes came to be considered a better way of manipulating disparate data types than relational databases, that would be a major shift (see big data). Another key element of successful SBAs is that they don’t look like traditional search engines, with a large amount of white space and a search bar in the middle of the screen. Rather, they use facets and other navigation techniques to guide users through information, or often simply present the relevant information to them.
As I mentioned, there’s more in the full report, including more about specific vendors, total (or big) data and the impact of social media. If you’d like to know more about it, please get in touch with me.
September 7th, 2011 — Data management
While there has been a significant amount of interest in the volume, velocity and variety of big data (and perhaps a few other Vs, depending on who you speak to), it has become increasingly clear to us that the trends driving new approaches to data management relate not just to the nature of the data itself, but also to how the user wants to interact with the data.
As we previously noted, if you turn your attention to the value of the data then you have to take into account the trend towards storing and processing all data (or at least as much as is economically feasible), and the preferred rate of query (the acceptable time taken to generate the result of a query, as well as the time between queries). Another factor to be added to the mix is the way in which the user chooses to analyze the data: are they focused on creating a data model and schema to answer pre-defined queries, or engaging in exploratory analytic approaches in which data is extracted and the schema defined in response to the nature of the query?
All of these factors have significant implications for which technology is chosen to store and analyze the data, and another user-driven factor is the increased desire to use specialist data management technologies depending on the specific requirement. As we noted in NoSQL, NewSQL and Beyond, in the operational database world this approach has become known as polyglot persistence. Clearly though, in the analytic database market we are talking not just about approaches to storing the data, but also analyzing it. That is why we have begun using the term ‘polyglot analytics’ to describe the adoption of multiple query-processing technologies depending on the nature of the query.
Polyglot analytics explains why we are seeing adoption of Hadoop and MapReduce as a complement to existing data warehousing deployments. It explains, for example, why a company like LinkedIn might adopt Hadoop for its People You May Know feature while retaining its investment in Aster Data for other analytic use cases. Polyglot analytics also explains why a company like eBay would retain its Teradata Enterprise Data Warehouse for storing and analyzing traditional transactional and customer data while adopting Hadoop for storing and analyzing clickstream, user behaviour and other un/semi-structured data. On top of both, eBay has adopted an exploratory analytic platform based on Teradata’s Extreme Data Appliance for extreme analytics on a combination of transaction and user behaviour data pulled from its EDW and Hadoop deployments.
The emergence of this kind of exploratory analytic platform exemplifies the polyglot analytics approach of adopting a different platform based on the user’s approach to analytics rather than on the nature of the data. It also highlights some of the thinking behind Teradata’s acquisition of Aster Data, IBM’s acquisition of Netezza and HP’s acquisition of Vertica, as well as the potential future role of vendors such as ParAccel and Infobright.
We are about to embark on a major survey of data management users to assess their attitudes to polyglot analytics and the drivers for adopting specific data management/analytics technologies. The results will be delivered as part of our Total Data report later this year. Stay tuned for more details on the survey in the coming weeks.
September 7th, 2011 — eDiscovery
We have just published our annual report on the e-Discovery and e-Disclosure industries. This year we’ve subtitled the report ‘Crossing Clouds and Continents.’
This reflects a couple of the report’s main themes, which are directly related: the rise of cloud computing within e-Discovery, and its effect on those involved in e-Discovery, given how much easier it makes storing data in all sorts of locations. That, of course, raises the questions of who is responsible for that data and under which jurisdiction it falls. Other issues we focus on in the report include:
- Changes in the legal sector in the US & UK
- In-sourcing & out-sourcing of e-Discovery by corporations and law firms
- European e-Discovery
- Social media
- Bribery, corruption & fraud
- Products & technologies, mapped to EDRM and beyond
- User case studies in healthcare, law & government (financial regulators)
- M&A – both the recent surge and a look ahead to what’s next
- Profiles of 30+ software and service providers
To find out more about the report and how to get a copy, you can visit this page or contact me directly.