Previewing Information Management in 2012

Every New Year affords us the opportunity to dust down our collective crystal balls and predict what we think will be the key trends and technologies dominating our respective coverage areas over the coming 12 months. We at 451 Research just published our 2012 Preview report; at almost 100 pages it’s a monster, but offers some great insights across twelve technology subsectors, spanning from managed hosting and the future of cloud to the emergence of software-defined networking and solid state storage, and everything in between. The report is available to both 451 Research clients and non-clients (in return for a few details); access the landing page here. There’s a press release of highlights here. Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…

Here is a selection of key takeaways from the first part of the Information Management preview, which focuses on information governance, e-discovery, search, collaboration and file sharing. (Matt Aslett will be posting highlights of part 2, which focuses more on data management and analytics, shortly.)

  • One of the most obvious common themes that will continue to influence technology spending decisions in the coming year is the impact of continued explosive data and information growth.  This  continues to shape new legal frameworks and technology stacks around information governance and e-discovery, as well as to drive a new breed of applications growing up around what we term the ‘Total Data’ landscape.
  • Data volumes and distributed data drive the need for more automation, and auto-classification capabilities will continue to emerge more successfully in the e-discovery, information governance and data protection veins; indeed, we expect to see more intersection between these, as we noted in a recent post.
  • The maturing of the cloud model – especially as it relates to file sharing and collaboration, but also from a more structured database perspective – will drive new opportunities and challenges for IT professionals in the coming year.  Looks like 2012 may be the year of ‘Dropbox for the enterprise.’
  • One of the big emerging issues that rose to the fore in 2011, and is bound to get more attention as the New Year proceeds, is around the dearth of IT and business skills in some of these areas, without which the industry at large will struggle to harness and truly exploit the attendant opportunities.
  • The changes in information management in recent years have encouraged (or forced) collaboration between IT departments, as well as between IT and other functions. Although this highlights that many of the issues here are as much about people and processes as they are about technology, the organizations able to leap ahead in 2012 will be those that can most effectively manage the interaction of all three.
  • We also see more movement of underlying information management infrastructures into the applications arena.  This is true with search-based applications, as well as in the Web-experience management vein, which moves beyond pure Web content management.  And while Microsoft SharePoint continues to gain adoption as a base layer of content-management infrastructure, there is also growth in the ISV community that can extend SharePoint into different areas at the application-level.

There is a lot more in the report about proposed changes in the e-discovery arena, advances of the cloud, enterprise search and impact of mobile devices and bring-your-device-to-work on information management.

Our Total Data report is now totally available

…and it’s totally awesome.

Data volumes are exploding. Enterprises need better techniques to analyze, for example, IT management data or customer behavior statistics. The term ‘big data’ has emerged to describe new data management challenges posed by the growing volume, variety and velocity of data being produced by interactive applications and websites, as well as sensors, meters and other data-generating machines.

Our term ‘Total Data’ denotes a broad approach to data management that makes use of all available data, regardless of where it resides, to improve the efficiency and accuracy of business intelligence.

Total Data describes how users are deploying specialist data management technologies to maximize the benefit from individual operational or analytic workloads, while avoiding the creation of data silos by applying a unified approach to management that enables efficient data movement and integration.

This report examines the trends behind big data, as well as the new and existing technologies used to store and process this data, and outlines a Total Data management approach that is focused on selecting the most appropriate data storage and processing technology to deliver value from big data.

For more details of our Total Data report, and how to get it, see this page.

Valeriy Lobanovskyi: soccer manager… big data visionary

The increased focus on the value of data, combined with the recent release of Moneyball, has focused much attention on Oakland Athletics general manager Billy Beane and his successful use of data to improve performance.

Beane was by no means the first to realize the potential use of data in sports, however. That title could arguably go to Valeriy Lobanovskyi, manager of the Dynamo Kyiv soccer team between 1974 and 1990.

Lobanovskyi’s name is unlikely to be well known to even the most ardent football fans but our research into Total Football as an inspiration for our total data concept has highlighted the fact that Lobanovskyi was as much a big data visionary as he was a footballing visionary.

Total football is most readily associated with Rinus Michels and his teams: Ajax of Amsterdam, Barcelona, and the Dutch national side of the 1970s. But while Michels was busy winning Dutch league titles and European Cups, Lobanovskyi was similarly busy at Dynamo Kyiv, winning the Soviet League eight times, the Ukrainian league five times, and the European Cup Winners’ Cup twice with an approach known as Universality.

Describing the concept of Universality, Lobanovskyi once stated that “the most important thing in football is what a player is doing on a pitch when he is not in possession of the ball.”

Total football devotees will recognize the description, and as Hortonworks co-founder Arun C Murthy recently noted, Lobanovskyi arguably deserves as much credit as Michels for coming up with what would eventually become known as total football.

So far, so football visionary. What separates Lobanovskyi from Michels is the fact that he based much of his vision on data, and the analysis of data. Originally trained as an engineer, Lobanovskyi saw the potential value of a scientific, data-led approach to sport.

Together with statistician Anatoliy Zelentsov, Lobanovskyi devised a method of recording and analyzing the events and actions in a game of football and using it to provide players with a statistical analysis of their performance and set targets designed to meet the style he wanted the team to play (squeezing, pressing, or combination).

“All life,” Lobanovskyi once said, “is a number”.

An example of Lobanovskyi and Zelentsov’s targets, as explained in Inverting the Pyramid: A History of Football Tactics, by Jonathan Wilson, is displayed below:

To put this in some context, Lobanovskyi was using statistics and data as a means of gaining competitive advantage in sport 20 years before the formation of Opta Sports and Prozone, and almost 30 years before Beane and the 2002 Oakland Athletics.

Clients can read more about Total Football, and our description of approaches to data management in an era of ‘big data’, in our Total Data report, to be released in the coming days.

The geographic distribution of NoSQL skills – just one more thing

Hidden away amongst the details of our little tour around LinkedIn statistics on NoSQL and Hadoop skills was some interesting information on how many LinkedIn members list the various data management technologies in our sample in their profiles.

Our original post contained the fact that there were 9,079 LinkedIn members with “Hadoop” in their member profiles, for example, compared to 366,084 with “MySQL” in their member profiles.

Later posts showed there were 170 with “Membase” and 1,687 with “HBase”, 787 with “Apache Cassandra” and 376 with “Riak”, 6,048 with “MongoDB” and 2,152 with “Redis”, and finally, 1,844 with “CouchDB” and 268 with “Neo4j”.

This gives us an interesting perspective on the relative adoption of the various NoSQL databases:

If it wasn’t already obvious from the list above, the chart illustrates just how much more prevalent MongoDB skills are compared to the other NoSQL databases, followed by Redis, Apache CouchDB, Apache HBase and Apache Cassandra. The chart also illustrates that while HBase is the second most prevalent NoSQL skill set in the USA, it is only fourth overall given its lower prevalence in the rest of the world.

In response, a representative from a certain vendor notes “Some skills are more valued not because they are more prevalent, but because they are harder to achieve.” Make of that what you will.
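The relative prevalence described above can be reproduced from the counts quoted in this post. The snippet below is only an illustrative sketch: the variable and function names are our own, and because a member may list more than one database, the shares are shares of profile mentions across the sample rather than of distinct people.

```python
# LinkedIn profile counts quoted in this series of posts (as of December 1).
nosql_counts = {
    "Membase": 170,
    "HBase": 1687,
    "Apache Cassandra": 787,
    "Riak": 376,
    "MongoDB": 6048,
    "Redis": 2152,
    "CouchDB": 1844,
    "Neo4j": 268,
}


def rank_by_prevalence(counts):
    """Return (name, count, share_of_mentions) tuples, most prevalent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, n / total) for name, n in ranked]


ranking = rank_by_prevalence(nosql_counts)
for name, n, share in ranking:
    # Share is relative to all mentions in this sample, not to all LinkedIn members.
    print(f"{name:<17}{n:>6,}  {share:6.1%}")
```

Sorting the quoted counts gives the same overall order as the text: MongoDB, Redis, CouchDB, HBase, then Apache Cassandra.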

The geographic distribution of NoSQL skills: CouchDB and Neo4j

Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

We’ve already taken a look at Membase and HBase; Apache Cassandra and Riak; and 10gen’s MongoDB and Redis.

Part four brings the series to a close with a look at Apache CouchDB and Neo4j, which boast the most geographically diverse adoption of the NoSQL databases in our sample.

The statistics showed that 36.4% of the 1,844 LinkedIn members with “CouchDB” in their member profiles are based in the US, while only 8.9% are in the Bay area, the lowest proportion of any of the NoSQL databases we looked at.

The results also indicate that the UK is a particularly strong area for CouchDB skills, with 7.1%. Other hot-spots include Canada (4.1%), Germany (4.0%) and The Netherlands (3.1%).

Neo4j skills are even more widely spread geographically, with only 36.2% of the 268 LinkedIn members with “Neo4j” in their member profiles based in the US, although 10.4% are in the Bay area.

With 4.1%, Sweden is a hot-spot for Neo4j skills, as one might expect given that’s where it and Neo Technology originated. The UK is also strong with 9.7%, followed by India with 5.6% and the New York area with 4.9%.

Since Neo4j originated in Europe it is of course an open question whether its higher adoption in the Rest of the World than the US is a sign of a greater spread of adoption, or a relative failure to infiltrate the US market. Given that the company already has an active presence in the US we are inclined towards the former.
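To give a feel for the sample sizes behind these percentages, the sketch below converts the quoted shares back into approximate member counts. The helper name is our own, and the results are estimates only, since the published percentages are rounded to one decimal place.

```python
def approx_members(total: int, percent: float) -> int:
    """Approximate head count behind a published percentage of `total`."""
    return round(total * percent / 100)


# Totals and percentages quoted in this post.
couchdb = {
    "total": 1844,
    "shares": {"US": 36.4, "Bay area": 8.9, "UK": 7.1,
               "Canada": 4.1, "Germany": 4.0, "Netherlands": 3.1},
}
neo4j = {
    "total": 268,
    "shares": {"US": 36.2, "Bay area": 10.4, "UK": 9.7,
               "India": 5.6, "New York area": 4.9, "Sweden": 4.1},
}

for name, data in (("CouchDB", couchdb), ("Neo4j", neo4j)):
    for region, pct in data["shares"].items():
        estimate = approx_members(data["total"], pct)
        print(f"{name:<8}{region:<14}{pct:>5.1f}%  ~{estimate} members")
```

On these figures the Swedish hot-spot for Neo4j corresponds to roughly eleven profiles, so small absolute differences can move the percentages noticeably.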

N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker

Forthcoming webinar: What is a cloud database?

Cloud computing and big data are two of the hottest topics in the industry today, which makes cloud databases a particularly hot prospect for 2012. What is a cloud database, however? On Thursday, December 15 at 12:00pm EST I’ll be taking part in a webinar with Karen Tegan Padir, Vice President of Products and Marketing at EnterpriseDB, on the subject of cloud computing and true cloud databases.

In this webcast, you’ll get an overview of the current state of cloud database computing, and more specifically the differences between cloud databases and databases in the cloud. I’ll be providing an overview of the functional requirements that separate databases running in the public cloud, and databases that will be used to power private and hybrid clouds.

Then Karen will provide an overview and demonstration of Postgres Plus Cloud Server, which provides DaaS for PostgreSQL databases and went into public beta earlier this week.

You can register for the event here.

The geographic distribution of NoSQL skills: HBase and Membase

Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

We begin this week’s series with Membase and HBase, the two projects that proved, like Apache Hadoop, to have significantly greater adoption in the USA compared to the rest of the world.

The statistics showed that 58.2% of the 170 LinkedIn members with “Membase” in their member profiles are based in the US (as previously explained, we tried the same search with Couchbase, but with only 85 results we decided to use the Membase result set as it was more statistically relevant).

As with Hadoop, a significant proportion (27.1%) of those are in the Bay area, the highest proportion of all the NoSQL databases we looked at. The results also indicate that Ukraine is a hot-spot for Membase skills, with 3.5%, while Membase adoption is lower in the UK (2.4%) than that of other NoSQL databases.

It should not be a great surprise that Apache HBase returned similar results to Apache Hadoop. The top eight individual regions for HBase were exactly the same as for Hadoop, although the UK (3.4%) is stronger for HBase, as is India (10.7%).

The statistics showed that 57.0% of the 1,687 LinkedIn members with “HBase” in their member profiles are based in the US, with 25.0% in the Bay area (the third highest in our sample behind Hadoop and Membase).

The series will continue later this week with MongoDB, Riak, CouchDB, Apache Cassandra, Neo4j, and Redis.

N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker

VC funding for Hadoop and NoSQL tops $350m

451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.

According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.

That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
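The growth figures quoted above can be checked with some simple arithmetic. The sketch below uses only the three totals given in this post; the function and variable names are our own.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from `old` to `new`."""
    return (new - old) / old * 100


# Cumulative funding totals quoted in this post, in millions of US dollars.
end_2010 = 95.8
end_august = 157.5
current = 350.8

growth_since_2010 = pct_increase(end_2010, current)
multiple_since_august = current / end_august
share_raised_since_august = (current - end_august) / current

print(f"Increase since end of 2010: {growth_since_2010:.0f}%")
print(f"Multiple of end-of-August total: {multiple_since_august:.2f}x")
print(f"Share of total raised since August: {share_raised_since_august:.0%}")
```

On these numbers, more than half of all the funding counted here arrived after the end of August.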

A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.

The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.

I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.

The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.

Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.

One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.

It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.

While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.

451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.

Scalable SQL: more than the mullet of the database world?

In the first part of our coverage on emerging database products and vendors we examined the new NoSQL databases and suggested that the incumbent database vendors would likely respond to the growing threat with a mix of in-memory and distributed caching technologies.

That is yet to happen, although it has only been a few months and the NoSQL databases have generated more noise than revenue at this stage, but in the meantime a new set of database vendors and products have emerged that could pose a more direct threat to the database incumbents while thwarting the potential of the NoSQL upstarts.

For want of a better phrase we have taken to referring to these products collectively as scalable SQL databases, and have just published a new spotlight report pulling together our various reports on the runners and riders.

Some of the vendors promise to deliver the scalability and flexibility promised by NoSQL while retaining the support for SQL queries and/or ACID (atomicity, consistency, isolation, durability). That is not an insignificant boast and it will be tough to offer the best of both worlds.

“SQL For Business, NoSQL For Partay!” is the explanation offered by MulletDB, a project that promises scalability and SQL queries. The danger is that scalable SQL ends up being the database equivalent of the celebrated mullet hairstyle or its business attire equivalent: the jacket and jeans.

One of the companies trying to avoid that problem is GenieDB (coverage). The London-based company’s GenieDB Engine is a fully replicated distributed database that combines a key-value store database with a ‘sharded’ memcached layer. Another example is Clustrix, which was founded in December 2006 to develop a new database appliance that would offer both scalability and durability in a single product.

Meanwhile VoltDB emerged earlier this summer with a transactional database management system that is designed to scale across clusters of industry-standard servers while retaining transactional integrity.

Additionally, Xeround has recently confirmed its intention to reposition its Intelligent Data Grid (IDG) technology as Xeround Data Service, a scalable SQL database with support for ACID-compliant transactional capabilities for cloud computing environments, while New Technology/enterprise’s CloudTran is designed to bring enterprise-level transaction management to GigaSpaces’ XAP in-memory data grid for on-premises deployment, and eventually any PaaS offering.

Meanwhile we are intrigued by VMware’s acquisition of distributed data management vendor GemStone and its positioning of GemFire as a next-generation data management layer for cloud applications, as well as the forthcoming introduction of SQL querying in GigaSpaces’ eXtreme Application Platform (XAP), which will enable in-memory management of relational data and initiatives.

It is very early stages for all these vendors, and they have yet to prove that they have truly solved the problem of consistency and partition tolerance. In the meantime there are plenty of other contenders waiting in line.

Akiban is promising that it has the secret to SQL scalability with an approach that pre-groups data in order to overcome latency, caching and data distribution issues. Another company currently in stealth mode is JustOne Database which is working on perfecting a new storage model in order to deliver the performance and scalability required to support transactions and analytics on the same data simultaneously.

That is also the goal of Tokutek, whose TokuDB MySQL storage engine is based on Fractal Tree indexing technology designed to reduce data-insertion times and improve the performance of MySQL for both read and write applications.

JustOne and Tokutek are part of a slightly different set of vendors we are viewing under the scalable SQL umbrella: those that promise to improve performance for appropriate workloads to the extent that the advanced scale-out capabilities promised by some NoSQL databases become irrelevant.

While we’re on the subject of existing database vendors that could be considered part of the scalable SQL set, it is also worth mentioning MarkLogic. The company has recently been associating itself with NoSQL, and while the fact that it does not support SQL makes it a better literal fit with NoSQL, the company’s support for ACID means that we would see it as an option for customers looking to improve performance without losing consistency, especially for unstructured or semi-structured data.*

As we previously noted, the rise of NoSQL has to some degree resulted from the inability of the MySQL database to scale consistently. It is therefore no surprise to see many of the scalable SQL vendors promising to improve the performance and scalability of MySQL, while others promote a clean-slate approach to address new big data management problems.

We have more details on each of the products and projects mentioned above (as well as some not mentioned), their potential use cases, how they relate to MySQL, and what potential impact they may have on the adoption of NoSQL technologies, in the full report.

This is very much the start of our coverage of these vendors, however. Expect more coverage in the near future, as well as a wider perspective on the potential for alternatives to the incumbent database suppliers, into 2011.

*Additionally, since the absence of SQL is only really tangential to many of the projects and products referred to as NoSQL it seems to me to be appropriate to have a database that does not support SQL in the scalable SQL category.

Is Sybase buying Aleri?

Marc Adler and Marco Seiriö seem to think so.

Such a deal would seem a little strange coming less than a year after Sybase licensed the underlying complex event processing (CEP) engine for Sybase CEP from Coral8, immediately prior to Coral8’s acquisition by Aleri.

The terms of that licensing agreement provide a clue as to why Sybase would consider opening up its wallet again to snap up Aleri, however.

As Aleri insisted last March, “The licensing arrangement allows Sybase to embed CEP capabilities within and ONLY WITHIN Sybase products such as RAP”.

Sybase later confirmed (clients only) to us that this was indeed the arrangement and maintained that its strategy for CEP was to embed it within larger platform products.

As well as RAP – The Trading Edition, the company’s risk-analytics platform, Sybase also had plans to target opportunities in the telecommunications, healthcare and government sectors.

One justification for the acquisition of Aleri would be that it would allow Sybase to target those markets and other opportunities with a standalone CEP offering based on Aleri’s next-generation engine, codenamed Ohio, which is slated for roll-out in 2010 and is designed to include the best features from Aleri Streaming Platform and the Coral8 Engine and be backwards-compatible with both.

Then of course there are the Aleri/Coral8 assets beyond the core CEP engine, including the Aleri Studio visual modeling application, as well as dashboard and OLAP server capabilities, and packaged applications for risk and liquidity analysis and management.

As for why Aleri would sell out to Sybase – we certainly noted some trepidation from the company when we caught up (clients only) in September last year. While the company was buoyant about its plans for Ohio, it was reluctant to discuss details of customer wins/successes.

The only thing the company would say was that it had more than 80 customers, the number of combined customers when the merger closed.

At that point it was somewhat more confident, claiming (clients only) to be the largest pure-play CEP vendor in terms of headcount and customer base and revenue (although with none of the CEP vendors disclosing revenue figures, that last claim was always highly debatable).