April 20th, 2011 — Data management
As we noted last week, necessity is one of the six key factors that are driving the adoption of alternative data management technologies identified in our latest long format report, NoSQL, NewSQL and Beyond.
Necessity is particularly relevant when looking at the history of the NoSQL databases. While it is easy for incumbent database vendors to dismiss the various NoSQL projects as development playthings, it is clear that the vast majority of NoSQL projects were developed by companies and individuals because the existing database products and vendors could not meet their requirements with regard to the other five factors: scalability, performance, relaxed consistency, agility and intricacy.
The genesis of much – although by no means all – of the momentum behind the NoSQL database movement can be attributed to two research papers: Google’s Bigtable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006, and Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007.
The importance of these two projects is highlighted by The NoSQL Family Tree, a graphic representation of the relationships between (most of) the various major NoSQL projects:

Not only were the existing database products and vendors not suitable to meet their requirements, but Google and Amazon, as well as the likes of Facebook, LinkedIn, Powerset and Zvents, could not rely on the incumbent vendors to develop anything suitable, given the vendors’ desire to protect their existing technologies and installed bases.
Werner Vogels, Amazon’s CTO, has explained that as far as Amazon was concerned, the database layer required to support the company’s various Web services was too critical to be trusted to anyone else – Amazon had to develop Dynamo itself.
Vogels also pointed out, however, that this situation is suboptimal. The fact that Facebook, LinkedIn, Google and Amazon have had to develop and support their own database infrastructure is not a healthy sign. In a perfect world, they would all have better things to do than focus on developing and managing database platforms.
That explains why the companies have also all chosen to share their projects. Google and Amazon did so through the publication of research papers, which enabled the likes of Powerset, Facebook, Zvents and LinkedIn to create their own implementations.
These implementations were then shared through the publication of source code, which has enabled the likes of Yahoo, Digg and Twitter to collaborate with each other and additional companies on their ongoing development.
Additionally, the NoSQL movement boasts a significant number of developer-led projects initiated by individuals – in the tradition of open source – to scratch their own technology itches.
Examples include Apache CouchDB, originally created by the now-CTO of Couchbase, Damien Katz, to be an unstructured object store to support an RSS feed aggregator; and Redis, which was created by Salvatore Sanfilippo to support his real-time website analytics service.
We would also note that even some of the major vendor-led projects, such as Couchbase and 10gen, have been heavily influenced by non-vendor experience. 10gen was founded by former DoubleClick executives to create the software they felt was needed at the digital advertising firm, while online gaming firm Zynga was heavily involved in the development of the original Membase Server memcached-based key-value store (now Elastic Couchbase).
In this context it is interesting to note, therefore, that while the majority of NoSQL databases are open source, the NewSQL providers have largely chosen to avoid open source licensing, with VoltDB being the notable exception.
These NewSQL technologies are no less a child of necessity than NoSQL, although it is a vendor’s necessity to fill a gap in the market, rather than a user’s necessity to fill a gap in its own infrastructure. It will be intriguing to see whether the various other NewSQL vendors will turn to open source licensing in order to grow adoption and benefit from collaborative development.
NoSQL, NewSQL and Beyond is available now from both the Information Management and Open Source practices (non-clients can apply for trial access). I will also be presenting the findings at the forthcoming Open Source Business Conference.
October 29th, 2010 — Archiving, Data management, eDiscovery, M&A, Search, Storage
The cloud archiving market will generate around $193m in revenues in 2010, growing at a CAGR of 36% to reach $664m by 2014.
This is a key finding from a new 451 report published this week, which offers an in-depth analysis of the growing opportunity around how the cloud is being utilized to meet enterprise data retention requirements.
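The growth figures above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, assuming four annual compounding periods between 2010 and 2014; the report's exact period convention is not stated here, and the function names `cagr` and `project` are purely illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value reached after compounding `start` at `rate` for `years` periods."""
    return start * (1 + rate) ** years


# Figures quoted in the post, in millions of US dollars.
revenue_2010 = 193.0
revenue_2014 = 664.0

# Assumption: 2010 -> 2014 is treated as four compounding periods.
years = 2014 - 2010

implied_rate = cagr(revenue_2010, revenue_2014, years)
projected_2014 = project(revenue_2010, 0.36, years)

print(f"Implied CAGR 2010-2014: {implied_rate:.1%}")
print(f"2014 revenue at a flat 36% CAGR: ${projected_2014:.0f}m")
```

Under that assumption the implied rate comes out very close to the quoted 36%, and compounding $193m at exactly 36% lands within a few million dollars of the $664m figure, so the two numbers are consistent with each other once rounding is allowed for.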
As well as sizing the market, the 50-page report – Cloud Archiving; A New Model for Enterprise Data Retention – details market evolution, adoption drivers and benefits, plus potential drawbacks and risks.
These issues are examined in more detail via five case studies offering real-world experiences of organizations that have embraced the cloud for archiving purposes. The report also offers a comprehensive overview of the key players from a supplier perspective, with detailed profiles of cloud archive service providers, along with discussion of related enabling technologies that will act as a catalyst for adoption, and of expected future market developments.
Profiled suppliers include:
- Autonomy
- Dell
- Global Relay
- Google
- i365
- Iron Mountain
- LiveOffice
- Microsoft
- Mimecast
- Nirvanix
- Proofpoint
- SMARSH
- Sonian
- Zetta
Why a dedicated report on archiving in the cloud, you may ask? It’s a fair question, and one that we encountered internally, since archiving aging data is hardly the most dynamic-sounding application for the cloud.
However, we believe cloud archiving is an important market for a couple of reasons. First, archiving is a relatively low-risk way of leveraging cloud economics for data storage and retention, and is less affected by the performance/latency limitations that have stymied enterprise adoption of other cloud-storage applications, such as online backup. For this reason, the market is already big enough in revenue terms to sustain a good number of suppliers; a broad spectrum that spans from Internet/IT giants to tiny, VC-backed startups. It is also set to experience continued healthy growth in the coming years as adoption extends from niche, highly regulated markets (such as financial services) to more mainstream organizations. This will pull additional suppliers – including some large players – into the market through a combination of organic development and acquisition.
Second, archiving is establishing itself as a crucial ‘gateway’ application for the cloud that could encourage organizations to embrace the cloud for other IT processes. Though it is still clearly early days, innovative suppliers are looking at ways in which data stored in an archive can be leveraged in other valuable ways.
All of these issues, and more, are examined in much more detail in the report, which is available to CloudScape subscribers here and Information Management subscribers here. An executive summary and table of contents (PDF) can be found here.
Finally, the report should act as an excellent primer for those interested in knowing more about how the cloud can be leveraged to help support ediscovery processes; this will be covered in much more detail in another report to be published soon by Katey Wood.
June 6th, 2008 — Search
Google has changed the name and the scope of the Website search service it offers to Website owners that want a little more than simply to know that their site is being indexed by Google, but don’t want to go as far as buying one of its blue or yellow search appliances. 451 clients can read what we thought of it here.
Google has three levels of Website search to offer organizations – completely free but with no control as to which parts of your website are indexed and when, known as Custom Search Edition/AdSense for Search (CSE/AFS); the newly rebranded Google Site Search; and the Google search appliances, which it sells in Mini and Search Appliance form factors, which can be used both for external-facing Website search as well as intranet search.
Google stopped issuing customer numbers for its appliances in October 2007. At that point it had sold to about 10,000 organizations. I suspect that number is around 11,500 now, though I don’t have any great methodology to back that up; I’m just extrapolating from previously issued growth figures. That’s an extraordinary number of organizations with a Google box.
To give some perspective, Autonomy has ~17,000 customers now. But the vast majority came from Verity. When Autonomy bought Verity in November 2005, Verity had about 15,000 customers (and Autonomy had about 1,000). But Verity got about 8,000 of those customers via its acquisition of Cardiff Software in February 2004. So in about 2.5 years Autonomy has added about 1,000 customers, but of course has done a lot of up-selling to its base and doesn’t play in the low-cost search business anymore (mainly because of Google).
The actual number of Google appliances sold is higher, of course, as many organizations have multiple appliances. I’ll never forget standing in a room at a top 3 Wall Street investment bank 18 months or so ago, with its top ~25 technologists gathered, and seeing about 6 of them put up their hands when asked who had a Google appliance – most of those appliances weren’t known about by their boss or by each other.
But Google appliance proliferation is commonplace in large organizations. The things are so cheap and so relatively easy to install that they are often bought under the radar of IT. The problem comes when times get tough (as they are in investment banking IT, that’s for sure) and the organization wants to wring more out of the assets it has – even if it didn’t know it had those assets until relatively recently.
That’s why we strongly expect Google to come out with some sort of management layer this year to handle this sort of unintended (by the customer that is) proliferation. Watch this space.
March 7th, 2008 — Search, Text analysis
I’ve gathered all my current thinking on potential M&A in enterprise search in a SectorIQ that we published earlier this week to our customers. In it, I look at four main potential targets plus a few other small ones and look at a few of the likely acquirers. (This is the way we write all our Sector IQs, btw, and they’re a great way of getting a quick grasp on what might be coming down the pike in any particular sector of the IT industry.)
Fortunately those of you that are not our customers (yet!) are able to read it via our arrangement with the New York Times DealBook section. Click here to see the NY Times posting or go here to go straight to the report – and while you’re there, sign up for a trial of our M&A KnowledgeBase, where we’ve been collecting details of every IT, internet and telecoms deal since the start of 2002!
Finally, a quick word about the headline. We like to have some fun here at 451 with these things and while I appreciate that this one might have been pushing things a little in terms of clearly explaining what the report was about, when else would I be able to use it?
March 6th, 2008 — 2.0, Collaboration
No surprise really that social software, social publishing and other types of socializing were hot topics this week at the AIIM show here in Boston. I started out the week at Drupalcon (co-located at AIIM this year), the community event for the open source Web publishing tool Drupal. This was my first time at Drupalcon, or really at any open source user event of this size. A couple things struck me. First and most superficially, I stuck out a bit both due to my rather corporate-looking business attire (sorry guys) and because of my gender — a comment was made at the start of the event that the attendees were 93% male.
But much more interesting was the level of engagement. Cheers and audience participation during the keynote by project lead Dries Buytaert were plentiful. The event was packed (there were 800 attendees and they had expected 500) and there appeared to be a high level of engagement among folks in the sessions and the hallways. (And I wasn’t the only one sticking out for looking a little corporate – I think the guys from Acquia, the new Drupal start-up were in the same boat. 451 Group clients can read our write-up on Acquia here (log-in required)).
AIIM didn’t have the same level of excitement but there was still a common thread between the two events. Part of Drupal’s popularity is due to its community features and the availability of modules to add capabilities like feed management, voting and so forth. Other vendors that fall into a broadly defined content management market are busy adding similar capabilities either to WCM tools that will ultimately deliver community features to site visitors or to content contributor UIs within apps themselves. I met with folks from Day Software, Alfresco, IBM, Salesforce.com and Oracle, and support for communities, collaboration and user-generated content was a hot topic with all of them. Interestingly, it was not a focus during a meeting with Google – no social features appear particularly imminent for Google’s Search Appliances.
I also attended an interesting session held by Tony Byrne of CMS Watch. Tony looked at CMS architectures and how those companies wishing to implement external communities or to support user-generated content on external sites may end up with best-of-breed tools for architectural reasons, even though WCM vendors are adding support for these features themselves. Interesting stuff.
There was no sense of irrational exuberance at AIIM though, not like last year’s Enterprise 2.0 conference that had a jammed showcase floor and overflowing sessions. AIIM is a massive show though and as it is co-located with the On Demand show, it’s an odd mix of photocopiers, printing machines and enterprise software. Several ECM vendors I met with including SpringCM, Xythos (which I found out was acquired by Blackboard last year in a deal that has been kept totally quiet), Hyland Software and Tower Software are much more focused on more traditional ECM problems, from process management to archiving, which are alive and well.
February 6th, 2008 — 2.0, Collaboration
We did a webinar this morning on enterprise social software, mostly presenting some high-level results of a survey we did with ChangeWave Research and analysis of the survey data and the market for our new special report on social software.
We had some Q&A at the end of the session using the web meeting software. I didn’t get to answer all the questions on the call as we ran out of time but have been going through the questions that queued up. There were several on Google and specifically on whether or not Google’s current enterprise offerings (Google Apps mostly) are “social.”
I don’t think it’s particularly useful to spend time debating whether or not Google Docs is social software. It’s a useful tool – I use it fairly regularly to collaboratively author documents and to share them with folks inside and outside of the company. That’s certainly a collaborative app and in some ways it’s definitely more collaborative than Microsoft Office (though I find the revision tracking with Google much more difficult).
But what strikes me about all these Google questions is the mindshare Google already has in the market, whether or not it has the tools. That’s part of what we found in our survey and what I discussed this morning on the webinar. In our survey, 18% of those currently using or planning to use social software (defined in this report as social networking, blogs, wikis and social bookmarking) in their organizations use or plan to use Google. I’m not sure what products / services exactly as Google doesn’t even really have enterprise offerings in these specific categories. But there you have it. And then a good chunk of the questions I had on the content were on Google. It’s definitely a ripe market for Google, if and when it decides to pick it.