On the opportunities for cloud-based databases and data warehousing

At last year’s 451 Group client event I presented on the topic of database management trends and databases in the cloud.

At the time there was a lot of interest in cloud-based data management as Oracle and Microsoft had recently made their database management systems available on Amazon Web Services and Microsoft was about to launch the Azure platform.

In the presentation I made the distinction between online distributed databases (BigTable, HBase, Hypertable), simple data query services (SimpleDB, Microsoft SSDS as was), and relational databases in the cloud (Oracle, MySQL, SQL Server on AWS etc) and cautioned that although relational databases were being made available on cloud platforms, there were a number of issues to be overcome, such as licensing, pricing, provisioning and administration.

Since then we have seen very little activity from the major database players with regard to cloud computing (although Microsoft has evolved SQL Data Services to be a full-blown relational database as a service for the cloud, see the 451’s take on that here).

In comparison there has been a lot more activity in the data warehousing space with regard to cloud computing. On the one hand the data warehousing players are later to the cloud, but on the other they are more advanced, and for a couple of reasons I believe data warehousing is better suited to cloud deployments than the general purpose database.

  • For one thing most analytical databases are better suited to deployment in the cloud thanks to their massively parallel architectures being a better fit for clustered and virtualized cloud environments.
  • And for another, (some) analytics applications are perhaps better suited to cloud environments since they require large amounts of data to be stored for long periods but processed infrequently.

    We have therefore seen more progress from analytical than transactional database vendors this year with regard to cloud computing. Vertica Systems launched its Vertica Analytic Database for the Cloud on EC2 in May 2008 (and is working on cloud computing services from Sun and Rackspace), while Aster Data followed suit with the launch of Aster nCluster Cloud Edition for Amazon and AppNexus in February this year. February also saw Netezza partner with AppNexus on a data warehouse cloud service. The likes of Teradata and illuminate are also thinking about, if not talking about, cloud deployments.

    To be clear, the early interest in cloud-based data warehousing appears to be in development and test rather than mission critical analytics applications. There are early adopters, though: ShareThis, the online information-sharing service, is up and running on Amazon Web Services’ EC2 with Aster Data, while search marketing firm Didit is running nCluster Cloud Edition on AppNexus’ PrivateScale, and Sonian is using the Vertica Analytic Database for the Cloud on EC2.

    Greenplum today launched its take on data warehousing in the cloud, focusing its attention initially on private cloud deployments with its Enterprise Data Cloud initiative and plans to deliver “a new vision for bringing the power of self-service to data warehousing and analytics”.

    That may sound a bit woolly (and we do see the EDC as the first step towards private cloud deployments) but the plan to enable the Greenplum Database to act as a flexible pool of warehoused data from which business users will be able to provision data marts makes sense as enterprises look to replicate the potential benefits of cloud computing in their datacenters.

    Functionality including self-service provisioning and elastic scalability are still to come but version 3.3 does include online data-warehouse expansion capabilities and is available now. Greenplum also notes that it has customers using the Greenplum Database in private cloud environments, including Fox Interactive Media’s MySpace, Zions Bancorporation and Future Group.

    The initiative will also focus on agile development methodologies and an ecosystem of partners, and while we were somewhat surprised by the lack of virtualization and cloud provisioning vendors involved in today’s announcement, we are told they are in the works.

    In the meantime we are confident that Greenplum’s won’t be the last announcement from a data management vendor focused on enabling private cloud computing deployments. While much of the initial attention around cloud-based data management was naturally focused on the likes of SimpleDB, the ability to deliver flexible access to, and processing of, enterprise data is more likely to be taking place behind the firewall while users consider what data and which applications are suitable for the public cloud.

    Also worth mentioning while we’re on the subject is RainStor, the new cloud archive service recently launched by Clearpace Software, which enables users to retire data from legacy applications to Amazon S3 while ensuring that the data is available for querying on an ad hoc basis using EC2. It’s an idea that resonates thanks to compliance-driven requirements for long-term data storage, combined with the cost of storing and accessing that data.

    451 Group subscribers should stay tuned for our formal take on RainStor, which should be published any day now, while I think it’s probably fair to say you can expect more of this discussion at this year’s client event.

    Enterprise Search Summit 09 perspectives

    I started off this year’s Enterprise Search Summit in New York last week with a dinner sponsored by New Idea Engineering and Attivio on Monday night, which was highly enjoyable, despite my jetlag – having to try and stay up the first night in from London. Thanks to those folks for the invite and the conversation.

    Katey and I were not allowed to sit in on any of the sessions this year for some strange reason. So I can’t tell you first hand about what was interesting or not or the attendance in the sessions. Go figure. It also wasn’t that conducive to meeting end users, which is a main objective of attending these things.

    Katey reckoned attendance overall was slightly down on last year, but not spectacularly so (I was at a different conference and so had to miss last year’s).

    So away from those two disappointments, we did have a fairly full docket of meetings with vendors, which were generally lively, with good give and take. Where we say ’451 research to follow,’ it means our clients can expect a research report on the company in the near future.

    Some of the highlights:

    Attivio – CTO Sid Probstein is always chock-full of ideas and so always good to have a sitdown with him. CEO Ali Riaz is entertaining on a whole different level. The company appears to be going great guns and is at the forefront of the drive to combine structured and unstructured data as we have said before.

    BA-Insight – not really a search company or a text analysis company; more of a piece of information management middleware that aims to increase ‘findability’ within SharePoint. For any SharePoint user, especially those in an environment with multiple SharePoint sites, that can only be a good thing. Connectors to other search engines are coming. 451 research to follow.

    Coveo – the company was out in force at this conference having just launched version 6.0 of its search platform featuring better scalability, connectors and mobile functionality. We covered that product update a short while back.

    Endeca – met chief scientist Daniel Tunkelang for the first time. Clearly the owner of an active mind, Daniel presents a different face to the search company. His thoughts on the conference are here.

    Google – the typically on-message briefing from Google. It owns the low end and is increasingly taking chunks out of the mid-tier, but still no sign of the management layer enterprises need to get their arms around the myriad Google search appliances lying around most large organizations. It will probably appear out of the blue at some point though, this year, I’d imagine.

    Microsoft – Nate Treloar was a great evangelist for Fast Search & Transfer while a product manager, and so it seems appropriate that he has the term ‘evangelist’ in his title at Microsoft where he’s working not only on the SharePoint search ecosystem but other programs such as ‘conversational’ and ‘actionable’ search; talking and doing, hey, what else is there? ;)

    PerfectSearch – we don’t usually see too many companies at this conference that we haven’t spoken to before, but PerfectSearch is one of them. It sells a search appliance and some of the founders have a Novell background, hence its Orem, Utah HQ. 451 research to follow.

    Vivisimo – from what we’ve heard the company is going well, both in the indirect (OEM) and direct markets. We’ve noticed how often this company is being bad-mouthed by its competitors (over and above the usual FUD in any tech market) though we’re not sure why. Perhaps because Pittsburgh isn’t as fashionable as Boston or the Valley? Don’t really know, but it seems misplaced based on our experience. It’s making good headway with Lexis-Nexis, which will be important in the eDiscovery market, as well as with other customers that have demanded confidentiality (pretty common in the eDiscovery market). 451 research to follow.

    Microsoft sheds more light on Office 14

    Microsoft has begun to share information on what it calls the “waves” of Office 14 products set to hit the market this year and next. Most of the information at this point is on Microsoft Exchange 2010, which has entered public beta. General availability is expected in the second half of this year.

    There’s also some info for SharePoint, though little detail. Microsoft SharePoint Server 2010 will go into technical preview in Q3 2009 and be generally available in the first half of 2010.  Beyond that, we still don’t know what will and won’t be in SharePoint.next (though we don’t have to call it that anymore).

    The part of the Exchange 2010 announcement that caught my attention is the reference to an integrated e-mail archive.  Did Microsoft just enter the email archiving market?  That would certainly be noteworthy, given that much of the hot email archiving market involves archiving Exchange email.  Since Microsoft hasn’t had a horse in this race, this has been the realm of third-party providers like Symantec and Mimosa Systems to date.

    On the analyst telebriefing held today by Microsoft on this announcement, I asked about this and the role for Microsoft’s email archiving partners going forward.  Michael Atalla, Group Product Manager for Exchange at Microsoft, told me that Microsoft is out to meet the needs of the 80% of its customers that don’t yet have any email archiving technology and that existing email archiving products serve a “niche” of the market at the high end for customers that have to meet regulatory requirements for email archiving.

    While I agree there is still a lot of opportunity in the email archiving space, describing existing adoption as limited to those in regulated industries isn’t exactly accurate.

    I’ve tried to dig deeper into what this integrated archive includes.  Not easy, as there is no mention of archiving at all in the TechNet docs on Exchange 2010 (though there’s quite a bit of interesting detail on records and retention management).

    Best I can tell, Exchange 2010 lets you create individual or “personal archives.”  This page from Microsoft explains that a personal archive is:

    an additional mailbox associated with a user’s primary mailbox.  It appears alongside the primary mailbox folders in Outlook. In this way, the user has direct access to e-mail within the archive just as they would their primary mailbox. Users can drag and drop PST files into the Personal Archive, for easier online access – and more efficient discovery by the organization. Mail items from the primary archive can also be offloaded to the Personal Archive automatically, using Retention Polices…

    So it moves the PST file from the desktop to the server, which makes it more available for online searching and discovery purposes.  But is that really email archiving?  I can see how that would be attractive to end users that want an easier way to access archived emails, but it seems like it would increase the load on the mail server and not handle things like de-duping, which archiving is generally meant to address.

    I’m not an expert on email archiving though.  I’d love to hear from anyone who has comments.

    What will NOT be in the next version of SharePoint

    I might catch a lot of readers with that title, but of course I don’t really know for sure what will and won’t be in the next version of SharePoint.  Microsoft is still mum on the topic and I suspect will remain so until the SharePoint Conference slated for October.  This event was held in March last year; it seems logical it has been delayed this year to time the event with Office 14 announcements specific to SharePoint.

    I read Guy Creese’s post last week on what he thinks will be in the next version of SharePoint and like Guy, I get a lot of questions in this vein.  I agree with Guy that SharePoint.next will have search improvements (we already know that one) and more sophisticated administration (we all hope). I’ll be surprised to see dramatic improvements in the transition between hosted and on-premise SharePoint in this version; I think the marketing is likely to lead the reality in this area for some time to come, but perhaps I’ll be surprised.

    I often get questions more specifically (from vendors) around what Microsoft isn’t going to do and reading Guy’s post, I thought it would be interesting to comment on what’s left out.

    On the social software front…

    There’s been some debate of late about whether or not SharePoint is an “Enterprise 2.0” tool at all (or what, in fact, that even means, if anything). But anyone who saw Lawrence Liu pitch SharePoint versus IBM Lotus Connections to a packed room at Enterprise 2.0 last year would certainly assume Microsoft has ambitions in this area.  It’s worth noting however that Liu left Microsoft not long after that for Telligent Systems, which sells community software as an adjunct to SharePoint.  Liu presumably knows more about the SharePoint roadmap than we do, so looking at Telligent’s roadmap (limited version here) is probably a good indication of where Microsoft won’t go in social software in this next release (think community analytics, bridging internal and external communities, and feed aggregation).

    It’s not about WCM.

    Making SharePoint ubiquitous for content-based collaboration is Microsoft’s number one goal and this means improved admin, search and social software, to my mind.  So what will get left out?  I don’t think we’ll see any major changes on the WCM front.  Microsoft marketed the WCM capabilities in MOSS 2007 when it first came out, as it stopped development on its stand-alone WCM product, Microsoft CMS (which came from its 2001 acquisition of nCompass), in favor of SharePoint.  But this seems to have died down and vendors like Sitecore are doing well selling more sophisticated WCM with SharePoint integrations, apparently with cooperation from Microsoft.  WCM for large, customer-facing sites is really not where SharePoint’s strengths lie and Microsoft will likely let this one stand much as it is as it invests in other areas (Sitecore even sells a bundle for intranets, showing some market opportunity for WCM even in SharePoint’s sweet spot).

    What about records management and archiving?

    There’s some records management today in SharePoint, but it’s limited to SharePoint environments.  Improved admin across server farms could help here but it doesn’t seem likely Microsoft is going to go far beyond this and this doesn’t address the archiving issue at all.  Vendors like Open Text, Symantec and EMC are banking on their products’ abilities to manage and archive content (including email) from multiple repositories including SharePoint.  And this seems like a market that will be relatively immune to changes in SharePoint.next — indeed, changes that make SharePoint more popular are likely only good news to these vendors, at least in the short term.

    I’m sure there are other gaps vendors are filling where there may be some continued opportunity after SharePoint.next, but those are the big ones that jump to my mind.

    Lervik leaves Microsoft-FAST

    So it appears that John Markus Lervik has left Microsoft – he’s now a (Former) Corporate Vice President there, despite the fact that Microsoft claimed to be concentrating its search efforts in his native Norway.

    When I saw the news over the weekend I took one look at the date and recalled that the deal to buy FAST was in early January 2008. A year had thus just passed, and such 12-month lock-ups are customary. FASTForward09 is also coming up, starting February 9, so Microsoft wanted a clean break before that, I’m sure. Nobody’s talking right now, so it’s hard to know all the ins and outs, but that’s why I suspect it’s happened now, rather than earlier or later.

    Anyway, I agree with Dave Kellogg’s assessment of why what happened, happened.

    John Markus never seemed comfortable to me being a Microsoft executive. Bjorn Olstad probably isn’t that comfortable with it either, but he is undoubtedly a very smart engineering leader and product developer, in a role where he doesn’t need to sing the company song three times before breakfast. I suspect he’d like to stay that way, rather than get involved in being a figurehead for FAST within Microsoft.

    We look forward to hearing just what Microsoft is doing with FAST in early February, because over the last year or so, we haven’t heard anything more than we heard at FASTForward last year.

    Oracle continues to build collaboration

    It’s hard for me to get excited about email.  Luckily for me, here at The 451 Group, we focus on emerging areas of the technology market or sectors where there is particular innovation or disruption.  And none of that much describes the email market.  I have looked at some of the vendors doing interesting things here (like Zimbra and Open-Xchange) and I had met once with PostPath before Cisco grabbed it a few weeks ago.  But even though I cover collaboration, I’m not down in the weeds with “groupware” every day.

    So an announcement from Oracle that it has spent the last three years building a new collaboration suite, to replace Oracle Collaboration Suite (which hadn’t exactly taken the market by storm), doesn’t get me all revved up.  There’s lots of content out there on Beehive, from InformationWeek and CNET for example, so I won’t rehash all the details.  I know it’s about more than email and there’s a lot there that technically makes sense — its focus on security and compliance, scalability, integration with Outlook and the Zimbra web client, support for CalDAV and so on.

    But it is not going to be an easy fight for Oracle in this market, to be sure, no matter how badly Oracle wants a piece of the collaboration market.  How many companies (other than OCS customers that are now stuck with a dead product) are going to move to a brand-new collab product?  I didn’t find any of the use cases Oracle described during their pre-brief all that compelling.

    I wonder why Oracle, which obviously has no hesitation about buying into markets where it wants to be a major player, hasn’t acquired collab technology?  In the related content management market, Oracle made several attempts to market a database-driven content management system, mostly based on its ContentDB product, until it ultimately purchased Stellent for $440m in 2006.  This strategy seems to be going well for Oracle (451 Group clients can read a more detailed write-up from a few months ago on Oracle’s progress in the content management market) and the company upped its investment by purchasing Captovation for document capture in January and Skywire Software for output (though this one was really more about insurance apps).

    I understand that Oracle can’t go out and acquire the biggest competitor to Microsoft Exchange (that would be IBM Lotus) and collab that ties closely to its apps is a high priority for Oracle, so building may have made the most sense.  Still, there are other models for disruption – Yahoo! is looking at a different market segment with Zimbra and Cisco is planning its SaaS strategy with PostPath.  Those vendors both see Google Apps as the potential disrupter to the Exchange / SharePoint powerhouse and are looking to take a piece of that action before Google has it tied up.

    I’m maybe not quite as pessimistic as Matt Cain over at Gartner (his assessment doesn’t pull any punches).  I was part of a briefing with Matt once (back in his Meta Group days) when I was a product manager at Sun, on the integration between Sun’s portal server and collaboration products (email/calendar).  It was a half-baked integration with a lot of marketing fluff and Matt called us on it bluntly and accurately.  In saying Beehive is unlikely to be any more successful than [Oracle's] past efforts, he does the same.

    CMIS and industry standards in ECM

    The rumored multi-vendor ECM interoperability effort has been unveiled.  IBM, Microsoft and EMC (and others) have collaborated on a draft specification – Content Management Interoperability Services (CMIS) – that is meant to address basic interoperability and accessibility for repository-based content.  The goal is to make it easier to pull/push managed content to/from other apps without the need for custom integrations or third-party connectors.

    Some write-ups are already out there, with more detailed explanations:

    CMS Wire – Industry Heavy Weights Move to Standardize Enterprise Content Management

    Microsoft Enterprise Content Management (ECM) Team Blog – Announcing the CMIS Specification

    Chuck Hollis – CMIS: it’s not JAS (just another standard)

    John Newton’s Content Log – Alfresco releases first CMIS implementation

    Chuck Hollis, as usual, has a particularly concise and on-target analysis.  He notes several of the following points that the standard effort has going for it, and I’ve added a few of my own:

    • Interoperability is a real and growing problem (James McGovern has several interesting posts on this topic).  The industry needs to start to take some steps to solve it.
    • This effort, though clearly still 1.0, has the right vendors behind it as it involves Oracle, Adobe and Alfresco (kudos to still-small, open source Alfresco for getting a seat at the table on this one), along with the leads IBM, Microsoft and EMC.
    • The multi-platform / multi-language approach is a must — a Java-only standard would have left SharePoint out of the picture and not covering SharePoint interoperability would seriously hamper the effectiveness of any ECM standard at this point.
    • By working at a services layer and utilizing REST and SOAP, layering on top of existing systems and not requiring major re-writes or upgrades will be more feasible and potentially have the quickest impact.  This may also limit the sophistication of what the standard is able to accomplish, but it’s better to get some lightweight interoperability with a larger number of existing systems.

    What are the drawbacks or potential pitfalls?

    • It will likely be 2010 before we see commercial products supporting CMIS, though Alfresco has already announced an implementation of the draft spec in its Labs (fka Community) edition. An open source vendor of course has more flexibility in pushing out (unsupported) code than a commercial vendor, though Alfresco’s REST architecture makes this more straightforward.  (Alfresco does plan to support the draft spec in its commercial Enterprise code during the ratification process; no word on whether commercial vendors will follow suit).
    • Early integrations will in some cases be wrappers, perhaps shipped as downloadable modules outside of regular release cycles.  We’ll have to watch to see what this means and enables.
    • Standards efforts often go nowhere fast.

    I’m sure there are more, but those are the ones that occur to me at the moment.

    At this point, all we can do is note that the vendors have made the effort to develop the standard and watch as it is handed over to OASIS for ratification.  It’s a slow process – the vendors involved began work on this in 2006, which is indicative of the pace of such projects.

    Old news department: Continued growth for SharePoint

    A number of things passed me by this summer (yes, there was a reduced work schedule, a nice vacation — back at it now. Look for this blog to return to activity after a quiet summer).

    One of the things I didn’t follow closely enough at the time was Microsoft’s earnings announcement at the end of its fiscal 2008.  Joe Wilcox at eWeek noted a 30% year-over-year growth in revenue associated with the SharePoint Server.  This isn’t in the filing, so must have been mentioned during the earnings call.  John Mancini picked this up but I didn’t find much else on it.  Then Stephen Elop, President of Microsoft’s Business Division, in a speech during a financial analyst meeting on July 24th cited fiscal year growth of 35% for SharePoint.

    Microsoft claimed $800m in SharePoint revenue (in a press release) last year for fiscal 2007, so 30% growth puts 2008 revenue at $1.04 billion, 35% growth puts it at $1.08 billion.  The company also made a rather vague announcement in March at the SharePoint Conference and via a press release that it had surpassed the $1 billion revenue mark.  At that point, we dug into it to find the $1 billion number was for the rolling twelve-month period.
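
    For anyone who wants to check the arithmetic, here is a minimal sketch that applies the two reported growth rates to the $800m fiscal 2007 figure. The function and variable names are my own, purely for illustration; the only inputs are the numbers quoted above.

```python
# Back-of-the-envelope check of the fiscal 2008 SharePoint revenue estimates.
# Figures come from the post; names are illustrative assumptions.
BASE_FY2007_REVENUE_M = 800  # $800m claimed for fiscal 2007


def projected_revenue_m(base_m, growth_pct):
    """Return revenue in $m after applying a year-over-year growth percentage."""
    return base_m * (100 + growth_pct) / 100


for growth_pct in (30, 35):
    revenue_m = projected_revenue_m(BASE_FY2007_REVENUE_M, growth_pct)
    print(f"{growth_pct}% growth -> ${revenue_m / 1000:.2f} billion")
```

    Both estimates land just above the $1 billion mark, which is consistent with the rolling twelve-month claim Microsoft made in March.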

    The vagueness of the numbers is because of the difficulty of tracking individual product revenue, particularly when a product is tied to others in bundles.  Microsoft calculates SharePoint revenue by including revenue associated with Microsoft Office SharePoint Server 2007, the previous SharePoint Portal Server 2003 version, SharePoint Designer, Forms Server and SharePoint Search. SharePoint Server is sold individually and also as part of Microsoft’s Core client access license (CAL) and Enterprise CAL. So in the latter case, a share of the revenue from those bundles is associated with SharePoint.

    All of this means the numbers are inexact to be sure, and not all licensed SharePoint seats are actively used (we haven’t seen an update on this number from the 100 million claimed earlier this year).  But of course, the numbers are still indicative of SharePoint’s growing adoption, which few question. And many customers use the free SharePoint Services, which doesn’t directly show up in revenue numbers at all.

    I suppose Microsoft didn’t make a big deal about it because the growth is in line with what it had already reported earlier in the year.  For others, the fact that SharePoint is a growing business for Microsoft isn’t exactly, uh, news.  Still, official news on SharePoint can be hard to come by so forgive the post if this is too old news, but I thought if I missed it, maybe others had too.

    Alfresco plays Microsoft’s SharePoint game

    Last week it was EMC’s Documentum group taking on SharePoint and this week it’s Alfresco, interesting not just because both Documentum and Alfresco were founded by the same person.  I had the chance to speak with that person, John Newton, this morning about Alfresco Labs 3.0 (Labs is the new name for Alfresco Community, which is the unsupported, uncertified version of Alfresco’s open source ECM software).

    Alfresco has been positioning itself as the open source alternative to SharePoint for a while and this announcement puts more wood behind that marketing arrow (Alfresco is undeniably good at marketing).

    By working with the documented server protocols that Microsoft made available after its tangle with the EU, Alfresco built interoperability with the Microsoft Office desktop apps and with Microsoft Office SharePoint Server to make Alfresco a more viable replacement for SharePoint or to make it easier for the two to co-exist.  The most useful part of this will be the ability for end-users to work with an Alfresco repository via Office apps in the same way they work with SharePoint.

    As is generally the case in ECM, SharePoint and Alfresco aren’t apples-to-apples in all senses and Alfresco isn’t necessarily attempting to replicate all the search, business intelligence and portal pieces of SharePoint just yet.  But this definitely provides an alternative for those organizations looking for basic content services a la SharePoint in non-Microsoft or mixed server OS, database and browser environments.

    EMC improves SharePoint strategy with CenterStage

    EMC announced the upcoming beta availability of a new product today, Documentum CenterStage (formerly codenamed Magellan); our full write-up for 451 Group clients is here.  CenterStage appears to be aptly named, given the focus it is getting as part of the Documentum 6.5 announcement. CenterStage is part of the announcement, but not really the release.  Documentum 6.5 ships at the end of July and CenterStage Essentials goes into beta later this quarter.

    But CenterStage is the most interesting part of the 6.5 announcement.  With CenterStage, EMC can finally articulate a more coherent competitive strategy against Microsoft SharePoint.  CenterStage by itself isn’t really competitive with SharePoint but it is the user-friendly front-end component the company has lacked.   Until now, there was an integration between Documentum and SharePoint.  Oh and there was eRoom, but EMC really hadn’t kept eRoom up with the times nor was it particularly well integrated with Documentum, making it difficult for the company to sell an ‘end-to-end’ story that was any better than using SharePoint along with Documentum.

    So EMC is putting a lot of energy into CenterStage; it’s a big deal for EMC’s Documentum group.  Will it be a big deal outside of EMC and its established Documentum customer base?  Probably not initially.  But I think a lot of people have been wondering if EMC/Documentum really would cede all the interface apps to Microsoft that easily and, eventually, inevitably, marginalize the Documentum group beyond repair.  At least now, it looks like EMC is in the fight.