May 8th, 2008 — Content management, Search
The combination of search, text analysis and content management is turning into one of the central memes of this blog. This wasn’t deliberate, although it’s something we’ve deliberated internally for a couple of years.
There were plenty of partnerships between search and content management vendors around, but they seemed to us to be either at the press release level, i.e. little more than marketing, or the result of a small handful of one-off projects in the field.
But it turns out others within the industry were thinking about much deeper integrations even if they weren’t saying so publicly.
About a year after Stellent and FAST (both then independent, of course) announced a partnership that resulted in Stellent OEMing FAST’s engine, FAST seriously considered buying Stellent.
I’ve heard from a couple of reliable sources that this was discussed at the highest level within FAST, but it chose not to pursue the deal and instead decided to veer way off its core business, ending up distracting itself to such an extent that it got itself tied up in knots. This ended with it being forced to incur about $55m in charges in 2007, which sent its share price plummeting and meant it ended up costing Microsoft a lot less than it would otherwise have done.
Incidentally, one of those sidebars - Ezmo, a music community site (presented to analysts in February 2007 as a “customer” of FAST, when in fact the phrase that should’ve been used was “wholly-owned subsidiary”) - was shut down in March.
Of course Stellent went on to be acquired by Oracle in 2007 and we’ve been impressed by the way the database giant has integrated the company so far.
But FAST and Stellent could have made for an interesting combination of the ability to manage and analyze unstructured content, and who knows, FAST-Stellent might’ve been a force to be reckoned with. Now we look to see what Microsoft, something of a toe-dipper when it comes to content management, and Oracle, armed with a pretty decent search engine, do to prolong this meme.
April 21st, 2008 — Search
Well, that’s one of those pesky search acquisitions sorted out anyway.
Microsoft and Fast Search & Transfer (FAST) will consummate (their words, not mine) the acquisition on Thursday (April 24) now that the conditions of the acquisition have been met, according to this. FAST has had the requisite number of shares tendered since February. The time since then has been spent clearing the regulatory hurdles.
I’d grown quite attached to those Oslo Stock Exchange announcements as they provided FAST-watchers like me with a running commentary on FAST’s progress, listing each major customer win as it happened, along with a whole lot of other stuff, including last year’s major stumble.
The new chapter of Microsoft’s enterprise search business starts this week, which is good timing for us, as I’m speaking with them next week.
April 17th, 2008 — Content management, Text analysis
We have long wondered why more content management vendors don’t fully embrace text analysis (or even enterprise search for that matter).
These guardians of most organizations’ unstructured data were beaten to the punch in terms of exploiting text by business intelligence companies, which are more accustomed to manipulating structured data. It’s great that the BI companies are starting (slowly) to embrace the idea of unlocking the value locked within unstructured text, but it’s somewhat bizarre that content management vendors didn’t get there first.
We said this many years ago, in the most coherent form in mid-2005 with our report called Text-aware applications: the endgame for unstructured data (the clue’s in the title).
In that report we said:
“…while the penetration of content management systems is relatively high when compared with other ways of managing unstructured data, these systems do little at present to help analyze that unstructured data.”
and somewhat optimistically:
“Indeed, despite the CMS’s [content management systems] ability to organize, most implementations rarely attempt to push into anything that could be considered a semantic understanding of the content. This may be set to change, however, with some vendors, such as EMC, making headway in automatically parsing documents at a deeper level than just file-level metadata.”
That was a tad premature on our part.
Think about the main players and what they do to understand what resides in the documents they ‘manage.’
EMC Documentum - it has its content intelligence services classification engine, sure, and it bought a federated search product many moons ago, but neither is exactly front and center in its product strategy. And ILM (try searching on that now at EMC and see what you get) only dealt with file-level metadata, not semantic metadata. However, the X-Hive acquisition was an interesting one from this standpoint (see below for more on XML databases).
Vignette - bar an OEM relationship with Autonomy (which most vendors have), nothing much doing here, despite the need for Web content management to increase its understanding of the text it’s managing to make websites more attractive to advertisers (think of using text analysis to build links to other content automatically to keep visitors on the site longer).
Interwoven - Metatagger isn’t exactly at the bleeding edge any more, although the idea is sound.
IBM Filenet - here there is hope. IBM has taken a classifier it got from its iPhrase acquisition and used it to do initial classification to help determine what should or should not be deemed a record. IBM has all sorts of text analysis toys to play with and we expect more from it in the future.
Open Text - it once had five search engines, and was a pioneer in that space. But I’m not aware of anything it does to extract meaning from the content it manages.
Autonomy - Its tagline is ‘Meaning-based computing.’ It owns a powerful classification engine but now also owns records management and a bunch of other stuff. It’s the one company that checks most of the boxes here (but isn’t a document or Web content management vendor). But as the company currently refuses to talk to us, we’re in the dark as to which bit fits where and are unable to tell our clients what benefits Autonomy could bring them as a result. If the company cares to get in touch with me, I’m here.
This post was prompted partly by a recent conversation I had with Nstein. It is morphing from being a struggling text analysis vendor laden with debt (it’s publicly traded in Canada, so the numbers don’t lie) to a fast-growing combination of Web content management, digital asset management (via acquisitions in 2006 and 2007) and text analysis, built atop an XML database licensed from IxiaSoft. It’s focusing exclusively on the largest publishing companies, using the text analysis to automatically create links between new and archived content (thus pushing it up Google rankings). It competes with Mark Logic and Interwoven, mainly.
Any Gmail user who looks in their spam folder and sees ads for “Spam Swiss Pie - Bake 45-55 minutes or until eggs are set,” can appreciate that crude keyword matching against content is next to useless.
There’s so much more that can be done here and so much insight being left on the table, whether it be in better website management to attract readers, voice of the customer analysis tied to BI, or government intelligence.
Tools that manage content need to understand that content - its language, its meaning, its sentiment. Otherwise, they are missing a trick.
March 31st, 2008 — Internet, Text analysis
I used to cover a lot of so-called sentiment analysis vendors, that is companies that used text analysis techniques to mine the Web to determine how consumers feel about something, be it a company, product, movie or whatever.
Companies like BuzzMetrics, Biz360 and Cymfony sprang up to serve this market. Some got bought - BuzzMetrics is now part of Nielsen and Cymfony was picked up by TNS Media Intelligence in February 2007. Biz360 meanwhile is still independent and plugging away.
Around the time we published our Text-Aware applications special report in mid-2005 we thought this stuff would move beyond appealing solely to marketing and PR professionals to blaze some sort of trail of text analysis into the enterprise, rather like analysis on structured data has done via the likes of SAS Institute and SPSS. Well, it didn’t, though we still think in general enterprises will adopt text analysis, but that’s for another post.
But I’m amazed to find, turning back to look at the sentiment analysis market after recent conversations with the likes of Jodange and veteran Lexalytics (which is an enabler of this stuff rather than selling the service itself) and reading Matthew Hurst’s posts on sentiment mining, that there are way more companies now than there were 2-3 years ago (so much for traditional maturation models leading to consolidation, or perhaps, with our eye on innovation, we were just too early?). But the somewhat disappointing thing to notice was that they are still doing much the same thing with what appears to be much the same technology.
So here’s a list of what we would broadly call sentiment analysis companies in alphabetical order (some old, some new, some stealth). This list is far from comprehensive and very North American-focused, so I realize I’m probably missing a lot.
It was originally compiled for our internal use, but once I realized just how much of this stuff there is around, I thought I’d share it to see if I could find any more.
Andiamo Systems - I don’t know them, but pricing it by ‘mention’ makes me wonder how sophisticated the sentiment analysis is - more mentions don’t necessarily equate to anything other than more mentions.
Biz360 - veteran of the space
BrandIntel - appears to involve a bit of manual labor, rather than a pure software approach
Buzzlogic - recent startup with a lot of, er, buzz
Collective Intellect - company launched about a year ago, targeting financial services industry, which is somewhat tough right now
Infonic - British company formerly known as Corpora, has broader text analysis tools and is used at Dow Jones, so I believe
Monitor110 - aimed at institutional investors, with Draper Fisher Jurvetson as investors
MotiveQuest - I don’t know them but from the website it may not be quite so technology-driven and more brute force, but I could be wrong
Nielsen Buzzmetrics - the 800lb gorilla that rolled up some of the early players
Nstein - has an Nsentiment module as part of its text analytics offering aimed at the publishing industry
Northern Light - veteran search company, has some sentiment analysis in its MI analyst product, but it’s document-level, which means the whole document is either one thing or another, when often stories can be both positive and negative
RavenPack counts Dow Jones as a partner and also claims to be able to do news-based algorithmic trading, which is ballsy, if nothing else
SentiMetrix - stealth, apparently
ScoutLabs - in beta and uses Lexalytics’ technology
SkyGrid - aggregates and analyzes financial news, includes Bill Burnham as an enthusiastic investor
Summize - analyzes online product reviews for sentiment
Umbria - focused on online sentiment analysis of social media, such as blogs
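The point made above about document-level sentiment (a story gets one label even when it is both positive and negative) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the tiny word lists are hypothetical, and this is not how Northern Light or any other vendor on the list actually scores text.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"strong", "growth", "gain", "record"}
NEGATIVE = {"loss", "lawsuit", "decline", "weak"}


def score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(n):
    """Map a numeric score to a coarse sentiment label."""
    return "positive" if n > 0 else "negative" if n < 0 else "neutral"


def document_level(text):
    """One label for the whole document."""
    return label(score(text))


def sentence_level(text):
    """One label per sentence, so mixed stories stay visibly mixed."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, label(score(s))) for s in sentences]


story = (
    "Revenue showed strong growth this quarter. "
    "The company also faces a lawsuit."
)

print(document_level(story))
for sentence, sentiment in sentence_level(story):
    print(sentiment, "|", sentence)
```

In this sketch the document-level label hides the negative sentence entirely, which is the limitation described in the list entry.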
Here is our report on Lexalytics (451 login required to see full text) and our report on Jodange will come shortly.
We plan to speak to some (but not all) of the above in the coming weeks and I’ll report back on what I find (though most of it will end up in our syndicated research). But if anyone knows of any significant omissions, please leave them in the comments; I’d love to know.
March 20th, 2008 — Search
We’ll occasionally use this blog to discuss our own internal taxonomy work. We ask enough vendors if they eat their own dog food, as it were, so it’s only fair we turn the spotlight on ourselves occasionally.
We here at 451, like all industry analyst companies that publish research, have our own issues with categorizing our reports and making them easy to find. We started in an ad-hoc fashion when we launched back in 2000 with eight broad sections and some basic metadata but without any real plan. We gradually added to the categories till we came to a point a few years ago when I realized that unless we took a more coordinated approach to a taxonomy and to categorizing reports across all our products and services, we were heading for trouble.
So, with experience gleaned from talking to numerous vendors and users over the years I embarked on developing a single taxonomy for the whole company. Our IT team built our own taxonomy editor and another tool to sort out reconciliations between old and new taxonomies.
Until Kathleen joined in 2006, however, it was still something of a side project. She used her experience of doing a similar project at Giga to propel it along, and I’m pleased to be able to say we’re now satisfied that we have all the bases covered. That said, we know a taxonomy is never finished and we are constantly making small revisions as the industry shifts.
How we use the taxonomy still varies from product to product, however. Our M&A KnowledgeBase has always used a taxonomy as the basis of helping customers find deals, and we have just ported that product over to the new taxonomy, meaning it has gone from 300 or so categories to more than 600 (and I know more is not always better, but you’ll have to trust me on this one, or better still sign up for a trial!). It’s a much more balanced representation of the tech, internet and telecoms industry than before.
This means our Market Insight Service, TechDealmaker and M&A KnowledgeBase are all using the same taxonomy, although only the last of those currently exposes it. We use additional themes to group our research in key areas, such as open source, enterprise security or our work with the European Union. We also use the taxonomy to drive internal tools to help us manage our coverage areas and output.
In future posts I’ll talk about specific elements of the taxonomy and also how we’re planning to roll it out across all our research, improve our search engine and overall make it easier for customers to find the research they need.
March 7th, 2008 — Search, Text analysis
I’ve gathered all my current thinking on potential M&A in enterprise search in a SectorIQ that we published earlier this week to our customers. In it, I look at four main potential targets plus a few other small ones and look at a few of the likely acquirers. (This is the way we write all our Sector IQs, btw, and they’re a great way of getting a quick grasp on what might be coming down the pike in any particular sector of the IT industry.)
Fortunately those of you that are not our customers (yet!) are able to read it via our arrangement with the New York Times DealBook section. Click here to see the NY Times posting or go here to go straight to the report - and while you’re there, sign up for a trial of our M&A KnowledgeBase, where we’ve been collecting details of every IT, internet and telecoms deal since the start of 2002!
Finally, a quick word about the headline. We like to have some fun here at 451 with these things and while I appreciate that this one might have been pushing things a little in terms of clearly explaining what the report was about, when else would I be able to use it? 
March 3rd, 2008 — Collaboration, Content management, Search
Welcome to the new 451 Group blog about information management. What’s information management, you may ask?
It’s the confluence of a variety of strategies organizations employ to get their arms around and exploit the myriad sources of data and information at their disposal. Specifically this means 451’s coverage of the following areas:
- Search
- Collaboration
- Content management
- Text analysis
- eDiscovery
- Archiving
- Storage
- Databases (relational & otherwise)
- Business intelligence
- Master & metadata management
It is written mainly by Kathleen Reidy and myself, and both of us will be at the AIIM Expo this week in Boston where we will be taking the temperature of the content management market & talking with a bunch of vendors and end users.
More on that and Drupalcon this week.
February 8th, 2008 — Archiving, Content management, Search
I’ve been attending LegalTech here in New York for the past few years, but this year things seemed to be different.
Firstly, and most noticeably, every inch of available space at the New York Hilton on 6th Avenue was taken, spread across three floors. The corridors, which in less busy shows simply lead you to rooms, were lined with stands, as were the exhibition spaces. It reminded me of the annual SIFMA Technology Management conference, which is a bit of a zoo and in the same location. But unlike the financial services industry, the legal industry and general counsel offices of corporations haven’t traditionally been seen as major buyers of IT, let alone cutting edge stuff.
But there’s nothing like regulations to fuel a surge in the market. The changes to the Federal Rules of Civil Procedure (FRCP) took effect in December 2006, mandating that all electronic records were discoverable and that parties needed to be ready within 120 days of the start of a lawsuit to discuss their eDiscovery terms. This made eDiscovery a very hot market in 2007 (and helped Stratify to a nice valuation when it was bought by Iron Mountain in July 2007 for $158m).
But one of the messages I picked up pretty loud and clear is that law firms and legal departments have their eye on a much bigger problem, currently being done largely manually, but ripe for automation: document review. Figures of a $15bn market for document review now and a bill of $40bn by 2011 for overall review expense raised more than a few eyebrows among some prospective customers of document review vendors (many of which are also eDiscovery vendors, a market pegged at about $3bn). Jay Brudz, senior counsel, Legal Technology at GE, put it bluntly: “You know how many freaking lightbulbs we’ve gotta sell to pay for that?” He then made it clear that he had no intention of paying what vendors are asking.
The other point of tension I’m picking up is the one between intelligent archiving and search - the battle of ideas between those that think it’s better to do all the tagging at archive time and do some culling at that point (to avoid storing dupes and garbage) and those that think you should store everything and develop smarter search engines.
It’s clear - admittedly without any empirical evidence to hand - that protagonists in this space, be they general counsel departments, outside law firms or the vendors, feel the rate of data growth is so fast that their ability to cull the data at archiving time to make it more easily discoverable later can’t keep pace. There’s clearly something to that, given how rapidly talk has moved from gigabytes to terabytes to petabytes to something an IBMer who handles data governance strategy for the company told me his clients call Goog-bytes - a generic term to mean so much data they can’t get their heads around it. After all, at this rate it won’t be that long before we talk of yottabytes in this arena, and what comes after that?
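To make the “cull at archive time” side of that debate concrete, here is a minimal sketch of the simplest form of culling: dropping exact duplicates by content hash before anything is stored. It is an illustration under stated assumptions, not any vendor’s method; the function name and record layout are hypothetical, and real archiving products do far more (near-duplicate detection, tagging, retention rules).

```python
import hashlib


def archive_with_culling(documents):
    """Keep only the first copy of each exact-duplicate document.

    Hypothetical helper: hashes the whitespace-trimmed text with SHA-256
    and skips any document whose hash has already been seen.
    """
    seen = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact dupe: cull it at archive time
        seen.add(digest)
        kept.append({"sha256": digest, "text": doc})
    return kept


incoming = ["Q3 contract draft", "Q3 contract draft", "Meeting notes"]
archive = archive_with_culling(incoming)
print(len(incoming), "received,", len(archive), "stored")
```

The store-everything camp would skip this step and push the work to query time instead, relying on the search engine to collapse duplicates and rank out the garbage.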
Search and archiving is something we at 451 Group have spent a lot of time on already and that is sure to continue in 2008.
February 7th, 2008 — Internet
Given this blog’s name we were very interested to meet up again with Michael Nelson, recently of IBM and now visiting professor at Georgetown University, teaching courses including “The Future of the Internet” and “What Shapes the Global Information Society.” Nelson was until last year director of IBM’s Internet technology and strategy, helping to implement the thoughts of people like the recently retired Irving Wladawsky-Berger and John Patrick, as well as deep involvement in various Internet Society and United Nations efforts in Internet governance. I met him in the 1990s during the various meetings that led to the creation of ICANN in 1998, during which time he left the FCC (after a stint at the Clinton White House) and joined IBM.
We met at an IBM event announcing its plans for Cognos, the acquisition of which closed at the end of January. Nelson chaired a panel of a couple of Cognos customers - one that sold pizza and one that sold gardening tools, but both of which were grappling with rapidly increasing volumes of data within their corporations and both of which used Cognos’ tools to try and do more than just figure out what they have - to actually figure out how their businesses are performing and how they might do in the future - performance management tools, leading to business optimization in IBM-Cognos parlance.
Nelson’s only been there for three months, but one of the projects his students are working on is to measure the amount of data on the Internet; of course he acknowledges that depending on what you count as being ‘on the Internet’ (is a company’s Salesforce.com on the Internet?) he and his students could be out by factors of 5, 10 or whatever. I will be finding out more soon and will report back here.
January 25th, 2008 — Search, Text analysis
I was asked by someone recently what I thought would be a major trend in text analysis (or text analytics, but I prefer analysis) in 2008. We covered this ground in 451’s annual preview of storage and information management. That’s available to 451 customers only, but the sub-title of the section on search and text analysis was ‘The big vendors move in.’
That referred as much to search as it did to text analysis, with the emergence of Oracle and SAP as players in the search market along with IBM. It mentioned Microsoft, but at that time it looked as if Redmond was content to continue developing its own stuff. But then it bought Fast Search & Transfer (FAST) and that changed things. And, I think, along with SAP’s ownership of Business Objects, and by extension text analysis player Inxight Software, it may mean 2008 is a year if not quite of stasis, then certainly of a slower pace of innovation.
FAST had a lot of interesting stuff in its labs, probably too much judging by the financial mess it got itself into mid-2007. But it will be distracted this year as it gets subsumed into Microsoft and it ties itself ever more tightly to the Microsoft platform. And similarly, for the 3 to 4 months that Business Objects owned Inxight as an independent company, we heard a fair amount about its plans to leverage unstructured information. Now SAP owns it, we may hear a lot less.