Entries Tagged 'Text analysis' ↓

Text analysis + content management = insight

We have long wondered why more content management vendors don’t fully embrace text analysis (or even enterprise search for that matter).

These guardians of most organizations’ unstructured data were beaten to the punch in exploiting text by business intelligence companies, which are more accustomed to manipulating structured data. It’s great that the BI companies are starting (slowly) to embrace the idea of unlocking the value locked within unstructured text, but it’s somewhat bizarre that content management vendors didn’t get there first.

We said this many years ago, most coherently in mid-2005 in our report Text-aware applications: the endgame for unstructured data (the clue’s in the title).

In that report we said:

“…while the penetration of content management systems is relatively high when compared with other ways of managing unstructured data, these systems do little at present to help analyze that unstructured data.”

and somewhat optimistically:

“Indeed, despite the CMS’s [content management systems] ability to organize, most implementations rarely attempt to push into anything that could be considered a semantic understanding of the content. This may be set to change, however, with some vendors, such as EMC, making headway in automatically parsing documents at a deeper level than just file-level metadata.”

That was a tad premature on our part.

Think about the main players and what they do to understand what resides in the documents they ‘manage.’

EMC Documentum – it has its content intelligence services classification engine, sure, and it bought a federated search product many moons ago, but neither is exactly front and center in its product strategy. And ILM (try searching on that now at EMC and see what you get) only dealt with file-level metadata, not semantic metadata. However, the X-Hive acquisition was an interesting one from this standpoint (see below for more on XML databases).

Vignette – bar an OEM relationship with Autonomy (which most vendors have), nothing much doing here, despite the need for Web content management to increase its understanding of the text it’s managing to make websites more attractive to advertisers (think of using text analysis to build links to other content automatically to keep visitors on the site longer).

Interwoven – Metatagger isn’t exactly at the bleeding edge any more, although the idea is sound.

IBM Filenet – here there is hope. IBM has taken a classifier it got from its iPhrase acquisition and used it to do initial classification to help determine what should or should not be deemed a record. IBM has all sorts of text analysis toys to play with and we expect more from it in the future.

Open Text – it once had five search engines, and was a pioneer in that space. But I’m not aware of anything it does to extract meaning from the content it manages.

Autonomy – Its tagline is ‘Meaning-based computing.’ It owns a powerful classification engine but now also owns records management and a bunch of other stuff. It’s the one company that checks most of the boxes here (but isn’t a document or Web content management vendor). But as the company currently refuses to talk to us, we’re in the dark as to which bit fits where and are unable to tell our clients what benefits Autonomy could bring them as a result. If the company cares to get in touch with me, I’m here.

This post was prompted partly by a recent conversation I had with Nstein. It is morphing from being a struggling text analysis vendor laden with debt (it’s publicly traded in Canada, so the numbers don’t lie) to a fast-growing combination of Web content management, digital asset management (via acquisitions in 2006 and 2007) and text analysis, built atop an XML database licensed from IxiaSoft. It’s focusing exclusively on the largest publishing companies, using the text analysis to automatically create links between new and archived content (thus pushing it up Google rankings). It competes with Mark Logic and Interwoven, mainly.

Any Gmail user who looks in their spam folder and sees ads for “Spam Swiss Pie – Bake 45-55 minutes or until eggs are set,” can appreciate that crude keyword matching against content is next to useless.
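To make the point concrete, here is a minimal sketch of the kind of crude keyword matching that could produce such a result. The ad inventory, the `match_ads` function and the folder label text are all invented for illustration; this is not a description of how Gmail or any real ad system works.

```python
# Hypothetical ad inventory, keyed by a single trigger keyword.
ADS = {
    "spam": "Spam Swiss Pie – Bake 45-55 minutes or until eggs are set",
    "mortgage": "Compare fixed-rate mortgage offers",
}

def match_ads(page_text):
    """Return every ad whose trigger keyword appears as a word in the text.

    There is no notion of meaning here: 'spam' the junk mail and
    'Spam' the canned meat are the same token to this matcher.
    """
    words = {w.strip(".,!?:;'\"()").lower() for w in page_text.split()}
    return [ad for keyword, ad in ADS.items() if keyword in words]

# Made-up text standing in for what a spam folder page might contain.
folder_text = "Spam (12): messages older than 30 days are deleted"
print(match_ads(folder_text))
```

The matcher happily serves a recipe ad against a junk mail folder because the only thing it compares is the surface string. Telling the two senses of the word apart takes some understanding of context, which is exactly what text analysis is supposed to supply.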

There’s so much more that can be done here and so much insight being left on the table, whether it be in better website management to attract readers, voice of the customer analysis tied to BI, or government intelligence.

Tools that manage content need to understand that content – its language, its meaning, its sentiment. Otherwise, they are missing a trick.

Sentiment analysis has more legs than we’d bargained for

I used to cover a lot of so-called sentiment analysis vendors, that is, companies that used text analysis techniques to mine the Web to determine how consumers feel about something, be it a company, product, movie or whatever.

Companies like BuzzMetrics, Biz360 and Cymfony sprang up to serve this market. Some got bought – BuzzMetrics is now part of Nielsen and Cymfony was picked up by TNS Media Intelligence in February 2007. Biz360 meanwhile is still independent and plugging away.

Around the time we published our Text-Aware applications special report in mid-2005 we thought this stuff would move beyond appealing solely to marketing and PR professionals to blaze some sort of trail of text analysis into the enterprise, rather like analysis on structured data has done via the likes of SAS Institute and SPSS. Well, it didn’t, though we still think in general enterprises will adopt text analysis, but that’s for another post.

But turning back to look at the sentiment analysis market after recent conversations with the likes of Jodange and veteran Lexalytics (which is an enabler of this stuff rather than selling the service itself), and reading Matthew Hurst’s posts on sentiment mining, I’m amazed to find that there are way more companies now than there were 2-3 years ago (so much for traditional maturation models leading to consolidation, or perhaps, with our eye on innovation, we were just too early?). But the somewhat disappointing thing to notice was that they are still doing much the same thing with what appears to be much the same technology.

So here’s a list of what we would broadly call sentiment analysis companies in alphabetical order (some old, some new, some stealth). This list is far from comprehensive and very North American-focused, so I realize I’m probably missing a lot.

It was originally compiled for our internal use, but once I realized just how much of this stuff there is around, I thought I’d share it to see if I could find any more.

Andiamo Systems – I don’t know them, but pricing by ‘mention’ makes me wonder how sophisticated the sentiment analysis is – more mentions don’t necessarily equate to anything other than more mentions.

Biz360 – veteran of the space

BrandIntel – appears to involve a bit of manual labor, rather than a pure software approach

Buzzlogic – recent startup with a lot of, er, buzz

Collective Intellect – company launched about a year ago, targeting the financial services industry, which is somewhat tough right now

Infonic – British company formerly known as Corpora, has broader text analysis tools and is used at Dow Jones, so I believe

Monitor110 – aimed at institutional investors, with Draper Fisher Jurvetson as investors

MotiveQuest – I don’t know them but from the website it may not be quite so technology-driven and more brute force, but I could be wrong

Nielsen BuzzMetrics – the 800lb gorilla that rolled up some of the early players

Nstein – has an Nsentiment module as part of its text analytics offering aimed at the publishing industry

Northern Light – veteran search company, has some sentiment analysis in its MI analyst product, but it’s document-level, which means the whole document is either one thing or another, when often stories can be both positive and negative

RavenPack – counts Dow Jones as a partner and also claims to be able to do news-based algorithmic trading, which is ballsy, if nothing else

SentiMetrix – stealth, apparently

ScoutLabs – in beta and uses Lexalytics’ technology

SkyGrid – aggregates and analyzes financial news, includes Bill Burnham as an enthusiastic investor

Summize – analyzes online product reviews for sentiment

Umbria – focused on online sentiment analysis of social media, such as blogs
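The point made in the Northern Light entry, that document-level sentiment forces a story to be one thing or the other, can be shown with a toy sketch. It assumes a tiny hand-made word list and simple word counting; the lexicon, function names and sample text are all invented for illustration and do not represent how any vendor in this list scores sentiment.

```python
import re

# Toy lexicon, purely illustrative; real systems use far richer resources.
POSITIVE = {"great", "strong", "love"}
NEGATIVE = {"poor", "weak", "disappointing"}

def score(text):
    """Count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(value):
    """Map a numeric score to a coarse sentiment label."""
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"

def document_level(text):
    """One label for the whole document."""
    return label(score(text))

def sentence_level(text):
    """One label per sentence, split naively on end punctuation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, label(score(s))) for s in sentences]

story = "The camera is great and the screen is strong. Battery life is poor."
print(document_level(story))
for sentence, sentiment in sentence_level(story):
    print(sentiment, "|", sentence)
```

Scored as a whole, the sample story comes out positive and the complaint about battery life disappears. Scored sentence by sentence, the same crude technique at least keeps the praise and the criticism apart.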

Here is our report on Lexalytics (451 login required to see full text) and our report on Jodange will come shortly.

We plan to speak to some (but not all) of the above in the coming weeks and I’ll report back on what I find (though most of it will end up in our syndicated research). But if anyone knows of any significant omissions, please leave them in the comments. I’d love to know.

Our take on M&A in enterprise search

I’ve gathered all my current thinking on potential M&A in enterprise search in a SectorIQ that we published earlier this week to our customers. In it, I look at four main potential targets plus a few other small ones and look at a few of the likely acquirers. (This is the way we write all our SectorIQs, btw, and they’re a great way of getting a quick grasp on what might be coming down the pike in any particular sector of the IT industry.)

Fortunately, those of you who are not our customers (yet!) are able to read it via our arrangement with the New York Times DealBook section. Click here to see the NY Times posting or go here to go straight to the report – and while you’re there, sign up for a trial of our M&A KnowledgeBase, where we’ve been collecting details of every IT, internet and telecoms deal since the start of 2002!

Finally, a quick word about the headline. We like to have some fun here at 451 with these things and while I appreciate that this one might have been pushing things a little in terms of clearly explaining what the report was about, when else would I be able to use it? 😉

Text analysis in 2008

I was asked by someone recently what I thought would be a major trend in text analysis (or text analytics, but I prefer analysis) in 2008. We covered this ground in 451’s annual preview of storage and information management. That’s available to 451 customers only, but the sub-title of the section on search and text analysis was ‘The big vendors move in.’

That referred as much to search as it did to text analysis, with the emergence of Oracle and SAP as players in the search market along with IBM. It mentioned Microsoft but at that time, it looked as if Redmond was content to continue developing its own stuff. But then it bought Fast Search & Transfer (FAST) and that changed things. And, I think, along with SAP’s ownership of Business Objects, and by extension text analysis player Inxight Software, it may mean 2008 is a year if not quite of stasis, then certainly of a slower pace of innovation.

FAST had a lot of interesting stuff in its labs, probably too much judging by the financial mess it got itself into in mid-2007. But it will be distracted this year as it gets subsumed into Microsoft and ties itself ever more tightly to the Microsoft platform. Similarly, for the 3 to 4 months that Business Objects owned Inxight as an independent company, we heard a fair amount about its plans to leverage unstructured information. Now SAP owns it, we may hear a lot less.