Entries Tagged 'Archiving'
June 20th, 2008 — Archiving, Search, Text analysis, Uncategorized
You’ve had Nick’s take; now here’s mine, with a little overlap. Great minds think alike, right?
We were not expecting 40 attendees at the pre-conference workshops, held in prime Sunday TV viewing time. Seth Grimes laid out “Text Analytics for Dummies,” while Nick gave a market overview. The turnout and the long Q&A sessions were good indicators of user enthusiasm and of the desire for real, practical advice about the field.
Some of the other memorable moments:
- Best of the vendor panel: Seth Grimes’s challenge to the vendors to say something nice about a competitor’s offerings. Also the vendors’ response to an audience question about incorporating UIMA, which was uniform: it wasn’t necessary, and customers weren’t asking for it.
- The Facebook presentation on trend-tracking through users’ “Wall” posts was brought back for an encore by popular demand. The crowd in my session was a little confrontational about the amount of analysis being done on the available information (never enough!), but as quick-and-dirty zeitgeist tracking goes, it was unbeatable, and a lot of fun.
- The Clarabridge 1-hour deployment was good sport, with at least one customer testifying that once you have learned the system, you can configure it at a speed approaching that of CTO Justin Langseth. You have to hand it to Clarabridge: they make it look easy.
Some thoughts on the users’ takes:
- In presentations and in private chats, the themes vendors kept returning to were eDiscovery and social media, two of the drivers for the market. The user questions I heard were mostly about how to judge the offerings on sentiment analysis, deployment time and ROI: Is sentiment analysis accurate enough? How long will deployment take? What return can we expect?
- The precision-versus-recall debate went back and forth again, but the hard truth is that which one matters more depends on the application. For patent or PubMed searches, or for eDiscovery, you need recall; for other applications, precision is paramount. Some users I spoke with mistook this trade-off for a lack of accuracy, but it’s more a sliding scale of usefulness.
- Accuracy was a recurring issue, both because text analytics is an emerging technology and because, of course, text is messy and imprecise. Partly it’s a matter of maturation. But the “fast, cheap or good: pick any two” truism about software development is equally true here. Even with built-in taxonomies and dictionaries or domain-specific knowledge, any text analytics software needs configuring for its particular application and users to improve accuracy, and that takes time.
- “Win fast and win often”: great words from Tony Bodoh of Gaylord Hotels on the user panel. Given the financial investment, the fact that text analysis software can automate (and so make obsolete) some employees’ work, the time configuration takes, and general resistance to change, it is important to win both executive and user buy-in early in the process. Chris Jones of Intuit echoed the sentiment, adding that it’s not advisable to go after your largest (and most time-consuming) problem first; instead, build up a number of smaller successes to prove the concept to users and higher-ups. Incidentally, both of their companies are Clarabridge users.
- Jones also noted that one of his “lessons learned” was to avoid over-configuring the analytics or tinkering with them too much. His advice: after a prudent amount of configuration, treat the system more or less as a black box. Don’t worry about what is going on under the hood; just let it do its job and leave it to the professionals.
- Some more wisdom from the user panel: you can’t go into a text analytics deployment expecting quantifiable ROI. “You don’t know what you don’t know,” and uncovering that is what the tool is there for. In many cases, the real potential isn’t obvious until you can see how the tool works with your business. At that point it’s possible to come up with applications that not even its creators could have thought up.
- Lastly (and this is not a new sentiment, but it meant more coming from school Superintendent Chris Bowman, who looked like he had my parents on speed-dial): the text analytics field is emerging and will become integrated into larger applications. This will eventually render a conference like this obsolete, but it also offers a great chance to get a leg up as an early adopter.
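To make the precision-versus-recall point from the user discussion concrete, here is a minimal sketch. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that get retrieved. The document IDs, scores and relevance judgments below are made up purely for illustration, and `retrieve` is a hypothetical stand-in for whatever scoring a real product does. Lowering the cutoff buys recall at the cost of precision, which is the sliding scale an eDiscovery team and, say, a marketing team would set very differently.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def retrieve(scores, threshold):
    """Hypothetical classifier output: keep every doc scoring >= threshold."""
    return {doc for doc, score in scores.items() if score >= threshold}


# Made-up relevance scores for eight documents (illustration only).
scores = {"d1": 0.95, "d2": 0.90, "d3": 0.80, "d4": 0.70,
          "d5": 0.60, "d6": 0.40, "d7": 0.30, "d8": 0.10}
# Made-up set of documents a human reviewer would judge relevant.
relevant = {"d1", "d2", "d4", "d6"}

for threshold in (0.85, 0.5, 0.2):
    p, r = precision_recall(retrieve(scores, threshold), relevant)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.85  precision=1.00  recall=0.50
# threshold=0.50  precision=0.60  recall=0.75
# threshold=0.20  precision=0.57  recall=1.00
```

Neither setting is "inaccurate"; the high threshold misses half the relevant documents, while the low one finds them all but makes a reviewer wade through more noise.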
Looking forward to next year!
April 1st, 2008 — Archiving, Content management
When Nick first unveiled this blog last month he rightly noted ‘storage’ as one of the many categories that fall into a capacious bucket we term ‘information management.’ With this in mind he reminded me that it would be appropriate for the 451 Group’s storage research team to contribute to the debate, so here it is!
For the uninitiated, storage can appear to be either a bit of a black hole or just a lot of spinning rust, so I’m not going to start with a storage 101 (although if you have a 451 password you can peruse our recent research here). Suffice it to say that storage is just one element of the information management infrastructure, but its role is certainly evolving.
Storage systems and associated software traditionally have provided applications and users with the data they need, when they need it, along with the required levels of protection. Clearly, storage has had to become smarter (not to mention cheaper) to deal with issues like data growth; technologies such as data deduplication help firms grapple with the “too much” part of information management. But up until now the lines of demarcation between “storage” (and data management) and “information” management have been fairly clear. Even though larger “portfolio” vendors such as EMC and IBM have feet in both camps, the reality is that such products and services are organized, managed and sold separately.
That said, there’s no doubt these worlds are coming together. The issues we as analysts are grappling with relate to where and why this is taking place, how it manifests itself, the role of technology, and the impact of all this on vendor, investor and end-user strategies. At the very least there is demand for technologies that help organizations bridge the gap, and the juxtaposition, between the fairly closeted, back-end storage “silo” and the more, shall we say, liberated front-end interface where information meets its consumers.
Here, a number of competing forces (data retention vs. data disposition, regulated vs. unregulated data, and public vs. private data, to name just three) are challenging, even forcing, organizations to become smarter about understanding what “information” they have in their storage infrastructure. Armed with such intelligence, firms can, in theory, make better decisions about how (and how long) data is stored, protected, retained and made available to support changing business requirements.
“Hang on a minute,” I hear you cry. “Isn’t this what Information Lifecycle Management (ILM) was supposed to be about?” Well, yes, I’m afraid it was. And if covering the storage industry for almost a decade has taught me one thing, it is that the industry moves at a glacial pace. In the case of ILM, the iceberg has probably lapped it by now. The hows and whys of ILM’s failure to capture the imagination of the industry are probably best left for another day, but I believe that at least one aim of ILM, helping organizations understand their data better so it can better support the business, still makes perfect sense.
What we are now seeing is the emergence of some real business drivers that are compelling a variety of stakeholders, from CIOs to General Counsel, to take an active interest in better understanding their data. This, in turn, is driving industry consolidation as larger vendors in particular move to fill out their product portfolios; the latest example is HP’s acquisition of Australia-based records management specialist Tower Software. Over the next few weeks I’ll be exploring in more detail three areas where we think this storage-information gap is being bridged: eDiscovery, archiving and security. Stay tuned for our deeper thoughts and perspectives in this fast-moving space.
February 8th, 2008 — Archiving, Content management, Search
I’ve been attending LegalTech here in New York for the past few years, but this year things seemed to be different.

Firstly, and most noticeably, the show took up every inch of available space at the New York Hilton on 6th Avenue, spread across three floors. The corridors, which at less busy shows simply lead you to rooms, were lined with stands, as were the exhibition spaces. It reminded me of the annual SIFMA Technology Management conference, which is held at the same venue and is a bit of a zoo. But unlike the financial services industry, the legal industry and the general counsel offices of corporations haven’t traditionally been seen as major buyers of IT, let alone of cutting-edge technology.

But there’s nothing like regulations to fuel a surge in the market. The changes to the Federal Rules of Civil Procedure (FRCP), which took effect in December 2006, made all electronic records discoverable and required parties to be ready within 120 days of the start of a lawsuit to discuss their eDiscovery terms. This made eDiscovery a very hot market in 2007 (and helped Stratify to a nice valuation when it was bought by Iron Mountain in July 2007 for $158m).
But one of the messages I picked up loud and clear is that law firms and legal departments have their eye on a much bigger problem, one currently handled largely manually but ripe for automation: document review. Estimates of a $15bn document review market today, and of $40bn in overall review expense by 2011, raised more than a few eyebrows among prospective customers of document review vendors (many of which are also eDiscovery vendors, a market pegged at about $3bn). Jay Brudz, senior counsel, Legal Technology at GE, put it bluntly: “You know how many freaking lightbulbs we’ve gotta sell to pay for that?” He then made it clear that he had no intention of paying what vendors are asking.
The other point of tension I’m picking up is the one between intelligent archiving and search: the battle of ideas between those who think it’s better to do all the tagging at archive time, culling at that point to avoid storing dupes and garbage, and those who think you should store everything and develop smarter search engines.
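For readers less familiar with what "culling at archive time" involves, here is a minimal sketch of one common step: exact-duplicate detection by content hash, so that each distinct document body is stored and tagged once, while every original ID stays in the index and remains discoverable. The function `archive_with_dedup`, the toy `keyword_tagger` and the sample messages are all hypothetical illustrations, not any vendor’s product, and the sketch assumes byte-for-byte duplicates only.

```python
import hashlib


def archive_with_dedup(documents, tagger):
    """Hypothetical archive-time culling: store each distinct body once.

    documents: iterable of (doc_id, bytes) pairs.
    tagger: callable returning a list of tags for a document body.
    Returns (store, index): store maps SHA-256 digest -> body (one copy),
    index maps doc_id -> {"digest": ..., "tags": [...]}, so every original
    ID can still be found even when its content was a duplicate.
    """
    store = {}
    tags_by_digest = {}
    index = {}
    for doc_id, body in documents:
        digest = hashlib.sha256(body).hexdigest()
        if digest not in store:
            # First time this exact content is seen: store and tag it once.
            store[digest] = body
            tags_by_digest[digest] = tagger(body)
        index[doc_id] = {"digest": digest, "tags": tags_by_digest[digest]}
    return store, index


def keyword_tagger(body):
    # Toy tagger: flag messages that mention a made-up project name.
    return ["project-x"] if b"project x" in body.lower() else []


docs = [
    ("mail-001", b"Re: Project X budget"),
    ("mail-002", b"Lunch on Friday?"),
    ("mail-003", b"Re: Project X budget"),  # exact duplicate of mail-001
]
store, index = archive_with_dedup(docs, keyword_tagger)
print(len(store), len(index))        # 2 3
print(index["mail-003"]["tags"])     # ['project-x']
```

Hashing only catches exact copies; the same email with a different footer would be stored twice, which hints at why culling at archive time is harder than it looks as volumes grow.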
It’s clear, admittedly without any empirical evidence to hand, that protagonists in this space, be they general counsel departments, outside law firms or vendors, feel that data volumes are growing so fast that their ability to cull the data at archiving time, to make it more easily discoverable later, can’t keep pace. There’s clearly something to that, given how rapidly talk has moved from gigabytes to terabytes to petabytes, and on to what an IBMer who handles data governance strategy for the company told me his clients call “Goog-bytes”: a generic term for so much data they can’t get their heads around it. After all, at this rate it won’t be that long before we talk of yottabytes in this arena, and what comes after that?
Search and archiving are areas we at the 451 Group have already spent a lot of time on, and that is sure to continue in 2008.