Tuesday, April 3, 2012

Text Analytics for (Very Smart) Dummies, Part I

Today’s Most Popular and Least Understood Research Tool Explained…Sort Of
By Marc Dresner, IIR USA

So you think you know text analytics? Maybe. Or not. We hear about this “revolutionary” methodology all the time now, but it clearly means different things to different marketers, let alone researchers.

Ill-defined often means ill-conceived in my book, but that hardly seems to be the case here.

Not too long ago, after a series of conversations on not necessarily related topics, it occurred to me that there’s a somewhat alarmingly nebulous aspect to this seemingly straightforward concept.

So I ask you: What is text analytics???

Don’t look to me for answers, because like most people I’ve spoken with in research circles, I thought we had this figured out. I daresay we may have been wrong.

To illustrate, I present the first of three simple text Q&A interviews.

None of the three offers a definition that is entirely consistent with, or entirely contradictory to, the others, but on the whole they most certainly affirm my argument that many of us misapprehend, on some level or other, the meaning of text analytics as both a methodology and a tool.

About our select three:
- The first is a niche provider and trusted personal favorite who doesn’t mince words, has the right background to speak to the topic and has even developed a new DIY text analytics tool.

- For Part Two, I’ve Q&A’d a Fortune 500 corporate research functionary with a unique inward focus, a firm grip on text analytics and a smart take on it.

- Part Three features a very special guest representing an internal research department within a more forward-thinking vertical of a classic company, one with an outward, traditional consumer perspective.


But I’m getting ahead of myself. For Part One, let’s turn loose Tom H. C. Anderson, Founder and Managing Partner of Anderson Analytics OdinText.

Anderson’s award-winning firm was notably among the first in MR, circa 2005, to provide text analytics. Over the past few years, drawing on extensive hands-on experience and continuous client feedback, the firm has developed text analytics solutions for data ranging from large-scale survey and call center comments, for clients such as Starwood Hotels and Kodak, to social media data, for firms like Unilever and LinkedIn.

Q. Please tell us briefly about your current role and your company.

Anderson Analytics helps clients in various industries leverage their structured and unstructured [text] data. My role and my company’s role have been evolving from a more full-service approach toward helping clients take a hands-on approach to unstructured data analytics. It’s probably a 50/50 mix now, but we’ve tried to make our software, OdinText, as intuitive as possible so that our clients can feel comfortable doing most of their own analysis.

Q. Define “text analytics.” Is it all the same?

There are certainly alternatives, ranging from the more linguistic approach, which has fallen a bit out of favor, to statistical and machine learning methods, which seem to be proving more effective.

We’ve also done quite a bit of work related to measuring emotion in text, a technique pioneered in the field of psychology.

While you don’t need to be wed to any one tool or approach, we’ve found that for most clients accurately understanding what is being discussed (verbatim concepts) is more important than sentiment or emotion. But it really depends on both the business objectives and the data source.

Q. There are different use cases for text analytics and, more recently, text analytics firms are turning their attention to MR. How has market research, specifically as a use case, evolved in importance for the text analytics industry in general and for your company specifically?

Market research has always been our primary use case, though customer service is obviously very closely linked. In my opinion, MR has reacted more slowly than expected, but the industry is coming around.

Our use case is very different from, say, public relations, which uses [text analytics] mainly to monitor comments broadly on social media and then to engage with specific influencers.

Market researchers need deeper insights, and we also have a lot of valuable data in our organizations from survey open ends—especially trackers—to call center logs, etc. These are very rich insight sources with obvious value.

Admittedly, I think social media data has been hyped, but my job is not to sell one data source as more important than another; it’s to help clients get an accurate read on unmet or under-met needs and to help them move up the text analytics value chain.

Q: What are some of the most important use cases?

Within marketing research, relatively speaking, I think there is a bit too much attention focused on social media monitoring. This is just one single source of text data. Most firms have a wealth of rich unstructured data within their organization already that they need to understand—larger survey data studies, CRM feedback etc. There’s also some confusion surrounding the appropriateness of text analytics for qualitative research.

While text analytics can certainly help with smaller samples, the ROI can be difficult to justify, depending on the circumstances. That said, some of our clients who’ve been using our software on larger data sets have asked us if they can use it on much smaller studies as well. So our clients are actually changing my thinking in this area, and I’m now a bit less concerned with how much data you have.

If you’re comfortable using a specific tool, and it’s designed well, then leveraging it on smaller data sets doesn’t require as much of a time investment, and in such cases, the text volume threshold is much lower.

I would say, though, that you still probably want to have at least a few hundred comments and/or multiple sources of smaller samples before text analytics makes sense. While text analytics can technically offer value to even a single focus group, the ROI here is less promising; you should be able to read and synthesize all the responses of a single focus group yourself.

Q: What are the benefits of an enterprise approach to text analytics versus specific use-case approaches?

There’s even more confusion around this than about what text analytics is and isn’t in general.

Enterprise, as in “Enterprise Content Management” (ECM), is one of the many buzzwords that intersect the market research, text analytics and business intelligence fields. All ECM really refers to is a formalized way of storing data or documents, usually with a simple search function built in.

Somehow “Enterprise” has taken on a level of importance that lacks meaning. Even survey companies have started calling themselves Enterprise Feedback Management (EFM) firms now, I suppose mainly to differentiate themselves from the popular tools out there (SurveyMonkey, etc.) that do pretty much the same thing for free. The idea seems to be: if we can’t beat them, let’s change what we call ourselves…

Anyway, getting back to your question and how it relates to text analytics… Some text analytics firms have taken the ECM approach, probably because they came from the BI space before they got into text analytics.

“Enterprise” by definition has to be simplistic. So if you’re looking for a very simplistic search type of solution across your enterprise, then ECM may be an option. There is quite a bit of debate on how useful it is to look at customers and data holistically across organizations and departments.

The approach we’ve taken is to develop software with specific departmental use cases in mind, delivered as SaaS (software as a service). This means clients don’t need to invest in their own IT hardware or support.

Of course, ECM and more specific SaaS applications are not mutually exclusive. Many clients have integrated survey data, CRM data and social media into our tool. We’re also looking into how we might fit our tool into a client’s ECM from another vendor. In addition, for clients who want to run software on their own servers and do their own integration and upgrades, licensing the more specific SaaS software is usually an option.

Q: How does Big Data fit into text analytics, and do you think market researchers have the skills and tools needed to leverage what’s available?

Depends on how you define Big Data. Generally, I would say no. Even the larger traditional market research houses have few, if any, staff with the experience or tools necessary to handle Big Data. And the usual MR industry statistical packages typically crash with larger data sets, so sampling becomes necessary. This is one of the other reasons we developed OdinText: The datasets we were working with started getting too big for some of the tools we had been using!

Big Data becomes more important further down the text analytics value chain when predictive analytics and modeling are used. I’d love to see more market researchers get past just monitoring and do more of this really interesting work.

Q: What criteria should an organization use to determine whether to (a) develop an in-house text analytics capability, (b) outsource text analytics or (c) adopt a hybrid model?

I think initially a hybrid model may be ideal. Select a vendor that has experience with text analytics in your specific use case. Ideally, the vendor should be able to train you in best practices and in the use of their tool, but also be able to take a more full-service approach if an important, out-of-the-ordinary analysis need comes up or your staff is spread too thin.

Q: What questions need to be asked in order to identify the right capabilities provider when one is required?

You need to ask, “So what?” Don’t fall for a bunch of techno-jargon you don’t understand. If the provider can’t speak specifically about how text analytics can help your department become more valuable, and contribute concretely to your decision making and process improvements, then you should be talking to someone else. Simple as that.


Editor's note: Next up, IBM's take on text analytics for internal understanding!

To learn more about text analytics, don't miss The Market Research Technology Event, a unique forum dedicated to the exploration and promotion of technological innovations in consumer and market research and business intelligence, taking place April 30 through May 2 in Las Vegas. As a reader of this blog, mention code MRTECH12BLOG when you register and save 10% off the standard rate!

ABOUT THE AUTHOR/INTERVIEWER
Marc Dresner is an IIR USA communication lead specializing in audience engagement. He is the former executive editor of Research Business Report, a confidential newsletter for the market research industry. He may be reached at mdresner@iirusa.com. Follow him @mdrezz.

3 comments:

Charles_S_Patridge said...

This article is quite good, especially when trying to select an outside vendor to help you with your specific test case. If they do not understand what you are looking for, or you do not understand what they are offering, then I agree: you are talking to the wrong vendor.

TM / TA (text mining / text analytics) is a VERY MANUALLY INTENSIVE venture and should not be taken lightly.

It took us 3 man-years to build the text dictionary / ontology / taxonomy specific to our needs for Property & Casualty automobile claims.

The resulting table contained over 200,000 text terms (n-grams), which we used to assign up to 400 different flags that helped us determine a number of key issues for claim executives: missed subrogation, exploding claims, following best practices, new types of claims that could become class action cases, streamlining annual claim reviews, the ability to classify claims in a variety of ways that structured data could not provide, etc.
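To make the dictionary-driven approach concrete, here is a minimal sketch of n-gram flagging, written in Python rather than the Base SAS the commenter used. The terms and flag names (`SUBROGATION_CANDIDATE`, `LITIGATION_RISK`, `EXPLODING_CLAIM`) and the helper functions are hypothetical, invented for illustration; a real dictionary of the size described would be curated with a subject matter expert.

```python
import re
from collections import defaultdict

# Hypothetical term dictionary: each lowercase n-gram maps to one or more flags.
# These few entries are illustrative only, not the commenter's actual dictionary.
TERM_FLAGS = {
    "rear ended": {"SUBROGATION_CANDIDATE"},
    "other driver at fault": {"SUBROGATION_CANDIDATE"},
    "attorney": {"LITIGATION_RISK"},
    "class action": {"LITIGATION_RISK"},
    "second surgery": {"EXPLODING_CLAIM"},
}

# Longest n-gram length we need to generate (here 4, from "other driver at fault").
MAX_N = max(len(term.split()) for term in TERM_FLAGS)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], max_n: int):
    """Yield every contiguous n-gram (space-joined) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def flag_note(text: str) -> dict[str, set[str]]:
    """Return each triggered flag mapped to the dictionary terms that triggered it."""
    hits: dict[str, set[str]] = defaultdict(set)
    for gram in ngrams(tokenize(text), MAX_N):
        for flag in TERM_FLAGS.get(gram, ()):
            hits[flag].add(gram)
    return dict(hits)


if __name__ == "__main__":
    note = "Insured was rear ended at a light; claimant has retained an attorney."
    print(flag_note(note))
```

For the sample note, `flag_note` reports `SUBROGATION_CANDIDATE` (from "rear ended") and `LITIGATION_RISK` (from "attorney"). Exact phrase matching is the simplest possible rule: it ignores negation, so "was not rear ended" would still trigger the flag, which is one reason building such a dictionary takes the manual effort and SME involvement described here.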

Because many off-the-shelf products are VERY expensive, you may want to consider DIY with whatever software you feel may be useful, at least to get you started. In doing so, you will learn a GREAT DEAL about your data, concepts you may not have known about and issues you never thought of, leaving you in a better frame of mind to select your third-party vendor, should you decide one is necessary.

We happened to use Base SAS, in which I wrote many of our own TM routines to do what we needed. It was far cheaper and more flexible than buying the SAS Text Miner product.

It allowed us to explore a wider variety of concerns, methods and concepts, under our own control, than SAS TM offered.

My $.02,

Charles Patridge

Doing TM since the 1980s, even though we did not call it TM back then; for me it was concept discovery.

AND MOST IMPORTANT, it is CRITICAL you have an SME (Subject Matter Expert) assigned to your test case for TM while developing / implementing your TM solution.

Tom H. C. Anderson said...

Thank you Charles, and glad we are in agreement.

It’s true that there are various levels of insights that can be gained via text analytics. I think that concept exploration is an important yet relatively easy initial benefit to the software. Tracking/Monitoring is perhaps a second level benefit if you are looking at streams of data which most of us who are using text analytics have.

I think the final and arguably the most exciting step, at least in my opinion, may be the host of analytical techniques that can be brought to bear on the raw data. Some of this can be done within the specific text analytics tools, others can and should be handled in outside tools such as the ones you mention.
