The debate over the accuracy–and quality–of survey research conducted online is flaring at the moment, at least partly in response to a paper by Yeager, Krosnick, Chang, Javitz, Levendusky, Simpson and Wang: “Comparing the accuracy of RDD telephone surveys and Internet surveys conducted with probability and non-probability samples.”  Gary Langer, director of polling at ABC News, wrote about the paper in his blog “The Numbers” on September 1. In a nutshell, the paper compares survey results obtained via random-digit dialing (RDD) with those from an Internet panel where panelists were recruited originally by means of RDD and from a number of “opt-in” Internet panels where panelists were “sourced” in a variety of ways.  The results produced by the probability sampling methods are, according to the authors, more accurate than those obtained from the non-probability Internet samples.  You can find a response from Doug Rivers, CEO of YouGov/Polimetrix (and Professor of Political Science at Stanford) at “The Numbers,” as well as some other comments.

The analysis presented in the paper is based on surveys conducted in 2004/5.  In recent years the coverage of the RDD sampling frame has deteriorated as the number of cellphone-only users has increased (to 20% currently).  In response to concerns of several major advertisers about the quality of online panel data, the Advertising Research Foundation (ARF) established an Online Research Quality Council and just this past year conducted new research comparing online panels with RDD telephone samples.  Joel Rubinson, Chief Research Officer of The ARF, has summarized some of the key findings in a blog post. According to Rubinson, this study reveals no clear pattern of greater accuracy for the RDD sample.  There are, of course, differences in the two studies, both in purpose and method, but it seems that we can no longer assume that RDD samples represent the best benchmark against which to compare all other samples.

Have you heard about the Facebook Gross National Happiness Index?  On Monday, October 12, the Times ran an article (by Noam Cohen) reporting some of the findings based on analysis of two years’ worth of Facebook status updates from 100 million users in the U.S.  The index was created by Adam D. I. Kramer, a doctoral candidate in social psychology at the University of Oregon, and is based on counts of positive and negative words in status updates.  According to the article, classification of words as positive or negative is based on the Linguistic Inquiry and Word Count dictionary.
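To make the idea of an index built from counts of positive and negative words concrete, here is a minimal sketch in Python.  It is not Mr. Kramer’s actual formula: the two tiny word lists are hypothetical stand-ins for the Linguistic Inquiry and Word Count dictionary, and the score (share of positive words minus share of negative words) is an assumption made purely for illustration.

```python
import re

# Hypothetical mini word lists; the real index reportedly relied on the
# Linguistic Inquiry and Word Count dictionary, which is not reproduced here.
POSITIVE = {"happy", "great", "love", "awesome"}
NEGATIVE = {"sad", "tragic", "hate", "awful"}


def word_counts(updates):
    """Return (positive, negative, total) word counts across status updates."""
    pos = neg = total = 0
    for text in updates:
        for word in re.findall(r"[a-z']+", text.lower()):
            total += 1
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    return pos, neg, total


def happiness_score(updates):
    """Share of positive words minus share of negative words (0.0 if empty)."""
    pos, neg, total = word_counts(updates)
    return (pos - neg) / total if total else 0.0


# Made-up status updates for two days of the week.
friday = ["Love this great weather", "So happy it is Friday"]
monday = ["Awful commute today", "Sad the weekend is over"]

print(happiness_score(friday), happiness_score(monday))
```

With these made-up updates the Friday score comes out higher than the Monday score, but only because of the words that happen to be on the lists, which is exactly the limitation of simple frequency analysis.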

Among the researchers’ conclusions:  we’re happier on Fridays than on Mondays; holidays also make Americans happy.  The premature death of a celebrity may make us sad.  According to a post by Mr. Kramer on the Facebook blog, the two “saddest” days–days with the highest numbers of negative words–were the days on which actor Heath Ledger and pop icon Michael Jackson died.  Mr. Kramer points out that, coincidentally, Mr. Ledger died on the day of the Asian stock market crash, which might have contributed to the degree of negativity.

We’re going to see a lot more of this kind of thing as researchers delve into the rich trove of information generated by users of search engines and web-enabled social networking.  The happiness index, based as it is on simple frequency analysis of words, is the tip of the iceberg.  At the moment, “social media”–I’m not exactly sure what that label means–is getting incredible attention in the marketing and marketing research community.  The question that has yet to be posed, let alone answered, is, “what exactly do we learn from all this information?”

In the October 4, 2009 edition of The NY Times “Sunday Business” section (“It’s Brand New, but Make It Sound Familiar”), Mary Tripsas, an associate professor at the Harvard Business School, writes about the challenge of finding the right consumer reference points for innovations.  In a nutshell, consumers have a hard time figuring out an innovation unless they can compare it to something that is more familiar.  One example offered in the column, from Arthur Markman, a professor of psychology at the University of Texas at Austin, is the less-than-blockbuster introduction of the Segway motorized personal transport device.  In a similar vein, Dan Ariely (Predictably Irrational) argues that comparison is a fundamental process in consumer decision making.

Estimating demand for really new innovations may just be the most difficult endeavor in market research.  A decade ago Robert Veryzer, Jr. identified six factors that make it difficult for consumers to react to innovation (“Key Factors Affecting Customer Evaluation of Discontinuous New Products,” Journal of Product Innovation Management, 1998, 15, 136-150).  The first factor listed is “lack of familiarity with the product, with the way in which the product is used, or with the underlying technology.”  And one way consumers try to understand a discontinuous product is by comparison with things they already know about.

By and large, I think marketers and market researchers underestimate the fundamental role of comparison and contrast in the way we make judgments about products.  As Professor Tripsas makes clear, humans (consumers included) rely on categorization to understand the world.  Looking at a new, discontinuous product, we’re likely to ask, is it this or that?

The Psychology of Survey Response by Roger Tourangeau, Lance J. Rips, and Kenneth Rasinski (Cambridge University Press, 2000) will change the way you think about the “craft” of survey design.  While there are other, well-regarded books on survey question construction (such as Asking Questions by Norman Bradburn, Seymour Sudman, and Brian Wansink, Jossey-Bass, 2004) and tons of individual research papers and articles on various aspects of survey design, measurement scales, question construction and the like, this is the first book I’ve encountered that presents a practical conceptual framework for understanding the cognitive processes that produce a response to a given question.  Moreover, the authors review a lot of relevant research to support their framework.

I got a telemarketing call from my local telephone service provider today.  Actually, the call came from a telemarketing company, on behalf of my telecom provider.

The young woman who called wanted to offer me some packages that might allow me to save money (the typical bundled service offering with local, long distance, broadband internet access and, in this case, satellite TV).

You would expect that someone calling on behalf of your current provider would know what services you already subscribed to, but no–I had to answer a series of questions to let the telemarketer know that I already had the bundled long distance and the “max” DSL service.  So this was mildly annoying, but when you consider that I had upgraded to these services only a few months ago, the absence of “customer knowledge” reflected in this encounter is a good indication that my local telco doesn’t know how to treat me like a valued customer.

There’s just enough inertia to keep me as a customer–my email address is linked to the DSL service, for example.

The amazing thing is that technology and CRM software make it dead simple to put the relevant customer info at the fingertips of a telemarketing rep.  So, when it’s not there, it’s all the more glaring.

Copyright 2009 by David G. Bakken.  All rights reserved.

The news last week that Kohlberg Kravis Roberts & Co. was giving Eastman Kodak a cash infusion of $400 million prompted me to reflect on the changing fortunes of the yellow box over the last fifteen or so years.  For most of my lifetime, Kodak defined consumer photography.  Sure, there were some threats from Fuji and point-and-shoot 35 mm camera makers, but for the average snapshooter, or the parents of newborns, Kodak pretty much owned the market.  And they understood the need to make photography simple.

They had a great business–revenues and profits were tied to the number of images that people captured on film.  The more pictures people took, the more film they used, and the more paper and chemicals processors consumed to turn those images into prints.  Kodak perfected the model over the years, as processing services became almost instantaneous, and consumers could get two prints of every image for almost no incremental cost (but doubling the volume of Kodak paper that was sold–a big advantage in an economies-of-scale business).

Kodak’s consumer marketing strategy was built around the belief that the primary job that consumers were hiring cameras, film, and processing for was documentation of life’s special (“Kodak”) moments.  Consumer research, for example, showed dramatic increases in picture taking following the birth of a first child (with lower peaks for each subsequent child!).

Kodak’s fortunes in consumer photography were tied to this belief, and their initial strategy in response to digital photography included a continued emphasis on simplifying the process (cameras with printer docks, for example) and capturing a big chunk of the consumables volume, primarily photo paper.  But something strange happened with digital photography.  The relationship between the number of images captured and the number of prints produced pretty much evaporated.  This was a clue that maybe there was a different job that digital photography was doing for consumers.  And it looks like that job is sharing experiences.  Are cellphone cameras popular because they eliminate one device or because they allow people to share images and experiences almost immediately?

The documentation job is no doubt still important, but digital photography has revealed that there was at least one other important job.  Because digital does that job better than film photography (no need to print images to share them, for example) and does the documentation job for consumers at least as well as film, the value network has shifted dramatically.

This is a common dynamic with disruptive innovation.  The success of the incumbent technology obscures the fact that it may not do all the jobs it is hired for equally well, or not as cheaply as it could be done.  AT&T, for example, when confronted with competition from the likes of MCI and Sprint, assumed that the “job” they were doing for consumers was providing access to any telephone in the world.  It turns out that the job a significant number of consumers wanted to perform was connecting to a very small number of specific telephone numbers, and MCI and Sprint enabled that at a much lower price.  The rest of that story is telecommunications history.

My friend and mentor Bob Klein introduced me to the marketing “retrospectograph” some years ago.  As I recall, we had been reviewing the analysis of data from a consumer study that was part of a simulated test market for a new product.  Something about a particular result was counterintuitive, and I was developing–out loud–a plausible after-the-fact cause for this finding when Bob said something like, “Well, if you turn on the marketing retrospectograph, you can explain anything.”

This was an important lesson and reminder of the potential fallacy of post hoc explanation (from post hoc ergo propter hoc–“after this, therefore because of this”).  All too often in the course of creating customer knowledge, I’ve seen marketers fashion a theory of the customer from a few facts–the results of a single survey or a few focus groups, for example–and that theory of the customer becomes the basis for action without any further testing.

Nassim Nicholas Taleb explores this phenomenon in his book, The Black Swan, using the term “retrospective determinism.”  Taleb describes this mechanism–working backwards to a satisficing causal explanation–in his discussion of “silent evidence,” by which he means all the occurrences of something that are not observed.  Who is to say that the next set of survey results will provide the same facts on which our now codified theory of the customer is based?  His main point is that failing to take into account the possibility of different outcomes or input and outcome pairings not yet observed makes things seem much more deterministic (as opposed to random) than they really are.

Once a theory of the customer emerges, confirmation bias kicks in.  Our natural tendency is to give more attention and weight to events or observations that confirm our causal models than to those observations that are disconfirming.

The smaller the set of observations on which the theory is based, relative to the total number of possible observations, the more likely Taleb’s “silent evidence” is hiding disconfirming, if inconvenient, facts.  Of course, in the hypothetical situation where you have a perfectly identified population of these observations from which you can draw a large enough random sample, it’s possible to make inferences about the relationship between inputs and outcomes without worrying too much about “silent evidence” since we can make explicit statements about the likelihood of unobserved events.  However, this favorable circumstance rarely, if ever, occurs in the world of marketing research and customer knowledge creation.
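As a rough illustration of what an “explicit statement about the likelihood of unobserved events” can look like in that idealized case, the sketch below asks: if a simple random sample of n customers turns up zero instances of some behavior, how common could that behavior still be?  The function name is made up for this example, and the calculation assumes independent random draws from a well-defined population, which is precisely the favorable circumstance described above.

```python
def upper_bound_unseen(n, confidence=0.95):
    """Largest event rate p still consistent, at the given confidence level,
    with observing zero events in n independent random draws.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


# Hypothetical sample sizes, for illustration only.
for n in (100, 1000):
    print(n, round(upper_bound_unseen(n), 4))
```

For 100 random draws with no occurrences, the bound is a bit under 3%; the point is that the statement depends entirely on the sample being random, so it does not carry over to a handful of focus groups.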

There’s a lot of evidence coming from cognitive science indicating that we are more or less hard-wired to make erroneous causal inferences.  As one manifestation of this tendency, most fictional crime-solving, from Sherlock Holmes to CSI, relies on Taleb’s mechanism of retrospective determinism, coupled with a strong dose of confirmation bias.  Of course, dramatic tension in these stories often comes from the discovery of disconfirming silent evidence that is too compelling to ignore.

The scientific method is supposed to protect us from these biases.  The main problem in marketing is that we do not–or cannot–take the time to execute the sort of programmatic research that treats each plausible explanation or theory of the customer as one of many possible theories that we must continually try to disconfirm.  Unfortunately, it’s all too easy to turn on the marketing “retrospectograph” and convince ourselves that we understand our customers.

In my last post on predictive modeling (4 August 2009) I used the recent announcement that the Netflix Prize appears to have been won to make two points.  First, predictive modeling based on huge amounts of consumer/customer data is becoming more important and more prevalent throughout business (and other aspects of life as well).  Second, the power of predictive modeling to deliver improved results may seduce us into believing that just because we can predict something, we understand it.

Perhaps because it fuses popular culture with predictive modeling, Cinematch (Netflix’s recommendation engine) seemed like a good example to use in making these points.  For one thing, if predicting movie viewers’ preferences were easy, the motion picture industry would probably have figured out how to do it at some early stage in the production process–not that they haven’t tried.  A recent approach uses neural network modeling to predict box office success from the characteristics of the screenplay (you can read Malcolm Gladwell’s article in The New Yorker titled “The Formula” for a narrative account of this effort).  The market is segmented by product differentiation (e.g., genres) as well as preferences.  At the same time, moviegoers’ preferences are somewhat fluid, and there is a lot of “cross-over” with fans of foreign and independent films also flocking to the most Hollywood of blockbuster films.

This brings to mind a paradox of predictive modeling (PM).  PM can work pretty well in the aggregate (perhaps allowing Netflix to do a good job of estimating demand for different titles in the backlist) but not so well when it comes to predicting a given individual’s preferences.  I tend to be reminded of this every time I look at the list of movies that Cinematch predicts that I’ll love.  For each recommended film, there’s a list of one or more other films that form the basis for the recommendation.  I’m struck by the often wide disparities between the recommended film and the films that led to the recommendation.  One example:  Cinematch recommended “Little Miss Sunshine” (my predicted rating is 4.9, compared to an average of 3.9) because I also liked “There Will Be Blood,” “Fargo,” and “Syriana.”  It would be hard to find three films more different from “Little Miss Sunshine.”  “Mostly Martha” is another example.  This is a German film in the “foreign romance” genre that was remade as “No Reservations” in the U.S. with Catherine Zeta-Jones.  Cinematch based its recommendation on the fact that I liked “The Station Agent.”  These two films have almost no objective elements in common.  They are in different languages, set in different countries, with very different story lines, cast and so forth.  But they share many subjective elements (great acting, characters you care about, and humor, among others) and it’s easy to imagine that someone who likes one of these will enjoy the other.  On the other hand, Cinematch made a lot of strange recommendations (such as “Amelie,” a French romantic comedy) based on the fact that I enjoyed “Gandhi,” the Oscar-winning 1982 biopic that starred Ben Kingsley.

In my post titled “I can’t tell you what ‘insight’ looks like, but I’ll know it when I see it” (28 May 2009) I mentioned a study conducted by Jonah Berger and Gael Le Mens on the rise and fall of popularity in given names. The latest issue of Knowledge@Wharton describes this research and provides a link to the article, “How adoption speed affects the abandonment of cultural tastes.” For your convenience, clicking on this title here will take you to the article.  A key finding from this study is that the faster a name rises in popularity, the more quickly it falls out of favor.

Steve Lohr reported in The New York Times on July 28 that two teams appear to have tied for the $1 million prize offered by Netflix to anyone who could improve its movie recommendation system (target: a 10% reduction in a measure of prediction error).  This is certainly a triumph for the field of predictive modeling, and, perhaps, for “crowdsourcing” (at least when accompanied by a big monetary carrot) as an effective method for finding innovative solutions to difficult problems.

Predictive modeling has been used to target customers and to determine their credit worthiness for at least a couple of decades, but it’s been receiving a lot more attention lately, in part thanks to books like Super Crunchers (by Ian Ayres, Bantam, 2007) and Competing on Analytics (by Thomas H. Davenport and Jeanne G. Harris, Harvard Business School Press, 2007). The basic idea behind predictive modeling, as most of you will know, is that variation in some as yet unobserved outcome variable (such as whether a consumer will respond to a direct mail offer, spend a certain amount on a purchase, or give a movie a rating of four out of five stars) can be predicted based on knowledge of the relationship between one or more variables that we can observe in advance and the outcome of interest.  And we learn about such relationships by looking at cases where we can observe both the outcome and the “predictors.”  The workhorse method for uncovering such relationships is regression analysis.
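For readers who prefer to see the basic idea in code, here is a minimal sketch of regression with a single predictor, written in plain Python.  The data are made up, and the choice of predictor (a customer’s average rating of past comedies) is a hypothetical example rather than anything Netflix is known to use.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predict the outcome for a new case from its observed predictor value."""
    intercept, slope = model
    return intercept + slope * x


# Made-up cases where both the predictor and the outcome were observed:
# x = a customer's average rating of past comedies,
# y = that customer's rating of a new comedy.
past_avg = [2.0, 3.0, 4.0, 5.0]
new_rating = [2.5, 3.0, 4.5, 5.0]

model = fit_line(past_avg, new_rating)

# Predict the not-yet-observed rating for a customer whose past average is 3.5.
print(predict(model, 3.5))
```

The fitted line summarizes the relationship in the cases we have seen; the prediction for a new customer is only as good as the assumption that the same relationship holds for cases we have not seen.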

In many respects, the Netflix Prize is a textbook example of the development of a predictive model for business applications.  In the first place, prediction accuracy is important for Netflix, which operates a long tail business, making a lot of money from its “backlist” of movie titles.  Recommendation engines like Cinematch and those used by Amazon and other online retailers make the long tail possible to the extent that they bring backlist titles to the attention of buyers who otherwise would not discover them. Second, Netflix has a lot of data consisting of ratings of movies by its many customers that can be used as fodder in developing the model.  All entrants had access to a dataset consisting of more than 100 million ratings from over 480,000 randomly chosen Netflix customers (that’s roughly 200 ratings per customer).  In all, these customers rated about 18,000 different titles (for about 5,500 ratings per title).  That is a lot of data for developing a predictive model by almost any standard.  And, following the textbook approach, Netflix provided a second dataset to be used for testing the model, because the goal of the modeling is to predict cases not yet encountered, and the judging was based on how accurately a model predicted the ratings in this dataset (and those ratings were not provided to the contestants).
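The judging logic can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the contest’s scoring code: root mean squared error (RMSE) stands in for the “measure of prediction error,” and the held-out ratings and both sets of predictions are invented.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in error relative to the baseline model."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Invented held-out ratings that the modeler never sees during fitting.
held_out_actual = [4, 3, 5, 2, 4]

# Hypothetical predictions: a naive baseline and a candidate model.
baseline_pred = [3.0, 3.0, 3.0, 3.0, 3.0]
candidate_pred = [3.5, 3.0, 4.5, 2.5, 4.0]

base = rmse(baseline_pred, held_out_actual)
cand = rmse(candidate_pred, held_out_actual)

# The prize target was a 10% reduction in prediction error.
print(improvement(base, cand) >= 0.10)
```

Scoring on ratings the contestants never saw is what keeps a model from winning simply by memorizing the data used to build it.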

There were a couple of unusual challenges in this competition.  First, despite the sheer quantity of data, it is potentially “sparse” in terms of the number of individuals who rated exactly the same sets of movies.  A second challenge came in the form of what Clive Thompson, in an article in the Sunday Times Magazine (“If You Liked This, You’re Sure to Love That,” November 23, 2008), called the “Napoleon Dynamite” problem.  In a nutshell, it’s really hard to predict how much someone will like “Napoleon Dynamite” based on how much they like other films.  Other problem films identified by Len Bertoni, one of the contestants Thompson interviewed for the article, include “Lost in Translation” (which I liked) and “The Life Aquatic with Steve Zissou” (which I hated, even though both films star Bill Murray).
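One rough way to see the sparseness is to work through the arithmetic on the rounded figures quoted for the contest dataset.  The sketch below measures how much of the full customer-by-title grid is actually filled in, which is a related but simpler notion than the overlap between individual raters; the variable names are mine.

```python
# Rounded figures quoted for the contest dataset.
ratings = 100_000_000
customers = 480_000
titles = 18_000

ratings_per_customer = ratings / customers  # a little over 200
ratings_per_title = ratings / titles        # a little over 5,500

# Every customer rating every title would fill this many cells.
possible_pairs = customers * titles

# Share of the customer-by-title grid that actually holds a rating.
density = ratings / possible_pairs

print(round(ratings_per_customer), round(ratings_per_title))
print(f"{density:.1%} of possible customer-title pairs are rated")
```

On these numbers only a little over 1% of the possible customer-title pairs carry a rating, so any two customers are likely to have few rated films in common.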

I’m eager to see the full solutions that the winning teams employed.  After reading about the “Napoleon Dynamite” problem, I began to think that a hierarchical Bayesian solution might work by capturing some of the unique variability in these problem films, but there are likely other machine learning approaches that would work.

It’s possible that the achievements of these two teams will translate to real advances for predictive modeling based on the kinds of behavioral and attitudinal data that companies can gather from or about their customers. If that’s the case, then we’ll probably see companies turning to ever more sophisticated predictive models.  But better predictive models do not necessarily improve our understanding of the drivers of customer behavior.  What’s missing in many data-driven predictive modeling systems like Cinematch is a theory of movie preferences.  This is one reason why the algorithms came up short in predicting the ratings for films like “Napoleon Dynamite”–the data do not contain all the information needed to explain or understand movie preferences.  If you looked across my ratings for a set of films similar to “The Life Aquatic” in important respects (cast, director, quirkiness factor) you would predict that I’d give this movie a four or a five.  Same thing for “The Duchess,” which I sent back to Netflix without even watching the entire movie.

These minor inaccuracies may not matter much to Netflix, which should seek to optimize across as many customers and titles as possible.  Still, if I follow the recommendations of Cinematch and I’m disappointed too often, I may just discontinue Netflix altogether. (NOTE:  Netflix incorporates some additional information into their Cinematch algorithm, but for purposes of the contest, they restricted the data available to the contestants.)

In my view, predictive models can be powerful business tools, but they have the potential to lead us into a false belief that because we can predict something on the basis of mathematical relationships, we understand what we’re predicting.  We might also lapse into an expectation that “prediction” based on past behavior is in fact destiny.  We need to remind ourselves that correlation or association is a necessary but not a sufficient condition to show a causal relationship.
