August 2009


My friend and mentor Bob Klein introduced me to the marketing “retrospectograph” some years ago.  As I recall, we had been reviewing the analysis of data from a consumer study that was part of a simulated test market for a new product.  Something about a particular result was counterintuitive, and I was developing–out loud–a plausible after-the-fact cause for this finding when Bob said something like, “Well, if you turn on the marketing retrospectograph, you can explain anything.”

This was an important lesson and reminder of the potential fallacy of post hoc explanation (from post hoc ergo propter hoc–“after this, therefore because of this”).  All too often in the course of creating customer knowledge, I’ve seen marketers fashion a theory of the customer from a few facts–the results of a single survey or a few focus groups, for example–and that theory of the customer becomes the basis for action without any further testing.

Nassim Nicholas Taleb explores this phenomenon in his book The Black Swan, using the term “retrospective determinism.”  Taleb describes this mechanism–working backwards to a satisficing causal explanation–in his discussion of “silent evidence,” by which he means all the occurrences of something that are not observed.  His main point is that failing to take into account the possibility of different outcomes, or input-and-outcome pairings not yet observed, makes things seem much more deterministic (as opposed to random) than they really are.  Who is to say that the next set of survey results will provide the same facts on which our now-codified theory of the customer is based?

Once a theory of the customer emerges, confirmation bias kicks in.  Our natural tendency is to give more attention and weight to events or observations that confirm our causal models than to those observations that are disconfirming.

The smaller the set of observations on which the theory is based, relative to the total number of possible observations, the more likely Taleb’s “silent evidence” is hiding disconfirming, if inconvenient, facts.  Of course, in the hypothetical situation where you have a perfectly identified population of these observations from which you can draw a large enough random sample, it’s possible to make inferences about the relationship between inputs and outcomes without worrying too much about “silent evidence” since we can make explicit statements about the likelihood of unobserved events.  However, this favorable circumstance rarely, if ever, occurs in the world of marketing research and customer knowledge creation.
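As a back-of-the-envelope illustration of that last point, the classical “rule of three” is one way to put a number on events you never observed: if an outcome shows up zero times in a random sample of n, an approximate 95% upper bound on its true rate is 3/n.  The sketch below uses a purely hypothetical sample size just to show the arithmetic.

```python
# A small worked example of making an explicit statement about an unobserved
# event: the classical "rule of three." If an outcome never appears in a random
# sample of n, an approximate 95% upper bound on its true rate is 3/n.
# The sample size here is a hypothetical illustration.
n = 400                          # randomly sampled customers
rule_of_three = 3 / n            # approximate 95% upper bound
exact = 1 - 0.05 ** (1 / n)      # exact bound, from solving (1 - p)**n = 0.05
print(f"outcome never observed, yet could occur at a rate up to ~{rule_of_three:.2%}")
print(f"(exact bound: {exact:.2%})")
```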

There’s a lot of evidence from cognitive science indicating that we are more or less hard-wired to make erroneous causal inferences.  As one manifestation of this tendency, most fictional crime-solving, from Sherlock Holmes to CSI, relies on Taleb’s mechanism of retrospective determinism, coupled with a strong dose of confirmation bias.  Of course, dramatic tension in these stories often comes from the discovery of disconfirming silent evidence that is too compelling to ignore.

The scientific method is supposed to protect us from these biases.  The main problem in marketing is that we do not–or cannot–take the time to execute the sort of programmatic research that treats each plausible explanation or theory of the customer as one of many possible theories that we must continually try to disconfirm.  Unfortunately, it’s all too easy to turn on the marketing “retrospectograph” and convince ourselves that we understand our customers.

Copyright 2009 by David G. Bakken.  All rights reserved.

In my last post on predictive modeling (4 August 2009) I used the recent announcement that the Netflix Prize appears to have been won to make two points.  First, predictive modeling based on huge amounts of consumer/customer data is becoming more important and more prevalent throughout business (and other aspects of life as well).  Second, the power of predictive modeling to deliver improved results may seduce us into believing that just because we can predict something, we understand it.

Perhaps because it fuses popular culture with predictive modeling, Cinematch (Netflix’s recommendation engine) seemed like a good example to use in making these points.  For one thing, if predicting movie viewers’ preferences were easy, the motion picture industry would probably have figured out how to do it at some early stage in the production process–not that they haven’t tried.  A recent approach uses neural network modeling to predict box office success from the characteristics of the screenplay (you can read Malcolm Gladwell’s article in The New Yorker titled “The Formula” for a narrative account of this effort).  The movie market is segmented by product differentiation (e.g., genres) as well as by viewers’ preferences.  At the same time, moviegoers’ preferences are somewhat fluid, and there is a lot of “cross-over,” with fans of foreign and independent films also flocking to the most Hollywood of blockbuster films.

This brings to mind a paradox of predictive modeling (PM).  PM can work pretty well in the aggregate (perhaps allowing Netflix to do a good job of estimating demand for different titles in the backlist) but not so well when it comes to predicting a given individual’s preferences.  I’m reminded of this every time I look at the list of movies that Cinematch predicts I’ll love.  For each recommended film, there’s a list of one or more other films that form the basis for the recommendation.  I’m struck by the often wide disparities between the recommended film and the films that led to the recommendation.  One example: Cinematch recommended “Little Miss Sunshine” (my predicted rating is 4.9, compared to an average of 3.9) because I also liked “There Will Be Blood,” “Fargo,” and “Syriana.”  It would be hard to find three films more different from “Little Miss Sunshine.”  “Mostly Martha” is another example.  This German film in the “foreign romance” genre was remade in the U.S. as “No Reservations” with Catherine Zeta-Jones.  Cinematch based its recommendation on the fact that I liked “The Station Agent.”  These two films have almost no objective elements in common: they are in different languages, set in different countries, with very different story lines, casts, and so forth.  But they share many subjective elements (great acting, characters you care about, and humor, among others), and it’s easy to imagine that someone who likes one of these will enjoy the other.  On the other hand, Cinematch has made a lot of strange recommendations (such as “Amelie,” a French romantic comedy) based on the fact that I enjoyed “Gandhi,” the Oscar-winning 1982 biopic that starred Ben Kingsley.

In my post titled “I can’t tell you what ‘insight’ looks like, but I’ll know it when I see it” (28 May 2009) I mentioned a study conducted by Jonah Berger and Gael Le Mens on the rise and fall of popularity in given names.  The latest issue of Knowledge@Wharton describes this research and includes a link to the article, “How adoption speed affects the abandonment of cultural tastes.”  For your convenience, clicking on the title here will take you to the article.  A key finding from this study is that the faster a name rises in popularity, the more quickly it falls out of favor.

Steve Lohr reported in The New York Times on July 28 that two teams appear to have tied for the $1 million prize offered by Netflix to anyone who could improve its movie recommendation system (target: a 10% reduction in prediction error, measured as root mean squared error).  This is certainly a triumph for the field of predictive modeling, and, perhaps, for “crowdsourcing” (at least when accompanied by a big monetary carrot) as an effective method for finding innovative solutions to difficult problems.

Predictive modeling has been used to target customers and to determine their credit worthiness for at least a couple of decades, but it’s been receiving a lot more attention lately, in part thanks to books like Supercrunchers (by Ian Ayres, Bantam, 2007) and Competing on Analytics (by Thomas H. Davenport and Jeanne G. Harris, Harvard Business School Press, 2007). The basic idea behind predictive modeling, as most of you will know, is that variation in some as yet unobserved outcome variable (such as whether a consumer will respond to a direct mail offer, spend a certain amount on a purchase, or give a movie a rating of four out of five stars) can be predicted based on knowledge of the relationship between one or more variables that we can observe in advance and the outcome of interest.  And we learn about such relationships by looking at cases where we can observe both the outcome and the “predictors.”  The workhorse method for uncovering such relationships is regression analysis.
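To make the mechanics concrete, here is a minimal sketch of that workflow, with ordinary least squares standing in for the regression workhorse.  The variable names and data are hypothetical illustrations of my own, not anyone’s actual scoring model.

```python
# A minimal sketch of the basic predictive-modeling workflow described above,
# using ordinary least-squares regression (the "workhorse" method).
# The variables and data are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)

# Cases where we observe both the predictors and the outcome
# (e.g., past customers: age and prior purchases -> amount spent).
n = 500
age = rng.uniform(20, 70, n)
prior_purchases = rng.poisson(3, n)
spend = 10 + 0.8 * age + 15 * prior_purchases + rng.normal(0, 25, n)

# Learn the relationship from the observed cases: add an intercept column, solve OLS.
X = np.column_stack([np.ones(n), age, prior_purchases])
coefs, *_ = np.linalg.lstsq(X, spend, rcond=None)

# Use the learned relationship to predict the as-yet-unobserved outcome
# for a new case (a prospective customer whose spending we have not seen).
new_case = np.array([1.0, 45.0, 5.0])   # intercept, age, prior purchases
predicted_spend = new_case @ coefs
print(f"predicted spend for the new case: ${predicted_spend:.2f}")
```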

In many respects, the Netflix Prize is a textbook example of the development of a predictive model for business applications.  In the first place, prediction accuracy is important for Netflix, which operates a long tail business, making a lot of money from its “backlist” of movie titles.  Recommendation engines like Cinematch and those used by Amazon and other online retailers make the long tail possible to the extent that they bring backlist titles to the attention of buyers who otherwise would not discover them.  Second, Netflix has a lot of data, consisting of ratings of movies by its many customers, that can be used as fodder in developing the model.  All entrants had access to a dataset of more than 100 million ratings from over 480,000 randomly chosen Netflix customers (roughly 200 ratings per customer).  In all, these customers rated about 18,000 different titles (about 5,500 ratings per title).  That is a lot of data for developing a predictive model by almost any standard.  And, following the textbook approach, Netflix provided a second dataset for testing the model; because the goal of modeling is to predict cases not yet encountered, the judging was based on how accurately a model predicted the ratings in this holdout dataset, which were not provided to the contestants.
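A toy version of that textbook train-and-holdout setup looks something like the sketch below.  The miniature dataset and the naive “predict the movie’s average rating” model are stand-ins of my own; the point is only the structure: fit on one set of ratings, then judge by error on ratings the model never saw.

```python
# A toy illustration of holdout evaluation: fit on one set of ratings, then
# score predictions against ratings that were withheld. The data and the
# simple "movie average" model are hypothetical stand-ins, not the contest's
# actual data or algorithms.
import numpy as np

rng = np.random.default_rng(1)

# (user_id, movie_id, rating) triples -- a miniature stand-in for the contest data.
ratings = np.array([(u, m, rng.integers(1, 6))
                    for u in range(200) for m in rng.choice(50, 10, replace=False)])

# Hold out 20% of the ratings; the model never sees them during fitting.
idx = rng.permutation(len(ratings))
test, train = ratings[idx[:len(idx) // 5]], ratings[idx[len(idx) // 5:]]

# "Fit": each movie's average rating in the training set (global mean as fallback).
global_mean = train[:, 2].mean()
movie_means = {m: train[train[:, 1] == m, 2].mean() for m in np.unique(train[:, 1])}

# Judge the model by root mean squared error on the withheld ratings.
preds = np.array([movie_means.get(m, global_mean) for _, m, _ in test])
rmse = np.sqrt(np.mean((preds - test[:, 2]) ** 2))
print(f"holdout RMSE: {rmse:.3f}")
```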

There were a couple of unusual challenges in this competition.  First, despite the sheer quantity of data, the ratings are potentially “sparse” in terms of the number of individuals who rated exactly the same sets of movies.  A second challenge came in the form of what Clive Thompson, in an article in The New York Times Magazine (“If You Liked This, You’re Sure to Love That,” November 23, 2008), called the “Napoleon Dynamite” problem.  In a nutshell, it’s really hard to predict how much someone will like “Napoleon Dynamite” based on how much they like other films.  Other problem films identified by Len Bertoni, one of the contestants Thompson interviewed for the article, include “Lost in Translation” (which I liked) and “The Life Aquatic with Steve Zissou” (which I hated, even though both films star Bill Murray).
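Some quick arithmetic with the round numbers quoted above shows just how sparse the data are: the user-by-title matrix is only about one percent filled, and two customers chosen at random would share only a couple of rated titles (under a uniform-rating assumption, which understates the real overlap on popular titles).

```python
# Back-of-the-envelope arithmetic for the sparsity issue, using the approximate
# contest figures quoted earlier (100M ratings, ~480K customers, ~18K titles).
# Only those published round numbers are used.
ratings, users, titles = 100_000_000, 480_000, 18_000

fill = ratings / (users * titles)   # fraction of the user x title matrix observed
per_user = ratings / users          # ~200 ratings per customer
# Expected titles two random customers both rated, assuming ratings were spread
# uniformly across titles (real overlap is higher because popular titles dominate).
expected_overlap = per_user ** 2 / titles

print(f"matrix is about {fill:.1%} filled")
print(f"two random customers share roughly {expected_overlap:.1f} rated titles")
```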

I’m eager to see the full solutions that the winning teams employed.  After reading about the “Napoleon Dynamite” problem, I began to think that a hierarchical Bayesian approach might work by capturing some of the unique variability in these problem films, but there are likely other machine-learning approaches that would work as well.
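For what it’s worth, here is a minimal sketch of the partial-pooling intuition behind that hierarchical Bayesian hunch: shrink each film’s average rating toward the overall mean, shrinking more when a film has few ratings or when its viewers disagree wildly (the “Napoleon Dynamite” situation).  The between-film variance and the toy ratings are assumptions for illustration only, not anything estimated from the contest data.

```python
# A minimal sketch of the partial-pooling idea: each film's average rating is
# shrunk toward the overall mean, with less shrinkage for films with many,
# consistent ratings and more shrinkage for films whose ratings are few or
# wildly inconsistent. The variance figure is an assumption for illustration.
import numpy as np

def shrunken_mean(movie_ratings, global_mean, between_movie_var=0.25):
    """Empirical-Bayes style estimate of a film's 'true' average rating."""
    r = np.asarray(movie_ratings, dtype=float)
    within_var = r.var(ddof=1) if len(r) > 1 else 1.0   # rating disagreement for this film
    # Weight on the film's own data grows with the number of ratings and
    # shrinks as viewers disagree more (the "Napoleon Dynamite" situation).
    weight = between_movie_var / (between_movie_var + within_var / len(r))
    return weight * r.mean() + (1 - weight) * global_mean

# A broadly liked film vs. a polarizing one, both compared to a 3.6 overall mean.
print(shrunken_mean([4, 4, 5, 4, 4, 5], 3.6))   # stays close to its own high mean
print(shrunken_mean([1, 5, 1, 5, 2, 5], 3.6))   # pulled back toward the overall mean
```

A full hierarchical Bayesian treatment would estimate those variance components from the data, and could let each film carry its own, rather than assuming them as I do here.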

It’s possible that the achievements of these two teams will translate to real advances for predictive modeling based on the kinds of behavioral and attitudinal data that companies can gather from or about their customers.  If that’s the case, then we’ll probably see companies turning to ever more sophisticated predictive models.  But better predictive models do not necessarily improve our understanding of the drivers of customer behavior.  What’s missing in many data-driven predictive modeling systems like Cinematch is a theory of movie preferences.  This is one reason why the algorithms came up short in predicting the ratings for films like “Napoleon Dynamite”–the data do not contain all the information needed to explain or understand movie preferences.  If you looked across my ratings for a set of films similar to “The Life Aquatic” in important respects (cast, director, quirkiness factor), you would predict that I’d give this movie a four or a five; as I noted above, you would be wrong.  Same thing for “The Duchess,” which I sent back to Netflix without even watching the entire movie.

These minor inaccuracies may not matter much to Netflix, which should seek to optimize across as many customers and titles as possible.  Still, if I follow the recommendations of Cinematch and I’m disappointed too often, I may just discontinue Netflix altogether.  (NOTE: Netflix incorporates some additional information into its Cinematch algorithm, but for purposes of the contest, it restricted the data available to the contestants.)

In my view, predictive models can be powerful business tools, but they have the potential to lead us into a false belief that because we can predict something on the basis of mathematical relationships, we understand what we’re predicting.  We might also lapse into an expectation that “prediction” based on past behavior is in fact destiny.  We need to remind ourselves that correlation or association is a necessary but not a sufficient condition for establishing a causal relationship.

Copyright 2009 by David G. Bakken.  All rights reserved.