Steve Lohr reported in The New York Times on July 28 that two teams appear to have tied for the $1 million prize offered by Netflix to anyone who could improve its movie recommendation system (target: a 10% reduction in a measure of prediction error).  This is certainly a triumph for the field of predictive modeling, and, perhaps, for “crowdsourcing” (at least when accompanied by a big monetary carrot) as an effective method for finding innovative solutions to difficult problems.

Predictive modeling has been used to target customers and to determine their credit worthiness for at least a couple of decades, but it’s been receiving a lot more attention lately, in part thanks to books like Supercrunchers (by Ian Ayres, Bantam, 2007) and Competing on Analytics (by Thomas H. Davenport and Jeanne G. Harris, Harvard Business School Press, 2007). The basic idea behind predictive modeling, as most of you will know, is that variation in some as yet unobserved outcome variable (such as whether a consumer will respond to a direct mail offer, spend a certain amount on a purchase, or give a movie a rating of four out of five stars) can be predicted based on knowledge of the relationship between one or more variables that we can observe in advance and the outcome of interest.  And we learn about such relationships by looking at cases where we can observe both the outcome and the “predictors.”  The workhorse method for uncovering such relationships is regression analysis.

In many respects, the Netflix Prize is a textbook example of the development of a predictive model for business applications.  In the first place, prediction accuracy is important for Netflix, which operates a long tail business, making a lot of money from its “backlist” of movie titles.  Recommendation engines like Cinematch and those used by Amazon and other online retailers make the long tail possible to the extent that they bring backlist titles to the attention of buyers who otherwise would not discover them. Second, Netflix has a lot of data consisting of ratings of movies by its many customers that can be used as fodder in developing the model.  All entrants had access to a dataset consisting of more than 100 million ratings from over 480,000 randomly chosen Netflix customers (that’s roughly 200 ratings per customer).  In all these customer rated about 18,000 different titles (for about 5,500 ratings per title).  That is a lot of data for developing a predictive model by almost any standard.  And, following the textbook approach, Netflix provided a second dataset to be used for testing the model, because the goal of the modeling is to predict cases not yet encountered, and the judging was based on how accurately a model predicted the ratings in this dataset (and those ratings were not provided to the contestants).

There were a couple of unusual challenges in this competition.  First, despite the sheer quantity of data, it is potentially “sparse” in terms of the number of individuals who rated exactly the same sets of movies.  A second challenge came in the form of what Clive Thompson, in an article in the Sunday Times Magazine (“If You Liked This, You’re Sure to Love That,” November 23, 2008), called the “Napoleon Dynamite” problem.  In a nutshell, it’s really hard to predict how much someone will like “Napoleon Dynamite” based on how much they like other films.  Other problem films identified by Len Bertoni, one of the contestents Thompson interviewed for the article, include “Lost in Translation” (which I liked) and “The Life Aquatic with Steve Zissou” (which I hated, even though both films star Bill Murray).

I’m eager to see the full solutions that the winning teams employed.  After reading about the “Napoleon Dynamite” problem, I began to think that a hierarchical Bayesian solution might work by capturing some of the unique variability in these problem films but there are likely other machine learning approaches that would work.

It’s possible that the achievements of these two teams will translate to real advances for predictive modeling based on the kinds of behavioral and attitudinal data that companies can gather from or about their customers. If that’s the case, then we’ll probably see companies turning to ever more sophisticated predictive models.  But better predictive models do not necessarily improve our understanding of the drivers of customer behavior.  What’s missing in many data-driven predictive modeling systems like Cinematch is a theory of movie preferences.  This is one reason why the algorithms came up short in predicting the ratings for films like “Napoleon Dynamite”–the data do not contain all the information needed to explain or understand movie preferences.  If you looked across my ratings for a set of films similar to “The Life Aquatic” in important respects (cast, director, quirkiness factor) you would predict that I’d give this movie a four or a five.  Same thing for the “The Duchess,”  which I sent back to Netflix without even watching the entire movie.

These minor inaccuracies may not matter much to Netflix which should seek to optimize across as many customers and titles as possible.  Still, if I follow the recommendations of Cinematch and I’m disappointed too often, I may just discontinue Netflix altogether. (NOTE:  Netflix incorporates some additional information into their Cinematch algorithm, but for purposes of the contest, they restricted the data available to the contestants).

In my view, predictive models can be powerful business tools, but they have the potential to lead us into a false belief that because we can predict something on the basis of mathematical relationships, we understand what we’re predicting.  We might also lapse into an expectation that “prediction” based on past behavior is in fact destiny.  We need to remind our selves that correlation or association is a necessary but not a sufficient condition to show a causal relationship.

Copyright 2009 by David G. Bakken.  All rights reserved.