Thinking Analytically


If you were listening to NPR’s “All Things Considered” broadcast on January 18, you might have heard a brief report on research that reveals regional differences (“dialects”) in word usage, spelling, slang, and abbreviations in Twitter postings. For example, Northern and Southern California users employ the spelling variants koo and coo to mean “cool.”

Finding regional differences in these written expressions is interesting in its own right, but I’ve just finished reading the paper describing this research and there’s a lot more going on here than simply counting and comparing expressions across different geographic regions.  The paper is an excellent example of what market researchers might do to analyze social media.

The study authors–Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing–are affiliated with the School of Computer Science at Carnegie Mellon University (Eisenstein, who was interviewed for the ATC broadcast, is a postdoctoral fellow). They set out to develop a latent variable model to predict an author’s geographic location from the characteristics of text messages. As they point out, their work is unique in that they use raw text data (although “tokenized”) as input to the modeling. They develop and compare several models, including a “geographic topic model” that incorporates the interaction between base topics (such as sports) and an author’s geographic location, as well as two additional latent variable models: a “mixture of unigrams” (which assumes a single topic) and “supervised latent Dirichlet allocation.” If you have not yet figured it out, the models, as described, use statistical machine learning methods. That means some of the terminology may be unfamiliar to market researchers, but the algorithm described for the geographic topic model resembles the hierarchical Bayesian methods using the Gibbs sampler that have come into fairly wide use in market research (especially for choice-based conjoint analysis).
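For readers unfamiliar with these methods, here is a much simpler sketch in the spirit of the “mixture of unigrams” baseline: a multinomial naive Bayes classifier that guesses an author’s region from token counts. The regions, tokens, and training examples below are invented for illustration; the actual study fit far richer latent variable models to a large corpus of geotagged Twitter messages.

```python
from collections import Counter, defaultdict
import math

# Toy training data: (region, tokens) pairs. Regions and tokens are
# invented for illustration.
train = [
    ("north_ca", ["that", "party", "was", "koo"]),
    ("north_ca", ["koo", "see", "you", "there"]),
    ("south_ca", ["that", "show", "was", "coo"]),
    ("south_ca", ["coo", "hanging", "out"]),
]

# Estimate P(region) and P(word | region) with add-one (Laplace) smoothing.
region_counts = Counter(r for r, _ in train)
word_counts = defaultdict(Counter)
vocab = set()
for region, tokens in train:
    word_counts[region].update(tokens)
    vocab.update(tokens)

def predict_region(tokens):
    """Return the region maximizing log P(region) + sum of log P(word | region)."""
    best, best_score = None, float("-inf")
    for region in region_counts:
        score = math.log(region_counts[region] / len(train))
        total = sum(word_counts[region].values())
        for w in tokens:
            score += math.log((word_counts[region][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = region, score
    return best

print(predict_region(["that", "was", "koo"]))  # → north_ca
```

Add-one smoothing keeps an unseen word from zeroing out a region’s probability; the geographic topic model goes much further, modeling topics and their regional variants jointly.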

This research is important for market research because it demonstrates a method for estimating characteristics of individual authors from the characteristics of their social media postings.  While we have not exhausted the potential of simpler methods (frequency and sentiment analyses, for example), this looks like the future of social media analysis for marketing.

Copyright 2011 by David G. Bakken.  All rights reserved.

There’s an interesting article by Jonah Lehrer in the Dec. 13 issue of The New Yorker, “The Truth Wears Off:  Is there something wrong with the scientific method?” Lehrer reports that a growing number of scientists are concerned about what psychologist Joseph Banks Rhine termed the “decline effect.”  In a nutshell, the “decline effect” is an observed tendency for the size of an observed effect to decline over the course of studies attempting to replicate that effect.  Lehrer cites examples from studies of the clinical outcomes for a class of once-promising antipsychotic drugs as well as from more theoretical research.  This is a scary situation given the inferential nature of most scientific research.  Each set of observations represents an opportunity to disconfirm a hypothesis.  As long as subsequent observations don’t lead to disconfirmation, our confidence in the hypothesis grows.  The decline effect suggests that replication is more likely, over time, to disconfirm a hypothesis than not.  Under those circumstances, it’s hard to develop sound theory.
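One proposed mechanism for the decline effect is selection: studies that happen to observe large effects are more likely to be published, so later, unfiltered replications drift back toward the true effect. The simulation below is my own illustration of that mechanism, not something from Lehrer’s article, and every number in it is invented.

```python
import random
import statistics

random.seed(42)
TRUE_EFFECT = 0.2   # small true effect, in standard-deviation units
N = 30              # subjects per study arm (hypothetical)

def run_study():
    """Simulate one two-group study; return the observed effect size."""
    control = [random.gauss(0, 1) for _ in range(N)]
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    return statistics.mean(treated) - statistics.mean(control)

# "Publish" only studies whose observed effect is large (a crude stand-in
# for significance filtering), then replicate without any filter.
published = [e for e in (run_study() for _ in range(1000)) if e > 0.5]
replications = [run_study() for _ in range(1000)]

print(round(statistics.mean(published), 2))     # well above the true 0.2
print(round(statistics.mean(replications), 2))  # close to the true 0.2
```

The initially “published” effects overstate the truth by construction, so the apparent effect declines on replication even though nothing about the phenomenon changed.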

Given that market researchers apply much of the same reasoning as scientists in deciding what’s an effect and what isn’t, the decline effect is a serious threat to creating customer knowledge and making evidence-based marketing decisions.

As I noted in my last post, the American Marketing Association’s Advanced Research Techniques Forum took place in San Francisco the second week in June (June 6-9). The program is an intentional mix of presentations from academic researchers and market research practitioners. While the practitioner presentations are often more interesting, at least from the standpoint of a fellow practitioner, this year the best and most useful presentations either came from the academic side or had significant contributions from one or more academic researchers. In that last post I wrote about three papers that explored different aspects of social media. Three more papers from this year’s ART make my list of the most worthwhile presentations.

The New York Times is one of the more interesting innovators when it comes to using data visualization to tell a story or make a point. In particular, the Business section employs a variety of chart forms to reveal what is happening in financial markets. The Weather Report uses “small multiples” to show 10-day temperature trends for major U.S. cities.

Even more interesting are the occasional illustrations that appear under the heading of “Op-Chart.” For a few years now, the Times has periodically presented on the Op-Ed page a comparative table that tracks “progress” in Iraq on a number of measures, such as electric power generation.

Another impressive chart appeared in “Sunday Opinion” on January 10, 2010. Titled “A Year in Iraq and Afghanistan,” this full-page illustration provides a detailed look at the 489 American and allied deaths that occurred in Afghanistan and the 141 deaths in Iraq. At first glance, the chart resembles the Periodic Table of Elements. Deaths in Iraq take up the top one-fourth or so of the chart (along with the legend); deaths in Afghanistan occupy the bulk of the illustration.

Each death is represented by a figure, and each figure appears in a box representing the date on which the death occurred. One figure shape represents American forces, and a slightly different shape signifies a member of the coalition forces. For coalition forces, the color of the figure indicates nationality. A small symbol indicates the cause of each death (homemade bomb, mortar, hostile fire, bomb, suicide bomb, or non-combat related). Multiple deaths from the same event or cause on a date occupy the same box.

Most dates have only a single death, but a few days stand out as particularly tragic: seven U.S. troops dying from a non-combat-related cause in Afghanistan on October 26; eight killed by hostile fire on October 3; seven killed by a homemade bomb on October 27; six Italians killed by a homemade bomb on September 17; and five Americans killed by a suicide bomber in Mosul, Iraq, on April 10.

The deaths are linked to specific locations on maps of Iraq and Afghanistan. Helmand Province was the deadliest place, with 79 of the 489 deaths in Afghanistan. In Iraq, Baghdad was the most dangerous place, accounting for 42 of the 141 deaths in that country. While Americans account for the largest share, 112 of the dead in Afghanistan were British troops.

There is a wealth of information in this chart, with four pieces of information on every death, but in some ways there is too much detail. To get at the numbers I provided above, I had to count the figures manually; there are no summary statistics. The picture grabs our attention and immediately conveys the magnitude of the price the U.S. and our allies are paying in Afghanistan. But if we want to act on data, we need more than a very clever visual display. Summaries of the numbers would help here. It’s useful to know, for example, that 65 of the 141 deaths in Iraq (46%) were due to non-combat-related causes, compared to 48 (10%) of the deaths in Afghanistan. Eighty percent of the fatalities in deadly Helmand Province were due to hostile fire; 57% in other parts of Afghanistan were caused by homemade bombs (in Iraq there were 19 deaths, or 13% of the total, from homemade bombs).

Two of the creators of this chart, Adriana Lins de Albuquerque (a doctoral student in political science at Columbia) and Alicia Cheng of mgmt.design, produced a slightly different version of this chart summarizing the death toll in Iraq for 2007 (click here). That earlier version did not have as much detail about each individual death (location information is not included, for example) but includes some additional causes, such as torture and beheading, that, thankfully, appear to have disappeared.

The advantage to displaying data in this fashion lies in the ability of our brains to form patterns quickly. The use of color to designate coalition members makes the contributions of our allies apparent in a way that a simple tally might not. Even without a year-to-year comparison, we can see that Iraq has become, at least for U.S. troops and our allies, a much safer place than Afghanistan. Additionally, this one chart presents data that, in other forms, might require several PowerPoint slides to communicate: deaths by date, deaths by city or province, deaths by nationality, cause of death, and number killed per incident.

Any complex visual display of data requires trade-offs. In this case, for example, the creators arranged the deaths chronologically (oldest first) within each geographic block. That means that patterns in other variables, such as cause of death or nationality of troops, may be harder to detect at first glance. The chronological ordering also has layout implications, since on some dates there were multiple casualties.

All in all, it’s a great piece of data visualization that to my mind would be even better with the addition of a few summary statistics.

A disclaimer–I counted twice to get each of the numbers I provide above, but I offer no guarantee that I am not off by one or two deaths in any of those numbers.

Copyright 2010 by David G. Bakken.  All rights reserved.

Looking back over the last year in market research offers an opportunity to consider just which transformations, new ideas, industry trends, and emerging techniques might shape MR over the next few years. Here’s a list of eight topics I’ve been following, with thoughts on the potential impact each might have on MR over the next two or three years.

The debate over the accuracy–and quality–of survey research conducted online is flaring at the moment, at least partly in response to a paper by Yeager, Krosnick, Chang, Javitz, Levendusky, Simpson, and Wang: “Comparing the accuracy of RDD telephone surveys and Internet surveys conducted with probability and non-probability samples.” Gary Langer, director of polling at ABC News, wrote about the paper in his blog “The Numbers” on September 1. In a nutshell, the paper compares survey results obtained via random-digit dialing (RDD) with those from an Internet panel whose panelists were originally recruited by means of RDD and from a number of “opt-in” Internet panels whose panelists were “sourced” in a variety of ways. The results produced by the probability sampling methods are, according to the authors, more accurate than those obtained from the non-probability Internet samples. You can find a response from Doug Rivers, CEO of YouGov/Polimetrix (and Professor of Political Science at Stanford) at “The Numbers,” as well as some other comments.

The analysis presented in the paper is based on surveys conducted in 2004/5. In recent years the coverage of the RDD sampling frame has deteriorated as the number of cellphone-only users has increased (to 20% currently). In response to concerns of several major advertisers about the quality of online panel data, the Advertising Research Foundation (ARF) established an Online Research Quality Council and just this past year conducted new research comparing online panels with RDD telephone samples. Joel Rubinson, Chief Research Officer of The ARF, has summarized some of the key findings in a blog post. According to Rubinson, this study reveals no clear pattern of greater accuracy for the RDD sample. There are, of course, differences in the two studies, both in purpose and method, but it seems that we can no longer assume that RDD samples represent the best benchmark against which to compare all other samples.

Have you heard about the Facebook Gross National Happiness Index?  On Monday, October 12, the Times ran an article (by Noam Cohen) reporting some of the findings based on analysis of two years’ worth of Facebook status updates from 100 million users in the U.S.  The index was created by Adam D. I. Kramer, a doctoral candidate in social psychology at the University of Oregon, and is based on counts of positive and negative words in status updates.  According to the article, classification of words as positive or negative is based on the Linguistic Inquiry and Word Count dictionary.
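The Times article doesn’t reproduce Kramer’s computations, but the basic mechanics of a word-count index are easy to sketch. The word lists and status updates below are toy examples; the actual LIWC dictionary is proprietary and its categories are far larger.

```python
# Toy positive/negative word lists standing in for the LIWC categories.
POSITIVE = {"happy", "great", "love", "excited", "fun"}
NEGATIVE = {"sad", "awful", "hate", "tired", "worried"}

def happiness_score(status_updates):
    """Average (positive count - negative count) per update across a batch:
    a crude analogue of a word-count happiness index."""
    scores = []
    for update in status_updates:
        words = update.lower().split()
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        scores.append(pos - neg)
    return sum(scores) / len(scores)

friday = ["so excited for the weekend", "great show tonight", "love this"]
monday = ["tired and worried about work", "sad it rained", "awful commute"]
print(happiness_score(friday))  # positive score
print(happiness_score(monday))  # negative score
```

Simple counting like this ignores negation, sarcasm, and context (“not happy” scores as positive), which is one reason to treat indexes built this way as suggestive rather than definitive.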

Among the researchers’ conclusions:  we’re happier on Fridays than on Mondays; holidays also make Americans happy.  The premature death of a celebrity may make us sad.  According to a post by Mr. Kramer on the Facebook blog, the two “saddest” days–days with the highest numbers of negative words–were the days on which actor Heath Ledger and pop icon Michael Jackson died.  Mr. Kramer points out that, coincidentally, Mr. Ledger died on the day of the Asian stock market crash, which might have contributed to the degree of negativity.

We’re going to see a lot more of this kind of thing as researchers delve into the rich trove of information generated by users of search engines and web-enabled social networking.  The happiness index, based as it is on simple frequency analysis of words, is the tip of the iceberg.  At the moment, “social media”–I’m not exactly sure what that label means–is getting incredible attention in the marketing and marketing research community.  The question that has yet to be posed, let alone answered, is, “what exactly do we learn from all this information?”


In my last post on predictive modeling (4 August 2009) I used the recent announcement that the Netflix Prize appears to have been won to make two points.  First, predictive modeling based on huge amounts of consumer/customer data is becoming more important and more prevalent throughout business (and other aspects of life as well).  Second, the power of predictive modeling to deliver improved results may seduce us into believing that just because we can predict something, we understand it.

Perhaps because it fuses popular culture with predictive modeling, Cinematch (Netflix’s recommendation engine) seemed like a good example to use in making these points. For one thing, if predicting movie viewers’ preferences were easy, the motion picture industry would probably have figured out how to do it at some early stage in the production process–not that they haven’t tried. A recent approach uses neural network modeling to predict box office success from the characteristics of the screenplay (you can read Malcolm Gladwell’s article in The New Yorker titled “The Formula” for a narrative account of this effort). The market is segmented by product differentiation (e.g., genres) as well as preferences. At the same time, moviegoers’ preferences are somewhat fluid, and there is a lot of “cross-over,” with fans of foreign and independent films also flocking to the most Hollywood of blockbuster films.

This brings to mind a paradox of predictive modeling (PM). PM can work pretty well in the aggregate (perhaps allowing Netflix to do a good job of estimating demand for different titles in the backlist) but not so well when it comes to predicting a given individual’s preferences. I tend to be reminded of this every time I look at the list of movies that Cinematch predicts I’ll love. For each recommended film, there’s a list of one or more other films that form the basis for the recommendation. I’m struck by the often wide disparities between the recommended film and the films that led to the recommendation. One example: Cinematch recommended “Little Miss Sunshine” (my predicted rating is 4.9, compared to an average of 3.9) because I also liked “There Will Be Blood,” “Fargo,” and “Syriana.” It would be hard to find three films more different from “Little Miss Sunshine.” “Mostly Martha” is another example. This is a German film in the “foreign romance” genre that was remade as “No Reservations” in the U.S. with Catherine Zeta-Jones. Cinematch based its recommendation on the fact that I liked “The Station Agent.” These two films have almost no objective elements in common. They are in different languages, set in different countries, with very different story lines, casts, and so forth. But they share many subjective elements (great acting, characters you care about, and humor, among others), and it’s easy to imagine that someone who likes one of these will enjoy the other. On the other hand, Cinematch made a lot of strange recommendations (such as “Amelie,” a French romantic comedy) based on the fact that I enjoyed “Gandhi,” the Oscar-winning 1982 biopic that starred Ben Kingsley.
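Netflix has not published Cinematch’s internals, but the “because you liked X” pattern is characteristic of item-item collaborative filtering, in which two films count as similar when the same customers rate them similarly. A minimal sketch, with an invented ratings matrix:

```python
import math

# Toy ratings matrix: user -> {film: stars}. Users, films, and ratings
# are invented for illustration.
ratings = {
    "u1": {"Fargo": 5, "Syriana": 4, "Little Miss Sunshine": 5},
    "u2": {"Fargo": 4, "Syriana": 5, "Little Miss Sunshine": 4},
    "u3": {"Fargo": 2, "Amelie": 5},
    "u4": {"Amelie": 4, "Little Miss Sunshine": 5},
}

def item_vector(film):
    """Ratings for one film, keyed by user."""
    return {u: r[film] for u, r in ratings.items() if film in r}

def cosine(a, b):
    """Cosine similarity, using only users who rated both films in the dot product."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

sim = cosine(item_vector("Fargo"), item_vector("Little Miss Sunshine"))
print(round(sim, 2))
```

Note that the similarity is driven entirely by co-ratings, not by any objective attributes of the films–which is exactly how “Mostly Martha” and “The Station Agent” can end up neighbors despite sharing almost nothing on paper.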

Steve Lohr reported in The New York Times on July 28 that two teams appear to have tied for the $1 million prize offered by Netflix to anyone who could improve its movie recommendation system (target: a 10% reduction in a measure of prediction error).  This is certainly a triumph for the field of predictive modeling, and, perhaps, for “crowdsourcing” (at least when accompanied by a big monetary carrot) as an effective method for finding innovative solutions to difficult problems.

Predictive modeling has been used to target customers and to determine their credit worthiness for at least a couple of decades, but it’s been receiving a lot more attention lately, in part thanks to books like Supercrunchers (by Ian Ayres, Bantam, 2007) and Competing on Analytics (by Thomas H. Davenport and Jeanne G. Harris, Harvard Business School Press, 2007). The basic idea behind predictive modeling, as most of you will know, is that variation in some as yet unobserved outcome variable (such as whether a consumer will respond to a direct mail offer, spend a certain amount on a purchase, or give a movie a rating of four out of five stars) can be predicted based on knowledge of the relationship between one or more variables that we can observe in advance and the outcome of interest.  And we learn about such relationships by looking at cases where we can observe both the outcome and the “predictors.”  The workhorse method for uncovering such relationships is regression analysis.

In many respects, the Netflix Prize is a textbook example of the development of a predictive model for business applications. In the first place, prediction accuracy is important for Netflix, which operates a long tail business, making a lot of money from its “backlist” of movie titles. Recommendation engines like Cinematch and those used by Amazon and other online retailers make the long tail possible to the extent that they bring backlist titles to the attention of buyers who otherwise would not discover them. Second, Netflix has a lot of data consisting of ratings of movies by its many customers that can be used as fodder in developing the model. All entrants had access to a dataset consisting of more than 100 million ratings from over 480,000 randomly chosen Netflix customers (roughly 200 ratings per customer). In all, these customers rated about 18,000 different titles (about 5,500 ratings per title). That is a lot of data for developing a predictive model by almost any standard. And, following the textbook approach, Netflix provided a second dataset to be used for testing the model, because the goal of the modeling is to predict cases not yet encountered; the judging was based on how accurately a model predicted the ratings in this dataset (ratings that were not provided to the contestants).
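The contest’s accuracy measure was root mean squared error (RMSE) on those held-out ratings, and the 10% target was defined relative to Cinematch’s own score on the same data. A sketch of the arithmetic, with invented ratings and predictions:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: the Netflix Prize's accuracy metric."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

# Hypothetical held-out ratings and two sets of predictions for them.
actual = [4, 3, 5, 2, 4, 1, 5, 3]
baseline_preds = [3.8, 3.5, 4.0, 3.0, 3.6, 2.2, 4.1, 3.2]
improved_preds = [4.1, 3.2, 4.6, 2.4, 3.9, 1.5, 4.7, 3.1]

baseline = rmse(baseline_preds, actual)
improved = rmse(improved_preds, actual)
print(round(100 * (baseline - improved) / baseline, 1))  # % improvement
```

Because the contestants never saw the held-out ratings, the metric rewarded genuine generalization rather than memorization of the training data.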

There were a couple of unusual challenges in this competition. First, despite the sheer quantity of data, it is potentially “sparse” in terms of the number of individuals who rated exactly the same sets of movies. A second challenge came in the form of what Clive Thompson, in an article in the Sunday Times Magazine (“If You Liked This, You’re Sure to Love That,” November 23, 2008), called the “Napoleon Dynamite” problem. In a nutshell, it’s really hard to predict how much someone will like “Napoleon Dynamite” based on how much they like other films. Other problem films identified by Len Bertoni, one of the contestants Thompson interviewed for the article, include “Lost in Translation” (which I liked) and “The Life Aquatic with Steve Zissou” (which I hated, even though both films star Bill Murray).

I’m eager to see the full solutions that the winning teams employed. After reading about the “Napoleon Dynamite” problem, I began to think that a hierarchical Bayesian solution might work by capturing some of the unique variability in these problem films, but there are likely other machine learning approaches that would work as well.
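I obviously can’t reproduce the winning teams’ methods here, but the flavor of the Bayesian idea–pooling strength across films so that a film’s estimate is pulled toward the overall mean, most strongly when its own data are thin–can be shown with a regularized film average, a baseline step that appears in published Netflix Prize write-ups. The films and constants below are invented.

```python
# Regularized (shrunken) film averages. Films and ratings are invented.
film_ratings = {
    "Widely Seen Hit": [4, 5, 4, 4, 5, 4, 4, 5, 4, 4],
    "Polarizing Indie": [1, 5],   # few, extreme ratings
}

GLOBAL_MEAN = 3.6   # assumed overall mean rating across all films
K = 5               # shrinkage strength: pseudo-ratings at the global mean

def shrunken_mean(ratings):
    """Film mean pulled toward the global mean; the pull is strongest
    when the film has few ratings of its own."""
    return (sum(ratings) + K * GLOBAL_MEAN) / (len(ratings) + K)

for film, r in film_ratings.items():
    print(film, round(shrunken_mean(r), 2))
```

The widely rated film keeps an estimate close to its raw average, while the sparsely rated, polarizing film is pulled well toward the global mean–a conservative estimate until more data arrive.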

It’s possible that the achievements of these two teams will translate to real advances for predictive modeling based on the kinds of behavioral and attitudinal data that companies can gather from or about their customers. If that’s the case, then we’ll probably see companies turning to ever more sophisticated predictive models. But better predictive models do not necessarily improve our understanding of the drivers of customer behavior. What’s missing in many data-driven predictive modeling systems like Cinematch is a theory of movie preferences. This is one reason why the algorithms came up short in predicting the ratings for films like “Napoleon Dynamite”–the data do not contain all the information needed to explain or understand movie preferences. If you looked across my ratings for a set of films similar to “The Life Aquatic” in important respects (cast, director, quirkiness factor), you would predict that I’d give this movie a four or a five. Same thing for “The Duchess,” which I sent back to Netflix without even watching the entire movie.

These minor inaccuracies may not matter much to Netflix, which should seek to optimize across as many customers and titles as possible. Still, if I follow the recommendations of Cinematch and I’m disappointed too often, I may just discontinue Netflix altogether. (Note: Netflix incorporates some additional information into its Cinematch algorithm, but for purposes of the contest, it restricted the data available to the contestants.)

In my view, predictive models can be powerful business tools, but they have the potential to lead us into a false belief that because we can predict something on the basis of mathematical relationships, we understand what we’re predicting. We might also lapse into an expectation that “prediction” based on past behavior is in fact destiny. We need to remind ourselves that correlation or association is a necessary but not a sufficient condition for establishing a causal relationship.

Copyright 2009 by David G. Bakken.  All rights reserved.