Reading Room

If you were listening to NPR’s “All Things Considered” broadcast on January 18, you might have heard a brief report on research that reveals regional differences (“dialects”) in word usage, spellings, slang and abbreviations in Twitter postings.  For example, Twitter users in Northern and Southern California use the spelling variants koo and coo to mean “cool.”

Finding regional differences in these written expressions is interesting in its own right, but I’ve just finished reading the paper describing this research and there’s a lot more going on here than simply counting and comparing expressions across different geographic regions.  The paper is an excellent example of what market researchers might do to analyze social media.

The study authors–Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing–are affiliated with the School of Computer Science at Carnegie Mellon University (Eisenstein, who was interviewed for the ATC broadcast, is a postdoctoral fellow).  They set out to develop a latent variable model to predict an author’s geographic location from the characteristics of text messages.  As they point out, their work is unique in that they use raw text data (although “tokenized”) as input to the modeling.  They develop and compare a few different models, including a “geographic topic model” that incorporates the interaction between base topics (such as sports) and an author’s geographic location, as well as additional latent variable models:  a “mixture of unigrams” (a model that assumes a single topic) and a “supervised latent Dirichlet allocation.”  If you have not yet figured it out, the models, as described, use statistical machine learning methods.  That means that some of the terminology may be unfamiliar to market researchers, but the description of the algorithm for the geographic topic model resembles the hierarchical Bayesian methods using the Gibbs sampler that have come into fairly wide use in market research (especially for choice-based conjoint analysis).
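To make the idea of predicting location from raw tokens concrete, here is a minimal sketch in Python.  It is not the authors’ geographic topic model.  It swaps in a much simpler supervised stand-in, a naive Bayes classifier over unigram counts with add-one smoothing, and the messages and the labels region_a and region_b are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for real tweet tokenization."""
    return text.lower().split()


def train(messages):
    """messages: list of (region, text) pairs.

    Returns per-region word counts, per-region message counts, and the vocabulary.
    """
    word_counts = defaultdict(Counter)
    region_counts = Counter()
    vocab = set()
    for region, text in messages:
        tokens = tokenize(text)
        word_counts[region].update(tokens)
        region_counts[region] += 1
        vocab.update(tokens)
    return word_counts, region_counts, vocab


def predict(text, word_counts, region_counts, vocab):
    """Return the region with the highest naive Bayes log score for the text."""
    total_messages = sum(region_counts.values())
    best_region, best_score = None, -math.inf
    for region in region_counts:
        # Log prior: share of training messages from this region.
        score = math.log(region_counts[region] / total_messages)
        # Add-one smoothing so unseen words do not zero out the likelihood.
        denom = sum(word_counts[region].values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((word_counts[region][token] + 1) / denom)
        if score > best_score:
            best_region, best_score = region, score
    return best_region


# Invented toy messages; the region labels are hypothetical.
TOY_MESSAGES = [
    ("region_a", "that show was koo"),
    ("region_a", "koo see you later"),
    ("region_b", "that show was coo"),
    ("region_b", "coo see you later"),
]

model = train(TOY_MESSAGES)
guess_koo = predict("the party was koo", *model)
guess_coo = predict("the party was coo", *model)
print(guess_koo, guess_coo)
```

The paper’s models go well beyond this by treating topics and regions as latent variables, but the sketch shows the basic logic of letting word choice carry information about where an author is.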

This research is important for market research because it demonstrates a method for estimating characteristics of individual authors from the characteristics of their social media postings.  While we have not exhausted the potential of simpler methods (frequency and sentiment analyses, for example), this looks like the future of social media analysis for marketing.

Copyright 2011 by David G. Bakken.  All rights reserved.


The Predictioneer’s Game:  Using the Logic of Brazen Self-Interest to See and Shape the Future by Bruce Bueno de Mesquita makes a pretty strong case for using models to make critical decisions, whether in business or international policy.  To anyone involved in prediction science, Bueno de Mesquita’s claim of 90% accuracy (“According to a declassified CIA assessment…”) might seem an exaggeration.  But the author has two things in his favor.  He limits his efforts at prediction to a specific type of problem, and he’s predicting outcomes for which there is usually a limited set of possibilities (for example, whether or not a bank will issue a fraudulent financial report in a given year).  (more…)

The current issue of The Economist carries an article titled, “Riders on a swarm.”  The article describes the use of swarm intelligence–the collective behavior that results from the individual actions of many simple “agents”–that is inspired by the behavior of insects like ants and bees or flocks of birds.  Although “agent-based simulation” is not mentioned by name (unlike in a column that appeared in a previous issue), these models have all of the relevant attributes of agent-based simulations, and you can find example models of collective insect and flocking bird behavior in agent-based toolkits such as NetLogo.

As noted in the article, these models have found some business applications in logistics and problems like traffic control.  Ant-based foraging models, for example, have been applied to solving routing problems for package delivery services.  Route optimization, given a set of delivery locations, is a fixed problem with a large number of potential solutions that probably can be solved analytically (or by simple brute force) with enough computing power.  Swarm models have the advantage that they can arrive at a good and often optimal solution without needing to specify and solve a linear programming problem.  By programming simple individual agents, such as artificial ants, with a simple set of rules for interacting with their environment and a set of goal-directed behaviors, the system can arrive at an optimal solution, even though no individual agent “solves” the problem.

Something that was new to me in this article is “particle swarm optimization” (PSO), which is inspired by the behavior of flocking birds and swarming bees.  According to the article, PSO was invented in the 1990s by James Kennedy and Russell Eberhart.  Unlike the logistics problems, there may be no closed form or analytically tractable solution to problems such as finding the optimal shape for an airplane wing.  In that case, a simulation in which thousands of tiny flowing particles follow a few simple movement rules may be just the ticket.
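For readers who want to see how few rules a particle needs, here is a minimal PSO sketch in Python.  It uses a common textbook form of the velocity update with an inertia weight, which may differ in detail from what the article or the original inventors describe.  The objective (a simple sum of squares standing in for something like a wing-shape cost), the parameter values, and all function names are assumptions chosen for illustration.

```python
import random


def sphere(position):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(x * x for x in position)


def pso(objective, dim=2, n_particles=30, n_iters=200, seed=0,
        inertia=0.7, cognitive=1.5, social=1.5, bound=5.0):
    """Minimize `objective` with a basic particle swarm; returns (best_position, best_value)."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers the best place it has visited so far.
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]

    # The swarm shares the best place any particle has visited.
    g = min(range(n_particles), key=lambda i: personal_best_val[i])
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Keep some momentum, drift toward own best, drift toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], value
                if value < global_best_val:
                    global_best, global_best_val = positions[i][:], value
    return global_best, global_best_val


best_position, best_value = pso(sphere)
print(best_position, best_value)
```

No particle knows where the minimum is; each one only compares its own history with the swarm’s best find, and the collective homes in on the answer.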

This stuff is fascinating, but it’s not clear that there are many useful applications for this type of modeling in marketing or marketing research, at least as long as the unit of analysis is the intersection of an individual “consumer” and a specific purchase or consumption occasion.  Of course, if imitation and social contagion are at least as important in our purchase decisions as the intrinsic attributes of products and services (as research by Duncan Watts and his collaborators has shown in the case of popular music), then agent-based simulations may turn out to be one of the best ways to understand and predict consumer behavior.

Copyright 2010 by David G. Bakken.  All rights reserved.

I’ve just finished Seizing the White Space: Business Model Innovation for Growth and Renewal by Mark W. Johnson (Harvard Business Press).  Johnson is chairman of Innosight, a consulting and investment company he co-founded with Clayton Christensen, Harvard Business School professor and author of The Innovator’s Dilemma.

Johnson aims to present a systematic approach to business model innovation that has grown out of the original research and analysis that fueled The Innovator’s Dilemma. For me, the most important contribution of that earlier work was the way that Christensen revealed an underlying pattern that explained the successes and failures of different innovations.  Since that book was published, Christensen and Johnson (and other collaborators) have refined those ideas, and this book is a neat and concise summary of their current thinking and practice.  In a nutshell, successful innovation requires a solid customer value proposition, a profit formula that will deliver value to the firm, and the right resources and processes.

Johnson makes these ideas accessible without watering them down too much.  When he talks about the role of resource velocity, for example, even readers without an MBA or experience in finance or accounting will get the relationship between resource turnover and profitability.

While Johnson is targeting the challenges of innovating from inside an incumbent business, anyone who is developing a business will benefit from his business model innovation framework.

The white space that Johnson wants readers to seize is an area that does not fit well with the current organization (or industry) and consists of either new customers or existing customers served in new ways.  The customers comprising the white space are often nonconsumers who lack access due to affordability or skill.  Many of the case studies Johnson uses to support his argument, such as the $2,200 Tata Nano, the world’s cheapest new car, include new business models that address this lack of access.

This book reinforces one of the themes you’ll find in my previous posts.  Successful innovation starts with finding one or more unmet or poorly met “jobs” that a customer needs to get done.  This outside-in approach to innovation often runs counter to the way most companies innovate, but Johnson will convince you that business model innovation has the potential to create far more value than does product innovation.

Copyright 2010 by David G. Bakken.  All rights reserved.

Nassim Nicholas Taleb introduced a new term into the lexicon of business forecasting, the “black swan event.”  The metaphor comes from the apparent fact that, for some reason, black swans should not exist, but they sometimes do.  In THE BLACK SWAN:  The Impact of the Highly Improbable, Taleb expounds for 366 pages on what is, for the most part, a single idea:  the normal (bell-shaped) distribution is pretty much worthless for predicting the likelihood of any random occurrence.  Taleb augments this idea in various, occasionally entertaining ways, acquaints the reader with power law and “fat tail” distributions, and takes excursions through fractal geometry and chaos theory.
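A small calculation shows why the choice between a bell curve and a fat-tailed distribution matters so much for extreme events.  The Python sketch below is my own illustration, not an example from the book.  It compares how quickly the chance of exceeding a threshold shrinks when the threshold is doubled, using a standard normal and a Pareto (power law) distribution whose tail exponent of 2 is an arbitrary assumption.

```python
import math


def normal_tail(k):
    """P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(x, alpha=2.0, x_min=1.0):
    """P(X > x) for a Pareto (power law) variable with scale x_min and tail exponent alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


# How much rarer does an event become when the threshold doubles?
thresholds = (2, 4, 8)
normal_ratios = [normal_tail(2 * k) / normal_tail(k) for k in thresholds]
pareto_ratios = [pareto_tail(2 * x) / pareto_tail(x) for x in thresholds]

for t, n, p in zip(thresholds, normal_ratios, pareto_ratios):
    print(f"threshold {t} -> {2 * t}: normal ratio {n:.3g}, Pareto ratio {p:.3g}")
```

Under the power law, doubling the threshold always cuts the probability by the same factor, so very large events stay plausible.  Under the normal distribution the probability collapses faster and faster, which is how a bell-curve model ends up treating a black swan as essentially impossible.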

Taleb tells us he aspires to erudition, and he introduces the reader to plenty of “great thinkers” that history has failed to credit.  You can come away from this book feeling that it is mostly about showing us how erudite Taleb is.  For me, one of the key shortcomings is Taleb’s tendency, via style, to claim that we should accept his arguments on faith.  There are plenty of concepts, especially involving numbers, that would benefit from concrete examples.  There’s just a little too much “Take my word for it” in his writing.  Still, if you’ve got time to kill, this is not an unrewarding read.

David Orrell tackles the very same subject–our inability to predict the future–in The Future of Everything:  The Science of Prediction (which has a sub-subtitle: “From Wealth and Weather to Chaos and Complexity”).  For a mathematician, Orrell has an entertaining style and writes with clarity.  This book is far more focused than THE BLACK SWAN, which is sort of meandering.  The book is divided into three main parts: past, present and future.  The past provides a history of forecasting, beginning with the Greeks and the Oracle at Delphi.  The present considers the challenges of prediction in three key areas: weather, health (via genetics), and finance.  Orrell did his dissertation research on weather forecasting, and after reading this book, I think you’ll agree that it’s a great case study for revealing everything we think we know about the “science of prediction.”

Orrell’s main point is that a key problem in prediction is model error (the basis of his dissertation), which far outweighs the influence of chaos and other random disturbances.  In a nutshell, the complexity of these systems exceeds our ability to specify and parameterize models (models are subject to specification error, parameter error, and stochastic error).  Weather is a great example.  While there are only a few components to the system (temperature, humidity, air pressure, and such), the interactions between these components are almost impossible to predict.  Another problem is the resolution of the model; conditions are extremely local, but it is very difficult to develop a model that resolves to a volume small enough to predict local conditions.

Orrell educates.  The reader comes away with an understanding of the logic and mechanics of forecasting, as well as the seemingly intractable challenges.  Orrell provides clear explanations of many important forecasting concepts and does a good job of making the math accessible to a general reader.  There are a couple of shortcomings.  Orrell gives only passing notice to agent-based simulation and similar computational approaches to complexity.  And, in the third part of the book (the “future”), after spending the preceding two parts on the near futility of prediction (but for different reasons than Taleb), Orrell offers his “best guesses” for the future in areas such as climate change.

While I embrace the basic premises of these books, some new developments are cause for optimism.  Economists using an agent-based model of credit markets were able to simulate the fall off the cliff that we’ve experienced in the real world, as just one example.  While not truly “predictive,” these models can help us understand the conditions that are likely to produce extreme outcomes.

THE BLACK SWAN has its rewards, but The Future of Everything has far more value for the forecasting professional.  As a chaser, you might try Why Most Things Fail:  Evolution, Extinction and Economics by Paul Ormerod.

Copyright 2010 by David G. Bakken.  All rights reserved.

The current issue of The Economist (January 30-February 5, 2010) features a 15-page special report on social networking.  Typically thorough, the report covers history, the differences between major players (Facebook, Twitter, and MySpace), benefits for small businesses, potential sources of profit for social networking sites, and some of the “peripheral” issues–such as the impact on office productivity and privacy concerns.  For any marketers who’ve been caught by surprise by the emergence of social media and social networking as marketing forces or been watching out of the corner of their eye, this special report might be especially informative. (more…)

Have you heard about the Facebook Gross National Happiness Index?  On Monday, October 12, the Times ran an article (by Noam Cohen) reporting some of the findings based on analysis of two years’ worth of Facebook status updates from 100 million users in the U.S.  The index was created by Adam D. I. Kramer, a doctoral candidate in social psychology at the University of Oregon, and is based on counts of positive and negative words in status updates.  According to the article, classification of words as positive or negative is based on the Linguistic Inquiry and Word Count dictionary.
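The counting approach is simple enough to sketch in a few lines of Python.  What follows is only an illustration under stated assumptions: the word lists are hypothetical stand-ins (not the Linguistic Inquiry and Word Count dictionary), the status updates are invented, and the scoring formula (positive minus negative words as a share of all words) is my guess at a reasonable index rather than the one Mr. Kramer actually used.

```python
from collections import defaultdict

# Hypothetical word lists for illustration only; NOT the LIWC dictionary.
POSITIVE = {"happy", "great", "love", "awesome"}
NEGATIVE = {"sad", "awful", "hate", "tragic"}


def daily_index(updates):
    """updates: iterable of (day, text) pairs.

    Returns {day: (positive words - negative words) / total words}.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    total = defaultdict(int)
    for day, text in updates:
        for raw in text.lower().split():
            word = raw.strip(".,!?\"'")  # drop surrounding punctuation
            if not word:
                continue
            total[day] += 1
            if word in POSITIVE:
                pos[day] += 1
            elif word in NEGATIVE:
                neg[day] += 1
    return {day: (pos[day] - neg[day]) / total[day] for day in total}


# Invented status updates.
TOY_UPDATES = [
    ("Mon", "I hate Mondays, so sad"),
    ("Fri", "Love this awesome Friday!"),
]

index = daily_index(TOY_UPDATES)
print(index)
```

A real index would need a validated dictionary and some normalization across users and days, but the core of the method is nothing more than this kind of frequency count.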

Among the researchers’ conclusions:  we’re happier on Fridays than on Mondays; holidays also make Americans happy.  The premature death of a celebrity may make us sad.  According to a post by Mr. Kramer on the Facebook blog, the two “saddest” days–days with the highest numbers of negative words–were the days on which actor Heath Ledger and pop icon Michael Jackson died.  Mr. Kramer points out that, coincidentally, Mr. Ledger died on the day of the Asian stock market crash, which might have contributed to the degree of negativity.

We’re going to see a lot more of this kind of thing as researchers delve into the rich trove of information generated by users of search engines and web-enabled social networking.  The happiness index, based as it is on simple frequency analysis of words, is the tip of the iceberg.  At the moment, “social media”–I’m not exactly sure what that label means–is getting incredible attention in the marketing and marketing research community.  The question that has yet to be posed, let alone answered, is, “what exactly do we learn from all this information?”

