Book Review: The Numerati
The Numerati are all the statisticians, computer scientists and analysts around the world who are analyzing tons of data to understand "us". This is the main topic of this wonderful book written by Stephen Baker, a Business Week journalist.
The book is an easy read, written in a simple style that makes it accessible to everybody, and yet it is incredibly intriguing and informative for the knowledgeable reader.
Stephen interviewed dozens of researchers and entrepreneurs around the US and brought into focus one of the major trends of our time: not only is an incredible amount of data collected every day around the world, but we are also finally starting to "use" these data to understand relevant aspects of human beings. Health, finance, marketing, and policy are only a few examples of areas where data is collected and deeply analyzed every day.
Content
The book is organized around 7 chapters: Worker, Shopper, Voter, Blogger, Terrorist, Patient, Lover. In each one, people are modeled through the lens of a specific stereotype.
In Worker we are modeled according to our skills and the way we work. We meet people like Samer Takriti at IBM, who is modeling about 300,000 IBM workers to understand the relationship between their skills and their performance, and how to better allocate these skills within the company, the same way companies have always done with any other physical asset.
In Shopper we are modeled according to the things we buy. Researchers are analyzing the millions of transactions we make every day in stores to understand what "type" of buyers we are. Rayid Ghani, for instance, works with his group at Accenture Technology Labs on analyzing grocery store transactions to provide personalized suggestions to shoppers through carts equipped with personal assistants.
In Voter we are modeled according to ... to what? This is an impressive chapter because it demonstrates that we can be modeled in a given domain indirectly, using data that apparently has no connection with the subject matter. This is what Josh Gotbaum does with his political firm Spotlight Analysis. They provide detailed indications on swing voters based on data taken from large data companies like ChoicePoint and Acxiom, which collect an incredible amount of data about almost every aspect of our lives (scary?! :-)).
In Blogger we are modeled according to our opinions. Yes, our opinions. There are companies like Umbria Communications that analyze the blogosphere to understand the opinion trends of millions of bloggers on whatever interests a given company. If I want to track how people react to a new product on the market, Umbria can tell me.
In Terrorist we are modeled as potential terrorists or thieves. Here we meet people like Jeff Jonas, now at IBM, who helped casinos in Las Vegas sift through millions of internal records to single out suspect customers. In-Q-Tel, the venture capital arm of the CIA, invested in this technology, and the same technology is used for national security and counterterrorism.
In Patient we are modeled according to our body signals and medical records. This is the chapter I loved most, not only for its humanitarian applications, but also for the cleverness of some of the solutions. Eric Dishman launched the home health division at Intel, where they design smart sensors like the "magic carpet", which tracks weight and movements to monitor the health of patients, and where they try to predict the onset of diseases like Parkinson's and Alzheimer's by detecting suspect variations in the stream of data.
Finally, in Lover we are modeled according to our profiles to find matches among us as potential lovers. We meet Helen Fisher, a Rutgers University anthropologist, who devised an innovative method to find matches between people, which is the basis of the Chemistry.com dating website. Her method goes well beyond simple matching of demographic data. It is based on her theory that we can be split into four groups, each with a specific predominant hormone, and that the best matches come from complementary hormones.
Reflections
The first issue the book raises is obviously privacy. I really liked Stephen Baker's approach, equally distant from excitement about the new opportunities brought by innovation and from fear of a super-controlled society where drawing a full profile of each of us is becoming worryingly easy. Every other technological shift in history, however, came with the promise of human advancement together with novel problems (think about cars and pollution). Stephen asks the right questions of some of the researchers he met. The most interesting exchange in terms of privacy is the one with Jeff Jonas, who is "vehemently opposed to the use of statistical data mining to predict the next terrorist attack" because of the high risk of intrusion and false alarms. And yet he believes that this technology can protect both our freedom and our privacy at the same time. I think this is one of the biggest challenges of our time: to find the right balance between the opportunities for increased freedom and security and the risks of intrusion, control, and faulty conclusions in the analysis of our own data.
From a more scientific and technological point of view, what strikes me is the relevance prediction has in all the application areas described in the book. In traditional data analysis, especially for those with a visualization background, the focus is on "understanding" what is in the data to build a mental model out of it and on "discovering" some special gems out of chaos. Real world applications, however, are more concerned with elaborating actionable solutions to run and test, and I have the impression that "prediction" lends itself better to this goal. Think about it through the book's examples: in Worker the company wants to predict performance to put people in the right place, in Shopper the grocery store wants to predict what product can be sold to one specific customer to provide timely suggestions, in Voter a political party wants to predict which population segment should be addressed with a targeted message to increase the chances of reaching a group of swing voters, and so on. How do we, visual information designers and analysts, cope with this fact? Are we able to provide the same level of actionable knowledge with our tools, or are we condemned to just describe things and hope that this information will be useful in some way?
Implications for Visualization
What is the role of visualization in the world of the Numerati? I think it is huge!
First of all, the technologies used by the Numerati are all to some extent prone to errors, and they are always the result of continued refinement of the underlying model. Visualization can play a significant role in helping modelers understand and test their models and explore their implications as they are applied to new data. Without such a level of interaction, the risk is to build monstrous black boxes that spit out oracles we all have to follow without really knowing why.
Another area where I see a large role for visualization is when mining is used in monitoring environments, where the timely detection and comprehension of the situation (more technically known as the situational awareness problem) is important. We have a long and respected tradition of research on what works best in terms of visual representation when visual saliency, detection and contextual information are at stake. Well-designed visualizations that let viewers get the most out of a screen in a matter of seconds are of paramount importance here, whether for an analyst studying terrorist attacks or a doctor monitoring a patient.
A third potential I see for visualization is in personal data visualization. As these technologies develop, and the results of data analysis become more pervasive, I expect to see an increase in the need for end users to manage personal data and the results of these analyses. And how are we going to provide this information to the average person? Visualization can play a big role here, and again it will need to reinvent itself a bit. In this domain, extremely simple and useful visualizations will be needed, and some of them will be provided on non-standard devices like TVs, cell phones, and public displays. We need flexible and simple solutions to offer to the general public.
So, in summary, the explosion of data analysis is good news for us! We have plenty of novel challenges to address. A somewhat silent mind shift is already underway ... I expect to see in the future an ever tighter integration of automatic mining technologies and visualization, as the recent Visual Analytics trend demonstrates.