There is an interesting new post at “In the Pipeline” that summarizes the performance of Google’s “big data” project to track flu trends from search terms. In short, the predictive performance appears to be pretty bad so far, at least compared to what you might have expected given the hype around “big data.” The author raises some key points, including the importance of high-quality data, even in very large datasets. I particularly like this analogy:
“The quality of the data matters very, very, much, and quantity is no substitute. You can make a very large and complex structure out of toothpicks and scraps of wood, because those units are well-defined and solid. You cannot do the same with a pile of cotton balls and dryer lint, not even if you have an entire warehouse full of the stuff.” –In the Pipeline, March 24, 2014
Data filtering and modeling approaches will likely continue to improve, however, and I think this project is worth watching in the future.
People often assume that the world tomorrow will be pretty much like the world today. We all have an in-built bias towards linear thinking when we ponder the future. Although a linear bias was helpful for thousands of years of our evolution, today technology is changing at an exponential pace and in order to better anticipate future market opportunities and technology’s impact on society, it is crucial to think in terms of exponential trends. This is a point that renowned futurist Ray Kurzweil has made in his many books and speeches for the last several decades.
One example of an exponential trend in biology (among many) is the cost per genome sequence (graph below). As recently as 2001, the cost to sequence a genome was an astronomical $100M. Between 2001 and 2007, the cost decreased exponentially (a straight line on a log plot), to the point where a genome in 2007 cost only $10M to sequence. Around 2007, a paradigm shift in technology massively accelerated this exponential process, and the cost decreased even faster than before, hitting just $10K in 2012.
As economists are fond of saying, when the price falls, more is demanded. As a result of this massively reduced sequencing price, many more partial and complete genomes are being sequenced than ever before. The dramatic, exponential gains in price/performance of sequencing technology have unleashed a tidal wave of sequence data.