Written by: Danny Stout
Now that I’m officially 39 years old…again…I’ve been around long enough to hear more than a few phrases become popular and then disappear after a few months or years. Most of them came from popular culture, but there are professional terms, too, that are no longer groovy. One popular term that I’ve been hearing a lot lately is Big Data, and it seems to mean different things to different people. What exactly is Big Data? What is Big Data doing for us? Finally, is Big Data here to stay, or will we be talking about a new popular term in a few more years?
Wikipedia tells us that Big Data is a “collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”. I remember when a gigabyte seemed intimidating. Now my hard drive holds more gigabytes than Carter has liver pills. So if our hard drives hold gigabytes and even terabytes of data, what exactly defines Big Data? While gigabytes and terabytes can get large, they can still be managed on traditional hard drives and network servers. It is when you get in excess of hundreds or even thousands of terabytes that you really start seeing Big Data. With data this large, you can no longer rely on traditional storage. More than likely you’ll have to rely on distributed file systems or other storage technologies. Data of this size, stored in this manner, is Big Data.
What can Big Data do for you? For an example, look at President Obama’s victory in the recent U.S. election. The Wall Street Journal has an excellent article
written about how Big Data, and the analysis of that data, contributed to his reelection. His campaign merged databases from many sources including those from pollsters, fundraisers, fieldworkers, consumer databases, social media, mobile contacts and voter files, and used that data to improve their campaign efforts. They used algorithms to score this data, deriving persuadability scores for potential undecided voters. “The persuasion scores allowed the campaign to focus its outreach efforts—and their volunteer calls—on voters who might actually change their minds as the result. It also guided them in what policy messages individual voters should hear.” That’s pretty powerful stuff. If you’ve shopped at Walmart, you’ve been on the receiving end of Big Data analysis. People who shop at Walmart generate more than 1 million customer transactions every hour. The data from these transactions are imported into databases estimated to contain more than 2.5 petabytes. That information can be used to determine where to place products or what items to put next to one another so that you will be more likely to buy both products. If you have access to Big Data, you can definitely make it work for you and hit any number of targets.
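To make the product placement idea a little more concrete, here is a minimal sketch of counting which items show up together in the same purchase. It is only an illustration: the `baskets` data and the `pair_counts` helper are made up for this example, and nothing here is meant to describe how Walmart actually analyzes its transactions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; real transaction data would be vastly larger.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs that turn up together most often are the natural candidates for sharing a shelf. At a million transactions an hour, the counting itself would have to be spread across many machines, which is where the distributed storage mentioned earlier comes in.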
Now, is Big Data here to stay? In predictive analytics, the focus is no longer on hypothesis testing, but on looking to the data to reveal patterns. The data is the model. We are in what I like to call the Angler’s era, but instead of talking about how big someone’s fish is, everyone is talking about the size of their Big Data. Bigger is not necessarily better in Big Data. With sophisticated sampling methodology, you don’t need to analyze all 2.5 petabytes of data just because you have access to the database. Doing so is a waste of time and resources. See Dr. Thomas Hill’s white paper
on Big Data for an excellent discussion of this topic. I’m not sure if I’ll always capitalize the term, and I probably shouldn’t be doing it now. But I do think that big data, or the impact of big data, will be with us for the foreseeable future. Last year, the Big Data Research and Development Initiative
was introduced. The Obama Administration announced $200 million in new research and development investments to handle the rapidly growing volume of data. They hope to use big data to solve some of the Nation’s most pressing challenges. While we will hopefully move out of the Angler’s era with regard to big data, I believe big data, and the impact of big data, will definitely be felt for a long time to come.
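A postscript on the sampling point above, for the curious. The sketch below uses reservoir sampling (my choice for illustration, not a method taken from the white paper) to draw a fixed-size random sample from a stream in a single pass, then compares the sample mean to the mean of the full data. The function name, the seed, and the stand-in "population" of numbers are all assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return k items chosen uniformly at random from an iterable, in one pass."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


rng = random.Random(42)        # fixed seed so the run is repeatable
population = range(100_000)    # stand-in for a much larger data set
sample = reservoir_sample(population, 1000, rng)

estimate = sum(sample) / len(sample)
true_mean = sum(population) / len(population)
print(f"sample mean: {estimate:.1f}, full mean: {true_mean:.1f}")
```

A thousand values out of a hundred thousand should already land close to the full-data average, and the sampler never needs to hold more than the thousand in memory at once. Real analyses are rarely as simple as an average, but the principle is the same one behind not chewing through every petabyte.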