Channel: Minitab | Minitab

Everyone's Talking "Big Data"...But Size Isn't What Matters


You know what the big thing is in the data analysis world—"Big Data." Big, big, big, very big data. Massive data. ENORMOUS data. Data that is just brain-bendingly big. Data so big that we need globally interconnected supercomputers that haven't even been built yet just to contain one one-billionth of it. That's the kind of big data everybody's so excited about. 

Whatever. There's no denying that the proliferation of data about seemingly every nuance of the world opens many, many opportunities to answer important questions. But in practical terms, "big data" seems to work much the same as...well...regular data, with the same risks, benefits, and potential for misuse and abuse.

"Big Data" = Big Confusion

Part of my bad attitude toward the term "big data" stems from my writing background. I like idioms, coinages and phrases that really communicate something meaningful about their subject. "Punk rock," for example, is a linguistically elegant way to describe the stylistic and attitudinal shift performers like the Ramones and the Germs brought into rock music in the mid-1970s. It tells you who, what, and even a little bit of how a cultural transformation occurred. 

The term "Big Data" isn't so elegant or easily defined. The term's origin is obvious: thanks to technology, we can collect, generate, store, slice, dice, and splice together data sets that really were unimaginably big just a few years ago. And a few years hence, our capacity to generate and store more data about more and more and more stuff will be even greater.

But from that sensible origin, the buzzword "Big Data" has assumed such a wide variety of meanings that it's a useless term for anyone who values specificity. When someone says "Big Data" in your staff meeting, is everyone in the room thinking about the same thing? Probably not.

Lisa Arthur summed the issue up nicely in a column on Forbes.com:

Big data is new and “ginormous” and scary–very, very scary. No, wait. Big data is just another name for the same old data marketers have always used, and it’s not all that big, and it’s something we should be embracing, not fearing. No, hold on. That’s not it, either. What I meant to say is that big data is as powerful as a tsunami, but it’s a deluge that can be controlled . . . in a positive way, to provide business insights and value. Yes, that’s right, isn’t it?

Oh, who knows? And honestly, who really cares?  Trying to find a firm definition of "Big Data" is focusing on the wrong detail about the opportunities all of this available data offers. 

No Matter How Big, Data Answers Only the Questions You Ask

You can use statistical software like Minitab to make sense of your data. We've described Minitab as a tool to help you "hear what your data are saying." That's useful shorthand to convey the benefits of data analysis, but it's probably more accurate to say that statistical software facilitates a conversation with your data. Or perhaps an interrogation. You can think of statistical software as a translator, putting the language of raw numbers into terms that are easier and faster for most of us to understand. 

The problem is, your data is only going to answer the questions you ask. It doesn't just volunteer valuable information. And if you're staring at your data and wondering where to begin, does it really matter whether you've got 30 data points or 300 million?  

Of course, most people have some idea of what they'd like to find out, even if they have a little trouble articulating it precisely. But even a straightforward question like "Did switching suppliers improve product quality?" can be trickier than it seems. Because as any translator will tell you, the answer to a question might be different depending on how you ask it. A query that's sensitive to the right nuances and considerations may yield a very detailed and useful response, while an unsubtle and overly broad question might bring an answer that's technically accurate, but practically unusable.

Bernard Marr put it very well in a recent column:

...data on its own is meaningless. Remember the value of data is not the data itself – it’s what you do with the data.

Big Enough to Render Everything Significant?

My last quibble with the current hoopla over "big data" is that it lends credence, perhaps unintentionally, to the notion that more data is always better. It isn't.

Marr touches on this point as well. He notes that the advent of "big data" lets decision-makers back up their choices with "facts." The problem, he aptly points out, is that "all those 'facts' can conceal the truth." In other words, get a data set that's big enough and you can get the right answer to just about any misguided or ill-considered question you want to ask. "A lot of data can generate lots of answers to things that don't really matter," as Marr says. 

I'd take it a step further than that, though: when the data gets big enough, nearly everything can become statistically significant. If you're paying attention to the difference between statistical and practical significance, that's not a big problem. But if you're not paying attention, or if, perhaps, you're very eager to see evidence of a particular outcome, conflating statistical and practical significance is all too easy. 
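To make that concrete, here's a minimal sketch in Python. The numbers are my own assumptions, not results from any real data set: two groups whose means differ by just 1% of a standard deviation, compared with an ordinary two-sample z-test at growing sample sizes. The helper name `two_sided_p_value` and the constant `TINY_EFFECT` are hypothetical, made up for this illustration.

```python
import math

def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test (equal group sizes, unit
    variance) when the observed standardized difference is effect_size.
    Hypothetical helper for illustration only."""
    # Standard error of the difference in means is sqrt(2 / n),
    # so z = effect_size / sqrt(2 / n) = effect_size * sqrt(n / 2).
    z = effect_size * math.sqrt(n_per_group / 2)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Assumption: the groups differ by 1% of a standard deviation,
# a gap that would rarely matter in practice.
TINY_EFFECT = 0.01

for n in (30, 3_000, 300_000, 300_000_000):
    p = two_sided_p_value(TINY_EFFECT, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>11,}  p = {p:.3g}  ({verdict} at 0.05)")
```

The effect never changes, only the sample size does. At 30 points per group the difference is nowhere near significant; by 300,000 it clears the 0.05 bar easily, and at 300 million the p-value is vanishingly small. Whether a 1% shift is worth acting on is a separate question that the p-value can't answer.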

My favorite demonstration of this was discussed in an earlier post on the Minitab Blog, in which occasional blogger the Stats Cat related the tale of an analysis of MRI data that found statistically significant evidence of brain activity in a dead, frozen salmon. Of course, the statistical significance was meaningless on a practical level: the data set was so big, with so many individual points being tested, that finding statistical significance somewhere was essentially inevitable. Despite the statistical "evidence" of brain activity, that salmon remained indisputably dead.
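One way "inevitable" significance can arise is from running a great many tests at once. What follows is a small simulation of that idea, not a reconstruction of the salmon analysis. The test count, sample size, and seed are all assumptions I picked for illustration, and every "measurement" is pure random noise with no real effect behind it.

```python
import math
import random

N_TESTS = 10_000        # assumed number of separate tests (think "voxels")
SAMPLES_PER_TEST = 20   # assumed number of measurements per test
ALPHA = 0.05

def p_value_for_noise(rng: random.Random, n: int) -> float:
    """Two-sided z-test p-value for the mean of n draws of N(0, 1) noise.
    The true mean is zero, so any 'significant' result is a false alarm."""
    mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    z = mean * math.sqrt(n)  # standard error of the mean is 1 / sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(42)  # fixed seed so the run is repeatable
p_values = [p_value_for_noise(rng, SAMPLES_PER_TEST) for _ in range(N_TESTS)]

# Count "discoveries" with and without a Bonferroni correction, which
# divides the significance threshold by the number of tests.
uncorrected_hits = sum(p < ALPHA for p in p_values)
bonferroni_hits = sum(p < ALPHA / N_TESTS for p in p_values)

print(f"Tests run on pure noise:         {N_TESTS:,}")
print(f"'Significant' at p < {ALPHA}:       {uncorrected_hits:,}")
print(f"'Significant' after Bonferroni:  {bonferroni_hits:,}")
```

With nothing real to find, about 5% of the tests (somewhere near 500 of the 10,000 here) should still come up "significant" by chance alone. The Bonferroni correction is one standard, deliberately conservative remedy, and it should leave few or none standing.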

It's a great cautionary tale that illustrates why more data doesn't always give better answers, and reminds us that unless we're careful about the questions we ask and the conclusions we make, our big-data dreams also have the potential to lead us into nightmarish mistakes. 

