Figures lie, so they say, and liars figure. A recent post at Ben Orlin's always-amusing mathwithbaddrawings.com blog nicely encapsulates why so many people feel wary about anything related to statistics and data analysis. Do take a moment to check it out; it's a fast read.
In all of the scenarios Orlin offers in his post, the statistical statements are completely accurate, but the person offering the statistics is committing a lie of omission by not putting the statement in context. Holding back critical information prevents an audience from making an accurate assessment of the situation.
Ethical data analysts know better.
Unfortunately, unethical data analysts know how to spin outcomes to put them in the most flattering, if not the most direct, light. Done deliberately, that's the sort of behavior that leads many people to mistrust statistics completely.
Lessons for People Who Consume Statistics
So, where does this leave us as consumers of statistics? Should we mistrust statistics? The first question to ask is whether we trust the people who deliver statistical pronouncements. I believe most people try to do the right thing.
However, we all know that it's easy—all too easy—for humans to make mistakes. And since statistics can be confusing, and not everyone who wants or needs to analyze data is a trained statistician, great potential exists for erroneous conclusions and interpretive blunders.
Bottom line: whether their intentions are good or bad, people often cite statistics in ways that may be statistically correct, but practically misleading. So how can you avoid getting fooled?
The solution is simple, and it's one most statisticians internalized long ago, but it doesn't necessarily occur to people who haven't spent much time in the data trenches:
Always look at the underlying distribution of the data.
Especially if the statistic in question pertains to something extremely important to you—like mean salary at your company, for example—ask about the distribution of the data if those details aren't volunteered. If you're told the mean or median as a number, are you also given a histogram, boxplot, or individual value plot that lets you see how the data are arranged? My colleague Michelle Paret wrote an excellent post about this.
If someone is trying to keep the distribution of the data a mystery, then the real meaning of summary statistics like the mean, median, or mode is also unknown, and your mistrust is warranted.
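To see why the distribution matters, here is a minimal Python sketch. The salary figures are invented purely for illustration (they come from no real company), and the helper names `summarize` and `text_histogram` are hypothetical. It compares the mean with the median and prints a crude text histogram so the shape of the data is visible.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical salaries in thousands of dollars, invented for illustration.
# Nine ordinary salaries plus one very large one.
salaries = [32, 35, 35, 38, 40, 42, 45, 48, 52, 400]


def summarize(values):
    """Return a few summary statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """Return the lines of a crude text histogram with fixed-width bins."""
    # Map each value to the lower edge of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        bar = "#" * counts.get(start, 0)
        lines.append(f"{start:>4}-{start + bin_width - 1:<4} {bar}")
    return lines


summary = summarize(salaries)
print(summary)
for line in text_histogram(salaries, bin_width=50):
    print(line)
```

With these made-up numbers, the mean is 76.7 while the median is 41. Both figures are "correct," but the histogram shows that eight of the ten salaries fall below 50 and a single value of 400 pulls the mean upward. Quoting the mean alone would paint a very different picture from the one the full distribution gives.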
Lessons for People Who Produce Statistics
As purveyors and producers of statistics who need to communicate results to people who aren't statistically savvy, what lessons can we take from this? After reading the Math with Bad Drawings blog, I thought about it and came up with two rules of thumb.
1. Don't use statistics to obscure or deflect attention from a situation.
Most people do not deliberately set out to distort the truth or mislead others. Most people would never use the mean to support one conclusion when they know the median supports a far different story. Our conscience rebels when we set out to deceive others. I'm usually willing to ascribe even the most horrendous analysis to gross incompetence rather than outright malice. On the other hand, I've read far too many papers and reports that torture language to mischaracterize statistical findings.
Sometimes we don't get the outcomes we expected. Statisticians aren't responsible for what the data show, but we are responsible for performing appropriate analyses, checking the assumptions behind them, and making sure our data are trustworthy. It should go without saying that we are ethically compelled to report our results honestly, and...
2. Provide all of the information the audience needs to make informed decisions.
When we present the results of an analysis, we need to be thorough. We need to offer all of the information and context that will enable our audience to reach confident conclusions. We need to use straightforward language that helps people tune in, and avoid jargon that makes listeners turn off.
That doesn't mean that every presentation we make needs to be laden with formulas and extended explanations of probability theory; often the bottom line is all a situation requires. When you're addressing experts, you don't need to cover the introductory material. But if we suspect an audience needs some background to fully appreciate the results of an analysis, we should provide it.
There are many approaches to communicating statistical results clearly. One of the easiest ways to present the full context of an analysis in plain language is to use the Assistant in Minitab. As many expert statisticians have told us, the Assistant doesn't just guide you through an analysis; it also explains the output thoroughly and without resorting to jargon.
And when statistics are clear, they're easier to trust.
Bad drawing by Ben Orlin, via mathwithbaddrawings.com