A reader asked a great question in response to a post I wrote about Pareto charts. Our readers typically do ask great questions, but this one turned out to be more difficult to answer than it first seemed.
My correspondent wrote:
My understanding is that when you have count data, a bar chart is the way to go. The gaps between the bars emphasize that the data are not measured on a continuous scale. The Pareto chart puts the bars in decreasing size going from left to right. However, the bars now touch, even though the data scale has not changed. I'm just looking for some history or explanation as to why the bars in the Pareto chart touch, which seems to violate basic rules of effective graphing.
In case you're not familiar with all of this, here's a quick, mostly visual recap. A bar chart displays counts of categorical variables. Separating the bars emphasizes the data's categorical nature:
Assume that for some measurable aspect of our business, we classify measurements from 1-10 as Critical, from 11-20 as Very Important, from 21-30 as Important, and so on. A histogram of integer data corresponding to the counts in the bar chart above looks like this:
The bars of the histogram touch because they represent continuous data. It makes sense that the bars abut each other, since there's no categorical "gap" between, say, 1 and 2.
Which brings us to the Pareto chart, whose bins show counts or frequencies of defects—categorical data. And yet, when you produce a Pareto chart in Minitab and most other packages, the bars touch...
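The ordering step behind a Pareto chart is simple enough to sketch. The Python below assumes the category counts have already been tallied into a dictionary. The defect names and counts are hypothetical, and the helper name `pareto_order` is made up for illustration. It computes the values a Pareto chart plots: bars sorted from largest to smallest, plus the cumulative percentage that Pareto charts commonly overlay as a line. It does not draw anything.

```python
from itertools import accumulate

def pareto_order(counts):
    """Sort (category, count) pairs by count, largest first, and attach the
    cumulative percentage of the total at each bar."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ordered)
    running = accumulate(count for _, count in ordered)  # running totals
    return [
        (category, count, 100.0 * cum / total)
        for (category, count), cum in zip(ordered, running)
    ]

# Hypothetical defect counts, for illustration only.
defects = {"Scratch": 12, "Dent": 30, "Misalignment": 8, "Discoloration": 50}

for category, count, cum_pct in pareto_order(defects):
    print(f"{category:<14}{count:>4}{cum_pct:>8.1f}%")
```

With these made-up counts, the bars come out as Discoloration, Dent, Scratch, and Misalignment, with cumulative percentages of 50.0, 80.0, 92.0, and 100.0.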
The question had me scratching my head. I checked Minitab's built-in statistical glossary and help files and our web site, and then expanded my search to some reliable statistics resources on the Web. No answer. So I went next door to my colleague's office and asked her why the bars on the Pareto chart touch, even though they represent counts or frequencies of categories.
She didn't know, either, but she had a good idea who would: Dr. Terry Ziemer, who, as a senior statistician at Minitab during the 1990s, directed development work in the area of industrial statistics. He later became a principal at the Six Sigma Academy and then founded Six Sigma Intelligence. I e-mailed Terry to ask why the bars on Minitab's Pareto chart touch. He quickly replied:
I wish there was some big technical answer I could give you, but it was simply a design choice. At the time when I did program this, most of the example Pareto charts I looked at had the bars touching, and I agreed that (at least in my opinion) the chart looks better that way than it does when there are gaps between the bars. Since a Pareto chart is sort of a distribution graph for defect types, it did seem to make sense to make it more like a histogram, another distribution graph where the bars touch, than to make it look like a standard bar chart where you have the gaps.
Bar Charts and Histograms and Paretos, Oh My!
So that's why the bars on a Pareto chart in Minitab touch: it was an aesthetic choice, and one that makes perfect sense if we see the Pareto chart as similar to a histogram, in that it shows you the distribution of defect types.
If you're saying "But that's not a definitive answer," you're right. Unfortunately, there doesn't seem to be a definitive answer. Looking through the literature revealed advocates both for and against having the Pareto bars touch, but not much in the way of detailed rationales.
For example, The Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvement by Mikel Harry et al. states on page 171, "The bars in a Pareto chart are arranged side-by-side (touching) in descending order from the left." Why the bars should touch, however, is left unexplained. Joiner Associates' Pareto Charts: Plain & Simple suggests "Having the bars touch makes it easier to judge the relative size or impact of the different parts of the problem." A design choice, again.
On the other hand, page 112 of Statistical Reasoning for Everyday Life by Jeffrey Bennett et al. states, "To make the Pareto chart, we put the bars in descending order of size...Because the categories are nominal, the bars should not touch." Clearly, there's still some debate among academics. If you prefer your Pareto charts with space between the bars, you'll find some support, but the touching bars, as implemented in Minitab, do appear to be the more popular option.
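Whichever side of the debate you come down on, in plotting code the difference usually comes down to one number: the width of each bar relative to the spacing between category positions. In matplotlib, for example, `bar()` takes a `width` argument that defaults to 0.8, and setting it to 1.0 makes neighboring bars touch. The plain-Python sketch below uses hypothetical helper names and makes no claim about how Minitab itself draws the chart. It computes bar edges for categories placed one unit apart and shows the gaps that result from each width.

```python
def bar_edges(n_bars, width):
    """Return (left, right) edges for bars centered at x = 0, 1, ..., n_bars - 1.

    Centers are one unit apart, so each gap between neighbors is 1 - width:
    zero when width is 1.0 (touching, histogram style) and positive when
    width is less than 1.0 (separated, bar-chart style).
    """
    half = width / 2
    return [(i - half, i + half) for i in range(n_bars)]

def gaps(edges):
    """Distance from each bar's right edge to the next bar's left edge."""
    return [edges[i + 1][0] - edges[i][1] for i in range(len(edges) - 1)]

touching = gaps(bar_edges(4, width=1.0))   # touching bars, like the Pareto charts above
separated = gaps(bar_edges(4, width=0.8))  # classic bar chart with spaces

print("touching: ", touching)
print("separated:", [round(g, 3) for g in separated])
```

With `width=1.0` every gap is exactly zero. With `width=0.8` each gap is about 0.2 units, which is the separation a conventional bar chart shows.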