
What Does It Mean When Your Probability Plot Has Clusters?


Have you ever had a probability plot that looks like this?

Probability Plot of Patient Weight Before and After Surgery

The probability plot above is based on patient weight (in pounds) after surgery minus patient weight (again, in pounds) before surgery.

The red line appears to go through the data, indicating a good fit to the Normal, but there are clusters of plotting points at the same measured value. This occurs on a probability plot when there are many ties in the data. If the true measurement can take on any value (in other words, if the variable is continuous), then the cause of the clusters on the probability plot is poor measurement resolution.

The Anderson-Darling Normality test typically rejects normality when there is poor measurement resolution. In a previous blog post (Normality Tests and Rounding) I recommended using the Ryan-Joiner test in this scenario. The Ryan-Joiner test generally does not reject normality due to poor measurement resolution. 

In this example, the Ryan-Joiner p-value is above 0.10. A probability plot that supports using a Normal distribution would be helpful to confirm the Ryan-Joiner test results. How can we see a probability plot of the true weight differences? Simulation can be used to show how the true weight differences might look on a probability plot.

The weight differences were rounded to the nearest pound. In effect, we want to add a random value between -0.5 and +0.5 to each value to get a simulated measurement of the true difference. The steps, followed by a rough Python sketch of the same idea, are as follows:

  1. Store simulated noise values from -0.5 to +0.5 in a column using Calc > Random Data > Uniform.
  2. Use Calc > Calculator to add the noise column to the original column of data.
  3. Create a normal probability plot using Stat > Basic Statistics > Normality Test.
  4. Repeat steps 1-3 several times if you want to see how the results are affected by the simulated values.
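
If you want to experiment with this idea outside Minitab, here is a rough Python sketch of the same jittering approach. The weight differences and sample size below are made up for illustration; they are not the patient data from the plot above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Hypothetical "true" weight differences (pounds), then recorded to the nearest pound
    true_diff = rng.normal(loc=-3, scale=6, size=60)
    rounded = np.round(true_diff)

    # Steps 1-2: add uniform noise between -0.5 and +0.5 to undo the rounding
    jittered = rounded + rng.uniform(-0.5, 0.5, size=rounded.size)

    # Compare the Anderson-Darling statistic before and after jittering;
    # the rounded data often show a larger (worse) statistic because of the ties
    print(stats.anderson(rounded, dist='norm').statistic)
    print(stats.anderson(jittered, dist='norm').statistic)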

The resulting graph from one iteration of these steps is shown below. It suggests that the Normal distribution is a good model for the difference in weights for this surgery.

Probability plot with simulated measurements

Minitab will deliver a presentation on Detecting and Analyzing Non-Normal Data at the IHI conference in Orlando, FL on Monday, December 8, 2014. We are also developing a 1-day training course called Detecting and Analyzing Non-Normal Data, to be released in 2015.


How Cpk and Ppk Are Calculated, part 1


In technical support, we frequently receive calls from Minitab users who have questions about the differences between Cpk and Ppk. 

Michelle Paret already wrote a great post about the differences between Cpk and Ppk, but it also helps to have a better understanding of the math behind these numbers. So in this post I will show you how to calculate Ppk using Minitab’s default settings when the subgroup size is greater than 1. Then, in my next post, I’ll show you how to calculate Cpk.

Default Capability Methods

For data that follow the normal distribution, we use a Normal Capability Analysis (Stat> Quality Tools> Capability Analysis> Normal).  If we click the Estimate button in the dialog box that comes up, we can see the default methods used in Minitab:

Capability Analysis dialog

From the dialog box above, we can see that Minitab 17 uses the Pooled standard deviation when subgroup sizes are greater than 1.  To see details of the formulas used, we can click the Help button in the lower-left corner.  Then, from the Help window shown below, click the see also link, and then choose Methods and formulas:

See Also

The Methods and Formulas section shows the formulas Minitab uses. If we click Estimating standard deviation, we can find the formulas for the pooled standard deviation.  Under the Potential capability heading we find the formulas for Cpk, and under Overall capability we find the formulas for Ppk.

Normal capability formulas

To illustrate the calculation of Ppk and Cpk, we’ll use a sample data set available in Minitab. Go to File> Open Worksheet, click the Look in Minitab Sample Data folder button at the bottom, and open the dataset named CABLE.MTW.

This sample data set is from a manufacturer of cable wire; the data was collected in subgroups of 5, then the diameters of the cables were recorded and entered in the Minitab worksheet. The lower spec limit for the cable diameter is 0.5 and the upper spec limit is 0.6.

Calculating the Mean and Overall Standard Deviation

If we use the information above to complete the Capability dialog box as shown below (accepting the default settings for the within-subgroup estimation method), Minitab gives us estimates of Cpk and Ppk:

But how exactly does Minitab arrive at these numbers?  That's what we're going to find out. 

Calculating Ppk is easier than Cpk, since Ppk is based on the overall standard deviation of the data instead of the within-subgroup standard deviation. We'll see the formula for that in my next post, but to obtain the overall standard deviation we can use Stat> Basic Statistics> Store Descriptive Statistics and store the standard deviation and the mean (which we’ll also need) in the worksheet:

(Be sure to click the Statistics button in the dialog box above to make sure only Mean and Standard Deviation are selected before clicking OK.)

Calculating Ppk

To get Ppk (which is the lesser of PPU and PPL), we need to calculate PPU and PPL using the standard deviation and mean shown above, following the formulas shown in Methods and Formulas:

PPU formula
PPL formula

In Minitab, we use Calc> Calculator to enter the formulas and store them in the worksheet:

calculator

The formulas above give us a PPU of 0.922719 and PPL of 0.800701. Since Ppk is the lesser of these two values, the Ppk is 0.80.
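
For readers who want to check the arithmetic outside Minitab, here is a minimal Python sketch of the same Ppk calculation. The diameters array is a placeholder for the CABLE.MTW column; only the spec limits of 0.5 and 0.6 are taken from the example above.

    import numpy as np

    LSL, USL = 0.5, 0.6
    diameters = np.loadtxt("cable_diameters.txt")  # placeholder for the CABLE.MTW column

    mean = diameters.mean()
    s_overall = diameters.std(ddof=1)   # overall (sample) standard deviation

    ppu = (USL - mean) / (3 * s_overall)
    ppl = (mean - LSL) / (3 * s_overall)
    ppk = min(ppu, ppl)                 # comes out near 0.80 for the cable data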

You'll note that this matches the results from the Minitab capability analysis output shown earlier:

Ppk

Of course, it's easier to use Minitab's capability output than it is to do these calculations manually, but my goal is to lift the lid off the "black box" and give you an appreciation for what Minitab does behind the scenes to provide these figures. In my next post, we'll see how Cpk is calculated. 

 

How Cpk and Ppk Are Calculated, part 2


Minitab's capability analysis output gives you estimates of the capability indices Ppk and Cpk, and we receive many questions about the difference between them. Some of my colleagues have taken other approaches to explain the difference between Ppk and Cpk, so I wanted to show you how they differ by detailing precisely how each one is calculated. 

When you're using statistical software like Minitab, you don't need to do these calculations by hand, but I also want to lift the lid off the "black box" to show you what Minitab does behind the scenes to provide these figures.

In my previous post, we saw how Ppk is calculated. This time, we'll go through the calculation of Cpk, using the same sample data set in Minitab. Go to File > Open Worksheet, click the "Look in Minitab Sample Data folder" button at the bottom, and open the dataset named CABLE.MTW.

Calculating Within-Subgroup Standard Deviation

Where Ppk uses the overall standard deviation, Cpk uses the within-subgroup standard deviation. Calculating Cpk is easy once we have an estimate of the within-subgroup standard deviation. The default method in Minitab for the within-subgroup calculation is the pooled standard deviation. The formula for this calculation from Methods and formulas is:

formula for pooled standard deviation

This looks a little intimidating, but you’ll see it’s not so bad if we take it one step at a time.

First, we’ll calculate Sp. For this example, the subgroup size is fixed at 5.  We’ll begin with a clean worksheet containing only the Diameter data in C1.

We need to estimate the mean of the data in each subgroup and store those values in the worksheet.  To do that, we’ll create a column that defines our subgroups using Calc > Make Patterned Data > Simple Set of Numbers, and then completing the dialog box as shown below:

subgroups

With 100 data points and 5 points in each subgroup, we have 20 subgroups.

Now we can use our new column containing the subgroups to calculate the mean of each subgroup, using Stat > Basic Statistics > Store Descriptive Statistics. We complete the dialog box like in the example below, entering the Diameter column under Variables and the Subgroup column as the By variable:

descriptive statistics

We then click Options and choose Store a row of output for each row of input, uncheck Store distinct values of By variables, and then click OK in each dialog box.  Now column C3 will show the average of each subgroup; the first 5 rows from C1 were used to calculate the mean of those first 5 rows, and that same mean value is displayed in the first 5 rows of C3.

We will now use these values to calculate the numerator for Sp using Calc > Calculator:

numerator for Sp

We are summing the squared differences between each measurement and its subgroup mean.  The Numerator column in the Minitab worksheet will show 0.02735 using the formula above.

Next, we calculate the denominator for Sp, which is the subgroup size minus 1, summed over all subgroups.  Since we have a constant subgroup size of 5, and a total of 20 subgroups, an easy way to enter this in the calculator is:

denominator for Sp

Now with the numerator and denominator for Sp stored in the worksheet, we take the square root of Numerator/Denominator:

square root of numerator/denominator

Notice that the Sp value 0.0184899 is the estimate of the within-subgroup standard deviation if we tell Minitab NOT to use the unbiasing constant, C4, by clicking the Estimate button in the Normal Capability Analysis dialog box and then unchecking Use unbiasing constants.

Now to finish calculating the within-subgroup standard deviation using C4 (the default), we can look up C4 in the table that is linked in Methods and Formulas under the Methods heading.

The C4 value we need is C4 for (d + 1).  As defined in Methods and formulas, d is the sum of (subgroup size – 1) across all subgroups; in our case the subgroup size is fixed at 5, so d = 20*(5-1) = 80.  If d = 80, we add 1 and get 81, so we look up N = 81 in the C4 column of unbiasing constants:

unbiasing constants

We enter 0.996880 in column C7 in the worksheet and use it in the calculator to get the pooled within-subgroup standard deviation:

within subgroup standard deviation

 We can see that this value matches the output from our initial capability analysis graph.

initial graph
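
The whole within-subgroup calculation can also be condensed into a few lines of Python if you want to verify it independently. This is only a sketch of the steps above; the data array is a placeholder assumed to match the CABLE.MTW layout (100 diameters in consecutive subgroups of 5).

    import numpy as np
    from scipy.special import gammaln

    def c4(n):
        # unbiasing constant: c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
        return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2.0) - gammaln((n - 1) / 2.0))

    diameters = np.loadtxt("cable_diameters.txt")   # placeholder for the CABLE.MTW column
    subgroups = diameters.reshape(-1, 5)            # 20 subgroups of 5

    numerator = ((subgroups - subgroups.mean(axis=1, keepdims=True)) ** 2).sum()
    denominator = subgroups.shape[0] * (subgroups.shape[1] - 1)   # d = 20 * (5 - 1) = 80

    sp = np.sqrt(numerator / denominator)    # about 0.01849 for the cable data
    s_within = sp / c4(denominator + 1)      # divide by c4(d + 1) = c4(81), about 0.99688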

Calculating Cpk

Finally, we use our within-subgroup standard deviation to calculate CPU and CPL. Cpk is the lesser of CPU and CPL, and we find these two formulas in Methods and Formulas:

CPL formula
CPU formula

We calculate CPL and CPU as shown below using the calculator and the mean of the data that we previously calculated:

calculate cpl and cpu

Since Cpk is the lesser of the two resulting values, Cpk is 0.83. That matches the Cpk value in Minitab’s capability output:

process capability for diameter
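
Continuing the Python sketch from the previous section, the final step outside Minitab would look something like this (s_within and diameters carry over from that sketch, and the spec limits are the 0.5 and 0.6 used throughout):

    LSL, USL = 0.5, 0.6
    mean = diameters.mean()                  # overall mean of the 100 diameters
    cpu = (USL - mean) / (3 * s_within)      # within-subgroup sigma, not the overall sigma
    cpl = (mean - LSL) / (3 * s_within)
    cpk = min(cpu, cpl)                      # about 0.83 for the cable data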

As long as you're using Minitab, you won't need to calculate Ppk and Cpk by hand. But I hope seeing the calculations Minitab uses to get these capability indices provides some insight into the differences between them!  

The World-Famous Disappearing-Reappearing-Analysis-Settings Act


Sure, Minitab Statistical Software is powerful and easy to use, but did you know that it’s also magic? One of the illusions that Minitab can perform is the world-famous disappearing-reappearing-analysis-settings act. Of course, as with many illusions, it’s not so hard once you know the trick. In this case, it’s downright easy once you know about Minitab project files.

The statue of liberty

If you’ve done any work in Minitab you may very well have saved a project file and been grateful that your data, graphs, and statistical tables could all be saved together in a single file. But, it’s just as amazing that Minitab can remember exactly how you did your analysis the last time.

Imagine that you routinely run a capability analysis on the same process. The first time you did the analysis, you changed several of the options to get the output that you wanted. When you open Minitab the next time, you want to perform the same analysis on a new data set. Having a saved project makes it easy. Try it for yourself if you want, following the steps below. Begin by downloading our free trial if you don't already have our statistical software, then download worksheets Basil.MTW and Basil2.MTW.

Introduce your Assistant
  1. Open the Basil.MTW worksheet.
  2. Choose Stat > Quality Tools > Capability Analysis > Multiple Variables (Normal).
  3. In Variables, enter T1H1 T1H2.
  4. In Subgroup sizes, enter 4.
  5. In Lower spec, enter 2.
  6. In Upper spec, enter 8.
  7. Click Graphs.
  8. Uncheck Normal probability plot. Click OK.
  9. Click Options.
  10. Under Display, select Benchmark Z’s (σ level) and check Include confidence intervals.
  11. Click OK twice.

The capability analysis is in your project file.

Statue not visible
Presto, they’re gone!
  1. Close Minitab. When asked if you want to save changes to the project, click Yes.
  2. Name the file and click Save.

Minitab Statistical Software is closed. The settings for your analysis are nowhere to be found.

Abracadabra—they’re back!
  1. Reopen the project file that you saved.
  2. Open the Basil2.MTW worksheet.
  3. Choose Stat > Quality Tools > Capability Analysis > Multiple Variables (Normal).

The settings from your previous analysis have reappeared! All you have to do to complete the capability analysis, with all of your customizations, is click OK.

Bask in the applause from the audience

Keeping all of the parts of your analysis in one place is a great feature of Minitab’s project files. For people who routinely repeat the same analysis, the fact that the project file also remembers the settings that you used for your analysis is a fantastic time saver.

Whether you repeat an analysis weekly, quarterly, or even annually, Minitab’s ready to pick up right where you left off. This might not be quite as astounding as David Copperfield making the Statue of Liberty disappear and reappear, but if you want to get your statistical results fast and easy, it’s the best kind of magic.

Ready for more? Project files and many other fundamental features of Minitab are explained in the online Getting Started Guide.

Which Is Better, Stepwise Regression or Best Subsets Regression?


Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output.

An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post.

Sign: which way?

First, a quick refresher about the two procedures and their different results:

  • Stepwise regression presents you with a single model constructed using the p-values of the predictor variables
  • Best subsets regression assesses all possible models and displays a subset of them along with their adjusted R-squared and Mallow’s Cp values

The key benefit of the stepwise procedure is the simplicity of the single model. Best subsets does not pick a final model for you but it does present you with multiple models and information to help you choose the final model. For more details, read this post where I compare stepwise regression to best subsets regression and present examples using both analyses.
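
To make the stepwise idea concrete, here is a rough Python sketch of forward selection driven by p-values. It is a simplified stand-in for Minitab's stepwise procedure (which also removes terms whose p-values rise above an alpha-to-remove threshold); the alpha-to-enter value of 0.15 and the simulated data are assumptions for illustration only.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def forward_select(X, y, alpha_enter=0.15):
        """Add, one at a time, the candidate with the smallest p-value below alpha_enter."""
        selected, remaining = [], list(X.columns)
        while remaining:
            pvals = {}
            for cand in remaining:
                fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
                pvals[cand] = fit.pvalues[cand]
            best = min(pvals, key=pvals.get)
            if pvals[best] >= alpha_enter:
                break
            selected.append(best)
            remaining.remove(best)
        return selected

    # Four candidate predictors, two of them authentic (A and B)
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(120, 4)), columns=list("ABCD"))
    y = 2 * X["A"] - 1.5 * X["B"] + rng.normal(size=120)
    print(forward_select(X, y))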

Determining the Better Model Selection Method

A study by Olejnik, Mills, and Keselman* compares how often stepwise regression, best subsets regression using the lowest Mallow’s Cp, and best subsets regression using the highest adjusted R-squared select the true model.

The authors assessed 32 conditions that differed by the number of candidate variables, number of authentic variables, sample size, and level of multicollinearity. For each condition, the authors created 1,000 computer-generated datasets and analyzed them with both stepwise and best subsets to determine how often each procedure selected the correct model.

And, the winner is...stepwise regression!!  Congratulations!  Well, sort of, as we’ll see.

Best subsets regression using the lowest Mallow’s Cp is a very close second. The overall difference between Mallow’s Cp and stepwise selection is less than 3%. The adjusted R-squared performed much more poorly than either stepwise or Mallow’s Cp.

However, before we pop open the champagne to celebrate stepwise regression’s victory, there’s a huge caveat to reveal. 

Stepwise selection usually did not  identify the correct model. Gasp! 

Digging into the Results

Let’s look at the results more closely to see how well stepwise selection performs and what affects its performance. I’ll only cover stepwise selection, but the results for Mallow’s Cp are essentially tied and follow the same patterns. I’ll give my thoughts on the matter at the end.

In the results below, stepwise regression identifies the correct model if it selects all of the authentic predictors and excludes all of the noise predictors.

Best case scenario

In the study, stepwise regression performs the best when there are four candidate variables, three of which are authentic; there is zero correlation between the predictors; and there is an extra-large sample size of 500 observations. For this case, the stepwise procedure selects the correct model 84% of the time. Unfortunately, this is not a realistic scenario and the accuracy diminishes from here.

Number of candidate predictors and number of authentic predictors

The study looks at scenarios where there are either 4 or 8 candidate predictors. It is harder to choose the correct model when there are more candidates simply because there are more possible models to choose from. The same pattern holds true for the number of authentic predictors.

The table below shows the results for models with no multicollinearity and a good sample size (100-120 observations). Notice the decrease in the percent correct as both the number of candidates and number of authentic predictors increase.

Candidate predictors   Authentic predictors   % Correct model
         4                      1                  62.7
         4                      2                  54.3
         4                      3                  34.4
         8                      2                  31.3
         8                      4                  12.7
         8                      6                   1.1

Multicollinearity

The study varies multicollinearity to determine how correlated predictors affect the ability of stepwise regression to choose the correct model. When predictors are correlated, it’s harder to determine the individual effect each one has on the response variable. The study set the correlation between predictors to 0, 0.2, and 0.6.

The table below shows the results for models with a good sample size (100-120 observations). As correlation increases, the percent correct decreases.

Candidate predictors   Authentic predictors   Correlation   % Correct model
         4                      2                 0.0             54.3
         4                      2                 0.2             43.1
         4                      2                 0.6             15.7
         8                      4                 0.0             12.7
         8                      4                 0.2              1.0
         8                      4                 0.6              0.4

Sample size

The study uses two sample sizes to see how that influences the ability to select the correct model. The size of the smaller samples is calculated to achieve 0.80 power, which amounts to 100-120 observations. These sample sizes are consistent with good practices and can be considered a good sample size.

The very large sample size is 500 observations and it is 5 times the size that you need to achieve the benchmark power of 0.80.

The table below shows that a very large sample size improves the ability of stepwise regression to choose the correct model. When choosing your sample size, you may want to consider a larger sample than what the power and sample size calculations suggest in order to improve the variable selection process.

Candidate predictors   Authentic predictors   Correlation   % Correct - good sample size   % Correct - very large sample
         4                      2                 0.0                  54.3                             72.1
         4                      2                 0.2                  43.1                             72.9
         4                      2                 0.6                  15.7                             69.2
         8                      4                 0.0                  12.7                             53.9
         8                      4                 0.2                   1.0                             39.5
         8                      4                 0.6                   0.4                              1.8

Closing Thoughts

Stepwise regression generally can’t pick the true model. This is true even with the small number of candidate predictors that this study looks at. In the real world, researchers often have many more candidates, which lowers the chances even further.

Reality is complex and we should not expect that an automated algorithm can figure it out for us. After all, the stepwise algorithm follows simple rules and knows nothing about the underlying process or subject area. However, stepwise regression can get you to the right ballpark. At a glance, you’ll have a rough idea of what is going on in your data. 

It’s up to you to get from the rough idea to the correct model. To do this, you’ll need to use your expertise, theory, and common sense rather than relying solely on simplistic model selection rules.

For tips about how to do this, read my post Four Tips on How to Perform a Regression Analysis that Avoids Common Problems.

*Stephen Olejnik, Jamie Mills, and Harvey Keselman, “Using Wherry’s Adjusted R2 and Mallow’s Cp for Model Selection from All Possible Regressions”, The Journal of Experimental Education, 2000, 68(4), 365-380.

Analyzing the Final College Football Playoff Poll


College Football Playoff

Throughout the college football season, I’ve been looking at the influence of the preseason AP Poll on rankings later in the season. Each analysis found a positive association between preseason rankings and the current rankings. That is, between top-ranked teams with a similar number of losses, teams ranked higher in the preseason are also ranked higher in current polls. The biggest exception is SEC teams, who were able to consistently jump over non-SEC teams who ranked higher in the preseason.

Now that we have the final college football playoff poll, let’s do one more analysis to see if the final rankings correlate with the preseason AP poll.

The Top 6 Teams

We’ll look at the top 6 teams that were vying for a playoff spot: Alabama, Oregon, Florida State, Ohio State, TCU, and Baylor. First we’ll look at an individual value plot showing each team’s preseason rank versus their final rank.

Individual Value Plot

The only change from the preseason rankings is that Alabama and Oregon jumped ahead of Florida State. Other than that, the final ranking of the teams is exactly the same as it was in the preseason. In addition, the correlation coefficient is 0.83 and there are 13 concordant pairs to only 2 discordant pairs. These statistics further show that teams ranked higher in the preseason tend also to be ranked higher in the final poll.

Cross Tabulation

But to the committee’s credit, they did drop Florida State. This was unprecedented since Florida State went undefeated and Alabama and Oregon both lost a game. But all season long, Florida State has played close games against average teams. And whether you’re winning or losing, playing so many close games against average competition means you yourself are an average team. If this were the BCS era, the championship game would be between Florida State and Alabama, and we would all be howling about how Oregon got left out. But for most of the season, Oregon and Alabama have looked like the two best teams, and both will get their chance to show it on the field.

But are Alabama and Oregon really the two best teams, or is it simply confirmation bias? Before the season it was clear that pollsters (and I’m sure fans, too) thought Alabama and Oregon were two of the best teams. So by winning they simply confirmed a belief we already had. Imagine if Arizona had gone 12-1 and won the Pac-12 instead of Oregon, and Oklahoma went 11-1 and won the Big 12 instead of TCU or Baylor. Oklahoma started the year ranked #4 while Arizona was unranked. Any doubt the Pac-12 would have been the conference left out of the playoff in that scenario?

The college football playoff committee did set a few new precedents. As I noted, they dropped an undefeated major conference team behind one-loss teams for what has to be the first time ever. (I found it ironic that soon after the playoff committee did this, the AP and Coaches Polls did the same thing. They would have never done that before.) The committee also showed they wouldn’t be locked in to the prior week’s poll. In previous seasons, a team would only drop late in the season after a loss. But the fact that TCU dropped from #3 to #6 in the final two polls shows the committee isn’t locked into last week’s thinking. Whether you agree with the decision to drop TCU or not, this new way of thinking is refreshing. (And for what it's worth, TCU, the Sagarin Ratings still have you at #2.)

But when it comes to the top-ranked teams, it’s still better to have confirmed our preseason expectations that you’re one of the best teams than disprove our preseason expectations that you’re not. So Baylor, next year make sure to hire the PR firm before the season starts!

How to Calculate B10 Life with Statistical Software


Over the last year or so I’ve heard a lot of people asking, “How can I calculate B10 life in Minitab?” Despite being a statistician and industrial engineer (mind you, one who has never been in the field like the customers asking this question) and having taken a reliability engineering course, I’d never heard of B10 life. So I did some research.

The B10 life metric originated in the ball and roller bearing industry, but has become a metric used across a variety of industries today. It’s particularly useful in establishing warranty periods for a product. The “BX” or “Bearing Life” nomenclature, which refers to the time at which X% of items in a population will fail, speaks to these roots.

So then, B10 life is the time at which 10% of units in a population will fail. Alternatively, you can think of it as the 90% reliability of a population at a specific point in its lifetime—or the point in time when an item has a 90% probability of survival. The B10 life metric became popular among ball and roller bearing makers due to the industry’s strict requirement that no more than 10% of bearings in a given batch fail by a specific time due to fatigue failure. 

Now that I know what the term means, I can tell people who ask that Minitab’s reliability analysis can easily compute this metric. (In fact, our statistical software can compute any “BX” lifetime—but we’ll save that for another blog post.) B10 life is also known as the 10th percentile and can be found in Minitab’s Table of Percentiles output, which is displayed in Minitab’s session window.

B10 Life - Table of Percentiles

And unlike other reliability metrics, B10 life directly correlates the maximum allowable percentile of failures (or the minimum allowable reliability) with an application-specific life point in time.

So we can get the B10 life metric by looking at the Table of Percentiles in Minitab’s session window output. But you might still be asking two questions: how do I create this table, and how do I interpret it?

You can't just put one of these into a pacemaker, after all!

Finding B10 Life, Step by Step

Suppose we have tracked and recorded the battery life times over a certain number of years for 1,970 pacemakers. The reliability of pacemakers is critical, because patients’ lives depend on these devices!

We observed exact failure times—defined as the time at which a low battery signal was detected—for 1,019 of those pacemakers. The remaining 951 pacemakers never warned of a low battery, so they “survived.”

Our data is organized as follows:

B10 Life - Data Organization

When we have both observed failures and units surviving beyond a given time, we call the data “right-censored.” And we know from process knowledge that the Weibull distribution best describes the lifetime of these pacemaker batteries. Knowing this information will help us use Minitab’s reliability analysis correctly.

Setting Up the Reliability Analysis

Because we have right-censored data and we know our distribution, we are ready to use Minitab’s Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis menu to compute the B10 life.

We want to know the batteries’ reliability—or probability of survival—at different times, so our variable of interest is the number of years a pacemaker battery has survived. In the Parametric Distribution Analysis dialog, you’ll notice the Weibull distribution is already selected as the assumed distribution. We’ll leave this default setting since we know the Weibull distribution best describes battery life times.

B10 Life Metric - Right Censoring

We also know whether the number in the ‘Years’ column was an exact failure time or a censored time (beyond which the battery survived). We must account for the censored data. By clicking the button labeled ‘Censor’, we can include a censoring column that contains values indicating whether the pacemaker survived or failed at the recorded time. In our Minitab worksheet, “Failed or Survived” is the censoring column. Our censoring value is ‘S’, which stands for ‘Survived’, indicating no failure was observed during the pacemaker battery tracking period.

B10 Life - censoring column

Interpreting the Table of Percentiles and B10 Life

Once we click OK through all dialogs to carry out the analysis, Minitab outputs the Table of Percentiles, where we can find our B10 life:

 B10 Life - Corresponding Percentile

Where the Percent column displays 10, the corresponding Percentile value tells us that the B10 life of pacemaker batteries is 6.36 years—or, to put it another way, 6.36 years is the time at which 10% of the population of pacemaker batteries will fail.
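
If you ever need the same number outside Minitab, a BX life is just a percentile of the fitted lifetime distribution. Here is a small Python sketch; the shape and scale values are placeholders, not the parameters Minitab estimated from the pacemaker data.

    import numpy as np
    from scipy.stats import weibull_min

    shape, scale = 2.0, 15.0     # assumed Weibull parameters; substitute your fitted estimates

    # B10 life = 10th percentile of the lifetime distribution
    b10 = weibull_min(c=shape, scale=scale).ppf(0.10)

    # Closed form for the Weibull: scale * (-ln(1 - 0.10))**(1/shape)
    b10_check = scale * (-np.log(0.90)) ** (1.0 / shape)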

There we have it! The next time you are looking to compute the B10 life of a product, and perhaps seeking to establish suitable warranty periods, you need look no further than Minitab’s reliability tools and the Table of Percentiles.

An Unauthorized Biography of the Stem-And-Leaf Plot - Part II: A New Leaf


At the end of my previous post, aspiring statisticians Woodrow "Woody" Stem and August "Russell" Leaf, creators of the famed Stem-and-Leaf plot, were in bad shape. They had beaten each other statsless after an argument about the challenge given to them by their mentor, Dr. Histeaux Graham. That challenge: to devise a simple, yet elegant way to examine the distribution of values in a sample.

After their fateful bout of pugilism, Woody convalesced at the renowned Saint Tukey Center for Post Hoc Health. There, he continued to refine his theory of equally spaced intervals (or "bins," as he did eventually resort to calling them). His approach was great for evaluating the range of the sample—the proverbial minimum and maximum. But alas! How could he examine the values betwixt these values!?!  The problem vexed him greatly.

Russell sought care from the Gaussian order of the Brothers of Functional Likelihood. Their fabled monastery provided a quiet place for Russell to rest and think. And the brothers of the order also brewed a mean stout! (Their porter...meh. But the stout was absolutely to recover for!)  In the solitude and the dampitude of the large stone structure, Russell whiled away the days dipping his quill, creating random samples, sorting them, and meticulously copying the values onto artisanal parchment. (To this day, Russell's data sets are highly prized for their marginally ornate illustrations. [Or is that 'ornate margin illustrations'? I'm not sure. I'd have to look it up. So let's go with 'marginally ornate.'])

Russell bemused himself with art, he abused himself with stout, but he never disabused himself of one central notion: In order to understand a sample, one needed to know each value therein. Russell's methods did indeed give him a deep understanding of each datum, but alas! How could he shine light on the nature of the distribution: the clumps, the gaps, the peaks, the tails, and, perchance, the outliers!?!  The problem vexed him greatly. Until that fateful day...

An Almighty Wind

It was a cloudy afternoon, sometime after both protagonists had regained the power of ambulation. Woody, still weak from his prolonged requiescence, sought the rejuvenation one can only derive from abundant fresh air. Strolling the local park, his head immersed in a dense cloud of smoke from his trusty briar, Woody chanced to glimpse a lone figure on a bench. The man appeared to be creating random samples, sorting them, and meticulously copying the values onto artisanal parchment. "What a dolt," Woody thought, and decided to go over and give the man a piece of his mind. And, in a way, that's just what he did.

Historians differ on exactly what happened next. Some say that as the two men—once friends, now bitter rivals—met each other's gaze, the clouds and smoke parted and a blinding shaft of light engulfed them both. (Mind you, this was before sunglasses.) The men froze, slack-jawed at the spectacle. A sudden gust wrested the pipe from Woody's weak fingers. The pipe flew up in the draft and struck first Woody, then Russell, thwacking each upside the head. It was as if God himself had grown weary of their stubbornness and had reached down from the heavens to give each of his beloved, but wayward children a dope slap. (I'm not sure what the other historians say about this event. I'd have to look it up, so let's go with dope slap.)

As the memory of their past friendship slowly returned to our heroes, each was suddenly able to look beyond his own foibles and prejudice to grasp the value-added synergies that might be realized from collaboration and cooperation, vis-à-vis cheating on their assignment. To this day, the creation that was inspired in that singular moment still bears the names of its creators.

(Some historians have also claimed that in the moment of epiphany, Woody's ever-present chocolate snack tumbled from his hand and came to rest in a bowl filled with a peanut-based butter substitute that Russell was known to enjoy. The result was said to have been quite tasty. However, these claims have never been substantiated.)

Amazing as this story is, some today have never heard of the Stem-and-Leaf plot. You see, Dr. Graham eventually appropriated (read 'stole') the ideas of his charges and created his own graph which he named after himself: Dr. Histeaux Graham's Magic Distribution Tonic. The name was later shortened to Histeaux Graham's Plot, and finally Histogram. (It was also known for a time by other names, such as Histeaux's Odyssey, the Graham Tracker, and That One With The Bars.)

Example of Dr. Graham's Tracker Plot (a.k.a. Histogram)

Eventually, with the advent of plotting machines, computers, and later the interwebs, the venerable Stem-and-Leaf plot fell into widespread disuse. It's just too easy these days to create a histogram. And many practitioners feel the histogram connotes a more sophisticated esthetic than does the cruder, but way-easier-to-make-by-hand Stem-and-Leaf plot.

But don't feel too bad for Dr. Stem and Dr. Leaf, gentle readers. For at least their crowning achievement is still known and revered, if only in statistical circles. 

 


Understanding Qualitative, Quantitative, Attribute, Discrete, and Continuous Data Types


"Data! Data! Data! I can't make bricks without clay."
 — Sherlock Holmes, in Arthur Conan Doyle's The Adventure of the Copper Beeches

Whether you're the world's greatest detective trying to crack a case or a person trying to solve a problem at work, you're going to need information. Facts. Data, as Sherlock Holmes says. 

jujubes

But not all data is created equal, especially if you plan to analyze it as part of a quality improvement project.

If you're using Minitab Statistical Software, you can access the Assistant to guide you through your analysis step-by-step, and help identify the type of data you have.

But it's still important to have at least a basic understanding of the different types of data, and the kinds of questions you can use them to answer. 

In this post, I'll provide a basic overview of the types of data you're likely to encounter, and we'll use a box of my favorite candy—Jujubes—to illustrate how we can gather these different kinds of data, and what types of analysis we might use it for. 

The Two Main Flavors of Data: Qualitative and Quantitative

At the highest level, two kinds of data exist: quantitative and qualitative.

Quantitative data deals with numbers and things you can measure objectively: dimensions such as height, width, and length. Temperature and humidity. Prices. Area and volume.

Qualitative data deals with characteristics and descriptors that can't be easily measured, but can be observed subjectively—such as smells, tastes, textures, attractiveness, and color. 

Broadly speaking, when you measure something and give it a number value, you create quantitative data. When you classify or judge something, you create qualitative data. So far, so good. But this is just the highest level of data: there are also different types of quantitative and qualitative data.

Quantitative Flavors: Continuous Data and Discrete Data

There are two types of quantitative data, which is also referred to as numeric data: continuous and discrete. As a general rule, counts are discrete and measurements are continuous.

Discrete data is a count that can't be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can't have 2.5 kids, or 1.3 pets.

Continuous data, on the other hand, could be divided and reduced to finer and finer levels. For example, you can measure the height of your kids at progressively more precise scales—meters, centimeters, millimeters, and beyond—so height is continuous data.

If I tally the number of individual Jujubes in a box, that number is a piece of discrete data.

a count of jujubes is discrete data

If I use a scale to measure the weight of each Jujube, or the weight of the entire box, that's continuous data. 

Continuous data can be used in many different kinds of hypothesis tests. For example, to assess the accuracy of the weight printed on the Jujubes box, we could measure 30 boxes and perform a 1-sample t-test. 
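
As a quick illustration outside Minitab, a 1-sample t-test on box weights might look like this in Python. The 6.5-ounce label weight and the simulated measurements are made up; they are not real Jujubes data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    box_weights = rng.normal(loc=6.4, scale=0.2, size=30)   # hypothetical weights in ounces

    # Null hypothesis: the mean box weight equals the 6.5 oz printed on the label
    t_stat, p_value = stats.ttest_1samp(box_weights, popmean=6.5)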

Some analyses use continuous and discrete quantitative data at the same time. For instance, we could perform a regression analysis to see if the weight of Jujube boxes (continuous data) is correlated with the number of Jujubes inside (discrete data). 

Qualitative Flavors: Binomial Data, Nominal Data, and Ordinal Data

When you classify or categorize something, you create qualitative, or attribute, data. There are three main kinds of qualitative data.

Binary (or binomial) data place things in one of two mutually exclusive categories: right/wrong, true/false, or accept/reject. 

Occasionally, I'll get a box of Jujubes that contains a couple of individual pieces that are either too hard or too dry. If I went through the box and classified each piece as "Good" or "Bad," that would be binary data. I could use this kind of data to develop a statistical model to predict how frequently I can expect to get a bad Jujube.

When collecting unordered or nominal data, we assign individual items to named categories that do not have an implicit or natural value or rank. If I went through a box of Jujubes and recorded the color of each in my worksheet, that would be nominal data. 

This kind of data can be used in many different ways—for instance, I could use chi-square analysis to see if there are statistically significant differences in the amounts of each color in a box, as sketched below. 
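
A chi-square goodness-of-fit test on the color counts could be sketched like this in Python. The counts below are invented; the test asks whether the colors are equally likely.

    from scipy import stats

    # Hypothetical counts of each color in one box: red, green, orange, yellow, violet
    observed = [18, 22, 25, 17, 20]

    # With no expected frequencies given, chisquare tests against equal proportions
    chi2, p_value = stats.chisquare(observed)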

We also can have ordered or ordinal data, in which items are assigned to categories that do have some kind of implicit or natural order, such as "Short, Medium, or Tall."  Another example is a survey question that asks us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9, which is better than 8, and so on. 

What to do with ordinal data is a matter of some debate among statisticians. Everyone agrees it's appropriate for creating bar charts, but beyond that, the answer to the question "What should I do with my ordinal data?" is "It depends."  Here's a post from another blog that offers an excellent summary of the considerations involved.

Additional Resources about Data and Distributions

For more fun statistics you can do with candy, check out this article (PDF format): Statistical Concepts: What M&M's Can Teach Us. 

For a deeper exploration of the probability distributions that apply to different types of data, check out my colleague Jim Frost's posts about understanding and using discrete distributions and how to identify the distribution of your data.

A Minitab Holiday Tale: Featuring the Two Sample t-Test


by Matthew Barsalou, guest blogger

Aaron and Billy are two very competitive—and not always well-behaved—eight-year-old twin brothers. They constantly strive to outdo each other, no matter what the subject. If the boys are given a piece of pie for dessert, they each automatically want to make sure that their own piece of pie is bigger than the other’s piece of pie. This causes much exasperation, aggravation and annoyance for their parents. Especially when it happens in a restaurant (although the restaurant situation has improved, since they have been asked not to return to most local restaurants).

A bag of coal

Sending the boys to their rooms never helped. The two would just compete to see who could stay in their room longer. This Christmas their parents were at their wits' end, and they decided the boys needed to be taught a lesson so they could grow up to be upstanding citizens. Instead of the new bicycles the boys were going to get—and probably just race till they crashed anyway—their parents decided to give them each a bag of coal.

An astute reader might ask, “But what does this have to do with Minitab?” Well, dear reader, the boys need to figure out who got the most coal. Immediately upon opening their packages, the boys carefully weighed each piece of coal and entered the data into Minitab.

Then they selected Stat > Basic Statistics > Display Descriptive Statistics and used the "Statistics" options dialog to select the metrics they wanted, including the sum of the weights they'd entered:

Billy quickly saw that he had the most coal, and yelled, “I have 279.383 ounces and you only have 272.896 ounces. Our parents must love me more.”

“Not so fast,” said Aaron. “You may have more, but is the difference statistically significant?”  There was only one thing left for the boys to do: perform a two sample t-test.

In Minitab, Aaron selected Stat > Basic Statistics > 2-Sample t…

The boys left the default values at a confidence level of 95.0 and a hypothesized difference of 0. The alternative hypothesis was “Difference ≠ hypothesized difference” because the only question they were asking was “Is there a statistically significant difference?” between the two data sets.

The two troublemakers also selected “Graphs” and checked the options to display an individual value plot and a boxplot. They knew they should look at their data. Having the graphs available would also make it easier for them to communicate their results to higher authorities, in this case, their poor parents.

Individual Value Plot of Coal

Boxplot of Coal

Both the individual value plots and boxplots showed that Aaron's bag of coal had pieces with the highest individual weights. But he also had the pieces with the least weight. So the values for his Christmas coal were scattered across a wider range than the values for Billy's Christmas coal. But was there really a difference?

Billy went running for his tables of Student's t-scores so he could interpret the resulting t-value of -0.71. Aaron simply looked at the resulting p-value of 0.481. The p-value was greater than 0.05, so the boys could not conclude there was a true difference in the weight of their Christmas "presents."
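
For anyone who wants to follow along without the worksheet, here is a minimal Python sketch of the same comparison. The coal weights are simulated stand-ins, not the boys' data, and like Minitab's default the test does not assume equal variances.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(25)
    aaron = rng.normal(loc=13.6, scale=2.0, size=20)   # hypothetical piece weights (ounces)
    billy = rng.normal(loc=14.0, scale=1.2, size=20)

    # Welch's two-sample t-test (no equal-variance assumption)
    t_stat, p_value = stats.ttest_ind(aaron, billy, equal_var=False)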


The boys dutifully reported the results, with illustrative graphs, each demanding that they get a little more to best the other. Clearly, receiving coal for Christmas had done nothing to reduce their level of competitiveness. Their parents realized the boys were probably not going to grow up to be upstanding citizens, but they may at least become good statisticians.

Happy Holidays.

 

About the Guest Blogger

Matthew Barsalou is an engineering quality expert in BorgWarner Turbo Systems Engineering GmbH’s global engineering excellence department. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and has a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

3 Things a Histogram Can Tell You


Histograms are one of the most common graphs used to display numeric data. Anyone who takes a statistics course is likely to learn about the histogram, and for good reason: histograms are easy to understand and can instantly tell you a lot about your data.

Here are three of the most important things you can learn by looking at a histogram. 

Shape—Mirror, Mirror, On the Wall…

If the left side of a histogram resembles a mirror image of the right side, then the data are said to be symmetric. In this case, the mean (or average) is a good approximation for the center of the data. And we can therefore safely utilize statistical tools that use the mean to analyze our data, such as t-tests.

If the data are not symmetric, then the data are either left-skewed or right-skewed. If the data are skewed, the mean may not provide a good estimate of the center of the data or represent where most of the data fall. In this case, you should consider using the median to evaluate the center of the data, rather than the mean.

Did you know...

If the data are left-skewed, then the mean is typically LESS THAN the median.    

If the data are right-skewed, then the mean is typically GREATER THAN the median.
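
A quick way to convince yourself of this is to simulate a skewed sample and compare the two statistics. This little Python sketch uses an exponential sample purely as an example of right skew; the scale and sample size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    right_skewed = rng.exponential(scale=10, size=10_000)

    print(right_skewed.mean())        # roughly 10
    print(np.median(right_skewed))    # roughly 6.9: mean > median under right skew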

  Span—A Little or a Lot?

Suppose you have a data set that contains the salaries of people who work at your organization. It would be interesting to know where the minimum and maximum values fall, and where you are relative to those values. Because histograms use bins to display data—where a bin represents a given range of values—you can’t see exactly what the specific values are for the minimum and maximum, like you can on an individual value plot. However, you can still observe an approximation for the range and see how spread out the data are. And you can answer questions such as "Is there a little bit of variability in my organization's salaries, or a lot?"

Outliers (and the ozone layer)

Outliers can be described as extremely low or high values that do not fall near any other data points. Sometimes outliers represent unusual cases. Other times they represent data entry errors, or perhaps data that does not belong with the other data of interest. Whatever the case may be, outliers can easily be identified using a histogram and should be investigated, as they can reveal interesting information about your data. 

Rewind to the mid-1980s, when scientists reported depleting ozone levels above Antarctica. NASA's Goddard Space Flight Center had studied atmospheric ozone levels, but surprisingly didn’t discover the issue. Why? The analysis they used automatically eliminated any Dobson readings below 180 units because ozone levels that low were thought to be impossible.

 

Poisson Processes and Probability of Poop


On a recent vacation, I was unsuccessfully trying to reunite with my family outside a busy shopping mall and starting to get a little stressed. I was on a crowded sidewalk, in a busy city known for crime, and it was raining.  I thought there was no way things could get more aggravating when something warm and solid hit my arm and shirt.

Joel on Vacation

A bird had pooped on me.

Not having the kids with me, and being in a foreign country, I used a couple of words in English that best represented my feelings towards the bird and the entire situation.  Being a generally positive person, I quickly recovered once I had gotten dry, cleaned up, and had the kind of beverage you have when you've been defecated on and want to de-stress. I told the kids, who of course found it funny. I even laughed a little. 

What are the odds? 

Two days later, in a small beach town, my wife and I were returning from a hike. I was not even thinking about the poop incident any longer when something warm and solid hit my shirt and shorts.

A bird had pooped on me. Again.

Now, based on quantity and viscosity, we have some reason to believe it may have actually been the small monkeys common in that part of the world, but that's beside the point of this post. And at this point, you may be wondering what the point of this post actually is, since all I've done thus far is give you two reasons to laugh at my misfortune...

Joel - detail

The Poisson Process

Events that occur completely randomly and with roughly equal probability over time—like a bird pooping on you, or a server crashing, or workplace injuries in a factory—come from what is known as a Poisson process.  

You may know Poisson as being related to count data, and the application here is that if you take random spans of time of equal length (days, months, years, etc.), the number of occurrences of the event in each span will follow the Poisson distribution. The time between those events follows the exponential distribution, meaning that most events will happen somewhat close together but there will also be really long spans between events sometimes.
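
Here is a small Python simulation of that relationship. The event rate is an arbitrary assumption (nothing to do with actual bird behavior), but it shows the daily counts behaving like a Poisson variable while the gaps between events are exponential.

    import numpy as np

    rng = np.random.default_rng(11)
    rate = 0.4                                          # assumed events per day

    gaps = rng.exponential(scale=1 / rate, size=5000)   # days between events
    event_times = np.cumsum(gaps)

    # Count events in each whole day; for a Poisson process, mean ~ variance ~ rate
    counts_per_day = np.bincount(event_times.astype(int))[:-1]
    print(counts_per_day.mean(), counts_per_day.var())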

Many undesirable events in healthcare settings—like hospital-acquired infections, falls, accidental needle sticks—follow this model.  They can be tracked using a T-Chart (or if only the count of days or hours are known, the G-Chart) in Minitab so a practitioner can know if these events have started occurring more or less frequently.

This model is also useful in making me feel better, since I know that while getting pooped on by a bird is a pretty rare and unpredictable event, having it happen twice in three days isn't as crazy as it first sounds.

3 Ways Minitab Can Illustrate My Ignorance about British Honours


I like learning about new things. This fondness makes it less depressing when I have to admit total ignorance on any subject. Thus, when I heard that there were “New Year honours” given out, I expected something like a Dave Letterman top ten list about events from 2014. Instead, it turns out that New Year honours are awards given out to people for their actions, achievements, or service to the United Kingdom.

Neck badge of a Commander of the Military Division of the Order of the British Empire

The Guardian’s website helpfully provided a spreadsheet of the 1,164 people honored this year. Having been shamed by my lack of knowledge, I thought I’d take a look at the list with Minitab Statistical Software and see what I might notice.

The first thing I noticed was that the spreadsheet began with three columns titled “Order,” “Level,” and “Award.” “How fascinating,” I thought to myself. “I can’t wait to see what awards belong to different orders.” Here are three ways to visualize my initial ignorance in Minitab.

Bar Chart of 1,000 Categories

Because the data are categories, the first thing I did was make a bar chart. Unfortunately, it can be hard to fit many categories on a bar chart, especially if some of the labels are long. In this case, each of the three categorical variables has 10 unique values. If we make a chart with 1,000 categories, the result is not legible.

You cannot read the labels on this graph.

On the chart above, most of the combinations of categories have frequencies of zero. For example, all of the Knight Bachelor awards have order values of Knighthood, so any other combinations of the award Knight Bachelor with other orders have bars with zero height.

Bar Chart of Lengthy Labels

To make more space on our graph, the first thing to do is to hide the empty cells. That way, we need space on the chart only for the combinations of values that occur in the dataset.

All of the labels do not fit beneath the chart.

When we hide the empty cells, the number of categories on the chart decreases from 1,000 to 10. Unfortunately, the labels are so long that we still can't fit them all on the chart.

Stacked Bar Chart

One way to get more labels to fit is to stack one of the categories so that the labels are in a legend. That way, we need space for only two labels beneath the bars. So I created a stacked bar chart with the data: 

All of the labels fit if you put one category in a legend.

On this stacked bar chart, we can see that there are 10 bars...and none of them are stacked. This chart is showing us that, in the dataset, everyone who gets the same award gets it at the same level and for the same order. The variables in this dataset suggest that order, level, and award are all the same thing, which is not what I expected at all.

Wrap Up

I’m a bit relieved, in the end, to say that the source of the confusion might be the Guardian’s dataset rather than my ignorance. According to the BBC’s 2012 guide to the honours, there is no order called “Companions of the Order of the Bath” as is found in the dataset. Rather, the order is “the Order of the Bath” and Companion is a rank, or level, within that order. Now that I know a little bit more about the honours system than I did before, I can make a proper chart:

Only one recipient each was made a Knight Commander and a Knight Grand Cross.

This emphasizes the rarity of some honours given this New Year’s Day. Among all the honours, I would guess that (Evan) Paul Silk, the only Knight Commander of the Order of the Bath, and Professor Sir John Irving Bell FRS, the only Knight Grand Cross of the Order of the British Empire, deserve special congratulations.

Can a Statistician Say that Age Is Just a Number?


Last fall I had a birthday. It wasn’t one of those tougher birthdays where the number ends in a zero. Still, the birthday got me thinking. In response, I told myself, age is just a number. Then I did a mental double-take. Can a statistician say that? After all, numbers are how I understand the world and the way it works.

birthday cake

Can age just be a number? After some musing, I concluded that age is just a number!

How did I reach my conclusion? I’m pretty sure that I’m not just deluding myself to feel better. In statistics, you need to be able to trust your data. Whether you’re performing an ANOVA, regression analysis, or a designed experiment, if you can’t trust your data, you can’t trust the results of your analysis.

I can hear some of you protesting, "But certainly your age is accurate!" Yes, it’s a straightforward matter to count the number of revolutions the Earth has made around the Sun since the day I was born. There is absolutely no doubt that number is correct. Trust me, I've tried counting it different ways.

However, there are other ways that numbers can deceive you. In statistics and experimental design, we are concerned with both reliability and validity. These issues require us to question whether the measurements we use are consistent and whether we are really studying what we think we are studying.

Reliability relates to the repeatability of the measurements and the experiment as a whole. If you measure the same thing multiple times, do you get the same measurements? Do similar studies produce the same results? We’re good here. Or, rather, I should say that it’s at least consistent if not “good”. Every time I check the calendar, I’m undoubtedly no younger!

Validity relates to how well a conclusion reflects the requirements of the scientific method. There are various types of validity such as internal, external, and construct. Here we’ll focus mainly on internal validity, which deals with how confident we can be in the cause-and-effect relationships in a study.

A Hypothetical Study: Does Age Cause Bad Things?

Which study, you ask? That’s where we’ll have to get a bit hypothetical. Presumably, we have some basis behind our dislike of higher ages. Perhaps we’re imagining a study that relates older ages to negative outcomes. But, which outcomes? That’s a bit foggy. We also need to be sure that age causes these undefined outcomes rather than just being correlated with them.

These issues are good indications that our hypothetical study doesn’t have good internal validity. We can’t be sure of the cause and effect. In fact, we’re not even sure which effects we’re talking about! At this point, we can’t trust the conclusion that higher ages cause negative outcomes.

In a quick, preliminary assessment, it seems to me that there are two general categories of life outcomes for this study area.

Biological changes: It’s a fact that over time biological processes occur and that we change. We develop and go through puberty. As we get older, mental and physical abilities decline, etc. While these changes are unavoidable, they occur at different rates for different people. We can influence some of the underlying factors for these changes. For example, we can make decisions about eating a healthy diet, exercising, and reducing UV exposure. Unfortunately, some of the underlying factors are out of our control and depend on luck.

However, an important question is, which way does the causal arrow point? Do the annual changes in our calendar age cause these biological changes? Or, do these biological changes define our concept of age? I’d argue that the causal arrows are thus: Underlying factors -> Biological changes -> Concept of age.

Important life matters: These are not so biology- or time-dependent, and are more under our control. This category includes things such as happiness, having strong relationships, maximizing your potential, performing rewarding activities, making a positive contribution to your community, etc.

So, we have one category where the causal connection is unclear and perhaps in the opposite direction of what our study is designed to show. The other category doesn’t seem age-dependent at all. None of this supports what our thought experiment is attempting to show, that age causes bad things. This study is already questionable...and I haven't even mentioned the fact that it’s impossible to randomly assign people to different age groups.

Based on all of these issues, my professional opinion is that age is just a number!

This conclusion should free you from worrying about that constant increase in age every year. Don't sweat that number. However, it places responsibility on you to make smart choices in your life. After all, your choices have more impact on your life than some meaningless number. And, the rest you can't control.

Control Your Control Chart!


As a member of Minitab's Technical Support team, I get the opportunity to work with many people creating control charts. They know the importance of monitoring their processes with control charts, but many don’t realize that they themselves could play a vital role in improving the effectiveness of the control charts.  

In this post I will show you how to take control of your charts by using Minitab Statistical Software to set the center line and control limits, which can make a control chart even more valuable. 

When you add or change a value in the worksheet, by default the center line and control limits on a control chart are recalculated. This can be desirable in many cases—for example, when you have a new process. Once the process is stable, however, you may not want the center line and control limits continually recalculated. 

Consider this stable process:

Xbar Chart of Thickness

Now suppose the process has changed, but with the newly recalculated center line and control limits, the process still appears to be in control (using the default Test 1: 1 point more than 3 standard deviations from the center line).

Xbar Chart of Thickness 

If you have a stable control chart, and you do not want the center line or control limits to change (until you make a change to the process), you can set the center line and control limits. Here are two ways to do this.

  1. Use the Estimate tab.  
    This option works well when you want to use the initial subgroups to calculate the center line and control limits. 
  • Choose Stat > Control Charts > Variables Charts for Subgroups > Xbar.
  • Click the Estimate tab.
  • Choose “Use the following subgroups when estimating parameters” and enter the appropriate number of subgroups. In the example above we want to use the first 12 subgroups, so enter 1:12.

X-bar chart options

 

  2. Use the Parameters tab. 
    This option works well when you do not have an initial set of data you want to use to calculate the center line and control limits, but know the values you want to use. 

Suppose you want the center line of your Xbar chart to be 118.29, UCL=138.32 and LCL=98.26.

  • Solve for the standard deviation, s. Using the formula for the upper control limit, UCL = µ + 3σ/√n (where n is the subgroup size), estimate µ with the center line and σ with s, then solve for s: s = √n(UCL − center line)/3.

Note: If you want to use the estimates from another data set, such as a similar process, you could obtain the estimates of the mean and standard deviation without solving for s.  Choose Stat > Control Charts > Variables Charts for Subgroups > Xbar.  Choose Xbar Options, then click the Storage tab.  Check Means and Standard deviations. I'll use the data from the first 12 subgroups above for illustration:


 

These values are stored in the next available blank columns in the worksheet.

  • Choose Stat > Control Charts > Variables Charts for Subgroups > Xbar.  Choose Xbar Options, then click the Parameters tab.  Enter the mean and standard deviation.
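
If you'd rather script the algebra than solve it by hand, the calculation is small. Here's a minimal sketch in Python; the subgroup size n = 5 is an assumption (it isn't stated above), and it uses the standard Xbar-chart limit formula CL ± 3σ/√n:

  import math

  CL, UCL = 118.29, 138.32   # desired center line and upper control limit
  n = 5                      # assumed subgroup size -- replace with your own

  # Rearranging UCL = CL + 3*s/sqrt(n) gives s = (UCL - CL) * sqrt(n) / 3
  s = (UCL - CL) * math.sqrt(n) / 3
  print(f"standard deviation s = {s:.2f}")
  print(f"check: UCL = {CL + 3 * s / math.sqrt(n):.2f}, LCL = {CL - 3 * s / math.sqrt(n):.2f}")

The mean and the value of s computed this way are what you would type into the Parameters tab.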

 

With the center line and control limits set from the stable process (by either of the methods described above), the chart now reveals that the new process is out of control.

As you can see, it's important to consider whether you are using the best center line and control limits for your control charts. Making sure you're using the best options, and setting the center line and control limits manually when desirable, will make your control charts even more beneficial.

 


Birds Versus Statisticians: Testing the Gambler's Fallacy


by Matthew Barsalou, guest blogger

Recently Minitab’s Joel Smith posted a blog about an incident in which he was pooped on by a bird. Twice. I suspect many people would assume the odds of it happening twice are very low, so they would incorrectly assume they are safer after such a rare event happens.

I don’t have data on how often birds poop on one person, and I assume Joel is unwilling to stand under a flock of berry-fed birds waiting to collect data for me, so I’ll simply make up some numbers for illustration purposes only.

Joel, that bird's got that look in his eye again....

Suppose there is a 5% chance of being pooped on by a bird during a vacation. That means the probability of being pooped on is 0.05. The probability of being pooped on twice during the vacation is 0.0025 (0.05 x 0.05) or 0.25%, and the probability of being pooped on three times is 0.000125 (0.05 x 0.05 x 0.05).

Joel has already been pooped on twice. So what is the probability of our intrepid statistician being pooped on a third time?

The probability is 0.05. If you said 0.000125, then you may have made a mistake known as the Gambler’s Fallacy or the Monte Carlo Fallacy. This fallacy refers to the mistaken belief that things will average out in the short term. A gambler who has suffered repeated losses may incorrectly assume that the recent losses mean a win is due soon. Things will balance out in the long term, but the odds do not reset after each event. Joel could correctly conclude that the probability of a bird pooping on him during his vacation is low and that the odds of being pooped on twice are much lower. But being pooped on one time does not affect the probability of it happening a second time.
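
If the independence argument feels slippery, a quick simulation makes it concrete. This is only a sketch: the 5% chance is the made-up number from above, and treating each vacation as three independent chances to be hit is an added simplifying assumption.

  import numpy as np

  rng = np.random.default_rng(1)
  p = 0.05                       # made-up chance of any single bird strike
  n = 5_000_000                  # simulated vacations, each with three chances

  hits = rng.random((n, 3)) < p
  first_two = hits[:, 0] & hits[:, 1]

  print("P(hit twice)        :", first_two.mean())            # about 0.0025
  print("P(third | first two):", hits[first_two, 2].mean())   # about 0.05, not 0.000125

Among the simulated vacations where the first two strikes happened, the third still happens about 5% of the time.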

There is a caveat here. The probabilities only apply if the meeting of poop and Joel are random events. Perhaps birds, for reasons understood only by birds, have an inordinate fondness for Joel. Our probability calculations would no longer apply in such a situation. This would be like calculating the probabilities of a coin toss when there is some characteristic that causes the coin to land more on one side than on the other.

We can perform an experiment to determine if Joel is just a victim of the odds or if there is something that makes the birds target him. The generally low occurrence rate would make it difficult to collect enough data in a reasonable amount of time under normal conditions, so an experiment is the way to go. We could send Joel to a bird sanctuary for two weeks and record the number of times he is pooped on. Somebody of approximately the same size and appearance as Joel could be used as a control. Both Joel and the control should be dressed the same to ensure that birds are not targeting a particular color or clothing brand. The table below shows the hypothetical results of our little experiment.

We can see that Joel was hit 99 times, while the control was only hit 80 times. But does this difference mean anything? To find out, we can use Minitab Statistical Software to determine if there is a statistically significant difference between the number of times Joel was hit and the number of times the control was hit.

Enter the data into Minitab and then go to Stat > Basic Statistics > 2-Sample Poisson Rate and select “Each sample is in its own column.” Go to Options and select “Difference > hypothesized difference” as the alternative hypothesis for a one-sided, upper-tailed test. The resulting p-value shown in the output below is 0.078. That's greater than the alpha of 0.05, so we fail to reject the null hypothesis. Although there was a higher occurrence rate for Joel, we have no reason to think that birds are especially attracted to him.

Test and CI for two-sample Poisson rates output
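
I can't reproduce Minitab's exact procedure here, but a normal-approximation version of the one-sided two-sample Poisson rate test, sketched below in Python with the counts from the table (99 and 80 over equal observation periods), lands on essentially the same p-value.

  import math
  from scipy.stats import norm

  x1, x2 = 99, 80        # hits on Joel and on the control
  t1 = t2 = 1.0          # equal lengths of observation

  # Test H0: rate1 = rate2 against H1: rate1 > rate2 (unpooled normal approximation)
  diff = x1 / t1 - x2 / t2
  se = math.sqrt(x1 / t1**2 + x2 / t2**2)
  z = diff / se
  p_value = norm.sf(z)   # upper-tailed p-value

  print(f"z = {z:.2f}, p = {p_value:.3f}")   # roughly z = 1.42, p = 0.078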

Joel is well aware of the Gambler’s Fallacy, so we can be assured that he is not under a false sense of security. He must know the probability of his getting struck a third time has not changed. But has he considered that these may not be random events? The experiment described here was only hypothetical. Perhaps Joel should consider wearing a sou’wester and raincoat the next time he takes a vacation in the sun.

 

About the Guest Blogger

Matthew Barsalou is an engineering quality expert in BorgWarner Turbo Systems Engineering GmbH’s global engineering excellence department. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.

Tom Brady and the Danger of Selective Endpoints


Last Friday I had an interesting tweet come across my Twitter feed.

Tweet

And that was before the Patriots failed to cover their first playoff game of 2015 against the Ravens. When you include that, the record becomes 3-11, good for a winning percentage of only 21%! With the Patriots set to play another playoff game against the Colts, it seems like the smart thing to do is to bet the Colts to cover. But wait, 14 games is a pretty small sample. We should do a hypothesis test to determine whether this percentage is significantly less than 50%.

1 Proportion Test

Minitab Statistical Software returns a p-value of 0.029, which is less than the alpha value of 0.05, so at the 0.05 significance level we conclude that the true percentage of playoff games the Patriots cover is less than 50%. Great! Now it’s time to get my ATM card and bet a mortgage payment on the Colts. Thank you, statistics!
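
For anyone who wants to check the arithmetic outside of Minitab, the exact (binomial) version of this one-sided 1 Proportion test is easy to sketch in Python. Whether Minitab used the exact or the normal-approximation method here isn't shown, so treat this as an illustration.

  from scipy.stats import binomtest   # requires SciPy 1.7 or newer

  # 3 covers in 14 playoff games; is the true cover rate below 50%?
  result = binomtest(k=3, n=14, p=0.5, alternative="less")
  print(round(result.pvalue, 3))      # about 0.029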

But wait, there is one more question I should probably ask pertaining to that tweet.

Why only the last 13 games?

Date        Patriots Opponent   Spread   Score     Cover the Spread?
1/19/2014   @ Denver            +5       L 16-26   L
1/11/2014   Indianapolis        -7.5     W 43-22   W
1/20/2013   Baltimore           -8       L 13-28   L
1/13/2013   Houston             -9.5     W 41-28   W
2/5/2012    New York Giants     -3       L 17-21   L
1/22/2012   Baltimore           -7       W 23-20   L
1/14/2012   Denver              -14      W 45-10   W
1/16/2011   New York Jets       -9.5     L 21-28   L
1/10/2010   Baltimore           -3.5     L 14-33   L
2/3/2008    New York Giants     -12.5    L 14-17   L
1/20/2008   San Diego           -14      W 21-12   L
1/12/2008   Jacksonville        -13.5    W 31-20   L
1/21/2007   Indianapolis        +3.5     L 34-38   L
1/14/2007   San Diego           +5       W 24-21   W
1/7/2007    New York Jets       -9.5     W 37-16   W

Here are the last fifteen games the Patriots played prior to the tweet (so, not including the most recent Baltimore game). Pay attention to the 13th, 14th, and 15th games, the bottom three rows of the table. All three of these games were played one week apart, but the 13th game was included in the tweet, while the 14th and 15th games were conveniently left off.

Why? Because 3-10 against the spread sounds more impressive than 5-10.

This is using selective endpoints to manipulate statistics to help prove your point. It’s this kind of thing that leads people to say, “There are three kinds of lies: lies, damned lies, and statistics.” The conclusions you can draw from your statistical analysis are only as good as the data behind them. That’s why you should always make sure you collect a random, unbiased sample. And before you believe the conclusions made by others, make sure they collected their data correctly too!

Tom Brady -- Keith Allison. Used under Creative Commons Attribution-ShareAlike 2.0In our Patriots situation, we could go back and look at every playoff game the Patriots have played in. But I don’t think their games in 1963 have any effect on their games this season. So instead, the best thing to do is to associate this Patriots team with Tom Brady. So we should sample all the playoff games that Tom Brady has played in. That includes the 16 previous games (in which he went 5-11 against the spread) and 11 games he played before 2007 (in which he went 6-4-1). This gives us a final record of 11-15-1, which is a winning percentage of 42%.

Once we obtain a legitimate sample of data, we see that Tom Brady and the Patriots’ record against the spread in the playoffs isn’t nearly as bad as we were originally led to believe. While 42% is still less than 50%, it is no longer significantly different.

1 Proportion Test

So could the Patriots still fail to cover against the Colts this weekend? Of course. But I'm not going to go bet a mortgage payment on it.  

 

Photo of Tom Brady by Keith Allison, used under Creative Commons 2.0.

How to Talk to Your Kids about Quality Engineering, Part 2: Statistical Sampling


I really can’t make this stuff up.

I wrote a post a couple of years ago entitled: “How to Talk to Your Kids about Six Sigma and Quality Improvement,” in which I lamented about “Community Hero” day in my daughter’s 1st grade class and the need to explain to her why I wasn’t at the "community-hero-level" of Maggie’s Mommy, the pediatrician.

Pong

Well, now it's two years later and my son, Thomas, is in class with Maggie’s little brother, Sam. Seriously. Again. Instead of Community Hero Day, however, the classroom had Engineering Day. Step aside, Maggie’s Mommy!

Or maybe not. Enter Maggie and Sam’s Daddy, the aerospace engineer.

Seriously, this family.

After a fun-filled Engineering Day, Sam and Thomas were excitedly telling me about all the fun things they had learned. “You design things, you build things—YOU MAKE THINGS—IT’S SO AWESOME!”  Every engineer dreams of the day when their child’s eyes light up at the thought of building things. Nothing, not even a trip to Disney World, tops the feeling of knowing you may have brought an engineer into this world. This was it. I did it.  I proudly exclaimed “I’m an engineer!” 

And, then….

Thomas: “Weeeeell, not really….”

Me: “No, I am”

Thomas: “Well….you really don’t make things…real things. You make software. That’s not REAL.”

Sam: “My daddy makes planes.”

Thomas: “See, Sam’s daddy is a REAL engineer.”

Me: “I’m a REAL engineer.”

My daughter, Emilia, came to my rescue: “Thomas, Mommy was a real engineer. Mommy, tell them about the old days when you were a real engineer.”

And. Breathe. This was not happening again. Software Quality Engineering is REAL and I intended to prove it.

When my kids bounced out of bed the next day, they did what is now second nature. They grabbed for, of course, their iPad. But it wasn’t there.

“Where’s my iPad?”

“Sorry, only REAL things today!”

“What?”

 “But, what are we going to do?”

“Real things.”

I really had underestimated the joy that this would bring me.

And so, to prove to them that a majority of the “things” they do during the day rely on software, I introduced statistical sampling. We talked about pulling a random sample of their toys to provide a picture of the entire population.

And so they headed to the toy room and did just that. Then we entered the data in the statistical software I help engineer. The results: 67.7% were either software or involved the use of software.
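
For the curious, the estimate itself is nothing fancy. Here's a toy sketch of the same idea in Python, with an entirely made-up toy inventory and an arbitrary sample size.

  import random

  # Made-up population: True means the toy is software or needs software to work
  toys = [True] * 210 + [False] * 100
  sample = random.sample(toys, 30)    # simple random sample

  print(sum(sample) / len(sample))    # sample proportion estimates the population share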

This was not the world I grew up in. I vaguely recall the rich kids having Pong back in the day, but, aside from that, nothing that needed software. Sticks, rocks and, if you didn’t cause trouble, a bike. Gangs of kids on Huffy bikes—that was my childhood.

Dependence of Toys on Software

So, our sample provided a picture and a couple of very good lessons:

1) Software Engineering is real. Very real.

and

2) My kids rely far too much on it.

As any respectable engineer would do, I adjusted based on the result. This holiday season, my children unwrapped chemistry sets, architect sets, books, and, because we can’t stop cold turkey, a little technology.

So, did they get their iPads and video games back? Of course. But not until I made them sit through several updates that we had been putting off.

And, did the long wait during the updates frustrate them and impact their user experience? Yep.

Hmmm...if only there were people to do something about that.

Reducing the Size of Your General Full Factorial Design


You may have been in a situation where you had created a general full factorial design and noticed that your design’s run size was higher than you imagined. (Quick refresher: a general full factorial design is an experimental design where any factor can have more than 2 levels).

Determined to minimize the monstrous size of your worksheet, you go back to Stat > DOE > Factorial > Create Factorial Design and go through all of the sub-menus to select a fractional factorial design. A fractional factorial design would allow you to select only a subset or "fraction" of the runs from your design to analyze.

You’d quickly find that there is no option to select a fractional subset for a general full factorial design. These fractional designs are only available in designs where each factor has 2 levels. We’ll need to start looking elsewhere for design reduction.  

Fortunately, after creating a general full factorial design, a menu item ‘Select Optimal Design’ under DOE > Factorial > Create Factorial Design is now available in Minitab's statistical software. Select Optimal Design will allow you to select the "best" design points by reducing the number of experimental runs in the original design.

To obtain the best subset of your base set, you’ll need to focus on two things: points and terms. 

Number of Points in Optimal Design

Enter in the number of runs you would like to subset out of your original design. For example, if your original design contains 64 runs, and you only have enough resources to capture 32 runs of data, enter 32.

Terms

You’ll see a Terms… sub-menu. In here, you’ll see a model terms selection box. You’ll need to select an initial model prior to finding an optimal design. If you're looking to eventually analyze main effects and 2-way interactions, then you could select 2 from the 'Include terms in the model up through order' drop-down.

After clicking "OK" in each dialog window, Minitab will generate a new worksheet in your project, containing your optimal subset.

Optimality Measures

In your session window, you’ll see a table entitled Optimal Design:

The first section details the row numbers that were selected from your original general full factorial design. The second section is a listing of optimality measures, which are useful in comparing different designs to see which one is more appropriate. Let’s focus on D-optimality, as that is the measure Minitab optimizes for.

D-optimality

A D-optimal design minimizes the variance of the estimated regression coefficients of the fitted model. Larger values of the D-optimality measure are desirable. The maximum possible value is n^(df), where n is the number of experimental runs and df is the degrees of freedom for all terms in your selected model plus the constant.
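
To make the D-optimality measure less abstract, here is a rough Python sketch that picks the subset of runs maximizing det(X'X) for a small, made-up design. It treats the factors as numeric at coded levels and uses brute-force search, so it only illustrates the criterion, not Minitab's actual exchange-based algorithm or its handling of categorical factors.

  import numpy as np
  from itertools import product, combinations

  # Made-up candidate set: one 3-level and two 2-level factors (12 runs)
  runs = np.array(list(product((-1, 0, 1), (-1, 1), (-1, 1))), dtype=float)

  def model_matrix(design):
      # constant, main effects, and 2-way interactions (factors treated as numeric)
      A, B, C = design.T
      return np.column_stack([np.ones(len(design)), A, B, C, A * B, A * C, B * C])

  def d_value(design):
      X = model_matrix(design)
      return np.linalg.det(X.T @ X)   # D-criterion: larger is better

  # Brute-force search for the best 8-run subset of the 12 candidate runs
  best = max(combinations(range(len(runs)), 8),
             key=lambda idx: d_value(runs[list(idx)]))
  print("selected rows:", [i + 1 for i in best])
  print("det(X'X)     :", d_value(runs[list(best)]))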

Where to go from here?

You may discover that after settling on a specific optimal design, you have room to collect 5 more runs from your original data set. You could select the task ‘Augment/Improve’ within the Create Optimal Design menu to incorporate 5 new runs into your optimal design. This requires that you return to your original design and create an indicator column to protect the points that were selected in the first optimal design. To protect these points, you need to use negative indicators such as “-1.”  The rest of the points can be labelled “0”, meaning that they can be excluded when the design is reduced.

Selecting the task “Evaluate Design” under the Create Optimal Design menu can yield other benefits as well. This allows you to view how your D-optimality measures would change if you wanted to keep using the same optimal design points, but alter the model terms selected. You would use an indicator column for this scenario as well, so that Minitab knows which runs from the original data set were used to create your optimal design. For both “Augment…” and “Evaluate…”, you compare your newly obtained D-Optimality measures to your old ones and see if they have improved.

I hope this information aids you the next time you want to reduce the size of your general full factorial designed experiment!

You might also like:

Gummi Bear DOE: General Full Factorial Designs

 

Plot the State of the Union Global Temperature Data in Minitab


President Obama delivers the 2015 State of the Union address.In the State of the Union Address, President Obama said:

“No challenge — no challenge — poses a greater threat to future generations than climate change. 2014 was the planet’s warmest year on record. Now, one year doesn’t make a trend, but this does — 14 of the 15 warmest years on record have all fallen in the first 15 years of this century.”

This follows the joint announcement by NASA and NOAA on January 16th indicating their agreement with the Japan Meteorological Agency that 2014 was the warmest year on record. If you haven't noticed them already, that means that you're going to see a lot of charts of average temperatures over time. If you're part of the DIY crowd, then you'll want to make your very own chart to look at the data in Minitab's statistical software.

First things first, you'll need to get the data. There are a lot of different data sets out there with different baselines and units. I'm going to pick this text file from NASA because it has some great things I can show you about opening the data in Minitab. The numbers in the data set represent the difference from the absolute global mean for 1951-1980 (14 degrees Celsius), in hundredths of a degree Celsius. A value of 68 represents a year with an average temperature of 14.68 degrees Celsius.  

Unfortunately, when you're working with someone else's data, the format's never quite right. Fortunately, Minitab can do most of the work for you at the same time that you open the data. In the following steps, you can see how to get column titles that aren’t the first row in a file and how to cut out footnotes that aren’t part of the data.

  1. Choose File > Open Worksheet.
  2. Change Files of type to Text (*.txt) and select the file that you downloaded from NASA.
  3. Click Options.
  4. In Variable Names, select Use row and enter 8.
  5. In Number of data rows, enter 141.
  6. Check Ignore blank data rows.
  7. In Field Definition, select Free format. Click OK.

A preview of your data is always a good idea before you click Open. With this data, the preview is especially important. Because the month headers are repeated in the data set, Minitab detects that several of the columns are lists of months instead of differences from 14 degrees Celsius. These columns are assigned the Date/Time format.

The column C2 should have numeric values in it, but shows abbreviations for months instead.

 In the preview, you can specify the correct data formats for columns.

  1. Click Preview.
  2. For columns C2 to C13, select Numeric.
  3. Click OK, then click Open.

The dataset is almost ready, but it still includes the column headers that were repeated. (Scroll down to row 22 and you’ll see them represented as numbers.)

Repeated month headers are incorrect data that needs to be cleaned.

To eliminate the repeated headers quickly, we can subset the worksheet.

  1. Choose Data > Subset Worksheet.
  2. In Include or Exclude, select Specify which rows to exclude.
  3. Click Condition.
  4. In Condition, enter ‘Jan’ > 42000.
  5. Click OK.
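
If you'd rather wrangle the same file outside of Minitab, the steps above translate fairly directly to pandas. This is a sketch only: the file name is a placeholder for your downloaded copy, the column name "Jan" mirrors the file's header row, and the skiprows/nrows values match the settings above but may need adjusting if NASA changes the layout.

  import pandas as pd

  df = pd.read_csv(
      "nasa_global_temps.txt",   # placeholder name for the downloaded NASA file
      sep=r"\s+",                # free-format (whitespace-delimited) fields
      skiprows=7,                # variable names are on row 8
      nrows=141,                 # number of data rows
      skip_blank_lines=True,
  )

  # Repeated header rows inside the data read in as text; coerce everything to
  # numbers and drop the rows that fail to convert (the stray headers).
  df = df.apply(pd.to_numeric, errors="coerce").dropna(subset=["Jan"])

From there, a plot of annual values against year gives a chart much like the one below.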

Now you’re all set to make your own charts of temperature over time. The most straightforward chart to make is probably a time series plot, like this one:

Red symbols show the 15 highest values in the series.

While I can't be certain the president was referring to this particular NASA dataset, the trend holds here. Red symbols mark the 15 highest temperatures in the series; all but one fall in this century, and the lone older exception is 1998.

The image of the President Obama delivering the 2015 State of the Union address is by Pete Souza and is licensed under this Creative Commons License.
