
Operational Definitions: The First Step in a Statistical Analysis (Even after the Apocalypse)


By Matthew Barsalou, guest blogger.  

Minitab Statistical Software can assist us in our analysis of data, but we must make judgments when selecting the data for an analysis. A good operational definition can be invaluable for ensuring the data we collect can be effectively analyzed using software.

Dr. W. Edwards Deming explains in Out of the Crisis (1989), “An operational definition of safe, round, reliable, or any other quality must be communicable, with the same meaning to vendor as to purchaser, same meaning yesterday and today to the production worker.” Deming goes on to tell us that an operational definition requires a specific test, a judgment criterion, and a decision criterion to determine whether something meets the criteria.

The concept of operational definitions crossed my mind when I read Todd VanDerWerff’s review of Mad Max: Fury Road at Vox.

VanDerWerff presented an illustration of the percentage of time individual Mad Max movies contain a chase scene, based on data from the Internet Movie Database. I have recreated the illustration below as a bar chart using Minitab.

I first typed the data into a Minitab worksheet as shown below:

I then stacked the data by going to Data > Stack > Columns… and selecting columns C1-C4. Next, I relabeled column C1-T as “Film” and column C2 as % Chase.”

Then I went to Graph > Bar Chart and selected “Values from a table” and a “Simple” bar chart. The graph variables were % Chase and the categorical variable was Film. I clicked on the resulting bar chart and then right clicked and selected Add > Data labels. The resulting bar chart is shown below:

Chart of Percent Chase
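If you want to reproduce a chart like this outside of Minitab, a minimal sketch in Python with matplotlib might look like the following. Only Fury Road's 32% figure comes from the post; the other percentages are placeholders for illustration.

    import matplotlib.pyplot as plt

    # Percent of running time spent in chase scenes, by film.
    # Only the 32% for Fury Road is from the post; the rest are placeholders.
    films = ["Mad Max", "The Road Warrior", "Beyond Thunderdome", "Fury Road"]
    pct_chase = [20, 25, 15, 32]

    fig, ax = plt.subplots()
    bars = ax.bar(films, pct_chase)
    ax.bar_label(bars)              # data labels, like Add > Data Labels (matplotlib 3.4+)
    ax.set_ylabel("% Chase")
    ax.set_title("Chart of Percent Chase")
    plt.show()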

As a connoisseur of the Mad Max series, I was rather shocked to see that Mad Max: Fury Road consisted of only 32% chase scenes. I would have estimated 90-95% chase scenes! VanDerWerff explains, “We're skewing toward the conservative side here and only counting scenes where the characters are in the thick of a really contentious chase, where either side might prevail.” Obviously, we are using different criteria to identify a chase scene. VanDerWerff is close to an operational definition; however, “where either side might prevail” could still be open to interpretation and is therefore inadequate as an operational definition.

In Twenty Things You Need to Know (2009), Wheeler lists three questions that can serve as a framework for an operational definition:

  1. What do you want to accomplish?
  2. By what method will you accomplish your objective?
  3. How will you know you have accomplished your objective?

Answering Wheeler’s three questions can help us define an operational definition for chase scenes in Mad Max: Fury Road. We want to identify chase scenes in the movie. We will use a calibrated stopwatch that can resolve 1/100th of a second to record the start and stop time of each chase, where a chase is defined as “the time from when a chasing party first appears on screen at a range of 1,800 meters or less from the chased party until the chasing party is seen to be more than 1,800 meters away from the chased party, or until the last scene in which the chasing party appears.” The total chase time is then divided by the total length of the movie and multiplied by 100. The objective is accomplished after the last credit appears on the screen at the end of the movie.
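To make the arithmetic in that definition concrete, here is a minimal Python sketch. The chase intervals and movie length are hypothetical, not measurements from the film.

    # Percent chase = (total chase time / total movie length) * 100,
    # using start/stop times captured per the operational definition.
    def percent_chase(chase_intervals, movie_length_s):
        """chase_intervals: list of (start, stop) times in seconds."""
        total_chase = sum(stop - start for start, stop in chase_intervals)
        return 100 * total_chase / movie_length_s

    # Hypothetical example: three chases in a 120-minute movie
    intervals = [(300.00, 1200.50), (3600.25, 4500.75), (6000.00, 6300.00)]
    print(round(percent_chase(intervals, 120 * 60), 1))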

Such an operational definition makes it clear what should be considered a chase scene. Notice that the operational definition refers to “chased parties” and not “chased vehicles”? This operational definition would count foot chases as chase time. Without an operational definition, one evaluator may include foot chases while another ignores them.

Tina Turner tells us, “We don’t need another hero.” Perhaps, but what we do need is a good operational definition if we want to correctly collect data for a statistical analysis.

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.
   

  

Low-resolution poster image displayed under fair use. Copyright is believed to belong to the distributor of the item promoted, Warner Bros. Pictures.


How to Explore Interactions with Line Plots


The line plot is an incredibly agile but frequently overlooked tool in the quest to better understand your processes.

In any process, whether it's baking a cake or processing loan forms, many factors have the potential to affect the outcome. Changing the source of raw materials could affect the strength of plywood a factory produces. Similarly, one method of gluing this plywood might be better or worse than another.

But what is even more complicated to consider is how these factors might interact. In this case, plywood made with materials from supplier “A” might be strongest when glued with one adhesive, while plywood made with materials from supplier “B” might be strongest when glued with a different adhesive.

Understanding these kinds of interactions can help you maintain quality when conditions change. But where do you begin? Try starting with a line plot.

The Line Plot Has Two Faces

Line plots created with Minitab Statistical Software are flexible enough to help you find interactions and response patterns whether you have 2 factors or 20. But while the graph is always created the same way, such changes in scale produce two seemingly distinct types of graph.

With just a few groups…the focus is on interaction effects. In the graph below, a paint company that wants to improve the performance of its products has created a line plot that finds a strong interaction between spray paint formulation and the pressure at which it’s applied.
Line Plot 1

An interaction is present where the lines are not parallel.

With many groups…the focus is on deviations from an expected response profile. (That's why in the chemical industry this is sometimes called a profile graph.) The line plot below shows a comparison of chemical profiles of a drug from three different manufacturing lines.

Many Groups

Any profile that deviates from the established pattern could suggest quality problems with that production line, but these three profiles look quite similar.

More Possibilities to Explore

If you’re an experienced Minitab user, these examples may seem familiar. In its various incarnations, the line plot is similar to the interaction plot, to "Calculated X" plots used in PLS, and even to time series plots that appear with more advanced analyses. But the line plot gives you many more options for exploring your data. Here’s another example.

explore the mean

A line plot of the mean sales from a call center shows little interaction between the call script and whether the operators received sales training because the lines are parallel.

explore standard deviation

But because the line plot allows us to examine functions other than the mean, we can see that there is, in fact, an interaction effect in terms of the standard deviation: the lines are not parallel. For some reason, the variability in sales seems to be affected by the combination of script and training.

How to create a line plot in Minitab

Creating a line plot in Minitab is simple. For example, suppose that your company makes pipes. You’re concerned about the mean diameter of pipes that are produced on three manufacturing lines with raw materials from two suppliers.

Example with Symbols

Because you’re examining only two factors, line and supplier, a With Symbols option is appropriate. Use the Without Symbols options when you have many groups to consider, because symbols may clutter the graph. Within these categories, you have your choice of data arrangement.

Choose Graph > Line Plot > With Symbols, One Y.
Click OK.

example variables

Now, enter the variables to graph. Note that Line Plot allows you to graph a number of different functions apart from the mean.

In Graph variables, enter 'Diameter'.
In Categorical variable for X-scale grouping, enter Line.
In Categorical variable for legend grouping, enter Supplier.
Click OK.

Line Plot of Diameter

The line plot shows a clear interaction between the supplier and the line that manufactures the pipe.
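If you ever need to sketch this kind of interaction view outside of Minitab, a rough equivalent in Python with pandas and matplotlib is shown below. The data here are simulated, and the column names Diameter, Line, and Supplier are simply borrowed from the example above.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Simulated pipe data: 3 manufacturing lines x 2 suppliers, 10 pipes each
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "Line": np.repeat(["Line 1", "Line 2", "Line 3"], 20),
        "Supplier": np.tile(["Supplier A", "Supplier B"], 30),
    })
    df["Diameter"] = 5 + rng.normal(0, 0.05, len(df))

    # Mean diameter for each Line/Supplier combination, one trace per supplier.
    # Swap "mean" for "std" to explore the interaction in the variability instead.
    means = df.groupby(["Line", "Supplier"])["Diameter"].mean().unstack("Supplier")
    means.plot(marker="o")          # non-parallel lines suggest an interaction
    plt.ylabel("Mean Diameter")
    plt.show()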

Putting line plots to use

The line plot is an ideal way to get a first glimpse into the data behind your processes. The line plot resembles a number of graphs, particularly the interaction plots used with DOE or ANOVA analyses. But, while the function of line plots may be similar, their simplicity makes them an especially appropriate starting point.

It can highlight the variables and interactions that are worth exploring. Its powerful graphing features also allow you to analyze subsets of your data or to graph different functions of your measurement variable, such as the standard deviation or count.

A Closer Look at Probability and Survival Plots


I recently fielded an interesting question about the probability and survival plots in Minitab Statistical Software's Reliability/Survival menus:

Is there a one-to-one match between the confidence interval points on a probability plot and the confidence interval points on survival plot at a specific percentile?

Now, this may seem like an easy question, given that the probabilities on a survival plot are simply 1 minus the failure probabilities on a probability plot at a specific time t or stressor (in the case of Probit Analysis, used for our example below).

This can be seen here, at the 10th percentile:

The probability plot is saying that at a voltage of 113.25, 10% of your items are failing. Conversely, the survival plot will show that 90% of your items will survive at that same voltage.
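The complement relationship is easy to check numerically. Here is a minimal sketch with scipy that uses an arbitrary normal distribution, not the probit model fitted in this example.

    from scipy.stats import norm

    # Hypothetical distribution for voltage at failure (not the fitted model above)
    dist = norm(loc=120, scale=5)

    v = 113.25
    p_fail = dist.cdf(v)       # probability of failing at or below this voltage
    p_survive = dist.sf(v)     # survival function = 1 - cdf
    print(p_fail + p_survive)  # always 1.0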

How do the graphs compare when adding confidence intervals to both graphs? Before we get our hands dirty with this, let’s first review some terms and methods to get us comfortable enough to proceed further.

Reliability/Survival Analysis

This is the overarching classification of tools within Minitab that help with modeling life data. Distribution Analysis, Repairable Systems Analysis, and Probit Analysis fall within this category.

Probit Analysis

This analysis will be used as our example today. Probit analysis is used when you want to estimate percentiles and survival probabilities of an item in the presence of a stress. The response is required to be binomial in nature (go/no go, pass/fail). One example of a probit analysis could be testing light bulb life at different voltages.

Since the response data are binomial, you’d have to specify what would be considered an event for that light bulb at a certain voltage. Let’s say the event is a light bulb blowing out before 800 hours.

Excerpt of data set

Blows (The Event)   Trials   Volts
                2       50     108
                6       50     114
               11       50     120
               45       50     132

Probability Plot

This graph plots each value against the percentage of values in the sample that are less than or equal to it, along a fitted distribution line (middle line). In probit analysis, it helps determine what percentage of bulbs fail before 800 hours at a given voltage.

Survival Plot

This graph displays a plot of the survival probabilities versus time. Each plot point represents the proportion of units surviving at time t. In probit analysis, it helps determine what percentage of bulbs survive beyond 800 hours at a given voltage.

Back to the original question…

Can we take a value along the CI of a probability plot and find its corresponding value on the CI of the survival plot at a specific percentile? Here are the confidence interval values for the percentile at 113.246:


If we add the above confidence interval values for the 10th percentile to the survival plot, you'll see that they don’t quite equal what’s shown at 90%. They’re a little off:

The Reason

In our probability plot, the confidence interval is calculated with the parameter of interest being the percentile. Let’s look at the 10th percentile again:

 

Our 95% CI (111.302 to 114.779) is around the value of 113.246 volts. In our survival plot, the confidence interval is calculated around the probability of survival. You can see this in the Session window under the Table of Survival Probabilities, which shows the 95% CI around the survival probability of 0.90 for a voltage of 113.246:

Here’s another look at our survival plot with our aforementioned survival probabilities added:

 

They all nicely fit on one straight vertical line at voltage = 113.246. 

This all being said, you can convert the lower bound or upper bound of a percentile to a point on a survival plot. Let’s say we look at the lower bound for 113.246 (which is 111.302). We’d first have to find the survival probability for that value:

Now let’s look at that table of survival probabilities for 0.90 again:

Notice that the survival probability for the lower CI of 113.246 ends up being the upper bound of the survival probability of 0.90. Given that the survival probabilities are one minus the failure probabilities, it makes sense that you'd have to look at the upper bound of a survival plot when analyzing the lower bound of a probability plot. 

I hope this post helps you develop a deeper understanding of the relationship between our probability and survival plots—and I hope it wasn't too technical!

Please check out these other posts on Reliability/Survival:

Probit Analysis: Down Goes the Meathouse!

The Care and Feeding of Capital Equipment (with Reliability Statistics)

3 Features to Make You Glad You're You When You Have to Clean Data in Minitab


When someone gives you data to analyze, you can gauge how your life is going by what you've received. Get a Minitab file, or even comma-separated values, and everything feels fine. Get a PDF file, and you start to think maybe you’re cursed because of your no-good-dirty-rotten-pig-stealing-great-great-grandfather and wish that you were someone else. For those of you who might be in such dire straits today, here are 3 helpful things you can do in Minitab Statistical Software: change data type, code and remove missing values, and recode variables.

For the purposes of having an example, I’m going to use some data from the Centers for Medicare and Medicaid Services. The data are from October 2008 to September 2009 and track the quality of a hospital’s response to a patient with pneumonia. The data in the PDF file look like this:

The PDF file has header text and a nicely formatted table.

If you copy and paste it into Minitab, hoping for nicely-organized tables as appear in the document, you get a single column that contains everything:

The header text and the table content are all in one column.

Don’t despair. Instead, look at the capabilities that are at your fingertips.

Change Data Type

What we’re really after for analysis are the numbers inside the table, so a good first step is to get the numbers.

  1. Choose Data > Change Data Type > Text to Numeric.
  2. In Change text columns, enter C1.
  3. In Store Numeric Columns in, enter C2.
  4. Click OK. In the Error box, click Cancel.

When you look at the worksheet, the cells that had text values after the paste are now missing value symbols and the numbers that were in the tables remain. You might be a bit unnerved that the percentages of patients who received treatments are all 1, but that’s only a result of the column formatting. (Want to see? Change the numeric display format.)

Remove missing values

You can easily get rid of the missing values in these data so that the missing values don’t interfere with further analysis, but there’s an additional complication here. While most of the missing values are column headers that we don’t want in the data, the table itself contains some missing values. Anytime a hospital gave a treatment to fewer than 10 patients, the table contains the value “Low Sample (10 or less).” To preserve these missing values while eliminating the others, we want to use different values to represent the different cases in the data.

  1. Choose Calc > Calculator.
  2. In Store Result in Variable, enter C3.
  3. In Expression, enter If(Left(C1,3)="Low", 99999999, C2).
  4. Click OK.

Now that you have two kinds of missing value, you can start cleaning them up. First, get rid of the ones that don’t represent values in the table.

  1. Choose Data > Copy > Columns to Columns.
  2. In Copy from columns, enter C3.
  3. In Store Copied Data in Columns, select In current worksheet, in columns and enter C4.
  4. Click Subset the Data.
  5. In Specify Which Rows to Include, select Rows that match and click Condition.
  6. In Condition, enter C3 <> '*'.
  7. Click OK in all of the dialog boxes.

Now that we’ve gotten rid of the missing values that weren’t numbers in the table, we can change the missing values that we kept back to a form Minitab recognizes.

  1. Choose Calc > Calculator.
  2. In Store result in variable, enter C5.
  3. In Expression, enter If(c4 = 99999999, '*', c4).
  4. Click OK.
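If you were doing this same cleanup in Python with pandas, a rough sketch of the idea (a sentinel for the "Low Sample" rows, everything else converted to numbers) might look like this. The column names are assumptions, not names from the CMS file.

    import pandas as pd

    # c1 holds the raw pasted text, one value per row (column name assumed)
    df = pd.DataFrame({"c1": ["Header text", "0.97", "Low Sample (10 or less)", "0.88"]})

    # Text to numeric: anything that isn't a number becomes NaN
    df["c2"] = pd.to_numeric(df["c1"], errors="coerce")

    # Preserve "Low Sample" rows with a sentinel, like the 99999999 used above
    low = df["c1"].str.startswith("Low")
    df["c3"] = df["c2"].mask(low, 99999999)

    # Drop the header-text rows, then turn the sentinel back into a missing value
    clean = df.loc[df["c3"].notna(), "c3"].replace(99999999, pd.NA)
    print(clean)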

Recode the data

For analysis, we want one row for each hospital. To do this, we’ll create a table in the worksheet that shows how to identify the variables for analysis, then unstack the variables.

Because we kept the missing values from the table, every hospital has 9 variables. We make a table in the worksheet that shows the numbers 1 to 9 and a name for each variable:

A table with number codes and labels that you want for the variables.

To associate the variable names with all 1,944 rows of data, we’ll make patterned data.

  1. Choose Calc > Make Patterned Data > Simple Set of Numbers.
  2. In Store patterned data in, enter C8.
  3. In From first value, enter 1.
  4. In To last value, enter 9.
  5. In Number of times to list sequence, enter 216.
  6. Click OK.

To convert the number codes to the text variable descriptions, we’ll recode the data.

  1. Choose Data > Code > Use Conversion Table.
  2. In Code values in the following column, enter C8.
  3. In Current values, enter C6.
  4. In Coded values, enter C7.
  5. Click OK.

Now that you have a column that says which number belongs to each variable, unstack the data.

  1. Choose Data > Unstack Columns.
  2. In Unstack the data in, enter C5.
  3. In Using subscripts in, enter C9.
  4. Click OK.

Now, you have a new worksheet where each hospital is identified by its unique CCN and the variables are the proportions of pneumonia patients who got each treatment from that hospital.
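For comparison, the patterned-data, recode, and unstack sequence has a compact analogue in pandas: build the repeating variable labels, then pivot. This is only a sketch, with made-up hospital IDs and placeholder values.

    import numpy as np
    import pandas as pd

    # The cleaned measurements, stacked: 9 variables per hospital
    values = pd.Series(np.arange(18, dtype=float))          # placeholder data
    hospitals = np.repeat(["CCN_0001", "CCN_0002"], 9)       # hypothetical CCNs
    labels = ["Var%d" % i for i in range(1, 10)] * 2         # patterned labels, like Make Patterned Data + Code

    stacked = pd.DataFrame({"CCN": hospitals, "Variable": labels, "Value": values})

    # Unstack: one row per hospital, one column per variable
    wide = stacked.pivot(index="CCN", columns="Variable", values="Value")
    print(wide)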

Once the data are in a traditional format for analysis, you can start to get the answers that you want quickly. For example, a Laney P’ chart might suggest whether some hospitals had a higher proportion of unvaccinated pneumonia patients than you would expect from the variation in the data.

8 facilities have higher proportions for the year than you would expect from a random sample from a stable process.

Fortunately, being able to change data types, remove missing values, and recode data lets you get data ready to analyze in Minitab as fast as possible. That way, you’re ready to give the answers that your fearless data analysis justifies.

What Is the F-test of Overall Significance in Regression Analysis?


Previously, I’ve written about how to interpret regression coefficients and their individual P values.

I’ve also written about how to interpret R-squared to assess the strength of the relationship between your model and the response variable.

Recently I've been asked, how does the F-test of the overall significance and its P value fit in with these other statistics? That’s the topic of this post!

In general, an F-test in regression compares the fits of different linear models. Unlike t-tests that can assess only one regression coefficient at a time, the F-test can assess multiple coefficients simultaneously.

The F-test of the overall significance is a specific form of the F-test. It compares a model with no predictors to the model that you specify. A regression model that contains no predictors is also known as an intercept-only model.

The hypotheses for the F-test of the overall significance are as follows:

  • Null hypothesis: The fit of the intercept-only model and your model are equal.
  • Alternative hypothesis: The fit of the intercept-only model is significantly reduced compared to your model.

In Minitab statistical software, you'll find the F-test for overall significance in the Analysis of Variance table.

Analysis of variance table with the F-test of overall significance

If the P value for the F-test of overall significance is less than your significance level, you can reject the null hypothesis and conclude that your model provides a better fit than the intercept-only model.

Great! That set of terms you included in your model improved the fit!

Typically, if you don't have any significant P values for the individual coefficients in your model, the overall F-test won't be significant either. However, in a few cases, the tests can yield different results. For example, a significant overall F-test could determine that the coefficients are jointly not all equal to zero even though the tests for the individual coefficients cannot establish that any single one of them differs from zero.

There are a couple of additional conclusions you can draw from a significant overall F-test.

In the intercept-only model, all of the fitted values equal the mean of the response variable. Therefore, if the P value of the overall F-test is significant, your regression model predicts the response variable better than the mean of the response.

While R-squared provides an estimate of the strength of the relationship between your model and the response variable, it does not provide a formal hypothesis test for this relationship. The overall F-test determines whether this relationship is statistically significant. If the P value for the overall F-test is less than your significance level, you can conclude that the R-squared value is significantly different from zero.
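If you want to see the same test outside of Minitab, statsmodels reports it directly. A minimal sketch with simulated data follows; the variable names are arbitrary.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated data: y depends on x1 but not x2
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
    df["y"] = 3 + 2 * df["x1"] + rng.normal(size=50)

    model = smf.ols("y ~ x1 + x2", data=df).fit()

    # F-test of overall significance: the specified model vs. intercept-only
    print(model.fvalue, model.f_pvalue)

    # The same comparison made explicitly against an intercept-only model
    intercept_only = smf.ols("y ~ 1", data=df).fit()
    print(model.compare_f_test(intercept_only))   # (F statistic, p value, df difference)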

If your entire model is statistically significant, that's great news! However, be sure to check the residual plots so you can trust the results!

If you're learning about regression, read my regression tutorial!

How Is Cpk Calculated When the Subgroup Size Is 1?


When data are collected in subgroups, it’s easy to understand how the variation can be calculated within each of the subgroups, based on the subgroup range or the subgroup standard deviation.

When data are not collected in subgroups (so the subgroup size is 1), it may be a little less intuitive to understand how the within-subgroup standard deviation is calculated. How does Minitab Statistical Software calculate within-subgroup variation if there is only one data point in each subgroup? How does this affect Cpk? This blog post will discuss how within-subgroup variation and Cpk are calculated when the subgroup size is 1.

For this post, the data linked here will be used along with a lower spec of 10 and an upper spec of 20 (sorry, no back story to this data). We will also accept Minitab’s default method for calculating within-subgroup variation when the subgroup size is 1, which is the average moving range.


The normal capability results below show that for this dataset, the within-subgroup standard deviation is 1.85172 and the Cpk is 0.89:

To find the formulas Minitab uses to calculate the average moving range, we navigate the following menu path in Minitab: Help > Methods and Formulas > Process capability > Process capability (Normal). The section titled Estimating standard deviation shows the formula for the average moving range:

We’ll use the formula above (and link to the table of unbiasing constants) to replicate Minitab’s Cpk output for a normal capability with a subgroup size of 1.

First, we calculate Rbar. To do that, we’ll get the average of the moving ranges by calculating the difference from the data point in row 1 to row 2, row 2 to row 3, and so forth. An easy way to do that in Minitab is to use the Lag function in the Time Series menu: we choose Stat > Time Series > Lag, and then complete the dialog box as shown below and click OK:

The lag function shifts every row down by the number of rows we type in the Lag field above.

Now we can use Calc > Calculator to subtract C2 from C1 and store the differences in a new column.  Because the formula tells us to take the Max minus the Min values and we don’t want to rearrange the data, we can just use the ABS function in the calculator to get the absolute values of the differences:

Next we can use Stat > Basic Statistics > Store Descriptive Statistics to store the sum of the differences that we calculated in the previous step:

The value stored in the worksheet, 206.785, is the numerator for our Rbar calculation. Now we can plug that number into the formula from Methods and Formulas:

Rbar = (Rw + ... + Rn) / (n - w + 1)

w = The number of observations used in the moving range. The default is w = 2

Rbar = 206.785 / (100 - 2 + 1) = 2.08874

Finally, we can find the value of the unbiasing constant d2 using the table linked in Methods and Formulas. In this example, w = 2, and d2(w) = 1.128:

 

To calculate the within-subgroup standard deviation, we use the formula from Methods and Formulas, dividing our Rbar estimate by the d2 value from the table (I used Minitab’s calculator again to get the answer):

Sigma(within) = 2.08874 / 1.128 = 1.85172, which matches Minitab’s capability output, so we’re almost there!

Now we can calculate Cpk, which is the lesser of CPU and CPL.  Once again Methods and Formulas tells us how to calculate CPU and CPL:

We can get the sample mean, X-bar, from Minitab’s capability output or by using Stat > Basic Statistics > Store Descriptive Statistics. That X-bar value, along with the other values we’ve calculated, is plugged into the formulas above:

CPU = (20-15.063)/(3*1.85172) = 0.89

CPL = (15.063-10)/(3*1.85172) = 0.91

Since Cpk is the lesser of CPU and CPL, Cpk = 0.89, just like Minitab said!
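The whole calculation is easy to script if you ever want to check it by hand. Here is a minimal Python sketch that mirrors the steps above; the sample data are randomly generated, and the d2 constant of 1.128 is the value for a moving range of length 2.

    import numpy as np

    def cpk_from_moving_range(x, lsl, usl, d2=1.128):
        """Cpk for individuals data, using the average moving range (w = 2)."""
        x = np.asarray(x, dtype=float)
        rbar = np.mean(np.abs(np.diff(x)))        # average of the moving ranges
        sigma_within = rbar / d2                  # within-subgroup standard deviation
        xbar = x.mean()
        cpu = (usl - xbar) / (3 * sigma_within)
        cpl = (xbar - lsl) / (3 * sigma_within)
        return min(cpu, cpl)

    # Randomly generated stand-in data; with the dataset from the post this returns 0.89
    rng = np.random.default_rng(7)
    data = rng.normal(15, 2, 100)
    print(round(cpk_from_moving_range(data, lsl=10, usl=20), 2))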

I hope this post on calculating Cpk when the size of the subgroup is 1 was helpful.  You may also be interested in learning how Minitab calculates Cpk when the subgroup size is greater than 1.

Using Quality Tools Like FMEA in Pathogen Testing


Before I joined Minitab, I worked for many years in Penn State's College of Agricultural Sciences as a writer and editor. I frequently wrote about food science and particularly food safety, as I regularly needed to report on the research being conducted by Penn State's food safety experts, and also edited course materials and bulletins for professionals and consumers about ensuring they had safe food. 

culture dish

After I joined Minitab and became better acquainted with data-driven quality methods like Six Sigma, I was surprised at how infrequently some of the powerful quality tools common in many industries are used in food safety work.

So I was interested to see a recent article on the Food Safety Tech web site about an application of the tool called FMEA in pathogen testing.

What Is an FMEA?

The acronym FMEA is short for "Failure Modes and Effects Analysis." What the tool really does is help you look very carefully and systematically at exactly how and why things can go wrong, so you can do your best to prevent that from happening.

In the article, Maureen Harte, a consultant and Lean Six Sigma black belt, talks about the need to identify, quantify, and assess risks of the different pathogen detection methods used to create a Certificate of Analysis (COA)—a document companies obtain to verify product quality and purity.

Too often, Harte says, companies accept COA results blindly:

[They] lack the background information to really understand what goes into a COA, and they trust that what is coming to them is the highest quality. 

Harte then proceeds to explain how doing an FMEA can make the COA more meaningful and useful. 

FMEA helps us understand the differences between testing methods by individually identifying the risks associated with each method on its own. For each process step [in a test method], we ask: Where could it go wrong, and where could an error or failure mode occur? Then we put it down on paper and understand each failure mode. 

Completing an FMEA

Doing an FMEA typically involves these steps:

  • Identify potential failure types, or "modes," for each step of your process.
  • List the effects that result when those failures occur.
  • Identify potential causes for each failure mode.
  • List existing controls that are in place to keep these failures from happening.
  • Rate the Severity of the effect, the likelihood of Occurrence, and the odds of Detecting the failure mode before it causes harm.
  • Multiply the values for severity, occurrence, and detection to get a risk priority number (RPN).
  • Improve items with a high RPN, record the actions you've taken, then revise the RPN.
  • Maintain as a living document.

You can do an FMEA with just a pencil and paper, although Minitab's Quality Companion and Qeystone Tools process improvement software include forms that make it easy to complete the FMEA—and even share data from process maps and other forms you may be using.

Here's an example of a completed Quality Companion FMEA tool: 

FMEA

FMEA Steps 

1) In Process Map - Activity, enter each process step, feature or type of activity. In the example above, it's preparation of growth culture and incubation. We also list the key components or inputs of each step.

2) In Potential Failure Mode, we note the ways the process can fail for each activity. There may be many ways it could fail. In the example, we've identified contamination of growth medium and incubating cultures at the wrong temperature as potential failure modes. 

3) In Potential Failure Effects, we detail the possible fallout of each type of failure. There may be multiple failure effects. In the example above, contaminated growth culture could lead to the waste of perfectly good raw materials. An improperly performed incubation might lead to undetected pathogens, and possibly unsafe products. 

4) In SEV (Severity Rating), we assign severity to each failure effect on a 1 to 10 scale, where 10 is high and 1 low. This is a relative assignment. In the food world, wasting some good materials is undesirable, but having pathogens reach the market is obviously much worse, hence the ranking of 6 and 9, respectively.

5) In OCC (Occurrence Rating), estimate the probability of occurrence of the cause. Use a 1 to 10 scale, where 10 signifies high frequency (guaranteed ongoing problem) and 1 signifies low frequency (extremely unlikely to occur).  

6) In Current Control, enter the manner in which the failure causes/modes are detected or controlled.  

7) In DET (Detection Rating), gauge the ability of each control to detect or control the failure cause/mode. Use a 1 to 10 scale, where 10 signifies poor detection/control and 1 signifies high detection/control (you're almost certain to catch the problem before it causes failure).

8) RPN (Risk Priority Number) is the product of the SEV, OCC, and DET scores. The higher the RPN, the more severe, more frequent, or less controlled a potential problem is, indicating a greater need for immediate attention. Above, the RPN of 81 for the potential incubation error indicates that that type of failure should get higher priority than contaminated cultures. (A short sketch of the RPN arithmetic appears after these steps.)

9) If you're doing FMEA as part of an improvement project, you can use it to prioritize corrective actions. Once you've implemented improvements, enter the revised SEV, OCC, and DET values to calculate a current RPN.  
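As a quick illustration of the RPN arithmetic from step 8, here is a tiny Python sketch. The severity ratings of 6 and 9 come from the example above; the occurrence and detection ratings are assumed for illustration only.

    # RPN = SEV x OCC x DET; the occurrence and detection values below are assumed
    failure_modes = [
        ("Contaminated growth medium", 6, 2, 4),
        ("Incubation at wrong temperature", 9, 3, 3),
    ]
    for name, sev, occ, det in failure_modes:
        print(name, "RPN =", sev * occ * det)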

The Benefits of an FMEA

When you've completed the FMEA, you'll have the answers to these questions:

  • What are the potential failure modes at each step of a process?
  • What is the potential effect of each failure mode on the process output, and how severe is it?
  • What are the potential causes of each failure mode, and how often do they occur?
  • How well can you detect a cause before it creates a failure mode and effect?
  • How can you assign a risk value to a process step that factors in the frequency of the cause, the severity of failure, and the capability of detecting it in advance?
  • What part of the process should an improvement project focus on?
  • Which inputs are vital to the process, and which aren't?
  • How can reaction plans be documented as part of process control?

And if your understanding of the steps that underlie your Certificate of Analysis is that thorough, you will be able to stand behind it with much more confidence. 

Where could you apply an FMEA in your organization?  

American Pharoah and the Belmont Stakes: A Statistical Journey through Horse Racing History


by Jeff Parks, guest blogger

Being from Kentucky, horse racing comes naturally to me. Like nearly everyone else, I watched and was moved by American Pharoah’s Triple Crown run, which ended an historic 37-year streak of Triple Crown disappointments.

Prior to this year the longest drought was 25 years, stretching from Citation's 1948 achievement to Secretariat's performance in 1973.

Many people watching had never seen a Triple Crown won before.

While it was a great achievement and an historic moment, a logical question arises: how does American Pharoah compare with the best Triple Crown winners of the past?

And that comparison begins and ends with Secretariat.

We can use Minitab's statistical software to look at Secretariat's and American Pharoah's performance compared to the long history of the Belmont Stakes and the other Triple Crown races, the Kentucky Derby and the Preakness.

The Belmont Stakes has been run at Belmont Park in Elmont, New York, for 147 years, while the Preakness is 143 years old. The Kentucky Derby has been run since 1875, making it 141 years old, and it has always been run at Churchill Downs. The Belmont and Preakness have not always been run at their current locations, which is why the Kentucky Derby is usually described as the longest continuously held sporting event in the United States.

Today the Belmont is the longest of the three races at 1½ miles, but, like the other two races, it has not always been run at that distance over its entire history. It has, however, been run at its current distance since 1926. That gives us 89 years of data on winning horse times to compare.

The data is below:

Year   Winner               Time      Time in seconds
2015   American Pharoah     02:26.7   146
2014   Tonalist             02:28.5   148
2013   Palace Malice        02:30.7   150
2012   Union Rags           02:30.4   150
2011   Ruler on Ice         02:30.9   150
2010   Drosselmeyer         02:31.6   151
2009   Summer Bird          02:27.5   147
2008   Da'Tara              02:29.7   149
2007   Rags to Riches ‡     02:28.7   148
2006   Jazil                02:27.9   147
2005   Afleet Alex          02:28.7   148
2004   Birdstone            02:27.5   147
2003   Empire Maker         02:28.3   148
2002   Sarava               02:29.7   149
2001   Point Given          02:26.6   146
2000   Commendable          02:31.2   151
1999   Lemon Drop Kid       02:27.9   147
1998   Victory Gallop       02:29.2   149
1997   Touch Gold           02:28.8   148
1996   Editor's Note        02:29.0   149
1995   Thunder Gulch        02:32.0   152
1994   Tabasco Cat          02:26.8   146
1993   Colonial Affair      02:30.0   150
1992   A.P. Indy            02:26.1   146
1991   Hansel               02:28.1   148
1990   Go And Go            02:27.2   147
1989   Easy Goer            02:26.0   146
1988   Risen Star           02:26.4   146
1987   Bet Twice            02:28.2   148
1986   Danzig Connection    02:29.8   149
1985   Creme Fraiche        02:27.0   147
1984   Swale                02:27.2   147
1983   Caveat               02:27.8   147
1982   Conquistador Cielo   02:28.2   148
1981   Summing              02:29.0   149
1980   Temperence Hill      02:29.8   149
1979   Coastal              02:28.6   148
1978   Affirmed †           02:26.8   146
1977   Seattle Slew †       02:29.6   149
1976   Bold Forbes          02:29.0   149
1975   Avatar               02:28.2   148
1974   Little Current       02:29.2   149
1973   Secretariat †        02:24.0   144
1972   Riva Ridge           02:28.0   148
1971   Pass Catcher         02:30.4   150
1970   High Echelon         02:34.0   154
1969   Arts and Letters     02:28.8   148
1968   Stage Door Johnny    02:27.2   147
1967   Damascus             02:28.8   148
1966   Amberoid             02:29.6   149
1965   Hail To All          02:28.4   148
1964   Quadrangle           02:28.4   148
1963   Chateaugay           02:30.2   150
1962   Jaipur               02:28.8   148
1961   Sherluck             02:29.2   149
1960   Celtic Ash           02:29.2   149
1959   Sword Dancer         02:28.4   148
1958   Cavan                02:30.2   150
1957   Gallant Man          02:26.6   146
1956   Needles              02:29.8   149
1955   Nashua               02:29.0   149
1954   High Gun             02:30.8   150
1953   Native Dancer        02:28.6   148
1952   One Count            02:30.2   150
1951   Counterpoint         02:29.0   149
1950   Middleground         02:28.6   148
1949   Capot                02:30.2   150
1948   Citation †           02:28.2   148
1947   Phalanx              02:29.4   149
1946   Assault †            02:30.8   150
1945   Pavot                02:30.2   150
1944   Bounding Home        02:32.2   152
1943   Count Fleet †        02:28.2   148
1942   Shut Out             02:29.2   149
1941   Whirlaway †          02:31.0   151
1940   Bimelech             02:29.6   149
1939   Johnstown            02:29.6   149
1938   Pasteurized          02:29.4   149
1937   War Admiral †        02:28.6   148
1936   Granville            02:30.0   150
1935   Omaha †              02:30.6   150
1934   Peace Chance         02:29.2   149
1933   Hurryoff             02:32.6   152
1932   Faireno              02:32.8   152
1931   Twenty Grand         02:29.6   149
1930   Gallant Fox †        02:31.6   151
1929   Blue Larkspur        02:32.8   152
1928   Vito                 02:33.2   153
1927   Chance Shot          02:32.4   152
1926   Crusader             02:32.2   152

Note that the winners’ times have to be converted from minutes:seconds format to straight seconds for analysis.

Using Minitab’s SPC (Statistical Process Control) Individuals chart (I chart), we can see:

Only two points fall outside the control limits:

  • Secretariat's 1973 time of 144 seconds (2:24), which is the lowest time (meaning the fastest winner)
  • High Echelon's 1970 time of 154 seconds (2:34), which is the longest time (meaning the slowest winner)

The overall average winning time over the past 89 years is 148.81 seconds (2:28.8). Secretariat’s time is more than 4 seconds, or about 3%, faster than the average winner. It may be tempting to ask, “So what?”

But let’s look at this in another way. Rather than looking at control limits, as an SPC chart does, how about approaching this from a capability perspective? Let’s treat Secretariat's time as a lower spec limit and assess the probability of another horse beating that time.

Using Minitab’s Normality test and Graphing function we can see that:

The data are not normally distributed, and we can see Secretariat’s time as an outlier on the far left.

When performing a Capability Analysis with non-normal data we have a few choices. We can transform the data or identify the distribution and then do a capability analysis on that particular distribution.

Minitab has a feature known as the Johnson Transformation, which can automatically transform many nonnormal distributions and analyze them against the spec limits provided, with little effort by the user.

This is one of the advantages of Minitab statistical software. When we do this using a Johnson Transformation with Secretariat’s 144-second time as the lower spec limit, we get:

That works out to a 0.36% chance of any horse achieving that time. A very unlikely event indeed.
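You can rough out this kind of tail-probability calculation in Python as well. scipy does not reproduce Minitab's Johnson transformation, but fitting a Johnson SU distribution to the winning times and asking for the probability of a time at or below 144 seconds is a loose analogue; the exact percentage will differ from Minitab's 0.36%.

    import numpy as np
    from scipy import stats

    # Winning Belmont times in seconds, 2015 back to 1926 (the table above)
    times = np.array([
        146, 148, 150, 150, 150, 151, 147, 149, 148, 147, 148, 147, 148, 149, 146,
        151, 147, 149, 148, 149, 152, 146, 150, 146, 148, 147, 146, 146, 148, 149,
        147, 147, 147, 148, 149, 149, 148, 146, 149, 149, 148, 149, 144, 148, 150,
        154, 148, 147, 148, 149, 148, 148, 150, 148, 149, 149, 148, 150, 146, 149,
        149, 150, 148, 150, 149, 148, 150, 148, 149, 150, 150, 152, 148, 149, 151,
        149, 149, 149, 148, 150, 150, 149, 152, 152, 149, 151, 152, 153, 152, 152])

    # Fit a Johnson SU distribution (a rough stand-in for Minitab's Johnson
    # transformation) and estimate the chance of a winning time of 144 s or less.
    params = stats.johnsonsu.fit(times)
    print(f"P(winning time <= 144 s) = {stats.johnsonsu.cdf(144, *params):.4%}")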

Secretariat holds the record for the Belmont. But he also has the record in the other two legs of the Triple Crown as well:

  • Kentucky Derby:  119 seconds (1:59)
  • Preakness: 114 seconds (1:54)

Let's apply this same approach to the Kentucky Derby, which has been run at the same distance since 1896.

There's a 5.54% chance of a horse beating Secretariat's Kentucky Derby time.

How about the Preakness?

A 3.5% probability. 

The probability of a horse beating Secretariat's time in all 3 Triple Crown races would be:

(.0036) * (.0554) * (.035) = .000007, or about 7x10^-4 %

In other words, if this combination of races were run every year for a million years, we would expect to see it happen only about 7 times.

When you consider that only 43 Triple Crown opportunities have happened since Secretariat's run in 1973, and the horses are 3 years old when they race, about 14 generations of horses have tried and failed to beat Secretariat's record.

Billions of dollars and countless time and effort are spent each year trying to make thoroughbred horses faster. The training is supposedly better, the nutrition and supplements are better and, yes, the drugs they give these horses are all better than they were 43 years ago.

Yet despite that, no one has beaten “Big Red,” as Secretariat was known.  After his death, an autopsy found Secretariat had a heart 2.75 times larger than that of the average horse.

American Pharoah is a great horse who had a fantastic run, and excited us all by delivering the first Triple Crown victory in 37 years. But he is no Secretariat.

And apparently, no other horse ever was, either.

 

About the Guest Blogger

Jeff Parks has been a Lean Six Sigma Master Black Belt since 2002 and involved in process improvement work since 1997. He is a former U.S. Navy nuclear submarine officer and lives in Louisville, Ky., with his wife and 7 children. He can be reached at Jwparks407@hotmail.com and via Twitter, @JeffParks3. 

 

Photo of American Pharoah used under Creative Commons license 2.0.  Source: Maryland GovPics https://www.flickr.com/people/64018555@N03 


Are the Chicago Blackhawks Currently the Luckiest Team in Sports?


Blackhawks

With their victory in Game 6 over the Tampa Bay Lightning, the Chicago Blackhawks won their 3rd Stanley Cup Championship in the last 6 years. This is an incredible feat that no doubt means the Blackhawks have been a very talented hockey team over that stretch. But just as random variation can play a part in quality processes, luck can play a part in sporting outcomes. So how lucky has Chicago been?

Probability of Winning 3 Out of 7 Stanley Cup Championships

The Blackhawks have won 3 of the last 6 Stanley Cups, but their run really began in 2009, the year before they won the first of those cups. That was the first year they had made the playoffs in 7 years, so I’ll start collecting the data from there. For each year, I took the odds that the Blackhawks would win the championship at the start of the playoffs and turned them into a probability. Years marked with * in the table are years the Hawks won.

Year      Odds      Percentage
2015 *    8 to 1    11%
2014      8 to 1    11%
2013 *    7 to 2    22%
2012      15 to 1   6%
2011      60 to 1   2%
2010 *    8 to 1    11%
2009      11 to 1   8%
Average   9 to 1    10%

The only year Chicago was actually the favorite to win the cup was 2013, and even then they had a less than 1 in 4 chance of winning. So they have overcome some pretty long odds.

To calculate their overall odds of winning 3 championships in 7 years, I’m going to use their average percentage of 10%. Our number won’t be perfect, but it will give us a pretty good idea of how unlikely the Blackhawks’ run has been. The following Minitab probability distribution plot shows the probability of the Blackhawks winning 3 or more championships in the last 7 years.

Binomial Distribution Plot

There is only a 2.6% (approximately 1 in 38) chance that the Blackhawks would have won 3 or more championships in the last 7 years! There is no doubt that skill and talent are integral parts of Chicago’s success. But to win as often as they have, you need to have some luck too. And speaking of luck, had you bet $100 on the Blackhawks at the start of the playoffs each of the last 7 years, you would be up $1,650! So if you think all these championships are something that could easily have been predicted (Kane! Toews! Hossa! Of course they won!), Las Vegas begs to differ.
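You can verify that binomial probability directly; a quick check with scipy is shown below.

    from scipy.stats import binom

    # P(3 or more championships in 7 years, with a 10% chance each year)
    print(f"{binom.sf(2, n=7, p=0.10):.3%}")   # sf(2) = P(X >= 3), about 2.6%

    # Same question if the team were the favorite (about 20%) every year
    print(f"{binom.sf(2, n=7, p=0.20):.3%}")   # about 15%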

Now, what would this graph have looked like if Chicago was the favorite at the start of each Stanley Cup Playoffs? I found that for each year, the favored NHL team has about a 20% chance of winning it all. So let’s look at another binomial distribution plot.

Binomial Distribution Plot

Even the NHL favorite winning 3 championships in 7 years is unlikely, happening only about 15% of the time. We see that the most likely outcome is that the favorite would win one championship. And wouldn’t you know it, the only NHL favorite to win the Stanley Cup the last 7 years was the 2013 Chicago Blackhawks.

Can We Find a Luckier Team?

Believe it or not, there is one team that has overcome even greater odds to win 3 recent championships: the San Francisco Giants, who won the World Series in 2010, 2012, and 2014. That's 3 titles in 6 years! And their respective probabilities in those years were 11%, 12%, and 7%. The other 3 years, they didn't even make the playoffs. It's feast or famine with San Francisco! So the odds of winning 3 World Series in the only 3 years you make the playoffs...

0.11*0.12*0.07 = 0.000924 = approximately 1 in 1,082

Sorry, Chicago, your run was impressive, but the Giants have proven Lady Luck is on their side even more. Want to overtake them? Well, how does 4 Stanley Cups in 8 years sound?

Why Is Continuous Data "Better" than Categorical or Discrete Data?


Earlier, I wrote about the different types of data statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data. 

As a reminder, when we assign something to a group or give it a name, we have created attribute or categorical data.  If we count something, like defects, we have gathered discrete data. And if we can measure something to a (theoretically) infinite degree, we have continuous data.

Or, to put in bullet points: 

  • Categorical = naming or grouping data
  • Discrete = count data
  • Continuous = measurement data

A statistical software package like Minitab is extremely powerful and can tell us many valuable things—as long as we're able to feed it good numbers. Without numbers, we have no analyses or graphs. Even categorical or attribute data needs to be converted into numeric form by counting before we can analyze it.

What Makes Numeric Data Discrete or Continuous? 

At this point, you may be thinking, "Wait a minute—we can't really measure anything infinitely, so isn't measurement data actually discrete, too?" That's a fair question.

If you're a strict literalist, the answer is "yes"—when we measure a property that's continuous, like height or distance, we are de facto making a discrete assessment. When we collect a lot of those discrete measurements, it's the amount of detail they contain that will dictate whether we can treat the collection as discrete or continuous.

I like to think of it as a question of scale. Say I want to measure the weight of 16-ounce cereal boxes coming off a production line, and I want to be sure that the weight of each box is at least 16 ounces, but no more than 1/2 ounce over that.

With a scale calibrated to whole pounds, all I can do is put every box into one of three categories: less than a pound, 1 pound, or more than a pound. 

With a scale that can distinguish ounces, I will be able to measure with a bit more accuracy just how close to a pound the individual boxes are. I'm getting nearer to continuous data, but there are still only 16 gradations within each pound.

But if I measure with a scale capable of distinguishing 1/1000th of an ounce, I will have quite a wide scale—a continuum—of potential values between pounds. The individual boxes could have any value between 0.000 and 1.999 pounds. The scale of these measurements is fine enough to be analyzed with powerful statistical tools made for continuous data.   

What Can I Do with Continuous Data that I Can't Do with Discrete? 

Not all data points are equally valuable, and you can glean a lot more insight from 100 points of continuous data than you can from 100 points of attribute or count data. How does this finer degree of detail affect what we can learn from a set of data? It's easy to see. 

Let's start with the simplest kind of data, attribute data that rates the weight of a cereal box as good or bad. For 100 boxes of cereal, any that are under 1 pound are classified as bad, so each box can have one of only two values.

We can create a bar chart or a pie chart to visualize this data, and that's about it:

Attribute Data Bar Chart

If we bump up the precision of our scale to differentiate between boxes that are over and under 1 pound, we can put each box of cereal into one of three categories. Here's what that looks like in a pie chart:

pie chart of count data

This gives us a little bit more insight—we now see that we are overfilling more boxes than we are underfilling—but there is still a very limited amount of information we can extract from the data.  

If we measure each box to the nearest ounce, we open the door to using methods for continuous data, and get a still better picture of what's going on. We can see that, on average, the boxes weigh 1 pound. But there's high variability, with a standard deviation of 0.9. There's also a wide range in our data, with observed values from 12 to 20 ounces: 

graphical summary of ounce data

If I measure the boxes with a scale capable of differentiating thousandths of an ounce, more options for analysis open up. For example, now that the data are fine enough to distinguish half-ounces (and then some), I can perform a capability analysis to see if my process is even capable of consistently delivering boxes that fall between 16 and 16.5 ounces. I'll use the Assistant in Minitab to do it, selecting Assistant > Capability Analysis.

capability analysis for thousandths

The analysis has revealed that my process isn't capable of meeting specifications. Looks like I have some work to do...but the Assistant also gives me an I-MR control chart, which reveals where and when my process is going out of spec, so I can start looking for root causes.

IMR Chart
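As a rough illustration of what the capability analysis above is summarizing, here is a minimal Python sketch. The weights are simulated, not the data behind the graphs, and the index uses the overall sample standard deviation.

    import numpy as np

    # Simulated box weights in ounces; specs are 16 to 16.5 oz
    rng = np.random.default_rng(3)
    weights = rng.normal(16.2, 0.9, 100)
    lsl, usl = 16.0, 16.5

    xbar = weights.mean()
    sigma = weights.std(ddof=1)                 # overall (sample) standard deviation
    ppk = min((usl - xbar) / (3 * sigma), (xbar - lsl) / (3 * sigma))
    print(round(ppk, 2))                        # well below 1: not capable of meeting the specs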

If I were only looking at attribute data, I might think my process was just fine. Continuous data has allowed me to see that I can make the process better, and given me a rough idea where to start. By making changes and collecting additional continuous data, I'll be able to conduct hypothesis tests, analyze sources of variances, and more.  

Some Final Advantages of Continuous Over Discrete Data

Does this mean discrete data is no good at all?  Of course not—we are concerned with many things that can't be measured effectively except through discrete data, such as opinions and demographics. But when you can get it, continuous data is the better option. The table below lays out the reasons why. 

Continuous Data                                            Discrete Data

Inferences can be made with few data points—valid          More data points (a larger sample) needed to make an
analysis can be performed with small samples.              equivalent inference.

Smaller samples are usually less expensive to gather.      Larger samples are usually more expensive to gather.

High sensitivity (how close to or far from a target).      Low sensitivity (good/bad, pass/fail).

Variety of analysis options that can offer insight into    Limited options for analysis, with little indication
the sources of variation.                                  of the sources of variation.

I hope this very basic overview has effectively illustrated why you should opt for continuous data over discrete data whenever you can get it. 

3 Ways to Clean Up Data So You Can Promote Public Dialog


A Philadelphia Police Department car

"By publishing the historical data, public dialogue that results from the data release can be more productive because you’ll be able to discuss changes over time." — Denice Ross, 5/17/2015

Last month, President Obama launched the Police Data Initiative. A key goal of the initiative was to make data about police departments more accessible to the public. Twenty-one communities decided to participate in the initial round, including Philadelphia.

Among Code for America's recommendations to help police departments get started was the suggestion to open historical records. On June 19, the data set "Philadelphia Police Advisory Commission Complaints" was made available via opendataphilly.org. The data set includes several variables about complaints made against police officers between 2009 and 2012, and gives us the chance to explore some steps you can take to clean up your data for analysis, using features in Minitab.

Proper

One thing to look for is redundant categories and labels. If you download the data and take a look at the actions that resulted from the complaints, you’ll find these values and frequencies.

Tally for Discrete Variables: ACTION

            ACTION  Count
            Accept     26
            ACCEPT    161
             Audit     11
             AUDIT     64
               NAR     16
    NoJurisdiction      2
NON-JURISDICTIONAL      1
            Reject     23
            REJECT    142
          Rejected      1
         WITHDRAWN      3
                N=    450
                *=      5

It’s easy to see that the values “Accept” and “ACCEPT” should be the same. If you're using Minitab, it can change those values for you. (If you're not using Minitab, you can get a free 30-day trial.) Try this:

  1. Choose Calc > Calculator.
  2. In Store result in variable, enter ‘Action taken’.
  3. In Expression enter Proper(ACTION). Click OK.

Now there’s a column with these values and frequencies:

Tally for Discrete Variables: Action taken

      Action taken  Count
            Accept    187
             Audit     75
               Nar     16
    Nojurisdiction      2
Non-jurisdictional      1
            Reject    165
          Rejected      1
         Withdrawn      3
                N=    450
                *=      5

Instead of having to make 62 corrections in the data, you have to make only 2. Prefer a different format? You could substitute LOWER or UPPER for PROPER to get all lowercase or all uppercase letters.
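If you ever find yourself doing the same cleanup in Python, pandas has string methods that play the same role. A minimal sketch, with the column name taken from the example and only a handful of sample values:

    import pandas as pd

    df = pd.DataFrame({"ACTION": ["Accept", "ACCEPT", "REJECT", "Rejected"]})

    # Capitalize the first letter and lowercase the rest, much like Minitab's Proper()
    df["Action taken"] = df["ACTION"].str.capitalize()
    print(df["Action taken"].value_counts())
    # .str.lower() and .str.upper() are the analogues of LOWER and UPPER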

Left

The Philadelphia data set includes a variable for the date and time of the incident, but none of the times are recorded. Including the unused values for time yields data like these:

 The time values are all 0.

To get the usable "date" portion of the data, you can use the calculator. Try this:

  1. Choose Calc > Calculator.
  2. In Store result in variable, enter 'Text date'.
  3. In Expression, enter Left(DATE_, 10). Click OK.

The column that results is still formatted as text. To do an analysis where you can sort by date, you can quickly change the date format. Select a cell in the column, right-click, and select Format Column. When you pick Date from the list of types, Minitab recognizes the format for you.
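The equivalent move in pandas is a string slice followed by a date conversion. A quick sketch; the date strings here are made up rather than taken from the Philadelphia file.

    import pandas as pd

    df = pd.DataFrame({"DATE_": ["2012-03-14 00:00:00", "2011-07-02 00:00:00"]})

    # Left(DATE_, 10): keep just the date portion of the text
    df["Text date"] = df["DATE_"].str[:10]

    # Then convert the text to a real date type, like Format Column > Date
    df["Date"] = pd.to_datetime(df["Text date"])
    print(df.dtypes)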

Code

If you dig a bit deeper into the data, you’ll notice an oddity that’s not readily apparent. The current web site for the police in Philadelphia lists 21 districts. In the data, 23 units are included. That's because the 23rd District has been incorporated into the 22nd District, and the 4th District incorporated into the 3rd. If we want to include complaints about officers from those districts in their new districts, you can recode the districts. Try this:

  1. Choose Data > Code > To Text.
  2. In Code values in the following columns, enter Unit.
  3. In Method, select Code Individual Values.
  4. For District 4, change the Coded value to District 3.
  5. For District 23, change the Coded value to District 22.
  6. Click OK.

Code

Summary

                                Number
Original Value  Recoded Value  of Rows
    District 4     District 3        2
   District 23    District 22        7


Source data column   UNIT
Recoded data column  Coded UNIT

Number of unchanged rows: 446

Minitab shows you a summary table so you can see how the values were recoded and you’re ready to go!
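A pandas version of that recode is essentially a one-line replace. This sketch assumes the district labels are stored as plain text values:

    import pandas as pd

    df = pd.DataFrame({"UNIT": ["District 3", "District 4", "District 22", "District 23"]})

    # Fold the retired districts into their current ones
    df["Coded UNIT"] = df["UNIT"].replace({"District 4": "District 3",
                                           "District 23": "District 22"})
    print(df["Coded UNIT"].value_counts())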

Wrap up

Whether you have data about police complaints or patient throughput times, you’re likely to need to do a little bit of work for your data to be ready to analyze. Fortunately, Minitab makes it easy to make common adjustments like getting the case of letters to match across entries. The faster your data is ready to analyze, the faster you can do the analysis to make better decisions.

The photo of the police car is by Zuzu and is licensed under this Creative Commons License.

How MLB's Understanding of Line Drive Data Fails to Protect Pitchers


Last month the ESPN series Outside the Lines reported on major league pitchers suffering serious injuries from being struck in the head by line drives, and on the efforts MLB is making to have protective gear developed for pitchers. You can view the report here if you'd like.

A couple of things jump out at me from the clip:

  1. The overwhelming majority of pitchers are not interested in wearing protective gear if it is either visually obvious or noticeable to the pitcher himself, who fears it will affect his ability to pitch well.
  2. The standard set by Major League Baseball is that approved headgear must be able to protect against a ball travelling at 83 mph, the average speed at which line drives are travelling when they reach the pitcher's mound.

Torres in protective headgear

Upon watching the report, I knew immediately it would have little if any impact on pitcher safety, and that pitchers will continue suffering severe injuries or even death from line drives until a stronger standard is set and pitchers are required to wear approved devices. A faulty understanding of statistics has led to the current standard, and I will outline three reasons why.

  1. The standard was set as the average.  First, I would like to say it is commendable that MLB collected data on the ball speeds in order to set the standard rather than just making some intuitive guess. However, that data was then turned into a single value, as is unfortunately so common in the world: the average. I think the problem is that what statisticians call the mean, most people refer to as the "average." When most people hear the term "average" they associate it with a meaning somewhat like "common" or "typical," but to know what is common or typical we must also know about the variation in the data. Assuming line drive speeds are symmetrically distributed, half of them will exceed 83 mph and half will not. Very few will actually be 83 mph, so that value is really not common or typical at all. In selecting this value as a standard, baseball's governing body has determined that head gear does not need to protect against half of line drives.
  2. The standard ignores the relationship between speed and likelihood of striking the pitcher. The standard begins by ignoring half of all line drives. But it's actually worse than that. From #1 you might assume that while the average was not the best choice, cutting the rate of line drives hitting pitchers in the head and injuring them in half is a pretty good first step. But that would assume all line drive speeds are equally likely to hit the pitcher in the head, and that is certainly not the case. A pitcher has twice as long to react to a 60 mph drive as he does a 120 mph drive, which, of course, is more likely to actually hit him (see the quick calculation after this list). The analysis assumes the distribution of line drive speeds hitting pitchers in the head would match the distribution of all line drive speeds, whereas almost every instance of a head strike involves the ball travelling faster than the average speed. So the rule protects against line drives that are, for the most part, not actually hitting the pitcher in the head.
  3. The standard ignores the relationship between speed and severity of injury. Aside from a pitcher being much more likely to react to a slower ball and avoid the hit in the first place, that slower ball is likely to be much less damaging if contact is made. The balls travelling very fast—which we've just stated are more likely to hit the pitcher—are also considerably more damaging and most in need of protection.
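
To put a rough number on the reaction-time point in #2, here is a quick back-of-the-envelope calculation in Python. It assumes the ball travels roughly 55 feet from the point of contact to the pitcher and ignores air resistance, so treat the results as illustrations rather than measurements.

    # Rough reaction-time estimate: time = distance / speed
    MPH_TO_FPS = 5280 / 3600   # feet per second in one mile per hour
    distance_ft = 55           # assumed distance from contact to the pitcher

    for speed_mph in (60, 83, 120):
        speed_fps = speed_mph * MPH_TO_FPS
        print(f"{speed_mph} mph line drive: about {distance_ft / speed_fps:.2f} s to react")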

So to summarize, setting the standard at the average speed has the effect of protecting pitchers against line drives that are unlikely to hit them and that will cause much less damage if they do. Given that pitchers already don't want to wear protection and will quickly catch on to these facts intuitively (even if they don't think in statistical terms), it's hard to imagine many pitchers adopting the gear if not required, or truly being more protected in any meaningful way if they do.

As Minitab trainer Paul Sheehy was telling me recently, giving someone powerful tools like statistics and not properly training them in how to use them is letting them "run with scissors."  Unfortunately in this case it is major league pitchers who stand to get hurt, and not the people carrying the scissors... 

Photograph of Alex Torres by UCinternational, used under Creative Commons 2.0. 

Applying DOE for Great Grilling, part 1


Design of Experiments (DOE) has a reputation for difficulty, and to an extent, this statistical method deserves that reputation. While it's easy to grasp the basic idea—acquire the maximum amount of information from the fewest number of experimental runs—practical application of this tool can quickly become very confusing. 

steaks

Even if you're a long-time user of designed experiments, it's still easy to feel uncertain if it's been a while since you last looked at split-plot designs or needed to choose the appropriate resolution for a fractional factorial design.

But DOE is an extremely powerful and useful tool, so when we launched Minitab 17, we added a DOE tool to the Assistant to make designed experiments more accessible to more people.

Since summer is here at Minitab's world headquarters, I'm going to illustrate how you can use the Assistant's DOE tool to optimize your grilling method.  

If you're not already using it and you want to play along, you can download the free 30-day trial version of Minitab Statistical Software.

Two Types of Designed Experiments: Screening and Optimizing

To create a designed experiment using the Assistant, open Minitab and select Assistant > DOE > Plan and Create. You'll be presented with a decision tree that helps you take a sequential approach to the experimentation process by offering a choice between a screening design and a modeling design.

DOE Assistant

A screening design is important if you have a lot of potential factors to consider and you want to figure out which ones are important. The Assistant guides you through the process of testing and analyzing the main effects of 6 to 15 factors, and identifies the factors that have the greatest influence on the response.

Once you've identified the critical factors, you can use the modeling design. Select this option, and the Assistant guides you through testing and analyzing 2 to 5 critical factors and helps you find optimal settings for your process.

Even if you're an old hand at analyzing designed experiments, you may want to use the Assistant to create designs since the Assistant lets you print out easy-to-use data collection forms for each experimental run. After you've collected and entered your data, the designs created in the Assistant can also be analyzed using Minitab's core DOE tools available through the Stat > DOE menu.

Creating a DOE to Optimize How We Grill Steaks

For grilling steaks, there aren't that many variables to consider, so we'll use the Assistant to plan and create a modeling design that will optimize our grilling process. Select Assistant > DOE > Plan and Create, then click the "Create Modeling Design" button. 

Minitab brings up an easy-to-follow dialog box; all we need to do is fill it in. 

First we enter the name of our Response and the goal of the experiment.  Our response is "Flavor," and the goal is "Maximize the response." Next, we enter our factors. We'll look at three critical variables:

  • Number of turns, a continuous variable with a low value of 1 and high value of 3.
  • Type of grill, a categorical variable with Gas or Charcoal as options. 
  • Type of seasoning, a categorical variable with Salt-Pepper or Montreal steak seasoning as options. 

If we wanted to, we could select more than 1 replicate of the experiment.  A replicate is simply a complete set of experimental runs, so if we did 3 replicates, we would repeat the full experiment three times. But since this experiment has 16 runs, and neither our budget nor our stomachs are limitless, we'll stick with a single replicate. 
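
If you're curious what the underlying run list looks like, here is a minimal Python sketch that enumerates the 2 x 2 x 2 corner-point combinations of these factors and randomizes the run order. This is only the bare factorial skeleton; the Assistant's actual 16-run modeling design includes more runs than these eight combinations, so treat the sketch as an illustration rather than a replacement for the design Minitab creates.

    import itertools
    import random

    turns = [1, 3]                            # low and high settings for the continuous factor
    grills = ["Gas", "Charcoal"]              # categorical factor 1
    seasonings = ["Salt-Pepper", "Montreal"]  # categorical factor 2

    # All corner-point combinations of the three factors
    runs = list(itertools.product(turns, grills, seasonings))

    random.shuffle(runs)  # randomize the run order
    for i, (turn, grill, seasoning) in enumerate(runs, start=1):
        print(f"Run {i}: Turns={turn}, Grill={grill}, Seasoning={seasoning}")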

When we click OK, the Assistant first asks if we want to print out data collection forms for this experiment: 

Choose Yes, and you can print a form that lists each run, the variables and settings, and a space to fill in the response:

Alternatively, you can just record the results of each run in the worksheet the Assistant creates, which you'll need to do anyway. But having the printed data collection forms can make it much easier to keep track of where you are in the experiment, and exactly what your factor settings should be for each run. 

If you've used the Assistant in Minitab for other methods, you know that it seeks to demystify your analysis and make it easy to understand. When you create your experiment, the Assistant gives you a Report Card and Summary Report that explain the steps of the DOE and important considerations, and a summary of your goals and what your analysis will show. 

Now it's time to cook some steaks, and rate the flavor of each. If you want to do this for real and collect your own data, please do so!  Tomorrow's post will show how to analyze your data with the Assistant. 

Applying DOE for Great Grilling, part 2


grill

Design of Experiments is an extremely powerful statistical method, so we added a DOE tool to the Assistant in Minitab 17 to make it more accessible to more people.

Since it's summer here, I'm applying the Assistant's DOE tool to outdoor cooking. Earlier, I showed you how to set up a designed experiment that will let you optimize how you grill steaks. 

If you're not already using it and you want to play along, you can download the free 30-day trial version of Minitab Statistical Software.

Perhaps you are following along, and you've already grilled your steaks according to the experimental plan and recorded the results of your experimental runs. Otherwise, feel free to download our data here for the next step: analyzing the results of our experiment. 

Analyzing the Results of the Steak Grilling Experiment 

After collecting your data and entering it into Minitab, you should have an experimental worksheet that looks like this: 

With your results entered in the worksheet, select Assistant > DOE > Analyze and Interpret. As you can see below, the only button you can click is "Fit Linear Model." 

As you might gather from the flowchart, when it analyzes your data, the Assistant first checks to see if the response exhibits curvature. If it does, the Assistant will prompt you to gather more data so it can fit a quadratic model. Otherwise, the Assistant will fit the linear model and provide the following output.

When you click the "Fit Linear Model" button, the Assistant automatically identifies your response variable.

All you need to do is confirm your response goal—maximizing flavor, in this case—and press OK. The Assistant performs the analysis, and provides you the results in a series of easy-to-interpret reports. 

Understanding the DOE Results

First, the Assistant offers a summary report that gives you the bottom-line results of the analysis. The Pareto Chart of Effects in the top left shows that Turns, Grill type, and Seasoning are all statistically significant, and there's a significant interaction between Turns and Grill type, too. 

The summary report also shows that the model explains a very high proportion of the variation in flavor, with an R2 value of 95.75 percent. And the "Comments" window in the lower right corner puts things in plain language: "You can conclude that there is a relationship between Flavor and the factors in the model..."

The Assistant's Effects report, shown below, tells you more about the nature of the relationship between the factors in the model and Flavor, with both Interaction Plots and Main Effects plots that illustrate how different experimental settings affect the Flavor response. 

And if we're looking to make some changes as a result of our experimental results—like selecting an optimal method for grilling steaks in the future—the Prediction and Optimization report gives us the optimal solution (1 turn on a charcoal grill, with Montreal seasoning) and its predicted Flavor response (8.425). 

It also gives us the Top 5 alternative solutions, shown in the bottom right corner, so if there's some reason we can't implement the optimal solution—for instance, if we only have a gas grill—we can still choose the best solution that suits our circumstances. 
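
If you ever want to reproduce this kind of fit in code rather than in the Assistant, a rough Python sketch with statsmodels looks like the following. The column names match the experiment above, but the few rows of data are made up purely for illustration; substitute the flavor scores from your own runs.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Made-up stand-in data; replace with the results from your own worksheet
    df = pd.DataFrame({
        "Turns":     [1, 3, 1, 3, 1, 3, 1, 3],
        "Grill":     ["Gas", "Gas", "Charcoal", "Charcoal"] * 2,
        "Seasoning": ["Salt-Pepper"] * 4 + ["Montreal"] * 4,
        "Flavor":    [6.0, 5.5, 8.0, 6.5, 6.8, 6.0, 8.4, 7.0],
    })

    # Main effects plus the Turns-by-Grill interaction flagged in the summary report
    model = smf.ols("Flavor ~ Turns * Grill + Seasoning", data=df).fit()
    print(model.summary())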

I hope this example illustrates how easy a designed experiment can be when you use the Assistant to create and analyze it, and that designed experiments can be very useful not just in industry or the lab, but also in your everyday life.  

Where could you benefit from analyzing process data to optimize your results? 

Pencils and Plots: Assessing the Normality of Data


By Matthew Barsalou, guest blogger.  

Many statistical tests assume the data being tested came from a normal distribution. Violating the assumption of normality can result in incorrect conclusions. For example, a Z test may indicate a new process is more efficient than an older process when this is not true. This could result in a capital investment for equipment that actually results in higher costs in the long run.

Statistical Process Control (SPC) requires either normally distributed data or a transformation of the data. It would be very risky to monitor a process with SPC charts created from data that violate the assumption of normality.

What can we do if the assumption of normality is critical to so many statistical methods? We can construct a probability plot to test this assumption.

Those of us who are a bit old-fashioned can construct a probability plot by hand, by plotting the ordered values (j) against the observed cumulative frequency (j – 0.5)/n. Using the numbers 16, 21, 20, 19, 18 and 15, we would construct a normal probability plot by first creating the table shown below.

 j    Xj    (j – 0.5)/6
 1    15        0.083
 2    16        0.250
 3    18        0.417
 4    19        0.583
 5    20        0.750
 6    21        0.917

We then plot the results as shown in the figure below.

normal probability plot

That's fine for a small data set, but nobody wants to plot hundreds or thousands of data points by hand. Fortunately, we can also use Minitab Statistical Software to assess the normality of data. Minitab uses the Anderson-Darling test, which compares the actual distribution of the data to a theoretical normal distribution. The Anderson-Darling test’s null hypothesis is “The distribution is normal.”

Anderson-Darling test:

H0: The data follow a normal distribution.

Ha: The data don’t follow a normal distribution.

Test statistic: A² = –N – S, where S = Σ [(2i – 1)/N] [ln F(Y(i)) + ln(1 – F(Y(N+1–i)))], the sum running from i = 1 to N, the Y(i) are the ordered data, and F is the cumulative distribution function of the specified distribution. We can assess the results by looking at the resulting p-value.
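
If you'd like to see the same test outside of Minitab, scipy offers an Anderson-Darling implementation. Here's a minimal sketch; the data are just illustrative random draws, and note that scipy reports critical values at fixed significance levels rather than a single p-value.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.normal(loc=18, scale=2, size=27)  # illustrative sample

    result = stats.anderson(data, dist="norm")
    print("A-squared:", result.statistic)
    for cv, sig in zip(result.critical_values, result.significance_level):
        if result.statistic > cv:
            print(f"Reject normality at the {sig}% level")
        else:
            print(f"Fail to reject normality at the {sig}% level")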

The figure below shows a normal distribution with a sample size of 27. The same data is shown in a histogram, probability plot, dot plot, and box plot.

Probability plot

The next figure shows a normal distribution with a sample size of 208. Notice how the data is concentrated in the center of the histogram, probability plot, dot plot, and box plot.

normal probability plot

A Laplace distribution with a sample size of 208 is shown below. Visually, this data almost resembles a normal distribution; however, the Minitab-generated P value of < 0.05 tells us that this distribution is not normally distributed. 

normality plot

The figure below shows a uniform distribution with a sample size of 270. Even without looking at the P value we can quickly see that the data is not normally distributed.

 

normal probability distribution assumption plot

Back in the days of hand-drawn probability plots, the “fat pencil test” was often used to evaluate normality. The data was plotted, and the distribution was considered normal if all of the data points could be covered by an imaginary thick pencil laid over the plot. The fat pencil test was quick and easy. Unfortunately, it is not as accurate as the Anderson-Darling test and is not a substitute for an actual test.

probability plot of normal
Fat pencil test with normally distributed data

Probability Plot of Non-Normal
Fat pencil test with non-normally distributed data

 

The proper identification of a statistical distribution is critical for properly performing many types of hypothesis tests or for control charting. Fortunately, we can now asses our data without having to rely on hand-drawn tests and a large diameter pencil.

To test for normality, go to the Graph menu in Minitab and select Probability Plot.

Selecting the probability plot

Select Single if you are only looking at one column of data, and click OK.

probability plot selection

Select your column of data and then click OK.

Single Probability Plot

Minitab will generate a probability plot of your data. Notice the P-value below is 0.829. Using an alpha level of 0.05 (95% confidence), we would fail to reject the null hypothesis that our data follow a normal distribution.

Probability Plot of C1

Using Minitab to test data for normality is far more reliable than a fat pencil test and generally quicker and easier. However, the fat pencil test may still be a viable option if you absolutely must analyze your data during a power outage. 

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.


Ghostbusting Spaces in Minitab


It’s been almost 5 years since I used a quotation from Ghostbusters to introduce one of my early blog posts. But as we’re getting a few bits of entertainment news about the next installment in the Ghostbusters franchise, I thought it might be a good time to talk about busting ghosts in Minitab.

In the Minitab sense, ghosts are spaces in your data that you can’t see. The busting action is slightly different for text data and numeric data.

Remove Spaces from Text Data

Let’s look at text data first. Let’s say that you have data that look like this:

A column that looks like the same text value repeated.

Egon with PKE meter

It looks like the same value repeated over and over again. It looks like there’s nothing there but the word “Ghostbusters.”

This is the moment where you get your PKE meter out. Or rather, this is where you tally your data to see what's going on. That moment’s going to be like this:

  1. Choose Stat > Tables > Tally Individual Variables.
  2. In Variables, enter Title. Click OK.

The count of the single word should be the same as the number of rows in the data. It’s not. There are ghosts there that you can’t see.

Tally for Discrete Variables: Title

 
                      Title  Count
               Ghostbusters      5
              Ghostbusters       6
             Ghostbusters        7
            Ghostbusters         5
           Ghostbusters          8
          Ghostbusters           9
         Ghostbusters            5
        Ghostbusters             7
       Ghostbusters              3
      Ghostbusters               4
     Ghostbusters               11
    Ghostbusters                 8
   Ghostbusters                  8
  Ghostbusters                   7
 Ghostbusters                    5
Ghostbusters                     2
                         N=    100

Extra spaces show when you highlight text in the data window.

Tally, standing in for your PKE meter, shows a stairstep pattern because of the extra spaces at the end of the word: extra spaces that you can't see in the data window unless you highlight a cell.

One way to fix the problem would be to go through the data window clicking on the individual cells to find the ones with extra spaces to delete. Fortunately, Minitab gives you a better way to make all of these values the same. You can use the calculator function TRIM:

  1. Choose Calc > Calculator.
  2. In Store Result in Variable, enter Title2.
  3. In Expression, enter Trim(Title). Click OK.

If you tally the new values, you get the result you expect when all of the values are the same:

Tally for Discrete Variables: Title2

      Title2  Count
Ghostbusters    100
          N=    100

All those ghosts are in the trap, not in your data.
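
Outside Minitab, the equivalent of TRIM is a one-liner in most languages. Here's a quick pandas sketch with made-up values for the Title column:

    import pandas as pd

    # Made-up values with invisible trailing spaces
    df = pd.DataFrame({"Title": ["Ghostbusters", "Ghostbusters ", "Ghostbusters   "]})

    # str.strip() removes leading and trailing whitespace, like Minitab's TRIM
    df["Title2"] = df["Title"].str.strip()

    print(df["Title2"].value_counts())  # one value, counted three times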

Remove Spaces from Numeric Data

If you wanted, you could trim a series of numbers to eliminate spaces, but a bigger problem remains unsolved. For example, if you paste some numbers formatted as text from Excel, you might get a column like this:

Numeric data that looks like the same value repeated.

These numbers look like numbers. It doesn’t look like there are extra spaces that you can’t see, but the spaces are there. We can see them if we tally them:

Tally for Discrete Variables: Numbers
Numbers  Count
      9     55
     9       8
    9       11
   9        10
  9          8
 9           8
     N=    100

But the column is also formatted as text. That’s what the “–T” next to the C9 means in the worksheet. If we want to analyze the numbers, find the mean or create a scatterplot, then we need the numbers to be formatted as numbers. If we use the TRIM function, the numbers will still be text. For this, we’re going to have to use something a little bit stronger:

Ghostbusters crossing the streams

No, not crossing the streams. (Egon said that crossing the streams was bad.) Instead, we change the column format.

  1. Put the cursor in the column of numbers.
  2. Right-click and select Format Column.
  3. In Choose type, select Automatic numeric.
  4. Click OK.

Numeric data formatted as numeric data

Tally for Discrete Variables: Numbers
Numbers  Count
      9    100
     N=    100

In one step, you’ve converted the format of the column and eliminated the extra spaces. The ghosts are all gone.
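
The same one-two punch, stripping the spaces and converting text to numbers, looks like this in pandas, again with made-up values:

    import pandas as pd

    # Numbers stored as text, padded with stray spaces
    df = pd.DataFrame({"Numbers": [" 9", "  9", "9   "]})

    # Strip the whitespace, then convert the text to real numbers
    df["Numbers"] = pd.to_numeric(df["Numbers"].str.strip())

    print(df["Numbers"].dtype)   # numeric, no longer object/text
    print(df["Numbers"].mean())  # numeric operations now work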

Wrap-up

If you have to work with data, especially data that someone types, then you’re going to have to deal with messy data. The TRIM function and the ability to change column formats are two tools that Minitab gives you to get the ghosts out of your data.

Have other problems? Minitab has lots of other tools to simplify the cleaning that you’ll need to do for your data. If you’re ready for a few more tricks, check out 3 Ways to Clean Up Data So You Can Promote Public Dialog and 3 Features to Make You Glad You're You When You Have to Clean Data in Minitab.

 

Ghostbusters images: TM & © 2014 Columbia Pictures Industries Inc. 

T-tests for Speed Tests: How Fast Is Internet Speed?


Every now and then I’ll test my Internet speed at home using such sites as http://speedtest.comcast.net or http://www.att.com/speedtest/. My urge to perform these tests could stem from the cool-looking interfaces these sites employ, displaying the results on analog speedometers and RPM meters. It could also stem from the validation I need that I'm "getting what I am paying for," although I realize that there are other factors that determine the Internet speed you ultimately end up with when you browse the Web.

Recently I started thinking about the distribution of these speeds. If I were to run enough tests, would these speeds be normally distributed?

When performing an Internet speed test, you are given an estimated download and upload speed. The download speed is the rate at which data travels from the Internet to your device, and the upload speed is the rate at which data travels from your device to the Internet. I was also curious as to whether the population means of these speeds were statistically different.

Is the Data Normally Distributed?

I ran 30 speed tests from my office at Minitab and recorded the download and upload data in a Minitab Statistical Software worksheet. Here is a sample of the data:

I went to Stat > Basic Statistics > Normality Test. Here are the probability plots for download and upload speed.

I’ll be comparing the p-values to an alpha level of 0.05. Both probability plots show p-values greater than alpha, so we do not have enough evidence to reject the null hypothesis. As a quick reminder, the null hypothesis is that our data follow a normal distribution. We can assume normality.

Is There a Difference Between Upload and Download Speed?

Let’s find out if there was a statistical difference between the download speed and the upload speed.

Go to Stat > Basic Statistics > 2-Sample t:

I chose “Each Sample is in its own column” under the dropdown, and entered in the column for download speed for Sample 1 and upload speed for Sample 2.

If you click on Options you’ll see a checkbox for "Assume Equal Variances." Checking this box will result in a slightly more powerful 2-sample t-test. But how do I know whether the variances are equal? By using a quick test in Minitab! 

I cancelled out of the 2-Sample t dialog window and quickly ran an Equal Variances test (Stat > Basic Statistics > 2 Variances) and received these results:

Given that our p-value is greater than an alpha of 0.05, we don’t have enough evidence to say that the two variances are statistically different. Therefore, we can go back to the 2-Sample t test and check the box for "Assume Equal Variances."

Here's the output from my 2-Sample t-test:

Since our p-value is less than 0.05, we can reject the null hypothesis (that both means are the same) and say that the population means for download and upload speed are statistically different.
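
For readers who want to try this outside Minitab, here is a minimal scipy sketch of the same two steps: first checking whether the variances look equal (using Levene's test rather than Minitab's 2 Variances procedure), then running the pooled two-sample t-test. The arrays are placeholders for your own 30 download and upload measurements.

    import numpy as np
    from scipy import stats

    # Placeholder speed measurements in Mbps; substitute your own readings
    download = np.array([52.1, 49.8, 51.3, 50.7, 53.0, 48.9])
    upload   = np.array([55.4, 56.1, 54.8, 57.0, 55.9, 56.3])

    # Rough check on equal variances
    stat, p_var = stats.levene(download, upload)
    equal_var = p_var > 0.05

    # Two-sample t-test; equal_var=True gives the pooled (equal-variances) version
    t_stat, p_val = stats.ttest_ind(download, upload, equal_var=equal_var)
    print(f"t = {t_stat:.2f}, p = {p_val:.4f}")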

Vrrrrooooooooom!

I was curious as to why the upload speeds were higher than the download speeds during my testing. Whenever I’ve tested speeds at my house, I’ve always seen the reverse.

I asked someone here at Minitab who is well versed in network setup, and he said that there could have been more bandwidth consumption from my coworkers than normal at the time of data collection. This extra consumption can push the download speeds below the upload speeds. He also said that the nature of how the Internet is configured at a company can be a contributing factor as well. 

If you were given an expected download rate by your cable company, you could add to this experiment by performing a 1-Sample t-test. The expected download rate would serve as your hypothesized mean. You would then be able to perform a hypothesis test to see if your mean is statistically different from your hypothesized mean.

If you find that you're not getting the speeds you wanted, I wouldn't start running around with pitchforks just yet. According to http://www.cnet.com/how-to/how-to-find-a-reliable-network-speed-test/ , accuracy and consistency in speeds may depend on what online speed test you are using. But comparing the different speed testing tools is an analysis for another day! 


Lessons from a Statistical Analysis Gone Wrong, part 1


I don't like the taste of crow. That's a shame, because I'm about to eat a huge helping of it. 

I'm going to tell you how I messed up an analysis. But in the process, I learned some new lessons and was reminded of some older ones I should remember to apply more carefully. 

This Failure Starts in a Victory

My mistake originated in the 2015 Triple Crown victory of American Pharoah. I'm no racing enthusiast, but I knew this horse had ended almost four decades of Triple Crown disappointments, and that was exciting. I'd never seen a Triple Crown won before. It hadn't happened since 1978. 

So when an acquaintance asked to contribute a guest post to the Minitab Blog that compared American Pharoah with previous Triple Crown contenders, including the record-shattering Secretariat, who took the Triple Crown in 1973, I eagerly accepted. 

In reviewing the post, I checked and replicated the contributor's analysis. It was a fun post, and I was excited about publishing it. But a few days after it went live, I had to remove it: the analysis was not acceptable. 

To explain how I made my mistake, I'll need to review that analysis. 

Comparing American Pharoah and Secretariat

In the post, we used Minitab's statistical software to compare Secretariat's performance to other winners of Triple Crown races. 

Since 1926, the Belmont Stakes has been the longest of the three races at 1.5 miles. The analysis began by charting 89 years of winning horse times: 

Only two data points were outside of the I-chart's control limits:

  • The fastest winner, Secretariat's 1973 time of 144 seconds
  • The slowest winner, High Echelon's 1970 time of 154 seconds

The average winning time was 148.81 seconds, which Secretariat beat by more than 4 seconds. 

Applying a Capability Approach to the Race Data

Next, the analysis approached the data from a capability perspective: Secretariat's time was used as a lower spec limit, and the analysis sought to assess the probability of another horse beating that time. 

The way you assess capability depends on the distribution of your data, and a normality test in Minitab showed this data to be nonnormal.

When you run Minitab's normal capability analysis, you can elect to apply the Johnson transformation, which can automatically transform many nonnormal distributions before the capability analysis is performed. This is an extremely convenient feature, but here's where I made my mistake. 

Running the capability analysis with Johnson transformation, using Secretariat's 144-second time as a lower spec limit, produced the following output:

The analysis found a .36% chance of any horse beating Secretariat's time, making it very unlikely indeed. 

The same method was applied to Kentucky Derby and Preakness data. 

We found a 5.54% chance of a horse beating Secretariat's Kentucky Derby time.

We found a 3.5% probability of a horse beating Secretariat's Preakness time.

Despite the billions of dollars and countless time and effort spent trying to make thoroughbred horses faster over the past 43 years, no one has yet beaten “Big Red,” as Secretariat was known. So the analysis indicated that American Pharoah may be a great horse, but he is no Secretariat. 

That conclusion may well be true...but it turns out we can't use this analysis to make that assertion. 

My Mistake Is Discovered, and the Analysis Unravels

Here's where I start chewing those crow feathers. A day or so after sharing the post about American Pharoah, a reader sent the following comment: 

Why does Minitab allow a Johnson Transformation on this data when using Quality Tools > Capability Analysis > Normal > Transform, but does not allow a transformation when using Quality Tools > Johnson Transformation? Or could I be doing something wrong? 

Interesting question. Honestly, it hadn't even occurred to me to try to run the Johnson transformation on the data by itself.

But if the Johnson Transformation worked when performed as part of the capability analysis, it ought to work when applied outside of that analysis, too. 

I suspected the person who asked this question might have just checked a wrong option in the dialog box. So I tried running the Johnson Transformation on the data by itself.

The following note appeared in Minitab's session window: 

no transformation is made

Uh oh.  

Our reader hadn't done anything wrong, but it was looking like I made an error somewhere. But where?

I'll show you exactly where I made my mistake in my next post. 

 

Photo of American Pharoah used under Creative Commons license 2.0.  Source: Maryland GovPics https://www.flickr.com/people/64018555@N03 

Lessons from a Statistical Analysis Gone Wrong, Part 2


Last time, I told you how I had double-checked the analysis in a post that involved running the Johnson transformation on a set of data before doing normal capability analysis on it. A reader asked why the transformation didn't work on the data when you applied it outside of the capability analysis. DOH!

I hadn't tried transforming the data that way, but if the transformation worked when performed as part of the capability analysis, it should work when applied outside of that analysis, too.

But the reader was correct. The transformation failed when applied by itself. 

What Happened? 

When I'd performed the capability analysis with the Johnson transformation option selected, the analysis seemed fine to me. It had been a while since I'd done a capability analysis, but the graph looked okay.  

Then I remembered one of my first Minitab instructors, who told us "Always look at the session window."  So I did.  And there it was: 

Yes, the process capability analysis had been performed...on data that hadn't been transformed. I missed it. And it wasn't until a reader tried running the analysis a different way that my oversight was revealed. 

Missing the First Warning

While Minitab does warn you that the transformation failed, you need to check the session window to see it. I've used Minitab and other statistical software packages for some time now, and I know that it's important to look at all of the output.

In this case, I only looked at the graph. Graphs tell you a lot, but you shouldn't rely on graphs alone. I knew this, and I usually do check Minitab's session window...but in this case, I didn't. 

Knowing What to Look For

While I should have checked the session window, there's another reason I missed the fact that the transformation hadn't occurred: when it comes to capability analysis, I was out of practice. 

Like most people who use Minitab, I have a wide range of responsibilities. Some involve statistics and data analysis, and many do not. I do some types of analysis far more frequently than others. Capability is one that I hadn't performed in a while. 

Given the time that passed since the last time I did a capability analysis with transformed data, I should have been more thorough in reviewing the output, shown here: 

My mistake seems obvious now: this graph contains a huge warning that the transformation failed. However, the warning lies not in what you see above, but instead in what this graph does not show.

For comparison, here's a capability report that involves a successful transformation: 

Yeah...when you see the transformation equation in the subhead of this graph, not to mention the words "After Transformation" in the data table, their absence in the earlier graph is very conspicuous.  

Thus, I missed my second opportunity to realize that the transformation had failed. Unfortunately, that meant that the analysis of the Triple Crown data wasn't valid. I felt like a fool for missing something that seems so obvious in hindsight.

You can bet that I'll remember to check the session window more vigilantly, and that I'll be quite a bit more cautious when performing analyses that I haven't done in a while.  

In fact, after realizing my mistake, I tried doing this analysis using the capability tools in the Assistant, which duly notified me that the analysis was suspect. Would that I had thought to use the Assistant, at least to double-check my results, in the first place!

Owning Up

I removed the post about American Pharoah from the blog. Then I wrote to the person who had caught my error, and expressed my gratitude—and chagrin—that he had noticed it.  

But it turned out I had even more lessons to learn from this failed analysis.  

 

Photo by Alex E. Proimos, used under Creative Commons 2.0. 

Lessons from a Statistical Analysis Gone Wrong, Part 3


If you've read the first two parts of this tale, you know it started when I published a post that involved transforming data for capability analysis. When an astute reader asked why Minitab didn't seem to transform the data outside of the capability analysis, it revealed an oversight that invalidated the original analysis

lemons and lemonade

I removed the errant post. But to my surprise, the reader who helped me discover my error, John Borneman, continued looking at the original data. He explained to me, "I do have a day job, but I'm a data geek. Plus, doing this type of analysis ultimately helps me analyze data found in my real work!"

I want to share what he did, because it's a great example of how you can take an analysis that doesn't work, ask a few more questions, and end up with an analysis that does work. 

Another Look at the Original Analysis

At root, the original post asked, "What is the probability of any horse beating Secretariat's record?" A capability study with Secretariat's winning time as the lower spec limit would provide an estimate of that probability, but as the probability plot below indicates, the data was not normal: 

So we ran Stat > Capability Analysis > Normal and selected the option to apply the Johnson transformation before calculating capability. Minitab returned a capability analysis, but the resulting graph doesn't explicitly note that the Johnson transformation was not used.

Note the lack of information about the transformation in the preceding graph. If you don't see details about the transformation, it means the transformation failed. But I failed to notice what wasn't there. I also neglected to check the Session Window, which does tell you the transformation wasn't applied:

Applying the Transformation by Itself

When you select the Johnson transformation as part of the capability analysis in Minitab, the transformation is just a supporting player to the headliner, capability analysis. The transformation doesn't get a lot of attention.

But using Stat > Quality Tools > Johnson Transformation places the spotlight exclusively on the transformation, and Minitab highlights whether the transformation succeeds—or, in this case, fails.

When I looked at this data, I saw that it wasn't normally distributed. But Borneman noticed something else: the data had an ordinal pattern—the race times fell into buckets that were one full second apart.

That means the data lacked discrimination: it was not very precise.

While ordinal data can be used in many analyses, poor discrimination often causes problems when trying to transform data or fit it to a common distribution. Capability studies, where the data at the tails is important, really shouldn't be performed with ordinal data—especially when there is low discrimination. 

What Can We Do If the Data Is Truly Ordinal?

But other techniques are available, particularly graphical tools such as box plots and time series plots. And if you wish to compare two groups of ordinal data with more than 10 categories, you can use ANOVA, a t-test, or even a non-parametric test such as Mood's median test.

Playing out the "what if" scenario that this data was ordinal, Borneman used this approach to see if there was a difference between the Kentucky Derby winning times in races run between 1875 and 1895 and those run between 1896 and 2015. 

Minitab Output

"The race was 1.5 miles until 1896, when it was shortened to 1.25 miles," Borneman says when looking at the results. "So obviously we'd expect to see a difference, but it's a good way to illustrate the point." 

Ordinal data is valuable, but given its limited discrimination, it can only take you so far. 
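
As an aside, if you want to reproduce that kind of two-era comparison in code, scipy includes a Mood's median test. A small sketch with made-up winning times (in seconds), not the actual Derby data:

    from scipy import stats

    # Made-up winning times, standing in for the two eras of Derby results
    era_1875_1895 = [157, 158, 156, 159, 157, 158]   # 1.5-mile era
    era_1896_2015 = [122, 121, 123, 120, 122, 121]   # 1.25-mile era

    stat, p, grand_median, table = stats.median_test(era_1875_1895, era_1896_2015)
    print(f"Mood's median test: p = {p:.4f}, grand median = {grand_median}")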

What Can We Do If the Data Is Not Truly Ordinal?

Borneman soon realized that the original data must have been rounded, and more precise data might not be ordinal. "Races clock the horse's speed more accurately than to the nearest second," he says. "In fact, I found that the Derby clocks times to the nearest 1/100 of a second since 2001. The race was timed to the 1/4 second from 1875 to 1905, and to the 1/5 second 1906 to 2000."

He found Kentucky Derby winning race times with more precise measurements, and not the rounded times:

Then he compared the rounded and non-rounded data. "The dot plot really shows the differences in discrimination between these two data sets," he says.

Does the New Data Fit the Normal Distribution?

Borneman wondered if the original analysis could be revisited with this new, more precise data. But a normality test showed the new data also was not normally distributed, and that it didn't fit any common distribution.

However, running the Johnson Transformation on this data worked! 

That meant the more detailed data could be used to perform the capability analysis that failed with the original, rounded data. 

An Even More Dramatic Result

Running the capability study using the Johnson transformation and using Secretariat's time as the lower spec limit, Borneman found that the probability of another horse getting a time less than 119.4 seconds is 0.32%. 

This is quite a difference from the original analysis, which found about a 5% chance of another horse beating Secretariat's time. In fact, it adds even more weight to the original post's argument that Secretariat was unique among Triple Crown winners. 

Now, it should be noted that using a capability study to assess the chance of a future horse beating Secretariat's time is a bit, well, unorthodox. It may make for a fun blog post, but it does not account for the many factors that change from race to race. 

"And as my wife—a horse rider and fanatic—pointed out, we also don't know what type of race each jockey and trainer ran," Borneman told me. "Some trainers have the goal to win the race, and not necessarily beat the fastest time."

Borneman's right about this being an off-label use of capability analysis.  "On the other hand," he notes, "Secretariat's time is definitely impressive."  

What Have I Learned from All This? 

In the end, making this mistake reinforced several old lessons, and even taught me some new ones. So what am I taking away from all of this? 

  • Graphs are great, but you can't assume they tell the whole story. Check all of the feedback and results available. 
     
  • Know what the output should include. This is especially important if it's been a while since you performed a particular analysis. A quick peek at Minitab Help or Quality Trainer is all it takes. 
     
  • Try performing the analyses in different ways. If I had performed this capability analysis using the Assistant in addition to using the Stat menu, for example, I would have discovered the problem earlier. And it would only have taken a few seconds. 

And here's the biggest insight I'm taking from this experience:

  • When your analysis fails, KEEP ASKING QUESTIONS. The original analysis failed because the data could not be transformed. But by digging just a little deeper, Borneman realized that rounded data was inhibiting the successful transformation. And by asking variations on "what if," he demonstrated that you can still get good insights—even when your data won't behave the way you'd hoped. 

I'm glad to learn these lessons, even at the cost of some embarrassment over my initial mistake. I hope sharing my experience will help you avoid a similar situation. 
