
Gage Linearity and Bias: Wake Up and Smell Your Measuring System


Right now I’m enjoying my daily dose of morning joe. As the steam rises off the cup, the dark rich liquid triggers a powerful enzyme cascade that jump-starts my brain and central nervous system, delivering potent glints of perspicacity into the dark crevices of my still-dormant consciousness.

Feels good, yeah! But is it good for me? Let’s see what the studies say…

Hmm. These are just a few results from copious studies on coffee consumption. But already I'm having a hard time processing the information.

Maybe another cup of coffee would help. Er...uh...maybe not.

The pivotal question you should ask before you perform any analysis

There are a host of possible explanations for these seemingly contradictory study results.

Perhaps the studies utilized different study designs, different statistical methodologies, different survey techniques, different confounding variables, different clinical endpoints, or different populations. Perhaps the physiological effects of coffee are modulated by the dynamic interplay of a complex array of biomechanisms that are differently triggered in each individual based on their unique, dynamic phenotype-genotype profiles.

Or perhaps...just perhaps...there's something even more fundamental at play. The proverbial elephant in the room of any statistical analysis. The essential, pivotal question upon which all your results rest...

"What am I measuring? And how well am I actually measuring what I think I'm measuring?"

Measurement system analysis helps ensure that your study isn't doomed from the start.

A measurement systems analysis (MSA) evaluates the consistency and accuracy of a measuring system. MSA helps you determine whether you can trust your data before you use a statistical analysis to identify trends and patterns, test hypotheses, or make other general inferences.

MSA is frequently used for quality control in the manufacturing industry. In that context, the measuring system typically includes the data collection procedures, the tools and equipment used to measure (the "gage"), and the operators who measure.

Coffee consumption studies don't employ a conventional measuring system. Often, they rely on self-reported data from people who answer questionnaires about their lifestyle habits, such as "How many cups of coffee do you drink in a typical day?" So the measuring "system," loosely speaking, is every respondent who estimates the number of cups they drink. Despite this, could MSA uncover potential issues with measurements collected from such a survey?

Caveat: What follows is an exploratory exercise performed with a small set of nonrandom data for illustrative purposes only. To see standard MSA scenarios and examples, including sample data sets, go to Minitab's online data set library and select the category Measurement systems analysis.

Gage Linearity and Bias: "Houston, we have a problem..."

For this experiment (I can't call it a study), I collected different coffee cups from the cupboard of our department lunchroom (see image at right). Then I poured different amounts of liquid into each cup and asked people to tell me how full the cup was. The actual amount of liquid was 0.50 cup, 0.75 cup, or 1 cup, as measured using a standard measuring cup.

To evaluate the estimated "measurements" in relation to the actual reference values, I performed a gage linearity and bias study (Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study). The results are shown below.

Note: A gage linearity and bias study evaluates whether a measurement system has bias when compared to a known standard. It also assesses linearity—the difference in average bias through the expected operating range of the measuring device. For this example, I didn't enter an estimate of process variation, so the results don't include linearity estimates.

The Y axis shows the amount of bias, which is the difference between the observed measurement from the gage and the reference or master value. For this study, bias is the reported volume measured using the different coffee cups minus the actual volume measured using a standard cup. If the measurements perfectly match the reference values, the data points on the graph fall along the line bias = 0, with a slope of 0.
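
If you want to see the arithmetic behind a bias study, here's a minimal Python sketch of the calculation (the cup estimates are made-up numbers for illustration, and this is not Minitab's implementation; it just mirrors the idea of bias = observed minus reference, tested against zero):

```python
# Minimal sketch of the bias calculation behind a gage bias study.
# The "measured" values are hypothetical cup estimates, not the actual data.
import numpy as np
from scipy import stats

reference = np.array([0.50, 0.50, 0.75, 0.75, 1.00, 1.00])  # standard-cup amounts
measured  = np.array([0.30, 0.35, 0.45, 0.50, 0.60, 0.70])  # self-reported estimates

bias = measured - reference              # negative values = underestimates

# Overall bias: one-sample t-test of the bias values against 0
t_stat, p_value = stats.ttest_1samp(bias, popmean=0.0)
print(f"average bias = {bias.mean():.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")

# Average bias at each reference value (the idea behind the Gage Bias table)
for ref in np.unique(reference):
    print(f"reference {ref}: average bias = {bias[reference == ref].mean():.3f}")
```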

That's obviously not the case here. The estimated measurements for all three reference values show considerable negative bias. That is, when using the coffee cups in our department lunchroom as "gages," every person's estimated measurement was much smaller than the actual amount of liquid. Not a surprise, because the coffee cups are larger than a standard cup. (There are coffee cups that hold about one standard cup, by the way, such as the one I use every morning. But most Americans don't drink from coffee cups this small. Mine was designed back in the '50s, when most things—houses, grocery carts, cheeseburgers—were made in more modest proportions.)

The Gage Bias table shows that the average bias increases as the amount of liquid increases. And even though this was a small sample, the bias was statistically significant (P < 0.0005). Importantly, notice that the bias wasn't consistent at each reference value—there is a considerable range of bias among the estimates at each reference value.

Despite its obvious limitations, this informal, exploratory analysis provides some grounds for speculation.

What does "one cup of coffee" actually mean in studies that use self-reported data? What about categories such as 1-2 cups, or 2-4 cups? If it's not clear what x cups of coffee actually refers to, what do we make of risk estimates that are specifically associated with x number of cups of coffee? Or meta-analyses that combine self-reported coffee consumption data from different countries (equating one Japanese "cup of coffee", say, with one Australian "cup of coffee"?)

Of course, perfect data sets don't exist. And it's possible that some studies may manage to identify valid overall trends and correlations associated with increasing/decreasing coffee consumption.

Still, let's just say that a self-reported "cup of coffee" might best be served not with cream and sugar, but with a large grain of salt.

So before you start brewing your data...

And before you rush off to calculate p-values...it's worth taking the extra time and effort to make sure that you're actually measuring what you think you're measuring.


How to Use the "Swiss Army Knife" Control Chart


A recent discussion on the Minitab Network on LinkedIn pertained to the I-MR chart. In the course of the conversation, a couple of people referred to it as "The Swiss Army Knife of control charts," and that's a pretty great description. You might be able to find more specific tools for specific applications, but in many cases, the I-MR chart gets the job done quite adequately.

When you're collecting samples of data to learn about your process, it's generally a good idea to group the sample data into subgroups, if possible. The idea is that these subgroups represent "snapshots" of your process. But what if you can't? Your process might have a very long cycle time, or sampling or testing might be destructive or expensive. Production volume may be too low. Or it simply might not be feasible for subgroups to capture the variability of your process over a given period. In many such instances, an I-MR chart is a good way to go.

What Can an I-MR Chart Do?

Let's take a closer look at this "Swiss Army Knife" control chart and see what it does and how it works. The I-MR chart, like other control charts, has three main uses: 

  1. To monitor the stability of your process.
    Even the most stable process has variation in some amount, and attempts to "fix" normal fluctuations in a process may actually introduce instability. An I-MR chart can show you changes that should be addressed.
     
  2. To determine whether your process is stable enough to improve.
    You generally don't want to make improvements to a process that isn't stable. That's because the instability keeps you from confidently assessing the impact of your changes. You can confirm (or deny) your process stability with an I-MR chart before you make improvements. 
     
  3. To demonstrate process performance improvements.
    If your improvements had a big impact, how do you show your stakeholders and higher-ups? Before-and-after I-MR charts provide powerful visual proof. 

Now that we know what the I-MR chart can do, let's consider what it is. The I-MR is actually the combination of two different charts in a single presentation. The graph's top part is an Individuals (I) chart. It shows you the value of each observation (the individuals), and helps you assess the center of the process.

I chart

The graph at the bottom is called a Moving Range (MR) chart. It calculates the variation of your process using ranges of two (or more) successive observations, and plots them. 

MR Chart

The green center line represents the process mean on the I chart and the average moving range on the MR chart, while the red lines represent the upper and lower control limits.
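
If you're curious about the numbers behind those lines, here is a rough Python sketch of the textbook I-MR limit formulas for moving ranges of size 2 (center ± 2.66 × average moving range for the I chart, and 0 to 3.267 × average moving range for the MR chart). It's an approximation for illustration, not a replacement for Minitab's calculations:

```python
# Textbook I-MR control limit calculations for moving ranges of size 2.
# The constants 2.66 and 3.267 come from the d2 and D4 factors for n = 2.
import numpy as np

def imr_limits(x):
    x = np.asarray(x, dtype=float)
    mr = np.abs(np.diff(x))                 # moving ranges of consecutive points
    x_bar, mr_bar = x.mean(), mr.mean()
    i_chart  = (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar)
    mr_chart = (0.0, mr_bar, 3.267 * mr_bar)
    return i_chart, mr_chart                # each tuple is (LCL, center, UCL)

# Example with a few hypothetical pH readings
i_lims, mr_lims = imr_limits([6.0, 6.1, 5.9, 6.2, 6.0, 5.8])
print("I chart  (LCL, CL, UCL):", i_lims)
print("MR chart (LCL, CL, UCL):", mr_lims)
```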

How to Create an I-MR Chart

Suppose the chemical company you work for makes a custom solution, and you need to assess whether its pH value is consistent over time. You record a single pH measurement per batch. Since you are collecting individual samples rather than subgroups, the I-MR chart can help. 

pH data

You record pH measurements for 25 consecutive batches. To prepare that data for the I-MR chart, just enter those measurements, in order, in a single column of a Minitab worksheet. (You can download this data set here to follow along. If you don't already have Minitab Statistical Software, you can use the free trial.) 

Now select Stat > Control Charts > Variables Charts for Individuals > I-MR from the menu, and choose pH as the Variable. (If you have more than one variable you want to chart, you can enter more than one column here and Minitab will produce multiple I-MR charts simultaneously.) If you want to add labels, divide the data into stages, and more, you can do that in the "I-MR Options" subdialog.

Let's assume that we want to detect any possible special-cause variation. Click I-MR Options and select Tests. These tests highlight points that exceed the control limits and detect specific patterns in the data. In the dropdown menu, select "Perform all tests for special causes," and then OK out of the dialog.

tests for special causes

Check the MR Chart First

After you press OK, Minitab generates your I-MR chart: 

I-MR Chart of pH

It might seem counterintuitive, but you should examine the MR chart at the bottom first. This chart reveals if the process variation is in or out of control. The reason you want to check this first is that if the MR chart is out of control, the I-chart control limits won't be accurate. In that case, any unusual points on the I chart may result from unstable variation, not changes in the process center. But when the MR chart is in control, an out-of-control I chart does indicate changes in the process center.

When points fail Minitab's tests, they are marked in red. In this MR chart, none of the individual points fall outside the lower and upper control limits of 0 and 0.4983, respectively. The points also follow a random pattern. That means our process variation is in control, and we're good to take the next step: looking at the I chart.

Check the I Chart After the MR Chart

The individuals (I) chart shows if your process mean is in control. In contrast to the MR chart, this I chart shows evidence of potential nonrandom patterns. 

I chart of pH

Minitab can perform up to eight different special-cause variation tests for the I chart. Problem observations are marked in red, and also display the number of the corresponding failed test.

This I chart shows that three separate observations failed two different tests. We can check the Session Window for more details about why Minitab flagged each point: 

Test Results for I Chart

Observation 8 failed Test 1, which means this observation was more than 3 standard deviations from the center line—the strongest evidence that a process is out of control. Observations 20 and 21 failed Test 5, which looks for two out of three consecutive points that fall more than 2 standard deviations from the center line, on the same side of it. Test 5 provides additional sensitivity for detecting smaller shifts in the process mean.
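
To make those two tests concrete, here's a small Python sketch that flags points in the same general way, using a sigma estimated from the average moving range (average MR / 1.128). It's a simplified illustration of Tests 1 and 5, not Minitab's exact flagging logic:

```python
# Simplified versions of Test 1 (point beyond 3 sigma) and Test 5
# (2 of 3 consecutive points beyond 2 sigma on the same side).
import numpy as np

def tests_1_and_5(x):
    x = np.asarray(x, dtype=float)
    center = x.mean()
    sigma = np.abs(np.diff(x)).mean() / 1.128     # sigma from average moving range
    z = (x - center) / sigma

    test1 = np.abs(z) > 3                         # Test 1 flags
    test5 = np.zeros_like(test1)
    for i in range(2, len(x)):                    # slide a 3-point window
        window = z[i - 2:i + 1]
        if (window > 2).sum() >= 2 or (window < -2).sum() >= 2:
            test5[i - 2:i + 1] |= np.abs(window) > 2
    return test1, test5

# Example call with arbitrary readings; indexes of flagged points are printed
flags1, flags5 = tests_1_and_5([6.0, 6.1, 5.9, 6.2, 6.8, 6.0, 6.1, 5.8])
print(np.where(flags1)[0], np.where(flags5)[0])
```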

This I-MR chart indicates that the process average is unstable and therefore the process is out of control, possibly due to the presence of special causes.

After looking at your data in the I-MR chart, you know there may be a problem that needs to be addressed. That's the whole purpose of the control chart! The next step is to identify and address the source of this special-cause variation. Until these causes are eliminated, the process cannot achieve a state of statistical control.

If you'd like to know more about the behind-the-scenes math that goes into this chart, check out my colleague Marilyn Wheatley's post about how I-MR control chart limits are calculated.

Creating and Reading Statistical Graphs: Trickier than You Think


My colleague Cody Steele wrote a post that illustrated how the same set of data can appear to support two contradictory positions. He showed how changing the scale of a graph that displays mean and median household income over time drastically alters the way it can be interpreted, even though there's no change in the data being presented.

Graph interpretation is tricky, especially if you're doing it quickly.

When we analyze data, we need to present the results in an objective, honest, and fair way. That's the catch, of course. What's "fair" can be debated...and that leads us straight into "Lies, damned lies, and statistics" territory.

Cody's post got me thinking about the importance of statistical literacy, especially in a mediascape saturated with overhyped news reports about seemingly every new study, not to mention omnipresent "infographics" of frequently dubious origin and intent.

As consumers and providers of statistics, can we trust our own impressions of the information we're bombarded with on a daily basis? It's an increasing challenge, even for the statistics-savvy. 

So Much Data, So Many Graphs, So Little Time

The increased amount of information available, combined with the acceleration of the news cycle to speeds that wouldn't have been dreamed of a decade or two ago, means we have less time available to absorb and evaluate individual items critically. 

A half-hour television news broadcast might include several animations, charts, and figures based on the latest research, or polling numbers, or government data. They'll be presented for several seconds at most, then it's on to the next item. 

Getting news online is even more rife with opportunities for split-second judgment calls. We scan through the headlines and eyeball the images, searching for stories interesting enough to click on. But with 25 interesting stories vying for your attention, and perhaps just a few minutes before your next appointment, you race through them very quickly. 

But when we see graphs for a couple of seconds, do we really absorb their meaning completely and accurately? Or are we susceptible to misinterpretation?  

Most of the graphs we see are very simple: bar charts and pie charts predominate. But as statistics educator Dr. Nic points out in this blog post, interpreting even simple bar charts can be a deceptively tricky business. I've adapted her example to demonstrate this below.  

Which Chart Shows Greater Variation? 

A city surveyed residents of two neighborhoods about the quality of service they get from local government. Respondents were asked to rate local services on a scale of 1 to 10. Their responses were charted using Minitab Statistical Software, as shown below.  

Take a few seconds to scan the charts, then choose which neighborhood's responses exhibit more variation: Ferndale or Lawnwood?

Lawnwood Bar Chart

Ferndale Bar Chart

Seems pretty straightforward, right? Lawnwood's graph is quite spiky and disjointed, with sharp peaks and valleys. The graph of Ferndale's responses, on the other hand, looks nice and even. Each bar's roughly the same height.  

It looks like Lawnwood's responses have more variation. But let's verify that impression with some basic descriptive statistics about each neighborhood's responses:

Descriptive Statistics for Ferndale and Lawnwood

Uh-oh. A glance at the graphs suggested that Lawnwood has more variation, but the analysis demonstrates that Ferndale's variation is, in fact, much higher. How did we get this so wrong?  

Frequencies, Values, and Counterintuitive Graphs

The answer lies in how the data were presented. The charts above show frequencies, or counts, rather than individual responses.  

What if we graph the individual responses for each neighborhood?  

Lawnwood Individuals Chart

Ferndale Individuals Chart

In these graphs, it's easy to see that the responses of Ferndale's citizens had much more variation than those of Lawnwood. But unless you appreciate the difference between values and frequencies—and pay careful attention to how the first set of graphs is labeled—a quick look at the earlier graphs could well leave you with the wrong conclusion.
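
One way to convince yourself is to rebuild individual values from the frequency counts and compute the spread directly. The counts below are hypothetical stand-ins for the two neighborhoods, chosen only to mimic a spiky chart and a flat chart:

```python
# Rebuild individual ratings from frequency counts, then compare spreads.
import numpy as np

ratings = np.arange(1, 11)
lawnwood_counts = np.array([0, 0, 0, 2, 9, 3, 8, 2, 6, 0])   # spiky bar chart
ferndale_counts = np.array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])   # flat bar chart

lawnwood = np.repeat(ratings, lawnwood_counts)   # the individual responses
ferndale = np.repeat(ratings, ferndale_counts)

print("Lawnwood std dev:", round(lawnwood.std(ddof=1), 2))
print("Ferndale std dev:", round(ferndale.std(ddof=1), 2))   # larger, despite the flat bars
```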

Being Responsible 

Since you're reading this, you probably both create and consume data analysis. You may generate your own reports and charts at work, and see the results of other peoples' analyses on the news. We should approach both situations with a certain degree of responsibility.  

When looking at graphs and charts produced by others, we need to avoid snap judgments. We need to pay attention to what the graphs really show, and take the time to draw the right conclusions based on how the data are presented.  

When sharing our own analyses, we have a responsibility to communicate clearly. In the frequency charts above, the X and Y axes are labeled adequately—but couldn't they be more explicit?  Instead of just "Rating," couldn't the label read "Count for Each Rating" or some other, more meaningful description? 

Statistical concepts may seem like common knowledge if you've spent a lot of time working with them, but many people aren't clear on ideas like "correlation is not causation" and margins of error, let alone the nuances of statistical assumptions, distributions, and significance levels.

If your audience includes people without a thorough grounding in statistics, are you going the extra mile to make sure the results are understood? For example, many expert statisticians have told us they use the Assistant in Minitab 17 to present their results precisely because it's designed to communicate the outcome of analysis clearly, even for statistical novices. 

If you're already doing everything you can to make statistics accessible to others, kudos to you. And if you're not, why aren't you?  

The Empirical CDF, Part 2: Software vs. Etch-a-Sketch


etch-a-sketch

Like many, my introduction to 17th-century French philosophy came at the tender age of 3+. For that is when I discovered the Etch-a-Sketch®, an entertaining ode to Descartes' coordinate plane.

Little did I know that the seemingly idle hours I spent doodling on my Etch-a-Sketch would prove to be excellent training for the feat that I attempt today: plotting an Empirical Cumulative Distribution Function (ECDF).

How Can We Use an Empirical CDF?

We can use an empirical cdf graph to:

  • Determine how well data follow a specific distribution. 
  • Get estimates of parameters and population percentiles.
  • Compare sample distributions.

Like the Etch-a-Sketch, an ECDF is fairly simple. Each unique value in a sample is represented by a step on the ECDF. The data values are represented on the x-axis, and the y-axis represents the cumulative percentage (from 0% to 100%). In other words, you plot the value of each observation against its actual cumulative probability. As you move from left to right, the steps of the ECDF climb toward 100%.

Creating an Empirical CDF using Software

Here's an example. This is an ECDF of the Carbo variable in the Cereal.MTW data set. To open this data set in Minitab 17, choose Help > Sample Data. I sorted the values in ascending order for this demonstration. To do the same, click in the column to select part of it, then right-click and choose Sort Columns > Entire Worksheet > Smallest to Largest. Finally, choose Graph > Empirical CDF..., select "Single," and in the dialog box choose "Carbo" as the graph variable as shown. (Note that I've also clicked the "Distribution" button and opted to display only the connect line between the points in my graph.) 

empirical cdf dialog

Minitab outputs the following graph: 

carbo data empirical cdf 

The first step in the ECDF shows that the smallest value in the sample is 13 (X) and that 8.33% (Y) of the values in the sample are less than or equal to 13. (With 12 observations in the sample, each observation accounts for 1 / 12 x 100 = 8.33% of the sample). 

empirical cdf, step 2  

The next smallest value in the sample is 19. Together, the two observations (13 and 19) account for 16.66% of the sample. Notice that the 4th step, which represents the x-value of 23, is steeper than most. That's because there are 3 observations that have the value 23, so this step increases the cumulative percentage from 25% to 50%. 
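
Those step heights are easy to check outside of Minitab, too. The short Python sketch below uses illustrative values chosen to match the steps described above (they are not the actual Cereal.MTW data):

```python
# Check the ECDF step heights: with 12 observations, each adds 1/12 = 8.33%.
import numpy as np

carbo = np.array([13, 19, 21, 23, 23, 23, 24, 25, 26, 27, 28, 28])  # illustrative
values, counts = np.unique(carbo, return_counts=True)
cum_pct = 100 * np.cumsum(counts) / carbo.size

for v, p in zip(values, cum_pct):
    print(f"x <= {v}: {p:.2f}%")      # 13 -> 8.33%, 19 -> 16.67%, ..., 23 -> 50.00%
```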

Using an Etch-an-ECDF

To turn my Etch-a-Sketch into an "Etch-an-ECDF," I needed to perform some careful knob calibration. I didn't want to eyeball the values, so I had to do some calculations.

The horizontal axis represents the units of x, from 13 on the left to 28 on the right. After some twiddling, I determined that 4.5 rotations of the x-knob are required to move the stylus all the way from the left edge of the screen to the right edge. So 4.5 rotations equals 15 x-units. Which means that one x-unit is equal to 4.5 / 15 = 0.3 rotations.

The vertical axis represents the cumulative percentages. I determined that 3.2 rotations of the y-knob are required to move the stylus all the way from the bottom of the screen to the top. Since there are 12 observations, each observation receives 3.2 / 12 = 0.27 rotations of the y-knob.

To rotate the knobs with any degree of accuracy, I need a gauge that can show me, for example, what 0.27 of a full rotation looks like. I realized that a pie chart would make an excellent gauge. I entered the following data in the worksheet.

pie gauge data

To create the pie chart of these data, I choose Graph > Pie Chart and complete the dialog box as follows.

pie chart dialog box  pie chart

NOTE: To print the pie chart at the desired size, select the graph, choose File > Page Setup, and uncheck Scale to paper size. By default, Minitab stretches or shrinks the graph when it prints so that it takes up a full page.

page setup dialog box

With the pie chart gauges, a hole punch, some scissors, and some tape, I was ready to transform my Etch-a-Sketch into an Etch-an-ECDF.

the tools

I cut out the pie gauges, being careful not to mix up the x-gauge with the y-gauge. Then I punched a hole in the center of each gauge as shown below.

punch it

After cutting a slit from the edge of each gauge to the center, I slipped the gauges under the knobs and taped them into place. I also penned some lines on the knobs to provide reference points. Voilà!

etch-an-ecdf

With my knobs calibrated, I was ready to start plotting. With the stylus in the lower left, I used the gauge to carefully turn the y-knob 0.27 rotations to represent the 8.33% cumulative percentage for the smallest x-value, 13. 

turn the y-knob

Then, to advance from x=13 to x=19, I used the x-gauge to carefully turn the x-knob one unit at a time until I had turned it 6 units.

turn the x-knob

I alternated turning the x- and y-knobs the correct number of units. For example, when I reached x=23, I turned the y-knob 3 units instead of just 1, because there are 3 observations that have the value 23. Before I knew it, I had a work of art depicting the very essence of the Carbo sample!

an ecdf of beauty

That was fun. But I have to admit, there are definite limitations to this approach. One huge drawback is that I'll need to recalibrate my knobs to match the range and number of observations in each new sample. Fortunately, Minitab Statistical Software lets you plot an ECDF quickly and easily, without all the cutting and the punching and the taping.

More advantages of creating your ECDF in Minitab

In addition to ease of use, Minitab offers would-be ECDF plotters other advantages as well. For example, you can add fitted lines to show the CDF for a fitted distribution. (We talked about CDFs [Cumulative Distribution Functions] in my previous post, "The Empirical CDF, Part 1: What's a CDF?").

If you hover your cursor over a fitted distribution line, Minitab shows you a table of population percentiles that are estimated from your data.

table of percentiles

To copy the table of percentiles, select an individual fitted line, then right-click and choose Copy Text. (If your graph includes more than one fitted line, click once to select all fitted lines, then click again to select only the desired fitted line.)

And finally, if you create your ECDF in Minitab Statistical Software, you can also easily and quickly create many other useful graphs such as histograms, probability plots, and marginal plots to learn even more about your data.

 

P-value Roulette: Making Hypothesis Testing a Winner’s Game


Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no ordinary game of roulette. This is p-value roulette!

Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.

Edvard Munch, At the Roulette Table in Monte Carlo

What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!

I’m sorry, but we can’t tell you which wheel we’re spinning.

Doesn’t that sound like a good game?

Not convinced yet? I assure you the odds are in your favor if you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—if we happen to spin the Null wheel.

histogram of p values for null hypothesis

What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:

histogram of p values from alternative hypothesis

And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.

 histogram of p-values from popular alternative hypothesis

Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.

I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.

So, you’d like to play? Great! Which slot would you like to bet on?

Is this on the level?

No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a 1-sample t-test. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from.  For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.
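
If you'd like to reproduce the roulette wheels yourself, here's a minimal Python version of the same kind of simulation (10,000 samples of size 20, each tested against a hypothesized mean of 5):

```python
# Simulated p-values for a 1-sample t-test of H0: mu = 5, n = 20, sd = 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_p_values(true_mean, n=20, reps=10_000):
    samples = rng.normal(loc=true_mean, scale=1.0, size=(reps, n))
    return stats.ttest_1samp(samples, popmean=5.0, axis=1).pvalue

p_null = simulate_p_values(5.0)    # roughly uniform: each 0.05-wide "slot" holds ~5%
p_alt  = simulate_p_values(5.3)    # piles up near zero

print("P(p <= 0.05) under the null:       ", (p_null <= 0.05).mean())
print("P(p <= 0.05) under the alternative:", (p_alt  <= 0.05).mean())
```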

For just about any hypothesis test you do in Minitab Statistical Software, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.

  1. Just as you didn’t know whether you are spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value favors your chance of making a good decision.
     
  2. If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A Type I error occurs if you incorrectly reject a true null hypothesis.
     
  3. If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.
     
  4. It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.
     
  5. In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.
     
  6. The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.
     
You Too Can Be a Winner!

To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test, and that your data fit the assumptions of that test. Minitab’s Assistant menu can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output that shows how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.

 

Which Statistical Error Is Worse: Type 1 or Type 2?


People can make mistakes when they test a hypothesis with statistical analysis. Specifically, they can make either Type I or Type II errors.

As you analyze your own data and test hypotheses, understanding the difference between Type I and Type II errors is extremely important, because there's a risk of making each type of error in every analysis, and the amount of risk is in your control.   

What's the worst that could happen? If you're testing a hypothesis about a safety or quality issue that could affect people's lives, or a project that might save your business millions of dollars, which type of error has more serious or costly consequences? Is there one type of error that's more important to control than another?

Before we attempt to answer that question, let's review what these errors are. 

The Null Hypothesis and Type 1 and 2 Errors
When statisticians refer to Type I and Type II errors, we're talking about the two ways we can make a mistake regarding the null hypothesis (H0). The null hypothesis is the default position, akin to the idea of "innocent until proven guilty." We begin any hypothesis test with the assumption that the null hypothesis is correct. 
 
We commit a Type 1 error if we reject the null hypothesis when it is true. This is a false positive, like a fire alarm that rings when there's no fire.
 
A Type 2 error happens if we fail to reject the null when it is not true. This is a false negative—like an alarm that fails to sound when there is a fire.
 
It's easier to understand in the table below, which you'll see a version of in every statistical textbook:
 
Reality | Null (H0) not rejected | Null (H0) rejected
Null (H0) is true | Correct conclusion | Type 1 error
Null (H0) is false | Type 2 error | Correct conclusion

These errors relate to the statistical concepts of risk, significance, and power.

Reducing the Risk of Statistical Errors

Statisticians call the risk, or probability, of making a Type I error "alpha," aka "significance level." In other words, it's your willingness to risk rejecting the null when it's true. Alpha is commonly set at 0.05, which is a 5 percent chance of rejecting the null when it is true. The lower the alpha, the less your risk of rejecting the null incorrectly. In life-or-death situations, for example, an alpha of 0.01 reduces the chance of a Type I error to just 1 percent.
 
A Type 2 error relates to the concept of "power," and the probability of making this error is referred to as "beta." We can reduce our risk of making a Type II error by making sure our test has enough power—which depends on whether the sample size is sufficiently large to detect a difference when it exists.  
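
To see how sample size drives power (and therefore beta), here's a small sketch using the statsmodels power calculator for a one-sample t-test; the 0.5-standard-deviation effect size is just an illustrative choice:

```python
# Power and beta for a 1-sample t-test at alpha = 0.05, effect size = 0.5 SD.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
for n in (10, 20, 50, 100):
    power = analysis.power(effect_size=0.5, nobs=n, alpha=0.05)
    print(f"n = {n:>3}: power = {power:.2f}, beta = {1 - power:.2f}")

# Or solve for the sample size that achieves 90% power
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.9)
print("n needed for 90% power:", round(n_needed))
```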

The Default Argument for "Which Error Is Worse"

Let's return to the question of which error, Type 1 or Type 2, is worse. The go-to example to help people think about this is a defendant accused of a crime that demands an extremely harsh sentence.

The null hypothesis is that the defendant is innocent. Of course you wouldn't want to let a guilty person off the hook, but most people would say that sentencing an innocent person to such punishment is a worse consequence.

Hence, many textbooks and instructors will say that the Type 1 (false positive) is worse than a Type 2 (false negative) error. The rationale boils down to the idea that if you stick to the status quo or default assumption, at least you're not making things worse.   

And in many cases, that's true. But like so much in statistics, in application it's not really so black or white. The analogy of the defendant is great for teaching the concept, but when we try to make it a rule of thumb for which type of error is worse in practice, it falls apart.

So Which Type of Error Is Worse, Already? 

I'm sorry to disappoint you, but as with so many things in life and statistics, the honest answer to this question has to be, "It depends."

In one instance, the Type I error may have consequences that are less acceptable than those from a Type II error. In another, the Type II error could be less costly than a Type I error. And sometimes, as Dan Smith pointed out in Significance a few years back with respect to Six Sigma and quality improvement, "neither" is the only answer to which error is worse: 

Most Six Sigma students are going to use the skills they learn in the context of business. In business, whether we cost a company $3 million by suggesting an alternative process when there is nothing wrong with the current process or we fail to realize $3 million in gains when we should switch to a new process but fail to do so, the end result is the same. The company failed to capture $3 million in additional revenue. 

Look at the Potential Consequences

Since there's not a clear rule of thumb about whether Type 1 or Type 2 errors are worse, our best option when using data to test a hypothesis is to look very carefully at the fallout that might follow both kinds of errors. Several experts suggest using a table like the one below to detail the consequences for a Type 1 and a Type 2 error in your particular analysis. 

Null hypothesis: Medicine A does not relieve Condition B.

 | Type 1 Error: H0 true, but rejected | Type 2 Error: H0 false, but not rejected
Outcome | Medicine A does not relieve Condition B, but is not eliminated as a treatment option. | Medicine A relieves Condition B, but is eliminated as a treatment option.
Consequences | Patients with Condition B who receive Medicine A get no relief. They may experience worsening condition and/or side effects, up to and including death. Litigation possible. | A viable treatment remains unavailable to patients with Condition B. Development costs are lost. Profit potential is eliminated.

Whatever your analysis involves, understanding the difference between Type 1 and Type 2 errors, and considering and mitigating their respective risks as appropriate, is always wise. For each type of error, make sure you've answered this question: "What's the worst that could happen?"  

To explore this topic further, check out this article on using power and sample size calculations to balance your risk of a type 2 error and testing costs, or this blog post about considering the appropriate alpha for your particular test. 


 

Statistical Tools for Process Validation, Stage 3: Continued Process Verification


Process Validation Stages

In its industry guidance to companies that manufacture drugs and biological products for people and animals, the Food and Drug Administration (FDA) recommends three stages for process validation: Process Design, Process Qualification, and Continued Process Verification. In this post, we will focus on that third stage.

Stage 3: Continued Process Verification

Per the FDA guidelines, the goal of this third and final stage of Process Validation is to provide:

Continual assurance that the process remains in a state of control – the validated state – during commercial manufacture...the collection and evaluation of information and data about the performance of the process will allow detection of undesired process variability. Evaluating the performance of the process identifies problems and determines whether action must be taken to correct, anticipate, and prevent problems so that the process remains in control.

Example: Monitor a Process with Control Charts

Suppose you are responsible for monitoring an oral tablet manufacturing process. You need to demonstrate that hardness is stable over time, and detect if the process variation has shifted and therefore requires attention.

You also want to make sure production line operators do not overreact to minor changes in the data, which are inherent in routine variability. Avoiding overreaction prevents unnecessary process adjustments that may actually result in an unintentional increase in variability.

You sample five tablets per hour, measure their hardness, and then enter the data into Minitab Statistical Software to create an Xbar-R control chart.

ControlChart

This Xbar-R chart does not reveal any points flagged in red, which indicates that the process is in statistical control. You can conclude that you are maintaining the validated state of the process, and that no unwanted, unusual shifts in either the process mean (upper Xbar chart) or the process variation (lower R chart) have been detected.
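
For reference, the control limits on an Xbar-R chart come from simple formulas based on the average subgroup mean and average range. The Python sketch below uses the textbook constants for subgroups of size 5 and a few made-up hardness readings, purely to illustrate the calculation:

```python
# Textbook Xbar-R limit calculations for subgroups of size 5
# (A2 = 0.577, D3 = 0, D4 = 2.114). The hardness data are made up.
import numpy as np

subgroups = np.array([                       # one row per hour, five tablets each
    [11.9, 12.1, 12.0, 12.2, 11.8],
    [12.0, 12.3, 11.9, 12.1, 12.0],
    [12.1, 11.8, 12.0, 12.2, 12.1],
])

xbars  = subgroups.mean(axis=1)
ranges = np.ptp(subgroups, axis=1)           # subgroup range = max - min
xbarbar, rbar = xbars.mean(), ranges.mean()

A2, D3, D4 = 0.577, 0.0, 2.114               # constants for subgroup size 5
print("Xbar chart (LCL, CL, UCL):", xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)
print("R chart    (LCL, CL, UCL):", D3 * rbar, rbar, D4 * rbar)
```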

If the control chart had revealed an out-of-control state—a process exposed to unanticipated sources of variation—then next steps would include characterizing the issue and conducting a root cause investigation. Was there a change in material characteristics? Is there an equipment maintenance or calibration issue? Or is there some other source of variability that provoked a process shift?

Ensuring Compliance

The risk of failing to detect undesirable process variation can be mitigated with routine monitoring and control charting. In addition to control charts and the statistical tools commonly used for the Process Design and Process Qualification stages, there are other useful statistical techniques to support you in your process validation efforts.

For example, Minitab also includes acceptance sampling to help you calculate the number of samples to take and use a randomly drawn sample of product to determine whether to accept or reject an entire lot.

If you don’t yet have Minitab, try it free for 30 days and see for yourself all that it offers for process validation and how easy it is to use.

Predicting the 2017 NCAA Tournament


2016 Final Four

Predictions can be a tricky thing. Consider trying to predict the number rolled by 2 six-sided dice. We know that 7 is the most likely outcome. We know the exact probability each number has of being rolled. If we rolled the dice 100 times, we could calculate the expected value for the number of times each value would be rolled. However, even with all that information, we can't definitively predict the value of an individual roll. The process includes random variation that we can't predict. At best, all we can do is make an educated guess. 

The same logic applies to trying to predict a basketball tournament. We can know who the best teams are and we can model the probability they have of advancing. But just like rolling dice, the process involves random variation that makes predicting an individual game hard. We can't predict when a team is going to catch fire and hit almost 60% of their 3-point shots. And we can't predict when a team that normally forces turnovers on 25% of its opponents' possessions is going to do so only 10% of the time. At best, all we can do is make an educated guess.

But hey, educated guessing is still better than completely guessing! Plus it's a lot of fun. So let's get started! 

I’ll be using the Sagarin Predictor Ratings to determine the probability each team has of advancing in the NCAA tournament using a binary logistic model created with Minitab Statistical Software. You can find the details of how the probabilities are being calculated here.
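
To give a feel for what a binary logistic model does here, the sketch below turns a rating gap into a win probability with a logistic function. The coefficient is a hypothetical placeholder, not the value fitted from the Sagarin data in the linked post:

```python
# Illustrative only: a logistic curve mapping a rating difference to a win probability.
import math

def win_probability(rating_a, rating_b, coef=0.12):
    """P(team A beats team B) as a logistic function of the rating gap (coef is hypothetical)."""
    return 1.0 / (1.0 + math.exp(-coef * (rating_a - rating_b)))

# Example: a team rated 8 points higher than its opponent
print(round(win_probability(92.0, 84.0), 2))
```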

Before we start, I’d also like to mention one other set of basketball ratings, called the Pomeroy Ratings. Both the Sagarin ratings and the Pomeroy ratings have proven to be pretty accurate in predicting college basketball games. But Ken Pomeroy always breaks down the tournament using his system. So instead of duplicating his numbers, I like to use the Sagarin predictor ratings. But I’ll be sure to mention places where the two systems disagree, and you can select the one you want to go with!  

Alright, enough with the small talk. Let’s get to the statistics!

East

The following table has the probabilities each team in the East Region has of advancing in each round (up to the Final Four).

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Villanova | 99% | 73% | 45% | 30%
(2) Duke | 97% | 72% | 43% | 20%
(5) Virginia | 89% | 51% | 24% | 14%
(4) Florida | 90% | 46% | 20% | 12%
(3) Baylor | 91% | 55% | 28% | 10%
(6) SMU | 79% | 39% | 18% | 6%
(8) Wisconsin | 76% | 24% | 10% | 5%
(10) Marquette | 50% | 14% | 5% | 1%
(7) South Carolina | 50% | 14% | 5% | 1%
(9) Virginia Tech | 24% | 3.3% | 1% | 0.2%
(11) USC/Providence | 21% | 5% | 1% | 0.1%
(12) UNC Wilmington | 11% | 5% | 0.5% | 0.1%
(13) East Tenn St | 10% | 1% | 0.1% | < 0.1%
(14) New Mexico St | 9% | 1% | 0.2% | < 0.1%
(15) Troy | 3% | 0.3% | < 0.1% | < 0.1%
(16) Mt. St. Marys/New Orleans | 1% | 0.1% | < 0.1% | < 0.1%

Congratulations on the overall number 1 seed, Villanova—now here comes the hardest region in the tournament! The East region features the highest-rated 8 seed (Wisconsin at #17), 5 seed (Florida at #10), and 4 seed (Virginia at #7). Villanova is a great team (ranked #2 in Sagarin and Pomeroy), but their path is going to be very hard right out of the gate. If Wisconsin defeats Virginia Tech, they'd have a 30% chance of knocking off the Wildcats. And both Virginia and Florida would have a 40% chance of beating Villanova. And we haven't even mentioned the 2 seed yet!

That 2 seed would, of course, be Duke. They are ranked #8 in the Sagarin Predictor rankings. But despite being 6 spots lower than Villanova, you'll see that they have a similar probability of reaching the Sweet Sixteen and Elite Eight. That's because their path is much easier. Neither Marquette nor South Carolina is as good as Wisconsin, and Baylor is a pretty weak 3 seed. In fact, the Sagarin Ratings would have SMU as only a 1-point underdog to Baylor, and the Pomeroy Ratings would actually favor SMU! So don't pick Baylor to go too far in your bracket.

But getting back to Duke, their ranking might actually be a little low. Duke lost a handful of games this year when they were missing Grayson Allen and Coach Krzyzewski. Had Coach K been there the entire year and had Grayson Allen been healthy and, er, "well behaved," this team would probably have performed better in some of those losses. And since they're both back for the tournament, it stands to reason this team might actually be better than #8. But how much better remains to be seen.

If you're looking for early upsets, this probably isn't the region for you. Both UNC Wilmington and East Tennessee State are good teams, but they drew brutal opening-round opponents. The Pomeroy Ratings give those teams a better chance than shown here (more on that later), but even then the best probability is UNC Wilmington's 22% chance of beating Virginia. It's possible (especially if Virginia falls into another of the offensive funks they've been prone to this year), but the upsets are more likely to come in the later rounds.

Speaking of upsets in later rounds, SMU is a team capable of making a run. I've already  mentioned that they're capable of beating Baylor, but the Pomeroy Ratings actually would favor them against Duke too! (Although the same caveat I gave on Duke earlier would also apply to the Pomeroy Ratings.) Having SMU in your Sweet Sixteen or even Elite Eight isn't a poor choice if you want to pick some chaos.

So overall, your best bet is taking Villanova or Duke in this region. Although Virginia and Florida both make for interesting dark horses. The problem is they will most likely have to play each other in the 2nd round, and that game is basically a tossup. So good luck picking which one to go with!

West

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Gonzaga | 99% | 87% | 56% | 43%
(4) West Virginia | 95% | 70% | 33% | 24%
(3) Florida St | 93% | 67% | 38% | 12%
(2) Arizona | 96% | 54% | 29% | 8%
(7) Saint Mary's | 72% | 37% | 19% | 5%
(5) Notre Dame | 81% | 27% | 8% | 4%
(11) Xavier | 54% | 18% | 7% | 1%
(6) Maryland | 46% | 13% | 4% | 0.6%
(8) Northwestern | 50% | 7% | 2% | 0.5%
(9) Vanderbilt | 50% | 7% | 2% | 0.5%
(10) VCU | 28% | 9% | 3% | 0.4%
(12) Princeton | 19% | 2% | 0.2% | 0.1%
(13) Bucknell | 5% | 1% | 0.1% | < 0.1%
(14) Florida Gulf Coast | 7% | 1% | 0.1% | < 0.1%
(15) North Dakota | 4% | 0.3% | < 0.1% | < 0.1%
(16) South Dakota St | 1% | 0.2% | < 0.1% | < 0.1%

Gonzaga has been to the NCAA tournament 19 times. Despite often being a higher seed, they have a winning record in tournament games of 24-19 and have made 7 Sweet Sixteens and 2 Elite Eights. However, despite all that success, they have never been to a Final Four. Well, that could change this year. This is by far the best team Gonzaga has ever had. They're ranked #1 in both Sagarin and Pomeroy. They drew the weakest 2 seed in the tournament. And, their Pomeroy Adjusted Efficiency Margin (the value that the ratings are based on) is the 4th best in the history of the Pomeroy Ratings. They trail only 2002 Duke, 2008 Kansas, and 2015 Kentucky. Of course, none of those teams won the tournament because single elimination tournaments can be like that. But make no mistake about it—this Gonzaga team is great. 

Their stiffest competition actually comes from the 4 seed. West Virginia is ranked 4th in Sagarin and 5th in Pomeroy. The Mountaineers lost a lot of close games this year. And because close games are more a result of luck than ability, West Virginia is a much better team than their record indicates. Gonzaga and West Virginia will most likely meet in the Sweet Sixteen, and the winner of that game will be favored in their next game to reach the Final Four.

In the bottom half of the bracket, Arizona and Florida State are very weak 2 and 3 seeds, respectively. Florida State is 18th in Sagarin and 19th in Pomeroy. Arizona is 21st in Sagarin and 20th in Pomeroy. That leaves the door wide open for Saint Mary's to make a run. Sagarin would have the Gaels as slight underdogs to Florida State and Arizona, and Pomeroy would actually favor them in both games! If you want to root for the little guy, having Saint Mary's vs. Gonzaga in the Elite Eight wouldn't be a terrible pick. I mean, that's if you actually still count Gonzaga as a little guy anymore.

This region also has great potential for upsets in the first round. Sagarin favors 11-seeded Xavier over Maryland, and it gives Princeton a 19% chance of beating Notre Dame. That's not great, but on the bright side, Pomeroy is much more optimistic, giving Princeton a 31% chance of winning. So I'll take this opportunity to illustrate the main difference between the Pomeroy Ratings and the Sagarin Ratings: the mid-majors. I divided all 68 teams in the tournament into teams from mid-major conferences and power conferences. Then I looked at the difference in their rankings in the two systems. Here are the results.

Boxplot

The average difference for teams in the power conferences is 0, and there isn't much variation. But mid-major teams are on average ranked 8.5 spots lower in Sagarin than in Pomeroy. So when a mid-major plays a power-conference team, the Pomeroy ratings are going to give the mid-major a better chance of winning. In our Princeton/Notre Dame example, Pomeroy says Notre Dame should be favored by 5 points, whereas Sagarin has it at 8.5. To see who might be closer, I decided to check the spread in Vegas. And wouldn't you know it. They put it right in the middle at 7 points. So what should you do? Personally, since West Virginia will be a heavy favorite in the second round anyway, I say pick Princeton and root for the upset. Go Nerds!
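
For anyone who wants to repeat that comparison, the pandas sketch below shows the general approach: compute each team's Sagarin-minus-Pomeroy rank difference and summarize it by conference type. The three rows are placeholders, not the actual 68-team field:

```python
# Summarize the Sagarin-vs-Pomeroy ranking gap by conference type (placeholder data).
import pandas as pd

teams = pd.DataFrame({
    "team":       ["Mid Major A", "Power Team B", "Mid Major C"],
    "conf_type":  ["mid-major", "power", "mid-major"],
    "sagarin_rk": [55, 25, 40],
    "pomeroy_rk": [44, 24, 33],
})

teams["rank_diff"] = teams["sagarin_rk"] - teams["pomeroy_rk"]
print(teams.groupby("conf_type")["rank_diff"].describe())
# teams.boxplot(column="rank_diff", by="conf_type") would reproduce the boxplot above
```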

In this region, you probably want to pick either Gonzaga or West Virginia. Sure, Florida State and Arizona have shots too, but their probabilities are pretty low considering they're a 3 and 2 seed. Chances are other people in your pool are going to pick Florida State and Arizona at a rate higher than 12% and 8%, respectively. So going with Gonzaga or West Virginia (and maybe taking a chance with Saint Mary's in the Elite Eight) should give you an edge.

Midwest

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Kansas | 97% | 76% | 46% | 26%
(2) Louisville | 98% | 65% | 41% | 23%
(3) Oregon | 95% | 65% | 32% | 15%
(4) Purdue | 89% | 51% | 26% | 14%
(5) Iowa St | 81% | 42% | 20% | 10%
(6) Creighton | 68% | 27% | 10% | 4%
(7) Michigan | 50% | 18% | 8% | 3%
(10) Oklahoma St | 50% | 18% | 8% | 3%
(8) Miami FL | 58% | 15% | 5% | 2%
(9) Michigan St | 58% | 15% | 5% | 2%
(11) Rhode Island | 32% | 8% | 2% | 0.4%
(12) Nevada | 19% | 5% | 1% | 0.2%
(13) Vermont | 11% | 2% | 0.3% | < 0.1%
(14) Iona | 5% | 1% | < 0.1% | < 0.1%
(15) Jacksonville St | 2% | 0.1% | < 0.1% | < 0.1%
(16) NC Central/UC Davis | 3% | 0.2% | < 0.1% | < 0.1%

Kansas is the weakest 1 seed in the tournament, ranked 9th in Sagarin and 10th in Pomeroy. And thus you'll see they have the lowest probability of reaching the Final Four of all the 1 seeds. Louisville is actually ranked ahead of Kansas, but has a slightly lower probability of reaching the Final Four due to a harder path. Both Michigan and Oklahoma State are capable of knocking off Louisville in the 2nd round. The problem with picking that upset is you have to decide whether to take Michigan or Oklahoma State, and their opening game is a coin flip!

The statistics have Oregon as the 3rd most likely team to win this region, but that comes with an asterisk. In their next to last game of the season, Oregon senior Chris Boucher tore his ACL. He was the 3rd leading scorer, 2nd leading rebounder, and leading shot blocker. The statistics don't know Oregon has to play the rest of the season without him, so their chances are lower than shown here. Be wary of picking Oregon to go too far in your bracket.

That leaves Purdue as a viable option if you're looking for a dark horse. They would only be a 1-point underdog to Kansas according to Sagarin, so that is definitely a winnable game. And Iowa State already won on the road against Kansas earlier this season, so expect the Jayhawks to have their hands full in the Sweet Sixteen. Of course, Iowa State plays a very good Nevada team in the opening round. So if you're planning on putting Purdue in the Sweet Sixteen, picking Nevada as an upset isn't a bad option.

Overall, this region is pretty open. Louisville would be favored in any potential matchup, but they will have a tough game right off the bat with either Michigan or Oklahoma State. So if you wanted to go crazy and pick something like Michigan, Oklahoma State, or even Creighton for the Final Four, this would be the region to do it. But most likely, it'll be Kansas or Louisville.

South

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) North Carolina | 99% | 86% | 69% | 44%
(2) Kentucky | 97% | 60% | 39% | 21%
(3) UCLA | 95% | 58% | 25% | 10%
(10) Wichita St | 77% | 35% | 21% | 10%
(4) Butler | 91% | 60% | 18% | 7%
(6) Cincinnati | 68% | 32% | 12% | 5%
(5) Minnesota | 65% | 28% | 6% | 2%
(8) Arkansas | 56% | 9% | 4% | 0.9%
(9) Seton Hall | 44% | 6% | 2% | 0.5%
(11) Kansas St/Wake Forest | 32% | 10% | 2% | 0.6%
(12) Middle Tenn St | 35% | 11% | 2% | 0.3%
(7) Dayton | 23% | 5% | 1% | 0.3%
(13) Winthrop | 9% | 2% | 0.1% | < 0.1%
(14) Kent St | 5% | 0.5% | < 0.1% | < 0.1%
(15) Northern Kentucky | 3% | 0.1% | < 0.1% | < 0.1%
(16) Texas Southern | 1% | 0.1% | < 0.1% | < 0.1%

This is the region that Villanova should have been given with the overall #1 seed. North Carolina has a cakewalk to the Elite Eight. Gonzaga is the only other team with a greater than 50% chance of reaching the Elite Eight (56%) and North Carolina's probability blows that away at 69%! Of course, once they get there they'll face a tough game. But who will it be?

Kentucky and UCLA are the two most likely teams to face North Carolina in the Elite Eight. But who is the next team after that? 10-seeded Wichita State! Yep, the Shockers are ranked #11 in Sagarin and #8 in Pomeroy. In fact, both systems would favor Wichita State over UCLA if those two teams played. And they'd be only a slight underdog to Kentucky. Three years ago, an undefeated Wichita State team got the #1 seed only to lose to a very under-seeded Kentucky team in the 2nd round. This year they get a chance for payback, as they could now be the under-seeded team that pulls the 2nd-round upset.

Every year there is a 12 seed that beats a 5 seed, and this region gives us our best chance. Sagarin gives Middle Tennessee State a 35% chance of beating Minnesota, but that's the lowest estimate you'll find. Pomeroy gives Middle Tennessee State a 45% chance of winning. And in Vegas, the line is a pick 'em, meaning they think this game is a coin flip! That's an upset you should absolutely consider, and it's a no-brainer if your pool gives you bonus points for upsets. In fact, if you get bonus points for upsets, go ahead and put Middle Tennessee State in the Sweet Sixteen. With North Carolina being such a heavy favorite to win their Sweet Sixteen game anyway, go ahead and try to maximize those bonus points!

North Carolina and Kentucky are both top 5 teams in Sagarin and Pomeroy, so choosing either to go to the Final Four is a good selection. However, North Carolina has the much easier path. And of course, look out for Wichita State. They definitely have the potential to "shock" the world and win this region.  

Final Four

Team | Final Four | Semifinal | Champion
(1) Gonzaga | 43% | 29% | 18%
(1) North Carolina | 44% | 28% | 14%
(1) Villanova | 30% | 16% | 10%
(4) West Virginia | 24% | 16% | 9%
(2) Kentucky | 21% | 12% | 6%
(1) Kansas | 26% | 12% | 5%
(2) Louisville | 23% | 11% | 5%
(2) Duke | 20% | 9% | 5%
(5) Virginia | 14% | 6% | 3%
(4) Florida | 12% | 5% | 3%

The top 5 ranked teams in the Sagarin ratings are Gonzaga, Villanova, North Carolina, West Virginia, and Kentucky (in that order). So it's no surprise that those teams have the top 5 probabilities of winning the entire tournament. The only difference in the order is that North Carolina gets a bump over Villanova because they have a much easier path. But it's really a wide-open tournament, as the favorite has only an 18% chance of winning the title. That's a far cry from the 41% chance Kentucky had as the top team two years ago. So when you pick your champion, try to think about who the other people entering your pool will choose. If you don't think anybody in your pool actually believes in Gonzaga, then they are the clear choice for you. If you're entering a pool with hundreds of entries, West Virginia could be a good selection, since there will be a ton of entries picking the higher seeds. Villanova could also be a good pick, if you think most people will avoid them since they won it last year. The choice is yours. So good luck, and remember, you're not just taking guesses.

You're taking educated guesses!


What to Do When Your Data's a Mess, part 1


Isn't it great when you get a set of data and it's perfectly organized and ready for you to analyze? I love it when the people who collect the data take special care to make sure to format it consistently, arrange it correctly, and eliminate the junk, clutter, and useless information I don't need.  

Messy Data

You've never received a data set in such perfect condition, you say?

Yeah, me neither. But I can dream, right? 

The truth is, when other people give me data, it's typically not ready to analyze. It's frequently messy, disorganized, and inconsistent. I get big headaches if I try to analyze it without doing a little clean-up work first. 

I've talked with many people who've shared similar experiences, so I'm writing a series of posts on how to get your data in usable condition. In this first post, I'll talk about some basic methods you can use to make your data easier to work with. 

Preparing Data Is a Little Like Preparing Food

I'm not complaining about the people who give me data. In most cases, they aren't statisticians and they have many higher priorities than giving me data in exactly the form I want.  

The end result is that getting data is a little bit like getting food: it's not always going to be ready to eat when you pick it up. You don't eat raw chicken, and usually you can't analyze raw data, either.  In both cases, you need to prepare it first or the results aren't going to be pretty.

Here are a couple of very basic things to look for when you get a messy data set, and how to handle them.  

Kitchen-Sink Data and Information Overload

Frequently I get a data set that includes a lot of information that I don't need for my analysis. I also get data sets that combine or group information in ways that make analyzing it more difficult. 

For example, let's say I needed to analyze data about different types of events that take place at a local theater. Here's my raw data sheet:  

April data sheet

With each type of event jammed into a single worksheet, it's a challenge to analyze just one event category. What would work better?  A separate worksheet for each type of occasion. In Minitab Statistical Software, I can go to Data > Split Worksheet... and choose the Event column: 

split worksheet

And Minitab will create new worksheets that include only the data for each type of event. 

separate worksheets by event type
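
If you ever need to do the same kind of split outside Minitab, here is a rough pandas sketch of the same idea—one table per event type. It is only an illustration; the column names and rows below are made up.

```python
# Rough pandas sketch of Data > Split Worksheet (not the Minitab tool itself):
# one table per value of the Event column. Column names and rows are made up.
import pandas as pd

events = pd.DataFrame({
    "Event":       ["Concert", "Play", "Concert", "Recital", "Play"],
    "Concessions": [1250, 840, 1380, 410, 910],
    "Attendance":  [420, 310, 450, 150, 330],
})

# One DataFrame per unique Event value, like Minitab's separate worksheets
split_sheets = {name: group.reset_index(drop=True)
                for name, group in events.groupby("Event")}

print(list(split_sheets.keys()))
print(split_sheets["Concert"])
```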

Minitab also lets you merge worksheets to combine items provided in separate data files. 

Let's say the data set you've been given contains a lot of columns that you don't need: irrelevant factors, redundant information, and the like. Those items just clutter up your data set, and getting rid of them will make it easier to identify and access the columns of data you actually need. You can delete rows and columns you don't need, or use the Data > Erase Variables tool to make your worksheet more manageable. 

I Can't See You Right Now...Maybe Later

What if you don't want to actually delete any data, but you only want to see the columns you intend to use? For instance, in the data below, I don't need the Date, Manager, or Duration columns now, but I may have use for them in the future: 

unwanted columns

I can select and right-click those columns, then use Column > Hide Selected Columns to make them disappear. 

hide selected columns

Voila! They're gone from my sight. Note how the displayed columns jump from C1 to C5, indicating that some columns are hidden:  

hidden columns

It's just as easy to bring those columns back in the limelight. When I want them to reappear, I select the C1 and C5 columns, right-click, and choose "Unhide Selected Columns." 

Data may arrive in a disorganized and messy state, but you don't need to keep it that way. Getting rid of extraneous information and choosing the elements that are visible can make your work much easier. But that's just the tip of the iceberg. In my next post, I'll cover some more ways to make unruly data behave.  

Who's More (or Less) Irish?


St. Patrick's Day

B'gosh n' begorrah, it's St. Patrick's Day today!

The day that we Americans lay claim to our Irish heritage by doing all sorts of things that Irish people never do. Like dye your hair green. Or tell everyone what percentage Irish you are.

Despite my given name, I'm only about 15% Irish. So my Irish portion weighs about 25 pounds. It could be the portion that hangs over my belt due to excess potatoes and beer.

Today, many American cities compete for the honor of being "the most Irish." Who deserves to take top honors? Data from the U.S. Census Bureau can help us decide.

The Minitab bar chart below shows the percentage of people with Irish ancestry in major U.S. cities.

The reference line at 11.1% shows the national average. Any city above that has the right to wear green on its bar. (My place of birth, Minneapolis, comes in just below the national average. Close, but no green cigar!)

It's surprising that Bostonians are out-Irished, percentage-wise, by Pittsburghers. But die-hard Gaels from Beantown can take comfort in the margin of error for these estimates.

For Pittsburgh the actual U.S. Census Bureau estimate is 16.0% ± 0.7%. For Boston, the estimate is 15.5% ± 0.5%. So, statistically speaking, neither city can claim with confidence that it's the most Irish of large cities in the U.S.

New Yorkers and Chicagoans could also take issue with the above chart. After all, you could argue that it's sheer brute numbers, rather than percentages, that give cities their Irish heritage heft. The Minitab bar chart below shows they'd have a point.

The reference line represents the population of Limerick, the 3rd largest city in the Republic of Ireland.

The number of those with Irish ancestry in the Big Apple comprises a large city by itself (≈ 400,000). Together, New York and Chicago have more citizens with Irish ancestry (≈ 600,000) than the city of Dublin (≈ 525,000).

Based on this chart, even lads and lassies from Phoenix can proudly dye their hair kelly green. Although they probably won't have much luck looking for four-leaf clovers in the desert.

Notice that only Philadelphia and Boston get to wear green in both bar charts!

Note: If you want to find out whether your city can wear green on either bar chart, download a Minitab project with the data here. Then go to U.S. Census Bureau and use the Advanced Search for race and ancestry for your city. In the Search results, use the 2012 ACS 3-year estimates for Selected Social Characteristics in the U.S. In Minitab, add the name of your city and the % and count estimates for Irish ancestry to the worksheet. Then right-click the bar chart and select Update graph now. If your city deserves to wear green, double-click the bar and change the color.

All the world's a stage and most of us are desperately unrehearsed.
  ~ Sean O'Casey

If we had more time, we could debate these estimates further over a green beer.

What happens if you include all the surrounding metropolitan areas? Or people with Scotch-Irish ancestry? Is self-reported ancestry in a survey even accurate? (Not if you ask people about their Irish ancestry today!)

But ultimately it's not really about the numbers. It’s about what’s in the heart.

That's what makes all Americans part Irish. And part Chinese …part Lebanese…part Nigerian …part Navajo...part Mexican...part Swedish...part Filipino…

And gives us the true spirit with which to say... E pluribus unum...and Happy St. Patrick's Day!

Six Sigma Concepts and Metrics (Part 1)


Diamond Clip Art

Did you know the most popular diamond cut is probably the Round Brilliant Cut? The earliest version of what would become the modern Round Brilliant Diamond Cut was introduced by an Italian named Vincent Peruzzi, sometime in the late 17th century. In the early 1900s, the angles for an "ideal" diamond cut were designed by Marcel Tolkowsky. Minor changes have been made since then, but the angles for "ideal" brilliant cut diamonds have stood the test of time and are still similar to Tolkowsky's formula. 

Six Sigma, as you know, evolved from tried and true quality methods developed by the pioneering quality greats of the early 1900s. These methods have been continually improved to become what they are today. Just as a diamond is valuable, has many sides, and withstands the test of time, so does Six Sigma. 

I appreciate this quote from Lord Kelvin because it speaks to the very core of Six Sigma Concepts and Metrics:

I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind…  – Lord Kelvin1

If you can’t measure your process or product and express it in numbers, you don’t know much about it. Sometimes it takes imagination to devise a way to measure your process or product. In most cases, it can be done!

Like a diamond, Six Sigma has many facets. It can be expressed in a variety of ways: as a philosophy, a set of tools, a methodology, and a set of metrics. 

Six Sigma as a Philosophy

Six Sigma practitioners seek to understand work as a process that can be defined, measured, analyzed, improved and controlled. These processes use inputs (X) to produce outputs (Y). If you can understand how to control the inputs, you will learn how to control the outputs. We use the mathematical equation y = f(x) to express the relationship between inputs and outputs. In the Fitted Line Plot below, as the X variable (Temperature) increases, the Y output (Strength) decreases:

Fitted Line Plot

This Fitted Line Plot is a simple example, and in reality the processes we study are usually quite a bit more complicated. We might have 5, 20 or more X's identified.  But as we work through the DMAIC process, we are narrowing down the number of X's to the critical few that have leverage or influence over the process. Then we seek to understand and control the relationship.
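
For readers who like to see y = f(x) in code, here is a minimal least-squares sketch in Python (not Minitab's Fitted Line Plot). The Temperature and Strength values are made up to mimic the negative relationship in the plot above.

```python
# Minimal sketch of y = f(x) for one input: ordinary least squares fit of
# Strength on Temperature. The data values are made up for illustration.
import numpy as np

temperature = np.array([50, 60, 70, 80, 90, 100], dtype=float)   # X (input)
strength    = np.array([42, 40, 37, 33, 30, 26], dtype=float)    # Y (output)

slope, intercept = np.polyfit(temperature, strength, deg=1)
print(f"Strength = {intercept:.2f} + ({slope:.3f}) * Temperature")
# A negative slope matches the plot: as Temperature increases, Strength decreases.
```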

Six Sigma as a Methodology

Six Sigma practitioners typically use the DMAIC (Define, Measure, Analyze, Improve, and Control) methodology to identify and analyze a problem, resulting in the implementation of effective and efficient solutions.

I like to compare the Six Sigma DMAIC process to the jigsaw puzzle process. Most people use the same methodology to work a jigsaw puzzle. You open the box, dump out the pieces, turn them face up, find the corners, work the edges, then start working the middle usually by looking at the picture on the front of the box for clues. While not the only way to put together a jigsaw puzzle, it is an efficient and effective process.

DMAIC Model

When used appropriately, the Six Sigma DMAIC methodology should improve processes and/or products by making them more effective, more efficient, or both. 

Six Sigma as a Set of Tools

Six Sigma practitioners use qualitative and quantitative tools to solve tough problems and implement process improvements. Qualitative tools include Voice of the Customer, Brainstorming, and Multi-voting, while the quantitative tools include Control Charts, Value Stream Mapping, and ANOVA.

Six Sigma Tools

If you are like me, your tool bag runneth over. I have so many tools, I can’t possibly use them all on one project, nor would I want to. Instead, we need to pick and choose the tools that will provide information about the problem we are trying to solve. It's like working a jigsaw puzzle: sometimes you have to put a piece back because you can’t find where it goes. Then you pick up another piece and keep trying.

Keep in mind, not every tool you select will reveal a clue to solve the Six Sigma puzzle. It is just as important to find out which path you don’t need to go down as it is to find out which path to follow further. 

Six Sigma as a Metric

Although you want to keep a well-rounded perspective for the metrics on your dashboard, Six Sigma metrics tend to measure process capability and performance. These metrics are all related to each other and focus on measuring the Big Y. 

Six Sigma Metrics

Which metric you choose will depend on the type of data you have. Some of the metrics require discrete data, and others require continuous data. It's similar to how a golfer will select a club from their golf bag based on the parameters of the shot, which usually include the distance needed and the terrain. Most golfers will have some variation of the standard set of clubs in their bag, including drivers, irons, wedges, and putters.

In Part 2 of this series, I will discuss Six Sigma Concepts about variation. Stay tuned to learn more about the differences between Short/Long Term Variation and Between/Within Variation.

 

1William Thomson, most famously known as Lord Kelvin, was an Irish and British mathematical physicist and engineer who was born in Belfast in 1824. Lord Kelvin is most widely recognized for determining the correct value of absolute zero as approximately −273.15 Celsius. 
– Wikipedia

What to Do When Your Data's a Mess, part 2


In my last post, I wrote about making a cluttered data set easier to work with by removing unneeded columns entirely, and by displaying just those columns you want to work with now. But too much unneeded data isn't always the problem.

What can you do when someone gives you data that isn't organized the way you need it to be?  

That happens for a variety of reasons, but most often it's because the simplest way for people to collect data results in a format that can be difficult to analyze once it's in a worksheet. Most statistical software will accept a wide range of data layouts, but just because a layout is readable doesn't mean it will be easy to analyze.

You may not be in control of how your data were collected, but you can use tools like sorting, stacking, and ordering to put your data into a format that makes sense and is easy for you to use. 

Decide How You Want to Organize Your Data

Depending on how it's arranged, the same data can be easier to work with, simpler to understand, and can even yield deeper and more sophisticated insights. I can't tell you the best way to organize your specific data set, because that will depend on the types of analysis you want to perform and the nature of the data you're working with. However, I can show you some easy ways to rearrange your data into the form that you select. 

Unstack Data to Make Multiple Columns

The data below show concession sales for different types of events held at a local theater. 

stacked data

If we wanted to perform an analysis that requires each type of event to be in its own column, we can choose Data > Unstack Columns... and complete the dialog box as shown:

unstack columns dialog 

Minitab creates a new worksheet that contains a separate column of Concessions sales data for each type of event:

Unstacked Data
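
As a rough equivalent outside Minitab, the same reshaping can be sketched in pandas. The column names and values below are assumptions for illustration.

```python
# Sketch of Data > Unstack Columns in pandas: one Concessions column per event type.
# Column names ("Event", "Concessions") and values are made up for illustration.
import pandas as pd

sales = pd.DataFrame({
    "Event":       ["Concert", "Play", "Concert", "Recital", "Play"],
    "Concessions": [1250, 840, 1380, 410, 910],
})

# Each Event value becomes its own column; rows are just positions within each group.
unstacked = (sales
             .assign(obs=sales.groupby("Event").cumcount())
             .pivot(index="obs", columns="Event", values="Concessions"))
print(unstacked)
```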

Stack Data to Form a Single Column (with Grouping Variable)

A similar tool will help you put data from separate columns into a single column for the type of analysis required. The data below show sales figures for four employees: 

Select Data > Stack > Columns... and select the columns you wish to combine. Checking the "Use variable names in subscript column" option will create a second column that identifies the person who made each sale. 

Stack columns dialog

When you press OK, the sales data are stacked into a single column of measurements and ready for analysis, with Employee available as a grouping variable: 

stacked columns
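
Here is the reverse operation sketched in pandas (again, not the Minitab dialog itself); the employee names and sales figures are made up.

```python
# Sketch of Data > Stack Columns in pandas: several employee columns melted into
# one Sales column plus an Employee grouping column. Names and values are made up.
import pandas as pd

wide = pd.DataFrame({
    "Aiden":  [210, 185, 240],
    "Brooke": [195, 220, 205],
    "Carlos": [230, 250, 215],
    "Dana":   [180, 175, 190],
})

stacked = wide.melt(var_name="Employee", value_name="Sales")
print(stacked)
```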

Sort Data to Make It More Manageable

The following data appear in the worksheet in the order in which individual stores in a chain sent them into the central accounting system.

When the data appear in this uncontrolled order, finding an observation for any particular item, or from any specific store, would entail reviewing the entire list. We can fix that problem by selecting Data > Sort... and reordering the data by either store or item. 

sorted data by item    sorted data by store

Merge Multiple Worksheets

What if you need to analyze information about the same items, but that were recorded on separate worksheets?  For instance, if one group was gathering historic data about all of a corporation's manufacturing operations, while another was working on strategic planning, and your analysis required data from each? 

two worksheets

You can use Data > Merge Worksheets to bring the data together into a single worksheet, using the Division column to match the observations:

merging worksheets

You can also choose whether or not multiple, missing, or unmatched observations will be included in the merged worksheet.  
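
A rough pandas sketch of the same merge, matching on Division, might look like this; the two frames and the decision to keep unmatched rows are assumptions for illustration.

```python
# Sketch of Data > Merge Worksheets in pandas, matching on a Division column.
# The frames below and the how="outer" choice (keep unmatched rows) are assumptions.
import pandas as pd

manufacturing = pd.DataFrame({
    "Division": ["East", "West", "North"],
    "Units":    [12000, 9500, 11000],
})
planning = pd.DataFrame({
    "Division": ["East", "West", "South"],
    "Target":   [12500, 10000, 8000],
})

merged = manufacturing.merge(planning, on="Division", how="outer")
print(merged)   # unmatched Divisions appear with NaN, similar to missing observations
```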

Reorganizing Data for Ease of Use and Clarity

Making changes to the layout of your worksheet does entail a small investment of time, but it can bring big returns in making analyses quicker and easier to perform. The next time you're confronted with raw data that isn't ready to play nice, try some of these approaches to get it under control. 

In my next post, I'll share some tips and tricks that can help you get more information out of your data.

Trouble Starting an Analysis? Graph Your Data with an Individual Value Plot


You've collected a bunch of data. It wasn't easy, but you did it. Yep, there it is, right there...just look at all those numbers, right there in neat columns and rows. Congratulations.

I hate to ask...but what are you going to do with your data?

If you're not sure precisely what to do with the data you've got, graphing it is a great way to get some valuable insight and direction. And a good graph to start with is an individual value plot, which you can create in Minitab Statistical Software by going to Graph > Individual Value Plot

How can individual value plots help me?

There are other graphs you could start with, so what makes the individual value plot such a strong contender? The fact that it lets you view important data features, find miscoded values, and identify unusual cases. 

In other words, taking a look at an individual value plot can help you to choose the appropriate direction for your analysis and to avoid wasted time and frustration.

IDENTIFY INDIVIDUAL VALUES

Many people like to look at their data in boxplots, and you can learn many valuable things from those graphs. Unlike boxplots, individual value plots display all data values and may be more informative than boxplots for small amounts of data.

boxplot of length

The boxplots for the two variables look identical.

individual value plot

The individual value plot of the same data shows that there are many more values for Batch 1 than for Batch 2.

You can use individual value plots to identify possible outliers and other values of interest. Hover the cursor over any point to see its exact value and position in the worksheet.

clustered data distribution

Individual value plots can also clearly illustrate characteristics of the data distribution. In this graph, most values are in a cluster between 4 and 10. Minitab can jitter (randomly nudge) the points horizontally, so that one value doesn’t obscure another. You can edit the plot to turn on or turn off jitter.
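
If you wanted to sketch a comparable jittered plot outside Minitab, something like the following matplotlib example would do it; the two batches of data are simulated for illustration.

```python
# Sketch of a jittered individual value plot (not Minitab's graph): every
# observation is drawn, with a small random horizontal nudge so points don't
# hide one another. The two batches of data are simulated for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
batches = {"Batch 1": rng.normal(7, 1.5, 60), "Batch 2": rng.normal(7, 1.5, 8)}

fig, ax = plt.subplots()
for i, (name, values) in enumerate(batches.items(), start=1):
    jitter = rng.uniform(-0.08, 0.08, size=len(values))   # small horizontal nudge
    ax.plot(np.full(len(values), i) + jitter, values, "o", alpha=0.6)

ax.set_xticks([1, 2])
ax.set_xticklabels(list(batches.keys()))
ax.set_ylabel("Length")
plt.show()
```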

MAKE GROUP COMPARISONS

Because individual value plots display all values for all groups at the same time, they are especially helpful when you compare variables, groups, and even subgroups.

time vs. shift plot

This plot shows the diameter of pipes from two lines over four shifts. You can see that the diameters of pipes produced by Line 1 seem to increase in variability across shifts, while the diameters of pipes from Line 2 appear more stable.

SUPPORT OTHER ANALYSES

An individual value plot is one of the built-in graphs that are available with many Minitab statistical analyses. You can easily display an individual value plot while you perform these analyses. In the analysis dialog box, simply click Graphs and check Individual Value Plot.

Some built-in individual value plots include specific analysis information. For example, the plot that accompanies a 1-sample t-test displays the 95% confidence interval for the mean and the reference value for the null hypothesis mean. These plots give you a graphical representation of the analysis results.

horizontal plot

This plot accompanies a 1-sample t-test. All of the data values are between 4.5 and 5.75. The reference mean lies outside of the confidence interval, which suggests that the population mean differs from the hypothesized value.

Individual Value Plot:  A Case Study

Suppose that salad dressing is bottled by four different machines and that you want to make sure that the bottles are filled correctly to 16 ounces. You weigh 30 samples from each machine. You plan to run an ANOVA to see if the means of the samples from each machine are equal. But, first, you display an individual value plot of the samples to get a better understanding of the data.

data

  1. Choose Graph > Individual Value Plot.
  2. Under One Y, choose With Groups.
  3. Click OK.
  4. In Graph variables, enter Weight.
  5. In Categorical variables for grouping, enter Machine.
  6. Click Data View.
  7. Under Data Display, check Interval bar and Mean symbol.
  8. Click OK in each dialog box.

individual value plot of weight

The mean fill weight is about 16 ounces for Fill2, Fill3, and Fill4, with no suspicious data points. For Fill1, however, the mean appears higher, with a possible outlier at the lower end.

Before you continue with the analysis, you may want to investigate problems with the Fill1 machine.

Putting individual value plots to use

Use Minitab’s individual value plot to get a quick overview of your data before you begin your analysis—especially if you have a small data set or if you want to compare groups. The insight that you gain can help you to decide what to do next and may save you time exploring other paths.

For more information on individual value plots and other Minitab graphs, see Minitab Help.

Six Sigma Concepts and Metrics (Part 2 of 2)


In Part 1 of this blog series, I compared Six Sigma to a diamond because both are valuable, have many facets and have withstood the test of time. I also explained how the term “Six Sigma” can be used to summarize a variety of concepts, including philosophy, tools, methodology, or metrics. In this post, I’ll explain short/long-term variation and between/within-subgroup variation and how they help the Six Sigma practitioner to understand process performance.

Short/Long-Term Variation

In a nutshell, short-term or within-subgroup variation is the variation in data collected over a short period of time. Long-term or overall variation is the variation in data collected over a longer period. Makes perfect sense, right?

Let’s start with the within-subgroup variation. The within-subgroup variation is the variation among measurements in a single subgroup. It represents the natural and inherent variation of the process over a short period of time. Within-subgroup variation will not be influenced by changes to the process inputs, such as different operators, changes in machine settings, or tool wear. When your process is evaluated using within-subgroup variation, you are asking the question: Does my current production sample meet specifications?

In figure 1 below, the within-subgroup variation is represented by the smaller histograms. As you can see, there are multiple subgroups in this data set:

Histograms
Figure 1

The within-subgroup variation is estimated by the within-subgroup standard deviation. Minitab calculates σwithin using one of the following methods:

  1. Pooled standard deviation
  2. Average of subgroup ranges (Rbar)
  3. Average of subgroup standard deviations (Sbar)

The large overarching histogram in the figure above represents the overall variation, which is the within-subgroup variation combined with the variation that occurs among subgroups that are collected over a longer period of time.

The overall variation includes changes to process inputs or to the environment, such as fluctuations in temperature or changes in material. The general rule of thumb for overall variation is that it contains data collected over a sufficient time such that over 80% of the process variation is likely to be included.

The overall variation is estimated by the overall standard deviation. When you evaluate your process using overall variation, you are asking the question: Does my process in the long run meet specification?
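
To make the two estimates concrete, here is a small Python sketch that computes a pooled within-subgroup standard deviation and an overall standard deviation from simulated subgrouped data (ignoring the unbiasing constants Minitab can apply). The data values are made up.

```python
# Sketch of the two standard deviations behind short-term and long-term variation,
# using the pooled-standard-deviation method for sigma_within. Data are simulated:
# subgroup means drift over time, which inflates the overall (long-term) estimate.
import numpy as np

rng = np.random.default_rng(7)
subgroups = [rng.normal(loc=74.0 + drift, scale=0.01, size=5)
             for drift in rng.normal(0, 0.01, size=25)]      # 25 subgroups of 5

# Pooled (within-subgroup) standard deviation
ss_within = sum(((g - g.mean()) ** 2).sum() for g in subgroups)
df_within = sum(len(g) - 1 for g in subgroups)
sigma_within = np.sqrt(ss_within / df_within)

# Overall (long-term) standard deviation
all_values = np.concatenate(subgroups)
sigma_overall = all_values.std(ddof=1)

print(f"sigma_within  = {sigma_within:.4f}")
print(f"sigma_overall = {sigma_overall:.4f}")   # typically larger when subgroups drift
```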

Figure 2 displays data from an engine manufacturer using a forging process to make piston rings. The quality engineers want to assess the process capability. Over the span of two weeks, they collect 25 subgroups of five piston rings and measure the diameters. The specification limits for piston ring diameter are 74.0 mm ± 0.05 mm. 

Process Capability for Diameter

Figure 2

Most capability assessments are grouped into one of two categories: potential (within) and overall capability. Each represents a unique measure of process capability. Potential capability is often called the "entitlement" of your process! It ignores differences between subgroups and represents how the process could perform if the shift and drift between subgroups are eliminated. Capability indices that assess potential capability include Cp, CPU, CPL, and Cpk.

The overall capability is what the customer experiences! It includes the differences between subgroups. Capability indices that assess overall capability include Pp, PPU, PPL, and Ppk.

You can assess the effect of variation between subgroups by comparing potential and overall capability. If the difference between them is large, there is likely a high amount of variation between the subgroups, and the stability of your process can be improved. If Cp and Cpk, and Pp and Ppk, are all about the same, then you have a centered process with very little between-subgroup variation.

Between/Within Variation

Between-subgroup variation is the variation due to differences between the subgroups. If the subgroup means for different shifts, machines, or operators are close to each other, the between-subgroup variation will be small.

Between Within Variation

Figure 3

Within-group variation is the variation due to differences within individual samples. It is the random variation that we expect from noise or statistical error. Each sample is considered independently, and no interaction between samples is involved (because we're looking at a sample from one worker, one shift, or one batch). To improve process quality, try to eliminate the between-subgroup variation and reduce the within-subgroup variation.

Boxplot of Racquets

Figure 4

To demonstrate between- and within-subgroup variation, Figure 4 displays racquet sales data for seven stores on a boxplot. The lengths of the boxes in the boxplot represent the within-subgroup variation. The store with the most within-subgroup variation is #13, while the store with the least amount of within-subgroup variation, at first glance, is store #10—but it has an outlier. Therefore, store #12 has the least amount of within-subgroup variation.

The between-subgroup variation is evaluated by comparing the mean (X-bar) between the stores. If the means are close to each other, the between variation calculation will be small. 

ANOVA is a statistical method to compare three or more subgroups to determine if the subgroups are statistically the same or different. The F-Value is calculated using the between- and within-subgroup variation. If more variation is coming from within, then the subgroups are considered statistically the same. Conversely, if more variation is due to differences between subgroups, they are considered statistically different. 

ANOVA Racquets vs Stores

Figure 5

Figure 5 shows the ANOVA table for the racquet sales analysis for the seven stores. The Stores term represents the between-subgroup variation, and the Error term represents the within-subgroup variation. After calculating the sums of squares and mean squares, the F-value and p-value are calculated and used to determine the results. Since the F-value is close to 1 and the p-value is greater than 0.05, the stores' sales are considered not statistically different. 
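
For reference, the same between/within comparison can be sketched with a one-way ANOVA in SciPy. The store sales values below are made up; an F-value near 1 with a p-value above 0.05 would be read the same way as Figure 5.

```python
# Sketch of the between/within comparison with a one-way ANOVA in SciPy
# (scipy.stats.f_oneway). The store sales values are made up for illustration.
from scipy import stats

store_10 = [42, 45, 44, 43, 41]
store_12 = [44, 43, 45, 44, 43]
store_13 = [38, 49, 41, 47, 44]

f_value, p_value = stats.f_oneway(store_10, store_12, store_13)
print(f"F = {f_value:.2f}, p = {p_value:.3f}")
```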

As you work to improve quality, be sure you recognize the differences between  short/long-term variation and between/within-subgroup variation, and how they can help you understand process performance.  

What to Do When Your Data's a Mess, part 3


Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously, I wrote about tools you can use to clean up and eliminate clutter in your data and to reorganize your data.

In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.

Know Your Options

Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected it? Or a column of numeric data accidentally gets coded as text?  You can solve those problems quickly with statistical software packages.

Change the Type of Data You Have

Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. This will severely limit the types of analysis that can be performed using the data.

misclassified data

To fix this, select Data > Change Data Type and use the dialog box to choose the column you want to change.

change data type menu

One click later, and the errant text data has been converted to the desired numeric format:

numeric data
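
The same fix sketched in pandas might look like this; the column name and values are assumptions for illustration.

```python
# Sketch of Data > Change Data Type in pandas: a column stored as text converted
# back to numbers. The column name "Measure" and the values are made up.
import pandas as pd

df = pd.DataFrame({"Measure": ["10.2", "9.8", "10.5", "10.1"]})   # stored as text
df["Measure"] = pd.to_numeric(df["Measure"])                      # now numeric
print(df.dtypes)
```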

Make Data More Meaningful by Coding It

When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. 

uncoded data

That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data. 

In this case, we select Data > Code > Numeric to Text...

code data menu

And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in.  

Code data dialog box

Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labeled.  

Coded data
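
Outside Minitab, the same recoding can be sketched with a simple mapping in pandas; the code-to-town mapping below is hypothetical.

```python
# Sketch of Data > Code > Numeric to Text in pandas: numeric location codes
# replaced with readable town names. The code-to-name mapping is hypothetical.
import pandas as pd

df = pd.DataFrame({"Location": [1, 2, 3, 1, 2]})
location_names = {1: "State College", 2: "Harrisburg", 3: "Pittsburgh"}
df["Location"] = df["Location"].map(location_names)
print(df)
```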

Got the Time? 

Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect.  But the way the date is recorded in your data sheet might not be exactly what you need. 

For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful.   

date column

You can use Data > Date/Time > Extract to Text... to identify the day of the week for each date.

extract-date-to-text

Now you have a column that lists the day of the week, and you can easily use it in your analysis. 

day column
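
A comparable extraction in pandas might look like the sketch below; the dates are made up and the MM/DD/YYYY format is assumed.

```python
# Sketch of Data > Date/Time > Extract to Text in pandas: pull the day of the
# week out of a MM/DD/YYYY date column. The dates are made up for illustration.
import pandas as pd

df = pd.DataFrame({"Date": ["03/17/2017", "03/18/2017", "03/20/2017"]})
df["Day"] = pd.to_datetime(df["Date"], format="%m/%d/%Y").dt.day_name()
print(df)
```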

Manipulating for Meaning

These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.

There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.

 

 


Gauging Gage Part 1: Is 10 Parts Enough?


"You take 10 parts and have 3 operators measure each 2 times."

This standard approach to a Gage R&R experiment is so common, so accepted, so ubiquitous that few people ever question whether it is effective.  Obviously one could look at whether 3 is an adequate number of operators or 2 an adequate number of replicates, but in this first of a series of posts about "Gauging Gage," I want to look at 10.  Just 10 parts.  How accurately can you assess your measurement system with 10 parts?

Assessing a Measurement System with 10 Parts

I'm going to use a simple scenario as an example.  I'm going to simulate the results of 1,000 Gage R&R Studies with the following underlying characteristics:

  1. There are no operator-to-operator differences, and no operator*part interaction.
  2. The measurement system variance and part-to-part variance used would result in a %Contribution of 5.88%, which falls between the popular guidelines of <1% (excellent) and >9% (poor).

So—no looking ahead here—based on my 1,000 simulated Gage studies, what do you think the distribution of %Contribution looks like across all studies?  Specifically, do you think it is centered near the true value (5.88%), or do you think the distribution is skewed, and if so, how much do you think the estimates vary?

Go ahead and think about it...I'll just wait here for a minute.

Okay, ready?

Here is the distribution, with the guidelines and true value indicated:

PctContribution for 10 Parts

The good news is that it is roughly averaging around the true value.

However, the distribution is highly skewed—a decent number of observations estimated %Contribution to be at least double the true value with one estimating it at about SIX times the true value!  And the variation is huge.  In fact, about 1 in 4 gage studies would have resulted in failing this gage.
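
If you want a feel for this yourself, the sketch below re-creates the spirit of the simulation in Python. It is a simplification, not the exact engine used here: because the simulated operators contribute nothing, it estimates %Contribution from a one-way random-effects ANOVA on parts, with gage variance 1 and part-to-part variance 16 (true %Contribution = 1/17 ≈ 5.88%).

```python
# Rough re-creation of the simulation idea (not Minitab's Gage R&R engine):
# simulate many 10-part studies and estimate %Contribution from each one with
# a one-way random-effects ANOVA on parts. With no operator effect in the
# simulation, repeatability is the whole gage term.
import numpy as np

rng = np.random.default_rng(42)
n_parts, n_meas = 10, 6          # 3 operators x 2 replicates = 6 measurements/part
var_part, var_gage = 16.0, 1.0   # true %Contribution = 1 / (1 + 16) = 5.88%
n_studies = 1000

estimates = []
for _ in range(n_studies):
    parts = rng.normal(0, np.sqrt(var_part), n_parts)[:, None]
    data = parts + rng.normal(0, np.sqrt(var_gage), (n_parts, n_meas))

    ms_part = n_meas * data.mean(axis=1).var(ddof=1)
    ms_error = sum(((row - row.mean()) ** 2).sum() for row in data) / (n_parts * (n_meas - 1))
    est_part = max((ms_part - ms_error) / n_meas, 0.0)
    estimates.append(100 * ms_error / (ms_error + est_part))

estimates = np.array(estimates)
print(f"median %Contribution = {np.median(estimates):.1f}%")
print(f"share of studies failing the 9% guideline = {np.mean(estimates > 9):.0%}")
```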

Now a standard gage study is no small undertaking—a total of 60 data points must be collected, and once randomization and "masking" of the parts is done it can be quite tedious (and possibly annoying to the operators).  So just how many parts would be needed for a more accurate assessment of %Contribution?

Assessing a Measurement System with 30 Parts

I repeated 1,000 simulations, this time using 30 parts (if you're keeping score, that's 180 data points).  And then for kicks, I went ahead and did 100 parts (that's 600 data points).  So now consider the same questions from before for these counts—mean, skewness, and variation.

Mean is probably easy: if it was centered before, it's probably centered still.

So let's really look at skewness and how much we were able to reduce variation:

10 30 100 Parts

Skewness and variation have clearly decreased, but I suspect you thought variation would have decreased more than it did. Keep in mind that %Contribution is affected by your estimates of repeatability and reproducibility as well, so you can only tighten this distribution so much by increasing the number of parts.  Even using 30 parts—an enormous experiment to undertake—still results in this gage failing 7% of the time!

So what is a quality practitioner to do?

I have two recommendations for you.  First, let's talk about %Process.  Oftentimes the measurement system we are evaluating has been in place for some time and we are simply verifying its effectiveness.  In this case, rather than relying on your small sampling of parts to estimate the overall variation, you can use the historical standard deviation as your estimate and eliminate much of the variation caused by the small sample of parts.  Just enter your historical standard deviation in the Options subdialog in Minitab:

Options Subdialog

Then your output will include an additional column of information called %Process.  This column is the equivalent of the %StudyVar column, but using the historical standard deviation (which comes from a much larger sample) instead of the overall standard deviation estimated from the data collected in your experiment:

Percent Process

My second recommendation is to include confidence intervals in your output.  This can be done in the Conf Int subdialog:

Conf Int subdialog

Including confidence intervals in your output doesn't inherently improve the wide variation of estimates the standard gage study provides, but it does force you to recognize just how much uncertainty there is in your estimate.  For example, consider this output from the gageaiag.mtw sample dataset in Minitab with confidence intervals turned on:

Output with CIs

For some processes you might accept this gage based on the %Contribution being less than 9%.  But for most processes you really need to trust your data, and the 95% CI of (2.14, 66.18) is a red flag that you really shouldn't be very confident that you have an acceptable measurement system.

So the next time you run a Gage R&R Study, put some thought into how many parts you use and how confident you are in your results!

see Part II of this series
see Part III of this series

5 Simple Steps to Conduct Capability Analysis with Non-Normal Data


by Kevin Clay, guest blogger

In transactional or service processes, we often deal with lead-time data, and usually that data does not follow the normal distribution.

why be normal

Consider a Lean Six Sigma project to reduce the lead time required to install an information technology solution at a customer site. It should take no more than 30 days—working 10 hours per day Monday–Friday—to complete, test and certify the installation. Following the standard process, the target lead time should be around 24 days.

Twenty-four days may be the target, but we know customer satisfaction increases as we complete the installation faster. We need to understand our baseline capability to meet that demand, so we can perform a capability analysis.

We know our data should fit a non-normal (positively skewed) distribution. It should resemble a ski-slope like the picture below:

ski slope distribution

In this post, I will cover five simple steps to understand the capability of a non-normal process to meet customer demands.

1. Collect data

First we must gather data from the process. In this scenario, we are collecting sample data. We pull 100 samples that cover the full range of variation that occurs in the process.

In this case the full range of variation comes from three installation teams. We will take at least 30 data points from each team.

2. Identify the Shape of the Distribution

We know that the data should fit a non-normal distribution. As Lean Six Sigma practitioners, we must prove our assumption with data. In this case, we can conduct a normality test to prove non-normality.

We are using Minitab as the statistical analysis tool, and our data are available in this worksheet. (If you want to follow along and don't already have it, download the free Minitab trial.)

From the menu, select "Normality Test," found under Stat > Basic Statistics > Normality Test…

Populate the “Variable:” field with LeadTime, and click OK as shown:

normality test dialog

You should get the following Probability Plot:

probability plot of lead time

Since the p-value (outlined in yellow in the above picture) is less than 0.05, we reject normality and conclude that the data do not follow a normal distribution.
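
As a stand-in outside Minitab (whose default normality test is Anderson-Darling), a quick check could be sketched in Python with a Shapiro-Wilk test. The lead times below are simulated, since the worksheet itself isn't reproduced here.

```python
# A stand-in for the normality check: Shapiro-Wilk test from SciPy, which
# reports a p-value. The lead-time data are simulated (skewed) for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lead_time = rng.exponential(scale=8, size=100) + 5   # simulated skewed lead times

stat, p_value = stats.shapiro(lead_time)
print(f"p = {p_value:.4f}")   # p < 0.05 -> the data do not look normally distributed
```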

3. Verify Stability

In a Lean Six Sigma project, we might find the answer to our problem anywhere on the DMAIC roadmap. Belts need to learn to look for the signals all throughout the project.

In this case, signals can come from instability in our process. They show up as red dots on a control chart.

To see if this lead time process is stable, we will run an I-MR Chart. In Minitab, select Stat > Control Charts > Variables Charts for Individuals > I-MR…

Populate “Variables:” with “LeadTime” in the dialog as shown below:

I-MR Chart dialog

Press OK, and you'll get the following “I-MR Chart of LeadTime”:

I-MR Chart of Lead Time

The I-MR Chart shows two signals of instability (shown as red dots) on the Individuals Chart on the top of the graph and the Moving Range Chart on the bottom.

These data points indicate abnormal variation, and their cause should be investigated. These signals could offer great insight into the problem you are trying to solve. Once you have identified and resolved the causes of these points, you can collect additional data or remove the points from the data set.

In this scenario, we will leave the two points in the data set.
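
For reference, the individuals-chart limits behind an I-MR chart come from the average moving range. A quick sketch of that arithmetic, reusing the simulated lead times from the previous sketch, is below.

```python
# Sketch of the standard I-MR control limit arithmetic:
# I chart limits are mean ± 2.66 * average moving range;
# the MR chart UCL is 3.267 * average moving range.
import numpy as np

rng = np.random.default_rng(3)
lead_time = rng.exponential(scale=8, size=100) + 5   # simulated lead times

moving_range = np.abs(np.diff(lead_time))
mr_bar = moving_range.mean()

i_ucl = lead_time.mean() + 2.66 * mr_bar
i_lcl = lead_time.mean() - 2.66 * mr_bar
mr_ucl = 3.267 * mr_bar

out_of_control = np.where((lead_time > i_ucl) | (lead_time < i_lcl))[0]
print(f"I chart limits: ({i_lcl:.1f}, {i_ucl:.1f}); MR UCL: {mr_ucl:.1f}")
print("points outside the I chart limits (row indices):", out_of_control)
```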

4. What Non-Normal Distribution Does the Data Best Fit?

There are several non-normal data distributions that the data could fit, so we will use a tool in Minitab to show us which distribution fits the data best. Open the “Individual Distribution Identification” dialog by going to Stat > Quality Tools > Individual Distribution Identification…

Populate “Single column:” and “Subgroup size:” as follows:

individual distribution identification dialog

Minitab will output the four graphs shown below. Each graph includes four different distributions:

probability ID plots 1

probability id plots 2

probability id plots 3

probability id plots 4

Pick the distribution with the largest p-value (excluding the Johnson Transformation and the Box-Cox Transformation). In this scenario, the exponential distribution fits the data best.

5. What Is the Process Capability?

Now that we know the distribution that best fits these data, we can perform the non-normal capability analysis. In Minitab, select Stat > Quality Tools > Capability Analysis > Nonnormal…

Populate the “Capability Analysis (Nonnormal Distribution)” dialog box as seen below. Make sure to select “Exponential” next to Fit distribution. Then click “Options”.

capability analysis dialog

Fill in the “Capability Analysis (Nonnormal Distribution): Options” dialog box with the following:

capability analysis options dialog

We chose “Percents” over “Parts Per Million” because in this scenario it would take years to produce over one million outputs (or data for each installation time).

OK out of the options and main dialog boxes, and you should get the following “Process Capability Report for LeadTime”:

process capability of lead time

We interpret the results of a non-normal capability analysis just as we do an analysis done on data with a normal distribution.

Capability is determined by comparing the width of the process variation (VOP) to the width of the specification (VOC). We would like the process spread to be smaller than, and contained within, the specification spread.

That’s clearly not the case with this data.

The Overall Capability index on the right side of the graph depicts how the process is performing relative to the specification limits.

To quickly determine whether the process is capable, compare Ppk with your minimum requirement for the indices. Most quality professionals consider 1.33 to be a minimum requirement for a capable process. A value less than 1 is usually considered unacceptable.

With a Ppk of .23, it seems our IT Installation Groups have work ahead to get their process to meet customer specifications. At least these data offer a clear understanding of how much the process can be improved!
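
For readers curious about the arithmetic, one common way to compute non-normal overall capability is the percentile (ISO) method on the fitted distribution. The sketch below applies it to an exponential fit with only an upper spec of 30 days assumed; it illustrates the approach with simulated data, and is not necessarily the exact method behind the report above.

```python
# Sketch of percentile-based (ISO method) capability for an exponential fit.
# With only an upper spec, Ppk reduces to PPU = (USL - median) / (P99.865 - median).
# The lead times are simulated; the USL of 30 days is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
lead_time = rng.exponential(scale=8, size=100) + 5   # simulated lead times
usl = 30.0

loc, scale = stats.expon.fit(lead_time)              # fit the exponential distribution
median = stats.expon.ppf(0.5, loc, scale)
p99865 = stats.expon.ppf(0.99865, loc, scale)

ppu = (usl - median) / (p99865 - median)
pct_above_usl = 100 * stats.expon.sf(usl, loc, scale)
print(f"PPU (overall) = {ppu:.2f}, estimated % beyond spec = {pct_above_usl:.1f}%")
```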

 

About the Guest Blogger:

Kevin Clay is a Master Black Belt and President and CEO of Six Sigma Development Solutions, Inc., certified as an Accredited Training Organization with the International Association of Six Sigma Certification (IASSC). For more information visit www.sixsigmadsi.com or contact Kevin at 866-922-6566 or kclay@sixsigmadsi.com.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

Gauging Gage Part 2: Are 3 Operators or 2 Replicates Enough?


In Part 1 of Gauging Gage, I looked at how adequate a sample of 10 parts is for a Gage R&R Study and provided some advice based on the results.

Now I want to turn my attention to the other two factors in the standard Gage experiment: 3 operators and 2 replicates.  Specifically, what if instead of increasing the number of parts in the experiment (my previous post demonstrated you would need an unfeasible increase in parts), you increased the number of operators or number of replicates?

In this study, we are only interested in the effect on our estimate of overall Gage variation. Obviously, increasing operators would give you a better estimate of the operator term and reproducibility, and increasing replicates would give you a better estimate of repeatability.  But I want to look at the overall impact on your assessment of the measurement system.

Operators

First we will look at operators.  Using the same simulation engine I described in Part 1, this time I did two different simulations. In one, I increased the number of operators to 4 and continued using 10 parts and 2 replicates (for a total of 80 runs); in the other, I increased to 4 operators and still used 2 replicates, but decreased the number of parts to 8 to get back close to the original experiment size (64 runs compared to the original 60).

Here is a comparison of the standard experiment and each scenario laid out here:

Operator Comparisons

Operator Descriptive Stats

It may not be obvious in the graph, but increasing to 4 operators while decreasing to 8 parts actually increased the variation in %Contribution seen...so despite requiring 4 more runs this is the poorer choice.  And the experiment that involved 4 operators but maintained 10 parts (a total of 80 runs) showed no significant improvement over the standard study.

Replicates

Now let's look at replicates in the same manner we looked at parts.  In one run of simulations we will increase replicates to 3 while continuing to use 10 parts and 3 operators (90 runs), and in another we will increase replicates to 3 and operators to 3, but reduce parts to 7 to compensate (63 runs).

Again we compare the standard experiment to each of these scenarios:

Replicate Comparisons

Replicates Descriptive Statistics

Here we see the same pattern as with operators. Increasing to 3 replicates while compensating by reducing to 7 parts (for a total of 63 runs) significantly increases the variation in %Contribution seen.  And increasing to 3 replicates while maintaining 10 parts shows no improvement.

Conclusions about Operators and Replicates in Gage Studies

As stated above, we're only looking at the effect of these changes to the overall estimate of measurement system error. So while increasing to 4 operators or 3 replicates either showed no improvement in our ability to estimate %Contribution or actually made it worse, you may have a situation where you are willing to sacrifice that in order to get more accurate estimates of the individual components of measurement error.  In that case, one of these designs might actually be a better choice.

For most situations, however, if you're able to collect more data, then increasing the number of parts used remains your best choice.

But how do we select those parts?  I'll talk about that in my next post!

see Part I of this series
see Part III of this series

How to Improve Cpk


You run a capability analysis and your Cpk is bad. Now what?

First, let’s start by defining what “bad” is. In simple terms, the smaller the Cpk, the more defects you have. So the larger your Cpk is, the better. Many practitioners use a Cpk of 1.33 as the gold standard, so we’ll treat that as the gold standard here, too.

Suppose we collect some data and run a capability analysis using Minitab Statistical Software. The results reveal a Cpk of 0.35 with a corresponding DPMO (defects per million opportunities) of more than 140,000. Not good. So how can we improve it? There are two ways to figure that out:

#1 Look at the Graph

Example 1: The Cpk for Diameter1 is 0.35, which is well below 1.33. This means we have a lot of measurements that are out of spec. 

Using the graph, we can see that the data—represented by the blue histogram—is not centered between the spec limits shown in red. Fortunately, variability does not appear to be an issue since the histogram and corresponding normal curve can physically fit between the specification limits.

Q: How can we improve Cpk?

A: Center the process by moving the mean closer to 100 (halfway between the spec limits), without increasing the variation.

 

Example 2: In the analysis for Diameter2, we see a meager Cpk of only 0.41. Fortunately, the data is centered relative to the spec limits. However, the histogram and corresponding normal curve extend beyond the specs.

Q: How can we improve Cpk?

A: Reduce the variability, while maintaining the same average.

 

Example 3: In the analysis for Diameter3, we can see that the process is not centered between the specs. To make matters worse, the histogram and corresponding normal curve are wider than the tolerance (i.e. the distance between the spec limits), which indicates that there’s also too much variability.

Q: How can we improve Cpk?

A. Shift the mean closer to 100 to center the process AND reduce the variation.

 

#2 Compare Cp to Cpk

Cp is similar to Cpk in that the smaller the number, the worse the process, and we can use the same 1.33 gold standard. However, the two statistics and their corresponding formulas differ in that Cp only compares the spread of the data to the tolerance width, and does not account for whether or not the process is actually centered between the spec limits.

Interpreting Cp is much like asking “will my car fit in the garage?” where the data is your car and the spec limits are the walls of your garage. We’re not accounting for whether or not you’re a crappy driver and can actually drive straight and center the car—we’re just looking at whether or not your car is narrow enough to physically fit.
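
For reference, here are the two standard formulas side by side in a quick Python sketch; the spec limits, mean, and standard deviation are made-up numbers chosen so that Cp looks good while Cpk looks poor, like Diameter1.

```python
# Quick sketch of the two formulas being compared here (standard definitions):
#   Cp  = (USL - LSL) / (6 * sigma)                      -- spread vs. tolerance only
#   Cpk = min(USL - mean, mean - LSL) / (3 * sigma)      -- also penalizes off-center
# The spec limits, mean, and sigma below are made-up numbers for illustration.

def cp(usl: float, lsl: float, sigma: float) -> float:
    return (usl - lsl) / (6 * sigma)

def cpk(usl: float, lsl: float, mean: float, sigma: float) -> float:
    return min(usl - mean, mean - lsl) / (3 * sigma)

usl, lsl, mean, sigma = 104.0, 96.0, 96.85, 0.813
print(f"Cp  = {cp(usl, lsl, sigma):.2f}")        # good: the car fits the garage
print(f"Cpk = {cpk(usl, lsl, mean, sigma):.2f}") # poor: it's parked off-center
```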

Example 1: The analysis for Diameter1 has a Cp of 1.64, which is very good. Because Cp is good, we know the variation is acceptable—we can physically fit our car in the garage. However, Cpk, which does account for whether or not the process is centered, is awful, at only 0.35.

       Q: How can we improve Cpk?

       A: Shift the mean to center the process between the specs, without increasing the variation.

Example 2: The analysis for Diameter 2 shows that Cp = 0.43 and Cpk = 0.41. Because Cp is bad, we know there’s too much variation—our car cannot physically fit in the garage. And because the Cp and Cpk values are similar, this tells us that the process is fairly centered.

       Q: How can we improve Cpk?

       A: Reduce the variation, while maintaining the same average.

Example 3: The analysis for Diameter 3 has a Cp = 0.43 and Cpk = -0.23. Because Cp is bad, we know there’s too much variation. And because Cp is not even close to Cpk, we know that the process is also off center.

       Q: How can we improve Cpk?

       A. Shift the mean AND reduce the variation.

 

And for a 3rd way...

Whether you look at a capability analysis graph or compare the Cp and Cpk statistics, you’re going to arrive at the same conclusion regarding how to improve your results. And if you want yet another way to figure out how to improve Cpk, you can also look at the mean and standard deviation—but for now, I’ll spare you the math lesson and stick with #1 and #2 above.

In summary:

Why Is Continuous Data "Better" than Categorical or Discrete Data?


Earlier, I wrote about the different types of data statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data. 

As a reminder, when we assign something to a group or give it a name, we have created attribute or categorical data.  If we count something, like defects, we have gathered discrete data. And if we can measure something to a (theoretically) infinite degree, we have continuous data.

Or, to put in bullet points: 

  • Categorical = naming or grouping data
  • Discrete = count data
  • Continuous = measurement data

A statistical software package like Minitab is extremely powerful and can tell us many valuable things—as long as we're able to feed it good numbers. Without numbers, we have no analyses or graphs. Even categorical or attribute data needs to be converted into numeric form by counting before we can analyze it. 

What Makes Numeric Data Discrete or Continuous? 

At this point, you may be thinking, "Wait a minute—we can't really measure anything infinitely, so isn't measurement data actually discrete, too?" That's a fair question. 

If you're a strict literalist, the answer is "yes"—when we measure a property that's continuous, like height or distance, we are de facto making a discrete assessment. When we collect a lot of those discrete measurements, it's the amount of detail they contain that will dictate whether we can treat the collection as discrete or continuous.

I like to think of it as a question of scale. Say I want to measure the weight of 16-ounce cereal boxes coming off a production line, and I want to be sure that the weight of each box is at least 16 ounces, but no more than 1/2 ounce over that.

With a scale calibrated to whole pounds, all I can do is put every box into one of three categories: less than a pound, 1 pound, or more than a pound. 

With a scale that can distinguish ounces, I will be able to measure with a bit more accuracy just how close to a pound the individual boxes are. I'm getting nearer to continuous data, but there are still only 16 gradations between each pound. 

But if I measure with a scale capable of distinguishing 1/1000th of an ounce, I will have quite a wide scale—a continuum—of potential values between pounds. The individual boxes could have any value between 0.000 and 1.999 pounds. The scale of these measurements is fine enough to be analyzed with powerful statistical tools made for continuous data.   

What Can I Do with Continuous Data that I Can't Do with Discrete? 

Not all data points are equally valuable, and you can glean a lot more insight from 100 points of continuous data than you can from 100 points of attribute or count data. How does this finer degree of detail affect what we can learn from a set of data? It's easy to see. 

Let's start with the simplest kind of data: attribute data that rates the weight of a cereal box as good or bad. For 100 boxes of cereal, any that are under 1 pound are classified as bad, so each box can have one of only two values.

We can create a bar chart or a pie chart to visualize this data, and that's about it:

Attribute Data Bar Chart

If we bump up the precision of our scale to differentiate between boxes that are over and under 1 pound, we can put each box of cereal into one of three categories. Here's what that looks like in a pie chart:

pie chart of count data

This gives us a little bit more insight—we now see that we are overfilling more boxes than we are underfilling—but there is still a very limited amount of information we can extract from the data.  

If we measure each box to the nearest ounce, we open the door to using methods for continuous data, and get a still better picture of what's going on. We can see that, on average, the boxes weigh 1 pound. But there's high variability, with a standard deviation of 0.9. There's also a wide range in our data, with observed values from 12 to 20 ounces: 

graphical summary of ounce data

If I measure the boxes with a scale capable of differentiating thousandths of an ounce, more options for analysis open up. For example, now that the data are fine enough to distinguish half-ounces (and then some), I can perform a capability analysis to see if my process is even capable of consistently delivering boxes that fall between 16 and 16.5 ounces. I'll use the Assistant in Minitab to do it, selecting Assistant > Capability Analysis

capability analysis for thousandths

The analysis has revealed that my process isn't capable of meeting specifications. Looks like I have some work to do...but the Assistant also gives me an I-MR control chart, which reveals where and when my process is going out of spec, so I can start looking for root causes.

IMR Chart

If I were only looking at attribute data, I might think my process was just fine. Continuous data has allowed me to see that I can make the process better, and given me a rough idea where to start. By making changes and collecting additional continuous data, I'll be able to conduct hypothesis tests, analyze sources of variances, and more.  

Some Final Advantages of Continuous Over Discrete Data

Does this mean discrete data is no good at all?  Of course not—we are concerned with many things that can't be measured effectively except through discrete data, such as opinions and demographics. But when you can get it, continuous data is the better option. The table below lays out the reasons why. 

Continuous Data | Discrete Data
Inferences can be made with few data points—valid analysis can be performed with small samples. | More data points (a larger sample) are needed to make an equivalent inference.
Smaller samples are usually less expensive to gather. | Larger samples are usually more expensive to gather.
High sensitivity (how close to or far from a target). | Low sensitivity (good/bad, pass/fail).
Variety of analysis options that can offer insight into the sources of variation. | Limited options for analysis, with little indication of sources of variation.

I hope this very basic overview has effectively illustrated why you should opt for continuous data over discrete data whenever you can get it. 
