
Regression with Meat Ants: Analyzing a Count Response (Part II)


My previous post showed an example of using ordinary linear regression to model a count response. For that particular count data, shown by the blue circles on the dot plot below, the model assumptions for linear regression were adequately satisfied.

But frequently, count data may contain many values equal to or close to 0. Also, the distribution of the counts may be right-skewed. In the quality field, this commonly occurs when you count the number of defects on each item, or the number of defectives in a sample.

So let's suppose that the number of ants coming to each sandwich portion was instead the count data shown by the red square symbols on the dot plot.

If you want to follow along, open the Minitab project file with the new count data. Set up and analyze the ordinary linear regression model the same way as in Part 1. You should get the following result:


For the new count data, notice that the general relationships between the predictors and the count response are similar to those in the original data set. Both Filling and Butter are statistically significant predictors of the ant count response (p < 0.1). And as before, the coefficients table shows that Ham and Pickles, With Butter are the predictor levels associated with the highest ant count.

But for the new count data, look what happens to the regression equation:


The equation now yields negative counts for some predictor values. For example, the estimated ant count for a Vegemite sandwich on white bread without butter is approximately -0.4. So if you drop that particular sandwich on a sidewalk in Australia, you can expect about negative four-tenths of an ant to appear. A model that predicts antimatter. Hmm. Intriguing. But not very practical.

What about the model assumptions?

With the new count response data, the Residuals Versus Fits plot suggests that the critical assumption of constant variance may be violated. The spread of the residuals appears to increase as the fitted values of the model increase. This classic "megaphone" pattern in the residual plot is a problem—the model estimates get more erratic at higher fitted values.

When this happens, one common approach is to transform the response data to stabilize the variance. In fact, Minitab's linear regression analysis includes an option to perform a Box-Cox transformation (which is a family of power transformations that includes log transform, square root, and other transformation functions) for situations like this. But here's the catch: In many cases, count data can be problematic to transform, especially if they contain the value 0.

For example, try to perform the Box-Cox transformation in Minitab with the new count data, and you'll get this error message.
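If you want to see the same roadblock outside of Minitab, here is a minimal sketch in Python. The counts below are made up for illustration (they are not the blog's data set); the point is simply that scipy's Box-Cox routine requires strictly positive data, so any zero stops it cold.

```python
import numpy as np
from scipy import stats

# Made-up, right-skewed counts with several zeros (illustrative only)
counts = np.array([0, 0, 1, 0, 2, 1, 0, 3, 5, 1, 0, 2])

try:
    transformed, lam = stats.boxcox(counts)
except ValueError as err:
    # Box-Cox is defined only for strictly positive data
    print("Box-Cox failed:", err)

# A square-root or log(y + 1) transform tolerates zeros, but with so many
# tied small values the transformed response may still discriminate poorly
print(np.sqrt(counts))
```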

Even if you try a transformation that can handle counts of 0, you might run into problems due to poor discrimination in your count data. So, when you use ordinary linear regression with a count response and one of the critical assumptions isn't met, you may find yourself up a creek without a log (or other) transform.

And even if your count data don't include 0, or you manage to find a transformation that works (or use sleight-of-hand to replace the 0s in your data set with tiny decimal values to make all the data positive), the resulting model for the transformed values may still yield problematic estimates for a count response.

Now what?

Well, instead of using Alka Seltzer to clean your toilet bowl, how about using a product that's been specifically designed to clean it, such as a toilet bowl cleaner?

That is, instead of using ordinary linear regression, which is technically designed to evaluate a continuous response, why not use a regression analysis specifically designed to analyze a count response? Stay tuned for the next post (Part 3).


Regression with Meat Ants: Analyzing a Count Response (Part III)


If you use ordinary linear regression with a response of count data, it may work out fine (Part 1), or you may run into some problems (Part 2).

Given that a count response could be problematic, why not use a regression procedure developed to handle a response of counts?

A Poisson regression analysis is designed to analyze a regression model with a count response.

First, let's try using Poisson regression on the count response composed mostly of small data values, including 0, which was so problematic to evaluate using ordinary linear regression. In Minitab 17, choose Stat > Regression > Poisson Regression > Fit Poisson Model and set up the regression model with the same three categorical predictors as before. The output includes the table shown below:

Filling and Butter are statistically significant at the 0.1 level—just as they were when using ordinary linear regression. But take a look at the regression equation:

The regression equation is different for a Poisson model than for ordinary linear regression. For one thing, the estimated ant count equals the exponentiated value of Y' given by the equation. For example, for a peanut butter sandwich on whole meal bread, with butter, the equation yields a Y' value of -1.508 - 0.061(1) + 1.099(1) + 0.670(1) = 0.2. But that's not the estimated ant count. The estimated ant count is exp(0.2) = e^0.2 ≈ 1.22. (By the way, you can have Minitab perform this ugly number crunching for you: choose Poisson Regression > Predict and fill in the predictor values for which you want a response estimate.)
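If you'd like to reproduce this kind of calculation outside of Minitab, here is a minimal sketch using Python and statsmodels. The file name, column names, and factor levels are assumptions for illustration only, not the actual worksheet; the point is simply that a Poisson GLM uses a log link, so a predicted count is the exponentiated linear predictor.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical worksheet: one count response and three categorical predictors
ants = pd.read_csv("ant_counts.csv")  # assumed columns: Count, Filling, Bread, Butter

# Fit a Poisson GLM (the log link is the default for the Poisson family)
fit = smf.glm("Count ~ C(Filling) + C(Bread) + C(Butter)",
              data=ants, family=sm.families.Poisson()).fit()
print(fit.summary())

# Predicted count for one sandwich: statsmodels exponentiates Y' for you
new_sandwich = pd.DataFrame({"Filling": ["PeanutButter"],   # assumed level names
                             "Bread": ["WholeMeal"],
                             "Butter": ["Yes"]})
print(fit.predict(new_sandwich))   # equivalent to exp(Y') for that row
```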

Notice that with Poisson regression, the regression equation never produces a negative count response, as it can for ordinary linear regression. That's because even if the equation contains negative coefficients that produce a negative value of Y' for some values of the predictors, the exponentiated value of that negative value will always be positive. The smallest response estimate you can possibly get is essentially 0. That's a definite advantage of using Poisson regression with a count response—you don't have to grapple with those weird "antimatter" response estimates.

As with any regression analysis, Poisson regression has model assumptions that need to be evaluated by examining the residual plots.

Here, the residual plots seem to be OK. The Residuals Versus Log of Fits plot shows a slight increasing pattern—but nothing as troublesome as the classic megaphone pattern we saw when we evaluated these data using ordinary linear regression in Part 2.

For Poisson regression, you should also examine the goodness-of-fit results in the Session window:

For the goodness-of-fit tests, smaller p-values (such as values less than 0.05) indicate that the model does not adequately fit the data. The p-values here are relatively large (> 0.9), so there's no statistically significant evidence of lack-of-fit. That's a good thing!
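If you fit the model in code as sketched earlier, analogous deviance and Pearson goodness-of-fit checks can be approximated by comparing those statistics to a chi-square distribution with the residual degrees of freedom. This is a rough sketch that builds on the hypothetical `fit` object from the previous snippet.

```python
from scipy.stats import chi2

# Goodness-of-fit: large p-values mean no significant evidence of lack of fit
p_deviance = chi2.sf(fit.deviance, fit.df_resid)
p_pearson = chi2.sf(fit.pearson_chi2, fit.df_resid)
print(f"Deviance GOF p-value: {p_deviance:.3f}")
print(f"Pearson GOF p-value:  {p_pearson:.3f}")
```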

Using Poisson regression to evaluate the data set of small ant counts (shown by the red values in the dot plot below) worked out well. We avoided all of the messy problems that arose when we tried to evaluate these data using ordinary linear regression in Part 2.

Is Poisson Regression Always a Better Choice for a Count Response?

So Poisson regression can often provide more suitable results for a count response than ordinary linear regression. But is it always a better choice?

Let's go back, full circle, to the original ant count data set from amstat.org, shown by the blue data values in the dot plot below.

In Part 1, we saw that ordinary linear regression performed well with these count data. But look what happens to the model fit statistics when these count data are evaluated using Poisson regression:

The p-value for the model fit is very small (< 0.05), indicating poor model fit. In fact, 19 of the 48 values in the data set are flagged as "unusual observations"! That's almost 40% of the data—much more than the 5% that you could expect to occur by random chance.

What's the problem?

Overdispersion: The Nemesis of a Poisson Model

One thing that can throw the proverbial wrench into a Poisson model is overdispersion. A Poisson distribution has the property that its mean and variance are equal. If the variance of your data is much greater than the mean, it can cause problems.

For the original count data, which produced all of those unusual observations and lack-of-fit problems under Poisson regression, notice that the variance is nearly 8 times greater than the mean:
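This kind of overdispersion check is also easy to script for any column of counts: compare the sample variance to the sample mean, and treat a ratio far above 1 as a warning sign for a Poisson model. The sketch below reuses the hypothetical `ants["Count"]` column from the earlier snippets.

```python
# Rough overdispersion check: a Poisson response should have variance ≈ mean
counts = ants["Count"]
ratio = counts.var() / counts.mean()
print(f"mean = {counts.mean():.2f}, variance = {counts.var():.2f}, "
      f"variance-to-mean ratio = {ratio:.1f}")
```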

Just for illustrative purposes (never do something like this with your data!), consider what happens if you add a value of 200 to each count in this data set, to make the mean and variance closer.

Now, if you re-run the Poisson regression analysis with this modified data, look what happens to the lack-of-fit statistics and unusual observations:

The goodness-of-fit p-values are high and indicate no evidence of lack of fit. In fact, only one data value is now flagged as an unusual observation. The Poisson model is much happier when the mean and variance are closer together (they don't have to be exactly equal, though). 

Moral of the story? For the original data set of count responses, the model fit was actually better with the ordinary linear regression than it was for Poisson regression, due to overdispersion.

Sandwich Wrap Up: Analyzing a Count Response using Ordinary Linear Regression or Poisson Regression

This whole thing got started when some bored kids started throwing parts of their sandwiches to meat ants. By analyzing the ant count data in Minitab, we've seen some important issues to consider when using ordinary linear regression or Poisson regression with a count response:

| Ordinary Linear Regression | Poisson Regression |
| -------------------------- | ------------------ |
| Designed for a continuous response (but often used with a count response) | Designed for a count response |
| Equation may yield negative values | Equation yields only nonnegative values |
| Resistant to overdispersion | Sensitive to overdispersion |
| May not work well with small, skewed data sets (and transformation may not be effective) | Can handle small, skewed data sets |

Both analyses perform equally well with a ham and pickle, a peanut butter, or a Vegemite sandwich.

College Football and the High-Variance Strategy


Variance is a measure of how much the data are scattered about their mean. Usually we want to minimize it as much as possible. A manufacturer of screws wants to minimize the variation in the length of the screws. A restaurant owner doesn't want the taste of the same meal to vary from one day to the next. And they might not know it, but most football coaches choose a low-variance strategy when they make 4th down decisions!

For example, take the following situation: It’s 4th and 2 at the 50 yard line. If a coach punts, the other team will most likely start with the ball anywhere between their own 1 and 20 yard line. Using the model for expected points, the coach (if they’re playing on the road) can expect their opponent to score between 0.65 and -0.77 points, depending on how lucky they get with the punt. If the coach goes for it, their expected points will be approximately 2.4 if they convert and -2.9 if they fail.

So one decision has a range of 1.42 expected points and the other has a range of 5.3. The low-variance option is clearly to punt. And coaches are choosing the low-variance option mainly because they want to avoid the worst-case scenario (failing on 4th down and giving their opponent the ball at midfield). They are making choices that minimize their risk.
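A back-of-the-envelope version of that comparison is easy to write down. The sketch below uses the expected-points numbers quoted above and an assumed conversion probability (the 0.55 figure is a placeholder for illustration, not a value from the model) to contrast the average outcome and the spread of each choice.

```python
# Expected points from the example above (road team, 4th and 2 at the 50)
punt_best, punt_worst = 0.65, -0.77   # range of opponent expected points after a punt
go_success, go_fail = 2.4, -2.9       # convert vs. turn it over on downs

p_convert = 0.55   # assumed 4th-and-2 conversion probability (placeholder)

expected_go = p_convert * go_success + (1 - p_convert) * go_fail
print(f"Expected points if you go for it: {expected_go:.2f}")

# The variance argument: compare the spread of outcomes for each choice
print(f"Range of outcomes if you punt:      {punt_best - punt_worst:.2f} points")
print(f"Range of outcomes if you go for it: {go_success - go_fail:.2f} points")
```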

But what if they wanted to maximize their points instead?   

Creating a Cheat Sheet for 4th Down Decisions

Using the model for expected points and the model that calculates the probability of converting on 4th down, I was able to create a simple image that shows the decisions that maximize your expected points on 4th down. If a distance is on or below the line, you should go for it. Any distances above the line should result in a punt or field goal. The following chart assumes a punt nets 40 yards (the Big Ten average last year) until a team gets past midfield; after that, it assumes the punt is downed at the 10 yard line.

4th down decision chart

The statistics say to go for it on 4th and 2 and less, no matter where you are on the field. As we reach field goal range, the decisions differ depending on whether you’re playing at home or on the road. Home teams should be much more aggressive than road teams: the numbers say a home team shouldn't attempt a field goal on 4th and 5 or shorter. And on 4th and goal, teams should be going for it inside the 5 yard line every time, whether they are at home or on the road. The reason is that even if you fail, the other team is starting so deep in their own territory that you’re likely to be the next team to score anyway. It’s a win/win!

Now a lot of these decisions favor choosing the high-variance strategy. Fourth and 2 on your own 10 yard line? You can imagine what the range of possibilities is there. But it’s important to remember these are just general guidelines; they’re not meant to be written in stone. Some teams should choose the low-variance strategy even if it decreases their expected points.

For example, take Ohio State. They will have more talent than almost every team they play. As Virginia Tech found out earlier this week, you may be able to keep it close for a while, but sooner or later the talent gap will be too much to overcome. For a team to beat Ohio State, they’re going to need some lucky breaks. Something like Ohio State failing on 4th down multiple times deep in their own territory. So if you’re the Buckeyes (or any heavily favored team), when you’re deep in your own territory, the best decision is to choose the low-variance strategy and punt. If you avoid giving your opponent easy opportunities to score, odds are your talent alone will be enough to get you the win.

But the opposite is true if you’re a heavy underdog. Then, you should actually be more aggressive than the previous chart suggests. Your best chance of pulling an upset is to hope you get lucky converting a majority of your 4th downs. Of course, this also gives you the best chance of being blown out. But if you’re not likely to win playing conventionally anyway, what do you have to lose?

Indiana Hoosiers, are you listening?

Stop Punting!

The Indiana football program has had a pretty rough go of it. And that’s putting it mildly. Their last Big Ten Championship was in 1967. The last time they won a bowl game was 1991, and they’ve had exactly two winning seasons since then (both resulting in bowl losses). Clearly, playing conventionally hasn’t gotten Indiana very far. Perhaps it's time for Indiana to adopt a new strategy. Let's dig deeper.

Here is where the Indiana defense has ranked in Football Outsiders' S&P+ ratings over the past 5 seasons:

  • 91
  • 97
  • 114
  • 119
  • 99

And things don’t seem to be turning around this season. In week 1, Indiana gave up 47 points to Southern Illinois, an FCS team that went 6-6 a season ago. Southern Illinois passed for 411 yards, rushed for another 248, and had 5 different touchdown drives of 72 yards or more. What exactly is the point of punting if the other team can score from anywhere on the field?

The lone bright spot for Indiana is their quarterback, Nate Sudfeld. With Sudfeld at the helm in 2013, Indiana had the 16th best offense according to Football Outsiders' S&P+ ratings. He played in only five games last year before an injury ended his season. But this season he’s back for his senior year, ready to lead what should be an above-average Indiana offense.

So let’s recap. A defense that is so bad, teams will be able to score on it no matter where they start with the ball. An offense led by a senior who previously quarterbacked one of the top offenses in the country. And a history of losing trying to play the same way the rest of the Big Ten plays. It only leaves one more thing to say.

Hey Indiana, stop punting!

Reducing the Size of Your Response Surface Design


It sometimes may be prohibitively expensive or time-consuming to gather data for all of the runs in a designed experiment (DOE).

For example, a 6-factor, 2-level factorial design can entail 64 experimental runs, which may be too many for your particular situation. We have seen how to handle some of these situations in previous posts, such as Design of Experiments: "Fractionating" and "Folding" a DOE and Gummi Bear DOE: Selecting Your Experimental Design Resolution. Both of these articles discuss how to create a smaller subset of your design and what to look out for when doing so.

But what would we do if we wanted to reduce the size of a response surface design?

What Is a Response Surface Design? 

A response surface design (RSD) is used when you suspect curvature in the response. It allows you to add squared (or quadratic) terms to your model in order to analyze that curvature.

A central composite design, one type of RSD, consists of:

  • Factorial (cube) points
  • Axial points (values outside the cube’s low and high settings)
  • Center points (values in the middle of the cube)

A central composite design with two factors is shown below. Points of the diagram represent the experimental runs that are performed:
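To make that geometry concrete, here is a small sketch that builds the runs of a two-factor central composite design by hand in coded units: four factorial (cube) points, four axial points at distance alpha from the center, and a few center points. The alpha value and the number of center points are illustrative choices, not necessarily Minitab's defaults.

```python
import numpy as np
from itertools import product

alpha = np.sqrt(2)   # axial distance; sqrt(2) makes a 2-factor CCD rotatable
n_center = 4         # number of center points (illustrative choice)

# Factorial (cube) points: every +/-1 combination of the two factors
cube = np.array(list(product([-1.0, 1.0], repeat=2)))

# Axial points: one factor at +/-alpha while the other sits at 0
axial = np.array([[alpha, 0.0], [-alpha, 0.0], [0.0, alpha], [0.0, -alpha]])

# Center points: both factors at their middle setting
center = np.zeros((n_center, 2))

design = np.vstack([cube, axial, center])
print(design)   # 4 cube + 4 axial + 4 center = 12 runs in coded units
```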

You may already be saying, “But my existing factorial design can also contain cube and center points.” That is true. Your factorial design, however, can only attempt to detect curvature in your model—it can’t necessarily model it. This is where response surface designs come in to save the day.

If curvature is detected in the response, you can modify your existing factorial design into an RSD by adding axial points, which lets you analyze a model with quadratic terms. (In Minitab Statistical Software, you can do this by going to Stat > DOE > Factorial Design > Modify Design > Add Axial Points.) This allows you to build upon your design without having to create an RSD from scratch.

Reducing Your RSD

There are a couple of ways you can reduce the size of your central composite design if you deem it too large.

1. Select Optimal Design 

Minitab can use one of two methods for reducing your design, D-Optimality and Distance Based Optimality. D-Optimality does require that you have an idea of what terms in the model (certain interactions, squares) you’ll eventually want to analyze. You’ll specify those terms under the “Terms” sub-menu. Distance Based, however, can be used when you don’t have the ability to specify a model in advance.   

For either method, you will have to specify the number of runs in your reduced design as well, in the box labeled “Number of points in your optimal design.” You can find Select Optimal Design under Stat > DOE > Response Surface > Select Optimal Design.

2. Fractional Response Surface Designs

Surprise! Yes, you can take a fraction of your RSD, but whether you can fractionate your design depends on the number of continuous factors you are analyzing. Please view the chart below:

The values in the table represent the total number of runs that will be created for each type of design. For a 5-factor, ½-fraction central composite design, either 32 or 33 runs will be created, depending on whether the design is blocked.

One difference between a fractional factorial and a fractional CCD is that the CCDs are all resolution V or better. All of the main effects, 2-factor interactions and square terms can be estimated independently of each other. Since those are the only terms we consider in response surface models, we do not show the alias table that we are normally accustomed to seeing when creating a fractional factorial design. The alias table would display the terms that cannot be estimated independently of one another.

You can create a fractional response surface design under Stat > DOE > Response Surface > Create Response Surface Design. If you have at least 5 factors, you’ll have the option to take a half-fraction design under the Designs sub-menu.

I hope this information helps you when you create your next DOE!

 

Which Supplier Should You Choose? Check the Data.


Whatever industry you're in, you're going to need to buy supplies. If you're a printer, you'll need to purchase inks, various types of printing equipment, and paper. If you're in manufacturing, you'll need to obtain parts that you don't make yourself. 

But how do you know you're making the right choice when you have multiple suppliers vying to fulfill your orders?  How can you be sure you're selecting the vendor with the highest quality, or eliminating the supplier whose products aren't meeting your expectations?

Let's take a look at an example from automotive manufacturing to see how we can use data to make an informed decision about the options. 

Camshaft Problems

Thanks to camshafts that don’t meet specifications, too many of your car company's engines are failing. That’s harming your reputation and revenues. Your company has two different camshaft suppliers, and it's up to you to figure out if camshafts from one or both of them are failing to meet standards. 

The camshafts in your engines must be 600 mm long, plus or minus 2 mm. To acquire a basic understanding of how your suppliers are doing, you measure 100 camshafts from each supplier, sampling 5 shafts from each of 20 different batches, and record the data in a Minitab worksheet.

Once you have your data in hand, the Assistant in Minitab Statistical Software can tell you what the data say. If you're not already using our software and you want to play along, you can get a free 30-day trial version.

Step 1: Graph the Data

Seeing your data in graphical form is always a good place to start your analysis. So in Minitab, select Assistant > Graphical Analysis.

The Assistant's Graphical Analysis menu

The Assistant offers you a choice of several different graph options for each of three possible objectives. Since we're not graphing variables over time, nor looking for relationships between variables, let's consider the options available under "Graph the distribution of data," which include histograms, boxplots, and Pareto charts among others.  

The Assistant's graph options for graphing the distribution of data

A basic summary of the data is a good place to start. Click the Graphical Summary button and complete the dialog box as shown. 

The Assistant outputs a Diagnostic Report, a Report Card, and the Summary Report shown below.

The data table in this summary report reveals that the means of the camshafts sampled from supplier 2 and supplier 1 are both very close to the target of 600 mm.

But the report also reveals a critical difference between the suppliers: while neither data set contains any outliers, there is clearly more variation in the lengths of camshafts from supplier 2.

The supplier 1 distribution graph on the left is clustered tightly around the mean, while the one for supplier 2 reflects a wider range of values. The graph of the data in worksheet order shows that supplier 1’s values hew tightly to the center line, compared to supplier 2’s more extreme up-and-down pattern of variation.

Returning to the table of summary statistics at the bottom of the output, a check of the basic statistics quantifies the difference in variation between the samples: the standard deviation of supplier 1's samples is 0.31, compared to 1.87 for supplier 2.

We already have solid evidence to support choosing supplier 1 over supplier 2, but the Assistant in Minitab makes it very easy to get even more insight about the performance of each supplier so you can make the most informed decision. 

Step 2. Perform a Capability Analysis

You want to assess the ability of each supplier to deliver camshafts relative to your specifications, so you select the Capability Analysis option in the Assistant menu.

The analysis you need depends on what type of data you have. The measurements you’ve collected are continuous. The Assistant directs you to the appropriate analysis.  

Clicking the more… button displays the requirements and assumptions that need to be satisfied for the analysis to be valid.

You already know your data are reasonably normal, that they were collected in rational subgroups, and that you have enough data for reliable estimates—we saw this in the initial Graphical Analysis.

However, the Assistant also notes that you should collect data from a process that is stable. You haven’t evaluated that yet…but fortunately, the Assistant automatically assesses process stability as part of its capability analysis.

Confident that your data were collected appropriately, you click the Capability Analysis button and complete the dialog box as shown to analyze Supplier 1:

The Assistant produces all the output you need in a clear, easy-to-follow format. The Diagnostic Report offers detailed information about the analysis, while the Report Card flags potential problems. In this case, the Report Card verified that the data were from a stable process and that the analysis should be reasonably precise and accurate.

The Summary Report gives you the bottom-line results of the analysis.

As shown in the scale at the top left of the report, Supplier 1’s process is very capable of delivering camshafts that meet your requirements. The histogram shows that all of the data fall within specifications, while the Ppk measurement of overall capability is 1.94, exceeding the typical industry benchmark of 1.33. 

Now you perform the analysis for Supplier 2. Once again, the Report Card verifies the stability of the process and confirms that the capability analysis should be reasonably accurate and precise.  However, the Assistant’s Summary Report for Supplier 2 reveals a very different situation in most other respects:

 

The scale at the top left of the report shows that Supplier 2’s ability to provide parts that meet your specifications is quite low, while the histogram shows that an alarming number of the camshafts in your sample fall outside your spec limits. Supplier 2’s Ppk is 0.31, far below the 1.33 industry benchmark. And with a defect rate of 28.95%, you can expect more than a quarter of the motors you assemble using Supplier 2’s camshafts to require rework!
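For a rough sense of where Ppk numbers like these come from, here is a sketch that computes overall capability from a sample's mean and standard deviation against the 598–602 mm spec limits. The camshaft measurements below are simulated stand-ins for illustration only; Minitab's capability analysis also checks stability and the distribution, which this snippet does not.

```python
import numpy as np

LSL, USL = 598.0, 602.0   # spec limits: 600 mm +/- 2 mm

def ppk(samples, lsl, usl):
    """Overall capability: distance from the mean to the nearer spec limit,
    in units of three overall standard deviations."""
    mu = np.mean(samples)
    sigma = np.std(samples, ddof=1)   # overall (long-term) standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Simulated measurements roughly matching the standard deviations in the post
rng = np.random.default_rng(1)
supplier1 = rng.normal(600.0, 0.31, size=100)
supplier2 = rng.normal(600.1, 1.87, size=100)

print(f"Supplier 1 Ppk: {ppk(supplier1, LSL, USL):.2f}")
print(f"Supplier 2 Ppk: {ppk(supplier2, LSL, USL):.2f}")
```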

You can use the Assistant output you created to explain exactly why you should continue to acquire camshafts from Supplier 1. In addition, even though you’ll discontinue using Supplier 2 for now, you can give this supplier important information to help them improve their processes for the future.

A Clear Answer

Whatever your business, you count on your suppliers to provide deliverables that meet your requirements. You have seen how the Assistant can help you visualize the quality of the goods you receive, and perform a detailed analysis of your suppliers’ capability to deliver quality products.  

An Easy Data Set to Summarize with Minitab's Assistant


When I started out on the blog, I spent some time showing data sets that make it easy to illustrate statistical concepts. It’s easier to show someone how something works with something familiar than with something they’ve never thought about before.

Need a quick illustration to share with someone about how to summarize a variable in Minitab? See if they have a magazine on their desk, and evaluate it for line length.

Pick a few lines in the middle of paragraphs and count the number of characters per line. Using a formal random sample would be best, but in my experience, you can get by with a cluster sample. For a cluster sample in this context, randomly select a few paragraphs and then count up the number of characters in all of the lines except the first and last lines. (The first and last lines will tend to be shorter. Most magazines indent first lines. Last lines end where the sentence ends, rather than where the column ends.)
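If you would rather not count by hand, a few lines of code can do the tallying. The sketch below assumes the sampled paragraphs have been pasted into a plain text file (a hypothetical file name), one printed line per line of the file with blank lines between paragraphs, and it drops the first and last line of each paragraph as described above.

```python
# Tally characters per line for sampled paragraphs stored in a plain text file
with open("sampled_paragraphs.txt") as f:           # hypothetical file name
    paragraphs = [p.splitlines() for p in f.read().split("\n\n") if p.strip()]

line_lengths = []
for para in paragraphs:
    # Skip the first line (indented) and the last line (ends with the sentence)
    for line in para[1:-1]:
        line_lengths.append(len(line))

print("n =", len(line_lengths),
      "min =", min(line_lengths),
      "max =", max(line_lengths))
```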

The sample in this illustration is from Newsweek’s recent article “Is the Pope Catholic?” (I used the tablet version, not the online article, for the data.)

Make Everything Clear—The Assistant Menu

The most complete way to summarize your data is with the Assistant menu: in Minitab, select Assistant > Graphical Analysis > Graphical Summary. With this tool, you don’t just get a graphical summary. You get everything you’ve come to expect from the Assistant to help you understand your data. Minitab evaluates your sample size and checks for outliers on the Report Card.

The report card for the graphical summary evaluates sample size and calls attention to potential outliers.

Then, the Diagnostic Report shows you graphs that let you visualize outliers and the shape of your data.

The diagnostic report shows example shapes of the data and points out outliers in a graph.

The Summary Report includes descriptive statistics, confidence intervals for common statistics, a normality test, and graphs—everything that you need to present your summary to others so that they’ll understand what’s important about the data.

So what’s important about these data? While I didn't do a capability analysis, Newsweek is probably doing a good job. Print magazines are generally designed so that the line lengths enhance legibility for readers. Although there are many factors to consider, 50 to 60 characters per line is usually a reasonable target. The minimum line length in this sample is 50, no smaller than it should be. Only 6 of the 50 sampled values are higher than 60, and those are all 61. It’s pretty easy to see on the histogram that the spread of the values is right where you’d want it to be.

Get what you need when you summarize data

With Minitab’s Assistant, the data summary that you need is ready at a moment’s notice. You get the guidance you need to be confident that what you’re presenting is everything you need to make smart decisions.

The image of Pope Francis is by the Korean Culture and Information Service and is licensed under this Creative Commons License.

 

 

Repeated Measures Designs: Benefits, Challenges, and an ANOVA Example


Repeated measures designs don’t fit our impression of a typical experiment in several key ways. When we think of an experiment, we often think of a design that has a clear distinction between the treatment and control groups. Each subject is in one, and only one, of these non-overlapping groups. Subjects who are in a treatment group are exposed to only one type of treatment. This is the common independent groups experimental design.

These ideas seem important, but repeated measures designs throw them out the window! What if you have a subject in the control group and all the treatment groups? Is this a problem? Not necessarily. In fact, repeated measures designs can provide tremendous benefits!

In this post, I’ll highlight the pros and cons of using a repeated measures design and show an example of how to analyze a repeated measures design using ANOVA in Minitab.

What Are Repeated Measures Designs?

As you'd expect, repeated measures designs involve multiple measurements of each subject. That’s no surprise, but there is more to it than just that. In repeated measures designs, the subjects are typically exposed to all of the treatment conditions. Surprising, right?

In this type of design, each subject functions as an experimental block. A block is a categorical variable that explains variation in the response variable that is not caused by the factors that you really want to know about. You use blocks in designed experiments to minimize bias and variance of the error because of these nuisance factors.

In repeated measures designs, the subjects are their own controls because the model assesses how a subject responds to all of the treatments. By including the subject block in the analysis, you can control for factors that cause variability between subjects. The result is that only the variability within subjects is included in the error term, which usually results in a smaller error term and a more powerful analysis.

The Benefits of Repeated Measures Designs

More statistical power: Repeated measures designs can be very powerful because they control for factors that cause variability between subjects.

Fewer subjects: Thanks to the greater statistical power, a repeated measures design can use fewer subjects to detect a desired effect size. Further sample size reductions are possible because each subject is involved with multiple treatments. For example, if an independent groups design requires 20 subjects per experimental group, a repeated measures design may only require 20 total.

Quicker and cheaper: Fewer subjects need to be recruited, trained, and compensated to complete an entire experiment.

Assess an effect over time: Repeated measures designs can track an effect over time, such as the learning curve for a task. In this situation, it’s often better to measure the same subject at multiple points in time than to measure different subjects at a single point in time.

Managing the Challenges of Repeated Measures Designs

Repeated measures designs have some disadvantages compared to designs that have independent groups. The biggest drawbacks are known as order effects, and they are caused by exposing the subjects to multiple treatments. Order effects are related to the order in which treatments are given, not to the treatments themselves. For example, scores can decrease over time due to fatigue, or increase due to learning. In taste tests, a dry wine may get a higher rank if it was preceded by a drier wine and a lower rank if preceded by a sweeter wine. Order effects can interfere with the analysis’ ability to correctly estimate the effect of the treatment itself.

There are various methods you can use to reduce these problems in repeated measures designs. These methods include randomization, allowing time between treatments, and counterbalancing the order of treatments among others. Finally, it’s always good to remember that an independent groups design is an alternative for avoiding order effects.

Below is a very common crossover repeated measures design. Studies that use this type of design are as diverse as assessing different advertising campaigns, training programs, and pharmaceuticals. In this design, subjects are randomly assigned to the two groups and you can add additional treatments and a control group as needed. 

Diagram of a crossover repeated measures design

There are many different types of repeated measures designs and it’s beyond the scope of this post to cover all of them. Each study must carefully consider which design meets the specific needs of the study.

For more information about different types of repeated measures designs, how to arrange the worksheet, and how to perform the analysis in Minitab, see Analyzing a repeated measures design. Also, learn how to use Minitab to analyze a Latin square with repeated measures design. Now, let’s use Minitab to perform a complex repeated measures ANOVA!

Example of Repeated Measures ANOVA

An experiment was conducted to determine how several factors affect subject accuracy in adjusting dials. Three subjects performed tests conducted at one of two noise levels. At each of three time periods, the subjects monitored three different dials and made adjustments as needed. The response is an accuracy score. The noise, time, and dial factors are crossed, fixed factors. Subject is a random factor, nested within noise. Noise is a between-subjects factor; time and dial are within-subjects factors.

Here are the data to try this yourself. If you're not already using our software and you want to play along, you can get a free 30-day trial version.

To analyze this repeated measures design using ANOVA in Minitab, choose: Stat > ANOVA > General Linear Model > Fit General Linear Model, and follow these steps:

  1. In Responses, enter Score.
  2. In Factors, enter Noise Subject ETime Dial.
  3. Click Random/Nest.
  4. Under Nesting, enter Noise in the cell to the right of Subject.
  5. Under Factor type, choose Random in the cell to the right of Subject.
  6. Click OK, and then click Model.
  7. Under Factors and Covariates, select all of the factors.
  8. From the pull-down to the right of Interactions through order, choose 3.
  9. Click the Add button.
  10. From Terms in model, choose Subject*ETime*Dial(Noise) and click Delete.
  11. Click OK in all dialog boxes.

Below are the highlights.

You can gain some idea about how the design affected the sensitivity of the F-tests by viewing the variance components below. The variance components used in testing within-subjects factors are smaller (7.13889, 1.75, 7.94444) than the between-subjects variance (65.3519). It is typical that a repeated measures model can detect smaller differences in means within subjects as compared to between subjects.

Variance components for repeated measures design

Of the four interactions among fixed factors, the noise by time interaction was the only one with a low p-value (0.029). This implies that there is significant evidence that subjects' sensitivity to noise changed over time. There is also significant evidence of a dial effect (p-value < 0.0005). Among the random terms, there is significant evidence of time by subject (p-value = 0.013) and subject (p-value < 0.0005) effects.

ANOVA table for repeated measures design

In closing, I'll graph these effects using Stat > ANOVA > General Linear Model > Factorial Plots. This handy tool takes our ANOVA model and produces a main effects plot and an interactions plot to help us understand what the results really mean.

Main effects plot for repeated measures design

Interaction plot for repeated measures design
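If you would like a rough analogue of this analysis outside of Minitab, statsmodels can fit a mixed model with a random intercept for each subject. This is only an approximation of the model above—it does not reproduce the nested Subject(Noise) term or the random interaction terms that Minitab's general linear model fits—and it assumes the worksheet has been exported with columns named Score, Noise, ETime, Dial, and Subject.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed export of the Minitab worksheet (file and column names are assumptions)
dials = pd.read_csv("dial_accuracy.csv")   # Score, Noise, ETime, Dial, Subject

# Rough analogue: fixed effects for Noise, ETime, Dial and their interactions,
# with a random intercept per subject (each subject acts as its own block)
mixed = smf.mixedlm("Score ~ C(Noise) * C(ETime) * C(Dial)",
                    data=dials, groups=dials["Subject"]).fit()
print(mixed.summary())
```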

Big Ten 4th Down Calculator: Week 1


This summer, I created a model to determine the correct 4th down decision. But whether it’s for business or some personal interest, creating a model is just the starting point. The real benefits come from applying your model. And for the Big Ten 4th down calculator, the time to apply the model is now!

On Saturday night, Penn State and Rutgers officially kicked off conference play for the 2015 Big Ten football season. So let’s see what the model thinks about each team’s 4th down decisions.

One caveat before we begin. In hypothesis testing, it’s important to understand the difference between statistical and practical significance. A test that concludes there is a statistically significant result doesn’t imply that your result has practical consequences. You should use your specialized knowledge to determine whether the difference is practically significant.

The same line of thought should be applied to the 4th down calculator. The decision the calculator gives isn’t meant to be written in stone. It simply gives a good estimate of what an average Big Ten team would do against another average Big Ten team. Coaches should also consider other factors, like the game situation and the strengths and weaknesses of their specific team. But the 4th down calculator will still give a very strong starting place for the decision making!

Okay, enough of the pregame show, let’s get to the game!

4th Down Decisions in the First 3 Quarters

In the first 3 quarters of a football game, coaches should be making decisions that maximize their points (in the 4th quarter they should be maximizing their win probability, but more on that in a bit). The following graph shows approximately when coaches should go for it and when they should kick.

4th Down Decision Chart

Let’s start by seeing how Rutgers’s 4th down decisions lined up with the previous graph.

Rutgers 4th down decisions in the first 3 quarters

| Distance | Yards to End Zone | Model Decision | Coaches Decision | Agree or Disagree | Expected Points Punt | Expected Points Go for It | Next Score |
| -------- | ----------------- | -------------- | ---------------- | ----------------- | -------------------- | ------------------------- | ---------- |
| 7  | 95 | Punt      | Punt | Agree    | -3.3 | -4.3 | -7 |
| 5  | 37 | Go for It | Punt | Disagree |  0.1 |  0.4 | -7 |
| 8  | 73 | Punt      | Punt | Agree    | -1.6 | -2.8 | -7 |
| 8  | 55 | Punt      | Punt | Agree    | -0.3 | -1.4 | -7 |
| 10 | 63 | Punt      | Punt | Agree    | -0.9 | -2.2 | -7 |
| 6  | 81 | Punt      | Punt | Agree    | -2.2 | -3.1 |  3 |
| 6  | 55 | Punt      | Punt | Agree    | -0.3 | -1.2 |  3 |

Rutgers’ 4th down decisions show how ineffective their offense was against the Penn State defense. They never had a 4th down distance of less than 5 yards. The result was a lot of punting, which was the correct decision most of the time. Keyword: most. On their 2nd drive, Rutgers punted on 4th and 5 from the Penn State 37 yard line. The statistics say that, on average, they would have scored 0.3 more points by going for it. But remember that point about practical and statistical significance? When you factor in more information, I think the case for going for it is even stronger.

 The 4th down calculator assumes a net punt of 40 yards when calculating the expected values. But once a team is in their opponent’s territory, it assumes the punt is downed at the 10 yard line. But what if the punt goes into the end zone for a touchback? If the punt results in a touchback (which is exactly what happened in the game), Rutgers’s expected points on a punt decrease to -0.7. Now instead of a difference of 0.3 points, it’s over a full point! As a 10-point underdog, that’s exactly the type of decision Rutgers couldn’t afford to make. And sure enough, on the very next drive, Penn State went 80 yards and scored the first touchdown of the game. The extra 17 yards Rutgers gained didn't even matter.

Now let’s look at Penn State's 4th downs.  

Penn State’s 4th down decisions in the first 3 quarters

| Distance | Yards to End Zone | Model Decision | Coaches Decision | Agree or Disagree | Expected Points Punt | Expected Points Go for It or FG | Next Score |
| -------- | ----------------- | -------------- | ---------------- | ----------------- | -------------------- | ------------------------------- | ---------- |
| 11 | 41 | Punt       | Punt      | Agree    |  0.7 |  0.5 |  7 |
| 10 | 35 | Field Goal | Punt      | Disagree |  0.8 |  0.5 |  7 |
| 6  | 63 | Punt       | Punt      | Agree    | -0.3 | -1.1 |  7 |
| 8  | 60 | Punt       | Punt      | Agree    |  0   | -0.9 |  7 |
| 1  | 41 | Go for It  | Go for It | Agree    |  0.7 |  2   | -3 |
| 1  | 52 | Go for It  | Punt      | Disagree |  0.6 |  1.1 | -3 |

On Penn State’s 2nd possession, they decided to punt from the Rutgers 35-yard line rather than attempt a 52-yard field goal. But remember that thing about statistical and practical significance? These long field goals are absolutely a case where a coach needs to use knowledge about their specific kicker. To calculate the field goal probability, I used data from the previous 3 Big Ten seasons. The resulting model predicts that a kicker will make a 52-yard field goal about 48% of the time. But in the data I collected, a coach only attempts a 50+ yard field goal with a kicker they feel confident has a chance of making it. For many college kickers, a 50+ yard field goal is simply out of range, so coaches should use their knowledge of their specific kicker.

Penn State had kicker Joey Julius attempt both a 49-yard and a 50-yard field goal the previous week (in the pouring rain, too!), so I think it’s safe to assume that Coach Franklin believes he has a strong enough leg. We’re going to stick with the 4th down calculator’s decision that kicking a field goal is what Penn State should have done.

Late in the game, Penn State had a pair of 4th and 1 decisions to make. In college football, the conversion rate on 4th and 1 is so high (68% is the probability the model uses) that the 4th down calculator is going to say to go for it no matter where you are on the field. And on the first 4th and 1, Penn State correctly followed suit. However, they went with an empty backfield, clearly indicating to Rutgers that the play would be a quarterback sneak. Sure enough, Rutgers stopped quarterback Christian Hackenberg on the play. Rutgers then used all their “momentum” from the big stop to go 3 and out on offense and punt the ball back to Penn State.

On the next possession, Penn State decided to punt on 4th and 1. The difference in expected points is half a point, but when you add the fact that running back Saquon Barkley averaged 9.3 yards per rush, the case for going for it is even stronger. The Nittany Lions were already up 21 points, so the decision to punt didn’t have a large impact on the outcome of the game. But in the future, Penn State will certainly have another 4th and 1 at midfield with a much closer score. Hopefully they go with their first 4th and 1 decision, not their second.

4th Down Decisions in the 4th Quarter

In the final quarter, coaches should be making decisions that maximize their win probability. To calculate win probability, I’m using this formula from Pro Football Reference. It uses the Vegas spread, the standard deviation of the Vegas spread (which is 15.53 for college football), the time remaining in the game, the current score, and the expected points the team with the ball currently has.

The Penn State Rutgers game only had one 4th down decision in the 4th quarter that we’ll discuss. Rutgers had a 4th and 9 at the Penn State 11-yard line. To maximize their expected points, Rutgers should kick the field goal. But with only 10 minutes left they should be maximizing their win probability. Down by 21 with 10 and a half minutes left, here is their resulting win probability for going for it and kicking a field goal.

  • Win Probability Go for it: 0.71%
  • Win Probability Kick FG: 0.57%

Ok, so Rutgers wasn’t likely to win either way. Both decisions result in a win probability of less than 1%. But the win probability for going for it is higher. And down by 21, you need to score 3 times. By kicking a field goal, you go from needing 3 scores to...oh wait, you still need 3 scores.

If a player were to give up and walk off the field with 10 minutes left in a game, their head coach would be irate. The story would probably even make SportsCenter later that night. But when it comes to the coaches themselves, apparently giving up is fine. Because by kicking a field goal down 21 with 10 minutes left, giving up is exactly what Rutgers did.

Summary

Each week, I’ll summarize the times coaches disagreed with the 4th down calculator and the difference in expected points between the coach’s decision and the calculator’s decision. I’ll do this only for the first 3 quarters, since I’m tracking expected points and not win probability. I also want to track decisions made on 4th and 1, and decisions made between midfield and the opponent’s 35 yard line—I’ll call this area the “Gray Zone.” These tables will be pretty sparse now, but they will fill up as the season goes along! Then we can easily compare the actual outcomes of different decisions in similar situations.

Have anything else you think I should track? Let me know and I’ll add it!

Team Summary

| Team       | Disagreements | Difference in Expected Points |
| ---------- | ------------- | ----------------------------- |
| Penn State | 2             | 0.8                           |
| Rutgers    | 1             | 0.3                           |

 

4th and 1

| Yards to End Zone | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG |
| ----------------- | ----- | ----------------------------- | --------- | ---------------------------------- | ----------- | --------------------------- |
| 75-90 | 0 |  0 | 0 |  0 | * | * |
| 50-74 | 1 | -3 | 0 |  0 | * | * |
| 25-49 | 0 |  0 | 1 | -3 | 0 | 0 |
| 1-24  | 0 |  0 | 0 |  0 | 0 | 0 |

 

The Gray Zone (4th downs 35-50 yards to the end zone)

| 4th Down Distance | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG |
| ----------------- | ----- | ----------------------------- | --------- | ---------------------------------- | ----------- | --------------------------- |
| 1    | 0 |  0 | 1 | -3 | 0 | 0 |
| 2-5  | 1 | -4 | 0 |  0 | 0 | 0 |
| 6-9  | 0 |  0 | 0 |  0 | 0 | 0 |
| 10+  | 2 |  7 | 0 |  0 | 0 | 0 |

 


The Longest Drive: Golf and Design of Experiments, Part 1


As we broke for lunch, two participants in the training class began to discuss, debate, and finally fight over a fundamental task in golf—how to drive the ball the farthest off the tee. Both were avid golfers and had spent a great deal of time and money on professional instruction and equipment, so the argument continued through the lunch hour, with neither arguer stopping to eat. Several other class participants chimed in with approval or disagreement over the points being made on both sides.

Even though I know very little about golf, I thought the problem could be easily solved with a designed experiment as soon as I returned home from the training session. After all, this particular training session was DOE in Practice, and I was the instructor. 

Back at Minitab, we found avid golfers who were very much interested in the problem and willing to participate and have data collected on their drives. However, my initial research on this subject revealed that the problem was not that simple. I talked to other statisticians who had attempted to determine how to drive the longest ball from the tee, but they told me they had failed due to high process variation. 

Others I talked to found that the variables they selected did not have a measurable impact. My research also pointed me to conflicting studies: some found that keeping the ball high on the tee improved distance, while others recommended keeping the ball low to the ground. Clearly this was not going to be an easy puzzle to solve, making the DOE approach even more appropriate here.

It was easy for me to relate this back to solving manufacturing process engineering problems—the difficult ones have many of the following characteristics in common:

  1. A complex process with many possible input variables. 
  2. Noise variables beyond  the researcher’s control that affect the response.
  3. Physical roadblocks that prevent executing certain desired run conditions.
  4. Time, money, and manpower limitations.
  5. Many experts with competing theories on key drivers of the process (some real and some not).
  6. Measurement variability making it difficult to measure both responses and inputs.
  7. Process variation causing different response values even when repeating the same run conditions.

In industry, figuring out how to deal with these kinds of complexities can be overwhelming, often to the point that we wind up just living with an under-performing process for years on end. These are the kinds of entrenched problems that challenge new college grads to give it all they've got, while their more experienced colleagues jeer or offer tales of failed past attempts. But secretly, they’re watching from the sidelines, curious to see what the newcomer’s approach will reveal about the age-old problem.

Does this sound like your life at work? Are you faced with a process engineering challenge that involves many of the same characteristics in the list above? Solving the golf drive-distance problem will require dealing with all seven of these issues. I'll be publishing a series of blog posts over the next few weeks to share our method for breaking this problem into more manageable parts, and our journey towards answering how to drive the ball the farthest off the tee. Our goal is to demonstrate how the steps we’re using here can also be applied to answer questions about your process.

In the next post, I’ll cover how to use process knowledge experts to help define the process and the problem. 

Many thanks to Toftrees Golf Resort and Tussey Mountain for use of their facilities to conduct our golf experiment. 

Regression in the Real World


I recently guest lectured for an applied regression analysis course at Penn State. Now, before you begin making certain assumptions—because as any statistician will tell you, assumptions are important in regression—you should know that I have no teaching experience whatsoever, and I’m not much older than the students I addressed.

I’m just 5 years removed from my undergraduate days at Virginia Tech, so standing behind the podium felt backwards. But it certainly provoked fond memories of my days in Hutcheson Hall—home to Virginia Tech’s Department of Statistics—learning the same concepts these students are learning.

I remember what it was like to be in their shoes. For instance, I had no idea how calculating an R-squared value by hand could meaningfully contribute to my life after college—especially when software like Minitab can so easily do the work for me.

Now I know better.

I wanted to show these students how regression and other statistical concepts could be applied in their future lines of work, and how integral a role statisticians can play in any business. As a student, I failed to grasp this because I was more concerned with the letter grade I needed to obtain to maintain a high GPA. Giving this guest lecture was an opportunity to renounce my old, flawed mentality.

What 5 years in the real world has taught me

Since graduating, I’ve engaged with many Minitab users, and I’ve worked with Minitab’s statistical consultants, who address a variety of business requests for statistical help. I’ve encountered numerous practical applications of how the tools in Minitab help people who need to analyze their data—many of whom lack formal statistical training—select the proper analysis, draw useful conclusions, and make key business decisions that lead to improved processes and increased profits.

If you want to draw actionable conclusions from your data, selecting and using the appropriate statistical tool is half the battle! In so many cases, people choose the wrong tools because they misunderstand what kind of data they have, or they haven’t fully defined the problem they are trying to solve. Questions like, ‘Are my data categorical or continuous,’ or ‘Are my observations independent,’ or ‘How are my data distributed,’ and ‘Does this analysis assume a particular distribution?’ are often difficult to answer, making selection of a statistical tool even more intimidating.

Fortunately, Minitab makes this entire process easy for professionals with any level of statistical expertise—our software provides the proper tool belt for solving statistical problems, including tools in the Assistant menu, which offer detailed guidance to help professionals confidently choose the right analyses and make informed decisions for their business.

But formal training in statistical methods can give students a big advantage in the workplace. And regression—a very practical tool for modeling data and predicting outcomes—was my platform for communicating this idea, based on the experiences of some of Minitab’s own consulting statisticians.

What problems can regression solve?

If the data were accessible, I could use a regression analysis to show one thing that hasn’t changed about college—early morning classes have awful attendance rates, no matter what the subject is. My first lecture was during a Friday 8:00 a.m. section; the room was about half empty. And of those who actually attended, about half drowsily filtered in during the first 10 minutes of class.

But as I talked, I could practically see the synapses firing for some of those students. I presented several examples where regression came to the rescue of businesses in real world settings. These businesses had particular questions, and Minitab’s statistical consultants used regression to provide the answers.

I showed the students how:

  • A pharmaceutical company used regression to assess the stability of the active ingredient in a drug to predict its shelf life in order to meet FDA regulations and identify a suitable expiration date for the drug.
  • A credit card company applied regression analysis to predict monthly gift card sales and improve yearly revenue projections.
  • A hotel franchise used regression to identify a profile for and predict potential clients who might default on a timeshare loan in order to reduce loan qualification rates among high-risk clients, adjust interest rates based on client risk factors, and minimize company losses.
  • An insurance company used regression to determine the likelihood of a true problem existing when a home insurance claim was filed, in order to discourage customers from filing excessive or petty claims.

The results of these regression analyses, along with help from Minitab’s statistical consultants, gave these companies the confidence to make decisions they knew would improve their business. They were encouraged to identify solutions to address problem areas, and to implement new processes within their organization as well as new strategies to promote products and services to their clientele—knowing they could collect data and use the same tools again in the future to prove that changes were impactful.

The moral of the story

In a world filled with data, students who learn statistics leave college with a skill set that is highly sought after. When it comes to working with data, they will have advantages over professionals who don’t have formal statistical training.

But as most of us know, there’s more to data analysis than memorizing a bunch of equations. And even experienced statisticians can forget some of the nuances involved in analyses they haven’t used in a while. What they don’t forget is how to attack a problem.

In the end, that’s what I wanted these students to understand—that data-driven questions really boil down to problem-solving. What question am I trying to answer, and how do I tackle it?

The real world is about identifying a problem when we encounter it, choosing the right tool to solve it, and interpreting the answer in a way that drives a manager or executive to enact change within a business. Because businesses face real questions and challenges that statisticians—and software like Minitab—can help answer.

Everyone's Talking "Big Data"...But Size Isn't What Matters

$
0
0

You know what the big thing is in the data analysis world—"Big Data." Big, big, big, very big data. Massive data. ENORMOUS data. Data that is just brain-bendingly big. Data so big that we need globally interconnected supercomputers that haven't even been built yet just to contain one one-billionth of it. That's the kind of big data everybody's so excited about. 

Whatever. There's no denying that the proliferation of data about seemingly every nuance of the world opens many, many opportunities to answer important questions. But in practical terms, "big data" seems to work much the same as...well...regular data, with the same risks, benefits, and potential for misuse and abuse.

"Big Data" = Big Confusion

Part of my bad attitude toward the term "big data" stems from my writing background. I like idioms, coinages and phrases that really communicate something meaningful about their subject. "Punk rock," for example, is a linguistically elegant way to describe the stylistic and attitudinal shift performers like the Ramones and the Germs brought into rock music in the mid-1970s. It tells you who, what, and even a little bit of how a cultural transformation occurred. 

The term "Big Data" isn't so elegant or easily defined. The term's origin is obvious: thanks to technology, we can collect, generate, store, slice, dice, and splice together data sets that really were unimaginably big just a few years ago. And a few years hence, our capacity to generate and store more data about more and more stuff will be even greater.

But from that sensible origin, the buzzword "Big Data" has assumed such a wide variety of meanings that it's a useless term for anyone who values specificity. When someone says "Big Data" in your staff meeting, is everyone in the room thinking about the same thing? Probably not.

Lisa Arthur summed the issue up nicely in a column on Forbes.com:

Big data is new and “ginormous” and scary–very, very scary. No, wait. Big data is just another name for the same old data marketers have always used, and it’s not all that big, and it’s something we should be embracing, not fearing. No, hold on. That’s not it, either. What I meant to say is that big data is as powerful as a tsunami, but it’s a deluge that can be controlled . . . in a positive way, to provide business insights and value. Yes, that’s right, isn’t it?

Oh, who knows? And honestly, who really cares?  Trying to find a firm definition of "Big Data" is focusing on the wrong detail about the opportunities all of this available data offers. 

No Matter How Big, Data Answers Only the Questions You Ask

You can use statistical software like Minitab to make sense of your data. We've described Minitab as a tool to help you "hear what your data are saying." That's useful shorthand to convey the benefits of data analysis, but it's probably more accurate to say that statistical software facilitates a conversation with your data. Or perhaps an interrogation. You can think of statistical software as a translator, putting the language of raw numbers into terms that are easier and faster for most of us to understand. 

The problem is, your data is only going to answer the questions you ask. It doesn't just volunteer valuable information. And if you're staring at your data and wondering where to begin, does it really matter whether you've got 30 data points or 300 million?  

Of course, most people have some idea of what they'd like to find out, even if they have a little trouble articulating it precisely. But even a straightforward question like "Did switching suppliers improve product quality?" can be trickier than it seems. Because as any translator will tell you, the answer to a question might be different depending on how you ask it. A query that's sensitive to the right nuances and considerations may yield a very detailed and useful response, while an unsubtle and overly broad question might bring an answer that's technically accurate, but practically unusable.

Bernard Marr put it very well in a recent column:

...data on its own is meaningless. Remember the value of data is not the data itself – it’s what you do with the data.

Big Enough to Render Everything Significant?

My last quibble with the current hoopla over "big data" is the fact that it, perhaps unintentionally, lends credence to the notion that more data is always better. In fact, more data is not always better.

Marr touches on this point as well. He notes that the advent of "big data" lets decision-makers back up their choices with "facts." The problem, he aptly points out, is that "all those 'facts' can conceal the truth." In other words, get a data set that's big enough and you can get the right answer to just about any misguided or ill-considered question you want to ask. "A lot of data can generate lots of answers to things that don't really matter," as Marr says. 

I'd take it a step further than that, though: when the data gets big enough, nearly everything can become statistically significant. If you're paying attention to the difference between statistical and practical significance, that's not a big problem. But if you're not paying attention, or if, perhaps, you're very eager to see evidence of a particular outcome, conflating statistical and practical significance is all too easy. 
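To see how easily this can happen, here is a minimal simulation sketch (not from the original post) in Python with SciPy: two processes whose true means differ by a practically meaningless 0.01 units still produce a vanishingly small p-value once the sample size reaches the millions. The sample size, means, and standard deviation are all invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two processes whose true means differ by a practically trivial 0.01 units
# (on a standard deviation of 1), measured with a "big data" sample size.
n = 2_000_000
group_a = rng.normal(loc=100.00, scale=1.0, size=n)
group_b = rng.normal(loc=100.01, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"p-value: {p_value:.2e}")                                      # far below 0.05
print(f"observed difference: {group_b.mean() - group_a.mean():.4f}")  # ~0.01 units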

My favorite demonstration of this was discussed in an earlier post on the Minitab Blog, in which occasional blogger the Stats Cat related the tale of an analysis of MRI data that found statistically significant evidence of brain activity in a dead, frozen salmon. Of course, the statistical significance was meaningless on a practical level—the data set was just so big that finding statistical significance was essentially inevitable. Despite the statistical "evidence" of brain activity, that salmon remained indisputably dead.

It's a great cautionary tale that illustrates why more data doesn't always give better answers, and reminds us that unless we're careful about the questions we ask and the conclusions we make, our big-data dreams also have the potential to lead us into nightmarish mistakes. 

How to Improve Cpk

$
0
0

You run a capability analysis and your Cpk is bad. Now what?

First, let’s define what “bad” is. In simple terms, the smaller the Cpk, the more defects you have. So the larger your Cpk is, the better. Many practitioners use a Cpk of 1.33 as the gold standard, so we’ll treat that as the benchmark here, too.

Suppose we collect some data and run a capability analysis using Minitab Statistical Software. The results reveal a Cpk of 0.35 with a corresponding DPMO (defects per million opportunities) of more than 140,000. Not good. So how can we improve it? There are two ways to figure that out:

#1 Look at the Graph

Example 1: The Cpk for Diameter1 is 0.35, which is well below 1.33. This means we have a lot of measurements that are out of spec. 

Using the graph, we can see that the data—represented by the blue histogram—is not centered between the spec limits shown in red. Fortunately, variability does not appear to be an issue since the histogram and corresponding normal curve can physically fit between the specification limits.

Q: How can we improve Cpk?

A: Center the process by moving the mean closer to 100 (halfway between the spec limits) without increasing the variation.

 

Example 2: In the analysis for Diameter2, the data is centered relative to the spec limits. However, the histogram and corresponding normal curve extend beyond the specs, yielding a meager Cpk of only 0.41.

Q: How can we improve Cpk?

A: Reduce the variability, while maintaining the same average.

 

Example 3: In the analysis for Diameter3, we can see that the process is not centered between the specs. To make matters worse, the histogram and corresponding normal curve are wider than the tolerance (i.e. the distance between the spec limits), which indicates that there’s also too much variability.

Q: How can we improve Cpk?

A. Shift the mean closer to 100 to center the process AND reduce the variation.

 

#2 Compare Cp to Cpk

Cp is similar to Cpk in that the smaller the number, the worse the process, and we can use the same 1.33 gold standard. However, the two statistics and their corresponding formulas differ in that Cp only compares the spread of the data to the tolerance width, and does not account for whether or not the process is actually centered between the spec limits.

Interpreting Cp is much like asking “will my car fit in the garage?” where the data is your car and the spec limits are the walls of your garage. We’re not accounting for whether or not you’re a crappy driver and can actually drive straight and center the car—we’re just looking at whether or not your car is narrow enough to physically fit.
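If you want to see the arithmetic behind the comparison, here is a small sketch of the textbook formulas in Python. The numbers are illustrative only (they are not the Diameter data from this post), and a proper Cpk calculation would use a within-subgroup (short-term) estimate of the standard deviation such as the one Minitab's capability analysis computes for you.

```python
def cp_cpk(mean, sigma, lsl, usl):
    """Cp compares the spread to the tolerance; Cpk also accounts for centering.

    Note: Cp/Cpk conventionally use the within-subgroup (short-term) standard
    deviation, so pass whichever sigma estimate is appropriate for your data.
    """
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))
    return cp, cpk

# Illustrative values only: an off-center process whose variation
# could otherwise fit comfortably within the specs.
cp, cpk = cp_cpk(mean=98.5, sigma=0.75, lsl=97.0, usl=103.0)
print(f"Cp  = {cp:.2f}")   # 1.33 -> the "car" is narrow enough for the "garage"
print(f"Cpk = {cpk:.2f}")  # 0.67 -> poor, because the mean sits near the lower spec
```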

Example 1: The analysis for Diameter1 has a Cp of 1.64, which is very good. Because Cp is good, we know the variation is acceptable—we can physically fit our car in the garage. However, Cpk, which does account for whether or not the process is centered, is awful, at only 0.35.

       Q: How can we improve Cpk?

       A: Shift the mean to center the process between the specs, without increasing the variation.

Example 2: The analysis for Diameter 2 shows that Cp = 0.43 and Cpk = 0.41. Because Cp is bad, we know there’s too much variation—our car cannot physically fit in the garage. And because the Cp and Cpk values are similar, this tells us that the process is fairly centered.

       Q: How can we improve Cpk?

       A: Reduce the variation, while maintaining the same average.

Example 3: The analysis for Diameter 3 has a Cp = 0.43 and Cpk = -0.23. Because Cp is bad, we know there’s too much variation. And because Cp is not even close to Cpk, we know that the process is also off center.

       Q: How can we improve Cpk?

       A. Shift the mean AND reduce the variation.

 

And for a 3rd way...

Whether you look at a capability analysis graph or compare the Cp and Cpk statistics, you’re going to arrive at the same conclusion regarding how to improve your results. And if you want yet another way to figure out how to improve Cpk, you can also look at the mean and standard deviation—but for now, I’ll spare you the math lesson and stick with #1 and #2 above.

In summary: if Cp is acceptable but Cpk is not, center the process; if Cp and Cpk are both poor and similar, reduce the variation; and if Cp is poor and far from Cpk, shift the mean and reduce the variation.

How Many Samples Do You Need to Be Confident Your Product Is Good?

$
0
0

How many samples do you need to be 95% confident that at least 95% (or even 99%) of your product is good?

The answer depends on the type of response variable you are using, categorical or continuous. The type of response will dictate whether you'll use:

  1. Attribute Sampling: Determine the sample size for a categorical response that classifies each unit as Good or Bad (or, perhaps, In-spec or Out-of-spec).
     
  2. Variables Sampling: Determine the sample size for a continuous measurement that follows a Normal distribution.

The attribute sampling approach is valid regardless of the underlying distribution of the data. The variables sampling approach has a strict normality assumption, but requires fewer samples.

In this blog post, I'll focus on the attribute approach.

Attribute Sampling

A simple formula gives you the sample size required to make a 95% confidence statement about the probability an item will be in-spec when your sample of size n has zero defects.

n = ln(1 - confidence) / ln(reliability), where the reliability is the probability of an in-spec item.

For a reliability of 0.95 or 95%, n = ln(0.05) / ln(0.95), which rounds up to 59 samples.

For a reliability of 0.99 or 99%, n = ln(0.05) / ln(0.99), which rounds up to 299 samples.
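If you'd rather script the calculation, here is a short sketch in Python. The zero-defect function is just the formula above; the second function is my own generalization that finds the sample size for plans allowing c defects by searching the binomial distribution, which reproduces the C=1 sample size of 93 mentioned below.

```python
import math
from scipy import stats

def n_zero_defects(confidence, reliability):
    """Smallest n such that 0 defects in n units demonstrates
    P(in-spec) >= reliability with the stated confidence (a C=0 plan)."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

def n_allowing_c_defects(confidence, reliability, c):
    """Smallest n such that c or fewer defects in n units demonstrates the
    same claim, found by searching the binomial CDF."""
    n = c + 1
    while stats.binom.cdf(c, n, 1 - reliability) > 1 - confidence:
        n += 1
    return n

print(n_zero_defects(0.95, 0.95))             # 59
print(n_zero_defects(0.95, 0.99))             # 299
print(n_allowing_c_defects(0.95, 0.95, c=1))  # 93 (the C=1 plan discussed below)
```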

Of course, if you don't feel like calculating this manually, you can use the Stat > Basic Statistics > 1 Proportion dialog box in Minitab to see the reliability levels for different sample sizes.  

One-sample proportion results in Minitab

These two sampling plans are really just C=0 Acceptance Sampling plans with an infinite lot size. The same sample sizes can be generated using Stat > Quality Tools > Acceptance Sampling by Attributes by:

  1. Setting RQL at 5% for 95% reliability or 1% for 99% reliability.
  2. Setting the Consumer’s Risk (β) at 0.05, which results in a 95% confidence level.
  3. Setting AQL at an arbitrary value lower than the RQL, such as 0.1%.
  4. Setting Producer’s Risk (α) at an arbitrary high value, such as 0.5 (note, α must be less than 1-β to run).

By changing RQL to 1%, the following C=0 plan can be obtained:

If you want to make the same confidence statements while allowing 1 or more defects in your sample, the sample size required will be larger. For example, allowing 1 defect in the sample will require a sample size of 93 for the 95% reliability statement. This is a C=1 sampling plan. It can be generated, in this case, by lowering the Producer’s risk to 0.05.

As you can see, the sample size for an acceptance number of 0 is much smaller, but how realistic is this objective? That's a question you will need to answer.

Check out this post for more information about acceptance sampling.

 

 

The Longest Drive: Golf and Design of Experiments, Part 2

$
0
0

Step 1 in our DOE problem-solving methodology is to use process experts, literature, or past experiments to characterize the process and define the problem. Since I had little experience with golf myself, this was an important step for me.

This is not an uncommon situation. Experiment designers often find themselves working on processes that they have little or no experience with. For example, a quality engineer might be assigned to try to solve a process problem as part of a cross-functional team in another department, or in a supplier’s facility. Regardless of the situation, it’s never too early to open your mind and ears to all possible information to characterize your process and problem.

Based on contributions we solicited from experienced golfers, a Penn State golf coach, other statisticians who had tackled similar problems, and all the Internet had to offer, we were able to assemble a list of potential inputs for flight distance (Carry), rolling distance (Roll), and the corresponding total drive distance (Total). 

  1. Golfer – Block
  2. Club speed on ball contact – Covariate
  3. Contact point on the club face with respect to the center of the club face – Covariate
  4. Tilt angle designed into the club face – Factor
  5. Club shaft stiffness – Factor
  6. Ball characteristics such as hardness, core composition, aerodynamics, etc. – Factor
  7. Height of the ball on the tee – Factor
  8. Club path and position on contact such as angle, arc and shaft flex – Noise
  9. Golfer’s grip strength on the club – Noise
  10. Ground surface conditions (incline, firmness, grass height, dampness, etc.) – Noise
  11. Air temperature and humidity – Noise
  12. Wearing expensive shorts with a Bubba Watson logo – Noise
  13. Golfer’s arm length – Noise
  14. Club head weight and club length – Noise
  15. Ball spin rate and direction coming off the club – Noise

Managing the Inputs

These inputs have been classified into 4 categories according to how they will be managed in our experiment. Let’s walk through each of the four groups.

Blocking

Each golfer brings a unique style, swing, and athleticism to the game. Because each data point can be traced back to an individual golfer, the variability between golfers will be handled using a technique called blocking. The data from each golfer will be standardized according to the average driving performance of that golfer compared to the other golfers in the study.

Pardon the pun, but blocking essentially handicaps each golfer by their average distance so that all the data can be combined into one analysis without concern about golfer-to-golfer variation. When you set up an experiment, blocking allows you to take advantage of all the resources available to you (three manufacturing lines, two measurement technicians, etc.) without concern for the variability from block to block. Our experimental design and analysis will block on the golfer so we can take advantage of using several different golfers in the experiment.

Analysis of Covariance (Covariates)

Inputs 2 and 3, club speed and club/ball contact location on the club, are noise variables that we cannot control from drive to drive, but have a strong effect on our responses. The important distinction from other noise variables is that we can measure them on each drive. By establishing the average linear relationship between these covariates and our responses, we can mathematically adjust each Carry and Roll measurement for club speed and club/ball contact location for that drive.

By treating club speed and club/ball contact location as covariates, we can greatly reduce the level of background noise in our experiment data, thus giving us a clearer estimate of the effects of the experimental factors we are studying. Nearly all processes have noise variables that cannot be controlled and contribute to the overall process variation. A key experiment design goal for your study should be to plan to measure these variables during each run so that their effect can be removed from the background variability. This will reduce the overall level of noise you have to deal with in the final analysis.
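To make the idea concrete, here is a sketch of what that kind of analysis might look like in Python's statsmodels, with the golfer as a block and club speed and contact location as covariates. The data, variable names, and coefficients are all made up for illustration; the real golf experiment will be analyzed in Minitab with the full factorial structure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Made-up data: 4 golfers (blocks) x 16 drives each.
n = 64
df = pd.DataFrame({
    "Golfer": np.repeat(["G1", "G2", "G3", "G4"], 16),  # block
    "ClubSpeed": rng.normal(95, 5, n),                   # covariate (mph)
    "Contact": rng.normal(0, 0.3, n),                    # covariate (inches off-center)
    "Tilt": rng.choice([8.5, 10.5], n),                  # factor
    "Ball": rng.choice(["Economy", "Expensive"], n),     # factor
})
golfer_offset = df["Golfer"].map({"G1": -10.0, "G2": 0.0, "G3": 5.0, "G4": 12.0})
df["Carry"] = (200 + golfer_offset + 1.5 * (df["ClubSpeed"] - 95)
               - 20 * df["Contact"] + 3 * (df["Tilt"] == 10.5)
               + rng.normal(0, 8, n))

# Block on golfer, adjust for the covariates, then estimate the factor effects.
model = smf.ols("Carry ~ C(Golfer) + ClubSpeed + Contact + C(Ball) * Tilt",
                data=df).fit()
print(model.params.round(2))
```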

Noise

Noise variables add to the background variability of our data. This variability can obscure our ability to measure the true effect of our research variables, which are the ones of interest to us.  

Imagine seeing Michael Jordan playing basketball on the rare occasions when he had a bad game. If you'd never seen him play before, you could easily walk out of the game thinking “He is not very good.” Of course, you would have reached the wrong conclusion, even though you used data to make your decision, because your data was affected by background variability. In our DOE, we want to minimize the background variability in order to decrease the probability of reaching an incorrect conclusion.

There are several ways to do this. For example, inputs 8 and/or 15 can result in a very bad drive. We can minimize the impact on our results by discarding any drive that is an obvious slice into the trees. Likewise, as experimenters, we should be attentive to measurements that are strongly influenced by noise variables and should exclude those measurements as outliers. Analysis of residuals is a powerful tool to detect outliers (one which will be described in a later post), but the best time to identify an outlier is when the sample is made or measured. This way, the extraneous circumstances leading to the outlier can be immediately noted and considered in the decision to remove the data point.

Club head weight and club length could have been factors in this experiment. But the cost of the additional drivers required to study these club properties would have been prohibitive. In addition, club length and weight are fairly well standardized, and are not something every golfer can change on demand. Because of this, club length and club weight were held constant in our experiment to minimize their impact on the variability of the results and to prevent them from unintentionally biasing the measured effects of the factors in the experiment. In your own process experiments, you will have to use your engineering knowledge and practical considerations to determine which inputs should be held constant.

Finally, a noise variable such as input 9, grip strength on the club, will vary from drive to drive within a golfer and this change in grip strength cannot be easily measured. Therefore, grip strength variability will be one of many sources contributing to the overall process variation that we will have to contend with in our experiment. For such variables, the best approach is to remind the experimenters of the importance of consistency in their performance. For your process experiment, standardizing procedures and protocols where possible, along with requiring consistency within an experimenter throughout the study, is a good way to lower your background process variation.

In the end, there will be some unexplained process variation. We had no control over which golfers were going to break out the Bubba Watson golf shorts for this event. Unexplained variation will be quantified and utilized in two ways. First, the amount of unexplained process variability (error) will have to be accurately measured so that it can be compared to the size of the experimental factor effects in the final analysis. This comparison allows us to determine which effects are much larger than the level of error in the data—or in other words, which effects are statistically significant. Second, in the experiment design phase, we will have to estimate the level of process variation we expect to see in the experiment so that the number of measurements (sample size = N) needed to detect the factor effects in the midst of the noise can be calculated. This calculation, known as power and sample size, will be illustrated in a future post.

Variability Summary

In summary, all of our process knowledge was used to generate a list of potential inputs to our responses. The inputs that were not selected to be studied in the experiment were classified as noise variables, blocking variables, or covariates. The blocking variables and covariates will be incorporated into our experiment design and analysis. The noise variables are either held constant during the experiment, carefully monitored to remove obvious outliers, controlled as best we can, or allowed to vary during the experiment because there is no other practical option.

After all of this planning, we haven’t said anything about experiment factors! This is a little-known fact about experiment design: the first key to success is planning to control, limit, account for, and finally measure the variability in your data. In your final analysis, your estimate of the background process variation will be in the calculation of every test statistic in your analysis. You have to work hard to get it right!

Experiment Design for Factors

Now that we have a good handle on our process variation, we can move forward to the four research variables (factors) in our experiment: tee height, shaft stiffness, ball quality, and club tilt angle.

One commonly held theory is that greater tilt angle will give higher loft for longer Carry. However, the velocity spent going up is not going forward, and a longer “hang time” means that air resistance has more time to have a negative effect. In addition, the sharp angle of descent from high loft will lower the Roll. But does the tee height result in higher or lower loft? How do the shaft stiffness and ball quality affect the velocity of the ball off the tee? Of course a faster ball is going to go further.

In our experiment, as with your process, there will be many competing theories about how to reach the end goal. What levels of the factors should be tested and in what combinations to get to the right solution? What is the smallest number of tests we can run while still achieving an accurate and comprehensive understanding of our process (including a way to optimize the responses)? In the next post, I will discuss the different options for factor level settings/combinations for our four factors and ultimately, determine our basic experiment plan.

Many thanks to Toftrees Golf Resort and Tussey Mountain for the use of their facilities to conduct our golf experiment. 

Make Similar Graphs to Unclutter Data from the American Consumer Survey

$
0
0

September 17 marked the release of new information from the American Community Survey (ACS) from the U.S. Census Bureau. Here’s a bar chart of what the press releases looked like for that day:

Most press releases from the Census Bureau about the ACS were about declines in uninsured people in major metropolitan areas.

Clearly there was a theme in play, one that was great news for major metropolitan areas. The Census Bureau even released a graph showing that the percentage of people within the 25 most populous metropolitan areas in the United States all saw declines in their percentages of uninsured people.

I tend a bit towards cynicism, so I wondered...why play up just the top 25 metro areas instead of the 50 states? It wouldn’t be that much harder to see...unless, that is, your graph is cluttered. After all, the default graph in Minitab for comparing numbers of uninsured people per state looks a bit like this:

This bar chart is too cluttered to easily identify the individual bars.

Now, I remember from my school days that we have a senate with two members from each state so that all states are represented equally. But when it comes to population, some states don’t really belong together. Case in point: when it comes to numbers of people, there’s hardly ever a reason for California and Vermont to be on the same graph.

Fortunately, Minitab makes it easy to create similar graphs of different variables in a worksheet. Let’s say that you’re starting from a worksheet like this one, where the states, Puerto Rico, and Washington, D.C., have been divided into categories based on the number of uninsured people in 2013. If you're not already using Minitab Statistical Software and you'd like to follow along, you can get it free for 30 days.

First, create a graph of the large states.

  1. Choose Graph > Bar Chart.
  2. In Bars represent, select Values from a table.
  3. Under Two-way table, select Cluster. Click OK.
  4. In Graph variables, enter ‘2013_Large’ ‘2014_Large’.
  5. In Row Labels, enter ‘Region Large’.
  6. In Table Arrangement, select Rows are outermost categories and columns are innermost.
  7. Click Chart Options.
  8. In Order Main X Groups By, select Decreasing Y. Click OK in both dialog boxes.

An ordered bar chart of the states with the highest number of uninsured people in 2013.

Next, let's edit the graph.

  1. Double-click the bars.
  2. Select the Groups tab.
  3. Check Assign attributes by graph variables. Click OK.
  4. Double-click the label in the legend.
  5. Replace the variable name with only the year.
  6. Double-click the labels for the years.
  7. Select the Show tab.
  8. In Show Labels by Scale Level, uncheck the tick labels for the graph variables and the axis label for the region variable. Click OK.

An edited bar chart of the states with the most uninsured people in 2013.

If you had to repeat these steps to create graphs for the medium, small, and smallest states, it would be laborious. Fortunately, Minitab makes it quick to recreate a graph with the same edits.

  1. With the edited bar chart selected, choose Editor > Make Similar Graph.
  2. Replace the large state variables with the medium state versions of the variables. Click OK.

Medium states in terms of the number of uninsured people in 2013.

All of the edits you made to the first graph are already on the second graph! You can repeat the process just as quickly for the last two categories. 

Small states in terms of the number of uninsured people in 2013

Smallest states, Washington, D.C., and Puerto Rico

With the uncluttered graphs, you can easily see that Louisiana and Nebraska are the only states that saw increases in the number of uninsured people from 2013 to 2014.

On a state-by-state level, the news looks good for the change in the number of uninsured people between 2013 and 2014. And if you need to divide up a cluttered bar chart, the news is pretty good that you have the capability to make a similar graph in Minitab. Repeating all your edits, without having to make any of them, is another way that you can get the answers you need from your analysis.
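Minitab's Make Similar Graph handles the repetition for you. If you ever need to do something comparable in code, the analogous trick is to put the chart-building and styling in one reusable function. The sketch below (Python with matplotlib, using invented numbers rather than the ACS data) applies the same ordering and formatting to two groups of states.

```python
import matplotlib.pyplot as plt
import numpy as np

def grouped_bar_chart(ax, states, y2013, y2014, title):
    """Draw one 2013-vs-2014 bar chart with the same ordering and styling."""
    order = np.argsort(y2013)[::-1]          # like "Decreasing Y" in Minitab
    states = [states[i] for i in order]
    y2013 = [y2013[i] for i in order]
    y2014 = [y2014[i] for i in order]

    x = np.arange(len(states))
    ax.bar(x - 0.2, y2013, width=0.4, label="2013")
    ax.bar(x + 0.2, y2014, width=0.4, label="2014")
    ax.set_xticks(x)
    ax.set_xticklabels(states)
    ax.set_title(title)
    ax.legend()

# Invented numbers (thousands of uninsured people), for illustration only.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)
grouped_bar_chart(ax1, ["CA", "TX", "FL"], [6500, 5700, 3800], [5000, 5200, 3200], "Large states")
grouped_bar_chart(ax2, ["VT", "ND", "DC"], [50, 60, 40], [45, 55, 35], "Smallest states")
plt.show()
```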

Bonus

To prepare the data set I showed, I used Minitab's Unstack feature. To see that and a few other tips with worksheets, see what Eston Martz demonstrated in the helpful piece What to Do When Your Data's a Mess, part 2. Or, if you're using Minitab Express, follow along with the online example of unstacking your data.

 


P Values and the Replication of Experiments

$
0
0

An exciting new study sheds light on the relationship between P values and the replication of experimental results. This study highlights issues that I've emphasized repeatedly—it is crucial to interpret P values correctly, and significant results must be replicated to be trustworthy.

The study also supports my disagreement with the decision by the Journal of Basic and Applied Social Psychology to ban P values and confidence intervals. About six months ago, I laid out my case that P values and confidence intervals provide important information.

The authors of the August 2015 study, Estimating the reproducibility of psychological science, set out to assess the rate and predictors of reproducibility in the field of psychology. Unfortunately, there is a shortage of replication studies available for this study to analyze. The shortage exists because, sadly, it’s generally easier for authors to publish the results of new studies than replicate studies.

To get the reproducibility study off the ground, the group of 300 researchers associated with the project had to conduct their own replication studies first! These researchers conducted replications of 100 psychology studies that had already obtained statistically significant results and had been accepted for publication by three respected psychology journals.

Overall, the study found that only 36% of the replication studies were themselves statistically significant. This low rate reaffirms the importance of replicating the results before accepting a finding as being experimentally established!

Scientific progress is not neat and tidy. After all, we’re trying to model a complex reality using samples. False positives and negatives are an inherent part of the process. These issues are why I oppose the "one and done" approach of accepting a single significant study as the truth. Replication studies are as important as the original study.

The study also assessed whether various factors can predict the likelihood that a replication study will be statistically significant. The authors looked at factors such as the characteristics of the investigators, hypotheses, analytical methods, as well as indicators of the strength of the original evidence, such as the P value.

Most factors did not predict reproducibility. However, the study found that the P value did a pretty good job! The graph shows how lower P values in the original studies are associated with a higher rate of statistically significant results in the follow-up studies.

Bar chart that shows replication rate by original P-value

Right now it’s not looking like such a good idea to ban P values! Clearly, P values provide important information about which studies warrant more skepticism.

The study results are consistent with what I wrote in Five Guidelines for Using P Values:

  • The exact P value matters—not just whether a result is significant or not.
  • A P value near 0.05 isn’t worth much by itself.
  • Replication is crucial.

It’s important to note that while the replication rate in psychology is probably different than other fields of study, the general principles should apply elsewhere.

Specification Limits and Stability Studies

$
0
0

I was recently asked a couple of questions about stability studies in Minitab.

Question 1: If you enter a lower and an upper spec in the Stability Study dialog window, why does the resulting graph show only one confidence bound per fitted line? Shouldn’t there be two?

You use a stability study to analyze the stability of a product over time and to determine the product's shelf life. In order to run this in Minitab, you need:

  • Measurement Data
  • Time Variable
  • Batch Factor (Optional)

Shown below is a sample of the first 14 rows of a Stability Study data set. The full data set can be found in our Sample Data folder within Minitab Statistical Software. You can download a free 30-day trial of the software if you're not already using it. You can access this folder via File > Open Worksheet, then click on the 'Look in Sample Data Folder' button. The file is called shelflife.mtw. 

 

The Month column represents the month at which the age of the product was collected. The batch column represents where the product originated from. In the sixth row, for example, the drug concentration percentage for Batch 2 at 3 months was 99.478%.

With this information, the stability study will help you estimate the average length of time that the response will be within specification. To satisfy my inquisitor’s first question, we will use a lower spec of 90% and an upper spec of 105%. 

The Stability Study dialog box:

 
 

The Resulting Graph:

Minitab first checks to see if the starting point of the fitted line is between specs, and then determines the direction of the slope of the fitted lines before deciding what limit to calculate the shelf life from. If the decrease in the mean response is significant, then Minitab calculates the shelf life relative to the lower specification limit.

If the increase in the mean response over time is significant, Minitab calculates the shelf life relative to the upper specification limit. Which confidence bound is displayed depends on which spec limit Minitab has selected; that is why only the 95% lower bound is shown for each fitted line here. Conceptually, if the mean response line is trending downward, you look at where its worst-case scenario, the 95% lower bound, intersects the lower spec. The overall shelf life for the batches is 54.79 months for a 90% concentration.
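Put another way, the shelf life is simply the point where the one-sided 95% lower confidence bound on the fitted mean crosses the lower spec. Here is a rough single-batch sketch of that idea in Python with statsmodels, using made-up concentration data rather than the shelflife.mtw worksheet; a two-sided 90% interval is requested so that its lower limit matches a one-sided 95% bound.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up single-batch stability data: percent of label claim by month.
df = pd.DataFrame({
    "Month":         [0, 3, 6, 9, 12, 18, 24],
    "Concentration": [100.2, 99.6, 99.1, 98.7, 98.1, 97.2, 96.3],
})
lower_spec = 90.0

fit = smf.ols("Concentration ~ Month", data=df).fit()

# The lower limit of a two-sided 90% CI on the mean response is the
# one-sided 95% lower confidence bound. Evaluate it on a grid of months.
grid = pd.DataFrame({"Month": np.linspace(0, 120, 1201)})
bounds = fit.get_prediction(grid).summary_frame(alpha=0.10)["mean_ci_lower"]

crossed = grid["Month"][bounds < lower_spec]
if len(crossed):
    print(f"Estimated shelf life: {crossed.iloc[0]:.1f} months")
else:
    print("The lower bound stays above the spec over this range of months.")
```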

Question 2: I get asterisks for Shelf Life for each Batch, as shown below:

Batch      Shelf Life
1          *
2          *
3          *
4          *
5          *
Overall    *

This question is closely related to the first: the answer depends on the slope’s direction and on which specification, lower or upper, you have entered. Most likely, you won’t run into this situation if:

a.  Your fitted line has a significant negative slope and you are only inputting a lower spec.

b.  Your fitted line has a significant positive slope and you are only inputting an upper spec.

If you run a stability study with two specs, you may receive these asterisks if the mean response at time = 0 is not within both specifications. You can see this when we use a lower spec of 90 and an upper spec of 98:

 

For the batches with a negative slope, the mean response is already above the upper spec at time 0, so the product is out of spec from the start, at least based on this model's prediction. Minitab can't calculate a shelf life for those batches.

It’s a different story in the first question we discussed, as the mean response at time = 0 was below the upper spec:

On a side note, there is another situation which can cause you to obtain all asterisks for the shelf life of the batches. This will happen when the slopes of all fitted lines on the graph are simply not significant. 

I hope this information helps you when you perform your next stability study!

 

The Longest Drive: Golf and Design of Experiments, Part 3

$
0
0

Step 2 in our DOE problem-solving methodology is to design the data collection plan you will use to study the factors in your experiment. Of course, you will have to incorporate blocking and covariates in your experiment design, as well as calculate the number of replications of run conditions needed in order to be confident in your results.

We will address these topics in future posts, but for now, let’s focus on the settings of the factors in our golf experiment. We will construct a full factorial design, fractionate that design to half the number of runs for each golfer, and then discuss the benefits of running our experiment as a factorial design.

The four factors in our experiment and the low / high settings used in the study are:

  1. Club Face Tilt (Tilt) – Continuous Factor: 8.5 degrees & 10.5 degrees
  2. Ball Characteristics (Ball) – Categorical Factor: Economy & Expensive
  3. Club Shaft Flexibility (Shaft) – Continuous Factor: 291 & 306 vibration cycles per minute
  4. Tee Height (TeeHght) – Continuous Factor: 1 inch & 1 3/4 inch

To develop a full understanding of the effects of 2 – 5 factors on your response variables, a full factorial experiment requiring 2^k runs (k = number of factors) is commonly used. Many industrial factorial designs study 2 to 5 factors in 4 to 16 runs (the 2^(5-1) half fraction, with 16 runs, is the best choice for studying 5 factors) because 4 to 16 runs is not unreasonable in most situations. The data collection plan for a full factorial consists of all combinations of the high and low setting for each of the factors. A cube plot, like the one for our golf experiment shown below, is a good way to display the design space the experiment will cover.
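For readers who like to see the runs laid out, here is a sketch in Python that generates the 2^4 = 16 coded runs for our four factors and splits them into two complementary half fractions. The split shown uses the defining relation I = ABCD, the standard generator for a 2^(4-1) fraction; whether our study used exactly this generator isn't stated here, so treat the split as an illustration.

```python
from itertools import product
import pandas as pd

factors = {
    "Tilt": (8.5, 10.5),              # degrees
    "Ball": ("Economy", "Expensive"),
    "Shaft": (291, 306),              # vibration cycles per minute
    "TeeHght": (1.0, 1.75),           # inches
}

# Full factorial: all 2^4 = 16 combinations, in coded -1/+1 units.
design = pd.DataFrame(list(product([-1, 1], repeat=4)), columns=list(factors))

# The coded main-effect columns are pairwise uncorrelated (orthogonal).
print(design.corr().round(10))

# Map the coded levels back to the actual settings.
settings = design.copy()
for name, (low, high) in factors.items():
    settings[name] = design[name].map({-1: low, 1: high})

# Split into complementary half fractions using the generator I = ABCD:
# one golfer runs the 8 rows where the product of the coded columns is +1,
# another golfer runs the other 8.
abcd = design["Tilt"] * design["Ball"] * design["Shaft"] * design["TeeHght"]
golfer_1_runs = settings[abcd == +1]
golfer_2_runs = settings[abcd == -1]
print(len(design), len(golfer_1_runs), len(golfer_2_runs))  # 16 8 8
```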

There are a number of good reasons for choosing this data collection plan over other possible designs. The details are discussed in many excellent texts, such as Wu & Hamada  (2009), Mee (2009), and Montgomery (2013).

Five Important Reasons Why Factorial Experiments Are So Successful
  1. Factorial and fractional factorial designs provide the most run efficient (economical) data collection plan to learn the relationship between your response variables and predictor variables. They achieve this efficiency by assuming that each effect on the response is linear and therefore can be estimated by studying only two levels of each predictor variable. After all, it only takes two points to establish a line.
     
  2. Factorial designs estimate the interactions of each input variable with every other input variable. Often the effect of one variable on your response is dependent on the level or setting of another variable. The effectiveness of a college quarterback is a good analogy. A good quarterback can have good skills on his own. However, a great quarterback will achieve outstanding results only if he and his wide receiver have synergy. As a combination, the results of the pair can exceed the skill level of each individual player. This is an example of a synergistic interaction. Complex industrial processes commonly have interactions, both synergistic and antagonistic, occurring between input variables. We cannot fully quantify the effects of input variables on our responses unless we have identified all active interactions in addition to the main effects of each variable. Factorial experiments are specifically designed to estimate all possible interactions.   
     
  3. Factorial designs are orthogonal. We analyze our final experiment results using least squares regression to fit a linear model for the response as a function of the main effects and two-way interactions of each of the input variables. A key concern in least squares regression arises if the settings of the input variables or their interactions are correlated with each other. If this correlation occurs, the effect of one variable may be masked or confounded with another variable or interaction making it difficult to determine which variables actually cause the change in the response. When analyzing historical or observational data, there is no control over which variable settings are correlated with other input variable settings and this casts a doubt on the conclusiveness of the results. Orthogonal experimental designs have zero correlation between any variable or interaction effects specifically to avoid this problem. Therefore, our regression results for each effect are independent of all other effects and the results are clear and conclusive.
     
  4. Factorial designs encourage a comprehensive approach to problem solving. They do this in two ways. First, intuition leads many researchers to reduce the list of possible input variables before the experiment in order to simplify the experiment execution and analysis. This intuition is wrong. The power of an experiment to determine the effect of an input variable on the response is reduced to zero the minute that variable is removed from the study (in the name of simplicity). Through the use of fractional factorial designs and experience in DOE, you quickly learn that it is just as easy to run a 7 factor experiment as a 3 factor experiment, while being much more effective. Secondly, factorial experiments study each variable’s effect over a range of settings of the other variables. Therefore, our results apply to the full scope of all the process parameter settings rather than just specific settings of the other variables. Our results are more widely applicable to all conditions than the results from studying one variable at a time.
     
  5. Two-level factorial designs provide an excellent foundation for a variety of follow-up experiments, which will lead to the solution to your process problem. A fold-over of your initial fractional factorial can be used to complement an initial lower resolution experiment, providing a complete understanding of all your input variable effects. Augmenting your original design with axial points results in a response surface design to optimize your response with greater precision. The initial factorial design can provide a path of steepest ascent / descent to move out of your current design space into one with even better response values. Finally, and perhaps most commonly, a second factorial design with fewer variables and a smaller design space can be created to better understand the highest potential region for your response within the original design space.

As if these 5 success factors were not enough to convince us to use a 2-level factorial design for our base data collection plan, the references below also describe many other reasons supporting our choice.

Analysis of factorial designs produces the lowest-variance estimators of our regression equation coefficients. Two-level factorial designs are highly efficient in their coverage of the desired experimental variable space. Finally, the results from these designs will allow you to easily create graphics such as the main effects, interaction, and cube plots—which are excellent communication tools in explaining your results.

Summary

I hope this short discussion has convinced you that any researcher in academics or industry will be well rewarded for the time spent learning to design, execute, analyze, and communicate the results from factorial experiments. The earlier in your career you learn these skills, the … well, you know the rest.

For these reasons, we can be quite confident about our selection of a full factorial data collection to study the 4 variables for our golf experiment. Each golfer will be responsible for executing only one half of the runs, called a half fraction, of the full factorial. Even so, the results for each golfer can be analyzed independently as a complete experiment.

In my next post, I will answer the question: How do we calculate the number of replicates needed for each set of run conditions from each golfer so that our results have a high enough power that we can be confident in our conclusions?

Previous Golf DOE Posts

The Longest Drive: Golf and Design of Experiments, Part 1

The Longest Drive: Golf and Design of Experiments, Part 2

References

Mee, R. W. (2009). A Comprehensive Guide to Factorial Two-Level Experimentation. Springer Science and Business Media, London – New York.

Montgomery, D.C. (2013). Design and Analysis of Experiments, 8th ed., Hoboken, NJ: John Wiley & Sons.

Wu, C.F. Jeff, Hamada, M.S. (2009). Experiments – Planning, Analysis, and Optimization. New York, NY: John Wiley & Sons.

Many thanks to Toftrees Golf Resort and Tussey Mountain for use of their facilities to conduct our golf experiment. 

Big Ten 4th Down Calculator: Week 2

$
0
0

It was a wild weekend in the Big Ten. Four of the six conference games were decided by a touchdown or less, and all of those close games means we have plenty of 4th down decisions to analyze.

If you're new to the Big Ten 4th Down Calculator, I've used Minitab Statistical Software to create a model to determine the correct 4th down decision. And for the rest of the college football season, I'll use that model to track every 4th down decision in Big Ten Conference games.

One caveat before we begin. The decision the calculator gives isn’t meant to be written in stone. In hypothesis testing, it’s important to understand the difference between statistical and practical significance. A test that concludes there is a statistically significant result doesn’t imply that your result has practical consequences. You should use your specialized knowledge to determine whether the difference is practically significant.

The same line of thought should be applied to the 4th down calculator. Coaches should also consider other factors, like the game situation and the strengths and weaknesses of their team. But the 4th down calculator still provides a very strong starting place for the decision making! In fact, we can use the model to create a handy-dandy chart that gives a general idea of what your 4th down decision should be!

4th Down Decision Chart

Okay, enough of the pregame show, let’s get to the games!

Michigan 28 - Maryland 0

We'll start with the blowouts, and save the good stuff until later. For each game I'll break the analysis into two sections: 4th down decisions in the first 3 quarters, and 4th down decisions in the 4th quarter. The reason to separate the two is that in the first 3 quarters, coaches should be trying to maximize the number of points they score, while in the 4th quarter, they should be maximizing their win probability. To calculate win probability, I’m using this formula from Pro Football Reference.
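The details of the model are in the earlier post, but the basic comparison is simple expected-value arithmetic. Here's a generic sketch in Python; every probability and expected-points figure below is invented for illustration and is not taken from the actual calculator.

```python
def expected_points(p_success, ep_if_success, ep_if_failure):
    """Expected points of a decision that succeeds with probability p_success."""
    return p_success * ep_if_success + (1 - p_success) * ep_if_failure

# Hypothetical 4th-and-1 near midfield. Every number here is invented;
# the real calculator plugs in values from its own fitted models.
go_for_it = expected_points(p_success=0.68,     # chance of converting 4th and 1
                            ep_if_success=2.2,  # expected points with a fresh set of downs
                            ep_if_failure=-1.8) # opponent's expected points, negated
punt = -0.7                                     # opponent's expected points after a punt, negated

print(f"Go for it: {go_for_it:+.2f} expected points")
print(f"Punt:      {punt:+.2f} expected points")
print("Decision:", "Go for it" if go_for_it > punt else "Punt")
```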

In this game Maryland clearly was routed. But could more aggressive 4th down decision-making have made a difference?

4th Down Decisions in the First 3 Quarters

Team          4th Downs   Disagreements with Calculator   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Maryland      8           0                               8       0             0                     0
Michigan      9           1                               4       3             2                     0.93


Every single one of Maryland's 4th down decisions resulted in a punt, and the 4th down calculator agreed with every single one of them. That about sums up this game for Maryland. Six of their eight 4th down distances were 9 yards or more. Maryland never really had the opportunity to play aggressively.

Michigan's 4th down decisions agreed with the model for the most part, including two attempts to convert on 4th down in Maryland territory. They successfully converted one of the 2 attempts, which later led to a field goal. And the 4th down they didn't convert? Maryland went 3 and out on the next possession.

The only disagreement with the model came right before halftime. Michigan had a 4th and 1 on the Maryland 14 yard line, and they kicked the field goal instead of going for it. The decision to kick cost the Wolverines almost a full point in expected points lost. And at the time the score was only 3-0, so the game was very much in doubt. Now, I'm sure the fact that there was only a minute left in the half factored into the decision. But a first down would have stopped the clock and given Michigan the ball inside the Maryland 13-yard line. I think there would have still been plenty of time to score a touchdown (and still attempt a field goal if they didn't score). It didn't end up mattering, but Michigan should be more aggressive in the future.

By the time this game reached the 4th quarter, Michigan held a 21-0 lead, which they soon increased to 28-0. Maryland did end up punting on a 4th and 1 with 5 minutes left, but since they were behind by 4 touchdowns, the decision really didn't matter. Let's move on to the next game.

Northwestern 27 - Minnesota 0

Northwestern continued their undefeated season with an impressive shutout win against Minnesota. But their 4th down decision-making? Not as impressive. In fact, the 4th down calculator disagreed with the first 5 4th down decisions in this game.

4th Down Decisions in the First 3 Quarters

Team           4th Downs   Disagreements with Calculator   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Northwestern   6           4                               2       3             1                     2.34
Minnesota      7           2                               5       0             2                     1.13

Lucky for Northwestern, this game wasn't close, because they left over 2 points on the table! In fact, Northwestern made sub-optimal decisions on their first 3 possessions! On 4th and 3 at the Minnesota 8, they kicked a field goal instead of going for it. Inside the 10 yard line, teams should be much more aggressive on 4th down, the reason being that even if you fail, the other team has the ball deep in their own territory. If Minnesota were to start at their own 8 yard line, Northwestern would actually be more likely to score next. It's really a win/win. You either score a touchdown, or you fail but are more likely to score next anyway. Kicking the field goal is giving away free points. This decision alone cost Northwestern almost a full point!

On their next possession, Northwestern punted on 4th and 1. The location on the field does not matter, as the 4th down calculator will always say to go for it. And on their 3rd possession, they kicked another field goal on 4th and 3. This decision wasn't as bad as the first, since they were at the Minnesota 23-yard line and not inside the 10. But as the home team, they should have been more aggressive. Oh, and that second field goal attempt? They missed it.

Minnesota punted on 4th and 3 from the Northwestern 40 when the calculator would have said to go for it. But in their next possession, they were actually too aggressive! On 4th and 7 from the Northwestern 31, they decided to go for it instead of kicking a 48 yard field goal. From 40 yards or more, Minnesota kicker Ryan Santoso is 7 for 14 in his career. The 4th down calculator assumes a 49 yard field goal will be made 55% of the time. So those numbers are pretty close. Now, I didn't see the game, so perhaps it was windy or raining. If for some reason the coach didn't think his chance of making the field goal was close to 50%, the calculator agrees with going for it over punting. But if not, Minnesota should have attempted the field goal.

Northwestern had this game in hand by the 4th quarter, so we'll move on to the close games.

Ohio State 34 - Indiana 27

Before the season I wrote about how Indiana should never punt. If only they had listened, they might have pulled the upset. 

4th Down Decisions in the First 3 Quarters

Team         4th Downs   Disagreements with Calculator   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Ohio State   9           2                               5       3             1                     0.73
Indiana      12          2                               9       1             2                     2

The calculator agreed with 8 of Indiana's 9 decisions to punt, so maybe that "never punting" thing was a little too extreme. But there is one punt that absolutely crushed the soul of the Big Ten 4th down calculator. Late in the 2nd quarter, Indiana had a 4th and 1 at midfield, and they went for it! As a 21 point underdog to the #1 ranked team in the country, you have to play aggressively. And part of that is going for it on every 4th and 1. Indiana failed on a 4th and 1 running a fake punt earlier, but they converted this time. The 4th down calculator was pleased with both calls. But then, 3 plays later, Indiana had a 4th and 2 on the Ohio State 39...and punted. After they just went for a 4th and 1 at midfield! You're scared of Ohio State taking over at their own 39, but not at midfield? I don't get it. Assuming Indiana downs the ball at the 10 yard line, that decision cost them just over a full point. But the punt actually went into the end zone for a touchback. With Ohio State starting at the 20 yard line, the decision to punt actually cost them 1.7 points! You can't make decisions like that if you're going to upset the defending national champions.

Indiana's other 4th-down disagreement was kicking a field goal on 4th and 2 from the Ohio State 15 yard line instead of going for it. This again cost Indiana almost a full point. Indiana did a good job of being aggressive on 4th and 1. But they should have done the same on 4th and 2.  

Ohio State's only poor decision was punting on 4th and 1 from their own 16-yard line. That accounted for most of the 0.73 expected points lost you see in the table. However, they completely made up for it later in the game. On 4th and 1 from their own 35, they went for it when most coaches would have punted. The result was a 65-yard touchdown run by Ezekiel Elliott. That decision may have saved Ohio State's season.

The one 4th-quarter decision I'll discuss is Indiana's decision to punt on 4th and 6 from their own 35, down a touchdown with only 6 minutes left in the game. The win probability calculator didn't like Indiana's chances of winning either way, but it favored going for it, giving Indiana a 3.4% chance of winning by going for it and only a 2.7% chance by punting. And when you factor in the fact that Ohio State averaged 8 yards per rush in the game, there was a reasonable chance Indiana would never get the ball back. With 4th and 6 being pretty manageable, Indiana should have gone for it. Too often you see a coach punt on 4th and manageable, only to go for it on a much longer distance later (if they even get the ball back at all). That didn't happen in this case, as Indiana got the ball back and didn't face a 4th down until the last play of the game. But it doesn't change the fact that, although it was close, going for it gave Indiana the best chance of winning.

Michigan State 24 - Purdue 21

This game had blowout written all over it early. And then, well, it didn't. But Michigan State was able to hold on, and avoid the dreaded SPARTY NO!

4th Down Decisions in the First 3 Quarters

Team          4th Downs   Disagreements with Calculator   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Purdue        6           1                               4       0             2                     0.24
Michigan St   5           2                               4       1             0                     0.51

Purdue's disagreement was actually one where the calculator thought they were too aggressive. Instead of kicking a field goal on 4th and 5 from the Michigan State 30, they decided to go for it. But the difference in expected points is only 0.24, plus Purdue knows their kicker's ability better than me. And when you factor in that they were 21-point underdogs playing on the road, the decision to be aggressive and go for it really isn't bad at all. 

Michigan State's worst decision came on a punt on 4th and 2 from midfield. The calculator says it's actually a pretty close call, with going for it only resulting in 0.19 more expected points. But in reality this play was a disaster for Michigan State. The punter mishandled the snap, and Purdue ended up getting the ball at the Michigan State 21 yard line. This set up the Boilermakers' first touchdown, sparking the comeback. On the next possession, Michigan State attempted a 35 yard field goal on 4th and 4 instead of going for it, which is what the calculator suggests for a home team. The result was another disaster for Michigan State, as they missed the field goal. Then 3 plays later, Purdue scored another touchdown, making it a one score game.

In the 4th quarter, Michigan State attempted another field goal on 4th and 4 from the Purdue 13, up by 7 points. Using expected points, Michigan State should have gone for it. But in the 4th quarter, win probability is what teams should be concerned with. And the win probability for both decisions is approximately 97% (it's so high because it takes into account that Michigan State was a 21.5 point favorite), so there wasn't a bad choice. Michigan State decided to kick the field goal, they made it, and it turned out to be the game winning score.

Illinois 14 - Nebraska 13

Poor Nebraska. This is now the 3rd game that they've lost on (just about) the final play of the game. They're a better team than their 2-3 record indicates. Unfortunately, that probably doesn't make Husker fans feel any better.

4th Down Decisions in the First 3 Quarters

Team       4th Downs   Disagreements with Calculator   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Nebraska   8           2                               6       2             0                     0.82
Illinois   10          2                               6       2             2                     0.63

On Nebraska's first drive, they punted on 4th and 1 from their own 31 yard line. Of course the calculator thinks they should have gone for it, especially considering Nebraska's rushing game is their strength. And this is about the same spot on the field where Ohio State went for it on 4th and 1, which worked out pretty well for them. Nebraska has lost all 3 close games they've played in. If you maximize your points early, then you might not be playing in such a tight game late.

On their 2nd possession, Illinois decided to kick a 50 yard field goal on 4th and 4. Think about it this way. The probability of converting a 4th and 4 is about 46%. The probability of making a 50 yard field goal is about 52%. And when you factor in that one of those decisions leaves open the possibility of scoring a touchdown, it really should be an easy decision. But Illinois decided to kick anyway. And, you guessed it, they missed the field goal.
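To make that reasoning concrete, here's a minimal back-of-the-envelope sketch in Python. The 46% and 52% figures come from above; the expected-points values for a fresh set of downs and for handing the ball back to Nebraska are assumed round numbers, not the 4th down calculator's actual model.

```python
# Rough expected-points comparison for Illinois's 4th and 4.
# The two success rates come from the post; the expected-points values below
# are assumed round numbers, not the 4th down calculator's actual figures.

P_CONVERT_4TH_AND_4 = 0.46    # chance of picking up 4 yards (from the post)
P_MAKE_50_YD_FG = 0.52        # chance of making a 50-yard field goal (from the post)

EP_FRESH_SET_OF_DOWNS = 4.0   # assumed value of 1st and 10 deep in opponent territory
EP_OPPONENT_TAKES_OVER = -0.5 # assumed value when the opponent gets the ball back

# Kickoff field position after a made field goal is ignored to keep the sketch simple.
ep_go_for_it = (P_CONVERT_4TH_AND_4 * EP_FRESH_SET_OF_DOWNS
                + (1 - P_CONVERT_4TH_AND_4) * EP_OPPONENT_TAKES_OVER)
ep_field_goal = (P_MAKE_50_YD_FG * 3
                 + (1 - P_MAKE_50_YD_FG) * EP_OPPONENT_TAKES_OVER)

print(f"Go for it:  {ep_go_for_it:.2f} expected points")   # about 1.57
print(f"Field goal: {ep_field_goal:.2f} expected points")  # about 1.32
```

Under these assumptions, going for it comes out ahead mainly because a conversion keeps the touchdown in play, which is exactly the point made above.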

The calculator does want to applaud Illinois for their 4th down decision on their next possession though. With a 4th and 1 at the Nebraska 10, they decided to go for it. This increased their expected value by over 1.5 points when compared to kicking the field goal! They didn't convert, but Nebraska went 3 and out, and Illinois started their next possession at the Nebraska 43 yard line. This is why going for it so close to the goal line should be an easy decision. Even if you fail, there is a good chance you're starting your next possession in great field position. Of course, Illinois used that great field position to miss a field goal. You can lead a horse to water, but you can't make it drink.

Now let's jump ahead to the 4th quarter, where some of the 4th down decisions were...well, let's be nice and say "puzzling."

4th Down Decisions in the 4th Quarter

Team | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It or Field Goal | Win Probability Punt
Illinois | 1 | 61 | Go For It | Punt | 15% | 11%
Illinois | 10 | 80 | Go For It | Punt | 4.1% | 3.7%
Nebraska | 7 | 27 | Field Goal | Go For It | 99.6% (FG) | 99.4% (Go)

With just under 9 minutes left in the game, Illinois had a 4th and 1 from their own 39 yard line, and decided to punt. They were down 13-7 at the time, so odds are you're going to have to go for it on 4th down at some point. And there is no easier 4th down conversion to make than 4th and 1. But Illinois decided to punt, lowering their win probability by 4%. And it's even worse when you remember that Nebraska's rushing offense is perfect for running out the clock. Sure enough, in just 2 plays Nebraska advanced the ball to the Illinois 36 yard line, almost the same spot they would have started if Illinois failed on 4th and 1.

After a Nebraska punt, Illinois got the ball back and quickly found themselves facing 4th and 10 from their own 20 with less than 5 minutes left in the game. At this point the Illinois situation looked bleak: a 4.1% chance to win by going for it, and only a 3.7% chance by punting. I bet they wish they could have that 4th and 1 decision back. Anyway, this decision was a lot closer, but the win probability calculator favors going for it. You just can't be sure you'll get another chance, and even if you do, there might not be enough time to realistically score. But Illinois decided to punt. And luckily for them, Nebraska sailed way ahead of them in the "puzzling" decision-making contest.

Nebraska had a 3rd and 7 at the Illinois 27 with about a minute left. Yes, this is a 3rd down, but it has to be mentioned. Illinois was out of timeouts, so a simple running play would have kept the clock running, eating up precious time. Plus a couple yards would make a field goal attempt easier. Instead, Nebraska threw an incomplete pass, stopping the clock with 55 seconds left. But they were still in the driver's seat, as you can clearly see by the win probabilities above. On 4th and 7, Nebraska could end the game two different ways:

  • A successful 44 yard field goal, which Big Ten kickers make approximately 63% of the time
  • A successful conversion on 4th and 7, which Big Ten teams make approximately 34% of the time
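Laying those rates side by side, along with Drew Brown's own career numbers discussed below, makes the comparison easy to see. A quick sketch, nothing more:

```python
# Nebraska's two ways to end the game on 4th and 7 from the Illinois 27.
# All three rates appear in the post; converting would still require running
# out the clock afterward, so its true value is, if anything, a bit lower.

p_fg_big_ten_avg = 0.63      # Big Ten kickers on ~44-yard field goals
p_fg_drew_brown = 5 / 11     # Drew Brown's career rate from 40-49 yards (small sample)
p_convert_4th_and_7 = 0.34   # Big Ten conversion rate on 4th and 7

options = {
    "Kick (league-average kicker)": p_fg_big_ten_avg,
    "Kick (Drew Brown, 40-49 yds)": p_fg_drew_brown,
    "Go for it on 4th and 7":       p_convert_4th_and_7,
}

for label, p in options.items():
    print(f"{label:30s} {p:.0%}")
```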

It seems like the clear decision is to kick the field goal. And for his career, Nebraska kicker Drew Brown is 5 for 11 (45%) on kicks between 40 and 49 yards. That's lower than the Big Ten average (though we must beware of the danger of small sample sizes), but still better than the chance of converting on 4th down. Plus, he had already made a 39 yard field goal earlier in the game. Is 44 yards really that much farther than 39 yards? But for some reason Nebraska decided to go for it, and they were unsuccessful. Then Illinois went down and scored the game-winning touchdown with 10 seconds left. All Nebraska needed was competent clock management and better 4th down decision making, and they would have easily won this game. But instead I present to you:

Mike Riley, current leader for Worst Big Ten 4th Down Decision of the Year.  

Iowa 10 - Wisconsin 6

Now that's a Big Ten score if I've ever seen one. You'd expect a game with a score like that to have a ridiculous amount of punts. Well, in this case you'd be wrong.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Wisconsin | 5 | 2 | 2 | 3 | 0 | 0.62
Iowa | 6 | 0 | 3 | 2 | 1 | 0
Through the first 3 quarters, there were as many field goal attempts as punts. And it was two of Wisconsin's field goal attempts that the 4th down calculator disagreed with. The first one wasn't too bad. On 4th and 5 from the Iowa 28, the Badgers kicked a field goal when the calculator would have said to go for it. However, the difference in expected points is only 0.07. So either choice was actually fine. But later in the game, Wisconsin should have been more aggressive when they kicked a 42 yard field goal on 4th and 3. They gave up just over half a point by deciding to kick. And to make matters worse, they missed the field goal. And as we saw later in the game, those were points they desperately needed.

From the 4th down calculator's perspective, Iowa coach Kirk Ferentz called a perfect game. On 4th and 2 from the Wisconsin 8 yard line, he decided to go for the first down instead of kicking a field goal. The play was unsuccessful, but the gambit paid off in the long run, as the next score in the game was an Iowa touchdown.

Then things got very interesting in the 4th quarter.

4th Down Decisions in the 4th Quarter

Team | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It or Field Goal | Win Probability Punt
Wisconsin | 8 | 33 | Field Goal | Punt | 43% | 41%
Iowa | 1 | 75 | Go For It | Go For It | 72% | 71%
Iowa | 2 | 87 | Punt | Punt | 65.5% | 65.8%
Wisconsin | 16 | 40 | Punt | Punt | 19% | 22%

There were some huge decisions made early in the 4th quarter, but we see that the win probabilities between those decisions weren't actually too different. Things started with Wisconsin punting from the Iowa 33-yard line. Of the three options, kicking the field goal results in the highest win probability. And if Wisconsin coach Paul Chryst didn't trust his kicker to make a 50 yard field goal, going for the first down gave him a win probability of 42%, which is still higher than punting. But you'll notice we're talking about a difference of only 1 or 2 percentage points. Chryst should have been more aggressive, but the decision to punt wasn't an awful one.

On the very next series, Kirk Ferentz made a decision that everybody would have crushed him for if it hadn't worked, but would forget about if it did. Up 10-6 on the 25 yard line, he went for it on 4th and 1. You'll see that the calculator favored going for it, but only barely. Luckily for Kirk, Iowa picked up the first down. However, they fumbled on the very next play and Wisconsin recovered, setting the Badgers up nicely for a potential game-winning touchdown. But then Wisconsin fumbled on 2nd and goal from the 1-yard line, and Iowa recovered. I like to think of that as the football gods rewarding Ferentz for his aggressive 4th down decisions.

After recovering the Wisconsin fumble, Iowa had a 4th and 2 from their own 13-yard line. Kirk went for it on 4th and 1 earlier, so would he do the same on 4th and 2? You'll see that there was basically no difference in the win probability between punting and going for it. So neither decision would have been a bad one. Ferentz decided to punt, and on the ensuing drive, Wisconsin had a 4th and 16 with 3 and a half minutes left. You hate to see a losing team punting in opponent territory with so little time left, but 4th and 16 is so hard to convert that punting was actually the correct decision. Wisconsin was able to get the ball back at midfield and drove all the way down to the Iowa 16 yard line, but they weren't able to score the game-winning touchdown. So with the Iowa win and the excellent 4th down decision making, I'm now declaring:

Kirk Ferentz, current leader for Big Ten 4th Down Decision Maker of the Year.  

Summary

Each week, I’ll summarize the times coaches disagreed with the 4th down calculator and the difference in expected points between the coach’s decision and the calculator’s decision. I’ll do this only for the 1st 3 quarters since I’m tracking expected points and not win probability. I also want to track decisions made on 4th and 1, and decisions made between midfield and the opponent’s 35 yard line. I’ll call this area the “Gray Zone.” These will be pretty sparse now, but will fill up as the season goes along. Then we can easily compare the actual outcomes of different decisions in similar situations.

Have anything else you think I should track? Let me know and I’ll add it!

Team Summary

Team | Number of Disagreements | Total Expected Points Lost
Northwestern | 4 | 2.34
Indiana | 2 | 2
Minnesota | 2 | 1.13
Michigan | 1 | 0.93
Nebraska | 2 | 0.82
Penn State | 2 | 0.8
Ohio State | 2 | 0.73
Illinois | 2 | 0.63
Wisconsin | 2 | 0.62
Michigan St | 2 | 0.51
Rutgers | 1 | 0.3
Purdue | 1 | 0.24
Maryland | 0 | 0
Iowa | 0 | 0

 

4th and 1

Yards To End Zone | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
75-90 | 1 | 7 | 0 | 0 | * | *
50-74 | 3 | 3.7 | 2 | 3.5 | * | *
25-49 | 0 | 0 | 2 | -1.5 | 0 | 0
1-24 | * | * | 2 | -5 | 1 | 3

 

The Gray Zone (4th downs 35-50 yards to the end zone)

4th Down Distance | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
1 | 0 | 0 | 3 | -1 | 0 | 0
2-5 | 7 | 0.57 | 0 | 0 | 0 | 0
6-9 | 2 | 0 | 0 | 0 | 0 | 0
10+ | 6 | -0.67 | 0 | 0 | 0 | 0

 

Does Every Good Analytical Chemist Need to Be a Statistician?


I read trade publications that cover everything from banking to biotech, looking for interesting perspectives on data analysis and statistics, especially where it pertains to quality improvement.

Recently I read a great blog post from Tony Taylor, an analytical chemist with a background in pharmaceuticals. In it, he discusses the implications of the FDA's updated guidance for industry analytical procedures and methods validation. His audience comprises analytical chemists and pharmaceutical researchers, people who are very technologically savvy and adept at solving problems. The kind of people you'd imagine are very capable and eager to collect some data and figure out what it means.

Or maybe not. 


What Taylor's post makes clear is that even a highly educated, scientifically inclined audience like this doesn't necessarily appreciate the value of statistical analysis—or at least, doesn't really enjoy actually doing it.

Taylor acknowledges an issue that Minitab has focused on from its earliest days: when it comes to analyzing data and using statistics, some people seem to get it right away, and others don't. Some people enjoy it, and many others find it tedious or even painful.

Those who fall into that second category—even highly educated analytical chemists—tend to try to avoid statistics, even though there are many tools available to help them get the benefits of analyzing their data more easily and quickly. The problem is that trying to avoid statistics and data in today's world is like an ostrich burying its head in the sand so it won't see a threat.

Taylor points out that new FDA guidelines make this particularly true in the pharmaceutical realm, where "the use of statistics will only be increasing in the future."

So does this mean every analytical chemist needs to be a statistician? No. But Taylor makes a strong case that good analytical chemists at least need to appreciate and be prepared to apply statistical methods in their work—and that's excellent advice for people in most professions.

Tools that Make Statistics and Data Analysis Easier

Minitab has always sought to help more people understand and apply statistics, so it's extremely gratifying that Taylor gives us a shout-out by name:

If one wishes to utilize some of the most useful aspects of statistical experimental design and optimization, then one would need to use a simple statistical tool such as Minitab®...This program is very easy to use, and with only a rudimentary understanding of the principles, can help us to improve our analytical practice enormously.

Now that's music to our ears! Our goal has always been to make Minitab easy to use, and by adding features like the Assistant to guide you through statistical analysis and help you interpret the results correctly, we are striving to open the world of data analysis to as many people as possible. 

Taylor goes on to call out Minitab's design of experiments (DOE) capabilities, which certainly have great application in chemistry and pharmaceutical research. He notes that with Minitab:

...it is very straightforward to plan, implement, and analyse Fractional Factorial designs which allow us to investigate the dominant variables within a method and help to control them more effectively. We can use the same methods for identifying interactions between variables in our methods and control or eliminate them. It is very easy to plan a fractional factorial design such as the Plackett-Burman, in order to significantly reduce the number of experiments required to validate an analytical method for Robustness. We can also use full factorial methods to optimise an analytical method in significantly fewer experiments using the tools available within Minitab.
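To give a feel for the run-count savings Taylor is describing, here's a small sketch in plain Python (not Minitab output) that builds a two-level design for four hypothetical method variables, then the half-fraction that studies the same factors in half the runs. The factor names are made up for illustration.

```python
# Compare a 2^4 full factorial (16 runs) with a 2^(4-1) half-fraction (8 runs).
# The half-fraction is built by aliasing the fourth factor with the three-way
# interaction of the first three (defining relation I = ABCD, so D = ABC).
# Factor names are hypothetical; a real study would use its own variables.

from itertools import product

factors = ["Temp", "pH", "FlowRate", "GradientTime"]

full_design = [dict(zip(factors, levels)) for levels in product((-1, +1), repeat=4)]

half_fraction = []
for a, b, c in product((-1, +1), repeat=3):
    d = a * b * c                      # D is determined by the other three factors
    half_fraction.append(dict(zip(factors, (a, b, c, d))))

print(f"Full factorial: {len(full_design)} runs")    # 16
print(f"Half-fraction:  {len(half_fraction)} runs")  # 8
for run in half_fraction:
    print(run)
```

The trade-off, of course, is that some effects become aliased with one another, which is why the resolution of a fractional design matters when you choose one.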

Though Taylor doesn't mention it specifically, one of my favorite additions to Minitab 17, the latest version of our statistical software, is the DOE Assistant. Designed experiments are extremely powerful, but there is a perception that they are difficult to set up and analyze. The Assistant will actually guide you through the process of designing and analyzing both screening and optimization experiments step by step, and even gives you output that puts your results into straightforward language that's easy to understand and share with others, regardless of their level of expertise. 


A DOE optimization report created with the Assistant in Minitab 17. 

Resources for Using Statistics in Pharmaceuticals

If you're just getting started with statistics and/or Minitab, check out our e-learning course, Quality Trainer. You can also download Minitab Statistical Software and try it free for 30 days.

We've also developed an instructor-led training program that can help you use statistical methods to validate a pharmaceutical process for each stage of the FDA Process Validation Guideline.

If you're using Minitab Statistical Software, we offer resources to help with your validation, including Minitab’s software validation kit here:

http://www.minitab.com/support/software-validation/

This software validation kit was created to help you understand how we validate Minitab Statistical Software for market readiness. And you can find additional information about validating Minitab relative to the FDA guideline CFR Title 21 Part 11 at this link:

http://it.minitab.com/support/answers/answer.aspx?id=2588

Finally, Minitab's statistical consultants—highly respected statisticians with experience in pharmaceuticals, medical devices, and many other industries—can help you overcome even the toughest data analysis challenges.

Whatever industry you're in, wherever you are on the continuum of statistical experience and expertise, we encourage you to get the maximum benefit from the data you're collecting and analyzing, and we'd love to help. 
