Revisiting the Relationship between Rushing and NFL Wins with Binary Fitted Line Plots

Back in November, I wrote about why running the football doesn’t cause you to win games in the NFL. I used binary logistic regression to look at the relationship between rush attempts (both by the lead rusher and by the team) and wins. The results showed that the model for rush attempts by the lead rusher and wins fit the data poorly. But the model for team rush attempts and wins did fit the data well (although we went on to show that the team rushing attempts wasn’t causing the winning).

We were able to conclude this by looking at the p-value and goodness-of-fit tests. But what if we wanted to trade our boring output for some more entertaining images? Well, in previous versions of Minitab, we were out of luck. But Minitab 17 introduces a new tool that is perfect for our situation: Binary Fitted Line Plots!

What Is a Binary Fitted Line Plot?

A binary fitted line plot examines the relationship between a continuous predictor variable and a binary response. A binary response variable has two possible outcomes, such as winning or losing a football game.

First, let’s consider a regular fitted line plot. This plot examines the relationship between a continuous predictor and a continuous response. For example, we could visualize the relationship between a person’s income and the value of their house.

Fitted Line Plot

We can clearly see that people with higher incomes own more expensive houses. So let’s see what this looks like if we swap out the value of the house with a binary variable, such as whether or not they own a vacation home.

Binary Fitted Line Plot

The x axis still shows us the income of each person. But the y axis is now the probability a person has of owning a vacation home. All of our observations (the blue dots) have a probability of either 0 (they don’t own a vacation home) or 1 (they do own a vacation home). Looking at just the blue dots, we can see lower income values tend to have a probability of 0, while the higher incomes have a probability of 1.

The red line shows us the probability that a single person would have of owning a vacation home based on their income. We can see that for incomes below about $80,000, the chances of owning a vacation home are very low, about 20% or less. Incomes above about $120,000 have a very high chance of owning a vacation home. You can’t be as certain whether a person will own a vacation home for incomes between $80,000 and $120,000.
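If you'd like to play with the same idea outside of Minitab, here's a minimal sketch in Python that fits a logistic curve to made-up income data and plots the fitted probabilities. The numbers and cutoffs below are invented for illustration; they are not the data behind the plots above.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(40, 160, 200)                 # hypothetical income, in $1000s
p_true = 1 / (1 + np.exp(-(income - 100) / 12))    # "true" probability used to simulate
owns_home = rng.binomial(1, p_true)                # binary response: 1 = owns a vacation home

# Fit a binary logistic regression: income predicts the log-odds of ownership
fit = sm.Logit(owns_home, sm.add_constant(income)).fit(disp=False)

grid = np.linspace(40, 160, 300)
prob = fit.predict(sm.add_constant(grid))          # fitted P(owns a vacation home)

plt.scatter(income, owns_home, s=10, label="observations (0 or 1)")
plt.plot(grid, prob, "r-", label="fitted probability")
plt.xlabel("Income ($1000s)")
plt.ylabel("Probability of owning a vacation home")
plt.legend()
plt.show()
```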

Let’s look at one more example before we return to our football study. What would a non-existent relationship look like? For example, what if we changed the binary variable to whether or not the person likes vegetables?

Binary Fitted Line Plot

The red line is pretty much horizontal. This tells us that no matter what a person’s income is, the probability that they like vegetables is around 60%. So clearly there is no relationship between income and liking vegetables.

Now that we’ve gone through some examples, let’s get back to football! And note that for all the examples above, the data were completely made up for illustrative purposes. So no need to e-mail me demanding to know where I found the dataset that indicates 60% of people like vegetables!

Binary Fitted Line Plots for Rushing Attempts vs. Wins

In my post back in November, the first thing we looked at was individual rushing attempts and wins. We found a significant relationship between the two, but the model didn’t fit the data well. In other words, the model did a really bad job predicting whether a team won or lost based on the number of carries by the lead rusher.

So let’s see what the Binary Fitted Line Plot looks like!

Binary Fitted Line Plot

The red line has an upward trend, showing that more carries by the lead rusher leads to a higher probability of winning. But all of the probabilities are between 20% and 80%, so you can never be very certain whether the team won or lost, no matter how many or few carries the lead rusher received. And in our dataset, 50% of the data are between 11 and 20 carries. We see from the graph that the probabilities in that range fall between about 40% and 60%. That means that for half of our data, the model is pretty much just guessing which team won or lost.

The individual observations (blue dots) help confirm this. You'll notice that both winning teams and losing teams appear to have about the same range of carries by the lead rusher. If our model fit the data well, we would expect to see higher values of carries grouped at the top right of the graph and lower values grouped at the bottom left (like in our income vs. vacation home plot).

This plot gives us a great visualization of what the statistics told us. Yes, there is a relationship between the two variables, but you’re going to do a really poor job predicting future outcomes based solely on that relationship.    

Now let's move on to our plot of team rushing attempts vs. wins. We found that there was also a significant relationship between these variables, and this time the goodness-of-fit tests said that the model fit the data well. Let's see how we can use the binary fitted line plot to confirm those findings.

Binary Fitted Line Plot

Here we can see why the model fits the data better when we use team rushing attempts. When a team rushes 15 times or fewer, we can be pretty confident they’ll lose. Meanwhile, we can almost be certain a team wins if they rush 35 times or more. And remember how half of the data fell in the 40% to 60% probability range in the previous plot? Well here that range only includes about 25% of the data (teams having between 24 and 29 rushing attempts). So this model is doing a lot less “guessing” than the previous one.

The underlying statistics will always be needed to make decisions when it comes to statistical analyses. But graphs and plots are great ways to visualize the results and get a better understanding of the conclusions you reached with the statistics. Now that Minitab offers Binary Fitted Line plots, you have even more powerful tools at your disposal. So go ahead and plot away!


Opening Ceremonies for Bubble Plots and Poisson Regression

By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot.

This exploratory tool is great for visualizing the relationships among three variables on a single plot.

To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there might be a possible association between the number of medals a country won and its maximum elevation. For that, I could use a simple scatterplot, right?

But say I want to throw a third variable into the mix, such as GDP per capita. (After all, a new bobsled costs tens of thousands of dollars--a bit more than the dented plastic saucer I used as a kid. Even a top-of-the-line curling stone can set you back over $10,000. So maybe, just maybe, the wealth of a country may relate to its total medal count.)

To show these three variables simultaneously on the same plot, I'll choose Graph > Bubbleplot > Simple. Then I'll indicate which variables will be displayed on the plot as X, Y, and the bubble size.

When I click OK, Minitab displays the bubble plot below:

Tip: Depending on your data, you might want to change the relative size of the bubbles and add jitter (offset the points) to increase the legibility of the plot. The bubbles above have been slightly reduced in size and jittered.
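If you want to mimic a simple bubble plot outside of Minitab, here's a rough sketch in Python. The data frame and column names below are hypothetical, not the Olympic worksheet used in this post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one row per country
df = pd.DataFrame({
    "gdp_per_capita": [15_000, 40_000, 55_000, 62_000, 30_000, 47_000],
    "medals":         [33, 24, 10, 26, 4, 15],
    "max_elev_m":     [5642, 887, 4810, 2469, 8850, 3798],
})

# Scale elevation into a marker area; jitter could be added if bubbles overlap
sizes = 2000 * df["max_elev_m"] / df["max_elev_m"].max()

plt.scatter(df["gdp_per_capita"], df["medals"], s=sizes, alpha=0.5, edgecolors="k")
plt.xlabel("GDP per capita")
plt.ylabel("Total medal count")
plt.title("Bubble size represents maximum elevation (m)")
plt.show()
```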

How Do I Interpret All That Suds?

Interpret the X and Y variables on the horizontal and vertical scales as you would a scatterplot. On this plot, you can see the bubbles rising as you move from left to right on the plot. So it appears that as the country's GDP per capita increases, the total medal count increases.

There's one data point (bubble) that seems to buck this trend. Using the brushing feature allows me to easily identify it in the worksheet--it's Russia, the top medal winner in the games, but with a relatively low GDP per capita.

To explore the relationship between maximum elevation in the country, which is the bubble-size variable, and the two other variables, look for a consistent change in the size of the bubbles as you move along either the vertical (Y) or the horizontal (X) axis.

As you move from left to right, the size of the bubbles seems fairly random. This suggests there's no strong relationship between the GDP per capita of a country and its maximum elevation, which makes perfect sense.

But as you move up the vertical scale, it does look like the bubbles generally seem to get slightly larger. Most of the smaller bubbles seem to fall below the 10-medal mark. I've brushed the 3 bubbles that buck this trend--they're "outliers" by virtue of their different sizes compared to their neighbors.

The small bubble near the top right represents the Netherlands--a relatively flat country with a maximum elevation of only 887 meters--yet nevertheless a top medal winner. The two larger bubbles at the bottom left are China and Kazakhstan, whose tallest peaks, at 8850 m and 7010 m, respectively, are higher than those in any of the other medal-winning countries.

The bubble plot shows some interesting descriptive trends for this set of data. But suppose this data had been a random sample collected from a larger population. To analyze these relationships statistically, and determine whether they hold for the entire population, you’d need to perform a more rigorous analysis.

That is, are the associations between GDP per capita and medal count, and maximum elevation and medal count, statistically significant in a regression analysis?

Caveat: Because regression is an inferential analysis, it requires a random sample of data from a population. For the sake of illustration, we'll pretend the Sochi Olympic data is a representative sample of all the modern Winter Olympic games. We don't know that that's the case, obviously, so consider the results speculative, at best.

New in 17: Poisson Regression Analysis

Notice that the response variable in this case is total medal counts which, strictly speaking, is not continuous data. (You can't win 3.37 Olympic medals.) That means that standard linear regression, which is typically performed on a continuous response variable, is not the best tool to analyze these data.

Luckily, Release 17 of Minitab now includes Poisson regression analysis, which is specifically designed to evaluate the relationship between explanatory variables and a count response.
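For readers who like to cross-check results in code, here's a minimal sketch of a Poisson regression in Python using a generalized linear model with a log link. The data and column names below are made up for illustration; they are not the worksheet analyzed below.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Made-up example data: medal counts with two candidate predictors
df = pd.DataFrame({
    "medals":         [33, 10, 24, 26, 4, 0, 17, 8],
    "max_elev_m":     [5642, 887, 4810, 2469, 8850, 2228, 4478, 3798],
    "gdp_per_capita": [15_000, 40_000, 55_000, 62_000, 30_000, 45_000, 80_000, 50_000],
})

# Poisson regression: a GLM with a log link for a count response
model = smf.glm("medals ~ max_elev_m + gdp_per_capita", data=df,
                family=sm.families.Poisson()).fit()
print(model.summary())   # coefficients, deviance, and p-values for each predictor
```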

To evaluate the relationship between maximum elevation and GDP per capita with total medal count, choose Stat > Regression  > Poisson Regression > Fit Poisson Model.  Minitab produces the following output:

-----------------------------------------

Poisson Regression Analysis: Medals versus Max Elevation (m), GDP/capita

Deviance Table

Source               DF  Adj Dev  Adj Mean  Chi-Square  P-Value
Regression            2    78.61    39.306       78.61    0.000
  Max Elevation (m)   1    21.46    21.457       21.46    0.000
  GDP/capita          1    60.45    60.446       60.45    0.000
Error                23   112.72     4.901
Total                25   191.33

Model Summary

Deviance R-Sq  Deviance R-Sq(adj)     AIC
       41.09%              40.04%  219.32

-----------------------------------------

By jove, it looks like both maximum elevation and GDP per capita are indeed significant predictors of medal count, explaining about 40% of the variation.

And to think I found these significant predictors by chasing a trail of floating bubbles, just like a little kid.

Who says statistics can't be fun?

Try It Yourself

If you'd like to experiment with the Bubble plot, Poisson regression, and other new features in Minitab 17, download the free 30-day trial. After downloading the trial version of Release 17, click here to get the data used in this post. Have fun!

The Stability Report for Control Charts in Minitab 17 includes Example Patterns

One of the new Quick Start Exercises explores biking to work.

Minitab’s Assistant got a lot of splashy upgrades for Minitab 17. The addition of DOE and multiple regression to the Assistant are large feature improvements with obvious advantages. But there are many subtler, but still fantastic additions that shouldn't be overlooked.

One of those additions is the example patterns added to the Stability Report for control charts.
 
The Stability Report was excellent in Minitab 16, clearly showing you the out-of-control points in the process:

The Stability Report in Minitab 16 clearly shows out-of-control points.

But the truth is that it’s sometimes hard to move from detecting the out-of-control points to an understanding of what’s going on in the process. That’s where the example patterns in the Assistant come into play.

The Stability Report in Minitab 17 includes example patterns to help you understand your process.

In the Minitab 17 report shown above, you still see the out-of-control points, but the Stability Report also reminds you what kinds of patterns are usually meaningful in a control chart. Even better, if you see one of the patterns in your data, then you can get more information about the pattern from the tooltip on the example chart. For the shift in the process mean that you see above, the tooltip reminds you:

Shifts in the data are noticeable and sustained changes in the process mean or process variation. Shifts can be permanent or temporary and are usually the result of a change to the process. A shift on the I chart indicates a change in the process mean and a shift on the MR chart indicates a change in the process variation. If shifts are present in your process, the estimates of the process mean or variation may not accurately reflect the current or future state of the process.

Typically, shifts are due to  special causes rather than common causes. You should investigate the possible causes for the shifts so that you can address the problem.

The tooltip on the Stability Report helps you to understand your process.

The patterns to look for and the information that Minitab supplies for understanding the patterns are just one more way that the Assistant walks you through every step of your analysis so that you can be confident in the results.

Want to try it out for yourself? The data that I used to illustrate the tooltip is from one of our new Quick Start Exercises that you can use to get on the path of becoming a fearless data analyst. You’ll want to try them all!

 

Why Is There No R-Squared for Nonlinear Regression?

Plot of nonlinear regression model

Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do.

So, what’s going on?

Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. There are bad consequences if you use it in this context.

Why Is It Impossible to Calculate a Valid R-squared for Nonlinear Regression?

R-squared is based on the underlying assumption that you are fitting a linear model. If you aren’t fitting a linear model, you shouldn’t use it. The reason why is actually very easy to understand.

For linear models, the sums of the squared errors always add up in a specific manner: SS Regression + SS Error = SS Total.

This seems quite logical. The variance that the regression model accounts for plus the error variance adds up to equal the total variance. Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%.

In nonlinear regression, SS Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear models, and it no longer has to be between 0 and 100%.
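You can verify this numerically. The sketch below (illustrative only, with simulated data) computes the three sums of squares for a linear fit, where the decomposition holds, and for an exponential fit, where it generally does not.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 3 * np.exp(0.25 * x) + rng.normal(0, 2, x.size)   # simulated data with a curved trend

def ss_parts(y, fitted):
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - fitted) ** 2)
    ss_regression = np.sum((fitted - y.mean()) ** 2)
    return ss_regression, ss_error, ss_total

# Linear least squares: SS Regression + SS Error = SS Total (up to rounding)
slope, intercept = np.polyfit(x, y, 1)
print(ss_parts(y, intercept + slope * x))

# Nonlinear least squares (exponential): the decomposition no longer holds in general
popt, _ = curve_fit(lambda x, a, b: a * np.exp(b * x), x, y, p0=[1.0, 0.1])
print(ss_parts(y, popt[0] * np.exp(popt[1] * x)))
```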

Why Shouldn't You Use R-squared to Evaluate the Fit of Nonlinear Models?

As you can see, the underlying assumptions for R-squared aren’t true for nonlinear regression. Yet, most statistical software packages still calculate R-squared for nonlinear regression. Calculating this statistic in this context is a dubious practice that produces bad outcomes.

Spiess and Neumeyer* performed thousands of simulations for their study that show how using R-squared to evaluate the fit of nonlinear models leads you to incorrect conclusions. You don't want this!

That's why Minitab doesn't offer R-squared for nonlinear regression.

Specifically, this study found the following about using R-squared with nonlinear regression:

  • R-squared tends to be uniformly high for both very bad and very good models.
  • R-squared and adjusted R-squared do not always increase for better nonlinear models.
  • Using R-squared and adjusted R-squared to choose the final model led to the correct model only 28-43% of the time.

Clearly, using R-squared to evaluate and choose a nonlinear model is a bad idea. Additionally, the authors lament the persistence of this practice in some fields of study:

In the field of biochemical and pharmacological literature there is a reasonably high occurrence in the use of R2 as the basis of arguing against or in favor of a certain model. . . . Additionally, almost all of the commercially available statistical software packages calculate R2 values for nonlinear fits, which is bound to unintentionally corroborate its frequent use. . . . As a result from this work, we would like to advocate that R2 should not be reported or demanded in pharmacological and biochemical literature when discussing nonlinear data analysis.
   

If you're already using Minitab, great. However, if you use statistical software that calculates R-squared for nonlinear regression, don’t trust that statistic!

Instead, compare the standard error of the regression (S), and go with the smaller values.

________________________________

Spiess, Andrej-Nikolai, Natalie Neumeyer. An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach. BMC Pharmacology. 2010; 10: 6.

Using nonparametric analysis to visually manage durations in service processes

My main objective is to encourage greater use of statistical techniques in the service sector and present new ways to implement them.

In a previous blog, I presented an approach you can use  to identify process steps that may be improved in the service sector (quartile analysis). In this post I'll show how nonparametric distribution analysis may be implemented in the service sector to analyze durations until a task is completed.

Knowing how much time you need to complete a task may be very useful when assessing process efficiency, and is an important factor in many businesses.

Consider a technical support department (for example Minitab technical support). Queries from users, customers, and potential customers arrive by e-mail, web, and phone calls to operators or specialists. Most of these queries are standard ones and will be answered quickly (depending on the efficiency of individual operators, of course).

But a few queries are much more complex or unusual. These may require referral to another specialist, more research, or additional help from colleagues.

So two modes are likely to be observed in this support department: the usual, standard mode (quick answers, low variability), and a mode for more unusual and complex queries (more time needed, large amount of variability).

Another example is the time it takes a car to be repaired at a mechanic's shop: some parts that are rarely required might not always be available in the shop. When unusual parts are needed, it leads to substantial increases in repair times.

Any company that handles customer complaints and monitors the time it takes to resolve each incident also might be interested in analyzing durations.

Measuring Customer Experience

How can we quantify the customer experience? The mean (or the median) is not really representative in this context: durations do not usually follow a normal distribution. Instead, we are dealing with bi-modal distributions: some customers will receive a quick response, and others will need to wait a lot longer because their problems are more complex or have never been encountered before. We could focus on the largest durations (considering the maximum value, the 95th or the 99th percentile), but these statistics are not really representative, either.

Ideally, we would like to have a global view of durations until completion, in order to compare several areas, branches, teams, services, etc.

A very efficient way to visually manage such processes is to use nonparametric distribution analysis (in Minitab, go to Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Nonparametric Distribution Analysis).

An additional advantage of nonparametric distribution analysis is that censored values may be analyzed as well. For example, in a technical support department, a censored value might typically represent a user who reported an issue and received a suggestion, but never returned to confirm that it worked. In this case, you cannot really be sure that the problem has been successfully solved and that the incident can be closed.

 

Durations to complete customer requests

Graphing Service Duration Times

In the survival plot below, the Y axis represents the number of requests that have not been completed yet (outstanding tasks, percentage of customers who are still waiting for their request to be dealt with) and the X axis represents durations/times.

Clearly Branch C needs more time than Branch A or B. The green curve for Branch C reveals that most requests (the easy ones) are satisfied within the initial 20 hours, but the remaining requests take a lot more time, with a maximum duration of 100 hours. Notice Branch C's very fast reduction (steep slope) at the beginning and its substantial deceleration after 20 hours.

Survival plot

In the cumulative failure plot below, the Y axis represents the percentage of tasks (customer requests) that have been successfully completed (proportion of resolved issues), and the X axis represents the duration/time to resolve an issue. The cumulative failure plot and the survival plot convey exactly the same type of information.

Cumulative plot
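If you're curious how curves like these are built, here's a minimal sketch that computes an empirical survival curve and its cumulative complement from raw completion times. The durations are made up, and this simple version ignores censoring; censored values would require a Kaplan-Meier-style estimate like the one Minitab produces.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up completion times (hours) for a set of customer requests
durations = np.array([2, 3, 5, 8, 12, 15, 18, 19, 22, 40, 65, 100])

t = np.sort(durations)
still_open = 1.0 - np.arange(1, t.size + 1) / t.size   # empirical survival function S(t)

plt.step(t, still_open, where="post", label="still waiting (survival)")
plt.step(t, 1 - still_open, where="post", label="completed (cumulative)")
plt.xlabel("Hours until the request is completed")
plt.ylabel("Proportion of requests")
plt.legend()
plt.show()
```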

Conclusions

Branch C is clearly performing poorly compared to the other branches. Some decisive improvements are obviously needed. In order to reduce the longest durations, unusual or complex requests might follow a different process so that such queries are escalated quickly to the appropriate level.

Nonparametric distribution analysis is usually used to analyze reliability data, but now you've seen how this tool can be implemented for a completely different purpose.

 

How to Handle Extreme Outliers in Capability Analysis

Transformations and non-normal distributions are typically the first approaches considered when the normality test fails in a capability analysis. These approaches do not work when there are extreme outliers because they both assume the data come from a single common-cause variation distribution. But because extreme outliers typically represent special-cause variation, transformations and non-normal distributions are not good approaches for data that contain extreme outliers.

As an example, the four graphs below show distribution fits for a dataset with 99 values simulated from a N(μ=10, σ=1) distribution and 1 value simulated from a N(μ=18, σ=1) distribution. Two of the probability plots shown in these graphs assess the fit to a Normal distribution after transforming the data. The remaining fourteen probability plots assess the fit to common (and some uncommon) non-normal distributions. None of the fits are adequate.
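As a quick illustration of why a transformation doesn't rescue this kind of data, here's a sketch that simulates a similar dataset and runs a normality test before and after a Box-Cox transformation. This is my own simulation, not the analysis behind the plots below; the outlier typically still causes the test to reject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(10, 1, 99),    # common-cause variation
                       rng.normal(18, 1, 1)])    # a single extreme outlier

_, p_raw = stats.shapiro(data)
transformed, lam = stats.boxcox(data)            # Box-Cox requires positive data
_, p_boxcox = stats.shapiro(transformed)

print(f"Shapiro-Wilk p-value, raw data:      {p_raw:.4f}")
print(f"Shapiro-Wilk p-value, after Box-Cox: {p_boxcox:.4f} (lambda = {lam:.2f})")
```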

Distribution Plots

Probability Distribution Plots

Probability Distribution Plots

Probability Distribution Plots

Method for Calculating Defect Rate

For process data with common cause variation that follows a Normal distribution, a reasonable approach for modeling extreme outliers is to assume the outliers represent a shift in the mean of the distribution, as shown in the next three graphs.

In most cases, there will not be enough data to measure the variation in the outlier distribution, so the variation in the outlier distribution will need to be assumed unchanged from the common cause distribution. This is consistent with the approach used in classic regression and ANOVA analyses, which assume a change in the mean does not affect the variation.

The defect rate can be estimated by using a weighted average of the defect rates from the common cause distribution and the outlier distribution. The weights come from the sample sizes for each distribution.

In the example below, this calculation would be as follows (POS = Probability of Out of Spec):

Defect Rate = [99*(POS Common Cause Dist.)+1*(POS Outlier Dist.)]/100

Histogram of distributions

In the following two examples, this calculation would be as follows:

Defect Rate = [98*(POS Common Cause Dist.)+2*(POS Outlier Dist.)]/100

 

Histogram of outliers

Outliers histogram
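Here's a minimal sketch of that weighted calculation, assuming normal distributions and made-up specification limits. The limits and parameters below are hypothetical, chosen only to show the mechanics.

```python
from scipy.stats import norm

lsl, usl = 6.0, 14.0            # hypothetical specification limits
mu_common, sigma = 10.0, 1.0    # common-cause distribution
mu_outlier = 18.0               # outlier distribution (same sigma assumed)
n_common, n_outlier = 99, 1     # observations attributed to each distribution

def prob_out_of_spec(mu, sigma):
    """P(X < LSL) + P(X > USL) for a normal distribution."""
    return norm.cdf(lsl, mu, sigma) + norm.sf(usl, mu, sigma)

pos_common = prob_out_of_spec(mu_common, sigma)
pos_outlier = prob_out_of_spec(mu_outlier, sigma)

# Weighted average of the two out-of-spec probabilities, weighted by sample size
defect_rate = (n_common * pos_common + n_outlier * pos_outlier) / (n_common + n_outlier)
print(f"Estimated defect rate: {defect_rate:.6f}")
```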

Important Considerations When Dealing with Extreme Outliers

It is critical to investigate extreme outliers and attempt to understand what caused them. The outlier(s) may be measurement errors or data entry errors, in which case they do not represent the true process and should be appropriately adjusted.

If they are legitimate values, your number one priority should be to prevent future outliers from occurring and strive for process stability.

When you have special-cause variation in a capability analysis, you should not assume the defect rate estimated from the data represents the future defect rate of the process, no matter what approach you used.  However, you may be able to get a rough estimate of the defect rate during the sampling period as long as the sample adequately represents the process during that time period.

 

Analyzing “Luck” in College Basketball: Part II

Luck and basketballTwo months ago, I used Ken Pomeroy’s luck statistic to analyze the “luckiest” and “unluckiest” teams in college basketball. What Ken’s luck statistic is really looking at is close games. If you win most of your close games, you'll have a high luck statistic in the Pomeroy Ratings. Lose most of your close games, and your luck statistic will be low.

I looked at the winning percentages in close games of the 20 luckiest teams, 20 unluckiest teams, and 20 teams right in the middle. Sure enough the lucky group won most of their close games, the unlucky group lost most, and the middle group won just about half.

But now that two months have passed, I want to take those same teams and see how they’ve done in close games since. If winning (or losing) close games is really a skill, then the lucky and unlucky groups should continue to win and lose close games (and we'll change their names to "clutch" and "chokers"). But if close games truly are luck, then we would expect all three groups to have winning percentages close to .500.

The Analysis

For the same 60 teams used in the previous analysis, I noted their record in games decided by 6 points (2 possessions) or less, or games that went into overtime (regardless of the final score, since at the end of regulation it was obviously a close game). I also split out games decided by 3 points (1 possession) or less. Again, all overtime games were included in the 1 possession group regardless of the final score. You can get the data I used here.

Now we can compare each group’s winning percentage before January 19th (the date of my first analysis) and after January 19th.

Tabulated Statistics

Tabulated Statistics

Look how much the winning percentages have changed. After winning 80% of their close games before January 19, our lucky group has barely won half since. And it appears that our unlucky group has stopped choking away close games, as their winning percentage went from the teens to the forties!

One might still argue that in games decided by 2 possessions or less, the lucky group still has a big advantage (.5348 versus .4273) over the unlucky group. A 2-sample t test can tell us if this difference is significant.

2-sample t test

The p-value of .213 is greater than 0.05, so we cannot conclude that the difference in winning percentage in close games between the two groups is significant. So overall we can conclude there just isn't any skill in winning or losing close games. We can further show this by plotting each team's winning percentage on a dotplot.
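Before we get to the dotplots, here's a quick sketch of how a 2-sample t test like the one above could be run in code, using made-up winning percentages rather than the actual team data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical winning percentages in close games for two groups of teams
lucky_pct   = np.array([0.67, 0.50, 0.40, 0.75, 0.55, 0.33, 0.60, 0.50])
unlucky_pct = np.array([0.25, 0.50, 0.60, 0.33, 0.45, 0.40, 0.50, 0.38])

t_stat, p_value = ttest_ind(lucky_pct, unlucky_pct)   # pooled-variance 2-sample t test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# If p is above 0.05, we cannot conclude the two groups really differ.
```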

Dotplot

Dotplot

There is no discernible pattern between the groups. Since January 19, each group has teams that have lost a lot of close games, won a lot of close games, and everything in between.

Applying These Findings to March Madness

This shows that you shouldn’t overreact to the outcome of a close game in college basketball (and the same is probably true in most other sports too). Once you get down to the final minute, whether you win or lose has more to do with the bounce of the ball than your actual skill level. So regardless of the outcome, if you went into Cameron Indoor and played a tight game against Duke, you played very well. And if you played a close game at home against Prairie View A&M, you shouldn’t be happy with your performance.

Looking ahead to March Madness, two highly rated teams are currently in the top 20 of Ken Pomeroy’s luck statistic. They are Villanova and San Diego St. Combined, those two teams are 15-2 in games decided by 2 possessions or less or overtime games. Both teams could be as high as a 2 seed. Considering that we know neither of those teams “knows how to win in the clutch,” they’re both probably being rated too high. It’s not that either team is bad; it’s just that if they had a record closer to .500 in their close games, they’d have a few more losses and would drop a few seed lines.

The typical comeback to that is “But they did win those games!” Yes, they did, and I agree that they should be seeded higher since they did win those games.

But we’ve shown here that their higher seed is not due to any additional skill either team possesses. It’s merely due to the fact that they got lucky in their close games. And in March, that luck can run out at the most inopportune time, as Seton Hall just showed Villanova in the Big East tournament.

Buyer beware.

Who's More (or Less) Irish?

St. Patrick's Day

B'gosh n' begorrah, it's St. Patrick's Day today!

The day that we Americans lay claim to our Irish heritage by doing all sorts of things that Irish people never do. Like dye your hair green. Or tell everyone what percentage Irish you are.

Despite my given name, I'm only about 15% Irish. So my Irish portion weighs about 25 pounds. It could be the portion that hangs over my belt due to excess potatoes and beer.

Today, many American cities compete for the honor of being "the most Irish." Who deserves to take top honors? Data from the U.S. Census Bureau can help us decide.

The Minitab bar chart below shows the percentage of people with Irish ancestry in major U.S. cities.

The reference line at 11.1% shows the national average. Any city above that has the right to wear green on its bar. (My place of birth, Minneapolis, comes in just below the national average. Close, but no green cigar!)

It's surprising that Bostonians are out-Irished, percentage-wise, by Pittsburghers. But die-hard Gaels from Beantown can take comfort in the margin of error for these estimates.

For Pittsburgh the actual U.S. Census Bureau estimate is 16.0% ± 0.7%. For Boston, the estimate is 15.5% ± 0.5%. So, statistically speaking, neither city can claim with confidence that it's the most Irish of large cities in the U.S.
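If you want to make that comparison a bit more formal, here's a rough sketch. It assumes the Census Bureau's margins of error are reported at 90% confidence (which is typical for ACS estimates), converts them to standard errors, and runs an approximate two-sample z comparison.

```python
from math import sqrt
from scipy.stats import norm

pgh_est, pgh_moe = 16.0, 0.7     # Pittsburgh: % with Irish ancestry, margin of error
bos_est, bos_moe = 15.5, 0.5     # Boston

# Assume the margins of error correspond to 90% confidence: MOE = 1.645 * SE
se_pgh = pgh_moe / 1.645
se_bos = bos_moe / 1.645

z = (pgh_est - bos_est) / sqrt(se_pgh**2 + se_bos**2)
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")   # a large p-value: no clear winner
```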

New Yorkers and Chicagoans could also take issue with the above chart. After all, you could argue that it's sheer brute numbers, rather than percentages, that give cities their Irish heritage heft. The Minitab bar chart below shows they'd have a point.

The reference line represents the population of Limerick, the 3rd largest city in the Republic of Ireland.

The number of those with Irish ancestry in the Big Apple comprises a large city by itself (≈ 400,000). Together, New York and Chicago have more citizens with Irish ancestry (≈ 600,000) than the city of Dublin (≈ 525,000).

Based on this chart, even lads and lassies from Phoenix can proudly dye their hair kelly green. Although they probably won't have much luck looking for four-leaf clovers in the desert.

Notice that only Philadelphia and Boston get to wear green in both bar charts!

Note: If you want to find out whether your city can wear green on either bar chart, download a Minitab project with the data here. Then go to U.S. Census Bureau American FactFinder and use the Advanced Search for race and ancestry for your city. In the Search results, use the 2012 ACS 3-year estimates for Selected Social Characteristics in the U.S. In Minitab, add the name of your city and the % and count estimates for Irish ancestry to the worksheet. Then right-click the bar chart and select Update graph now. If your city deserves to wear green, double-click the bar and change the color.

All the world's a stage and most of us are desperately unrehearsed.
                                                                              ~ Sean O'Casey

If we had more time, we could debate these estimates further over a green beer.

What happens if you include all the surrounding metropolitan areas? Or people with Scotch-Irish ancestry? Is self-reported ancestry in a survey even accurate? (Not if you ask people about their Irish ancestry today!)

But ultimately it's not really about the numbers. It’s about what’s in the heart.

That's what makes all Americans part Irish. And part Chinese …part Lebanese…part Nigerian …part Navajo...part Mexican...part Swedish...part Filipino…

And gives us the true spirit with which to say... E pluribus unum... and Happy St. Patrick’s Day!


Predicting the 2014 NCAA Tournament

Once again it’s time for the madness of March to begin! Which teams have the best shot of going to the final four? Is there a team that might become this year’s Florida Gulf Coast? And do any of the 16 seeds have a realistic shot of beating a 1 seed? Well sit back, because we’re going to answer all of that and more!  Somebody tell Cinderella to get her glass slippers, it’s time to go dancing!

Which Ranking System to Use

Before we get to the bracket, we need to decide on which ranking system to use. Because we want to use these rankings for predicting future outcomes, we want a system that uses scoring margin to rank the teams, since it is the best predictor of future performance. So anything that focuses on wins/losses is out (sorry, RPI and BPI). Last year I tracked 4 different ranking systems, and the Sagarin Predictor came out on top. I did a similar analysis this year using the Pomeroy Ratings, two different Sagarin score based rankings (golden mean and predictor), and the LRMC rankings.

I tracked over 1,200 college basketball games from February 1st until Selection Sunday. For each game, I calculated the probability of the favorite winning (probabilities for Pomeroy came right from his site and probabilities for the other rankings were calculated using a regression model). Then I binned the probabilities into groups (50% to 59%, 60% to 69%, and so on) and compared the difference between the predicted probability and the observed probability of each group. The graph below shows the results.

Bar chart

The most accurate ranking systems will have small differences between the probabilities, so small bars are best. We can see the LRMC ratings have the largest difference in 3 different groups, including a 7.5% difference in the 70% to 79% group. So we won’t be using them!
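As an aside, here's a minimal sketch of this kind of calibration check, binning predicted probabilities and comparing them to observed win rates. The games below are simulated, not the 1,200 real games I tracked.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
pred = rng.uniform(0.5, 1.0, 1200)      # predicted P(favorite wins) for each game
won = rng.binomial(1, pred)             # simulated outcomes (favorite won or not)

bins = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
groups = pd.cut(pred, bins, right=False)

calib = (pd.DataFrame({"pred": pred, "won": won})
           .groupby(groups, observed=True)
           .agg(predicted=("pred", "mean"), observed=("won", "mean"), games=("won", "size")))
calib["difference"] = calib["predicted"] - calib["observed"]
print(calib)   # a well-calibrated system has small differences in every bin
```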

However, there isn’t a clear winner between the other three. If you add up all the differences, the Pomeroy ratings have the smallest total difference, although the two Sagarin systems aren’t very far behind. But here’s the thing: Ken Pomeroy did the same analysis I’m about to do using his ratings. So instead of duplicating his numbers here, I’m actually going to use the Sagarin Predictor rankings. They worked out pretty well last year (they had Louisville ranked as the #1 team). And along the way I’ll point out when the three rankings disagree on a team so you can choose which one to trust!

Okay, enough with the boring talk, let’s get to the brackets!

South Region

The table below indicates the probability each team in the South Region has of advancing in each round up to the Final Four. (Odds of winning the tournament have their own section at the bottom of this post.) The Rank indicates the team’s Sagarin Predictor rank and the number in parentheses indicates each team’s seed.

Rank  Team                 2nd Round  Sweet 16  Elite 8  Final 4
  3   (1) Florida             96.3%     74.6%    56.5%    35.9%
  4   (2) Kansas              94.0%     72.8%    52.2%    30.1%
 14   (6) Ohio St             74.4%     46.3%    19.3%     8.1%
 16   (4) UCLA                76.6%     47.3%    17.0%     7.5%
 22   (3) Syracuse            82.7%     40.9%    14.6%     5.2%
 23   (5) VCU                 75.6%     38.3%    11.9%     4.5%
 19   (8) Pittsburgh          69.6%     20.1%    10.8%     4.3%
 33   (7) New Mexico          51.1%     13.5%     5.9%     1.6%
 31   (10) Stanford           48.9%     12.5%     5.4%     1.6%
 61   (11) Dayton             25.6%      9.7%     2.1%     0.4%
 58   (9) Colorado            30.4%      4.9%     1.7%     0.4%
 76   (13) Tulsa              23.4%      8.0%     1.3%     0.2%
 91   (12) SF Austin          24.4%      6.4%     0.9%     0.1%
129   (14) W Michigan         17.3%      3.1%     0.4%    <0.1%
150   (15) E Kentucky          6.0%      1.1%     0.2%    <0.1%
190   (16) Albany              3.7%      0.5%     0.1%    <0.1%

  
Florida and Kansas are the clear favorites to win the South region. They are the only teams in the region ranked in the top 13 of the Sagarin rankings! However, keep in mind that the computers can’t account for injuries, and Kansas will be without 7-foot center Joel Embiid for at least the first two games. Without him, Kansas is definitely not the 4th best team in the country, and if he’s out for more than the first weekend of the tournament, Kansas could be in trouble.

In addition to the Embiid situation, Kansas is ranked lower in both the Golden Mean (5th) and Pomeroy (9th), whereas Florida is still 4th and 3rd respectively. So those systems would have Florida as an even heavier favorite. The safest pick here is to go with Florida, but even without Embiid Kansas is still a really good team, so no shame if you want to go with the Jayhawks.

If you’re looking for a dark horse, the statistics like Ohio State. They’re a pretty big favorite in their first round game for a 6 seed, and both Sagarin rankings would actually favor the Buckeyes in a potential second round matchup against Syracuse (although Pomeroy would have them as slight underdogs). And their numbers would improve even more if they face an Embiid-less Kansas in the Sweet 16 (or if Kansas has already been knocked off by New Mexico or Stanford).

UCLA has the 4th best shot, but they’ll be pretty big underdogs to Florida if they face them. But one quick point about their potential 2nd round game against VCU. VCU thrives when they are forcing turnovers (they’re #1 in the country in defensive turnover percentage), but UCLA ranks 16th in the country in offensive turnover percentage. So just like Michigan was in the NCAA tournament last year, UCLA appears to be a bad matchup for VCU. 

If you’re looking for 1st round upsets, this doesn’t appear to be the region for you. The 1 through 6 seeds all have at least about a 75% chance of winning. However, keep in mind that if you multiply all the probabilities together, the odds that they all win are only 32%. So chances are, at least one of them will lose. The question is which one will it be? Guess away!

One final note: you’ll see that Pittsburgh is a pretty heavy favorite in the 8/9 game. Most people picking in your “just for fun” office pool will probably assume that game is a coin flip. So if you want to gain an edge (a very small one, but an edge nonetheless), go ahead and pick Pitt.

East Region

Rank  Team                 2nd Round  Sweet 16  Elite 8  Final 4
  5   (2) Villanova           94.4%     71.7%    51.1%    29.9%
  6   (1) Virginia            96.7%     76.7%    47.4%    27.5%
  8   (4) Michigan St         88.2%     62.6%    33.4%    18.2%
 15   (3) Iowa St             80.2%     51.4%    22.0%     9.6%
 25   (7) Connecticut         66.3%     20.7%    10.3%     3.7%
 29   (6) North Carolina      59.0%     27.0%     9.0%     3.1%
 30   (5) Cincinnati          53.9%     19.3%     6.7%     2.4%
 37   (12) Harvard            46.1%     15.1%     4.8%     1.5%
 45   (8) Memphis             53.0%     12.6%     4.1%     1.2%
 47   (11) Providence         41.0%     15.4%     4.1%     1.1%
 52   (9) G Washington        47.0%     10.3%     3.1%     0.8%
 60   (10) St. Joseph’s       33.7%      6.7%     2.3%     0.5%
 90   (14) NC Central         19.8%      6.1%     1.0%     0.2%
113   (13) Delaware           11.8%      2.9%     0.4%     0.1%
171   (15) Milwaukee           5.6%      0.9%     0.1%    <0.1%
237   (16) Coastal Carolina    3.3%      0.4%    <0.1%    <0.1%


The East is a much more open race than the South region. Villanova is the favorite to advance, but Virginia and Michigan St aren’t too far behind. And speaking of Michigan St, are you ready for some more injury talk? The Spartans had a plethora of injuries this season, with multiple key players missing time. The computers don’t know this, so the Spartans are probably underrated in all of the rankings. But keep in mind that underrated is still ranked 8th! (MSU is 9th in golden mean and 10th in Pomeroy, so they’re all pretty similar). Now, all of Michigan State’s players are healthy, and they looked very impressive in winning the Big 10 tournament. You could easily make the argument that if they were healthy all year, they’d be ranked higher than both Villanova and Virginia. So their probabilities in this table are too low (and their potential opponents' probabilities are too high). But how much higher Michigan State’s chances should be is very subjective and hard to quantify.

But don’t rule out Virginia just yet; they’re still a very good team. With Cincinnati being a pretty weak 5 seed, and neither 8/9 seed being very good, it’s very likely that Michigan State and Virginia will meet in the Sweet Sixteen. That should be a great game, and you shouldn’t feel bad picking either team to win.

In the bottom half of the bracket, Villanova gets the benefit of there not being any other great teams to go up against them. The only other team ranked in the top 20 is Iowa St (more on them in a second), and there is only a 50% chance they’ll even have to play them in the Sweet Sixteen! Although you’ll see that 7 seeded Connecticut is ranked higher than either the 6 seed or the 5 seed (also true in both golden mean and Pomeroy). They could present a tough 2nd round matchup for Villanova. If you want to pick some chaos in this region, you could do worse than Connecticut over Villanova. And also keep in mind that Villanova had some very good luck in close games (8-1 in 2 possession/OT games). And as the Big East tournament showed, that luck can run out at any time. 

Speaking of chaos, I want to talk about Iowa State for a minute. They are a heavy favorite over NC Central here, but the golden mean and Pomeroy rankings both have Iowa State ranked lower and NC Central ranked higher. And, if you read John Gasaway’s Tuesday Truths, you’ll know that over 3,750 Big 12 possessions, Iowa State had the same scoring differential as Texas and Kansas St (ranked 38th and 40th respectively in Sagarin Predictor). Don’t be surprised if NC Central keeps that game close, and you’d be wise to avoid putting Iowa State too far in your bracket. Every ranking system has outliers, and it very well may be that even at 15th, the Sagarin Predictor has Iowa State a little too high.

Looking for a first round upset? Let me present to you Harvard over Cincinnati. Cincinnati will be the favorite, but this is definitely a winnable game for Harvard. And considering Michigan State would be a heavy favorite over Cincinnati in the next round anyway, you have very little to lose!

West Region

Rank  Team                 2nd Round  Sweet 16  Elite 8  Final 4
  2   (1) Arizona             96.2%     70.1%    57.7%    42.4%
  9   (3) Creighton           88.2%     62.6%    37.8%    16.8%
 11   (2) Wisconsin           85.8%     57.6%    31.9%    13.4%
 10   (9) Oklahoma St         61.2%     20.1%    13.6%     7.5%
 24   (4) San Diego St        72.6%     41.4%    10.9%     4.7%
 26   (5) Oklahoma            67.6%     37.1%     9.4%     3.9%
 27   (7) Oregon              59.0%     24.8%    10.4%     3.1%
 28   (6) Baylor              62.8%     24.0%    10.5%     3.1%
 21   (8) Gonzaga             38.8%      9.4%     5.4%     2.4%
 44   (10) BYU                41.0%     14.0%     4.8%     1.1%
 55   (11) Nebraska           37.2%     10.4%     3.3%     0.7%
 66   (12) N Dakota St        32.4%     12.2%     1.8%     0.5%
 80   (13) New Mexico St      27.4%      9.3%     1.2%     0.3%
112   (15) American           14.2%      3.6%     0.7%     0.1%
119   (14) Lafayette          11.8%      3.0%     0.5%     0.1%
167   (16) Weber St            3.8%      0.4%     0.1%    <0.1%


Arizona has a big advantage in this region because they have very weak 4 and 5 seeds. In fact, they’re going to face a tougher team in their 2nd game than they will in the Sweet 16! Both Gonzaga (8 seed) and Oklahoma St (9) are ranked higher than the 4 and 5 seeds! In fact, Oklahoma St has the 4th best chance of going to the Final Four…as a 9 seed! What is going on?

Oklahoma State lost 12 games this season, but an amazing ten of them were by 2 possessions or less! And in one of the 2 blowout losses, they were without their best player, Marcus Smart. Overall Oklahoma State went 4-10 in 2 possession/OT games, making them much better than their record would indicate (they’re ranked 342 in Pomeroy’s luck statistic). Sagarin predictor has them at #10 and golden mean has them at #12. However, I should note that Pomeroy only has them at #22, so not all the computers agree that they are a top 12 team.

The problem with taking Oklahoma State to go far is that Arizona is really good (plus don’t sleep on Gonzaga, Pomeroy actually has the Zags ranked higher than Oklahoma State). Arizona was #1 in all 3 rankings most of the season, but they fell to 2nd in both Sagarin rankings the last weekend of the season (they’re still #1 in Pomeroy). If you really want to get crazy, it’s not unthinkable to have Oklahoma State (or even Gonzaga) upsetting Arizona, but the safe play is to put Arizona to at least the regional finals.

In the bottom half of the region, all the ranking systems agree that Creighton is a better team than Wisconsin (although by a slim margin). So Creighton has slightly better odds of advancing to the regional finals. But let’s focus on Wisconsin for a minute. They have only a 57.6% chance of making the Sweet Sixteen. That’s pretty low for a 2 seed. They face a better-than-you-think opponent in American in the opening round. The Sagarin predictor gives American a 14% chance of pulling the upset. For reference, that’s about the same chance as Florida Gulf Coast had of beating Georgetown last year. And that probability only goes up if you use Pomeroy. He gives American an 18% chance of winning! If you want to pick a crazy upset (maybe you get bonus points in your pool for upsets), this is your best bet for a 2/15 upset.

If Wisconsin beats American, they’ll most likely have to face a tough Oregon team. Oregon has to face BYU in the first round, but they have an even better chance of advancing than shown here. BYU had starting guard Kyle Collinsworth tear his ACL in BYU’s last game. The percentages above don’t account for that (and BYU is an underdog anyway), so avoid picking BYU in your bracket. Assuming they beat BYU, Oregon definitely has the ability to knock off Wisconsin in the 2nd round.

In the land of first round upsets, that 12/5 game is coming up once again. North Dakota State has about a 1 in 3 chance of knocking off Oklahoma (same in all 3 ranking systems). And they’d have about the same chance of beating San Diego State in the next round too! That gives about a 1 in 9 chance that the name Taylor Braun will be very well known by the end of the weekend!

Midwest Region    

Rank  Team                 2nd Round  Sweet 16  Elite 8  Final 4
  1   (4) Louisville          93.5%     83.4%    68.8%    53.4%
  7   (3) Duke                86.9%     58.8%    37.5%    15.1%
 12   (2) Michigan            93.1%     62.7%    31.4%    10.7%
 17   (1) Wichita St          89.2%     49.8%    13.5%     6.2%
 17   (11) Tennowa!           68.5%     29.4%    15.2%     4.6%
 18   (8) Kentucky            63.6%     34.0%     9.0%     4.1%
 34   (10) Arizona St         52.1%     19.4%     6.7%     1.5%
 38   (7) Texas               47.9%     17.0%     5.6%     1.2%
 42   (5) St. Louis           51.8%      7.5%     2.8%     0.9%
 40   (9) Kansas St           36.4%     14.8%     2.7%     0.9%
 46   (12) Xavier             48.2%      6.6%     2.4%     0.7%
 50   (6) Massachusetts       31.5%      8.6%     2.9%     0.5%
 81   (13) Manhattan           6.5%      2.4%     0.6%     0.1%
 95   (14) Mercer             13.1%      3.3%     0.7%     0.1%
169   (16) Cal Poly           10.8%      1.5%     0.1%    <0.1%
198   (15) Wofford             6.9%      0.9%     0.1%    <0.1%


This is going to be a fun one! Let’s start with the play-in games. I’m using the numbers for Xavier and Cal Poly here since they are the higher ranked teams. But for Iowa/Tennessee, I took their average ranking (Iowa is 13, Tennessee is 20, and I rounded up) and called that team Tennowa. Tennowa is a very dangerous team, but we’ll get to them later.

Let’s start with the big one, Wichita State. The Sagarin ratings are not a fan, as they’re 11th in golden mean and 17th in predictor. Seventeenth! However, Pomeroy has them at #5. Obviously if you put them at #5, their odds become much better than what’s shown here and Louisville’s get worse. The truth is that it’s really hard to judge how good this team is. A case like Wichita State doesn’t happen often, so there is very little history to go off of. And the thing is, this tournament isn’t going to prove things one way or another either. An early loss doesn’t mean they were bad, and a deep run doesn’t prove that they’re great. It’s a single elimination tournament where crazy things happen. We do know that this Wichita State team is better than the one that went to the Final Four last year, but no matter which ranking system you use, their chances of doing that again are small.

The main reason for those small chances is Louisville. Ranked #1 in both Sagarin rankings and #2 in Pomeroy, they’d be favored over Wichita no matter how you slice it. If you’re not a believer in the Shockers, Louisville is a no-brainer to advance to at least the regional finals. And to add to that, predictor rankings say Wichita would only be a 51% favorite over Kentucky if they met in the 2nd round. So picking Kentucky to reach the Sweet Sixteen wouldn’t be a bad selection.

In the bottom half of the bracket, things could get crazy. Let’s start by pointing out that Tennowa is going to be very heavy favorites over Massachusetts. All three ranking systems agree you should avoid picking the Minutemen to advance.

Meanwhile, Michigan is the weakest 2 seed in the field, but they lucked out in that the 15 seed is very weak (they can be glad it’s not American) and neither Texas nor Arizona State is ranked very high. But still, Michigan is ranked so low that the winner of the 7/10 game will have a realistic shot of pulling the upset (about 1 in 3).

As for Duke, they are the highest ranked 3 seed but they have a pretty tough road. Their opening matchup against Mercer isn’t a gimme (Mercer is in the 90s for both Predictor and Pomeroy but actually 73rd in golden mean). Then they’ll most likely have to face Tennowa, which could be a higher ranked team than any of the other top seeds have to face in the 2nd round. And if you think Iowa being ranked 13th is ridiculous, know that Tennessee is 13th in Pomeroy! Now both of those teams are likely outliers, and not really top-15 good. But they’re still much better than your average 11 seed. Either way, Duke fans should be rooting hard for Massachusetts in the opening round!

As for first round upsets, I’ve already mentioned that you should take the 11 seed over Massachusetts. But you’d be inclined to take the 12 seed too (especially if it’s Xavier). Saint Louis is a very weak 5 seed no matter which ranking system you use, and they will definitely be on upset alert early on.

And one more thing before we move on to the Final Four. Assuming they win the play-in game on Wednesday, look at Cal Poly’s chances of advancing to the 2nd round. They’re about 11%. For a 16 seed, that’s about as good as you’re going to get. Now keep in mind this assumes Wichita State is the 17th best team, so if you believe they are better than that, ignore this paragraph. But if they’re not, Cal Poly would have a realistic shot at becoming the first ever 16 seed to beat a 1 seed. Friday night at 7 p.m. Make sure to keep an eye on it!

Final Four

Rank  Team              Final Four  Semifinals  Champion
  1   (4) Louisville        53.4%       36.3%     25.2%
  2   (1) Arizona           42.4%       23.7%     14.9%
  3   (1) Florida           35.9%       22.1%     11.3%
  4   (2) Kansas            30.1%       17.5%      8.4%
  5   (2) Villanova         29.9%       15.1%      6.9%
  6   (1) Virginia          27.5%       14.2%      6.1%
  8   (4) Michigan St       18.2%        9.3%      3.7%
  7   (3) Duke              15.1%        7.0%      3.4%
  9   (3) Creighton         16.8%        6.5%      2.9%
 11   (2) Wisconsin         13.4%        4.9%      2.0%
 12   (2) Michigan          10.7%        4.2%      1.7%
 15   (3) Iowa St            9.6%        4.2%      1.4%
 10   (9) Oklahoma St        7.5%        2.8%      1.2%
 14   (6) Ohio St            8.1%        3.3%      1.1%
 16   (4) UCLA               7.5%        3.0%      1.0%
 17   (1) Wichita St         6.2%        2.1%      0.8%

Louisville is the #1 ranked team in the Sagarin Predictor, so it follows that they have the best probability of winning the entire thing. But don’t think of this as a prediction that they will win. In fact, this is saying that there is a 75% chance that they won’t win. If you think the computers are overestimating Louisville (or if you just want to pick somebody else), Arizona and Florida are fine choices. And keep in mind that Pomeroy still has Arizona ranked #1 (although Louisville isn’t far behind at #2). Plus Michigan State has a better chance than the numbers here indicate.

One last point I’d like to make is that everybody loves to complain about teams being under- or over-seeded. But look how the probability of each team winning the tournament almost lines up perfectly with their ranking in the Sagarin predictor. The best teams have the best odds, regardless of the seed. Louisville doesn’t suffer at all from being a 4 seed (in fact, the committee probably did them a favor putting them in a region with Wichita as the 1 and Michigan as the 2). Even Oklahoma State and Ohio State aren’t hurt too much by being a 9 and 6 seed!

And remember, in a single game, the best team doesn’t always win. If we could predict this entire thing perfectly with mathematics, it would be really boring and Warren Buffett wouldn’t be offering a billion dollars for a perfect bracket. So use the statistics as a guide, but have some fun with it too. Sit back, relax (though I'm not sure if that's possible during the tournament), and enjoy the madness!

Create a DOE Screening Experiment with the Assistant in Minitab 17

If you’ve been looking at Minitab 17, you’ve noticed a lot of new enhancements. For me, the biggest of these is the addition of Design of Experiments (DOE) to the Assistant. DOE in the Assistant has so many exciting aspects it’s hard to take it all in at once, but here are 5 highlights for when you plan and create a screening experiment:

1. Just-in-time guidance

If you’re lucky, you’ve had the chance to study DOE with an expert. If not, even the flow chart that opens with the Assistant to plan an experiment might seem intimidating. Fortunately, you don’t have to go scouring the thrift store for used textbooks. Select Assistant > DOE > Plan and Create and click any of the tan shapes in the flow chart. The Assistant gives you the critical information you need to proceed.

Say, for example, that you aren’t sure what your first step should be. Click Plan the experimentation process and the Assistant reminds you that a good place to start is by making a list of potential factors. If you want help with that, you can use a fishbone diagram in Quality Companion or choose Stat > Quality Tools > Cause and Effect.

The Assistant provides guidance on each element in the flow chart.

2. Simplified decisions

Minitab Statistical Software is built for flexibility, so if you use the DOE tools accessible from the Stat menu, you have the option to choose from over 250 base screening designs. That flexibility is a powerful feature to have access to, but it's more than most people will need to do their jobs every day.

In the Assistant, the steps have been streamlined to make the work that you do every day faster. The Assistant offers 14 screening designs that will cover most of the typical situations you’ll encounter. On the flow chart, click Create Screening Design and you’re on your way.

The number of factors and the number of runs are the only decisions for setting up a screening design in the Assistant.

3. Print the form

As soon as you click OK to create your design, the Assistant asks whether you want to print a data collection form. If you're entering your data directly into the worksheet, you may not need to print. But if you'll be collecting data manually (for instance, if your experiment happens in a place where you don't have access to your computer), the Assistant takes care of spacing, orienting, and titling the form so that you don’t have to. Take a look at the difference!

Print the worksheet from Minitab with no changes:

The printed worksheet is on two pages.

Let the Assistant do it for you:

The Assistant prints the form on one page, with a title, and plenty of space to write.

4. Detection Ability

You wouldn’t know this unless you read the white paper, but one of the great things about the 14 screening designs in the Assistant is that they were chosen with their ability to detect effects in mind. That detection ability, which you might know as “statistical power” if you had the chance to study DOE with an expert, is an important feature that makes sure that your designed experiment isn’t a waste of time. The least powerful screening design the Assistant offers has an 80% chance of detecting an effect just a bit larger than 2 standard deviations (a moderate size effect). The rest all have a better than 80% chance of detecting an effect smaller than 2 standard deviations.

That detection ability means that when you choose a design the Assistant offers, you can be confident you’re planning an experiment that’s going to find important effects. The Summary Report that’s printed when you create the design even tells you the effect size you can detect 60% of the time and 80% of the time. For example, for 6 factors in 12 runs, you can detect an effect of 1.68 standard deviations 80% of the time. 

The design can detect a 1.68 standard deviation effect 80% of the time.
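
If you're curious about the kind of math behind a statement like that, the sketch below shows a generic noncentral-t power calculation for a single two-level effect. To be clear, this is a simplified illustration, not the Assistant's method: the even run split, the error degrees of freedom, and the alpha level are all assumptions here, and the Assistant's actual approach is documented in its white paper.

```python
# Rough sketch: power to detect a two-level factorial effect of size delta
# (in standard-deviation units) with a t-test. The even run split, the error
# degrees of freedom, and the alpha level are simplifying assumptions here,
# so don't expect this to reproduce the Assistant's white-paper figures.
from scipy import stats

def effect_power(delta, n_runs, df_error, alpha=0.05):
    se = 2.0 / n_runs ** 0.5                       # SE of the effect, in sigma units
    ncp = delta / se                               # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)  # two-sided critical value
    return (stats.nct.sf(t_crit, df_error, ncp)
            + stats.nct.cdf(-t_crit, df_error, ncp))

# Hypothetical example: 12 runs, 5 error df, effect of 2 standard deviations
print(round(effect_power(2.0, 12, 5), 2))
```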

5. Pre-Experiment Checklist

There’s more to successful DOE than knowing how to check your residuals. That’s what the pre-experiment checklist is for. The checklist reminds you about important steps that don’t necessarily have to be done in Minitab, like making sure people helping to run the experiment agree on how to set the factor levels and performing trial runs if you have the time and money. View the checklist by clicking the link at the bottom of the Report Card after you create the design.

 You can view the pre-experiment checklist from the report card.

Whether you need in-depth guidance or a quick review, the pre-experiment checklist makes sure that you can collect your data confidently.

The pre-experiment checklist reminds you about best practices before you collect your data

Get started

There’s a lot to be excited about if you’re moving from Minitab 16 to Minitab 17, and we know a lot of people are excited about the DOE features in the Assistant. If you’re eager to explore the new version, start with the free trial and the quick start exercises so that you can see even more.

Five Ways to Make Your Control Charts More Effective


Have you ever wished your control charts were better?  More effective and user-friendly?  Easier to understand and act on?  In this post, I'll share some simple ways to make SPC monitoring more effective in Minitab.

Common Problems with SPC Control Charts

manufacturing line SPC

I worked for several years in a large manufacturing plant in which control charts played a very important role. Thousands of SPC (Statistical Process Control) charts were used to monitor processes, contamination in clean rooms, product thicknesses and shapes, and critical equipment process parameters. Process engineers regularly checked the control charts of the processes they were responsible for. Operators were expected to stop using equipment as soon as an out-of-control alert appeared and report the incident back to their team leader.

But some of the problems we faced had little to do with statistics. For example, comments entered by the operators were often not explicit at all. Control chart limits were not updated regularly and were sometimes no longer appropriate because of process changes over time. Also, there was confusion about the difference between control limits and specification limits, so even when drifts from the target were clearly identifiable, some process engineers were reluctant to take action as long as their data remained within specifications.

Other problems could be solved with a better knowledge of statistics. For example, some processes were cyclical in nature, and therefore the way subgroups were defined was critical. Also, since production was based on small batches of similar parts, the within-batch variability was often much smaller than the between-batch variability (simply because the parts within a batch had been processed in very similar conditions). This led to inappropriate control limits when standard X-bar control charts were used.

Red chart

Five Ways to Make SPC Monitoring Control Charts More Effective

Let's look at some simple ways to make SPC monitoring more effective in Minitab. In addition to creating standard control charts, you can use Minitab to:

  1. Import data quickly to identify drifts as soon as possible.
  2. Create Pareto charts to prevent special causes from reoccurring.
  3. Account for atypical periods to avoid inflating your control limits.
  4. Visually manage SPC alerts to quickly identify the out-of-control points.
  5. Choose the right type of charts for your process.

1. Identify drifts as soon as possible.

To ensure that your control charts are up to date in Minitab, you can right click on them and choose “Automatically update Graphs.” However, Minitab is not always available on the shop floor, so the input data often must be saved in an Excel file or in a database.

Suppose that the measurement system generates an XML, Excel, or text file, and that this data needs to be reconfigured and manipulated before it can be charted in Minitab. You can automate these steps with a Minitab macro.

Such a macro might automatically retrieve data from an XML or text file or from a database (using Minitab's ODBC, or Open Database Connectivity, functionality) into a Minitab worksheet, transpose rows into columns, stack columns, or merge several files into one. The macro would give you a continuously updated Minitab worksheet -- and consequently a continuously updated control chart.
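
The macro approach keeps everything inside Minitab. If your raw export lands in a flat file first, the reshaping step itself is also easy to prototype outside Minitab; the sketch below is a hypothetical example in Python (the file name and column layout are made up), just to show how little code the "wide to stacked" step really takes.

```python
# Hypothetical sketch: reshape a wide measurement export (one column per
# measurement position) into a single stacked column that's ready to import
# into a Minitab worksheet. The file name and column layout are made up.
import pandas as pd

wide = pd.read_csv("measurements_export.csv")     # e.g. columns: Batch, Pos1, Pos2, Pos3
stacked = wide.melt(id_vars="Batch",
                    var_name="Position",
                    value_name="Thickness")       # one row per measurement
stacked.to_csv("spc_ready.csv", index=False)
```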

You could easily launch the macro just by clicking on a customized icon or menu in Minitab (see the graph below) in order to update the resulting control chart.

SPC Tool Bar

Alternatively, if the macro is named Startup.mac, it will run automatically whenever you launch Minitab. If you're using Minitab to enable process operators or engineers to monitor control charts, you could also customize Minitab's toolbars and icons to show only the relevant ones and keep the focus on SPC.

The product support section of our website has information on adding a button to a menu or toolbar that will update data from a file or a database.

2. Create Pareto charts to prevent special causes from reoccurring.

Statistical Process Control can be used to separate the true root causes of quality problems (the so-called special causes) from the surrounding process noise (the so-called common causes). The root causes of quality issues need to be truly understood in order to prevent reoccurrence.

A Pareto chart of the causes for out-of-control points might be very useful to identify which special causes occur most frequently.

Comments can be entered in a column of the Minitab worksheet for each out-of-control point. These comments should be standardized for each type of problem. A list of keywords displayed in the Minitab worksheet helps operators enter meaningful, consistent keywords instead of comments that differ each time. A Pareto chart based on these standardized comments can then identify the 20% of causes that generate 80% of your problems.

Pareto
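
If you also keep those standardized keywords in a flat file, the same 80/20 view is easy to prototype outside Minitab. Here's a hypothetical sketch in Python; the keywords are invented for illustration.

```python
# Hypothetical sketch of a Pareto view of standardized out-of-control keywords.
# The keywords below are invented for illustration.
import pandas as pd
import matplotlib.pyplot as plt

comments = pd.Series(["Probe drift", "Fixture loose", "Probe drift",
                      "Operator change", "Probe drift", "Fixture loose"])
counts = comments.value_counts()                  # most frequent cause first
cum_pct = counts.cumsum() / counts.sum() * 100

fig, ax1 = plt.subplots()
ax1.bar(range(len(counts)), counts.values)        # bars: frequency of each cause
ax1.set_xticks(range(len(counts)))
ax1.set_xticklabels(counts.index, rotation=45, ha="right")
ax1.set_ylabel("Count")

ax2 = ax1.twinx()                                 # cumulative percentage on a second axis
ax2.plot(range(len(counts)), cum_pct.values, color="red", marker="o")
ax2.set_ylim(0, 110)
ax2.set_ylabel("Cumulative %")

plt.title("Pareto chart of special-cause keywords")
plt.tight_layout()
plt.show()
```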

Comments can even be displayed in the SPC chart by using the annotation toolbar.  Click on the T (text) icon of the Graph Annotation toolbar.

3. Account for atypical periods to avoid inflating your control limits.

Atypical periods (due to measurement issues, outliers, or a quality crisis) may artificially inflate your control chart limits. In Minitab, control limits may be calculated from a reference period (one with stable, predictable behavior), or the atypical period may be omitted so that the control limits are not affected.

In Minitab, go to Options in the control chart dialog box, click the Estimate tab, and select the subgroups to be omitted (atypical behavior, outliers), or use only specified subgroups to set a reference period. Although the atypical period will still be displayed on the control chart, it won't affect the way your control limits are estimated.

Untypical

If a reference period has been selected, you will probably need to update it after a certain period of time to ensure that this selection is still relevant.

4. Visually manage SPC alerts to quickly identify out-of-control points.

If the number of control charts you deal with is very large and you need to quickly identify processes that are drifting away from the target, you could display all control charts in a tiled format (go to Window > Tile). When the latest data (i.e., the last row of the worksheet) generates an out-of-control warning, you can have the control chart turn completely red, as shown in the picture below: 

Red chart

You can do this by going to Tools > Options. Select “Control Charts and Quality Tools” in the list, then choose Other. Under the words “When last row of data causes a new test failure for any point,” check the box that says "Change color of chart." Note that the color changes according to the last row (the latest single value), not according to the latest subgroup, so this option is most effective when you collect individual values.

5. Choose the right type of charts for your process.

When it comes to control charts, one size does not fit all. That's why you'll see a wide array of options when you select Stat > Control Charts. Be sure that you're matching the control chart you're using to the type of data and information you want to monitor. For example, if your subgroups are based on batches of products, I-MR-R/S (within/between) charts are probably best suited to monitor your process.

If you're not sure which control chart to use, you can get details about each type from the Help menu in Minitab, or try using the Assistant menu to direct you to the best test for your situation.


Got Good Judgment? Prove It with Attribute Agreement Analysis


Many Six Sigma and quality improvement tools could be applied in other areas. For example, I wonder whether my son's teachers could benefit from a little attribute agreement analysis. 

He seemed frustrated the other day when I picked him up at school. He'd been working on a presentation that needed to be approved by his teachers. (My son attends a charter school, and each class is taught by a two-person teaching team.)

"What's wrong?" I asked when he clambered into the car with a big sigh.

My son explained that he'd given the presentation to teacher Jennifer that morning. A few minor suggestions aside, she thought it was fine. Jennifer told him the presentation was ready to deliver.

But when he gave the presentation to Jeff in the afternoon, the feedback was very different. Jeff felt the content of the presentation was too vague, and that my son needed to do more research and add more information. Jeff told him the presentation wasn't acceptable.

And because Jennifer had already left for the day, there wasn't a chance to reconcile these very different opinions.

No wonder my son felt frustrated.

The Challenge of Judging Attributes Consistently

gavel

We all need to make judgments every day. Some are fairly inconsequential, such as whether you think a given song on the radio is good or bad. But judgments we make at work can have profound impacts on customers, coworkers, clients, employees...or students.

Inspectors classify parts as good or bad. Employment screeners select applicants they think are worth interviewing. And instructors decide whether a student's work is acceptable. In each case, judgments are made about one or more attributes that can't easily be measured objectively.

That's where the problems start. One synonym for "judgment" is "opinion," and people's opinions don't always match. That's not always a problem: if I like a song and you don't, it's not a big deal. But when two or more people make contradictory assessments of critical things, disagreement can cause real problems. The quality of a business' parts or service can vary from day to day, or even from shift to shift. Customers' experiences can be very inconsistent from one day to the next.

As if different judgments from different people aren't problematic enough, we also have a great capacity for disagreeing with ourselves. And in many cases we're inconsistent without even recognizing it: if you're inspecting parts that all look the same, are you sure you'd judge the same part the same way every time? And can you be sure you're inspecting parts consistently with your fellow inspectors?

Or, in the case of my son's teachers, how can you be sure your assessment of a student's work is consistent with your own judgments, and with those of your fellow instructors?

Benefits of Attribute Agreement Analysis

These situations can be illuminated by Attribute Agreement Analysis. Attributes are difficult to measure -- that's why we rely on judgments instead of objective measurements to assess them -- but we can collect data that reveals whether different people assign attributes to an item consistently, and whether an individual makes the same judgment when assessing the same item at different times.

Attribute Agreement Analysis can tell you whether and where you're getting it wrong. Knowing this helps everyone in the process to make better and more consistent judgments.

The results of an Attribute Agreement Analysis may indicate that your team judges attributes very consistently, and that you can be confident in how you're evaluating items. Alternatively, you may find that one or two team members make very different judgments than others, or that you don't always rate the same item the same way.

Identifying those issues gives you the opportunity to make improvements, through training, developing clearer standards, or other actions.

If my son's teachers did an Attribute Agreement Analysis, they might find they're not on the same page about what makes a good presentation. If they knew that was the case, they could then develop clearer and more consistent standards so they could more fairly assess their students' work.

How to Do an Attribute Agreement Analysis

There are two main steps in an Attribute Agreement Analysis:

  1. Set up your experiment and collect the data
  2. Analyze the data and interpret the results

You can use the Assistant in Minitab Statistical Software to do both. If you're not already using it, you can try Minitab free for 30 days.  

The Assistant gives you an easy-to-follow Attribute Agreement Analysis worksheet creation tool and even lets you print out data collection forms for each participant and each trial:

Attribute Agreement Analysis Worksheet Creation

Collect your data, then use the Assistant to analyze it and give you clear interpretations of what your results mean.
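
If you'd like to see the kind of statistic that sits behind an agreement analysis, kappa is the usual summary: 1 means perfect agreement, and 0 means agreement no better than chance. Here's a hypothetical sketch with invented pass/fail ratings from two appraisers; it only shows the core idea, not the full analysis Minitab performs.

```python
# Hypothetical sketch: agreement between two appraisers rating the same 10 items.
# The ratings are invented; kappa near 1 means strong agreement, near 0 means
# agreement is no better than chance.
from sklearn.metrics import cohen_kappa_score

jennifer = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
jeff     = ["pass", "fail", "fail", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]

print(f"Cohen's kappa = {cohen_kappa_score(jennifer, jeff):.2f}")
```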

See a step-by-step breakdown of how it's done in this QuickStart exercise for Minitab 17, in which a family uses Attribute Agreement Analysis to discover the source of their disagreements about dinner. The example includes instructions, a quick video summary, and a downloadable data set so you can try the analysis yourself.

Where could you use Attribute Agreement Analysis in your work or personal life?

Using Statistics to Show Your Boss Process Improvements


Ughhh... your process is producing some parts that don't meet your customer's specifications! Fortunately, after a little hard work, you find a way to improve the process.

However, you want to perform the appropriate statistical analysis to back up your findings and make it easier to explain the process improvements to your boss. And it's important to remember that your boss is much like the boss in Eston's posts -- he's not too familiar with statistics, so you'll have to take it slow and show lots of "visual aids" in your explanation. How should you begin? 

Enter before-and-after process capability analysis. 

A process capability analysis evaluates how well a process meets a set of requirements defined by specification limits.

For example, a manufacturer of photocopiers requires that the width of a rubber roller must be between 32.523 cm and 32.527 cm to avoid paper jams. Capability analysis reveals how well the manufacturing process meets these specifications, and provides insight into how to improve the process.

Performing Before and After Process Capability Analysis

Assessing process capability before and after making process changes can be a valuable and easy way to prove improvements were made, while also ensuring your process is still meeting specification limits and producing “good” parts.

But before assessing process capability, you must first ensure your process is stable—you can’t predict the performance of an unstable process! A stable process, however, is one you can predict and improve.

Let’s go through a quick example to illustrate how one group of engineers used Minitab, as well as the Assistant, to create control charts and perform process capability analysis to assess a process and prove they made process improvements. (By the way, the Assistant menu was expanded to include DOE and Multiple Regression in the latest release of Minitab. Learn more and give it a try at www.minitab17.com/whatsnew)  

Using Process Capability to Assess a Process

An extruded parts maker needed to verify their process was running between the lower spec limit of 17.5 hardness units and the upper limit of 22.5 hardness units, with the target value of 20. To collect data for their analysis, operators randomly selected three extruded parts at regular intervals, and recorded the hardness of each part:

(Download the data here if you want some practice.)

Engineers then used process capability analysis (in Minitab, go to Stat > Quality Tools > Capability Sixpack > Normal) to evaluate the capability of the manufacturing process in meeting the aforementioned customer requirements.

The control charts revealed a stable process, with all points falling within the control limits. However, the histogram showed that many measurements fell outside the specification limits, with 73,603 parts per million defective overall. With more parts falling above the upper spec limit than below the lower limit, the engineers concluded that the process mean needed to be shifted. The histogram also showed that variation needed to be reduced in order to cut the number of defective parts and improve the capability of the process.  
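
As a side note, the arithmetic behind a "parts per million defective" figure is straightforward once you have estimates of the process mean and standard deviation and are willing to assume a normal distribution. The sketch below uses made-up estimates, not the data from this example:

```python
# Sketch of the arithmetic behind "parts per million defective," assuming the
# hardness values are normally distributed. The mean and standard deviation
# below are placeholders, not estimates from this example's data.
from scipy.stats import norm

LSL, USL = 17.5, 22.5
mean, sd = 20.8, 1.0                      # hypothetical overall estimates

ppm_below = norm.cdf(LSL, mean, sd) * 1e6
ppm_above = norm.sf(USL, mean, sd) * 1e6
ppk = min(USL - mean, mean - LSL) / (3 * sd)

print(f"Expected PPM out of spec: {ppm_below + ppm_above:,.0f}")
print(f"Ppk = {ppk:.2f}")
```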

Using Process Capability to Verify Improvements

The engineers adjusted the process to reduce the variation and obtain a process mean closer to 20. So once again, operators collected 30 parts using the same sampling method, and measured the hardness of the parts.

The engineers then ran a before-and-after capability analysis using the Assistant to see how their improvements successfully shifted the process mean and reduced variation. Below, you can see that the Assistant provided the engineers with the interpretation of their output, which also doubled as an easy-to-understand "visual aid" to share with their boss:

The engineers were happy to see their adjustment to the process reduced the PPM from 73,603 to 2,681—a 96% reduction in percent out of spec—and the process mean shifted from 20.820 to 20.037, along with reduced variation. Best of all, the engineers had statistical proof and graphs to easily explain the process improvements to their boss!

For more on process capability analysis, check out:

Learning Process Capability Analysis with a Catapult

The Best European Football League: What the CTQ’s and Minitab Can Tell Us


by Laerte de Araujo Lima, guest blogger

football

In a previous post (How Data Analysis Can Help Us Predict This Year's Champions League), I shared how I used Minitab Statistical Software to predict the 2013-2014 season of the UEFA Champions League. This involved regression analysis of the main critical-to-quality (CTQ) factors, which I identified using the “voice of the customer” suggestions of some friends.

Since that post was published, my friends have stopped discussing the UEFA Champions league—they were convinced by the results I shared.

But now they’ve challenged me to use Six Sigma tools to quantify which European football league is best. In other words, which league gives its fans the best value (average per game) in terms of the CTQ factors that make games fun to watch?

Critical to the CTQ—Voice of the Customer

This analysis will be based on the same CTQ factors used in my previous post. I debriefed my friends (in a bar, while watching a football match, of course) after publishing that post, and they agreed that these CTQs really match their expectations about what should and should not happen in a match.

However, I did add one new CTQ factor in this study, “Average Number of Yellow and Red Cards,” since these data were available in a new database.

CTQ – Voice of Customer

UEFA CL database variable associated with CTQ

More goals per game to make game more fun!

↑ Average goals scored per game

Offensive strategy, with more attempts to score goals.

Average attempts on target per game

Average goals scored per game

↓ Average fouls committed per game

More effective use of game time.

↓ Average fouls committed per game

More “fair play” and protection for players with high football skills.

↓ Average fouls committed per game

↓ Average number of Yellow and Red cards

  European Football League Database

The hardest part of this study was finding a reliable and complete database. For this, my friends at www.whoscored.com proved best. In this database, I could find all the variables associated with the previously defined CTQs.  

I apologize to my Portuguese and French friends, but as I noted in my previous post, only the most predominant countries and leagues (Italy, England, Germany, and Spain) in the last 12 UEFA Champions League seasons are considered in this scenario.

Country        League                    # of teams   # of matches per team   Web site
Spain (ES)     LIGA BBVA                 20           38                      http://www.ligabbva.com
Italy (IT)     Serie A                   20           38                      http://www.legaseriea.it/
England (UK)   Barclays Premier League   20           38                      http://www.premierleague.com/en-gb.html
Germany (GE)   Bundesliga                18           34                      http://www.bundesliga.de/de/index.php

 
Ranking Criteria and Methodology

Based on the CTQ factors, I performed an analysis of each league, looking at each team’s individual average values for each CTQ. This lets me compare not only the overall league averages, but also the averages of the individual teams within each league.

To perform this analysis, I used the statistical tool called Analysis of Variance (ANOVA). ANOVA tests the hypothesis that the means of two or more populations are equal.

ANOVA evaluates the importance of one or more factors by comparing the response variable means at the different factor levels. The null hypothesis states that all population means (factor level means) are equal while the alternative hypothesis states that at least one is different.

For this analysis I used the Assistant to perform a One-Way ANOVA: Assistant > Hypothesis Tests > One-Way ANOVA.

ANOVA Chooser in the Minitab 17 Assistant
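
The same hypothesis test can be reproduced in any statistics package. Just to show the mechanics, here is a sketch with invented per-team averages for four leagues (these numbers are not the real data):

```python
# Hypothetical sketch: one-way ANOVA comparing per-team season averages for
# four leagues. The numbers are invented; they only show the mechanics.
from scipy import stats

liga       = [2.1, 2.3, 1.9, 2.4, 2.2]
serie_a    = [2.0, 2.2, 1.8, 2.1, 2.3]
premier    = [2.2, 2.4, 2.0, 2.3, 2.1]
bundesliga = [2.3, 2.1, 2.2, 2.5, 2.0]

f_stat, p_value = stats.f_oneway(liga, serie_a, premier, bundesliga)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")   # p < 0.05 would suggest at least one mean differs
```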

Based on the One-Way ANOVA results, I can rank the leagues on last season’s average values of the CTQ variables per match.

Then, after compiling all the results, I used a Decision Matrix (another Six Sigma tool) to assess each league on the CTQ variables. The position of the league in each analysis and its associated weight (1 / 5 / 10) give a final score for each league.

Average Number of Yellow & Red Cards

One Way ANOVA for Average Yellow and Red Cards

As the Assistant's output makes clear, the One-Way ANOVA p-value (0.001) is less than the threshold (0.05), which indicates that there is a difference among the means. The table to the right of the p-value calculation shows us which means differ from the others, and the means comparison chart gives a graphical view of the statistical analysis.  

Conclusion: The U.K. football league has the lowest average number of cards (yellow & red) per game.

Average Fouls Committed per Game

The p-value (0.001) is less than the threshold (0.05), telling us that there is a difference in means. In this case, based on the comparison chart, it’s evident that the U.K. league has the lowest average number of fouls per game. Among the remaining three leagues, there’s not a statistically significant difference between the Spanish and Italian leagues, nor between the Italian and German leagues. In this situation, I decided to give the Spanish league a different score than the German and Italian leagues.

Average Goals Scored per Game

The p-value (0.729) is greater than the threshold (0.05), indicating there is no significant difference among the means.

Conclusion: No matter which league you watch, the number of goals per match will be, on average, the same.

Average Attempts on Target per Game

ANOVA of attempts per game

Again, the p-value (0.891) is greater than the threshold (0.05), telling us that there is no statistically significant difference in the average attempts on target per game.

Conclusion: All four leagues receive the same score for this variable.

The final decision matrix helps us see the results of all of these analyses:

decision matrix

chart of critical to quality factors

Conclusion: The Best European Football League

Based on the results shown above, we can conclude the following.

  1. Normality is not an issue. Except for the German league (18 teams), all leagues have a sample size of 20 teams, and the data are normally distributed.
  2. The U.K. league is the best in terms of “fair play.” Both the average fouls per game and the average cards (yellow & red) per game are lower than in the other leagues. (Now I can understand why English supporters criticize the referee when an English team plays against another European team.)
  3. There is no difference among the leagues in terms of average attempts on target per game, nor average goals per match.
  4. Under the premise that the best European football league should have the best performance regarding the variables related to the selected CTQ, England’s football league comes out on top.

If a good league is one that meets the “customer” expectations (CTQs), the exercise performed in this post shows that England’s supporters (of all the teams in the Premier League) should be the most satisfied supporters in Europe.

Unfortunately, this analysis is based only on last season’s data, so it may represent a single season rather than a trend. But at minimum, the analysis indicates that there are significant differences among the leagues, especially in the “fair play” CTQ factor.

 

About the Guest Blogger: 

Laerte de Araujo Lima is a Supplier Development Manager for Airbus (France). He has previously worked as product quality engineer for Ford (Brazil), a Project Manager in MGI Coutier (Spain), and Quality Manager in IKF-Imerys (Spain). He earned a bachelor's degree in mechanical engineering from the University of Campina Grande (Brazil) and a master's degree in energy and sustainability from the Vigo University (Spain). He has 10 years of experience in applying Lean Six Sigma to product and process development/improvement. To get in touch with Laerte, please follow him on Twitter @laertelima or on LinkedIn.

Equivalence Testing for Quality Analysis (Part I): What are You Trying to Prove?


With more options come more decisions.

With equivalence testing added to Minitab 17, you now have more statistical tools to test a sample mean against a target value or against another sample mean.

Equivalence testing is extensively used in the biomedical field. Pharmaceutical manufacturers often need to test whether the biological activity of a generic drug is equivalent to that of a brand name drug that has already been through the regulatory approval process.

But in the field of quality improvement, why might you want to use an equivalence test instead of a standard t-test?

Interpreting Hypothesis Tests: A Common Pitfall

Suppose a manufacturer finds a new supplier that offers a less expensive material that could be substituted for a costly material currently used in the production process. This new material is supposed to be just as good as the material currently used: it should make the product neither too pliable nor too rigid.

To make sure the substitution doesn’t negatively impact quality, an analyst collects two random samples from the production process (which is stable): one using the new material and one using the current material.

The analyst then uses a standard 2-sample t-test (Stat > Basic Statistics > 2-Sample t in Minitab Statistical Software) to assess whether the mean pliability of the product is the same using both materials:

________________________________________

Two-Sample T-Test and CI: Current, New

Two-sample T for Current vs New
             N    Mean   StDev    SE Mean
Current   9   34.092   0.261     0.087
New      10   33.971  0.581     0.18

Difference = μ (Current) - μ (New)
Estimate for difference:  0.121
95% CI for difference:  (-0.322, 0.564)
T-Test of difference = 0 (vs ≠): T-Value = 0.60  P-Value = 0.562  DF = 12
________________________________________

Because the p-value is not less than the alpha level (0.05), the analyst concludes that the means do not differ. Based on these results, the company switches suppliers for the material, confident that statistical analysis has proven that they can save money with the new material without compromising the quality of their product.

The test results make everyone happy. High-fives. Group hugs. Popping champagne corks. There’s only one minor problem.

Their statistical analysis didn’t really prove that the means are the same.

Consider Where to Place the Burden of Proof

In hypothesis testing, H1 is the alternative hypothesis that requires the burden of proof. Usually, the alternative hypothesis is what you’re hoping to prove or demonstrate. When you perform a standard 2-sample t-test, you’re really asking: “Do I have enough evidence to prove, beyond a reasonable doubt (your alpha level), that the population means are different?”

To do that, the hypotheses are set up as follows: the null hypothesis (H0) states that the two population means are equal, and the alternative hypothesis (H1) states that they are not equal.

If the p-value is less than alpha, you conclude that the means significantly differ. But if the p-value is not less than alpha, you haven’t proven that the means are equal. You just don’t have enough evidence to prove that they’re not equal.

The absence of evidence for a statement is not proof of its converse. If you don’t have sufficient evidence to claim that A is true, you haven’t proven that A is false.

Equivalence tests were specifically developed to address this issue. In a 2-sample equivalence test, the null and alternative hypotheses are reversed from those of a standard 2-sample t-test: the null hypothesis (H0) is that the difference between the means falls outside your equivalence limits, and the alternative hypothesis (H1) is that the difference falls within them.

This switches the burden of proof for the test. It also reverses the ramifications of incorrectly assuming the null hypothesis is true.

Case in Point: The Presumption of Innocence vs. Guilt

This rough analogy may help illustrate the concept.

In the court of law, the burden of proof rests on proving guilt. The suspect is presumed innocent (H0), until proven guilty (H1). In the news media, the burden of proof is often reversed: The suspect is presumed guilty (H0), until proven innocent (H1).

Shifting the burden of proof can yield different conclusions. That’s why the news media often express outrage when a suspect who is presumed to be guilty is let go because there was not sufficient evidence to prove the suspect’s guilt in the courtroom. As long as news media and the courtroom reverse their null and alternative hypotheses, they’ll sometimes draw different conclusions based on the same evidence.

Why do they set up their hypotheses differently in the first place? Because each seems to have a different idea of what’s a worse error to make. The judicial system believes the worse error is to convict an innocent person, rather than let a guilty person go free. The news media seem to believe the contrary. (Maybe because the presumption of guilt sells more papers than presumption of innocence?)

When the Burden of Proof Shifts, the Conclusion May Change

Back to our quality analyst in the first example. To avoid losing customers, the company would rather err by assuming that the quality was not the same using the cheaper material--when it actually was--than err by assuming it was the same, when it actually was not.

To more rigorously demonstrate that the means are the same, the analyst performs a 2-sample equivalence test (Stat > Equivalence Tests > Two Sample).

________________________________________

Equivalence Test: Mean(New) - Mean(Current)

Test
Null hypothesis:         Difference ≤ -0.4 or Difference ≥ 0.4
Alternative hypothesis:  -0.4 < Difference < 0.4
α level:                 0.05

Null Hypothesis    DF  T-Value  P-Value
Difference ≤ -0.4  12   1.3717    0.098
Difference ≥ 0.4   12  -2.5646    0.012

The greater of the two P-Values is 0.098. Cannot claim equivalence.
________________________________________

Using the equivalence test on the same data, the results now indicate that there isn't sufficient evidence to claim that the means are the same. The company cannot be confident that product quality will not suffer if they substitute the less expensive material. By using an equivalence test, the company has raised the bar for evaluating a possible shift in the process mean.

Note: If you look at the above output, you'll see another way that the equivalence test differs from a standard t-test. Two one-sided t-tests are used to test the null hypothesis. In addition, the test uses a zone of equivalence that defines what size difference between the means you consider to be practically insignificant. We’ll look at that in more detail in my next post.
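
If you want to see those two one-sided tests spelled out, the sketch below reproduces them from the summary statistics in the output above, using an unpooled (Welch-type) standard error. Because the summary statistics are rounded and the degrees of freedom are left unrounded, the results will agree with the session output only to within rounding:

```python
# Sketch of the two one-sided tests (TOST) behind the 2-sample equivalence test,
# computed from the summary statistics in the output above with an unpooled
# (Welch-type) standard error. Expect small differences from Minitab's output
# because the summary statistics are rounded and the df is left unrounded.
from scipy import stats

m1, s1, n1 = 33.971, 0.581, 10     # New
m2, s2, n2 = 34.092, 0.261, 9      # Current
low, upp = -0.4, 0.4               # equivalence interval for Mean(New) - Mean(Current)

diff = m1 - m2
se = (s1**2 / n1 + s2**2 / n2) ** 0.5
df = (s1**2/n1 + s2**2/n2)**2 / ((s1**2/n1)**2/(n1-1) + (s2**2/n2)**2/(n2-1))

p_low = stats.t.sf((diff - low) / se, df)    # H0: difference <= -0.4
p_upp = stats.t.cdf((diff - upp) / se, df)   # H0: difference >=  0.4

print(f"diff = {diff:.3f}, se = {se:.3f}, df = {df:.1f}")
print(f"p-values: {p_low:.3f}, {p_upp:.3f}")
print("Can claim equivalence" if max(p_low, p_upp) < 0.05 else "Cannot claim equivalence")
```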

Quick Summary

To choose between an equivalence test and a standard t-test, consider what you hope to prove or demonstrate. Whatever you hope to prove true should be set up as the alternative hypothesis for the test and require the burden of proof. Whatever you deem to be the less harmful incorrect assumption to make should be the null hypothesis. If you’re trying to rigorously prove that two means are equal, or that a mean equals a target value, you may want to use an equivalence test rather than a standard t-test.


Equivalence Testing for Quality Analysis (Part II): What Difference Does the Difference Make?


magnifying glass

My previous post examined how an equivalence test can shift the burden of proof when you perform a hypothesis test of means. This allows you to test more rigorously whether the process mean is equivalent to a target or to another mean.

Here’s another key difference: To perform the analysis, an equivalence test requires that you first define, upfront, the size of a practically important difference between the mean and the target, or between two means.

Truth be told, even when performing a standard hypothesis test, you should know the value of this difference, because you can’t really evaluate whether your analysis will have adequate power without knowing it. Nor can you evaluate whether a statistically significant difference in your test results has significant meaning in the real world, outside of probability distribution theory.

But since a standard t-test doesn’t require you to define this difference, people often run the analysis with a fuzzy idea, at best, of what they’re actually looking for. It’s not an error, really. It’s more like using a radon measuring device without knowing what levels of radon are potentially harmful.  

Defining Equivalence Limits: Your Call

How close does the mean have to be to the target value or to another mean for you to consider them, for all practical purposes, “equivalent”?   

The zone of equivalence is defined by a lower equivalence and/or an upper equivalence limit. The lower equivalence limit (LEL) defines your lower limit of acceptability for the difference. The upper equivalence limit (UEL) defines your upper limit of acceptability for the difference. Any difference from the mean that falls within this zone is considered unimportant.

In some fields, such as the pharmaceutical industry, equivalence limits are set by regulatory guidelines. If there aren’t guidelines for your application, you’ll need to define the zone of equivalence using knowledge of your product or process.

Here’s the bad news: there isn’t a statistician on Earth who can help you define those limits, because it isn’t a question of statistics. It’s a question of what size of difference produces tangible ramifications for you or your customer.

A difference of 0.005 mg from the mean target value? A 10% shift in the process mean?  Obviously, the criteria aren't going to be the same for the diameter of a stent and the diameter of a soda can.

Equivalence Test in Practice

Here's a quick example of a 1-sample equivalence test, adapted from Minitab 17 Help. To follow along, you can download the revised data here. If you don't have Minitab 17, download a free trial version here.

Suppose a packaging company wants to ensure that the force needed to open its snack food bags is within 10% of the target value of 4.2 N (Newtons). From previous testing, they know that a force more than 10% below the target causes the bags to open too easily and reduces product freshness. A force more than 10% above the target makes the bags too difficult to open. They randomly sample 100 bags and measure the force required to open each one.

To test whether the mean force is equivalent to the target, they choose Stat > Equivalence Tests > 1-Sample and fill in the dialog box as shown below:

Tip: Use the Multiply by Target box when you want to define the equivalence limits for a difference in terms of a percentage of the target. In this case, the lower limit is 10% less than the target. The upper limit is 10% higher than the target. If you want to represent the equivalence limits in absolute terms, rather than as percentages, simply enter the actual values for your equivalence limits and don't check the Multiply by Target box.

When you click OK, Minitab displays the following results:

One-Sample Equivalence Test: Force

Difference: Mean(Force) - Target

Difference        SE     95% CI     Equivalence Interval
   0.14270  0.067559  (0, 0.25487)      (-0.42, 0.42)

CI is within the equivalence interval. Can claim equivalence.

Test
Null hypothesis:         Difference ≤ -0.42 or Difference ≥ 0.42
Alternative hypothesis:  -0.42 < Difference < 0.42
α level:                 0.05

Null Hypothesis     DF  T-Value  P-Value
Difference ≤ -0.42  99   8.3290    0.000
Difference ≥ 0.42   99  -4.1046    0.000

The greater of the two P-Values is 0.000. Can claim equivalence.

Because the confidence interval for the difference falls completely within the equivalence limits, you can reject the null hypothesis that the mean differs from the target. You can claim that the mean and the target are equivalent.

Notice that if you had used a standard 1-sample t-test to analyze these data, the output would show a statistically significant difference between the mean and the target (at a significance level of  0.05):

One-Sample T: Force

Test of μ = 4.2 vs ≠ 4.2
Variable    N    Mean   StDev  SE Mean       95% CI          T      P
Force         100  4.3427  0.6756   0.0676  (4.2086, 4.4768)  2.11  0.037

These two sets of results aren't really contradictory, though.

The equivalence test has simply defined "equality" between the mean and the target in broader terms, using the values you entered for the equivalence zone. The standard t-test has no knowledge of what "practically significant" means, so it can only evaluate the difference from the target in terms of statistical significance.

In this way, an equivalence test is "naturally smarter" than a standard t-test. But it's your knowledge of the process or product that allows an equivalence test to evaluate the practical significance of a difference, in addition to its statistical significance.
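
To make that comparison concrete, here is a sketch that computes both the standard 1-sample t-test and the two one-sided equivalence tests from the summary statistics shown above; it reproduces the "significantly different from the target, yet equivalent" pattern:

```python
# Sketch: the standard 1-sample t-test and the two one-sided equivalence tests,
# both computed from the summary statistics shown above (n = 100, mean = 4.3427,
# standard deviation = 0.6756, target = 4.2, equivalence limits = +/- 0.42).
from scipy import stats

n, mean, sd = 100, 4.3427, 0.6756
target, low, upp = 4.2, -0.42, 0.42

se = sd / n ** 0.5
diff = mean - target
df = n - 1

t = diff / se
p_two_sided = 2 * stats.t.sf(abs(t), df)         # standard t-test vs. the target

p_low = stats.t.sf((diff - low) / se, df)        # H0: difference <= -0.42
p_upp = stats.t.cdf((diff - upp) / se, df)       # H0: difference >=  0.42

print(f"t-test:      t = {t:.2f}, p = {p_two_sided:.3f}")         # about 2.11 and 0.037
print(f"equivalence: max one-sided p = {max(p_low, p_upp):.3f}")  # essentially 0
```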

Learn More about Equivalence Testing

There are four types of equivalence tests newly available in Minitab 17. To learn more about each test, choose Help > Help. Click the Index tab, scroll down to Equivalence testing, and click Overview.

Analyze a DOE with the Assistant in Minitab 17


By now, you probably know that Minitab 17 includes Design of Experiments (DOE) in the Assistant. We already spent some time looking at 5 highlights when you create a screening experiment with the Assistant in Minitab 17.

But the Assistant can also help you make sense of the data you collect for your experiment. After you create a design with the Assistant, choose Assistant > DOE > Analyze and Interpret and you’re on your way. Exactly what you get depends on which type of design you’re analyzing, but there’s some really neat stuff to help you get the most out of your data. Here are 3 highlights:

1. Next Steps

After you analyze the data from your design, the next question is often what you should do based on the results. Because the answer depends on your results, so does the Next Step that the Assistant suggests in the Report Card.

For example, when you analyze a screening design, the advice differs based on the number of significant factors. The goal of the screening design is to identify the 2 to 5 factors that have the greatest influence on the response. If you identify zero or one significant factor, the Assistant suggests how to collect more data to identify significant factors. If you identify 6 or more significant factors, the Assistant suggests how to choose which factors to include in a modeling experiment so that you can identify the settings that produce the best value for your response variable. If you fit a linear model and there is no significant curvature, the Assistant suggests confirmation runs at the optimal response settings. If you fit a linear model and there is significant curvature, the Assistant suggests that you collect more data so that you can model that curvature.

The report card provides the next steps for your analysis.

2. Guided Analysis Path

Although the Next Steps are convenient, one of the great things about the Assistant is the flow charts that guide you at just the right time. For example, when you choose to analyze and interpret a modeling design, the flow chart shows fitting a linear model. If you want to know more about the issues that will arise when you are ready to optimize the response, then you can click the Optimize response rectangle in the flow chart to get the information that you need.

The flow chart gives an overall plan for the analysis.

When you click the flow chart, you get more detailed information.

3. Optimal Settings

When you analyze a modeling design, the Assistant asks you about your goal for the response variable. If the Assistant finds that your modeling design fits the data, then you get the Prediction and Optimization report. The best part about the report is that it tells you the settings that best achieve your goal for the response. If you want to maximize the response, the settings that maximize the response are in the report. If you gave a target value, the settings that get closest to the target are in the report. And if, for any reason, the optimal settings are undesirable, then you get the next 5 best combinations of settings to help you find a way to meet your goal.

The optimization report shows how best to meet your goals.

Designed experiments are a powerful tool to help you get the most information from the least amount of data. With the addition of DOE to the Assistant in Minitab, it’s easier than ever to create, analyze, and interpret designed experiments. Try it out for yourself and you’ll be one step closer to performing fearless data analysis.

Did Welch’s ANOVA Make Fisher's Classic One-Way ANOVA Obsolete?


Interval plot of group means

One-way ANOVA can detect differences between the means of three or more groups. It’s such a classic statistical analysis that it’s hard to imagine it changing much.

However, a revolution has been under way for a while now. Fisher's classic one-way ANOVA, which is taught in Stats 101 courses everywhere, may well be obsolete thanks to Welch’s ANOVA.

In this post, I not only want to introduce you to Welch’s ANOVA, but also highlight some interesting research that we perform here at Minitab that guides the implementation of features in our statistical software.

One-Way ANOVA Assumptions

Like any statistical test, one-way ANOVA has several assumptions. However, some of these assumptions are stringent requirements, while others can be waived. Simulation studies can determine which assumptions are true requirements.

For one-way ANOVA, we’ll look at two major assumptions. One of these assumptions is a true requirement, and understanding that explains why Welch’s ANOVA beats the traditional one-way ANOVA.

The discussion below is a summary of simulation studies conducted by Rob Kelly, a senior statistician here at Minitab. You can read the full results in the one-way ANOVA white paper. You can also peruse all of our technical white papers to see the research we conduct to develop methodology throughout the Assistant and Minitab.

Assumption: Samples are drawn from normally distributed populations

One-way ANOVA assumes that the data are normal. However, the simulations show that the test is accurate with nonnormal data when the sample sizes are large enough. These guidelines are:

  • If you have 2-9 groups, the sample size for each group should be at least 15.
  • If you have 10-12 groups, the sample size for each group should be at least 20.

Assumption: The populations have equal standard deviations (or variances)

One-way ANOVA also assumes that all groups share a common standard deviation even if they have different means. The simulations show that this assumption is stricter than the normality assumption. You can’t waive it away with a large sample size.

What happens if you violate the assumption of equal variances?

For hypothesis tests like ANOVA, you set a significance level. The significance level is the probability that the test incorrectly rejects the null hypothesis (Type I error). This error causes you to incorrectly conclude that the group means are different.

If you set the significance level to the common value of 0.05, about 1 out of 20 tests in which the null hypothesis is true will produce this error.

Rob ran 10,000 simulation runs for each of 50 different conditions to compare the observed error rate to the target level. Ideally, if you set the significance level to 0.05, the observed error rate is also 0.05.

The greater the difference between the target and actual error rate, the more sensitive one-way ANOVA is to violations of the equal variances assumption.

Simulation results for unequal variances

The simulations show that unequal standard deviations cause the actual error rate to diverge from the target rate for the traditional one-way ANOVA.

The best case scenario for unequal standard deviations is when group sizes are equal. With a significance level of 0.05, the observed error rate ranges from 0.02 to 0.08.

For unequal group sizes, the results varied greatly depending on the standard deviations of the larger and smaller groups. The error rates for unequal group sizes extend up to 0.22!
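
You can see this inflation for yourself with a small simulation. In the sketch below, the group sizes and standard deviations are arbitrary choices (the smaller groups get the larger standard deviation, which is the unfavorable case); all population means are equal, so every rejection is a false alarm, and the observed rate typically comes out well above 0.05:

```python
# Sketch: type I error rate of the classic one-way F-test when the standard
# deviations differ. All population means are equal, so every rejection is a
# false alarm. Group sizes and standard deviations are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 10_000, 0.05
sizes = [10, 10, 30]           # the smaller groups...
sds   = [3.0, 1.0, 1.0]        # ...get the larger standard deviation

rejections = 0
for _ in range(n_sims):
    groups = [rng.normal(0, sd, n) for n, sd in zip(sizes, sds)]
    _, p = stats.f_oneway(*groups)
    rejections += p < alpha

print(f"Observed error rate: {rejections / n_sims:.3f} (target {alpha})")
```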

Solutions to this Problem

Clearly you need to be wary when you perform one-way ANOVA and your group standard deviations are potentially different. Fortunately, there are two approaches you can try.

Test for equal variances

In Minitab, you can perform a test to determine whether the standard deviations of the groups are significantly different: Stat > ANOVA > Test for Equal Variances. If the test’s p-value is greater than 0.05, there is insufficient evidence to conclude that the standard deviations are different.

However, there is a big caveat. Even if you meet the sample size guidelines for one-way ANOVA, the test for equal variances may have insufficient power. In this case, your groups can have unequal standard deviations but the test will be unlikely to detect the difference. In general, failing to reject the null hypothesis is not the best method to determine that groups are equal.

However, if you have an adequate sample size and if the variance test’s p-value is greater than 0.05, you can trust the results from the traditional one-way ANOVA.

Welch’s ANOVA

What do you do if the test for equal variances indicates that the standard deviations are different? Or that the test has insufficient power? Or, perhaps you just don’t want to have to worry about performing and explaining this extra test? Let me introduce you to Welch’s ANOVA!

Welch’s ANOVA is an elegant solution because it is a form of one-way ANOVA that does not assume equal variances. And the simulations show that it works great!

When the group standard deviations are unequal and the significance level is set at 0.05, the simulation error rate for:

  • The traditional one-way ANOVA ranges from 0.02 to 0.22, while
  • Welch’s ANOVA has a much smaller range, from 0.046 to 0.054.

Additionally, for cases where the group standard deviations are equal, there is only a negligible difference in statistical power between these two procedures.
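
For the curious, here is a compact sketch of what Welch's ANOVA actually computes, written directly from Welch's (1951) formula rather than taken from Minitab's code; the example data are invented:

```python
# Sketch of Welch's ANOVA for k independent groups, written directly from
# Welch's (1951) formula. The example data are invented.
import numpy as np
from scipy import stats

def welch_anova(*groups):
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                              # weight each group by n_i / s_i^2
    grand_mean = np.sum(w * means) / np.sum(w)
    num = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = num / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return f, df1, df2, stats.f.sf(f, df1, df2)

a = [20.1, 19.8, 20.5, 21.0, 20.2]
b = [22.4, 23.1, 21.8, 22.9, 23.5]
c = [20.9, 24.0, 18.5, 22.7, 21.1]
f, df1, df2, p = welch_anova(a, b, c)
print(f"F = {f:.2f}, df = ({df1}, {df2:.1f}), p = {p:.4f}")
```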

Where to Find Welch’s ANOVA in Minitab

You might be using Welch’s ANOVA already without realizing it. Because of the advantages described above, the Assistant only performs Welch’s ANOVA.

Starting in Minitab 17, you can also perform Welch’s ANOVA outside of the Assistant. Go to Stat > ANOVA > One-Way. Click Options, and uncheck Assume equal variances. You can also perform multiple comparisons using the Games-Howell method to identify differences between pairs of groups.

Below is example output for Welch's ANOVA from the Assistant. Just like the classic one-way ANOVA, look at the p-value to determine significance and use the Means Comparison Chart to look for differences between specific groups.

One-Way ANOVA in Minitab's Assistant

The low p-value (< 0.001) indicates that at least one mean is different. The chart shows that each mean is different from the other two means.

Control Chart Tutorials and Examples


The other day I was talking with a friend about control charts, and I wanted to share an example one of my colleagues wrote on the Minitab Blog.  Looking back through the index for "control charts" reminded me just how much material we've published on this topic.

Whether you're just getting started with control charts, or you're an old hand at statistical process control, you'll find some valuable information and food for thought in our control-chart related posts. 

Different Types of Control Charts

One of the first things you learn in statistics is that when it comes to data, there's no one-size-fits-all approach. To get the most useful and reliable information from your analysis, you need to select the type of method that best suits the type of data you have.

The same is true with control charts. While there are a few charts that are used very frequently, a wide range of options is available, and selecting the right chart can make the difference between actionable information and false (or missed) alarms.

What Control Chart Should I Use? offers a brief overview of the most common charts and a discussion of how to use the Assistant to help you choose the right one for your situation. And if you're a control chart neophyte who wants more background on why we use them, check out our introductory posts.

Joel Smith extols the virtues of a less commonly used chart, while Greg Fox talks about using control charts to track rare events.

Dawn Keller discusses the distinction between P' charts and their cousins, which Tammy Serensits describes in another post.

And it's good to remember that things aren't always as complicated as they seem; sometimes a simple solution can be just as effective as a more complicated approach.

Control Chart Tutorials

Many of our Minitab bloggers have talked about the process of choosing, creating, and interpreting control charts under specific conditions. If you have data that can't be collected in subgroups, we have posts on the charts designed for exactly that situation.

If you do have data collected in subgroups, you'll want to understand the considerations that come with them.

It's often useful to look at control chart data in calendar-based increments, and several posts take that monthly approach.

If you want to see the difference your process improvements have made, we've covered that, too.

While the basic idea of control charting is very simple, interpreting real-world control charts can be a little tricky. If you're using Minitab 17, be sure to check out the great new feature in the Assistant for this.

Finally, one of our expert statistical trainers offers his suggestions for getting the most out of control charts.

Control Chart Examples

Control charts are most frequently used for quality improvement and assurance, but they can be applied to almost any situation that involves variation.

My favorite example of applying the lessons of quality improvement in business to your personal life involves Bill Howell, who applied his Six Sigma expertise to the (successful) management of his diabetes. Find out how he uses control charts to manage his condition.

Some of our bloggers have applied control charts to their personal passions, including holiday candies and bicycling.

If you're into sports, see how Jim Colton used control charts in his analyses. Or look to the cosmos for still more applications. And finally, compulsive readers like myself might be interested to see how relevant control charts are to literature, too, as Cody Stevens illustrates.

How are you using control charts?


Introducing the Bubble Plot


When you're evaluating a dataset, graphical analysis can be very important. While an analysis like a regression or ANOVA can be backed up by numbers, being able to visualize how your dataset is behaving can be even more convincing than a group of p-values—especially to those who aren’t trained in statistics.

For example, let’s look at a few variables we think may be correlated. In this specific example, we will take the Unemployment Rate and the Crime Rate for each state in the U.S. We have 3 columns of data in Minitab: C1, which contains the State Name; C2, which contains the Crime Rate; and C3, which contains the Unemployment Rate.

Go to Graph > Scatterplot > Simple. Then put Unemployment on the X-Axis, and the crime rate on the Y-Axis.

There does appear to be a strong correlation between the Unemployment Rate and the Crime Rate. We can confirm this by running a correlation analysis, but a quick glance at the picture gives you an idea of what’s going on: the more unemployment we have, the higher the crime rate.

The scatterplot helps us if we are interested in two variables. But what if we have an interest in a third variable we think may be related? A new tool in Minitab 17, called a Bubble Plot,  allows us to investigate a third variable of interest in the same graph.

Let’s add another variable, State Population, in column C4. We can go to Graph > Bubble Plot and fill out our dialog as follows:

Once we have our dialog filled out, we can click OK, which gives us the graph below:

The bubble plot allows us to see a third variable, instead of the two that we are limited to with most graphs. This can be very useful if you have three variables of interest, and can look at this one graph instead of a number of different graphs which are only capable of showing us two variables at once.

It does look like population increases along with the crime rate and unemployment, though there are some obvious outliers.

Now, this looks good when we're just looking to see relationships, but what if we want to know specifics? This is where we can use Data Labels on the bubbles. Right-click on the bubbles, and choose Add > Data Labels. Choose the option 'Use labels from column:' and choose our column C1, which represents the State. This gives us the graph below:

With this final graph, in addition to seeing the straightforward relationships between these variables, we can identify each specific data point. This is the advantage of the bubble plot: being able to see the relationship between three different variables in one plot.
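
If you ever need the same kind of view outside Minitab, a bubble plot is just a scatterplot whose marker sizes come from a third column. Here's a hypothetical sketch; the values are invented for illustration.

```python
# Hypothetical sketch of a bubble plot: x = unemployment rate, y = crime rate,
# bubble size = population. All values below are invented for illustration.
import matplotlib.pyplot as plt

states       = ["A", "B", "C", "D"]
unemployment = [4.5, 6.0, 7.2, 9.1]
crime_rate   = [2500, 3200, 3900, 4600]
population   = [1.2, 5.6, 11.3, 19.8]          # millions

sizes = [p * 40 for p in population]            # scale population into marker sizes
plt.scatter(unemployment, crime_rate, s=sizes, alpha=0.5)
for x, y, label in zip(unemployment, crime_rate, states):
    plt.annotate(label, (x, y))                 # data labels, like Add > Data Labels in Minitab
plt.xlabel("Unemployment rate (%)")
plt.ylabel("Crime rate")
plt.title("Bubble plot: bubble size = population")
plt.show()
```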

 

The Unemployment Rate data used in this post comes from the Bureau of Labor Statistics. Crime Rate data comes from Census.gov.

 
