
I know we lost by 2 touchdowns, but if only you had given Peterson 3 more carries we would have won!

Last week, ESPN ran an article about why the running game still matters. They used statistics to show that the more you run the football in the NFL, the more likely you are to win the game. Specifically, if you have a running back who gets at least 20 carries, you win about 70% of the time. Statistics from different eras all had the same result: it appears that the more you run the football, the better your odds of winning the football game are.

If only it were that simple.

There is no doubt that teams who run the football win more often. But that doesn’t mean that running the football causes the winning. After all, teams who kneel the ball in the 2nd half win about 99% of the time. So does that mean if teams simply took a knee more frequently in the 2nd half they’d win? Of course not!

Let's dive into the numbers and see if we can figure out what is really going on here.

The Relationship between Rushing and Winning

First, let’s look at how running the football and winning are related. I took data from every game through week 9 of the 2013 NFL season, and recorded whether each team had a rusher with at least 20 carries and whether that team won or lost. Here are the results (and you can get all the data I used for this statistical analysis here).

Tabulated Statistics

So this year, when a team has a rusher who has 20 or more carries, they’ve won 67.65% of the time. This lines up very well with the statistics in the ESPN article. And you’ll notice that when a team didn’t have a rusher with 20 carries, they lost 56% of the time. Clearly, the more you run the ball, the more you win.

But why did the author of the ESPN article choose 20 or more carries? Why not, say, 25 or more? Surely if running the football matters so much, a team that has a rusher with 25 or more carries must win even more, right? Let's see...

Tabulated Statistics

Oh...that’s why he chose 20: because choosing 25 didn’t help his argument. Teams with a rusher who has 25 or more carries have only won 54.55% of their games this year.

So I guess coaches should get their star running back 20-24 carries, then sit him on the bench the rest of the game—68% of the time, it works every time!!

But seriously, one of the problems with picking a “magic number” like 20 or 25 carries is that it decreases our sample size. There have only been 22 instances of a rusher with 25 or more carries this season. And in only 68 cases (26% of the total) has a rusher had 20 or more carries.

That means we’re ignoring almost three quarters of all the available data!
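As a quick sanity check on the two tables, the win counts can be backed out of the reported percentages. The sketch below assumes the 68 and 22 cases quoted above and rounds to whole wins, so the counts are inferred rather than read from the original tables.

```python
# Back out win counts from the reported percentages.
# These counts are inferred from the rounded figures in the post, not taken from the raw data.
def implied_wins(team_games: int, win_pct: float) -> int:
    """Whole number of wins consistent with a reported win percentage."""
    return round(team_games * win_pct / 100)

games_20_plus, games_25_plus = 68, 22          # counts quoted in the post
wins_20_plus = implied_wins(games_20_plus, 67.65)
wins_25_plus = implied_wins(games_25_plus, 54.55)

# Rushers with 20-24 carries are the 20+ group minus the 25+ group.
games_20_24 = games_20_plus - games_25_plus
wins_20_24 = wins_20_plus - wins_25_plus
pct_20_24 = 100 * wins_20_24 / games_20_24

print(f"20+ carries:   {wins_20_plus}/{games_20_plus}")
print(f"25+ carries:   {wins_25_plus}/{games_25_plus}")
print(f"20-24 carries: {wins_20_24}/{games_20_24} = {pct_20_24:.1f}%")
```

If those inferred counts are right, the 20-24 carry group won roughly 74% of the time, which is exactly the kind of oddity you get from slicing a small sample at arbitrary cutoffs.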

To fix this problem, I’m going to use Binary Logistic Regression to model the relationship between the leading rusher's number of carries and winning. Binary Logistic Regression is similar to regular regression, except instead of modeling a continuous response (like weight) and a continuous predictor (like height), I’m using a binary response (win/loss) and a continuous predictor (number of carries).

What we would expect is that as the number of carries by your lead rusher increases, your probability of winning increases, too.
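For readers who want to see what such a model does under the hood, here is a minimal sketch of a one-predictor logistic fit using Newton-Raphson. It is not Minitab's implementation, and the data are a small hypothetical toy set, not the 2013 NFL games; the function names are made up for illustration.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    x_mean = sum(x) / len(x)
    xc = [xi - x_mean for xi in x]        # centring keeps the updates well-behaved
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in xc]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Entries of the negated Hessian (X'WX)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * x_mean, b1           # undo the centring

def win_probability(b0, b1, carries):
    """Predicted probability of a win for a given number of carries."""
    return 1 / (1 + math.exp(-(b0 + b1 * carries)))

# Hypothetical toy data, NOT the games analyzed in this post.
carries = [8, 10, 12, 14, 15, 16, 18, 20, 22, 24, 26, 28]
won     = [0,  0,  1,  0,  0,  1,  0,  1,  1,  0,  1,  1]

b0, b1 = fit_logistic(carries, won)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"P(win | 20 carries) = {win_probability(b0, b1, 20):.2f}")
```

A positive slope means the predicted probability of winning rises with carries; whether the model actually fits well is a separate question that the goodness-of-fit tests address.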

Binary Logistic Regression

We see that the p-values in the Logistic Regression Table are 0, indicating that there is a significant association between carries by the lead rusher and wins. However, the p-values for the Goodness-of-fit Tests are all very low, too. A low p-value (specifically less than 0.05) indicates that the predicted probabilities deviate from the observed probabilities in a way that the binomial distribution does not predict.

That means even though there is a significant association between carries by the lead rusher and wins, the model that was created fits the data very poorly. In other words, our model does a really bad job predicting whether a team won or lost given the number of carries by the lead rusher.

This is not surprising, seeing as the probability of winning went down when we increased the carries from 20 to 25. So the number of carries by the lead rusher doesn’t cause a team to win.

But I’m not done with this “rushing causes winning” myth yet!

The Relationship between Team Rushing and Winning

When I was collecting the data, I noticed multiple cases where a team had a rusher below 20 carries even though they blew the other team out. Take, for instance, Seattle’s week 3 game against Jacksonville, or San Francisco’s week 5 game against Houston.

In both games, Seattle and San Francisco won by at least 4 touchdowns. But their lead rushers only had 17 carries! The reason is that they had such large leads that they played their backups before their starting running backs could get to 20 carries. And because the backups played so early, the losing team’s starting running back actually had more carries than the winning team’s (more evidence of why the previous model wasn't good).

However, as a team, Seattle and San Francisco had more rushing attempts than Jacksonville and Houston. So what if, instead of limiting our rushing attempts to a single person, we use team rushing attempts?

Binary Logistic Regression

The p-values in the Logistic Regression Table are 0, indicating a significant association between team rush attempts and wins. And this time our p-values for the goodness-of-fit tests are all greater than 0.05, indicating that this model fits the data much better than our previous one.

So now that we have a model, we can use that to predict the probability of winning a game based on the entire team's number of rushing attempts.

Binary Logistic Regression

Here, I used the model to predict a team’s probability of winning based on the number of attempts. With only 10 attempts, a team has only a 6.8% chance of winning. This increases as the number of attempts increases, all the way up to a 97.6% chance of winning if you have 50 attempts!
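The two predictions quoted above are enough to roughly reconstruct the fitted curve, assuming the model is a plain one-predictor logistic regression of the form logit(p) = b0 + b1 × attempts. The coefficients below are back-calculated from rounded percentages, so treat them as approximations rather than the values in the Minitab output.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-z))

# Two predictions quoted in the post: (team rush attempts, P(win))
x1, p1 = 10, 0.068
x2, p2 = 50, 0.976

# A logistic model is a straight line on the log-odds scale,
# so two points pin down the slope and intercept.
slope = (logit(p2) - logit(p1)) / (x2 - x1)
intercept = logit(p1) - slope * x1
break_even = -intercept / slope            # attempts where P(win) = 0.5

for attempts in (20, 30, 40):
    print(attempts, round(inv_logit(intercept + slope * attempts), 3))
print("50/50 point:", round(break_even, 1), "attempts")
```

Under these assumptions the curve crosses 50% somewhere between 26 and 27 team rushing attempts.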

So have we done it? Have we shown that teams that run the ball more win more often? Should I call the Denver Broncos front office and tell them to trade Peyton Manning for Adrian Peterson?

No, not yet. First, I want to answer one question. Is the running causing the winning...or is the winning causing the running? (Don’t worry, we’re almost done, and I promise no more binary logistic regression.)

Do Teams Rush to Gain the Lead, or Rush After They Have the Lead?

We’ve established that teams who rush more often win more often. But when are those rushing attempts coming? Just like taking a knee, the extra rushing attempts may just come at the end of the game when the winning team is trying to run the clock.

Take a look at the following table, which presents the average number of rushing attempts for the winning and losing team at different points in the game.

                 Total Rushing Attempts   Attempts through the first 3 quarters   Attempts in the 4th quarter
Winning Team     30.6                     20.8                                    9.7
Losing Team      22.9                     18.8                                    4.1

So on average, the winning team out-rushes the losing team by almost 8 attempts. However, through the first 3 quarters of the game, the number of rushing attempts is almost equal, with the winning team averaging only two more. But when you get to the 4th quarter, the winning team averages 5.6 more rushing attempts than the losing team! Of course, this could be viewed two different ways. Either winning teams are already winning in the 4th quarter, and thus rush more to run the clock. Or teams who don’t get pass wacky in the 4th quarter win because they stick to the run!
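The arithmetic behind that paragraph is simple enough to script. This sketch uses only the averages from the table; the variable names are mine.

```python
# Average rushing attempts from the table above.
winning = {"q1_q3": 20.8, "q4": 9.7}
losing = {"q1_q3": 18.8, "q4": 4.1}

gap_q1_q3 = winning["q1_q3"] - losing["q1_q3"]
gap_q4 = winning["q4"] - losing["q4"]
# Share of the winner's rushing edge that shows up in the 4th quarter.
share_q4 = gap_q4 / (gap_q1_q3 + gap_q4)

print(f"Gap through 3 quarters: {gap_q1_q3:.1f}")
print(f"Gap in the 4th quarter: {gap_q4:.1f}")
print(f"Share of the gap from the 4th quarter: {share_q4:.0%}")
```

The two gaps sum to 7.6 rather than the 7.7 you get from the totals column, which is just rounding in the table. Either way, roughly three quarters of the winner's rushing edge shows up in the 4th quarter.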

So which is it? Do teams rush to gain the lead, or rush after they have the lead?

Below is a fitted line plot showing the relationship between a team’s number of rushing attempts through the first 3 quarters of the game, and the scoring margin going into the 4th quarter. If running truly helps you win, a higher number of rushing attempts should result in a greater lead going into the 4th quarter.

Fitted Line Plot

The points appear to be randomly scattered about. The small R² value of 6% shows almost no relationship between rush attempts through the first 3 quarters and having a lead going into the 4th quarter.

So now we’re going to plot the relationship between the margin going into the 4th quarter, and the number of rushing attempts during the 4th quarter.

Fitted Line Plot

This plot shows that teams with larger leads going into the 4th quarter have more rushing attempts. This plot indicates that it’s the winning that leads to the rushing, not the other way around.

And notice the points I circled on the left-hand side of the graph. Those are teams that are behind by so many points that they’re rushing the ball in the 4th quarter just to end the game, rather than trying to throw to catch up (Hello, Jacksonville!). If we don’t include those points (since the team is no longer trying to win), our R² value increases to 30%. That means that 30% of the variation in the number of rushing attempts a team makes in the 4th quarter can be explained by how many points they’re ahead or behind going into the 4th quarter. Considering all the crazy things that can happen in a quarter of football, I’d say that’s pretty high!
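If you think in correlations rather than R-squared, the conversion for a straight-line fit with one predictor is just a square root. The sketch below applies it to the two R-squared values quoted in this post; the labels are my own shorthand for the two plots.

```python
import math

# R-squared values quoted in the post for the two fitted line plots.
r_squared = {
    "Q1-Q3 rush attempts vs. margin entering Q4": 0.06,
    "margin entering Q4 vs. Q4 rush attempts (circled points removed)": 0.30,
}

# For simple linear regression, R-squared is the squared correlation,
# so the square root gives the magnitude of r (the sign comes from the slope).
correlations = {label: math.sqrt(r2) for label, r2 in r_squared.items()}

for label, r in correlations.items():
    print(f"{label}: |r| = {r:.2f}")
```

That works out to a correlation of about 0.24 for the first plot and about 0.55 for the second.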

Rushing is definitely an important part of football, but don’t fool yourself into thinking there is some magic number of rushing attempts that guarantees victory. Winning teams have more rushing attempts mostly because they’re trying to run the clock out while the losing team has to throw to catch up. So spread the awareness, lest we have to listen to more fans who think they know more than coaches yelling, “Our team is undefeated when they run the ball at least 30 times. Somebody tell the coaches to run the ball more!!!”

Minitab Statistical Software displays the information that you need in the analyses that you perform, but it’s possible that you might want to capture information in a different place, or even a different program, so that the results are as easy to communicate as possible. For those occasions, it’s helpful to know how to get text even from the less obvious places in Minitab. Here are five useful cases:

1. Boxplot Tooltips

In Minitab, you often can hover over part of a graph and get a yellow box with some text in it. Try this:

Choose File > Open Worksheet.
Click Look in Minitab Sample Data Folder.
Select Exh_aov.MTW and click Open.
Choose Graph > Boxplot.
Select One Y With Groups. Click OK.
In Graph Variables, enter Thickness.
In Categorical variables for grouping, enter Setting. Click OK.

Now you have a boxplot that shows how different the thicknesses are for different settings. Leave your mouse pointer over one of the boxplots, and Minitab displays some of the sample statistics. To record those exact statistics for each boxplot, you need an additional step.

Click a boxplot twice slowly so that only one boxplot is selected.

The black squares on the corners of the box indicate the boxplot I selected.

Right-click on the boxplot and choose Copy Text.

The Copy Text option is in the bottom section of the menu.

Press CTRL+V to paste the tooltip text into the boxplot.
Repeat the previous three steps (select, Copy Text, paste) for each boxplot.
Double-click the text to edit the words and font size.
Click and drag the text boxes into a pleasant arrangement.

Boxplot showing the precise statistics from the tooltip.
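If you ever need the same kind of summary outside Minitab, the statistics behind a boxplot are easy to compute. This is a minimal sketch using Python's standard library; the readings are hypothetical (not the Exh_aov.MTW values), the choice of statistics is my assumption about what is useful to report, and quartile conventions differ between packages, so results may not match Minitab's tooltip exactly.

```python
import statistics

def box_summary(values):
    """Return the basic statistics a boxplot is built from."""
    # statistics.quantiles sorts the data and uses the "exclusive" method by default.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"N": len(values), "Q1": q1, "Median": median, "Q3": q3, "IQR": q3 - q1}

# Hypothetical thickness readings for a single setting.
thickness = [41, 38, 45, 52, 47, 39, 50]
print(box_summary(thickness))
```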

Superuser tip

Right-click your boxplot and choose Send Graph to Microsoft Word. The boxplots and the copied statistics go right into your report.

2. Scatterplot Tooltips

The boxplot tooltips are great for getting the exact statistics the plots show. In a scatterplot, you can use the same copying feature to display the equation of the regression line. Follow this example using the same data set that we used for the boxplot tooltips.

Choose Graph > Scatterplot.
Select With Regression and click OK.
In Y Variables enter Thickness.
In X Variables, enter Setting. Click OK.

This scatterplot does not show an equation.

Now you have a nice scatterplot that shows a line. In real life, we’d dig deeper into this relationship before using the line because we can see that the variation increases with Setting. But we can still use this graph to illustrate how to get the equation for the line onto the graph.

Click the regression line to select it.
Right-click and choose Copy Text.
Right-click and choose Add > Subtitle.
Paste the copied text from the tooltip into Subtitle. Click OK.

Now, the equation is in your subtitle.

This scatterplot shows the equation of the line.
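The equation in that tooltip comes from an ordinary least-squares fit. For the curious, here is a minimal sketch of the calculation; the numbers are hypothetical stand-ins for the Setting and Thickness columns, and the function name is mine.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical values, not the actual worksheet data.
setting = [35, 35, 44, 44, 52, 52]
thickness = [40, 39, 46, 48, 58, 53]

b0, b1 = least_squares_line(setting, thickness)
print(f"Thickness = {b0:.2f} + {b1:.3f} Setting")
```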

Superuser tip

The line and the equation are nice, but for the most detail, use Minitab’s Assistant Menu. That’s also another place where you might not know how easy it is to copy!

3. Minitab’s Assistant Menu Summary Report

Minitab’s Assistant Menu is a great tool that leads you through your analysis. One of the best parts of the Assistant Menu is that it provides a summary report to help you interpret your results with confidence. You can try it out for yourself on the same data set we used for the boxplots and scatterplot.

Choose Assistant > Regression.
Click Click to perform analysis.
In Y column, enter Thickness.
In X column, enter Setting. Click OK.

Minitab's Assistant Menu Summary Report

The Comments section includes the fitted equation, a reminder to inspect the fit before using the model to predict, and a caution that statistical significance does not mean that a change in X will cause a change in Y. If you want to emphasize just these comments, it’s as easy as getting the tooltip text.

Right-click the Comments box.
Choose Copy Text.
Paste the copied text into another program, such as Microsoft Word.

You’re all set to discuss this portion of the Summary Report in detail.

4. Minitab’s Assistant Menu Report Card

Minitab’s Assistant Menu also includes guidelines to ensure that your analysis is successful. The first report in most Assistant Menu analyses is the Report Card, a great summary of checks on your data.

The Report Card for Regression includes four data checks. In a presentation, you might want to call attention to some of the checks more than others. In this case, you might want to spend extra time explaining just the Unusual Data check, because it has a caution symbol.

The report card indicates two data points with large residuals.

Right-click the text.
Choose Copy Text.

The Copy Text choice is in the same place as when you copy a tooltip.

Paste the text into another program, such as Microsoft Word.

You’re ready to give more detail about the two data points with large residuals.

Superuser tip

You can also edit the text in the Report Card. Double-click the text and you can add all the notes that you want.

5. Minitab Help

You know that you can copy text from tooltips in Minitab graphs and from the Minitab Assistant Menu Reports. If your goal is to show someone else how to do an analysis in Minitab, an example can be a big help.

Suppose that you want to include the example of a scatterplot with separate lines for different groups in a Microsoft Word 2010 document.

Choose Graph > Scatterplot.
Select With Regression and Groups. Click OK.
Click the Help button in the lower left corner of the Scatterplot dialog box.
In the table of contents, select Example, with regression and groups.
Highlight the body of the example.
Right-click and choose Copy.
Paste the copied text into Microsoft Word. What you paste does not include the graph window output.
Return to the example, right-click the graph and choose Copy.
Place your cursor in Microsoft Word where the graph should appear.
On the Home tab, in the Clipboard group, open the menu below Paste and choose Paste Special.
In As, select Device Independent Bitmap. Click OK.
In Microsoft Word, right-click the hyperlink in the second paragraph of the example and choose Remove Hyperlink.

Now you have the example, with formatted steps and graphical output, ready to share.

The help example, formatted in Microsoft Word.

Summary

Minitab’s output includes a lot of detail. In a presentation or report, you might want to call attention to specific statistics or features. Knowing how to copy from all types of Minitab output and guidance can save a lot of time and effort. So grab those tooltips, Assistant Menu notes, and help examples, and use them to your heart’s content!

Bonus

Ready for more about how to get the most out of Minitab? Make sure to review Patrick Runkel's Minitab Tips and Tricks: Top 10 Countdown Finale to see advice about clearing a dialog box and opening Excel files in Minitab.

"The First Thanksgiving at Plymouth" (1914) by Jennie A. Brownscombe

In the United States, our Thanksgiving holiday is fast approaching. On this day, we give thanks for the good things in our lives.

For this post, I wanted to quantify how thankful we should be. Ideally, I’d quantify something truly meaningful, like happiness. Unfortunately, most countries are not like Bhutan, which measures the gross national happiness and incorporates it into their five-year development plans.

Instead, I’ll focus on something that is more concrete and regularly measured around the world—income. By examining income distributions, I’ll show that you have much to be thankful for, and so does most of the world!

Anatomy of the Income Distribution Graphs

To really understand incomes, we need to understand the distribution of incomes for whole populations. By assessing entire distributions, we can identify the most common incomes, probabilities for ranges of incomes, income inequality, and how all of these change over time and by location.

To graph these distributions, I’ll use Minitab’s probability distribution plots and parameter estimates calculated by Pinkovskiy and Sala-i-Martin (2009).

This study found that the lognormal distribution best fits the income distributions. The lognormal distribution has two parameters, location and scale. Location describes how large incomes are and scale describes the spread of the values. Typically, both parameters get larger over time, which indicates both larger incomes and a larger spread between the rich and poor.

Distribution of U.S. income per capita in 2006

The distribution above shows the per capita income for the United States in 2006. Like all of the other income distributions you’ll see in this post, this one is right-skewed. Half the population falls within the shaded area between 0 and $28,788. This is typical of income distributions where the majority of values are jammed together on the left side and the distribution extends further to the right.

The x-axis is income “per capita” because it includes children and non-working adults, rather than just working adults. The intent is to show the amount of money that covers all individuals. For example, if a household of four has a total income of $80,000, each person in that house has a per capita income of $20,000.

Finally, all graphs display income in 2006 U.S. dollars, which allows you to compare across countries and time.

Why You Should Be Thankful!

Because you’re reading this blog, I can make some assumptions about you. You have electricity, and access to a computer and the Internet. Further, because you're reading a statistical blog, I can assume that you have a higher level of learning. In fact, I can pretty much bet that both your income and wealth are higher than those of the majority of people on Earth, and probably very much higher than the global average!

Let’s take a look at a sample of income distributions from various countries to see how I reach this conclusion.

Comparison of distributions of per capita income by country in 2006

In the graph above, there is a cluster of developing countries on the left. Given that two of them are the rural populations of China and India, it’s easy to see how most people fall within this range. The United Kingdom and the United States have peaks that are shifted to the right and they stretch out much further to the right. In the middle is the Russian Federation, but it's still well below the U.S. and UK.

Clearly, most people of the world fall far to the left on the income distribution. Let's zoom in on two countries to show how the country you’re born in dramatically affects the probability of what your income will be.

Comparison of U.S. and China income distributions

I’ve shaded the curves for an income per capita that is less than $10,000. In China, this covers 98.3% of the population, and for the U.S. it covers 7.6%.
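The post doesn't list the parameter estimates themselves, but two of the U.S. figures quoted so far are enough to approximate them. The sketch below assumes the $28,788 figure from earlier is the median of the same 2006 U.S. lognormal curve, and uses the fact that a lognormal's median is exp(location). The resulting values are back-calculated from rounded numbers, not the estimates from the paper.

```python
import math
from statistics import NormalDist

median_income = 28_788      # U.S. 2006 median quoted earlier in the post
threshold = 10_000
share_below = 0.076         # U.S. share below $10,000 quoted above

location = math.log(median_income)        # lognormal median = exp(location)
z = NormalDist().inv_cdf(share_below)     # standard normal quantile for 7.6%
scale = (math.log(threshold) - location) / z

def lognormal_cdf(x, location, scale):
    """P(income < x) when ln(income) is normal with the given location and scale."""
    return NormalDist(mu=location, sigma=scale).cdf(math.log(x))

print(f"location = {location:.3f}, scale = {scale:.3f}")
print(f"P(income < $10,000) = {lognormal_cdf(threshold, location, scale):.3f}")
```

Under these assumptions the U.S. scale parameter comes out at roughly 0.74.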

To create a global distribution, Pinkovskiy and Sala-i-Martin summed the income distribution curves for 119 countries using a population-weighted method. They found that in 2006 the global mode for income was $3,300 and that over 50% of the world population had an income per capita of less than $5,000.

Davies et al. performed a similar analysis to look at the distribution of global wealth among adults in 2000. Wealth is net worth, or the value of all assets minus liabilities. In 2000, an adult needed wealth of just $2,138 to be in the wealthiest half of the world, and needed $61,000 to be in the top 10%.

Closing Thoughts

If you’re like me, you might be surprised by the low values that are required to be in the top half, and higher, of the global distributions for income and wealth. Remember, these global distributions are right-skewed. Consequently, a high proportion of values are concentrated in the low end and the rest are spread out much further on the high end.

The last thing I want to do is to make this blog an exercise of patting ourselves on the back. Instead, I hope understanding the global distribution of wealth and income gives you a new perspective and reminds you of how much you have to be thankful for.

In part two, I'll switch gears and use these distributions to assess global poverty and how it has changed over the decades. How does the overall global welfare today compare to 1970? Do more people have their basic needs met? Is income inequality a big problem?

_____________________________________________________________

Maxim Pinkovskiy, Xavier Sala-i-Martin, "Parametric Estimations of the World Distribution of Income", NBER Working Paper No.15433, October 2009. I'd like to give Maxim Pinkovskiy a special thanks for sharing the parameter estimates with me as well as clarifying several points.

James B. Davies, Susanna Sandstrom, Anthony Shorrocks, Edward N. Wolff, "The World Distribution of Household Wealth", United Nations University, Discussion Paper No. 2008/03.

In my previous post, I looked at how personal income levels fit into the global distribution of incomes. That said, I’d be the last person to suggest that a higher income guarantees more happiness. After all, I’ve visited a number of developing countries and, as long as their basic needs are met, the people seem to be just as happy and hard-working as people here at home.

So instead of personal income levels, I’d like to assess something more meaningful: global well-being. How does the overall global welfare today compare to 1970? Do more people have their basic needs met? That’s what we’ll look at in this post, and there’s good news here!

To evaluate global well-being, I’ll assess how global poverty and income inequality have changed.

Global Poverty Levels from 1970 to 2006

Depending on the organization and year, there are several official poverty lines for developing countries. I’ll use the $1 a day poverty line, which equates to US $312 in 2006 dollars. Using the other common poverty lines ($2, $3 a day, etc) produces similar results.

Below are a couple of representative examples of developing countries to illustrate the global trend. The shaded regions in the graphs show how the proportion of those living below the poverty line in Ecuador and rural China has dropped remarkably since 1970.

Distribution graph that shows declining poverty in Ecuador from 1970 to 2006

Distribution graph of the declining level of poverty in China from 1970 to 2006

The same pattern applies to the United States. While it’s hard to match a single per capita income value to the different household sizes and incomes that the Census bureau uses to measure poverty, a per capita value of $5000 is close for most household sizes.

Distribution graph that shows declining poverty in the United States from 1970 to 2006

The graph shows that the percentage of those in the United States with a per capita income of less than $5000 has dropped from 5.6% to less than 1%.

To assess poverty changes on a global scale, Pinkovskiy and Sala-i-Martin combined 119 country distributions, and found:

Using the official $1/day line, we estimate that world poverty rates have fallen by 80% from 0.268 in 1970 to 0.054 in 2006. The corresponding total number of poor has fallen from 403 million in 1970 to 152 million in 2006. . . . We also find similar reductions in poverty if we use other poverty lines. . . We learn that not only are poverty rates falling, but that they are falling faster than population is rising.
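The figures in that quote are easy to check. This sketch uses only the four numbers quoted above.

```python
# Figures from the Pinkovskiy and Sala-i-Martin quote above.
rate_1970, rate_2006 = 0.268, 0.054      # poverty rates
poor_1970, poor_2006 = 403, 152          # number of poor, in millions

rate_drop = (rate_1970 - rate_2006) / rate_1970
count_drop = (poor_1970 - poor_2006) / poor_1970

print(f"Poverty rate fell by {rate_drop:.1%}")
print(f"Number of poor fell by {count_drop:.1%}")
```

The rate falls by about 80% while the head count falls by about 62%. The head count drops less because the population grew over the same period.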

Global Income Inequality from 1970 to 2006

While the poverty rates have dropped significantly since 1970, income inequality within most countries has increased. I want to determine whether this negatively affects global well-being.

You can see the increasing inequality in the scale parameters that generally increase over time, which produces a wider spread in the graphs. There are more complex measures of inequality, such as the Gini coefficient, but I’ll illustrate the principle using the income ratio of the 90th and 10th percentile of earners in the United States.

Distribution graph that shows the 10th and 90th income percentiles in the U.S. for 1970 and 2006

The graph displays the per capita income values for the 10th and 90th percentiles in the United States. The ratio of the high to low incomes increases from 5.3 in 1970 to 6.6 in 2006. A similar pattern exists for most countries and indicates that income inequality is increasing within countries.
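For a lognormal distribution, the 90/10 ratio depends only on the scale parameter, so those two ratios can be translated back into approximate scale values. This is a sketch under that lognormal assumption; the results are back-calculated from the rounded ratios above, not the published estimates.

```python
import math
from statistics import NormalDist

# Standard normal quantile for the 90th percentile (about 1.28);
# the 10th percentile sits at the same distance below zero.
z90 = NormalDist().inv_cdf(0.90)

def implied_scale(p90_p10_ratio):
    """Scale parameter implied by a 90th/10th percentile income ratio.

    For a lognormal, ln(P90 / P10) = 2 * z90 * scale.
    """
    return math.log(p90_p10_ratio) / (2 * z90)

ratios = {1970: 5.3, 2006: 6.6}          # U.S. ratios quoted above
scales = {year: implied_scale(ratio) for year, ratio in ratios.items()}

for year, scale in scales.items():
    print(f"{year}: implied scale = {scale:.3f}")
```

The implied scale rises from about 0.65 in 1970 to about 0.74 in 2006, which is the "wider spread" you see in the graphs.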

While increasing inequality may sound detrimental, keep in mind that it occurs during a time when both the proportion and the absolute count of people living in poverty are sharply declining. Also, counter-intuitively, while income inequality is increasing within most countries, it is actually decreasing globally.

The two graphs below use China and the United States as examples to show how this works. I picked these countries because they both have large economies and are representative of the global trend. In my last post, I compared these two countries to show how different they were in 2006. However, in 1970, they were far more different. Over the decades, China’s income distribution has gained ground.

Comparison of China's and the U.S.'s distribution of incomes in 1970

Comparison of China's and the U.S.'s income distributions in 2006

These two graphs highlight the region where the two economies overlap in 2006. However, in 1970, there was almost no overlap. I shaded the range of $2500-$7500 for Chinese incomes in this overlap zone to illustrate the Chinese gains over time. In 1970, virtually no Chinese had per capita incomes in this range, while 53% were in this range in 2006. Also, note how the distribution for each country has a wider spread, which indicates that the within-country income inequality is increasing.

The global picture follows the same pattern: within-country inequality has increased but the between country inequality has decreased by an even greater amount. The net result is that global income inequality has decreased.

In short, income equality in 1970 was greater because more people were very poor. Inequality has increased since then because fewer people are living in severe poverty.

Perhaps the growing within-country inequality isn’t as bad as it first seems?

Pinkovskiy and Sala-i-Martin conclude, “We find that various measures of global inequality have declined substantially and measures of global welfare increased by somewhere between 128% and 145%.”

Closing Thoughts

All in all, I think this is great news and something we can all be thankful for! Poverty levels are sharply down and measures of global welfare are increasing. While income inequality is increasing, that simply reflects the fact that there are far fewer people living in poverty.

There is still a long way to go, because severe poverty has not been eliminated. In today’s world, severe poverty is generally found in Africa, where the rates actually climbed much of the time and only recently began a slight decline. According to Pinkovskiy and Sala-i-Martin, “Welfare unambiguously deteriorated in 23 countries, totaling less than 5% of the world’s 2006 population.”

Looking beyond incomes, there are countries with human rights violations that are not reflected in these promising results. There's also the issue of unequal rights and opportunities.

That said, these findings were a nice surprise to me because usually you only hear the bad news. The decrease in poverty is a longstanding trend that persists over decades, which is a great sign for the future!

Happy Thanksgiving!

In looking for the answer to an unrelated quality improvement question the other day, I ran across a blog post that answers a question I'd had for a while: what's the origin of the value stream map?

A value stream map (VSM) is a key tool in many quality improvement projects, especially those using Lean. The value stream is the collection of all of the activities, both value-added and non-value added, that generate a product or service that meets customer needs. The VSM shows how both materials and information flow as a product or service moves through the process value stream, helping teams visualize where improvements might be made in both flows.

In his post, Michel Baudin traces the history of a process map that accounts for both materials and information back to a text published in 1918, and provides examples and documentation of how this idea has been applied, transformed, and popularized since then.

There are two kinds of value stream maps, a current state map and a future state map. A current state value stream map shows what the actual process looks like at the beginning of a project. It identifies waste and helps you envision an improved future state. The future state map shows what the process should look like at the end of the project. Then, as the future state map becomes the current state map, a new future state map can be created and a plan implemented to achieve it.

Tools for Creating Value Stream Maps

You can use many tools to create a value stream map. At the most basic level, you just need paper and pencil. A group facilitator might use a whiteboard or cover a wall with paper, then give each work team involved in the process color-coded post-it notes. The team members put their tasks on the notes, place them in sequence, then draw lines between steps to show how the work flows. The group adds new steps and adjusts the map until it captures the process in its current state.

More sophisticated value stream maps can be created with easy-to-use software. There are stand-alone VSM creation tools, and also VSM tools that are part of more comprehensive process improvement software packages.

Minitab offers value stream map tools in Qeystone, our project portfolio management platform for Lean and Six Sigma deployments, as well as Quality Companion, our collection of soft tools for quality improvement projects. This video provides a great overview of how the VSM tool works in Companion:

If you'd like to try this yourself, you can download a free 30-day trial of Quality Companion. We also offer the full PDF of our Quality Companion training manual's lesson on value stream mapping.

A Value Stream Map by Any Other Name...

I won't reiterate all the details on the history of the value stream map here. But I will share a theory about the name "value stream map" that I found particularly interesting.

The idea of mapping the flow of information and materials through a process clearly predates the modern "value stream map." There are many potential terms that could be (and have been) applied to this tool; so what made the VSM term stick? Baudin attributes it to savvy marketing:

“Process Mapping,” “Materials and Information Flow Analysis,” are all terms that, at best, appeal to engineers. Any phrase with “value” in it, on the other hand, resonates with executives and MBAs... They readily latch on to a concept called “Value Stream Mapping,” even though their eyes would glaze over at the sight of an actual map. While this confusingly abstract vocabulary can be frustrating to engineers, it does serve the vital purpose of getting top management on board. The trick is to know when to use it — in the board room — and when not to — on the shop floor.

Makes you wonder what other quality tools might be renamed for greater appeal in the boardroom...

If you're interested in VSM, I encourage you to read Baudin's full post as well as the comments that follow it; you'll find some great insight, history, and practical information about when and where to use this tool.

These blog posts also share some helpful tips:

Five Guidelines You Need to Follow to Create an Effective Value Stream Map

Four More Tips for Making the Most of Value Stream Maps

Is your improvement program a success or a failure? In my first post in this series, I mentioned that we reached out to our customers who are practitioners in the field of quality improvement to better understand how they complete projects, what tools they use, and the challenges and roadblocks they come across in achieving success with quality initiatives. One area quality leaders said they were struggling with was the training aspect of their programs—actually getting their belts and/or project team members up to speed with adequate training to complete projects independently.

Insufficient Training

They told us projects were failing because of insufficient training. But why was their training insufficient? The responses we were given revealed a couple of scenarios. In many smaller companies, only Black Belts are trained in order to keep costs down. They send a small number of individuals away to be trained (as Black Belts, for example), with the expectation that they can come back and sufficiently train all of their colleagues. However, this plan tends to backfire for several different reasons.

Because the people tasked with training others spend so much time training, they don’t have enough time to practice what they’ve learned, complete successful projects, or prove that the methodology actually works. Other times, process owners are preliminarily tasked with leading projects, but they might not have been formally trained yet, so they may not feel confident or have the knowledge to really support the project to completion.

Another cause of insufficient training that our customers revealed is that the importance of “learning by doing” is not always explored during the training. Their training might focus too much on the theory of a certain methodology, but trainees never actually bring a real project to training and complete it. Momentum can be lost if trainees don’t ever see the value of a real project.

These are just a couple of items to consider when you’re starting your company's training program—perhaps a few traps you can avoid. And while we don’t offer Six Sigma training or specific belt training here at Minitab, we do have several public, remote, and onsite training options available to help you learn our software, as well as extensive Help resources within the software. Check out http://www.minitab.com/training/ for more information.

Lack of Management Support

The top reason customers shared with us about why their projects were failing will likely not come as a big surprise to you. Customers told us their projects were failing mostly because of a lack of management support. The training program can be top-notch, and teams can be working together seamlessly, but if there isn’t management support or buy-in you’re going to have a tough time getting your solutions implemented and being successful with your overall improvement program.

But what does good management support actually look like? According to the customers we surveyed, the following were identified as being associated with having good management support:

Minitab Bar Chart

People we talked to who said they had good management support spoke of receiving a lot of moral support from management and of recognition being given for successful projects. They also cited top management’s awareness of the various projects being worked on, and the fact that top management understood the Six Sigma methodology themselves.

For previous posts about how to avoid a Lean Six Sigma project failure, take a look at Avoiding a Lean Six Sigma Project Failure, part 1, Avoiding a Lean Six Sigma Project Failure, part 2, and Avoiding a Lean Six Sigma Project Failure, part 3.

We hope understanding some of the barriers mentioned in this series of posts will prevent project failures from happening to you in the future, or perhaps these ideas may help you in fine-tuning and making your current program even better.

What factors significantly affect how quickly my couch-potato pooch obeys the “Lay Down” command?

The cushiness of the floor surface? The tone of voice used? The type of reward she gets? How hungry she is?

I created a 1/8 fraction Resolution IV design for 7 factors and collected response data for 16 runs. Now it’s time to analyze the data in Minitab, using Stat > DOE > Factorial > Analyze Factorial Design.

After removing insignificant terms from the model, one at a time, starting with the highest-order interaction, here's the final model:

Of the original 7 factors in the screening experiment, only three are statistically significant at the alpha level of 0.05.

Procedure: Sit first, then Lay Down vs Lay Down Only
Hand signal: Point to the floor while giving the command, or not
Reward: Dry biscuit vs honey cured ham.

Even though it’s not statistically significant, the Voice term is included in the final model because it’s part of the statistically significant 2-way interaction Voice*Hand signal.

To visualize your results, you can display a normal plot of the standardized effects.

Normal plot

The statistically significant effects are shown in red. The stronger the effect, the farther away from the blue line it is.

The Reward factor (G) has the greatest effect, followed by the Hand signal (D).

Analyzing the Main Effects

How do the different settings of each factor actually affect the response? To find out, display a main effects plot of the factors in your final model.

Main Effects

The vertical scale shows the mean response—here, the number of seconds it took the dog to lie down after being given the Lay Down command.

The two bottom plots show just what you’d expect: Using a hand signal and a reward of fresh honey-cured ham are both associated with a quicker mean response. In fact, they were the two strongest main effects.

But the two upper plots surprised me. They’re a good reminder that you can’t always trust your “pat assumptions.”

I expected that giving my dog the Sit command first, and then a treat, before giving the Lay Down command, would focus her and make her lie down more quickly. But as the Procedure plot shows, she responded faster, on average, when the Lay Down command was given by itself.

I also expected she would respond more quickly to an authoritative tone of voice. But the Voice plot suggests that she’s less motivated by a cold, steely “Nurse Ratched” tone than she is by a perky Mickey Mouse voice.

Or is she?

Warning 1: Beware of a main effects plot for a factor that’s part of a significant interaction. What you see may not be what you get.

Analyzing Interactions

In this model, there’s a significant interaction that involves the Voice factor (Voice*Hand signal). To accurately interpret the effect of this factor, I need to examine an interaction plot.

interaction plot

Now you can see why the main effects plot for Voice was misleading.

When combined with a hand signal, an authoritative tone actually does result in a quicker mean response than the positive tone. And that makes perfect sense—the direct hand signal could serve to emphasize the tone of authority, and vice versa.
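To see how an interaction can make a main effects plot misleading, here is a minimal sketch with made-up cell means. The numbers are purely hypothetical and are not the measurements from this experiment; they are chosen only so that the averages behave the way the plots above are described.

```python
# Hypothetical mean response times in seconds (NOT the real experiment data).
cell_means = {
    ("authoritative", "hand signal"): 3.0,
    ("positive", "hand signal"): 5.0,
    ("authoritative", "no signal"): 12.0,
    ("positive", "no signal"): 8.0,
}
voices = ("authoritative", "positive")

def mean_for_voice(voice):
    """Average over both hand-signal settings: what a main effects plot shows."""
    values = [m for (v, _), m in cell_means.items() if v == voice]
    return sum(values) / len(values)

# Main effect of Voice: averaged over Hand signal, the positive tone looks faster.
main_effect = {v: mean_for_voice(v) for v in voices}

# Interaction view: restrict to runs where the hand signal was used.
with_hand_signal = {v: cell_means[(v, "hand signal")] for v in voices}

print("Averaged over hand signal:", main_effect)
print("With hand signal only:   ", with_hand_signal)
```

Averaged over both settings, the positive tone comes out ahead, yet within the hand-signal runs the authoritative tone is quicker. That reversal is the kind of pattern only an interaction plot reveals.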

But wait. Before I rush to publish my results in the Journal of Applied Canine Psychology, the world's leading paw-reviewed journal, there’s something else that needs to be considered.

Warning 2: Beware of interpreting a significant interaction in a fractional factorial experiment without considering the aliasing structure.

Assessing Confounded Terms

Remember this experiment was a 1/8 fraction Resolution IV design. To reduce the number of required runs from 128 (a full two-level factorial with 7 factors) to 16, several interactions were confounded.

The alias structure of the original design indicated that the significant AD interaction (Voice*Hand signal) was confounded with two other 2-way interactions: EF and CG.
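If you want to see where this confounding comes from, the sketch below builds a 16-run design for 7 two-level factors and compares interaction columns directly. It assumes the generators E=ABC, F=BCD, and G=ACD, a common textbook choice for this kind of design. The post doesn't state which generators were used, so treat them as an assumption.

```python
from itertools import product

# Full 2^4 factorial in the base factors A-D (16 runs), coded as -1/+1.
# ASSUMPTION: E, F, G are generated as E=ABC, F=BCD, G=ACD.
runs = []
for a, b, c, d in product((-1, 1), repeat=4):
    runs.append({
        "A": a, "B": b, "C": c, "D": d,
        "E": a * b * c,
        "F": b * c * d,
        "G": a * c * d,
    })

def interaction_column(f1, f2):
    """Column of a 2-way interaction: the product of the two factor columns."""
    return [run[f1] * run[f2] for run in runs]

ad = interaction_column("A", "D")
candidates = {"CG": interaction_column("C", "G"),
              "EF": interaction_column("E", "F"),
              "AB": interaction_column("A", "B")}

# Two interactions are confounded when their columns are identical in every run.
aliased_with_ad = [name for name, col in candidates.items() if col == ad]
print(len(runs), aliased_with_ad)  # prints: 16 ['CG', 'EF']
```

Because the AD, CG, and EF columns are identical in all 16 runs, the experiment has no way to tell those three interactions apart. That's why judgment has to do the rest.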

Let’s take a look at those confounded 2-way interactions. This is where you interpret your statistical results with a large dollop of good old-fashioned common sense.

Could an Empty Stomach Affect the Dog's Reaction to the Floor Surface?

EF is the 2-way interaction between the Meal factor (which indicated whether the command was given before or after the dog’s daily meal) and the Floor factor (which indicated whether the floor surface was carpet or wood).

How would hunger influence my dog’s response to the softness of the floor surface?

That interaction just doesn’t make any sense to me—so I decided to nix it as a possibility. (If I’d decided EF was a valid interaction, I’d need to keep factors E and F in the final model).

Could the Reward Given Affect the Dog's Reaction to the Procedure Used?

The other 2-way interaction confounded with AD was CG, the interaction between the Procedure and the Reward. Here’s what the interaction of those two factors looks like:

interaction plot 2

This interaction seems more plausible.

After the dog sits first and gets a dry biscuit, she may not be as motivated to lay down afterwards as she is when she gets the fresh ham after sitting first.

“A crummy old dry biscuit? Why, that’s hardly worth the extra effort!” she might think.

But when a reward of fresh ham is used, both procedures have a similar effect.

So both interactions AD and CG are plausible—and the final model accommodates both of them.

If I wanted to delineate the effects of these two interactions more clearly, I could perform a follow-up experiment with the 4 factors in the reduced model, using a design that doesn't confound AD and CG.

The Last Step: Checking Assumptions

In the first post of this 3-part series, I mentioned that my doggy DOE screening experiment was meant to be an illustrative example, rather than a bona fide DOE application.

Why? Because in a real DOE experiment, an important assumption is that each run in your experiment is independent. Because I measured the response of the same dog in each run, the runs in my experiment are dependent.

To make the runs independent, I'd need to randomly sample from a population of dogs, and then perform each run on a different dog. (You'd use a similar technique to conduct a DOE experiment to evaluate customer response, by randomly sampling customers).

That being the case, I fully expected to see a run order effect in my experiment, as my dog began to respond more quickly to the Lay Down command with each run. But that didn't seem to happen:

residual plots

Which just goes to show that you can't teach an old, lazy dog new tricks. At least, not without a fridge full of fresh, honey-cured ham.

Translink Ticket Vending Machine found at all train stations in south-east Queensland.

For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response variable so that the data do meet the assumptions. Minitab makes the transformation simple by including the Box-Cox button. Try it for yourself and see how easy it is!

The government in Queensland, Australia shares data about the number of complaints about its public transportation service. You can find the data at this link if you want to follow along:

http://translink.com.au/about-translink/reporting-and-publications/public-transport-performance-data

I’m going to use the data set titled “Complaints.” I’ll analyze the data a bit more thoroughly later, but for now I want to focus on the transformation. The variables in this data set are the date, the number of passenger trips, the number of complaints about a frequent rider card, and the number of other customer complaints. I’m excluding the data for the last week of 2012 and the first week of 2013 because ridership is so much lower compared to other weeks.

Let’s say that we want to use the number of complaints about the frequent rider card as the response variable. The number of other complaints and the date are the predictors. The resulting normal probability plot of the residuals shows an s-curve.

The residuals do not appear normal.

Because we see this pattern, we’d like to go ahead and do the Box-Cox transformation. Try this:

Choose Stat > Regression > General Regression.
In Response, enter the column with the number of complaints on the go card.
In Model, enter the columns that contain the other customer complaints and the date.
Click Box-Cox.
Check Box-Cox power transformation (W = Y**Lambda).
Click OK.
Click Graphs.
Select Individual plots and check Normal plot of residuals.
Click OK twice.

The residuals are more normal.

The probability plot that results is more linear, although it still shows outlying observations where the number of complaints in the response is very high or very low relative to the number of other complaints. You'll still want to check the other regression assumptions, such as homoscedasticity.
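For readers curious about what a Box-Cox search does behind the button, here is a rough sketch in plain Python. It uses synthetic data (not the Translink complaints) and a single predictor, and it picks lambda from a coarse grid by minimizing the residual sum of squares of a geometric-mean-scaled transform. That is the classic textbook approach; Minitab's exact procedure may differ.

```python
import math
import random

def box_cox(y, lam, gm):
    """Scaled Box-Cox transform so residual SS is comparable across lambdas."""
    if abs(lam) < 1e-12:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]

def residual_ss(x, w):
    """Residual sum of squares from a simple linear regression of w on x."""
    n = len(x)
    mx, mw = sum(x) / n, sum(w) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxw = sum((xi - mx) * (wi - mw) for xi, wi in zip(x, w))
    slope = sxw / sxx
    return sum((wi - mw - slope * (xi - mx)) ** 2 for xi, wi in zip(x, w))

# Synthetic, hypothetical data: log(y) is linear in x, so lambda near 0 fits best.
rng = random.Random(0)
x = [i / 10 for i in range(1, 61)]
y = [math.exp(0.5 + 0.5 * xi + rng.gauss(0, 0.15)) for xi in x]

gm = math.exp(sum(math.log(v) for v in y) / len(y))  # geometric mean of y
grid = [k / 4 for k in range(-8, 9)]                 # -2.0, -1.75, ..., 2.0
best_lambda = min(grid, key=lambda lam: residual_ss(x, box_cox(y, lam, gm)))
print("Selected lambda:", best_lambda)
```

A lambda of 0 corresponds to the natural log, 0.5 to a square root, and 1 to no transformation at all, so the selected value also tells you roughly which familiar transformation the data are asking for.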

So there it is, everything that you need to know to use a Box-Cox transformation on the response in a regression model. Easy, right? Ready for some more? Check out more advantages of Minitab’s General Regression Tool.

The image of the Translink vending machine is by Brad Wood and is licensed for reuse under this Creative Commons License.

All measurements are rounded to some degree. In most cases, you would not want to reject normality just because the data are rounded. In fact, the normal distribution would be a quite desirable model for the data if the underlying distribution is normal since it would smooth out the discreteness in the rounded measurements.

Some normality tests reject a very high percentage of the time due to rounding when the underlying distribution is normal (Anderson-Darling and Kolmogorov-Smirnov), while others seem to ignore the rounding (Ryan-Joiner and chi-square).

As an extreme example of how data that is very well modeled by a normal distribution can get rejected, consider a sample size of 5000 from a normal distribution with mean 10 and standard deviation of 0.14.

The display below shows a partial list of the data rounded to the nearest 100th, a histogram, and probability plots of that same data. The histogram and probability plots look great. The Ryan-Joiner Test passes Normality with a p-value above 0.10 (probability plot on the left). However, the Anderson-Darling p-value is below 0.005 (probability plot on the right). Clearly, rejecting Normality in a case like this is inappropriate.

normal distribution with mean 10 and SD of 0.14

A simulation was conducted to address a more common sample size, n=30. Data were simulated from a normal distribution with mean 0 and standard deviation 1, then rounded to the nearest integer. An example of a probability plot from this simulation appears below.

normal probability plot

In this iteration of the simulation, the Anderson-Darling P-value was less than 0.005 while the Ryan-Joiner P-value was greater than 0.10.

The simulation results were remarkably consistent, with the Anderson-Darling (AD) test almost always rejecting normality and the Ryan-Joiner (RJ) test almost always failing to reject normality. The Kolmogorov-Smirnov (KS) and Chi-square (CS) tests were included in the simulation too. The CS test was almost as good as the RJ test at avoiding rejecting normality due to rounding.

CI for Probability of Rejecting Normality
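A stripped-down version of the first simulation can be sketched in plain Python. This is not the code behind the results above. It covers only the Anderson-Darling test, using the standard small-sample adjustment and an approximate 0.05 critical value of 0.752 for the case where the mean and standard deviation are estimated from the data. The number of repetitions and the seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ad_rejects_normality(data, critical=0.752):
    """Anderson-Darling test with estimated mean/SD; True if rejected at ~0.05."""
    n = len(data)
    xs = sorted(data)
    m, s = mean(xs), stdev(xs)
    if s == 0:
        return True  # every value identical: clearly not a usable normal sample
    std_normal = NormalDist()
    # Clamp the fitted CDF values to avoid log(0) for extreme observations.
    f = [min(max(std_normal.cdf((v - m) / s), 1e-12), 1 - 1e-12) for v in xs]
    total = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2_adjusted > critical

def rejection_rate(resolution, sd=1.0, n=30, reps=300, seed=1):
    """Share of simulated normal samples rejected; resolution=0 means no rounding."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(reps):
        sample = [rng.gauss(0, sd) for _ in range(n)]
        if resolution > 0:
            sample = [round(v / resolution) * resolution for v in sample]
        rejected += ad_rejects_normality(sample)
    return rejected / reps

unrounded_rate = rejection_rate(resolution=0)
rounded_rate = rejection_rate(resolution=1)  # resolution / SD = 1, as in simulation 1
print("AD rejection rate, unrounded:", unrounded_rate)
print("AD rejection rate, rounded:  ", rounded_rate)
```

Changing `sd` to 2 or `n` to 100 gives a rough feel for the other scenarios discussed in this post, although the Ryan-Joiner, Kolmogorov-Smirnov, and chi-square tests are not implemented here.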

A second simulation was conducted with less extreme rounding*. Data were simulated from a normal distribution with mean 0 and standard deviation 2, then rounded to the nearest integer. An example of a probability plot from this simulation is below. In this iteration of the simulation, the Anderson-Darling P-value was less than 0.05 while the Ryan-Joiner P-value was greater than 0.10.

Normal probability plot

In this second simulation with less extreme rounding, the AD and KS tests did not reject as often.

Probability of Rejecting Normality

A third simulation was conducted with the same degree of rounding as the second simulation, but a larger sample size, n=100. Due to the larger sample size, the AD and KS tests went back to almost always rejecting normality. The RJ and CS tests again almost never rejected normality due to rounding.

Confidence Interval for Probability of Rejecting Normality

So far, we have only discussed avoiding rejecting normality due to rounding, but do the RJ and CS tests detect when there is truly a non-normal distribution? A final simulation was conducted with the same degree of rounding as the first simulation, but from an underlying non-normal distribution. Data were simulated from an exponential distribution with standard deviation 1, then rounded to the nearest integer. The AD and KS tests correctly rejected normality, but likely due to rounding as much as non-normality. The RJ and CS tests did detect the non-normality, just not as often as the AD and KS tests.

CI for Probability of Rejecting Normality

In summary, the RJ and CS tests avoid rejecting Normality just due to rounding (simulations 1-3) while still detecting data that truly comes from a non-normal distribution (simulation 4).

*The degree of rounding in this post is defined as (Measurement Resolution) / (Standard Deviation), where measurement resolution is the smallest change a measuring system can detect. The larger this ratio, the more extreme the rounding.

If you're scared this is what the playoff will look like, just remind yourself that Georgia Southern beat Florida this year without completing a pass.

Auburn beat Alabama this past weekend in one of the most incredible endings you’ll ever see. The stakes couldn't have been higher. With the defeat, Alabama lost any realistic shot at playing for the national championship this year.

However, if this defeat had occurred next year, the loss would mean absolutely nothing to Alabama, at least in terms of playing for a national championship. If four teams were selected to compete in a playoff, Alabama would certainly be one of those four teams despite the loss. Likewise, if four teams made the playoffs, undefeated Florida State could lose their game this weekend and be fine, too (though the same can’t be said of Ohio State). This has led some to complain that the BCS is actually a better format than the playoffs because we don’t end up with meaningless games. They all count!

Of course, this isn’t entirely correct. If Ohio State and Florida State both win this weekend, the SEC title game between #3 Auburn and #5 Missouri is completely meaningless. And what about #6 Oklahoma State vs. #17 Oklahoma? Oklahoma State currently has no shot at the BCS title game, so that game is completely meaningless. But if there were a four-team playoff, they would still have a shot.

(One quick tangent before I get to the point of this post: You might have stopped in the middle of the last paragraph and said “Wait, a 1-loss Auburn could still jump an undefeated Ohio State, so that game isn’t meaningless.” Actually, because Ohio State’s computer numbers will improve this week with a win, it would take almost every voter in the Harris and Coaches polls to vote Auburn ahead of Ohio State for Auburn to be #2 in the BCS. I was actually going to do an entire post going through the math, but instead I’ll just link to this article that sums it up nicely. And don’t think they’re biased just because it’s an Ohio State site. I looked into it and their numbers are legit.)

Okay, so where were we? Oh yeah...meaningless college football games! So, no matter the format (BCS or playoff), some meaningless football games will take place late in the season. But which system will create more of them? I went back through the last 5 years' worth of data and determined the number of meaningless games in each system that occurred during the last two weeks of the season. Meaninglessness, of course, can be a subjective judgment, so I made sure to specify what a meaningless game was before I started.

A game is meaningless in the playoff system if a team loses in the last two weeks of the season, but still would have been in a four-team playoff. I know for the playoff there will be a selection committee to determine the teams, but for the sake of this post, I used the top 4 teams in the final BCS standings as my “playoff” teams. This year’s Alabama team is the perfect example.
A game is meaningless in the BCS system if a team lost in the last two weeks of the season and still ended up in the BCS title game, or if a team finished outside the top two spots in the final BCS standings but spent the last two weeks of the season playing in games that would have secured it a spot in a playoff. Last year’s game between Florida and Florida State was a perfect example of the latter. Neither team had a shot at the BCS title game, but at 10-1, both had a shot at making a playoff. Florida ended up winning and finished #3 in the BCS. But when you only take the top two teams, that game becomes meaningless.

Keep in mind a game can be meaningful in both situations, too! Last year #2 Alabama played #3 Georgia in the SEC Championship game. Alabama won and went to the BCS title game, while the loss plummeted Georgia to #10 in the final BCS standings. So that game would have been for a spot in both the BCS title game and a playoff. Therefore, it was meaningful in both.

But enough talk, let’s get to the data analysis!

The Statistics

I used Minitab to tally the number of meaningful and meaningless games there would have been the last 5 seasons. Keep in mind I only used the last two weeks of each season. You can get the data I used here.

Tally

So there have only been 2 games in the last 5 years where a team ranked #1 or #2 lost in the last two weeks of the season and still finished in the top 4 of the final BCS standings. One of those teams was Alabama in 2008. The Tide lost to the Tim Tebow-led Florida Gators in the SEC title game, but still finished #4 in the final BCS standings.

The other team was LSU in 2007. The #1-ranked Tigers lost to Arkansas in their regular season finale, a game that everybody thought cost them a chance for a national championship. But not only did the Tigers finish in the top 4 of the BCS, they actually finished #2! So that game was meaningless for both a playoff and the BCS systems!

So the idea that a playoff system will result in a lot of late-season games in which a team loses but still makes the playoffs isn’t correct. Sure, every now and then it will happen, but not very often. And on the flip side, look at the number of games that have been rendered meaningless because only the top two teams make it: 16 games! You think the regular season is going to become boring because of a playoff system? Au contraire mon ami!

It’s going to become more exciting!

I’m not going to go through all 16 games, but let’s see if we can determine a year where we missed a lot of excitement because we didn’t have a four-team playoff.

Tabulated statistics

We see that in both 2007 and 2010, there were four games in the last two weeks of the season that would have been meaningful if we had a playoff system. If you’re a college football fan, you’re probably familiar with the craziness that happened in 2007 just with the BCS system (the #1 and #2 ranked teams both lost in each of the last two weeks of the season). So let’s focus on 2010.

What Happened

Oregon and Auburn were ranked #1 and #2 and were both undefeated. In the last two weeks of the season, Oregon won both of their games by 3 scores. Auburn won a thrilling Iron Bowl against Alabama by a point, then won the SEC Championship game by 39 points.

So one exciting game and 3 snooze fests. And by the way, in the second-to-last week of the season, Oregon and Auburn both won on Friday (it was the week of Thanksgiving). So we had an entire Saturday of football the next day in which none of the games mattered. Nice job, BCS!

But what if there'd been a four-team playoff instead?

What Could Have Happened

The Friday night after Oregon and Auburn had already won, #4 Boise State played #19 Nevada. Nevada won a thrilling game in overtime, knocking previously undefeated Boise State out of the playoffs. That meant #5 LSU was next in line to get the last playoff spot. But the next day, they lost to Arkansas! That left another spot open, which Stanford grabbed with a win over Oregon State. TCU was ranked #3 in the BCS, so they also had a meaningful game that Saturday against New Mexico, which they won easily.

So that’s four extra games in just one weekend (neither TCU nor Stanford played the next week) that would have been meaningful if 4 teams made the playoffs. And keep in mind I’m actually lessening the number of “meaningful” games in the playoff by automatically taking the top 4 BCS teams. In 2010, Wisconsin was sitting at #5 with the same record as Stanford. When you consider the fact that a selection committee might have taken the Big Ten champion over the Pac 10 runner up (or even an undefeated TCU), then all of Wisconsin’s late season games become meaningful too!

So don’t fear the 4 team playoff. It’s not going to suddenly make the end of the college football season boring. Not even close. All it’s going to do is give us more opportunities for moments like this!

Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R2 and predicted R2, and make predictions. Yes, regression really is quite wonderful.

Except when it’s not. Dark, seedy corners of the data world exist, lying in wait to make regression confusing or impossible. Good old ordinary least squares regression, to be specific.

For instance, sometimes you have a lot of detail in your data, but not a lot of data. Want to see what I mean?

In Minitab, choose File > Open Worksheet.
Click Look in Minitab Sample Data Folder.
Open Soybean.mtw.

The data have 88 variables about soybeans, the results of near-infrared (NIR) spectroscopy at different wavelengths. The data contain only 60 measurements, and the data are arranged to save 6 measurements for validation runs, which leaves 54 for fitting. With ordinary least squares regression, you can only estimate as many coefficients as the data have samples. Thus, the traditional method that’s satisfactory in most cases would only let you estimate 53 coefficients for variables plus a constant coefficient. This could leave you wondering about whether any of the other possible terms might have information that you need.

The NIR measurements are also highly collinear with each other. This multicollinearity complicates using statistical significance to choose among the variables to include in the model.

When the data have more variables than samples, especially when the predictor variables are highly collinear, it’s a good time to consider partial least squares regression.

Try these steps if you want to follow along with the soybean data:

Choose Stat > Regression > Partial Least Squares.
In Responses, enter Fat.
In Model, enter ‘1’-‘88’.
Click Options.
Under Cross-Validation, select Leave-one-out. Click OK.
Click Results.
Check Coefficients. Click OK twice.

One of the great things about partial least squares regression is that it forms components and then does ordinary least squares regression with them. Thus the results include statistics that are familiar. For example, predicted R2 is the criterion that Minitab uses to choose the number of components.

Minitab selects the model with the highest predicted R-squared.
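To give a feel for what happens under the hood, here is a small sketch of single-response partial least squares with leave-one-out cross-validation, written in plain Python. It is not Minitab's implementation, and it runs on synthetic, hypothetical data (12 samples, 20 collinear predictors) rather than the soybean worksheet. The function names are mine, chosen for illustration only.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, n_components):
    """NIPALS-style PLS for one response; returns means and (w, p, q) per component."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centered predictors
    f = [v - y_mean for v in y]                                # centered response
    components = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(k)]
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]                 # weight vector
        t = [dot(row, w) for row in E]              # component scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(k)]
        q = dot(f, t) / tt                          # least squares fit of y on t
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict(model, x_new):
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x_new, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def predicted_r2(X, y, n_components):
    """Leave-one-out predicted R-squared: 1 - PRESS / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((v - y_bar) ** 2 for v in y)
    press = 0.0
    for i in range(len(y)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - predict(model, X[i])) ** 2
    return 1 - press / sst

# Synthetic data: one latent factor drives all 20 predictors and the response.
rng = random.Random(1)
latent = [-1.5 + 3 * i / 11 for i in range(12)]
X = [[t * (0.5 + 0.05 * j) + rng.gauss(0, 0.1) for j in range(20)] for t in latent]
y = [3 * t + rng.gauss(0, 0.1) for t in latent]

scores = {a: predicted_r2(X, y, a) for a in (1, 2, 3)}
best = max(scores, key=scores.get)
print("Predicted R-squared by number of components:", scores)
print("Number of components with the highest predicted R-squared:", best)
```

Notice that the example has more predictors than samples, which ordinary least squares couldn't handle, yet each component is still fit with a simple least squares step.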

Each of the 9 components in the model that maximizes the predicted R2 value is a complex linear combination of all 88 of the variables. So although the ANOVA table shows that you’re using only 9 degrees of freedom for the regression, the analysis uses information from all of the data.

The regression uses 9 degrees of freedom.

The full list of standardized coefficients shows the relative importance of each predictor in the model. (I’m only showing a portion here because the table is 88 rows long.)

Each variable has a standardized coefficient.

Ordinary least squares regression is a great tool that’s allowed people to make lots of good decisions over the years. But there are times when it’s not satisfying. Got too much detail in your data? Partial least squares regression could be the answer.

Want more partial least squares regression now? Check out how Unifi used partial least squares to improve their processes faster.

The image of the soybeans is by Tammy Green and is licensed for reuse under this Creative Commons License.

I’ve written a number of blog posts about regression analysis and I think it’s helpful to collect them in this post to create a regression tutorial. I’ll supplement my own posts with some from my colleagues.

This tutorial covers many aspects of regression analysis including: choosing the type of regression analysis to use, specifying the model, interpreting the results, determining how well the model fits, making predictions, and checking the assumptions. At the end, I include examples of different types of regression analyses.

If you’re learning regression analysis right now, you might want to bookmark this tutorial!

Why Choose Regression and the Hallmarks of a Good Regression Analysis

Before we begin the regression analysis tutorial, there are several important questions to answer.

Why should we choose regression at all? What are the common mistakes that even experts make when it comes to regression analysis? And, how do you distinguish a good regression analysis from a less rigorous regression analysis? Read these posts to find out:

Tribute to Regression Analysis: See why regression is my favorite! Sure, regression generates an equation that describes the relationship between one or more predictor variables and the response variable. But, there’s much more to it than just that.
Four Tips on How to Perform a Regression Analysis that Avoids Common Problems: Keep these tips in mind throughout all stages of this tutorial to ensure a top-quality regression analysis.

Tutorial: How to Choose the Correct Type of Regression Analysis

Minitab statistical software provides a number of different types of regression analysis. Choosing the correct type depends on the characteristics of your data, as the following posts explain.

Giving Thanks for the Regression Menu: Patrick Runkel goes through the regression choices using a yummy Thanksgiving context!
Linear or Nonlinear Regression: How to determine when you should use one or the other.

Tutorial: How to Specify Your Regression Model

Choosing the correct type of regression analysis is just the first step in this regression tutorial. Next, you need to specify the model. Model specification consists of determining which predictor variables to include in the model and whether you need to model curvature and interactions between predictor variables.

Specifying a regression model is an iterative process. The interpretation and assumption verification sections of this regression tutorial show you how to confirm that you’ve specified the model correctly and how to adjust your model based on the results.

Stepwise and Best Subsets Regression: Minitab provides two automatic tools that help identify useful predictors during the exploratory stages of model building.
Curve Fitting with Linear and Nonlinear Regression: Sometimes your data just don’t follow a straight line and you need to fit a curved relationship.
Interaction effects: Michelle Paret explains interactions using Ketchup and Soy Sauce.
Proxy variables: Important variables can be difficult or impossible to measure but omitting them from the regression model can produce invalid results. A proxy variable is an easily measurable variable that is used in place of a difficult variable.

Tutorial: How to Interpret your Regression Results

So, you’ve chosen the correct type of regression and specified the model. Now, you want to interpret the results. The following topics in the regression tutorial show you how to interpret the results and effectively present them:

Regression coefficients and p-values
Regression Constant (Y intercept)
R-squared and the goodness-of-fit
Adjusted R-squared and Predicted R-squared
How to Present Your Regression Results to Avoid Costly Mistakes: Research has shown that how you present regression results affects the number of interpretation mistakes.

Tutorial: How to Use Regression to Make Predictions

In addition to determining how the response variable changes when you change the values of the predictor variables, the other key benefit of regression is the ability to make predictions. In this part of the regression tutorial, I cover how to do just this.

How to Predict with Minitab: A prediction guide that uses BMI to predict body fat percentage.
Predicted R-squared: This statistic indicates how well a regression model predicts responses for new observations rather than just the original data set.
Prediction intervals: See how presenting prediction intervals is better than presenting only the regression equation and predicted values.
Prediction intervals versus other intervals: I compare prediction intervals to confidence and tolerance intervals so you’ll know when to use each type of interval.

Tutorial: How to Check the Regression Assumptions and Fix Problems

Like any statistical test, regression analysis has assumptions that you should satisfy, or the results can be invalid. In regression analysis, the main way to check the assumptions is to assess the residual plots. The following posts in the tutorial show you how to do this and offer suggestions for how to fix problems.

Residual plots: What they should look like and reasons why they might not!
Multicollinearity: Highly correlated predictors can be a problem, but not always!
Heteroscedasticity: You want the residuals to have a constant variance (homoscedasticity), but what if they don’t?
Box-Cox transformation: If you can’t resolve the underlying problem, Cody Steele shows how easy it can be to transform the problem away!

Examples of Different Types of Regression Analyses

The final part of the regression tutorial contains examples of the different types of regression analysis that Minitab can perform. Many of these regression examples include the data sets so you can try it yourself!

Binary Logistic Regression: Predicts the winner of the 2012 U.S. Presidential election.
Linear Regression: Great Presidents by Patrick Runkel and my follow up, Great Presidents Revisited.
Linear regression with a double-log transformation: Examines the relationship between the size of mammals and their metabolic rate with a fitted line plot.
Nonlinear regression: Kevin Rudy uses nonlinear regression to predict winning basketball teams.
Orthogonal regression: Carly Barry shows how orthogonal regression (a.k.a. Deming Regression) can test the equivalence of different instruments.
Partial least squares (PLS) regression: Cody Steele uses PLS to successfully analyze a very small and highly multicollinear data set.

Much is made following the World Cup draw every four years over which group is the “group of death.” This is generally considered to be a really difficult group that is tough to advance from, although there is no true definition (more on that below).

First, for readers not familiar with World Cup groups, a brief explanation of how teams are “grouped” in the World Cup is in order. Thirty-two teams qualify to compete, and they are placed into eight different groups labeled A-H, with each having a predetermined “top” team. In the group stage of the World Cup, each team plays the other three teams in their group, and the top two teams move into a standard bracketed tournament. The bottom two teams watch the rest of the tournament on TV, like the rest of us.

So how do you decide which group is the Group of Death?

There is no formal definition, so let’s look at a few possibilities and see what we can learn by analyzing some data. To do so, I’m going to use the team ratings found at eloratings.net, although other rating systems exist.

Highest Average Rating

One quick method to look for a Group of Death would be to see which group has the highest average rating among their four teams…this group would be the most difficult, after all.

Here are the ratings of every team in every group, with the group average plotted as well:

Highest Average Rating

Group B (Spain, Netherlands, Chile, Australia) obviously has the highest average, followed by Group G (Germany, Portugal, USA, Ghana). So based on this method, Group B is the most difficult and the Group of Death for 2014.

Variation

But take a closer look at Group B, and you’ll notice how widely spread the teams are…the top team (Spain) is significantly better than average, and the bottom team (Australia) probably has little hope of qualifying. If the top team is likely to qualify and the bottom team likely is not, then the group really can’t be that competitive and considering it a “Group of Death” seems a bit dramatic.

What if instead we considered the Group of Death to be one in which the teams are closely rated, which makes the group highly competitive, so that even the best team is far from certain to advance past the group stage?

So instead, let’s look at the standard deviation of the ratings within each group:

Variation

Now we see that Group C (Colombia, Greece, Ivory Coast, Japan) is actually highly competitive and may qualify as the Group of Death.
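The first two measures are easy to compute side by side. Here is a minimal Python sketch. The ratings and group labels below are made-up placeholders, not the actual eloratings.net values, and using the sample standard deviation is an assumption, since the post does not say which version was used.

```python
from statistics import mean, stdev

# Hypothetical ratings for illustration only; NOT the real eloratings.net numbers.
ratings = {
    "Group X": [2050, 1950, 1800, 1650],  # strong on average, widely spread
    "Group Y": [1850, 1820, 1790, 1760],  # weaker on average, tightly bunched
}

def summarize(groups):
    """Return {group: (average rating, sample standard deviation)}."""
    return {name: (mean(r), stdev(r)) for name, r in groups.items()}

summary = summarize(ratings)
hardest = max(summary, key=lambda g: summary[g][0])   # highest average rating
tightest = min(summary, key=lambda g: summary[g][1])  # smallest spread

for name, (avg, sd) in summary.items():
    print(f"{name}: average {avg:.1f}, std dev {sd:.1f}")
print("Highest average:", hardest)
print("Smallest spread:", tightest)
```

With these placeholder numbers the two definitions pick different groups, which is exactly the tension between the "highest average" and "variation" views.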

2nd to 3rd Gap

It could be, however, that the entire group doesn’t need to be highly competitive in order to be a “Group of Death.” The Group of Death designation could simply mean that it is not clear which teams are most likely to advance. Having one really weak team in your group would raise the variation, but if the other three are very competitive with one another then your group could easily get the title.

First, let’s take a look at the rating difference between the second- and third-highest rated teams in each group:

2-3 Gap

By this measure, Groups A, D, and E look to be very competitive for making it to the second spot to advance.

Top 2 Versus Bottom 2

Another measure of competitiveness for advancing would be to look at the average ratings of the top two teams in each group versus the bottom two, as a large difference indicates that it is unlikely there will be surprises within the group:

Top 2 vs Bottom 2

By this measure, Groups C, D, and E appear highly competitive: the bottom teams can easily surprise and move on, making each a Group of Death for the teams rated at the top.

Best Team Left Out

As a final measure, I’ll look at the highest-rated team left out after the group stage. Since only two teams advance, the best team eliminated from a group must be rated at least as high as that group’s third-highest rated team. So the better that team is, the better the group fits the Group of Death mold: a very good team will not move on because of the difficulty of the group.

Here is the rating of the third-best team in each group:

3rd-best Rating

Groups B and D each have really good teams that are only the third-best in the group, meaning at least one really good team will not advance from these groups. In fact, the third-best teams in these groups rank 10th (Chile) and 11th (Italy) in the world! By comparison, the third-best team in Group H is ranked 42nd (South Korea) in the world.
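The last three measures all come from sorting a group's four ratings. A short sketch in Python, using hypothetical ratings rather than the real ones (the function name and dictionary keys are also just illustrative):

```python
from statistics import mean

def group_metrics(ratings):
    """Compute the three 'who advances' measures for one four-team group."""
    r = sorted(ratings, reverse=True)  # best-rated team first
    return {
        "gap_2_3": r[1] - r[2],                           # small = tight race for 2nd place
        "top2_minus_bottom2": mean(r[:2]) - mean(r[2:]),  # small = surprises more likely
        "third_best": r[2],                               # high = a good team goes home
    }

# Hypothetical ratings for one group; not actual eloratings.net values.
example = [1800, 2050, 1650, 1950]
metrics = group_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Running the same function over every group and ranking the results reproduces the kind of comparison shown in the charts above.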

So Which Group Is the Group of Death?

The analysis here definitely illustrates one thing: without any agreed-upon definition of a “Group of Death,” it is very difficult to clearly distinguish which group deserves the designation.

Sometimes there is no single metric that best measures what you’re trying to analyze, and looking at a few different measures helps clarify things. Although not the most extreme on many of these measures, I would look at Groups B and G as the most likely candidates for Group of Death in the 2014 World Cup, as each tended to rate well on most measures.

That is, unless there isn’t really such a thing as a Group of Death…

If you’re a believer that Peyton Manning plays worse in cold weather, the last few weeks have only strengthened your resolve. In 3 of his last 4 games, he’s played in temperatures below 40 degrees, and come out with a record of 1-2. In his other “warm weather” games this season, Manning has a record of 10-1. This continues a theme that has plagued Manning his entire career, that he underperforms when the temperature goes south.

But will the statistics support that theory?

Peyton’s Statistics in Cold Weather Games

Thanks to an article by the Mile High Report, I was able to obtain data on every one of Peyton’s games where the temperature was at or below 40 degrees. They had 21 such games, all from 2012 and earlier. I added three more games from 2013 and threw out the 2010 game against Buffalo (Peyton was pulled after the 1st quarter because the Colts had already clinched home-field advantage throughout the playoffs). So I ended up with 23 games where Peyton played in “cold weather”. You can get the data I used here.

Manning’s record in those 23 games is 11-12. Not very good for a player whose career winning percentage is 0.693. But we’re going to dive deeper than simply wins and losses and look at his individual statistics. The following table compares his cold weather numbers with his career averages. And to let you know, the average temperature in his cold games was 31.9 degrees.

                 Career    Cold Weather    Difference
Completion %     65.4      63.1            2.3
Yards/Attempt    7.7       7.5             0.2
QB Rating        96.9      86.2            10.7

We see that the differences for completion percentage and yards per attempt are negligible. We do see that his QB Rating is over 10 points worse in the cold weather games, but keep in mind our sample size is only 23. We should perform a 1-sample t test to determine if this difference is actually statistically significant.

1-sample t

It’s really close. The p-value isn’t less than 0.05, but it is less than 0.1. So if we used an alpha value of 0.1, we would be able to conclude that Peyton Manning has a worse QB rating when he plays in cold weather.

But hold on, is cold weather the only factor?

We should be wary of a lurking variable. That is, a variable that is not included in the analysis but may affect the interpretation of the results. And that lurking variable appears to be the fact that Peyton has played 19 of the 23 cold weather games on the road. For his career, Peyton’s QB Rating at home is 100.1, while on the road it is 93.7. So clearly part of his poor play in cold weather games is due to the fact that so many of them have come on the road.

Manning’s cold weather numbers are often compared to Tom Brady’s. But that’s not really fair, since Brady gets to play most of his cold weather games at home. If only there were another great quarterback in the NFL that played most of his games in a dome who we could look at and see how he performed in cold weather. Oh wait………Drew Brees!

The Drew Brees comparison

I used pro-football-reference.com to collect data on cold weather games for Drew Brees. I only used years where he played for the Saints (2006 and on) to get the dome to cold weather comparison. He has played in 10 such games, although I did include 3 games where the temperature was 41 and 42 degrees to increase the sample size (I say it’s close enough to the 40 degrees I used for Manning). The average temperature of those 10 games was 34 degrees, similar to Manning’s average temperature. Here is how Drew's cold weather numbers stack up against his career numbers.

                 Career (in New Orleans)    Cold Weather    Difference
Completion %     67.2                       64.3            2.9
Yards/Attempt    7.7                        7.1             0.6
QB Rating        98.9                       88.1            10.9

The differences between Drew Brees’s career numbers and cold weather numbers are all greater than Peyton’s! Of course, 10 games is a very small sample size (though 23 games isn’t that large either). So let’s do our due diligence and see if the difference in QB rating is statistically significant.

1-sample t

Holy cow, the p-value is 0.03! Even with the small sample size, we can conclude that Drew Brees' QB rating in cold weather games is significantly lower than his career average in New Orleans! The reason is that Brees has a lot less variation. Manning’s QB rating in cold games ranges from 31.2 to 157.5, which results in a standard deviation of 33. Meanwhile, Brees only ranges from 67.2 to 120.3, which results in a much smaller standard deviation of 16. So even with a similar difference and a smaller sample size than Manning, we’re able to get a smaller p-value.
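To see how the smaller standard deviation drives the smaller p-value, here is a rough Python sketch that recomputes both tests from the rounded summary numbers quoted in this post (means, standard deviations of 33 and 16, and sample sizes of 23 and 10). Two assumptions to flag: the post does not say whether the test was one-sided or two-sided, so a one-sided "worse than career average" test is assumed here, and because the inputs are rounded the results will only approximate the Minitab output.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def lower_tail_p(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t < 0 else 0.5 + area

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """t statistic and one-sided (lower-tail) p-value from summary statistics."""
    t = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    return t, lower_tail_p(t, n - 1)

# Rounded values taken from the tables and text above.
manning_t, manning_p = one_sample_t(86.2, 33, 23, 96.9)
brees_t, brees_p = one_sample_t(88.1, 16, 10, 98.9)

print(f"Manning: t = {manning_t:.2f}, one-sided p = {manning_p:.3f}")
print(f"Brees:   t = {brees_t:.2f}, one-sided p = {brees_p:.3f}")
```

The gap from the career average is about the same for both quarterbacks, but Brees's standard error (16 divided by the square root of 10) is smaller than Manning's (33 divided by the square root of 23), so his t statistic lands further from zero.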

And to make matters worse for Brees, his record in those 10 games is 3-7. That’s terrible when you compare it to his New Orleans winning percentage of 0.632. With all this information, we can definitely conclude that Drew Brees plays worse in cold weather, right?

Well, just like we did with Manning, you have to consider other factors that might be in play, like where the game was played. All 10 of the games I used for Brees occurred on the road. And just like Manning, Drew Brees has a lower QB rating on the road than he does at home (90.4 on the road compared to 99.9 at home).

The fact is that if you take any great offense that plays in a dome and make them play on the road outside of a dome, they’re not going to be as good, regardless of the temperature. What’s happening with Manning and Brees is that we’re taking their career numbers (a majority of which occurred inside a dome), and comparing it to a small subset of games that occurred on the road outside of a dome. Of course their statistics are going to be lower! Might some of the reason be the cold temperature? Sure. But I believe a majority of the reason is simply the fact that they’re playing on the road.

But this still leaves one more question. If Manning and Brees both appear to play worse in “cold weather games”, why do people constantly talk about Manning and cold weather games but never mention Brees?

Why do we focus on Manning?

Here is how Peyton Manning exited the playoffs in the 2002, 2003, and 2004 NFL seasons.

2002: Lost to the Jets 41-0. Temperature was 34 degrees. QB rating of 31.2.
2003: Lost to the Patriots 24-14. Temperature was 31 degrees. QB rating of 35.5.
2004: Lost to the Patriots 20-3. Temperature was 27 degrees. QB rating of 69.3.

That’s 3 cold weather losses in playoff games where everybody was paying attention. In the age of 24/7 sports coverage where we try and explain everything, this is all it took to get the narrative going.

Never mind that in the playoffs you face better teams. Never mind that all three games were on the road. Never mind that the week before the 2003 loss to the Patriots, Manning had a 3-touchdown, 300-yard playoff win at Kansas City in cold weather. These three games were proof that Manning plays worse in cold weather!

Meanwhile, Drew Brees has had one playoff game in cold weather, a 39-14 loss to Chicago in the NFC Championship game. It was 27 degrees that day, but since it was the only time he’s failed in cold weather in the playoffs, the media pretty much ignored it. And they’re definitely not going to remember the slew of regular season games Brees lost in cold weather. The playoffs are where it’s at!

But if there was ever a chance for Peyton to dispel his cold weather reputation, this is the year. Denver should be able to clinch home-field advantage throughout the playoffs, meaning Manning will get some home playoff games in cold weather as opposed to going on the road (I know he had one last year, and it’s really too bad Peyton made a terrible play on the ball and allowed Baltimore to score on a 70-yard touchdown pass with 31 seconds left). And then the Super Bowl will be played in New Jersey…in February! There’s potential for 3 cold-weather games, none of them having to come on the road. Of course, if he loses just one of them, the narrative will continue. But if Peyton wins them all, the media won’t know what to do. Their cold weather narrative will be dead!

Then again, there is always Drew Brees…

This close to the holidays, it’s hard to stay focused on work.

I should be writing a post about useful estimation tools for quality statistics. But all those yuletide carols about hosts of angels singing from on high have distracted me.

Alas, I’ve fallen into the clutches of one of the world’s oldest estimation problems, posed centuries ago by medieval scholars:

Just how many heavenly angels can dance simultaneously on the point of a pin?

The answer to this question assumes that you believe in the existence of pins, of course.

Estimation in the Middle Ages: Ask Your Doctor

Over the centuries, a variety of estimation methods have been used to estimate the number of angels that can shake it on a pin.

One of the first concrete estimates can be found in a 14th century German work, Schwester Katrei. The book states that, according to doctors, 1,000 angels can comfortably fit on the point of a needle at one time.

How did medieval doctors know this? Presumably, the same way they knew that you could change husks of wheat into live mice by placing the husks in an open jar with a pair of sweaty underwear and waiting for about 21 days.

Spontaneous generation was all the rage back then. Perhaps the estimate itself was spontaneously generated from a pair of underwear.

At any rate, without more details concerning methodology, we have no idea whether this estimate was based on a random sample of angels from the entire population of heaven.

Be skeptical.

A Simple Solution Based on Superstring Theory

From the early 1300s to the late 1900s, the human race was crazy busy. It was a time of rapid progress and revolutionary advances that saw the invention of the wooden bathing suit, the parachute hat, and the detachable moustache guard. As a result, we made little headway into the angel-on-a-pin problem.

Thankfully, the advent of theoretical physics changed that. In 1995, Dr. Phil Schewe of the American Institute of Physics took another shot at the estimate, using basic concepts of superstring physics.

According to superstring physics, space cannot be infinitely divided. Ultimately, even the breakdown of space breaks down. This happens when the distance scale reaches a value of 10⁻³⁵ meters. (You can verify this yourself with a ruler at home.)

Schewe assumed that the size of the pinpoint was equivalent to a single atom, which is 1 angstrom, or 10⁻¹⁰ meters. So to estimate the maximum number of angels that could fit on the pinpoint, he simply divided the size of the pinpoint by the smallest possible breakdown of space:

10⁻¹⁰ / 10⁻³⁵ = 10²⁵

This simple, elegant approach results in an estimate between one septillion (10²⁴) and one octillion (10²⁷) angels. The estimate does not account for bumping wings, however, making it vulnerable to subsequent attack.

Quantum Gravitation: Factoring in Black Holes

Eschewing Schewe, Anders Sandberg of the Royal Institute of Technology tackled the problem in another way.

Instead of applying the seldom-used KISS principle (Keep It Simple, Stupid), Anders opted for the ever-popular MUCK principle (Manufacture Unfathomable Complexities, Knucklehead).

Sandberg notes that Schewe's estimate assumes that there is no overlap of angels on the pinpoint. But this defies basic laws of quantum mechanics, because "when packed at quantum gravity densities, the uncertainty relation will cause their wave function to overlap significantly even if there is a strong degeneracy pressure."

That goes without saying.

Using the upper limit of entropy, the Bekenstein bound, along with the assumption that angels have no mass, Sandberg comes up with an estimate of 2.448 × 10⁵ angels, about a quarter million. (To get a more concrete sense of this, imagine everyone from Fort Wayne, Indiana dancing the cha-cha on the tip of an IBM scanning tunneling microscope.)

If the angels have mass, however, it's a whole 'nother quantum ballgame. In that case, "each angel contributes enough mass-energy to allow the information of an extra angel to move in." Which means, paradoxically, that more angels can then pile onto the pinhead. Until they reach a critical mass that causes the pin to collapse into a black hole, scattering feathers all across the galaxy.

Based on these constraints, the upper bound of angels can be estimated at 8.6766 × 10⁴⁹. That's somewhere between 1 quinquadecillion and 1 sexdecillion angels.

Note: This estimate does not hold true for all types of dances. Also, the angels must dance with speeds close to the velocity of light, which rules out a foxtrot. For more quantum caveats, click here.

A Better Idea

If all these quantum estimates leave you feeling woozy and confused, there's a better option. Open a copy of Minitab Release 6.17 × 10⁷⁸ and choose Stat > Pinpoint > Count Angels. Under Methods, choose Quantum: With Overlap. Click OK.

Then have yourself a wonderful holiday season and a great New Year.

Attributions: Black hole image by Ute Kraus, licensed under Creative Commons.

Anders Sandberg
SANS/NADA, Royal Institute of Technology, Stockholm, Sweden: http://www.improbable.com/airchives/paperair/volume7/v7i3/angels-7-3.htm

Everyone loves Minitab’s Assistant. My favorite bit, as I’ve shown with the Gage R&R Study, is the way that the Assistant puts all the results you need into reports that are easy to understand and present. But it’s also pretty neat that before you ever choose what to do in Minitab, the Assistant is ready to help you. Let’s take a closer look at the Assistant's Graphical Analysis tools.

Help Me Choose

Choose Assistant > Graphical Analysis and the most prominent thing you’ll see is a question:

What is your objective?

But you’re not left with just the three objectives. Select "graph variables over time," and before you have to pick one of the 6 possible charts, the Assistant offers to help you choose.

Clicking "Help me choose" takes you to a full flow chart that reveals important considerations in choosing which chart to make:

Choose a graphical analysis: Graph variables over time

But it gets even better.

Chart characteristics explained

Each decision point on an Assistant flow chart is clickable. So if you click “Do you have subgroups?” you get helpful content with an example to help you decide whether you have subgroups or not. Click “Next” to go to the next decision on the flow chart or click “Cancel” to go back to the entire flow chart.

Are your data in subgroups?

Guidelines for collecting your data and using your chart

Under each chart on the flow chart is the word “more.” Click “more” and you get a set of guidelines for collecting your data and for using your chart. By clicking the arrows at the top, you can get more information on any of the characteristics that would lead you to choose this chart. But if you’re ready to go, you can click the image of the chart so that you can get started telling Minitab about your data!

I-MR Chart Guidelines

The Assistant in Minitab Statistical Software provides a lot of great tools and information. So much that you wouldn’t want to miss out on any of it. So the next time you need a refresher for choosing a measurement systems analysis, capability analysis, graph, hypothesis test, or control chart, remember, Minitab is ready to assist!

Just how high should R² be in regression analysis? I hear this question asked quite frequently.

Previously, I showed how to interpret R-squared (R²). I also showed how it can be a misleading statistic because a low R-squared isn’t necessarily bad and a high R-squared isn’t necessarily good.

Clearly, the answer for “how high should R-squared be” is . . . it depends.

In this post, I’ll help you answer this question more precisely. However, bear with me, because my premise is that if you’re asking this question, you’re probably asking the wrong question. I’ll show you which questions you should actually ask, and how to answer them.

Why It’s the Wrong Question

How high should R-squared be? There’s only one possible answer to this question. R² must equal the percentage of the response variable variation that is explained by a linear model, no more and no less.

When you ask this question, what you really want to know is whether your regression model can meet your objectives. Is the model adequate given your requirements?

I’m going to help you ask and answer the correct questions. The questions depend on whether your major objective for the linear regression model is:

Describing the relationship between the predictors and response variable, or
Predicting the response variable

R-squared and the Relationship between the Predictors and Response Variable

This one is easy. If your main goal is to determine which predictors are statistically significant and how changes in the predictors relate to changes in the response variable, R-squared is almost totally irrelevant.

If you correctly specify a regression model, the R-squared value doesn’t affect how you interpret the relationship between the predictors and response variable one bit.

Suppose you model the relationship between Input and Output. You find that the p-value for Input is significant, its coefficient is 2, and the assumptions pass muster.

These results indicate that a one-unit increase in Input is associated with an average two-unit increase in Output. This interpretation is correct regardless of whether the R-squared value is 25% or 95%!

Asking “how high should R-squared be?” doesn’t make sense in this context because it isn’t relevant. A low R-squared doesn’t negate a significant predictor or change the meaning of its coefficient. R-squared is simply whatever value it is, and it doesn’t need to be any particular value to allow for a valid interpretation.
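A quick simulation makes the point concrete. This is only a sketch with simulated data: the true relationship is set to Output = 5 + 2 × Input, and the intercept, sample size, noise levels, and seed are arbitrary choices made for illustration. Only the amount of noise changes between the two runs.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot

def simulate(noise_sd, n=1000, seed=1):
    """Simulated data where the true slope is 2; only the noise level changes."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [5 + 2 * x + rng.gauss(0, noise_sd) for x in xs]
    return fit_line(xs, ys)

_, slope_noisy, r2_noisy = simulate(noise_sd=10.0)  # lots of unexplained variation
_, slope_clean, r2_clean = simulate(noise_sd=1.0)   # very little unexplained variation

print(f"Noisy data: slope = {slope_noisy:.2f}, R-squared = {r2_noisy:.0%}")
print(f"Clean data: slope = {slope_clean:.2f}, R-squared = {r2_clean:.0%}")
```

Both fits should recover a slope close to 2 even though the R-squared values are far apart, so the coefficient means the same thing in each case.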

In order to trust your interpretation, which questions should you ask instead?

R-squared and Predicting the Response Variable

If your main goal is to produce precise predictions, R-squared becomes a concern. Predictions aren’t as simple as a single predicted value because they include a margin of error; more precise predictions have less error.

R-squared enters the picture because a lower R-squared indicates that the model has more error. Thus, a low R-squared can warn of imprecise predictions. However, you can’t use R-squared to determine whether the predictions are precise enough for your needs.

That’s why “How high should R-squared be?” is still not the correct question.

Which questions should you ask? In addition to the questions above, you should ask:

Are the prediction intervals precise enough for my requirements?

Don’t worry, Minitab Statistical Software makes this easy to assess.

Prediction intervals and precision

A prediction interval represents the range where a single new observation is likely to fall given specified settings of the predictors. These intervals account for the margin of error around the mean prediction. Narrower prediction intervals indicate more precise predictions.

For example, in my post where I use BMI to predict body fat percentage, I find that a BMI of 18 produces a prediction interval of 16-30% body fat. We can be 95% confident that this range includes the value of the new observation.

You can use subject-area knowledge, spec limits, client requirements, etc. to determine whether the prediction intervals are precise enough to suit your needs. This approach directly assesses the model’s precision, which is far better than choosing an arbitrary R-squared value as a cut-off point.
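If you want to see what goes into a prediction interval, here is a hedged Python sketch for a simple one-predictor model. It uses the textbook formula, fit ± quantile × s × sqrt(1 + 1/n + (x0 − x̄)² / Sxx), with one simplification: a normal quantile stands in for the t quantile, which is only reasonable for fairly large samples. The data and the width requirement are invented for illustration and have nothing to do with the body fat example.

```python
import math
from statistics import NormalDist

def prediction_interval(xs, ys, x0, confidence=0.95):
    """Approximate prediction interval for one new observation at x0.

    Fits y = b0 + b1*x by least squares. A normal quantile replaces the
    t quantile, so the interval is slightly too narrow for small samples.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    fit = b0 + b1 * x0
    return fit - half_width, fit, fit + half_width

# Hypothetical data: a straight line plus a repeating wiggle standing in for noise.
wiggle = [1.0, -1.0, 0.5, -0.5]
xs = list(range(1, 41))
ys = [3 + 0.5 * x + wiggle[i % 4] for i, x in enumerate(xs)]

low, fit, high = prediction_interval(xs, ys, x0=20)
print(f"Predicted {fit:.2f}, approximate 95% PI {low:.2f} to {high:.2f}")

requirement = 4.0  # hypothetical: the widest interval you could tolerate
print("Precise enough?", (high - low) <= requirement)
```

The final comparison is the question that matters: whether the interval is narrow enough for your requirements, not whether R-squared clears some threshold.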

For the body fat model, I’m guessing that the range is too wide to provide clinically meaningful information, but a doctor would know for sure.

Read about how to obtain and use prediction intervals.

R-squared Is Overrated!

When you ask, “How high should R-squared be?” it’s probably because you want to know whether your regression model can meet your requirements. I hope you see that there are better ways to answer this than through R-squared!

R-squared gets a lot of attention. I think that’s because it appears to be a simple and intuitive statistic. I’d argue that it’s neither; however, that’s not to say that R-squared isn’t useful at all. For instance, if you perform a study and notice that similar studies generally obtain a notably higher or lower R-squared, it would behoove you to investigate why yours is different.

If you're just learning about regression, read my regression tutorial!

“Turnovers are like ex-wives. The more you have, the more they cost you.”– Dave Widell, former Dallas Cowboys lineman

It doesn’t take witty insight from a former NFL player to realize how big an impact turnovers can have in a football game. Every time an announcer talks about “Keys to the Game,” winning the turnover battle is one of them. And as Cowboys fans know all too well, an ill-timed interception can ruin not only your chances of winning that game, but it can ruin your entire season, too.

But hold on a minute. A few weeks ago, Andrew Luck and the Colts proved that you could still win a game despite having 3 more turnovers than the opposing team. In fact, teams that lost the turnover battle are 4-4 so far in the NFL playoffs this year. So is it possible that we overvalue the importance of turnovers, or are the 8 playoff games I’m looking at just a small sample?

How Much of an Impact Do Turnovers Have on a Team’s Season?

The butt fumble, arguably the greatest turnover of all time.

Obviously in any one game, a team can lose the turnover battle and still win the game. But is that sustainable over the course of a 16-game NFL season?

For all 32 NFL teams, I recorded their turnover differential, their regular season winning percentage, and their regular season scoring differential. I want to see if having a positive turnover differential (defined as your defense creating more turnovers than your offense commits) led to a higher winning percentage and scoring differential. You can get the data here.

I used Minitab to create fitted line plots of the data (Stat > Regression > Fitted Line Plot...). Let’s start by comparing a team’s turnover differential to their winning percentage.

Turnovers vs. Winning Percentage

There is a positive correlation between turnover differential and winning percentage. We can conclude that 44% of the variation in a team’s winning percentage can be explained by their turnover differential. This shows that turnovers are not overvalued when it comes to winning percentage. Andrew Luck may have won a single game with a turnover differential of -3, but this plot shows (as did his next playoff game against New England) that you can’t continue to turn the ball over and win football games.

But we shouldn’t rest on our laurels with winning percentage. Scoring differential is a better indicator of how good a team really is, so let’s see if the same trend holds when we use that as the response instead of winning percentage.

Turnovers vs. Scoring

This graph looks almost exactly the same. Whether we look at winning percentage or scoring differential, turnovers play a major part in both. One team I’d like to point out is the Denver Broncos. Despite having a turnover differential of 0 for the entire season, they still outscored their opponents by over 200 points. Considering Peyton Manning threw only 10 interceptions (only 4 teams had fewer all season), it’s scary to think of what that team could have done if their defense had actually created some more turnovers!

Are Turnovers Skill or Luck?

We just saw that teams with a better turnover differential have a better record and scoring differential. But are those teams with a high differential skilled at generating/avoiding turnovers, or are they just lucky? To answer this question, let’s start with a defense’s fumble recoveries. I took the number of fumbles recovered by a defense through the 1st half of the NFL season, and compared it to the number of fumbles they recovered in the 2nd half of the season. If causing and recovering fumbles is truly a skill, then defenses with high recoveries in the 1st half of the season would continue to have them the rest of the season.

Fumbles

This clearly shows that there is no skill in causing and recovering fumbles in the NFL. The number of fumbles a defense recovers in the 1st half of the season explains only 1.6% of the variation in the number of fumbles they recover in the 2nd half of the season. It’s completely random. But even though teams have no control over their fumble recoveries, we saw previously that turnovers play a huge part in their record.

Take for instance, the Pittsburgh Steelers (top left corner).

The Steelers started the season going 2-6, with their defense recovering only 1 fumble (worst in the NFL). But in the 2nd half of their season, their luck changed, and they recovered 9 fumbles (best in the NFL). Not surprisingly, their record dramatically improved in those final 8 games to 6-2. Now, the fumble recoveries don’t fully explain their turnaround, but they definitely played a big role. And it’s a much more tangible explanation than banning games in the locker room.

Ok, so it’s really no surprise that fumble recoveries are luck, but what about defensive interceptions? Surely a team with a really good secondary should have a higher interception rate than those with a less talented secondary. Let’s see what the statistics say!

Interceptions

Just like fumbles, defensive interceptions are completely random! Look at Tampa Bay at the top of the plot. After the 1st half of the season they had only 6 interceptions and were 0-8. Would anybody have predicted that in the 2nd half of the season they would more than double their interceptions and have as many as the Seattle Seahawks (considered one of the best defenses in the NFL)? Of course not! Oh, and after going 0-8 to start the season, Tampa Bay finished 4-4. Their increase in interceptions was a major factor, as 11 of their 15 interceptions in the 2nd half of the season came in their 4 wins. It’s easy to look back in hindsight and say those interceptions explain why they won. But this plot shows that defensive interceptions are so random that it is impossible to predict in which games they will occur.

Another thing I’d like to point out is the team in the lower left-hand corner of all 3 plots we’ve looked at. It’s the Houston Texans. They weren’t able to force many fumbles or interceptions in either half of the season, and not surprisingly had the worst record in the NFL! No matter how bad your defense is (not to mention one with J. J. Watt on it), it’s extremely unlucky to get so few turnovers. Heck, even the Jacksonville Jaguars defense had 8 interceptions in the 2nd half of this season! Next year the Texans will have a new head coach in Bill O’Brien, and will likely draft a new quarterback with their #1 overall draft pick. If they improve on their 2-14 record (and they will), much of the credit will be given to O’Brien and the new quarterback. But considering how unlucky Houston was with defensive turnovers, their record is bound to improve on luck alone!

Overall, these plots lead us to believe that great defenses hold opponents to fewer yards and fewer points. But causing turnovers? That’s just lucky icing on the cake!

Offensive Turnovers

I’m not going to bore you with offensive fumbles (spoiler alert: they’re random too!). Instead I’m going to look at something that should have some correlation from the 1st half of the season to the 2nd...offensive interceptions. Surely if Peyton Manning or Tom Brady throws a low number of INTs in the first half of the season, he will continue to do so in the second half. Likewise, teams with bad quarterbacks should keep throwing INTs all season long! Let’s see if the statistics agree.

Quarterback interceptions

There is a weak positive correlation between interceptions in the 1st and 2nd half of the season. So of all the turnovers that we looked at, this is the only one that isn’t completely random. The small R-squared value shows that interceptions can still be erratic, but there is at least some skill in not throwing interceptions.

Want to see why the New York Giants were so bad this season? Just look at the top right corner. But unlike the Steelers, Buccaneers, and Texans, who just got unlucky, the Giants have somebody they can pin the blame on: Eli Manning. While his older brother had arguably his best season ever, Eli had by far his worst.

Fun Football Narratives!

Now let’s look across the plots to see how luck told the fortunes of two different teams, the Detroit Lions and Cincinnati Bengals.

In the first half of the season, Detroit Lions quarterback Matthew Stafford threw only 6 interceptions, which is just as many as Tom Brady and Peyton Manning. The Lions were 5-3 and in great position to make the playoffs. But Stafford self-destructed in the 2nd half of the season, throwing 13 picks. In addition, the Lions defense had only 5 interceptions (6th worst in the league) and only 4 fumble recoveries (bottom half of the league) in the 2nd half of the year. When you combine the bad play of Stafford with the unlucky Lions defense, it’s no surprise that the Lions went 2-6 in their last 8 games and missed the playoffs.

Cincinnati Bengals quarterback Andy Dalton had almost the exact same season as Stafford, throwing 7 picks in the 1st half of the season and leading the Bengals to a 6-2 record. And just like Stafford, Dalton threw a disastrous 13 interceptions in the 2nd half of the season. But his poor play was masked by the Bengals defense in the 2nd half, which had 13 interceptions (3rd best in the league) and 6 fumble recoveries (8th best in the league). The Bengals went 5-3 in their final 8 games, won the AFC North, and earned the 3 seed in the playoffs.

So the Bengals and Lions both had poor quarterback play in the 2nd half of the season. But one team was able to get defensive turnovers while the other wasn’t. And of course in the playoffs, the Bengals’ luck ran out. Dalton continued to play poorly, throwing two interceptions and losing a fumble, while the Bengals defense didn’t force a single turnover. And just like that, Cincinnati lost a home playoff game by 17 points.

Maybe It Really Is Better to Be Lucky than Good

Although they don’t explain everything, turnovers can go a long way in determining whether a team wins or loses. But for the most part, the frequency at which they occur is very random. And keep in mind that we looked at the entire NFL season. Imagine if you cut that sample down to one game, such as the AFC or NFC Championship! This Sunday four relatively even teams will play for a chance to go to the Super Bowl. Don’t be surprised if the winning teams are victorious because they got a lucky bounce on a turnover. But good luck trying to figure out which team will get that bounce!

This data analysis showed us that turnovers are great at explaining why a team won or lost. But they are so random, it’s almost impossible to use them to predict who will win a game!

I had the opportunity to speak with a great group of students from the New Jersey Governor’s School of Engineering and Technology, a summer program for high-achieving high school students. Students in the program complete a set of challenging courses while working in small groups on real-world research and design projects that relate to the field of engineering. Governor’s School students are mentored by professional engineers as well as Rutgers University honors students and professors, and they often work with companies and organizations to solve real engineering problems.

The team of students I talked to partnered with Silver Line by Andersen, a leading U.S. manufacturer of vinyl windows and a subsidiary of Andersen Corporation, to assess the manufacturer’s processes and identify possible improvements. The students took on the role of Silver Line industrial engineers and looked for opportunities to boost productivity and decrease costs.

The Challenge: Investigate the Process

Silver Line offers various types of glass, in addition to custom window and door shapes, and offers more than 460 different configurations of windows and doors. The company’s complex manufacturing process involves numerous steps, many inputs and suppliers, and several machines and operators. With guidance from Silver Line engineers and mentors from Rutgers, the students were asked to investigate the window-making process at Silver Line’s New Jersey facility and propose process improvements that would result in the highest return on investment for the company.

The team structured their investigation on the Six Sigma methodology, and used the DMAIC approach to frame their project into five phases: define, measure, analyze, improve, and control. They began by going on-site to the facility to better understand each step of the manufacturing process. “In working side-by-side with Silver Line, we quickly realized the amount of data that needed to be collected was extraordinary,” says team member Nikhil Shukla. “We found that even the smallest of details could have an enormous effect on our end results.”

The team used the process data they collected to create visual diagrams, including SIPOC, Value Stream Maps, and process maps, in Quality Companion.

The students used the SIPOC template in Companion to define the scope of the process and its principal suppliers, inputs, outputs, and customers. The diagram helped the team understand exactly what was needed to begin a certain process, while also clarifying the transition between starting and finishing materials.

SIPOC in Quality Companion

The SIPOC above summarizes Silver Line’s window-making process from glass-cutting to window assembly.

In addition, the students built a Value Stream Map in Quality Companion that showed the flow of materials and information from the beginning of the process to its end. This not only helped them to map out the completion of an order from a customer’s initial request to product shipment, but also indicated which steps of the process added value, and which did not.

Value Stream Map in Quality Companion

The Value Stream Map above demonstrates one window’s progression through the assembly line at the factory. The facility operates multiple assembly lines simultaneously, handling various thicknesses and types of glass.

The data the students collected on the window-making process was also organized with process maps. Using Quality Companion to build their process maps made it easy to individually depict the various stages of the overall process, including glass cutting, edge deletion, spacer attachment, argon application, butyl application, and final assembly, while also recording the variables and metrics associated with each process step.

“The process maps were a critical part of everyone on the team developing a thorough understanding of the processes we observed from start to finish,” says team member David Liedtka.

“We were constantly presenting the progression of our project to Silver Line employees of various ranks with different levels of understanding of our project, and Companion organized our data in a way that was easy to explain even to employees with no previous quality improvement background,” adds Jenna Ritchie, another member of the student project team.

The team’s initial investigation found that a prominent challenge faced by the company is glass breakage at various points of the window-making process. To brainstorm the potential causes of glass breakage, they constructed a fishbone diagram in Companion.

Fishbone in Quality Companion

Using the Man Machines Materials fishbone template in Quality Companion, the team could easily define possible causes of glass breakage and structure those causes into categories.

Now, with a comprehensive list of possible causes for glass breakage, the students investigated the causes further and created a list of solutions. To determine which of the solutions would be best for Silver Line to focus on, the students wanted to take into account how each solution compared in terms of long-term benefits, cost of implementation, and the severity of the problem it solved. With the Quality Companion Pugh Matrix template, the team was able to develop weight criteria they could use to evaluate each solution. The weight criteria were multiplied by each solution’s value, and then these totals were added together for each solution. The higher the final sum for each solution, the greater the opportunity to make a worthwhile change.
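The weighted scoring described above can be sketched in a few lines of code. This is only an illustration of the arithmetic, not the Quality Companion template itself: the function names, the dictionary layout, and any weights or ratings you pass in are assumptions, since the students’ actual values are not given here.

```python
def weighted_scores(weights, ratings):
    """Total weighted score for each candidate solution.

    weights: {criterion: weight}, e.g. long-term benefit, cost of
             implementation, severity of the problem solved.
    ratings: {solution: {criterion: rating}} using the same criteria.
    Both structures are hypothetical stand-ins for the matrix cells.
    """
    return {
        solution: sum(weights[criterion] * rating
                      for criterion, rating in by_criterion.items())
        for solution, by_criterion in ratings.items()
    }


def rank_solutions(weights, ratings):
    """Solution names ordered from highest total score to lowest."""
    scores = weighted_scores(weights, ratings)
    return sorted(scores, key=scores.get, reverse=True)
```

The first name returned by `rank_solutions` is the solution with the highest sum, which is the one the matrix flags as the greatest opportunity.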

By using the Pugh Matrix, the team was able to identify the best recommendations for improving Silver Line’s window-making processes—and decrease the amount of glass breakage that occurred on the line.

Presenting Cost-Effective Solutions

By visiting on-site at Silver Line and analyzing the window-making process, the team of New Jersey Governor’s School students identified four specific areas of improvement to focus on. In addition to the long-term savings these solutions would have, the recommended changes were assessed as being cost-effective and relatively easy to implement.

Silver Line was impressed with the work of the students and is taking their recommendations into consideration. “We especially appreciated that the solutions the students presented addressed many different aspects of our overall process, which will allow us to attack process improvements from a variety of angles,” says Scott Steurer, quality assurance manager at Silver Line.

As for the students, the experience of working with Silver Line on behalf of the New Jersey Governor’s School was invaluable. “Doing our project helped me to see the attention to detail required in industry, and how even the smallest inefficiencies are magnified in large-scale manufacturing,” says David Liedtka.

For other students on the team, the experience solidified a future career path. “Before Governor’s School, I had a very limited understanding of industrial engineering,” adds Bertha Wang, another team member. “This project has shown me the depth and detail necessary for industrial engineering, as well as the unlimited application of statistics.”

The students’ mentor, Brandon Theiss, an industrial engineer who has worked in the field for over 10 years, agrees the experience of tackling real-world problems with real-world software is pivotal. “The New Jersey Governor’s School provides students with real-world experience that just can’t be taught in classrooms,” Theiss says. “It’s a tremendous eye-opener for students to apply what they’ve learned to help companies get things done better, faster, and in the most cost-effective way possible.”


College basketball stat guru Ken Pomeroy uses advanced metrics to rank every NCAA Division I basketball team. Amongst the numerous statistics he tracks is one called "Luck."

This statistic is calculated as the difference between a team’s actual winning percentage, and what one would expect their winning percentage to be based on how many points they score and how many they allow.

What it really boils down to is close games. In theory, you should win about half of your close games and lose half. If you win most of your close games, you'll have a high luck statistic in the Pomeroy Ratings. Lose most of your close games, and your luck statistic will be low.
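To make the idea concrete, here is a rough sketch of a luck-style calculation. It is not Pomeroy’s actual formula. As a stand-in for the expected winning percentage it uses the well-known Pythagorean expectation, and the default exponent below is purely an assumed placeholder that you would need to tune for college basketball.

```python
def expected_win_pct(points_for, points_against, exponent=10.0):
    """Pythagorean-style expected winning percentage.

    The exponent is an assumption for illustration only.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def luck(wins, losses, points_for, points_against, exponent=10.0):
    """Actual winning percentage minus the expected winning percentage."""
    actual = wins / (wins + losses)
    return actual - expected_win_pct(points_for, points_against, exponent)
```

A team that scores exactly as many points as it allows has an expected winning percentage of .500 under this sketch, so a 6-4 record would give it a luck value of about +0.1.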

For example, take the Penn State basketball team. Out of 351 teams, they are ranked 333rd in Pomeroy’s luck statistic. And not surprisingly, they are 1-5 in 1-possession or overtime games. The most recent gut punch came last weekend, when Purdue hit a game-tying 3-pointer with seven seconds left, but only because the refs didn’t see Purdue's coach trying to call a timeout. Penn State then turned the ball over and fouled Purdue with 1 second left. A made Purdue free throw sent Penn State to their 5th loss in a close game.

But there are two ways to look at this. On the one hand, Penn State got unlucky that the refs missed the timeout and that Purdue hit the 3 (as a team, Purdue makes only 34% of their 3-point attempts). On the other hand, Penn State made its own bed by turning the ball over and then fouling. That wasn’t random luck; it was something they controlled. Perhaps the pressure is too much for them to handle, and that is why they are losing most of their close games.

But my curiosity isn’t just about Penn State. I want to look at the larger picture. Are close games really just completely random? Or do teams that continuously win close games have a “skill” for doing so, while teams that continuously lose close games do so because they can’t handle the pressure? With the help of Pomeroy’s luck statistic and Minitab Statistical Software, I plan on finding out!

The Luckiest and Unluckiest Teams in the Country

Through games of Sunday, January 19th, I collected the 20 luckiest and unluckiest teams in the country according to Pomeroy’s luck statistic. I also collected 20 teams right in the middle, with their luck being just about 0. I named the 3 groups “Lucky,” “Unlucky,” and “Neutral.”

For each team I noted their record in games decided by 6 points (2 possessions) or less, or games that went into overtime (regardless of the final score, since at the end of regulation it was obviously a close game). I also split out games decided by 3 points (1 possession) or less. Again, all overtime games were included regardless of the final score.

Each Lucky/Unlucky/Neutral group had over 100 games in their “2 possession” sample and at least 60 games in their “1 possession” sample. You can get all the data here (if you do look at the data, a final score margin of 99 means the game went to overtime).
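If you want to tally the close-game records yourself, the rules above boil down to a small filter. The sketch below assumes each game is stored as a `(won, margin)` pair, which is my own layout and not necessarily how the data file is organized. The only detail taken from the data is that a margin of 99 marks an overtime game.

```python
OVERTIME_CODE = 99  # in the data file, a margin of 99 means overtime


def is_close(margin, max_margin):
    """True for overtime games or games decided by max_margin or less."""
    return margin == OVERTIME_CODE or abs(margin) <= max_margin


def close_game_record(games, max_margin):
    """Return (wins, losses) in close games.

    games: list of (won, margin) pairs, where won is True or False.
    max_margin: 6 for the 2-possession sample, 3 for 1-possession.
    """
    wins = sum(1 for won, margin in games
               if won and is_close(margin, max_margin))
    losses = sum(1 for won, margin in games
                 if not won and is_close(margin, max_margin))
    return wins, losses
```

Calling `close_game_record(games, 6)` gives the 2-possession record, and `close_game_record(games, 3)` gives the 1-possession record. Overtime games count in both.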

Okay, now that the boring part is out of the way, let’s see what our data looks like!

Describe

It appears that the luck statistic does exactly what it says it should do. Lucky teams have a great record in close games, while unlucky teams have a terrible record in close games. And sure enough, teams in the middle have about a .500 record in close games.

We can use a dotplot to visualize the records of every team in the sample.

Dotplot

At the individual team level, the same trend holds up. Unlucky teams are terrible in close games. We see that in 2-possession games, not a single unlucky team has a winning percentage higher than 33%. And even worse, 12 of the unlucky teams haven’t won a single game decided by 1 possession or that went into OT! Combined, those 12 teams have a record of 0-30 in 1-possession or overtime games!

And one more quick note about the unlucky teams: I said I collected 20 teams in each group (and I did), but if you count the green dots you'll notice there are only 19 unlucky teams. That's because despite being the 6th unluckiest team, Dartmouth didn't play a single game decided by 6 points or fewer. Ergo, they don't have a winning percentage in close games yet this season. They are 0-4 in games decided between 7-10 points, so I guess that's unlucky? I dunno. Sometimes statistics are weird.

Anyway, on the flip side, lucky teams were very good in close games. Other than Utah Valley (who was ranked #17 in luck despite going 3-4 in 2-possession games), all the lucky teams won at least 60% of their 2-possession/OT games. And when you look at 1-possession or OT games, 9 teams were undefeated! Combined, those 9 teams have a record of 25-0 in 1-possession or overtime games!

Lastly, as expected, our neutral teams are mainly grouped in the middle.

And because I’m sure you’re dying to know, according to Pomeroy the luckiest team in the country is...Nicholls St! What? Who are they? I guess that was kind of anticlimactic. How about the unluckiest team in the country? That would be Temple! So don’t worry Penn State fans, not only are you not the unluckiest team in the country, you’re not even the unluckiest team in the state of Pennsylvania.

Wait, Weren’t You Going to Determine if Close Games were Luck or Skill?

Oh right, about that... We’ve taken 60 different teams and shown that in the 1st half of the NCAA basketball season (okay, we’re a tad more than halfway through, but close enough) a third of them are really good at winning close games, a third are really bad at it, and a third are pretty average. Now we’re going to see how those same 60 teams do in the 2nd half of the season. If there is skill involved in winning or losing close games, the unlucky group will continue to lose while the lucky group will continue to win. But if it is truly random, we would expect all three groups to look more like our “Neutral” group did above. That is, no matter how good or bad your close-game record is at this point, you’re still going to win about half of your close games going forward.

So if you were expecting an answer anytime soon, I apologize (hey, it does say “Part I” in the title!). But make sure to check back in March when we can see how all of our teams did. Until then, let's hope those unlucky teams can start turning things around so they avoid having tweets about them like this.

Correlation Is not Causation: Why Running the Football Doesn’t Cause You to Win Games in the NFL

Tooltips, Assistant Menu, and Help: The 5 Coolest Things You Didn't Know You Could Copy From Minitab

Statistically, How Thankful Should We Be: A Look at Global Income Distributions, part 1

Statistically, How Thankful Should We Be: A Look at Global Income Distributions, part 2

The Value Stream Map: It's Been Around Longer than You Think

Avoiding a Lean Six Sigma Project Failure, part 4

Doggy DOE Part III: Analyze This!

See How Easily You Can Do a Box-Cox Transformation in Regression

Normality Tests and Rounding

Will Playoffs Make College Football's Regular Season Less Exciting?

Fix Problems in Regression Analysis with Partial Least Squares

Regression Analysis Tutorial and Examples

Is There a World Cup "Group of Death"?

Does Peyton Manning Play Worse in Cold Weather?

Quantum Estimates: Where Angels Fear to Tread

Use the Minitab Assistant to Choose a Graph

How High Should R-squared Be in Regression Analysis?

A Statistical Look at How Turnovers Impacted the NFL Season

Creating a Shatterproof Process: Students Use Six Sigma to Improve Window Manufacturing

Analyzing “Luck” in College Basketball: Part 1