
What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?


Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves.

So, if it’s not the ability to model a curve, what is the difference between a linear and nonlinear regression equation?

Linear Regression Equations

Linear regression requires a linear model. No surprise, right? But what does that really mean?

A model is linear when each term is either a constant or the product of a parameter and a predictor variable. A linear equation is constructed by adding the results for each term. This constrains the equation to just one basic form:

Response = constant + parameter * predictor + ... + parameter * predictor

Y = b0 + b1X1 + b2X2 + ... + bkXk

In statistics, a regression equation (or function) is linear when it is linear in the parameters. While the equation must be linear in the parameters, you can transform the predictor variables in ways that produce curvature. For instance, you can include a squared variable to produce a U-shaped curve.

Y = b0 + b1X1 + b2X1^2

This model is still linear in the parameters even though the predictor variable is squared. You can also use log and inverse functional forms that are linear in the parameters to produce different types of curves.
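To see this outside of Minitab, here is a minimal sketch (in Python, with simulated data, so the numbers are purely illustrative) of fitting a model with a squared term by ordinary least squares. The fitted curve is a parabola, but the model is still linear in the parameters b0, b1, and b2.

```python
import numpy as np

# Simulated data: y = 10 + 2x - 0.5x^2 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 8, 40)
y = 10 + 2 * x - 0.5 * x**2 + rng.normal(0, 1, size=x.size)

# Design matrix with a constant, x, and x^2: the squared column makes the fitted
# curve bend, but the model stays linear in the parameters, so OLS applies.
X = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)  # estimates of b0, b1, b2
```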

Here is an example of a linear regression model that uses a squared term to fit the curved relationship between BMI and body fat percentage.

Linear model with squared term

Nonlinear Regression Equations

While a linear equation has one basic form, nonlinear equations can take many different forms. The easiest way to determine whether an equation is nonlinear is to focus on the term “nonlinear” itself. Literally, it’s not linear. If the equation doesn’t meet the criteria above for a linear equation, it’s nonlinear.

That covers many different forms, which is why nonlinear regression provides the most flexible curve-fitting functionality. Here are several examples from Minitab’s nonlinear function catalog. Thetas represent the parameters and X represents the predictor in the nonlinear functions. Unlike linear regression, these functions can have more than one parameter per predictor variable.

Nonlinear function (one possible shape is shown for each in the catalog):

  • Power (convex): Theta1 * X^Theta2
  • Weibull growth: Theta1 + (Theta2 - Theta1) * exp(-Theta3 * X^Theta4)
  • Fourier: Theta1 * cos(X + Theta4) + Theta2 * cos(2*X + Theta4) + Theta3

Here is an example of a nonlinear regression model of the relationship between density and electron mobility.

Nonlinear regression model for electron mobility

The nonlinear equation is so long that it doesn't fit on the graph:

Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)
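As a rough sketch of what fitting an equation like that involves (here in Python with scipy.optimize.curve_fit and simulated data, not the actual electron mobility data), note that, unlike linear regression, you must supply starting values for the parameters, just as Minitab's nonlinear regression does:

```python
import numpy as np
from scipy.optimize import curve_fit

def mobility(x, b1, b2, b3, b4, b5, b6, b7):
    # Rational (cubic over cubic) function of x = ln(density), the same form as above
    return (b1 + b2*x + b3*x**2 + b4*x**3) / (1 + b5*x + b6*x**2 + b7*x**3)

# Simulated data roughly following the published coefficients (illustrative only)
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 60)
true = (1288, 1491, 583, 75, 0.97, 0.40, 0.05)
y = mobility(x, *true) + rng.normal(0, 5, size=x.size)

# Nonlinear least squares needs starting values; poor ones can prevent convergence
start = (1000, 1000, 500, 50, 1, 0.5, 0.1)
params, _ = curve_fit(mobility, x, y, p0=start, maxfev=20000)
print(params)
```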

Linear and nonlinear regression are actually named after the functional form of the models that each analysis accepts. I hope the distinction between linear and nonlinear equations is clearer and that you understand how it’s possible for linear regression to model curves! It also explains why you’ll see R-squared displayed for some curvilinear models even though it’s impossible to calculate R-squared for nonlinear regression.

If you're learning about regression, read my regression tutorial!


Can I Just Delete Some Values to Reduce the Standard Variation in My ANOVA?


We received the following question via social media recently:

I am using Minitab 17 for ANOVA. I calculated the mean and standard deviation for these 15 values, but the standard deviation is very high. If I delete some values, I can reduce the standard deviation. Is there an option in Minitab that will automatically indicate values that are out of range and delete them so that the standard deviation is low?

In other words, this person wanted a way to automatically eliminate certain values to lower the standard deviation.

Fortunately, Minitab 17 does not have the functionality that this person was looking for.

Why is that fortunate?  Because cherry-picking data isn’t a statistically sound practice. In fact, if you do it specifically to reduce variability, removing data points can amount to fraud.

When Is It OK to Remove Data Points?

So that raises a question: is it ever acceptable to remove data? The answer is yes. If you know, for a fact, that some values in your data were inappropriately attained, then it is okay to remove these bad data points. For example, if data entry errors resulted in a few data points from Sample A being entered under Sample B, it would make sense to remove those data points from the analysis of Sample B.

But you may encounter other suggestions for removing data. Some people will use a "trimmed" data set, which means removing the highest and lowest values (for example, the top and bottom one or two observations). Depending on what the data are and how you plan to use them, this too can be fraud.

Some people will use the term "data cleansing," which involves removing a few data points from a large data set. The effect on the analysis is usually minimal. But when removing those points changes the conclusions of the analysis, it again can amount to fraud.

The bottom line? If you don't know for certain that the data points are bad, removing them—especially to change the outcome of an analysis—is virtually impossible to defend.

Finding and Handling Outliers in Your Data

Minitab 17 won't automatically delete values to make your standard deviation small. However, our statistical software does make it easy to identify potential outliers that may be skewing your data, so that you can investigate them. You can access the outlier detection tests at Stat > Basic Statistics > Outlier Test…

You can also look at specific statistical measures that indicate the presence of outliers in regression and ANOVA.

Of course, before removing any data points you need to make sure that the values are really outliers. First, think about whether those values were collected under the same conditions as the other values. Was there a substitute lab technician working on the day that the potential outliers were collected? If so, did this technician do something differently than the other technicians? Or could the digits in a value be reversed? That is, was 48 recorded as 84?

If you have just one factor in an ANOVA, try using Assistant > Hypothesis Tests > One-Way ANOVA… Outliers will be flagged in the output automatically:

You could then run the analysis again after manually removing outliers as appropriate.

You also can use a boxplot chart to identify outliers:

Finding Outliers in a Boxplot

As you can see above, Minitab's boxplot uses an asterisk (*) symbol to identify outliers, defined as observations that are at least 1.5 times the interquartile range from the edge of the box. You can easily identify the unwanted data point by clicking on the outlier symbols so you can investigate further. After editing the worksheet you can update the boxplot, perhaps finding more outliers to remove.
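If you ever need to reproduce the boxplot rule outside of Minitab, here is a minimal sketch in Python (note that Minitab's quartile method can differ slightly from NumPy's default interpolation, so borderline points may be flagged differently):

```python
import numpy as np

def iqr_outliers(values):
    """Flag points more than 1.5 * IQR beyond the quartiles (the boxplot rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]

# Example: one value sits far above the rest and gets flagged
print(iqr_outliers([9.8, 10.1, 10.0, 9.9, 10.2, 10.1, 14.7]))  # [14.7]
```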

Are Your Outliers "Keepers"?

While Minitab won't offer an automated "make my data look acceptable" tool, the software does make it easy to find specific data points that may take the results of your analysis in an inaccurate or unwanted direction.

However, before removing any "bad" data points you should understand their causes and be sure you can avoid recurrence of those causes in the actual process. If the "bad" data could contribute to a more accurate understanding of the actual process, removing them from the calculation will produce wrong results. 

A Little Trash Talk: Improving Recycling Processes at Rose-Hulman


The Six Sigma students at Rose-Hulman Institute of Technology are at it again! A few months back, we blogged about the Six Sigma project they did to reduce food waste at the on-campus dining center.

This time, the students—led by Dr. Diane Evans, Six Sigma black belt and associate professor of mathematics at Rose-Hulman—are performing a Lean Six Sigma project to reduce the amount of recycling thrown in the normal trash cans in all of the institution's academic buildings, including the library. Unfortunately, the recycling bins Rose-Hulman has placed all over its campus are underused, and it’s very common for recyclable items to still be thrown in the regular trash.

The students are currently in the midst of completing the DMAIC improvement project (they’ve completed the Define, Measure, and Analyze phases, and are currently working on Improve), but I thought it would be interesting to do a quick recap of where they are with the project right now.

A Messy Adventure: Gathering Baseline Data

First things first: the students had to gather baseline data for the amount and weight of recycling that was thrown into the normal trash cans. You can even see them in action in this YouTube video: https://www.youtube.com/watch?v=cxTROuJOdPU!

By sifting through trash from the academic buildings over the course of two weeks, the students were able to say that they are 95% certain that the interval [0.315, 0.421] contains the true mean percentage of recyclable waste (by weight) discarded in the trash.
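For readers curious how an interval like that is computed, here is a minimal sketch of a t-based confidence interval for a mean in Python. The numbers below are made up for illustration; the students' raw daily measurements are not shown in the post.

```python
import numpy as np
from scipy import stats

# Hypothetical daily fractions of recyclable material (by weight) found in the trash
fractions = np.array([0.33, 0.41, 0.36, 0.39, 0.30, 0.44, 0.35, 0.38, 0.42, 0.31])

mean = fractions.mean()
ci = stats.t.interval(0.95, df=fractions.size - 1,
                      loc=mean, scale=stats.sem(fractions))
print(mean, ci)  # point estimate and 95% confidence interval
```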

To put these numbers into perspective, according to a study by the EPA in 2012, only 87 million tons of trash were recycled from the 251 million tons of trash produced. This is an average recycling rate of 34.5% in the U.S. You can see that the interval the students came up with for the recycling rate at Rose-Hulman is consistent with the overall recycling rate for the U.S.

Check out this Pareto Chart of the type of trash (and recyclables) the students found in the regular trash:

They used the Pareto above to confirm that a lot of ‘coffee cup trash’ from the local coffee shop winds up in the trash cans in the academic buildings. Unfortunately, the cardboard coffee cups are not recyclable because of their waxed inner linings. The students can’t control the type of material that’s used to make these cups, but they can put this on their list of recommendations for the future.

Using Attribute Agreement Analysis

As part of the measure phase, Dr. Evans and the students ran an attribute agreement analysis to ensure that the students sifting through garbage could visually distinguish between different paper thicknesses, which would tell them that the data they were collecting about paper recyclables was consistent across teams.

What prompted this analysis? When Dr. Evans received spreadsheets detailing the recyclable paper items students were collecting, she saw that students were measuring the amounts in terms of the thickness of paper in inches. She thought, “Can they really tell the thickness that well?”

The students ran an attribute agreement analysis with the following categories:

Category 1: Total paper thickness less than 1/8 inch (0.125 in.)

Category 2: Total paper thickness between 1/8 inch and 1/4 inch (0.25 in.)

Category 3: Total paper thickness between 1/4 inch and 1/2 inch (0.5 in.)

Category 4: Total paper thickness between 1/2 inch and 1 inch

Category 5: Total paper thickness greater than 1 inch

Before class one day, Dr. Evans made five groups of paper that fell into each of these categories. She then walked around and showed the different project teams stacks of paper in random order, replicated three times.

The results ended up clearly showing that the teams couldn’t distinguish paper thicknesses by inches, especially for the categories below a half of an inch. Thus, when the students go back in the field during the improvement phase, they’ll be required to take calipers with them. After all, if you can’t trust your measurement system, you can’t trust your data!

Critical to Quality or CT Tree

As part of the early DMAIC phases, the students also created a ‘Critical To Quality' or CT Tree in Quality Companion to work downward and identify factors “critical to” reducing the amount of recyclable waste that was showing up in the trash cans on campus.

The CT tree helped the students brainstorm and start to flesh out the ideas they had for improving recycling at Rose-Hulman.

Using Fishbones to Brainstorm

The students also used fishbone diagrams to further investigate the ideas they had for improving recycling rates, in addition to what was causing the lack of recycling in the first place. Check out these two examples of the students' fishbone diagrams:

Now, on to the ‘Improve’ phase … stay tuned! More to come…

For further reading, be sure to check out:

Rose-Hulman Institute of Technology Reduces Food Waste, Reveals Cost Savings with Six Sigma

 

Bar Chart: Where to Start?


Whether you’re just learning statistics or you're already using data analysis on the job, there are not many tools more straightforward than a bar chart. Bar charts are effective at getting their message across, and are used in fields as diverse as service quality, pharmaceuticals, and manufacturing.

However, I’ve noticed recently that a lot of customers looking to create a bar chart are surprised at how many different options are presented when you go to Graph > Bar Chart... in Minitab. There are, in fact, 14 different options based on 1) what you want your final graph to represent, and 2) how your data are set up.

Selecting the right option can be a little daunting, especially for a beginner. I’m going to take a look at how these different cases are handled, so you can have a better idea which choice makes sense for your situation. Today I’ll cover the first three.

If you go to Graph > Bar Chart..., the first option you’ll see is a drop down titled ‘Bars Represent,’ which is a detailed way of asking how you want Minitab to calculate how tall your bars should be. The first choice is ‘Counts of unique values,’ which has three options (Simple, Cluster, and Stack).

Bar Chart Options for Counts of Unique Values

Simple

The first option is the simplest, which you can probably tell by the name. It asks for one column of data, and the height of the bars will be determined by the frequency of each unique category name occurring in that column. Here is an example of how your column should appear, alongside a completed dialog. (These data are the colors of M&M’s counted from one sample bag.)

The resulting graph shows that Minitab counted the number of times each color occurred, and plotted that count against the color to form the chart.
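Conceptually, Minitab is just tallying each unique value and plotting the tallies. A minimal sketch of the same idea in Python (with made-up M&M colors) looks like this:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical colors recorded from one bag of M&M's (illustrative only)
colors = pd.Series(["Blue", "Brown", "Green", "Blue", "Red", "Orange",
                    "Blue", "Yellow", "Brown", "Green", "Blue", "Red"])

counts = colors.value_counts()   # frequency of each unique value
print(counts)
counts.plot(kind="bar")          # bar height = count, like the Simple chart
plt.show()
```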

Cluster

Easy enough, right? But what if we have a second group? We can use Cluster if we have a second categorical column, much like the first, and we want to plot counts across two separate categorical variables. In this scenario, we want to see one of our groups “clustered” together along the x-axis. For example, we can look at different types of defects happening in different months. Our data sheet has one column for defects, and one for the month, like this:

It’s very similar to the first case, with just an extra sorting column. And the dialog seems straightforward, right? Just enter the variables.

But in what order? And what does ‘outermost first' mean?  To see, let’s create our graph:

The finished graph allows us to see what “Outermost” refers to. When you group by 2 variables, the x-axis needs a hierarchy. Outermost refers to the group that gets sorted first. Within the outermost variable comes the next variable, and so on. It’s the last, or innermost category (in our case, Defect Type) that ends up being clustered. When filling out this dialog, the question you need to ask yourself is, "How do I want my groups ordered? Which do I want clustered together?"

Stack

Stack works very similarly to the cluster chart, and in fact can use the same data setup. Let’s just get right into the dialog box.

We have a very similar dialog here. We still need to keep in mind the hierarchy of sorting along the x-axis, but we do have one additional option, and that is to make sure the ‘Stack’ option is checked. Instead of clustering our innermost categorical variable like the last chart, Minitab will stack it instead. We can see the final product below:

As you can see, nearly the same dialog produces differing results. Instead of two bars for each month, we see one bar, with two sections stacked, one on top of another.

I hope this gives you a better idea of how to navigate the first three types of bar charts available in Minitab. Check back soon when I'll be detailing another Bar Chart option, 'Function of a Variable.'

How to Use Brushing to Investigate Outliers on a Graph


There’s a lot going on in the world, so you might not have noticed that the Organisation for Economic Co-operation and Development (OECD) released its new set of health statistics for member nations. On the OECD website, you can now download the free data series for 2014. (Be aware that “for 2014” means that the organization has a pretty good idea about what happened in 2012.)

Of course, there’s nothing more fun than sharpening your Minitab skills with real data. Each time the OECD releases their data, we hear about how much money is spent per person on health care compared to how long people live in that nation. (For example, 2013, 2012, and 2011) We also tend to be treated to graphics similar to the scatterplot below, typically with clever variations for the symbols:

Life expectancy generally increases with per capita health expenditures.

In this display, the point on the far right appears to be an outlier, spending more per capita than other nations but not fitting the general trend of increasing life expectancy as spending increases. When you see an apparent outlier, you want to investigate.

Investigating Outliers with Brushing

Brushing is a feature in Minitab that makes it easy to investigate outliers in graphs. For example, from the scatterplot in Minitab, you can fit simple regression models with and without the suspected outlier to see how great the influence is on the model. You can copy and paste the data at the end of the post if you want to follow along.

Here’s how to do a visual sensitivity analysis of data on a scatterplot with brushing. Start by making the graph:

  1. Choose Graph > Scatterplot.
  2. From the gallery of scatterplots, select Simple. Click OK.
  3. In Y variables, enter ‘Life Expectancy’.
  4. In X variables, enter ‘Per Capita Health Expenditures’. Click OK.

Turn on brushing mode and set an ID variable to get specific information about the outlier.  With the scatterplot window active, follow these steps:

  1. Choose Editor > Brush.
  2. Choose Editor > Set ID Variables.
  3. In Variables, enter Country 'Life expectancy' 'Per capita health expenditures'. Click OK.
  4. On the graph, click and drag to cover the unusual point.

In the brushing window, you can see that the unusual point is the United States, row 34 in the data set. You can also see the specific values of life expectancy and per capita health expenditures.

In the United States, average life expectancy is 78.7 years and per capita health expenditures are $8,745.

To do the regression with and without the outlier, use brushing to create an indicator variable:

  1. With the brushing window still showing, choose Editor > Create Indicator Variable.
  2. In Column, enter United States.
  3. Select Update now. Click OK.
  4. Choose Editor > Select.
  5. Choose Editor > Select Item > Symbols.
  6. Choose Editor > Edit Symbols.
  7. Select the Groups tab.
  8. In Categorical variables for grouping, enter ‘United States’. Click OK.

The United States is indicated by the red square.

With groups on the graph, you can do separate regression fits for the groups.

  1. Choose Editor > Add > Regression Fit.
  2. Select Quadratic.
  3. Check Apply same groups of current displays to regression fit. Click OK.
  4. Choose Editor > Add > Regression Fit.
  5. Select Quadratic. Click OK.

The red curve, with the United States, slopes downwards more slowly than the blue curve.

On the graph, the red curve shows that when you include the United States in the data, the decrease in life expectancy is slower than when you exclude the United States.  Bonus tip:  Hover over each curve in Minitab to see the regression equation used to create the curve.
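If you want to quantify the outlier's influence rather than just eyeballing the two curves, here is a minimal sketch in Python that fits a quadratic to the OECD data (listed at the end of this post) with and without the United States and compares the coefficients. The comparison logic mirrors the visual sensitivity analysis above; it is not what Minitab does internally.

```python
import numpy as np

# Per capita health expenditures and life expectancy from the table at the end of the post
spend = np.array([3997, 4896, 4419, 4602, 1577, 2077, 4698, 1447, 3559, 4288, 4811,
                  2409, 1803, 3536, 3890, 2304, 3209, 3649, 2291, 4578, 1048, 5099,
                  3172, 6140, 1540, 2457, 2105, 2667, 2998, 4106, 6080, 984, 3289, 8745])
life = np.array([82.1, 81.0, 80.5, 81.5, 78.9, 78.2, 80.1, 76.5, 80.7, 82.1, 81.0,
                 80.7, 75.2, 83.0, 81.0, 81.8, 82.3, 83.2, 81.3, 81.5, 74.4, 81.2,
                 81.5, 81.5, 76.9, 80.5, 76.2, 80.2, 82.5, 81.8, 82.8, 74.6, 81.0, 78.7])

keep = spend != 8745  # drop the brushed point (the United States)

# np.polyfit returns the quadratic coefficients [b2, b1, b0]
print("with US:   ", np.polyfit(spend, life, 2))
print("without US:", np.polyfit(spend[keep], life[keep], 2))
```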

Using Brushing in Other Types of Graphs

Brushing is a great tool in Minitab for investigating specific points on a graph, but it doesn’t just work on scatterplots. If you’re ready for more, you can see the complete list of graphs you can brush and how to make a graph that excludes brushed points. Prefer an example? Check out how Patrick Runkel uses brushing to study the relationships on a bubble plot!

Here’s the data for the per capita health care expenditures in United States dollars and the life expectancy for the total population at birth.

Per Capita Health Expenditures   Life Expectancy   Country

3997   82.1   Australia
4896   81.0   Austria
4419   80.5   Belgium
4602   81.5   Canada
1577   78.9   Chile
2077   78.2   Czech Republic
4698   80.1   Denmark
1447   76.5   Estonia
3559   80.7   Finland
4288   82.1   France
4811   81.0   Germany
2409   80.7   Greece
1803   75.2   Hungary
3536   83.0   Iceland
3890   81.0   Ireland
2304   81.8   Israel
3209   82.3   Italy
3649   83.2   Japan
2291   81.3   Korea
4578   81.5   Luxembourg
1048   74.4   Mexico
5099   81.2   Netherlands
3172   81.5   New Zealand
6140   81.5   Norway
1540   76.9   Poland
2457   80.5   Portugal
2105   76.2   Slovak Republic
2667   80.2   Slovenia
2998   82.5   Spain
4106   81.8   Sweden
6080   82.8   Switzerland
984    74.6   Turkey
3289   81.0   United Kingdom
8745   78.7   United States

 

 

Some Stats on the American Statistical Association’s Membership in its 175th Year


Did you know that this year the American Statistical Association (ASA) is celebrating its 175th anniversary? That’s a pretty significant birthday!

On the ASA’s 175th anniversary webpage, they publish blog posts periodically that cover ASA happenings, such as anniversary events and celebrations, as well as interesting tidbits about the organization. Recently, they published a post covering the demographics of ASA membership during its 175th anniversary year. You can read the full post here.

I thought it’d be fun to show off this demographic data using Pie Charts in Minitab 17. I don’t know about you, but I find it much easier to digest data that’s been brought to life graphically!

Below you’ll see a pie chart that highlights the percent of ASA members who are women versus those who are men. (Keep in mind that of the ASA’s about 19,000 members, only 75 percent have provided demographic information, such as gender, age, race, etc.) Women make up about 1/3 of ASA members.

Minitab Pie Chart Trick: Do you see how it appears that the red female pie slice has “exploded” in the chart above? To expand a slice while you’re working with Minitab Pie Charts, select and double-click on the pie or slice of the pie and click on the Explode tab.

It is also worth noting that women make up 43% of ASA members under the age of 45, but they make up only 23% of ASA members who are 45 years of age or older:

Minitab Pie Chart Trick: You can create a “double” pie chart in Minitab by arranging your data in different layouts in the worksheet. For a step-by-step tutorial on how to do this in Minitab, check out the support article here.

I know I said this post would include pie charts, but while we’re on the subject of age, take a look at this bar chart of ASA membership by age:

17% of members are 30 or under, 38% are 40 or under, 58% are 50 or under, and 76% are 60 or under. Only 2% of members have reached the age of 80.   

And what about education? The demographic data shows that about 89% of ASA members have advanced degrees, either at the master's or doctoral level:

Where are ASA members working? Most work in industry (47%), closely followed by academe (43%), and then government (10%):

Where are ASA members from? Not surprisingly, most are from the United States and Canada, with several other countries falling inside the ‘Other’ slice—including (from largest to smallest) Japan, Australia, United Kingdom, China (including Hong Kong), Germany, Italy, Switzerland, South Korea, Belgium, and the Netherlands. It’s also noted in the ASA blog post, but I did find this surprising: people from 94 countries belong to ASA, and 1 in 9 members are from outside the United States.

Happy 175th ASA – here’s to many, many more years!

When is the last time you created a pie chart? Tell us in the comments section below.

Also, check out the Pie Chart entry on the Minitab 17 Support Center, where you can learn about creating different types of pie charts in Minitab. 

Guest Post: Did Ma's Diabetes Get Cured by Back Surgery?


The Minitab Fan section of the Minitab blog is your chance to share with our readers! We always love to hear how you are using Minitab products for quality improvement projects, Lean Six Sigma initiatives, research and data analysis, and more. If our software has helped you, please share your Minitab story, too!

Once my Mom was diagnosed with Diabetes Type II, I began to track her blood sugar readings in Minitab Statistical Software.

I did it three times a day before meals...over weeks, then months, then years. At each doctor's appointment I would take in her 'book' of readings, and I would take my charts, too.

The individual value plot was very telling. Her blood sugars increased with each meal during the day. The doctor changed her insulin based on the undeniable visual trends.

Then the biggest surprise came. In June 2013, over a year after blood sugar tracking began, she decided to get back surgery to alleviate leg pain. The day after her back surgery, still in the hospital, her blood sugar dropped approx. 75 points. After a few months of this obvious transition and new trend, the doctor removed her from her diabetes medicine and insulin.

Minitab charts help guide my Mom's health, even in her 80's. She is 86 today, and is still off insulin and diabetes medicine. And many Minitab Statistical Software charts are a part of my Mom's health history, kept in her doctor's office records.

Paul Kelly
Black Belt
Air Products
Trexlertown, Pa.

 

Two-Way ANOVA in Minitab 17


After upgrading to the latest and greatest version of our statistical software, Minitab 17, some users have contacted tech support to ask "Wait a minute, where is that Two-Way ANOVA option in Minitab 17?" 

The answer is that it’s not there. That’s right! The 2-Way ANOVA option that was available in Minitab 16 and prior versions was removed from Minitab 17. Why would this feature be removed from the new version? Shouldn’t the new version have more features instead of fewer?

Two-Way ANOVA was removed from Minitab 17 because you can get the same output by using the General Linear Model option in the ANOVA menu. Removing the separate Two-Way ANOVA menu choice reduces redundancy and creates a more consistent workflow across the linear model options.

Let's look at an example that shows how to replicate the Two-Way ANOVA output from Minitab 16 using Minitab 17.

The data shown below are from a sample data set used for Two-Way ANOVA in Minitab 16. Suppose you are a biologist studying how zooplankton live in two lakes. You set up twelve tanks in your laboratory, six with water from each of the two lakes. You add one of three nutrient supplements to each tank, and after 30 days you count the zooplankton in a unit volume of water. You use two-way ANOVA to test whether there is significant evidence of interactions and main effects.

The Two-Way ANOVA option in Minitab 16 yields the following output:

To replicate the Two-Way ANOVA output from Minitab 16 using Minitab 17, use Stat> ANOVA> General Linear Model> Fit General Linear Model:

Using GLM, we can enter our response column (Zooplankton) in the Responses field and our two factors in the Factors field without the need to specify one factor as the row and one as the column factor:

 

Minitab 16's Two-Way ANOVA option also shows the two-factor interaction, so in Minitab 17 we need to manually add the interaction by clicking the Model button in the GLM dialog box. There we can highlight the factors listed on the left side (step 1 below); when we do that, the Add button on the right will become available. To add the interaction, click Add (step 2) and the interaction will be shown at the bottom under Terms in the model (step 3).

Click OK in the Model dialog box to return to the main GLM dialog.

By default, Minitab 17 will provide more detailed output than Two-Way ANOVA in Minitab 16. To make the results match, we can remove the additional output by clicking the Results button within the GLM dialog box. Unchecking the additional options so that only Analysis of variance and Model summary are selected (as shown below) will make the output match Minitab 16’s Two-Way ANOVA results.

The results from General Linear Model in Minitab 17 now match the output from Two-Way ANOVA in Minitab 16:
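For those curious, the same two-factor model with an interaction can also be fit outside of Minitab. Here is a minimal sketch in Python with statsmodels; the twelve zooplankton counts below are stand-ins for Minitab's sample data set, not the actual worksheet values.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical counts: 2 lakes x 3 supplements x 2 tanks per combination = 12 tanks
df = pd.DataFrame({
    "Zooplankton": [34, 43, 57, 40, 85, 68, 41, 33, 52, 46, 86, 72],
    "Supplement":  [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "Lake":        ["Rose"] * 6 + ["Dennison"] * 6,
})

# Two factors plus their interaction: the same terms as the general linear model above
model = smf.ols("Zooplankton ~ C(Supplement) * C(Lake)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```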

If you're wondering how to do something with Minitab, our technical support team is always ready to help you. Our technical support representatives are knowledgeable in statistics, quality improvement, and computer systems. Best of all, our assistance is free.
 

 


Where Did All the World Cup Goals Go? Find Out with a 2-Sample Poisson Rate Test


A few weeks ago I looked at the number of goals being scored in the World Cup. At the time there were 2.9 goals per game, the highest rate since 1970. Unfortunately for spectators who enjoyed the higher-scoring games, this did not last.

By the end, the average had fallen to 2.7 goals per game, the same amount scored in the 1998 World Cup. After such a high-scoring start, the goals per game fell off and ended up being pretty similar to other recent World Cups.

What happened?

Comparing the Group Stage to the Knockout Stage

After 15 straight days of games in the group stage, there had been 2.8 goals scored per game. But when the knockout stage started, the number of goals per game dropped to 2.2. Will a 2-sample Poisson rate test show us that this is a significant difference? I’m using a Poisson rate test instead of a 2-sample t-test because goals are counts of occurrences, not a continuous variable like length. You can get the data I used here.

2-sample Poisson rate test

The p-value for this hypothesis test is 0.144, which is greater than 0.05. That means we can’t conclude the difference is significant.
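As a rough cross-check outside of Minitab, you can run an exact conditional test yourself: under the null hypothesis of equal rates, the group-stage goal count, given the combined total, follows a binomial distribution. The sketch below (Python, scipy 1.7+) uses goal and game totals consistent with the per-game rates quoted above; treat the counts as approximate, and the p-value will not exactly match Minitab's 0.144 because the methods differ.

```python
from scipy.stats import binomtest

# Approximate 2014 totals consistent with the per-game rates quoted above
goals_group, games_group = 136, 48   # group stage
goals_ko, games_ko = 35, 16          # knockout stage

print("group rate:   ", goals_group / games_group)   # about 2.8 goals per game
print("knockout rate:", goals_ko / games_ko)         # about 2.2 goals per game

# Conditional on the combined total, the group-stage count is Binomial(total, share of games)
total_goals = goals_group + goals_ko
p_null = games_group / (games_group + games_ko)
result = binomtest(goals_group, total_goals, p_null, alternative="two-sided")
print("p-value:", result.pvalue)
```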

However, let me point out two things. First, we have a pretty small sample size for games in the knockout stage. It’s possible that the difference is significant, but our test doesn’t have enough power to detect it. Second, the Knockout stage contains a pretty big outlier. Remember when Germany destroyed Brazil 7-1? An 8-goal game is not very typical in soccer. And if we remove that observation...

2-sample Poisson rate test

Voila! The average goals per game for the knockout stage drops to 1.8, and the difference becomes statistically significant since the p-value is less than 0.05.

Is it possible that teams play differently in the knockout stage than they do in the group stage? After all, in the group stage, the margin you win by matters, since goal differential is one of the tie breakers. But once you get to the knockout stage, a 1-0 win is just as good as a 4-0 win. Teams might play defensively after obtaining a lead in hopes of not allowing an equalizer. And heavy underdogs also might play a conservative, defensive game, hoping to go the distance with the score being 0-0 and taking their chances on penalty kicks.

But before we go saying there is a difference between the two stages, let’s look at some more data.

The 2010 World Cup

I went back 4 years and collected the goals scored per game in the 2010 World Cup, then ran another 2-sample Poisson rate test.

2-sample Poisson rate test

Not only is the p-value not significant, the rate of goals scored per game in the knockout stage is actually higher than the rate in the group stage. So throw out any theories about teams playing more conservative in the knockout stage!

It appears that from game to game, goals in soccer can be pretty random. In a small sample, you can have a run of high-scoring games or one very high-scoring game, as we saw with Germany and Brazil. But trying to determine a reason for the pattern can be folly. Sometimes, you just have to chalk things up to random variation!

Right, Brazil?

Blind Wine Part I: The Experiment


Already relaxed on his first day in Napa, Brutus and his wife Suzy decide to visit their favorite winery just before lunch to taste their new Cabernet Sauvignon. The owner recognizes them as they walk in the door and immediately seats them on the patio overlooking the vineyard. Two glasses appear, and as the owner tells them about the new Cabernet, Brutus prepares for an onslaught of blackberry and plum flavors, with some notes of vanilla and leather.

Napa Valley

The wine is poured and as Brutus swirls it in his glass and breathes in the aromas, he picks up some of the oak infused into the wine as it aged in barrels in a cellar beneath where he sits. He takes a first sip, and, ignoring the obvious blackberry and plum flavors, he searches for the vanilla and leather. He has little difficulty finding them. As a second sip fills his mouth, he asks the owner if maybe he is detecting some chocolate...the owner pours a glass for himself and after a couple of sips agrees with Brutus.

Brutus, a Minitab employee, is an experienced wine drinker who has no doubt experienced many wine tastings not altogether different from what was described above.

But what if the tasting were a little different? What if...

One Thursday afternoon, at the time listed in his Outlook calendar, Brutus enters a conference room and is asked to sit at the table and place a blindfold on. A tasting of wine is poured, and Brutus is asked to tell the experimenters whether the wine is red or white, and which of four types it is. The experimenters don't tell him anything about each wine before it is poured, and offer no information after each tasting. When he is done he takes his blindfold off and goes back to his desk to continue working.

In the absence of visual senses and being "primed" to expect certain flavors, can Brutus, who at the winery could detect even subtle flavor notes in a wine, determine even the most basic information about the wines?  It seemed like a question that could be settled by collecting a little data and looking at it with statistical software.

Fellow blogger Daniel Griffith and I performed this exact experiment and this week we will present the results in a series of blog posts:

  • Part I: The Experiment (you're reading it now)
  • Part II: The Survey
  • Part III: The Results
  • Part IV: The Participants

To run the experiment, we first recruited volunteers—all Minitab employees whose names have been changed in the posts—and asked them to complete a survey consisting of four questions:

  1. On a scale of 1 to 10, how would you rate your knowledge of wine?
  2. How much would you typically spend on a bottle of wine in a store?
  3. How many different types of wine (merlot, riesling, cabernet, etc.) would you buy regularly (not as gifts)?
  4. Out of the following 8 wines, which do you think you could correctly identify by taste?
    • Merlot
    • Cabernet Sauvignon
    • Pinot Noir
    • Malbec
    • Chardonnay
    • Pinot Grigio
    • Sauvignon Blanc
    • Riesling

We were looking for a broad range of responses, especially to question #1. Once we had enough surveys collected (13), we scheduled the tasting for each participant, with either 2 or 3 participants being served at any one time. The order of the wines was randomized independently for each participant, but only within each replicate, as we served each participant each of the four wines twice. In other words, a participant would have been given each of the four wines once before ever getting a second tasting, although this was not explained to participants.

The four wines selected were meant to satisfy the following criteria:

  • Two red wines and two white wines
  • Common enough types that a regular wine drinker would be familiar
  • Different enough types that they should be easily distinguishable if tasted back-to-back

Utilizing several resources including those found here, here, and here, we determined the following types to be good choices for the experiment:

  • Pinot Noir
  • Cabernet Sauvignon
  • Riesling
  • Sauvignon Blanc

"Representative" bottles of each were purchased in the $15-20 price range, and the bottles were labelled "A," "B," "C," and "D," and otherwise masked so even the experimenters would not know which type of wine they were serving each participant (although color would be obvious).

So how do you think our participants did?


Photograph of Napa Valley by Aaron Logan. Used under Creative Commons License 1.0.

Blind Wine Part II: The Survey


In Blind Wine Part I, we introduced our experimental setup, which included some survey questions asked ahead of time of each participant. The four questions asked were:

  1. On a scale of 1 to 10, how would you rate your knowledge of wine?
  2. How much would you typically spend on a bottle of wine in a store?
  3. How many different types of wine (merlot, riesling, cabernet, etc.) would you buy regularly (not as gifts)?
  4. Out of the following 8 wines, which do you think you could correctly identify by taste?
    • Merlot
    • Cabernet Sauvignon
    • Pinot Noir
    • Malbec
    • Chardonnay
    • Pinot Grigio
    • Sauvignon Blanc
    • Riesling

Today, we'd like to take a look at the results of the survey to answer some questions about our participants.

We wanted to make sure that our participants covered a broad range of wine drinkers and, most important, covered a broad range of possible responses to question #1.  Here are the distributions of responses to the first three questions:

Distribution of Responses

We were satisfied with the range and distribution of answers to those questions. Given that we had already selected which wine types to include, we were curious which wines participants believed they could identify by taste. The bar chart below shows how many of the 13 participants indicated they could identify each wine, with those included in the experiment shown in red*:

Wines You Could Identify

* The default color in Minitab 17 for the second group on any graph with attributes being controlled by groups is not actually called "red" but rather "wine," a clear indication this experiment was meant to be.

For most wines, only about 50% of participants felt they could identify them by taste alone.  At this point they did not know the nature of the wine tasting, but must have been starting to get some ideas...

Next we wanted to see if there were any relationships between responses to different questions. For example, do participants with greater self-identified wine knowledge spend more on wine, or buy more types of wine?

Wine Knowledge versus Money Spent and Types Bought

While there was no evidence that claiming more knowledge is related to how much participants spent per bottle, there was a significant relationship with how many different types they bought regularly (p = 0.001, R-Sq = 66.2%). Perhaps those less knowledgeable stick to a few known types and spend more to make up for a lack of knowledge? And could it be that those with more knowledge buy a large variety, but know how to select without spending too much? We can't say definitively, but the results give some possibilities to consider.

We also wanted to test whether people who claim more wine knowledge (and who, we know from above, buy more types of wine regularly) would also claim to be able to identify more of the listed types:

N Identifiable vs Wine Knowledge

Again we see a significant relationship (p = 0.000, R-Sq = 70.6%), indicating that familiarity with many types of wine is closely linked with self-identified wine knowledge.

So with our survey results in hand, we see that:

  • We have a good distribution of participants
  • Participants who claim more knowledge don't tend to spend more money per bottle
  • Perceived wine knowledge is closely related with familiarity with a broad range of wine types

How well do you think the survey results will align with the experimental results?

Pareto Charts Show Which Data Breach Incidents are Most Common


Everyone loves a Pareto chart. That is, everyone who knows that Pareto charts are a type of bar chart, ordered by bar size, that helps you determine which bars comprise the vital few you care about and which are the trivial many you don't. Pareto charts are a great tool for communicating where the largest gains can be made as you focus your improvement efforts.

Since I love Pareto charts, it was fun for me to come across Verizon Enterprise’s 2014 Data Breach Investigations Report. While the subject probably sounds dry, especially if you don’t know what a data breach is, the authors of the report can’t seem to resist any opportunity to reference Star Wars and Breaking Bad, which increases the entertainment value considerably. My all-time favorite moment, in the 2012 full report, was the following: “73.6% of people will believe any statement that includes a statistic, even if it is completely made up.” (page 70, no lie)

This year's report includes a table that breaks down types of security breaches by industry. The purpose is to help readers pick out which findings from the report apply directly to their organization. This is where the Pareto charts and Minitab Statistical Software can be useful.

Let's take a look at the Professional industry as an example. According to the Census Bureau website, the Professional industry is a broad field that includes professions as diverse as lawyers, landscapers, veterinarians, and accountants.

Denial of service is the most common incident in the professional industry.

Emphasize the vital few

One of the desirable features of the Pareto chart in Minitab is that you don't see all of the categories. How can this be good?

The chart of the professional industry is a good example. We want to focus on the idea that Denial of Service attacks account for most of the incidents in that industry. The Pareto chart automatically lumps the smallest categories together so that they don't detract from your message.

Accumulate across categories

In a way, the Pareto chart de-emphasizes the proportion of Denial of Service incidents because the bar is so far from the top of the chart. The purpose of the scale on the Pareto chart is to allow the accumulation line to reach 100%. This accumulation lets you see that 75% of the incidents are of three types: Denial of Service, Cyber Espionage, and Web App Attacks. When you have to prioritize your resources, knowing that three problems cause 75% of your incidents is powerful information.
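To make the accumulation idea concrete, here is a minimal sketch of a Pareto-style chart in Python with matplotlib; the incident counts are invented for illustration and are not the report's figures. Minitab's Pareto chart also lumps the smallest categories for you, which this sketch does not.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical incident counts for one industry (illustrative only)
counts = pd.Series({"Denial of Service": 90, "Cyber Espionage": 54, "Web App Attacks": 43,
                    "Insider Misuse": 18, "Crimeware": 14, "Theft/Loss": 8})
counts = counts.sort_values(ascending=False)
cum_pct = counts.cumsum() / counts.sum() * 100   # the accumulation line

pos = range(len(counts))
fig, ax1 = plt.subplots()
ax1.bar(pos, counts.values)
ax1.set_xticks(pos)
ax1.set_xticklabels(counts.index, rotation=45, ha="right")
ax1.set_ylabel("Incidents")

ax2 = ax1.twinx()                 # second axis so the line can run to 100%
ax2.plot(pos, cum_pct.values, color="red", marker="o")
ax2.set_ylabel("Cumulative percent")
ax2.set_ylim(0, 105)
plt.tight_layout()
plt.show()
```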

Wrap-up

Graphs are an excellent way to explore data, and the Pareto chart in Minitab is an excellent graph. By focusing on the largest categories, your message gets delivered clearly. By accumulating across categories, you can quickly determine how many categories deserve your attention. To focus on the primary contributors to a problem, start with a Pareto chart.

Curious about what ails the rest of the industries? Start your free trial and make your own Pareto charts in Minitab! Want more about Pareto charts? Check out how Eston Martz would explain Pareto charts so that even his boss could understand.

The image of the hand coming through the computer screen is by Berishafjolla and is used under this Creative Commons license.

Blind Wine Part III: The Results


In Part I and Part II we learned about the experiment and the survey, respectively. Now we turn our attention to the results...

Our first two participants, Danielle and Sheryl, enter the conference room and are given blindfolds as we explain how the experiment will proceed.  As we administer the tasting, the colors of the wine are obvious but we don't know the true types, which have been masked as "A," "B," "C," and "D." 

As Danielle and Sheryl proceed through each tasting, it is easy to note that they start off correctly identifying the color of each wine; it is also obvious that tasting methods differ greatly from person to person as Danielle tends to take one or two sips whereas Sheryl is drinking the entire sample.

As Sheryl has her fourth sample, we realize that her fifth sample will be the exact same wine. She makes her guess of "Pinot Noir, Red" for the fourth sample and is given the fifth, which she almost immediately takes a sip of. But the immediate response we were expecting doesn't come. Then a second sip. Still no guess. Then more. As she finishes the sample, she finally reports that it is "Cabernet Sauvignon, Red." All fears we had that perhaps the experiment would be too easy are quickly dissipated, as we have just witnessed someone make different guesses for the same wine given back-to-back. Sheryl's self-reported knowledge level is a 6.

Before jumping into how often tasters got color and type correct, it's worth looking at how often type and color were mismatched by each taster—in other words, if a taster guessed "Red, Riesling" they mismatched the two because Riesling is a white wine:

Mismatched by Taster

For the most part, mismatches weren't an issue for our tasters. However, Viviana in particular struggled, and a look at her results shows that she believed Pinot Noir to be a white wine and Sauvignon Blanc to be red. At one point during the experiment Viviana, a fluent Spanish speaker, even questions out loud if "blanc" from "Sauvignon Blanc" might be related to "blanco" (Spanish for "white"), but decides against it.

Our first look at how well participants did will use Minitab's Attribute Agreement Analysis, first for color only:

AAA Color

One thing to note about Attribute Agreement Analysis is that without testing huge numbers of unique parts, it is very difficult to distinguish between operators in any statistically significant manner. So most conclusions we draw are based on the system itself, grouping operators together into one big system. Here we see that our participants did generally well within themselves (how consistently they answered, whether right or wrong) as well as against the known standard (how consistently they answered correctly).  In only two cases—Jeremiah and Santos (who had the lowest wine knowledge scores)—did the participant show higher consistency within than against the standard.

Next we'll look at the same graph, but by type instead of color:

AAA type

Here we see much less consistency "within," meaning our tasters had a harder time consistently guessing the same type when they had been given the same wine a second time. Further, once we compare to the standard, there was little consistency, with nearly half of the participants failing to get both samples of the same wine correct even once!  

One extreme example is Bobby, who answered consistently 100% of the time but got none of the wines correct. From a practical standpoint, Bobby's performance is usually fairly easy to correct as he already possesses the ability to distinguish parts but is putting them in the wrong "buckets."

One drawback to traditional Attribute Agreement Analysis is that incorrectly appraising a part on even one replicate is treated as though that entire part was appraised incorrectly. In other words, suppose you got 7 out of the 8 tastings correct: despite being right 87.5% of the time, you are scored as being right 75% of the time (3 out of 4 wines correct). So another way to look at our results would be to see how often they were correct on both color and type out of the eight samples:
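The difference between per-tasting accuracy and the stricter per-wine scoring is easy to see in a small sketch (Python, with a made-up set of appraisals for one taster):

```python
import numpy as np

# Hypothetical results for one taster: 4 wines x 2 replicates,
# True = the type was identified correctly on that tasting
correct = np.array([[True, True],    # wine A
                    [True, True],    # wine B
                    [True, False],   # wine C: wrong on one replicate
                    [True, True]])   # wine D

per_tasting = correct.mean()           # 7 of 8 tastings correct -> 0.875
per_wine = correct.all(axis=1).mean()  # only 3 of 4 wines fully correct -> 0.75
print(per_tasting, per_wine)
```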

All Correct by Taster

Here we more clearly see that Brutus and Katherine turned in the best performances, with 6 out of 8 tastings correctly identified. Bobby, as discussed above, missed all of them but showed great consistency (as did Danielle).

Aside from looking at how each appraiser did, we can evaluate our results in a couple of other ways. One is to look at whether certain wine types were identified correctly more or less often than others:

Correct by Wine Type

There is little if any evidence that certain types were easier or more difficult to identify, with a Chi-Square test resulting in a non-significant p-value of 0.911.
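A test like that can be run as a chi-square test on a wine type by correct/incorrect table. Here is a minimal sketch in Python with hypothetical counts (13 tasters times 2 replicates gives 26 tastings per type); these are not the study's actual tallies.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows are wine types; columns are (correct, incorrect) out of 26 tastings per type
table = np.array([[10, 16],   # Pinot Noir
                  [11, 15],   # Cabernet Sauvignon
                  [ 9, 17],   # Riesling
                  [10, 16]])  # Sauvignon Blanc

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)   # a large p-value means no evidence the types differ
```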

Similarly, it would seem reasonable that as a participant had tasted multiple samples of wine, they would become less able to discern differences and performance would degrade. So we also looked at whether the number correct decreased as participants progressed through the eight samples:

Correct by Tasting

Again there is no visual or statistical evidence of such an effect. It could be that with more tastings the effect would have shown up, or it could be that the effect is countered by participants' memory of what they had answered previously and what to still expect in the remaining samples.

To summarize:

  • The regular wine drinkers who participated showed only limited ability to identify which type of wine they were drinking when blindfolded.
  • Color was considerably easier to identify than type.
  • Multiple participants showed a much greater ability to distinguish wines than to correctly identify them.

Keep in mind that our participants represented a diverse group of wine drinkers, so stay tuned for Part IV, when we evaluate whether our survey results from Part II correlate in any meaningful way to these experimental results...

Blind Wine Part IV: The Participants


In Part I, Part II, and Part III we shared our experiment, the survey results, and the experimental results. To wrap things up, we're going to see if the survey results tied to the experimental results in any meaningful way...

First, we look at whether self-identified knowledge correlated to the total number of correct appraisals:

Correct vs Knowledge

We have no evidence of a relationship (p = 0.795).  So we'll look at the number correct by how much each participant usually spends:

Correct vs Spend

Again, no evidence of a relationship (p = 0.559).  

How about how many types each regularly buys?

Correct vs Types

There appears to be something here, but statistically we don't have evidence (p = 0.151).  Perhaps a larger experiment might uncover something.

Remember Question #4 in our survey, which asked if participants felt they could identify certain wines by taste? Eight wines were included as choices, including the four wines used in the experiment. So did participants' responses to that question correlate to their ability? To test, I did a Chi-Square test in Minitab.

Here are the expected number of correct guesses for each wine type along with the observed number of correct guesses:

Observed and Expected by Wine Type

At a strict alpha level of 0.05, the result is not statistically significant—but given the small experiment, the corresponding p-value of 0.069 would probably give me reason to investigate further. The largest contributor to the Chi-Square was Riesling, which few participants felt they could identify but which was identified correctly as often as any other wine. It could be that participants underrated their ability, or it could be process of elimination (if you know the other three, then the wine you can't identify in the experiment must be the Riesling).
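The same kind of observed-versus-expected comparison can be sketched in Python by computing the chi-square statistic directly; the counts below are invented stand-ins for the table above, not the actual study numbers.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical correct identifications for the four wines: observed counts versus
# counts expected from how many tasters claimed they could identify each type
observed = np.array([10, 11, 9, 10])
expected = np.array([9, 12, 5, 11])

contrib = (observed - expected) ** 2 / expected   # per-wine contribution to chi-square
stat = contrib.sum()
p = chi2.sf(stat, df=observed.size - 1)
print(contrib)         # the largest contributor drives the result
print(stat, p)
```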

In the end, we found little evidence of any relationships between the survey questions and the experimental results. While a larger study could likely draw some conclusions, we've learned enough to say that any real, underlying relationships are not particularly strong and would have considerable variation around them.

The next time you plan on trying a new wine, try tasting the wine without being told ahead of time what type it is or even looking at it (sleep masks make great blindfolds)...you might find it to be a completely different experience!

Investigating Starfighters with Bar Charts: Function of a Variable


Last time, we went over Bar Charts you could create from Counts of Unique Values. However, sometimes you want to convey more information than just simple counts. For example, you could have a number of parts from different models. The counts alone don't offer much value, so you may want a chart displaying the means, sums, or even standard deviations of the different parts. It's this case that we're after when we go to Graph > Bar Charts > Function of a Variable.

To illustrate this, I have a small data set investigating starfighters built in the Star Wars galaxy:

We’re interested in two variables: the size of the ship, and how much cargo the ship can hold. We have 6 different models of starfighters built by 3 different corporations. We can use bar charts to compare how these different groups of fighters are being produced.

Bar Charts with a Function of a Variable

Let’s say we’re interested in which manufacturer builds the largest ships. We have a few different models, and each individual ship will have a slightly different length, so we are interested in the mean length for each. Let’s start at Graph > Bar Chart, but this time let’s change our drop-down to function of a variable. Choose Simple with One Y, and enter size as the graph variable and manufacturer as the categorical variable. Make sure that the function is listed as Mean, as we want to know who, on average, builds the largest starfighters. Minitab gives us the following graph:

We can see that Koensayr Manufacturing, known for K-Wing and Y-Wing starfighters, comes out on top with the largest average size. Sienar Fleet, on the other hand, produces smaller ships. They are mainly known for producing the smaller TIE Fighters.

The key takeaway from this graph is that if you have a number of observations for each category, Minitab Statistical Software will perform an operation on them to get the value that is actually plotted.
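In spreadsheet terms, "function of a variable" just means grouping the rows by the category and applying the chosen function before plotting. A minimal sketch in Python (with made-up ship lengths, not the actual data set) looks like this:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical ship lengths (m) by manufacturer (illustrative only)
df = pd.DataFrame({
    "Manufacturer": ["Koensayr", "Koensayr", "Incom", "Incom", "Sienar", "Sienar"],
    "Size":         [16.2, 23.4, 12.5, 16.9, 6.3, 9.6],
})

# Bars represent a function (here the mean) of a variable, not raw counts
mean_size = df.groupby("Manufacturer")["Size"].mean()
print(mean_size)
mean_size.plot(kind="bar")
plt.show()
```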

Bar Charts with Two Categorical Variables

We can also use more than one categorical variable and either cluster them or stack them. In my last post, I discussed innermost and outermost grouping variables, and we can use that same logic here. If you have two categorical variables, you can just enter them in your dialog consecutively.

If we are interested in not just the size by company, but each model as well, we will enter Company first, followed by Model. We are still interested in the mean, so we will leave that as is.

There is one more step that is important, and it's something to keep an eye on because each corporation produces different ships. This creates an empty space for the nonexistent Incom/TIE/LN Class, for example, as well as numerous others. We want to clean up that space, and we can do that by clicking the Data Options button, sliding over to the Group Options tab, and unchecking ‘Include empty cells.’ This gives us the following graph:

Using the Stacked Bar Chart

We have one option left, and that is the stacked chart. In addition to size, we are interested in how much cargo each fighter can hold. We want to see how much total cargo each company holds in their fleet. To do this, we can "stack" the cargo capacity for each ship and see who comes out on top. To do that, we can fill out our dialog as follows:

which gives us the following, final graph:

As you are beginning to see, the Bar Chart is a very simple yet very powerful tool, allowing Minitab to accommodate numerous data setups and looks. We've already created six completely different charts, and there are still many more options we haven't explored. I'll continue going over these in a future post...

 


How Accurate are Fantasy Football Rankings?


The calendar just flipped to August, meaning it’s time to get ready for fantasy football season! As you prepare for your draft, you will no doubt be looking at all sorts of rankings. But when the season is over, do you ever go back and see how accurate those rankings were? And are rankings for some positions more accurate than others? Well that’s exactly what we’re going to find out!

I went back over the previous 5 seasons and found ESPN’s preseason rankings for quarterbacks, running backs, wide receivers, and tight ends. I know that different publications will have slightly different rankings, but the rankings are all similar enough (especially for the top players) that using only ESPN will work for our purposes. For each season I recorded the top scoring players at the end of the season (through week 16). I recorded where the top preseason players finished at the end of the season, and also where the top players at the end of the season were ranked before the season started.

We’ll analyze the data using Minitab Statistical Software, starting with quarterbacks.

Quarterbacks

Let’s begin by looking at how the top rated preseason quarterbacks fare by the end of the season. I took the top 5 ranked preseason quarterbacks for each season from 2009-2013 and recorded where they ranked to finish the season. Here is an individual value plot and summary statistics of the results.



The median value is 3, meaning that half of the top 5 preseason quarterbacks finished in the top 3 at the end of the season. That’s pretty good! The individual value plot confirms this, showing that most of the data points are clustered at end-of-season ranks between 1 and 4.
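If you want to compute these kinds of summary statistics yourself, a minimal sketch in Python is shown below; the ranks in it are invented placeholders, not the ESPN data behind the plot:

```python
import numpy as np

# Illustrative end-of-season ranks for 25 preseason top-5 QBs (5 per year,
# 2009-2013); these numbers are made up for demonstration only
finish_rank = np.array([1, 3, 2, 5, 24,
                        2, 1, 4, 3, 12,
                        1, 2, 3, 6, 4,
                        2, 5, 1, 3, 17,
                        4, 1, 2, 3, 9])

print("median finish:", np.median(finish_rank))
print("quartiles:", np.percentile(finish_rank, [25, 75]))
print("share outside top 10:", np.mean(finish_rank > 10))
```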

There does appear to be about 1 quarterback a season who slips out of the top 10 despite being highly rated in the preseason. The worst-case scenario happened last year, when Aaron Rodgers (ranked #1 in the preseason) finished as the 24th ranked quarterback. Of course, this is because he missed 8 games due to injury. Injuries will always be a part of fantasy football, but they are impossible to predict before the season. But overall, you should be confident in selecting quarterbacks who are ranked high in the preseason fantasy football rankings.

But what if you don’t want to take a quarterback early? Can you wait until a later round and still draft a top 5 quarterback? This next individual value plot and the summary stats take the top 5 quarterbacks at the end of the season, and show where they were ranked before the season started.

Diamonds in the rough are rare for quarterbacks, as half of the quarterbacks ranked in the top 5 at the end of the season were ranked in the top 4 before the season. About 1 quarterback a year is ranked outside the top 10 in the preseason but finishes in the top 5. But some of them, like Vick and Newton, were so far off the radar in the preseason that they weren’t even drafted in most fantasy football leagues.

Things get even worse if you want to draft one of the top 2 quarterbacks. The following table shows the 2 highest scoring quarterbacks for each year, with their preseason rank in parentheses.

Year    Top Scoring QB        2nd Highest Scoring QB
2013    Peyton Manning (3)    Drew Brees (2)
2012    Tom Brady (2)         Aaron Rodgers (1)
2011    Aaron Rodgers (1)     Drew Brees (3)
2010    Michael Vick (34)     Aaron Rodgers (2)
2009    Aaron Rodgers (5)     Drew Brees (2)


Michael Vick aside, if you want one of the highest scoring quarterbacks, you have to take a top-ranked one early. This year the top-ranked quarterbacks are Peyton Manning, Aaron Rodgers, and Drew Brees. If you want your quarterback to be the highest scoring at the end of the season, your best bet is to draft one of those three.

Now, this isn’t to say that you have to draft a quarterback early. You can certainly win your league with a lower-scoring quarterback. Just know that if you’re waiting until a later round to draft somebody like RG III or Nick Foles, history suggests they’re unlikely to finish the season on top.

But before we get too much into draft strategy, let’s move on to our next position, tight ends!

Tight Ends

I thought about just writing “Jimmy Graham” and moving on to the next position. But I already collected the data, so I suppose I might as well analyze it. Here's how the top 5 preseason tight ends ranked at the end of the year.

We see much higher variation with tight ends than with quarterbacks. The median and Q3 are about two times larger than for quarterbacks. The individual value plot shows that 36% of tight ends finished the season outside the top 10, as opposed to only 24% for quarterbacks. Sometimes this is due to injury, as it was with the 3 names at the top of the plot. But Jermichael Finley (2010) and Brent Celek (2010) both played in every game, yet still ranked outside the top 15 in fantasy scoring despite being highly ranked in the preseason.

So can you wait until a later round and still grab a tight end that will finish in the top 5? This next individual value plot and summary stats take the top 5 tight ends at the end of the season, and show where they were ranked before the season started.

There are 9 different tight ends since 2009 who started the season ranked outside the top 10, but still managed to finish in the top 5. Can you believe Julius Thomas was ranked as the 20th best tight end last year? In hindsight, that seems ridiculous! And in 2011, the top 3 scoring tight ends were all ranked outside the top 10 to start the season! 

Will we see something similar if we just look at the top 2 scoring tight ends?

Year    Top Scoring TE         2nd Highest Scoring TE
2013    Jimmy Graham (1)       Vernon Davis (5)
2012    Rob Gronkowski (1)     Tony Gonzalez (7)
2011    Rob Gronkowski (13)    Jimmy Graham (11)
2010    Jason Witten (7)       Antonio Gates (2)
2009    Dallas Clark (4)       Vernon Davis (19)


Half of the top 2 scoring tight ends were ranked outside the top 5 to begin the season. And of those, 3 were ranked outside the top 10! This inconsistency is what makes Jimmy Graham so valuable this year. He’s been a top 3 tight end for three consecutive years. When it comes to tight ends, Jimmy Graham is as close to a sure thing as you can get. His average draft position is currently 15 on ESPN, and his consistency makes him well worth a 2nd round pick (and possibly even earlier). Julius Thomas is the 2nd ranked tight end. He should be a solid pick, but he has only performed at a high level for one season (as opposed to Graham’s 3 consecutive years). Still, with Peyton Manning as your quarterback, I wouldn’t worry too much.

After Thomas, things get dicey. Rob Gronkowski is the 3rd ranked tight end. His upside is through the roof, but with all the injury problems he’s had, there is no guarantee he’ll ever be like he was in 2012. Next is Vernon Davis, who had a great year last season but finished as the 15th ranked tight end two years ago. And coming in 5th is Jason Witten, who a season ago posted his fewest catches and lowest yardage total since 2006. He made up for it with 8 touchdowns, but if he doesn’t find the end zone this year, he could see a significant drop in his fantasy value.

The average draft position for Gronk, Davis, and Witten is between the 4th and 6th round. Now compare that to Greg Olsen. Olsen is the 7th-ranked TE and is currently being drafted in the 9th round. Why did I choose the 7th-ranked tight end? Look back at the median value for where the top 5 scoring tight ends were ranked before the season. It’s 7, meaning that half of the tight ends that finished in the top 5 were ranked 7th or worse to start the season. So there is still some value at the tight end position into the 9th round and even later!

To put it in perspective, Olsen is being drafted around the same spot as Maurice Jones-Drew (the 32nd running back) and Kendall Wright (the 34th wide receiver). If you miss on Graham or Thomas, your best bet may be to get a quarterback and stock up on running backs and wide receivers in the first 8 rounds or so. Then take two tight ends later, hoping one of them becomes one of the players that finish in the top 5!

In my next post I’ll move on to running backs and receivers. Then we’ll put all the statistics together and use them to look at possible draft strategies!

Cuckoo for Quality: A Birdseye View of a Classic ANOVA Example


If you teach statistics or quality statistics, you’re probably already familiar with the cuckoo egg data set.

The common cuckoo has decided that raising baby chicks is a stressful, thankless job. It has better things to do than fill the screeching, gaping maws of cuckoo chicks, day in and day out.

So the mother cuckoo lays her eggs in the nests of other bird species. If the cuckoo egg is similar enough to the eggs of the host bird, in size and color pattern, the host bird may be tricked into incubating the egg and raising the hatchling. (The cuckoo can then fly off to the French Riviera, or punch in to work at a nearby cuckoo clock, or do whatever it is that cuckoos with excess free time do.)

The cuckoo egg data set contains measurements of the lengths of cuckoo eggs that were collected from the nests of 6 different bird species. Using Analysis of Variance (ANOVA), students look for statistical evidence that the mean length of the cuckoo eggs differs depending on the host species. Presumably, that supports the idea that the cuckoo may adapt the length of its eggs to better match those of the host.

Old Data Never Dies...It Just Develops a Rich Patina

Sample data sets have a way of sticking around for a while. The cuckoo egg data predate the production of the Model T Ford! (Apparently no one has measured a cuckoo egg in over 100 years. Either that or cuckoo researchers are jealously guarding their cuckoo egg data in the hopes of becoming eternally famous in the annals of cuckoology.)

Originally, the data were published in a 1902 article in Biometrika by O.M. Latter. L.H.C. Tippett, an early pioneer in statistical quality control, included the data set in his classic text, The Methods of Statistics, a few decades later.

That's somewhat fitting. Because if you think about it, the cuckoo bird really faces the ultimate quality assurance problem. If its egg is recognized as being different (“defective”) by the host bird, it may be destroyed before it’s hatched. And the end result could be no more cuckoos.

Analyze the Cuckoo Egg Data in Statistical Software

Displaying boxplots and performing ANOVA is the classic 1-2 punch that’s often used to statistically compare groups of data. And that’s how this vintage data set is typically evaluated.

To try this in Minitab Statistical Software, click to download the data set. (You'll need Release 17, which you can download free for a 30-day trial period.) Then follow the instructions below.

Display Side-by-Side Boxplots
  1. In Minitab, choose Graph > Boxplots. Under One Y, choose With Groups, then click OK.
  2. Fill out the dialog box as shown below, then click OK.
     

Minitab displays the boxplots

The boxplots suggest that the mean length of the cuckoo eggs may differ slightly among the host species. But are any of the differences statistically significant? The next step is to perform ANOVA to find out.

Perform One-Way ANOVA
  1. In Minitab, choose Stat > ANOVA > One-Way.
  2. Complete the dialog box as shown below.
  3. Click Comparisons, check Tukeys, then click OK in each dialog box.

The ANOVA output includes the following results

Analysis of Variance

Source    DF   Adj SS   Adj MS   F-Value   P-Value
Nest        5    42.94   8.5879     10.39     0.000
Error     114    94.25   0.8267
Total     119   137.19

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

Nest          N    Mean   Grouping
HDGE SPRW    14  23.121   A
TREE PIPIT   15  23.090   A
PIED WTAIL   15  22.903   A B
ROBIN        16  22.575   A B
MDW PIPIT    45  22.299     B
WREN         15  21.130       C

Means that do not share a letter are significantly different.

----------------------------------------------------

The interval plot displays the mean and 95% confidence interval for each group. In the ANOVA table, the p-value is less than the alpha level of 0.05, so you reject the null hypothesis that all the group means are equal. The mean egg length is statistically different for at least one group.

Based on Tukey's multiple comparisons procedure, several groups differ significantly. The mean length of the cuckoo eggs in the wren nests is significantly smaller than in all the other nests, and the mean length of the eggs in the meadow pipit nests is significantly smaller than in the hedge sparrow and tree pipit nests.
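If you don't have Minitab handy, a rough equivalent of this one-way ANOVA and Tukey comparison can be sketched in Python with scipy and statsmodels. The file name cuckoo.csv and its column names (Length, Nest) are assumptions for illustration, not the actual download linked above:

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Assumed layout: one row per egg, with columns Length (mm) and Nest (host species)
eggs = pd.read_csv("cuckoo.csv")

# One-way ANOVA across host species
samples = [grp["Length"].to_numpy() for _, grp in eggs.groupby("Nest")]
print(stats.f_oneway(*samples))

# Tukey pairwise comparisons at 95% confidence
print(pairwise_tukeyhsd(eggs["Length"], eggs["Nest"], alpha=0.05))
```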

With that said, the case of the morphing cuckoo eggs is frequently considered closed. The ANOVA results are said to support the theory that the cuckoo adapts egg length to the host nest.

Bottom line: If you're a mother cuckoo, stay away from ostrich nests.

Post-Analysis Pondering

As alluring and sexy as a p-value is to the data-driven mind, it has its dangers. If you're not careful, it can act like a giant door that slams shut on your mind. Its air of finality may prevent you from looking more closely—or more practically—at your results.

Case in point: Most of us know that a wren is smaller than a robin. But what about the other bird species?

Personally, I wouldn’t recognize a pied wagtail or a tree pipit if it dropped a load of statistically significant doo-doo on my shiny bald head.

How big is each bird species—or more to the point, how long, on average, are its eggs? If two species have eggs of about the same size, then the lack of a significant difference in the ANOVA results would actually support the theory that the cuckoo may adapt its egg length to match the host. Without any indication of whether the lengths of the eggs of these bird species differ significantly to begin with and, if so, how they differ, it's really difficult to determine whether the ANOVA results support or contradict the idea of egg-length adaptation by the cuckoo.

Apart from that, there's the issue of practical consequence. Upon closer examination of the confidence intervals, it appears that the actual mean difference itself could be fractions of a millimeter. Does that size difference really matter if you're a host bird? Would it make a difference between the eggs being accepted or rejected?

Finally, there's the proverbial elephant in the room whenever you perform a statistical analysis. The one that trumpets noisily in the back of an asymptotically conscientious mind: "Assssssumptions!! Asssssumptions!"

How well do the cuckoo egg data satisfy the critical assumptions for ANOVA?

Stay tuned for the next post.

A Fun ANOVA: Does Milk Affect the Fluffiness of Pancakes?


by Iván Alfonso, guest blogger

I'm a huge fan of hotcakes—they are my favorite dessert ever. I’ve been cooking them for over 15 years, and over that time I’ve noticed many variations in texture, flavor, and thickness. Personally, I like fluffy pancakes.

There are many brands of hotcake mix on the market, all with very similar formulations. So I decided to investigate which ingredients and inputs may influence the fluffiness of my pancakes.

Potential factors could include the type of mix used, the type of milk used, the use of margarine or butter (of many brands), the amount of mixing time, the origin of the eggs, and the skill of the person who prepares the pancakes.

Instead of looking at all of these factors, I focused on the type of milk used in the pancakes. I had four types of milk available: whole milk, light, low fat, and low protein.

My goal was to determine whether these different milk formulations influence fluffiness (thickness). Is whole milk the best for fluffy hotcakes? Does skim milk work the same way as whole milk? Can I be sure that using light milk will result in hotcakes that are less fluffy?

Gathering Data

I sorted the four formulations as shown in the diagram below:

I used the same amounts of milk, flour (one brand), salt, and margarine for each batch of hotcakes I cooked.

The response variable was the thickness of the cooked pancakes. I prepared 6 pancakes for each type of milk, which gave me a total of 24 pancakes. I randomized the cooking order to minimize bias. I also prepared each batch myself—if my sister or mother had helped with some lots, it would have been a potential source of variation.

To measure the fluffiness, I inserted a stick into the center of each hotcake until it reached the bottom, marked the stick with a pencil, and then measured the distance to the mark in millimeters with a ruler.

After a couple of hours of cooking hotcakes, making measurements, and recording the data on a worksheet, I started to analyze my data with Minitab.

Analysis of Variance (ANOVA)

My goal was to assess the variation in thickness, or fluffiness, between different batches of hotcakes, so the most appropriate statistical technique was analysis of variance, or ANOVA. With this analysis I could visualize and compare the formulations based on my response variable, the thickness in millimeters, and see if there were statistically significant differences between them. I used a 0.05 significance level.

As soon as I had my data in a Minitab worksheet, I started to check it against the assumptions of ANOVA. First, I needed to see if the data followed a normal distribution, so I went straight to Stat > Basic Statistics > Normality Test. Minitab produced the following graph:

Graph of probability of thickness

My data passed both the Kolmogorov-Smirnov and Anderson-Darling normality tests. This was a relief—since my data had a normal distribution, I didn’t need to worry about ANOVA’s assumption of normality.

Traditional ANOVA also has an assumption of equal variances; however, I knew that even if my data didn’t meet this assumption, I could proceed using the method called Welch’s ANOVA, which accommodates unequal variances. But when I ran Bartlett’s test for equal variances, and even the more stringent Levene test, my data passed. 
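For readers following along outside Minitab, here is a minimal sketch of the same assumption checks and the classic one-way ANOVA using scipy; the thickness values below are invented placeholders, not the measurements from my experiment:

```python
import numpy as np
from scipy import stats

# Illustrative thickness measurements in mm, 6 hotcakes per milk type
whole    = np.array([11.2, 10.8, 11.5, 11.0, 11.3, 10.9])
light    = np.array([ 9.8, 10.1,  9.6, 10.0,  9.9,  9.7])
low_fat  = np.array([10.0,  9.9, 10.2,  9.8, 10.1, 10.0])
low_prot = np.array([ 8.9,  9.2,  8.7,  9.0,  9.1,  8.8])
groups = [whole, light, low_fat, low_prot]

# Normality check per group (an Anderson-Darling test is available as stats.anderson)
for g in groups:
    print(stats.shapiro(g))

# Equal-variance checks analogous to Bartlett's and Levene's tests
print(stats.bartlett(*groups))
print(stats.levene(*groups))

# Classic one-way ANOVA; with unequal variances, Welch's ANOVA would be the fallback
print(stats.f_oneway(*groups))
```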

With confirmation that my data met the assumptions, I proceeded to perform the ANOVA and create box-and-whisker graphs.

ANOVA Results

Here's the Minitab output for the ANOVA:

one-way anova output

The ANOVA revealed that there were indeed statistically significant differences (p = 0.009) among my four batches of hotcakes.

Minitab’s output also included grouping information using Tukey’s method of multiple comparisons for 95% confidence intervals:

Tukey Method

The Tukey analysis shows that the low-fat milk and light milk batches do not show a significant difference in fluffiness. However, the batches made with whole milk and low-protein milk did differ significantly from each other.

The box-and-whisker diagram makes the results of the analysis easier to visualize:

Boxplot of thickness

It is clear from the graph that hotcakes produced with whole milk had the most fluffiness, and those made with low protein milk had the least fluffiness. There was not a big difference between the fluffiness of hotcakes made with light milk and lowfat milk.

Which Milk Should You Use for Fluffy Pancakes?

Based on this analysis, I recommend using whole milk for fluffier hotcakes. If you want to avoid fats and sugars in milk, low fat milk is a good choice.

I always use lowfat milk, but the analysis indicates that light milk offers a good alternative for people following a strict no-fat diet.

It’s important to note that for this analysis, I only compared formulations that used the same brand of pancake mix and the same amounts of salt and butter. But there are other factors to consider! My next pancake experiment will use design of experiments (DOE) to compare milk types, different brands of flour, and margarine with and without salt, to see how all of these factors together affect the fluffiness of pancakes.

 

About the Guest Blogger:

Iván Alfonso is a biochemical engineer and statistics professor at the Autonomous University of Campeche, Mexico. Alfonso holds a master's degree in marine chemistry and has worked extensively in data analysis and design of experiments in basic and advanced sciences like chemistry and epidemiology.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

 

How Deadly Is this Ebola Outbreak?


The current Ebola outbreak in Guinea, Liberia, and Sierra Leone is making headlines around the world, and rightfully so: it's a frightening disease, and last week the World Health Organization reported its spread is outpacing their response. Nearly 900 of  the more than 1,600 people infected during this outbreak have died, including some leading medical professionals trying to stanch the outbreak's spread. And yesterday, one of the American doctors who contracted the disease arrived back in the U.S. for treatment.

Many sources state that Ebola virus outbreaks have a case fatality rate of up to 90%, but a look at the data about ebola shows the death rate significantly varies based on the ebola species, case location, and year.

Plotting Ebola Outbreaks Since 1976

Infection with the ebola virus causes a hemorrhagic fever. Symptoms most commonly appear 8 to 10 days after exposure, and include fever, headache, joint and muscle aches, and weakness. These symptoms quickly escalate to diarrhea, vomiting, stomach pain, lack of appetite, abnormal internal and external bleeding, and organ failure.

The disease first appeared in Africa in 1976, and since then sporadic outbreaks have occurred as indicated in graph 1, which depicts data from the World Health Organization web site. (You can download my Minitab project file, which includes all of the data used in this blog post, here.)

ebola virus cases per year

According to the Centers for Disease Control, of the five known species of the Ebola virus, only three have resulted in large outbreaks. The current outbreak is associated with the species Zaire ebolavirus (EBOV). The two other species that have been associated with large outbreaks are Bundibugyo ebolavirus (BDBV) and Sudan ebolavirus (SUDV).

Graphing the outbreak death rate over time can help us understand the impact of species, location, and year. But plotting raw outbreak death rates, as I did above, is not ideal due to the difference in case numbers (sample size) across outbreaks. Let's try a different approach.

Assessing Ebola Outbreaks with Binary Logistic Regression

Fitting a model which accounts for the different sample sizes and then plotting the model predictions over time is more appropriate than simply graphing the raw fatality numbers.

I put the data into Minitab Statistical Software and used binary logistic regression to fit a model with three predictors: year, ebola virus species, and location of outbreak. I could not fit interactions among these factors because of the limited amount of data available.
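For readers who want to try a comparable events/trials model outside Minitab, here is a minimal sketch using a binomial GLM in statsmodels. The outbreak counts below are invented placeholders (not the WHO data), and to keep the sketch small it uses only year and species as predictors rather than all three:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative outbreak-level records; the counts are made up for demonstration
outbreaks = pd.DataFrame({
    "year":    [1976, 1979, 1995, 2000, 2003, 2007, 2014],
    "species": ["EBOV", "SUDV", "EBOV", "SUDV", "EBOV", "BDBV", "EBOV"],
    "deaths":  [280, 22, 254, 224, 128, 37, 887],
    "cases":   [318, 34, 315, 425, 143, 149, 1603],
})

# Response as [deaths, survivors] per outbreak, so each outbreak's
# sample size is reflected in the fit
endog = np.column_stack([outbreaks["deaths"],
                         outbreaks["cases"] - outbreaks["deaths"]])

# Predictors: year plus dummy-coded species (location omitted to keep this short)
exog = sm.add_constant(
    pd.get_dummies(outbreaks[["year", "species"]],
                   columns=["species"], drop_first=True).astype(float))

fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(fit.summary())
print(fit.predict(exog))  # fitted death rates per outbreak
```

Because the response is entered as deaths out of cases, outbreaks with more cases carry more weight in the fit, which is the same idea as fitting the counts rather than the raw death rates.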

All three predictors had p-values below 0.001, indicating strong statistical significance:

ebola virus binary logistic regression analysis

I also created a scatterplot to illustrate the model's predicted death rates over time:

ebola scatterplot of predicted death rate vs year

We can draw the following conclusions from the binary logistic regression analysis and the graph above:

  1. The death rate from ebola decreases over time.
  2. The death rate is significantly different across species. After accounting for the effects of location and time, species SUDV and BDBV have lower death rates than EBOV. The current outbreak is EBOV.
  3. The death rate is significantly different across locations. After accounting for the effects of species and time, Gabon, Sudan, and the current outbreak location (Guinea, Sierra Leone, and Liberia), appear to have a lower death rate. 
Assessing the Current EBOV Outbreak with Binary Logistic Regression

The current outbreak has a low death rate relative to previous EBOV outbreaks. Since the current location has not appeared before, we cannot tell whether this decreased death rate is due to improvements in treatment over time, the quality of care available in the location of the outbreak, or some other factor, such as better immunity to the virus in the region.

The graph below shows the EBOV death rate predictions from a binary logistic regression model fit to the EBOV data only.

ebola scatterplot of predicted death rate vs year - EBOV only

The current outbreak is severe in terms of number of cases, but the death rate is lower than expected based on past EBOV outbreaks in different locations.

Seeing the Outbreak Day by Day

One final graph shows the number of new cases per day by location for the current outbreak.

ebola scatterplot of new cases per day vs. date

The number of new cases per day has fluctuated widely in Guinea, while Liberia and Sierra Leone have both seen an extremely rapid rise in cases per day since mid-July. 

This is one graph that will change greatly from day-to-day as the outbreak runs its course. Let's hope the data quickly return to 0 new cases per day for all locations.

 

Is the Risk of an Ebola Pandemic Even Worth Worrying About?


In his post yesterday, my colleague Jim Colton applied binary logistic regression to data on the current ebola virus outbreak in Guinea, Liberia, and Sierra Leone, and revealed that, horrific as it is, this outbreak actually appears to have a lower death rate than some earlier ones. 

He didn't address the potential for a global ebola pandemic, but over the last few days more than enough leading publications have featured extremely scary headlines about this extremely remote possibility. Less reputable organizations have promulgated even more exaggerated stories, usually with some ludicrous conspiracy angle ("they want it to spread!") thrown in for good measure.

Medical experts say that the risk of actually getting ebola is extraordinarily low, especially for people living outside of western Africa. The Boston Globe's Evan Horowitz did a particularly nice job illustrating just how minimal that risk is.

The current outbreak is certainly newsworthy, but the ebola hysteria—a pejorative term I use very deliberately here—seems silly.

What Kept the Epidemiologist Awake at Night?

In the late 1990s I interviewed an epidemiologist while I was writing an article about a hantavirus outbreak in the southwestern U.S. I asked if hantavirus was one of the scarier diseases he studied. "As an epidemiologist, hantavirus and other exotic bugs don't scare me at all," he replied.

"The disease that keeps me up at night is the flu."

I thought he was joking. But as he explained the toll influenza has claimed throughout history, and how many lives the Spanish flu pandemic took less than 100 years ago, he shattered my misperception of the flu as a minor nuisance that had been defeated by modern science.

To him, every year the world doesn't lose millions of people to influenza seems like a momentary reprieve.

I haven't worried much about bugs like hantavirus and ebola since, although I still enjoy a good scare story like The Stand, 28 Days Later, or Contagion.

Worst-Case Scenario?

On Tuesday, one of my buddies who's a bit of a hypochondriac called to ask what ebola precautions I was taking. He couldn't believe it when I said, "Nothing." Rather than argue with him, I started looking for a quick and dirty way to put the risk of an ebola epidemic into perspective.

The Boston Globe's story about the risk of ebola included a list of several horrible diseases, the average number of people an infected person spreads the disease to ("reproductive number"), and how long it takes a newly-exposed person to become infectious themselves ("generation time").

infectious disease table

Using those numbers, I created a very basic worst-case scenario for the different diseases they cited. First I rounded the maximum reproductive number and the minimum generation time to integers. Then I extrapolated the spread over a 30-day period from a single person exposed to each disease. For each disease, I divided the 30-day period by the minimum generation time to obtain a number of periods, and simply totaled the number of new cases you'd expect to see in each period based on the reproductive number. 

So, if a disease with a reproductive number of 2 had 4 generation periods in 30 days, from a single person we could expect to see 1 + 2 + 2^2 + 2^3 + 2^4 = 31 total cases. 

Similarly, as shown in the second row in the table above, a malady with a reproductive number of 6 and 3 generation periods over 30 days would total 1 + 6 + 6^2 + 6^3 = 259 cases. 
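That running total is easy to script. Here is a minimal sketch of the calculation in Python; the reproductive numbers and generation times in the example calls are placeholders, not the Boston Globe's figures:

```python
# Rough worst-case total after 30 days, following the method described above:
# round the reproductive number R and the minimum generation time to integers,
# then sum 1 + R + R^2 + ... + R^p, where p = 30 // generation_days.
def worst_case_total(reproductive_number: int, generation_days: int,
                     horizon_days: int = 30) -> int:
    periods = horizon_days // generation_days  # integer division = rounding down
    return sum(reproductive_number ** k for k in range(periods + 1))

# Illustrative values only
print(worst_case_total(2, 7))    # R=2, 4 periods -> 31 cases
print(worst_case_total(6, 10))   # R=6, 3 periods -> 259 cases
```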

The following table shows the rough totals I came up with for each disease cited in the Boston Globe article:

worst-case-scenarios

Of course, this quick and dirty extrapolation would never satisfy someone looking for the most accurate estimate of how these diseases could spread: this model only considers time and the reproductive number, which is itself a metric with some serious limitations.

All I wanted to do was show my friend that not worrying about an ebola pandemic in the U.S. is hardly a suicidal act.

Pareto Chart of Potential Pandemics

The table above would seem to put the relative risk from ebola into perspective, but it's often useful to visualize the numbers. So I turned to one of my favorite quality tools, the Pareto chart, to illustrate my worst-case scenarios to my friend. 

Pareto charts rank "defects" based on frequency of occurrence, although you can also account for severity of the defect, the cost, or any other metric you like. To create a Pareto chart in Minitab, I opened Stat > Quality Tools > Pareto Chart...

I entered the column of disease names as "Defect" and entered "Worst Case" as the summarized value column. Minitab provided the Pareto chart shown below:

Pareto chart of diseases
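If you wanted to build a similar chart without Minitab, a rough Pareto-style sketch in Python with matplotlib might look like this; the totals below are placeholders rather than the worst-case numbers from my table:

```python
import matplotlib.pyplot as plt

# Hypothetical worst-case totals (placeholders, not the values computed above)
totals = {"Influenza": 100000, "Cholera": 90000, "Measles": 75000,
          "Whooping cough": 30000, "SARS": 1200, "Ebola": 259}

# Pareto chart: bars sorted in descending order plus a cumulative-percentage line
items = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
labels, counts = zip(*items)
cum_pct = [100 * sum(counts[:i + 1]) / sum(counts) for i in range(len(counts))]

fig, ax1 = plt.subplots()
ax1.bar(labels, counts)
ax1.set_ylabel("Worst-case infections in 30 days")
ax1.tick_params(axis="x", rotation=45)
ax2 = ax1.twinx()
ax2.plot(labels, cum_pct, marker="o", color="tab:red")
ax2.set_ylabel("Cumulative percentage")
plt.tight_layout()
plt.show()
```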

These data suggest that the epidemiologist I spoke with so many years ago had good reason to fear a flu epidemic. Of the diseases in the Boston Globe story, influenza infected the most people in my worst-case scenario, followed closely by cholera and measles.

Returning to the subject of today's headlines, you'll notice that "ebola" doesn't even appear on this chart—it's lumped in with SARS, dengue fever, rubella, and smallpox in the "Other" category, which combined accounts for less than 5% of the total number of infections calculated in my scenario. 

Clearly I'm no epidemiologist, but this quick exercise did serve its intended purpose with my friend. He's already planning to get a flu shot this year—his first ever—and he's much less concerned about exotic maladies that, however horrible, we're extremely unlikely to ever encounter.

Which makes perfect sense. Statistically speaking, we have much scarier bugs to worry about.

 
