
Quality Improvement in Healthcare: Showing if process changes actually improve the patient experience


Via Christi Health, the largest provider of healthcare in Kansas, operates a Center for Clinical Excellence that's made up of a team of quality practitioners, all of whom have Lean and Six Sigma training. I recently had the opportunity to talk with the team about the types of projects they're working on.

I learned not only about the areas of patient care where they are targeting improvements, but about one particular project the team completed to determine if process changes put into place in the hospital's emergency department actually improved the patient experience.

I thought it was an interesting case study to share not only with practitioners working in healthcare, but with any quality professional who is looking to prove statistically that process changes are actually making a difference.

Like many healthcare providers, Via Christi sought to make its services as efficient and consistent as possible as more and more patients need to be served by the healthcare system.

The various emergency departments (ED) made several changes, such as adjusting the various reporting processes for tracking patients through the different areas of the ED, making leadership changes, reallocating hospitality staff, and reducing the total number of beds in the ED’s intensive care unit. But they really wanted to see if these changes actually improved the patient experience in the ED and helped the department get closer to its goals for efficiency and consistency.

Analyzing the Data with Minitab

The team had access to the raw patient throughput data from several months before the process changes were made, as well as from the first few months after the changes were made. They analyzed the data with several Minitab graphs, including Individuals and Moving Range (I-MR) charts, histograms, boxplots, and interaction plots. They also calculated descriptive statistics on the raw throughput times, including the mean and standard deviation, to make before-and-after comparisons, and performed an analysis comparing the before and after mean times.

The team used the I-MR chart above to compare the before and after throughput times and the process variation by area. This particular chart showed a slight average increase of around 8 minutes and greater variation after the changes were made to the process.
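For readers who want to see the mechanics behind an individuals chart, here is a minimal Python sketch, using invented throughput times rather than the ED's data, that derives the I and MR chart limits from the average moving range. Minitab performs the equivalent calculation automatically.

```python
# A rough sketch of how I-MR chart limits are computed from the average moving range.
# The throughput times below are hypothetical, not the emergency department's data.
import numpy as np

x = np.array([38, 41, 45, 39, 52, 47, 43, 40, 49, 44], dtype=float)  # times in minutes
moving_range = np.abs(np.diff(x))
mr_bar = moving_range.mean()

center = x.mean()
sigma_est = mr_bar / 1.128           # d2 constant for moving ranges of size 2
ucl_i, lcl_i = center + 3 * sigma_est, center - 3 * sigma_est
ucl_mr = 3.267 * mr_bar              # D4 constant for the moving range chart

print(f"I chart:  CL={center:.1f}, LCL={lcl_i:.1f}, UCL={ucl_i:.1f}")
print(f"MR chart: CL={mr_bar:.1f}, UCL={ucl_mr:.1f}")
```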

To compare the shape and spread of the time data before and after changes were made, the team built histograms in Minitab. The histograms above showed that the majority of patient throughput times (both before and after changes) surpassed the mean target time of 30 minutes. In addition, the histograms showed that the mean increased after the changes were made, again by around 8 minutes for this particular ED area. Histograms were also used to see if a certain day of the week or shift resulted in more variation.

The team also used boxplots to assess and compare the mean throughput times by shift before and after process changes were made. For the particular change charted above, the mean throughput times decreased slightly across all ED shifts.

Other 'Revelations'

What else did the data analysis reveal? After changes were made in the ED, there was only a slight improvement to the mean patient throughput time in two of the four ED areas where process changes were explored. In the two other areas where process changes were made, there was actually an increase in the mean throughput time.

“Although two areas showed slight improvement and two areas showed decline, statistically there was no significant difference between the before and after results by ED area,” says Rob Dreiling, process improvement specialist at Via Christi Health, who assisted with this project. “Our gut was telling us that the process changes had made such great improvements in the mean patient throughput times, but the data said otherwise.”
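The statistical check behind a statement like "no significant difference between the before and after results" is typically a two-sample comparison of means. The sketch below, using made-up before and after throughput times rather than Via Christi's data, shows one way to run that comparison in Python with SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
before = rng.normal(loc=42, scale=9, size=120)    # hypothetical throughput times (minutes)
after = rng.normal(loc=44, scale=11, size=110)    # hypothetical post-change times

for label, x in (("Before", before), ("After", after)):
    print(f"{label}: n={len(x)}, mean={x.mean():.1f}, st dev={x.std(ddof=1):.1f}")

# Welch's two-sample t-test: is the shift in mean throughput time statistically significant?
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```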

The graphs and charts helped the process improvement specialists to show the emergency department team what the data was really saying about the changes that were made. “We were then able to help them look at improvement projects in a different way," Dreiling says. "Instead of jumping in and just trying a change to the process, we taught the team to map the process from beginning to end and then use statistics to start focusing on the right areas, so the projects will make a meaningful difference and we’re getting the most bang for our buck.”

Have you ever been surprised by what the statistics had to say about your process improvement efforts? Tell us in the comments, or share your story with us in a guest post on the Minitab Blog: http://blog.minitab.com/blog/landing-pages/share-your-story-about-minitab/

And to learn more about Via Christi Health's project, check out the full case study - Lean and Six Sigma Efforts at Via Christi Health: Bringing Safe, High-Quality Care


Why Is the Office Coffee So Bad? A Screening Experiment Narrows Down the Critical Factors


NOTE: This story reveals how easy it can be to identify important factors using the statistical method called Design of Experiments. It won't provide easy answers for making your own office's coffee any better, but it will show you how you can begin identifying the critical factors that contribute to its quality.

At their weekly meeting, her team gave Jill an ultimatum: Make the coffee better.

The office coffee was terrible. Drinking it was like playing a game of chicken with your taste buds. Jill’s practice was to let someone else get the first cup of the day; if gagging and/or swearing soon arose from the tester's cubicle, it was a particularly bad day for the coffee.

There were no good days for the office coffee.

Her team was right, Jill knew. Something had to change. But what was the real problem? Changing the wrong thing could, she thought with a shudder, make the coffee even worse.

Listing the Variables

She knew that some variables affecting the office coffee were fixed, and would not be easy to change. For example, the company wouldn’t purchase a new coffeemaker as long as the old one still worked, and they certainly wouldn’t buy gourmet coffee for everybody every day.

So Jill made a list of coffee-making variables that were under the team’s control, and came up with:

  • Coffee brand
  • Days since opening package of beans (0 days to 20 days)
  • Type of water (tap or bottled)
  • Duration of bean-grinding (1 minute or 4 minutes)
  • The amount of ground coffee placed in the filter (1 cup vs. 1.25 cups)
  • The number of cups in each pot (10 or 12)
Designing an Experiment

But how could she determine which of these factors played a significant role in the bitterness of the coffee? She knew she needed to perform an experiment, but she wasn’t quite sure how to do it. So she turned to a trusted source of help with data analysis: The Assistant in Minitab 17. 

The Assistant's guidelines for planning and creating experiments helped her quickly determine that she needed to do a screening experiment:

However, having not done anything like this before, Jill wasn't entirely sure about how to proceed. So she clicked the Plan Screening Experiment box for some helpful guidelines:

She’d already identified the factors she needed to study, so she took some time to define the levels of those factors, then selected “Create Screening Design.” She completed the dialog box like this:

The Experimental Design for Screening the Factors

When she pressed OK, the Assistant created a datasheet containing her experimental design and even gave her the option to print out a data collection sheet.

It also provided a summary report that explained the experimental goal and the experiment’s strengths and limitations.

The experiment required Jill to make 12 pots of coffee with different combinations of factors.  After running the experiment and collecting the data about the bitterness of each experimental batch of coffee, it was time to analyze the results.
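The original post doesn't say exactly which 12-run design the Assistant generated, but a classic choice for screening up to 11 two-level factors in 12 runs is a Plackett-Burman design, which is easy to construct by hand. The sketch below builds one with NumPy; the factor names are just my shorthand for Jill's six variables.

```python
import numpy as np

# Generating row for the 12-run Plackett-Burman design (Plackett & Burman, 1946).
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])

# Rows 1-11 are cyclic shifts of the generator; run 12 sets every factor to its low level.
design = np.array([np.roll(gen, i) for i in range(11)] + [[-1] * 11])

factors = ["Brand", "BeanAge", "Water", "GrindTime", "Scoops", "CupsPerPot"]
screening_design = design[:, :len(factors)]   # keep one coded (-1/+1) column per factor
print(screening_design)
```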

Analyzing the Experimental Data

Jill returned to the Assistant, but this time she selected DOE > Analyze and Interpret... instead of Plan and Create...

She clicked “Fit Screening Model” in the Assistant’s decision tree:

And the Assistant analyzed the response data and produced all the output Jill needed to understand the results. The Report Card confirmed the data met statistical assumptions. The Diagnostic Report verified there were no unusual patterns in the results. The Effects Report illustrated the impacts of each factor.

Finally, the Summary Report made it easy to understand exactly which factors had a significant effect on the bitterness of the office coffee.

With the help of the Assistant, Jill now knew that the type of beans used, the number of cups brewed per pot, and the amount of grinding time the coffee beans received had a significant impact on the bitterness of the office coffee.
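Under the hood, a screening analysis amounts to fitting a main-effects model to the coded factors and seeing which effects stand out. Here is a hedged sketch of that idea with ordinary least squares; the bitterness ratings are invented, and the factor names are my shorthand, not the Assistant's output.

```python
import numpy as np

# Rebuild the 12-run design from the earlier sketch (generator row plus cyclic shifts).
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
design = np.array([np.roll(gen, i) for i in range(11)] + [[-1] * 11])
factors = ["Brand", "BeanAge", "Water", "GrindTime", "Scoops", "CupsPerPot"]

X = np.column_stack([np.ones(12), design[:, :len(factors)]])          # intercept + coded factors
bitterness = np.array([7, 3, 8, 4, 6, 9, 2, 5, 8, 3, 7, 4], float)    # hypothetical ratings

coef, *_ = np.linalg.lstsq(X, bitterness, rcond=None)
print(f"{'Intercept':>10}: {coef[0]:6.2f}")
for name, b in zip(factors, coef[1:]):
    print(f"{name:>10}: main effect = {2 * b:6.2f}")   # effect = 2 x coefficient with -1/+1 coding
```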

But just identifying these key factors wouldn’t resolve the team’s complaints. Now Jill needed to figure out how to use those factors to make the coffee more palatable.  Fortunately, the Assistant could help her with that, too...as we'll see in tomorrow's post. 

Making the Office Coffee Better with a Designed Experiment for Optimization


NOTE: This story will reveal how easy it can be to optimize settings using the statistical method called Design of Experiments, but it won't provide easy answers for making your own office coffee any better.

After her team’s ultimatum about the wretched office coffee, Jill used the design-of-experiments (DOE) tool in Minitab 17’s Assistant to design and analyze a screening study. Jill now knew that three of the factors she screened—the type of beans used, the number of cups brewed per pot, and the amount of grinding time the beans received—had a significant impact on the bitterness of coffee.

Now she needed to use those factors to make the coffee more palatable. 

 

Designing an Experiment to Optimize Key Factors

Once again, she turned to the Assistant’s DOE tool.

This time, she selected “Create Modeling Design” in the Assistant’s decision tree.

She completed the dialog box as shown:

And the Assistant created a Minitab worksheet specifying the run order and variable settings for the experiment, and provided an option to print out sheets for data collection.

The experiment consisted of 16 runs, with each run being a pot of coffee prepared using varying combinations of factors. Jill asked all of the office’s coffee drinkers to sample and rate each brew. She then averaged all of the responses.  

Analyzing the DOE Data for Optimization

Now it was time to analyze the data. After conducting the experiment and entering the data in the worksheet, Jill returned to the Assistant, this time to analyze the results:

The Assistant presented a decision tree that offered various analysis options. But it was very easy for Jill to select and press the “Fit Quadratic Model” button, because the Assistant, which automatically recognized the model structure of the experiment, disabled the buttons for “Fit Linear Model” (which would not be appropriate for this data) and “Add Points for Curvature” (which was unnecessary because the experimental design already included those points).

Before it could fit the model, the Assistant needed to know the goal: hitting a target, maximizing the response, or—as in this case—minimizing the response. Jill selected the appropriate option and clicked OK.

The Assistant then analyzed the data and produced complete output for understanding the results. The Report Card confirmed the data met statistical assumptions, and the Diagnostic Report verified there were no unusual patterns in the results. The Effects Report illustrated the impacts of each factor, and the Summary Report clearly explained the bottom-line impact of the analysis.

Prediction and Optimization Report

But for Jill’s purposes, the most useful item was the Prediction and Optimization report, which detailed the optimal settings identified in the analysis, and also the top five alternative options, along with their predicted responses.
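The Assistant handles the model fitting and optimization internally, but the underlying idea can be sketched in a few lines: fit a quadratic response surface and search it for the settings that minimize the predicted response. Everything below is invented for illustration, including limiting the model to two coded factors.

```python
# A sketch (not the Assistant's algorithm) of response-surface optimization:
# fit a quadratic model and minimize the predicted bitterness within the design region.
import numpy as np
from scipy.optimize import minimize

# Hypothetical coded settings (-1 to +1) for two factors, with invented responses.
grind = np.array([-1, -1, 1, 1, 0, 0, -1, 1, 0, 0, 0, 0], dtype=float)
cups  = np.array([-1, 1, -1, 1, -1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
bitterness = np.array([6.5, 7.2, 5.1, 8.0, 5.8, 7.5, 6.0, 6.3, 5.5, 5.6, 5.4, 5.7])

# Full quadratic model: 1, x1, x2, x1*x2, x1^2, x2^2
X = np.column_stack([np.ones_like(grind), grind, cups, grind * cups, grind**2, cups**2])
beta, *_ = np.linalg.lstsq(X, bitterness, rcond=None)

def predicted(x):
    x1, x2 = x
    return beta @ np.array([1, x1, x2, x1 * x2, x1**2, x2**2])

# Search for the minimum predicted bitterness within the experimental region.
result = minimize(predicted, x0=[0, 0], bounds=[(-1, 1), (-1, 1)])
print("Optimal coded settings:", np.round(result.x, 2),
      "predicted bitterness:", round(float(result.fun), 2))
```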

Based on the experiment’s results, Jill quickly wrote and posted new guidelines for preparing the office coffee. Steps 1 and 2 were:

  1. Grind the whole coffee beans for only 1 minute.
  2. Brew only 10 cups per pot of coffee.

Today, the office coffee may not rival the taste of that brewed by world-famous baristas, but it’s not so bad. And the cubicles no longer echo with sobbing and gagging sounds. 

Both the brew and Jill’s sales team are a lot less bitter, thanks to an experiment designed and analyzed with the Minitab 17 Assistant.

Blind Wine Part I: The Experiment


Already relaxed on his first day in Napa, Brutus and his wife Suzy decide to visit their favorite winery just before lunch to taste their new Cabernet Sauvignon. The owner recognizes them as they walk in the door and immediately seats them on the patio overlooking the vineyard. Two glasses appear, and as the owner tells them about the new Cabernet, Brutus prepares for an onslaught of blackberry and plum flavors, with some notes of vanilla and leather.

Napa Valley

The wine is poured and as Brutus swirls it in his glass and breathes in the aromas, he picks up some of the oak infused into the wine as it aged in barrels in a cellar beneath where he sits. He takes a first sip, and, ignoring the obvious blackberry and plum flavors, he searches for the vanilla and leather. He has little difficulty finding them. As a second sip fills his mouth, he asks the owner if maybe he is detecting some chocolate...the owner pours a glass for himself and after a couple of sips agrees with Brutus.

Brutus, a Minitab employee, is an experienced wine drinker who has no doubt experienced many wine tastings not altogether different from what was described above.

But what if the tasting were a little different? What if...

One Thursday afternoon, at the time listed in his Outlook calendar, Brutus enters a conference room and is asked to sit at the table and put on a blindfold. A sample of wine is poured, and Brutus is asked to tell the experimenters whether the wine is red or white, and which of four types it is. The experimenters don't tell him anything about each wine before it is poured, and offer no information after each tasting. When he is done, he takes his blindfold off and goes back to his desk to continue working.

Without visual cues, and without being "primed" to expect certain flavors, can Brutus, who at the winery could detect even subtle flavor notes in a wine, determine even the most basic information about the wines? It seemed like a question that could be settled by collecting a little data and looking at it with statistical software.

Fellow blogger Daniel Griffith and I performed this exact experiment and this week we will present the results in a series of blog posts:

  • Part I: The Experiment (you're reading it now)
  • Part II: The Survey
  • Part III: The Results
  • Part IV: The Participants

To run the experiment, we first recruited volunteers—all Minitab employees whose names have been changed in the posts—and asked them to complete a survey consisting of four questions:

  1. On a scale of 1 to 10, how would you rate your knowledge of wine?
  2. How much would you typically spend on a bottle of wine in a store?
  3. How many different types of wine (merlot, riesling, cabernet, etc.) would you buy regularly (not as gifts)?
  4. Out of the following 8 wines, which do you think you could correctly identify by taste?
    • Merlot
    • Cabernet Sauvignon
    • Pinot Noir
    • Malbec
    • Chardonnay
    • Pinot Grigio
    • Sauvignon Blanc
    • Riesling

We were looking for a broad range of responses, especially to question #1. Once we had collected enough surveys (13), we scheduled the tastings, with either 2 or 3 participants tasting in any one session. The order of the wines was randomized independently for each participant, but only within each replicate, since we served each participant each of the four wines twice. In other words, a participant was given each of the four wines once before ever getting a second tasting, although this was not explained to participants.
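For anyone who wants to reproduce this kind of serving plan, here is a small Python sketch of the randomization scheme just described: each participant gets the four wines in an independently shuffled order within each replicate, so no wine repeats until all four have been poured once. The labels and seeds are arbitrary.

```python
import random

WINES = ["A", "B", "C", "D"]

def serving_order(n_replicates=2, seed=None):
    """Shuffle the four wines independently within each replicate."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_replicates):
        block = WINES[:]      # every replicate contains all four wines...
        rng.shuffle(block)    # ...in its own random order
        order.extend(block)
    return order

for i, participant in enumerate(["Participant 1", "Participant 2", "Participant 3"]):
    print(participant, serving_order(seed=i))
```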

The four wines selected were meant to satisfy the following criteria:

  • Two red wines and two white wines
  • Common enough types that a regular wine drinker would be familiar with them
  • Different enough types that they should be easily distinguishable if tasted back-to-back

Utilizing several resources including those found here, here, and here, we determined the following types to be good choices for the experiment:

  • Pinot Noir
  • Cabernet Sauvignon
  • Riesling
  • Sauvignon Blanc

"Representative" bottles of each were purchased in the $15-20 price range, and the bottles were labelled "A," "B," "C," and "D," and otherwise masked so even the experimenters would not know which type of wine they were serving each participant (although color would be obvious).

So how do you think our participants did?


Photograph of Napa Valley by Aaron Logan. Used under Creative Commons License 1.0.

Blind Wine Part II: The Survey


In Blind Wine Part I, we introduced our experimental setup, which included some survey questions asked ahead of time of each participant. The four questions asked were:

  1. On a scale of 1 to 10, how would you rate your knowledge of wine?
  2. How much would you typically spend on a bottle of wine in a store?
  3. How many different types of wine (merlot, riesling, cabernet, etc.) would you buy regularly (not as gifts)?
  4. Out of the following 8 wines, which do you think you could correctly identify by taste?
    • Merlot
    • Cabernet Sauvignon
    • Pinot Noir
    • Malbec
    • Chardonnay
    • Pinot Grigio
    • Sauvignon Blanc
    • Riesling

Today, we'd like to take a look at the results of the survey to answer some questions about our participants.

We wanted to make sure that our participants covered a broad range of wine drinkers and, most important, covered a broad range of possible responses to question #1.  Here are the distributions of responses to the first three questions:

Distribution of Responses

We were satisfied with the range and distribution of answers to those questions. Given that we had already selected which wine types to include, we were curious which wines participants believed they could identify by taste. The bar chart below shows how many of the 13 participants indicated they could identify each wine, with the types included in the experiment shown in red*:

Wines You Could Identify

* The default color in Minitab 17 for the second group on any graph with attributes being controlled by groups is not actually called "red" but rather "wine," a clear indication this experiment was meant to be.

For most wines, only about 50% of participants felt they could identify them by taste alone.  At this point they did not know the nature of the wine tasting, but must have been starting to get some ideas...

Next we wanted to see if there were any relationships between responses to different questions. For example, do participants with greater self-identified wine knowledge spend more on wine, or buy more types of wine?

Wine Knowledge versus Money Spent and Types Bought

While there was no evidence that claiming more knowledge is related to how much participants spent per bottle, there was a significant relationship with how many different types they bought regularly (p = 0.001, R-Sq = 66.2%). Perhaps those less knowledgeable stick to a few known types and spend more to make up for a lack of knowledge? And could it be that those with more knowledge buy a large variety, but know how to select without spending too much? We can't say definitively, but the results give some possibilities to consider.
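The p-value and R-squared quoted above come from a simple linear regression. As a hedged illustration, the sketch below runs the same kind of regression in Python on invented survey responses, not our actual data.

```python
import numpy as np
from scipy import stats

knowledge    = np.array([2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9], float)   # self-rated, 1-10
types_bought = np.array([1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6], float)   # types bought regularly

fit = stats.linregress(knowledge, types_bought)
print(f"slope = {fit.slope:.2f}, p = {fit.pvalue:.3f}, R-sq = {fit.rvalue**2:.1%}")
```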

We also wanted to test whether people who claim more wine knowledge (and who, we know from above, buy more types of wine regularly) would also claim to be able to identify more of the listed types:

N Identifiable vs Wine Knowledge

Again we see a significant relationship (p = 0.000, R-Sq = 70.6%), indicating that familiarity with many types of wine is closely linked with self-identified wine knowledge.

So with our survey results in hand, we see that:

  • We have a good distribution of participants
  • Participants who claim more knowledge don't tend to spend more money per bottle
  • Perceived wine knowledge is closely related with familiarity with a broad range of wine types

How well do you think the survey results will align with the experimental results?

Blind Wine Part III: The Results


In Part I and Part II we learned about the experiment and the survey, respectively. Now we turn our attention to the results...

Our first two participants, Danielle and Sheryl, enter the conference room and are given blindfolds as we explain how the experiment will proceed.  As we administer the tasting, the colors of the wine are obvious but we don't know the true types, which have been masked as "A," "B," "C," and "D." 

As Danielle and Sheryl proceed through each tasting, it is easy to note that they start off correctly identifying the color of each wine; it is also obvious that tasting methods differ greatly from person to person as Danielle tends to take one or two sips whereas Sheryl is drinking the entire sample.

As Sheryl has her fourth sample, we realize that her fifth sample will be the exact same wine. She makes her guess of "Pinot Noir, Red" for the fourth sample and is given the fifth, which she almost immediately takes a sip of. But the immediate response we were expecting doesn't come. Then a second sip. Still no guess. Then more. As she finishes the sample, she finally reports that it is "Cabernet Sauvignon, Red." All fears we had that perhaps the experiment would be too easy are quickly dissipated, as we have just witnessed someone make different guesses for the same wine given back-to-back. Sheryl's self-reported knowledge level is a 6.

Before jumping into how often tasters got color and type correct, it's worth looking at how often type and color were mismatched by each taster—in other words, if a taster guessed "Red, Riesling" they mismatched the two because Riesling is a white wine:

Mismatched by Taster

For the most part, mismatches weren't an issue for our tasters. However, Viviana in particular struggled, and a look at her results shows that she believed Pinot Noir to be a white wine and Sauvignon Blanc to be red. At one point during the experiment Viviana, a fluent Spanish speaker, even questions out loud if "blanc" from "Sauvignon Blanc" might be related to "blanco" (Spanish for "white"), but decides against it.

Our first look at how well participants did will use Minitab's Attribute Agreement Analysis, first for color only:

AAA Color

One thing to note about Attribute Agreement Analysis is that without testing huge numbers of unique parts, it is very difficult to distinguish between operators in any statistically significant manner. So most conclusions we draw are based on the system itself, grouping operators together into one big system. Here we see that our participants did generally well within themselves (how consistently they answered, whether right or wrong) as well as against the known standard (how consistently they answered correctly).  In only two cases—Jeremiah and Santos (who had the lowest wine knowledge scores)—did the participant show higher consistency within than against the standard.

Next we'll look at the same graph, but by type instead of color:

AAA type

Here we see much less consistency "within," meaning our tasters had a harder time consistently guessing the same type when they had been given the same wine a second time. Further, once we compare to the standard, there was little consistency, with nearly half of the participants failing to get both samples of the same wine correct even once!  

One extreme example is Bobby, who answered consistently 100% of the time but got none of the wines correct. From a practical standpoint, Bobby's performance is usually fairly easy to correct as he already possesses the ability to distinguish parts but is putting them in the wrong "buckets."

One drawback to traditional Attribute Agreement Analysis is that incorrectly appraising a part on even one replicate is treated as though that entire part was appraised incorrectly. In other words, suppose you got 7 out of the 8 tastings correct: despite being right 87.5% of the time, you are scored as being right 75% of the time (3 out of 4 wines correct). So another way to look at our results would be to see how often they were correct on both color and type out of the eight samples:

All Correct by Taster

Here we more clearly see that Brutus and Katherine turned in the best performances, with 6 out of 8 tastings correctly identified. Bobby, as discussed above, missed all of them but showed great consistency (as did Danielle).
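To make the two scoring rules concrete, here is a small Python sketch using one hypothetical taster's eight guesses (the 7-of-8 example from above): per-sample accuracy credits every correct tasting, while the attribute-agreement style score only credits a wine when both replicates are correct.

```python
truth   = ["A", "B", "C", "D", "A", "B", "C", "D"]   # two replicates of wines A-D
guesses = ["A", "B", "C", "D", "A", "B", "D", "D"]   # hypothetical: 7 of 8 tastings correct

per_sample = sum(g == t for g, t in zip(guesses, truth)) / len(truth)

wines = sorted(set(truth))
both_replicates = sum(
    all(g == t for g, t in zip(guesses, truth) if t == wine) for wine in wines
) / len(wines)

print(f"Per-sample accuracy:           {per_sample:.1%}")        # 87.5%
print(f"Both-replicates-correct score: {both_replicates:.1%}")   # 75.0%
```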

Aside from looking at how each appraiser did, we can evaluate our results in a couple of other ways. One is to look at whether certain wine types were identified correctly more or less often than others:

Correct by Wine Type

There is little if any evidence that certain types were easier or more difficult to identify, with a Chi-Square test resulting in a non-significant p-value of 0.911.

Similarly, it would seem reasonable that as a participant had tasted multiple samples of wine, they would become less able to discern differences and performance would degrade. So we also looked at whether the number correct decreased as participants progressed through the eight samples:

Correct by Tasting

Again there is no visual or statistical evidence of such an effect. It could be that with more tastings the effect would have shown up, or it could be that the effect is countered by participants' memory of what they had answered previously and what to still expect in the remaining samples.

To summarize:

  • The regular wine drinkers who participated showed only limited ability to identify which type of wine they were drinking when blindfolded.
  • Color was considerably easier to identify than type.
  • Multiple participants showed a much greater ability to distinguish wines than to correctly identify them.

Keep in mind that our participants represented a diverse group of wine drinkers, so stay tuned for Part IV, when we evaluate whether our survey results from Part II correlate in any meaningful way to these experimental results...

Blind Wine Part IV: The Participants


In Part I, Part II, and Part III we shared our experiment, the survey results, and the experimental results. To wrap things up, we're going to see if the survey results tied to the experimental results in any meaningful way...

First, we look at whether self-identified knowledge correlated to the total number of correct appraisals:

Correct vs Knowledge

We have no evidence of a relationship (p = 0.795).  So we'll look at the number correct by how much each participant usually spends:

Correct vs Spend

Again, no evidence of a relationship (p = 0.559).  

How about how many types each regularly buys?

Correct vs Types

There appears to be something here, but statistically we don't have evidence (p = 0.151).  Perhaps a larger experiment might uncover something.

Remember Question #4 in our survey, which asked if participants felt they could identify certain wines by taste? Eight wines were included as choices, including the four wines used in the experiment. So did participants' responses to that question correlate to their ability? To test, I did a Chi-Square test in Minitab.

Here are the expected number of correct guesses for each wine type along with the observed number of correct guesses:

Observed and Expected by Wine Type

At a strict alpha level of 0.05, the result is not statistically significant, but given the small experiment, the corresponding p-value of 0.069 would probably give me reason to investigate further. The largest contributor to the chi-square statistic was Riesling, which few participants felt they could identify, yet as many were correct on it as on any other wine. It could be that participants underrated their ability, or it could be process of elimination (if you know the other three, then the wine you can't identify in the experiment must be the Riesling).
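For the curious, this kind of observed-versus-expected comparison can be reproduced with a chi-square goodness-of-fit test. The counts below are invented stand-ins, with the expected counts scaled from how many participants claimed they could identify each wine, so the sketch only mirrors the idea of the test, not our actual numbers.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: how many participants claimed they could identify each wine,
# and how many correct identifications each wine actually received.
claimed  = np.array([8.0, 7.0, 6.0, 3.0])    # Pinot Noir, Cab Sauv, Sauv Blanc, Riesling
observed = np.array([9.0, 8.0, 7.0, 8.0])

expected = claimed / claimed.sum() * observed.sum()   # scale so the totals match
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```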

In the end, we found little evidence of any relationships between the survey questions and the experimental results. While a larger study could likely draw some conclusions, we've learned enough to say that any real, underlying relationships are not particularly strong and would have considerable variation around them.

The next time you plan on trying a new wine, try tasting the wine without being told ahead of time what type it is or even looking at it (sleep masks make great blindfolds)...you might find it to be a completely different experience!

Investigating Starfighters with Bar Charts: Function of a Variable


Last time, we went over Bar Charts you could create from Counts of Unique Values. However, sometimes you want to convey more information than just simple counts. For example, you could have a number of parts from different models. The counts themselves don't offer much value, so you may want a chart displaying the means, sums, or even standard deviations for the different parts. It's this case that we're after when we go to Graph > Bar Charts > Function of a Variable.

To illustrate this, I have a small data set investigating starfighters built in the Star Wars galaxy:

We’re interested in two variables: the size of the ship, and how much cargo the ship can hold. We have 6 different models of starfighters built by 3 different corporations. We can use bar charts to compare how these different groups of fighters are being produced.

Bar Charts with a Function of a Variable

Let’s say we’re interested in which manufacturer builds the largest ships. We have a few different models, and each individual ship will have a slightly different length, so we are interested in the mean length for each. Let’s start at Graph > Bar Chart, but this time let’s change our drop-down to function of a variable. Choose Simple with One Y, and enter size as the graph variable and manufacturer as the categorical variable. Make sure that the function is listed as Mean, as we want to know who, on average, builds the largest starfighters. Minitab gives us the following graph:

We can see that Koensayr Manufacturing, known for K-Wing and Y-Wing starfighters, comes out on top with the largest mean size. Sienar Fleet, on the other hand, produces smaller ships; they are mainly known for producing the smaller TIE Fighters.

The key takeaway from this graph is that if you have a number of observations for each category, Minitab Statistical Software will perform the chosen function on them (here, the mean) to get what is actually plotted.
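Outside of Minitab, the same "function of a variable" idea is a group-and-aggregate step followed by a bar chart. The sketch below uses pandas and matplotlib with placeholder ship lengths; the numbers are illustrative only.

```python
import pandas as pd
import matplotlib.pyplot as plt

ships = pd.DataFrame({
    "Company": ["Koensayr", "Koensayr", "Incom", "Incom", "Sienar", "Sienar"],
    "Model":   ["Y-Wing", "K-Wing", "X-Wing", "Z-95", "TIE/LN", "TIE/IN"],
    "Size":    [16.0, 15.1, 12.5, 11.8, 6.4, 6.7],   # placeholder lengths in meters
})

# Equivalent of a bar chart of the *mean* of Size for each Company.
ships.groupby("Company")["Size"].mean().plot(kind="bar", ylabel="Mean size (m)")
plt.tight_layout()
plt.show()
```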

Bar Charts with Two Categorical Variables

We can also use more than one categorical variable and either cluster them or stack them. In my last post, I discussed innermost and outermost grouping variables, and we can use that same logic here. If you have two categorical variables, you can just enter them in your dialog consecutively.

If we are interested in not just the size by company, but each model as well, we will enter Company first, followed by Model. We are still interested in the mean, so we will leave that as is.

There is one more important step, and it's something to keep an eye on: because each corporation produces different ships, the chart includes an empty space for the nonexistent Incom/TIE/LN combination, for example, as well as numerous others. We want to clean up that space, and we can do that by clicking the Data Options button, sliding over to the Group Options tab, and unchecking ‘Include empty cells.’ This gives us the following graph:

Using the Stacked Bar Chart

We have one option left, and that is the stacked chart. In addition to size, we are interested in how much cargo each fighter can hold. We want to see how much total cargo each company holds in their fleet. To do this, we can "stack" the cargo capacity for each ship and see who comes out on top. To do that, we can fill out our dialog as follows:

which gives us the following, final graph:

As you are beginning to see, the Bar Chart is a very simple yet very powerful tool that allows Minitab to accommodate numerous data setups and looks. We've already created six completely different charts, and there are still many more options we haven't explored. I'll continue going over these in a future post...

 


Cuckoo for Quality: A Birdseye View of a Classic ANOVA Example


If you teach statistics or quality statistics, you’re probably already familiar with the cuckoo egg data set.

The common cuckoo has decided that raising baby chicks is a stressful, thankless job. It has better things to do than fill the screeching, gaping maws of cuckoo chicks, day in and day out.

So the mother cuckoo lays her eggs in the nests of other bird species. If the cuckoo egg is similar enough to the eggs of the host bird, in size and color pattern, the host bird may be tricked into incubating the egg and raising the hatchling. (The cuckoo can then fly off to the French Riviera, or punch in to work at a nearby cuckoo clock, or do whatever it is that cuckoos with excess free time do.)

The cuckoo egg data set contains measurements of the lengths of cuckoo eggs that were collected from the nests of 6 different bird species. Using Analysis of Variance (ANOVA), students look for statistical evidence that the mean length of the cuckoo eggs differs depending on the host species. Presumably, that supports the idea that the cuckoo may adapt the length of its eggs to better match those of the host.

Old Data Never Dies...It Just Develops a Rich Patina

Sample data sets have a way of sticking around for a while. The cuckoo egg data predate the production of the Model T Ford! (Apparently no one has measured a cuckoo egg in over 100 years. Either that or cuckoo researchers are jealously guarding their cuckoo egg data in the hopes of becoming eternally famous in the annals of cuckoology.)

Originally, the data was published in a 1902 article in Biometrika by O. M. Latter. L. H. C. Tippett, an early pioneer in statistical quality control, included the data set in his classic text, The Methods of Statistics, a few decades later.

That's somewhat fitting. Because if you think about it, the cuckoo bird really faces the ultimate quality assurance problem. If its egg is recognized as being different (“defective”) by the host bird, it may be destroyed before it’s hatched. And the end result could be no more cuckoos.

Analyze the Cuckoo Egg Data in Statistical Software

Displaying boxplots and performing ANOVA is the classic 1-2 punch that’s often used to statistically compare groups of data. And that’s how this vintage data set is typically evaluated.

To try this in Minitab Statistical Software, click to download the data set. (You'll need Release 17, which you can download free for a 30-day trial period.) Then follow the instructions below.

Display Side-by-Side Boxplots
  1. In Minitab, choose Graph > Boxplots. Under One Y, choose With Groups, then click OK.
  2. Fill out the dialog box as shown below, then click OK.
     

Minitab displays the boxplots

The boxplots suggest that the mean length of the cuckoo eggs may differ slightly among the host species. But are any of the differences statistically significant? The next step is to perform ANOVA to find out.

Perform One-Way ANOVA
  1. In Minitab, choose Stat > ANOVA > One-Way.
  2. Complete the dialog box as shown below.
  3. Click Comparisons, check Tukey, then click OK in each dialog box.

The ANOVA output includes the following results

Analysis of Variance

Source    DF   Adj SS  Adj MS  F-Value  P-Value
Nest       5    42.94  8.5879    10.39    0.000
Error    114    94.25  0.8267
Total    119   137.19

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

Nest         N    Mean  Grouping
HDGE SPRW   14  23.121  A
TREE PIPIT  15  23.090  A
PIED WTAIL  15  22.903  A B
ROBIN       16  22.575  A B
MDW PIPIT   45  22.299    B
WREN        15  21.130      C

Means that do not share a letter are significantly different.

----------------------------------------------------

The interval plot displays the mean and 95% confidence interval for each group. In the ANOVA table, the p-value is less than the alpha level of 0.05, so you reject the null hypothesis that the means do not differ. The mean egg length is statistically different for at least one group.

Based on Tukey's multiple comparisons procedure, we can see where the differences lie. The mean length of the cuckoo eggs in the wren nests is significantly smaller than in all the other nests, and the mean length of the eggs in the meadow pipit nests is significantly smaller than in the hedge sparrow and tree pipit nests.
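If you'd rather script the same two-step analysis, here is a minimal sketch with SciPy and statsmodels. The handful of egg lengths are invented and cover only three of the host species, so it illustrates the method rather than reproducing the results above.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented cuckoo egg lengths (mm) from three host nests.
wren  = np.array([21.0, 21.2, 20.9, 21.4, 21.1])
robin = np.array([22.4, 22.7, 22.5, 22.8, 22.6])
pipit = np.array([22.2, 22.4, 22.1, 22.5, 22.3])

f_stat, p_value = stats.f_oneway(wren, robin, pipit)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

lengths = np.concatenate([wren, robin, pipit])
nests = ["wren"] * 5 + ["robin"] * 5 + ["meadow pipit"] * 5
print(pairwise_tukeyhsd(endog=lengths, groups=nests, alpha=0.05))
```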

With that said, the case of the morphing cuckoo eggs is frequently considered closed. The ANOVA results are said to support the theory that the cuckoo adapts egg length to the host nest.

Bottom line: If you're a mother cuckoo, stay away from ostrich nests.

Post-Analysis Pondering

As alluring and sexy as a p-value is to the data-driven mind, it has its dangers. If you're not careful, it can act like a giant door that slams shut on your mind. Its air of finality may prevent you from looking more closely—or more practically—at your results.

Case in point: Most of us know that a wren is smaller than a robin. But what about the other bird species?

Personally, I wouldn’t recognize a pied wagtail or a tree pipit if it dropped a load of statistically significant doo-doo on my shiny bald head.

How big is each bird species—or more to the point, how long, on average, are its eggs? If two species have about the same size egg, then the lack of a significant difference in the ANOVA results would actually support the theory that the cuckoo may adapt its egg length to match the host. Without knowing whether the egg lengths of these bird species differ to begin with, and if so, how, it's really difficult to determine whether the ANOVA results support or contradict the idea of egg-length adaptation by the cuckoo.

Apart from that, there's the issue of practical consequence. Upon closer examination of the confidence intervals, it appears that the actual mean difference itself could be fractions of a millimeter. Does that size difference really matter if you're a host bird? Would it make a difference between the eggs being accepted or rejected?

Finally, there's the proverbial elephant in the room whenever you perform a statistical analysis. The one that trumpets noisily in the back of an asymptotically conscientious mind: "Assssssumptions!! Asssssumptions!"

How well do the cuckoo egg data satisfy the critical assumptions for ANOVA?

Stay tuned for the next post.

A Fun ANOVA: Does Milk Affect the Fluffiness of Pancakes?


by Iván Alfonso, guest blogger

I'm a huge fan of hot cakes—they are my favorite dessert ever. I’ve been cooking them for over 15 years, and over that time I’ve noticed many variations in texture, flavor, and thickness. Personally, I like fluffy pancakes.

There are many brands of hotcake mix on the market, all with very similar formulations. So I decided to investigate which ingredients and inputs may influence the fluffiness of my pancakes.

Potential factors could include the type of mix used, the type of milk used, the use of margarine or butter (of many brands), the amount of mixing time, the origin of the eggs, and the skill of the person who prepares the pancakes.

Instead of looking at all of these factors, I focused on the type of milk used in the pancakes. I had four types of milk available: whole milk, light, low fat, and low protein.

My goal was to determine if these different milk formulations influence fluffiness (thickness). Is whole milk the best for fluffy hotcakes? Does skim milk work the same way as whole milk? Can I be sure that using light milk will result in hotcakes that are less fluffy?

Gathering Data

I sorted the four formulations as shown in the diagram below:

I used the same amounts of milk, flour (one brand), salt, and margarine for each batch of hotcakes I cooked.

The response variable was the thickness of the cooked pancakes. I prepared 6 pancakes for each type of milk, which gave me a total of 24 pancakes. I randomized the cooking order to minimize bias. I also prepared each batch by myself—if my sister or mother had helped with some lots, it would be a potential source of variation.

To measure the fluffiness, I inserted a stick into the center of each hotcake down to the bottom, marked the stick with a pencil, then measured the distance to the mark in millimeters with a ruler.

After a couple of hours of cooking hotcakes, making measurements, and recording the data on a worksheet, I started to analyze my data with Minitab.

Analysis of Variance (ANOVA)

My goal was to assess the variation in thickness or fluffiness between different batches of hot cakes, so the most appropriate statistical technique was analysis of variance, or ANOVA. With this analysis I could visualize and compare the formulations based on my response variable, the thickness in millimeters, and see if there were statistically significant differences between them. I used a 0.05 significance value.

As soon as I had my data in a Minitab worksheet, I started to check it against the assumptions of ANOVA. First, I needed to see if the data followed a normal distribution, so I went straight to Stat > Basic Statistics > Normality Test. Minitab produced the following graph:

Graph of probability of thickness

My data passed both the Kolmogorov-Smirnov and Anderson-Darling normality tests. This was a relief—since my data had a normal distribution, I didn’t need to worry about ANOVA’s assumption of normality.

Traditional ANOVA also has an assumption of equal variances; however, I knew that even if my data didn’t meet this assumption, I could proceed using the method called Welch’s ANOVA, which accommodates unequal variances. But when I ran Bartlett’s test for equal variances, and even the more stringent Levene test, my data passed. 
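For readers following along outside Minitab, the same assumption checks are available in SciPy: an Anderson-Darling test for normality plus Bartlett's and Levene's tests for equal variances. The thickness values below are invented placeholders, not my measurements.

```python
import numpy as np
from scipy import stats

whole   = np.array([23.1, 24.0, 22.8, 23.5, 24.2, 23.7])
light   = np.array([21.0, 21.8, 20.9, 21.5, 21.2, 21.6])
low_fat = np.array([21.3, 21.1, 21.9, 21.4, 21.7, 21.0])
low_pro = np.array([19.8, 20.1, 19.5, 20.4, 19.9, 20.2])

# Normality check on the raw thickness data (mirroring the check described in the post).
ad = stats.anderson(np.concatenate([whole, light, low_fat, low_pro]), dist="norm")
print("Anderson-Darling statistic:", round(ad.statistic, 3))

# Equal-variance checks across the four milk formulations.
print("Bartlett:", stats.bartlett(whole, light, low_fat, low_pro))
print("Levene:  ", stats.levene(whole, light, low_fat, low_pro))
```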

With confirmation that my data met the assumptions, I proceeded to perform the ANOVA and create box-and-whisker graphs.

ANOVA Results

Here's the Minitab output for the ANOVA:

one-way anova output

The ANOVA revealed that there were indeed statistically significant differences (p = 0.009) among my four batches of hotcakes.

Minitab’s output also included grouping information using Tukey’s method of multiple comparisons for 95% confidence intervals:

Tukey Method

The Tukey analysis shows that the low fat milk and light milk batches do not show a significant difference in fluffiness. However, the batches made with whole milk and low protein milk did significantly differ from each other.

The box-and-whisker diagram makes the results of the analysis easier to visualize:

Boxplot of thickness

It is clear from the graph that hotcakes produced with whole milk had the most fluffiness, and those made with low protein milk had the least fluffiness. There was not a big difference between the fluffiness of hotcakes made with light milk and lowfat milk.

Which Milk Should You Use for Fluffy Pancakes?

Based on this analysis, I recommend using whole milk for fluffier hotcakes. If you want to avoid fats and sugars in milk, low fat milk is a good choice.

I always use lowfat milk, but the analysis indicates that light milk offers a good alternative for people following a strict no-fat diet.

It’s important to note that for this analysis, I only compared formulations that used the same brand of pancake mix and the same amounts of salt and butter. But there are other factors to consider! My next pancake experiment will use design of experiments (DOE) to compare milk types, different brands of flour, and margarine with and without salt, to see how all of these factors together affect the fluffiness of pancakes.

 

About the Guest Blogger:

Iván Alfonso is a biochemical engineer and statistics professor at the Autonomous University of Campeche, Mexico. Alfonso holds a master's degree in marine chemistry and has worked extensively in data analysis and design of experiments in basic and advanced sciences like chemistry and epidemiology.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

 


Taking the Training Wheels Off: Rethinking How Lean Six Sigma is Taught


Learning to ride a bike is a rite of passage for any kid, so much so that we even use the expression "taking the training wheels off" for all kinds of situations. We say it to mean that we are going to let someone perform an activity on their own after removing some safeguard, even though we know they will likely experience failures before becoming proficient at it.

You see, riding a bike requires one to learn three skills—how to pedal the bike, how to brake, and how to balance on the bike—and training wheels allow the child to master two of those three without the dangers associated with failing at the third.  

But at some point the training wheels must come off and the child must learn to balance on their own, and inevitably this ends with some scraped knees and tears and, in some cases, a general desire to never touch a bike again. We hope the child eventually learns and discovers what a great joy riding a bike is, but this post isn't about the end of the process; it's about how one gets there.

A few years ago, someone rethought this frustrating process and started selling "balance bikes," a two-wheeled bike with no pedals and a seat that sits low enough to the ground that a child can put their feet down without getting off the seat. In fact, they HAVE to put their feet down to propel themselves forward and to brake!  Here is an example:

Kazam Balance Bike

That's the one our son used, starting when he was two years old.

Now, here's what is so effective about a balance bike: while it still separates learning to balance from learning to pedal and brake, it requires the child to first learn the hard part—balancing—while still providing a safety net (the ability to put feet down to avoid tipping over).  When the child is old enough for a pedal bike, you can leave the training wheels in the box...the child already mastered balancing while riding on two wheels, so learning to pedal and brake is simple and knees remain unscathed.

In my role at Minitab, I work with numerous organizations that train belts in Lean Six Sigma. There are two primary models that are virtually universal:

  1. A student either selects to become a Green Belt or Black Belt, and in the respective class they learn to execute DMAIC projects by going in order of phase (first they learn Define, then Measure, etc.) and learning the tools appropriate for each phase as they go.
     
  2. All students start in a Green Belt course (that typically lasts two weeks), which teaches DMAIC in the manner outlined above, and then those seeking to become Black Belts continue for an additional two weeks to learn more advanced tools that they can use.

In my experience, understanding statistics is the most challenging part of becoming a belt. So what if we let the balance bike inspire us and rethought how we teach belts?  What if belts spent some time first learning some fundamental statistical concepts and tools and became comfortable applying those before we taught them the rest?

I think we might avoid many skinned knees and ensure everyone gets to experience the joy of riding alone...

How Could You Benefit from Between / Within Control Charts?


Choosing the right type of subgroup in a control chart is crucial. In a rational subgroup, the variability within a subgroup should reflect only common-cause, random, short-term variation, the “normal,” “typical,” natural behavior of the process, whereas differences between subgroups are used to detect drifts in the process over time (due to “special” or “assignable” causes). The within-subgroup variation is therefore used to estimate the natural process standard deviation and to calculate the 3-sigma control chart limits.

In some cases, however, identifying the correct rational subgroup is not easy: for example, when parts are manufactured in batches, as they are in the automotive and semiconductor industries.

Batches of parts might seem to represent ideal subgroups, or at least a self-evident way to organize subgroups, for Statistical Process Control (SPC) monitoring. However, this is not always the right approach. When batches aren't a good choice for rational subgroups, control chart limits may become too narrow or too wide.

Control Limits May Be Too Narrow

Since batches are often manufactured at the same time on the same equipment, the variability within batches is often much smaller than the overall variability. In this case, the within-subgroup variability is not representative of the process and underestimates the natural process variability. Since the within-subgroup variability is used to calculate the control chart limits, these limits may become unrealistically close to one another, which ultimately generates a large number of false alarms.

Too Narrow

Control Limits May Be Too Wide

On the other hand, suppose that within batches a systematic difference exists between the first two parts and the rest of the batch. In this case, the within-batch variability will include this systematic difference, which will inflate the within-subgroup standard deviation. Note that the between-subgroup variability is not affected by this systematic difference, and remember that only the within-subgroup variance is used to estimate the SPC limits. In this situation, the distance between the control limits becomes too wide and does not allow you to identify drifts quickly.

Too wide

For example, in an injection mold with several cavities, when groups of parts molded at the same time but in different cavities are used as subgroups, systematic differences between cavities on the same mold will necessarily impact and inflate the within-subgroup variability.

I-MR-R/S Between/Within charts

When we encounter these situations in practice, using SPC charts can become more difficult and less efficient. An obvious solution is to consider within- and between-subgroup sources of variability separately. In Minitab, if you go to Stat > Control Charts > Variables Charts for Subgroups..., you will find I-MR-R/S Between/Within Charts to cover these types of issues.

between Within

Between/within charts are commonly used in the semiconductor industry, for example. Wafers are manufactured in batches (usually 25 wafers in a batch), and these batches are treated as subgroups in practice.

An I-MR-R/S (between/within) chart uses the I-MR portion to monitor differences between subgroups and the R/S portion to monitor variation within subgroups. Thus, this chart provides a full and coherent picture of the overall process variability, and identifying the right rational subgrouping scheme is not as crucial as it is when using standard Xbar-R or Xbar-S control charts.
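As a rough numeric sketch of the between/within idea, the Python snippet below charts invented batch data two ways: the batch means get individuals-chart limits based on the moving range between batches, and the within-batch ranges get their own R-chart limit. It illustrates the logic, not Minitab's exact implementation.

```python
import numpy as np

# Invented measurements: 5 batches of 4 parts each.
batches = np.array([
    [10.1, 10.3, 10.2, 10.4],
    [10.6, 10.8, 10.7, 10.5],
    [10.0, 10.2, 10.1, 10.3],
    [10.9, 11.0, 10.8, 11.1],
    [10.4, 10.5, 10.3, 10.6],
])

# Between-batch part: treat batch means as individuals (I-MR logic).
means = batches.mean(axis=1)
mr_bar = np.abs(np.diff(means)).mean()
i_center = means.mean()
i_lcl, i_ucl = i_center - 2.66 * mr_bar, i_center + 2.66 * mr_bar   # 2.66 = 3/d2, d2 = 1.128

# Within-batch part: a standard R chart on each batch's range.
r_bar = np.ptp(batches, axis=1).mean()
r_ucl = 2.282 * r_bar                                               # D4 for subgroup size 4

print(f"I chart (batch means): CL={i_center:.3f}, LCL={i_lcl:.3f}, UCL={i_ucl:.3f}")
print(f"R chart (within batch): CL={r_bar:.3f}, UCL={r_ucl:.3f}")
```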

Conclusion

We've all encountered ideas that seem simple in theory, but reality is often more complex than we expect. I-MR-R/S Between/Within control charts are a flexible and efficient tool that makes it much easier to account for such complexities in process variability, because they let you monitor within- and between-subgroup sources of variability separately.

If selecting the right rational subgroups is a challenge when you use control charts, this approach can minimize the number of false alarms you experience, while permitting you to react as quickly as possible to true “special” causes.

“You’ve got a friend” in Minitab Support


I caught the end of Toy Story over the weekend, which is definitely one of my all-time favorite children’s movies. Now—unfortunately or fortunately—I can’t get Randy Newman's theme song, “You’ve Got a Friend in Me,” out of my head!

It's also got me thinking about the nature of friendship, and how "best friends forever" are supposed to always be there when you need them. And, not to get too maudlin about it, but just like Woody and Buzz eventually realize their friendship, all of us hope the professionals who use our software also realize that “you’ve got a friend” in Minitab.

Now what do I mean by all this “BFF” business? I’m talking about our free technical support services (online and by telephone), as well as the plethora of free documentation that’s available online for each of our products. We’re here for you!

Be sure to visit the Support section of our website to browse the individual support sections that are available for each of our product offerings. From there, you can access the latest software downloads, documentation, and tutorials, and find the answers to all of your questions about software use, statistics, and quality improvement. In fact, there's a lot of great information there even if you're not using our software yet!

And for our latest and greatest release, Minitab 17 Statistical Software, we’ve expanded our online support offerings. Be sure to check out the following:

1. Getting Started with Minitab 17

Getting Started is our user guide that introduces you to some of the most commonly used features and tasks in Minitab—including how to explore your data with graphs, conduct statistical analyses and interpret the results, assess quality using control charts and capability analysis, and design an experiment.

The guide also includes shortcuts and tips for customizing Minitab.

2. Topic Library

The Minitab 17 Topic Library is a compilation of content from Help, StatGuide™, and Glossary—all of which are also available within the software itself. The library is arranged by statistical area so that you can easily find relevant topics, such as Basic Statistics and Graphs, Quality Tools, and Modeling Statistics (ANOVA, regression, DOE, etc.).

3. Data Sets

We took the best data sets from Minitab 17 Help and made them accessible online. We also made them even more realistic, so you can practice performing analyses and interpretation, explore alternate data layouts, and investigate statistical tools commonly used in your industry.

4. Macros Library

Our Macros Library includes many macros that allow you to automate, customize and repeat an analysis of your choice. You can download the .mac file for each macro we offer.

5. Technical Papers

Access technical papers that describe the research conducted to develop the methods and data checks used in the Assistant, as well as the methodology and supporting research underlying two new analyses in Minitab 17.

6. Installation and Licensing FAQs

Browse our troubleshooting solutions to the most common error messages, installation issues, and activation/licensing topics.

The Personal Touch

If you've checked the website and still need help, know that we’re here whenever you need us (a real, live person I might add!). Access unlimited phone or online support from experts in statistics, quality improvement, and computer systems by visiting http://www.minitab.com/support/.

You really do have a friend in Minitab Support!

How Accurate are Fantasy Football Rankings? Part II


Previously, we looked at how accurate fantasy football rankings were for quarterbacks and tight ends. We found out that rankings for quarterbacks were quite accurate, with most of the top-ranked quarterbacks in the preseason finishing in the top 5 at the end of the season. Tight end rankings had more variation, with 36% of the top 5 preseason tight ends (over the last 5 years) actually finishing outside the top 10!

Now it’s time to move our attention to the running backs and wide receivers. Just like before, I went back through the previous 5 seasons and found ESPN’s preseason rankings. For each season I recorded where the top preseason players finished at the end of the season, and also where the top players at the end of the season were ranked before the season started.

With quarterbacks and tight ends, I only looked at the top 5 players. But since more running backs and receivers are drafted, I’ll look at the top 10 players. Now let's analyze the data using Minitab Statistical Software.  

How did the top-ranked preseason RBs and WRs finish the season ranked?

Let’s start by looking at how the top-rated preseason players fared at the end of the season. I took the top 10 ranked preseason RBs and WRs for each season from 2009-2013 and recorded where they ranked to finish the season. 


At first glance, the individual plots show that the spread for running backs and wide receivers appears to be about the same. But the descriptive statistics tell a different story. The 3rd quartile value (Q3) is the most telling: 75% of preseason top 10 running backs finish in the top 18, while that number rises all the way to 28.75 for wide receivers! In fact, 32% of wide receivers ranked in the top 10 in the preseason finished the season outside the top 20, while the same was true for only 24% of running backs. Running backs do have the biggest outlier (Ryan Grant had a season-ending injury in his first game of 2010 and finished as the 126th-ranked running back), but injuries like that are random and impossible to predict. Overall, preseason ranks for running backs are more accurate than for wide receivers.
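
If you'd like to reproduce this kind of summary yourself, the quartiles and the "percent outside the top 20" are quick to compute. The ranks below are made up for illustration (they are not the ESPN data used here), and np.percentile's interpolation can differ slightly from Minitab's quartile method:

import numpy as np

# Hypothetical season-ending ranks for 15 players who were preseason top 10
finish_ranks = np.array([1, 3, 4, 6, 8, 9, 11, 12, 14, 18, 19, 22, 25, 31, 40])

q1, median, q3 = np.percentile(finish_ranks, [25, 50, 75])
pct_outside_top20 = 100 * np.mean(finish_ranks > 20)

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
print(f"{pct_outside_top20:.0f}% finished outside the top 20")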

How were the top-scoring RBs and WRs ranked in the preseason?

Let’s shift our focus to later in the draft. How often can you draft a lower-ranked running back or wide receiver and still have them finish in the top 10?


Wide receivers have had more players come out of nowhere to be top 10 scorers at the end of the season (Victor Cruz in 2011 and Brandon Lloyd and Stevie Johnson in 2010 were all ranked 87th or worse, yet finished in the top 10). But the descriptive statistics indicate a pretty even distribution otherwise. About half of the top 10 scoring RBs and WRs were not ranked in the top 10 to begin the season. And 25% of players were ranked outside the top 25, yet were still able to finish in the top 10. For both positions, there are frequently lower ranked players that exceed expectations and finish in the top 10.

But if you want one of the best players, say top 3...can you afford to wait or do you need to select a top ranked player early? The following table shows the 3 highest scoring players for each year, with their preseason rank in parentheses.

Year | Top Scoring RB | 2nd Highest Scoring RB | 3rd Highest Scoring RB | Top Scoring WR | 2nd Highest Scoring WR | 3rd Highest Scoring WR
2013 | Jamaal Charles (6) | LeSean McCoy (10) | Matt Forte (12) | Calvin Johnson (1) | Josh Gordon (42) | Demaryius Thomas (6)
2012 | Adrian Peterson (10) | Arian Foster (1) | Doug Martin (27) | Calvin Johnson (1) | Brandon Marshall (12) | Dez Bryant (15)
2011 | LeSean McCoy (6) | Ray Rice (5) | Arian Foster (4) | Calvin Johnson (5) | Wes Welker (22) | Victor Cruz (110)
2010 | Arian Foster (23) | Adrian Peterson (2) | Peyton Hillis (63) | Dwayne Bowe (20) | Brandon Lloyd (123) | Greg Jennings (11)
2009 | Chris Johnson (7) | Adrian Peterson (1) | Maurice Jones-Drew (3) | Andre Johnson (2) | Randy Moss (4) | Miles Austin (68)

Since 2009, nine different receivers finished the season in the top 3 despite being ranked outside the preseason top 10. That’s 60%! And two of those players were ranked outside the top 100 in the preseason! But amongst all the inconsistency is Calvin Johnson. He’s the only wide receiver that is listed more than once. And he’s finished as the #1 ranked receiver 3 times in a row!

Meanwhile only 4 running backs (27%) were able to finish in the top 3 despite being ranked outside the preseason top 10. Right now in ESPN’s average draft position, the 10th running back is being drafted with the 19th overall pick. So before the 2nd round of the draft is even over, there is a good chance that the top 3 running backs have already been selected. Compare that to wide receivers, where the 10th receiver is being drafted with the 34th overall pick. So in the middle of the 4th round, a top 3 wide receiver (or even two) could still be on the board!

You can definitely wait to draft a wide receiver. The same can’t be said of running backs.

So how should you use this information in your fantasy football draft?

Focus on Running Backs Early

It’s not that the running back you pick is guaranteed to have a great season, but we just saw that, on average, 10 running backs are being selected before the end of the 2nd round! After that, your chances of picking a top running back start to diminish. At least one of your first two picks should be a running back, if not both!

However, keep in mind that selecting RB/RB with your first two picks can be a high-variance strategy. Consider that last year, in a 10-team league you could have taken Jamaal Charles and Matt Forte with the 6th and 15th pick respectively. Those players finished as the #1 and #3 RB, and if you didn’t win your fantasy league you definitely made the playoffs.  Of course, you could have just as easily picked C. J. Spiller and Stevan Ridley, who finished 31st and 26th. Unless you got really lucky with your later picks, you could say hello to the consolation bracket.

If you want to play it more conservatively, this data analysis pointed out a few other options. We know that quarterbacks are the most consistent position (Aaron Rodgers in 2013 aside), and this year Peyton Manning, Aaron Rodgers, and Drew Brees are the top 3 ranked quarterbacks. Spending an early pick on one of them should give you a consistent scorer who is much less likely to be a bust than an early running back.

Calvin Johnson and Jimmy Graham are also two very consistent players at two very inconsistent positions. Both players have finished in the top 3 at their position for the last 3 years (with Johnson finishing #1 all 3 years). You should feel just fine using your first two picks on one of these players and a running back. But use caution on selecting a different TE or WR with an early pick.

Wait on Your Wide Receivers

Wide receivers have the least accurate preseason rankings. Half of the preseason top 10 finish outside the top 12, and 25% finish outside the top 28! Because of this, there is value to be found later in the draft for wide receivers. Try to identify some wide receivers you like in later rounds, and focus your early picks on other positions.

This example is a bit extreme, but last year in a fantasy draft I spent 4 of my first 5 draft picks on running backs (with Jimmy Graham being the non-running-back pick). I was able to do so because I was fine getting Eric Decker (preseason #20) and Antonio Brown (preseason #24) in the 6th and 7th rounds. They finished as the 8th and 6th ranked wide receivers. Obviously I got a little lucky that they were that good, but that’s kind of the point. I like to think of fantasy football picks as lottery tickets. You could hit the jackpot with some players, win a decent amount with others, and have some that are busts. After the first few rounds, wide receivers have a better chance of being winning lottery tickets than other positions.

Now, you don’t have to completely neglect the WR position before the 6th round like in the example above. Just know that you’re putting the odds in your favor by waiting to draft the bulk of your wide receivers.

Who Needs a Backup QB?

One last thing while we’re on the lottery ticket analogy. Let’s say you draft one of the top quarterbacks (Manning, Rodgers, or Brees). Don’t draft a backup quarterback! We already saw that quarterbacks have the most accurate preseason rankings. By the time you draft a backup, it’s unlikely that the lower-ranked player you choose will turn into a star you’ll start each week or be able to use as trade bait. And on your QB’s bye week, you can easily pick somebody up off the waiver wire.

So why waste that pick on somebody with very little upside? Even if you’re picking in the 100s, there is still value to be had! Josh Gordon, Alshon Jeffery, Knowshon Moreno, and Julius Thomas were all ranked outside the preseason top 100 last year, and all turned into great fantasy players! 

Want to take this idea to the (slightly crazy) extreme? If you have a late first round pick, try and use your first two picks on Jimmy Graham and one of Manning, Rodgers, or Brees. With your QB and TE position locked up, spend your next 12 picks on nothing but RBs and WRs. Then use your last two picks on a defense and kicker! I know this goes against the advice of focusing on running backs early, but I did say it was a slightly crazy and extreme strategy! If you can get lucky and find a winning lottery ticket with a lower-ranked running back or two (maybe Montee Ball, Ben Tate, Andre Ellington), it could even be a winning strategy. 

If you decide to try that draft strategy, let me know how it goes! And whatever strategy you use, good luck with your 2014 fantasy football season!


Angst Over ANOVA Assumptions? Ask the Assistant.


Do you suffer from PAAA (Post-Analysis Assumption Angst)? You’re not alone.

Checking the required assumptions for a statistical  analysis is critical. But if you don’t have a Ph.D. in statistics, it can feel more complicated and confusing than the primary analysis itself.

How does the cuckoo egg data, a common sample data set often used to teach analysis of variance, satisfy the following formal assumptions for a classical one-way ANOVA (F-test)?

  • Normality
  • Homoscedasticity
  • Independence
Are My Data (Kinda Sorta) Normal?

To check the normality of each group of data, a common strategy is to display probability plots. In Minitab, choose Graph > Probability Plot > Multiple. Enter the response ('Length of Cuckoo Egg') as the Graph variable and the grouping variable (Nest) as the categorical variable. Click Multiple Graphs and check In separate panels of the same graph.

If the data are normally distributed, the points in each plot should fall along a straight line within the curved confidence bands on each side. In the graph above, the points in most plots are “kinda sorta” straight and fall mostly within the confidence bands—except for the meadow pipit group.

The output table at the bottom right of the graph includes the results of a normality test for each group. Not surprisingly, the meadow pipit group “fails” the normality test. (Its p-value is <0.005, so you reject the null hypothesis that the data are normally distributed.)
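
If you ever want to run a comparable check outside Minitab, here is a minimal sketch using SciPy on simulated stand-in data. Note that Minitab's probability plot reports the Anderson-Darling test; the Shapiro-Wilk test below is a different, but widely used, normality test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group = rng.normal(loc=22.5, scale=1.0, size=45)  # stand-in for one nest's egg lengths

w, p = stats.shapiro(group)  # Shapiro-Wilk normality test
print(f"W = {w:.3f}, p = {p:.3f}")
# A small p-value (e.g., < 0.05) rejects the hypothesis that the data are normal.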

Hmmm…now what?

One commonly overlooked factor when interpreting results of a normality test is sample size. Generally speaking, a larger sample size gives the normality test more power to detect departures from normality. When the sample size is small, the test may not have enough “oomph” to catch nonnormality. So you really can’t be sure.

For the cuckoo egg data, sample size for all of the groups is about 15—except for the meadow pipit group, which has 45 data values. It’s probably not a coincidence that the biggest group is the only one flagged as being nonnormal.

Another way to evaluate the normality assumption for ANOVA is to display a normal probability plot of the errors. To do this in Minitab, just click Graphs in the ANOVA main dialog box and check Normal probability plot of residuals.

Hmmm. The data kinda sorta fall along the line. But the circled errors on the bottom left indicate some skewness in the left tail. Is it important?

ANOVA is robust to the normality assumption—it can handle “kinda sorta” normal data—if your sample sizes are large enough.

But how large, exactly? It's starting to feel like one fuzzy result leads to another fuzzy result.

You decide to have a bowl of ice cream and move onto the next assumption. Maybe you'll get lucky with...

Homoscedasticity? Say What?

If the term homoscedasticity makes your flesh crawl, read this post. Basically, all it means is that the data in each group should vary by about the same amount (i.e. have roughly equal variance).

ANOVA is also fairly robust to the equal variances assumption, provided the samples are about the same size for all groups. For the cuckoo egg data set, the samples have very different sizes, so we can’t use that “get-out-of-jail-free” card.

To visually assess the variation among the groups, you can look at boxplots. The lengths of the boxes and the whiskers should be about the same for each group. (Graph > Boxplot > With Groups).

The boxplots indicate that the variation of the data in each group is “kinda sorta"  the same. Some whiskers are definitely longer and some are shorter. The lengths of the boxes vary somewhat as well.

Is it enough to matter? The boxplots can't tell us.

A more rigorous way to compare the variation among the groups is to perform an equal variances test (Stat > ANOVA > Test for Equal Variances). Here are the results for the cuckoo egg data:

Method                               Test Statistic              P-Value
Multiple comparisons                   —                      0.476
Levene                                      0.64                     0.670

For both tests, the p-value is not less than the alpha level of 0.05. Therefore, there’s not sufficient evidence to conclude that the variances are different. These data seem to satisfy the equal variances assumption.
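
For reference, the same kind of test is easy to run outside Minitab as well. Here is a minimal SciPy sketch on simulated stand-in groups (not the actual cuckoo egg data), using the median-centered version of Levene's test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Stand-in samples for three host-nest groups (not the real cuckoo egg data)
g1 = rng.normal(22.0, 0.9, size=15)
g2 = rng.normal(23.1, 1.0, size=15)
g3 = rng.normal(22.5, 0.9, size=45)

w, p = stats.levene(g1, g2, g3, center='median')
print(f"Levene W = {w:.2f}, p = {p:.3f}")
# p >= 0.05: no evidence that the group variances differ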

But wait: before you pop that champagne cork, don't forget the small sample sizes (n = 15 for most groups) in this data set.

The equal variances test, like the normality test, may not have enough power to detect differences in variance among the groups. You could be assuming the variances are equal only because you don't have enough data to prove they're not.

At this point, you may feel an Excedrin headache coming on.

Using the Minitab Assistant to Perform ANOVA

If you feel like you’re falling into a rabbit hole, struggling to evaluate the assumptions of the assumptions, you may want to drop by the Minitab Assistant to check what condition your condition is in.

To perform  one-way ANOVA, choose Assistant > Hypothesis Tests > One-Way ANOVA. Then, click on the Report Card.

First, look at the Equal Variances check. 

Say so-long to the homoscedastic heebie-jeebies! The report card informs you that you don’t need to worry about equal variances at all—even when your sample sizes are different. Sweet! That’s because the Assistant uses Welch’s method rather than the classic F-test. Welch’s method doesn’t require that the groups have equal variances.
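
For the curious, here is a rough sketch of Welch's one-way ANOVA statistic as it is usually written. This is a textbook-style implementation run on simulated data, not Minitab's code, so double-check it against a reference before relying on it:

import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for unequal variances. Returns (F, df1, df2, p)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                            # precision weights
    grand = (w * m).sum() / w.sum()      # weighted grand mean

    num = (w * (m - grand) ** 2).sum() / (k - 1)
    lam = (((1 - w / w.sum()) ** 2) / (n - 1)).sum()
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    F = num / den
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return F, df1, df2, stats.f.sf(F, df1, df2)

rng = np.random.default_rng(5)
groups = [rng.normal(mu, sd, size=n) for mu, sd, n in
          [(22.0, 0.8, 15), (23.0, 1.2, 15), (22.4, 0.9, 45)]]
print(welch_anova(*groups))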

Next, look at the Normality check.

Here, the Assistant cautions that the small samples could be an issue for the normality assumption. But it also lets you know that normality can't be reliably checked with small samples—saving you time from pondering the imponderable.

More important, it gives a clear definitive answer to the question “What size samples do I need for robustness to the normality assumption?” 

The minimum requirement for robustness to normality depends on the number of groups in your data. The sample size recommendation is based on tens of thousands of simulations performed by Minitab research statisticians to examine the Type I error rate when using Welch’s method to evaluate small and moderate size samples from nonnormal distributions.

There are two other checks on the report card—one for Sample Size, and one for Unusual Data. The Unusual Data check flags any outliers that could severely bias your results. The Sample Size check lets you know if you’ve collected enough data to provide the test with sufficient power to detect a difference in means.

These aren’t formal ANOVA assumptions, but they’re critical issues that can affect your results—and they’re often overlooked.

Independence

If you’re a careful reader, and you’re still with me, you’ve probably noticed I haven’t covered the formal ANOVA assumption of independence. It’s not because it’s not important—of all the assumptions, ANOVA is least robust to violations of independence.

But independence of the observations is affected by how you collect your data.  To evaluate this assumption with 100% certainty, the Assistant would have to peek over your shoulder and watch you. The Assistant can’t do that.

Yet.

But it does provide a graph of the data points in worksheet order. If you’ve entered your data in the worksheet in the sequence it was collected, this alerts you to dependence among data points collected close together in time or space, which is one of the most common types of dependence.

These data don't seem to show a time-order effect. However, notice the outliers flagged in the meadow pipit group. Those could be an issue.

Note: If you use classic One-Way ANOVA in Minitab, you can evaluate this assumption by looking at the residuals vs. order plot. If there’s a time-dependent pattern in the data points, there will be a time-dependent pattern in the errors.

Key Points
  • Sample size is critical not just for providing power to the primary analysis, but for establishing robustness to some formal assumptions.
     
  • If you don't know the formal assumptions for an analysis, or if you run into trouble trying to interpret them, consider using the Assistant.
     
  • If your data are "borderline" for satisfying the ANOVA requirements, consider your application and its potential consequences. You may want to use a nonparametric test, such as Kruskal-Wallis or Mood's median test, to compare the medians of the groups rather than the means. Just remember—those tests have formal assumptions as well!

 

Using the G-Chart Control Chart for Rare Events to Predict Borewell Accidents


by Lion "Ari" Ondiappan Arivazhagan, guest blogger

In India, we've seen this story far too many times in recent years:

Timmanna Hatti, a six-year-old boy, was trapped in a 160-foot borewell for more than 5 days in Sulikeri village of Bagalkot district in Karnataka after falling into the well. Perhaps the most heartbreaking aspect of the situation was the Bagalkot district administration's decision to stop the rescue operation, because further digging might have caused the collapse of the vertical wall of the borewell within which Timmanna had struggled for his life.

Timmanna's body was retrieved from the well 8 days after he fell in. Sadly, this is just one of an alarming number of borewell accidents, especially involving little children, across India in the recent past.

This most recent event prompted me to conduct a preliminary study of borewell accidents across India in the last 8-9 years.

Using Data to Assess Borewell Accidents

My main objective was to find out the possible causes of such accidents and to assess the likelihood of such adverse events based on the data available to date.

This very preliminary study has heightened my awareness of a lot of uncomfortable and dismaying factors involved in these deadly incidents, including the pathetic circumstances of many rural children and carelessness on the part of many borewell contractors and farmers.

In this post, I'll lead you through my analysis, which concludes with the use of a G-chart for the possible prediction of the next such adverse event, based on Geometric distribution probabilities.

Collecting Data on Borewell Accidents

My search of newspaper articles and Google provided details about a total of 34 borewell incidents since 2006. The actual number of incidents may be higher, since many incidents go unreported. The table below shows the total number of borewell cases reported each year between 2006 and 2014.

Borewell Accident Summary Data

Summary Analysis of the Borewell Accident Data

First, I used Minitab to create a histogram of the data I'd collected, shown below.

A quick review of the histogram reveals that out of 34 reported cases, the highest number of accidents occurred in the years 2007 and 2014.

The ages of children trapped in the borewells ranged from 2 years to 9 years. More boys (21) than girls (13) were involved in these incidents.

What hurts most is that, in this modern India, more than 70% of the children did not survive the incident. They died either in the borewell itself or in the hospital after the rescue. Only about 20% of children (7 out of 34) have been rescued successfully. The ultimate status of 10% of the cases reported is not known.

Pie Chart of Borewell Incidents by Indian State

Analysis of a state-wise pie chart, shown below, indicates that Haryana, Gujarat, and Tamil Nadu top the list of the borewell accident states. These three states alone account for more than 50% of the borewell accidents since 2006.

Pareto Chart for Vital Causes of Borewell Accidents

I used a Pareto chart to analyze the various causes of these borewell accidents, which revealed the top causes of these tragedies:

  1. Children accidentally falling into open borewell pits while playing in the fields.
  2. Abandoned borewell pits not being properly closed or sealed.
     

Applying the Geometric Distribution to Rare Adverse Events

There are many different types of control charts, but for rare events, we can use Minitab Statistical Software and the G chart. Based on the geometric distribution, the G chart is designed specifically for monitoring rare events. In the geometric distribution, we count the number of opportunities before or until the defect (adverse event) occurs.

The figure below shows the geometric probability distribution of days between such rare events if the probability of the event is 0.01. As you can see, the odds of an event happening 50 or 100 days after the previous one are much higher than the odds of the next event happening 300 or 400 days later.

By using the geometric distribution to plot the number of days between rare events, such as borewell accidents, the G chart can reveal patterns or trends that might enable us to prevent such accidents in future. In this case, we count the number of days between reported borewell accidents. One key assumption, when counting the number of days between the events, is that the number of accidents per day was fairly constant.
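
A quick way to see why short gaps dominate: with a daily event probability of 0.01 (the same illustrative assumption used in the figure above), the geometric probabilities fall off steadily as the gap grows. A small SciPy sketch:

from scipy import stats

p = 0.01  # assumed daily probability of an accident, as in the figure

# P(exactly k accident-free days, then an accident the next day) = (1 - p)**k * p.
# SciPy's geom counts the day the event occurs (1, 2, ...), hence the +1 shift.
for days in (50, 100, 300, 400):
    print(days, round(stats.geom.pmf(days + 1, p), 5))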

A G-Chart for Prediction of the Next Borewell Accident

I now used Minitab to create a G-chart for the analysis of the borewell accident data I collected, shown below.

Although the observations fall within the upper and lower control limits (UCL and LCL), the G chart shows a cluster of observations below the center line (the mean) after the 28th observation and before the 34th observation (the latest event). Overall, the chart indicates an unusually high rate of adverse events (borewell accidents) over the past decade.

Descriptive statistics for my data, based on the Gaussian distribution, show 90.8 days as the mean "days between events." But the G-chart, based on the geometric distribution, which is better suited to studying adverse events, indicates a mean (CL) of only 67.2 days between events.

Predicting Days Between Borewell Accidents with a Cumulative Probability Distribution

I used Minitab to create a cumulative distribution function for data, using the geometric distribution with probability set at 0.01. This gives us some additional detail about how many incident-free days we're likely to have until the next borewell tragedy strikes: 

Based on the above, we can reasonably predict when the next borewell accident is most likely to occur in any of the states included in the data, especially Haryana, Tamil Nadu, Gujarat, Rajasthan, and Karnataka.

The probabilities are shown below, under the assumption that the sample size is adequate and that the measurement error in the reported and collected event data is within allowable limits.

Probability of next borewell event happening in...

  • 31 days or less: 0.275020 (approximately 27.5%)
  • 104 days or less: 0.651907 (approximately 65%)
  • 181 days or less: 0.839452 (approximately 84%)
  • 488 days or less: 0.992661 (approximately 99%)
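
The listed probabilities appear to follow the cumulative geometric probability 1 - (1 - p)^(k + 1) with p = 0.01, where k counts accident-free days before the next event. Here is a short SciPy check of that reading:

from scipy import stats

p = 0.01
# "k days or less" counts accident-free days before the next event,
# so P(X <= k) = 1 - (1 - p)**(k + 1), which is geom.cdf(k + 1, p) in SciPy.
for days in (31, 104, 181, 488):
    print(days, round(stats.geom.cdf(days + 1, p), 6))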

 

My purpose in preparing this study will be fulfilled if enough people take preventive action before the next such adverse event, which has a greater than 80% probability of occurring within the next 6 months. NGOs, government officials, and individuals all need to take preventive actions—like sealing all open borewells across India, especially in the five states named above—to prevent many more innocent children from dying while playing.

 

About the Guest Blogger:

Ondiappan "Ari" Arivazhagan is an honors graduate in civil / structural engineering from the University of Madras. He is a certified PMP, PMI-SP, PMI-RMP from the Project Management Institute. He is also a Master Black Belt in Lean Six Sigma and has done Business Analytics from IIM, Bangalore. He has 30 years of professional global project management experience in various countries and has almost 14 years of teaching / training experience in project management and Lean Six Sigma. He is the Founder-CEO of International Institute of Project Management (IIPM), Chennai, and can be reached at askari@iipmchennai.com.

An earlier version of this article was published on LinkedIn.

Use a Line Plot to Show a Summary Statistic Over Time


Terrorist Attacks, 2013, Concentration and Intensity

If you’re already a strong user of Minitab Statistical Software, then you’re probably familiar with how to use bar charts to show means, medians, sums, and other statistics. Bar charts are excellent tools, but traditionally used when you have a categorical variable on your x-axis. When you want to plot statistics with a continuous variable on your y-axis, look no further than Minitab’s line plots. Line plots are particularly well-suited for when you want to plot a statistic like the sum or the mean over time.

I like to illustrate Minitab with data about pleasant subjects: poetry, candy, and maybe even the volume of ethanol in E85 fuel. Data that are about unpleasant subjects also exist, and we can learn from that data too. We’re fortunate to have both the Chicago Project on Security and Terrorism (CPOST) and the National Consortium for the Study of Terrorism and Responses to Terrorism (START) working hard to produce publicly-accessible databases with information about terrorism.

START has been sharing analyses of its 2013 data recently. The new data prompted staff from the two institutions to engage in an interesting debate on the Washington Post’s website about whether the Global Terrorism Database (GTD) that START maintains “exaggerates a recent increase in terrorist activities.” For today, I’m just going to use the GTD to demonstrate a nice line plot in Minitab, which will give a tiny bit of insight into what that debate is about.

When you download the GTD data, you can open one file that has all of the data except for the year 1993. Incident-level data for 1993 was lost, so that year is not included, although you can get country-level totals for numbers of attacks and casualties from the GTD Codebook. Those who maintain the GTD recommend that “users should note that differences in levels of attacks and casualties before and after January 1, 1998, before and after April 1, 2008, and before and after January 1, 2012 are at least partially explained by differences in data collection” (START, downloaded August 18th, 2014).

The GTD is great for detail. It records latitude and longitude for an event in 1970 when gunshots were fired at the police headquarters in Cairo, Illinois. Absent from the data is a column that references the changes in methodology. Thus, it’s not hard to end up with the recently-criticized graph that started the debate between the staff at the two institutions. The graph shows all of the data in the GTD for the number of suicide attacks for each year since 1970. It looks a bit like this:

The number of suicide attacks increases dramatically in the past two years.

The message of this graph is that the number of suicide attacks has never been higher. The criticism about the absence of the different methodologies seems fair. So how would we capture the different methodologies in Minitab? With a calculator formula, of course. Try this, if you’re following along:

  1. Choose Calc > Calculator.
  2. In Store result in variable, enter Methodology.
  3. In Expression, enter:

if(iyear < 1998, 1, iyear < 2009, 2, iyear=2009 and imonth < 4, 2, iyear < 2012, 3, 4)

  4. Click OK.
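
As an aside, if you were preparing the same column in Python rather than Minitab, np.select mirrors this calculator logic: conditions are evaluated in order and the first match wins. The few rows below are made up purely to show the coding:

import numpy as np
import pandas as pd

# Toy rows standing in for the GTD's iyear / imonth columns
gtd = pd.DataFrame({"iyear":  [1995, 2000, 2009, 2009, 2010, 2013],
                    "imonth": [6, 5, 2, 7, 1, 3]})

conditions = [
    gtd["iyear"] < 1998,
    gtd["iyear"] < 2009,
    (gtd["iyear"] == 2009) & (gtd["imonth"] < 4),
    gtd["iyear"] < 2012,
]
gtd["Methodology"] = np.select(conditions, [1, 2, 2, 3], default=4)
print(gtd)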

Notice that because the GTD uses 3 separate columns to record the dates, I’ve used two conditions to identify the second methodology. With the new column, you can easily divide the data series trends according to the method for counting events. This is where the line plot comes in. The line plot is the easiest way in Minitab to plot a summary statistic over time. You can try it this way:

  1. Choose Graph > Line Plot.
  2. Select With Symbols, One Y. Click OK.
  3. In Function, select Sum.
  4. In Graph variables, enter suicide.
  5. In Categorical variable for X-scale grouping, enter iyear.
  6. In Categorical variable for legend grouping, enter Methodology.

You’ll get a graph that looks a bit like this, though I already edited some labels.

The last two years, which are dramatically higher in number, have a new methodology.

One interesting feature of this line plot is that there are two data points for 2009. Because we’re calling attention to the different methodologies, it’s important to consider that the first quarter and the last 3 quarters of 2009 use different types of data. In this display, we can see the mixture of methodologies. The fact that the two highest points are from the newest methodology also lends some credence to the question of whether the numbers from 2012 and 2013 should be directly compared to numbers from earlier years. The amount of the increase due to better data collection is not clear.
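
The same sum-by-year, split-by-methodology view is easy to build outside Minitab, too. Here is a small pandas sketch on toy rows (summing a 0/1 indicator gives the count; switching to the mean gives the proportion discussed next):

import pandas as pd
import matplotlib.pyplot as plt

# Toy incident-level rows standing in for the GTD columns used here
df = pd.DataFrame({"iyear":       [2008, 2008, 2009, 2009, 2012, 2013, 2013],
                   "suicide":     [1, 0, 1, 1, 1, 1, 0],
                   "Methodology": [2, 2, 2, 3, 4, 4, 4]})

# Sum of the 0/1 indicator = number of suicide attacks per year, one line per
# methodology; change aggfunc to "mean" to plot the proportion instead.
counts = df.pivot_table(index="iyear", columns="Methodology",
                        values="suicide", aggfunc="sum")
counts.plot(marker="o")
plt.ylabel("Suicide attacks")
plt.show()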

Interestingly, a line plot that shows the proportion of suicide attacks out of all terrorist attacks presents a different picture about the increase related to the different methodologies. That’s what you get if you make a line plot of the means instead of the sums.

By proportion, the increase in suicide attacks in the last two years does not look as dramatic.

Considering which variables to compute and how to interpret them in conjunction with one another is an important task for people doing data analysis. In the final installment of the series on the Washington Post’s website, GTD staff members note that they do not “rely solely on global aggregate percent change statistics when assessing trends.” The flexibility of the line plot to show different statistics can make the work of considering the data from different perspectives much easier.

We do like to have fun at the Minitab Blog, but we know that there’s serious data in the world too. Whether your application is making tires that keep people safe on the road or helping people recover from wounds, our goal is to give you the best possible tools to make your process improvement efforts successful.

 

How Could You Benefit from Plackett & Burman Experimental Designs?


Screening experimental designs allow you to study a very large number of factors in a very limited number of runs. The objective is to focus on the few factors that have a real effect and to eliminate the effects that are not significant. This is typically the initial objective of any experimenter performing a DOE (design of experiments).

Table of Factorial Designs

Consider the table below. In Minitab, you can quickly access this table of factorial designs by selecting Stat > DOE > Factorial > Create Factorial Design... and clicking "Display Available Designs." The table tells us the number of runs in a standard 2^k factorial design, its resolution, and the number of factors to be analyzed. If you need to study 8 factors, or 7 factors, or even 6 factors, a 16-run design could be a good choice, because it balances the number of runs with the ability to effectively interpret the experimental results.

Confounding is the price we pay for reducing the number of runs: the effects of different factors or interactions of factors can't be evaluated individually, so interpreting the results becomes more difficult and riskier. The yellow color indicates that this design is acceptable in terms of confounding/resolution. Designs with the green color have limited or no confounding, but a larger number of runs. On the other hand, any experimenter should refrain from using designs located in the red region due to extensive confounding. Red means that some main factors are confounded with two-factor interactions.

According to the table, studying more than 8 factors means you'll need to perform 32 runs in order to remain in the yellow region. But when experiments are costly and time consuming, that may be too many.

Plackett & Burman Designs: An Alternative to Factorial 

Fortunately, another solution is available: Plackett & Burman designs may be used to analyze a larger number of variables more economically. For example, to study 9 factors you need only conduct 12 runs, rather than the 32 runs needed in a standard two-level fractional factorial design. In the Minitab Assistant, Plackett and Burman designs are suggested whenever the number of factors to be studied is larger than five.

The main disadvantage of this type of screening design is that two-factor interactions cannot be studied. In Plackett and Burman designs, interactions are partially confounded, or "aliased," with all main effects. Sometimes an interaction appears with a positive sign, meaning it is added to the main effect; in other cases it has a negative sign, meaning it is subtracted from the main effect. One third of each interaction is added to or subtracted from every main effect. For example, in the experimental aliasing table below, the effect of the A factor is partially confounded with all interaction effects:

Alias / confounding structure in a Plackett & Burman design - Aliases :

A - 0.33 BC - 0.33 BD - 0.33 BE + 0.33 BF - 0.33 BG - 0.33 BH + 0.33 CD - 0.33 CE - 0.33 CF + 0.33 CG - 0.33 CH + 0.33 DE + 0.33 DF - 0.33 DG - 0.33 DH - 0.33 EF - 0.33 EG - 0.33 EH - 0.33 FG + 0.33 FH + 0.33 GH - 0.33 BCD + 0.33 BCE - 0.33 BCF + 0.33 BCG + 0.33 BCH + 0.33 BDE + 0.33 BDF + 0.33 BDG - 0.33 BDH + 0.33 BEF - 0.33 BEG + 0.33 BEH - 0.33 BFG + 0.33 BFH - 0.33 BGH + 0.33 CDE + 0.33 CDF + 0.33 CDG + 0.33 CDH - 0.33 CEF - 0.33 CEG + 0.33 CEH + 0.33 CFG + 0.33 CFH + 0.33 CGH + 0.33 DEF - 0.33 DEG - 0.33 DEH + 0.33 DFG + 0.33 DFH + 0.33 DGH + 0.33 EFG - 0.33 EFH + 0.33 EGH + 0.33 FGH

An Added Bonus for Plackett and Burman Designs

But there is an added benefit to consider when using Plackett and Burman designs. Suppose that the effects that were not significant have been gradually eliminated from the model, and that only three main effects remain. With Plackett-Burman, you do not need to perform an additional 2^3 = 8-run full factorial design in order to estimate the two-factor interactions. The initial Plackett & Burman design already contains all the runs you need for this 2^3 full factorial design—and you will even get four replicates in addition to the full design!

The Pareto diagram above from a Plackett and Burman design shows that only factors A, D & H have statistically significant effects.
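
If you want to check both of these properties for yourself outside Minitab, here is a hedged sketch that assumes the third-party Python package pyDOE2 and its pbdesign helper (any 12-run Plackett-Burman design matrix from your DOE software would do just as well), with three columns picked arbitrarily to stand in for the significant factors A, D, and H:

import numpy as np
from pyDOE2 import pbdesign  # assumed third-party helper: pip install pyDOE2

X = pbdesign(9)              # 12-run Plackett-Burman design for 9 two-level factors
print(X.shape)               # expect (12, 9)

# Partial aliasing: a main-effect column correlates with a two-factor
# interaction of other factors by +/- 1/3 in a 12-run design
A, B, C = X[:, 0], X[:, 1], X[:, 2]
print(np.corrcoef(A, B * C)[0, 1])   # roughly +/- 0.33

# Projection: keep only 3 columns and the 12 runs cover all 2^3 = 8
# combinations, with 4 of them appearing twice (the extra "replicates")
kept = X[:, [0, 3, 7]]
combos, counts = np.unique(kept, axis=0, return_counts=True)
print(len(combos), counts.sum(), sorted(counts))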

Transforming a Plackett & Burman Design to a Full Factorial 

To transform your Plackett & Burman design into a full factorial 2^3 design that will allow you to study all two-factor interactions in Minitab, go to Stat > DOE > Factorial > Define Custom Factorial Design... Select your three significant factors and click Low/High to specify the factor settings. Now you can study the two-factor interactions.

The initial Plackett and Burman design has been transformed into a full factorial design, the interactions between the three previously selected factors can now be studied, and, as we can see in the Pareto graph, one interaction appears to be statistically significant.

Of course, we can use Minitab to graph the significant interaction effect to get a better understanding of it.

Choosing the Right Experimental Design for Your Needs

Experimenters need to identify the best trade-off between running a limited number of tests and getting as much information as possible to improve processes / products. Plackett and Burman designs act as a funnel, enabling a quick reduction in the number of potential factors.

That is the reason why, in the Minitab Assistant, Plackett and Burman screening designs are considered when the number of potential factors is greater than five. After the analysis phase of the DOE results, if the number of significant factors in a 12-run Plackett & Burman design is equal to or smaller than 3, the initial design may easily be transformed into a full factorial DOE enabling you to study all two-factor interactions.

Analyzing NFL Ticket Prices: How Much Would You Pay to See the Green Bay Packers?


The 2014-15 NFL season is only days away, and fans all over the country are planning their fall weekends accordingly. In this post, I'm going to use data analysis to answer some questions related to ticket prices, such as:

  • Which team is the least/most expensive to watch at home? 
  • Which team is the least/most expensive to watch on the road? 
  • If you are thinking of a road trip, which stadiums offer the largest ticket discount for your team?

For dedicated fans, this is far from a trivial matter. As we'll see, fans of one team can get an average 48% discount on road-game tickets, while fans of two other teams will pay, on average, more than double the cost to see their team on the road.

Gathering and Preparing NFL Ticket Price Data

The data I'm analyzing comes from StubHub, an online ticket marketplace owned by eBay. You'll find a summary of the number of StubHub tickets available and the minimum price on StubHub for each NFL game in 2014 on the ESPN website: http://espn.go.com/nfl/schedule/_/seasontype/2/week/1

snapshot of NFL data from ESPN.com

I did a quick copy-and-paste from ESPN into Excel to put each variable neatly into a column, and then copied and pasted the data into Minitab Statistical Software to prepare it for analysis. I used the Calc > Calculator commands Left() and Right() in Minitab to extract the minimum ticket price, the first few letters of the away team name, and the first few letters of the home team name. (Since the summary on ESPN.com only shows the minimum price, the analysis below is based only on the minimum ticket price available for each game.)
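
If you ever find yourself doing that text clean-up in Python instead of Minitab, pandas string methods play the same role as Left() and Right(). The column formats below are hypothetical stand-ins; the real paste from ESPN may look different:

import pandas as pd

# Hypothetical layout after pasting the ESPN summary: one text column for the
# matchup and one for the minimum StubHub price (the real paste may differ)
games = pd.DataFrame({"matchup": ["GB at SEA", "JAX at PHI"],
                      "tickets": ["From $145", "From $48"]})

games["away"] = games["matchup"].str.split(" at ").str[0]
games["home"] = games["matchup"].str.split(" at ").str[1]
games["min_price"] = games["tickets"].str.extract(r"\$(\d+)", expand=False).astype(int)
print(games)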

Which Is the Most Expensive Team to See on the Road?

The bar chart below shows that Green Bay is the most expensive road team to watch play, with a 2014 average price of $145 per road game. This is noticeably higher than the other NFL teams; the next closest is San Francisco, with an average price of $128 per road game. But catching a Jacksonville road game is a fraction of those costs, averaging $48.

Bar Chart of Average Minimum Price for Away Team, 2014 NFL Season

Which Is the Most Expensive Team to See at Home?

The bar chart below shows that Chicago is the most expensive team to watch play on their home turf, with a 2014 average price of $175 per home game. Seattle is a close second, with an average price of $171 per home game. Seeing Dallas or St. Louis in a home game costs a fraction of that, averaging just $35.

Bar Chart of Average Minimum Price for Home Team, 2014 NFL Season

Is It Cheaper to See Your Favorite Team on the Road?

Finally, I compared the average home-game ticket price to the average road-game ticket price for each NFL team.

The road-team discount award goes to the Seattle Seahawks: you'll save, on average, 48% watching their games on the road. But if you're a fan of Dallas or Miami, you'll be financially better off watching your team at home—their average price increases more than 110% when they're on the road. One factor that drives this result is the popularity of Dallas and Miami across the country: the higher demand supports their higher road-game prices. Also, Dallas' enormous home stadium (AT&T) offers cheap Party Pass seats (which aren't really seats at all, but rather a standing-room section).

Is It Cheaper to See Your Favorite NFL Team on the Road?

One drawback with this analysis is that it doesn't take into account the opponent that each team faces. For example, Chicago may happen to be playing some very popular teams at home in 2014, which drives their home-game ticket prices up for this season.

In a future post, I'll discuss how to adjust for opponents and other variables, such as game day and game time.
