
Understanding Alpha Alleviates Alarm


Dr. J. Michael Hamilton preparing the carcinoembryonic antigen (CEA) vaccinia vaccine used to try to prevent cancer.

One of the more misunderstood concepts in statistics is alpha, more formally known as the significance level. Alpha is typically set before you conduct an experiment. When the calculated p-value from a hypothesis test is less than the significance level (α), the results of the experiment are so unlikely to have happened by chance alone that the more likely explanation is that they are due to the effect being studied. That the results are unlikely to happen by chance is what we mean by the phrase “statistical significance,” not to be confused with practical significance.

There was a wonderful example of how confusing alpha can be when the National Institutes of Health canceled trials for an HIV vaccine. The headline from US News and World Report reads “HIV Vaccine Study Cancelled: Recipients More Likely to Catch Virus Than Those Given Placebo.” As is often the case with language, the headline offers more than one interpretation. One possible reading is that receiving this potential HIV vaccine caused more subjects to get HIV. This reading is suggested by the beginning of the subhead to the article: “Review of large-scale study found alarming.” The idea that a vaccine meant to prevent HIV would cause more people to get the virus makes "alarming" an understatement.

However, the subhead closes with a different tone, noting that the trial had a “non-statistically significant” result. A non-statistically significant result doesn't seem nearly as alarming (because it isn't). The NIH press release about this doesn't give us all of the information we'd need to reproduce their math, such as the number of subjects who had the vaccine for at least 28 weeks. But we can probably get a good approximation from the total number of subjects.

 

                     Placebo Group   Vaccine Group
Total                1,244           1,250
New HIV infections   30              41

The NIH press release also didn’t report the alpha level they used to determine that these results were not significant. By tradition, alpha is 0.05. The value 0.05 probably has its roots in the paper “The Statistical Method in Psychical Research” published by Sir Ronald Fisher in 1929 in the Proceedings of the Society for Psychical Research. Fisher writes, “it is a common practice to judge a result significant, if it is of such magnitude that it would have been produced by chance not more frequently than once in twenty trials. This is an arbitrary, but convenient, level of significance for the practical investigator.”

So let’s use 0.05 as alpha for now, and use a 2 proportions hypothesis test to see if there's a statistically significant difference in HIV infection between the proportion of participants who received the vaccine and those who did not.  Here's how to do it in Minitab Statistical Software:

  1. Choose Stat > Basic Statistics > 2 Proportions.
  2. Select Summarized data.
  3. In First, enter 30 for the Events and 1244 for the Trials.
  4. In Second, enter 41 for the Events and 1250 for the Trials.
  5. Click OK.

Minitab provides the following output:

Test and CI for Two Proportions


Sample   X     N  Sample p
1       30  1244  0.024116
2       41  1250  0.032800
 

Difference = p (1) - p (2)
Estimate for difference:  -0.00868424
95% CI for difference:  (-0.0217291, 0.00436056)
Test for difference = 0 (vs not = 0):  Z = -1.30  P-Value = 0.192
 
Fisher's exact test: P-Value = 0.228

As you can see in the output above, Fisher's exact test gives a p-value of 0.228. Because 0.228 is greater than 0.05 (and not even close to it), we would say that the difference in the proportion of infections between these two groups is readily explained by chance rather than by the vaccine.
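If you want to double-check these numbers outside of Minitab, here is a minimal Python sketch, assuming SciPy is available, that should reproduce both the two-proportion z-test and Fisher's exact test from the counts above (give or take rounding in the last digit).

from math import sqrt
from scipy.stats import norm, fisher_exact

x1, n1 = 30, 1244   # placebo group: infections, subjects
x2, n2 = 41, 1250   # vaccine group: infections, subjects

p1, p2 = x1 / n1, x2 / n2
# two-proportion z-test with an unpooled standard error
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))                                # about 0.19

# Fisher's exact test on the 2x2 table of infected vs. not infected
_, p_fisher = fisher_exact([[x1, n1 - x1], [x2, n2 - x2]])   # about 0.23

print(f"z = {z:.2f}, p = {p_value:.3f}, Fisher exact p = {p_fisher:.3f}")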

The downside is that we still don’t have a vaccine for HIV. But the idea that the vaccine contributes to the contraction of HIV isn’t supported by the NIH data. Understanding what alpha is helps us feel more confident about what the results really mean.

The image of the doctor preparing a cancer vaccine is in the public domain. The photo is by John Keith and is from the National Cancer Institute.

 


What Are the Effects of Multicollinearity and When Can I Ignore Them?


Multicollinearity is a problem that you can run into when you're fitting a regression model or other linear model. It refers to predictors that are correlated with other predictors in the model. Unfortunately, the effects of multicollinearity can feel murky and intangible, which makes it unclear whether it's important to fix.

My goal in this blog post is to bring the effects of multicollinearity to life with real data! Along the way, I’ll show you a simple tool that can remove multicollinearity in some cases.

multicollinearity in regression model for bone density
My goal in this blog post is to bring multicollinearity to life with real data about bone density.
How Problematic is Multicollinearity?

Moderate multicollinearity may not be problematic. However, severe multicollinearity is a problem because it can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable and difficult to interpret. Multicollinearity saps the statistical power of the analysis, can cause the coefficients to switch signs, and makes it more difficult to specify the correct model.

Do I Have to Fix Multicollinearity?

The symptoms sound serious, but the answer is both yes and no—depending on your goals. (Don’t worry, the example we'll go through next makes it more concrete.) In short, multicollinearity:

  • can make choosing the correct predictors to include more difficult.
  • interferes in determining the precise effect of each predictor, but...
  • doesn’t affect the overall fit of the model or produce bad predictions.

Depending on your goals, multicollinearity isn’t always a problem. However, because of the difficulty in choosing the correct model when severe multicollinearity is present, it’s always worth exploring.

The Regression Scenario: Predicting Bone Density

I’ll use a subset of real data that I collected for an experiment to illustrate the detection, effects, and removal of multicollinearity. You can read about the actual experiment here and the worksheet is here. (If you're not already using it, please download the free 30-day trial of Minitab and play along!)

We’ll use General Regression to assess how the predictors of physical activity, percent body fat, weight, and the interaction between body fat and weight are collectively associated with the bone density of the femoral neck.

Given the potential for correlation among the predictors, we’ll have Minitab display the variance inflation factors (VIF), which indicate the extent to which multicollinearity is present in a regression analysis. A VIF of 5 or greater indicates a reason to be concerned about multicollinearity. To display the VIFs, go to Stat > Regression > General Regression, then click on the Results button and check Display variance inflation factors.
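As a reminder of what a VIF actually measures, here is a small hand-rolled Python sketch (the predictor columns are simulated, not the bone density data): regress each predictor on all of the other predictors and compute VIF = 1 / (1 - R-squared). A VIF of 5 corresponds to an R-squared of 0.80 for that regression.

import numpy as np

def vif(X, j):
    # VIF of column j: 1 / (1 - R-squared) from regressing column j
    # on the remaining columns plus an intercept
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r_squared = 1 - (y - A @ beta).var() / y.var()
    return 1 / (1 - r_squared)

rng = np.random.default_rng(0)                    # simulated data for illustration
fat = rng.normal(30, 8, 92)                       # percent body fat
weight = 2 * fat + rng.normal(90, 10, 92)         # weight, correlated with fat
X = np.column_stack([fat, weight, fat * weight])  # include the interaction term

for j, name in enumerate(["Fat", "Weight", "Fat*Weight"]):
    print(name, round(vif(X, j), 1))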

General Regression Results

Here are the results of the Minitab analysis:

General Regression results with unstandardized predictors

In the results above, Weight, Activity, and the interaction term are significant while %Fat is not significant. However, three of the VIFs are very high because they are well over 5. These values suggest that the coefficients are poorly estimated and we should be wary of their p-values.

Standardize the Continuous Predictors

In this model, the VIFs are high because of the interaction term. Interaction terms and higher-order terms (e.g., squared and cubed predictors) are correlated with main effect terms because they include the main effects terms.

To reduce high VIFs produced by interaction and higher-order terms, you can standardize the continuous predictor variables. In Minitab's statistical software, it’s easy to standardize any continuous variable using the Standardize tool (Calc > Standardize). You simply choose the input columns, output columns, and the standardization method.

For our purposes, we’ll choose the Subtract mean method, which is also known as centering the variables. This method removes the multicollinearity produced by interaction and higher-order terms as effectively as the other standardization methods, but it has the added benefit of not changing the interpretation of the coefficients. If you subtract the mean, each coefficient continues to estimate the change in the mean response per unit increase in X when all other predictors are held constant.
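Here is a companion Python sketch (again with simulated columns, not the actual worksheet) that shows the same before-and-after comparison using the variance_inflation_factor routine from statsmodels: build the interaction from the raw predictors and the VIFs blow up; subtract the column means first and they typically fall back toward 1.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)                 # simulated predictors for illustration
raw = pd.DataFrame({
    "Fat":      rng.normal(30, 8, 92),
    "Weight":   rng.normal(150, 20, 92),
    "Activity": rng.normal(3, 1, 92),
})

def vifs(frame):
    X = np.column_stack([np.ones(len(frame)), frame.values])   # add intercept
    return {col: round(variance_inflation_factor(X, i), 1)
            for i, col in enumerate(frame.columns, start=1)}

uncentered = raw.assign(Fat_x_Weight=raw["Fat"] * raw["Weight"])
centered = (raw - raw.mean()).assign(Fat_x_Weight=lambda d: d["Fat"] * d["Weight"])

print("uncentered:", vifs(uncentered))   # interaction drives the VIFs up
print("centered:  ", vifs(centered))     # VIFs drop once the means are removed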

I’ve already added the standardized predictors in the worksheet we’re using; they're in the columns that have an S added to the name of each standardized predictor.

General Regression with Standardized Predictors

We’ll fit the same model as before, but this time using the standardized predictors.

General Regression results with standardized predictors

In the model with the standardized predictors, the VIFs are down to an acceptable range.

Comparing Regression Models to See the Effects of Multicollinearity

Because standardizing the predictors effectively removed the multicollinearity, we could run the same model twice, once with severe multicollinearity and once with moderate multicollinearity. This provides a great head-to-head comparison and it reveals the classic effects of multicollinearity.

The standard error of the coefficient (SE Coef) indicates the precision of the coefficient estimates; smaller values represent more reliable estimates. In the second model, you can see that the SE Coef is smaller for both %Fat and Weight. In addition, %Fat is significant this time, while it was insignificant in the model with severe multicollinearity, and its coefficient has switched sign from +0.005 to -0.005! The %Fat estimate in both models is about the same absolute distance from zero, but it is only significant in the second model because the estimate is more precise.

Compare the Summary of Model statistics between the two models and you’ll notice that S, R-squared, Predicted R-squared, and the others are all identical. Multicollinearity doesn’t affect how well the model fits. In fact, if you want to use the model to make predictions, both models produce identical results for fitted values and prediction intervals!

Multicollinear Thoughts

Multicollinearity can cause a number of problems. We saw how it sapped the significance of one of our predictors and changed its sign. Imagine trying to specify a model with many more potential predictors. If you saw signs that kept changing and incorrect p-values, it could be hard to specify the correct model!

However, we also saw that multicollinearity doesn’t affect how well the model fits. If the model satisfies the residual assumptions and has a satisfactory predicted R-squared, even a model with severe multicollinearity can produce great predictions.

You also don’t have to worry about every single pair of predictors that has a high correlation. When putting together the model for this post, I thought for sure that the high correlation between %Fat and Weight (0.827) would produce severe multicollinearity all by itself. However, that correlation only produced VIFs around 3.2. So don’t be afraid to try correlated predictors—just be sure to check those VIFs!

For our model, the severe multicollinearity was primarily caused by the interaction term. Consequently, we were able to remove the problem simply by standardizing the predictors. However, when standardizing your predictors doesn’t work, you can try other solutions such as:

  • removing highly correlated predictors
  • linearly combining predictors, such as adding them together
  • running entirely different analyses, such as partial least squares regression or principal components analysis

When considering a solution, keep in mind that all remedies have potential drawbacks. If you can live with less precise coefficient estimates, or a model that has a high R-squared but few significant predictors, doing nothing can be the correct decision because it won't impact the fit.

 

Control Charts: Rational Subgrouping and Marshmallow Peeps!


Control charts are used to monitor the stability of processes, and can turn time-ordered data for a particular characteristic—such as product weight or hold time at a call center—into a picture that is easy to understand. These charts indicate when there are points out of control or unusual shifts in a process.

Statistically speaking, control charts help you detect nonrandom sources of variation in the data. In other words, they separate variation due to common causes from variation due to special causes, where:

  • Common cause variation is variation that is naturally inherent in a process, and always present.
  • Special cause variation represents assignable or unusual sources of variation that are not typically part of a process. Special causes can be either detrimental or beneficial to a process.

(My colleague has gone into further detail about the different types of variation in another post.)

Control Chart

Statistical software like Minitab makes it easy to create and interpret control charts, but those charts are useless if they aren’t created using the right “subgroups” of your data. Understanding and choosing a rational subgroup size before you collect your data and create control charts is critical, but the concept is often misunderstood.

What Are Rational Subgroups?

A rational subgroup is a group of units produced under the same set of conditions. Rational subgroups are meant to represent a “snapshot” of the process. They also reflect how your data are collected, and represent the inherent (common cause) variation in your process at any given time.

For many processes, you can form rational subgroups by sampling multiple observations that are close together in time, but still independent of each other. For example, a die cut machine produces 100 plastic parts per hour. The quality engineer measures five randomly selected parts at the beginning of every hour. Each sample of five parts is a subgroup.

Within Subgroup vs. Between Subgroup Variation

You can use subgroups to separate the two types of variation in a process:

  • Within subgroup: the variation among measurements within subgroups; also known as common cause variation.
  • Between subgroup: the variation between subgroups that may be caused by specific identifiable factors, or special causes.

The control limits on a control chart are calculated using the variability “within” the subgroups. Therefore, it's important to select the subgroups so that only the common cause variation in the process is represented. The goal is to improve process quality by eliminating between-subgroup variation and reducing within-subgroup variation. It's also extremely important to remember that the subgroup information you specify in Minitab when you create your control chart should reflect how the data were actually collected.
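To make the "within subgroup" idea concrete, here is a small illustrative Python sketch with simulated measurements in subgroups of five. It computes Xbar-R control limits the textbook way, using the standard control chart constants for a subgroup size of 5 (A2 = 0.577, D3 = 0, D4 = 2.114); Minitab does this for you when you specify the subgroup size or column.

import numpy as np

rng = np.random.default_rng(7)
# 20 hourly subgroups of 5 parts each (simulated measurements)
subgroups = rng.normal(loc=10.0, scale=0.2, size=(20, 5))

xbar = subgroups.mean(axis=1)                         # subgroup means
r = subgroups.max(axis=1) - subgroups.min(axis=1)     # subgroup ranges

xbarbar, rbar = xbar.mean(), r.mean()
A2, D3, D4 = 0.577, 0.0, 2.114                        # constants for n = 5

print("Xbar chart LCL, CL, UCL:", xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)
print("R chart    LCL, CL, UCL:", D3 * rbar, rbar, D4 * rbar)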

Interactive Rational Subgrouping with Marshmallow Peeps!

To illustrate the topic of subgrouping and different subgrouping methods to her statistics and Six Sigma students, Dr. Diane Evans, associate professor of mathematics at Rose-Hulman Institute of Technology, incorporated a hands-on lab into her instruction that features Marshmallow Peeps!

The interactive lesson places students as Statistical Process Control (SPC) operators in a manufacturing setting where Peeps are produced by four different machines. The operators explore different subgrouping arrangements in order to supply the best possible information to their stakeholders about Peep “sponginess.” Computer science students at Rose-Hulman even developed a special “Peep Simulator” to aid the classes in collecting the data, http://www.rose-hulman.edu/~evans/peeps/php/new.php:  

Peeps Simulator

If you're attending the upcoming ASQ World Conference in Indianapolis May 6-8, Dr. Evans will present this exercise at 8:00 a.m. in Wabash 1. For more information on her presentation, check out the session details page.

We hope to see you there!

Of possible interest:

A Rational Look at Subgrouping in Control Charts

Control Charts: Subgroup Size Matters

Explaining Quality Statistics So Your Boss Will Understand: Pareto Charts


Rock and Roll and Pareto Charts!

I once had a boss who had difficulty understanding many, many things. When I need to discuss statistical concepts with people who don't have a statistical background, I like to think about how I could explain things so even my old boss would get it.

My boss and I shared a common interest in rock and roll, so that's the device I'll use to explain one of the workhorses of quality statistics, the Pareto chart. I'd tell my boss to imagine that instead of managing a surly gang of teenaged restaurant employees, he's managing a surly rock and roll band, the Zero Sigmas. The band did a 100-date tour last year, and before going on the road again, he wants to see which mishaps were most frequent, in hopes things might run a bit more smoothly. He's got a table of data, but it's a little difficult to figure out what the raw numbers mean.

He needs to create a Pareto chart with that data. It's a very straightforward tool, but therein lies the danger: because it looks like a standard bar chart, the Pareto chart can be misinterpreted. A well-intentioned (but statistics-impaired) boss may take one look at it and, assuming it's a regular old run-of-the-mill bar chart, imagine he's got it all figured out without actually thinking about what he's seeing. He'll want to make sure he really understands what the Pareto chart reveals.

What Does a Pareto Chart Do? 

In my boss's defense, there's really not much difference between a Pareto chart and your regular old run-of-the-mill bar charts, except that the Pareto chart ranks your defects (or whatever it is you're measuring) from largest to smallest. 

From a quality improvement perspective, this is important because it can help you identify which quality problems are the most critical in terms of volume, expense, or other factors. Once you've prioritized your challenges, you can focus improvement efforts where they'll have the largest benefits.

Organizations tend to use Pareto charts in one of two ways. 

Use 1: Determine the most common type of defect.

Use 2: Identify projects with the greatest potential returns or benefits.

In quality-speak, we say the Pareto chart separates the "vital few" problems from the "trivial many."  In other words, it gives you an easy way to visualize which problems have the biggest impact on your organization.

When you look at the data in a Pareto chart, you might find out, for instance, that even though there's a perception that customers complain more frequently about, say, shipping speed, your greatest volume of complaints is really about the voicemail system. Knowing that can help you tackle the problem that's most important to the most customers first.  

In keeping with the rock-and-roll theme, the Pareto chart will help us see which incidents on last year's tour kept the Zero Sigmas from rocking audiences to the fullest. 

Setting Up Data for the Pareto Chart

When you create a Pareto chart in our statistical software, your data must include the names of each defect. These names can be text or numeric. If your data are summarized in a table, you must include a column of frequencies or counts, with nonnegative numeric values for each defect. 

Let's say you've identified and tallied 9 types of mishaps that occurred with some regularity during last year's tour. You can arrange the data in a Minitab worksheet like this:

Incidents frequencies from 100-date rock and roll tour

To create a chart that shows the frequencies of these incidents graphically, we just select Stat > Quality Tools > Pareto Chart and enter Incident as our Defects data and Count as our Frequencies data. Minitab produces the following graph: 

Pareto Chart of Rock Tour Incidents

The right Y-axis shows the percent of the total mishaps accounted for by each type of incident, while the left Y-axis shows the count of those incidents. The red line indicates the cumulative percentage, which can help you judge the added contribution of each category. The bars show the count (and the percentage of the total) for each category. Below the bars, the counts, percents, and cumulative percents are listed for each incident category.

You'll notice the last grouping is labeled "Other." Your raw data didn't include an "Other", but by default Minitab puts all categories with counts that represent less than 5% of the total defect count into this "Other" category.

In this example, 27.9% of the incidents involved the Zero Sigmas starting their gig late, which they did every single night of the tour. Another 22.3% of the incidents involved the band's singer, Hy P. Value, forgetting the lyrics to his own songs. The combined, or cumulative, percentage for starting late and forgetting lyrics is just over 50%, and if you add in the guitars going out of tune, you've accounted for a whopping 67.9% of the incidents that plagued the band's 100-date tour.

In terms of overall numbers of incidents, it looks like these are the three areas you should focus on if you want the Zero Sigmas to kick out the jams more efficiently on the next tour. This illustrates how you would create a Pareto chart for the first use above: to determine the most commonly occurring types of defects. 
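If you're curious about what the software does behind the scenes, here is a short Python sketch of the core Pareto bookkeeping: sort the categories by count, compute percentages and cumulative percentages, and lump anything under 5% of the total into "Other". The counts are hypothetical, chosen only to roughly match the percentages in the chart above.

incidents = {                     # hypothetical counts for illustration
    "Started gig late": 100, "Forgot lyrics": 80, "Guitar out of tune": 63,
    "Broken string": 40, "Amp failure": 30, "Late tour bus": 20,
    "Lost setlist": 12, "Torn pants": 8, "Stage dive injury": 5,
}

total = sum(incidents.values())
# categories that individually account for less than 5% of the total go to "Other"
major = {k: v for k, v in incidents.items() if v / total >= 0.05}
other = total - sum(major.values())
ordered = sorted(major.items(), key=lambda kv: kv[1], reverse=True)
if other:
    ordered.append(("Other", other))

cumulative = 0
for name, count in ordered:
    cumulative += count
    print(f"{name:20s} {count:4d}  {100 * count / total:5.1f}%  "
          f"(cumulative {100 * cumulative / total:5.1f}%)")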

Limitations of the Pareto Chart

Although the Pareto chart is easy to create, understand and use, it does have some limitations:

  • Data collected over a short time period, especially from an unstable process, may lead to incorrect conclusions. If the data's not  reliable, you could get an incorrect picture of the distribution of defects. For example, while on tour Hy P. Value got laryngitis and was caught lip-syncing three times in the same week. Had you only looked at that week's worth of data, you'd have a distorted picture of how frequently lip-syncing incidents occurred throughout the tour. Just remember that the "vital few" problems can change frequently, and that short periods may not accurately represent your process as a whole.
     
  • Data gathered over long periods may include changes made to the process, so it's a good idea to see if you've got stratification or changes in the distribution over time.
     
  • If your initial Pareto analysis does not yield useful results, make sure you selected meaningful categories of defects. You also should make sure your "other" category is not too large.
     
  • A Pareto analysis is designed to help you get the biggest bang for your quality improvement buck, but it doesn't give you permission to ignore small, easily solved problems that can be fixed while you're working on the bigger issues.
     
  • Focusing on the areas of greatest frequency should decrease the total number of incidents (or defects). Focusing on the areas of greatest impact should increase the overall benefits of improvement.

About that last bullet: this example looks at the overall counts of incidents that happened on the Zero Sigmas' recent tour. But how do we know those are the incidents that had the biggest impact on the tour's overall rock-and-roll awesomeness? That's exactly the kind of great question my old boss would never have thought to ask, and it's the question I'll answer in my next post.

 

 

 

Talking Design of Experiments (DOE) and Quality at the 2013 ASQ World Conference


design of experiments for bowling balls

The 2013 ASQ World Conference is taking place this week in Indianapolis, Indiana, and it's been a treat to see how our software was used in the projects highlighted in many of the presentations. As a supporter of the conference, a key event for quality practitioners around the world, Minitab was proud to sponsor one of the presentations that seemed to get a lot of attendees talking. Scott Sterbenz, a Six Sigma leader from Ford Motor Company, delivered a presentation entitled "Leveraging Designed Experiments for Success," which explained how to make designed experiments succeed, drawing on examples from experiments conducted in a variety of situations.

In statistics, DOE refers to the creation of a series of experimental runs, or tests, that provide insight into how multiple variables affect an outcome, or response. In a designed experiment, investigators change more than one factor at a time, and then use statistical analysis to determine what factors are important and identify the optimum levels for these factors. It’s an efficient and economical way to improve almost any process. DOE can be used to create and analyze many different kinds of experiments, and Minitab's DOE tools can help investigators identify the best experimental design for their situation, based on the number of variables being studied and other conditions.
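To give a feel for what a designed experiment's run list looks like, here is a hedged Python sketch (not tied to either case study below) that builds a full two-level factorial for three hypothetical factors and then keeps a half fraction. Real projects, such as the 34-run Ford experiment described next, typically let DOE software like Minitab choose the fraction and track the confounding structure.

from itertools import product

factors = ["Temperature", "Speed", "Pressure"]   # hypothetical factor names

# Full 2^3 factorial: every combination of low (-1) and high (+1) settings
full = list(product([-1, 1], repeat=len(factors)))

# Half fraction defined by the generator C = A*B: keep runs where a*b == c
half = [run for run in full if run[0] * run[1] == run[2]]

print("Full factorial:", len(full), "runs")
print("Half fraction: ", len(half), "runs")
for run in half:
    print(dict(zip(factors, run)))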

One of the examples Sterbenz talked about involved a Design of Experiments project he led at Ford. Shortly before the 2011 Ford Fiesta was supposed to make a very high-profile debut, a cosmetic problem with the vehicle’s carpet arose. Given that the redesigned Fiesta embodied the idea that an affordable car needn't skimp on high-quality interiors, the issue needed to be solved, and fast. Sterbenz and the company’s Body Interior Six Sigma team met that challenge with a fractional factorial designed experiment that could give them the information they needed in only 34 runs.

When data from the 34 runs were analyzed in Minitab, the results revealed complex interactions between the different settings on the equipment used to make the carpet. The interactions explained why previous adjustments to individual settings had failed to find a way to eliminate the problem. This designed experiment not only provided the team with a list of significant variables and interactions, but also with equations to show how the inputs affected the responses. Even better, the results showed that optimization settings for eliminating brush marks did not have an adverse effect on the plushness. Incredibly, the entire project took 12 days, from the time the problem was defined to the point where the solution was in place and the process was under control.

Sterbenz is an avid bowler, and he also shared an example of how the U.S. Bowling Congress (USBC) used a designed experiment to assess whether or not to modify one of its existing specifications for bowling balls. The team was looking at the requirements surrounding a ball's static weight, and had identified six distinct factors to assess. Testing each factor one-by-one would be prohibitively expensive and time-consuming, so the USBC team used Minitab’s DOE tools to create a designed experiment in which they could change more than one factor at a time, then use statistics to determine which ones have significant effects on a ball's path along the bowling lane.

When the team used Minitab to analyze their data, in addition to obtaining an average of the shots and graphing the ball path, they also got a big surprise. Previous research had identified three distinct, mathematically predictable phases of bowling ball motion, and the analysis showed that a ball's static weight appeared to have very little effect on these phases. But analyzing the newly collected data with Minitab revealed that a ball's static weight could result in a previously unidentified fourth phase of motion. This DOE not only demonstrated the necessity of the existing specifications for static weight, it also enhanced the organization's understanding of the factors that affect a bowling ball's path and opened up new areas of study for the USBC.

In an hour, Sterbenz demonstrated how DOE can be used to improve everything from the carpet in an automobile to our understanding of bowling-ball physics, but the range of challenges that might be addressed through designed experiments is literally unlimited. It's a good bet many of the people who attended Sterbenz's presentation left with ideas about how they could apply DOE in their own organizations.  What about you? 

Have you used a designed experiment to solve a quality challenge?

 

How to “Expand” Your Gage Studies


As we said in yesterday’s post, it’s been exciting for Minitab to be a supporter of the ASQ World Conference on Quality and Improvement taking place this week in Indianapolis. There have been many great sessions and an abundance of case studies shared that highlight how quality teams worldwide are improving the performance of their businesses.

One session that generated a lot of interest from the conference participants was conducted by Minitab trainers Lou Johnson, Daniel Griffith and Jim Colton.

Presenting Expanded Gage R&R Studies at the ASQ WQCI Event

Their presentation, Sampling Plan for Expanded Gage R&R Studies, covered Gage R&R studies and how the Gage R&R tools in Minitab can help you account for all of the important factors in your study and overcome some of the roadblocks that traditionally make these types of studies challenging.

What is a Gage R&R Study?

Let’s take a step back … What can Gage R&R studies tell you? Well, for starters, Gage studies can tell you if your measurement system is producing data you can trust. After all, if you can’t trust your measurement system, then you can’t trust the data that it produces. Specifically, Gage R&R studies can also help you understand if your measurement tools are consistent, if your measurement system is sensitive enough, and if the people taking the measurements are consistent.
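For readers who want to see the arithmetic behind a basic crossed Gage R&R, here is a hedged Python sketch using the classic two-way ANOVA method. The measurements are simulated (10 parts, 3 operators, 2 replicates), and the variance-component formulas are the textbook ones, so treat it as an illustration of the idea rather than a substitute for Minitab's Gage R&R output.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)                 # simulated gage study for illustration
parts, operators, reps = 10, 3, 2
part_eff = rng.normal(0, 2.0, parts)           # part-to-part variation
oper_eff = rng.normal(0, 0.5, operators)       # operator (reproducibility) variation
rows = [{"Part": p, "Operator": o,
         "Y": 20 + part_eff[p] + oper_eff[o] + rng.normal(0, 0.3)}
        for p in range(parts) for o in range(operators) for _ in range(reps)]
df = pd.DataFrame(rows)

anova = sm.stats.anova_lm(ols("Y ~ C(Part) * C(Operator)", data=df).fit(), typ=2)
ms = anova["sum_sq"] / anova["df"]             # mean squares

# classic crossed Gage R&R variance components (negative estimates set to zero)
repeatability   = ms["Residual"]
interaction     = max((ms["C(Part):C(Operator)"] - ms["Residual"]) / reps, 0)
reproducibility = max((ms["C(Operator)"] - ms["C(Part):C(Operator)"]) / (parts * reps), 0)
part_to_part    = max((ms["C(Part)"] - ms["C(Part):C(Operator)"]) / (operators * reps), 0)

print("Repeatability:  ", round(repeatability, 3))
print("Reproducibility:", round(reproducibility + interaction, 3))
print("Part-to-part:   ", round(part_to_part, 3))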

Expanded Gage R&R Tools in Minitab

Traditionally, Gage R&R studies focus on just two factors: parts and operators. But what if you want to take your measurement systems analysis beyond just parts and operators and include other factors that also influence your measurement system?

If you don’t include these factors, you’ll never know how much they affect measurement variability, and your measurement system may look better or worse than it really is. Expanded Gage R&R in Minitab lets you include up to eight additional factors, so you can investigate the impact of all the factors that may affect your measurement system.

For example, say that testers working on a hectic production floor are concerned that shifts in production speed are affecting their ability to measure. By adding line speed as a factor to their gage study, they are able to see that changes in line speed influence their measurements even more than they suspected.

Expanded Gage R&R in Minitab

You can find Expanded Gage R&R in Minitab by navigating to Stat > Quality Tools > Gage Study > Gage R&R (Expanded). The dialog window for this tool may look a little different than the traditional Gage R&R dialog, but the rules remain the same: you still need to enter columns for Part, Operator, and Measurements:

Expanded Gage R&R

For more on performing an Expanded Gage R&R study in Minitab, check out this article as well as these additional resources:

Minitab Tutorial: Fundamentals of Gage R&R

Unbalanced Designs and Gage R&R (Expanded)

Accuracy vs. Precision: What’s the Difference?

What’s your experience in performing Gage studies?

If you attended ASQ World, what kinds of take-aways are you headed back to work with? Please share in the comments below.

The Diversity (and Consistency) of Quality Improvement: the 2013 ASQ ITEA Presentations


I'm in the airport at Indianapolis, waiting to go home after three exciting days at the 2013 American Society for Quality World Conference.  As I write this, it's Wednesday evening after the conference has closed, and it turns out my flight has been delayed.

This could give me ample opportunity to muse about the quality issues that might keep me from reaching central Pennsylvania tonight. But I'm kind of pumped up, so I'm more interested in thinking about what I've experienced and seen over the past few days. This is the kind of event that makes you want to keep focusing on the positive, not fretting about things outside of your control!

The ASQ ITEA Awards ProcessAs I mentioned on Tuesday, Minitab has a presence at the ASQ World Conference (and many other quality-related events throughout the year) because we are dedicated to helping quality improvement practitioners. That's why we don't just attend the show as exhibitors, but our own quality improvement specialists frequently present at the event. But we are involved in many other ways.

This year, for the first time, Minitab was proud to sponsor the International Team Excellence Awards process. 

If you're a quality practitioner and you're not already aware of the ITEA, I encourage you to check it out. The annual ASQ International Team Excellence Award Process (ITEA) celebrates the accomplishments of quality improvement teams from a broad spectrum of industries around the world. The 2012-2013 teams don't have all of their information available on the ASQ web site yet, but you can see how the 2011-2012 team finalists saved millions of dollars for their companies and showcased exceptional projects and processes.

The teams who presented at this year's event were impressive. I made it a point to attend as many of the ITEA finalist presentations as I could fit in, and was amazed by the diversity of challenges they faced and the variety of approaches they took to solve them. And while not every project involved extensive data analysis (though many did), it was great to see how many different ways a process, product, or service can be improved through the analysis of data.

It's also gratifying to see how many of those organizations have turned to Minitab to make data analysis easier.  Nearly all of the presentations I attended included Pareto charts, control charts, or designed experiments that were created with Minitab Statistical Software. Improving quality is a big, challenging task, and we strive to make the data analysis component of quality improvement as easy as possible, so that people can devote more attention and energy to other tasks in their projects.

In the coming weeks, I'll try to share some insights I gleaned from sitting in on presentations made by teams from Baxter Healthcare, Nokia HERE Mapping, Coca-Cola, and many others whose projects made a huge difference at their companies. In the meantime, though, I'll offer just one observation from watching them. As different as all of these teams were, as varied as their challenges were, and as wide-ranging as their degrees of success may have been, all of the teams displayed two consistent traits: enthusiasm and optimism.

As part of the presentation process, these teams discuss how they identify and select the projects they work on, and some of the challenges these teams faced were truly daunting. But these teams took on the challenges and made change, beneficial change, happen. They didn't say "That's the way things are."  They didn't say "There's nothing we can do about that."  They didn't say "That's not my department."  They didn't say anything about what couldn't be done; instead, they all looked at their problems and focused on what could be done, and then did the best they could to do it. And in most cases, they met or exceeded the goals they set for themselves.

That's pretty inspirational, and I feel incredibly fortunate to work for a company that so many of these teams look to for tools that will make a big part of their quality improvement task easier.

 

Which Big Ten Division is Better?


Big Ten Logo

After another round of what seems like endless conference realignment, the Big Ten has settled on 14 teams split into two divisions: East and West. However, with the likes of Ohio State, Penn State, Michigan, and Michigan State, the East division appears to be much stronger. In fact, Indiana athletic director Fred Glass called it the “Big Boy Division,” and Penn State coach Bill O'Brien referred to it as “Murderers' Row.”

But will the statistics back up their claims? After all, it’s easy to spout off any opinion you want. I could claim that the Sun Belt is a better football conference than the SEC.  But without any kind of data analysis to back up my opinion, it’s meaningless. So before we go claiming that the East division is going to dominate the West for years to come, let’s see if the statistics agree.

The Data

I collected the conference winning percentage of every Big Ten team over the last 10 years. I stuck with only conference games because the non-conference schedule can vary a great deal from team to team. Sticking with only conference games gives us a similar sample of games for each team. Of course, Maryland, Rutgers, and Nebraska didn’t all play in the Big Ten for the last 10 years. But Big East jokes aside, their conferences really weren’t that different from the Big Ten. Plus, it’s not like Rutgers and Maryland were dominating the Big East and ACC. So for the purposes of this data analysis, we’ll use their record from their respective conferences.

But winning percentage doesn’t always tell us everything. Last year Michigan State was 3-5 in the Big Ten, but that was because they went 2-5 in Big Ten games decided by 4 points or less. On the season, they actually scored more points against conference opponents (159) than they allowed (149). Compare that to Purdue, who also had a conference record of 3-5, but scored 189 points while allowing a whopping 265. So despite having the same conference record, Michigan State was really a better team than Purdue last season because they had a higher scoring differential. Because of this, I’m going to use scoring differential along with winning percentage to compare the two divisions.

You can download a Minitab worksheet with the data in it here. I invite you to open Minitab (or download a free 30-day trial version if you don't already have it) and follow along!

Comparing Winning Percentage

Taking the winning percentage of all 14 Big Ten teams over 10 years gives us 140 observations. I split them up into two groups (East and West) and used Minitab Statistical Software to run a 2-Sample t analysis on the data.

Minitab's 2-Sample t Analysis

Boxplot

We see that the winning percentages for both divisions are about 0.500. The East is a little higher at 0.516 compared to 0.476 for the West. But the p-value of 0.35 is not less than 0.05, meaning that the difference is not statistically significant. The boxplot clearly shows that there really isn’t a big difference in the winning percentages between the two divisions over the last 10 years.
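To reproduce this kind of comparison outside of Minitab, you can run a two-sample t-test in a few lines of Python; the winning percentages below are placeholder values, not the 140 observations from the worksheet.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(10)
# placeholder winning percentages: 70 team-seasons per division
east = np.clip(rng.normal(0.516, 0.25, 70), 0, 1)
west = np.clip(rng.normal(0.476, 0.25, 70), 0, 1)

# Welch's t-test, which does not assume the two divisions have equal variances
t_stat, p_value = ttest_ind(east, west, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")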

But we already stated that winning percentages don’t always tell the full story. So will we get a different result if we use scoring differential instead?

Comparing Scoring Differential

Just like above, I split the scoring differentials for each team into two groups and used Minitab to perform a 2-sample t analysis.

Minitab's 2-Sample t Analysis

Boxplot

Over the last 10 years, teams in the Eastern division have outscored their conference opponents by an average of 7 points per season while teams in the West have been outscored by 10.3 points. However, the p-value is 0.244, which again means there is not enough evidence to conclude that this difference is statistically significant.

And for fun, if you’re wondering which team is the outlier on the Boxplot, it’s the 2005 Illinois team, which was outscored by 257 points over 8 Big Ten games. Included in those games was a 61-14 loss to Michigan State, a 63-10 loss to Penn State, and a 40-2 loss to Ohio State. Ouch!

But back to the data analysis. It’s usually a good idea to determine the reason (if there is one) why you reach your conclusions. In this case, is there a specific reason why we can’t conclude there is a difference between divisions? To do this, let’s break down the data by team. The table below shows how each team ranks over the past 10 seasons in scoring differential per season.

Rank   East Team        Scoring Differential   West Team      Scoring Differential
1      Ohio State       110.4                  Wisconsin      70.6
2      Penn State       46                     Iowa           28.8
3      Michigan         39.8                   Nebraska       12.6
4      Michigan State   13.2                   Purdue         -17.1
5      Rutgers          -3.1                   Northwestern   -36.4
6      Maryland         -27.8                  Minnesota      -52.6
7      Indiana          -129.3                 Illinois       -77.9
 
Comparing similarly ranked teams, you’ll see that the East Team is higher than the West Team in every case, minus one. Over the last 10 seasons, Indiana has been outscored by 129.3 points each season! That’s 51 more points than the last place team in the West Division! And do you know how many wins Indiana has over Eastern Division teams in the last 10 years? ONE! They just have a single win over Michigan State in 2006! In Indiana’s defense, they didn’t get to play Rutgers or Maryland. But still, things don’t look good for the Hoosiers moving forward.
 
So is Indiana the only thing keeping us from concluding that the East Division is in fact “Murderers' Row”? We can remove both Indiana and Illinois (the last-place teams) to find out.

Minitab's 2-Sample t Analysis

Boxplot

Without Indiana, the Eastern teams now have an average scoring differential of 29.8 points per season, while without Illinois the West only rises to an average of 1 point. And this time the difference is statistically significant, since the p-value is less than 0.05. You can also see in the boxplots how the East has higher values for scoring differentials while the West has much lower values.

So Is the East Division Really Murderers' Row?

We cannot conclude that “top to bottom” the East Division is better than the West Division. Because Indiana has led the Big Ten in futility the last 10 years, there isn’t a significant difference in the winning percentages or scoring differentials between the two divisions. But what if you’re just talking about the teams at the top? We’ve shown that without the cellar dwellers, the East is in fact better than the West. Based on the last 10 years, I’d much rather play the top teams in the West than the top teams in the East.

But the fun part about sports is that the next 10 years aren't going to be the same as the last 10. There are so many unanswered questions that will affect the competitiveness of each division. Will Wisconsin remain an elite team without Bret Bielema? Can Penn State stay competitive throughout their period of sanctions? How will Rutgers and Maryland adjust to playing in the Big Ten? Will Indiana get a division win in the next decade?

But Hoosier jokes aside (c'mon, they’re just so easy!) this statistical analysis has shown that the top teams in the East do appear to have an edge over the top teams in the West. Whether that edge will persist going forward remains to be seen. Yes, the East Division may be up 7-0, but the game has just begun!


Get Your Way, Every Time: 7 Default Settings in Minitab You Didn’t Know You Could Change


little girl

Unless you're 3 years old, you probably can't have things just the way you want them all the time.

You can’t always have peanut butter and ranch dressing on your toast. Or ketchup on your pineapple. Or sugar sprinkles on your peas.

But there is one small arena in life over which you can still exert your control. 

Tools > Options in Minitab's statistical software allows you to change selected default settings in the software, without having to throw a temper tantrum first.

This powerful, underutilized feature in Minitab may save you from the inconvenience of having to change a default setting over and over again, every time you use the software. To open it in Minitab, choose Tools > Options.

tools option menu

In the left pane, expand nodes to see the list of default options you can modify. Once you change an option, it becomes the default setting every time you open Minitab.

Here are 7 default settings you may find it useful to customize:

1. Add Worksheet Names to Every Graph

Worksheets, worksheets, everywhere. Graphs galore. Once you start creating graphs from multiple worksheets in a project, it can get pretty confusing trying to sort out which graph was created with which worksheet.

The solution? Have Minitab automatically print the name of the worksheet used to create the graph right on the graph itself:

tools option worksheet name

Now you’re set. No need to go back and try to match worksheets with graphs. Or manually add text boxes to indicate the worksheet for each graph.

worksheet name


2. Customize Default Descriptive Statistics

Not a fan of SE of Mean? Third quartile just doesn’t do it for you? Prefer to see the variance and range displayed by default? 

Avoid getting carpal tunnel syndrome from repeatedly checking and unchecking statistics every time you perform Stat > Basic Statistics > Display Descriptive Statistics. Define your dream list of descriptive stats for Minitab to display by default.

tools options desc stats


3. Define a Go-to List of Special Cause Tests for Control Charts

There are 8 tests for special causes that you can use to flag out-of-control points on a Minitab control chart. Using more tests increases the sensitivity of the control chart, but may also increase the rate of “false alarms.” That’s why the type and number of tests used often depends on your application and your industry. Your company may even have specific guidelines for which tests to use.

To avoid changing the default tests every time you create a control chart, select your preferred tests in Tools > Options. Note you can also change the default number of data points used for each test.

tools options tests


4. Enable Command Language

Once upon a time, long, long ago, in a faraway land, users could only operate Minitab by using a cryptic, mysterious code called command language. Although those geek glory days of Minitab have long since passed, there still exists a die-hard subset of Minitab users who prefer to go commando.

Rather than having to choose Editor > Enable Commands at the start of every new Minitab session, these high-tech samurais of yore can enable the command language by default, so it appears automatically in the Session window every time they open Minitab:

command  language

(If you don’t know what command language is, ignore this tip. You may never need it!)

5. Modify the Size and Font of Input/Output

Eyesight not what it used to be? Fussy about fonts? Don’t waste time searching for your reading glasses or trying to change fonts and colors after the fact.

You can modify the default style of characters in your output (in both session window and graphs), or the data values and column labels in your worksheets.

tools options fonts


6. Change the Default Bar Chart Setup

Tend to make a bar chart the same way every time? For example, do you usually create it from data values summarized in a table in the worksheet? Annoyed that you always have to change the default setting to do it that way?

Quoth the Tools > Options raven: “Nevermore!”

crow tools option


7. Give Me My Benchmark Z's...or Else!

Some quality improvement specialists prefer their capability analysis output expressed in terms of the sigma-capability of a process, otherwise known as Benchmark Z (or Z-bench).

There’s no need for Z-bench lovers to make a special request for these calculations every time they run a capability analysis in Minitab:

capa

What If You Change Your Mind Like a Toddler...10 Seconds Later?

The great thing about changing default settings in Tools > Options is that they’re easy to change back at any time.

And if you don't remember what you've changed and what you haven't, you can change all the settings back to their original defaults in one easy maneuver:

  1. Choose Tools > Manage Profiles > Manage.
  2. Move all profiles from Active Profiles to Available Profiles. (So the Active Profiles field is blank.)
  3. Click OK.

If that doesn't do it, try this workaround:

  1. Roll naked on the floor.
  2. Flail your arms and kick your legs.
  3. Scream at the top of your lungs.

Repeat as necessary until someone gives you the default settings that you want. It never fails...

Using Games to Teach Statistics


gamer

We usually think of games as a distraction—just something we do for fun. However, growing evidence suggests that games can do much more, especially when it comes to learning in a classroom setting.

Because statistics is a topic that doesn’t come easily to most, using properly designed games to teach statistics can become a valuable tool to spark interest and help explain difficult concepts.

So what kinds of “properly designed” games are we talking about here? Not traditional board games like Monopoly or Chutes and Ladders, but interactive computer games—the types of games younger generations have grown up with.

Dr. Shonda Kuiper, Associate Professor and Chair of the Mathematics and Statistics Department at Grinnell College, Kevin Cummiskey, Assistant Professor at the United States Military Academy, and Colonel Rod Sturdivant, Associate and Academy Professor at the United States Military Academy, have been exploring the use of games in their classrooms for many years.

Dr. Kuiper, Professor Cummiskey, and Col. Sturdivant started their research because they realized the challenge of getting their students to not only become proficient in executing statistical tests, but to also understand statistics for use in the real-world and in the context of the larger research process.

“We shifted the focus from statistical calculations without a tie to the context of scientific research,” says Kuiper. “Our materials provide an alternative to lectures and textbook style problems that incorporate research-like experiences in the classroom.”

As a way to incorporate these methods into their instruction, Kuiper and Studivant introduced game-based labs to their students.

“The labs leverage students’ natural curiosity and desire to explain the world around them—so they can experience both the power and limitations of statistical analysis,” says Kuiper.

Below you’ll find an example of how the online computer game “Tangrams” can be used to teach hypothesis testing, and how statistical software like Minitab can help to analyze the data.

Teaching Hypothesis Testing with “Tangrams”

Tangrams is a web-based puzzle game that requires a player to solve a puzzle by covering an image with a set of shapes by flipping, rotating, and moving the shapes.

Tangrams

Prior to starting the game, the class decides upon one or more research questions they want to investigate as a group. For example, students may decide to test whether the game completion times differ based on the type of music that is played in the background, and then they translate the research question into testable hypotheses.

Students design the experiment by determining appropriate game settings and conditions for collecting the data. After the student researchers design the experiment, they become subjects in the study by playing the game. 

Tangrams

The Tangrams website collects the players’ information and records their completion times, and the data is available for immediate use through the website. The students return to their role of researcher to analyze the data that they collected. 

Next, using statistical software like Minitab, students calculate basic summary statistics and plot histograms of the Tangrams completion times. Students discuss and make decisions about data cleaning, such as removing outliers, and then check assumptions, conduct appropriate statistical significance tests, and state their conclusions. 
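As a rough illustration of that workflow, here is a hedged Python sketch with made-up completion times for two background-music conditions: summary statistics, a simple outlier screen, and a two-sample t-test. The real analysis would use the data downloaded from the Tangrams site.

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)               # made-up completion times
times = pd.DataFrame({
    "music":   ["classical"] * 30 + ["rock"] * 30,
    "seconds": np.concatenate([rng.normal(95, 20, 30), rng.normal(110, 25, 30)]),
})

print(times.groupby("music")["seconds"].describe())   # summary statistics

# crude data cleaning: drop times more than 3 standard deviations from the mean
z = (times["seconds"] - times["seconds"].mean()) / times["seconds"].std()
clean = times[z.abs() <= 3]

classical = clean.loc[clean["music"] == "classical", "seconds"]
rock = clean.loc[clean["music"] == "rock", "seconds"]
t_stat, p_value = ttest_ind(classical, rock, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")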

“While playing the role of a researcher, students are forced to make decisions about outliers and possibly erroneous data,” says Kuiper. “They experience ‘messy’ data that often make model assumptions highly questionable.”

Many students responded very favorably to these types of game-based labs, commenting that they liked being involved in the data collection process because it made the data “real” to them.

“As a group, students enjoyed playing the games,” says Cummiskey. “The labs seemed to truly engage students who were otherwise quiet throughout the semester.”

An Application to Quality Improvement Statistics?

Through the National Science Foundation (NSF) supported grants (NSF DUE #0510392 and NSF DUE #1043814), Dr. Kuiper and others provide materials that can be used as projects within an introductory statistics course or to synthesize key elements learned throughout a second statistics course. The materials can also be used to form the basis of an individual research project and to help students and researchers in other disciplines to better understand how statisticians approach the scientific process.

The lessons could certainly be modified or used as-is in a classroom setting for quality improvement professionals learning statistics to complete Lean Six Sigma and other types of improvement projects. If you're a Master Black Belt or Green Belt instructor, perhaps the ideas in this post will inspire your instruction!

Upcoming Workshop at 2013 USCOTS

Are you headed to Raleigh-Durham, North Carolina for the upcoming USCOTS Meeting? Learn more from Dr. Kuiper and Col. Sturdivant by attending their workshop, “Playing Games with a Purpose: A New Approach to Teaching and Learning Statistics.” The workshop will begin at 9:00 a.m. on Thursday, May 16. The workshop is free, but preregistration is required: http://www.causeweb.org/uscots/workshops/.  

You can also read more about Dr. Kuiper and Dr. Sturdivant’s research in the paper, “Using classroom data to teach students about data cleaning and testing assumptions,” which is freely available at http://www.frontiersin.org/Quantitative_Psychology_and_Measurement/10.3389/fpsyg.2012.00354/abstract

Sample materials and datasets are also freely available at http://web.grinnell.edu/individuals/kuipers/stat2labs/.

What kinds of teaching methods have been helpful for you in learning statistics? Please share below in the comments section.

Planning Summer Fun with Decision Matrix Tools


Normally, I tell you about ways to practice with Minitab Statistical Software so that you can boost your confidence with statistical analysis. But over the last few days in my house, we’ve been planning some activities for the family. That planning has given me a chance to have some fun with Quality Companion.

Quality Companion is a substantial piece of software: everything that you need to manage a quality improvement project in one application. Quality Companion provides project management tools so that you can make and communicate decisions.

My favorite tools in Quality Companion, with apologies to the fans of FMEA and value stream mapping, are a set of forms that I collectively call decision matrices.

  • C&E Matrix
  • Project Prioritization Matrix
  • House of Quality Matrix
  • Pugh Matrix
  • Solution Desirability Matrix

The decision matrices all work roughly the same way. You have a series of input variables, such as proposed solutions to a problem or steps in a process. You want to evaluate the input variables against a set of output categories, such as the effect on customers or other selection criteria. You score each input variable for each output category, and the highest scores are the best.

Here’s the simple example that comes from looking at some potential family outings for warmer weather. I used the Solution Desirability Matrix form.

Solution desirability matrix

One of the nice features in Quality Companion is that the forms include a Pareto chart to summarize the results.

Pareto chart of the solution desirability matrix

The Pareto chart helps show that going fishing and going to the movies tie for the highest score. Going to see the raptor show at the nature center came in last.

Family recreational boating and fishing on lake

I've completed the entire matrix by myself, though, so it may not accurately reflect the feelings of the entire family. For example, I gave fishing and swimming the highest scores for time because our family has spent an entire day on them before, but another family member might score shorter activities as better. If you're more democratically minded about decisions, you can create a ballot right from your matrix and let people vote in Quality Companion. We'll look more closely at that process next time.
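For readers without Quality Companion, the bookkeeping behind a matrix like this is easy to sketch. Here is a small Python example of a weighted variant; the scores and weights are invented, chosen only so the totals match the tie described above.

import pandas as pd

# hypothetical 1-5 scores for each activity against each criterion
scores = pd.DataFrame(
    {"Cost": [5, 4, 3, 2], "Time": [5, 4, 4, 3], "Fun": [4, 5, 4, 3]},
    index=["Fishing", "Movies", "Swimming", "Raptor show"],
)
weights = pd.Series({"Cost": 3, "Time": 2, "Fun": 5})   # invented criterion weights

totals = (scores * weights).sum(axis=1).sort_values(ascending=False)
print(totals)   # highest weighted total = most desirable option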

In the meantime, if you’re intrigued by Quality Companion, you can download a free trial, getting started guide, and accompanying project files. The C&E Matrix is on page 22 of the getting started guide. I’m sure once you try them, you’ll love decision matrices as much as I do.

The image of the family recreational boating and fishing on lake is by Gentry George, U.S. Fish and Wildlife Service

Expanding the Role of Statistics to Areas Traditionally Dominated by Expert Judgment

Doctor examining a patient

Should this doctor consult a regression model?

In a previous post, I wrote about how the field of statistics is more important now than ever before due to the modern deluge of data. Because you’re reading Minitab's statistical blog, I’ll assume that we’re in agreement that statistics allows you to use data to understand reality. However, I’d also bet that you’re picturing important but “typical” statistical studies, such as studies where Six Sigma analysts determine which factors affect product quality. Or perhaps medical studies, like determining the effectiveness of flu shots.

In this post, I’m going to push that further, much further. So far that it might even make you feel a bit uncomfortable. It’s not that I want you to feel discomfort, but I hope you can find new areas to apply statistical analyses. I’m talking about areas that are considered human specialties: intuition, experience, professional judgment, decision-making, and even creativity.

Human Judgment Is Often Insufficient

Life is complicated and it forces people to process a large amount of information, often in real time, to make complex decisions. Humans have developed mental shortcuts, intuition, and accepted wisdom in order to avoid information overload. It’s often not possible to process all of the relevant variables, their interactions and correlations, weight them all properly, and avoid personal biases. People are also easily swayed by detailed stories, irrelevant information, and anecdotal evidence.

Studies in the cognitive and behavioral sciences have consistently shown that our brains are not up to the task. However, statistical analyses can solve these problems. Studies that date back to at least the 1950s have repeatedly shown that even simple statistical models can produce better predictions than expert judgment. Let's check out some examples that show how statistical analyses have outperformed experts.

Six Examples of Predictive Analytics and Statistical Decision-Making Tools

Predicting wine vintage prices

Wine glass

Orly Ashenfelter, a Princeton economist, fit regression models that predict the price of wine vintages. These models include predictors such as temperatures and rainfall, among others. Wine experts dismissed the regression models until the model’s predictions beat the experts to the punch on identifying several “vintages of the century.” (Use Regression in Minitab to Make Predictions)

Predicting cardiac risk in the emergency room

Cardiovascular complications pose one of the most significant risks to patients undergoing major surgery. Ideally, cardiologists identify the patients with a high risk before surgery so they can take steps to reduce the risk. Unfortunately, the risk of complications is notoriously hard to predict even for cardiologists. Lee Goldman, a cardiologist, used statistical methodology to create an easy-to-use checklist that estimates cardiac risk. Over the years, this simple tool has proven to outperform the assessment of emergency room doctors. It was originally developed in 1977 but it was not implemented until the mid-1990s.

Predicting the correct diagnosis at the point of care

Isabel is a computer-based clinical diagnostic support system (DSS). This system uses patient variables and symptoms to calculate the most probable diagnoses from over 11,000 possible diagnoses, a task that simply isn’t possible for humans. A study found that when key features from 50 challenging cases, reported in the New England Journal of Medicine, were entered into the system, it provided the final diagnosis in 48 cases (96%).

The Journal of the American Medical Association has concluded that misdiagnosis is increasing. An autopsy study found that 20% of fatal cases are diagnosed incorrectly! So you’d think that using software like this would be a good thing! However, a 2007 study found...well, the title of the article in the Journal of Medical Decision Making says it all: Patients derogate physicians who use a computer-assisted diagnostic aid. Physicians are aware of this perception and, as a result, many don’t use these valuable tools.

Predicting academic achievement for college admissions

Academic admissions counselors attempt to predict the future academic achievement of the candidates they review. However, decades of psychology studies have found that high school rank and standardized test scores correlate more strongly with eventual academic achievement than expert judgment does.

Are you wondering if a reasonable compromise might be to use the numeric measures in conjunction with human judgment? Researchers have studied this as well. They found that academic achievement is better predicted by standardized scores alone than by the scores plus expert judgment.

Predicting job performance to select job candidates

Job interview

Human resource (HR) professionals try to predict the future job performance of the applicants. A survey of HR executives reveals that the traditional unstructured job interview is widely considered to be more effective than pencil-and-paper tests that assess aptitude, personality, and general mental ability.

However, a meta-analysis finds that these types of tests are consistently better predictors than the unstructured interviews. Scholarly surveys have found that HR professionals are aware of these persistent findings, but they don’t think the pattern applies to them!

Predicting movie profits to influence script writing

The last example comes from a recent New York Times article and it is a new one for me. You’d think that script writing would remain a creative activity. However, big money is riding on a movie’s success. Scripts are approved based on a prediction that the movie will make money, and I’m sure you see where this is heading based on the other examples. Statistician Vinny Bruzzese has developed a methodology to statistically analyze the story structure of the script in order to predict profitability. Based on the analysis, Bruzzese suggests alterations to the script that are designed to increase the predicted profits.

While the methodology is proprietary, it appears that Bruzzese maintains a very detailed movie database which he uses to create a statistical model that relates the characteristics of movies to their profits. Bruzzese hasn’t released data that would allow us to evaluate the results. However, he is reportedly expanding his business and some industry executives are saying that everyone will be doing this soon. Not surprisingly, this statistical intrusion into the creative world is causing much controversy among the creative people!

Pushing the Boundaries

I mentioned that some of these examples might make you feel uncomfortable. Does the idea that it’s better to select a job candidate based on a test score than an interview make you squirm? Perhaps you don’t like the idea that expert opinion actually lowers the reliability of predictions for academic achievement? How about the idea that your risk of complications may be determined by a checklist rather than a doctor’s assessment in the ER? Personally, I’m a writer, so the script analysis example is a bit scary!

If you do feel discomfort, you’re far from alone. In fact, there was great resistance to all of the statistical models in the examples because they intrude on human judgment. In some cases, the overwhelming results have changed minds. No one is talking about getting rid of Goldman’s cardiac risk assessment. But, that’s not always true.

After decades of research, many HR executives and admissions committees still do not understand that test scores are proven to outperform human judgment. And, patients still think that doctors who use computer tools are less competent than those who don’t. This occurs despite the fact that a meta-study, which assessed 100 studies, found that an overwhelming majority concluded that doctors who consult diagnosis predictions based on statistical analyses are correct more often than unaided doctors.

Unfortunately, it is more socially acceptable to rely on judgment rather than test scores, ratings, and formulas, even when the data suggest otherwise.

The Times They Are a-Changin’!

Don’t fret; there is a positive message behind all of this. Ultimately, this is a story about playing to our strengths. It turns out that we aren’t always as good as statistical models when it comes to predicting outcomes for complicated scenarios. However, the good news is that we excel at creating the statistical models and decision-making tools that produce better predictions! And, it has been powerful statistical software, like Minitab, that has been instrumental in unlocking this potential to produce the best predictions possible.

This blog is really a challenge to you. Find new and innovative ways to apply statistics. Push the envelope. Don’t let the old boundaries and discomfort hold you back!

My thoughts on this matter stem from a personal experience, and that will be the subject of my next post.

Explaining Quality Statistics So Your Boss Will Understand: Weighted Pareto Charts

is this machine properly calibrated?

Failure to properly calibrate this machine will result in defective rock and roll. 

In my last post, I imagined using the example of a rock and roll band -- the Zero Sigmas -- to explain Pareto charts to my music-loving but statistically-challenged boss. I showed him how easy it was to use a Pareto chart to visualize the defects or problems that occur most often, drawing on various incidents that occurred on the Zero Sigmas' last tour.

The Pareto chart revealed that starting performances late was far and away the Zero Sigmas' most frequent "defect," one that occurred every single night of the band's 100-day tour.

This is the point at which my boss would say, "I get it!  We just need to make sure the Zero Sigmas hit the stage on time, and everything will be swell!" 

"Not so fast there, sir," I would have to reply. "There's a question that this Pareto chart of frequency doesn't answer." 

Pareto Chart of Rock and Roll Tour Incidents by Frequency

 

Are the Most Frequent Defects the Most Important? 

We know the Zero Sigmas started every show late, making that the defect that occurred most often, and this information is valuable. It's also useful to see how frequently singer Hy P. Value forgot the words to his songs and greeted the wrong city when he hit the stage. ("Hello, Albuquerque!" was correct on only one night of the tour.)

All of these are incidents we'd like to happen much less frequently. But are they equal? Looking at just the raw counts of the incidents assumes all problems or defects are the same in terms of their consequences.

You can see why this is problematic if you think about defects that might occur in manufacturing a car: a scuff mark on the carpet is undesirable, but it's not on par with a disconnected brake cable. Similarly, if a shirt is sewn with thread that's just slightly off color, the defect is so small the garment might still be usable; a shirt with mismatched fasteners will need to be reworked or discarded.

In the world of rock and roll, the Zero Sigmas starting a performance late probably has fewer consequences than their getting caught lip-syncing during a performance does. How is that reflected in the Pareto chart above? It's not. 

When different defects have different impacts, a Pareto chart based only on number of occurrences doesn't provide enough information to tell you which issues are the most important. 

Are You Counting the Right Thing? 

You might be able to learn more by looking at a different measurement. In many situations, you do want to know the number of defects. But that's not always what you want to measure. For example, the Zero Sigmas' public relations manager gathered all the coverage about the recent tour, and she wants to know how it corresponds to things that happened while the band was on the road so she can be ready to handle things that might happen on the next tour! 

We can add a column of data to our worksheet that tallies the number of news reports, online reviews, and social media mentions about the various incidents that took place on tour.

Bad PR Data for Pareto Chart

This gives us insight into how the different types of incidents played out in the media. Here's how that data looks in a Pareto chart: 

Pareto Chart of Bad Press

This is very important information for the PR manager, because it shows which types of incidents resulted in the largest number of negative mentions. These results are quite different from the raw counts of defects. For example, even though it was the most frequent defect, the band starting late was barely mentioned in negative reports.

However, this is really just a different type of frequency data: in effect, we're counting the number of complaints rather than the raw number of defects.

There's another approach to getting more insight from a Pareto chart:  we can look at the data in conjunction with another factor, like a cost, to create a weighted Pareto chart. Because the most common problems aren't always the most important ones, a weighted Pareto chart can give extra emphasis to the most important factors.

Setting Up Data for a Weighted Pareto Chart

A weighted Pareto chart doesn't just look at how often defects occur, but also considers how important they are. A weighted Pareto chart accounts for the severity of the defects, their cost, or almost anything else you want to track. And as we saw when we looked at bad PR instead of incident counts, a weighted Pareto chart may change how we see the priority for improvement projects. 

Weighting requires a valuation: you weight the frequency counts by assigning attributes, such as cost, severity, or detectability, to each defect type. This attribute could be objective, such as the dollar amount it costs to fix each type of defect. For example, a garment manufacturer might know that wrinkles cost $.10 to fix, while dirt specks cost $.50.

Other attributes may be harder to quantify. For instance, a manufacturer might want to place a value on the potential effect of different defects on the company's reputation, a much more difficult thing to assess. Precise measures may not be available, but to get a sense of the possibilities, the manufacturer might ask a corporate counsel or communications officer to rate the damage potential of each type of defect on a scale, or even conduct a small survey to assign values. 

In looking at the tour data for the Zero Sigmas, we'll assign a number from 1 to 100 for the amount of embarrassment, or "lameness," associated with each type of incident that took place, as shown below: 

Data for Weighted Pareto Chart

Now let's use these weights to create a weighted Pareto chart with Minitab Statistical Software. To do it, we'll first need to create a new column of data with Minitab's calculator (Calc > Calculator) by multiplying the degree of embarrassment by the frequencies for each type of incident. We'll store that in a column titled "Lame-o."
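Before we make the chart, a quick aside: if you ever need to reproduce this weighting step outside of Minitab, the following Python sketch shows the same idea. The incident counts and lameness weights below are hypothetical stand-ins for the worksheet values.

# Weighted Pareto sketch: multiply each incident count by its "lameness" weight,
# then rank incidents by weighted score and compute cumulative percentages.
# The counts and weights below are hypothetical, not the worksheet values.

incidents = {
    # incident: (count, lameness weight on a 1-100 scale)
    "Started late":        (100, 5),
    "Forgot lyrics":       (40, 90),
    "Untuned instruments": (30, 60),
    "Wrong city greeting": (25, 40),
    "Caught lip-syncing":  (5, 100),
}

weighted = {name: count * weight for name, (count, weight) in incidents.items()}
total = sum(weighted.values())

cumulative = 0.0
for name, score in sorted(weighted.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += score
    print(f"{name:22s} score={score:5d}  cum%={100 * cumulative / total:5.1f}")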

Selecting Stat > Quality Tools > Pareto Chart and entering "Incidents" as the defects and "Lame-o" as the frequencies produces the following chart:   

Pareto Chart of Lameness

The weighted Pareto chart above uses the same incident count data, except that now the defects have been weighted by the degree of lameness involved in each type of incident. Here, you can see that Hy P. Value's forgetting the lyrics to his own songs accounted for 46% of the tour's lameness. Combine that with the guitarists' failure to tune their instruments and we've accounted for 67.5% of the total lameness from the last tour. 

If the next Zero Sigmas tour is going to rock harder, we need to focus on tuning the instruments and making sure Hy P. Value remembers the words. Starting the show late may happen every night, but it doesn't even register in making the tour lame!  

What Would You Like to Know? 

All of this just goes to demonstrate that the same data can lead to different conclusions, depending on how we frame the question. If we're  concerned with the frequency of defects, we focus on getting the band to start their shows on time. If we're concerned with minimizing bad PR, we want to make sure that Zero Sigmas don't get caught lip-syncing again. And if we want to make the band's next tour less lame, figuring out why the singer forgets the lyrics is where we'll want to start. 

That's three ways the same data can give us three different insights into different aspects of quality. That's why we need to be careful about what we're actually measuring, and what we hope to achieve by measuring it. 


Lean Six Sigma in Healthcare: Improving Patient Satisfaction


Riverview Hospital Association

For providers like Riverview Hospital Association, serving Wisconsin Rapids, Wis. and surrounding areas, recent changes in the U.S. healthcare system have placed more emphasis on improving the quality of care and increasing patient satisfaction. “In this era of healthcare reform, it is even more essential for providers to have a systematic method to improve the way care is delivered,” says Christopher Spranger, director of Lean Six Sigma and Quality Improvement at Riverview Hospital Association. “We have had a Lean Six Sigma program in place for four years, and we are continuously working on ways to make our hospital safer and more efficient.”

The Challenge

Under what is known as Hospital Inpatient Value-Based Purchasing (VBP), a portion of the Medicare payments hospitals receive are tied directly to patient satisfaction metrics and the quality of care, rather than entirely on the volume of Medicare patients treated. A new rule from the Centers for Medicare and Medicaid Services (CMS) grants Medicare incentive payments to hospitals that are meeting high quality of care standards, or have shown sufficient improvements. Hospitals that do not meet these standards are subject to a reduction in Medicare payments. “Nearly all hospitals treat a sufficiently large percentage of Medicare patients,” says Spranger, “so this rule presents a significant challenge with substantial financial implications for us—and for many other hospitals.”

Incentive payments are determined by how well hospitals score on two point-based domains. One domain takes into account the clinical process of care, where the hospital is judged on its performance in meeting twelve predefined clinical measures. The second domain is based upon the overall patient experience and measured through a survey called the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS). HCAHPS scores depend on the percentage of positive responses received for each question within each specific “dimension,” or category. Dimensions include communication with doctors and nurses, clarity of discharge information, overall hospital rating and more.

When Spranger and the Lean Six Sigma team at Riverview discovered that the hospital’s overall HCAHPS survey scores were lower than desired for the discharge information dimension, they set out to improve scores with Lean Six Sigma techniques and data analysis.   

How Data Analysis Helped

The Lean Six Sigma team’s goal was to increase the percentage of “yes” responses for the following yes/no HCAHPS survey question: “During this hospital stay, did doctors, nurses, or other hospital staff talk to you about whether you would have the help you needed when you left the hospital?” To start, the team assessed their baseline performance by plotting outcomes from the previous 24 months with a Minitab control chart. The chart revealed a stable process and a current average positive monthly response rate of 85 percent. The goal was to improve positive monthly responses and exceed the preset benchmark of 91 percent.

Minitab Control Chart
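For readers curious about the arithmetic behind a control chart for proportions, here is a rough Python sketch of p-chart limits. The monthly survey count is a hypothetical value, and Minitab's own chart handles details such as varying subgroup sizes.

import math

# Rough p-chart sketch for a monthly proportion of "yes" responses.
# The subgroup size (surveys returned per month) is a hypothetical value.

p_bar = 0.85   # baseline average positive response rate from the article
n = 100        # hypothetical number of surveys per month

sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
ucl = min(1.0, p_bar + 3 * sigma_p)
lcl = max(0.0, p_bar - 3 * sigma_p)

print(f"Center line: {p_bar:.3f}")
print(f"Control limits: LCL={lcl:.3f}, UCL={ucl:.3f}")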

To determine underlying factors that could be causing low scores, Spranger and the team chose to analyze data the hospital already had on hand about each patient, collected as part of the HCAHPS survey. They had access to the respondent’s age, gender, length of stay, primary language, education level, hospital unit visited and more. With Minitab charts, histograms and plots, they were able to explore the variables and responses graphically.

The team analyzed the variables further, and used the Minitab Assistant to guide their statistical analysis. They followed interactive decision trees to determine which analysis to use, and selected Analysis of Variance (ANOVA) to compare sample means and to look for significant differences within variables that could be impacting the HCAHPS survey question.

Minitab Assistant
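As a rough illustration of the kind of comparison the team ran, here is a one-way ANOVA sketch in Python using scipy. The satisfaction scores and age groupings are invented for the example and are not Riverview's data.

from scipy import stats

# One-way ANOVA sketch: do mean satisfaction scores differ across age groups?
# The scores below are made-up illustrations, not Riverview's data.
under_50  = [78, 85, 90, 72, 88, 81]
age_50_69 = [82, 79, 91, 86, 84, 80]
age_70_up = [70, 68, 75, 73, 66, 71]

f_stat, p_value = stats.f_oneway(under_50, age_50_69, age_70_up)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) suggests at least one age group's mean differs.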

The Riverview team began this project with a preset assumption that differences in the education level or the primary language of the respondent might offer insight into the low scores, but their analyses of these variables revealed no statistical significance. However, analysis of other variables uncovered statistically significant differences within respondent age groups and the hospital unit visited.

Surpassing the Benchmark

The Riverview Lean Six Sigma team was able to narrow the project scope and use the insights they gathered to improve patient satisfaction for groups identified as scoring the lowest. “Historically, health organizations try to increase patient satisfaction through staff training and other large-scale solutions,” says Spranger. “Now, with a data-driven approach, we are able to better target improvements.” The improvements for this project targeted low-scoring patient groups from specific age ranges, as well as patients who stayed in specific hospital units.

The team also used the improvement effort to solve key problems identified in the current process for handling patient discharges, which included timing of education, ensuring the involvement of a family caregiver, and clarifying outcomes with the patient. They redesigned the discharge education process into three phases to address previous issues with timing, collaborated with primary care physician clinics to ensure consistency, created a process to ensure that a primary family caregiver was identified to engage in care management after discharge, and clarified terminology in discharge documentation that was considered vague.

After implementing these improvement strategies, the team compared the current proportion of “yes” responses to responses before the project began. They achieved an impressive gain in the proportion of “yes” responses and met their goal to surpass the 91 percent benchmark for average positive monthly responses.

Check out the full case study at: http://www.minitab.com/uploadedFiles/Company/News/Case_Studies/Riverview-EN.pdf

How do you use quality improvement techniques in healthcare?

Has Your Minitab Had Its V8?


Have you ever seen those commercials where people are walking at a slant because they haven't had their V8?

I was reminded of these ads recently when I had the opportunity to visit one of our Minitab customers in Tampa, FL. During the visit, I presented a seminar on some Advanced Minitab Tips and Tricks* using the same content we have presented in some of our free webinars.

One of the very first scenarios in the presentation walks through data cleanup in your Minitab worksheet. As I started, I literally had to stop the class to address the commotion this topic kicked up on one side of the room. A trainee explained that she was always trying to manipulate the dataset in Excel before bringing it into Minitab. This data cleanup was driving her bonkers!  Her biggest beef was Minitab's apparent need for “even” columns of data. After reassuring her that Minitab could help, I proceeded with the following solution:

Worksheet Information

The first thing I always do when opening a new worksheet in Minitab is click the “Show Info” icon in the Project Manager toolbar.  The “Show Info” button opens the Project Manager window to the left of my worksheet and gives me a nice overview of the data in the worksheet, including how much data is in each column and where I might have missing values.  I can quickly see in the Project Manager screenshot below that all three of the columns in the worksheet I just opened have different lengths (Count = 8, 13, and 10).

Show Info

 

Straightening Out Numeric Columns

I can see in the Project Manager that my %Daily Value column is short of the Nutrition Facts column by three data points. If I want to analyze these two columns in the same graph, I will need to even them out. 

Note that the Type of data in the %Daily Value column is numeric.  With this in mind, I can simply put an asterisk symbol in row 13 of the %Daily Value column and Minitab will automatically backfill the rows above with an asterisk (*). 

numbers

NOTE: Missing values in Minitab are denoted by the asterisk symbol. If you did not collect data for a certain point, enter the asterisk symbol…not a zero. Putting a zero in place of a missing value will produce incorrect results for statistics like means and standard deviations.
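A quick numerical illustration of that note, using made-up values: compare what happens to the mean and standard deviation when a missing observation is excluded versus recorded as zero.

import numpy as np

# Why a zero is not a missing value: compare summary statistics when a missing
# point is (correctly) excluded versus (incorrectly) recorded as 0.
values = np.array([12.0, 15.0, 14.0, 13.0, np.nan])   # one missing observation

print("Treating the gap as missing:",
      np.nanmean(values), np.nanstd(values, ddof=1))
print("Treating the gap as zero:   ",
      np.mean(np.nan_to_num(values)), np.std(np.nan_to_num(values), ddof=1))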

Bringing Text Data in Line

If I need to extend the number of data points Minitab recognizes in a column of text values, the procedure changes just slightly. Instead of putting the asterisk symbol in the last row you would like to have filled with data, put the asterisk symbol one cell lower.  In the example below, I would put the asterisk symbol in row 9 of the Color column. 

text1

Tab out of that cell once the asterisk is entered and you will see that the cells above remain blank but the asterisk symbol remains.  Your column will also be one row too long.  Delete the asterisk that you entered and that will bring your text column back in line!

text2

 

After I walked through this simple demonstration (within minutes of beginning the presentation), the trainee stood up excitedly and declared, "That's it! I don't need to see any more! This is the tip I've needed for years."  And with that, she promptly, and straightly, walked out the door!

*Minitab Advanced Tips and Tricks is one of our regular webinar presentations.  To see which free webinars are available this month, click here.


Will the Weibull Distribution Be on the Demonstration Test?


Over on the Indium Corporation's blog, Dr. Ron Lasky has been sharing some interesting ideas about using the Weibull distribution in electronics manufacturing. For instance, check out this discussion of how dramatically an early first-failure can affect an analysis of a part or component (in this case, an alloy used to solder components to a circuit board). 

This got me thinking again about all the different situations in which the Weibull distribution can help us make good decisions. The main reason Weibull is so useful is that it's very flexible in fitting different types of data, because it can take on the characteristics of other types of distributions. For example, if you have skewed data, the Weibull is an alternative to the normal distribution. 

Developing a Demonstration Test Plan with the Weibull Distribution

demonstration test plan example with turbine engine combustor

In business and industry, the Weibull distribution is frequently used to model time-to-failure data. In other words, it can help us assess the reliability of a component or part by estimating how long it will take to fail. If you work for the company supplying those parts, it can help you check the quality of the components you're making, and prove to your customers that your products meet their requirements.

A good way to do this is to follow a Demonstration Test Plan, and Minitab Statistical Software can help you create test plans using the Weibull distribution (or some other distribution, if you know it's more appropriate). Minitab's test planning commands make it easy to determine the sample size and testing time required to show that you have met reliability specifications.

Your test plan will include:

  • The number of units or parts you need to test
  • The stopping point, which is either the amount of time you must test each part or the number of failures that must occur
  • The measure of success, which is the number of failures allowed in a passing test (for example, every unit runs for the specified amount of time and there are no failures)

You can use Minitab to create demonstration, estimation, and accelerated life test plans, but let's focus on demonstration test plans here.

What Types of Demonstration Test Plans Are There? 

There are two types of demonstration tests:

Substantiation Tests

A substantiation test provides statistical evidence that a redesigned system has suppressed or significantly reduced a known cause of failure. This test aims to show that the redesigned system is better than the old system.

Reliability Tests

A reliability test provides statistical evidence that a reliability specification has been achieved. This test aims to show that the system's reliability exceeds a goal value. 

You can base either type of test on a scale (Weibull or exponential distribution) or location (other distributions), a percentile, the reliability at a particular time, or the mean time to failure (MTTF). For example, you can test whether or not the MTTF for a redesigned system is greater than the MTTF for the old system.

An Example of a Demonstration Test Plan

Let's say we work for a company that makes turbine engines. The reliability goal for a new turbine engine combustor is a 1st percentile of at least 2000 cycles. We know that the number of cycles to failure tends to follow a Weibull distribution with shape = 3, and that we can accumulate up to 8000 test cycles on each combustor. We need to determine the number of combustors it takes to demonstrate the reliability goal using a 1-failure test plan.

Here's how to do it in Minitab (if you're not already using it, download the free 30-day trial of Minitab and play along): 

  1. Choose Stat > Reliability/Survival > Test Plans > Demonstration   
  2. Choose Percentile, then enter 2000. In Percent, enter 1.
  3. In Maximum number of failures allowed, enter 1.
  4. Choose Testing times for each unit, then enter 8000.
  5. From Distribution, choose Weibull. In Shape (Weibull) or scale (other dists), enter 3. Click OK.

Your completed dialog box should look like this: 

Demonstration Test Dialog Box

Interpreting the Demonstration Test Plan Results

When you click OK, Minitab will create the following output:

Demonstration Test Plan Output

Looking at the Sample Size column in the output above, we can see we'll need to test 8 combustors for 8000 cycles to demonstrate with 95.2% confidence that the first percentile is at least 2000 cycles.  
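If you want to see roughly where that confidence figure comes from, here is a back-of-the-envelope Python sketch of the underlying Weibull and binomial calculation. It assumes the true first percentile sits exactly at 2000 cycles and reproduces Minitab's result up to rounding.

import math

# Back-of-the-envelope check of the demonstration test plan: if the true first
# percentile were exactly 2000 cycles (Weibull shape = 3), what is the chance
# that 8 combustors tested for 8000 cycles produce more than 1 failure?

shape = 3.0
percentile_value, percentile = 2000.0, 0.01
test_time, n, failures_allowed = 8000.0, 8, 1

# Weibull scale implied by "1st percentile = 2000 cycles"
scale = percentile_value / (-math.log(1 - percentile)) ** (1 / shape)

# Probability a single unit fails before the test ends
p_fail = 1 - math.exp(-((test_time / scale) ** shape))

# Probability the test passes (at most 1 failure among the 8 units)
p_pass = sum(math.comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
             for k in range(failures_allowed + 1))

print(f"P(fail by {test_time:.0f} cycles) = {p_fail:.3f}")
print(f"Confidence = 1 - P(pass) = {1 - p_pass:.3f}")   # ~0.95, in line with Minitab's 95.2%

# Swapping percentile_value = 4000.0 gives a passing probability near 0.88,
# which matches the improvement-ratio discussion below.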

When it generates your Demonstration Test Plan, Minitab also creates this graph: 

Likelihood of Passing for Weibull Model Demonstration Test

The graph gives us a visual representation of the likelihood of actually passing the demonstration test. Here,

  • The very sharp rise between 0 and 2 indicates that the probability of your 1-failure test passing increases steadily as the improvement ratio increases from zero to two.
     
  • If the improvement ratio is greater than about two, the test has an almost certain chance of passing.

Based on this information, if the (unknown) true first percentile was 4000, then the improvement ratio = 4000/2000 = 2, and the probability of passing the test will be about 0.88. If you reduced the value to be demonstrated to 1600, then the improvement ratio increases to 2.5 and the probability of passing the test increases to around 0.96. Reducing the value to be demonstrated increases the probability of passing the test. However, it also makes a less powerful statement about the reliability of the turbine engine combustor.

What's the Right Demonstration Test Plan? 

The right demonstration test for your situation will depend on many factors. Fortunately, it's easy to adjust different parts of a proposed test to see how they affect the task.  Each combination of maximum number of failures allowed and sample size or testing time will result in one test plan, so you can use Minitab to generate several test plans and compare the results.


A Mommy’s Look at Lyme Disease Statistics…


I spend a majority of my time entrenched in statistics. Using statistics. Studying statistics. Developing and testing statistical software. Statistics guide many of my decisions at work and in life. That’s the world of an engineer.

Deer Tick

For this reason, you can imagine my surprise when my husband called me at work on a bright, sunny June day in 2009 to tell me that our 4-year-old daughter had been diagnosed with Lyme disease. That, to me, seemed completely improbable. We live in a development in suburbia.  Our children don’t play deep in the woods. We don’t hike in the woods. In accordance with the tick warnings that year, we were vigilant in our nightly tick checks. How could she have contracted Lyme disease? Where did she contract Lyme disease?

After her treatments began, I received a call from the Pennsylvania Department of Health gathering statistics related to the case. Emilia wasn’t the only 4-year-old diagnosed that summer, and nightly tick checks weren’t necessarily going to find the nymphs (Stage 2 of the Tick life cycle) prevalent at the beginning of the season. We found that the fever Emilia had a few weeks earlier was probably the first symptom. We had assumed that was a virus. The sluggishness that followed, we assumed, was the exhaustion from the very busy schedule of a 4-year-old. Dance recitals, flower girl duty, play dates...it sure exhausted me.  But it was the rash that eventually appeared which caused my husband to take her to the doctor. A rash that was not the standard bullseye and one that I suggested we wait out.

Fortunately, my husband never listens to me and I’m glad for that. It may have been the last symptom before more serious complications occurred.

I should note that I’m not a doctor and would not claim to be an expert in Lyme disease. I’m simply a mom who carries with me the guilt of missing a tick bite and the initial symptoms of the disease.

Fueled with guilt, I decided to do some research. What was the probability that one of my children would have contracted Lyme disease? The CDC has extensive statistics available on Lyme disease cases nationwide:

http://www.cdc.gov/lyme/stats/index.html

In the Time Series plot below, you can see that the incidence rate nationwide spiked in 2009 but is still fairly low. A closer look, however, at the Pennsylvania statistics provides a different picture. Pennsylvania has a much larger incidence rate compared to the national average.

Time Series Plot of Incidence Rate

Looking at the Pareto chart below, you can see that Pennsylvania had the largest number of confirmed cases in the U.S. in 2009:

Pareto Chart of Confirmed Cases

In 2009, there were 4950 confirmed cases of Lyme disease in Pennsylvania. The incidence rate was 39.3 cases per 100,000 residents. While that's one of the largest incidence rates in the country, it may still seem relatively small. Perhaps not what one would call an epidemic.
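For anyone who wants to check the arithmetic, an incidence rate is simply cases per 100,000 residents. The quick sketch below uses an approximate 2009 Pennsylvania population, so treat the figure as a rough check rather than the official calculation.

# Incidence rate sketch: cases per 100,000 residents.
# The population figure is approximate (PA had roughly 12.6 million residents in 2009).
cases = 4950
population = 12_600_000

rate = cases / population * 100_000
print(f"{rate:.1f} cases per 100,000 residents")   # roughly 39.3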

But, as a mom, those statistics mean nothing. One of those 4,950 cases was my daughter. One out of two of my children was diagnosed with Lyme disease in 2009. Quite frankly, those are the statistics that matter to me.

Today, Emilia is a healthy 8-year-old. While she seems to be overly opinionated and sometimes downright bossy, I’ve been told that isn’t a lasting side effect of Lyme disease. Rather, it’s a lasting side effect of being raised by me.

There’s no cure for that.

As we enter another tick season, please be vigilant of Lyme disease prevention and symptoms. For more information: http://www.cdc.gov/lyme/index.html

NOTE: The doctor who quickly diagnosed Emilia was none other than Maggie’s mommy from a previous post. Thanks, Maggie’s mommy!

http://blog.minitab.com/blog/adventures-in-software-development/how-to-talk-with-your-kidsabout-quality-improvement


The Curious (Statistical) Case of Marc-Andre Fleury


Marc-Andre Fleury

The Pittsburgh Penguins are in the midst of another Stanley Cup playoff run. With a 3-1 lead over the Ottawa Senators, they are a mere 1 game away from their 3rd Eastern Conference Final in 6 years. But it looks like they will do so without starting goalie Marc-Andre Fleury.

After a string of disappointing playoff games, Fleury has been benched and netminder Tomas Vokoun has been guarding the goal. And Vokoun is playing so well that it doesn’t look like Fleury will see the ice anytime soon.

So what does this have to do with statistics? Well, Fleury’s statistics tell the story of why he is on the bench. It all started last year, so let’s compare his regular-season save percentage to his playoff save percentage for the 2011-12 season.

  • Regular Season: 0.913
  • Playoffs: 0.834
  • Difference: 0.079

That’s a difference of almost 8% between regular-season and playoff games! To put this in perspective, let's say goalies typically face 30 shots a game. If they save 91.3%, they’ll allow 2 or 3 goals. If they save only 83.4%, they’ll allow 5 goals!

Talk about a sieve.
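The goals-against arithmetic is simple enough to check in a couple of lines of Python:

# Expected goals allowed on a 30-shot night at each save percentage.
shots = 30
for save_pct in (0.913, 0.834):
    print(f"Save % {save_pct:.3f}: about {shots * (1 - save_pct):.1f} goals allowed")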

The difference between Fleury's regular season and playoffs wasn’t quite as bad this year (he was 2.5% worse in the playoffs), but combined with last year that was enough to put him on the bench. So what I want to know is, was Fleury’s playoff blunder the worst playoff performance ever? Is there any goalie who ever had a bigger differential between their regular-season and playoff save percentages?

The Goalie Data

I collected data on every NHL goalie since 1982 (the dataset I used actually went back to the 1940s, but didn’t include "shots against" before 1982, so I couldn’t calculate save percentage). For each year, I calculated every goalie's difference in save percentage between the regular season and playoffs. The goalie had to have faced at least 50 shots in the playoffs in order to be included. I did that to exclude small samples, like Glenn Resch, the Flyers goalie who allowed a total of 1 goal on 1 shot in the 1986 playoffs.

The Relationship between Regular-Season and Playoff Save Percentage

In total, the data set had 542 different observations. I’ll start by using Minitab Statistical Software to perform a regression analysis between regular-season and playoff save percentage.

Fitted Line Plot

We see that there is a weak positive relationship between regular season save percentage and playoff save percentage (the R-Squared value is only 16.5%). But the positive relationship makes sense, as the better a goaltender is in the regular season, the better he should be in the playoffs.

In the graph above I’ve marked Fleury with an upside-down black triangle. I drew a line at regular-season save percentage = .900 because of all the goalies to have a save percentage of at least .900 in the regular season, none have had a worse playoff performance than Fleury last year. The model would have expected him to have a save percentage around .906, instead of the .834 he actually had. Nobody in the “Above .900” group has ever had a playoff save percentage that low.

But was his difference the worst of all time? No. We can clearly see a goalie in the bottom left that had a regular season save percentage around .86, and a playoff save percentage around .75. That's a difference of 11%, which is worse than the difference of 7.9% Fleury had. But other than that, it’s hard to tell how many other goalies have performed worse from this graph. We’re going to need another plot in order to determine the worst playoff performance ever.
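Before moving on, if you want to experiment with a similar regression fit outside of Minitab, here is a small Python sketch using scipy's linregress. The save percentages in it are invented, not the 542-goalie data set used above.

from scipy import stats

# Simple linear regression sketch: playoff save % as a function of regular-season
# save %. The numbers below are made up; the real analysis used 542 goalie seasons.
regular  = [0.905, 0.918, 0.896, 0.910, 0.922, 0.901, 0.913, 0.889]
playoffs = [0.898, 0.925, 0.880, 0.905, 0.915, 0.870, 0.908, 0.902]

fit = stats.linregress(regular, playoffs)
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, "
      f"R-squared={fit.rvalue**2:.3f}")

# Predicted playoff save % for a goalie with a 0.913 regular season:
print(f"prediction: {fit.intercept + fit.slope * 0.913:.3f}")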

The Worst Playoff Performance Ever

Luckily, Minitab has a plot that will help us out. I’ll make an individual value plot of each goalie's difference between regular season and playoff save percentage. The individual value plot will show us every single observation in the data set, so we’ll easily be able to compare who is the lowest.

Individual Value Plot

We see that most of the points are clustered around 0. This makes sense, since in the long run you would expect goalies to play about the same between the regular season and the playoffs. But let’s get to the outliers! Again, I’ve marked Fleury’s performance from 2011-12 with the black triangle, and here it’s much easier to see who has performed worse. In fact, only 7 other goalies have a worse differential!

So, who is the data point all alone in the bottom left?

That would be former Boston Bruins goalie Rejean Lemelin. During the 1989-1990 season he posted a save percentage of .892 in 43 regular-season games. But in the playoffs his save percentage dropped to .772, which is a difference of 12%! And don’t think that was because Lemelin was a terrible goalie...in the regular season of that same year, he teamed up with his goaltending partner Andy Moog to win the NHL’s William M. Jennings Trophy (for fewest team goals allowed).

When the playoffs began in 1990, it was Lemelin who started in goal for the Bruins. But when he struggled in the first few games, Moog came in and saved the day. In the opening round, Boston was losing their series 2-1 and found themselves down 5–2 entering the third period of game 4. Moog replaced Lemelin in goal and posted a shutout for the remainder of the game. The Bruins then rallied by scoring four goals in the third period to win the game. Moog would remain in goal the rest of the playoffs, leading the Bruins to the Stanley Cup Finals (where they would lose to the Edmonton Oilers).

How cool is that!?!?! It’s not every day that you do a statistical analysis and find that the answer to your question has such a neat story. Well, it probably wasn’t neat for Lemelin at the time...but at least he’ll always have that William M. Jennings Trophy!   

Do the 2013 Penguins = the 1990 Boston Bruins?

We’ve shown that it’s quite rare for a goalie as good as Fleury to have as bad a playoff performance as he had in the 2012 playoffs. And while his performance this year wasn’t as bad (it’d be a much different story had he not shut out the Islanders in Game 1), it was enough for the Penguins to turn to Vokoun. But the Boston Bruins are a prime example of a team making a long run in the playoffs with their backup goalie!

So can the Penguins follow suit and do the same thing? If they finish off Ottawa, there will only be one team standing between them and the Finals. And of course, it’ll most likely be the Boston Bruins. Oh, the irony! But should we have expected anything different? Absolutely not. Why? Because It’s The Cup!

Photograph by wstera2.  Licensed under Creative Commons BY-NC-SA 2.0.

 

No Matter How Strong, Correlation Still Doesn't Imply Causation


a drop of rain correlating with or causing ripples...

There's been a really interesting conversation about correlation and causation going on in the LinkedIn Statistics and Analytics Consultants group.

This is a group with a pretty advanced appreciation of statistical nuances and data analysis, and they've been focusing on how the understanding of causation and correlation can be very field-dependent. For instance, evidence supporting causation might be very different if we're looking at data from a clinical trial conducted under controlled conditions as opposed to observational economic data.

Contributors also have been citing some pretty fascinating ideas and approaches, including the application of Granger Causality to time series data; Hill's Causation Criteria in epidemiology and other medical-related fields; and even a very compelling paper which posits that most published research findings are false.  

All of this is great food for thought, but it underscores again what must be the most common misunderstanding in the statistical world: correlation does not equal causation. This seems like a simple enough idea, but how often do we see the media breathlessly reporting on a study that has found an associative relationship between some factor (like eating potato chips) and a response (like having a heart attack) as if it established direct, a + b = c inevitability?  

What Is Correlation? 

Correlation is a linear association between two variables: as one variable rises or falls, the other tends to rise or fall in a consistent way. This association may be positive, in which case both variables consistently rise together, or negative, in which case one variable consistently decreases as the other rises.

An easy way to see if two variables might be correlated is to create a scatterplot.  Sometimes a scatterplot will immediately indicate correlation exists; for instance, in this data set, if we choose Graph > Scatterplot > Simple, and enter Score1 and Score2, Minitab creates the following graph: 

scatterplot showing correlation between factors

(If you want to play along and you don't already have it, please download the free 30-day trial of Minitab Statistical Software!)

In the scatterplot above, we can clearly see that as Score1 values rise, so do the values for Score2. There's definitely correlation there! But sometimes a scatterplot isn't so clear. From the same data set, let's create a scatterplot using "Verbal" as the X variable and GPA as the Y variable:

Scatterplot of Verbal Scores and GPA

Well, it looks like there might be a correlation there...but there's a lot of scatter in that data, so it isn't as clear as it was in the first graph. Is it worth exploring this further (for instance, by proceeding to a regression analysis to learn more about the association)? Fortunately, we can look at a statistic that tells us more about the strength of an association between these variables.

The Correlation Coefficient

To find the Pearson correlation coefficient for these two variables, go to Stat > Basic Statistics > Correlation... in Minitab and enter Verbal and GPA in the dialog box. Minitab provides the following output: 

Pearson's correlation coefficient

The correlation coefficient can range in value from -1 to +1, and tells you two things about the linear association between two variables:

  • Strength - The larger the absolute value of the coefficient, the stronger the linear relationship between the variables. An absolute value of one indicates a perfect linear relationship (the variables in the first scatterplot had a correlation coefficient of 0.978), and a value of zero indicates the complete absence of a linear relationship.
     
  • Direction - The sign of the coefficient indicates the direction of the relationship. If both variables tend to increase or decrease together, the coefficient is positive. If one variable tends to increase as the other decreases, it's negative.

The correlation coefficient for Verbal and GPA in our data set is 0.322, indicating that there is a positive association between the two. Comparing the 0.978 of the first two variables to this, we see the variability visible in the second scatterplot reflected in the lower correlation coefficient: there's a relationship there, but it is not as obvious or clear.  

So, does the connection between Verbal and GPA merit further scrutiny?  Maybe...with real data sets, it's rare to see a correlation coefficient as high as that between Score1 and Score2.  Whether you should interpret an intermediate value for the Pearson correlation coefficient as a weak, moderate, or strong correlation depends on your objectives and requirements.
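If you would like to compute a Pearson coefficient yourself outside of Minitab, here is a short Python sketch using scipy. The Verbal scores and GPAs below are made up and won't match the data set used above.

from scipy import stats

# Pearson correlation sketch with made-up Verbal scores and GPAs.
verbal = [480, 520, 610, 690, 540, 580, 630, 700]
gpa    = [2.8, 3.0, 3.2, 3.8, 2.9, 3.4, 3.1, 3.6]

r, p_value = stats.pearsonr(verbal, gpa)
print(f"Pearson r = {r:.3f}, p-value = {p_value:.3f}")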

Even STRONG Correlation Still Does Not Imply Causation

But even if your data have a correlation coefficient of +1 or -1, it is important to note that correlation still does not imply causality. For instance, a scatterplot of popsicle sales and skateboard accidents in a neighborhood may look like a straight line and give you a correlation coefficient of 0.9999...but buying popsicles clearly doesn't cause skateboard accidents. However, more people ride skateboards and more people buy popsicles in hot weather, which is the reason these two factors are correlated.

It is also important to note that the correlation coefficient only measures linear relationships. A meaningful nonlinear relationship may exist even if the correlation coefficient is 0.

Only properly controlled experiments let you determine whether a relationship is causal, and as that recent LinkedIn conversation has indicated, the "requirements" for determining causality can vary greatly depending on what you're studying.

So, in the end, what can we say about the relationship between correlation and causation? This comic from xkcd.com, also referenced in the recent LinkedIn conversation, sums it up nicely: 

xkcd correlation comic

Comic licensed under a Creative Commons Attribution-NonCommercial 2.5 license. Photo credit to robin_24 http://www.flickr.com/photos/robin24/5554306438/. This photo has a Creative Commons attribution license.

 

Family Democracy, Summer Fun, and the Ballot


Fishing

Previously I wrote about using a decision matrix to help make a decision. Matrices are nice tools for collecting your thoughts and visualizing a decision. But complex decisions could involve collecting and synthesizing input from a number of different people.

Quality Companion (Minitab's process improvement software) uses ballots to let team members record their input to a decision matrix. If you’ve already made the matrix, setting up the ballot is easy. The ballot simplifies data collection and organization, even among team members who are dispersed in space and time. You can follow along in the free trial of Quality Companion if you like:

  1. Right-click the completed matrix and choose Create Ballot from Table.
    Right-click to open a context menu that lets you create a ballot from a decision matrix.
  2. In Candidates, choose the column from the table that you want people to evaluate.
  3. Check the columns that you want people to provide input on.
  4. Set the scale to match the number of rows in your table.
    Set the Candidates to the Proposed Solution column. Include all of the criteria.
  5. Type a central issue for the ballot and confirm the ballot type. Click Next.
    Type the central issue to decide and confirm the type of ballot.
  6. Choose team members who will complete the ballot from the drop-down list. Click Finish.
    Choose team members who will use the ballot.
  7. Once the results are ready, press Update Form.
    Update the decision matrix with the ballot results.

The Pareto chart with the decision matrix now shows the results from the ballot, so that the results of the voting are easy to view and communicate.

The Pareto chart orders the candidates by score.

Fishing still wins the day, which should please the folks at takemefishing.org, but probably not nearly as much as it will please the young anglers who’ve been asking me “When are we going fishing?” since January.

The ballot itself also provides some interesting analysis, especially if you want to build consensus among team members.

  1. With the ballot results open, choose View > Task Pane.
  2. Click Set Consensus Threshold.
  3. Check Enable consensus threshold.

In the Overall Results table, categories where the voters disagreed turn red.

Red categories are areas of wide disagreement.

If you highlight one of the cells in the Overall Results table, you can get a detailed view of the disagreement.

Each vote for geo-caching cost appears on the bar chart. Two 2s and one each of 1, 5, and 6.

In this case, family members who gave geocaching low scores for cost considered the cost of purchasing one or more GPS units for the family. Those who gave high scores considered the cost of going out for the day assuming that an event organizer would have units to borrow.
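Conceptually, the consensus check boils down to flagging categories where the votes are spread widely apart. Here is a minimal Python sketch of that idea; the geocaching votes come from the example above, the fishing votes are hypothetical, and the spread rule is an illustration rather than Quality Companion's exact calculation.

# Consensus-check sketch: flag a category when the spread of the votes exceeds
# a chosen threshold. The geocaching votes come from the example above; the
# fishing votes are hypothetical, and the spread rule is only an illustration.

def needs_discussion(votes, threshold=3):
    return max(votes) - min(votes) > threshold

geocaching_cost_votes = [2, 2, 1, 5, 6]
fishing_cost_votes = [4, 4, 5, 4, 5]          # hypothetical, mostly in agreement

for name, votes in [("Geocaching cost", geocaching_cost_votes),
                    ("Fishing cost", fishing_cost_votes)]:
    flag = "disagreement" if needs_discussion(votes) else "consensus"
    print(f"{name:16s} votes={votes}  ->  {flag}")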

Especially when you’re building consensus, the easy identification of points of disagreement, and the ability to understand them, is very powerful. Quality Companion’s ballots and decision matrices can be a big help.

If this introduction to ballots has intrigued you, you can get even more detail from the Sample Course Materials Minitab provides for its trainings. Check out “Evaluating Inputs – Brainstorming with a Fishbone Diagram” to see even more features of ballots and other tools in Quality Companion. There could even be an instructor-led training occurring near you soon. View the schedule to see.
