
ITEA Sneak-Peek: The Great Escape from Foam Defects


The 2014 ASQ World Conference on Quality and Improvement is coming up in early May in Dallas, and this year’s International Team Excellence Award Process (ITEA) will also come to a close at the conference, as winners from the finalist teams will be chosen for ASQ gold, silver, or bronze-level statuses.

What’s ITEA?

The annual ASQ ITEA process celebrates the accomplishments of quality improvement teams from a broad spectrum of industries from around the world. The ITEA is the only international team recognition process of its kind in the United States, and since 1985, more than 1,000 teams from the United States, Argentina, Australia, Brazil, Canada, China, Colombia, Costa Rica, Germany, Guatemala, India, Japan, Mexico, Philippines, Singapore, South Korea, Thailand, and the United Arab Emirates have participated.

The preliminary round entry requires teams to submit a presentation outlining a project completed within the last two years that achieved measurable results. Then, judges chosen by ASQ evaluate the initial team submission and select a group of team finalists from the preliminary round to present their projects at ASQ’s WCQI.

The live team presentations are judged at the conference based upon how well each team’s presentation addresses predefined criteria established by ASQ.

This year, 40 teams from 14 different countries have been named as finalists. (You can see the list of finalists here.) We congratulate all of the Minitab customers who have been selected as finalists!

One of those finalist teams is from Ford Motor Company out of Dearborn, Mich., and I caught up with Scott Sterbenz, a Six Sigma Master Black Belt at Ford Motor Company, about his team’s ITEA presentation.

This is the third year Ford has participated in the ITEA process.

“Our own internal processes at Ford for completing quality improvement projects are aligned very closely with the ITEA criteria and methodology,” Sterbenz says. “So we’ve found that our project fits in well with the ITEA competition.”

The Great Escape from Foam Defects

The Ford team’s project for this year is called “The Great Escape from Foam Defects,” and addresses the implementation of castor oil derived foam for the instrument panel of the 2013 Ford Escape using DMAIC problem solving.

“During the development of the 2013 Escape, Ford had a strong drive to deliver a compact utility vehicle with minimal impact to the environment,” says Sterbenz. “While the development team made many updates to the Escape to make it more environmentally friendly, they wanted to go even further.”

A few months before the launch of the 2013 Escape, Ford made a late decision to use a more environmentally friendly foam—castor oil foam—for the instrument panel instead of a petroleum-based foam. The problem was that Ford’s supplier was experiencing high scrap rates of about 30 percent for the new foam, instead of a typical 1 percent scrap rate.

“We used Minitab Statistical Software to help us conduct and analyze a sensitivity Designed Experiment, which helped us validate our root cause and actions for improving the scrap rate,” Sterbenz says. “And we were able to reduce the scrap rate initially to 0.7 percent, and then with some incremental actions, to a stable 0.1 percent.”

The Main Effects Plot below helped the team validate a root cause for the high scrap rate—by looking at the difference in the sensitivity DOE results between the new castor-oil-based foam and the previously used petroleum-based foam. Green circles indicate quality levels where instrument panels are acceptable; red circles indicate quality levels where instrument panels would be scrapped.

The team hypothesized that castor oil foam was more sensitive than petroleum-based foam to the normal and expected variation in process energy input from the foaming process and foaming tool parameters. The v-shaped main effects plot above indicated significant energy input variation sensitivity for castor oil foam, while the main effects plot for petroleum-based foam showed insignificant sensitivity to that same energy input variation.

To learn more about Ford’s ITEA project and the solution they developed for reducing the high scrap rate, be sure to attend the team’s live presentation at this year’s ASQ WCQI, on Tuesday, May 6 at 10:45 a.m. Also, don’t forget that the ITEA live team presentations start during the morning sessions on Monday, May 5, and there are many interesting case studies to be heard from many different industries—including transactional and other service-based industries.

The Ford team is also participating in the ASQ WCQI poster display contest, and their poster this year sounds pretty cool. They’ve likened the steps they took to accomplish their “Great Escape from Foam Defects” ITEA project to the steps of another “great escape”—the steps inmates took to plan and execute their escape from the Alcatraz federal prison on June 11, 1962.  Be sure to check out all the posters on display in the exhibit hall!

Minitab is very proud to sponsor the 2013-2014 ITEA process! Visit the ITEA website to learn more about how to enroll your team in the 2014-2015 competition. 


What if the NCAA tournament wasn’t single elimination?


Connecticut just defeated Kentucky to win the NCAA Men's Basketball Championship. The game had the highest combined seeding of any championship game in NCAA tournament history. This shows that while a single elimination tournament can be very entertaining, it doesn’t always determine who the “best” team is. In fact, despite winning the championship, Connecticut is still ranked 8th in the Pomeroy Ratings and 10th in the Sagarin Predictor Rankings. Though Connecticut played the best basketball the past 3 weeks, it would be folly to ignore the 30 games they played before that!

But although I'd agree Connecticut isn't the "best" college basketball team, they definitely deserved the championship thanks to their dominant wins over Villanova, Iowa State, Michigan State, Florida, and Kentucky. And the funniest part about their run is that the closest they came to losing was against St. Joseph's, by far the worst team they played in the tournament. But the point of this post isn't to rain on their parade. Instead, I want to look at how much the single elimination format helped Connecticut and how different their chances would have been if the tournament were a best of 3, 5, or even 7 game series!

Connecticut’s Odds in a Single Elimination Tournament

First I’m going to look back at Connecticut’s run and calculate their probability of beating each team they faced with a regression model that uses each team's ranking. The rankings I’m using are the final Sagarin Predictor Rankings rather than the rankings before the tournament. The tournament gave us additional information about each team, so we might as well use it!

Opponent          Sagarin Rank   Probability of Connecticut Winning
Saint Joseph's    56             77%
Villanova         13             54%
Iowa St           16             57%
Michigan St       6              43%
Florida           3              35%
Kentucky          12             53%

 
If you multiply all of those probabilities together, you find that the odds of Connecticut winning all 6 of those games are only 2%, or about 1 in 50. Those odds are low, but keep in mind that in a 68-team tournament, the eventual champion’s odds are always going to be low. It’s not easy to win 6 basketball games in a row, but somebody has to do it.

And for reference, if I assume Connecticut was the #1 ranked team (instead of #10), their probability of beating those same 6 teams is still only 19%, or about 1 in 5. This is why a single elimination tournament does a lousy job of determining the “best” team. Chances are the best team isn’t going to win!
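Checking that math is simple: the chance of winning all six games is just the product of the single-game probabilities from the table above. A minimal Python sketch:

win_probs = [0.77, 0.54, 0.57, 0.43, 0.35, 0.53]   # single-game probabilities from the table above

p_all_six = 1.0
for p in win_probs:
    p_all_six *= p   # multiply the independent game probabilities together

print(f"Chance of winning all 6 games: {p_all_six:.1%}")   # about 1.9%, roughly 1 in 50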

Connecticut’s Odds if the Tournament had Multi-Game Series

But what would we do if we wanted to devise a tournament that did a better job of determining who the best team was? It’s actually the same thing we would do in quality improvement. Whether we’re trying to determine the best basketball team, the best supplier to buy materials from, or even if the temperature depends on whether Punxsutawney Phil sees his shadow, when we want to be more confident in our decision we should increase the sample size! So in our hypothetical tournament, instead of having just a single game determine the best team, we should use multiple games.

To determine how many, I’ll use a negative binomial distribution. The negative binomial distribution models how many trials it will take for a certain event to occur a certain number of times. So if Connecticut has a 77% chance of beating Saint Joseph’s, this distribution can tell us the probability of Connecticut beating them 2 times in 3 tries (a best-of-3 series). The table below compares Connecticut’s probability of beating each team in different variations of a multi-game series.

Opponent                             Single Game   Best-of-3   Best-of-5   Best-of-7
Saint Joseph's                       77%           87%         92%         95%
Villanova                            54%           56%         57%         59%
Iowa St                              57%           60%         63%         65%
Michigan St                          43%           40%         37%         35%
Florida                              35%           28%         24%         20%
Kentucky                             53%           55%         56%         57%
Probability of winning all 6 games   1.9%          1.8%        1.6%        1.5%
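As a rough check on the series columns above, here is a small Python sketch using scipy's negative binomial distribution. This is my own illustration of the calculation, not the worksheet behind the original table:

from scipy.stats import nbinom

def series_win_prob(p_game, series_length):
    # nbinom.pmf(k, n, p) is the chance of k losses before the n-th win, so a
    # best-of-N series is won whenever the losses before the clinching win
    # number at most n - 1.
    wins_needed = series_length // 2 + 1
    return nbinom.cdf(wins_needed - 1, wins_needed, p_game)

# Connecticut vs. Saint Joseph's, using the 77% single-game probability:
for n in (1, 3, 5, 7):
    print(f"best-of-{n}: {series_win_prob(0.77, n):.0%}")
# Prints roughly 77%, 87%, 92%, and 95%, matching the first row of the table.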


You can see that when the teams are evenly matched, like Connecticut was against Villanova and Kentucky, the extra games don’t change the probabilities very much. The big change in percentages comes when one team is significantly better than the other. In a single game, Saint Joseph’s was more than capable of upsetting Connecticut (and they nearly did!). But the odds tilt very heavily in Connecticut’s favor when you play more games. However, the Huskies can be glad they didn’t have to beat either Michigan State or Florida multiple times. The single game elimination format definitely helped them there!

Overall, the odds of Connecticut winning the entire tournament don’t change very much between the different series. The single elimination format helped Connecticut a little, but not very much.

But let’s get back to that “determining the best team” thing. What would these probabilities look like if we assumed Connecticut was the #1 team? 

Opponent                              Single Game   Best-of-3   Best-of-5   Best-of-7
Saint Joseph's                        91%           98%         99%         99.8%
Villanova                             78%           88%         93%         95%
Iowa St                               80%           90%         94%         97%
Michigan St                           70%           78%         84%         87%
Florida                               62%           68%         72%         75%
Kentucky                              78%           88%         93%         95%
Probability of winning all 6 series   19%           36%         49%         57%

We see that just by making the tournament a best-of-3 series, the #1 ranked team’s chances almost double. And they triple when you make it a best-of-7 series! And with a 7-game series, you’d have the “best” team actually winning the entire tournament most of the time too!

Of course, the tradeoff is that there would be significantly fewer upsets. Mercer probably doesn’t beat Duke in a multi-game series and Dayton almost certainly isn’t beating both Ohio State and Syracuse. March Madness would have to be replaced with March Monotony. So don’t take this as a criticism of how the NCAA tournament is currently constructed! The tournament is unpredictable and entertaining, and that counts for something, too! And having 67 multi-game series wouldn’t be feasible anyway.

But imagine if the tournament went to a best-of-3 (or even 5) series once we got to the Final Four. We could have the best of both worlds! After all, who wouldn’t want to watch Wisconsin and Kentucky play again? And would Connecticut be able to pull off the upset over Florida multiple times? Sure, it would take a little longer, but the tournament already lasts 3 weeks. What’s one more? And it’s not like these teams need to attend class! After all, the championship game was between a team with an 8% graduation rate and another team whose starting five will spend the rest of the spring semester preparing for the NBA. (Student athletes…right?)

But all jokes aside, the NCAA tournament is unlikely to have multi-game series anytime soon. So continue to enjoy the tournament in its current form. Just know that when you hear somebody say that the team that won the championship is the “best” team in college basketball, you should take it with a grain of salt.

After all, if the better team always won, it wouldn’t be March Madness, would it?

Photo: Bob Donnan, USA TODAY Sports

Viewing Mist-Covered Mountains of Data


A famous classical Chinese poem from the Song dynasty describes the views of a mist-covered mountain called Lushan.

The poem was inscribed on the wall of a Buddhist monastery by Su Shi, a renowned poet, artist, and calligrapher of the 11th century.

Deceptively simple, the poem captures the illusory nature of human perception.
 

Written on the Wall of West Forest Temple

                                      --Su Shi
 
  From the side, it's a mountain ridge.
  Looking up, it's a single peak.
  Far or near, high or low, it never looks the same.
  You can't know the true face of Lu Mountain
  When you're in its midst.

 

Our perception of reality, the poem suggests, is limited by our vantage point, which constantly changes.

In fact, there are probably as many interpretations of this famous poem as there are views of Mt. Lu.

Centuries after the end of the Song dynasty, imagine you are traversing a misty mountain of data using the Chinese language version of Minitab 17...

Written in the Graphs Folder in Minitab Statistical Software     

From the interval plot, you are extremely (95%) confident that the population mean is within the interval bounds.

From the individual value plot, the data may contain an outlier (which could bias the estimate the mean).

From the boxplot, the data appear to be extremely skewed (making the confidence interval and mean estimate unreliable).

From the histogram, the data are bimodal (which makes the estimate of the mean utterly ...er...meaningless)

From the time series plot, the data show an order effect, with increasing variation and downward drift.

From the individuals and moving range charts with stages, the data appear stable and in control:

These graphs are all of the same data set.

Take it from Su Shi. Don't rely on a single graphical view to capture the true reality of your data.

Image of Lushan licensed by Wikimedia Commons.

Re-analyzing Wine Tastes with Minitab 17


In April 2012, I wrote a short paper on binary logistic regression to analyze wine tasting data. At that time, François Hollande was about to get elected as French president and in the U.S., Mitt Romney was winning the Republican primaries. That seems like a long time ago…

Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012, would I have conducted my analysis in a different way? Would the results still look similar? I decided to re-analyze my April 2012 data with Minitab 17 and assess the differences, if there are any.

There were no fewer than 12 parameters to analyze with a binary response. Among them, 11 were continuous variables, one factor was discrete in nature (white and red wines: a qualitative variable), and the number of two-factor interactions that could be studied was huge (66 two-factor interactions were potentially available).

The parameters to be studied:

Variable           Details                                    Units
Type               red or white                               N/A
pH                 acidity (below 7) or alkalinity (over 7)   N/A
Density            density                                    grams/cubic centimeter
Sulphates          potassium sulfate                          grams/liter
Alcohol            percentage alcohol                         % volume
Residual sugar     residual sugar                             grams/liter
Chlorides          sodium chloride                            grams/liter
Free SO2           free sulphur dioxide                       milligrams/liter
Total SO2          total sulphur dioxide                      milligrams/liter
Fixed acidity      tartaric acid                              grams/liter
Volatile acidity   acetic acid                                grams/liter
Citric acid        citric acid                                grams/liter

Restricting Analysis to the Main Effects

In 2012, due to the very large number of potential two-factor interactions, I restricted my analysis to the main effects (not considering the interactions between continuous variables).

Because the individual parameters had to be eliminated one at a time according to their p-values (the term with the highest p-value is removed at each step, until all the parameters and interactions that remain in the model have p-values lower than 0.05), this was a very lengthy process.

To avoid obtaining an excessively complex final model, I eventually decided to analyze white and red wines separately (one model for the white wines, another model for the red wines), suggesting that the effects of some of the variables differed according to the type of wine.
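For anyone curious what that one-term-at-a-time elimination looks like in code, here is a rough sketch of the same backward-elimination idea using Python and statsmodels. It only illustrates the procedure described above; it is not the Minitab session I ran in 2012, and the file and column names in the usage comment are hypothetical.

import pandas as pd
import statsmodels.api as sm

def backward_eliminate(data, response, predictors, alpha=0.05):
    # Repeatedly fit a binary logistic model and drop the predictor with the
    # highest p-value until every remaining term has p < alpha.
    terms = list(predictors)
    while terms:
        X = sm.add_constant(data[terms])
        result = sm.Logit(data[response], X).fit(disp=0)
        pvals = result.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            return result, terms
        terms.remove(worst)   # eliminate the least significant term and refit
    return None, terms

# Hypothetical usage, assuming the wine data sit in wine.csv with a 0/1 taste column:
# wine = pd.read_csv("wine.csv")
# final_model, kept_terms = backward_eliminate(wine, "liked", ["Alcohol", "pH", "Sulphates", "Density"])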

Including 2-Way Interactions in the Analysis

Using Minitab 17 makes a substantial difference in this respect. All 2-way interactions can be easily selected to generate an initial model:

interactions

With Minitab 17, you can use stepwise binary logistic regression to quickly build a final model and identify the significant effects. In 2012, I used a descending approach, considering all variables first and eliminating one variable at a time manually.

This lengthy and tedious process takes just a single click in Minitab 17:

stepwise

 

The results above show that Alcohol and Acidity (both fixed and volatile) seem to play a major role.

The Residual sugar by Type of wine interaction is only marginally significant, with a p-value (0.087) larger than 0.05 but smaller than 0.1.

The R-squared value (R-Sq) is also available in Minitab 17 to assess the proportion of the total variability that is explained by the model. The larger the R-squared value, the more comprehensive the model: a large R-squared means the model captures most of what drives the process, while a low R-squared means it explains only a small part of the variability in the response. In this example, the R-squared is relatively low (28%), leaving 72% of the total variability unexplained by the model.
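For a logistic model, this kind of R-squared is likelihood-based rather than the familiar least-squares version. As a rough illustration (not necessarily the exact formula Minitab reports), McFadden's pseudo R-squared compares the fitted model's log-likelihood with that of an intercept-only model:

# Assuming `result` is a fitted statsmodels Logit result, as in the sketch above:
def pseudo_r_squared(result):
    # 1 minus the ratio of the model log-likelihood to the intercept-only log-likelihood
    return 1 - result.llf / result.llnull   # statsmodels also exposes this as result.prsquared

# A value near 0.28 would mean the model accounts for roughly 28% of the
# variability, leaving about 72% unexplained.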

In 2012, the final result consisted of two equations that could be used to understand which variables were significant for each type of wine in order to improve their taste.

Optimizing the Response

In Minitab 17, I can go one step further and use the optimization tool to identify the ideal settings and help the experimenter make the right decision.

regression equation

Optimize

The optimization tool shows that tasters tend to prefer wines with a large amount of alcohol and both high fixed acidity and high volatile acidity.

Finally, showing graphs is important to convince colleagues and managers that the right decision has been taken. A visual representation is also very useful to better understand the factor effects. In Minitab 17, contour plots and response surface diagrams are available in the binary logistic regression sub-menu to describe the variable effects.

The contour plot below shows that tasters either prefer wines with high fixed acidity and high volatile acidity or with low fixed acidity but also low volatile acidity. The balance between the two types of acidity seems to be crucial.

Contour

Surface

The models I arrived at in April 2012 are different from the one I found with Minitab 17. The two types of Acidity (Fixed and Volatile) were significant in the model for white wines, and Alcohol and Fixed Acidity had been selected in the final model for red wines.

But the main difference is that the Fixed Acidity by Volatile Acidity interaction had not been considered in 2012.  In April 2012, the two-factor interactions were not on my radar, and I instead focused only on the individual main effects and their impact on wine tastes.

Fortunately, with Minitab 17 it is a lot easier to build an initial model—even a complex one with 66 two-factor potential interactions—and stepwise regression allows you to consider a much larger number of potential effects in the initial full model.

Conclusion

Ultimately, this study shows that the methods you use definitely impact your conclusion and statistical analysis. I got a simpler model using the tools available in Minitab 17, and therefore I did not need to study white and red wines separately. The optimization tool as well as the graphs were very useful to better understand the effects of the variables that are significant.

 


A Different Look at the New Medicare Data


It’s been an exciting week to be interested in Medicare data. On April 9th,  the American government opened up data from the Centers for Medicare and Medicaid Services (CMS) that show charges made to Medicare and payments received by over 880,000 entities. If you went to Bing on Monday, April 14, at about 12:30, chose to look at news stories, and typed Medicare money into the search box, here’s a sampling of what you got:

Medicare doctors: Who gets the big bucks & for what
The Medicare Data’s Pitfalls
Medicare Data Shines Light on Billions Paid to TX Doctors
Political Ties of Top Billers for Medicare
Bob Menendez Donor Tops Medicare Money List; Fla. Opthalmologist Raked In Nearly $21 Million
The top 10 Medicare billers explain why they charged $121M in one year
Who gets the most Medicare money in Nashville, and other fun facts from today’s CMS data release
44 Central Florida doctors get at least $1 million in Medicare money
What the new Medicare billing data show (and where to learn much more)
A Look at Where Medicare Money Goes

As you’d expect, there’s a lot of focus on who gets the most money. Some articles draw attention to where the biggest numbers in the data are, other articles try to provide a bit of context so that we don’t assume that most oncologists are crooks. One thing you know for certain: if you look at a data set with over 880,000 observations, the biggest and smallest numbers are going to be wildly unusual.

Because the news media have taken care of highlighting the biggest values, I thought we should look at the data some different ways. What do we see if instead of looking at the biggest numbers, we look at the smallest? How much does the average really tell you about what you want to know?

Want to follow along? All of the data is available from CMS.gov. I'm using the data in this file: Medicare Physician and Other Supplier Aggregate table, CY2012.

The littlest numbers

Congratulations to one Doctor Shaw, who billed Medicare a total of $19 in 2012. That’s despite Dr. Shaw providing services to 15 Medicare beneficiaries! Dr. Shaw charged $1.00 for something recorded in the CMS data as “Assay glucose blood quant.” That $1.00 figure, by the way, is the average submitted charge for that procedure. Is Dr. Shaw more patriotic for being the anti-Dr. Qamar, who billed Medicare the most in the same data set? Who knows. But let’s take a moment to celebrate the man, the myth, the legend that is Dr. Shaw.

One of the headlines above notes that the top 10 Medicare billers charged Medicare about $121 million in 2012. Turns out that many of them are just the lucky individuals whose identities are used to identify an entire business, a fact that they’ve probably had to explain a lot since the database was released. Those explanations probably aren’t being asked of the bottom 10 Medicare billers, who accounted for combined charges of $339.14. That puts those 10 well below the average submitted charges of $286,608 for those who appear in the database.

Coming in that low is no mean feat, which you cannot see on the pie chart, because the slice for $339.14 is too small to make out with the naked eye.

  The bottom 10 billers asked for so little money that their slice is impossible to see on the pie chart.

Who doesn’t get paid the big bucks, and for what

The first article from the Bing search results let us know that providers who report their type as hematology/oncology, radiation oncology, and ophthalmology tend to get paid the most. But there’s an opposite end to that scale, too. The 345 entities who got paid the least, on average, by Medicare in 2012 were billed as certified nurse midwives. The other two smallest, on average, are anesthesiologist assistants and mass immunization roster billers. Hopefully, this illustrates the folly of basing the value of a profession on what Medicare pays them. Midwives, as you would guess, work much more frequently with patients who are having babies—a group that I would surmise has little overlap with Medicare patients.

Providers in these three categories were paid less than $7,000 on average by Medicare in 2012.

Other statistics are important, too

Maximums and averages are important summary statistics, but it’s important to remember that if you want to understand the data it’s foolish to focus on any one number too closely. According to the CNN story that was the first result on Bing, hematology/oncology, radiation oncology and ophthalmology were the specialties that collected the most Medicare dollars on average. But that list leaves out that the highest averages actually belong to provider types that CNN did not associate with a specialty. Here’s a look at the top 10 categories for receiving payments in the database:

Most provider types that receive high average payments are not specialties.

Of course, just because those specialties have high averages doesn’t guarantee that we know very much about any individual. The amount that we don’t know is highlighted when we look at the standard deviations, which measure how inconsistent the payment amounts to providers are. I show all of the data in the first scatterplot that follows. In the second scatterplot, I omitted the categories with large averages that CNN did not count as specialties. In the third graph, I also omitted categories that had high standard deviations because of the range of services they supply: Independent Diagnostic Testing Facility, Unknown Specialty Physician Code, and All Other Suppliers.

Including the high mean provider types hides the relationship among specialities.

Categories that combine groups have unusually high standard deviations.

The higher the mean, the higher the standard deviation.

Once we get down to the specialties, the relationship between the mean and the standard deviation is clear. The specialties with the highest averages tend to have the highest standard deviations. The higher the average for a specialty, the less we know about any individual provider.

It’s also interesting to see what happens if you look at medians, which are a better representation of what a typical provider received from Medicare in 2012 than the means are.

The highest median payments are different from the highest mean payments.

While ophthalmology is still a top category, the median value is $178,758, or roughly half of the average. Additionally, nephrology and cardiology now round out the top three specialties. Hematology/oncology, which had the highest average, falls out of the top 10.
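If you would like to reproduce these summaries yourself, the calculations are straightforward with pandas once the CMS file is loaded. The file and column names below are placeholders, so check them against the headers in the actual download:

import pandas as pd

# Placeholder names -- adjust to match the CMS aggregate file you downloaded.
cms = pd.read_csv("Medicare_Aggregate_CY2012.csv")

summary = (cms.groupby("provider_type")["total_medicare_payment_amt"]
              .agg(["count", "mean", "median", "std"])
              .sort_values("median", ascending=False))

print(summary.head(10))   # top 10 provider types by median payment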

Conclusion

Big numbers are exciting, and they tend to attract our attention. But focusing on a small part of the data rarely tells the whole story.

Of course, if you’re really into data analysis, you know that the first question to ask is not “What’s the maximum?” but “Can I trust my data?” One of the first steps now that the CMS data is open is to find out how trustworthy the data are.

How to Correctly Interpret P Values


The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis Testing?

In order to understand P values, you must first understand the null hypothesis.

In every experiment, there is an effect or difference between groups that the researchers are testing. It could be the effectiveness of a new drug, building material, or other intervention that has benefits. Unfortunately for the researchers, there is always the possibility that there is no effect, that is, that there is no difference between the groups. This lack of a difference is called the null hypothesis, which is essentially the position a devil’s advocate would take when evaluating the results of an experiment.

To see why, let’s imagine an experiment for a drug that we know is totally ineffective. The null hypothesis is true: there is no difference between the experimental groups at the population level.

Despite the null being true, it’s entirely possible that there will be an effect in the sample data due to random sampling error. In fact, it is extremely unlikely that the sample groups will ever exactly equal the null hypothesis value. Consequently, the devil’s advocate position is that the observed difference in the sample does not reflect a true difference between populations.

What Are P Values?

P values evaluate how well the sample data support the devil’s advocate argument that the null hypothesis is true. They measure how compatible your data are with the null hypothesis. How likely is the effect observed in your sample data if the null hypothesis is true?

  • High P values: your data are likely with a true null.
  • Low P values: your data are unlikely with a true null.

A low P value suggests that your sample provides enough evidence that you can reject the null hypothesis for the entire population.

How Do You Interpret P Values?

In technical terms, a P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.

For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies due to random sampling error.
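One way to make that definition concrete is a quick simulation: generate many studies in which the null hypothesis is true by construction, and count how often random sampling error alone produces a difference at least as large as the one observed. The study sizes and rates below are made up purely for illustration:

import numpy as np

rng = np.random.default_rng(1)

n_per_group = 200        # hypothetical group size
base_rate = 0.10         # infection rate when the vaccine truly does nothing
observed_diff = 0.05     # hypothetical difference seen in the "real" study

# Simulate 100,000 studies where the null is true: both groups share the same
# infection rate, so any observed difference is pure sampling error.
control = rng.binomial(n_per_group, base_rate, size=100_000) / n_per_group
treated = rng.binomial(n_per_group, base_rate, size=100_000) / n_per_group
diffs = control - treated

# Fraction of these "no effect" studies with a difference at least as extreme:
print(np.mean(np.abs(diffs) >= observed_diff))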

P values address only one question: how likely are your data, assuming a true null hypothesis? They do not measure support for the alternative hypothesis. This limitation leads us into the next section, which covers a very common misinterpretation of P values.

P Values Are NOT the Probability of Making a Mistake

Incorrect interpretations of P values are very common. The most common mistake is to interpret a P value as the probability of making a mistake by rejecting a true null hypothesis (a Type I error).

There are several reasons why P values can’t be the error rate.

First, P values are calculated based on the assumptions that the null is true for the population and that the difference in the sample is caused entirely by random chance. Consequently, P values can’t tell you the probability that the null is true or false because it is 100% true from the perspective of the calculations.

Second, while a low P value indicates that your data are unlikely assuming a true null, it can’t evaluate which of two competing cases is more likely:

  • The null is true but your sample was unusual.
  • The null is false.

Determining which case is more likely requires subject area knowledge and replicate studies.

Let’s go back to the vaccine study and compare the correct and incorrect way to interpret the P value of 0.04:

  • Correct: Assuming that the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies due to random sampling error.
     
  • Incorrect: If you reject the null hypothesis, there’s a 4% chance that you’re making a mistake.

What Is the True Error Rate?

Think that this interpretation difference is simply a matter of semantics, and only important to picky statisticians? Think again. It’s important to you.

If a P value is not the error rate, what the heck is the error rate? (Can you guess which way this is heading now?)

Sellke et al.* have estimated the error rates associated with different P values. While the precise error rate depends on various assumptions (which I'll talk about in my next post), the table summarizes them for middle-of-the-road assumptions.

P value   Probability of incorrectly rejecting a true null hypothesis
0.05      At least 23% (and typically close to 50%)
0.01      At least 7% (and typically close to 15%)

Do the higher error rates in this table surprise you? Unfortunately, the common misinterpretation of P values as the error rate creates the illusion of substantially more evidence against the null hypothesis than is justified. As you can see, if you base a decision on a single study with a P value near 0.05, the difference observed in the sample may not exist at the population level. That can be costly!

In my next post, I’ll explore the actual error rate a bit more, and cover the correct way to use P values to avoid costly mistakes.
 

*Thomas Sellke, M. J. Bayarri, and James O. Berger, "Calibration of p Values for Testing Precise Null Hypotheses," The American Statistician, February 2001, Vol. 55, No. 1.

What I Learned From Treating Childbirth as Failure, Part II


A couple of years ago, I wrote a blog post titled "What I Learned From Treating Childbirth as Failure" that conveniently ended up getting published the day before my daughter was born. You should read it first, but to summarize, it demonstrates how we can predict the odds of an event happening during certain time intervals even when the original data is highly censored.

Since then, several people have asked (two in the comments alone) where I came up with the numbers I stated at the end:

  • When should a relative arrive on a 7-day stay to have the greatest chance of being there for the birth? (May 17th)
  • What are the odds of the baby being born on a weekend? (28.6%)
  • What are the odds of the baby being born on her great-grandmother's birthday, May 14th? (3.7%)

To answer these, let's recap: using data, we have a good estimate of the distribution that natural childbirths follow based on days in gestation.  That distribution can be seen here:

Distribution Plot

I've marked with a reference line 280 days, which corresponds to the "due date."  So to keep things in terms of my original post, the due date corresponds to day 280, or May 19.  A key concept to understand in calculating the numbers stated above is that this is a continuous scale, so May 19 includes all points starting at 280 and going up to, but not including, 281.  So 280.1 and 280.3587465 and 280.9999999 are all part of May 19th.

So the odds that the baby would be born on the due date (that is, the probability of falling between 280 and 281 in the distribution above) are about 4.7%, as shown here:

Dist Plot Due Date

Now on to the specific bullet points from above, which I will cover in a different order for simplicity...

What are the odds of the baby being born on her great-grandmother's birthday, May 14th?

If you grasped what I said above about the odds of being born on the due date, this one is easy.  It's the same idea, except May 14th corresponds to day 275, so I just need the odds of being between 275 and 276 in the distribution:

Dist Plot May 14

There you have it...3.7%.  Another way to find that would be to find the cumulative odds of being born by day 276 (30.3%) and subtract the cumulative odds of being born by day 275 (26.6%):

Dist Plot May 14 Separate
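In code, that cumulative-probability subtraction is just the difference of two CDF values. I am not reproducing the exact distribution fit from the original post here, so the distribution below is only a placeholder that shows the mechanics:

from scipy import stats

# Placeholder distribution -- the original analysis fit a distribution to
# censored gestation data; a normal centered on the 280-day due date is used
# here only to illustrate the calculation.
gestation = stats.norm(loc=280, scale=9)

p_may_14 = gestation.cdf(276) - gestation.cdf(275)   # P(275 <= days < 276)
print(f"{p_may_14:.1%}")   # the fit in the original post gives about 3.7% for this interval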

When should a relative arrive on a 7-day stay to have the greatest chance of being there for the birth?

Thus far we have looked at a single day, but there's no reason we couldn't look at a half-day, a specific hour, a 3-day period, or any other time frame. So this question would pertain to a 7-day period rather than a 1-day period. For simplicity I limited myself to midnight-to-midnight, which is obviously not practical but keeps things conceptually easier to explain.  For example, here are the odds of the baby being born during the 7-day period starting day 273 (a week before the due date) and running through day 280 (the due date):

Dist Plot Week Early

So the odds of the baby being born early but not more than a week early are about 27.5%.  To answer the question, I just look at every 7-day period that is reasonable and find the highest odds, which corresponds to two days before the due date (May 17th) and the six days afterward:

Dist Plot 7 Days

This gives a 32.1% chance of being there for the birth.
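Finding the best arrival date is then just a loop over candidate 7-day windows, shown here with the same placeholder distribution as above. With the actual fit from the original post, the winning window starts on day 278 (May 17), two days before the due date:

from scipy import stats

gestation = stats.norm(loc=280, scale=9)   # placeholder fit, as above

# Probability of the birth landing in each 7-day window [start, start + 7)
windows = {start: gestation.cdf(start + 7) - gestation.cdf(start)
           for start in range(266, 288)}

best_start, best_prob = max(windows.items(), key=lambda kv: kv[1])
print(best_start, round(best_prob, 3))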

What are the odds of the baby being born on a weekend?

This is really no different than the earlier questions, except now there are multiple areas of interest.  First we need to know which days correspond to weekends.  The due date of May 19th—day 280—was a Saturday.  So days 280 and 281 were on a weekend, and likewise were days 273 and 274 and days 287 and 288 and so on. Luckily there is a limited area where reasonable probability exists, so we just add up the odds of each weekend:

Dist Plot Weekends

.0009995 + .007606 + .02827 + .06573 + .09391 + .06904 + .01893 + .001205 = .28569

There you have it—there is a 28.6% chance of the baby being born on a weekend.

Summary

I hope this has cleared up how you can convert the estimated distribution to the odds of an event happening during any time interval...in my case, my daughter was ultimately not born on her great-grandmother's birthday, and not on a weekend.  But she was born during the most likely 7-day window, to the delight of her visiting grandparents!


Creating a Custom Report using Minitab, part 1


As a member of Minitab’s Consulting and Custom Development Services team, I get to help companies across a variety of industries create many different types of reports for management. These reports often need to be generated weekly or monthly. I prefer to automate tasks like this whenever possible, so that new or updated reports can be created without much effort. A little investment up front can save a lot of time by eliminating the need to reinvent the wheel every time management wants a current report.

I’m going to tell you how to use Minitab Statistical Software to automatically generate a report based on information retrieved from an external data source. But there’s a lot to cover, so I’ll have to do it in two posts.

In this post, I’ll discuss automating analyses inside Minitab. In the next post I’ll demonstrate how to get Minitab “talking” to Microsoft Word programmatically using Minitab’s API. The end result will be a very simple automated report that exhibits the power of customization in Minitab.

Requirements for a Capability Report

We’re going to create a simple Capability report based on some commonly used tools in the quality realm. Let’s start by breaking down what we need to make it:

  1. First, we need data! We can automatically pull in the current month’s data from an Access database. I created my own Table in Access for this example, which you can download here.
     
  2. Next, we need to analyze the data. We’ll run three analyses and store the following results:
  1. Normality Test → Store the P-Value associated with the Test.
  2. Control Chart → Store how many data points are out of control.
  3. Process Capability → Store the Cpk statistic.
Generating the Minitab Code

Now that we have our strategy for Minitab, let’s pull the data into Minitab programmatically.  Here is the Minitab Command Language we’ll use to import the data: 

ODBC;
Connect "DSN=MS Access Database;DBQ=C:blogpost.accdb";
SQLString "SELECT Date,Yield FROM Yield where MONTH(Date)=Month(NOW())" &
"and YEAR(Date)=YEAR(NOW())".

This may look like another language to you, and it sort of is.  So let’s break down what it says. The section below is a Connection String that tells Minitab where your database is, and how to connect to it:

Connect "DSN=MS Access Database;DBQ=C:blogpost.accdb";

(If you’re following along, you’ll need to alter the database location C:\blogpost.accdb to reflect the location of the file on your machine.)

The next section is just an SQL Statement in command language:

SQLString "SELECT Date, Yield FROM Yield WHERE MONTH(Date)=Month(NOW())" &
"and YEAR(Date)=YEAR(NOW())".

If you look closely at the SELECT statement, you will notice we are pulling two fields (Date and Yield) from a Database table named Yield.  We also include this condition: 

WHERE MONTH(Date)=Month(NOW()) and YEAR(Date)=YEAR(NOW()).

Can you guess what the condition is doing? If you guessed that it pulls data corresponding to the current month, you are correct. Of course, SQL statements can be much more complicated than this example. Fortunately, Minitab lets you use the SQL statement that best fits your needs, no matter how complex it may be.

Now that we’ve pulled the data in using Minitab Command Language, we need Minitab to run a few tests and create some charts.

To run the Normality test on Yield and store the p-value in column 3, the command language is:

Normtest Yield;
Spvalue C3.

Pretty straightforward. Now let’s run the I-Chart and count how many points are out of control:

IChart Yield;
  Stamp Date; # this command puts the Date on the X-axis of the chart
  TResults C4.

Sum C4 K1

The constant K1 will now reflect how many points were out of control.

For this Capability Analysis, we will assume the data is Normal and that the Yield of the product has a Lower Spec of 7 and an Upper Spec of 13.  We will store Cpk in column 5:

Capa Yield 1; # the number 1 signifies the subgroup size
  Lspec 7;
  Uspec 13;
  CPK C5.

There you have it! We just imported the current month’s data and ran three analyses with the following command language:

ODBC;
  Connect "DSN=MS Access Database;DBQ=C:blogpost.accdb";
  SQLString "SELECT Date,Yield FROM Yield where MONTH(Date)=Month(NOW())" &
  "and YEAR(Date)=YEAR(NOW())".

Normtest Yield;
  Spvalue C3.

IChart Yield;
  Stamp Date; # this command puts the Date on the X-axis of the chart
  TResults C4.

Sum C4 K1

Capa Yield 1; # the number 1 signifies the subgroup size
  Lspec 7;
  Uspec 13;
  CPK C5.

If you save the above command language in Notepad using the file extension .mtb, you will have a Minitab Exec.  You can run this file in Minitab by navigating to File > Other Files > Run an Exec.

In the next blog post, I will describe how to utilize Minitab’s API in Visual Basic for Applications to generate the Capability Report.

 

Creating a Custom Report using Minitab, part 2


Now that you’ve seen how to automatically import data and run analyses in my previous post, let’s create the Monthly Report!

I will be using a Microsoft Word Document (Office 2010) and adding bookmarks to act as placeholders for the Graphs, statistics, and boilerplate conclusions.

Let’s go through the steps to accomplish this:

  1. Open up an existing report that you have previously created in Microsoft Word.
  2. Highlight a section of the document where you would like to place the created Minitab graph or statistic.
  3. Go to the Insert tab, click the Bookmark link, and type in the name of what you will be replacing.  In this instance, I typed ‘NormalityPlot’ and clicked Add:
     

  1. Repeat the steps above for each graph and statistic that will need to be inserted into your report.

Now go to the Developer tab in Microsoft Word, and click on Macros.  Here you can enter the name of your macro and click Create.

Let’s first make sure we reference the Minitab COM API, so Microsoft Word can talk directly to Minitab.  In Visual Basic for Applications, go to Tools > References, and check the box for ‘Mtb 17.0 Type Library’.

 

We can now start coding.  Let’s start by declaring some variables and initializing them:

Dim mtbApp As Mtb.Application
Dim mtbProject As Mtb.Project
Dim mtbWorksheet As Mtb.Worksheet

Set mtbApp = New Mtb.Application
Set mtbProject = mtbApp.ActiveProject
Set mtbWorksheet = mtbProject.ActiveWorksheet

' We can even have Minitab run behind the scenes, hidden from the user.
mtbApp.UserInterface.Visible = False
mtbApp.UserInterface.DisplayAlerts = False

The code above gets Minitab running.  Next let’s run the Minitab exec we created in the last blog post:

mtbProject.ExecuteCommand "Execute 'C:\Users\dgriffith\Desktop\blog.mtb' 1."

(If you’re following along, you’ll need to alter the exec location C:\Users\dgriffith\Desktop\blog.mtb to reflect the location of the file on your machine.)

Now we can tell Minitab to save the graphs as JPEGs and place them in the Report. Once again, you’ll need to change the file locations to match yours.

mtbProject.Commands.Item(3).Outputs.Item(1).Graph.SaveAs "C:\normtest", True, GFJPEG

ActiveDocument.Bookmarks("NormalityPlot").Range.InlineShapes.AddPicture "C:\normtest.jpg"

mtbProject.Commands.Item(4).Outputs.Item(1).Graph.SaveAs "C:\controlchart", True, GFJPEG

ActiveDocument.Bookmarks("ControlChart").Range.InlineShapes.AddPicture "C:\controlchart.jpg"

mtbProject.Commands.Item(6).Outputs.Item(1).Graph.SaveAs "C:\capability", True, GFJPEG

ActiveDocument.Bookmarks("Capability").Range.InlineShapes.AddPicture "C:\capability.jpg"

At this point, we have placed all the graphs successfully into the report.  Let’s add some boilerplate text around the Normality test P-Value, the number of data points that are out of control, and whether or not Cpk meets our guideline of 1.33:

Dim pValue As Double

pValue = mtbWorksheet.Columns.Item(3).GetData(1, 1)

If pValue <= 0.05 Then

    ActiveDocument.Bookmarks("Pvalue").Range.Text = "Data does not pass Normality Test"

Else

    ActiveDocument.Bookmarks("Pvalue").Range.Text = "Since Normality Test P-Value is greater than 0.05. Assume Normal Distribution"

End If

Dim oocPoints As Integer

oocPoints = mtbWorksheet.Constants.Item(1).GetData

If oocPoints = 0 Then

    ActiveDocument.Bookmarks("OutOfControl").Range.Text = "Data appears to be stable over time"

ElseIf oocPoints = 1 Then

    ActiveDocument.Bookmarks("OutOfControl").Range.Text = "Investigation is needed as 1 data point was found to be out of control."

Else

    ActiveDocument.Bookmarks("OutOfControl").Range.Text = "Investigation is needed as " & oocPoints & " data points were found to be out of control."

End If

Dim cpk As Double

cpk = mtbWorksheet.Columns.Item(5).GetData(1, 1)

If cpk < 1.33 Then

    ActiveDocument.Bookmarks("cpk").Range.Text = "Cpk does not pass Acceptable Guideline of 1.33."

Else

    ActiveDocument.Bookmarks("cpk").Range.Text = "Cpk passes Acceptable Guideline of 1.33.  Process is operating at Acceptable level."

End If

So what does all this get us?  The code above generates the following report:

Running the Microsoft Word macro automatically generates a report that communicates with Minitab Statistical Software (in the background) and provides an understanding of our process for the current month.  And now that the code is written, you have a fully functioning Report generation tool that works with the press of a button. 

Pretty neat, if you ask me.

If you are intimidated by this type of process, we provide a service that does the programming for you.  More information can be found here. You can also e-mail us, and I will personally follow up to see if we can automate any processes for you.  Also, if you would like to see additional examples illustrated in a blog post, feel free to post a comment below.

If you want to get your hands dirty in the programming, you can find more information in Minitab’s Help System:

For Minitab Command Language, go to Help > Help and click the link for Session Commands.

For Minitab’s API, go to Help > Help and click the link for Minitab Automation.

Hockey Penalties, Fans Booing, and Independent Trials


We’re in the thick of the Stanley Cup playoffs, which means hockey fans are doing what seems to be every sports fan's favorite hobby...complaining about the refs! While most complaints, such as “We’re not getting any of the close calls!” are subjective and hard to get data for, there's one question that we should be able to answer objectively with a statistical analysis: Are hockey penalties independent trials? That is, does the team that the next penalty will be called on depend on the team that any previous penalties were called on?

Think of flipping a coin. Even if it comes up heads 10 times in a row, the probability of getting heads on the next flip is still 50%. In theory, you would think penalties in hockey work the same way. Both teams are playing hard, and should be equally likely to commit the next penalty. Maybe a single player would be less likely to commit a second penalty right after he just committed one because he’ll play more cautiously. But at the team level, you would expect the outcome of the next penalty to be 50/50.

But players aren’t the only ones who affect the outcome of a penalty. Referees are ultimately the people who decide when to let things go and when to call a penalty. And you can only imagine what the crowd and coaches would do to the ref if the home team had 10 penalties called against them in a row.

So let’s dig into the data with Minitab Statistical Software and see if refs call penalties independently, or if the team they call it on depends on which team they called previous penalties on.

The Data

I’m only going to include playoff games in my sample, because those are the games where the stakes are the highest. For every Stanley Cup Playoff game from 2013 and an additional 21 games from this year, I collected the team the penalty was called on (either home or away) and the order in which they were called. I only included penalties where one team got a power play. So if matching penalties were assessed to players on opposite teams, I didn’t include those since it didn't give either team an advantage. I also didn’t include penalties for fights that occurred late in hockey games that were blowouts. (By that point the game was effectively over, so the penalties didn’t matter.) In total I had 732 penalties that occurred in 106 playoff hockey games.

The First Penalty

No penalties have been called at this point, so there shouldn’t be any bias. We should expect an equal number of penalties to be called on the home and away teams.

Tally

Sure enough the penalties are just about 50/50. So far, so good, refs!

The 2nd and 3rd Penalties

Now let’s see what happens with the next two penalties, starting with the second. Is the team that the second penalty is called on independent of the team the first was called on? We can perform a chi-square test of association to get an answer.

Chi-square test

Looking at the table, we see that of the 51 times the first penalty was called on the away team, the next penalty was called on the home team 33 times (65%). And of the 55 times the first penalty was called on the home team, the next penalty was called on the away team 32 times (58%). It appears that the second penalty depends on the first, but we should examine the results of the chi-square tests to be certain.

The p-values for both chi-square tests are 0.018. Because this value is less than 0.05, we can conclude that there is an association between the first and second penalty. So if your team gets called for the first penalty, odds are the next one is going against the other team.
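If you want to verify that result, the test is easy to reproduce from the counts quoted above. Here is a short Python sketch; the Yates continuity correction is turned off so the statistic matches the uncorrected Pearson chi-square:

import numpy as np
from scipy.stats import chi2_contingency

# Rows: team the first penalty was called on; columns: team the second penalty was called on.
#                  next on home, next on away
table = np.array([[33, 18],    # first penalty on the away team (51 games)
                  [23, 32]])   # first penalty on the home team (55 games)

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, p-value = {p:.3f}")   # p-value is about 0.018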

Will this trend continue for the third penalty? Let’s start by thinking about the number of penalties called on the home team. There could be 0, 1, 2, or 3 penalties called on them. If the penalties are independent of each other, then the probability of a single penalty being called on the home team at any point in time is 0.5. We can use this to easily calculate the probabilities for the different numbers of penalties that could be called on the home team.

Penalties on the home team    Equation            Probability
0                             (.5)^3              0.125
1                             3*(.5)*(.5)^2       0.375
2                             3*(.5)^2*(.5)       0.375
3                             (.5)^3              0.125
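As a quick check of this table, note that if penalties are independent and 50/50, the number called on the home team out of the first three follows a Binomial(n = 3, p = 0.5) distribution. A minimal sketch in Python (using SciPy) reproduces the probabilities:

  from scipy.stats import binom

  # P(k of the first 3 penalties go to the home team), k = 0, 1, 2, 3
  for k in range(4):
      print(k, binom.pmf(k, n=3, p=0.5))   # 0.125, 0.375, 0.375, 0.125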

  
Now we just have to summarize our data, and see how many times each of these actually occurred. Then we can use a chi-square goodness-of-fit test to compare our observed values to the expected probabilities that we calculated above.

Chi-square goodness-of-fit test

If you look at N in the bottom left corner, you’ll see that our sample size dropped to 103 games. That’s because there were 3 games where only 2 penalties were called, so they couldn’t be included in this analysis.

Now let’s focus on the table. In a sample of 103 games, we would expect there to be about 13 games where the home team had 0 penalties and another 13 where they had 3. But the Observed column shows us that there were far fewer. In fact, there were only 3 games where the first 3 penalties all went to the home team. It seems like the refs were reluctant to get the home crowd angry at them.

The 1 and 2 penalty categories suggest that the refs are reluctant to have anybody get mad at them. While we would expect about 39 games in each category, there were 46 and 47 instead!

The p-value for the chi-square test is 0.004. This means we can conclude that the data do not follow the proportions we would expect if the trials were independent. When it comes to the first 3 penalties of the game, the refs are reluctant to give either team too much of an advantage (especially the away team), instead opting to make the penalties as even as possible.
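For those who want to reproduce the comparison, here is a sketch of the same goodness-of-fit test in Python with SciPy. The observed counts follow from the text above: 46 and 47 games in the 1- and 2-penalty categories, 3 games in the 3-penalty category, and therefore 103 - 46 - 47 - 3 = 7 games in the 0-penalty category.

  from scipy.stats import chisquare

  observed = [7, 46, 47, 3]                                   # games with 0, 1, 2, 3 penalties on the home team
  expected = [103 * p for p in (0.125, 0.375, 0.375, 0.125)]  # counts expected under independence

  stat, p = chisquare(observed, f_exp=expected)
  print(f"chi-square = {stat:.2f}, p-value = {p:.3f}")        # p ≈ 0.004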

The Entire Game

Now let’s move on from the first few penalties and try to determine what happens throughout the entire hockey game. For every penalty throughout a single game, I gave it a 1 if it was called on the home team and a -1 if it was called on the away team. Then I added these values up throughout the game to keep track of the “count”. So if the first 3 penalties were called on the away team, the count is at -3. And if 3 of the first 5 penalties were called on the home team, the count would be at 1. Here is a histogram of the counts.

Histogram

Notice that it is very rare for the count to move too far from 0. Counts over 2 or less than -2 are pretty rare, once again showing that refs don’t want to seem too biased toward either team. Because counts of 3 or higher (or -3 and lower) were rare, these sample sizes are too small to draw any conclusions from. So I combined them with the 2 and -2 categories to increase the sample size.
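For concreteness, here is a minimal sketch of the running count described above, applied to a made-up penalty sequence ('H' = penalty on the home team, 'A' = penalty on the away team):

  from itertools import accumulate

  penalties = ['A', 'A', 'H', 'A', 'H']                     # hypothetical sequence for one game
  steps = [1 if team == 'H' else -1 for team in penalties]
  counts = list(accumulate(steps))
  print(counts)                                             # [-1, -2, -1, -2, -1]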

Once I had the count for every penalty in the game, I recorded the team the next penalty was called on. Then for each count, we can see the proportions for the team the next penalty was called on!

Tabulated Statistics

Let’s start with the negative counts. This means the away team has had more penalties called on them than the home team. We see that in these cases, the home team is slightly more likely to be called for the next penalty, but the probabilities are still close to 50%. Any bias the refs show in calling penalties does not favor the away team.

It’s quite a different story when we look at the home team. When the home team has been called for 2 or more penalties than the away team, the next penalty goes on the away team 75% of the time! And in case you think this is just a result of combining all the counts of 2 or more, I can tell you that if you take a count of 2 as its own category, the percentage of the next penalty going to the away team was still 74.65%.

So what is causing this? It’s unlikely to be the players or the coach of the home team yelling at the ref. After all, I’m sure the players and coach of the away team yell at the refs just as much when the count is negative, and they don’t see an advantage.

That brings us back to the fans. Remember when I said every sports fan's favorite hobby seems to be complaining about the refs? Well, refs are humans, too, and it definitely seems plausible that when most of the calls are going against the home team, the crowd noise might cause the refs to be more prone to call the next penalty on the away team. Of course, this analysis can’t prove that this is the cause, but it adds some credence to a nice theory!

So if you ever find yourself at a playoff hockey game, feel free to boo as loud as you want when the ref calls a penalty on your team. Even if it was a good call!  You just might be tipping the odds in your team’s favor!

“Hello, How Can I Help You?” - A Look at Quality Improvement in Financial Services


It’s common to think that process improvement initiatives are meant to cater only to manufacturing processes, simply because manufacturing is where Lean and Six Sigma began. However, many other industries, in particular financial services and banking, also rely on data analysis and Lean Six Sigma tools to improve processes.

Rod Toro is a business process improvement manager at Edward Jones, and I recently got the chance to talk with him about a Lean Six Sigma project the service division at his company completed to improve customer satisfaction.

Edward Jones has been increasing the number of financial advisors who work for the investment firm, and they have a goal for 20,000 financial advisors to be on board by 2020. With all of this growth, the number of service requests and service calls to support local branches has also increased.

 “We were faced with understanding how we can better meet increasing service demands and give better overall customer service,” Toro says. “Using Lean Six Sigma and statistical tools, we performed a project to answer questions, such as, is it better to cross-train employees on multiple skill areas? And, how can we optimize the average speed of answer?”

The service division at Edward Jones is highly focused on providing world-class customer service that’s not only timely, accurate, and professional, but is also customized and conveys “a spirit of caring” to the client. Now, with an increase in service calls, as well as the movement to provide better, world-class service, Edward Jones faced a challenge: how would their current staff of service associates be able to meet the increasing service demands while giving clients even better service than before?

Toro, who was the project’s Master Black Belt, and the rest of the project team had a key breakthrough when thinking through how to approach the project.

“We started thinking first about how we could help the associates do their job better,” says Toro. “Instead, we shifted our focus to improving the overall process—focusing on all aspects that make up a service call—from training associates in the beginning to each phase of the call.”

This shift in focus allowed the team to start thinking about improvement in terms of the process as a whole. A process that was repeatable, standardized, and predictable. A process that could be optimized.

“We knew we needed to identify the metrics to distinguish the right associate for the right skill, as well as streamlining the workflow in a more efficient and meaningful way,” Toro says. “By assigning the right person to the right skills, we’re reducing AHT (average handle time) and thereby improving ASA (average speed to answer) and the overall customer experience.

We had the opportunity to optimize associate capacity to balance the department performance across all phone skills.”

The team knew where they were starting from: all associates were trained on multiple skills, every skill was given the same staffing priority, and resource moves and additions were made frequently without taking the impacts into account. That baseline left plenty of room for improvement.

So how would they do it? Enter the Design of Experiments (DOE) tools in Minitab.

Performing the DOE

Toro and the team selected four key factors (associate rating, after call work, shift, and training), and reviewed current historical data to determine what data already existed. For the data that was missing, they completed selected experiments.

With all the data in hand, the team ran a 2-level factorial design in Minitab that would allow them to assess the best mix of each of the four key factors.
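As an illustration of the design itself, here is a minimal sketch of a 2-level full factorial layout for those four factors. The factor names come from the project; the low/high level labels are made up:

  from itertools import product

  # 2^4 = 16 runs covering every combination of the four factors at two levels each
  factors = {
      "associate rating": ("lower", "higher"),
      "after call work":  ("less", "more"),
      "shift":            ("day", "evening"),
      "training":         ("focused skills", "cross-trained"),
  }

  for run in product(*factors.values()):
      print(dict(zip(factors.keys(), run)))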

The main effects and interaction plots below indicated that cross-training associates so they were well-versed on all skills was a detriment.

In addition, main effects and interaction plots for standard deviation revealed that consistency was based mainly on associates who have a higher skill level.

“This was our ‘Aha’ moment,” says Toro. “Cross-training all the associates on all skills wasn’t effective. Instead, we found that we needed to allow them to focus on their best skills, and then they would perform better.”

The Pareto chart below shows the interactions among factors that had a significant impact on overall performance:

“We needed the right people, with the right skills, at the right time,” Toro says.

3D Scatterplots for the win!

“3D scatterplots show the power of Minitab,” says Toro. “We were able to evaluate the relationships between the number of associates, amount of calls, and process lead time—which helped us to determine the key factors and the key levels.”

 

Now, with the right associates who have the right skills being assigned in the right areas, the service division has increased the capacity of the team that they have, while also improving all of the call-center metrics they set out to fine-tune.

“Using DOE to improve services is uncommon, but it really shouldn’t be,” Toro says. “Once you understand the principles of DOE, rather than just focusing on how to use the tools properly, you realize that the tools typically used for process improvement in manufacturing can be customized for use in the service sector—and really, everywhere.”

Want to learn more?

Rod Toro will be presenting a full case study of this Lean Six Sigma project, as well as the lessons the project team at Edward Jones learned along the way at the upcoming ASQ World Conference on Quality and Improvement in Dallas. If you’re attending the conference, be sure to attend “Hello, How Can I Help You?” Improving Customer Satisfaction in Financial Services on Monday, May 5 at 1:30 p.m. in the Senators Lecture Hall at the Hilton Anatole.

And for even more, here is some further reading that may interest you:

Quality Improvement in Financial Services

Exceeding Quality in Financial Services with Minitab

Selecting the Right Quality Improvement Project


I wrote a post a few years back on the difficulties that can ensue when you’re just trying to get started on your Lean Six Sigma or quality improvement initiative. It can become especially difficult when you have many potential projects staring at you, but you aren’t quite sure which one will give you the most bang for your buck.

A project prioritization matrix can be a good place to start when you need to choose which projects to focus on, as it can help you logically select optimal improvement projects against their weighted value, based on your company’s predefined metrics. The matrix can help you determine which projects offer the most value for your effort.

Here’s an example of a project prioritization matrix and a value by project graph that was completed using a template in Quality Companion:

Quality Companion template for a Project Prioritization Matrix

Companion automatically produces graphs based upon the input you list in the project matrix template.

While the example above is only comparing two projects, you could use the matrix to compare many projects. And the graph above makes it even easier (and quicker) to tell visually which projects will provide you with the most value for the least effort.

As you begin your project prioritization matrix, it’s important to consider how your company will define the selection criteria, as well as the assignment of weighting factors to the criteria. If you don’t take a little time up front to work with your team to define these items, you may end up with matrix results that provide an inaccurate view of which projects are considered optimal.

Best practice: weighting factors for your criteria are typically assigned on a 1-to-10 scale, with 1 representing a negative effect and 10 representing a positive effect.

In the example above, selection criteria—such as financial benefits, customer satisfaction, leverage, and repeatability—were chosen. It’s probably best to have at least three selection criteria, but no more than five or six, as too many can become more difficult to manage.
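The arithmetic behind the matrix is straightforward: each project's rating on a criterion is multiplied by that criterion's weight, and the weighted values are summed into an overall score. A minimal sketch, with made-up weights and ratings for the criteria named above:

  # Weighted project scores: rating (1-10) on each criterion times the criterion's weight.
  weights = {"financial benefits": 9, "customer satisfaction": 8, "leverage": 5, "repeatability": 4}

  projects = {
      "Project A": {"financial benefits": 7, "customer satisfaction": 9, "leverage": 4, "repeatability": 6},
      "Project B": {"financial benefits": 5, "customer satisfaction": 6, "leverage": 8, "repeatability": 3},
  }

  for name, ratings in projects.items():
      score = sum(weights[c] * ratings[c] for c in weights)
      print(name, score)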

These examples may get your team brainstorming the criteria that will best aid in selecting the right projects for your particular initiative.

Speaking of brainstorming …

How can you come up with the selection criteria?

Brainstorming may work as an effective method for your team to come up with the right selection criteria for your matrix.

There are several different methods you may choose for brainstorming—including idea mapping, fishbone diagrams, and CT trees.

You’d likely use an idea map to structure and organize brainstorming results relating to a central question. Here’s an example:

http://support.minitab.com/en-us/quality-companion/3/Idea_Map.gif

Fishbone diagrams (also called cause-and-effect, C&E or Ishikawa diagrams) help you brainstorm potential causes of a problem—and see relationships among potential causes. Here’s an example of a fishbone that identifies the potential causes of delayed lab results:

http://support.minitab.com/en-us/quality-companion/3/Fishbone_example.gif

(There are many different types of fishbone diagrams—Simple, 4S, 8P, DOE, and Man Machines Materials. Take a look at this post from my colleague Eston that goes into further detail about the different types.)

A CT Tree, or critical-to-quality or critical-to-cost tree, is used to identify and to organize the inputs that are critical to your customer. This CT tree shows what inputs are critical to reducing the number of defects in a manufactured product:

http://support.minitab.com/en-us/quality-companion/3/CT_Tree.gif

Now you can use a white board, or even just pen and paper to brainstorm, but Quality Companion offers brainstorming templates for idea mapping, fishbone diagrams, and CT trees. What’s neat about using Companion is that the factors you enter during your brainstorming session can be drag-and-dropped and/or autofilled into other tools, reducing the need to re-enter data.

To reach consensus on which project selection criteria make the cut, you may want to poll your team members or use a ballot in Companion to assist your team members in voting:

What’s your secret to selecting optimal improvement projects?

Dividing a Data Set into Training and Validation Samples


Adam Ozimek had an interesting post April 15th on the Modeled Behavior blog at Forbes.com. He observed that one of the advantages of big data is how easy it is to get test data to validate a model that you built from sample data.

Ozimek notes that he is “for the most part a p-value checking, residual examining, data modeling culture economist,” but he’s correct to observe that if you can test your model on real data, then you should.

What I’ll describe is certainly not the only way to divide data in Minitab Statistical Software. Still, I think it’s pretty good if I do say so myself. Want to follow along? I’ll use steps that go with the Educational Placement data set for Minitab 17. To follow along in Minitab 16, use the Education.MTW data set, but change the column titles as appropriate. (Here's how to find the sample data set folder in Minitab 16 if its existence is new to you.)

Different professionals will need to divide their data into different numbers of groups and in different proportions. The goal is usually to use part of the data to develop a model and part of the data to test the prediction quality of the model. The basic operations in Minitab will be similar whether you're dividing the data into two samples for training and validation or into three samples for fitting, validation, and testing. The steps will also be pretty similar whether you want equally sized groups or want to use only 10% of your data for validation.

I’m going to divide the data set into two groups where the size of the training set is twice the size of the validation set.

First, I’m going to set up a column to randomly assign the 180 observations in the data set to the two different samples. The size of this dataset does not require so many steps, but the steps can save you some effort if you have such a large data set that keeping track of the numbers requires thought.

  1. Choose Calc > Make Patterned Data > Simple Set of Numbers.
  2. In Store patterned data in, enter c4.
  3. In From first value, enter 1.
  4. In To last value, enter 1.
  5. In Number of times to list each value, enter 120. Click OK.
  6. Press CTRL + E to reopen Simple Set of Numbers.
  7. Change Store patterned data in to c5.
  8. Change From first value to 2.
  9. Change To last value to 2.
  10. Change Number of times to list each value to 60. Click OK.
  11. Choose Data > Stack > Columns.
  12. In Stack the following columns, enter c4 c5.
  13. In Store stacked data in, select Column of current worksheet.
  14. In Column of current worksheet, enter c6. Click OK.

Now we’ll randomize the groups to reduce bias. After all, in this dataset the data are in order by track, and you probably wouldn't want to fit the model with data from tracks 1 and 2 and then validate it only with data from track 3.

  1. Choose Calc > Random Data > Sample From Columns.
  2. In Number of rows to sample, enter 180.
  3. In From columns, enter c6.
  4. In Store Samples In, enter c6. Click OK.

Now, you have a column that identifies which observations belong in each data set.

  1. Choose Data > Unstack Columns.
  2. In Unstack the data in, enter ‘Test Score’ Motivation.
  3. In Using subscripts in, enter c6.
  4. In Store unstacked data, select After last column in use. Click OK.

Now, you have two data sets, one with 120 observations and one with 60 observations. You can fit the model on the larger data set, then use the second data set to validate the model.

The columns with _1 are the fitting sample. The columns with _2 are the validation sample.
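If you prefer to script this kind of split, here is a rough pandas equivalent of the steps above. This is just a sketch: the file name is hypothetical, and it assumes the 180-row worksheet has been exported to CSV.

  import pandas as pd

  df = pd.read_csv("educational_placement.csv")    # hypothetical export of the 180-row worksheet

  train = df.sample(n=120, random_state=1)         # random two-thirds for fitting the model
  validate = df.drop(train.index)                  # the remaining 60 rows for validation

  print(len(train), len(validate))                 # 120 60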

Statistics like predicted R² have done a lot to help us get good models when we don’t have enough data to get good estimates and see how well the model does on new data. But in cases where you have access to more data than you need to get good parameter estimates, it’s good practice to use some of the data for validation. With a few moves in Minitab, you’re all set to go. Next time I post, I'll show some ways that you can use a validation data set to check the quality of a model in Minitab.

Not All P Values are Created Equal


The interpretation of P values would seem to be fairly standard between different studies. Even if two hypothesis tests study different subject matter, we tend to assume that you can interpret a P value of 0.03 the same way for both tests. A P value is a P value, right?

Not so fast! While Minitab statistical software can correctly calculate all P values, it can’t factor in the larger context of the study. You and your common sense need to do that!

In this post, I’ll demonstrate that P values tell us very different things depending on the larger context.

Recap: P Values Are Not the Probability of Making a Mistake

In my previous post, I showed the correct way to interpret P values. Keep in mind the big caution: P values are not the error rate, or the likelihood of making a mistake by rejecting a true null hypothesis (Type I error).

You can equate this error rate to the false positive rate for a hypothesis test. A false positive happens when the sample is unusual due to chance alone and it produces a low P value. However, despite the low P value, the alternative hypothesis is not true. There is no effect at the population level.

Sellke et al. estimated that a P value of 0.05 corresponds to a false positive rate of “at least 23% (and typically close to 50%).”

What Affects the Error Rate?

Why is there a range of values for the error rate? To understand that, you need to understand the factors involved. David Colquhoun, a professor in biostatistics, lays them out here.

Whereas Sellke et al. use a Bayesian approach, Colquhoun uses a non-Bayesian approach but derives similar estimates. For example, Colquhoun estimates P values between 0.045 and 0.05 have a false positive rate of at least 26%.

The factors that affect the false positive rate are:

  • Prevalence of real effects (higher is good)
  • Power (higher is good)
  • P value (lower is good)

“Good” means that the test is less likely to produce a false positive. The 26% error rate assumes a prevalence of real effects of 0.5 and a power of 0.8. If you decrease the prevalence to 0.1, suddenly the false positive rate shoots up to 76%. Yikes!

Power is related to false positives because when a study has a lower probability of detecting a true effect, a higher proportion of the positives will be false positives.

Now, let’s dig into a very interesting factor: the prevalence of real effects. As we saw, this factor can hugely impact the error rate!

P Values and the Prevalence of Real Effects

What Colquhoun calls the prevalence of real effects (denoted as P(real)), the Bayesian approach calls the prior probability. It is the proportion of hypothesis tests in which the alternative hypothesis is true at the outset. It can be thought of as the long-term probability, or track record, of similar types of studies. It’s the plausibility of the alternative hypothesis.

If the alternative hypothesis is farfetched, or has a poor track record, P(real) is low. For example, a prevalence of 0.1 indicates that 10% of similar alternative hypotheses have turned out to be true while 90% of the time the null was true. Perhaps the alternative hypothesis is unusual, untested, or otherwise implausible.

If the alternative hypothesis fits current theory, has an identified mechanism for the effect, and previous studies have already shown significant results, P(real) is higher. For example, a prevalence of 0.90 indicates that the alternative is true 90% of the time, and the null only 10% of the time.

If the prevalence is 0.5, there is a 50/50 chance that either the null or alternative hypothesis is true at the outset of the study.

You may not always know this probability, but theory and a previous track record can be guides. For our purposes, we’ll use this principle to see how it impacts our interpretation of P values. Specifically, we’ll focus on the probability of the null being true (1 – P(real)) at the beginning of the study.

Hypothesis Tests Are Journeys from the Prior Probability to Posterior Probability

Hypothesis tests begin with differing probabilities that the null hypothesis is true depending on the specific hypotheses being tested. This prior probability influences the probability that the null is true at the conclusion of the test, the posterior probability.

If P(real) = 0.9, there is only a 10% chance that the null hypothesis is true at the outset. Consequently, the probability of rejecting a true null at the conclusion of the test must be less than 10%. However, if you start with a 90% chance of the null being true, the odds of rejecting a true null increase because there are more true nulls.

Initial probability of true null (1 – P(real))    P value obtained    Final minimum probability of true null
0.5                                               0.05                0.289
0.5                                               0.01                0.110
0.5                                               0.001               0.018
0.33                                              0.05                0.12
0.9                                               0.05                0.76

The table is based on calculations by Colquhoun and Sellke et al. It shows that the decrease from the initial probability to the final probability of a true null depends on the P value. Power is also a factor but not shown in the table.
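Several of these entries can be approximated with the lower bound derived by Sellke, Bayarri, and Berger, in which the minimum Bayes factor in favor of the null is -e*p*ln(p) (for p < 1/e). Here is a sketch of that calculation; the 0.5-prior rows match it closely, while the other rows draw on Colquhoun's simulations and differ somewhat.

  import math

  def min_prob_true_null(prior_null, p_value):
      # Sellke-Bayarri-Berger lower bound on the posterior probability of a true null
      bayes_factor = -math.e * p_value * math.log(p_value)    # minimum Bayes factor for H0
      posterior_odds = (prior_null / (1 - prior_null)) * bayes_factor
      return posterior_odds / (1 + posterior_odds)

  for prior, p in [(0.5, 0.05), (0.5, 0.01), (0.5, 0.001), (0.9, 0.05)]:
      print(prior, p, round(min_prob_true_null(prior, p), 3))  # 0.289, 0.111, 0.018, 0.786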

Where Do We Go with P values from Here?

There are many combinations of conditions that affect the probability of rejecting a true null. However, don't try to remember every combination and the error rate, especially because you may only have a vague sense of the true P(real) value!

Just remember two big takeaways:

  1. A single statistically significant hypothesis test often provides insufficient evidence to confidently discard the null hypothesis. This is particularly true when the P value is closer to 0.05.
  2. P values from different hypothesis tests can have the same value, but correspond to very different false positive rates. You need to understand their context to be able to interpret them correctly.

The second point is epitomized by a quote that was popularized by Carl Sagan: “Extraordinary claims require extraordinary evidence.”

A surprising new study may have a significant P value, but you shouldn't trust the alternative hypothesis until the results are replicated by additional studies. As shown in the table, a significant but unusual alternative hypothesis can have an error rate of 76%!

Don’t fret! There are simple recommendations based on the principles above that can help you navigate P values and use them correctly. I’ll cover these guidelines in my next post.


Chaos at the Kentucky Derby? Bet on It!


If betting wasn't allowed on horse racing, the Kentucky Derby would likely be a little-known event of interest only to a small group of horse racing enthusiasts. But like the Tour de France, the World Cup, and the Masters Tournament, even those with little or no knowledge of the sport in general seem drawn to the excitement over its premier event—the mint juleps, the hats...and of course, the betting.

As most of you probably already know, a big part of betting is the odds placed on a particular horse, so that a bet on the favorite to win the race would pay out significantly less than a bet on a big underdog. It stands to reason, then, that the horses with the best chances of winning would tend to win more often and those with the worst chances would win less frequently.

Odds are typically listed as something like 10/1, which indicates a $1 bet would win $10. Now, this is the opposite of what we typically think of as odds, since a 10/1 horse is actually estimated as having a 1/10 chance (or "odds") of winning. So I'm going to call 10 the "inverse odds" of a horse. Therefore a low inverse odds horse would be considered a favorite to win (for example, a 2/1 horse has a 50% chance of winning based on betting odds and an inverse odds of 2), while a high inverse odds horse would be considered unlikely to win (for example a 50/1 horse has an estimated 2% chance of winning).

So a simple graph showing the inverse odds of every horse from 2007-2013 against its finishing position should easily confirm that horses with low inverse odds perform better, and horses given high inverse odds perform worse, right?  Let's see what a scatterplot made with Minitab Statistical Software reveals.

Position vs Odds

You might convince yourself there is a little bit of a cluster there at the bottom left, but simple regression demonstrates that the relationship, while statistically significant, is extremely weak (R-sq(pred) = 4.69%). I expected to see a much stronger relationship. Instead, these data look pretty chaotic.

Sometimes when data appear chaotic, all it takes is some deeper digging into variables of interest to clarify things. So I used Minitab's Ordinal Logistic Regression tool and analyzed the same dataset to predict the odds of winning based on these factors:

  • Inverse Odds
  • Post (the starting position of the horse in the gate, with 1 being on the inside of the turns)
  • Track (either "Fast" or "Sloppy" conditions)

I was able to get a better model that contained the following factors after eliminating those that were not significant:

  • Post
  • Track
  • Inverse Odds
  • Track*Post
  • Inverse Odds²

For any given track conditions, pole position, and inverse odds, I now have the estimated chances that a particular horse will win. First let's take a look at the "Winning Odds" versus the "Inverse Odds":

Winning vs Inverse Odds

We see that taking other factors into account, horses with lower inverse odds do in fact have higher odds of winning the race. The quadratic fit appears decent as well, although, as is more obvious on the blue line, horses with very low inverse odds tend to do a little better than the fit expects and those with really high inverse odds tend to do a little worse. It is also obvious that sloppy track conditions (shown in red) tended to yield much more chaotic results. To explain that, we need to look at the odds of winning versus post position, which is not accounted for in the graph above:

Winning Odds vs Post

What we learn here is that when track conditions are fast, a horse's odds of winning are pretty much the same regardless of which post position they start in.  But when conditions get sloppy (typically due to race-day rain), there is a very large advantage in having a low post position, toward the inside of the track.

However, there's one thing I mentioned before that has big implications for where one might place a bet...the inverse odds correspond to the payout for picking correctly. So to really learn something, we need to multiply the estimated odds of winning from our model by the payout if we were to win (per dollar bet). A payout of less than $1 indicates that over the long run we would expect to lose money placing that bet. Similarly, values over $1 indicate we would expect to win money in the long run on that bet.
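In code, that expected payout is just the estimated win probability times the payout per dollar. A tiny sketch with illustrative numbers (not values from the fitted model):

  def expected_payout(win_probability, inverse_odds):
      # Expected return per $1 bet: estimated win probability times the payout per dollar
      return win_probability * inverse_odds

  print(expected_payout(0.03, 50))    # 1.5  -> expected long-run winner
  print(expected_payout(0.25, 3))     # 0.75 -> expected long-run loser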

To demonstrate the expected payout, I'll use a 3D Scatterplot in order to display all relevant variables in a single graph:

Expected payout

To really explore a 3D graph you need to interact with it and rotate in multiple directions. This is easy to do in Minitab, but difficult to convey on the blog, so I'll share with you the takeaways:

  1. In fast track conditions, almost all bets are long-term losers (the "house" takes a cut of every winning bet), but horses with long odds (50/1 or higher) would be expected to gain money.
  2. In sloppy conditions, horses in the first few post positions are long-term winners almost regardless of inverse odds, with horses across the range of inverse odds expected to earn roughly the same.

Armed with this information, let me save you some time and provide you with some links you might find interesting:

Given what currently looks like sunny weather, I think I'll skip watching the post draw and take my chances with a long-shot like Harrys Holiday, Pablo Del Monte, or Vinceremos!

 

Kentucky Derby image courtesy: kentuckytourism.com

Proving My Toddler Really Doesn’t Know her Left Foot from her Right


"Do it myself!

If only I had a nickel for every time I heard that phrase from my toddler in a given day. From throwing away trash, to putting frozen waffles in the toaster, to feeding the dog, I hear it so often that I could possibly retire with all the nickels I’d collect.

And of course, I hear this proclamation every single time my 2-year-old puts on her shoes.

What happens when a toddler tries to put on their own shoes? Well, at least in the case of my little one, the left shoe goes on the right foot, and the right shoe on the left foot, followed by a triumphant “Do it myself! Yay!!!” And the occasional round of clapping.

Watching this everyday exercise in toddler independence got me thinking—is she randomly picking her left foot versus her right, which any kid has a 50/50 chance of doing? Or, does she really think that the right shoe goes on her left foot and vice-versa?

Since I have plenty of time on my hands while I watch her struggle with her shoes, I decided to collect some data.

The 1 Proportion Test

For the next 25 times she tried on her shoes, I tallied how many times she did it correctly and incorrectly. Since my daughter has an affinity for shoes that would give even Imelda Marcos a run for her money, it fortunately didn’t take me long to collect 25 points of data.

Out of the 25 times she put on her shoes, she did it incorrectly 19 times. So, is this 76% failure rate just due to random chance? Or is 19 out of 25 significantly worse than a random 50/50 chance?

I turned to Minitab Statistical Software and the 1 Proportion Test to find out.

Using Stat > Basic Statistics > 1 Proportion, I arrived at the following results:

 

 

 

 

 

The 1 Proportion Test answers the question, “If the true population proportion is equal to 50%, how likely is it to see a sample proportion of 76%?” The answer is given as the p-value, which for this dataset is 0.007. That’s not very likely.
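You can reproduce this result with an exact binomial test. Here is a sketch in Python with SciPy, assuming a one-sided alternative (a failure rate greater than 50%):

  from scipy.stats import binomtest

  # 19 wrong-footed attempts out of 25, tested against a 50/50 chance
  result = binomtest(k=19, n=25, p=0.5, alternative='greater')
  print(result.pvalue)    # ≈ 0.007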

Using an alpha level of 0.05, this p-value is statistically significant (since it’s less than 0.05). I therefore can conclude that my toddler’s failure rate is significantly worse than 50%. And perhaps she really does think that her right shoe goes on her left foot.

Practical Significance

While she may not qualify for Mensa just yet, at least I get a good laugh from the gleeful “Yay!!!” I hear every time she gets her shoes on, whether or not they end up where they’re supposed to be.

When Will I Ever See This Statistics Software Again?


Minitab Statistical Software was born out of a desire to make statistics easier to learn: by making the calculations faster and easier with computers, the trio of educators who created the first version of Minitab sought to free students from intensive computations to focus on learning key statistical concepts. That approach resonated with statistics instructors, and today Minitab is the standard for teaching and learning statistics at more than 4,000 universities all over the world.

But many students seem to believe Minitab is used only in education. Search Twitter for "Minitab," and you're likely to find a few students grousing that nobody uses Minitab Statistical Software in the "real world."

Those students are in for a big shock after they graduate. Organizations like Boeing, Dell, General Electric, Microsoft, Walt Disney, and thousands more worldwide rely on Minitab software to help them improve the quality of their products and services.

Savvy instructors already know learning with Minitab can give students an advantage in the job market.

Stories of How Data Analysis Made a Real-World Difference

In my job, I get to talk with professionals about how they use our software in their work. I've interviewed scientists, engineers, miners, shop stewards, foresters, Six Sigma experts, service managers, bankers, utility executives, soldiers, civil servants, and dozens of others.

The statistical methods they use vary widely, but a common thread running through all of their experiences reveals a critical link between Minitab's popularity in the academic world and its widespread application in so many different businesses and industries. Virtually every person I talk to about our software mentions something about "ease of use."  

That makes a lot of sense: Minitab wasn't the first statistical software package, but it was the first statistical software package designed with the express goal of being easy to use. That led to its quick adoption by instructors and students, and those students brought Minitab with them into the workplace. And for more than 40 years, professionals have been using Minitab to solve challenges in the real world.

In case you're looking for examples, here are several of our favorite stories about how people have used Minitab:   

Case Study               Industry              Methods and Tools
U.S. Army                Military              Pareto, Before/After Capability
Rode Kruis and CWZ       Hospital              Boxplot, Pareto Chart
Belgian Red Cross        Healthcare            Histogram, Probability Plot
BetFair                  Sports Betting        Interaction Plot, Capability Analysis, I-MR Chart
Ford Motor Company       Automotive            Design of Experiments (DOE)
U.S. Bowling Congress    Sports and Leisure    Scatterplot
Six Sigma Ranch          Wine                  Attribute Agreement Analysis, I-MR Chart
Newcrest Mining          Mining                Individual Value Plot
NASCAR                   Car Racing            Design of Experiments (DOE)

Have you used Minitab software on the job?  We'd love to hear your story!

Exponential: How a Poor Memory Helps to Model Failure Data


These days, my memory isn't what it used to be. Besides that, my memory isn't what it used to be. 

But my incurable case of CRS (Can't Remember Stuff) is not nearly as bad as that of the exponential distribution.

When modelling failure data for reliability analysis, the exponential distribution is completely memoryless. It retains no record of the previous failure of an item.

That might sound like a bad thing. But this special characteristic makes the distribution extremely useful for modelling the behavior of items that have a constant failure rate.

Using the Exponential Distribution to Model Failure Data

Suppose you track the time until failure of a randomly collected sample of items. When you graph the results on a histogram, you get something like this:

Notice that the number of items that fail at a given point in time decreases steadily as time elapses. This steady decrease can be fit nicely to an exponential curve.

Because the decrease is steady, if you were to plot the instantaneous risk of failure for any item at any specific point in time (t)—what's known as its hazard function—you'd end up with a constant risk of failure at any point in time:

The failure of an item at any point previously does not affect the risk of failure at any other point in time. That means the exponential distribution has the good fortune of not being able to  "remember" any of its past failures. (Every day is a brand new day, if you're exponentially distributed! No baggage!)
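A quick numerical check of that memoryless property (a sketch with an arbitrary failure rate): for an exponential distribution, the probability of surviving another t units of time is the same whether or not the item has already survived s units.

  from scipy.stats import expon

  rate = 0.5                                   # arbitrary constant failure rate (per unit time)
  dist = expon(scale=1 / rate)
  s, t = 3.0, 2.0

  conditional = dist.sf(s + t) / dist.sf(s)    # P(T > s + t | T > s)
  unconditional = dist.sf(t)                   # P(T > t)
  print(conditional, unconditional)            # both ≈ 0.3679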

For this reason, the exponential distribution often provides a good model in reliability analysis for a product or part that is just as likely to fail at any time, regardless of whether it is brand new, a year old, or several years old. Such an item is not expected to age or wear out over its intended application (such as a component in a product that doesn't typically wear out until long after the product itself does).

However, if a component is expected to show fatigue, corrosion, or wear before the expected life of the product is complete, then the exponential distribution is not a good model, because the risk of failure increases over time.

Choosing a Distribution: Practical Know-How + P-Plots

The exponential distribution is just one of several distributions that are commonly used to model failure data in reliability analysis. Selecting a distribution that models your data well is a critical requirement for the analysis.

You can use Minitab's Distribution ID plots to evaluate the fit of various distributions (Stat > Reliability/Survival > Distribution Analysis...) If the points fall in a straight line along the fitted distribution line, the distribution may provide a good fit.

The plots below show that the exponential distribution is clearly a better fit for the failure data than the normal distribution.

When using probability plots, it's possible that several distributions may provide a good fit for your data. So it's also helpful to be familiar with the defining characteristics and common applications of the distributions when making your decision.

Already Forgot What You Just Read?

Luckily, you don't have to memorize all the characteristics of each distribution. Instead, save the precious space in your brain to remember Mother's Day, and bookmark the url to the Minitab 17 Online Topic Library.

On the Topic Library main page, under Modeling Statistics, click Reliability. Under Distributions in reliability analysis, you'll find topics that summarize the characteristics and applications of commonly used distributions.

There's a lot of other great info in the Minitab 17 Topic Library, too...I just can't remember what it is right now.

Five Guidelines for Using P values


There is high pressure to find low P values. Obtaining a low P value for a hypothesis test is make or break because it can lead to funding, articles, and prestige. Statistical significance is everything!

My two previous posts looked at several issues related to P values:

In this post, I’ll look at whether P values are still helpful and provide guidelines on how to use them with these issues in mind.

Sir Ronald A. Fisher

Are P Values Still Valuable?

Given the issues about P values, are they still helpful? A higher than expected rate of false positives can be a problem because if you implement the “findings” from a false positive study, you won’t get the expected benefits.

In my view, P values are a great tool. Ronald Fisher introduced P values in the 1920s because he wanted an objective method for comparing data to the null hypothesis, rather than the informal eyeball approach: "My data look different than the null hypothesis."

P value calculations incorporate the effect size, sample size, and variability of the data into a single number that objectively tells you how consistent your data are with the null hypothesis. Pretty nifty!

Unfortunately, the high pressure to find low P values, combined with a common misunderstanding of how to correctly interpret P values, has distorted the interpretation of significant results. However, these issues can be resolved.

So, let’s get to the guidelines! Their overall theme is that you should evaluate P values as part of a larger context where other factors matter.

Guideline 1: The Exact P Value Matters

Tiny Ps are great!

With the high pressure to find low P values, there’s a tendency to view studies as either significant or not. Did a study produce a P value less than 0.05? If so, it’s golden! However, there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. Instead, it’s all about lowering the error rate to an acceptable level.

The lower the P value, the lower the error rate. For example, a P value near 0.05 has an error rate of 25-50%. However, a P value of 0.0027 corresponds to an error rate of at least 4.5%, which is close to the rate that many mistakenly attribute to a P value of 0.05.

A lower P value thus suggests stronger evidence for rejecting the null hypothesis. A P value near 0.05 simply indicates that the result is worth another look, but it’s nothing you can hang your hat on by itself. It’s not until you get down near 0.001 that you have a fairly low chance of a false positive.

Guideline 2: Replication Matters

Today, P values are everything. However, Fisher intended P values to be just one part of a process that incorporates experimentation, statistical analysis and replication to lead to scientific conclusions.

According to Fisher, “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”

The false positive rates associated with P values that we saw in my last post definitely support this view. A single study, especially if the P value is near 0.05, is unlikely to reduce the false positive rate down to an acceptable level. Repeated experimentation may be required to finish at a point where the error rate is low enough to meet your objectives.

For example, if you have two independent studies that each produced a P value of 0.05, you can multiply the P values to obtain a probability of 0.0025 for both studies. However, you must include both the significant and insignificant studies in a series of similar studies, and not cherry pick only the significant studies.
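If you want a formal way to combine P values from independent, similar studies, Fisher's method is one common option. A sketch (note this is a different calculation from the simple multiplication above):

  from scipy.stats import combine_pvalues

  stat, combined_p = combine_pvalues([0.05, 0.05], method='fisher')
  print(combined_p)    # ≈ 0.017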

Replicate study results

Conclusively proving a hypothesis with a single study is unlikely. So, don’t expect it!

Guideline 3: The Effect Size Matters

With all the focus on P values, attention to the effect size can be lost. Just because an effect is statistically significant doesn't necessarily make it meaningful in the real world. Nor does a P value indicate the precision of the estimated effect size.

If you want to move from just detecting an effect to assessing its magnitude and precision, use confidence intervals. In this context, a confidence interval is a range of values that is likely to contain the effect size.

For example, an AIDS vaccine study in Thailand obtained a P value of 0.039. Great! This was the first time that an AIDS vaccine had positive results. However, the confidence interval for effectiveness ranged from 1% to 52%. That’s not so impressive...the vaccine may work virtually none of the time up to half the time. The effectiveness is both low and imprecisely estimated.

Avoid thinking about studies only in terms of whether they are significant or not. Ask yourself: is the effect size precisely estimated and large enough to be important?

Guideline 4: The Alternative Hypothesis Matters

We tend to think of equivalent P values from different studies as providing the same support for the alternative hypothesis. However, not all P values are created equal.

Research shows that the plausibility of the alternative hypothesis greatly affects the false positive rate. For example, a highly plausible alternative hypothesis and a P value of 0.05 are associated with an error rate of at least 12%, while an implausible alternative is associated with a rate of at least 76%!

For example, given the track record for AIDS vaccines where the alternative hypothesis has never been true in previous studies, it's highly unlikely to be true at the outset of the Thai study. This situation tends to produce high false positive rates—often around 75%!

When you hear about a surprising new study that finds an unprecedented result, don’t fall for that first significant P value. Wait until the study has been well replicated before buying into the results!

Guideline 5: Subject Area Knowledge Matters

Applying subject area expertise to all aspects of hypothesis testing is crucial. Researchers need to apply their scientific judgment about the plausibility of the hypotheses, results of similar studies, proposed mechanisms, proper experimental design, and so on. Expert knowledge transforms statistics from numbers into meaningful, trustworthy findings.
