Basketball Statistics Question: How Important Is a Team's "Momentum" Heading into the ...

March 1, 2013, 5:00 am


The 1994 NCAA Tournament Bracket

It’s March, which means it’s the time of year when the country's sports fans focus their gaze upon college basketball. And since there are still a few weeks until the brackets come out, people will be trying to determine which teams are poised for a deep run in the tournament. One of the criteria people use to determine a team's potential is “momentum.” Everybody says you want your team to be “peaking at the right time.” But is this really important? We just saw the Baltimore Ravens win the Super Bowl despite losing 4 of their final 5 regular-season games.

So how important is it for NCAA basketball teams to be on a winning streak going into the tournament? Let’s open our statistical software and do some data analysis to find out!

I took every single NCAA tournament team from the last 5 years and obtained:

their winning percentage in the last 10 games before the tournament,
their seed, and
their number of wins in the tournament.

I also calculated their expected number of wins based on the seed. So seeds 9-16 are expected to win 0 games (unless they were in a play-in game, in which case I made it 1), seeds 5-8 are expected to win 1 game, 3-4 are expected to win two, 2 seeds are expected to win 3, and 1 seeds are expected to win 4 (any victories after they reach the Final Four are bonus).
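The seed-to-expected-wins rule above is easy to write down as a small function. This is only a sketch of the rule as I described it; the names `expected_wins` and `wins_over_expected` are my own labels, not anything from Minitab.

```python
def expected_wins(seed: int, play_in: bool = False) -> int:
    """Expected tournament wins for a seed, following the rule in the text."""
    if not 1 <= seed <= 16:
        raise ValueError("seed must be between 1 and 16")
    if seed == 1:
        return 4  # expected to reach the Final Four
    if seed == 2:
        return 3
    if seed <= 4:
        return 2
    if seed <= 8:
        return 1
    # Seeds 9-16: 0 wins, or 1 if the team was in a play-in game.
    return 1 if play_in else 0


def wins_over_expected(seed: int, wins: int, play_in: bool = False) -> int:
    """Actual tournament wins minus the wins expected from the seed line."""
    return wins - expected_wins(seed, play_in)


# A 2 seed that loses in the Elite Eight has won 3 games, exactly as expected.
print(wins_over_expected(seed=2, wins=3))   # 0
# A 16 seed that loses its first game also lands on 0.
print(wins_over_expected(seed=16, wins=0))  # 0
```

Negative values flag high seeds that were upset early, and positive values flag lower seeds that outperformed their seed line.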

But before we dive too deep into the data, let’s just see how often teams are “hot” coming into the tournament.

Minitab's Tally Command

We see that in the last 5 years, only 14 teams have made the tournament with a losing record in their last 10 games. The team that lost 7 of their previous 10 was Villanova in 2011. Most teams in the tournament won 7 or 8 of their previous 10 games. And 21 teams have been as “hot” as you can be going into the Big Dance, winning each of their last 10 games.

Next, let’s use a Bar Chart to break down the winning percentages by seed to see if higher seeds are playing better going into the tournament than lower seeds.

Bar Chart

Going into the tournament, 1 seeds are playing the best basketball, winning on average 8.6 of their last 10 games. This number drops as the seeds get lower, bottoming out at the 10 seed. However, things start to climb again as you get into the teens. This is because those teams represent smaller conferences, where often the team is given a poor seed despite a great record in their conference. For example, last year Detroit finished the year on a 9-1 streak which included a 20-point win in the Horizon League Tournament Championship Game. For their efforts, they were given a 15 seed and were promptly beaten by Kansas in the first round by 15 points.

But do any of these winning streaks actually lead to success in the tournament? I’ll use Minitab to create a scatterplot between winning percentage in a team’s last 10 games and wins in the tournament.

Scatterplot

The points appear to be randomly scattered around the plot. There doesn’t appear to be much of a relationship between tournament wins and winning percentage at all.

But wait...we just saw in the bar chart above that seeds 13-16 have a high winning percentage going into the tournament, but aren’t very likely to win any games. So I’ll remove them and plot the data again.

Scatterplot

Again, the points don’t appear to fall in any pattern. I’ve pointed out some instances where teams have bucked the “you have to be peaking at the right time” trend. Last year Florida lost 6 of their last 10 games going into the tournament, and then nearly went to the Final Four. The VCU team that went to the Final Four in 2011 actually lost 5 of their last 10 games going into the tournament. And the Connecticut team that won the national championship the same year lost 4 of their last 10. Sure, you can say they got “hot” during the Big East Tournament, where they won 5 straight games. But going into the Big East Tournament, they had actually lost 7 of their last 11 games, including 4 of their last 5. They looked about as bad as you can be going into their conference tournament, and yet won 9 straight games to win the national championship.

On the flip side, in 2010 Temple won 10 straight games going into the tournament and got a 5 seed. What did they do with all that momentum? They lost in the first round to Cornell by 13 points.

Let’s look at one more plot. This time, instead of just using tournament wins we’ll use “Wins over expected.” That is, how many wins the team got compared to what's expected from their seed line. High seeds that get upset early will have negative values, whereas low seeds that pull upsets will have positive values. And teams that perform exactly how they’re expected (16 seed losing in 1st round or 2 seed losing in the Elite Eight) will get a 0. I’ll also include all the teams this time.

Scatterplot

And once again we don’t see any kind of pattern in the scatterplot. VCU and Connecticut are back again as examples that show you don’t have to be “hot” to make a big run. The 2010 Michigan State team is another example, as they went .500 down the stretch and then went the entire way to the Final Four (and 2 points shy of the championship game).

But it doesn’t hurt to be on a big winning streak, either. The two Butler teams that made the championship game definitely won a lot going into the tournament. And the Davidson team led by Stephen Curry won 22 straight games going into their Cinderella run in 2008!

And there are times when winning streaks don’t mean much. The Kansas team in 2010 not only won 9 of their last 10, but actually won 32 of their 34 games that season! But that momentum didn’t help them as they bowed out in the second round to Northern Iowa.

And to drive the point home, we can use Minitab to calculate a statistic that will show the lack of association between winning percentage and tournament wins over expected. Since both variables are ordinal, we can calculate correlation coefficients by going to Stat > Tables > Cross Tabulation and Chi-Square, then clicking Other Stats and checking Correlation Coefficients for Ordinal Categories. This will give us a value between -1 and 1. Values close to 0 indicate no association between the two variables. The closer the value gets to -1 or 1, the stronger the relationship. Here are the correlation coefficients for the data in the scatter plot above.

Correlation

Both values are just about 0, further emphasizing the point that there is no relationship between winning percentage in the last 10 games and NCAA tournament success.
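If you want to check this kind of result outside Minitab, here is a minimal sketch of two common correlation coefficients. I'm assuming the two values in the output are Pearson's r and Spearman's rho (the output itself isn't reproduced here), and the toy data below is made up purely for illustration; it is not the tournament data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy data: wins in last 10 games vs. wins over expected.
last10_wins = [5, 6, 7, 7, 8, 8, 9, 10]
wins_over_exp = [1, -1, 0, 2, -1, 0, 1, -2]
print(pearson_r(last10_wins, wins_over_exp))
print(spearman_rho(last10_wins, wins_over_exp))
```

Both functions return a value between -1 and 1, so they can be read the same way as the coefficients described above.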

So over the next few weeks, when you hear analysts say, “This team is peaking at the perfect time for the tournament!” or “This team isn’t making it past the first weekend with the way they’ve been playing recently,” know that it doesn’t matter. A team is defined by what they’ve done the entire season, not just what they’ve done recently. All that “momentum” doesn’t mean a thing come tournament time.

After all, they don’t call it March Madness for nothing!

↧

What Makes Great Presidents and Good Models?

March 4, 2013, 3:09 am


Lincoln

If the title of this post made you think you’d be reading about Abraham Lincoln and Tyra Banks, you’re only half right.

A few weeks ago, statistician and journalist Nate Silver published an interesting post on how U.S. presidents are ranked by historians. Silver showed that the percentage of electoral votes that a U.S. president receives in his 2nd term election serves as a rough predictor of his average ranking of greatness.

Here’s the model he came up with, which I’ve duplicated in Minitab using the scatterplot with regression and groups (Graph > Scatterplot):

scatterplot

Silver divided the data into two groups to emphasize the marked difference in historical rankings between presidents who receive less than 50% or greater than 50% of the electoral vote in their second-term election. But as you can see from the slope and position of both lines, the linear model for both groups is almost identical.

How will President Obama be ranked? The model predicts that he’ll be historically ranked about 18th among the 43 persons who’ve served as U.S. presidents thus far.

Silver cautioned that this model provides only “rough” estimates. But he didn’t provide details. That made me curious (or skeptical)—how rough is it?

To find out, I analyzed the model (without groups) using Minitab’s General Regression:

-------------------------------------------------------------------------------
Silver's Model: General Regression

Summary of Model
S = 8.48729 R-Sq = 38.59% R-Sq(adj) = 36.32% R-Sq(pred) = 30.25%

Analysis of Variance
Source                        DF   Seq SS   Adj SS   Adj MS    F       P
Regression                    1    1222.32 1222.32 1222.32 16.97   0.000
2nd Term % Electoral College 1    1222.32  1222.32 1222.32 16.97   0.000
Error                         27   1944.92 1944.92 72.03
Lack-of-Fit                   25   1882.92 1882.92 75.32     2.43   0.333
Pure Error                    2      62.00    62.00 31.00
Total                         28   3167.24

Regression Equation
Historians rank = 31.0708 - 0.221189 2nd Term % Electoral College
-------------------------------------------------------

The p-value (0.000) indicates that the % of the electoral vote a president receives for his second term is indeed a statistically significant predictor of his historical ranking at an alpha level of 0.05.
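To see what the fitted line implies, you can plug values into the regression equation from the output above. This is just a sketch; the function name `predicted_rank` and the example percentages are mine, chosen for illustration.

```python
# Coefficients copied from the "Regression Equation" in the output above.
INTERCEPT = 31.0708
SLOPE = -0.221189  # change in rank per percentage point of electoral vote


def predicted_rank(pct_electoral: float) -> float:
    """Predicted historians' rank for a given 2nd-term % of the electoral vote."""
    if not 0 <= pct_electoral <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return INTERCEPT + SLOPE * pct_electoral


# Illustrative inputs only, not data for any particular president.
for pct in (25, 50, 75, 100):
    print(pct, round(predicted_rank(pct), 2))
```

Because the slope is negative, a larger share of the electoral vote gives a smaller rank number, which means a better ranking.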

To get an idea of how “rough” the model is, look at the R-squared values. The R-sq value of 38.59% indicates that this simple model explains about 40% of the variability in a president’s average historical ranking. But for predicting the average rankings for future presidents the model is a bit rougher—it explains only about 30% of the variability in future observations (R-sq (pred) = 30.25%).
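The summary statistics can be recovered from the sums of squares in the Analysis of Variance table, which is a handy sanity check. A short sketch, using only numbers copied from the output above (the variable names are mine):

```python
from math import sqrt

# Values copied from the Analysis of Variance table above.
ss_regression, df_regression = 1222.32, 1
ss_error, df_error = 1944.92, 27
ss_total, df_total = 3167.24, 28

ms_error = ss_error / df_error                     # Adj MS for Error
r_sq = ss_regression / ss_total                    # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)    # R-Sq(adj)
s = sqrt(ms_error)                                 # S, the standard error of the regression
f_stat = (ss_regression / df_regression) / ms_error

print(f"R-Sq      = {100 * r_sq:.2f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"S         = {s:.5f}")
print(f"F         = {f_stat:.2f}")
```

R-Sq(pred) can't be recomputed this way, because it depends on refitting the model with each observation left out, which requires the raw data.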

In either case, that leaves quite a lot of variation unaccounted for. Using the information readily available on U.S. presidents online, is it possible to come up with a better predictive model?

Do Great Presidents Make History? Or Vice-versa?

If life offers us anything certain at all (besides death and taxes), it’s the unbridled opportunity for tentative speculation. Would Lincoln be considered such a great president if he hadn’t governed during the violent and tumultuous times of the U.S. Civil War? Or FDR, if he hadn’t led our country through the dark days of World War II?

In other words, are historians more likely to rank a president higher if he governed during a war?

What about other factors? Would JFK be so admired if his presidency hadn’t ended so abruptly and tragically in assassination? Could even a superficial thing as a president’s physical stature affect his historical ranking and public popularity? What about his age? Or lifespan?

(One side note here. In statistics, you’re often cautioned that correlation does not imply causation. While that’s true, don’t interpret that oft-cited warning to mean that correlation and causation aren’t related. Of course they are. In fact, when you’re thinking about possible predictors for your model, you’re going to naturally think of possible causative factors. Because if there is a causative relationship between a predictor and a response variable, there should be a significant association. Correlation itself doesn’t prove the causation though—for that you need to rely on other types of analyses.)

Choosing a Top Model

Take a look at two models to predict the historical ranking of a U.S. president, both analyzed using Minitab’s General Regression.

Model 1 uses age and length of retirement as predictors, as well as a categorical predictor to indicate whether the U.S. was at war during the president’s tenure in office:

--------------------------------------------------------------------------------------
Model 1: Age (inauguration), Age (death), Length of Retirement, War

Summary of Model
S = 8.81231 R-Sq = 59.36% R-Sq(adj) = 54.43% R-Sq(pred) = 44.81%

Analysis of Variance
Source               DF Seq SS Adj SS   Adj MS    F      P
Regression           4 3743.14 3743.14 935.79 12.05 0.000
Age at inauguration 1   24.09 1140.85 1140.85 14.69 0.001
Age at death         1   29.23 1204.39 1204.39 15.51 0.000
Length of Retirement 1 2710.77 1077.89 1077.89 13.88 0.001
War                  1 979.05 979.05 979.05 12.61 0.001
Error               33 2562.67 2562.67   77.66
Total               37 6305.82

Regression Equation
War
No Historians rank = 30.7955 + 2.29991 Age at inauguration - 2.20691 Age at death + 0.00623944 Length of Retirement

Yes Historians rank = 17.8469 + 2.29991 Age at inauguration - 2.20691 Age at death + 0.00623944 Length of Retirement
-------------------------------------------------------

Model 2 also includes War as a categorical predictor. But its continuous predictor simply indicates the number of years the president served in office. A second categorical predictor indicates whether the president was subject to an assassination attempt.

----------------------------------------------------------------------------------
Model 2: Years in Office, Assassination Attempt, War

Summary of Model
S = 8.67388 R-Sq = 56.66% R-Sq(adj) = 53.24% R-Sq(pred) = 46.82%

Analysis of Variance
Source                DF Seq SS Adj SS Adj MS    F       P
Regression             3 3737.43 3737.43 1245.81 16.56   0.000
Years in Office        1 2458.05 1068.64 1068.64 14.20   0.001
Assassination Attempt 1 739.43   696.96 696.96   9.26   0.004
War                    1 539.96   539.96 539.96   7.18   0.011
Error                 38 2858.97 2858.97 75.24
Lack-of-Fit           16 992.14   992.14 62.01     0.73   0.74
Pure Error            22 1866.83 1866.83 84.86
Total                 41 6596.40

Regression Equation
Assassination War
Attempt
No No Historians rank = 38.9471 - 2.21938 Years in Office

No Yes Historians rank = 30.3752 - 2.21938 Years in Office

Yes No Historians rank = 29.6362 - 2.21938 Years in Office

Yes Yes Historians rank = 21.0644 - 2.21938 Years in Office
--------------------------------------------------------
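The four equations for Model 2 share one slope and differ only in the intercept, so they collapse into a single lookup. Here is a minimal sketch with the coefficients copied from the output above; the function name and the example inputs are hypothetical.

```python
# Intercepts keyed by (assassination_attempt, war), copied from the output above.
INTERCEPTS = {
    (False, False): 38.9471,
    (False, True): 30.3752,
    (True, False): 29.6362,
    (True, True): 21.0644,
}
YEARS_SLOPE = -2.21938  # change in rank per additional year in office


def predicted_rank_model2(years: float, assassination_attempt: bool, war: bool) -> float:
    """Predicted historians' rank from Model 2."""
    return INTERCEPTS[(assassination_attempt, war)] + YEARS_SLOPE * years


# Illustrative input: 8 years in office, no assassination attempt, no war.
print(round(predicted_rank_model2(8, False, False), 2))

# The shift in the intercept when War changes from No to Yes.
print(round(INTERCEPTS[(False, False)] - INTERCEPTS[(False, True)], 4))
```

The War shift is the same (up to rounding) whether or not there was an assassination attempt, which is what you'd expect from a model without interaction terms.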

Of these two models, which would you favor? Compare the output and look carefully at the predictors in each model.

Both models are statistically significant; in fact, all the predictors in each model have p-values less than an alpha of 0.05. Both models also have about the same R-squared value, and explain close to 60% of the variation in the historical rankings. You might favor Model 1 because its R-sq and adjusted R-sq values are a wee bit higher (although its predicted R-sq is slightly lower). You also might prefer its ease of use, with only 2 regression equations to predict the response, instead of 4. Those are valid reasons, but there’s something lurking beneath the surface that trumps those reasons: a dreaded “statistical disease” that can afflict regression models, called multicollinearity.

What makes models behave erratically?

Multicollinearity is a word you’re not likely to hear bandied about in your local sports bar. But despite its big scary name, multicollinearity is a relatively simple concept. It just means that your model contains predictors that are correlated with each other.

Take a look at Model 1 again, with its continuous predictors Age at inauguration, Age at death, Length of retirement. Any likely correlations between those predictors? Obvious, isn’t it, once you think about it? The longer you live, the longer your retirement is likely to be.

To evaluate possible multicollinearity using our statistical software, display the predictors you suspect of correlation on a scatterplot (Graph > Scatterplot) and run a correlation analysis (Stat > Basic Statistics > Correlation):

Scatterplot of correlation

Correlations: Length of Retirement, Age at death

Pearson correlation of Length of Retirement and Age at death = 0.834
P-Value = 0.000
----------------------------------------

The graph indicates a clear relationship between the two predictors, as you’d expect. The correlation analysis shows that it’s a fairly strong, statistically significant correlation.

But what if a correlation between your predictors isn’t so intuitively obvious? Or what if your model contains lots of predictors? There’s another way to quickly spot the trouble.

When you run the regression analysis in Minitab, click Options (in Regression) or Results (in General Regression) and choose to display the variance inflation factors (VIFs). This is what you’d get for the two models:

Model 1

Term                    VIF
Constant
Age at inauguration     6.65
Age at death           22.27
Length of Retirement   14.98
War (No)                1.26

Model 2

Term                 VIF
Constant
Years in Office     1.23
Assassination (No)  1.01
War (No)            1.22

Now you can see a big difference between the models. High variance inflation factors indicate possible correlations between the predictors. You want VIFs as close to 1 as possible. Anything greater than 10 indicates trouble. As you can see, Model 1 shows strong evidence of multicollinearity.
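It may help to see where a VIF comes from. The standard definition is VIF = 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors in the model. A small sketch of that relationship (the function names are mine):

```python
def vif_from_r_squared(r_sq: float) -> float:
    """VIF for a predictor whose regression on the other predictors has this R-squared."""
    if not 0 <= r_sq < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return 1 / (1 - r_sq)


def r_squared_from_vif(vif: float) -> float:
    """Invert the formula: the R-squared implied by a given VIF."""
    if vif < 1:
        raise ValueError("VIF cannot be below 1")
    return 1 - 1 / vif


# The rule-of-thumb cutoff of 10 corresponds to an R-squared of 0.9.
print(r_squared_from_vif(10))

# A VIF of 22.27 (Age at death in Model 1) implies that about 95.5% of that
# predictor's variation is explained by the other predictors.
print(round(r_squared_from_vif(22.27), 4))

# With only two predictors, R-squared is the squared pairwise correlation,
# so a correlation of 0.834 on its own would give a VIF of about 3.28.
print(round(vif_from_r_squared(0.834 ** 2), 2))
```

That last number is lower than the VIFs reported for Model 1 because those are computed against all of the other predictors at once, not just one pairwise correlation.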

So what’s the big deal? Is multicollinearity just another complicated rule that statisticians can use to gleefully tear apart your results?

No. The big deal is that multicollinearity makes models very unstable. Weird things can happen...estimates for the coefficients for each predictor can vary erratically depending on which other predictors you include in the model. What’s more, predictors that appear to be statistically significant may not be significant at all.

For example, look what happens after you remove one of the correlated predictors from Model 1:

Model 1 (without Length of Retirement)
Term                           P            VIF
Age at inauguration            0.31         1.59
Age at death                   0.28         1.60
War (No)                       0.00         1.01
----------------------------------------

The VIF values are now much lower—because removing one of the correlated factors addressed the problem of multicollinearity. But look what happened to the p-values. Before, both continuous predictors, "Age at inauguration" and "Age at death," were statistically significant, with p-values < 0.05. Now neither predictor is significant—together or by itself. Model 1 falls to pieces…crumbles to dust…a victim of instability caused by multicollinearity.

On the other hand, if you run Model 2 with each predictor by itself, each predictor is always statistically significant. It’s a stable model.

Conclusion: The Makings of a Good Model

As you can see, there's much more to a good model than low p-values. I set out to find a better model to account for average historical rankings of U.S. presidents. Model 1 self-destructed due to multicollinearity. But Model 2 is stable, and it does have some advantages over Silver's original model:

It accounts for more variation in the response.
It can be used to estimate historical rankings for all U.S. presidents, regardless of whether they sought a second term.

I also think the predictors I've chosen are a bit more thought-provoking (could a Wag the Dog principle be at work?)

But I admit my model is not as simple and elegant as Silver's, with its one continuous predictor that can be easily and accurately measured before a president's second term is completed. My model is definitely clunkier.

It also has some potential issues related to the measurement of the categorical predictors, such as how to define an assassination attempt. For example, someone once fired shots at the White House from afar while President Clinton was in office, but I didn’t count that as an assassination attempt—partly because I didn’t think it was part of the public consciousness. Similarly, for the "War" variable, I didn't count conflicts with Native American tribes as a U.S. war during a president's term. Clearly, there's lots of room for debate there.

There’s so much data on U.S. presidents available. If only I had world enough and time. There has to be a better model. Maybe you can find it...

↧

My Work in Statistics: Developing New Tools for Analyzing Data

March 5, 2013, 5:00 am


In honor of the International Year of Statistics, I interviewed Scott Pammer, a technical product manager here at Minitab Inc. in State College, Pa. Scott works to develop new product concepts and the accompanying prototypes and business plans.

Before taking on the role of technical product manager, Scott worked for Minitab as a senior statistician. In this role, he designed and programmed various features in Minitab Statistical Software. He’s been with Minitab since 1995.

What was your journey to becoming a statistician?

When I was a senior in high school, I knew I needed to pick a college major. I was always good at math, so I majored in general mathematics because the college I attended did not offer a statistics degree. My dad felt it was important to include practical knowledge in my studies and this led me to double-major in mathematics and accounting. The accounting paid dividends down the road because it exposed me to business problems.

From an early age, I also wanted to become a professor in some kind of math. I always gravitated to applied mathematics and thought statistics was a natural choice. Statistics is solving real problems with data, which is much less abstract than a proof or a theorem. That’s how I ended up at Penn State majoring in statistics. I got a dual Ph.D. in statistics and operations research, adding operations research to my resume to expose me to another type of practical problem. I came to Minitab because I really enjoyed programming and statistics, and Minitab allowed me to do both.

What is the most challenging part of your work?

Explaining statistics to a non-statistician is challenging, and it’s also difficult to create statistical output or results that a non-statistician can consume.

When it comes to developing new product concepts, the challenge is the uneasiness. Will a person pay money for the product and will it be profitable? It is one thing to say, "Yes, I like your idea," but it’s another thing to actually write the check. It’s tough to know if you have the right idea that will be both successful and profitable.

What is the most interesting problem you have used statistics to solve?

I did my dissertation on forecasting the penetration of a new product into a marketplace. This was both an interesting and challenging problem to solve. I was able to use the penetration of a similar existing product in the marketplace to greatly improve the forecasts. For example, if you were trying to figure out the penetration of color televisions in the marketplace, you could use a technological substitution – like a black-and-white TV – to gain some insight about the penetration of color televisions.

While at Penn State, I also worked in a consulting role, using statistics to solve all kinds of biological applications.

What is the best career advice you were ever given?

My advisor in college taught me to relate every problem to something you know well, and then break it down into pieces to make the problem more manageable to solve. This advice has served me well over the course of my career.

Another thing that is overlooked is communication. Working on your communication skills and being able to communicate statistical results to non-statisticians is very important. Even if you have a smart mathematical brain, you still need to work at your communication with people outside the mathematics field. I got involved in consulting because it forced me to talk to people in various fields. It helped me understand how others think and talk, and this made it easier for me to communicate statistical subjects.

Learn more about how Minitab is celebrating the International Year of Statistics: http://www.minitab.com/company/news/news-basic.aspx?id=11770.

↧

How to Win an Oscar (If You Misunderstand Statistics)

March 6, 2013, 9:37 am


Replicas of Academy Awards statuettes

Statistician-to-the-Stars William Briggs deserves credit for his correct prediction of the Best Picture Oscar the day before the ceremonies. And while Mr. Briggs would never encourage anyone to misuse his model this way, I feel my statistics heartstrings strummed by the desire to remind everyone about a particular common and dangerous statistical mistake: Correlation does not = causation.

Mr. Briggs correctly predicted Argo would be selected as Best Picture from among the nominated films and noted that "The key reasons for its victory will be: the lead actor is at least forty, the other featured actors are mainly older men, and the picture took in only 20% of the money earned by 2012's top film." I imagine Mr. Briggs' tongue was planted firmly in his cheek, but if you read the words as they're written, they imply that if you want to win the Oscar for Best Picture, you should make a film that has these three qualities. After all, it's been noted that the people who vote are usually older men.

With this in mind, we can take a look at a few movies that were clearly undeservedly spurned by the Academy this year. I chose examples from movies released after June 15th, 2012 that looked like they would have older male actors. The male actors, and whether the movie had mostly male actors, were determined from the cast lists on wildaboutmovies.com. Ages came from Bing searches in the format of the actor's name followed by the word "age". The amount of money a movie grossed in the US market is from boxofficemojo.com.

In alphabetical order, let's sympathize with some movies whose makers made sure to do everything the Academy wanted and still didn't even get a nomination for Best Picture:

Movie                   Average age of        Domestic Gross   Proportion of Avengers
                        leading male actors                    Domestic Gross
Alex Cross              50.8333               2588412          .01453100
Cloud Atlas             47.3750               27108272         0.0434875
Expendables 2           48.5000               85028192         0.1364030
Here Comes the Boom     57.0000               45290318         0.0726554
Last Ounce of Courage   48.5000               33219674         0.0532915
Rock of Ages            44.0000               38518613         0.0617921
Seven Psychopaths       52.6000               15024049         0.0241018
Sinister                55.0000               48086903         0.0771417
Stolen                  45.0000               304318           0.0004882
The Bourne Legacy       53.2000               113203870        0.1816030
The Campaign            49.7143               86907746         0.1394190
The Watch               39.8333               35353000         0.0567138
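The last column is simply each movie's domestic gross divided by the domestic gross of The Avengers. Here's a quick sketch of that calculation for a few rows. Note the Avengers figure below is my assumption (it isn't stated in this post); I picked it because it reproduces the table's proportions for these rows.

```python
# ASSUMPTION: domestic gross of The Avengers, not given in the post itself.
AVENGERS_DOMESTIC_GROSS = 623_357_910

# Domestic gross figures copied from the table above.
domestic_gross = {
    "Cloud Atlas": 27_108_272,
    "Expendables 2": 85_028_192,
    "Stolen": 304_318,
}


def proportion_of_avengers(gross: int) -> float:
    """A movie's domestic gross as a fraction of the Avengers' domestic gross."""
    return gross / AVENGERS_DOMESTIC_GROSS


for movie, gross in domestic_gross.items():
    print(f"{movie}: {proportion_of_avengers(gross):.7f}")
```

Mr. Briggs' criterion was a film taking in only about 20% of the top film's money, so anything with a proportion under 0.2 "qualifies."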

Of course, to make it clear which movies worked the hardest, it's always best to use an image rather than a table. Minitab Statistical Software's bar charts are great for making the data even clearer.

Here's the age data with the bars in alphabetical order by movie title:

Average age of male actors, sorted by movie title

But if we really want to see who wanted the Oscar the most, we'd sort the bars to show the highest average age in the first position we typically examine: the left.

Movies sorted by average age of leading male actors

Alas, the Academy didn't appreciate Here Comes the Boom's story of the biology teacher who really fights for his kids.

To put the movie that made the least money on the left, we sort the bars in ascending order instead of descending order.

Movie by amount of Avengers gross

When you think about it, isn't Stolen's story of an ex-con rescuing his estranged daughter really the same thing as Argo's story of people pretending to make a movie rescuing hostages from Iran?

Of course, if he was aware of Mr. Briggs' model, then the really disappointed person is going to be Martin McDonagh, the writer and director of Seven Psychopaths. After all, his was the only movie bold enough to both cast mostly older men and not make much money.

But since Mr. McDonagh probably knows that correlation doesn't = causation, he'll probably leave his characters to their original violent ends instead of going for the Oscar with Seven Psychopaths 2: 49 Psychopaths.

Curious about how to sort bar charts in ascending or descending order? Follow the steps in this quick example on how to make a simple bar chart of counts.
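If you'd rather reproduce the orderings outside Minitab, the same descending and ascending sorts can be sketched with Python's built-in `sorted`. The ages and gross figures are copied from the table earlier in this post; the variable names are mine.

```python
# (movie, average age of leading male actors, domestic gross), from the table.
movies = [
    ("Alex Cross", 50.8333, 2588412),
    ("Cloud Atlas", 47.3750, 27108272),
    ("Expendables 2", 48.5000, 85028192),
    ("Here Comes the Boom", 57.0000, 45290318),
    ("Last Ounce of Courage", 48.5000, 33219674),
    ("Rock of Ages", 44.0000, 38518613),
    ("Seven Psychopaths", 52.6000, 15024049),
    ("Sinister", 55.0000, 48086903),
    ("Stolen", 45.0000, 304318),
    ("The Bourne Legacy", 53.2000, 113203870),
    ("The Campaign", 49.7143, 86907746),
    ("The Watch", 39.8333, 35353000),
]

# Descending by average age: the oldest cast ends up in the leftmost bar.
by_age_desc = sorted(movies, key=lambda m: m[1], reverse=True)

# Ascending by gross: the movie that made the least money ends up on the left.
by_gross_asc = sorted(movies, key=lambda m: m[2])

print([m[0] for m in by_age_desc[:3]])
print([m[0] for m in by_gross_asc[:3]])
```

The only difference between the two orderings is the sort key and the `reverse` flag, which mirrors the ascending/descending choice in the bar chart options.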

The photo of the replica statuettes is by Antoine Taveneaux and licensed for reuse under this Creative Commons License.

↧

Using Data Analysis to Assess Fatality Rates in Star Trek: The Original Series

March 7, 2013, 4:00 am


I’m a Star Trek fan and a statistics fan. So, I’m thrilled to finally have the opportunity to combine the two into a blog post! In the original Star Trek series with Captain Kirk, the crew members of the U.S.S. Enterprise who wear red shirts have a reputation for dying more frequently than those who wear blue or gold shirts. Wearing a red shirt appears to be the kiss of death! In this blog, we’ll conduct several hypothesis tests to determine whether this is true.

Do statistics prove red shirts are more dangerous?

Matthew Barsalou published an article in Significance that studies this from a statistical perspective. Barsalou is also a guest blogger right here at Minitab's blog! His hypothesis is that the proportion of the red-shirted personnel who die is no greater than the other colors. We see more red-shirt deaths simply because red-shirts comprise about half the crew. Barsalou uses Minitab Statistical Software to produce a series of graphs that break down the data. He then assesses the probabilities with his own Bayesian calculations. It’s worth reading the original article.

Rather than duplicating his work, I’ll add to it by using Minitab to formally test his two hypotheses using several tests that he doesn't use.

Fatalities by Uniform Colors

The uniform color denotes the area that the crewmember works in. We’re going to determine if a crewmember’s duty area affects his chances of being killed. In the table below, you can see that red-shirts make up the majority of both the crew and the fatalities. The data is “real” in the sense that the deaths are depicted on the show and the crew numbers are from authoritative reference sources.

Let’s boldly go where no hypothesis tests have gone before!

Color          Areas                                   Crew   Fatalities
Blue           Science and medical                     136
Gold           Command and helm
Red            Operations, engineering, and security   239
Ship’s total   All                                     430

Chi-Square Test for Uniform Color and Fatalities

To determine whether the percentage of fatalities varies by uniform color, we’ll perform a Chi-square analysis. In this case, the Chi-square statistic quantifies how the observed distribution of counts varies from the distribution you would expect if no relationship exists between uniform color and the number of fatalities. A low p-value suggests that there is a relationship. You can get the data here. (If you want to follow along but you don't already have Minitab Statistical Software, go ahead and download the free 30-day trial version of Minitab.)

In Minitab, go to Stat > Tables > Cross Tabulation and Chi-square.
In For rows, enter Color. In For columns, enter Status. And, in Frequencies are in, enter Frequencies. Under Display, check both Counts and Row percents.
Click the Chi-square button and check both Chi-Square analysis and Each cell’s contribution to the Chi-Square statistic.
Click OK in all dialog boxes.
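For readers without Minitab, the Pearson chi-square statistic and each cell's contribution can be sketched in plain Python. The counts below are hypothetical placeholders, not the Star Trek data, and the function names are mine. The p-value shortcut works only for 2 degrees of freedom (which is what a 3-color by 2-status table has), where the chi-square tail probability is exactly exp(-x/2).

```python
from math import exp


def chi_square_table(observed):
    """Pearson chi-square for a table given as {row label: [count per column]}.

    Returns (chi-square statistic, degrees of freedom, per-cell contributions).
    """
    n_cols = len(next(iter(observed.values())))
    row_totals = {row: sum(counts) for row, counts in observed.items()}
    col_totals = [sum(counts[j] for counts in observed.values()) for j in range(n_cols)]
    grand_total = sum(row_totals.values())

    contributions = {}
    for row, counts in observed.items():
        contributions[row] = []
        for j, obs in enumerate(counts):
            # Expected count if row and column were unrelated.
            expected = row_totals[row] * col_totals[j] / grand_total
            contributions[row].append((obs - expected) ** 2 / expected)

    chi2 = sum(sum(cells) for cells in contributions.values())
    df = (len(observed) - 1) * (n_cols - 1)
    return chi2, df, contributions


def p_value_df2(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with exactly 2 df."""
    return exp(-chi2 / 2)


# HYPOTHETICAL counts as [dead, alive] per group, for illustration only.
toy = {"A": [10, 90], "B": [20, 80], "C": [30, 70]}
chi2, df, contributions = chi_square_table(toy)
print(chi2, df, p_value_df2(chi2))
print(contributions)
```

This computes only the Pearson statistic; Minitab's output also includes a likelihood-ratio chi-square, which this sketch doesn't attempt.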

Chi-square analysis of fatalities by uniform color

The first thing to note is that both p-values are less than 0.05, which indicates that there is a relationship between uniform color and fatalities. Next, we’ll determine the nature of the relationship.

To do this, we’ll assess how much each cell in the Dead column contributes to the significant Chi-square statistic (bottom value in each cell). I've graphed the contributions below:

Contribution to the Chi-square statistic

Red contributes almost nothing to the Chi-square statistic. Instead, the story really involves the blue and gold uniforms. These two colors produce the statistical significance.

By comparing the actual counts to the expected counts in the output, we see that blue-shirts have fewer deaths than expected while gold-shirts have more deaths than expected. We can also look at the percentage of fatalities for each uniform color (% of row).

Percentage of fatalities by uniform color

The graph confirms the conclusions drawn by comparing the actual to expected values: blue has the lowest percent, gold the highest, and red is right at the overall percentage.
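For readers who want to see the arithmetic behind the expected counts and the per-cell contributions, here is a minimal Python sketch. The counts are purely illustrative (three hypothetical groups, not the Star Trek data), and the function name is my own. The p-value shortcut only holds for a table with 2 degrees of freedom, such as three rows by two columns.

```python
import math

# Hypothetical counts for illustration only: (dead, alive) per group.
observed = {
    "A": (10, 90),
    "B": (30, 70),
    "C": (20, 80),
}

def chi_square_contributions(table):
    """Return (statistic, contributions, expected) for an r x 2 table of counts."""
    col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
    grand_total = sum(col_totals)
    contributions, expected = {}, {}
    for name, row in table.items():
        row_total = sum(row)
        # Expected count if row and column were unrelated.
        exp_row = tuple(row_total * col_totals[j] / grand_total for j in range(2))
        expected[name] = exp_row
        # Each cell contributes (observed - expected)^2 / expected.
        contributions[name] = tuple(
            (row[j] - exp_row[j]) ** 2 / exp_row[j] for j in range(2)
        )
    statistic = sum(sum(cells) for cells in contributions.values())
    return statistic, contributions, expected

statistic, contributions, expected = chi_square_contributions(observed)
# With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

In this toy table, group C sits exactly at the overall rate, so it contributes nothing to the statistic, much as the red-shirts contribute almost nothing in the analysis above.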

The Chi-square analysis and hypothesis test support Barsalou’s theory that red-shirts do not die at a higher rate. Instead, there is simply a larger population of red-shirts who die at the average rate. However, there is more to the story. As we'll see, a moral to this story is that it's crucial to pick the truly important explanatory variable.

2 Proportions test to compare Security Red-Shirts to Non-Security Red Shirts

Barsalou goes on to postulate that perhaps shirt color is not the relevant variable. After all, crew members in a variety of other departments (including Engineering and Operations) also wear red shirts. Perhaps it’s only the security department that has a higher fatality rate?

The 2 Proportions test can put this theory to the test. We’ll compare the proportion of deaths between two subgroups of red-shirts: security vs non-security.

In Minitab, go to Stat > Basic Statistics > 2 Proportions. Choose Summarized data and enter the following:

Events

Trials

First

Second

149

The first group includes red-shirts who are in the security department while the second group includes red-shirts who are not in the security department. The events are the number of deaths while the trials are the number of personnel.

Two proportions test to compare security to non-security red-shirts

The p-value is 0.000, which indicates that the two proportions are significantly different. There is a whopping 20% fatality rate within the security department compared to only 4% for the non-security red-shirts. To put that in perspective, security personnel have the highest fatality rate on the ship, even higher than the gold-shirts. Non-security red-shirts have a fatality rate right around that of the blue-shirts.

Consequently, the 2 Proportions test supports Barsalou’s theory that it’s just the red-shirts in Security who have the higher death rate. Engineering and Operations are not at a higher risk, even though they also wear red-shirts.
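As a rough cross-check outside Minitab, here is a minimal Python sketch of a two-proportion z-test. The counts are assumptions that I reconstructed from the rounded 20% and 4% rates and the 239 red-shirts quoted above (18 of 90 and 6 of 149), and the test uses a pooled normal approximation, which may not be the exact method Minitab reports.

```python
import math

def two_proportion_z_test(events1, trials1, events2, trials2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1 = events1 / trials1
    p2 = events2 / trials2
    pooled = (events1 + events2) / (trials1 + trials2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (p1 - p2) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value

# Assumed counts (reconstructed, not taken from the original worksheet).
p1, p2, z, p_value = two_proportion_z_test(18, 90, 6, 149)
print(f"security: {p1:.1%}, non-security: {p2:.1%}, z = {z:.2f}, p = {p_value:.5f}")
```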

Who's At Risk?

Both hypothesis tests show that specific duty areas have a significantly higher fatality rate than other duty areas. If you're a doctor, scientist, engineer, or in ship operations on board the Enterprise, you're relatively safe, with a risk of dying of about 5% over the timeframe of the series.

On the other hand, if you're in the command hierarchy or security, you're at a significantly higher level of risk, one that exceeds 15%!

↧

Why the Weibull Distribution Is Always Welcome

March 8, 2013, 4:45 am

≫ Next: Build a DIY Catapult for DOE (Design of Experiments), part 1

≪ Previous: Using Data Analysis to Assess Fatality Rates in Star Trek: The Original Series

In college I had a friend who could go anywhere and fit right in. He'd have lunch with a group of professors, then play hacky-sack with the hippies in the park, and later that evening he'd hang out with the local bikers at the toughest bar in the city. Next day he'd play pickup football with the jocks before going to an all-night LAN party with his gamer pals. On an average weekend he might catch an all-ages show with the small group of straight-edge punk rockers on our campus, or else check out a kegger with some townies, then finish the weekend by playing some D&D with his friends from the physics club.

He was like a chameleon, able to match and reflect the characteristics of the people he was with. That flexibility made him welcome in an astonishingly diverse array of social circles.

His name was Jeff Weibull, and he was so popular that local statisticians even named "The Weibull Distribution" after him.

What Makes the Weibull Distribution So Popular?

All right, I just made that last part up: Jeff's last name wasn't really "Weibull," and the distribution is named for someone else entirely. But when I first learned about the Weibull Distribution, I immediately recalled Jeff, and his seemingly effortless ability to be perfectly comfortable in such a wide variety of social settings.

Just as Jeff was a chameleon in different social circles, the Weibull distribution has the ability to assume the characteristics of many different types of distributions. This has made it extremely popular among engineers and quality practitioners, who have made it the most commonly used distribution for modeling reliability data. They like incorporating the Weibull distribution into their data analysis because it is flexible enough to model a variety of data sets.

Got right-skewed data? Weibull can model that. Left-skewed data? Sure, that's cool with Weibull. Symmetric data? Weibull's up for it. That flexibility is why engineers use the Weibull distribution to evaluate the reliability and material strengths of everything from vacuum tubes and capacitors to ball bearings and relays.

The Weibull distribution can also model hazard functions that are decreasing, increasing or constant, allowing it to describe any phase of an item’s lifetime.

How the Weibull Curve Changes Its Shape

So just how flexible is the Weibull distribution? Let's look at some examples using Graph > Probability Distribution Plot in Minitab Statistical Software. (If you want to follow along and you don't already have Minitab, download the free 30-day trial.)

probability distribution plots

Select "View Single," and then "Weibull" in the Distribution drop-down menu. The dialog box will let you specify three parameters: shape, scale, and threshold.

The threshold parameter indicates the distribution's shift away from 0, with a negative threshold shifting the distribution to the left of 0, and a positive threshold shifting it to the right. All data must be greater than the threshold. The scale parameter is the 63.2 percentile of the data, and it defines the Weibull curve's relation to the threshold, like the mean defines a normal curve's position. For our examples we'll use a scale of 10, which says that 63.2% of the items tested will fail in the first 10 hours following the threshold time. The shape parameter describes the shape of the Weibull curve. By changing the shape, you can model the characteristics of many different life distributions.
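To see where the 63.2% figure comes from, here is a small Python sketch of the three-parameter Weibull cumulative distribution function. The function name and the shape values are my own choices for illustration. At one scale unit past the threshold, the CDF equals 1 - e^(-1), or about 0.632, regardless of the shape.

```python
import math

def weibull_cdf(t, shape, scale, threshold=0.0):
    """Fraction of items expected to have failed by time t."""
    if t <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((t - threshold) / scale) ** shape))

# With a scale of 10, about 63.2% have failed by hour 10 for any shape.
for shape in (0.5, 1, 2, 3.5):
    print(shape, round(weibull_cdf(10, shape=shape, scale=10), 3))
```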

For this post, I'll focus exclusively on how the shape parameter affects the Weibull curve. I'll go through these one-by-one, but if you'd like to see them all together on a single plot, choose the "Vary Parameters" option in the dialog box shown above.

Weibull Distribution with Shape Less Than 1

Let's start with a shape between 0 and 1. The graph below shows the probability decreases exponentially from infinity. In terms of failure rate, data that fit this distribution have a high number of initial failures, which decrease over time as the defective items are eliminated from the sample. These early failures are frequently called "infant mortality," because they occur in the early stage of a product's life.

Weibull Distribution with shape between 0 and 1

Weibull Distribution with Shape Equal to 1

When the shape is equal to 1, the Weibull distribution decreases exponentially from 1/alpha, where alpha = the scale parameter. Essentially, this means that over time the failure rate remains constant. This shape of the Weibull distribution is appropriate for random failures and multiple-cause failures, and can be used to model the useful life of products.

Weibull Distribution with shape = 1

Weibull Distribution with Shape Between 1 and 2

When the shape value is between 1 and 2, the Weibull Distribution rises to a peak quickly, then decreases over time. The failure rate increases overall, with the most rapid increase occurring initially. This shape is indicative of early wear-out failures.

Weibull Distribution with shape value between 1 and 2

Weibull Distribution with Shape Equal to 2

When the shape value reaches 2, the Weibull distribution models a linearly increasing failure rate, where the risk of wear-out failure increases steadily over the product's lifetime. This form of the Weibull distribution is also known as the Rayleigh distribution.

Weibull Distribution with Shape = 2 AKA Rayleigh Distribution
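The failure-rate behavior described so far can be checked numerically. This is a minimal Python sketch of the two-parameter Weibull hazard function, h(t) = (shape/scale) * (t/scale)^(shape - 1), with the threshold assumed to be 0; the function name is mine.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t (threshold assumed to be 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Shape < 1: the rate falls over time. Shape = 1: constant. Shape = 2: linear in t.
for shape in (0.5, 1, 2):
    rates = [round(weibull_hazard(t, shape, scale=10), 4) for t in (5, 10, 20)]
    print(f"shape {shape}: {rates}")
```

With a scale of 10, the shape-1 hazard stays at 1/10 at every time point, and the shape-2 hazard doubles whenever the time doubles.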

Weibull Distribution with Shape Between 3 and 4

If we put the shape value between 3 and 4, the Weibull distribution becomes symmetric and bell-shaped, like the normal curve. This form of the Weibull distribution models rapid wear-out failures during the final period of product life, when most failures happen.

Weibull distribution symmetric shape value = 3.5

Weibull Distribution with Shape Greater than 10

When the shape value is above 10, the Weibull distribution is similar to an extreme value distribution. Again, this form of the distribution can model the final period of product life.

Weibull Distribution shape value = 20 skewed

Is Weibull Always the Best Choice?

When it comes to reliability, Weibull frequently is the go-to distribution, but it's important to note other distribution families can model a variety of distributional shapes, too. You want to find the distribution that gives you the best fit for your data, and that may not be a form of the Weibull distribution. For example, product failures caused by chemical reactions or corrosion are usually modeled with the lognormal distribution.

You can assess the fit of your data using Minitab’s Distribution ID plot (Stat > Reliability/Survival > Distribution Analysis (Right-Censoring or Arbitrary Censoring)). If you want more details about that, check out this post Jim Frost wrote about identifying the distribution of your data.

↧

Build a DIY Catapult for DOE (Design of Experiments), part 1

March 11, 2013, 5:00 am

≫ Next: Build a DIY Catapult for DOE (Design of Experiments), part 2

≪ Previous: Why the Weibull Distribution Is Always Welcome

by Matthew Barsalou, guest blogger

I needed to find a way to perform experiments to practice using design of experiments (DOE), so I built a simple do-it-yourself (DIY) catapult. The basic plan for the catapult is based on the table-top troll catapult from http://www.stormthecastle.com/catapult/how-to-build-a-catapult.htm.

My catapult is not as attractive as the troll catapult; my goal was to build a catapult with multiple adjustable factors—and not to lay siege to a castle—so I don’t mind the rough appearance of my catapult.

The frame consists of two pieces of 40 cm x 4 cm x 2 cm wood, two pieces of 24 cm x 4 cm x 2 cm wood, and eight pieces of 20 cm x 4 cm x 2 cm wood. I could have used other dimensions. The shorter pieces are 50% the length of the long pieces; however, if you use other dimensions, be sure that the wood is thick enough to avoid breaking under the stress of a launch. The catapult arm is made of a 45 cm x 2 cm x 2 cm piece of wood. I could have used a thicker piece for the catapult arm, but wanted something light. Also needed are 16 wood screws. The four screws used to hold the supports to the base must be flathead so the catapult's wooden bottom can sit flat.

I used eighteen small screw eyes to add adjustability and four screw hooks to attach the rubber bands that power the catapult arm. The rubber bands are heavy rubber bands intended for model building, although regular rubber bands could work with a smaller catapult. I used 60 mm diameter, 100 mm diameter and 130 mm diameter rubber bands. The catapult cup can be an actual small cup; I used the bottom of a small plastic bottle.

For projectiles, I could have used small balls—but I wanted a projectile that would not roll or slide much after landing, so I used three small bags of rice as the projectiles. I also used a metal rod cut into pieces for the pivot point on the catapult arm and for the rubber band guides, arm stoppers and arm starting points.

The dimensions can be modified as needed. For example, two pieces of 1” x 2” x 15.75” wood, two pieces of 1” x 2” x 9.5” wood, eight pieces of 1” x 2” x 8” and one piece of 1” x 2” x 18” wood could be used to build the catapult using standard sizes. The catapult can also be scaled-up or scaled-down; just be sure it is wide enough so that it will not tip over.

Here is the right-side view of the catapult, without the catapult arm:

DIY Catapult for Design of Experiments (DOE)

This is the view from the front, again without the catapult arm:

Front view of DIY Catapult Body for Designed Experiments (DOE)

This is the right-side view of the catapult arm:

Catapult for Design of Experiments (DOE) right side view

And this is what the completed catapult looks like:

Finished DIY Catapult for doing DOE (Design of Experiments)

Now that my catapult is built, I have one last step to complete: to find the optimal catapult setting using DOE, which I'll do with Minitab Statistical Software in my next post.

If you want to build your own, here are my plans and instructions for the DIY DOE Catapult in a PDF document.

About the Guest Blogger:

Matthew Barsalou has been the quality manager at an automotive supplier in Germany since 2011, and previously worked as a contract quality engineer at Ford in Germany and Belgium. He is completing a master’s degree in industrial engineering at the Wilhelm Büchner Hochschule in Darmstadt, Germany, and is also working on a manuscript for a practical introductory book on statistics.

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

↧

Build a DIY Catapult for DOE (Design of Experiments), part 2

March 12, 2013, 5:00 am

≫ Next: Evaluating a Gage Study With One Part

≪ Previous: Build a DIY Catapult for DOE (Design of Experiments), part 1

Finished DIY Catapult for doing DOE (Design of Experiments)

by Matthew Barsalou, guest blogger

In my last post, I shared my plans for building a simple do-it-yourself catapult for performing experiments to practice using design of experiments (DOE).

That's the completed catapult there on the right. If you want to build your own, here are my plans and instructions in a PDF.

Now that my catapult is built, I have one last step to complete: to find the optimal catapult setting using DOE, which I'll do with Minitab Statistical Software. (If you'd like to follow along but don't already have it, please download the 30-day free trial of Minitab.)

Planning the Designed Experiment: Possible Factors and Levels

The catapult has 6 factors with three possible levels each. The factors are:

rubber band guide
arm stopper
starting point
rubber band
rubber band attachment on arm
projectile

The rubber band guide increases the distance the rubber band must travel and thereby stretches the rubber band. The rubber band guide can be set at one of three possible heights by moving the metal rod to the screw eye at the required height. The levels are the distance from the top of the base of the catapult to the center of the screw eye for the guide.

The arm stopper stops the catapult's arm after launch, and the levels are the distances from the beginning of the support connector to the center of the screw eye for the stopper.

The starting point is the lowest point at which the catapult arm can be pushed down to prior to launching the projectile. The levels are the distances from the top of the base to the middle of the screw eyes used for the starting points.

The rubber band levels are the diameters of the rubber bands, and the projectile levels are the weights of the bags of rice used as projectiles. The final factor is the point where the rubber band is attached to the catapult arm; the three levels are the distances from the end of the arm to the attachment point.

The table below shows the factors and their possible levels:

Factor              First level   Second level   Third level
Rubber band guide   4 cm          9 cm           14 cm
Arm stopper         5 cm          10 cm          15 cm
Starting point      3 cm          8 cm           13 cm
Rubber band         90 mm         100 mm         130 mm
Projectile          25 g          37.5 g         50 g
Attachment on arm   14 cm         18 cm          22 cm

Additional levels are possible. There is enough room on the catapult to add more levels to the rubber band guide, arm stopper and starting point. More rubber band sizes are also available, and the projectile weight is limited only by what the catapult can throw. For most DOE trials, six factors with three levels each should be sufficient. This results in 3^6 = 729 possible combinations. Adding an extra level to each factor would result in 4,096 combinations and a fifth level would result in 15,625 combinations. Doubling the number of levels to six would result in 46,656 different combinations.
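The combination counts are just the number of levels raised to the number of factors. A quick Python sketch (the function name is mine) reproduces them:

```python
def full_factorial_runs(levels, factors):
    """Number of treatment combinations when every factor has the same number of levels."""
    return levels ** factors

# Six factors, with three to six levels each.
for levels in (3, 4, 5, 6):
    print(f"{levels} levels: {full_factorial_runs(levels, 6):,} combinations")
```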

The catapult may be large enough to accommodate six levels for each of the six possible factors; however, DOE would be needed to find the optimal settings without running all possible combinations.

The DOE Experiment

I had to put the catapult to use after finishing with the assembly. The first experiment performed using the catapult was a 2^3 full factorial with three replicates. This means that three factors were evaluated, each at two levels, and each treatment was run three times. The factors and levels for the experiment were:

Projectile   Rubber band   Starting point
10 g         90 mm         3 cm
10 g         90 mm         13 cm
10 g         130 mm        3 cm
10 g         130 mm        13 cm
20 g         90 mm         3 cm
20 g         90 mm         13 cm
20 g         130 mm        3 cm
20 g         130 mm        13 cm

Each factor was run at a high and a low value. The high value is designated by “(2)” and the low value by “(1)” in my Minitab data sheet. The run order was randomized. This was essential to ensure that the results reflected the factors and levels evaluated, and not changes in the system as the testing progressed. For example, the rubber bands may stretch during testing. Randomization helps to ensure that the effects of such uncontrollable factors are spread across the test results, and Minitab Statistical Software's DOE tools provide a randomized run order by default.
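If you want to see what such a design looks like outside Minitab, here is a minimal Python sketch that lists the eight treatments, replicates each three times, and shuffles the run order. The dictionary keys, the function name, and the fixed seed are my own choices for illustration; Minitab's own randomization will give a different order.

```python
import itertools
import random

# Low and high levels for the three factors in the experiment.
factor_levels = {
    "projectile_g": (10, 20),
    "rubber_band_mm": (90, 130),
    "starting_point_cm": (3, 13),
}

def randomized_full_factorial(levels, replicates, seed=None):
    """Return every treatment combination, replicated and in random run order."""
    names = list(levels)
    treatments = [
        dict(zip(names, combo)) for combo in itertools.product(*levels.values())
    ]
    runs = [dict(treatment) for treatment in treatments for _ in range(replicates)]
    random.Random(seed).shuffle(runs)
    return runs

runs = randomized_full_factorial(factor_levels, replicates=3, seed=1)
for run_number, settings in enumerate(runs, start=1):
    print(run_number, settings)
```

This gives 2^3 x 3 = 24 runs in total, with each treatment appearing exactly three times.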

I performed the runs in the order determined by the software and recorded the results. Next, I used Minitab to generate a main effects plot and an interaction plot for the response variable, distance traveled by the projectile.

Main Effects Plot - Design of Experiments

Interaction Plot - Design of Experiments (DOE)

The results indicate that the interactions were not significant. However, the analysis results shown in the main effects plot indicate a large difference between the high and low settings for both rubber band and starting point. This makes sense, since the smaller rubber band should provide more force than the larger rubber band. The lower starting point for the catapult arm results in a greater distance of travel between the release point of the arm and the stopping point where the arm releases the projectile.

The effect of the projectile weight was not as important as the other factors. This may be due to the very light weight of the projectiles. The higher setting was twice the weight of the low setting; however, the lower setting was only the equivalent of approximately 4 quarters. Maybe a difference of a dollar’s worth of quarters is too little to result in a larger main effect for distance traveled? This could also be investigated using the catapult and Minitab...

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

↧

Evaluating a Gage Study With One Part

March 13, 2013, 5:00 am

≫ Next: Why Isn't This "Six Sigma" Project Improving Quality?

≪ Previous: Build a DIY Catapult for DOE (Design of Experiments), part 2

Recently, Minitab News featured an article that talked about how to perform a Gage R&R Study with only one part. This prompted many users to contact our technical support team with questions about next steps, like these:

What can I do with the output of a Gage study with only one part?
How can I use the variance component estimates to obtain meaningful information about my measurement system?

By themselves, the variance component estimates from the ANOVA output for a Gage study with just one part are not particularly useful. However, if we combine what we’ve learned about the variance for Repeatability and Reproducibility with some additional information, we can obtain a significant amount of information about our measurement system. We will also need to do some simple hand calculations.

Getting Meaningful Information about the Measurement System

So, what other information will we need to get meaningful information about our measurement system?

If we have a historical estimate of the standard deviation for the Part-to-Part variation, we can get an estimate of the total variation and %Contribution. From there, we can work through a few hand calculations to obtain most of the information we would have obtained had we done a full Gage R&R.

We can also obtain some useful information about our measurement system if we use the process tolerance. Using the tolerance, we can get an estimate of the %Tolerance for the total gage (which is interpreted using the same guidelines as %Study Variation).

The way to proceed with the estimates of the historical StDev or the Tolerance is best explained with examples. Let’s assume that we’ve conducted a Gage study with one part and three operators by following the instructions in the article referenced above.

Let’s also assume that the data are arranged in the worksheet as shown below; the estimated variance components from the session window output are also shown below. The reproducibility is 0.06278 (the variance component for Operator), and the repeatability is 0.01730 (the variance component for Error).

Using Historical Part-to-Part Standard Deviation

As a first example, let’s assume that the historical standard deviation for part to part is 1.0853.

The total variation is the sum of the variance components for the part-to-part variation, for repeatability and for reproducibility.

Since we have the standard deviation for part-to-part, the first step is to square the standard deviation to calculate the variance: 1.08530^2 = 1.17788

As the second step, we will sum the variance estimates for the three components: 1.17788 + 0.06278 + 0.01730 = 1.25796

Now that we have an estimate of the total variance, we can estimate the %Contribution for each of the three components by dividing individual components by the total and multiplying by 100:

%Contribution

Not bad! Most of our variation is attributable to the part-to-part variation, which is desirable in a good measurement system. What else can we learn?
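The hand calculation for %Contribution is easy to script. This minimal Python sketch uses the variance components and the historical part-to-part standard deviation given above; the variable names are my own.

```python
# Variance components: part-to-part from the historical StDev, the rest from the ANOVA output.
variance_components = {
    "Part-to-Part": 1.0853 ** 2,
    "Reproducibility": 0.06278,
    "Repeatability": 0.01730,
}

total_variance = sum(variance_components.values())

# %Contribution = component variance / total variance * 100
pct_contribution = {
    source: 100 * variance / total_variance
    for source, variance in variance_components.items()
}

for source, pct in pct_contribution.items():
    print(f"{source}: {pct:.2f}%")
```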

If we’d like to see a visual representation of how each of the operators measured the same part, we can create an individual scatterplot with groups to represent each of the operators, then add the mean symbol and mean connect line:

Measurement by operator

Here we can see that Operator A’s measures are higher than B’s, and B’s are higher than C’s. In the graph above, the vertical points for each operator represent the repeatability, and the measures across operators represent the reproducibility.

Using the same information, we can also obtain estimates of the StudyVar and %StudyVar. First, we’ll need the standard deviation of the variance components, and then we can calculate the Study Variation and % Study Variation as shown below:

%StudyVar

This is really useful information—if we arrange the table as above the result is a close match to what we could obtain from Minitab for a full Gage R&R.
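Here is a minimal Python sketch of the StudyVar and %StudyVar calculations. It assumes that Study Variation is 6 times the standard deviation (Minitab's default multiplier, as far as I know) and that the Total Gage R&R standard deviation is the square root of the summed repeatability and reproducibility variances. The variable names are mine.

```python
import math

variance_components = {
    "Repeatability": 0.01730,
    "Reproducibility": 0.06278,
    "Part-to-Part": 1.0853 ** 2,
}
# Variances add; standard deviations do not.
variance_components["Total Gage R&R"] = (
    variance_components["Repeatability"] + variance_components["Reproducibility"]
)
total_variance = (
    variance_components["Total Gage R&R"] + variance_components["Part-to-Part"]
)
variance_components["Total Variation"] = total_variance

STUDY_VAR_MULTIPLIER = 6  # assumption: the default multiplier

total_stdev = math.sqrt(total_variance)
study_table = {}
for source, variance in variance_components.items():
    stdev = math.sqrt(variance)
    study_table[source] = {
        "StDev": stdev,
        "StudyVar": STUDY_VAR_MULTIPLIER * stdev,
        "%StudyVar": 100 * stdev / total_stdev,
    }

for source, row in study_table.items():
    print(f"{source}: StDev={row['StDev']:.4f}, "
          f"StudyVar={row['StudyVar']:.4f}, %StudyVar={row['%StudyVar']:.2f}")
```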

Additionally, we can calculate the number of distinct categories. The number of distinct categories is calculated in Minitab as: (Standard deviation for Parts/Standard deviation for Gage)*1.41

In our example, the StDev for parts is 1.0853. The StDev for Gage comes from the combined repeatability and reproducibility variance, 0.01730 + 0.06278 = 0.08008, whose square root is 0.283, so we get: (1.0853/0.283)*1.41 = 5.41

In Minitab, the number of distinct categories is truncated, so if we’d analyzed our Gage R&R in Minitab for multiple parts, the number of distinct categories would be 5 for this example.
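The number of distinct categories can be sketched in Python as follows. This assumes the gage standard deviation is obtained by summing the repeatability and reproducibility variance components and then taking the square root (roughly 0.283 for the components quoted earlier); the function name is mine.

```python
import math

def distinct_categories(part_stdev, gage_stdev):
    """(StDev for parts / StDev for gage) * 1.41, truncated to a whole number."""
    return math.floor(1.41 * part_stdev / gage_stdev)

# Assumption: variance components are summed before taking the square root.
gage_stdev = math.sqrt(0.06278 + 0.01730)
ndc = distinct_categories(1.0853, gage_stdev)
print(f"gage StDev = {gage_stdev:.3f}, distinct categories = {ndc}")
```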

So, if we have an estimate of the historical standard deviation for the Part-to-Part variation we can get a lot of useful information about our measurement system. But what if we don’t have the historical StDev? What if we only have information about the tolerance?

Using the Process Tolerance

If we do not have a historical estimate of the part-to-part variation, we can still obtain some useful information using the process tolerance. We can calculate the %Tolerance for the total gage by dividing the StudyVar for each source by the tolerance. For our example, let’s say the tolerance is 8:

%Tolerance

Of course, the results of the StudyVar/Tolerance were multiplied by 100 to obtain the percentages.
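The %Tolerance calculation can be sketched the same way. As before, I am assuming a Study Variation of 6 times the standard deviation and a Total Gage R&R variance equal to the sum of the repeatability and reproducibility components; the names are my own.

```python
import math

TOLERANCE = 8
STUDY_VAR_MULTIPLIER = 6  # assumption: the default multiplier

gage_variances = {
    "Repeatability": 0.01730,
    "Reproducibility": 0.06278,
}
gage_variances["Total Gage R&R"] = sum(gage_variances.values())

# %Tolerance = StudyVar / tolerance * 100
pct_tolerance = {
    source: 100 * STUDY_VAR_MULTIPLIER * math.sqrt(variance) / TOLERANCE
    for source, variance in gage_variances.items()
}

for source, pct in pct_tolerance.items():
    print(f"{source}: {pct:.2f}% of tolerance")
```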

For more examples of Gage R&R using Minitab, see our articles on crossed and nested studies.

↧

Why Isn't This "Six Sigma" Project Improving Quality?

March 14, 2013, 5:00 am

≫ Next: Using Minitab to Choose the Best Ranking System in College Basketball

≪ Previous: Evaluating a Gage Study With One Part

If the train already wrecked, Six Sigma can't prevent it.

Whether you're a quality improvement veteran or you're just starting to do research about what quality improvement methods are available today, you've seen headlines and articles that explain why Six Sigma and other data-driven quality improvement methods don't work.

Typically these pieces have an attention-grabbing headline, like Six Sigma Initiative Fails to Save the Universe, followed by a dissection of a deployment or project that failed—usually in spectacular fashion—to achieve its goals.

"There!" the writer typically crows. "See? It's obvious Six Sigma doesn't work!" What makes these articles misleading and potentially dangerous is that there's almost always a kernel of truth to the story. Projects do fail. Deployments do go awry. Without continued monitoring, quality improvements do wane.

But none of that means you shouldn't pursue a quality strategy based on data analysis, whether you happen to call it Six Sigma or something else. It's a mistake to use an example of a failed project as a rationale for not doing data-driven quality improvement, because there are also countless stories of quality improvements that saved businesses millions of dollars, deployments that have transformed organizations for the better, and benefits that have been sustained for years.

Business improvement trends that don't work don't last long. Six Sigma principles have been used widely in businesses worldwide for more than 30 years. That's why even though articles have tried to sound the death knell for Six Sigma for at least 15 years, we're still talking about it (and doing it) today. Done correctly, it works.

An Example of Quality Improvement Craziness?

One recent article cites this plea for assistance, evidently lifted verbatim from some discussion forum, as a reason why Six Sigma is doomed to failure in the real world:

"...There is a Call Center process in which CTQ is Customer Response Time. If Customer Response Time is more than >60sec then its a defect. In this Case opportunity will be 1. Suppose there are 300 Calls and out of that 123 call Response time is >60 sec (This is defect) and opportunity is 1 then my Sigma value will be 1.727. This value came by this Formulae process Sigma = NORMSINV(1-((Total Defects) / (Total Opportunities))) + 1.5 and DPMO is 410000, this value came by this formulae Defects Per Million Opportunities (DPMO) = ((Total Defects) / (Total Opportunities)) * 1,000,000

Question is, Can anybody tell me from this case how can i get bell curve. How I should calculate USL and LSL value. How I will draw this Bell Curve in a excel. Waiting for all your master response."
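For what it's worth, the arithmetic in the quoted post does check out. Here is a minimal Python sketch of the two formulas it cites, using the standard library's inverse normal CDF in place of the spreadsheet's NORMSINV; the function names are mine.

```python
from statistics import NormalDist

def dpmo(defects, opportunities):
    """Defects per million opportunities."""
    return defects / opportunities * 1_000_000

def process_sigma(defects, opportunities, shift=1.5):
    """Process sigma with the conventional 1.5 shift, as in the quoted formula."""
    return NormalDist().inv_cdf(1 - defects / opportunities) + shift

# 123 slow responses out of 300 calls, one opportunity per call.
print(f"DPMO = {dpmo(123, 300):,.0f}")
print(f"Process sigma = {process_sigma(123, 300):.3f}")
```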

The author goes on to point out, and rightly so, that all this math isn't worth much without an explanation of "what can I—as a business person—do with it?" He continues:

Presented with information of this nature I would find it hard to glean anything useful in terms of process improvement to action. What’s more, turning a process into a set of calculations to eke the last drop of efficiency from it appears counterproductive.

He then goes on to state that, at some point, making incremental improvements will fail to yield significant returns (yeah...), that sometimes we need to scrap an entire process to achieve the greatest gains (sure, sure...), and that continuous improvement is about adaptation (no argument there...). The author wraps up by asserting that Six Sigma "is about the only thing that isn't" adapting. And since the core principles and practices of Six Sigma have remained consistent over the years, I can agree with that statement in its broadest sense.

Huh. The writer hasn't said anything I disagree with...but where's his evidence that "Six Sigma doesn't work in the real world"?

How to Make Sure Your Quality Improvement Projects Get Results

It's clear that the author is really arguing against the misapplication of data analysis to things that won't make any difference to quality—an idea that any quality practitioner should concur with. But what he describes is not Six Sigma: it's using math as a substitute for squid ink.

If you expend vast resources on an incremental process improvement project that doesn't have the potential for significant benefits, you're not really practicing quality improvement, let alone the typically more formal "Six Sigma." Moreover, the managers (or "champions") who would greenlight such a project aren't doing due diligence to make sure resources are being directed where they'll do the most good.

Here are some things that dedicated quality improvement practitioners do to make sure projects succeed:

They use a Project Prioritization Matrix or C&E Matrix to compare potential projects, looking at the total value for each project, the ease of completion, and other factors to determine which projects have the most value for the effort. They don't waste time on low-value projects.
They carefully examine the process they're trying to improve to determine meaningful metrics that will help the project team measure their success. They don't pick whatever numbers just happen to be available and attempt to wring meaning out of them.
To analyze data, they use professional statistical software that provides reliable output and, when necessary, can offer guidance about statistics and even interpret their results. They don't just toss some numbers into a spreadsheet and try to make it work.
When presenting their results to team members, process owners, executives, and other stakeholders, they use terms, concepts, and explanations that the audience will understand. They don't belabor a bunch of equations just because they can.
When making recommendations for improvements, they offer clearly defined actions and make sure the participants in the process understand the hows and whys. Then they consult with the team to make sure the improvements work, and make adaptations as necessary.

Successful quality improvement needs data analysis to succeed, but analyzing data aimlessly is not the same thing as doing data-driven quality improvement.

Similarly, asserting "Six Sigma doesn't work" because it's not always done properly is like saying "Pianos don't work" because they don't sound good if you play them with mittens on.

What do you think?

↧

Using Minitab to Choose the Best Ranking System in College Basketball

March 15, 2013, 7:46 am

Life is full of choices. Some are simple, such as what shirt to put on in the morning (although if you’re like me, it’s not so much of a “choice” as it is throwing on the first thing you grab out of the closet). And some choices are more complex. In the quality world, you might have to determine which distribution to choose for your capability analysis or which factor levels to use to bake the best cookie in a design of experiments. But all of these choices pale in comparison* to the most important decision you have to make each year: which college basketball teams to pick during March Madness!

*may not actually pale in comparison

When it comes to filling out your bracket, you have a lot of information at your disposal. Some people pick teams based on colors, others use mascots, and some even use the RPI! But being the statistics nerd that I am, I like to use ranking systems created by other statistics nerds. Two of the most popular rankings are the Pomeroy Ratings and the Jeff Sagarin Ratings. However, there are numerous others. ESPN has the BPI, Ken Massey has the Massey Ratings, professors at Georgia Tech created the LRMC Rankings, and the list goes on and on. With all these choices, we’d better perform a data analysis to determine which one is best!

Last year I created a regression model that calculates the probability one team has of beating the other given the rankings of the two teams. I’ll use this model to help determine which ranking system is the most accurate!
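This post doesn’t spell out the form of that model, so here is only a minimal sketch of one common shape for this kind of thing: a logistic curve on the gap between two teams’ ratings. The function name, the rating scale, and the slope of 0.1 are all assumptions made up for illustration, not the coefficients of the actual model.

```python
from math import exp


def win_probability(rating_a, rating_b, slope=0.1):
    """Chance that team A beats team B under a logistic curve on the rating gap.

    The slope is a made-up illustrative value, not a fitted coefficient.
    """
    return 1 / (1 + exp(-slope * (rating_a - rating_b)))


# Hypothetical ratings: evenly matched teams, then a 10-point gap.
print(f"Equal ratings: {win_probability(90, 90):.2f}")
print(f"10-point gap:  {win_probability(95, 85):.2f}")
```

Whatever the real coefficients are, a model of this shape gives evenly matched teams a 50% chance each and pushes the favorite's probability toward 100% as the gap grows.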

Now, it’s not as simple as seeing which ranking system predicts the most games correctly. Most of the time there is a clear favorite, and all of the ranking systems agree. What I’m more interested in is how accurate the system’s probability is. For example, if there are 10 games where the ranking system says the favorite should win 70% of the time, then we would actually expect 3 games where the favorite loses. So it’s not about trying to predict every game correctly; it’s about trying to accurately gauge the probability that one team has of beating another.
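To make the 10-game example concrete, here is a small sketch that treats each game as an independent coin flip weighted 70/30 toward the favorite. Independence is an assumption, and the function name is just for illustration.

```python
from math import comb


def upset_distribution(n_games, p_favorite):
    """Return P(exactly k upsets) for k = 0..n_games, assuming independent games."""
    p_upset = 1 - p_favorite
    return [
        comb(n_games, k) * p_upset**k * p_favorite ** (n_games - k)
        for k in range(n_games + 1)
    ]


n_games, p_favorite = 10, 0.70
dist = upset_distribution(n_games, p_favorite)
expected_upsets = n_games * (1 - p_favorite)

print(f"Expected upsets:     {expected_upsets:.1f}")
print(f"P(no upsets at all): {dist[0]:.3f}")
print(f"P(exactly 3 upsets): {dist[3]:.3f}")
```

The point is that a well-calibrated 70% favorite is supposed to lose sometimes. A system whose 70% favorites never lost would actually be underestimating them.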

Last Year’s Model

Last year I used the LRMC Rankings exclusively, and the predictions went quite well. The model picked 3 of the 4 Final Four teams, had Kentucky winning it all, and even showed that Lehigh had a legitimate shot at upsetting Duke (which they did)! And we knew the predictions would be accurate because we tested the model before the tournament.

Bar Chart of the Accuracy of the 2012 LRMC Rankings

This bar chart shows the predicted and observed probabilities based on last year's LRMC rankings (based on 1,045 games before the tournament). For example, the bar on the far left shows that for all the games in which the LRMC Rankings said the favorite would win between 50% and 59% of the time (there were 190), the favorite won 54.2% of the time. The red dot represents the average probability predicted by the model. In this case it was 55%. So in the 190 games in this category, the model said the favorite would win 55% of them on average, and the teams actually won 54.2%. That’s pretty accurate!
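The grouping behind a chart like this can be sketched in a few lines. The code below is a minimal illustration, not the worksheet used for the post: the handful of games is invented, and each game is just a pair of (predicted probability for the favorite, whether the favorite won).

```python
from collections import defaultdict


def calibration_table(games):
    """Group games into 10-point probability bins.

    games: iterable of (predicted_prob_favorite, favorite_won) pairs.
    Returns {bin_label: (n_games, mean_predicted, observed_win_rate)}.
    """
    bins = defaultdict(list)
    for prob, won in games:
        pct = int(round(prob * 100, 6))      # guard against float noise like 56.999...
        lower = min(pct // 10 * 10, 90)      # 50-59 -> 50, ..., 90-100 -> 90
        bins[lower].append((prob, won))

    table = {}
    for lower in sorted(bins):
        rows = bins[lower]
        n = len(rows)
        mean_predicted = sum(p for p, _ in rows) / n
        observed = sum(1 for _, w in rows if w) / n
        table[f"{lower} to {lower + 9}"] = (n, mean_predicted, observed)
    return table


# Hypothetical games, invented purely for illustration.
games = [
    (0.55, True), (0.52, False), (0.58, True), (0.57, False),
    (0.64, True), (0.66, True), (0.61, False),
    (0.93, True), (0.95, True),
]
table = calibration_table(games)
for label, (n, mean_predicted, observed) in table.items():
    print(f"{label}: n={n}, predicted={mean_predicted:.3f}, observed={observed:.3f}")
```

In the chart's terms, the observed rate is the bar and the mean predicted probability is the red dot for each group.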

As you look at the other groups, you’ll see that the observed probabilities (bars) and predicted probabilities (red dots) are fairly close. So last year there were no worries about using the LRMC Rankings to make predictions in the NCAA tournament. But let’s not rest on our laurels! Instead of blindly using the LRMC Rankings again, let’s be good statisticians and ensure that they’re just as good this year. I’ll make another bar chart for the LRMC rankings using data from this year.

(You can get the worksheet with the data in it here. I've put the summarized data in the first few columns, and the raw data afterwards. Be warned, it's large!)

This Year’s Model

Bar Chart of the Accuracy of the 2013 LRMC Rankings

Hmmmm, the red dots don’t seem to be as close to the bars as they were last year. In fact, in games where the LRMC rankings say the favorite should be winning 55% of the time, they’re barely winning half of those games (50.3%)!

So let’s visualize the difference between the two years by plotting the difference between the observed and predicted probabilities for each probability group. More accurate ranking systems will have a smaller difference, so we’re looking for small bars on the next chart.

Comparing the 2012 and 2013 LRMC Rankings

We see that the 2012 LRMC Rankings are superior in 4 of the 5 groups, only losing out in the “60 to 69” group. We also see that in the "90 to 99" group, the 2013 LRMC rankings are off by over 5%!

Now before we start to panic, I should point out that the data from this year are based on fewer games (678 as opposed to 1,045). A smaller sample size leaves more room for variation. Still, it seems unsettling that the results are so different from last year's. Perhaps we should use a different ranking system! Luckily, as I was tracking the 678 games for the LRMC rankings, I also recorded the rankings from 3 other systems:
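To get a feel for how much the sample size matters, here is a rough sketch using the standard error of a proportion. The 190-game group and the 55% prediction come from the 2012 chart; the 2013 group size isn't given, so the code scales 190 by the ratio of total games, which is purely an assumption.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of an observed win rate over n independent games with true rate p."""
    return sqrt(p * (1 - p) / n)


n_2012 = 190                              # group size quoted for the 2012 chart
n_2013 = round(n_2012 * 678 / 1045)       # hypothetical: same share of a smaller season

se_2012 = standard_error(0.55, n_2012)
se_2013 = standard_error(0.55, n_2013)

print(f"2012: n={n_2012}, standard error = {se_2012:.1%}")
print(f"2013: n={n_2013}, standard error = {se_2013:.1%}")
```

Under that assumed group size, a gap of a few percentage points between observed and predicted is roughly one standard error, so some of this year's wobble could be noise rather than a worse ranking system.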

Ken Pomeroy’s College Basketball Ratings
The Massey Ratings (I used the Power Rating, as they are meant for prediction)
Jeff Sagarin NCAA Basketball Ratings (again, using the PREDICTOR rankings, as they are meant for prediction)

I would have liked to compare more, but it takes a while to gather all the data. Alas, I had to limit myself to 4. But the good news is that we can now compare all 4 ranking systems on the exact same games. Just like the last bar chart, I’ll compare the difference between the predicted and observed probabilities. May the best ranking system win!

Comparing the Ranking Systems

And we have a winner! The Sagarin Ratings have the smallest difference in 2 of the 5 groups, and are just about equal to the smallest difference in 2 others. In addition, 4 of the 5 observed probabilities differ from the predicted probabilities by 1.1% or less! And the 5th group differs by only 2%, which isn't bad at all. The total difference for the Sagarin Ratings across all 5 groups is 4.6%. No other ranking system comes close to that. And in fact, both the LRMC Rankings and Massey Ratings are off by more than 4.6% in a single group!
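The "total difference" score is just the absolute gaps between observed and predicted added up over the five groups. Here is a minimal sketch of that comparison. The two systems and all of their numbers are hypothetical stand-ins, since the per-group values behind the chart aren't listed in the post.

```python
def total_calibration_gap(predicted, observed):
    """Sum of |observed - predicted| across probability groups, in percentage points."""
    return sum(abs(o - p) for p, o in zip(predicted, observed))


# Hypothetical per-group values (percent). These are NOT the post's data.
predicted = [55.0, 65.0, 75.0, 85.0, 95.0]
observed_by_system = {
    "System A": [54.0, 66.0, 74.5, 83.0, 95.5],
    "System B": [50.0, 66.5, 78.0, 84.0, 90.0],
}

scores = {
    name: total_calibration_gap(predicted, observed)
    for name, observed in observed_by_system.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: total gap = {score:.1f} points")
print(f"Smallest total gap: {best}")
```

In a real comparison each system would have its own average predicted probability per group. A single shared list is used here only to keep the sketch short.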

So now let’s come full circle and plot the observed and predicted probabilities of the favorite winning using the Sagarin ratings.

Bar Chart of the Accuracy of the 2013 Sagarin Ratings

The red dots are a lot closer to the top of the bars here. In fact, the difference between them is less than it was for the 2012 LRMC Rankings! I think we’ve found the rankings that we’re going to use for this year’s NCAA tournament. Make sure to come back Monday, as I’ll use the regression model and the Sagarin Ratings to break down the entire tournament. Until then, let’s look at the current top 5 teams according to the Sagarin Ratings.

#5 Gonzaga

No, they’re not #1 like they are in the human polls, but the Sagarin rankings still have Gonzaga as one of the best teams in the country. And despite their #1 ranking in the human polls, a lot of people don’t believe in the Zags because they play in a weak conference. And while this is true, they are absolutely dominating that weak conference, to the tune of 0.3 points per possession. If you’re not familiar with efficiency numbers, that’s a lot. In fact, it’s a bigger per-possession margin than the 2007-08 Memphis Tigers had over their weak conference. And that Memphis team came within a Mario Chalmers 3-pointer of winning the national championship. So while I’m not saying Gonzaga is destined for the championship game, I am saying that thinking they’re overrated because they play in a weak conference is folly.

#4 Duke

This Duke team is going to be a hard one to judge. The main problem with the Sagarin Ratings (or any computer-based ranking system) is that they can’t account for injuries. So that means about half of the data for Duke comes from games played without Ryan Kelly. Even so, they still are ranked #4, showing that they are a great team. But how much higher would they be if Kelly had been healthy all season? Most people are assuming they’d be #1. And yes, they did look mighty impressive with Kelly in their win at North Carolina. But before that, they only beat Miami by 3 points at home (even with 36 points from Kelly) and struggled for a half against a bad Virginia Tech team. A few more games in the ACC tournament will definitely be helpful to try and gauge just how good this team is.

#3 Louisville

A three-game skid in January made everybody jump off the Louisville bandwagon. But now it seems everybody is hopping back on as the Cardinals have won 11 of their last 12 games. And really, they should never have jumped off. This Louisville team definitely has the talent to return to the Final Four this year. They play suffocating defense and have the best player that nobody seems to be talking about in Russ Smith. In fact, Smith is so good that he’s currently the leading candidate for Ken Pomeroy’s Player of the Year. This will be a dangerous team in the tournament.

#2 Indiana

It shouldn’t be much of a surprise to see the best team in the Big 10 ranked so high. In a year when college basketball might have its lowest-scoring season since 1952, Indiana has done its best to boost scoring, running the most efficient offense in college basketball. We should all be rooting for Indiana to go far in the tournament, if for no other reason than to make sure we don’t get stuck watching a 48-44 Final Four game. With their high-scoring offense, Indiana games are anything but boring.

#1 Florida

Yup, you read that right. The Sagarin Ratings (and most other computer ranking systems) have the Gators at #1. The reason is that Florida just blows everybody out. And when they haven’t blown people out, they’ve been unlucky, going 0-4 in games decided by 2 possessions or less. This has led to numerous sports analysts claiming that Florida “doesn’t know how to win close games” and is “unclutch.” But the statistics show that the outcome in close games has more to do with randomness (luck) than it has to do with either team being clutch or unclutch. For example, in 2011 Kentucky went 1-5 in games decided by 1 possession or less. People again said they didn’t know how to win in the clutch. But all that really happened was Kentucky’s unluckiness created a very underrated 4 seed that ended up going to the Final Four. And in the tournament, they won not just one, but two 1-possession games! How did they all of a sudden learn to be clutch!?!?!?!?!?!

At the moment Florida is projected to get a 2 or even a 3 seed. If that happens, it’s going to create a very unhappy 1 seed for whoever gets put in their region.

↧

Predicting the 2013 NCAA Tournament with Minitab

March 18, 2013, 10:38 am

Did everybody have a good Selection Sunday? Hopefully you did, and now you’re ready to jump into the brackets. And just like last year, Minitab is here to help you along! But first we have to wait for ESPN to stop yelling about who the 68th best team in the country is. I mean, honestly, you’d think they would have learned their lesson two years ago when they were adamant about what a travesty it was that VCU got an at-large bid. You know, the same VCU team that then went to the Final Four. Maybe next year Virginia should try not losing to Delaware at home. Oh, what’s that? Dick Vitale finally stopped complaining? Okay, we’re good to go!

In case you’re wondering where the following numbers come from, last year I spent not one, but two blog posts creating a regression model that calculates the probability of Team 1 beating Team 2 given the two teams' rankings. Of course, the model is only as good as the data you put into it. So last week I determined which NCAA ranking system is the most accurate. The winner was the Jeff Sagarin NCAA basketball ratings. Now that all the hard statistical work is behind us, let’s break down the bracket!

First Four

Team 1 | Team 2 | Favorite
North Carolina A&T | Liberty | North Carolina A&T (57%)
Long Island | James Madison | Long Island (51%)
St. Mary’s | Middle Tennessee St | St. Mary’s (58%)
La Salle | Boise St | La Salle (52%)

It looks like we’re set for a lot of close games in Dayton. The 16 seeds are playing for the right to get slaughtered by a 1 seed. But the other teams are playing for more. In each of the last two years, a team playing in the First Four has gone on to beat their next opponent (VCU in 2011 and South Florida in 2012). This year the best candidate is St. Mary’s. The Sagarin Ratings have St. Mary’s ranked at 28, meaning they are severely underseeded. And keep in mind this team hasn’t lost to a team not named “Gonzaga” since December. If they can get past Middle Tennessee St, they’ll have a great shot against Memphis in the next round.

Midwest: First Round

Higher Seed | Lower Seed | Favorite
Louisville | North Carolina A&T | Louisville (98%)
Colorado St | Missouri | Missouri (65%)
Oklahoma St | Oregon | Oklahoma St (67%)
St. Louis | New Mexico St | St. Louis (78%)
Memphis | St. Mary’s | St. Mary’s (53%)
Michigan St | Valparaiso | Michigan St (80%)
Creighton | Cincinnati | Creighton (64%)
Duke | Albany | Duke (93%)

In these tables, I’ll highlight cases where the statistics say that the lower seed should actually be favored over the higher seed (although I’m not counting games between the 8-9 seeds). This is why St. Mary’s is in red above. If they can beat Middle Tennessee St, the model actually likes them to beat Memphis, too. Memphis would be a slight favorite over Middle Tennessee St (55%), but that game would be close. Either way, Memphis will be on upset alert from the outset. Don’t pick them to go far in your bracket.

There doesn’t look to be much in the way of upsets after St. Mary’s, though. Oregon has a 1-in-3 chance of beating Oklahoma St, so if you must pick another upset here, that’s your best bet. Valparaiso is actually a very strong 14 seed. However, they were unlucky in being paired with Michigan St, who is a very strong 3 seed. That severely hurt Valpo’s chances of pulling the upset.

Missouri is actually a pretty big favorite over the 8 seed Colorado St. This is because the Sagarin Ratings have the Tigers ranked #18, which is pretty high for a 9 seed. Because everybody in your pool will most likely be split 50/50 on this game (since it’s an 8/9 game), pick the Tigers and gain an edge over the brackets that picked Colorado St. And as you’ll see, this will be a common theme in every 8/9 game.

Creighton is in the same boat as Missouri. They are a 7 seed, but the Sagarin Ratings have them as the 17th best team in the country. They have a good shot at advancing in the tournament.

Midwest: Future Rounds

Higher Seed | Lower Seed | Favorite
Louisville | Missouri | Louisville (80%)
St. Louis | Oklahoma St | St. Louis (52%)
Michigan St | Memphis | Michigan St (70%)
Duke | Creighton | Duke (63%)
Louisville | St. Louis | Louisville (80%)
Duke | Michigan St | Duke (53%)
Louisville | Duke | Louisville (72%)
Duke | St. Louis | Duke (62%)

The final Sagarin Ratings before the tournament have Louisville as the #1 team, so you’re going to see them heavily favored in all of these games. However, don’t let that fool you into thinking they are a lock for the Final Four. Missouri is actually a very underrated 9 seed that can challenge Louisville. St. Louis and Oklahoma St are both quality teams, and Duke and Michigan St are the 2nd strongest 2 and 3 seeds in the field. If Louisville has to play Missouri, St. Louis, and Duke, their chances of going to the Final Four are: .98 * .80 * .80 * .72 = 45%. So over half of the time, we would expect them not to make the Final Four. Kentucky got a much more favorable draw last year as the overall 1 seed.
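That multiplication is easy to reproduce. The sketch below uses the four probabilities from the tables above and assumes the games are independent, which is also what the hand calculation assumes. The dictionary is just a convenient way to label each step.

```python
from math import prod

# Louisville's win probability in each round, taken from the tables above,
# for one particular path: North Carolina A&T, Missouri, St. Louis, Duke.
louisville_path = {
    "North Carolina A&T": 0.98,
    "Missouri": 0.80,
    "St. Louis": 0.80,
    "Duke": 0.72,
}

# Chance of winning all four games, assuming independence between rounds.
p_final_four = prod(louisville_path.values())
print(f"Chance of reaching the Final Four on this path: {p_final_four:.0%}")
```

Swap in a different opponent (say, Michigan St instead of Duke) and the answer changes, so the 45% is specific to this one route through the bracket.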

If St. Louis and Oklahoma St meet in the 2nd round, that game will pretty much be a coin flip. But we saw in the first round that Oklahoma St has a better chance of being upset. So the smart thing to do here would be to take St. Louis into the Sweet Sixteen. Or, if you’re like my wife, you could decide that St. Louis’s mascot is hilarious and awesome, and use that as your reasoning to take them to the Final Four.

Michigan St will most likely have an easier 2nd round game than Duke. They would be a heavy favorite over Memphis, and if they meet St. Mary’s in this game, they’d still win 67% of the time. Duke, on the other hand, has a totally losable game against Creighton (should the two meet). Creighton has one of the highest-scoring offenses in the country. If they’re hitting their shots, they could give Duke an early exit.

Overall, Louisville is the favorite to win the Midwest. But their path won’t be easy. Duke and Michigan St are both good enough to get to the Final Four, and St. Louis and Oklahoma St aren’t terrible long shots.

West: First Round

Higher Seed | Lower Seed | Favorite
Gonzaga | Southern | Gonzaga (96%)
Pittsburgh | Wichita St | Pittsburgh (71%)
Wisconsin | Ole Miss | Wisconsin (70%)
Kansas St | La Salle | Kansas St (67%)
Arizona | Belmont | Arizona (65%)
New Mexico | Harvard | New Mexico (79%)
Notre Dame | Iowa St | Notre Dame (56%)
Ohio St | Iona | Ohio St (88%)

At first glance it doesn’t appear that there are any good chances for upsets in the first round here. Not counting Notre Dame (who is a 7 seed), all of the higher seeds have at least a 65% chance of advancing. However, there is another way to look at it. What is the probability of all of the 1 through 6 seeds advancing? The answer is: .96 * .70 * .67 * .65 * .79 * .88 = 20%. So there is an 80% chance that at least one upset happens here. It’s trying to decide where it will occur that’s the hard part!
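Here is the same calculation as a short sketch, along with the expected number of upsets among those six games. The probabilities are the ones from the table above, and independence between games is assumed.

```python
from math import prod

# First-round win probabilities for the 1 through 6 seeds in the West (from the table).
favorites = {
    "Gonzaga": 0.96,
    "Ohio St": 0.88,
    "New Mexico": 0.79,
    "Kansas St": 0.67,
    "Wisconsin": 0.70,
    "Arizona": 0.65,
}

p_all_advance = prod(favorites.values())
p_at_least_one_upset = 1 - p_all_advance
expected_upsets = sum(1 - p for p in favorites.values())

print(f"All six favorites advance: {p_all_advance:.0%}")
print(f"At least one upset:        {p_at_least_one_upset:.0%}")
print(f"Expected number of upsets: {expected_upsets:.2f}")
```

So even though no single game looks like a great upset pick, a bracket with zero upsets among these six games is the unlikely outcome.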

Of all the underdogs, Ole Miss is ranked the highest (31), according to Sagarin. Unfortunately, Wisconsin is the strongest 5 seed in the field, ranked 8th! The next best underdog is Belmont, ranked 46th. The statistics don’t like Belmont quite as much as they have the last two years (they were favored against Georgetown last year), but maybe that means they're due to finally pull an upset in the tournament. La Salle has a puncher's chance against Kansas St (Boise St’s chances would be about the same) because the Wildcats are the most overrated 4 seed in the tournament (ranked 26th). But La Salle is ranked 57 and Boise St is 63! That doesn’t instill much confidence in an upset!

West: Future Rounds

Higher Seed | Lower Seed | Favorite
Gonzaga | Pitt | Gonzaga (59%)
Wisconsin | Kansas St | Wisconsin (67%)
New Mexico | Arizona | Arizona (52%)
Ohio St | Notre Dame | Ohio St (70%)
Gonzaga | Wisconsin | Gonzaga (56%)
Ohio St | Arizona | Ohio St (66%)
Gonzaga | Ohio St | Gonzaga (52%)
Ohio St | Wisconsin | Ohio St (54%)

Poor Gonzaga! This is probably their best team ever, and they got one of the worst draws a 1 seed has ever had. If you look at the probabilities, you’ll see that they’re favored in every possible matchup. But the probabilities are all in the 50s! Pittsburgh is ranked #10 in the Sagarin Ratings, which means they are probably the best 8 seed in the history of the tournament. Wisconsin is ranked 8th, making them the highest ranked 5 seed in the field, and Ohio St is 6th, making them the highest ranked 2 seed. If Gonzaga has to play each of those teams, their chances of making the Final Four are only 16%!

Ohio St actually has a better chance as the 2 seed. While Gonzaga, Pittsburgh, and Wisconsin are beating each other up in the upper part of the region, Ohio St has a pretty easy route through the bottom. They’ll be heavily favored over either New Mexico or Arizona should they play one of them in the Sweet Sixteen. And while they would be a small underdog to Gonzaga in the regional finals, there is a good chance Gonzaga never even gets there. Don’t be surprised if there is a rematch of the Big 10 championship game between Ohio St and Wisconsin in the regional finals here.
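To put rough numbers on that comparison, the sketch below multiplies out one plausible path for each team using the probabilities in the West tables. The specific opponents are assumptions (other matchups are possible), and Ohio St's 48% against Gonzaga is simply 100% minus Gonzaga's 52%.

```python
from math import prod


def path_probability(win_probs):
    """Chance of winning every game on a path, assuming independent games."""
    return prod(win_probs)


# Gonzaga: Southern, Pitt, Wisconsin, Ohio St
gonzaga = path_probability([0.96, 0.59, 0.56, 0.52])

# Ohio St: Iona, Notre Dame, Arizona, then either Wisconsin or Gonzaga
ohio_st_vs_wisconsin = path_probability([0.88, 0.70, 0.66, 0.54])
ohio_st_vs_gonzaga = path_probability([0.88, 0.70, 0.66, 1 - 0.52])

print(f"Gonzaga:                           {gonzaga:.1%}")
print(f"Ohio St (Wisconsin in the final):  {ohio_st_vs_wisconsin:.1%}")
print(f"Ohio St (Gonzaga in the final):    {ohio_st_vs_gonzaga:.1%}")
```

On these particular paths, Ohio St comes out ahead of Gonzaga whichever of the two opponents shows up in the regional final.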

One last thing about this region. The statistics say that Arizona would actually be favored in a second round game against New Mexico. But before you pick the upset, remember that Arizona is more likely to lose in the first round. Since they have an easier first round opponent, New Mexico is actually the safer pick here.
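One way to see this is to work out each team’s chance of reaching the Sweet Sixteen. The sketch below uses the numbers from the tables for the first-round games and the head-to-head matchup. The chance of beating the other side’s first-round underdog isn’t given anywhere, so `q` is a hypothetical placeholder, set to 0.70 for both teams.

```python
def sweet_sixteen_chance(p_round1, p_rival_round1, p_beat_rival, q):
    """Chance of winning two games when the second opponent is either the rival
    favorite or the underdog that knocked the rival out (independent games)."""
    p_round2 = p_rival_round1 * p_beat_rival + (1 - p_rival_round1) * q
    return p_round1 * p_round2


q = 0.70  # hypothetical: chance of beating the other side's first-round underdog

# New Mexico: 79% in round one, faces Arizona 65% of the time, beats them 48% of the time.
new_mexico = sweet_sixteen_chance(0.79, 0.65, 0.48, q)
# Arizona: 65% in round one, faces New Mexico 79% of the time, beats them 52% of the time.
arizona = sweet_sixteen_chance(0.65, 0.79, 0.52, q)

print(f"New Mexico reaches the Sweet Sixteen: {new_mexico:.0%}")
print(f"Arizona reaches the Sweet Sixteen:    {arizona:.0%}")
```

With these inputs New Mexico comes out ahead, about 44% to 36%, even though Arizona wins the head-to-head game more often. The exact figures depend on the made-up `q`, but the ordering holds for any reasonably large value of it.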

South: First Round

Higher Seed | Lower Seed | Favorite
Kansas | Western Kentucky | Kansas (96%)
North Carolina | Villanova | North Carolina (63%)
VCU | Akron | VCU (70%)
Michigan | South Dakota St | Michigan (83%)
UCLA | Minnesota | Minnesota (58%)
Florida | Northwestern St | Florida (95%)
San Diego St | Oklahoma | San Diego St (55%)
Georgetown | Florida Gulf Coast | Georgetown (87%)

The first game that jumps out here is the 6-11 game between UCLA and Minnesota. The statistics say that Minnesota should actually be favored as the 11 seed. And keep in mind that the statistics don’t know that UCLA guard Jordan Adams broke his foot last Friday and won’t be playing in this game. So Minnesota’s chances are actually even better than listed here. This is an upset you should pick.

Speaking of players missing the tournament, Akron was a team I was watching all season as a potential Cinderella. At one point they had won 19 straight games and were dominating the MAC. But then their starting point guard, Alex Abreu, was arrested on drug charges and suspended from the team. Akron was still able to win their conference tournament without him, but now they face a VCU team that full court presses for the entire game. It’s a terrible matchup for a team that just lost their point guard. So the statistics here actually underestimate VCU’s chances of winning. I would avoid picking Akron.

South: Future Rounds

Higher Seed | Lower Seed | Favorite
Kansas | North Carolina | Kansas (72%)
Michigan | VCU | Michigan (59%)
Florida | Minnesota | Florida (81%)
Georgetown | San Diego St | Georgetown (66%)
Kansas | Michigan | Kansas (63%)
Georgetown | Florida | Florida (72%)
Kansas | Florida | Florida (58%)
Kansas | Georgetown | Kansas (64%)

If you’re looking for a lower seed to pick for the Final Four, the South is your region. The Sagarin Ratings have Florida as the #2 team, so the model will favor them in every possible matchup. The reason they are ranked #2 with so many losses is that Florida is 0-5 in games decided by 2 possessions or less. They're a very talented team that got unlucky in a handful of games. If that luck changes in the tournament, they will be headed to the Final Four.

If you don’t want to pick Florida, Kansas is the next best bet. Sagarin has them ranked 4th, so they are definitely good enough to win this region. They will face some tough teams, though. One of them will be the winner of Michigan and VCU. Assuming they both win their first game, the winner of that game will definitely be able to give Kansas a run for their money.

One team I would stay away from is Georgetown. They are ranked 12th, so they’re slightly overrated as a 2 seed. And they’ll most likely have to go through both Florida and Kansas to get to the Final Four, where they’d be heavy underdogs in both games. They would even be an underdog to Michigan, and only a slight favorite to VCU. And if that weren’t bad enough, San Diego St has a 1 in 3 shot of beating them in the second round!

So your best bet here is to pick either Florida or Kansas. If you really want a long shot, either Michigan or VCU could make a run.

East: First Round

Higher Seed | Lower Seed | Favorite
Indiana | Long Island | Indiana (95%)
NC State | Temple | NC State (67%)
UNLV | California | UNLV (58%)
Syracuse | Montana | Syracuse (88%)
Butler | Bucknell | Butler (58%)
Marquette | Davidson | Marquette (68%)
Illinois | Colorado | Illinois (54%)
Miami | Pacific | Miami (87%)

And we’ve saved the best region for last! Now, the teams I highlighted in the table are all the higher seeds, but I wanted to point them out because the probabilities are all low. California has the best chance of any 12 seed of beating a 5 (and that happens every year!). And Butler has a mere 58% chance of beating Bucknell. Marquette’s probability is higher, but a 14 seed getting a 1 in 3 shot to beat a 3 seed is pretty big. If you want to pick a region to go crazy, this is where you want to do it.

East: Future Rounds

Higher Seed | Lower Seed | Favorite
Indiana | NC State | Indiana (76%)
Syracuse | UNLV | Syracuse (69%)
Marquette | Butler | Marquette (65%)
Miami | Illinois | Miami (68%)
Indiana | Syracuse | Indiana (69%)
Miami | Marquette | Miami (58%)
Indiana | Miami | Indiana (70%)
Miami | Syracuse | Syracuse (51%)

This region is Indiana’s for the taking. Ranked 3rd in the Sagarin Ratings, they have the easiest path to the Final Four. No other team in their region is ranked in the top 10, with Syracuse being the highest at 13. So a possible matchup between Indiana and Syracuse would be the hardest game Indiana would have to play, and we see that they would win 69% of the time. Compare that to Louisville, whose second-round game might be against a team ranked 15!

There was a lot of talk about Miami not getting a 1 seed during Selection Sunday. But honestly, they should be happy they got a 2! Despite being the weakest 2 seed in the field, they are favored in each potential game before the regional final. That’s because Marquette is the weakest 3 seed, and Butler is a very weak 6. So again, if you want to go crazy with upsets somewhere, the bottom half of this region is where you want to do it. And just to let you know, Davidson would be only about a 1-point underdog to Butler in a potential second round game, and they would be favored against Bucknell.

Final Four

Higher Seed | Lower Seed | Favorite
Louisville | Gonzaga | Louisville (68%)
Louisville | Ohio St | Louisville (70%)
Gonzaga | Duke | Gonzaga (54%)
Ohio St | Duke | Ohio St (52%)
Indiana | Kansas | Indiana (54%)
Indiana | Florida | Florida (55%)
Kansas | Miami | Kansas (67%)
Miami | Florida | Florida (74%)
Louisville | Kansas | Louisville (65%)
Louisville | Florida | Louisville (58%)
Louisville | Indiana | Louisville (62%)

Being the #1 ranked team, Louisville is obviously the favorite in every matchup here. I’ve seen a lot of experts pick Miami to get to the Final Four. If they do, they’ll be big underdogs if they have to face either Florida or Kansas. All of the other matchups are basically coin flips. So who to pick? Here is my thinking:

Nobody has a region full of easier teams than Indiana. This doesn’t guarantee them anything, but I like their chances of getting to the Final Four more than anybody else’s. They would be underdogs to Florida and Louisville, but there is no guarantee they will have to play either of those teams. And they would be favored against everybody else. And from a non-statistical standpoint, did you know that Tom Crean is married to the sister of John and Jim Harbaugh? You know, the same two coaches that just squared off in the Super Bowl. This may very well be the year of the Harbaughs! You know CBS is dying to play up that story if Indiana gets to the Final Four. The stats like Indiana, and I think karma will too. Look for the year of the Harbaughs to continue as Indiana cuts down the nets in Atlanta!

My Bracket

↧

Rethinking the Obvious: How Data Analysis and Diagrams Can Upend Conventional Wisdom

March 19, 2013, 3:33 am

Has it happened to you?

You organize a brainstorming session to begin analyzing your process.

At the kick-off meeting, several people sit with arms crossed, lips pursed, eyes cast downward. Frequently, they’re the ones who’ve worked at the process for most of their professional lives.

“Here we go again. Wasting time to prove the obvious,” their faces say. “I’ve done my job for years. You’re not going to show me anything I don’t already know.”

Yet you bravely push forward. Every now and then you see someone roll their eyes. “When can I get back to my desk and do some real work?!!!” they seem to implore.

It’s a bit daunting to face down smart, experienced skeptics, because their in-depth, hands-on knowledge (which they’ve painstakingly pieced together over the years) is indeed impressive, profound, and invaluable. More often than not, they know a lot more than you do.

And make no mistake—you want (and desperately need) them on your team!

However, acquired knowledge is a tricky thing. It enlightens us much of the time. But it can also deceive us when we least suspect it.

It's no wonder, then, that human history is replete with examples of how even the most logical and seasoned reasoning of experts can be dead wrong.

A Classic Example: A Simple Diagram Reveals the Transmission Vector for Cholera

London, August 31, 1854.

A major outbreak of cholera suddenly erupts in the city. There had been other outbreaks before, but this one was particularly fast and deadly. In just 10 days, more than 500 people were dead. And the mortality rate quickly shot up to over 12% in some areas.

The outbreak was concentrated along the Thames, in low-lying areas inhabited mainly by the poor. Public health experts at the time knew exactly what the problem was: reeking fumes from open cesspools were mingling with the ever-present fog to create a highly noxious vapor (miasma) that caused cholera.

The theory made perfect sense. And it was believed by most highly intelligent, experienced people at the time. So why would anyone question it?

Thankfully, one physician did.

Instead of jumping on the miasma bandwagon, Dr. John Snow decided to take a hard look at the numbers.

First, he gathered weekly statistics on the cholera deaths (compiled by demographer William Farr). To visualize the results, Snow plotted the location where each victim had lived on a map. He also plotted the locations of the city water pumps.

Next, Snow employed a graphical tool that had been around since the 1600s and is now called a Voronoi diagram. He drew a cell around each data point representing a water pump by constructing a series of connecting line segments. Each segment ran along the perpendicular bisector of the line joining that data point (pump location) and a neighboring data point (pump location).

This results in a honeycomb pattern, in which all the points within each cell are closer to the central data point than to any other data points outside of the cell.

voronoi diagram

Snow adjusted his diagram to account for human walking routes, which were not always the straightest distance between points.

When finished, the visual diagram provided a "Eureka!" moment—one that turned the conventional wisdom about cholera transmission on its head. The diagram revealed that nearly all of the cholera victims lived within the cell defined by a single city water pump, located on Broad Street.
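The idea behind the cells can be sketched without drawing anything: a location belongs to a pump's cell exactly when that pump is the nearest one. The short example below tallies cases per cell this way. The pump names, coordinates, and case locations are all hypothetical, and it uses straight-line distance rather than walking routes.

```python
from collections import Counter
from math import hypot


def nearest_site(point, sites):
    """Return the name of the site closest to point by straight-line distance.

    Assigning every point to its nearest site is what defines Voronoi cells.
    """
    return min(
        sites,
        key=lambda name: hypot(point[0] - sites[name][0], point[1] - sites[name][1]),
    )


# Hypothetical coordinates, not Snow's actual map data.
pumps = {"Pump A": (0.0, 0.0), "Pump B": (4.0, 0.0), "Pump C": (2.0, 4.0)}
cases = [(0.5, 0.5), (1.0, -0.5), (0.2, 1.0), (3.5, 0.5), (1.5, 1.0), (2.0, 3.0)]

cases_per_cell = Counter(nearest_site(case, pumps) for case in cases)
for pump, count in cases_per_cell.most_common():
    print(f"{pump}: {count} cases")
```

To mimic the walking-route adjustment, the straight-line `hypot` distance would be replaced with a walking distance along the streets, and the rest of the tally would stay the same.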

Although the diagram itself didn’t prove that cholera was transmitted via water rather than air, it opened the door to subsequent research that eventually did demonstrate this transmission vector. And it showed that even a fairly simple visual diagram of basic data can be a powerful tool for challenging entrenched assumptions.

Thinking Inside the Box in the 1850s

I can’t help but wonder. What if Snow had set up a team of experts in the 1850s to collaborate on creating the Voronoi map instead of making the map by himself?

I imagine his efforts might have elicited skeptical comments like these:

“Why are we wasting time plotting the exact location of each of the victims on a map? This will take forever. We already know the area of the disease outbreak!”

“You want us to sit here and draw line segments between pairs of points on a map? Are you out of your mind?! While hundreds of people are undergoing painful death throes from a contagious disease!?”

But That Was Then. This Is Now. We're a Lot Smarter...(right?)

We know the world’s not flat. We know what causes thunder. We know the Earth orbits the Sun. We know that…[fill in the blanks].

But the sad truth is, we don’t know nearly as much as we think we do. And most of us aren’t John Snows. Most of us are more like the public health officials in London of the 1850s—shrouded in the miasma of our environment and our times.

Problem is, we can’t always know what we don’t know. Our brains tend to quickly latch onto pat explanations, routinely treading well-worn, narrow paths of thinking. It’s how we simplify the complex reality around us.

Personally, I’ve found that with time and experience, my brain gets just a wee bit smarter with each passing year. (I wish it got much, much smarter, much, much faster. I’m running out of time!)

Unfortunately, I’ve also noticed that my brain gets increasingly stubborn with time—more prone to cling to what it deems to be intuitively right or historically correct.

“Of course I know this. I’ve seen it a hundred times before…”

But every brain should come with an operator’s warning:

DANGER! The growing schema of knowledge that empowers your judgment may one day turn around and bite you in the buttocks. And it will probably hurt.

Ultimately, perceptions that go unchallenged will fail us. And the only way to prevent a misconception is to regularly challenge a preconception.

Properly performed and objectively analyzed, data analysis and visual diagramming can be your strongest allies in helping to keep your mind open to the counterintuitive.

A Modern Day Example: The Airline Boarding Process

One of my favorite recent examples of how conventional thinking can be turned on its head by objective analysis involves the airline boarding process. (Not everyone would call it a process; it often feels more like a cattle herding operation!)

For years, airlines used a “block” method to board the aircraft by sections. An organized procedure like this is bound to be much faster and more efficient than having passengers board a plane randomly, right?

Wrong.

Once researchers began to question the status quo method of boarding planes—using experiments, diagrams, Monte Carlo methods, regression, and hypothesis tests—many found that (surprise!) the organized block method of boarding a plane turns out to be slower than many other boarding methods, including using a random procedure! (Steffan and Hotchkiss, 2011; Steiner and Phillip, 2009; Inman, Jones, and Thompson, 2007).

Of course, some airlines still use the block procedure for boarding, despite the recent hard evidence against it. But that's not surprising. In 1850s London, most public health officials didn't initially accept Snow's theory of water-borne transmission for cholera, even after seeing his graphic evidence.

What can you do? Try another diagram. Run another data analysis. Then, take the long view...

Trivia notes

Friday, March 15 was the 200th anniversary of John Snow’s birth in York, England. If you're ever in London, you can visit the infamous Broad Street pump that Snow identified as the source of the cholera outbreak of 1854.
For a great historical account of how Snow analyzed the mortality data and created a Voronoi diagram, read The Ghost Map, by Steven Johnson (Riverhead books). I've summarized some of the information from Steven's book for this post.

↧

If You Don't Try Minitab's Project Manager, You'll Hate Yourself Later

March 20, 2013, 8:54 am

≫ Next: Streamlining Surveillance Processes with Lean Six Sigma

≪ Previous: Rethinking the Obvious: How Data Analysis and Diagrams Can Upend Conventional Wisdom

Normally, I like to talk about fun statistical things to build your confidence: gummi bears, poetry, and movies, just to name a few. But building your confidence also means getting comfortable with Minitab Statistical Software. One of the features that makes it easy to view your results and data in a snap is Minitab's Project Manager.

My favorite way to use the Project Manager is through the toolbar:

Project Manager Toolbar

Click the leftmost button once, and you see all of the output in your project. Click the second button once, and you see all of the worksheets in your project. Click the third button once, and you see all of the graphs in your project. The best thing about this is that you can select multiple pieces of output, right-click, and choose Send to Microsoft PowerPoint. Then, amazingly, everything you chose is right in PowerPoint waiting for you to present.

You've noticed, of course, that I've been unnecessarily saying what happens when you click the button once. But it turns out that you do get something a bit different when you click the button twice. You see the whole Project Manager, with a list of everything that's in it.

Project Manager Window

In this image, you see the details of a designed experiment. Even the terms that are in the model. By clicking on a folder you can easily move between the Session, Graphs, and Worksheets that the first three buttons went to. And if you like writing macros as much as Redouane Kouiden, then the History folder is like your "Record" button in a Microsoft product. In the History folder, you see all of the Minitab commands you can use to quickly recreate an analysis.

As I'm not aware of Oprah Winfrey's statistical inclinations, I can't reliably predict the probability that the Project Manager will end up on a list of Favorite Things that millions of people read. But I can tell you that it's on mine, and I can hypothesize that Oprah likes a fun statistic or two. If you'd like to see a few more tips and tricks for the Project Manager, check out our Project Manager Tutorial.

The photo of the disappointed woman is by nimble photography and licensed for reuse under this Creative Commons License.

↧

Streamlining Surveillance Processes with Lean Six Sigma

March 21, 2013, 5:00 am

≫ Next: Evaluating a Gage Study With One Part

≪ Previous: If You Don't Try Minitab's Project Manager, You'll Hate Yourself Later

I had the privilege of talking with Sue Schlegel, Lean Six Sigma black belt and quality improvement mentor at White Sands Missile Range, which is located just outside of Las Cruces, New Mexico. Schlegel and an improvement team at White Sands recently conducted a Lean Six Sigma project to streamline surveillance processes, and they used Minitab to analyze the data. We found Sue’s story an interesting case study for a LSS project, so I thought I would share it with you here on the blog, too.

Reducing Work Hours

When clients request classified video surveillance missions, the White Sands Missile Range contracts with independent optics technicians to complete hands-on surveillance at various mission sites. The labor costs for providing these grounds checks became very costly for clients and the process itself was arduous. “The goal of the project was to reduce the work hours incurred by 30 percent,” Schlegel told me. “By eliminating non-value-added activity from the process, we knew we could cut costs and make more efficient use of people’s time—allowing them to focus on other important projects.”

Using DMAIC

The team followed the DMAIC methodology and organized their project into five phases: define, measure, analyze, improve, and control. From start to finish, the process involved fifteen steps, which included time for technicians to travel to and from the mission site, set up and tear down equipment, and monitor video surveillance streams.

Minitab Bar Chart

With Minitab charts, the team was able to view the distribution of process cycle times and identify the current mean process cycle time. To assess the baseline process performance and capability of the current process, they performed process capability analysis. The capability histogram they created verified that the process was not meeting the upper specification limit for cycle time. For further insight into each process step, the team used stacked bar charts to analyze the time technicians spent at each step across mission sites. This made it easy to see where process bottlenecks were occurring at each site.

“Early in a project, Minitab helps us to stratify the data, quantify our wastes, and pinpoint where problems lie within a process,” says Schlegel. “In this case, we found that 68 percent of the total process cycle time was expended on the non-value-added task of monitoring video streams.”

Minitab Process Capability

Armed with this understanding, the team set out to understand root causes and prioritize possible solutions. Because the surveillance missions required manpower to monitor video streams for long periods, they looked for alternative ways to monitor streams that did not require human attention. After researching and talking through various solutions, the team decided to implement and pilot-test automated video surveillance cameras equipped with powerful motion detection software. The new cameras were cost-effective, easy to install and maintain, and drastically reduced the manpower needed to carry out the monitoring phase of video surveillance missions.

Lean Six Sigma Project Results

The original goal of the project was to reduce the work hours involved in the classified video surveillance process by 30 percent. After implementation of the new surveillance system, the hours were reduced by 47 percent—greatly surpassing the original project goal. What formerly took four optics technicians to complete can now be done with two technicians, freeing the remaining technicians to work on other missions to support White Sands customers. The team anticipates the new process will also save customers $1.6 million through 2018.

By removing non-value-added time from the process and reducing the number of process steps, the project increased the process cycle efficiency from 10 percent to 19 percent. The team verified that the new process was in control and the post-project capability analysis revealed they met their goals for reducing the process cycle time.

Perhaps the greatest benefit of this project is its value for replication. “This application is being replicated in other processes and programs across White Sands Missile Range,” says Schlegel, “and can be applied to similar agencies within the Army.”

Thanks again for sharing your story with us, Sue! To read the full case study, please visit: http://www.minitab.com/uploadedFiles/Company/News/Case_Studies/White%20Sands-EN.pdf.

Do you have a story to share with us? Tell us at http://blog.minitab.com/blog/landing-pages/share-your-story-about-minitab

↧

Evaluating a Gage Study With One Part

March 13, 2013, 5:00 am

≫ Next: Choosing Statistical Software: Four Questions You Should Ask

≪ Previous: Streamlining Surveillance Processes with Lean Six Sigma

What can I do with the output of a Gage study with only one part?
How can I use the variance component estimates to obtain meaningful information about my measurement system?

Getting Meaningful Information about the Measurement System

So, what other information will we need to get meaningful information about our measurement system?

Using Historical Part-to-Part Standard Deviation

As a first example, let’s assume that the historical standard deviation for part to part is 1.0853.

The total variation is the sum of the variance components for the part-to-part variation, for repeatability and for reproducibility.

Since we have the standard deviation for part-to-part, the first step is to square the standard deviation to calculate the variance: 1.08530² = 1.17788

As the second step, we will sum the variance estimates for the three components: 1.17788 + 0.06278 + 0.01730 = 1.25796

Now that we have an estimate of the total variance, we can estimate the %Contribution for each of the three components by dividing individual components by the total and multiplying by 100:

%Contribution
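The arithmetic above can be checked with a short script. This is a sketch under one assumption: the two gage variance components are assigned to repeatability (0.06278) and reproducibility (0.01730) in the order the text lists them, which the source does not state explicitly.

```python
# Variance components from the example. The mapping of 0.06278 and 0.01730 to
# repeatability and reproducibility follows the order in the text (assumed).
part_sd = 1.0853
components = {
    "Part-to-Part": part_sd ** 2,   # historical standard deviation, squared
    "Repeatability": 0.06278,
    "Reproducibility": 0.01730,
}

total_var = sum(components.values())

# %Contribution = 100 * (component variance / total variance)
pct_contribution = {name: 100 * var / total_var for name, var in components.items()}

for name, pct in pct_contribution.items():
    print(f"{name:16s} {pct:6.2f}%")
print(f"{'Total variance':16s} {total_var:.5f}")
```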

Not bad! Most of our variation is attributable to the part-to-part variation, which is desirable in a good measurement system. What else can we learn?

Measurement by operator

%StudyVar

This is really useful information—if we arrange the table as above the result is a close match to what we could obtain from Minitab for a full Gage R&R.

Additionally, we can calculate the number of distinct categories. The number of distinct categories is calculated in Minitab as: (Standard deviation for Parts/Standard deviation for Gage)*1.41

In our example, the StDev for parts is 1.0853 and the StDev for Gage (repeatability + reproducibility) is the square root of the sum of the two variance components, which is 0.282984, so we get: (1.0853/0.282984)*1.41 = 5.4

In Minitab, the number of distinct categories is truncated, so if we’d analyzed our Gage R&R in Minitab for multiple parts, the number of distinct categories would be 5 for this example.
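As a quick check of the distinct-categories calculation, here is a minimal Python sketch that uses only the numbers quoted above. The variable names are illustrative.

```python
import math

part_sd = 1.0853                  # historical part-to-part standard deviation
gage_var = 0.06278 + 0.01730      # repeatability + reproducibility variance components
gage_sd = math.sqrt(gage_var)     # standard deviation for Gage

ndc_raw = (part_sd / gage_sd) * 1.41
ndc = math.trunc(ndc_raw)         # truncated, as the text describes

print(round(gage_sd, 6), round(ndc_raw, 1), ndc)
```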

Using the Process Tolerance

%Tolerance

Of course, the results of the StudyVar/Tolerance were multiplied by 100 to obtain the percentages.

For more examples of Gage R&R using Minitab, see our articles on crossed and nested studies.

↧

Choosing Statistical Software: Four Questions You Should Ask

March 22, 2013, 5:00 am

≫ Next: Getting Started with Factorial Design of Experiments (DOE)

≪ Previous: Evaluating a Gage Study With One Part

Data. Analysis. Statistics. It seems like everybody is talking about the importance of doing data analysis, whether it's analytics for predicting consumer behavior or looking at critical metrics for Six Sigma and other data-driven quality improvement programs. Not only do we have more data available to us than ever before, we're also blessed...and/or cursed...with an enormous range of software options to help us make sense out of all this data we're trying so hard to understand.

Your options for doing data analysis run the gamut—from a pencil, paper and calculator costing a couple of bucks to customized systems tailored precisely to your needs and costing millions.

Unless you enjoy hours of repetitive hand calculations, or you have a couple of million dollars burning a hole in your budget, you'll probably want a software package that sits somewhere between these two extremes. But that still leaves a wide array of possibilities. How can you determine which statistical software package is the best one for your needs?

There's no right or wrong selection when it comes to picking data analysis software: what's best for you will depend on a lot of different factors. So let's look at some of the considerations you should keep in mind when you're doing research and making your choice.

Who Could Use Statistical Software in Your Organization?

The first thing to consider is the people in your company who will or could be using the software. Are they expert statisticians, relative novices, or a mix of both? Will they be analyzing data day-in, day-out, or will some be doing statistics on a less frequent basis? Is data analysis a core part of their jobs, or is it just one of many different hats some users have to wear? What's their relationship with technology—do they like computers, or just use them because they have to?

Figuring out who needs to use the software will help you match the options to their needs, so you can avoid choosing a package that does too much, one that does too little, or even one that does the wrong thing entirely.

If your users span a range of cultures and nationalities, be sure to see if the package you're considering is available in multiple languages.

How Easy Is the Statistical Software to Use?

Data analysis is not simple or easy, and many statistical software packages do not even try to make it any easier. This is not necessarily a bad thing, because "ease of use" is different for different users.

An expert statistician will know how to set up data correctly and will be comfortable entering statistical equations in a command-line interface—in fact, they may even feel slowed down by using a menu-based interface. On the other hand, a less experienced user may be intimidated or overwhelmed by a statistical software package designed primarily for use by experts.

The good news is that many statistical software packages are much easier to learn than they used to be, and most offer tutorials and documentation that can provide some help when questions arise. But ease of use varies widely, so see what kinds of built-in guidance statistical software packages offer to see which would be easiest for the majority of your users.

You can also check to see if different interface options exist. Is it easy to customize the statistical software's interface for users who have different skill sets? Does the package offer a streamlined interface for novices, but also easy access to the command line for users who prefer it?

What Kind of Support Do Your Statistical Software Users Want?

If people in your organization will need help using statistical software to analyze their data, how will they get it? Does your company have expert statisticians who can provide assistance when it's needed, or is access to that kind of expertise limited?

If you think people in your organization are going to contact the software's support team for assistance, it's smart to check around and see what kinds of assistance different software companies offer. Do they offer help with analysis problems, or only with installation and IT issues? Do they charge for technical support?

Look around in online user and customer forums to see what people say about the customer service they've received for different types of statistical software. Some software packages offer free technical support from experts in statistics and IT; others provide more limited, fee-based customer support; and some packages provide no support at all.

Where Will People Use the Statistical Software?

Will you be doing data analysis in your office? At home? On the road? All of the above? Will people in your organization be using the software at different locations across the country, or even the world? What are the license requirements for software packages in that situation? Does each machine need a separate copy of the software, or are shared licenses available?

Check on the options available for the packages you're considering. A good software provider will seek to understand your organization's unique needs and work with you to find the most cost-effective solution.

Want to Talk?

These questions can start you thinking about the important considerations in selecting a statistical software package, but your individual situation is unique. If you have questions about software for data analysis or quality improvement, please feel free to contact the Minitab representative nearest you to discuss your situation in detail. We are happy to help you identify the needs of your organization and to help you find a solution that will best fit them.

↧

Getting Started with Factorial Design of Experiments (DOE)

March 25, 2013, 5:00 am

≫ Next: When is Easter . . . for the next 2086 years?

≪ Previous: Choosing Statistical Software: Four Questions You Should Ask

When I talk to quality professionals about how they use statistics, one tool they mention again and again is design of experiments, or DOE. I'd never even heard the term before I started getting involved in quality improvement efforts, but now that I've learned how it works, I wonder why I didn't learn about it sooner. If you need to find out how several factors are affecting a process outcome, DOE is the way to go.

Somewhere in school you probably learned, like I did, that when you do an experiment you need to hold all the factors constant except for the one you're studying. That seems simple enough, until you hit a situation where you have many factors that you want to study at the same time. Not only would studying each factor one at a time be very expensive and time-consuming, but you also wouldn't get any information about how the different factors might affect each other.

That's where design of experiments comes in. DOE turns the idea of needing to test only 1 factor at a time on its head by letting you change more than a single variable at a time. This minimizes the number of experimental runs you need to make, so you can obtain meaningful results and reach conclusions about how factors affect a response as efficiently as possible.

In DOE, One Size Doesn't Fit All

Depending on what you want to discover, and how much detail you need, a designed experiment may be very simple or very complex. Some experiments might include only one or two factors—others might look at a few dozen.

One of the most common types of designed experiments is a simple screening experiment, which is used to determine which factors have the greatest influence on an outcome. For example, an auto manufacturer might use a screening experiment to see which of seven or eight factors have the biggest effect on the drying time of paint on a new car.

Once the manufacturer has identified the two or three most important factors, quality engineers can use a more complex, multi-level designed experiment to identify the optimal settings for those factors. Obviously, the same experimental design wouldn't work for both cases.

It's a little bit like sandpaper: sheets with a large grit will let you sand off a big area quickly, while you'll need a finer grit to achieve total smoothness. Similarly, some designed experiments are great for broad, exploratory investigations, while others will give you tremendous precision and certainty.

What Do I Need to Create the Factorial Design?

Let's say you work for an electronics firm that has recently received a large number of complaints about defective mp3 players. Quality engineers have identified up to five different factors that could be to blame. You know that a designed experiment is needed, but how can you be sure you collect the right amount of data, under the right conditions, with the right factor settings, in the right order?

There's good reason to be concerned when starting a designed experiment. Setting up even a simple designed experiment by hand can be very difficult and leaves plenty of room for error. Fortunately, we can use statistical software to customize factorial designs. These tools make it easy to create experiments that are as detailed as they have to be, but also as simple as they can be.

For example, Minitab's Create Factorial Design creates a data collection worksheet for you, indicating the factor combinations to run, as well as the random order in which to collect your data. You can also print the worksheet to simplify data collection.
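To make the idea of a data collection worksheet concrete, here is a rough Python sketch of a two-level full factorial design with a randomized run order. It is not Minitab's algorithm, and the factor names, levels, and the `full_factorial` helper are invented for the mp3-player example.

```python
import itertools
import random

# Hypothetical factors and levels, invented for illustration only.
factors = {
    "SolderTemp": ("low", "high"),
    "Supplier": ("A", "B"),
    "AssemblyLine": (1, 2),
}

def full_factorial(factors, seed=None):
    """Return every combination of factor levels, in a shuffled run order."""
    names = list(factors)
    runs = [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]
    random.Random(seed).shuffle(runs)  # randomize the order of data collection
    return runs

design = full_factorial(factors, seed=1)
for run_order, run in enumerate(design, start=1):
    print(run_order, run)
```

With three two-level factors this lists 2 × 2 × 2 = 8 runs. Each added factor doubles the run count, which is one reason screening designs that use only a fraction of the combinations are attractive.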

Choosing the Type of Design

The right design for your experiment will depend on the number of factors you're studying, the number of levels in each factor, and other considerations. Minitab offers two-level, Plackett-Burman, and general full factorial designs, each of which may be customized to meet the needs of your experiment.

You must have at least two factors and two levels for each (if you're doing a general full factorial design, you can have more than two levels). Factor levels, or settings, can be text (such as high and low) or numeric (such as 100° and 200°). Factors also can be categorical or continuous.

Your goals may demand greater or less statistical power. Are you doing a very sensitive adjustment for a critical process, or an early screening analysis to find out what factors even affect your outcome? If you're on a tight budget, the type of experiment you select might be influenced by how many experimental runs you can afford to do. A good design-of-experiments tool will let you quickly compare power and sample size assessments for 2-level factorial, Plackett-Burman, and general full factorial designs to help you choose the design appropriate for your situation.

Learning More about DOE

If you'd like to learn more about DOE and you're using Minitab, the built-in tutorial (Help > Tutorials > DOE) will lead you through a factorial experiment from start to finish; it's a pretty painless way to get your feet wet. And if you're not already using Minitab, you can get the free 30-day trial to check it out.

Are you using DOE in your work yet?

↧

When is Easter . . . for the next 2086 years?

March 26, 2013, 4:00 am

≫ Next: Real-life Data Analysis: How Many Licks to the Tootsie Roll Center of a Tootsie Pop?

≪ Previous: Getting Started with Factorial Design of Experiments (DOE)

Spring is in the air, and Easter is coming up soon! Easter occurs on March 31, 2013, and I’ve heard people exclaim that it’s early this year. I never really remember the date of Easter from one year to the next, but I had vague memories of it being in March not too long ago. Like any good statistician, I started wondering about the distribution of Easter dates. What dates are more common and which are less common? Is Easter in March really that unusual?

Even after reading the official definition of when Easter occurs, I still wasn’t clear about the date range. Easter occurs on the Sunday that follows the Paschal Full Moon date for the year. The Paschal Full Moon is the Ecclesiastical Full Moon that occurs after March 20. Huh? So, I’m going to use Minitab statistical software to help find the answers!

In this post, we’ll answer these questions about when the Western Easter occurs. In fact, we’ll assess the dates of all Easters that fall within the Gregorian calendar. The Gregorian calendar starts in 1582 and is accurate up until 4099. For once, we won’t be looking at a sample but rather the entire population of the 2517 Easters that fall within the Gregorian calendar! You can get the Minitab worksheet here. And if you don't already have Minitab, please download the free 30-day trial and follow along!

Graphing the Distribution of the Easter Dates: 1583 to 4099

When you want to get a quick feel for the data, it’s a good idea to graph the distribution. Below is a bar chart that displays the number of times that Easter falls on any given date. The graph uses codes for the dates on the X-axis. For example, M31 is March 31 and A1 is April 1.

Bar chart of Easter dates

By looking at the range of the X-axis, we can see that Easter occurs from March 22 to April 25, or 35 possible dates. The broad middle of the distribution bubbles up and down around the average of 72 occurrences per date. The most common date is April 10, with 102 occurrences (4.05%). The least common day is March 23, with only 14 occurrences (0.56%).
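If you don't have the worksheet handy, the tally can be approximated in Python. The sketch below swaps in the widely published "anonymous Gregorian" computus (often attributed to Meeus, Jones, and Butcher) rather than the worksheet used for this post, so treat it as an independent cross-check. The `easter` function name is an assumption. It also reports the fraction of Easters that fall in March.

```python
from collections import Counter

def easter(year):
    """Western (Gregorian) Easter as (month, day), anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day0 = divmod(h + l - 7 * m + 114, 31)
    return month, day0 + 1

# Every Easter from 1583 through 4099, tallied by calendar date.
dates = [easter(y) for y in range(1583, 4100)]
tally = Counter(dates)
march_share = sum(n for (month, _), n in tally.items() if month == 3) / len(dates)

print(len(dates), "Easters,", len(tally), "distinct dates")
print("most common (month, day):", tally.most_common(1)[0])
print(f"share in March: {march_share:.1%}")
```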

Interestingly, the last Easter that fell on March 23 was in 2008. Oh, if I had only understood the heady rarity of that date at the time! The next March 23 Easter won’t be until 2160!

You only start to get extremely rare dates for Easter with the earliest three (March 22 – 24) and latest three (April 23 – 25) possible dates. The right tail of the distribution is thicker than the left tail. This indicates that the last three dates occur a bit more often than the earliest three dates.

To illustrate this, Easter falls on the earliest date of March 22 only 0.60% of the time. The last time was in 1818 and the next time will be 2285, which is a span of 467 years! On the other hand, Easter falls on the latest date (April 25) 1.03% of the time. The last such occurrence was in 1943, and the next will be in 2038, or a span of 95 years. I hope to be around for that one!

Is a March Easter Unusual?

My suspicion is that whenever Easter falls within March, people think it’s early. Let’s use our friend the probability distribution plot to determine how often Easter occurs in March.

Probability distribution plot of Easter dates

The shaded area indicates that Easter occurs in March about 23% of the time, or about 1 out of every 4 years. So an Easter in March really isn’t that unusual. In fact, we only have to wait until 2016 when it occurs even earlier, on March 27.

Are There Patterns in the Easter Dates?

After assessing the distribution of Easter dates, I wondered if patterns exist. Specifically, if Easter occurs on date X in one year, when does it first occur on date X again? I’ll graph the frequency of the first recurrences below.

Chart of the number of years until an Easter date first recurs

The first thing that stands out in this graph is that large spike at 11 years. This bar indicates that for 1,141 Easters (45%), the date that Easter occurs in one year will first repeat in 11 years.
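The recurrence question can be explored with a similar script. This is a sketch, not the method behind the chart: it repeats the anonymous Gregorian computus so that it runs on its own, and the `years_until_repeat` helper is a hypothetical name. Years whose date does not recur before 4099 are dropped.

```python
from collections import Counter

def easter(year):
    """Western (Gregorian) Easter as (month, day), anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day0 = divmod(h + l - 7 * m + 114, 31)
    return month, day0 + 1

FIRST, LAST = 1583, 4099
dates = {y: easter(y) for y in range(FIRST, LAST + 1)}

def years_until_repeat(year):
    """Years until Easter next falls on the same date, or None if not by LAST."""
    for later in range(year + 1, LAST + 1):
        if dates[later] == dates[year]:
            return later - year
    return None

gaps = Counter(years_until_repeat(y) for y in dates)
gaps.pop(None, None)  # drop years whose date does not recur within the range

for gap, count in sorted(gaps.items()):
    print(gap, count)
```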

However, we are in a time period where more Easter dates repeat in 11 years than normal. As the picture below shows, seven of the next 10 Easters repeat in 11 years. But that’s unusually high. It’s always important to understand the larger context of your data before drawing conclusions!

Table of Easter recurrences

The other thing that stands out for me is the relative handful of years in which an Easter date can first repeat. For example, an Easter date cannot repeat in 2 years or 7 years, or any of the other large number of possibilities that are not on the graph. It’s definitely not a random distribution!

While the official definition for Easter really didn’t help clarify things for me, I found that normal data analysis methodology provides a very clear picture of when Easter occurs, and what is truly unusual.

↧

Real-life Data Analysis: How Many Licks to the Tootsie Roll Center of a Tootsie Pop?

March 27, 2013, 4:32 am

≫ Next: What Statistical Software Should You Choose: Three More Critical Questions

≪ Previous: When is Easter . . . for the next 2086 years?

by Cory Heid, guest blogger

Almost all of us have tried a Tootsie Pop at some point. I’m willing to bet that most of us also thought, “I wonder how many licks it does take to get to the center of the Tootsie Pop?” If you haven’t wondered about this, here’s the classic commercial that may get you more curious:

Personally, I was not very satisfied with the owl's answer of “3,” so I decided to continue the little boy’s quest to find the number of licks required to reach the center of a Tootsie Pop.

Research

Looking around the ‘net, I found that other studies done by student researchers at various universities have reached very different answers. These students, who represented some significant research institutions and engineering schools, used different licking methodologies and/or licking machines and licking experiments. As I mentioned, the results varied greatly. See for yourself:

Licking Machines:

Purdue University – 364 licks

University of Michigan – 411 licks

Harvard University – 2255 licks

Licking Experiments:

Purdue University – 252 licks

Swarthmore College – 144 licks

University of Cambridge – 3481 licks

Since I lacked both the equipment and desire to build a licking machine, a simpler licking experiment seemed appropriate. I wanted to assess which of these factors would have the most impact on the effectiveness of the Tootsie-Pop licking:

Force of the lick
Temperature of the licker's mouth
pH Level of the licker's saliva
Solubility of the licker's saliva

Lab Work

Once I selected my factors, I incorporated them into an experiment. I chose to simulate the different levels of each factor in the lab. For my force simulation, I placed my Tootsie Pop in a beaker with 150 mL of water and a magnetic stirrer. I placed the beaker on a hot plate that could spin the stirrer and create a circular motion in the water. I then pulled the Tootsie Pop out of the water every minute and measured the height and width of the pop on the thicker band around the pop. I did this each minute until the pop revealed a noticeable amount of chocolate (the elusive ‘center’).

This method had some issues. The chocolate-flavored Tootsie Pops made the water very murky, so I omitted them from the lab testing. I also could not track the speed of the circulating water. Instead, I used the speed dial’s indicator on the magnetic stirrer (1 being slowest speed, 10 being fastest speed). After several tests at speeds 1, 2, 4 and 6, I was able to gather an ample amount of data to perform some analysis.

I then moved on to the temperature tests. Like the force tests, these used a beaker, 150 mL of water, and a hot plate. I heated the water to different temperatures and measured the height and width of the Tootsie Pop every minute until reaching the center.

The next test involved pH. I used a beaker and 150 mL of water as the basis of my tests. I added a small amount of either baking soda (to make the solution more basic) or vinegar (to make the solution more acidic). I then measured and recorded the height and width of the pop at regular intervals.

My final test, for solubility, consisted of putting the Tootsie Pop in a very large container holding a gallon of water. I dipped the pop in the water for one minute and took height and width measurements. In addition, I stirred the water to make sure it was as diluted as possible and evenly distributed. My goal was to have enough data to analyze to help determine important factors in Tootsie Pop decay.

Time to Analyze the Data!

I organized the collected data by the four factors: Force, Temperature, pH and Solubility. I wanted to look at the difference of the decay amounts for height and width during each minute or time period that I recorded. I first ran some One-Way ANOVAs on the various factors. Below is the ANOVA I ran for Temperature. I used three temperatures: 56 C (Temp 1), 62 C (Temp 2), and 39 C (Temp 3).

One-Way ANOVA - Time vs Temp

While it does appear that there is a significant difference between temperature tests, the data for Temp 1 was taken at a temperature of 56 C (way above normal body temperature) and the measurements were taken about two minutes apart, instead of the standard one minute. My conclusion here is that temperature in the range of body temperatures really does not have an effect on ‘the number of licks.’ Unless you have a significant fever, in which case you might not be interested in a Tootsie Pop anyway.
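The interval mismatch mentioned above (two minutes between measurements instead of one) can be handled by comparing decay per minute rather than decay per measurement. The sketch below is only an illustration of that idea, not the calculation used in this study: the function name `decay_per_minute` and the sample widths are hypothetical.

```python
def decay_per_minute(times_min, sizes):
    """Return the size lost per minute between consecutive measurements.

    times_min: measurement times in minutes, strictly increasing.
    sizes: pop height or width at each time, in any consistent unit.
    """
    if len(times_min) != len(sizes):
        raise ValueError("times_min and sizes must have the same length")
    rates = []
    for i in range(1, len(times_min)):
        elapsed = times_min[i] - times_min[i - 1]
        if elapsed <= 0:
            raise ValueError("measurement times must be strictly increasing")
        # Size lost over the interval, divided by the interval length.
        rates.append((sizes[i - 1] - sizes[i]) / elapsed)
    return rates


# Hypothetical widths, not measurements from the experiment.
one_minute_run = decay_per_minute([0, 1, 2, 3], [30.0, 29.0, 28.0, 27.0])
two_minute_run = decay_per_minute([0, 2, 4], [30.0, 28.0, 26.0])
print(one_minute_run, two_minute_run)
```

Expressed this way, a run measured every two minutes and a run measured every minute land on the same scale before any comparison between temperature groups.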

For pH, I used acidic, basic, neutral and solubility (large amounts of liquid):

One-Way ANOVA - pH

Clearly we have no significant difference between the pH tests, leading me to believe that the pH of our saliva should not factor into the number of licks.

For speed, I used the variable knob on my stir plate, using speeds 1, 2, 4 and 6, and grouped my results based on the speed setting:

One-Way ANOVA - Speed

After doing an ANOVA on the total time it takes to reach the center and the corresponding speed, there appears to be a significant difference. But it should be noted that the pops used for speed 2 were incredibly perfect -- maybe the most perfect Tootsie Pops ever in both candy shape and Tootsie center location. Hence, this difference may have been due not to speed, but rather to a greater distance to the center.
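For readers without Minitab, the one-way ANOVAs above rest on a single ratio: the variation between group means divided by the variation within groups. Here is a minimal sketch of that F statistic in plain Python. The function name `one_way_anova_f` and the times-to-center below are made up for illustration and are not the data from this experiment.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical minutes-to-center for three stirrer speeds (illustration only).
f_stat = one_way_anova_f([[10, 11, 12], [11, 12, 13], [14, 15, 16]])
print(f_stat)
```

Turning F into a p-value requires the F distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.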

Human Lickers

Seeing that the different variables of pH, temp and force probably do not have much of an effect on the dissolving of a Tootsie Pop, I brought in human lickers to test this finding. I determined a list of possible other factors that could affect the number of licks to the center – such as force, temp, pH, height, weight, gender, and the questions -- ‘Have you been chewing anything in the last half hour?’, ‘Have you had anything to drink in the last half hour?’ The beginning diameter of the Pop and the number of licks to the center were also recorded. The center was defined as ‘tasting chocolate.’ After 92 lickers reached the center, I found the following:

Descriptive statistics

As you can see, there is large variability in the number of licks. After cleaning for incomplete data (missing height, weight…etc.), I was able to trim the observations down to 70, which looked like this:

Descriptive Statistics
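The cleaning step described above (dropping lickers with a missing height, weight, and so on, then summarizing the lick counts again) can be sketched roughly as follows. The field names, the `summarize_licks` helper, and the records are all hypothetical and are not the study's data.

```python
from statistics import mean, median, stdev


def summarize_licks(records, required=("licks", "height", "weight")):
    """Drop records missing any required field, then summarize lick counts."""
    complete = [r for r in records
                if all(r.get(field) is not None for field in required)]
    licks = [r["licks"] for r in complete]
    if len(licks) < 2:
        raise ValueError("need at least two complete records")
    return {
        "n": len(licks),
        "mean": mean(licks),
        "median": median(licks),
        "stdev": stdev(licks),   # sample standard deviation
        "min": min(licks),
        "max": max(licks),
    }


# Made-up records for illustration; the last one is missing a height.
records = [
    {"licks": 200, "height": 65, "weight": 150},
    {"licks": 400, "height": 70, "weight": 180},
    {"licks": 600, "height": 62, "weight": 130},
    {"licks": 1000, "height": None, "weight": 160},
]
summary = summarize_licks(records)
print(summary)
```

Running the same summary before and after dropping incomplete records shows at a glance whether the cleaning changed the spread.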

Even after cleaning the data, there was not much difference. I concluded that the variability was not so much in the human lickers, but within the pop itself. Most pops would look something like that on the right, while the pop on the left would be considered a ‘Perfect Pop’:

Perfect and Imperfect Pops

Given slight offsets of the center, the side you choose would affect the number of licks. Overall, the number of licks I gathered does fit in the middle of the number of licks other researchers gathered, so I feel confident that my data collection was a success.

About the Guest Blogger:

Cory Heid is a student in applied mathematics at Siena Heights University in Adrian, Mich. He is interested in data analysis projects, as well as data-driven model building. Cory presented his Tootsie Pop findings at the 2013 Joint Mathematics Meetings in January, and at the Michigan Undergraduate Mathematics Conference in February.

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.
