
A Surgeon's View of Data-Driven Quality Improvement


There has been plenty of noisy disagreement about the state of health care in the past several years, but when you get beyond the controversies surrounding various programs and changes, a great deal of common ground exists.

Everyone agrees that there's a lot of waste and inefficiency in the way we've been doing things, and that health care should be delivered as efficiently and effectively as possible. But while a lot of successful models exist for using data to improve quality, the medical field has been slower than many other industries to adopt such data-driven quality improvement (QI) methods.

We have been talking to physicians, nurses, administrators, and other professionals at health care organizations in the United States and other countries to get more insight into the challenges of using data to improve health care processes, and to learn how Minitab might be able to help.

Operating with a Scalpel—and Statistics

We had a particularly enlightening conversation with Dr. David Kashmer, chief of surgery for Signature Healthcare in Brockton, Mass.

In addition to being a surgeon, Kashmer is a Lean Six Sigma Black Belt. In the 10 years since earning his belt, he's become passionate about using QI methods to improve trauma and acute care surgery. He helps fellow practitioners do the same, and talks about his experiences on the Surgical Business Model Innovation blog.

Kashmer told us about the resistance he encountered when he first began using statistical methods in his practice: “I kept hearing, ‘This guy is nuts...what’s he even talking about?’” 

Nobody's saying that any more. Kashmer has shown that applying even basic statistical methods can yield big improvements in patient outcomes, and those once-skeptical colleagues are now on board. "When they saw the results from using statistical process control rather than typical improvement methods, they understood and began to appreciate their value," Kashmer said.

The Human Face of Health Care Quality

I've written previously about the language of statistics and how it can get in the way of our efforts to communicate what's really important about our analyses. Kashmer keyed in on similar themes when we asked him about the apparent reluctance among some in the medical profession to use data analysis for quality improvement. 

"The language of the motivation for using statistics—to guard against type 1 and type 2 errors—is lost on us," he said. "We focus more on what we think will help an individual patient in a particular situation. But when we learn how statistics can help us to avoid making a change when nothing was wrong with the patient, or to avoid thinking there wasn’t a problem when there was one…well, that’s when these techniques become much more powerful and interesting."

For Kashmer, the most compelling way to show the value of data analysis is to draw a direct connection to the benefits patients experience from an improved process. 

"Making decisions with data is challenging since it doesn't resonate with everyone," he told us. "Putting a human face on data and using it to tell a story that people can feel is key when talking about the true performance of our system."

Big Insights from a Little Data

Kashmer shared several stories with us about how using data-driven methods solved some tenacious problems. One thing that struck me was that even very straightforward analyses have had big impacts by helping teams see patterns and problems they otherwise would have missed.

In one case, an answer was found by simply graphing the data. 

"We felt we had an issue with trauma patients in the emergency department, but the median time for a trauma patient looked great, so the group couldn’t figure out why we had an issue," Kashmer explained. "So we used Minitab to see the distribution, and it was a nonnormal distribution that was much different than just a bell curve."

 histogram of time

Simply looking at the data graphically revealed why the team felt there was a problem despite the median.

"We saw that the median was actually a bit misleading—it didn’t tell the whole story, and that highlighted the problem nicely: the distribution revealed a tail of patients who were a lot worse when they stayed in the emergency department for over six hours, so we knew to focus on this long tail instead of on the median. Looking at the data this way let us see something we didn’t see before."

Read Our Full Interview

We'd like to thank Dr. Kashmer for talking with us, and for his efforts to help more healthcare organizations reap the benefits of data-driven quality improvement. He had much more to say than we can recap here, so if you're interested in using data to improve health care quality, I encourage you to read our full interview with Dr. Kashmer.


Do Those Who Work Less, Work Best?


I live with a German national, who often tells me that we Americans spend way too much of our lives at work. He also frequently comments that we work much less efficiently than Germans do during all that extra time we spend at work.

Which reminds me—I need to pay my water bill online...

Okay, I’m back. Quick, wasn’t it? So convenient. Now, where was I? Oh, work habits.

After checking the hourly weather forecast, I created this bar chart in Minitab, using international labor data from the OECD.

usual weekly 2

These data seem to indicate that Americans generally work longer weekly hours than many western Europeans, including Germans. But we’re not the extreme. We work fewer weekly hours, on average, than workers in many other countries, including Colombia, Turkey, and Costa Rica1.

What about the actual number of hours worked over the entire year?2 You can use a time series plot to show trends for each country.

time series plot

Yikes! The complete data set with all the OECD countries really gums up the works.

Graphic Overload? Subset the Worksheet!

If you’ve been busy working (and/or reading LeBron James tweets), you might not know that Minitab now has a much quicker, easier way to subset data in version 17.3. This new interface feature is a godsend when you want to quickly graph only selected portions of a very large data set, without having to manually delete values in the worksheet or set up unwieldy formulas in the Calculator to define the values.

With the worksheet active, choose Data > Subset Data. Now just check the values for the data that you want in the new worksheet.

subset
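For anyone who keeps a copy of the data outside Minitab, a rough pandas equivalent of that subsetting step looks like the sketch below. The data frame is a made-up stand-in for the OECD worksheet, so the column names and values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for the OECD worksheet (country names and hours are illustrative).
oecd = pd.DataFrame({"Country": ["Germany", "United States", "Mexico", "Turkey", "Chile"],
                     "Year":    [2014, 2014, 2014, 2014, 2014],
                     "Hours":   [1371, 1789, 2228, 1832, 1990]})

keep = ["Germany", "United States", "Mexico"]     # the countries to keep in the subset
subset = oecd[oecd["Country"].isin(keep)]         # rows for the selected countries only
print(subset)                                     # this becomes the new "worksheet" for the graph
```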

 Then use the new worksheet to recreate the graph.

time series subset

In this subset of 9 countries, Germany (green diamonds) is the lowest series on the graph, with the fewest average annual hours per worker. That lends some credence to what I've been hearing. The U.S. (green squares) is once again somewhere in the middle, almost identical to Japan. Mexico (yellow triangles) is the highest series on the graph, with the most average annual hours per worker.

The plot shows some interesting trends. In the early 2000s, Mexico and Chile were nearly the same. But the average annual hours of workers in Chile have decreased steadily over the last 10 years, while Mexico's have remained almost the same. Also, many countries (particularly Ireland, Japan, and Germany) show a pronounced dip in average annual hours starting around 2007/2008, which corresponds with the global recession.

Do increased work hours translate into increased production output? To explore this, I graphed GDP per capita for each country3 in relation to the average annual hours worked.

scatterplot no labels

Surprise! Generally, the fewer the hours worked, the greater the gross domestic product per capita. Now, here’s a thought. If your boss doesn’t know much about statistics and the relationship between correlation and causation, show this scatterplot to him or her. Then ask that your workload be reduced to boost company productivity.

While you're waiting for your boss' response, take a look at where Germany and the U.S. fall on the plot:

scatterplot with labels

Finally—some good news for seemingly overworked Americans. Compared to other countries, our GDP per capita is much higher than expected based on the number of hours that we work. It would be nice to say that's because we're so incredibly efficient and productive.

But, alas, GDP per capita is a tricky metric, heavily influenced by things like oil production, foreign investment, financial services, and so on, and not a direct indicator of worker efficiency. And even if some causal relation did exist between the two variables, it could be that a higher GDP per capita leads to reduced work hours, not the other way around.

So, obviously, a lot of follow-up research and statistical analysis would be needed to flesh out the preliminary results from these exploratory graphical analyses. But I've already spent hours and hours and hours working on this post. So I don't have time to investigate this further.

Can you find other data online that supports or contradicts these results? (While you're there, check out these hilarious animal reactions to mirrors on YouTube...)

Notes

1. These averages reflect only full-time (≥30 hours a week), dependent employees (those working for a company, government, institution, etc.)—not the self-employed. The data track only hours worked on a main job. The OECD did not have complete 2014 data on usual weekly hours worked for all the countries in its organization, including Japan, Korea, Russia, Canada, and Brazil.

2. Some of these data were collected by the OECD from different sources. Therefore specific, direct comparison for a given year between two countries may be misleading. The OECD recommends evaluating overall results and trends in the data.

3. The data for Luxembourg was excluded from the graph because it was an outlier with high leverage that was not representative of OECD countries.

scatterplot with Lux

If GDP per capita were actually a direct indicator of worker efficiency (which it isn't), then the Luxembourgers would have top bragging rights for the global Nose-to-the-Grindstone Award. They have the world’s highest GDP per capita, despite being on the lower end of average annual work hours.

Predicting the 2016 NCAA Tournament


Probability. It's really the heart and soul of most statistical analyses. Anytime you get a p-value, you're dealing with a probability. The probability is telling you how likely it was (or will be) for an event to occur. It has numerous applications across a wide variety of areas. But today I want to focus on the probability of a specific event.

A basketball tournament.

I’ll be using the Sagarin ratings, together with a binary logistic model created with Minitab Statistical Software, to determine the probability each team has of advancing in the NCAA tournament. You can find the details of how the probabilities are being calculated here.
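The linked post has the actual model details; purely to illustrate the idea, here is a hedged Python sketch of how a logistic win probability based on a rating gap can be chained across rounds. The scale coefficient, team names, and ratings are all made up and are not the fitted Minitab model.

```python
import math

def win_prob(rating_a, rating_b, scale=0.1):
    # Logistic win probability for team A over team B; `scale` is illustrative only.
    return 1.0 / (1.0 + math.exp(-scale * (rating_a - rating_b)))

def sweet16_prob(team, first_opp, pair, ratings):
    # P(reach the Sweet 16) = P(win the first game) *
    # sum over the two possible second-round opponents of
    # P(that opponent wins its first game) * P(team beats that opponent).
    a, b = pair
    p_first = win_prob(ratings[team], ratings[first_opp])
    p_second = (win_prob(ratings[a], ratings[b]) * win_prob(ratings[team], ratings[a]) +
                win_prob(ratings[b], ratings[a]) * win_prob(ratings[team], ratings[b]))
    return p_first * p_second

# Hypothetical ratings, just to show the call:
ratings = {"Kansas": 95.0, "Austin Peay": 73.0, "Colorado": 84.0, "Connecticut": 85.0}
print(sweet16_prob("Kansas", "Austin Peay", ("Colorado", "Connecticut"), ratings))
```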

Before we start, I’d also like to mention one other set of basketball ratings, called the Pomeroy Ratings. Both the Sagarin ratings and the Pomeroy ratings have been shown to be pretty accurate in predicting college basketball games. But Ken Pomeroy always breaks down the tournament using his system. So instead of duplicating his numbers, I like to use the Sagarin ratings. But I’ll be sure to mention places where the two systems disagree, and you can select the one you want to go with!

Alright, enough with the small talk. Let’s get to the statistics!

South

The following table has the probabilities each team in the South Region has of advancing in each round (up to the Final Four).

Team                          2nd Round   Sweet 16   Elite 8   Final 4
(1) Kansas                        99%        82%       65%       44%
(2) Villanova                     96%        66%       41%       21%
(3) Miami FL                      92%        50%       23%        9%
(5) Maryland                      83%        51%       15%        7%
(6) Arizona                       61%        32%       15%        6%
(7) Iowa                          83%        31%       14%        5%
(4) California                    75%        37%       10%        4%
(9) Connecticut                   63%        13%        7%        2%
(11) Wichita St/Vanderbilt        39%        17%        6%        2%
(8) Colorado                      37%         5%        2%        0.5%
(13) Hawaii                       25%         7%        1%        0.2%
(12) South Dakota St              17%         5%        0.5%      0.1%
(10) Temple                       17%         2%        0.4%     <0.1%
(14) Buffalo                       8%         1%        0.1%     <0.1%
(15) UNC Asheville                 4%         1%       <0.1%     <0.1%
(16) Austin Peay                   1%        <0.1%     <0.1%     <0.1%

Kansas is by far the favorite to come out of the South. Their probability is over twice that of the next most likely team. Kansas is also the highest rated team in both the Sagarin Ratings and the Pomeroy Ratings. However, even with all that considered, their probability of going to the Final Four is still less than 50%. That means it's more likely that a team other than Kansas wins the South.

The next most likely team after Kansas is Villanova. But the selection committee did the Wildcats no favors. They'll likely face a very dangerous Iowa team in the 2nd round, and they'll almost certainly face a good team if they get to the Sweet Sixteen. That's because the bottom half of the South bracket features six teams ranked in the Sagarin top 25 (Villanova, Miami, Arizona, Iowa, Vanderbilt, and Wichita State). It's absolutely stacked. Villanova is definitely the best team out of the six, but you really shouldn't be surprised if any one of those teams advances to the regional finals.

Things have the potential to get even crazier if you look at the Pomeroy Ratings, which have Wichita State as the 12th-ranked team (ahead of Miami and Arizona)! They are 25th in the Sagarin ratings, but keep in mind that Wichita St was without their best player for 3 of their early-season losses. The statistics don't know that, so Wichita St may even be better than their numbers indicate!

In the top half of the bracket, the 13 seed actually has a better chance of beating the 4 seed than the 12 seed has of beating the 5. In fact, the Sagarin ratings have California ranked 27th, which means they are a very overrated 4 seed (a theme you'll see repeated for every Pac-12 team in the tournament). If you feel like picking an upset (especially if your bracket offers upset points), Hawaii over California isn't a bad choice at all. And the Pomeroy Ratings like Hawaii even more, giving them a 31.2% chance of winning as opposed to the 25% chance they have here. And even if you don't pick Hawaii in the first round, don't have California going too far. We see that the 5, 6, and 7 seeds all have better chances of making it to the Final Four than California.

If you like picking upsets in your bracket, but still want to pick a good Final Four team, a good strategy here might be putting Kansas in the Final Four and going chaos in the rest of the region. Iowa over Villanova, Hawaii over Cal, either Arizona or Vanderbilt/Wichita State over Miami. Whatever you like!

West

Team                          2nd Round   Sweet 16   Elite 8   Final 4
(2) Oklahoma                      93%        71%       45%       27%
(1) Oregon                        98%        68%       40%       21%
(3) Texas A&M                     91%        62%       32%       17%
(4) Duke                          85%        51%       27%       14%
(5) Baylor                        71%        36%       17%        8%
(6) Texas                         72%        29%       12%        5%
(9) Cincinnati                    56%        20%        8.6%      3%
(10) VCU                          57%        17%        6%        2%
(8) Saint Joseph's                44%        12%        5%        1%
(7) Oregon St                     43%        10%        3%        0.8%
(12) Yale                         29%         9%        3%        0.7%
(11) Northern Iowa                28%         7%        2%        0.3%
(13) UNC Wilmington               15%         4%        1%        0.1%
(15) CSU Bakersfield               7%         2%        0.2%     <0.1%
(14) Green Bay                     9%         2%        0.2%     <0.1%
(16) Holy Cross/Southern           2%         0.1%     <0.1%     <0.1%

The West is by far the weakest region in the tournament. Oklahoma is the only team ranked in the Sagarin top 10 (they're ranked 7th). Oregon is 15th, Duke is 16th, and Texas A&M is 17th. That means this region is wide open, and you would be fine taking any one of those 4 teams to advance. One caveat about Duke: the Pomeroy ratings have them all the way down at 22, and they have UNC Wilmington 15 spots higher than Sagarin does. According to Pomeroy, Wilmington has a 28% chance of upsetting Duke in the first round. And when you consider that Duke is picked often in most office pools (because everybody knows their name), it could be wise to avoid advancing Duke too far in your bracket.

While we're on the subject of upsets, the 10 seed in this region is actually favored over the 7 seed. So go ahead and pick VCU, especially if your pool gives bonus points for upsets. And Yale is another team to watch out for. The 29% chance they have here isn't bad for a 12 seed, but Pomeroy likes them even more, giving them a 39% chance of beating Baylor. An Ivy League team has won a game in the tournament 2 out of the last 3 years (and only lost by 2 points the year they didn't win a game). Yale has a great chance to make that streak 3 of the last 4 years.

And lastly, if you're going to pick a 1 seed to lose before the Sweet Sixteen, this is the region to do it. Cincinnati would have about a 33% chance of upsetting the Ducks if they played them. That's the best chance you're going to have in any potential 1 vs. 8/9 game. The only problem is, Cincinnati could absolutely lose their first game against Saint Joseph's. So it all comes down to how big of a risk you want to take!

East

Team                          2nd Round   Sweet 16   Elite 8   Final 4
(1) North Carolina                98%        83%       53%       36%
(3) West Virginia                 84%        65%       45%       22%
(4) Kentucky                      89%        50%       22%       13%
(2) Xavier                        92%        60%       29%       11%
(5) Indiana                       90%        47%       20%       11%
(6) Notre Dame                    54%        19%        6%        2%
(10) Pittsburgh                   50%        19%        6%        2%
(7) Wisconsin                     50%        19%        6%        2%
(11) Michigan/Tulsa               47%        13%        5%        1%
(8) USC                           50%         9%        2%        0.7%
(9) Providence                    50%         9%        2%        0.7%
(14) SF Austin                    16%         7%        2%        0.3%
(13) Stony Brook                  11%         2%        0.2%     <0.1%
(12) Chattanooga                  10%         1%        0.1%     <0.1%
(15) Weber St                      8%         1%        0.1%     <0.1%
(16) Florida Gulf Coast            2%         0.3%     <0.1%     <0.1%

Talk about top heavy! The top 5 seeds in this region are all ranked inside the Sagarin top 14. And the lowest ranked of those 5 teams is actually Xavier, coming in at 14. Xavier is a good team, but they're the worst 2 seed in the tournament. In fact, the winner of Pittsburgh/Wisconsin will have a good shot at knocking off Xavier before the Sweet Sixteen.

This all works out very well for West Virginia, who is ranked #6 in the Sagarin ratings despite being a 3 seed. Not only are they matched up with the weakest 2 seed in the tournament, they also have the weakest 6 seed in Notre Dame. However, where West Virginia got very unlucky is their first-round game. Stephen F. Austin is a very good 14 seed. In fact, their Sagarin ranking of #57 is higher than every 13 seed, three of the four 12 seeds, and even two of the 11 seeds (Tulsa and Northern Iowa)! And if that's not enough for you, the Pomeroy Ratings have them at #33 and give them a 30% chance of beating West Virginia. West Virginia can absolutely make a run to the Final Four, but they could lose their opening game too! Good luck deciding what to do with them in your bracket!

I mentioned earlier that Notre Dame is the worst 6 seed in the tournament. If Michigan wins the play-in game, that 6/11 matchup is basically a coin flip (the probabilities above for the 11 seed use Michigan's numbers). And if you want to find a Cinderella in this region, listen to this. According to the Pomeroy ratings, Stephen F. Austin would actually be favored against Notre Dame, Michigan, or Tulsa! And even though Sagarin wouldn't favor SF Austin over Michigan or Notre Dame (although they would over Tulsa), they would have them as only 3-point underdogs! It's just a shame that SF Austin got such a hard first-round opponent. But even so, they have a realistic chance of making that glass slipper fit.

In the top half of the region, things look pretty bleak for Cinderella. North Carolina should make the Sweet Sixteen and Kentucky and Indiana should both win their opening round games. But then things get tricky. Kentucky is ranked #8 in Sagarin and Indiana is #11. That's one heck of a 2nd round game! And the winner could absolutely beat North Carolina in the next round.

Midwest

Team                          2nd Round   Sweet 16   Elite 8   Final 4
(2) Michigan St                   96%        81%       61%       37%
(1) Virginia                      99%        79%       52%       30%
(5) Purdue                        85%        58%       27%       14%
(3) Utah                          83%        43%       14%        5%
(4) Iowa St                       84%        35%       12%        5%
(11) Gonzaga                      54%        30%       11%        4%
(6) Seton Hall                    46%        23%        7%        2%
(9) Butler                        59%        14%        6%        2%
(10) Syracuse                     52%        10%        3%        0.8%
(7) Dayton                        48%         9%        3%        0.6%
(8) Texas Tech                    41%         7%        2%        0.5%
(12) Arkansas LR                  15%         5%        0.7%      0.1%
(14) Fresno St                    17%         3%        0.4%     <0.1%
(13) Iona                         17%         2%        0.3%     <0.1%
(15) Mid Tenn St                   4%         0.9%      0.1%     <0.1%
(16) Hampton                       1%         0.1%     <0.1%     <0.1%

A lot was said about Michigan St not getting a 1 seed, but they may be better off where they are. Utah is the weakest 3 seed in the tournament, and that's who Michigan St is paired with. In fact, the highest ranked team Michigan St would have to play before the regional finals is the 11 seed! Gonzaga is ranked #22 in the Sagarin Ratings, and is actually favored over Seton Hall in the opening round. And if they win, the Sagarin Ratings would favor them over Utah too! If your pool gives you bonus points for upsets, you should absolutely have Gonzaga in your Sweet Sixteen. And if you don't get upset points, having any of Seton Hall/Gonzaga/Utah in the Sweet Sixteen is fine. But considering literally any of them could get there, the safe play would be to have Michigan St beating whoever you picked.

I've mentioned several times now how the Pomeroy rankings have some of the lower seeds ranked higher than Sagarin does (Wichita St, Hawaii, SF Austin). The same is true here for Arkansas Little Rock and Iona. In fact, across the board all of the "mid-major" schools appear to be ranked higher in Pomeroy. And not just by a few spots: in almost all cases the difference in the rankings is 15-20 spots. So while Sagarin gives both Iowa St and Purdue about an 85% chance of winning their opening game, their probabilities of winning are 75% and 71% in Pomeroy. I've been comparing these two rankings for a couple years now, and this is the first time something like this has happened. I'm not sure what to make of it other than that Pomeroy will be predicting more first-round upsets than Sagarin. So it'll be interesting after the tournament to go back and see which one was more accurate (although it's not like a handful of games are going to give us a definitive answer either way).

Anyway, if you're looking for a dark horse Final Four team in this tournament, Purdue could be your answer. They are #9 in Sagarin and #10 in Pomeroy, which is very high for a 5 seed. According to Sagarin, they'd be about a 3-point favorite over Iowa St in the 2nd round, and only a 2-point underdog to Virginia in the Sweet Sixteen. And they beat Michigan St earlier in the season too. You could do worse than Purdue in the Final Four. Of course, they could also lose in the first round to Arkansas Little Rock, especially if the Pomeroy ratings are more accurate! But that's the chance you take going with a 5 seed.

If you want to play it safe, the way to go is Virginia vs. Michigan State in the regional finals. That game would be a coin flip, so feel free to take either team. Michigan State has actually knocked Virginia out of the NCAA tournament the last two years. Could the third time be the charm for Virginia, or will the Spartans continue to haunt them?

Final Four

Team                          Final Four   Semifinal   Champion
(1) Kansas                        44%          30%        18%
(2) Michigan St                   37%          22%        13%
(1) North Carolina                36%          20%        12%
(1) Virginia                      30%          16%         9%
(2) Villanova                     21%          13%         7%
(2) Oklahoma                      27%          13%         6%
(3) West Virginia                 22%          11%         6%
(1) Oregon                        21%           8%         3%
(5) Purdue                        14%           6%         3%
(4) Kentucky                      13%           6%         3%

The top 5 ranked teams in the Sagarin ratings are Kansas, Michigan State, North Carolina, Virginia, and Villanova. So it's no surprise that those teams have the top 5 probabilities of winning the entire tournament. But it's really wide open. Kansas is the best team, and only has an 18% chance of winning the title. That's a far cry from the 41% chance Kentucky had as the top team last year. So when you pick your champion, try to think about who other people entering your pool will choose. You want to pick a champion that few other people will pick (but still has a realistic shot). Do you live in Michigan? Then you should probably avoid Michigan State. Live in ACC country? Then maybe don't go with North Carolina or Virginia. And in general, you should avoid the high seeds that are overrated (Oregon, Xavier, Utah). But other than that, the choice is yours. Good luck and let the madness begin!

Taking the F out of the WTF test


Many moons ago, I wrote a post entitled "So, Why Does the World Trust Minitab?" In that post, I alluded to upcoming improvements to one of our statistical tests. At that time, I could not give any details, because the project was shrouded in secrecy. But now that Minitab has released version 17 of our statistical software, the story of Bonett's test can finally be told.

Suppose you want to compare the standard deviations of two samples. Previous releases of Minitab, including Release 16, gave you two handy tests, the F-test and Levene's test. The F-test is so-named because it usually fails you.1 In theory, the F-test is appropriate as long as your data are normally distributed. In practice, however, the F-test tends to be a bit on the emo side. That is, it is too sensitive (to departures from normality) to be useful for much.

Levene’s test was developed to help overcome this extreme sensitivity to nonnormality. Levene's test is sometimes called the W50 test. I sometimes call it the WTF test, for reasons that shall become apparent shortly. Take a look at the actual results below from a test of two standard deviations that I actually ran in Minitab 16 using actual data that I actually made up:

Ratio of the standard deviations in Release 16

The ratio of the standard deviations from samples 1 and 2 (s1/s2) is 1.414 / 1.575 = 0.898. This ratio is our best "point estimate" for the ratio of the standard deviations from populations 1 and 2 (Ps1/Ps2). The ratio is less than 1, which suggests that Ps2 is greater than Ps1. So far, so good.

Now, let's have a look at the confidence interval (CI) for the population ratio. The CI gives us a range of likely values for Ps1/Ps2. (The CI below labeled "Continuous" is the one calculated using Levene's method):

Confidence interval for the ratio in Release 16

Notice that the CI includes ... er, doesn't include ... the point estimate that it really should include!

What in Gauss' name is going on here?!? The range of likely values for Ps1/Ps2 (1.046 to 1.566) doesn't include the point estimate (0.898)?!? In fact, the CI suggests that Ps1/Ps2 is greater than 1. Which suggests that Ps1 is actually greater than Ps2. But the point estimate suggests the exact opposite! Which suggests that I might be losing my mind. Or that something odd is going on here. Or both.

Well, I am happy to say that I am not losing my mind. (Not this time, at least.) One reason that Levene's method is robust to nonnormality is that Levene's method isn't actually based on the standard deviation. Instead, Levene’s method is based on a statistic called the mean absolute deviation from the median, or MADM. The MADM is much less affected by nonnormality and outliers than is the standard deviation. And even though the MADM and the standard deviation of a sample can be very different, the ratio of MADM1/MADM2 is nevertheless a good approximation for the ratio of Ps1/Ps2.
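To see how the two statistics can disagree, here is a small Python sketch with made-up samples (in the spirit of the post), where one sample carries a single large outlier. The standard deviation reacts much more strongly to that outlier than the MADM does, so the two ratios end up quite different.

```python
import numpy as np

def madm(x):
    # Mean absolute deviation from the median.
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - np.median(x)))

# Made-up samples; sample 2 contains one large outlier.
s1 = np.array([4.1, 5.0, 5.2, 4.8, 5.5, 4.9])
s2 = np.array([4.9, 5.1, 5.0, 4.8, 5.2, 12.0])

print("sd ratio   s1/s2:", np.std(s1, ddof=1) / np.std(s2, ddof=1))
print("MADM ratio s1/s2:", madm(s1) / madm(s2))
```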

The only problem is that in extreme cases, outliers can affect the sample standard deviations so much that s1/s2 can fall completely outside of Levene's CI. Awwwwkwwwaaaaaard.

Fortunately, in Minitab 17, statisticians have made things considerably less awkward. (This may be the only time in your life that you'll hear the phrase "statisticians have made things considerably less awkward.") One of the brave-hearted folks in our R&D department toiled against all odds, and at considerable personal peril, to solve this enigma. The result is an effective, elegant, and non-enigmatic test that we call Bonett's test:

Confidence interval in Release 17

Like Levene's test, Bonett's test can be used with nonnormal data. But unlike Levene's test, Bonett's test is actually based on the actual standard deviations of the actual samples. And you know what that means. Yes, my friends: gone are the days of embarrassing discrepancies between the CI and the ratio—with Minitab 17, you can compare standard deviations with confidence! And as if that weren't enough, Minitab 17 also includes a handy dandy summary plot. All the information you need to interpret your test results, conveniently located right in front of your face.

Summary plot in Release 17

So what are you waiting for? Mesmerize your manager, confound your colleagues, and stun your stakeholders with Minitab 17. Operators are standing by. 

------------------------------------------------------------

 

 

1 So, that bit about the name of the F-test—I kind of made that up. Fortunately, there is a better source of information for the genuinely curious. Our white paper, Bonett's Method, includes all kinds of details about these tests and comparisons between the CIs calculated with each. Enjoy.

 

 

 

Submitting an A+ Presentation Abstract, Even About Statistics


For the majority of my career with Minitab, I've had the opportunity to speak at conferences and other events somewhat regularly. I thought some of my talks were pretty good, and some were not so good (based on ratings, my audiences didn't always agree with either—but that's a topic for another post). But I would guess that well over 90% of the time, my proposals were accepted to be presented at the conference, so even though I may not have always delivered a home run on stage, I at least submitted an abstract that was appealing to the organizers.

As chair of the Lean and Six Sigma World Conference this year, I reviewed every abstract submitted and was able to experience things from the other side of the process. Now, with the submission period upon us for the Minitab Insights Conference, I thought I'd share some insights on submitting an A+ speaking submission.

Tell A Story

People are emotional beings, and a mere list of the technical content you plan to present doesn't engage the reviewers any more than it will an audience. Connecting the topic to some story sparks an emotional interest and desire to know more. Several years ago, I presented on the multinomial test at a conference, a topic that probably would have elicited yawns if I'd pitched it as the technical details of how to perform this hypothesis test. Instead I submitted an abstract asking if Virgos were worse drivers, as stated by a well-known auto insurer, and explaining that by answering the question we can also learn how to determine if defect rates were different among multiple suppliers or error rates were different for various claims filers. Want to know if they are? I asked. Accept my talk! They did.

Nail the Title

This can be the most difficult step, but it helps to remember that organizers use the program to promote the conference and draw attendees. A catchy title that elicits interest from prospective attendees can go a long way. So, what makes for a good title? I like to reference the story I will tell and not directly state the topic. For the talk I describe above, the title was "Are Virgos Unsafe Drivers?" Note that from the title, someone considering attending has no idea yet that the talk will be about a statistical test. But they are curious and will read the description. More important, the talk seems interesting and the speaker seems engaging, and those are the criteria attendees use to decide what talks to attend. An alternate title that is more descriptive but not catchy, "The Proper Application of the Multinomial Test of Proportions," sounds like a good place to take a nap.

Reference Prior Experience

If the submission process allows it (the Minitab Insights Conference does), reference prior speaking engagements and, even better, provide links to any recordings that may exist of you speaking. Even if it is not a formal presentation, anything that enables the organizers to get a feel for your personality when speaking is a huge plus. It is fairly straightforward to assess whether a submitted talk would be of interest to attendees, but assessing whether a speaker is engaging is difficult or impossible, even though ultimately it will make a huge impact on what attendees think of the conference. The good news: you don't actually have to be an excellent presenter—the organizer's fear is that you might be a terrible speaker! Simply demonstrating that you can present clearly and engage an audience goes a long way.

Don't Make Mistakes

It is best to assume that whoever is evaluating you is a complete stranger. Imagine you ask for something from a stranger and what they send you is incomplete or contains grammatical errors or typos: what is your impression of that person? If they are submitting to speak, my suspicion is that they will likely have unprofessional slides and possibly even be unprofessional when they speak. Further, the fact that they would not take the time to review and correct the submission tells me that they are not serious about participating in the event.

Write the Presentation First

Based on experience, I believe this is not done often—but that is a mistake. True, no one wants to put hours into a presentation only to have it get rejected, but that presentation could still be used elsewhere, so the time is not necessarily wasted. Inevitably, when you prepare a presentation, new insights and ways of presenting the information come to light that greatly improve what will be presented and the story that will be told. So to tell the best story in the submission, it is immensely valuable to have already made the presentation slides! In fact, if I sorted every presentation I ever gave into buckets labeled "good" and "not so good," they would correspond almost perfectly to whether I had already made the presentation when I submitted the abstract.

Ask a Friend

Finally, approach someone you trust (and who is knowledgeable in the relevant subject area) to give you an honest opinion. Ask them what they think. Is the topic of interest to the expected attendees? Is it too simple? Too complicated? Will the example(s) resonate? After all, you don't want the earliest feedback you receive on your proposal to be from the person(s) deciding whether to accept the talk.

So that's my advice. It may seem like a big effort simply to submit an abstract, but everything here goes to good use as you prepare to actually give the presentation. It's better to put in more work at the start and get to put that work to good use later, than to put in a little work that goes to waste. Do these things and you'll be in a great position to be accepted and deliver a fantastic presentation!

Celebrating Women in Statistics


Did you know that March is Women’s History Month? The celebration was started in the 1980s by the U.S. government to pay tribute to generations of influential women.

To celebrate, here’s a roundup of just some of the most influential women in statistics:

Florence Nightingale

Florence Nightingale

While Florence Nightingale is known as the founder of modern nursing, you might not know that she is also a celebrated statistician. When tasked to serve at a British hospital in Turkey during the Crimean War, she worked to address many issues, including an overworked medical staff, neglected hospital hygiene, and inadequate supplies. By the time she and her team left Turkey in July 1856, the hospitals were improved and death rates reduced significantly. How’d she do it?

She was able to analyze medical data she collected and present it graphically, thus making it clear that lack of sanitation was the main cause of wartime death—not the short supply of medicines or lack of food. She also used data analysis to reveal that in peacetime, soldiers in England died at twice the rate of civilians, confirming that military health service inadequacies were causing more damage than was previously thought. Nightingale’s work led to health policy reforms that saved the lives of countless British soldiers.

You can read more about Nightingale in a past blog post I wrote: An Unlikely Statistician: Florence Nightingale

Gertrude Cox

Known as the “First Lady of Statistics,” Gertrude Cox was an American statistician and founder of the Department of Experimental Statistics at North Carolina State University. Later, she was appointed director of both the Institute of Statistics of the Consolidated University of North Carolina and the Statistics Research Division of North Carolina State University.

Her research dealt primarily with experimental design, and while she began to teach courses on design of experiments in 1934 at Iowa State, her design material wasn’t officially published until 1950 when she collaborated with W. G. Cochran to write Experimental Designs.

In 1949, she became the first woman elected into the International Statistical Institute, and she was elected President of the American Statistical Association in 1956.

Check out this short bio on Cox from the National Academy of Sciences—I found Cox’s early life especially interesting!

And for more on designing statistical experiments and how Minitab can help, take a look at http://support.minitab.com/minitab/17/getting-started/designing-an-experiment/

Janet Norwood

Janet Norwood was the first female commissioner of the United States Bureau of Labor Statistics, first appointed in 1979 by Jimmy Carter and reappointed twice by Ronald Reagan. Her contributions focused primarily on government statistics, including economic and political data, employment and unemployment statistics, and the Consumer Price Index.

In 1989 she was elected president of the American Statistical Association, and also held offices at the International Statistical Institute and the Urban Institute, among many other professional associations.

The American Statistical Association posted the following article on Norwood’s life: http://www.amstat.org/about/statisticiansinhistory/bios/norwoodjanet.pdf

Who are other influential women in statistics? Tell us in the comments!

Illustration of Florence Nightingale courtesy of Wellcome Images, used under Creative Commons 4.0 license.

 

Gage R&R Metrics: What Do They All Mean?


When you analyze a Gage R&R study in statistical software, your results can be overwhelming. There are a lot of statistics listed in Minitab's Session Window—what do they all mean, and are they telling you the same thing?

If you don't know where to start, it can be hard to figure out what the analysis is telling you, especially if your measurement system is giving you some numbers you'd think are good, and others that might not be. I'm going to focus on three different statistics that are often confused when reading Gage R&R output.

The first thing to look at is the %Study Variation and the %Contribution.

Gage R&R output

You could look at either of them, as they are both telling you the same thing, just in a different way. By definition, the %Contribution for a source is 100 times the variance component for that source divided by the Total Variation variance component. This calculation has the benefit of making all of your sources of variability add up to 100%, which can make things easy to interpret.

The %Study Variation does not sum up to 100% like %Contribution, but it does have other benefits. %Contribution is based on the variance component that is specific to the values you observed in your study, not what the population of values might be. In contrast, the %Study Variation, by taking 6*standard deviation, extrapolates out over the entire population of values (based on the observed values, of course).

The bottom line is that both % Study Variation and %Contribution are telling you, in simple terms, about the percentage of variation in your process attributable to that particular source. 

What about %Tolerance? What does that allow us to look at? While %StudyVar and %Contribution compare the variation from a particular source to the total variation, the %Tolerance compares the amount of variation from a source to a specified tolerance spread. This can lead to seemingly conflicting results, such as getting a low %StudyVar while having a high %Tolerance. In this case, your gage system may be introducing low levels of variability compared to other sources, but the amount of variation is still too much based on your spec limits. The %Tolerance column may be more important to you in this case, as it's more specific to your actual product and its spec limits. 
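To make the arithmetic behind those percentages concrete, here is a minimal Python sketch, assuming you already have variance components from the study. The component values and the tolerance below are invented for illustration, not taken from the output shown above.

```python
# Invented variance components and tolerance, for illustration only.
var_components = {"Repeatability": 0.040, "Reproducibility": 0.010, "Part-to-Part": 0.950}
total_var = sum(var_components.values())
tolerance = 8.0                                   # hypothetical spec range (USL - LSL)

for source, vc in var_components.items():
    sd = vc ** 0.5
    pct_contribution = 100 * vc / total_var       # variance components, so these sum to 100%
    pct_study_var = 100 * sd / total_var ** 0.5   # ratio of 6*SD terms; the 6s cancel
    pct_tolerance = 100 * (6 * sd) / tolerance    # study variation compared to the spec spread
    print(f"{source:15s} {pct_contribution:6.1f}% {pct_study_var:6.1f}% {pct_tolerance:6.1f}%")
```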

So, a short summary:

%Contribution: The percentage of variation due to the source compared to the total variation, but with the added benefit that all sources will sum to 100%

%StudyVar: The percentage of variation due to the source compared to the total variation, but with the added benefit of extrapolating beyond your specific data values. 

%Tolerance: The percentage of variation due to the source compared to your specified tolerance range.

The %StudyVar is perhaps more reliant on having a good quality study and can be used when your goal is improving the measurement system. On the other hand, %Tolerance can be used when the focus is on the measurement system being able to do its job and classify parts as in or out of spec.

Each of these statistics provides valuable information, and how you weigh each of them largely depends on what you're looking to get out of your study.

The American Statistical Association's Statement on the Use of P Values


P values have been around for nearly a century and they’ve been the subject of criticism since their origins. In recent years, the debate over P values has risen to a fever pitch. In particular, there are serious fears that P values are misused to such an extent that it has actually damaged science.

In March 2016, spurred on by the growing concerns, the American Statistical Association (ASA) did something that it has never done before and took an official position on a statistical practice—how to use P values. The ASA tapped a group of 20 experts who discussed this over the course of many months. Despite facing complex issues and many heated disagreements, this group managed to reach a consensus on specific points and produce the ASA Statement on Statistical Significance and P-values.

I’ve written previously about my concerns over how P values have been misused and misinterpreted. My opinion is that P values are powerful tools but they need to be used and interpreted correctly. P value calculations incorporate the effect size, sample size, and variability of the data into a single number that objectively tells you how consistent your data are with the null hypothesis. You can read my case for the power of P values in my rebuttal to a journal that banned them.

The ASA statement contains the following six principles on how to use P values, which are remarkably aligned with my own. Let’s take a look at what they came up with.

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

I discuss these ideas in my post How to Correctly Interpret P Values. It turns out that the common misconception stated in principle #2 creates the illusion of substantially more evidence against the null hypothesis than is justified. There are a number of reasons why this type of P value misunderstanding is so common. In reality, a P value is a probability about your sample data and not about the truth of a hypothesis.

  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

In statistics, we’re working with samples to describe a complex reality. Attempting to discover the truth based on an oversimplified process of comparing a single P value to an arbitrary significance level is destined to have problems. False positives, false negatives, and otherwise fluky results are bound to happen.

Using P values in conjunction with a significance level to decide when to reject the null hypothesis increases your chance of making the correct decision. However, there is no magical threshold that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. You can see a graphical representation of why this is the case in my post Why We Need to Use Hypothesis Tests.

When Sir Ronald Fisher introduced P values, he never intended for them to be the deciding factor in such a rigid process. Instead, Fisher considered them to be just one part of a process that incorporates scientific reasoning, experimentation, statistical analysis and replication to lead to scientific conclusions.

According to Fisher, “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”

In other words, don’t expect a single study to provide a definitive answer. No single P value can divine the truth about reality by itself.

  4. Proper inference requires full reporting and transparency.

If you don’t know the full context of a study, you can’t properly interpret a carefully selected subset of the results. Data dredging, cherry picking, significance chasing, data manipulation, and other forms of p-hacking can make it impossible to draw the proper conclusions from selectively reported findings. You must know the full details about all data collection choices, how many and which analyses were performed, and all P values.

Comic about jelly beans causing acne with selective reporting of the results

In the XKCD comic about jelly beans, if you didn’t know about the post hoc decision to subdivide the data and the 20 insignificant test results, you’d be pretty convinced that green jelly beans cause acne!

  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of an effect.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

I cover these ideas, and more, in my Five Guidelines for Using P Values. P-values don’t tell you the size or importance of the effect. An effect can be statistically significant but trivial in the real world. This is the difference between statistical significance and practical significance. The analyst should supplement P values with other statistics, such as effect sizes and confidence intervals, to convey the importance of the effect.

Researchers need to apply their scientific judgment about the plausibility of the hypotheses, results of similar studies, proposed mechanisms, proper experimental design, and so on. Expert knowledge transforms statistics from numbers into meaningful, trustworthy findings.


How to Remove Leading or Trailing Spaces from a Data Set


Leading and trailing spaces in a data set are like termites in your house. If you don’t realize they are there and you don’t get rid of them, they’re going to wreak havoc.

Here are a few easy ways to remove these pesky characters with Minitab Statistical Software prior to analysis.

Data Import

If you’re importing data from Excel, a text file, or some other file type:

  1. Choose File > Open and select your Excel file, text file, etc.
  2. Click Options and select Remove nonprintable characters and extra spaces.
  3. Click OK.

Note: This feature was introduced in Minitab 17.3. If you have an older version of Minitab 17, use Help > Check for Updates. If you have Minitab 16 or earlier—or you don't have Minitab at all—you can download a free 30-day trial.

The Calculator

Suppose you already have your data in Minitab, located in column C1:

  1. Choose Calc > Calculator.
  2. In Store result in variable, enter a blank column (e.g. C5), or you can overwrite an existing column.
  3. In Expression, enter TRIM(C1).
  4. Click OK.

If you also want to remove all non-printable characters using the Calculator, CLEAN is available as well.
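If your data pass through Python on the way to Minitab, a rough pandas equivalent of TRIM and CLEAN is sketched below; the column name and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"Operator": ["  Smith", "Jones  ", " L\x07ee "]})   # hypothetical text column
df["Operator"] = df["Operator"].str.strip()                            # like TRIM: drop leading/trailing spaces
df["Operator"] = df["Operator"].str.replace(r"[^\x20-\x7E]", "", regex=True)  # like CLEAN: drop non-printable characters
print(df)
```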

Calculator

And that’s all there is to it.

Minitab 17.3 Takes a Quantum Leap with Data Import


If you watched television between 1989 and 1993, then you might have had the chance to see original episodes of the television series Quantum Leap.

The premise was that a scientist involved in a time-travel experiment gets sent into the bodies of people from the past and has the opportunity to improve the future with his actions. Most of us might not ever get to do something as dramatic as steal a million dollars from a crook to return to swindled investors, but making data import easier is still a pretty good way to make the future better.

Keep your column titles in the right row

We’ve all done it. We’ve all messed up and copied our titles into our data rows.

(animation: column titles accidentally pasted into the data rows)

It’s like being the co-pilot on a plane with two newlyweds when you find out that the wife is suffering from appendicitis, your instruments don’t function because you’re in the Bermuda Triangle, and the pilot who’s supposed to be flying the plane is suffering from post-traumatic stress disorder.

In the past, you would have thrown up your hands and started over. But what if you had the power to prevent the tragedy in the first place? What if you could put right what once went wrong?

Here’s what happens if you paste column titles into the first data row in Minitab 17.3:

(animation: the same copy-paste error caught in Minitab 17.3)

Not only does Minitab stop and let you know what’s happening, it also puts your column labels and your data in the right place so that you can get on with your analysis. It’s kind of like having a holographic mentor with a supercomputer who can swoop in and tell you when an unscrupulous pageant photographer is about to try to blackmail a contestant he has embarrassing photos of.

Match case as you open an Excel file

I’ve intentionally made an error in this dataset so that I can show you what Minitab can do, but it’s an easy enough mistake to make. Like when you lend your car to a friendly bartender so he can give someone a ride and the car blows up because the mob is trying to kill you.

Say you notice, or even suspect, that not all of your data are typed with the same mix of capital and little letters.

dialog

Normally, Minitab would treat them as different data points because they have different characters. But when you click Options, you can choose to have Minitab correct case mismatches for you. That’s almost as nice as having someone give you instructions about where to punch when you’re temporarily blinded in a fight with an angry biker so that you can come out the victor.

Minitab 17.3 will also let you remove blank rows and make column lengths equal, which can help you be ready to analyze your data even faster.
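For comparison only (this is not what Minitab does internally), here is a quick pandas sketch of the same kind of cleanup on a made-up data frame: normalizing case and dropping completely blank rows.

```python
import pandas as pd

# Made-up data with inconsistent case and a completely blank row.
df = pd.DataFrame({"Machine": ["Line A", "line a", "LINE B", None],
                   "Output":  [102, 98, 110, None]})

df["Machine"] = df["Machine"].str.strip().str.title()   # fix case mismatches: "line a" -> "Line A"
df = df.dropna(how="all")                               # remove completely blank rows
print(df)
```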

It may not help you go back in time to prevent a child from shooting his neighbor who he mistakes for a Russian invader during the Cuban Missile Crisis, but we think it’s pretty good stuff.

 

A DOE in a Manufacturing Environment (Part 1)

I used to work in the manufacturing industry. Some processes were so complex that even a very experienced and competent engineer would not necessarily know how to identify the best settings for the manufacturing equipment.

You could make a guess at the optimal settings using a general idea of what should be done, but that is not sufficient. You need very precise indications of the correct process parameters, given the specific characteristics of the manufacturing equipment.

Or you could guess the best settings, assess the results, and then try to further improve the settings by modifying one factor at a time. But this could be a lengthy process, and you still might not find the optimal settings.

Even if you have the optimal settings for current conditions, to ensure that the plant remains at the leading edge, new technologies will need to be adopted -- introducing unknown situations and new problems. Ultimately, you need a comprehensive model that helps you understand, very precisely, how the system works. You obviously cannot get that by adjusting one factor at a time and experimenting in a disorganized way.

Engineers need a tool that can help them build models in a very practical, cost-effective, and flexible way, so that they can be sure that whenever a new product needs to be manufactured the right process settings are quickly identified. Design of Experiments (DOE) is the ideal tool for that.

Choosing the Right Design

Wafer
Polishing processes in the semiconductor industry are one example of a very complex and critical manufacturing process. The objective is to produce wafers that are as flat as possible in order to maintain high yield rates. High first-pass yield rates are absolutely essential to ensure that a plant is cost-effective, and that the customer is satisfied with product quality.

I was involved in a DOE in the polishing sector of a microelectronics facility. "Liner" residues after wafer polishing (under-polishing at the centre) made the yield rate and consistency for a particular material unsatisfactory. The degree of uniformity had to be improved, since inadequate polishing/planarization impacted subsequent manufacturing processes.

Six process parameters were considered. Four of them (Down Force, Back Force, Carrier Velocity, and Table Velocity) will be familiar to polishing engineers and are easy to adjust. The other two (Type of Pad, Oscillations) were more innovative, but they did not necessarily involve a significant increase in the process costs.

The main objective was achieving a higher degree of uniformity, but it was also important to maintain a high level of productivity.

Our first step was to identify the parameter settings to test. To remain realistic, they could not be set too far apart, but to easily identify whether a parameter had a real effect, they could not be set too close together, either. For each parameter we selected two levels, making our experiment a two-level DOE. This enabled us to reduce the number of tests that needed to be performed, and we assumed that the effects over the chosen range of levels were likely to be mostly linear.

Next we needed to consider the experimental design and the number of tests we would need to perform. This is very important because the amount of time available to do the experiments is often very limited, and also because we want to minimize the number of parts that are manufactured during the tests, since those parts cannot be sold to customers.

The table below is very helpful in choosing the right design. It is part of the Factorial DOE menu in Minitab. (Go to Stat > DOE > Factorial design > Create a factorial design. In the dialog box that is displayed, choose Display available designs.)

 Display available designs

With six parameters, we could go for a 64-run full factorial design (2^6) in the table, but that would be too expensive. We could choose the half-fraction (2^(6-1)) design, but 32 runs is still too expensive. So we chose a 16-run, quarter-fraction design (2^(6-2)).

The green zone, in the table above, is the safest choice, but this is an expensive option, so our 16-run design is the best compromise in terms of cost and risk.

The 8-run design (2^(6-3)) would have been even cheaper, but the red color indicates a higher degree of risk: with just 8 runs, some main effects are confounded with two-factor interactions, making the analysis more difficult and riskier. The 16-run design is safer (as the yellow color suggests): its main effects are not confounded with two-factor interactions, so we can estimate them with little risk.

Two-factor interactions in this design are sometimes confounded with other two-factor interactions. But if such a confounded interaction effect is significant, we would expect the interactions that involve statistically significant main effects to be the likely cause, rather than the confounded interactions that involve no significant main effects. This is called the 'heredity' principle, and it is very helpful for identifying the significant interaction when it is confounded with other two-factor interactions.
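For readers who want to see the structure of such a design, here is a short Python sketch that builds a 16-run 2^(6-2) design from a 2^4 base using the common generators E = ABC and F = BCD. Both the choice of generators and the mapping of letters to our six process parameters are assumptions for illustration; Minitab's default generators may differ.

```python
from itertools import product

runs = []
for a, b, c, d in product((-1, 1), repeat=4):   # full 2^4 base design: 16 runs
    e = a * b * c                               # generator E = ABC
    f = b * c * d                               # generator F = BCD
    runs.append((a, b, c, d, e, f))

# Hypothetical mapping of the design letters to the six process parameters.
header = ("DownForce", "BackForce", "CarrierVel", "TableVel", "Pad", "Oscillation")
print(header)
for run in runs:
    print(run)
```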

I'll explain more about this DOE in my next post!
 
 

A DOE in a Manufacturing Environment (Part 2)


In my last post, I discussed how a DOE was chosen to optimize a chemical-mechanical polishing process in the microelectronics industry, a process that strongly affects the plant's final manufacturing yields. We selected an experimental design that let us study the effects of six process parameters in 16 runs.

Analyzing the Design

Now we'll examine the analysis of the DOE results after the actual tests have been performed. Our objective is to minimize the amount of variability (minimize the Std Dev response) to achieve better wafer uniformity. At the same time we would like to minimize cycle times by increasing the Removal Rate of the process (maximize the V A°/Min response).

Therefore, we are dealing with a multi-response DOE.

Design array

 
From the DOE data shown above, we first built a model for the Removal Rate (V A°/Min): Down Force, Carrier velocity, and Table velocity had significant effects, as the Pareto chart below clearly shows. The Down Force*Carrier velocity and Carrier velocity*Table velocity two-factor interactions were also significant. We then gradually eliminated the remaining, non-significant parameters from the model.
 

Removal Rate Pareto
  
A process expert later confirmed that such effects and interactions were logical and could have been expected from our current knowledge of this process. The graph below shows the main effect plots (Removal Rate response).
  Interactions

   
We had chosen a fractional DOE, so several two-factor interactions were confounded with one another. However, the Down Force*Carrier velocity and the Carrier velocity*Table velocity interactions made more sense from a process point of view; besides, these two interactions were associated with very significant main effects.

Interaction Plot for V A/Min

We built a second model for the standard deviation response. Standard deviations do not necessarily follow a normal distribution, so a log transformation is needed to bring the standard deviation data closer to normality.
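Outside Minitab, that step can be sketched roughly as follows. The file and column names are hypothetical placeholders; the terms in the reduced model are the ones identified just below.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumed worksheet layout: coded factors and measured responses in columns
# such as down_force, carrier_v, table_v, pad, std_dev (hypothetical names).
df = pd.read_csv("polishing_doe.csv")        # placeholder data source

# Standard deviations are rarely normal, so model their natural log instead.
df["ln_std_dev"] = np.log(df["std_dev"])

# Reduced model with the significant terms: Carrier velocity, Table velocity,
# Pad, and the Pad*Carrier velocity interaction.
model = smf.ols("ln_std_dev ~ carrier_v + table_v + pad + pad:carrier_v",
                data=df).fit()
print(model.summary())
```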

Carrier velocity, Pad and Table velocity as well as the Pad*Carrier velocity interaction had significant effects on the logarithm of the Standard deviation response, as shown in the Pareto graph below. Again this was quite logical and confirmed our process knowledge.

 Std Dev Pareto
   
The main factor effects are shown in the graph below (Log of Standard Deviation).
  
Main effects

The interaction effect is shown in the graph below (Logarithm of Standard Deviation response).

Interaction Plot for LN

We then compared the predictions from our two models to the real observations from the tests, in order to assess the validity of the models. We were interested in identifying observations that were not consistent with our models: outliers due to changes in the process environment during the tests often occur, and they may bias statistical analyses.

The residual plots below represent the differences between real experiment values and predictions based on the model. Minitab provides four different ways to look at the residuals, in order to help assess whether these residuals are normally and randomly distributed.
  
Residuals for Std Dev
  
In these diagrams, the residuals seem to be generally random (shown by the right hand plots, which display the residuals against their fitted values and their observation order) and normally distributed (displayed by the probability plot on the left).
 
But the observation in row 16 (the blue point that has been selected and brushed in the plot below) looks suspicious as far as the residuals for the removal rate (V A°) response are concerned. It is positioned farther away from the other residual values. We used the brushing functionality in Minitab to get more information about this point (see the values in the table below). We then eliminated observation number 16, and reran the DOE analysis...but the final Removal Rate model was still very similar to the initial one.
  
Residuals for Removal Rate

From our two final models, we used the process optimization tool within Minitab to identify a global process optimum. Three terms were included in the Standard Deviation model (V Carrier: E, Pad: D, and V Table: F), and three terms plus some interactions were included in the Removal Rate (V A°) model (Down Force: A, V Carrier: E, and V Table: F).
  
Finding the best compromise while considering conflicting objectives is not always easy, especially when different models contain the same parameters. Reducing the standard deviation to improve yields was more important than minimizing cycle times. Therefore, in the Minitab optimization tool, we assigned a score of 3 in terms of importance to the standard deviation response and a score of only 1 to the V A° (Removal Rate) response.
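Minitab's Response Optimizer handles this weighting internally, but the underlying idea can be sketched with a simple composite desirability calculation. Everything numeric below (the target ranges and the predicted responses) is a made-up placeholder; only the 3-to-1 importance weighting comes from the analysis described above.

```python
import numpy as np

def d_smaller_is_better(y, low, high):
    """Desirability for a response we want to minimize (1 at/below low, 0 at/above high)."""
    return np.clip((high - y) / (high - low), 0.0, 1.0)

def d_larger_is_better(y, low, high):
    """Desirability for a response we want to maximize (0 at/below low, 1 at/above high)."""
    return np.clip((y - low) / (high - low), 0.0, 1.0)

# Hypothetical model predictions at one candidate setting (placeholder numbers).
pred_std_dev      = 55.0     # want small
pred_removal_rate = 3200.0   # want large

d1 = d_smaller_is_better(pred_std_dev, low=40.0, high=120.0)
d2 = d_larger_is_better(pred_removal_rate, low=2500.0, high=4000.0)

# Importance weights 3 (std dev) and 1 (removal rate): the composite
# desirability is a weighted geometric mean of the individual scores.
w = np.array([3.0, 1.0])
composite = np.exp(np.sum(w * np.log([d1, d2])) / np.sum(w))
print(round(float(composite), 3))
```

Because the composite score is a weighted geometric mean, a poor value on the heavily weighted standard deviation response drags the overall score down quickly, which is exactly the behavior we wanted.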

optimize responses
  
Define optimization

Optimization tool
 
The Minitab Optimization Tool results indicate that the Down Force needs to be maximized to increase the Removal Rate (reduce Cycle time), whereas Carrier and Table velocities as well as Pad need to be kept low in order to achieve a smoother, uniform surface (a small standard deviation).  

Confirmation tests run at the settings recommended by the optimization tool were consistent with the conclusions that had been drawn from the DOE.

Conclusion

This DOE proved to be a very effective way to identify the factors that had real effects, making them clearly emerge from the surrounding process noise. The analysis gave us a pragmatic yet accurate approach to adjusting and optimizing the polishing process, and its conclusions could easily be extended to the operating process.

 

Are You Putting the Data Cart Before the Horse? Best Practices for Prepping Data for Analysis, ...


Most of us have heard a backwards way of completing a task, or doing something in the conventionally wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing a cart.

This saying may be especially true in the world of statistics. Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You may then find yourself trying to force your data to fit an analysis, particularly when the data has not been set up properly. It’s far more efficient to first make sure your data are reliable and then allow your questions of interest to guide you to the right analysis.

Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work—or can’t be trusted.

As a quality practitioner, you’re likely to be involved in many activities—establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and much more.

All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a Gage R&R study to verify your measurement systems are valid, or you need to understand how machine failures impact downtime.

Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:

  • What problem am I trying to solve?
  • Is my data set up in a way that will be useful for answering my question?
  • Did I make any mistakes while recording my data?

Utilizing process knowledge can also help you answer questions about your data and identify data entry errors. A focus on preparing and exploring your data prior to an analysis will not only save you time in the long run, but will help you obtain reliable results.

So then, where to begin with best practices for prepping data for an analysis? Let’s look no further than your data.

Clean your data before you analyze it

Let’s assume you already know what problem you’re trying to solve with your data. For instance, you are the area supervisor of a manufacturing facility, and you’ve been experiencing lower productivity than usual on the machines in your area and want to understand why. You have collected data on these machines, recording the amount of time a machine was out of operation, the reason for the machine being down, the shift number when the machine went down, and the speed of the machine when it went down.

The first step toward answering your question is to ensure your data are clean. Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure your analyses and results—and the decisions you make—are reliable.

With the latest update to Minitab 17, an improved data import helps you identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner the software recognizes, remove blank rows and extra spaces, and more. When importing your data, you see a preview as a reminder to get it into the best possible state before it finds its way into Minitab. The preview helps you spot mistakes made during data collection, and the import tool automatically corrects mistakes you don't notice or that are difficult to find in large data sets.

Data Import

Minitab offers a data import dialog that helps you quickly clean and format your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.
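If you ever need to do the same kind of tidying outside Minitab, the checks are easy to approximate. The sketch below uses pandas with hypothetical file and column names, just to show the sort of clean-up the import dialog automates.

```python
import pandas as pd

# Hypothetical downtime log; the real worksheet columns may differ.
df = pd.read_excel("machine_downtime.xlsx")

# Drop rows that are entirely blank and strip stray whitespace from text columns.
df = df.dropna(how="all")
text_cols = df.select_dtypes(include="object").columns
df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())

# Fix case mismatches so "Jam", "jam", and "JAM" count as one downtime reason.
df["Reason"] = df["Reason"].str.title()

# Make sure numeric columns really are numeric; bad entries become missing values.
df["Downtime (min)"] = pd.to_numeric(df["Downtime (min)"], errors="coerce")

print(df["Reason"].value_counts())
```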

If you’d rather copy and paste your data from Excel, Minitab will ensure you paste your data in the right place. For instance, if your data have column names and you accidentally paste your data into the first row of the worksheet, your data will all be formatted as text—even when the data following your column names are numeric! With Minitab 17.3, you will receive an alert that your data is in the wrong place, and Minitab will automatically move your data where it belongs. This alert ensures your data are formatted properly, preventing you from running into the problem during an analysis and saving you time manually correcting every improperly formatted column.

Copy Paste Warning

Pasting your Excel data in the first row of a Minitab worksheet will trigger this warning, which safeguards against improperly formatted columns.

This is only the beginning! Minitab makes it quick and painless to begin exploring and visualizing your data, offering more insights and ease once you get to the analysis. If you’d like to learn additional best practices for prepping your data for any analysis, stay tuned for my next post where I’ll offer tips for exploring and drawing insights from your data!

What Are the Odds? Chutes and Ladders


Allow me to make a confession up front: I won't hesitate to beat my kids at a game.

My kids are young enough that in pretty much any game determined predominantly by skill rather than luck, I can beat them—and beat them easily. This isn't some macho thing that makes me feel good, and I suppose it's only partially based on wanting them to learn to handle both winning and losing well. It's just how I play, and in any event most kids' games are designed with enough luck that a young child has a chance to beat an adult. I've observed other parents who let their kids win at these games of skill—maybe not every time, but enough to make the game seem more fair than it really is.

For both me and my bigger-hearted friends, Chutes and Ladders is a fantastic game. Why? It's not 10% luck, or 50%, or even 90% luck. Chutes and Ladders is 100% luck. You can't be good at it or bad at it. You can't try to win or lose. I can have no mercy and want to win as much as anything and it won't help me at all. And a parent who can't stand the idea of beating a kid at a game can (I hope) let that stress go, because they couldn't let their opponent win if they wanted to.

Modeling human decision-making with statistics can be very tough. If I wanted to make a simulation of three people playing Monopoly, for example, you can imagine the complexity in doing so accurately. But modeling a game like Chutes and Ladders is much easier. So much easier, in fact, that I went ahead and did it in Minitab Statistical Software using a concept known as Markov Chains.

So let's learn a little bit about the odds associated with Chutes and Ladders...

How the Odds Work in Chutes and Ladders

Here is a standard Chutes and Ladders board:

chutes

The game is simple: you roll a die (or use a spinner) that we can assume gives you an equal probability of obtaining a 1, 2, 3, 4, 5, or 6, and that's the number of spaces you move from your current position. If you stop on a space with a ladder that extends up, you go up the ladder. If you stop on a space with a chute that goes down, you descend. Ultimately, you must land exactly on the 100 space to win the game.
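Before getting to the results, here is a condensed sketch of the Markov-chain setup behind them. The chute and ladder positions listed in the code follow the commonly published Milton Bradley board and are included only for illustration; the transition matrix and the absorbing-chain algebra are the standard approach, but the exact numbers depend on that board layout.

```python
import numpy as np

# Chute and ladder jumps (start square -> end square) for the commonly
# published board layout; double-check these against your own board.
JUMPS = {1: 38, 4: 14, 9: 31, 21: 42, 28: 84, 36: 44, 51: 67, 71: 91, 80: 100,          # ladders
         16: 6, 47: 26, 49: 11, 56: 53, 62: 19, 64: 60, 87: 24, 93: 73, 95: 75, 98: 78}  # chutes

N = 101                        # states 0 (off the board) through 100 (the winning square)
P = np.zeros((N, N))           # one-turn transition matrix
for s in range(100):
    for roll in range(1, 7):
        t = s + roll
        if t > 100:
            t = s              # a spin that overshoots 100 leaves you where you are
        P[s, JUMPS.get(t, t)] += 1 / 6
P[100, 100] = 1.0              # square 100 is absorbing

# Probability that a single player has finished within 7 turns.
start = np.zeros(N)
start[0] = 1.0
within_7 = (start @ np.linalg.matrix_power(P, 7))[100]
print(f"P(finish in 7 turns or fewer): {within_7:.5f}")

# Expected number of turns for one player, from the fundamental matrix (I - Q)^-1.
Q = P[:100, :100]
expected_turns = np.linalg.solve(np.eye(100) - Q, np.ones(100))
print(f"Expected turns for one player: {expected_turns[0]:.1f}")
```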

The Fastest (and Slowest) Win

The fewest turns you can possibly use to reach the 100 space is 7, and this only occurs in 0.151% of player-games, or about 1 in 662. Now those are the odds for each player—if you have multiple players in the game (and I hope no one plays Chutes and Ladders alone), then the odds are higher that at least one player finishes in the minimum 7 moves. Specifically, the odds are 1 − (1 − 0.00151)ⁿ, where n is the number of players. For example, with three players the odds jump to a whopping 0.452%, or about 1 in 221 games.

Technically you could have an unlimited number of turns without ever reaching the 100 space, so there is no theoretical "most" turns it can take. But obviously as you take more turns, the odds of not winning keep decreasing. So instead let's just look at some cutoffs:

  • 90% of player-games will finish before turn 72
  • 95% of player-games will finish before turn 89
  • 99% of player-games will finish before turn 128

Again, that's for just one player. For multiple players, here is a graph of the number of spins until someone wins and the corresponding odds:

So if you're not looking for a drawn-out game, it helps immensely to recruit more players! With four players, the 99th percentile drops from 128 spins to just 44 spins.

How Many Moves Are Expected?

Now that the extreme cases have been covered, consider the more basic question of how many moves are expected before the game is won. First it's worth considering whether we really want the expected value (the mean) or the median. For one player, it is simple to answer both:

  • The mean number of moves before a single player wins is about 39.
  • The median number of moves is 32.

Based on this it can be seen that the distribution of moves is skewed right and the extremely long games are raising the mean number of moves well above the median.

But what about a game with more than one player?

  • A game with two players would end in a median 23 moves and a mean 26.3 moves.
  • A game with three players would end in a median 20 moves and a mean 21.7 moves.
  • A game with four players would end in a median 18 moves and a mean 19.3 moves.

The distribution of game length becomes less skewed as the number of players increases (the game ends as soon as the first player finishes, which trims the long right tail), and therefore the mean moves closer to the median. So while increasing players decreases the median number of moves only a little bit, it greatly reduces the chances that a game will require a large number of turns.

All Spaces Are Not Created Equally

I once found myself maybe 20 spins into a game, and yet still on the bottom row. Space 6, to be exact. Not surprisingly, if you examine the board above, you find a chute that ends on space 6. As it turns out, not only is there a path that can get you all the way to the 100 spot in just a few spins, there is also a path that can be devastating. Specifically, you can take a spin while on space 97, and within a few spins find yourself back to space 6 thanks to an unfortunate series of chutes.

The chutes and ladders on the board mean some spaces are much more likely to have a player on them at any given time than others. Consider a ladder: there are multiple spaces you can spin from and land at the bottom of a particular ladder, but everyone who lands on that ladder ends up on the same spot. To illustrate the distribution of spaces occupied after a certain number of spins by a single player, I created a bubble plot where the size of each bubble corresponds to the probability of a player being on that space after each of the first 40 spins:

From the plot it can be seen that although the odds are low, even after 40 spins you might find yourself still on space 6. Rows of bubbles larger than those around them correspond to the end points of the various chutes and ladders, giving them higher probability than other points.

A Statistician's Take on Chutes and Ladders

While certain games like blackjack or poker might allow a player to improve their ability by understanding the odds, Chutes and Ladders is entirely based on luck and no such advantage can be gained. However, that doesn't mean there is nothing to learn by examining the odds. For example, if you really dislike the game but have a young child who always wants to play, you now know that encouraging another parent or a sibling to join in can really help prevent a never-ending game!

 

Game-play photo by Ben Husmann, used under Creative Commons 2.0. 

Linking Minitab to Excel to Get Fast Answers


Since opening a new office in Phoenix to support our customers on the West Coast, some evenings in Minitab technical support feel busier than others. (By evenings, I mean after 5:30 p.m. Eastern time, when the members of our tech support team in Pennsylvania go home for the day, and I become an office of one.)

The variability in terms of days that felt extremely busy versus days that didn’t seemed unpredictable, so I decided to keep track of that information in an Excel spreadsheet, which I’ve been updating each evening:

After gathering this information for several months, I used the data to make a few graphs in Minitab to see if any particular days were busiest. The graphs were fun, but not exactly what I needed. I wanted an easy way to make Minitab produce the graphs automatically each morning, so that they reflect the most up-to-date information.

In this post, I’ll show you the steps I took to link my Excel file to my Minitab worksheet and how I automated the generation of the graphs. You can do the same thing with any data you record regularly in Excel spreadsheets. 

Creating a DDE Link from Excel to Minitab

The first step was to create a DDE link from Excel to Minitab. To do that, I highlighted and copied a range of cells from Excel, beginning with the first row of data, and extending well beyond my last row of data (I went down to row 500):

After copying the data from Excel, I navigated to my Minitab worksheet, clicked in the column where I want to link the data, and then used Edit > Paste Link:

After creating the link, the data is automatically imported from Excel into Minitab:

Since I have three columns to link from Excel to Minitab, I repeated the copy/paste process again for the two other columns, until all three columns were linked. I also added titles to the columns in my Minitab worksheet:

Now with the links in place, any time I update my Excel file, the data is automatically updated in Minitab.  Since the data is being transferred from Excel to Minitab, one important thing to remember is that for these links to continue working, Excel must be opened before opening Minitab each day.

Adding a Macro to the DDE Links

As a next step, I created a Minitab macro with the commands needed to manipulate the data that is imported and generate the graphs.

After saving the commands for the graphs I wanted to create in a GMACRO titled busydays.mac, I used the Edit menu shown below to add my macro to my DDE link:

The Manage Links menu shows the links for each column in the order in which the columns were linked.  First I linked C1, then C2, and then C3, so the last link listed corresponds to C3, which is the last column of data that is imported.  Therefore, I’ll add my macro to that link so that my graphs will be generated after all the data is imported by highlighting that option and clicking the Change button:

After opening the link, I just added the macro to the Commands field—the % symbol tells Minitab to look for the Busydays macro in my default macro location.  Finally I clicked the Change button to save the change to the link:

As a final step, I saved the Minitab project file with all the links that I added.

Now each morning when I come to the office, I open the Excel file first, then open my Minitab project file and I just watch the magic happen:


Best Way to Analyze Likert Item Data: Two Sample T-Test versus Mann-Whitney


Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You've run into the Likert scale if you've ever been asked whether you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups.

Because Likert item data are discrete, ordinal, and have a limited range, there’s been a longstanding dispute about the most valid way to analyze Likert data. The basic choice is between a parametric test and a nonparametric test. The pros and cons for each type of test are generally described as the following:

  • Parametric tests, such as the 2-sample t-test, assume a normal, continuous distribution. However, with a sufficient sample size, t-tests are robust to departures from normality.
  • Nonparametric tests, such as the Mann-Whitney test, do not assume a normal or a continuous distribution. However, there are concerns about a lower ability to detect a difference when one truly exists.

What’s the better choice? This is a real-world decision that users of statistical software have to make when they want to analyze Likert data.

Over the years, a number of studies have tried to answer this question. However, they've tended to look at a limited number of potential distributions for the Likert data, which limits the generalizability of their results. Thanks to increases in computing power, simulation studies can now thoroughly assess a wide range of distributions.

In this blog post, I highlight a simulation study conducted by de Winter and Dodou* that compares the capabilities of the two sample t-test and the Mann-Whitney test to analyze five-point Likert items for two groups. Is it better to use one analysis or the other?

The researchers identified a diverse set of 14 distributions that are representative of actual Likert data. The computer program drew independent pairs of samples to test all possible combinations of the 14 distributions. All in all, 10,000 random samples were generated for each of the 98 distribution combinations! The pairs of samples were analyzed using both the two-sample t-test and the Mann-Whitney test to compare how well each test performed. The study also assessed different sample sizes.
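The full study is well beyond a blog post, but a scaled-down version of this kind of simulation is easy to sketch. The two Likert distributions below are placeholders I made up (they are identical, so every rejection is a false positive); they are not the distributions the authors used.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(1)
scale = np.arange(1, 6)                    # 1 = strongly disagree ... 5 = strongly agree

# Two illustrative Likert response distributions (probabilities sum to 1).
p_group1 = [0.10, 0.20, 0.30, 0.25, 0.15]
p_group2 = [0.10, 0.20, 0.30, 0.25, 0.15]  # identical populations

n, reps, alpha = 30, 5000, 0.05
t_rejects = mw_rejects = 0
for _ in range(reps):
    x = rng.choice(scale, size=n, p=p_group1)
    y = rng.choice(scale, size=n, p=p_group2)
    if ttest_ind(x, y).pvalue < alpha:
        t_rejects += 1
    if mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        mw_rejects += 1

# With identical populations, both observed Type I error rates should land
# near the 5% significance level.
print(f"t-test:       {t_rejects / reps:.3f}")
print(f"Mann-Whitney: {mw_rejects / reps:.3f}")
```

With identical populations, both rejection rates should settle near the 5% target, which is essentially the Type I error result described next.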

The results show that for all pairs of distributions the Type I (false positive) error rates are very close to the target amounts. In other words, if you use either analysis and your results are statistically significant, you don’t need to be overly concerned about a false positive.

The results also show that for most pairs of distributions, the difference between the statistical power of the two tests is trivial. In other words, if a difference truly exists at the population level, either analysis is equally likely to detect it. The concerns about the Mann-Whitney test having less power in this context appear to be unfounded.

I do have one caveat. There are a few pairs of specific distributions where there is a power difference between the two tests. If you perform both tests on the same data and they disagree (one is significant and the other is not), you can look at a table in the article to help you determine whether a difference in statistical power might be an issue. This power difference affects only a small minority of the cases.

Generally speaking, the choice between the two analyses is a tie. If you need to compare two groups of five-point Likert data, it usually doesn't matter which analysis you use. Both tests almost always provide the same protection against false negatives, and they always provide the same protection against false positives. These patterns hold true for sample sizes of 10, 30, and 200 per group.

*de Winter, J.C.F. and D. Dodou (2010), Five-Point Likert Items: t test versus Mann-Whitney-Wilcoxon, Practical Assessment, Research and Evaluation, 15(11).

What Are Degrees of Freedom in Statistics?


About a year ago, a reader asked if I could try to explain degrees of freedom in statistics. Since then, I've been circling around that request very cautiously, like it's some kind of wild beast that I'm not sure I can safely wrestle to the ground.

Degrees of freedom aren’t easy to explain. They come up in many different contexts in statistics—some advanced and complicated. In mathematics, they're technically defined as the dimension of the domain of a random vector.

But we won't get into that. Because degrees of freedom are generally not something you need to understand to perform a statistical analysis—unless you’re a research statistician, or someone studying statistical theory.

And yet, enquiring minds want to know. So for the adventurous and the curious, here are some examples that provide a basic gist of their meaning in statistics.

The Freedom to Vary

First, forget about statistics. Imagine you’re a fun-loving person who loves to wear hats. You couldn't care less what a degree of freedom is. You believe that variety is the spice of life.

Unfortunately, you have constraints. You have only 7 hats. Yet you want to wear a different hat every day of the week.

7 hats

On the first day, you can wear any of the 7 hats. On the second day, you can choose from the 6 remaining hats, on day 3 you can choose from 5 hats, and so on.

When day 6 rolls around, you still have a choice between 2 hats that you haven’t worn yet that week. But after you choose your hat for day 6, you have no choice for the hat that you wear on Day 7. You must wear the one remaining hat. You had 7-1 = 6 days of “hat” freedom—in which the hat you wore could vary!

That’s kind of the idea behind degrees of freedom in statistics. Degrees of freedom are often broadly defined as the number of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters.

Degrees of Freedom: 1-Sample t test

Now imagine you're not into hats. You're into data analysis.

You have a data set with 10 values. If you’re not estimating anything, each value can take on any number, right? Each value is completely free to vary.

But suppose you want to test the population mean with a sample of 10 values, using a 1-sample t test. You now have a constraint—the estimation of the mean. What is that constraint, exactly? By definition of the mean, the following relationship must hold: The sum of all values in the data must equal n x mean, where n is the number of values in the data set.

So if a data set has 10 values, the sum of the 10 values must equal the mean x 10. If the mean of the 10 values is 3.5 (you could pick any number), this constraint requires that the sum of the 10 values must equal 10 x 3.5 = 35.

With that constraint, the first value in the data set is free to vary. Whatever value it is, it’s still possible for the sum of all 10 numbers to have a value of 35. The second value is also free to vary, because whatever value you choose, it still allows for the possibility that the sum of all the values is 35.

In fact, the first 9 values could be anything, including these two examples:

34, -8.3, -37, -92, -1, 0, 1, -22, 99
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9

But to have all 10 values sum to 35, and have a mean of 3.5, the 10th value cannot vary. It must be a specific number:

34, -8.3, -37, -92, -1, 0, 1, -22, 99  -----> 10TH value must be 61.3
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 ----> 10TH value must be 30.5

Therefore, you have 10 - 1 = 9 degrees of freedom. It doesn’t matter what sample size you use, or what mean value you use—the last value in the sample is not free to vary. You end up with n - 1 degrees of freedom, where n is the sample size.
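You can check that constraint with a couple of lines of arithmetic, using the first example above:

```python
# The mean constraint: with n = 10 and a sample mean of 3.5, the sum must be 35,
# so the first nine values completely determine the tenth.
first_nine = [34, -8.3, -37, -92, -1, 0, 1, -22, 99]
tenth = 10 * 3.5 - sum(first_nine)
print(round(tenth, 1))   # 61.3 -- the only value that keeps the mean at 3.5
```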

Another way to say this is that the number of degrees of freedom equals the number of "observations" minus the number of required relations among the observations (e.g., the number of parameter estimates). For a 1-sample t-test, one degree of freedom is spent estimating the mean, and the remaining n - 1 degrees of freedom estimate variability.

The degrees of freedom then define the specific t-distribution that's used to calculate the p-values and t-values for the t-test.

t dist

Notice that for small sample sizes (n), which correspond with smaller degrees of freedom (n - 1 for the 1-sample t test), the t-distribution has fatter tails. This is because the t distribution was specially designed to provide more conservative test results when analyzing small samples (such as in the brewing industry).  As the sample size (n) increases, the number of degrees of freedom increases, and the t-distribution approaches a normal distribution.

Degrees of Freedom: Chi-Square Test of Independence

Let's look at another context. A chi-square test of independence is used to determine whether two categorical variables are dependent. For this test, the degrees of freedom are the number of cells in the two-way table of the categorical variables that can vary, given the constraints of the row and column marginal totals. So each "observation" in this case is a frequency in a cell.

Consider the simplest example: a 2 x 2 table, with two categories and two levels for each category:

 

|            | Category A |    | Total |
| Category B |      ?     |    |     6 |
|            |            |    |    15 |
| Total      |     10     | 11 |    21 |

It doesn't matter what values you use for the row and column marginal totals. Once those values are set, there's only one cell value that can vary (here, shown with the question mark—but it could be any one of the four cells). Once you enter a number for one cell, the numbers for all the other cells are predetermined by the row and column totals. They're not free to vary. So the chi-square test for independence has only 1 degree of freedom for a 2 x 2 table.

Similarly, a 3 x 2 table has 2 degrees of freedom, because only two of the cells can vary for a given set of marginal totals.

 

|            | Category A |    |   | Total |
| Category B |      ?     |  ? |   |    15 |
|            |            |    |   |    15 |
| Total      |     10     | 11 | 9 |    30 |

If you experimented with different sized tables, eventually you'd find a general pattern. For a table with r rows and c columns, the number of cells that can vary is (r-1)(c-1). And that's the formula for the degrees of freedom for the chi-square test of independence!
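If you'd rather not count cells by hand, a quick check confirms the (r-1)(c-1) rule; the counts in these example tables are arbitrary.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Any tables of counts will do for checking the rule (the values are arbitrary).
table_2x2 = np.array([[4, 6],
                      [6, 5]])
table_3x2 = np.array([[4, 6],
                      [7, 8],
                      [3, 2]])

for table in (table_2x2, table_3x2):
    r, c = table.shape
    dof = chi2_contingency(table)[2]   # third element of the result is the degrees of freedom
    print(f"{r} x {c} table: dof = {dof}, (r-1)(c-1) = {(r - 1) * (c - 1)}")
```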

The degrees of freedom then define the chi-square distribution used to evaluate independence for the test.

chi square

The chi-square distribution is positively skewed. As the degrees of freedom increases, it approaches the normal curve.

Degrees of Freedom: Regression

Degrees of freedom get more involved in the context of regression. Rather than risk losing the one remaining reader still reading this post (hi, Mom!), I'll cut to the chase.

Recall that degrees of freedom generally equal the number of observations (or pieces of information) minus the number of parameters estimated. When you perform regression, a parameter is estimated for every term in the model, and each one consumes a degree of freedom. Therefore, including excessive terms in a multiple regression model reduces the degrees of freedom available to estimate the parameters' variability. In fact, if the amount of data isn't sufficient for the number of terms in your model, there may not even be enough degrees of freedom (DF) for the error term, and no p-values or F-values can be calculated at all. You'll get output something like this:

regression output

If this happens, you either need to collect more data (to increase the degrees of freedom) or drop terms from your model (to reduce the number of degrees of freedom required). So degrees of freedom does have real, tangible effects on your data analysis, despite existing in the netherworld of the domain of a random vector.

Follow-up

This post provides a basic, informal introduction to degrees of freedom in statistics. If you want to further your conceptual understanding of degrees of freedom, check out this classic paper in the Journal of Educational Psychology by Dr. Helen Walker, an associate professor of education at Columbia who was the first female president of the American Statistical Association. Another good general reference is Pandey, S., and Bright, C. L., Social Work Research, Vol. 32, No. 2, June 2008.

3 Ways to Examine Data Over Time


Did you know that Minitab provides several tools you can use to view patterns in data over time? If you want to examine, say, monthly sales for your company, or even how the number of patients admitted to your hospital changes throughout the year, then these tools are for you!

1. Time Series Plot

Time series plots are often used to examine daily, weekly, seasonal or annual variations, or before-and-after effects of a process change. They’re especially useful for comparing data patterns of different groups. For example, you could examine monthly production for several plants for the previous year, or employment trends in different industries across several months.

Here’s an example of a time series plot that shows the monthly sales for two companies over two years:

http://support.minitab.com/en-us/minitab/17/time_series_plot_def.png

This simple plot reveals a lot about the sales of these two companies. We can see that company A's sales grew slowly but steadily over these two years. Company B's sales started lower than company A's, but shot up and surpassed company A by the second year.

It’s easy to create time series plots in Minitab – just choose Graph > Time Series Plot, and you’ll be off and running. Check out this article to learn about the different types of time series plots you can create in Minitab, or this past blog post I wrote on weather data and time series plots.

2. Area Graph

An area graph evaluates contributions to a total over time. It displays multiple time series stacked on the y-axis against equally spaced time intervals on the x-axis. Each line on the graph is a cumulative sum, so you can see each series' contribution to the total and how the composition of that total changes over time.

For example, you could examine the quarterly sales of three different car models over two years, or employment trends in four different industries over several months.

And here’s an example of an area graph you could use to examine the number of cardiac inpatients and outpatients admitted over the past 12 months:

area graph

The graph shows that both inpatients and outpatients follow a similar trend, and it also suggests a seasonal effect: the number of patients admitted to the hospital is higher in the winter months.

To create area graphs in Minitab, choose Graph > Area Graph. For step-by-step instructions, check out this article.

3. Scatterplot with a connect line

You’ll want to create a scatterplot with a connect line if your data were collected at irregular intervals or are not in chronological order in the Minitab worksheet. For example, in this scatterplot, you can see that as the time in days increases, the weight of the fruit on the tree also increases:

scatterplot

You can create a scatterplot with a connect line in Minitab by choosing Graph > Scatterplot > With Connect Line. If you need a refresher on scatterplots, check out this article from Minitab Support. 

Better Home$ and Baseboards: Using Data Distributions to Set a List Price


People say that I overthink everything. I've given this assertion considerable thought, and I don't believe that it is true. After all, how can any one person possibly overthink every possible thing in just one lifetime?

For example, suppose I live 85 years. That's 2,680,560,000 seconds (85 years x 365 days per year x 24 hours per day x 60 min per hour x 60 seconds per minute). I'm asleep about a third of the time, so that leaves just 1,787,040,000 seconds to ponder a nearly infinite variety of things. This morning, I paused for about 2 seconds to ruminate about a gray hair. ("Hey, that hair wasn't gray yesterday.") At a rate of 1 cogitation every 2 seconds, I would have time in life to mull over only 893,520,000 items.

That's a plethora, for sure. But this number doesn't seem so big when you consider the large (though shrinking) number of not-yet-gray hairs on my head, or the vast number of ways that you can use Minitab Statistical Software to improve quality in your organization. So, to those who say that I overthink everything, after much deliberation I am confident that you are mistaken.

But I do overthink some things. Take my house...please. After much blood (literal), sweat (literal), and tears (not telling), I am finally ready to list my house for sale. (If you're in the market for a lovely 4-bedroom home, nestled in the heart of State College, Pennsylvania, I have just the house for you.)

The other day, a gaggle of realtors (I think that's the collective noun for realtors) inspected the property, and each submitted their personal estimate of what the house is worth. Being a numbers guy (and being anxious to know how much I could get for the dump ol' homestead), I was excited to see the results.

Imagine my disappointment when my realtor gave me only summary data!

Ladies and gentlemen, the data you are about see are real; only the values have been transformed, to protect the innocent (a.k.a., the guy who bought way more house than he ever should have, or ever will again).

Highest valuation: $460,000
Lowest valuation: $425,000
Average: $450,000
Number of realtors: 12
Written comments:
"Great large rooms, bright, nice windows, loved the decks!"
"Nice, presents well!"
"Baseboards are blotchy/scratched and need to be painted."

Desperate questions crowded my frantic mind as I struggled to process the surprisingly sparse information:

  • What happened to the rest of the data!?!
  • How many realtors thought the house is worth $460,000? Just one? I can't tell!
  • Is $425,000 an outlier? Did the other realtors take all the chocolate chip cookies that I baked and leave poor 425 grumpy and snackless?
  • Do I really need to paint the baseboards!?!

I was deeply distraught by this dearth of data, this omission of observations, this not-enough-ness of numbers. So, I asked my realtor if I could see the raw data. Her response shocked me: "My assistant threw out the individual responses. These valuations are just gut feelings. Don't overthink it."

What? "Threw out the individual responses"!?!?!

"Don't overthink it"?!?!? 

"Paint the baseboards"?!?!?!?!

I had planned to use the realtor valuations to help me come up with a list price. I was concerned because the mean of different distributions can be the same, even if the shapes of the distributions are wildly different. For example, each sample in this histogram has a mean of 4. Obviously, the mean alone doesn't tell you anything about how the observations are distributed.

Differently shaped distributions, each with a mean of 4

Also, the mean itself probably wouldn't be a good list price. I'm not trying to appeal to the average buyer; I'm trying to appeal to those special few buyers who actually like the house and are willing to bid more for it. On the other hand, if I pick a number that is too high, I could out-price even the high-bidders and I might get no offers.

What to do, what to do?

I had done a little reading about Monte Carlo simulations. And I recalled that data simulations were invaluable when we designed the new test and confidence interval for 2 variances in Minitab Statistical Software. (You can read more than most people ever want to know about those simulations here.) So I decided to try some simple simulations to see what I could learn about possible sample distributions that fit the summary statistics I was given.

First, a quick note about my methodology. For simplicity, I assigned each observation 1 of 15 discrete values: $425,000, $427,500, $430,000, ..., up to $460,000. Each hypothetical distribution includes 12 observations and has a mean of $450,000 (within rounding error). Each distribution includes at least one observation at $425,000 (the reported minimum) and at least one observation at $460,000 (the reported maximum). Values on the graph are in units of $1,000 (for example, 425 = $425,000). Reference lines are included on the histograms to show the following statistics:

Mn = the mean, which is always equal to $450,000
Md = the median
Mo = the mode
Q3 = the 3rd quartile (also called the 75th percentile)

Simulated Sample Data

My first guess when I saw the summary data was that the distribution of the realtor evaluations was probably left-skewed, so I simulated that first.

Left skewed scenario

In this scenario, most of the valuations are clustered at the high end, with fewer valuations in the middle, and even fewer valuations at the low end. This is my favorite scenario, because the most frequent response (the mode) is $460,000, which is the highest value in the sample. If the real distribution looked like this, I'd be comfortable choosing $460,000 as my list price because I'd know that 3 of the 12 realtors think the house is worth that price.
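Here's roughly what one simulated sample behind a scenario like this looks like, along with its summary statistics. The 12 values are made-up numbers on the same $2,500 grid, chosen to be left-skewed with a mean of $450,000 and a mode of $460,000; they are not the exact sample plotted above.

```python
import numpy as np
from statistics import median, mode

# One illustrative left-skewed sample of 12 valuations (in $1,000s); made-up values.
vals = [425, 435, 442.5, 447.5, 450, 452.5, 455, 455, 457.5, 460, 460, 460]

print("Mean (Mn):  ", np.mean(vals))             # 450.0
print("Median (Md):", median(vals))              # 453.75
print("Mode (Mo):  ", mode(vals))                # 460 -- three realtors at the top value
print("Q3:         ", np.percentile(vals, 75))   # 458.125
```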

Next I wondered what it would look like if there was a major disagreement among the realtors. So I worked up this bimodal scenario.

Bimodal scenario

In order to maintain a mean of $450,000, I could not include very many observations on the low end of the spectrum. So most of the valuations in this scenario fall on the high end. But—and this is a big but—in this scenario, 3 different realtors actually gave the house the minimum valuation. I would definitely want to know why those realtors priced the house so differently from the others. It could be that they noticed something that the other realtors did not. In this scenario, I can't really come up with a reasonable list price until I find out why there are two distinct peaks.

Next, I wondered what the data might look like if the realtors were feeling blasé about the price. This flat-looking distribution is my statistical interpretation of realtor ennui.

Flat distribution scenario

Again, in order to maintain a mean of $450,000, I could not put many observations on the low side. In fact, I included only the one minimum observation on the low side, which makes that observation an outlier. If I didn't already know that this outlier was just Mr. Blotchy Baseboards having a bad day, I'd need to investigate. The other valuations are distributed fairly evenly between $445,000 and $460,000. In a case like this, it seems like the 3rd quartile (Q3) might be a reasonable choice. By definition, at least 25% of the observations in a distribution are greater than or equal to Q3. If 25% of potential buyers think the house is worth $455,000, then I'll have a decent chance of getting an offer quickly at that price.

I also wondered what the data might look like if most of the realtors were in close agreement on the price.

Peaked distribution scenario

In this scenario, most of the valuations are grouped closely together near the mean. The minimum valuation is again an outlier. The maximum valuation also appears to be an outlier. I definitely would not base my list price on the maximum valuation because it does not seem representative. The mode is a disappointing $450,000, but Q3 is a little higher at $452,500.

Just for the heck of it, I tried one final scenario—a right-skewed distribution.

Right skewed scenario

Again, Mr. Blotchy Baseboards is an outlier. This is another disappointing scenario because the mode is $450,000. But at least Q3 is $454,400, which is a little higher than Q3 for the peaked scenario.

Here is a recap of the list prices I would choose for each simulated data set (in decreasing order):

| Scenario     | List Price   |
| Left Skewed  | $460,000     |
| Flat         | $455,000     |
| Right Skewed | $454,400     |
| Peaked       | $452,500     |
| Bimodal      | Undetermined |

I'm still mad at my realtor for throwing away perfectly good data. But I am feeling better about choosing a list price for my house. I would like to think that the left-skewed scenario is closest to the truth. But even if it is not, the lowest list price that I came up with was $452,500, which isn't much different. The bimodal scenario is problematic, but since I don't know if the actual data were bimodal, I kind of have to ignore that one.

I will probably go with the second highest list price of $455,000. In the end, it's just gut feel, right? I don't want to overthink it.

What If Major League Baseball Had a 16-Game Season?


When it comes to statistical analyses, collecting a large enough sample size is essential to obtaining quality results. If your sample size is too small, confidence intervals may be too wide to be useful, linear models may lack necessary precision, and control charts may get so out of control that they become self-aware and rise up against humankind.

NFL and MLB Logos

Okay, that last point may have been a bit exaggerated, but you get the idea. 

However, sometimes collecting a large sample size is easier said than done. Financial or time constraints often limit the number of observations we can collect. And in the world of sports, there is no better example of this than the NFL.

Football is a violent sport, so the players need a week to rest and recover between games. This time constraint limits the regular season to only 16 games. This is very small compared to the other major American leagues—hockey, basketball, and baseball. The NHL and NBA both play an 82-game season, while MLB plays 162 games!

But we rarely consider the sample size when we judge the best and worst teams in the NFL. It's not uncommon to see teams with sub-par records come back and have a great record the following year, or vice versa. We'll often credit or blame coaches and quarterbacks, but did you ever hear a sports analyst just say "Hey, sometimes crazy things can happen over a 16-game sample"? And we're almost at the point in the MLB baseball season where most teams have played 16 games.

That makes me wonder, what would baseball look like if they only played 16 games?

Looking at Major League Baseball as a 16-Game Season

I took the previous 10 seasons and recorded every MLB team's record in their first 16 games. I also looked at their final record to get a good estimate of their "true" winning percentage. The fitted line plot below shows the relationship between a baseball team's winning percentage in their first 16 games and their final winning percentage.

Fitted Line Plot

The relationship isn't completely random, as a higher winning percentage in your first 16 means you're more likely to have a better final winning percentage. But it's not a very strong relationship, as only 20.2% of the variation in a team's final winning percentage is explained by their winning percentage in the first 16 games.

Observations toward the bottom right show teams that started off with a very strong record but ended up in the bottom of the league. You can see that the Colorado Rockies have a habit of doing this. But the more interesting teams are in the top left corner. These are teams that started out slow but ended up as one of the best teams in the league. In fact, there were 31 teams that started .500 or worse in their first 16 games and ended up making the playoffs. That's 35% of all playoff teams in the last 10 years! Four teams that stand out are the Rays, Rockies, Rangers, and Phillies. The Rays, Rockies and Rangers were all sub-.500 and in last place in their division after the first 16 games—and all three ended up making it to the World Series that same year. And the Phillies were 8-8 and in third place after 16 games of the 2008 season. That would have put them out of the playoffs. But with a larger sample, they finished first in their division and ended up winning the World Series!

Can a Small Sample be Good?

In the world of sports, a small sample size isn't necessarily a bad thing. Small samples definitely make things entertaining. For example, just compare the first round of the NCAA tournament (single elimination) to the first round of the NBA playoffs (a 7-game series). The former has upsets galore (East Tennessee St over Michigan St, anyone?) that would be nearly impossible in a 7-game series. And the variance that can occur in an NFL regular season certainly contributes to it being more popular than the marathon that is the MLB regular season. Larger samples help determine who the better team is, but the unpredictability that we love in sports is greatly helped by smaller samples.
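To put a rough number on that, here's a quick binomial sketch. The "true" winning percentage is just an assumed value for illustration, chosen so the team would average about 9 wins per 16 games.

```python
from scipy.stats import binom

p_true = 0.563   # assumed long-run winning percentage (about 9 wins per 16 games)

for games in (16, 162):
    # Chance this genuinely above-average team still finishes at .500 or worse.
    p_500_or_worse = binom.cdf(games // 2, games, p_true)
    print(f"{games:3d} games: P(.500 or worse record) = {p_500_or_worse:.2f}")
```

Under that assumption, a .500-or-worse finish happens in somewhere around 4 of every 10 sixteen-game seasons, but it becomes rare over 162 games. That is exactly the kind of shuffling that makes short seasons unpredictable.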

Of course, the quality world varies greatly from sports entertainment. Usually, we want all the observations we can get to improve the reliability of our results. Just keep in mind that with an extremely large sample, even differences too small to matter in practice can turn out to be statistically significant. Luckily, Minitab Statistical Software offers power and sample size analyses to help you determine how much data to collect. You want enough data to ensure you'll get reliable results, without spending extra time and money on unnecessary observations. 

And remember, when that NFL team comes out of nowhere to win their division next year, it could be the coach, it could be the quarterback...or it could just be the sample size!
