
Gage Linearity and Bias: Wake Up and Smell Your Measuring System


Right now I’m enjoying my daily dose of morning joe. As the steam rises off the cup, the dark rich liquid triggers a powerful enzyme cascade that jump-starts my brain and central nervous system, delivering potent glints of perspicacity into the dark crevices of my still-dormant consciousness.

Feels good, yeah! But is it good for me? Let’s see what the studies say…

Hmm. These are just a few results from copious studies on coffee consumption. But already I'm having a hard time processing the information.

Maybe another cup of coffee would help. Er...uh...maybe not.

The pivotal question you should ask before you perform any analysis

There are a host of possible explanations that might help explain these seemingly contradictory study results.

Perhaps the studies utilized different study designs, different statistical methodologies, different survey techniques, different confounding variables, different clinical endpoints, or different populations. Perhaps the physiological effects of coffee are modulated by the dynamic interplay of a complex array of biomechanisms that are differently triggered in each individual based on their unique, dynamic phenotype-genotype profiles.

Or perhaps...just perhaps...there's something even more fundamental at play. The proverbial elephant in the room of any statistical analysis. The essential, pivotal question upon which all your results rest...

"What am I measuring? And how well am I actually measuring what I think I'm measuring?"

Measurement system analysis helps ensure that your study isn't doomed from the start.

A measurement systems analysis (MSA) evaluates the consistency and accuracy of a measuring system. MSA helps you determine whether you can trust your data before you use a statistical analysis to identify trends and patterns, test hypotheses, or make other general inferences.

MSA is frequently used for quality control in the manufacturing industry. In that context, the measuring system typically includes the data collection procedures, the tools and equipment used to measure (the "gage"), and the operators who measure.

Coffee consumption studies don't employ a conventional measuring system. Often, they rely on self-reported data from people who answer questionnaires about their lifestyle habits, such as "How many cups of coffee do you drink in a typical day?" So the measuring "system," loosely speaking, is every respondent who estimates the number of cups they drink. Despite this, could MSA uncover potential issues with measurements collected from such a survey?

Caveat: What follows is an exploratory exercise performed with a small set of nonrandom data for illustrative purposes only. To see standard MSA scenarios and examples, including sample data sets, go to Minitab's online dataset library and select the category Measurement systems analysis.

Gage Linearity and Bias: "Houston, we have a problem..."

For this experiment (I can't call it a study), I collected different coffee cups in the cupboard of our department lunchroom (see image at right). Then I poured different amounts of liquid into each cup and asked people to tell me how full the cup was. The actual amount of liquid was 0.50 cup, 0.75 cup, or 1 cup, as measured using a standard measuring cup.

To evaluate the estimated "measurements" in relation to the actual reference values, I performed a gage linearity and bias study (Stat > Quality Tools > Gage Study > Gage Linearity and Bias Study). The results are shown below.

Note: A gage linearity and bias study evaluates whether a measurement system has bias when compared to a known standard. It also assesses linearity—the difference in average bias through the expected operating range of the measuring device. For this example, I didn't enter an estimate of process variation, so the results don't include linearity estimates.

The Y axis shows the amount of bias (in this particular case, the amount estimated by each person using a coffee cup minus the known amount of water, the standard). If the measurements perfectly match the reference values, the data points on the graph should fall along the line bias = 0, with a slope of 0.
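If you'd like to see the arithmetic behind a bias study, here is a minimal sketch in Python (not Minitab's implementation, and the measurements are invented for illustration). It computes each person's bias as the estimate minus the reference value, averages the bias at each reference level, and runs a one-sample t-test of whether the overall mean bias differs from zero.

import numpy as np
from scipy import stats

# Reference values (actual cups of liquid) and the amounts people estimated.
# These numbers are invented for illustration.
reference = np.array([0.50, 0.50, 0.50, 0.75, 0.75, 0.75, 1.00, 1.00, 1.00])
estimate = np.array([0.30, 0.35, 0.25, 0.40, 0.50, 0.45, 0.55, 0.65, 0.60])

bias = estimate - reference            # negative bias = people underestimate

for ref in np.unique(reference):
    b = bias[reference == ref]
    print(f"reference {ref:.2f}: average bias {b.mean():+.3f}")

# Is the overall average bias significantly different from zero?
result = stats.ttest_1samp(bias, popmean=0)
print(f"overall average bias {bias.mean():+.3f}, p-value {result.pvalue:.4f}")

A gage linearity and bias study does the same kind of comparison, and additionally regresses the bias against the reference values to see whether it changes across the operating range.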

That's obviously not the case here. The estimated measurements for all three reference values show considerable negative bias. That is, when using the coffee cups in our department lunchroom as "gages", every person's estimated measurement was much smaller than the actual amount of liquid. Not a surprise, because the coffee cups are larger than a standard cup. (There are coffee cups that hold about one standard cup, by the way, such as the cup that I use every morning. But most Americans don't drink from coffee cups this small. Mine was designed back in the '50s, when most things—houses, grocery carts, cheeseburgers—were made in more modest proportions).

The Gage Bias table shows that the average bias increases as the amount of liquid increases. And even though this was a small sample, the bias was statistically significant (P = 0.000). Importantly, notice that the bias wasn't consistent at each reference value—there is a considerable range of bias among the estimates at each reference value.

Despite its obvious limitations, this informal, exploratory analysis provides some grounds for speculation.

What does "one cup of coffee" actually mean in studies that use self-reported data? What about categories such as 1-2 cups, or 2-4 cups? If it's not clear what x cups of coffee actually refers to, what do we make of risk estimates that are specifically associated with x number of cups of coffee? Or meta-analyses that combine self-reported coffee consumption data from different countries (equating one Japanese "cup of coffee", say, with one Australian "cup of coffee"?)

Of course, perfect data sets don't exist. And it's possible that some studies may manage to identify valid overall trends and correlations associated with increasing/decreasing coffee consumption.

Still, let's just say that a self-reported "cup of coffee" might best be served not with cream and sugar, but with a large grain of salt.

So before you start brewing your data...

And before you rush off to calculate p-values...it's worth taking the extra time and effort to make sure that you're actually measuring what you think you're measuring.


Creating a New Metric with Gage R&R, part 1


One of my favorite bloggers about the application of statistics in health care is David Kashmer, an MD and MBA who runs and writes for the Business Model Innovation in Surgery blog. If you have an interest in how quality improvement methods like Lean and Six Sigma can be applied to healthcare, check it out. 

A while back, Dr. Kashmer penned a column called "How to Measure a Process When There's No Metric," in which he discusses how you can use the measurement systems analysis method called Gage R&R (or gauge R&R) to create your own measurement tools and validate them as useful metrics. (I select the term “useful” here deliberately: a metric you’ve devised could be very useful in helping you assess your situation, but might not meet requirements set by agencies, auditors, or other concerned parties.) 

I thought I would use this post to show you how you can use the Assistant in Minitab Statistical Software to do this.

How Well Are You Supervising Residents? 

Kashmer posits a scenario in which state regulators assert that your health system's ability to oversee residents is poor, but your team believes residents are well supervised. You want to assess the situation with data, but you lack an established way to measure the quality of resident supervision. What to do?

Kashmer says, "You decide to design a tool for your organization. You pull a sample of charts and look for commonalities that seem to display excellent supervision versus poor supervision."

So you work with your team to come up with a tool that uses a 0 to 10 scale to rate resident supervision, based on various factors appearing on a chart. But how do you know if the tool will actually help you assess the quality of resident supervision?  

This is where gage R&R comes in. The gage refers to the tool or instrument you're testing, and the R&R stands for reproducibility and repeatability. The analysis will tell you whether different people who use your tool to assess resident supervision (the gauge) will reach the same conclusion (reproducibility) and do it consistently (repeatability). 

Collecting Data to Evaluate the Ability to Measure Accurately

We're going to use the Assistant in Minitab Statistical Software to help us. If you're not already using it, you can download a 30-day trial version for free so you can follow along.  Start by selecting Assistant > Measurement Systems Analysis... from the menu: 

measurement systems analysis

Follow the decision tree...

measurement systems analysis decision tree

If you're not sure about what you need to do in a gage R&R, clicking the more... link gives you requirements, assumptions, and guidelines to follow: 

After a look at the requirements, you decide you will have three evaluators use your new tool to assess each of 20 charts 3 times, and so you complete the dialog box thus: 

MSA dialog box

When you press "OK," the Assistant asks if you'd like to print worksheets you can use to easily gather your data:

gage R&R data collection form

Minitab also creates a datasheet for the analysis. All you need to do is enter the data you collect in the "Measurements" column:

worksheet

Note that the Assistant automatically randomizes the order in which each evaluator will examine the charts in each of their three judging sessions. 
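If you're curious how a randomized collection plan like this could be built outside the Assistant, here is a small Python sketch. The column names simply mirror the worksheet above; this is an illustration of the idea, not what Minitab does internally.

import csv
import random

# 3 evaluators x 20 charts x 3 sessions, with chart order shuffled
# independently for each evaluator in each session.
operators = ["A", "B", "C"]
parts = list(range(1, 21))
replicates = 3

rows = []
for session in range(1, replicates + 1):
    for operator in operators:
        order = parts[:]
        random.shuffle(order)          # randomize the run order for this session
        for part in order:
            rows.append({"Operators": operator, "Parts": part,
                         "Session": session, "Measurements": ""})

with open("gage_rr_plan.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Operators", "Parts", "Session", "Measurements"])
    writer.writeheader()
    writer.writerows(rows)

print(f"{len(rows)} rows written (3 operators x 20 charts x 3 sessions = 180 measurements)")

Randomizing the run order within each session is what keeps time-related effects, like evaluator fatigue, from masquerading as differences between charts.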

Now we're ready to gather the data to verify the effectiveness of our new metric for assessing the quality of resident supervision. Come back for Part 2, where we'll analyze the collected data!  

Creating a New Metric with Gage R&R, part 2


In my previous post, I showed you how to set up data collection for a gage R&R analysis using the Assistant in Minitab 17. In this case, the goal of the gage R&R study is to test whether a new tool provides an effective metric for assessing resident supervision in a medical facility.  

As noted in that post, I'm drawing on one of my favorite bloggers about health care quality, David Kashmer of the Business Model Innovation in Surgery blog, and specifically his column "How to Measure a Process When There's No Metric." 

An Effective Measure of Resident Supervision? 

In one scenario Kashmer presents, state regulators and hospital staff disagree about a health system's ability to oversee residents. In the absence of an established way to measure resident supervision, the staff devises a tool that uses a 0 to 10 scale to rate resident supervision. 

Now we're going to analyze the Gage R&R data to test how effectively and reliably the new tool measures what we want it to measure. The analysis will evaluate whether different people who use the tool (the gauge) reach the same conclusion (reproducibility) and do it consistently (repeatability).   

To get data, three evaluators used the tool to assess each of 20 charts three times each, and recorded their score for each chart in the worksheet we produced earlier. (You can download the completed worksheet here if you're following along in Minitab.)   

Now we're ready to analyze the data. 

Evaluating the Ability to Measure Accurately

Once again, we can turn to the Assistant in Minitab Statistical Software to help us. If you're not already using it, you can download a 30-day trial version for free so you can follow along. Start by selecting Assistant > Measurement Systems Analysis... from the menu: 

measurement systems analysis

In my earlier post, we used the Assistant to set up this study and make it easy to collect the data we need. Now that we've gathered the data, we can follow the Assistant's decision tree to the "Analyze Data" option.  

measurement systems analysis decision tree for analysis

Selecting the right items for the Assistant's Gage R&R dialog box couldn't be easier—when you use the datasheet the Assistant generated, just enter "Operators" for Operators, "Parts" for Parts, and "Score" for Measurements.   

gage R&R analysis dialog box

Before we press OK, though, we need to tell the Assistant how to estimate process variation. When Gage R&R is performed in a manufacturing context, historic data about the amount of variation in the output of the process being studied is usually available. Since this is the first time we're analyzing the performance of the new tool for measuring the quality of resident supervision, we don't have an historical standard deviation, so we will tell the Assistant to estimate the variation from the data we're analyzing. 

gage r&r variation calculation options

The Assistant also asks for an upper or lower specification limit, or tolerance width, which is the distance from the upper spec limit to the lower spec limit. Minitab uses this to calculate %Tolerance, an optional statistic used to determine whether the measurement system can adequately sort good from bad parts—or in this case, good from bad supervision. For the sake of this example, let's say in designing the instrument you have selected a level of 5.0 as the minimum acceptable score.

gage r and r process tolerance 

When we press OK, the Assistant analyzes the data and presents a Summary Report, a Variation Report, and a Report Card for its analysis. The Summary Report gives us the bottom line about how well the new measurement system works.  

The first item we see is a bar graph that answers the question, "Can you adequately assess process performance?" The Assistant's analysis of the data tells us that the system we're using to measure resident supervision can indeed assess the resident supervision process. 

gage R&R summary

The second bar graph answers the question "Can you sort good parts from bad?" In this case, we're evaluating resident supervision rather than parts, but the analysis shows that the system is able to distinguish charts that indicate acceptable resident supervision from those that do not. 

For both of these charts, less than 10% of the observed variation in the data could be attributed to the measurement system itself—a very good result.
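For readers who want to peek under the hood, the percentages in these bar charts come from an ANOVA-style decomposition of the total observed variance. The sketch below shows the standard crossed gage R&R calculation in Python with simulated scores. It is a generic illustration rather than the Assistant's exact algorithm, and the operator names and effect sizes are invented.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated scores stand in for the 3 evaluators x 20 charts x 3 replicates study.
rng = np.random.default_rng(1)
operators, n_parts, reps = ["A", "B", "C"], 20, 3
part_effect = rng.normal(0, 1.5, n_parts)        # real differences between charts
op_effect = dict(zip(operators, rng.normal(0, 0.2, len(operators))))

records = [{"Operator": o, "Part": p,
            "Score": 5 + part_effect[p] + op_effect[o] + rng.normal(0, 0.3)}
           for o in operators for p in range(n_parts) for _ in range(reps)]
df = pd.DataFrame(records)

# Two-way ANOVA with interaction, then the usual variance-component formulas.
model = ols("Score ~ C(Part) + C(Operator) + C(Part):C(Operator)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)
ms = anova["sum_sq"] / anova["df"]               # mean squares

p, o, r = n_parts, len(operators), reps
repeatability = ms["Residual"]
interaction = max((ms["C(Part):C(Operator)"] - ms["Residual"]) / r, 0)
reproducibility = max((ms["C(Operator)"] - ms["C(Part):C(Operator)"]) / (p * r), 0) + interaction
part_to_part = max((ms["C(Part)"] - ms["C(Part):C(Operator)"]) / (o * r), 0)
gage_rr = repeatability + reproducibility
total = gage_rr + part_to_part

print(f"%Contribution of gage R&R:    {100 * gage_rr / total:.1f}%")
print(f"%Study variation of gage R&R: {100 * np.sqrt(gage_rr / total):.1f}%")

With simulated measurement noise this small relative to the chart-to-chart differences, the gage R&R percentages come out low, which is the pattern the Assistant's bar charts summarize.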

Measuring the "Unmeasurable"

I can't count the number of times I've heard people say that they can't gather or analyze data about a situation because "it can't be measured." In most cases, that's just not true. Where a factor of interest—"service quality," say—is tough to measure directly, we can usually find measurable indicator variables that can at least give us some insight into our performance. 

I hope this example, though simplified from what you're likely to encounter in the real world, shows how it's possible to demonstrate the effectiveness of a measurement system when one doesn't already exist.  Even for outcomes that seem hard to quantify, we can create measurement systems to give us valuable data, which we can then use to make improvements.  

What kinds of outcomes would you like to be able to measure in your profession?  Could you use Gage R&R or another form of measurement system analysis to get started?  

 

How Good is Kentucky…Really?


As I’m sure you’ve heard by now, Kentucky is really good at basketball. They're the only team in the country without a loss, and they have a realistic shot at becoming the first team to win the championship with an undefeated record since the 1976 Indiana Hoosiers. Under any ranking system you want to use, Kentucky is clearly the #1 team in college basketball.

Well, almost any ranking system.

All right, I have to confess that those rankings are from February 23. But still, Kansas had to lose six games before Kentucky finally moved ahead of them in the RPI. (Just a friendly reminder to ignore the RPI when filling out your brackets in March.)

Back to the question at hand: Kentucky is really good, but just how much better are they than top-ranked teams from previous years? To answer this question, I’ll use Minitab Statistical Software to dig into the Pythagorean rating from kenpom.com. Basically, this rating estimates what a team’s winning percentage "should" be based on the number of points they scored and allowed. Currently Kentucky’s Pythagorean rating is 0.9794. Last year, the #1 ranked team in the Pomeroy Ratings (Louisville) had a Pythagorean rating of 0.952. So even though both teams were ranked #1, this year's Kentucky squad rates considerably higher.

Comparing Kentucky to Previous #1 Teams

So how does Kentucky stack up to previous teams? I took the top ranked team in the Pomeroy Ratings for every year since 2002 (since that’s as far back as the ratings go). I also took the ratings before the NCAA tournament, to best represent the point in the season that Kentucky is currently at.

IVP

The individual value plot above makes it plain how much higher Kentucky’s rating is than #1 teams in previous years. In fact, from 2002-2014, the #1 ranked team in the Pomeroy Ratings had an average rating of 0.9614 with a standard deviation of 0.0084. That makes Kentucky’s rating more than 2 standard deviations higher than the average #1 team. Impressive.

How Will it Affect their Odds of Winning the Tournament?

The great thing about the Pythagorean ratings is that you can use them to calculate the probability one team has of beating another. So let’s see how different ratings change the probability of Kentucky going on a hypothetical run through the NCAA tournament. I noted where Kentucky is in the latest Joe Lunardi mock bracket, and obtained the Pythagorean ratings of the 6 teams they would have to face (assuming the higher seed advanced in each round). Then I calculated the probability of beating each team with a rating equal to the average #1 team, the previous high rating (Illinois in 2005), and the rating Kentucky currently has.

Opponent               Average #1 Team    Illinois 2005    2015 Kentucky
Sacramento St          97%                98%              98%
Ohio St                76%                81%              86%
Notre Dame             78%                83%              87%
Wisconsin              58%                65%              73%
Gonzaga                57%                64%              71%
Virginia               47%                54%              63%
Win the Championship   9%                 15%              24%

Kentucky’s chances of winning the championship are 15 percentage points higher than the average #1 team's, and 9 percentage points higher than the team that previously had the highest Pythagorean rating. But you’ll notice that their overall chance of winning is still only about 1 out of 4...pretty low for what could be the greatest team ever. Of course, part of this is because I simply advanced the highest seed in each game, and that ended up being a brutal path. After Sacramento State, the remaining 5 teams are all in the Pomeroy Top 20, with 3 of them in the top 6! And Virginia is so good they would actually be favored against the average #1 team!
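For anyone who wants to reproduce this kind of calculation, the game probabilities come from comparing two Pythagorean ratings (commonly done with the log5 formula), and the championship probability is just the product of the six game probabilities. Here is a short sketch in Python; Kentucky's 0.9794 rating is from the post, but the opponent ratings below are placeholders, not the actual kenpom.com values.

# Log5: probability that a team with rating a beats a team with rating b,
# where both ratings are Pythagorean expected winning percentages.
def log5(a, b):
    return (a - a * b) / (a + b - 2 * a * b)

kentucky = 0.9794
# Placeholder ratings for the six opponents (not the actual kenpom.com values).
opponents = {"Sacramento St": 0.30, "Ohio St": 0.85, "Notre Dame": 0.84,
             "Wisconsin": 0.93, "Gonzaga": 0.94, "Virginia": 0.965}

run_the_table = 1.0
for name, rating in opponents.items():
    p = log5(kentucky, rating)
    run_the_table *= p
    print(f"P(Kentucky beats {name}) = {p:.2f}")

print(f"P(Kentucky wins all six games) = {run_the_table:.2f}")

Multiplying six probabilities together, even when each one is well above 50%, is what drags the overall championship probability down so quickly.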

But we know that upsets happen in the tournament. So what would their probability look like with a slightly easier path? Let’s take the teams #1 seed Florida would have had to beat to win the 2014 tournament. The ratings for our 6 opponents come from the final 2014 Pomeroy Ratings.

Opponent               Average #1 Team    Illinois 2005    2015 Kentucky
Albany                 97%                97%              98%
Pitt                   76%                82%              86%
UCLA                   75%                80%              85%
Dayton                 86%                89%              92%
Connecticut            71%                77%              83%
Kentucky               73%                79%              84%
Win the Championship   25%                35%              46%

Against this easier tournament run, Kentucky’s chances of winning are 21 percentage points greater than the average #1 team's. That's huge! Kentucky will most likely be the biggest favorite ever in this year's NCAA tournament. But even against weaker competition (only 1 of these 6 teams finished in the Pomeroy Top 10, and that team was only #8), Kentucky still has slightly less than a 50% chance of winning the championship. And that’s with their probability of winning each individual game at 83% or higher! 

This just shows how hard it is to win 6 straight games in a single elimination tournament. And Kentucky’s path might look a little closer to the first table. Because……..well……..

Kentucky Has Company

Remember how Virginia would actually be favored against the average #1 team? That’s because despite being ranked #2 behind Kentucky, they are a very strong team. Not only is their Pythagorean rating higher than every other #2 ranked team from 2002-2014, it’s higher than 7 of the 13 #1-ranked teams!

I decided to look at every team in the top 10. I collected the ratings of the top 10 teams in the Pomeroy ratings (right before the NCAA tournament) from 2002-2014. For each ranking (1 through 10) I calculated the average rating, the third quartile, and the highest rating. For example, from 2002-2014 the average rating for the #2 ranked team was 0.9522, the third quartile was 0.9568, and the highest was 0.9589.

I then took the ratings of the current teams ranked in the Pomeroy top 10, and subtracted the values of teams from previous years. Virginia is currently the #2 ranked team with a rating of 0.9661. Since their rating is the highest of any #2 ranked team to come before them, their difference will be positive for the mean, third quartile, and the highest. Here are the results for the entire Pomeroy top 10, displayed in an Individual Value Plot created in Minitab Statistical Software:

Every team currently in the Pomeroy top 10 has a rating higher than the average for its ranking. Oklahoma (#9) is the only team whose rating falls below the third quartile for previous teams at its rank. And, shockingly, 8 of the 10 teams in the top 10 have the highest rating of any similarly ranked team to come before them. The top-ranked teams are stacked. If you’re hoping this is the year that a 16 seed beats a 1 seed, don’t hold your breath.

Kentucky will still be an overwhelming favorite, but the data indicate that even against weaker teams their chances of winning would still be lower than 50%...and Kentucky’s top contenders this year are anything but weak. So don't think this is going to be a cakewalk for the Wildcats.

After all, they don't call it March Madness for nothing.

The Falling Child Project: Using Binary Logistic Regression to Predict Borewell Rescue Success


by Lion "Ari" Ondiappan Arivazhagan, guest blogger. 

An alarming number of borewell accidents, especially involving little children, have occurred across India in the recent past. This is the second in a series of articles on borewell accidents in India. In the first installment of the series, I used the G-chart in Minitab Statistical Software to predict the probabilities of innocent children falling into open borewells while playing in the fields. These borewells are sunk by farmers for agricultural and drinking water.

In this article, I will use the power of predictive analytics to predict the probability of successfully rescuing a trapped child based on the inputs of the child's age and gender using Binary Logistic Regression.

In Minitab, we can use Stat > Regression > Binary Logistic Regression to create models when the response of interest (Rescue, in this case) is binary and only takes two values: successful or unsuccessful. 

Borewell accidents data collected and provided by The Falling Child Project (www.fallingchild.org), a non-governmental organization (NGO) based in the United States, has been used for this predictive analysis.

Part of the raw data provided by the NGO is shown in Table 1 below. A total of 62 borewell accident cases in India have been documented from 2001 to January 2015.

data

As part of the analysis, Minitab will predict probabilities for the events you are interested in, based on your model. The predicted probabilities for unsuccessful events versus the Predicted Age and Predicted Gender are shown in the scatterplot below.

scatterplot of events

We can predict, with 70% confidence, that the probability of an unsuccessful rescue is 15% higher for a male child of age 2 than for a female child of the same age. However, it is surprising to note that above age 5, girls have about a 10% higher chance of an unsuccessful rescue attempt than boys.

output

I should note that one outlier, a male of age 60, was replaced with a male of age 6 to reduce the outlier's undue effect on the whole analysis and output.
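As a rough illustration of the modeling step itself (outside Minitab), a binary logistic regression with an age-by-gender interaction could be fit as sketched below. The column names are hypothetical stand-ins for the NGO's worksheet, and the handful of rows are invented, not the actual accident records.

import pandas as pd
import statsmodels.formula.api as smf

# Tiny invented stand-in for the real 62-case data set (1 = successful rescue).
df = pd.DataFrame({
    "Rescue": [1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0],
    "Age":    [2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 6, 7],
    "Gender": ["M", "M", "M", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
})

# Logistic regression with an age-by-gender interaction term.
model = smf.logit("Rescue ~ Age * C(Gender)", data=df).fit(disp=False)
print(model.summary())

# Predicted probability of a successful rescue for a 2-year-old boy and girl.
new = pd.DataFrame({"Age": [2, 2], "Gender": ["M", "F"]})
print(model.predict(new))

The predicted probabilities from a model like this are what the scatterplot above displays, plotted against age separately for each gender.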

Inferences

From the Binary Logistic Regression analysis above, we can predict that boys of age 5 and above have a greater chance of being successfully rescued than do girls of the same age. Although the analysis indicates a P-value of 0.736 for the interaction term, hinting that there is not much interaction between the age of the child and its gender in the predicted probabilities, the overall model's P-value is a reasonable 0.291, hinting at a moderate 70% confidence level in the model.

However, the scatterplot of predicted probabilities shown above paints a different picture. Age 5 seems to be a critical age beyond which girls have a lower chance of being rescued alive than boys do.

My goal in performing this analysis and sharing my findings is to be helpful to the rescue teams that plan these rescue efforts, so that they can increase the chances of successfully rescuing every trapped child, boy or girl.

 

About the Guest Blogger:

Ondiappan "Ari" Arivazhagan is an honors graduate in civil/structural engineering from the University of Madras. He is a certified PMP, PMI-SP, and PMI-RMP from the Project Management Institute. He is also a Master Black Belt in Lean Six Sigma and has studied Business Analytics at IIM Bangalore. He has 30 years of professional global project management experience in various countries and almost 14 years of teaching and training experience in project management, analytics, risk management, and Lean Six Sigma. He is the Founder-CEO of the International Institute of Project Management (IIPM), Chennai, and can be reached at askari@iipmchennai.com.

An earlier version of this article was published on LinkedIn.

 

The Statistical Saga of Baby’s Weight


Many things have shocked me since having my first baby back in August. I didn’t think it was possible to be so tired that it actually hurt, and I also didn’t think that changing 10+ diapers a day would actually be the norm (or that needing to perform 10+ outfit changes was even possible, let alone necessary). I also didn’t think that we’d fall in love so hard with the little guy. What a wonderful, rewarding experience it is to be a parent!

That’s enough mushy talk for now. Let’s get back to the surprises involved in having a newborn. Another shock in those first few days stemmed from the weight our son lost. I certainly didn’t imagine that my perfect newborn would lose so much weight those first couple of days! After all, he was born at a very healthy 8 pounds 3 ounces, and I was doing all I could those first couple of days to ensure he was fed every 2 hours, on the dot. I didn’t know that newborn weight loss was even a thing, let alone a very common thing. 

Here’s where things get cloudy and pretty crazy (please be sure to imagine my very ugly cry here, due to the aforementioned sleep deprivation). We took our son to his first doctor’s appointment a few days after his birthday, which included a weight check. According to the doctor, things weren’t looking good and he had lost “too much” weight. Our pediatrician followed what is known as the “10 percent rule of thumb” for breastfed babies, which basically means that a 7-10 percent weight loss after birth is considered normal. Our son had lost 12 ounces, or about 9.2 percent of his birth weight—the higher end of “normal.” But in my sleep-deprived mind, that 12 ounces became more than 1 pound of lost weight, and I was calling in all the troops to assess what was going wrong.
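For what it's worth, the percentage is easy to check: convert the birth weight to ounces and divide the loss by it. A tiny sketch:

# Percent newborn weight loss: birth weight of 8 lb 3 oz, loss of 12 oz.
birth_weight_oz = 8 * 16 + 3                  # 131 ounces
print(round(12 / birth_weight_oz * 100, 1))   # about 9.2 percent
print(round(16 / birth_weight_oz * 100, 1))   # the misremembered 16 oz would be about 12.2 percent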

I only wish that one of the troops I called on had been this cool newborn weight tool, known as NEWT. Folks at the Penn State College of Medicine and the Penn State Hershey Children’s Hospital developed a “growth chart” for infant weight loss in the first few days of a baby’s life that mimics the percentile-approach commonly used by pediatricians for plotting the height, weight, and head circumference of children. (Before making the tool, the doctors knew they needed a large set of data for NEWT to be statistically sound. You can read more about how they got this data and implemented NEWT here.)

Let’s take a look at where his weight loss fell on NEWT’s continuum:

Now, I can definitely see our doctor’s cause for concern. After all, according to NEWT, results that tend toward higher percentile levels may provide early identification of adverse weight loss conditions. Our son's weight loss at about 61 hours after birth (see the light blue dot) fell just outside the 75th percentile.

However, since our son is a breastfed baby, his weight loss of 9.2% at three days old was still considered normal by most pediatricians, albeit on the higher end of normal (which NEWT also shows nicely). The doctors who created NEWT brought up a good point in the article regarding the “10 percent rule of thumb”: a weight loss of 10 percent can matter a lot, or not at all, depending on when and at what rate it occurs.

But…at 3 days postpartum, I was convinced I heard the doctor say our son had lost 16 ounces of weight, which equates to a much scarier 12.2% weight loss. Yikes! Sleep deprivation does crazy things to people. Like most first-time parents, I wanted my baby to be, above all things, healthy and normal. The 12.2% weight loss my tired brain had fabricated wasn’t normal, but his actual weight loss (9.2%) wasn’t far from normal at all. 

This all ended quite well, as two days later we headed back to the doctor for another weight check, and our son ended up gaining a whopping 9 ounces—putting his weight almost back to his birth weight. Our doctor likes to see breastfed babies reach their birth weight again about one week after their birthday. So we were right on track!

Since his weight has been a sore spot for me, I’ve been charting it using a Time Series Plot in Minitab in time increments that have followed his doctor appointment schedule (2 weeks, 2 months, 4 months, etc.):

Moving past his initial newborn weight loss, I’m watching for any dips and hoping for a steady climb. You can see that the little guy has been doing just fine gaining weight so far, and we may even want to call him the “big” guy now!

As a parent, I’m very thankful for statistics and statistical tools like NEWT and Minitab!

How to Be a Billionaire, Revealed by Pareto Charts


Forbes ranked the world’s billionaires for 2015 this week, which gives us a good opportunity to have fun with some data. After all, when you’re talking about billions, the most fun you can have is to see how big the number can get.

hundred dollar bills

If you copy and paste Forbes’ data directly, you’ll find it a bit messy for analysis. For example, the sources of wealth for the billionaires include Telcom, Telecom, Telecom Equipment, Telecom Services, Telecommunications, and Telecoms. And that’s after you make sure that the capitalization is consistent. My cleaning has not been too rigorous, but I think it’s enough to get started.
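If you are cleaning up a similar export yourself, collapsing those near-duplicate categories is the kind of step a few lines of scripting handle well. The sketch below (in Python) shows the idea with a made-up mapping; it is not the full cleanup used for this post.

import pandas as pd

# A few example labels; the real export has many more rows and variants.
df = pd.DataFrame({"Source": ["Telcom", "Telecom", "telecom equipment",
                              "Telecom Services", "Telecommunications",
                              "Telecoms", "Real Estate", "real estate"]})

# Make capitalization consistent first, then collapse the telecom variants.
df["Source"] = df["Source"].str.strip().str.title()
telecom_variants = ["Telcom", "Telecom Equipment", "Telecom Services",
                    "Telecommunications", "Telecoms"]
df["Source"] = df["Source"].replace(dict.fromkeys(telecom_variants, "Telecom"))

print(df["Source"].value_counts())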

Who to Work For

Forbes ascribes the wealth of most of the billionaires on the list to industries, but supplies company names for some of the more familiar brands. Among those that I recognized as brands, here are the companies that support the most billionaires:

Companies listed as the source of wealth for the most billionaires

Although Mark Zuckerberg is the face of Facebook, Facebook’s created 7 other billionaires in its lifespan. Of course, that’s still not as many as Cargill, but Cargill seems to have an unfair edge in terms of diversification. Cargill includes everything from Crisco Vegetable Oil to Black River Asset Management LLC among its products.

Where to Work

You might look to see where most of the billionaires are, but then you would probably see only that countries with larger populations have more billionaires than countries with smaller populations. A graph with harder-to-anticipate results comes when you look at the average wealth of the billionaires in a country.

Countries with the wealthiest average billionaires

If you expected to see the United States, China, and Russia, they’re absent from the Top 10 list of countries in terms of the mean wealth of billionaires in the country. Mexico, influenced heavily by the wealth of Carlos Slim Helú, leads the list of countries where the mean wealth of billionaires is highest. Among these nations, France has the most billionaires with 46.

What to Do

Different countries are stronger in different industries. Forbes lists the source of wealth for four Finnish billionaires as “Elevators” and for three French billionaires as “Cheese.” But worldwide, there’s a clear winner.

Industries cited as the source of wealth for the most billionaires

Employment numbers bode well for math and science jobs, but real estate is the industry that’s produced the most current billionaires. It’s not clear how many the Diversified category might add to any of the other sources, but I can imagine that many diversified billionaires make money from real estate and investments.

Wrap Up

So would being a realtor for Cargill in Mexico really put you on the path to being a billionaire? I’ll probably never be able to tell you from experience. But we can certainly dream.

Bonus

I used Pareto charts to show these data. If you're ready for more, check out how to explain quality statistics so your boss will understand.
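A Pareto chart is just a sorted bar chart with a cumulative-percentage line, so if you want to reproduce one outside Minitab, a rough sketch looks like this (the industry counts here are invented placeholders, not the Forbes figures):

import matplotlib.pyplot as plt

# Invented industry counts, sorted from largest to smallest.
counts = {"Real estate": 115, "Investments": 96, "Diversified": 90,
          "Retail": 60, "Technology": 55, "Pharmaceuticals": 40}
labels = list(counts)
values = list(counts.values())
cumulative = [sum(values[:i + 1]) / sum(values) * 100 for i in range(len(values))]

fig, ax1 = plt.subplots()
ax1.bar(labels, values)
ax1.set_ylabel("Number of billionaires")
ax1.tick_params(axis="x", rotation=45)

ax2 = ax1.twinx()                          # cumulative percentage on a second axis
ax2.plot(labels, cumulative, marker="o", color="tab:red")
ax2.set_ylabel("Cumulative percent")
ax2.set_ylim(0, 105)

plt.tight_layout()
plt.show()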

The image of cash is by Amanda and is licensed under this Creative Commons license.

Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics


Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it’s thanks to a hypothesis test. How do these tests really work and what does statistical significance actually mean?

In this series of three posts, I’ll help you intuitively understand how hypothesis tests work by focusing on concepts and graphs rather than equations and numbers. After all, a key reason to use statistical software like Minitab is so you don’t get bogged down in the calculations and can instead focus on understanding your results.

To kick things off in this post, I highlight the rationale for using hypothesis tests with an example.

The Scenario

An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $260. The economist randomly samples 25 families and records their energy costs for the current year. (The data for this example is FamilyEnergyCost and it is just one of the many data set examples that can be found in Minitab’s Data Set Library.)

Descriptive statistics for family energy costs

I’ll use these descriptive statistics to create a probability distribution plot that shows you the importance of hypothesis tests. Read on!

The Need for Hypothesis Tests

Why do we even need hypothesis tests? After all, we took a random sample and our sample mean of 330.6 is different from 260. That is different, right? Unfortunately, the picture is muddied because we’re looking at a sample rather than the entire population.

Sampling error is the difference between a sample and the entire population. Thanks to sampling error, it’s entirely possible that while our sample mean is 330.6, the population mean could still be 260. Or, to put it another way, if we repeated the experiment, it’s possible that the second sample mean could be close to 260. A hypothesis test helps assess the likelihood of this possibility!

Use the Sampling Distribution to See If Our Sample Mean is Unlikely

For any given random sample, the mean of the sample almost certainly doesn’t equal the true mean of the population due to sampling error. For our example, it’s unlikely that the mean cost for the entire population is exactly 330.6. In fact, if we took multiple random samples of the same size from the same population, we could plot a distribution of the sample means.

A sampling distribution is the distribution of a statistic, such as the mean, that is obtained by repeatedly drawing a large number of samples from a specific population. This distribution allows you to determine the probability of obtaining the sample statistic.

Fortunately, I can create a plot of sample means without collecting many different random samples! Instead, I’ll create a probability distribution plot using the t-distribution, the sample size, and the variability in our sample to graph the sampling distribution.

Our goal is to determine whether our sample mean is significantly different from the null hypothesis mean. Therefore, we’ll use the graph to see whether our sample mean of 330.6 is unlikely assuming that the population mean is 260. The graph below shows the expected distribution of sample means.

Sampling distribution plot for the null hypothesis

You can see that the most probable sample mean is 260, which makes sense because we’re assuming that the null hypothesis is true. However, there is a reasonable probability of obtaining a sample mean that ranges from 167 to 352, and even beyond! The takeaway from this graph is that while our sample mean of 330.6 is not the most probable, it’s also not outside the realm of possibility.
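If you'd like to recreate a plot like this outside Minitab, the sketch below draws the t-based sampling distribution of the mean under the null hypothesis and marks the observed sample mean. The sample standard deviation is a placeholder; substitute the value from the descriptive statistics output.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

null_mean, sample_mean, n = 260, 330.6, 25
sample_sd = 154.0                          # placeholder, not the study's actual value
se = sample_sd / np.sqrt(n)

# Sampling distribution of the mean under H0, via a scaled t-distribution.
x = np.linspace(null_mean - 4 * se, null_mean + 4 * se, 400)
density = stats.t.pdf((x - null_mean) / se, df=n - 1) / se

plt.plot(x, density)
plt.axvline(null_mean, label="null hypothesis mean = 260")
plt.axvline(sample_mean, linestyle="--", label=f"sample mean = {sample_mean}")
plt.xlabel("Sample mean energy cost")
plt.ylabel("Density")
plt.legend()
plt.show()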

The Role of Hypothesis Tests

We’ve placed our sample mean in the context of all possible sample means while assuming that the null hypothesis is true. Are these results statistically significant?

As you can see, there is no magic place on the distribution curve to make this determination. Instead, we have a continual decrease in the probability of obtaining sample means that are further from the null hypothesis value. Where do we draw the line?

This is where hypothesis tests are useful. A hypothesis test allows us to quantify the probability that our sample mean is unusual. In my next blog post, I’ll continue to use this graphical framework and add in the significance level and P value to show how hypothesis tests work and what statistical significance means.


"Means and Histograms" by Dr. Mu's


I am M.
G - L - M.

G-L-M

That G - L - M!
That G - L - M!
I do not like
That G - L - M.

Do you like
Means and histograms?

I do not like them,
G - L - M.
I do not like
Means and histograms.

Would you like them
Halved or squared?
Would you like them
As a pair?

Do You Like

I would not like them
Halved or squared.
I would not like them
As a pair.

I do not like
Means and histograms.
I do not like them,
G - L - M.

Would you like them
In some code?
Would you like them
With the mode?

I do not like them
In some code.
I do not like them
With the mode.

I do not like them
Halved or squared.
I do not like them
As a pair.
I do not like
means and histograms.
I do not like them, 
G - L - M.

Would you use them
With George Box?
Would you use them
With Sir Cox?

Box and Cox in Box

Not with George Box.
Not with Sir Cox.

Not in some code.
Not with the mode.
I would not use them halved or squared.
I would not use them as a pair.
I would not use means and histograms.
I do not like them, G - L - M.

Would you? Could you?
Chart a bar?
Plot them! Plot them!
Here they are.

I would not,
Could not, Chart a bar.

You may like them.
You will see.
You may like them
Scored as Z.

I would not, could not scored as Z.
Not charted bars! You let me be.

I do not like them with George Box.
I do not like them with Sir Cox.
I do not like them in some code.
I do not like them with the mode.
I do not like them halved or squared.
I do not like them as a pair.
I do not like means and histograms.
I do not like them, G - L - M.

A gage! A gage!
A gage! A gage!
Could you, would you
Do a gage?

Not as a gage! Not scored as Z!
Not charted bars! Let me be!
I would not, could not, with George Box.
I could not, would not, with Sir Cox.
I will not use them with the mode.
I will not use them in some code.
I will not use them halved or squared.
I will not use them as a pair.
I do not like them, G - L - M.

Say!
As a chart?
An Xbar-R Chart!
Would you, could you, use a chart?

I would not, could not,
Use a chart.

Would you, could you,
Use the range?

I would not, could not, use the range.
Not on a chart.  Not as a gage,
Not charted bars, Not scored as Z.
I do not like to use stats, you see.
Not in some code. Not with George Box.
Not with the mode. Not with Sir Cox.
I will not use them halved or squared.
I will not use them as a pair!

You do not like
Means and histograms?

I do not
Like them,
G - L - M.

Could you, would you,
D - O - E?

I would not,
Could not,
D - O - E!

Would you, could you,
S - P - C?

I could not, would not, S - P - C.
I will not, will not, D - O - E.
I will not use them with the range.
I will not use them with a gage.
Not on a chart! Not scored as Z!
Not charted bars! You let me be!
I do not like them with George Box.
I do not like them with Sir Cox.
I will not use them in some code.
I will not use them with the mode.
I do not like them halved or squared.
I do not like them as a pair!
I do not like
Means and Histograms!
I do not like them,
G - L - M.

You do not like them.
So you say.
Try them! Try them!
And you may.
Try them and you may I say.

M!
If you will just teach me,
I will use them.
You will see.

Say!
I like means and histograms!
I do! I like them, G - L - M!
 
And I would use some D - O - E!
And I would use that S - P - C.
And I will use them with a gage.
And with a chart.  And with the range.
And charted bars. And scored as Z.
They are so good so good you see!
So I will use them with George Box.
And I will use them with Sir Cox.
And I will use them in some code.
And I will use them with the mode.
And I will use them halved and squared.
Say! I will use them as a pair!

 

I will

I do so like
Means and histograms!

Thank you!
Thank you,
G - L - M.

 

Even among practitioners, statistics are often something only to be consumed when forced to. That's why, on Wednesday, March 11th at the Lean Six Sigma World Conference, I'll be presenting an interactive session to make attendees more comfortable with statistical concepts.  By removing the Greek symbols, equations, and other math, I hope to transform the audience into a group eager for some means and histograms!
 

Author's note: A huge thanks to our graphics guru Trevor Calabro, who turned a ridiculous blog post into a ridiculous blog post with awesome graphics!

Tips and Tricks from Minitab's Technical Support Team


There are times when we are deep in a particular analysis and simply cannot seem to get past this dialog window, or that error message. Fortunately, the support team at Minitab is here to help.

Here is a list of situations people have called us about when using Minitab, and how to solve them.

If your situation isn't listed, please call Minitab Technical Support, and we will be happy to assist. There's no need to take your frustrations out on your hardware like Joe did! 

Sorry, You’re Not My (Data) Type

This message appears after entering a space in a cell of a numeric formatted column. Numeric columns will not accept text or spaces. After pressing OK in this window, you may be inclined to click somewhere else to continue your work. However, you’ll quickly realize that Minitab won’t let you continue until you remove that space from the column. Simply press the backspace key on your keyboard to remove the space.

You Look Like You’ve Seen a Ghost

When you put your mouse cursor in the ‘Variables’ box in the dialog window of Display Descriptive Statistics, you’ll see that C1 is not listed in the box to the left. The numbers are listed in C1 though, right? Minitab is interpreting C1 as a text column, and the Variables box is a smart cookie: it will only recognize numeric columns. You’ll need to convert C1 to numeric by going to Data > Change Data Type > Text to Numeric.

Editor-in-Chief

The contents of our Editor menu can change depending on what part of Minitab you are currently clicking on with your mouse. If you can't find what you are looking for under the Editor Menu, please make sure that you are selecting the appropriate area. 

editor menu with the worksheet selected, a graph window selected, and the session window selected

Lost and Found

Here are some shortcut keys that will utilize Minitab’s Project Manager to arrange what you are looking for, in the event you can't find that particular graph or worksheet, or session output.

CTRL+ALT+M: This shows a list of your analyses that were shown in the session window, as well as any graphs that were created along with it.

CTRL+ALT+D: This shows a list of your worksheets in your project.

CTRL+ALT+G: This shows a list of your graphs in your project.

Waiter, May I Please Have A Menu?

You may have been handed a computer with Minitab installed and found that some menus were missing. Someone may have customized Minitab and had them removed. You can return the menu toolbar to its original format by going to Tools > Customize > Toolbars. Single-click ‘Menu Bar’ under the toolbars list, and then click the Reset button. You’ll receive a warning that your menu changes will be lost. Go ahead and hit OK to return the menu to its default setting.

Command and Conquer

There may be a time when you want to run commands to generate Minitab output as opposed to using our dialog boxes. This can be done in one of two ways:

  1. You can enter commands directly into our Session Window. You’ll need to enable our command line editor though. This is done using Editor > Enable Commands.
  2. You can use our Command Line Editor to run commands as well. This is found in Edit > Command Line Editor (or press CTRL+L on your keyboard).

For a list of session commands used in Minitab, please go to Help > Help > References > Session Commands.

Restricted Imports

You are pasting data using CTRL+V from Excel into Minitab and get an error box that says

 "A run-time error has been raised. Please report this to Minitab Technical Support or your distributor."

We do our best to anticipate the many kinds of data that people want to import into Minitab, but sometimes special formats or characters in the data set may get the best of us. There are a couple of ways to get around this so that you can import your data into Minitab successfully:

  • In Minitab, use File > Open Worksheet instead. Under the drop-down labeled “Files of Type,” choose Excel (*.xls, *.xlsx) to filter for Excel files.
  • Save your original file as a .csv file, then copy and paste the data into Minitab.

Seeing Red

You have a graph set to update automatically when you add data to the worksheet. If the creation of that graph required multiple columns, then you’ll need to update each of those columns if you want the graph to continue to update. For example, if you created a control chart with a time stamp and stages, that’s 3 columns of data that were used. If you only update your measurement column, you’ll see that the update icon in the upper left corner of the graph changes to a red X. Once you enter data for your time stamp and stage columns, the graph will update automatically, and the X will turn into a green check. The pictures below show the upper left corner of two control chart graphs. 

 

 

Do you have any tips of your own to share?  Or any situations you'd like to see covered in a future post on the Minitab Blog?  We'd love to hear from you!

 

 

Predicting the Barclays Premier League with Regression Analysis


In England, with only a few months left, the Barclays Premier League is about to enter the final run-in of the season. While the top two spots seem pretty locked up, with Chelsea and Manchester City showing their class, the fight for the other two spots in the coveted top 4 promises to entertain to the very last weekend. This is key, because only the top 4 finishers qualify for next season's UEFA Champions League.

Right now, there are five teams who have a realistic chance at qualifying for the last two Champions League spots: Manchester United, Southampton, Arsenal, Tottenham, and Liverpool.

We’re going to use Minitab’s Prediction dialog to forecast, based on some statistics, who will finish in the top 4 and qualify for next season’s UEFA Champions League. Using our statistical software, we ran a regression using data from the past five seasons, with Total Points as our response variable (in the Premier League, you receive 3 points for every win and 1 point for every draw). Our predictors included a few team-based statistics: shots per game; possession, which tracks the percentage of time a team controls the ball; pass completion percentage; and goal difference.

After running the data through the Stat > Regression > Regression > Fit Regression Model command in Minitab, we arrived at the following final model:

Points = 45.39 - 0.157 Shots per game + 0.115 Possession + 0.040 Pass %  + 0.5945 Goal Difference
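In code form, the fitted model is just a linear combination of the four predictors. As a quick illustration, the sketch below wraps the published coefficients in a function and evaluates it for one hypothetical set of team statistics (the inputs are placeholders, not the actual worksheet values):

# The fitted equation from above as a function. The example inputs are
# placeholder team statistics, not the actual worksheet values.
def predicted_points(shots_per_game, possession, pass_pct, goal_difference):
    return (45.39 - 0.157 * shots_per_game + 0.115 * possession
            + 0.040 * pass_pct + 0.5945 * goal_difference)

# e.g., 15 shots per game, 55% possession, 84% passing, +30 prorated goal difference
print(round(predicted_points(15, 55, 84, 30), 1))   # about 70.6 points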

Now, using the Predict dialog in the regression menu, we can forecast and see which of the five teams competing for a Champions League spot will come out on top, based on our model.

To do this, after we have fit a regression model like we did above, we go back to Stat > Regression > Regression > Predict.

Here we are presented with a straightforward dialog that allows us to enter either individual values to predict on, or a column of values if we are interested in multiple predictions. For this analysis, we’re going to enter a column. Our worksheet contains the following table, which includes the statistics for each of our five teams, as well as a prorated goal differential, which will be used to forecast each team’s point total at the end of the year.

If we then go to Stat > Regression > Regression > Predict, we can fill out the dialog as follows, with our new columns:

Before pressing "OK," click "Results" and make sure Prediction Table is checked. We can check our Session Window output to see predicted values for Total Points:

So what do our results tell us? Which of these five teams will finish in the Top 4? We can look at the raw point totals for each of the teams, which is listed under "Fit." Judging by these, we can rank the teams as follows, by point total:

Arsenal - 72 
Manchester United - 71
Southampton - 70
Liverpool - 59
Tottenham - 57

According to our prediction, both Arsenal and Manchester United will qualify, with Southampton just on the outside looking in. Liverpool and Tottenham seem well behind according to our prediction. This makes sense, as the most important predictor in our model is goal difference, and those two teams are well behind the other three. Only time will tell if our predictions are correct, but for now, we'll pick Arsenal and Manchester United. 

 

How Could You Benefit from Monte Carlo Simulation to Design New Products ?


Suppose that you have designed a brand new product with many improved features that will help create a much better customer experience. Now you must ensure that it is manufactured according to the best quality and reliability standards, so that it gets the excellent long-term reputation it deserves from potential customers. You need to move quickly and seamlessly from Research and Development into mass production. To scale up production, the design team needs to provide the right component specifications to the suppliers, and these specifications will be converted into “process windows” for the actual manufacturing processes.


Optimization

Whether the manufacturing plant is located next door or far away in another country, and whether the plant is owned by your company or by an external supplier, to make the product easy to manufacture the design team must deliver the right “recipe” to the manufacturer (optimized specs, optimized manufacturing settings, etc.). If this “tolerancing” phase is not properly carried out, manufacturing engineers will have to resort to their own creativity to solve mismatches and adjust product settings. Obviously, this is not the best option, since it involves tampering with the product characteristics and would have an adverse impact on time-to-market.

Capability Estimates

Unfortunately, all processes are affected by various sources of variation (environmental fluctuations as well as process variability), and this variability often causes major quality problems. If product specifications are large enough compared to the overall process variability, the result will be high-quality products at low cost (with a high Ppk capability value). If this is not the case, the percentage of out-of-spec products will substantially increase.

Consider the graph below. There are many inputs and only one output. Some inputs are controllable parameters, but some are uncontrollable noise factors.

At this stage, only a few prototypes might be available to validate the design concept. However, models based on pilot-scale Designs of Experiments (DOE), Computer-Aided Design, or known physical models may enable you to study the way input variability propagates to the final output, and from that you can predict the capability values to expect when full production is launched.

Enter the Monte Carlo Simulation Method

The Monte Carlo method is a probabilistic technique based on generating a large number of random samples in order to simulate variability in a complex system. The objective is to simulate and test as early as possible so we can anticipate quality problems, avoid costly design changes that might be required at a later phase, and make life a lot easier on the shop floor.

Monte Carlo has a reputation for being difficult, but software tools have made it much easier. For example, check out Devize, Minitab's Monte Carlo simulation software for manufacturing engineers.

Every input in your model is characterized by a mean and a variance. Identifying the correct probability distribution may require a deeper knowledge of the way the inputs behave. To simplify this, the triangular distribution may be used to indicate simply the minimum, the maximum, and the most probable value.
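As a concrete illustration of the idea (a generic sketch, not Devize or Minitab output), suppose the output is a known function of two inputs, one normal and one triangular. Drawing many random input values, pushing them through the transfer function, and counting how often the output falls outside its specification limits gives a predicted out-of-spec rate and a rough capability estimate. Everything in the sketch below (the transfer function, the input distributions, and the spec limits) is invented for illustration.

import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Inputs: one normal, one described only by its min / most likely / max values.
x1 = rng.normal(loc=10.0, scale=0.15, size=n)
x2 = rng.triangular(left=4.7, mode=5.0, right=5.4, size=n)

y = 2.0 * x1 + 3.0 * x2                    # hypothetical transfer function

lsl, usl = 34.0, 36.0                      # hypothetical spec limits
out_of_spec = np.mean((y < lsl) | (y > usl))

# Rough Ppk from the simulated output (normal approximation).
ppk = min(usl - y.mean(), y.mean() - lsl) / (3 * y.std(ddof=1))

print(f"mean output {y.mean():.2f}, out of spec {out_of_spec:.2%}, Ppk {ppk:.2f}")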

Sensitivity Analysis

When doing a Monte Carlo simulation, if the predicted capability index is insufficient and some improvements are required to reach an acceptable quality level, the variability on some inputs will need to be reduced. Variation reduction is a very costly activity; therefore, you should really focus on the few variables that will lead to the largest gains in terms of capability improvement.

The graph above illustrates a sensitivity analysis, in which a reduction of the standard deviation of a particular input (green curve) is expected to lead to a massive reduction in the proportion of out-of-spec product.
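Continuing the sketch above, a crude sensitivity analysis simply repeats the simulation while shrinking the variation of one input and watches what happens to the predicted out-of-spec rate (again, all numbers are invented):

import numpy as np

rng = np.random.default_rng(7)
n = 100_000
lsl, usl = 34.0, 36.0

# Shrink the spread of the triangular input and watch the out-of-spec rate fall.
for spread in (0.35, 0.20, 0.10):
    x1 = rng.normal(10.0, 0.15, n)
    x2 = rng.triangular(5.0 - spread, 5.0, 5.0 + spread, n)
    y = 2.0 * x1 + 3.0 * x2
    out = np.mean((y < lsl) | (y > usl))
    print(f"input spread {spread:.2f}  ->  out of spec {out:.2%}")

Repeating this for each input in turn shows which variation-reduction effort would actually pay off, which is the whole point of the sensitivity analysis.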

Robust Design 

Some parameters that are easily controllable (control factors) in your system may interact with noise effects. This means that a noise factor effect may be modified by a controllable factor. If that is the case, such noise*control interactions may be used to mitigate noise effects and make the process or product more robust to environmental fluctuations. Nonlinear effects may also be useful to improve robustness to fluctuations.

In the graph above, after optimization and following a sensitivity analysis, the variability of one input has been reduced so that the variability of the output, which was too large compared to specifications (right part of the diagram), is now well within specifications (left part).

Conclusion 

This simulation procedure is iterative: 

  1. Design with nominal values.
  2. Simulate variability to predict capability indices.
  3. Analyze sensitivity.
  4. Redesign or re-center until the system meets all requirements.

Monte Carlo simulations are often a crucial part of DFSS (Design for Six Sigma) or DMADV (Define Measure Analyze Design Verify) projects. Innovation activities play a vital role as economies become more advanced and more dynamic. As we move into the innovation-driven stage, this approach based on simulations will become even more important.

Monte Carlo simulation used to involve high computational costs, but this is no longer the case today with the availability of very powerful computing tools.

 

P-value Roulette: Making Hypothesis Testing a Winner’s Game


Welcome to the Hypothesis Test Casino! The featured game of the house is roulette. But this is no ordinary game of roulette. This is p-value roulette!

Here’s how it works: We have two roulette wheels, the Null wheel and the Alternative wheel. Each wheel has 20 slots (instead of the usual 37 or 38). You get to bet on one slot.

[Image: Edvard Munch, "At the Roulette Table in Monte Carlo"]

What happens if the ball lands in the slot you bet on? Well, that depends on which wheel we spin. If we spin the Null wheel, you lose your bet. But if we spin the Alternative wheel, you win!

I’m sorry, but we can’t tell you which wheel we’re spinning.

Doesn’t that sound like a good game?

Not convinced yet? I assure you the odds are in your favor if you choose your slot wisely. Look, I’ll show you a graph of some data from the Null wheel. We spun it 10,000 times and counted how many times the ball landed in each slot. As you can see each slot is just as likely as any other, with a probability of about 0.05 each. That means there’s a 95% probability the ball won’t land on your slot, so you have only a 5% chance of losing—no matter what—if we happen to spin the Null wheel.

histogram of p values for null hypothesis

What about that Alternative wheel, you ask? Well, we’ve had quite a few different Alternative wheels over the years. Here’s a graph of some data from one we were spinning last year:

histogram of p values from alternative hypothesis

And just a few months ago, we had a different one. Check out the data from this one. It was very, very popular.

 histogram of p-values from popular alternative hypothesis

Now that’s what I call an Alternative! People in the know always picked the first slot. You can see why.

I’m not allowed to show you data from the current game. But I assure you the Alternatives all follow this same pattern. They tend to favor those smaller numbers.

So, you’d like to play? Great! Which slot would you like to bet on?

Is this on the level?

No, I don’t really have a casino with two roulette wheels. My graphs are simulated p-values for a 1-sample t-test. The null hypothesis is that the mean of a process or population is 5. The two-sided alternative is that the mean is different from 5. In my first graph, the null hypothesis was true: I used Minitab to generate random samples of size 20 from a normal distribution with mean 5 and standard deviation of 1. For the other two graphs, the only thing I changed was the mean of the normal distribution I sampled from.  For the second graph, the mean was 5.3. For the final graph, the mean was 5.75.
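If you want to reproduce the shape of these histograms yourself, here is a rough equivalent of that simulation sketched in Python with scipy (I ran the originals in Minitab):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_p_values(true_mean, n_samples=10_000, sample_size=20, null_mean=5):
    # P-values from repeated 1-sample t-tests of H0: population mean = null_mean
    pvals = []
    for _ in range(n_samples):
        sample = rng.normal(loc=true_mean, scale=1.0, size=sample_size)
        pvals.append(stats.ttest_1samp(sample, popmean=null_mean).pvalue)
    return np.array(pvals)

# Null true (mean 5): p-values are roughly uniform, so about 5% fall below 0.05.
# Alternative true (mean 5.3 or 5.75): small p-values become much more likely.
for mu in (5.0, 5.3, 5.75):
    p = simulate_p_values(mu)
    print(f"true mean {mu}: share of p-values below 0.05 = {np.mean(p < 0.05):.3f}")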

For just about any hypothesis test you do in Minitab Statistical Software, you will see a p-value. Once you understand how p-values work, you will have greater insight into what they are telling you. Let’s see what we can learn about p-values from playing p-value roulette.

  1. Just as you didn’t know whether you are spinning the Null or Alternative wheel, you don’t know for sure whether the null hypothesis is true or not. But basing your decision to reject the null hypothesis on the p-value favors your chance of making a good decision.
     
  2. If the null hypothesis is true, then any p-value is just as likely as any other. You control the probability of making a Type I error by rejecting only when the p-value falls within a narrow range, typically 0.05 or smaller. A Type I error occurs if you incorrectly reject a true null hypothesis.
     
  3. If the alternative hypothesis is true, then smaller p-values become more likely and larger p-values become less likely. That’s why you can think of a small p-value as evidence in favor of the alternative hypothesis.
     
  4. It is tempting to try to interpret the p-value as the probability that the null hypothesis is true. But that’s not what it is. The null hypothesis is either true, or it’s not. Each time you “spin the wheel” the ball will land in a different slot, giving you a different p-value. But the truth of the null hypothesis—or lack thereof—remains unchanged.
     
  5. In the roulette analogy there were different alternative wheels, because there is not usually just a single alternative condition. There are infinitely many mean values that are not equal to 5; my graphs looked at just two of these.
     
  6. The probability of rejecting the null hypothesis when the alternative hypothesis is true is called the power of the test. In the 1-sample t-test, the power depends on how different the mean is from the null hypothesis value, relative to the standard error. While you don’t control the true mean, you can reduce the standard error by taking a larger sample. This will give the test greater power.
     
You Too Can Be a Winner!

To be a winner at p-value roulette, you need to make sure you are performing the right hypothesis test and that your data fit the assumptions of that test. Minitab’s Assistant menu can help you with that. The Assistant helps you choose the right statistical analysis and provides easy-to-understand guidelines to walk you through data collection and analysis. Then it gives you clear graphical output that tells you how to interpret your p-value, while helping you evaluate whether your data are appropriate, so you can trust your results.

 

The Kentucky Conundrum: Creating a New Regression Model to Predict the NCAA Tournament


The NCAA Tournament is right around the corner, and you know what that means: It’s time to start thinking about how you’re going to fill out your bracket! For the last two years I’ve used the Sagarin Predictor Ratings to predict the tournament. However, there is a problem with that strategy this year. The old method uses a regression model that calculates the probability one team has of beating another based on where the team is ranked in the Sagarin Ratings. So from year to year, the #1 ranked team is going to have the same probability of beating, say, the #25 ranked team.

The problem this year, of course, is that Kentucky isn’t your average #1 team.

We can’t simply use the fact that Kentucky is the #1 ranked team to calculate their probability of beating other teams because they’re so much better than the #1 teams that came before them! In fact, if you take the Pomeroy rating of the #1 ranked teams since 2002, this Kentucky team not only has the highest rating, they’re a full 2 standard deviations higher!

And Kentucky isn’t alone. The previous link goes on to show that 8 of the 10 teams in the Pomeroy top 10 have the highest rating of any similarly ranked team to come before them. This may be the best group of #1 and #2 seeds the NCAA tournament has ever had. So any predictive system based on a team's ranking may be underestimating the chances of the top teams.

Luckily, Las Vegas can help us!

Using Vegas Spreads to Predict Games

Using the Sagarin ratings, we can calculate the margin of victory we’d expect the favored team to win by. That is, we can calculate the spread (according to the Sagarin ratings). Then we can use that spread to calculate the probability the favorite has of winning. How? That’s where Las Vegas comes in.

I took 3,126 college basketball games from this season, collected the spread (for the home team) and whether the home team won or not. (Note that at neutral site games, the “home” team is really just an arbitrary title given to one team. However, the spread already accounts for whether the home team is actually playing on their home court or not, so we can safely group actual home teams and home teams at neutral site games.)

We can use this data to create a binomial logistic regression model that can calculate the probability of the home team winning based on the spread:

[Binary logistic regression output and fitted probability plot]

You can see that our model does a very good job of predicting the probability the home team has of winning based on the spread. However, there does appear to be a big outlier in the upper right corner. This is the group of home teams that were favored by 24.5 points. In our data, there were 11 such teams, and two of them actually lost. Not only that, the teams were from the same state! Michigan lost to NJIT and Michigan State lost to Texas Southern. These were actually the only two home teams to lose a game in which they were favored by 18.5 points or more. I don’t think there is anything special about being favored by 24.5 points, so I think we can just chalk that outlier up to random variation and continue with the analysis. 
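The post doesn’t show the commands behind that output, but the general idea is easy to sketch. Here is one way such a model could be fit in Python with statsmodels, assuming a hypothetical file with one row per game, the closing spread for the home team, and whether the home team won:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: columns 'spread' (home team's point spread, negative when the
# home team is favored) and 'home_win' (1 if the home team won, 0 otherwise)
games = pd.read_csv("games.csv")

model = smf.logit("home_win ~ spread", data=games).fit()
print(model.summary())

# Estimated probability that a home team favored by 10 points wins
print(model.predict(pd.DataFrame({"spread": [-10.0]})))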

Applying the Model to the Sagarin Ratings

Let’s go back to our hypothetical match up between the #1 ranked team and the #25 ranked team to look at the difference between using the ranks and using the ratings. Right now those two teams are Kentucky and Davidson in the Sagarin predictor ratings. If we rely on where the teams are ranked, the probability that Kentucky wins a neutral site game is 84.6%. However, using the binary logistic model based on the spread from the Sagarin ratings, the probability that Kentucky wins is 91.4%. That’s quite a difference!

To win the entire tournament, Kentucky has to win 6 games. For simplicity, let’s just assume Kentucky has the same probability of winning all 6 games.

Probability of Kentucky  winning the tournament based on ranks = .846^6 = 36.7%

Probability of Kentucky winning the tournament based on ratings = .914^6 = 58.3%

You can see that even a small difference in probabilities can become extreme when compounded over 6 games!

Testing the Model

The last thing we should do is see how accurate our model is. For 594 games in February and March of this season, I obtained the probability the home team would win based on the Sagarin ratings. Then I put each game into a group based on the probability that the favorite had of winning. For example, if a team had a probability of winning of 76%, they would go in the “70 to 79” group. To test the accuracy of the model (and the ratings), we can look at the proportion of favorites that won in each group and compare that to the predicted probability.  If these two numbers are close together, then the model and the ratings are accurate. The results for the Sagarin ratings are below.

Group | Predicted Probability | Observed Probability | Difference | Number of Games
50 to 59 | 55.2% | 58.9% | 3.7% | 141
60 to 69 | 64.3% | 61.2% | 3.1% | 147
70 to 79 | 74.8% | 73.3% | 1.5% | 120
80 to 89 | 84.3% | 86.6% | 2.3% | 119
90 to 99 | 93.6% | 94.0% | 0.4% | 67

For each group, the difference between the observed probability and the predicted probability is only a few percentage points. It looks like our model is good to go. So check back on Monday, when we break down the brackets!
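If you want to run the same kind of calibration check on your own predictions, the grouping-and-comparing step is easy to script. Here is a minimal pandas sketch (the file and column names are hypothetical):

import pandas as pd

# Hypothetical data: one row per game, the model's predicted win probability for the
# favorite, and whether the favorite actually won (1) or lost (0)
games = pd.read_csv("predictions.csv")

bins = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = ["50 to 59", "60 to 69", "70 to 79", "80 to 89", "90 to 99"]
games["group"] = pd.cut(games["pred_prob"], bins=bins, labels=labels, right=False)

summary = games.groupby("group", observed=True).agg(
    predicted=("pred_prob", "mean"),
    observed=("favorite_won", "mean"),
    n_games=("favorite_won", "size"),
)
summary["difference"] = (summary["observed"] - summary["predicted"]).abs()
print(summary)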

Predicting the 2015 NCAA Tournament


Are you ready for some madness? Me too! So let’s break down the brackets. I’ll be using the Sagarin Predictor ratings to determine the probability each team has of advancing using a binary logistic model created with Minitab Statistical Software. You can find the details of how the probabilities are being calculated here.

Before we start, I’d also like to mention one other set of basketball ratings, called the Pomeroy Ratings. Both the Sagarin predictor ratings (from here on out referred to as the Sagarin ratings) and the Pomeroy ratings have been shown to be pretty accurate in predicting college basketball games. But Ken Pomeroy always breaks down the tournament using his system. So instead of duplicating his numbers, I like to use the Sagarin ratings. But I’ll be sure to mention places where the two systems disagree, and you can select the one you want to go with!

Alright, enough with the small talk. Let’s get to the statistics!

Midwest

The following table has the probabilities each team in the Midwest Region has of advancing in each round (up to the Final Four). But I decided to add something new this year. When you’re entering your “for fun” office pools, sometimes you’re given more points for upsets. The standard is to take the difference in the seeds and multiply it by the round. For example:

  • In the first round you correctly pick a 12 seed over a 5, you get (12-5) x 1 = 7 points
  • In the 2nd round you correctly pick a 7 seed over a 2, you get (7-2) x 2 = 10 points
  • If in the Final Four you correctly pick a 5 seed over a 1, you get (5-1) x 5 = 20 points

In a bracket with no upset points, your optimal bracket would be choosing the teams with the highest probability of advancing. But with bonus points, it may be optimal to pick a team with a smaller probability because you’ll receive more points if they win. So first I’ll give each team’s probability of advancing, and then I’ll give each team’s expected points in a pool that gives upset points.
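The bonus rule itself is easy to encode. Here’s a tiny sketch of just that piece (the expected-points tables below also fold in the pool’s base scoring and each team’s possible opponents, which this snippet doesn’t attempt to reproduce):

def upset_bonus(winning_seed, losing_seed, round_number):
    # Bonus points for correctly picking an upset: seed difference times the round
    return max(winning_seed - losing_seed, 0) * round_number

print(upset_bonus(12, 5, 1))   # 12 seed over a 5 in the first round -> 7
print(upset_bonus(7, 2, 2))    # 7 seed over a 2 in the second round -> 10
print(upset_bonus(5, 1, 5))    # 5 seed over a 1 in the Final Four -> 20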

Probability of Advancing

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Kentucky | 99.0% | 93.7% | 86.9% | 76.5%
(3) Notre Dame | 91.6% | 53.9% | 30.7% | 6.2%
(2) Kansas | 88.8% | 54.2% | 28.4% | 5.8%
(5) West Virginia | 73.3% | 48.2% | 6.0% | 2.7%
(7) Wichita St | 68.6% | 33.4% | 14.9% | 2.5%
(11) Texas | 51.3% | 23.9% | 12.3% | 2.1%
(6) Butler | 48.7% | 21.2% | 10.3% | 1.6%
(4) Maryland | 71.6% | 32.5% | 3.1% | 1.1%
(9) Purdue | 55.9% | 3.8% | 1.9% | 0.6%
(10) Indiana | 31.4% | 10.4% | 3.0% | 0.3%
(8) Cincinnati | 44.1% | 2.3% | 1.0% | 0.3%
(12) Buffalo | 26.7% | 11.7% | 0.7% | 0.2%
(13) Valparaiso | 28.4% | 7.6% | 0.4% | < 0.1%
(15) New Mexico St | 11.2% | 2.0% | 0.3% | < 0.1%
(14) Northeastern | 8.4% | 1.0% | 0.1% | < 0.1%
(16) Hampton/Manhattan | 1.0% | 0.1% | < 0.1% | < 0.1%


As if Kentucky needed any further help, they were placed in one of the easiest regions. Kansas is the weakest 2 seed in the tournament, and Maryland is the weakest 4 seed. In fact, other than Kentucky, no other team in this region is in the Sagarin top 10 (Kansas is the highest at 14). If you want to pick a team to upset Kentucky, I wouldn’t do it before the Final Four.

The team most likely to meet Kentucky in a potential Elite 8 game is Notre Dame, although Kansas’s probability isn’t much lower. But if you want to go crazy, 7th-seeded Wichita State and 11th-seeded Texas have the next highest probabilities of playing Kentucky in the Elite 8. And the Pomeroy ratings actually have Wichita State rated even higher than Sagarin does, so their chances are even better according to Pomeroy (a 21% chance of reaching the Elite 8). Don’t sleep on the Shockers.

One team you can sleep on is Maryland. The Terrapins are ranked #2 in Ken Pomeroy’s luck statistic, which means they are not as good as their record would indicate. While both Sagarin and Pomeroy have Maryland ranked similarly (32nd and 33rd), Pomeroy has Valparaiso ranked much higher (63rd as opposed to 72nd). So while the Sagarin ratings give Valparaiso about a 28% chance of winning (still not terrible for a 13 seed), Pomeroy likes their chances even more at 38%. Maryland is going to be on upset alert from their opening game.

Of course, the most likely opening-round upset is Texas over Butler. The 11th seed is actually a slight favorite in both Sagarin and Pomeroy, so even if you don’t get bonus points for upsets you should definitely consider picking Texas. In fact, this region could go crazy with upsets. Other than Kentucky, no team has a higher than 54% chance of reaching the Sweet 16. So even if you don’t get upset points, you should consider picking some in this region.

And if you do get upset points…

Expected Points in a Pool with Upset Points

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Kentucky | 0.99 | 2.86 | 6.34 | 12.46
(11) Texas | 2.57 | 5.85 | 8.04 | 8.71
(7) Wichita St | 0.69 | 3.50 | 4.69 | 5.11
(3) Notre Dame | 0.92 | 1.99 | 3.22 | 3.71
(2) Kansas | 0.89 | 1.97 | 3.11 | 3.57
(13) Valparaiso | 2.56 | 3.34 | 3.43 | 3.46
(12) Buffalo | 1.87 | 3.11 | 3.28 | 3.32
(10) Indiana | 0.94 | 2.24 | 2.66 | 2.74
(6) Butler | 0.49 | 1.62 | 2.42 | 2.66
(5) West Virginia | 0.73 | 1.70 | 3.40 | 3.64
(15) New Mexico St | 1.45 | 1.69 | 1.76 | 1.77
(4) Maryland | 0.72 | 1.37 | 1.59 | 1.68
(9) Purdue | 0.56 | 1.11 | 1.31 | 1.43
(14) Northeastern | 0.92 | 1.03 | 1.07 | 1.07
(8) Cincinnati | 0.44 | 0.72 | 0.81 | 0.85
(16) Hampton/Manhattan | 0.14 | 0.17 | 0.17 | 0.17

To use these numbers, the first thing you want to look for is the highest value for picking a team to the Final Four. That, of course, would be Kentucky, so you advance them to the Final Four. Then, out of the possible teams that might face them in the Elite 8, you want to find the highest value in the Elite 8 column. That would be Texas, so you advance them to the Elite 8. Then continue working backwards until you’ve filled out the bracket. The numbers in bold represent the teams you should select for the “optimal” bracket.

The reason you work backwards is to account for possible points after that round. For example, if you look at only the first round game between Indiana and Wichita State, you’ll see picking Indiana will get you 0.94 points on average, while picking Wichita State only gets you 0.69. However, Wichita State has a much better chance of beating Kansas in the 2nd round, so their expected points becomes higher when you factor that in.

As you can tell, if you’re in an upset pool, you have two jobs in this region.

  1. Put Kentucky in the final four.
  2. Go hog wild with upsets.

Seriously, put both Texas and Wichita State in the Sweet 16, and pick one for the Elite 8 (remember, Pomeroy liked Wichita State even more than Sagarin, so their numbers would be closer to Texas using Pomeroy). Definitely pick Buffalo and Valparaiso to win in the first round, and choose one to go to the Sweet 16. And if you really feel like going crazy, pick New Mexico State and Northeastern. Are they likely to win? No. But you’re picking Kansas and Notre Dame to lose their next game anyway, so the most you can gain by picking them is 2 points. And if one of them actually gets upset? All the bonus points!

West Probability of Advancing

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Wisconsin | 97.9% | 87.6% | 66.3% | 43.5%
(2) Arizona | 98.9% | 71.7% | 53.8% | 28.5%
(4) UNC | 91.2% | 68.5% | 24.5% | 11.7%
(3) Baylor | 85.6% | 58.1% | 21.4% | 7.0%
(10) Ohio St | 71.7% | 23.8% | 13.8% | 4.8%
(5) Arkansas | 82.3% | 27.1% | 5.1% | 1.4%
(6) Xavier | 55.8% | 22.0% | 5.5% | 1.2%
(9) Oklahoma St | 60.6% | 8.4% | 2.9% | 0.7%
(11) BYU | 44.2% | 15.9% | 3.5% | 0.7%
(7) VCU | 28.3% | 4.5% | 1.5% | 0.3%
(8) Oregon | 39.4% | 3.6% | 0.9% | 0.2%
(14) Georgia St | 14.4% | 4.0% | 0.4% | < 0.1%
(13) Harvard | 8.8% | 2.3% | 0.2% | < 0.1%
(12) Wofford | 17.7% | 2.1% | 0.1% | < 0.1%
(16) Coastal Carolina | 2.1% | 0.4% | < 0.1% | < 0.1%
(15) Texas Southern | 1.1% | < 0.1% | < 0.1% | < 0.1%

This region looks like it could be pretty chalky. Wisconsin, Arizona, and North Carolina all have pretty high percentages of reaching the Sweet 16. And the top 5 seeds all have a greater than 80% chance of winning their opening game. Ohio State could create some chaos, but we’ll get to them in a minute.

Both Sagarin and Pomeroy have Wisconsin and Arizona as the 2nd and 3rd best teams after Kentucky. Sagarin has Wisconsin at 2 and Arizona at 3, while Pomeroy has them switched. But no matter the order, it’s a bit of a shame that they have to face off in the same region. Wisconsin is favored here, but Arizona is actually the favorite in Pomeroy (a 42% chance of going to the Final Four as opposed to Wisconsin’s 36%). Your best bet is selecting one of these two teams to reach the Final Four.

Arizona’s probability is lower than Wisconsin’s here thanks to the fact that they might have to face an insanely good 10 seed in Ohio State in the 2nd round. The Sagarin ratings have Ohio State ranked 11th. Eleventh! Now, the Pomeroy ratings have Ohio State ranked at a more reasonable 21st. And despite the high ranking, the Sagarin ratings still only give the Buckeyes a 24% chance of reaching the Sweet 16. So don’t go thinking they’re a lock to pull the upset. But if you’re going to go with Wisconsin to win the region and want to pick chaos elsewhere, you could do worse than Ohio State in the Sweet 16.

A popular upset is always the 12 seed over the 5, and Arkansas is a 5 seed that normally would be ripe for the upset. The problem is that Wofford is an extremely weak 12 seed (ranked 114th in Sagarin and 90th in Pomeroy). Both Harvard and Georgia State would have had better chances of pulling the upset if they had gotten the 12 seed. But alas, Wofford got the 12 and Harvard and Georgia State are stuck with much harder opponents. So since this might be a popular upset pick with so many other brackets, perhaps it’s one for you to stay away from.

The best chance for a 1st round upset is the 6/11 game, as the 11 seed will be only a small underdog to Xavier (I used numbers for BYU as the 11 seed, but if Ole Miss wins they would be slightly bigger underdogs). But other than the 11 seed and Ohio State, this region looks like it could be pretty chalky.

But what about when you factor in upset points?

Expected Points in a Pool with Upset Points

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(10) Ohio St | 2.15 | 5.85 | 7.86 | 9.21
(1) Wisconsin | 0.98 | 2.73 | 5.38 | 8.86
(2) Arizona | 0.99 | 2.42 | 4.58 | 6.85
(4) UNC | 0.91 | 2.28 | 4.16 | 5.10
(11) BYU | 2.21 | 4.13 | 4.74 | 4.95
(3) Baylor | 0.86 | 2.02 | 2.87 | 3.44
(9) Oklahoma St | 0.61 | 1.79 | 2.16 | 2.30
(6) Xavier | 0.56 | 1.64 | 2.10 | 2.26
(14) Georgia St | 1.59 | 2.04 | 2.14 | 2.16
(5) Arkansas | 0.82 | 1.36 | 2.02 | 2.15
(12) Wofford | 1.24 | 1.47 | 1.49 | 1.50
(13) Harvard | 0.79 | 1.04 | 1.09 | 1.09
(8) Oregon | 0.39 | 0.83 | 0.92 | 0.94
(7) VCU | 0.28 | 0.71 | 0.83 | 0.87
(16) Coastal Carolina | 0.32 | 0.37 | 0.38 | 0.38
(15) Texas Southern | 0.15 | 0.15 | 0.15 | 0.15

When you factor in upset points, Ohio State becomes the best choice for the Final Four! But let’s pump the brakes for a second. Wisconsin’s expected points aren’t that much lower, and they’re much more likely to reach the Final Four. Plus, we know the Pomeroy ratings like Arizona a little more and Ohio State a lot less. So if you’re in a smaller pool, I’d still go with either Wisconsin or Arizona (or North Carolina if you want a dark horse). If you take Ohio State and they lose early, you probably just eliminated yourself.

Now if you’re in a very large pool, Ohio State becomes worth the chance. Sure, if they lose early, you’re eliminated. But hundreds of other people will also pick Wisconsin and Arizona, so your chances of winning are still small even if you correctly pick one of the top seeds. To win large pools, you need to correctly pick the huge upsets, and Ohio State to the Final Four would be just that.

One interesting thing is that even with upset points, the statistics say you should pick North Carolina to the Sweet 16. So no matter what type of pool you’re in, I would go ahead and match them up against Wisconsin.

In the bottom half of the bracket, you should definitely choose Ohio State over VCU and the 11 seed over Xavier (especially if it’s BYU). After that it’s your decision if you want to have them going further. One strategy would be to split the difference. If you have Arizona in the elite 8 (or beyond) go ahead and put BYU in the Sweet 16. And if you have Ohio State knocking off Arizona, go with the more likely Baylor Bears.

 East Probability of Advancing

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Villanova | 97.9% | 83.3% | 62.6% | 36.0%
(2) Virginia | 97.4% | 75.2% | 53.4% | 32.8%
(3) Oklahoma | 94.5% | 71.5% | 30.0% | 13.5%
(4) Louisville | 89.2% | 60.3% | 22.1% | 8.8%
(7) Michigan St | 69.6% | 19.6% | 9.1% | 3.3%
(5) N Iowa | 81.9% | 34.1% | 8.5% | 2.3%
(8) NC State | 60.4% | 11.0% | 4.5% | 1.1%
(6) Providence | 60.2% | 18.1% | 4.2% | 1.1%
(11) BSU/Dayton | 39.8% | 9.5% | 1.7% | 0.4%
(9) LSU | 39.6% | 5.5% | 1.8% | 0.3%
(10) Georgia | 30.4% | 4.9% | 1.5% | 0.3%
(12) Wyoming | 18.1% | 3.0% | 0.2% | < 0.1%
(13) Irvine | 10.8% | 2.6% | 0.2% | < 0.1%
(14) Albany | 5.5% | 0.9% | < 0.1% | < 0.1%
(15) Belmont | 2.6% | 0.2% | < 0.1% | < 0.1%
(16) Lafayette | 2.1% | 0.2% | < 0.1% | < 0.1%

Villanova and Virginia have nearly identical probabilities of reaching the Final Four, although Virginia comes with a caveat. For most of the season, they were ranked #2 right behind Kentucky. In fact, at one point in the Pomeroy ratings, Virginia’s Pythagorean rating was higher than 7 of the previous 13 #1 ranked teams!

However, one of Virginia’s best players (Justin Anderson) was injured and missed 8 games toward the end of the season. During this span, Virginia did not play nearly as well. Their ranking slipped while they played without him, and they are now 4th in both Pomeroy and Sagarin. Anderson returned for the last two games, but clearly wasn’t 100%, as he had to play with a wrap on his shooting hand. Without a healthy Anderson, Virginia’s chances are probably even lower than shown here. But if he’s ready to go for the tournament, then their chances are better than shown here and they should be considered the favorites in the region. The problem is it’s hard to know which version of Anderson is going to show up!

Just like the West region, the top 5 seeds all have very high chances of avoiding a first-round upset. And even the 6, 7, and 8 seeds have over a 60% chance of winning!  Upsets could be hard to come by in the first round of this region. However, one to consider is UC Irvine over Louisville. The numbers here say Irvine only has an 11% chance, but they don’t tell the entire story. Louisville will be playing without starting point guard Chris Jones, and UC Irvine will be playing with 7’6” Mamadou Ndiaye (yes, that's 7 feet 6 inches), who missed most of the season with an injury. The Vegas spread is still 8.5, so Louisville is still a heavy favorite. But that line shows their chances of winning are closer to 80% than the 89% shown here.

In the 2nd round, an upset that is getting mentioned a lot is Michigan State over Virginia. However, the Sagarin ratings only give them about a 1 in 4 chance of beating Virginia. And that’s if they can get past Georgia. They have a 70% chance of winning their opening game here, but the Pomeroy ratings have their percentage at a much more loseable 62%. If you’re not getting points for upsets, it’s probably best to go with Virginia here.

One final note about Northern Iowa. The numbers like Louisville over Northern Iowa in a potential 2nd round game, but they also don’t know that Louisville is playing without Chris Jones. Additionally, the Pomeroy ratings like Northern Iowa a lot more than Sagarin (ranked 12th as opposed to 25th). So Northern Iowa’s chances of reaching the sweet 16 are greater than shown here.

Expected Points in a Pool with Upset Points

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Villanova | 0.98 | 2.65 | 5.15 | 8.04
(2) Virginia | 0.97 | 2.48 | 4.62 | 7.24
(3) Oklahoma | 0.95 | 2.38 | 3.57 | 4.66
(4) Louisville | 0.89 | 2.10 | 3.74 | 4.44
(7) Michigan St | 0.70 | 2.52 | 3.32 | 3.86
(11) BSU/Dayton | 1.99 | 3.27 | 3.60 | 3.70
(8) NC State | 0.60 | 2.00 | 2.45 | 2.64
(6) Providence | 0.60 | 1.57 | 1.93 | 2.07
(10) Georgia | 0.91 | 1.61 | 1.84 | 1.92
(5) Northern Iowa | 0.82 | 1.50 | 2.32 | 2.54
(12) Wyoming | 1.27 | 1.60 | 1.66 | 1.66
(9) LSU | 0.40 | 1.17 | 1.40 | 1.47
(13) UC Irvine | 0.98 | 1.26 | 1.31 | 1.32
(14) Albany | 0.60 | 0.71 | 0.72 | 0.72
(15) Belmont | 0.34 | 0.37 | 0.37 | 0.37
(16) Lafayette | 0.31 | 0.35 | 0.35 | 0.35

Even in an upset pool, this region is pretty straightforward. You should put Villanova and Virginia in the regional finals, with the winner of that game being up to you. These numbers say to put Louisville in the Sweet 16, but remember, they’re overstating their chances. So you would be fine putting either Louisville or Northern Iowa in the Sweet 16. And whichever team you don’t select, pick them to lose their first round game.

In the bottom half of the bracket, definitely pick the 11 seed to upset Providence (Boise State and Dayton are about the same, so it doesn’t matter who wins the play-in game). The numbers would say to pick them to beat Oklahoma too (especially if you’re putting Virginia in the Elite 8). But we saw previously that Oklahoma is a pretty heavy favorite to reach the Sweet 16, so if you don't want to go upset heavy, Oklahoma in the Sweet 16 is fine.

Your last decision in this region is what to do with Michigan State. If you don’t think Justin Anderson will be ready to go for Virginia, then I don’t think it would be a bad decision to put Michigan State in the Sweet 16 and Oklahoma in the Elite 8. But remember that Pomeroy gives Michigan State a smaller chance of winning their first round game than Sagarin. So the safer play would be to advance Virginia to the sweet 16 and pick Georgia to upset Michigan State in the opening round game.  

South Probability of Advancing

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Duke | 96.9% | 83.1% | 55.6% | 36.6%
(2) Gonzaga | 97.2% | 72.9% | 48.9% | 24.2%
(5) Utah | 85.5% | 65.8% | 31.2% | 18.5%
(3) Iowa St | 93.6% | 69.2% | 32.3% | 12.1%
(4) Georgetown | 88.6% | 27.6% | 7.4% | 2.7%
(7) Iowa | 59.2% | 17.3% | 7.4% | 2.0%
(6) SMU | 63.0% | 20.7% | 6.0% | 1.3%
(8) San Diego St | 58.0% | 10.2% | 3.2% | 0.9%
(10) Davidson | 40.8% | 9.6% | 3.5% | 0.8%
(9) St. John’s | 42.0% | 6.1% | 1.6% | 0.4%
(11) UCLA | 37.0% | 9.0% | 1.9% | 0.3%
(12) SF Austin | 14.5% | 5.8% | 1.0% | 0.2%
(14) UAB | 6.4% | 1.1% | 0.1% | 0.0%
(13) E Washington | 11.4% | 0.8% | 0.0% | 0.0%
(16) UNF/RBU | 3.1% | 0.5% | 0.0% | 0.0%
(15) North Dakota St | 2.8% | 0.2% | 0.0% | 0.0%

If there is a region where the Final Four team isn’t a 1 or 2 seed, the South region is the most likely candidate. The top two seeds have a combined 60% chance of reaching the Final Four. That’s still pretty high, but it’s lower than the other 3 regions (East: 69%, West: 72%, Midwest: 82%). The most likely candidate to play spoiler is Utah. They are a 5 seed, but they are actually ranked 8th overall in both Pomeroy and Sagarin. And you should also know that the Pomeroy ratings have Gonzaga ranked above Duke, so they actually have Gonzaga as the favorite to win the region. But even then, the probability Pomeroy gives them of reaching the Final Four is only 29%. What that really means is that this region is up for grabs, and you can’t go wrong picking any of the top 4 teams in the table above.

But one warning before you go putting Utah in your Final Four: they drew a very tough opening-round game against Stephen F. Austin. Utah is still heavily favored, but this game could be close. And in the Pomeroy ratings, SF Austin is ranked 35th! That’s higher than some of the 8 and 9 seeds! Pomeroy actually gives SF Austin a 26% chance of winning. That’s probably not high enough that you want to pick it, but it’s something to keep in mind.

One team to definitely avoid is Georgetown, as they have only a 27.6% chance of getting to the sweet 16. That’s lower than Maryland! The Hoyas should still win their first round game, but I wouldn’t be confident picking them to go much farther than that.

Expected Points in a Pool with Upset Points

Team | 2nd Round | Sweet 16 | Elite 8 | Final 4
(1) Duke | 0.97 | 2.63 | 4.85 | 7.79
(2) Gonzaga | 0.97 | 2.43 | 4.38 | 6.32
(5) Utah | 0.85 | 2.17 | 3.49 | 5.27
(3) Iowa St | 0.94 | 2.32 | 3.61 | 4.58
(11) UCLA | 1.85 | 3.03 | 3.38 | 3.47
(10) Davidson | 1.22 | 2.62 | 3.16 | 3.35
(7) Iowa | 0.59 | 2.19 | 2.83 | 3.13
(6) SMU | 0.63 | 1.73 | 2.24 | 2.41
(8) San Diego St | 0.58 | 1.83 | 2.13 | 2.29
(4) Georgetown | 0.89 | 1.44 | 1.97 | 2.18
(12) SF Austin | 1.02 | 1.76 | 1.99 | 2.05
(9) St. John’s | 0.42 | 1.26 | 1.45 | 1.53
(13) E Washington | 1.03 | 1.11 | 1.12 | 1.12
(14) UAB | 0.71 | 0.84 | 0.86 | 0.86
(16) UNF/RBU | 0.46 | 0.54 | 0.55 | 0.55
(15) North Dakota St | 0.37 | 0.40 | 0.40 | 0.40

Duke still comes out on top in upset pools, but they do have the lowest expected points of any 1 seed in the tournament. Plus, that number would be even lower if you used Pomeroy. So honestly, I think taking Gonzaga, Utah, or Iowa State would be fine selections. Then fill out the rest of the region based on that decision. For example, if you have Iowa State going to the Elite 8, go ahead and pick Davidson or Iowa to beat Gonzaga (Davidson would get you more upset points, but Iowa is more likely to pull the upset). Or if you have Gonzaga in the Elite 8, go ahead and put UCLA in the Sweet 16. They’re not likely to get there, but if they do, that could net you 21 points! (Remember Dayton last year?) And since you have Gonzaga advancing anyway, you’re not losing many points if UCLA loses early. And at the very least, make sure you pick UCLA to beat SMU.

On the top of the bracket, your best bet is having Duke and Utah meet up. But if you have Duke winning that game, you might want to have SF Austin winning a game or two. And if you’re not advancing Georgetown far (and you probably shouldn’t), take a flyer and pick Eastern Washington.

Remember, crazy things can happen in this tournament. And especially in early rounds, when you’re not giving up many points if you’re wrong, make sure you go for those upset points!

Final Four

Team | Final Four | Semifinal | Champion
(1) Kentucky | 76.5% | 54.2% | 40.7%
(1) Wisconsin | 43.5% | 19.8% | 12.0%
(1) Duke | 36.6% | 20.8% | 8.2%
(2) Virginia | 32.8% | 18.8% | 7.9%
(1) Villanova | 36.0% | 19.1% | 7.2%
(2) Arizona | 28.5% | 12.1% | 6.9%
(2) Gonzaga | 24.2% | 12.5% | 4.3%
(5) Utah | 18.5% | 9.2% | 3.0%
(4) North Carolina | 11.7% | 3.5% | 1.5%
(3) Oklahoma | 13.5% | 5.4% | 1.5%
(3) Iowa St | 12.1% | 4.7% | 1.2%
(4) Louisville | 8.8% | 3.0% | 0.7%
(3) Baylor | 7.0% | 1.8% | 0.6%
(3) Notre Dame | 6.2% | 1.9% | 0.6%
(2) Kansas | 5.8% | 1.8% | 0.6%
(5) West Virginia | 2.7% | 0.6% | 0.2%

The Sagarin ratings give Kentucky approximately a 41% chance of winning the championship (Pomeroy has them a little lower at 34%). Both systems say there is a better chance that Kentucky doesn’t win the championship than that they do. So don’t think Kentucky winning it all is a foregone conclusion.

In fact, Kentucky’s biggest challenger could come before they even reach the title game. Wisconsin has the 2nd highest odds here, and they would face Kentucky in the national semifinal game.  And Wisconsin has the size to match up with Kentucky. That game would be a great rematch of last year’s semifinal, which Kentucky won by a single point. And if Wisconsin doesn’t face Kentucky in the final four, it’ll most likely be Arizona (which the Pomeroy ratings actually have as the next highest favorite after Kentucky). And remember that Arizona was the 3rd ranked team in Sagarin. So although their probability takes a hit here thanks to potentially drawing Ohio State in the 2nd round, they are also a legitimate threat to Kentucky.

After Wisconsin, there are 4 more teams with similar probabilities. Out of those 4 the one that stands out the most is Virginia. If you have them getting to the final four, you’re probably thinking Anderson has returned and is playing at the same level as before the injury. And with a healthy Anderson, Virginia was playing at a level that wasn’t too far behind Kentucky. If that’s the case (and I know it’s a big if), I think their chances are even better than shown here, and they would be most likely to reach the finals to potentially face Kentucky.

You’ll notice that the teams with the best probabilities are all 1 and 2 seeds. Because of this, you shouldn’t worry about bonus points too much at this stage. Instead just pick the team you think has the best chance of winning. After all, if you did take somebody like Ohio State to get to the final four and you were correct, you’ll most likely already be in 1st place in your pool. So at this point you want to try and pick the same teams everybody else did so they don’t have a chance at catching you.

 So there you have it, the 2015 bracket broken down. May your upset picks be correct and your final four teams survive and advance. And above all else, enjoy the madness!


Planning a Trip to Disney World: Using Statistics to Keep It in the Green


Our vacation planning has begun. My daughter has requested a trip to Disney World as her high school graduation present. For most people, trip planning might mean a simple phone call to the local travel agent or an even simpler do-it-yourself online booking.

Not for me.

As a statistician, a request like this means I’ve got a lot of data analysis ahead. So many travel questions require (in my world, anyway) data-driven decisions. What is the best time to book tickets? What is the best flight/airline to use given the probability of cancellation and/or missed connections traveling from our small airport? How do we schedule our in-park time so we aren’t waiting in line most of the day?


The statistician and the graduate-to-be during a previous visit to Disneyland Paris. 

My list of questions goes on and on, and will keep me very busy in the weeks to come. But to keep this at a reasonable length for a blog post, let’s just focus on the last one. Specifically, how do we minimize queue time and maximize fun? There are many valid approaches to looking at a question like this, but to keep things simple and use available data, I’m going to take advantage of some new Data window features available in Minitab Statistical Software 17.2.

Disney queue time data is available on several websites with varying levels of sophistication, but I chose to use a very simple set of average wait times. It’s well known that park attendance is highly seasonal, so I chose to only look at data that matches the predicted crowd level for the days we will be there. We’re also going to focus this particular analysis on my family’s seven must-see attractions at The Magic Kingdom.

If you want to follow along in Minitab 17.2, please download my data sheet.  

My primary variable of interest is wait time in minutes. I want to investigate wait time by time of day, using the specific ride as a grouping variable. To get a quick overview of the data, I started with a time series plot (Graph > Time Series Plot > Simple). You can overlay multiple graphs (in this case, our seven must-see rides) on a single plot like this by selecting Overlaid on the same graph under the Multiple Graphs button.

time series plot of Magic Kingdom rides

From this graph, I see that the new Seven Dwarfs Mine Train—the yellow line at or near the top for every hour—is going to be a tough one to ride without a substantial wait, but with the right timing, we can get through It’s a Small World pretty quickly. Everything else falls somewhere in between.

This is certainly good to know. What would be more useful, though, is to look at the actual data in our Minitab spreadsheet and set up some rules based on what we believe are acceptable wait times. My personal wait time tolerance can be roughly described as:

  • Under 20 minutes: totally Happy.
  • 20 to 35 minutes: may get a little Sleepy.
  • More than 35 minutes:  this better be the best ride ever or I’ll become very Grumpy.

Wouldn’t it be great if I could use this information to visualize my data right in the Data window? Fortunately, with Minitab 17.2, I can use conditional formatting to do this. Simply click in the Data window, and either right-click or choose Editor > Conditional Formatting. I used the options under Highlight Cell to set three rules: Less than 20, Between 20 and 35, and Greater than 35. I can now use the resulting Green, Yellow, and Red formatting to plan my day at The Magic Kingdom.

conditional formatting of data for magic kingdom rides
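If you’re following along in something other than Minitab, the same traffic-light idea can be sketched in a few lines of pandas (the file and column names here are hypothetical):

import pandas as pd

rides = pd.read_csv("magic_kingdom_waits.csv")   # hypothetical columns: ride, hour, wait_min

def zone(wait):
    # Apply the Happy / Sleepy / Grumpy wait-time rules described above
    if wait < 20:
        return "Green"
    elif wait <= 35:
        return "Yellow"
    return "Red"

rides["zone"] = rides["wait_min"].apply(zone)
print(rides.pivot_table(index="hour", columns="ride", values="zone", aggfunc="first"))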

Although Seven Dwarfs Mine Train never makes it into the green zone, putting the wait at the very end of our day—when our feet will be tired from walking anyway—may be reasonable. To avoid red as much as possible, we might want to hit either Space Mountain or Big Thunder Mountain first and move around from there.

I still need to collect information about the distance between rides to complete our plan, but I think we’re off to a promising start!

Check Your 2015 Men's NCAA Bracket Against Multiple Models


We’ve been pretty excited about March Madness here at Minitab. Kevin Rudy’s been busy creating his regression model and predicting the winners for the 2015 NCAA Men’s Basketball Tournament. But we’re not the only ones. Lots of folks are doing their best analysis to help you plan out your bracket now that the tip-offs for the round of 64 are just a day away. As you ponder your last-minute changes, I’ll compare some models to see where they agree and disagree. Here are some highlights from Five Thirty Eight, Microsoft Bing (in their bracketology debut), PlayoffStatus.com, Ed Feng's Power Rank, and our very own Kevin Rudy’s model based on the Sagarin rankings.

Who’s going to win it all?

There’s no bigger question than who’s going to win it all, and for some analysts, there’s never been more certainty. Here’s how the 1 and 2 seeds from each region measure up. I included Kansas by virtue of its seeding, but for the 5 mathematical models here, Kansas has a lower mean probability of winning the tournament than 3 seeds Iowa State and Notre Dame. Being in the same region as Kentucky will do that to you.

Kentucky has the highest average probability to win the tournament across the 5 models, followed by Villanova, Duke, Wisconsin, Arizona, Virginia, and Gonzaga.

One interesting point is that while Kentucky is the favorite in every model, there’s a big difference in how sure the models are about it. The model from Microsoft Bing has Kentucky at 18% and second-place Duke at 14%. For PlayoffStatus.com, the prediction is 17% for Kentucky and 12% for Villanova. None of the other models have anyone within 20% of Kentucky. Not every model gives game-by-game predictions, but I suspect that some of the difference comes from the projected ease of victory in the final game of the tournament. Five Thirty Eight, who announced that college basketball parity is over, use their model to say that if Kentucky gets to the championship game, they should win 76.8% of the time against most likely opponents Villanova, Virginia, and Gonzaga. Microsoft Bing gives Kentucky a 55% chance of defeating most likely opponent Duke.

Upsets the models agree on

While not necessarily shocking upsets, the average win probability among the 5 models is higher for the lower seed in two games in the first round. The biggest difference in seeding is number 11 Texas defeating number 6 Baylor.

Texas is a favorite in 3 of the 5 models.

While 11 seeds historically win about 36% of the time in the first round, 3 of the models have Texas as a favorite heading into the game and all of them say that Texas has a better chance to win than a randomly selected number 11 seed.

The second upset is number 10 Ohio State defeating number 7 Virginia Commonwealth University.

Ohio State is a favorite in 3 of the 5 models.

This game is the Kevin Rudy special, a bold prediction of nearly 71% to win even though 10 seeds historically win only about 28% of the time. While Microsoft Bing and PlayoffStatus.com don’t have Ohio State as the favorite as the other 3 models do, everyone agrees that Ohio State is much more likely to win than a randomly selected 10 seed.

Bing gets bold

As you’re making your last-minute tweaks, it can be heartening to know that the experts don’t always agree. Here’s a graph of the standard deviations among the first round probabilities for the 5 models. The most disagreement is around different games than you might expect:

The models disagree about the two games: Oklahoma State vs. Oregon and Eastern Washington vs. Georgetown.

The disagreement for Oklahoma State vs. Oregon exists because the team at Microsoft Bing has made the boldest prediction of the tournament.

3 models have Oregon as an underdog, but Microsoft Bing says they are 91% likely to win.

In a game that’s supposed to be hard to call between an 8 and a 9 seed, where in the history of the tournament 8 seeds have won 47% of the time, Microsoft Bing predicts a 91% win probability for Oregon. This prediction is particularly surprising because Microsoft Bing isn’t 91% sure about very many games. Kentucky, Villanova, Notre Dame, Arizona, and Iowa State are the only other teams that get over a 90% probability to win their first game. Even Microsoft Bing’s second-most-likely participant in the title game, Duke, only gets an 87% chance to finish off North Florida (Microsoft Bing’s projected winner of tonight’s game against Robert Morris).

The disagreement about the game between Eastern Washington and Georgetown is also because of the prediction from Microsoft Bing.

4 Models have the higher-seeded Georgetown as a favorite. Microsoft Bing favors Eastern Washington.

In a game where 4 seeds have historically won nearly 81% of the time, Microsoft Bing projects Georgetown as an underdog to advance. That’s encouraging news for Tyler Harvey and the rest of the Eagles. But since this is Microsoft Bing’s debut in bracket predictions, and models with longer track records strongly disagree, it’s tempting to speculate that the extreme Bing results are flaws instead of insights. We’ll know by Thursday.

Wrap up

Simulations, advanced statistics, machine learning, and plenty of linear algebra go into coming up with models that describe how likely a team is to win a basketball game. It’s convenient when the models agree, but that won’t always be the case as long as some amount of variability comes from factors we can’t measure. In such cases, comparing different models is a sensible practice. Here’s to your bracket success, may it do at least as well as Pete Thamel’s, the first person I found with a published bracket without Kentucky in the national championship game.

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics


What do significance levels and P values mean in hypothesis tests? What is statistical significance anyway? In this post, I’ll continue to focus on concepts and graphs to help you gain a more intuitive understanding of how hypothesis tests work in statistics.

To bring it to life, I’ll add the significance level and P value to the graph in my previous post in order to perform a graphical version of the 1 sample t-test. It’s easier to understand when you can see what statistical significance truly means!

Here’s where we left off in my last post. We want to determine whether our sample mean (330.6) indicates that this year's average energy cost is significantly different from last year’s average energy cost of $260.

Descriptive statistics for the example

Probability distribution plot for our example

The graph above shows the distribution of sample means we’d obtain under the assumption that the null hypothesis is true (population mean = 260) and we repeatedly drew a large number of random samples.

I left you with a question: where do we draw the line for statistical significance on the graph? Now we'll add in the significance level and the P value, which are the decision-making tools we'll need.

We'll use these tools to test the following hypotheses:

  • Null hypothesis: The population mean equals the hypothesized mean (260).
  • Alternative hypothesis: The population mean differs from the hypothesized mean (260).
What Is the Significance Level (Alpha)?

The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

These types of definitions can be hard to understand because of their technical nature. A picture makes the concepts much easier to comprehend!

The significance level determines how far out from the null hypothesis value we'll draw that line on the graph. To graph a significance level of 0.05, we need to shade the 5% of the distribution that is furthest away from the null hypothesis.

Probability plot that shows the critical regions for a significance level of 0.05

In the graph above, the two shaded areas are equidistant from the null hypothesis value and each area has a probability of 0.025, for a total of 0.05. In statistics, we call these shaded areas the critical region for a two-tailed test. If the population mean is 260, we’d expect to obtain a sample mean that falls in the critical region 5% of the time. The critical region defines how far away our sample statistic must be from the null hypothesis value before we can say it is unusual enough to reject the null hypothesis.
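In code, finding those cut-off points is a one-liner once you know the standard error of the sampling distribution. Here is a small scipy sketch (the standard error of 33 is an assumed value for illustration only, since the raw data behind this example aren't shown):

from scipy import stats

null_mean = 260
se = 33.0   # assumed standard error of the sample mean, for illustration only

# Two-tailed critical region for a significance level of 0.05
lower, upper = stats.norm.ppf([0.025, 0.975], loc=null_mean, scale=se)
print(f"Reject H0 if the sample mean falls below {lower:.1f} or above {upper:.1f}")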

Our sample mean (330.6) falls within the critical region, which indicates it is statistically significant at the 0.05 level.

We can also see if it is statistically significant using the other common significance level of 0.01.

Probability plot that shows the critical regions for a significance level of 0.01

The two shaded areas each have a probability of 0.005, which adds up to a total probability of 0.01. This time our sample mean does not fall within the critical region and we fail to reject the null hypothesis. This comparison shows why you need to choose your significance level before you begin your study. It protects you from choosing a significance level because it conveniently gives you significant results!

Thanks to the graph, we were able to determine that our results are statistically significant at the 0.05 level without using a P value. However, when you use the numeric output produced by statistical software, you’ll need to compare the P value to your significance level to make this determination.

What Are P values?

P-values are the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.

This definition of P values, while technically correct, is a bit convoluted. It’s easier to understand with a graph!

To graph the P value for our example data set, we need to determine the distance between the sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next, we can graph the probability of obtaining a sample mean that is at least as extreme in both tails of the distribution (260 +/- 70.6).

Probability plot that shows the p-value for our sample mean

In the graph above, the two shaded areas each have a probability of 0.01556, for a total probability of 0.03112. This probability represents the likelihood of obtaining a sample mean that is at least as extreme as our sample mean in both tails of the distribution if the population mean is 260. That’s our P value!

When a P-value is less than or equal to the significance level, you reject the null hypothesis. If we take the P value for our example and compare it to the common significance levels, it matches the previous graphical results. The P value of 0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01 level.
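In practice, the software computes the p-value from your raw data and you simply compare it to alpha. A minimal scipy version looks like this (the sample below is made up, so its p-value will not match the 0.03112 in this example):

import numpy as np
from scipy import stats

# Hypothetical sample of this year's energy costs
costs = np.array([310, 285, 345, 360, 298, 402, 275, 330, 350, 291,
                  318, 365, 305, 388, 322, 296, 341, 377, 284, 358])

t_stat, p_value = stats.ttest_1samp(costs, popmean=260)   # H0: population mean = 260
print(f"t = {t_stat:.2f}, two-tailed p-value = {p_value:.4f}")
print("Reject H0 at alpha = 0.05?", p_value <= 0.05)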

If we stick to a significance level of 0.05, we can conclude that the average energy cost for the population differs from the hypothesized value of $260 (and, given where our sample mean falls, it is higher).

A common mistake is to interpret the P-value as the probability that the null hypothesis is true. To understand why this interpretation is incorrect, please read my blog post How to Correctly Interpret P Values.

Discussion about Statistically Significant Results

A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. A test result is statistically significant when the sample statistic is unusual enough relative to the null hypothesis that we can reject the null hypothesis for the entire population. “Unusual enough” in a hypothesis test is defined by:

  • The assumption that the null hypothesis is true—the graphs are centered on the null hypothesis value.
  • The significance level—how far out do we draw the line for the critical region?
  • Our sample statistic—does it fall in the critical region?

Keep in mind that there is no magic significance level that distinguishes between the studies that have a true effect and those that don’t with 100% accuracy. The common alpha values of 0.05 and 0.01 are simply based on tradition. For a significance level of 0.05, expect to obtain sample means in the critical region 5% of the time when the null hypothesis is true. In these cases, you won’t know that the null hypothesis is true, but you’ll reject it because the sample mean falls in the critical region. That’s why the significance level is also referred to as an error rate!

This type of error doesn’t imply that the experimenter did anything wrong or require any other unusual explanation. The graphs show that when the null hypothesis is true, it is possible to obtain these unusual sample means for no reason other than random sampling error. It’s just luck of the draw.

Significance levels and P values are important tools that help you quantify and control this type of error in a hypothesis test. Using these tools to decide when to reject the null hypothesis increases your chance of making the correct decision.

In my next post, I’ll continue to use this graphical framework to help you understand confidence intervals!

A Simple Guide to Using Monte Carlo Simulation to Estimate Pi


Monte Carlo simulation has all kinds of useful manufacturing applications. And - in celebration of Pi Day - I thought it would be apropos to show how you can even use Monte Carlo simulation to estimate pi, which of course is the mathematical constant that represents the ratio of a circle’s circumference to its diameter. For our example, let’s start with a circle of radius 1 inscribed within a square with sides of length 2.

We can then use Monte Carlo simulation to randomly sample points from within the square. More specifically, we can randomly sample points using a uniform distribution where the minimum is -1 and the maximum is +1:

Since:

area of circle = π × (radius)² = π × 1² = π
area of square = (side length)² = 2² = 4

Then the ratio of the two areas - we'll call it r - can be represented as:

r = area of circle / area of square = π / 4

Using Devize software to run Monte Carlo simulation with 1,000,000 iterations, we arrive at 0.785069. (Since Monte Carlo simulation uses random sampling, this number will not be exactly the same every time you run a simulation.)

Therefore, if we use the r value generated using Monte Carlo simulation, we have:

π / 4 ≈ 0.785069

Solving for pi, we multiply 0.785069 × 4, which gives us an approximation for pi of 3.140276. And that is how we can use Monte Carlo simulation to estimate pi.
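The same estimate is easy to reproduce in a few lines of code outside of Devize; here's a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(314)
n = 1_000_000

# Sample points uniformly from the square [-1, 1] x [-1, 1]
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)

# The fraction of points that also land inside the unit circle estimates pi/4
r = np.mean(x**2 + y**2 <= 1)
print("r  =", r)        # about 0.785
print("pi =", 4 * r)    # about 3.14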

My Life as an Outlier


I always knew I was different. Even as a kid.

“Is that me? Way out there in left field?” I asked the doc.

“Yes,” he nodded, as he looked at my chart. “I used brushing to identify you on the graph.”

I wasn’t sure I liked getting brushed. It felt like my true identity was being detected and displayed in a window for all to see.

The doctor must have sensed my discomfort.

“It’s not uncommon—even for those from a normal population—to appear as outliers,” he said, doing his best to put a good spin on it.

“For example, based on diagnostic criteria that define an outlier as a value that lies beyond the quartile 1 value minus 1.5 times the inter-quartile range, or beyond the quartile 3 value plus 1.5 times the inter-quartile range, we’d expect 0.8% of observations to appear as outliers, even when they come from a perfectly normal population.”

I wondered where he’d learned to speak Golic Vulcan so well. Seeing the blank look on my face, he called in the Assistant to explain in clearer, simpler terms.

“For every 1000 observations,” the Assistant said, “roughly 8 are going to be labelled as funny little stars on this chart—even when they’re perfectly normal. In fact, that’s a very conservative estimate.”

“So maybe I’m just like everybody else?”  I asked, hopefully.

“I’d like to do some follow-up tests,” the doctor replied, cautiously.

The Results Come In, and I'm Out

It took about 9 seconds, using Minitab Statistical Software, to get my Dixon’s r22 Ratio Test results back. It seemed like forever.  

“The p-value for the outlier test is less than the significance level of 0.05,” the doctor began. “So we must reject the null hypothesis that you come from the same normal population as others.”

He paused to take a deep drag on a Lucky Strike that he held between the two thumbs of his left hand. Then he droned on in measured tones, summarizing each and every analysis that seemed to confirm my diagnosis as a delinquent datum. 

But I didn’t hear a word he said. I was already a million miles away, wondering how my parents would react.

Mom and Dad Try to Interpret Their Outlier

When my dad saw the individuals chart, he hit the roof.

“He’s out of control!!” Dad exploded.

“I’m sure there must be some special cause for it,” my mother reasoned.

“He’s never learned to respect limits,” he said.

“Let’s not overreact, dear. This might be a false alarm.”

“False alarm, huh?” my dad sneered. “Then what about this stem-and-leaf I found in his bedroom?”

“What were you doing in my bedroom?!” I protested. “Did you brush me?”

“Maybe it belongs to one of his friends…” my mother said, with the same vague, speculative tone you’d use to say, “Maybe there’s life on other planets…"

Rebel without a Special Cause

When you treat someone a certain way, they begin living up (or down) to your expectations.

Once the world pegged me as an outlier, my attitude quickly changed. If I was going to be treated as an outcast, by god, I’d be an extreme one. Then they’d find out just how problematic a single, aberrant datum could be!

At first, I started messing with simple parametric statistics, like the mean. They were so sensitive and easy to push around, especially when they weren’t part of a large crowd.

Man, what a power trip! Single-handedly I could drag down an arithmetic average. Or blow a variance sky-high, until it reached over 50 times its original magnitude. Sweet! 

See the data huddle fearfully inside a single histogram bin when I’m around? Heh heh heh...they’re so afraid they even begin to question their own normality!

As time went on, my insatiable craving for deviation made me move on to bigger things. That's when I started going out at night to wreck models.

I loved to ruin a clean, shiny model and instantly make it a disjointed, insignificant mess.


When I was feeling even more insidious, I'd use sleight of hand to make an insignificant relationship appear significant, to the unsuspecting. Little did they know, as soon as I walked away their perfect little model would crumble into a million little unrelated pieces. Ha ha  ha!

Ah, those were the days. The grand vicissitudes of youth! My pointy, pixelated head was either soaring high in the clouds, or spiraling down to the bottom of a subterranean sinkhole.

Then I got busted.

Remember that Assistant in the doctor's office? The one who could cogently explain the maximum likelihood function to a group of rodeo clowns? Turns out he's also a part-time policeman who conducts routine data checks.

One day he flagged me running a red light. Then the jig was up. My deviance was exposed for all to see.

What To Do with Me?

Once I'd been apprehended and booked, the debate began. How should the world deal with the error of my ways?

Some wished I'd never existed in the first place. They believed I wasn't fit to live with other normal data. I upset the natural balance.

"How simple and peaceful and wonderful the results would be," they argued, "If we could just delete this errant value."

Others argued it was ethically wrong to expunge me. They believed that, with the right transformation, I could be successfully reformed.

To curb my extremist tendencies, some statistical shrinks recommended that I undergo the rigors of a square root or logarithmic transformation. A few even advised shipping me off to the Box-Cox Boarding School for the Delinquent Datum.

"It can work wonders on reforming outliers like your son," the headmaster told my parents.

Yet others felt the reformist approach was just a charade. A sneaky scaling maneuver with smoke and mirrors--one that really didn't change the true nature of my underlying character. They argued against treating me as an aberration.

"There's nothing really wrong with him," they said. "He doesn't need changing. He's just crying out for attention."

These people recognized a simple, basic truth about me. 

All I'd ever really wanted, was to be understood.
