
Practical Statistical Problem Solving Using Minitab to Explore the Problem


By Matthew Barsalou, guest blogger

A problem must be understood before it can be properly addressed. A thorough understanding of the problem is critical when performing a root cause analysis (RCA), and an RCA is necessary if an organization wants to implement corrective actions that truly address the root cause of the problem. An RCA may also be necessary for process improvement projects; it is necessary to understand the cause of the current level of performance before attempts are made to improve that performance.

Many statistical tests related to problem-solving can be performed using Minitab Statistical Software. However, the actual test you select should be based upon the type of data you have and what needs to be understood. The figure below shows various statistical options structured in a cause-and-effect diagram with the main branches based on characteristics that describe what the tests and methods are used for.

The main branch labeled “differences” is split into two high-level sub-branches: hypothesis tests that have an assumption of normality, and non-parametric tests of medians. The hypothesis tests assume data is normally distributed and can be used to compare means, variances, or proportions to either a given value or to the value of a second sample. An ANOVA can be performed to compare the means of two or more samples.
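If you want to try one of these comparisons outside of Minitab, here is a minimal sketch (hypothetical measurements, using SciPy) of a one-way ANOVA comparing the means of three samples:

# One-way ANOVA on made-up data from three production lines (illustrative only)
from scipy import stats

line_1 = [10.2, 10.5, 9.9, 10.4, 10.1]
line_2 = [10.8, 11.0, 10.6, 10.9, 10.7]
line_3 = [10.3, 10.1, 10.4, 10.0, 10.2]

f_stat, p_value = stats.f_oneway(line_1, line_2, line_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # a small p-value suggests the means differ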

The non-parametric tests listed in the cause-and-effect diagram are used to compare medians, either to a specified value, or two or more medians, depending upon which test is selected. The non-parametric tests provide an option when data is too skewed to use other options, such as a Z-test.

Time may also be of interest when exploring a problem. If your data are recorded in order of occurrence, a time series plot can be created to show each value at the time it was produced; this may give insights into potential changes in a process.

A trend analysis looks much like the time series plot; however, Minitab also tests for potential trends in the data such as increasing or decreasing values over time. Exponential smoothing options are available to assign exponentially decreasing weights to the values over time when attempting to predict future outcomes.
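As a rough illustration of that idea (hypothetical readings and smoothing constant, not Minitab's algorithm), single exponential smoothing blends each new observation with the previous smoothed value, so older values receive exponentially smaller weights:

# Single exponential smoothing: s[t] = alpha*y[t] + (1 - alpha)*s[t-1]
def exponential_smoothing(values, alpha=0.3):
    smoothed = [values[0]]                      # start at the first observation
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

series = [12.1, 11.8, 12.4, 12.9, 12.2, 13.1]   # hypothetical process readings
print(exponential_smoothing(series))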

Relationships can be explored using various types of regression analysis to identify potential correlations in the data such as the relationship between the hardness of steel and the quenching time of the steel. This can be helpful when attempting to identify the factors that influence a process. Another option for understanding relationships is Design of Experiments (DoE), where experiments are planned specifically to economically explore the effects and interactions between multiple factors and a response variable.

Another main branch is for capability and stability assessments. There are two main sub-branches here; one is for measures of process capability and performance and the other is for Statistical Process Control (SPC), which can assess the stability of a process.

The measures of process performance and capability can be useful for establishing the baseline performance of a process; this can be helpful in determining whether process improvement activities have actually improved the process. The SPC sub-branch is split into three lower-level sub-branches: control charts for attribute data such as the number of defective units, control charts for continuous data such as diameters, and time-weighted charts that don’t give all values equal weights.

Control charts can be used both for assessing the current performance of a process, such as by using an individuals chart to determine if the process is in a state of statistical control, and for monitoring the performance of a process, such as after improvements have been implemented.

Exploratory data analysis (EDA) can be useful for gaining insights into the problem using graphical methods. The individual value plot is useful for simply observing the position of each value relative to the other values in a data set, and a box plot can be helpful when comparing the means, medians, and spread of data from multiple processes. The purpose of EDA is not to form conclusions, but to gain insights that can be helpful in forming tentative hypotheses or in deciding which type of statistical test to perform.

The tests and methods presented here do not cover all available statistical tests and methods in Minitab; however, they do provide a large selection of basic options to choose from.

These tools and methods are helpful when exploring a problem, but their use should not be limited to problem exploration. They can also be helpful for planning and verifying improvements. For example, an individual value plot may indicate one process performs better than a comparable process, and this can then be confirmed using a two-sample t-test. Or, the settings of the better process can be used to plan a DoE to identify the optimal settings for the two processes, and the improvements can be monitored using an Xbar-S chart for the two processes.

 

About the Guest Blogger

Matthew Barsalou is a statistical problem resolution Master Black Belt at BorgWarner Turbo Systems Engineering GmbH. He is a Smarter Solutions certified Lean Six Sigma Master Black Belt, ASQ-certified Six Sigma Black Belt, quality engineer, and quality technician, and a TÜV-certified quality manager, quality management representative, and auditor. He has a bachelor of science in industrial sciences, a master of liberal studies with emphasis in international business, and a master of science in business administration and engineering from the Wilhelm Büchner Hochschule in Darmstadt, Germany. He is the author of the books Root Cause Analysis: A Step-By-Step Guide to Using the Right Tool at the Right Time, Statistics for Six Sigma Black Belts, and The ASQ Pocket Guide to Statistics for Six Sigma Black Belts.


Capability Quandaries Exposed


Don't be a grumpy cat when something on your capability report doesn't smell right. After pressing that OK button to run your analysis, allow your inner cat to understand how and why certain statistics are being used. To help you along, here are some capability issues that customers have brought up recently.

CP is missing

You’ve generated a capability analysis report with the Johnson transformation and don’t see a table for your Potential (Within) Capability metrics, i.e., Cp and Cpk:

You would use a transformation method like Johnson or Box-Cox when your data is non-normal. The transformation will attempt to convert the data into something more approximately normal. The problem lies with how the transformations handle subgrouping. The Johnson transformation unfortunately doesn’t use subgrouping, as it considers all of the data to be part of the same group prior to transformation. Therefore, it's impossible to calculate within-subgroup capability indices, and hence Minitab only displays overall capability metrics. 

Fortunately, when you apply the Box-Cox transformation, Minitab does calculate the within-subgroup capability indices, because the Box-Cox method is able to preserve the grouping information of your measurements into subgroups. This allows Minitab to display metrics for within-subgroup capability and overall capability.
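As a quick sketch of the underlying idea (not Minitab's implementation), the snippet below applies a Box-Cox transformation to some hypothetical right-skewed data and checks how much the skewness drops:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0, sigma=0.6, size=100)   # hypothetical right-skewed measurements

transformed, lam = stats.boxcox(skewed)               # estimate lambda and transform
print(f"estimated lambda: {lam:.2f}")
print(f"skewness before: {stats.skew(skewed):.2f}, after: {stats.skew(transformed):.2f}")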

Asterisks for Metrics

Your Process Capability Report is displaying asterisks for some of your capability metrics, as shown below:

The answer to this riddle lies in the graph itself. The user entered a lower boundary instead of a lower spec. Thus, the red vertical line on the left is labeled LB instead of LSL, which is an abbreviation for lower specification limit. Under the capability analysis dialog menu, check the ‘Boundary’ box only if it’s impossible for a defect to fall beyond that spec. Otherwise, leave the box unchecked.

CP Is Present When the Subgroup Size Is 1

If you do not have subgrouping in your capability analysis, you can enter a value of 1 in the box entitled Subgroup size:

If you were expecting to just see overall statistics in your graph, you will be in for a surprise. Fortunately, Minitab will still display CP metrics on the report for measuring the Potential (Within) Capability. We end up estimating the within standard deviation by using the average moving range method. In the absence of subgrouping, this still allows us to track process variation. An example of calculating moving ranges is shown below.
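Here is a minimal sketch of that calculation (made-up measurements, not Minitab's code); the d2 constant of 1.128 is the standard one for moving ranges of two consecutive values:

data = [10.2, 10.5, 9.8, 10.1, 10.7, 10.3]            # hypothetical individual measurements

moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)       # average moving range
within_sd = mr_bar / 1.128                             # d2 constant for ranges of size 2

print([round(mr, 1) for mr in moving_ranges])          # [0.3, 0.7, 0.3, 0.6, 0.4]
print(round(within_sd, 3))                             # about 0.408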

Ppk > Cpk

When Cpk > Ppk, it means that if you are able to eliminate assignable causes that occur in between subgroups in the long-term or overall process, then your process has the potential to improve and become more capable. What about when Ppk > Cpk? Although it is unusual for Ppk to be larger than Cpk, it can happen on occasion.

We see this more often when using a subgroup size of 1 as opposed to larger subgroups. As mentioned above, when the subgroup size is 1, we estimate the within-subgroup standard deviation using the average moving range method. If there is a lot of variation between consecutive observations (for example, when the data alternate between a low and a high value), it is possible that the calculations will yield a within standard deviation that is larger than the overall standard deviation. As a result, the Cpk will be smaller than the Ppk.
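A small sketch makes this concrete (the spec limits and alternating measurements below are made up): the moving ranges are large even though the overall spread is modest, so the within standard deviation exceeds the overall one and Cpk comes out below Ppk:

import statistics

data = [1, 9, 1, 9, 1, 9, 1, 9]      # hypothetical measurements alternating low/high
lsl, usl = -15, 25                   # hypothetical specification limits

mean = statistics.mean(data)
overall_sd = statistics.stdev(data)
mr_bar = statistics.mean(abs(b - a) for a, b in zip(data, data[1:]))
within_sd = mr_bar / 1.128           # average moving range method

cpk = min(usl - mean, mean - lsl) / (3 * within_sd)
ppk = min(usl - mean, mean - lsl) / (3 * overall_sd)
print(f"within sd = {within_sd:.2f}, overall sd = {overall_sd:.2f}")
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")             # Ppk > Cpk here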

Missing the Target

When you generate a capability analysis, you may see a metric called CPM under Overall Capability. By default, this will show up as an asterisk in your report. This is because CPM requires the input of a target, which needs to be entered under the Options sub-menu of Capability Normal:

You can then assess if your process is on target in addition to being within specification and see an actual value for CPM on your Capability Report. If, by chance, your target ends up being the same as your data’s sample mean, you’ll find that your CPM equals the PPK.

If you don’t have a target at all, that’s okay—it's not required that you have one to run the analysis.

Hopefully, your inner cat feels better about performing a capability analysis in Minitab after understanding the logic behind these potential scenarios!

 

 

Big Ten 4th Down Calculator: Week 7


Going into Saturday, Nebraska was 0-5 in games decided by 7 points or less and Michigan State was 4-0 in games decided by 7 points or less. You'll often hear sports analysts use this as proof that one team chokes under the pressure and the other team knows how to win in the clutch. But in reality, the result in close games can have just as much to do with luck as it can skill.

Consider the formula that the Big Ten 4th down calculator uses for win probability. The general idea is that a team's final margin of victory can be approximated as a normal random variable with a mean equal to the Vegas line and a standard deviation equal to that of the difference between the final margin and the Vegas line (which in college football is 15.53). For example, if you're favored by two touchdowns you have approximately an 82% chance of winning at the beginning of the game.
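As a quick sketch of that calculation (using SciPy; the 14-point line is just an example), the win probability is the chance that a normal variable with mean 14 and standard deviation 15.53 lands above zero:

from scipy.stats import norm

vegas_line = 14                                  # favored by two touchdowns
p_win = 1 - norm.cdf(0, loc=vegas_line, scale=15.53)
print(f"{p_win:.0%}")                            # roughly 82%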

[Probability distribution plot of the final margin of victory for a two-touchdown favorite]

We can clearly see that the better team will win the game most of the time. But as the game is being played, both the mean and the standard deviation decrease to account for the diminishing time remaining in the game. If the game is tied going into the 4th quarter and the current situation on the field gives both teams an expected value of 0, the probability of the favored team winning drops to 67%. 

The team with more skill is still likely to win, but we see it's not as sure of a thing. And their probability of winning keeps decreasing as the time decreases. If the game is still tied with 5 minutes left, we'd expect the team with more "skill" to win only 60% of the time! Remember, this team was a two-touchdown favorite! And if the game comes down to the final play, the most random of events can decide the outcome of the game. 

In Michigan State's win over Oregon, Oregon quarterback Vernon Adams overthrew a wide open receiver that would have given Oregon the lead with less than 90 seconds left in the game. And when the Spartans beat Michigan, well...you know what happened on the final play of that game. As for Nebraska, they lost to BYU on a 42-yard Hail Mary pass on the last play of the game. In fact, Nebraska's first 4 losses of the season came on their opponents' final offensive play of the game. It's not that Michigan State had some special ability to win close games and Nebraska didn't. Michigan State was just having the lucky breaks go their way at the end of games and Nebraska wasn't.

But the thing about lucky breaks is that they can't go for you or against you forever.

And sure enough, at the end of the Nebraska/Michigan State game, Nebraska threw a touchdown pass with 17 seconds left to take the lead. Except it clearly looked like the receiver ran out of bounds on his own before catching the ball, which meant there should have been a penalty on the play and the touchdown shouldn't have counted. However, the referees strangely let the play stand, and Nebraska went on to win. The call by the ref could have gone either way, and it had nothing to do with either team's ability to win or lose close games. So the next time you hear a narrative about a team based on the results of close games, don't believe it. Because chances are, factors outside their control influenced the outcome of their close games just as much as any ability they had.

Now let's move on to the games! If you're new to this and want to know more about what exactly the Big Ten 4th down calculator does, you can read the intro from a previous week's post.

Illinois 48 - Purdue 14

Purdue is leading the country in 4th down attempts, and is also one of the best teams in the Big Ten in 4th down decision making. Unfortunately for the Boilermakers, that didn't help them at all against Illinois. 

4th Down Decisions in the First 3 Quarters

Team        4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Illinois        7             2           2          3                  2                    0.53
Purdue          7             0           7          0                  0                    0


Purdue continues its great 4th down decision making, but this week it wasn't for reasons that Purdue would like. Every 4th down Purdue had in the first 3 quarters ended in a punt, and the distances were so far that the calculator agreed with all of them. Purdue did go for it a couple of times on 4th down in the 4th quarter, but by then the game was already out of hand, so it didn't really have an effect on the outcome.

Illinois had two 4th downs in the same drive that the calculator disagreed with. First, they had a 4th and 6 at the Purdue 32 yard line. They decided to go for it when the calculator suggested kicking a 49 yard field goal. Now, sometimes this long of a field goal is out of a kicker's range, and in that case the calculator fully supports going for it over punting. But Illinois kicker Taylor Zalewski is 3 for 7 in his college career from 50 or more yards, so he definitely has the leg. It's hard to blame a coach for being aggressive since they are typically so conservative, but Illinois should have kicked the field goal here. Of course, that makes their decision later in the drive pretty ironic. With a 4th and 3 at the Purdue 16 yard line, they decided to kick a field goal when they should have gone for it. Sure, it was an easier field goal to make, but 4th and 3 is also a lot easier to make than 4th and 6! So the coach went for it when he should have kicked, and then kicked when he should have gone for it. Go figure.  

Illinois had this game wrapped up by the 4th quarter, so let's move on to the next game.

Michigan 49 - Rutgers 16

Rutgers has now allowed 48 or more points in their last 4 games. Maybe they should join Indiana in the #NeverPunt support group. After all, if your defense is that bad, it doesn't really matter where the other teams starts their drives from, does it?

4th Down Decisions in the First 3 Quarters

Team        4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Rutgers         7             1           4          2                  1                    0.75
Michigan        3             2           1          2                  0                    0.73

Okay, to Rutgers' credit, they never really had a chance to be aggressive in this game. Their first three punts were on 4th down distances of 10, 12, and 19 yards. Even with a terrible defense, you should probably be punting in those situations. The only disagreement Rutgers had was actually being aggressive. Instead of kicking a field goal on 4th and 7 at the Michigan 18, they decided to go for it. You'll see that the difference in expected points is 0.75, so out of context it was a really bad decision. But they were down 46-16 at the time, so they went for it. Although with that big of a deficit, it didn't really matter either way.

Michigan's first disagreement was punting on 4th and 3 from their own 22 yard line. The calculator is always going to suggest going for it on 4th and 3, even deep in your own territory. But the difference in expected points is only 0.07, so teams should consider other factors to make their decision. In this case, Michigan was a heavy favorite, they were already winning by 22 points, and there were only 9 seconds left in the half. Punting here was absolutely the correct decision, so the calculator won't count it against them in the team summary.

The other disagreement was kicking a field goal on 4th and 3 from the Rutgers 16 when the calculator says to go for it. This cost Michigan 0.66 points, but at the time they were already up 27 points. So it didn't really matter to the outcome of the game.

Ohio State 28 - Minnesota 14

This game started with 6 straight possessions that ended in punts. That's B1G.

4th Down Decisions in the First 3 Quarters

Team         4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Minnesota        8             1           7          1                  0                    0.41
Ohio State       5             0           5          0                  0                    0

Lots of punts that the calculator agreed with in this game. The lone exception was a Minnesota punt on 4th and 2. Sure, it was on their own 23 yard line, but teams convert on 4th and 2 about 60% of the time. In addition, Minnesota was a 23-point underdog in this game. You're going to have to play aggressively to pull the upset. Punting here was the incorrect decision in general, but it was even worse for such a heavy underdog.

All these correct punts don't leave us with much to talk about here, so let's move on to the 4th quarter.

4th Down Decisions in the 4th Quarter

Team                     Time Left   4th Down Distance   Yards to End Zone   Calculator   Coach       Win Prob (Go For It)   Win Prob (Kick)
Ohio State (up by 21)      14:42             4                  65           Punt         Punt              99.93%            99.96% (Punt)
Minnesota (down by 21)     11:29             4                  12           Go for it    Go for it          0.08%             0.03% (FG)
Ohio State (up by 14)       7:46             4                  18           FG           FG                99.92%            99.98% (FG)
Minnesota (down by 14)      6:25             9                  58           Go for it    Punt               0.09%             0.04% (Punt)

Ohio State was up 21-0 entering the 4th quarter, but Minnesota was able to rally and actually make it a 7-point game. However, you'll see by the win probabilities that Ohio State had the game pretty well under control. The 4th down decisions went pretty well until Minnesota punted on a 4th and 9 with 6:25 left. Obviously 4th and 9 is a hard situation to convert. But they were down by 2 touchdowns with only 6 and a half minutes left. Even if you punt and get the ball back, it's going to be very hard to score two more times. Minnesota cut their chances of winning in half by punting. 

But the Golden Gophers did get the ball back and they even scored a touchdown to make it a 7-point game. But there were only 2 minutes left in the game, meaning they had to attempt an onside kick. Ohio State recovered the kick and ended up scoring a touchdown, effectively ending the game. But let's back up one minute to Minnesota's touchdown. Minnesota scored with two minutes left to cut the deficit to 8 points. They then kicked the extra point to make it a 7-point game, but they actually should have gone for 2.

And the decision isn't even a close one.

College teams convert a two-point conversion about 41% of the time, and they make the extra point 94% of the time. So in general, you're better off kicking the extra point unless the situation dictates otherwise. But being down by 14 at the end of the game is a situation that absolutely dictates going for 2. The reason is that going for 2 leaves you the option of winning the game in regulation, whereas kicking the extra point requires you to win in overtime. And the key is that you get two conversion attempts. So if you miss on the first one, you can still tie the game up the second time around. Here's the math behind it.

Let's start with the strategy of kicking two extra points. The following table shows the different possibilities and the win probabilities for each one. 

1st Touchdown     2nd Touchdown      Win In Overtime            Win Probability
Made EP (94%)     Made EP (94%)      Road Team (44%)                 38.9%
                  Missed EP (6%)     Lost in Regulation (0%)          0%
Missed EP (6%)    Made 2PT (41%)     Road Team (44%)                  1.1%
                  Missed 2PT (59%)   Lost in Regulation (0%)          0%

The probability for winning in overtime comes from a previous blog post which found that home teams win in overtime about 56% of the time. So when you add it all up, the total probability of winning by kicking extra points is approximately 40%. Now what if they went for 2 after the first touchdown? Keep in mind, if you make the first two-point conversion, you just kick the extra point the second time around. But if you miss it, you go for two on the second touchdown to try to tie the game. 

1st Touchdown      2nd Touchdown      Win In Overtime             Win Probability
Made 2PT (41%)     Made EP (94%)      Won in Regulation (100%)         38.5%
                   Missed EP (6%)     Road Team (44%)                   1.1%
Missed 2PT (59%)   Made 2PT (41%)     Road Team (44%)                  10.6%
                   Missed 2PT (59%)   Lost in Regulation (0%)           0%

By going for 2 after the first touchdown, Minnesota would have increased their chances of winning to about 50%! And these numbers are all pretty straightforward. It's crazy to think that with all the comebacks you see in college football every week, nobody has ever tried this strategy. Not even once! There are millions of dollars on the line for these college football teams, and yet they are unable (or unwilling) to do simple math. It boggles my mind.
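If you want to check the arithmetic, here is a short sketch that reproduces both tables from the conversion rates quoted above:

p_ep, p_2pt, p_ot = 0.94, 0.41, 0.44   # extra point, two-point conversion, road-team OT win

# Strategy 1: kick the extra point after both touchdowns.
kick_both = p_ep * p_ep * p_ot + (1 - p_ep) * p_2pt * p_ot

# Strategy 2: go for two after the first touchdown; only if it fails, go for two again.
go_early = (p_2pt * p_ep                     # made 2PT, then extra point: win in regulation
            + p_2pt * (1 - p_ep) * p_ot      # made 2PT, missed EP: overtime
            + (1 - p_2pt) * p_2pt * p_ot)    # missed 2PT, made 2PT: overtime

print(f"kick both: {kick_both:.1%}, go for two early: {go_early:.1%}")   # ~40% vs ~50%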

Iowa 35 - Indiana 27

Iowa continues their undefeated season. I can't help but think the football gods are rewarding Kirk Ferentz for his decision earlier this season to go for it on 4th and 1 at his own 25 yard line against Wisconsin. Fortune favors the bold.

4th Down Decisions in the First 3 Quarters

Team       4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Iowa           5             1           4          0                  1                    0.79
Indiana        5             0           4          1                  0                    0

So "technically" the 4th down calculator didn't disagree with any of Indiana's decisions. But considering how good their offense is and how bad their defense is, they should have been much more aggressive than the calculator suggests. Already down 7-0, they had a 4th and 4 at the Iowa 15. The calculator suggests kicking a field goal (which Indiana did), but the difference in expected points is 0.11. And keep in mind this is an Indiana defense that gave up 47 points to Southern Illinois (a team that is currently 3-6 and sitting next to last in the Missouri Valley Conference). Do you really think you're going to beat an undefeated team with field goals?

Things got worse on their next drive. They had a 4th and 5 at the Iowa 42 yard line. Again, the calculator suggests punting but the difference is only 0.09 points! If I'm Indiana, I'm absolutely going for it here. But instead, they punted. And sure enough, Iowa proceeded to drive 95 yards and scored a touchdown.

If Kirk Ferentz pleased the football gods by going for it on 4th and 1 against Wisconsin, he most certainly angered them in this game. On a 4th and 1 from their own 39, Iowa punted the ball. This is bad enough in general, but keep in mind you're going up against Indiana's defense! Iowa's two running backs averaged 5.7 yards per carry on the day. I think they could have managed getting a single yard here. But instead, they punted. And Indiana had an 80 yard drive on the next possession, which brought the game into the 4th quarter.

4th Down Decisions in the 4th Quarter

Team                   Time Left   4th Down Distance   Yards to End Zone   Calculator   Coach   Win Prob (Go For It)   Win Prob (Kick)
Indiana (down by 4)      13:22             8                  10           Go for it    FG             29.9%             29.8% (FG)
Indiana (down by 8)       9:13            17                  82           Punt         Punt            2.5%              2.9% (Punt)
Iowa (up by 15)           4:28             2                  63           Punt         Punt           99.8%             99.98% (Punt)

The 4th quarter gave us more evidence that Indiana shouldn't kick. Down by only 4, they had a 4th and 8 at the Iowa 10. Their chance of converting was bleak, with teams only converting 24% of the time (I considered it 4th and goal, which gave a lower probability than a normal 4th and 8, which I thought was a more accurate representation of Indiana's true probability of converting). But even so, the calculator ever so slightly favors going for it because you're still losing even with a field goal. But it's so close you should consider other factors. If I have a strong defense, I have no problem with the decision to kick here. But this is Indiana. They should have absolutely gone for it. Or if you are going to kick a field goal, how about following that up with a surprise onside kick! But no, Indiana doesn't like to try to win. After the field goal they kicked deep, and Iowa drove 75 yards for a touchdown. Then Indiana found themselves in a 4th down distance so long they had to punt, only to see Iowa score yet another touchdown on the vaunted Indiana defense. 

After an Indiana interception and a correct Iowa punt, Indiana drove 75 yards and scored a touchdown. They went for it on 4th down twice, but down 15 points with under 4 minutes left, you obviously have to go for it so I didn't include them in the table. I will note that Indiana scored their touchdown on a 4th and 9 from the Iowa 11. Not very different than the 4th and 8 they had earlier in the quarter when they decided to kick a field goal. Oh, Indiana...maybe some day.

Wisconsin 31 - Maryland 24

Maryland put up quite a fight here. But could a more aggressive strategy on 4th downs have turned this close loss into a victory?

4th Down Decisions in the First 3 Quarters

Team        4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Wisconsin       6             1           4          1                  1                    0.79
Maryland        7             1           5          2                  0                    0.14

It may not have turned the close loss into a victory, but Maryland missed an opportunity on their first drive to be aggressive. They had a 4th and 3 at the Wisconsin 46 and decided to punt. The difference in expected points is only 0.14, so on the surface the decision to kick isn't terrible. But when you consider that Maryland was a two-touchdown underdog in this game, they should have absolutely been aggressive and gone for it.

With the game tied at 7, Wisconsin had a 4th and 1 at their own 22 yard line. The 4th down calculator was sad to see Wisconsin trot out the punt team. But wait, what's this? A fake punt that goes for 57 yards! That's the spirit, Bucky Badger! And on the very next play Wisconsin scored a touchdown. That just goes to show that it pays to be aggressive on 4th and 1!

But things weren't all peaches and cream for Wisconsin on 4th down. Later in the game they found themselves in the exact same position, 4th and 1 at their own 22. This time they actually did punt, to disastrous results. The punt went only 11 yards, and 3 plays later Maryland was in the end zone.

4th Down Decisions in the 4th Quarter

Team                    Time Left   4th Down Distance   Yards to End Zone   Calculator   Coach       Win Prob (Go For It)   Win Prob (Kick)
Maryland (down by 14)     12:74            11                  66           Go for it    Punt               0.81%             0.75% (Punt)
Maryland (down by 14)      9:10            10                  83           Go for it    Punt               0.24%             0.16% (Punt)
Maryland (down by 14)      5:32             2                  72           Go for it    Go for it          0.11%             0.03% (Punt)
Wisconsin (up by 7)        0:54             1                  35           Go for it    Go for it          98.4%             96.3% (Punt)

Wisconsin had some 4th downs in the 4th quarter. But they were all obvious decisions until the last one, so I left them out of the table. But let's focus on Maryland first. It's always interesting to see how long a team waits to get aggressive in the 4th quarter when they're trailing. One thing the calculator has shown this year is that they should start being aggressive much earlier, especially when down multiple scores. Here it suggests that Maryland should have gone for it on 4th and 11 and 4th and 10 in their own territory. By the time they did decide to go for it, their win probability had fallen from 0.81% to 0.11%. Things were bleak from the start, but that's still a decrease of 86%. If they were aggressive earlier, Maryland could have given themselves a reasonable chance to win instead of needing a miracle at the end.

Although they almost got the miracle they needed.

After scoring a touchdown to make it a 7 point game with 2:39 left (and yes, they should have gone for two after that touchdown instead of kicking the extra point), Maryland recovered an onside kick. Well, almost. They were called for offsides and had to re-kick, and Wisconsin was able to recover the second time around. This then set up a 4th and 1 decision for Wisconsin. A first down ended the game, but a failed conversion gave Maryland the ball at their own 35 with just under a minute left. So could making Maryland gain about 15-25 extra yards be worth the punt?

Our model for expected points isn't applicable to situations at the very end of games, so we can't apply the usual formula for win probability. To fix this, I went through play-by-play data for every college football game from 2006-2012. I found 87 games where a team had the ball in their own territory needing a touchdown with between 30 and 60 seconds left. I separated the data into two groups, teams starting inside their own 20, and teams starting between the 20 and midfield. I wanted to see if the teams that had fewer yards to gain won more often.

It turns out, they really don't.

[Cross tabulation of wins by starting field position]

Teams that started outside their own 20 won slightly more often than teams inside their own 20, but the numbers are pretty close. So it doesn't look like Wisconsin would really gain anything by punting. Plus everybody remembers what happened in the end of that Michigan/Michigan State game, right? So going for it was absolutely the correct call by Wisconsin.

Northwestern 23 - Penn State 21

Was Northwestern able to fix their poor 4th down decision making in their win over Penn State?

4th Down Decisions in the First 3 Quarters

Team           4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Penn State        10              1          10          0                  0                    0.41
Northwestern       8              1           6          2                  0                    0.79

Only one disagreement, for less than a point! This is actually the best week Northwestern has had when it comes to their 4th down decision making! Although I shouldn't give them too much praise, as their one disagreement was really bad. They punted on 4th and 1 from about midfield. That day, Northwestern running back Justin Jackson averaged 6.6 yards per carry. But somehow Pat Fitzgerald thought trying to gain one yard here was just too risky. Maybe next week you won't get a 4th down decision wrong, Northwestern. Maybe next week.

Penn State had 10 punts in the first 3 quarters, which sums up pretty well what kind of a day it was for the Nittany Lions. In fact, their best pass of the game came from a wide receiver! Their incorrect 4th down decision was punting on a 4th and 2 from their own 17 yard line. However, when you consider the teams playing, punting this deep in their own territory wasn't too bad a decision. Northwestern has a very strong defense and the Penn State offense...well, did I mention their best pass was from a wide receiver? So the difference in expected points is actually probably less than 0.41. Although punting didn't work well for them, as Northwestern scored a touchdown on the very next drive.

Later in the game, Penn State had a 4th and 8 from the Northwestern 36. If your kicker has the leg, the calculator likes a field goal here. But Penn State was actually playing their 2nd-string kicker, so it's probably safe to assume a 54 yard field goal is out of his range. In that case, the calculator says to punt, although the difference in expected points is only 0.1. So either decision is probably fine. Penn State decided to punt, and for the second time in the game punting backfired, as Northwestern drove 91 yards for their second touchdown of the game.

4th Down Decisions in the 4th Quarter

Team                       Time Left   4th Down Distance   Yards to End Zone   Calculator   Coach   Win Prob (Go For It)   Win Prob (Kick)
Northwestern (down by 1)     11:25            11                  76           Punt         Punt           30.9%             35.9% (Punt)
Northwestern (down by 1)      3:19            17                  38           Punt         Punt           36.4%             46.9% (Punt)
Penn State (up by 1)          2:13             2                  86           Go for it    Punt           76.8%             75.8% (Punt)


Earlier, we saw that Maryland should have gone for it on 4th and 11 early in the 4th quarter when trailing by 2 scores. But when the deficit is only 1 point, there isn't a need to be as aggressive, as we see with Northwestern's first punt of the fourth quarter. And even with a 4th and 17 with 3 minutes left in the game, they still had a reasonable chance of winning by punting the ball back to Penn State. Then, with just over two minutes left, the Nittany Lions found themselves in a 4th and 2.

And yes, the calculator says they should have gone for it.

To calculate Penn State's win probability, I again went back through the play-by-play data and looked for teams down 1 or 2 points with 1 to 2 minutes left in the game. I divided the games into teams that had 1st and 10 inside the red zone (to see how often Penn State would lose if they failed on the 4th and 2) and teams that had the ball between their 40 and their opponent's 40 (to get a win probability for punting). Here are the results.

[Cross tabulation of wins by starting field position]

Most people think that going for it on 4th and 2 deep in your own territory is stupid because if you fail, you just lost the game. Well, most people would be wrong. Teams that start in the red zone down 1-2 points (like Northwestern would have) only win half of the time! I'm guessing this is because teams score with enough time left for their opponents to respond. And Penn State had all 3 timeouts left, so if they failed on 4th down they "should have" been able to manage the clock to have plenty of time to respond to a Northwestern score. Of course, I say "should have" because Penn State ended up managing the clock terribly in the actual game.

But enough talk about what would have happened if they failed...what if they converted on 4th down? That's where the benefit of going for it really lies. With a first down, Penn State would have all but ended the game (I gave them a win probability of 95% since Northwestern had a timeout left, so Penn State would have had to punt with around 30 seconds left).

But before we make a final conclusion that Penn State absolutely should have gone for it, we should consider other factors. Northwestern has one of the best defenses in the Big Ten, and at this point I think jokes about Penn State's best throw coming from a wide receiver have stopped being funny. Teams convert on 4th and 2 about 60% of the time. But it doesn't take a probability much lower than that for the calculator to suggest a punt. In fact, if Penn State thought their chances of converting were 57% or lower (and they probably were), they should have punted.

Honestly, these numbers are so close that you shouldn't be upset with either decision. But of course people would have been irate if they went for it and failed. So that is why Penn State did what every other team in the country would do and punted. And as the final score indicates, the result didn't end well for them.

Nebraska 39 - Michigan State 38

This game had only 5 total punts in it. Watch out, Nebraska and Michigan State, we might have to send you to the Big 12!

4th Down Decisions in the First 3 Quarters

Team             4th Downs   Disagreements   Punts   Field Goals   Conversion Attempts   Expected Points Lost
Michigan State       4             1            2          1                  1                    0.07
Nebraska             5             0            2          2                  1                    0

The only disagreement with the calculator was Michigan State punting on 4th and 3. But the difference in expected points is so small that you can't be upset with the decision to kick. So instead let's compliment these teams for their great 4th down calls. Nebraska had a 4th and 1 that they correctly went for. They were successful, and the very next play was a 38 yard touchdown pass. Michigan State also had a 4th and 1 that they went for on their opening drive. They failed to convert, but it wouldn't be the last we'd see them be aggressive on 4th and 1. But for that, we'll have to head to the 4th quarter.

4th Down Decisions in the 4th Quarter

Team                    Time Left   4th Down Distance   Yards to End Zone   Calculator   Coach       Win Prob (Go For It)   Win Prob (Kick)
Michigan St (up by 5)      9:01             1                  44           Go for it    Go for it          85.8%               84% (Punt)
Michigan St (up by 5)      0:55             8                  42           Go for it    Punt               96.6%             96.3% (Punt)

Sometimes if coaches fail on a 4th down early in the game, they'll be hesitant to try again. But not the Spartans! Despite previously  failing on a 4th and 1, they correctly went for it on their first drive of the 4th quarter. And this time they succeeded and went on to score a touchdown on the drive. When it comes to 4th and 1, if at first you don't succeed, try, try again!

Now things get crazy for Michigan State's next 4th down decision. On a 4th and 8 with less than a minute left, the Spartans took a delay of game penalty and punted. But this was the exact same scenario we saw in the Wisconsin-Maryland game, and there we concluded that punting really didn't do a whole lot to increase your win probability. And the "Outside 20" group had an average starting position of their own 38 yard line. That's really not very different than the 42 yard line that Nebraska would start from if the Spartans failed to convert. So even though teams convert on 4th and 8 only 32% of the time, punting does so little for you that the calculator suggests going for it.

Having said that, these probabilities are very close together, so it's not like the punt lost Michigan State the game. But it did willingly give Nebraska a chance. And as we saw Saturday night, if a team is given a chance, even the slimmest of ones, anything can happen.

Summary

Each week, I’ll summarize the times coaches disagreed with the 4th down calculator and the difference in expected points between the coach’s decision and the calculator’s decision. I’ll do this only for the 1st 3 quarters since I’m tracking expected points and not win probability. I also want to track decisions made on 4th and 1, and decisions made between midfield and the opponent’s 25 yard line. I call this area the “Gray Zone”. Then we can easily compare the actual outcomes of different decisions in similar situations.

Team Summary

Team            Number of Disagreements   Total Expected Points Lost
Northwestern               9                         6.57
Rutgers                    7                         3.96
Minnesota                  9                         3.81
Illinois                   9                         3.74
Michigan                   7                         3.64
Penn State                 8                         3.61
Indiana                    5                         3.3
Iowa                       4                         2.59
Nebraska                   6                         2.57
Michigan St                6                         2.25
Wisconsin                  5                         2.16
Ohio State                 3                         0.92
Purdue                     1                         0.24
Maryland                   2                         0.18


The team I want to single out this week is Ohio State. So far this year they've given up less than a point because of their 4th down decision making. And they are the only team to have gone for it multiple times on 4th and 1 in their own territory (in the first 3 quarters). This team is going to be hard enough to beat normally. But if they're making optimal 4th down decisions, it's going to be next to impossible. 

4th and 1

Yards To End Zone   Punts   Avg Next Score After Punt   Go for It   Avg Next Score After Go for It   Field Goals   Avg Next Score After FG
75-90                 3              2.33                   1                    7                        *                   *
50-74                17             -0.35                   3                    4.67                     *                   *
25-49                 0              0                      8                    3.13                     1                  -7
1-24                  *              *                     10                    2.1                      3                   3

Thanks to Wisconsin's fake punt, we finally have a team that went for it on 4th and 1 inside their own 25 yard line. We now have four 4th and 1 attempts that have taken place in the team's own territory. Three of them have ended in touchdowns (in the other, nobody scored before the end of the half). So when I see that there have been 20 punts on 4th and 1, I simply see 20 wasted opportunities to maximize points.

The Gray Zone (4th downs 25-49 yards to the end zone)

4th Down Distance   Punts   Avg Next Score After Punt   Go for It   Avg Next Score After Go for It   Field Goals   Avg Next Score After FG
1                      0              0                     8                    3.1                      1                  -7
2-5                   19              0.16                 12                   -0.58                     3                  -0.33
6-9                   18              0.5                  10                   -0.4                      8                   3.25
10+                   31              0                     1                    7                       16                   0.81

Usually, teams should be going for it on 4th and 5 and shorter in the gray zone. We see that teams mostly make the correct decision on 4th and 1, but are more hesitant at distances of 2 through 5. The punts in the 2-5 category should not outnumber the 4th down attempts!

In the 6-9 yard category, I'm finding it pretty funny that the average next score after a field goal attempt is higher than 3. This is because twice a team has missed the field goal, only to get the ball back and score a touchdown next. Small sample sizes, everybody!

Terry Bradshaw Might be the Best Super Bowl Quarterback Ever


[Photo: U.S. Navy photo by Chief Photographer's Mate Chris Desmond, public domain, via Wikimedia Commons]

Last time I touched on the subject of the greatest Super Bowl quarterback, I promised a multivariate analysis considering several different statistics. Let’s get right to a factor analysis.

Getting Ready for Factor Analysis

One purpose of factor analysis is to identify underlying factors that you can’t measure directly. These factors explain the variation of many different variables in fewer dimensions. Here are the variables we’re going to consider:

  • Margin of victory
  • Difference between Super Bowl winner’s passer rating and the playoff passer rating allowed by the opposing team—PR Diff (Winner – Allowed)
  • Point spread—Spread
  • Adjusted career rating of the losing quarterback—Adjusted Career PR Loser
  • The difference between the winning and losing quarterback’s ratings—PR Difference (Winner – Loser)
  • Winning quarterback’s rating—Passer Rating Winner
  • Losing quarterback’s rating—Passer Rating Loser
Determining the number of factors

To begin the factor analysis, you usually determine the number of factors to use. The determination is similar to choosing the number of principal components: look for eigenvalues greater than 1, for the number of factors that explain about 80% of the variation, and for factors that explain large amounts of variation relative to the others. A scree plot of the eigenvalues looks like this:

The first two factors have eigenvalues greater than 1. The third eigenvalue is close to 1.

Two factors have eigenvalues greater than 1, and the third factor is close. The 3 factors explain about 80% of the variation in the data, so 3 factors seems like a reasonable number to explore.

Factor rotation

Once we determine the number of factors, we want to see if we can find a rotation that produces underlying factors that make sense. In general, rotation of the factors makes them load on fewer variables so that the factors are simpler. For example, the Minitab output from the varimax rotation shows the unrotated and rotated factor loadings:

Unrotated Factor Loadings and Communalities
Variable                        Factor1  Factor2  Factor3  Communality
Passer Rating Loser              -0.644    0.502    0.420        0.843
Passer Rating Winner              0.723    0.563    0.127        0.857
PR Difference (Winner - Loser)    0.953    0.027   -0.212        0.955
Adjusted Career PR Loser         -0.181    0.748   -0.400        0.752
Spread                           -0.536    0.189   -0.687        0.794
PR Diff (Winner - Allowed)        0.620    0.570    0.145        0.731
Margin of victory                 0.700   -0.324   -0.214        0.640
Variance                         3.0410   1.5952   0.9359       5.5721
% Var                             0.434    0.228    0.134        0.796

Rotated Factor Loadings and Communalities
Varimax Rotation
Variable                        Factor1  Factor2  Factor3  Communality
Passer Rating Loser               0.032   -0.912   -0.106        0.843
Passer Rating Winner              0.909    0.171    0.030        0.857
PR Difference (Winner - Loser)    0.598    0.767    0.096        0.955
Adjusted Career PR Loser          0.336   -0.256   -0.757        0.752
Spread                           -0.363   -0.085   -0.810        0.794
PR Diff (Winner - Allowed)        0.850    0.086    0.011        0.731
Margin of victory                 0.178    0.754    0.198        0.640
Variance                         2.1852   2.0973   1.2896       5.5721
% Var                             0.312    0.300    0.184        0.796

 

In this output, the unrotated first factor has 5 variables where the absolute value of the loading is 0.6 or higher. The rotated first factor has 2 variables with loadings of 0.6 or higher, so the rotated factor should be easier to interpret.

We’re lucky, in this case, because the different rotation methods available in Minitab all produce factors that load on the same variables. When different methods agree, you feel more certain about the results.
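For readers who want to try this outside of Minitab, here is a hedged sketch of the same workflow in Python; the data frame df is assumed to hold the seven variables listed earlier, and scikit-learn's varimax rotation stands in for Minitab's:

import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

def factor_summary(df: pd.DataFrame, n_factors: int = 3) -> pd.DataFrame:
    # Scree step: eigenvalues of the correlation matrix, largest first
    eigenvalues = np.linalg.eigvalsh(df.corr())[::-1]
    print("eigenvalues:", np.round(eigenvalues, 2))

    # Fit the factor model on standardized variables with a varimax rotation
    z = StandardScaler().fit_transform(df)
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(z)

    # Rotated loadings, one row per variable
    return pd.DataFrame(fa.components_.T, index=df.columns,
                        columns=[f"Factor{i + 1}" for i in range(n_factors)])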

Interpreting the factors

The first factor, which loads highly on the winning quarterback’s passer rating and the difference between that passer rating and what the opposing team allowed in the playoffs, looks like a measure of how well the winning quarterback played. Higher values of this factor indicate better performance.

The second factor is the most difficult to interpret because of the signs of the different variables with high loadings. You get a higher value of the second component by having a higher margin of victory, by having a higher difference between the ratings of the winning and losing quarterbacks, and by having a lower passer rating by the losing quarterback. I would think that the first two of those quantities are values you would want to be high, but you would also want the third value, the losing quarterback’s passer rating, to be high.

It looks like the variation in the data suggests that a losing team is much more likely to lose by a lot of points if the opposing quarterback plays poorly. In my first post about the best Super Bowl quarterback, I made the judgement that winning a competitive Super Bowl was more impressive than winning a noncompetitive match. Thus, I’m going to tend to think that lower values of the second component, caused by high passer ratings of the opposing quarterback, small differences, and smaller margins of victory are more impressive; but I’ll conduct the final comparisons both ways to see how it affects the conclusion.

The third factor loads on two variables: the point spread and the adjusted career passer rating of the losing quarterback. This factor is about the quality of the victory. The more positive the point spread, the more unexpected the victory was. Also, the better the opposing quarterback, the better the victory was. Because both of these loadings are negative, more negative values of the third factor indicate better performance.

Conclusion?

In addition to the decision of what to do with the second component, there are still some other considerations for how to determine the best Super Bowl quarterback. For example, should we compare the candidate quarterbacks to the average performance or to the best performance? Should we look at the mean performance of the best quarterbacks or the median performance? With so many options available for the remaining analysis, we’ll have to wait for next time to review them all. For now, here are some initial impressions of the three factors.

[Plot of factor scores for all Super Bowl victors, with points identifying Montana, Bradshaw, Aikman, and Brady]

In terms of a quarterback playing well, especially in light of the opposing team, Terry Bradshaw’s first victory over the Dallas Cowboys, in Super Bowl X, takes the prize among our candidate quarterbacks. A factor score of 1.62 is not quite as good as Jim Plunkett’s 1.77, but pretty good for a guy throwing against Hall of Fame cornerback Mel Renfro. Among the candidates, Bradshaw also has the second-place score for his second victory over the Cowboys in Super Bowl XIII.

We’ll explore the best of the second factor in more detail, but the extremes make quarterbacks look good in both directions. Among the candidates, Tom Brady has the minimum score from his victory over the Carolina Panthers in Super Bowl XXXVIII. Brady overcame an incredible effort by Jake Delhomme that resulted in a 113.6 passer rating, the highest rating by a losing quarterback in a Super Bowl. Brady’s effort is also the overall minimum for factor 2.

On the maximum side of factor 2 lies another candidate, Montana’s victory over the Broncos in Super Bowl XXIV. The 45-point victory is the only Super Bowl in our data set where the winning quarterback’s passer rating exceeded the losing quarterback’s by over 100 points.

With respect to the third component, no victory was more unexpected than Brady’s overcoming of the Kurt Warner-led Rams in Super Bowl XXXVI. The 14-point underdog did enough that day to fend off the fourth quarter charge of the Greatest Show on Turf in what was, at the time, the first Super Bowl to be decided by a score on the final play of the game.

So, will it come down to Bradshaw or Brady defying the odds, or Montana’s domination? We’ll evaluate all three factors next time!

Ready to try out your own factor analysis? Check out the overview in Perform a Factor Analysis on the Minitab Support Center.

The photograph of Terry Bradshaw and Lieutenant Commander Heather Pouncey is by Chief Photographer's Mate Chris Desmond, whose work deserves attribution this Veteran's Day even though it's in the public domain.

Control Charts - Not Just for Statistical Process Control (SPC) Anymore!


Control charts are a fantastic tool. These charts plot your process data to identify common cause and special cause variation. By identifying the different causes of variation, you can take action on your process without over-controlling it.

Assessing the stability of a process can help you determine whether there is a problem and identify the source of the problem. Is the mean too high, too low, or unstable? Is variability a problem? If so, is the variability inherent in the process or attributable to specific sources? Control charts answer these questions, which can guide your corrective efforts.

Determining that your process is stable is good information all by itself, but it is also a prerequisite for further analysis, such as capability analysis. Before assessing process capability, you must be sure that your process is stable. An unstable process is unpredictable. If your process is stable, you can predict future performance and improve its capability.

While we associate control charts with business processes, I’ll argue in this post that control charts provide the same great benefits in other areas beyond statistical process control (SPC) and Six Sigma. In fact, you’ll see several examples where control charts find answers that you’d be hard pressed to uncover using different methods.

The Importance of Assessing Whether Other Types of Processes Are In Control

I want you to expand your mental concept of a process to include processes outside the business environment. After all, unstable process levels and excessive variability can be problems in many different settings. For example:

All of these processes can be stable or unstable, have a certain amount of inherent variability, and can also have special causes of variability. Understanding these issues can help improve all of them.

The third bullet relates to a research study that I was involved with. Our research goal was to have middle school subjects jump from 24-inch steps, 30 times, every other school day to determine whether it would increase their bone density. We defined our treatment as the subjects experiencing an impact of 6 body weights. However, we weren’t quite hitting the mark.

To guide our corrective efforts, I conducted a pilot study and graphed the results in the Xbar-S chart below.

Xbar-S chart of ground reaction forces for pilot study

The in-control S chart (bottom) shows that each subject has a consistent landing style that produces impacts of a consistent magnitude—the variability is in control. However, the out-of-control Xbar chart (top) indicates that, while the overall mean (6.141) exceeds our target, different subjects have very different means. Collectively, the chart shows that some subjects are consistently hard landers while others are consistently soft landers. The control chart suggests that the variability is not inherent in the process (common cause variation) but rather assignable to differences between subjects (special cause variation).

Based on this information, we decided to train the subjects how to land and to have a nurse observe all of the jumping sessions. This ongoing training and corrective action reduced the variability enough so that the impacts were consistently greater than 6 body weights.

Control Charts as a Prerequisite for Statistical Hypothesis Tests

As I mentioned, control charts are also important because they can verify the assumption that a process is stable, which is required to produce a valid capability analysis. We don’t often think of using control charts to test the assumptions for hypothesis tests in a similar fashion, but they are very useful for that as well.

The assumption that the measurements used in a hypothesis test are stable is often overlooked. As with any process, if the measurements are not stable, you can’t make inferences about whatever you are measuring.

Let’s assume that we’re comparing test scores between group A and group B. We’ll use this data set to perform a 2-sample t-test as shown below.

two sample t-test results

The results appear to show that group A has the higher mean and that the difference is statistically significant. Group B has a marginally higher standard deviation, but we’re not assuming equal variances, so that’s not a problem. If you conduct normality tests, you’ll see that the data for both groups are normally distributed—although we have a sufficient number of observations per group that we don’t have to worry about normality. All is good, right?
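For reference, a comparison like this is a one-liner in most packages; the sketch below (hypothetical scores, Welch's t-test in SciPy) mirrors the situation described above, and on its own it gives no hint that group B is drifting:

from scipy import stats

group_a = [88, 91, 86, 90, 87, 92, 89, 88, 90, 87]   # hypothetical, stable scores
group_b = [92, 90, 88, 86, 84, 82, 80, 78, 76, 74]   # hypothetical, trending downward

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")        # significant, but says nothing about stability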

The I-MR charts below suggest otherwise!

I-MR chart for group A

I-MR chart of group B

The chart for group A shows that these scores are stable. However, in group B, the multiple out-of-control points indicate that the scores are unstable. Clearly, there is a negative trend. Comparing a stable group to an unstable group is not a valid comparison even though the data satisfy the other assumptions.

This I-MR chart illustrates just one type of problem that control charts can detect. Control charts can also test for a variety of patterns in the data and for out-of-control variability. As these data show, you can miss problems using other methods.

Using the Different Types of Control Charts

The I-MR chart assesses the stability of the mean and standard deviation when you don’t have subgroups, while the XBar-S chart shown earlier assesses the same parameters but with subgroups.
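As a rough sketch of what the I-MR chart computes (standard control chart constants, not Minitab's code), the individuals chart centers on the mean with limits at plus or minus 2.66 times the average moving range, and the moving range chart has an upper limit of 3.267 times the average moving range:

import statistics

def imr_limits(data):
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = statistics.mean(moving_ranges)
    x_bar = statistics.mean(data)
    return {
        "I chart (LCL, CL, UCL)":  (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar),
        "MR chart (LCL, CL, UCL)": (0, mr_bar, 3.267 * mr_bar),
    }

scores = [82, 79, 85, 81, 78, 84, 80, 83]   # hypothetical individual test scores
print(imr_limits(scores))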

You can also use other control charts to test other types of data. In Minitab, the U Chart and Laney U’ Chart are control charts that use the Poisson distribution. You can use these charts in conjunction with the 1-Sample and 2-Sample Poisson Rate tests. The P Chart and Laney P’ Chart are control charts that use the binomial distribution. Use these charts with the 1 Proportion and 2 Proportions tests.

If you're using Minitab Statistical Software, you can choose Assistant > Control Charts and get step-by-step guidance through the process of creating a control chart, from determining what type of data you have, to making sure that your data meets necessary assumptions, to interpreting the results of your chart.

Additionally, check out the great control charts tutorial put together by my colleague, Eston Martz.

So Why Is It Called "Regression," Anyway?


Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names?

One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible for each statistical concept.

A whistle-blower recently released the following transcript of a secretly recorded ICSSNN meeting:

"This statistical analysis seems pretty straightforward…"

“What does it do?”

“It describes the relationship between one or more 'input' variables and an 'output' variable. It gives you an equation to predict values for the 'output' variable, by plugging in values for the input variables."

“Oh dear. That sounds disturbingly transparent.”

“Yes. We need to fix that—call it something grey and nebulous. What do you think of 'regression'?”

“What’s 'regressive' about it?”

“Nothing at all. That’s the point!”

“Re-gres-sion. It does sound intimidating. I’d be afraid to try that alone.”

“Are you sure it’s completely unrelated to anything?  Sounds a lot like 'digression.' Maybe it’s what happens when you add up umpteen sums of squares…you forget what you were talking about.”

“Maybe it makes you regress and relive your traumatic memories of high school math…until you  revert to a fetal position?”

“No, no. It’s not connected with anything concrete at all.”

“Then it’s perfect!”

 “I don’t know...it only has 3 syllables. I’d feel better if it were at least 7 syllables and hyphenated.”

“I agree. Phonetically, it’s too easy…people are even likely to pronounce it correctly. Could we add an uvular fricative, or an interdental retroflex followed by a sustained turbulent trill?”

The Real Story: How Regression Got Its Name

Conspiracy theories aside, the term “regression” in statistics was probably not a result of the workings of the ICSSNN. Instead, the term is usually attributed to Sir Francis Galton.

Galton was a 19th century English Victorian who wore many hats: explorer, inventor, meteorologist, anthropologist, and—most important for the field of statistics—an inveterate measurement nut. You might call him a statistician’s statistician. Galton just couldn’t stop measuring anything and everything around him.

During a meeting of the Royal Geographical Society, Galton devised a way to roughly quantify boredom: he counted the number of fidgets of the audience in relation to the number of breaths he took (he didn’t want to attract attention using a timepiece). Galton then converted the results on a time scale to obtain a mean rate of 1 fidget per minute per person. Decreases or increases in the rate could then be used to gauge audience interest levels. (That mean fidget rate was calculated in 1885. I’d guess the mean fidget rate is astronomically higher today—especially if glancing at an electronic device counts as a fidget.)

Galton also noted the importance of considering sampling bias in his fidget experiment:

“These observations should be confined to persons of middle age. Children are rarely still, while elderly philosophers will sometimes remain rigid for minutes.”

But I regress…

Galton was also keenly interested in heredity. In one experiment, he collected data on the heights of 205 sets of parents with adult children. To make male and female heights directly comparable, he rescaled the female heights, multiplying them by a factor of 1.08. Then he calculated the average of the two parents' heights (which he called the “mid-parent height”) and divided the parents into groups based on the range of those heights. The results are shown below, replicated on a Minitab graph.

For each group of parents, Galton then measured the heights of their adult children and plotted their median heights on the same graph.

Galton fit a line to each set of heights, and added a reference line to show the average adult height (68.25 inches).

Like most statisticians, Galton was all about deviance. So he represented his results in terms of deviance from the average adult height.

Based on these results, Galton concluded that as the heights of the parents deviated from the average height (that is, as they became taller or shorter than the average adult), their children tended to be less extreme in height. That is, the heights of the children regressed to the average height of an adult.

He calculated the rate of regression as 2/3 of the deviance value. So if the average height of the two parents was, say, 3 inches taller than the average adult height, their children would tend to be (on average) approximately 2/3*3 = 2 inches taller than the average adult height.

Galton published his results in a paper called “Regression towards Mediocrity in Hereditary Stature.”

So here’s the irony: The term regression, as Galton used it, didn't refer to the statistical procedure he used to determine the fit lines for the plotted data points. In fact, Galton didn’t even use the least-squares method that we now most commonly associate with the term “regression.” (The least-squares method had already been developed some 80 years previously by Gauss and Legendre, but wasn’t called “regression” yet.) In his study, Galton just "eyeballed" the data values to draw the fit line.

For Galton, “regression” referred only to the tendency of extreme data values to "revert" to the overall mean value. In a biological sense, this meant a tendency for offspring to revert to average size ("mediocrity") as their parentage became more extreme in size. In a statistical sense, it meant that, with repeated sampling, a variable that is measured to have an extreme value the first time tends to be closer to the mean when you measure it a second time. 
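A quick simulation makes this statistical sense of “regression” concrete: measure a noisy quantity twice, keep only the cases whose first measurement was extreme, and the second measurement of those same cases tends to sit closer to the mean. The numbers below are simulated for illustration, not Galton's data.

```python
# Regression to the mean: extreme first measurements tend to be less extreme when remeasured.
import numpy as np

rng = np.random.default_rng(42)
true_value = rng.normal(0, 1, size=100_000)                    # each subject's underlying value
first = true_value + rng.normal(0, 1, size=true_value.size)    # first noisy measurement
second = true_value + rng.normal(0, 1, size=true_value.size)   # independent second measurement

extreme = first > 2                                            # subjects that looked extreme at first
print(f"mean first measurement (extreme group):  {first[extreme].mean():.2f}")
print(f"mean second measurement (same subjects): {second[extreme].mean():.2f}")  # noticeably closer to 0
```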

Later, as he and other statisticians built on the methodology to quantify correlation relationships and to fit lines to data values, the term “regression” became associated with the statistical analysis that we now call regression. But it was just by chance that Galton's original results using a fit line happened to show a regression of heights. If his study had shown increasing deviance of children's heights from the average compared to their parents, perhaps we'd be calling it "progression" instead.

So, you see, there’s nothing particularly “regressive” about a regression analysis.

And that makes the ICSSNN very happy.

Don't Regress....Progress

Never let intimidating terminology deter you from using a statistical analysis. The sign on the door is often much scarier than what's behind it. Regression is an intuitive, practical statistical tool with broad and powerful applications.

If you’ve never performed a regression analysis before, a good place to start is the Minitab Assistant. See Jim Frost’s post on using the Assistant to perform a multiple regression analysis. Jim has also compiled a helpful compendium of blog posts on regression.

And don’t forget Minitab Help. In Minitab, choose Help > Help. Then click Tutorials > Regression, or Stat Menu > Regression.

Sources

Bulmer, M. Francis Galton: Pioneer of Heredity and Biometry. Johns Hopkins University Press, 2003.

Davis, L. J. Obsession: A History. University of Chicago Press, 2008.

Galton, F. “Regression towards Mediocrity in Hereditary Stature.”  http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf

Gillham, N. W. A Life of Sir Francis Galton. Oxford University Press, 2001.

Gould, S. J. The Mismeasure of Man. W. W. Norton, 1996.

Big Ten 4th Down Calculator: Week 8


We use statistics to arm ourselves with more information. That information allows us to make more informed decisions. And the sooner we can obtain this information, the better.

For example, suppose one of your manufacturing machines starts to malfunction and makes your products out of spec. You don't want to wait until the product reaches customers before you discover this information. Then it's too late to do anything about it. The ideal situation would be to use statistical tools—like a control chart—to discover that your process is out of control as soon as possible, so you can stop production and fix the problem early on. The faster you get that information, the better.

The same idea can be applied to a particular situation in football. Suppose you're losing by 15 points in the 4th quarter and you score a touchdown. You can either kick the extra point and be down by 8 points, or go for two and be down either 7 or 9 points. Almost every coach will kick the extra point, since if you miss the two-point conversion and are losing by 9, you have to score two more times. But if you're down by 8 you can potentially tie the game with one score, if you make a 2-point conversion after scoring a touchdown.

The key word there: potentially.

The problem is that if you're down by 8, you don't really know if you need one or two more scores. I like to think of it as Schrodinger's score: You're simultaneously down by 1 and 2 possessions, but you don't know which one it is until you actually score. Most coaches behave as if they're only down one possession. But if you score at the very end of the game and miss the two-point conversion, it's too late to do anything about it—just like discovering your product is defective after sending it to customers. That's why you should always go for 2 after the first touchdown. It's all about knowing the information as soon as possible. Sure, if you miss the two-point conversion, you have to score two more times. But isn't it better to know that fact when there is still time left in the game, as opposed to the very end? Of course it is. Whether it's football or quality, you don't want to delay the possible detection of bad news. Discover it as soon as possible so you have time to fix the problem!  
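To put a rough number on the "find out sooner" argument, the sketch below enumerates the two strategies under an assumed two-point conversion rate of 47% (a hypothetical figure, with extra points treated as automatic). The chance of tying is the same either way; the difference is when you learn whether you need one more score or two.

```python
# Down 15, you just scored a touchdown: go for 2 now, or kick now and go for 2 after the next TD?
# Assumes a hypothetical 47% two-point conversion rate and treats extra points as automatic.
p_two = 0.47

# Either path requires exactly one successful two-point conversion to tie,
# so the probability of a tie (given you score the second touchdown) is identical:
p_tie_early = p_two   # convert now, then TD + XP later
p_tie_late = p_two    # XP now, then TD + two-point try later

print(f"P(tie) going for 2 after the first TD:  {p_tie_early:.2f}")
print(f"P(tie) going for 2 after the second TD: {p_tie_late:.2f}")

# The payoff of going early is informational: with probability (1 - p_two) you learn,
# while there is still clock to manage, that you actually need two more scores.
print(f"P(learning early that you need two scores): {1 - p_two:.2f}")
```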

Now let's move on to the games! If you're new to this and want to know more about what exactly the Big Ten 4th down calculator does, you can read the intro from a previous week's post.

Nebraska 31 - Rutgers 14

At least Rutgers held an opponent to under 48 points for the first time in a month. Silver lining?

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Nebraska | 3 | 1 | 3 | 0 | 0 | 0.79
Rutgers | 8 | 2 | 6 | 0 | 2 | 1.02

Nebraska opened up our bad 4th down decision making on their second possession, when they punted on 4th and 1 from midfield. This is a Rutgers defense that has allowed 48 or more points in its last 4 games. Come on, Nebraska! Last week against a much better defense (Michigan State) the Cornhuskers went for it on a 4th and 1 in similar field position (a difference of only 10 yards). They converted and scored a touchdown the very next play. So why play so much more conservatively against this Rutgers defense? I don't get it.

Luckily for Nebraska, Rutgers returned the favor on the very next possession, as they punted on a 4th and 1 from their own 11. They actually had a delay-of-game penalty and ended up punting on 4th and 6, but I'm counting it as a 4th and 1 since their plan was to punt all along. Sure, if you don't make it, Nebraska has great field position. But they're going to get good field position anyway—and your defense is terrible! And it's only 1 yard! And you're 3-6 on the year! Go for it! But my pleas for aggressive decision making fell on deaf ears, and Rutgers boomed a punt away. Sure enough, 5 plays later Nebraska was in the end zone, taking a 14-0 lead.

Rutgers' second disagreement with the model came late in the 3rd quarter. They had a 4th and 4 at their own 31 yard line. The statistics say to punt, but Rutgers went for it in the form of a fake punt. At the time, Rutgers was down 28-14, and the difference in expected points between punting and going for it was only 0.23. So even though it was a "disagreement," the calculator doesn't take any issue with Rutgers' decision to be aggressive here. But why wait until such a dire situation? Even if Rutgers converted, they'd only have a win probability of 3.5%. Perhaps they should have tried being more aggressive earlier in the game. Like, I don't know, maybe on a particular 4th and 1 in the first quarter?

The fake punt failed for Rutgers, and Nebraska kicked a field goal on the next possession, making the score 31-14, which is how it stayed until the final whistle.

Michigan State 24 - Maryland 7

Easy win for Sparty to keep their playoff hopes alive.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Maryland | 7 | 0 | 6 | 0 | 1 | 0
Michigan St | 7 | 1 | 5 | 1 | 1 | 0.02

In the first 3 quarters of their Big Ten conference games, Maryland has had 32 punts, and the 4th down calculator has only disagreed with one of them. Now, this isn't really because of great 4th down decision making by the Terps. It's because their average yards to go on those 32 punts is 10.5 yards. That's right, when Maryland punts it's usually because they lost yards on the previous 3 plays. That's B1G.

Michigan State's disagreement came on a 4th and 6 at the Maryland 14 yard line. The calculator says to kick a field goal, but Michigan State ended up going for it in the form of a fake field goal. The play was a disaster, as the Spartans lost 6 yards. But the decision really wasn't a bad one, as the difference in expected points between kicking and going for it is basically 0. This is because even if the play failed, Michigan State was still likely to be the next team to score. And this ended up being true, as the next score in the game was a Michigan State touchdown.

Maryland had a punt on 4th and 3 that the calculator would usually disagree with. But in this case there was only a minute and a half left until halftime, and Maryland was at their own 24. With that little time left, even if you convert you're not likely to drive the length of the field and score. And if you fail, Michigan State has such good field position that the time will not matter for them. Plus, the difference in expected points is only 0.07. So Maryland definitely made the correct decision by punting, and I won't count the decision against them. 

The rest of this game was pretty boring from a 4th down decision making standpoint (and honestly, from a football entertainment standpoint too), so we'll move on to the next game.

Ohio State 28 - Illinois 3

Ohio State hasn't lost a Big Ten regular season game since 2011. Impressive.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Ohio State | 7 | 4 | 4 | 1 | 2 | 2.85
Illinois | 9 | 2 | 5 | 3 | 1 | 0.37

I've been praising Ohio State all season for their great 4th down decision making. They're the only team to have gone for it on 4th and 1 in their own territory multiple times. Oh, and every single one of those attempts has been successful and ended with a touchdown.

But then Saturday happened.

Not once. Not twice. But three times Ohio State punted on 4th and 1! On the day, Buckeye running back Ezekiel Elliott averaged 6.7 yards per rush. But Urban Meyer didn't trust him to get one yard? Considering how aggressive Ohio State has been on 4th and 1 this year, it was very confusing to see their decision making in this game. 

That accounted for 3 of Ohio State's disagreements with the model. As for the 4th, they were actually being too aggressive. On a 4th and 11 from the Illinois 38, Ohio State went for it. Because teams convert on 4th and 11 only 27% of the time, the stats suggest punting. Now, Ohio State's offense is much better than your average offense, so it's never a bad thing for them to be more aggressive than the numbers suggest—but that makes the previous 4th down decisions more puzzling. 4th and 1? Hey, we better punt here! 4th and 11? Yeah, we got this!

And of course, because it's Ohio State, they converted on the 4th and 11 and went on to score a touchdown. This gave them a 14-3 lead and the game would never get within 1 possession again.

Iowa 40 - Minnesota 35

Iowa has now scored 31 or more points in their last 4 games. The idea that this is a stale and boring offense just isn't true.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Minnesota | 3 | 0 | 3 | 0 | 0 | 0
Iowa | 3 | 2 | 1 | 2 | 0 | 0.83

Iowa only had one punt in the first three quarters. Of course, that punt came on a 4th and 1 at the Iowa 39 yard line. This is a team that successfully converted a 4th and 1 at their own 25 yard line, and in the 4th quarter of a close game, too! If Iowa coach Kirk Ferentz has the confidence to make that decision, going for it on 4th and 1 near midfield should be easy! Also, I feel the need to mention that Hawkeye running back LeShun Daniels Jr. averaged 7.5 yards per rush in this game. There was no reason to punt here for Iowa, and it cost them 0.79 points.

The other Iowa disagreement came when they kicked a field goal on 4th and 5 from the Minnesota 20. The calculator says to go for it, but the difference in expected points is only 0.04. So really there isn't a big issue with Iowa's decision to kick a field goal here.

Minnesota didn't have any disagreements, but I will mention one of their punts. They kicked on 4th and 4 from the Iowa 47. The calculator agreed with the decision, but the difference in expected points is only 0.23 points. And in this game, Minnesota was a double-digit underdog. Considering they were in Iowa territory, I think they should have been more aggressive than the calculator suggests and gone for it. And to make matters worse for the Gophers, the decision to punt backfired. Iowa took the ball and drove 91 yards on the next possession to score the opening touchdown.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Minnesota (Down by 12) | 13:17 | 3 | 68 | Go for it | Punt | 1.9% | 1.3% (Punt)
Iowa (Up by 12) | 10:49 | 5 | 30 | FG | FG | 98.7% | 98.8% (FG)
Minnesota (Down by 12) | 9:30 | 1 | 61 | Go for it | Go for it | 1.3% | 0.8% (Punt)
Minnesota (Down by 12) | 9:30 | 11 | 71 | Go for it | Punt | 0.7% | 0.6% (Punt)
Iowa (Up by 12) | 6:08 | 3 | 72 | Punt | Punt | 98.4% | 99.2% (Punt)

Early in the 4th quarter, Minnesota had a very manageable 4th down. Down by 2 scores, they should have been aggressive and gone for it instead of punting. And keep in mind, at this point in the game Iowa had punted exactly once. So by punting, Minnesota was banking on stopping Iowa, then scoring, then stopping Iowa again, then scoring to take the lead, then stopping Iowa a third time to preserve the win. Good luck with that, Gophers.

After the Minnesota punt, it took Iowa exactly 2 plays to reach the point where Minnesota punted from. The Hawkeyes ended up attempting a field goal on 4th and 5. I wish they had made it, because then we would have had a real-life version of the end-of-game 15-point deficit discussed at the start of this post. But Iowa missed the kick, and Minnesota got the first stop they needed.

But 3 plays later Minnesota found itself with a 4th and 1 in their own territory. Just like the 4th and 3 from the previous drive, the model says the correct decision is to go for it. And that's exactly what they did, gaining 10 yards on the play. However, the Gophers were called for holding, and the 4th and 1 turned into 4th and 11. At this point they were in real trouble no matter what they decided to do. The stats say to go for it, but 11 yards is so hard to convert you can't blame Minnesota too much for deciding to punt here.

But Minnesota was able to force the 2nd Iowa punt of the game. And they quickly scored a touchdown to make it a 5 point game. But then Iowa showed exactly why Minnesota should have been more aggressive earlier in the quarter. Even though the Golden Gophers knew Iowa was just going to run the ball to burn the clock, they couldn't stop them. Iowa ran the ball 8 straight plays, getting two first downs followed by a 51 yard touchdown run. Minnesota was able to score again, but it was too little too late as a failed onside kick attempt ended the game. Essentially, Minnesota passed on trying to convert a 4th and 3 to try and recover an onside kick later in the game. The first has a success rate of about 53% and the latter has a success rate of 19%.

If only coaches knew what playing the percentages actually meant.

Northwestern 21 - Purdue 14

All season, Northwestern has been the worst Big Ten team at 4th down decisions and Purdue has been the best. So there is no way Northwestern did a better job on 4th downs than Purdue, right?

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Purdue | 8 | 2 | 4 | 1 | 3 | 1.31
Northwestern | 5 | 1 | 4 | 0 | 1 | 0.79

Oh no! Purdue, what happened? It started in the 1st quarter, when the Boilermakers punted on 4th and 2 at the Northwestern 45. Purdue, you lead the country in 4th down attempts! It's 2 yards......in Northwestern territory! And you're a 16 point underdog! Why are you punting here? Why?

The other Purdue disagreement found them being too aggressive. They went for it on 4th and 7 at the Northwestern 23 yard line, when the statistics say to kick. In fact, the difference in expected points is 0.75, so the decision isn't really close. In addition to that, Northwestern has the best pass defense in the Big Ten (in yards/attempt) and Purdue has one of the worst passing offenses. I understand that as a double-digit underdog you should play aggressively. But the gap between kicking and going for it is so big here that Purdue should have attempted a field goal. 

Northwestern punted on a 4th and 1, which at this point seems to be a weekly occurrence for them. But I will finally give Wildcat coach Pat Fitzgerald a compliment. On a 4th and 1 from the Purdue 29 yard line he correctly went for it instead of kicking a field goal. Good job, Pat! Teams convert on 4th and 1 about 68% of the time and they make 46 yard field goals 60% of the time, so this decision should really be a no-brainer. (Okay, so it was a backhanded compliment, but a compliment nonetheless!)
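As a rough sanity check on that call, here is a back-of-the-envelope expected-points comparison. The 68% conversion rate and 60% field goal rate come from the paragraph above; the point values assigned to the resulting situations are round, assumed numbers, not the calculator's actual model.

```python
# Rough expected-points comparison: 4th and 1 at the opponent's 29 vs. a 46-yard field goal try.
p_convert, p_fg = 0.68, 0.60      # rates quoted above

# Assumed situation values (hypothetical, for illustration only):
ep_first_down_near_28 = 3.5       # a fresh set of downs deep in opposing territory
ep_opponent_at_own_29 = -0.5      # opponent takes over around their own 29 after a failure/miss
ep_opponent_after_kickoff = -0.3  # opponent's possession after a made field goal

ep_go = p_convert * ep_first_down_near_28 + (1 - p_convert) * ep_opponent_at_own_29
ep_kick = p_fg * (3 + ep_opponent_after_kickoff) + (1 - p_fg) * ep_opponent_at_own_29

print(f"Go for it: {ep_go:.2f} expected points")   # ~2.2 under these assumptions
print(f"Kick FG:   {ep_kick:.2f} expected points") # ~1.4 under these assumptions
```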

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Purdue (Tie Score) | 14:51 | 3 | 60 | Go for it | Punt | 28.1% | 26.6% (Punt)
Northwestern (Tie Score) | 12:47 | 11 | 57 | Punt | Punt | 61.1% | 68% (Punt)
Purdue (Tie Score) | 10:50 | 5 | 90 | Go for it | Punt | 18.4% | 18.2% (Punt)
Purdue (Down by 7) | 4:37 | 8 | 73 | Go for it | Punt | 1.5% | 0.8% (Punt)

Purdue, the most aggressive 4th down team in the country, picked a poor game in which to be timid. At the start of the 4th quarter, they punted on a 4th and 3 when they should have gone for it. Purdue, this is the perfect time to implement that aggressive 4th down strategy! You've won exactly 2 Big Ten games in the last 3 years. What do you have to lose? But my pleas once again fell on deaf ears, and Purdue punted. The kick went a whole 27 yards and Northwestern went 3 and out on their next possession. So the kick didn't give them much field position, and if they failed on 4th down Northwestern probably would have punted back to them anyway. But if they had converted that 4th and 3? We might be talking about this game as a huge upset. 

By the way, you might be wondering why Purdue has such a small probability to win even though the game was tied. This is because the win probability takes into account the Vegas spread, and Northwestern was a 16 point favorite. So even with a tie score going into the 4th quarter, Northwestern was still more likely to win. All the more reason for Purdue to be aggressive. 

Now after the Northwestern punt, you'll see that Purdue punted on another 4th down when the calculator said to go for it, although I think Purdue probably made the right choice here. The probabilities are very close, and we already mentioned Northwestern's great pass defense and Purdue's poor pass offense. That probably tips the numbers enough that Purdue was best off punting. The result didn't work out for them, though, as Northwestern ended up scoring a touchdown. This set up another 4th and long for Purdue deep in their own territory. Suddenly a 4th and 3 near midfield doesn't sound so bad, does it? Teams convert on 4th and 8 about 32% of the time, but Purdue's chances were probably less than that. Even so, they would have needed less than a 16% chance of converting to warrant punting. With so little time left, you just can't be sure you're going to get another chance if you kick. And sure enough, after the Purdue punt Northwestern ran out the clock to end the game.

Michigan 48 - Indiana 41

So close Indiana, so close!

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Michigan | 4 | 1 | 3 | 1 | 0 | 0.41
Indiana | 8 | 1 | 1 | 4 | 3 | 0.07

As a double-digit underdog, you're going to have to be aggressive if you want a chance at pulling the upset. And that's exactly what Indiana did. On their opening drive, Indiana went for it on 4th and 2 at the Michigan 44. They failed, and 4 plays later Michigan scored a touchdown. Often a result like this induces coaches to give up on being aggressive on 4th down. Thankfully, that was not the case with Indiana coach Kevin Wilson. Two possessions later, Indiana went for it on 4th and 1 at their own 44, and 4th and 2 from the Michigan 43. They converted both times and ended up kicking a field goal on the possession.

Speaking of field goals, Indiana kicked 4 of them in the first three quarters of this game. Usually, I want Indiana to be more aggressive, since field goals aren't going to win you games given a defense as bad as theirs. However, the 4th down distances were all so long that the field goals were definitely the correct decision.

Of course, things weren't perfect for Indiana's 4th down decision making. They did punt on a 4th and 3 from midfield. The difference in expected points between kicking and going for it is close to 0, so on the surface the decision to punt isn't that bad. But considering Indiana has a great offense and a terrible defense, and the fact that they were heavy underdogs, they should have absolutely gone for it. Luckily for them, the punt set up Michigan's poor 4th down decision.

Michigan's disagreement was punting on 4th and 2. It cost them 0.41 expected points, but the real life results turned out to cost them even more. Indiana drove 61 yards for a touchdown the possession after the Michigan punt.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Indiana (Up by 2) | 13:27 | 7 | 44 | Punt | Punt | 45.8% | 48.9% (Punt)
Michigan (Down by 2) | 6:30 | 2 | 2 | Go for it | FG | 68.1% | 59.2% (FG)
Indiana (Tie Score) | OT | 1 | 1 | Go for it | Go for it | 50% | 48.1% (FG)

Clinging to a 2 point lead, Indiana punted on a 4th and 7 in Michigan territory early in the 4th quarter. The calculator says this was the correct decision, although in reality the numbers are probably much closer together. Given how bad Indiana's defense is, it's really tempting to think that Indiana should have gone for it here. But 7 yards is a lot to gain, too. The win probabilities for both are so close together that either decision was fine.

The decision to punt didn't work out for Indiana, as Michigan drove all the way to the Indiana goal line, finding themselves with a 4th and goal at the 2. Conventional wisdom is that you have to kick a field goal here to take the lead. But conventional wisdom doesn't realize how valuable a touchdown is. Michigan would have been much better off going for the touchdown here. And even if they failed, Indiana would start at the 2 yard line. Michigan would be very likely to get the ball back in great field position with another chance to take the lead.

After the Michigan field goal, Indiana drove right down the field and scored a touchdown. The only downside was that they left Michigan almost 3 minutes on the clock, and the Wolverines were able to answer with a touchdown of their own to send the game to overtime.

In the first possession of overtime Indiana had a 4th and goal at the 1 yard line. The calculator suggests going for it, and that's exactly what Indiana did. But one quick caveat. I calculated their win probability using data from my previous post on college football overtimes, where my sample size was only 88 games. So because we don't have a very large sample, I wouldn't consider these probabilities written in stone. Even so, they do slightly favor going for it. And we also know that Indiana's defense is really bad, so if they kick the field goal their chances of stopping Michigan are lower than your average team. When you consider that, I believe Indiana made the correct decision to go for it.

The results support the decision as well. Between the first and second overtimes, Michigan ran 3 plays and scored 2 touchdowns. Indiana ended up losing, but had they kicked a field goal, it's likely they would have lost more quickly than they did. By going for it on 4th and goal, they gave themselves the best chance at winning. It just didn't work out.

Summary

Each week, I’ll summarize the times coaches disagreed with the 4th down calculator and the difference in expected points between the coach’s decision and the calculator’s decision. I’ll do this only for the 1st 3 quarters since I’m tracking expected points and not win probability. I also want to track decisions made on 4th and 1-2 yards, and decisions made between midfield and the opponent’s 25 yard line. I call this area the “Gray Zone”. Then we can easily compare the actual outcomes of different decisions in similar situations.

Team Summary

Team | Number of Disagreements | Total Expected Points Lost
Northwestern | 10 | 7.54
Rutgers | 9 | 4.98
Illinois | 11 | 4.11
Michigan | 8 | 4.05
Minnesota | 9 | 3.81
Ohio State | 7 | 3.77
Penn State | 8 | 3.61
Iowa | 6 | 3.42
Indiana | 6 | 3.37
Nebraska | 7 | 3.36
Michigan St | 7 | 2.27
Wisconsin | 5 | 2.16
Purdue | 3 | 1.55
Maryland | 2 | 0.18

It was a bad weekend for the previously great 4th down decision makers. With their terrible game (from a 4th down decision perspective), Ohio State jumps from 12th to 6th for the most expected points lost. Purdue stays at 13th, but their total expected points lost increased from 0.24 to 1.55! 

Northwestern has a stranglehold on the title of the team whose decisions have cost them the most points. With only two weeks left in the Big Ten season, it seems unlikely that anybody will catch them.

4th and 1............and also 4th and 2!

Yards To End Zone | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
75-90 | 11 | -3.64 | 1 | 7 | * | *
50-74 | 31 | -0.03 | 6 | 4 | * | *
25-49 | 3 | -4.67 | 15 | 2.07 | 1 | -7
1-24 | * | * | 11 | 2.55 | 4 | 3

I'm adding 4th and 2 to this table to increase our sample size. Even though the 4th down calculator always suggests going for it on 4th down with 1 or 2 yards to go, punting reigns supreme when teams are in their own territory. It's not working out very well for them, especially when they punt inside their own 25. Teams that have correctly gone for it in their own territory have been handsomely rewarded. In 4 of the 7 instances where a team has gone for it, they've scored a touchdown on the same possession. Another team kicked a field goal, and the last two instances saw neither team score before halftime. So not once has going for it on 4th and short in your own territory backfired for a Big Ten team this year. Of course, that doesn't mean that it will never backfire. But so far, this year's data says that in the long run, you'll score more points than you give up by going for it.

The Gray Zone (4th downs 25-49 yards to the end zone)

4th Down Distance | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
1 | 0 | 0 | 9 | 3.56 | 1 | -7
2-5 | 21 | -0.52 | 16 | -0.69 | 4 | 0.5
6-9 | 18 | 0.5 | 10 | -0.4 | 9 | 2.1
10+ | 37 | 0 | 1 | 7 | 19 | 0.63

Northwestern is still the only team to kick on 4th and 1 between the 25 and 49 yard line. And even if you are in field goal range, you'll see that the average next score after going for it is 3.56 points. That's higher than the field goal even if you guarantee that you make it (which you can't). It just continues to show that teams need to always go for it on 4th and 1.

Things aren't quite as neat for 4th and 2 to 4th and 5. For most of the season, teams that punted have actually had a higher average next score than teams that go for it. But those numbers are getting closer and closer each week. With a larger sample, I suspect the going-for-it group will have the higher value, since most of the time that is the optimal decision.

Approaching the Food Waste Problem with Lean Six Sigma and Statistics


According to this article published on Food Tank, over 22 million pounds of food is wasted on college campuses each year. Now that’s a lot of food waste!

Students all over the country are noticing excessive food waste at their schools and are starting programs to bring awareness and improve the problem. Naturally, many of these programs have roots in Lean Six Sigma. In one example, a group of students at Rose-Hulman Institute of Technology lessened the food waste problem at their school by completing a Lean Six Sigma project that followed the DMAIC framework.

Dr. Diane Evans, Six Sigma black belt and associate professor of mathematics at Rose-Hulman, led the students in their effort. "I wanted my students to go through the process of completing a project from start to finish," Evans says. "The food waste project provided students with this opportunity, and gave them a chance to put the skills they were learning in class to use in the real world."

According to a July 2012 article in Food Policy, U.S. food waste on the consumer level translated into almost 273 pounds per person in 2008. Evans’ students converted this number into pounds per day, and to determine the amount of waste per meal, they divided the figure by 2.5 meals per day (they did not count breakfast as a full meal because it typically does not see as much waste as lunch or dinner). The students ended up calculating an average food waste amount of 4.78 ounces per meal. So their goal became to reduce edible food waste per student by one ounce per meal during the school’s lunch period.
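Their unit conversion is easy to reproduce. The short sketch below just walks through the arithmetic from the annual Food Policy figure to the per-meal estimate.

```python
# Reproducing the students' per-meal food waste estimate from the annual per-person figure.
pounds_per_year = 273             # consumer-level food waste per person in 2008
pounds_per_day = pounds_per_year / 365
meals_per_day = 2.5               # breakfast counted as half a meal
ounces_per_meal = pounds_per_day / meals_per_day * 16   # 16 ounces per pound

print(f"{ounces_per_meal:.2f} ounces of waste per meal")  # roughly 4.8, in line with the 4.78 figure
```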

Using Lean Six Sigma tools, such as process maps and CT Trees, as well as using Minitab for data analysis, Rose-Hulman students reached an impressive outcome—greatly surpassing their original goal. I won’t give away all of their results here, but I encourage you to check out this case study to learn more and find out how they did it.

And for an even quicker read, take a look at this past blog post.


What Is ANOVA? And Who Drinks the Most Beer?


Back when I was an undergrad in statistics, I unfortunately spent an entire semester of my life taking a class, diligently crunching numbers with my TI-82, before realizing 1) that I was actually in an Analysis of Variance (ANOVA) class, 2) why I would want to use such a tool in the first place, and 3) that ANOVA doesn’t necessarily tell you a thing about variances.

Fortunately, I've had a lot more real-world experience to draw from since then, which makes it much easier to understand today. TI-82 not required.

Why Conduct an ANOVA?

In its simplest form—specifically, a 1-way ANOVA—you take 1 continuous (“response”) variable and 1 categorical (“factor”) variable and test the null hypothesis that all group means for the categorical variable are equal. Typically, we’re talking about at least 3 groups, because if you only have 2 groups (samples), then you can use a 2-sample t-test and skip ANOVA all together.

As an example, let’s look at the average annual per capita beer consumption across 3 regions of the world: Asia, Europe, and America. Here’s the null and alternative hypothesis:

H0: All regions drink the same average amount of beer (μAsia = μEurope = μAmerica)

H1: Not all regions drink the same average amount of beer

Any guess on who consumes the most beer?

According to the individual value plot created using Minitab 17, Europe consumes the most beer on average and Asia consumes the least. However, are these differences statistically significant? Or are these differences simply due to random variation?

How ANOVA Works

The basic logic behind ANOVA is that the within-group variation is due only to random error. Therefore:

  • If the between-group variation is similar to the within-group variation, then the group means are likely to differ only due to random error. (Figure 1)
  • If the between-group variation is large relative to the within-group variation, then there are likely differences between the group means. (Figure 2)

Say what?

In our example, the between-group variation represents the variation between the 3 different regions. And the within-group variation represents the beer consumption variability within a given region. Take Europe, for instance, where we have the Czech Republic. It appears to be the thirstiest country, consuming the most beer at 148.6 liters. But Europe also contains Italy, whose population drinks the least at only 29 liters (perhaps the Italians are passing up the Peroni for some vino and Limoncello?). So you can see that there is variability within the Europe group. There’s also variability within the Asia group, and within the America group.

With ANOVA, we compare the between-group variation (i.e., Asia vs. Europe vs. America) to the within-group variation (i.e., within each of those regions). The higher this ratio, the smaller the p-value. So the term ANOVA refers to the fact that we're using information about the variances to draw conclusions about the means.
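If you want to see the same kind of test outside Minitab, here is a one-way ANOVA sketch in Python. The per-capita consumption values are hypothetical stand-ins grouped by region (with the Czech Republic and Italy figures mentioned above as rough anchors); they are not the data set behind the output that follows.

```python
# One-way ANOVA on hypothetical per-capita beer consumption (liters per year) by region.
from scipy import stats

asia = [20, 31, 12, 42, 28, 15]         # hypothetical values
europe = [148.6, 104, 99, 29, 81, 74]   # hypothetical values (Czech Republic high, Italy low)
america = [75, 68, 45, 60, 85, 52]      # hypothetical values

f_stat, p_value = stats.f_oneway(asia, europe, america)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```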

The Analysis

If we run a 1-way ANOVA using this beer data, Minitab Statistical Software provides the following output in the Session Window:

Our p-value is statistically significant at 0.000. Therefore, we can reject the null hypothesis that all regions drink the same average amount of beer.

This leads us to our next question: Which regions differ? Let’s use Tukey multiple comparisons to find out.

Per the footnote on the Tukey comparisons graph, “If an interval does not contain zero, the corresponding means are significantly different.” Therefore, the intervals shown in red tell us where the differences are. Specifically, we can conclude that the average beer consumption for Europe is significantly higher than that of Asia. We can also conclude that America consumes significantly more than Asia. However, there is not sufficient evidence to conclude that the average beer consumption for Europe is different than for America.
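The pairwise follow-up can be sketched the same way with statsmodels' Tukey HSD routine, again using the hypothetical values from the ANOVA sketch above.

```python
# Tukey pairwise comparisons on the same hypothetical regional data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

consumption = np.array([20, 31, 12, 42, 28, 15,          # Asia
                        148.6, 104, 99, 29, 81, 74,      # Europe
                        75, 68, 45, 60, 85, 52])         # America
region = np.array(["Asia"] * 6 + ["Europe"] * 6 + ["America"] * 6)

# Each row of the summary flags whether a pair of group means differs significantly.
print(pairwise_tukeyhsd(consumption, region, alpha=0.05))
```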

The Last Sip

Although it’s unlikely that you’re analyzing beer data in your professional career, I do hope this provides a little insight into ANOVA and how you can utilize it to test averages between 3 or more groups.

 

Survival Analysis and Zombies


Reliability and survival analysis is used most frequently in manufacturing. Companies use these methods to estimate the proportion of units that will fail within, or survive beyond, a given period of time. But could these reliability and survival analysis techniques prove useful in a zombie apocalypse, too? Today's blog post explores that chilling scenario. 

Think. This is what Zachary is telling himself as he helps his nephew Liam across a creek located somewhere in the woods of Wayne County, Pennsylvania. Liam has just been bitten in the lower leg by a fallen zombie, hidden underneath some dense brush a few miles back.

Zachary knows his nephew doesn’t have a lot of time left. But how much? That question, consuming all other thoughts at the moment, has quietly invoked a core tenet of this new era's code of conduct: When you have the unfortunate luck of running into someone who has just been bitten, then certain...actions must be taken quickly. It gets easier to carry out the more you do it—unless it involves your nephew. 

After reaching the other side of the creek, Zachary glances at his nephew. Liam gives a familiar nod to keep going, and they do. They slowly march up the small hill in front of them, reaching the top to discover a pleasant surprise. A camp. They don’t make it within 15 feet of the tents and campfires before a small group of 8 approach them with guns. Zachary hears one person tell another that “one of them” has been bitten. He knows he has to talk fast.

“We don’t want trouble. I hear a clinic close by has developed a vaccine. I am taking my nephew there. Do you know what direction the clinic is in?”

“It's about nine hours to the east. We’re heading to the clinic tomorrow if you’d like to join us. He won’t make it.”

“We’ll be on our way then. Thank you for your help.”

“Wait!” Someone emerges from the circle of people surrounding Zachary and Liam. “My name is Claire. I am the group's doctor. When was he bitten?” 

“Just a few hours ago, so he may have a chance. But I would have a better idea if I had actual data on time to transformation.”

“We do, actually," Claire replies. "This group has stuck together for quite a while, and we have had access to enough medical supplies to start testing if we can prolong the period between virus exposure and transformation into a zombie. Wait here."

Claire steps into her tent and comes back out holding a notepad. “Here are the transformation times for the 65 people we’ve lost in our group. Each value represents the elapsed time from being bitten to waking up as a zombie."

Zachary removes a laptop from his backpack and turns it on. “So...the clinic is 9 hours away. With your data, we might be able to find out the likelihood of Liam surviving after 9 hours.”

 Zachary opens Minitab 17 and quickly enters Claire's data in the worksheet. Here is an excerpt of the data set (in hours):

He then goes to Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Distribution ID Plot... to figure out what distribution the data follows prior to running a survival analysis. Minitab quickly returns the following results:

Zach is looking for the distribution with the smallest A-D value, and in this case it is Weibull. Zachary goes to Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Parametric Distribution Analysis… to perform his survival analysis.

Now he clicks on the Estimate... button in the dialog box. He wants to estimate the odds of surviving 9 hours after a zombie bite. Since this dialog window only accepts numeric values, Zachary enters 9:0:0 in a blank column. The colons separate the hours, minutes, and seconds. He then right-clicks on the column and chooses Format Column > Automatic Numeric. Minitab calculates that 0.375 (that is, 9/24 of a day) is the numerical equivalent of 9:0:0, or 9 hours of elapsed time, and Zach enters that into the dialog:

Here is the table of survival probabilities after pressing OK:

Things are not looking good for Liam. He has an estimated 0.3% chance of making it to the clinic alive after 9 hours have passed. Using the confidence interval, the upper bound is at around 2 percent. Pretty low. Zachary quickly re-runs the analysis using 0.354167 (equivalent to 8 hours and 30 minutes), knowing that time is ticking and they should be leaving now.
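Stepping outside the story for a moment: the same kind of estimate can be sketched without Minitab by fitting a Weibull distribution and evaluating its survival function at 9 hours. The transformation times below are invented for illustration; Claire's 65 values aren't reproduced here.

```python
# Weibull survival sketch: estimated P(no transformation within 9 hours), using invented times.
import numpy as np
from scipy import stats

hours = np.array([3.1, 4.5, 2.8, 5.2, 6.0, 3.9, 4.1, 7.3, 2.5, 5.8,
                  4.9, 3.3, 6.4, 4.0, 5.1, 3.7, 4.6, 5.5, 2.9, 4.2])  # hypothetical complete data

# floc=0 fixes the threshold at zero, i.e., a 2-parameter Weibull fit
shape, loc, scale = stats.weibull_min.fit(hours, floc=0)
p_survive_9h = stats.weibull_min.sf(9, shape, loc=loc, scale=scale)

print(f"shape = {shape:.2f}, scale = {scale:.2f}")
print(f"P(still human after 9 hours) = {p_survive_9h:.4f}")
```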

It’s still poor odds, but not impossible. If they pick up the pace, they might make it there in time. Zachary leaves his laptop with the camp. The less weight he carries, the better. He thanks the group for the data they provided, informing them that if all goes well, he will see them again in a few days. He hoists his nephew onto his shoulders, and hastily makes his way towards the clinic.

 

Big Ten 4th Down Calculator: Week 9


This past weekend in the Big Ten showed how being conservative on 4th down decisions can cost you a game. Ohio State punted on 4th and 1 three different times, while Penn State and Illinois both kicked field goals in the 4th quarter when they needed a touchdown to tie or take the lead. All three teams lost. Perhaps taking some advice from the 4th down calculator would have greatly benefited them!

If you're new to this, I've used Minitab Statistical Software to create a model to determine the correct 4th down decision. And throughout the 2015 college football season, I've used that model to track every 4th down decision in Big Ten Conference games. However, the decision the calculator recommends isn’t meant to be written in stone. In hypothesis testing, it’s important to understand the difference between statistical and practical significance. A test that concludes there is a statistically significant result doesn’t imply that your result has practical consequences. You should use your specialized knowledge to determine whether the difference is practically significant.

Apply the same line of thought to the 4th down calculator. Coaches should also consider other factors, but the 4th down calculator still provides a data-informed starting point for the decision making. 

I'll break the analysis for each game into two sections: 4th down decisions in the first 3 quarters, and 4th down decisions in the 4th quarter. In the first 3 quarters, coaches should try to maximize the points they score. But in the 4th quarter, they should maximize their win probability. To calculate win probability, I’m using this formula from Pro Football Reference.

Indiana 47 - Maryland 28

Indiana finally gets a win after losing six in a row.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Indiana | 6 | 2 | 2 | 4 | 0 | 1.13
Maryland | 9 | 5 | 6 | 0 | 3 | 1.49

This game featured 7 disagreements with the 4th down calculator. Seven! Indiana got things started on their opening drive by kicking a field goal on 4th and goal from the 4. The difference in expected points between kicking and going for it is only 0.13, so on the surface this decision isn't too bad. But Indiana has a great offense and a horrible defense. They should have gone for it here. And to prove the terrible defense part, Maryland went on to score three straight touchdowns and take a 21-3 lead with 5 minutes left in the 1st quarter.

But then the Terps started making poor 4th down decisions of their own.

Maryland's first punt came on a 4th and 3 near midfield. The calculator will always say to go for it on 4th and 3 regardless of field position, but in your own territory the difference in expected points is only 0.07. So in general, the decision to punt (especially deep in their own territory) is fine. But in this game, Maryland has to keep in mind they're playing Indiana. The Hoosier defense is so bad, and their offense is so good, that you should be more aggressive than usual on 4th downs. And since they were close to midfield, Maryland should have definitely gone for it here. 

After the punt, Indiana scored a touchdown, recovered a surprise onside kick, then scored another touchdown. What a great call! Seriously, Indiana should try one surprise onside kick every single game. The decision to do so here was superb. When Maryland finally got the ball back, they punted again on 4th and 2. This time they were deep in their own territory...but have I mentioned that this Indiana defense is bad? On the day, Maryland running back Brandon Ross averaged 13.2 yards per carry. That's right, thirteen-point-two! So why are you punting on 4th and 2, no matter what the field position is?

Indiana followed up Maryland's poor decision on 4th and 2 with a terrible one of their own. They had 4th and goal at the 2 yard line and kicked a field goal. This cost Indiana a full point. I've been saying this all year: even if you fail on 4th down near the end zone, the other team still has to start close to their goal line! Here's a case where failing is actually succeeding! Stop kicking on 4th and goal inside the 5!

Maryland's next 4th down decision is absolutely puzzling. On a 4th and 9 from their own 25, they faked a punt. It failed miserably and led to an Indiana field goal. But the fake punt isn't the real problem: it's the earlier punts on 4th and 2 and 4th and 3. You're going to play it safe there, but then 4th and 9 is where you decide to get aggressive? I don't get it.

Maryland's last disagreement with the calculator came on another punt on 4th and 2. Again, they were deep in their own territory, but again, this is Indiana. And sure, if you fail they get great field position, but they're going to get good field position anyway! Nonetheless, boom went the punt, and 4 plays later Indiana was in the end zone, taking a commanding lead they would never relinquish. The next time Maryland had a 4th down distance under 5 yards, less than 7 minutes remained in the game and they were down by 19 points. If only you had gone for some of those 4th downs earlier, Maryland. If only...

Iowa 40 - Purdue 20

And then there was only one undefeated Big Ten team.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Purdue | 7 | 2 | 4 | 2 | 1 | 0.86
Iowa | 3 | 1 | 3 | 0 | 0 | 0.07

Purdue leads the country in 4th down attempts, and on Saturday they were a 3 touchdown underdog playing an undefeated team...on the road. On the first drive of the game they punted on 4th and 1. Not only did Iowa score a touchdown on the next possession anyway, the Hawkeyes actually scored touchdowns on their first 3 possessions! But keep punting on 4th and 1, Purdue. That's the way you'll pull the upset.

Speaking of ways to not pull the upset, Purdue then punted on 4th and 3 from midfield. The difference in expected points is only 0.07, so on the surface this wasn't a terrible decision. But Purdue was already down 20-0 and Iowa had done nothing but score touchdowns on every offensive possession so far. How does punting make any sense here?

Luckily for Purdue, they got the ball back and scored a touchdown. Oh, and on that drive they went for it and succeeded on 4th and 1! Purdue actually cut the lead to 7 points in the 3rd quarter, but two consecutive Iowa touchdowns put the game out of reach. Maybe if Purdue had tried going for it earlier, they could have had a chance at pulling the upset.

Northwestern 13 - Wisconsin 7

It's time to get the answer to the weekly question, "Did Northwestern kick on 4th and 1 again this week?"

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Northwestern | 8 | 1 | 5 | 2 | 1 | 0.07
Wisconsin | 5 | 0 | 5 | 0 | 0 | 0

Wait, what is this? Northwestern had only one disagreement with the calculator, and the difference in expected points was basically 0 anyway? And they went for it on a 4th and 1?

Hold on, I need to sit down for a minute...

Okay, I'm good now. The lack of disagreements really illustrates how well these defenses played. Wisconsin never really had a chance to be aggressive on 4th down in the first 3 quarters. In fact, at one point they had a 4th and 33. The model used for 4th down conversion percentages gave the Badgers a 0.0005% chance of converting that. Probably a good idea to punt.

I mentioned that Northwestern actually went for a 4th and 1. Of course, they didn't get it and Wisconsin scored their only touchdown of the game 5 plays later. I'm afraid this will cause Pat Fitzgerald to never attempt to convert a 4th down ever again. The Northwestern disagreement came on a punt on 4th and 3 from their own 41. But honestly, the strength of this Northwestern team is their defense, and the difference in expected value is close to 0. So I actually think punting was the correct call here. And that brings us to the 4th quarter.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Northwestern (Up by 3) | 14:50 | 3 | 23 | Go for it | FG | 60.9% | 57.1% (FG)
Wisconsin (Down by 3) | 13:47 | 8 | 75 | Punt | Punt | 37.4% | 41.5% (Punt)
Northwestern (Up by 3) | 12:29 | 4 | 72 | Punt | Punt | 42.4% | 43.1% (Punt)
Northwestern (Up by 3) | 9:10 | 7 | 34 | Tie | Punt | 57.2% | 57.2% (Punt)
Northwestern (Up by 3) | 4:00 | 20 | 20 | FG | FG | 80.3% | 67.8% (FG)
Wisconsin (Down by 6) | 2:30 | 14 | 69 | Go for it | Punt | 4.2% | 2.2% (Punt)

I knew Northwestern's good 4th down decision making couldn't last. They kicked a 40 yard field goal at the start of the 4th quarter when they should have gone for it on 4th and 3. And this decision is even worse when you consider that Northwestern kicker Jack Mitchell has struggled on kicks from 40-49 yards, going 5 for 13 on them (well below the 70% value the calculator used). And sure enough, the field goal was no good.

After a few punts and a Wisconsin fumble, Northwestern had a 4th and 7 at the Wisconsin 34. The calculator actually gave the exact same win probabilities for punting and going for it. Personally, I would have gone for it just on the principle of not punting from the 34 yard line, but the decision to punt wasn't terrible.

After a Wisconsin interception, things got really interesting from a 4th down decision making perspective. Northwestern had a 4th and goal from the 20 yard line while up 3 points. Common sense says this isn't even a decision—you kick the field goal. The calculator and Pat Fitzgerald agree, as the Wildcats kicked and made the field goal. But common sense and the 4th down calculator don't know football's dirty little secret.

At the end of the game, it's better to be up by 3 points than 6.

A few weeks ago, I compared teams that were up 3 points and 4 - 6 points with one to five minutes remaining in the game. Teams that were up by 3 won about 82% of the time, and teams that were up 4-6 won only 76% of the time. So the data suggest Northwestern actually would have been better off by missing the field goal. The reason for this is that when coaches are only down by 3 points, their play calling gets very conservative once they get in field goal range. But if they need a touchdown, they stay aggressive. A perfect example happened in this game. With 30 seconds left, Wisconsin was out of timeouts and had a 1st and 10 at the Northwestern 23. Down by 3, they probably call a safe pass or a run to set up the field goal. But since they were down by 6, they threw downfield and completed a pass at the 1 yard line. 

It worked out for Northwestern, as they were able to make a goal line stand and win the game. But in general you can't feel good about your chances when up 6 if the other team has the ball at the 1 yard line. And that's why the stats say, if you're in field goal range up by 3 towards the end of the game, your thought process should be touchdown or bust.
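If you wanted to check whether that 82% vs. 76% gap is more than noise, a two-proportions test is the natural tool. The game counts below are made up purely to show the mechanics, since the sample sizes behind those percentages aren't given here.

```python
# Two-proportions test: win rate when up by 3 vs. up by 4-6 late in the game (hypothetical counts).
from statsmodels.stats.proportion import proportions_ztest

wins = [164, 152]    # hypothetical: 82% of 200 games vs. 76% of 200 games
games = [200, 200]

z_stat, p_value = proportions_ztest(wins, games)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```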

Minnesota 32 - Illinois 23

Illinois may have just "kicked" their chance at a bowl game goodbye.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Number of Disagreements with the 4th down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Illinois | 4 | 2 | 2 | 2 | 0 | 0.19
Minnesota | 4 | 0 | 3 | 1 | 0 | 0

Both teams did pretty well from a 4th down perspective in the first 3 quarters. Illinois had two disagreements with the model, but both of those were pretty minor. One was a field goal on 4th and goal from the 4, and the other was a 4th and 3 at midfield. Illinois was only a small underdog in this game, so no real need to play a super aggressive strategy. Since there isn't much to talk about here, we'll just jump right to the 4th quarter. 

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Illinois (Down by 4) | 14:13 | 8 | 35 | Go for it | Go for it | 26.7% | 25.6% (Punt)
Minnesota (Up by 4) | 13:27 | 5 | 65 | Punt | Punt | 59.3% | 62.5% (Punt)
Illinois (Down by 4) | 6:56 | 3 | 18 | Go for it | FG | 36% | 28.4% (FG)
Minnesota (Up by 1) | 4:46 | 5 | 57 | Punt | Punt | 49.5% | 53.3% (Punt)
Illinois (Down by 1) | 3:10 | 19 | 68 | Punt | Punt | 13.3% | 22.5% (Punt)

Things started off so well for Illinois. They correctly went for it on a 4th and 8 at the Minnesota 35. They threw an interception on the play, but they got the ball back after a Gopher punt. And then things went south. On a 4th and 3 from the Minnesota 18, they kicked a field goal. After the made field goal, Illinois went from losing to...oh, wait, they were still losing. The decision to kick was a terrible one, lowering their win probability by 8%! It was only 3 yards! When you're losing late in the 4th quarter, odds are you're going to have to go for it on 4th down at some point. So with a 4th and short, Illinois should have gone for it.

Minnesota ended up punting and Illinois got the ball back. But then they found themselves in a 4th and 19! Nineteen! Oh man, Bill Cubit, I bet you wish you could make that 4th and 3 decision over again. With such a long distance, Illinois had to punt. But Minnesota scored a touchdown in 2 plays and made the 2 point conversion to go up by 9, effectively ending the game. Illinois will now have to beat 9-2 Northwestern next week to become bowl-eligible. Otherwise, they'll have a long off-season thinking about that field goal.

Michigan 28 - Penn State 16

Surely we couldn't have two coaches kick a field goal when they needed a touchdown on the same day, could we?

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Michigan | 5 | 3 | 5 | 0 | 0 | 1.20
Penn State | 7 | 0 | 6 | 1 | 0 | 0

For the most part, defenses dominated in this game, as 11 of the 12 4th down decisions were punts. Two of Michigan's "disagreements" were on a 4th and 2 and a 4th and 3 deep in their own territory. Considering that the strength of both of these teams is their defense, I think Michigan was correct to go against the calculator's recommendation and punt (especially on the 4th and 3).

Michigan's third disagreement was a terrible one in theory, but ended up with a great result. With a 4th and 2 at the Penn State 43 yard line, they punted. The statistics say this cost the Wolverines 0.81 points. But the punt ended up costing the Nittany Lions, as they fumbled it and Michigan recovered inside the Penn State 10 to set up a touchdown. This gave Michigan an 11 point lead as we entered the 4th quarter. Sometimes bad decisions can still end up with great results.

Michigan held the 11 point lead until late in the 3rd quarter, when Penn State found itself with a 4th and 9 in Michigan territory. But since there were only 10 seconds left in the quarter, I included this decision with the 4th quarter decisions. So let's move on to the 4th quarter.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Penn State (Down by 11) | 15:10 | 9 | 41 | Go for it | Go for it | 7.8% | 7.6% (Punt)
Penn State (Down by 11) | 14:02 | 6 | 6 | Go for it | FG | 12.2% | 10.5% (FG)
Michigan (Up by 8) | 12:29 | 15 | 79 | Punt | Punt | 74.4% | 81.7% (Punt)
Penn State (Down by 8) | 8:05 | 1 | 1 | Go for it | FG | 25.5% | 14.6% (FG)

Just like Illinois, things started so well for Penn State. They correctly went for it on a 4th and 9 in Michigan territory, and they converted. But then on 4th and goal from the 6, Nittany Lion coach James Franklin kicked a field goal when the stats recommend going for it. Part of the reason is that by kicking, you might still need 2 possessions to score. If you get a touchdown, then you know that you only need one more possession. And lest you think no coach would ever do what the stats suggest, know that it's not without precedent. In 2012, then-Penn State coach Bill O'Brien went for it on 4th and 4 from the 6 in almost the exact same position. They were down 11 points to Northwestern with just under 10 minutes left in the game. The result for O'Brien was a touchdown, and after the 2 point conversion he knew that they could tie with a field goal or take the lead with a touchdown. By kicking a field goal, Franklin still didn't know how many more times he needed to score.

After a Michigan punt, Franklin then made the worst decision the Big Ten 4th down calculator has seen this year. On 4th and goal from the 1, he kicked a field goal. Now, instead of needing a touchdown...oh, wait, Penn State still needed a touchdown. This dropped Penn State's win probability by a staggering 11%! I will note that although the play-by-play data says the ball was at the 1, it actually looked closer to the 2 yard line. But even if we put the ball at the 2 yard line, the difference in win probability is still 9%! You need a touchdown at some point; why not try to score one when you're just a yard or two away? It's mind boggling.

After the game Franklin said they kicked the field goal because he didn't think they could score a touchdown. Then why did he go for it on 4th and 9? After all, if you don't think you can get a yard or two on 4th down, what makes you think you could get 9? Teams convert 4th and 9 only 30% of the time. They convert 4th and goal from the one yard line 59% of the time (and 50% of the time from the two yard line). And even if you fail, Michigan is starting at their own 1 yard line! You're likely to get the ball back in great field position anyway.

After the field goal, Michigan took all of 6 plays to score a touchdown and take a 2 possession lead. Penn State ended up having to go for it on 4th and 10 from their own 25 yard line. Kinda makes you wish you had an easier 4th down to convert, doesn't it? Like, maybe, 4th and 1 at the goal line?

And remember that Bill O'Brien game? He actually passed up a 35 yard field goal that could have tied the game with 4 minutes left to go for it on a 4th and 2. Penn State converted the 4th down, scored a touchdown, and went on to win the game. Two different strategies, two different outcomes. Going for it won't always work like it did for O'Brien, but there is a reason it pays to be aggressive on 4th down.

Michigan St 17 - Ohio St 14

Hey, remember what I just said about it paying to be aggressive on 4th down? Well, wait until you get a load of this.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Michigan St | 5 | 2 | 4 | 1 | 0 | 0.93
Ohio St | 8 | 4 | 7 | 0 | 1 | 1.99

All season long, Ohio State has been super aggressive on 4th and 1, twice going for it deep in their own territory (and scoring a touchdown on both drives). But last week things got weird. They punted on 4th and 1 three different times! Ohio State still won the game, but they left a lot of expected points on the field. Had they played a more talented opponent, that decision making could have cost them their undefeated season.

Well, a week later, it did.

Ohio State had 4th and 1 four different times in this game, and they punted on three of them. And one of them was in Michigan State territory! To add insult to injury, Ohio State also punted on a 4th and 2. Add it all up, and it cost the Buckeyes about two expected points. Points they badly needed in a game that they lost by a field goal. The one time Ohio State made the correct decision was 4th and goal at the 1. But they even screwed that up by having to take a time out before the play. Why do these coaches take so much time to make these decisions? Ohio State had 3rd and goal at the 1. The coach should know ahead of time what they plan on doing if they don't score on 3rd down. Why are you wasting a time out? All you have to do is think one play ahead!

Ohio State's third punt on 4th and 1 came with 32 seconds left in the first half, so I didn't count any expected points lost against them for this decision. However, a conversion would have given them the ball at midfield with enough time left to move into field goal position. It would have been a low-risk opportunity to get some more points before halftime. And in a tight defensive game like this, you can't waste opportunities.

Michigan State punted on 4th and 2 from midfield. Normally I would say this is bad, but since the strength of the Spartans is their defense and their starting quarterback was out, this decision was probably the correct one. Although they later punted on 4th and 3 from the Ohio State 41 yard line. Even with an inexperienced quarterback, the upside of converting here is so high compared to the risk of failing that Michigan State should have gone for it. And as you'll soon see, it's not like a 4th and 3 is impossible. Even with a backup quarterback.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Michigan St (Down by 7) | 14:45 | 3 | 30 | Go for it | Go for it | 13.6% | 10.5% (FG)
Ohio St (Tie Score) | 8:43 | 4 | 52 | Punt | Punt | 62.7% | 65.6% (Punt)
Michigan St (Tie Score) | 5:49 | 3 | 58 | Go for it | Punt | 36.3% | 33.4% (Punt)
Ohio St (Tie Score) | 4:07 | 6 | 89 | Punt | Punt | 32.7% | 35.5% (Punt)

We saw Illinois and Penn State pass up opportunities to score game tying/lead changing touchdowns to instead kick field goals and continue to trail. Michigan State decided to do things a little differently. They passed up a potential 47 yard field goal and went for it. The Spartans converted and ended up scoring a touchdown on the possession. Illinois and Penn State failed to pull upsets, while Michigan State now controls their own destiny to the Big Ten Championship game. Just saying.

After an Ohio State punt, the Spartans had a 4th and 3 from their own 42. The statistics say to go for it, but I think this is a case where the situational knowledge makes punting the correct decision. Again, without their starting quarterback and with their defense being the strength of the team, I agree with Michigan State's decision to go against the calculator's advice and punt. Now, had they been losing, things might be different. But in a tie game, go see if your defense can make a play. And that's exactly what happened. The defense got a stop and gave the offense the ball with great field position, and Michigan State used the short field to set up a game-winning field goal.

And Ohio State can only wonder what might have happened if they built a larger lead by going for it on those 4th and 1s.

Summary

Each week, I’ll summarize the times coaches disagreed with the 4th down calculator and the difference in expected points between the coach’s decision and the calculator’s decision. I’ll do this only for the 1st 3 quarters since I’m tracking expected points and not win probability. I also want to track decisions made on 4th and 1-2 yards, and decisions made between midfield and the opponent’s 25 yard line. I call this area the “Gray Zone”. Then we can easily compare the actual outcomes of different decisions in similar situations.

Team Summary

Team | Number of Disagreements | Total Expected Points Lost
Northwestern | 11 | 7.61
Ohio State | 11 | 5.76
Michigan | 11 | 5.25
Rutgers | 9 | 4.98
Indiana | 8 | 4.5
Illinois | 13 | 4.3
Minnesota | 9 | 3.81
Penn State | 8 | 3.61
Iowa | 7 | 3.49
Nebraska | 7 | 3.36
Michigan St | 9 | 3.2
Purdue | 5 | 2.41
Wisconsin | 5 | 2.16
Maryland | 7 | 1.67

In two weeks, Ohio State has moved from 12th place to 2nd in most expected points lost in the Big Ten. I guess that's what happens when you punt on 4th and 1 six different times! One more week like that and they might catch Northwestern. But more important, they might lose a game to their arch rival because of it.

4th and 1...and also 4th and 2!

Yards to End Zone | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
75-90 | 13 | -3.85 | 1 | 7 | * | *
50-74 | 37 | -0.216 | 6 | 4 | * | *
25-49 | 5 | 0 | 16 | 1.5 | 1 | -7
1-24 | * | * | 13 | 3.2 | 5 | 3

When a team has a 4th and short inside the red zone, you'll hear a lot of announcers say that you have to "take the points." They mean that you should kick the field goal to get the certain points (because field goals are never missed) instead of going for it when you might get stopped and score nothing. But the statistics say that on 4th and 1 or 2, you should never kick a field goal. In the long run you'll score more points by going for it. Here, we're finally seeing that play out in real life. The 13 teams that have gone for it on 4th and short inside the 25 have scored an average of 3.2 points. This is higher than the 3 "automatic" points you'll get for a field goal (although we see that the 5 teams that have kicked have all made the field goal), so the events on the field are providing good support for the statistical model.

Now if only Penn State and Illinois would listen.  

The Gray Zone (4th downs 25-49 yards to the end zone)

4th Down Distance | Punts | Average Next Score After Punt | Go for It | Average Next Score After Go for It | Field Goals | Average Next Score After FG
1 | 1 | 7 | 10 | 2.5 | 1 | -7
2-5 | 24 | -0.33 | 18 | -0.61 | 5 | 1
6-9 | 22 | 0.59 | 11 | -0.64 | 10 | 2.2
10+ | 40 | 0.18 | 2 | 3.5 | 20 | 0.6

Thanks to Ohio State, we now have our first punt of the year on 4th and 1 from the Gray Zone. You might say "Well, it worked out for them since the next score was an Ohio State touchdown." But keep in mind that the touchdown came as a result of Ohio State converting a 4th and goal at the 1, so they kind of cancel each other out. On top of that, had they gone for it in the gray zone, they might have scored on that drive and the next one. And as it turns out, they could have used every scoring opportunity they could get.

Giving Statistically Significant Thanks


This week is the annual Thanksgiving holiday in the United States, a time when we are encouraged to eat turkey and cranberries, then consider the blessings in our lives before falling into a comfortable pre-football nap. That includes many of us here at Minitab.

Consequently, we won't have new posts for you over the next two days.  But one of the things I'm grateful for is having had the opportunity to write for and edit the Minitab Blog for the past four years. I realized that we've done a number of gratitude- and thanksgiving-related posts in that time. I thought I'd send us into the holiday season with a roundup of how our Minitab bloggers have celebrated with statistics and data. 

Carly Barry discussed how she applies Lean tactics to optimize time spent in the kitchen when cooking Thanksgiving dinner. For those who are involved in feeding very large groups of people, she also detailed how the crew responsible for organizing and executing Minitab's company-wide Thanksgiving celebration uses value stream mapping and other methods to feed a big gathering quickly and efficiently. 

Many of our bloggers have looked at their own eating habits on this occasion—Cody Steele, for instance, tracked his pie-related activity using bar charts. Still others turned their attention to the statistical methods that they were most grateful for, including Patrick Runkel, who explained why he's grateful for the Regression menu in Minitab. 

Jim Frost turned a more somber eye to data about income levels around the world and detailed his analysis and reflections in a great two-part blog post: Part 1, Part 2.

And for those whose Thanksgiving experiences are inextricably linked with football, Kevin Rudy has penned a cornucopia of insightful posts that illustrate how analyzing data teaches us more about the United States' favorite game. 

Whatever you'll be doing this Thanksgiving, all of us at Minitab hope it finds you happy and healthy! 

A Simple Guide to Between/Within Capability


Having delivered training courses on capability analysis with Minitab several times, I have noticed that one question you can be absolutely sure will be asked during the course is: What is the difference between the Cpk and the Ppk indices?

Ppk vs. Cpk indices

The terms Cpk and Ppk are often confused: when quality or process engineers refer to the Cpk index, they often actually mean the Ppk index.

Ppk is used to assess the long-term, overall variability, whereas Cpk is the capability index for short-term, potential variability. In this blog post, I will try to make this difference more explicit.

Consider the graph below. Suppose that measurements have been collected day after day over a full week. Suppose also that the process we are monitoring is cyclical. The amount of variability within one day is quite small, but because of the cyclical behavior and process instability from day to day, the overall variability during the whole week is much larger than the variability within single days. The Ppk is estimated from the dispersion of all individual values during the whole period, whereas the Cpk is based only on variation within subgroups (within days, in this example).
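If it helps to see the distinction in code, here is a rough Python sketch (not Minitab output) of the two estimates, assuming one subgroup of measurements per day and spec limits LSL and USL. Minitab offers several estimators for the within-subgroup standard deviation (pooled, Rbar, Sbar); the pooled standard deviation used below is just one of them, so the numbers will only approximate what the software reports.

```python
import numpy as np

def cpk_ppk(subgroups, lsl, usl):
    """Illustrative Cpk (within-subgroup sigma) vs. Ppk (overall sigma)."""
    all_values = np.concatenate(subgroups)
    mean = all_values.mean()

    # Ppk uses the overall (long-term) standard deviation of all individual values.
    sigma_overall = all_values.std(ddof=1)

    # Cpk uses a within-subgroup (short-term) estimate -- here, the pooled std dev.
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in subgroups)
    df_within = sum(len(s) - 1 for s in subgroups)
    sigma_within = np.sqrt(ss_within / df_within)

    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    ppk = min(usl - mean, mean - lsl) / (3 * sigma_overall)
    return cpk, ppk

# A cyclical process: small spread within each day, drift from day to day.
rng = np.random.default_rng(1)
days = [rng.normal(10 + shift, 0.2, size=30) for shift in (-0.6, 0.3, 0.6, -0.3, 0.0)]
print(cpk_ppk(days, lsl=8, usl=12))   # Cpk comes out noticeably larger than Ppk
```

With the simulated drift from day to day, the within-day sigma is small and the overall sigma is inflated, which is exactly why the Cpk looks much better than the Ppk for an unstable process.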

https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/120bd71123f5d33ea2be221a1f60cdee/120bd71123f5d33ea2be221a1f60cdee.png

Customers are likely to be affected by the overall variability, and therefore only the Ppk index matters to them. The Cpk often provides an overly optimistic capability estimate because cyclical behaviors between days are not taken into consideration.

As far as the vendor is concerned, it is useful to know that potentially, if the cyclical, unstable behavior was successfully dealt with, the Ppk would be improved and would become equivalent to the Cpk. The graph below illustrates a situation in which the Ppk and the Cpk indices are equivalent because the process is stable. In this case, variations within subgroups are similar to variations during the whole period.

https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/593e4a2824414c8bcce069a0415c7144/593e4a2824414c8bcce069a0415c7144.png

Between / Within capability

The sources of variability that affect processes in the long term might be different from the ones that take place in the short run. For example, within-batch variability is often much smaller than variations between batches, since parts in the same batch are often processed on the same tools within a short period of time, and have the same processing history. Other sources of variations—such as seasonal changes and modifications in the environment—may have a longer-term impact.

The Between/Within capability method can be used to estimate short-term variability even more accurately than described above. The variability within batches is still used to estimate short-term variability, but as far as the differences between batches are concerned the approach is slightly more subtle. One part of between-batches variability will be accounted for as short-term variability, by considering only differences between consecutive batches (in time order).

Referring to my previous example, Within variability represents within-batch variability, whereas Between variability represents short-term fluctuations between batches. To estimate these short-term variations between batches, only differences between the averages of consecutive batches are considered (Moving Ranges between averages of consecutive batches). Within and Between variations estimates are then compounded together to calculate a short-term between/within Cpk, whereas the Ppk is still based on the overall long-term variability, considering all individual values (not in time order) during the whole period.
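As a rough sketch of that compounding step (again in Python, with Minitab's unbiasing constants simplified away except for the d2 = 1.128 used for a moving range of 2, so treat the numbers as approximate rather than as the software's exact output):

```python
import numpy as np

def between_within_sigma(batches):
    """Approximate between/within (short-term) sigma from time-ordered batches."""
    # Within component: pooled standard deviation of the individual batches.
    ss = sum(((b - b.mean()) ** 2).sum() for b in batches)
    df = sum(len(b) - 1 for b in batches)
    sigma_within = np.sqrt(ss / df)

    # Between component: average moving range of consecutive batch means,
    # divided by d2 = 1.128 (the control chart constant for a moving range of 2).
    means = np.array([b.mean() for b in batches])
    sigma_means = np.abs(np.diff(means)).mean() / 1.128

    # Strip out the share of the means' variation that is just within-batch noise.
    n = len(batches[0])
    sigma_between = np.sqrt(max(sigma_means ** 2 - sigma_within ** 2 / n, 0.0))

    # Compound the two short-term components.
    return np.sqrt(sigma_between ** 2 + sigma_within ** 2)
```

The between/within Cpk then uses this compounded sigma in place of the plain within-subgroup sigma, while the Ppk is still computed from the overall dispersion of all individual values.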

In the graph below, the process is cyclical in nature and a long-term trend is clearly visible. Although the differences between the averages of consecutive batches are small (the variability between consecutive batches), the overall variability during the full period is much larger because of this long-term trend. Differences between consecutive batches fail to capture the full extent of the overall variability, but they are good estimates of short-term variability.

https://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/31b80fb2-db66-4edf-a753-74d4c9804ab8/Image/e7f0384f31674e829d8aae224010121c/e7f0384f31674e829d8aae224010121c.png

Conclusion

It is important to differentiate short-term variability from long-term variability, because a process affected by drifts and systematic long-term trends becomes unstable and therefore unpredictable. A Ppk that is estimated today may not be valid tomorrow because of a long-term process shift. Process stability, with fluctuations due only to random/common causes, is necessary to ensure predictable behavior. The Ppk (overall capability) index should therefore be as close as possible to the Cpk (short-term) estimate.

Statistics Has Become Famous, and Aren't We Lucky


‘Statistics’ is a rising star. Everywhere I turn, people are talking about data and the value of being able to analyze and act on it. As someone who’s been writing about that for years, I say it’s about time.

Statistics is like a talented actress whose decades of appearing off Broadway have finally paid off.

For years, her work has been enriching our lives without us knowing it. Statistics helps produce our Netflix recommendations, but she’s not listed in the credits. And though you may not recognize her, Statistics plays pivotal roles in countless other projects that keep us healthy, secure, and content.

When we’re sick or injured, we expect any blood components we receive to be safe. And they almost always are, but only because organizations like the Red Cross statistically monitor their processes to meet strict quality standards.

We want our tax dollars spent well, without realizing that public officials who do are often analyzing data to cut costs – like those in Tyler, Texas, who implemented more than 100 Lean Six Sigma projects and drastically reduced expenses.

We can even count on brewers like Anheuser-Busch InBev to keep us in good beer, thanks to how they used statistics to simplify production line conversion from one brand to another. And if you prefer Guinness, you should know the company actually created a new statistical method to make sure it delivered the perfect stout. Talk about writing a part with an actress in mind.

Like any big star, Statistics makes money for whoever employs her. These quality improvement projects saved their organizations a combined total of nearly $6 million. And as you’d expect, that gives Statistics some clout when negotiating her contracts.

Money, Opportunity, and Prestige

Just last year, the mean annual salary of a statistician in the U.S. was $83,000, and in Silicon Valley it was a whopping $150,000. The McKinsey Global Institute predicted a shortage of up to 190,000 people with analytical expertise, which will likely drive salaries higher. LinkedIn acknowledged this trend when it rated statistical analysis and data mining as the number one hottest job skill of 2014.

Colleges and universities are responding in kind. Between 2003 and 2013, the number of U.S. schools granting undergraduate statistics degrees jumped from 73 to 110. And in just the last four of those years, the number of degrees they granted increased by more than 95%. Fortune reports that a Ph.D. in Statistics was the best graduate degree for jobs.

Even high schools are feeling the impact. The number of students taking the AP exam in Statistics has been increasing by 12% each year since 2003.

True to form, Statistics’ star power enables these professionals to work on the projects they care about most. The American Statistical Association’s awareness campaign This is Statistics reveals that data analysts are everywhere—assessing federal assistance programs, contributing evidence to war crimes trials, and estimating human exposure to pollution and its effects on public health. They’re even predicting which plays an NFL opponent will call. As Natalie Cheung Hall, a featured statistician, says, “Whatever your passions are, statistics will be involved.”

Perhaps the clearest sign that Statistics has arrived is that even her spokesperson is a celebrity. In his TED talk, Professor Arthur Benjamin makes a winning pitch to teach statistics, calling it “the mathematics of games” and “a way to predict the future.” His three-minute video Teach statistics before calculus! has captivated more than 1.6 million YouTube viewers.

But the best thing about Statistics’ rise to fame isn’t her commercial and popular success, it’s her indifference to it. Like any true artist, she’s more interested in the work itself—the material she gets to explore, the insights she discovers, and the effect she has on her audience. And like any true fan, we can never get enough.

Big Ten 4th Down Calculator: Week 10


There was a lot at stake in the final week of Big Ten play this week. Three different games had an impact on not only the Big Ten Championship Game, but the College Football playoff as well. Unfortunately for the viewers, none of the games were really close in the 4th quarter. But that doesn't mean we can't analyze the 4th down decisions in the first 3 quarters and see whether the losing teams had opportunities to score more points.

If you're new to this, I've used Minitab Statistical Software to create a model to determine the correct 4th down decision. And throughout the 2015 college football season, I've used that model to track every 4th down decision in Big Ten Conference games. However, the decision the calculator recommends isn’t meant to be written in stone. In hypothesis testing, it’s important to understand the difference between statistical and practical significance. A test that concludes there is a statistically significant result doesn’t imply that your result has practical consequences. You should use your specialized knowledge to determine whether the difference is practically significant.

We should apply the same approach to the 4th down calculator. Coaches should consider other factors, but the 4th down calculator provides a data-informed starting point for the decision making. 

I'll break the analysis for each game into two sections: 4th down decisions in the first 3 quarters, and 4th down decisions in the 4th quarter. In the first 3 quarters, coaches should try to maximize the points they score. But in the 4th quarter, they should maximize their win probability. To calculate win probability, I’m using this formula from Pro Football Reference.
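For readers who want to see the shape of the underlying logic, here is a minimal Python sketch of a 4th down decision rule. To be clear, this is my own illustrative version, not the actual Minitab regression model behind these posts: the `expected_points` function, the conversion probability, the field goal probability, and the punt net are all placeholders you would have to supply.

```python
def fourth_down_call(dist, yards_to_goal, expected_points, p_convert, p_fg, punt_net=38):
    """Compare the expected points of the three 4th down options (illustrative only).

    expected_points(y): expected points for a team with a 1st down, y yards from
    the end zone -- the role played by the regression model in these posts.
    p_convert / p_fg: assumed probabilities of converting and of making the kick.
    """
    # Go for it: keep the ball past the sticks, or turn it over on downs at the spot.
    ep_go = (p_convert * expected_points(yards_to_goal - dist)
             - (1 - p_convert) * expected_points(100 - yards_to_goal))

    # Field goal: 3 points if good; a miss hands the opponent the ball near the kick spot.
    ep_fg = p_fg * 3 - (1 - p_fg) * expected_points(100 - yards_to_goal - 7)

    # Punt: the opponent takes over punt_net yards downfield (touchback at the 20).
    opponent_spot = max(yards_to_goal - punt_net, 20)
    ep_punt = -expected_points(100 - opponent_spot)

    options = [("go for it", ep_go), ("field goal", ep_fg), ("punt", ep_punt)]
    return max(options, key=lambda option: option[1])

# Toy usage with a crude linear expected-points curve (purely a placeholder):
def ep(yards): return 6.0 - 0.075 * yards
print(fourth_down_call(dist=1, yards_to_goal=52, expected_points=ep,
                       p_convert=0.65, p_fg=0.0))
```

With these made-up inputs the toy rule says to go for it on a 4th and 1 near midfield, which matches the calculator's call in the Iowa game below, but the numbers themselves should not be read as the calculator's output.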

Iowa 28 - Nebraska 20

Imagine the state of confusion and utter disbelief the entire country will be in if Iowa wins the College Football Playoff. For that reason alone, I hope it happens.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Iowa | 6 | 1 | 6 | 0 | 0 | 1.09
Nebraska | 5 | 1 | 4 | 0 | 1 | 0.79

Iowa began the game with a very confusing 4th down decision: on a 4th and 2 from the Nebraska 38 yard line, they punted. It's only two yards, and Iowa has one of the better offenses in the Big Ten. In fact, on that day Hawkeye running back Jordan Canzeri averaged 8.2 yards per carry...and you punt on 4th and 2 in Nebraska territory? What a terrible decision. It didn't cost Iowa in this game, but they are going to be underdogs in every game they play the rest of the season. If they really want a chance at shocking the world and winning the College Football Playoff, they can't waste opportunities to score like this one. 

Nebraska's disagreement with the 4th down calculator did cost them in this game. On a 4th and 1 at their own 34, they punted. The decision was a bad one, but the result was a disaster. After a good punt return and a Nebraska penalty, Iowa started their drive from the Nebraska 33 yard line. So Iowa started with better field position than if Nebraska had gone for it on 4th down and rushed for no gain. Two plays later, Iowa was in the end zone to take an 11 point lead. And that's the margin Iowa took into the 4th quarter.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Nebraska (Down by 11) | 13:33 | 6 | 57 | Go for it | Punt | 6.6% | 6.4% (Punt)
Iowa (Up by 11) | 11:24 | 1 | 85 | Go for it | Punt | 89.8% | 89.2% (Punt)
Nebraska (Down by 11) | 6:37 | 1 | 19 | Go for it | Go for it | 9.3% | 4.1% (FG)

Down by 11, Nebraska punted on a 4th and 6 near midfield. The stats say to go for it, but the win probability is so close you really can't fault either decision. But that punt set up a situation Iowa found themselves in earlier this year. In their Big Ten opener against Wisconsin, Iowa had a 4th and 1 on their own 24 yard line clinging to a 4 point lead. The stats said to go for it, and that is exactly what Hawkeye coach Kirk Ferentz did. They converted, and went on to win the game. But here, Ferentz decided to punt. It's a close decision, but remember how good this Iowa running game is. Because of that, I think they should have gone for it.

Speaking of going for it, Nebraska finally did that on their next drive. They were in field goal range, so you'll hear people say you have to "take the points and cut it to a one possession game." Except that field goals can be missed, and an 8 point deficit could be either a one- or two-possession game. Plus, if you're down by two scores in the 4th quarter, you're probably going to have to go for it on 4th down at some point. It doesn't get any easier than 4th and 1, so Nebraska clearly made the correct decision to go for it. Now, the play they called was a fade route to the end zone (that fell incomplete), so you can question that all you like. But to have the best chance of winning the game, Nebraska had to go for it here.

But the result didn't work out for them, and Iowa is still alive for a shot at making the college football playoff. 

Michigan State 55 - Penn State 16

The Spartans picked a good time to have their highest scoring game of the season.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Penn St | 6 | 2 | 3 | 1 | 2 | 1.62
Michigan St | 2 | 0 | 2 | 0 | 0 | 0

Michigan State only had two 4th downs through the first 3 quarters. I guess that's what happens when you're scoring so often. Luckily, there is a lot to talk about with Penn State. The first 4th down of the game was a 4th and 3 from midfield for the Nittany Lions. The stats say to go for it, but the difference between going for it and punting is only 0.07 points. Considering Michigan State has one of the better defenses in the Big Ten and Penn State has one of the worst offenses, I don't see any problem with Penn State's decision to punt here.

But a couple of yards makes a huge difference. On their next possession, Penn State had another 4th and 3, but this time they were at the Michigan State 32 yard line. Because they were 18 yards closer, the difference between punting and going for it was now 1.2 points. There is no way your offense can be bad enough to warrant punting. And luckily for Penn State, they listened to the stats, as they went for it and picked up the first down. So far, so good for the Nittany Lions.

But it wouldn't last.

On 4th and goal from the 1, Penn State kicked a field goal. This is the worst decision a coach can make on 4th down, as it cost Penn State 1.55 expected points. And to add insult to injury, Penn State lost to Michigan last week thanks in large part to their decision to settle for field goals inside the 10 (Franklin also kicked a field goal on 4th and goal from the 1 against the Wolverines). And they went right back and did the same thing this week. As a double digit underdog. On the road. Against the #5 team in the country. You simply can't leave points on the table like that if you're going to pull the upset. 

To Franklin's credit, he did later go for it on 4th and 1 at his own 46 yard line. They successfully converted, and soon found themselves deep in Michigan State territory. But unfortunately they lost a fumble that Michigan State returned for a touchdown, and the rout was on.

Ohio State 42 - Michigan 13

It would appear that busting buckeyes on grave sites does not correlate with wins.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Ohio State | 2 | 0 | 1 | 0 | 1 | 0
Michigan | 4 | 1 | 3 | 1 | 0 | 0.36

On Michigan's opening possession, they punted on a 4th and 5 from the Ohio State 36 yard line. The calculator assumes the punt is downed at the 10 yard line, and using that, the model suggests going for it. In fact, even if you guarantee to me that the punt is downed at the 1 yard line, the calculator still says to go for it. When you're only 36 yards from the end zone, you can't willingly pass on opportunities to score points. Unfortunately for Michigan, they didn't down the punt at the 1 yard line. Or even the 10 yard line. The punt sailed into the end zone for a touchback, and Michigan traded an opportunity to score points for 16 yards of field position. And as we later saw, they desperately needed all the points they could get.  

The calculator has been very unhappy with Ohio State's 4th and 1 decisions over the previous two weeks. Their conservative decisions may have even cost them a chance at the Big Ten Championship game. But that wasn't the case this week. Their only 4th and 1 came at the Michigan 17 yard line late in the 3rd quarter, and they correctly went for it. They successfully converted, and on the very next play they scored a back-breaking touchdown to take a commanding 17 point lead.

Ohio State has gone for it on 4th and 1 six times this season, and they've successfully converted on all six of them. And on top of that, they've scored a touchdown on every single drive where they've gone for it. Why this team would ever punt on 4th and 1 befuddles me. But they have in fact punted 7 times on 4th and 1. To me, that's just 7 wasted opportunities to score points. And against Michigan State, those wasted opportunities cost them a chance at repeating as National Champions.

Northwestern 24 - Illinois 14

If they win their bowl game, Northwestern will have a chance to finish the season with 11 wins for the first time since, well, since ever.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Northwestern | 6 | 4 | 6 | 0 | 0 | 1.31
Illinois | 10 | 4 | 7 | 1 | 2 | 0.76

At first, this looks really bad. There were sixteen 4th down decisions, and the calculator disagreed with half of them! But when we dig a little deeper it's not quite as bad as it seems. Northwestern punted on three different 4th and 2s, and on a 4th and 3. All of the punts came deep in their own territory and the strength of this Northwestern team is their defense, not their offense. In fact, on a yards per play basis, the Northwestern defense is 4th in the Big Ten and the offense is dead last. So being aggressive on 4th down deep in their own territory is not something Northwestern should be doing.

Illinois had two punts on 4th and 3. The calculator suggests going for it, but the difference in expected points between going for it and kicking is only 0.07, so there really isn't anything wrong with the decision to punt. However, their other two disagreements were confusing. On a 4th and 5 from the Northwestern 30 yard line, they kicked a field goal when they should have gone for it. The odds of converting on 4th and 5 aren't that much lower than a kicker making a 48 yard field goal, and with the former you can still score a touchdown. And sure enough, Illinois missed the field goal.

But the really confusing part came later in the game. With a 4th and 11 at the Northwestern 32 yard line, they went for it. In this case, the probability of converting on 4th and 11 is so low that you should kick a field goal. But Illinois logic says 5 yards is too far to try and go for it, but 11 yards isn't. It's true that they were down 14 points when they went for the 4th and 11, but it was still the third quarter. You should still be trying to maximize your points, which means kicking a field goal.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Illinois (Down by 10) | 4:28 | 1 | 4 | Go for it | Go for it | 12.5% | 3% (FG)
Illinois (Down by 10) | 4:28 | 6 | 9 | Go for it | FG | 6.9% | 2.9% (FG)

I want to focus on one 4th down in the 4th quarter. With a 4th and 1 at the Northwestern 4 yard line, Illinois decided to go for it. This was by far the correct decision, as kicking a field goal would have lowered their win probability by over 9 percent! However, Illinois got a false start penalty, and 4th and 1 became 4th and 6. You can see how costly a penalty that was, as it cut Illinois's chances of winning almost in half. But that assumes that Illinois continues to make the correct decision and go for it.

Unfortunately, with the longer distance they decided to kick the field goal. So the false start penalty took Illinois's chances from 1 in 8 to 1 in 34. That's a huge penalty. And to make matters worse, Illinois missed the 27 yard field goal ("take the points," amirite?). Northwestern followed it up by running out the clock, and a losing season was sealed for Illinois.

Indiana 54 - Purdue 36

This game featured two of the worst defenses in the Big Ten, so first team to punt loses.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Indiana | 7 | 1 | 3 | 2 | 2 | 0.1
Purdue | 6 | 1 | 4 | 0 | 2 | 0.79

Purdue was in fact the first team to punt, so that prophecy came true! It was a 4th and 6 where the calculator suggested punting, but still, "first team to punt loses" means first team to punt loses! And speaking of the calculator suggesting to punt, Indiana correctly defied that suggestion on their opening drive. Having one of the best offenses and worst defenses in the Big Ten, Indiana should always be more aggressive than the calculator suggests. And that's exactly what they did on a 4th and 5 from the Purdue 42. Instead of punting, they went for it, converted, and scored a touchdown on the drive.

Purdue had a 4th and 1 on the Indiana 43 that they went for and converted. They ended up fumbling on the goal line, but forced an Indiana punt and ended up scoring a touchdown on their next possession. The football gods remembered your aggressiveness on that 4th and 1, Purdue, and rewarded you accordingly.

It's just a shame the Boilermakers had to go and screw it all up.

On the possession after their touchdown drive, Purdue had a 4th and 1 on their own 34, and punted. Did you forget which defense you're playing, Purdue? That day, the Boilermaker running backs averaged over 6 yards per carry. But for some reason the coaches thought gaining 1 yard on 4th down would just be too hard. And sure enough, after the punt Indiana scored a touchdown on the very next drive. To add insult to injury, the Hoosiers went for and converted a 4th and 1 on that drive. Purdue would never cut the deficit to single digits the rest of the game.

Wisconsin 31 - Minnesota 21

Wisconsin has now beaten Minnesota 12 straight times.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Wisconsin | 5 | 0 | 3 | 2 | 0 | 0
Minnesota | 4 | 2 | 4 | 0 | 0 | 1.2

Minnesota started their 4th down decision making by punting on a 4th and 2. Sure, it was from their own 35, but 2 yards is such a short distance that the calculator suggests going for it. And when you haven't beaten a team in 11 straight tries, maybe it would be a good idea to try something different. No? Okay, let's see how punting works out for you.

In this instance, it worked out great for Minnesota. Wisconsin threw an interception the very next play and Minnesota returned it for a touchdown. But their luck wouldn't last. Fast-forward to the end of the 3rd quarter. Minnesota is down by 17 points and has a 4th and 1 on their own 41 yard line. They line up to go for it. Great decision! Not only should you always go for it on 4th and 1, but you're down by 17, so you absolutely need to score points. But wait, what's this? Minnesota was only trying to get Wisconsin to jump offsides. When the Badgers didn't jump, Minnesota took a delay of game penalty and punted. Did I mention they were down by 17 points? Seventeen points! The decision to punt there was inexplicable.

But it somehow gets worse.

The next time Minnesota had the ball, they had a 4th and 2 with about 11 minutes to play in the game. And again, they were still down by 17 points! And yet, they punted. Minnesota wasn't likely to win either way, but the decision to punt cut their win probability almost in half (from 0.69% to 0.39%). And if you're going to come back from a 17 point deficit, odds are you're going to have to convert a 4th down at some point. This one is only 2 yards! Why not try and convert now? Who knows what distances you'll be forced to convert later if you pass on this one! Oh, and may I remind you that you're down by 17 points?

Coaches are constantly telling their players to give 110% and to never give up. And they would certainly be furious if a player simply gave up and walked off the field before the clock hit 0:00. But that's exactly what Minnesota coach Tracy Claeys did here. He seemed to be more concerned about minimizing the margin Wisconsin would win by than he was about his own team's chances of winning the game. Sure, if he failed on a 4th and short in his own territory people might criticize him for giving Wisconsin the ball in great field position. But being a leader isn't about making decisions that will cause the least amount of criticism. It's about making the correct call in spite of what other people might think. 

Minnesota did score a touchdown with 5:30 left to cut the lead to 10. And they even got the ball back two more times after that. But how did those last two possessions end? By failing to convert on a 4th and 12 and throwing an interception on a 4th and 19. Kinda makes you wish you had gone for it when the distance to gain was only 1 and 2 yards, doesn't it?

Maryland 46 - Rutgers 41

Maryland picks up their first Big Ten win of the year.

4th Down Decisions in the First 3 Quarters

Team | 4th Downs | Disagreements with the 4th Down Calculator | Punts | Field Goals | Conversion Attempts | Expected Points Lost
Maryland | 8 | 1 | 3 | 3 | 2 | 0.11
Rutgers | 5 | 2 | 4 | 1 | 0 | 1.2

On their opening drive, Maryland had a 4th and 1 at their own 44. They correctly went for it, but failed to convert. Rutgers took over in Maryland territory and scored a touchdown. The result didn't work out, but the decision by Maryland was correct. On a yards per play basis, Rutgers has the worst defense in the Big Ten. You should absolutely be aggressive against them on 4th down.

On the flip side, Rutgers has to know that they have the worst defense in the Big Ten. So it may not matter where the other team starts with the ball, which means you should be aggressive on 4th down. But, for whatever reason, Rutgers played it "safe." They punted on a 4th and 2 from midfield, and on a 4th and 1 from their own 28. Of course, I'm not sure how that's playing it "safe" when your defense has now given up 46 or more points five different times this season. If the Scarlet Knights are going to win a game, it's going to be by scoring as many points as they can. And punting on 4th and short is definitely not how you maximize your points.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Rutgers (Down by 1) | 4:57 | 4 | 12 | Go for it | FG | 65.4% | 59.7% (FG)

The one 4th quarter decision that I want to talk about surprised me. The calculator always thinks you should be aggressive and go for a touchdown on 4th and 5 or less when you're inside the opponent's red zone. However, I thought the one time that would change is when a field goal would allow you to take the lead late in the 4th quarter. But here we see that the calculator overwhelmingly favors going for it, even though a field goal would put Rutgers up by 2 with less than 5 minutes left. Teams convert on 4th and 4 about 46% of the time, which is more than high enough to warrant going for the touchdown. In fact, Rutgers would have had to think they had a less than 35% chance of converting on 4th down to justify kicking a field goal. And all of this doesn't even take into account the terrible Rutgers defense. Do you really think you're going to hold a 2 point lead? A touchdown and a 2 point conversion would make the worst case scenario overtime.
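That 35% break-even figure falls out of a simple weighted average: the win probability of going for it is a blend of the succeed and fail outcomes, so you can solve for the conversion probability at which that blend equals the win probability of kicking. Here is a hedged sketch in Python; the succeed/fail win probabilities are made up (only the 59.7% kick figure comes from the table above), so the output is a ballpark illustration rather than the calculator's number.

```python
def breakeven_conversion(wp_kick, wp_success, wp_fail):
    """Conversion probability at which going for it and kicking are equally attractive."""
    # Going for it is worth p * wp_success + (1 - p) * wp_fail; set that equal to wp_kick.
    return (wp_kick - wp_fail) / (wp_success - wp_fail)

# Placeholder numbers for the Rutgers decision (wp_success and wp_fail are assumptions).
print(breakeven_conversion(wp_kick=0.597, wp_success=0.78, wp_fail=0.49))  # ~0.37
```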

In the actual game, Rutgers kicked a field goal and Maryland scored a touchdown on the first play of their next drive. Rutgers then drove into Maryland territory, but failed on a 4th and 1, and Maryland ran out the clock to end the game. Overall, Maryland was the much more aggressive team on 4th down in this game. And it paid off in the end, as the Terrapins finally got a conference win.

Summary

Going to skip the summary for this week. Next week I'll break down the Big Ten Championship game, then go into a detailed analysis of all the 4th down decisions from the Big Ten this year.


What Can You Say When Your P-Value is Greater Than 0.05?


Regardless of where you fall in the ongoing debate about the validity of p-values—which I will not rehash here since my colleague Jim Frost has detailed the issues involved at some length—the fact remains that the p-value will continue to be one of the most frequently used tools for deciding if a result is statistically significant. 

You know the old saw about "Lies, damned lies, and statistics," right?  It rings true because statistics really is as much about interpretation and presentation as it is mathematics. That means we human beings who are analyzing data, with all our foibles and failings, have the opportunity to shade and shadow the way results get reported. 

While I generally like to believe that people want to be honest and objective (especially smart people who do research and analyze data that may affect other people's lives), here are 500 pieces of evidence that fly in the face of that belief.

We'll get back to that in a minute. But first, a quick review...

What's a P-Value, and How Do I Interpret It?

Most of us first encounter p-values when we conduct simple hypothesis tests, although they also are integral to many more sophisticated methods. Let's use Minitab 17 to do a quick review of how they work (if you want to follow along and don't have Minitab, the full package is available free for 30 days). We're going to compare fuel consumption for two different kinds of furnaces to see if there's a difference between their means. 

Go to File > Open Worksheet, and click the "Look in Minitab Sample Data Folder" button. Open the sample data set named Furnace.mtw, and choose Stat > Basic Statistics > 2 Sample t... from the menu. In the dialog box, enter "BTU.In" for Samples, and enter "Damper" for Sample IDs.

Press OK and Minitab returns the following output, in which I've highlighted the p-value. 

In the majority of analyses, an alpha of 0.05 is used as the cutoff for significance. If the p-value is less than 0.05, we reject the null hypothesis that there's no difference between the means and conclude that a significant difference does exist. If the p-value is larger than 0.05, we cannot conclude that a significant difference exists. 

That's pretty straightforward, right?  Below 0.05, significant. Over 0.05, not significant. 
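If you'd like to sanity-check the same kind of decision outside Minitab, here's a minimal scipy sketch. The BTU numbers below are made up (the real data live in Minitab's Furnace.mtw sample file), so the p-value will not match the output above; the point is only the p < 0.05 comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Made-up stand-ins for the BTU.In measurements of the two damper types.
damper_1 = rng.normal(loc=9.9, scale=3.0, size=40)
damper_2 = rng.normal(loc=10.1, scale=2.8, size=50)

# Two-sample t-test without assuming equal variances (Welch's t-test).
t_stat, p_value = stats.ttest_ind(damper_1, damper_2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("significant" if p_value < 0.05 else "not significant", "at alpha = 0.05")
```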

"Missed It By That Much!"

In the example above, the result is clear: a p-value of 0.7 is so much higher than 0.05 that you can't apply any wishful thinking to the results. But what if your p-value is really, really close to 0.05?  

Like, what if you had a p-value of 0.06? 

That's not significant. 

Oh. Okay, what about 0.055?

Not significant. 

How about 0.051?

It's still not statistically significant, and data analysts should not try to pretend otherwise. A p-value is not a negotiation: if p > 0.05, the results are not significant. Period.

So, what should I say when I get a p-value that's higher than 0.05?  

How about saying this? "The results were not statistically significant." If that's what the data tell you, there is nothing wrong with saying so. 

No Matter How Thin You Slice It, It's Still Baloney.

Which brings me back to the blog post I referenced at the beginning. Do give it a read, but the bottom line is that the author cataloged 500 different ways that contributors to scientific journals have used language to obscure their results (or lack thereof). 

As a student of language, I confess I find the list fascinating...but also upsetting. It's not right: These contributors are educated people who certainly understand A) what a p-value higher than 0.05 signifies, and B) that manipulating words to soften that result is deliberately deceptive. Or, to put it in words that are less soft, it's a damned lie.

Nonetheless, it happens frequently. 

Here are just a few of my favorites of the 500 different ways people have reported results that were not significant, accompanied by the p-values to which these creative interpretations applied:  

  • a certain trend toward significance (p=0.08)
  • approached the borderline of significance (p=0.07)
  • at the margin of statistical significance (p<0.07)
  • close to being statistically significant (p=0.055)
  • fell just short of statistical significance (p=0.12)
  • just very slightly missed the significance level (p=0.086)
  • near-marginal significance (p=0.18)
  • only slightly non-significant (p=0.0738)
  • provisionally significant (p=0.073)

and my very favorite:

  • quasi-significant (p=0.09)

I'm not sure what "quasi-significant" is even supposed to mean, but it sounds quasi-important, as long as you don't think about it too hard. But there's still no getting around the fact that a p-value of 0.09 is not a statistically significant result. 

The blogger does not address the question of whether the opposite situation occurs. Do contributors ever write that a p-value of, say, 0.049999 is:

  • quasi-insignificant
  • only slightly significant
  • provisionally insignificant
  • just on the verge of being non-significant
  • at the margin of statistical non-significance

I'll go out on a limb and posit that describing a p-value just under 0.05 in ways that diminish its statistical significance just doesn't happen. However, downplaying statistical non-significance would appear to be almost endemic. 

That's why I find the above-referenced post so disheartening. It's distressing that you can so easily gather so many examples of bad behavior by data analysts who almost certainly know better.

You would never use language to try to obscure the outcome of your analysis, would you?

 

Why You Should Use Non-parametric Tests when Analyzing Data with Outliers


There are many reasons why a distribution might not be normal/Gaussian. A non-normal pattern might be caused by several distributions being mixed together, by a drift over time, by one or several outliers, by asymmetrical behavior, by some out-of-control points, and so on.

I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser tag game session one Saturday afternoon. The three teams represented three different groups of friends wishing to spend their afternoon tagging players from competing teams. Gengiz Khan turned out to be the best player, followed by Tarantula and Desert Fox.

One-Way ANOVA

In this post, I will focus on team performances, not on single individuals. I decided to compare the average scores of each team. The best tool I could possibly think of was a one-way ANOVA using the Minitab Assistant (with a continuous Y response and three sample means to compare).

To assess statistical significance, the differences between team averages are compared to the within (team) variability. A large between-team variability compared to a small within-team variability (the error term) means that the differences between teams are statistically significant.

In this comparison (see the output from the Assistant below), the p-value was 0.053, just above the usual 0.05 threshold. The p-value is the probability of observing differences between the team means at least as large as these if only random causes were at work. A p-value above 0.05 therefore indicates that differences this large could plausibly be explained by random causes alone. Because of that, the differences are not considered to be statistically significant (there is "not enough evidence that there are significant differences," according to the comments in the Minitab Assistant). But the result remains somewhat ambiguous, since the p-value is still very close to the significance limit (0.05).

Note that the variability within the Blue team seems to be much larger (see the confidence interval plot in the means comparison chart below) than for the other two groups. This is not a cause for concern in this case, since the Minitab Assistant uses the Welch method of ANOVA, which does not require or assume that the variances within groups are equal.

Outliers and Normality

When looking at the distribution of individual data (below), one point seems to be an outlier, or at least a suspect, extreme value (marked in red). This is Gengiz Khan, the best player. In my worksheet, the scores have been entered from the best to the worst (not in time order). This is why we can see a downward trend in the chart on the right side of the diagnostic report (see below).

The Report Card (see below) from the Minitab Assistant shows that Normality might be an issue (the yellow triangle is a warning sign) because the sample sizes are quite small. We need to check normality within each team. The second warning sign is due to the unusual / extreme data (score in row 1) which may bias our analysis.

Following the suggestion from the warning signal in the Minitab Assistant Report Card, I decided to run a normality test. I performed a separate normality test for each team in order not to mix different distributions together.

A low P value in the normal probability plot (see below) signals a significant departure from normality. This p-value is below 0.05 for the Blue team. The points located along the normal probability plot line represent “normal,” common, random variations. The points at the upper or lower extreme, which are distant from the line, represent unusual values or outliers. The non-normal behavior in the probability plot of the blue team is clearly due to the outlier on the right side of the normal probability plot line.
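For anyone following along outside Minitab, a rough per-team cross-check can be run in Python. The scores below are invented (they are not the real laser tag data), and the Shapiro-Wilk test used here is a different normality test than the one reported on Minitab's probability plot, so expect similar conclusions rather than identical p-values.

```python
from scipy import stats

# Invented laser tag scores; only the Blue list contains one extreme high value.
teams = {
    "Blue":   [2550, 2700, 2400, 2600, 2500, 2450, 2650, 2350, 9800],
    "Yellow": [1100, 950, 1000, 875, 1025, 900, 975, 1050, 925, 1000],
    "Pink":   [-400, -500, -450, -350, -600, -425, -475, -550, -300, -475, -500, -425, -450],
}

for name, scores in teams.items():
    stat, p = stats.shapiro(scores)
    flag = "  <- departure from normality" if p < 0.05 else ""
    print(f"{name:7s} W = {stat:.3f}  p = {p:.3f}{flag}")
```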

Should we remove this value (Gengiz Khan's score) from the Blue group and rerun the analysis without him?

Even though Gengiz Khan is more experienced and talented than the other team members, there are no particular reasons why he should be removed—he is certainly part of the Blue team. There are probably many other talented laser tag players around. If another laser tag session takes place in the future, there will probably still be a large difference between Gengiz Khan and the rest of his team.

The problem is that this extreme value tends to inflate the within-group variability. Because there is a much larger within-team variability for the blue team, differences between groups when they are compared to the residual / within variability do not appear to be significant, causing the p-value to move just above the significance threshold.

A Non-parametric Solution

One possible solution is to use a non-parametric approach. Non-parametric techniques are based on ranks, or medians. Ranks represent the relative position of an individual in comparison to others, but are not affected by extreme values (whereas a mean is sensitive to outlier values). Ranks and medians are more “robust” to outliers.

I used the Kruskal-Wallis test (see the correspondence table between parametric and non-parametric tests below). The p-value (see the output below) is now significant (less than 0.05), and the conclusion is completely different: we can consider the differences to be significant.

Kruskal-Wallis Test: Score versus Team

Kruskal-Wallis Test on Score

Team      N   Median   Ave Rank      Z
Blue      9   2550.0       23.7   2.72
Pink     13   -450.0       11.6  -2.44
Yellow   10    975.0       16.4  -0.06
Overall  32                16.5

H = 8.86  DF = 2  P = 0.012

H = 8.87  DF = 2  P = 0.012  (adjusted for ties)
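The same kind of test can be reproduced (approximately) with scipy, re-using the invented scores from the normality sketch above; scipy reports only the H statistic and the p-value, not the per-team median and rank summary that Minitab prints.

```python
from scipy import stats

# Invented team scores (same lists as in the normality-check sketch above).
blue   = [2550, 2700, 2400, 2600, 2500, 2450, 2650, 2350, 9800]
yellow = [1100, 950, 1000, 875, 1025, 900, 975, 1050, 925, 1000]
pink   = [-400, -500, -450, -350, -600, -425, -475, -550, -300, -475, -500, -425, -450]

h_stat, p_value = stats.kruskal(blue, yellow, pink)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
# A p-value below 0.05 says the team medians differ by more than chance alone would explain.
```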

See the correspondence table for parametric and non-parametric tests below:

Conclusion

Outliers do happen, and removing them is not always straightforward. One nice thing about non-parametric tests is that they are more robust to such outliers. However, this does not mean that non-parametric tests should be used in every circumstance. When there are no outliers and the distribution is normal, standard parametric tests (t-tests or ANOVA) are more powerful.

Big Ten 4th Down Calculator: Championship Edition


The College Football Playoff technically doesn't start until December 31st, but in reality it started Saturday night in Indianapolis. The winner of the Big Ten Championship Game was in the playoff, while the loser was out. The stakes couldn't have been higher. So the competitors need to make sure they gain every advantage they can. And that's where 4th down decisions come in. With a lot of coaches making sub-optimal 4th down decisions, your team could gain an edge if you use statistics to make the correct decision. And that edge can be the difference between a playoff appearance and an exhibition game.

So, how did Iowa and Michigan State do?

A few quick things before we begin. My 4th down calculator is based on a regression model that predicts a team's expected points, and takes into account home field advantage. However, this game was played at a neutral site. So I used an adjusted model that doesn't use home field advantage. Also, for the average net yards on a punt, I used each team's season average (as opposed to the Big Ten average that I've used previously).

And, as always, the calculator isn’t meant to provide decisions written in stone. In hypothesis testing, it’s important to understand the difference between statistical and practical significance. A test that finds a statistically significant result doesn’t imply that the result has practical consequences. The people testing the hypothesis should use their specialized knowledge to determine whether the difference is practically significant. Apply the same line of thought to the 4th down calculator. If you're the coach, you should also consider other factors—but the 4th down calculator still provides a great starting point for making the decision.

Now let's break down the game.

Michigan State 16 - Iowa 13

Instead of summarizing the 4th down decisions, I'm going to break down each one. After all, we only have one game to talk about.

4th Down Decisions in the First Half

Team | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Expected Points Go For It | Expected Points Kick
Michigan St | 5 | 5 | Field Goal | Field Goal | 2.79 | 2.80 (FG)
Iowa | 1 | 52 | Go for it | Punt | 0.82 | 0.13 (Punt)
Iowa | 6 | 6 | Field Goal | Field Goal | 2.52 | 2.78 (FG)
Michigan St | 5 | 70 | Punt | Punt | -1.79 | -1.48 (Punt)
Iowa | 4 | 20 | Go for it | Go for it | 2.21 | 2.15 (FG)
Iowa | 9 | 25 | Field Goal | Field Goal | 1.04 | 1.76 (FG)
Michigan St | 15 | 34 | Field Goal | Field Goal | -0.05 | 0.79 (FG)
Michigan St | 1 | 71 | Go for it | Punt | -0.60 | -1.56 (Punt)
Iowa | 11 | 73 | Punt | Punt | -2.70 | -1.45 (Punt)


The score at halftime was 6-3. On the surface, this might make you think this was a boring game. But you would be incorrect, as the first half featured only 4 punts. And if the coaches had made some better 4th down decisions, we would have had even fewer punts!

Michigan State opened the 4th down decision making with a 4th and goal at the 5. The general consensus is that you have to take the field goal, and in this instance the calculator agrees...barely. You'll see that the expected points for going for it are about the same as for kicking a field goal. This is because even if Michigan State fails on 4th down (which they would do about 68% of the time), Iowa would start with the ball deep in their own territory. Any time a team punts, the best-case scenario (outside of a turnover) is that they down the punt inside the 5 yard line. Here, the worst-case scenario of going for it (outside of a turnover) is that the opponent starts inside their own 5 yard line, which is no worse than a perfect punt. There is nothing wrong with Michigan State kicking a field goal here, but in general teams should be much more aggressive inside the five yard line on 4th down.
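
To make the arithmetic behind that comparison concrete, here is a rough sketch of the expected-points calculation. The conversion probability and point values below are illustrative guesses chosen to land near the calculator's output, not the actual regression estimates behind the model:

```python
# Expected points of going for it on 4th and goal from the 5 (illustrative numbers only).

def ep_go_for_it(p_convert, ep_touchdown, ep_opponent_if_fail):
    # A success is worth a touchdown drive; a failure hands the opponent the
    # ball at the failure spot, so their expected points there count against you.
    return p_convert * ep_touchdown - (1 - p_convert) * ep_opponent_if_fail

# Roughly a 32% conversion chance; a failed attempt pins the opponent at their
# own ~5, where expected-points models typically give the offense a small
# negative value (being backed up that far is a liability).
ep_go = ep_go_for_it(p_convert=0.32, ep_touchdown=6.95, ep_opponent_if_fail=-0.83)
print(round(ep_go, 2))   # ~2.79 with these inputs -- essentially a tie with the
                         # ~2.80 expected points of the short field goal
```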

Having said that, there is a big problem with Iowa kicking on the next 4th down decision. They punted on a 4th and 1 at midfield. And the worst part is that it wasn't even a full yard. On 3rd and 1 they tried a quarterback sneak that just came up inches short. This season Iowa had one of the best rushing offenses in the Big Ten. To punt on 4th and inches is just insane. Following the punt, Iowa did get an interception that led to a field goal. But imagine if they converted the 4th down, scored points on the drive, and then still got the interception on defense. In a game where points were hard to come by, you can't just willingly leave them on the field.

Iowa left more points on the table early in the 2nd quarter. But this time it was the players' fault, not the coaches. On a 4th and 4 from the Michigan State 20 yard line, Iowa correctly decided to go for it instead of kicking a field goal. But their tight end moved early, and the false start penalty moved them back 5 yards. Now faced with a 4th and 9, Iowa correctly attempted a field goal. They made the kick, but we're left to wonder what might have been if they converted the 4th down and scored a touchdown on the possession.

The next possession, Michigan State faced a 4th and 15 at the Iowa 34. This is a spot on the field where Michigan State has actually been more aggressive than the calculator suggests. This year, they've had 6 instances of a 4th and 6 or longer between their opponent's 25 and 34 yard line. In all 6 cases, the calculator suggested kicking a field goal, but Michigan State actually went for it 4 times. And the 4th down distances in those 4 cases were 6, 8, 8, and 10 (they kicked a field goal on a 4th and 8 and a 4th and 12). This shows that Michigan State really doesn't trust their field goal kicker from longer distances. If the 4th down distance had been just a little shorter, I'm almost certain Michigan State would have gone for it here. But 15 yards is just too hard to convert, so they (correctly) kicked a field goal. And as if to justify Michigan State's decisions earlier in the season, Spartan kicker Michael Geiger missed the kick.

Believe it or not, Michigan State has had only two 4th and 1s this season (although this doesn't include every 4th quarter). They've correctly gone for it both times, but it helps that they were in opponent territory on both occasions. Late in the 2nd quarter, Michigan State punted on a 4th and 1 from their own 29. Now there was only 1:49 left in the half. So this decision isn't quite as bad as usual, since Michigan State may not have had enough time for a full drive anyway. But it still gave Iowa one more opportunity to score, and half of the reason for going for it on 4th and 1 is limiting the opportunities your opponent has to score. Luckily for the Spartans, the punt netted 57 yards, and neither team scored before halftime.

4th Down Decisions in the 3rd Quarter

Team | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Expected Points Go For It | Expected Points Kick
Michigan St | 5 | 70 | Punt | Punt | -1.79 | -1.45 (Punt)
Michigan St | 5 | 39 | Go for it | Punt | 0.53 | 0.41 (Punt)
Iowa | 12 | 90 | Punt | Punt | -4.03 | -2.72 (Punt)
Michigan St | 6 | 11 | Field Goal | Field Goal | 2.45 | 2.62 (FG)
Iowa | 5 | 70 | Punt | Punt | -1.79 | -1.22 (Punt)
Michigan St | 4 | 29 | Go for it | Field Goal | 1.53 | 1.38 (FG)

Michigan State had an opportunity to take the game over in the 3rd quarter. They did go from being down 3 to being up by 3, but they passed on opportunities to make that lead even larger. The first bad decision was a punt on 4th and 5 from the Iowa 39 yard line. The difference in expected points is only 0.12, so on paper the decision to punt doesn't seem like a terrible one. But really, what is the downside of going for it? The other team might get the ball in their own territory? That's going to happen if you punt too! Michigan State should have gone for it here.

Remember how I said Michigan State has been passing on long field goals and going for it, even on longer 4th down distances? Well, that makes their decision at the end of the 3rd quarter even more confusing. They decided to kick a 47 yard field goal on 4th and 4 from the Iowa 29. Michigan State went for it in this spot on 4th and 10 earlier this season. With the game tied, you might argue that they should kick the field goal to take the lead. Except 47 yard field goals are missed all the time, and Michigan State already missed a long field goal earlier in the game.

And you know what is better than a 3 point lead? A 7 point lead!

Luckily for Michigan State, they did make the field goal. And the Spartans took a 9-6 lead into the 4th quarter.

4th Down Decisions in the 4th Quarter

Team | Time Left | 4th Down Distance | Yards to End Zone | Calculator Decision | Coach Decision | Win Probability Go For It | Win Probability Kick
Michigan St (Down by 4) | 12:14 | 9 | 60 | Punt | Punt | 25.8% | 28.4% (Punt)
Iowa (Up by 4) | 9:37 | 2 | 47 | Go for it | Punt | 73.7% | 73.4% (Punt)
Michigan St (Down by 4) | 1:54 | 2 | 5 | Go for it | Go for it | 46.1% | 27.5% (FG)

In the 3rd quarter, Iowa gained a grand total of -7 yards (including penalties) against Michigan State. The Hawkeyes hadn't even gained a first down since 5 minutes remained in the 2nd quarter. And to make matters worse, they had a 2nd and 20 at their own 15 yard line. Even though Michigan State left points on the table in the 3rd quarter, you probably thought they were in good shape with a 3 point lead going into the final 15 minutes of the game. 

You thought wrong.

The first play of the 4th quarter was an 85-yard touchdown pass for Iowa. That's why it's so important to maximize your points in the first three quarters: if the score is close, it only takes one play to completely change the dynamic of the game. If Michigan State had been up by multiple scores, they would still have been in good shape. But with Iowa now up by 4 points, Michigan State was more likely to lose the game, especially after having to punt on a 4th and 9.

Everybody is going to remember the 4th and 2 that Michigan State converted at the end of the game. But Iowa had a big decision on their own 4th and 2 the drive before. Instead of going for it, they decided to take a delay of game penalty and punt. My first reaction was that this was a terrible decision. I was sure the statistics would heavily favor going for it. But surprisingly, the numbers are just about even. They slightly favor going for it, but the difference is 0.3%, so really, either decision is fine. And with Iowa having a strong defense and Michigan State quarterback Connor Cook appearing to possibly be hurt, there is nothing wrong with the decision to punt. Having said that, I think there is a more general reason most coaches will opt to punt in this situation.

Iowa kicked, and Michigan State scored the game-winning touchdown on the next possession. Did you hear any postgame analysis about how Iowa should have gone for it? How they should have relied on their strong running game to put the game away? Of course not. There was no second-guessing the punt, even though it directly cost Iowa the game. Now imagine that Iowa had gone for it, failed, and Michigan State won the game. People would be second-guessing that decision left and right. I find it crazy how we never associate the decision to punt with costing a team a game, but we do it with going for it on 4th down all the time. Even when it's the correct decision!

Speaking of the correct decision, there is no doubt Michigan State should have gone for it on 4th and 2 at the end of the game. You're only 5 yards from taking the lead, and there is no guarantee you get the ball back if you kick a field goal. Deciding to kick would have lowered their win probability by almost 20 percentage points. Kicking a field goal here would have been by far the worst coaching decision made in the Big Ten this year. Luckily for Michigan State, they left their conservative decision making in the 3rd quarter, and went for it.

And if they're going to upset Alabama in the College Football Playoff, they're going to need to do more of the same.

End of the Season Summary

Here is a breakdown of all the 4th downs (through the first 3 quarters) in the Big Ten this year. I'm using just the first 3 quarters since many of the games this year were blowouts in the 4th quarter. You can get every 4th down decision that I tracked this season here.

Team Summary

Team | Number of Disagreements | Total Expected Points Lost
Northwestern | 15 | 8.74
Ohio State | 13 | 6.64
Rutgers | 12 | 6.43
Michigan | 13 | 5.65
Iowa | 9 | 5.53
Penn State | 12 | 5.37
Illinois | 17 | 5.03
Minnesota | 11 | 5.02
Indiana | 9 | 4.64
Michigan St | 13 | 4.43
Nebraska | 8 | 4.15
Purdue | 7 | 3.27
Wisconsin | 5 | 2.18
Maryland | 8 | 1.78

Northwestern had a 4th and 1 seven different times this season. They went for it only twice, punting three times and kicking two field goals. And they had 10 more 4th down distances of 2 or 3 yards, and kicked every time. Ironically, they did go for a 4th and 7 from their opponent's 29 yard line when the calculator suggested a field goal. Oh, and of course the very next game they kicked a field goal on 4th and 1 from their opponent's 25. Go for it on 4th and 7, and then kick a field goal on 4th and 1. There is Northwestern's 4th down decision making this season in a nutshell.

But what's the point, you might ask, if Northwestern was still able to go 10-2 despite their 4th down decision making? Well, Northwestern was 5-0 in games decided by one possession this season. By leaving points on the table, they kept playing in close games. And in the long run, your record in one-possession games is going to be about .500. So they aren't as good as their record would indicate (just look at the fact that they are 9-point underdogs to 8-4 Tennessee in their bowl game). And if Northwestern continues to make sub-optimal 4th down decisions next year, I sure wouldn't bet on them to repeat their success of this year.
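
To put a rough number on that, here is a quick calculation under the simplifying assumption that one-possession games really are close to coin flips:

```python
# Probability of going 5-0 in one-possession games if each is a 50/50 proposition.
p_five_and_oh = 0.5 ** 5
print(p_five_and_oh)   # 0.03125 -- about a 3% chance, i.e., a good amount of luck
```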

Where Can Coaches Improve?

Let's take a look at where coaches could improve their 4th down decision making. First, let's look at how often they agreed and disagreed with the calculator.

Rows show what the calculator suggested; columns show what the coach actually did (percentages are within each row).

Calculator Suggests \ Coach Did | Field Goal | Go for it | Punt
Field Goal | 90 (80%) | 16 | 6
Go for it | 34 | 58 (32%) | 89 (49%)
Punt | 0 | 7 | 404 (98%)

When the calculator suggested a field goal, coaches did kick the field goal about 80% of the time. And honestly, the other 20% probably had more to do with my model overestimating a kicker's accuracy from longer distances than with coaches making bad decisions. I used game data to calculate the probability of a kicker making a field goal. But from longer distances, a coach is only going to attempt a field goal if he thinks his kicker has the range. So for kickers without a strong leg, my model likely overestimates their chance of making a long field goal, and thus the coach is correct to go for it or punt instead of listening to the calculator.
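
For anyone curious how a make-probability-by-distance estimate like that might be built, here is a minimal sketch using logistic regression. The attempts below are fabricated for illustration, and the selection bias described above (coaches only attempt kicks they think their kicker can make) still applies to this kind of model:

```python
# Estimate field goal make probability as a function of distance (fabricated attempts).
import numpy as np
from sklearn.linear_model import LogisticRegression

distance = np.array([[20], [24], [28], [31], [35], [38], [42], [45], [49], [53]])
made     = np.array([  1,    1,    1,    1,    1,    1,    0,    1,    0,    0 ])

model = LogisticRegression().fit(distance, made)
print(model.predict_proba([[47]])[0, 1])   # estimated make probability from 47 yards
```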

And coaches are right on board when the calculator suggests punting. So that only leaves one option...going for it!

If you've been following along with me all season, I bet you never saw this coming...but coaches are being too conservative on 4th down. They are punting when they should be going for it about half of the time! That's 89 times when they could have tried to keep possession and score more points, but instead willingly gave the other team the ball.

But I know what most coaches are probably thinking: "This is just based on some mathematical jargon. It doesn't really apply to the real world." Well, here are some real-world results. I took every single 4th and short (1 or 2 yards to go) and looked at what the next score was for each coaching decision.

Yards to End Zone | Punts | Average Next Score After Punt | Go for it | Average Next Score After Go for it | Field Goals | Average Next Score After FG
75 to 90 | 14 | -4.10 | 1 | 7 | * | *
50 to 74 | 47 | -0.34 | 8 | 1.25 | * | *
25 to 49 | 6 | 1.17 | 19 | 2.37 | 1 | -7
1 to 24 | * | * | 15 | 3.73 | 6 | 3

If you go for it on 4th down deep in your own territory, the fear is that the other team will have an easy score if you fail. But look at what happened to teams who punted this season. In the 14 cases where a team punted on 4th and 1 or 2 from inside their own 25 yard line, the opponent was still the more likely team to score next, to the tune of 4.1 points on average. Even with a punt, the other team usually scores next anyway! So what do you have to lose by going for it?

Things get a little more reasonable outside the 25 yard line. But the 8 teams who went for it between their own 25 and midfield were justly rewarded, outscoring their punting counterparts by a point and a half! And once you cross midfield, going for it remains the best option. Teams that went for it between midfield and their opponent's 25 scored more on average than the teams that tried to "pin their opponent deep." And LOL Northwestern for being the one team that attempted a field goal.

As for the people who claim you have to "take the points" once you're in easy field goal position, well...teams who went for it scored more points than the teams who kicked the field goal. Take the points, indeed!

But on a positive note, most coaches are going for it on 4th and 1 and 2 once they get past midfield. Should they be even more aggressive than that, though? Coaches are good about going for it on 4th and 1 or 2, but once they get inside the opponent's 10 yard line they should also be going for it on 4th and 3 and 4th and 4. So let's look at what decisions they made inside the 10.

4th Down Distance | Field Goals | Average Next Score After FG | Go for it | Average Next Score After Go for it
1-2 Yards | 4 | 3 | 8 | 5.13
3-4 Yards | 10 | 3 | 0 | 0
5-10 Yards | 18 | 3.44 | 0 | 0

Unfortunately we have some small sample sizes here, but the real-life outcomes still support what the calculator suggests. Teams who went for it on 4th and 1 and 4th and 2 scored over two more points on average than teams who kicked a field goal. Unfortunately, teams stopped being aggressive on 4th and 3 and 4th and 4. Not a single coach decided to go for it; they all went with "taking the points." Except you'd actually be taking more points if you went for it!

You might be wondering how teams who kicked a field goal on 4th and 5 to 4th and 10 actually scored more than 3 points on average. There were two cases where a team missed a field goal. Oh no! That's terrible, right? Except in both cases the kicking team got the ball back and scored a touchdown on their next drive. And that's with the other team starting at the 20 after the missed field goal. If you fail on 4th down, the opponent will start even closer to their own end zone. All the more reason to be aggressive on 4th down inside the 10!

And if you're still not convinced after all that math, consider an intangible benefit of being aggressive on 4th down. Imagine you're a coach recruiting a star running back or star quarterback. You tell him that every other coach is going to punt on 4th and 1 in their own territory and kick a field goal on 4th and goal from the 3. But us, we're going to go for it. And you know what that means for you? More yards. More touchdowns. More impressive stats. And if we fail on 4th down, nobody is going to blame you. When the Heisman voters look at statistics, they don't care about what percentage of 4th downs a quarterback converts. What they care about is yards and touchdowns. And we're going to give you more opportunities to gain yards and score touchdowns than any other team.

So that will wrap things up for the Big Ten 4th down calculator this year. Enjoy the bowl season, and until next year always remember that...

Kicking is for quitters!

Factor Analysis and the Best Super Bowl Quarterback Ever

Last time, we used factor analysis to come up with factors we could use to describe the performances of Super Bowl-winning quarterbacks. Now, we’ll try to use those factors to compare the performance of our candidate quarterbacks who won at least 3 Super Bowls.

Recall that one purpose of factor analysis is to identify underlying factors that you can't measure directly. For example, the members of a committee at a hospital design a survey with questions that assess 3 underlying factors: timeliness of service, accuracy of service, and courteousness of service. The committee members can use factor analysis to combine the responses for many different questions into 3 factors.

For the analysis of quarterbacks who have won at least 3 Super Bowls, we came up with three factors from 7 variables:

  • Factor 1: How well the winning quarterback played. Higher is better.
  • Factor 2: Competitiveness of matchup. Far from 0 is better, either low or high.
  • Factor 3: How impressive the victory was. More negative is better.
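
As a side note, here is a minimal sketch of what that kind of factor extraction looks like in code. The data here are synthetic, purely to show the mechanics of pulling 3 factors out of 7 observed variables; the actual Super Bowl analysis was done in Minitab on real game statistics:

```python
# Extract 3 latent factors from 7 observed variables (synthetic data for illustration).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 7))      # 32 hypothetical QB performances, 7 measured variables

fa = FactorAnalysis(n_components=3, random_state=0)
scores = fa.fit_transform(X)      # one 3-dimensional factor score per performance

print(fa.components_.shape)       # (3, 7): how strongly each factor loads on each variable
print(scores.shape)               # (32, 3)
```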

The most typical statistical analysis for describing a group, like the data points for Joe Montana, would be to measure its distance from the average factor point. (In this case, the average factor point would be where every factor equals 0.) But because our factors are directional, a quarterback could be far from the average point in the wrong direction. So instead of measuring the distance from the average, I’m going to measure the distance from a hypothetical great performance where each factor has the best value attained so far. Because either high or low values are good for factor 2, I’ll check the distances for 2 different points.

For factor 1, the best value so far is 1.773, by Jim Plunkett. For factor 2, the best value is either Joe Montana’s 2.322 or Tom Brady’s -0.386, depending on whether we take the high side or the low side. For factor 3, the best value is Tom Brady’s -2.144. This 3-D scatterplot shows the mean value for each candidate quarterback, the mean value for all other performances by quarterbacks who won Super Bowls, and the 2 ideal values.

A rough guess from the graph assures me only that Aikman is probably further from the ideal points than the other 3. Here are the actual Euclidean distances between the mean factor scores for each candidate quarterback and the ideal games. (Want to see how to calculate Euclidean distance? In Minitab, click Help > Methods and Formulas, then use the Search tab to look for "Euclidean.")
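
Outside of Minitab, the same distance is one line of numpy. The mean factor scores below are placeholders, not the actual values from the analysis; only the ideal point uses the values quoted above:

```python
# Euclidean distance from a quarterback's mean factor scores to an ideal point.
import numpy as np

qb_means      = np.array([0.60, 1.10, -0.40])      # hypothetical (factor 1, factor 2, factor 3) means
ideal_f2_high = np.array([1.773, 2.322, -2.144])   # best values cited in the text

print(np.linalg.norm(qb_means - ideal_f2_high))    # straight-line distance in factor space
```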

Quarterback | Distance from ideal with factor 2 high | Distance from ideal with factor 2 low
Aikman | 4.08328 | 3.53594
Bradshaw | 3.51415 | 2.15082
Brady | 4.15049 | 2.31167
Montana | 2.98226 | 2.36892

Among the quarterbacks with 3 or more victories, Montana’s factor means are the closest to the best factor values ever achieved when you value how thoroughly a team dominated the other. Montana’s distance is a half unit closer to this ideal game than Terry Bradshaw is. Interestingly, although Tom Brady has the lowest value of factor 2 among all quarterbacks, it’s Terry Bradshaw who is the closest to the ideal point when you consider the ideal game to be winning a competitive matchup. The difference in the means is not as striking here. Bradshaw is within 0.22 of both Tom Brady and Joe Montana.

Conclusion

Terry Bradshaw deserves a lot of credit for beating Roger Staubach-led Cowboys teams twice. Tom Brady's achievement in beating the highest-rated loser ever is noteworthy. But Joe Montana gets credit for beating the Ken Anderson-led Bengals and the Boomer Esiason-led Bengals in competitive Super Bowls, as well as blowing out John Elway and Dan Marino in less-competitive matchups. With these data, and these criteria, I'll go with Joe Montana as the best Super Bowl quarterback ever.

The image of the Lombardi Trophy from the 49ers victory in Super Bowl XXIII is by youraddresshere and is licensed under this Creative Commons License.

Why Are P Value Misunderstandings So Common?

I’ve written a fair bit about P values: how to correctly interpret P values, a graphical representation of how they work, guidelines for using P values, and why the P value ban in one journal is a mistake. Along the way, I’ve received many questions about P values, but the questions from one reader stand out.

This reader asked: why is it so easy to interpret P values incorrectly? Why is the common misinterpretation so pervasive? And what can be done about it? He wasn’t sure these were fair questions, but I think they are. Let’s answer them!

The Correct Way to Interpret P Values

First, to make sure we’re on the same page, here’s the correct definition of P values.

The P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis. In other words, if the null hypothesis is true, the P value is the probability of obtaining sample data at least as extreme as yours. It answers the question: are your sample data unusual if the null hypothesis is true?
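
That definition is easier to see in a simulation than in words. Here is a minimal sketch with made-up numbers (a sample mean of 0.4 from n = 25 observations with known standard deviation 1) that computes a two-sided P value as the fraction of null-hypothesis samples at least as extreme as the observed one:

```python
# Simulate the sampling distribution under a true null (mu = 0) and count how
# often it produces a result at least as extreme as the observed sample mean.
import numpy as np

rng = np.random.default_rng(1)
observed_mean = 0.4                                   # hypothetical observed sample mean
null_means = rng.normal(loc=0, scale=1 / np.sqrt(25), size=100_000)

p_value = np.mean(np.abs(null_means) >= abs(observed_mean))
print(p_value)   # close to the analytic two-sided P value of about 0.046
```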

If you’re thinking that the P value is the probability that the null hypothesis is true, the probability that you’re making a mistake if you reject the null, or anything else along these lines, that’s the most common misunderstanding. You should click the links above to learn how to correctly interpret P values.

Historical Circumstances Helped Make P Values Confusing

This problem is nearly a century old and goes back to two very antagonistic camps from the early days of hypothesis testing: Fisher's measures of evidence approach (P values) and the Neyman-Pearson error rate approach (alpha). Fisher believed in inductive reasoning, which is the idea that we can use sample data to learn about a population. On the other side, the Neyman-Pearson methodology does not allow analysts to learn from individual studies. Instead, the results only apply to a long series of tests.

Courses and textbooks have mushed these disparate approaches together into the standard hypothesis-testing procedure that is known and taught today. The procedure looks like a seamless combination, but it's really a muddled, Frankenstein's-monster mix of sometimes-contradictory methods, and that mix has promoted the confusion. The end result of this fusion is that P values are incorrectly entangled with the Type I error rate. Fisher tried to clarify this misunderstanding for decades, but to no avail.

P Values Aren’t What We Really Want to Know

The common misconception describes exactly what we'd really like to know. We'd love to know the probability that a hypothesis is correct, or the probability that we're making a mistake. What we get instead is the probability of our observation under an assumed null hypothesis, which just isn't as useful.

It would be great if we could take evidence solely from a sample and determine the probability that the sample is wrong. Unfortunately, that's not possible—for logical reasons when you think about it. Without outside information, a sample can’t tell you whether it’s representative of the population.

P values are based exclusively on information contained within a sample. Consequently, P values can't answer the question that we most want answered, but there seems to be an irresistible temptation to interpret them that way.

P Values Have a Convoluted Definition

The correct definition of a P value is fairly convoluted. The definition is based on the probability of observing what you actually did observe (huh?), but in a hypothetical context (a true null hypothesis), and it includes strange wording about results that are at least as extreme as what you observed. It's hard to understand all of that without a lot of study. It's just not intuitive.

Unfortunately, there is no simple and accurate definition that can help counteract the pressures to believe in the common misinterpretation. In fact, the incorrect definition sounds so much simpler than the correct definition. Shoot, not even scientists can explain P values! And, so the misconceptions live on.

What Can Be Done?

Historical circumstances have conspired to confuse the issue. We have a natural tendency to want P values to mean something else.  And, there is no simple yet correct definition for P values that can counteract the common misunderstandings. No wonder this has been a problem for a long time!

Fisher tried in vain to correct this misinterpretation but didn't have much luck. As for myself, I hope to point out that what may seem like a semantic difference between the correct and incorrect definitions actually equates to a huge difference.

Using the incorrect definition is likely to come back to bite you! If you think a P value of 0.05 equates to a 5% chance of a mistake, boy, are you in for a big surprise—because it’s often around 26%! Instead, based on middle-of-the-road assumptions, you’ll need a P value around 0.0027 to achieve an error rate of about 5%. However, not all P values are created equal in terms of the error rate.
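
A small simulation makes the gap concrete. The assumptions here are my own middle-of-the-road placeholders (half of all studies test a real effect of 0.5 standard deviations, two groups of 30 observations each), so the exact percentage will differ from the figure above, but the qualitative result is the same: among results with a P value right around 0.05, far more than 5% come from true nulls.

```python
# Among studies whose P value just clears 0.05, how many tested a true null?
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n, n_studies = 30, 50_000
sig_total = sig_from_null = 0

for _ in range(n_studies):
    real_effect = rng.random() < 0.5            # assume half of studies chase a real effect
    shift = 0.5 if real_effect else 0.0         # effect size of 0.5 standard deviations
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    p = ttest_ind(a, b).pvalue
    if 0.04 <= p <= 0.05:                       # results that barely reached "significance"
        sig_total += 1
        sig_from_null += not real_effect

print(sig_from_null / sig_total)   # well above 0.05 under these assumptions
```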

I also think that P values are easier for most people to understand graphically than through the tricky definition and the math. So, I wrote a series of blog posts that graphically show why we need hypothesis testing and how it works.

I have no reason to expect that I'll have any more impact than Fisher did himself, but it's an attempt!
