
How Residuals Can Save You Thousands of Dollars on Your Next Car Purchase


Purchasing a used car can be stressful due to all the factors that need to be considered. Web sites such as www.cars.com provide you a wealth of information, but how do you navigate through it all to find the best deal?

Minitab to the rescue. Once you narrow your choice down to a particular car model, such as an Acura TSX, the data from www.cars.com can be copied and pasted into Minitab. After some data manipulation, you can use a regression analysis to develop an equation that calculates the expected list price of a vehicle based on variables such as year, mileage, whether or not the technology package is included, and whether or not a free Carfax report is included (which is possibly an indicator of how confident the seller is in the vehicle).
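If you prefer working in code rather than Minitab's menus, the same kind of regression can be sketched in a few lines. The file name and column names below are assumptions, so adjust them to match however you format the cars.com data:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical worksheet columns: Year, Mileage, TechPackage (0/1), FreeCarfax (0/1), ListPrice.
cars = pd.read_csv("acura_tsx_listings.csv")

model = smf.ols("ListPrice ~ Year + Mileage + TechPackage + FreeCarfax", data=cars).fit()
print(model.summary())   # coefficients play the same role as Minitab's General Regression output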

A Regression Model for Used Car Price

Let's apply this idea to an Acura TSX, using data for 986 cars downloaded from www.cars.com on 7/24/2013.  If you'd like to do this analysis yourself, download the data (and a free 30-day trial of our statistical software, if you don't already have it). 

After you've opened the data in Minitab, choose Stat > Regression > General Regression and fill out the dialog box like this:

general regression factors

Below is the regression model Minitab fits to this data.

regression analysis of used car prices

What Does Regression Analysis Tell Us About the Price of Used Cars?

Several interesting findings come from this regression analysis:

  • Every mile that is added to the car decreases the expected list price by approximately 6 cents.
     
  • Each year that is added to the car's age decreases the expected list price by approximately $1310.
     
  • The technology package adds, on average, approximately $1044 to the list price of the car.
     
  • Cars with a free Carfax report have a list price, on average, approximately $441 more than those with a paid report. A Carfax report only costs $40, so this increased price is likely due to the fact that the car has a clean report (or else they probably wouldn’t provide it for free!).
     
  • An impressive 89.8% of the variation in car list price is explained by these predictors.

Those findings are interesting, but the main goal of this analysis is to find the car that offers the best value: in other words, the car whose actual list price falls furthest below its expected list price. The residuals from the regression analysis contain exactly this information.
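Continuing the hypothetical sketch above, the best-value car is simply the row with the most negative residual:

# Residual = actual list price minus the price the model expects for that car.
cars["Residual"] = model.resid
best_value = cars.loc[cars["Residual"].idxmin()]   # the most underpriced listing
print(best_value)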

Finding the Price Difference in a Residuals Plot

To get the residuals plot from this analysis, rerun the analysis (you can just hit Ctrl-E on your keyboard to bring up the last dialog box used, which should be the General Regression dialog shown above). Then click on Graphs, and check the box for "Normal plot of residuals" so it looks like this:

residuals plot

Press OK, run the analysis, and you'll get the plot shown below. 

This probability plot of the residuals indicates that three cars have an unusually large difference between the actual list price and the expected list price. They are underpriced by $7,500 to $10,000.

normal probability plot

Unfortunately, two of those cars have severe damage.

Damaged Cardamaged car 1

After removing the two damaged cars from the analysis, one car is clearly priced better than the other 983 cars. There's our best value. 

plot of car prices

This car appears to have no damage. The www.cars.com description is below.

Description of Car

In summary, there is a little bit of work getting the data from www.cars.com into Minitab in an analysis-ready format, but the effort will reveal the best-value cars, resulting in potential savings in the thousands of dollars.


Spicy Statistics and Attribute Agreement Analysis


Chili Peppers

My husband, Sean, and I were recently at my parents’ house for a picnic dinner. As a lover of hot sauce (I’m talking extremely hot, hot, hot, HOT sauce!), my stepdad always has a plethora of bottles around to try. While I do enjoy spicy foods from time to time, I’ve learned not to touch his hot sauce selections. His favorites are much too spicy for my taste!

Unfortunately, Sean learned the hard way. He used Habanero hot sauce on his hot sausage sandwich – talk about double the heat! I saw him sinking in his seat, eyes watering … a few hoarse coughs …

Yikes!  Anyway, Sean is alive and well after suffering for a few uncomfortable minutes. His recent hot sauce hardship got me thinking more about what makes hot sauce “hot” and how that heat is measured.

Visualizing the Relative Spiciness of Hot Peppers

The Scoville Heat Scale is a measure of the hotness of a chili pepper, the main ingredient in hot sauce. The scale is actually a measure of the concentration of the chemical compound called capsaicin, which creates that “my-mouth-is-on-fire” feeling. Different types of chili peppers contain various amounts of capsaicin, and the Scoville scale provides us with a measure of the heat of a chili pepper depending on the level of capsaicin it contains.

The heat values range from 0 Scoville heat units, which would be a sweet bell pepper with no spiciness, to well over a million, which is where the chili peppers with the highest heat ratings land. Check out this bar chart (in Minitab, navigate to Graph > Bar Chart) with a few of the hottest recorded chili peppers (based on the chart in this article):

Minitab Bar Chart

Keep in mind the variability of ratings, which can change based on different species of chilies, and variable growing conditions. The chart above is just an interpretation for the sake of comparing the different kinds of chilies out there and their approximate heat levels.

Do Your Ratings of Different Hot Sauces Match Mine?

For a little bit of fun, I wanted to see whether Sean and I rate the same hot sauces based on their “heat” levels consistently. That way, at least from my perspective, I can tell if he’s just a big baby who can’t take the heat, or if I’m the one with the spicy intolerance. But perhaps, we’ll rate the hot sauces the same? Let’s just find out.

We picked up a sampler of 10 different hot sauces to test. We each rated the 10 different sauces on a 4-point scale: 1  = mild, 2 = hot, 3 = very hot, 4 = uncomfortably hot, and recorded our data into a Minitab Worksheet:

Minitab Worksheet

(You can download the dataset and follow along if you’d like.)

Performing an Attribute Agreement Analysis

What we want to do in this case is evaluate the ordinal measurement “rating” system for the hot sauce samples by performing an attribute agreement analysis in Minitab.

This type of analysis can be especially useful in the quality improvement world. For example, attribute agreement analysis helps assess the agreement of subjective ratings or classifications given by multiple appraisers. Using this analysis, you can assess if operators in your factory are agreeing on the pass/fail ratings for product samples.

In Minitab 16, choose Stat > Quality Tools > Attribute Agreement Analysis:

Menu Path in Minitab 16

 

In the Attribute column, enter Ratings, in the Samples column, enter Sauce, and in Appraisers, enter Appraiser. Also, be sure to check the box at the bottom of the window for “Categories of attribute data are ordered.” Here’s the Minitab output:

Attribute Agreement Analysis

How to Interpret the Results of the Attribute Agreement Analysis

According to the ‘Between Appraisers’ table above, Sean and I agree on the rating for 7 of the 10 hot sauces. Not bad! I hear that after a while married people tend to look alike, but I guess they tend to “rate” alike too …

The p-value for the overall Kappa statistic is very low, indicating that our agreement is unlikely to be due to chance alone. The p-value for Kendall’s coefficient of concordance is less than .05 (the typically used value of alpha), which indicates that the ratings between appraisers are associated. Kendall’s coefficient of concordance takes into account the ordinal nature of the data. The Minitab bar chart below, which shows ratings versus sauce grouped by appraiser, highlights the 3 times that Sean and I didn’t match in our ratings. Those ratings were only 1 unit apart, and the disagreements happened only on sauces 1, 3, and 6:

Minitab Bar Chart
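If you want to reproduce the headline agreement statistics outside of Minitab, here is a minimal sketch for two appraisers. The ratings below are made up, so substitute your own, and note that for two raters Kendall's tau is a closely related rank measure rather than the exact coefficient of concordance Minitab reports:

from sklearn.metrics import cohen_kappa_score
from scipy.stats import kendalltau

# Made-up heat ratings (1-4) for the 10 sauces; replace with your own worksheet values.
my_ratings   = [1, 2, 2, 3, 1, 4, 3, 2, 4, 1]
sean_ratings = [2, 2, 1, 3, 1, 3, 3, 2, 4, 1]

kappa = cohen_kappa_score(my_ratings, sean_ratings)   # chance-corrected agreement
tau, p_value = kendalltau(my_ratings, sean_ratings)   # rank association for the ordered scale
print(f"kappa = {kappa:.2f}, tau = {tau:.2f} (p = {p_value:.3f})")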

For more on attribute agreement analysis, check out this document from our tech support team, or the several tutorials available within Minitab Help (Help > Tutorials > Measurement Systems Analysis > Attribute Agreement Analysis, and in the StatGuide: Help > StatGuide > Quality Tools > Attribute Agreement Analysis).

 

Photo of Chili Peppers by Xenia, used under creative commons 2.0 license.  

How Many Licks to the Tootsie Roll Center of a Tootsie Pop? Part 2


Tootsie Pop Owl

by Cory Heid, guest blogger

A few months ago I posted a blog about Tootsie Pops and how many licks it takes to get to the Tootsie Roll center. If you haven’t read the post, here's a quick summary.

Recap of Initial Study

I broke down my experiment into four parts where I would test:

  • the force of a lick
  • temperature of a person's mouth
  • pH level of a person's saliva
  • the solubility of a person's saliva

After some tests and analysis of the data I collected, I was able to conclude that none of the factors I tested were statistically different or important enough to affect the number of licks required to reach the center.

I then moved to doing it the old-fashioned way and got a bunch of human lickers and recorded some data on them. The mean number of licks I found was about 356 licks, but there was a pretty large standard deviation (186 licks) and a lick range of 73 to 1087 licks.

Based on what I was able to gather, it seemed that there was a lot of variability in this experiment. Rather than trying to remove the human element from the equation, I decided to dig deeper into the Tootsie Pop itself.

And by dig deeper, I meant cutting some pops open and taking a look at them.

Plan of Attack

After much experimentation and testing, I finally figured out the best way to cut a Tootsie Pop in half. After heating a knife with a flame, I was able to chop some pops in half, and it was just like cutting butter.

chopped-pop Tootsie Pop 2

As the pictures indicate, it is clear that no two pops are alike. I took a few measurements of the pop, which consisted of pop height and width, core height and width, and shell thickness on the right side. To be consistent, I decided to make my measurement point the top of the Tootsie Pop's stick. Turns out, that didn't work so well … take a look:

tootsie pop experiment misstep

Since the sticks did not have a consistent length, it was hard to find the center and measure the width of the pop each time. After I found the center, I took some measurements and logged the data.

Running some basic statistics in Minitab reveals that there is pretty large variation in each of the measurement categories. Specifically, take a look at the stick length and the shell width on the right side. These two categories in particular show that there is much variation in the middle of the Tootsie Pop itself.
 

Tootsie Pop Analysis
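If you'd rather compute the same kind of summary outside of Minitab, a sketch like this works. The numbers and column names below are made up for illustration, not my actual measurements:

import pandas as pd

# Made-up measurements (mm) for a handful of pops; the column names are assumptions.
pops = pd.DataFrame({
    "PopWidth":    [26.1, 27.3, 25.8, 26.9, 27.0],
    "CoreWidth":   [17.2, 19.8, 16.5, 18.9, 20.1],
    "StickLength": [38.0, 41.5, 36.2, 43.0, 39.8],
    "ShellRight":  [3.1, 4.6, 2.8, 4.0, 5.2],
})
print(pops.describe())               # mean, standard deviation, and range for each measurement
print(pops.std() / pops.mean())      # coefficient of variation flags the widest-varying categories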
 

What Does This Mean?

People love Tootsie Pops just as they are, so we shouldn't necessarily think of this variation as a problem. But in theory, since we know that no two Tootsie Pops are the same based on the large variation in their core, we could look further into why this is and how it might be possible to "fix" this variation.

For example, since all Tootsie Pops are manufactured, we could use W. Edwards Deming's work on common cause and special cause variation to help understand where the variation in the Tootsie Pops comes from. Then, by implementing Deming's "Plan-Do-Check-Act" cycle, we could start to find and control the differences between pops.

Another possible system to help make Tootsie Pops more uniform would be Six Sigma, which was developed at Motorola in the mid-1980s. The goal would be to make the process so consistent that the specification limits for each pop sit six standard deviations from the mean (hence the name Six Sigma), which corresponds to only about 3.4 defects per million.

Since the Tootsie Pop company makes about 20 million Tootsie Pops a day, that would reduce the number of inconsistent pops to 68 per 20 million pops. Not too shabby!  Either system, if implemented, might help find a definitive answer for the number of licks to the center.

On the other hand, with the current system, Tootsie Pops can hold true to their longstanding answer about how many licks it takes to get to the center: "The World May Never Know!"

If you enjoyed this read or have any questions or insights, let me know in the comments below. Thanks!

 

About the Guest Blogger:

Cory Heid is a student in applied mathematics at Siena Heights University in Adrian, Mich. He is interested in data analysis projects, as well as data-driven model building. Cory presented his Tootsie Pop findings at the 2013 Joint Mathematics Meetings in January, and at the Michigan Undergraduate Mathematics Conference in February.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

Fantasy Studs and Regression to the Mean


Kevin Rudy has recently written two great posts (here and here) about how fantasy football studs perform the following year.  For any fantasy team manager, the results demonstrate how difficult it can be to predict player performance...pity the person with the first pick in a draft, who seems almost certain to not pick the best performer that year!

But why is this the case?

One cause is special circumstances such as injury...if you placed among the top fantasy performers, you were almost certainly injury-free or close to it for the entire season.  So an injury the following year means you obviously will underperform relative to your standout year.

But the bigger reason is what is known as "regression to the mean." 

The Meaning of "Regression to the Mean"

When some hear that phrase, they misinterpret it to mean that an individual will regress to the mean of all data...but that is certainly not the case.  What it does mean, however, is that data will tend to regress back towards some expected value.  Extreme data points—like the top-scoring RB in fantasy football in a given year—involve some amount of skill and some amount of luck.  In the subsequent year it is not likely this year's top players will experience the same luck.

Let me give some examples.  Suppose we have an X that predicts a Y really well, with small error.  A plot of that data in Minitab Statistical Software would look like this:

Y vs X with Small Error

In this case, with very little error around the predicted values, the highest X value predicts the highest Y value, and as it turns out corresponds to the actual highest Y value.  The 2nd highest X value corresponds to the 2nd highest Y value.  While this won't always match exactly, the ranks won't tend to be very far off.  In fantasy football terms, think of X as the player's true ability, Y as the expected fantasy points, and the actual value as the expected fantasy points plus or minus some amount of luck.

Now consider a more realistic scenario, where there is more moderate error ("luck"):

Y vs X with Moderate Error

Now consider the point at the highest X value, which has the highest expected Y value...the actual Y value is only the 5th highest.  The 2nd-highest X value corresponds to the highest observed Y value.  The 2nd highest observed Y value?  That corresponds to the 16th highest X value.  So back to fantasy football: if the observed Y values are fantasy points for a given season, then the top 3 performers had the 2nd, 15th, and 16th highest true abilities.  Ignoring the myriad other factors that would predict the next season (aging, a change to a new team, a new coach, different players around them, etc.), we would only expect one of these three to be in the top 3 in the subsequent season.
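A quick simulation makes the point concrete. This is a made-up example, not real fantasy data: each player's observed points are his true ability plus luck, and the luck gets redrawn the next season.

import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(200, 30, 100)                 # hypothetical true abilities (fantasy points)

season1 = ability + rng.normal(0, 40, 100)         # observed points = ability + luck
season2 = ability + rng.normal(0, 40, 100)         # same abilities, fresh luck

top3 = np.argsort(season1)[-3:]                    # indices of last season's top 3 scorers
ranks_next = np.argsort(np.argsort(-season2)) + 1  # each player's rank in season 2 (1 = best)
print(ranks_next[top3])                            # typically not 1, 2, 3; they regress toward their ability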

Regression to the Mean in Fantasy Football—and Real Life

To reiterate, "regression to the mean" does not mean each point is expected to return to the average Y value of the entire dataset—just that we would expect it to fall back to the predicted value indicated by the line.  That 16th-best player that obtained the 2nd-highest point total would need another incredibly lucky season to repeat.

Of course, regression to the mean is all around us and not limited to fantasy football, and examples abound in news stories and especially articles about finance.  So the next time you read "Home Prices Pull Back From Record Highs" or "Crime Rates Up From Three-Year Low" try to consider whether anything has really changed or whether the data are just showing regression to the mean with no underlying cause.

And good luck in your fantasy draft!  Given regression to the mean, you'll need it.

Doing Gage R&R at the Microscopic Level


by Dan Wolfe, guest blogger

How would you measure a hole that was allowed to vary one tenth the size of a human hair? What if the warmth from holding the part in your hand could take the measurement from good to bad? These are the types of problems that must be dealt with when measuring at the micron level.

As a Six Sigma professional, that was the challenge I was given when Tenneco entered into high-precision manufacturing. In Six Sigma projects “gage studies” and “Measurement System Analysis (MSA)” are used to make sure measurements are reliable and repeatable. It’s tough to imagine doing that type of analysis without statistical software like Minitab.

Tenneco, the company I work for, creates and supplies clean air and ride performance products and systems for cars and commercial vehicles. Tenneco has revenues of $7.4 billion annually, and we expect to grow as stricter vehicle emission regulations take effect in most markets worldwide over the next five years.

We have an active and established Six Sigma community as part of the “Tenneco Global Process Excellence” program, and Minitab is an integral part of training and project work at Tenneco.

Verifying Measurement Systems

Verifying the measurement systems we use in precision manufacturing and assembly is just one instance of how we use Minitab to make data-driven decisions and drive continuous improvement.

Even the smallest of features need to meet specifications. Tolerance ranges on the order of 10 to 20 microns require special processes not only for manufacturing, but also measurement. You can imagine how quickly the level of complexity grows when you consider the fact that we work with multiple suppliers from multiple countries for multiple components.

To gain agreement between suppliers and Tenneco plants on the measurement value of a part, we developed a process to work through the verification of high precision, high accuracy measurement systems such as CMM and vision.

The following SIPOC (Supplier, Input, Process, Output, Customer) process map shows the basic flow of the gage correlation process for new technology.

sipoc

What If a Gage Study Fails?

If any of the gage studies fail to be approved, we launch a problem-solving process. For example, in many cases, the Type 1 results do not agree at the two locations. But given these very small tolerance ranges, seemingly small differences can have significant practical impact on the measurement value. One difference was resolved when the ambient temperature in a CMM lab was found to be out of the expected range. Another occurred when the lens types of two vision systems were not the same.

Below is an example of a series of Type 1 gage studies performed to diagnose a repeatability issue on a vision system. It shows the effect of part replacement (taking the part out of the measurement device, then setting it up again) before each measurement and the bias created by handling the part.

For this study, we took the results of 25 measurements made when simply letting the part sit in the machine and compared them with 25 measurements made when taking the part out and setting it up again before each measurement. The analysis shows that picking the part up, handling it, and resetting it in the machine changes the measurement value. This was found to be statistically significant, but not practically significant. Knowing the results of this study helps our process and design engineers understand how to interpret the values given to them by the measurement labs, and gives some perspective on the considerations of the part and measurement processes.

The two graphs below show Type 1 studies done with versus without replacement of the part. There is a bias between the two studies. A test for equal variance shows a difference in variance between the two methods.

Type 1 Gage Study with Replacement

Type 1 Gage Study without Replacement

As the scatterplot below illustrates, the study done WITH REPLACEMENT has higher standard deviation. It is statistically significant, but still practically acceptable.

With Replacement vs. Without Replacement
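A rough sketch of that kind of comparison, with simulated readings standing in for the actual micron-level data, might look like this; the bias test and the equal-variance test loosely mirror the Type 1 study comparison and Minitab's test for equal variances:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated readings (microns); the real study used 25 measurements per method.
without_repl = rng.normal(50.0, 0.4, 25)   # part left sitting in the fixture between readings
with_repl    = rng.normal(50.3, 0.7, 25)   # part removed and re-fixtured before each reading

t_stat, p_bias = stats.ttest_ind(with_repl, without_repl, equal_var=False)  # handling bias
w_stat, p_var = stats.levene(with_repl, without_repl)                       # equal variances?
print(f"bias p-value = {p_bias:.3f}, equal-variance p-value = {p_var:.3f}")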

Minitab’s gage study features are a critical part of the gage correlation process we have developed. Minitab has been integrated into Tenneco’s Six Sigma program since it began in 2000.

The powerful analysis and convenient graphing tools are being used daily by our Six Sigma resources for these types of gage studies, problem-solving efforts, quality projects, and many other uses at Tenneco.

 

About the Guest Blogger:

Dan Wolfe is a Certified Lean Six Sigma Master Belt at Tenneco. He has led projects in Engineering, Supply Chain, Manufacturing and Business Processes. In 2006 he was awarded the Tenneco CEO award for Six Sigma. As a Master Black Belt he has led training waves, projects and the development of business process design tools since 2007. Dan holds a BSME from The Ohio State University and an MSME from Oakland University and a degree from the Chrysler Institute of Engineering for Automotive Engineering.

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

Using Multi-Vari Charts to Analyze Families of Variations


When trying to solve complex problems, you should first list all the suspected variables, then identify the few critical factors and separate them from the trivial many, which are not essential to understanding the cause.

 

     Ishikawa

 

Many statistical tools enable you to efficiently identify the effects that are statistically significant in order to converge on the root cause of a problem (for example ANOVA, regression, or even designed experiments (DOEs)). In this post though, I am going to focus on a very simple graphical tool, one that is very intuitive, can be used by virtually anyone, and does not require any prior statistical knowledge: the multi-vari chart.

What Is a Multi-Vari Chart?

Multi-vari charts present variation data in a graphical form, providing a "visual" alternative to analysis of variance.

They can help you carry out an investigation and study patterns of variation from many possible causes on a single chart. They allow you to display positional or cyclical variations in processes. They can also be used to study variations within a subgroup, between subgroups, etc.

A multi-vari chart is an excellent tool to use particularly in the early stages of a search for a root cause. Its main strength is that it enables you to visualize many diverse sources of variations in a single diagram while providing an overall view of the factor effects.

To create a multi-vari chart in Minitab Statistical Software, choose Stat > Quality Tools > Multi-Vari Chart...  Then select your response variable and up to four factors in the dialog box.

Interpreting Multi-Vari Charts

Suppose that you need to analyze waiting times from several call centers that are part of a large financial services company. Customers and potential customers call to open new accounts, get information about credit cards, ask for technical support, and access other services.

Since waiting for a long time while trying to reach an operator may become a very unpleasant experience, making sure callers get a quick response is crucial in building a trusting relationship with your customers. Your customer database has data about customer categories, types of requests, and the time of each phone call. You can use multi-vari graphs to analyze these queuing times.

The multi-vari chart below displays differences between the two call centers (Montpellier and Saint-Quentin: red points on the graph), the weekdays (green points on the graph) and the day hours (several black and white symbols). It suggests that waiting times are longer on Mondays (Mon: in the first part of the graph).

Multivari1

In the following multi-vari graph, the type of requests has been introduced. Notice that the types of request (black and white symbols) generate a large amount of variability. Again it suggests that waiting times are longer on Mondays (the first panel in this plot).

Multivari 2

In the third multi-vari graph, customer categories have been introduced (black and white symbols in the graph). Notice that for request types (the red points in the graph), technical support questions seem to require more time. Again, the queuing times tend to get longer on Mondays.

Multivari 3

In the fourth multi-vari chart, the call centers (the red points), the customer categories (the green points) and the types of requests (black and white symbols) are all displayed. Waiting times seem to be longer at the Montpellier call center. Note that each call center focuses on specific types of requests. For example, technical support calls are only processed at the Montpellier call center. Obviously, the technical support calls (represented by circles with a dot in this plot) are the main issue in this situation. 

Multivari 4

Next Steps After the Multi-Vari Chart

This financial services company needs to better understand why queuing times are longer on Mondays, and the longer waiting times for technical support calls also need to be dealt with. These conclusions are valid only if the full range of potential sources of variation has been considered.

A multi-vari chart provides an excellent visual display of the components of variation associated with each family. However, when there is no obvious dominant factor, or when the “signals” from the process are too “weak” to be detected easily, it is useful to augment the multi-vari graph with more powerful statistical techniques (such as an ANOVA or a regression analysis) to numerically estimate the effects due to each factor.
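If you want to follow the multi-vari chart with that kind of numerical check outside of Minitab, a sketch like this fits a simple ANOVA model; the file and column names are assumptions standing in for the call-center data described above:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

calls = pd.read_csv("waiting_times.csv")   # hypothetical columns: WaitTime, CallCenter, Weekday, RequestType

# Estimate numerically how much each "family" of variation contributes to waiting time.
model = smf.ols("WaitTime ~ C(CallCenter) + C(Weekday) + C(RequestType)", data=calls).fit()
print(sm.stats.anova_lm(model, typ=2))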

Finding Value in Your Fantasy Football Draft


Football!

When it comes to fantasy football, there is a common statistical term that comes up again and again. It’s "variation."

From season to season, week to week, and even quarter to quarter, NFL players can be very inconsistent. This can make selecting your fantasy team as much about luck as it is about skill. Nobody has a crystal ball that reveals who will be fantasy sleepers and fantasy busts in the upcoming season. And even if they did, they’d be keeping it to themselves and making millions in Vegas rather than writing about it on the Internet. (Can you make millions off of fantasy football in Vegas? I’m assuming yes, because...you know, Vegas.)

But all hope is not lost. In fact, we can use that variation to our advantage. And I'll use Minitab to show you how. 

The Data Analysis

I used ESPN’s projections to rank the top 100 players. But it wasn't as simple as seeing who was projected for the most points. If Aaron Rodgers is projected for the most points, but 10 other quarterbacks are projected for only 5 fewer points, it doesn't make sense to spend a high draft pick on Rodgers—you can grab a quarterback just as good later on. So we need to compare our projections for each position to an “average” player of the same position. A common way to determine the average player is to use the number of players in each position that are drafted in the top 100. I used ESPN’s average draft position to look at the first 100 picks.

On average, 13 quarterbacks are drafted in the first 100 picks. So I took the projection for the 13th pick (Eli Manning) and subtracted it from the projections of every other quarterback in the top 100. After doing this for each position, I was able to have a common value (I called it the “Value Score”) I could rank all the players on.

But I wanted to add one more step. After determining a player's ranking based on the projections, I compared that to his average draft position. Then I could see if the player appears to be overrated or underrated. For example, Wes Welker is ranked 52nd based on the projections, but currently has an average draft position of 33. So I would say that Welker is somebody you should avoid drafting! (That is, until the 52nd pick...but odds are he’ll already be drafted by somebody else by then.)
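Here's a rough sketch of that calculation in code. The file name, column names, and positional counts are assumptions for illustration, not ESPN's actual export:

import pandas as pd

players = pd.read_csv("espn_projections.csv")   # hypothetical columns: Player, Position, ProjectedPoints, ADP

# Illustrative counts of how many players at each position go in the top 100 picks.
drafted_in_top100 = {"QB": 13, "RB": 35, "WR": 35, "TE": 10}

def value_score(group):
    # Baseline = projection of the last player at this position taken in the top 100 (e.g., the 13th QB).
    baseline = group["ProjectedPoints"].nlargest(drafted_in_top100[group.name]).min()
    return group["ProjectedPoints"] - baseline

players["ValueScore"] = players.groupby("Position", group_keys=False).apply(value_score)
players["Rank"] = players["ValueScore"].rank(ascending=False)
players["Diff"] = players["ADP"] - players["Rank"]        # positive = underrated, negative = overrated
print(players.sort_values("Diff", ascending=False).head(10))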

Now, let’s break down the results!

NOTE: All of my data came from ESPN. Obviously other projections would yield slightly different results. Because of this, I wanted to also compare Yahoo’s projections. But unfortunately, you have to pay to get them this year. Because writing this blog hasn't doubled my salary yet, I decided to just stick with the free ESPN projections.

The Top 10

Rank   Player             Value Score
1      Adrian Peterson    165
2      Arian Foster       147
3      Aaron Rodgers      120
4      Marshawn Lynch     119
5      Ray Rice           118
6      Jamaal Charles     114
7      Calvin Johnson     113
8      C.J. Spiller       113
9      Trent Richardson   112
10     Doug Martin        112
No big surprises in the Top 2. The statistics say that Adrian Peterson and Arian Foster should be your top 2 picks, and it’s not even close. Their value scores are both well above the rest of the top 10, so don’t think twice about pulling the trigger on those two if you have a Top 2 pick.

But I should tell you something you don’t already know, right? Okay. Look at the value scores for the next 8 players. There is only a difference of 8 points between all of them. By the end of the season some of these players will live up to and exceed expectations, while others won’t. But because of all the variation in the sport, it’s almost impossible to say which players will do what. So how can this help you? Easy: there are two ways to use this data.

1. Take the “safe” route and pick Aaron Rodgers if you have a pick between 3 and 10. His average draft position is 9, so you’ll most likely be able to grab him no matter where your pick is. Rodgers is the “safe” play because quarterbacks are very consistent players from year to year, so picking him minimizes the variation in your pick, whereas running backs and receivers are much less consistent. Drafting a first-round running back that turns into a bust can be a disaster for your fantasy season. Drafting Rodgers would be an easy way to avoid stepping on that landmine.

2. Pull a Bill Belichick and trade down. You don’t want to trade out of the top 10 (value scores start dropping more rapidly after that), but if you have the 3rd or 4th pick, see if you can’t trade down and get the 8th, 9th or 10th. You’re not really giving up much value, and you should get extra picks in return!

The truth is, because those 8 players are so close together and we can't predict which ones will be busts, you really can't go wrong. But try to take advantage if you can!

Wide Receivers

Instead of going through all 100 players, I’m going to go through each position and pick out a few select players you should avoid, and then some that you should draft. I’m basing these selections on the differences between the average draft position and their ranking (so positive values are underrated, while negative values are overrated). Let’s start with wide receivers.

In general, the stats don’t like drafting wide receivers too early (Calvin Johnson being the one exception, which makes drafting him in the 1st round a very good option). Of the 35 wide receivers in the data analysis, 29 of them had negative differences, meaning they’re being drafted too early. The 6 receivers who aren't being overrated are:

  • Calvin Johnson
  • TY Hilton
  • Stevie Johnson
  • Mike Williams
  • Torrey Smith
  • Cecil Shorts

Other than Johnson, that’s not too great of a list. All of these receivers should be available in the 8th round and later, so I would prioritize other positions early, then aim for 2 of these players later on to give you WR depth. And I’m not saying don’t draft any wide receivers. I’m just saying only take one or two and save your depth for later.

If you do draft a wide receiver early, here are the top 5 most overrated receivers (with their difference in parentheses)

  • Danny Amendola (-47)
  • Hakeem Nicks (-32)
  • DeSean Jackson (-31, although the projection may have been made before Jeremy Maclin was injured, so take this one with a grain of salt)
  • Dwayne Bowe (-22)
  • Wes Welker (-20)

Quarterbacks

Quarterbacks are the antithesis of wide receivers, in that the statistics say they’re being drafted too late! There wasn't a single quarterback in the top 100 who was being drafted way too early (RG III was the closest, but his difference was only -3, so he’s really being drafted about where he should be). So if you miss on one of the elite quarterbacks early, it would be in your best interest to wait and take one of the following names late. Each of these 4 quarterbacks is going in the 6th round or later, but the stats say they should all be about 4th round picks.

  • Matthew Stafford
  • Russell Wilson
  • Andrew Luck
  • Tony Romo

A quick note about the last name on that list: if you’re looking for the most undervalued guy in the draft, Romo is your man. His average draft position is 77, but the projections say he should be the 38th pick. The difference of 39 is more than any other player in the top 100.

Running Backs

There isn't a lot to say here, as most running backs are being drafted close to where they should be. Only 4 running backs have a double-digit difference (two of them being overrated and two underrated). Your overrated running backs are Vick Ballard and Chris Johnson, and your underrated running backs are BenJarvus Green-Ellis and Shane Vereen. Vereen is the most drastic, having a difference of 20. With Danny Woodhead now gone from New England, Vereen could be a pleasant surprise even as the backup running back for the Patriots. Keep him in mind as you look to get running back depth.

Tight Ends

There is one name and one name only that I have to mention when it comes to tight ends: Rob Gronkowski. The statistics have him ranked at #22, one spot behind Jimmy Graham (the consensus #1 tight end). But his average draft position is 43, giving him a difference of 21 spots! Of course, the main reason for this is his injuries and how soon he’ll be able to recover. If you like to gamble, aim for Gronkowski in the 5th round. If he’s healthy all season, you just got the steal of the draft. And if he’s not, well, there are worse things than losing a 5th round pick.

But imagine if that 5th round pick came from a result of trading down your early 1st round pick! Then you just got a lower 1st round pick with almost the same value as the one you traded, and you have an extra 5th round pick to take a free gamble on Gronkowski! That’s how you win your fantasy football league.

After all, you’re never going to get rid of the variation in the NFL. But if you know how you can use it to your advantage, you can put the fantasy odds in your favor!

Hotshot Stats: Evaluating the Service Life of Smokejumper Parachutes


smokejumper

It’s wildfire season out West. Time to be in awe of the destructive power of Nature.

According to active fire maps by the USDA Forest Service, over 300 fires are now burning across a total of 1.5 million acres—including 35 large, uncontained blazes.

Shifting winds, humidity, and terrain can quickly alter a fire's intensity. In extreme conditions, flames can reach over 150 feet, with temperatures exceeding 2000° F.

This ferocious power is matched by only one thing: The incredible strength, courage, and skills of smokejumpers who parachute into remote areas to combat the deadly blazes.

But danger looms before a smokejumper even confronts a fire.

In statistics, we ask: “Can you trust your data?”

For a smokejumper, the critical initial question is: “Can you trust your parachute?”

Smokejumping + Statistics = Technical Fire Management

In the off-season, when they’re not battling wildfires, many smokejumpers pursue advanced studies in fields like fire management, ecology, forestry, and engineering.

At Washington Institute, smokejumpers and other students in the Technical Fire Management (TFM) program apply quantitative methods—often using Minitab Statistical Software—to evaluate alternative solutions to fire management problems.

Dr. Bob Loveless

“The students in this program are mid-career wildland firefighters who want to become fire managers, i.e., transition from a technical career path to a professional path in the federal government,” says Dr. Robert Loveless, a statistics instructor for the TFM program.

As part of the program, the students have to complete, and successfully defend, a project in wildland fire management. One primary analysis tool for these projects is statistics.

“Many students have no, or a limited, background in any college-level coursework,” Loveless noted. "So teaching stats can be a real challenge."

Minitab often helps students overcome that challenge. 

“Most students find using Minitab to be easy and intuitive,” Dr. Loveless told me. That helps them focus on their research objectives without getting lost in tedious calculations or a complex software interface.

Using Minitab to Evaluate the Quality of Smokejumper Parachutes

For his TFM project, Steve Stroud, Rigging and Research & Development Supervisor for the Boise Smokejumpers, used Minitab to evaluate the relationship between a smokejumper parachute’s age, the number of jumps it has been used for, and its permeability.

The permeability of a parachute is a key measure of its performance. Repeated use and handling cause the nylon fabric to degrade, increasing its permeability. If permeability becomes too high, the chute opens more slowly, the speed of descent increases, and the chute becomes less responsive to steering maneuvers. 

Not things you want to happen when you’re skydiving over the hot zone of raging wildfire.

99% Confidence Intervals for Parachute Permeability

Stroud sampled 70 smokejumper parachutes and recorded their age, number of jumps, and the permeability of cells within each parachute.

permeability tester

Permeability is measured as the airflow through the fabric in cubic feet of air per one square foot per minute (CFM). For a new parachute, industry standards dictate that the CFM should be less than 3.0 CFM. The chute can be safely used until its average permeability exceeds 12.0 CFM, at which time it’s considered unsafe and should be removed from service.

Using the descriptive statistics command in Minitab, the study determined: 

  • Smokejumpers could be 99% confident that the mean permeability in unused parachutes (0-10 years old, with no jumps) was between 1.99 and 2.31 CFM, well within industry standards.

  • Only one unused parachute, an outlier, had a cell with a CFM greater than 3.0 (3.11). Although never used in jumps, this parachute was 10 years old and had been packed and repacked repeatedly.

  • For used parachutes (0-10 years old, with between 1-140 jumps), smokejumpers could be 99% confident that the mean permeability of the parachutes was between 4.23 and 4.61 CFM. The maximum value in the sample, 9.88, was also well below the upper limit of 12.0 CFM. (A rough sketch of this kind of interval calculation follows this list.)
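For anyone curious how such an interval is computed outside of Minitab, here's a minimal sketch with made-up CFM readings (the real study pooled cell measurements from 70 chutes):

import numpy as np
from scipy import stats

# Made-up permeability readings (CFM) for used chutes; not the study's data.
cfm = np.array([4.1, 4.8, 3.9, 5.2, 4.4, 4.7, 4.0, 4.6, 4.3, 4.9])

low, high = stats.t.interval(0.99, df=len(cfm) - 1, loc=cfm.mean(), scale=stats.sem(cfm))
print(f"99% CI for the mean permeability: {low:.2f} to {high:.2f} CFM")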
Regression Analysis to Estimate Parachute Service Life

The service life for the smokejumper parachutes was 10 years at that time. However, this duration was based on a purchase schedule used by the U.S. military for a different type of parachute with different usage. Smokejumpers use a special rectangular, ram-air parachute made of a pressurized fabric airfoil.

ram air chute

Stroud wanted to determine a working service life appropriate for the expected use and wear of smokejumper chutes. Using Minitab’s regression analysis, he developed a model to predict the permeability of smokejumper parachutes based on the number of jumps and age (in years). (A logarithmic transformation was used to stabilize the nonconstant variance shown by the Minitab residual plots.)

-------------------------------------------------------------------------

Regression Analysis: logPerm versus logJumps, logAge

The regression equation is
logPerm = 0.388 + 0.198 logJumps + 0.170 logAge

Predictor     Coef       SE Coef    T        P
Constant      0.38794    0.02859    13.57    0.000
logJumps      0.197808   0.007920   24.97    0.000
logAge        0.17021    0.01704     9.99    0.000

S = 0.196473   R-Sq = 76.6%   R-Sq(adj) = 76.4%

Analysis of Variance
Source            DF      SS        MS        F         P
Regression         2    56.201    28.100    727.96    0.000
Residual Error   446    17.216     0.039
Total            448    73.417

-------------------------------------------------------------------------

Both predictors, the number of jumps (log) and the age of the parachute (log), were statistically significant (P = 0.000). The coefficient for logJumps (0.198) was greater than the coefficient for logAge (0.170), indicating that the number of jumps is a stronger predictor of a parachute's permeability than its age. The R-Sq value indicates the model explains approximately 76.6% of the variation in parachute permeability. 

Using the fitted model, the permeability of the chutes can be predicted for a given number of jumps and age. Based on 99% prediction intervals for new observations, the study concluded that the service life of chutes could be extended safely to 20 years and/or 300 jumps before the permeability of any single parachute cell reached an upper prediction limit of 12 CFM.
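A model of the same form can be sketched outside of Minitab along these lines. The file name, column names, and the choice of base-10 logs are assumptions for illustration, not the study's actual data set:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the study's data.
chutes = pd.read_csv("parachute_permeability.csv")   # columns: Perm, Jumps, Age

fit = smf.ols("np.log10(Perm) ~ np.log10(Jumps) + np.log10(Age)", data=chutes).fit()

# 99% prediction interval for a single chute with 300 jumps at 20 years of age.
new = pd.DataFrame({"Jumps": [300], "Age": [20]})
pred = fit.get_prediction(new).summary_frame(alpha=0.01)
print(10 ** pred[["obs_ci_lower", "obs_ci_upper"]])   # back-transformed to CFM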

By adopting this extended service life, Stroud estimated they could save over $700,000 in budget costs over a period of 20 years, while still ensuring the safety of the chutes.

Follow-up: One Good Analysis Leads to Another

Smokejumper B

Stroud’s TFM student research project, completed in 2010, provided the impetus for further investigation and potential policy change in two federal agencies.  

“The permeability-based service life paper has been implemented in the Bureau of Land Management,” says Stroud, who is now Assistant Loft Supervisor for the BLM smokejumpers.  “We are in our 4th year of truth testing the model. It has been working very well.”

“The Forest Service (MTDC) has also taken it and is in the process of gathering their initial data set to run the model based on their usage and wear characteristics. Once completed they will be going base to base to gather intelligence for the Forest Service Jumpers.”

Piggybacking off Stroud's original research, those analyses may help put smokejumpers, who face threats from both hot zones and budget constraints, safely into the black.

Acknowledgements: Many thanks to both Bob Loveless and Steve Stroud for their invaluable contributions to this post. True to reputation, smokejumpers strive to go beyond the call of duty: Steve e-mailed me his assistance while he was thousands of feet up in the air in a smokejumping plane, with the sign-off “Sent from Space!”

Source:  Stroud, S. Permeability Based Analysis on the BLM Smokejumper Parachute. Washington Institute, Technical Fire Management student projects database.

Photo credits: Smokejumper photos courtesy of Mike McMillan



Using Minitab Statistical Software to Analyze the Woeful Bengals


Bengals

by Jeff Parks, guest blogger

Being a Cincinnati Bengals fan is tough. It's true that Bengals fans don't have it as bad as, say, long-suffering Chicago Cubs fans...nevertheless, the Bengals haven’t won a playoff game since January 1991. That's currently the longest streak in the NFL. In the 1990s they were voted the worst sports franchise by ESPN. Not the worst football team, mind you, but the worst franchise in all of sports.

Not the L.A. Clippers. Not the Cleveland Browns. Not the Pittsburgh Pirates.

The Cincinnati Bengals.

Why? Why must it be so? What separates the Bengals from the good teams in the NFL? 

During the 1980s they went to the Super Bowl twice. Once they were within about 39 seconds of winning the whole thing. In the 1970s they were competitive with the great Pittsburgh Steelers dynasty, year-in and year-out, for AFC North supremacy.

So what happened?

It was a question like this that sent me on the cathartic journey of writing a book, Applying Six Sigma Tools to the Woeful Bengals: A Fan Laments.

As a Six Sigma Master Black Belt for the past 12 years, I've worked on more than 350 projects in over 15 industries...surely I could bring some of what I know about process improvement to find some way—any way—to improve them “Who Dey” Bengals.

I started this venture by postulating (the “Define” phase of DMAIC, if you will): what would the Bengals need to do to be more like today's AFC Champions—the teams that play in the Super Bowl like the Bengals once did?

Let’s start with the win-loss record over the past 20-odd years. From 1991 through 2012, the Bengals have averaged a record of

6 wins
10 losses

for a winning percentage of 37%.  That’s right—37%.

Now look at the percentages for the AFC Champs over that same time period:

YEAR   AFC CHAMP                  WINS   LOSSES   TIES   WINNING PCT
1991   Buffalo Bills               13      3       0       81.25%
1992   Buffalo Bills               11      5       0       68.75%
1993   Buffalo Bills               12      4       0       75.00%
1994   San Diego Chargers          11      5       0       68.75%
1995   Pittsburgh Steelers         11      5       0       68.75%
1996   New England Patriots        11      5       0       68.75%
1997   Denver Broncos†             12      4       0       75.00%
1998   Denver Broncos†             14      2       0       87.50%
1999   Tennessee Titans            13      3       0       81.25%
2000   Baltimore Ravens†           12      4       0       75.00%
2001   New England Patriots†       11      5       0       68.75%
2002   Oakland Raiders             11      5       0       68.75%
2003   New England Patriots†       14      2       0       87.50%
2004   New England Patriots†       14      2       0       87.50%
2005   Pittsburgh Steelers†        11      5       0       68.75%
2006   Indianapolis Colts†         12      4       0       75.00%
2007   New England Patriots        16      0       0      100.00%
2008   Pittsburgh Steelers†        12      4       0       75.00%
2009   Indianapolis Colts          14      2       0       87.50%
2010   Pittsburgh Steelers         12      4       0       75.00%
2011   New England Patriots        13      3       0       81.25%
2012   Baltimore Ravens†           10      6       0       62.50%

AVERAGE                            12      4       0       76.70%

So, on average, the AFC champs won twice as many games (12) as the Bengals did (6) over those 20-odd years from 1991.

We can use Minitab to superimpose those two win distributions (the Bengals and the AFC champs) on the same graph. 

Bengals vs AFC Champs

The Bengals in essence need to:

  • Move the above curve to the right (i.e., increase their average wins per season more in line with the AFC champs).
  • Decrease the width of the curve (i.e., be more consistent in the wins each season).
  • In other words, “shift and narrow” the curve.

A good question to ask right about now would be: “What does it take to produce a good winning percentage—12 or more games in a 16-game schedule—in the NFL?”  It has been said that defense wins championships, but is that really true? To find out, I pulled data from the past 10 years for all NFL teams from the link below:

https://nfldata.com/nfl-stats/team-stats.aspx?sn=14&c=0&dv=0&ta=0&tc=0&st=PointsPerGame&d=1&ls=PointsPerGame

Then I used Minitab to do a regression analysis on “defensive points/game” (how many points does a team’s defense allow each game) as well as “offensive points/game” (how many points does a team’s offense score each game) as “X” or “independent variables.” I wanted to see if any correlation exists for my “Y” or “dependent variable” of “Winning percentage” (number of wins each year divided by 16 total games in a season). My analysis in Minitab produced the following output:

Bengals Regression Analysis

Points per game (Pts/G) for both offense and defense are statistically significant, and the adjusted R-squared value shows the model explains 83.8% of the variation in winning percentage (not too bad of a model).
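For readers who want to try the same kind of model outside of Minitab, here's a minimal sketch; the file and column names are assumptions, not the actual data set from the link above:

import pandas as pd
import statsmodels.formula.api as smf

teams = pd.read_csv("nfl_team_stats.csv")   # hypothetical columns: WinPct, OffPtsPerGame, DefPtsPerGame

fit = smf.ols("WinPct ~ OffPtsPerGame + DefPtsPerGame", data=teams).fit()
print(fit.params)          # expect a positive offense coefficient and a negative defense coefficient
print(fit.rsquared_adj)    # the post's model explained about 83.8% of the variation

# Compare the sizes of the two effects, ignoring sign.
coefs = fit.params.drop("Intercept").abs()
print(coefs["DefPtsPerGame"] / coefs["OffPtsPerGame"])   # >1 would mean defense matters (slightly) more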

But which is more important: offense or defense?  Notice the coefficients, the numbers in front of “DEFENSE Pts/G” and “OFFENSE Pts/G” in the output above. These values tell us how much each variable impacts winning percentage (our “Y”).

Since the absolute value of the defense coefficient (0.0279) is larger than the offense coefficient (0.0273), we know that defense DOES matter more, but not by much.

(Note: the fact that the defensive coefficient is negative, -0.0279, only means that as the defense allows more points per game, the winning percentage goes down. Likewise, if the offense's points per game goes up, so does the winning percentage. This should be intuitive, as when any defense stops an opponent from scoring, the defense's points allowed per game goes down—and that’s a good thing.)

By comparing the two coefficients (their absolute values), we can say that defensive points per game has about a 2.2% greater impact on winning percentage than offensive points per game does.

Maybe defense does win championships, but not by much.

Now that we know how much defense matters, to really help the Bengals we would need to do a deeper dive into which aspects of the Bengals defense are so lacking when compared to the defense of the AFC Champions. And since the two variables of offense/defense pts/game are so close, we would want to do the same thing for the Bengals’ offense.

For instance, I was able to determine that there is a statistically significant difference between the number of sacks the Bengals get each game compared to the AFC Champions over the past 10 years:

Bengals Paired T Test

As I explain in my book, by using Minitab for hypothesis testing, capability analysis, regression, and graphing, I was able to come up with some specific, precise items that the Bengals need to address. (For instance, the sack difference above is totally attributed to the linebackers. The sacks from the defensive line, corner backs and safeties are on par with the AFC Champion teams.)

Will they do it?

I don’t know but I emailed a copy of my book to Paul Brown, the Bengals' general manager—so one can only hope, right?

 

About the Guest Blogger:

Jeff Parks has been a Lean Six Sigma Master Black Belt since 2002 and involved in process improvement work since 1997. He is a former Navy Nuclear Submarine Officer and lives in Louisville, KY with his wife and 7 children. He can be reached at Jwparks407@hotmail.com and via Twitter, @JeffParks3. 

 

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com.

 

Photograph of Bengals quarterback Andy Dalton by Melissa Batson, used under Creative Commons 3.0 license.

 

Analyzing Baseball Park Factors: Home of the San Francisco Giants


AT&T Park, June 30, 2012 by Peter Thomsen

Because I didn't trust the numbers on the ESPN web site, I calculated my own park factors using their formula. There are a lot of interesting ways to look at the numbers, but one of the first things I want to do is focus in on a classic statistics lesson:

One statistic rarely tells the whole story.

Let’s focus on AT&T Park, home of the San Francisco Giants. ParkFactors.com notes that “Overall, AT&T Park plays as a neutral park, with summer days favoring hitters but the damp nighttime air being particularly helpful to pitchers. As an open-air park by the Bay, the park is also quite subject to variable winds.” The variable winds and the day-night difference might help explain why AT&T Park has an unusual property over the last 7 years. On average, it’s a pitcher’s park with a mean park factor of about 0.97. Typically, it’s a hitter's park with a median park factor of 1.03.

Does that sound a little odd to you?  I hope so. Unless we use Minitab to get a more thorough picture of the data, it almost seems like nonsense.

We can see what's going on by looking at the data using individual value plots in Minitab. Here’s the plot of the AT&T park factors from 2006-2012 with the mean shown by the square. A mean below 1.00 indicates a park that favors pitchers.

Park factors and mean at AT&T Park

Now, here’s the same graph with the median shown by a square. Because the median is larger than 1.00, we would say that AT&T Park typically favors hitters.

Park factors and median for AT&T Park

So what’s happening? In 4 of the 7 seasons, the park factor favors hitters, so the median favors hitters. But in 2 of the 3 seasons that favor pitchers, the factors are much further from neutral than in any of the years that favor hitters. These low park factors pull the mean down, which is why the mean indicates that the park favors pitchers.
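You can see the same effect with a few made-up numbers shaped like this pattern (these are not the actual AT&T Park factors): several mildly hitter-friendly seasons and a couple of strongly pitcher-friendly ones pull the mean and the median in opposite directions.

import numpy as np

# Made-up park factors that mimic the pattern described above (not the real AT&T Park values).
factors = np.array([1.03, 1.04, 1.02, 1.05, 0.98, 0.85, 0.82])

print(np.mean(factors))    # 0.97: dragged below 1.00 by the two extreme pitcher-friendly seasons
print(np.median(factors))  # 1.02: stays above 1.00 because most seasons favored hitters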

In fact, it's hard to say that either the mean or median, both of which are close to neutral, really give a good picture of what happens at AT&T Park. The Minitab graphs, which show how non-neutral the park can be, are essential to understanding it.

We have more data than just the park factors though. We also have time. While the main description on ParkFactors.com classifies AT&T Park as neutral, the classification based on the most recent 3-year average indicates that AT&T Park is an extreme pitcher's park. A time-series plot of the data by year puts what happened in 2012 and 2011 in stark relief.

Park factors for AT&T Park 2006-2012

When you use statistical software such as Minitab to explore the data more deeply, you get a lot more information than if you rely on any single measure. Graphing the data over time prompts us to ask: What’s happening in 2011 and 2012? Did something change about the team? Did something change about the park?

I’m going to explore those questions more deeply next time.

The image of AT&T Park is by Peter Thomsen and licensed for reuse under this Creative Commons License.

Curve Fitting with Linear and Nonlinear Regression


Fitted line plot with misspecified linear model

We often think of a relationship between two variables as a straight line. That is, if you increase the predictor by 1 unit, the response always increases by X units. However, not all data have a linear relationship, and your model must fit the curves present in the data.

This fitted line plot shows the folly of using a line to fit a curved relationship!

How do you fit a curve to your data? Fortunately, Minitab statistical software includes a variety of curve-fitting methods in both linear regression and nonlinear regression.

To compare these methods, I’ll fit models to the somewhat tricky curve in the fitted line plot. For our purposes, we’ll assume that these data come from a low-noise physical process that has a curved function. We want to accurately predict the output given the input. Here are the data to try it yourself!

Fitting Curves with Polynomial Terms in Linear Regression

The most common way to fit curves to the data using linear regression is to include polynomial terms, such as squared or cubed predictors.

Typically, you choose the model order by the number of bends you need in your line. Each increase in the exponent produces one more bend in the curved fitted line. It’s very rare to use more than a cubic term.

Model order examples: linear, quadratic, and cubic fitted lines

The graph of our data appears to have one bend, so let’s try fitting a quadratic linear model using Stat > Fitted Line Plot.
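If you'd like to try the same fit outside of Minitab, here's a minimal sketch; the file and column names are assumptions standing in for the Input/Output data used in this post:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the Input/Output columns.
df = pd.read_csv("curve_data.csv")

quad = smf.ols("Output ~ Input + I(Input**2)", data=df).fit()   # I() adds the squared term
print(quad.rsquared, quad.params)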

Fitted line plot with quadratic model

While the R-squared is high, the fitted line plot shows that the regression line systematically over- and under-predicts the data at different points in the curve. This shows that you can’t always trust a high R-squared.

Let’s see if we can do better.

Fitting Curves with Reciprocal Terms in Linear Regression

If your response data descends down to a floor, or ascends up to a ceiling as the input increases (e.g., approaches an asymptote), you can fit this type of curve in linear regression by including the reciprocal (1/X) of one or more predictor variables in the model. More generally, you want to use this form when the size of the effect for a predictor variable decreases as its value increases.

Because the slope is a function of 1/X, the slope gets flatter as X increases. For this type of model, X can never equal 0 because you can’t divide by zero.

Looking at our data, it does appear to be flattening out and approaching an asymptote somewhere around 20.

I used Calc > Calculator in Minitab to create a 1/Input column (InvInput). Let’s see how that works! I fit it with both a linear (top) and quadratic model (bottom).
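The same step can be sketched in code; this continues the hypothetical curve_data.csv example above, where the reciprocal column plays the role of Calc > Calculator's InvInput:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("curve_data.csv")                          # same hypothetical Input/Output data as above
df["InvInput"] = 1 / df["Input"]                            # the Calc > Calculator step
recip_quad = smf.ols("Output ~ InvInput + I(InvInput**2)", data=df).fit()
print(recip_quad.rsquared, np.sqrt(recip_quad.mse_resid))   # R-squared and S, for comparison with the other models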

Model with reciprocal term 

 Model with quadratic reciprocal term

For this particular example, the quadratic reciprocal model fits the data much better. The fitted line plots change the x-axis to 1/Input, so it’s hard to see the natural curvature of the data.

In the scatterplot below, I used the equations to plot fitted points for both models in the natural scale. The green data points clearly fall closer to the quadratic line.

Scatterplot to compare models with reciprocal terms

Compared to the quadratic model, the reciprocal model with the quadratic term has a lower S value (good), a higher R-squared (good), and it doesn’t exhibit the biased predictions we saw earlier. So far, this is our best model.
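
Here is a rough Python equivalent of the reciprocal-term approach, again with hypothetical stand-in data, comparing S for the linear and quadratic reciprocal fits:

import numpy as np

# Hypothetical stand-ins for the Input and Output columns
x = np.linspace(1, 10, 20)
y = 20 - 15 * np.exp(-0.4 * x)

inv_x = 1.0 / x                               # the InvInput column

def s_value(coeffs, xvals, yvals):
    """Residual standard deviation (S): smaller means points fall closer to the fit."""
    resid = yvals - np.polyval(coeffs, xvals)
    return np.sqrt(np.sum(resid**2) / (len(yvals) - len(coeffs)))

lin_coeffs = np.polyfit(inv_x, y, deg=1)      # y = b0 + b1*(1/x)
quad_coeffs = np.polyfit(inv_x, y, deg=2)     # y = b0 + b1*(1/x) + b2*(1/x)^2

print("S, linear reciprocal:   ", round(s_value(lin_coeffs, inv_x, y), 4))
print("S, quadratic reciprocal:", round(s_value(quad_coeffs, inv_x, y), 4))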

Transforming the Variables with Log Functions in Linear Regression

A log transformation is a relatively common method that allows linear regression to perform curve fitting that would otherwise only be possible in nonlinear regression.

For example, the nonlinear function:

Y = e^B0 * X1^B1 * X2^B2

can be expressed in linear form of:

ln Y = B0 + B1 ln X1 + B2 ln X2

You can take the log of both sides of the equation, like above, which is called the double-log form. Or, you can take the log of just one side, known as the semi-log form. If you take the logs on the predictor side, it can be for all or just some of the predictors.

Log functional forms can be quite powerful, but there are too many combinations to get into detail in this overview. The choice of double-log versus semi-log (for either the response or predictors) depends on the specifics of your data and subject area knowledge. In other words, if you go this route, you’ll need to do some research.
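
As a quick illustration (not part of the original analysis), here is how the semi-log and double-log forms look as simple transformations in Python, using hypothetical data:

import numpy as np

# Hypothetical data that flattens out as x increases
x = np.linspace(1, 10, 20)
y = 20 - 15 * np.exp(-0.4 * x)

# Semi-log form (log of the predictor only): y = b0 + b1*ln(x)
semi = np.polyfit(np.log(x), y, deg=1)

# Double-log form (log of both sides): ln(y) = b0 + b1*ln(x)
double = np.polyfit(np.log(x), np.log(y), deg=1)

print("semi-log:   y = %.3f + %.3f*ln(x)" % (semi[1], semi[0]))
print("double-log: ln(y) = %.3f + %.3f*ln(x)" % (double[1], double[0]))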

Let’s get back to our example. For data where the curve flattens out as the predictor increases, a semi-log model that transforms the relevant predictor(s) can provide a good fit. Let’s try it!

Minitab’s fitted line plot conveniently has the option to log-transform one or both sides of the model. So I’ve transformed just the predictor variable in the fitted line plot below.

Fitted line plot with semi-log model

Visually, we can see that the semi-log model systematically over- and under-predicts the data at different points in the curve, just like the quadratic model. The S and R-squared values are also virtually identical to that model.

So far, the linear model with the reciprocal terms still provides the best fit for our curved data.

Fitting Curves with Nonlinear Regression

Nonlinear regression can be a powerful alternative to linear regression because it provides the most flexible curve-fitting functionality. The trick is to find the nonlinear function that best fits the specific curve in your data. Fortunately, Minitab provides tools to make that easier.

In the Nonlinear Regression dialog (Stat > Regression > Nonlinear Regression), enter Output for Response. Next, click Use Catalog to choose from the nonlinear functions that Minitab supplies.

We know that our data approaches an asymptote, so we can click on the two Asymptotic Regression functions. The concave version matches our data more closely. Choose that function and click OK.

Nonlinear Regression Catalog of functions

Next, Minitab displays a dialog where we choose our predictor.

Nonlinear Regression predictors dialog box

Enter Input, click OK, and we’re back at the main dialog.

If we click OK in the main dialog, Minitab displays the following dialog:

Nonlinear Regression Parameters dialog

Unlike linear regression, nonlinear regression uses an algorithm to find the best fit step-by-step. We need to supply the starting values for each parameter in the function. Shoot, I don’t have any idea! Fortunately, Minitab makes it easy.

Nonlinear function

Let’s look back at the function we chose. The picture makes it easier!

Notice that Theta1 is the asymptote, or the ceiling, that our data approaches. Judging by the initial scatterplot, that’s about 20 for our data. For a case like ours, where the response approaches a ceiling as the predictor increases, Theta2 > 0 and Theta3 > 0.

Consequently, I’ll enter the following in the dialog:

  • Theta1: 20
  • Theta2: 1
  • Theta3: 1

After we enter these values, we go back to the main dialog, click OK, and voila!

Fitted line plot with nonlinear model

Nonlinear model summary information

Minitab doesn’t provide R-squared for nonlinear regression because it isn’t a valid measure there, but the S value for the nonlinear model (0.179746) is nearly as small as that for the reciprocal model (0.134828). You want a small S because it means the data points fall closer to the curved fitted line. The nonlinear model also doesn’t have a systematic bias.
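
For reference, here is a sketch of the same kind of fit with scipy, assuming the concave asymptotic form y = Theta1 - Theta2*exp(-Theta3*x), which rises toward the ceiling Theta1 (the exact parameterization in Minitab’s catalog may differ, and the data below are hypothetical):

import numpy as np
from scipy.optimize import curve_fit

def asymptotic(x, theta1, theta2, theta3):
    # Rises toward the ceiling theta1 as x increases (theta2 > 0, theta3 > 0)
    return theta1 - theta2 * np.exp(-theta3 * x)

# Hypothetical Input/Output data
x = np.linspace(1, 10, 20)
y = 20 - 15 * np.exp(-0.4 * x) + np.random.normal(0, 0.1, 20)

# p0 plays the role of the starting values entered in the dialog above
params, _ = curve_fit(asymptotic, x, y, p0=[20, 1, 1])

residuals = y - asymptotic(x, *params)
s = np.sqrt(np.sum(residuals**2) / (len(y) - len(params)))   # S, the residual standard error
print("theta estimates:", np.round(params, 3))
print("S:", round(s, 4))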

Comparing the Curve-Fitting Effectiveness of the Different Models

Model                    R-squared    S          Biased fits
Reciprocal - Quadratic   99.9         0.134828   No
Nonlinear                N/A          0.179746   No
Quadratic                99.0         0.518387   Yes
Semi-Log                 98.6         0.565293   Yes
Reciprocal - Linear      90.4         1.49655    Yes
Linear                   84.0         1.93253    Yes

The linear model with the quadratic reciprocal term and the nonlinear model both beat the other models. These top two models produce equally good predictions for the curved relationship. However, the linear regression model with the reciprocal terms also produces p-values for the predictors (all significant) and an R-squared (99.9%), none of which you can get for a nonlinear regression model.

For this example, these extra statistics can be handy for reporting, even though the nonlinear results are equally valid. However, when the nonlinear model clearly provides the better fit, go with it.

Closing Thoughts

If you have a difficult curve to fit, finding the correct model may seem like an overwhelming task. However, after all the work of collecting the data, it’s worth the extra effort to find the best fit possible.

When specifying any model, you should let theory and subject-area knowledge guide you. Some areas have standard practices and functions to model the data.

While you want a good fit, you don’t want to artificially inflate the R-squared with an overly complicated model. Be aware that a model that chases every wiggle in the sample can end up fitting the noise rather than the underlying relationship, and it will predict new observations poorly.

Creating a Chart to Compare Month-to-Month Change


One member of Minitab's LinkedIn group recently asked this question:

I am trying to create a chart that can monitor change by month. I have 2012 data and want to compare it to 2013 data...what chart should I use, and can I auto-update it? Thank you. 

As usual when a question is asked, the Minitab user community responded with some great information and helpful suggestions. Participants frequently go above and beyond, answering not just the question being asked, but raising issues that the question implies.  For instance, one of our regular commenters responded thus: 

There are two ways to answer this inquiry...by showing you a solution to the specific question you asked or by applying statistical thinking arguments such as described by Donald Wheeler et al and applying a solution that gives the most instructive interpretation to the data.

In this and subsequent posts, I'd like to take a closer look at the various suggestions group members made, because each has merits. First up: a simple individuals chart of differences, with some cool tricks for instant updating as new data becomes available. 

Individuals Chart of Differences

An easy way to monitor change month-by-month is to use an individuals chart. Here's how to do it in Minitab Statistical Software, and if you'd like to play along, here's the data set I'm using. If you don't already have Minitab, download the free 30-day trial version.

I need four columns in the data sheet: month name, this year's data, last year's data, and one for the difference between this year and last. I'm going to right-click on the Diff column, and then select Formulas > Assign Formula to Column..., which gives me the dialog box below. I'll complete it with a simple subtraction formula, but depending on your situation a different formula might be called for:

 assign formula to column

With this formula assigned, as I enter the data for this year and last year, the difference between them will be calculated on the fly. 

data set

Now I can create an Individuals Chart, or I Chart, of the differences. I choose Stat > Control Charts > Variables Charts for Individuals > Individuals... and simply choose the Diff column as my variable. Minitab creates the following graph of the differences between last year's data and this year's data: 

Individuals Chart
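
If you want to reproduce the idea outside Minitab, here is a minimal sketch that computes the differences and the standard individuals-chart limits (mean ± 2.66 × average moving range); the monthly values are hypothetical:

import numpy as np

# Hypothetical monthly values for January through September
this_year = np.array([52, 48, 55, 50, 49, 53, 51, 47, 54], dtype=float)
last_year = np.array([50, 49, 51, 52, 48, 50, 49, 50, 52], dtype=float)

diff = this_year - last_year                      # the "Diff" column
moving_range = np.abs(np.diff(diff))              # gaps between consecutive points
mr_bar = moving_range.mean()

center = diff.mean()
ucl = center + 2.66 * mr_bar                      # standard I-chart control limits
lcl = center - 2.66 * mr_bar

print("CL = %.2f, UCL = %.2f, LCL = %.2f" % (center, ucl, lcl))
print("points outside the limits:", diff[(diff > ucl) | (diff < lcl)])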
 

Updating the Individuals Chart Automatically

Now, you'll notice that when I started, I only had this year's data through September. What happens when I need to update it for the whole year?  Easy - I can return to the data sheet in January to add in the data from the last quarter. As I do, my Diff column uses its assigned formula (indicated by the little green cross in the column header) to calculate the differences: 

auto-updated worksheet


Now if I look at the I-chart I created earlier, I see a big yellow dot in the top-left corner.

automatic update for an individuals chart

When I right-click on that yellow dot and choose "Automatic Updates," as shown in the image above, Minitab automatically updates my Individuals chart with the information from the final three months of the year: 

automatically updated i chart

Whoa!  It looks like we might have some special-cause variation happening in that last month of the year...but at least I can use the time I've saved by automatically updating this chart to start investigating that! 

In my next post, we'll try another way to look at monthly differences, again following the suggestions offered by the good people on Minitab's LinkedIn group. 

 

Creating Charts to Compare Month-to-Month Change, part 2


A member of Minitab's LinkedIn group recently asked how to create a chart to monitor change by month, specifically comparing last year's data to this year's data. My last post showed how to do this using an Individuals Chart of the differences between this year's and last year's data.  Here's another approach suggested by a participant in the group. 

Applying Statistical Thinking

An individuals chart of the differences between this year's data and last year's might not be our best approach. Another approach is to look at all of the data together.  We'll put this year's and last year's data into a single column and see how it looks in an individuals chart. (Want to play along? Here's my data set, and if  you don't already have Minitab, download the free 30-day trial version.)

We'll choose Stat > Control Charts > Variables Charts for Individuals > Individuals... and choose the "2 years" column in my datasheet as the variable. Minitab creates the following I chart: 

i chart of two years

Now we can examine all of the data sequentially and ask some questions about it. Are there outliers? The data seem remarkably consistent, but those points in December (12 and 24) warrant more investigation as potential sources of special cause variation. If investigation revealed a special cause that justified disregarding these points, they could be omitted from the calculations for the center line and control limits, or removed from the chart altogether.

What about seasonality, or a trend over the sequence? Neither issue affects this data set, but if they did, we could detrend or deseasonalize the data and chart the residuals to gain more insight into how the data are changing month-to-month.  

I-MR Chart

Instead of an Individuals chart, one participant in the group suggested using an I-MR chart, which provides both the individuals chart and a moving-range chart.  We can use the same single column of data, then examine the resulting I-MR chart for indications of special cause variation. "If not, there's no real reason to believe one year was different than another," this participant suggests. 

Another thing you can do with most of the control charts in Minitab is establish stages.  For example, if we want to look for differences between years, we can add a column of data (call it "Year") to our worksheet that labels each data point by year (2012 or 2013).  Now when we select Stat > Control Charts > Variables Charts for Individuals > I-MR..., we will go into the Options dialog and select the Stages tab.  

I-MR Chart stage dialog

As shown above, we'll enter the "Year" column to define the stages. Minitab produces the following I-MR chart:

I-MR Chart with Stages  

This I-MR chart displays the data in two distinct phases by year, so we can easily see if there are any points from 2013 that are outside the limits for 2012. That would indicate a significant difference. In this case, it looks like the only 2013 point that falls outside the control limits is December 2013, and we already know there's something we need to investigate for the December data.

Time Series Plot 

For the purposes of visual comparison, some members of the Minitab group on LinkedIn advocate the use of a time series plot. To create this graph, we'll need two columns in the data sheet, one for this year's data and one for last year's.  Then we'll choose Graph > Time Series Plot > Multiple and select the "Last Year" and "This Year" columns for our series. Minitab gives us the following plot: 

Time Series Plot

Because this year's and last year's data are plotted in parallel, it's very easy to see where and by how much they differ over time.

Most of the months appear to be quite close for these data, but once again this graph gives us a dramatic visual representation of the difference between the December data points, not just as compared to the rest of the year, but compared to each other from last year to this. 

Oh, and here's a neat Minitab trick: what if you'd rather have the Index values of 1, 2, 3...12 in the graph above appear as the names of the months?  Very easy!  Just double-click on the X axis, which brings up the Edit Scale dialog box. Click on the Time tab and fill it out as follows: 

Edit the time scale of your graph

(Note that our data start with January, so we use 1 for our starting value. If your data started with the month of February, you'd choose to start with 2, etc.)  Now we just click OK, and Minitab automatically updates the graph to include the names of the months:  

Time Series Plot with Months

The Value of Different Angles

One thing I see again and again on the Minitab LinkedIn group is how a simple question -- how can I look at change from month to month between years? -- can be approached from many different angles.  

What's nice about using statistical software is that we have the speed and power to quickly and easily follow up on all of these angles, and see what different things each approach can tell us about our data. 

 

A correspondence table for nonparametric and parametric tests


Most of the data that one can collect and analyze follow a normal distribution (the famous bell-shaped curve). In fact, the formulae and calculations used in many analyses simply take it for granted that our data follow this distribution; statisticians call this the "assumption of normality."

For example, our data need to meet the normality assumption before we can accept the results of a one- or two-sample t (Student) or z test. Therefore, it is generally good practice to run a normality test before performing the hypothesis test.

But wait...according to the Central Limit Theorem, when the sample size is larger than 30, normality is not a crucial prerequisite for a standard t (Student) or z hypothesis test: even though the individual values within a sample might follow an unknown, non-normal distribution, the sample means (as long as the sample sizes are at least 30) will follow a normal distribution.
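
A quick simulation makes the point; this is just an illustrative sketch, and the exponential distribution and sample sizes are arbitrary choices rather than data from any study:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Individual values from a strongly right-skewed distribution...
individuals = rng.exponential(scale=5, size=10_000)

# ...versus the means of many samples of size 30 from the same distribution
sample_means = rng.exponential(scale=5, size=(10_000, 30)).mean(axis=1)

print("skewness of individual values:", round(stats.skew(individuals), 2))   # about 2
print("skewness of sample means:     ", round(stats.skew(sample_means), 2))  # near 0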

 Central Limit Theorem

 

Moreover, some tests are more robust to departures from normality. For example, if you use the Minitab Assistant, a two-sample t test requires only 15 values per sample: if the sample size is at least 15, normality is not an issue and the test is accurate even with non-normal data. Similarly, in the Minitab Assistant, a one-sample t test requires only 20 values in the sample. The reason is that the tests available in the Minitab Assistant have been modified to make them more robust to departures from normality.

What can you do when your sample sizes are smaller than these thresholds and your data are not normally distributed? The remaining option is to use a nonparametric test. A nonparametric test is not based on any theoretical distribution, so when all other options are exhausted, you can still rely on one.

In the service sector, for example, durations are often analyzed to improve processes (reducing waiting times, queuing times, lead times, and payment times, and replying faster to customer requests…). How long we wait for something is an important aspect of the customer experience, and ultimately influences customer satisfaction. Typically, duration times will not follow a normal distribution.

Non Normal distribution

The P value in the probability plot above is smaller than 0.05, indicating that the data points do not follow a normal distribution. We can see a very significant curvature in the normal probability plot, and the points clearly do not follow the normal probability line. The histogram shows that the distribution is highly skewed to the right; also, the sample size is quite small (14).

This data set is an ideal candidate for a nonparametric approach.

But which nonparametric test do we need to use in this situation? The correspondence table below shows how each nonparametric test (in Minitab, choose Stat > Nonparametrics) is related to a parametric test. This table provides a guideline for choosing the most appropriate nonparametric test in each case, along with the main characteristics of each test.

Correspondence table

 

Variation Amplification: Even a 3-Year-Old Understands It...Do You?


This weekend my 3-year-old son and I were playing with his marble run set, and he said to me, "The marbles start together, but they don't finish together!"

It dawned on me that the phenomenon he was observing seems so obvious in the context of a marble run, and yet many practitioners fail to see the same thing happening in their processes.  I quickly made a video of me placing six marbles in simultaneously so I could illustrate to others what I will call "variation amplification:"

It is obvious in the video that there is little variation in the positions of the marbles at the beginning, but as they progress through the run, the variation in their times becomes larger and larger. These facts are obvious even to a 3-year-old:

  • The balls spread out as they progress
  • Certain parts of the run cause the balls to spread out more than others
  • The balls do not finish in the same order that they started
  • Some pieces allow balls to change position while others do not

To help further illustrate some of these points, here is a graph (created in Minitab Statistical Software) of each of the six balls at various points in the run:

Time versus Position

At this point, these facts all seem very obvious.  But when working to improve cycle times of a process—whether through lean efforts, a kaizen event, or a Six Sigma project—many practitioners completely fail to take advantage of these characteristics. 

Some will even tell you that times "even out" during the process, and a part that took an exceptionally long time in one step of the process will probably take a short time on another so that parts end up with roughly similar total cycle times. 

In reality, that part is just as likely to take exceptionally long again on another step and be even further from average. This is the essence of variation amplification: variance in cycle times will only increase at each step of the process and, without some control in place, will never decrease.
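
You can see this mathematically, too: a part's total elapsed time is the sum of its step times, so the variance of elapsed time grows at every step. Here is a rough simulation sketch (the step-time distribution is made up purely for illustration):

import numpy as np

rng = np.random.default_rng(7)
n_parts, n_steps = 1000, 5

# Hypothetical right-skewed processing times for each part at each step
step_times = rng.gamma(shape=2.0, scale=1.5, size=(n_parts, n_steps))

# Elapsed time after each step; the spread grows as steps accumulate
cumulative = step_times.cumsum(axis=1)
for step in range(n_steps):
    print("after step %d: std dev of elapsed time = %.2f"
          % (step + 1, cumulative[:, step].std()))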

Consider processes of invoice payments in a finance department—or indeed most other transactional processes, whether in an office, healthcare, or other environment. The points from above can be generalized to:

  • "Parts" starting at the same time will spread apart from one another as they progress through the process, and will not finish together.
  • Certain steps in the process will cause more spread than others.
  • Parts will not complete each step in the same order that they completed the previous step.
  • Only some steps will allow for re-ordering.

So how do we combat variation amplification in transactional processes?  There are multiple lean tools at our disposal. I won't pretend that a few sentences in a blog post can cover everything, but I will offer a few starting points.

  1. Collect data to find out which steps are adding the most to the variation. In the marble run it is obvious that the round "bowls" are the biggest contributors, but in most transactional processes, various steps are electronic and it is difficult to watch a part progress through the process.  Collect data to gain clarity.  Then focus on the biggest contributor(s) first.
     
  2. In most cases, reducing the average time in a step will also reduce the variation.
     
  3. Establish flow so that parts are not re-ordering (FIFO).
     
  4. Allow parts to queue prior to steps that add significant variation. As you reduce the cycle time and variation within that step you can reduce the queue until (hopefully) you establish a pull system, where there is little or no need for queuing.

From a simple marble game a 3-year-old understood variation amplification, and you likely could too when you watched the video.  But can you see that the same phenomenon is happening in transactional processes all around you?


Analyzing a Process Before and After Improvement: Historical Control Charts with Stages


We tend to think of control charts only for monitoring the stability of processes, but they can be helpful for analyzing a process before and after an improvement as well. Not only do control charts allow you to monitor your process for out-of-control data points, but you’ll be able to see how your process mean and variability change as a result of the improvement.

Control Chart

You might create separate before and after control charts for each phase of the improvement project, but making comparisons between those charts can be difficult. You could also try analyzing all of the data over the course of your project in a single control chart, but this could result in incorrectly flagging out-of-control points. This method also won’t calculate the changing mean and control limit values.

Putting Control Charts on the Stage

The best choice in a case like this is to create a control chart in “stages,” which is easy to do with a statistical software package such as Minitab. Stages are used to create a historical control chart that shows how a process changes over specific time periods. At each stage, Minitab recalculates the center line and control limits on the chart.

Check out this simple example featuring a control chart that tracks admission times for a hospital’s ICU over a three-month period:

Minitab Control Chart

Though the process for admitting patients underwent improvement each month, all of the data is graphed on a single chart without utilizing stages.

It’s easy to see that the admission times decreased, but the only thing we really know for certain is that the mean admission time was about 18 minutes. Also, quite a few points at the outset suggest that this process is out of control.

Take a look at the exact same data, now charted with stages:

Minitab Control Chart

At each stage, Minitab recalculates the center line and control limits on the chart. Now it’s easy to see that this process was kept in control at every stage of improvement, and you also get a much more accurate idea of how the mean admission times went down at each stage of the improvement. You can also see that the variation of the process—the area between the upper and lower control limits—decreased over time.
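
The core of the staging idea is simply a per-stage recalculation of the center line and limits. Here is a minimal sketch of that calculation, with hypothetical admission times and month labels:

import numpy as np

# Hypothetical admission times (minutes) and the stage label for each observation
times  = np.array([22, 20, 24, 21, 19, 16, 15, 17, 14, 12, 11, 13], dtype=float)
months = np.array(["June"] * 4 + ["July"] * 4 + ["August"] * 4)

for stage in ["June", "July", "August"]:
    x = times[months == stage]
    mr_bar = np.abs(np.diff(x)).mean()        # average moving range within the stage
    center = x.mean()
    print("%-6s  CL = %.1f  UCL = %.1f  LCL = %.1f"
          % (stage, center, center + 2.66 * mr_bar, center - 2.66 * mr_bar))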

To create a historical control chart with stages in Minitab, you’ll need a column of data indicating the stage of each observation. Here’s what that looks like for our ICU example:

Minitab Worksheet

The column "month" contains the stages, and the values June, July, and August correspond to each stage in the process. When you are creating the chart, just click the chart options button, select the Stages tab, and enter the column that contains the stage information. In this case, that is the "month" column:

Minitab Stages

A special thanks to Patrick North in Minitab’s Information Development department for his contributions to this blog post!

For more on control charts, check out these posts:

Control Charts Show You Variation that Matters

Control Charts: Subgroup Size Matters

How to Create and Read an I-MR Control Chart

How much do different scoring systems affect fantasy football rankings?


Fantasy Football

Ever start a fantasy football draft and realize that passing touchdowns are worth 6 points, not 4? Or how about realizing at the last minute that the commissioner of your league decided to have a point per reception (PPR) league? We know that this year running backs are going to be going early in the draft. But if your league is a PPR or gives 6 points for a passing touchdown, should you be focusing on quarterbacks and receivers instead?

Sounds like a perfect question for a data analysis in a statistical software package like Minitab!

Getting Six Points for Passing Touchdowns

My first reaction to this type of league is “HOLY VINCE LOMBARDI, GIVE ME AARON RODGERS RIGHT NOW!!!!!!!” But when you stop and think about it, every quarterback is benefiting from this scoring upgrade, not just Rodgers. So if they are all scoring more points (remember, QBs already score more points than other positions), will any of them really be more valuable?

To test this, I obtained projections for the top 100 players, then compared their projections to the “average” player of the same position. A common way to determine the average player is to use the number of players at each position who are drafted in the top 100.

For example, 13 quarterbacks are drafted in the first 100 picks on average. So I took the projection for the 13th-ranked quarterback (Eli Manning) and subtracted it from the projections of every other quarterback in the top 100. After doing this for each position, I had a common value (I called it the “Value Score”) on which I could rank all the players.

NOTE: This is similar to what I did in my previous blog post. However, instead of using projections from just one site (ESPN), a friendly reader pointed me to fantasypros.com. Their projections are averages of projections obtained from multiple fantasy sports sites (CBS Sports, ESPN, Pro Fantasy Focus, FFToday, and NFL.com). I love the idea of having a much more diverse collection of projections instead of just relying on one site, so I'm getting my data from their site. Now back to the analysis.

First, I ranked the top 100 players on their value score using projections that scored passing touchdowns as 4 points. Then I did it again using projections that scored them as 6 points. Now I can compare a player’s ranking before and after the scoring change to see how much they moved in the rankings. To see how the 13 quarterbacks moved as a group, I did a paired t test.
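
If you would like to replicate that comparison outside Minitab, here is a sketch of a paired t-test with scipy, using the quarterback ranks from the table later in this post:

from scipy import stats

# Quarterback ranks under the two scoring systems (from the table below)
rank_4pt = [15, 19, 29, 30, 39, 42, 45, 51, 52, 54, 67, 69, 96]
rank_6pt = [11, 12, 19, 41, 32, 39, 59, 49, 54, 71, 77, 66, 96]

t_stat, p_value = stats.ttest_rel(rank_4pt, rank_6pt)
print("t = %.2f, p = %.3f" % (t_stat, p_value))   # the difference is not significant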

Paired t-test

We see that the average ranking of quarterbacks when touchdown passes are worth 4 points is 46.8, and when you change them to 6 points it...well...pretty much stays the same! The average slips slightly to 48.2, but the difference isn’t statistically significant. So giving quarterbacks more points for passing touchdowns doesn’t change a thing!

That can’t be right, can it?

Well, it’s not quite right. The reason the difference is so small is that some quarterbacks’ rankings go up, but others actually go down. That’s right—giving 6 points for a passing touchdown makes some quarterbacks less valuable. Remember that Eli Manning is our “average” player. He’s projected for 28.5 touchdown passes. Now think of the mobile quarterbacks who are ranked ahead of him (Newton, Kaepernick, RG III, Wilson, and even Luck). All 5 of them are projected for fewer touchdown passes than Eli. They are ranked ahead of him because they will get rushing yards and rushing touchdowns that Eli won’t. So when passing touchdowns become more valuable, the gap between Eli and the mobile quarterbacks narrows, making them all less valuable, and thus decreasing their rankings. 

Below is a list of each quarterback, and how their rankings changed.

Quarterback          Rank for 4 points   Rank for 6 points   Difference
Aaron Rodgers        15                  11                  4
Drew Brees           19                  12                  7
Peyton Manning       29                  19                  10
Cam Newton           30                  41                  -11
Tom Brady            39                  32                  7
Matt Ryan            42                  39                  3
Colin Kaepernick     45                  59                  -14
Matthew Stafford     51                  49                  2
Andrew Luck          52                  54                  -2
Robert Griffin III   54                  71                  -17
Russell Wilson       67                  77                  -10
Tony Romo            69                  66                  3
Eli Manning          96                  96                  0

Unless you’re Rodgers, Brees, Manning, or Brady, the scoring change isn’t helping you at all. And you can see how much it hurts the mobile quarterbacks. Keep that in mind if your league gives 6 points for a passing touchdown!

Point Per Reception Leagues

Now let’s move on to PPR leagues. Is Megatron going to become a top 5 pick now? (Hold on, adding Megatron to my Word dictionary.) I did the exact same thing as before, but this time I gave running backs, tight ends, and wide receivers a point per reception (based on their projected receptions for the season). My rankings also assumed 4 points for a passing touchdown. So let’s get right to the data. I did another paired t test on the wide receivers rankings before and after the PPR rule change.

Paired t-test

We see that on average a PPR league improves a wide receiver’s rankings by almost a full 5 places, and the difference is statistically significant. Megatron doesn’t become a top 5 pick (his ranking only improves from 11th to 9th), but there are 7 receivers that improved by double digits. Here is a table of those 7 players.

Wide Receiver      Rank Standard   Rank PPR   Difference
Brandon Marshall   24              13         11
Andre Johnson      36              24         12
Randall Cobb       38              27         11
Reggie Wayne       59              43         16
Wes Welker         65              49         16
Antonio Brown      68              58         10
Danny Amendola     72              52         20

So if all those receivers are moving up in the rankings, who is moving down? The answer is: quarterbacks. They become almost an afterthought in PPR, as Aaron Rodgers is the highest ranked QB at #26.

That's right, twenty-six!

On average, each quarterback drops 11.9 spots in the rankings. So make sure not to take a QB early if you're in a PPR league, and focus on the 7 receivers mentioned above!

Your Own Scoring System

Maybe your league has a unique scoring system. For example, my family league couldn’t agree on whether to have a PPR or not, so we decided to split the difference and make it half a point per reception. If you want to see the rankings under your own league settings, here is a worksheet with the raw data I used. It lists the top 100 players with their projected fantasy points (with 4 points for a passing TD and no PPR), projected receptions, and projected touchdown passes. Simply calculate each player's new projection for your league (for my family league, I’ll divide the projected receptions by two and add them to the standard projections). Then find the “average player” at each position (which is the player with the lowest score), subtract the average player’s score from each new projection, and sort on that value!
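
Here is a small sketch of that recipe in Python. The field names and the two example rows are hypothetical placeholders; in practice you would load all 100 players from the worksheet:

# Hypothetical rows standing in for the top-100 worksheet
players = [
    {"name": "Player A", "position": "QB", "points": 280.0, "receptions": 0,  "pass_td": 32},
    {"name": "Player B", "position": "WR", "points": 190.0, "receptions": 95, "pass_td": 0},
    # ...the rest of the top 100...
]

PPR_VALUE = 0.5        # half-point PPR, as in the family league example
PASS_TD_BONUS = 0.0    # extra points per passing TD beyond the standard 4

# 1. Recalculate each projection for your league's scoring
for p in players:
    p["adjusted"] = p["points"] + PPR_VALUE * p["receptions"] + PASS_TD_BONUS * p["pass_td"]

# 2. The "average player" at each position is the lowest adjusted score there
baseline = {}
for p in players:
    pos = p["position"]
    baseline[pos] = min(baseline.get(pos, float("inf")), p["adjusted"])

# 3. Value Score = adjusted projection minus the positional baseline; sort on it
for p in players:
    p["value_score"] = p["adjusted"] - baseline[p["position"]]

for p in sorted(players, key=lambda p: p["value_score"], reverse=True):
    print(p["name"], round(p["value_score"], 1))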

And if you’re interested, the worksheet above also has the top 100 players for each of the 3 scoring systems I mentioned in the article.

Don’t forget, rankings are more of a guideline than the absolute truth. These projections love Doug Martin, as he’s ranked #2 in each scoring system. But I also did a statistical analysis showing that there are signs that he might not perform as well as he did last year. The projections also don't think too highly of Tom Brady (ranked 39th in a standard league), because he lost 4 of his top 5 targets from last year and the fifth is currently injured. But didn't Brady win a Super Bowl with Troy Brown as his best receiver? I'll believe Tom Brady will struggle when I see it. In fact, I'm of the mind that if you just pushed a shopping cart away from the line of scrimmage each play, Brady could get the shopping cart to average 5 catches and 32 yards a game.

So make sure to include your own thoughts about each player when making your pick. And above all else, just have fun with it!

After all, it’s only fantasy football.

Quartile Analysis for Process Improvement


quartile analysis for phone center duration data

The value of analyzing data is well established in industries like manufacturing and mining, but data-driven process and quality improvement is increasingly being adopted in service industries like retail sales and healthcare, too. In this blog post, I'll discuss how a simple data analysis may be used to improve processes in the service sector.

Suppose we want to improve the way incoming calls are processed in a call center run by a large insurance company. We are interested in analyzing duration metrics, which are very useful in assessing both the experience of customers—who always appreciate quick processing of incoming calls—and issues of employee productivity, such as which tasks are the most difficult to deal with.

If two customers call with exactly the same objective, process durations will certainly differ, because each employee will process the calls in a slightly different way. In addition, the customers may react differently to questions asked by the call center operator, and many other factors can influence the duration of the call.

Collecting Process Data

The process map below illustrates the 12 sequential steps involved in processing the incoming calls in this example.

Process Map

Some tasks are performed during the one-to-one conversation with the customer (Preliminary analysis, Database search, Data Entry …), and other tasks are done immediately after the phone call, while the customer is waiting to receive  the final document.

The task durations have been measured for each step of each call received, over a reasonable period of time. Our objective is to transform this large quantity of raw data into useful information, and to calculate summary statistics.

We are interested in analyzing the amount of variability in the time it takes to complete each task. We would like to better understand which steps in this process generate the largest amount of variability.

Using Quartile Summary Statistics to Analyze Variability

Analyzing variability using quartile summary statistics may play an important role in benchmarking. For example, we can seek to understand if there are large differences in processing times for a particular step, and then gain an insight into what accounts for the differences. Is the task duration affected by having more experienced operators, different levels of knowledge, or better practices?

The first quartile (Q1) is the value below which 25% of the data fall, and the third quartile (Q3) is the value above which 25% of the data fall. The interquartile range (IQR) is the difference between these two quartiles (Q3 – Q1 = IQR). A major advantage of using the IQR to estimate variability is that it is much less sensitive to outliers than the variance or the standard deviation.
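
As a quick illustration (with made-up duration values, and noting that quartile conventions differ slightly between software packages), here is how the quartiles and IQR compare to the standard deviation when an outlier is present:

import numpy as np

# Hypothetical task durations (seconds) for one process step, with one outlier
durations = [34, 41, 29, 55, 38, 47, 120, 36, 44, 40, 33, 51]

q1, q3 = np.percentile(durations, [25, 75])
print("Q1 = %.1f, Q3 = %.1f, IQR = %.1f" % (q1, q3, q3 - q1))

# The standard deviation is pulled upward by the 120-second outlier,
# while the IQR is barely affected.
print("standard deviation = %.1f" % np.std(durations, ddof=1))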

The box plot below shows the Q3 values (upper bounds of the boxes) for each of the twelve process steps, as well as the Q1 values (lower bounds of the boxes) and the outliers, which are indicated by asterisks beyond the whiskers. The height of the rectangular box for each group represents the middle 50% (interquartile range) of the data.

Boxplot

Minitab Statistical Software has been used to compute the IQR values for the 12 steps and display them on the bar chart graph below.

IQRs

Q1 (the first quartile), Q3 (the third quartile) and the IQR for the 12 process steps are shown in the bar chart below.

Q1, Q3 and IQR

You can see that the IQR is much larger for the follow-up calls (12th step). The IQRs are also large for the 2nd, 3rd and 4th steps (Preliminary analysis, Database search and Evaluation) during the initial conversation and for the 10th step (Recordkeeping) after the one-to-one conversation. Because the differences for these specific tasks are significant, we might have an opportunity to reduce durations.

These variations might be due to different levels of knowledge for the more complex tasks. Can we set best practices for these tasks? Can we identify what the best practices are? Do we need to standardize the way follow-up calls are carried out? Do we need to redefine what we expect from a follow-up call? Could the large differences for the database search and for the record keeping tasks be due to different levels of knowledge of the customer relations management (CRM) database? Perhaps not all employees feel comfortable using the company CRM system.

Now that we've identified the tasks with the most potential for improvement, we can begin to see which factors might have the most influence on the variation we've seen.

The Power of Quartile Statistics

Quartile statistics possess a great deal of analytical power and are a very useful technique for benchmarking purposes. These statistics are used less often in the service sector than in manufacturing, but more and more businesses are learning how even very simple summary statistics such as quartiles can help you identify the steps that can easily be improved.

 

Analyzing Baseball Park Factors: Don't Settle for Easy Answers


In an earlier post, I used AT&T Park to illustrate that a single number is rarely a good way to summarize data. Even the mean and median have their limitations.

The time series plot shows how the park factor dropped below 0.80 after the 2010 season, when it had been around 1 previously. I left you with a question about why the park effect appeared to change so drastically at AT&T Park.

Time series plot of park factors for AT&T Park

Let’s take a look at a few fun theories.

"It’s El Nino."

Well, not really El Nino. The last El Nino event in California was 2009-2010. But you can find a relationship between weather in San Francisco and the park factor. Look at what happens when you use the mean of the average daily temperature and the mean sea level pressure in San Francisco between April 3rd and October 5th to predict the park factor at AT&T Park.

Plot showing relationship between pressure, temperature, and park factor

Park Factor  =  -24562.3 + 399.168 Mean temp + 819.055 Mean pressure - 13.3102 Mean temp*Mean pressure

The model that includes the interaction between temperature and pressure has an R2 value of 91.90% and a predicted R2 value of 73.20%. Those statistics represent a lot of the variation in the data.
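
For anyone who wants to fit a model with an interaction term like this one, here is a sketch using the statsmodels formula interface. The tiny data frame is entirely made up; the real analysis used seasonal weather data for San Francisco:

import pandas as pd
import statsmodels.formula.api as smf

# Made-up values standing in for the yearly park factors and seasonal weather means
weather = pd.DataFrame({
    "park_factor":   [1.02, 0.98, 1.05, 0.79, 0.76, 0.95, 1.01],
    "mean_temp":     [62.1, 61.5, 63.0, 60.8, 61.2, 62.4, 61.9],
    "mean_pressure": [29.95, 29.97, 29.92, 30.02, 30.00, 29.96, 29.98],
})

# "a * b" expands to a + b + a:b, i.e., both main effects plus the interaction
model = smf.ols("park_factor ~ mean_temp * mean_pressure", data=weather).fit()
print(model.params)
print("R-squared: %.3f" % model.rsquared)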

"It’s Coors."

Late in 2010, the San Francisco Giants opened a new seating area in right field. Initially sponsored by Coors Light and called the “Coors Light Cold Zone,” it was just the latest in many changes to seating arrangements that have taken place since AT&T Park opened (with a different name) in 2000. Adding, removing, or changing seats can have all kinds of effects on a ballpark. Maybe walls moved or changed in height, or maybe the winds coming in out of San Francisco bay were affected. Either way, those seats came in right before the park factor dropped dramatically.

AT&T Park in San Francisco

"It’s not San Francisco."

One interesting possibility is that the change in the park factor doesn’t have to do with changes in San Francisco. If scoring stays about the same at AT&T Park, but games involving the Giants as the road team have more scoring, then AT&T Park would look more like a pitchers’ park.

Enter a familiar suspect: Chase Field. According to an article by correspondent Jeff Summers, preparations for the 2011 All Star Game at Chase Field included the addition of LED boards on each side of the large dbTV scoreboard in center field. Also, mosaics on the panels that open to provide ventilation to the dbTV were replaced by high-definition photos. Take a look at the time series plot of park factors at AT&T Park when we add Chase Field. A dramatic increase in the park factors at Chase Field happens at the same time that those at AT&T Park drop.

Times series plot contrasting AT&T Park with Chase Field

What do I think?

So what do I think after working through the data? I think there’s a cause, but it’s too complicated to be explained by anything simple. If the weather really had a strong, causal effect on park factors, then you might expect to see similar effects at other baseball parks. But if you use the same model at other ballparks, you get somewhat ridiculous predictions. For example, here's Minitab's fitted line plot of the predictions from the AT&T Park model against the real park factors at Coors Field:

Predicted vs. real park factors at Coors Field

Forget about that less-than-stellar R2 value, and notice that the real park factors range between 1.1 and 1.6 while the predictions range between 0.5 and 4. We get some pretty awful statistics if we try to use the same predictors but estimate new coefficients too. Expand the data set to include Chase Field, Petco Park, and Safeco Field, and the R2 drops to 36.18%.

While an association between weather and park factor might exist, the relationship is not as simple as the strong fit statistics for the San Francisco data suggest.

As for the seating theory, sure, there’s a physical change. But there’s no record that the renovations involved moving walls or anything else that would explain a relationship with park factor.

The corresponding change that shows up with the renovations in Chase Field is also most likely a coincidence. If the park factor for Chase Field really changes, then the park factor for AT&T Park could be affected—the Giants play more games against division rivals. But if the change to Chase Field affected the Giants, then I would expect it to affect the other division rivals too. Notice that I left those off my first graph. Here’s a time-series plot that shows all of the National League West teams:

Park factors for all teams in the West Division of the National League

When Chase Field changes between 2010 and 2011, the only corresponding effect is at AT&T Park. That's not the plot we would expect if the change at Chase Field was affecting the park factors of visiting teams that often play at Chase Field.

No Easy Answers

The amount of change at AT&T Park from 2010 to 2011 is unusual, at least from the perspective of how we would expect moving ranges to behave in a stable process.

But as we've seen, it’s important to be careful about accepting easy answers about the cause of the change.

 

The image of AT&T Park is by Darin Marshall and licensed for reuse under this Creative Commons License.

Using Hypothesis Tests to Bust Myths about the Battle of the Sexes


Mythbusters title screen

In my home, we’re huge fans of Mythbusters, the show on Discovery Channel. This fun show mixes science and experiments to prove or disprove various myths, urban legends, and popular beliefs. It’s a great show because it brings the scientific method to life. I’ve written about Mythbusters before to show how, without proper statistical analysis, it’s difficult to know when a result is statistically significant. How much data do you need to collect and how large does the difference need to be?

For this blog, let's look at a more recent Mythbusters episode, “Battle of the Sexes – Round Two.” I want to see how they’ve progressed with handling sample size. There are some encouraging signs: during the show, Adam Savage, one of the hosts, explains, “Sample size is everything in science; the more you have, the better your results.”

To paraphrase the show, here at Minitab, we don’t just talk about the hypotheses; we put them to the test. We’ll use two different hypothesis tests and this worksheet to determine whether:

  • Women are better at multitasking
  • Men are better at parallel parking
Are Women Better Multitaskers?

The Mythbusters wanted to determine whether women are better multitaskers than men. To test this, they had 10 men and 10 women perform a set of tasks that required multitasking in order to have sufficient time to complete all of the tasks. They use a scoring system that produces scores between 0 and 100.

The women end up with an average of 72, while the men average 64. The Mythbusters conclude that this 8 point difference confirms the myth that women are better multitaskers. Does statistical analysis agree?

The statistical perspective

The average scores are based on samples rather than the entire population of men and women. Samples contain error because they are a subset of the entire population. Consequently, a sample mean and the corresponding population mean are likely to be different. It’s possible that if we reran the experiment, the sample results could change.

We want to be reasonably sure that the observed difference between samples actually represents a true difference between the entire population of men and women. This is where hypothesis tests play a role.

Choosing the correct hypothesis test

Because we want to compare the means between two groups, you might think that we’ll use the 2-Sample t test. However, based on a Normality Test, these data appear to be nonnormal.

The 2-Sample t test is robust to nonnormal data when each sample has at least 15 subjects (30 total). However, our sample sizes are too small for this test to handle nonnormal data. Therefore, we can’t trust the p-value calculated by the 2-Sample t test for these data.

Instead, we’ll use the nonparametric Mann-Whitney test, which compares the medians. Nonparametric tests have fewer requirements and are particularly useful when your data are nonnormal and you have small sample sizes. We’ll use a one-tailed test to determine whether the median multitasking score for women is greater than the median men’s score.

To run the test in Minitab statistical software, go to: Stat > Nonparametrics > Mann-Whitney
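
An equivalent sketch with scipy (not the Minitab output shown next) looks like this; the scores are hypothetical stand-ins for the 10 women and 10 men:

from scipy import stats

# Hypothetical multitasking scores
women = [85, 90, 60, 75, 80, 55, 70, 88, 62, 58]
men   = [65, 60, 70, 58, 72, 55, 68, 63, 66, 61]

# One-sided test: is the women's distribution shifted above the men's?
u_stat, p_value = stats.mannwhitneyu(women, men, alternative="greater")
print("U = %.1f, p = %.4f" % (u_stat, p_value))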

The Mann-Whitney test results

Mann-Whitney test results

The p-value of 0.1271 is greater than 0.05, which indicates that the women’s median is not significantly greater than the men’s median. Further, the 95% confidence interval suggests that the median pairwise difference is likely between -9.99 and 30.01. Because the confidence interval includes both positive and negative values, it would not be surprising to repeat the experiment and find that men had the higher median!

The Mythbusters looked at the sample means and “Confirmed” the myth. However, the data do not support the conclusion that women have a higher median score than men.

Power analysis to determine sample size

If the Mythbusters were to perform this experiment again, how many subjects should they recruit? For a start, if they collect at least 15 samples per group, they can use the more powerful 2-Sample t test.

I’ll perform a power analysis for a 2-sample t test to estimate a good sample size based on the following:

  • I’ll assume that the difference must be at least 10 points to be practically meaningful.
  • I want to have an 80% chance of detecting a meaningful difference if it exists.
  • I’ll use the sample standard deviation.

In Minitab, go to Stat > Power and Sample Size > 2-Sample t and fill in the dialog as follows:

Power and sample size for 2-sample t dialog

Under Options, choose Greater than, and click OK in all dialogs.

Power and sample size results for 2-sample t test

The output shows that we need 29 subjects per group, for a total of 58, to have a reasonable chance of detecting a meaningful difference, if that difference actually exists between the two populations.
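
The same calculation can be sketched with statsmodels. The effect size is the meaningful difference divided by the standard deviation; the standard deviation plugged in here (15) is an assumption for illustration, not a value from the show:

from statsmodels.stats.power import TTestIndPower

difference = 10.0    # smallest difference we care about
std_dev = 15.0       # assumed pooled standard deviation (hypothetical)

n_per_group = TTestIndPower().solve_power(
    effect_size=difference / std_dev,   # Cohen's d
    alpha=0.05,
    power=0.80,
    alternative="larger",
)
print("subjects needed per group: %.1f" % n_per_group)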

Are Men Better at Parallel Parking?

The Mythbusters also wanted to determine whether men are better at parallel parking than women. They devised a test that produces scores between 0 and 100. At first glance, this appears to be a scenario similar to the multitasking myth, where we’ll compare means, or medians. However, the means and medians are virtually identical and are not significantly different according to any test.

Descriptive statistics for parallel parking by gender

There’s a different story behind this myth. During the parking test, the hosts notice that the women’s scores seem more variable than the men’s. The women are either really good or really bad, while men are somewhere in between, as you can see below.

Individual value plot of parallel parking scores by gender

We want to be reasonably sure that the observed difference in variability actually represents a true difference between the populations. We need to use the correct hypothesis test, which is Two Variances (Stat > Basic Statistics > 2 Variances). The test results are below:

Two variances test results for parallel parking by gender

The null hypothesis is that the variability in both groups is equal. Because the p-value (0.000) is less than 0.05, we can reject the null hypothesis and conclude that women’s scores for parallel parking are more variable than men’s scores.
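
A rough scipy equivalent uses Levene’s test, one of the robust methods for comparing variances (Minitab offers several in its 2 Variances analysis); the scores below are hypothetical stand-ins:

from scipy import stats

# Hypothetical parallel-parking scores: women very spread out, men tightly clustered
women = [95, 20, 90, 15, 92, 25, 88, 18, 93, 22]
men   = [60, 55, 62, 58, 65, 57, 61, 59, 63, 56]

w_stat, p_value = stats.levene(women, men)
print("Levene W = %.2f, p = %.4f" % (w_stat, p_value))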

The Mythbusters correctly busted this myth because the means and medians are essentially equal. We can't conclude that one gender is better at parallel parking than the other. 

However, we can conclude that men are more consistent at parallel parking than women.

Closing Thoughts

In one of their online videos, Adam and Jamie explain that they understand the importance of sample size. However, Adam states that the Mythbusters put more effort into the methodology of collecting good data. It’s true, they are great at reducing sources of variation, obtaining accurate measurements, etc. He goes on to explain that they just don’t have the resources to obtain larger sample sizes. Fair enough—for a television show.

However, if you’re in science or Six Sigma, you don’t have this luxury. You must:

  • Have a good methodology for collecting data
  • Have a sufficient sample size
  • Use the correct statistical analysis

Without all of the above, you risk drawing incorrect conclusions.
