
What Statistical Software Should You Choose: Three More Critical Questions


questions about statistical software

Earlier I wrote about four important questions you should ask if you're looking at using statistical software to analyze data in your organization, especially if you're hoping to improve quality using methods like Six Sigma. But there are other points to consider as well. If you're in the market for statistical software, be sure to investigate these questions, too!

What Types of Statistical Analysis Will They Be Doing? 

The specific types of analysis you need to do could play a big part in determining the right statistical software for your organization. The American Statistical Association's software page lists highly specialized programs for econometrics, spatial statistics, data mining, statistical genetics, risk modeling, and more. However, if your company has employees who specialize in the finer points of these kinds of analyses, chances are good they have already identified and have access to the right software for their needs.

Most users will want a general statistics software package that offers the power and flexibility to do all of the most commonly used types of analysis, including regression, ANOVA, hypothesis testing, design of experiments, capability analysis, control charts, and more. If you're considering a general statistics software package, check its features list to make sure it does the kinds of analysis you need. Here is the complete feature list for Minitab Statistical Software.  

Many general statistical software packages offer the same types of analysis, but their power, ease of use, and the amount of statistical know-how they demand of the user vary widely. That's why it's important to consider the level of statistical expertise users will have, and how easy it is for them to get help when they need it.

Are There Special Considerations for Data Analysis in Your Industry? 

Some professions may have specialized data analysis needs due to regulations, industry requirements, or the unique nature of their business. For example, the pharmaceutical and medical device industry needs to meet FDA recommendations for testing, which may involve statistical techniques such as Design of Experiments. Similarly, control charts that were developed to measure defects may not be the right control charts for measuring "rare events" such as those in medical professions. 

Depending on the needs of your business, one or more of these highly specialized software packages may be appropriate. However, general statistics software packages with a full range of tools may provide the features and functionality your industry requires, so be sure to investigate and compare these packages with the more specialized, and often more expensive, programs used in some industries. 

What Do Statistical Software Packages Cost? 

Last but not least, you will need to consider the cost of the software package, which can range from $0 for some open-source programs to many thousands of dollars per license for more specialized offerings.

Once again, it is important to compare not just the unit-copy price of a software package (i.e., what it costs to install one copy of the software on a single machine) but to find out what licensing options for statistical software are available for your situation.  

Got More Questions about Statistical Software?

These questions are only the starting point in selecting the best statistical software package for your organization or business. If you have questions about software for data analysis or quality improvement, please feel free to contact the Minitab representative nearest you to discuss your situation in detail.   


The Glass Slipper Story: Analyzing the Madness in the 2013 NCAA Tournament


Glass Slipper

Cinderella showed up early and often during the first weekend of the 2013 NCAA Tournament. Florida Gulf Coast stole the show with their glass slippers, becoming the first ever 15 seed to reach the Sweet 16. But don’t let that overshadow what happened in the West Region: Wichita St and La Salle both arrived in a pumpkin-turned-carriage, and now the Shockers are a game away from the Final Four! And don’t forget about Harvard just because the clock struck midnight on them first. They were at the ball, too! Madness indeed.

In the world of statistics, we have another word for this “madness.” It’s called variation. In the quality world, variation is usually bad. You don’t want there to be large differences in the length of the part you’re making or in the weight of the bag of food you produce. But in the world of sports, variation makes for great entertainment. If the better teams always won, what would be the point of watching? We’d have to rename it March Monotony. Luckily, that’s not the case, as this year’s tournament clearly shows. But how much “madness” did we actually have this year? And was it the most ever?

How to Judge the Madness

Normally, people judge the madness in the tournament by looking at the seeds. Here is a list of the craziest Sweet 16s by seed; 2013 ranks 5th. But using the seeds doesn’t quite tell us the whole story. For example, Minnesota was an 11 seed this year, but was actually favored over 6-seeded UCLA. And Oregon was a lot better than your average 12 seed, as they were 26-8 on the season and won the Pac-12 tournament. So I’m going to ignore seeds and instead use the probability that one team has of beating the other.

I calculated the probabilities using a regression model that takes the rankings of both teams and calculates the probability one has of beating the other. I used the Sagarin Ratings to get the rankings of the teams since I determined earlier in the year that they were the most accurate ranking system. You can get all the data here. Now let’s get to the statistics!
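The post doesn't spell out the exact form of the regression model, but one common way to turn two teams' ratings into a win probability is a logistic regression on the rating difference. Here's a minimal Python sketch under that assumption; the games, ratings, and outcomes below are made up for illustration, not the actual data linked above.

```python
# A minimal sketch (not the model from the post) of estimating a win
# probability from the difference in two teams' ratings via logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical games: rating difference (team A minus team B)
# and whether team A won.
rating_diff = np.array([[8.3], [-2.1], [12.6], [0.4], [-6.8], [3.9], [-9.5], [5.2]])
team_a_won = np.array([1, 0, 1, 1, 0, 1, 0, 0])

model = LogisticRegression().fit(rating_diff, team_a_won)

# Estimated probability that a team rated 6 points above its opponent wins.
print(model.predict_proba([[6.0]])[0, 1])
```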

For each of the first 48 games in the NCAA tournament (I didn’t count the First Four), I calculated the odds one team had of beating another. Then I grouped the probabilities into categories (50 to 59, 60 to 69, and so on). So if there are 10 games in the “80 to 89” category, you would expect the favorite to win 8 or 9 of those games. Now we can compare what the model predicted to what actually happened.

Bar Chart

We can clearly see why there was no shortage of Cinderellas this year! Only 54% of the teams favored to win between 60% and 69% of the time actually won. And things weren’t any better for the teams favored to win 70% to 79% of the time, as only 58% of them won. That’s a lot of upsets! The 70% category in particular seems pretty low, so let’s break it down further.

There were 12 games in the “70 to 79” category, with an average probability of the favorite winning being 74%. Of those 12 games, the favorite won only 7 of them. The 5 upsets were Ole Miss, Harvard, Wichita St (twice...the computers loved Pitt), and Florida Gulf Coast (over San Diego St, their win over Georgetown was the lone upset in the “80 to 89” group). So how unlikely was it that in this group of games only 7 favorites would win? We can use a probability plot to find out.

Distribution Plot

This plot shows the probability for each possible outcome if 12 games were played with the favorite having a 74% chance of winning (I know the probability varied for each game, but I’m using the constant to satisfy the assumptions of the binomial distribution). The red area shows the probability of 5 upsets is 11.43%.

That’s pretty low, but look at the rest of the plot. The most likely outcome is that 9 favorites would win, and that only has a probability of about 25%. And having 5 upsets is just as likely to occur as having 1 upset. If only 1 of these upsets occurred, sports analysts would have been saying this tournament was boring because nothing unpredictable happened. But in fact, that would have been just as unpredictable as what did happen!

This all just goes to show how hard it is to predict the tournament. Even if you’re given 12 games and told the favorite will win 74% of the time, you’ll have no better than a 25% chance of picking the correct number of upsets. And even if you do, you then have to be lucky enough to correctly pick the games where the upsets will occur.
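If you want to check these binomial figures yourself outside Minitab, here's a quick Python sketch using the same 12 games and a constant 74% win probability for the favorite (the constant probability is the same simplifying assumption noted above).

```python
# Probability of each number of upsets when the favorite wins each of 12
# games with probability 0.74 (k favorites winning means 12 - k upsets).
from scipy.stats import binom

n, p = 12, 0.74
print(binom.pmf(7, n, p))   # exactly 5 upsets -> ~0.114
print(binom.pmf(11, n, p))  # exactly 1 upset  -> ~0.114 (just as likely)
print(binom.pmf(9, n, p))   # the most likely outcome: 9 favorites win (3 upsets)
```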

Why do you have to be so difficult, Cinderella?

But back to the data: we saw that there was a high number of upsets in both the “60 to 69” and “70 to 79” categories this year. Let’s see if the same thing happens in other tournaments that had high seeds in the Sweet 16.

You Get a Glass Slipper! And You Get a Glass Slipper! And You Get a Glass Slipper!

We keep hearing that this is the wackiest Sweet 16 since 2000, so let’s check out that year. 2000 had only two 1 seeds, one 2 seed, and one 3 seed advance to the Sweet 16. Surely the observed probabilities for the higher categories will be crazy here (and again, these numbers are only through the first weekend of the tournament).

Bar Chart

Wait, what is this? I thought everybody was getting glass slippers...I’ve been misled! This chart shows the tournament went almost exactly as we’d expect. And the two highest groups show there was actually a lack of upsets!!!!

This just shows how seeds don’t tell the whole story. The two 8 seeds that beat the 1 seeds were Wisconsin and North Carolina, not exactly who you would think of as Cinderellas (both teams went on to reach the Final Four). The three 3 seeds that lost all did so to 6 seeds: not exactly major upsets. And there were only two double-digit seeds in the Sweet 16 (both of them 10 seeds).

In fact, according to the Sagarin Ratings, there were only 11 upsets in the first two rounds of the 2000 NCAA tournament, compared to 15 this year. So before we go any further, let’s just see if any other year can beat 2013 in just number of upsets. I’ll stick to years at the top of the list of craziest Sweet 16s.

Tally

Sagarin Ratings only go back to 1999, so I couldn’t do any earlier years. 2001 and 2012 are lower on the list, but I added them because 15 seeds beat 2 seeds in both years. I also added 2009 as a fun comparison because it was the chalkiest Sweet 16 ever.

We see that no year is able to top the 15 upsets that occurred in this tournament. In addition, the 2010 tournament (6th on the list of craziest Sweet 16s) has the same number of upsets as the Sweet 16 with the most chalk!  That just goes to show the 2010 tournament had some terrible seeding.

When we break the statistics down by our categories, there is only one year that had an upset in the “90 to 99” group (2012, when Norfolk St beat Missouri). And no other year had more upsets than 2013 in the “80 to 89” group. But do any other years have 2013 beat when it comes to upsets in the “60 to 69” and “70 to 79” categories? Or have we just witnessed the craziest opening weekend ever (well, since 1999)? The next chart will tell us.

Bar Chart

A few other years come close to 2013 in the “60 to 69” group, but none beat it. And no other year even touches the 5 upsets that have occurred in the “70 to 79” category. I think that does it. The clock just struck midnight, and 2013 is the last one dancing. So congrats Florida Gulf Coast and company, you’ve made 2013 the craziest Sweet 16 ever. I just hope you saved some glass slippers for next year! 

Photograph by Glamhag.  Licensed under Creative Commons BY-SA 2.0.

How to Prove You're (a) Case Using Statistics


prove you're (a) case!

by Patrique Roonquel, guest blogger
Institut Sacre Bleu

I enjoy using Minitab Statistical Software to uncover the vast causal relationships unfolding in the universe all around me. 

What kind of novel things have I proven with Minitab? Almost anything you can imagine, mon petite shoe.

For example, the fitted line plot below clearly shows one thing:  it’s time for our political parties to stop all the bickering and finally give Americans what we really want…

height vs vote

…a much taller president!

(See the dot way up at the top of the plot? That’s George Washington, the Father of our Country.  He was one of our tallest presidents and got 99.9% of the vote. Bada bing. Bada boom.)

Feel free to use this plot to predict future election results. 

For example, write this one down: I predict that in 2020, Igor Vovkovinskiy will be elected U.S. President by a landslide.

What Makes the Human Body Evolve?

Fortunately, my statistical expertise is not limited to political soothsaying. Oh contraire, mon chair!

Like Walt Whitman, I contain multitudes— or multivariates—or  Malto-Meal—or whatever he said.

Move over, Kevin Rudy. Because I’m also an expert on sports statistics.

These side-by-side  time series plots clearly show why winning times in the Olympic 100-yard dash are getting faster each year.

dash gold

Who wouldn’t run faster for gold, at those prices?  The inverse correlation is about as obvious as it gets.  

How Can We Improve the Lives of Our Children?

But my statistical analyses have also uncovered more elusive and  hidden cause-effect relationships.

For example, everyone is concerned about the effect of television on the impressionable minds of our youth.

To better understand this complex issue, I’ve collected historical data on U.S. television ownership from the Nielsen survey and average math scores of 4th graders from the U.S. Department of Education.

tv and math

Par bleu, Mon duh!

The chart shows that the more TVs we add to our households, the better the mathematical skills of our children will become.  Which makes perfect sense. I suspect that manipulating large, 3-digit channel numbers using the remote markedly improves their number sense.  

But I know what you’re thinking: That similar trend in both  bar charts  doesn’t prove anything.

So just to satisfy you critics, I performed a correlation analysis in Minitab to verify the statistical significance of the association between these two  variables.

---------------------

Correlations: Avg TV sets per household, Avg Math Scores-4th graders

Pearson correlation of Avg TV sets per household and Avg Math Scores-4th graders = 0.972

P-Value = 0.000

-------------------

I mean, what more do you want? A p-value can’t get any lower than that! And look at that correlation—almost a perfect positive correlation!

The statistics prove it. If you truly care about your children, and their futures, buy them more TVs.
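For readers who want to reproduce this kind of correlation test outside Minitab, here's a minimal Python sketch. The numbers below are placeholders, not the actual Nielsen or Department of Education figures.

```python
# Pearson correlation and p-value for two yearly series (placeholder data).
from scipy.stats import pearsonr

avg_tv_sets_per_household = [1.1, 1.4, 1.7, 2.0, 2.4, 2.6, 2.8]   # hypothetical
avg_math_scores_4th_grade = [213, 219, 224, 226, 235, 238, 240]   # hypothetical

r, p_value = pearsonr(avg_tv_sets_per_household, avg_math_scores_4th_grade)
print(f"Pearson correlation = {r:.3f}, p-value = {p_value:.3f}")
```

Of course, as the Editor's Note below suggests, a strong correlation and a tiny p-value say nothing about causation.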

  Editor's Note:

The data in this post are true and accurate. The author and his interpretations are, thankfully, completely fictional.  If you’re interested in other works by Dr. Roonquel, check out his groundbreaking paper on how an antioxidant deficiency caused World War II.

And by the way, today (April 1), is Dr. Roonquel's birthday!

 

My Job as a Minitab Statistician


Rob Kelly

In honor of the International Year of Statistics, I interviewed Rob Kelly, a senior statistician here at Minitab Inc. in State College, Pa.

Rob designs features you see in Minitab Statistical Software, and the focus of much of his work is on Design of Experiments. Check out what Rob had to say about how he became interested in pursuing a career in statistics.

1. What kind of reaction do you get when you tell people you are a statistician?

I find that a lot of people have some experience with statistics, but I usually get a couple of different reactions. There are the people who immediately tell me, “Statistics was my least favorite subject in school,” or “That sounds hard.” There are also the people who are really interested and ask me to help them with their statistical projects! Even though people don’t seem to like statistics, it’s so widely used and taught that most people have had exposure and seem to know the basics.

2. Who inspired you to be a statistician?

I was always interested in math, but I never knew much about statistics until my undergraduate years at Northern Illinois University. I tried out different things, but I kept coming back to math.

I had a great statistics professor named Stan Trail who taught my first introductory statistics course at Northern Illinois and encouraged me to keep going in statistics. I found that I really enjoyed more advanced statistics courses and I got a lot of encouragement, so I kept pursuing statistics as a career path.

3. What is the best thing about being a statistician?

The best thing about my job is also one of the most challenging things. What we do here at Minitab is to make statistics more accessible, and while this can be very hard to accomplish—it’s also really great to help people understand and apply it to solve their problems.

4. What is the best career advice you were ever given?

It’s important to find something you care about enough to work hard at. If you are passionate about what you do, you’ll be more willing to go the distance to get things done. This is all pretty standard advice, but in my experience it's absolutely true.

Learn more about how Minitab is celebrating the International Year of Statistics: http://www.minitab.com/company/news/news-basic.aspx?id=11770.

Use Minitab to Graph Minecraft's Success


Giant castle built in Minecraft

I have a good time putting together simple data sets that you can use to build your confidence in statistics. But I tend to like fairly old things: Shakespeare (1564-1616), Poe (1809-1849) and gummi bears (invented 1922). But I have some modern interests too. One of those, appearing in about 2009, is Minecraft.

If you like Minecraft, then here’s a data set that you can use to practice a few things in Minitab Statistical Software. One of the nicest things about Minitab is that even with this spreadsheet, saved in Googledocs, you can copy and paste directly into Minitab.

Change Data with the Time Series Menu

The data set contains cumulative values. The number of buys goes from 533,451 on 11/2/2010 to 547,544 on 11/4/2010. This means that we’re missing the number of people who buy Minecraft per day, on average, which is a neat number to check out.

Fortunately, Minitab Statistical Software has an easy way to get the daily data from the cumulative rows. (If you're not already using Minitab, you can download a free 30-day trial.) Try this:

  1. Choose Stat > Time Series > Differences.
  2. In Series, enter Date.
  3. In Store differences in, enter ‘Days between rows’.
  4. Click OK.
  5. Press Ctrl+E to reopen the Differences dialog box.
  6. In Series, enter buys.
  7. In Store differences in, enter ‘Buys per row’.
  8. Click OK.
  9. Choose Calc > Calculator.
  10. In Store result in variable, enter ‘Buys per day’.
  11. In Expression, enter ‘Buys per row’/‘Days between rows’.
  12. Click OK.

Without doing any complicated formulas, you now have a column that shows the average number of times people bought Minecraft per day for the range of dates in the data set.
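If you'd rather do the same differencing outside Minitab, here's a rough pandas equivalent. It assumes the spreadsheet has been exported to a CSV with columns named Date and buys; the file name and the export step are assumptions, not part of the original workflow.

```python
# Convert cumulative purchase counts into average buys per day between rows.
import pandas as pd

df = pd.read_csv("minecraft_stats.csv", parse_dates=["Date"])  # hypothetical file name

days_between_rows = df["Date"].diff().dt.days
buys_per_row = df["buys"].diff()
df["Buys per day"] = buys_per_row / days_between_rows

print(df.head())
```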

Use Graphs to Show What Was Hidden

First, let’s take a look at a scatterplot of the original data for the number of people who buy Minecraft. Here are the steps I used to make scatterplots of both the cumulative buys and the buys per day.

  1. Choose Graph > Scatterplot.
  2. Choose Simple. Click OK.
  3. In row 1, enter buys for the Y variable and Date for the X variable.
  4. In row 2, enter ‘Buys per day’ for the Y variable and Date for the X variable.
  5. Click OK.

The first scatterplot of the cumulative data shows a fairly straight line, indicating that the number of people who buy Minecraft per day stays about the same. Nothing looks unusual.

The number of total buys increases in a line.

The second scatterplot of the data per day shows two data points that clearly stand out from the others.

Minecraft promotions are unusual data.

A little research quickly reveals why these two data points are unusual. The first, December 20, 2010, you might guess would be related to eager Christmas shoppers. However, it turns out that December 20 was the day that Minecraft reached beta status. On December 21, the price increased and the company no longer promised that future updates would be free. So Minecraft gave a heavy incentive to people to buy before December 20.

The second date, 8/14/2011, was the Sunday of Minecraft Wedding Weekend. At first, this sounds like an event where everyone decorated their Minecraft worlds in white and had cake, but actually, this was the weekend that Mojang owner Markus Persson got married. To celebrate, if you bought Minecraft that weekend, you got a code for a free copy.

What You Learned

So you’ve had a chance to practice with Minitab, building your confidence so that you’ll be ready to make solid decisions with your own data. You saw how to change cumulative data into data increasing by row. You also got to spot unusual observations in a scatterplot.

Like seeing the value of finding unusual data? Examine the graph Newcrest Mining Ltd. uses to show when the time comes to replace a fuel injector, part of a strategy expected to save $835,000 in a year.

The image of the huge Minecraft castle town is by Kenming Wang and licensed for reuse under this Creative Commons License.

Great Presidents Revisited: Does History Provide a Different Perspective?


Dewey Defeats Truman newspaper

Recently, Patrick Runkel blogged about using regression models to explain how historians ranked the U.S. presidents. Given that I both love regression and that I’ve written about using regression to predict U.S. presidential elections, I wanted to take Patrick up on his challenge to improve upon his model.

My goal isn’t merely to predict the eventual ranking for any President. Instead, I’m much more interested in a fascinating question behind this analysis. Is the public’s contemporary assessment of the president consistent with the historical perspective, or do they differ?

With this in mind, I’ve collected two additional types of data that provide the contemporaneous assessment of the President and the social mood: presidential approval ratings and the Dow Jones Industrial Average.

Along the way, I’ll highlight the problems of overanalyzing small datasets, and how to determine whether you’re doing it!

Gallup Presidential Approval Ratings

The Gallup organization has tracked the approval rating of the president since the days of Franklin Roosevelt. As I’ve written about here, Gallup uses consistent wording in order to facilitate comparisons over time. I’ll use the fitted line plot for a preliminary investigation into whether this variable is worthy of consideration. I’ll run it three times to see how the historian’s ranking corresponds to the highest approval, average approval, and lowest approval.

Looking at the three plots, it’s interesting to note that the highest approval rating produces an R-squared of 0.7%! The fitted line is essentially flat. If you want an exemplar for what no relationship looks like, this is it!

Fitted line plot of historians rank by highest Presidential approval

Fitted line plot of historian's rank by average Presidential approval

Fitted line plot of historian's rank by lowest Presidential approval

The picture is more interesting in the average and low approval ratings plots. The low approval rating plot provides a better fit with an R-squared of 34.7%. Collectively, these plots suggest that it’s more important to know how low the approval has gone than how high! Even though there are only 12 data points, the approval rating is significant with a p-value of 0.044.
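For anyone who wants to reproduce this kind of fitted-line analysis outside Minitab, here's a minimal Python sketch that reports the R-squared and the p-value for the slope. The twelve values below are placeholders, not the actual Gallup ratings or historians' ranks.

```python
# Simple linear regression of historians' rank on lowest approval rating
# (placeholder data for illustration only).
from scipy.stats import linregress

lowest_approval = [22, 35, 29, 40, 48, 36, 31, 45, 24, 37, 51, 28]   # hypothetical
historian_rank  = [30, 12, 25, 10,  6, 15, 28,  9, 34, 18,  4, 27]   # hypothetical

fit = linregress(lowest_approval, historian_rank)
print(f"R-squared = {fit.rvalue**2:.3f}, p-value = {fit.pvalue:.3f}")
```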

It seems that history remembers the worst of a President, rather than the best!

Eleven data points follow the general trend. However, the one data point in the bottom-left of the plots is clearly an outlier. That data point is Harry Truman (pictured at top). Truman doesn’t fit the model because he had very low approval ratings while he was president, but the historians give him a fairly good rank of #6.

It’s tempting to remove this data point because the model then yields an R-squared of 67%. However, there is no reason to question that data point and I think it would be a mistake to remove it. It’s not good practice to remove data points simply to produce a better fitting model.

You may be wondering, can we add other variables into this model to improve it? Unfortunately, that’s not possible because of the limited amount of data available. In regression, a good rule of thumb is that you should have at least 10 data points per predictor. We’re right at the limit and can’t legitimately add more predictors.

Instead, let’s look at a new variable that provides more data points!

Presidents and the Dow Jones Industrial Average

Previously, I assessed a model by Prechter, et al. that claims to predict whether an incumbent president would be re-elected using just the Dow Jones. The theory states that the stock market is a proxy variable for social mood, not that the stock market directly affects voting. The stock market is a good measure of social mood because if society feels positive enough to invest more money in the stock market, they are presumably happy with the status quo, which favors the incumbent.

The researchers find a positive, significant relationship between several outcomes for presidential elections that have an incumbent and the percentage change in the Dow Jones over a 3-year period. The researchers also include the traditional big three predictors of Presidential elections: economic growth, inflation, and unemployment. The study concludes that the three-year change in the DJIA is the best predictor. Further, when the DJIA predictor is included in the model, the other “Big Three” predictors become insignificant.

I concluded that their model was statistically valid and used it to accurately predict the outcome of the last election.

Historian Rankings of U.S. Presidents and the Dow Jones

Because the Dow Jones Industrial Average is such an important predictor for re-election, can it also predict how well historians view past presidents?

I gathered the Dow Jones (DJ) data for the beginning and end of each president’s time in office, and calculated the percentage change. The Dow Jones began in 1896. For elections prior to 1896, I used the Foundation for the Study of Cycles data set, which I also used for my election prediction post. This data set uses market data from earlier indices to create a longer DJIA.

The initial exploration looks promising when I graph it in the fitted line plot.

Fitted line plot of historian's rank by the Dow Jones

You can see the overall negative slope. In the upper left corner the negative DJ changes are associated with worse ranks. In the bottom right, the higher DJ changes are associated with better ranks. The relationship appears to be curvilinear. This curvature makes sense because there is no limit on how much the Dow Jones can improve, but the rankings cannot be better than #1! Consequently, the downward slope has to flatten out as the DJ increases. We’ll incorporate the curvature in our regression models.
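Here's a minimal sketch of one way to incorporate that curvature: add a squared Dow Jones term to an ordinary least squares fit. The values below are placeholders, not the actual rankings or DJIA changes.

```python
# Curvilinear regression: rank modeled on DJ % change and its square.
import numpy as np
import statsmodels.api as sm

dj_change = np.array([-35, -10, 5, 20, 45, 80, 120, 160, 210])   # hypothetical % changes
rank      = np.array([ 38,  30, 26, 21, 15, 11,   9,   8,   7])  # hypothetical ranks

X = sm.add_constant(np.column_stack([dj_change, dj_change ** 2]))
model = sm.OLS(rank, X).fit()
print(model.summary())
```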

My approach will be to add in the Dow Jones data to both Nate Silver’s and Patrick Runkel’s model to see if it increases the explanatory power of either.

Nate Silver’s Model

Silver’s original model (below) uses the percentage of the electoral vote a president receives for his second term to predict the historian's ranking.

Nate Silvers model with the historian's rank by percentage of the electoral college

As Patrick notes, it’s an elegant model because it requires only one easy-to-collect variable per president. The model yields an R-squared of 38.6%, which is nearly equal to that of the approval rating model. Silver’s model only applies to presidents who run for a second term. That gives us 29 data points, which is just enough to include the quadratic form of the Dow Jones data.

General Regression output for historian's ranking model

In the output, we can see that the Electoral College and Dow Jones predictors are all significant and the R-squared is 56.7%. The adjusted R-squared also increased from Silver’s original model, which suggests that adding the additional predictors is valid. The coefficients are all as expected given the previous analyses.

Winning a higher percentage of the Electoral College and a positive Dow Jones both improve a president’s ranking by historians.

The Electoral College variable reflects the voters’ assessment of an incumbent president. The Dow Jones variable represents the social mood of the time, which has been shown to influence elections. These two variables represent an entirely contemporaneous assessment of both the president and the times, and together they explain just over half the variability of the historian’s ranking.

Given the number of data points, it wouldn’t be wise to add more predictors to this model. So, we’ll move on to Patrick’s model.

Patrick Runkel’s model

Patrick’s original model includes these variables: years in office, assassination attempt, and war. Collectively, these variables explain 56.66% of the variance. Let’s add in the Dow Jones data and see what we get.

General Regression output for Patrick Runkel's model of historian ranking of Presidents

All of the variables are significant and all three R-squared values have increased. This model accounts for 63.42% of the variance, or nearly two-thirds.

More years in office, a war, an assassination attempt, and a positive Dow Jones all improve a President’s ranking by historians.

With five predictors, the model is pushing these 41 data points to their limit. However, I think the model is good. The two main risks of including too many predictors in a model are:

  • Insufficient power to obtain significance due to imprecise estimates.
  • Overfitting the model, which is when the model starts to fit the random noise. The R-squared increases but, because you can’t predict the random noise for new data, the predicted R-squared decreases.

Fortunately, all of the predictors are significant, so power isn’t a problem. Further, the predicted R-squared has increased, so we probably aren’t overfitting the model.

The Contemporaneous vs. the Historical Perspective

Is the historical perspective different from the contemporaneous perspective? How much can you divine from the present about the ultimate assessment by historians? These are very interesting questions. Our best model suggests that contemporaneous data account for two-thirds of the variance in the rankings by historians.

What about the other third? We can’t say for sure. It’s possible that if we could include more variables, or better variables, that contemporaneous data could account for even more of the variance. It’s also likely that the historical perspective does account for some of it. After all, history is complex and with hindsight, additional knowledge, etc., the perspective provided by time could revise the contemporary conclusions somewhat.

However, it’s quite clear that it’s easy to account for half the variance with a simple model that contains only two contemporaneous variables, and it's not too difficult to get up to two-thirds! This result reaffirms why I love statistics: You can observe and record the data around you and have a good assessment of reality that withstands the test of time. The historical perspective definitely has its place, but if you go find the right data and use the correct analyses, you can gain good insights right now!

Lightsaber Capability Analysis: The Results


lightsaber

Here at the lightsaber factory, we've completed several steps in doing a capability analysis.

We’re getting close to our deadline, and it’s finally time to carry out our Capability Analysis and see if we are manufacturing our lightsabers to the correct specifications as set forth by the Jedi Temple.

First, let’s go to Stat > Quality Tools > Capability Analysis > Normal.  (If you want to play along and you don't already have it, get the free 30-day-trial of our statistical software.)  Fill out the dialog as follows:

Capability Analysis dialog

In addition, click on the "Options" button, and add a Target of 3. Once we have this, we can click OK and Minitab will perform our Capability Analysis.

Looking at the results, there are a lot of numbers and a lot of abbreviations. It can get confusing, so let’s look a little deeper at each piece of output.

Capability analysis output

First, at the top left you can see the Process Data box. This is just an overview of some descriptive statistics. It shows our lower spec limit (LSL), upper spec limit (USL), and Target values, as well as the mean, number of observations, and standard deviations that are calculated from our data.

You'll notice there are two different standard deviations that get calculated. The first is the overall standard deviation. This is exactly what it sounds like, the standard deviation among all of your points. The second is the within-subgroup standard deviation. This is the standard deviation within your subgroups.

Now we can look at the Potential (Within) Capability on the right-hand side of the graph. These are capability indices that are calculated using your within-subgroup variation. Below that is the Overall Capability, which is associated with the overall standard deviation. The statistics are similar for both. Cp and Pp compare the process spread to the specification spread. The process spread is your standard deviation times a multiplier. The multiplier is usually 6, which puts about 99.73% of the observations within +/- 3 sigma of the mean. The specification spread is just your USL - LSL. In this case, our Pp is 0.83 and our Cp is 0.82.

It looks like our process needs to be improved to meet the requirements of the temple.  

The PPL, CPL, PPU, and CPU relate the process spread to the one-sided specification spread, that is, the distance from the process center to a single specification limit. This is beneficial because they take into account your process center as well as your process spread. However, the drawback is that each one only measures from your process center to one side of the specification, either upper or lower. You could fit well within the upper spec, but be way off base on the lower spec. This is why we have Cpk and Ppk.

Ppk and Cpk then take the minimum of PPL and PPU, or CPL and CPU, respectively. This gives us a single number that takes into account both specification limits as well as your spread. In general (though not necessarily always), this is the number that most people are looking for. The difference between the two is that Cpk is sometimes referred to as the "potential capability" because it represents the potential your process has at producing parts within spec, presuming there is no variation between subgroups over time. Ppk, however, factors in that subgroup variation, so it gives an indication of where you actually are at this moment in time.

In other words, Cpk is where you can be, and Ppk is where you are. It is important to look at both.
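For reference, here's a small Python sketch of how the within-subgroup indices are calculated from the spec limits, the process mean, and the standard deviation; Pp and Ppk follow the same formulas with the overall standard deviation. The spec limits, mean, and sigma below are assumptions for illustration, not the actual lightsaber data.

```python
# Capability indices from spec limits, process mean, and standard deviation.
def capability_indices(lsl, usl, mean, sigma):
    cp  = (usl - lsl) / (6 * sigma)    # spec spread vs. process spread
    cpu = (usl - mean) / (3 * sigma)   # room to the upper spec limit
    cpl = (mean - lsl) / (3 * sigma)   # room to the lower spec limit
    cpk = min(cpu, cpl)                # worst of the two one-sided indices
    return cp, cpu, cpl, cpk

# Hypothetical values: a target of 3 with assumed specs of 2.5 to 3.5.
print(capability_indices(lsl=2.5, usl=3.5, mean=3.02, sigma=0.203))
```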

In our case, our Cpk is 0.80, and our Ppk is 0.81. The Jedi Temple uses standard AIAG guidelines, which means we want our capability indices greater than 1.33. It is clear we’ve fallen short.

We’ve now carried out our analysis, and we’ve unfortunately come to the conclusion that we are not manufacturing the lightsabers to correct specifications. We’re going to need to consider our process and see what changes we can make to improve. We’ll leave those recommendations up to the Jedi council.

 

 

 

 

Learning Process Capability Analysis with a Catapult, part 1


Process Capability Catapult

by Matthew Barsalou, guest blogger

We can use a simple catapult to teach process capability analysis using Minitab Statistical Software’s Capability SixpackTM. Here's how.

A process capability analysis is performed to determine if a process is statistically capable. Based on the results of the capability study, we can estimate the amount of defective components the process would produce.

However, a process must be in statistical control and have a normal distribution. A process that is not in statistical control must be brought in control before the capability analysis is performed. In addition, data that does not fit the normal distribution will need to be normalized using a transformation such as the Box-Cox transformation.

The Catapult Setup and First Run

A process and a specification are needed to demonstrate process capability analysis; we used a simple catapult. A rough idea of the catapult’s range and precision was required for determining what the specification should be, so we fired five catapult shots and recorded the distance the projectile traveled. Based on the results, we determined the catapult should be able to consistently land projectiles within a range of one meter. This was just a rough figure used to get started.

We set the specification at 700 cm from the end of a hallway to the point where the projectiles should land. The tolerance was set as +/- 50 cm, which might or might not be a specification that the catapult could meet. The purpose of the study was to determine if the catapult is capable, so the uncertainty was not a problem.

The catapult was then set up 400 cm away from the target of 700 cm.

The First Run and Capability Analysis

The catapult has a rubber band that hooks onto the front of the catapult and then goes over a wire guide that causes additional stretching before the rubber band is mounted onto the catapult arm. The wire guide was replaced with a thin wire that was bent and distorted. Ten shots were fired and the results were recorded. The wire was rotated 22.5° after each shot; rotating a weak and bent wire simulated a cause of variation in the process. 

The Minitab Capability Sixpack results for the run using a thin wire are depicted below. Minitab's Capability Sixpack provides an Xbar chart, an R chart, a view of the last five subgroups, a capability histogram, a normal probability plot, and a capability plot with capability indices.

The Xbar chart and R chart were automatically calculated by Minitab using the run data that was entered. The specified subgroup size was 2, so each dot in the Xbar chart represents the average of two catapult shots. The average of the averages is 675.8 cm. This is short of the target of 700 cm, but still within specification.

Minitab has calculated the upper control limit (UCL) and lower control limit (LCL). The control limits are 3σ above or below the mean. The UCL is 761.8 cm and the lower control limit is 589.8 cm. The average was within the specification; however, the control limits are +/- 3σ of the process mean and 99.7% of the data will be within the control limits. Unfortunately, the catapult process will result in shots that will be out of specification because the LCL is below the LSL.

Reading the Process Capability Charts

The R chart calculates the average of the ranges in each subgroup of sample size two. The UCL and LCL for the range can be calculated using the average of the ranges and a table; however, Minitab automatically performs the calculations. Here we can observe a large amount of variation in the catapult results. The actual results of the last five subgroups are also given. The difference between the first and second shot fired was over 100 cm; hence, the large range for the first subgroup in the R chart.
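If you want to see where control limits like these come from, here's a minimal Python sketch using the standard SPC constants for subgroups of size 2 (A2 = 1.880, D3 = 0, D4 = 3.267). The shot distances are made up for illustration; they are not the recorded catapult data.

```python
# Xbar and R chart limits for subgroups of size 2 (hypothetical distances, cm).
import numpy as np

subgroups = np.array([
    [610, 715], [660, 690], [640, 700], [700, 680], [655, 710],
])

xbar = subgroups.mean(axis=1)                         # subgroup averages
r = subgroups.max(axis=1) - subgroups.min(axis=1)     # subgroup ranges
xbar_bar, r_bar = xbar.mean(), r.mean()

A2, D3, D4 = 1.880, 0.0, 3.267                        # constants for n = 2
print("Xbar chart LCL/center/UCL:", xbar_bar - A2 * r_bar, xbar_bar, xbar_bar + A2 * r_bar)
print("R chart    LCL/center/UCL:", D3 * r_bar, r_bar, D4 * r_bar)
```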

The capability histogram presents a histogram of the results with the shape of the distribution overlaid. The capability histogram also visually depicts the process output compared to the lower specification limit (LSL) and the upper specification limit (USL).

The normal probability plot depicts an Anderson-Darling goodness-of-fit test; this is used to determine if the data follows a normal distribution. The H0 is “data fits the normal distribution” and the Ha is “data does not fit the normal distribution.” The test statistic is automatically calculated by Minitab. Using an alpha of 0.05, we reject the null hypothesis because the calculated P value was only 0.022. This run not only had a large amount of variability; it violated the assumption of normality needed for the calculations.   

process capability analysis with a catapult 1
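A quick way to run an Anderson-Darling normality check outside Minitab is scipy's anderson function; note that it reports the test statistic and critical values rather than a p-value like Minitab's output. The distances below are hypothetical.

```python
# Anderson-Darling test for normality on a sample of shot distances (hypothetical).
from scipy.stats import anderson

distances = [612, 715, 661, 690, 643, 702, 698, 678, 655, 709]
result = anderson(distances, dist='norm')

print("A-squared statistic:", result.statistic)
print("Critical values:", result.critical_values)
print("Significance levels (%):", result.significance_level)
```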

What's Next in this Capability Analysis?

In my next blog post, I'll perform a second run using thicker and more robust wire to stretch the rubber band. Since this wire will not have the variation that the thin one did, it will simulate a process improvement. We should see a reduction in variability as a result. 

In the meantime, if you want to build your own catapult, here are my plans and instructions for the DIY DOE Catapult in a PDF document.  

 

About the Guest Blogger: 
Matthew Barsalou has been the quality manager at an automotive supplier in Germany since 2011, and previously worked as a contract quality engineer at Ford in Germany and Belgium. He is completing a master’s degree in industrial engineering at the Wilhelm Büchner Hochschule in Darmstadt, Germany, and is also working on a manuscript for a practical introductory book on statistics.
  

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com

 

 


Learning Process Capability with a Catapult, part 2


process capability catapult

by Matthew Barsalou, guest blogger

Process capability analysis using Minitab Statistical Software’s Capability SixpackTM can be taught using a catapult. A process capability analysis is performed to determine if a process is statistically capable.

In my last blog post, I collected data from a first run of catapult results and found that the run not only had a large amount of variability, it also violated the assumption of normality. Now it's time to do a second run.  

The Second Run and Capability Analysis

A second run was performed using thicker and more robust wire to stretch the rubber band; this wire did not have the variation that the first one did, so it simulates a process improvement that should reduce variability. The results are depicted in the Capability Sixpack below. The Xbar chart for this run shows the UCL is below the USL; unfortunately, the LCL is still below the LSL. All shots landed within the specification limits; however, the capability histogram indicates the normal distribution for this run is wider than the specification limits.

process capability analysis catapult 2

The normal probability plot has a P value of 0.461 so we fail to reject the null hypothesis, which we stated as “data fits the normal distribution.” The first run appears to have had special cause variation; this could have been the result of using a wire that was heavily distorted. The normal probability plot indicates the data fits the assumption of normality so we move on to the capability plot.

Differences Between Within and Overall Indices in a Capability Analysis

There are results for “within” and “overall” indices. The difference between within and overall is the way in which the process variation is estimated. Within variation, as described in the Statistical Process Control manual, uses only the common cause variation within subgroups in the calculation; overall variation includes both common cause and special cause variation across the entire set of data from the process study. Common cause variation results “from the system” and special cause variation results from an “assignable cause.” The manipulation of the weak wire would be an example of a special cause.

The within indices are the capability indices Cp and Cpk, and they can be thought of as what the process is capable of producing. The overall indices are Pp and Ppk, and they reflect the process's actual performance, including any variation due to special causes.

A Cp is a process capability index used to determine if a process is capable of meeting a specification. It is determined by dividing the tolerance range by six times the standard deviation.

Cp = (USL − LSL) / (6σ)

Ideally, a Cp should be 1.33 or greater as this would mean the spread of the data is only 75% of the tolerance range; this leaves room for slight variations in the process without generating out-of-specification parts. However, the Cp index does not tell us if the process would produce parts that are within specification. A process could have a Cp of 2.00 due to very little variation, but still be producing out of specification parts because the process mean is at the edge of a specification limit.

A more complete picture is provided by also using the Cpk index, which is based on two calculations. The first calculation is USL minus the process mean divided by three times the process standard deviation, and the second calculation is the process mean minus LSL divided by three times the process standard deviation.

Cpk = min[ (USL − mean) / (3σ), (mean − LSL) / (3σ) ]

The lower of the two results is used to identify the process capability in regards to the centering of the process in comparison to the specification limits. Like Cp, a Cpk should generally be 1.33 or greater.  

The formulas for Pp and Ppk are similar to the Cp and Cpk formulas; however, the calculations for Pp and Ppk use the overall standard deviation of the entire data set in place of the within-subgroup standard deviation.

Pp = (USL − LSL) / (6σ_overall)
Ppk = min[ (USL − mean) / (3σ_overall), (mean − LSL) / (3σ_overall) ]

The Minitab Capability Sixpack results for the second run indicate a Pp of 0.73 and a Ppk of 0.44, and this is much lower than the ideal of 1.33. Minitab has determined the process would result in a parts per million (PPM) of 94,081.64. This means that over 94,000 out-of-specification catapult shots can be expected for every 1,000,000 shots. This corresponds to a defect rate of 9.4%.
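Here's a rough sketch of where a PPM figure like this comes from: it is the area of the fitted normal distribution that falls outside the spec limits, scaled to a million shots. The spec limits and the rough mean come from these posts; the standard deviation below is an assumption, back-calculated to roughly match the reported Pp and Ppk.

```python
# Expected out-of-spec parts per million from a fitted normal distribution.
from scipy.stats import norm

lsl, usl = 650, 750        # 700 cm +/- 50 cm spec from part 1
mean, sigma = 680, 22.8    # mean from the post; sigma is an assumption

p_out = norm.cdf(lsl, mean, sigma) + norm.sf(usl, mean, sigma)
print("Expected PPM:", p_out * 1_000_000)
```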

Although none of the 10 shots we made were out of specification, the process is still not capable and needs improvement.   

The Third Run and Process Capability Analysis

The shots in the previous run were spread around a mean of 680 cm, so the catapult process was adjusted by moving the catapult 20 centimeters closer to the target area. We then fired 10 shots and analyzed the new data. The results show an improvement, as depicted in the Capability Sixpack below. The Xbar control limits are now within the range of the specification limits and both Pp and Ppk have improved. The PPMs indicate a defect rate of only 0.5%; this is an improvement, but still not acceptable for ensuring a product that conforms to specification in a mass-production environment. A manufacturing company with a Pp of 1.08 and a production run of 10,000 units should anticipate 50 units out of specification. The catapult process needs further improvements to reduce variation; fortunately, there are quality tools available to help with this.

capability analysis catapult 3

If you want to build your own catapult for learning capability analysis, here are my plans and instructions for the DIY DOE Catapult in a PDF document.  

 

About the Guest Blogger: 
Matthew Barsalou has been the quality manager at an automotive supplier in Germany since 2011, and previously worked as a contract quality engineer at Ford in Germany and Belgium. He is completing a master’s degree in industrial engineering at the Wilhelm Büchner Hochschule in Darmstadt, Germany, and is also working on a manuscript for a practical introductory book on statistics.
  

Would you like to publish a guest post on the Minitab Blog? Contact publicrelations@minitab.com

 

Status Reports: Reduce Your Cycle Time using Minitab Macros!


Across all industries, there are many different ways professionals utilize Minitab Statistical Software to improve the quality of their products and services.

You may be a professional in the health care industry who is interested in monitoring the days between hospital-acquired infections (HAIs) using rare event charts.  You may be a professional in manufacturing who is using a Pareto chart to evaluate the types of defects you are discovering during the inspection process.  Or you may be a product manager like me who uses Minitab analytics to evaluate customer survey data.

But no matter the different applications, we all share a common thread.

Status Reports

Quality Report

Many of us present reports to management and stakeholders at regular time intervals using Microsoft Word or PowerPoint.  These reports communicate actionable information, supported with Minitab graphs and analysis.

We take pride in our reports. We want to impress our audience. It's our voice.  So I guess it's no surprise that they can take quite a bit of time to prepare.

When we do prepare them, we want to maximize our time spent on the actual content while trying to minimize time spent on all of the non-value added activities.  

We want to reduce our status report cycle time.

Status Reports: The Old Way

One step that can be somewhat laborious is updating our Minitab graphs in the status report.

We all know the drill. Every couple of weeks we create the same set of graphs and then manually copy and replace those same graphs into our latest status report.

And then, right when we are ready to send the report late on a Friday afternoon...the data change and we have to do it again!  Yes, it happened to me once and I was consequently late with the family pizza—but that's a separate post altogether.

Bruno Scibilia wrote a great blog post on how to use Minitab macros to publish reports. Now let's also learn how to use Minitab macros and Microsoft Word together to automatically update our reports.

Status Reports: The New Way

“I’m a Quality Engineer at a manufacturing plant and I inspect our product daily looking for defects.  If the product does have a defect, I record the type of defect in Microsoft Excel.  Every two weeks, I run my executable in Minitab that imports the data from Excel and creates a Pareto chart.  I then copy and paste the graphs into a report template for management.”

The engineer’s Minitab executable may look something like this:

Minitab Executable

Now, let’s add three additional subcommands to the Pareto chart command:

Minitab Executable

These subcommands instruct the executable to save the Pareto chart to a file location as a JPEG file and to automatically replace the current Pareto chart, if one exists.  That's it from the Minitab side!

Now let’s set up the link to the Pareto chart in Microsoft Word.  Within Microsoft Word, first click in the document where you'd like to display the Pareto Chart, go to the Insert tab, and select "Picture" from the Illustrations group:

Picture

Once we click the Picture icon, we are brought to a dialog where we can navigate to where our Pareto Chart was saved.  Importantly, I will also select “Insert and Link” from the Insert drop down menu:

insert

We are now finished.  Microsoft Word is now connected to the source file, in this case, a Pareto chart.  Every time we run the executable in Minitab, a new Pareto chart is created and saved.  When we open our status report, we will see the most up-to-date Minitab graph!  If you have more than one graph, just follow the above steps for all your graphs.  Awesome!

And I'm happy to report there have been no more late pizzas for the family on Friday nights.

 

Lean, Six Sigma, or Lean Six Sigma?


Six Sigma symbol

When I first started working among quality improvement professionals, I was caught off-guard by the varying terminology for "quality improvement." Some companies call their quality programs “process excellence initiatives” or “continuous quality improvement,” while others refer to their programs as “Lean” or “Lean Six Sigma." Others subtract ‘lean’ from their program titles altogether, and refer to their efforts simply as “Six Sigma.”

Are there really any differences, or is all of this terminology just jargon for ‘process improvement?’ The short answer to that question is that it all means process improvement, but there are some key differences to be aware of.

Lean vs. Six Sigma

First off, understand that there is a strong connection between Lean and Six Sigma. Both methodologies seek to make processes and the business as a whole more efficient by removing defects or waste through focused efforts that likely involve a project-based approach.

However, Lean refers to activities that are meant to be quick and efficient (see this post about performing kaizen or “blitz” improvements), while Six Sigma projects are meant to be thorough and permanent. Six Sigma operates off of the data-driven DMAIC approach, where projects are broken down into the five phases of Define, Measure, Analyze, Improve, and Control. In Six Sigma projects, all improvement efforts must be proven statistically significant, and this is where statistical software like Minitab can come in handy.

Lean projects are more loosely based and not as phase-driven, although most Lean activities also can be done within the framework of a Six Sigma DMAIC project. Some Lean tools, such as SIPOC and FMEA, have become strongly tied to specific DMAIC phases. For example, you’d likely perform a SIPOC or FMEA in the project selection or project scoping portion of the ‘Define’ phase.

FMEA form in Quality Companion

The opposite is not true though – not all Six Sigma tools can fit into Lean. A three-dimensional response surface DOE is too complex and requires much more of a time investment than a typical Lean tool would require. But just because Lean tools aren’t as complex as some Six Sigma tools, it doesn’t mean they aren’t powerful and effective. In fact, unlike traditional Six Sigma methods, many Lean tools are easily and directly used by individuals of all skill levels with minimal training.

A Battle of Methodologies?

In talking with various quality professionals at different organizations, I’ve noticed that there can be battles internally where proponents of Lean and proponents of Six Sigma form individual silos. Both sides do not see the full value of each other’s methods and toolkits. Is this true at your organization?

The key is to strike a happy medium and use both methodologies where it makes sense. Remember that the right tools should be used for the problem at hand, even if those tools fall within the Lean methodology, and you are technically supposed to be working on a 'Six Sigma' project.

Regardless of what you call your quality improvement endeavor, keep an open mind and rely on data analysis to drive improvements. Personally, I think Lean and Six Sigma are like peanut butter and jelly, or even Laverne and Shirley. You certainly can’t have one without the other!

What is the name of your company’s quality improvement program? Does the name reflect the tools you use?

The Top 10 (Statistically) Craziest Things that Happened in the 2013 NCAA Tournament


2013 NCAA Bracket

In my previous blog post, I analyzed the madness in this year’s NCAA tournament for games through the Sweet 16. I found that it was one of the wackiest Sweet 16s ever. But things didn’t stop there—the Final Four was pretty crazy, too, having two 4 seeds and a 9 seed! So now that the tournament is over, I want to look back and see what were (statistically speaking) the most unlikely things to have occurred. Was it Florida Gulf Coast in the Sweet 16? Or Wichita State in the Final Four? What about Wisconsin’s horrible shooting performance? Let’s start analyzing the statistics to find out.

All probabilities come from a regression model that uses rankings from the Sagarin Ratings to calculate the probability one team has of beating another. Rankings are based on where the team was ranked at the time the game was played.

10. Louisville Winning It All: 30%

If you really think a 68-team, single-elimination tournament determines the “best team” in college basketball, this number should make you think again. Louisville was the top-ranked team in the Sagarin Ratings, so the regression model favored them in every game. In addition, Louisville got some breaks in their matchups, getting to play Oregon in the Sweet 16 and Wichita St in the Final Four (both of which were weaker teams than you’d normally expect to play in those rounds). And still, they only had a 30% chance of winning all 6 of their tournament games. In other words, despite being the best team in the tournament, there was a 70% chance that Louisville wouldn’t win the title!

9. Harvard Beating New Mexico: 21%

Hey, remember when this happened? Harvard was actually the Cinderella of the tournament at one point. Then, less than 24 hours later, Florida Gulf Coast beat Georgetown and Harvard was forgotten. Of course, they didn’t help themselves by getting blown out by Arizona in their next game.

Harvard winning a game was surprising because it was a 14 seed over a 3 seed. However, we shouldn’t have been that surprised that at least one 14 seed won a game. The odds of all of the 3 seeds winning would have been .79 (New Mexico) * .96 (Florida) * .80 (Michigan St) * .68 (Marquette) = 41%. In other words, there was a 100-41 = 59% chance that at least one 14 seed would win a game. The hard part was just correctly picking which one!

8. Syracuse Reaching the Final Four: 17%

For a 4 seed, this probability is actually higher than we would expect. That's because the East Region had very weak 2, 3, 5, and 6 seeds. Syracuse and Indiana (the 1 seed) were actually the two highest-ranked teams in the East Region. I thought this gave Indiana an easy path to the Final Four. But Syracuse had different plans, as their 2-3 zone stifled the Indiana offense and the Orangemen pulled the upset.  

Other than the Indiana game, Syracuse had a relatively easy path to the Final Four. The model heavily favored them in the 3 other games they won: Montana (88%), California (87%), and Marquette (63%). So while it was unlikely that Syracuse would reach the Final Four, it really wasn’t that shocking.

7. Oregon Getting to the Sweet 16: 11%

The Ducks pulled off a pair of upsets to reach the second weekend of the tournament. The model gave them the exact same probability of winning each of their first two games: 33%. So neither game was that big of an upset on its own, but combined they come out to the 7th unlikeliest event in the tournament!

6. La Salle Reaching the Sweet 16: 7%

Keep in mind that La Salle had to win three games to get to the Sweet 16, so that is what makes this probability so low. But shouldn’t that make it a lot lower than Oregon, who only had to win 2 games? Well, La Salle can thank Wisconsin’s horrible shooting for an easier path to the Sweet 16. Because Mississippi pulled the upset over Wisconsin, La Salle had a much easier opponent in the round of 32 than Oregon did. The regression model would have given La Salle only a 22% chance of beating Wisconsin, which is a lot lower than the 35% chance they had against Mississippi.

In fact, if you remove the First Four game against Boise State, the probability of La Salle reaching the Sweet 16 was 13%. That actually makes Oregon’s run to the Sweet 16 less likely than La Salle’s!

5. The Atlantic 10 Goes Undefeated in the First Round: 4%

After La Salle beat Boise State in the First Four, the Atlantic 10 had 5 teams in the round of 64. All 5 teams won their game, giving the Atlantic 10 a great start to the tournament. The 5 wins included two heavy favorites winning (St. Louis at 87% and VCU at 70%), two upsets (La Salle at 37% and Temple at 35%), and one coin flip winner (Butler at 57%). When you multiply all those, you get a probability of 4%. Impressive, Atlantic 10!

But the impressive streak ended there. In the round of 32, the Atlantic 10 went 1-4, with La Salle being the only winner. Not so impressive.

4. Florida Gulf Coast Getting to the Sweet 16: 3.5%

Yep, Florida Gulf Coast comes in at 4th! Bet you thought they'd be higher! The model gave them a 13% chance of beating Georgetown and a 27% chance of beating San Diego State. So winning both of those games was certainly unlikely, but it wasn't out of the realm of possibility!

We saw before that we shouldn't have been that surprised that a 14 seed won a game. Can we say the same thing about the 15 seeds? The probability of all four 2 seeds winning would have been .87 (Georgetown) * .87 (Miami) * .88 (Ohio St) * .93 (Duke) = 62%. So there was a 38% chance that at least one 15 seed would win a game! Not as good as the odds for the 14 seeds, but still higher than most people would expect!

3. Michigan Making the Championship Game: 3.2%

We also saw before that it wasn’t terribly unlikely that Syracuse made the Final Four. They were helped by a weak region, and really only had to beat one elite team. The same cannot be said of Michigan. Just to get to the Final Four they had to beat the 13th ranked team (VCU), the 6th ranked team (Kansas) and the 2nd ranked team (Florida).  The probability of winning all 3 of those games (plus South Dakota State) was a mere 6%. So you see that they had a much harder path to the Final Four than Syracuse.

But unlike the Orangemen, Michigan's run didn't end with the Final Four. The Wolverines made it to the championship game, making their run even more unlikely, at 3.2%. But just like in the tournament itself, Michigan doesn't take the top spot here, because another team's run was even more improbable. I'm guessing you won't be shocked to hear who it is.

2. Wisconsin’s Shooting Performance: 2%

Ha! I bet you thought I was going to talk about Wichita State next. We'll get to them, but first let's talk Wisconsin. I went off script a little bit here, but the Badgers' shooting in their first-round loss to Mississippi was so horrific that I had to look into it. On the season they made 48% of their 2-point field goals. Against Mississippi, they made 8 out of 29, for a measly 27.6%. So what is the probability that a team that normally makes 48% would make 8 or fewer out of 29 attempts? We can use a probability distribution plot to get the answer.

Wisconsin's Shooting

The most likely outcome would have been for Wisconsin to make 14 of their 2-point shots in that game. This is represented by the highest bar in the middle. In order to see what they actually did, we have to look all the way down to the left. It’s represented by the red region. There was only a 2% chance that Wisconsin would shoot as badly as or worse than they did. Making another 6 shots would have given Wisconsin 12 more points. And what do you know, Wisconsin lost by 11. So for all the praise that Marshall Henderson got after this game (he actually shot almost as badly as the Badgers), it was Wisconsin’s terrible shooting that won this game for Mississippi, not Henderson.
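
If you'd rather check that figure in code than with a graph, a binomial model gives essentially the same answer. This is a minimal sketch that assumes the 29 attempts are independent, each with a 48% chance of going in:

from scipy.stats import binom

# P(making 8 or fewer of 29 two-point attempts at a 48% success rate)
p_shot = 0.48
attempts = 29
made = 8

p_this_bad_or_worse = binom.cdf(made, attempts, p_shot)
print(round(p_this_bad_or_worse, 3))  # roughly 0.02, i.e., about a 2% chance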

1. Wichita State Getting to the Final Four: 1.3%

If Wichita State had simply gotten to the Sweet 16 and then lost, their run would have still been good enough for 6th on this list. They had a brutal opening game because the Sagarin Ratings loved Pitt, ranking them 10th. Wichita State had a mere 29% chance of just winning their first game! But after pulling that upset, things only got harder. They had to upset a Gonzaga team that was ranked 7th. After defeating Gonzaga they beat a La Salle team they were favored over, then pulled their final upset over 5th-ranked Ohio State.

So the Shockers beat 3 top 10 teams in 4 games. That’s pretty impressive, and also very unlikely...for any team! But it was even more unlikely for the low-rated Shockers, who started the tournament ranked 40th in the Sagarin Ratings. By the time the Final Four had started, they had moved up to 20th.

It was a tournament full of Madness. From Harvard, to Gulf Coast, to La Salle, the upsets never stopped. But at the end of it all, nobody pulled off anything as unlikely as what Wichita State did. So congratulations Shockers, you are the craziest thing that happened in the 2013 NCAA tournament!

Benthic Invertebrates Gone Wild!

Using a Survey of Aquatic Bugs to Estimate Stream Quality

As we click, flip, and scroll through hundreds of sites and channels, cruising for our daily dose of e-thrills, it's easy to forget there's a beautiful, wild, creative universe right in our backyards.

I had the chance to experience a tiny part of that universe on a recent Saturday afternoon, when a couple of friends, Yolanda and Monika, asked me if I wanted to join them to monitor the water quality of the stream that runs in back of our house.

Yolanda and Monika are part of a large grassroots network of volunteers who selflessly give their time to regularly monitor the quality of streams, lakes, and rivers across the country. Government agencies simply don’t have the resources to regularly collect environmental data on all of the myriad waterways in the U.S., so they depend on getting help from trained volunteers like Yolanda and Monika.

Benthic Invertebrates: Bellwethers of Water Quality

Yolanda and Monika regularly monitor benthic invertebrate counts in a small local stream called Woods Creek.

Benthic invertebrates are tiny animals with no internal skeleton that live on the bottom of lakes, ponds, rivers, and streams. Many are insect larvae that spend their “childhood” in the water before maturing into adult insects like dragonflies and damselflies.

Some of these tiny critters are extremely fussy and can survive only in fresh, clean water (the Water Penny, the cute-as-a-button beetle larva shown at right, is one example). Others—like sowbugs, leeches, and scuds—can thrive even in stagnant, polluted environments. For that reason, scientists often survey benthic invertebrates in a natural body of water to assess its ecological condition.

The day I tagged along, we randomly sampled a local stream at 3 locations, following the protocols of the organization Virginia Save Our Streams (VASOS).

As Yolanda loosened rocks that formed riffles in the stream, Monika used a finely meshed seine net to collect the invertebrate sample. Then we recorded the tallies for each type of macroinvertebrate on a data collection sheet (shown partially below) before releasing them back into the stream:

tally sheet

(The images are not to scale, of course. Most of these critters are very tiny. You need a magnifying glass to see them clearly.)

Before explaining how their counts are used to estimate stream quality, I hope you’ll grant me a temporary license to digress. Because I’d like you to meet my favorite new BBIFF (best benthic invertebrate friend forever).

(If you’d rather get right down to the number crunching in Minitab Statistical Software, skip ahead to the Using Minitab Formulas… section.)

Mother Nature: The Ultimate Quality Engineer

Life ain’t easy. Especially if you’re a soft, gooey dab of flesh getting regularly pummeled by stream rapids and in constant danger of being gobbled up as a tasty afternoon snack. (And you thought you had it bad...)

But the Caddisfly Strong-Case Maker has developed a resourceful way of protecting itself  from the rough-and-tumble of life in a stream bed. A close relative of the butterfly, the larva collects sand, pebbles, sticks, leaves—whatever material is readily available—and then spins silk as a mortar to piece together a surprisingly hard, crush-resistant case to protect its soft body (hence its name).

casemaker rocks

We found several strong casemakers like this in our recent stream survey—a very good sign! Because the strong casemaker generally requires clean, fresh water to survive. (As do the trout that love to eat them once they sprout wings and flutter above the stream!)

Next time, I’m hoping that we’ll find the casemaker that uses leaves for building materials:

casemaker leaf

Creative, isn’t it? Like a Japanese origami design for body armor. This leaf case must be a heckuva lot lighter than the all-rock version—for the casemaker who wants to stay light on his feet!

To see more cool pics of casemakers and their brilliantly engineered cases, check out this blog on aquatic insects by Bob Henricks, who kindly gave me his permission to use these pictures.  

Then take a look at how a French artist enlists casemakers to create gorgeous custom jewelry, by providing them with gold flakes, turquoise, pearls, and other precious source materials. You can even watch a video of a casemaker artiste doing his thing, using precious gems.


Now back to statistics…

Using Minitab Formulas to Calculate Stream Quality Index

Once the counts for each macroinvertebrate category are tallied, VASOS uses a series of formulas to calculate a multimetric index that indicates the water quality of the stream.

VASOS provides data sheets that show how to calculate these formulas “by hand.” But I think it’s much easier—and much less prone to error—if you let Minitab do the heavy lifting. To automate the calculations, you can assign formulas to columns in a Minitab worksheet (right-click a column and choose Formula > Assign formula to column).

To illustrate this, I recorded benthic invertebrate counts for some actual Virginia streams in columns C4-C26 of a Minitab worksheet.

worksheet counts

Then I followed the steps below to assign formulas to estimate the water quality of a stream.

(If you want to follow along and practice adding formulas to a Minitab worksheet, click here to download the Minitab 16 project file that contains the original data without the formulas. If you don't have Minitab, you can get a free 30-day trial here.)

Step 1: Assign a formula to add sums

In column C27, I assigned a formula to sum the total number of benthic invertebrates  recorded in columns C4-C26.

formula for sums

Tip: Notice I hid columns C4-C26 in Minitab for this screenshot. Hiding columns can make it easier to get a bird's-eye view of selected data in your worksheet. To hide columns, select the columns you want to hide, right-click, and choose Columns > Hide Selected Columns. To unhide them, choose Editor > Columns > Hide/Unhide Columns.

Step 2: Assign a formula to calculate a percentage

Once Minitab calculates the total organisms in column C27, that value can be used to calculate the percentage of each type of invertebrate. For example, the following formula in column C28 calculates the percentage of invertebrates that are mayflies, caddisflies, or stoneflies:

percentage dialog

Using similar formulas for the percentages of Gomphidae (dragonflies), beetles, and pollution-tolerant invertebrates, respectively, fills in the remaining percentage columns, giving the values in columns C28-C31 shown below:

percentage formulas

Step 3: Assign a formula to assign an index value

The next step requires using a logical function. Whenever you want to assign numerical index scores or categorical text values based on multiple conditions, use the IF(general) function in the Minitab Calculator.

For the invertebrate data, the guidelines require assigning a score of 6, 3, or 0 depending on the percentage of each category of organisms in the sample.

For example, Metric 1 assigns a value of 6, 3, or 0 depending on the total percentage of mayflies, caddisflies, and stoneflies in the sample.

  • If the percentage is greater than 4.6, the Metric 1 score is 6.
  • If the percentage is greater than 1.6 but no more than 4.6, the Metric 1 score is 3.
  • If the percentage is 1.6 or less, the Metric 1 score is 0.

Using IF(general) to define a formula

Here’s what the IF(general) function in the Calculator looks like before you enter any values: IF(test,value_if_true,...,test,value_if_true,[value_if_false]).

In test, enter a condition. After the comma that follows the condition, enter the value you want Minitab to return if the condition is true. For example, column C28 contains the percentage of mayflies, caddisflies, and stoneflies in the sample. IF(C28 >4.6,6…) tells Minitab to return a value of 6 if the percentage of mayflies, caddisflies, and stoneflies is greater than 4.6.

Now add the next condition. In this example, if the percentage is greater than 1.6 (but less than 4.6), we want Minitab to assign a value of 3. In the Expression field, enter IF(C28 >4.6,6,C28>1.6,3…)

Continue to enter conditions and the value to return if each condition is true, as needed. Minitab evaluates the conditions from left to right and returns the value that corresponds to the first condition that is true. You can also add an optional value at the very end of the expression for Minitab to return if none of the conditions hold.

In this example, I’ll use this optional value to return a score of 0 for Metric 1 if the percentage is not greater than 1.6. So the complete  IF(general) formula for Metric 1 looks like this: IF(C28 >4.6,6,C28 >1.6,3,0)
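
If you ever need to reproduce this scoring outside Minitab, the chained IF translates directly into an if/elif/else. Here's a minimal Python sketch of Metric 1 (the function name is just for illustration):

def metric1_score(pct_mayfly_caddisfly_stonefly):
    """Score Metric 1 from the combined percentage, mirroring IF(C28>4.6,6,C28>1.6,3,0)."""
    if pct_mayfly_caddisfly_stonefly > 4.6:
        return 6
    elif pct_mayfly_caddisfly_stonefly > 1.6:
        return 3
    else:
        return 0

print(metric1_score(5.2), metric1_score(3.0), metric1_score(0.8))  # prints: 6 3 0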

Metrics 2, 3, and 4 assign index values based on the percentages of Gomphidae, beetles, and pollution-tolerant organisms, respectively. Each metric uses a different range of percentages to define the index values of 6, 3, and 0, so each one requires a different IF(general) formula assigned to its column.

metrics worksheet

Step 4: Create an overall index to rate the stream

We’re almost there!

The last step is simply to add all four metrics together to calculate the Multimetric Index Score in column C36. Then the ecological condition of the stream (acceptable, partially acceptable, or unacceptable) can be determined by assigning a formula that uses—you guessed it—the IF(general) function, like this:

IF('Multimetric Index Score'>14,"Acceptable",'Multimetric Index Score'>=8,"Partially Acceptable","Unacceptable")

Note: Enclose any text values you want Minitab to return based on the conditions in double quotes.
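
The same pattern works for the final rating. Here is a minimal Python sketch of the multimetric classification, assuming the four metric scores have already been computed (the thresholds for Metrics 2-4 come from the VASOS guidelines and aren't shown here):

def stream_rating(metric_scores):
    """Classify a stream from its four metric scores, mirroring the IF(general) formula above."""
    multimetric_index = sum(metric_scores)  # the value that would land in column C36
    if multimetric_index > 14:
        return "Acceptable"
    elif multimetric_index >= 8:
        return "Partially Acceptable"
    else:
        return "Unacceptable"

print(stream_rating([6, 6, 3, 3]))  # Acceptable (index = 18)
print(stream_rating([3, 3, 3, 0]))  # Partially Acceptable (index = 9)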

Presto! An Instant Estimate of Stream Quality

Setting the formulas up in the worksheet initially requires some time and some basic knowledge of calculator functions. But after that, you’re good to go!

Now, just plop in the counts from each new survey (in C4-C26) and you can instantly estimate the ecological condition of the stream, without having to do any calculations.

 Stream quality worksheet

Note: Not all water-monitoring organizations use these exact formulas and categories to estimate water quality based on benthic invertebrate counts. The formulas and categories can vary—but the general concept is the same.

For example, the Wadeable Streams Assessment Study was a large-scale statistical survey of streams throughout the U.S. that also used benthic invertebrate counts to assess stream quality. The 2004 study included nearly 1,400 random sites chosen to represent the conditions of all U.S. streams. It was a great collaborative effort by state and federal agencies, Native American tribes, universities, and hundreds of dedicated volunteers like Yolanda and Monika.

Here's the national report card on the water quality in our wadeable streams:

 pie chart

The pie chart speaks for itself. 

But here's the good news. The first step in improving quality is measuring it. You can't fix what you don't know needs fixing.

Earth Day is just one week away...

Looking for a way to feel more connected to the real world around you? Consider joining the thousands of trained volunteers like Monika and Yolanda who monitor local lakes, streams, and rivers in the U.S. (including urban environments). It’s an excellent way to meet some wonderful people and do some tangible good for the planet.

Here are a few organizations that depend on volunteer monitoring programs to help them regularly assess water quality:

(And by the way, if you don’t like bugs, there are many other methods of water monitoring you can do besides the benthic invertebrate survey! Trust me—it's fun.)

Enough Is Enough! Handling Multicollinearity in Regression Analysis


In regression analysis, we look at the correlations between one or more input variables, or factors, and a response. We might look at how baking time and temperature relate to the hardness of a piece of plastic, or how educational levels and the region of one's birth relate to annual income. The number of potential factors you might include in a regression model is limited only by your imagination...and your capacity to actually gather the data you imagine.

But before throwing data about every potential predictor under the sun into your regression model, remember a thing called multicollinearity. With regression, as with so many things in life, there comes a point where adding more is not better. In fact, sometimes not only does adding "more" factors to a regression model fail to make things clearer, it actually makes things harder to understand!  

What Is Multicollinearity and Why Should I Care?

In regression, "multicollinearity" refers to predictors that are correlated with other predictors. Multicollinearity occurs when your model includes multiple factors that are correlated not just to your response variable, but also to each other. In other words, it results when you have factors that are a bit redundant.

You can think about it in terms of a football game: If one player tackles the opposing quarterback, it's easy to give credit for the sack where credit's due. But if three players are tackling the quarterback simultaneously, it's much more difficult to determine which of the three makes the biggest contribution to the sack. 

Not that into football?  All right, try this analogy instead: You go to see a rock and roll band with two great guitar players. You're eager to see which one plays best. But on stage, they're both playing furious leads at the same time!  When they're both playing loud and fast, how can you tell which guitarist has the biggest effect on the sound?  Even though they aren't playing the same notes, what they're doing is so similar it's difficult to tell one from the other. 

That's the problem with multicollinearity. 

Multicollinearity increases the standard errors of the coefficients. Increased standard errors, in turn, mean that coefficients for some independent variables may be found not to be significantly different from 0. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. Without multicollinearity (and thus with lower standard errors), those coefficients might be significant.

Warning Signs of Multicollinearity 

A little bit of multicollinearity isn't necessarily a huge problem: extending the rock band analogy, if one guitar player is louder than the other, you can easily tell them apart. But severe multicollinearity is a major problem, because it increases the variance of the regression coefficients, making them unstable. The more variance they have, the more difficult it is to interpret the coefficients.

So, how do you know if you need to be concerned about multicollinearity in your regression model? Here are some things to watch for:

  • A regression coefficient is not significant even though, theoretically, that variable should be highly correlated with Y.
  • When you add or delete an X variable, the regression coefficients change dramatically.
  • You see a negative regression coefficient when your response should increase along with X.
  • You see a positive regression coefficient when the response should decrease as X increases.
  • Your X variables have high pairwise correlations. 

One way to measure multicollinearity is the variance inflation factor (VIF), which assesses how much the variance of an estimated regression coefficient increases if your predictors are correlated.  If no factors are correlated, the VIFs will all be 1.
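
If you're curious what's behind the number: the VIF for each predictor is 1/(1 - R-squared) from regressing that predictor on all of the other predictors. Here's a minimal Python sketch with made-up data (statsmodels provides a convenience function for the calculation; the column names are just placeholders):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor data -- substitute your own columns
rng = np.random.default_rng(1)
X = pd.DataFrame({"Publications": rng.normal(10, 3, 50),
                  "Years": rng.normal(8, 2, 50)})

exog = sm.add_constant(X)  # the VIF calculation expects the constant term to be present
for i, name in enumerate(X.columns, start=1):
    print(name, round(variance_inflation_factor(exog.values, i), 2))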

To have Minitab Statistical Software calculate and display the VIF for your regression coefficients, just select it in the "Options" dialog when you perform your analysis. 

VIF Option in Regression Analysis

With Display VIF selected as an option, Minitab will provide a table of coefficients as part of its output.  Here's an example involving some data looking at the relationship between researcher salary, publications, and years of employment: 

regression output coefficient table with VIF

If the VIF is equal to 1, there is no multicollinearity among factors; if the VIF is greater than 1, the predictors may be moderately correlated. The output above shows that the VIFs for the Publication and Years factors are about 1.5, which indicates some correlation, but not enough to be overly concerned about. A VIF between 5 and 10 indicates high correlation that may be problematic. And if the VIF goes above 10, you can assume that the regression coefficients are poorly estimated due to multicollinearity.

You'll want to do something about that. 

How Can I Deal With Multicollinearity? 

If multicollinearity is a problem in your model -- if the VIF for a factor is near or above 5 -- the solution may be relatively simple. Try one of these: 

  • Remove highly correlated predictors from the model.  If you have two or more factors with a high VIF, remove one from the model. Because they supply redundant information, removing one of the correlated factors usually doesn't drastically reduce the R-squared.  Consider using stepwise regression, best subsets regression, or specialized knowledge of the data set to remove these variables. Select the model that has the highest R-squared value. 
     
  • Use Partial Least Squares Regression (PLS) or Principal Components Analysis, regression methods that cut the number of predictors to a smaller set of uncorrelated components (there's a quick code sketch of PLS below).
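
If you're curious what PLS looks like in code outside of Minitab, here's a minimal sketch using Python's scikit-learn with made-up, deliberately correlated predictors:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Made-up data with two highly correlated (nearly redundant) predictors
rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)

# A single PLS component captures the information shared by x1 and x2
pls = PLSRegression(n_components=1)
pls.fit(X, y)
print(round(pls.score(X, y), 3))  # R-squared of the fit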

With Minitab Statistical Software, it's easy to use the tools available in the Stat > Regression menu to quickly test different regression models to find the best one. If you're not using it, we invite you to try Minitab for free for 30 days.

Have you ever run into issues with multicollinearity? How did you solve the problem? 

When Should I Use Confidence Intervals, Prediction Intervals, and Tolerance Intervals?


In statistics, we use a variety of intervals to characterize the results. The most well-known of these are confidence intervals. However, confidence intervals are not always appropriate. In this post, we’ll take a look at the different types of intervals that are available in Minitab, their characteristics, and when you should use them.

I’ll cover confidence intervals, prediction intervals, and tolerance intervals. Because tolerance intervals are the least-known, I’ll devote extra time to explaining how they work and when you’d want to use them.

What are Confidence Intervals?

A confidence interval is a range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter. Because of their random nature, it is unlikely that two samples from a given population will yield identical confidence intervals. But if you repeated your sample many times, a certain percentage of the resulting confidence intervals would contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.

Most frequently, you’ll use confidence intervals to bound the mean or standard deviation, but you can also obtain them for regression coefficients, proportions, rates of occurrence (Poisson), and for the differences between populations.

Suppose that you randomly sample light bulbs and measure the burn time. Minitab calculates that the 95% confidence interval is 1230 – 1265 hours. The confidence interval indicates that you can be 95% confident that the mean for the entire population of light bulbs falls within this range.
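
If you like to see the mechanics, a t-based confidence interval for the mean needs only the sample mean, standard deviation, and sample size. Here's a minimal sketch in Python with made-up burn times; it's essentially the same calculation Minitab performs:

import numpy as np
from scipy import stats

# Made-up burn times (hours) -- substitute your own sample
rng = np.random.default_rng(3)
burn_times = rng.normal(loc=1250, scale=90, size=40)

mean = burn_times.mean()
sem = stats.sem(burn_times)  # standard error of the mean
ci = stats.t.interval(0.95, len(burn_times) - 1, loc=mean, scale=sem)
print(ci)  # 95% confidence interval for the population mean burn time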

Confidence intervals only assess sampling error in relation to the parameter of interest. (Sampling error is simply the error inherent when trying to estimate the characteristic of an entire population from a sample.) Consequently, you should be aware of these important considerations:

  • As you increase the sample size, the sampling error decreases and the intervals become narrower. If you could increase the sample size to equal the population, there would be no sampling error. In this case, the confidence interval would have a width of zero and be equal to the true population parameter.
     
  • Confidence intervals only tell you about the parameter of interest and nothing about the distribution of individual values.

In the light bulb example, we know that the mean is likely to fall within the range, but the 95% confidence interval does not predict that 95% of future observations will fall within the range. We’ll need to use a different type of interval to draw a conclusion like that.

What Are Prediction Intervals?

A prediction interval is a type of confidence interval that you can use with predictions from linear and nonlinear models. There are two types of intervals for predictions, and both use the predictor values that you enter into the model equation.

Confidence interval of the prediction

A confidence interval of the prediction is a range that is likely to contain the mean response given specified settings of the predictors in your model. Just like the regular confidence intervals, the confidence interval of the prediction presents a range for the mean rather than the distribution of individual data points.

Going back to our light bulb example, suppose we design an experiment to test how different production methods (Slow or Quick) and filament materials (A or B) affect the burn time. After we fit a model, statistical software like Minitab can predict the response for specific settings. We want to predict the mean burn time for bulbs that are produced with the Quick method and filament type A.

Minitab calculates a confidence interval of the prediction of 1400 – 1450 hours. We can be 95% confident that this range includes the mean burn time for light bulbs manufactured using these settings. However, it doesn’t tell us anything about the distribution of burn times for individual bulbs.

Prediction interval

A prediction interval is a range that is likely to contain the response value of a single new observation given specified settings of the predictors in your model.

We’ll use the same settings as above, and Minitab calculates a prediction interval of 1350 – 1500 hours. We can be 95% confident that this range includes the burn time of the next light bulb produced with these settings.

The prediction interval is always wider than the corresponding confidence interval of the prediction because of the added uncertainty involved in predicting a single response versus the mean response.
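
If you happen to fit the model in code rather than in Minitab, you can pull both intervals from the same fitted model. Here's a minimal sketch using Python's statsmodels; the factor names mirror the example, but the data are invented:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up experimental data: burn time vs. production method and filament type
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "method": rng.choice(["Slow", "Quick"], size=40),
    "filament": rng.choice(["A", "B"], size=40),
})
df["hours"] = 1400 + 30 * (df["method"] == "Quick") + rng.normal(scale=25, size=40)

model = smf.ols("hours ~ method + filament", data=df).fit()
new = pd.DataFrame({"method": ["Quick"], "filament": ["A"]})
frame = model.get_prediction(new).summary_frame(alpha=0.05)

# mean_ci_* is the confidence interval of the prediction; obs_ci_* is the prediction interval
print(frame[["mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])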

We’re getting down to determining where an individual observation is likely to fall, but you need a model for it to work.

What Are Tolerance Intervals?

A tolerance interval is a range that is likely to contain a specified proportion of the population. To generate tolerance intervals, you must specify both the proportion of the population and a confidence level. The confidence level is the likelihood that the interval actually covers the proportion. Let’s look at an example, because that’s the easiest way to understand tolerance intervals.

Example of a tolerance interval

The light bulb manufacturer is interested in how long their light bulbs burn. The analysts randomly sample 100 bulbs and record the burn time in this worksheet.

In Minitab, go to Stat > Quality Tools > Tolerance Intervals. Under Data, choose Samples in columns. In the textbox, enter Hours. Click OK.  (If you're not already using it, please download the free 30-day trial of Minitab and play along!)

Example of a tolerance interval

The normality test indicates that our data are normally distributed. Consequently, we can use the Normal interval (1060, 1435). The manufacturer is 95% confident that at least 95% of all burn times will fall between 1060 and 1435 hours. If this range is wider than their clients' requirements, the process may produce excessive defects.
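
Minitab does this calculation for you, but if you're curious how a normal tolerance interval can be built from a sample, here's a minimal Python sketch using a common approximation (Howe's method). Treat it as an illustration rather than a replacement for Minitab's calculation; the burn times below are made up:

import numpy as np
from scipy import stats

def normal_tolerance_interval(x, coverage=0.95, confidence=0.95):
    """Approximate two-sided normal tolerance interval (Howe's method)."""
    n = len(x)
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, n - 1)  # lower-tail chi-square quantile
    k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)
    return x.mean() - k * x.std(ddof=1), x.mean() + k * x.std(ddof=1)

# Made-up burn times standing in for the 100 sampled bulbs
rng = np.random.default_rng(9)
burn_times = rng.normal(loc=1250, scale=85, size=100)
print(normal_tolerance_interval(burn_times))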

How tolerance intervals work compared to confidence intervals

A confidence interval's width is due entirely to sampling error. As the sample size approaches the entire population, the width of the confidence interval approaches zero.

In contrast, the width of a tolerance interval is due to both sampling error and variance in the population. As the sample size approaches the entire population, the sampling error diminishes and the estimated percentiles approach the true population percentiles.

To determine where 95% of the population falls, Minitab calculates the data values that correspond to the estimated 2.5th and 97.5th percentiles (97.5 - 2.5 = 95). Read here for more information about percentiles and population proportions.

Unfortunately, the percentile estimates will have error because we are working with a sample. We can’t be 100% confident that a tolerance interval truly contains the specified proportion. Consequently, tolerance intervals have a confidence level.

Uses for tolerance intervals

In general, use tolerance intervals if you have sampled data and want to predict a range of likely outcomes.

In the quality improvement field, Six Sigma analysts generally require that the output from a process have measurements (e.g., burn time, length, etc.) that fall within the specification limits. In this context, tolerance intervals can detect excessive variation by comparing client requirements to tolerance limits that cover a specified proportion of the population. If the tolerance interval is wider than the client's requirements, there may be too much product variation.

With Minitab statistical software, it’s easy to obtain all of these intervals for your data! You just need to be aware of what information each interval provides.


Remembering the Positive in a Time of Tragedy


Mile 25 from the 2005 Boston Marathon

My holy of holies is the human body, health, intelligence, talent, inspiration, love, and the most absolute freedom imaginable, freedom from violence and lies, no matter what form the latter two take.

                -Anton Chekhov

Normally, I write about subjects that are generally of interest to me when I do a blog post: you may have seen the list before. So I’ll have to start out this post with the admission that until I was leaving the office on Monday, I had no idea that the Boston Marathon was going on that day. I had no clue that the estimate of the number of spectators would be over half a million people. I didn’t know that the third Monday in April is Patriot’s Day in Massachusetts and Maine. And I didn't know about the explosions at the finish line until I turned on the news.

Now the whole world is aware of these events, and all of us at Minitab are shocked and heartbroken about the harm inflicted by this despicable attack.

In following the subsequent news coverage, I've seen many people talk about piecing together facts to understand what happened—and why. That's important, but I've started to think that the most important motivations from Monday, April 15th 2013 weren’t about detonating devices. The important motivations are those of the people who attended and participated in the race for reasons we already understand: to improve society.

I’m always curious about numbers, and there are certainly a lot of statistics about the Boston Marathon. Without losing sight of the events that are dominating the news, the most meaningful numbers to me are ones that speak to the good the participants and supporters of the Boston Marathon do, such as the amount of money raised for various charities that were sponsored by John Hancock on crowdrise.

Here's a boxplot of that data and some notes about a few statistics of interest.

Boxplot of money raised on crowdrise by teams sponsored by John Hancock

Sum

When raising money, the sum is always an exciting statistic. The pictures with the teams on crowdrise have a total of $6,333,927 raised for charity. The Boston Marathon Official Charity Program and the John Hancock-sponsored charities expect to raise more than $16,000,000 in 2013. And many other charities raise money during the marathon without sponsorships from these two organizations.

Maximum

The boxplot uses asterisk symbols for teams that raised exceptionally large amounts of money on crowdrise compared to the other teams. The maximum in the data set belongs to the 105-person team from Mass General Hospital for Children, with $593,449. That's an achievement to feel good about.

Mean

For the 250 teams in the data set, the mean raised was $25,336. Multiply the mean and the number of teams and you’ll get close to the sum, which is my favorite thing about the mean. In this data, the mean is approximately equal to the third quartile. The mean is shown on the boxplot by the tangerine dot.

Median

Half the teams in the data set raised over $5,725 on crowdrise. On the boxplot, the line that divides the box is the median.

I picked these numbers to share because the data were convenient to get. But there are many more stories and numbers about the good that people were doing on Monday. Like most of you, I will be following the news about the sorrowful events that took place, and I will do what I can to help in some small way.

But I’m also going to keep thinking about all the good that people achieved at the 2013 Boston Marathon, and I hope you will, too.

The image of the 2005 Boston Marathon is by Pingswept and licensed for reuse under this Creative Commons License.

Explaining Quality Statistics So My Boss Will Understand: Measurement Systems Analysis (MSA)


As a teenaged dishwasher at a local eatery, I had a boss who'd never washed dishes in a restaurant himself. I once spent 40 minutes trying to convince him that forks and spoons should go in their holders with the business end up, while knives should go in point-down. Whatever I said, he didn't get it. We were ordered to put forks and spoons in the holders with the handles up.

The outraged wait staff soon made clear what I hadn't: you can't immediately tell the difference between a fork and a spoon when all you can see is the handle! Explaining that in the right way would have minimized the wasted time and the wait staff's anger.

I knew nothing about statistics then. Now that I do, I often need to explain statistical concepts to people who don't know (and usually don't care) about the calculations and assumptions -- they just want the bottom line as fast as possible. I've found it useful to imagine how I could explain things to my old boss so that even he could understand. 

If you've got a boss who doesn't appreciate why we need to do the things we do to analyze data, maybe these thoughts will help you. 

"Why Do We Need this MSA Thing? Just Grab Some Numbers and Go." 

What's the first question you ask when you're about to run an analysis? For me, it's very simple: "Can I trust my data?"  If the answer to that question is "No," or even "I'm not sure," it's usually pointless to go any further: why would you spend any valuable time to interpret data you can't rely on? 

That's where measurement systems analysis, or MSA, comes in. MSA is a collection of methods you can use to assess your ability to collect  trustworthy, reliable data--the kind of data you want to analyze. 

MSA helps determine how much of the overall variation found in your data is due to variation in your measurement system itself. Factors that might affect the measurement system can include data collection procedures, gages, and other test equipment, not to mention the individuals who are taking the measurements. It's just good sense to evaluate your measurement system before control charting, capability analysis, or any other analysis: it establishes that your measurement system is accurate and precise, and that your data are trustworthy.

Let's say you're overseeing a production line making a new line of precision screws that need to meet strict specifications. The value of doing an MSA seems self-evident to you, but you've got a boss who hasn't needed to actually measure anything since grade school. "Look," he tells you, "we've got the best people using the best equipment money can buy. Don't waste time measuring how you measure stuff! Just get some quick numbers to show we're meeting the customer's requirements." 

You know that even if you do have the best people and the best equipment available, you can't prove your products meet spec unless you've proved your measurements are reliable. But the customer's not specifically asking for an MSA, just some inspection data.  So how can you convince your boss it's worth the time to do an MSA first? 

Getting a Handle on Measurement System Error

On the surface, measurement can seem pretty straightforward: just record the number you see, and you're done! But that's not really the case. There will always be some degree of variation or error in a measurement system, and you can put those errors into two different categories: accuracy and precision. "Accuracy" is how close the measurement is to the actual value being measured.  "Precision" refers to the ability to measure the same part with the same device consistently. 

If your measurement system is working perfectly, wonderful: you won't have problems with accuracy or precision. But most systems can have one or both of these problems, and even if the system was working great a few months ago, something may have changed in the interim. The device that worked great last month might have slipped out of calibration, either through accident or just plain wear and tear. You might have a gage that measures the same part consistently (it has precision), but that measurement is wrong. Or the device might take measurements that are close to the actual value (it has accuracy), but show a lot of variation between multiple measurements of the same part. And you might have a device that records measurements that are both inaccurate and widely varying.

The easiest way to visualize the value of an MSA is to think of targets: the bullseye is the actual value that you're measuring, and a good measurement hits the target right in the center each time. Doing an MSA is like looking at the pattern of "shots" so you can see if and where a device is having problems, and how to adjust it if needed, as shown below.

visualizing precision and accuracy in measurement systems analysis

And thus far we're just considering the device being used to take the measurement. When your measurements are being done and/or recorded by human beings, with all their innate potential for error and variation, you can quickly see how doing data analysis without doing an MSA first creates boundless opportunities for disaster.

Seeing (and Measuring) Is Believing

If your boss still doesn't get it, a hands-on demonstration involving actual human beings will probably help, and can even be fun. Get a bag of your boss's favorite candy and take a couple of samples. Measure their height or width yourself, and use these as your "standards." Now you can use Minitab to create a worksheet that will help you gather data for a simple Gage R&R study.  My colleague Cody Steele detailed how to analyze this data by doing a simple gummi bear experiment with his coworkers, which serves as a good model for what you can do with your boss.

Depending on how you set up your demonstration, you are very likely to find that different measuring devices and operators can assess the same part or sample and generate different results. The more error in the measurements, the more likely it is that any decisions based on those measurements will be in error, too.

And that means doing an MSA and making sure you're getting the good measurements you want can make the difference between a product that meets customer specifications and one that suggests your whole company has a couple of screws loose. 

Look for a statistical software package that offers MSA tools to help you evaluate accuracy and precision in your measurement systems. For example, Minitab offers tools including:

  • Gage R&R Study (crossed and nested)
  • Gage Run Chart
  • Gage Linearity and Bias Study
  • Type 1 Gage Study
  • Attribute Agreement Analysis
  • Attribute Gage Study (AIAG method)

If you don't already use it, you can try out these tools yourself with a free 30-day trial of our statistical software.

Using Binary Logistic Regression to Investigate High Employee Turnover


Human resources might not be a business area where you’d typically expect to conduct a Six Sigma project. However, Jeff Parks, Lean Six Sigma master black belt, found the opportunity to apply Six Sigma to human resources while leading quality improvement efforts at a large manufacturer of aerospace engine parts.

The manufacturer was suffering from high employee attrition, or turnover, and struggled to understand why. With a DMAIC Six Sigma project, Parks set out to work with the HR department to investigate and reduce the high turnover rates.

In 2009, the manufacturer had normal attrition rates of 15-18 percent, which mirrored general manufacturing attrition rates for the region where the company was based. However, the downturn of the U.S. economy in 2009 coincided with an increase in the company’s attrition rate, which soared to more than 30 percent. Given the high unemployment rates regionally and nationwide, management expected the rates to be much lower.

Human resources personnel at the manufacturer typically relied on exit interviews to understand why employees leave. While a lot can be learned from these interviews, there is no opportunity to conduct an exit interview when employees leave without warning. And in an hourly manufacturing environment where the work is very laborious, it was not uncommon for employees to quit unannounced, leaving behind little or no information about what prompted their departure. The manufacturer had compiled basic data about employees who had left their jobs unannounced over the past two years, and Parks took on the task of analyzing the data to see what insights it might offer about the sudden rise in attrition.

How Minitab Helped

The company had hired 100 employees over a span of two years. Of those new hires, 32 had quit unannounced. The company had basic information about each of the new hires, including the employee’s gender, position, pay classification, shift worked, prior years of manufacturing experience, and length of commute.

To see if statistically significant differences existed between the proportion of employees who quit across pay classifications (hourly vs. salary) and shifts worked (first shift vs. second shift), Parks performed hypothesis tests (Stat > Basic Statistics) in Minitab. He had hoped to reveal patterns in employee attrition behavior, but the analysis showed no statistically significant differences between the proportions for any of the variables.

But Parks didn't stop there. He investigated the raw data further, and noticed that many of the employees who quit were females with prior manufacturing experience who traveled various distances to work. With Minitab charts, Parks was able to visualize the raw data the company had compiled about employee attrition. The histogram below outlines the distribution of commute distances for newly hired employees over the course of two years:

Minitab Histogram

He also used Minitab to find out if there was a correlation between gender, prior years of manufacturing experience, or commuting distance and whether or not the employee had left the company. In this case, the response Parks wanted to assess—quitting or not quitting—was binary and only had two possible values. Minitab’s powerful binary logistic regression analysis allowed Parks to create statistical models to predict which variables might have made a person more likely to quit.

Binary Logistic Menu Path in Minitab

 

The analysis revealed that the length of an employee’s commute was statistically significant:

Logistic Regression Table

The output from his Minitab analysis yielded a regression equation Parks could use to predict the probability of employees quitting based on the number of miles in their commute to work. He used the equation to analyze distances up to 30 miles, and found that commuting distance had little impact on the probability of an employee quitting until the 12-mile mark. At 12 miles, the probability of an employee quitting increased to more than 18 percent. “And at 13 miles, which is about a 30-45 minute commute, the probability of quitting jumped to more than 92 percent,” Parks notes. “If the commute exceeds 13 miles, it is almost assured that an employee will quit.”
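
To give a feel for how a binary logistic model turns commute distance into a probability of quitting, here's a minimal sketch in Python with simulated data. The numbers and coefficients are invented for illustration; they are not Parks' actual data or results:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated hires: 1 = quit unannounced, 0 = stayed (illustration only)
rng = np.random.default_rng(11)
miles = rng.uniform(1, 30, size=100)
p_quit = 1 / (1 + np.exp(-(miles - 12) / 2))   # assumed rise in quit risk around 12 miles
quit_flag = rng.binomial(1, p_quit)
df = pd.DataFrame({"quit": quit_flag, "miles": miles})

model = smf.logit("quit ~ miles", data=df).fit(disp=False)

# Predicted probability of quitting at selected commute distances
new = pd.DataFrame({"miles": [5, 12, 13, 20]})
print(model.predict(new).round(2))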

Revamping the Interview Process

The manufacturer’s HR department used the results of Parks’ analysis to revamp the interview process for manufacturing positions. They began to look more closely at the potential employee’s commuting distance and took this into account before making hiring decisions.

Parks believes the data explain the seeming paradox between a rising attrition rate and a time of high unemployment. “The analysis seemed to imply that during the recession, people were very liberal about the jobs they applied for because the economy was so bad,” says Parks. “Once employees realized that the commute could be arduous, they quickly went on to look for other jobs, and often would quit.”

Of possible interest:

Analyzing Titanic Survival Rates, Part II: Binary Logistic Regression

U.S. Army: Improving HR Processes with Minitab

Truth, Beauty, Nonparametrics & Symmetry Plots

“Shall I compare thee to a standard normal distribution?
  Thou art more symmetric and more bell-shaped…”  — Melvin Shakespeare (William’s lesser-known statistician brother)

The Greek philosopher Aristotle believed that symmetry was one of the primary elements of the universal ideal of beauty. Over 2000 years later, emerging research seems to bear him out. 

Studies suggest we tend to be more attracted to people with symmetrical bodies. Using motion-capture technology to record the movements of people dancing to a popular song, one recent study concluded that we even prefer those who dance symmetrically. (Note to Seinfeld fans: This may explain why Elaine’s dancing really stinks!)

But symmetry is more than skin-deep. It's also rooted in some universal truths.

The concept of symmetry pops up in the fundamental laws of physics (including the laws of motion and gravitation theory), in fractal geometry (chaos theory), quantum mechanics, in the design and analysis of computer algorithms, and in many other fields in science and mathematics.

In statistics, symmetry can have some important, concrete implications when you analyze your data.

The World's Most Beautiful Distribution

normal distribution

Quite attractive, isn’t it? The distribution of everyone's dreams.

People often think their data should always be normally distributed. That “normal” is “good,” and “nonnormal” is “bad.”

Sometimes I wonder if it’s partly because the symmetry of this distribution is so pleasing to the eye. Looking at its graceful bell curve, you might believe for a second that everything in the universe falls into perfect order.

But your data probably won’t look like this…and you know what? That’s perfectly okay.  In fact, it’s perfectly…um...normal.

Why the Beauty of the Normal Distribution May Be Overrated...and Overstated.

Many statistical analyses, such as t-tests, are formally based on the normal distribution. But often these analyses—like t-tests—are robust to modest departures from normality, especially if you have a reasonably large sample (n > 30).

Other analyses, such as capability analysis and reliability analysis, are less forgiving and do require fairly strict adherence to a distribution. But that distribution doesn’t have to be normal—it could be a Weibull distribution, an exponential distribution, or other distribution.

For these parametric analyses—that is, analyses based on a specific distribution of data—the key issue is that your data match (more or less, depending on the analysis) the distribution that’s being used to analyze your data.

But that distribution need not always be a symmetric, bell-shaped curve.

When to Go Nonparametric

There are also times when a parametric test may not meet your needs:

  • You have extreme outliers that can’t be removed from the data.
  • Your data set is small and extremely skewed.
  • Your data can’t be transformed (or you don’t want to transform them).
  • You cannot clearly ascertain the distribution of your data.
  • The distribution is not covered by methods of parametric analysis.
  • A nonparametric measure is more meaningful for your application.

In these cases, you may opt for nonparametric analysis, which does not require your data to follow a specific distribution.

For example, suppose you want to estimate “average” home prices in a town. After you collect the data, you could use a 1-sample t-test to test whether the mean price of a home is greater or less than a hypothesized value, such as $200,000.

But suppose there are several new mega-mansions in the town that sell for millions of dollars. Those extreme values will inflate the arithmetic average of all the homes—making the mean not truly representative of the “average” home price. However, you shouldn’t remove those extreme values from your data set because they’re not errors or flukes.  

Instead, it may be wiser to assess the “average” home prices using the median, a nonparametric measure, instead of the mean. The median is simply the value that 50% of the data fall above, and 50% fall below. The median is not overly influenced by extreme values, so in this case, it provides a better reflection of the “average” home price. 

Symmetry in Nonparametrics: Choosing Between 1-sample Tests

butterflyStatistics is based on probability theory, so making informed choices often involves deciding how to best hedge your bets.

A nonparametric test generally provides less power than its corresponding parametric test. So by using it when your data do not satisfy minimum parametric requirements, you’re sacrificing some ability to detect a significant difference in order to gain increased certainty that any significant difference that you find is legitimate. Less power for more safety. (People weigh similar tradeoffs when buying a car.)

However, one common misconception is that a nonparametric test has no data requirements at all. Although it's true that no specific distribution is assumed for the data, nonparametric tests may have other requirements.

For example, suppose you decide to use a 1-sample nonparametric test to test the median against a hypothesized value. In Minitab you’ll see two tests to choose from:

nonparametric menu

The 1-sample Wilcoxon is more powerful than the 1-sample Sign test, but it comes with a hitch: the 1-sample Wilcoxon test requires that your data be fairly beautiful (i.e., symmetric). The 1-sample Sign test does not.
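
Outside Minitab, you can run both tests with a few lines of Python as well. Here's a minimal sketch with made-up data and a hypothesized median of 50; the signed-rank test is applied to the differences, and the sign test is carried out as a binomial test on the counts above the hypothesized value:

import numpy as np
from scipy import stats

# Made-up sample and hypothesized median
rng = np.random.default_rng(13)
x = rng.normal(loc=52, scale=6, size=30)
hypothesized_median = 50

# 1-sample Wilcoxon signed-rank test (assumes a roughly symmetric distribution)
print(stats.wilcoxon(x - hypothesized_median))

# 1-sample sign test: count values above the hypothesized median, then binomial test
above = np.sum(x > hypothesized_median)
n = np.sum(x != hypothesized_median)
print(stats.binomtest(int(above), int(n), p=0.5))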

Assessing the Symmetry of Data

To assess the symmetry of your data, you can use a symmetry plot, a histogram, or a boxplot. Personally, I like to use all three to get a composite sense.

The symmetry plot in Minitab (Stat > Quality Tools > Symmetry plot) includes a histogram in the upper right corner, as you can see below. I also created a boxplot (Graph > Boxplot) of the same data. Then I used Minitab’s Graph Layout tool (Editor > Layout Tool)  to combine the boxplot and symmetry plot on the same graph.

symmetrical symmetry plot

The symmetry plot graphs the distance from the median of points above the median against the corresponding points below the median. The closer these points lie to the 45-degree line, the more symmetric the data is.
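
If you're curious exactly how those points are computed, you can build a rough do-it-yourself version of the plot in a few lines. A minimal sketch with made-up right-skewed data (Minitab's plot adds more polish, but the idea is the same):

import numpy as np
import matplotlib.pyplot as plt

# Made-up right-skewed data
rng = np.random.default_rng(17)
x = np.sort(rng.lognormal(mean=0, sigma=0.6, size=99))

m = np.median(x)
k = len(x) // 2
below = (m - x[:k])[::-1]   # distances below the median, nearest point first
above = x[-k:] - m          # distances above the median, nearest point first

plt.scatter(below, above)
lim = max(below.max(), above.max())
plt.plot([0, lim], [0, lim])  # 45-degree reference line
plt.xlabel("Distance below median")
plt.ylabel("Distance above median")
plt.show()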

Caution: The symmetry plot is not the same as a probability plot. It’s a more general tool, so the data points do not need to hug the diagonal line as closely. The important thing to look for is whether the points remain close to or parallel to the line, versus the points diverging from the line.

For the boxplot and histogram, look for whether they could be folded in half at their center and roughly match on each side. For the boxplot, the length of the whiskers and boxes on each side of the centerline (median) should be about the same.

Judging by the plots above, these data are fairly symmetric. You could use the 1-Sample Wilcoxon test.

What Do Nonsymmetric Data Look Like?

When data are not symmetric, they may be skewed in one direction. Here’s how skewed data appear on the plots:

skewed left

skewed right

Notice that the points on the symmetry plot diverge consistently either above or below the diagonal line. The histogram has a longer tail in one direction. The boxplot has a longer box and whisker on one side of the median. Alas, these less-lovely plots have a much tougher time getting dates.

Because they’re not symmetric, the 1-Sample Sign test would be a better choice for these data than the 1-sample Wilcoxon.

Conclusion

Symmetry is more than just a pretty face. Whenever you're choosing a 1-sample nonparametric test, use Minitab boxplots, histograms, and symmetry plots to evaluate the symmetry of your data and choose the appropriate test.

As for the beauty of symmetry, it may not be as universally admired as Aristotle thought. The Japanese aesthetic of wabi-sabi reveres asymmetry and imperfection.

Personally, I find that a lot more interesting and appealing, in both people and data.

Leveraging Designed Experiments (DOE) for Success


DOE Menu Path in Minitab

You know the drill…you’re in Six Sigma training and you’re learning how to conduct a design of experiments (DOE). Everything is making sense, and you’ve started thinking about how you’ll apply what you are learning to find the optimal settings of a machine on the factory floor. You’ve even got the DOE setup chosen, and you know the factors you want to test…

Then … BAM! … You’re on your own and you immediately have issues analyzing the data. The design you’ve chosen might actually not be the best for the results you need.  It's a classic case of learning something in theory that becomes much more challenging when applied in the real world.

Scott Sterbenz is a Six Sigma master black belt at Ford Motor Company and a technical advisor for the United States Bowling Congress, and he knows the above scenario all too well. As part of his job at Ford, he teaches Six Sigma to quality engineers, and he’s also performed many successful DOEs over the course of his career in the auto industry and with the U.S. Bowling Congress.

Scott shared a few tips with us for performing DOEs that he’s gathered from mentoring Six Sigma students and conducting DOE applications at Ford.

Get Creative with Your Response

Most training materials for designed experiments teach that the response for an experiment should be the key process output variable (KPOV) of the process, or selected from a C&E matrix or fishbone diagram during the ‘Define’ phase of the DMAIC methodology. Generally these guidelines are true, but sometimes they yield a non-meaningful measure. You want to be able to measure your responses quantitatively.

At Ford, one Six Sigma team was faced with correcting premature bulb failures. Warranty costs in certain models were increasing every year, and bulb failures were the single highest warranty cost for Ford. There was some disagreement about what was causing the bulb failures. Was it over-voltage, an issue with the vibration of the vehicle, or perhaps something to do with the quality of bulbs from the supplier?

The team set up a 2⁵ full factorial DOE with the following factors: voltage input, vibration input, filament orientation angle, bulb supplier, and filament centering. What should the response be? The team brainstormed many responses, including average time to failure, variance in time to failure, and signal-to-noise ratio. But was there something better?

By thinking outside of the box and getting creative with responses, the team ended up selecting reliability measures over the other responses they had brainstormed. They performed follow-up analysis in Minitab Statistical Software, and ended up with the insight they needed to address why premature bulb failures were occurring. Since implementing their solution in 2008, the team has saved Ford several million dollars.

Moral of the story: get creative with your responses!
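As an aside, the run matrix for a design like this is easy to picture. In Minitab you would create it with Stat > DOE > Factorial > Create Factorial Design; the sketch below lays out the same 2⁵ = 32 combinations in Python, with the usual -1/+1 coding standing in for the actual low and high settings, which the story doesn’t give.

```python
# Sketch: laying out the 32 runs of a 2^5 full factorial design in code.
# The factor names come from the story above; the -1/+1 coding is a generic
# low/high placeholder, not Ford's actual settings.
from itertools import product
import random

factors = ["voltage input", "vibration input", "filament orientation angle",
           "bulb supplier", "filament centering"]

runs = list(product([-1, 1], repeat=len(factors)))   # 2^5 = 32 combinations
random.shuffle(runs)                                 # randomize the run order

for i, levels in enumerate(runs, start=1):
    settings = {name: level for name, level in zip(factors, levels)}
    print(f"Run {i:2d}: {settings}")
```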

The Response Needn’t Be Continuous Data

When a cosmetic problem with the vehicle’s carpet threatened the impending launch of the 2011 Fiesta, Ford’s Body Interior Six Sigma team saw a clear opportunity for quality improvement. Ford and the supplier were at odds over how to solve a defect that became known as ‘carpet brush marking.’

[Figure: carpet brush marks on automotive carpet]

Sterbenz and the team began by working with the supplier to analyze the process used to manufacture the automotive carpet. They found that the settings of a machine called a needler were the likely cause of the diminished product quality. But the manufacturer worried that altering the needler’s settings also would affect the plushness of the carpet. The team needed to find process improvements that would eliminate brush marks while maintaining the plushness.

Using Minitab’s DOE tools, the Ford team created a fractional factorial design with center points that would give them the information they needed in only 34 runs. For each of the experimental runs, a team of evaluators compared the new product to the current carpet, and their ratings for softness and brush markings were averaged and analyzed.
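If you haven’t built one before, a fractional factorial design runs a carefully chosen subset of the full factorial, and center points add runs with every factor set at its middle level. The sketch below shows the general idea for a half fraction of a 2⁵ design with two center points; it is illustrative only, and is not the actual 34-run design the Ford team used.

```python
# Illustrative only: a half fraction of a 2^5 design (16 runs) built with the
# generator E = ABCD, plus two center points. Not the Ford team's actual design.
from itertools import product

runs = []
for a, b, c, d in product([-1, 1], repeat=4):  # full factorial in A, B, C, D
    e = a * b * c * d                          # E is set by the generator E = ABCD
    runs.append((a, b, c, d, e))

runs += [(0, 0, 0, 0, 0)] * 2                  # two center points

print(f"{len(runs)} runs in total")            # 16 + 2 = 18
for levels in runs:
    print(dict(zip("ABCDE", levels)))
```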

The evaluators’ responses were collected as attribute data (ratings on an ordinal Likert scale). The team knew they needed continuous data – but how could they make these responses continuous? For this experiment, the team converted the whole-number ordinal ratings into continuous data with tenths-digit resolution by using multiple evaluators and averaging their scores:
 

[Figure: Minitab worksheet showing the averaged evaluator ratings]
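In code, the conversion is nothing more than an average per run. The ratings below are invented just to show the idea:

```python
# Sketch of the averaging idea: whole-number Likert ratings (1-5) from several
# evaluators are averaged per run, giving a response with tenths resolution.
# The ratings are invented for illustration.
ratings_by_run = {
    1: [4, 5, 4, 4, 5, 4],
    2: [2, 3, 2, 2, 3, 3],
    3: [5, 4, 5, 5, 4, 5],
}

for run, ratings in ratings_by_run.items():
    avg = sum(ratings) / len(ratings)
    print(f"Run {run}: ratings {ratings} -> averaged response {avg:.1f}")
```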

Moral of the story: remember that your response needn’t be continuous data!

To read more about how Ford eliminated the carpet defect and successfully completed this DOE, check out the full case study: Launching a Flawless Fiesta: Ford Motor Company

Also, if you’re headed to the 2013 ASQ World Conference in Indianapolis, be sure to attend Scott’s session, From Bulb Failures to Bowling: Making Sure Your Designed Experiments Succeed on Monday, May 6, 2013 at 1:30 p.m. in room 205. He’ll share more tips for performing successful DOEs, including the most common problems quality practitioners run into when setting up and analyzing data for a DOE.
