
Enhancing Weekly Productivity with Staged Control Charts


by The Discrete Sharer, guest blogger

As Minitab users, many of us have found staged control charts to be an effective tool to quantify and demonstrate the “before and after” state of our process improvement activities.

However, have you ever considered using them to demonstrate the effects of changes to compensation/incentive plans for your employees? 

Here's an example of how a mid-sized commercial bank used Minitab’s staged control charts to better inform senior leaders about the benefits of a pilot incentive plan program, in the hopes of having it implemented nationwide.

Background

Because most retail banking operations do not generate revenues or profits for the organization, there is a significant focus on reducing costs. For most organizations, this is most easily accomplished by offering starting salaries that are not much higher than minimum wage. While this may make sense initially, it does not properly account for the impact of high turnover in those jobs—and the associated costs of needing to frequently replace the people who work in those positions.

To address that, one organization designed an incentive compensation plan that “paid for itself,” and tested it at one of its processing sites over a period of 18 months. That test window included the baseline period followed by three iterations of the incentive plan, where the potential rewards increased from each previous level.

Production Environment and Incentive Plan Structures

The incentive process had engineered standards that deemed “on-target performance” to be between 315 and 385 items per hour. This reflected a +/- 10% range around the mean of 350 items per hour. For the initial 20 weeks of the pilot, the baseline weekly productivity mean was 349.90 items per hour, with a standard deviation of 5.8.

As had been the common business practice for many years, no additional incentive or compensation rewards were offered to employees for the first 20 weeks of the pilot program. For the subsequent three 20-week periods, increasing incentive awards were developed and implemented. They were structured as follows:

Single Plan:
Any employee who showed a 5% or greater improvement over the initial 20-week baseline received a $25 gift card each month. This award also included a “tax gross-up” so that the employee would get the full value of the award without incurring a tax penalty.

Compounded Plan:
About half of the employees showed some slight degradation in their work quality. To combat this unintended consequence, employees could earn additional rewards in the second phase by maintaining their gain (if they had met the initial 5% goal) or improving to that threshold while keeping their quality levels at the same rate or better. For those who met these objectives, the gift card amount was increased to $50 per month.

Increased Compounded Plan:
This was similar to the Compounded Plan, except that the amount of the gift card increased to $100 per month.

Assessing the Impact of the Incentive Plan

By offering employees a way to “earn more by doing more” through the incentive plan, the site saw improved performance that exceeded initial estimates. The table below provides some basic data on the results of each stage.

incentive plan data table

Shown graphically on a staged control chart created in Minitab Statistical Software, the results looked like this:

staged control chart for weekly productivity
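If you want to sketch the same idea outside of Minitab, here is a minimal Python example of a staged individuals chart. The weekly values are simulated, and the stage means and standard deviations are assumptions for illustration only, since the article's actual weekly data aren't published.

```python
# A minimal sketch of a staged individuals (I) chart, using simulated data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
stages = {             # hypothetical stage means and standard deviations
    "Baseline":   (349.9, 5.8),
    "Single":     (368.0, 5.0),
    "Compounded": (372.0, 4.0),
    "Increased":  (375.0, 3.0),
}
weeks_per_stage = 20

fig, ax = plt.subplots(figsize=(9, 4))
start = 0
for name, (mu, sd) in stages.items():
    y = rng.normal(mu, sd, weeks_per_stage)      # one stage of weekly data
    x = np.arange(start, start + weeks_per_stage)
    mr_bar = np.mean(np.abs(np.diff(y)))         # average moving range
    center = y.mean()
    ax.plot(x, y, marker="o", lw=1, label=name)
    for level, style in [(center, "-"), (center + 2.66 * mr_bar, "--"),
                         (center - 2.66 * mr_bar, "--")]:
        ax.hlines(level, x[0], x[-1], linestyles=style, colors="gray")
    start += weeks_per_stage

ax.set(xlabel="Week", ylabel="Items per hour", title="Staged I chart (simulated)")
ax.legend()
plt.show()
```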

Using the Staged Control Chart to Influence Management

Despite the fact that the incentive plan paid for itself through increased employee productivity, the business line manager was reluctant to embrace the results.

That is, until she was shown the Staged Control Chart seen above.

The manager's newfound support for the program was due to the visual impact of seeing variation reduced over the life of the pilot. She hadn't been educated as an engineer, but she did understand the importance of reducing variation within a process.

For her, knowing that the process was seeing reduced variation meant that the unit could do a better job predicting the labor needs for the upcoming quarters.

 

About the Guest Blogger

The Discrete Sharer's identity and biographical information are confidential. 


Has the College Football Playoff Already Been Decided?


A mere 10 seasons ago, USC and Oklahoma opened the college football season ranked #1 and #2 in the preseason AP Poll and the Coaches Poll. They remained there the entire regular season, as neither lost a game. But as chance would have it, they weren’t the only undefeated teams that year. Both Auburn and Utah went undefeated, but neither could crack the top 2, and Oklahoma and USC went on to play in the BCS Championship game.

That’s right, it was only 10 years ago that an undefeated SEC Champion was left out of the BCS title game.

If you’ve only been following college football for 8 years, you probably just fell out of your chair and had a heart attack.

No, that's not what I meant when I said the playoffs have already been decided!

But if you think about it, the BCS Championship game was already decided before the season started. In the preseason poll, voters determined that USC and Oklahoma were better than Auburn and Utah. There was nothing either team could do to change that thinking. They were dependent on the teams ranked above them to have more losses than they did.

Is this typical in college football? If the preseason poll has Team A ranked higher than Team B, and they finish the season with the same number of losses, will Team A still be ranked higher? If so, where a team is ranked in the preseason could have a major impact on whether or not they get selected to participate in the playoffs!  

Collecting the Data

I took the top 5 ranked teams in the AP Poll that had the same number of losses. The key is making sure they have the same number of losses. Last year Florida State was the only undefeated team, so of course they were going to be the top-ranked team regardless of their preseason ranking. But there were 5 teams ranked right below them with 1 loss (in total there were nine 1-loss teams, but I only used the 5 highest-ranked teams to keep the number of teams consistent from year to year). I recorded the order in which those 5 teams were ranked in the last regular season AP Poll (since the college football playoff will be decided before the bowl games, I’m ignoring the final AP Poll), and the order they were ranked in the preseason poll. For example, here is the data from 2013.

Team             Preseason Rank   Postseason Rank
Auburn                  5                1
Alabama                 1                2
Michigan State          3                3
Baylor                  4                4
Ohio State              2                5

Before the season started, Auburn was the lowest ranked team of the 5. However, voters changed their mind by the end of the season, ranking Auburn the highest out of the 5. Perhaps the Tigers finally got some justice from 2004! But 2 years' worth simply isn’t enough data. So I went back and collected similar data for the last 10 college football seasons.  

For most years I used a group of five 1-loss teams, since there weren’t enough undefeated teams to compare. And there were two years I wasn’t able to get a group of five, so I have 47 total teams. You can get the data I used here.

Comparing Preseason and Postseason Ranks

The first thing we’ll look at is an individual value plot showing a team’s preseason and postseason rank. If teams stay in the same order from the preseason to the postseason, we would expect most of the data points to fall along the diagonal from bottom left to top right.

Individual Value Plot

Most of the points do fall along the diagonal. In the bottom left corner, you’ll see that of the 10 teams that held the top spot in the final postseason rankings, half were the top-ranked team in the preseason too. A majority of the points continue going up the diagonal, indicating top-ranked teams with the same number of losses stay in the same order from preseason to postseason.

But we should quantify this relationship to make sure it truly exists. First let’s look at the correlation between the two variables. To calculate the correlation between ordinal variables in Minitab, go to Stat > Tables > Cross Tabulation and Chi-Square. Then click Other Stats and select Correlation coefficients for ordinal categories.  

Correlation coefficients

The correlation coefficient can range in value from -1 to +1. The larger the absolute value of the coefficient, the stronger the relationship between the variables. An absolute value of 1 indicates a perfect relationship, and a value of zero indicates the absence of a relationship. Minitab displays two different coefficients, Pearson’s r and Spearman’s rho. For these data, they both equal about 0.44, indicating a positive association between preseason rankings and postseason rankings. Teams ranked higher in the preseason tend to be ranked higher going into the postseason.
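If you'd rather compute these coefficients outside of Minitab, here is a minimal Python sketch using SciPy. It uses only the five 2013 ranks from the table above, so it won't reproduce the 0.44 figure, which is based on all 47 teams; for that you would pass the two complete rank columns instead.

```python
# Rough equivalent of Minitab's ordinal correlation output for the 2013 ranks.
from scipy.stats import pearsonr, spearmanr

preseason  = [5, 1, 3, 4, 2]   # Auburn, Alabama, Michigan St., Baylor, Ohio St.
postseason = [1, 2, 3, 4, 5]

r, _ = pearsonr(preseason, postseason)
rho, _ = spearmanr(preseason, postseason)
print(f"Pearson's r:    {r:.2f}")
print(f"Spearman's rho: {rho:.2f}")
```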

Measuring Concordance

We can take this analysis one step further by looking at the concordant and discordant pairs. A pair is concordant if the observations are in the same direction. A pair is discordant if the observations are in opposite directions. This will let us compare teams to each other 2 at a time. For example, let’s go back to our five 1-loss teams from 2013.

Team             Preseason Rank   Postseason Rank
Auburn                  5                1
Alabama                 1                2
Michigan State          3                3
Baylor                  4                4
Ohio State              2                5

Look at Auburn and Alabama. In the preseason, voters had Alabama ranked higher. But at the end of the season, they ranked Auburn higher. So this pair is discordant. Now look at Alabama and Ohio State. In the preseason, Alabama was voted higher. And even though the ranking of both teams fell during the season, Alabama still ranked higher at the end of the season. This pair is concordant. More concordant pairs means voters are keeping the same order as the preseason, while more discordant pairs means they’re switching teams.
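As a sanity check, here is a short Python sketch that counts concordant and discordant pairs for the 2013 teams in the table above. For 2013 it gives 4 concordant and 6 discordant pairs, consistent with the note below that 2013 was one of the discordant-heavy seasons; the 89-pair total comes from all ten seasons, which aren't reproduced here.

```python
# Count concordant and discordant pairs of teams for the 2013 season.
from itertools import combinations

teams = {  # team: (preseason rank, postseason rank)
    "Auburn": (5, 1), "Alabama": (1, 2), "Michigan State": (3, 3),
    "Baylor": (4, 4), "Ohio State": (2, 5),
}

concordant = discordant = 0
for (_, (pre_a, post_a)), (_, (pre_b, post_b)) in combinations(teams.items(), 2):
    sign = (pre_a - pre_b) * (post_a - post_b)
    if sign > 0:
        concordant += 1      # same order in both polls
    elif sign < 0:
        discordant += 1      # order flipped between polls
    # ties (sign == 0) are neither concordant nor discordant

print(concordant, "concordant,", discordant, "discordant")
```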

When I did this for the other 9 seasons, I ended up with a total of 89 pairs. Of those, 61 were concordant. That’s over 68%! In fact, there were only two seasons with more discordant pairs than concordant pairs (2013 and 2010, though my group in 2010 only had 3 teams).

This gives us further indication that if a team is thought to be better in the preseason, that thinking will not change throughout the season, provided the teams lose the same number of games.

Which Teams Are Able to Move Up?

We did see that voters change their minds on teams every now and then. After all, there were 28 discordant pairs. So which teams are able to impress voters so much that they move ahead of similar-loss teams that were ranked higher in the preseason? To answer this question, I’m going to show the individual value plot again. But this time I’m going to add labels to the teams that moved up from their preseason ranking by at least 2 spots.

Individual Value Plot

Spoiler alert: Voters love the SEC! Five of the 6 teams that were able to improve their rank by at least 2 spots are from the SEC. In other words, the preseason rank for SEC teams does not matter. If they have a great season (much like Auburn did in 2013), they’ll be one of the top-ranked teams at the end of the year regardless of who was ranked ahead of them before the season.

Which Teams Get Passed Over?

Can we find a similar pattern with teams that start the season ranked high but then drop in the voters' eyes? Here is the individual value plot with labels for the teams that fell at least 2 spots from preseason to postseason.

Individual Value Plot

It’s no surprise to see Boise State on this list, as they played in a small conference at the time. The year 2009 was incredibly unlucky for them. They started the season ranked a respectable 14th and went undefeated. However, they were only able to move up to 6th, as 4 other teams also went undefeated, two of which (TCU and Cincinnati) passed Boise State in the rankings along the way. To add insult to injury, a 1-loss Florida team (not included in the data) was also ranked ahead of undefeated Boise State.  

What is really surprising is what happened to USC. In 2007, they started the season ranked #1 overall in the AP Poll. They lost two games, but so did just about everybody else in college football that season, including LSU, who played in and won the BCS Championship Game. But LSU wasn’t the only team to pass USC in the rankings with the same number of losses. Oklahoma, Georgia, and Virginia Tech did too. For crying out loud...Virginia Tech!?!?

If that weren’t bad enough, the same thing happened in 2008. USC started the season ranked #3 in the AP Poll, and the #1 and #2 teams would go on to lose multiple games. No BCS team went undefeated that year, and USC only lost a single game. So you would think their high preseason ranking would put them in one of the top 2 spots, and they would be playing in the BCS title game. Except that’s not even close to what happened.

Despite all of them being ranked lower in the preseason and losing a game, Florida, Oklahoma, Texas, and Alabama all jumped USC in the rankings. 

The takeaway, I suppose, is don't play in a small conference.  And, uh, don't play on the west coast?

Will This Trend Carry Over to the College Football Playoff?

The selection committee won’t select the top 4 teams based on the AP poll. But the committee will be free to think on their own (like the AP voters do) instead of being locked into certain criteria the way the BCS was with its computer rankings. It would be folly to think the committee doesn’t have any preconceived notions about which teams are the best, and I bet those notions are similar to the top-ranked teams in the preseason AP Poll. And those notions may just decide who gets into the playoff!

For example, we know that the SEC Champion is getting in (even if they start the season ranked low). And as the preseason #1 team, even a 1-loss Florida State team is a safe bet to get in. That leaves the champions of the other 3 major BCS conferences (Big 10, Pac-12, and Big 12) to fight over 2 spots.

Oregon and Oklahoma appear to be in the best spot, as they are ranked #3 and #4 in the preseason. If they win their conference, even a single loss shouldn’t hurt them assuming a Big 10 team doesn’t go undefeated. Meanwhile, to be on the safe side, Michigan State or Ohio State should hope one of those two teams ends up with more losses than them. If you’re coming out of nowhere to win one of those conferences, like maybe Nebraska, Kansas State, or Arizona State, you’re probably going to need some chaos to get in the playoffs. And considering there isn’t a single non-BCS school in the AP Top 25, I think it’s safe to say Cinderella isn’t going to make it to the ball this year.

Now, the committee claims that there will be a more deliberative evaluation system and that there could be volatile swings from week to week, with lower-ranked teams moving ahead of higher-ranked teams without either team losing. Whether that actually happens or not (especially with non-SEC teams) remains to be seen. So at the end of the season we’ll compare the final rankings of the committee to the preseason AP Poll. Until then, enjoy the games! Just know that the order for which teams get into the playoffs may already have been decided!

Why Are There No P Values for the Variables in Nonlinear Regression?


Previously, I showed why there is no R-squared for nonlinear regression. Anyone who uses nonlinear regression will also notice that there are no P values for the predictor variables. What’s going on?

Just like there are good reasons not to calculate R-squared for nonlinear regression, there are also good reasons not to calculate P values for the coefficients.

No P values in nonlinear regression

Why not—and what to use instead—are the subjects of this post!

Why are P values possible for linear regression?

This may be an unexpected question, but the best way to understand why there are no P values in nonlinear regression is to understand why you can calculate them in linear regression.

A linear regression model is a very restricted form of a regression equation. The equation is constrained to just one basic form. Each term must be either a constant or the product of a parameter and a predictor variable. The equation is constructed by adding the results for each term.

Response = constant + parameter * predictor + ... + parameter * predictor

Y = b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ

Thanks to this consistent form, it’s possible to develop a hypothesis test for all parameter estimates (coefficients) in any linear regression equation. If a coefficient equals 0, its term always equals zero regardless of the predictor value, which indicates that the predictor variable does not affect the response value.

Given this consistency, it’s possible to set up the following hypothesis test for all parameters in all linear models:

  • H₀: bᵢ = 0
  • Hₐ: bᵢ ≠ 0

The p-value for each term in linear regression tests this null hypothesis. A low p-value (< 0.05) indicates that you have sufficient evidence to conclude that the coefficient does not equal zero. Changes in the predictor are associated with changes in the response variable.

How to interpret P values and coefficients in linear regression analysis
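To see this hypothesis test in action outside of Minitab, here is a minimal sketch using simulated data and the statsmodels library (my own choice of tool, not something the post uses). The P>|t| column in the output is the test of H₀: bᵢ = 0 described above.

```python
# Fit an ordinary linear regression and inspect the per-coefficient t-tests.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y = 3 + 2.0 * x1 + 0.0 * x2 + rng.normal(scale=1.0, size=100)  # x2 truly has no effect

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.summary())   # the P>|t| column tests H0: coefficient = 0 for each term
```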

So, why are P values impossible in nonlinear regression?

While a linear model has one basic form, nonlinear equations can take many different forms. There are very few restrictions on how parameters can be used in a nonlinear model.

The upside is that this flexibility allows nonlinear regression to fit an extremely wide variety of curves.

The downside is that the correct null hypothesis value for each parameter depends on the expectation function, the parameter's place in it, and the field of study. Because the expectation functions can be so wildly different, it’s impossible to create a single hypothesis test that works for all nonlinear models.

Instead of P values, Minitab can display a confidence interval for each parameter estimate. Use your knowledge of the subject area and expectation function to determine if this range is reasonable and if it indicates a significant effect.
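Here is a hedged sketch of that approach in Python: fit a nonlinear model with SciPy's curve_fit and report a confidence interval for each parameter instead of a P value. The exponential expectation function and the data are invented for illustration; your own model and subject-area knowledge determine whether the resulting intervals are reasonable.

```python
# Nonlinear fit with approximate 95% confidence intervals for the parameters.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def expectation(x, theta1, theta2):
    return theta1 * (1 - np.exp(-theta2 * x))   # one of many possible forms

rng = np.random.default_rng(2)
x = np.linspace(0.5, 10, 40)
y = expectation(x, 20, 0.6) + rng.normal(scale=0.5, size=x.size)

params, cov = curve_fit(expectation, x, y, p0=[10, 1])
se = np.sqrt(np.diag(cov))
crit = t.ppf(0.975, df=x.size - len(params))
for name, est, s in zip(["theta1", "theta2"], params, se):
    print(f"{name}: {est:.3f}  95% CI [{est - crit*s:.3f}, {est + crit*s:.3f}]")
```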

To see examples of nonlinear functions, see What is the difference between linear and nonlinear regression equations?

Switch the Inner and Outer Categories on a Bar Chart


Did you just go shopping for school supplies? If you did, you’ve participated in what’s become the second biggest spending season of the year in the United States, according to the National Retail Federation (NRF).

Kids running in backpacks

The trends and analysis are so interesting to the NRF that they actually add questions about back-to-school shopping to two monthly consumer surveys. The two surveys have different questions, but there’s one case where the allowed responses are the same. In July, the survey asked, “Where will you purchase back-to-school items this year?” In August, the survey asked, “Where do you anticipate you will do the remainder of your Back-to-School shopping?”

Did people give the same answers both times? Let’s use Minitab Statistical Software to find out. Doing so will give us a chance to see how easy it is to change the focus of a chart by switching the inner and outer categories on a bar chart.

Did people answer the same way in both surveys? Yes.

Let’s say that your data are in the same layout as the original NRF reports. Each row contains the percentage for a different location. I put the dates in two different columns because the numbers came from two different PDF files (July and August).

Percentages of people who said that they would shop at each location.

Making a bar chart in Minitab is easy, so follow along if you like:

  1. Choose Graph > Bar Chart.
  2. In Bars represent, select Values from a table.
  3. Under Two-way table, select Cluster. Click OK.
  4. In Graph variables, enter '7/1 to 7/8 2014' '8/5 to 8/12 2014'
  5. In Row labels, enter 'Where will you purchase?' Click OK.

From this display, you can quickly determine that the order of the categories is the same in each survey. In both cases, consumers plan to shop the most at discount stores and the least from catalogs. In fact, the rank order of locations is the same for where consumers planned to shop and where they planned to finish their shopping.

With month outermost, you can see that the popularity of the categories is the same in both surveys.

Did people answer the same way in both surveys? No.

The order of popularity might not be all that you want to know from this data. Minitab makes it easy for you to get another view of the data. You can quickly switch which category is inner and which is the outer category.

  1. Press CTRL + E.
  2. In Table arrangement, select Rows are outermost categories and columns are innermost. Click OK.
  3. Double-click one of the bars in the graph.
  4. Select the Groups tab. Check Assign attributes by graph variables. Click OK.
  5. Double-click one of the category labels on the bottom of the graph.
  6. Select the Show tab. In the Show Labels By Scale Level table, uncheck Tick labels for Graph variables. Click OK.

With categories outermost, you can see which locations have the biggest change between the two surveys.

In this display, you can easily see the change for each location between the two questions. For every location, the number of people who reported that they planned to shop there on the first survey is higher than the number who planned to finish shopping there on the second survey.
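If you wanted to mimic this inner/outer switch outside of Minitab, the same idea amounts to transposing the table before plotting. Here is a rough pandas/matplotlib sketch; the percentages are made-up placeholders, not the NRF figures.

```python
# Grouped bar charts with the two category orderings: month outermost vs. location outermost.
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(
    {"7/1 to 7/8 2014":  [60, 55, 40, 30, 12],
     "8/5 to 8/12 2014": [55, 45, 30, 22, 10]},
    index=["Discount store", "Department store", "Clothing store",
           "Electronics store", "Catalog"],
)

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
data.T.plot.bar(ax=axes[0], rot=0, title="Month outermost")    # bars clustered by month
data.plot.bar(ax=axes[1], rot=30, title="Location outermost")  # bars clustered by location
plt.tight_layout()
plt.show()
```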

This result seems reasonable. One possible explanation is that people had already finished their shopping at some locations. In terms of the difference in the percentages, clothing stores and electronics stores changed the most, suggesting shoppers finished buying school items at those types of locations earliest.

Wrap up

When you’re looking at data, discovering what’s important often involves looking at the data from more than one perspective. Fortunately, Minitab’s bar chart makes it easy for you to change the focus of the categories so that you can dig deeper, faster. It’s nice to know that the information that you need is so readily available!

Bonus

I set up my data as values from a table today. Want to see what the other two options do? Check out Choosing what the bars in your chart represent!

The image of the children running in backpacks is from healthinhandkelowna.blogspot.com and is licensed under this Creative Commons License.

Analysis and Reanalysis: The Controversy Behind MMR Vaccinations and Autism, part 1


The other day I received a request from a friend to look into a new study in a peer-reviewed journal that found a link between MMR vaccinations and an increased risk of autism in African American boys. To draw this conclusion, the new study reanalyzed data that was discarded a decade ago by a previous study.

My friend wanted to know, from a statistical perspective, was it unethical for the original researchers to discard the data for African Americans? Did they set scientific knowledge back a decade?

To answer these questions, I looked at the 2004 study, their decision to discard the data, and the reanalysis of this data in 2014.

The Scenario

I clicked the link that my friend sent and saw this headline: “Fraud at the CDC uncovered, 340% risk of autism hidden from public.” Yikes! Apparently variations on this story are making the rounds via social media. Here is the gist of the story.

In 2004, a major study by DeStefano et al.1 determined that there was no connection between the timing of measles-mumps-rubella (MMR) vaccinations and autism. This study was conducted by researchers at the Centers for Disease Control and Prevention (CDC). The CDC presents this study as a key piece of evidence in the debate about vaccinations and autism.

In 2014, Brian S. Hooker2 reanalyzed the 2004 data and found a possible link between MMR vaccinations and the risk of developing autism for a very specific group. African American boys who had their first MMR vaccination between 24 and 35 months of age were 3.4 times more likely to develop autism.

The fraud allegation is made by anti-vaccination groups who say that the CDC applied political pressure to discard the data in order to hide these explosive findings.

My Perspective

I’m very aware that this specific case is just a part of a larger, very controversial issue. For this post, I assess the statistical validity of these two studies, and the decision to discard data.

I’m in favor of reanalyzing previous studies. I think it’s great if a later researcher can draw more or better conclusions from a data set. However, all studies must be statistically valid, or their results can be misleading.

Observational Studies of Vaccination and Autism

The original dataset was collected for an observational study. Because both studies use the same data set, they’re both observational studies. To evaluate the validity of the two studies, we need to understand the differences between observational studies and randomized controlled trials (RCTs).

In an RCT, all subjects are randomly assigned to the treatment and control groups. This process helps assure that the groups are similar to each other when treatment begins. Therefore, any post-study differences between groups are probably due to the treatment rather than prior differences.

For observational studies, there is no random assignment, which increases the chances that the treatment and control groups are not equivalent at the beginning of the study. Consequently, differences in the outcome at the end could be due to the preexisting differences (confounding variables). If the analysis does not account for the confounding factors, the results will be biased.

These issues are crucial in evaluating these two studies. For more information, read my posts about confounding variables, random assignment, and an in depth look at an observational study about the effects of vitamins.

DeStefano et al. (2004)

In one corner we have the original DeStefano (2004) study that agrees with the scientific consensus and concludes there is no connection between MMR vaccinations and autism risk. Researchers compared 624 children with autism, ages 3 to 10, with 1,824 developmentally healthy children. Most of the children were vaccinated between 12 and 17 months of age in accordance with vaccination recommendations.

The first thing that I noticed is that DeStefano et al. did not actually exclude data as the critics claim. The claim is that the study discarded data from the subjects who could not be linked to birth certificates in order to hide the findings about African-American boys.

There are three issues to consider here:

One, the non-birth-certificate group was not based on race, but on whether the subject was matched to a Georgia birth certificate. African-American kids were actually underrepresented in the non-birth-certificate group. The birth certificate group was made up of 37.9% African Americans while the supposedly excluded non-birth-certificate group contained only 32.2%.

Two, while making the distinction between kids based on birth certificates may seem arbitrary, there are good data reasons for doing so. For those kids who could be matched to Georgia birth certificates, the researchers were able to gather more complete information on possible risk factors for autism, such as the child’s birth weight, mother’s age, and education. This information was not available for children who could not be matched to Georgia birth certificates.

Because this is an observational study, it’s crucial to statistically control for these risk factors. Just having more data isn't always a good thing. You need to record and analyze the potential confounders.

Three, the DeStefano study did not exclude the kids who were not linked to birth certificates. Instead, the study performed an analysis on the full sample (subjects linked and not linked to birth certificates) and a separate analysis of those linked to a birth certificate. So, no data was actually excluded. The results of the two analyses were in agreement.

The study found that some of the additional variables in the birth certificate sample are potential confounders because they are associated with autism risk. The risk factor that becomes important for this post is that low birth weight is associated with a higher risk of autism. We come back to that in the Hooker study.

Using regression analysis, the study concluded that the timing of MMR vaccinations is not associated with the risk of developing autism.

My Verdict on the DeStefano Study

The criticism that the study discarded data from African American subjects just doesn’t hold water. No data was discarded. For the subjects who were linked to birth certificates, the researchers performed additional analyses. In this light, I see a careful observational study that assessed the role of potential confounders.

The biggest weakness that I see for this study is that the researchers could not compare subjects who were vaccinated for MMR to those who were not vaccinated at all. The authors wrote that they “lacked an unvaccinated comparison group.” The truth is that the vast majority of kids are vaccinated. Consequently, this study compared the distribution of vaccination ages for case and control children to see if the timing impacted the risk of autism. It didn’t.

DeStefano et al. cite a large Dutch study that did include an unvaccinated group of nearly 100,000 subjects. This study found no increased risk for those who were vaccinated compared to the completely unvaccinated group.

Stay tuned! Tomorrow I’ll take a look at Hooker’s reanalysis of the DeStefano data.

1. DeStefano, Frank, Tanya Karapurkar Bhasin, William W. Thompson, Marshalyn Yeargin-Allsopp, and Coleen Boyle, Age at First Measles-Mumps-Rubella Vaccination in Children with Autism and School-Matched Control Subjects: A Population-Based Study in Metropolitan Atlanta, Pediatrics, 2004;113:259.

2. Hooker, Brian S., Measles-mumps-rubella vaccination timing and autism among young African American boys: a reanalysis of CDC data, Translational Neurodegeneration 2014, 3:16

Analysis and Reanalysis: The Controversy Behind MMR Vaccinations and Autism, part 2


In my previous post, I described how I was asked to weigh in on the ethics of researchers (DeStefano et al. 2004) who reportedly discarded data and potentially set scientific knowledge back a decade. I assessed the study in question and found that no data was discarded and that the researchers used good statistical practices.

In this post, I assess a study by Brian S. Hooker that was recently published with a bang, accompanied by a flurry of social media stories. Hooker reanalyzed the DeStefano data and concluded that certain African American boys have a 340% increased risk of developing autism after receiving the MMR vaccination.

The Scenario

After the study by DeStefano et al. was complete, the raw data was made available for other scientists to use. Hooker reanalyzed the data after being contacted by William Thompson, one of the authors of the original study. Thompson is a senior scientist at the CDC. Hooker’s study was published in the peer-reviewed journal Translational Neurodegeneration.

Thompson gave this statement to CNN: "I regret that my co-authors and I omitted statistically significant information in our 2004 article. I have had many discussions with Dr. Brian Hooker over the last 10 months regarding studies the CDC has carried out regarding vaccines and neurodevelopmental outcomes, including autism spectrum disorders. I share his belief that CDC decision-making and analyses should be transparent."

Brian Hooker and the social media stories allege that the original researchers deliberately excluded subjects to hide the increased risk for African American boys.

I personally don’t buy this theory because the DeStefano study analyzed the data from all subjects. The study compared the full results to a subanalysis of just the birth certificate sample, which had more complete data that included potential confounding variables. The two analyses agreed that the timing of the MMR vaccination did not affect the risk of developing autism.

Hooker is a scientific adviser for the Focus Autism Foundation, which believes that vaccines have helped cause an autism epidemic. He also has a 16-year-old son whom he describes as "vaccine-injured."

Hooker (2014)

Hooker used the DeStefano data. Where the two studies performed the same analyses, the results were the same. However, in general, Hooker used the data in a different manner and performed different analyses than DeStefano.

To derive his most startling conclusion, Hooker splits the data into two mutually exclusive groups:

  • African-American children
  • All children except African-American children

He then uses Chi-square analysis to look for an association between vaccination timing, gender, and the risk of autism within each of these groups. There are no significant results in the non-African American group at all.

Within the African American group, the only significant results are for the boys. The significant results are for African American boys vaccinated between:

  • 18 and 23 months, who have a relative risk of 1.73
  • 24 and 35 months, who have a relative risk of 3.36

The latter group is the one mentioned in the headlines that cite a 340% increased risk.

My Verdict on the Hooker Study

On the surface, it looks like there might be something to this study. However, there are problems lurking beneath the surface. The two major problems I see with Hooker’s study are:

  • The full data set has been sliced and diced into a small, biased sample.
  • Low birth weight is an uncontrolled confounding variable.

Sample size

To obtain the results, Hooker has to exclude data for all subjects except the African American subjects. Even then, only the results for African-American boys vaccinated at specific times were statistically significant.

Let’s look at the number of cases in the subgroup behind the shocking result at the heart of those social media stories and allegations of fraud—African American boys vaccinated between 24 and 35 months.

Consider the following:

  • 70% of the sample was vaccinated at less than 18 months because the guideline is to get the MMR vaccination between 12 and 17 months. There were no significant results in this group.
  • Only 7% of the sample was vaccinated during the time frame highlighted in the Hooker study.
  • Further sample reductions are necessary because we’re only assessing African American kids (37% of the sample), who are boys (80%).

While neither study lists the number of autism cases for this super-specific sub-population, using the percentages and the number of cases, I can estimate that the shocking news of a “340% increase” is based on about 13 cases of autism!

This tiny sample size explains why the confidence interval, which measures the precision of the estimated risk, is so wide [1.5 to 7.51]. In this context, a 1 indicates no increase in risk, which is barely excluded from the CI.
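To see how a handful of cases produces such a wide interval, here is a back-of-the-envelope Python sketch of a relative risk and its 95% confidence interval from a 2x2 table. All of the counts below are hypothetical stand-ins chosen to give a similar risk ratio; they are not Hooker's actual numbers.

```python
# Relative risk and 95% CI from small, hypothetical 2x2 counts.
import numpy as np
from scipy.stats import norm

# exposed = vaccinated at 24-35 months, unexposed = vaccinated earlier (assumed counts)
cases_exp, n_exp = 13, 100
cases_unexp, n_unexp = 25, 650

rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
se_log_rr = np.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
z = norm.ppf(0.975)
lo, hi = np.exp(np.log(rr) - z*se_log_rr), np.exp(np.log(rr) + z*se_log_rr)
print(f"RR = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")   # wide interval from few cases
```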

Uncontrolled confounding variable

DeStefano used regression analysis to assess and control the effects of potential confounders. With regression analysis, he could study the effect of the various predictors (e.g., race, gender, birth weight) without having to subdivide the data. Instead, he included the predictors in the model to estimate the effects within the context of the full sample.

Hooker used chi-square analysis, which cannot control for these confounders. That’s a huge problem for an observational study.
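Here is a minimal, fully simulated sketch of the difference: a logistic regression that includes the confounder as a predictor. In the simulation, only low birth weight drives the outcome, and the regression correctly leaves the vaccination-timing coefficient near zero, something a chi-square test on a confounder-heavy subgroup cannot untangle. This illustrates the general idea only; it is not a reanalysis of the CDC data.

```python
# Controlling for a confounder with logistic regression (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
low_birth_weight = rng.binomial(1, 0.1, n)                       # confounder
late_vaccination = rng.binomial(1, 0.1 + 0.1 * low_birth_weight) # associated with confounder
# In this simulation, autism risk is driven by the confounder only.
p = 1 / (1 + np.exp(-(-3 + 1.6 * low_birth_weight + 0.0 * late_vaccination)))
autism = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([late_vaccination, low_birth_weight]))
fit = sm.Logit(autism, X).fit(disp=0)
for name, coef in zip(["const", "late_vaccination", "low_birth_weight"], fit.params):
    print(f"{name}: {coef:.2f}")   # late_vaccination stays near zero once the confounder is in the model
```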

As for confounding variables, DeStefano found that low birth weights are associated with an increased risk of autism. In the original study, this and other potential confounders didn’t influence the uncontrolled results because they were evenly distributed across the groups of data. However, Hooker’s study sliced and diced the data so much that we can’t make this assumption.

Hooker wrote this about his highly pared down dataset: “It was found that there was a higher proportion of low birth weight African-Americans compared to the entire cohort.”

Hooker has a small data set in which a known confounder (low birth weight) is over-represented. Other studies have estimated that low birth weights can increase the risk of autism by 5 times! Because Hooker’s analysis does not control for this factor, we must assume that the estimated risk of autism is positively biased for this group. In other words, the estimated relative risk of 3.36 is likely higher than the true amount.

Thanks to the tiny sample and the uncontrolled confounding variable, Hooker’s results are both imprecise and biased. Consequently, my personal opinion is that Hooker’s results have no scientific value at all.

It turns out that Translational Neurodegeneration, the journal that published Hooker’s study, is having similar thoughts as well. During the course of writing this post, the journal has removed the article from its website and stated:

"This article has been removed from the public domain because of serious concerns about the validity of its conclusions. The journal and publisher believe that its continued availability may not be in the public interest. Definitive editorial action will be pending further investigation."

Closing Thoughts

In a previous post, Why Statistics is Important, I wrote that how you perform statistics matters, and there are many potential pitfalls. Dissecting the problems in Hooker’s study is the perfect illustration of this!

The CDC has stated that not all risk factors are known and further study is required. Because we’re dealing with our children, an abundance of caution is required. Therefore, it is worthwhile to continue to investigate the possible risk factors for autism. Even though I don’t think Hooker’s study has scientific merit, if race is a potential risk factor, we should study it further.

As for the larger debate of vaccinations and autism, the consensus of the scientific literature overwhelmingly supports the view that vaccinations do not increase the risk of autism. This finding is evident in many studies that assess vaccinations and the eventual diagnoses of autism. Other studies have found that autism starts in utero, long before a child is given the MMR or any other vaccinations.

Nova, the science television series on PBS in the U.S., will air a new documentary, Vaccines - Calling the Shots, on Wednesday, Sept. 10, 2014 at 9 p.m. EDT. Diseases that were largely eradicated are returning.

Why Kurtosis is Like Liposuction. And Why it Matters.


The word kurtosis sounds like a painful, festering disease of the gums. But the term actually describes the shape of a data distribution.

Frequently, you'll see kurtosis defined as how sharply "peaked" the data are. The three main types of kurtosis are shown below.

Lepto means "thin" or "slender" in Greek. In leptokurtosis, the kurtosis value is high.

Platy means "broad" or "flat"—as in duck-billed platypus. In platykurtosis, the kurtosis value is low.

 

Meso means "middle" or "between." The normal distribution is mesokurtic.

Mesokurtosis is defined by an "excess" kurtosis value of 0. Using that benchmark, leptokurtic distributions have positive kurtosis values and platykurtic distributions have negative kurtosis values.

Question: Which type of kurtosis correctly describes each of the three distributions (blue, red, yellow) shown below?

Answer: All three distributions are examples of mesokurtosis. They're all normal distributions. The (excess) kurtosis value is 0 for each distribution.

OK, that was a mean trick question. You can roast me in the comments field. But it had a good intention—to illustrate some common misconceptions about kurtosis.

Each normal distribution shown above has a different variance. Different variances can appear to change the "peakedness" of a given distribution when they're displayed together along the same scale. But that's not the same thing as kurtosis.
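A quick way to convince yourself of this is to compute the excess kurtosis of normal samples with very different variances. Here is a small Python sketch; SciPy's kurtosis() reports excess kurtosis by default, so values near 0 are expected for all three samples.

```python
# Variance and kurtosis are separate things: all three normal samples have
# excess kurtosis near 0, no matter how spread out they are.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
for sd in (0.5, 1, 3):
    sample = rng.normal(0, sd, 200_000)
    print(f"sd = {sd}: variance = {sample.var():.2f}, excess kurtosis = {kurtosis(sample):.3f}")
```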

Think of Kurtosis Like Liposuction

In Nature, there's no such thing as a free lunch—literally. Research suggests that fat that's liposuctioned from one part of the body all returns within a year. It just moves to a different place in the body.

Something similar happens with kurtosis. The clearest way to see this is to compare probability distribution plots for distributions with the same variance but with different kurtosis values. Here's an example.

The solid blue line shows the normal distribution (excess kurtosis ≈ 0). That's the body before liposuction. The dotted red line shows a leptokurtic distribution (excess kurtosis ≈  5.6) with the same variance. That's the body one year after liposuction.

The arrows show where the fat (the data) moves after being "sucked out" from the sides of the normal distribution. The blue arrows show that some data shifts toward the center, giving the leptokurtic distribution its characteristic sharp, thin peak.

But that's not where all the data goes. Notice the data that relocates to the extreme tails of the distribution, as shown by the red arrows.

So the leptokurtic distribution has a thinner, sharper peak, but also—very importantly— "fatter" tails.

Conversely, here's how "liposuction" of the normal distribution results in platykurtosis (excess kurtosis ≈ - 0.83).

 

Here, data from the peak and from the tails of the normal distribution is redistributed to the sides. This gives the platykurtic distribution its blunter, broader peak, but—very importantly—its thinner tails.

In fact, kurtosis is actually more influenced by data in the tails of the distribution than data in the center of a distribution. It's really a measure of how heavy the tails of a distribution are relative to its variance. That is, how much the variation in the data is due to extreme values located further away from the mean.

Why Does It Matter?

Consider the three normal distributions that appeared to mimic different types of kurtosis, when in fact they had the same kurtosis, just different variances.

For each of these distributions, the same percentage of data falls within a given number of standard deviations from the mean. That is, for all three distributions, approximately 68.2% of observations are within +/- 1 standard deviation of the mean; 95.4% are within +/- 2 standard deviations of the mean; and 99.7% are within +/- 3 standard deviations of the mean.

What would happen if you tried to use this same rubric on a distribution that was extremely leptokurtic or platykurtic? You'd make some serious estimation errors due to the fatter (or thinner) tails associated with kurtosis.
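Here is a small sketch of the size of that error, comparing the normal distribution to a heavy-tailed t-distribution with 5 degrees of freedom rescaled to the same variance. The t-distribution is just a convenient stand-in for a leptokurtic distribution.

```python
# How much probability falls within +/- k standard deviations: normal vs. a
# heavy-tailed t(5) distribution rescaled to unit variance.
import numpy as np
from scipy.stats import norm, t

df = 5
scale = np.sqrt((df - 2) / df)        # rescale t so its variance equals 1
for k in (1, 2, 3):
    p_norm = norm.cdf(k) - norm.cdf(-k)
    p_t = t.cdf(k / scale, df) - t.cdf(-k / scale, df)
    print(f"within +/-{k} SD: normal {p_norm:.4f}, t(5) {p_t:.4f}")
```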

You Could Lose All Your Money, Too.

In fact, something like that appears to have happened in the financial markets in the late 90s according to Wikipedia. Some hedge funds underestimated the risk of variables with high kurtosis (leptokurtosis). In other words, their models didn't take into consideration the likelihood of data located in the "fatter" extreme tails—which was associated with greater volatility and higher risk. The end result? The hedge funds went belly up and needed bailing out.

I don't have a background in financial modelling, so I can't verify that claim. But it wouldn't surprise me.

If you click on the following link to Investopedia, you'll see a definition of high kurtosis as "a low, even distribution" with fat tails. Fat tails, yes. But "low and even"?

Hmmm. I hope the investment firm managing my 401K isn't using that definition.

If so, it might be time to move my money into an investment vehicle with a much lower kurtosis risk. Like my mattress.

Attendance Awareness Month: A Graphic Look at the Data


You might not have known, but September is Attendance Awareness Month. Specifically, attendance of children at American public schools. The organization Attendance Works recently came out with a report that highlights the learning gap between students with strong attendance and students with poor attendance.

Statistical software helps us quickly and easily create graphs that make it easier to understand what a set of data is telling us. To encourage everyone to recognize the importance of school attendance, here are some Minitab graphs of data shared by Attendance Works and other researchers that the authors cite. 

Attendance's association with academic performance

The main point of the Attendance Works report is to point out the association between attendance and scores on the National Assessment of Educational Progress (NAEP).

Students who routinely miss school tend to score more than one grade level below their peers on math and reading assessments in 4th and 8th grade. It looks like missing school makes academic achievement harder.

Students who miss 3 or more school days in a month tend to be a grade level behind their peers in math and English test scores.

Well-begun is half done

Data from California in 2011 shows that being chronically absent in kindergarten and first grade has a strong association with whether a child is a proficient reader in 3rd grade. Only 17% of students who were chronically absent in kindergarten and first grade achieved a proficient score on a test of the English language given in 3rd grade. Of students with better attendance records, 64% achieved a proficient score.

64% of students who were not chronically absent in kindergarten and first grade were proficient on exams in 3rd grade. 17% of students who were chronically absent were proficient.

Not just a problem in high school

Data from students in Rhode Island indicate that even when chronically absent students can graduate from high school, they will likely struggle to continue their educations successfully. Only 11% of chronically absent students went on to a second year of college education, giving them a chance to finish an associate’s degree. Over 50% of students who were not chronically absent in high school were able to begin a second year of college.

Over 50% of students with good attendance began a second year of college. Only 11% of students who were chronically absent in high school began a second year of college.

Go to school

I’ve been fooled by the difference between correlation and causation before. It’s entirely possible that other factors could impede a child’s learning whether they attend school properly or not.

Attendance Works identifies several potential causes of chronic absenteeism, including lack of access to health care, community violence, unreliable transportation, and unstable housing. These problems could also be responsible for poor school performance. But we should be on the lookout for absenteeism as a symptom that other problems need to be addressed, and take the initiative to help solve those problems.

For tools for people from all levels of involvement, from parents to city leaders, go to http://www.attendanceworks.org/tools/

Bonus

For Attendance Awareness Month 2013, the Oakland County School district made a music video featuring Marshawn Lynch. Check that out!


Not Getting a No-Hitter? Statistically Speaking, the Best Bet Ever


The no-hitter is one of the most impressive feats in baseball. It’s no easy task to face at least 27 batters without letting one of them get a hit. So naturally, no-hitters don’t occur very often. In fact, since 1900 there has been an average of only about 2 no-hitters per year.

But what if you had the opportunity to bet that one wouldn’t occur?

That’s exactly what happened to sportswriter C. Trent Rosecrans. He had a friend who kept insisting the Reds would be no-hit this season. And with 24 games left in the season, the friend put his money where his mouth was, betting Mr. Rosecrans $5 that the Reds would be no-hit by the end of the year.

Even if the Reds do have one of the worst batting averages in baseball, would you take the bet that in 24 games there won’t be an event that occurs only about twice in an entire year?

Sounds like a no-brainer.

Calculating the odds

Back in 2012, I calculated that the odds of throwing a no-hitter were approximately 1 in 1,548. If you update that number to include all the games and no-hitters that have occurred since 2012, the odds become 1 in 1,562. The numbers are very similar, but we’ll use the latter since it incorporates more data.

So there is a 99.936% chance that a no-hitter does not occur in any single game. But the bet was that it wouldn’t occur in 24 games. What are Mr. Rosecrans' chances of winning the bet?

24 games without a no-hitter = .99936^24 = .98475 = approximately 98.475%

I wish I could make bets with a winning percentage that was that high! For Mr. Rosecrans, 98.475% of the time he’ll win $5, and 1.525% of the time he’ll lose $5. For his friend, the opposite is true. We can use these numbers to calculate the expected value for each side of the bet.

Reds don’t get no-hit: (0.98475*5) – (0.01525*5) = $4.85

Reds get no-hit: (.01525*5) – (0.98475*5) = -$4.85

Making it a fair bet

Obviously this was just a friendly wager and was not meant to be taken too seriously. If Mr. Rosecrans regularly made bets with expected values close to $5 with all of his friends, he probably wouldn’t have many left. But what if he wanted to be a nice friend? How much money should he have offered in return to make it a fair bet?  We’ll simply set the expected value to 0 and solve for the amount of money he’d lose the 1.525% of the time the Reds were no-hit.

0 = (0.98475*5) – (0.01525*X)

0.01525*X = 4.92375

X = $322.87

To make the bet fair, Mr. Rosecrans should offer to pay his friend $322.87 if the Reds get no-hit. And earlier this week the Reds didn’t get their first hit until the 8th inning. Imagine sweating out that game if you had over $300 on the line!
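The arithmetic above is easy to script. Here is a short Python version that reproduces the probability, the expected value of the $5 bet, and the fair payout; the 1-in-1,562 odds are taken from the article, and everything else follows from them.

```python
# Probability of winning the bet, its expected value, and the fair payout.
p_no_hitter_per_game = 1 / 1562
games = 24
p_win = (1 - p_no_hitter_per_game) ** games     # Reds get a hit in all 24 games
stake = 5

expected_value = p_win * stake - (1 - p_win) * stake
fair_payout = p_win * stake / (1 - p_win)       # payout that makes the expected value zero

print(f"P(no no-hitter in {games} games)   = {p_win:.5f}")
print(f"Expected value of the $5 bet       = ${expected_value:.2f}")
print(f"Fair payout if the Reds are no-hit = ${fair_payout:.2f}")
```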

Adjusting for the Reds

One of the reasons the friend bet on the Reds to be no-hit was that they are one of the worst-hitting teams in their league. Their batting average of 0.238 is ranked 28th in baseball. That means, on average, a Reds batter won’t get a hit 76.2% of the time. So if a pitcher wanted to no-hit the Reds, they would need to face at least 27 batters who didn’t get a hit.

Probability of having 27 straight batters not have a hit = 0.762^27 = 0.00065 = approx. 1 in 1,539

But remember, just because a batter doesn’t get a hit does not mean they’re out. They can get walked, hit by a pitch, or reach on an error. Unless they pitch a perfect game, the pitcher will face more than 27 batters. Let’s look how the probability changes as we increase the number of Reds batters that the pitcher must face without allowing a hit.

Probability of having 28 straight batters not have a hit = 0.762^28 = approx. 1 in 2,020

Probability of having 29 straight batters not have a hit = 0.762^29 = approx. 1 in 2,650

Probability of having 30 straight batters not have a hit = 0.762^30 = approx. 1 in 3,478

Probability of having 31 straight batters not have a hit = 0.762^31 = approx. 1 in 4,565
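Those streak probabilities come straight from raising 0.762 to the number of batters faced. A quick Python loop reproduces the figures above to within rounding:

```python
# Odds of a streak of hitless Reds batters; 0.762 is the per-batter no-hit
# probability from the article.
p_no_hit_per_batter = 0.762
for batters in range(27, 32):
    p = p_no_hit_per_batter ** batters
    print(f"{batters} straight hitless batters: 1 in {1/p:,.0f}")
```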

This was supposed to show that because they are a poor-hitting team, the Reds have a better chance of being no-hit than the average used above. But as you can see, that’s not the case at all. Despite being one of the worst-hitting teams in the league, it appears that it’s harder to no-hit the Reds than the historical average.

Things get even odder when you consider that the average batting average (according to Baseball-Reference.com) is 0.263. Using that number, the odds of having 27 straight batters not have a hit is 1 in 3,788. And those odds get even longer as you increase the number of batters the pitcher has to face. Applying this probability to the number of games played since 1900, we would expect there to be fewer than 100 no-hitters. And how many have there been? 241!

This is the same conundrum I encountered when finding the odds of throwing a perfect game. The number of perfect games and no-hitters that have occurred is much higher than what we would expect based on historical batting statistics. One explanation could be pitching from the wind-up vs. the stretch. With no runners on base (which is always the case in a perfect game and often the case in a no-hitter), the pitcher can always throw from the wind-up. Assuming pitchers are better when pitching from the wind-up, this would result in a lower batting average than normal, thus explaining the higher number of perfect games and no hitters. This would make for a great analysis using Minitab Statistical Software, but since we can’t separate the data on hand into at bats facing pitchers throwing from the stretch vs. the wind-up, we can't test the theory.

Since the Reds have a batting average .025 points lower than the historical average, it’s probably safe to assume they do in fact have a greater chance of being no-hit. The problem is that it’s nearly impossible to quantify how much greater!

Looking ahead to next year

With the season almost over, it’s unlikely the Reds will be no-hit this year. But what if the two friends decided to do their bet again next year, only this time at the start of the season? Let’s use our original probability of throwing a no-hitter (the one we’ve observed) and determine the odds that the Reds go 162 games getting at least one hit per game.

162 games without a no-hitter = .99936^162 = .9015 = approximately 90.15%

The probability of the Reds getting no-hit is still pretty low, but it’s a lot better than the current bet. I just hope next year the friend gets some better odds than even money!

How Politicians and Governments Could Benefit from Statistical Analyses


Using statistical techniques to optimize manufacturing processes is quite common now, but using the same approach on social topics is still an innovative approach. For example, if our objective is to improve student academic performance, should we increase teachers' wages or would it be better to reduce the number of students in a class?

Many social topics (the effect of increasing the minimum wage on employment, etc.) generate long and passionate discussions in the media and in politics. People express very different and subjective points of view according to political/ideological opinions and varied ways of thinking.

Hypothesis Testing in the Policy Realm

Social experimentation and data analysis can provide a firmer ground on which we can base more objective decisions.

The objective is to investigate the effects of a policy intervention and to test specific hypotheses. In these social experiments “randomization” is a key element. If one policy option is tested in, say, the Netherlands, and another policy option is tested in France, the experimenter will never be in a position to fully understand whether a difference in outcomes is due to the intervention itself or to the many other differences between these two countries.

It would clearly be preferable to test the two approaches in different regions of France and of the Netherlands, for example, and assign the policy intervention in a random way to a “treatment” group (individuals who receive it) and a “comparison” group (individuals who do not receive it).

At the beginning of the study, the “treatment” and the “control” groups should be as similar as possible to prevent any systematic previous bias. The objective is not to “observe” differences but to identify the actual causal effects.

Designed Experiment Techniques

Other techniques that are often used in designed experiments (DOEs) may also be useful in this context, such as blocking and balancing. In my example, France and the Netherlands might be considered as a blocking factor (an external extra factor which the experimenter cannot control), and the tests should be “balanced” across blocks so that the treatment effect estimates are not biased and the blocking effects of the countries are neutralized. Other potential blocking factors in policy studies might be urban versus rural regions, or females versus males.
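Here is a toy Python sketch of what block-randomized assignment looks like in practice: within each block (say, country crossed with urban/rural), half the units are randomly assigned to the policy and half to the comparison group, so block differences can't masquerade as treatment effects. The blocks and unit IDs are invented for illustration.

```python
# Randomized assignment within blocks (country x region type), half treated and half control per block.
import random

random.seed(42)
blocks = {
    ("France", "urban"): list(range(10)),
    ("France", "rural"): list(range(10, 20)),
    ("Netherlands", "urban"): list(range(20, 30)),
    ("Netherlands", "rural"): list(range(30, 40)),
}

assignment = {}
for block, units in blocks.items():
    shuffled = units[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    for u in shuffled[:half]:
        assignment[u] = (block, "treatment")
    for u in shuffled[half:]:
        assignment[u] = (block, "control")

print(assignment[0], assignment[25])
```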

Examples of Policy Experiments

Data analysis and statistics have been used to inform several important policy debates around the world over the past few years.  Here are a few examples:

- In Kenya, a social experiment showed that neither hiring extra teachers to reduce class sizes in schools nor providing more textbooks to pupils had much effect on academic performance. A surprising finding of this study was that deworming (intestinal worms) programs were very effective in decreasing child absenteeism.

- In the U.S., a full factorial design (DOE) was used to assess the effectiveness of commitment contracts. The objective of these contracts was to encourage individuals to exercise more in order to reduce health risks and prevent obesity. The effects of factors such as the duration of the physical exercise, its frequency, and the financial stakes were studied. The outcome was the likelihood of accepting such a contract.

- Different strategies to quit smoking based on commitment contracts have been tested using a randomized experimental approach.

- In France, a social experiment was conducted to compare different job-counselling strategies for placing young unemployed people. The outcome studied was the probability of finding a job.

Conclusion

Experiments make it possible to vary one factor at a time, but a more effective approach is to modify several factors in each test using a proper design of experiments. Expertise in setting up randomized field experiments to test economic hypotheses is clearly a key asset.

Experimental results are often surprising; experimentation and data analysis are therefore potentially new and powerful tools in the arsenal of politicians and governments.

Here are sources of more information about the examples I've mentioned:

Miguel, Edward, and Michael Kremer (2004). “Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities,” Econometrica, Volume 72(1), pp. 159-217.

Gine, Xavier, Dean Karlan and Jonathan Zinman (2008). “Put Your Money Where Your Butt Is: A Commitment Savings Account for Smoking Cessation,” MIMEO, Yale University.

http://www.voxeu.org/article/job-placement-and-displacement-evidence-randomised-experiment

Using Nudges in Exercise Commitment Contracts: http://www.nber.org/bah/2011no1/w16624.html

 

Using Before/After Control Charts to Assess a Car’s Gas Mileage

$
0
0

Keeping your vehicle fueled up is expensive. Maximizing the miles you get per gallon of fuel saves money and helps the environment, too. 

But knowing whether you're getting good mileage requires some data analysis, which gives us a good opportunity to apply one of the common tools used in Six Sigma, the I-MR (individuals and moving range) control chart, to daily life.

Finding Trends or Unusual Variation

Looking at your vehicle’s MPG data lets you see if your mileage is holding steady, declining, or rising over time. This data can also reveal unusual variation that might indicate a problem you need to fix.

Here's a simulated data set that collects 3 years’ worth of gas mileage records for a car that should get an average of 20 miles per gallon, according to the manufacturer’s estimates. However, the owner didn’t do any vehicle maintenance for the first two years he owned the car. This year, though, he’s diligently performed recommended maintenance.

How does his mileage measure up? And has his attention to maintenance in the past 12 months affected his car’s fuel economy?  Let’s find out with the Assistant in Minitab Statistical Software.

Creating a Control Chart that Accounts for Process Changes

To create the most meaningful chart, we need to recall that a major change in how the vehicle is handled took place during the time the data were collected. The owner bought the car three years ago, but he’s only done the recommended maintenance in the last year.

Since the data were collected both before and after this change, we want to account for it in the analysis.

The easiest way to handle this is to choose Assistant > Before/After Control Charts… to create a chart that makes it easy to see how the change affected both the mean and variance in the process.

If you're following along in Minitab, the Maint column in the worksheet notes which MPG measurements were taken before and after the owner started paying attention to maintenance. Complete the Before/After I-MR Chart dialog box as shown below:

Interpreting the Results of Your Data Analysis

After you press OK, the Assistant produces a Diagnostic Report with detailed information about the analysis, as well as a Report Card, which provides guidance on how to interpret the results and flags potential problems. In this case, there are no concerns with the process mean and variation.

The Assistant's Summary Report gives you the bottom-line results of the analysis.

The Moving Range chart, shown in the lower portion of the graph, plots the range between each pair of consecutive data points. It shows that while the upper and lower control limits have shifted, the difference in variation before and after the change is not statistically significant.

However, the car’s mean mileage, which is shown in the Individual Value chart displayed at the top of the graph, has seen a statistically significant change, moving from 19.12 MPG to just under 21 MPG. 
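
For readers who want to peek behind those control limits, here is a minimal sketch of the standard I-MR calculations, applied separately to a "before" and an "after" subset. The MPG values are made up, and the constants 2.66 and 3.267 are the usual chart factors for a moving range of length 2; this is the textbook formula, not necessarily Minitab's exact implementation of the Before/After chart.

```python
import numpy as np

def imr_limits(values):
    """Return (lower, center, upper) limits for the I chart and MR chart."""
    x = np.asarray(values, dtype=float)
    mr = np.abs(np.diff(x))            # moving ranges of consecutive points
    x_bar, mr_bar = x.mean(), mr.mean()
    return {
        "I chart":  (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar),
        "MR chart": (0.0, mr_bar, 3.267 * mr_bar),
    }

# Made-up MPG values just to show the calculation.
before = [19.5, 18.8, 19.3, 18.9, 19.4, 18.7, 19.2]
after  = [20.8, 21.1, 20.6, 21.0, 20.9, 21.2, 20.7]

for label, data in [("Before maintenance", before), ("After maintenance", after)]:
    print(label, imr_limits(data))
```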

Easy Creation of Control Charts

Control charts have been used in statistical process control for decades, and they are among the most commonly used tools in statistical software packages. The Assistant makes it particularly easy for anyone to create a control chart, see whether or not a process is within control limits, confirm that observation statistically, and see whether a change in the process results in a change in the process outcome or variation.

As for the data we used in this example, whether or not a 2 mile-per-gallon increase in fuel economy is practically as well as statistically significant could be debated. But since the price of fuel rarely falls, we recommend that the owner of this vehicle continue to keep it tuned up!

Congratulations! It's an Area Graph!

$
0
0

"He looks just like his father...and mother!"

Popular morphing sites online let you visualize the hypothetical offspring of some very unlikely couples.

The baby of Albert Einstein and Kim Kardashian (Kimbert?) would presumably look something like the image shown at right.

What happens if you morph the features of two different graphs?

For example, what would the baby of a time series plot and a stacked bar chart look like? 

"Preposterous!" you say? I'd argue that the two make a very compatible match.

Take a Time Series Plot...

The time series plot (Graph > Time Series Plot) is very predictable, but a bit old-fashioned and conventional. Although it has its ups and downs, it doggedly plots one point after the next, obediently following seasonal patterns and trends.

In fact, it's so predictable, you can practically forecast its future.

Cross it With a Stacked Bar Chart....

The stacked bar chart (Graph > Bar Chart > Stack) enjoys a zesty, colorful existence. It has a snappy way of summing up data in categories. But it lacks a certain sense of continuity.

What Do You Get?

If a time series plot and a stacked bar chart had a baby, it'd look like this.

Recognize it? This graphical offspring is also known as an area graph (Graph > Area Graph). It combines the best features of its proud parents: The ability to plot individual points for each group over time to see trends, while summarizing the cumulative effect of all the groups.

When Should You Use an Area Graph? 

This type of graph can be useful in many applications. For example, suppose you want to track the number of customer complaints for a chain of stores. The area graph allows you to simultaneously track complaints at each store location while summarizing the total number of complaints at all stores. You might discover that while complaints are increasing at certain locations, the overall number of complaints is decreasing.

Caution: Always be sure to interpret each subsequent boundary line in an area graph as defining the sum of the categories below it, not as the individual values for a single group. The individual value for each group is represented by the "height" of each color at points along the time scale, just as on a stacked bar chart.
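
If you'd like to experiment with the idea outside Minitab, here is a minimal sketch using matplotlib's stackplot with made-up complaint counts for three hypothetical stores; each colored band's height is one store's count, and the top boundary is the total.

```python
import matplotlib.pyplot as plt

# Made-up monthly complaint counts for three store locations.
months  = list(range(1, 13))
store_a = [12, 11, 13, 10,  9,  9,  8,  8,  7,  7,  6,  6]
store_b = [ 5,  6,  6,  7,  8,  8,  9,  9, 10, 10, 11, 11]
store_c = [ 9,  9,  8,  8,  7,  7,  6,  6,  5,  5,  4,  4]

fig, ax = plt.subplots()
# Each band's thickness is one store; the top boundary is the cumulative total.
ax.stackplot(months, store_a, store_b, store_c,
             labels=["Store A", "Store B", "Store C"])
ax.set_xlabel("Month")
ax.set_ylabel("Customer complaints")
ax.legend(loc="upper right")
plt.show()
```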

When Is an Area Graph Not a Good Choice?

As the baby grows up, the parents soon learn it's not as perfect as they first thought. Sadly, the area graph has the exact same shortcomings as mom and dad. That means in situations when a time series plot or stacked bar chart is not appropriate for data, neither is an area graph. 

  • When the time intervals are not equally spaced

The time intervals on the time series plot and the area graph must be equal or the graph will be misleading. In this example, adding the data for 2012 creates a 2-year interval rather than a 10-year interval. That makes it appear that the total amount of packaging waste has been holding steady over the last ten-year period. (This is actual data from the EPA, by the way. I hope the value in 2020 will be even lower than that for 2012!)

  • When the cumulative sum doesn't make sense

The EPA tracks the percentage of each packaging material that is recycled from U.S. municipal waste. For each material, the recycling percentage is increasing over time. (Plastic still has a long way to go!)

It's interesting data, but displaying these percentages cumulatively on a stacked bar chart or area graph doesn't make sense. Here's why.

Examine the Y axis. The percentages don't add up to 100%. That's because the percentage for each packaging material is calculated using a different "whole." For example, in 2000, 52% of paper packaging was recycled and  58.9% of steel packaging was recycled. But you can't add those percentages together to claim that 110.9% of paper and steel packaging was recycled in 2000!

Meet the Newest Arrivals

Once you start thinking about data displays as morphs, you start to see them everywhere. For example, Release 17 of Minitab Statistical Software now includes a marginal plot (Graph > Marginal Plot). The marginal plot offers three display options. Recognize the "parents" of each one?

Picking baby names can be tough. Personally, I’m leaning toward Scattergram, Scatterbox, and Scatterdot.

How Painful Does the Income Gap Look to You?

$
0
0

I’m always on the lookout for statistical news, so I was excited by the recent Matt Phillips article on Quartz titled Painfully, American families are learning the difference between median and mean. Phillips' allegory about Warren Buffett walking into a skid row bar makes a nice illustration of the statistical question about how outliers affect the mean. (If you want more on the mean and median, we showed you how to spot when the mean can mislead with Michael Jordan in 2012.)

How painful does the income gap look to you? Rather than do more on the mean and the median, I’m going to explore the income trends that Phillips graphs to illustrate ways that you can edit scatterplots in Minitab. Graphs are powerful tools for explaining the data, and the graphs that Minitab produces by default are usually excellent. However, the duty of the analyst is to make sure that graphs help people make the best decisions possible. Minitab provides numerous tools that make it fast and easy for you to clarify the meaning in your data.

The data that Phillips uses are from the Federal Reserve’s 2013 Survey of Consumer Finances. All amounts have been converted to 2013 dollars so that changes are not due to inflation.

What makes this graph painful

The first graph that Phillips shows in the article displays the trend in what’s happening to the mean household income and the median household income over time. When I recreate it from the tables the Federal Reserve published in an Excel file, it looks like this:

In the last 3 years, mean income increased while median income decreased.

Phillips echoes the same point that the authors of the Federal Reserve’s report do. The growth in average income since the 2010 survey is largely due to increases in the highest 10 percent of incomes. The decline in the median at the same time suggests that typical Americans are not doing as well.

Easing the pain

If all we wanted to do was make the situation look less dire on a graph, Minitab provides some easy-to-use features that can change the message of this graph.

Edit the scale
  1. Double-click the numbers on the y-scale.
  2. In Scale Range, uncheck Auto.
  3. Enter new values.

By default, Minitab does a good job of editing your scales to contain your data, but sometimes different values can be meaningful. For example, you can change the y-scale so that it contains 0—the lowest income that someone can have. The change in scale has two pronounced effects on the graph. Let’s look at them side-by-side to make the comparison easier.

The median wage looks higher.

The first thing you probably notice in the new graph is that the median wage is shown much higher on the figure. On the original graph, the median was about as low as you could get, but the message of the new graph is that the median is in the middle. The vertical lines illustrate the distance from the bottom of the graph to the median.

If the median is further away from the bottom of the graph, the median appears to be higher.

The difference between the mean and the median looks smaller.

Take a look at the length of the line between the points in 2013. With the extra space at the bottom of the graph, the mean and median are a lot closer together on the graph than they were before. The vertical lines here illustrate the distance between the mean and the median.

If the mean and median are closer together on the graph, the difference appears smaller.

Add a reference line.

Choose Editor > Add > Reference Lines.

The economist and philosopher John Stuart Mill is credited with the saying: “Men do not desire to be rich, but to be richer than other men.” Thus, when we compare the mean to the median, the median looks low. Changing the point of comparison can make the median look higher. For example, we can add to the plot the 2014 guideline for a  family of 4 to qualify for Medicaid and CHIP in the 48 contiguous U.S. states and the District of Columbia.

By comparison, the poverty level for a family of 4 makes the median appear higher.

With the median now in the middle of two points of comparison, we reinforce the visual impression that the median is not low.

Add a regression fit.

Choose Editor > Add > Regression Fit.

Phillips and the report from the Federal Reserve note that median wages were lower in 2010 than in 2007, and lower in 2013 than in 2010. But the data on the graph go back to the Survey of Consumer Finances from 1989. Display the least-squares regression line on this graph, and the overall trend suggests that median wages have been (ever so slightly) increasing since 1989.

The least squares regression line between median income and time trends upwards.
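
The same three edits can be sketched outside Minitab as well: rescale the y-axis to include 0, add a horizontal reference line, and overlay a least-squares trend. The income values and the reference value below are placeholders for illustration, not the Survey of Consumer Finances figures.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: survey years and a made-up median income series.
years  = np.array([1989, 1992, 1995, 1998, 2001, 2004, 2007, 2010, 2013])
median = np.array([44000, 43000, 44500, 47000, 48500, 49000, 50000, 46500, 46000])

fig, ax = plt.subplots()
ax.plot(years, median, marker="o", label="Median income")

ax.set_ylim(0, 70000)                  # include 0 so the median sits mid-graph
ax.axhline(24000, linestyle="--",      # hypothetical reference value
           label="Reference line")

slope, intercept = np.polyfit(years, median, 1)   # least-squares trend
ax.plot(years, slope * years + intercept, label="Least-squares fit")

ax.set_xlabel("Survey year")
ax.set_ylabel("2013 dollars")
ax.legend()
plt.show()
```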

Wrap up

To communicate your message most effectively, you have to think about how to make the message of a graph as clear as possible. Minitab Statistical Software makes it easy to rescale, add lines, and add shapes so that your graphs are the best representation of the data possible. When the message of the data is clear, then you can make good decisions.

Ready for more? Check out even more of the graph options you can use to get your point across.

Exoplanet Statistics and the Search for Earth Twins

$
0
0

Astronomy is cool! And, it’s gotten even more exciting with the search for exoplanets. You’ve probably heard about newly discovered exoplanets that are extremely different from Earth. These include hot Jupiters, super-cold iceballs, super-heated hellholes, very-low-density puffballs, and ultra-speedy planets that orbit their star in just hours. And then there is PSR J1719-1438 which has the mass of Jupiter, orbits a pulsar, and is probably one giant diamond!

In this post, I'll use statistics to look at the overall planetary output from the Milky Way’s planet-making process. Where does Earth fit in the overall distribution of planets? In light of the extreme exoplanets, is Earth actually the oddball?  I’ll also look into the search for an Earth twin, and highlight data that suggest exciting finds down the road.

Our Sample of Exoplanets

We have 1,826 confirmed exoplanets! That’s a very good sample size, but is this sample representative of all planets? To obtain a representative sample, you need to collect a random sample. It’s easy to point our instruments at random stars, but that doesn’t guarantee a representative sample. Equipment characteristics may make certain types of exoplanets easier to detect than others, thus biasing the sample.

Let’s look at the two methods that have been used to discover the most exoplanets to see how the sample might be biased. These are the radial-velocity and transit methods of planet discovery.

Illustration of the radial-velocity method for detecting exoplanets

There’s nothing better than working with data yourself to truly understand it! These data are from the Planetary Habitability Laboratory’s exoplanet catalog, and I encourage you to get the free trial version of our statistical software and explore the data yourself.

Radial-velocity method

Astronomers who use the radial-velocity method look for the wobble that an orbiting exoplanet causes in its star. The bigger the wobble, the easier the exoplanet is to detect. Large exoplanets that are close to small stars cause the largest wobbles and are, therefore, easier to detect by this method. Hot Jupiters are easiest to detect with this method because of their large size and close proximity to the star.

Transit method

The transit, or photometry, method measures the decrease in brightness as an exoplanet passes in front of the parent star. The Kepler space telescope uses the transit method and it was specifically designed to be able to detect Earth-size planets. For this to work, the orbits of the exoplanets have to be perfectly aligned from the astronomers' vantage point.

Illustration of the transit method of detecting exoplanets

Kepler must observe at least three transits to flag a candidate exoplanet. The multiple transits help rule out other potential causes of a decrease in brightness, but they don’t prove it’s a planet. These candidates need to be confirmed via other methods, such as direct imaging. Unfortunately, Kepler was crippled by a failure after four years of data collection. Consequently, detecting exoplanets much more than one AU out from their stars is not expected with the Kepler data.

Method comparison

The histograms below show the distribution in mass and distance by detection method for all confirmed exoplanets.

Mass of exoplanet grouped by detection method

Exoplanet distance from parent star by detection method

Both methods found a large proportion of planets that are both close to the star and not particularly massive. Even the radial-velocity method, which favors massive planets, found more small planets than large ones. As expected, Kepler stopped finding planets that are farther than one astronomical unit (AU) from their stars, but the detections by radial velocity continue out to greater distances.

So, while we probably don’t have a completely representative sample, the two methods agree on the general shape of the distribution: smaller, closer planets far outnumber massive, more distant planets.

Overall Distribution of Exoplanets

Next, we’ll look at the overall distribution of all confirmed exoplanets by several key variables. The green bar in each graph shows where the Earth fits in.

Histogram of mass of confirmed exoplanets

We can see that the range in exoplanet mass varies greatly, from very small to thousands of Earth masses. For comparison, Jupiter is 317 Earth masses. There are many more small planets, like Earth, than large planets! Let's zoom in.

Zoomed in look at mass of Earth-sized planets

In this graph, I've truncated the X-axis. Here we can see that Earth is actually smaller than the peak value. Among rocky planets, it turns out that super-Earths are more common than Earth-sized planets. Super-Earths are rocky like Earth, but have 2 to 5 times the mass of Earth. So, Earth might be slightly unusual for rocky planets by being on the small side.  

Average distance of exoplanet from parent star

Earth is a bit further from the sun (1 AU) than the more frequent distances on the graph. This reflects the fact that red dwarf stars are by far the most common type of star (80%). These smaller, cooler stars have planetary systems that are much more compact than those of stars like our sun.

Orbital eccentricity of confirmed exoplanets

Orbital eccentricity measures whether an object's orbit is close to circular or more elliptical (oval shaped). Highly eccentric orbits cause extreme climate changes because there is a greater difference between the minimum and maximum distances from the parent star.

Zero is a perfect circle while just less than one is the most elliptical an orbit can be. The orbits of the planets in the solar system are very circular. We can see on the graph that Earth's very low eccentricity is not unusual among confirmed exoplanets.

These graphs show that Earth really isn’t such an oddball—there’s a wide range of planets, and Earth falls near the more common values in each graph.

The Search for an Earth Twin

There’s more to our search than just looking at the distributions by mass, length of year, and orbital eccentricity. We want to know about specific cases where everything lines up just right to produce exoplanets that are habitable Earth twins.

Let me introduce you to the Earth similarity index (ESI). This measure indicates how similar an exoplanet is to Earth. Values range from 0 to 1, where Earth has a value of one. ESI is based on estimated parameters for each exoplanet, such as radius, density, surface temperature, and escape velocity. In our solar system, Mars has an ESI of 0.64 and Venus is 0.78.
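
For the curious, here is a minimal sketch of the general form of this kind of similarity index: each parameter is compared to Earth's value on a 0-to-1 scale and the results are combined as a weighted geometric mean. The weights and the candidate planet below are illustrative assumptions for the sketch, not necessarily the exact values the Planetary Habitability Laboratory uses.

```python
def similarity_term(x, x_earth, weight, n_params):
    """One parameter's contribution: equals 1 when x matches Earth's value."""
    return (1 - abs((x - x_earth) / (x + x_earth))) ** (weight / n_params)

def earth_similarity_index(planet, earth, weights):
    """Weighted geometric-mean similarity across all parameters."""
    n = len(weights)
    esi = 1.0
    for key, w in weights.items():
        esi *= similarity_term(planet[key], earth[key], w, n)
    return esi

# Earth's reference values (radius, density, and escape velocity relative
# to Earth; surface temperature in kelvin).
earth = {"radius": 1.0, "density": 1.0, "escape_velocity": 1.0, "temperature": 288}

# Illustrative weights -- assumed for this sketch, not the published values.
weights = {"radius": 0.57, "density": 1.07, "escape_velocity": 0.70, "temperature": 5.58}

# A hypothetical super-Earth for demonstration.
candidate = {"radius": 1.4, "density": 1.1, "escape_velocity": 1.5, "temperature": 290}

print(f"ESI = {earth_similarity_index(candidate, earth, weights):.2f}")
```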

The bubbleplot below shows all of the confirmed exoplanets and unconfirmed Kepler candidates that have ESI values greater than 0.80 and are in the habitable zone. For comparison, the blue bubble is Earth.

Bubbleplot of Earth-like exoplanets by mass, length of year, and star mass

There are 23 exoplanets and candidates that have an ESI greater than 0.80. In fact, five are greater than or equal to 0.90, with the highest being 0.93. The blue Earth bubble is smaller than most other bubbles on the plot, which again indicates that Earth is on the smaller side for rocky planets. On the graph, 18 out of 23 (78%) are super-Earths, four are classified as Earth-sized, and one is smaller than Earth.

Even though a majority are super-Earths, that’s fine, because super-Earths might be even more habitable than Earth-sized planets!

There are two groups of Earth-like planets on the graph. Let’s call them Earth cousins and Earth twins.

The Earth cousins are on the bottom-left. These exoplanets are similar to Earth, but they orbit red dwarf stars that are much cooler and less massive than our sun. These exoplanets need to orbit much closer to be in the habitable zone, which produces the short years.

The Earth twins are on the top right. These exoplanets are like Earth and orbit stars that are like our sun. Consequently, they have years that are more similar to our own.

The bubbleplot contains both confirmed planets and unconfirmed Kepler candidates. The green bubbles indicate confirmed planets, but they’re all in the Earth cousin group. So far, all Earth twins are unconfirmed by other methods. Kepler has detected three transits for each candidate, but some of the candidates may be false positives. The false positive rate for Kepler candidates varies by planet size, and for Earth-sized planets it is 12.3%.

While it is reasonable to expect that some of the 14 Earth twins are false positives, we can also expect that 88% (12) will eventually be confirmed. That will be exciting news! And those are just the twins that we currently have data for.
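
A quick way to see where that expectation comes from is to treat each of the 14 candidates as an independent trial with a 12.3% chance of being a false positive; a minimal sketch:

```python
from math import comb

n, fp = 14, 0.123          # 14 Earth-twin candidates, 12.3% false-positive rate
p_real = 1 - fp

print(f"Expected confirmations: {n * p_real:.1f}")   # about 12.3

# Probability that at least 12 of the 14 are real planets,
# assuming each candidate is an independent trial (a simplification).
p_at_least_12 = sum(comb(n, k) * p_real**k * fp**(n - k) for k in range(12, n + 1))
print(f"P(at least 12 confirmed) = {p_at_least_12:.2f}")
```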

Given the context about the distribution of planets, it’s not surprising that scientists estimate there are 40 billion Earth-sized planets in the habitable zones of their stars in the Milky Way!

The image of the radial velocity method is by Rnt20 and the image of the planet transit method is by Nikola Smolenski. Both images are used under this Creative Commons license.

Capability Snapshot in the Minitab 17 Assistant

$
0
0

In quality initiatives such as Six Sigma, practitioners often need to assess the capability of a process to make sure it can meet specifications, or to verify that it produces good parts. While many Minitab users are familiar with the capability analysis tools in the Stat menu and in Minitab’s Assistant, the Assistant includes a less-frequently used feature: the Capability Snapshot.

What Is the Capability Snapshot, and When Is It Useful?

The Snapshot can give you a capability estimate when data have not been collected over a long enough period of time to validate process stability (a key assumption for a true process capability analysis) or to capture the different sources of process variation.

The Capability Snapshot extracts the most information possible from the limited data available.

The Assistant's Snapshot mentions using this option only with data that are not in time order. But remember that even if the measurements are stored in the worksheet in the order in which they were collected, data collected over a short period of time won't capture the true long-term variation in the process. The time order here refers to data collected over a long period of time, not data that is merely sequential.

According to the guidelines set forth in the Assistant’s Capability Analysis menu for a standard Capability Analysis, sufficient data must be collected over a long-enough period of time to obtain reliable estimates of process capability:

In some situations, it may not be possible to collect enough data over a sufficiently long period of time to obtain precise estimates or capture the different sources of variation. 

For example, parts produced in an R&D laboratory are not likely produced in the same environment—by the same operators, using the same equipment, at the same temperature, or more generally under the same operating conditions—as they would be in a manufacturing plant making parts for customers.  Similarly, it may not be possible to produce enough parts in an R&D laboratory. In some situations only a few parts can be produced and it may be necessary to estimate the capability of the process based on this limited information.

Capability Snapshot to the Rescue!

If you choose the Capability Snapshot, Minitab only estimates the overall standard deviation. Because the analysis makes no assumption about the amount of time over which you collected the data, it is not possible to determine whether the variation in the data represents the inherent variation (typically captured by within-subgroup variation) or the overall variation of the process (which can only be estimated over the long-term). 

In other words, no subgroups are assumed, and only the overall variation in the sample is used.

Because the calculations use the overall standard deviation of the sample, the resulting output is labeled Ppk. However, that number should be interpreted with caution! The overall standard deviation for a snapshot of the process may not capture all sources of process variation that would be observed if the data were collected over the long term. Because of that, the Ppk in this output cannot be interpreted in the usual way.
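
As a rough illustration of the arithmetic behind such a snapshot, here is a minimal sketch of the usual overall-standard-deviation capability calculation; the measurements and specification limits are made up, and this is the textbook formula rather than Minitab's exact implementation.

```python
import numpy as np

def snapshot_capability(data, lsl, usl):
    """Capability index based on the overall (sample) standard deviation."""
    x = np.asarray(data, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)   # overall standard deviation
    return min(usl - mean, mean - lsl) / (3 * sd)

# Made-up measurements from a short R&D run, with spec limits 9.5 to 10.5.
sample = [10.02, 9.98, 10.05, 9.93, 10.08, 10.01, 9.97, 10.04, 9.99, 10.03]
print(f"Snapshot capability (labeled Ppk) = {snapshot_capability(sample, 9.5, 10.5):.2f}")
```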

So Do the Capability Snapshot Statistics Represent Cpk or Ppk?

As one of Minitab’s fantastic (and hilariously entertaining) trainers, Paul Sheehy, likes to say, “No mother, no father—it’s neither!”  Some like to argue that it represents Cpk, or within-subgroup variation. But to truly capture within-subgroup variation, we calculate the variation averaged across many subgroups of data, and in this case, we only have one ‘subgroup.’

Others prefer to think of the resulting capability estimate as Ppk, or overall variation. And even though that is how the output is labeled in the Capability Snapshot, the data that is typically used for this type of analysis does not capture the variation from the process over the long term, so it's not quite the same metric as traditional Ppk.

If that seems like a lot to remember, don't worry about it! The output from the Assistant is packed full of guidance and helpful tips to guide your interpretation of the results:

To read more about Capability Analysis, check out Eric Heckman’s blog post (which you’ll really like if you’re a Star Wars fan!): Starting Out with Capability Analysis.


Attribute Acceptance Sampling for an Acceptance Number of 0

$
0
0

Suppose that you plan to source a substantial amount of parts or subcomponents from a new supplier. To ensure that their quality level is acceptable to you, you might want to assess the capability levels (Ppk and Cpk indices) of their manufacturing processes and check whether their critical process parameters are fully under control (using control charts). If you are not sure about the efficiency of the supplier quality system or if you cannot get reliable estimates of their capability indices, you will probably need to actually inspect the incoming parts from this vendor.

However, checking all parts is expensive and time consuming. In addition, visually inspecting 100% of all parts will not necessarily ensure that all defective parts are detected (operators eventually get tired performing repetitive visual inspections).

Acceptance sampling is a more efficient approach: to reduce costs, a smaller sample of parts is selected (in a random way, to avoid any systematic bias) from a larger batch of incoming products, and these sampled parts are then inspected.

Attribute Acceptance Sampling

The Acceptable Quality Level (AQL) of your supplier is the quality level that you expect from them (a proportion of defectives that is still considered acceptable). If the proportion of defectives is larger than that, the whole batch should get rejected (with a financial penalty for the supplier). The RQL is the Rejectable Quality Level (a proportion of defectives that is not considered acceptable, in which case the whole batch should be rejected).

The graph below shows the probability of accepting a batch for a given proportion of defectives. The probability of accepting the whole batch when the actual percentage of defectives is 1% (the AQL in this case) is 98.5%, but if the true percentage of defectives increases to 10% (the RQL), the probability of accepting the whole batch drops to 9.7%.

The inspection criterion, in this case, is the following: inspect 52 parts, and if there are more than 2 defective parts, reject the whole batch. If there are two defective parts or fewer, do not reject. The AQL and the RQL need to be negotiated with your supplier, whereas the acceptance criteria are calculated by Minitab.

This graph shows the probability of accepting a batch for a given proportion of defectives.

In Minitab, go to Stat > Quality Tools > Acceptance Sampling by Attributes... and enter your AQL and RQL as displayed in the dialogue box below to obtain the acceptance criteria.
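
If you'd like to verify the acceptance probabilities behind that curve, they follow directly from the binomial distribution; a minimal sketch for the plan above (inspect 52 parts, accept the batch if at most 2 are defective):

```python
from math import comb

def prob_accept(n, c, p):
    """Probability of accepting a batch: at most c defectives in a sample of n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, c = 52, 2
print(f"P(accept) at AQL of 1%:  {prob_accept(n, c, 0.01):.3f}")   # about 0.985
print(f"P(accept) at RQL of 10%: {prob_accept(n, c, 0.10):.3f}")   # about 0.097
```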

C = 0 Inspection Plans (Acceptance Number of 0)

From a quality assurance point of view, however, in many industries the only acceptable publicized quality level is 0% defective parts. Obviously, the ideal AQL would be 0. You may have a difficult time explaining to your final customers that a small proportion of defectives is still acceptable. So let's focus on zero-defect control plans, where the acceptance number is 0 and a batch is rejected as soon as a single defective is identified in the sample.

Note that Minitab will not allow you to enter an AQL of exactly 0 (it should always be larger than 0).

The Producer’s Risk

If the acceptance number is set to 0, the conditions for accepting a lot become considerably more restrictive. One consequence of setting very strict standards for accepting a batch is that if quality is not 100% perfect, and even with a very small proportion of defectives, the probability of rejecting a batch will increase very rapidly.

The Alpha risk (the Producer’s risk) is the probability of rejecting a batch even though the proportion of defectives is very small. This impacts the producer, since many of the batches they deliver will get rejected if the true proportion of defectives is not exactly 0.

In the graph below, the probability of accepting a batch with a 1% defective rate is now 80% (so nearly 20% of the batches will get rejected if the true proportion of defectives is 1%)! This high rejection rate is the price we pay for the very strict acceptance number of 0.

Conclusion

The sample size to inspect is smaller with an acceptance number of 0 (22 parts are inspected in the second graph vs. 52 in the first). However, this is a very ambitious plan: if the true percentage of defectives in the batches is, say, 0.5% (the AQL set here), then 10.4% of all batches will get rejected.
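
The same binomial calculation makes the trade-off of a c = 0 plan easy to see: with 22 parts inspected and no defectives allowed, the batch is accepted only when all 22 sampled parts are good.

```python
# Probability of accepting a batch under a c = 0 plan:
# every one of the n sampled parts must be good.
n = 22
for p in (0.005, 0.01):
    p_accept = (1 - p) ** n
    print(f"True defective rate {p:.1%}: "
          f"P(accept) = {p_accept:.3f}, P(reject) = {1 - p_accept:.3f}")
```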

To obtain a lower and more realistic proportion of rejected batches, the level of quality from your supplier should be nearly 100% perfect (almost 100% good parts).

The Influence of the AP Preseason College Football Poll

$
0
0

A few weeks ago I looked at how the preseason college football poll influences the rankings at the end of the year. I found that for the most part, the teams that ranked higher in the preseason tend to be ranked higher going into the postseason. So if Team A and Team B both finish the regular season undefeated, the team that was ranked higher in the preseason tends to be the one ranked higher going into the postseason.

The biggest exception was, and I hope you’re sitting down for this, SEC teams. SEC teams that finished the regular season with 0 or 1 loss tended to be ranked higher than non-SEC teams with the same number of losses, regardless of the preseason poll. On the other side, Boise State (a team from a smaller conference) tended to get jumped by teams with the same number of losses that were ranked lower than them in the preseason poll. And the oddball was USC, which in both 2007 and 2008 was jumped by 4 different teams that started the season ranked lower than the Trojans and finished the regular season with the same number of losses.

Looking at 2014

Now let’s turn our attention to this season. Are the same trends happening? There are currently 10 undefeated teams. I looked at the order those 10 teams were ranked in the preseason AP Poll and the order they are currently ranked in the AP Poll. Below is an individual value plot of the data that shows each team's preseason rank versus their current rank.

Individual Value Plot

Teams on the diagonal line haven’t moved up or down since the preseason. So of the 10 undefeated teams, Florida St and Auburn were ranked #1 and #2 respectively in the preseason. And they remain that way currently.

Teams above the line have jumped teams that were ranked ahead of them in the preseason. Just as we saw before, it’s the SEC teams that rise to the top regardless of where they were ranked in the preseason. Mississippi State had 22 points in the AP preseason poll, which is actually one point fewer than TCU received, and wasn’t even close to being in the top 25. But now? They have 1,320 points, are ranked 3rd, and have jumped 5 other undefeated teams that were ranked higher than them in the preseason. It’s good to win games in the SEC.

Teams below the line have been jumped by teams ranked lower than them in the preseason. Baylor and Notre Dame are simply casualties of the Mississippi teams. If somebody moves up, somebody else has to move down. But Marshall appears to fit our “Boise State” profile, as they are a team from a small conference. They received 41 points in the preseason AP Poll, putting them just outside the top 25. And now, despite being 5-0, they have only 78 points and are the only undefeated team still outside the top 25.

Measuring Correlation and Concordance

It appears that the preseason poll is influencing the current rankings, but we can get some statistics to confirm our hunches. The first is a correlation coefficient. The correlation coefficient can range in value from -1 to +1. The larger the absolute value of the coefficient, the stronger the relationship between the variables. An absolute value of 1 indicates a perfect relationship, and a value of zero indicates the absence of a relationship.

Correlation

For these data, both values equal about 0.685, indicating a positive association between preseason rankings and the current rankings. Teams ranked higher in the preseason are also currently ranked higher.

We can also look at the concordant and discordant pairs. A pair is concordant if the observations are in the same direction. A pair is discordant if the observations are in opposite directions. More concordant pairs means voters are keeping the same order as the preseason, while more discordant pairs means they’re switching teams.

Concordance

Of the 45 pairs of teams, 35 of them are concordant, meaning most of the time voters are keeping teams in the same order as the preseason. The 10 discordant pairs all involve either Mississippi, Mississippi State, or Marshall. So it’s pretty clear when voters are switching teams from the preseason.
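
If you'd like to see how such pair counts are tallied, here is a minimal sketch using hypothetical rankings for five teams, not the actual poll data:

```python
from itertools import combinations

# Hypothetical preseason and current ranks for five teams (1 = best).
preseason = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4, "Team E": 5}
current   = {"Team A": 1, "Team B": 3, "Team C": 2, "Team D": 4, "Team E": 5}

concordant = discordant = 0
for t1, t2 in combinations(preseason, 2):
    pre_diff = preseason[t1] - preseason[t2]
    cur_diff = current[t1] - current[t2]
    if pre_diff * cur_diff > 0:      # same order in both polls
        concordant += 1
    elif pre_diff * cur_diff < 0:    # order flipped between polls
        discordant += 1
    # a product of 0 would be a tie: neither concordant nor discordant

print(f"Concordant pairs: {concordant}, discordant pairs: {discordant}")
```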

Implications for the College Football Playoff

So far we’ve shown that voters in the AP Poll prefer to keep the same order teams are ranked in the preseason, minus SEC teams and teams from smaller conferences. But the voters of the AP Poll won’t decide the teams in the college football playoff, a separate college football playoff committee will. Luckily for us, that committee will start releasing their own set of rankings on October 28. So we’ll come back then and see if the committee is also influenced by preseason rankings the same way the AP voters are.

The Ghost Pattern: A Haunting Cautionary Tale about Moving Averages

$
0
0

Halloween's right around the corner, so here's a scary thought for the statistically minded: That pattern in your time series plot? Maybe it's just a ghost. It might not really be there at all. 

That's right. The trend that seems so evident might be a phantom. Or, if you don't believe in that sort of thing, chalk it up to the brain's desire to impose order on what we see, even when it doesn't exist.

I'm going to demonstrate this with Minitab Statistical Software (get the free 30-day trial version and play along, if you don't already use it). And if things get scary, just keep telling yourself "It's only a simulation. It's only a simulation."

But remember the ghost pattern when we're done. It's a great reminder of how important it is to make sure that you've interpreted your data properly, and looked at all the factors that might influence your analysis—including the quirks inherent in the statistical methods you used. 

Plotting Random Data from a 20-Sided Die

We're going to need some random data, which we can get Minitab to generate for us. In many role-playing games, players use a 20-sided die to determine the outcome of battles with horrible monsters, so in keeping with the Halloween theme we'll simulate 500 consecutive rolls with a 20-sided die. Choose Calc > Random Data > Integer...  and have Minitab generate 500 rows of random integers between 1 and 20.  

Now go to Graph > Time Series Plot... and select the column of random integers. Minitab creates a graph that will look something like this: 

Time Series Plot of 500 Twenty-Sided Die Rolls

It looks like there could be a pattern, one that looks a little bit like a sine wave...but it's hard to see, since there's a lot of variation in consecutive points. In this situation, many analysts will use a technique called the Moving Average to filter the data. The idea is to smooth out the natural variation in the data by looking at the average of several consecutive data points, thus enabling a pattern to reveal itself. It's the statistical equivalent of applying a noise filter to eliminate hiss on an audio recording.  

A moving average can be calculated based on the average of as few as 2 data points, but this depends on the size and nature of your data set. We're going to calculate the moving average of every 5 numbers. Choose Stat > Time Series > Moving Average... Enter the column of integers as the Variable, and enter 5 as the MA length. Then click "Storage" and have Minitab store the calculated averages in a new data column. 

Now create a new time series plot using the moving averages:

moving average time series plot

You can see how some of the "noise" from point-to-point variation has been reduced, and it does look like there could, just possibly, be a pattern there.

Can Moving Averages Predict the Future?

Of course, a primary reason for doing a time series analysis is to forecast the next item (or several) in the series. Let's see if we might predict the next moving average of the die by knowing the current moving average.  

Select Stat > Time Series > Lag. In the dialog box, choose the "moving averages" column as the series to lag. We'll use this dialog to create a new column of data that shifts each moving average down 1 row and inserts a missing value symbol, *, at the top of the column.

Now we can create a simple scatterplot that will show if there's a correlation between the observed moving average and the next one. 

Scatterplot of Current and Next Moving Averages

Clearly, there's a positive correlation between the current moving average and the next, which means we can use the current moving average to predict the next one.  

But wait a minute...this is random data!  By definition, you can't predict random, so how can there be a correlation? This is getting kind of creepy...it's like there's some kind of ghost in this data. 

Zoinks! What would Scooby Doo make of all this?  

Debunking the "Ghost" with the Slutsky-Yule Effect

Don't panic: there's a perfectly rational explanation for what we're seeing here. It's called the Slutsky-Yule Effect, which says that a filtered series, such as a moving average of random data, can look like patterned data even when there's no relationship among the original data points.

So there's no ghost in our random data; instead, we're seeing a sort of statistical illusion. Using the moving average can make it seem like a pattern or relationship exists, but that apparent pattern could be a side effect of the tool, and not an indication of a real pattern. 
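
You can reproduce the illusion without Minitab. The minimal sketch below simulates 500 rolls of a 20-sided die, smooths them with a 5-point moving average, and checks the correlation between each smoothed value and the next; for a k-point moving average of independent data the expected lag-1 correlation is roughly (k-1)/k, or about 0.8 here, even though the underlying rolls are pure noise.

```python
import numpy as np

rng = np.random.default_rng(13)
rolls = rng.integers(1, 21, size=500)          # 500 rolls of a 20-sided die

k = 5
# 5-point moving average: each value is the mean of 5 consecutive rolls.
moving_avg = np.convolve(rolls, np.ones(k) / k, mode="valid")

# Correlation between each moving average and the next one (lag 1).
r_smooth = np.corrcoef(moving_avg[:-1], moving_avg[1:])[0, 1]
print(f"Lag-1 correlation of the moving averages: {r_smooth:.2f}")  # near 0.8

# The raw rolls themselves show essentially no lag-1 correlation.
r_raw = np.corrcoef(rolls[:-1], rolls[1:])[0, 1]
print(f"Lag-1 correlation of the raw rolls:       {r_raw:.2f}")     # near 0
```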

Does this mean you shouldn't use moving averages to look at your data? No! It's a very valuable and useful technique. However, using it carelessly could get you into trouble. And if you're basing a major decision solely on moving averages, you might want to try some alternate approaches, too. Mikel Harry, one of the originators of Six Sigma, has a great blog post that presents a workplace example of how far apart reality and moving averages can be. 

So just remember the Slutsky-Yule Effect when you're analyzing data in the dead of night, and your moving average chart shows something frightening. Shed some more light on the subject with follow-up analysis and you might find there's nothing to fear at all. 

With the Assistant, You Won't Have to Stop and Get Directions about Directional Hypotheses

$
0
0

I got lost a lot as a child. I got lost at malls, at museums, Christmas markets, and everywhere else you could think of. Had it been in fashion to tether children to their parents at the time, I'm sure my mother would have. As an adult, I've gotten used to using a GPS device to keep me from getting lost.

The Assistant in Minitab is like your GPS for statistics. The Assistant is there to provide you with directions so that you don't get lost. One particular area where it's easy to get lost is with directional hypotheses.

Wait... is my hypothesis the other direction?

What Is a Directional Hypothesis?

When you do a statistical hypothesis test, you have a null hypothesis and an alternative hypothesis. Directional hypotheses refer to two types of alternative hypotheses that you can usually choose. The common alternative hypotheses are these three:

  • The value that you want to test is greater than a target.
  • The value that you want to test is different from a target.
  • The value that you want to test is less than a target.

If you select an alternative hypothesis with "greater than" or "less than" in it, then you've chosen a directional hypothesis. When you choose a directional hypothesis, you get a one-sided test.

What does it look like to choose a one-sided test, and why would you? Let's consider an example.

Choosing Whether to Use a One-sided Test or a Two-sided Test

Suppose new production equipment is installed at a factory that should increase the rate of production for electrical panels. Concern exists that the change could increase the percentage of electrical panels that require rework before shipping. A quality team prepares to conduct a hypothesis test to determine whether statistical evidence supports this concern. The historical rework rate is 1%.

At this point, you would usually choose an alternative hypothesis. Maybe you remember hearing that you should think about whether to use a one-sided test or a two-sided test, or you may not even know how a test can have a side.

To keep from getting lost, you use your GPS. To keep from getting confused about statistics, you can use the Assistant. The Assistant uses clear and simple language. The Assistant doesn't ask you about "directional hypotheses" or "one-sided tests." Instead, the Assistant asks the question, "What do you want to determine?"

  • Is the % defective of Panels greater than .01?
  • Is the % defective of Panels less than .01?
  • Is the % defective of Panels different from .01?

In this scenario, it's easy to see why the team would want to determine whether the percentage is greater than 1%. By performing the one-sided test for whether the percentage is greater than 1%, the team can determine if there is enough statistical evidence to conclude that the percentage increased. If the percentage increased, then the concern is justified.

In practical terms, you should consider what it means to limit your decision to whether there is evidence for an increase. A one-sided test of whether the percentage increased will never show a statistically significant decrease in the percentage of panels that require rework. Evidence of a decrease in the number of defectives might guide the quality team to investigate the reasons for the unforeseen benefit.
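
Outside the Assistant, the same one-sided question could be checked with an exact binomial test. Here is a minimal sketch with hypothetical inspection counts (it uses scipy.stats.binomtest, available in SciPy 1.7 and later):

```python
from scipy.stats import binomtest

# Hypothetical inspection results after the equipment change:
# 18 panels out of 1,000 required rework; the historical rate is 1%.
result = binomtest(k=18, n=1000, p=0.01, alternative="greater")
print(f"One-sided p-value: {result.pvalue:.4f}")

# A small p-value (for example, below 0.05) would support the concern
# that the rework rate has increased above the historical 1%.
```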

Why Use a One-sided Test?

Given this possible concern about whether a one-sided test excludes important information from the result, why would you ever use one? The best answer is that you use a one-sided test when the one-sided test tells you everything that you need to know.

In the example about the electrical panels, the quality team might feel completely secure in assuming that the new equipment will not result in a decrease in the percentage of panels that require rework. If so, then a test that also checks for a decrease is unnecessary. The team needs only to determine whether or not it has a problem with increased defectives to solve.

The Assistant Gets Even Better

While a p-value for a one-sided test can be useful, more analysis can help you make better decisions. For example, in the electrical panel example, if the team finds a statistically significant increase, it will be important to know what the percentage increase is. The Assistant produces several reports with your hypothesis tests that help you get as much information as you can from your data. The report card verifies your analysis by providing assumption checks and identifying any concerns that you should be aware of. The diagnostic report helps you further understand your analysis by providing additional detail. The summary report helps you to draw the correct conclusions and explain those conclusions to others. The series of reports includes a variety of other statistics and analyses. That way, you have everything that you need to interpret your results with confidence.

The % defective of Panels is not significantly greater than the target (p > 0.05)

The image of the face in the crowd without the thought bubble is by _Imaji_ and is licensed under this creative commons license.

How Important Are Normal Residuals in Regression Analysis?

$
0
0

I’ve written about the importance of checking your residual plots when performing linear regression analysis. If you don’t satisfy the assumptions for an analysis, you might not be able to trust the results. One of the assumptions for regression analysis is that the residuals are normally distributed. Typically, you assess this assumption using the normal probability plot of the residuals.

Normal probability plot showing residuals that are not normally distributed

Are these nonnormal residuals a problem?

If you have nonnormal residuals, can you trust the results of the regression analysis?

Answering this question highlights some of the research that Rob Kelly, a senior statistician here at Minitab, was tasked with in order to guide the development of our statistical software.

Simulation Study Details

The goals of the simulation study were to:

  • determine whether nonnormal residuals affect the error rate of the F-tests for regression analysis
  • generate a safe, minimum sample size recommendation for nonnormal residuals

For simple regression, the study assessed both the overall F-test (for both linear and quadratic models) and the F-test specifically for the highest-order term.

For multiple regression, the study assessed the overall F-test for three models that involved five continuous predictors:

  • a linear model with all five X variables
  • all linear and square terms
  • all linear terms and seven of the 2-way interactions

The residual distributions included skewed, heavy-tailed, and light-tailed distributions that depart substantially from the normal distribution.

There were 10,000 tests for each condition. The study determined whether the tests incorrectly rejected the null hypothesis more often or less often than expected for the different nonnormal distributions. If the test performs well, the Type I error rates should be very close to the target significance level.
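
The post doesn't include the study's code, but the flavor of such a simulation is easy to sketch: generate strongly skewed residuals under a true null hypothesis, fit the regression, and count how often the test rejects at the 0.05 level. In simple regression the slope t-test is equivalent to the overall F-test, so the sketch below uses it as a stand-in.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n, alpha = 10_000, 15, 0.05
rejections = 0

x = np.linspace(0, 10, n)
for _ in range(n_sims):
    # Strongly skewed residuals (centered exponential); the true slope is 0,
    # so every rejection is a Type I error.
    y = rng.exponential(scale=1.0, size=n) - 1.0
    if stats.linregress(x, y).pvalue < alpha:
        rejections += 1

# Per the study's findings, this should land close to the 0.05 target.
print(f"Observed Type I error rate: {rejections / n_sims:.3f}")
```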

Results and Sample Size Guideline

The study found that a sample size of at least 15 was important for both simple and multiple regression. If you meet this guideline, the test results are usually reliable for any of the nonnormal distributions.

In simple regression, the observed Type I error rates are all between 0.0380 and 0.0529, very close to the target significance level of 0.05.

In multiple regression, the Type I error rates are all between 0.08820 and 0.11850, close to the target of 0.10.

Closing Thoughts

The good news is that if you have a sample size of at least 15, the test results are reliable even when the residuals depart substantially from the normal distribution.

However, there is a caveat if you are using regression analysis to generate predictions. Prediction intervals are calculated based on the assumption that the residuals are normally distributed. If the residuals are nonnormal, the prediction intervals may be inaccurate.

This research guided the implementation of regression features in the Assistant menu. The Assistant is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. Because the regression tests perform well with relatively small samples, the Assistant does not test the residuals for normality. Instead, the Assistant checks the size of the sample and indicates when the sample is less than 15.

See a multiple regression example that uses the Assistant.

You can read the full study results in the simple regression white paper and the multiple regression white paper. You can also peruse all of our technical white papers to see the research we conduct to develop methodology throughout the Assistant and Minitab.
