
How to Create a Graphical Version of the 1-sample t-Test in Minitab


This is a companion post for a series of blog posts about understanding hypothesis tests. In this series, I create a graphical equivalent to a 1-sample t-test and confidence interval to help you understand how it works more intuitively.

This post focuses entirely on the steps required to create the graphs. It’s a fairly technical and task-oriented post designed for those who need to create the graphs for illustrative purposes. If you’d instead like to gain a better understanding of the concepts behind the graphs, please see the other posts in this series.

To create the following graphs, we’ll use Minitab’s probability distribution plots in conjunction with several statistics obtained from the 1-sample t output. If you’d like more information about the formulas that are involved, you can find them in Minitab at: Help > Methods and Formulas > Basic Statistics > 1-Sample t.

The data for this example come from the FamilyEnergyCost data set, which is just one of the many example data sets in Minitab’s Data Set Library. We’ll perform the regular 1-sample t-test with a null hypothesis mean of 260, and then graphically recreate the results.

1-sample t-test output from Minitab statistical software

How to Graph the Two-Tailed Critical Region for a Significance Level of 0.05

To create a graphical equivalent to a 1-sample t-test, we’ll need to graph the t-distribution using the correct number of degrees of freedom. For a 1-sample t-test, the degrees of freedom equals the sample size minus 1. So, that’s 24 degrees of freedom for our sample of 25.

  1. In Minitab, choose: Graph > Probability Distribution Plot > View Probability.
  2. In Distribution, select t.
  3. In Degrees of freedom, enter 24.
  4. Click the Shaded Area tab.
  5. In Define Shaded Area By, select Probability and Both Tails.
  6. In Probability, enter 0.05.
  7. Click OK.

You should see this graph.

Probability distribution plot of t-values

This graph shows the distribution of t-values for a sample of our size with the t-values for the end points of the critical region. The t-value for our sample mean is 2.29 and it falls within the critical region.
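If you’d like to double-check the end points of the critical region outside of Minitab, here’s a minimal sketch in Python using scipy (an assumption on my part—the post itself only uses Minitab). It reproduces the same t-distribution calculation behind the shaded tails:

```python
from scipy import stats

df = 24            # sample size of 25 minus 1
alpha = 0.05

# Two-tailed critical t-values that bound the shaded critical region
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"critical region: t < {-t_crit:.3f} or t > {t_crit:.3f}")   # about -2.064 / +2.064

t_sample = 2.29    # t-value from the 1-sample t output
print("falls in the critical region:", abs(t_sample) > t_crit)     # True
```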

For my blog posts, I thought displaying the x-axis in the same units as our measurement variable (energy costs) would make the graph easier to understand. To do this, we need to transform the x-axis scale from t-values to energy costs.

Transforming the t-values to energy costs for a distribution centered on the null hypothesis mean requires a simple calculation:

Energy Cost = Null Hypothesis Mean + (t-value * SE Mean)

We’ll use the null hypothesis value that we entered in the dialog box (260) and the SE Mean value that appears in the 1-sample t-test output (30.8). We need to calculate the energy cost values for all of the t-values that will appear on the x-axis (-4 to +4).

For example, a t-value of 1 equals 290.8 (260 + (1 * 30.8)). Zero corresponds to the null hypothesis value, which is 260.
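Here is a small, purely illustrative sketch of that conversion for all of the tick positions, using the same null hypothesis mean and SE Mean as above:

```python
null_mean = 260    # null hypothesis mean entered in the dialog box
se_mean = 30.8     # SE Mean from the 1-sample t output

# Energy cost label for each t-value tick on the x-axis
for t in range(-4, 5):
    print(f"t = {t:2d}  ->  energy cost = {null_mean + t * se_mean:.1f}")
# t = 1 -> 290.8, t = 0 -> 260.0, and so on
```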

Next, we need to replace the t-values with the energy cost equivalents.

  1. Choose Editor > Select Item > X Scale.
  2. Choose Editor > Edit X Scale.
  3. In Major Tick Position, choose Number of Ticks and enter 9.
  4. Click the Show tab and check the Low check box for Major ticks and Major tick labels.
  5. Click the Labels tab of the dialog box that appears. Enter the energy cost values that you calculated as shown below. I use rounded values to keep the x-axis tidy. Click OK.

Dialog box for showing the transformed values on the x-scale

You should see this graph. To clean up the x-axis, I had to delete the t-values that were still showing from before. Simply click each t-value once and press the Delete key.

Probability distribution plot of t-distribution with the x-scale transformed to energy costs

Let’s add a reference line to show where our sample mean falls within the sampling distribution and critical region. The trick here is that the x-axis still uses t-values despite displaying the energy costs. We need to use the t-value for our sample mean that appears in the 1-sample t output (2.29).

  1. Choose Editor > Add > Reference Lines.
  2. In Show reference lines at X values, enter 2.29.
  3. Click OK.
  4. Double click the 2.29 that now appears on the graph.
  5. In the dialog box that appears, enter 330.6 in Text.
  6. Click OK.

After editing the title and the x-axis label, you should have a graph similar to the one below.

Probability distribution plot with two-tailed critical region for a significance level of 0.05

How to Graph the P Value for a 1-sample t-Test

To do this, we’ll duplicate the graph we created above and then modify it. This allows us to reuse some of the work that we’ve already done.

  1. Make sure the graph we created is selected.
  2. Choose Editor > Duplicate Graph.
  3. Double click the blue distribution curve on the graph.
  4. Click the Shaded Area tab in the dialog box that appears.
  5. In Define Shaded Area By, select X Value and Both Tails.
  6. In X value, enter 2.29.
  7. Click OK.

You’ll need to edit the graph title and delete some extra numbers on the x-axis. After these edits, you should have a graph similar to this one.

Probability distribution plot that displays the p-value for our sample mean
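As a cross-check, the p-value represented by the shaded tails can be reproduced outside of Minitab. This sketch assumes the t-value (2.29) and degrees of freedom (24) from the 1-sample t output:

```python
from scipy import stats

# Two-tailed p-value: the area in both tails beyond t = +/-2.29 with 24 df
p_value = 2 * stats.t.sf(2.29, df=24)
print(round(p_value, 3))   # about 0.03
```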

How to Graph the Confidence Interval for a 1-sample t-test

To graphically recreate the confidence interval, we’ll need to start from scratch for this graph. 

  1. In Minitab, choose: Graph > Probability Distribution Plot > View Probability.
  2. In Distribution, select t.
  3. In Degrees of freedom, enter 24.
  4. Click the Shaded Area tab.
  5. In Define Shaded Area By, select Probability and Middle.
  6. Enter 0.025 in both Probability 1 and Probability 2.
  7. Click OK.

Your graph should look like this:

Probability distribution plot that represents a confidence interval with t-values

Like before, we’ll need to transform the x-axis into energy costs. For this graph, I’ll only display the x-values for the end points of the confidence interval and the sample mean. So, we need to convert the three t-values of -2.064, 0, 2.064.

The equation to transform the t-values to energy costs for a distribution centered on the sample mean is:

Energy Cost = Sample Mean + (t-score * SE Mean)

We obtain the following rounded values that represent the lower confidence limit, sample mean, and upper confidence limit:  267, 330.6, 394.
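If you’d like to verify these limits outside of Minitab, here’s a minimal sketch of the same calculation (scipy is my assumption here; the post itself only uses Minitab):

```python
from scipy import stats

mean, se, df = 330.6, 30.8, 24        # sample mean and SE Mean from the output
t_crit = stats.t.ppf(0.975, df)       # about 2.064 for a 95% interval

lower, upper = mean - t_crit * se, mean + t_crit * se
print(round(lower, 1), mean, round(upper, 1))   # roughly 267, 330.6, 394
```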

Simply double click the values in the x-axis to edit each individual label. Replace the t-value with the energy cost value. After editing the graph title, you should have a visual representation of the confidence interval that looks like this. I rounded the values for the confidence limits.

Probability distribution plot that displays a visual representation of a 95% confidence interval around the sample mean

Consider Using Minitab's Command Language

When I create multiple graphs that involve many steps, I generally use Minitab's command language. This may sound daunting if you're not familiar with using this command language. However, Minitab makes this easier for you.

After you create one graph, choose Editor > Copy Command Language, and paste it into a text editor, such as Notepad. Save the file with the extension *.mtb and you have a Minitab Exec file. This Exec file contains all of the edits you made. Now, you can easily create similar graphs simply by modifying the parts that you want to change.

You can also get help for the command language right in Minitab. First, make sure the command prompt is enabled by choosing Editor > Enable Commands. At the prompt, type help dplot, and Minitab displays the help specific to probability distribution plots!

To run an exec file, choose File > Other Files > Run an Exec. Click Select File and browse to the file you saved. Here are the MTB files for my graphs for the critical region, P value, and confidence interval.

Happy graphing!


Making Better Estimates of Project Duration Using Monte Carlo Analysis


by Lion "Ari" Ondiappan Arivazhagan, guest blogger.  

Predicting project completion times is one of the major challenges project managers face. Project schedule overruns are quite common due to the high uncertainty in estimating the amount of time activities require, a lack of historical data about project completion, organizational culture, inadequate skills, the complex and elaborative nature of projects, and many other factors.

PMI’s Pulse of the Profession™ research, which is consistent with other studies, shows that "fewer than two-thirds of projects meet their goals and business intent (success rates have been falling since 2008), and about 17 percent fail outright. Failed projects waste an organization’s money: for every US$1 billion spent on a failed project, US$135 million is lost forever…unrecoverable."

In another report on infrastructure project schedule and cost overruns, released in 2013 by PMI-KPMG, 79 percent of the survey respondents agreed that the infrastructure sector in India faces a shortage of skilled project managers with the prerequisite skill set, which results in time/schedule overruns. One of the reasons for inefficient project delivery is the paucity of skilled project managers in the infrastructure sector.

Yet predicting an achievable project completion time is more important today than ever before, due to the high liquidated damages (LD) or penalty charges for late completion and growing dissatisfaction among clients and the public.

The Drawbacks of Traditional CPM Technique

Deterministic, single-point estimates of project activity durations are highly risky because it is practically impossible to complete every activity exactly on its estimated single-point duration. Moreover, most estimators tend to produce durations that are closer to optimistic estimates than to pessimistic ones. The most likely estimates are modal estimates, and the traditional Critical Path Method (CPM) assumes activity durations are normally distributed. In a normal distribution, a modal estimate has only a 50% chance of the work being completed within or below the estimated duration, and the same is therefore true of the critical path duration. In other words, we typically start with an estimated project completion time that has a 50% chance of being EXCEEDED from the second the project begins.

Why Use a Probabilistic Method (PERT)?

Models that use three-point estimates, such as the PERT model, reduce uncertainty in project completion estimates to some extent by taking into account the Optimistic (To), Most-likely (Tml), and Pessimistic (Tp) durations. The width of the range (Tp - To) indicates the degree of risk in each activity duration. While probabilistic estimates can give us three different project completion times based on To, Tml, or Tp, we generally calculate the project completion time from an equivalent single-point expected duration obtained by assigning appropriate weights to each of the three durations. For example, the PERT model, which assumes a Beta distribution, uses the following formula to calculate the expected duration, Te.

beta distribution for activity duration estimate

Expected duration, Te = (To + 4Tml + Tp) / 6
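As a quick illustration of the formula, here is a small sketch in Python; the three-point estimate below is made up, not one of the values from the activity table:

```python
def pert_expected(to, tml, tp):
    """PERT (Beta-distribution) weighted average of a three-point estimate."""
    return (to + 4 * tml + tp) / 6

# Made-up activity: optimistic 4, most likely 6, pessimistic 10 weeks
print(round(pert_expected(4, 6, 10), 2))   # 6.33 weeks
```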

activity table

Using PERT's 3-point estimates of activities whose durations are in weeks, we get the following PERT network diagram to calculate the critical path. The expected durations calculated this way are then used as single-point durations in the traditional CPM method to arrive at the critical path duration. Please note that the Te values are treated as fixed, known activity durations (just as in CPM), and the critical path is found in the traditional CPM way, using forward and backward passes to calculate the total float of each activity. The critical path is shown below in red.

flowchart

The Critical Path Duration, T = A + E + H + I + J = 6 + 3 + 4 + 2 + 2 = 17 Weeks

Unfortunately, this PERT project duration, found by adding the critical activities, also enjoys a mere 50% chance of on-time completion. According to the Central Limit Theorem (CLT), the project completion time tends to follow an approximately normal distribution, regardless of the distribution shapes of the critical activities, provided there is a sufficiently large number of activities (say, more than 30) on the critical path. Hence, our problem is still not solved, as the PERT-based project completion time is nothing but a glorified CPM-based completion time.

Going to Monte Carlo

This is where simulation techniques, such as Monte Carlo, come in handy. We can use simulation to estimate various project completion times along with their probability of completion so that we can plan contingency reserves (CR) to ensure at least a 90-95% probability of completion (as opposed to 50% by CPM or PERT methods) during the risk management planning stage itself.

In Monte Carlo simulation, the durations of critical path activities are allowed to take on random values between their low and high limits, depending on the distributions assumed, using a random number generator, until the specified number of simulations—say, 5,000—is exhausted. For each simulation, a project completion time is calculated and stored. When all 5,000 simulations are done, we have 5,000 simulated completion times from which the probability of meeting any target duration can be estimated.
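To make the idea concrete, here is a rough, purely illustrative Monte Carlo sketch in Python. It is not the Devize analysis: the triangular distributions and three-point estimates below are my own assumptions (chosen only to be roughly consistent with the 17-week critical path), so the probabilities it produces will not match the outputs shown next.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_sims = 5000

# (optimistic, most likely, pessimistic) durations in weeks for the five
# critical path activities A, E, H, I, J -- made-up three-point estimates
critical_activities = [(4, 6, 9), (2, 3, 5), (3, 4, 6), (1, 2, 4), (1, 2, 3)]

# Simulate each activity from a triangular distribution and add them up
totals = sum(rng.triangular(lo, ml, hi, size=n_sims)
             for lo, ml, hi in critical_activities)

target = 17   # single-point critical path duration in weeks
print(f"P(complete within {target} weeks) = {np.mean(totals <= target):.1%}")
print(f"duration with ~85% confidence:   {np.percentile(totals, 85):.1f} weeks")
```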

Monte Carlo Simulation outputs (with 5,000 simulations using the software Devize® from Minitab) for various target project completion times are given below. These simulated outputs help determine the Contingency Reserves (CR) needed in terms of the project completion time for better planning and completion assurance to clients.

Devize output
The 5,000-simulation output above predicts that the single-point Critical Path duration of 17 weeks has only a 25.9% chance of completion, or a 74.1% chance of failure (exceeding the estimated duration).

The simulation shown below estimates the probability of completing the project ahead of schedule by 1 month, possibly by fast-tracking. It shows that the chances of completing the project in 16 weeks (as opposed to the baseline duration of 17 weeks from CPM) are only 13.14%. Such predictions are very helpful to project managers in effective planning and deployment of project resources.

Devize output
If the client wants to know or predict the project completion duration that has at least an 85% chance of success, we can easily do that using simulations performed in Devize. In the output below, we can see that the target completion duration of 21 weeks (USL = 21 weeks) has an 86.58% chance of being completed on time.

monte carlo simulation software output
If the project manager wants to submit a completion time that has at least an 85% chance of completion with all the duration combinations of the critical path activities taken into account, it will be wiser to commit to a completion time of 21 weeks, as opposed to the contractual completion time of 16 weeks, which had only a 13.14% chance of success.

Monte Carlo Simulation for Project Managers

Monte Carlo simulation is a boon to project managers in general—and to risk managers in particular—for simulating various possible combinations of the predictor variables within their range of values. Project managers can use Monte Carlo simulations to make more informed decisions and, as a result, complete more projects within the agreed time. Software packages such as Devize make the analysis simpler and more intuitive, which in turn makes it easier for us to mitigate the overall project schedule risks to an acceptable threshold.

 

References 

1. An Introduction to Management Science: Quantitative Approaches to Decision Making, by Anderson et al.

2. The PMBOK® Guide, 5th edition, Project Management Institute (PMI).

3. Devize®, Simulation and Optimization software from Minitab® Inc.

4. PMI’s Pulse of the Profession™ -The High Cost of Low Performance. 2013.

5. PMI-KPMG Study on Project Schedule and Cost Overruns - Expedite Infrastructure Projects. 2013.

 

About the Guest Blogger:

The author, Ondiappan Arivazhagan, "Ari", is an Honors graduate in Civil/Structural Engineering from the University of Madras. He is a certified PMP, PMI-SP, and PMI-RMP from PMI, USA. He is also a Master Black Belt in Lean Six Sigma and has studied Business Analytics at IIM Bangalore. He has 30 years of global project management experience in various countries around the world and almost 14 years of teaching/training experience in project management, analytics, risk management, and Lean Six Sigma. He is the Founder-CEO of the International Institute of Project Management (IIPM), Chennai, and can be reached at askari@iipmchennai.com.

An earlier version of the article appeared on LinkedIn. 

How Could You Benefit from a Box-Cox Transformation?


Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small.

Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that for longer racing times a small difference in speed will have a significant impact on completion times, whereas for the fastest runners, small differences in speed will have a small (but decisive) impact on arrival times.

This phenomenon is called “heteroscedasticity” (non-constant variance). In this example, the amount of variation depends on the average value (small variations for shorter completion times, large variations for longer times).

This distribution of running times data will probably not follow the familiar bell-shaped curve (a.k.a. the normal distribution). The resulting distribution will be asymmetrical with a longer tail on the right side. This is because there's small variability on the left side with a short tail for smaller running times, and larger variability for longer running times on the right side, hence the longer tail.

Why does this matter?

  • Model bias and spurious interactions: If you are performing a regression or a design of experiments (any statistical modelling), this asymmetrical behavior may introduce bias into the model. Because the variability is much larger when the average running time is larger, many factors will seem to have a stronger effect when the mean is larger. This is not due to a true factor effect, however, but to the increased amount of variability that inflates all factor effect estimates when the mean gets larger. It will also tend to generate spurious interactions, resulting in a very complex model with many unrealistic interaction terms.
  • Biased capability estimates: A standard capability analysis is based on the normality assumption, so a substantial departure from normality will bias your capability estimates.
The Box-Cox Transformation

One solution to this is to transform your data into normality using a Box-Cox transformation. Minitab will select the best mathematical function for this data transformation. The objective is to obtain a normal distribution of the transformed data (after transformation) and a constant variance.

Consider the asymmetrical distribution below:

                  

If a logarithmic transformation is applied to this distribution, the differences between smaller values will be expanded (because the slope of the logarithmic function is steeper when values are small), whereas the differences between larger values will be reduced (because of the very moderate slope of the log function for larger values). If you inflate differences in the left tail and reduce differences in the right tail, the result is a symmetrical, normal distribution with a variance that is now constant (whatever the mean). This is why the Minitab Assistant suggests a Box-Cox transformation whenever it is possible for non-normal data, and why in the Minitab regression and DOE (design of experiments) dialog boxes the Box-Cox transformation is an option anyone may consider when residuals need to be transformed toward normality.

The diagram above illustrates how, thanks to a Box-Cox transformation, performed by the Minitab Assistant (in a capability analysis), an asymmetrical distribution has been transformed into a normal symmetrical distribution (with a successful normality test).
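If you’d like to experiment with the idea outside of Minitab, here is a minimal sketch using scipy’s Box-Cox routine. It is only an illustration: the skewed data is simulated, and the normality check below is Shapiro-Wilk rather than the Anderson-Darling test Minitab reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=1.0, sigma=0.6, size=200)   # right-skewed, positive data

transformed, lam = stats.boxcox(skewed)   # scipy searches for the best lambda
print(f"selected lambda: {lam:.2f}")

# Normality check before and after the transformation
_, p_before = stats.shapiro(skewed)
_, p_after = stats.shapiro(transformed)
print(f"p-value before: {p_before:.3f}, after: {p_after:.3f}")
```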

Box-Cox Transformation and Variable Scale

Note that Minitab will search for the best transformation function, which may not necessarily be a logarithmic transformation.

As a result of this transformation, the physical scale of your variable may be altered. When looking at a capability graph, you may not recognize the typical values of your variable on the transformed scale. However, the estimated Ppk and Pp capability indices will be reliable and based on a normal distribution. Similarly, in a regression model, you need to be aware that the coefficients will be modified, although the transformation remains useful for removing spurious interactions and identifying the factors that are really significant.

Identifying the Distribution of Your Data


To choose the right statistical analysis, you need to know the distribution of your data. Suppose you want to assess the capability of your process. If you conduct an analysis that assumes the data follow a normal distribution when, in fact, the data are nonnormal, your results will be inaccurate. To avoid this costly error, you must determine the distribution of your data.

So, how do you determine the distribution? Minitab’s Individual Distribution Identification is a simple way to find the distribution of your data so you can choose the appropriate statistical analysis. You can use it to:

  • Determine whether a distribution you used previously is still valid for the current data
  • Choose the right distribution when you’re not sure which distribution to use
  • Transform your data to follow a normal distribution

Let's take a closer look at three ways you can use the Individual Distribution Identification tool in our statistical software.

Confirm a Certain Distribution Fits Your Data

In most cases, your process knowledge helps you identify the distribution of your data. In these situations, you can use Minitab’s Individual Distribution Identification to confirm the known distribution fits the current data.

Suppose you want to perform a capability analysis to ensure that the weight of ice cream containers from your production line meets specifications. In the past, ice cream container weights have been normally distributed, but you want to confirm normality. Here’s how you use Individual Distribution Identification to quickly assess the fit.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze and the distribution to check it against.
  3. Click OK.

Probability Plot for Weight

A given distribution is a good fit if:

  • The data points roughly follow a straight line
  • The p-value is greater than 0.05

In this case, the ice cream weight data appear to follow a normal distribution, so you can justify using normal capability analysis.
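For readers who want a rough equivalent of this check outside of Minitab, here is an illustrative sketch; the ice cream weights are simulated stand-in data, and the Anderson-Darling comparison below is not the same p-value calculation Minitab reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
weights = rng.normal(loc=16.0, scale=0.2, size=50)   # stand-in ice cream weights

result = stats.anderson(weights, dist='norm')
print("AD statistic:", round(result.statistic, 3))
print("5% critical value:", result.critical_values[2])
# If the statistic is below the critical value, normality is not rejected
```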

Determine Which Distribution Best Fits Your Data

Perhaps you have successfully used more than one distribution in the past. You can use Individual Distribution Identification to help you decide which distribution best fits your current data. For example, you want to assess whether a particular weld strength meets customers’ requirements, but several distributions have been used to model this data historically. Here’s how you use Individual Distribution Identification to choose the distribution that best fits your current data.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze and the distributions to check it against.
  3. Click OK.

Determine Which Distribution Best Fits Your Data

Choose the distribution with data points that roughly follow a straight line and the highest p-value. In this case, the Weibull distribution fits the data best.
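One simple way to compare candidate fits outside of Minitab is to compare the log-likelihoods of fitted distributions. This is not how Minitab ranks fits (it uses Anderson-Darling statistics and p-values), and the weld-strength data below is simulated, but the sketch shows the general idea:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
strength = rng.weibull(2.0, size=100) * 50   # made-up weld strength data

for dist in (stats.norm, stats.weibull_min, stats.lognorm):
    params = dist.fit(strength)                      # maximum likelihood fit
    loglik = dist.logpdf(strength, *params).sum()    # goodness of the fit
    print(f"{dist.name:12s} log-likelihood = {loglik:.1f}")
# The largest log-likelihood points to the best-fitting candidate
```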

Note

When you fit your data with both a 2-parameter distribution and its 3-parameter counterpart, the latter often appears to be a better fit. However, you should use a 3-parameter distribution only if it is significantly better. See Minitab Help for information about choosing between a 2-parameter distribution and a 3-parameter distribution.

Use a Normal Statistical Analysis on Nonnormal Data

While Minitab offers various options for analysis of nonnormal data, many users prefer to use the broader palette of normal statistical analyses. Minitab’s Individual Distribution Identification can transform your nonnormal data using the Box-Cox method so that it follows a normal distribution. You can then use the transformed data with any analysis that assumes the data follow a normal distribution.

  1. Choose Stat > Quality Tools > Individual Distribution Identification.
  2. Specify the column of data to analyze.
  3. From the Distribution drop-down menu in the main dialog, choose Box-Cox transformation, and select any other distributions to compare it with.
  4. Click OK in each dialog box.

USE A NORMAL STATISTICAL ANALYSIS ON NONNORMAL DATA

For the transformed data, check for data points that roughly follow a straight line and a p-value greater than 0.05.

In this case, the probability plot and p-value suggest the transformed data follow a normal distribution. You can now use the transformed data for further analysis.

Note

Data transformations will not always produce normal data. You must check the probability plot and p-value to assess whether the normal distribution fits the transformed data well.

Putting Individual Distribution Identification to Use

It is always good practice to know the distribution of your data before choosing a statistical analysis. Minitab’s Individual Distribution Identification is an easy-to-use tool that can help you identify the distribution of your data as well as eliminate errors and wasted time that result from an inappropriate analysis.

You can use this feature to check the fit of a single distribution, or to compare the fits of several distributions and select the one that best fits your data. If you prefer to work with normal data, you can even use Minitab’s Individual Distribution Identification to transform your nonnormal data to see if they follow a normal distribution.

3 New Things You Can Do by Right-Clicking in Minitab 17.2


Minitab 17.2 is available. You can check out all of the new stuff on the What’s New Page, but I would say that a little demonstration is in order. Here are some new shortcuts that make arranging your data easier in Minitab 17.2.

Sorting

Let’s suppose that you had copied some Human Development Index data into Minitab and wanted to sort countries in the order of their rank. In the past, you would have used the Data menu, identified all the columns that you wanted to sort, selected the column that you wanted to sort by, and then selected where to store the sorted columns. In Minitab 17.2, you can do all those steps faster with a right-click:

  1. Select a cell in HDI Rank.
  2. Right-click and choose Sort Columns > Entire Worksheet > Smallest to Largest.

Sort with a right-click

That’s it, you’re done! If your column is text, you can select whether to sort alphabetically or reverse alphabetically instead of by increasing or decreasing numeric values.

Changing data format

Especially when you copy and paste data, you can find that numeric columns are defined as text. For example, I copied the HDI data from the Excel file that the United Nations Development Programme provides. After pasting the data into Minitab, I deleted the rows at the top that weren't data. Because the first piece of data in the HDI rank column was “Very high human development,” the entire column is still defined as text, even though I wanted the rank to be a number:

Paste from Excel

In the past, you would have used the Data menu to change the data type. In Minitab 17.2, you can change the data type faster by right-clicking. Here’s how:

  1. Select a cell in HDI Rank
  2. Right-click and choose Format Column.
  3. Select Automatic numeric and click OK.

Change the data format from text to numeric

That’s it, you’re done! You can change columns to 9 different formats this way.  

Highlighting

For simple comparisons, it can be just as easy to look at the data in the worksheet as it is to create a graph. Let’s say that you wanted to know if the same 10 countries with the highest HDI Index scores in 2013 had the highest scores in 2012. You can use conditional formatting to quickly highlight the cells of interest to compare two columns:

  1. Select cells in the columns that you want to format.
  2. Right-click and choose Conditional Formatting > High/Low > Highest Values.
  3. Click OK.

Select the columns you want to conditionally format.

Right-click and select Conditional Formatting > High/Low > Highest Values

Visually compare the two columns.

With the conditional formatting, you can quickly see that Singapore moved into the top 10, making Ireland number 11. For another example, see how Cheryl Pammer uses conditional formatting to identify the shortest wait times for rides at Disney World.

Wrap-up

I’ve said it before and I’ll say it again: the only thing better than doing fearless data analysis is doing fearless data analysis even faster. The new ways that you can use Minitab’s worksheet are meant to help you get the information that you need with as little time and effort as possible. That’s always a good thing.

Bonus

In Minitab’s support center, you can find all sorts of examples about ways that you can conditionally format cells. Check out the list in the Conditional Formatting section in Data and Data Manipulation.

Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels


In this series of posts, I show how hypothesis tests and confidence intervals work by focusing on concepts and graphs rather than equations and numbers.  

Previously, I used graphs to show what statistical significance really means. In this post, I’ll explain both confidence intervals and confidence levels, and how they’re closely related to P values and significance levels.

How to Correctly Interpret Confidence Intervals and Confidence Levels

Illustration of confidence levels

A confidence interval is a range of values that is likely to contain an unknown population parameter. If you draw a random sample many times, a certain percentage of the confidence intervals will contain the population mean. This percentage is the confidence level.

Most frequently, you’ll use confidence intervals to bound the mean or standard deviation, but you can also obtain them for regression coefficients, proportions, rates of occurrence (Poisson), and for the differences between populations.

Just as there is a common misconception of how to interpret P values, there’s a common misconception of how to interpret confidence intervals. In this case, the confidence level is not the probability that a specific confidence interval contains the population parameter.

The confidence level represents the theoretical ability of the analysis to produce accurate intervals if you are able to assess many intervals and you know the value of the population parameter. For a specific confidence interval from one study, the interval either contains the population value or it does not—there’s no room for probabilities other than 0 or 1. And you can't choose between these two possibilities because you don’t know the value of the population parameter.

"The parameter is an unknown constant and no probability statement concerning its value may be made." 
—Jerzy Neyman, original developer of confidence intervals.

This will be easier to understand after we discuss the graph below . . .

With this in mind, how do you interpret confidence intervals?

Confidence intervals serve as good estimates of the population parameter because the procedure tends to produce intervals that contain the parameter. A confidence interval is composed of the point estimate (the most likely value) and a margin of error around that point estimate. The margin of error indicates the amount of uncertainty that surrounds the sample estimate of the population parameter.

In this vein, you can use confidence intervals to assess the precision of the sample estimate. For a specific variable, a narrower confidence interval [90 110] suggests a more precise estimate of the population parameter than a wider confidence interval [50 150].

Confidence Intervals and the Margin of Error

Let’s move on to see how confidence intervals account for that margin of error. To do this, we’ll use the same tools that we’ve been using to understand hypothesis tests. I’ll create a sampling distribution using probability distribution plots, the t-distribution, and the variability in our data. We'll base our confidence interval on the energy cost data set that we've been using.

When we looked at significance levels, the graphs displayed a sampling distribution centered on the null hypothesis value, and the outer 5% of the distribution was shaded. For confidence intervals, we need to shift the sampling distribution so that it is centered on the sample mean and shade the middle 95%.

Probability distribution plot that illustrates how a confidence interval works

The shaded area shows the range of sample means that you’d obtain 95% of the time using our sample mean as the point estimate of the population mean. This range [267 394] is our 95% confidence interval.

Using the graph, it’s easier to understand how a specific confidence interval represents the margin of error, or the amount of uncertainty, around the point estimate. The sample mean is the most likely value for the population mean given the information that we have. However, the graph shows it would not be unusual at all for other random samples drawn from the same population to obtain different sample means within the shaded area. These other likely sample means all suggest different values for the population mean. Hence, the interval represents the inherent uncertainty that comes with using sample data.

You can use these graphs to calculate probabilities for specific values. However, notice that you can’t place the population mean on the graph because that value is unknown. Consequently, you can’t calculate probabilities for the population mean, just as Neyman said!

Why P Values and Confidence Intervals Always Agree About Statistical Significance

You can use either P values or confidence intervals to determine whether your results are statistically significant. If a hypothesis test produces both, these results will agree.

The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05, the corresponding confidence level is 95%.

  • If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.
  • If the confidence interval does not contain the null hypothesis value, the results are statistically significant.
  • If the P value is less than alpha, the confidence interval will not contain the null hypothesis value.

For our example, the P value (0.031) is less than the significance level (0.05), which indicates that our results are statistically significant. Similarly, our 95% confidence interval [267 394] does not include the null hypothesis mean of 260 and we draw the same conclusion.

To understand why the results always agree, let’s recall how both the significance level and confidence level work.

  • The significance level defines the distance the sample mean must be from the null hypothesis to be considered statistically significant.
  • The confidence level defines the distance between the confidence limits and the sample mean.

Both the significance level and the confidence level define a distance from a limit to a mean. Guess what? The distances in both cases are exactly the same!

The distance equals the critical t-value * standard error of the mean. For our energy cost example data, the distance works out to be $63.57.
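That distance is easy to reproduce from the output values (a tiny sketch, assuming the same critical t-value and SE Mean as before):

```python
from scipy import stats

distance = stats.t.ppf(0.975, df=24) * 30.8   # critical t-value * SE Mean
print(round(distance, 2))                     # about 63.57
```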

Imagine this discussion between the null hypothesis mean and the sample mean:

Null hypothesis mean, hypothesis test representative: Hey buddy! I’ve found that you’re statistically significant because you’re more than $63.57 away from me!

Sample mean, confidence interval representative: Actually, I’m significant because you’re more than $63.57 away from me!

Very agreeable aren’t they? And, they always will agree as long as you compare the correct pairs of P values and confidence intervals. If you compare the incorrect pair, you can get conflicting results, as shown by common mistake #1 in this post.

Closing Thoughts

In statistical analyses, there tends to be a greater focus on P values and simply detecting a significant effect or difference. However, a statistically significant effect is not necessarily meaningful in the real world. For instance, the effect might be too small to be of any practical value.

It’s important to pay attention to both the magnitude and the precision of the estimated effect. That’s why I'm rather fond of confidence intervals. They allow you to assess these important characteristics along with the statistical significance. You'd like to see a narrow confidence interval where the entire range represents an effect that is meaningful in the real world.

For more about confidence intervals, read my post where I compare them to tolerance intervals and prediction intervals.

If you'd like to see how I made the probability distribution plot, please read: How to Create a Graphical Version of the 1-sample t-Test.

Wave a Magic Wand Over Your DOE Analyses


As a member of Minitab's Technical Support team, I get the opportunity to work with many people using DOE (Design of Experiments).

People often will call after they've already chosen their design, run the experiment, and identified the important factors in their process. But now what? They have to find the best settings, but with several factors and responses, what should they do?

"I wish I had a magic wand," they sometimes say. If you do too, you’ve come to the right place.

I will show you how to use the Response Optimizer to let Minitab Statistical Software work its magic to find the best settings for your response(s). 

Suppose your factors are Time, Temp, and Catalyst. Your responses are Yield (you want to maximize it) and Cost (which you want to minimize).

Let's also suppose you've used Stat > DOE > Factorial > Analyze Factorial Design to determine the significant factors and verify the assumptions of the models. 

You must analyze your design(s) before Response Optimizer will be available because it is based on the statistical model(s) created in the analysis. If you want to view the current design for a response, just rest the mouse on the check mark in the header of the worksheet:

Your final models are:

Yield = 39.48 - 0.1026 Time + 0.01502 Temp + 0.001150 Time*Temp

Cost = 15.96 + 0.0818 Time + 0.07917 Temp - 1.159 Catalyst - 0.000128 Time*Temp + 0.01913 Time*Catalyst + 0.01389 Temp*Catalyst

Now what?

All you have to do is simply choose Stat > DOE > Factorial > Response Optimizer and specify the goal for each response:

Optimizing a Response

Click OK in the Response Optimizer dialog box, and POOF!

Just like magic, Minitab identifies the combination of input variable settings that jointly optimize both responses. (The factor settings in red produced the response values in blue.)

Optimization Plot for Designed Experiment

That’s it! Set Time at 50 and Temperature at 200, and Cost is minimized (32.076) while Yield is maximized (48.852).
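As a quick sanity check, you can plug the optimal settings back into the fitted Yield equation shown earlier; the tiny difference from 48.852 comes from the rounded coefficients, and the Cost equation would also need the Catalyst setting, which isn’t shown in this dialog.

```python
def predicted_yield(time, temp):
    """Fitted Yield equation from the analysis above."""
    return 39.48 - 0.1026 * time + 0.01502 * temp + 0.001150 * time * temp

print(round(predicted_yield(50, 200), 3))   # about 48.85
```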

Click Predict in the upper left corner to see confidence and prediction intervals for the fit(s) in the Session window:

Prediction for Cost, Yield

And there's more! Want to predict for other factor levels? No problem!  There's no need to “plug into” the equations and calculate the fitted values. Just use one or both of these methods on the Response Optimizer graph:

  1. Drag the red lines. Yes, they are interactive!
  2. Click on a red value and type in a different value.

While working with the Response Optimizer graph, you also could save settings, show a list of settings you’ve saved, and go back to the original settings Minitab found to be optimal using the Optimizer toolbar. If you do not see the toolbar, enable it by choosing Tools > Toolbars > Response Optimizer. Rest your mouse on each button to see a description of each tool.

You can take advantage of the Response Optimizer’s magic whenever you're analyzing Factorial, Response Surface, and Mixture designs.  

Getting the Most Out of Your Text Data, Part 1


With Minitab, it’s easy to create graphs and manage numeric, date/time and text data.  Now Minitab 17.2’s enhanced data manipulation features make it even easier to work with text data.

handling text data is easy

This is the first of three posts in which I'm going to focus on various tools in Minitab that are useful when working with text data, including the Calculator, the Data menu, and the Editor menu.

Using the Calculator

You might be surprised to hear that Minitab’s Calculator is just as useful with text as it is with numbers. Here are just a few things you can use it for:

ISOLATE CRITICAL INFORMATION

Sometimes it’s helpful to extract individual words or characters from text data for use in isolation. For example, if we have a column of product IDs and need just some of the letters or numbers that are part of the text string, the LEFT, MID and RIGHT functions in the calculator can be very useful.

The LEFT function extracts values from a text string beginning with the leftmost character and stops after the number of characters we specify. In the example above, we could complete the Calc > Calculator dialog box to pull out the two characters on the left side (AB or BC) as shown below:

The RIGHT function works in exactly the same way as the LEFT function, except that RIGHT extracts characters beginning with the rightmost.  Here we’re pulling out the 4 characters beginning from the right side:

Similarly, we can use the MID function in the calculator to extract the number of characters we want from the middle of a text string. With the MID function, we enter the text column plus the position of the first character we want to extract, then the number of characters we want to extract.  In this example we want to extract the 2 characters between the hyphens.  In that case the first character we want is the fourth, so we’d complete the Calculator dialog box like this:

COMBINE DATA FOR ADDED MEANING

In other cases, the whole can be greater than the sum of its parts. For example, if values for Month, Day and Year are stored in separate columns, we may want to combine these into a single column:

The month, day and year of each observation was originally recorded in separate columns, which complicates graphing. Fortunately, the calculator can be used to combine the three columns into a single column. To do that, we can use the CONCATENATE function:

The empty space between the double quotes will add a space between the Month and Day, and the comma plus the empty space will add a comma after the Day and a space before the year:

 

REPLACE INCORRECT PORTIONS OF TEXT DATA

A consistent recording error doesn’t have to result in time-consuming hand-corrections.  In this example an operator who handles product returns has noted the incorrect year portion of the catalog code that is used to reference the item:

The calculator’s SUBSTITUTE function can be used to replace the incorrect Spring 13 with Spring 14. The calculator will find the text and replace it with the new text that we specify:
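If you like to prototype this kind of text logic outside of Minitab, the same operations look like this in ordinary Python string handling. The product ID, dates, and catalog code below are made up, and note that the calculator’s positions are 1-based while Python slices are 0-based:

```python
product_id = "AB-12-3456"          # hypothetical product ID

left_2 = product_id[:2]            # LEFT(text, 2)   -> "AB"
right_4 = product_id[-4:]          # RIGHT(text, 4)  -> "3456"
mid_2 = product_id[3:5]            # MID(text, 4, 2) -> "12" (start at the 4th character)

date = "May" + " " + "12" + ", " + "2015"                     # CONCATENATE -> "May 12, 2015"
fixed = "Spring 13-A100".replace("Spring 13", "Spring 14")    # SUBSTITUTE

print(left_2, right_4, mid_2, date, fixed)
```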

These are just a few of the useful functions included in Minitab’s Calculator. To see a complete list of Minitab’s calculator functions with explanations and examples of how to use each, open Minitab and go to Help > Help.

In the Minitab Help window, choose Calc Menu; from the Calc Menu page, choose Calculator; and then choose the Calculator Functions link:

If you’re already using the Calculator in Minitab, the easiest way to access the same information is to click the Help button in the lower-left corner of the Calculator.

                                                                                                                                               

In my next post, we’ll explore some of the new text data manipulation features that Minitab 17.2 offers in the Data menu.


Getting the Most Out of Your Text Data, Part 2


My previous post focused on manipulating text data using Minitab’s calculator.

In this post we continue to explore some of the useful tools for working with text data, and here we’ll focus on Minitab 17.2’s Data menu. This is the second in a 3-part series, and in the final post we’ll look at the new features in Minitab 17.2’s Editor menu.

Using the Data Menu

When I think of the Data menu, I think manipulation—the data menu in Minitab is used to manipulate the data in the worksheet. This menu is useful for both text and numeric data.

Let's focus on two features from the Data menu in Minitab 17.2: Code and Conditional Formatting.  In Minitab’s newest version, the Code command has been updated for ease of use and to provide more control over the results. The Conditional Formatting option is entirely new, and offers various options for highlighting and formatting worksheet data.

Using the Code Command

When working with text data, it is sometimes useful to reduce the number of categories by combining some of the categories into fewer groups.  Consider the example below, where we have low or 0 counts for some of the citrus fruits:

Rather than generating a bar chart with no bar for Grapefruit, we could combine some of the citrus fruits into a single Citrus category instead of listing them separately.  The Code command in Minitab’s Data menu can help:

By coding our existing text values for Grapefruit, Oranges and Lemons to a new Text category called Citrus, we can reduce the number of categories.  To do that in Minitab 17.2, we enter the original column listing the types of fruit in the first field.  Then we change the values under Coded value from the current values (Lemons, Oranges, and Grapefruit) to their new value Citrus:

As a final step, we can tell Minitab where we’d like to store the coded results by using the Storage location for the coded columns drop-down list:

For this example, we’ll just keep the default and store the results at the end of the current worksheet and click OK.  The coded results can easily be used to create a new bar chart that shows only the Citrus category instead of the individual fruits:

Using Conditional Formatting

This is a completely new feature in Minitab 17.2, which came about as the result of many requests from users who wanted the ability to control the appearance of the data in the worksheet.  Because there are many options available in the new menu, we’ll just focus on two options as examples. The other options behave in a similar way, so a basic understanding of these two examples should be helpful in applying the other options in Conditional Formatting.

Often, raw text data is used to create Pareto charts to see which defects occur most frequently. But what if we want to highlight the most frequently occurring defect in the worksheet? This is now possible in Minitab 17.2: we use Data > Conditional Formatting > Pareto > Most Frequent Values:

We enter our column listing the defects in the first field, and in the second field we can tell Minitab the number of unique items to highlight. For example, if we want to highlight the two most frequently listed defects, we enter 2. In this example, we only want to highlight the most common defect so we enter 1. Finally, we can tell Minitab what color we’d like to apply to the cells that meet our condition, and then we click OK to see the highlighted cells in the worksheet:

In some situations, it may be useful to highlight specific values in a text column. In fact, we may want multiple colors in a single column, each representing a specific category. For this situation we can use Conditional Formatting > Highlight Cell > Text that Contains:

With this option, we can tell Minitab the color we want for a cell that contains the text that we type into the Format cells that contain field:

First we enter the column with the data in the first field, type the text we want to highlight in the second field (NOTE: this is case-sensitive), and then choose the color from the Style drop-down list. We can repeat this process if we want to apply multiple colors to a single column. In this example, the Low values will be highlighted in green, the Medium values in yellow, and the High values in red:

Finally, after applying conditional formatting to our worksheet, we’ll need an easy way to see all the rules we’ve applied and the ability to remove or change those rules.  The Conditional Formatting menu’s Manage Rules option can make the magic happen:

The rules for each column are listed separately, so we choose a column to see the rules that have been applied by selecting the column from the drop-down list at the top:

The rules applied to the selected column are listed under Rules.  We can remove a specific rule by clicking on a rule in the Rules list, and then clicking the button with the red X.

We can also use the Format button to change the formatting of a specific rule.  For example, I may want to change the rule for Medium from Yellow to my favorite color:

It’s much nicer in hot pink, wouldn’t you agree?

In my final post in this series, we’ll look at the new features that ease the pain of manipulating text data using the Editor menu in Minitab 17.2.

Getting the Most Out of Your Text Data, Part 3


The two previous posts in this series focused on manipulating data using Minitab’s calculator and the Data menu.

text data manipulation

In this third and final post, we continue to explore helpful features for working with text data and will focus on some new features in Minitab 17.2’s Editor menu.

Using the Editor Menu 

The Editor menu is unique in that the options displayed depend on what is currently active (worksheet, graph, or session window).  Here we’ll focus on some of the new options available when a worksheet is active. While some options are the same, the new Editor menu looks different than it did in previous versions of Minitab:

There is also some duplication between the updated Data menu that was the focus of my previous post and the updated Editor menu:  both menus provide the option for Conditional Formatting.  The same conditional formatting options can now be accessed via either the Data or the Editor menus.

Let's consider some examples using features from Find and Replace (Find/Replace Formatted Cell Value), Cell Properties (Comment, Highlight & Custom Formats), Column Properties (Value Order) and Subset Worksheet (Custom Subset).

Find and Replace

This section of the Editor menu includes options for Find Format and Replace Formatted Cell Value—either selection will display the Find Format and Replace Value dialog box:

We can toggle between the two options by using the Find and the Replace tabs at the top.

Both of these options could be useful when making changes to a worksheet that has been formatted using the new conditional formatting options discussed in the previous post in this series.

For example, if we’ve applied conditional formatting to a worksheet to highlight cells with the value ‘Low’ in green, we could use the Replace tab to find all the cells that are green and replace the values in those cells with new values. Here, we replace the ‘Low’ values that are marked in green with the new value ‘Insignificant’:

Cell Properties

The new Cell Properties option in the Editor menu provides options for adding a Comment to the selected cell, to Highlight specific cells in the worksheet, and to create Custom Formats.  These same options can be accessed by right-clicking on a cell and choosing Cell Properties:

The ability to add a comment to a specific cell is new. In previous versions of Minitab it was possible to add a comment to a worksheet or column only.  Now we can select the Comment option to add a comment to a cell:

Notice that the top of the window confirms where the comment will be added. In the example above, it will be in C3 in row 6.

Similar to conditional formatting, we can use the Highlight options to highlight only the selected cell or cells:

Finally, the Custom Formats option allows us flexibility in terms of the fill color, font color, and style for the selected cell or cells:

Column Properties

This option in the Editor menu allows us to control the order of text strings in a column.  For example, if I create a bar chart using Graph > Bar Chart > Counts of Unique Values using the data in column 1 below, the default output is in alphabetical order:

In some cases, it would be more intuitive to display the bars beginning with Low, then Medium, then High. That is where the Editor menu can help.

First, we click in any cell in column 1 so that the column we want to modify is active, then we select Editor > Column Properties > Value Order:

To change the alphabetical order default, we select the radio button next to User-specified order, and then edit the order under Define an order and click OK.  Now the default order will be Low, Medium, High, and we can update our bar chart to reflect that change:

                                                                                                                                               

Subset Worksheet

One of the best new enhancements to the Editor menu gives us the ability to quickly and easily create a subset of a worksheet without having to manually type a formula into the calculator.

For example, we may want to create a new worksheet that excludes items that are marked as Low priority.  To do that, we can use Editor > Subset Worksheet > Custom Subset:

In this example, we’re telling Minitab that we want to use a condition when we subset: we want to exclude rows that match our condition. Our condition is based on the column Priority. When we enter that text column, Minitab automatically shows all the unique values in that column. We select Low as the value we want to exclude from the new worksheet, and then click OK. It’s that simple: there’s no need to guess whether to type single or double quotes in the subset condition!

I hope this series of posts on working with text data has been useful.  If you have an older version of Minitab and would like to use the new features described in these posts, you can download and install the free 30-day trial and check it out!

How Could You Benefit from the Cpm Capability Index?


The Cp and Cpk are well known capability indices commonly used to ensure that a process spread is as small as possible compared to the tolerance interval (Cp), or that it stays well within specifications (Cpk).

Yet another type of capability index exists: the Cpm, which is much less known and used less frequently. The main difference between the Cpm and the other capability indices is that the bias from the target is directly taken into consideration in the Cpm. The bias is the difference between the process average and the target.

In Minitab, as soon as a target is entered into a capability analysis, the Cpm is calculated automatically. To obtain a good Cpm, a process needs to be right on target, whereas to obtain a satisfactory Cpk value, a process simply needs to stay well within specifications. These might seem to lead to the same conclusion; however, there is a major difference: with the Cpm, a direct penalty is incurred for not being exactly on target.

where USL is the upper specification limit, LSL is the lower specification limit, and S is the standard deviation.

Notice that in the Cpm formula, above, bias is an input. The larger the bias, the smaller the Cpm value.
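For reference, here is a sketch of the commonly cited Taguchi form of the Cpm; Minitab’s exact implementation is documented under Help > Methods and Formulas, and the data and limits below are hypothetical.

```python
import numpy as np

def cpm(data, lsl, usl, target):
    """Taguchi Cpm: penalizes spread around the target, not just around the mean."""
    bias = np.mean(data) - target
    tau = np.sqrt(np.var(data, ddof=1) + bias**2)   # spread measured from the target
    return (usl - lsl) / (6 * tau)

rng = np.random.default_rng(5)
parts = rng.normal(loc=10.3, scale=0.12, size=100)   # tight spread, but about 0.3 off target
print(round(cpm(parts, lsl=9.0, usl=11.0, target=10.0), 2))   # lower than the Cpk would suggest
```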

The Mismatch Issue

Why should we use the Cpm rather than the other capability indices? Because the Cpm is much more efficient at avoiding component mismatches.

Parts are often assembled together to manufacture a given product. In some cases, however, even though the Cpks of the individual parts are excellent, the resulting assembled system may be out of spec and suboptimal.

How could this happen? Suppose that two parts (a type 1 part and a type 2 part) need to be inserted together. In the capability analyses below, the Ppks (which are the long-term version of the Cpks) are excellent in both cases (the Ppks are larger than 2.2 and the parts are well within tolerances), but the Cpms are low (below 1) due to a significant shift from the targets. In the graph below, both type 1 parts and type 2 parts are generally larger than they should be (average above target). As a consequence, the two parts will often not fit together: it becomes difficult to insert a type 1 part into a type 2 part as the biases add up, thereby generating mismatches.

Keeping biases as small as possible is, therefore, essential. In complex systems, many parts or parameters interact together and should not be considered separately. As advanced products get more and more complex, with many parts subcontracted to external suppliers, matching all subsystems and parts and considering the performance of the system as a whole, rather than the quality of single parts in isolation from one another, becomes crucial.

In this context, the Cpm is clearly more efficient than other capability indices.

 

Consider the graph above, which displays the values for the resulting assemblies of the two parts (Part 1 and Part 2) combined. The Ppk is poor (below 0.4), as expected, with a large proportion of out-of-spec assembled products. The biases of the two individual parts were rather small initially, but they add up, so that the resulting bias for the final assembly becomes relatively large.

If the bias is positive for type 1 parts and negative for type 2 parts, matching all parts together will certainly become easier, as the biases will compensate for one another. But in many cases, unfortunately, biases will add up. Only two parts have been considered in this very simple system; however, in the real world, products often contain many critical parts that are combined together.

Using Cpm for Asymmetrical Targets

Another advantage of the Cpm applies when the target is asymmetrical (in other words, when it is not halfway between the specification limits): in that case, the process average should be kept exactly on target, and the Cpm is clearly more relevant than the Ppk.

The Cpm capability index is based on the Taguchi loss curve concept. According to Taguchi (a well-known Japanese quality guru), as soon as the process mean deviates from the target, a loss is generated and quality becomes progressively worse. Therefore, just staying within specifications is not enough. The idea that customer losses are generated only when parts are out of specifications is naïve. A product that barely meets the specifications is, from the customer’s point of view, as good or as bad as the product that is barely outside of the specifications.

Quality loss curve: the quality loss is proportional to the square of the distance from the target value.

Conclusion

In the early days of the industrial revolution, skilled craftsmen would match parts together carefully. In the days of mass production, this became impossible due to a lack of time, to the complexity of many advanced high-tech products, and to the sheer number of parts that need to be assembled.

In this context, the Cpm capability index is very useful for avoiding component mismatches. This is true not only for part dimensions but also for other part parameters that may affect the final product when all these parts are combined.

No Horsing Around with the Poisson Distribution, Troops


In 1898, Russian economist Ladislaus Bortkiewicz published his first statistics book, entitled Das Gesetz der kleinen Zahlen (The Law of Small Numbers), in which he included an example that eventually became famous for illustrating the Poisson distribution.

Bortkiewicz researched the annual deaths by horse kicks in the Prussian Army from 1875 to 1894. Data was recorded for 14 different army corps, one of which was the Guard Corps. (According to one Wikipedia article on the subject, the Guard Corps comprised Prussia’s elite Guard units.) Let's take a closer look at his data and see what Minitab has to say using a Poisson goodness-of-fit test.

Here's the data set (thank you, University of Alabama in Huntsville):


 

What Is the Poisson Distribution?

As a review, the Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time or space. The Poisson distribution has only one parameter, called lambda (the mean). To divert your attention just a little bit before we run our goodness-of-fit test, let’s look at how the distribution changes with different values of lambda. Go to Graph > Probability Distribution Plot > View Single. Select Poisson from the Distribution drop-down and enter 0.5 for the mean, then press OK:

After I created my first plot, I created three more probability distribution plots with lambda at 2, 4, and 10. I then used Minitab’s Layout Tool, under the Editor menu, to combine the four graphs.

As lambda increases, the graphs begin to resemble a normally distributed curve:
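
If you would rather sketch the same comparison in code, here is a minimal example using scipy and matplotlib rather than Minitab's probability distribution plots:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Recreate the four-panel comparison for the same lambda values used above.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, lam in zip(axes.flat, [0.5, 2, 4, 10]):
    k = np.arange(0, 25)
    ax.bar(k, poisson.pmf(k, lam))
    ax.set_title(f"Poisson, mean = {lam}")
    ax.set_xlabel("Count")
    ax.set_ylabel("Probability")
plt.tight_layout()
plt.show()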

Does This Data Follow a Poisson Distribution?

Interesting, right?  But let's get back on track and test if the overall data obtained by Bortkiewicz follows a Poisson distribution.

I first had to stack the data from 14 columns into one column. This is done via Data > Stack > Columns…

With the data stacked, I went to Stat > Basic Statistics > Goodness-of-Fit for Poisson…, filling out the dialog as shown below:

After I clicked OK, Minitab delivered the following results:

The Poisson mean, or lambda, is 0.70. This means that we can expect, on average, 0.70 deaths per corps per year. If I had known these statistics and served in an army corps at that time, I would have treated my horse like gold. Anything my horse wants, it gets.

Further down you’ll see a table showing the observed counts and the expected counts for the number of deaths by horse kick. The expected counts mirror the observed counts quite closely. To further validate the claim that this data can be modeled by a Poisson distribution, we can use the p-value for the goodness-of-fit test in the last section of the output.

The hypothesis for the Chi-Square Goodness-of-Fit test for Poisson is:

Ho: The data follow a Poisson distribution

H1: The data do not follow a Poisson distribution

We are going to use an alpha level of 0.05. Since our p-value is greater than our alpha, we can say that we do not have enough evidence to reject the null hypothesis, which is that the horse kick deaths per year follow a Poisson distribution.
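
For readers following along outside of Minitab, here is a rough sketch of the same goodness-of-fit calculation in Python. The observed counts below are the classic published tally of the full data set (the number of corps-years with 0, 1, 2, 3, and 4 deaths); stacking the worksheet linked above and tallying it should produce the same table.

import numpy as np
from scipy.stats import poisson, chisquare

# Number of corps-years with 0, 1, 2, 3, and 4 horse-kick deaths (280 corps-years).
deaths   = np.array([0, 1, 2, 3, 4])
observed = np.array([144, 91, 32, 11, 2])

n = observed.sum()
lam = (deaths * observed).sum() / n   # 196 deaths / 280 corps-years = 0.70

# Group the sparse tail (3 or more deaths) so no expected count is tiny.
obs_grouped = np.array([144, 91, 32, 11 + 2])
probs = np.append(poisson.pmf([0, 1, 2], lam), poisson.sf(2, lam))
expected = n * probs

# One extra degree of freedom is lost because lambda was estimated from the data.
stat, p = chisquare(obs_grouped, expected, ddof=1)
print(f"lambda = {lam:.2f}, chi-square = {stat:.2f}, p-value = {p:.3f}")

The p-value comes out well above 0.05, in line with Minitab's conclusion, although the exact chi-square statistic can differ a little depending on how the sparse tail categories are grouped.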

The chart below shows how close the expected and observed values for deaths are to each other.

I've been thinking about what other data could have been collected to serve as potential predictors if we wanted to do a Poisson regression. We could then see whether there were any significant relationships between our horse-kick death counts and some factor of interest. Maybe corps location or horse breed could have been documented? Given that the unit of time is one year, that specific location or breed would have to stay the same for the entire length of that time. For example, Corps 14 in 1893 must have remained entirely in “Location A” during that year, or every horse in a particular corps must have been of the same breed for a particular year.

According to equusmagazine.com, horses kick for six reasons:

  • "I feel threatened."
  • "I feel good."
  • "I hurt."
  • "I feel frustrated."
  • "Back off."
  • "I'm the boss around here."

Wouldn’t this have made for a great categorical variable?

 

Thinking about Predictors in Regression, an Example


A few times a year, the Bureau of Labor Statistics (BLS) publishes a Spotlight on Statistics Article. The first such article of 2015 recently arrived, providing analysis of trends in long-term unemployment. 

Certainly an interesting read on its own, but some of the included data gives us a good opportunity to look at how thought can improve your regression analysis. Fortunately, Minitab Statistical Software includes 3-D graphs and Regression Diagnostics that can help you spot opportunities for improvement.

The first chart in the report highlights how high the share of the unemployed who have been out of work for a long time is compared to historical levels. That chart looks a bit like this:

The percentages of total unemployed in each category tend to follow each other.

The discussion points out an interesting relationship. The authors note that the record high for those unemployed 27 weeks or longer occurred in the second quarter of 2010. The record high for those unemployed 52 weeks or longer occurred in the second quarter of 2011. The record high for those unemployed 99 weeks or longer occurred in the fourth quarter of 2011. That is, the highest proportion of unemployed in each category happens earlier for shorter terms.

This relationship is where we can see how to put some thought into regression variables. Let’s say that we want to predict the percentage of unemployed who will have been unemployed for 99 weeks or longer, using the other two figures. The most natural setup for the data is for all of the figures to be in the same row by date, like this:

In this worksheet, each column starts in row 1.

When your data are set up like this, it’s natural to want to analyze them as they are. The relationship that you get this way is strong. If you looked only at the R-squared statistics, you might stop there.

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.963437  94.69%     94.56%      93.96%

But if you look a little deeper, you might find some unsatisfactory aspects to using the variables this way. Here's what the relationship looks like when you plot all 3 variables on a 3-D graph:

The relationship between the variables is weaker as the values increase.

I’ve marked the points on this graph that have unusual predictor values. In the diagnostic report for the model, we can see that these points are followed by large standardized residuals. That is, the lag that the article pointed out in the maximums shows up in the regression relationship as well.

Fits and Diagnostics for Unusual Observations

       99 weeks
        Percent
Obs  unemployed     Fit   Resid  Std Resid
 63       4.500   3.219   1.281       1.54     X
 64       5.800   6.793  -0.993      -1.11     X
 65       6.500   8.323  -1.823      -2.03  R  X
 66       9.500  13.152  -3.652      -3.92  R
 67       9.600  12.786  -3.186      -3.40  R
 68      10.700  14.019  -3.319      -3.57  R
 75      14.300  12.387   1.913       2.04  R

R  Large residual
X  Unusual X

If you think about the predictor variables, this makes perfect sense. The BLS report notes that finding a job is less likely the longer you are unemployed. People unemployed for more than 27 weeks can become people who are unemployed for longer than 52 weeks. People who are unemployed for more than 52 weeks can become people who are unemployed longer than 99 weeks.

So what are the right predictors to use for the percentage of the unemployed for longer than 99 weeks? The closest we can get with terms provided is probably that people who are unemployed for over 27 weeks can become people who are unemployed for over 99 weeks about 4 quarters later. Similarly, people who are unemployed for over 52 weeks can become people who are unemployed for over 99 weeks about 2 quarters later.
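
(As an aside, if your data lived in a pandas data frame rather than a Minitab worksheet, the same lagged predictors would be one shift() call away. The column names and quarterly values below are purely illustrative, not the BLS figures.)

import pandas as pd

# Hypothetical quarterly percentages, oldest first, standing in for the BLS series.
df = pd.DataFrame({
    "Over 27 Weeks": [21.4, 20.9, 20.3, 19.8, 19.1, 18.7, 18.2, 17.9],
    "Over 52 Weeks": [12.6, 12.2, 11.9, 11.5, 11.1, 10.8, 10.4, 10.1],
    "Over 99 Weeks": [4.1, 4.0, 3.9, 3.8, 3.7, 3.6, 3.5, 3.4],
})

# Shift each predictor so a row pairs today's 99-week share with the 27-week share
# from 4 quarters earlier and the 52-week share from 2 quarters earlier.
df["Over 27 Lag 4"] = df["Over 27 Weeks"].shift(4)
df["Over 52 Lag 2"] = df["Over 52 Weeks"].shift(2)
print(df)

The first few rows of the lagged columns are missing, and those rows simply drop out of the regression fit.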

To get these variables in Minitab, use the Time Series menu.

  1. Choose Stat > Time Series > Lag.
  2. In Series, enter 'Over 27 Weeks'.
  3. In Store lags in, enter 'Over 27 Lag 4'.
  4. In Lag, enter 4.
  5. Press CTRL + E.
  6. In Series, enter 'Over 52 Weeks'.
  7. In Store lags in, enter 'Over 52 Lag 2'.
  8. In Lag, enter 2.

The resulting worksheet looks like this:

New variables are in this worksheet that line up the rows at more logical intervals.

Now, the value for the percentage unemployed over 27 weeks from the first quarter of 1994 lines up with the percentage unemployed over 52 weeks from the third quarter of 1994 and the percentage unemployed over 99 weeks from the first quarter of 1995. Plot these data and the relationship looks stronger than before:

The relationship between the response and the lagged predictors looks stronger.

Highlighting the same 3 points from the first graph in red, the points don’t seem unusual at all. In fact, these points no longer appear in the diagnostic report. One point still has a large standardized residual and it is preceded by an unusual X value. But the regression that compares appropriate time frames explains more variation in the data than the regression that compares simultaneous ones.

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.676735  97.50%     97.43%      97.04%

Fits and Diagnostics for Unusual Observations

       99 weeks
        Percent
Obs  unemployed     Fit   Resid  Std Resid
 66       9.500   8.866   0.634       1.01     X
 68      10.700  13.357  -2.657      -4.40  R  X

R  Large residual
X  Unusual X

Minitab Statistical Software provides a number of ways for you to evaluate your regression model. If your diagnostics reveal model inadequacies, then you have a lot of easy ways to make improvements. I used lags to create appropriate variables. If you’re ready for more, check out how Bruno Scibilia includes interactions in his model for wine tasting or explains the benefits of a Box-Cox transformation.

Data Driven Analysis of the Republican Field of Presidential Candidates for 2016


The 2016 presidential race is becoming more real. We’ve had several announcements, with Ted Cruz, Rand Paul, Hillary Clinton, and Marco Rubio officially entering the race to be President. While the prospective Democratic candidates are down to one, or at most a few, the Republican field is extra-large this election cycle. The first order of business for a GOP candidate is to survive the nomination process in order to become the nominee for the Republican Party. In this post, I assess the strengths and weaknesses of 15 potential and actual Republican presidential candidates.

The analysis I’ll use is the Solution Desirability Matrix in Quality Companion and Qeystone Tools. In a quality improvement project, you might use this tool to see how well several different solutions align with your organization’s various requirements. Here, I’ll use it to assess the qualities of the potential candidates for the GOP presidential nominee. 

The model and values I present were developed by my good friend Chris Jordan and me. Presidential politics is a spectator sport for us! We’ve even developed our own fantasy politics game and held a draft from both political parties back in February.

After the draft, we naturally started to second-guess our choices and wanted to determine how well we had drafted. We decided to rigorously and objectively assess the Republican candidates to answer our questions.

I previously used the Solution Desirability Matrix to see who Mitt Romney might choose for Vice President in 2012. In fact, at the time, Chris and I had a bet about Romney’s choice. I credit winning that bet to the Solution Desirability Matrix!

The Solution Desirability Matrix is a multi-dimensional decision matrix that helps make subjective decisions more objective. This matrix provides a semi-scientific method for selecting which of several competing designs or strategies best matches a list of requirements. The proposals are scored on how well each improvement proposal matches the selection criteria.

8 Criteria to Evaluate the Presidential Candidates

Chris and I pooled our thoughts and collected data to thoroughly assess the candidates using a broad set of measures with the best information available at this time. Our goal was to assess the strengths of each candidate and rank them in terms of predicted performance during the Republican nomination process.

We identified eight criteria that measure very different characteristics. While any individual criterion is not very predictive at this early date, a candidate who ranks highly on a number of criteria shows a broader foundation for ongoing success.

Below are the criteria we identified and how we scored the candidates. The scores range from 1 to 9, where higher scores are better. All scores are based on data obtained before any official announcements in order to avoid the inevitable but temporary bounce in the polls. We didn’t want to rate the candidates based on a short-term bounce. 

Net positive * Name recognition: Harry Enten of 538 found a positive correlation between favorability ratings and name recognition. In other words, high name recognition is associated with higher favorability ratings. However, if a specific candidate doesn’t follow this pattern (widely known but is not liked as much as predicted), it’s an obstacle. Using this measure, Mike Huckabee is the highest-rated candidate because he is both well known from his previous presidential bids and as a TV commentator, and he has very good favorability ratings. Chris Christie ranks the lowest because he is well-known but has very low favorability ratings.

Activity in Early Primary/Caucus states: The states that traditionally have early primaries and caucuses are very important in the nomination process. The typical pattern for an election with no Republican incumbent is that one candidate wins Iowa and another candidate wins New Hampshire. South Carolina breaks the tie and determines who becomes the nominee. History shows that candidates who don’t have wins in these three states don’t win the nomination even if they have otherwise promising starts. Just ask Jon Huntsman, Rudy Giuliani, and Michele Bachmann.

Campaign activity in these early states is important for a variety of reasons. This activity shows that a candidate is taking the race seriously and can indicate intent to run even before the official announcement. It also shows that a candidate has a proven ability to put together an effective ground game and engage in the more personal style of retail politics. This style requires different strengths than the mass-marketing techniques of wholesale politics.

Rick Perry, Rand Paul, and Ted Cruz have all had a large number of campaign events in these states, while Jeb Bush and Chris Christie have had very few.

Polling Data: We include polling data from Real Clear Politics for both the early states and nationally. We include a separate polling measure for the early states because, as discussed above, they are crucial.

A strong candidate will do well in both the early states and nationally. However, it’s possible for a candidate to do well in one arena but not the other, or in neither. Having both variables helps delineate the breadth of their current support. Jeb Bush and Scott Walker do well both in the early-state polls and nationally. Ben Carson does well nationally but not in the early states. Rick Perry and Bobby Jindal don’t do well on either measure.

Fundraising: Raising money is critically important in modern elections. Consequently, we’ve given fundraising double the weight in our analysis. Jeb Bush is the undisputed winner of this category. His gains, in large part, have come at Chris Christie’s expense by successfully raiding Christie’s backyard, i.e., the rich donors in the New York City area.

Leadership Experience: A recent Pew Research Poll shows that the type and amount of leadership experience that a presidential candidate has is particularly important to voters during this election cycle. According to the poll, the most valued types of leadership are in the military, as Governor, or a business executive. Experience in Washington, D.C., say as a Senator, is much further down the list—but it is better than no leadership experience.

With this poll in mind, we gave high scores to the candidates who have been Governors or business executives (Carly Fiorina). Senators got a middling score while Dr. Ben Carson received the lowest score.

Ideological Fit: This measure assesses how conservative each candidate is compared to recent GOP presidential nominees. Nate Silver of 538 assessed the conservativeness of the GOP presidential candidates using their fundraising sources, voting records, and public statements. Silver found that recent Republican nominees have fallen within a specific range of conservatism. Candidates who are outside this range have not been able to secure the Republican nomination in recent history.

In our analysis, candidates in the correct range get a high score. Jeb Bush gets the highest score because he is the closest to the optimal amount. Bobby Jindal, Carly Fiorina, John Kasich, and Lindsey Graham also get high scores for being close. When a candidate is further away from this optimal zone, in either the liberal or conservative direction, their score drops. For example, Chris Christie gets a low score because he is the most liberal, while Rand Paul gets a low score because he is the most conservative.

General Election Match-Up: This measure shows how well each Republican candidate fares against Hillary Clinton in a hypothetical general election. We include this measure to capture an electability aspect. It’s not uncommon for voters to prefer one candidate for ideological reasons but support another candidate who is perceived to have a greater chance of winning the election.

According to Real Clear Politics, all GOP candidates lose to Clinton in these polls. We gave three scores to represent three groups of candidates; the differences within each group are too small to be meaningful. One group loses to Clinton by a smaller percentage than another, and we gave that group the higher score. There is no election match-up poll data available on Real Clear Politics for candidates in the third group. We gave this group the lowest score of 1 because we felt the lack of data indicates they are not effectively registering in public opinion.

Analysis Results of the Republican Candidates

The results show clearly that a candidate must have a broad collection of strengths to be able to stand out from the pack. Candidates with only a couple of strengths quickly get lost in the crowd.

We think this is a fairly robust model. As we refined the model by adding variables and finding better data, the overall ranks of the candidates eventually stabilized and stopped changing.

The candidates are sorted from better to worse scores. Be sure to check out the discussion below the results.

Solution desirability matrix of Republican presidential candidates

Discussion of the Candidate Rankings

The Top Tier: Jeb Bush and Scott Walker

Jeb Bush and Scott Walker are both Republican Party establishment candidates. Historically, establishment candidates have tended to win the GOP presidential nomination.

Jeb Bush emerges as the frontrunner in our analysis, which matches the general consensus. He has the largest number of perfect scores of 9 in our analysis, which reflects a multitude of strengths. Indeed, one of his few “weaknesses” is that he has rarely appeared in the early states. However, instead of campaigning in these states, he’s been busy raising a vast amount of money and locking in the support of party leaders. This strategy is undoubtedly a smart trade-off because he still polls well in the early states. The former Governor of Florida has a respectable favorability rating but it’s not as high as you’d expect given his high name recognition. Presumably, this is baggage due to his last name, which could be a limiting factor.

Scott Walker is the current Governor of Wisconsin and there’s not a lot of daylight between him and Bush. The biggest difference between these two is their ideological fit scores. Walker is very conservative compared to Bush, who is at the optimal value for securing the nomination. On the positive side, Walker has substantially lower name recognition than Bush. It’s possible that Walker's favorability rating can increase as he becomes more widely known. This area of potential growth is not available to Bush and could tip the balance in Walker’s favor. However, it remains to be seen whether he can capitalize on this opportunity.

The Second Tier: Mike Huckabee and Rand Paul

Mike Huckabee and Rand Paul are tied for third in our analysis. Our assessment is that while they clearly have several key strengths they also face serious obstacles. Even if neither wins the nomination, they have a good shot at influencing the conversation during the process.

Mike Huckabee was the runner-up to John McCain in the nomination of 2008. His key strengths are that he is the former Governor of Arkansas and that he is both well-known and well-liked. Huckabee enjoys the top favorability scores in our analysis, but this also doesn’t allow him any room for future growth in this area—he’s a known entity. In 2008, his support was mainly limited to the South. If he can’t expand that base of support it will be difficult for him to secure the nomination.

Rand Paul is a very new Senator from Kentucky who, as a Tea Party candidate, won a surprise victory over the favored establishment candidate. In our analysis, Paul fares best in the general election match-up polls, he’s been very active in the early states, and he has very good favorability ratings. However, he does not have much political experience and he is the most conservative major candidate. Paul’s big problem is that he built his political base on positions very far to the right. In order to expand his base enough to win the nomination, he’ll need to move to the center on a variety of positions without losing his original base. That’s a tall order.

Quick Takes on Other Republican Candidates

Senator Marco Rubio of Florida ranks #5 in our analysis. While he is very young, he's a rising star in the Republican Party due to his natural charisma. Rubio does well in the general election match-up and has a good net favorability score, but everything else is middling to low, particularly in the crucial early states. His challenge is two-fold. On the one hand, Rubio is fairly conservative and won't be able to run with the establishment support that the more moderate Jeb Bush receives. On the other hand, Rubio is well behind in the polls compared to not one but three other candidates (Walker, Huckabee, and Paul) who are also running to Bush's right. Walker, in particular, is only slightly older than Rubio, has a similar conservativeness rating, and has much more experience. Even if this isn't Rubio's year, he seems to have great potential down the road.

Chris Christie, the current Governor of New Jersey and the former establishment favorite, comes in at #10. After several controversies and being seen increasingly as too liberal to be the Republican nominee (those photo ops with President Obama after Hurricane Sandy didn’t help), he’s lost that establishment support to Bush. He also hasn't shown up in the early states and has lost many anticipated donors to Bush. Perhaps his biggest problem is his terrible favorability-by-name-recognition score. Christie is well known but disliked. Like Huckabee, Christie is a known entity, but in a bad way. This makes everything more difficult for Chris Christie, and many are wondering whether he even has a plan.

We’re still at a very early point in the Republican nomination process. There are always surprises in any political contest, but we think this model reflects the state of the race as it is today. As the situation changes, we’ll revisit and adjust the model as necessary.

The photos of Jeb Bush and Mike Huckabee are by Gage Skidmore and used under this Creative Commons license.

The Easiest Way to Do Capability Analysis


A while back, I offered an overview of process capability analysis that emphasized the importance of matching your analysis to the distribution of your data.

If you're already familiar with different types of distributions, Minitab makes it easy to identify what type of data you're working with, or to transform your data to approximate the normal distribution.

But what if you're not so great with probability distributions, or you're not sure about how or even if you should transform your data? You can still do capability analysis with the Assistant in Minitab Statistical Software. Even if you're a stats whiz, the Assistant's easy-to-follow output can make the task of explaining your results much easier to people who don't share your expertise. 

Let's walk through an example of capability analysis with non-normal data, using the Assistant.

The Easy Way to Do Capability Analysis on Non-normal Data

For this example, we'll use a data set that's included with Minitab Statistical Software. (If you're not already using Minitab, download the free trial and follow along.) Click File > Open Worksheet, and then click the button labeled "Look in Minitab Sample Data folder." Open the dataset named Tiles

This is data from a manufacturer of floor tiles. The company is concerned about the flexibility of the tiles, and the data set contains data collected on 10 tiles produced on each of 10 consecutive working days.

Select Assistant > Capability Analysis in Minitab:

Capability Analysis

The Assistant presents you with a simple decision tree that will guide you to the right kind of capability analysis:

The first decision we need to make is what type of data we've collected—Continuous or Attribute. If you're not sure what the difference is, you can just click the "Data Type" diamond to see a straightforward explanation.

Attribute data involves counts and characteristics, while Continuous data involves measurements of factors such as height, length, weight, and so on, so it's pretty easy to recognize that the measurements of tile flexibility are continuous data. With that question settled, the Assistant leads us to the "Capability Analysis" button:

capability analysis option

Clicking that button brings up the dialog shown below. Our data are all in the "Warping" column of the worksheet.  The subgroup size is "10", since we measured 10 samples on each day. Enter "8" as the upper spec limit, because that's the customer's guideline.

capability dialog

Then press OK.

Transforming Non-normal Data

Uh-oh—the Assistant immediately gives us a warning. Our data don't meet the assumption of normality:

normality test

When you click "Yes," the Assistant will transform the data automatically (using the Box-Cox transformation) and continue the analysis. Once the analysis is complete, you'll get a Report Card that alerts you if there are potential issues with your analysis, a Diagnostic Report that assesses the stability of your process and the normality of your data, a detailed Process Performance Report, and Summary Report that captures the bottom line results of your analysis and presents them in plain language.
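
If you are curious about what that transformation is doing behind the scenes, here is a minimal sketch using scipy; the skewed measurements below are randomly generated stand-ins for the Warping column, not the actual Tiles data.

import numpy as np
from scipy.stats import boxcox

# Right-skewed, positive measurements standing in for the Warping column.
rng = np.random.default_rng(1)
warping = rng.gamma(shape=2.0, scale=1.5, size=100)

# boxcox() searches for the lambda that makes the transformed data closest to normal.
transformed, lam = boxcox(warping)
print(f"estimated Box-Cox lambda: {lam:.2f}")

The Assistant performs an equivalent search for the best lambda and then carries out the capability analysis on the transformed data.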

capability analysis summary report

The Ppk of .75 is below the typical industry acceptability benchmark of 1.33, so this process is not capable. Looks like we have some opportunities to improve the quality of our process!

Comparing Before and After Capability Analysis Results

Once we've made adjustments to the process, we can also use the Assistant to see how much of an impact those changes have had. The Assistant's Before/After Capability Analysis is just what we need:

Before/After Capability Analysis

The dialog box for this analysis is very similar to that for the first capability analysis we performed, but this time we can select a column of data from before we made improvements (Baseline process data), and a column of data collected after our improvements were implemented: 

before-after capability analysis dialog box

Press OK and the Assistant will again check if you want to transform your data for normality before it proceeds with the analysis. Then it presents us with a series of reports that make it easy to see the impact of our changes. The summary report gives you the bottom line quickly. 

before/after capability analysis summary report

The changes did affect the process variability, and this process now has a Ppk of 1.94, a vast improvement over the original value of .75, and well above the 1.33 benchmark for acceptability.  

I hope this post helps you see how the Assistant can make performing capability analyses easier, and that you'll be able to get more value from your process data as a result. 

 


Item Analysis with Cronbach's Alpha for Reliable Surveys


Many of the things you need to monitor can be measured in a concrete, objective way, such as an item's weight or length. But, many important characteristics are more subjective, such as the collaborative culture of the workplace, or an individual's political outlook.

A survey is an excellent way to measure these kinds of characteristics. To better understand a characteristic, a researcher asks multiple questions about it. For example, rather than simply ask diners whether they are satisfied, a researcher may ask:

  • How satisfied are you with our services?
  • How likely are you to visit our restaurant again?
  • How likely are you to recommend our restaurant?

Collectively, these questions give the researcher a deeper, more nuanced understanding of customer satisfaction than a single question.

The challenge is to ask questions that vary enough to measure the different facets of the characteristic, yet still relate to the same characteristic. If you ask questions that don't measure the same characteristic, your survey will produce misleading data, which can lead you to make poor, and potentially costly, decisions. So, how do you know whether different questions all measure the same characteristic?

Item Analysis with Cronbach's alpha can help, and it's easy to do in Minitab's statistical software.

What Is Item Analysis?

Item Analysis tells you how well a set of questions (or items) measures one characteristic (or construct) and helps to identify questions that are problematic.

For example, two questions measure different aspects of quality on a Likert scale (1 is worst, 5 is best). For the most part, respondents who rated Question 1 high also rated Question 2 high. And, those who rated Question 1 low tended to rate Question 2 low. This correlation suggests these questions measure the same characteristic, and so comprise a reliable survey.

Scatterplot of Question 1 vs Question 2

However, for Question 1 and Question 4, respondents gave markedly different ratings. This lack of a correlation indicates that the items do not measure the same characteristic.

scatterplot of question 1 vs question 4

Cronbach's alpha and other key statistics

Item Analysis helps you to evaluate the correlation of related survey items with only a few statistics. Most important is Cronbach's alpha, a single number that tells you how well a set of items measures a single characteristic. This statistic is an overall item correlation where the values range between 0 and 1. Values above 0.7 are often considered to be acceptable.

To identify problematic items, look at the Omitted Item Statistics section of the output. This section tells you how removing any one item from the analysis improves or worsens Cronbach's alpha. This information allows you to fine-tune your survey, keeping the good questions while replacing the bad.
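
If you would like to see the arithmetic behind these statistics, here is a minimal sketch in Python; the respondents and their 1-to-5 ratings are hypothetical, not the survey data discussed below.

import numpy as np

def cronbach_alpha(items):
    # items: one row per respondent, one column per survey question.
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical 1-5 ratings from six respondents on three related questions.
ratings = np.array([
    [5, 5, 4],
    [4, 4, 4],
    [3, 2, 3],
    [5, 4, 5],
    [2, 2, 1],
    [4, 5, 4],
])

print(f"overall alpha: {cronbach_alpha(ratings):.3f}")

# Omitted-item check: recompute alpha with each question left out in turn.
for j in range(ratings.shape[1]):
    reduced = np.delete(ratings, j, axis=1)
    print(f"alpha without question {j + 1}: {cronbach_alpha(reduced):.3f}")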

Trust your data

Suppose a bank surveys customers to assess customer satisfaction.

Cronbach's alpha

Analysts use Item Analysis to determine how well all of the questions measure customer satisfaction. The results show that Cronbach's alpha is quite high: 0.9550. The bank can trust that the three questions in the survey reliably assess the same construct: customer satisfaction.

Reveal an unreliable survey

Now suppose a medical group surveys patients who are in physical rehabilitation to assess their degree of mobility.

Cronbach's alpha

Analysts use Item Analysis to determine whether all of the questions measure mobility. The results show that Cronbach's alpha is quite low: 0.5191. This value suggests the questions do not all measure mobility.

Identify a problematic question

Item Analysis provides more than just a passing or failing grade; it also helps you identify problematic questions.

Suppose a manufacturing company surveys its employees to assess the strength of the safety culture in its factories. The survey asks the respondents to indicate the strength of their agreement with statements such as the following:

  • When a safety mistake is made but no one is harmed, the mistake is usually reported.
  • My supervisor wants us to work faster, but not by taking shortcuts on safety.
  • Our procedures and systems are good at preventing errors.
  • I feel that I am safe at work.

Cronbach's alpha

Cronbach's alpha is above 0.7, which is promising. However, looking at the Omitted Item Statistics output shows us that Cronbach's alpha increases from 0.7853 to 0.921674 when Minitab removes Question 4 from the analysis.

Collectively, the results suggest that Questions 1, 2, and 3 are the best indicators of the safety culture. The manager should remove Question 4 from the analysis and possibly replace it in future surveys.

Conducting an Item Analysis in Minitab

Analyzing your own survey data is easy.

  1. Choose Stat > Multivariate > Item Analysis.
  2. In Variables, enter all items which measure the same construct.
  3. If your items are measured on different scales, check Standardize variables.
  4. Click OK.

Item Analysis

Putting Item Analysis to Use

Surveys and tests are like any other measurement tool—you first need to assess whether your data are reliable. Minitab's Item Analysis evaluates your survey responses so you can trust your data and be confident in the decisions you make as a result.

Continuous Process Improvement and the Homework Dilemma…


Generally speaking, I have a problem with authority. I don’t like being told what to do or how to do it. I’m not proud of that.

I recall debating with my high school trigonometry teacher regarding the value of the homework “process.” Specifically, in those situations where the student in question did not require practice to get an A. And, if said student was getting a 98% on the exams, why spend effort trying for those last 2 points? In the world of cost/benefit analysis, what’s the point? That homework effort may have cut into said student’s social time, and said student had to make choices.

To this day, I believe that I had a valid point. Although I’d recommend, based on an unfortunate turn of events that resulted in an 89% and a difficult explanation to my parents, there’s probably a time and place to disagree with process. Touché, Mr. Petrunyak. Touché.

Fast forward many decades, and now my job is to deliver products in the most effective and efficient way possible and to continuously improve upon our methods and approaches to doing just that. And that often involves “process,” including analysis, diagrams, checklists, and so on: getting better and better at what we do. While I have no personal vendetta against process diagrams, checklists, and the general process of getting better, I believe that, as with homework, they are tools that have their time and place.

Case in point.

Minitab, like many companies, has a variety of products at different stages of the product lifecycle. We have products that have been around for a while, products newly introduced or on the cusp of introduction and products that are a mere twinkle in a statistician’s eye. So, while we actively pursue process improvement, its application varies dramatically based on this and other factors.

“Established” Stage

Products that have been in the field for years tend to have a developed architecture and team. We also have plenty of data available on usage, issues, etc. In these cases, we are often interested in improved cadence or efficiency, so we’ll employ many traditional CPI methods. We regularly assess the data and make improvements in the development methodologies to improve our ability to get features to our customers in a more efficient manner. We can glean from the data how best to “spend” our testing dollars based on configurations in the field, risk areas of the applications, etc. We can use the data to identify high-risk areas requiring mitigation strategies. It’s the perfect place for some good, old fashioned data analysis.

The example below is from a one-year post-release assessment performed by our product development team. At Minitab, the product development team owns the release, pre- and post-release. They own the results and adjust based on their findings. The Pareto Chart depicts the types of issues found. From that, they identify and implement areas of improvement.


 

“Show Time” Stage

For products that are newly introduced or on the cusp of introduction, it’s a different ballgame. We don’t jump out of the gate thinking “Great—let’s pull out the Pareto Charts!” Instead, we step back, observe, and learn. We don’t have an excessive amount of data on usage yet, so we aren’t churning out a variety of charts. We evaluate the information as it comes and we adjust, but it’s not process-heavy. We are evolving the process as we are learning.

“Twinkle in the Eye” Stage

This is someplace we don’t go peddling our CPI wares. Innovation drives this guy, not process. Sure there’s market data and analysis. And, sure, there’s learning. But, we aren’t evaluating the development process like we do with products in the field. For example, we aren’t assessing how “buggy” the code was the first time we prototyped something—it doesn’t matter at this point, and it’s not how to win statistician friends. If anyone would ever want to win a statistician friend, that is.

So, we learn and we get better. But we recognize the limits of standardizing our improvement processes. What fits for one part of the business doesn’t necessarily fit for another. But through careful consideration and effective data analysis, the way we've tailored our approach works for us.

 

LeBron vs. Jordan? Hey, What About Tim Duncan?


The NBA playoffs are under way, and all eyes are on LeBron James to see if he can finally bring a championship to Cleveland. But one could argue that there is even a bigger storyline going on: whether Tim Duncan can equal Michael Jordan’s six NBA Championships.

Duncan is currently in his 18th season in the NBA, and he is still playing at a very high level. Yet, he’s never in the conversation when it comes to the greatest players of all time.

But should he be?

Two years ago I did a post using Minitab Statistical Software to compare LeBron James and Michael Jordan. But with Duncan now looking for his sixth championship, I thought I would stir the pot and throw his name into the hat. Let’s see how Duncan’s statistics stack up to two of the greatest players of all time. You can get the data I used here.

Round 1: How Far Do You Lead Your Team?

Let’s start by seeing how far each player went in the playoffs. I’m including Jordan’s two seasons at the end of his career with the Wizards here. After all, he was about the same age on those teams as Duncan is now!

Even including his two seasons with the Wizards, 40% of Jordan’s seasons ended with him winning a championship. Tim Duncan isn’t too far behind him at 31%. LeBron is a distant 3rd, but unlike Duncan and Jordan, he still has a few more years in his prime to change that.        

The most impressive thing this shows is that in his 18 seasons, Tim Duncan has never once missed the playoffs. LeBron missed them in his first two seasons in the league, and Jordan did so in his last two. But Duncan never has, and on top of that he’s only lost in the first round twice (I’m not counting the 2000 NBA Playoffs, where the Spurs lost in the first round but Duncan didn’t play in any of the games due to an injury.) That’s quite impressive!

Even if Duncan wins his 6th championship this year, Jordan will still have won his in fewer seasons. So although he’s close, Duncan doesn’t quite stack up to Michael. And if LeBron is going to challenge either player here, he better win a few more titles.

Of course titles are won by a team, not an individual person. So how will Duncan stack up to LeBron and Jordan when we start looking at some individual statistics?

Round 2: Who Is the Better Scorer in the Playoffs?

You can’t win unless you score more points than the other team, so let’s look at the scoring averages for each player. I’m only looking at playoff games, since that is where legacies are made. I also used points per 36 minutes so that overtime games don’t cloud the results.

Here is a time series plot of how each player performed in the playoffs as their careers progressed:

Jordan clearly looks like the best player here. His lowest points-per-36 average (his rookie year) is higher than Duncan’s highest average (the 2006 NBA Playoffs when the Spurs beat the Cavs for the championship). And LeBron has never been able to top Jordan when they were at similar points in their career.

Another interesting thing to note is that after his first two seasons, Jordan pretty consistently scored close to 30 points per 36 minutes in the playoffs. Meanwhile, LeBron and Duncan have a much higher variability. And even Duncan's better scoring years rarely top LeBron and never top Jordan. It looks like Duncan just isn't as good a scorer as the other two players.

Round 3: How Efficient Are You?

But how many points you score doesn’t always tell the entire story. If you score 37 points but it takes you 50 shots to do so, then it really isn’t all that impressive. You didn’t have a great game, you just shot the ball a lot. Plus only looking at scoring doesn’t account for rebounds, assists, steals, turnovers, etc.

So I’m going to use a stat called the Player Efficiency Rating (PER). It’s a measure of per-minute production standardized so that the league average is 15. It accounts for both how efficiently a player scores, and how they do in the “non-scoring” aspects of the game. It isn’t perfect, but it’s the best statistic we have to determine who the best all-around player is.

So let’s see who is better! Again, I’m only using playoff games.

Jordan doesn't look as dominant as he did in the previous plot, but he still looks like the winner. Michael’s least efficient playoff performance still beats 4 of LeBron’s seasons and 8 of Duncan’s. And check out Duncan’s PER in his 14th season (2011, when the Spurs lost to the Thunder in the Western Conference Finals). His value of 15.5 is barely better than the league average of 15. Neither Jordan nor LeBron have a value anywhere close to 15. And the fact that the Spurs reached the Conference Finals that year shows that his teammates were good enough to win games even if Duncan wasn’t playing well. How many teams could you say that about for LeBron or Jordan?

I do want to point out the ridiculous playoff performance by LeBron James in his 6th season. It was the 2009 playoffs, when he led Cleveland to the Eastern Conference Finals only to be upset by the Orlando Magic. In those playoffs, James had three 40-point games and never scored fewer than 25 points. And he did it all while shooting over 50%. Plus, in an elimination game against Orlando he had a triple double and staved off elimination for the Cavs (at least for one game). But after a playoff performance that far exceeded anything Jordan did, LeBron started to get stuck with the “unclutch” label. Sports media, everybody!

Getting back to Duncan, it looks as if his best playoff performances are behind him, as his PER has decreased the last couple of seasons. But he did hold his own pretty well in the prime of his career, and his best playoff performance (PER of 31.8 in 2002) is almost as good as Jordan’s best playoff performance (PER of 32 in 1991). But “almost as good as Jordan” isn’t nearly the same as being better than Jordan.

Round 4: What about the Regular Season?

So far we’ve only looked at what each player has done in the post-season. But it would be a mistake to completely ignore the huge sample of games in the regular season, so let’s look at each player’s PER in the regular season.

Duncan is clearly in 3rd place here. He was only able to top LeBron’s regular-season PER in their rookie seasons. And the only times he beat Jordan were a season when Jordan was with the Wizards and the season Jordan returned from retirement for 17 regular-season games after not playing basketball for 2 years.

So is Tim Duncan a great NBA player? Absolutely. Is he the greatest NBA player? No, not quite. While he has had great success at the team level (more than LeBron and almost as much as Jordan), his individual statistics aren’t consistently on the same level as Jordan’s or LeBron’s.

But you should take that more as a compliment to Jordan and LeBron than an insult to Duncan, who’s still been a great player on some great teams. And if the Spurs end up beating LeBron and the Cavs in the Finals this year, he’ll get the last laugh.

 

Photograph "Tim Duncan" by Keith Allison.  Licensed under Creative Commons Attribution ShareAlike 2.0.

What Should I Do If My Data Is Not Normal?


As a Minitab trainer, one of the most common questions I get from training participants is "what should I do when my data isn’t normal?" A large number of statistical tests are based on the assumption of normality, so not having data that is normally distributed typically instills a lot of fear.

Many practitioners suggest that if your data are not normal, you should do a nonparametric version of the test, which does not assume normality. From my experience, I would say that if you have non-normal data, you may look at the nonparametric version of the test you are interested in running. But more importantly, if the test you are running is not sensitive to normality, you may still run it even if the data are not normal.

What tests are robust to the assumption of normality? 

Several tests are "robust" to the assumption of normality, including t-tests (1-sample, 2-sample, and paired t-tests), Analysis of Variance (ANOVA), Regression, and Design of Experiments (DOE). The trick I use to remember which tests are robust to normality is to recognize that tests which make inferences about means, or about the expected average response at certain factor levels, are generally robust to normality. That is why even though normality is an underlying assumption for the tests above, they should work for nonnormal data almost as well as if the data (or residuals) were normal. 

The following example illustrates this point. (You can download the data set here and follow along if you would like. If you don't have Minitab, download the free 30-day trial version, too.)

Generating random data from a Gamma distribution with a scale of 1 and a shape of 2 will produce data that is bounded at 0, and highly skewed. The theoretical mean of this data is 2. It should be clear that the data is not normal—not even approximately normal!

What if I want to test the hypothesis that the population mean is 2? Would I be able to do it effectively with a 1-sample t-test? If normality is not strictly required, I should be able to run the test and reach the correct conclusion about 95% of the time, or, in more technical terms, with roughly 95% confidence, right?

To test this, I am providing a little bit of code that will generate 40 samples from a Gamma (1, 2) and will store the p-value for a 1-sample t-test in column C9 of an empty worksheet. To reproduce similar results on your own, copy the following commands and paste them into Notepad. Save the file with the name “p-values.mtb,” and make sure to use the double quotes as part of the file name to ensure that the extension becomes MTB and not the default TXT.

Once the file is saved, choose File > Other Files > Run an Exec. In Number of times to execute the file, enter a relatively large number, such as 1000, and hit Select File, browse to the location on your computer where the p-values.MTB file is saved, and select the executable you just created. Click Open. Grab a cup of coffee, and once you get back to your desk, the simulation should be about done. Once the simulation is complete, create a histogram of column C9 (p-values) by choosing Graph > Histogram. This shows that the p-values are uniformly distributed between 0 and 1, just like they should when the null hypothesis is true.

What percentage of the time in the simulation above did we fail to reject the null?  In the simulation I ran, this happened in 95.3% of the instances.  Amazing, huh?  What does this really mean, though?  In layman’s terms, the test is working with approximately 95% confidence, despite the fact that the data is clearly not normal!  
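
If you would like to try the same experiment outside of Minitab, here is an equivalent simulation sketched in Python; it is a stand-in for the exec file described above, not a copy of it.

import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(7)
n_sims, n, alpha = 1000, 40, 0.05

# Gamma with shape 2 and scale 1 has mean 2, so the null hypothesis is true.
pvalues = np.array([
    ttest_1samp(rng.gamma(shape=2.0, scale=1.0, size=n), popmean=2.0).pvalue
    for _ in range(n_sims)
])

print(f"failed to reject H0 in {np.mean(pvalues > alpha):.1%} of the simulations")

The proportion should land near 95%, just as in the Minitab run, even though every sample is drawn from a clearly nonnormal distribution.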

For more on simulated p-values for a 1-sample t-test, be sure to check out Rob Kelly’s recent post on “p-value roulette.”

Note that if you’re following along and running your own simulation, the histogram and output above will most likely not match yours precisely. This is the beauty of simulated data—it’s going to be a little different each time. However, even with slightly different numbers, the conclusion you reach from the analysis should be about the same.

At this point, I hope you feel a little more comfortable about using these tests that are robust to normality, even if your data don't meet the normality assumption. The Assistant menu can also help you, as some of these “rules of thumb” are built into the Report Card, which informs you that this particular test would be accurate even with nonnormal data. 

It is also worth mentioning that the unusual data check in the Assistant even offers a warning about some unusual observations. These unusual observations could have been outliers if the data were normally distributed. In this case, though, since we know this data was generated at random, we can be confident that they are not outliers, but proper observations that reflect an underlying nonnormal distribution.

Whenever a normality test fails, an important skill to develop is determining the reason why the data is not normal. A few common reasons include:

  • The underlying distribution is nonnormal.
  • Outliers or mixed distributions are present.
  • A low discrimination gauge is used.
  • Skewness is present in the data.
  • You have a large sample size.

These are some of the topics that Daniel Griffith and I have been researching and presenting at conferences recently, and they are among the topics that are discussed in detail in Minitab's newest training course, Analysis of Nonnormal Data for Quality

What tests are not robust to the normality assumption?

If tests around means are—in general—robust to the normality assumption, then when is normality a critical assumption? In general, tests that try to make inferences about the tails of the distribution will require the distribution assumption to be met. 

Some examples include:

1.    Capability Analysis and determining Cpk and Ppk
2.    Tolerance Intervals
3.    Acceptance Sampling for variable data
4.    Reliability Analysis to estimate low or high percentiles

The tests for equal variances also are known to be extremely sensitive to the normality assumption.

If you would like to learn how to assess normality by understanding the sensitivity of normality tests under different scenarios, or to analyze nonnormal data when the assumption is extremely critical, or not so critical, you should check out our new course on Analysis of Nonnormal Data for Quality.

 

 

Cp and Cpk: Two Process Perspectives, One Process Reality


It’s usually not a good idea to rely solely on a single statistic to draw conclusions about your process. Do that, and you could fall into the clutches of the “duck-rabbit” illusion shown here:

If you fix your eyes solely on the duck, you’ll miss the rabbit—and vice-versa.

If you're using Minitab Statistical Software for capability analysis, the capability indices Cp and Cpk are good examples of this. If you focus on only one measure, and ignore the other, you might miss seeing something critical about the performance of your process. 

Cp: A Tale of Two Tails

Cp is a ratio of the specification spread to the process spread. The process spread is often defined as the 6-sigma spread of the process (that is, 6 times the within-subgroup standard deviation). Higher Cp values indicate a more capable process.

When the specification spread is considerably greater than the process spread, Cp is high.

When the specification spread is less than the process spread, Cp is low.

By using the 6-sigma process spread, Cp incorporates information about both tails of the process data. But there’s something Cp doesn’t do—it doesn’t tell you anything about the location of the process data.

For example, the following two processes have about the same Cp value (≈ 3):

Obviously, Process B has a serious issue with its location in relation to the spec limits that Cp just can't "see."

Cpk: Location, Location, Location!

Like Cp, Cpk is also a ratio of the specification spread to the process spread. But unlike Cp, Cpk compares the distance from the process mean to the closest specification limit, to about half the spread of the process (often, the 3-sigma spread).

When the distance from the mean to the nearest specification limit is considerably greater than the one-sided process spread, Cpk is high.

When the distance from the mean to the nearest specification limit is less than the one-sided process spread, Cpk is low.

Notice how the location of the process does affect the Cpk value—by virtue of its being calculated using the process mean.

Yet there's something important that Cpk doesn't do. Because it's a "worst-case" estimate that uses only the nearest specification limit, Cpk can't "see" how the process is performing on the other side.

For example, the following two processes have about the same Cpk value (≈ 0.9):


Notice that Process X has nonconforming parts in relation to both spec limits, while Process Y has nonconforming parts in relation to only the upper spec limit (USL). But Cpk can't "see" any difference between these two processes.

To get the two-sided picture of each process, in relation to both spec limits, you can look at Cp, which would be higher for Process Y than for Process X.
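
Here is a minimal numerical sketch of the two ratios side by side; the spec limits, means, and standard deviation are made-up values chosen only to show how a location shift moves Cpk while leaving Cp untouched.

def cp(usl, lsl, sigma):
    # Two-sided ratio: spec spread over the 6-sigma process spread.
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    # Worst-case ratio: distance from the mean to the nearest spec limit.
    return min(usl - mu, mu - lsl) / (3 * sigma)

usl, lsl, sigma = 12.0, 0.0, 1.0
for label, mu in [("centered process", 6.0), ("shifted process", 9.0)]:
    print(f"{label}: Cp = {cp(usl, lsl, sigma):.1f}, Cpk = {cpk(usl, lsl, mu, sigma):.1f}")

Both processes report Cp = 2.0, but the shifted one drops to Cpk = 1.0.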

Summing Up: Look for Ducks, Rabbits, and Other Critters as Well

Avoid getting too fixated on any single statistic. If you have both a lower and upper specification limit for your process, Cp and Cpk each might “know” something about your process that the other one doesn’t. That “something” could be critical to fully understand how your process is performing.

To see a concrete example of how Cp and Cpk work together, using real data from the National Renewable Energy Laboratory, see this post by Cody Steele.

By the way, the potential "blind spot" for Cp and Cpk also applies to Pp and Ppk. The only difference is that the process spread for those indices is calculated using the overall standard deviation, instead of the within-subgroup standard deviation. For more on that distinction, see this post by Michelle Paret.

And if you’re interested other optical and statistical illusions, check out this post on Simpson's paradox.
