Need a formula for detecting a single outlier with Grubbs test

**Flyers** · 05-15-2013, 12:07 AM

For example, need the outlier for the following.

78 77 78 45 76 81 80 80 77 79

Thanks!

**FDibbins** · 05-15-2013, 12:25 AM

Hi and welcome to the forum

Just out of curiosity, for those unfamiliar with that test, what is it, how does it work, what is the formula behind it, and what would be the answer for your sample? (gut feel says 45, but I have no clue really)

**Flyers** · 05-15-2013, 06:52 AM

Yes of course. And 45 is correct (that one is obvious)

Grubbs' test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs' test.[1]
Grubbs' test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or less since it frequently tags most of the points as outliers.
Grubbs' test is defined for the hypothesis:
H0: There are no outliers in the data set
Ha: There is at least one outlier in the data set
The Grubbs' test statistic is defined as:

$$G = \frac{\max_{i=1,\ldots,N} \left| Y_i - \bar{Y} \right|}{s}$$

with $\bar{Y}$ and $s$ denoting the sample mean and standard deviation, respectively. The Grubbs' test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.
This is the two-sided version of the test. The Grubbs' test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is

$$G = \frac{\bar{Y} - Y_{\min}}{s}$$

with $Y_{\min}$ denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is

$$G = \frac{Y_{\max} - \bar{Y}}{s}$$

with $Y_{\max}$ denoting the maximum value.
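As a rough illustration, the statistics described here can be computed for the ten values from the first post with a few lines of Python. This is only a sketch: the variable names (`g_two_sided`, `g_min`, `g_max`) are made up for the example, and `statistics.stdev` is used because it gives the sample standard deviation (dividing by N − 1).

```python
from statistics import mean, stdev

# The ten values posted at the top of the thread.
data = [78, 77, 78, 45, 76, 81, 80, 80, 77, 79]

y_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (N - 1 in the denominator)

# Two-sided statistic: largest absolute deviation from the mean, in units of s.
g_two_sided = max(abs(y - y_bar) for y in data) / s

# One-sided statistics for the minimum and the maximum value.
g_min = (y_bar - min(data)) / s
g_max = (max(data) - y_bar) / s

print(f"mean = {y_bar:.2f}, s = {s:.3f}")
print(f"G (two-sided) = {g_two_sided:.3f}")
print(f"G (minimum)   = {g_min:.3f}")
print(f"G (maximum)   = {g_max:.3f}")
```

For this data the two-sided statistic and the minimum-side statistic coincide (about 2.82), because 45 is the point farthest from the mean; the maximum-side statistic is only about 0.55.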
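For the two-sided test, the hypothesis of no outliers is rejected at significance level $\alpha$ if

$$G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),\,N-2}^{2}}{N - 2 + t_{\alpha/(2N),\,N-2}^{2}}}$$

with $t_{\alpha/(2N),\,N-2}$ denoting the upper critical value of the t-distribution with $N-2$ degrees of freedom and a significance level of $\alpha/(2N)$. For the one-sided tests, replace $\alpha/(2N)$ with $\alpha/N$.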
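A minimal Python sketch of the two-sided test on the ten values from the first post might look like the following. It assumes the usual form of the critical value, (N − 1)/√N · √(t² / (N − 2 + t²)), with t the upper α/(2N) critical value of the t-distribution with N − 2 degrees of freedom. To stay within the standard library, the t critical value is found by numerical integration and bisection; the helper names (`t_pdf`, `t_upper_tail`, `t_upper_critical`, `grubbs_two_sided`) are invented for this example, and a statistics library's inverse t CDF would normally be used instead.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_upper_critical(p, df):
    """Value t with P(T > t) = p, by bisection (assumes the answer is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_two_sided(values, alpha=0.05):
    """Return (suspect value, G, critical G) for the two-sided test."""
    n = len(values)
    y_bar = mean(values)
    s = stdev(values)
    suspect = max(values, key=lambda y: abs(y - y_bar))
    g = abs(suspect - y_bar) / s
    t = t_upper_critical(alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return suspect, g, g_crit


data = [78, 77, 78, 45, 76, 81, 80, 80, 77, 79]
suspect, g, g_crit = grubbs_two_sided(data)
print(f"suspect = {suspect}, G = {g:.3f}, critical G = {g_crit:.3f}")
print("outlier" if g > g_crit else "no outlier detected")
```

With α = 0.05 and N = 10, G comes out at about 2.82 against a critical value of about 2.29, so 45 is flagged as an outlier.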

http://en.wikipedia.org/wiki/Grubbs'_test_for_outliers

**Flyers** · 05-15-2013, 06:53 AM

All I need is the first outlier due to small data sets

**Flyers** · 05-15-2013, 07:02 PM

I found this but not a math guy

Detecting outliers with Grubbs' test.
FAQ# 1598
Statisticians have devised several ways to detect outliers. Grubbs' test is particularly easy to understand. This method is also called the ESD method (extreme studentized deviate). You can perform Grubbs' test using a free calculator on the GraphPad site. Prism 6 also has a built-in analysis that can detect outliers using Grubbs' method.

How Grubbs' test works

The first step is to quantify how far the outlier is from the others. Calculate the ratio Z as the difference between the outlier and the mean divided by the SD. If Z is large, the value is far from the others. Note that you calculate the mean and SD from all values, including the outlier.

Since 5% of the values in a Gaussian population are more than 1.96 standard deviations from the mean, your first thought might be to conclude that the outlier comes from a different population if Z is greater than 1.96. This approach only works if you know the population mean and SD from other data. Although this is rarely the case in experimental science, it is often the case in quality control. You know the overall mean and SD from historical data, and want to know whether the latest value matches the others. This is the basis for quality control charts.
When analyzing experimental data, you don't know the SD of the population. Instead, you calculate the SD from the data. The presence of an outlier increases the calculated SD. Since the presence of an outlier increases both the numerator (difference between the value and the mean) and denominator (SD of all values), Z cannot get as large as many expect. For example, if N=3, Z cannot be larger than 1.155 for any set of values. More generally, with a sample of N observations, Z can never get larger than $(N-1)/\sqrt{N}$.
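The ceiling on Z can be checked numerically. The sketch below assumes the bound (N − 1)/√N, which reproduces the 1.155 figure for N = 3; the function names `max_possible_z` and `largest_z` are made up for the example. It builds a deliberately extreme sample (all values identical except one far away) and shows that Z reaches the bound but does not exceed it.

```python
import math
from statistics import mean, stdev


def max_possible_z(n):
    """Upper bound on Z for a sample of n values: (n - 1) / sqrt(n)."""
    return (n - 1) / math.sqrt(n)


def largest_z(values):
    """Largest absolute deviation from the mean, in units of the sample SD."""
    m = mean(values)
    s = stdev(values)
    return max(abs(v - m) for v in values) / s


# Extreme case for N = 3: two identical values and one very distant value.
extreme = [0, 0, 1_000_000]
print(f"N = 3 bound:     {max_possible_z(3):.3f}")
print(f"extreme sample:  {largest_z(extreme):.3f}")

# The ten values from the first post stay below their own bound.
data = [78, 77, 78, 45, 76, 81, 80, 80, 77, 79]
print(f"N = 10 bound:    {max_possible_z(len(data)):.3f}")
print(f"posted data:     {largest_z(data):.3f}")
```

However far away the third value is placed, Z for N = 3 stays at about 1.155, which is why a cutoff such as 1.96 can never be reached with very small samples.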
Grubbs and others have calculated critical values for Z, which are tabulated below. The critical value increases with sample size, as expected. You'll find the needed equation here, if you want to do your own analyses.
If your calculated value of Z is greater than the critical value in the table, then the P value is less than 0.05. This means that there is less than a 5% chance that you'd encounter an outlier so far from the others (in either direction) by chance alone, if all the data were really sampled from a single Gaussian distribution. Note that the method only works for testing the most extreme value in the sample (if in doubt, calculate Z for all values, but only calculate a P value for Grubbs' test from the largest value of Z).