+ Reply to Thread
Results 1 to 4 of 4

Average values while removing outliers

  1. #1
    Forum Contributor
    Join Date
    10-13-2015
    Location
    California
    MS-Off Ver
    Office 2013
    Posts
    138

    Average values while removing outliers

    hello all,

    I am looking for a way to achieve an average while removing outlining values.

    I attached a sample of some data that im working with...

    What i am hoping is that i can take an average of Hrs worked per Assy in Column I where The value in Columb B and D match but exclude any lines that are more than XX% different from the other entries. *the % to be used is still TBD*

    I initially tried to add a helper column that checked the hours worked against a calculated average and i was going to have it just display "yes" or "no" if the hours work was greater than 10% different than the average but the problem with that is the major outliers were still weighing the average and thus botching the results.

    Im sure i did a poor job of explaining this but if you can follow along at all please let me know if you have ideas or questions.
    Attached Files Attached Files

  2. #2
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,812

    Re: Average values while removing outliers

    I'm not sure I understand exactly what you want. Your data is quite scattered to where it is difficult to intuitively see which values you intend to include in the averages and which you intend to exclude. If you could maybe work out a few example manually so we can see what you intend.

    I am not a real statistician, but every "outlier" algorithm that I know of requires that you first calculate the "unfiltered" average before determining which points are outliers. If you are trying to use any of those algorithms, I don't know of a way to implement those algorithms without first computing the average.

    Or maybe you are just wanting to add two additional criteria to the existing AVERAGEIFS(). AVERAGEIFS(...,I:I,">"&I2*0.9,I:I,"<"&I2*1.1) which simply adds the condition that you only want to average values that are within 10% of the current value in column I.

    Help us understand what you are trying to do, and we will try to help you implement it.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

  3. #3
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: Average values while removing outliers

    (Note: I would never use TRIMMEAN, which arbitrarily excludes the largest and smallest x% of the data, even though many people (incorrectly) describe its purpose as removing "outliers".)

    One statistical definition of an outlier is: a value that is less than q1-IQR*x or more than q3+IQR*x, where q1 is the 25%ile, q3 is the 75%ile, IQR (interquartile range) = q3-q1 (middle 50%), and x is some IQR factor (TBD below).

    Caveat: Just because a value is an outlier, that does not necessarily mean it should be excluded. Usually, a value that is identified as an outlier should be excluded only if it is a mistake. That said, many people do indeed arbitrarily exclude outliers, with the intent of trying to determine some "centrality" of the data.

    To that end, I would calculate the following:

    M15 (q1): =QUARTILE(I2:I13,1)
    M16 (q3): =QUARTILE(I2:I13,3)
    M17 (IQR): =M16-M15
    M18 (out1): =M15-M17*1.7
    M19 (out3): =M16+M17*1.7
    M20 (avg): =AVERAGEIFS(I2:I13, I2:I13, ">=" & M18, I2:I13, "<=" & M19)

    That average excludes I3 and I9 in your example. You can determine that by entering the following formula into M2 and copying down through M13:

    =IF(AND($M$18<=I2, I2<=$M$19), "in", "out")

    -----

    Note that I choose x=1.7. This is a subject of some debate and disagreement.

    Many people choose 2 or 2.5, even 1.5, IIRC. Those factors are easy to remember.

    I choose 1.7 because in a normal distribution, that corresponds to +/-3sd ("sd" is standard deviations).

    (But the IQR method of identifying outliers is not limited to normal distributions.)

    To explain.... For a normal distribution, two criteria are commonly used to determine the outlier limits: +/-3sd ("weak" limits), and +/-4sd ("strong" limits).

    And for a normal distribution, an IQR factor of 1.7 corresponds to +/-3sd; 2.5 corresponds to +/-4sd.

    I prefer the "weak" limits because they include 99.73% of normally-distributed data, and it is more likely to exclude some data.

    In contrast, the "strong" limits include 99.99% of normally-distributed data, and it is less likely to exclude any data.

    For your example, it does not matter because the outliers are so extreme. But it might make a difference with other data.

    Anyway, you should feel free to choose any IQR factor between 1.5 and 2.5.
    Last edited by joeu2004; 03-19-2020 at 02:52 AM.

  4. #4
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: Average values while removing outliers

    PS....
    Quote Originally Posted by joeu2004 View Post
    Caveat: Just because a value is an outlier, that does not necessarily mean it should be excluded. Usually, a value that is identified as an outlier should be excluded only if it is a mistake. That said, many people do indeed arbitrarily exclude outliers, with the intent of trying to determine some "centrality" of the data.
    Speaking of "centrality" of data, you might consider calculating the MEDIAN, not the AVERAGE excluding outliers.

    The median is the value such that (about) 50% of the data is above and 50% of the data is below it.

    It is less sensitive to extreme differences in the data, such as yours.

    And it avoids arbitrarily excluding valid data just because they seem to be "outliers".

    That is why, for example, we talk about the median price of homes, not the average price. That way, the billionaire's $50M mansion and the minimalist's 1000 sq-ft one-room home do not affect the statistic in an otherwise "middle class" neighborhood.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Standard Deviation formula removing outliers
    By Stevednmc in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 01-16-2018, 01:56 PM
  2. How to find max value but removing outliers first.
    By izzy_992 in forum Excel General
    Replies: 2
    Last Post: 04-05-2017, 11:21 PM
  3. Removing outliers
    By brunesCH in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 10-23-2014, 12:38 PM
  4. Comparing 1 Sheet to a Masterlist and removing the outliers
    By Rainbowhunt in forum Excel Formulas & Functions
    Replies: 3
    Last Post: 04-12-2013, 04:29 PM
  5. [SOLVED] Formula for average minus outliers
    By Granny Nanny in forum Excel Formulas & Functions
    Replies: 4
    Last Post: 09-20-2012, 09:38 AM
  6. Averaging while removing outliers
    By Spets in forum Excel General
    Replies: 9
    Last Post: 09-25-2011, 11:58 PM
  7. Exclude outliers in average calculation excel
    By mkvassh in forum Excel General
    Replies: 6
    Last Post: 10-15-2009, 08:36 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1