+ Reply to Thread
Results 1 to 7 of 7

How to identify outliers?

  1. #1
    Registered User
    Join Date
    10-02-2014
    Location
    Birmingham
    MS-Off Ver
    2013
    Posts
    23

    How to identify outliers?

    I have a set of data. When I plot them into a graph, there is a data appeared to deviate from the overall trend (highlighted in yellow in the attachment).

    It looks to me that that particular data could be a potential outlier (which I hope it is). I am wondering how to identify it as outlier? (or outliers, if there is more than one?)
    Attached Files Attached Files

  2. #2
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: How to identify outliers?

    Hi Wes,

    You can add a Trendline and the equation to your data. Then using the equation simply find the square of the distance from the point to the line. Conditional format the biggest distances and see if that is what you need.

    See http://www.excel-easy.com/examples/trendline.html for adding a trendline.
    Attached Files Attached Files
    One test is worth a thousand opinions.
    Click the * Add Reputation below to say thanks.

  3. #3
    Registered User
    Join Date
    10-02-2014
    Location
    Birmingham
    MS-Off Ver
    2013
    Posts
    23

    Re: How to identify outliers?

    Hi Marvin,

    Thanks for your reply. Apparently, there are 5 outliers in the data.

    I have a couple of questions to ask, hope you don't mind

    1. Is there a statistical term of using this method in identifying outliers?
    2. How to define large values? Is there a method to calculate the maximum and minimum caps?

  4. #4
    Forum Guru MarvinP's Avatar
    Join Date
    07-23-2010
    Location
    Woodinville, WA
    MS-Off Ver
    Office 365
    Posts
    16,167

    Re: How to identify outliers?

    When you toss out data it is like cheating. You are really simply throwing away data that doesn't support what you want it to support. Why collect data if you aren't going use it?

    I believe you'd use a statistical term dealing with standard deviations away from the Trend Line. I'd say the data point was outside 3 standard deviations of the trend.

  5. #5
    Registered User
    Join Date
    10-02-2014
    Location
    Birmingham
    MS-Off Ver
    2013
    Posts
    23

    Re: How to identify outliers?

    Hmmm.. If the data is statistically determined as outlier, I'm happy to exclude it. Otherwise I'll use it anyway.

    In fact, I like your "square of distance" suggestion which sounds very sensible, but I wish to undertsand it better so that I can explain it intelligently during discussion.

    By the way, as my values in Y-axis change with values in X-axis, I am not sure if I can use standard deviation in this case?

  6. #6
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: How to identify outliers?

    Quote Originally Posted by Wes-at-UoB View Post
    If the data is statistically determined as outlier, I'm happy to exclude it.
    Technically, you should only exlcude outliers if they represent an error in the data. But in practice, it is not uncommon to exclude outliers without understanding their nature.

    Quote Originally Posted by Wes-at-UoB View Post
    I am not sure if I can use standard deviation in this case?
    You are wise to question that. The std dev rule should be applied only to data that is normally distributed. Unfortunately, many people overlook that fact.

    More generally, use the IQR (interquartile range) rule to identify outliers. (The operative word is "identify". Again, just because data are outliers, that does not mean they should be excluded.) For example:

    Q1: =QUARTILE(B2:B38,1)
    Q3: =QUARTILE(B2:B38,3)
    IQR1: =Q3-Q1
    OUT1: =Q1-IQR1*1.5
    OUT3: =Q3+IQR1*1.5

    Then in C2:C38:
    =IF(OR(B2<$OUT$1,$OUT$3<B2),"outlier","")

    Note: There is no agreement on what IQR factor to use. 1.5 is often suggested because it is easy to remember, and it is "close" to 3sd if the data were normally distributed. In fact, 1.7 is better for that purpose. But some people prefer to use 4sd to identify outliers of a normal distribution; and to that end, IQR*2.5 is close to 4sd if the data were normally distributed.
    Last edited by joeu2004; 06-23-2015 at 09:13 PM.

  7. #7
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,819

    Re: How to identify outliers?

    joeu2004 may be a stronger statistician than I am, so I would defer to his judgement in the details. The only thing I would add to this discussion is a link to this discussion: http://www.real-statistics.com/multi...d-influencers/ One thing I like about his website is that he teaches the statistics principles and theories specifically for an Excel audience. If you need to perform these tests with some statistical rigor, I would suggest sites like this to help you understand the statistics behind the question and solution.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Removing outliers
    By brunesCH in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 10-23-2014, 12:38 PM
  2. Getting rid of data outliers
    By mwwoodm in forum Excel General
    Replies: 0
    Last Post: 05-10-2012, 08:47 PM
  3. sum and stdev outliers
    By gergleb in forum Excel General
    Replies: 0
    Last Post: 12-01-2011, 12:46 PM
  4. Excel 2007 : best fit line without outliers
    By bored in forum Excel General
    Replies: 3
    Last Post: 11-23-2010, 05:39 AM
  5. Boxplots with outliers
    By Confuzzled. in forum Excel Charting & Pivots
    Replies: 1
    Last Post: 03-17-2006, 03:40 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1