+ Reply to Thread
Results 1 to 8 of 8

Trimmed mean for grouped data

  1. #1
    Registered User
    Join Date
    02-23-2015
    Location
    London
    MS-Off Ver
    Excel 2013
    Posts
    6

    Trimmed mean for grouped data

    Hi,

    I have a dataset that looks like this but with 800,00 rows

    full date month year postcode category time (in seconds) area
    01/01/2010 1 1 CB1 Red 1 654 East of England
    02/01/2010 1 1 CB1 Red 2 351 East of England
    04/01/2010 1 1 CB1 Red 1 903 East of England

    I want to calculate the average time per each postcode but, as there are some outliers, I need the trimmed mean. I know how to use the =TRIMMEAN formula, but I don't know how to group per postcode and then calculate the trimmed mean. Like in a pivot table.

    I would also need to add some more levels of complexity, like calculating the trimmed mean for a specific postcode in a specific month. But I guess that if I solve the previous problem, this will be easy.

    Any ideas?

    Thanks!

  2. #2
    Forum Moderator
    Join Date
    01-21-2014
    Location
    St. Joseph, Illinois U.S.A.
    MS-Off Ver
    Office 365 v 2403
    Posts
    13,406

    Re: Trimmed mean for grouped data

    I've never used TRIMMEAN but as I understand it....assuming post code are in column D and times are in column F and the target post code is in J1 try this array formula.
    Formula: copy to clipboard
    Please Login or Register  to view this content.
    Array formulas are not committed in the regular way. You commit by pressing and holding Ctrl + Shift as you hit Enter. You will know that it has been entered successfully when you can see curly braces {} around your formula in the formula bar. You do not type these in yourself. Excel does it for you.

    Does this give expected results?
    Last edited by FlameRetired; 02-23-2015 at 02:21 PM. Reason: typos

  3. #3
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: Trimmed mean for grouped data

    Quote Originally Posted by Duarte_RV View Post
    I have a dataset that looks like this but with 800,00 rows
    [....]
    I want to calculate the average time per each postcode but, as there are some outliers, I need the trimmed mean. I know how to use the =TRIMMEAN formula, but I don't know how to group per postcode and then calculate the trimmed mean. [....]

    I would also need to add some more levels of complexity, like calculating the trimmed mean for a specific postcode in a specific month.
    First, your table is not helpful because it does not include actual Excel row and column labels (1, 2 etc; and A, B etc). It is better to attach an example Excel file.

    Second, TRIMMEAN does not remove outliers per se. Instead, it arbitrarily removes "p" percent of the data (the first and last p/2 percent), where "p" is the percentage specified in the second parameter. Also beware of how TRIMMEAN rounds the number of data to exclude. See the TRIMMEAN help page for details.

    If you want to know about outliers, I could explain further.

    But if you insist on using TRIMMEAN, conceptually you might array-enter something like this (press ctrl+shift+Enter instead of just Enter):

    =TRIMMEAN(IF(D2:D80001="CB1",IF(B2:B80001=1,F2:F80001)), 1%)

    to remove 800 data points (80000*1%), the first and last 400 data points, from the January data (month 1) with postal code "CB1".
    Last edited by joeu2004; 02-23-2015 at 02:29 PM.

  4. #4
    Registered User
    Join Date
    02-23-2015
    Location
    London
    MS-Off Ver
    Excel 2013
    Posts
    6

    Re: Trimmed mean for grouped data

    Hi,

    Thanks for your help.


    I tried the formulas but they return #N/A

    The thing is that I have several postcodes in the spreadsheets, so the array should change. This is a sample of the dataset https://docs.google.com/spreadsheets...it?usp=sharing

    By the way Joeu2004, I know that a trimmed mean doesn't remove outliers per se. The thing is that this dataset is about ambulance response times, and there were times when they didn't start counting (so the time value is 0) or that they didn't stop (so the time is too long). I thought that for this case the trimmed mean was useful. But if you know a better method, I'm more than happy to change!

  5. #5
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: Trimmed mean for grouped data

    Quote Originally Posted by Duarte_RV View Post
    I tried the formulas but they return #N/A
    [....]
    This is a sample of the dataset https://docs.google.com/spreadsheets...it?usp=sharing
    Thanks for the sample. As I said: the devil is in the details. You entered #N/A for some of the postcodes. Excel recognizes that as the Excel error, not text.

    It would be better if you wrote N/A without "#". You can use Find And Replace to change that. Then the original formula would work when it is array-entered (press ctrl+shift+Enter), specifically:

    =TRIMMEAN(IF(postcode="CB1",IF(mymonth=1,data)), 1%)

    where "postcode" is the named range D2:D41243, mymonth is B2:B41243, and data is G2:41243.

    Alternatively, array-enter the following:

    =TRIMMEAN(IF(ISTEXT(postcode & mymonth & data),IF(postcode="CB1",IF(mymonth=1,data))), 1%)

    That ensures there are no Excel errors (like #N/A) in mymonth and data as well. (There are not!)

    [EDIT] Better alternative if you do not require Excel 2003 compatibility (that is, you do not save as ".xls"): array-enter the following:

    =TRIMMEAN(IFERROR(IF(postcode="CB1",IF(mymonth=1,data)),""), 1%)

    Quote Originally Posted by Duarte_RV View Post
    By the way Joeu2004, I know that a trimmed mean doesn't remove outliers per se. The thing is that this dataset is about ambulance response times, and there were times when they didn't start counting (so the time value is 0) or that they didn't stop (so the time is too long). I thought that for this case the trimmed mean was useful. But if you know a better method, I'm more than happy to change!
    I will post another response soon.
    Last edited by joeu2004; 02-24-2015 at 12:28 PM. Reason: cosmetic; IFERROR alternative

  6. #6
    Registered User
    Join Date
    02-23-2015
    Location
    London
    MS-Off Ver
    Excel 2013
    Posts
    6

    Re: Trimmed mean for grouped data

    Oh! I didn't know about the #N/A! You saved my day. Thank you very much. Now it works

    Quote Originally Posted by joeu2004 View Post
    I will post another response soon.
    I'm trying two options: trimmed mean and median. Not sure which one is better to show the trend in this case.

  7. #7
    Forum Expert
    Join Date
    05-01-2014
    Location
    California, US
    MS-Off Ver
    Excel 2010
    Posts
    1,795

    Re: Trimmed mean for grouped data

    Sorry for the late response. I got intrigued by the nature of your example data, and I got side-tracked hoping to offer some "deep" insight. Not!

    Quote Originally Posted by Duarte_RV View Post
    I'm trying two options: trimmed mean and median. Not sure which one is better to show the trend in this case.
    Considering the Poisson-like or exponential-like distribution of your example data, the median might indeed be more representative of the central tendency than the mean.

    But that really depends on how you want to use the statistic. The benefit of the median is also its weakness: it is less sensitive to changes in the data. For example, MEDIAN({1,1,10,1000,1000}) and MEDIAN({8,8,10,11,11}) both return 10.

    In any case, since much of your example data are zero values, which are invalid, be sure to at least exclude zeros in any calculations.

    For example, array-enter formulas of the following form (press ctrl+shift+Enter instead of just Enter):

    =TRIMMEAN(IF(data>0,data), 0.5%)
    =MEDIAN(IF(data>0,data))

    Or you might normally-enter formulas of the following form (press Enter as usual):

    =AVERAGEIF(data,">0")
    =SUMIF(data,">0") / COUNTIF(data,">0")

    Quote Originally Posted by Duarte_RV View Post
    I know that a trimmed mean doesn't remove outliers per se. The thing is that this dataset is about ambulance response times, and there were times when they didn't start counting (so the time value is 0) or that they didn't stop (so the time is too long). I thought that for this case the trimmed mean was useful. But if you know a better method, I'm more than happy to change!
    Actually, the sample data that you post demonstrates the weakness of TRIMMEAN.

    More than 54% of the data are zero. Obviously, we cannot write TRIMMEAN(data,108%). Any reasonable use of TRIMMEAN would ignore only some of the zeros, not even "a lot" of them. So your trimmed mean would still be skewed to the left (too low).

    Based on your knowledge of the process, the ideal way to exclude "true" outliers (mistakes) is to determine arbitrary lower and upper boundaries that are clearly the result of procedural error.

    For example, you mentioned failing to start the response timer immediately. Clearly, zero reflects that mistake; perhaps 12 sec (the next smallest) does, too. And you mentioned failing to stop the response timer appropriately. Perhaps 8198 (2.25 hr), the example max, reflects that mistake; perhaps something lower does, too, like 3600 (1 hr).

    Alternatively, we can try to identify potential outliers statistically based on the interquartile range (IQR), the middle 50%.

    For some background, read http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm (click here).

    To use this methodology, we might calculate the following:
    Please Login or Register  to view this content.
    Aside.... I prefer 1.7 instead of 1.5 because Q3 + IQR*1.7 is close to 3 sd, which is a common outlier limit for a normal distribution.

    Array-enter the formulas in X1 and X2 (press ctrl+shift+Enter instead of just Enter).

    Note that we always exclude zero because the predominance of zero data values (in the example) would skew any blind statistical calculation.

    Then we might calculate one of the following for a measure of central tendency:

    =AVERAGEIFS(data, data, ">0", data, ">=" & X4, data, "<=" & X5)
    or
    =MEDIAN(IF(data>0,IF(data>=X4,IF(data<=X5,data))))

    Array-enter the MEDIAN formula (press ctrl+shift+Enter instead of just Enter).

    Note that I always test data>0 as well as data>=X4 because X4 might be less than zero. That is the case for your example data.
    Last edited by joeu2004; 02-27-2015 at 08:41 PM. Reason: cosmetic

  8. #8
    Registered User
    Join Date
    02-23-2015
    Location
    London
    MS-Off Ver
    Excel 2013
    Posts
    6

    Re: Trimmed mean for grouped data

    Apologies for the late reply. For some reason I missed the notification email.

    Thank you very much for your explanation! It was very interesting.

    Nevertheless, I must say that what I sent wasn't the whole dataset. It was just a sample of the first few thousands rows. The original one is too large to work in Excel and I'm using a MySQL database. In the full dataset 0 values represent just the 1.5% of the total. I checked with the organisation that produced the data and when there's a 0 it means that there was an ambulance already there, so they are not errors (at least not all of them).

    The distribution is more normal than in the sample, so I'm going to stick to the median.

    Thank you very much for your help.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Trim and Keep the Trimmed Data
    By N Harkawat in forum Excel Formulas & Functions
    Replies: 10
    Last Post: 09-06-2005, 06:05 PM
  2. Trim and Keep the Trimmed Data
    By sally t in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-06-2005, 03:05 PM
  3. [SOLVED] Trim and Keep the Trimmed Data
    By sally t in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-06-2005, 12:05 PM
  4. [SOLVED] Trim and Keep the Trimmed Data
    By sally t in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-05-2005, 11:05 PM
  5. Trim and Keep the Trimmed Data
    By sally t in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-05-2005, 10:05 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1