+ Reply to Thread
Results 1 to 5 of 5

Remove Highest Outlier, Return Highest Value

  1. #1
    Registered User
    Join Date
    02-22-2017
    Location
    Palm Beach, FL
    MS-Off Ver
    2013
    Posts
    2

    Question Remove Highest Outlier, Return Highest Value

    I am attempting to remove the highest outlier(s) from a data set and return the next highest value. The following formula has been created, =MAX(IF($B:$B=R2,IF(E:E<((QUARTILE(IF($B:$B=R2,E:E),3)+((QUARTILE(IF($B:$B=R2,E:E),3)-QUARTILE(IF($B:$B=R2,E:E),1))*3))),E:E))) This formula returns the expected result after hours of calculation. I believe there must be a simpler and quicker way to return the result I need either through a less volatile formula or VBA. I am also wondering if unformatted data is causing the lag. The raw data from the client is formatted at General.

    Column B is the list of store #'s
    R2 = is the store I am looking for the highest value in column E

    I've pared down the data to provide an example but there are several hundred store numbers and a transaction for each date for each store, approximated 80,000 rows of data. In the below data the result I am expecting is 6,610

    3360 10/9/2016 6,503
    3360 10/28/2016 6,210
    3360 10/1/2016 6,251
    3360 10/21/2016 6,610
    3360 10/30/2016 6,308
    3360 10/13/2016 6,001
    3360 10/27/2016 6,080
    3360 10/2/2016 6,204
    3360 10/7/2016 7,205
    3360 10/19/2016 5,812
    3360 10/5/2016 5,701
    3360 10/25/2016 5,810
    3360 10/24/2016 5,707

  2. #2
    Valued Forum Contributor loginjmor's Avatar
    Join Date
    01-31-2013
    Location
    Cedar Rapids, Iowa
    MS-Off Ver
    Excel 2013
    Posts
    1,073

    Re: Remove Highest Outlier, Return Highest Value

    Hi -

    If you only want the second highest value, try the LARGE function and specify the second largest. For example, assuming your data is in column A it would look something like:

    =LARGE(A:A,2)

    Hope this helps.
    ____________________________________________
    If this has solved your problem, please edit the thread title to add the word [SOLVED] at the beginning. You can do this by
    -Go to the top of the first post
    -Select Thread Tools
    -Select Mark thread as Solved

    If I have been particularly helpful, please "bump" my reputation by pressing the small star in the lower left corner of my post.

  3. #3
    Registered User
    Join Date
    02-22-2017
    Location
    Palm Beach, FL
    MS-Off Ver
    2013
    Posts
    2

    Re: Remove Highest Outlier, Return Highest Value

    Thank you for your reply. I tried LARGE first but it does not always apply if there isn't an outlier. Say if the highest value was 6,625, I would need that value returned.

  4. #4
    Valued Forum Contributor loginjmor's Avatar
    Join Date
    01-31-2013
    Location
    Cedar Rapids, Iowa
    MS-Off Ver
    Excel 2013
    Posts
    1,073

    Re: Remove Highest Outlier, Return Highest Value

    OK - Try something like this:

    =LARGE((IF((B1:B13=$R$2),E1:E13,"")),2)

    This is an array formula, so you will have to copy and paste this to a cell somewhere in your spreadsheet. From the formula edit window, press Ctrl-Shift-Enter at the same time to engage the array formula functionality. If you did it right, Excel will add curly braces {} around the formula. So, column B is the store number, R2 is the store you're looking for, and column E is the list of data you want to pull the second highest value from.

    I noted in your formula in the original post you are using entire column references (like B:B or E:E) instead of specific ranges (like B1:B80000). When you specify entire columns that REALLY slows excel down on big spreadsheets. So, your performance will probably improve if you use specific range references.

    Hope this helps.

  5. #5
    Forum Guru
    Join Date
    04-13-2005
    Location
    North America
    MS-Off Ver
    2002/XP and 2007
    Posts
    15,829

    Re: Remove Highest Outlier, Return Highest Value

    A few things I note in that formula:

    1) You are using full column references, when you claim to only have 80000 rows. One of the first recommendations for speeding something like this up is to get rid of the full column references.
    2) We seem to have a real desire/"need" to cram as much as possible into a single cell formula, and I am not sure why. When we cram a lot into a single formula, it is easy to duplicate effort. For example:
    2a) Your IF(this store is the store in R2 then....) part of the function is computed three times inside of that one formula. If we can figure out a way (*cough* helper column *cough*) to compute this once, we will likely cut computation time by a factor of 3.
    2b) You are computing the 3rd quartile twice inside of the same function.
    3) Then, I assume, you are doing this for each store.

    After a little thought, the best way to really speed this up might be a pivot table. Set up a pivot table with store id (or date) as row label, date (or store id) as column label. Then you can use your quartile functions on the pivot table to get each quartile, identify outliers, then use the LARGE() function as loginjmor suggests to get the largest non-outlier. I generated a quick random dataset with a million rows of data, and the pivot table only takes maybe 5 sec. on my machine to refresh, and the outlier calculation is essentially instantaneous.
    Quote Originally Posted by shg
    Mathematics is the native language of the natural world. Just trying to become literate.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 3
    Last Post: 06-09-2016, 12:51 PM
  2. [SOLVED] Need to select earliest five rows matching highest value, and next highest if not enough
    By Ochimus in forum Excel Programming / VBA / Macros
    Replies: 10
    Last Post: 05-05-2016, 11:26 AM
  3. [SOLVED] Max formula to return total of highest, second highest and third highest value
    By JonWilf in forum Excel Formulas & Functions
    Replies: 5
    Last Post: 04-06-2016, 08:20 AM
  4. How to change the lowest to highest into Highest to lowes ??
    By jasond1992 in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 09-30-2015, 03:38 AM
  5. 2 columns of data matching highest with highest
    By ronaldchristie in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 08-03-2012, 07:57 AM
  6. VBA code to find highest and 2nd highest number based in criteria
    By Michael007 in forum Excel Programming / VBA / Macros
    Replies: 5
    Last Post: 10-25-2011, 08:38 AM
  7. Replies: 3
    Last Post: 08-10-2006, 11:40 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1