+ Reply to Thread
Results 1 to 4 of 4

Formula for conditional distinct count on very large ranges

  1. #1
    Registered User
    Join Date
    11-22-2020
    Location
    Belgium
    MS-Off Ver
    Office 2016
    Posts
    1

    Formula for conditional distinct count on very large ranges

    I have a data set with the following structure where each row represents a unique Customer x Product combination:

    Customer Product Code Product category
    A 1 Food
    A 3 Beverages
    B 1 Food
    C 2 Food
    C 3 Beverages
    C 1 Food
    D 4 Beverages
    D 5 Other
    E 1 Food

    For each product category (Food, Beverages, Other) I need to calculate the distinct count of products in the data set.
    So for this particular example the output will be:

    Category Distinct count
    Food 2
    Beverages 2
    Other 1

    For this small example it's fairly easy to calculate this. However, my real data set has more than 20 thousand lines, with thousands of products and customers and hundreds of product categories. Ofcourse, I can easily add the table to a data model and then use the Distinct count functionality in a pivot table. However, I also have a lot of these tables, so converting them all into pivot tables makes it cumbersome + if anything in the data changes the pivot table has to be updated etc. So I'm looking for a formula that I can easily apply to the different tables to calculate this, but I can't wrap my head around how to do this...

    Any thoughts?

  2. #2
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    43,893

    Re: Formula for conditional distinct count on very large ranges

    No guarantees about speed... but this will work:

    =SUM(INDEX(($C$3:$C$20=F3)*($B$3:$B$20<>"")/COUNTIFS($C$3:$C$20,$C$3:$C$20&"",$B$3:$B$20,$B$3:$B$20&""),0))
    Attached Files Attached Files
    Glenn




    None of us get paid for helping you... we do this for fun. So DON'T FORGET to say "Thank You" to all who have freely given some of their time to help YOU.

    Temporary addition of accented to illustrate ongoing problem to the TT: L? fh?ile P?draig sona dhaoibh

  3. #3
    Forum Expert
    Join Date
    09-25-2015
    Location
    Milan Italy
    MS-Off Ver
    office 365
    Posts
    1,776

    Re: Formula for conditional distinct count on very large ranges

    A
    B
    C
    D
    E
    F
    G
    1
    Customer Product Code Product category Category Distinct count
    2
    A
    1
    Food Food
    2
    3
    A
    3
    Beverages Beverages
    2
    4
    B
    1
    Food Other
    1
    5
    C
    2
    Food
    6
    C
    3
    Beverages
    7
    C
    1
    Food
    8
    D
    4
    Beverages
    9
    D
    5
    Other
    10
    E
    1
    Food
    11



    G2=IFERROR(SUM(IF(FREQUENCY(IF($C$2:$C$100=F2,MATCH($B$2:$B$100,$B$2:$B$100,0)),ROW($B$2:$B$100)-ROW($B$2)+1),1)),"")

    Control+shift +enter

    copy down
    Attached Files Attached Files
    Last edited by CARACALLA; 11-22-2020 at 08:53 AM.

  4. #4
    Forum Expert BMV's Avatar
    Join Date
    01-13-2020
    Location
    St-Petersburg, Russia
    MS-Off Ver
    2013-2016
    Posts
    1,329

    Re: Formula for conditional distinct count on very large ranges

    it is even not optimization because the speed will be very close but
    Please Login or Register  to view this content.
    I have tested several methods
    SUMPRODUCT(Cond1*Cond2)
    SUMPRODUCT(Cond1;Cond2)
    SUM(Cond1*Cond2) array formula

    Generally SUMPRODUCT(Cond1;Cond2) is favorite.

    But in this case FREQUENCY is winner
    =SUM(INDEX(($C$3:$C$5002=F3)*($B$3:$B$5002<>"")/COUNTIFS($C$3:$C$5002,$C$3:$C$5002&"",$B$3:$B$5002,$B$3:$B$5002&""),)) 5,078125
    =SUMPRODUCT(($C$3:$C$5002=F3)*($B$3:$B$5002<>""),1/COUNTIFS($C$3:$C$5002,$C$3:$C$5002&"",$B$3:$B$5002,$B$3:$B$5002&"")) 5,050781
    =IFERROR(SUM(IF(FREQUENCY(IF($C$3:$C$5002=F3,MATCH($B$3:$B$5002,$B$2:$B$5002,0)),ROW($B$3:$B$5002)-ROW($B$3)+1),1)),"") 0,0078125

    =SUM(INDEX(($C$3:$C$9002=F3)*($B$3:$B$9002<>"")/COUNTIFS($C$3:$C$9002,$C$3:$C$9002&"",$B$3:$B$9002,$B$3:$B$9002&""),)) 17,1875
    =SUMPRODUCT(($C$3:$C$9002=F3)*($B$3:$B$9002<>""),1/COUNTIFS($C$3:$C$9002,$C$3:$C$9002&"",$B$3:$B$9002,$B$3:$B$9002&"")) 16,74609
    =IFERROR(SUM(IF(FREQUENCY(IF($C$3:$C$9002=F3,MATCH($B$3:$B$9002,$B$2:$B$9002,0)),ROW($B$3:$B$9002)-ROW($B$3)+1),1)),"") 0,015625

    =SUM(INDEX(($C$3:$C$18002=F3)*($B$3:$B$18002<>"")/COUNTIFS($C$3:$C$18002,$C$3:$C$18002&"",$B$3:$B$18002,$B$3:$B$18002&""),)) 67,43359
    =SUMPRODUCT(($C$3:$C$18002=F3)*($B$3:$B$18002<>""),1/COUNTIFS($C$3:$C$18002,$C$3:$C$18002&"",$B$3:$B$18002,$B$3:$B$18002&"")) 69,51172
    =IFERROR(SUM(IF(FREQUENCY(IF($C$3:$C$18002=F3,MATCH($B$3:$B$18002,$B$2:$B$18002,0)),ROW($B$3:$B$18002)-ROW($B$3)+1),1)),"") 0,03125

    However
    FREQUENCY could be extracted
    array formula
    Formula: copy to clipboard
    Please Login or Register  to view this content.
    is shorter and power also

    =SUM(INDEX(($C$3:$C$18002=F3)*($B$3:$B$18002<>"")/COUNTIFS($C$3:$C$18002,$C$3:$C$18002&"",$B$3:$B$18002,$B$3:$B$18002&""),)) 66,38672
    =SUMPRODUCT(($C$3:$C$18002=F3)*($B$3:$B$18002<>""),1/COUNTIFS($C$3:$C$18002,$C$3:$C$18002&"",$B$3:$B$18002,$B$3:$B$18002&"")) 65,95703
    =IFERROR(SUM(IF(FREQUENCY(IF($C$3:$C$18002=F3,MATCH($B$3:$B$18002,$B$3:$B$18002,0)),ROW($B$3:$B$18002)-ROW($B$3)+1),1)),"") 0,0234375
    =IFERROR(SUM(--IF($C$3:$C$18002=F3,MATCH($B$3:$B$18002,$B$3:$B$18002,0)=ROW($B$3:$B$18002)-ROW($B$3)+1)),"") 0,02734375
    =IFERROR(SUM(--IF($C$3:$C$18002=F3,MATCH($B$3:$B$18002,$B:$B,)=ROW($B$3:$B$18002))),"") 0,0234375
    Last edited by BMV; 11-22-2020 at 10:15 AM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Count distinct values based on criteria (large data)
    By ifulao in forum Excel Formulas & Functions
    Replies: 6
    Last Post: 04-02-2020, 05:31 AM
  2. Formula for distinct count in Data Model excluding 0s
    By Abell in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 11-07-2019, 11:23 AM
  3. [SOLVED] How to craft a copy down formula for distinct, separate ranges.
    By Coyote_e in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 09-15-2019, 09:34 PM
  4. [SOLVED] Count of distinct values between two date ranges
    By HKPHOOY in forum Excel Formulas & Functions
    Replies: 26
    Last Post: 03-01-2018, 02:39 PM
  5. [SOLVED] Count Conditional Distinct Values
    By Gos-C in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 02-10-2013, 05:41 AM
  6. Formula - Count distinct / Countif??
    By Africa in forum Excel General
    Replies: 3
    Last Post: 02-29-2012, 11:24 AM
  7. Count frequency of set hour ranges from large list of times
    By samchargers09 in forum Excel General
    Replies: 11
    Last Post: 02-26-2010, 07:14 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1