# Formula for conditional distinct count on very large ranges

1. ## Formula for conditional distinct count on very large ranges

I have a data set with the following structure where each row represents a unique Customer x Product combination:

 Customer Product Code Product category A 1 Food A 3 Beverages B 1 Food C 2 Food C 3 Beverages C 1 Food D 4 Beverages D 5 Other E 1 Food

For each product category (Food, Beverages, Other) I need to calculate the distinct count of products in the data set.
So for this particular example the output will be:

 Category Distinct count Food 2 Beverages 2 Other 1

For this small example it's fairly easy to calculate this. However, my real data set has more than 20 thousand lines, with thousands of products and customers and hundreds of product categories. Ofcourse, I can easily add the table to a data model and then use the Distinct count functionality in a pivot table. However, I also have a lot of these tables, so converting them all into pivot tables makes it cumbersome + if anything in the data changes the pivot table has to be updated etc. So I'm looking for a formula that I can easily apply to the different tables to calculate this, but I can't wrap my head around how to do this...

Any thoughts?

2. ## Re: Formula for conditional distinct count on very large ranges

No guarantees about speed... but this will work:

=SUM(INDEX((\$C\$3:\$C\$20=F3)*(\$B\$3:\$B\$20<>"")/COUNTIFS(\$C\$3:\$C\$20,\$C\$3:\$C\$20&"",\$B\$3:\$B\$20,\$B\$3:\$B\$20&""),0))

3. ## Re: Formula for conditional distinct count on very large ranges

 A B C D E F G 1 Customer Product Code Product category Category Distinct count 2 A 1 Food Food 2 3 A 3 Beverages Beverages 2 4 B 1 Food Other 1 5 C 2 Food 6 C 3 Beverages 7 C 1 Food 8 D 4 Beverages 9 D 5 Other 10 E 1 Food 11

G2=IFERROR(SUM(IF(FREQUENCY(IF(\$C\$2:\$C\$100=F2,MATCH(\$B\$2:\$B\$100,\$B\$2:\$B\$100,0)),ROW(\$B\$2:\$B\$100)-ROW(\$B\$2)+1),1)),"")

Control+shift +enter

copy down

4. ## Re: Formula for conditional distinct count on very large ranges

it is even not optimization because the speed will be very close but
``Please Login or Register  to view this content.``
I have tested several methods
SUMPRODUCT(Cond1*Cond2)
SUMPRODUCT(Cond1;Cond2)
SUM(Cond1*Cond2) array formula

Generally SUMPRODUCT(Cond1;Cond2) is favorite.

But in this case FREQUENCY is winner
=SUM(INDEX((\$C\$3:\$C\$5002=F3)*(\$B\$3:\$B\$5002<>"")/COUNTIFS(\$C\$3:\$C\$5002,\$C\$3:\$C\$5002&"",\$B\$3:\$B\$5002,\$B\$3:\$B\$5002&""),)) 5,078125
=SUMPRODUCT((\$C\$3:\$C\$5002=F3)*(\$B\$3:\$B\$5002<>""),1/COUNTIFS(\$C\$3:\$C\$5002,\$C\$3:\$C\$5002&"",\$B\$3:\$B\$5002,\$B\$3:\$B\$5002&"")) 5,050781
=IFERROR(SUM(IF(FREQUENCY(IF(\$C\$3:\$C\$5002=F3,MATCH(\$B\$3:\$B\$5002,\$B\$2:\$B\$5002,0)),ROW(\$B\$3:\$B\$5002)-ROW(\$B\$3)+1),1)),"") 0,0078125

=SUM(INDEX((\$C\$3:\$C\$9002=F3)*(\$B\$3:\$B\$9002<>"")/COUNTIFS(\$C\$3:\$C\$9002,\$C\$3:\$C\$9002&"",\$B\$3:\$B\$9002,\$B\$3:\$B\$9002&""),)) 17,1875
=SUMPRODUCT((\$C\$3:\$C\$9002=F3)*(\$B\$3:\$B\$9002<>""),1/COUNTIFS(\$C\$3:\$C\$9002,\$C\$3:\$C\$9002&"",\$B\$3:\$B\$9002,\$B\$3:\$B\$9002&"")) 16,74609
=IFERROR(SUM(IF(FREQUENCY(IF(\$C\$3:\$C\$9002=F3,MATCH(\$B\$3:\$B\$9002,\$B\$2:\$B\$9002,0)),ROW(\$B\$3:\$B\$9002)-ROW(\$B\$3)+1),1)),"") 0,015625

=SUM(INDEX((\$C\$3:\$C\$18002=F3)*(\$B\$3:\$B\$18002<>"")/COUNTIFS(\$C\$3:\$C\$18002,\$C\$3:\$C\$18002&"",\$B\$3:\$B\$18002,\$B\$3:\$B\$18002&""),)) 67,43359
=SUMPRODUCT((\$C\$3:\$C\$18002=F3)*(\$B\$3:\$B\$18002<>""),1/COUNTIFS(\$C\$3:\$C\$18002,\$C\$3:\$C\$18002&"",\$B\$3:\$B\$18002,\$B\$3:\$B\$18002&"")) 69,51172
=IFERROR(SUM(IF(FREQUENCY(IF(\$C\$3:\$C\$18002=F3,MATCH(\$B\$3:\$B\$18002,\$B\$2:\$B\$18002,0)),ROW(\$B\$3:\$B\$18002)-ROW(\$B\$3)+1),1)),"") 0,03125

However
FREQUENCY could be extracted
array formula
Formula:
`Please Login or Register  to view this content.`
is shorter and power also

=SUM(INDEX((\$C\$3:\$C\$18002=F3)*(\$B\$3:\$B\$18002<>"")/COUNTIFS(\$C\$3:\$C\$18002,\$C\$3:\$C\$18002&"",\$B\$3:\$B\$18002,\$B\$3:\$B\$18002&""),)) 66,38672
=SUMPRODUCT((\$C\$3:\$C\$18002=F3)*(\$B\$3:\$B\$18002<>""),1/COUNTIFS(\$C\$3:\$C\$18002,\$C\$3:\$C\$18002&"",\$B\$3:\$B\$18002,\$B\$3:\$B\$18002&"")) 65,95703
=IFERROR(SUM(IF(FREQUENCY(IF(\$C\$3:\$C\$18002=F3,MATCH(\$B\$3:\$B\$18002,\$B\$3:\$B\$18002,0)),ROW(\$B\$3:\$B\$18002)-ROW(\$B\$3)+1),1)),"") 0,0234375
=IFERROR(SUM(--IF(\$C\$3:\$C\$18002=F3,MATCH(\$B\$3:\$B\$18002,\$B\$3:\$B\$18002,0)=ROW(\$B\$3:\$B\$18002)-ROW(\$B\$3)+1)),"") 0,02734375
=IFERROR(SUM(--IF(\$C\$3:\$C\$18002=F3,MATCH(\$B\$3:\$B\$18002,\$B:\$B,)=ROW(\$B\$3:\$B\$18002))),"") 0,0234375

There are currently 1 users browsing this thread. (0 members and 1 guests)