How to count number of unique occurrances starting with a text string?

**amateurmetheus** · 11-23-2010, 12:39 AM

Here's what I'm trying to accomplish: I'm trying to get a count of the # of agents in each office. Each office has two types of agents: Listing Agent, and Selling Agent. Either class could be attributed to that agent, but the agent ID will remain the same..the agent might be listed as a Listing Agent, as a Selling Agent, or as both.

I need to find the total # of agents attributed to an office based on the data drawn into the pivot table. (see the attached spreadsheet). I already have the office ranking done, but need to finish the # of agents count.

Currently, I have a pivot table set up for each of 4 tables of records (10,000-35,000 records each), which consists of agent ID#, office ID#, agent2 ID#, office2 ID#, and a few other columns of data (not relevant to this item). My initial thought was to create a hash column for office/agent data as follows (the ..... is just for spacing):

ListingofficeID_ListingAgentID#............SellingOfficeID_SellingAgentID
1234_444444....................................11235_99992
1234_333333....................................118_eeha
144_2837.........................................198_hhhegh
118_8888.........................................1234_444444
etc.

What I'm trying to do in the above scenario is count all unique occurances of each office ID# from two columns of data (which would calculate the total # of agents). The counts below would apply from the data above. (the ..... is just for spacing)

officeid.......# of unique occurrances
1234..........2
144...........1
118.......... 2
11235.......1
198...........1

With the large mass of data (roughly 100,000 total records), I'm having trouble coming up with anything that doesn't take 20 minutes to recalculate (including setting application.screenupdating = false and application.enableevents = false to start with).

Any ideas? I'm at a loss...

THANK YOU for any and help you can provide. (Sample spreadsheet is attached)

:::a

**outofthehat** · 11-23-2010, 04:18 AM

You can use COUNTIF with wildcard characters, if you have a full list of all the office numbers.

I assume your starting data is as per cols F & G on the attached Data sheet, which I reconstructed from the data you provided.

Ranking sheet:
The attached starts from a full list of office numbers in col B.
Column C calculates the number of agents in each office, using a couple of COUNTIFs and a wildcard (*), to calculate how many times that office number appears in your raw data.
Column A then generates the ranking number based on column C.

In column G, I start with a numerical list of the ranking numbers, 1-(30) - have only given 4 as an example.
H and I use VLOOKUP to extract office number and # of agents from columns A-C, giving the table listed in rank order.

**amateurmetheus** · 11-23-2010, 04:39 AM

THANK YOU, outofthehat! Your solution was almost exactly what I was looking for. The primary issue is: how to filter out the data to include only the unique values?

For instance: if you filter office 5116, it shows "14" entries. However, there are only 11 unique entries (with "11" being the value I'd be looking to obtain in this scenario.)

I have the unique ranking function completed (ranked by # of sales) but still can't figure out how to count only the unique entries based on the office ID#.

Thanks again!!! I really appreciate the help. I've spent--literally--62 *hours* on this ranking alone, and I'm still spinning my wheels. This should've been a 60-minute task.

:::a

**outofthehat** · 11-23-2010, 01:08 PM

Okay, I'm sure there's probably a more efficient way of doing this, but here's what I've come up with.

First, we need to clean those two columns up for our COUNTIF, by removing duplicates.
To do this in the attached, I've created 2 logical columns which use MATCH functions (with carefully entered $ in the formulae) to tell me whether the unique ID has appeared in earlier rows - column H for the Raw Listing data, column I for the Raw Selling data. NB, in column I, we also need to check whether the value appears in the same row, but in Raw Listings (hence 3 things in the OR). No idea how long this will take to calculate for thousands of rows...

Columns J and K then duplicate F and G, except putting 'Duplicate' in if identified in cols H and I.

We then do our Ranking COUNTIFs on columns J and K instead of F and G - and there's your 11...

I think that works...

**amateurmetheus** · 11-30-2010, 05:14 PM

Thanks again for the help on this, outofthehat! I finally got it all implemented into my spreadsheet, and it does indeed work.

I discovered (perhaps not surprisingly) that my workbook increased in size from 12mb to 70mb, and I've only added 1/3 of the additional comparison fields =( It's also slowed everything way way way down due to the large number of records (exceeding 200k records in all over 3 sheets). But: your solution worked!

I just found this little diddy that seems to work FLAWLESSLY using a single array. It will eliminate over 80-90% of the helper formulas needed, and compares defined list names.

my first list was named 'pc2_cu', and my second 'pc2_cu2'

Ctrl+Shift+Enter into the cell of your choice:

=SUM(IF(MATCH(pc2_cu,pc2_cu,0)>=(ROW(pc2_cu)-MIN(ROW(pc2_cu))+1),1,0))+SUM(IF(ISERROR(MATCH(pc2_cu2,pc2_cu,0)>=(ROW(pc2_cu2)-MIN(ROW(pc2_cu2))+1)),IF(MATCH(pc2_cu2,pc2_cu2,0)>=(ROW(pc2_cu2)-MIN(ROW(pc2_cu2))+1),1,0),0))

I have a vague idea of how it works, but it works. I can set up a pivot table for each, and get a count of unique items over two columns. The ranges are dynamic, and stretch down exactly to the last row. (I have "Grand Total" at the very bottom of each pivot table. Then, match() to find the row# of "Grand Total", and subtract 1 from the final result to get the height of the Offset() function. Seems to work pretty well, and always consistently for me thusfar.)

I'll revamp my workbook and see if the above solution works workbook-wide. If so, I should be able to reduce the size back down to under 10-12mb (fingers crossed.) I'll post my findings for y'all once I get it figured out.

Caveat: I'm not skilled at ultra-complicated functions for Excel (I found the above on a google search, located this site: http://www.get-digital-help.com/2009...umns-in-excel/ ) So I apologize, but I won't be able to tell you how to modify the above. Good luck and I hope this helps somebody out!

:::a

Originally Posted by outofthehat

Okay, I'm sure there's probably a more efficient way of doing this, but here's what I've come up with.

First, we need to clean those two columns up for our COUNTIF, by removing duplicates.
To do this in the attached, I've created 2 logical columns which use MATCH functions (with carefully entered $ in the formulae) to tell me whether the unique ID has appeared in earlier rows - column H for the Raw Listing data, column I for the Raw Selling data. NB, in column I, we also need to check whether the value appears in the same row, but in Raw Listings (hence 3 things in the OR). No idea how long this will take to calculate for thousands of rows...

Columns J and K then duplicate F and G, except putting 'Duplicate' in if identified in cols H and I.

We then do our Ranking COUNTIFs on columns J and K instead of F and G - and there's your 11...

I think that works...