Excel 2007 : Removing duplicates with a condition

**MattyNinja** · 03-19-2010, 01:56 PM

I have attached an example of my data set containing 12 of 143,000 rows

The real data is in Rows A though E, I have added a formula in row F to tell me how many boxes are filled in between B and E. (thanks to SHG)
I am trying to remove duplicate entries from this sheet, but when I use a simple remove duplicate function, excel 2007 just deletes all instances after the first.

I am looking for a conditional removal of duplicates, I want it to retain the higher value( column f )of the duplicates.

An example would be…
There are 2 “abgent” the first has just phone and fax, the second has phone, fax, and email; hence the values titled “rating” in column f

I would delete row 2 containing “abgent” where the data set is less complete

Rows 4 and 5 contain “abgent, inc.” which I do not want considered as a match for “abgent”

I would call what I am looking for a “removal of duplicates with condition”

Any help would be greatly appreciated!

**MattyNinja** · 03-19-2010, 01:57 PM

here is the attachment

RSpecianJr · 03-20-2010, 03:06 PM

Hey MattyNinja,

I have attached an adapted workbook with a macro in it that will go through and remove duplicates based on your criteria.

I didn't spend a lot of time on it so it could be improved a bit. For instance, I might add a way to combine data from two incomplete rows.

i.e. in your example you have two "Abgent, Inc.". They both contain two pieces of information, but they aren't the same two pieces of information. One has Email Address and the other has a Fax Number. With the current macro, it would delete the first one, the one with the fax.

If you have any questions or if for some reason this doesn't work, let me know and I'll see what I can do.

Hope this helps,

Robert Specian Jr.

Note: if your workbook has a whole lot of data in it, it might take a little while to run. Be patient. And as always, keep a backup. = )

**MattyNinja** · 03-26-2010, 10:45 AM

Thank you for your help!

You were not kidding about the speed, I turned standby of on my computer, took of the side of the case, turned my monitor off, and let it run for well over 48 hours. It had not finished due to the way it recalculated after every delete.

I do not know VB, but i was able to remove the part of your macro that deleted the cells and cleared the values of "keep/delete" so the macro just marked them, this was much more manageable.

I then used a sort on the keep/delete column and manually deleted them.

Im not sure if you work with access, but now that I have my raw data cleaned, i need a database that can queue multiple items.

here is my thread discussing my next issue

http://www.excelforum.com/access-tab...ht-for-me.html