Excel formula question re duplicates

**consulfirmin** · 05-18-2007, 12:12 PM

Hi,

I'm not a very experienced Excel user, so I apologize if this question is very basic. I briefly searched through the forums for an answer on it, but was overwhelmed by the amount of posts re removing duplicates.

I have a list of data that looks something like this:

____A_____B____C
1__John___x____y
2__Mary___x____y
3__John___x____
4__Tom___x____
5__Jim____x____y

I want to remove duplicate rows in which column A and B are the same (here, rows 1 and 3). But, in the instance that there are two such rows, I only want to remove the duplicate row that lacks a value in C (in the above case: remove the "John,x" row, keep the "John,x,y" row). The actual data I'm working with involves several thousands of rows, so doing it manually is out of the question. Any ideas how best to do this? Would it involve a formula? the filter only unique values function? something more obvious that I'm missing?

Any help is greatly appreciated!

**Paul** · 05-18-2007, 01:45 PM

How about something like the following code. This code assumes your data starts in A2. If otherwise, adjust the code.

Please Login or Register  to view this content.

I'd recommend saving a backup copy of your spreadsheet before running this against it in case it's not what you're looking for. Once you run a macro, there's no UNDO option.

UPDATE: I tried running my code on a 100 row example, and it removed the duplicate rows that had no value in column C almost instantly, but the code kept running for several minutes and kept going back through the i and j series' (I debugged the code while it was running and saw this). Perhaps someone more familiar with loops could assist in having it only run through all of the j values once for each i value? If you try it on a 20-row example, you'll see it work (remove the offending rows) instantly, but then continue processing for a few seconds.

**Paul** · 05-18-2007, 02:18 PM

I modified my code to include
Application.ScreenUpdating = False
Application.EnableEvents = False
at the beginning, and set them back to true at the end. This fixed the endless looping and screen flashing. On the 100-row example it took a couple seconds to finish.

Here you go:

Please Login or Register  to view this content.

**consulfirmin** · 05-18-2007, 04:15 PM

Thanks for the suggestion. Unfortunately, although the code makes sense in writing to me, it does seem to freeze up my Excel with this load of data.

Perhaps it would speed up/fix it if "i" were just compared to the rows after it instead of the whole range again? After all, there's no need to scan over the previous rows as any duplicates should have already been caught and deleted, right?

Would you know how I'd change this up to fix that?

**consulfirmin** · 05-18-2007, 04:21 PM

Can I just change the j "For" loop to the following:

For j = i +1 To iLastRow

(If there's something obviously wrong with this suggestion, apologies again for my lack of coding knowledge.)

**Paul** · 05-18-2007, 04:28 PM

I thought about using j=i+1 to start, but then realized something:

What if your rows looked like this:

__________A______B_____C___
Row 20....John.....x
Row 21....Paul.....x.......y
Row 22....Mary.....x.......y
Row 23....Paul.....x
Row 24....John.....x.......y

Using j=i+1, you would never check the rows above the current j value (or i value for that matter). So using that, it would end up deleting row 23 because the 'Paul, x, y' row appears before the 'Paul, x' row. However, look at the two rows for John - the row where column C is blank is ABOVE the row with a value in column C.

What that means is if you disregard checking rows ABOVE the current row for all values of j, then you would miss the fact that row 20 should be deleted (and 24 should remain).

Does that make sense, or did I make an incorrect assumption?

**consulfirmin** · 05-18-2007, 05:53 PM

But, in your example, wouldn't Row 20 have already been deleted by the time i = 24? because...

when i =20 and j = 24, the program sees that i is missing a value in C but that j has it and subsequently deletes the i row.

Which means there's no need to ever check above, no?

I admit though that I may be missing something...

**consulfirmin** · 05-24-2007, 06:40 PM

Sorry to resurrect a topic that was dead (possibly for good reason), but I'm just curious if anyone has any additional suggestions for solving the Excel duplicate issue I mentioned in my first post. The suggested program makes sense to me but seems to just freeze up on the large amount of data I'm running it on. I figure there has to be a rather straightforward solution that I'm just not thinking of!

Excel formula question re duplicates

LinkBack

Thread Tools

Rate This Thread

Display

Excel formula question re duplicates

Thread Information

Users Browsing this Thread

Bookmarks

Bookmarks

Posting Permissions