+ Reply to Thread
Results 1 to 8 of 8

Excel formula question re duplicates

  1. #1
    Registered User
    Join Date
    05-18-2007
    Posts
    8

    Excel formula question re duplicates

    Hi,

    I'm not a very experienced Excel user, so I apologize if this question is very basic. I briefly searched through the forums for an answer on it, but was overwhelmed by the amount of posts re removing duplicates.

    I have a list of data that looks something like this:

    ____A_____B____C
    1__John___x____y
    2__Mary___x____y
    3__John___x____
    4__Tom___x____
    5__Jim____x____y

    I want to remove duplicate rows in which column A and B are the same (here, rows 1 and 3). But, in the instance that there are two such rows, I only want to remove the duplicate row that lacks a value in C (in the above case: remove the "John,x" row, keep the "John,x,y" row). The actual data I'm working with involves several thousands of rows, so doing it manually is out of the question. Any ideas how best to do this? Would it involve a formula? the filter only unique values function? something more obvious that I'm missing?

    Any help is greatly appreciated!

  2. #2
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885
    How about something like the following code. This code assumes your data starts in A2. If otherwise, adjust the code.
    Please Login or Register  to view this content.
    I'd recommend saving a backup copy of your spreadsheet before running this against it in case it's not what you're looking for. Once you run a macro, there's no UNDO option.

    UPDATE: I tried running my code on a 100 row example, and it removed the duplicate rows that had no value in column C almost instantly, but the code kept running for several minutes and kept going back through the i and j series' (I debugged the code while it was running and saw this). Perhaps someone more familiar with loops could assist in having it only run through all of the j values once for each i value? If you try it on a 20-row example, you'll see it work (remove the offending rows) instantly, but then continue processing for a few seconds.
    Last edited by Paul; 05-18-2007 at 02:00 PM.

  3. #3
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885
    I modified my code to include
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    at the beginning, and set them back to true at the end. This fixed the endless looping and screen flashing. On the 100-row example it took a couple seconds to finish.

    Here you go:
    Please Login or Register  to view this content.

  4. #4
    Registered User
    Join Date
    05-18-2007
    Posts
    8
    Thanks for the suggestion. Unfortunately, although the code makes sense in writing to me, it does seem to freeze up my Excel with this load of data.

    Perhaps it would speed up/fix it if "i" were just compared to the rows after it instead of the whole range again? After all, there's no need to scan over the previous rows as any duplicates should have already been caught and deleted, right?

    Would you know how I'd change this up to fix that?
    Last edited by consulfirmin; 05-18-2007 at 04:17 PM.

  5. #5
    Registered User
    Join Date
    05-18-2007
    Posts
    8
    Can I just change the j "For" loop to the following:

    For j = i +1 To iLastRow

    (If there's something obviously wrong with this suggestion, apologies again for my lack of coding knowledge.)

  6. #6
    Forum Expert Paul's Avatar
    Join Date
    02-05-2007
    Location
    Wisconsin
    MS-Off Ver
    2016/365
    Posts
    6,885
    I thought about using j=i+1 to start, but then realized something:

    What if your rows looked like this:

    __________A______B_____C___
    Row 20....John.....x
    Row 21....Paul.....x.......y
    Row 22....Mary.....x.......y
    Row 23....Paul.....x
    Row 24....John.....x.......y

    Using j=i+1, you would never check the rows above the current j value (or i value for that matter). So using that, it would end up deleting row 23 because the 'Paul, x, y' row appears before the 'Paul, x' row. However, look at the two rows for John - the row where column C is blank is ABOVE the row with a value in column C.

    What that means is if you disregard checking rows ABOVE the current row for all values of j, then you would miss the fact that row 20 should be deleted (and 24 should remain).

    Does that make sense, or did I make an incorrect assumption?

  7. #7
    Registered User
    Join Date
    05-18-2007
    Posts
    8
    But, in your example, wouldn't Row 20 have already been deleted by the time i = 24? because...

    when i =20 and j = 24, the program sees that i is missing a value in C but that j has it and subsequently deletes the i row.

    Which means there's no need to ever check above, no?

    I admit though that I may be missing something...

  8. #8
    Registered User
    Join Date
    05-18-2007
    Posts
    8
    Sorry to resurrect a topic that was dead (possibly for good reason), but I'm just curious if anyone has any additional suggestions for solving the Excel duplicate issue I mentioned in my first post. The suggested program makes sense to me but seems to just freeze up on the large amount of data I'm running it on. I figure there has to be a rather straightforward solution that I'm just not thinking of!

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1