Deleting Duplicates, All records unique

  1. #1
    Registered User
    Join Date
    01-11-2006
    Posts
    9

    Deleting Duplicates, All records unique

    I want to delete the original and the duplicate of ALL rows that have duplicate filenames.

    My column headers are: id, filename, location, and description.
    All descriptions are unique.
    My filename column has duplicates. For example, flower010104.jpg is listed twice, with two different descriptions. I want to delete BOTH rows containing flower010104.jpg, NOT JUST the 2nd one (the duplicate).

    So, I want to delete all ROWS with duplicate filenames, regardless of the description being unique (which makes the 'record' unique). I have found that I can only filter by 'unique record', but ALL records are unique, due to the description.

    I need help. How can I accomplish my task?
    Last edited by mirdonamy; 01-11-2006 at 03:00 PM.

  2. #2
    Registered User
    Join Date
    01-11-2006
    Posts
    9
    If this is impossible, please let me know! Doing this manually is taking forever!

  3. #3
    Bernie Deitrick
    Guest

    Re: Deleting Duplicates, All records unique

    mirdonamy,

    Use another column with a formula like this in row 2:

    =COUNTIF(B:B,B2)>1

    Where column B has your filenames. Then copy down to match your data table, then filter or sort
    based on that column, and delete rows where the value of your formula is TRUE.
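
For anyone who would rather script this outside Excel, the same counting idea can be sketched in Python. This is only an illustration, not part of the answer above: the function name `drop_all_duplicated`, the dictionary layout, and the sample rows (apart from flower010104.jpg) are assumptions made up for the example.

```python
from collections import Counter


def drop_all_duplicated(rows, key="filename"):
    """Return only the rows whose key value occurs exactly once."""
    # Count every value first, much as COUNTIF(B:B,B2) does for each row.
    # COUNTIF ignores case, so the values are lower-cased here to match it.
    counts = Counter(row[key].lower() for row in rows)
    # Keep a row only when its filename was seen once in the whole column.
    return [row for row in rows if counts[row[key].lower()] == 1]


# Hypothetical sample data: two rows share a filename, one does not.
rows = [
    {"id": 1, "filename": "flower010104.jpg", "description": "red rose"},
    {"id": 2, "filename": "tree020204.jpg", "description": "oak"},
    {"id": 3, "filename": "flower010104.jpg", "description": "white rose"},
]
kept = drop_all_duplicated(rows)
print([row["id"] for row in kept])  # prints [2]
```

Both flower010104.jpg rows are dropped, not just the second one, because the count is taken over the whole column before any row is removed.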

    HTH,
    Bernie
    MS Excel MVP

  4. #4
    Registered User
    Join Date
    01-11-2006
    Posts
    9
    That's a pretty impressive formula, but here's the odd thing... TRUE only brought up 22 records (all duplicate filenames, just as I wanted). However, it didn't bring up the other 700+ records that have duplicate filenames. I can't quite understand why this happened.

    Just a note: for each of these filenames, one row is filled in completely (all the way across), while the duplicate rows have nothing filled in except the filename column. Does this affect the formula?


  5. #5
    Pete
    Guest

    Re: Deleting Duplicates, All records unique

    Here's another fairly quick way. I assume your data is not sorted by
    filename and I presume you want to keep the sequence you have at the
    moment. Assume your four fields occupy columns A to D, and that the
    data starts in row 2 (after the headings) and goes down to row 5000.

    Add the heading "seq" in column E and in E2 enter 1. Highlight cells E2
    to E5000 then Edit | Fill | Series and check Linear with a step value
    of 1. Click OK - this will fill a sequence down this column to enable
    you to get the data back into the same order.

    Highlight A1 to E5000 and sort the data using filename (column B). Add
    the heading "Check" in column F, and in cell F2 enter the following
    formula:

    =IF(OR(B2=B1,B2=B3),"duplicate","unique")

    Copy this down to F5000 (double-click the fill handle). Select Data |
    Filter | Autofilter (on). Filter column F for "duplicate". Highlight
    all visible rows from row 2 to row 5000 (leave the heading row alone),
    and Edit | Delete Row. Use the filter pull-down on column F to select
    "All", then Data | Filter | Autofilter (off).

    Re-sort the remaining data using column E (seq) for sort order.
    Finally, delete columns E and F.
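
The steps above (number the rows, sort by filename, compare each row with its neighbours, delete, re-sort) can also be sketched in Python for comparison. This is a rough illustration under assumptions, not a replacement for the Excel procedure: the function name `drop_duplicates_sorted` and the sample rows other than flower010104.jpg are hypothetical.

```python
def drop_duplicates_sorted(rows, key="filename"):
    """Drop every row whose key matches a neighbour after sorting."""
    # Excel's = comparison ignores case, so compare lower-cased names.
    def name(pair):
        return pair[1][key].lower()

    # Step 1: remember each row's original position (the "seq" column).
    indexed = list(enumerate(rows))
    # Step 2: sort by filename so equal names end up next to each other.
    indexed.sort(key=name)
    # Step 3: flag a row if it equals the row before or the row after,
    # which is what =IF(OR(B2=B1,B2=B3),...) checks.
    kept = []
    for i, pair in enumerate(indexed):
        same_as_prev = i > 0 and name(indexed[i - 1]) == name(pair)
        same_as_next = i + 1 < len(indexed) and name(indexed[i + 1]) == name(pair)
        if not (same_as_prev or same_as_next):
            kept.append(pair)
    # Step 4: put the survivors back in their original order.
    kept.sort(key=lambda pair: pair[0])
    return [row for _, row in kept]


# Hypothetical sample data in an unsorted order.
sample = [
    {"id": 1, "filename": "flower010104.jpg"},
    {"id": 2, "filename": "tree020204.jpg"},
    {"id": 3, "filename": "flower010104.jpg"},
    {"id": 4, "filename": "bird030304.jpg"},
]
remaining = drop_duplicates_sorted(sample)
print([row["id"] for row in remaining])  # prints [2, 4]
```

The neighbour comparison only works because the data is sorted first; on unsorted data two identical filenames could sit far apart and neither would be flagged.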

    Hope this helps.

    Pete


  6. #6
    Pete
    Guest

    Re: Deleting Duplicates, All records unique

    Note, you will get some #REF! errors in column F after you have
    deleted the rows, but this does not matter.

    Pete


  7. #7
    Registered User
    Join Date
    01-11-2006
    Posts
    9
    You are brilliant!!! Thank you so much! You saved my day and gave me back hours of my life! Thank you thank you!

    I am so appreciative!
    Arielle


  8. #8
    Pete
    Guest

    Re: Deleting Duplicates, All records unique

    Well, thanks very much for the feedback - I didn't expect such praise!

    Pete

