+ Reply to Thread
Results 1 to 4 of 4

remove duplicates based on partial text match

  1. #1
    Registered User
    Join Date
    06-15-2011
    Location
    Charleston SC
    MS-Off Ver
    Excel 2007
    Posts
    2

    remove duplicates based on partial text match

    So I have been toying with a way to do this in excel, and my brain is not big enough to figure it out apparently.

    I have a massive spreadsheet with web URLs in column A and other columns. Want to remove all but the first occurrence of a root URL from the sheet.

    Col A example:

    http://www.anydomain.com/userid=12
    http://www.anydomain.com/services
    http://anydomain.com/services/righton.asp
    http://anydomain.com/jartnaff.php

    So end result would be to generate a duplicate free spreadsheet that contains unique value rows between the // and the first / which in the above example would be:

    http://www.anydomain.com/userid=12
    http://anydomain.com/services/righton.asp

    any help would be appreciated, I tried several things.

    THanks.

  2. #2
    Forum Expert NBVC's Avatar
    Join Date
    12-06-2006
    Location
    Mississauga, CANADA
    MS-Off Ver
    2003:2010
    Posts
    34,898

    Re: remove duplicates based on partial text match

    Try this.

    Add a helper column to identify the section between the slashes...

    So if your data starts in A1, then in B1 enter formula:

    =MID(A1,FIND("//",A1),FIND("^^",SUBSTITUTE(A1,"/","^^",3))-FIND("//",A1)+1)

    copied down.

    Then select columns A and B and go to Data tab and click Remove Duplicates from the Data tools section of the ribbon.

    Uncheck Column A and leave only Column B selected.

    Click ok.

    You can then delete the formula column.
    Where there is a will there are many ways.

    If you are happy with the results, please add to the contributor's reputation by clicking the reputation icon (star icon) below left corner

    Please also mark the thread as Solved once it is solved. Check the FAQ's to see how.

  3. #3
    Registered User
    Join Date
    06-15-2011
    Location
    Charleston SC
    MS-Off Ver
    Excel 2007
    Posts
    2

    Re: remove duplicates based on partial text match

    NBVC,

    I tried it and it works like a charm. Thanks much! So that I understand the syntax here, I get the find "//" but you lose me after that with the find "^^" and also the ,3 +1 . Can you explain a bit? Grazie

  4. #4
    Forum Expert NBVC's Avatar
    Join Date
    12-06-2006
    Location
    Mississauga, CANADA
    MS-Off Ver
    2003:2010
    Posts
    34,898

    Re: remove duplicates based on partial text match

    This part: FIND("^^",SUBSTITUTE(A1,"/","^^",3))-FIND("//",A1)+1 is the "number of characters" argument of the parent MID function.

    So, basically, because there are known to be 3 slashes in each cell, I want to substitute the 3rd one with a random, but likely not used symbol, e.g. ^^.

    In this function, then we replace the 3rd / with a ^^ and then Find it's position and then subtract the position where the first double slash // occurs and add 1 to include that last slash.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1