
Eliminate duplicates in comma sep strings

  1. #1
    Forum Contributor JP Romano
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2010
    Posts
    500

    Eliminate duplicates in comma sep strings

    Good day, all...

    I'm trying to find a way to eliminate duplicate values in a comma separated string. For example, in cell "F1" I have the following text:

    EMAIL, BULLETIN BOARDS, EMAIL, MESSAGING, RETRACT MESSAGES, SCROLLING MESSAGES, RETRACT MESSAGES

    And I'd like to end up with

    EMAIL, BULLETIN BOARDS, MESSAGING, RETRACT MESSAGES, SCROLLING MESSAGES
    ** the second instances of "EMAIL" and "RETRACT MESSAGES" have been removed.


    I have nearly 3,750 rows to go through, ranging from 2 to 88 phrases.

    I suppose I could do a text to columns to get one phrase per cell, then alphabetize horizontally, then run a loop through each of the cells, but that seems horribly inefficient.
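    For what it's worth, the underlying idea (split on the delimiter, keep only the first occurrence of each phrase) doesn't need text-to-columns or sorting at all. Here it is sketched in Python rather than VBA, just to show the logic; the function name is made up for this sketch:

    ```python
    def dedupe_phrases(s, sep=", "):
        """Split a delimited string and keep only the first
        occurrence of each phrase, preserving original order."""
        seen = set()
        out = []
        for phrase in s.split(sep):
            if phrase not in seen:
                seen.add(phrase)
                out.append(phrase)
        return sep.join(out)

    text = ("EMAIL, BULLETIN BOARDS, EMAIL, MESSAGING, "
            "RETRACT MESSAGES, SCROLLING MESSAGES, RETRACT MESSAGES")
    print(dedupe_phrases(text))
    # EMAIL, BULLETIN BOARDS, MESSAGING, RETRACT MESSAGES, SCROLLING MESSAGES
    ```

    The same split / keep-first loop translates directly to VBA with `Split` and a keyed `Collection` or `Scripting.Dictionary`.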

    If any of you could lend a hand, I'd appreciate whatever guidance you're offering.

    Thanks!
    Attached Files
    Last edited by JP Romano; 03-24-2011 at 05:10 PM. Reason: SOLVED

  2. #2
    Forum Expert
    Join Date
    11-29-2010
    Location
    Ukraine
    MS-Off Ver
    Excel 2019
    Posts
    4,168

    Re: Eliminate duplicates in comma sep strings

    hi, JP Romano, can you provide a sample workbook with data?

  3. #3
    Forum Contributor JP Romano
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2010
    Posts
    500

    Re: Eliminate duplicates in comma sep strings

    Okay, took a while for some reason, but I added the attachment to my original post... it's just a small subset of the data. Thanks again for any guidance!

  4. #4
    Administrator 6StringJazzer
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,734

    Re: Eliminate duplicates in comma sep strings

    This code will provide the cleaned up text in column G

    Jeff
    Read the rules
    Use code tags to [code]enclose your code![/code]

  5. #5
    Forum Contributor JP Romano
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2010
    Posts
    500

    Re: Eliminate duplicates in comma sep strings

    6String - that did it. Thank you so much! I'll try to get a better feel for how it does what it does so I can learn more about it. Very cool stuff...thank you!

    Watersev - thanks a million for your time!

  6. #6
    Forum Expert
    Join Date
    11-29-2010
    Location
    Ukraine
    MS-Off Ver
    Excel 2019
    Posts
    4,168

    Re: Eliminate duplicates in comma sep strings

    hi again, I know the thread is marked as solved, and the offered code is good, but I would not say it is efficient in terms of the time to process the attached test file data. Please check the attachment and run the code "test".

    Time to process the attached example file (5,751 rows of data):

    Offered code: 26.7 seconds
    My code: 2.8 seconds

    If you are interested in analyzing the offered codes, you'll find only two small things (apart from the main part, finding unique values) missing from one of them, but they make a big difference in the end. If you apply them to 6StringJazzer's code, you will get better results than either of the timings above. To keep the intrigue, I will leave the code analysis to you.
    Attached Files
    Last edited by watersev; 03-24-2011 at 07:23 PM.

  7. #7
    Administrator 6StringJazzer
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,734

    Re: Eliminate duplicates in comma sep strings

    watersev uses some very astute techniques worth noting.

    The code turns off Application.ScreenUpdating. This is responsible for most of the time savings. Adding this to my code, which should have been there in the first place, reduces a huge amount of I/O overhead.

    The code loads data into an array from the worksheet, and dumps it back out again from an array. This also saves I/O time vs. reading and writing each cell individually.

    The code adds each keyword to a collection, which causes an error if a duplicate is added. The error is handled by simply not including that duplicate keyword. This may be faster than using InStr to scan for an existing keyword. I suppose that the collection uses indexing or hashing, so this could be faster than searching an entire string, but I'm not sure if it's a significant savings.
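    The trade-off described above (hashed membership test vs. rescanning a growing string the way InStr does) can be illustrated outside VBA. A Python sketch, with both approaches producing the same result; the function names and the comma-padding trick in the scan version are just for this sketch:

    ```python
    def dedupe_hashed(phrases):
        # Hash-based membership: each lookup is O(1) on average,
        # analogous to adding keys to a keyed Collection/Dictionary
        # and letting a duplicate key be rejected.
        seen = set()
        out = []
        for p in phrases:
            if p not in seen:
                seen.add(p)
                out.append(p)
        return out

    def dedupe_scan(phrases):
        # Linear scan: rebuild the result string and search it with a
        # substring test, the way InStr would; each lookup is O(n).
        # Padding with ", " avoids false hits on partial phrases.
        result = ""
        for p in phrases:
            if ", " + p + ", " not in ", " + result + ", ":
                result = p if not result else result + ", " + p
        return result.split(", ")

    items = ["EMAIL", "BULLETIN BOARDS", "EMAIL", "RETRACT MESSAGES"]
    print(dedupe_hashed(items))
    # ['EMAIL', 'BULLETIN BOARDS', 'RETRACT MESSAGES']
    ```

    Both give the same answer; the difference only shows up in running time as the number of phrases per cell grows.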
