+ Reply to Thread
Results 1 to 7 of 7

Thread: Eliminate duplicates in comma sep strings

  1. #1
    Valued Forum Contributor JP Romano's Avatar
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2007
    Posts
    345

    Eliminate duplicates in comma sep strings

    Good day, all...

    I'm trying to find a way to eliminate duplicate values in a comma separated string. For example, in cell "F1" I have the following text:

    EMAIL, BULLETIN BOARDS, EMAIL, MESSAGING, RETRACT MESSAGES, SCROLLING MESSAGES, RETRACT MESSAGES

    And I'd like to end up with

    EMAIL, BULLETIN BOARDS, MESSAGING, RETRACT MESSAGES, SCROLLING MESSAGES
    ** the second instance of "EMAIL" and "RETRACT MESSAGES" have been removed.


    I have nearly 3,750 rows to go through, ranging from 2 to 88 phrases.

    I suppose I could do a text to columns, and have 1 phrase per cell, then alphabatize horizontally, then run a loop through each of the cells, but that seems horribly inefficient.

    If any of you could lend a hand, I'd appreciate whatever guidance you're offering.

    Thanks!
    Attached Files Attached Files
    Last edited by JP Romano; 03-24-2011 at 05:10 PM. Reason: SOLVED

  2. #2
    Valued Forum Contributor
    Join Date
    11-29-2010
    Location
    Ukraine
    MS-Off Ver
    Excel 2003
    Posts
    2,488

    Re: Eliminate duplicates in comma sep strings

    hi, JP Romano, can you provide a sample workbook with data?

  3. #3
    Valued Forum Contributor JP Romano's Avatar
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2007
    Posts
    345

    Re: Eliminate duplicates in comma sep strings

    Okay, took a while for some reason, but I added the attachment to my original post... it's just a small subset of the data. Thanks again for any guidance!

  4. #4
    Forum Guru 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Vienna, VA, USA
    MS-Off Ver
    Excel 2003, 2007
    Posts
    2,257

    Re: Eliminate duplicates in comma sep strings

    This code will provide the cleaned up text in column G

    Private Sub Cleanup()
    
       Dim c As Range
       Dim a As Variant
       Dim i As Long
       Dim result As String
       
       Set c = Range("F2")
       Do While c <> ""
       
          a = Split(c, ",")
          result = ","
    
          For i = LBound(a) To UBound(a)
             If InStr(1, result, a(i)) = 0 Then
                result = result & a(i) & ","
             End If
          Next i
    
          ' Lop off leading and trailing commas
          result = Mid(result, 2, Len(result) - 2)
       
          c.Offset(0, 1) = result
          
          Set c = c.Offset(1, 0)
          
       Loop
    
    End Sub
    Making the world a better place one fret at a time
    ||||||

    If someone helped you, please click on the star icon at the bottom of their post

    If your problem is solved, please update the first post:
    EDIT, Go Advanced button, set Prefix to SOLVED

    [code]
    ' Enclose code in tags like this
    [/code]

    Don't attach a screenshot
    --just attach your Excel file! It's easier and will let us experiment with your data, formulas, and code.

  5. #5
    Valued Forum Contributor JP Romano's Avatar
    Join Date
    10-09-2008
    Location
    Princeton, NJ
    MS-Off Ver
    2007
    Posts
    345

    Re: Eliminate duplicates in comma sep strings

    6String - that did it. Thank you so much! I'll try to get a better feel for how it does what it does so I can learn more about it. Very cool stuff...thank you!

    Watersev - thanks a million for your time!

  6. #6
    Valued Forum Contributor
    Join Date
    11-29-2010
    Location
    Ukraine
    MS-Off Ver
    Excel 2003
    Posts
    2,488

    Re: Eliminate duplicates in comma sep strings

    hi, again, I know the thread is marked as solved and the offered code is good I would not say it is efficient in terms of time to process test file data attached, please check attachment and run code "test".

    Time to process attached file example - 5751 rows of data:

    Offered code: 26,7 seconds
    My code: 2,8 seconds

    If you are interested in analyzing the offered codes you'll find only two small things (apart from the main part - finding unique values) not used in one of them but they do make a big difference at the end. If to apply them to 6StringJazzer code you will get better results than both results provided above. To keep the intrigue I will leave codes analysis to you.
    Attached Files Attached Files
    Last edited by watersev; 03-24-2011 at 07:23 PM.

  7. #7
    Forum Guru 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Vienna, VA, USA
    MS-Off Ver
    Excel 2003, 2007
    Posts
    2,257

    Re: Eliminate duplicates in comma sep strings

    watersev uses some very astute techniques worth noting.

    The code turns off Application.ScreenUpdating. This is responsible for most of the time savings. Adding this to my code, which should have been there in the first place, reduces a huge amount of I/O overhead.

    The code loads data into an array from the worksheet, and dumps it back out again from an array. This also saves I/O time vs. reading and writing each cell individually.

    The code adds each keyword to a collection, which causes an error if a duplicate is added. The error is handled by simply not including that duplicate keyword. This may be faster than using InStr to scan for an existing keyword. I suppose that the collection uses indexing or hashing, so this could be faster than searching an entire string, but I'm not sure if it's a significant savings.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.2.0