+ Reply to Thread
Results 1 to 22 of 22

Compare words between two cells and get matching percentage

Hybrid View

  1. #1
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Question Compare words between two cells and get matching percentage

    Hi all,

    I have a column with 10,000+ lines, and lots of them are duplicated.
    I have a formula that tells me how many times they are duplicated, but it only looks for 100% matching cells.

    Now, lots of cells are partially duplicated, for example:

    A1 - Team Abcde
    A2 - Team Abcde Fghi
    B1 - Team Abc

    I would like to have a formula giving me a percentage of how two cells are similar to each others by checking the words.
    In this case, B1 is 50% similar to A1 because only the word 'Team' is duplicated.
    B1 is only 33,34% similar to A2.

    I hope any of you can give me a hand on this, thanks :-)
    Last edited by paokun; 04-26-2010 at 11:04 AM.

  2. #2
    Registered User
    Join Date
    04-29-2008
    Location
    Tennessee
    MS-Off Ver
    2007
    Posts
    37

    Re: Compare words between two cells and get matching percentage

    disregard this post...I had a formula here but it didn't work.
    Last edited by Rebuild8; 04-19-2010 at 10:39 AM.

  3. #3
    Registered User
    Join Date
    04-15-2010
    Location
    Greeley, PA
    MS-Off Ver
    Excel 2007
    Posts
    13

    Re: Compare words between two cells and get matching percentage

    Quote Originally Posted by paokun View Post
    Hi all,

    I have a column with 10,000+ lines, and lots of them are duplicated.
    I have a formula that tells me how many times they are duplicated, but it only looks for 100% matching cells.

    Now, lots of cells are partially duplicated, for example:

    A1 - Team Abcde
    A2 - Team Abcde Fghi
    B1 - Team Abc

    I would like to have a formula giving me a percentage of how two cells are similar to each others by checking the words.
    In this case, B1 is 50% similar to A1 because only the word 'Team' is duplicated.
    B1 is only 33,34% similar to A2.

    I hope any of you can give me a hand on this, thanks :-)

    I was able to do this with a Levenshtein formula. I borrowed the VBA algorithm from here:

    http://en.wikibooks.org/wiki/Algorit...htein_distance

    Insert a module into your workbook by hitting Alt-F11, right-click vbaproject, insert > module. Paste this code into the window:
    Function levenshtein(a As String, b As String) As Integer
     
        Dim i As Integer
        Dim j As Integer
        Dim cost As Integer
        Dim d() As Integer
        Dim min1 As Integer
        Dim min2 As Integer
        Dim min3 As Integer
     
        If Len(a) = 0 Then
            levenshtein = Len(b)
            Exit Function
        End If
     
        If Len(b) = 0 Then
            levenshtein = Len(a)
            Exit Function
        End If
     
        ReDim d(Len(a), Len(b))
     
        For i = 0 To Len(a)
            d(i, 0) = i
        Next
     
        For j = 0 To Len(b)
            d(0, j) = j
        Next
     
        For i = 1 To Len(a)
            For j = 1 To Len(b)
                If Mid(a, i, 1) = Mid(b, j, 1) Then
                    cost = 0
                Else
                    cost = 1
                End If
                min1 = (d(i - 1, j) + 1)
                min2 = (d(i, j - 1) + 1)
                min3 = (d(i - 1, j - 1) + cost)
                d(i, j) = Application.WorksheetFunction.Min(min1, min2, min3)
            Next
        Next
     
        levenshtein = d(Len(a), Len(b))
     
    End Function
    Now you can use this formula to compare A1 and B1:

    =(MAX(LEN(A1), LEN(B1)) - levenshtein(A1,B1))/MAX(LEN(A1), LEN(B1))
    This might compare a bit more accurately than you were asking for, as it is on a character-by-character basis rather than word-by-word (among other things). This gives matches of 80% and 53.33%, but I think it should suit your purposes.

  4. #4
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Hi, thanks a lot for your reply!
    I've tried the formula and it works, thing is that when I have, for example:

    A1 - A look at the sea.
    B1 - kkkkkkkk kkkkkk kkk

    It gives me a 10.53% match, but I'd like to have 0%.

    Also I would prefer to check by words. For example:

    A1 - A look at the sea.
    B1 - Beach look at the a.

    Without comparing the words order, out of 5 words, only 4 match, so I'd like to have 80% as a result.

    Pretty complicated I guess, but since you are geniuses I'm sure it's worth asking.

  5. #5
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Bump.. anyone?

  6. #6
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Compare words between two cells and get matching percentage

    I think you need to clarify what happens regards

    a) punctuation, eg A and a. are not a match
    b) case sensitivity (eg A and a are not a match per se)

    and what happens in the case of say a common word with differing frequencies, eg:

    A1: my the weather is lovely
    B1: my oh my what wonderful weather

    In this instance what is the % output ?

    Given B seemingly takes precedence in calculating % of the 6 words used in B1 3 are found, however, "my" appears with differing frequency - once in A1 and twice in B1.

  7. #7
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Hi,

    Thanks a lot for your reply and questions!

    a) punctuation, eg A and a. are not a match
    No need to consider punctuation.

    b) case sensitivity (eg A and a are not a match per se)
    'Aurora' and 'aurora' should be considered different words.

    Regarding the frequency words appear, for example:
    A1: my the weather is lovely
    B1: my oh my what wonderful weather

    Each repeated word can be considered only once. So, B1 has 5 words, two of which appear in A1 too, so the output should be 40%.

    Still, complicated, but I appreciate any help! Thanks a lot.

  8. #8
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Compare words between two cells and get matching percentage

    The case sensitivity makes this a little more complex I think...

    Function CompareString(rngS1 As Range, rngS2 As Range, Optional boolCase As Boolean = True) As Double
    Dim vW1, vW2, oDic As Object, lngW As Long, lngU As Long, lngM As Long, strTemp As String 
    vW1 = Split(rngS1, " ")
    vW2 = Split(rngS2, " ")
    Set oDic = CreateObject("Scripting.Dictionary")
    For lngW = LBound(vW2) To UBound(vW2) Step 1
        strTemp = Replace(vW2(lngW), ".", "")
        With oDic
            If Not .exists(strTemp) Then
                lngU = lngU + 1
                .Add strTemp, lngU
                If boolCase Then
                    lngM = lngM + rngS1.Parent.Evaluate("SUMPRODUCT(--ISNUMBER(FIND("" " & strTemp & " "","" ""&SUBSTITUTE(" & rngS1.Address & ",""."","""")&"" "")))")
                Else
                    lngM = lngM - IsNumeric(Application.Match(strTemp,vW1,0))
                End If
            End If
        End With
    Next lngW
    Set oDic = Nothing
    CompareString = lngM / lngU
    End Function
    the above, stored in a module, could be called from a cell along the lines of:

    =COMPARESTRING(A1,B1)
    this would generate the results outlined for your examples with the exception of:

    Quote Originally Posted by paokun
    A1 - A look at the sea.
    B1 - Beach look at the a.

    Without comparing the words order, out of 5 words, only 4 match, so I'd like to have 80% as a result.
    which given stipulated case sensitivity should actually return 60% given a <> A.
    The UDF has optional third parameter (Boolean) for Case Sensitivity - default being True - if explicitly set to False in the call you would get the 80%, eg:

    =COMPARESTRING(A1,B1,FALSE)
    I hope that helps.

    Note: in the UDF only a basic punctuation assumption is made (eg period delimiter) - pending complexity of your real strings you will most likely need to elaborate on this part.

  9. #9
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Wow, this is fantastic stuff!

    Thanks a lot for your quick reply, I really appreciate it.
    It works like a charm, and being able to use the 'false' tag is also extremely helpful.

    I've got one last question. Would it be possible to compare B1 with a range of cells in A, and show as a result the highest matching percentage?

    Edit: .. and show as a result the highest matching percentage and the best matching cell?

    For example:
    C1 - 86%
    D1 - my the weather is lovely (<- text content of the most similar cell)
    Last edited by paokun; 04-21-2010 at 08:38 AM.

  10. #10
    Registered User
    Join Date
    04-15-2010
    Location
    Greeley, PA
    MS-Off Ver
    Excel 2007
    Posts
    13

    Re: Compare words between two cells and get matching percentage

    Edit: Unnecessary.
    Last edited by SpeedingLunatic; 04-21-2010 at 11:27 AM.

  11. #11
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Compare words between two cells and get matching percentage

    I'm not quite sure what you're asking for... if you want the UDF to accept multiple strings for comparative purposes then perhaps the below adaptation might help ?

    Function CompareString(rngS1 As Range, rngS2 As Range, strType As String, Optional boolCase As Boolean = True) As Variant
    Dim vW1, vW2
    Dim oDic As Object
    Dim lngW As Long, lngU As Long, lngM As Long, lngTemp As Long, rngCell As Range
    Dim strTemp As String, strB As String
    vW2 = Split(rngS2.Text, " ")
    Set oDic = CreateObject("Scripting.Dictionary")
    For lngW = LBound(vW2) To UBound(vW2) Step 1
        strTemp = Replace(vW2(lngW), ".", "")
        With oDic
            If Not .exists(strTemp) Then
                lngU = lngU + 1
                .Add strTemp, lngU
            End If
        End With
    Next lngW
    Set oDic = Nothing
    For Each rngCell In rngS1.Cells
        If rngCell.Value <> "" Then
            If rngCell.Value = strTemp Then
                lngM = lngU
                strB = strTemp
            Else
                vW1 = Split(rngCell.Text, " ")
                lngTemp = 0
                For lngW = LBound(vW2) To UBound(vW2) Step 1
                    strTemp = vW2(lngW)
                    If boolCase Then
                        lngTemp = lngTemp + rngS1.Parent.Evaluate("SUMPRODUCT(--ISNUMBER(FIND("" " & strTemp & " "","" ""&SUBSTITUTE(" & rngCell.Address & ",""."","""")&"" "")))")
                    Else
                        lngTemp = lngTemp - IsNumeric(Application.Match(strTemp, vW1, 0))
                    End If
                Next lngW
                If lngTemp > lngM Then
                    lngM = lngTemp
                    strB = rngCell.Text
                End If
            End If
        End If
    Next rngCell
    Select Case UCase(strType)
        Case "P"
            CompareString = lngM / lngU
        Case "S"
            CompareString = strB
    End Select
    End Function
    the above is now called along the lines of:

    C1: =COMPARESTRING($A$1:$A$4,$B$1,"P")
    to return the greatest percentage

    D1: =COMPARESTRING($A$1:$A$4,$B$1,"S")
    to return the associated string

    In the case of matching "max" % the first string is returned and as such the above code would stop should an exact match be found - ie it won't iterate through additional cells unnecessarily.

    As before you have the optional Boolean as final parameter to denote case sensitivity (default is TRUE - ie Case Sensitive matching)


    (edit: and apologies for slight delay in responding - I tend to pop in and out of the forum in the afternoons)
    Last edited by DonkeyOte; 04-21-2010 at 10:20 AM.

  12. #12
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Hi, apologies from my side for the delay, I've been out on a business trip and didn't get a minute to access the forum.
    I've just tried your solution and it works like a charm, you're a genius! I'm impressed, just copying and pasting your code makes the trick.

    Thanks a lot, I really appreciate your big help! :-)

  13. #13
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Edit!

    If I compare exactly the same sentence:
    A1 - Mark scored an absolutely great point.
    B1 - Mark scored an absolutely great point.

    it gives me 83.33%. Removing the dot in the end would give me a 100% match.
    Would it be possible to have 100% even with the dots?

    Also, I've got some cells with line-breaks:
    A1 - Mark scored an absolutely
    great point.
    B1 - Mark scored an absolutely great point.

    Although the content is a 100%, because of the line-break the percentage goes down to 50%.
    Also, the result text I get in D1 is:

    D1 - Mark scored an absolutelygreat point.

    It would be best to have it on two lines like A1.

    Sorry to keep pestering you, but we're almost there!

  14. #14
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Compare words between two cells and get matching percentage

    Going forward it would be a lot easier if you posted sample files so we can ensure we're both dealing with the same variations.

    Based on your post perhaps:

    Function CompareString(rngS1 As Range, rngS2 As Range, strType As String, Optional boolCase As Boolean = True) As Variant
    Dim vW1, vW2
    Dim oDic As Object
    Dim lngW As Long, lngU As Long, lngM As Long, lngTemp As Long, rngCell As Range
    Dim strTemp As String, strC As String, strB As String
    vW2 = Split(Application.WorksheetFunction.Trim(Replace(Replace(rngS2.Text, ".", ""), Chr(10), " ")), " ")
    Set oDic = CreateObject("Scripting.Dictionary")
    For lngW = LBound(vW2) To UBound(vW2) Step 1
        strTemp = vW2(lngW)
        With oDic
            If Not .exists(strTemp) Then
                lngU = lngU + 1
                .Add strTemp, lngU
            End If
        End With
    Next lngW
    Set oDic = Nothing
    For Each rngCell In rngS1.Cells
        strC = Application.WorksheetFunction.Trim(Replace(Replace(rngCell.Text, ".", ""), Chr(10), " "))
        If strC <> "" Then
            If strC = rngS2.Text Then
                lngM = lngU
                strB = rngS2.Text
            Else
                vW1 = Split(strC, " ")
                lngTemp = 0
                For lngW = LBound(vW2) To UBound(vW2) Step 1
                    strTemp = vW2(lngW)
                    If boolCase Then
                        lngTemp = lngTemp + rngS1.Parent.Evaluate("SUMPRODUCT(--ISNUMBER(FIND("" " & strTemp & " "","" " & strC & " "")))")
                    Else
                        lngTemp = lngTemp - IsNumeric(Application.Match(strTemp, vW1, 0))
                    End If
                Next lngW
                If lngTemp > lngM Then
                    lngM = lngTemp
                    strB = rngCell.Text
                End If
            End If
        End If
    Next rngCell
    Select Case UCase(strType)
        Case "P"
            CompareString = lngM / lngU
        Case "S"
            CompareString = strB
    End Select
    End Function
    Note however that regards:

    Quote Originally Posted by paokun
    It would be best to have it on two lines like A1.
    this can only be achieved by altering the word-wrap setting on the cell itself (via Formatting) - once done the result will be per A1 (ie the line break is present it's simply not visible given the cell formatting) ... the UDF can not alter the cell's format.

  15. #15
    Registered User
    Join Date
    05-25-2009
    Location
    Orlando
    MS-Off Ver
    Excel 2007
    Posts
    9

    Re: Compare words between two cells and get matching percentage

    please remove
    Last edited by stussy5555; 12-07-2016 at 03:46 PM.

  16. #16
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    53,051

    Re: Compare words between two cells and get matching percentage

    Quote Originally Posted by stussy5555 View Post
    Not to hijack this thread, but .........
    we actually have a rule about that, please start your own thread and reference this 1 if necessary
    1. Use code tags for VBA. [code] Your Code [/code] (or use the # button)
    2. If your question is resolved, mark it SOLVED using the thread tools
    3. Click on the star if you think someone helped you

    Regards
    Ford

  17. #17
    Registered User
    Join Date
    05-25-2009
    Location
    Orlando
    MS-Off Ver
    Excel 2007
    Posts
    9

    Re: Compare words between two cells and get matching percentage

    Quote Originally Posted by FDibbins View Post
    we actually have a rule about that, please start your own thread and reference this 1 if necessary
    Sorry about that, started a new thread

  18. #18
    Registered User
    Join Date
    12-01-2006
    Posts
    55

    Re: Compare words between two cells and get matching percentage

    Thank you very much again, that was quick!
    Now it's perfect, it does exactly what I need.

    Thank you again very much for your big support :-)

  19. #19
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Compare words between two cells and get matching percentage

    No problem - please remember to denote thread as solved if resolved to your satisfaction.

  20. #20
    Registered User
    Join Date
    09-07-2010
    Location
    Madrid, Spain
    MS-Off Ver
    Excel 2003
    Posts
    1

    Re: Compare words between two cells and get matching percentage

    Sorry, my mistake. This code is great.

    thanks a lot
    Last edited by dosbajistas; 09-07-2010 at 11:48 AM.

  21. #21
    Registered User
    Join Date
    05-19-2011
    Location
    Portland, OR
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Compare words between two cells and get matching percentage

    Will the macro work regardless of which two columns you're comparing? If I wanted to get a % between two cells but have several columns in the table..

    Thank you in advance.

  22. #22
    Registered User
    Join Date
    10-29-2015
    Location
    sydney
    MS-Off Ver
    2013
    Posts
    1

    Re: Compare words between two cells and get matching percentage

    this message has been removed
    Last edited by coop624; 10-10-2019 at 03:14 AM. Reason: starting new thread

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1