Results 1 to 8 of 8

Compare words between two cells for percentage match

Threaded View

  1. #1
    Registered User
    Join Date
    09-13-2010
    Location
    Australia
    MS-Off Ver
    Excel 2003
    Posts
    2

    Compare words between two cells for percentage match

    Hi folks,

    I have a question that is similar but different to an earlier post (April 26 to be precise - same title). I am a relative beginner with VB so it would be fantastic if someone could lend me a hand.

    I have two columns of text each with 1,000+ lines. I want a formula that generates a percentage match for the words in cells of the same row (eg cells B1 and C1).

    I have a formula (courtesy of the earlier post) that counts the number of differing characters in the two cells (see below for VBA code). What I really want is a formula that counts the number of differing words.

    Example:
    Cell A1: the cat sat on the mat
    Cell B1: the hat was squashed by the cat

    The Levenshtein formula (code and link below) gives me an output of 14 ie, there are 14 different characters in cell B1 than in cell A1. The result I am looking for is that there are 4 words used in cell B1 that are not used in cell A1.

    If there is a way to measure the change in the order of words I would be interested in that also, but I realise that is probably very complicated.

    Any help is much appreciated!

    The formula I have was posted on this forum by SpeedingLunatic in April 2010 and is a Levenshtein measure. The VBA was borrowed from here: http://en.wikibooks.org/wiki/Algorit...htein_distance and is below...

    Function levenshtein(a As String, b As String) As Integer
     
        Dim i As Integer
        Dim j As Integer
        Dim cost As Integer
        Dim d() As Integer
        Dim min1 As Integer
        Dim min2 As Integer
        Dim min3 As Integer
     
        If Len(a) = 0 Then
            levenshtein = Len(b)
            Exit Function
        End If
     
        If Len(b) = 0 Then
            levenshtein = Len(a)
            Exit Function
        End If
     
        ReDim d(Len(a), Len(b))
     
        For i = 0 To Len(a)
            d(i, 0) = i
        Next
     
        For j = 0 To Len(b)
            d(0, j) = j
        Next
     
        For i = 1 To Len(a)
            For j = 1 To Len(b)
                If Mid(a, i, 1) = Mid(b, j, 1) Then
                    cost = 0
                Else
                    cost = 1
                End If
                min1 = (d(i - 1, j) + 1)
                min2 = (d(i, j - 1) + 1)
                min3 = (d(i - 1, j - 1) + cost)
                d(i, j) = Application.WorksheetFunction.Min(min1, min2, min3)
            Next
        Next
     
        levenshtein = d(Len(a), Len(b))
     
    End Function
    Last edited by pike; 09-13-2010 at 02:27 AM. Reason: code tags for newbie MP message

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1