Hi folks,
I have a question that is similar but different to an earlier post (April 26 to be precise - same title). I am a relative beginner with VB so it would be fantastic if someone could lend me a hand.
I have two columns of text each with 1,000+ lines. I want a formula that generates a percentage match for the words in cells of the same row (eg cells B1 and C1).
I have a formula (courtesy of the earlier post) that counts the number of differing characters in the two cells (see below for VBA code). What I really want is a formula that counts the number of differing words.
Example:
Cell A1: the cat sat on the mat
Cell B1: the hat was squashed by the cat
The Levenshtein formula (code and link below) gives me an output of 14 ie, there are 14 different characters in cell B1 than in cell A1. The result I am looking for is that there are 4 words used in cell B1 that are not used in cell A1.
If there is a way to measure the change in the order of words I would be interested in that also, but I realise that is probably very complicated.
Any help is much appreciated!
The formula I have was posted on this forum by SpeedingLunatic in April 2010 and is a Levenshtein measure. The VBA was borrowed from here: http://en.wikibooks.org/wiki/Algorit...htein_distance and is below...
Function levenshtein(a As String, b As String) As Integer Dim i As Integer Dim j As Integer Dim cost As Integer Dim d() As Integer Dim min1 As Integer Dim min2 As Integer Dim min3 As Integer If Len(a) = 0 Then levenshtein = Len(b) Exit Function End If If Len(b) = 0 Then levenshtein = Len(a) Exit Function End If ReDim d(Len(a), Len(b)) For i = 0 To Len(a) d(i, 0) = i Next For j = 0 To Len(b) d(0, j) = j Next For i = 1 To Len(a) For j = 1 To Len(b) If Mid(a, i, 1) = Mid(b, j, 1) Then cost = 0 Else cost = 1 End If min1 = (d(i - 1, j) + 1) min2 = (d(i, j - 1) + 1) min3 = (d(i - 1, j - 1) + cost) d(i, j) = Application.WorksheetFunction.Min(min1, min2, min3) Next Next levenshtein = d(Len(a), Len(b)) End Function
Last edited by pike; 09-13-2010 at 02:27 AM. Reason: code tags for newbie MP message
You could use this function:
Regards,Function levenshtein(a As String, b As String) As Integer Dim myWord As Variant levenshtein = 0 For Each myWord In Split(b, " ") If InStr(" " & a & " ", " " & myWord & " ") = 0 Then levenshtein = levenshtein + 1 End If Next myWord End Function
Antonio
Last edited by antoka05; 09-13-2010 at 02:47 AM.
Below is similar but has case-sensitivity.
Regards change in order - you might need to specify some examples so we can better interpret requirements.Function MISSINGWORDS(rngS1 As Range, rngS2 As Range, Optional boolCase = False) As Long Dim vS1 As Variant Dim lngW As Long, lngTemp As Long Dim vbCompare As VbCompareMethod vbCompare = IIf(boolCase, vbBinaryCompare, vbTextCompare) vS1 = Split(rngS1) For lngW = LBound(vS1) To UBound(vS1) lngTemp = lngTemp - (InStr(1, " " & rngS2.Value & " ", " " & vS1(lngW) & " ", vbCompare) = 0) Next lngW MISSINGWORDS = lngTemp End Function
Note: regards both UDFs presented - both assume a consistent delimiter between words (ie no punctuation per se) - if this is not reality you will need to clean the strings before comparing.
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
if you also want to take the words' order into account :Sub snb_paradigm() For Each wd In Split(Cells(1, 1)) x1 = x1 + UBound(Filter(Split(Cells(1,2), wd)) + 1 Next End Sub
In this last case you can choose to weigh the different matches: a return of 4 consecutive words can be multiplied by 4, a match of 3 consecutive words can be multiplied by 3.Sub snb_paradigm_extended() For Each wd In Split(Cells(1, 1)) x1 = x1 + UBound(Filter(Split(Cells(1, 2)), wd)) + 1 Next sq = Split(Cells(1, 1)) For jj = 1 To UBound(sq) x2 = 0 For j = 0 To UBound(sq) - jj c01 = "" For jjj = 0 To jj c01 = IIf(c01 = "", "", c01 & " ") & sq(j + jjj) Next x2 = x2 + UBound(Split(Cells(1, 2), c01)) Next If x2 = 0 Then Exit For x3 = x3 + x2 Next End Sub
It's also possible to count the different kind of matches in separate variables or array-elements.
Last edited by snb; 09-13-2010 at 04:10 AM.
[SOLVED} Thanks very much to the three of you for you suggestions, and apologies that it has taken me a couple of days to get back to the forum.
The code you supplied is perfect for my needs.
Cheers!
how do i use the vb script that snb posted above ? how do i insert it and where
You can put the code in the sheet-module in which you want to count the words.
If you put them in another module (macro/workbook/userform module) you'll have to add the sheetname the cells are in.
NB. the preceding period: .Cells(...
Sub snb_paradigm() With sheets(1) For Each wd In Split(.cells(1,1)) x1 = x1 + UBound(Filter(Split(.Cells(1,2), wd)) + 1 Next End With End SubSub snb_paradigm_extended() With sheets(1) For Each wd In Split(.Cells(1, 1)) x1 = x1 + UBound(Filter(Split(.Cells(1, 2)), wd)) + 1 Next sq = Split(.Cells(1, 1)) For jj = 1 To UBound(sq) x2 = 0 For j = 0 To UBound(sq) - jj c01 = "" For jjj = 0 To jj c01 = IIf(c01 = "", "", c01 & " ") & sq(j + jjj) Next x2 = x2 + UBound(Split(.Cells(1, 2), c01)) Next If x2 = 0 Then Exit For x3 = x3 + x2 Next End With End Sub
i've run your macro and for some reason i don't get any output.
the steps to runing the macro were: (using 2007 version)
alt+f11, paste in module 1 the macro,in excel i go went to view macro, select and run macro
the selected cell stays blank after running the macro (any of them)
any ideea why ?
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks