# Compare words between two cells and get matching percentage

1. ## Compare words between two cells and get matching percentage

Hi all,

I have a column with 10,000+ lines, and lots of them are duplicated.
I have a formula that tells me how many times they are duplicated, but it only looks for 100% matching cells.

Now, lots of cells are partially duplicated, for example:

A1 - Team Abcde
A2 - Team Abcde Fghi
B1 - Team Abc

I would like to have a formula giving me a percentage of how two cells are similar to each others by checking the words.
In this case, B1 is 50% similar to A1 because only the word 'Team' is duplicated.
B1 is only 33,34% similar to A2.

I hope any of you can give me a hand on this, thanks :-)

2. ## Re: Compare words between two cells and get matching percentage

disregard this post...I had a formula here but it didn't work.

3. ## Re: Compare words between two cells and get matching percentage

Originally Posted by paokun
Hi all,

I have a column with 10,000+ lines, and lots of them are duplicated.
I have a formula that tells me how many times they are duplicated, but it only looks for 100% matching cells.

Now, lots of cells are partially duplicated, for example:

A1 - Team Abcde
A2 - Team Abcde Fghi
B1 - Team Abc

I would like to have a formula giving me a percentage of how two cells are similar to each others by checking the words.
In this case, B1 is 50% similar to A1 because only the word 'Team' is duplicated.
B1 is only 33,34% similar to A2.

I hope any of you can give me a hand on this, thanks :-)

I was able to do this with a Levenshtein formula. I borrowed the VBA algorithm from here:

http://en.wikibooks.org/wiki/Algorit...htein_distance

Insert a module into your workbook by hitting Alt-F11, right-click vbaproject, insert > module. Paste this code into the window:
``Please Login or Register  to view this content.``
Now you can use this formula to compare A1 and B1:

``Please Login or Register  to view this content.``
This might compare a bit more accurately than you were asking for, as it is on a character-by-character basis rather than word-by-word (among other things). This gives matches of 80% and 53.33%, but I think it should suit your purposes.

4. ## Re: Compare words between two cells and get matching percentage

I've tried the formula and it works, thing is that when I have, for example:

A1 - A look at the sea.
B1 - kkkkkkkk kkkkkk kkk

It gives me a 10.53% match, but I'd like to have 0%.

Also I would prefer to check by words. For example:

A1 - A look at the sea.
B1 - Beach look at the a.

Without comparing the words order, out of 5 words, only 4 match, so I'd like to have 80% as a result.

Pretty complicated I guess, but since you are geniuses I'm sure it's worth asking.

5. ## Re: Compare words between two cells and get matching percentage

Bump.. anyone?

6. ## Re: Compare words between two cells and get matching percentage

I think you need to clarify what happens regards

a) punctuation, eg A and a. are not a match
b) case sensitivity (eg A and a are not a match per se)

and what happens in the case of say a common word with differing frequencies, eg:

A1: my the weather is lovely
B1: my oh my what wonderful weather

In this instance what is the % output ?

Given B seemingly takes precedence in calculating % of the 6 words used in B1 3 are found, however, "my" appears with differing frequency - once in A1 and twice in B1.

7. ## Re: Compare words between two cells and get matching percentage

Hi,

a) punctuation, eg A and a. are not a match
No need to consider punctuation.

b) case sensitivity (eg A and a are not a match per se)
'Aurora' and 'aurora' should be considered different words.

Regarding the frequency words appear, for example:
A1: my the weather is lovely
B1: my oh my what wonderful weather

Each repeated word can be considered only once. So, B1 has 5 words, two of which appear in A1 too, so the output should be 40%.

Still, complicated, but I appreciate any help! Thanks a lot.

8. ## Re: Compare words between two cells and get matching percentage

The case sensitivity makes this a little more complex I think...

``Please Login or Register  to view this content.``
the above, stored in a module, could be called from a cell along the lines of:

``Please Login or Register  to view this content.``
this would generate the results outlined for your examples with the exception of:

Originally Posted by paokun
A1 - A look at the sea.
B1 - Beach look at the a.

Without comparing the words order, out of 5 words, only 4 match, so I'd like to have 80% as a result.
which given stipulated case sensitivity should actually return 60% given a <> A.
The UDF has optional third parameter (Boolean) for Case Sensitivity - default being True - if explicitly set to False in the call you would get the 80%, eg:

``Please Login or Register  to view this content.``
I hope that helps.

Note: in the UDF only a basic punctuation assumption is made (eg period delimiter) - pending complexity of your real strings you will most likely need to elaborate on this part.

9. ## Re: Compare words between two cells and get matching percentage

Wow, this is fantastic stuff!

It works like a charm, and being able to use the 'false' tag is also extremely helpful.

I've got one last question. Would it be possible to compare B1 with a range of cells in A, and show as a result the highest matching percentage?

Edit: .. and show as a result the highest matching percentage and the best matching cell?

For example:
C1 - 86%
D1 - my the weather is lovely (<- text content of the most similar cell)

10. ## Re: Compare words between two cells and get matching percentage

Edit: Unnecessary.

11. ## Re: Compare words between two cells and get matching percentage

I'm not quite sure what you're asking for... if you want the UDF to accept multiple strings for comparative purposes then perhaps the below adaptation might help ?

``Please Login or Register  to view this content.``
the above is now called along the lines of:

``Please Login or Register  to view this content.``
to return the greatest percentage

``Please Login or Register  to view this content.``
to return the associated string

In the case of matching "max" % the first string is returned and as such the above code would stop should an exact match be found - ie it won't iterate through additional cells unnecessarily.

As before you have the optional Boolean as final parameter to denote case sensitivity (default is TRUE - ie Case Sensitive matching)

(edit: and apologies for slight delay in responding - I tend to pop in and out of the forum in the afternoons)

12. ## Re: Compare words between two cells and get matching percentage

Hi, apologies from my side for the delay, I've been out on a business trip and didn't get a minute to access the forum.
I've just tried your solution and it works like a charm, you're a genius! I'm impressed, just copying and pasting your code makes the trick.

Thanks a lot, I really appreciate your big help! :-)

13. ## Re: Compare words between two cells and get matching percentage

Edit!

If I compare exactly the same sentence:
A1 - Mark scored an absolutely great point.
B1 - Mark scored an absolutely great point.

it gives me 83.33%. Removing the dot in the end would give me a 100% match.
Would it be possible to have 100% even with the dots?

Also, I've got some cells with line-breaks:
A1 - Mark scored an absolutely
great point.
B1 - Mark scored an absolutely great point.

Although the content is a 100%, because of the line-break the percentage goes down to 50%.
Also, the result text I get in D1 is:

D1 - Mark scored an absolutelygreat point.

It would be best to have it on two lines like A1.

Sorry to keep pestering you, but we're almost there!

14. ## Re: Compare words between two cells and get matching percentage

Going forward it would be a lot easier if you posted sample files so we can ensure we're both dealing with the same variations.

``Please Login or Register  to view this content.``
Note however that regards:

Originally Posted by paokun
It would be best to have it on two lines like A1.
this can only be achieved by altering the word-wrap setting on the cell itself (via Formatting) - once done the result will be per A1 (ie the line break is present it's simply not visible given the cell formatting) ... the UDF can not alter the cell's format.

15. ## Re: Compare words between two cells and get matching percentage

Thank you very much again, that was quick!
Now it's perfect, it does exactly what I need.

Thank you again very much for your big support :-)

17. ## Re: Compare words between two cells and get matching percentage

Sorry, my mistake. This code is great.

thanks a lot

18. ## Re: Compare words between two cells and get matching percentage

Will the macro work regardless of which two columns you're comparing? If I wanted to get a % between two cells but have several columns in the table..

20. ## Re: Compare words between two cells and get matching percentage

Originally Posted by stussy5555
Not to hijack this thread, but .........

21. ## Re: Compare words between two cells and get matching percentage

Originally Posted by FDibbins

22. ## Re: Compare words between two cells and get matching percentage

this message has been removed

There are currently 1 users browsing this thread. (0 members and 1 guests)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1