Fuzzy string comparison / detecting "similar" strings

xirx · 02-15-2005, 06:08 PM

When dealing with real-life data, you often have minor
errors and variations in your data. E.g. I have
two lists (databases) in which names differ slightly.

Examples:

"Clark Kent" vs "Clark Kent"
"John P. Smith" vs "John Paul Smith"
"Miller Limited" vs "Miller Ltd."
"Peter Hammer" vs "Petre Hammer"

I am looking for a way to handle this (semi-)automatically.
My idea is to have a function f that takes two strings
and delivers a measure of how much they are alike. E.g.
f should be 1 if both arguments are identical, and it
should be 0 if they are "completely" different.
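
To make the idea concrete, here is a rough sketch of what such an f could look like. It is only an illustration, not a known-good solution: it assumes Python and uses the standard library's difflib.SequenceMatcher, whose ratio() already returns a value between 0 and 1. The name similarity and the case/whitespace normalization are my own assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]: 1.0 for identical strings,
    0.0 when the strings share no characters at all."""
    # Assumption: case and runs of whitespace should not count as differences.
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    # ratio() is 2 * M / T, where M is the number of matched characters
    # and T is the combined length of both strings.
    return SequenceMatcher(None, a_norm, b_norm).ratio()


if __name__ == "__main__":
    pairs = [
        ("John P. Smith", "John Paul Smith"),
        ("Miller Limited", "Miller Ltd."),
        ("Peter Hammer", "Petre Hammer"),
    ]
    for left, right in pairs:
        print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

A score like this would still need a cutoff, chosen by looking at the actual data, to decide which pairs count as "the same" and which should go to manual review.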

I am pretty sure that a lot of people have been thinking
about such a thing already, and there should be more
than one solution for this.

Any pointers?
