I have the following problem.
I have an Excel range, call it SearchRange, which has around 50 000 words in it.
A user enters a word in a cell, and I have to try to pull out of this range up to 9 other words that sound similar, and put them in an array. This almost always means that they are spelled in a similar way. But see the last example!
Here are 3 examples. The word in caps is the word the user entered, and below it are the words that are found / need to be found.
STAMP
tamp cramp lamp ramp
scamp tramp vamp clamp
Replacing the 1st letter with a wildcard would find 'tamp'.
Replacing the first 2 letters with a wildcard would find all the others, because they all end in 'amp'.
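To pin down what I mean by the wildcard idea, here is a rough sketch in Python rather than Excel, just to show the logic. The name `suffix_matches` and the tiny word list are made up for illustration; the real data would come from SearchRange.

```python
def suffix_matches(target, words, drop):
    """Return words ending with `target` minus its first `drop` letters.

    This mimics replacing the first `drop` letters with a '*' wildcard.
    """
    target = target.lower()
    suffix = target[drop:]  # e.g. "stamp" with drop=2 gives "amp"
    return [w for w in words
            if w.lower() != target and w.lower().endswith(suffix)]


# Hypothetical stand-in for SearchRange.
words = ["tamp", "cramp", "lamp", "ramp", "scamp",
         "tramp", "vamp", "clamp", "gulch"]

one_letter = suffix_matches("STAMP", words, 1)   # pattern "*tamp"
two_letters = suffix_matches("STAMP", words, 2)  # pattern "*amp"
```

With this list, dropping one letter finds only 'tamp', and dropping two finds every word ending in 'amp'.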
MULCH
gulch such smutch touch
bunch brunch hunch lunch
munch punch much
Replacing the 1st letter with a wildcard would find 'gulch'.
But this approach wouldn't find any of the others, because the thing they have in common is that they have the vowel 'u' in them, and end in 'ch'. And the vowel is important for the sound of the word.
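The "same vowel, same ending" rule could be sketched roughly like this, again in Python only to show the logic. The function name `vowel_ending_matches`, the fixed ending length of 2 letters, and the sample list are all my own assumptions, not something I know to be the right rule.

```python
VOWELS = "aeiou"


def vowel_ending_matches(target, words, ending_len=2):
    """Return words that share the target's last `ending_len` letters
    and contain the target's last vowel somewhere before that ending."""
    target = target.lower()
    ending = target[-ending_len:]   # "mulch" gives "ch"
    stem = target[:-ending_len]     # "mulch" gives "mul"
    stem_vowels = [c for c in stem if c in VOWELS]
    if not stem_vowels:
        return []                   # no vowel to match on
    vowel = stem_vowels[-1]         # last vowel before the ending, here "u"
    result = []
    for w in words:
        lw = w.lower()
        if lw == target or not lw.endswith(ending):
            continue
        if vowel in lw[:-ending_len]:
            result.append(lw)
    return result


# Hypothetical stand-in for SearchRange.
sample = ["gulch", "such", "touch", "bunch", "much", "beach", "lamp"]
mulch_matches = vowel_ending_matches("MULCH", sample)
```

Here 'beach' is rejected because it has no 'u' before the 'ch', and 'lamp' is rejected because of its ending.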
TREASURER
emperor caterer poulterer fruiterer
plethora
And this one is typical of the sort of thing that would be lovely to be able to do. But the only thing that these words have in common is that they SOUND similar in the way they end!
So what I seem to need, generally, and I haven't the faintest idea how to do it, is code that will:
Try to find words that have that vowel, and the same (or as much of the same) ending as possible, because they are likely to sound similar. That would probably work for short words - although look at 'would' and 'wood', which sound exactly the same; how could those be found?
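For "as much of the same ending as possible", one idea would be to score every word by how many trailing letters it shares with the entered word and keep the best 9. This is only a sketch in Python under that assumption; `common_suffix_len` and `rank_by_ending` are names I invented, and it works purely on spelling, so it cannot find pairs like 'would' and 'wood'.

```python
def common_suffix_len(a, b):
    """Count how many trailing letters two words have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def rank_by_ending(target, words, limit=9):
    """Return up to `limit` words, longest shared ending first.

    Ties are broken alphabetically; words sharing no ending are dropped.
    """
    target = target.lower()
    scored = []
    for w in words:
        lw = w.lower()
        if lw == target:
            continue
        score = common_suffix_len(target, lw)
        if score > 0:
            scored.append((score, lw))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:limit]]


# Hypothetical stand-in for SearchRange.
best_two = rank_by_ending("MULCH", ["lamp", "munch", "gulch", "much"], limit=2)
```

In this example 'gulch' comes first because it shares four trailing letters with 'mulch', while 'much' and 'munch' share only two.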
But for the last example, I don't even know if it's possible to do at all!
Whoever can figure this one out definitely gets the Man of the Match Award.
I've included a file with several thousand words to play with. They are in order of how they sound, and it was easy for me to simply grab x number of words before and after the one in question, and they would sound the same. Sadly, the data is no longer in that order, and is much bigger, hence all my problems.
Kind Regards.