I receive 200 stories per day, from which I shortlist X stories. I also have a database of 12,000 shortlisted stories. I want to develop an algorithm that compares each entry in column A (the 200 new stories) against column B (all the stories from the database) and returns the closest match, based on the words the two stories have in common, ignoring all stop words. I also want it to report the percentage of common words between the new story and the matched story it returns.
Example:
Column A has 200 stories, in cells A1 to A200. Column B has 12,000 stories, in cells B1 to B12000. I want to compare A1 against each of B1 to B12000 and return the story with which it shares the largest number of common words, and then do the same for A2 to A200.
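To make the request concrete, here is a rough sketch of the matching in Python. It is only an illustration under several assumptions: the language, the short stop-word list, the sample sentences, and the function names `content_words` and `best_match` are all made up for this example. The percentage is assumed to mean shared unique words divided by the unique non-stop words in the new story; other definitions are possible.

```python
import re

# Assumption: a tiny illustrative stop-word list; a fuller list would be used in practice.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
              "for", "is", "are", "was", "with", "by", "at"}

def content_words(text):
    """Lower-case the text and return its set of unique non-stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def best_match(new_story, database_word_sets):
    """Return (index, shared_count, percent) for the database story that
    shares the most words with new_story. Ties keep the earliest story;
    index is None when nothing is shared."""
    new_words = content_words(new_story)
    best_index, best_shared = None, 0
    for i, db_words in enumerate(database_word_sets):
        shared = len(new_words & db_words)
        if shared > best_shared:
            best_index, best_shared = i, shared
    # Assumed definition: shared words as a share of the new story's words.
    percent = 100.0 * best_shared / len(new_words) if new_words else 0.0
    return best_index, best_shared, percent

# Hypothetical sample data standing in for column A and column B.
column_a = ["The central bank raised interest rates again"]
column_b = [
    "Farmers expect a record wheat harvest",
    "Interest rates raised by the central bank",
    "The bank opened a new branch",
]

# Tokenize the database once so it is not reprocessed for every new story.
db_sets = [content_words(story) for story in column_b]

for story in column_a:
    index, shared, percent = best_match(story, db_sets)
    matched = column_b[index] if index is not None else "(no match)"
    print(f"{story!r} -> {matched!r}: {shared} shared words, {percent:.1f}%")
```

With 200 new stories and 12,000 database stories this is about 2.4 million set intersections per day, which a plain loop like this should handle; the columns could be read from the workbook with any spreadsheet library before being passed in.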
I have also attached an Excel file with some sample statements for both column A and column B.
Thank you