+ Reply to Thread
Results 1 to 15 of 15

Extract the most common words taken N at a time

  1. #1
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Extract the most common words taken N at a time

    I have a single column where each cell contains multiple words in any order.

    I need to find which group of N words in a cell exist in the greatest number of rows (or cells for this 1-column case).

    Example: (using letters for simplicity)
    In the data below, each row of letters is one cell.

    I want to determine which 3 letters in a cell exist in the greatest number of cells?
    I believe B, E, and F (rows 1, 3 and last row).
    Edit: (also B, F, H in rows 1, 4 and last row - discovered later)

    F H E B A
    D B A G
    E B A F C
    B H C F
    C G D
    D F A H
    G E F D
    C B H D A
    B H F E D

    Like this example, each cell contains only a few words (maybe 10 max), but there are many different words and hundreds of rows, however all the words could be extracted and individually listed if necessary, e.g. used in a lookup table, separate array, etc.

    Any insights appreciated!
    Last edited by Buzzed Aldrin; 02-13-2012 at 07:26 AM. Reason: Clarify

  2. #2
    Forum Expert
    Join Date
    07-20-2011
    Location
    Mysore, India.
    MS-Off Ver
    Excel 2019
    Posts
    8,587

    Re: Extract the most common words taken N at a time

    Pl see attached file.
    Attached Files Attached Files

  3. #3
    Forum Expert dilipandey's Avatar
    Join Date
    12-05-2011
    Location
    Dubai, UAE
    MS-Off Ver
    1997 - 2016
    Posts
    8,191

    Re: Extract the most common words taken N at a time

    Hi Aldrin,

    Try looking at Advanced Filter feature of Excel with wildcard usage. Thanks.

    Regards,
    DILIPandey

    <click on below 'star' if this helps>
    DILIPandey, Excel rMVP
    +919810929744 (India), +971528225509 (Dubai), [email protected]

  4. #4
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    @kvsrini... That works when the words are known, but in my application the words to specify in the formula are unknown. I just used B,E,F to illustrate, because I already "knew" that those letters occurred most often (3 rows).

    Even with the tiny example data set, it's not a trivial thing to manually consider all the 3-item combinations for each row, which is why I said I "believe" B,E,F occurs most often My problem is the "reverse" situation - I need a way to find out the most common group of items.

    So the example can be worded better: Which 3-letter group exists in the most rows? Answer: B,E,F

    @dilipandy Never considered Advanced Filter, thanks for mentioning, will check it out.

  5. #5
    Forum Expert
    Join Date
    06-09-2010
    Location
    Australia
    MS-Off Ver
    Excel 2013
    Posts
    1,714

    Re: Extract the most common words taken N at a time

    Hi
    is the range of possible words fixed? otherwise you quicky hit a lot of permutations - a list of 100 unique words contains 161,700 possible combinations of 3 different words.

  6. #6
    Forum Expert
    Join Date
    06-09-2010
    Location
    Australia
    MS-Off Ver
    Excel 2013
    Posts
    1,714

    Re: Extract the most common words taken N at a time

    also, are there just words, or other information too (numbers, punctuation etc)

  7. #7
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    is the range of possible words fixed? otherwise you quicky hit a lot of permutations - a list of 100 unique words contains 161,700 possible combinations of 3 different words.
    The range of words is not fixed/limited, but there's only a few words per cell (say 3-10). Yes, the permutative load becomes quickly apparent just mentally going through the dinky example dataset.


    also, are there just words, or other information too (numbers, punctuation etc)
    No, strictly text words, no punctuation or junk characters.

  8. #8
    Forum Expert
    Join Date
    06-09-2010
    Location
    Australia
    MS-Off Ver
    Excel 2013
    Posts
    1,714

    Re: Extract the most common words taken N at a time

    Hi

    I have probably made this far more complex and ugly than it needs to be, but it seems to work. The trick seems to be to eliminate words that appear infrequently to reduce the word list to a manageable level. If all words occur with about the same frequency this will not work.

    You need to have your source words in a worksheet named "source" and a separate blank sheet in the workbook called "words" to list and manipulate data.

    the macros should run in the order listed (or the macro a_all_macros will run them for you):

    a_words_list
    copies each word in the cells in sheet "source" to a separate row in sheet "words"

    b_sort_words
    sorts them alphabetically

    c_count_frequencies
    counts the frequencies with which each word occurs

    d_delete_duplicates
    deletes all words that occur only once (these can't be the most frequent) and all duplicates, leaving just one instance of each word in the list

    e_sort_by_frequency
    sorts the remaining words by descending order of frequency

    f_find_most_frequent_delete_others
    for the top 3 most frequent words, counts the number of occasions on which all three words occur in one cell in sheet "source". Deletes all words that occur less frequently than this (they cannot be in the most frequent 3)

    hopefully, this will reduce the list of words to a manageable length (20 or less words). Otherwise, it will probably not work.

    g_check_nos
    If so, the final macro determines all permutations of 3 words in the list and checks their frequency in sheet “source”, returning the most frequent


    Please Login or Register  to view this content.
    Last edited by NickyC; 02-06-2012 at 11:43 PM.

  9. #9
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    Sorry for not getting back on this.
    I've had to reformat my HD and am now picking up the pieces
    Will get back asap.

  10. #10
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    @NickyC - Holy cow, hope you didn't feel compelled to find a solution after getting sucked into a challenge. Big thanks for the effort even if it doesn't work!

    When I run the macro, I'm getting "Run-time error '9': Subscript out of range"

    Problem with code or the way I'm running it?

    I have essentially zero VB experience so I can't (intelligently) mod your code.

  11. #11
    Forum Expert
    Join Date
    06-09-2010
    Location
    Australia
    MS-Off Ver
    Excel 2013
    Posts
    1,714

    Re: Extract the most common words taken N at a time

    Hi
    can you check that the workbook has sheets named "source" and "words"

    I have tweaked the code, which may enable it to work. if not, can you check to see at what point it breaks down (the row where it gets stuck in the VBA should be highlighted in yellow).

    Please Login or Register  to view this content.

  12. #12
    Forum Expert
    Join Date
    01-15-2007
    Location
    Brisbane, Australia
    MS-Off Ver
    2007
    Posts
    6,591

    Re: Extract the most common words taken N at a time

    Hi

    Slightly different approach. This will allow you to nominate the number of words in your groupings. I've only programmed for 3 and 4 word combinations just so you get the idea.

    I've also hard coded the range to process - again just to give you an idea. This can be automated, but the output will have to be put into a different place.

    Please Login or Register  to view this content.
    Using the above for a 3 word group, I noticed that F, B and H also have 3 rows...

    rylo
    Last edited by rylo; 02-13-2012 at 12:43 AM. Reason: realised it had to cover words not single characters...

  13. #13
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    @rylo and NickyC => both versions, outstanding!

    Yep - F,B,H is in there 3 times also, shows how tedious and error-prone this analysis would be if attempted manually on even a small data set.

    2- and 3-word macros are probably all I need; I can follow your codes and mod for other N-values if nec.

    Thank you very much - This would have taken me a long time, if even feasible, using formulas and a rat's nest of helper columns.

  14. #14
    Registered User
    Join Date
    02-04-2012
    Location
    CA, USA
    MS-Off Ver
    Excel 2010
    Posts
    56

    Re: Extract the most common words taken N at a time

    @rylo - In your code, for Case 3 in the Select Case section, "For k = j + 2 To x" should be "For k = j + 1 To x" correct?

  15. #15
    Forum Expert
    Join Date
    01-15-2007
    Location
    Brisbane, Australia
    MS-Off Ver
    2007
    Posts
    6,591

    Re: Extract the most common words taken N at a time

    Buzzed

    Correct. My bad. When I first did this, I realised the mistake and corrected for Case 4, but didn't see the problem for case 3.

    rylo

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1