+ Reply to Thread
Results 1 to 13 of 13

Count unique words appearing in a column once per row

  1. #1
    Registered User
    Join Date
    01-04-2010
    Location
    Canberra, Australia
    MS-Off Ver
    Excel 2007
    Posts
    11

    Count unique words appearing in a column once per row

    I would like to modify the below macro so it only counts a word once even if it appears more than once in a row. The reason I want to do this is so I can get a snapshot of the data without certain terms being over-represented.

    For example, in the attached example the word 'Microsoft' appears 5 times in row 4 but I only want this to be counted once. In the whole data set Microsoft appears 20 times but only in 7 of the 20 rows so I would like the count to be 7.

    The example is set out as follows: In column B there is a description field which in practice will contain consumer complaints and inquiries. To keep the data anonymous the description is filled out with random words and all other columns are blank. The output of Andy Pope's unique word counting macro appears in I:J.

    Please Login or Register  to view this content.
    I have been advised by a mod to start a new thread as the goals of my previous thread had changed.
    Attached Files Attached Files
    Last edited by eq2; 01-20-2010 at 08:17 AM.

  2. #2
    Administrator JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Count unique words appearing in a column once per row

    Try this:
    Please Login or Register  to view this content.
    _________________
    Microsoft MVP 2010 - Excel
    Visit: Jerry Beaucaire's Excel Files & Macros

    If you've been given good help, use the icon below to give reputation feedback, it is appreciated.
    Always put your code between code tags. [CODE] your code here [/CODE]

    “None of us is as good as all of us” - Ray Kroc
    “Actually, I *am* a rocket scientist.” - JB (little ones count!)

  3. #3
    Forum Guru mikerickson's Avatar
    Join Date
    03-30-2007
    Location
    Davis CA
    MS-Off Ver
    Excel 2011
    Posts
    6,203

    Re: Count unique words appearing in a column once per row

    =SUMPRODUCT(--ISNUMBER(FIND("Microsoft", $B$4:$B$200)))

    returns 7
    _
    ...How to Cross-post politely...
    ..Wrap code by selecting the code and clicking the # or read this. Thank you.

  4. #4
    Administrator JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Count unique words appearing in a column once per row

    The problem with not doing the count "as you go" using a formula like the one shown above..."Microsoft" may give the right answer, but some others would not.

    If two of the words were "all" and "call", the formula shown applied to a search for "all" would also match on the columns with words like "Call", "allow", "small", etc.

  5. #5
    Forum Guru mikerickson's Avatar
    Join Date
    03-30-2007
    Location
    Davis CA
    MS-Off Ver
    Excel 2011
    Posts
    6,203

    Re: Count unique words appearing in a column once per row

    This formulation checks for "whole word".

    =SUMPRODUCT(--ISNUMBER(FIND(" all "," "&TRIM(SUBSTITUTE(SUBSTITUTE(B2:B200,","," "),"."," "))&" ")))

  6. #6
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Essex, UK
    MS-Off Ver
    O365
    Posts
    21,528

    Re: Count unique words appearing in a column once per row

    FWIW - I think the point is that the code retrieves all unique words appearing within the range whilst applying an adjacent count... as I see it JB's alterations match OP requirements.

  7. #7
    Registered User
    Join Date
    01-04-2010
    Location
    Canberra, Australia
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Count unique words appearing in a column once per row

    Correct, JB's alternations performed my requirements correctly. Thanks so much. I'll confirm this as solved when I try it on a large data set tomorrow.

    A quick side-issue I'm having is tidying up the punctuation.

    Please Login or Register  to view this content.
    I get compile errors if I modify this line to remove commas, semi colons.

  8. #8
    Administrator JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Count unique words appearing in a column once per row

    Try this version... it also fixes the sort routine, removes ( ) ; : ' , " ] [

    Please Login or Register  to view this content.
    Last edited by JBeaucaire; 01-17-2010 at 05:11 AM. Reason: tweaked the SORT code

  9. #9
    Administrator JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Count unique words appearing in a column once per row

    Just a question, are the words with numbers of any interest at all? Or are you really just after a count of the actual alphabetic word strings? We could use a simple technique to remove ALL characters from the strings that are not alphabetic.

  10. #10
    Registered User
    Join Date
    01-04-2010
    Location
    Canberra, Australia
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Count unique words appearing in a column once per row

    In practice word strings with numbers may be of interest. Essentially this tool will help determine if there are any re-occurring terms of interest in a huge data set. Occasionally these terms may be things like company names or URLS that include a number, or sums of money or dates. I'll have to get back to you on how often these actually come up

    However, upon looking at the data further it seems that a result like "efficient6" has been pulled when is isn't actually a whole word in the text. It appears as "efficient.[6]". It would be preferred if these kind of results would not appear.

    I appreciate your interest.
    Last edited by eq2; 01-17-2010 at 05:36 AM.

  11. #11
    Registered User
    Join Date
    01-04-2010
    Location
    Canberra, Australia
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Count unique words appearing in a column once per row

    I ran this macro on 80,000 records and it took 1.5hrs to count over 100,000 keywords which is okay. I can confirm that I do want numbers to appear, there were counts of certain numbers that were important.

    A few issues with the punctuation filter that you or anyone else might be able to help with:
    -Full stops are removed from numeric strings with a dollar sign - eg $19.55 becomes $1955.
    -Words.which.have punctuation in between them are joined together. It would be better if these words were seperated and counted.
    -There are some issues with hidden characters which you can see in the attatched spreadsheet.

    It would be good if this punctuation filter would not apply to web addresses and email addresses.

    Please note I have moved the output of the macro because of the layout of the data it is been used for. I would prefer an output on a new worksheet 'WordCount'.

    This is becoming a really powerful macro which can help quickly get indicators from data which would otherwise take thousands of hours to read.

    Please Login or Register  to view this content.
    Attached Files Attached Files

  12. #12
    Administrator JBeaucaire's Avatar
    Join Date
    03-21-2004
    Location
    Bakersfield, CA
    MS-Off Ver
    2010, 2016, Office 365
    Posts
    33,492

    Re: Count unique words appearing in a column once per row

    Your expanded requirements demonstrate the black hole these kinds of projects can really be. The human mind is a powerful analytical tool, trying to get Excel to do it for you can be daunting.

    "Remove decimals, except not always."
    "All words are separated by spaces, but not always, so sometimes removing punctuation will accidentally merge two words, do that differently...(begin litany of variations on what to do)"
    "Be sure not to remove special characters from email addresses"

    ...and on and on.

    It may be time to hire someone to oversee what I imagine will continue to be an ongoing utility development.

  13. #13
    Registered User
    Join Date
    01-04-2010
    Location
    Canberra, Australia
    MS-Off Ver
    Excel 2007
    Posts
    11

    Re: Count unique words appearing in a column once per row

    Thanks for your response. I'm hoping this will not be a black hole project.

    With the dollar sign my understanding of the function "IsNumeric" is that a dollar value is still recognised as TRUE...

    The solution for stopping the punctuation filter from joining words was quite simple. I changed replace with nothing to replace with a space.

    Please Login or Register  to view this content.
    However this did not work for the full stop line - the result appears with a space in the same row. Eg joined.word became "joined word" on one row rather than "joined" and "word". I assume this is something todoing with the lack of a split function but I could not figure out how to fix it.
    Please Login or Register  to view this content.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1