I got some fantastic help on this forum yesterday regarding how to count and identify the most frequently occurring words across rows of text.
I now have some added challenges to that same problem that I need some help with.
Attached is a sample Excel 2007 file containing 3 worksheets.
The Raw Data worksheet has the sample raw data. My real data is MUCH larger.
The Stop Words worksheet has a list of words that should not be included in the results. My actual list of Stop Words has about 600 words.
The Count Results worksheet is the final worksheet. In this sheet I'd like to do 2 things. First, I'd like to be able to enter a Maximum number of words to list. So, the user might enter the number 20 (in cell B1), as the Maximum number of words to list.
Second, I'd like to have the sorted list of most frequently occurring words in the Raw Data in one column, and in the next column I'd like to have the frequency with which the words occurred. In other words, the word that occurs most frequently would be listed first, along with the number of times it occurred; however, none of the Stop Words values will be included in this list.
I would really appreciate help with this!
In terms of modifying existing function from earlier thread
then:Function ModeWord(rngS As Range, rngX As Range, Optional lngRank As Long = 1) As String Dim oDic As Object, RegExp As Object, RegExpMatch As Object Dim rngC As Range Dim lngKey As Long, lngInstance As Long Dim vTemp As Variant, vKeys As Variant, vKey As Variant Dim strTemp As String Set oDic = CreateObject("Scripting.Dictionary") Set RegExp = CreateObject("vbscript.regexp") With RegExp .Global = True .IgnoreCase = True .Pattern = "\w+" End With For Each rngC In rngS.Cells Set RegExpMatch = RegExp.Execute(Application.Trim(rngC.Value)) For lngInstance = 1 To RegExpMatch.Count Step 1 strTemp = LCase(RegExpMatch(lngInstance - 1)) With oDic If Not .exists(strTemp) Then If Application.CountIf(rngX, strTemp) = 0 Then .Add strTemp, 1 + 1 - (1 + .Count) / 10000 End If Else .Item(strTemp) = .Item(strTemp) + 1 End If End With Next lngInstance Next rngC Set RegExpMatch = Nothing With oDic If lngRank <= .Count Then ReDim vKeys(1 To .Count, 1 To 2) For Each vKey In .Keys lngKey = lngKey + 1 vKeys(lngKey, 1) = vKey vKeys(lngKey, 2) = .Item(vKey) Next vKey vTemp = Application.Match(Application.Large(Application.Index(vKeys, 0, 2), lngRank), Application.Index(vKeys, 0, 2), 0) ModeWord = vKeys(vTemp, 1) & " (" & Int(vKeys(vTemp, 2)) & ")" End If End With Set oDic = Nothing End Function
as before it would make sense to convert this into a Sub Routine rather than run as a UDF and use Input Dialogs to capture ranges etc... (Application.InputBox type 8)CountResults!A4: =IF(ROWS(A$4:A4)>$B$1,"",MODEWORD('Raw Data'!$A$1:$A$6,'Stop Words'!$A$1:$A$25,ROWS(A$4:A4))) copied down to say A53
I'm struggling with a cold at present * but I'm sure others can help you with a sub routine in the meantime
*given current goings on in the world all very trivial but the brain isn't working unfortunately
My Recommended Reading:
Volatility
Sumproduct & Arrays
Pivot Intro
Email from XL - VBA & Outlook VBA
Function Dictionary & Function Translations
Dynamic Named Ranges
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks