+ Reply to Thread
Results 1 to 2 of 2

Part 2: Count and identify text

Hybrid View

  1. #1
    Forum Contributor
    Join Date
    01-25-2010
    Location
    Virginia
    MS-Off Ver
    Excel 2013
    Posts
    104

    Part 2: Count and identify text

    I got some fantastic help on this forum yesterday regarding how to count and identify the most frequently occurring words across rows of text.

    I now have some added challenges to that same problem that I need some help with.

    Attached is a sample Excel 2007 file containing 3 worksheets.

    The Raw Data worksheet has the sample raw data. My real data is MUCH larger.

    The Stop Words worksheet has a list of words that should not be included in the results. My actual list of Stop Words has about 600 words.

    The Count Results worksheet is the final worksheet. In this sheet I'd like to do 2 things. First, I'd like to be able to enter a Maximum number of words to list. So, the user might enter the number 20 (in cell B1), as the Maximum number of words to list.

    Second, I'd like to have the sorted list of most frequently occurring words in the Raw Data in one column, and in the next column I'd like to have the frequency with which the words occurred. In other words, the word that occurs most frequently would be listed first, along with the number of times it occurred; however, none of the Stop Words values will be included in this list.

    I would really appreciate help with this!

  2. #2
    Forum Guru DonkeyOte's Avatar
    Join Date
    10-22-2008
    Location
    Northumberland, UK
    MS-Off Ver
    O365
    Posts
    21,531

    Re: Part 2: Count and identify text

    In terms of modifying existing function from earlier thread

    Function ModeWord(rngS As Range, rngX As Range, Optional lngRank As Long = 1) As String
        Dim oDic As Object, RegExp As Object, RegExpMatch As Object
        Dim rngC As Range
        Dim lngKey As Long, lngInstance As Long
        Dim vTemp As Variant, vKeys As Variant, vKey As Variant
        Dim strTemp As String
        Set oDic = CreateObject("Scripting.Dictionary")
        Set RegExp = CreateObject("vbscript.regexp")
        With RegExp
            .Global = True
            .IgnoreCase = True
            .Pattern = "\w+"
        End With
        For Each rngC In rngS.Cells
            Set RegExpMatch = RegExp.Execute(Application.Trim(rngC.Value))
            For lngInstance = 1 To RegExpMatch.Count Step 1
                strTemp = LCase(RegExpMatch(lngInstance - 1))
                With oDic
                    If Not .exists(strTemp) Then
                        If Application.CountIf(rngX, strTemp) = 0 Then
                            .Add strTemp, 1 + 1 - (1 + .Count) / 10000
                        End If
                    Else
                        .Item(strTemp) = .Item(strTemp) + 1
                    End If
                End With
            Next lngInstance
        Next rngC
        Set RegExpMatch = Nothing
        With oDic
            If lngRank <= .Count Then
                ReDim vKeys(1 To .Count, 1 To 2)
                For Each vKey In .Keys
                    lngKey = lngKey + 1
                    vKeys(lngKey, 1) = vKey
                    vKeys(lngKey, 2) = .Item(vKey)
                Next vKey
                vTemp = Application.Match(Application.Large(Application.Index(vKeys, 0, 2), lngRank), Application.Index(vKeys, 0, 2), 0)
                ModeWord = vKeys(vTemp, 1) & " (" & Int(vKeys(vTemp, 2)) & ")"
            End If
        End With
        Set oDic = Nothing
    End Function
    then:

    CountResults!A4:
    =IF(ROWS(A$4:A4)>$B$1,"",MODEWORD('Raw Data'!$A$1:$A$6,'Stop Words'!$A$1:$A$25,ROWS(A$4:A4)))
    copied down to say A53
    as before it would make sense to convert this into a Sub Routine rather than run as a UDF and use Input Dialogs to capture ranges etc... (Application.InputBox type 8)

    I'm struggling with a cold at present * but I'm sure others can help you with a sub routine in the meantime
    *given current goings on in the world all very trivial but the brain isn't working unfortunately

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1