Hi,
I have to remove all Danish stop words from a large number of sentences. I use the following VBA function to do it:
Option Explicit

Function CleanStopWords(S As String, StopWords As Range) As String
    Dim RE As Object
    Dim SW() As String
    Dim I As Long

    ' Copy the stop words from the worksheet range into an array
    ReDim SW(1 To StopWords.Count)
    For I = 1 To StopWords.Count
        SW(I) = StopWords(I)
    Next I

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = True
        ' Create the pattern from the stop words:
        ' \b = word boundary, (?:a|b|c) = any one of the stop words,
        ' \s* = also remove the whitespace that follows the word
        .Pattern = "\b(?:" & Join(SW, "|") & ")\b\s*"
        CleanStopWords = .Replace(S, "")
    End With
End Function
The function works, but there is a small problem: stop words containing the Danish letter 'å' are not removed. I think it has something to do with the pattern, but I can't figure out what's wrong.
Could someone help me with the code?
I have attached a sample file.
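A likely cause, assuming VBScript's RegExp behaves as documented: `\w`, and therefore `\b`, only treats A-Z, a-z, 0-9 and _ as word characters. To the engine 'å' is a non-word character, so in "på " there is no boundary between 'å' and the following space, and `\b` fails. The same would apply to æ and ø at the start or end of a word. VBScript's RegExp has no lookbehind, so the sketch below replaces the leading `\b` with a captured "start of string or non-letter" group that is put back with `$1`, and the trailing `\b` with a negative lookahead. `CleanStopWordsDK` is a hypothetical name for this variant; it also assumes the stop words contain no regex metacharacters (as in the original), and that `IgnoreCase` folds Å/å. If it does not, list both forms in the stop word range.

```vba
' Sketch: CleanStopWords with Danish-aware word boundaries.
' Assumption: \b in VBScript RegExp only knows ASCII word characters,
' so æ, ø and å are added to an explicit "letter" class instead.
' CleanStopWordsDK is a hypothetical name. Paste into a standard module.
Function CleanStopWordsDK(S As String, StopWords As Range) As String
    Dim RE As Object
    Dim SW() As String
    Dim C As Range
    Dim N As Long
    Dim Letters As String

    ' Collect the non-empty stop words from the range
    ReDim SW(1 To StopWords.Count)
    For Each C In StopWords.Cells
        If Len(Trim$(C.Value)) > 0 Then
            N = N + 1
            SW(N) = Trim$(C.Value)
        End If
    Next C
    If N = 0 Then
        CleanStopWordsDK = S
        Exit Function
    End If
    ReDim Preserve SW(1 To N)

    ' Word characters: ASCII letters, digits, underscore plus æ ø å Æ Ø Å,
    ' built with ChrW so the module's code page does not matter
    Letters = "A-Za-z0-9_" & ChrW(230) & ChrW(248) & ChrW(229) & _
              ChrW(198) & ChrW(216) & ChrW(197)

    Set RE = CreateObject("VBScript.RegExp")
    With RE
        .Global = True
        .IgnoreCase = True
        ' (^|[^Letters]) stands in for the missing lookbehind and is put
        ' back via $1; (?![Letters]) checks the right-hand side without
        ' consuming it, so adjacent stop words are all found
        .Pattern = "(^|[^" & Letters & "])(?:" & Join(SW, "|") & ")(?![" & Letters & "])"
        CleanStopWordsDK = .Replace(S, "$1")

        ' Collapse the leftover runs of whitespace (line breaks included)
        ' into one space and trim the ends
        .Pattern = "\s{2,}"
        CleanStopWordsDK = Trim$(.Replace(CleanStopWordsDK, " "))
    End With
End Function
```

The trailing `\s*` from the original is dropped on purpose: if a match swallowed the space after a word, the next stop word would have no separator left to match its leading group, so "og på" would lose only one of the two words. The extra spaces are cleaned up in a second pass instead.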
Best regards
Morten