Remove stop words from text

  1. #1
    Registered User
    Join Date
    08-13-2009
    Location
    Denmark
    MS-Off Ver
    Office 365
    Posts
    20

    Remove stop words from text

    Hi,

    I have to remove all Danish stop words from a lot of sentences. I use this function to do it.

    [Code not shown in this copy of the thread.]

    The function works, but there is a small problem: stop words containing the Danish letter 'å' are not removed. I think it has something to do with the pattern, but I can't figure out what's wrong.

    Maybe someone could help me with the code.

    I have attached a sample file.

    Best regards
    Morten

  2. #2
    Valued Forum Contributor
    Join Date
    10-15-2007
    Location
    Home
    MS-Off Ver
    Office 2010, W10
    Posts
    373

    Re: Remove stop words from text

    Hi Morten

    Originally posted by morsoe:
    "The function is working but there is a small problem. Stop words containing the Danish letter 'å' are not removed. I think it has something to do with the pattern, but I can't figure out what's wrong?"
    If I understand your problem correctly, this should not happen with every word that contains å, only with words that start or end with å. For example, hvornår has its å in the middle, so it should not be a problem.

    Please check.


    Looking at your pattern, I see you are using the \b anchor, a word boundary. It is defined as a position between a word character and a non-word character (or between a word character and the start or end of the text). That definition is old and still covers only ASCII: a regex word character is an ASCII letter, a digit or an underscore, in regex: [a-zA-Z0-9_]. Some languages now let regexes treat Unicode letters as word characters, but that is not the case with the VBA implementation (yet).
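
    To see that behaviour in isolation, here is a minimal sketch. It assumes the pattern is run through the late-bound VBScript.RegExp object (the original function is not shown, so that is a guess); the sub name and the sample sentences are made up for illustration. ChrW(&HE5) stands for å so the code does not depend on the editor's code page.

    ```vba
    ' Illustration only: shows how \b treats å as a non-word character.
    Sub DemoAsciiWordBoundary()
        Dim re As Object
        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.IgnoreCase = True

        ' "hvornår": å sits between ASCII letters, so both \b anchors
        ' fall next to an ASCII letter and the word is found.
        re.Pattern = "\bhvorn\u00E5r\b"
        Debug.Print re.Test("hvorn" & ChrW(&HE5) & "r kommer du")   ' True

        ' "på": the word ends in å. Both å and the following space are
        ' non-word characters, so there is no boundary after å.
        re.Pattern = "\bp\u00E5\b"
        Debug.Print re.Test("jeg er p" & ChrW(&HE5) & " vej")       ' False
    End Sub
    ```

    With a Unicode-aware \b, both tests would print True.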

    I also see that you have another non-ASCII letter, ø, and you should have the same problem with it as with å.

    In my opinion, this means your problem lies in using \b. You are not correctly identifying the word boundaries when a word starts or ends with å or ø (there may be other non-ASCII characters that you want to add to this list).


    Solution: define your own word boundary by adding ø (U+00F8) and å (U+00E5) to the old definition of a word character, something like [a-zA-Z0-9_\u00F8\u00E5], and amend the code to use this new definition instead of \b.

    This should solve the problem. If you have problems adapting the code, post back and I'll try to do it tomorrow night (GMT).
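
    One way this could look in code is sketched below. It is not the original function, which is not shown here: RemoveStopWords, WORD_CHARS and the sample word list are hypothetical names, and adding æ and the upper-case letters Æ, Ø, Å to the class is an assumption beyond the two letters mentioned above. Because the VBScript regex engine has no lookbehind, the character before the stop word is captured in a group and written back; a negative lookahead checks the character after it. The stop words are assumed to contain no regex metacharacters.

    ```vba
    ' Hypothetical helper: removes stop words using a custom word boundary.
    Function RemoveStopWords(ByVal txt As String, ByVal stopWords As Variant) As String
        ' ASCII word characters plus Danish letters (assumed list: æ ø å Æ Ø Å).
        Const WORD_CHARS As String = "a-zA-Z0-9_\u00E6\u00F8\u00E5\u00C6\u00D8\u00C5"

        Dim re As Object
        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.IgnoreCase = True

        ' Group 1: start of text or one non-word character before the stop word.
        ' (?!...): the stop word must not be followed by a word character.
        re.Pattern = "(^|[^" & WORD_CHARS & "])(?:" & Join(stopWords, "|") & ")" & _
                     "(?![" & WORD_CHARS & "])"

        ' Put back the character captured before the stop word.
        RemoveStopWords = re.Replace(txt, "$1")
    End Function

    Sub DemoRemoveStopWords()
        Dim words As Variant
        ' "på", "og", "år" built with ChrW so the editor's code page does not matter.
        words = Array("p" & ChrW(&HE5), "og", ChrW(&HE5) & "r")
        Debug.Print RemoveStopWords("jeg er p" & ChrW(&HE5) & " vej", words)
    End Sub
    ```

    Removing a word leaves the spaces on both sides of it in place, so the result can contain double spaces; in Excel, Application.WorksheetFunction.Trim can be applied to the result to collapse them.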

  3. #3
    Forum Guru
    Join Date
    08-15-2004
    Location
    Tokyo, Japan
    MS-Off Ver
    2013 O.365
    Posts
    22,523

    Re: Remove stop words from text

    Try changing the pattern to:
    [Code not shown in this copy of the thread.]
    Last edited by jindon; 10-21-2019 at 12:26 AM. Reason: Fixed a typo
