Dictionary Object

  1. #1
    Registered User
    Join Date
    10-10-2006
    Posts
    18

    Dictionary Object

    Hi, I am looking for a way to build a list of words from some straightforward prose using a VBA algorithm. I have attached an example; it's a txt file with the data in lines (for some reason I can't upload xls today).

    I'm looking to create a huge list of the words in the poem, as I have started to do manually further down (the plan was to put this on a separate sheet). I discussed a similar algorithm with Leith Ross on this forum a few months ago, and something called a "dictionary scripting object" was mentioned. I'm sad to say I haven't been able to find out what this is. Any help anyone can offer would be greatly appreciated.

    Adam

  2. #2
    Forum Expert
    Join Date
    01-15-2007
    Location
    Brisbane, Australia
    MS-Off Ver
    2007
    Posts
    6,591
    Adam

    Here's a couple of ways, one using a collection, and the other using a dictionary.

    Run each macro and view the output. Both use a space as the word separator. The dictionary treats "the" and "The" as two different words, while the collection treats them as the same word.
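
    The two macros themselves are not shown in this copy of the thread. As a rough illustration of the behaviour described, here is a minimal VBA sketch, not rylo's code. The helper names UniqueWordsDict and UniqueWordsColl are made up for this example, and it assumes the text arrives as one string with words separated by single spaces.

```vba
Option Explicit

' Hypothetical helper: unique words via a late-bound Scripting.Dictionary.
' Keys are case-sensitive by default, so "the" and "The" are stored separately.
Function UniqueWordsDict(ByVal sText As String) As Variant
    Dim dic As Object
    Dim vWord As Variant

    Set dic = CreateObject("Scripting.Dictionary")
    For Each vWord In Split(sText, " ")
        If Len(vWord) > 0 Then
            If Not dic.Exists(vWord) Then dic.Add vWord, Empty
        End If
    Next vWord
    UniqueWordsDict = dic.Keys
End Function

' Hypothetical helper: unique words via a Collection.
' Collection keys ignore case, so adding "The" after "the" raises an error,
' which is deliberately skipped here.
Function UniqueWordsColl(ByVal sText As String) As Collection
    Dim col As New Collection
    Dim vWord As Variant

    On Error Resume Next
    For Each vWord In Split(sText, " ")
        If Len(vWord) > 0 Then col.Add CStr(vWord), CStr(vWord)
    Next vWord
    On Error GoTo 0
    Set UniqueWordsColl = col
End Function

Sub DemoUniqueWords()
    Const sSample As String = "The cat saw the dog"
    Debug.Print UBound(UniqueWordsDict(sSample)) + 1   ' 5: "The" and "the" both kept
    Debug.Print UniqueWordsColl(sSample).Count         ' 4: "the" rejected as a duplicate key
End Sub
```

    Late binding through CreateObject is used so the sketch runs without setting a reference to Microsoft Scripting Runtime.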


    rylo
    Last edited by VBA Noob; 01-07-2008 at 07:49 PM.

  3. #3
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    The Dictionary object has a CompareMode property; setting it to vbTextCompare gives case-insensitive key comparisons.
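
    A short sketch of that property in use, assuming a late-bound Scripting.Dictionary (the procedure name DemoCompareMode is only for illustration):

```vba
Option Explicit

Sub DemoCompareMode()
    Dim dic As Object
    Set dic = CreateObject("Scripting.Dictionary")

    ' CompareMode has to be set while the dictionary is still empty.
    dic.CompareMode = vbTextCompare

    dic("the") = 1
    dic("The") = dic("The") + 1        ' same key as "the" under text comparison

    Debug.Print dic.Count, dic("THE")  ' prints 1 and 2
End Sub
```

    With the default binary comparison, the same code would end up with two separate keys.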

  4. #4
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472
    Hello AdamDay,
    Thank you for the nice thread. I needed such a macro as well. :-)

    Hello SHG,
    I want to collect vocabulary from texts as well. Well, the macro you have written for AdamDay is almost what I needed too. I would be very thankful if you could modify the macro to fulfill my needs as well. Thank you very much in advance! :-)

    1) The macro should not delete the existing word list in column G each time it is run. If I run the macro over a text on the first day, it builds a word list in column G. When I later run it over another text, the older list should stay in place; only new words that don't already exist should be added, and column G should still contain no duplicates.
    2) Column G should not be sorted alphabetically. The words can stay in the order in which the text is processed, and each time a new text is processed, the new words should simply go into the next empty cells in column G.
    3) The macro currently leaves characters such as ":", ",", ";" or "." attached to some words. For example, it outputs "hand:", "Jabberwock," and "wabe:". Is it possible to add a function so that the macro ignores specific characters? For example, if it finds the word "hand:", it should put only "hand" in column G, without the ":".
    4) The macro should be able to process even bigger texts, and still handle the job when there are already thousands of words in column G.
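
    The four requests above could be sketched roughly as follows. This is an illustration under stated assumptions, not a modification of the macro posted earlier: the names StripPunct and AppendNewWords are hypothetical, the word list is assumed to sit in column G of the active sheet, the text is assumed to arrive as one space-separated string, and treating "Hand" and "hand" as the same word is a choice made for the example.

```vba
Option Explicit

' Hypothetical helper: strip a fixed set of punctuation marks from a word.
Function StripPunct(ByVal sWord As String) As String
    Dim vMark As Variant
    For Each vMark In Array(":", ",", ";", ".")
        sWord = Replace(sWord, vMark, "")
    Next vMark
    StripPunct = sWord
End Function

' Hypothetical macro: append words from sText to column G,
' skipping any word that is already listed there.
Sub AppendNewWords(ByVal sText As String)
    Dim dic As Object
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim vWord As Variant, sWord As String

    Set ws = ActiveSheet                       ' assumption: list is on the active sheet
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = vbTextCompare            ' assumption: ignore case when comparing

    ' Load the existing list once, so each later lookup is a dictionary
    ' check rather than a scan of the sheet.
    lastRow = ws.Cells(ws.Rows.Count, "G").End(xlUp).Row
    For r = 1 To lastRow
        sWord = CStr(ws.Cells(r, "G").Value)
        If Len(sWord) > 0 Then dic(sWord) = True
    Next r
    If Len(ws.Cells(lastRow, "G").Value) = 0 Then lastRow = 0   ' column G was empty

    ' New words are written in the order they are met; nothing is sorted.
    For Each vWord In Split(sText, " ")
        sWord = StripPunct(CStr(vWord))
        If Len(sWord) > 0 Then
            If Not dic.Exists(sWord) Then
                dic(sWord) = True
                lastRow = lastRow + 1
                ws.Cells(lastRow, "G").Value = sWord
            End If
        End If
    Next vWord
End Sub
```

    Because existing entries are never removed, words marked as unwanted in a neighbouring column would stay in column G and would not be suggested again.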

    AdamDay,

    With the modified macro one could extend an existing dictionary. For example, I would put all the target words of my existing dictionary in column G, then supply a text and run the macro. The macro would then add to column G only those words that are not already in my list. The newly suggested words might include some that are nonsense. I would not delete them from column G, but instead put a specific mark in front of them in another column. That way, the next time the macro is run over another text, it won't add the same useless word to column G, because it will already be there. That saves correcting the same mistake each time, and one can always pick out the useful words because they won't be marked as useless.

  5. #5
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Hello SHG,
    I want to collect vocabulary from texts as well. Well, the macro you have written for AdamDay is almost what I needed too. ...
    I think you're crediting me for rylo's post.

    However, posting a question in someone else's thread violates a cardinal rule of this and most other forums. Please start your own thread, and provide a link to this one for context as necessary.

    Thanks.

  6. #6
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Apology

    Oops!!
    I hope that rylo and AdamDay accept my apology. I had read the rules long ago but had forgotten that important one.

    2). Never post a question in the Thread of another member. You MUST ALWAYS start your own New Thread.

    SHG
    To avoid such an embarrassing thing again, I will read the rules each time before I post in future.

    soooooooooooooorry!! :-)
    Last edited by wali; 01-10-2008 at 02:48 AM.

  7. #7
    Registered User
    Join Date
    10-10-2006
    Posts
    18
    Thank you for the contribution, wali. In fact, I think your program may be of use to me in another project. Thank you for bringing it to my attention.

    Thanks also to shg and rylo. I am still getting back into Excel after not using it for some time, and your contributions have been most valuable.

    What I am hoping to do is similar to wali's project, however I really need to make the transition from just a list of words to a usable dictionary much more step-by-step so that I can alter the ways in which the dictionary is created and also view the dictionary in a number of different orders.

    First of all I really just need a program that can make the list of words using " " as the delimiter between any two words. I also need the repeats to appear separately.

    Secondly I need to take the results of the first program and then start to remove the superfluous characters like "." and ":" as wali suggested for his program.

    Third, I need a program to de-duplicate the entries. This program must also count the number of times each word appears. The result will be a list of words with the count of occurrences alongside each one. The frequency of words in the poem will be the most important thing I gain from this.

    Finally, it should be possible to supply another list of words to be removed from the dictionary.

    This is why I'm a little unsure if the dictionary object is the best thing for the job. I really need the process to be step by step as described.
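
    The four steps described above can stay separate even if a dictionary does the counting. The following is a minimal sketch under assumptions, not code from this thread: the names SplitWords, CleanWords, CountWords and RemoveWords are hypothetical, the text is assumed to be one space-separated string, and the default case-sensitive comparison is left in place.

```vba
Option Explicit

' Step 1: split on spaces; repeated words are kept as separate entries.
Function SplitWords(ByVal sText As String) As Variant
    SplitWords = Split(sText, " ")
End Function

' Step 2: remove superfluous characters such as "." and ":" from each word.
Function CleanWords(ByVal vWords As Variant) As Variant
    Dim i As Long, vMark As Variant
    For i = LBound(vWords) To UBound(vWords)
        For Each vMark In Array(".", ":", ",", ";")
            vWords(i) = Replace(vWords(i), vMark, "")
        Next vMark
    Next i
    CleanWords = vWords
End Function

' Step 3: de-duplicate and count; returns a dictionary of word -> frequency.
Function CountWords(ByVal vWords As Variant) As Object
    Dim dic As Object, i As Long
    Set dic = CreateObject("Scripting.Dictionary")
    For i = LBound(vWords) To UBound(vWords)
        If Len(vWords(i)) > 0 Then dic(vWords(i)) = dic(vWords(i)) + 1
    Next i
    Set CountWords = dic
End Function

' Step 4: drop every word that appears in a separate removal list.
Sub RemoveWords(ByVal dic As Object, ByVal vRemove As Variant)
    Dim vWord As Variant
    For Each vWord In vRemove
        If dic.Exists(vWord) Then dic.Remove vWord
    Next vWord
End Sub

Sub DemoSteps()
    Dim vWords As Variant, dic As Object
    vWords = CleanWords(SplitWords("the wabe: the hand. the end"))
    Set dic = CountWords(vWords)
    RemoveWords dic, Array("end")
    Debug.Print dic("the"), dic("wabe"), dic.Count   ' prints 3, 1 and 3
End Sub
```

    Each step returns its result, so the output of any stage can be written to a sheet and inspected before the next stage runs.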

    Once again - thanks for all your help!

  8. #8
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    The attachment will histogram text taken from worksheet cells. I tried it on a copy of your post, for example.
    Last edited by shg; 01-07-2009 at 11:47 AM.

  9. #9
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Added a few new interfaces, so you can histogram the text on the clipboard, delete words, ...
    Last edited by shg; 01-07-2009 at 11:47 AM.

  10. #10
    Registered User
    Join Date
    10-10-2006
    Posts
    18

    Thank you

    Thanks for this. I am having a look at the data now.

  11. #11
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Here's a little more to play with.
    Last edited by shg; 01-07-2009 at 11:47 AM.

  12. #12
    Registered User
    Join Date
    10-10-2006
    Posts
    18
    Hi! These are excellent. Really handy programs. The largest dataset I'm using has several thousand words. Unfortunately, my computer stalls when I try to run the histogram with this large a set. This is really why I need to have the program perform each step one at a time, so that I can see where any flaws like this occur. Is it easy enough to swap the code around like this?

    It really is brilliant stuff by the way - it would have taken me ages to program this!

    Many thanks.

    Adam

  13. #13
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Try this version.
    Last edited by shg; 01-07-2009 at 11:47 AM.

  14. #14
    Registered User
    Join Date
    10-10-2006
    Posts
    18
    Yes, this is perfect. Thanks so much! It does have a bit of a problem with really HUGE sets, but I think that may actually be my old computer! Unfortunately, I now have my work cut out with my project!

    Thanks again!

    Adam

  15. #15
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    I found a bug in that version when the text from the clipboard is larger than 32K characters. This version fixes that; I tested it to about 250K. Also added a log.
    Last edited by shg; 01-07-2009 at 11:47 AM.

  16. #16
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Nice work, that is impressive.

  17. #17
    Registered User
    Join Date
    10-10-2006
    Posts
    18

    2m words

    Indeed! This is a great program. It still has a bit of bother with some of my larger sets, though. Is there an easy way around this? The address below will show you a similar set from an old US census. It has about 2m words (although far fewer unique words than some of my other sets).

    http://rosuda.org/GOLD/data/CPS.txt

    Thanks again

  18. #18
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    OK, I went back to doing strings in 32K batches. My laptop took 23 min to process the 13.8 MB string.
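
    The attachment is not shown in this copy of the thread, so exactly how it batches the text is unknown. One possible reading, sketched here purely as an assumption, is to walk the long string in chunks of at most about 32K characters, ending each chunk on a space so that no word is cut in two. The function name CountInBatches and the default batch size of 32000 are hypothetical.

```vba
Option Explicit

' Hypothetical helper: count words in a long string, one batch at a time.
' Each batch holds at most lBatch characters and ends on a space when possible.
Function CountInBatches(ByVal sText As String, _
                        Optional ByVal lBatch As Long = 32000) As Object
    Dim dic As Object
    Dim lPos As Long, lEnd As Long, lCut As Long
    Dim vWord As Variant

    Set dic = CreateObject("Scripting.Dictionary")
    lPos = 1
    Do While lPos <= Len(sText)
        lEnd = lPos + lBatch - 1
        If lEnd >= Len(sText) Then
            lEnd = Len(sText)
        Else
            ' Back up to the last space inside the batch, if there is one.
            ' A single word longer than lBatch would still be split.
            lCut = InStrRev(sText, " ", lEnd)
            If lCut >= lPos Then lEnd = lCut
        End If

        For Each vWord In Split(Mid$(sText, lPos, lEnd - lPos + 1), " ")
            If Len(vWord) > 0 Then dic(vWord) = dic(vWord) + 1
        Next vWord
        lPos = lEnd + 1
    Loop
    Set CountInBatches = dic
End Function

Sub DemoBatches()
    ' A tiny batch size forces several passes over a short string.
    Debug.Print CountInBatches("a b a c a", 3)("a")   ' prints 3
End Sub
```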

  19. #19
    Registered User
    Join Date
    10-10-2006
    Posts
    18
    WOW! I am delighted! It works perfectly. This is so much more than I hoped to get back. Many thanks to you. This has doubtlessly saved me weeks of toil!

  20. #20
    Forum Expert shg
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Well, good.


    An 'input from text file' interface would be a better solution probably -- it could read line-by-line rather than sucking the whole thing in from the clipboard. Oughta reduce memory consumption dramatically.

    I'll post back if I add it.
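
    A line-by-line reader of the kind suggested here might look like the sketch below. It is an assumption about how such an interface could work, not the version shg had in mind: the function name CountWordsInFile is hypothetical, words are assumed to be separated by spaces, and the file is assumed to use Windows-style line endings, since Line Input does not break lines on a bare line feed.

```vba
Option Explicit

' Hypothetical helper: count words in a text file, reading one line at a time.
Function CountWordsInFile(ByVal sPath As String) As Object
    Dim dic As Object
    Dim iFile As Integer
    Dim sLine As String
    Dim vWord As Variant

    Set dic = CreateObject("Scripting.Dictionary")
    iFile = FreeFile
    Open sPath For Input As #iFile
    Do Until EOF(iFile)
        Line Input #iFile, sLine       ' only the current line is held as a string
        For Each vWord In Split(sLine, " ")
            If Len(vWord) > 0 Then dic(vWord) = dic(vWord) + 1
        Next vWord
    Loop
    Close #iFile
    Set CountWordsInFile = dic
End Function
```

    It would be called with the path of a local copy of the data, for example Set dic = CountWordsInFile("C:\Temp\CPS.txt"), where the path is only a placeholder.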
