Hi, I am looking for a way to make a list of words out of some straightforward prose using a VBA algorithm. I have attached an example; it's a .txt file with the data in lines (for some reason I can't upload .xls today).
I'm looking to create a huge list of the words in the poem, as I have started to do manually further down (the plan was to put this on a separate sheet). I discussed a similar algorithm with Leith Ross on this forum a few months ago and something called a "dictionary scripting object" was mentioned. I'm sad to say I haven't been able to find out what this is. Any help anyone can offer would be greatly appreciated.
Adam
Adam
Here's a couple of ways, one using a collection, and the other using a dictionary.
Run each macro and view the output. They both use a space as the word separator. The dictionary will see "the" and "The" as two different words, while the collection treats them as the same.
Sub aaa()
    ' Collection version: keys are case-insensitive, so "the" and "The" collapse into one entry
    Set nodupes = New Collection
    For i = 1 To 12
        arr = Split(Cells(i, 1), " ")
        For j = LBound(arr) To UBound(arr)
            ' Adding a key that already exists raises an error; ignore it to skip duplicates
            On Error Resume Next
            nodupes.Add Item:=arr(j), key:=arr(j)
            On Error GoTo 0
        Next j
    Next i
    Range("G:I").ClearContents
    For i = 1 To nodupes.Count
        Cells(i, "G") = nodupes(i)
    Next i
    Range("G:G").Sort key1:=Range("G1"), order1:=xlAscending, header:=xlNo
End Sub

Sub bbb()
    ' Dictionary version: keys are case-sensitive by default
    Set dic = CreateObject("scripting.dictionary")
    For i = 1 To 12
        arr = Split(Cells(i, 1), " ")
        For j = LBound(arr) To UBound(arr)
            If Not dic.exists(arr(j)) Then
                dic.Add Item:=arr(j), key:=arr(j)
            End If
        Next j
    Next i
    ' Write each word to the next empty cell in column I, then sort
    For Each ce In dic.items
        Cells(Rows.Count, 9).End(xlUp).Offset(1, 0).Value = ce
    Next ce
    Range("I:I").Sort key1:=Range("I1"), order1:=xlAscending, header:=xlNo
End Sub
rylo
Last edited by VBA Noob; 01-07-2008 at 06:49 PM.
Dictionary has a CompareMode property that allows case-insensitive compares.
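As a minimal sketch of that property in use (the function name CountDistinctWords is made up for illustration, and late binding via CreateObject is assumed), something like this should work. Note that CompareMode has to be set while the dictionary is still empty.

```vba
' Hypothetical helper: counts the distinct space-separated words in a string.
' When ignoreCase is True, "The" and "the" are treated as the same word.
Function CountDistinctWords(ByVal text As String, ByVal ignoreCase As Boolean) As Long
    Dim dic As Object
    Dim parts() As String
    Dim i As Long

    Set dic = CreateObject("Scripting.Dictionary")
    ' CompareMode can only be changed before any keys are added.
    If ignoreCase Then dic.CompareMode = vbTextCompare

    parts = Split(text, " ")
    For i = LBound(parts) To UBound(parts)
        ' Skip the empty strings produced by consecutive spaces.
        If Len(parts(i)) > 0 Then
            If Not dic.Exists(parts(i)) Then dic.Add parts(i), Empty
        End If
    Next i

    CountDistinctWords = dic.Count
End Function
```

Calling CountDistinctWords("The cat saw the dog", True) returns 4, while passing False returns 5, because the default binary comparison keeps "The" and "the" apart.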
Hello AdamDay,
thank you for the nice thread. I needed such a macro as well. :-)
Hello SHG,
I want to collect vocabulary from texts as well. Well, the macro you have written for AdamDay is almost what I needed too. I would be very thankful if you could modify the macro to fulfil my needs as well. Thank you very much in advance! :-)
1) The macro should not delete the older word list in column G each time it is run. For example, if I run the macro over a text on the first day, it will make a word list in column G. The next time I run the macro over another text, the older word list in column G should not be deleted. Only the new words that don't already exist should be added to column G, and there should still be no duplicates in column G.
2) Column G should not be sorted alphabetically. The order can remain the way the text is processed, and every time a new text is processed the words should simply be put in the next empty cells in column G.
3) Your macro currently keeps punctuation such as ":", ",", ";" or "." attached to some words. For example, it outputs "hand:", "Jabberwock," and "wabe:". Is it possible to add a function to the macro so that it ignores specific characters? For example, if it finds the word "hand:", it should put only "hand" in column G and not the ":" sign.
4) The macro should be modified so that even bigger texts can be processed, and so that it can handle the job even if there are already thousands of words in column G.
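The four requests above could be sketched roughly as follows. This is only an illustration under some assumptions: the names StripPunctuation and AppendNewWords are invented here, the word list lives in column G of the active sheet, only the punctuation marks listed in the constant are stripped, and words are compared case-insensitively.

```vba
' Hypothetical helper: trims the listed punctuation marks from both ends of a word.
Function StripPunctuation(ByVal word As String) As String
    Const PUNCT As String = ".,;:!?""()"   ' assumption: marks to ignore
    Do While Len(word) > 0
        If InStr(PUNCT, Left$(word, 1)) = 0 Then Exit Do
        word = Mid$(word, 2)
    Loop
    Do While Len(word) > 0
        If InStr(PUNCT, Right$(word, 1)) = 0 Then Exit Do
        word = Left$(word, Len(word) - 1)
    Loop
    StripPunctuation = word
End Function

' Appends the words of sourceText to column G of the active sheet, skipping
' words that are already listed. Existing entries are never cleared or sorted.
Sub AppendNewWords(ByVal sourceText As String)
    Dim seen As Object
    Dim parts() As String
    Dim word As String
    Dim lastRow As Long, r As Long, i As Long

    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare   ' assumption: "The" and "the" are one word

    ' Load the existing list so that earlier runs are preserved.
    lastRow = Cells(Rows.Count, "G").End(xlUp).Row
    For r = 1 To lastRow
        word = CStr(Cells(r, "G").Value)
        If Len(word) > 0 Then seen(word) = True
    Next r
    ' If column G is completely empty, start writing at row 1.
    If Len(CStr(Cells(lastRow, "G").Value)) = 0 Then lastRow = lastRow - 1

    parts = Split(sourceText, " ")
    For i = LBound(parts) To UBound(parts)
        word = StripPunctuation(parts(i))
        If Len(word) > 0 Then
            If Not seen.Exists(word) Then
                seen.Add word, True
                lastRow = lastRow + 1          ' next empty cell, in text order
                Cells(lastRow, "G").Value = word
            End If
        End If
    Next i
End Sub
```

It could be called with something like AppendNewWords Range("A1").Value. Because each word is looked up in the dictionary rather than by scanning column G, a list that already holds thousands of words should not slow the duplicate check much.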
AdamDay,
With the modified macro one could extend an existing dictionary. For example, I would put all the target words of my existing dictionary in column G. Then I would supply a text and run the macro. The macro would then add to column G only those words which don't already exist in my older list. The newly suggested words could also contain some nonsense words. I would not delete them from column G but would put a specific sign in front of them in another column. That way, the next time the macro is run over another text, it won't put the same useless word in column G because it will be there already. That would save the time of correcting the same mistake each time, and one could always pick out the useful words because they won't be marked as useless.
I think you're crediting me for rylo's post.
"Hello SHG,
I want to collect vocabulary from texts as well. Well, the macro you have written for AdamDay is almost what I needed too. ..."
However, posting a question in someone else's thread violates a cardinal rule of this and most other forums. Please start your own thread, and provide a link to this one for context as necessary.
Thanks.
oops!!
I hope that rylo and AdamDay accept my apology. I had read the rules long ago but had forgotten that important one.
2). Never post a question in the Thread of another member. You MUST ALWAYS start your own New Thread.
SHG
To avoid such an embarrassing thing again, I will read the rules each time before I post in future.
soooooooooooooorry!! :-)
Last edited by wali; 01-10-2008 at 01:48 AM.
Thank you for the contribution Wali. Infact I think your program maybe of use to me in another project. Thank you for bringing it to my attention.
Thanks also to shg and rylo. I am still getting back into excel after not using for some time and your contributions have been most valuable.
What I am hoping to do is similar to wali's project, however I really need to make the transition from just a list of words to a usable dictionary much more step-by-step so that I can alter the ways in which the dictionary is created and also view the dictionary in a number of different orders.
First of all I really just need a program that can make the list of words using " " as the delimiter between any two words. I also need the repeats to appear separately.
Secondly I need to take the results of the first program and then start to remove the superfluous characters like "." and ":" as wali suggested for his program.
Third, I need a third program to perform the de-duplication of entries. This program must also count the number of times a word appears. The result will be a list of words with a list of numbers (the count of duplications) alongside. The frequency of words in the poem will be the most important thing I gain from this.
Finally, it will be possible to write another list of words which will be removed from the dictionary.
This is why I'm a little unsure if the dictionary object is the best thing for the job. I really need the process to be step by step as described.
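The step-by-step process described above could be sketched as four small routines, one per step, so that each intermediate result can be inspected on its own. All names here (SplitWords, CleanWords, CountWords, RemoveListed, DemoSteps) are invented for illustration, and the case-insensitive counting is an assumption. A Scripting.Dictionary is used only in the counting step.

```vba
' Step 1: split on single spaces; repeated words stay as separate entries.
Function SplitWords(ByVal text As String) As Variant
    SplitWords = Split(text, " ")
End Function

' Step 2: delete every character listed in unwanted from every entry.
Function CleanWords(ByVal words As Variant, ByVal unwanted As String) As Variant
    Dim i As Long, k As Long
    For i = LBound(words) To UBound(words)
        For k = 1 To Len(unwanted)
            words(i) = Replace(words(i), Mid$(unwanted, k, 1), "")
        Next k
    Next i
    CleanWords = words
End Function

' Step 3: de-duplicate and count. Returns a dictionary of word -> count.
Function CountWords(ByVal words As Variant) As Object
    Dim counts As Object
    Dim w As String
    Dim i As Long
    Set counts = CreateObject("Scripting.Dictionary")
    counts.CompareMode = vbTextCompare   ' assumption: ignore case when counting
    For i = LBound(words) To UBound(words)
        w = words(i)
        ' Reading a missing key yields Empty, which counts as 0 here.
        If Len(w) > 0 Then counts(w) = counts(w) + 1
    Next i
    Set CountWords = counts
End Function

' Step 4: drop every word that appears in the exclusion list.
Sub RemoveListed(ByVal counts As Object, ByVal exclusions As Variant)
    Dim w As String
    Dim i As Long
    For i = LBound(exclusions) To UBound(exclusions)
        w = exclusions(i)
        If counts.Exists(w) Then counts.Remove w
    Next i
End Sub

' Runs the four steps in order and prints the result to the Immediate window.
Sub DemoSteps()
    Dim words As Variant, counts As Object, key As Variant
    words = SplitWords("the hand: the wabe.")
    words = CleanWords(words, ".,:;")
    Set counts = CountWords(words)
    RemoveListed counts, Array("wabe")
    For Each key In counts.Keys
        Debug.Print key, counts(key)
    Next key
End Sub
```

In DemoSteps the sample text ends up with "the" counted twice and "hand" once, and "wabe" removed by the exclusion list. Each routine takes the previous step's output as its input, so any step can be replaced or run alone.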
once again - thanks for all your help!
The attachment will histogram text taken from worksheet cells. I copied your post, for example, and it gave:
     ------A------ -B--
  1  Word          Freq
  2  the           21
  3  of            13
  4  to            13
  5  I             11
  6  a             9
  7  for           6
  8  need          6
  9  program       6
 10  words         6
 11  and           5
 12  be            5
 13  dictionary    5
 14  list          5
 15  also          4
 16  in            4
 17  is            4
 18  step          4
 19  will          4
 20  as            3
 21  from          3
 22  really        3
 23  your          3
 24  all           2
 25  am            2
 26  another       2
 27  by            2
 28  can           2
 29  count         2
 30  it            2
 31  just          2
 32  make          2
 33  most          2
 34  number        2
 35  project       2
 36  Thank         2
 37  that          2
 38  thing         2
 39  using         2
 40  which         2
 41  you           2
 42  after         1
 43  again         1
 44  alongside     1
 45  alter         1
 46  any           1
 47  appear        1
 48  appears       1
 49  attention     1
 50  back          1
 51  been          1
 52  best          1
 53  between       1
 54  bringing      1
 55  characters    1
 56  contribution  1
 57  contributions 1
 58  created       1
 59  de            1
 60  delimiter     1
 61  described     1
 62  different     1
 63  do            1
 64  duplication   1
 65  duplications  1
 66  entries       1
 67  excel         1
 68  Finally       1
 69  first         1
 70  frequency     1
 71  gain          1
 72  getting       1
 73  have          1
 74  help          1
 75  his           1
 76  hoping        1
 77  however       1
 78  if            1
 79  I'm           1
 80  important     1
 81  Infact        1
 82  into          1
 83  job           1
 84  like          1
 85  little        1
 86  maybe         1
 87  me            1
 88  more          1
 89  much          1
 90  must          1
 91  my            1
 92  not           1
 93  numbers       1
 94  object        1
 95  once          1
 96  orders        1
 97  perform       1
 98  poem          1
 99  possible      1
100  process       1
101  remove        1
102  removed       1
103  repeats       1
104  result        1
105  results       1
106  rylo          1
107  Secondly      1
108  separately    1
109  shg           1
110  similar       1
111  so            1
112  some          1
113  start         1
114  still         1
115  suggested     1
116  superfluous   1
117  take          1
118  thanks        1
119  then          1
120  think         1
121  third         1
122  this          1
123  time          1
124  times         1
125  transition    1
126  two           1
127  unsure        1
128  usable        1
129  use           1
130  valuable      1
131  view          1
132  wali          1
133  wali's        1
134  ways          1
135  What          1
136  why           1
137  with          1
138  word          1
139  write         1
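The attachment's own code is not shown in the thread, so the following is only a guess at how a table laid out like the one above could be produced from a word -> count dictionary. The names WriteHistogram and DemoHistogram are hypothetical, and the sketch overwrites columns A:B of whatever sheet it is given.

```vba
' Hypothetical sketch: writes a word -> count dictionary to columns A:B of ws
' with a header row, then sorts by frequency (descending) and word (ascending).
Sub WriteHistogram(ByVal counts As Object, ByVal ws As Worksheet)
    Dim out() As Variant
    Dim key As Variant
    Dim r As Long

    If counts.Count = 0 Then Exit Sub

    ' Build the whole table in memory first.
    ReDim out(1 To counts.Count + 1, 1 To 2)
    out(1, 1) = "Word": out(1, 2) = "Freq"
    r = 1
    For Each key In counts.Keys
        r = r + 1
        out(r, 1) = key
        out(r, 2) = counts(key)
    Next key

    ' A single bulk write is generally much faster than filling cells one by one.
    ws.Range("A1").Resize(UBound(out, 1), 2).Value = out

    ws.Range("A1").Resize(UBound(out, 1), 2).Sort _
        Key1:=ws.Range("B1"), Order1:=xlDescending, _
        Key2:=ws.Range("A1"), Order2:=xlAscending, _
        Header:=xlYes
End Sub

' Usage example with a few counts taken from the table above.
Sub DemoHistogram()
    Dim counts As Object
    Set counts = CreateObject("Scripting.Dictionary")
    counts("of") = 13: counts("I") = 11: counts("the") = 21: counts("to") = 13
    WriteHistogram counts, ActiveSheet
End Sub
```

Sorting on the Freq column first and the Word column second matches the ordering seen in the posted output, where ties such as "of" and "to" appear alphabetically.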
Last edited by shg; 01-07-2009 at 10:47 AM.
Added a few new interfaces, so you can histogram the text on the clipboard, delete words, ...
thanks for this. I am having a look at the data now.
Here's a little more to play with.
Hi! These are excellent. Really handy programs. The largest dataset I'm using has several thousand words. Unfortunately, my computer stalls when I try to run the histogram with this large a set. This is really why I need to have the program perform each step one at a time, so that I can see where any flaws like this occur. Is it easy enough to swap the code around like this?
It really is brilliant stuff by the way - it would have taken me ages to program this!
Many thanks.
Adam
Try this version.
Yes, this is perfect. Thanks so much! It does have a bit of a problem with really HUGE sets, but I think that may actually be my old computer! Unfortunately, I now have my work cut out with my project!
Thanks again!
Adam
I found a bug in that version when the text from the clipboard is larger than 32K characters. This version fixes that; I tested it to about 250K. Also added a log.