Hello everyone,
Hi SHG,
There is an old thread in this forum where Forum Guru SHG posted an amazing tool for text frequency. The link to the original thread is here:
http://www.excelforum.com/excel-prog...-object-2.html
Because this great tool was designed only for English, it ignores every symbol (ANSI character) other than the English letters A-Z and the digits 0-9.
I want to use this tool for other languages as well. One possibility would be to convert the letters of a non-Latin language to a-z and 0-9, run SHG's tool over the result, and then convert the results back to the original letters. I could use Rylo's code to do so:
http://www.excelforum.com/excel-prog...2nd-sheet.html
Unfortunately, the languages I want to use this tool for have more letters than a-z and 0-9 combined. I would therefore be very thankful if the current method could be enhanced so that I can use all ANSI characters.
For example, the tool currently doesn't distinguish between the words "cas}e" and "case": both are recorded as "case", and other symbols are ignored. There is of course a good reason for this, but I need a modification so that I can try it with other languages.
For example, I would like to try this tool for the Pashto language, whose characters are:
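To make the requested behaviour concrete, here is a minimal sketch in Python (not VBA, and not SHG's code). It assumes words are separated by whitespace and that nothing else should be stripped; the function name `word_histogram` is made up for illustration.

```python
from collections import Counter

def word_histogram(text):
    """Count whitespace-separated tokens without stripping any characters."""
    # str.split() with no arguments splits on any run of whitespace,
    # so punctuation and non-Latin letters stay part of each token.
    return Counter(text.split())

# "\u06a9\u0648\u0631" is a three-letter word written with letters from the Pashto list.
sample = "case cas}e case \u06a9\u0648\u0631 \u06a9\u0648\u0631"
counts = word_histogram(sample)
```

With this approach "cas}e" and "case" are counted as two different words, which is the opposite of what the English-only tool does on purpose.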
آ ChrW(1570)
ا ChrW(1575)
ب ChrW(1576)
پ ChrW(1662)
ت ChrW(1578)
ټ ChrW(1660)
ث ChrW(1579)
ج ChrW(1580)
ځ ChrW(1665)
چ ChrW(1670)
څ ChrW(1669)
ح ChrW(1581)
خ ChrW(1582)
د ChrW(1583)
ډ ChrW(1673)
ذ ChrW(1584)
ر ChrW(1585)
ړ ChrW(1683)
ز ChrW(1586)
ژ ChrW(1688)
ږ ChrW(1686)
س ChrW(1587)
ش ChrW(1588)
ښ ChrW(1690)
ص ChrW(1589)
ض ChrW(1590)
ط ChrW(1591)
ظ ChrW(1592)
ع ChrW(1593)
غ ChrW(1594)
ف ChrW(1601)
ق ChrW(1602)
ک ChrW(1705)
ګ ChrW(1707)
گ ChrW(1711)
ل ChrW(1604)
ﻻ ChrW(65275)
ﻷ ChrW(65271)
ﻵ ChrW(65269)
ﷲ ChrW(65010)
م ChrW(1605)
ن ChrW(1606)
ڼ ChrW(1724)
و ChrW(1608)
ﺅ ChrW(1572)
ه ChrW(1607)
ي ChrW(1610)
ې ChrW(1744)
ئ ChrW(1574)
ی ChrW(1740)
ۍ ChrW(1741)
ﮮ ChrW(1746)
ﮰ ChrW(64432)
ء ChrW(1569)
۰ ChrW(1776)
۱ ChrW(1777)
۲ ChrW(1778)
۳ ChrW(1779)
۴ ChrW(1780)
۵ ChrW(1781)
۶ ChrW(1782)
۷ ChrW(1783)
۸ ChrW(1784)
۹ ChrW(1785)
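The numbers in the list above are decimal Unicode code points, so the same alphabet can be built in other languages too. As a rough Python sketch (where `chr(n)` plays the role of VBA's `ChrW(n)`), using only the first few entries of the list; the names `PASHTO_CODE_POINTS` and `uses_only` are invented for this example:

```python
# A few code points copied from the list above; extend with the rest as needed.
PASHTO_CODE_POINTS = [1570, 1575, 1576, 1662, 1578, 1660]

# Turn the numbers into a set of one-character strings.
alphabet = {chr(n) for n in PASHTO_CODE_POINTS}

def uses_only(word, allowed):
    """Return True if every character of word is in the allowed set."""
    return all(ch in allowed for ch in word)

# chr(1576) + chr(1575) is built from two letters in the list.
in_alphabet = uses_only(chr(1576) + chr(1575), alphabet)
not_in_alphabet = uses_only("case", alphabet)
```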
Thanks
Last edited by wali; 05-06-2010 at 02:17 PM.
Hi,
I was able to solve the problem by converting the non-English letters to combinations of A-Z and 0-9.
For example:
Letter 1= 11aa11
Letter 2= 22a22
Letter 3= 33a33
Letter 4= 44b44
Letter 5= 55c55
Letter 6= 66d66
Letter 7= 77e77
Letter 8= 88f88
Letter 9= 99g99
Letter 10= 00h00
I then put over 1.2 million words into SHG's histogram and it did its job amazingly quickly. SHG is the best and I thank him a lot for his code. May he live long.
After I had the results, I converted the letters back and everything was fine.
A million thanks to SHG and this forum once again.
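The convert, count, convert-back round trip can be sketched as follows. This is a Python illustration, not the code actually used here, and it swaps the hand-made codes listed above for a different scheme: each character becomes its code point written as four hexadecimal digits, so decoding is unambiguous. It assumes every character has a code point of at most 65535 (true for the Pashto list), and `Counter` merely stands in for the alphanumeric-only histogram tool.

```python
from collections import Counter

WIDTH = 4  # hex digits per character; enough for code points up to 65535

def encode_word(word):
    """Replace each character with its code point as 4 hex digits (0-9, a-f)."""
    for ch in word:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"character outside the assumed range: {ch!r}")
    return "".join(format(ord(ch), "04x") for ch in word)

def decode_word(code):
    """Invert encode_word by reading the string back in 4-digit chunks."""
    return "".join(
        chr(int(code[i:i + WIDTH], 16)) for i in range(0, len(code), WIDTH)
    )

text = "\u06a9\u0648\u0631 \u0644\u0648\u06cc \u06a9\u0648\u0631"
encoded = " ".join(encode_word(w) for w in text.split())

# Stand-in for running the English-only histogram on the encoded text.
encoded_counts = Counter(encoded.split())

# Convert the counted codes back to the original letters.
decoded_counts = {decode_word(code): n for code, n in encoded_counts.items()}
```

Because `int(..., 16)` accepts both upper and lower case, the decoding still works if the histogram tool upper-cases its input.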
Hi Wali, could you elaborate on the method you used? I am also trying to get the histogram of a Farsi text, but the problem, as you mentioned, is that it has 32 characters rather than the 26 of English. Any help is greatly appreciated.