Text Histogram for collecting words and their frequency

    Hello everyone,
    Hi SHG,

    There is an old thread in this forum where Forum Guru SHG posted an amazing tool for counting text frequency. The link to the original thread is here:


    Because this great tool was designed only for English, it ignores every symbol (ANSI character) other than the English letters A-Z and the digits 0-9.

    I want to use this tool for other languages as well. One possibility would be to convert the letters of a non-Latin language to a-z and 0-9, run SHG's tool over the converted text, and then convert the results back to the original letters. I could use Rylo's code to do so:


    Unfortunately, the languages I want to use this tool for have more letters than a-z and 0-9 combined, so a one-to-one substitution is not possible. I would therefore be very thankful if the current method could be enhanced so that I can use all ANSI characters.

    For example, the tool currently doesn’t distinguish between the words "cas}e" and "case": both are recorded as "case", and the other symbols are dropped. There is a good reason for that, of course, but I need a modified version so that I can try it with other languages.

    For example, I would like to try this tool on the Pashto language, whose characters are:

    آ ChrW(1570)
    ا ChrW(1575)
    ب ChrW(1576)
    پ ChrW(1662)
    ت ChrW(1578)
    ټ ChrW(1660)
    ث ChrW(1579)
    ج ChrW(1580)
    ځ ChrW(1665)
    چ ChrW(1670)
    څ ChrW(1669)
    ح ChrW(1581)
    خ ChrW(1582)
    د ChrW(1583)
    ډ ChrW(1673)
    ذ ChrW(1584)
    ر ChrW(1585)
    ړ ChrW(1683)
    ز ChrW(1586)
    ژ ChrW(1688)
    ږ ChrW(1686)
    س ChrW(1587)
    ش ChrW(1588)
    ښ ChrW(1690)
    ص ChrW(1589)
    ض ChrW(1590)
    ط ChrW(1591)
    ظ ChrW(1592)
    ع ChrW(1593)
    غ ChrW(1594)
    ف ChrW(1601)
    ق ChrW(1602)
    ک ChrW(1705)
    ګ ChrW(1707)
    گ ChrW(1711)
    ل ChrW(1604)
    ﻻ ChrW(65275)
    ﻷ ChrW(65271)
    ﻵ ChrW(65269)
    ﷲ ChrW(65010)
    م ChrW(1605)
    ن ChrW(1606)
    ڼ ChrW(1724)
    و ChrW(1608)
    ﺅ ChrW(1572)
    ه ChrW(1607)
    ي ChrW(1610)
    ې ChrW(1744)
    ئ ChrW(1574)
    ی ChrW(1740)
    ۍ ChrW(1741)
    ﮮ ChrW(1746)
    ﮰ ChrW(64432)
    ء ChrW(1569)
    ۰ ChrW(1776)
    ۱ ChrW(1777)
    ۲ ChrW(1778)
    ۳ ChrW(1779)
    ۴ ChrW(1780)
    ۵ ChrW(1781)
    ۶ ChrW(1782)
    ۷ ChrW(1783)
    ۸ ChrW(1784)
    ۹ ChrW(1785)
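
    As a rough illustration of the enhancement asked for above, here is a minimal Python sketch (not SHG's code, which is not shown in this thread). It assumes the tool works by dropping every character outside an allowed set, and simply widens that set with the code points listed above. Python's chr() is used in place of ChrW(); the names PASHTO_CODE_POINTS, clean_word and count_words are made up for this example.

```python
import string
from collections import Counter

# Code points copied from the ChrW(...) list above (letters, ligatures, digits).
PASHTO_CODE_POINTS = [
    1570, 1575, 1576, 1662, 1578, 1660, 1579, 1580, 1665, 1670, 1669,
    1581, 1582, 1583, 1673, 1584, 1585, 1683, 1586, 1688, 1686, 1587,
    1588, 1690, 1589, 1590, 1591, 1592, 1593, 1594, 1601, 1602, 1705,
    1707, 1711, 1604, 65275, 65271, 65269, 65010, 1605, 1606, 1724,
    1608, 1572, 1607, 1610, 1744, 1574, 1740, 1741, 1746, 64432, 1569,
] + list(range(1776, 1786))  # the ten digits 1776..1785

# Allowed characters: A-Z, a-z, 0-9 plus the Pashto characters.
ALLOWED = set(string.ascii_letters + string.digits) | {
    chr(cp) for cp in PASHTO_CODE_POINTS
}


def clean_word(word):
    """Drop every character that is not in the allowed set."""
    return "".join(ch for ch in word if ch in ALLOWED)


def count_words(text):
    """Split on whitespace, clean each word, and count the non-empty results."""
    cleaned = (clean_word(w) for w in text.split())
    return Counter(w for w in cleaned if w)


# Sample word built from four letters in the list above.
pashto_word = chr(1662) + chr(1690) + chr(1578) + chr(1608)
counts = count_words(pashto_word + " cas}e case " + pashto_word + "!")
```

    With this set, "cas}e" and "case" are still counted together as "case", but the Pashto word is kept instead of being stripped away.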

    I was able to solve the problem by converting each non-English letter to a combination of A-Z and 0-9.
    For example:
    Letter 1= 11aa11
    Letter 2= 22a22
    Letter 3= 33a33
    Letter 4= 44b44
    Letter 5= 55c55
    Letter 6= 66d66
    Letter 7= 77e77
    Letter 8= 88f88
    Letter 9= 99g99
    Letter 10= 00h00

    I then fed over 1.2 million words into SHG's histogram, and it did its job amazingly quickly. SHG is the best, and I thank him a lot for his code. May he live long.

    Once I had the results, I converted the letters back, and everything was fine.
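
    The round trip described above (convert, count, convert back) can be sketched in Python roughly as follows. This is not the exact scheme I used: as an assumption for the example, every character is replaced by a fixed-width token of four hexadecimal digits (only 0-9 and A-F), which keeps the conversion reversible. collections.Counter stands in for SHG's histogram, and the function names are hypothetical.

```python
from collections import Counter

TOKEN_WIDTH = 4  # hex digits per character; assumes code points <= 0xFFFF


def encode_word(word):
    """Replace each character with a 4-digit uppercase hex token (0-9, A-F)."""
    tokens = []
    for ch in word:
        if ord(ch) > 0xFFFF:
            raise ValueError("character does not fit in a 4-digit token")
        tokens.append(format(ord(ch), "04X"))
    return "".join(tokens)


def decode_word(encoded):
    """Split the string into 4-digit tokens and turn each back into a character."""
    chunks = [
        encoded[i:i + TOKEN_WIDTH] for i in range(0, len(encoded), TOKEN_WIDTH)
    ]
    return "".join(chr(int(chunk, 16)) for chunk in chunks)


def histogram(text):
    """Encode each word, count the encoded words, then decode the keys."""
    encoded_counts = Counter(encode_word(w) for w in text.split())
    return {decode_word(key): n for key, n in encoded_counts.items()}


# Sample word built from four letters in the Pashto list.
pashto_word = chr(1662) + chr(1690) + chr(1578) + chr(1608)
encoded = encode_word(pashto_word)
result = histogram(pashto_word + " case " + pashto_word)
```

    Because every token has the same width, the encoded words contain only characters the histogram accepts, and decoding never has to guess where one letter ends and the next begins.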

    A million thanks to SHG and this forum once again.

    Re: Text Histogram for collecting words and their frequency

    Hi Wali, could you elaborate on the method you used? I am also trying to get the histogram of a Farsi text, but the problem, as you mentioned, is that Farsi has 32 characters rather than the 26 of English. Any help is greatly appreciated.

