
break strings into syllables

  1. #1
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    break strings into syllables

    Hello everyone,

    I have to write transcriptions for over 120 000 words in column A of a worksheet named “Text”. :-(
    Now the ugliest part is dividing the transcription into syllables, and I need automation for that. Of course there will be a lot to correct afterwards, but I hope that automation followed by correction will take less time than doing all the work manually. I hope!
    I have already automated the conversion of special language letters to transcription letters in Excel. Excel converts the strings to transcription itself and I make the corrections later. In my experience this takes a lot less time than typing the whole transcription myself.
    I wonder whether automating the division into syllables and then correcting it manually will save me as much time.
    I guess that automating the splitting of strings into syllables would need a few predefined criteria. First of all, vowels and consonants have to be defined. For that purpose I have an extra worksheet “v&c”. Column A of “v&c” contains the vowels and column B contains the consonants. For example (the real consonants and vowels may be different; it's just an example here):
    Consonants
    b
    c
    d
    f
    g
    h
    j
    k
    l
    m
    n
    p
    q
    r
    s
    t
    v
    w
    x
    z
    ß
    µ

    Vowels
    a
    e
    i
    o
    u
    ä
    ö
    y


    Secondly, the positions of consonants and vowels have to be analyzed. There are a few possible cases:
    Before:
    love
    ballade
    transcription
    country
    abba
    aba
    abab
    möhren
    mässigung
    schutz suchen
    suchende person
    combination of words

    Should be:
    lo-ve ( consonant vowel - consonant vowel )
    bal-la-de ( con vow con - con vow - con vow )
    tran-scrip-tion ( con con vow con - con con con vow con - con vow vow con )
    coun-try ( con vow vow con - con con vow )
    ab-ba ( vow con - con vow )
    a-ba ( vow - con vow )
    a-bab ( vow - con vow con )
    möh-ren ( con vow con - con vow con )
    mäs-si-gung ( con vow con - con vow - con vow con con )
    schutz suc-hen ( …. )
    suc-hen-de per-son ( …. )
    com-bi-na-tion of words ( …. )

    I will be very thankful if someone could help me out with it.
    Last edited by wali; 05-24-2010 at 04:47 PM.

  2. #2
    Forum Expert davegugg's Avatar
    Join Date
    12-18-2008
    Location
    WI, US
    MS-Off Ver
    2010
    Posts
    1,884

    Re: break strings into syllables

    To do this, you'd need a set of criteria for a syllable. Can you create this set? I'd think it an impossible task, especially if you are dealing with more than one language.

    However, if you can come up with a set of criteria, I can definitely automate it for you.

    Dave

  3. #3
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Hi davegugg,

    thank you very much for your quick answer. Of course it's really a complex thing and a lot of analytical work has to be done. I would like to make a start first, and maybe one can make it more effective later:


    Case 1: alternating con and vow, like:
    babi, love, novi, balabali ----> here we have pairs of con+vow, always starting with a con
    => ba-bi, lo-ve, no-vi, ba-la-ba-li....

    Case 2: two con between two vow, like:
    abbi, abbitte, ella, aslammikhant ---> starting with a vowel & followed by two con.
    => ab-bi, ab-bit-te, el-la, as-lam-mik-hant

    Case 3: one con between two vow, like:
    abi, alu, osidu, usidudt ---> starting with a vowel & followed by one con, followed by a vow
    => a-bi, a-lu, o-si-du, u-si-dudt

    Case 4: combination of Case 1 and Case 2, like:
    babibbi, novistalla----> start Case 1 and end Case 2
    => ba-bib-bi, no-vis-tal-la


    Thanks
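
    The four cases above can be sketched roughly as follows. This is only an illustration in Python, not the VBA macro posted later in the thread. It assumes the example vowel list from the first post, keeps neighbouring vowels together, and (as an extra assumption for clusters of three or more consonants) breaks after the first consonant, because that reproduces the "should be" examples such as tran-scrip-tion and coun-try. The names `split_syllables` and `split_phrase` are made up for this sketch.

```python
# Example vowel list from the first post; the real "v&c" sheet may differ.
VOWELS = set("aeiouyäö")


def split_syllables(word, vowels=VOWELS, sep="-"):
    """Split one word using two rules of thumb: V-CV and VC-CV."""
    breaks = []
    last_vowel = None  # index of the most recent vowel
    for i, ch in enumerate(word):
        if ch.lower() not in vowels:
            continue
        if last_vowel is not None:
            cluster = i - last_vowel - 1  # consonants between the two vowels
            if cluster == 1:
                breaks.append(last_vowel + 1)  # Case 1/3: break before the consonant
            elif cluster >= 2:
                breaks.append(last_vowel + 2)  # Case 2: break after the first consonant
            # cluster == 0: adjacent vowels stay in one syllable
        last_vowel = i

    parts, start = [], 0
    for b in breaks:
        parts.append(word[start:b])
        start = b
    parts.append(word[start:])
    return sep.join(parts)


def split_phrase(text):
    """Apply split_syllables to each space-separated word."""
    return " ".join(split_syllables(w) for w in text.split(" "))


if __name__ == "__main__":
    for w in ["babi", "abbitte", "usidudt", "novistalla", "combination of words"]:
        print(w, "=>", split_phrase(w))
```

    Rules like these only cover the regular cases; anything they get wrong would still have to be corrected by hand.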

  4. #4
    Forum Expert davegugg's Avatar
    Join Date
    12-18-2008
    Location
    WI, US
    MS-Off Ver
    2010
    Posts
    1,884

    Re: break strings into syllables

    Hi wali

    Wow, this is more complicated than I had originally anticipated. Give this a shot and let me know if you can come up with any upgrades.


  5. #5
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Oh God! That was a lot of work for you!! Thank you very much.

    I will try it and send my response tomorrow. Thank you once again.

  6. #6
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Hi,

    I ran the macro over the example I attached to the thread. It gives the following results:

    love .=> lo-ve
    ballade .=> ba-l-laa-de (should be => bal-la-de)
    transcription .=> tra-nscri-p-tii-on (should be => tran-scrip-tion)
    country .=> co-untry (should be => coun-try)
    abba .=> ab-baa (should be => ab-ba)
    aba .=> aba (should be => a-ba)
    abab .=> aba-b (should be => a-bab)
    möhren .=> möhre-n (should be => möh-ren)
    mässigung .=> mässi-gu-ng (should be => mäs-si-gung)
    schutz suchen .=> schu-tz su-c-hee-n (should be => schutz suc-hen)
    suchende person .=> su-c-hee-n-dee- -pee-r-soo-n (should be => suc-hen-de per-son)
    combination of words .=> co-m-bii-na-ti-on- oo-f wo-rds (should be => com-bi-na-tion of words)
    balabali .=> ba-la-ba-li- (should be => ba-la-ba-li)
    abbi .=> ab-bii (should be => ab-bi)


    The first thing is that the code doubles some vowels. Those vowels get doubled if they follow a double letter.

    Secondly, how can we change it so that when a vowel is followed by two cons. and a vowel, the "-" is put between the cons.?

    It seems that it may really work. I hope that you can find time to make some adjustments.

    I thank you a million times for taking the time to solve my problem. Thanks
    Last edited by wali; 05-13-2010 at 03:17 PM.

  7. #7
    Forum Expert davegugg's Avatar
    Join Date
    12-18-2008
    Location
    WI, US
    MS-Off Ver
    2010
    Posts
    1,884

    Re: break strings into syllables

    I have fixed the vowel doubling.
    In terms of your second request, can you give me a specific example?
    Lastly, I don't know how to type the foreign letters (like German umlauts), but you can add them to the code yourself. Go to the Vowel function and add those letters to the Cases. I've added an example, commented out. That should fix words like your example mohren.


  8. #8
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Hi,

    the double vowel is fixed. Great!!
    OK! I will change the vowels in the code according to my needs.

    What do you think? Can you also fix the following things?


    original => your code now returns => but should be
    ballade .=> ba-l-lade (.should be => bal-la-de ) --> no isolation of a cons like -l-
    transcription .=> tra-nscri-p-tion (.should be => tran-scrip-tion ) --> at least one cons after a vow before putting "-", if the vow is not the first letter of the word
    country .=> co-untry (.should be => coun-try ) --> if two vowels follow each other, both should be treated as one, plus the cons after the vowels
    aba .=> aba (.should be => a-ba ) --> if the word starts with a vowel and the third letter is a vowel again, then isolate the first vowel

    abab .=> aba-b (.should be => a-bab )


    thanks
    Last edited by wali; 05-13-2010 at 04:24 PM.

  9. #9
    Forum Expert davegugg's Avatar
    Join Date
    12-18-2008
    Location
    WI, US
    MS-Off Ver
    2010
    Posts
    1,884

    Re: break strings into syllables

    You've got to remember we're dealing with set criteria. Just giving examples without criteria does no good.
    Original: I think you missed some text here.
    ballade: Your first two criteria cause this. I can't fix it without messing up those criteria.
    etc.

    I told you when we started that this seemed to be an impossible task. Theoretically, you could add thousands of criteria to try to decipher where the "-" should be, but in the end there are simply too many possibilities. I think you'd be best off doing it manually, but you can use my code to try to build your own if you so desire.

  10. #10
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: break strings into syllables

    What about ?

    angst-schreeuw
    bou-wen
    ge-e-ve-naard
    idee-ën
    sba-gli-a-to
    wij-zi-ging

    Syllables in English are quite different from syllables in German, Dutch or Italian.
    Even in 1 language there is no single algorithm.
    Last edited by snb; 05-13-2010 at 04:55 PM.

  11. #11
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Dear Davegugg,

    you are right! It is impossible.
    But I promise that your code will be used. I will try to modify it.
    It's really great of you to invest that much of your time in solving my problem. I just can't thank you enough for that. God bless you!

    I will post the results here, if I succeed in adding some more conditions.

    Dear snb,
    yes, you are right, for each language it would differ. And of course it won't work perfectly for any language. But though for each language one can change bit.

  12. #12
    Forum Expert snb's Avatar
    Join Date
    05-09-2010
    Location
    VBA
    MS-Off Ver
    Redhat
    Posts
    5,649

    Re: break strings into syllables

    But though for each language one can change bit.
    Not even that

  13. #13
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: break strings into syllables

    From http://en.wikipedia.org/wiki/Syllabification

    In some languages, the spoken syllables are also the basis of syllabification in writing. However, possibly due to the weak correspondence between sounds and letters in the spelling of modern English, written syllabification in English is based mostly on etymological or morphological instead of phonetic principles. For example, it is not possible to syllabify "learning" as lear-ning according to the correct syllabification of the living language. Seeing only lear- at the end of a line might mislead the reader into pronouncing the word incorrectly, as the digraph ea can hold many different values. The history of English orthography accounts for such phenomena.

    English written syllabification therefore deals with a concept of "syllable" that doesn't correspond to the linguistic concept or a phonetic (as opposed to morphological) unit.

    As a result, even most native English speakers are unable to syllabify words accurately without consulting a dictionary or using a word processor[citation needed]. The process is, in fact, so complicated that even schools usually do not provide much more advice on the topic than to consult a dictionary. In addition, there are differences between British and US syllabification and even between dictionaries of the same English variety.

    In Finnish, Italian, and other nearly phonetically spelled languages, writers can in principle correctly syllabify any existing or newly created word using only general rules. In Finland, children are first taught to hyphenate every word until they produce the correct syllabification reliably, after which the hyphens can be omitted.
    I have no idea where Pashto falls in all that.
    Entia non sunt multiplicanda sine necessitate

  14. #14
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    It's impossible to divide original Pashto script (written right to left) into syllables, because the vowels are not always visible in the script. That means one word can be read in different ways and you have to work out from context which word it is. There are many examples where four or five semantically different words are all written with the same spelling.

    But what I have here is a transcription and not the original script. Secondly, the words I have do not exceed 12 letters in length. No vowel is missing because the transcription is written in IPA (International Phonetic Alphabet). And I am quite sure that division into syllables is possible to an extent. Of course not 100% correctly, but if enough criteria are defined and studied I am sure that up to 40% may be divided correctly. I will try to study the different possible cases.

    Of course for English it is not possible at all.
    Last edited by wali; 05-13-2010 at 07:00 PM.

  15. #15
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: break strings into syllables

    You could categorize words by vowel/consonant formation, e.g.,

    youthfulness -> aaabbbabbabb
    schmaltziest -> bbbbabbbaabb

    There are 891 different such patterns among the 11,000-odd 12-letter words in SOWPODS.
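
    As a rough illustration of this categorization (a Python sketch, not code from the thread), each letter is mapped to "a" for a vowel and "b" for a consonant. Counting "y" as a vowel is an assumption read off the two examples above, and the function names are made up for the sketch.

```python
from collections import Counter

# Assumption: 'y' counts as a vowel, which matches both examples above.
VOWELS = set("aeiouy")


def cv_pattern(word, vowels=VOWELS):
    """Map every vowel to 'a' and every other character to 'b'."""
    return "".join("a" if ch.lower() in vowels else "b" for ch in word)


def pattern_counts(words):
    """Count how many words share each vowel/consonant pattern."""
    return Counter(cv_pattern(w) for w in words)


if __name__ == "__main__":
    print(cv_pattern("youthfulness"))
    print(cv_pattern("schmaltziest"))
    print(pattern_counts(["love", "novi", "abba"]))
```

    Run over a full word list, `len(pattern_counts(words))` gives the number of distinct patterns.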

  16. #16
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    I will spend some time on it in the coming days. By the way, I am posting some Pashto words in the attachment as an example. Maybe not all the symbols are shown correctly because I use a special font for them.

  17. #17
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243

    Re: break strings into syllables

    Wow!

    What a fascinating thread. I agree with everyone: although you may be able to get some defined logic for English syllables, the number of possibilities is massive.
    I've never heard of IPA before, but when comparing Japanese to English, Japanese would have to be easier as I think it is one of the more (/most?) phonetic languages.

    Dave,
    Great work on putting the logic down in a working code
    I'm not sure how fast the code runs now, but to make it faster, you could load the info into an "in memory array", work on array elements "in memory" & then write the results back to the spreadsheet in one hit (or fewer than currently). This minimises the "hits" on the spreadsheet. See here for some links or here for an alternative example.

    Wali,
    If you want to improve the speed of the code, you can "Order" the Select Case statements to have the characters listed from most likely to least likely and finally the "Case Else" as the last statement. This could be useful if you end up with a large list of other characters, because I believe a Select Case only processes the options until it finds the first True case statement. Here are a couple of links which discuss letter frequency (I'm guessing you may have something similar already?) that you could use to order your Case Statements:
    http://www.askoxford.com/asktheexper...quency?view=uk
    http://en.wikipedia.org/wiki/Letter_frequency

    Here is a variation on Dave's Select Case with a slightly different syntax. Note: I'm not sure if (or when) there would be a tipping point between the two versions when more characters are added (ie either in a single line or as a separate Case statement).
    -----------------------
    From a different angle, I wonder if this could be solved using regular Expressions?
    Unfortunately, I've only started reading up on these in the past few days & have never used Reg Ex myself so I can't offer a possible Reg Ex solution.
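
    For what it's worth, a regular-expression version might look something like the sketch below. It is written in Python rather than VBA and only encodes two rules of thumb (break before a single consonant between vowels, break after the first consonant of a longer cluster). The vowel list is the example one from the first post, input is assumed to be lower case, and the names `BREAK` and `hyphenate` are invented for the sketch.

```python
import re

# Example vowel list from the first post; adjust to the real "v&c" sheet.
V = "aeiouyäö"
# A consonant here means: any letter that is not one of the listed vowels.
C = rf"[^\W\d_{V}]"

# Zero-width break points:
#   1) after a vowel, when exactly one consonant and then a vowel follow  (V-CV)
#   2) after vowel+consonant, when more consonants and then a vowel follow (VC-CV)
BREAK = re.compile(rf"(?<=[{V}])(?={C}[{V}])|(?<=[{V}]{C})(?={C}+[{V}])")


def hyphenate(text):
    """Insert '-' at every break point; spaces between words are left alone."""
    return BREAK.sub("-", text)


if __name__ == "__main__":
    for s in ["love", "ballade", "transcription", "schutz suchen"]:
        print(s, "=>", hyphenate(s))
```

    Both look-behinds are fixed width, which Python's `re` module requires.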

    Shg, nice link. I'm not yet a real wiki fan but I couldn't help myself this time & succumbed to linking to it as well.

    hth
    Rob
    Last edited by broro183; 05-13-2010 at 08:12 PM.
    Rob Brockett
    Kiwi in the UK
    Always learning & the best way to learn is to experience...

  18. #18
    Forum Expert davegugg's Avatar
    Join Date
    12-18-2008
    Location
    WI, US
    MS-Off Ver
    2010
    Posts
    1,884

    Re: break strings into syllables

    I see what you're saying about loading an array into memory rather than reading it off the sheet each time Rob, thanks for the tip. I'll keep that in mind for any upcoming projects and try to use it so I get in the habit.

  19. #19
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Hey,

    I took the transcriptions of around 60 000 Pashto words which were already correctly divided into syllables. I simplified them as shg had suggested: I replaced all consonants with "b" and all vowels with "a".
    That reduced 60 000 cells to 363 cells. I was surprised to see how many unique solutions there are. For most of the combinations there is only one solution.
    So my guess was right that for most vowel/consonant combinations there is only one solution, and not as many as in English.
    I will keep trying to understand it better and will post as soon as I have something new.

  20. #20
    Forum Contributor
    Join Date
    11-12-2007
    Location
    Germany
    MS-Off Ver
    2007
    Posts
    472

    Re: break strings into syllables

    Hey,

    The problem is solved. You won’t believe it! Over 70% of the splits are correct! Yahooooooooooooooo
    The key to the solution was shg's tip. I did the following:

    1) I took 60 000 Pashto words with transcriptions which were already divided into syllables. I replaced all vowels with "a" and all consonants with "b".
    2) I removed the duplicates and only about 363 combinations remained. I made two columns of them: column A without "|" and column B containing the pattern divided with "|".
    3) I then took the real data which was supposed to be divided into syllables. I copied it into an extra column and replaced all the vowels and consonants with a and b.
    4) I then compared column A of the 363 patterns with the extra column of the real data, where the simplified data was saved. Where they matched, the column B cell from the 363-pattern sheet replaced the extra cell in the real data sheet.
    5) Now I had the real data with an extra column in front of it containing the divided strings in simplified form (only a's and b's). JBeaucaire was kind enough to write code for me, and the real data was divided according to that model. http://www.excelforum.com/excel-prog...-in-col-b.html

    And at the end of the day I saved at least one year of my life. :-)
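
    The five steps above can be sketched roughly as follows. This is a Python illustration, not JBeaucaire's code or the actual workbook formulas. It assumes the example vowel list from the first post (the real IPA vowels would come from the "v&c" sheet), and when one pattern has several known divisions it simply picks the most frequent one, which is also an assumption. All function names are made up for the sketch.

```python
from collections import Counter, defaultdict

# Example vowel list; replace with the vowels from the "v&c" sheet.
VOWELS = set("aeiouyäö")


def cv_pattern(text, vowels=VOWELS, sep="|"):
    """Steps 1 and 3: vowels -> 'a', consonants -> 'b', separators kept."""
    return "".join(
        ch if ch == sep else ("a" if ch.lower() in vowels else "b") for ch in text
    )


def build_lookup(divided_words, sep="|"):
    """Step 2: map each undivided pattern to its most common divided pattern."""
    votes = defaultdict(Counter)
    for word in divided_words:
        divided = cv_pattern(word, sep=sep)
        votes[divided.replace(sep, "")][divided] += 1
    return {plain: c.most_common(1)[0][0] for plain, c in votes.items()}


def apply_template(word, template, sep="|"):
    """Step 5: copy the separators of the template into the real word."""
    letters = iter(word)
    return "".join(sep if t == sep else next(letters) for t in template)


def syllabify(word, lookup, sep="|"):
    """Step 4: look the word's pattern up; return None if it is unknown."""
    template = lookup.get(cv_pattern(word, sep=sep))
    return apply_template(word, template, sep) if template else None


if __name__ == "__main__":
    lookup = build_lookup(["ba|la", "ab|ba", "a|ba", "ba|li"])
    for w in ["novi", "ella", "abi", "transcription"]:
        print(w, "=>", syllabify(w, lookup))
```

    Words whose pattern is not in the lookup come back as `None` and still have to be divided by hand.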

    Thank you all! Thank you SHG for the cool idea with simplification
    Thanks to davegugg and JBeaucaire for their codes.

  21. #21
    Registered User
    Join Date
    05-06-2021
    Location
    nepal
    MS-Off Ver
    2016
    Posts
    1

    Re: break strings into syllables

    wali, can you please send me the code? I have a similar problem.

  22. #22
    Forum Guru
    Join Date
    08-28-2014
    Location
    USA
    MS-Off Ver
    Excel 2019
    Posts
    17,806

    Re: break strings into syllables

    Administrative Note:

    Hello sudeep and Welcome to Excel Forum.

    We are happy to help, however whilst you feel your request is similar to this thread, experience has shown that things soon get confusing when answers refer to particular cells/ranges/sheets which are unique to your post and not relevant to the original.

    Please see Forum Rule #4 about hijacking and start a new thread for your query.

    If you are not familiar with how to start a new thread see the FAQ: How to start a new thread

    Let us know if you have any questions.
    Consider taking the time to add to the reputation of everybody that has taken the time to respond to your query.

  23. #23
    Registered User
    Join Date
    07-07-2022
    Location
    big man
    MS-Off Ver
    office 69
    Posts
    1

    Re: break strings into syllables

    Surely, instead of using the Case statement, you can just check whether an item is in an array?

