Re: Items needed for computer generation of words (was Re: A Request for Clarification: Syllables)

Saul Epstein
Sat, 8 Nov 1997 09:20:22 -0600

From: Rob Zook
Date: Friday, November 7, 1997 4:24 PM

> At 03:10 PM 11/7/97 -0600, Saul wrote:
> >Rob wrote:
> >
> >>For any given syllable in a language one could find up to 24 different
> >>variations on how a syllable is formed if we take into consideration
> >>consonant clusters and diphthongs.
> >>
> >>CV      CCV      VC      VVC
> >>CVV     CCVV     VCC     VVCC
> >>CVC     CCVC     VCV     VVCV
> >>CVCC    CCVCC    VCVV    VVCVV
> >>CVVC    CCVVC    VCCV    VVCCV
> >>CVVCC   CCVVCC   VCCVV   VVCCVV
> >
> >Actually, the last 4 items in the last two columns have two syllables, so
> >that's 16.
>
> Oops. You're right, I got a little carried away there.
>
> >But if one includes 3-consonant clusters like spl- or -rst, there
> >would be... 32?
>
> Well, I meant CC to stand in for any consonant cluster, and
> VV to stand in for any diphthong.

Ahh. Very clever.

> >>The way syllables are formed in the middle of words is also
> >>probably different than how they are formed at the beginning and the
> >>end.
> >
> >If you mean that the kinds of consonants and clusters permitted at the
> >beginnings and ends of a syllable depend on the syllable's position in a
> >word, that's true. But we can approach the description of such rules now.
> >For instance, if we find that the sequence #NC
> >(word-boundary+nasal+consonant) is not permitted, but the sequence VNC
> >(vowel+nasal+consonant) is permitted, then the hypothetical word /omqee/
> >is possible, regardless of whether it breaks as om-qee or omq-ee or even
> >o-mqee.
>
> /mq/? Now that seems like a real throat breaker to me. Worse than /ky/
> by far.

Oh, /ky/ is easy. You say it every time you say "cute" or "accuracy."

> >To be sure, things can get more complicated than this. What about a word
> >like /omqprii/?
> >If /mq/ is permitted after a vowel, /qp/ is permitted
> >except at the end of words, and /pr/ is permitted before a vowel, is
> >VmqprV permitted? Or do we need to further specify that NC must be
> >preceded by a vowel and cannot be followed by a stop?
>
> Well, we should put it in terms as explicit as possible. Given the
> small amount of data in the dictionary, whatever Marketa and Prof.
> Zvelebil cannot supply we may have to make up as it becomes necessary.

Yes.

> The best thing would be a list of word-initial, mid-word, and word-final
> consonant clusters, and possibly a fuller list with allowed
> vowels/diphthongs. If we make the rules explicit enough, such a
> list should be simple to generate. It could then become an input
> file to the program doing the word generation.

Right. That's what I was trying to start with my "Consonant Clusters at the Beginnings of Words."

[snip]

> So to summarize, here are the things we need for computerized
> generation of a bunch of words:
>
> 1. Frequency Distribution (FD) of the phonemes (we almost have this).
> 2. FD of syllable forms.
> 3. FD of word-initial, mid-word, and word-final consonant clusters.
> 4. FD of diphthongs.

We can find 3 and 4 from the data we have in much the same way as we have for 1. We can even make a stab at 2 if we take single-syllable words as data. If the professor can give us guidance, we can simply set the FD for some of these. Otherwise we have some construction to do.

--
from Saul Epstein
liberty uit net
www johnco cc ks us sepstein

"Surak ow'phaaper thes'hi thes'tca'; thes'phaadjar thes'hi suraketca'."
-- K'dvarin Urswhl'at
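The count of syllable shapes argued over in this thread (Rob's table, Saul's correction to 16, and the "32?" guess for 3-consonant clusters) can be regenerated mechanically. This is a minimal sketch under one assumption: a syllable is an onset of zero or more consonants, a nucleus of V (a vowel) or VV (a diphthong), and a coda of zero or more consonants, with each C counting as one consonant slot, as in Saul's 3-consonant count. The helper name `syllable_shapes` and its parameters are hypothetical. Rob's table leaves out bare V and VV syllables, so a flag drops them.

```python
from itertools import product

def syllable_shapes(max_onset=2, max_coda=2, allow_bare_vowel=True):
    """List one-syllable C/V templates (hypothetical helper).

    Assumed structure: 0..max_onset consonants, then a nucleus of V (vowel)
    or VV (diphthong), then 0..max_coda consonants. Each C is one consonant
    slot, so two-syllable strings such as VCV never appear.
    """
    shapes = []
    for onset, nucleus, coda in product(range(max_onset + 1), ("V", "VV"), range(max_coda + 1)):
        if not allow_bare_vowel and onset == 0 and coda == 0:
            continue  # Rob's table does not list bare V or VV
        shapes.append("C" * onset + nucleus + "C" * coda)
    return shapes

print(len(syllable_shapes(allow_bare_vowel=False)))        # 16, matching Saul's count
print(len(syllable_shapes(3, 3)))                          # 32, bare V and VV included
print(len(syllable_shapes(3, 3, allow_bare_vowel=False)))  # 30, bare V and VV excluded
```

Under this model, allowing clusters of up to three consonants gives 4 × 2 × 4 = 32 shapes when bare V and VV count as syllables, and 30 when they do not.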
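Saul's #NC example can be expressed as forbidden patterns matched against the word with its boundaries marked by "#". This is a minimal sketch, not a rule set anyone in the thread has adopted: the phoneme classes (vowels a e i o u, nasals m n, stops including /q/) and the rule names are hypothetical stand-ins for the real inventory. The stricter rule set encodes the extra conditions Saul asks about, namely that NC must follow a vowel and cannot be followed by a stop.

```python
import re

# Hypothetical phoneme classes for a toy, lowercase-only transcription;
# the real inventory would come from the dictionary data.
VOWEL = "[aeiou]"
NASAL = "[mn]"
CONS = "[^aeiou#]"   # any non-vowel, excluding the boundary marker
STOP = "[bdgkpqt]"   # assumption: /q/ is treated as a stop

# Saul's example rule: no nasal + consonant right after a word boundary.
BASIC_RULES = {
    "#NC": re.compile("#" + NASAL + CONS),
}

# Stricter variant: NC must follow a vowel and must not precede a stop.
STRICT_RULES = {
    **BASIC_RULES,
    "NC not after a vowel": re.compile("(?<!" + VOWEL + ")" + NASAL + CONS),
    "NC before a stop": re.compile(NASAL + CONS + STOP),
}

def violations(word, rules):
    """Names of the forbidden patterns found in '#word#' ('#' = word boundary)."""
    marked = "#" + word + "#"
    return [name for name, pattern in rules.items() if pattern.search(marked)]

for w in ("omqee", "mqee", "omqprii"):
    print(w, violations(w, BASIC_RULES), violations(w, STRICT_RULES))
```

Under the basic rule alone, /omqprii/ passes, which is exactly the open question; the stricter set rejects it because /mq/ is followed by the stop /p/, while /omqee/ passes both. The "NC not after a vowel" rule already covers #NC, since "#" is not a vowel.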
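Rob's four-item summary, together with the idea of feeding cluster lists to the generator as an input file, suggests a generator built on weighted random choice. This is a minimal sketch under stated assumptions: a word is modeled as a word-initial cluster, a vowel or diphthong, any number of (mid-word cluster + vowel) pairs, and a word-final cluster, which sidesteps the question of where syllables break. The table names, the one-entry-per-line file format, and every cluster and weight in `SAMPLE_TABLE` are invented placeholders, not data from the dictionary; the `words` table (number of vowels per word) stands in for item 2.

```python
import random
from collections import defaultdict

# Hypothetical input-file format: "<table> <item> <weight>" per line,
# where "-" stands for an empty cluster. All entries are placeholders.
SAMPLE_TABLE = """
# table  item  weight
words    1     3
words    2     5
words    3     2
initial  -     3
initial  k     4
initial  s     3
initial  t     3
initial  pr    1
medial   k     3
medial   l     2
medial   r     2
medial   mq    1
medial   sp    1
nucleus  a     5
nucleus  e     4
nucleus  i     3
nucleus  o     3
nucleus  u     2
nucleus  ai    1
final    -     3
final    k     2
final    n     2
final    s     2
final    rt    1
"""

def load_tables(text):
    """Parse '<table> <item> <weight>' lines into {table: {item: weight}}."""
    tables = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        table, item, weight = fields
        tables[table]["" if item == "-" else item] = float(weight)
    return dict(tables)

def pick(rng, table):
    """Choose one key of `table` with probability proportional to its weight."""
    items = list(table)
    return rng.choices(items, weights=[table[k] for k in items], k=1)[0]

def make_word(rng, tables):
    """Build initial + V + (medial + V) * (n - 1) + final, with n vowels."""
    n = int(pick(rng, tables["words"]))
    word = pick(rng, tables["initial"]) + pick(rng, tables["nucleus"])
    for _ in range(n - 1):
        word += pick(rng, tables["medial"]) + pick(rng, tables["nucleus"])
    return word + pick(rng, tables["final"])

if __name__ == "__main__":
    tables = load_tables(SAMPLE_TABLE)
    rng = random.Random(1997)  # fixed seed so runs are repeatable
    print([make_word(rng, tables) for _ in range(10)])
```

Replacing `SAMPLE_TABLE` with frequencies counted from the dictionary (items 1, 3 and 4) would leave the generator itself unchanged; a phonotactic filter could then discard any output that breaks the agreed cluster rules.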