Items needed for computer generation of words (was Re: A
Rob Zook 
Fri, 07 Nov 1997 16:24:57 -0600

At 03:10 PM 11/7/97 -0600, you wrote:
>>Without the rules for forming syllables we cannot figure out what sets
>>of consonants in the middle of a word are or are not consonant clusters.
>
>True.
>
>>We cannot even tell how to correctly pronounce the words we do have.
>
>Well, probably not. I think knowing what <'> is will largely resolve 
>that, however.

In many cases it does make the syllable breaks in words with <'> fairly
obvious. But not every word has a <'>, and those which don't may have
four or more possible ways to split the syllables.

>>I realize I'm thinking ahead a bit here, but not much I think.
>>
>>For any given syllable in a language one could find up to 24 different
>>variations on how a syllable is formed if we take into consideration
>>consonant clusters and diphthongs.
>>
>>CV    CCV    VC    VVC
>>CVV   CCVV   VCC   VVCC
>>CVC   CCVC   VCV   VVCV
>>CVCC  CCVCC  VCVV  VVCVV
>>CVVC  CCVVC  VCCV  VVCCV
>>CVVCC CCVVCC VCCVV VVCCVV
>
>Actually, the last 4 items in the last two columns have two syllables, so
>that's 16. 

Oops. You're right, I got a little carried away there.

>But if one includes 3-consonant clusters like spl- or -rst, there
>would be... 32?

Well, I meant CC to stand in for any consonant cluster, and
VV to stand in for any diphthong.
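
A quick way to double-check the count is to treat every unbroken run
of V (a single vowel or a diphthong) as one syllable nucleus and count
the runs in each pattern. This is only a sketch under that assumption;
the names PATTERNS and nucleus_count are made up for illustration.

```python
import re

# The 24 patterns from the table, read row by row.
PATTERNS = """
CV    CCV    VC    VVC
CVV   CCVV   VCC   VVCC
CVC   CCVC   VCV   VVCV
CVCC  CCVCC  VCVV  VVCVV
CVVC  CCVVC  VCCV  VVCCV
CVVCC CCVVCC VCCVV VVCCVV
""".split()

def nucleus_count(pattern):
    """Count vowel runs; assumes each run of V is exactly one nucleus."""
    return len(re.findall(r"V+", pattern))

single = [p for p in PATTERNS if nucleus_count(p) == 1]
double = [p for p in PATTERNS if nucleus_count(p) == 2]

print(len(PATTERNS), len(single), len(double))
```

Under that assumption 16 of the patterns have one nucleus and the
other 8 (the last four in each of the last two columns) have two.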

>>The way syllables are formed in the middle of words is also
>>probably different from how they are formed at the beginning and the
>>end.
>
>If you mean that the kinds of consonants and clusters permitted at the
>beginnings and ends of a syllable depend on the syllable's position in a
>word, that's true. But we can approach the description of such rules now.
>For instance, if we find that the sequence #NC
>(word-boundary+nasal+consonant) is not permitted, but the sequence VNC
>(vowel+nasal+consonant) is permitted, then the hypothetical word /omqee/ 
>is possible, regardless of whether it breaks as om-qee or omq-ee or even
>o-mqee.

/mq/? Now that seems like a real throat breaker to me. Worse than /ky/
by far.

>To be sure, things can get more complicated than this. What about a word
>like /omqprii/? If /mq/ is permitted after a vowel, /qp/ is permitted
>except at the end of words, and /pr/ is permitted before a vowel, is 
>VmqprV permitted? Or do we need to further specify that NC must be 
>preceded by a vowel and cannot be followed by a stop?

Well, we should put it in as explicit terms as possible. Given the
small amount of data in the dictionary, whatever Marketa and Prof.
Zvelebil cannot supply we may have to make up as it becomes necessary.
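
As an illustration of what "explicit" could mean here, the #NC rule
from the quoted example can be written as a pattern that a program
tests words against. This is a minimal sketch: the phoneme classes
below are placeholders I made up, not the actual inventory, and
FORBIDDEN and violations are hypothetical names.

```python
import re

# Assumed phoneme classes, for illustration only; the real inventory
# would replace these.
NASALS = "mn"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# Each rule names a sequence that is NOT permitted.
# "#" stands for a word boundary, N for a nasal, C for any consonant.
FORBIDDEN = {
    "#NC": re.compile(rf"^[{NASALS}][{CONSONANTS}]"),
}

def violations(word):
    """Return the names of all forbidden sequences found in the word."""
    return [name for name, rule in FORBIDDEN.items() if rule.search(word)]

print(violations("omqee"))  # nasal+consonant after a vowel
print(violations("mqee"))   # nasal+consonant at the start of the word
```

Further rules (such as "NC cannot be followed by a stop") would be
added as more entries in the same table, once we decide what they are.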

The best thing would be a list of word-initial, mid-word, and
word-ending consonant clusters, and possibly a fuller list with allowed
vowels/diphthongs. If we make the rules explicit enough, such a
list should be simple to generate. It could then become an input
file to the program doing the word generation.
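
One possible shape for such an input file, sketched in Python. The
file format (one "position cluster" pair per line, with # comments) is
purely an assumption, and the sample entries are placeholders taken
from the examples in this thread, not attested data.

```python
from collections import defaultdict

POSITIONS = ("initial", "medial", "final")

# Hypothetical file contents; the entries are placeholders.
SAMPLE = """\
# position  cluster
initial  pr
medial   mq
medial   qp
final    rst
"""

def parse_cluster_list(text):
    """Group clusters by word position, skipping blanks and comments."""
    clusters = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        position, cluster = line.split()
        if position not in POSITIONS:
            raise ValueError(f"unknown position: {position!r}")
        clusters[position].append(cluster)
    return dict(clusters)

print(parse_cluster_list(SAMPLE))
```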

>>The distribution of syllables in a language probably is not random. If
>>we do a computer-generated set of Vulcan words, we need that information
>>to develop the proper distribution of syllables in the final set.
>
>I know. But an elaborate enough set of phonological rules should suffice.
>Putting them into algorithm-appropriate language may be beyond us, of
>course...

Well, if we can put it in an explicit list, I can make it an algorithm.

I was thinking that if we had a list of the syllables in the words we
have, I can develop a distribution of the syllable forms I mentioned
above: CV, VC, etc. Then, combining that with the phonological
constraints, we should have enough information to put in a word
generation algorithm.
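
Roughly, tallying the syllable forms could look like the sketch below.
It assumes the words arrive already split with hyphens (as in om-qee),
that each letter is one phoneme, and that the vowel letters are
a, e, i, o, u; all three are assumptions, and the function names are
made up. Longer runs are folded so that CC stands for any cluster and
VV for any diphthong.

```python
import re
from collections import Counter

VOWELS = set("aeiou")  # assumed vowel letters, for illustration only

def skeleton(syllable):
    """Map a syllable to its form, e.g. 'qee' -> 'CVV'."""
    raw = "".join("V" if ch in VOWELS else "C" for ch in syllable.lower())
    raw = re.sub(r"C{3,}", "CC", raw)   # CC stands for any cluster
    return re.sub(r"V{3,}", "VV", raw)  # VV stands for any diphthong

def syllable_form_distribution(words):
    """Relative frequency of each syllable form in hyphenated words."""
    counts = Counter(skeleton(s) for w in words for s in w.split("-"))
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

print(syllable_form_distribution(["om-qee", "o-mqee"]))
```

Note that the same word split two different ways gives different
forms, which is why the syllable rules have to come first.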

I can do a pseudo-random selection of a syllable type, then within
the syllable another pseudo-random selection of allowable consonant
clusters, consonants, vowels, and/or diphthongs.

So for example, say the program selects a syllable of form CVCC. Then
based on our base-line phoneme distribution, I can do a random selection
of a consonant and a vowel. Then for the consonant cluster we can either
select from a *created* distribution (we really don't have a large
enough sample for a measured one), or from a totally random one.

Same thing would follow for the other syllable types. So we would also
need to either make up a distribution of diphthongs or generate a
random one.
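
The two-stage selection might be sketched like this in Python, using
weighted pseudo-random choice from the standard library. Every weight
and every phoneme in the tables is a made-up placeholder standing in
for the real distributions, and FORMS, SLOTS, pick, slots, and
make_syllable are hypothetical names.

```python
import random
import re

# Placeholder weights only; real values would come from the dictionary
# counts or from a distribution we make up.
FORMS = {"CV": 5, "CVC": 3, "CVCC": 1, "CVV": 1}
SLOTS = {
    "C":  {"k": 3, "t": 2, "m": 1},
    "V":  {"a": 3, "i": 2, "o": 1},
    "CC": {"pr": 1, "mq": 1},   # "created" cluster distribution
    "VV": {"ai": 1, "ee": 1},   # "created" diphthong distribution
}

def pick(dist, rng):
    """Weighted pseudo-random choice from a {item: weight} table."""
    items = list(dist)
    return rng.choices(items, weights=[dist[i] for i in items], k=1)[0]

def slots(form):
    """Split a form into slots, e.g. 'CVCC' -> ['C', 'V', 'CC']."""
    return re.findall(r"CC|VV|C|V", form)

def make_syllable(rng):
    """Pick a syllable form, then fill each slot from its own table."""
    form = pick(FORMS, rng)
    return form, "".join(pick(SLOTS[s], rng) for s in slots(form))

rng = random.Random(1997)  # fixed seed so a run can be repeated
for _ in range(5):
    print(make_syllable(rng))
```

Setting all the weights in a table to 1 gives the "totally random"
option; unequal weights give the "created" distribution.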

So to summarize, here are the things which we need to do a computerized
generation of a bunch of words:

1. Frequency Distribution (FD) of the phonemes (we almost have this).
2. FD of syllable forms.
3. FD of word-initial, mid-word, and word-ending consonant clusters.
4. FD of diphthongs.
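
For what it's worth, the four inputs could be bundled for the program
along these lines. This is only a sketch: GeneratorInputs, normalise,
and from_counts are hypothetical names, and the counts in the usage
example are placeholders rather than real data.

```python
from dataclasses import dataclass

def normalise(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one observation")
    return {item: n / total for item, n in counts.items()}

@dataclass
class GeneratorInputs:
    phonemes: dict        # 1. FD of the phonemes
    syllable_forms: dict  # 2. FD of syllable forms
    clusters: dict        # 3. word position -> FD of clusters
    diphthongs: dict      # 4. FD of diphthongs

    @classmethod
    def from_counts(cls, phonemes, syllable_forms, clusters, diphthongs):
        """Build all four distributions from raw counts."""
        return cls(
            normalise(phonemes),
            normalise(syllable_forms),
            {pos: normalise(c) for pos, c in clusters.items()},
            normalise(diphthongs),
        )

# Placeholder counts, for illustration only.
inputs = GeneratorInputs.from_counts(
    {"k": 3, "a": 1},
    {"CV": 1},
    {"initial": {"pr": 2}, "medial": {"mq": 1, "qp": 1}, "final": {"rst": 1}},
    {"ai": 1},
)
print(inputs.clusters["medial"])
```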

Rob Z.

--------------------------------------------------------
Men are born ignorant, not stupid; they are made stupid
by education.
-- Bertrand Russell