FW: Computer Generated Words (was RE: Arrghhhh!!!! and a Summary of
Whittaker, Pat
Thu, 27 Nov 97 13:28:00 M

That sounds great, and we would also get a dictionary (or add to an existing one) at the same time. Kills two birds with one stone.

Selek

----------
From: owner-vulcan-linguistics
To: vulcan-linguistics
Subject: Computer Generated Words (was RE: Arrghhhh!!!! and a Summary of What has Gone Before)
Date: Thursday, 27 November, 1997 1:47PM

At 02:22 AM 11/27/97 MET_DST, Maggie wrote:

>look like. Right now, I'm having difficulties to visualize the
>outcome of such an (automated) procedure, so a sample would be
>most welcome!

Well, I envision something like this: we establish a frequency distribution for the phonemes; initial, mid and ending consonant clusters; diphthongs; and syllable types. All those lists act as input files to the program, along with a list of English meanings.

First the program would select a number of syllables for the word. It does this by generating a random number from 0,000 to 1,000. If Vulcan follows the pattern of all Terran languages, then the chance of a word having a certain number of syllables gets smaller as the number of syllables gets larger. So words of 5 syllables might occur only 1% of the time, while words of 1 syllable might occur 50% of the time.

The program would then generate another random number and use it to select a syllable type. For example, say the program generated a 0,012. After reading in the syllable frequency distribution, the program will sort it in ascending order based on the frequency percentage. So it would look something like this if I printed it out:

CCVC    0,002
'CVV    0,002
CVVC    0,002
'CC     0,002
C       0,002
CCVCC   0,005 <<
VV      0,005
CCV'    0,005
...

In this case it would start at the top of the list and add up the percentages entry by entry, stopping when the sum went over the number generated above, 0,012. The first five entries sum to 0,010, and adding CCVCC brings the sum to 0,015, so in the list above the syllable would be of the form CCVCC.
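The selection step described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the table is the sample list from the example (written with decimal points), the names `SYLLABLE_TYPES`, `pick` and `random_pick` are made up for illustration, and the fallback for a roll beyond the end of the table is my own guess, since the description does not say what should happen there.

```python
import random

# Sample table copied from the printed example; the real distribution
# would be read from an input file.
SYLLABLE_TYPES = [
    ("CCVC", 0.002),
    ("'CVV", 0.002),
    ("CVVC", 0.002),
    ("'CC", 0.002),
    ("C", 0.002),
    ("CCVCC", 0.005),
    ("VV", 0.005),
    ("CCV'", 0.005),
]


def pick(table, roll):
    """Sort by ascending frequency, sum from the top, and return the
    first entry at which the running sum goes over `roll`."""
    ordered = sorted(table, key=lambda entry: entry[1])  # stable sort
    running = 0.0
    for name, freq in ordered:
        running += freq
        if running > roll:
            return name
    # Assumption: a roll past the end of a partial table falls back to
    # the last (most frequent) entry.
    return ordered[-1][0]


def random_pick(table, rng=random):
    """Scale the roll to the table's total so partial tables still work."""
    total = sum(freq for _, freq in table)
    return pick(table, rng.random() * total)


print(pick(SYLLABLE_TYPES, 0.012))  # prints CCVCC
```

The same function would serve for the number of syllables, the consonant clusters and the vowels, given a frequency table for each.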
In the list of syllable types, a C means a single consonant, a CC means a consonant cluster, a V means a single vowel, a VV means a diphthong, and a ' means a glottal stop.

Next the program would generate a new number and use that to find a consonant cluster, following the same method it used to find a syllable type, only using the most appropriate frequency distribution of consonant clusters (of which it would have three: initial, mid, and ending). In this case, for the first syllable in the new word, the program would select from the distribution of word-initial consonant clusters. Let us say it picks <dr>. Then it would do the same thing for the vowel. Let's say it picks <ii> for that. Finally it picks a mid-word consonant cluster, say <rl>, and we have our first syllable: driirl.

The program would follow the same procedure for all the syllables in the word, then grab a meaning out of the list of meanings and save the pair in a file. This would eventually result in a list of Vulcan words and English meanings.

Even using as complex an algorithm as this, the program will generate some really goofy looking words as well as some really cool looking words. So, I'm thinking that I could have it generate a set of Vulcan-looking words for each meaning and then do this for 20 or so meanings a week. I could then post the result for each 20 meanings, and if someone objects to the way a Vulcan-like word looks, we throw the meaning back in the unknown list for another go. So each week I could post a list looking something like this:

abdomen      tiil
able         hrail
above        methdi
abstract     oorrinda
accidental   qitlai'a
account      jrio'dka
acid         perluu
...

So if someone hates the look of tiil, I would throw abdomen back in the list of unknown meanings for a later week's run.

Alternatively, I could generate words for all 2000 or so meanings I have easily at hand and post them to the list, make note of all the objections, and then rerun the meanings that had goofy Vulcan-like words and post them again.
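As a rough sketch of the two steps above (filling in a syllable type, and throwing objected meanings back for another go), something like the following Python might do. Everything named here is hypothetical: `tokenize`, `build_syllable`, `run_batch` and `settle` are illustration names, the `choose` callback stands in for a frequency-table lookup, and the pieces dr, ii and rl are simply read off the example syllable driirl.

```python
def tokenize(pattern):
    """Split a syllable type such as "CCVCC" into CC, V, CC."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern[i:i + 2] in ("CC", "VV"):
            tokens.append(pattern[i:i + 2])
            i += 2
        else:
            tokens.append(pattern[i])
            i += 1
    return tokens


def build_syllable(pattern, choose):
    """Fill each slot of the pattern. `choose(token, slot)` is assumed to
    draw from the frequency table that fits the slot."""
    parts, seen_vowel = [], False
    for token in tokenize(pattern):
        if token == "'":
            parts.append("'")  # glottal stop is written as itself
        elif token in ("V", "VV"):
            parts.append(choose(token, "nucleus"))
            seen_vowel = True
        else:
            parts.append(choose(token, "coda" if seen_vowel else "onset"))
    return "".join(parts)


def run_batch(unknown, make_word, batch_size=20):
    """Pair the next `batch_size` unknown meanings with generated words."""
    return [(meaning, make_word()) for meaning in unknown[:batch_size]]


def settle(unknown, batch, objections):
    """Return (accepted pairs, meanings still unknown). Objected meanings
    go to the back of the unknown list for a later run."""
    accepted = [(m, w) for m, w in batch if m not in objections]
    posted = {m for m, _ in batch}
    remaining = [m for m in unknown if m not in posted]
    return accepted, remaining + [m for m, _ in batch if m in objections]


# Fixed choices reproducing the worked example instead of random draws.
fixed = {("CC", "onset"): "dr", ("V", "nucleus"): "ii", ("CC", "coda"): "rl"}
print(build_syllable("CCVCC", lambda token, slot: fixed[(token, slot)]))  # prints driirl
```

In a real run, `choose` would pick from the initial, mid or ending cluster distribution depending on where the syllable sits in the word; the sketch only distinguishes the slots inside one syllable.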
I would repeat that cycle until all 2000 meanings have a Vulcan-like word no one objects to.

When someone runs into an English word they cannot translate, we can either make up more words manually as we need them, or put together another list of meanings and I can make another run with the new list.

How does that sound?

Rob Z.