Text-to-Speech Conversion Using Artificial Neural Systems

Jing Wang

Abstract

Text-to-speech conversion is an interesting and significant application of artificial neural systems (ANSs). Since NETtalk was developed by Sejnowski and Rosenberg (1986; 1987), it has been found that ANSs can successfully solve the mapping problem of text-to-speech conversion, which is difficult for conventional computing methods. In this thesis, a text-to-speech system was developed for an IBM-PC by employing ANSs to learn the mapping between text and allophones. Using General Instrument's SP0256 "Narrator Speech Processor", which converts allophones to speech, this system transforms written text input into the corresponding speech output. The network, based on the back-propagation learning algorithm, was trained to learn English word pronunciation. It was shown that the network learning process was similar to that of a human being. The network exhibited the ability to perform recollection and generalization. It achieved 98.1% accuracy for the words in the training sets, and 86.7% accuracy for new words. The possibility of applying two other ANS models, bidirectional associative memory (BAM) and sparse distributed memory (SDM), to the problem of text-to-speech conversion was also investigated. The results showed that the BAM network is of limited use in text-to-speech applications because of its limited storage capacity, and that the SDM network could be well suited to these applications if the problem of "degraded capacity" were overcome.