
    Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion

    In this paper, we propose an algorithm to improve the naturalness of reconstructed glossectomy patients' speech generated by voice conversion (VC), which is used to enhance the intelligibility of speech uttered by patients who have undergone a wide glossectomy. Although existing VC algorithms improve intelligibility and naturalness, the results are still unsatisfactory. To address the remaining problems, we propose to modify the speech waveform directly using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extraction for speech synthesis, so it avoids extraction errors and makes the best use of the original source characteristics. For spectrum conversion, we evaluate both a GMM and a DNN. Subjective evaluations show that our algorithm synthesizes more natural speech than the vocoder-based method. Spectrogram observations indicate that power in the high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.
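The idea of filtering the waveform directly with a spectrum differential, rather than resynthesizing it with a vocoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filter, so the sketch assumes a plain short-time Fourier transform with overlap-add, and it takes the per-frame log-magnitude differential `log_diff` as a given input (in the paper it would come from the GMM or DNN conversion model). The function names, frame length, and hop size are all hypothetical.

```python
import numpy as np


def n_frames_for(n_samples, frame_len=512, hop=128):
    """Number of full frames that fit in a signal of n_samples (hypothetical helper)."""
    return 1 + (n_samples - frame_len) // hop


def apply_spectral_differential(wave, log_diff, frame_len=512, hop=128):
    """Filter `wave` frame by frame with a log-magnitude spectral differential.

    log_diff has shape (n_frames, frame_len // 2 + 1) and holds, per frame,
    the natural-log magnitude difference (target minus source). It is an
    assumed input here; no conversion model is included in this sketch.
    """
    window = np.hanning(frame_len)
    out = np.zeros(len(wave))
    norm = np.zeros(len(wave))
    for i, diff in enumerate(log_diff):
        start = i * hop
        frame = wave[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        # Scale magnitudes only: the phase, and with it the original
        # source excitation, is carried over unchanged.
        filtered = np.fft.irfft(spectrum * np.exp(diff), n=frame_len)
        out[start:start + frame_len] += filtered * window
        norm[start:start + frame_len] += window ** 2
    # Normalize the overlap-add sum wherever at least one frame contributed.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out
```

With an all-zero differential the filter leaves the signal unchanged, which is a convenient sanity check; no pitch or aperiodicity parameters are extracted at any point, which is the property the abstract emphasizes.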

    Design of a portable microprocessor-based International Phonetic Alphabet (IPA) text-to-speech conversion device for use by the speech impaired

    A portable microprocessor-based alternate communication device was designed and a prototype fabricated. This device allows speech-impaired individuals whose language skills remain intact to input and edit an utterance of unrestricted vocabulary via a keyboard/LCD display system. The utterance, which is specified using the International Phonetic Alphabet (IPA), is then converted into an appropriate set of speech synthesizer parameters using context-sensitive rules. An interrupt-driven system passes each set of parameters, in order and at the appropriate time, to the synthesizer, thus generating an audible output. Tests conducted using the device in its present state have shown the utterance specification process to be slower than desired and the speech produced to be rather robotic in nature. Although understanding the generated speech is sometimes a problem, intelligibility improves with time as the listener becomes used to the synthesized speech and the programmer's ability to specify the utterance improves. As the current implementation does not include rules that vary the suprasegmental features, it is anticipated that the introduction of such rules will improve the quality and intelligibility of the speech output. Further suggestions for future development are provided; they focus on expediting the utterance specification process and improving the quality of the speech generated.
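The pipeline described above (IPA symbols, context-sensitive rules, timed delivery of parameter sets) can be sketched as follows. Everything here is illustrative: the abstract does not list the device's rules or parameters, so the table `BASE`, the single vowel-lengthening rule, and the names `to_parameters` and `schedule` are assumptions, and the timeline only stands in for the interrupt-driven hand-off to the synthesizer.

```python
from dataclasses import dataclass


@dataclass
class Params:
    phoneme: str
    duration_ms: int
    voiced: bool


# Hypothetical base values per IPA symbol: (duration in ms, voiced?).
BASE = {
    "b": (60, True),
    "t": (60, False),
    "d": (60, True),
    "i": (100, True),
    "æ": (120, True),
}
VOWELS = {"i", "æ"}


def to_parameters(phonemes):
    """Map IPA symbols to parameter sets, applying one context-sensitive rule."""
    frames = []
    for idx, ph in enumerate(phonemes):
        duration, voiced = BASE[ph]
        nxt = phonemes[idx + 1] if idx + 1 < len(phonemes) else None
        # Example rule (assumed): lengthen a vowel before a voiced consonant.
        if ph in VOWELS and nxt is not None and nxt not in VOWELS and BASE[nxt][1]:
            duration = int(duration * 1.5)
        frames.append(Params(ph, duration, voiced))
    return frames


def schedule(frames):
    """Return (start_time_ms, Params) pairs in the order they would be sent."""
    timeline = []
    t = 0
    for frame in frames:
        timeline.append((t, frame))
        t += frame.duration_ms
    return timeline
```

On a real device, each entry of the timeline would be handed to the synthesizer by a timer interrupt at its start time; suprasegmental rules such as pitch contours, which the abstract notes are missing, would be added as further fields and rules.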