
    New Excitation Signal for High Quality Linear Predictive Coding Speech Synthesis

    The purpose of this thesis is to improve the quality of synthetic speech by using the best excitation signal during Linear Predictive Coding (LPC) synthesis. This thesis examines the human speech system as a basis for our synthetic speech model. It then examines LPC synthesis closely, including the mathematical details. One dominant factor in producing natural-sounding and intelligible speech is the excitation signal. For LPC synthesis the excitation signal must have a flat frequency spectrum. A train of impulses separated by the pitch period of the speech has been the standard excitation signal for voiced speech in LPC synthesis. Unfortunately, speech produced using this excitation signal has an unnatural "buzz." For natural-sounding speech, the excitation signal should resemble the glottal volume velocity waveform. The glottal volume velocity waveform is a measure of the excitation that produces natural speech, and it does not have a flat frequency spectrum. This raises the question: what type of excitation signal should be used to produce the most natural-sounding speech possible? To answer this question, we examined six excitation signals that are currently used in LPC synthesis. We also developed many new excitation signals specifically for synthesizing natural-sounding speech. We experimented with the LPC parameters and these excitation signals to determine the conditions that produced the best speech. We then compared five of the excitation signals in forced-pair trials. We found that our new excitation, LF Impulse excitation, produced speech superior in overall quality (that is, naturalness and intelligibility) to the others. We conclude, therefore, that LF Impulse excitation, or an excitation similar to it, should be considered when attempting to produce speech that is both natural-sounding and intelligible with LPC synthesis.
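    The standard impulse-train excitation described above can be illustrated with a minimal sketch of generic LPC synthesis. This is not the thesis's LF Impulse excitation, whose details are not given in this abstract. The sketch assumes the all-pole convention A(z) = 1 - sum_k a_k z^-k; the function names `impulse_train` and `lpc_synthesize` are hypothetical, and the second-order coefficients in the usage lines are made-up values chosen only to give a stable resonance.

```python
def impulse_train(num_samples, pitch_period):
    """Voiced excitation: a unit impulse every `pitch_period` samples (flat line spectrum)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def lpc_synthesize(excitation, a, gain=1.0):
    """Filter `excitation` through the all-pole LPC synthesis filter gain / A(z).

    Assumed sign convention: A(z) = 1 - sum_{k=1}^{p} a[k-1] * z^{-k}, so
    s[n] = gain * e[n] + sum_{k=1}^{p} a[k-1] * s[n-k]  (zero initial state).
    """
    s = []
    for n, e in enumerate(excitation):
        acc = gain * e
        # Recursive (IIR) part: add weighted past output samples.
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * s[n - k]
        s.append(acc)
    return s


# Usage: 100 Hz voicing at fs = 8000 Hz (pitch period 80 samples) through a
# hypothetical stable 2nd-order resonator (complex poles at radius sqrt(0.9)).
fs = 8000
excitation = impulse_train(num_samples=800, pitch_period=fs // 100)
speech_like = lpc_synthesize(excitation, a=[1.3, -0.9])
```

    Because the excitation enters only as the input sequence, alternative excitations (for example, a glottal-pulse shape) can be compared by changing that one argument while the LPC coefficients stay fixed.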

    Speech synthesis based on a harmonic model

    The wide range of potential commercial applications for a computer system capable of automatically converting text to speech (TTS) has stimulated decades of research. One of the currently most successful approaches to synthesising speech, concatenative TTS synthesis, combines prerecorded speech units to build full utterances. However, the prosody of the stored units is often not consistent with that of the target utterance and must be altered. Furthermore, several types of mismatch can occur at unit boundaries and must be smoothed. Thus, pitch and time-scale modification techniques, as well as smoothing algorithms, play a critical role in all concatenative systems. This thesis presents the development of a concatenative TTS system based on a harmonic model and incorporating new pitch-scaling, time-scaling and smoothing algorithms. Experiments have shown that our system is capable of very high quality prosodic modification and synthesis. Results compare very favourably with those of existing state-of-the-art systems.
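    A harmonic model represents each voiced frame as a sum of sinusoids at multiples of the fundamental frequency f0. The sketch below is a generic illustration of why such a model makes prosodic modification convenient; it is not the thesis's own pitch-scaling, time-scaling or smoothing algorithms. Pitch is changed by sampling an assumed, hypothetical spectral envelope at the new harmonic frequencies, and duration by synthesizing more or fewer samples. All function names and parameter values are illustrative assumptions, and phase continuity between frames and boundary smoothing are omitted.

```python
import math


def harmonic_amplitudes(envelope, f0, fs):
    """Sample a spectral envelope at every harmonic k*f0 strictly below Nyquist."""
    num_harmonics = math.ceil((fs / 2) / f0) - 1
    return [envelope(k * f0) for k in range(1, num_harmonics + 1)]


def harmonic_frame(f0, amplitudes, phases, fs, num_samples):
    """Harmonic model: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k), k = 1..K."""
    frame = []
    for n in range(num_samples):
        value = 0.0
        for k, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
            value += amp * math.cos(2 * math.pi * k * f0 * n / fs + phi)
        frame.append(value)
    return frame


# Hypothetical single-formant envelope centred at 500 Hz (illustration only).
def envelope(freq_hz):
    return 1.0 / (1.0 + ((freq_hz - 500.0) / 200.0) ** 2)


fs = 16000
target_f0 = 120.0 * 1.25  # pitch raised by 25 %, from 120 Hz to 150 Hz
amps = harmonic_amplitudes(envelope, target_f0, fs)
# Simplified time-scaling: a 20 ms frame (fs // 50 samples) stretched by 1.5x.
num_samples = (fs // 50) * 3 // 2
frame = harmonic_frame(target_f0, amps, [0.0] * len(amps), fs, num_samples)
```

    Tying the amplitudes to the envelope rather than to the harmonic index is what keeps the formant near 500 Hz when f0 moves from 120 Hz to 150 Hz; reusing the old per-harmonic amplitudes at the new f0 would shift the apparent formants along with the pitch.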

    Analysis and correction of the helium speech effect by autoregressive signal processing

    SIGLE record. LD: D48902/84. Available from BLDSC (British Library Document Supply Centre), United Kingdom (GB).