10 research outputs found
Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation
Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction.
Even though most speakers will whisper at times, and some speakers can only whisper, the majority of todayâs computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper.
Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives
An acoustic analysis of the Cantonese whispered tones
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, December 31, 2004."Also available in print.Thesis (B.Sc)--University of Hong Kong, 2004.published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science
Tones in whispered Cantonese
Includes bibliographical references (p. 28-30)."A dissertation submitted in partial fulfillment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2010."Thesis (B.Sc)--University of Hong Kong, 2010.Acoustic analysis and perceptual experiments were carried out to investigate the acoustical
characteristics of tones in whispered Cantonese and to identify possible perceptual cues for tone
identification. The isolated vowel /a/ embedded in a framing sentence produced by 20 (10 male
and 10 female) native Cantonese speakers using modal and whispered phonation was recorded.
Formant frequencies, duration and intensity of the vowels were measured from the samples using
signal analysis software. During tone identification tasks, the speech samples were presented to
20 listeners who were native Cantonese speakers. The listeners were instructed to identify the
tone of the target vowels in the presented sentences, based on which percent correct
identification of tones was calculated. Results of the study reveal the role of second formant,
duration, average intensity and intensity contours in perception of Cantonese whispered tones.
Speakerâs maneuvers in production of whispered tones were also discussed.published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science
Methods for speaking style conversion from normal speech to high vocal effort speech
This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on conversion of normal speech to high vocal effort. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch causes a degradation of the system's speaker identification performance. As solution, we proposed a SSC system that included a novel spectral mapping, used along a statistical mapping technique, to transform the mel-frequency spectral energies of normal speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system, with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data improves considerably the speaker identification rates.
The second topic involves a normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique, robust to data scarcity, maps the features. Finally, the vocoder synthesizes the mapped features into speech. We used two vocoders in the conversion system, for comparison: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoder cases with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity subjective test showed that, for both vocoder cases, our proposed SSC system was able to convert normal speech to Lombard speech. The naturalness subjective test showed that the converted samples using the glottal vocoder were clearly more natural than those obtained with STRAIGHT
Regeneration of speech in voice-loss patients
This paper considers regeneration of natural sounding speech from whisper-speech, produced by patients with vocal tract lesions affecting the glottis. Such reconstruction is important for both total and partial laryngectomy patients to improve on the monotonous robotized sound typical of electrolarynx devices.
Reconstruction of speech from whispers has been demonstrated previously, however the resulting speech does not exhibit particularly high intelligibility, and more importantly, sounds un-natural. It is the conjecture of the authors that limited pitch variations in the reconstructed speech contributes most to that lack of naturalness.
In this paper, a method for pitch contour variation in reconstructed speech is presented. This method extracts voice factors which are important to ânaturalnessâ from the whispered signal and applies these to the reconstructed speech. The method is based upon our previous published work which implemented an analysis-by-synthesis approach to voice reconstruction using a modified CELP codec