
10 research outputs found

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

Author: Beigi Homayoon
Hamid Reza Sharifzadeh
Ian V. McLoughlin
Jingjie Li
Joliveau Elodie
McLoughlin Ian Vince
Netsell Ronald
Rothenberg Martin
Sharifzadeh Hamid Reza
Su Lim Tan
Sundberg Johan
Toda Tomoki
Yan Song
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/05/2015

Whispering is a natural, unphonated, secondary aspect of speech communication for most people. However, it is the primary mechanism of communication for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.

Crossref

Kent Academic Repository

An acoustic analysis of the Cantonese whispered tones

Author: Cheung Ka-yee
張嘉怡
Publication venue: The University of Hong Kong (Pokfulam, Hong Kong)
Publication date: 01/01/2004

"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, December 31, 2004."Also available in print.Thesis (B.Sc)--University of Hong Kong, 2004.published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science

HKU Scholars Hub

Tones in whispered Cantonese

Author: Lam Kam-shing
林錦成
Publication venue: The University of Hong Kong (Pokfulam, Hong Kong)
Publication date: 01/01/2010

Includes bibliographical references (p. 28-30). "A dissertation submitted in partial fulfillment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2010." Thesis (B.Sc.), University of Hong Kong, 2010. Acoustic analysis and perceptual experiments were carried out to investigate the acoustical characteristics of tones in whispered Cantonese and to identify possible perceptual cues for tone identification. The isolated vowel /a/, embedded in a framing sentence, was recorded from 20 native Cantonese speakers (10 male and 10 female) using modal and whispered phonation. Formant frequencies, duration and intensity of the vowels were measured from the samples using signal analysis software. In tone identification tasks, the speech samples were presented to 20 listeners who were native Cantonese speakers. The listeners were instructed to identify the tone of the target vowels in the presented sentences, and the percentage of correctly identified tones was calculated from their responses. Results of the study reveal the role of the second formant, duration, average intensity and intensity contours in the perception of Cantonese whispered tones. Speakers' maneuvers in the production of whispered tones are also discussed.

HKU Scholars Hub

Methods for speaking style conversion from normal speech to high vocal effort speech

Author: Ramírez López Ana
Publication venue: Aalto University, School of Arts, Design and Architecture, Department of Arts
Publication date: 01/01/2020

This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on the conversion of normal speech to high vocal effort speech. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with a vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch degrades the system's speaker identification performance. As a solution, we proposed an SSC system that included a novel spectral mapping, used alongside a statistical mapping technique, to transform the mel-frequency spectral energies of normal speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system, with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data considerably improves the speaker identification rates. The second topic involves normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique that is robust to data scarcity maps the features. Finally, the vocoder synthesizes the mapped features into speech. For comparison, we used two vocoders in the conversion system: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoder cases with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity test showed that, for both vocoder cases, our proposed SSC system was able to convert normal speech to Lombard speech. The naturalness test showed that the converted samples using the glottal vocoder were clearly more natural than those obtained with STRAIGHT.

Aaltodoc Publication Archive

Artificial voicing of whispered speech

Author: Patrícia Cristina Ramalho de Oliveira
Publication date: 23/07/2015

Repositório Aberto da Universidade do Porto

Regeneration of speech in voice-loss patients

Author: Ahmadi Farzaneh
McLoughlin Ian Vince
Sharifzadeh Hamid Reza
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper considers regeneration of natural-sounding speech from whisper-speech produced by patients with vocal tract lesions affecting the glottis. Such reconstruction is important for both total and partial laryngectomy patients to improve on the monotonous robotized sound typical of electrolarynx devices. Reconstruction of speech from whispers has been demonstrated previously; however, the resulting speech does not exhibit particularly high intelligibility and, more importantly, sounds unnatural. It is the conjecture of the authors that limited pitch variation in the reconstructed speech contributes most to that lack of naturalness. In this paper, a method for pitch contour variation in reconstructed speech is presented. This method extracts voice factors which are important to ‘naturalness’ from the whispered signal and applies these to the reconstructed speech. The method is based upon our previously published work, which implemented an analysis-by-synthesis approach to voice reconstruction using a modified CELP codec.

Kent Academic Repository

Western Sydney ResearchDirect