A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remained as natural sounding and intelligible as natural speech. In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.
The relationships among physiological, acoustical, and perceptual measures of vocal effort
The purpose of this work was to explore the physiological mechanisms of vocal effort, the acoustical manifestation of vocal effort, and the perceptual interpretation of vocal effort by speakers and listeners. The first study evaluated four proposed mechanisms of vocal effort specific to the larynx: intrinsic laryngeal tension, extrinsic laryngeal tension, supraglottal compression, and subglottal pressure. Twenty-six healthy adults produced modulations of vocal effort (mild, moderate, maximal) and rate (slow, typical, fast), followed by self-ratings of vocal effort on a visual analog scale. Ten physiological measures across the four hypothesized mechanisms were captured via high-speed flexible laryngoscopy, surface electromyography, and neck-surface accelerometry. A mixed-effects backward stepwise regression analysis revealed that estimated subglottal pressure, mediolateral supraglottal compression, and a normalized percent activation of extrinsic suprahyoid muscles significantly increased as ratings of vocal effort increased (R² = .60). The second study had twenty inexperienced listeners rate vocal effort on the speech recordings from the first study (typical, mild, moderate, and maximal effort) via a visual sort-and-rate method. A set of acoustical measures was calculated, including amplitude-, time-, spectral-, and cepstral-based measures. Two separate mixed-effects regression models determined the relationship between the acoustical predictors and speaker and listener ratings. Results indicated that mean sound pressure level, low-to-high spectral ratio, and harmonic-to-noise ratio significantly predicted speaker and listener ratings. Mean fundamental frequency (measured as change in semitones from typical productions) and relative fundamental frequency offset cycle 10 were also significant predictors of listener ratings. The acoustical predictors accounted for 72% and 82% of the variance in speaker and listener ratings, respectively. Speaker and listener ratings were also highly correlated (average r = .86). From these two studies, we determined that vocal effort is a complex physiological process that is mediated by changes in laryngeal configuration and subglottal pressure. The self-perception of vocal effort is related to the acoustical properties underlying these physiological changes. Listeners appear to rely on the same acoustical manifestations as speakers, yet incorporate additional time-based acoustical cues during perceptual judgments. Future work should explore the physiological, acoustical, and perceptual measures identified here in speakers with voice disorders.
2019-07-06T00:00:00