3 research outputs found

HMM-based speech synthesiser using the LF-model of the glottal source

Authors: Cabral J., Renals Steve, Richmond K., Yamagishi J.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2011

A major factor which causes a deterioration in speech quality in HMM-based speech synthesis is the use of a simple delta pulse signal to generate the excitation of voiced speech. This paper sets out a new approach to using an acoustic glottal source model in HMM-based synthesisers instead of the traditional pulse signal. The goal is to improve speech quality and to better model and transform voice characteristics. We have found that the new method decreases buzziness and also improves prosodic modelling. A perceptual evaluation has supported this finding by showing a 55.6% preference for the new system over the baseline. This improvement, while not as significant as we had initially expected, does encourage us to develop the proposed speech synthesiser further.

CiteSeerX

Crossref

Edinburgh Research Explorer

Automatic LF-model fitting to the glottal source waveform by extended Kalman filtering

Authors: Li Haoxuan, O'Brien Darragh, Scaife Ronan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/08/2012

A new method for automatically fitting the Liljencrants-Fant (LF) model to the time-domain waveform of the glottal flow derivative is presented in this paper. By applying an extended Kalman filter (EKF) to track the LF-model shape-controlling parameters and dynamically searching for a globally minimal fitting error, the algorithm can accurately fit the LF-model to the inverse-filtered glottal flow derivative. Experimental results show that the method performs better on both synthetic and real speech signals than a standard time-domain LF-model fitting algorithm. By offering a new method to estimate the glottal source LF-model parameters, the proposed algorithm can be utilised in many applications.
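
To illustrate the general idea of using an extended Kalman filter as a parameter estimator, the following is a minimal sketch only. It does not implement the LF-model or the paper's algorithm: as a stand-in it assumes a toy measurement model, y(t) = exp(-a t), with a single unknown shape parameter a treated as a (near-)constant state. The function name `ekf_fit_decay` and all numeric settings are hypothetical choices made for this example.

```python
import math

def ekf_fit_decay(samples, times, x0=1.0, p0=1.0, r=1e-4, q=0.0):
    """Track the parameter a of the toy model y(t) = exp(-a * t) with a scalar EKF.

    x is the current estimate of a, p its variance, r the assumed
    measurement-noise variance, q the assumed process-noise variance.
    """
    x, p = x0, p0
    for y, t in zip(samples, times):
        # Predict: the parameter is modelled as a random walk.
        p += q
        # Linearise the nonlinear measurement model around the estimate.
        h = math.exp(-x * t)      # predicted measurement
        jac = -t * h              # dh/dx evaluated at the current estimate
        # Update with the standard Kalman gain.
        s = jac * p * jac + r
        k = p * jac / s
        x += k * (y - h)
        p = (1.0 - k * jac) * p
    return x, p

# Synthetic, noise-free data generated with a "true" parameter of 2.0.
times = [0.01 * k for k in range(200)]
samples = [math.exp(-2.0 * t) for t in times]
estimate, variance = ekf_fit_decay(samples, times)
print(estimate, variance)
```

In this toy setting the estimate moves from the initial guess towards the value used to generate the data. Fitting a real LF-model would need a multi-dimensional state, the LF waveform equations as the measurement model, and the error search described in the abstract, none of which is shown here.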

ZENODO

Irish Universities

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

DCU Online Research Access Service

Glottal source parametrisation by multi-estimate fusion

Author: Li Haoxuan
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/11/2013

Glottal source information has been proven useful in many applications such as speech synthesis, speaker characterisation, voice transformation and pathological speech diagnosis. However, currently no single algorithm can extract reliable glottal source estimates across a wide range of speech signals. This thesis describes an investigation into glottal source parametrisation, including studies, proposals and evaluations on glottal waveform extraction, glottal source modelling by Liljencrants-Fant (LF) model fitting and a new multi-estimate fusion framework. As one of the critical steps in voice source parametrisation, glottal waveform extraction techniques are reviewed. A performance study is carried out on three existing glottal inverse filtering approaches, and the results confirm that no single algorithm consistently outperforms the others or provides a reliable and accurate estimate for different speech signals. The next step is modelling the extracted glottal flow. To more accurately estimate the glottal source parameters, a new time-domain LF-model fitting algorithm by extended Kalman filter is proposed. The algorithm is evaluated by comparing it with a standard time-domain method and a spectral approach. Results show the proposed fitting method is superior to existing fitting methods. To obtain accurate glottal source estimates for different speech signals, a multi-estimate (ME) fusion framework is proposed. In the framework, different algorithms are applied in parallel to extract multiple sets of LF-model estimates, which are then combined by quantitative data fusion. The ME fusion approach is implemented and tested in several ways. The novel fusion framework is shown to give more reliable glottal LF-model estimates than any single algorithm.
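
The abstract does not say which data-fusion rule the thesis uses. As a hedged illustration of what combining several parallel estimates of one parameter can look like, the sketch below uses inverse-variance weighting, a common generic fusion rule chosen here as an assumption. The function name `fuse_estimates` and the example numbers are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple

def fuse_estimates(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Each pair is one algorithm's estimate of the same parameter together
    with an assumed error variance; lower variance means higher weight.
    Returns the fused value and the variance of the fused estimate.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(var <= 0.0 for _, var in estimates):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total

# Made-up estimates of a single parameter from three hypothetical algorithms.
fused_value, fused_variance = fuse_estimates([(0.9, 0.04), (1.1, 0.04), (1.6, 0.25)])
print(fused_value, fused_variance)
```

Under this rule the fused variance is never larger than the smallest input variance, which is one simple way to express the idea that a combination can be more reliable than any single estimator, provided the assumed variances are realistic.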

Irish Universities

DCU Online Research Access Service