
    Phase-Distortion-Robust Voice-Source Analysis

    This work concerns the analysis of voiced speech signals, in particular the analysis of the glottal source signal. According to the source-filter theory of speech, the glottal signal is produced by the vibratory behaviour of the vocal folds and is modulated by the resonances of the vocal tract and the radiation characteristic of the lips to form the speech signal. As the glottal source signal is thought to contribute much of the non-linguistic and prosodic information in speech, it is useful to develop techniques which can estimate and parameterise this signal accurately. Because of vocal tract modulation, estimating the glottal source waveform from the speech signal is a blind deconvolution problem which necessarily requires assumptions about the characteristics of both the glottal source and the vocal tract. A common assumption is that the glottal signal and/or the vocal tract can be approximated by a parametric model. Other assumptions concern the causality of the speech signal: the vocal tract is assumed to be a minimum-phase system while the glottal source is assumed to exhibit mixed-phase characteristics. However, as the literature review within this thesis shows, the error criteria used to determine the parameters are not robust to the conditions under which the speech signal is recorded, and degrade particularly in the common scenario where low-frequency phase distortion is introduced. Those criteria that are robust to this type of distortion are not well suited to the analysis of real-world signals. This research proposes a voice-source estimation and parameterisation technique, called the Power-spectrum-based determination of the Rd parameter (PowRd) method. Illustrated by theory and demonstrated by experiment, the new technique is robust to the time placement of the analysis frame and to the phase issues generally encountered during recording.
The method assumes that the derivative glottal flow signal is approximated by the transformed Liljencrants-Fant model and that the vocal tract can be represented by an all-pole filter. Unlike many existing glottal source estimation methods, the PowRd method employs a new error criterion to optimise the parameters, which is also suitable for determining the optimal vocal-tract filter order. In addition to the issue of glottal source parameterisation, nonlinear-phase recording conditions can also adversely affect other speech processing tasks such as the estimation of the instant of glottal closure. In this thesis, a new glottal closing instant estimation algorithm is proposed which incorporates elements from state-of-the-art techniques and is specifically designed to operate on speech recorded under nonlinear-phase conditions. The new method, called the Fundamental RESidual Search (FRESS) algorithm, is shown to estimate the glottal closing instant of voiced speech with superior precision and comparable accuracy relative to existing methods, over a large database of real speech signals under real and simulated recording conditions. One application of the proposed glottal source parameterisation method and glottal closing instant detection algorithm is a system which can analyse and re-synthesise voiced speech signals. This thesis describes perceptual experiments which show that, under linear and nonlinear recording conditions, the system produces synthetic speech which is generally preferred to speech synthesised with a state-of-the-art time-domain parameterisation technique. In sum, this work represents a movement towards flexible and robust voice-source analysis, with potential for a wide range of applications including speech analysis, modification and synthesis.
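As context for the Liljencrants-Fant model mentioned above, one period of a simplified (untransformed) LF glottal flow derivative can be synthesised as follows. This is only an illustrative sketch: the parameter values and the heuristic open-phase growth rate are my choices, not the thesis's; the full LF model instead derives the growth rate from an area-balance constraint over the period.

```python
import numpy as np

def lf_frame(fs=16000, f0=120.0, tp_frac=0.45, te_frac=0.6, ta_frac=0.02, Ee=1.0):
    """One period of a simplified LF-model glottal flow derivative.
    tp, te and ta are the flow-peak, closing and return-phase times as
    fractions of the period T0; Ee is the excitation strength at te."""
    T0 = 1.0 / f0
    tp, te, ta = tp_frac * T0, te_frac * T0, ta_frac * T0
    wg = np.pi / tp                        # open-phase sinusoid frequency
    # return-phase constant eps from eps*ta = 1 - exp(-eps*(T0 - te))
    eps = 1.0 / ta
    for _ in range(30):                    # simple fixed-point iteration
        eps = (1.0 - np.exp(-eps * (T0 - te))) / ta
    a = 3.0 / te                           # heuristic growth rate (simplified)
    E0 = -Ee / (np.exp(a * te) * np.sin(wg * te))  # forces value -Ee at te
    t = np.arange(int(round(T0 * fs))) / fs
    open_phase = E0 * np.exp(a * t) * np.sin(wg * t)
    return_phase = -(Ee / (eps * ta)) * (np.exp(-eps * (t - te))
                                         - np.exp(-eps * (T0 - te)))
    return np.where(t <= te, open_phase, return_phase)
```

The open phase is an exponentially growing sinusoid reaching its negative extremum -Ee at the glottal closing instant te; the return phase then decays exponentially to zero at the end of the period.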

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudo-periodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where the signal-to-noise ratio is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in the areas of speech coding, identification and artificial bandwidth extension of telephone speech.
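Once GCIs are available, the glottal-synchronous segmentation described above reduces to slicing the signal between closure instants. A minimal illustration (function and variable names are mine; real systems typically also window and overlap the frames):

```python
def glottal_synchronous_frames(signal, gcis, periods=2):
    """Slice speech into pitch-synchronous frames spanning `periods`
    glottal cycles, anchored at successive glottal closure instants
    (given here as sample indices)."""
    return [signal[gcis[i]:gcis[i + periods]]
            for i in range(len(gcis) - periods)]
```

Unlike fixed-length framing, each frame here adapts to the local pitch period, so frame boundaries always coincide with glottal events.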

    Automatic LF-model fitting to the glottal source waveform by extended Kalman filtering

    A new method for automatically fitting the Liljencrants-Fant (LF) model to the time domain waveform of the glottal flow derivative is presented in this paper. By applying an extended Kalman filter (EKF) to track the LF-model shape-controlling parameters and dynamically searching for a globally minimal fitting error, the algorithm can accurately fit the LF-model to the inverse filtered glottal flow derivative. Experimental results show that the method has better performance for both synthetic and real speech signals compared to a standard time-domain LF-model fitting algorithm. By offering a new method to estimate the glottal source LF-model parameters, the proposed algorithm can be utilised in many applications.
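To illustrate the idea of tracking a shape parameter with an extended Kalman filter, here is a minimal scalar EKF. It tracks the decay rate of an illustrative exponential observation model rather than the LF-model parameters themselves; the function name, state model and noise settings are all assumptions, not the paper's algorithm:

```python
import numpy as np

def ekf_track_parameter(ts, ys, p0=1.0, P0=1.0, q=1e-4, r=1e-2):
    """Scalar extended Kalman filter tracking parameter p of the
    nonlinear observation model y = exp(-p*t).  The state follows a
    random walk with process noise q; r is the observation noise."""
    p, P = p0, P0
    for t, y in zip(ts, ys):
        P = P + q                       # predict (random-walk state)
        h = np.exp(-p * t)              # predicted observation
        H = -t * np.exp(-p * t)         # Jacobian dh/dp, linearised at p
        S = H * P * H + r               # innovation covariance
        K = P * H / S                   # Kalman gain
        p = p + K * (y - h)             # correct with the innovation
        P = (1.0 - K * H) * P
    return p
```

The same predict-linearise-correct loop generalises to a vector of shape parameters by replacing the scalars with a state vector, covariance matrix and Jacobian row per observation.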

    Robust tracking of glottal LF-model parameters by multi-estimate fusion

    A new approach to robust tracking of glottal LF-model parameters is presented. The approach does not rely on a new glottal source estimation algorithm, but instead introduces a new extensible multi-estimate fusion framework. Within this framework several existing algorithms are applied in parallel to extract glottal LF-model parameter estimates which are subsequently passed to quantitative data fusion procedures. The preliminary implementation of the fusion algorithm described here incorporates three glottal inverse filtering methods and one time-domain LF-model fitting algorithm. Experimental results for both synthetic and natural speech signals demonstrate the effectiveness of the fusion algorithm. The proposed method is flexible and can be easily extended for other speech processing applications such as speech synthesis, speaker identification and prosody analysis.
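The fusion step can be illustrated by a toy confidence-weighted average over the parallel estimates; the quantitative data fusion procedures in the paper are more elaborate, so this is only a sketch with invented names:

```python
def fuse_estimates(estimates, confidences):
    """Combine one LF-model parameter's estimates from several
    algorithms into a single value, weighting each estimate by a
    per-algorithm confidence score."""
    total = sum(confidences)
    return sum(e * c for e, c in zip(estimates, confidences)) / total
```

For example, three algorithms returning 1.0, 2.0 and 3.0 with confidences 1, 1 and 2 fuse to 2.25, pulling the result toward the most trusted estimate.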

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The MAVEBA Workshop, held every two years, collects in its proceedings the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and the classification of vocal pathologies. The Workshop has the sponsorship of: Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

    Glottal source parametrisation by multi-estimate fusion

    Glottal source information has proven useful in many applications such as speech synthesis, speaker characterisation, voice transformation and pathological speech diagnosis. However, no single current algorithm can extract reliable glottal source estimates across a wide range of speech signals. This thesis describes an investigation into glottal source parametrisation, including studies, proposals and evaluations on glottal waveform extraction, glottal source modelling by Liljencrants-Fant (LF) model fitting, and a new multi-estimate fusion framework. As one of the critical steps in voice-source parametrisation, glottal waveform extraction techniques are reviewed. A performance study is carried out on three existing glottal inverse filtering approaches; the results confirm that no single algorithm consistently outperforms the others or provides a reliable and accurate estimate for different speech signals. The next step is modelling the extracted glottal flow. To estimate the glottal source parameters more accurately, a new time-domain LF-model fitting algorithm based on the extended Kalman filter is proposed. The algorithm is evaluated by comparing it with a standard time-domain method and a spectral approach; results show the proposed fitting method is superior to the existing methods. To obtain accurate glottal source estimates for different speech signals, a multi-estimate (ME) fusion framework is proposed. In this framework, different algorithms are applied in parallel to extract multiple sets of LF-model estimates which are then combined by quantitative data fusion. The ME fusion approach is implemented and tested in several ways, and the novel fusion framework is shown to give more reliable glottal LF-model estimates than any single algorithm.
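As a sketch of the glottal inverse filtering step reviewed above (not any of the three specific algorithms studied in the thesis), an all-pole vocal-tract model can be fitted by linear prediction and then inverted; a practical system would add pre-emphasis, pitch-synchronous windowing and closed-phase analysis:

```python
import numpy as np

def lpc_inverse_filter(speech, order=12):
    """Fit an all-pole vocal-tract model by the autocorrelation method
    of linear prediction, then apply the inverse FIR filter A(z) to
    approximate the glottal flow derivative (the prediction residual)."""
    x = np.asarray(speech, dtype=float)
    # autocorrelation sequence r[0..order]
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # solve the normal equations R a = r for the predictor coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    inv = np.concatenate(([1.0], -a))   # A(z) = 1 - sum_k a_k z^-k
    return np.convolve(x, inv, mode='full')[:len(x)]
```

Filtering a resonant all-pole signal with its own inverse filter collapses the vocal-tract resonances, leaving a residual that approximates the source excitation.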

    Detection of Irregular Phonation in Speech

    Get PDF
    This work addresses the detection and characterization of irregular phonation in spontaneous speech. While published work treats this problem as a two-hypothesis problem only within regions of phonated speech, this work focuses on distinguishing aperiodicity due to frication from that due to irregular voicing. This work also deals with correcting a current pitch tracking algorithm in regions of irregular phonation, where most pitch trackers fail to perform well. Building on the detection of regions of irregular phonation, an acoustic parameter is developed to characterize these regions for speaker identification applications. The detection performance of the algorithm on a clean speech corpus (TIMIT) is 91.8%, with a false-detection rate of 17.42%. On a telephone speech corpus (NIST 98), the detection performance is 89.2%, with a false-detection rate of 12.8%. The pitch detection accuracy increased from 95.4% to 98.3% for TIMIT, and from 94.8% to 97.4% for NIST 98. The creakiness parameter was added to a set of seven acoustic parameters for speaker identification on the NIST 98 database, and performance was enhanced by 1.5% for female speakers and 0.4% for male speakers for a population of 250 speakers.