A Hybrid Parameterization Technique for Speaker Identification

Author: Fernández-Baillo Gallego de la Sacristana Roberto
Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Cristina
Nieto Lluis Victor
Rodellar Biarge M. Victoria
Álvarez Marquina Agustin
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2008

Classical parameterization techniques for Speaker Identification encode the power spectral density of raw speech without discriminating between articulatory features produced by vocal tract dynamics (acoustic-phonetics) and glottal source biometry. This paper presents a study on separating voiced fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function, estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. The separation methodology is based on Joint Process Estimation under the hypothesis that the vocal and glottal spectral distributions are uncorrelated. Its application to voiced speech is presented in the time and frequency domains. The parameterization methodology is also described. Speaker Identification experiments conducted on 245 speakers compare different parameterization strategies. The results confirm that decoupled parameterization performs better than approaches based on plain speech parameterization.

A quantitative assessment of group delay methods for identifying glottal closures in voiced speech

Author: Brookes M
Gudnason J
Naylor PA
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Published version

CiteSeerX

Spiral - Imperial College Digital Repository

Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

Author: Hariharan Muthusamy
Kemal Polat
Sazali Yaacob
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Recently, researchers have paid increasing attention to studying the emotional state of an individual from his/her speech signals, as speech is the fastest and most natural method of communication between individuals. In this work, a new feature enhancement method using a Gaussian mixture model (GMM) is proposed to enhance the discriminatory power of the features extracted from speech and glottal signals. Three different emotional speech databases were used to evaluate the proposed methods. Extreme learning machine (ELM) and k-nearest neighbor (kNN) classifiers were employed to classify the different types of emotions. Several experiments were conducted, and the results show that the proposed methods significantly improved speech emotion recognition performance compared with research works published in the literature.

Directory of Open Access Journals

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository
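
The contrast drawn in the abstract above, between frames of predefined length and frames derived from glottal closure instants, can be illustrated with a minimal sketch. It assumes the GCIs are already known as sample indices; it does not implement SIGMA or YAGA. The function name glottal_synchronous_frames, the periods parameter, and the synthetic impulse-train signal are all hypothetical choices made for illustration.

```python
def glottal_synchronous_frames(signal, gcis, periods=2):
    """Cut `signal` into frames that each start at one GCI and end
    `periods` GCIs later (hypothetical helper, not from the thesis)."""
    frames = []
    for i in range(len(gcis) - periods):
        start, end = gcis[i], gcis[i + periods]
        frames.append(signal[start:end])
    return frames


# Synthetic stand-in for voiced speech: an impulse every 80 samples.
# The GCI positions are assumed known here; detecting them is the hard part.
period = 80
signal = [0.0] * 800
gcis = list(range(0, 800, period))  # 0, 80, ..., 720
for g in gcis:
    signal[g] = 1.0

frames = glottal_synchronous_frames(signal, gcis, periods=2)

# Every frame spans exactly two pitch periods and begins on a closure instant.
print(len(frames), {len(f) for f in frames})
```

Because the frame boundaries follow the GCIs, the frame length adapts to the local pitch period instead of being fixed in advance.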

The quantal larynx: The stable regions of laryngeal biomechanics and implications for speech production

Author: Gick B.
Moisik S.
Publication venue: American Speech Language Hearing Association
Publication date: 01/01/2017

Purpose: Recent proposals suggest that (a) the high dimensionality of speech motor control may be reduced via modular neuromuscular organization that takes advantage of intrinsic biomechanical regions of stability and (b) computational modeling provides a means to study whether and how such modularization works. In this study, the focus is on the larynx, a structure that is fundamental to speech production because of its role in phonation and numerous articulatory functions. Method: A 3-dimensional model of the larynx was created using the ArtiSynth platform (http://www.artisynth.org). This model was used to simulate laryngeal articulatory states, including inspiration, glottal fricative, modal prephonation, plain glottal stop, vocal–ventricular stop, and aryepiglotto–epiglottal stop and fricative. Results: Speech-relevant laryngeal biomechanics is rich with “quantal” or highly stable regions within muscle activation space. Conclusions: Quantal laryngeal biomechanics complement a modular view of speech control and have implications for the articulatory–biomechanical grounding of numerous phonetic and phonological phenomena.