Search CORE

510 research outputs found

Relevance of the glottal pulse and the vocal tract in gender detection

Author: Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Mulas Cristina
Álvarez Marquina Agustín
Publication venue: E.T.S. de Ingenieros Informáticos (UPM)
Publication date: 01/09/2013

Gender detection is an important step toward improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used in a gender-balanced database of running speech from 340 speakers.
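As a rough illustration of the evaluation protocol named in this abstract, the sketch below runs K-fold cross-validation with a one-dimensional quadratic discriminant (one Gaussian per class, each with its own variance). The synthetic f0 values, the function names and the single-feature setup are assumptions made for this example only; the paper itself works with mel-frequency parameters of the glottal source and vocal tract.

```python
import math
import random


def synthetic_f0(seed=0, n_per_class=200):
    """Hypothetical data: f0 values in Hz drawn from two Gaussians (not real speakers)."""
    rng = random.Random(seed)
    samples = [(rng.gauss(120.0, 20.0), "male") for _ in range(n_per_class)]
    samples += [(rng.gauss(210.0, 25.0), "female") for _ in range(n_per_class)]
    rng.shuffle(samples)
    return samples


def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a list of numbers."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def log_score(x, mu, var, prior):
    """Quadratic discriminant for one class (log posterior up to a constant)."""
    return math.log(prior) - 0.5 * math.log(var) - (x - mu) ** 2 / (2.0 * var)


def kfold_accuracy(samples, k=5):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    folds = [samples[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        models = {}
        for label in {lab for _, lab in train}:
            values = [x for x, lab in train if lab == label]
            mu, var = fit_gaussian(values)
            models[label] = (mu, var, len(values) / len(train))
        for x, lab in test_fold:
            predicted = max(models, key=lambda c: log_score(x, *models[c]))
            correct += predicted == lab
    return correct / len(samples)


accuracy = kfold_accuracy(synthetic_f0(), k=5)
```

Each sample is used for testing exactly once, so the pooled accuracy is an estimate over the whole data set rather than over a single split.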

Archivo Digital UPM

Glottal Parameter Estimation by Wavelet Transform for Voice Biometry

Author: Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Mulas Cristina
Rodellar Biarge M. Victoria
Álvarez Marquina Agustín
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2011

Voice biometry is classically based mainly on the parameterization and patterning of speech features. The present approach is based instead on the characterization of phonation features (glottal features). The intention is to reduce intra-speaker variability due to the 'text'. Through the study of larynx biomechanics it may be seen that the glottal correlates constitute a family of 2nd-order Gaussian wavelets. The methodology relies on the extraction of glottal correlates (the glottal source), which are parameterized using wavelet techniques. Classification and pattern matching were carried out using Gaussian Mixture Models. Data from speakers in a balanced database and from NIST SRE HASR2 were used in verification experiments. Preliminary results are given and discussed.
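To make the wavelet family mentioned in this abstract concrete, here is a minimal sketch that assumes "2nd-order Gaussian wavelet" means the (negated, unnormalised) second derivative of a Gaussian, often called the Mexican hat. The function names, the normalisation and the test pulse are illustrative assumptions, not the authors' parameterization.

```python
import math


def gaussian_wavelet_2nd(t, scale=1.0):
    """Negated second derivative of a Gaussian, unnormalised: (1 - u^2) * exp(-u^2 / 2)."""
    u = t / scale
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def wavelet_response(signal, dt, center, scale=1.0):
    """Inner product of a sampled signal with the wavelet centred at time `center`."""
    return sum(
        value * gaussian_wavelet_2nd(i * dt - center, scale)
        for i, value in enumerate(signal)
    ) * dt


# Toy usage: a Gaussian-shaped pulse sampled on [-8, 8], so the pulse sits at time 8.0
# when time is measured from the first sample.
dt = 0.01
pulse = [math.exp(-0.5 * (-8.0 + i * dt) ** 2) for i in range(1601)]
response = wavelet_response(pulse, dt, center=8.0, scale=1.0)
```

The wavelet integrates to zero, so a constant offset in the signal contributes nothing to the response; only the shape of the pulse does.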

Archivo Digital UPM

Cepstral peak prominence: a comprehensive analysis

Author: Rubén Fraile
Juan Ignacio Godino-Llorente
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

An analytical study of cepstral peak prominence (CPP) is presented, intended to provide insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have a Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether in amplitude, frequency or noise.
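For readers unfamiliar with CPP, the following is a minimal sketch of one common way to compute it: take the real cepstrum of a frame, express it in dB, fit a straight line over the quefrency range of plausible pitch periods, and measure how far the highest cepstral peak rises above that line. The function names, the 60 to 330 Hz search range, the numerical floor and the two-pulse toy frame are assumptions of this example; it is not the parametric model analysed in the paper, and the toy frame does not produce clinically meaningful values.

```python
import cmath
import math


def real_cepstrum(frame):
    """Real cepstrum via a naive DFT (adequate for short frames)."""
    n = len(frame)
    log_mag = []
    for m in range(n):
        s = sum(frame[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        log_mag.append(math.log(max(abs(s), 1e-12)))
    # The log-magnitude spectrum of a real frame is real and even,
    # so the inverse DFT reduces to a cosine sum.
    return [
        sum(log_mag[m] * math.cos(2.0 * math.pi * m * q / n) for m in range(n)) / n
        for q in range(n // 2 + 1)
    ]


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Return (prominence in dB, quefrency of the peak in samples)."""
    ceps_db = [20.0 * math.log10(max(abs(c), 1e-12)) for c in real_cepstrum(frame)]
    q_lo = int(fs / f0_max)
    q_hi = min(int(fs / f0_min), len(ceps_db) - 1)
    qs = list(range(q_lo, q_hi + 1))
    ys = [ceps_db[q] for q in qs]
    # Least-squares regression line of cepstral level against quefrency.
    mean_q = sum(qs) / len(qs)
    mean_y = sum(ys) / len(ys)
    slope = sum((q - mean_q) * (y - mean_y) for q, y in zip(qs, ys)) / sum(
        (q - mean_q) ** 2 for q in qs
    )
    peak_q = max(qs, key=lambda q: ceps_db[q])
    baseline = mean_y + slope * (peak_q - mean_q)
    return ceps_db[peak_q] - baseline, peak_q


# Toy frame: a pulse and a weaker copy 50 samples later (period 50 at fs = 8000, i.e. 160 Hz).
frame = [0.0] * 256
frame[0] = 1.0
frame[50] = 0.5
prominence, peak_quefrency = cepstral_peak_prominence(frame, fs=8000)
```

The peak quefrency divided by the sampling rate gives the estimated pitch period, which is why the location of the first rahmonic and the value of CPP are so closely related.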

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Archivo Digital UPM

Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies. The Workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.) and IEEE Biomedical Engineering Soc. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

Directory of Open Access Books (DOAB)

Analysis and Detection of Pathological Voice using Glottal Source Features

Author: Alku Paavo
Kadiri Sudarsana Reddy
Publication date: 25/09/2023

Automatic detection of voice pathology enables objective assessment and earlier intervention for the diagnosis. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted in three ways: from glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, from approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and from acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture the variations in glottal source spectra of pathological voice. Experiments were carried out using two databases, the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates normal and pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). These experiments showed that the performance achieved with the studied glottal source features is comparable to or better than that of conventional MFCCs and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCCs and PLP features, which indicates the complementary nature of the features.
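The mel-frequency step underlying MFCCs can be sketched as follows. The conversion formula used here (2595 log10(1 + f/700)) is one widely used convention and is an assumption of this example; the abstract does not say which mel formula, filter count or frequency range the authors use, so the names and parameters below are illustrative only.

```python
import math


def hz_to_mel(freq_hz):
    """Hz to mel, using the 2595 * log10(1 + f / 700) convention."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters spaced uniformly on the mel scale."""
    mel_lo = hz_to_mel(f_min_hz)
    mel_hi = hz_to_mel(f_max_hz)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical configuration: 20 filters between 0 Hz and 4 kHz.
centers = mel_filter_centers(20, 0.0, 4000.0)
```

Uniform spacing in mel gives dense filters at low frequencies and sparse ones at high frequencies; an MFCC front end then takes the log energy in each filter and applies a discrete cosine transform.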

arXiv.org e-Print Archive

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

Directory of Open Access Books (DOAB)

Estimating tremor in Vocal Fold Biomechanics for Neurological Disease characterisation

Author: Fernández Fernández Mario
Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Mulas Cristina Elena
Nieto Lluis Victor
Ramírez Calvo Carlos
Rodellar Biarge M. Victoria
Álvarez Marquina Agustín
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Neurological Diseases (ND) are affecting larger segments of the aging population every year. Treatment depends on expensive, accurate and frequent monitoring. It is well known that ND leave correlates in speech and phonation. The present work shows a method to detect alterations in vocal fold tension during phonation. These may appear either as hypertension or as cyclical tremor. Estimates of tremor may be produced by auto-regressive modeling of the vocal fold tension series in sustained phonation. The correlates obtained are a set of cyclicality coefficients, the frequency and the root mean square amplitude of the tremor. Statistical distributions of these correlates obtained from a set of male and female subjects are presented. Results from five study cases of female voice are also given.
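The idea of estimating tremor by auto-regressive modeling can be sketched in a few lines. This example assumes a second-order AR model fitted by ordinary least squares, reads the tremor frequency from the angle of the model's pole, and takes the RMS amplitude from the mean-removed series. The model order, the fitting method, the function names and the synthetic 5 Hz tension series are all assumptions of this sketch; the paper's own cyclicality coefficients may be defined differently.

```python
import cmath
import math


def fit_ar2(series):
    """Least-squares fit of x[n] = a1 * x[n-1] + a2 * x[n-2] on the mean-removed series."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    idx = range(2, len(x))
    r11 = sum(x[n - 1] * x[n - 1] for n in idx)
    r12 = sum(x[n - 1] * x[n - 2] for n in idx)
    r22 = sum(x[n - 2] * x[n - 2] for n in idx)
    b1 = sum(x[n] * x[n - 1] for n in idx)
    b2 = sum(x[n] * x[n - 2] for n in idx)
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2


def tremor_estimate(series, fs):
    """Return (tremor frequency in Hz, RMS amplitude) of a tension series sampled at fs."""
    a1, a2 = fit_ar2(series)
    # Poles are the roots of z^2 - a1 * z - a2 = 0; the pole angle gives the frequency.
    pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0
    freq_hz = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    mean = sum(series) / len(series)
    rms = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return freq_hz, rms


# Synthetic example: unit tension with a 5 Hz, 10 % tremor, sampled at 100 Hz for 2 s.
fs = 100.0
tension = [1.0 + 0.1 * math.sin(2.0 * math.pi * 5.0 * n / fs) for n in range(200)]
tremor_hz, tremor_rms = tremor_estimate(tension, fs)
```

For a noisy or multi-component tremor, a higher model order and a more robust fitting method would normally be needed; the second-order case is used here only because it maps one pole pair to one oscillation.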

Archivo Digital UPM

Speech Communication

Author: Agrawal Shyam S.
Allen Jonathan
Barotti Barbara B.
Bickley Corine A.
Boyce Suzanne E.
Chen Marilyn Y.
Chenausky Karen
Cheng Howard
Cheyne Harold A.
Choi Jeung-Yoon
Chuang Erika S.
Dilley Laura C.
Du Limin
Esposito Anna
Espy-Wilson Carol Y.
Govindarajan Krishna K.
Gow David W.
Guiod Peter C.
Hagen Astrid
Hall Seth M.
Halle Morris
Hanna Emily J.
Hanson Helen M.
Harms Michael P.
Harrell Dameon
Hasegawa-Johnson Mark A.
Hillman Robert E.
Holmberg Eva B.
Horowitz David M.
Huang Caroline B.
Keyser Samuel J.
Knobel Mark D.
Kuo Hong-Kwang J.
Lada Genevieve R.
Lane Harlan L.
LePrell Glenn S.
Makhoul John I.
Manuel Sharon Y.
Matthies Melanie L.
McGowan Richard S.
Perez Adrian D.
Perkell Joseph S.
Perrier Pascal H.
Poort Kelly L.
Prahler Adrienne M.
Qi Yingyong
Shattuck-Hufnagel Stefanie
Slifka Janet L.
Smith Jason L.
Stevens Kenneth N.
Sun Walter
Tanaja Hemant
Turk Alice E.
Vick Jennell C.
Wilde Lorin F.
Wilhelms-Tricarico Reiner
Williams David R.
Wint Arlene E.
Wozniak Jane W.
Zandipour Majid
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains table of contents for Part V, table of contents for Section 1, reports on six research projects and a list of publications. C.J. Lebel Fellowship; Dennis Klatt Memorial Fund; National Institutes of Health Grant R01-DC00075; National Institutes of Health Grant R01-DC01291; National Institutes of Health Grant R01-DC01925; National Institutes of Health Grant R01-DC02125; National Institutes of Health Grant R01-DC02978; National Institutes of Health Grant R01-DC03007; National Institutes of Health Grant R29-DC02525; National Institutes of Health Grant F32-DC00194; National Institutes of Health Grant F32-DC00205; National Institutes of Health Grant T32-DC00038; National Science Foundation Grant IRI 89-05249; National Science Foundation Grant IRI 93-14967; National Science Foundation Grant INT 94-2114

DSpace@MIT

A novel framework for high-quality voice source analysis and synthesis

Author: Turajlic Emir
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2006

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, because humans produce a wide range of voice source realizations and because voice source estimates commonly contain artifacts due to the non-linear, time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remained as natural-sounding and intelligible as natural speech.
In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.

Brunel University Research Archive