Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010
This paper presents a novel feature set for speaker recognition based on glottal estimate information. An iterative algorithm is used to derive vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the feature set was evaluated on the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. The ALIZE [1] open-source software was used to create the GMM models for both background and target speakers. Compared with using mel-frequency cepstrum coefficients (MFCC), the misclassification rate on NIST SRE 2010 fell from 29.43% to 27.15% when glottal source features were used
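The size of the reported improvement can be checked with a few lines of arithmetic. The sketch below uses only the two rates quoted in the abstract; the function name `relative_reduction` is an illustrative assumption, not something taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the reduction of `improved` relative to `baseline` (0..1)."""
    return (baseline - improved) / baseline


mfcc_error = 29.43     # misclassification rate (%) with MFCC, as quoted
glottal_error = 27.15  # misclassification rate (%) with glottal features, as quoted

# Absolute gain is in percentage points; relative gain is in percent.
absolute_gain = mfcc_error - glottal_error
relative_gain = 100 * relative_reduction(mfcc_error, glottal_error)

print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.2f}%")
```

In other words, the quoted figures correspond to a gain of about 2.3 percentage points, or a relative reduction of just under 8%.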
Bio-inspired broad-class phonetic labelling
Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM). This paper describes a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing. The methodology is based on the automatic detection of formants and formant trajectories after a careful separation of the vocal and glottal components of speech, and on the operation of CF (Characteristic Frequency) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus. Examples of phonetic class labeling are given and the applicability of the method to Speech Processing is discussed
Relevance of the glottal pulse and the vocal tract in gender detection
Gender detection is an important step for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used on a gender-balanced database of running speech from 340 speakers
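The cross-validation protocol mentioned above can be illustrated with a minimal index splitter. This is a sketch only: the abstract does not state the number of folds, so `k = 5` is an assumption, and the helper `k_fold_indices` is a hypothetical name. Only the speaker count (340) comes from the abstract.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 340 speakers as in the abstract; k = 5 is an assumed value.
folds = list(k_fold_indices(340, 5))
for train, test in folds:
    print(len(train), len(test))
```

Each speaker appears in exactly one test fold. For a real experiment the indices would normally be shuffled first and the folds stratified so that each keeps the gender balance of the full database.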
Glottal-Source Spectral Biometry for Voice Characterization
The biometric signature derived from the estimation of the power spectral density singularities of a speaker's glottal source is described in the present work. It consists of the collection of peak-trough profiles found in the spectral density, as related to the biomechanics of the vocal folds. Samples of parameter estimations from a set of 100 normophonic (pathology-free) speakers are produced. Mapping the set of speakers' samples to a manifold defined by Principal Component Analysis and clustering them by k-means in terms of the most relevant principal components shows the separation of speakers by gender. This means that the proposed signature conveys relevant speaker metainformation, which may be useful in security and forensic applications for which contextual side information is considered relevant
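The PCA-plus-k-means pipeline described above can be sketched in plain Python. Everything here is illustrative: the data are synthetic (two made-up groups of 50 points with 3 features), the first principal component is found by power iteration, and the clustering is a two-cluster Lloyd iteration on the projected scores. None of the names or values come from the paper.

```python
import random


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction of largest variance) by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Project centered rows onto a unit direction, giving one score per row."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


def two_means_1d(values, iterations=50):
    """Cluster scalars into two groups with Lloyd's algorithm; return labels."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for c in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Synthetic stand-in data: two well-separated groups (values are made up).
random.seed(0)
group_a = [[random.gauss(0.0, 0.3) for _ in range(3)] for _ in range(50)]
group_b = [[random.gauss(3.0, 0.3) for _ in range(3)] for _ in range(50)]
data = group_a + group_b

mean, pc1 = first_principal_component(data)
scores = project(data, mean, pc1)
labels = two_means_1d(scores)
```

With groups this far apart, the two clusters recover the two synthetic groups. Real glottal signatures would overlap more, and several principal components would typically be retained rather than one.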
Bio-inspired Dynamic Formant Tracking for Phonetic Labelling
Phonetic labeling is known to help current Automatic Speech Recognition (ASR) when combined with classical parsing systems such as HMMs, because it reduces the search space. This paper describes a method for Phonetic Broad-Class Labeling (PCL) based on speech perception in the high auditory centers. The methodology is based on the operation of CF (Characteristic Frequency) and FM (Frequency Modulation) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus for the automatic detection of formants and formant dynamics in speech. Results obtained in formant detection and dynamic formant tracking are given and the applicability of the method to Speech Processing is discussed
A Hybrid Parameterization Technique for Speaker Identification
Classical parameterization techniques for Speaker Identification encode the power spectral density of raw speech, without discriminating between articulatory features produced by vocal tract dynamics (acoustic-phonetics) and glottal source biometry. This paper presents a study to separate voiced fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function, estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. The separation methodology is based on Joint Process Estimation under the hypothesis that vocal and glottal spectral distributions are uncorrelated. Its application to voiced speech is presented in the time and frequency domains. The parameterization methodology is also described. Speaker Identification experiments conducted on 245 speakers are shown, comparing different parameterization strategies. The results confirm the better performance of decoupled parameterization compared with approaches based on plain speech parameterization
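The decorrelation idea behind joint-process estimation can be illustrated with a much simpler adaptive filter. The sketch below uses a normalized LMS (NLMS) transversal filter as a stand-in; the abstract does not specify the adaptive algorithm, and the signals, tap count, step size `mu` and function name are all assumptions made for illustration. The filter predicts a primary signal from a reference signal, and the residual is the part that is uncorrelated with the reference.

```python
import random


def nlms_decorrelate(reference, primary, taps=2, mu=0.5, eps=1e-8):
    """Adaptively predict `primary` from `reference`; return (weights, residual).

    The residual is what remains of `primary` after the component that is
    linearly predictable from `reference` has been removed.
    """
    w = [0.0] * taps
    residual = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - y
        power = sum(xk * xk for xk in x) + eps
        # Normalized LMS weight update.
        w = [wk + mu * e * xk / power for wk, xk in zip(w, x)]
        residual.append(e)
    return w, residual


random.seed(1)
reference = [random.gauss(0.0, 1.0) for _ in range(3000)]
# Toy primary signal: a filtered copy of the reference (coefficients made up).
primary = [0.5 * reference[n] - 0.3 * (reference[n - 1] if n else 0.0)
           for n in range(len(reference))]

weights, residual = nlms_decorrelate(reference, primary)
print([round(wk, 3) for wk in weights])
```

In this noiseless toy case the weights converge to the coefficients used to build `primary` and the residual decays towards zero. In the setting described in the abstract, the residual would instead carry the component that is uncorrelated with the reference.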
Glottal Parameter Estimation by Wavelet Transform for Voice Biometry
Voice biometry is classically based mainly on the parameterization and patterning of speech features. The present approach is based instead on the characterization of phonation features (glottal features). The intention is to reduce the intra-speaker variability due to the 'text'. The study of larynx biomechanics shows that the glottal correlates constitute a family of second-order Gaussian wavelets. The methodology relies on the extraction of glottal correlates (the glottal source), which are parameterized using wavelet techniques. Classification and pattern matching were carried out using Gaussian Mixture Models. Data from speakers in a balanced database and from NIST SRE HASR2 were used in verification experiments. Preliminary results are given and discussed
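For readers unfamiliar with the wavelet family named above, the following sketch evaluates the common textbook form of the second-order Gaussian wavelet (the negated, normalized second derivative of a Gaussian, also known as the Ricker or "Mexican hat" wavelet). The abstract does not give the exact form or scaling used in the paper, so the normalization, the `sigma` parameter and the sampling grid are assumptions.

```python
import math


def gaussian_wavelet_2nd(t: float, sigma: float = 1.0) -> float:
    """Negated, unit-energy second derivative of a Gaussian (Ricker wavelet)."""
    a = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    u = (t / sigma) ** 2
    return a * (1.0 - u) * math.exp(-u / 2.0)


# Sample the wavelet on a symmetric grid (limits and step are arbitrary).
step = 0.01
grid = [i * step for i in range(-800, 801)]
samples = [gaussian_wavelet_2nd(t) for t in grid]

# Numerical checks of two standard wavelet properties.
area = sum(samples) * step                    # close to zero (zero mean)
energy = sum(s * s for s in samples) * step   # close to one (unit energy)
print(f"area={area:.2e}, energy={energy:.4f}")
```

The wavelet is symmetric, peaks at `t = 0`, and crosses zero at `t = ±sigma`; changing `sigma` stretches or compresses it, which is how a family of such wavelets is obtained.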
Monitoring Neurological disease in Phonation
It is well known that many neurological diseases leave a fingerprint in voice and speech production. The dramatic impact of these pathologies on quality of life is a growing concern. Many techniques have been designed for the detection, diagnosis and monitoring of neurological disease. Most of them are costly or difficult to extend to primary services. The present paper shows that some neurological diseases can be traced at the level of voice production. The detection procedure would be based on a simple voice test. The availability of advanced tools and methodologies to monitor the organic pathology of voice would facilitate the deployment of these tests. The paper hypothesizes some of the underlying mechanisms affecting the production of voice and presents a general description of the methodological foundations of a voice analysis system that can estimate correlates of neurological disease. A case study of spasmodic dysphonia is presented to illustrate the potential of the methodology to monitor other neurological problems as well
Effectiveness of the Mantente REAL program for preventing alcohol use in Spanish adolescents
Mantente REAL is a school-based universal program to prevent drug use and other problematic behaviors specifically
designed to be implemented in schools at the beginning of adolescence. This program, which is a culturally adapted
version of the Keepin' it REAL intervention, focuses on skills training for resisting social pressure to use drugs and
improving psychosocial development. This study aims to evaluate the effectiveness of Mantente REAL on alcohol use in
the Spanish context. The sample was composed of 755 adolescents from 12 state secondary schools in Spain, aged 11 to
15 (M = 12.24, SD = 0.56), 47.1% females. The 12 schools were randomly assigned to control and experimental groups, six
to each condition. Pre-test and post-test questionnaires data were collected to evaluate the effectiveness of the program.
The results indicated that a culturally adapted version of Mantente REAL was effective in preventing alcohol use among
youth from northern and southern Spain. Students participating in the program demonstrated changes in the desired
direction on alcohol frequency and intoxication episodes. Implications of these results regarding intervention programs
aimed at preventing substance use in adolescence are discussed.
This study was funded by the Global Center for Applied Health Research (Arizona State University) and supported by the Programa de Axudas á etapa posdoutoral da Xunta de Galicia (Consellería de Cultura, Educación e Ordenación Universitaria) and by FEDER/Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación (Grant PSI2015-65766-R), under the Axuda para a consolidación e estruturación de unidades de investigación competitivas e outras accións de fomento nas universidades do SUG (GRC, 2018)
Decoupling Vocal Tract from Glottal Source Estimates in Speaker's Identification
Classical parameterization techniques in Speaker Identification tasks encode the power spectral density of speech as a whole, without discriminating between articulatory features due to the dynamics of the vocal tract (acoustic-phonetics) and those contributed by the glottal source. This paper presents a study to separate voiced fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function, estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. In this way, the information conveyed by the two components, which depends to different degrees on message and biometry, is estimated and treated separately before being fused at the time of template composition. The methodology to separate the two components is based on the hypothesis that vocal and glottal information are decorrelated, and it is carried out using Joint Process Estimation. This methodology is briefly discussed, and its application to vowel-like speech is presented as an example to show the resulting estimates in both the time and the frequency domain. The parameterization methodology used to produce representative templates of the glottal and vocal components is also described. Speaker Identification experiments conducted on a wide database of 240 speakers are also reported, with comparative scores obtained using different parameterization strategies. The results confirm the better performance of decoupled parameterization techniques compared with approaches based on full speech parameterization