Cepstral trajectories in linguistic units for text-independent speaker recognition
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-35292-8_3. Proceedings of IberSPEECH, held in Madrid (Spain) in 2012. In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of the temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that brings together the forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool for extracting individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio per linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units to obtain a single, highly discriminative system. This approach has been tested with the NIST SRE 2006 datasets and protocols, consisting of 9,720 trials from 219 male speakers for the 1side-1side English-only task, with development data extracted from 367 male speakers in 1,808 conversations from the NIST SRE 2004 and 2005 datasets. Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.
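The coarse-grained combination described above can be sketched as follows. Under a (naive) independence assumption across linguistic units, calibrated per-unit log-likelihood ratios fuse by simple addition; the unit labels and values below are hypothetical, for illustration only.

```python
def combine_unit_llrs(unit_llrs):
    """Fuse calibrated per-unit log-likelihood ratios into a single
    score. Assumes the units contribute independent evidence, so the
    combined log-LR is the sum of the per-unit log-LRs."""
    return sum(unit_llrs)

# Hypothetical calibrated log10-LRs for three linguistic units
llrs = {"aa": 0.8, "iy": -0.2, "n": 0.5}
combined = combine_unit_llrs(llrs.values())  # single-system score
```

In practice each per-unit score would first be calibrated (e.g. by logistic regression on development data) so that the summed log-LRs remain interpretable as evidence strength.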
Unimodal late fusion for NIST i-vector challenge on speaker detection
Speaker detection is a challenging machine learning task for which the latest i-vector challenge was coordinated by the National Institute of Standards and Technology (NIST). A simple late-fusion approach for the speaker detection task on the i-vector challenge is presented. The approach is based on the late fusion of scores from the cosine distance method (the baseline) and the scores obtained from linear discriminant analysis. The results show that, by adopting this simple late-fusion approach, the framework can outperform the baseline score for the decision cost function on the NIST i-vector machine learning challenge.
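A minimal sketch of the kind of late fusion described above: each score stream is normalized, then combined by a weighted sum. The fusion weight `w` is a hypothetical parameter that would be tuned on development data; this is an illustration of the general technique, not the paper's exact recipe.

```python
import numpy as np

def late_fuse(cosine_scores, lda_scores, w=0.5):
    """Late (score-level) fusion of two detectors: z-normalize each
    score stream, then take a weighted sum. w=0.5 gives an
    equal-weight fusion of the cosine-distance and LDA scores."""
    def znorm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.mean()) / s.std()
    return w * znorm(cosine_scores) + (1.0 - w) * znorm(lda_scores)

fused = late_fuse([0.1, 0.9, 0.5], [0.2, 0.8, 0.5])
```

Because fusion happens at the score level, the two systems can be developed and calibrated independently, which is what makes this approach "late" and simple to deploy.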
Improved i-Vector Representation for Speaker Diarization
This paper proposes using a previously well-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM) with a DNN that has been trained on a different large-scale dataset. To train the T-matrix we use a supervised UBM obtained from the DNN, using filterbank input features to calculate the posterior information and then MFCC features to train the UBM, instead of a traditional unsupervised UBM derived from a single feature set. Next we jointly use DNN posteriors and MFCC features to calculate the zeroth- and first-order Baum-Welch statistics for training an extractor from which we obtain the i-vector. The system is shown to achieve a significant improvement on the NIST 2008 speaker recognition evaluation (SRE) telephone data task compared to state-of-the-art approaches.
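The zeroth- and first-order Baum-Welch statistics mentioned above have a standard form, sketched below with DNN frame posteriors standing in for GMM component occupancies. This is the generic computation used for i-vector extraction, not the paper's exact pipeline, and all array shapes are illustrative.

```python
import numpy as np

def baum_welch_stats(posteriors, features, means):
    """Zeroth- and centered first-order Baum-Welch statistics.

    posteriors: (T, C) per-frame component posteriors (e.g. from a DNN)
    features:   (T, D) MFCC frames
    means:      (C, D) UBM component means
    Returns N (C,) and F (C, D), the sufficient statistics consumed
    by a T-matrix (i-vector) extractor.
    """
    N = posteriors.sum(axis=0)                        # zeroth order
    F = posteriors.T @ features - N[:, None] * means  # centered first order
    return N, F
```

Centering the first-order statistics around the UBM means is what lets the i-vector extractor model only the per-utterance shift in the supervector space.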
Sesquiterpenes from aerial parts of Ferula vesceritensis
From the dichloromethane extract of aerial parts of Ferula vesceritensis (Apiaceae), 11 sesquiterpene derivatives were isolated. Among them, five were new compounds, designated as 10-hydroxylancerodiol-6-anisate, 2,10-diacetyl-8-hydroxyferutriol-6-anisate, 10-hydroxylancerodiol-6-benzoate, vesceritenone and epoxy-vesceritenol. The six known compounds were identified as feselol, farnesiferol A, lapidol, 2-acetyl-jaeschkeanadiol-6-anisate, lasidiol-10-anisate and 10-oxo-jaesckeanadiol-6-anisate. All the structures were determined by extensive spectroscopic studies, including 1D and 2D NMR experiments and mass spectrometry. Two of the compounds, the sesquiterpene coumarins farnesiferol A and feselol, bound to the model recombinant nucleotide-binding site of an MDR-like efflux pump from the enteropathogenic protozoan Cryptosporidium parvum.
Speaker recognition with hybrid features from a deep belief network
Learning representations from audio data has shown advantages over handcrafted features such as mel-frequency cepstral coefficients (MFCCs) in many audio applications. In most representation learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach that combines the learned features and the MFCC features for the speaker recognition task and can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of a deep belief network (DBN) for quantizing the audio data into vectors of audio word counts. These vectors represent audio scripts of different lengths in a form that is easier to use for training a classifier. We show experimentally that the audio word count vectors generated from a mixture of DBN features at different layers give better performance than the MFCC features. We can also achieve further improvement by combining the audio word count vector and the MFCC features.
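The audio-word-count idea above can be sketched as a bag-of-audio-words: frame-level features of any length are vector-quantized against a codebook, and the codeword histogram gives a fixed-length representation. The codebook below is hypothetical; in the paper the "words" are derived from DBN-layer features rather than raw centroids.

```python
import numpy as np

def audio_word_counts(frames, codebook):
    """Quantize a variable-length sequence of frame features into a
    fixed-length count vector. frames: (T, D); codebook: (K, D)
    centroids. Returns a length-K histogram, so audio scripts of any
    duration map to the same dimensionality."""
    # squared distance from every frame to every codeword
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)            # nearest codeword per frame
    return np.bincount(idx, minlength=len(codebook))
```

The fixed dimensionality is the point: a standard classifier can then be trained on utterances of arbitrary length without padding or truncation.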
NeuroSpeech
NeuroSpeech is a software package for modeling pathological speech signals along different speech dimensions: phonation, articulation, prosody, and intelligibility. Although it was developed to model dysarthric speech signals from Parkinson's patients, its structure allows other computer scientists or developers to include other pathologies and/or measures. Different tasks can be performed: (1) modeling of the signals along the aforementioned speech dimensions, (2) automatic discrimination of Parkinson's vs. non-Parkinson's speech, and (3) prediction of the neurological state according to the Unified Parkinson's Disease Rating Scale (UPDRS) score. Prediction of the dysarthria level according to the Frenchay Dysarthria Assessment scale is also provided.
Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917. Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (around 3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments to model unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), showing that an accuracy of over 50% can be achieved with as little as 0.5 s. This work has been supported by project CMC-V2: Caracterización, Modelado y Compensación de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economía y Competitividad, Spain.
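For reference, a single step of the standard LSTM cell — the kind of recurrent unit such a frame-by-frame LID system is built from — can be written in a few lines. This is a generic textbook formulation with hypothetical shapes, not the paper's implementation.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM time step.

    x: (D,) input frame; h, c: (H,) previous hidden/cell state.
    W: (4H, D), U: (4H, H), b: (4H,) stacked gate parameters,
    ordered [input, forget, output, candidate].
    Returns the new hidden and cell states.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    o = sigmoid(z[2 * H:3 * H]) # output gate
    g = np.tanh(z[3 * H:])      # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Because the hidden state is updated frame by frame, such a cell can emit a language decision at any point in the utterance, which is why LSTMs handle very short test segments gracefully.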
An investigation of supervector regression for forensic voice comparison on small data
The present paper deals with an observer design for a nonlinear lateral vehicle model. The nonlinear model is represented by an exact Takagi-Sugeno (TS) model via the sector nonlinearity transformation. A proportional multiple integral observer (PMIO) based on the TS model is designed to estimate simultaneously the state vector and the unknown input (road curvature). The convergence conditions of the estimation error are expressed as an LMI formulation using Lyapunov theory, which guarantees a bounded error. Simulations are carried out and experimental results are provided to illustrate the proposed observer.