Factorization of Discriminatively Trained i-vector Extractor for Speaker Recognition
In this work, we continue our research on the i-vector extractor for speaker
verification (SV) and optimize its architecture for fast and effective
discriminative training. We were motivated by the computational and memory
requirements caused by the large number of parameters of the original
generative i-vector model. Our aim is to preserve the power of the original
generative model and, at the same time, focus the model on the extraction of
speaker-related information. We show that it is possible to represent a
standard generative i-vector extractor by a model with significantly fewer
parameters and obtain similar performance on SV tasks. We can further refine
this compact model by discriminative training and obtain i-vectors that lead to
better performance on various SV benchmarks representing different acoustic
domains.
Comment: Submitted to Interspeech 2019, Graz, Austria. arXiv admin note:
substantial text overlap with arXiv:1810.1318
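For context, the standard generative extractor that the compact model is meant to approximate can be sketched as follows. This is the textbook total-variability formulation (supervector M = m + Tw with a standard normal prior on w), not the factorized or discriminatively trained model of this work; the function name and argument layout are our own. It also shows where the parameter burden comes from: with, say, 2048 UBM components, 60-dimensional features and 400-dimensional i-vectors (illustrative sizes), the matrix T alone holds 2048 × 60 × 400 = 49,152,000 parameters.

```python
import numpy as np


def extract_ivector(N, F, T, Sigma):
    """Posterior mean of w in the generative model M = m + T w.

    N:     zeroth-order Baum-Welch statistics, shape (C,)
    F:     centred first-order statistics stacked into a supervector, shape (C*D,)
    T:     total variability matrix, shape (C*D, R)
    Sigma: diagonal UBM covariances stacked into a supervector, shape (C*D,)
    """
    C = N.shape[0]
    D = T.shape[0] // C
    R = T.shape[1]
    N_sv = np.repeat(N, D)          # occupation count for every supervector row
    T_scaled = T / Sigma[:, None]   # Sigma^{-1} T (Sigma is diagonal)
    # Posterior precision: L = I + T^T Sigma^{-1} N T
    L = np.eye(R) + T.T @ (T_scaled * N_sv[:, None])
    # Posterior mean: w = L^{-1} T^T Sigma^{-1} F
    return np.linalg.solve(L, T_scaled.T @ F)
```

Solving the linear system with np.linalg.solve avoids forming an explicit inverse of the R × R precision matrix.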
GENDER INDEPENDENT DISCRIMINATIVE SPEAKER RECOGNITION IN I-VECTOR SPACE
Speaker recognition systems attain their best accuracy when
trained with gender-dependent features and tested with known-gender
trials. In real applications, however, gender labels are
often not given. In this work we illustrate the design of a system
that uses gender labels in neither training nor test, i.e. a completely
Gender Independent (GI) system. It relies on discriminative training,
where the trials are i-vector pairs and the discrimination is between
the hypothesis that the pair of feature vectors in the trial belongs to
the same speaker and the hypothesis that it belongs to different speakers.
We demonstrate that this pairwise discriminative training can be interpreted as
a procedure that estimates the parameters of the best (second-order)
approximation of the log-likelihood ratio score function,
and that a pairwise SVM can be used for training a gender
independent system. Our results show that a pairwise GI
SVM, which saves memory and execution time, achieves state-of-the-art
performance on the most recent NIST evaluations, comparable
to that of a Gender Dependent (GD) system.
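The second-order score that pairwise training estimates can be illustrated with a minimal sketch. Assuming the symmetric quadratic form s(x1, x2) = x1ᵀΛx2 + x2ᵀΛx1 + x1ᵀΓx1 + x2ᵀΓx2 + cᵀ(x1 + x2) + k (the symbols Λ, Γ, c, k and all function names here are our own labels), the score is linear in its parameters, so it equals a dot product between a flattened weight vector and an expanded representation of the i-vector pair. That linearity is what allows a linear (pairwise) SVM trained on same-speaker and different-speaker pairs to estimate those parameters directly.

```python
def quad(a, M, b):
    """Bilinear form a^T M b for list-based vectors and matrices."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def pair_score(x1, x2, Lam, Gam, c, k):
    """Symmetric second-order score of an i-vector pair (hypothetical parametrization)."""
    return (quad(x1, Lam, x2) + quad(x2, Lam, x1)
            + quad(x1, Gam, x1) + quad(x2, Gam, x2)
            + sum(ci * (a + b) for ci, a, b in zip(c, x1, x2))
            + k)


def pair_features(x1, x2):
    """Expand a pair so that pair_score is a dot product with flatten_params(...)."""
    d = len(x1)
    cross = [x1[i] * x2[j] + x2[i] * x1[j] for i in range(d) for j in range(d)]
    same = [x1[i] * x1[j] + x2[i] * x2[j] for i in range(d) for j in range(d)]
    return cross + same + [a + b for a, b in zip(x1, x2)] + [1.0]


def flatten_params(Lam, Gam, c, k):
    """Stack [vec(Lam), vec(Gam), c, k] into the weight vector a linear SVM would learn."""
    return [v for row in Lam for v in row] + [v for row in Gam for v in row] + list(c) + [k]
```

Because pair_features is symmetric in x1 and x2, the score does not depend on the order of the i-vectors in a trial. The expanded vector has 2d² + d + 1 entries for d-dimensional i-vectors; since both quadratic blocks are symmetric in (i, j), keeping only their upper triangles would roughly halve the quadratic part.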
Phonotactic language recognition using i-vectors and phoneme posteriogram counts
This paper describes a novel approach to phonotactic LID in which, instead of using soft-counts based on phoneme lattices, we use a phoneme posteriogram to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional vectors, for which we adopt the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set and gave better results than a system based on soft-counts (Cavg on 30 s: 3.15% vs 3.43%), and very good results when fused with an acoustic i-vector LID system (Cavg on 30 s: 2.4% for the acoustic system alone vs 1.25% for the fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems caused by sparse counts, and avoids the need for pruning when creating the lattices.
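A rough sketch of the two stages, under stated simplifications: bigram counts are taken as expected counts over adjacent frames of the posteriogram, assuming frame independence (an illustrative choice, not necessarily the counting scheme of the paper), and the whole count vector is modeled by a single subspace multinomial model, φ = softmax(m + Tw), whose MAP point estimate of w under a standard normal prior serves as the i-vector. All function names, the learning rate and the step count are hypothetical.

```python
import math


def bigram_counts(posteriogram):
    """Expected bigram counts from a phoneme posteriogram.

    posteriogram: list of frames, each a list of phoneme posteriors summing to 1.
    Assumption (for illustration): adjacent frames are treated as independent,
    so the expected count of bigram (a, b) is sum_t P_t(a) * P_{t+1}(b).
    """
    K = len(posteriogram[0])
    counts = [[0.0] * K for _ in range(K)]
    for prev, cur in zip(posteriogram, posteriogram[1:]):
        for a in range(K):
            for b in range(K):
                counts[a][b] += prev[a] * cur[b]
    return [v for row in counts for v in row]  # flattened count vector


def smm_probs(m, T, w):
    """Subspace multinomial model: phi = softmax(m + T w)."""
    logits = [m_e + sum(t * w_r for t, w_r in zip(T_e, w)) for m_e, T_e in zip(m, T)]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(logit - top) for logit in logits]
    z = sum(exps)
    return [e / z for e in exps]


def smm_log_likelihood(counts, m, T, w):
    """Multinomial log-likelihood of the counts, up to a constant."""
    return sum(c * math.log(p) for c, p in zip(counts, smm_probs(m, T, w)))


def estimate_ivector(counts, m, T, dim, steps=200, lr=0.01):
    """MAP estimate of w (standard normal prior) by plain gradient ascent."""
    w = [0.0] * dim
    total = sum(counts)
    for _ in range(steps):
        phi = smm_probs(m, T, w)
        # d/dw [sum_e c_e log phi_e - 0.5 ||w||^2] = sum_e (c_e - total*phi_e) T_e - w
        grad = [
            sum((c - total * p) * T_e[r] for c, p, T_e in zip(counts, phi, T)) - w[r]
            for r in range(dim)
        ]
        w = [w_r + lr * g for w_r, g in zip(w, grad)]
    return w
```

Because the objective is concave in w, a small fixed step suffices for this toy version; a practical extractor would use Newton-style updates and train m and T on many utterances.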