Factorization of Discriminatively Trained i-vector Extractor for Speaker Recognition
In this work, we continue our research on the i-vector extractor for speaker
verification (SV) and optimize its architecture for fast and effective
discriminative training. We were motivated by computational and memory
requirements caused by the large number of parameters of the original
generative i-vector model. Our aim is to preserve the power of the original
generative model, and at the same time focus the model towards extraction of
speaker-related information. We show that it is possible to represent a
standard generative i-vector extractor by a model with significantly fewer
parameters and obtain similar performance on SV tasks. We can further refine
this compact model by discriminative training and obtain i-vectors that lead to
better performance on various SV benchmarks representing different acoustic
domains.
Comment: Submitted to Interspeech 2019, Graz, Austria. arXiv admin note: substantial text overlap with arXiv:1810.1318
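To make the parameter reduction concrete, a minimal NumPy sketch follows; the dimensions, the bottleneck rank R, and the simple factorization of each per-component block T_c into A_c B are illustrative assumptions, not the paper's exact architecture:

    import numpy as np

    # Hypothetical dimensions (assumptions, not taken from the paper):
    # C Gaussian components, F-dimensional features, D-dimensional
    # i-vectors, and an assumed bottleneck rank R.
    C, F, D, R = 2048, 60, 400, 50

    rng = np.random.default_rng(0)

    # Full generative model: one F x D block T_c per component,
    # i.e. C*F*D parameters in total.
    T_full = rng.standard_normal((C, F, D))

    # Factorized variant: per-component F x R blocks times a shared
    # R x D matrix, cutting the count to C*F*R + R*D parameters.
    A = rng.standard_normal((C, F, R))
    B = rng.standard_normal((R, D))
    T_fact = A @ B  # matmul broadcasts to shape (C, F, D)

    full_params = C * F * D
    fact_params = C * F * R + R * D
    print(f"full: {full_params:,}  factorized: {fact_params:,} "
          f"({fact_params / full_params:.1%} of the original)")

Under these assumed sizes the factorized model keeps roughly an eighth of the original parameters, and the smaller factors can then be refined jointly by discriminative training.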
Employing Emotion Cues to Verify Speakers in Emotional Talking Environments
Usually, people talk neutrally in environments where there are no abnormal
talking conditions such as stress and emotion. Other emotional conditions,
such as happiness, anger, and sadness, can also affect a person's talking
tone. Such emotions are directly affected by the patient's health status. In
neutral talking environments, speakers can be verified easily; in emotional
talking environments, they cannot be verified as easily as in neutral ones.
Consequently, speaker verification systems do not perform as well in
emotional talking environments as they do in neutral ones. In this work,
a two-stage approach has been employed and evaluated to improve speaker
verification performance in emotional talking environments. This approach
employs speaker emotion cues (text-independent and emotion-dependent speaker
verification problem) based on both Hidden Markov Models (HMMs) and
Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The approach is
composed of two cascaded stages that combine an emotion recognizer and a
speaker recognizer into a single recognizer. The architecture has
been tested on two different and separate emotional speech databases: our
collected database and Emotional Prosody Speech and Transcripts database. The
results of this work show that the proposed approach gives promising results
with a significant improvement over previous studies and other approaches such
as the emotion-independent speaker verification approach and the
emotion-dependent speaker verification approach based completely on HMMs.
Comment: Journal of Intelligent Systems, Special Issue on Intelligent Healthcare Systems, De Gruyter, 201
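The two-stage cascade can be summarized in a few lines of Python; the per-emotion model dictionaries and the score() method returning a likelihood are assumed interfaces for illustration, not the paper's HMM/SPHMM implementation:

    def verify_speaker(features, emotion_models, speaker_models, threshold):
        """Two-stage verification: recognize the emotion first, then
        verify the claimed speaker with the model for that emotion."""
        # Stage 1: emotion recognition -- pick the emotion whose model
        # scores the utterance highest.
        emotion = max(emotion_models,
                      key=lambda e: emotion_models[e].score(features))
        # Stage 2: score the utterance against the claimed speaker's
        # emotion-dependent model and accept or reject by threshold.
        score = speaker_models[emotion].score(features)
        return score >= threshold, emotion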