End-to-End Residual CNN with L-GM Loss Speaker Verification System
We propose an end-to-end speaker verification system based on a neural
network and trained with a loss function of low computational complexity. The
end-to-end speaker verification system in this paper uses a ResNet
architecture to extract features from an utterance, produces utterance-level
speaker embeddings, and is trained with the large-margin Gaussian Mixture
(L-GM) loss function. Owing to its large margin and likelihood regularization,
the L-GM loss function benefits speaker verification
performance. Experimental results demonstrate that the residual CNN with
L-GM loss outperforms a DNN-based i-vector baseline by
more than 10% in accuracy.

Comment: 5 pages. arXiv admin note: text overlap with arXiv:1803.02988,
arXiv:1705.02304, arXiv:1706.08612 by other authors
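The core idea behind the L-GM loss can be sketched in a few lines: embeddings are modeled as a Gaussian per class, the distance to the true class mean is inflated by a margin before the softmax, and a likelihood term pulls each embedding toward its class mean. The following is a minimal, simplified sketch (identity covariances, single sample); the function name `lgm_loss` and the hyperparameter values are illustrative, not from the paper.

```python
import numpy as np

def lgm_loss(x, mus, y, alpha=0.1, lam=0.1):
    """Simplified L-GM loss sketch for one embedding x.

    x   : (D,) embedding
    mus : (K, D) per-class Gaussian means (identity covariance assumed)
    y   : true class index
    alpha : margin factor, lam : likelihood-regularization weight
    """
    # squared distance to every class mean (Mahalanobis with identity cov.)
    d = 0.5 * np.sum((x - mus) ** 2, axis=1)          # shape (K,)
    d_margin = d.copy()
    d_margin[y] *= (1.0 + alpha)                      # enlarge the true-class
                                                      # distance -> margin
    logits = -d_margin
    # numerically stable softmax cross-entropy on margin-adjusted logits
    logz = np.log(np.sum(np.exp(logits - logits.max()))) + logits.max()
    ce = -(logits[y] - logz)
    # likelihood regularization pulls the embedding toward its class mean
    return ce + lam * d[y]
```

An embedding near its class mean incurs a small loss, while one near a wrong class mean incurs a large one, which is the behavior the margin and likelihood terms are meant to enforce.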
Masked Proxy Loss For Text-Independent Speaker Verification
Open-set speaker recognition can be regarded as a metric learning problem,
i.e., maximizing inter-class variance while minimizing intra-class variance.
Supervised metric learning can be categorized into entity-based learning and
proxy-based learning. Most existing metric learning objectives, such as
Contrastive, Triplet, Prototypical, and GE2E, belong to the former
category; their performance is either highly dependent on the sample mining
strategy or restricted by insufficient label information in the mini-batch.
Proxy-based losses mitigate both shortcomings; however, fine-grained
connections among entities are leveraged only indirectly, if at all. This paper
proposes a Masked Proxy (MP) loss which directly incorporates both proxy-based
relationships and pair-based relationships. We further propose a Multinomial
Masked Proxy (MMP) loss to leverage the hardness of speaker pairs. These
methods are evaluated on the VoxCeleb test set and reach a
state-of-the-art Equal Error Rate (EER).

Comment: Accepted at Interspeech 202
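To make the entity-based vs. proxy-based distinction concrete, the sketch below shows a generic proxy-based softmax loss in the style of Proxy-NCA, where each class is represented by a single learnable proxy vector and an embedding is compared against all proxies rather than against other samples. This is not the paper's MP/MMP loss; the function name `proxy_softmax_loss` and the `scale` parameter are illustrative assumptions.

```python
import numpy as np

def proxy_softmax_loss(x, proxies, y, scale=10.0):
    """Generic proxy-based softmax loss sketch for one embedding.

    x       : (D,) embedding
    proxies : (K, D) one learnable proxy per class
    y       : true class index
    scale   : temperature applied to cosine similarities
    """
    # cosine similarity between the embedding and every class proxy
    x = x / np.linalg.norm(x)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sims = scale * (p @ x)                            # shape (K,)
    # numerically stable softmax cross-entropy over proxies
    logz = np.log(np.sum(np.exp(sims - sims.max()))) + sims.max()
    return -(sims[y] - logz)
```

Because the comparison set is the fixed bank of proxies rather than the other samples in the batch, no pair mining is needed and the loss is insensitive to how many speakers happen to appear in a mini-batch; the trade-off, as the abstract notes, is that fine-grained sample-to-sample relationships are only indirectly captured.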