End-to-End Residual CNN with L-GM Loss Speaker Verification System
We propose an end-to-end speaker verification system based on a neural
network and trained with a loss function of low computational complexity. The
end-to-end speaker verification system in this paper uses a ResNet
architecture to extract features from an utterance, produces utterance-level
speaker embeddings, and is trained with the large-margin Gaussian Mixture
(L-GM) loss function. Owing to its large margin and likelihood regularization,
the L-GM loss function benefits speaker verification
performance. Experimental results demonstrate that the residual CNN with
L-GM loss outperforms a DNN-based i-vector baseline by
more than 10% in accuracy.

Comment: 5 pages. arXiv admin note: text overlap with arXiv:1803.02988,
arXiv:1705.02304, arXiv:1706.08612 by other authors
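The core idea behind the L-GM loss can be sketched in a few lines: embeddings are modeled as a Gaussian per class, the distance to the true class mean is inflated by a margin before the softmax, and a likelihood term pulls each embedding toward its class mean. The following is a minimal, simplified sketch (identity covariances, single sample); the function name `lgm_loss` and the hyperparameter values are illustrative, not from the paper.

```python
import numpy as np

def lgm_loss(x, mus, y, alpha=0.1, lam=0.1):
    """Simplified L-GM loss sketch for one embedding x.

    x   : (D,) embedding
    mus : (K, D) per-class Gaussian means (identity covariance assumed)
    y   : true class index
    alpha : margin factor, lam : likelihood-regularization weight
    """
    # squared distance to every class mean (Mahalanobis with identity cov.)
    d = 0.5 * np.sum((x - mus) ** 2, axis=1)          # shape (K,)
    d_margin = d.copy()
    d_margin[y] *= (1.0 + alpha)                      # enlarge the true-class
                                                      # distance -> margin
    logits = -d_margin
    # numerically stable softmax cross-entropy on margin-adjusted logits
    logz = np.log(np.sum(np.exp(logits - logits.max()))) + logits.max()
    ce = -(logits[y] - logz)
    # likelihood regularization pulls the embedding toward its class mean
    return ce + lam * d[y]
```

An embedding near its class mean incurs a small loss, while one near a wrong class mean incurs a large one, which is the behavior the margin and likelihood terms are meant to enforce.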
Masked Proxy Loss For Text-Independent Speaker Verification
Open-set speaker recognition can be regarded as a metric learning problem,
i.e., maximizing inter-class variance while minimizing intra-class variance.
Supervised metric learning can be categorized into entity-based learning and
proxy-based learning. Most existing metric learning objectives, such as
Contrastive, Triplet, Prototypical, and GE2E, belong to the former
category; their performance is either highly dependent on the sample mining
strategy or restricted by insufficient label information in the mini-batch.
Proxy-based losses mitigate both shortcomings; however, fine-grained
connections among entities are leveraged only indirectly, if at all. This paper
proposes a Masked Proxy (MP) loss which directly incorporates both proxy-based
relationships and pair-based relationships. We further propose a Multinomial
Masked Proxy (MMP) loss to leverage the hardness of speaker pairs. These
methods are evaluated on the VoxCeleb test set and reach a
state-of-the-art Equal Error Rate (EER).

Comment: Accepted at Interspeech 202
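To make the entity-based vs. proxy-based distinction concrete, the sketch below shows a generic proxy-based softmax loss in the style of Proxy-NCA, where each class is represented by a single learnable proxy vector and an embedding is compared against all proxies rather than against other samples. This is not the paper's MP/MMP loss; the function name `proxy_softmax_loss` and the `scale` parameter are illustrative assumptions.

```python
import numpy as np

def proxy_softmax_loss(x, proxies, y, scale=10.0):
    """Generic proxy-based softmax loss sketch for one embedding.

    x       : (D,) embedding
    proxies : (K, D) one learnable proxy per class
    y       : true class index
    scale   : temperature applied to cosine similarities
    """
    # cosine similarity between the embedding and every class proxy
    x = x / np.linalg.norm(x)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sims = scale * (p @ x)                            # shape (K,)
    # numerically stable softmax cross-entropy over proxies
    logz = np.log(np.sum(np.exp(sims - sims.max()))) + sims.max()
    return -(sims[y] - logz)
```

Because the comparison set is the fixed bank of proxies rather than the other samples in the batch, no pair mining is needed and the loss is insensitive to how many speakers happen to appear in a mini-batch; the trade-off, as the abstract notes, is that fine-grained sample-to-sample relationships are only indirectly captured.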