Probing the Information Encoded in X-vectors

Authors: Khudanpur, Sanjeev; Povey, Daniel; Raj, Desh; Snyder, David
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/09/2019

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.

Comment: Accepted at IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 201

arXiv.org e-Print Archive

Crossref

Domain-Dependent Speaker Diarization for the Third DIHARD Challenge

Authors: Kumar, A. Kishore; Saha, Goutam; Sahidullah, Md; Waldekar, Shefali
Publication venue: HAL CCSD
Publication date: 23/01/2021

This report presents the system developed by the ABSP Laboratory team for the third DIHARD speech diarization challenge. Our main contribution in this work is a simple and efficient solution for acoustic domain dependent speech diarization. We explore speaker embeddings for the acoustic domain identification (ADI) task. Our study reveals that the i-vector based method achieves considerably better performance than the x-vector based approach on the third DIHARD challenge dataset. Next, we integrate the ADI module with the diarization framework. Performance improved substantially over the baseline when we optimized, for each acoustic domain individually, the thresholds for agglomerative hierarchical clustering and the parameters for dimensionality reduction during scoring. We achieved relative improvements of 9.63% and 10.64% in DER for the core and full conditions, respectively, for Track 1 of the DIHARD III evaluation set.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes