I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such a joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary
objective of the current paper is to summarize the results and lessons learned
from the twelve sub-systems and their fusion submitted to SRE'18. It is also our
intention to present a shared view on the advancements, progress, and major
paradigm shifts that we have witnessed as an SRE participant in the past decade,
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representations to deep speaker embeddings, and a shift
in the research challenge from channel compensation to domain adaptation.
Comment: 5 pages
Wespeaker baselines for VoxSRC2023
This report showcases the results achieved using the wespeaker toolkit for
the VoxSRC2023 Challenge. Our aim is to provide participants, especially those
with limited experience, with clear and straightforward guidelines to develop
their initial systems. Through well-structured recipes and strong results, we
hope to offer an accessible and good-enough starting point for all interested
individuals. In this report, we describe the results achieved on the VoxSRC2023
dev set using the pretrained models; results on the evaluation set can be found
on the CodaLab evaluation server.
Multi-Domain Adaptation by Self-Supervised Learning for Speaker Verification
In real-world applications, speaker recognition models often face various
domain-mismatch challenges, leading to a significant drop in performance.
Although numerous domain adaptation techniques have been developed to address
this issue, almost all existing methods focus on a simple configuration where
the model is trained in one domain and deployed in another. However, real-world
environments are often complex and may contain multiple domains, making the
methods designed for one-to-one adaptation suboptimal. In this paper, we propose
a self-supervised learning method to tackle this multi-domain adaptation
problem. Building upon the basic self-supervised adaptation algorithm, we
design three strategies to make it suitable for multi-domain adaptation: an
in-domain negative sampling strategy, a MoCo-like memory bank scheme, and a
CORAL-like distribution alignment. We conducted experiments using VoxCeleb2 as
the source domain dataset and CN-Celeb1 as the target multi-domain dataset. Our
results demonstrate that our method clearly outperforms the basic
self-supervised adaptation method, which simply treats the data of CN-Celeb1 as
a single domain. Importantly, the improvement is consistent in nearly all
in-domain tests and cross-domain tests, demonstrating the effectiveness of our
proposed method.
Comment: submitted to ICASSP 202
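A CORAL-like distribution alignment can be illustrated with the standard Deep CORAL loss, which penalizes the squared Frobenius distance between the feature covariances of two batches, scaled by 1/(4d^2). The sketch below is a minimal pure-Python version and not the authors' implementation. The hypothetical helper `multi_domain_coral`, which averages the loss of each target sub-domain (for example, one CN-Celeb1 genre) against the source batch, is only an assumption about how a CORAL-like term could be extended to several domains.

```python
from typing import List

Matrix = List[List[float]]  # n rows (samples) x d columns (embedding dims)


def covariance(x: Matrix) -> Matrix:
    """Unbiased (n - 1) covariance of an n x d batch of embeddings; needs n >= 2."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in x]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """Deep CORAL loss: ||C_s - C_t||_F^2 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq_frobenius = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq_frobenius / (4 * d * d)


def multi_domain_coral(source: Matrix, target_domains: List[Matrix]) -> float:
    """Hypothetical multi-domain variant: mean CORAL loss of each target domain vs. the source."""
    losses = [coral_loss(source, target) for target in target_domains]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    src = [[0.0, 0.0], [2.0, 2.0]]
    tgt = [[0.0, 0.0], [1.0, 1.0]]
    print(coral_loss(src, tgt))  # 0.5625
```

Averaging per-domain terms, rather than pooling all target data into one covariance, keeps a single large sub-domain from dominating the alignment; whether the paper weights domains this way is not stated in the abstract.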
Unsupervised regularization of the embedding extractor for robust language identification
State-of-the-art spoken language identification systems are composed of three modules: a frame-level feature extractor, a segment-level embedding extractor and a final classifier. The performance of these systems degrades when there is a mismatch between training and testing data. Most domain adaptation methods focus on adapting the final classifier. In this article, we propose a model-based unsupervised domain adaptation of the segment-level embedding extractor. The approach consists of modifying the loss function used to train the embedding extractor: we introduce a regularization term based on the maximum mean discrepancy (MMD) loss. Experiments were performed on the RATS corpus, with a transmission channel mismatch between telephone and radio channels. We obtained the same language identification performance as supervised training on the target domains, but without using labeled data from these domains.
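A regularization term based on maximum mean discrepancy can be sketched as follows, assuming a Gaussian kernel with a fixed bandwidth `sigma` and the biased estimator of squared MMD. The kernel, bandwidth, weight, and the helper names `mmd2` and `regularized_loss` are illustrative assumptions, not details given in the abstract. Only embeddings, not labels, are needed from the target domain, which is what makes the adaptation unsupervised.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Batch = Sequence[Vector]


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def _mean_kernel(xs: Batch, ys: Batch, sigma: float) -> float:
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(source: Batch, target: Batch, sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between source and target embeddings."""
    return (
        _mean_kernel(source, source, sigma)
        + _mean_kernel(target, target, sigma)
        - 2.0 * _mean_kernel(source, target, sigma)
    )


def regularized_loss(
    classification_loss: float,
    source_emb: Batch,
    target_emb: Batch,
    weight: float = 1.0,
    sigma: float = 1.0,
) -> float:
    """Supervised loss on labeled source data plus a weighted MMD penalty."""
    return classification_loss + weight * mmd2(source_emb, target_emb, sigma)
```

In training, `source_emb` and `target_emb` would be segment-level embeddings of a labeled source batch (e.g. telephone) and an unlabeled target batch (e.g. radio), and the penalty would be minimized jointly with the classification loss so that the extractor maps both channels to similar embedding distributions.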