10 research outputs found
Improving Voice Trigger Detection with Metric Learning
Voice trigger detection is an important task, which enables activating a
voice assistant when a target user speaks a keyword phrase. A detector is
typically trained on speech data independent of speaker information and used
for the voice trigger detection task. However, such a speaker independent voice
trigger detector typically suffers from performance degradation on speech from
underrepresented groups, such as accented speakers. In this work, we propose a
novel voice trigger detector that can use a small number of utterances from a
target speaker to improve detection accuracy. Our proposed model employs an
encoder-decoder architecture. While the encoder performs speaker independent
voice trigger detection, similar to the conventional detector, the decoder
predicts a personalized embedding for each utterance. A personalized voice
trigger score is then obtained as a similarity score between the embeddings of
enrollment utterances and a test utterance. The personalized embedding allows
adapting to the target speaker's speech when computing the voice trigger score,
hence improving voice trigger detection accuracy. Experimental results show
that the proposed approach achieves a 38% relative reduction in false
rejection rate (FRR) compared to a baseline speaker independent voice trigger
model.
Comment: Submitted to InterSpeech 202
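The personalized scoring step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract says the score is "a similarity score" between enrollment and test embeddings but does not name the measure, so cosine similarity and averaging over enrollment utterances are assumptions here, and the function names are hypothetical. The last helper shows how a relative FRR reduction is computed; the rates in the usage note are made up for the arithmetic and are not results from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def personalized_score(
    enrollment: Sequence[Sequence[float]], test: Sequence[float]
) -> float:
    """Mean similarity between a test embedding and each enrollment embedding.

    Assumption: averaging per-utterance similarities; the abstract does not
    specify how multiple enrollment utterances are combined.
    """
    if not enrollment:
        raise ValueError("at least one enrollment embedding is required")
    return sum(cosine_similarity(e, test) for e in enrollment) / len(enrollment)


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    if baseline_rate <= 0.0:
        raise ValueError("baseline rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate
```

For example, with hypothetical rates, `relative_reduction(0.10, 0.062)` is about 0.38: a baseline FRR of 10% falling to 6.2% would correspond to a 38% relative reduction.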
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest such joint submission was
to SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary
objective of the current paper is to summarize the results and lessons learned
based on the twelve sub-systems and their fusion submitted to SRE'18. It is
also our intention to present a shared view on the advances, progress, and major
paradigm shifts that we have witnessed as an SRE participant in the past decade
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representation to deep speaker embedding, and a shift
in the research challenge from channel compensation to domain adaptation.
Comment: 5 page
Dual-tree complex wavelet transform-based image enhancement for accurate long-term change assessment in coal mining areas
The main objective of this study was to improve long-term land use change detection by improving the classification accuracy of previous-generation satellite images using a recent super-resolution technique. The study also analysed the change in land cover over a period of 41 years in a coal mining area. A dual-tree complex wavelet transform-based image super-resolution technique was used to enhance Landsat images from 1975 and 2016. Separating pixels with similar spectral responses is a difficult task, especially when those pixels represent different ground features. Therefore, an advanced neural net supervised classifier was used to minimize classification errors. The accuracy of the classified images (both super-resolved and original) was measured using confusion matrices and kappa coefficients. A significant improvement of more than 10% was observed in the overall classification accuracy for the 1975 image, highlighting that the classification accuracy of earlier-generation satellite data can be improved substantially.
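The accuracy measures named in this abstract, overall accuracy and the kappa coefficient derived from a confusion matrix, can be sketched as below. This is a minimal illustration, not the study's code: it assumes a square confusion matrix with rows as reference classes and columns as predicted classes, the function names are hypothetical, and the matrix in the usage note is invented for the arithmetic.

```python
from typing import Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected chance agreement from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

With the made-up two-class matrix `[[40, 10], [5, 45]]`, the overall accuracy is 0.85, the chance agreement is 0.5, and kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7.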