NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA based speaker verification system.
Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
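The differentiable approximation of the detection cost that the abstract describes can be sketched as follows. The sigmoid steepness `alpha`, the threshold, and the cost/prior values are illustrative placeholders, not the paper's exact settings:

```python
import numpy as np

def soft_detection_cost(scores, labels, threshold=0.0, alpha=10.0,
                        c_miss=1.0, c_fa=1.0, p_target=0.05):
    """Differentiable approximation of the detection cost function (DCF).

    Hard counts of misses and false alarms are replaced by sigmoids of
    the distance between each trial score and the threshold, so the cost
    can be minimized by gradient descent.  Illustrative sketch only, not
    the authors' implementation.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)  # 1 = target, 0 = non-target
    sig = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    p_miss = np.sum(labels * (1.0 - sig)) / max(np.sum(labels), 1.0)
    p_fa = np.sum((1.0 - labels) * sig) / max(np.sum(1.0 - labels), 1.0)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

With well-separated scores the soft cost approaches zero, while swapped score signs drive it toward its maximum, which is what makes it usable as a training objective for the score function's parameters.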
Workshop Report from the 33rd Workshop of the Pugwash Study Group on the Implementation of the Chemical and Biological Weapons Conventions: 'Achieving realistic decisions at the seventh BWC review conference in 2011'
This workshop was hosted by the Association Suisse de Pugwash in association with the Geneva International Peace Research Institute (GIPRI). The meeting was supported by a grant provided by the Swiss federal authorities.
The workshop took place immediately prior to the Seventh Review Conference on the operation of the Biological Weapons Convention (BWC) in December 2011. It was attended by 57 participants, all by invitation and in their personal capacities, from 17 countries including Australia, Canada, China, Germany, Hungary, India, Iran, Italy, Japan, New Zealand, the Russian Federation, Sweden, Switzerland, the Netherlands, the United Kingdom (UK), the United States of America (USA) and Ukraine. This report is the sole responsibility of its author, who was asked to prepare a brief account of the proceedings of the meeting in consultation with the Steering Committee. It does not necessarily reflect a consensus of the workshop as a whole, nor of the Study Group. The workshop was strictly governed by the Chatham House Rule, so reference to specific speakers is not detailed here.
Improving Source Separation via Multi-Speaker Representations
Lately there have been novel developments in deep learning towards solving
the cocktail party problem. Initial results are very promising and allow for
more research in the domain. One technique that has not yet been explored in
the neural network approach to this task is speaker adaptation. Intuitively,
information on the speakers that we are trying to separate seems fundamentally
important for the speaker separation task. However, retrieving this speaker
information is challenging since the speaker identities are not known a priori
and multiple speakers are simultaneously active. There is thus a
chicken-and-egg problem. To tackle this, source signals and i-vectors are
estimated alternately. We show that blind multi-speaker adaptation improves the
results of the network and that (in our case) the network is not capable of
adequately retrieving this useful speaker information itself.
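The alternating scheme in the abstract can be illustrated with a deliberately simplified caricature: speaker representations (here, plain frame means standing in for i-vectors) and frame-to-speaker assignments (standing in for source estimates) are refined in turn from a blind random start. This k-means-like loop is an analogy for the structure of the method, not the authors' neural pipeline:

```python
import numpy as np

def alternate_separation(mixture_frames, n_speakers=2, n_iters=5, seed=0):
    """Toy alternating estimation: sources given speaker representations,
    then speaker representations given sources.  Illustrative only."""
    rng = np.random.default_rng(seed)
    frames = np.asarray(mixture_frames, dtype=float)
    # Blind initialization: random speaker representations (no identities
    # known a priori, matching the chicken-and-egg setting).
    reps = rng.normal(size=(n_speakers, frames.shape[1]))
    for _ in range(n_iters):
        # Step 1: estimate "sources" given the current representations
        # (assign each frame to its closest speaker representation).
        dists = ((frames[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: re-estimate speaker representations from the sources.
        for k in range(n_speakers):
            if np.any(assign == k):
                reps[k] = frames[assign == k].mean(axis=0)
    return assign, reps
```

In the actual system the assignment step is a separation network conditioned on i-vectors and the update step is i-vector extraction on the separated signals, but the convergence logic of the alternation is the same.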
The new nuclear arms control environment : trip report and project conclusions
Includes bibliographical references. "July 2002." This paper reports the results of nine conferences, workshops and private meetings held on the current diplomatic and security problems associated with nuclear arms control, both before and after September 11. Appendixes include participants and questions. Unpublished; not peer reviewed.
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
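The basic transfer recipe the abstract describes (reuse high-level features learned on one task, retrain only a small part for the new task with little data) can be sketched minimally. The frozen `tanh` feature layer and logistic-regression head below are illustrative choices, not anything prescribed by the paper:

```python
import numpy as np

def transfer_output_layer(features_new, labels_new, shared_weights,
                          lr=0.1, n_steps=200, seed=0):
    """Minimal transfer-learning sketch: a feature extractor trained on a
    source task (shared_weights) is kept frozen, and only a new linear
    output head is trained on the target task's small data set.
    Illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(features_new, dtype=float)
    y = np.asarray(labels_new, dtype=float)
    H = np.tanh(X @ shared_weights)          # frozen high-level features
    w = rng.normal(scale=0.1, size=H.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w)))   # logistic head
        w -= lr * H.T @ (p - y) / len(y)     # gradient step on head only
    return w
```

Because only the head is updated, very little target-task data is needed, which mirrors the cross-language acoustic-model example in the abstract.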