NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA based speaker verification system.
Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
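The differentiable approximation of the detection cost that the abstract describes can be sketched as follows. The sigmoid steepness `alpha`, the threshold, and the cost/prior values are illustrative placeholders, not the paper's exact settings:

```python
import numpy as np

def soft_detection_cost(scores, labels, threshold=0.0, alpha=10.0,
                        c_miss=1.0, c_fa=1.0, p_target=0.05):
    """Differentiable approximation of the detection cost function (DCF).

    Hard counts of misses and false alarms are replaced by sigmoids of
    the distance between each trial score and the threshold, so the cost
    can be minimized by gradient descent.  Illustrative sketch only, not
    the authors' implementation.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)  # 1 = target, 0 = non-target
    sig = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    p_miss = np.sum(labels * (1.0 - sig)) / max(np.sum(labels), 1.0)
    p_fa = np.sum((1.0 - labels) * sig) / max(np.sum(1.0 - labels), 1.0)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

With well-separated scores the soft cost approaches zero, while swapped score signs drive it toward its maximum, which is what makes it usable as a training objective for the score function's parameters.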
Workshop Report from the 33rd Workshop of the Pugwash Study Group on the Implementation of the Chemical and Biological Weapons Conventions: 'Achieving realistic decisions at the seventh BWC review conference in 2011'
This workshop was hosted by the Association Suisse de Pugwash in association with the Geneva International Peace Research Institute (GIPRI). The meeting was supported by a grant provided by the Swiss federal authorities.
The workshop took place immediately prior to the Seventh Review Conference on the operation of the Biological Weapons Convention (BWC) in December 2011. It was attended by 57 participants, all by invitation and in their personal capacities, from 17 countries including Australia, Canada, China, Germany, Hungary, India, Iran, Italy, Japan, New Zealand, the Russian Federation, Sweden, Switzerland, the Netherlands, the United Kingdom (UK), the United States of America (USA) and Ukraine. This report is the sole responsibility of its author, who was asked to prepare a brief account of the proceedings of the meeting in consultation with the Steering Committee. It does not necessarily reflect a consensus of the workshop as a whole, nor of the Study Group. The workshop was strictly governed by the Chatham House Rule, so reference to specific speakers is not detailed here.
Improving Source Separation via Multi-Speaker Representations
Lately there have been novel developments in deep learning towards solving
the cocktail party problem. Initial results are very promising and allow for
more research in the domain. One technique that has not yet been explored in
the neural network approach to this task is speaker adaptation. Intuitively,
information on the speakers that we are trying to separate seems fundamentally
important for the speaker separation task. However, retrieving this speaker
information is challenging since the speaker identities are not known a priori
and multiple speakers are simultaneously active. There is thus a
chicken-and-egg problem. To tackle this, source signals and i-vectors are
estimated alternately. We show that blind multi-speaker adaptation improves the
results of the network and that (in our case) the network is not capable of
adequately retrieving this useful speaker information itself.
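The alternating scheme in the abstract can be illustrated with a deliberately simplified caricature: speaker representations (here, plain frame means standing in for i-vectors) and frame-to-speaker assignments (standing in for source estimates) are refined in turn from a blind random start. This k-means-like loop is an analogy for the structure of the method, not the authors' neural pipeline:

```python
import numpy as np

def alternate_separation(mixture_frames, n_speakers=2, n_iters=5, seed=0):
    """Toy alternating estimation: sources given speaker representations,
    then speaker representations given sources.  Illustrative only."""
    rng = np.random.default_rng(seed)
    frames = np.asarray(mixture_frames, dtype=float)
    # Blind initialization: random speaker representations (no identities
    # known a priori, matching the chicken-and-egg setting).
    reps = rng.normal(size=(n_speakers, frames.shape[1]))
    for _ in range(n_iters):
        # Step 1: estimate "sources" given the current representations
        # (assign each frame to its closest speaker representation).
        dists = ((frames[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: re-estimate speaker representations from the sources.
        for k in range(n_speakers):
            if np.any(assign == k):
                reps[k] = frames[assign == k].mean(axis=0)
    return assign, reps
```

In the actual system the assignment step is a separation network conditioned on i-vectors and the update step is i-vector extraction on the separated signals, but the convergence logic of the alternation is the same.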
The new nuclear arms control environment : trip report and project conclusions
Includes bibliographical references. "July 2002." This paper reports the results of nine conferences, workshops and private meetings held on the current diplomatic and security problems associated with nuclear arms control, both before and after September 11. Appendixes include participants and questions. Unpublished; not peer reviewed.
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
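The basic transfer recipe the abstract describes (reuse high-level features learned on one task, retrain only a small part for the new task with little data) can be sketched minimally. The frozen `tanh` feature layer and logistic-regression head below are illustrative choices, not anything prescribed by the paper:

```python
import numpy as np

def transfer_output_layer(features_new, labels_new, shared_weights,
                          lr=0.1, n_steps=200, seed=0):
    """Minimal transfer-learning sketch: a feature extractor trained on a
    source task (shared_weights) is kept frozen, and only a new linear
    output head is trained on the target task's small data set.
    Illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(features_new, dtype=float)
    y = np.asarray(labels_new, dtype=float)
    H = np.tanh(X @ shared_weights)          # frozen high-level features
    w = rng.normal(scale=0.1, size=H.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w)))   # logistic head
        w -= lr * H.T @ (p - y) / len(y)     # gradient step on head only
    return w
```

Because only the head is updated, very little target-task data is needed, which mirrors the cross-language acoustic-model example in the abstract.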