NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed as
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA-based speaker verification system.
Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
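The core idea above, a generative PLDA likelihood ratio recast as a discriminative similarity function with learnable parameters, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation; the parameter names `P`, `Q`, `c`, and `k` are assumed notation for the cross-term matrix, self-term matrix, linear term, and bias of the quadratic scoring function:

```python
import numpy as np

def nplda_score(x_e, x_t, P, Q, c, k):
    """Similarity between an enrollment embedding x_e and a test
    embedding x_t in the quadratic form of a PLDA likelihood ratio:
    a cross term, two self terms, a linear term, and a bias.
    In the discriminative (NPLDA) setting, P, Q, c, and k are
    learnable and can be initialized from a generative PLDA model."""
    cross = x_e @ P @ x_t                      # interaction between the pair
    self_terms = x_e @ Q @ x_e + x_t @ Q @ x_t  # per-embedding quadratic terms
    linear = c @ (x_e + x_t)                   # shared linear term
    return cross + self_terms + linear + k
```

With `Q`, `c`, and `k` set to zero and `P` to the identity, the score reduces to a plain inner product between the two embeddings, which makes the initialization-from-PLDA idea easy to sanity-check.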
Neural PLDA Modeling for End-to-End Speaker Verification
While deep learning models have made significant advances in supervised
classification problems, the application of these models for out-of-set
verification tasks like speaker recognition has been limited to deriving
feature embeddings. The state-of-the-art x-vector PLDA based speaker
verification systems use a generative model based on probabilistic linear
discriminant analysis (PLDA) for computing the verification score. Recently, we
proposed a neural network approach for backend modeling in speaker
verification called the neural PLDA (NPLDA) where the likelihood ratio score of
the generative PLDA model is posed as a discriminative similarity function and
the learnable parameters of the score function are optimized using a
verification cost. In this paper, we extend this work to achieve joint
optimization of the embedding neural network (x-vector network) with the NPLDA
network in an end-to-end (E2E) fashion. The proposed E2E model is
optimized directly from the acoustic features with a verification cost function
and during testing, the model directly outputs the likelihood ratio score. With
various experiments using the NIST speaker recognition evaluation (SRE) 2018
and 2019 datasets, we show that the proposed E2E model improves significantly
over the x-vector PLDA baseline speaker verification system.
Comment: Accepted in Interspeech 2020. GitHub Implementation Repos:
https://github.com/iiscleap/E2E-NPLDA and
https://github.com/iiscleap/NeuralPld
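Both papers above train against a smooth approximation of the detection cost function, so that the hard decision threshold becomes differentiable. A minimal NumPy sketch of such a soft detection cost is given below; the sigmoid slope `alpha`, the threshold `theta`, and the target prior `p_target` are assumed illustrative values, not parameters taken from the papers:

```python
import numpy as np

def soft_detection_cost(scores, labels, theta, p_target=0.05, alpha=10.0):
    """Differentiable approximation of a detection cost:
    a sigmoid replaces the hard threshold at theta, so miss and
    false-alarm rates become smooth functions of the scores."""
    # Soft "accept" probability per trial (approaches 0/1 as alpha grows).
    sig = 1.0 / (1.0 + np.exp(-alpha * (scores - theta)))
    p_miss = np.mean(1.0 - sig[labels == 1])  # target trials scored below theta
    p_fa = np.mean(sig[labels == 0])          # non-target trials above theta
    return p_target * p_miss + (1.0 - p_target) * p_fa
```

With a well-separated score distribution (targets far above the threshold, non-targets far below), both soft error rates approach zero and the cost vanishes, which is the behavior the training objective rewards.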
Coswara -- A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis
The COVID-19 pandemic presents global challenges transcending boundaries of
country, race, religion, and economy. The current gold standard method for
COVID-19 detection is the reverse transcription polymerase chain reaction
(RT-PCR) testing. However, this method is expensive, time-consuming, and
requires in-person contact that conflicts with social distancing. Moreover, as
the pandemic is expected to persist for a while, there is a need for an
alternative diagnostic tool that overcomes these limitations and is deployable
at large scale. The prominent symptoms of
COVID-19 include cough and breathing difficulties. We foresee that respiratory
sounds, when analyzed using machine learning techniques, can provide useful
insights, enabling the design of a diagnostic tool. Towards this, the paper
presents an early effort in creating (and analyzing) a database, called
Coswara, of respiratory sounds, namely, cough, breath, and voice. The sound
samples are collected via worldwide crowdsourcing through a web application.
The curated dataset is released as open access. As the pandemic is evolving,
the data collection and analysis are a work in progress. We believe that
insights from the analysis of Coswara can enable sound-based technology
solutions for point-of-care diagnosis of respiratory infection, and
in the near future this can help in diagnosing COVID-19.
Comment: A description of the Coswara dataset for evaluating COVID-19 diagnosis using
respiratory sound
Epistatic and Combinatorial Effects of Pigmentary Gene Mutations in the Domestic Pigeon
Understanding the molecular basis of phenotypic diversity is a critical challenge in biology, yet we know little about the mechanistic effects of different mutations and epistatic relationships among loci that contribute to complex traits. Pigmentation genetics offers a powerful model for identifying mutations underlying diversity and for determining how additional complexity emerges from interactions among loci. Centuries of artificial selection in domestic rock pigeons (Columba livia) have cultivated tremendous variation in plumage pigmentation through the combined effects of dozens of loci. The dominance and epistatic hierarchies of key loci governing this diversity are known through classical genetic studies [1–6], but their molecular identities and the mechanisms of their genetic interactions remain unknown. Here we identify protein-coding and cis-regulatory mutations in Tyrp1, Sox10, and Slc45a2 that underlie classical color phenotypes of pigeons and present a mechanistic explanation of their dominance and epistatic relationships. We also find unanticipated allelic heterogeneity at Tyrp1 and Sox10, indicating that color variants evolved repeatedly through mutations in the same genes. These results demonstrate how a spectrum of coding and regulatory mutations in a small number of genes can interact to generate substantial phenotypic diversity in a classic Darwinian model of evolution [7].