Alternative Education for the Rom
The Rom in the United States are nearly 100% illiterate. There are very few in any of the professions. The Rom cannot rely on gajo (non-Gypsy) doctors, lawyers, and educators who do not understand their ways or their unique problems.
Joint Bayesian Gaussian discriminant analysis for speaker verification
State-of-the-art i-vector based speaker verification relies on variants of
Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We
are mainly motivated by recent work on the joint Bayesian (JB) method,
which was originally proposed for discriminant analysis in face verification. We
apply JB to speaker verification and make three contributions beyond the
original JB. 1) In contrast to the EM iterations with approximated statistics
in the original JB, we employ EM iterations with exact statistics, which
give better performance. 2) We propose simultaneous diagonalization (SD)
of the within-class and between-class covariance matrices to achieve efficient
testing, which has a broader application scope than the SVD-based efficient
testing method in the original JB. 3) We scrutinize the similarities and
differences between various Gaussian PLDAs and JB, complementing the previous
analysis, which compared JB only with Prince-Elder PLDA. Extensive experiments
are conducted on NIST SRE10 core condition 5, empirically validating the
superiority of JB, with a faster convergence rate and a 9-13% EER reduction
compared with state-of-the-art PLDA.
Comment: accepted by ICASSP201
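As a reading aid for contribution 2), the textbook form of simultaneous diagonalization can be written as below. This is the standard linear-algebra construction, not necessarily the exact procedure used in the paper. The symbols S_w (within-class covariance), S_b (between-class covariance), and Φ (the transform) are notation assumed here, and S_w is assumed to be positive definite.

```latex
% Goal: one transform \Phi that diagonalizes both covariance matrices.
\begin{aligned}
\Phi^{\top} S_w \Phi &= I, \\
\Phi^{\top} S_b \Phi &= \Lambda, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d).
\end{aligned}

% One standard construction (assumes S_w is positive definite):
\begin{aligned}
S_w &= L L^{\top} && \text{(Cholesky factorization)}, \\
L^{-1} S_b L^{-\top} &= U \Lambda U^{\top} && \text{(symmetric eigendecomposition, } U^{\top} U = I\text{)}, \\
\Phi &= L^{-\top} U .
\end{aligned}
```

After projecting each vector x to Φᵀx, both covariances are diagonal, so quadratic forms involving them reduce to per-dimension sums; that is plausibly where the testing-time saving comes from.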
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to the NIST
speaker recognition evaluations (SRE). The latest edition of this joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary
objective of the current paper is to summarize the results and lessons learned
based on the twelve sub-systems and their fusion submitted to SRE'18. It is
also our intention to present a shared view on the advancements, progress, and
major paradigm shifts that we have witnessed as an SRE participant in the past
decade, from SRE'08 to SRE'18. In this regard, we have seen, among others, a
paradigm shift from supervector representation to deep speaker embedding, and
a switch of research challenge from channel compensation to domain adaptation.
Comment: 5 page
Critique [of Alternative Education for the Rom]
Leita Kaldi has introduced the readers to little-known data on one of America's most interesting and lesser-known ethnic groups. This critique focuses on further development of the material in the article and the implications of such research for the field of ethnic studies.
Speech Recognition For Selected Languages
This bachelor's thesis deals with continuous speech recognition for three languages: Bulgarian, Croatian, and Swedish. It describes the basics of speech processing and recognition, the construction of acoustic models using hidden Markov models and Gaussian mixture models, and the use of these techniques for speech recognition in the Kaldi toolkit. It also covers the procedure for preparing data from the GlobalPhone database for the HTK and Kaldi speech recognition toolkits. Finally, the trained models are evaluated on test data and the results of the individual models are compared.
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization
Fooling deep neural networks with adversarial input has exposed a
significant vulnerability in the current state-of-the-art systems in multiple
domains. Both black-box and white-box approaches have been used to either
replicate the model itself or to craft examples which cause the model to fail.
In this work, we propose a framework which uses multi-objective evolutionary
optimization to perform both targeted and un-targeted black-box attacks on
Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR
systems, Deepspeech and Kaldi-ASR, and increase the Word Error Rates (WER)
of these systems by up to 980%, indicating the potency of our approach. During
both un-targeted and targeted attacks, the adversarial samples maintain a high
acoustic similarity of 0.98 and 0.97 with the original audio.
Comment: Published in Interspeech 201
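For readers unfamiliar with the metric, the following is a minimal sketch of how a Word Error Rate and a relative WER increase are commonly computed. It assumes the reported percentage is a relative increase over the clean-audio WER, which the abstract does not state explicitly. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_increase_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (after - before) / before
```

With made-up values, a clean WER of 0.10 that rises to 0.25 under attack gives `relative_increase_percent(0.10, 0.25)`, roughly a 150% relative increase.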