4,617 research outputs found

    Alternative Education for the Rom

    Get PDF
    The Rom* in the United States are nearly 100% illiterate. There are very few in any of the professions. The Rom cannot rely on gajo (non-Gypsy) doctors, lawyers, and educators who do not understand their ways or their unique problems

    Joint Bayesian Gaussian discriminant analysis for speaker verification

    Full text link
    State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We are mainly motivated by the recent work of the joint Bayesian (JB) method, which is originally proposed for discriminant analysis in face verification. We apply JB to speaker verification and make three contributions beyond the original JB. 1) In contrast to the EM iterations with approximated statistics in the original JB, the EM iterations with exact statistics are employed and give better performance. 2) We propose to do simultaneous diagonalization (SD) of the within-class and between-class covariance matrices to achieve efficient testing, which has broader application scope than the SVD-based efficient testing method in the original JB. 3) We scrutinize similarities and differences between various Gaussian PLDAs and JB, complementing the previous analysis of comparing JB only with Prince-Elder PLDA. Extensive experiments are conducted on NIST SRE10 core condition 5, empirically validating the superiority of JB with faster convergence rate and 9-13% EER reduction compared with state-of-the-art PLDA.Comment: accepted by ICASSP201

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Get PDF
    The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view on the advancements, progresses, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation.Comment: 5 page

    Critique [of Alternative Education for the Rom]

    Get PDF
    Leita Kaldi has introduced the readers to little known data on one of America\u27s most interesting and lesser-known ethnic groups. This critique focuses on further development of the material in the article and the implications of such research for the field of ethnic studies

    Speech Recognition For Selected Languages

    Get PDF
    Tato práce se zabývá rozpoznáváním spojité řeči pro trojici jazyků bulharštinu, chorvatštinu a švédštinu. Zpráva popisuje základy zpracování a rozpoznávání řeči, tvorbu akustických modelů pomocí skrytých Markovových modelů a směsi gaussovských rozložení a použití těchto technik pro rozpoznávání řeči v toolkitu Kaldi. Další součástí práce je postup přípravy dat pro toolkity pro rozpoznávání řeči HTK a Kaldi na základě dat z databáze GlobalPhone. V závěru jsou vytvořené modely otestovány pomocí testovacích dat a porovnány výsledky z jednotlivých modelů.This bachelor's thesis deals with recognition of continues speech for three languages - Bulgarian, Croatian and Swedish. There are described basics of speech processing and recognition methods like acoustic modeling using hidden Markov models and gaussian mixture models. Another aim of this work is preparing data for those languages from GlobalPhone database, so they may be used with speech recognition toolkits Kaldi and HTK. With data prepared there are several models trained and tested using Kaldi toolkit.

    Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization

    Full text link
    Fooling deep neural networks with adversarial input have exposed a significant vulnerability in the current state-of-the-art systems in multiple domains. Both black-box and white-box approaches have been used to either replicate the model itself or to craft examples which cause the model to fail. In this work, we propose a framework which uses multi-objective evolutionary optimization to perform both targeted and un-targeted black-box attacks on Automatic Speech Recognition (ASR) systems. We apply this framework on two ASR systems: Deepspeech and Kaldi-ASR, which increases the Word Error Rates (WER) of these systems by upto 980%, indicating the potency of our approach. During both un-targeted and targeted attacks, the adversarial samples maintain a high acoustic similarity of 0.98 and 0.97 with the original audio.Comment: Published in Interspeech 201
    corecore