
    Data augmentation for automatic speech recognition for low resource languages

    In this thesis, we explore several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) on low-resource languages. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare these new methods with the well-known SpecAugment approach and several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages, using the 40-hour Gujarati, Tamil, and Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Combined with state-of-the-art acoustic model architectures and language models, our data augmentation approaches reduce word error rate relative to SpecAugment and other competitive baselines on LibriSpeech-100, with a particular advantage over prior models on the more challenging "other" dev and test sets. On the low-resource Indian languages, we see large improvements over the baseline models and results comparable to those of large multilingual models.
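    The abstract does not describe the thesis's novel augmentation methods, so they are not reproduced here. As a reference point for the SpecAugment baseline they are compared against, the following is a minimal Python sketch of SpecAugment-style frequency and time masking on a log-mel spectrogram stored as a list of frames. The function name `spec_augment` and the default mask widths and counts are illustrative assumptions, not values taken from the thesis.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # rows are frames, columns are mel channels


def spec_augment(
    spec: Spectrogram,
    freq_mask_param: int = 15,  # F: max width of a frequency mask (illustrative)
    time_mask_param: int = 40,  # T: max width of a time mask (illustrative)
    num_freq_masks: int = 2,
    num_time_masks: int = 2,
    mask_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a frequency- and time-masked copy of `spec` (time warping omitted)."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in spec]  # copy so the caller's spectrogram is untouched
    n_frames = len(out)
    n_bins = len(out[0]) if n_frames else 0

    # Frequency masks: f ~ U{0..F}, then blank channels [f0, f0 + f) in every frame.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(freq_mask_param, n_bins))
        f0 = rng.randint(0, n_bins - f)
        for row in out:
            for k in range(f0, f0 + f):
                row[k] = mask_value

    # Time masks: t ~ U{0..T}, then blank frames [t0, t0 + t) across all channels.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(time_mask_param, n_frames))
        t0 = rng.randint(0, n_frames - t)
        for i in range(t0, t0 + t):
            out[i] = [mask_value] * n_bins
    return out
```

    Calling this once per utterance per epoch gives each epoch a differently masked copy of the same audio, which is how masking acts as data augmentation without generating new recordings. The original SpecAugment recipe also applies time warping and bounds the time-mask width relative to utterance length; both are left out of this sketch.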

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at several levels, including feature extraction, acoustic modeling, and language modeling. On the application side, the thesis also covers research on non-native and code-switching speech.

    Intersession Variability Compensation in Language and Speaker Identification

    Channel and session variability is an important issue in text-independent speaker recognition. Several techniques for channel and session variability compensation have been introduced in the scientific literature, and such compensation can be implemented in the feature, model, or score domain. A relatively new and powerful approach to removing channel distortion is eigenchannel adaptation for Gaussian Mixture Models (GMMs). Its drawback is that, in its original form, it cannot be applied to other types of classifiers, e.g. Support Vector Machines (SVMs), GMMs with a different number of Gaussians, or speech recognition with Hidden Markov Models (HMMs). A solution is an approximation of the technique: eigenchannel adaptation in the feature domain. Both the original (model-domain) eigenchannel adaptation and feature-domain eigenchannel adaptation are presented for speaker recognition. After good results were achieved in speaker recognition, the contribution of the same techniques was examined in an acoustic language identification system covering 14 languages, where the undesired factors are both channel and speaker variability. Results are presented on the NIST Speaker Recognition Evaluation 2006 data and the NIST Language Recognition Evaluation 2007 data, both organized by the US National Institute of Standards and Technology (NIST).
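    The abstract gives no formulas. Under the usual formulation (an assumption here, not taken from the thesis), feature-domain eigenchannel compensation subtracts from each frame o_t the posterior-weighted channel offsets of the GMM-UBM components: o_t - sum_k gamma_k(t) U_k x, where gamma_k(t) is the occupation probability of component k, U_k is that component's block of the eigenchannel matrix, and x is the session's channel factor. The Python sketch below assumes a diagonal-covariance UBM and an already-estimated x (estimating x is omitted); the function names are hypothetical. Because the output is an ordinary feature stream, it can be passed to any back-end, such as an SVM or HMM system, which is the motivation the abstract gives for the feature-domain variant.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def log_gauss_diag(o: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian at frame o."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v for x, m, v in zip(o, mean, var)
    )


def posteriors(o, weights, means, variances) -> List[float]:
    """Occupation probabilities gamma_k(t) of each UBM component for one frame."""
    logs = [math.log(w) + log_gauss_diag(o, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max before exponentiating (log-sum-exp)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def compensate_features(frames, weights, means, variances, U: List[Matrix], x: List[float]) -> Matrix:
    """Hypothetical sketch: o_t - sum_k gamma_k(t) * U_k x for every frame o_t.

    U[k] is the D x R eigenchannel block for component k (a list of D rows);
    x is the session's R-dimensional channel factor, assumed already estimated.
    """
    # Channel offset U_k x for every component; fixed for the whole session.
    offsets = [[sum(u * xr for u, xr in zip(row, x)) for row in U_k] for U_k in U]
    compensated = []
    for o in frames:
        gamma = posteriors(o, weights, means, variances)
        shift = [sum(g * off[d] for g, off in zip(gamma, offsets)) for d in range(len(o))]
        compensated.append([od - sd for od, sd in zip(o, shift)])
    return compensated
```

    Each frame is shifted by a soft mixture of per-component channel offsets, so frames that the UBM assigns to different components receive different corrections, unlike a single global bias removal.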

    Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

    Choosing effective target modeling units is important and has long been a concern in end-to-end automatic speech recognition (ASR). In this work, we propose a phonetic-assisted multi-target units (PMU) modeling approach that enhances the Conformer-Transducer ASR system through progressive representation learning. Specifically, PMU first uses pronunciation-assisted subword modeling (PASM) and byte pair encoding (BPE) to produce phonetic-induced and text-induced target units, respectively. Three new frameworks are then investigated to enhance the acoustic encoder: a basic PMU, a paraCTC, and a pcaCTC, which integrate the PASM and BPE units at different levels for CTC and transducer multi-task training. Experiments on LibriSpeech and accented ASR tasks show that the proposed PMU significantly outperforms conventional BPE, reducing WER on the LibriSpeech clean and other test sets and on six accented ASR test sets by 12.7%, 6.0%, and 7.7% relative, respectively. Comment: 5 pages, 1 figure, submitted to ICASSP 202
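    The abstract does not detail how PMU constructs its units, and PASM is not sketched here. For the text-induced side only, the following toy Python sketch shows the standard BPE merge-learning loop (in the style of Sennrich et al.) on a small word-frequency table, plus a helper that segments a word by replaying the learned merges. The function names and the end-of-word marker `</w>` are illustrative assumptions; production ASR systems usually rely on a dedicated tokenizer library rather than code like this.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Vocab = Dict[Tuple[str, ...], int]  # word as a tuple of symbols -> frequency


def count_pairs(vocab: Vocab) -> Counter:
    """Frequency of every adjacent symbol pair, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Pair, vocab: Vocab) -> Vocab:
    """Replace every adjacent occurrence of `pair` with the concatenated symbol."""
    a, b = pair
    merged: Vocab = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules, most frequent pair first."""
    # Characters plus an end-of-word marker, so merges never span two words.
    vocab: Vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.__getitem__)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i : i + 2] = [a + b]
            else:
                i += 1
    return symbols
```

    With counts {"low": 5, "lower": 2, "newest": 6, "widest": 3}, four merges give ('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), and "lowest" is then segmented as ['lo', 'w', 'est</w>']. The units depend only on character statistics of the text, which is why the abstract calls them text-induced, in contrast to the phonetic-induced PASM units.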

    Program of Research in Aeronautics

    This prospectus describes the educational and research opportunities available at the Joint Institute for Advancement of Flight Sciences, operated at NASA Langley Research Center in conjunction with George Washington University's School of Engineering and Applied Sciences. It gives the admission requirements for the various degree programs and the course offerings in acoustics, aeronautics, environmental modelling, materials science, and structures and dynamics. Research facilities for each field of study are described. Presentations and publications (including dissertations and theses) generated by each program are listed, as are the faculty members, visiting scientists, and engineers.