3 research outputs found

    Towards Language-Universal End-to-End Speech Recognition

    Full text link
    Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network's internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems built with a multi-task learning approach. We also show that this model can be used to initialize a monolingual speech recognizer, and can be used to create a bilingual model for use in code-switching scenarios. Comment: submitted to ICASSP 201
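    The language-specific gating idea above can be sketched as follows: each language gets a learned embedding, and a sigmoid gate computed from that embedding scales a hidden layer's activations elementwise. This is only an illustrative sketch; all layer sizes, parameter names, and the random initialization are assumptions, not the paper's actual architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    HIDDEN = 8       # hidden-layer width (illustrative)
    EMBED = 4        # language-embedding size (illustrative)
    NUM_LANGS = 3    # e.g. the three languages in training

    # Learned parameters (randomly initialized here purely for illustration).
    lang_embeddings = rng.normal(size=(NUM_LANGS, EMBED))
    W_gate = rng.normal(size=(EMBED, HIDDEN))
    b_gate = np.zeros(HIDDEN)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gated_hidden(h, lang_id):
        """Modulate hidden activations h with a language-specific gate in (0, 1)."""
        g = sigmoid(lang_embeddings[lang_id] @ W_gate + b_gate)
        return h * g  # elementwise, language-dependent scaling

    h = rng.normal(size=HIDDEN)      # one frame's hidden representation
    out = gated_hidden(h, lang_id=1)
    ```

    Because the gate lies in (0, 1), the same shared weights can be softly specialized per language without separate output branches.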

    Transfer Learning for Speech and Language Processing

    Full text link
    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied under the name of `model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field. Comment: 13 pages, APSIPA 201
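    The cross-lingual transfer described above is often realized by copying a source model's shared hidden layers and reinitializing only the output layer for the target language's label set. The sketch below illustrates that pattern; the model structure, layer sizes, and label-set sizes are all hypothetical, not taken from the paper.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def init_model(in_dim, hid_dim, out_dim):
        # A toy one-hidden-layer acoustic model as a dict of weight matrices.
        return {
            "W_hidden": rng.normal(scale=0.1, size=(in_dim, hid_dim)),
            "W_out": rng.normal(scale=0.1, size=(hid_dim, out_dim)),
        }

    # Source-language model (e.g. 120 context-dependent phone targets).
    source = init_model(in_dim=40, hid_dim=64, out_dim=120)

    def transfer(source_model, new_out_dim):
        """Reuse the hidden layer; fresh output layer for the new language."""
        new = dict(source_model)  # hidden weights are carried over as-is
        hid_dim = source_model["W_out"].shape[0]
        new["W_out"] = rng.normal(scale=0.1, size=(hid_dim, new_out_dim))
        return new

    # Target language with a different phone inventory (95 targets, assumed).
    target = transfer(source, new_out_dim=95)
    ```

    In practice the copied layers would then be fine-tuned on whatever target-language data is available, which is where the "little or no re-training data" advantage comes from.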

    INVESTIGATION ON CROSS- AND MULTILINGUAL MLP FEATURES UNDER MATCHED AND MISMATCHED ACOUSTICAL CONDITIONS

    No full text
    In this paper, Multi Layer Perceptron (MLP) based multilingual bottleneck features are investigated for acoustic modeling in three languages — German, French, and US English. We use a modified training algorithm to handle the multilingual training scenario without having to explicitly map the phonemes to a common phoneme set. Furthermore, the cross-lingual portability of bottleneck features between the three languages is also investigated. Single-pass recognition experiments on a large-vocabulary SMS dictation task indicate that (1) multilingual bottleneck features yield significantly lower word error rates compared to standard MFCC features, (2) multilingual bottleneck features are superior to monolingual bottleneck features trained for the target language with limited training data, and (3) multilingual bottleneck features are beneficial in training acoustic models in a low-resource language where only mismatched training data is available — by exploiting the more matched training data from other languages. Index Terms — MLP, bottleneck, multilingual, mismatched acoustical conditions
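    Bottleneck features, as used above, come from an MLP with one deliberately narrow hidden layer: after training, the activations at that layer are extracted per frame and used as acoustic features in place of (or alongside) MFCCs. The sketch below shows only the extraction step with made-up layer sizes; it is not the paper's actual network or training procedure.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    D_IN, D_WIDE, D_BN = 39, 128, 42  # e.g. MFCC input, wide hidden, bottleneck

    # Weights would normally come from supervised training on phone targets;
    # random values here stand in for a trained network.
    W1 = rng.normal(scale=0.1, size=(D_IN, D_WIDE))
    W2 = rng.normal(scale=0.1, size=(D_WIDE, D_BN))

    def bottleneck_features(frames):
        """Forward frames through the net; return bottleneck activations."""
        h = np.tanh(frames @ W1)       # wide hidden layer
        return np.tanh(h @ W2)         # (n_frames, D_BN) feature vectors

    # Ten frames of 39-dimensional input features.
    feats = bottleneck_features(rng.normal(size=(10, D_IN)))
    ```

    In the multilingual setting, the layers up to the bottleneck are shared across languages, so the compact bottleneck representation can be reused for a low-resource target language.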