242,648 research outputs found
Multilingual Speech Recognition With A Single End-To-End Model
Training a conventional automatic speech recognition (ASR) system to support
multiple languages is challenging because the sub-word unit, lexicon and word
inventories are typically language specific. In contrast, sequence-to-sequence
models are well suited for multilingual ASR because they encapsulate an
acoustic, pronunciation and language model jointly in a single network. In this
work we present a single sequence-to-sequence ASR model trained on 9 different
Indian languages, which have very little overlap in their scripts.
Specifically, we take a union of language-specific grapheme sets and train a
grapheme-based sequence-to-sequence model jointly on data from all languages.
We find that this model, which is not explicitly given any information about
language identity, improves recognition performance by 21% relative compared to
analogous sequence-to-sequence models trained on each language individually. By
modifying the model to accept a language identifier as an additional input
feature, we further improve performance by an additional 7% relative and
eliminate confusion between different languages.Comment: Accepted in ICASSP 201
- …