Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
Code-switching speech recognition (CSSR) transcribes speech that switches
between multiple languages or dialects within a single sentence. The main
challenge in this task is that different languages often have similar
pronunciations, making it difficult for models to distinguish between them. In
this paper, we propose a method for solving the CSSR task from the perspective
of language-specific acoustic boundary learning. We introduce language-specific
weight estimators (LSWE) to model acoustic boundary learning in different
languages separately. Additionally, a non-autoregressive (NAR) decoder and a
language change detection (LCD) module are employed to assist in training.
Evaluated on the SEAME corpus, our method achieves a state-of-the-art mixed
error rate (MER) of 16.29% and 22.81% on the test_man and test_sge sets, respectively. We
also demonstrate the effectiveness of our method on a 9000-hour in-house
meeting code-switching dataset, where our method achieves a relative MER
reduction of 7.9%.
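The mixed error rate (MER) cited in this abstract is commonly computed for Mandarin-English data by counting Mandarin at the character level and English at the word level, then dividing the total edit distance by the number of reference tokens. The abstract does not spell out its scoring details, so the following is only a minimal sketch under that assumption; the tokenizer regex and the function names (`tokenize_mixed`, `edit_distance`, `mixed_error_rate`) are illustrative and not taken from the paper.

```python
import re


def tokenize_mixed(text):
    # Assumption: each CJK character is one token and each run of Latin
    # letters (plus apostrophes) is one word token; everything else is dropped.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)


def edit_distance(ref, hyp):
    # Standard Levenshtein distance over token lists, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    # Total token-level errors divided by total reference tokens.
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize_mixed(ref)
        hyp_tokens = tokenize_mixed(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total
```

In this sketch, a reference of three Mandarin characters followed by two English words counts as five tokens, so one misrecognized English word yields an MER of 0.2. Matching is case-sensitive here; real scoring pipelines usually normalize text first.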
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research work on non-native and code-switching speech.