Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

Authors: Chng Eng Siong, Khassanov Yerbolat, Ma Bin, Ni Chongjia, Pham Van Tung, Xu Haihua, Zeng Zhiping
Publication venue: 'International Speech Communication Association'
Publication date: 31/07/2019

The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch between languages more easily. Specifically, we propose constraints based on Jensen-Shannon divergence and cosine distance. The former enforces the output embeddings of the monolingual languages to have similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, which yields up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.

Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201
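
The two quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the embedding distributions are estimated, so the sketch assumes they are already given as discrete probability vectors, and the function names (`centroid`, `cosine_distance`, `js_divergence`) are hypothetical.

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With natural logarithms, `js_divergence` is 0 for identical distributions and at most `log 2` for distributions with disjoint support, while `cosine_distance(centroid(a), centroid(b))` shrinks toward 0 as the two centroids point in the same direction. In a training setup, terms like these would presumably be added to the ASR loss as penalties, but the weighting is not given in the abstract.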

arXiv.org e-Print Archive

Crossref

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Authors: Du Binbin, Li Yuke, Ma Guodong, Wang Wenxuan
Publication date: 13/07/2023

Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows sharply as the number of supported languages increases. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through the Mixture of Language Experts (MLE), whose learning is guided by a frame-wise language routing mechanism. The weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over the baseline with comparable computational efficiency.

Comment: To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin, Ireland
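
The frame-wise routing idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LR-MoE implementation: it assumes hard top-1 routing (each frame goes to the single expert with the highest LID score), treats experts as plain callables, and uses a hypothetical function name, `route_frames`. The abstract itself only states that a shared frame-level LID network acts as the pre-router.

```python
def route_frames(frames, lid_scores, experts):
    """Send each frame to one language expert chosen by its LID scores.

    frames:     list of feature vectors, one per frame
    lid_scores: list of per-frame score lists, one score per language
    experts:    list of callables, one per language (assumed interface)
    Returns the expert outputs and the language index chosen per frame.
    """
    outputs, routes = [], []
    for frame, scores in zip(frames, lid_scores):
        # Hard top-1 routing (an assumption): pick the highest-scoring language.
        lang = max(range(len(scores)), key=scores.__getitem__)
        routes.append(lang)
        outputs.append(experts[lang](frame))
    return outputs, routes
```

Because only one expert runs per frame, the per-frame cost in this sketch stays constant as more language experts are added, which is consistent with the computational-efficiency claim in the abstract.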

arXiv.org e-Print Archive