Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data
The lack of code-switching training data is a major concern in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, thereby enabling the ASR model to code-switch between languages more easily. Specifically, we propose constraints based on the Jensen-Shannon divergence and the cosine distance. The former enforces similar distributions across the output embeddings of the monolingual languages, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201
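The two constraints described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the choice to compare softmax-normalised mean embeddings, and all function and variable names (`emb_a`, `emb_b`, `embedding_constraints`), are our own assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D vector
    e = np.exp(x - x.max())
    return e / e.sum()

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions p, q
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine_distance(u, v):
    # 1 - cosine similarity; 0 when u and v point the same way
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def embedding_constraints(emb_a, emb_b):
    """emb_a, emb_b: (vocab_size, dim) output-token embedding matrices,
    one per monolingual language (hypothetical shapes for illustration)."""
    centroid_a = emb_a.mean(axis=0)
    centroid_b = emb_b.mean(axis=0)
    # Distribution constraint: JS divergence between softmax-normalised
    # centroids (a simplification of matching the embedding distributions).
    js = js_divergence(softmax(centroid_a), softmax(centroid_b))
    # Centroid constraint: cosine distance between the two centroids.
    cos = cosine_distance(centroid_a, centroid_b)
    return js, cos
```

In training, both terms would be added to the ASR loss so that minimising them pulls the two languages' embedding spaces together.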
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows sharply as the number of supported languages increases. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through a Mixture of Language Experts (MLE), whose learning is guided by a frame-wise language routing mechanism. A weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over the baseline with comparable computational efficiency.
Comment: To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin, Ireland
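The frame-wise routing idea behind LR-MoE can be sketched as follows. This is a toy NumPy illustration under our own assumptions (hard top-1 routing, linear matrices standing in for real expert networks, and made-up names such as `LRMoELayer` and `router_w`); it is not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LRMoELayer:
    """Toy Language-Routing MoE layer: a shared frame-level LID router
    decides, per frame, which language expert processes that frame."""

    def __init__(self, dim, num_langs, rng):
        # one linear "expert" per language (stand-in for real expert FFNs)
        self.experts = [rng.standard_normal((dim, dim)) / np.sqrt(dim)
                        for _ in range(num_langs)]

    def __call__(self, frames, router_w):
        # frames: (T, dim) acoustic features; router_w: (dim, num_langs)
        # LID projection shared across all MoE layers (the "pre-router")
        lid_probs = softmax(frames @ router_w)   # (T, num_langs)
        routes = lid_probs.argmax(axis=1)        # hard per-frame routing
        out = np.empty_like(frames)
        for t, lang in enumerate(routes):
            out[t] = frames[t] @ self.experts[lang]
        return out, routes
```

Because each frame activates only one expert, the per-frame compute stays roughly constant as languages are added, which is the efficiency point the abstract makes.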