Independent language modeling architecture for end-to-end ASR
The attention-based end-to-end (E2E) automatic speech recognition (ASR)
architecture allows for joint optimization of acoustic and language models
within a single network. However, in a vanilla E2E ASR architecture, the
decoder sub-network (subnet), which incorporates the role of the language model
(LM), is conditioned on the encoder output. This means that the acoustic
encoder and the language model are entangled, which does not allow the LM to
be trained separately on external text data. To address this problem, in
this work, we propose a new architecture that separates the decoder subnet from
the encoder output. In this way, the decoupled subnet becomes an independently
trainable LM subnet, which can easily be updated using the external text data.
We study two strategies for updating the new architecture. Experimental results
show that: 1) the independent LM architecture benefits from external text data,
achieving 9.3% and 22.8% relative character and word error rate reductions on
the Mandarin HKUST and English NSC datasets, respectively; 2) the proposed
architecture works well with an external LM and generalizes to different
amounts of labelled data.
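The decoupling described in this abstract can be illustrated with a minimal sketch. Nothing below comes from the paper itself: the module names (`LMSubnet`, `FusedDecoder`), the layer sizes, and the concatenation-based fusion are illustrative assumptions. The sketch shows only the key property, namely that the token-side subnet never sees the encoder output and can therefore be pretrained or updated on text alone.

```python
import torch
import torch.nn as nn

class LMSubnet(nn.Module):
    """Decoder subnet conditioned only on previous tokens (a plain LM).

    Because it never receives the encoder output, it can be trained or
    updated on external text data independently of the acoustic model.
    """
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))   # (B, T, hidden_dim)
        return h                              # hidden states for fusion

    def lm_logits(self, tokens):
        return self.out(self.forward(tokens)) # standalone LM training head


class FusedDecoder(nn.Module):
    """Combines the LM subnet's states with acoustic context from the encoder.

    Only this thin fusion layer depends on the encoder, so retraining the
    LM subnet on text leaves the acoustic side untouched.
    """
    def __init__(self, lm_subnet, enc_dim, vocab_size, hidden_dim=512):
        super().__init__()
        self.lm = lm_subnet
        self.proj = nn.Linear(hidden_dim + enc_dim, vocab_size)

    def forward(self, tokens, acoustic_context):
        h = self.lm(tokens)                              # token-only states
        fused = torch.cat([h, acoustic_context], dim=-1) # (B, T, hid + enc)
        return self.proj(fused)                          # vocabulary logits
```

Under this reading, the LM subnet would first be trained with `lm_logits` under a cross-entropy LM objective on external text, then plugged into `FusedDecoder` for joint ASR training on paired data.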
Transfer learning of language-independent end-to-end ASR with language model fusion
This work explores better adaptation methods to low-resource languages using
an external language model (LM) under the framework of transfer learning. We
first build a language-independent ASR system in a unified sequence-to-sequence
(S2S) architecture with a shared vocabulary among all languages. During
adaptation, we perform LM fusion transfer, where an external LM is integrated
into the decoder network of the attention-based S2S model in the whole
adaptation stage, to effectively incorporate linguistic context of the target
language. We also investigate various seed models for transfer learning.
Experimental evaluations using the IARPA BABEL data set show that LM fusion
transfer improves performance on all five target languages compared with
simple transfer learning when external text data is available. Our final
system drastically reduces the performance gap relative to hybrid systems.
Comment: Accepted at ICASSP201
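LM fusion can take several forms. As an illustration of the general idea, the sketch below shows shallow fusion at decoding time, where the S2S posterior is interpolated log-linearly with an external LM score. The function name, the default weight, and the plain log-linear interpolation are assumptions for illustration, not this paper's exact method (the work above integrates the LM into the decoder network throughout adaptation).

```python
import torch

def shallow_fusion_scores(s2s_logits, lm_logits, lm_weight=0.3):
    """Combine S2S and external LM scores for one decoding step.

    score(y) = log p_s2s(y | x, y_<t) + lm_weight * log p_lm(y | y_<t)

    This is the simplest form of LM fusion; deeper variants feed the
    LM state into the decoder network itself.
    """
    s2s_logp = torch.log_softmax(s2s_logits, dim=-1)
    lm_logp = torch.log_softmax(lm_logits, dim=-1)
    return s2s_logp + lm_weight * lm_logp

# During beam search, each hypothesis would be extended with the top-k
# tokens under the fused score rather than the S2S score alone.
```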