
Transfer learning of language-independent end-to-end ASR with language model fusion

Authors: Baskar Murali Karthick, Cho Jaejin, Inaguma Hirofumi, Kawahara Tatsuya, Watanabe Shinji
Publication date: 07/05/2019

This work explores better methods for adapting to low-resource languages using an external language model (LM) within a transfer-learning framework. We first build a language-independent ASR system with a unified sequence-to-sequence (S2S) architecture and a vocabulary shared across all languages. During adaptation, we perform LM fusion transfer, in which an external LM is integrated into the decoder network of the attention-based S2S model throughout the adaptation stage to effectively incorporate the linguistic context of the target language. We also investigate various seed models for transfer learning. Experimental evaluations on the IARPA BABEL data set show that, when external text data is available, LM fusion transfer improves performance on all five target languages compared with simple transfer learning. Our final system drastically reduces the performance gap to the hybrid systems.
Comment: Accepted at ICASSP201

arXiv.org e-Print Archive

Crossref
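The first abstract describes integrating an external LM into the decoder of an S2S model but does not give the fusion equations. The sketch below therefore shows shallow fusion, a simpler and widely used technique that adds a weighted LM log-probability to the decoder's log-probability for each token at a decoding step. It is not the paper's LM fusion transfer, which integrates the LM inside the decoder network during adaptation; it only illustrates how LM scores can change the decoder's choice. The names shallow_fusion_step, best_token, and lm_weight, and the toy probabilities, are hypothetical.

```python
import math

def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Fuse one decoding step: log P_s2s(y) + lm_weight * log P_lm(y) for each token.

    Hypothetical helper. Both dicts are assumed to share the same token keys,
    i.e. the S2S model and the LM use one shared vocabulary.
    """
    return {
        tok: asr_log_probs[tok] + lm_weight * lm_log_probs[tok]
        for tok in asr_log_probs
    }

def best_token(fused_scores):
    """Greedy choice: the token with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)

# Toy distributions over a three-token vocabulary (made-up numbers).
asr = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
lm = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

print(best_token(shallow_fusion_step(asr, lm, lm_weight=0.0)))  # prints: a
print(best_token(shallow_fusion_step(asr, lm, lm_weight=0.5)))  # prints: b
```

With lm_weight set to 0 the decoder alone prefers "a"; giving the LM some weight lets its strong preference for "b" flip the decision, which is the kind of target-language linguistic context the abstract says the external LM contributes.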

Phonetic Temporal Neural Model for Language Identification

Authors: Abel Andrew, Chen Yixiang, Li Lantian, Tang Zhiyuan, Wang Dong
Publication date: 25/08/2017

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, most existing neural LID methods have largely overlooked phonetic information, even though it has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID: an LSTM-RNN LID system that takes as input phonetic features produced by a phone-discriminative DNN rather than raw acoustic features. This model is similar to traditional phonetic LID methods, but its phonetic knowledge is much richer: it operates at the frame level and carries compact information about all phones. Our experiments on the Babel database and the AP16-OLR database show that the phonetic temporal neural approach is very effective and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.
Comment: Submitted to TASL

arXiv.org e-Print Archive

Crossref

University of Strathclyde Institutional Repository
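The second abstract describes an LSTM-RNN that reads frame-level phonetic features (produced by a phone-discriminative DNN) instead of raw acoustic features. The pure-Python sketch below shows only that data flow under stated assumptions: the per-frame "phonetic feature" vectors are random stand-ins for real DNN outputs, the LSTM weights are random and untrained, and the language decision is taken from the last hidden state, which the abstract does not specify. TinyLSTM, identify_language, and all dimensions are hypothetical.

```python
import math
import random

GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """Hypothetical single-layer LSTM over per-frame feature vectors (untrained)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}].
        self.w = {g: rand_matrix(rng, hidden_dim, input_dim + hidden_dim) for g in GATES}
        self.b = {g: [0.0] * hidden_dim for g in GATES}

    def run(self, frames):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            z = list(x) + h
            pre = {
                g: [wz + bias for wz, bias in zip(matvec(self.w[g], z), self.b[g])]
                for g in GATES
            }
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            # Standard LSTM cell update and hidden-state output.
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h  # hidden state after the last frame

def identify_language(phonetic_frames, lstm, w_out):
    """Map a sequence of phonetic feature vectors to language posteriors."""
    return softmax(matvec(w_out, lstm.run(phonetic_frames)))

# Usage: 20 frames of 8-dim stand-in phonetic features, 3 candidate languages.
rng = random.Random(1)
frames = [[rng.random() for _ in range(8)] for _ in range(20)]
lstm = TinyLSTM(input_dim=8, hidden_dim=16)
w_out = rand_matrix(rng, 3, 16)
posteriors = identify_language(frames, lstm, w_out)
print(len(posteriors), round(sum(posteriors), 6))  # prints: 3 1.0
```

Because the weights are untrained, the posteriors carry no information; in the described model, the input frames would come from the phone-discriminative DNN and the LSTM would be trained for LID.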