Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling
This study addresses unsupervised subword modeling, i.e., learning feature
representations that can distinguish subword units of a language. The proposed
approach adopts a two-stage bottleneck feature (BNF) learning framework,
consisting of autoregressive predictive coding (APC) as a front-end and a
DNN-BNF model as a back-end. The APC-pretrained features serve as input to the
DNN-BNF model. A language-mismatched ASR system provides
cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are
extracted as the subword-discriminative feature representation. A second aim of
this work is to investigate how robust the approach is to varying amounts of
training data. The results on Libri-light and the
ZeroSpeech 2017 databases show that APC is effective in front-end feature
pretraining. Our whole system outperforms the state of the art on both
databases. Cross-lingual phone labels for English data generated by a Dutch ASR
outperform those generated by a Mandarin ASR, possibly because Dutch is more
similar to English than Mandarin is. Our system is less sensitive to the amount
of training data once more than 50 hours are available. APC pretraining
reduces the amount of training material needed from over 5,000 hours to around
200 hours with little performance degradation.

Comment: 5 pages, 3 figures. Accepted for publication in INTERSPEECH 2020, Shanghai, China
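To make the two-stage pipeline concrete, the sketch below is a minimal, illustrative PyTorch rendering, not the authors' implementation: an APC model is pretrained to predict speech frames a few steps ahead, its hidden states feed a DNN-BNF phone classifier trained on cross-lingual phone labels, and the bottleneck activations are extracted as BNFs. The GRU front-end, all layer sizes, the prediction shift, and the 120-phone inventory are assumptions made only for illustration.

import torch
import torch.nn as nn


class APC(nn.Module):
    """Autoregressive predictive coding: predict the frame `shift` steps ahead."""

    def __init__(self, feat_dim=40, hidden_dim=512, num_layers=3):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        hidden, _ = self.rnn(feats)
        return self.proj(hidden), hidden       # prediction, pretrained features


class DNNBNF(nn.Module):
    """Feed-forward phone classifier with a low-dimensional bottleneck layer."""

    def __init__(self, in_dim=512, bnf_dim=40, num_phones=120):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, bnf_dim),          # bottleneck features (BNFs)
        )
        self.classifier = nn.Linear(bnf_dim, num_phones)

    def forward(self, x):
        bnf = self.encoder(x)
        return self.classifier(bnf), bnf


def apc_loss(model, feats, shift=3):
    """L1 loss between predicted frames and the true frames `shift` steps ahead."""
    pred, _ = model(feats)
    return nn.functional.l1_loss(pred[:, :-shift], feats[:, shift:])


# Stage 1: APC pretraining on untranscribed speech (toy batch; optimizer steps omitted).
feats = torch.randn(8, 200, 40)                # (batch, frames, acoustic feature dim)
apc = APC()
pretrain_loss = apc_loss(apc, feats)

# Stage 2: train the DNN-BNF on APC features with cross-lingual phone labels
# produced by a language-mismatched ASR system (random labels stand in here).
with torch.no_grad():
    _, apc_feats = apc(feats)
phone_labels = torch.randint(0, 120, (8, 200))
dnn = DNNBNF()
logits, bnf = dnn(apc_feats)
train_loss = nn.functional.cross_entropy(logits.reshape(-1, 120),
                                         phone_labels.reshape(-1))

# At test time, `bnf` is the subword-discriminative representation that is extracted.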