903 research outputs found
Speech Enhancement Guided by Contextual Articulatory Information
Previous studies have confirmed the effectiveness of leveraging articulatory
information to attain improved speech enhancement (SE) performance. By
augmenting the original acoustic features with the place/manner of articulatory
features, the SE process can be guided to consider the articulatory properties
of the input speech when performing enhancement. Hence, we believe that the
contextual information of articulatory attributes should include useful
information and can further benefit SE in different languages. In this study,
we propose an SE system that improves its performance through optimizing the
contextual articulatory information in enhanced speech for both English and
Mandarin. We optimize the contextual articulatory information through
joint-train the SE model with an end-to-end automatic speech recognition (E2E
ASR) model, predicting the sequence of broad phone classes (BPC) instead of the
word sequences. Meanwhile, two training strategies are developed to train the
SE system based on the BPC-based ASR: multitask-learning and deep-feature
training strategies. Experimental results on the TIMIT and TMHINT dataset
confirm that the contextual articulatory information facilitates an SE system
in achieving better results than the traditional Acoustic Model(AM). Moreover,
in contrast to another SE system that is trained with monophonic ASR, the
BPC-based ASR (providing contextual articulatory information) can improve the
SE performance more effectively under different signal-to-noise ratios(SNR).Comment: Will be submitted to TASL
- …