
    Speech Enhancement Guided by Contextual Articulatory Information

    Previous studies have confirmed the effectiveness of leveraging articulatory information to attain improved speech enhancement (SE) performance. By augmenting the original acoustic features with place/manner-of-articulation features, the SE process can be guided to account for the articulatory properties of the input speech when performing enhancement. We therefore believe that the contextual information of articulatory attributes carries useful cues and can further benefit SE across languages. In this study, we propose an SE system that improves its performance by optimizing the contextual articulatory information in the enhanced speech, for both English and Mandarin. We optimize this contextual articulatory information by jointly training the SE model with an end-to-end automatic speech recognition (E2E ASR) model that predicts sequences of broad phone classes (BPCs) instead of word sequences. Two training strategies are developed to train the SE system with the BPC-based ASR: a multitask-learning strategy and a deep-feature training strategy. Experimental results on the TIMIT and TMHINT datasets confirm that the contextual articulatory information enables the SE system to achieve better results than a traditional acoustic model (AM). Moreover, in contrast to an SE system trained with a monophone-based ASR, the BPC-based ASR (providing contextual articulatory information) improves SE performance more effectively under different signal-to-noise ratios (SNRs).

    Comment: Will be submitted to TASL
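    The abstract does not specify the implementation, but a minimal sketch of how the multitask-learning strategy might couple an SE front end with a BPC-based E2E recognizer is given below. All module names, the mask-based LSTM enhancer, the CTC criterion for the BPC sequence, and the loss weight `alpha` are illustrative assumptions, not the authors' released code; the deep-feature strategy would instead match intermediate ASR activations rather than adding a recognition loss.

    ```python
    import torch
    import torch.nn as nn

    class SEModel(nn.Module):
        """Hypothetical SE model: maps noisy spectrogram frames to enhanced frames
        via a sigmoid mask (one common SE formulation)."""
        def __init__(self, n_feats=257, hidden=256):
            super().__init__()
            self.rnn = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, n_feats)

        def forward(self, noisy):                    # noisy: (B, T, n_feats)
            h, _ = self.rnn(noisy)
            mask = torch.sigmoid(self.out(h))        # bounded mask in [0, 1]
            return mask * noisy                      # enhanced spectrogram

    class BPCRecognizer(nn.Module):
        """Hypothetical E2E recognizer emitting per-frame log-probabilities over
        broad phone classes plus a CTC blank symbol."""
        def __init__(self, n_feats=257, n_bpc=5, hidden=256):
            super().__init__()
            self.rnn = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, n_bpc + 1)  # +1 for the CTC blank

        def forward(self, feats):
            h, _ = self.rnn(feats)
            return self.out(h).log_softmax(dim=-1)   # (B, T, n_bpc + 1)

    se, asr = SEModel(), BPCRecognizer()
    ctc = nn.CTCLoss(blank=0)
    opt = torch.optim.Adam(list(se.parameters()) + list(asr.parameters()), lr=1e-4)

    def train_step(noisy, clean, bpc_targets, in_lens, tgt_lens, alpha=0.5):
        """One multitask update: signal-level SE loss plus a BPC recognition
        loss computed on the enhanced output, weighted by alpha (assumed)."""
        enhanced = se(noisy)
        se_loss = nn.functional.mse_loss(enhanced, clean)   # signal objective
        log_probs = asr(enhanced).transpose(0, 1)           # CTC expects (T, B, C)
        bpc_loss = ctc(log_probs, bpc_targets, in_lens, tgt_lens)
        loss = se_loss + alpha * bpc_loss                   # joint multitask loss
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()
    ```

    Because gradients from the BPC recognition loss flow back through the enhanced spectrogram, the SE model is pushed to preserve the articulatory (place/manner) structure that the BPC recognizer depends on, which is the intuition the abstract describes.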