Time-Domain Multi-modal Bone/air Conducted Speech Enhancement
Previous studies have shown that integrating video signals as a
complementary modality can improve speech enhancement (SE) performance.
However, video clips usually contain large amounts of data and are costly in
terms of computational resources, which may complicate the SE system. As an
alternative source, a bone-conducted speech signal has a moderate data size
while still manifesting speech-phoneme structures, and thus complements its
air-conducted counterpart. In this study, we propose a novel multi-modal SE
structure in the time domain that leverages both bone- and air-conducted
signals. In addition, we examine two ensemble-learning-based strategies, early
fusion (EF) and late fusion (LF), to integrate the two types of speech signals,
and adopt a deep-learning-based fully convolutional network to conduct the
enhancement. Experimental results on a Mandarin corpus indicate that the
proposed multi-modal SE structure (integrating bone- and air-conducted signals)
significantly outperforms its single-source counterparts (with a bone- or
air-conducted signal only) on various speech evaluation metrics. Moreover,
adopting the LF strategy rather than EF in this multi-modal structure yields
better results.
Comment: multi-modal, bone/air-conducted signals, speech enhancement, fully
convolutional network
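To make the difference between the two fusion strategies concrete, here is a toy sketch in Python; it is not the authors' model. It assumes a single convolution layer followed by a tanh nonlinearity as a stand-in for the fully convolutional network, and the function names (`conv1d`, `early_fusion`, `late_fusion`) and the fixed averaging weight used for late fusion are hypothetical. In early fusion, the bone- and air-conducted waveforms are mixed inside one network as input channels; in late fusion, each signal passes through its own branch and only the branch outputs are combined.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Same'-length 1-D cross-correlation with zero padding (odd kernel
    length), i.e. what deep-learning frameworks call a convolution."""
    half = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def early_fusion(air, bone, k_air, k_bone):
    """Hypothetical EF: both signals enter ONE network as two input channels;
    a single 2-in/1-out conv layer mixes them before the nonlinearity."""
    mixed = [a + b for a, b in zip(conv1d(air, k_air), conv1d(bone, k_bone))]
    return [math.tanh(v) for v in mixed]


def late_fusion(air, bone, k_air, k_bone, weight=0.5):
    """Hypothetical LF: each signal goes through its own one-layer branch,
    and only the branch outputs are combined (fixed weighted average here)."""
    y_air = [math.tanh(v) for v in conv1d(air, k_air)]
    y_bone = [math.tanh(v) for v in conv1d(bone, k_bone)]
    return [weight * a + (1.0 - weight) * b for a, b in zip(y_air, y_bone)]


# Usage with made-up samples and an identity kernel.
air = [0.2, -0.5, 0.9, 0.1]    # noisy air-conducted samples (illustrative)
bone = [0.1, -0.3, 0.6, 0.0]   # bone-conducted samples (illustrative)
identity = [0.0, 1.0, 0.0]
print(early_fusion(air, bone, identity, identity))
print(late_fusion(air, bone, identity, identity))
```

With a purely linear layer the two strategies would collapse into the same computation; in this sketch it is the nonlinearity applied after mixing (EF) versus before mixing (LF) that separates them. A trained LF system could learn the combination instead of using a fixed weight.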
A Single Channel End-to-End Speech Enhancement using Complex Operations
© Copyright 2021 The Authors. This paper investigates the possibility of using complex operations to perform the speech enhancement task in the time domain. To that end, the Hilbert transform is first used to prepare the complex input in the time domain. A complex temporal convolutional network (CTCN) is then developed to conduct complex convolutions. By cascading the TCN and CTCN modules, the proposed network forms an encoder-decoder structure that performs end-to-end speech enhancement. The results demonstrate that utilizing complex information in the time domain indeed improves enhancement performance. Compared with other approaches, the proposed network also demonstrates superior performance in terms of objective evaluations.
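The two building blocks named in this abstract can be sketched as follows, under stated assumptions. The complex input is built here as the analytic signal x + jH{x}, computed in the frequency domain (keep DC and, for even lengths, the Nyquist bin; double positive frequencies; zero negative ones). This is the standard construction, but it is assumed rather than confirmed to be exactly what the paper does. The complex convolution is expressed through four real convolutions, a common way to build complex layers from real-valued operations. All function names are hypothetical, the DFT is a naive O(N^2) one for self-containment, and the rest of the network (layer stacking, the TCN/CTCN encoder-decoder cascade) is omitted.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Hypothetical helper: returns x + j*H{x} via the frequency-domain mask."""
    n = len(x)
    spectrum = dft([complex(v) for v in x])
    h = [0.0] * n
    h[0] = 1.0                      # keep DC
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep Nyquist bin for even lengths
    for k in range(1, (n + 1) // 2):
        h[k] = 2.0                  # double positive frequencies
    return dft([s * g for s, g in zip(spectrum, h)], inverse=True)


def conv1d(x, kernel):
    """'Same'-length cross-correlation; works for real or complex values."""
    half = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0j
        for k, w in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def complex_conv1d(z, w):
    """Hypothetical complex conv from four real convs:
    (zr + j zi) * (wr + j wi) = (zr*wr - zi*wi) + j(zr*wi + zi*wr)."""
    zr, zi = [v.real for v in z], [v.imag for v in z]
    wr, wi = [v.real for v in w], [v.imag for v in w]
    real = [a - b for a, b in zip(conv1d(zr, wr), conv1d(zi, wi))]
    imag = [a + b for a, b in zip(conv1d(zr, wi), conv1d(zi, wr))]
    return [complex(r.real, i.real) for r, i in zip(real, imag)]


# Usage: complex input from a real waveform, then one complex conv layer.
x = [math.cos(2 * math.pi * t / 8) for t in range(8)]
z = analytic_signal(x)                      # real part ~ cos, imag part ~ sin
w = [0.5 + 0.1j, 1.0 - 0.2j, 0.5 + 0.1j]    # made-up complex kernel
y = complex_conv1d(z, w)
```

For a pure cosine, the imaginary part of the analytic signal is the matching sine, a quick sanity check on the Hilbert step. The split form of `complex_conv1d` matches direct complex arithmetic, which is what lets a complex layer be implemented with real-valued convolutions.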