Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Authors: Fu Szu-Wei, Hung Jeih-weih, Hung Kuo-Hsuan, Tsao Yu, Wang Syu-Siang, Yu Cheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources, and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while manifesting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experimental results on the Mandarin corpus indicate that this newly presented multi-modal SE structure (integrating bone- and air-conducted signals) significantly outperforms the single-source SE counterparts (with a bone- or air-conducted signal only) in various speech evaluation metrics. In addition, adopting an LF strategy rather than an EF strategy in this novel multi-modal SE structure achieves better results.

Comment: multi-modal, bone/air-conducted signals, speech enhancement, fully convolutional network
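The difference between early and late fusion can be illustrated with a toy sketch. This is not the authors' network: the function names (`conv1d`, `early_fusion`, `late_fusion`), the single convolution layer per branch, the fixed kernels, and the weighted-sum merge are all assumptions chosen for illustration. Early fusion stacks the air- and bone-conducted waveforms as input channels of one shared layer; late fusion runs each waveform through its own branch and merges the branch outputs.

```python
def conv1d(channels, kernels):
    """Zero-padded, same-length 1-D cross-correlation summed over input channels."""
    length = len(channels[0])
    half = len(kernels[0]) // 2
    out = [0.0] * length
    for signal, kernel in zip(channels, kernels):
        for t in range(length):
            for i, w in enumerate(kernel):
                s = t + i - half
                if 0 <= s < length:
                    out[t] += w * signal[s]
    return out


def relu(values):
    """Element-wise rectifier, standing in for a network non-linearity."""
    return [max(0.0, v) for v in values]


def early_fusion(air, bone, kernels):
    # Assumption: both waveforms are stacked as channels of one shared layer.
    return relu(conv1d([air, bone], kernels))


def late_fusion(air, bone, air_kernel, bone_kernel, merge_weights):
    # Assumption: each modality has its own branch; outputs are merged by a weighted sum.
    air_out = relu(conv1d([air], [air_kernel]))
    bone_out = relu(conv1d([bone], [bone_kernel]))
    w_air, w_bone = merge_weights
    return [w_air * a + w_bone * b for a, b in zip(air_out, bone_out)]


# Hypothetical toy waveforms and untrained kernels.
air = [1.0, 2.0, 3.0, 4.0]
bone = [0.5, 0.5, 0.5, 0.5]
identity = [0.0, 1.0, 0.0]

ef = early_fusion(air, bone, [identity, identity])
lf = late_fusion(air, bone, identity, identity, (0.5, 0.5))
```

In a trained system the kernels and merge weights would be learned, and each branch would be a deeper fully convolutional stack; the sketch only shows where the two modalities meet in each strategy.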

arXiv.org e-Print Archive

Crossref

A Single Channel End-to-End Speech Enhancement using Complex Operations

Authors: Gan L, Liu H, Wu J, Zhou Y
Publication venue: IOP Press
Publication date: 01/03/2022

© Copyright 2021 The Authors. This paper investigates the possibility of using complex operations to perform the speech enhancement task in the time domain. To that end, the Hilbert transform is first used to prepare the complex input in the time domain. A complex temporal convolutional network (CTCN) is then developed to conduct complex convolutions. By cascading the TCN and CTCN modules, the final proposed network forms an encoder-decoder structure that performs end-to-end speech enhancement. The results demonstrate that utilizing complex information in the time domain indeed improves the enhancement performance. Compared to other approaches, the proposed network also demonstrates superior performance in terms of objective evaluations.
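As a rough illustration of how a Hilbert transform can turn a real waveform into a complex time-domain input, the sketch below builds the analytic signal (the real signal plus j times its Hilbert transform) with a plain discrete Fourier transform. The function name `analytic_signal` and the test tone are assumptions for illustration; the abstract does not say how the authors compute the transform, and a practical system would use an FFT-based routine rather than this O(n²) version.

```python
import cmath
import math


def analytic_signal(x):
    """Return x + j*H{x}, where H is the discrete Hilbert transform."""
    n = len(x)
    # Forward DFT (quadratic cost, acceptable for a short illustration).
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical test tone: a cosine with 2 cycles over 16 samples.
n = 16
signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
z = analytic_signal(signal)
```

For this cosine, the real part of `z` reproduces the input and the imaginary part is the matching sine, so each sample carries both amplitude and phase information that a complex-valued network could consume.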

Brunel University Research Archive