Advanced deep neural networks for speech separation and enhancement

Abstract

Ph.D. thesis. Monaural speech separation and enhancement aim to remove noise interference from a noisy speech mixture recorded by a single microphone, a setting in which no spatial information is available. Deep neural networks (DNNs) dominate speech separation and enhancement. However, DNN-based methods still face challenges, including choosing proper training targets and network structures, improving generalization ability and model capacity for unseen speakers and noises, and mitigating reverberation in room environments. This thesis focuses on improving separation and enhancement performance in real-world environments. The first contribution of this thesis addresses monaural speech separation and enhancement in reverberant room environments by designing new training targets and advanced network structures. The second contribution improves enhancement performance by proposing a multi-scale feature recalibration convolutional bidirectional gated recurrent unit (GRU) network (MCGN). The third contribution improves the model capacity of the network while retaining robust enhancement performance. To this end, a convolutional fusion network (CFN) is proposed, which exploits the group convolutional fusion unit (GCFU). The proposed speech enhancement methods are evaluated on various challenging datasets. The proposed methods are assessed with the state-of-the-art techniques and performance measures to confirm that this thesis contributes novel solution

This paper was published in Newcastle University eTheses.
