Interactive Speech and Noise Modeling for Speech Enhancement
Speech enhancement is challenging because of the diversity of background
noise types. Most existing methods focus on modeling the speech
rather than the noise. In this paper, we propose a novel idea to model speech
and noise simultaneously in a two-branch convolutional neural network, namely
SN-Net. In SN-Net, the two branches predict speech and noise, respectively.
Instead of information fusion only at the final output layer, interaction
modules are introduced at several intermediate feature domains between the two
branches so that each benefits the other. Such an interaction can leverage features
learned from one branch to counteract the undesired part and restore the
missing component of the other and thus enhance their discrimination
capabilities. We also design a feature extraction module, namely
residual-convolution-and-attention (RA), to capture the correlations along
temporal and frequency dimensions for both speech and noise.
Evaluations on public datasets show that the interaction module plays a key
role in simultaneous modeling and the SN-Net outperforms the state-of-the-art
by a large margin on various evaluation metrics. The proposed SN-Net also shows
superior performance for speaker separation.
Comment: AAAI 2021 (Accepted
- …
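
The interaction between the speech and noise branches described in the abstract can be illustrated with a small sketch. The abstract does not give the module's equations, so everything here is an assumption made for illustration: the function name `interact`, the scalar weights, and the sigmoid gate are hypothetical, and the features are reduced to single-channel time-frequency maps stored as nested lists. The sketch only shows the general idea of gating one branch's features and adding them to the other branch; it is not the SN-Net implementation.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def interact(target, other, w_target, w_other, bias):
    """Hypothetical interaction step between two branches.

    A gate is computed from both feature maps, applied element-wise to
    `other`, and the gated result is added to `target`. The scalar weights
    stand in for learned parameters (an assumption, not from the paper).
    """
    out = []
    for t_row, o_row in zip(target, other):
        row = []
        for t, o in zip(t_row, o_row):
            mask = sigmoid(w_target * t + w_other * o + bias)
            row.append(t + mask * o)
        out.append(row)
    return out


# Toy feature maps of shape (time=2, frequency=3); values are arbitrary.
speech_feat = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
noise_feat = [[1.0, 0.5, -2.0], [0.0, 1.0, 0.5]]

# Both directions read the features from before the exchange,
# so the two branches are updated simultaneously.
new_speech = interact(speech_feat, noise_feat, 0.3, -0.2, 0.0)
new_noise = interact(noise_feat, speech_feat, -0.1, 0.4, 0.0)
```

In this sketch, applying the step at several depths of a two-branch network would correspond to the "several intermediate feature domains" mentioned in the abstract; a real model would replace the scalar weights with learned convolutional layers.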