DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Speech enhancement has benefited from the success of deep learning in terms
of intelligibility and perceptual quality. Conventional time-frequency (TF)
domain methods focus on predicting TF masks or the speech spectrum via a naive
convolutional neural network (CNN) or recurrent neural network (RNN). Some recent
studies use the complex-valued spectrogram as a training target but train a
real-valued network, predicting either the magnitude and phase components or the
real and imaginary parts. In particular, the convolution recurrent network (CRN)
integrates a convolutional encoder-decoder (CED) structure and long short-term
memory (LSTM), which has been proven to be helpful for complex targets. To
train on complex targets more effectively, in this paper we design a new
network structure that simulates complex-valued operations, called the Deep
Complex Convolution Recurrent Network (DCCRN), in which both the CNN and RNN
structures can handle complex-valued operations. The proposed DCCRN models are
very competitive with previous networks on both objective and subjective
metrics. With only 3.7M parameters, our DCCRN models submitted to the
Interspeech 2020 Deep Noise Suppression (DNS) challenge ranked first for the
real-time track and second for the non-real-time track in terms of Mean Opinion
Score (MOS).

Comment: Accepted by Interspeech 202
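
The abstract describes a network that simulates complex-valued operations, but it gives no formula. As a minimal sketch of the general idea, and not the authors' implementation, the example below builds a complex 1-D convolution from four real-valued convolutions using the standard identity (a + ib)(c + id) = (ac - bd) + i(ad + bc). The function names `real_conv1d` and `complex_conv1d` and the sample values are hypothetical, chosen only for illustration.

```python
def real_conv1d(x, w):
    """Full 1-D convolution of two real-valued sequences."""
    out = [0.0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i + j] += xi * wj
    return out


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution assembled from real-valued convolutions only.

    Assumption: this mirrors ordinary complex multiplication; the
    abstract itself does not specify how DCCRN implements it.
    """
    rr = real_conv1d(x_re, w_re)  # real input with real kernel
    ii = real_conv1d(x_im, w_im)  # imaginary input with imaginary kernel
    ri = real_conv1d(x_re, w_im)  # real input with imaginary kernel
    ir = real_conv1d(x_im, w_re)  # imaginary input with real kernel
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


# Hypothetical input signal and kernel, stored as separate real/imaginary parts.
x_re, x_im = [1.0, 2.0, -1.0], [0.5, -1.0, 2.0]
w_re, w_im = [1.0, -2.0], [3.0, 0.5]
out_re, out_im = complex_conv1d(x_re, x_im, w_re, w_im)

# Reference result computed with Python's built-in complex type.
x = [complex(a, b) for a, b in zip(x_re, x_im)]
w = [complex(a, b) for a, b in zip(w_re, w_im)]
reference = [0j] * (len(x) + len(w) - 1)
for i, xi in enumerate(x):
    for j, wj in enumerate(w):
        reference[i + j] += xi * wj

# Largest deviation between the simulated and the native complex result.
max_error = max(
    abs(complex(r, m) - ref) for r, m, ref in zip(out_re, out_im, reference)
)
```

Keeping the real and imaginary parts in separate real-valued arrays is what lets such an operation run on layers that only support real arithmetic; the comparison against the native complex result checks that the two formulations agree.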