UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition
Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is
a very challenging problem that has rarely been investigated in previous work. This
paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net
and generative adversarial learning to deal with this problem. This approach
consists of a generator network and a discriminator network, which operate
directly in the time domain. The generator network adopts a U-Net-like
structure and employs dilated convolution in its bottleneck. We evaluate
the performance of UNetGAN at low SNR conditions (down to -20 dB) on a
public benchmark. The results demonstrate that it significantly improves
speech quality and substantially outperforms representative deep learning
models, including SEGAN, cGAN for SE, Bidirectional LSTM using a phase-sensitive
spectrum approximation cost function (PSA-BLSTM), and Wave-U-Net, in terms of
Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech
Quality (PESQ).
Comment: Published in Interspeech 201
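
To make the -20 dB condition mentioned in the abstract concrete, the sketch below mixes a clean signal with noise at a chosen SNR. It is only an illustration of the standard SNR definition, not the paper's data pipeline: the function names mix_at_snr and measured_snr_db, the 16 kHz sample rate, and the synthetic sine and Gaussian signals are all assumptions made for this example, and the noise is assumed to be at least as long as the speech.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add it.

    Hypothetical helper for illustration; assumes len(noise) >= len(speech)
    and that neither signal is all zeros.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


def measured_snr_db(speech, noise):
    """Return the SNR in dB between a speech signal and a noise signal."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))


# Synthetic stand-ins: a 440 Hz tone for "speech" and Gaussian noise.
sample_rate = 16000  # assumed sample rate, not taken from the paper
t = np.arange(sample_rate) / sample_rate
speech = 0.1 * np.sin(2.0 * np.pi * 440.0 * t)
noise = np.random.default_rng(0).standard_normal(sample_rate)

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=-20.0)
print(measured_snr_db(speech, scaled_noise))
```

At -20 dB the noise carries 100 times the power of the speech, which gives a sense of why this regime is described as extremely low SNR.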