Search CORE

2 research outputs found

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Author: Huang Huixiang
Huang Jingbiao
Lin Jucai
Wu Renjie
Yin Jun
Publication venue
Publication date: 07/03/2021
Field of study

Generative adversarial network (GAN) still exists some problems in dealing with speech enhancement (SE) task. Some GAN-based systems adopt the same structure from Pixel-to-Pixel directly without special optimization. The importance of the generator network has not been fully explored. Other related researches change the generator network but operate in the time-frequency domain, which ignores the phase mismatch problem. In order to solve these problems, a deep complex convolution recurrent GAN (DCCRGAN) structure is proposed in this paper. The complex module builds the correlation between magnitude and phase of the waveform and has been proved to be effective. The proposed structure is trained in an end-to-end way. Different LSTM layers are used in the generator network to sufficiently explore the speech enhancement performance of DCCRGAN. The experimental results confirm that the proposed DCCRGAN outperforms the state-of-the-art GAN-based SE systems

arXiv.org e-Print Archive

Tdcgan: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement

Author: Hu Xinhui
Xu Xinkang
Ye Shuaishuai
Publication venue
Publication date: 30/09/2020
Field of study

In this paper, in order to further deal with the performance degradation caused by ignoring the phase information in conventional speech enhancement systems, we proposed a temporal dilated convolutional generative adversarial network (TDCGAN) in the end-to-end based speech enhancement architecture. For the first time, we introduced the temporal dilated convolutional network with depthwise separable convolutions into the GAN structure so that the receptive field can be greatly increased without increasing the number of parameters. We also first explored the effect of signal-to-noise ratio (SNR) penalty item as regularization of the loss function of generator on improving the SNR of enhanced speech. The experimental results demonstrated that our proposed method outperformed the state-of-the-art end-to-end GAN-based speech enhancement. Moreover, compared with previous GAN-based methods, the proposed TDCGAN could greatly decreased the number of parameters. As expected, the work also demonstrated that the SNR penalty item as regularization was more effective than

L1

on improving the SNR of enhanced speech

arXiv.org e-Print Archive