DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement
Generative adversarial networks (GANs) still have some problems in dealing
with the speech enhancement (SE) task. Some GAN-based systems adopt the same
structure as Pixel-to-Pixel directly, without special optimization. The
importance of the generator network has not been fully explored. Other related
studies change the generator network but operate in the time-frequency
domain, which ignores the phase mismatch problem. In order to solve these
problems, a deep complex convolution recurrent GAN (DCCRGAN) structure is
proposed in this paper. The complex module builds the correlation between
magnitude and phase of the waveform and has been shown to be effective. The
proposed structure is trained in an end-to-end way. Different LSTM layers are
used in the generator network to sufficiently explore the speech enhancement
performance of DCCRGAN. The experimental results confirm that the proposed
DCCRGAN outperforms the state-of-the-art GAN-based SE systems.
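
The abstract above says the complex module couples magnitude and phase. As a rough illustration only, a complex-valued convolution can be built from four real-valued convolutions using (a + jb)(c + jd) = (ac - bd) + j(ad + bc). The abstract does not specify the layers, so the 1-D setting and the function names `conv1d` and `complex_conv1d` are assumptions made for this sketch, not the paper's implementation.

```python
def conv1d(x, w):
    """Valid-mode 1-D cross-correlation of two real sequences (stride 1)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution expressed with four real convolutions.

    Hypothetical helper for illustration: real and imaginary parts are
    kept as separate real-valued sequences, as a network would store them.
    """
    rr = conv1d(x_re, w_re)  # real input, real kernel
    ii = conv1d(x_im, w_im)  # imaginary input, imaginary kernel
    ri = conv1d(x_re, w_im)  # real input, imaginary kernel
    ir = conv1d(x_im, w_re)  # imaginary input, real kernel
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im
```

Because each output part mixes both input parts, the real and imaginary components (and therefore magnitude and phase) are processed jointly rather than by two independent real-valued networks.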
TDCGAN: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement
In this paper, in order to further deal with the performance degradation
caused by ignoring the phase information in conventional speech enhancement
systems, we proposed a temporal dilated convolutional generative adversarial
network (TDCGAN) for end-to-end speech enhancement. For
the first time, we introduced the temporal dilated convolutional network with
depthwise separable convolutions into the GAN structure so that the receptive
field can be greatly increased without increasing the number of parameters. We
were also the first to explore the effect of a signal-to-noise ratio (SNR) penalty
term as regularization in the generator's loss function on improving the SNR of
enhanced speech. The experimental results demonstrated that our proposed method
outperformed state-of-the-art end-to-end GAN-based speech enhancement methods.
Moreover, compared with previous GAN-based methods, the proposed TDCGAN
greatly decreased the number of parameters. As expected, the work also
demonstrated that the SNR penalty term as regularization was more effective
than on improving the SNR of enhanced speech.
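
Two ideas in the abstract above can be made concrete with a small sketch: how dilation enlarges the receptive field without adding weights, how depthwise separable convolutions reduce the weight count, and what an SNR-based penalty might look like. The abstract gives no layer sizes, dilation schedule, or exact loss formula, so every function name, the `weight` parameter, and the form "adversarial loss minus weighted SNR" are assumptions for illustration, not the authors' method.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def standard_conv_params(c_in, c_out, kernel_size):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * kernel_size


def separable_conv_params(c_in, c_out, kernel_size):
    """Weight count of a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = c_in * kernel_size  # one filter per input channel
    pointwise = c_in * c_out        # 1x1 convolution that mixes channels
    return depthwise + pointwise


def snr_db(clean, enhanced, eps=1e-12):
    """SNR in dB of an enhanced signal measured against the clean reference."""
    signal = sum(s * s for s in clean)
    noise = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10(signal / (noise + eps))


def generator_loss_with_snr_penalty(adv_loss, clean, enhanced, weight=0.1):
    """Hypothetical regularized loss: a higher SNR lowers the generator loss."""
    return adv_loss - weight * snr_db(clean, enhanced)
```

Note that `receptive_field` depends on the dilation rates while neither parameter-count function does, which is the sense in which dilation widens the context at no cost in weights.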