Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Complex spectrum and magnitude are considered two major features for speech
enhancement and dereverberation. Traditional approaches always treat these two
features separately, ignoring their underlying relationship. In this paper, we
propose Uformer, a Unet based dilated complex & real dual-path conformer
network in both complex and magnitude domain for simultaneous speech
enhancement and dereverberation. We exploit time attention (TA) and dilated
convolution (DC) to leverage local and global contextual information, and
frequency attention (FA) to model dimensional information. These three
sub-modules contained in the proposed dilated complex & real dual-path
conformer module effectively improve the speech enhancement and dereverberation
performance. Furthermore, a hybrid encoder and decoder are adopted to
simultaneously model the complex spectrum and magnitude and to promote
information interaction between the two domains. Encoder-decoder attention is
also applied to enhance the interaction between encoder and decoder. In our
experiments, Uformer outperforms all SOTA time-domain and complex-domain models
both objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on
the blind test set of the Interspeech 2021 DNS Challenge, outperforming all
top-performing models. We also carry out ablation experiments to identify which
of the proposed sub-modules are most important.
Comment: Accepted by ICASSP 202
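
The abstract names three sub-modules (time attention, dilated convolution, frequency attention) without giving their equations. The following is a minimal NumPy sketch of how such a block could be wired on a real-valued feature map of shape (frames, frequency bins, channels). It is not the authors' implementation: the function names, the identity query/key/value projections, the fixed smoothing kernel, the residual connections, and the ordering TA, DC, FA are all assumptions made for illustration, and the complex-domain path and encoder-decoder attention are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Scaled dot-product self-attention over the second-to-last axis.
    # x: (..., L, C). Projections are identity (assumption, for brevity).
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # (..., L, L)
    return softmax(scores, axis=-1) @ x

def time_attention(feat):
    # feat: (T, F, C). Attend across frames, independently per frequency bin.
    per_bin = np.swapaxes(feat, 0, 1)                  # (F, T, C)
    return np.swapaxes(self_attention(per_bin), 0, 1)  # (T, F, C)

def frequency_attention(feat):
    # feat: (T, F, C). Attend across frequency bins, independently per frame.
    return self_attention(feat)

def dilated_conv_time(feat, kernel, dilation):
    # Depthwise 1-D convolution along time with zero "same" padding.
    # kernel: (K,) with K odd; taps are spaced `dilation` frames apart.
    num_taps = len(kernel)
    pad = dilation * (num_taps // 2)
    padded = np.pad(feat, ((pad, pad), (0, 0), (0, 0)))
    num_frames = feat.shape[0]
    out = np.zeros(feat.shape, dtype=float)
    for k, weight in enumerate(kernel):
        start = k * dilation
        out += weight * padded[start:start + num_frames]
    return out

def ta_dc_fa_block(feat, dilation=2):
    # Hypothetical composition: TA, then DC, then FA, each with a residual.
    kernel = np.array([0.25, 0.5, 0.25])  # illustrative fixed weights
    x = feat + time_attention(feat)
    x = x + dilated_conv_time(x, kernel, dilation)
    x = x + frequency_attention(x)
    return x

features = np.random.default_rng(0).standard_normal((6, 5, 4))
output = ta_dc_fa_block(features)  # same shape as the input: (6, 5, 4)
```

In this sketch, time attention lets every frame see every other frame within a frequency bin, the dilated convolution widens the temporal receptive field without adding kernel taps, and frequency attention mixes information across bins within a frame. A trained model would replace the identity projections and fixed kernel with learned parameters.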