Deep neural network solutions have emerged as a new and powerful paradigm for
speech enhancement (SE). The ability to capture long-range context and extract
multi-scale patterns is crucial for designing effective SE networks. These
capabilities, however, often conflict with the goal of keeping networks
compact to ensure good generalization. In this paper, we
explore dilation operations and apply them to fully convolutional networks
(FCNs) to address this issue. Dilations equip the networks with greatly
expanded receptive fields, without increasing the number of parameters.
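To make this concrete, here is a minimal sketch (in PyTorch, purely for illustration; the layer sizes are assumptions, not taken from the paper) showing that a dilated 1-D convolution widens the receptive field while keeping the parameter count of an ordinary convolution with the same kernel size:

```python
import torch
import torch.nn as nn

# Two 1-D convolutions with identical channel counts and kernel size.
# The dilated variant spaces its 3 taps 4 samples apart, so it spans
# 9 samples of context instead of 3.
standard = nn.Conv1d(16, 16, kernel_size=3, padding=1)
dilated = nn.Conv1d(16, 16, kernel_size=3, padding=4, dilation=4)

x = torch.randn(1, 16, 1000)  # (batch, channels, time)
assert standard(x).shape == dilated(x).shape  # same output shape

# Dilation inserts gaps between taps rather than adding weights,
# so the parameter counts match: 16*16*3 weights + 16 biases = 784.
assert sum(p.numel() for p in standard.parameters()) == \
       sum(p.numel() for p in dilated.parameters())
```

Stacking such layers with growing dilation rates lets the receptive field expand exponentially with depth at a fixed per-layer parameter cost.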
Different strategies for fusing multi-scale dilations, as well as for placing
the dilation modules within the network, are explored in this work.
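As one hypothetical example of such a fusion strategy (an illustrative assumption; the paper's actual fusion and placement schemes may differ), parallel branches with different dilation rates can be concatenated and projected back with a 1x1 convolution:

```python
import torch
import torch.nn as nn

class MultiScaleDilationBlock(nn.Module):
    """Fuse parallel dilated branches by concatenation followed by a
    1x1 projection. The dilation rates here are illustrative choices."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=r, dilation=r)  # padding keeps length fixed
            for r in rates
        ])
        # Project the concatenated multi-scale features back to `channels`.
        self.fuse = nn.Conv1d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale)

block = MultiScaleDilationBlock(channels=16)
y = block(torch.randn(1, 16, 1000))  # output shape preserved: (1, 16, 1000)
```

Summing the branches or weighting them adaptively would be equally plausible fusion choices under the same parallel-branch layout.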
Using the Noisy VCTK and AzBio sentence datasets, we demonstrate that the
proposed dilation models significantly improve over the baseline FCN and
outperform state-of-the-art SE solutions.