1 research outputs found
An empirical study of Conv-TasNet
Conv-TasNet is a recently proposed waveform-based deep neural network that
achieves state-of-the-art performance in speech source separation. Its
architecture consists of a learnable encoder/decoder and a separator that
operates on top of this learned space. Various improvements have been proposed
to Conv-TasNet. However, they mostly focus on the separator, leaving its
encoder/decoder as a (shallow) linear operator. In this paper, we conduct an
empirical study of Conv-TasNet and propose an enhancement to the
encoder/decoder that is based on a (deep) non-linear variant of it. In
addition, we experiment with the larger and more diverse LibriTTS dataset and
investigate the generalization capabilities of the studied models when trained
on a much larger dataset. We propose cross-dataset evaluation that includes
assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our
results show that enhancements to the encoder/decoder can improve average
SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the
generalization capabilities of Conv-TasNet and the potential value of
improvements to the encoder/decoder.Comment: In proceedings of ICASSP202