This paper deals with the problem of audio source separation. To handle the
complex and ill-posed nature of the problems of audio source separation, the
current state-of-the-art approaches employ deep neural networks to obtain
instrumental spectra from a mixture. In this study, we propose a novel network
architecture that extends the recently developed densely connected
convolutional network (DenseNet), which has shown excellent results on image
classification tasks. To deal with the specific problem of audio source
separation, an up-sampling layer, block skip connection and band-dedicated
dense blocks are incorporated on top of DenseNet. The proposed approach takes
advantage of long contextual information and outperforms state-of-the-art
results on SiSEC 2016 competition by a large margin in terms of
signal-to-distortion ratio. Moreover, the proposed architecture requires
significantly fewer parameters and considerably less training time compared
with other methods.Comment: to appear at WASPAA 201