End-to-end Source Separation with Adaptive Front-Ends
Source separation and other audio applications have traditionally relied on
the use of short-time Fourier transforms as a front-end frequency domain
representation step. The unavailability of a neural network equivalent to
forward and inverse transforms hinders the implementation of end-to-end
learning systems for these applications. We present an auto-encoder neural
network that can act as an equivalent to short-time front-end transforms. We
demonstrate the ability of the network to learn optimal, real-valued basis
functions directly from the raw waveform of a signal and further show how it
can be used as an adaptive front-end for supervised source separation. In terms
of separation performance, these transforms significantly outperform their
Fourier counterparts. Finally, we also propose a novel source-to-distortion
ratio (SDR) based cost function for end-to-end source separation.
Comment: 4 figures, 4 pages
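The SDR-based cost the abstract mentions can be illustrated with a minimal sketch. This is an assumption about the general form of such a loss (negative SDR in dB between an estimate and a reference), not the paper's exact formulation; the function name `neg_sdr_loss` and the test signals are illustrative.

```python
import numpy as np

def neg_sdr_loss(estimate, reference, eps=1e-8):
    """Negative source-to-distortion ratio (dB).

    SDR = 10 * log10(||reference||^2 / ||reference - estimate||^2);
    negating it gives a quantity to minimize during training.
    """
    distortion = reference - estimate
    sdr = 10.0 * np.log10(
        (np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )
    return -sdr

# Toy example: a clean sinusoid vs. a noisy estimate of it.
t = np.linspace(0.0, 1.0, 8000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.randn(t.size)
# A better estimate yields a lower (more negative) loss.
```

In an end-to-end system this loss would be computed on the raw output waveform, so gradients flow through both the separator and the learned front-end transforms.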
Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
We propose a time-domain audio source separation method using down-sampling
(DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT).
The proposed method is based on one of the state-of-the-art deep neural
networks, Wave-U-Net, which successively down-samples and up-samples feature
maps. We find that this architecture resembles that of multiresolution
analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may
discard information useful for the separation. Although the effects of these
problems may be reduced by training, to achieve a more reliable source
separation method, we should design DS layers capable of overcoming the
problems. Motivated by this, and by the fact that the DWT provides an
anti-aliasing filter and the perfect reconstruction property, we design the
proposed layers. Experiments on music source separation show the efficacy of
the proposed method and the importance of simultaneously considering the
anti-aliasing filters and the perfect reconstruction property.
Comment: 5 pages, to appear in IEEE International Conference on Acoustics,
Speech, and Signal Processing 2020 (ICASSP 2020)
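The perfect reconstruction property the abstract relies on can be sketched with the simplest DWT, the Haar wavelet: the analysis step halves the length (a down-sampling layer), and the synthesis step recovers the input exactly (an up-sampling layer). This is a minimal illustration of the property, not the paper's actual Wave-U-Net layers; the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """Analysis step: split x (even length) into half-length
    approximation (low-pass) and detail (high-pass) coefficients."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass / down-sample
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass detail
    return a, d

def haar_idwt(a, d):
    """Synthesis step: invert haar_dwt exactly (perfect reconstruction)."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

x = np.random.randn(16)
a, d = haar_dwt(x)
x_rec = haar_idwt(a, d)
# Round trip is exact up to floating-point error.
```

Unlike plain decimation, the low-pass filter here suppresses the frequencies that would otherwise alias, and keeping the detail band means no information is discarded, which is the motivation for DWT-based DS/US layers.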