Audio Source Separation with Discriminative Scattering Networks
In this report we describe an ongoing line of research for solving
single-channel source separation problems. Many monaural signal decomposition
techniques proposed in the literature operate on a feature space consisting of
a time-frequency representation of the input data. A challenge faced by these
approaches is to effectively exploit the temporal dependencies of the signals
at scales larger than the duration of a time-frame. In this work we propose to
tackle this problem by modeling the signals using a time-frequency
representation with multiple temporal resolutions. The proposed representation
consists of a pyramid of wavelet scattering operators, which generalizes
Constant Q Transforms (CQT) with extra layers of convolution and complex
modulus. We first show that learning standard models with this multi-resolution
setting improves source separation results over fixed-resolution methods. As
a case study, we use Non-Negative Matrix Factorization (NMF), which has been
widely considered in many audio applications. Then, we investigate the inclusion
of the proposed multi-resolution setting into a discriminative training regime.
We discuss several alternatives using different deep neural network
architectures.
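
As a rough illustration of the NMF baseline mentioned in the abstract, the sketch below factorizes a generic non-negative time-frequency matrix with the standard multiplicative updates for the Euclidean cost, then splits it into sources with soft masks. It is not the authors' implementation and does not compute scattering coefficients: the names `nmf` and `separate`, the toy matrix, and the assignment of components to sources are all assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x frames) as W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    Hypothetical helper, not taken from the report.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps  # dictionary (spectral templates)
    H = rng.random((rank, n_frames)) + eps    # activations over time
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def separate(V, W, H, groups, eps=1e-9):
    """Split V into one estimate per group of components using soft masks."""
    full = W @ H + eps
    return [V * ((W[:, g] @ H[g, :]) / full) for g in groups]


# Toy data standing in for a magnitude time-frequency representation.
rng = np.random.default_rng(1)
V_toy = rng.random((6, 8)) + 0.1

W_init, H_init = nmf(V_toy, rank=4, n_iter=0)  # random initialization only
W_fit, H_fit = nmf(V_toy, rank=4, n_iter=200)

err_init = np.linalg.norm(V_toy - W_init @ H_init)
err_fit = np.linalg.norm(V_toy - W_fit @ H_fit)

# Assumption: components 0-1 belong to one source, 2-3 to the other.
# In a supervised setting each source's templates would be learned
# from separate training data.
estimates = separate(V_toy, W_fit, H_fit, groups=[[0, 1], [2, 3]])
```

Because the component groups partition the dictionary, the masks sum to one at each time-frequency bin, so the source estimates add back up to the input matrix. The same factorization could in principle be applied to any non-negative representation, including the multi-resolution one the abstract proposes.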