Audio Source Separation with Discriminative Scattering Networks

C Févotte; DD Lee; E Vincent; J Bruna; J Han; J Mairal; P Smaragdis; S Mallat

Authors: C Févotte; DD Lee; E Vincent; J Bruna; J Han; J Mairal; P Smaragdis; S Mallat
Publication date: 27 April 2015

Abstract

In this report we describe an ongoing line of research for solving single-channel source separation problems. Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. The proposed representation consists of a pyramid of wavelet scattering operators, which generalizes the Constant Q Transform (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures.
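
To make the NMF case study more concrete, the following is a minimal sketch of dictionary-based separation on a nonnegative time-frequency matrix. It is not the authors' implementation: the Euclidean multiplicative updates, the soft-mask reconstruction, the toy data, and the function names `nmf` and `separate` are all assumptions chosen for illustration. The input matrix could be a magnitude spectrogram, CQT magnitudes, or scattering coefficients; the sketch does not compute any of these.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate nonnegative V (features x frames) as W @ H.

    Uses Euclidean multiplicative updates (an assumption of this sketch).
    If a dictionary W is supplied, it stays fixed, `rank` is ignored,
    and only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200):
    """Split V_mix into two nonnegative estimates using fixed dictionaries."""
    W = np.hstack([W_a, W_b])
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_a.shape[1]
    V_a = W_a @ H[:k]              # model of source A
    V_b = W_b @ H[k:]              # model of source B
    mask_a = V_a / (V_a + V_b + EPS)  # soft (Wiener-like) mask in [0, 1]
    return mask_a * V_mix, (1.0 - mask_a) * V_mix


# Toy data (hypothetical): two sources occupying disjoint feature bands.
rng = np.random.default_rng(1)
n_feat, n_frames, rank = 8, 50, 2
W_a_true = np.zeros((n_feat, rank))
W_b_true = np.zeros((n_feat, rank))
W_a_true[:4] = rng.random((4, rank)) + 0.1
W_b_true[4:] = rng.random((4, rank)) + 0.1
V_a = W_a_true @ (rng.random((rank, n_frames)) + 0.1)
V_b = W_b_true @ (rng.random((rank, n_frames)) + 0.1)

# Learn one dictionary per source from isolated training data,
# then separate the mixture with both dictionaries held fixed.
W_a, _ = nmf(V_a, rank)
W_b, _ = nmf(V_b, rank, seed=2)
est_a, est_b = separate(V_a + V_b, W_a, W_b)
```

Because the two masks sum to one, the two estimates add back up to the mixture by construction. In this toy setup the sources do not overlap across features, so the separation is nearly exact; real audio sources overlap, and the quality then depends on how well the dictionaries discriminate between them.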

Available Versions

Crossref

info:doi/10.1007%2F978-3-319-2...
