Search CORE

17 research outputs found

Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction

Author: Dixon Simon
Ewert Sebastian
Stoller Daniel
Publication date: 06/04/2018

The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. With only few datasets available, often extensive data augmentation is used to combat overfitting. Mixing random tracks, however, can even reduce separation performance as instruments in real music are strongly correlated. The key concept in our approach is that source estimates of an optimal separator should be indistinguishable from real source signals. Based on this idea, we drive the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples. This way, we can also use unpaired source and mixture recordings without the drawbacks of creating unrealistic music mixtures. Our framework is widely applicable as it does not assume a specific network architecture or number of sources. To our knowledge, this is the first adoption of adversarial training for music source separation. In a prototype experiment for singing voice separation, separation performance increases with our approach compared to purely supervised training.

Comment: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Implementation available at https://github.com/f90/AdversarialAudioSeparatio
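The adversarial idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state which loss the authors use, so this example assumes the standard GAN cross-entropy objective as a stand-in; the function names and the `alpha` weight are hypothetical and are not taken from the paper or its repository. The inputs are discriminator scores in (0, 1), where 1 means "looks like a real source".

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    """The discriminator should score real sources as 1 and separator outputs as 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def separator_loss(d_fake, supervised_error=0.0, alpha=1.0):
    """The separator is rewarded when its outputs are scored as real.

    supervised_error: optional reconstruction error on paired data (assumed).
    alpha: hypothetical weight of the adversarial term.
    """
    adversarial = sum(bce(p, 1) for p in d_fake) / len(d_fake)
    return supervised_error + alpha * adversarial

# Scores the discriminator assigned to real vocals and to separator outputs.
print(discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1]))
print(separator_loss(d_fake=[0.2, 0.1]))
```

Because the adversarial term needs only separator outputs and unpaired real sources, it can be evaluated on mixtures that have no ground-truth stems, which is what makes the semi-supervised setting possible.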

arXiv.org e-Print Archive

Crossref

ADVERSARIAL SEMI-SUPERVISED AUDIO SOURCE SEPARATION APPLIED TO SINGING VOICE EXTRACTION

Author: Dixon S
Ewert S
IEEE
Stoller D
Publication date: 01/01/2018

Queen Mary Research Online

CASS: Cross Adversarial Source Separation via Autoencoder

Author: Ong Yong Zheng
Chui Charles K.
Yang Haizhao
Publication date: 29/05/1967

This paper introduces a cross adversarial source separation (CASS) framework via autoencoder, a new model that aims at separating an input signal consisting of a mixture of multiple components into individual components defined via adversarial learning and autoencoder fitting. CASS unifies popular generative networks like auto-encoders (AEs) and generative adversarial networks (GANs) in a single framework. The basic building block that filters the input signal and reconstructs the $i$-th target component is a pair of deep neural networks $\mathcal{EN}_i$ and $\mathcal{DE}_i$ as an encoder for dimension reduction and a decoder for component reconstruction, respectively. The decoder $\mathcal{DE}_i$ as a generator is enhanced by a discriminator network $\mathcal{D}_i$ that favors signal structures of the $i$-th component in the $i$-th given dataset as guidance through adversarial learning. In contrast with existing practices in AEs, which train each auto-encoder independently, or in GANs, which share the same generator, we introduce cross adversarial training that emphasizes the adversarial relation between arbitrary network pairs $(\mathcal{DE}_i,\mathcal{D}_j)$, achieving state-of-the-art performance especially when target components share similar data structures.

arXiv.org e-Print Archive

The University of Nebraska, Omaha