793 research outputs found
Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation
This paper describes an efficient unsupervised learning method for a neural
source separation model that utilizes a probabilistic generative model of
observed multichannel mixtures proposed for blind source separation (BSS). For
this purpose, amortized variational inference (AVI) has been used for directly
solving the inverse problem of BSS with full-rank spatial covariance analysis
(FCA). Although this unsupervised technique called neural FCA is in principle
free from the domain mismatch problem, it is computationally demanding due to
the full rankness of the spatial model in exchange for robustness against
relatively short reverberations. To reduce the model complexity without
sacrificing performance, we propose neural FastFCA based on the
jointly-diagonalizable yet full-rank spatial model. Our neural separation model
introduced for AVI alternately performs neural network blocks and single steps
of an efficient iterative algorithm called iterative source steering. This
alternating architecture enables the separation model to quickly separate the
mixture spectrogram by leveraging both the deep neural network and the
multichannel optimization algorithm. The training objective with AVI is derived
to maximize the marginalized likelihood of the observed mixtures. The
experiment using mixture signals of two to four sound sources shows that neural
FastFCA outperforms conventional BSS methods and reduces the computational time
to about 2% of that for the neural FCA.Comment: 5 pages, 2 figures, accepted to EUSIPCO 202
Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
The state of the art in music source separation employs neural networks
trained in a supervised fashion on multi-track databases to estimate the
sources from a given mixture. With only few datasets available, often extensive
data augmentation is used to combat overfitting. Mixing random tracks, however,
can even reduce separation performance as instruments in real music are
strongly correlated. The key concept in our approach is that source estimates
of an optimal separator should be indistinguishable from real source signals.
Based on this idea, we drive the separator towards outputs deemed as realistic
by discriminator networks that are trained to tell apart real from separator
samples. This way, we can also use unpaired source and mixture recordings
without the drawbacks of creating unrealistic music mixtures. Our framework is
widely applicable as it does not assume a specific network architecture or
number of sources. To our knowledge, this is the first adoption of adversarial
training for music source separation. In a prototype experiment for singing
voice separation, separation performance increases with our approach compared
to purely supervised training.Comment: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). Implementation available at
https://github.com/f90/AdversarialAudioSeparatio
- …