Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
The state of the art in music source separation employs neural networks
trained in a supervised fashion on multi-track databases to estimate the
sources from a given mixture. With only a few datasets available, extensive
data augmentation is often used to combat overfitting. Mixing random tracks, however,
can even reduce separation performance, as instruments in real music are
strongly correlated. The key concept in our approach is that source estimates
of an optimal separator should be indistinguishable from real source signals.
Based on this idea, we drive the separator towards outputs deemed as realistic
by discriminator networks that are trained to tell apart real from separator
samples. This way, we can also use unpaired source and mixture recordings
without the drawbacks of creating unrealistic music mixtures. Our framework is
widely applicable as it does not assume a specific network architecture or
number of sources. To our knowledge, this is the first adoption of adversarial
training for music source separation. In a prototype experiment for singing
voice separation, separation performance increases with our approach compared
to purely supervised training.
Comment: 5 pages, 2 figures, 1 table. Final version of manuscript accepted for
2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP). Implementation available at
https://github.com/f90/AdversarialAudioSeparatio
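The key idea above, combining a supervised loss on paired multi-track data with an adversarial term that rewards separator outputs a discriminator judges as real, can be sketched as a single separator objective. This is a minimal plain-Python illustration, not the authors' implementation: the function names, the weight `lam`, and the use of the non-saturating GAN loss are assumptions, and the paper's exact formulation may differ. Only the separator side is shown; the discriminator would be trained separately to tell real sources from separator outputs.

```python
import math

def mse(estimate, target):
    """Mean squared error between two equal-length sample lists."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

def adversarial_term(disc_probs, eps=1e-12):
    """Non-saturating generator loss: -mean(log D(estimate)).

    disc_probs holds the discriminator's probability that each separator
    output is a real source signal (assumed to lie in (0, 1]).
    """
    return -sum(math.log(max(p, eps)) for p in disc_probs) / len(disc_probs)

def separator_loss(paired, disc_probs, lam=0.1):
    """Hypothetical combined objective for the separator.

    paired:     list of (estimate, target) pairs from multi-track data
    disc_probs: discriminator outputs on estimates from unpaired mixtures
    lam:        hypothetical weight trading off the two terms
    """
    supervised = sum(mse(e, t) for e, t in paired) / len(paired)
    return supervised + lam * adversarial_term(disc_probs)

# Perfect paired estimates and a fully "fooled" discriminator give zero loss.
print(separator_loss([([0.1, 0.2], [0.1, 0.2])], [1.0, 1.0]))
```

Because the adversarial term only needs discriminator scores on separator outputs, it can be evaluated on mixtures that have no ground-truth sources, which is what lets unpaired recordings contribute to training.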
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
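To make the variance-modeling paradigm concrete: if each source's time-frequency coefficient is modeled as a zero-mean complex Gaussian with variance v_j, independent across sources, and the mixture coefficient is their sum x, then the minimum mean square error estimate of source j is the Wiener filter s_j = (v_j / sum_k v_k) * x. The sketch below assumes a single-channel mixture and already-known variances (in practice they must be estimated, for example from spectral templates); the function name `wiener_separate` is hypothetical.

```python
def wiener_separate(mixture, variances):
    """Estimate each source's coefficients from a single-channel mixture.

    mixture:   list of complex time-frequency coefficients x(f, n), flattened
    variances: variances[j][i] is v_j at bin i (assumed known, total > 0)
    Returns one list of complex estimates per source.
    """
    estimates = [[] for _ in variances]
    for i, x in enumerate(mixture):
        total = sum(v[i] for v in variances)
        for j, v in enumerate(variances):
            # Wiener gain: this source's share of the total modeled variance.
            estimates[j].append((v[i] / total) * x)
    return estimates

x = [1 + 1j, 2 - 2j]
v = [[3.0, 1.0], [1.0, 1.0]]  # two sources, two bins
s = wiener_separate(x, v)
print(s[0])  # [(0.75+0.75j), (1-1j)]
```

Since the gains in each bin sum to one, the source estimates always add back up to the mixture; a linear-modeling approach such as ICA would instead apply an unmixing matrix to multichannel observations.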
Efficient Bayesian inference for harmonic models via adaptive posterior factorization
NOTICE: this is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, vol. 72, issue 1-3 (2008), DOI: 10.1016/j.neucom.2007.12.05
Collaborative sparse regression using spatially correlated supports - Application to hyperspectral unmixing
This paper presents a new Bayesian collaborative sparse regression method for
linear unmixing of hyperspectral images. Our contribution is twofold: first, we
propose a new Bayesian model for structured sparse regression in which the
supports of the sparse abundance vectors are a priori spatially correlated
across pixels (i.e., materials are spatially organised rather than randomly
distributed at a pixel level). This prior information is encoded in the model
through a truncated multivariate Ising Markov random field, which also takes
into account the facts that pixels cannot be empty (i.e., there is at
least one material present in each pixel) and that different materials may
exhibit different degrees of spatial regularity. Second, we propose an
advanced Markov chain Monte Carlo algorithm to estimate the posterior
probabilities that materials are present or absent in each pixel, and,
conditional on the maximum marginal a posteriori configuration of the
support, compute the MMSE estimates of the abundance vectors. A remarkable
property of this algorithm is that it self-adjusts the values of the parameters
of the Markov random field, thus relieving practitioners from setting
regularisation parameters by cross-validation. The performance of the proposed
methodology is finally demonstrated through a series of experiments with
synthetic and real data and comparisons with other algorithms from the
literature.
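The truncated multivariate Ising prior can be illustrated with a Gibbs sweep over binary support indicators z[m][p] (material m present in pixel p) on a small grid. This is a simplified sketch, not the paper's sampler: it assumes a 4-neighbor Ising coupling with a per-material parameter beta[m], enforces the "no empty pixel" truncation by forcing z[m][p] = 1 whenever every other material is absent from that pixel, and uses a hypothetical per-pixel log-likelihood ratio llr[m][p] as a stand-in for the data term. It does not update the MRF parameters or the abundances.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def neighbors(p, rows, cols):
    """4-connected neighbors of flattened pixel index p."""
    r, c = divmod(p, cols)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr * cols + cc

def gibbs_sweep(z, beta, llr, rows, cols, rng):
    """One Gibbs sweep over support indicators z[m][p] in {0, 1}.

    Prior: Ising term beta[m] * (number of agreeing neighbors), truncated
    so that every pixel contains at least one material.
    llr[m][p]: hypothetical data log-likelihood ratio of z = 1 versus z = 0.
    """
    n_mat = len(z)
    for p in range(rows * cols):
        for m in range(n_mat):
            others = sum(z[k][p] for k in range(n_mat) if k != m)
            if others == 0:
                # Truncation: z[m][p] = 0 would leave the pixel empty.
                z[m][p] = 1
                continue
            nbrs = list(neighbors(p, rows, cols))
            n1 = sum(z[m][q] for q in nbrs)
            n0 = len(nbrs) - n1
            # Conditional log-odds of presence given neighbors and data.
            logit = beta[m] * (n1 - n0) + llr[m][p]
            z[m][p] = 1 if rng.random() < sigmoid(logit) else 0
    return z

rng = random.Random(0)
rows, cols, n_mat = 4, 4, 2
z = [[1] * (rows * cols) for _ in range(n_mat)]  # valid start: all present
beta = [0.8, 0.3]
llr = [[0.0] * (rows * cols) for _ in range(n_mat)]
for _ in range(10):
    gibbs_sweep(z, beta, llr, rows, cols, rng)
```

Updating each indicator from its exact conditional, with all mass on 1 when the pixel would otherwise be empty, keeps the chain inside the truncated support after every single update.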
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of a stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This leads to a general method for improving the solutions of
state-of-the-art MRF-based depth estimation algorithms. Our experimental results
first show that depth inference using learned representations leads to
state-of-the-art denoising of depth maps obtained from laser range scanners and
a time-of-flight camera. Furthermore, we show that adding sparse priors improves the
results of two depth estimation methods: the classical graph cut algorithm by
Boykov et al. and the more recent algorithm of Woodford et al.
Comment: 12 pages
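Relaxing the stationary-noise assumption amounts to weighting each pixel's reconstruction error by its own precision. A minimal sketch, assuming a known per-pixel noise variance sigma_i^2 and an l1 sparsity penalty, minimizes 0.5 * sum_i w_i * (y_i - (D a)_i)^2 + lam * ||a||_1 with w_i = 1 / sigma_i^2 using ISTA (iterative shrinkage-thresholding). ISTA is a swapped-in solver chosen for illustration; the paper's own algorithm may differ, and the dictionary `D`, weights `w`, and `lam` below are illustrative values only.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def weighted_ista(D, y, w, lam, n_iter=5000):
    """Sparse code a minimizing 0.5*sum_i w_i*(y_i - (D a)_i)^2 + lam*||a||_1.

    D: dictionary as a list of rows (n_pixels x n_atoms)
    y: observed pixel values; w: per-pixel precisions 1/sigma_i^2
    """
    n_pix, n_atoms = len(D), len(D[0])
    # Step size 1/L, where L = sum_ij w_i * D_ij^2 upper-bounds the
    # Lipschitz constant of the weighted data-term gradient.
    L = sum(w[i] * D[i][j] ** 2 for i in range(n_pix) for j in range(n_atoms))
    step = 1.0 / L
    a = [0.0] * n_atoms
    for _ in range(n_iter):
        # Precision-weighted residual r_i = w_i * ((D a)_i - y_i).
        r = [w[i] * (sum(D[i][j] * a[j] for j in range(n_atoms)) - y[i])
             for i in range(n_pix)]
        grad = [sum(D[i][j] * r[i] for i in range(n_pix)) for j in range(n_atoms)]
        a = [soft_threshold(a[j] - step * grad[j], step * lam)
             for j in range(n_atoms)]
    return a

D = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.1]
print(weighted_ista(D, y, [1.0, 1.0], lam=0.5))    # second coefficient is zeroed
print(weighted_ista(D, y, [1.0, 100.0], lam=0.5))  # a reliable pixel is barely shrunk
```

With an identity dictionary the solution is soft_threshold(y_i, lam / w_i), so low-noise pixels (large w_i) are shrunk less, which is how spatially varying uncertainty from a range scanner would enter the sparse code.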
Hierarchical Compound Poisson Factorization
Non-negative matrix factorization models based on a hierarchical
Gamma-Poisson structure capture user and item behavior effectively in extremely
sparse data sets, making them the ideal choice for collaborative filtering
applications. Hierarchical Poisson factorization (HPF) in particular has proved
successful for scalable recommendation systems with extreme sparsity. HPF,
however, suffers from a tight coupling of sparsity model (absence of a rating)
and response model (the value of the rating), which limits the expressiveness
of the latter. Here, we introduce hierarchical compound Poisson factorization
(HCPF), which retains the favorable Gamma-Poisson structure of HPF and its
scalability to high-dimensional, extremely sparse matrices. More importantly, HCPF decouples
the sparsity model from the response model, allowing us to choose the most
suitable distribution for the response. HCPF can capture binary, non-negative
discrete, non-negative continuous, and zero-inflated continuous responses. We
compare HCPF with HPF on nine discrete and three continuous data sets and
conclude that HCPF captures the relationship between sparsity and response
better than HPF.
Comment: Will appear in Proceedings of the 33rd International Conference on
Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
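The decoupling of sparsity and response comes from the compound Poisson construction: a latent count N ~ Poisson(lambda), with lambda = sum_k theta_uk * beta_ik from the Gamma-Poisson factorization, and a response Y equal to the sum of N i.i.d. draws from an element distribution. For a positive element distribution, Y = 0 exactly when N = 0, so P(Y = 0) = exp(-lambda) regardless of which element distribution models the response. The sketch below only samples data from such a model with a gamma element distribution (a compound Poisson-gamma response); it is not the paper's inference procedure, and the function names and parameter values are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_hcpf_entry(theta_u, beta_i, shape, scale, rng):
    """Draw one response y_ui from a compound Poisson-gamma model.

    theta_u, beta_i: nonnegative latent factors (Gamma-distributed in HPF/HCPF)
    shape, scale:    parameters of the hypothetical gamma element distribution
    """
    lam = sum(t * b for t, b in zip(theta_u, beta_i))
    n = sample_poisson(lam, rng)  # sparsity model: y == 0 iff n == 0
    # Response model: sum of n i.i.d. element draws.
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(42)
theta_u, beta_i = [0.3, 0.2], [1.0, 0.5]  # lam = 0.4
draws = [sample_hcpf_entry(theta_u, beta_i, 2.0, 1.5, rng) for _ in range(20000)]
zero_frac = sum(1 for y in draws if y == 0) / len(draws)
print(zero_frac, math.exp(-0.4))  # empirical zero fraction vs. exp(-lam)
```

Swapping `gammavariate` for another positive element distribution changes the distribution of the nonzero responses but leaves the zero fraction at exp(-lambda), which is the sense in which the sparsity model is decoupled from the response model.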