4,094 research outputs found
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also,conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems
Multi-modal dictionary learning for image separation with application in art investigation
In support of art investigation, we propose a new source separation method
that unmixes a single X-ray scan acquired from double-sided paintings. In this
problem, the X-ray signals to be separated have similar morphological
characteristics, which brings previous source separation methods to their
limits. Our solution is to use photographs taken from the front and back-side
of the panel to drive the separation process. The crux of our approach relies
on the coupling of the two imaging modalities (photographs and X-rays) using a
novel coupled dictionary learning framework able to capture both common and
disparate features across the modalities using parsimonious representations;
the common component models features shared by the multi-modal images, whereas
the innovation component captures modality-specific information. As such, our
model enables the formulation of appropriately regularized convex optimization
procedures that lead to the accurate separation of the X-rays. Our dictionary
learning framework can be tailored both to a single- and a multi-scale
framework, with the latter leading to a significant performance improvement.
Moreover, to improve further on the visual quality of the separated images, we
propose to train coupled dictionaries that ignore certain parts of the painting
corresponding to craquelure. Experimentation on synthetic and real data - taken
from digital acquisition of the Ghent Altarpiece (1432) - confirms the
superiority of our method against the state-of-the-art morphological component
analysis technique that uses either fixed or trained dictionaries to perform
image separation.Comment: submitted to IEEE Transactions on Images Processin
Bayesian orthogonal component analysis for sparse representation
This paper addresses the problem of identifying a lower dimensional space
where observed data can be sparsely represented. This under-complete dictionary
learning task can be formulated as a blind separation problem of sparse sources
linearly mixed with an unknown orthogonal mixing matrix. This issue is
formulated in a Bayesian framework. First, the unknown sparse sources are
modeled as Bernoulli-Gaussian processes. To promote sparsity, a weighted
mixture of an atom at zero and a Gaussian distribution is proposed as prior
distribution for the unobserved sources. A non-informative prior distribution
defined on an appropriate Stiefel manifold is elected for the mixing matrix.
The Bayesian inference on the unknown parameters is conducted using a Markov
chain Monte Carlo (MCMC) method. A partially collapsed Gibbs sampler is
designed to generate samples asymptotically distributed according to the joint
posterior distribution of the unknown model parameters and hyperparameters.
These samples are then used to approximate the joint maximum a posteriori
estimator of the sources and mixing matrix. Simulations conducted on synthetic
data are reported to illustrate the performance of the method for recovering
sparse representations. An application to sparse coding on under-complete
dictionary is finally investigated.Comment: Revised version. Accepted to IEEE Trans. Signal Processin
C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework
Sparse modeling is a powerful framework for data analysis and processing.
Traditionally, encoding in this framework is performed by solving an
L1-regularized linear regression problem, commonly referred to as Lasso or
Basis Pursuit. In this work we combine the sparsity-inducing property of the
Lasso model at the individual feature level, with the block-sparsity property
of the Group Lasso model, where sparse groups of features are jointly encoded,
obtaining a sparsity pattern hierarchically structured. This results in the
Hierarchical Lasso (HiLasso), which shows important practical modeling
advantages. We then extend this approach to the collaborative case, where a set
of simultaneously coded signals share the same sparsity pattern at the higher
(group) level, but not necessarily at the lower (inside the group) level,
obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share
the same active groups, or classes, but not necessarily the same active set.
This model is very well suited for applications such as source identification
and separation. An efficient optimization procedure, which guarantees
convergence to the global optimum, is developed for these new models. The
underlying presentation of the new framework and optimization approach is
complemented with experimental examples and theoretical results regarding
recovery guarantees for the proposed models
Audio Source Separation Using Sparse Representations
This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010file: NesbitJafariVincentP11-audio.pdf:n\NesbitJafariVincentP11-audio.pdf:PDF owner: markp timestamp: 2011.02.04file: NesbitJafariVincentP11-audio.pdf:n\NesbitJafariVincentP11-audio.pdf:PDF owner: markp timestamp: 2011.02.04The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is highly related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics, by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research
- âŠ