1,030 research outputs found
Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain
Gaussian process (GP) audio source separation is a time-domain approach that
circumvents the inherent phase approximation issue of spectrogram based
methods. Furthermore, through their kernels, GPs elegantly incorporate prior
knowledge about the sources into the separation model. Despite these compelling
advantages, the computational complexity of GP inference scales cubically with
the number of audio samples. As a result, source separation GP models have been
restricted to the analysis of short audio frames. We introduce an efficient
application of GPs to time-domain audio source separation, without compromising
performance. For this purpose, we use GP regression, together with spectral
mixture kernels and variational sparse GPs. We compare our method with
LD-PSDTF (positive semi-definite tensor factorization), KL-NMF
(Kullback-Leibler non-negative matrix factorization), and IS-NMF (Itakura-Saito
NMF). Results show that the proposed method outperforms these techniques.
Comment: Paper submitted to the 44th International Conference on Acoustics,
Speech, and Signal Processing, ICASSP 2019. To be held in Brighton, United
Kingdom, between May 12 and May 17, 2019.
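The GP regression step described in this abstract can be illustrated with a minimal sketch. It assumes the mixture is a sum of independent zero-mean GP sources plus white noise, and it uses the exact (cubic-cost) posterior mean rather than the variational sparse approximation of the paper. The function names, toy signals, and kernel parameters below are hypothetical choices for illustration, not the authors' code or settings.

```python
import numpy as np

def sm_kernel(t, weights, freqs, variances):
    """Spectral mixture kernel matrix (t in seconds, freqs in Hz, variances in Hz^2)."""
    tau = t[:, None] - t[None, :]
    K = np.zeros_like(tau)
    for w, mu, v in zip(weights, freqs, variances):
        K += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * mu * tau)
    return K

def separate(y, kernels, noise_var):
    """Posterior mean of each source, assuming y = sum_i f_i + white noise."""
    Ky = sum(kernels) + noise_var * np.eye(len(y))
    alpha = np.linalg.solve(Ky, y)  # exact solve: O(N^3) in the number of samples
    return [K @ alpha for K in kernels]

# Toy mixture of two sinusoids (hypothetical data, not from the paper).
fs = 8000
t = np.arange(400) / fs
s1 = np.sin(2.0 * np.pi * 440.0 * t)
s2 = 0.5 * np.sin(2.0 * np.pi * 1000.0 * t)
rng = np.random.default_rng(0)
y = s1 + s2 + 0.01 * rng.standard_normal(t.size)

# One spectral component per source, centred on the assumed source frequency.
K1 = sm_kernel(t, [1.0], [440.0], [25.0])
K2 = sm_kernel(t, [0.25], [1000.0], [25.0])
est1, est2 = separate(y, [K1, K2], noise_var=1e-4)
```

The linear solve is the step whose cost grows cubically with the number of samples, which is why exact inference of this kind is limited to short frames.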
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
Adaptive Langevin Sampler for Separation of t-Distribution Modelled Astrophysical Maps
We propose to model the image differentials of astrophysical source maps by
Student's t-distribution and to use them in the Bayesian source separation
method as priors. We introduce an efficient Markov Chain Monte Carlo (MCMC)
sampling scheme to unmix the astrophysical sources and describe the derivation
details. In this scheme, we use the Langevin stochastic equation for
transitions, which enables parallel drawing of random samples from the
posterior, and reduces the computation time significantly (by two orders of
magnitude). In addition, Student's t-distribution parameters are updated
throughout the iterations. The results on astrophysical source separation are
assessed with two performance criteria defined in the pixel and the frequency
domains.
Comment: 12 pages, 6 figures
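As a rough illustration of Langevin transitions that update many variables at once, the sketch below runs a Metropolis-adjusted Langevin step on a vector of independent Student's t variables. This is an assumption-laden toy: the target here is only a factorised t prior (no mixing model or likelihood), the degrees of freedom are fixed rather than updated during the iterations, and the function names and step size are hypothetical. It is not the sampler derived in the paper.

```python
import numpy as np

def log_t(x, nu):
    """Unnormalised log-density of independent standard Student's t variables."""
    return -0.5 * (nu + 1.0) * np.log1p(x**2 / nu)

def grad_log_t(x, nu):
    """Gradient of log_t with respect to x, evaluated element-wise."""
    return -(nu + 1.0) * x / (nu + x**2)

def mala_step(x, nu, eps, rng):
    """One Langevin proposal for every coordinate at once.

    Accept/reject is done per coordinate, which is valid here only because
    the toy target factorises over coordinates.
    """
    mean_fwd = x + 0.5 * eps**2 * grad_log_t(x, nu)
    prop = mean_fwd + eps * rng.standard_normal(x.shape)
    mean_bwd = prop + 0.5 * eps**2 * grad_log_t(prop, nu)
    log_q_fwd = -(prop - mean_fwd) ** 2 / (2.0 * eps**2)
    log_q_bwd = -(x - mean_bwd) ** 2 / (2.0 * eps**2)
    log_ratio = log_t(prop, nu) - log_t(x, nu) + log_q_bwd - log_q_fwd
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, prop, x), accept.mean()

rng = np.random.default_rng(0)
x = np.zeros(1000)  # e.g. 1000 image differentials, all updated together
for _ in range(500):
    x, acc_rate = mala_step(x, nu=3.0, eps=1.0, rng=rng)
```

Because the gradient and the proposal are computed for the whole vector in one array operation, each transition moves all variables simultaneously instead of visiting them one at a time.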
Acoustically Inspired Probabilistic Time-domain Music Transcription and Source Separation.
PhD Thesis
Automatic music transcription (AMT) and source separation are important
computational tasks, which can help to understand, analyse and process music
recordings. The main purpose of AMT is to estimate, from an observed
audio recording, a latent symbolic representation of a piece of music (piano-roll).
In this sense, in AMT the duration and location of every note played is
reconstructed from a mixture recording. The related task of source separation
aims to estimate the latent functions or source signals that were mixed
together in an audio recording. This task requires not only the duration and
location of every event present in the mixture, but also the reconstruction
of the waveform of all the individual sounds. Most methods for AMT and
source separation rely on the magnitude of time-frequency representations
of the analysed recording, i.e., spectrograms, and often arbitrarily discard
phase information. On one hand, this decreases the time resolution in AMT.
On the other hand, discarding phase information corrupts the reconstruction
in source separation, because the phase of each source-spectrogram must
be approximated. There is thus a need for models that circumvent phase
approximation, while operating at sample-rate resolution.
This thesis intends to solve AMT and source separation together from
a unified perspective. For this purpose, Bayesian non-parametric signal
processing, covariance kernels designed for audio, and scalable variational
inference are integrated to form efficient and acoustically-inspired probabilistic
models. To circumvent phase approximation while keeping sample-rate
resolution, AMT and source separation are addressed from a Bayesian time-domain
viewpoint. That is, the posterior distribution over the waveform of
each sound event in the mixture is computed directly from the observed data.
For this purpose, Gaussian processes (GPs) are used to define priors over the
sources/pitches. GPs are probability distributions over functions, and their
kernel (covariance function) determines the properties of the functions sampled from
a GP. Finally, the GP priors and the available data (mixture recording) are
combined using Bayes' theorem in order to compute the posterior distributions
over the sources/pitches.
Although the proposed paradigm is elegant, it introduces two main challenges.
First, as mentioned before, the kernel of the GP priors determines the
properties of each source/pitch function, that is, its smoothness, stationarity,
and more importantly its spectrum. Consequently, the proposed model
requires the design of flexible kernels, able to learn the rich frequency content
and intricate properties of audio sources. To this end, spectral mixture
(SM) kernels are studied, and the Matérn spectral mixture (MSM) kernel,
a modified version of the SM covariance function, is introduced. The
MSM kernel imposes weaker smoothness, so it is more suitable for
modelling physical processes. Second, the computational complexity of GP
inference scales cubically with the number of audio samples. Therefore, the
application of GP models to large audio signals becomes intractable. To
overcome this limitation, variational inference is used to make the proposed
model scalable and suitable for signals on the order of hundreds of thousands
of data points.
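For orientation, the two kernel families and the two cost regimes mentioned above can be written out as follows. The SM form is the standard one-dimensional spectral mixture kernel. The MSM form shown is an assumption: it replaces the Gaussian envelope with a Matérn-1/2 (exponential) envelope, which is one plausible reading of "a modified version of the SM covariance function", and the thesis itself may use a different parameterisation. The symbols Q, sigma_q, v_q, mu_q, ell_q, N and M are introduced here for illustration only.

```latex
% Spectral mixture (SM) kernel with Q components, lag \tau = t - t':
k_{\mathrm{SM}}(\tau) = \sum_{q=1}^{Q} \sigma_q^{2}
    \exp\!\left(-2\pi^{2}\tau^{2} v_q\right) \cos\!\left(2\pi \mu_q \tau\right)

% Assumed Matern-1/2 variant (MSM): exponential envelope, less smooth sample paths:
k_{\mathrm{MSM}}(\tau) = \sum_{q=1}^{Q} \sigma_q^{2}
    \exp\!\left(-\frac{|\tau|}{\ell_q}\right) \cos\!\left(2\pi \mu_q \tau\right)

% Cost of exact GP inference on N samples versus a variational sparse GP
% with M \ll N inducing points:
\mathcal{O}(N^{3}) \quad \text{versus} \quad \mathcal{O}(N M^{2})
```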
The integration of GP priors, kernels intended for audio, and variational
inference could enable AMT and source separation time-domain methods to
reconstruct sources and transcribe music in an efficient and informed manner.
In addition, AMT and source separation remain open challenges, because
the spectra of the sources/pitches overlap with each other in intricate
ways. Thus, the development of probabilistic models capable of differentiating
sources/pitches in the time domain, despite the high similarity between
their spectra, opens the possibility to take a step towards solving source separation
and automatic music transcription. We demonstrate the utility of our
methods using real and synthesized music audio datasets for various types of
musical instruments.
Sparse and Non-Negative BSS for Noisy Data
Non-negative blind source separation (BSS) has raised interest in various
fields of research, as testified by the wide literature on the topic of
non-negative matrix factorization (NMF). In this context, it is fundamental
that the sources to be estimated present some diversity in order to be
efficiently retrieved. Sparsity is known to enhance such contrast between the
sources while producing very robust approaches, especially to noise. In this
paper we introduce a new algorithm in order to tackle the blind separation of
non-negative sparse sources from noisy measurements. We first show that
sparsity and non-negativity constraints have to be carefully applied on the
sought-after solution. In fact, improperly constrained solutions are unlikely
to be stable and are therefore sub-optimal. The proposed algorithm, named nGMCA
(non-negative Generalized Morphological Component Analysis), makes use of
proximal calculus techniques to provide properly constrained solutions. The
performance of nGMCA compared to other state-of-the-art algorithms is
demonstrated by numerical experiments encompassing a wide variety of settings,
with negligible parameter tuning. In particular, nGMCA is shown to provide
robustness to noise and performs well on synthetic mixtures of real NMR
spectra.
Comment: 13 pages, 18 figures, to be published in IEEE Transactions on Signal
Processing
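To make the role of proximal calculus concrete, the sketch below shows the proximal operator of an l1 penalty combined with a non-negativity constraint, and uses it in a plain proximal-gradient update of the sources for a fixed mixing matrix. This is only an illustrative building block under simplifying assumptions (known mixing matrix A, fixed threshold lam); it is not the full nGMCA algorithm, and the function names are hypothetical.

```python
import numpy as np

def prox_nonneg_l1(x, lam):
    """Prox of lam * ||x||_1 plus the non-negativity constraint: max(x - lam, 0)."""
    return np.maximum(x - lam, 0.0)

def update_sources(Y, A, S, lam, n_iter=200):
    """Proximal-gradient iterations for
    min_{S >= 0} 0.5 * ||Y - A S||_F^2 + lam * ||S||_1, with A held fixed."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ S - Y)
        S = prox_nonneg_l1(S - grad / L, lam / L)
    return S

# Tiny example with a trivial 1x1 mixing matrix (hypothetical data).
Y = np.array([[3.0, 0.2, -1.0]])
A = np.eye(1)
S = update_sources(Y, A, np.zeros_like(Y), lam=0.5)
```

Applying the two constraints through a single proximal operator, rather than thresholding and clipping in separate ad hoc steps, is what "properly constrained" refers to in this sketch.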
Single channel speech music separation using nonnegative matrix factorization and spectral masks
A single channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks is proposed in this work. The proposed algorithm uses training data of speech and music signals with nonnegative matrix factorization followed by masking to separate the mixed signal. In the training stage, NMF uses the training data to train a set of basis vectors for each source. These bases are trained using NMF in the magnitude spectrum domain. After observing the mixed signal, NMF is used to decompose its magnitude spectra into a linear combination of the trained bases for both sources. The decomposition results are used to build a mask, which explains the contribution of each source in the mixed signal. Experimental results show that using masks after NMF improves the separation process even when calculating NMF with fewer iterations, which yields a faster separation process
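The train, decompose, and mask pipeline described in this abstract can be sketched as follows. The sketch assumes Euclidean-cost multiplicative updates and a Wiener-like soft mask with exponent p. The abstract does not specify the cost function or the mask form, so both are assumptions, as are the function names and the random stand-in data used in place of real magnitude spectrograms.

```python
import numpy as np

def nmf(V, rank, n_iter=200, W=None, seed=0, eps=1e-12):
    """Multiplicative-update NMF (Euclidean cost). If W is given, it is kept fixed
    and only the activations H are estimated."""
    rng = np.random.default_rng(seed)
    fixed_W = W is not None
    if W is None:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate_with_masks(V_mix, W_speech, W_music, n_iter=200, p=2):
    """Decompose the mixture on the trained bases, then apply a soft mask."""
    W = np.hstack([W_speech, W_music])
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech part of the reconstruction
    M_hat = W_music @ H[k:]    # music part of the reconstruction
    mask = S_hat**p / (S_hat**p + M_hat**p + 1e-12)
    return mask * V_mix, (1.0 - mask) * V_mix

# Random non-negative matrices stand in for magnitude spectrograms.
rng = np.random.default_rng(1)
V_speech_train = rng.random((64, 40))
V_music_train = rng.random((64, 40))
W_speech, _ = nmf(V_speech_train, rank=8)   # training stage: bases per source
W_music, _ = nmf(V_music_train, rank=8)
V_mix = rng.random((64, 20))
speech_est, music_est = separate_with_masks(V_mix, W_speech, W_music)
```

Because the two masks sum to one, the estimates always add back up to the observed mixture spectrogram, even when the NMF decomposition itself is stopped after few iterations.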