1,938 research outputs found
Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation
We propose to use minimum mean squared error (MMSE) estimates to enhance the signals that are separated by nonnegative matrix factorization (NMF). In single channel source separation (SCSS), NMF is used to train a set of basis vectors for each source from their training spectrograms. Then NMF is used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors from which estimates of each corresponding source can be obtained. In this work, we deal with the spectrogram of each separated signal as a 2D distorted signal that needs to be restored. A multiplicative distortion model is assumed where the logarithm of the true signal distribution is modeled with a Gaussian mixture model (GMM) and the distortion is modeled as having a log-normal distribution. The parameters of the GMM are learned from training data whereas the distortion parameters are learned online from each separated signal. The initial source estimates are improved and replaced with their MMSE estimates under this new probabilistic framework. The experimental results show that using the proposed MMSE estimation technique as a post enhancement after NMF improves the quality of the separated signal
Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization
Conventional approaches to sound source localization require at least two
microphones. It is known, however, that people with unilateral hearing loss can
also localize sounds. Monaural localization is possible thanks to the
scattering by the head, though it hinges on learning the spectra of the various
sources. We take inspiration from this human ability to propose algorithms for
accurate sound source localization using a single microphone embedded in an
arbitrary scattering structure. The structure modifies the frequency response
of the microphone in a direction-dependent way giving each direction a
signature. While knowing those signatures is sufficient to localize sources of
white noise, localizing speech is much more challenging: it is an ill-posed
inverse problem which we regularize by prior knowledge in the form of learned
non-negative dictionaries. We demonstrate a monaural speech localization
algorithm based on non-negative matrix factorization that does not depend on
sophisticated, designed scatterers. In fact, we show experimental results with
ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures
we can accurately localize arbitrary speakers; that is, we do not need to learn
the dictionary for the particular speaker to be localized. Finally, we discuss
multi-source localization and the related limitations of our approach.Comment: This article has been accepted for publication in IEEE/ACM
Transactions on Audio, Speech, and Language processing (TASLP
Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain
Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies
Sparse Gaussian Process Audio Source Separation Using Spectrum Priors in the Time-Domain
Gaussian process (GP) audio source separation is a time-domain approach that
circumvents the inherent phase approximation issue of spectrogram based
methods. Furthermore, through its kernel, GPs elegantly incorporate prior
knowledge about the sources into the separation model. Despite these compelling
advantages, the computational complexity of GP inference scales cubically with
the number of audio samples. As a result, source separation GP models have been
restricted to the analysis of short audio frames. We introduce an efficient
application of GPs to time-domain audio source separation, without compromising
performance. For this purpose, we used GP regression, together with spectral
mixture kernels, and variational sparse GPs. We compared our method with
LD-PSDTF (positive semi-definite tensor factorization), KL-NMF
(Kullback-Leibler non-negative matrix factorization), and IS-NMF (Itakura-Saito
NMF). Results show that the proposed method outperforms these techniques.Comment: Paper submitted to the 44th International Conference on Acoustics,
Speech, and Signal Processing, ICASSP 2019. To be held in Brighton, United
Kingdom, between May 12 and May 17, 201
A Unified Framework for Sparse Non-Negative Least Squares using Multiplicative Updates and the Non-Negative Matrix Factorization Problem
We study the sparse non-negative least squares (S-NNLS) problem. S-NNLS
occurs naturally in a wide variety of applications where an unknown,
non-negative quantity must be recovered from linear measurements. We present a
unified framework for S-NNLS based on a rectified power exponential scale
mixture prior on the sparse codes. We show that the proposed framework
encompasses a large class of S-NNLS algorithms and provide a computationally
efficient inference procedure based on multiplicative update rules. Such update
rules are convenient for solving large sets of S-NNLS problems simultaneously,
which is required in contexts like sparse non-negative matrix factorization
(S-NMF). We provide theoretical justification for the proposed approach by
showing that the local minima of the objective function being optimized are
sparse and the S-NNLS algorithms presented are guaranteed to converge to a set
of stationary points of the objective function. We then extend our framework to
S-NMF, showing that our framework leads to many well known S-NMF algorithms
under specific choices of prior and providing a guarantee that a popular
subclass of the proposed algorithms converges to a set of stationary points of
the objective function. Finally, we study the performance of the proposed
approaches on synthetic and real-world data.Comment: To appear in Signal Processin
Non-negative mixtures
This is the author's accepted pre-print of the article, first published as M. D. Plumbley, A. Cichocki and R. Bro. Non-negative mixtures. In P. Comon and C. Jutten (Ed), Handbook of Blind Source Separation: Independent Component Analysis and Applications. Chapter 13, pp. 515-547. Academic Press, Feb 2010. ISBN 978-0-12-374726-6 DOI: 10.1016/B978-0-12-374726-6.00018-7file: Proof:p\PlumbleyCichockiBro10-non-negative.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:p\PlumbleyCichockiBro10-non-negative.pdf:PDF owner: markp timestamp: 2011.04.2
Unsupervised Learning for Monaural Source Separation Using Maximization–Minimization Algorithm with Time–Frequency Deconvolution
This paper presents an unsupervised learning algorithm for sparse nonnegative matrix factor time–frequency deconvolution with optimized fractional β -divergence. The β -divergence is a group of cost functions parametrized by a single parameter β . The Itakura–Saito divergence, Kullback–Leibler divergence and Least Square distance are special cases that correspond to β=0, 1, 2 , respectively. This paper presents a generalized algorithm that uses a flexible range of β that includes fractional values. It describes a maximization–minimization (MM) algorithm leading to the development of a fast convergence multiplicative update algorithm with guaranteed convergence. The proposed model operates in the time–frequency domain and decomposes an information-bearing matrix into two-dimensional deconvolution of factor matrices that represent the spectral dictionary and temporal codes. The deconvolution process has been optimized to yield sparse temporal codes through maximizing the likelihood of the observations. The paper also presents a method to estimate the fractional β value. The method is demonstrated on separating audio mixtures recorded from a single channel. The paper shows that the extraction of the spectral dictionary and temporal codes is significantly more efficient by using the proposed algorithm and subsequently leads to better source separation performance. Experimental tests and comparisons with other factorization methods have been conducted to verify its efficacy
- …