200 research outputs found
Evaluation of emerging frequency domain convolutive blind source separation algorithms based on real room recordings
This paper presents a comparative study of three of the emerging frequency domain convolutive blind source separation (FDCBSS) techniques i.e. convolutive blind separation of non-stationary sources due to Parra and Spence, penalty function-based joint diagonalization approach for convolutive blind separation of nonstationary sources due to Wang et al. and a geometrically constrained multimodal approach for convolutive blind source separation due to Sanei et al. Objective evaluation is performed on the basis of signal to interference ratio (SIR), performance index (PI) and solution to the permutation problem. The results confirm that a multimodal approach is necessary to properly mitigate the permutation in BSS and ultimately to solve the cocktail party problem. In other words, it is to make BSS semiblind by exploiting prior geometrical information, and thereby providing the framework to find robust solutions for more challenging source separation with moving speakers
A geometrically constrained multimodal approach for convolutive blind source separation
A novel constrained multimodal approach for convolutive blind source separation is presented which incorporates video information related to geometrical position of both the speakers and the microphones, and the directionality of the speakers into the separation algorithm. The separation is performed in the frequency domain and the constraints are incorporated through a penalty function-based formulation. The separation results show a considerable improvement over traditional frequency domain convolutive BSS systems such as that developed by Parra and Spence. Importantly, the inherent permutation problem in the frequency domain BSS is potentially solve
Non-negative mixtures
This is the author's accepted pre-print of the article, first published as M. D. Plumbley, A. Cichocki and R. Bro. Non-negative mixtures. In P. Comon and C. Jutten (Ed), Handbook of Blind Source Separation: Independent Component Analysis and Applications. Chapter 13, pp. 515-547. Academic Press, Feb 2010. ISBN 978-0-12-374726-6 DOI: 10.1016/B978-0-12-374726-6.00018-7file: Proof:p\PlumbleyCichockiBro10-non-negative.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:p\PlumbleyCichockiBro10-non-negative.pdf:PDF owner: markp timestamp: 2011.04.2
Penalty function-based joint diagonalization approach for convolutive blind separation of nonstationary sources
A new approach for convolutive blind source separation (BSS) by explicitly exploiting the second-order nonstationarity of signals and operating in the frequency domain is proposed. The algorithm accommodates a penalty function within the cross-power spectrum-based cost function and thereby converts the separation problem into a joint diagonalization problem with unconstrained optimization. This leads to a new member of the family of joint diagonalization criteria and a modification of the search direction of the gradient-based descent algorithm. Using this approach, not only can the degenerate solution induced by a unmixing matrix and the effect of large errors within the elements of covariance matrices at low-frequency bins be automatically removed, but in addition, a unifying view to joint diagonalization with unitary or nonunitary constraint is provided. Numerical experiments are presented to verify the performance of the new method, which show that a suitable penalty function may lead the algorithm to a faster convergence and a better performance for the separation of convolved speech signals, in particular, in terms of shape preservation and amplitude ambiguity reduction, as compared with the conventional second-order based algorithms for convolutive mixtures that exploit signal nonstationarity
A geometrically constrained multimodal time domain approach for convolutive blind source separation
A novel time domain constrained multimodal approach for convolutive blind source separation is presented which incorporates geometrical 3-D cordinates of both the speakers and the microphones. The semi-blind separation is performed in time domain and the constraints are incorporated through an alternative least squares optimization. Orthogonal source model and gradient based optimization concepts have been used to construct and estimate the model parameters which fits the convolutive mixture signals. Moreover, the majorization concept has been used to incorporate the geometrical information for estimating the mixing channels for different time lags. The separation results show a considerable improvement over time domain convolutive blind source separation systems. Having diagonal or quasi diagonal covariance matrices for different source segments and also having independent profiles for different sources (which implies nonstationarity of the sources) are the requirements for our method. We evaluated the method using synthetically mixed real signals. The results show high capability of the method for separating speech signals. © 2011 EURASIP
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of a FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.
Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing
Convolutive and under-determined blind audio source separation from noisy recordings is a challenging problem. Several computational strategies have been proposed to address this problem. This study is concerned with several modifications to the expectation-minimization-based algorithm, which iteratively estimates the mixing and source parameters. This strategy assumes that any entry in each source spectrogram is modeled using superimposed Gaussian components, which are mutually and individually independent across frequency and time bins. In our approach, we resolve this issue by considering a locally smooth temporal and frequency structure in the power source spectrograms. Local smoothness is enforced by incorporating a Gibbs prior in the complete data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field. Simulations using audio files derived from stereo audio source separation evaluation campaign 2008 demonstrate high efficiency with the proposed improvement
Joint Tensor Factorization and Outlying Slab Suppression with Applications
We consider factoring low-rank tensors in the presence of outlying slabs.
This problem is important in practice, because data collected in many
real-world applications, such as speech, fluorescence, and some social network
data, fit this paradigm. Prior work tackles this problem by iteratively
selecting a fixed number of slabs and fitting, a procedure which may not
converge. We formulate this problem from a group-sparsity promoting point of
view, and propose an alternating optimization framework to handle the
corresponding () minimization-based low-rank tensor
factorization problem. The proposed algorithm features a similar per-iteration
complexity as the plain trilinear alternating least squares (TALS) algorithm.
Convergence of the proposed algorithm is also easy to analyze under the
framework of alternating optimization and its variants. In addition,
regularization and constraints can be easily incorporated to make use of
\emph{a priori} information on the latent loading factors. Simulations and real
data experiments on blind speech separation, fluorescence data analysis, and
social network mining are used to showcase the effectiveness of the proposed
algorithm
- …