Multimodal blind source separation for moving sources
A novel multimodal approach is proposed to solve the problem of
blind source separation (BSS) of moving sources. The challenge
of BSS for moving sources is that the mixing filters are time-varying;
the unmixing filters must therefore also be time-varying, and these are
difficult to track in real time. In the proposed approach, the visual
modality is utilized to facilitate the separation for both stationary and
moving sources. The movement of the sources is detected by a 3-D
tracker based on particle filtering. The full BSS solution is formed
by integrating a frequency domain blind source separation algorithm
and beamforming: if the sources are identified as stationary, a frequency
domain BSS algorithm is implemented with an initialization
derived from the visual information. Once the sources are moving,
a beamforming algorithm is used to perform real time speech
enhancement and provide separation of the sources. Experimental
results show that by utilizing the visual modality, the proposed algorithm
can not only improve the performance of the BSS algorithm
and mitigate the permutation problem for stationary sources, but also
provide a good BSS performance for moving sources in a low-reverberation
environment.
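The beamforming stage used once sources start moving can be illustrated with a minimal delay-and-sum beamformer steered by a source position such as the one supplied by the visual tracker. This is a simplified sketch under stated assumptions, not the paper's exact beamformer; the function name, sampling-rate handling, and speed of sound are illustrative:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, source_pos, fs, c=343.0):
    """Steer a delay-and-sum beamformer toward a known 3-D source position.

    mic_signals:   (n_mics, n_samples) array of synchronized recordings
    mic_positions: (n_mics, 3) microphone coordinates in metres
    source_pos:    (3,) source coordinates (e.g. from a visual tracker)
    """
    # Relative propagation delay of each microphone w.r.t. the closest one
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    delays = (dists - dists.min()) / c

    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        # Advance each channel by its relative delay via a frequency-domain
        # phase shift, so the target's wavefronts add coherently.
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spec, n)
    return out / len(mic_signals)
```

Because the steering needs only the source position and the array geometry, no prior statistical knowledge of the sources is required, which is what makes this family of beamformers usable in real time.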
Evaluation of emerging frequency domain convolutive blind source separation algorithms based on real room recordings
This paper presents a comparative study of three emerging frequency domain convolutive blind source separation (FDCBSS) techniques: convolutive blind separation of non-stationary sources due to Parra and Spence, the penalty-function-based joint diagonalization approach for convolutive blind separation of non-stationary sources due to Wang et al., and the geometrically constrained multimodal approach for convolutive blind source separation due to Sanei et al. Objective evaluation is performed on the basis of signal-to-interference ratio (SIR), performance index (PI) and the solution to the permutation problem. The results confirm that a multimodal approach is necessary to properly mitigate the permutation problem in BSS and ultimately to solve the cocktail party problem. In other words, BSS is made semi-blind by exploiting prior geometrical information, thereby providing the framework to find robust solutions for the more challenging separation of moving speakers.
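The SIR criterion used for the objective evaluation can be computed by projecting each separated output onto the known target and interferer references and comparing the energies. A minimal sketch, assuming zero-mean reference signals of equal length are available (as in a controlled recording); the function and variable names are illustrative:

```python
import numpy as np

def sir_db(estimate, target, interferers):
    """Signal-to-interference ratio of one separated output, in dB.

    Decomposes the estimate into the part explained by the target and
    the part explained by the interferers via orthogonal projection.
    """
    def proj(x, s):
        # Least-squares projection of x onto the span of s
        return (np.dot(x, s) / np.dot(s, s)) * s

    s_part = proj(estimate, target)
    i_part = sum(proj(estimate, v) for v in interferers)
    return 10.0 * np.log10(np.sum(s_part ** 2) / np.sum(i_part ** 2))
```

A higher SIR means less residual interference in the output; comparing SIR across algorithms and frequency bins also exposes the permutation problem, since a bin assigned to the wrong source drags the ratio down.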
A multimodal approach to blind source separation of moving sources
A novel multimodal approach is proposed to solve the
problem of blind source separation (BSS) of moving sources. The
challenge of BSS for moving sources is that the mixing filters are
time-varying; thus, the unmixing filters should also be time-varying,
and these are difficult to calculate in real time. In the proposed approach,
the visual modality is utilized to facilitate the separation for
both stationary and moving sources. The movement of the sources
is detected by a 3-D tracker based on video cameras. Positions
and velocities of the sources are obtained from the 3-D tracker
based on a Markov Chain Monte Carlo particle filter (MCMC-PF),
which results in high sampling efficiency. The full BSS solution
is formed by integrating a frequency domain blind source separation
algorithm and beamforming: if the sources are identified
as stationary for a certain minimum period, a frequency domain
BSS algorithm is implemented with an initialization derived from
the positions of the source signals. Once the sources are moving, a
beamforming algorithm which requires no prior statistical knowledge
is used to perform real time speech enhancement and provide
separation of the sources. Experimental results confirm that
by utilizing the visual modality, the proposed algorithm not only
improves the performance of the BSS algorithm and mitigates the
permutation problem for stationary sources, but also provides a
good BSS performance for moving sources in a low-reverberation
environment.
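The tracker described above follows each speaker with a particle filter. The paper's tracker is a 3-D MCMC particle filter (MCMC-PF); the predict-weight-resample cycle it refines can be illustrated with a minimal bootstrap particle filter tracking a scalar position. All names and noise parameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observation,
                         obs_std=0.5, proc_std=0.1):
    """One predict-weight-resample cycle of a bootstrap particle filter.

    particles:   (n,) hypothesized source positions
    weights:     (n,) normalized particle weights
    observation: noisy position measurement for this frame
    """
    # Predict: propagate particles through a random-walk motion model
    particles = particles + rng.normal(0.0, proc_std, size=particles.shape)
    # Weight: Gaussian likelihood of the observation given each particle
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

An MCMC-PF replaces the plain resampling step with Markov chain moves, which raises sampling efficiency in higher dimensions; this sketch shows only the basic cycle.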
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of a FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.]
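The cross-power spectral density based cost function mentioned above penalises the off-diagonal power of the outputs' cross-power matrix: when the separated outputs are uncorrelated at a frequency bin, the off-diagonal entries vanish. A simplified single-bin sketch (the thesis's full cost aggregates many bins and time epochs; the function name is illustrative):

```python
import numpy as np

def offdiag_cost(Y):
    """Off-diagonal power of the outputs' sample cross-power matrix
    at one frequency bin.

    Y: (n_sources, n_frames) complex STFT coefficients at one bin.
    Returns 0 when the outputs are mutually uncorrelated at this bin.
    """
    R = (Y @ Y.conj().T) / Y.shape[1]   # sample cross-power matrix
    off = R - np.diag(np.diag(R))       # keep only cross terms
    return float(np.sum(np.abs(off) ** 2))
```

Minimising this quantity over the unmixing filters, subject to the geometric constraints derived from the video cameras, is what drives the separation.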
Improved physiological noise regression in fNIRS: a multimodal extension of the General Linear Model using temporally embedded Canonical Correlation Analysis
For the robust estimation of evoked brain activity from functional Near-Infrared Spectroscopy (fNIRS) signals, it is crucial to reduce nuisance signals from systemic physiology and motion. The current best practice incorporates short-separation (SS) fNIRS measurements as regressors in a General Linear Model (GLM). However, several challenging signal characteristics such as non-instantaneous and non-constant coupling are not yet addressed by this approach, and additional auxiliary signals are not optimally exploited. We have recently introduced a new methodological framework for the unsupervised multivariate analysis of fNIRS signals using Blind Source Separation (BSS) methods. Building on this framework, in this manuscript we show how to incorporate the advantages of regularized temporally embedded Canonical Correlation Analysis (tCCA) into the supervised GLM. This approach allows flexible integration of any number of auxiliary modalities and signals. We provide guidance for the selection of optimal parameters and auxiliary signals for the proposed GLM extension. Its performance in the recovery of evoked HRFs is then evaluated using both simulated ground-truth data and real experimental data, and compared with the GLM with short-separation regression. Our results show that the GLM with tCCA significantly improves upon the current best practice, yielding better results across all applied metrics: Correlation (HbO max. +45%), Root Mean Squared Error (HbO max. -55%), F-Score (HbO up to 3.25-fold) and p-value, as well as power spectral density of the noise floor. The proposed method can be incorporated into the GLM in an easily applicable way that flexibly combines any available auxiliary signals into optimal nuisance regressors.
This work has potential significance both for conventional neuroscientific fNIRS experiments and for emerging applications of fNIRS in everyday environments, medicine and BCI, where a high contrast-to-noise ratio is of importance for single-trial analysis.
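The core of the GLM extension above is ordinary least squares with the task regressor sitting alongside nuisance regressors (here, components such as those tCCA would derive from auxiliary signals). A minimal sketch of that fitting step, not the paper's full pipeline; the function name and regressor construction are illustrative:

```python
import numpy as np

def glm_fit(y, task_regressor, nuisance):
    """Fit a GLM with one task regressor plus nuisance regressors by
    ordinary least squares and return the estimated task beta.

    y:              (n,) measured fNIRS time series
    task_regressor: (n,) modelled hemodynamic response regressor
    nuisance:       iterable of (n,) nuisance regressors (e.g. tCCA
                    components built from auxiliary physiological signals)
    """
    # Design matrix: task column, nuisance columns, and an intercept
    X = np.column_stack([task_regressor] + list(nuisance) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]
```

The better the nuisance regressors explain systemic physiology, the less of that variance leaks into the task beta, which is why the choice of auxiliary signals matters so much in the evaluation above.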
A geometrically constrained multimodal time domain approach for convolutive blind source separation
A novel time domain constrained multimodal approach for convolutive blind source separation is presented which incorporates the geometrical 3-D coordinates of both the speakers and the microphones. The semi-blind separation is performed in the time domain and the constraints are incorporated through an alternating least squares optimization. An orthogonal source model and gradient-based optimization concepts are used to construct and estimate the model parameters which fit the convolutive mixture signals. Moreover, the majorization concept is used to incorporate the geometrical information for estimating the mixing channels at different time lags. The separation results show a considerable improvement over time domain convolutive blind source separation systems. The method requires the covariance matrices of different source segments to be diagonal or quasi-diagonal, and the sources to have independent profiles across segments (which implies nonstationarity of the sources). We evaluated the method using synthetically mixed real signals. The results show a high capability of the method for separating speech signals. © 2011 EURASIP
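The segment-wise diagonality requirement stated in this abstract is the classic joint-diagonalization criterion: a good unmixing matrix should make the covariance of every source segment (nearly) diagonal simultaneously. A minimal sketch of that criterion, assuming instantaneous mixing for clarity (the paper handles the convolutive case); the names are illustrative:

```python
import numpy as np

def jd_cost(W, segment_covs):
    """Joint-diagonalization criterion: total squared off-diagonal
    energy of W C W^T over all segment covariance matrices C.

    W:            (n, n) candidate unmixing matrix
    segment_covs: iterable of (n, n) covariance matrices, one per
                  time segment (nonstationarity makes them differ)
    """
    cost = 0.0
    for C in segment_covs:
        M = W @ C @ W.T
        cost += np.sum((M - np.diag(np.diag(M))) ** 2)
    return float(cost)
```

Because the sources are nonstationary, the segment covariances differ, and only a true unmixing matrix can diagonalize all of them at once; this is what makes the criterion identifiable.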