Statistical single channel source separation
PhD Thesis
Single channel source separation (SCSS) is one of the challenging fields
in signal processing and has various significant applications. Unlike conventional
SCSS methods, which are based on a linear instantaneous model, this research sets out
to investigate single channel separation for two types of mixture: the nonlinear
instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS in an
instantaneous mixture, this research proposes a novel solution based on a
two-stage process: a Gaussianization transform that efficiently compensates
for the nonlinear distortion, followed by a maximum likelihood estimator that
performs source separation. For linear SCSS in a convolutive mixture, this research
proposes new methods based on nonnegative matrix factorization which decomposes a
mixture into two-dimensional convolution factor matrices that represent the spectral
basis and temporal code. The proposed factorization considers the convolutive mixing
in the decomposition by introducing frequency constrained parameters in the model.
The method aims to separate the mixture into its constituent spectral-temporal source
components while alleviating the effect of convolutive mixing. In addition, a family of
Itakura-Saito divergences has been developed as a cost function, which brings the
beneficial property of scale invariance. Two new statistical techniques are proposed,
namely an Expectation-Maximisation (EM) based algorithm framework, which
maximises the log-likelihood of a mixed signal, and a maximum a posteriori
approach, which maximises the joint probability of a mixed signal using multiplicative
update rules. To further improve this research work, a novel method that incorporates
adaptive sparseness into the solution has been proposed to resolve the ambiguity and
hence improve the algorithm performance. The theoretical foundation of the proposed
solutions has been rigorously developed and discussed in detail. The results show
that all the algorithms proposed in this thesis are effective in separating
single channel mixtures and outperform other available methods.
Universiti Teknikal Malaysia Melaka (UTeM), Ministry of Higher Education of Malaysia
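The scale invariance attributed to the Itakura-Saito divergence in the abstract above can be checked numerically. The following is a minimal sketch, assuming the standard elementwise definition d_IS(x|y) = x/y - log(x/y) - 1; the function names and sample values are illustrative and are not taken from the thesis.

```python
import math

def is_divergence(x, y):
    """Itakura-Saito divergence between two sequences of positive numbers."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(x, y))

def squared_euclidean(x, y):
    """Squared Euclidean distance, shown for contrast (not scale invariant)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Illustrative power-spectrum values with a large dynamic range (assumed data).
x = [4.0, 0.5, 0.02]
y = [3.0, 0.7, 0.01]

# Scale both arguments by the same positive factor.
scale = 1000.0
xs = [scale * a for a in x]
ys = [scale * b for b in y]

# The IS divergence is unchanged up to rounding error, because it depends
# only on the ratios x/y.
is_gap = abs(is_divergence(xs, ys) - is_divergence(x, y))

# The squared Euclidean distance grows by scale**2 instead.
eu_ratio = squared_euclidean(xs, ys) / squared_euclidean(x, y)

print(is_gap, eu_ratio)
```

Because the cost depends only on ratios, low-energy time-frequency bins contribute as much as high-energy ones, which is the usual argument for this cost function on audio spectra.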
Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain
Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies
Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS
Recent studies show that visual information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterisation of the coherence between the audio and visual speech using, e.g. a Gaussian mixture model (GMM). In this paper, we present two new contributions. An adapted expectation maximization (AEM) algorithm is proposed in the training process to model the audio-visual coherence upon the extracted features. The coherence is exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS
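The permutation problem mentioned in the abstract above arises because frequency-domain BSS separates each frequency bin independently, so the order of the outputs can differ from bin to bin. The sketch below illustrates the general idea of score-based alignment only; it is not the paper's sorting scheme. The `scores` table is a hypothetical stand-in for whatever coherence measure is available (in the paper, one derived from an audio-visual model), and the function name is an assumption.

```python
from itertools import permutations

def align_bins(scores):
    """Pick, for each frequency bin, the output ordering with the highest total score.

    scores[f][i][j] is a (hypothetical) coherence score for assigning
    separated output i to source j at frequency bin f.
    Returns one tuple p per bin, where p[j] is the output assigned to source j.
    """
    aligned = []
    for bin_scores in scores:
        n = len(bin_scores)
        # Exhaustive search over orderings; fine for a small number of sources.
        best = max(
            permutations(range(n)),
            key=lambda p: sum(bin_scores[p[j]][j] for j in range(n)),
        )
        aligned.append(best)
    return aligned

# Two sources, two frequency bins (made-up scores for illustration).
scores = [
    [[0.9, 0.1], [0.2, 0.8]],  # bin 0: outputs already in source order
    [[0.3, 0.7], [0.6, 0.4]],  # bin 1: outputs are swapped
]
order = align_bins(scores)
print(order)
```

Exhaustive search costs n! evaluations per bin, so it is only practical for a handful of sources; with more sources an assignment solver would be the usual replacement.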