160 research outputs found
Dictionary Learning for Sparse Representations With Applications to Blind Source Separation
During the past decade, sparse representation has attracted much attention in the signal processing community. It aims to represent a signal as a linear combination of a small number of elementary signals called atoms. These atoms constitute a dictionary, so that a signal can be expressed as the product of the dictionary and a sparse coefficient vector. This leads to the two main challenges studied in the literature: sparse coding (finding the coding coefficients for a given dictionary) and dictionary design (finding an appropriate dictionary to fit the data). Dictionary design is the focus of this thesis. Traditionally, signals are decomposed using predefined mathematical transforms, such as the discrete cosine transform (DCT), which forms the so-called analytical approach. In recent years, learning-based methods have been introduced to adapt the dictionary to a set of training data, leading to the technique of dictionary learning. Although this may involve higher computational complexity, learned dictionaries have the potential to offer improved performance compared with predefined ones. A dictionary learning algorithm typically iterates between two operations: sparse approximation and dictionary update. We focus on the dictionary update step, where the dictionary is optimized for a given sparsity pattern. A novel framework is proposed that generalizes benchmark mechanisms such as the method of optimal directions (MOD) and K-SVD by simultaneously updating an arbitrary set of codewords and the corresponding sparse coefficients, hence the term simultaneous codeword optimization (SimCO). Moreover, its extended formulation, regularized SimCO, mitigates the major bottleneck of dictionary update caused by singular points. First- and second-order optimization procedures are designed to solve both the primitive and the regularized SimCO formulations.
In addition, a tree-structured multi-level representation of the dictionary, based on clustering, is used to speed up the optimization in the sparse coding stage. This dictionary learning algorithm is also applied to the underdetermined blind speech separation problem, leading to a multi-stage method in which separation is reformulated as a sparse coding problem, with the dictionary learned by an adaptive algorithm. Using mutual coherence and the sparsity index, the performance of a variety of dictionaries for underdetermined speech separation is compared and analyzed, including dictionaries learned from speech mixtures and from ground-truth speech sources, as well as those predefined by mathematical transforms. Finally, we propose a new method for joint dictionary learning and source separation. Unlike the multi-stage method, it simultaneously estimates the mixing matrix, the dictionary and the sources in an alternating and blind manner. The advantages of all the proposed methods over state-of-the-art methods are demonstrated through extensive numerical tests.
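The dictionary update step described above can be illustrated with the method of optimal directions (MOD), one of the benchmark mechanisms the thesis generalizes: with the sparse coefficients held fixed, the optimal dictionary is a least-squares fit to the training data. A minimal numpy sketch (an illustration of MOD only, not the proposed SimCO; the function name is ours):

```python
import numpy as np

def mod_dictionary_update(X, A):
    """Method of optimal directions (MOD): least-squares dictionary update.

    X : (n, N) training signals, one per column
    A : (K, N) fixed sparse coefficients
    Returns D (n, K) minimizing ||X - D A||_F, with unit-norm atoms.
    """
    # Least-squares solution D = X A^T (A A^T)^{-1}, computed via the
    # pseudo-inverse for numerical robustness.
    D = X @ np.linalg.pinv(A)
    # Re-normalize atoms (in practice the scale is absorbed into A).
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    return D / norms
```

K-SVD and SimCO differ in also re-optimizing the nonzero coefficients; here the codes A are frozen, so the whole update reduces to one pseudo-inverse.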
Underdetermined instantaneous audio source separation via local Gaussian modeling
Underdetermined source separation is often carried out by modeling time-frequency source coefficients via a fixed sparse prior. This approach fails when the number of active sources in one time-frequency bin is larger than the number of channels, or when active sources lie on both sides of an inactive source. In this article, we partially address these issues by modeling time-frequency source coefficients via Gaussian priors with free variances. We study the resulting maximum likelihood criterion and derive a fast non-iterative optimization algorithm that finds the global minimum. We show that this algorithm outperforms state-of-the-art approaches on stereo instantaneous speech mixtures.
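Once the per-bin source variances are fixed, separation under such a local Gaussian model is a standard Wiener (posterior-mean) estimate. A minimal real-valued sketch with the variances assumed known (the paper estimates them by maximum likelihood and works with complex STFT coefficients; function and variable names are ours):

```python
import numpy as np

def local_gaussian_estimate(x, A, v):
    """Posterior-mean (Wiener) source estimate in one time-frequency bin
    under the local Gaussian model: s_j ~ N(0, v_j), x = A s.

    x : (2,) stereo mixture coefficients in this bin
    A : (2, J) instantaneous mixing matrix
    v : (J,) source variances in this bin (assumed known here)
    """
    V = np.diag(v)
    Sigma_x = A @ V @ A.T  # mixture covariance implied by the model
    return V @ A.T @ np.linalg.solve(Sigma_x, x)
```

When one variance is driven to zero, the corresponding source estimate vanishes and the two remaining sources are recovered by inverting the 2x2 restricted mixing matrix, which is how a Gaussian prior with free variances mimics, and generalizes, a binary sparsity assumption.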
Robust variational Bayesian clustering for underdetermined speech separation
The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined time-frequency (T-F) masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise, actual speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free, non-reverberant environments but suffer in realistic ones.
Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They essentially rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points, thereby addressing the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. For analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms to model interaural cues.
Such a model is, however, sensitive to outliers; therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy-tailed distribution can better capture outlier values than the Gaussian distribution and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies confirm the improved separation quality.
A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, specifically the possible presence of singularities; experimental results confirm an improvement in separation performance.
Finally, the joint modelling of interaural phase and level differences, and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the robust estimation provided by the Student's t-distribution with the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and uses the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. Through an extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in objective and subjective performance measures is achieved.
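The robustness argument above has a concrete mechanical form: under a Student's t model, EM assigns each sample a weight that shrinks as its squared deviation grows, so outliers are down-weighted automatically instead of dragging the estimate. A minimal single-component sketch (our simplification with a fixed robust scale, not the thesis's mixture-model masking):

```python
import numpy as np

def t_location(x, nu=3.0, n_iter=50):
    """Iteratively reweighted (EM-style) location estimate under a
    Student's t model with nu degrees of freedom.

    The scale is fixed from a robust initial estimate, a simplification;
    a full EM would re-estimate it each iteration.
    """
    mu = np.median(x)
    sigma2 = np.median(np.abs(x - mu)) ** 2 + 1e-12
    for _ in range(n_iter):
        # E-step: latent precision weights; large residuals get small weight.
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean.
        mu = np.sum(w * x) / np.sum(w)
    return mu
```

As nu grows the weights become constant and the plain Gaussian mean is recovered, which is exactly the outlier sensitivity the thesis sets out to replace.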
Cramér-Rao Bounds for Complex-Valued Independent Component Extraction: Determined and Piecewise Determined Mixing Models
This paper presents the Cramér-Rao Lower Bound (CRLB) for the complex-valued Blind Source Extraction (BSE) problem, based on the assumption that the target signal is independent of the other signals. Two instantaneous mixing models are considered. First, we consider the standard determined mixing model used in Independent Component Analysis (ICA), where the mixing matrix is square and non-singular and the number of latent sources equals that of the observed signals. The CRLB is computed for Independent Component Extraction (ICE), where the mixing matrix is re-parameterized in order to extract only one independent target source. The target source is assumed to be non-Gaussian or non-circular Gaussian, while the other signals (background) are circular Gaussian or non-Gaussian. The results confirm some previous observations known for the real domain and bring new results for the complex domain. Also, the CRLB for ICE is shown to coincide with that for ICA when the non-Gaussianity of the background is taken into account. Second, we extend the CRLB analysis to piecewise determined mixing models. Here, the observed signals are assumed to obey the determined mixing model within short blocks, where the mixing matrices can vary from block to block; however, either the mixing vector or the separating vector corresponding to the target source is assumed to be constant across the blocks. The CRLBs for the parameters of these models provide new performance bounds for the BSE problem.
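As a reminder of what a CRLB statement asserts, in a far simpler setting than the complex-valued ICE bounds above: for the mean of i.i.d. Gaussian data the Fisher information is n/σ², so any unbiased estimator satisfies var(µ̂) ≥ σ²/n, and the sample mean attains the bound. A quick Monte Carlo check of this textbook case (our own illustration, unrelated to the paper's derivations):

```python
import numpy as np

# CRLB for the mean of N(mu, sigma^2) from n i.i.d. samples:
# Fisher information is n / sigma^2, so var(mu_hat) >= sigma^2 / n.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 2.0, 1.5, 50, 20000

# Repeat the experiment many times and look at the estimator's variance.
estimates = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
empirical_var = estimates.var()
crlb = sigma**2 / n

print(f"empirical variance {empirical_var:.5f} vs CRLB {crlb:.5f}")
```

The two numbers agree to within Monte Carlo error, i.e. the sample mean is an efficient estimator here; the paper's bounds play the same benchmarking role for blind source extraction algorithms.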
A unified approach to sparse signal processing
A unified view of the area of sparse signal processing is presented in tutorial form by bringing together various fields in which the property of sparsity has been successfully exploited. For each of these fields, the algorithms and techniques developed to leverage sparsity are described succinctly. The common potential benefits of significant reductions in sampling rate and in processing operations through sparse signal processing are revealed. The key application domains of sparse signal processing are sampling, coding, spectral estimation, array processing, component analysis, and multipath channel estimation. In terms of the sampling process and reconstruction algorithms, linkages are made with random sampling, compressed sensing and rate of innovation. The redundancy introduced by channel coding i
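Of the reconstruction algorithms such tutorials link together, greedy pursuit is the easiest to sketch. Below, orthogonal matching pursuit (OMP) recovers a sparse vector from a small number of random projections, the standard compressed sensing toy setting (a generic sketch, not code from the survey):

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily select k columns of Phi,
    re-solving a least-squares problem on the growing support."""
    n = Phi.shape[1]
    support = []
    residual = y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Project y onto the span of the selected columns.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat
```

With a Gaussian measurement matrix and a sufficiently sparse target, far fewer measurements than signal dimensions suffice for exact recovery, which is the sampling-rate reduction the tutorial emphasizes.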
Exploitation of source nonstationarity in underdetermined blind source separation with advanced clustering techniques
The problem of blind source separation (BSS) is investigated. Following the assumption that the time-frequency (TF) distributions of the input sources do not overlap, a quadratic TF representation is used to exploit the sparsity of the statistically nonstationary sources. However, separation performance is shown to be limited by the selection of a certain threshold in classifying the eigenvectors of the TF matrices drawn from the observation mixtures. Two methods are therefore proposed, based on recently introduced advanced clustering techniques, namely Gap statistics and self-splitting competitive learning (SSCL), to mitigate the problem of eigenvector classification. The novel integration of these two approaches successfully overcomes the problem of artificial sources induced by insufficient knowledge of the threshold and enables automatic determination of the number of active sources over the observation, thereby greatly improving separation performance. Practical consequences of violating the TF orthogonality assumption in the current approach are also studied, which motivates a new solution robust to such violations. In this new method, the TF plane is partitioned into appropriate blocks and source separation is carried out block by block. Numerical experiments with linear chirp signals and Gaussian minimum shift keying (GMSK) signals support the improved performance of the proposed approaches.
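The quadratic TF representation exploited above can be made concrete with the Wigner-Ville distribution, which for a linear chirp concentrates all the energy along the instantaneous frequency, the sparsity property the eigenvector-based method relies on. A minimal sketch of one WVD time-slice (our own illustration; note the usual factor-2 frequency scaling of the discrete lag product):

```python
import numpy as np

def wvd_slice(x, t, M=64):
    """One time-slice of the (pseudo) Wigner-Ville distribution:
    the FFT over the lag variable of the instantaneous autocorrelation
    x[t+m] x*[t-m], m = -M .. M-1."""
    m = np.arange(-M, M)
    r = x[t + m] * np.conj(x[t - m])
    return np.abs(np.fft.fft(r))

# Complex linear chirp: phase pi*alpha*n^2, instantaneous frequency alpha*n.
N, alpha = 256, 1.0 / 1024
n = np.arange(N)
chirp = np.exp(1j * np.pi * alpha * n**2)

# For a linear chirp the lag product is a pure tone at twice the
# instantaneous frequency, so each slice is perfectly concentrated.
t, M = 100, 64
slice_t = wvd_slice(chirp, t, M)
peak_bin = int(slice_t.argmax())  # = 2*alpha*t*2M = 25 for these parameters
```

That near-impulsive structure per time instant is what makes the TF matrices of such sources close to rank one, so their eigenvectors carry the mixing information the clustering step classifies.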