Robust variational Bayesian clustering for underdetermined speech separation
The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined time-frequency (T-F) masking blind separation of speech. While humans can extract a speech signal of interest in the presence of other interference and noise, current speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free, non-reverberant environments but degrade in realistic ones.
Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks that extract individual sources from their corresponding spectrogram points, thereby addressing the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. For analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms to model the interaural cues.
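The clustering view described above can be illustrated with a minimal sketch (not the thesis's implementation): interaural phase differences (IPDs) are computed at every T-F point of a toy stereo mixture and clustered with a Gaussian mixture, whose posterior responsibilities then serve as soft masks. All signal sizes and phase offsets below are invented for illustration.

```python
# Illustrative sketch of T-F masking as clustering: IPDs of a synthetic
# stereo mixture are clustered with a GMM; posteriors act as soft masks.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stereo spectrograms: each T-F point belongs to one of two sources,
# each source imposing a distinct interaural phase offset (invented values).
n_frames, n_bins = 50, 64
phase = rng.uniform(-np.pi, np.pi, (n_frames, n_bins))
left = np.exp(1j * phase)
true_ipd = np.where(rng.random((n_frames, n_bins)) < 0.5, 0.5, -0.8)
right = np.exp(1j * (phase + true_ipd + 0.05 * rng.normal(size=(n_frames, n_bins))))

# IPD at each T-F point: angle of the cross-spectrum.
ipd = np.angle(left * np.conj(right)).reshape(-1, 1)

# Cluster the IPDs; posterior responsibilities form one soft mask per source.
gmm = GaussianMixture(n_components=2, random_state=0).fit(ipd)
masks = gmm.predict_proba(ipd).reshape(n_frames, n_bins, 2)
print(masks.shape)  # (50, 64, 2): a soft mask per source
```

Multiplying each mask with the mixture spectrogram would then extract one source, which is the essence of the T-F masking approach discussed here.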
Such a model is, however, sensitive to outliers; a robust probabilistic model based on the Student's t-distribution is therefore first proposed to improve the robustness of the statistical framework. Compared with the Gaussian distribution, this heavy-tailed distribution can better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies confirm the improved separation quality.
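The robustness argument can be made concrete with a small numerical check: the Student's t-distribution assigns an outlier orders of magnitude more likelihood than a Gaussian of the same scale, so a fitted t-mixture is pulled far less by stray T-F points. The specific numbers below are just one illustrative comparison.

```python
# Why heavy tails help: likelihood of a 6-sigma outlier under a standard
# Gaussian vs. a Student's t with 3 degrees of freedom (illustrative values).
from scipy import stats

x_outlier = 6.0                          # a point six scale units from the mean
p_gauss = stats.norm.pdf(x_outlier)      # ~6e-9: essentially impossible
p_t = stats.t.pdf(x_outlier, df=3)       # ~2e-3: rare but plausible
print(p_gauss, p_t)
```

Because the Gaussian deems the outlier essentially impossible, an MLE fit distorts its parameters to accommodate it, whereas the t-distribution absorbs it in its tails.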
A Bayesian clustering framework that better models the uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, notably the possible presence of singularities; experimental results confirm an improvement in separation performance.
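The VB-versus-MLE point can be sketched with an off-the-shelf variational mixture (this is scikit-learn's generic implementation, not the thesis's VB-MESSL): the priors over component parameters prevent the likelihood singularities that occur in EM/MLE when a Gaussian component collapses onto a single point. The data below are invented IPD-like features.

```python
# Hedged sketch: variational Bayesian clustering replaces MLE point
# estimates with approximate posteriors, avoiding collapsed components.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Two clusters of IPD-like values plus one isolated point that, under MLE,
# could capture its own component (variance -> 0, likelihood -> infinity).
data = np.concatenate([rng.normal(-0.8, 0.1, 200),
                       rng.normal(0.5, 0.1, 200),
                       [2.5]]).reshape(-1, 1)

vb = BayesianGaussianMixture(n_components=3,
                             weight_concentration_prior=0.1,
                             random_state=0).fit(data)
resp = vb.predict_proba(data)     # soft assignments, usable as masks
print(vb.weights_.round(3))       # surplus components get near-zero weight
```

The Dirichlet prior on the weights also lets the model switch off unneeded components, a form of automatic model selection that EM/MLE lacks.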
Finally, the joint modelling of the interaural phase and level differences, together with the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the robust estimation provided by the Student's t-distribution with the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and exploits the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. An extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios demonstrates a significant improvement in both objective and subjective performance measures.
Dictionary Learning for Sparse Representations With Applications to Blind Source Separation
During the past decade, sparse representation has attracted much attention in the signal processing community. It aims to represent a signal as a linear combination of a small number of elementary signals called atoms. These atoms constitute a dictionary, so that a signal can be expressed as the product of the dictionary and a sparse coefficient vector. This leads to the two main challenges studied in the literature: sparse coding (finding the coding coefficients given a dictionary) and dictionary design (finding an appropriate dictionary to fit the data). Dictionary design is the focus of this thesis.

Traditionally, signals are decomposed by a predefined mathematical transform, such as the discrete cosine transform (DCT), forming the so-called analytical approach. In recent years, learning-based methods have been introduced to adapt the dictionary to a set of training data, leading to the technique of dictionary learning. Although this may incur a higher computational complexity, learned dictionaries have the potential to offer improved performance compared with predefined ones.

Dictionary learning is often achieved by iteratively executing two operations: sparse approximation and dictionary update. We focus on the dictionary update step, where the dictionary is optimized for a given sparsity pattern. A novel framework is proposed to generalize benchmark mechanisms such as the method of optimal directions (MOD) and K-SVD, in which an arbitrary set of codewords and the corresponding sparse coefficients are simultaneously updated, hence the term simultaneous codeword optimization (SimCO). Moreover, its extended formulation, regularized SimCO, mitigates the major bottleneck of the dictionary update caused by singular points. First- and second-order optimization procedures are designed to solve both the primitive and regularized SimCO formulations.
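For orientation, the dictionary update step that SimCO generalizes can be sketched in the style of MOD: with the sparse codes A held fixed, the least-squares-optimal dictionary has the closed form D = X Aᵀ(AAᵀ)⁻¹, followed by renormalizing each atom. This is a generic MOD-style step with toy sizes, not the SimCO algorithm itself.

```python
# Illustrative MOD-style dictionary update: closed-form least squares over
# the dictionary with the sparse codes fixed, then unit-norm atoms.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 16, 32, 200            # signal dim, number of atoms, training signals
X = rng.normal(size=(n, k))      # training data (columns are signals)
A = rng.normal(size=(m, k)) * (rng.random((m, k)) < 0.1)  # sparse codes

# D = X A^T (A A^T)^{-1}; pinv guards against rank deficiency.
D = X @ A.T @ np.linalg.pinv(A @ A.T)
D /= np.linalg.norm(D, axis=0, keepdims=True)   # renormalize each atom

print(np.linalg.norm(X - D @ A))                # residual after the update
```

K-SVD instead updates one atom at a time together with its coefficients; SimCO, as described above, updates an arbitrary set of codewords and their coefficients simultaneously.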
In addition, a tree-structured multi-level representation of the dictionary, based on clustering, is used to speed up the optimization in the sparse coding stage. This novel dictionary learning algorithm is also applied to the underdetermined blind speech separation problem, leading to a multi-stage method in which the separation problem is reformulated as a sparse coding problem, with the dictionary learned by an adaptive algorithm. Using mutual coherence and a sparsity index, the performance of a variety of dictionaries for underdetermined speech separation is compared and analyzed, including dictionaries learned from speech mixtures and from ground-truth speech sources, as well as those predefined by mathematical transforms. Finally, we propose a new method for joint dictionary learning and source separation. Unlike the multi-stage method, the proposed method simultaneously estimates the mixing matrix, the dictionary, and the sources in an alternating and blind manner. The advantages of all the proposed methods are demonstrated over state-of-the-art methods through extensive numerical tests.
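One of the dictionary-quality measures mentioned above, mutual coherence, is simple enough to state in a few lines: it is the largest absolute inner product between distinct unit-norm atoms, with lower coherence generally favouring sparse recovery. The helper below is a generic sketch, not the thesis's evaluation code.

```python
# Mutual coherence of a dictionary: max |<d_i, d_j>| over distinct
# unit-norm atoms (columns). Lower values favour sparse recovery.
import numpy as np

def mutual_coherence(D):
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    G = np.abs(Dn.T @ Dn)                               # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                            # ignore self-products
    return G.max()

rng = np.random.default_rng(0)
print(mutual_coherence(np.eye(8)))                  # 0.0: orthonormal basis
print(mutual_coherence(rng.normal(size=(32, 64))))  # overcomplete: > 0
```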
Exploiting spatial sparsity for multi-wavelength imaging in optical interferometry
Optical interferometers provide multiple wavelength measurements. In order to
fully exploit the spectral and spatial resolution of these instruments, new
algorithms for image reconstruction have to be developed. Early attempts to
deal with multi-chromatic interferometric data consisted of recovering a
gray image of the object or independent monochromatic images in selected
spectral bands. The main challenge is now to recover the full 3-D (spatio-spectral)
brightness distribution of the astronomical target given all the available
data. We describe a new approach to implement multi-wavelength image
reconstruction in the case where the observed scene is a collection of
point-like sources. We show the gain in image quality (both spatially and
spectrally) achieved by globally taking into account all the data instead of
dealing with independent spectral slices. This is achieved thanks to a
regularization which favors spatial sparsity and spectral grouping of the
sources. Since the objective function is not differentiable, we had to develop
a specialized optimization algorithm which also accounts for non-negativity of
the brightness distribution. (This version has been accepted for publication in J. Opt. Soc. Am.)
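The regularizer described above, which favours spatial sparsity with spectral grouping, is commonly written as an l2,1 norm: a sum over pixels of the l2 norm of each pixel's spectrum. Its non-differentiability is handled through its proximal operator, group soft-thresholding, sketched below on invented numbers (this is the generic operator, not the paper's full algorithm).

```python
# Group soft-thresholding, the proximal operator of tau * sum_p ||x_p||_2:
# a pixel's whole spectrum is either shrunk or zeroed, which enforces
# spatial sparsity while keeping the wavelengths of a source grouped.
import numpy as np

def prox_l21(X, tau):
    """Rows of X are the spectra of individual pixels."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X

X = np.array([[3.0, 4.0],    # bright pixel (spectrum norm 5): kept, shrunk
              [0.1, 0.1]])   # faint pixel: zeroed across all wavelengths
print(prox_l21(X, tau=1.0))  # first row becomes [2.4, 3.2], second [0, 0]
```

A non-negativity constraint, as mentioned in the abstract, can be composed with this step inside a proximal splitting scheme.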
Model updating in structural dynamics: advanced parametrization, optimal regularization, and symmetry considerations
Numerical models are pervasive tools in science and engineering for simulation, design, and assessment of physical systems. In structural engineering, finite element (FE) models are extensively used to predict responses and estimate risk for built structures. While FE models attempt to exactly replicate the physics of their corresponding structures, discrepancies always exist between measured and model output responses. Discrepancies are related to aleatoric uncertainties, such as measurement noise, and epistemic uncertainties, such as modeling errors. Epistemic uncertainties indicate that the FE model may not fully represent the built structure, greatly limiting its utility for simulation and structural assessment. Model updating is used to reduce error between measurement and model-output responses through adjustment of uncertain FE model parameters, typically using data from structural vibration studies. However, the model updating problem is often ill-posed with more unknown parameters than available data, such that parameters cannot be uniquely inferred from the data.
This dissertation focuses on two approaches to remedy ill-posedness in FE model updating: parametrization and regularization. Parametrization produces a reduced set of updating parameters to estimate, thereby improving posedness. An ideal parametrization should incorporate model uncertainties, effectively reduce errors, and use as few parameters as possible. This is a challenging task, since a large number of candidate parametrizations is available in any model updating problem. To address this, three new parametrization techniques are proposed: improved parameter clustering with residual-based weighting, singular vector decomposition-based parametrization, and incremental reparametrization. All of these methods utilize local system sensitivity information, providing effective reduced-order parametrizations that incorporate FE model uncertainties.
The other focus of this dissertation is regularization, which improves posedness by providing additional constraints on the updating problem, such as a minimum-norm parameter solution constraint. Optimal regularization is proposed for model updating to provide an optimal balance between residual reduction and parameter change minimization. This approach links computationally efficient deterministic model updating with asymptotic Bayesian inference to provide regularization based on maximal model evidence. Estimates are also provided for uncertainties and model evidence, along with an interesting measure of parameter efficiency.
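The trade-off between residual reduction and parameter change can be illustrated with a single Tikhonov-regularized Gauss-Newton step; choosing the regularization weight is exactly what the optimal-regularization approach above addresses, whereas the sketch below simply fixes it. The sensitivity matrix and residual are random stand-ins for a real FE model's.

```python
# Regularized model-updating step: with sensitivity matrix J (response
# derivatives w.r.t. parameters) and residual r, solve
#   dtheta = (J^T J + lam*I)^{-1} J^T r,
# trading residual reduction against the size of the parameter change.
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(6, 10))   # ill-posed: more parameters than measurements
r = rng.normal(size=6)
lam = 0.5                      # fixed here; chosen optimally in the text

dtheta = np.linalg.solve(J.T @ J + lam * np.eye(10), J.T @ r)
print(np.linalg.norm(dtheta))  # a bounded step despite rank-deficient J^T J
```

In the Bayesian reading linked above, lam plays the role of a prior precision on the parameter changes, and maximizing model evidence selects it from the data.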