
    Enhanced IVA for audio separation in highly reverberant environments

    Blind Audio Source Separation (BASS), inspired by the "cocktail-party problem", has been a leading research application for blind source separation (BSS). This thesis concerns the enhancement of frequency domain convolutive blind source separation (FDCBSS) techniques for audio separation in highly reverberant room environments. Independent component analysis (ICA) is a higher order statistics (HOS) approach commonly used in the BSS framework. When applied to audio FDCBSS, ICA based methods suffer from the permutation problem across the frequency bins of each source. Independent vector analysis (IVA) is an FD-BSS algorithm that theoretically solves the permutation problem by using a multivariate source prior, where the sources are considered to be random vectors. The algorithm allows independence between multivariate source signals, and retains dependency between the source signals within each source vector. The source prior adopted to model the nonlinear dependency structure within the source vectors is crucial to the separation performance of the IVA algorithm. The focus of this thesis is on improving the separation performance of the IVA algorithm in the application of BASS. An alternative multivariate Student's t distribution is proposed as the source prior for the batch IVA algorithm. A Student's t probability density function can better model certain frequency domain speech signals due to its tail dependency property. Then, the nonlinear score function, for the IVA, is derived from the proposed source prior. A novel energy driven mixed super Gaussian and Student's t source prior is proposed for the IVA and FastIVA algorithms. The Student's t distribution, in the mixed source prior, can model the high amplitude data points whereas the super Gaussian distribution can model the lower amplitude information in the speech signals. 
The ratio of both distributions can be adjusted according to the energy of the observed mixtures to adapt for different types of speech signals. A particular multivariate generalized Gaussian distribution is adopted as the source prior for the online IVA algorithm. The nonlinear score function derived from this proposed source prior contains fourth order relationships between different frequency bins, which provides a more informative and stronger dependency structure and thereby improves the separation performance. An adaptive learning scheme is developed to improve the performance of the online IVA algorithm. The scheme adjusts the learning rate as a function of proximity to the target solutions. The scheme is also accompanied with a novel switched source prior technique taking the best performance properties of the super Gaussian source prior and the generalized Gaussian source prior as the algorithm converges. The methods and techniques, proposed in this thesis, are evaluated with real speech source signals in different simulated and real reverberant acoustic environments. A variety of measures are used within the evaluation criteria of the various algorithms. The experimental results demonstrate improved performance of the proposed methods and their robustness in a wide range of situations

    Filter-Based Probabilistic Markov Random Field Image Priors: Learning, Evaluation, and Image Analysis

    Markov random fields (MRFs) based on linear filter responses are one of the most popular forms for modeling image priors due to their rigorous probabilistic interpretations and versatility in various applications. In this dissertation, we propose an application-independent method to quantitatively evaluate MRF image priors using model samples. To this end, we developed an efficient auxiliary-variable Gibbs sampler for a general class of MRFs with flexible potentials. We found that the popular pairwise and high-order MRF priors capture image statistics quite roughly and exhibit poor generative properties. We further developed new learning strategies and obtained high-order MRFs that capture well the statistics of the inbuilt features, thus being real maximum-entropy models, and other important statistical properties of natural images, outlining the capabilities of MRFs. We suggest a multi-modal extension of MRF potentials which not only allows more expressive priors to be trained, but also helps to reveal more insights into MRF variants, based on which we are able to train compact, fully-convolutional restricted Boltzmann machines (RBMs) that can model visual repetitive textures even better than more complex and deep models. The learned high-order MRFs allow us to develop new methods for various real-world image analysis problems. For denoising of natural images and deconvolution of microscopy images, the MRF priors are employed in a pure generative setting. We propose efficient sampling-based methods to infer Bayesian minimum mean squared error (MMSE) estimates, which substantially outperform maximum a-posteriori (MAP) estimates and can compete with state-of-the-art discriminative methods. For non-rigid registration of live cell nuclei in time-lapse microscopy images, we propose a global optical flow-based method. The statistics of noise in fluorescence microscopy images are studied to derive an adaptive weighting scheme for increasing model robustness. 
High-order MRFs are also employed to train image filters for extracting important features of cell nuclei, and the deformations of nuclei are then estimated in the learned feature spaces. The developed method outperforms previous approaches in terms of both registration accuracy and computational efficiency.

    Sequence learning using deep neural networks with flexibility and interpretability

    Throughout this thesis, I investigate two long-standing yet rarely explored sequence learning challenges under the Probabilistic Graphical Models (PGMs) framework: learning multi-timescale representations on a single sequence and learning higher-order dynamics between multi-sequences. The first challenge is tackled with Hidden Markov Models (HMMs), a type of directed PGMs, under the reinforcement learning framework. I prove that the Semi-Markov Decision Problem (SMDP) formulated option framework [Sutton et al., 1999, Bacon et al., 2017, Zhang and Whiteson, 2019], one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks, has a Markov Decision Problem (MDP) equivalence. Based on this equivalence, a simple yet effective Skill-Action (SA) architecture is proposed. Our empirical studies on challenging robot simulation environments demonstrate that SA significantly outperforms all baselines on both infinite horizon and transfer learning environments. Because of its exceptional scalability, SA gives rise to a large scale pre-training architecture in reinforcement learning. The second challenge is tackled with Markov Random Fields (MRFs), also known as undirected PGMs, under the supervised learning framework. I employ binary MRFs with weighted Lower Linear Envelope Potentials (LLEPs) to capture higher-order dependencies. I propose an exact inference algorithm under the graph-cuts framework and an efficient learning algorithm under the Latent Structural Support Vector Machines (LSSVMs) framework. In order to learn higher-order latent dynamics on time series, we layer multi-task recurrent neural networks (RNNs) on top of Markov random fields (MRFs). A sub-gradient algorithm is employed to perform end-to-end training. We conduct thorough empirical studies on three popular Chinese stock market indexes and the proposed method outperforms all baselines. 
To the best of our knowledge, the proposed technique is the first to investigate higher-order dynamics between stocks.

    Generalized method of moments approach for spatial-temporal binary data


    Doctor of Philosophy

    Functional magnetic resonance imaging (fMRI) measures the change of oxygen consumption level in the blood vessels of the human brain, hence indirectly detecting the neuronal activity. Resting-state fMRI (rs-fMRI) is used to identify the intrinsic functional patterns of the brain when there is no external stimulus. Accurate estimation of intrinsic activity is important for understanding the functional organization and dynamics of the brain, as well as differences in the functional networks of patients with mental disorders. This dissertation aims to robustly estimate the functional connectivities and networks of the human brain using rs-fMRI data of multiple subjects. We use a Markov random field (MRF), an undirected graphical model, to represent the statistical dependency among the functional network variables. Graphical models describe multivariate probability distributions that can be factorized and represented by a graph. By defining the nodes and the edges along with their weights according to our assumptions, we build soft constraints into the graph structure as prior information. We explore various approximate optimization methods including variational Bayesian inference, graph cuts, and Markov chain Monte Carlo (MCMC) sampling. We develop the random field models to solve three related problems. In the first problem, the goal is to detect the pairwise connectivity between gray matter voxels in an rs-fMRI dataset of a single subject. We define a six-dimensional graph to represent our prior information that two voxels are more likely to be connected if their spatial neighbors are connected. The posterior mean of the connectivity variables is estimated by variational inference, also known as mean field theory in statistical physics. The proposed method outperforms standard spatial smoothing and is able to detect finer patterns of brain activity. Our second work aims to identify multiple functional systems. 
We define a Potts model, a special case of MRF, on the network label variables, and define a von Mises-Fisher distribution on the normalized fMRI signal. The inference is significantly more difficult than the binary classification in the previous problem. We use MCMC to draw samples from the posterior distribution of network labels. In the third application, we extend the graphical model to the multiple-subject scenario. By building a graph including the network labels of both a group map and the subject label maps, we define a hierarchical model that has richer structure than the flat single-subject model, and captures the shared patterns as well as the variation among the subjects. All three solutions are data-driven Bayesian methods, which estimate model parameters from the data. The experiments show that, with MRF regularization, the functional network maps we estimate are more accurate and more consistent across multiple sessions.

    MATEDA: A suite of EDA programs in Matlab

    This paper describes MATEDA-2.0, a suite of programs in Matlab for estimation of distribution algorithms. The package allows the optimization of single and multi-objective problems with estimation of distribution algorithms (EDAs) based on undirected graphical models and Bayesian networks. The implementation is conceived for allowing the incorporation by the user of different combinations of selection, learning, sampling, and local search procedures. Other included methods allow the analysis of the structures learned by the probabilistic models, the visualization of particular features of these structures and the use of the probabilistic models as fitness modeling tools