301 research outputs found

    Fast factorisation of probabilistic potentials and its application to approximate inference in Bayesian networks

    Get PDF
    We present an efficient procedure for factorising probabilistic potentials represented as probability trees. This new procedure is able to detect some regularities that cannot be captured by existing methods. In cases where an exact decomposition is not achievable, we propose a heuristic way to carry out approximate factorisations guided by a parameter called factorisation degree, which is fast to compute. We show how this parameter can be used to control the tradeoff between complexity and accuracy in approximate inference algorithms for Bayesian networks

    New strategies for finding multiplicative decompositions of probability trees

    Get PDF
    Probability trees are a powerful data structure for representing probabilistic potentials. However, their complexity can become intractable if they represent a probability distribution over a large set of variables. In this paper, we study the problem of decomposing a probability tree as a product of smaller trees, with the aim of being able to handle bigger probabilistic potentials. We propose exact and approximate approaches and evaluate their behaviour through an extensive set of experiments

    The Functional Anatomy of Time: What and When in the Brain

    Get PDF
    This Opinion article considers the implications for functional anatomy of how we represent temporal structure in our exchanges with the world. It offers a theoretical treatment that tries to make sense of the architectural principles seen in mammalian brains. Specifically, it considers a factorisation between representations of temporal succession and representations of content or, heuristically, a segregation into when and what. This segregation may explain the central role of the hippocampus in neuronal hierarchies while providing a tentative explanation for recent observations of how ordinal sequences are encoded. The implications for neuroanatomy and physiology may have something important to say about how self-organised cell assembly sequences enable the brain to exhibit purposeful behaviour that transcends the here and now

    Modules or mean-fields?

    Get PDF
    The segregation of neural processing into distinct streams has been interpreted by some as evidence in favour of a modular view of brain function. This implies a set of specialised 'modules', each of which performs a specific kind of computation in isolation of other brain systems, before sharing the result of this operation with other modules. In light of a modern understanding of stochastic non-equilibrium systems, like the brain, a simpler and more parsimonious explanation presents itself. Formulating the evolution of a non-equilibrium steady state system in terms of its density dynamics reveals that such systems appear on average to perform a gradient ascent on their steady state density. If this steady state implies a sufficiently sparse conditional independency structure, this endorses a mean-field dynamical formulation. This decomposes the density over all states in a system into the product of marginal probabilities for those states. This factorisation lends the system a modular appearance, in the sense that we can interpret the dynamics of each factor independently. However, the argument here is that it is factorisation, as opposed to modularisation, that gives rise to the functional anatomy of the brain or, indeed, any sentient system. In the following, we briefly overview mean-field theory and its applications to stochastic dynamical systems. We then unpack the consequences of this factorisation through simple numerical simulations and highlight the implications for neuronal message passing and the computational architecture of sentience

    Scalable Bayesian inversion with Poisson data

    Get PDF
    Poisson data arise in many important inverse problems, e.g., medical imaging. The stochastic nature of noisy observation processes and imprecise prior information implies that there exists an ensemble of solutions consistent with the given Poisson data to various extents. Existing approaches, e.g., maximum likelihood and penalised maximum likelihood, incorporate the statistical information for point estimates, but fail to provide the important uncertainty information of various possible solu- tions. While full Bayesian approaches can solve this problem, the posterior distributions are often intractable due to their complicated form and the curse of dimensionality. In this thesis, we investigate approximate Bayesian inference techniques, i.e., variational inference (VI), expectation propagation (EP) and Bayesian deep learning (BDL), for scalable posterior exploration. The scalability relies on leveraging 1) mathematical structures emerging in the problems, i.e., the low rank structure of forward operators and the rank 1 projection form of factors in the posterior distribution, and 2) efficient feed forward processes of neural networks and further reduced training time by flexibility of dimensions with incorporating forward and adjoint operators. Apart from the scalability, we also address theoretical analysis, algorithmic design and practical implementation. For VI, we derive explicit functional form and analyse the convergence of algorithms, which are long-standing problems in the literature. For EP, we discuss how to incorporate nonnegative constraints and how to design stable moment evaluation schemes, which are vital and nontrivial practical concerns. For BDL, specifically conditional variational auto-encoders (CVAEs), we investigate how to apply them for uncertainty quantification of inverse problems and develop flexible and novel frameworks for general Bayesian Inversion. Finally, we justify these contributions with numerical experiments and show the competitiveness of our proposed methods by comparing with state-of-the-art benchmarks

    On conditional random fields: applications, feature selection, parameter estimation and hierarchical modelling

    Get PDF
    There has been a growing interest in stochastic modelling and learning with complex data, whose elements are structured and interdependent. One of the most successful methods to model data dependencies is graphical models, which is a combination of graph theory and probability theory. This thesis focuses on a special type of graphical models known as Conditional Random Fields (CRFs) (Lafferty et al., 2001), in which the output state spaces, when conditioned on some observational input data, are represented by undirected graphical models. The contributions of thesis involve both (a) broadening the current applicability of CRFs in the real world and (b) deepening the understanding of theoretical aspects of CRFs. On the application side, we empirically investigate the applications of CRFs in two real world settings. The first application is on a novel domain of Vietnamese accent restoration, in which we need to restore accents of an accent-less Vietnamese sentence. Experiments on half a million sentences of news articles show that the CRF-based approach is highly accurate. In the second application, we develop a new CRF-based movie recommendation system called Preference Network (PN). The PN jointly integrates various sources of domain knowledge into a large and densely connected Markov network. We obtained competitive results against well-established methods in the recommendation field.On the theory side, the thesis addresses three important theoretical issues of CRFs: feature selection, parameter estimation and modelling recursive sequential data. These issues are all addressed under a general setting of partial supervision in that training labels are not fully available. For feature selection, we introduce a novel learning algorithm called AdaBoost.CRF that incrementally selects features out of a large feature pool as learning proceeds. AdaBoost.CRF is an extension of the standard boosting methodology to structured and partially observed data. We demonstrate that the AdaBoost.CRF is able to eliminate irrelevant features and as a result, returns a very compact feature set without significant loss of accuracy. Parameter estimation of CRFs is generally intractable in arbitrary network structures. This thesis contributes to this area by proposing a learning method called AdaBoost.MRF (which stands for AdaBoosted Markov Random Forests). As learning proceeds AdaBoost.MRF incrementally builds a tree ensemble (a forest) that cover the original network by selecting the best spanning tree at a time. As a result, we can approximately learn many rich classes of CRFs in linear time. The third theoretical work is on modelling recursive, sequential data in that each level of resolution is a Markov sequence, where each state in the sequence is also a Markov sequence at the finer grain. One of the key contributions of this thesis is Hierarchical Conditional Random Fields (HCRF), which is an extension to the currently popular sequential CRF and the recent semi-Markov CRF (Sarawagi and Cohen, 2004). Unlike previous CRF work, the HCRF does not assume any fixed graphical structures.Rather, it treats structure as an uncertain aspect and it can estimate the structure automatically from the data. The HCRF is motivated by Hierarchical Hidden Markov Model (HHMM) (Fine et al., 1998). Importantly, the thesis shows that the HHMM is a special case of HCRF with slight modification, and the semi-Markov CRF is essentially a flat version of the HCRF. Central to our contribution in HCRF is a polynomial-time algorithm based on the Asymmetric Inside Outside (AIO) family developed in (Bui et al., 2004) for learning and inference. Another important contribution is to extend the AIO family to address learning with missing data and inference under partially observed labels. We also derive methods to deal with practical concerns associated with the AIO family, including numerical overflow and cubic-time complexity. Finally, we demonstrate good performance of HCRF against rivals on two applications: indoor video surveillance and noun-phrase chunking

    Variational Approximate Inference in Latent Linear Models

    Get PDF
    Latent linear models are core to much of machine learning and statistics. Specific examples of this model class include Bayesian generalised linear models, Gaussian process regression models and unsupervised latent linear models such as factor analysis and principal components analysis. In general, exact inference in this model class is computationally and analytically intractable. Approximations are thus required. In this thesis we consider deterministic approximate inference methods based on minimising the Kullback-Leibler (KL) divergence between a given target density and an approximating `variational' density. First we consider Gaussian KL (G-KL) approximate inference methods where the approximating variational density is a multivariate Gaussian. Regarding this procedure we make a number of novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described, constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are presented, the G-KL lower-bound to the target density's normalisation constant is proven to dominate those provided by local variational bounding methods. We also discuss complexity and model applicability issues of G-KL and other Gaussian approximate inference methods. To numerically validate our approach we present results comparing the performance of G-KL and other deterministic Gaussian approximate inference methods across a range of latent linear model inference problems. Second we present a new method to perform KL variational inference for a broad class of approximating variational densities. Specifically, we construct the variational density as an affine transformation of independently distributed latent random variables. The method we develop extends the known class of tractable variational approximations for which the KL divergence can be computed and optimised and enables more accurate approximations of non-Gaussian target densities to be obtained
    • …
    corecore