
    Plausible Prediction by Bayesian Inference


    Extending expectation propagation for graphical models

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 101-106). By Yuan Qi.

    Graphical models have been widely used in many applications, ranging from human behavior recognition to wireless signal detection. However, efficient inference and learning techniques for graphical models are needed to handle complex models, such as hybrid Bayesian networks. This thesis proposes extensions of expectation propagation, a powerful generalization of loopy belief propagation, to develop efficient Bayesian inference and learning algorithms for graphical models. The first two chapters of the thesis present inference algorithms for generative graphical models, and the next two propose learning algorithms for conditional graphical models. First, the thesis proposes a window-based EP smoothing algorithm for online estimation on hybrid dynamic Bayesian networks. For an application in wireless communications, window-based EP smoothing achieves estimation accuracy comparable to sequential Monte Carlo methods, but at less than one-tenth of the computational cost. Second, it develops a new method that combines tree-structured EP approximations with the junction tree for inference on loopy graphs. This method saves computation and memory by propagating messages only locally to a subgraph when processing each edge in the graph. Using this local propagation scheme, the method is not only more accurate, but also faster than loopy belief propagation and structured variational methods. Third, it proposes predictive automatic relevance determination (ARD) to enhance classification accuracy in the presence of irrelevant features. ARD is a Bayesian technique for feature selection. The thesis discusses the overfitting problem associated with ARD, and proposes a method that optimizes the estimated predictive performance instead of maximizing the model evidence. For a gene expression classification problem, predictive ARD outperforms previous methods, including traditional ARD as well as support vector machines combined with feature selection techniques. Finally, it presents Bayesian conditional random fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs estimate the posterior distribution of model parameters and average predictions over this posterior to avoid overfitting. For the problems of frequently-asked-question labeling and of ink recognition, BCRFs achieve superior prediction accuracy over conditional random fields trained with maximum likelihood and maximum a posteriori criteria.
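    To make the expectation propagation recipe concrete, here is a minimal sketch for a toy one-dimensional probit model (Gaussian prior on a scalar parameter, binary observations): each likelihood factor is approximated by a Gaussian site, and sites are refined one at a time by matching the moments of the tilted distribution. The model, variable names, and update schedule are illustrative assumptions, not the hybrid-DBN, tree-structured EP, ARD, or BCRF algorithms developed in the thesis.

```python
# Minimal EP sketch for a toy 1-D probit model (illustrative only):
#   theta ~ N(0, v0),  p(y_i | theta) = Phi(y_i * theta),  y_i in {-1, +1}.
# Each likelihood factor gets a Gaussian "site"; EP cycles over sites,
# forming a cavity, moment-matching against the tilted distribution,
# and updating the site's natural parameters.
import numpy as np
from scipy.stats import norm

def ep_probit_1d(y, v0=10.0, sweeps=20):
    n = len(y)
    tau_site = np.zeros(n)   # site precisions
    nu_site = np.zeros(n)    # site precision-times-mean parameters
    for _ in range(sweeps):
        for i in range(n):
            # Global approximation q(theta) = N(m, v) from prior and all sites.
            tau_q = 1.0 / v0 + tau_site.sum()
            v, m = 1.0 / tau_q, nu_site.sum() / tau_q   # prior mean is zero
            # Cavity: remove site i from q.
            tau_cav = 1.0 / v - tau_site[i]
            if tau_cav <= 0:
                continue                                 # skip numerically bad updates
            nu_cav = m / v - nu_site[i]
            v_cav, m_cav = 1.0 / tau_cav, nu_cav / tau_cav
            # Moments of the tilted distribution Phi(y_i*theta) * N(theta; m_cav, v_cav).
            z = y[i] * m_cav / np.sqrt(1.0 + v_cav)
            ratio = norm.pdf(z) / norm.cdf(z)
            m_hat = m_cav + y[i] * v_cav * ratio / np.sqrt(1.0 + v_cav)
            v_hat = v_cav - v_cav**2 * ratio * (z + ratio) / (1.0 + v_cav)
            # Update site so that cavity * site reproduces the tilted moments.
            tau_site[i] = 1.0 / v_hat - tau_cav
            nu_site[i] = m_hat / v_hat - nu_cav
    tau_q = 1.0 / v0 + tau_site.sum()
    return nu_site.sum() / tau_q, 1.0 / tau_q            # posterior mean, variance

if __name__ == "__main__":
    y = np.array([1, 1, 1, -1, 1])
    m, v = ep_probit_1d(y)
    print(f"approximate posterior: mean={m:.3f}, var={v:.3f}")
```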

    Operations for Learning with Graphical Models

    This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms. (Comment: See http://www.jair.org/ for any accompanying file.)
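    As a concrete instance of the Gibbs sampling schema reviewed here, the following sketch resamples each variable of a toy bivariate Gaussian from its full conditional given the rest of its Markov blanket; the model and correlation value are illustrative assumptions, not an example taken from the paper.

```python
# Gibbs sampling sketch for a toy bivariate Gaussian with correlation rho:
# each coordinate is resampled from its full conditional in turn.
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        # p(x | y) = N(rho * y, 1 - rho^2), and symmetrically for y.
        x = rng.normal(rho * y, np.sqrt(1.0 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1.0 - rho**2))
        samples[t] = (x, y)
    return samples

if __name__ == "__main__":
    s = gibbs_bivariate_normal()
    # Discard a short burn-in before summarizing the chain.
    print("empirical correlation:", np.corrcoef(s[500:].T)[0, 1])
```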

    Linear latent force models using Gaussian processes.

    Purely data-driven approaches to machine learning present difficulties when data are scarce relative to the complexity of the model, or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave open the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data-driven modeling with a physical model of the system. We show how different, physically inspired kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology, and geostatistics.
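    A sketch of the kind of mechanistic assumption behind a linear latent force model is given below; the notation (D_d, B_d, S_dr, k_r) is assumed for illustration and may differ from the paper's.

```latex
% Illustrative first-order linear latent force model (notation assumed,
% not copied from the paper): each output x_d(t) is driven by R latent
% forces u_r(t) through a linear ODE, and each force has a GP prior.
\[
  \frac{\mathrm{d} x_d(t)}{\mathrm{d} t} + D_d\, x_d(t)
      = B_d + \sum_{r=1}^{R} S_{dr}\, u_r(t),
  \qquad u_r(t) \sim \mathcal{GP}\bigl(0, k_r(t, t')\bigr).
\]
% Because the ODE is linear, the outputs are jointly Gaussian, and their
% cross-covariances k_{x_d x_{d'}}(t, t') follow in closed form from k_r,
% yielding physically inspired kernels for standard GP inference.
```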

    Reasoning with uncertainty using Nilsson's probabilistic logic and the maximum entropy formalism

    An expert system must reason with both certain and uncertain information. This thesis is concerned with the process of Reasoning with Uncertainty. Nilsson's elegant model of "Probabilistic Logic" has been chosen as the framework for this investigation, and the information-theoretic aspect of the maximum entropy formalism as the inference engine. These two formalisms, although semantically compelling, pose major complexity problems for the implementor. Probabilistic Logic models the complete uncertainty space, and the maximum entropy formalism finds the least-commitment probability distribution within that space. The main finding of this thesis is that Nilsson's Probabilistic Logic can be successfully developed beyond the structure proposed by Nilsson. Some deficiencies in Nilsson's model have been uncovered in the area of probabilistic representation, making Probabilistic Logic less powerful than Bayesian inference techniques. These deficiencies are examined, and a new model of entailment is presented which overcomes them, giving Probabilistic Logic the full representational power of Bayesian inference. The new model also preserves an important extension that Nilsson's Probabilistic Logic has over Bayesian inference: the ability to use uncertain evidence. Traditionally, the probabilistic solution proposed by the maximum entropy formalism is arrived at by solving non-linear simultaneous equations for the aggregate factors of the non-linear terms. In the new model, the maximum entropy algorithms are shown to have the highly desirable property of tractability. Although these problems have been solved for probabilistic entailment, problems of complexity are still prevalent in large databases of expert rules. This thesis therefore also considers the use of heuristics and meta-level reasoning in a complex knowledge base. Finally, a description of an expert system using these techniques is given.
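    The maximum entropy step can be illustrated with a tiny worked example (the sentences and probabilities below are assumed for illustration, not taken from the thesis): given P(A) = 0.7 and P(A -> B) = 0.8 as constraints over the four possible worlds, the least-commitment distribution is found numerically and the entailed P(B) is read off.

```python
# Maximum entropy over possible worlds, probabilistic-logic style
# (illustrative constraints only). Worlds ordered (A,B), (A,~B), (~A,B), (~A,~B).
import numpy as np
from scipy.optimize import minimize

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},              # normalisation
    {"type": "eq", "fun": lambda p: p[0] + p[1] - 0.7},          # P(A) = 0.7
    {"type": "eq", "fun": lambda p: p[0] + p[2] + p[3] - 0.8},   # P(A -> B) = P(~A or B) = 0.8
]

def neg_entropy(p, eps=1e-12):
    # Minimizing negative entropy = maximizing entropy.
    return np.sum(p * np.log(p + eps))

result = minimize(neg_entropy, x0=np.full(4, 0.25),
                  bounds=[(0.0, 1.0)] * 4, constraints=constraints,
                  method="SLSQP")
p = result.x
print("world probabilities:", np.round(p, 4))
print("entailed P(B):", round(p[0] + p[2], 4))
```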

    Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations

    Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these, Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning, in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite-sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design), and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide range of GP models far beyond the special cases studied here.
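    For reference, the PAC-Bayesian bound the first chapter builds on is commonly stated in the following binary-KL form; the constants and conditions here follow standard presentations and may differ in detail from the statement proved in the thesis.

```latex
% Common statement of the PAC-Bayesian theorem (assumed form, for
% illustration): P is a prior over classifiers fixed before seeing the
% data, Q is any posterior, and n is the sample size. With probability
% at least 1 - \delta over the i.i.d. sample, simultaneously for all Q:
\[
  \mathrm{kl}\!\left( \hat{R}(Q) \,\middle\|\, R(Q) \right)
    \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{n+1}{\delta}}{n},
\]
% where \hat{R}(Q) and R(Q) are the empirical and true Gibbs risks and
% \mathrm{kl}(q \| p) is the KL divergence between Bernoulli distributions
% with parameters q and p. Inverting the binary KL in its second argument
% turns this into an upper bound on the generalisation error R(Q).
```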