
    Bayesian estimation and classification with incomplete data using mixture models

    ©2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Reasoning from data in practical problems is frequently hampered by missing observations. Mixture models provide a powerful general semi-parametric method for modelling densities and have close links to radial basis function neural networks (RBFs). We extend the Data Augmentation (DA) technique for multiple imputation to Gaussian mixture models to permit fully Bayesian inference of model parameters and estimation of the missing values. The method is compared to imputation using a single normal density on synthetic and real-world data. In addition to a lower mean squared error than can be achieved by simple imputation methods, mixture models provide valuable information on the potentially multi-modal nature of imputed values. The DA formalism is extended to a classifier closely related to RBF networks, permitting Bayesian classification with incomplete data; the technique is illustrated on synthetic and real datasets.
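
The single-normal baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a bivariate normal with known mean and covariance and fills a missing second coordinate with its conditional mean given the first. The function name and the numbers are hypothetical, and this is not the paper's Data Augmentation procedure.

```python
def impute_conditional_mean(x1, mean, cov):
    """Return E[x2 | x1] for a bivariate normal with the given mean and covariance."""
    mu1, mu2 = mean
    s11 = cov[0][0]  # variance of the observed coordinate
    s12 = cov[0][1]  # covariance between observed and missing coordinates
    return mu2 + (s12 / s11) * (x1 - mu1)

# Illustrative (made-up) parameters, not taken from the paper.
mean = (0.0, 1.0)
cov = [[2.0, 1.0], [1.0, 3.0]]
imputed_x2 = impute_conditional_mean(2.0, mean, cov)
print(imputed_x2)  # 2.0
```

A single normal always yields one unimodal imputation of this kind, which is consistent with the abstract's point that mixture models can additionally reveal multi-modal imputed values.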

    Imputation Estimators Partially Correct for Model Misspecification

    Inference problems with incomplete observations often aim at estimating population properties of unobserved quantities. One simple way to accomplish this estimation is to impute the unobserved quantities of interest at the individual level and then take an empirical average of the imputed values. We show that this simple imputation estimator can provide partial protection against model misspecification. We illustrate imputation estimators' robustness to model specification on three examples: mixture model-based clustering, estimation of genotype frequencies in population genetics, and estimation of Markovian evolutionary distances. In the final example, using a representative model misspecification, we demonstrate that in non-degenerate cases, the imputation estimator dominates the plug-in estimate asymptotically. We conclude by outlining a Bayesian implementation of the imputation-based estimation. Comment: major rewrite; beta-binomial example removed, model-based clustering added to the mixture model example, Bayesian approach now illustrated with the genetics example.
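
The idea of imputing at the individual level and then averaging can be sketched for a two-component Gaussian mixture. The sketch assumes known component means and a deliberately wrong mixing weight. All names and numbers are hypothetical and are not taken from the paper.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def imputation_estimate(xs, weight, mean0, mean1, sd):
    """Average the per-observation posterior probability of component 1."""
    total = 0.0
    for x in xs:
        p1 = weight * normal_pdf(x, mean1, sd)
        p0 = (1.0 - weight) * normal_pdf(x, mean0, sd)
        total += p1 / (p0 + p1)  # imputed membership for this observation
    return total / len(xs)

random.seed(0)
true_weight = 0.3  # true share of component 1 in the simulated data
labels = [1 if random.random() < true_weight else 0 for _ in range(5000)]
xs = [random.gauss(4.0 if z else 0.0, 1.0) for z in labels]

assumed_weight = 0.5  # misspecified mixing weight used by the model
plug_in = assumed_weight  # plug-in estimate simply reports the model parameter
imputed = imputation_estimate(xs, assumed_weight, 0.0, 4.0, 1.0)
print(plug_in, imputed)
```

In this toy setting the averaged imputations are pulled toward the true proportion by the data, while the plug-in estimate stays at the misspecified value.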

    A self-organising mixture network for density modelling

    A completely unsupervised mixture distribution network, namely the self-organising mixture network, is proposed for learning arbitrary density functions. The algorithm minimises the Kullback-Leibler information by means of stochastic approximation methods. The density functions are modelled as mixtures of parametric distributions such as Gaussian and Cauchy. The first layer of the network is similar to Kohonen's self-organising map (SOM), but with the parameters of the class conditional densities as the learning weights. The winning mechanism is based on maximum posterior probability, and the updating of weights can be limited to a small neighbourhood around the winner. The second layer accumulates the responses of these local nodes, weighted by the learned mixing parameters. The network has a simple structure and low computational cost, yet yields fast and robust convergence. Experimental results are also presented.

    Uncovering latent structure in valued graphs: A variational approach

    As more and more network-structured data sets become available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case. We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host--parasite interaction networks in forest ecosystems. Comment: Published at http://dx.doi.org/10.1214/10-AOAS361 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Probabilistic Methodology and Techniques for Artefact Conception and Development

    The purpose of this paper is to survey the state of the art in probabilistic methodology and techniques for artefact conception and development. It is the 8th deliverable of the BIBA (Bayesian Inspired Brain and Artefacts) project. We first present the incompleteness problem as the central difficulty that both living creatures and artefacts have to face: how can they perceive, infer, decide and act efficiently with incomplete and uncertain knowledge? We then introduce a generic probabilistic formalism called Bayesian Programming. This formalism is then used to review the main probabilistic methodology and techniques. This review is organized in 3 parts: first, the probabilistic models, from Bayesian networks to Kalman filters and from sensor fusion to CAD systems; second, the inference techniques; and finally, the learning, model acquisition and comparison methodologies. We conclude with the perspectives of the BIBA project as they arise from this state of the art.

    Mixtures of Skew-t Factor Analyzers

    In this paper, we introduce a mixture of skew-t factor analyzers as well as a family of mixture models based thereon. The mixture of skew-t distributions model that we use arises as a limiting case of the mixture of generalized hyperbolic distributions. Like their Gaussian and t-distribution analogues, our mixtures of skew-t factor analyzers are very well-suited to the model-based clustering of high-dimensional data. Imposing constraints on components of the decomposed covariance parameter results in the development of eight flexible models. The alternating expectation-conditional maximization algorithm is used for model parameter estimation and the Bayesian information criterion is used for model selection. The models are applied to both real and simulated data, giving superior clustering results compared to a well-established family of Gaussian mixture models.
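
Model selection by the Bayesian information criterion, as mentioned in this abstract, can be sketched as follows. The sketch assumes the convention BIC = -2 log L + k log n, where smaller is better; some clustering software uses the opposite sign. The candidate names, log-likelihoods and parameter counts are made-up illustrative values, not results from the paper.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC under the 'smaller is better' convention: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fitted models: name -> (maximised log-likelihood, free parameters).
candidates = {"G=1": (-520.0, 5), "G=2": (-480.0, 11), "G=3": (-478.0, 17)}
n_obs = 200  # assumed sample size

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # the lowest BIC wins under this convention
print(best)  # G=2
```

Here the three-component model fits slightly better than the two-component one, but its extra parameters cost more than the gain in likelihood, so the penalty favours the smaller model.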