
    Functional Bregman Divergence and Bayesian Estimation of Distributions

    A class of distortions termed functional Bregman divergences is defined, which includes squared error and relative entropy. A functional Bregman divergence acts on functions or distributions, and generalizes the standard Bregman divergence for vectors and a previous pointwise Bregman divergence that was defined for functions. A recently published result showed that the mean minimizes the expected Bregman divergence. The new functional definition enables the extension of this result to the continuous case, to show that the mean minimizes the expected functional Bregman divergence over a set of functions or distributions. It is shown how this theorem applies to the Bayesian estimation of distributions. Estimation of the uniform distribution from independent and identically drawn samples is used as a case study.
    Comment: 26 pages, 1 figure.
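    As a rough finite-dimensional illustration of the minimizer property the abstract extends (not the paper's functional setting), the sketch below checks numerically that the sample mean minimizes the expected squared-error Bregman divergence; the choice of generator and the toy data are assumptions.

```python
# Minimal sketch: the vector Bregman divergence
#   D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
# and a numerical check that the mean minimizes its expectation for phi(x) = ||x||^2.
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Standard (vector) Bregman divergence generated by a convex function phi."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

phi = lambda x: np.dot(x, x)           # squared Euclidean norm, one member of the class
grad_phi = lambda x: 2.0 * x

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 3))   # i.i.d. draws standing in for the random quantity

def expected_divergence(s):
    return np.mean([bregman(phi, grad_phi, x, s) for x in samples])

mean = samples.mean(axis=0)
print(expected_divergence(mean), expected_divergence(mean + 0.1))  # the mean attains the smaller value
```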

    Bayesian influence diagnostics using normalizing functional Bregman divergence

    Ideally, any statistical inference should be robust to local influences. Although there are simple ways to check for leverage points in independent and linear problems, more complex models require more sophisticated methods. Kullback-Leibler and Bregman divergences have already been applied in Bayesian inference to measure the isolated impact of each observation on a model. We extend these ideas to models for dependent data and with non-normal probability distributions, such as time series, spatial models and generalized linear models. We also propose a strategy to rescale the functional Bregman divergence to lie in the (0,1) interval, thus facilitating interpretation and comparison. This is accomplished with minimal computational effort while maintaining all theoretical properties. For computational efficiency, we take advantage of Hamiltonian Monte Carlo methods to draw samples from the posterior distribution of the model parameters. The resulting Markov chains are then directly connected with the Bregman calculus, which results in fast computation. We check the propositions in both simulated and empirical studies.
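    The paper's exact rescaling strategy is not reproduced here. As a hedged sketch of the general idea, the code below computes a standard case-deletion Kullback-Leibler influence measure from posterior draws, using the identity KL_i = E[log f(y_i|theta)] + log E[1/f(y_i|theta)] with expectations over the full posterior, and then maps it into (0,1) with the illustrative transform d/(1+d); the model, the draws and that transform are assumptions.

```python
# Case-deletion KL influence from posterior draws, then a simple (0,1) rescaling.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)
y[0] = 6.0                                     # an artificial outlier

# Toy posterior for a N(mu, 1) model with a flat prior: mu | y ~ N(ybar, 1/n)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)

def kl_case_deletion(y_i, mu_draws):
    """KL between the full posterior and the posterior with observation y_i removed."""
    log_f = norm.logpdf(y_i, loc=mu_draws, scale=1.0)
    return log_f.mean() + np.log(np.mean(np.exp(-log_f)))

kl = np.array([kl_case_deletion(y_i, draws) for y_i in y])
scaled = kl / (1.0 + kl)                       # maps [0, inf) into [0, 1); illustrative choice
print(np.argmax(scaled), scaled.max())         # the outlier stands out
```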

    Copula Variational Bayes inference via information geometry

    Variational Bayes (VB), also known as the independent mean-field approximation, has become a popular method for Bayesian network inference in recent years. Its applications are vast, e.g. in neural networks, compressed sensing and clustering, to name just a few. In this paper, the independence constraint in VB is relaxed to a conditional constraint class, called a copula in statistics. Since a joint probability distribution always belongs to a copula class, the novel copula VB (CVB) approximation is a generalized form of VB. Via information geometry, we show that the CVB algorithm iteratively projects the original joint distribution onto a copula constraint space until it reaches a local minimum of the Kullback-Leibler (KL) divergence. In this way, all mean-field approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated Conditional Mode (ICM) and k-means algorithms, are special cases of the CVB approximation. For a generic Bayesian network, an augmented hierarchy form of CVB is also designed. While mean-field algorithms can only return a locally optimal approximation for a correlated network, the augmented CVB network, which is an optimally weighted average of a mixture of simpler network structures, can potentially achieve the globally optimal approximation for the first time. Via simulations of Gaussian mixture clustering, the classification accuracy of CVB is shown to be far superior to that of state-of-the-art VB, EM and k-means algorithms.
    Comment: IEEE Transactions on Information Theory.
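    To make the mean-field special case concrete, here is a minimal sketch (my construction, not the paper's CVB algorithm) of coordinate-ascent VB minimizing KL(q||p) for a correlated bivariate Gaussian target, the classic factorized-Gaussian example: it recovers the exact means but underestimates the marginal variances, the kind of limitation the copula relaxation is meant to address.

```python
# Coordinate-ascent mean-field VB for a bivariate Gaussian target.
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)                 # precision matrix of the target

m = np.array([2.0, -2.0])                  # arbitrary initialization of the factor means
for _ in range(50):                        # classic mean-field updates for a Gaussian
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

variances = 1.0 / np.diag(Lam)             # factor variances at the fixed point
print(m, variances)                        # means match mu; variances (0.36) undershoot the true 1.0
```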

    BDSAR: a new package on Bregman divergence for Bayesian simultaneous autoregressive models

    BDSAR is an R package that estimates distances between probability distributions and facilitates a dynamic and powerful diagnostic analysis for Bayesian models from the class of Simultaneous Autoregressive (SAR) spatial models. The package offers a new, refined plot for comparing models and works in an intuitive way that allows any analyst to easily build such plots. These are helpful for gaining insight into influential observations in the data.

    On the role of ML estimation and Bregman divergences in sparse representation of covariance and precision matrices

    Sparse representation of structured signals requires modelling strategies that maintain specific signal properties, in addition to preserving the original information content and achieving a simpler signal representation. The major design challenge is therefore to introduce adequate problem formulations and offer solutions that efficiently lead to the desired representations. In this context, sparse representation of covariance and precision matrices, which appear as feature descriptors or mixture model parameters, respectively, is the main focus of this paper.
    Comment: 8 pages.
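    As background for the role of Bregman divergences here, the sketch below evaluates the LogDet (Burg) Bregman divergence between two symmetric positive-definite matrices, the matrix divergence usually associated with Gaussian ML estimation of covariance and precision matrices; it is a generic illustration, not the paper's sparse-representation procedure, and the matrices are assumptions.

```python
# LogDet Bregman divergence, generated by phi(X) = -log det X:
#   D(A, B) = tr(A B^{-1}) - log det(A B^{-1}) - n, zero iff A == B.
import numpy as np

def logdet_bregman(A, B):
    n = A.shape[0]
    M = A @ np.linalg.inv(B)
    return np.trace(M) - np.log(np.linalg.det(M)) - n

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
S = np.cov(X, rowvar=False)                # sample covariance of the toy data
Sigma = np.eye(4)                          # a candidate structured (here: diagonal) model
print(logdet_bregman(S, Sigma))            # how far the structured model is from the ML-type fit
```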

    Optimal Grouping for Group Minimax Hypothesis Testing

    Bayesian hypothesis testing and minimax hypothesis testing represent extreme instances of detection in which the prior probabilities of the hypotheses are either completely and precisely known or completely unknown. Group minimax, also known as Gamma-minimax, is a robust intermediary between Bayesian and minimax hypothesis testing that allows for coarse or partial advance knowledge of the hypothesis priors by using information on sets in which the prior lies. Existing work on group minimax, however, does not consider the question of how to define the sets or groups of priors; it is assumed that the groups are given. In this work, we propose a novel intermediate detection scheme, formulated through quantization of the space of prior probabilities, that optimally determines the groups as well as representative priors within the groups. We show that when viewed from a quantization perspective, group minimax amounts to determining centroids with a minimax Bayes risk error divergence distortion criterion: the appropriate Bregman divergence for this task. Moreover, the optimal partitioning of the space of prior probabilities is a Bregman Voronoi diagram. Together, the optimal grouping and representation points form an epsilon-net with respect to the Bayes risk error divergence, and permit a rate-distortion type asymptotic analysis of detection performance with the number of groups. Examples of detecting signals corrupted by additive white Gaussian noise and of distinguishing exponentially-distributed signals are presented.
    Comment: 12 figures.
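    To give a feel for the distortion criterion, here is a minimal sketch of the Bayes risk error divergence for binary detection in additive white Gaussian noise: the excess Bayes risk incurred when the detector is designed for a representative prior q but the true prior is p. The signal levels, noise standard deviation and uniform costs are assumptions, and the paper's exact setup may differ.

```python
# Bayes risk error divergence for binary Gaussian-mean detection (means 0 vs mu).
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.0

def threshold(p0):
    """Decision threshold of the Bayes-optimal detector for prior P(H0) = p0."""
    return mu / 2.0 + (sigma**2 / mu) * np.log(p0 / (1.0 - p0))

def bayes_risk(p0, t):
    """Probability of error at prior p0 when thresholding at t (uniform costs)."""
    return p0 * norm.sf(t, loc=0.0, scale=sigma) + (1 - p0) * norm.cdf(t, loc=mu, scale=sigma)

def bayes_risk_error(p0, q0):
    """Excess risk of the detector designed for q0, evaluated at the true prior p0."""
    return bayes_risk(p0, threshold(q0)) - bayes_risk(p0, threshold(p0))

print(bayes_risk_error(0.3, 0.3), bayes_risk_error(0.3, 0.7))   # 0.0, then strictly positive
```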

    Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices

    We obtain the optimal Bayesian minimax rate for the unconstrained large covariance matrix of a multivariate normal sample with mean zero, when both the sample size, n, and the dimension, p, of the covariance matrix tend to infinity. Traditionally, the posterior convergence rate is used to compare the frequentist asymptotic performance of priors, but defining optimality with it is elusive. We propose a new decision-theoretic framework for prior selection and define the Bayesian minimax rate. Under the proposed framework, we obtain the optimal Bayesian minimax rate for the spectral norm for all rates of p. We also consider the Frobenius norm, Bregman divergence and squared log-determinant loss, and obtain the optimal Bayesian minimax rate under certain rate conditions on p. A simulation study is conducted to support the theoretical results.
    Comment: 45 pages.
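    For orientation, the losses named above are usually written as follows in the covariance-estimation literature for an estimator of a p x p covariance matrix; these are the standard forms, and the paper's exact conventions may differ.

```latex
% Assumed standard definitions, not copied from the paper:
% spectral norm loss, Frobenius norm loss, Bregman (Stein-type) matrix divergence,
% and squared log-determinant loss.
\[
L_{\mathrm{sp}}(\hat\Sigma,\Sigma) = \|\hat\Sigma-\Sigma\|_2, \qquad
L_{F}(\hat\Sigma,\Sigma) = \|\hat\Sigma-\Sigma\|_F^2,
\]
\[
D_{\mathrm{Breg}}(\hat\Sigma,\Sigma) = \operatorname{tr}\bigl(\hat\Sigma\Sigma^{-1}\bigr)
  - \log\det\bigl(\hat\Sigma\Sigma^{-1}\bigr) - p, \qquad
L_{\mathrm{ld}}(\hat\Sigma,\Sigma) = \bigl(\log\det\hat\Sigma - \log\det\Sigma\bigr)^2 .
\]
```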

    Maximum-A-Posteriori Estimates in Linear Inverse Problems with Log-concave Priors are Proper Bayes Estimators

    A frequent matter of debate in Bayesian inversion is the question of which of the two principal point estimators, the maximum-a-posteriori (MAP) or the conditional mean (CM) estimate, is to be preferred. As the MAP estimate corresponds to the solution given by variational regularization techniques, this is also a constant matter of debate between the two research areas. Following a theoretical argument - the Bayes cost formalism - the CM estimate is classically preferred for being the Bayes estimator for the mean squared error cost, while the MAP estimate is classically discredited for being only asymptotically the Bayes estimator for the uniform cost function. In this article we present recent theoretical and computational observations that challenge this point of view, in particular for high-dimensional sparsity-promoting Bayesian inversion. Using Bregman distances, we present new, proper convex Bayes cost functions for which the MAP estimator is the Bayes estimator. We complement this finding by results that correct further common misconceptions about MAP estimates. In total, we aim to rehabilitate MAP estimates in linear inverse problems with log-concave priors as proper Bayes estimators.
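    To make the two point estimates tangible, the sketch below contrasts them for a one-dimensional toy problem with a log-concave (Laplace) prior and Gaussian noise: the MAP estimate is given by soft thresholding, while the CM estimate is computed by numerical integration of the posterior. The numbers are assumptions for illustration, and the sketch says nothing about the paper's Bregman-distance cost functions.

```python
# MAP vs CM for a scalar datum y with Gaussian likelihood and Laplace prior.
import numpy as np

y, sigma, lam = 1.2, 1.0, 1.0                  # datum, noise std, Laplace prior scale (assumed)

def log_post(u):
    return -0.5 * (y - u) ** 2 / sigma**2 - lam * np.abs(u)

u = np.linspace(-10.0, 10.0, 20001)
w = np.exp(log_post(u) - log_post(u).max())    # unnormalized posterior on a grid

u_map = np.sign(y) * max(abs(y) - lam * sigma**2, 0.0)   # MAP: soft thresholding
u_cm = np.sum(u * w) / np.sum(w)               # CM: posterior mean by numerical quadrature
print(u_map, u_cm)                             # the two estimates differ, yet both are Bayes estimators
```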

    Bayesian Distance Clustering

    Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data.
    Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.
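    As a toy sketch of the core idea only (all distributional choices below are my assumptions, not the paper's model), a candidate clustering can be scored through a likelihood placed on the pairwise distances rather than on the raw data, so that a labeling respecting the distance structure receives a higher score.

```python
# Score clusterings by a crude partial likelihood on pairwise distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import expon

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])
D = squareform(pdist(X))                       # pairwise distance matrix

def log_distance_likelihood(labels, within_scale=0.5, between_scale=3.0):
    """Exponential densities on distances: short within clusters, long between (assumed model)."""
    ll, n = 0.0, len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            scale = within_scale if labels[i] == labels[j] else between_scale
            ll += expon.logpdf(D[i, j], scale=scale)
    return ll

good = np.repeat([0, 1], 20)
bad = rng.integers(0, 2, size=40)
print(log_distance_likelihood(good) > log_distance_likelihood(bad))   # True: the true partition scores higher
```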

    Deep Divergence Learning

    Classical linear metric learning methods have recently been extended along two distinct lines: deep metric learning methods for learning embeddings of the data using neural networks, and Bregman divergence learning approaches for extending the learning of Euclidean distances to more general divergence measures, such as divergences over distributions. In this paper, we introduce deep Bregman divergences, which are based on learning and parameterizing functional Bregman divergences using neural networks, and which unify and extend these existing lines of work. We show in particular how deep metric learning formulations, kernel metric learning, Mahalanobis metric learning, and moment-matching functions for comparing distributions arise as special cases of these divergences in the symmetric setting. We then describe a deep learning framework for learning general functional Bregman divergences, and show in experiments that this method yields superior performance on benchmark datasets as compared to existing deep metric learning approaches. We also discuss novel applications, including a semi-supervised distributional clustering problem and a new loss function for unsupervised data generation.
    Comment: Under review.
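    The Mahalanobis special case mentioned above can be seen directly with a divergence generated by a parameterized convex function: for the quadratic choice phi(x) = x^T (A^T A) x, the Bregman divergence reduces to a squared Mahalanobis distance. The numpy sketch below illustrates this reduction; the quadratic parameterization stands in for the paper's neural-network generator and is an assumption.

```python
# Bregman divergence from a parameterized convex generator; quadratic case = Mahalanobis.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))                 # plays the role of learnable parameters

M = A.T @ A                                 # symmetric positive semidefinite

def phi(x):
    return x @ M @ x                        # convex for any A

def grad_phi(x):
    return 2.0 * M @ x

def bregman(x, y):
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

x, y = rng.normal(size=5), rng.normal(size=5)
mahalanobis_sq = (x - y) @ M @ (x - y)
print(bregman(x, y), mahalanobis_sq)        # identical up to floating point
```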