Functional Bregman Divergence and Bayesian Estimation of Distributions
A class of distortions termed functional Bregman divergences is defined,
which includes squared error and relative entropy. A functional Bregman
divergence acts on functions or distributions, and generalizes the standard
Bregman divergence for vectors and a previous pointwise Bregman divergence that
was defined for functions. A recently published result showed that the mean
minimizes the expected Bregman divergence. The new functional definition
enables the extension of this result to the continuous case to show that the
mean minimizes the expected functional Bregman divergence over a set of
functions or distributions. It is shown how this theorem applies to the
Bayesian estimation of distributions. Estimation of the uniform distribution
from independent and identically drawn samples is used as a case study.
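For reference, the functional Bregman divergence described here takes the standard form

    D_\phi(f, g) = \phi(f) - \phi(g) - \delta\phi(g)[f - g],

where \phi is a strictly convex functional and \delta\phi(g) is its Fréchet derivative at g. Choosing \phi(f) = \int f^2 recovers squared error, and \phi(f) = \int f \log f recovers relative entropy on probability densities; this is the usual definition and may differ cosmetically from the paper's exact statement.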
Bayesian influence diagnostics using normalizing functional Bregman divergence
Ideally, any statistical inference should be robust to local influences.
Although there are simple ways to check for leverage points in independent
and linear problems, more complex models require more sophisticated methods.
Kullback-Leibler and Bregman divergences have already been applied in Bayesian
inference to measure the isolated impact of each observation on a model. We
extend these ideas to models for dependent data with non-normal probability
distributions, such as time series, spatial models, and generalized linear
models. We also propose a strategy to rescale the functional Bregman divergence
to lie in the (0,1) interval, which facilitates interpretation and comparison.
This is accomplished with minimal computational effort while maintaining all
theoretical properties. For computational efficiency, we take advantage of
Hamiltonian Monte Carlo methods to draw samples from the posterior distribution
of model parameters. The resulting Markov chains are then directly connected
with Bregman calculus, which results in fast computation. We check the
propositions in both simulated and empirical studies.
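The abstract does not spell out the rescaling strategy; purely as a hypothetical illustration, one monotone map that sends a nonnegative divergence onto [0, 1) while preserving the ranking of influential observations is d -> d / (1 + d):

    import numpy as np

    def rescale_divergence(d):
        """Map nonnegative divergences onto [0, 1) monotonically.
        Hypothetical normalization for illustration; the paper's rescaling may differ."""
        d = np.asarray(d, dtype=float)
        return d / (1.0 + d)

    # Example: per-observation (case-deletion) Bregman divergences; larger values flag influence.
    per_obs_divergence = np.array([0.02, 0.15, 3.7, 0.08])
    print(rescale_divergence(per_obs_divergence))

Any strictly increasing map of this kind keeps the ordering of observations intact, which is what matters for influence diagnostics.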
Copula Variational Bayes inference via information geometry
Variational Bayes (VB), also known as independent mean-field approximation,
has become a popular method for Bayesian network inference in recent years. Its
applications are vast, e.g. in neural networks, compressed sensing, and
clustering, to name just a few. In this paper, the independence constraint in VB will
be relaxed to a conditional constraint class, called copula in statistics.
Since a joint probability distribution always belongs to a copula class, the
novel copula VB (CVB) approximation is a generalized form of VB. Via
information geometry, we will see that the CVB algorithm iteratively projects the
original joint distribution onto a copula constraint space until it reaches a
local minimum of the Kullback-Leibler (KL) divergence. In this way, all mean-field
approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated
Conditional Mode (ICM) and k-means algorithms, are special cases of CVB
approximation.
For a generic Bayesian network, an augmented hierarchical form of CVB will also
be designed. While mean-field algorithms can only return a locally optimal
approximation for a correlated network, the augmented CVB network, which is an
optimally weighted average of a mixture of simpler network structures, can
potentially achieve the globally optimal approximation for the first time. Via
simulations of Gaussian mixture clustering, the classification accuracy of
CVB will be shown to be far superior to that of state-of-the-art VB, EM, and
k-means algorithms.
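As a point of reference for the mean-field special case discussed above, the following minimal coordinate-ascent VB for a correlated bivariate Gaussian (a standard textbook example, not this paper's CVB algorithm) shows the behavior CVB is designed to improve on: the factorized approximation recovers the means but understates the marginal variances of a correlated target.

    import numpy as np

    # Target: correlated bivariate Gaussian; Lam is its precision matrix.
    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
    Lam = np.linalg.inv(Sigma)

    # Mean-field VB: q(x1, x2) = N(m1, 1/Lam[0,0]) * N(m2, 1/Lam[1,1]); coordinate ascent on KL.
    m = np.array([1.0, -1.0])  # arbitrary initialization
    for _ in range(50):
        m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
        m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

    print("mean-field means:    ", m)                 # converges to the true means
    print("mean-field variances:", 1 / np.diag(Lam))  # 0.36 each, versus true marginal variance 1.0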
BDSAR: a new package on Bregman divergence for Bayesian simultaneous autoregressive models
BDSAR is an R package which estimates distances between probability
distributions and facilitates a dynamic and powerful analysis of diagnostics
for Bayesian models from the class of Simultaneous Autoregressive (SAR) spatial
models. The package offers a new, refined plot for comparing models and works in
an intuitive way that allows any analyst to easily build such plots.
These are helpful for gaining insight into influential observations in the
data.
On the role of ML estimation and Bregman divergences in sparse representation of covariance and precision matrices
Sparse representation of structured signals requires modelling strategies
that maintain specific signal properties, in addition to preserving original
information content and achieving simpler signal representation. Therefore, the
major design challenge is to introduce adequate problem formulations and offer
solutions that will efficiently lead to desired representations. In this
context, sparse representation of covariance and precision matrices, which
appear as feature descriptors or mixture model parameters, respectively, is
the main focus of this paper.
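One Bregman divergence that arises naturally in ML estimation of covariance and precision matrices is the LogDet (Burg) divergence, generated by \phi(X) = -\log\det X on positive-definite matrices. A small sketch (illustrative; not necessarily the exact formulation used in the paper):

    import numpy as np

    def logdet_bregman(A, B):
        """LogDet (Burg) Bregman divergence D(A, B) = tr(A B^-1) - log det(A B^-1) - n,
        generated by phi(X) = -log det X on symmetric positive-definite matrices."""
        n = A.shape[0]
        M = A @ np.linalg.inv(B)
        _, logdet = np.linalg.slogdet(M)
        return np.trace(M) - logdet - n

    A = np.array([[2.0, 0.3], [0.3, 1.0]])
    B = np.eye(2)
    print(logdet_bregman(A, B))  # nonnegative; zero iff A == B

Up to a factor of two, this divergence is the KL divergence between zero-mean Gaussians with covariances A and B, which is why it pairs naturally with Gaussian ML estimation.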
Optimal Grouping for Group Minimax Hypothesis Testing
Bayesian hypothesis testing and minimax hypothesis testing represent extreme
instances of detection in which the prior probabilities of the hypotheses are
either completely and precisely known or completely unknown. Group
minimax, also known as Gamma-minimax, is a robust intermediary between Bayesian
and minimax hypothesis testing that allows for coarse or partial advance
knowledge of the hypothesis priors by using information on sets in which the
prior lies. Existing work on group minimax, however, does not consider the
question of how to define the sets or groups of priors; it is assumed that the
groups are given. In this work, we propose a novel intermediate detection
scheme formulated through the quantization of the space of prior probabilities
that optimally determines groups and also representative priors within the
groups. We show that when viewed from a quantization perspective, group minimax
amounts to determining centroids with a minimax Bayes risk error divergence
distortion criterion: the appropriate Bregman divergence for this task.
Moreover, the optimal partitioning of the space of prior probabilities is a
Bregman Voronoi diagram. Together, the optimal grouping and representation
points are an epsilon-net with respect to Bayes risk error divergence, and
permit a rate-distortion type asymptotic analysis of detection performance with
the number of groups. Examples of detecting signals corrupted by additive white
Gaussian noise and of distinguishing exponentially-distributed signals are
presented.
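In the additive white Gaussian noise example, the Bayes risk error divergence has a concrete form: it is the excess error probability incurred by designing the likelihood-ratio threshold for a representative prior rather than for the true prior. A minimal sketch (signal level, noise variance, and uniform costs are arbitrary illustrative choices):

    import numpy as np
    from scipy.stats import norm

    def threshold(q0, mu=1.0, sigma=1.0):
        """Decision threshold for H1: x ~ N(mu, sigma^2) vs H0: x ~ N(0, sigma^2),
        when the detector assumes prior probability q0 for H0 (uniform costs)."""
        return mu / 2.0 + (sigma ** 2 / mu) * np.log(q0 / (1.0 - q0))

    def bayes_risk(p0, q0, mu=1.0, sigma=1.0):
        """Error probability at true prior p0 for the rule designed for assumed prior q0."""
        t = threshold(q0, mu, sigma)
        p_fa = 1.0 - norm.cdf(t / sigma)        # decide H1 when H0 is true
        p_miss = norm.cdf((t - mu) / sigma)     # decide H0 when H1 is true
        return p0 * p_fa + (1.0 - p0) * p_miss

    def bayes_risk_error(p0, q0):
        """Bayes risk error divergence: excess risk from acting on q0 when the true prior is p0."""
        return bayes_risk(p0, q0) - bayes_risk(p0, p0)

    print(bayes_risk_error(0.7, 0.5))  # nonnegative; zero iff q0 == p0

Quantizing the prior simplex then amounts to choosing representative priors (centroids) that keep this excess risk small over each group.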
Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices
We obtain the optimal Bayesian minimax rate for the unconstrained large
covariance matrix of a multivariate normal sample with mean zero, when both the
sample size, n, and the dimension, p, of the covariance matrix tend to
infinity. Traditionally, the posterior convergence rate is used to compare the
frequentist asymptotic performance of priors, but defining optimality with
it is elusive. We propose a new decision-theoretic framework for prior
selection and define the Bayesian minimax rate. Under the proposed framework, we
obtain the optimal Bayesian minimax rate under the spectral norm for all rates of
p. We also consider the Frobenius norm, Bregman divergence, and squared
log-determinant loss and obtain the optimal Bayesian minimax rate under certain
rate conditions on p. A simulation study is conducted to support the
theoretical results.
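For orientation, the matrix losses named here are typically taken in the following standard forms (the paper's exact conventions may differ in constants or in the ordering of arguments):

    \|\hat\Sigma - \Sigma\|_2 (spectral norm),   \|\hat\Sigma - \Sigma\|_F (Frobenius norm),
    D_\phi(\Sigma, \hat\Sigma) = \phi(\Sigma) - \phi(\hat\Sigma) - \mathrm{tr}[\nabla\phi(\hat\Sigma)(\Sigma - \hat\Sigma)]
        (Bregman divergence of a convex \phi; \phi(X) = -\log\det X gives a Stein-type loss),
    (\log\det\hat\Sigma - \log\det\Sigma)^2 (squared log-determinant loss).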
Maximum-A-Posteriori Estimates in Linear Inverse Problems with Log-concave Priors are Proper Bayes Estimators
A frequent matter of debate in Bayesian inversion is the question of which of
the two principal point estimators, the maximum a posteriori (MAP) or the
conditional mean (CM) estimate, is to be preferred. As the MAP estimate
corresponds to the solution given by variational regularization techniques,
this is also a constant matter of debate between the two research areas.
Following a theoretical argument (the Bayes cost formalism), the CM estimate
is classically preferred for being the Bayes estimator for the mean squared
error cost while the MAP estimate is classically discredited for being only
asymptotically the Bayes estimator for the uniform cost function. In this
article we present recent theoretical and computational observations that
challenge this point of view, in particular for high-dimensional
sparsity-promoting Bayesian inversion. Using Bregman distances, we present new,
proper convex Bayes cost functions for which the MAP estimator is the Bayes
estimator. We complement this finding by results that correct further common
misconceptions about MAP estimates. In total, we aim to rehabilitate MAP
estimates in linear inverse problems with log-concave priors as proper Bayes
estimators.
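The key ingredient behind these cost functions is the Bregman distance of the convex functional J given (up to constants) by the negative log-prior: for a subgradient p \in \partial J(u),

    D_J^p(v, u) = J(v) - J(u) - \langle p, v - u \rangle.

This is the general definition only; the proper convex Bayes cost functions constructed in the paper are built from such Bregman distances.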
Bayesian Distance Clustering
Model-based clustering is widely used in a variety of application areas.
However, fundamental concerns remain about robustness. In particular, results
can be sensitive to the choice of kernel representing the within-cluster data
density. Leveraging properties of pairwise differences between data points,
we propose a class of Bayesian distance clustering methods, which rely on
modeling the likelihood of the pairwise distances in place of the original
data. Although some information in the data is discarded, we gain substantial
robustness to modeling assumptions. The proposed approach represents an
appealing middle ground between distance- and model-based clustering, drawing
advantages from each of these canonical approaches. We illustrate dramatic
gains in the ability to infer clusters that are not well represented by the
usual choices of kernel. A simulation study is included to assess performance
relative to competitors, and we apply the approach to clustering of brain
genome expression data.
Keywords: Distance-based clustering; Mixture model; Model-based clustering;
Model misspecification; Pairwise distance matrix; Partial likelihood;
Robustness
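The abstract leaves the form of the distance likelihood open; purely as a hypothetical illustration, the sketch below scores a cluster assignment by treating within-cluster and between-cluster pairwise distances as draws from two exponential densities with different rates (the actual model in the paper will differ).

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def distance_log_likelihood(X, labels, rate_within=2.0, rate_between=0.5):
        """Hypothetical pairwise-distance score: within-cluster distances ~ Exp(rate_within),
        between-cluster distances ~ Exp(rate_between). Illustration only."""
        D = squareform(pdist(X))
        n = len(labels)
        ll = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                rate = rate_within if labels[i] == labels[j] else rate_between
                ll += np.log(rate) - rate * D[i, j]
        return ll

    rng = np.random.default_rng(0)
    X = np.vstack([rng.standard_normal((10, 2)), rng.standard_normal((10, 2)) + 5.0])
    good = np.repeat([0, 1], 10)   # matches the two simulated clusters
    bad = np.tile([0, 1], 10)      # scrambled assignment
    print(distance_log_likelihood(X, good) > distance_log_likelihood(X, bad))  # True

Because only pairwise distances enter the score, the kernel of the within-cluster data density never has to be specified, which is the source of the robustness described above.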
Deep Divergence Learning
Classical linear metric learning methods have recently been extended along
two distinct lines: deep metric learning methods for learning embeddings of the
data using neural networks, and Bregman divergence learning approaches for
extending the learning of Euclidean distances to more general divergence measures such
as divergences over distributions. In this paper, we introduce deep Bregman
divergences, which are based on learning and parameterizing functional Bregman
divergences using neural networks, and which unify and extend these existing
lines of work. We show in particular how deep metric learning formulations,
kernel metric learning, Mahalanobis metric learning, and moment-matching
functions for comparing distributions arise as special cases of these
divergences in the symmetric setting. We then describe a deep learning
framework for learning general functional Bregman divergences, and show in
experiments that this method yields superior performance on benchmark datasets
as compared to existing deep metric learning approaches. We also discuss novel
applications, including a semi-supervised distributional clustering problem,
and a new loss function for unsupervised data generation.
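To make the construction concrete, a Bregman divergence generated by a parameterized convex function can be written down directly. The toy sketch below uses a max-affine function as a crude stand-in for the neural parameterizations described in the paper (whose architecture is not specified in this abstract):

    import numpy as np

    class MaxAffineBregman:
        """Bregman divergence D(x, y) = phi(x) - phi(y) - grad_phi(y) . (x - y),
        where phi(z) = max_k (a_k . z + b_k) is a parameterized convex (max-affine) function.
        Toy stand-in for a learned/neural phi; not the paper's architecture."""
        def __init__(self, A, b):
            self.A, self.b = A, b                 # A: (K, d) slopes, b: (K,) offsets

        def phi(self, z):
            return np.max(self.A @ z + self.b)

        def grad_phi(self, z):
            k = np.argmax(self.A @ z + self.b)    # subgradient: slope of the active affine piece
            return self.A[k]

        def divergence(self, x, y):
            return self.phi(x) - self.phi(y) - self.grad_phi(y) @ (x - y)

    rng = np.random.default_rng(0)
    div = MaxAffineBregman(A=rng.standard_normal((8, 3)), b=rng.standard_normal(8))
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    print(div.divergence(x, y) >= 0.0)  # Bregman divergences of convex phi are nonnegative

Learning then amounts to fitting the parameters of phi (here A and b; in the paper, network weights) so that the induced divergence matches supervision such as similarity constraints.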