Functional Bregman Divergence and Bayesian Estimation of Distributions
A class of distortions termed functional Bregman divergences is defined,
which includes squared error and relative entropy. A functional Bregman
divergence acts on functions or distributions, and generalizes the standard
Bregman divergence for vectors and a previous pointwise Bregman divergence that
was defined for functions. A recently published result showed that the mean
minimizes the expected Bregman divergence. The new functional definition
enables the extension of this result to the continuous case to show that the
mean minimizes the expected functional Bregman divergence over a set of
functions or distributions. It is shown how this theorem applies to the
Bayesian estimation of distributions. Estimation of the uniform distribution
from independent and identically drawn samples is used as a case study.
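For reference, the functional Bregman divergence described here takes the standard form

    D_\phi(f, g) = \phi(f) - \phi(g) - \delta\phi(g)[f - g],

where \phi is a strictly convex functional and \delta\phi(g) is its Fréchet derivative at g. Choosing \phi(f) = \int f^2 recovers squared error, and \phi(f) = \int f \log f recovers relative entropy on probability densities; this is the usual definition and may differ cosmetically from the paper's exact statement.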
Bayesian influence diagnostics using normalizing functional Bregman divergence
Ideally, any statistical inference should be robust to local influences.
Although there are simple ways to check for leverage points in independent
and linear problems, more complex models require more sophisticated methods.
Kullback-Leibler and Bregman divergences have already been applied in Bayesian
inference to measure the isolated impact of each observation on a model. We
extend these ideas to models for dependent data with non-normal probability
distributions, such as time series, spatial models, and generalized linear
models. We also propose a strategy to rescale the functional Bregman divergence
to lie in the (0,1) interval, which facilitates interpretation and comparison.
This is accomplished with minimal computational effort while maintaining all
theoretical properties. For computational efficiency, we take advantage of
Hamiltonian Monte Carlo methods to draw samples from the posterior distribution
of model parameters. The resulting Markov chains are then directly connected
with Bregman calculus, which results in fast computation. We check the
propositions in both simulated and empirical studies.
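The abstract does not spell out the rescaling strategy; purely as a hypothetical illustration, one monotone map that sends a nonnegative divergence onto [0, 1) while preserving the ranking of influential observations is d -> d / (1 + d):

    import numpy as np

    def rescale_divergence(d):
        """Map nonnegative divergences onto [0, 1) monotonically.
        Hypothetical normalization for illustration; the paper's rescaling may differ."""
        d = np.asarray(d, dtype=float)
        return d / (1.0 + d)

    # Example: per-observation (case-deletion) Bregman divergences; larger values flag influence.
    per_obs_divergence = np.array([0.02, 0.15, 3.7, 0.08])
    print(rescale_divergence(per_obs_divergence))

Any strictly increasing map of this kind keeps the ordering of observations intact, which is what matters for influence diagnostics.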
Copula Variational Bayes inference via information geometry
Variational Bayes (VB), also known as independent mean-field approximation,
has become a popular method for Bayesian network inference in recent years. Its
applications are vast, e.g. in neural networks, compressed sensing, and
clustering, to name just a few. In this paper, the independence constraint in VB will
be relaxed to a conditional constraint class, called copula in statistics.
Since a joint probability distribution always belongs to a copula class, the
novel copula VB (CVB) approximation is a generalized form of VB. Via
information geometry, we will see that the CVB algorithm iteratively projects the
original joint distribution onto a copula constraint space until it reaches a
local minimum of the Kullback-Leibler (KL) divergence. In this way, all mean-field
approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated
Conditional Mode (ICM) and k-means algorithms, are special cases of CVB
approximation.
For a generic Bayesian network, an augmented hierarchical form of CVB will also
be designed. While mean-field algorithms can only return a locally optimal
approximation for a correlated network, the augmented CVB network, which is an
optimally weighted average of a mixture of simpler network structures, can
potentially achieve the globally optimal approximation for the first time. Via
simulations of Gaussian mixture clustering, the classification accuracy of
CVB will be shown to be far superior to that of state-of-the-art VB, EM, and
k-means algorithms.
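As a point of reference for the mean-field special case discussed above, the following minimal coordinate-ascent VB for a correlated bivariate Gaussian (a standard textbook example, not this paper's CVB algorithm) shows the behavior CVB is designed to improve on: the factorized approximation recovers the means but understates the marginal variances of a correlated target.

    import numpy as np

    # Target: correlated bivariate Gaussian; Lam is its precision matrix.
    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
    Lam = np.linalg.inv(Sigma)

    # Mean-field VB: q(x1, x2) = N(m1, 1/Lam[0,0]) * N(m2, 1/Lam[1,1]); coordinate ascent on KL.
    m = np.array([1.0, -1.0])  # arbitrary initialization
    for _ in range(50):
        m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
        m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

    print("mean-field means:    ", m)                 # converges to the true means
    print("mean-field variances:", 1 / np.diag(Lam))  # 0.36 each, versus true marginal variance 1.0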
BDSAR: a new package on Bregman divergence for Bayesian simultaneous autoregressive models
BDSAR is an R package which estimates distances between probability
distributions and facilitates a dynamic and powerful analysis of diagnostics
for Bayesian models from the class of Simultaneous Autoregressive (SAR) spatial
models. The package offers a new, refined plot for comparing models and works in
an intuitive way that allows any analyst to easily build such plots.
These are helpful for gaining insight into influential observations in the
data.
On the role of ML estimation and Bregman divergences in sparse representation of covariance and precision matrices
Sparse representation of structured signals requires modelling strategies
that maintain specific signal properties, in addition to preserving original
information content and achieving simpler signal representation. Therefore, the
major design challenge is to introduce adequate problem formulations and offer
solutions that will efficiently lead to desired representations. In this
context, sparse representation of covariance and precision matrices, which
appear as feature descriptors or mixture model parameters, respectively, is
the main focus of this paper.
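One Bregman divergence that arises naturally in ML estimation of covariance and precision matrices is the LogDet (Burg) divergence, generated by \phi(X) = -\log\det X on positive-definite matrices. A small sketch (illustrative; not necessarily the exact formulation used in the paper):

    import numpy as np

    def logdet_bregman(A, B):
        """LogDet (Burg) Bregman divergence D(A, B) = tr(A B^-1) - log det(A B^-1) - n,
        generated by phi(X) = -log det X on symmetric positive-definite matrices."""
        n = A.shape[0]
        M = A @ np.linalg.inv(B)
        _, logdet = np.linalg.slogdet(M)
        return np.trace(M) - logdet - n

    A = np.array([[2.0, 0.3], [0.3, 1.0]])
    B = np.eye(2)
    print(logdet_bregman(A, B))  # nonnegative; zero iff A == B

Up to a factor of two, this divergence is the KL divergence between zero-mean Gaussians with covariances A and B, which is why it pairs naturally with Gaussian ML estimation.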
Optimal Grouping for Group Minimax Hypothesis Testing
Bayesian hypothesis testing and minimax hypothesis testing represent extreme
instances of detection in which the prior probabilities of the hypotheses are
either completely and precisely known or completely unknown. Group
minimax, also known as Gamma-minimax, is a robust intermediary between Bayesian
and minimax hypothesis testing that allows for coarse or partial advance
knowledge of the hypothesis priors by using information on sets in which the
prior lies. Existing work on group minimax, however, does not consider the
question of how to define the sets or groups of priors; it is assumed that the
groups are given. In this work, we propose a novel intermediate detection
scheme formulated through the quantization of the space of prior probabilities
that optimally determines groups and also representative priors within the
groups. We show that when viewed from a quantization perspective, group minimax
amounts to determining centroids with a minimax Bayes risk error divergence
distortion criterion: the appropriate Bregman divergence for this task.
Moreover, the optimal partitioning of the space of prior probabilities is a
Bregman Voronoi diagram. Together, the optimal grouping and representation
points are an epsilon-net with respect to Bayes risk error divergence, and
permit a rate-distortion type asymptotic analysis of detection performance with
the number of groups. Examples of detecting signals corrupted by additive white
Gaussian noise and of distinguishing exponentially-distributed signals are
presented.
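In the additive white Gaussian noise example, the Bayes risk error divergence has a concrete form: it is the excess error probability incurred by designing the likelihood-ratio threshold for a representative prior rather than for the true prior. A minimal sketch (signal level, noise variance, and uniform costs are arbitrary illustrative choices):

    import numpy as np
    from scipy.stats import norm

    def threshold(q0, mu=1.0, sigma=1.0):
        """Decision threshold for H1: x ~ N(mu, sigma^2) vs H0: x ~ N(0, sigma^2),
        when the detector assumes prior probability q0 for H0 (uniform costs)."""
        return mu / 2.0 + (sigma ** 2 / mu) * np.log(q0 / (1.0 - q0))

    def bayes_risk(p0, q0, mu=1.0, sigma=1.0):
        """Error probability at true prior p0 for the rule designed for assumed prior q0."""
        t = threshold(q0, mu, sigma)
        p_fa = 1.0 - norm.cdf(t / sigma)        # decide H1 when H0 is true
        p_miss = norm.cdf((t - mu) / sigma)     # decide H0 when H1 is true
        return p0 * p_fa + (1.0 - p0) * p_miss

    def bayes_risk_error(p0, q0):
        """Bayes risk error divergence: excess risk from acting on q0 when the true prior is p0."""
        return bayes_risk(p0, q0) - bayes_risk(p0, p0)

    print(bayes_risk_error(0.7, 0.5))  # nonnegative; zero iff q0 == p0

Quantizing the prior simplex then amounts to choosing representative priors (centroids) that keep this excess risk small over each group.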
Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices
We obtain the optimal Bayesian minimax rate for the unconstrained large
covariance matrix of a multivariate normal sample with mean zero, when both the
sample size, n, and the dimension, p, of the covariance matrix tend to
infinity. Traditionally, the posterior convergence rate is used to compare the
frequentist asymptotic performance of priors, but defining optimality with
it is elusive. We propose a new decision-theoretic framework for prior
selection and define the Bayesian minimax rate. Under the proposed framework, we
obtain the optimal Bayesian minimax rate under the spectral norm for all rates of
p. We also consider the Frobenius norm, Bregman divergence, and squared
log-determinant loss and obtain the optimal Bayesian minimax rate under certain
rate conditions on p. A simulation study is conducted to support the
theoretical results.
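For orientation, the matrix losses named here are typically taken in the following standard forms (the paper's exact conventions may differ in constants or in the ordering of arguments):

    \|\hat\Sigma - \Sigma\|_2 (spectral norm),   \|\hat\Sigma - \Sigma\|_F (Frobenius norm),
    D_\phi(\Sigma, \hat\Sigma) = \phi(\Sigma) - \phi(\hat\Sigma) - \mathrm{tr}[\nabla\phi(\hat\Sigma)(\Sigma - \hat\Sigma)]
        (Bregman divergence of a convex \phi; \phi(X) = -\log\det X gives a Stein-type loss),
    (\log\det\hat\Sigma - \log\det\Sigma)^2 (squared log-determinant loss).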
Maximum-A-Posteriori Estimates in Linear Inverse Problems with Log-concave Priors are Proper Bayes Estimators
A frequent matter of debate in Bayesian inversion is the question of which of
the two principal point estimators, the maximum a posteriori (MAP) or the
conditional mean (CM) estimate, is to be preferred. As the MAP estimate
corresponds to the solution given by variational regularization techniques,
this is also a constant matter of debate between the two research areas.
Following a theoretical argument (the Bayes cost formalism), the CM estimate
is classically preferred for being the Bayes estimator for the mean squared
error cost while the MAP estimate is classically discredited for being only
asymptotically the Bayes estimator for the uniform cost function. In this
article we present recent theoretical and computational observations that
challenge this point of view, in particular for high-dimensional
sparsity-promoting Bayesian inversion. Using Bregman distances, we present new,
proper convex Bayes cost functions for which the MAP estimator is the Bayes
estimator. We complement this finding by results that correct further common
misconceptions about MAP estimates. In total, we aim to rehabilitate MAP
estimates in linear inverse problems with log-concave priors as proper Bayes
estimators.
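The key ingredient behind these cost functions is the Bregman distance of the convex functional J given (up to constants) by the negative log-prior: for a subgradient p \in \partial J(u),

    D_J^p(v, u) = J(v) - J(u) - \langle p, v - u \rangle.

This is the general definition only; the proper convex Bayes cost functions constructed in the paper are built from such Bregman distances.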
Bayesian Distance Clustering
Model-based clustering is widely used in a variety of application areas.
However, fundamental concerns remain about robustness. In particular, results
can be sensitive to the choice of kernel representing the within-cluster data
density. Leveraging properties of pairwise differences between data points,
we propose a class of Bayesian distance clustering methods, which rely on
modeling the likelihood of the pairwise distances in place of the original
data. Although some information in the data is discarded, we gain substantial
robustness to modeling assumptions. The proposed approach represents an
appealing middle ground between distance- and model-based clustering, drawing
advantages from each of these canonical approaches. We illustrate dramatic
gains in the ability to infer clusters that are not well represented by the
usual choices of kernel. A simulation study is included to assess performance
relative to competitors, and we apply the approach to clustering of brain
genome expression data.
Keywords: Distance-based clustering; Mixture model; Model-based clustering;
Model misspecification; Pairwise distance matrix; Partial likelihood;
Robustness
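The abstract leaves the form of the distance likelihood open; purely as a hypothetical illustration, the sketch below scores a cluster assignment by treating within-cluster and between-cluster pairwise distances as draws from two exponential densities with different rates (the actual model in the paper will differ).

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def distance_log_likelihood(X, labels, rate_within=2.0, rate_between=0.5):
        """Hypothetical pairwise-distance score: within-cluster distances ~ Exp(rate_within),
        between-cluster distances ~ Exp(rate_between). Illustration only."""
        D = squareform(pdist(X))
        n = len(labels)
        ll = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                rate = rate_within if labels[i] == labels[j] else rate_between
                ll += np.log(rate) - rate * D[i, j]
        return ll

    rng = np.random.default_rng(0)
    X = np.vstack([rng.standard_normal((10, 2)), rng.standard_normal((10, 2)) + 5.0])
    good = np.repeat([0, 1], 10)   # matches the two simulated clusters
    bad = np.tile([0, 1], 10)      # scrambled assignment
    print(distance_log_likelihood(X, good) > distance_log_likelihood(X, bad))  # True

Because only pairwise distances enter the score, the kernel of the within-cluster data density never has to be specified, which is the source of the robustness described above.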
Deep Divergence Learning
Classical linear metric learning methods have recently been extended along
two distinct lines: deep metric learning methods for learning embeddings of the
data using neural networks, and Bregman divergence learning approaches for
extending the learning of Euclidean distances to more general divergence measures such
as divergences over distributions. In this paper, we introduce deep Bregman
divergences, which are based on learning and parameterizing functional Bregman
divergences using neural networks, and which unify and extend these existing
lines of work. We show in particular how deep metric learning formulations,
kernel metric learning, Mahalanobis metric learning, and moment-matching
functions for comparing distributions arise as special cases of these
divergences in the symmetric setting. We then describe a deep learning
framework for learning general functional Bregman divergences, and show in
experiments that this method yields superior performance on benchmark datasets
as compared to existing deep metric learning approaches. We also discuss novel
applications, including a semi-supervised distributional clustering problem,
and a new loss function for unsupervised data generation.
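To make the construction concrete, a Bregman divergence generated by a parameterized convex function can be written down directly. The toy sketch below uses a max-affine function as a crude stand-in for the neural parameterizations described in the paper (whose architecture is not specified in this abstract):

    import numpy as np

    class MaxAffineBregman:
        """Bregman divergence D(x, y) = phi(x) - phi(y) - grad_phi(y) . (x - y),
        where phi(z) = max_k (a_k . z + b_k) is a parameterized convex (max-affine) function.
        Toy stand-in for a learned/neural phi; not the paper's architecture."""
        def __init__(self, A, b):
            self.A, self.b = A, b                 # A: (K, d) slopes, b: (K,) offsets

        def phi(self, z):
            return np.max(self.A @ z + self.b)

        def grad_phi(self, z):
            k = np.argmax(self.A @ z + self.b)    # subgradient: slope of the active affine piece
            return self.A[k]

        def divergence(self, x, y):
            return self.phi(x) - self.phi(y) - self.grad_phi(y) @ (x - y)

    rng = np.random.default_rng(0)
    div = MaxAffineBregman(A=rng.standard_normal((8, 3)), b=rng.standard_normal(8))
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    print(div.divergence(x, y) >= 0.0)  # Bregman divergences of convex phi are nonnegative

Learning then amounts to fitting the parameters of phi (here A and b; in the paper, network weights) so that the induced divergence matches supervision such as similarity constraints.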