The Burbea-Rao and Bhattacharyya centroids
We study the centroid with respect to the class of information-theoretic
Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence
by measuring the non-negative Jensen difference induced by a strictly convex
and differentiable function. Although those Burbea-Rao divergences are
symmetric by construction, they are not metrics since they fail to satisfy the
triangle inequality. We first explain how a particular symmetrization of
Bregman divergences called Jensen-Bregman distances yields exactly those
Burbea-Rao divergences. We then define skew Burbea-Rao divergences, and show that in limit cases skew Burbea-Rao divergences amount to computing Bregman divergences. We then prove that Burbea-Rao centroids are unique, and can be arbitrarily finely approximated by a generic iterative concave-convex optimization algorithm with guaranteed convergence. In
the second part of the paper, we consider the Bhattacharyya distance that is
commonly used to measure the degree of overlap between probability distributions. We show that Bhattacharyya distances on members of the same statistical exponential family amount to computing a Burbea-Rao divergence in disguise.
Thus we get an efficient algorithm for computing the Bhattacharyya centroid of
a set of parametric distributions belonging to the same exponential families,
improving over former specialized methods found in the literature that were
limited to univariate or "diagonal" multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we report experimental results for k-means and hierarchical clustering of Gaussian mixture models.
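Concretely, the Burbea-Rao divergence induced by a strictly convex generator F is the Jensen difference J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2), and taking F to be the negative Shannon entropy recovers the Jensen-Shannon divergence. The Python sketch below illustrates this, together with a fixed-point reading of the concave-convex centroid iteration; the function names are ours, and the update rule is a simplified sketch of the paper's approach rather than a verbatim reproduction.

```python
import numpy as np

def jensen_divergence(F, p, q):
    """Burbea-Rao (Jensen) divergence induced by a strictly convex generator F."""
    return 0.5 * (F(p) + F(q)) - F(0.5 * (p + q))

# Negative Shannon entropy is strictly convex on the simplex, and its
# Jensen difference is exactly the Jensen-Shannon divergence.
neg_entropy = lambda x: float(np.sum(x * np.log(x)))
p, q = np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.4, 0.3])
print(jensen_divergence(neg_entropy, p, q))  # JS(p, q) in nats

def burbea_rao_centroid(points, grad_F, grad_F_inv, iters=100):
    """Fixed-point (CCCP-style) iteration for the Burbea-Rao centroid:
    c <- (grad F)^{-1}( mean_i grad F((c + p_i) / 2) )."""
    c = np.mean(points, axis=0)  # initialize at the arithmetic mean
    for _ in range(iters):
        c = grad_F_inv(np.mean([grad_F(0.5 * (c + p)) for p in points], axis=0))
    return c

# Sanity check: for the squared Euclidean generator, the centroid is the mean.
pts = np.array([[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]])
print(burbea_rao_centroid(pts, grad_F=lambda x: x, grad_F_inv=lambda x: x))
```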
On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means
The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence, measuring the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon
divergence between Gaussian distributions is not available in closed-form. To
bypass this problem, we present a generalization of the Jensen-Shannon (JS)
divergence using abstract means which yields closed-form expressions when the
mean is chosen according to the parametric family of distributions. More
generally, we define the JS-symmetrizations of any distance using generalized
statistical mixtures derived from abstract means. In particular, we first show
that the geometric mean is well-suited for exponential families, and report two
closed-form formulas for (i) the geometric Jensen-Shannon divergence between
probability densities of the same exponential family, and (ii) the geometric
JS-symmetrization of the reverse Kullback-Leibler divergence. As a second
illustrating example, we show that the harmonic mean is well-suited for the family of scale Cauchy distributions, and report a closed-form formula for the harmonic
Jensen-Shannon divergence between scale Cauchy distributions. We also define
generalized Jensen-Shannon divergences between matrices (e.g., quantum
Jensen-Shannon divergences) and consider clustering with respect to these novel
Jensen-Shannon divergences.
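For intuition on why the geometric mean suits exponential families: the normalized geometric mixture of two Gaussians is again a Gaussian, with averaged precisions and precision-weighted means, so every term of the skewed JS construction is available in closed form. Below is a minimal sketch for univariate Gaussians, using the standard closed-form Kullback-Leibler divergence; the function names are ours, and the alpha = 1/2 skew follows the usual JS convention.

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """Closed-form KL divergence between univariate Gaussians N(mu0, s0^2), N(mu1, s1^2)."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

def geometric_js_gauss(mu0, s0, mu1, s1, alpha=0.5):
    """Skewed geometric Jensen-Shannon divergence between univariate Gaussians.
    The normalized geometric mixture of two Gaussians is a Gaussian with
    averaged precisions and precision-weighted means."""
    prec = (1 - alpha) / s0**2 + alpha / s1**2                 # averaged precision
    s_g = np.sqrt(1.0 / prec)
    mu_g = s_g**2 * ((1 - alpha) * mu0 / s0**2 + alpha * mu1 / s1**2)
    return (1 - alpha) * kl_gauss(mu0, s0, mu_g, s_g) + alpha * kl_gauss(mu1, s1, mu_g, s_g)

print(geometric_js_gauss(0.0, 1.0, 1.0, 2.0))
```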
The Bregman chord divergence
Distances are fundamental primitives whose choice significantly impacts the
performance of algorithms in machine learning and signal processing. However,
selecting the most appropriate distance for a given task is a challenging endeavor.
Instead of testing one by one the entries of an ever-expanding dictionary of
{\em ad hoc} distances, one rather prefers to consider parametric classes of
distances that are exhaustively characterized by axioms derived from first
principles. Bregman divergences are such a class. However, fine-tuning a Bregman divergence is delicate since it requires smoothly adjusting a functional generator. In this work, we propose an extension of Bregman divergences called the Bregman chord divergences. This new class of distances does not require gradient calculations, uses two scalar parameters that can be easily tailored in applications, and asymptotically generalizes Bregman divergences.
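As a rough sketch of the chord idea: the ordinary Bregman divergence B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)> is the gap at p between F and its tangent line at q; the chord variant replaces that tangent with a secant line of F through two interpolated points, so no gradient is needed. The construction below is our illustrative reading, with hypothetical names, and may differ from the paper's exact parameterization.

```python
import numpy as np

def bregman_chord(F, p, q, alpha=0.9, beta=0.99):
    """Chord-style Bregman divergence (sketch): the gap at p between F and the
    secant line of F through the interpolants x_a and x_b on the segment [p, q].
    As alpha, beta -> 1 the secant tends to the tangent at q, recovering B_F."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    x_a = (1 - alpha) * p + alpha * q
    x_b = (1 - beta) * p + beta * q
    slope = (F(x_b) - F(x_a)) / (beta - alpha)    # finite-difference chord slope
    secant_at_p = F(x_a) - alpha * slope          # secant evaluated at p (gamma = 0)
    return F(p) - secant_at_p

# Example: squared-norm generator, for which B_F(p : q) = ||p - q||^2.
F = lambda x: float(np.dot(x, x))
p, q = np.array([1.0, 2.0]), np.array([0.0, 0.5])
print(bregman_chord(F, p, q, 0.999, 0.9999))  # close to ||p - q||^2 = 3.25
```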
Total Jensen divergences: Definition, Properties and k-Means++ Clustering
We present a novel class of divergences induced by a smooth convex function
called total Jensen divergences. Those total Jensen divergences are by construction invariant to rotations, a feature yielding a regularization of ordinary Jensen divergences by a conformal factor. We analyze the relationships between this novel class of total Jensen divergences and the recently introduced total Bregman divergences. We then define the total Jensen centroids as average distortion minimizers, and study their robustness to outliers. Finally, we prove that the k-means++ initialization, which bypasses explicit centroid computations, is good enough in practice to probabilistically guarantee a constant approximation factor relative to the optimal k-means clustering.
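The k-means++ seeding referred to here selects each new seed with probability proportional to its distortion to the nearest seed already chosen, and thus needs divergence evaluations only, never centroids. A minimal sketch with a pluggable divergence in place of the squared Euclidean distance (the function names and the demo divergence are ours):

```python
import numpy as np

def kmeanspp_seeds(points, k, divergence, seed=None):
    """k-means++-style seeding: draw each new seed with probability proportional
    to its divergence to the closest already-chosen seed; no centroids needed."""
    rng = np.random.default_rng(seed)
    seeds = [points[rng.integers(len(points))]]   # first seed uniformly at random
    while len(seeds) < k:
        d = np.array([min(divergence(x, s) for s in seeds) for x in points])
        seeds.append(points[rng.choice(len(points), p=d / d.sum())])
    return np.array(seeds)

# Demo: ordinary Jensen divergence of the squared-norm generator,
# (|p|^2 + |q|^2)/2 - |(p + q)/2|^2 = |p - q|^2 / 4.
jensen_sq = lambda p, q: 0.25 * float(np.dot(p - q, p - q))
pts = np.random.default_rng(0).normal(size=(200, 2))
print(kmeanspp_seeds(pts, 3, jensen_sq, seed=0))
```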
Bregman divergences based on optimal design criteria and simplicial measures of dispersion
In previous work the authors defined the k-th order simplicial distance between probability distributions, which arises naturally from a measure of dispersion based on the squared volume of random simplices of dimension k. This theory is embedded in the wider theory of divergences and distances between distributions, which includes the Kullback–Leibler, Jensen–Shannon, and Jeffreys–Bregman divergences and the Bhattacharyya distance. A general construction is given, based on defining a directional derivative of a function ϕ from one distribution to the other, whose concavity or strict concavity influences the properties of the resulting divergence. For the normal distribution, these divergences can be expressed as matrix formulas in the (multivariate) means and covariances. Optimal experimental design criteria contribute a range of functionals applied to non-negative, or positive definite, information matrices. Not all such criteria can distinguish normal distributions, but sufficient conditions are given. The k-th order simplicial distance is revisited from this perspective, and the results are used to test empirically the identity of means and covariances.
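To make the underlying dispersion measure concrete: the squared volume of a simplex with vertices x_0, ..., x_k in R^d is det(G) / (k!)^2, where G is the Gram matrix of the edge vectors x_i - x_0, and the k-th order dispersion averages this squared volume over random draws of k + 1 vertices. A minimal Monte Carlo sketch under those assumptions (the function names and the resampling scheme are ours):

```python
from math import factorial
import numpy as np

def squared_simplex_volume(vertices):
    """Squared volume of the simplex on k + 1 vertices in R^d, via the Gram
    determinant of the edge vectors: Vol^2 = det(E E^T) / (k!)^2."""
    edges = vertices[1:] - vertices[0]        # k edge vectors from vertex 0
    k = len(edges)
    return np.linalg.det(edges @ edges.T) / factorial(k) ** 2

def simplicial_dispersion(sample, k, n_draws=2000, seed=0):
    """Monte Carlo estimate of the k-th order dispersion: the average squared
    volume of simplices whose k + 1 vertices are resampled from the data."""
    rng = np.random.default_rng(seed)
    return np.mean([
        squared_simplex_volume(sample[rng.choice(len(sample), k + 1, replace=False)])
        for _ in range(n_draws)
    ])

# For k = 1 this reduces to the average squared distance between two points.
pts = np.random.default_rng(1).normal(size=(500, 3))
print(simplicial_dispersion(pts, k=2))
```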
Generalized Bregman and Jensen divergences which include some f-divergences
In this paper, we introduce new classes of divergences by extending the
definitions of the Bregman divergence and the skew Jensen divergence. These new
divergence classes (g-Bregman divergence and skew g-Jensen divergence) satisfy
some properties similar to the Bregman or skew Jensen divergence. We show that these g-divergences include several f-divergences (the Hellinger distance, the chi-square divergence, and the alpha-divergence, in addition to the Kullback-Leibler divergence). Moreover, we derive an inequality between the g-Bregman divergence and the skew g-Jensen divergence and show that this inequality is a generalization of Lin's inequality.
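As background for the Bregman-Jensen relationship that these inequalities generalize: the skew Jensen divergence J_F^a(p : q) = a F(p) + (1 - a) F(q) - F(a p + (1 - a) q) decomposes exactly into the a-weighted sum of two Bregman divergences to the interpolated point m = a p + (1 - a) q. A quick numerical check of that classical identity (the g-extensions of the paper are not reproduced here):

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)>."""
    return F(p) - F(q) - np.dot(p - q, grad_F(q))

def skew_jensen(F, p, q, a):
    """Skew Jensen divergence J_F^a(p : q)."""
    return a * F(p) + (1 - a) * F(q) - F(a * p + (1 - a) * q)

# Generator: negative Shannon entropy on positive vectors.
F = lambda x: float(np.sum(x * np.log(x)))
grad_F = lambda x: np.log(x) + 1.0

p, q, a = np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.4, 0.3]), 0.3
m = a * p + (1 - a) * q
# Identity: J_F^a(p : q) = a B_F(p : m) + (1 - a) B_F(q : m).
print(skew_jensen(F, p, q, a))
print(a * bregman(F, grad_F, p, m) + (1 - a) * bregman(F, grad_F, q, m))
```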