Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means
Bayesian classification labels observations based on given prior information,
namely the class a priori and class-conditional probabilities. Bayes' risk is the
minimum expected classification cost, achieved by Bayes' test, the optimal
decision rule. When no cost is incurred for correct classification and a unit
cost is charged for misclassification, Bayes' test reduces to the maximum a
posteriori decision rule, and Bayes' risk simplifies to Bayes' error, the
probability of error. Since calculating this probability of error is often
intractable, several techniques have been devised to bound it with closed-form
formulas, thereby introducing measures of similarity and divergence between
distributions like the Bhattacharyya coefficient and its associated
Bhattacharyya distance. The Bhattacharyya upper bound can further be tightened
using the Chernoff information that relies on the notion of best error
exponent. In this paper, we first express Bayes' risk using the total variation
distance on scaled distributions. We then elucidate and extend the
Bhattacharyya and the Chernoff upper bound mechanisms using generalized
weighted means. We provide as a byproduct novel notions of statistical
divergences and affinity coefficients. We illustrate our technique by deriving
new upper bounds for the univariate Cauchy and the multivariate
t-distributions, and show experimentally that those bounds are not too distant
from the computationally intractable Bayes' error.
Comment: 22 pages, includes R code. To appear in Pattern Recognition Letters.
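To make the bounding mechanism concrete, here is a minimal numerical sketch (not the paper's included R code): for a two-class problem with univariate Gaussian class-conditional densities, it compares the brute-force Bayes' error with the Bhattacharyya bound (skewing alpha = 1/2) and the Chernoff bound obtained by minimizing the skewed affinity over alpha. The densities, priors, and helper names are illustrative assumptions.

```python
# Sketch (illustrative, not the paper's code): Bhattacharyya and Chernoff upper
# bounds on the Bayes error for two univariate Gaussian class-conditional densities.
import numpy as np
from scipy import integrate, stats, optimize

w1, w2 = 0.5, 0.5                      # class priors
p1 = stats.norm(loc=0.0, scale=1.0)    # class-conditional densities
p2 = stats.norm(loc=2.0, scale=1.5)

def bayes_error():
    # P_e = integral of min(w1 p1(x), w2 p2(x)) dx, computed here by brute-force
    # quadrature as a reference value (intractable in general)
    f = lambda x: np.minimum(w1 * p1.pdf(x), w2 * p2.pdf(x))
    return integrate.quad(f, -30, 30, limit=200)[0]

def skewed_affinity(alpha):
    # c_alpha = integral p1(x)^alpha p2(x)^(1-alpha) dx (alpha = 1/2 gives the
    # Bhattacharyya coefficient)
    f = lambda x: p1.pdf(x) ** alpha * p2.pdf(x) ** (1.0 - alpha)
    return integrate.quad(f, -30, 30, limit=200)[0]

def chernoff_style_bound(alpha):
    # min(a, b) <= a^alpha b^(1-alpha) gives P_e <= w1^alpha w2^(1-alpha) c_alpha
    return w1 ** alpha * w2 ** (1.0 - alpha) * skewed_affinity(alpha)

bhattacharyya_bound = chernoff_style_bound(0.5)
res = optimize.minimize_scalar(chernoff_style_bound,
                               bounds=(1e-3, 1 - 1e-3), method="bounded")
print(f"Bayes error        : {bayes_error():.5f}")
print(f"Bhattacharyya bound: {bhattacharyya_bound:.5f}")
print(f"Chernoff bound     : {res.fun:.5f} (at alpha = {res.x:.3f})")
```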
Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settings --
for example, in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but, in most cases, this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
α-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. We then demonstrate that our bounds are significantly tighter than
well-known existing bounds using numeric simulations. This estimator class is
very useful in optimization problems involving maximization/minimization of
entropy and mutual information, such as MaxEnt and rate-distortion problems.
Comment: Corrects several errata in the published version, in particular in
Section V (bounds on mutual information).
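As a concrete illustration, the sketch below evaluates one common reading of the pairwise-distance estimator family, H_D = sum_i w_i H(p_i) - sum_i w_i ln sum_j w_j exp(-D(p_i, p_j)), on a univariate Gaussian mixture, using the closed-form pairwise Kullback-Leibler and Bhattacharyya distances between Gaussians. The mixture parameters and helper names are assumptions, and the formula is our reading of the construction rather than a quote of the paper's notation.

```python
# Sketch of the pairwise-distance family of mixture-entropy bounds for a
# 1D Gaussian mixture: KL as distance gives an upper bound, Bhattacharyya a lower bound.
import numpy as np
from scipy import integrate

w   = np.array([0.3, 0.5, 0.2])        # mixture weights
mu  = np.array([-2.0, 0.0, 3.0])       # component means
sig = np.array([0.7, 1.0, 1.5])        # component standard deviations

def gauss_entropy(s):                  # differential entropy of N(m, s^2)
    return 0.5 * np.log(2.0 * np.pi * np.e * s ** 2)

def kl(i, j):                          # KL(N_i || N_j), closed form
    return (np.log(sig[j] / sig[i])
            + (sig[i] ** 2 + (mu[i] - mu[j]) ** 2) / (2.0 * sig[j] ** 2) - 0.5)

def bhattacharyya(i, j):               # Bhattacharyya distance between N_i and N_j
    s2 = sig[i] ** 2 + sig[j] ** 2
    return (mu[i] - mu[j]) ** 2 / (4.0 * s2) + 0.5 * np.log(s2 / (2.0 * sig[i] * sig[j]))

def pairwise_estimate(dist):
    # H_D = sum_i w_i H(p_i) - sum_i w_i ln sum_j w_j exp(-dist(p_i, p_j))
    k = len(w)
    inner = np.array([sum(w[j] * np.exp(-dist(i, j)) for j in range(k)) for i in range(k)])
    return float(np.dot(w, [gauss_entropy(s) for s in sig]) - np.dot(w, np.log(inner)))

def mixture_entropy_quadrature():      # brute-force reference value
    pdf = lambda x: sum(wi * np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
                        for wi, m, s in zip(w, mu, sig))
    f = lambda x: -pdf(x) * np.log(np.maximum(pdf(x), 1e-300))
    return integrate.quad(f, -25, 25, limit=300)[0]

print(f"lower bound (Bhattacharyya): {pairwise_estimate(bhattacharyya):.4f}")
print(f"true entropy (quadrature)  : {mixture_entropy_quadrature():.4f}")
print(f"upper bound (KL)           : {pairwise_estimate(kl):.4f}")
```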
On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means
The Jensen-Shannon divergence is a renowned bounded symmetrization of the
unbounded Kullback-Leibler divergence, measuring the total Kullback-Leibler
divergence to the average mixture distribution. However, the Jensen-Shannon
divergence between Gaussian distributions is not available in closed form. To
bypass this problem, we present a generalization of the Jensen-Shannon (JS)
divergence using abstract means which yields closed-form expressions when the
mean is chosen according to the parametric family of distributions. More
generally, we define the JS-symmetrizations of any distance using generalized
statistical mixtures derived from abstract means. In particular, we first show
that the geometric mean is well-suited for exponential families, and report two
closed-form formulas for (i) the geometric Jensen-Shannon divergence between
probability densities of the same exponential family, and (ii) the geometric
JS-symmetrization of the reverse Kullback-Leibler divergence. As a second
illustrating example, we show that the harmonic mean is well-suited for the
scale Cauchy distributions, and report a closed-form formula for the harmonic
Jensen-Shannon divergence between scale Cauchy distributions. We also define
generalized Jensen-Shannon divergences between matrices (e.g., quantum
Jensen-Shannon divergences) and consider clustering with respect to these novel
Jensen-Shannon divergences.
Comment: 30 pages.
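The following sketch illustrates the abstract-mean construction numerically for two univariate Gaussians: the alpha-weighted M-mixture is normalized by quadrature, and the generalized Jensen-Shannon divergence is taken as the weighted sum of Kullback-Leibler divergences to it. This is one natural reading of the definition (shown here with arithmetic and geometric means), and the function names are assumptions; the paper's point is that the geometric case admits a closed form, which this numerical sketch does not exploit.

```python
# Sketch of a mean-generalized Jensen-Shannon divergence between two 1D Gaussians,
# using numerical quadrature throughout.
import numpy as np
from scipy import integrate, stats

p = stats.norm(0.0, 1.0).pdf
q = stats.norm(3.0, 2.0).pdf
LO, HI = -30.0, 30.0

def kl(f, g):
    # KL(f || g) by quadrature, guarding against log(0) in negligible regions
    integrand = lambda x: f(x) * np.log(np.maximum(f(x), 1e-300) / np.maximum(g(x), 1e-300))
    return integrate.quad(integrand, LO, HI, limit=300)[0]

def m_jsd(mean_fn, alpha=0.5):
    # Generalized JS divergence: weighted KLs to the normalized M-mixture (pq)^M_alpha
    unnorm = lambda x: mean_fn(p(x), q(x), alpha)
    z = integrate.quad(unnorm, LO, HI, limit=300)[0]     # normalizer of the M-mixture
    mix = lambda x: unnorm(x) / z
    return (1 - alpha) * kl(p, mix) + alpha * kl(q, mix)

arithmetic = lambda a, b, t: (1 - t) * a + t * b          # ordinary mixture (classical JSD)
geometric  = lambda a, b, t: a ** (1 - t) * b ** t        # geometric mixture (exponential families)

print(f"arithmetic (classical) JSD: {m_jsd(arithmetic):.5f}")
print(f"geometric JSD             : {m_jsd(geometric):.5f}")
```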
Revisiting Chernoff Information with Likelihood Ratio Exponential Families
The Chernoff information between two probability measures is a statistical
divergence measuring their deviation, defined as their maximally skewed
Bhattacharyya distance. Although the Chernoff information was originally
introduced for bounding the Bayes error in statistical hypothesis testing, the
divergence has found many other applications, owing to its empirical
robustness, in areas ranging from information fusion to quantum
information. From the viewpoint of information theory, the Chernoff information
can also be interpreted as a minmax symmetrization of the Kullback--Leibler
divergence. In this paper, we first revisit the Chernoff information between
two densities of a measurable Lebesgue space by considering the exponential
families induced by their geometric mixtures: The so-called likelihood ratio
exponential families. Second, we show how to (i) solve exactly the Chernoff
information between any two univariate Gaussian distributions or get a
closed-form formula using symbolic computing, (ii) report a closed-form formula
of the Chernoff information of centered Gaussians with scaled covariance
matrices and (iii) use a fast numerical scheme to approximate the Chernoff
information between any two multivariate Gaussian distributions.
Comment: 41 pages.
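Since the Chernoff information is the maximally skewed Bhattacharyya distance, a simple numerical sketch for two univariate Gaussians can work directly with the log-normalizer of the Gaussian exponential family and maximize the skewed Jensen gap over alpha in (0, 1). This is a plain bounded scalar optimization, not the paper's exact scheme, and the helper names are assumptions.

```python
# Sketch: Chernoff information between two univariate Gaussians as the maximally
# skewed Bhattacharyya distance, via the log-normalizer of the Gaussian family.
import numpy as np
from scipy import optimize

def natural_params(mu, sigma):
    # N(mu, sigma^2) as an exponential family: theta = (mu/sigma^2, -1/(2 sigma^2))
    return np.array([mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)])

def log_normalizer(theta):
    # F(theta) = -theta1^2/(4 theta2) + 0.5 * ln(-pi/theta2)
    return -theta[0] ** 2 / (4.0 * theta[1]) + 0.5 * np.log(-np.pi / theta[1])

def skewed_bhattacharyya(theta_p, theta_q, alpha):
    # B_alpha(p:q) = -ln int p^alpha q^(1-alpha) dx
    #              = alpha F(theta_p) + (1-alpha) F(theta_q) - F(alpha theta_p + (1-alpha) theta_q)
    return (alpha * log_normalizer(theta_p) + (1 - alpha) * log_normalizer(theta_q)
            - log_normalizer(alpha * theta_p + (1 - alpha) * theta_q))

def chernoff_information(mu1, s1, mu2, s2):
    tp, tq = natural_params(mu1, s1), natural_params(mu2, s2)
    res = optimize.minimize_scalar(lambda a: -skewed_bhattacharyya(tp, tq, a),
                                   bounds=(1e-6, 1 - 1e-6), method="bounded")
    return -res.fun, res.x           # (Chernoff information, optimal skewing alpha*)

c, alpha_star = chernoff_information(0.0, 1.0, 2.0, 3.0)
print(f"Chernoff information = {c:.5f} at alpha* = {alpha_star:.4f}")
```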
Quantifying the Similarity of Paleomagnetic Poles
An ability to compare paleomagnetic poles quantitatively is fundamental to
paleogeographic reconstruction. The Fisher distribution provides a statistical
framework for both constructing and relating paleomagnetic poles to enable
comparison of estimated pole positions in paleomagnetic reconstructions.
However, Fisher distribution-based confidence regions for paleomagnetic poles
are often compared using empirical rules of thumb rather than by quantitative
analysis of their full structure. Here we demonstrate potential shortcomings of
such comparisons and propose continuous metrics for quantitative comparison of
paleomagnetic poles. These metrics are simple to apply for Fisher distributions
and can be modified readily for a broad range of alternative distributions that
may be more appropriate for representing some paleomagnetic data sets. We
demonstrate how our proposed metrics provide both quantitative and
probabilistic approaches to common tasks in paleomagnetic reconstruction, such
as comparing estimated mean pole positions with apparent polar wander paths.
This work was supported by the Australian Research Council (Grant DP190100874).
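For context, the sketch below implements only the rule-of-thumb comparison the abstract refers to (the great-circle distance between mean poles against the sum of their A95 confidence radii), not the continuous metrics proposed in the paper; the pole coordinates and radii are made-up illustrative values.

```python
# Sketch (illustrative only): great-circle angular distance between two
# paleomagnetic poles and the common "overlapping A95 circles" rule of thumb.
import numpy as np

def pole_to_vector(lat_deg, lon_deg):
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])

def angular_distance_deg(pole_a, pole_b):
    # Great-circle distance between the two unit pole vectors, in degrees
    c = np.clip(np.dot(pole_to_vector(*pole_a), pole_to_vector(*pole_b)), -1.0, 1.0)
    return np.degrees(np.arccos(c))

# Two hypothetical poles as (latitude, longitude) with Fisher A95 radii (degrees)
pole_1, a95_1 = (75.0, 210.0), 4.5
pole_2, a95_2 = (68.0, 230.0), 6.0

gcd = angular_distance_deg(pole_1, pole_2)
print(f"angular distance: {gcd:.2f} deg")
print("rule-of-thumb 'distinguishable':", gcd > a95_1 + a95_2)
```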
Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry
We generalize quasi-arithmetic means beyond scalars by considering the
gradient map of a Legendre type real-valued function. The gradient map of a
Legendre type function is proven strictly comonotone with a global inverse. It
thus yields a generalization of the strictly monotone and differentiable
functions generating scalar quasi-arithmetic means. Furthermore, the Legendre
transformation gives rise to pairs of dual quasi-arithmetic averages via the
convex duality. We study the invariance and equivariance properties under
affine transformations of quasi-arithmetic averages via the lens of dually flat
spaces of information geometry. We show how these quasi-arithmetic averages are
used to express points on dual geodesics and sided barycenters in the dual
affine coordinate systems. We then consider quasi-arithmetic mixtures and
describe several parametric and non-parametric statistical models which are
closed under the quasi-arithmetic mixture operation.
Comment: 20 pages.
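A minimal sketch of the two constructions discussed: the scalar quasi-arithmetic mean f^{-1}(sum_i w_i f(x_i)) for a strictly monotone generator f, and the gradient-map average (grad F)^{-1}(sum_i w_i grad F(x_i)) for a Legendre-type F. The generators chosen here (log, reciprocal, separable negative entropy) are illustrative assumptions.

```python
# Sketch of scalar quasi-arithmetic means and of the gradient-map generalization:
# M_F(x_1..x_n; w) = (grad F)^{-1}(sum_i w_i grad F(x_i)) for a Legendre-type F.
import numpy as np

def quasi_arithmetic_mean(xs, w, f, f_inv):
    # Scalar case: f strictly monotone and differentiable
    return f_inv(np.dot(w, [f(x) for x in xs]))

xs, w = [1.0, 4.0, 16.0], [0.5, 0.25, 0.25]
print("geometric:", quasi_arithmetic_mean(xs, w, np.log, np.exp))
print("harmonic :", quasi_arithmetic_mean(xs, w, lambda x: 1 / x, lambda y: 1 / y))

# Gradient-map case: F(x) = sum_i x_i log x_i on the positive orthant,
# grad F(x) = log x + 1, with global inverse (grad F)^{-1}(y) = exp(y - 1);
# the induced average is the coordinate-wise geometric mean.
grad_F     = lambda x: np.log(x) + 1.0
grad_F_inv = lambda y: np.exp(y - 1.0)

pts = np.array([[1.0, 2.0], [4.0, 8.0], [16.0, 0.5]])
wts = np.array([0.5, 0.25, 0.25])
avg = grad_F_inv(np.sum(wts[:, None] * grad_F(pts), axis=0))
print("gradient-map average:", avg)          # equals the coord-wise geometric mean
print("check               :", np.exp(np.sum(wts[:, None] * np.log(pts), axis=0)))
```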
The α-divergences associated with a pair of strictly comparable quasi-arithmetic means
We generalize the family of α-divergences using a pair of strictly
comparable weighted means. In particular, we obtain the 1-divergence in the
limit case α → 1 (a generalization of the Kullback-Leibler
divergence) and the 0-divergence in the limit case α → 0 (a
generalization of the reverse Kullback-Leibler divergence). We state the
condition for a pair of quasi-arithmetic means to be strictly comparable, and
report the formula for the quasi-arithmetic α-divergences and their
subfamily of bipower homogeneous α-divergences, which belong to
Csiszár's f-divergences. Finally, we show that these generalized
quasi-arithmetic 1-divergences and 0-divergences can be decomposed as the
sum of generalized cross-entropies minus entropies, and rewritten as conformal
Bregman divergences using monotone embeddings.
Comment: 18 pages.
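For orientation on the limit cases, the following sketch evaluates the classical α-divergence on discrete distributions under one common parameterization (an assumption, not necessarily the paper's convention) and checks numerically that it approaches the Kullback-Leibler divergence as α → 1 and the reverse Kullback-Leibler divergence as α → 0.

```python
# Sketch of the classical alpha-divergence family that the paper generalizes,
# D_alpha(p:q) = (1 - sum_x p^alpha q^(1-alpha)) / (alpha (1 - alpha)),
# with its KL and reverse-KL limit cases checked numerically.
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

def alpha_divergence(p, q, alpha):
    return (1.0 - np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha * (1.0 - alpha))

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

print(f"D_0.999(p:q) = {alpha_divergence(p, q, 0.999):.5f}  vs  KL(p:q) = {kl(p, q):.5f}")
print(f"D_0.001(p:q) = {alpha_divergence(p, q, 0.001):.5f}  vs  KL(q:p) = {kl(q, p):.5f}")
```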