On the Chi square and higher-order Chi distances for approximating f-divergences
We report closed-form formulas for calculating the Chi square and higher-order Chi distances between statistical distributions belonging to the same exponential family with affine natural space, and instantiate those formulas for the Poisson and isotropic Gaussian families. We then describe an analytic formula for the f-divergences based on Taylor expansions and relying on an extended class of Chi-type distances.
Comment: 11 pages, two tables, no figures. Java(TM) code available online at
http://www.informationgeometry.org/fDivergence
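As a concrete illustration, here is a minimal Python sketch (distinct from the paper's Java code) of the closed-form chi-square formula instantiated for the Poisson family, assuming the convention chi2(p:q) = sum_x (p(x)-q(x))^2/q(x):

```python
import numpy as np
from scipy.stats import poisson

def chi2_poisson_closed_form(lam1, lam2):
    # chi2(p:q) = exp(F(2*theta1 - theta2) - 2*F(theta1) + F(theta2)) - 1,
    # with theta = log(lam) and log-normalizer F(theta) = exp(theta),
    # so F(2*theta1 - theta2) = lam1**2 / lam2 for the Poisson family.
    return np.exp(lam1**2 / lam2 - 2.0 * lam1 + lam2) - 1.0

def chi2_poisson_numeric(lam1, lam2, support=100):
    x = np.arange(support)
    p, q = poisson.pmf(x, lam1), poisson.pmf(x, lam2)
    return np.sum((p - q)**2 / q)

lam1, lam2 = 3.0, 5.0
print(chi2_poisson_closed_form(lam1, lam2))  # closed form
print(chi2_poisson_numeric(lam1, lam2))      # numerical check, should agree
```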
On power chi expansions of f-divergences
We consider both finite and infinite power chi expansions of f-divergences derived from Taylor expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of f-divergences.
Comment: 21 pages
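To illustrate the finite expansions, the following sketch truncates the Taylor series of the generator f(u) = -log(u), so that KL(p:q) is approximated by an alternating sum of signed higher-order chi distances; it assumes two nearby discrete distributions so that the series converges:

```python
import numpy as np

def chi_k(p, q, k):
    # signed higher-order chi distance chi_k(p:q) = sum (q - p)^k / p^(k-1)
    return np.sum((q - p)**k / p**(k - 1))

def kl_chi_expansion(p, q, order):
    # KL(p:q) = sum_{k>=2} (-1)^k chi_k(p:q) / k, truncated at `order`
    return sum((-1)**k * chi_k(p, q, k) / k for k in range(2, order + 1))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.full(8, 50.0))
q = 0.98 * p + 0.02 * rng.dirichlet(np.full(8, 50.0))   # nearby distribution

kl_exact = np.sum(p * np.log(p / q))
for K in (2, 4, 8):
    print(K, kl_chi_expansion(p, q, K), kl_exact)
```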
Multi-model inference through projections in model space
Information criteria have had a profound impact on modern ecological science.
They allow researchers to estimate which probabilistic approximating models are
closest to the generating process. Unfortunately, information criterion
comparison does not tell us how good the best model is. Nor do practitioners
routinely test the reliability (e.g. error rates) of information
criterion-based model selection. In this work, we show that these two
shortcomings can be resolved by extending a key observation from Hirotugu
Akaike's original work. Standard information criterion analysis considers only
the divergences of each model from the generating process. What this overlooks is that there are also estimable divergence relationships amongst all of the approximating models. We then show that, using both sets of divergences, a model
space can be constructed that includes an estimated location for the generating
process. Thus, an analyst can determine not only which model is closest to the generating process, but also how close to the generating process the best approximating model is. Properties of the generating process
estimated from these projections are more accurate than those estimated by
model averaging. The applications of our findings extend to all areas of
science where model selection through information criteria is done.
Comment: 31 pages, 8 figures. Submitted to JRSS
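One plausible toy realization of the projection idea (not the authors' algorithm) is to embed the models and the generating process jointly via classical multidimensional scaling of the estimated pairwise divergence matrix; the divergence values below are made up for illustration:

```python
import numpy as np

def classical_mds(D2, dim=2):
    # D2: (n, n) matrix of squared pairwise distances -> n points in R^dim
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                    # double centering
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Hypothetical symmetrized-divergence estimates (treated as squared distances)
# among three fitted models m1..m3 and the generating process g.
#               m1   m2   m3    g
D2 = np.array([[0.0, 0.8, 1.5, 0.3],
               [0.8, 0.0, 0.9, 0.6],
               [1.5, 0.9, 0.0, 1.2],
               [0.3, 0.6, 1.2, 0.0]])

pts = classical_mds(D2)
print(pts)   # last row: estimated location of the generating process
```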
Alpha-Beta Divergence For Variational Inference
This paper introduces a variational approximation framework using direct
optimization of what is known as the {\it scale invariant Alpha-Beta
divergence} (sAB divergence). This new objective encompasses most variational
objectives that use the Kullback-Leibler, the R{\'e}nyi or the gamma
divergences. It also gives access to objective functions never exploited before
in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target
distribution and robustness to outliers in the data. Furthermore, the sAB
variational objective can be optimized directly by repurposing existing methods
for Monte Carlo computation of complex variational objectives, leading to
estimates of the divergence instead of variational lower bounds. We show the
advantages of this objective on Bayesian models for regression problems.
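For reference, the following sketch implements the underlying (discrete) Alpha-Beta divergence of Cichocki, Cruces and Amari; the paper's scale-invariant sAB variant adds a normalization that is omitted here. The alpha and beta values shown probe the KL limit numerically:

```python
import numpy as np

def ab_divergence(p, q, alpha, beta):
    # Alpha-Beta divergence for alpha, beta, alpha+beta all nonzero
    s = alpha + beta
    return -np.sum(p**alpha * q**beta
                   - (alpha / s) * p**s
                   - (beta / s) * q**s) / (alpha * beta)

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(10))
q = rng.dirichlet(np.ones(10))

# (alpha, beta) -> (1, 0) recovers KL(p:q); probe the limit with a small beta.
print(ab_divergence(p, q, 1.0, 1e-6))   # ~ KL(p:q)
print(np.sum(p * np.log(p / q)))        # exact KL(p:q) for comparison
```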
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Generative neural samplers are probabilistic models that implement sampling
using feedforward neural networks: they take a random input vector and produce
a sample from a probability distribution defined by the network weights. These
models are expressive and allow efficient computation of samples and
derivatives, but cannot be used for computing likelihoods or for
marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We
show that the generative-adversarial approach is a special case of an existing
more general variational divergence estimation approach. We show that any
f-divergence can be used for training generative neural samplers. We discuss
the benefits of various choices of divergence functions on training complexity
and the quality of the obtained generative models.
Comment: 17 pages
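The variational divergence estimation approach referred to here bounds D_f(P||Q) from below by E_P[T(x)] - E_Q[f*(T(x))] for any critic T, where f* is the convex conjugate of f. The sketch below verifies the bound for the KL case on two Gaussians using the known optimal critic; in f-GAN itself, T is a neural network trained adversarially:

```python
import numpy as np
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 1.5
P, Q = norm(mu1, s1), norm(mu2, s2)

def T_star(x):
    # optimal critic for the KL case: T*(x) = 1 + log p(x)/q(x)
    return 1.0 + P.logpdf(x) - Q.logpdf(x)

rng = np.random.default_rng(2)
xp = P.rvs(200_000, random_state=rng)
xq = Q.rvs(200_000, random_state=rng)

# lower bound E_P[T] - E_Q[f*(T)] with f(u) = u log u, f*(t) = exp(t - 1)
bound = np.mean(T_star(xp)) - np.mean(np.exp(T_star(xq) - 1.0))
kl_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(bound, kl_closed)   # tight at T = T*
```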
On The Chain Rule Optimal Transport Distance
We define a novel class of distances between statistical multivariate
distributions by solving an optimal transportation problem on their marginal
densities with respect to a ground distance defined on their conditional
densities. By using the chain rule factorization of probabilities, we show how
to perform optimal transport on a ground space that is an information-geometric
manifold of conditional probabilities. We prove that this new distance is a
metric whenever the chosen ground distance is a metric. Our distance
generalizes both the Wasserstein distances between point sets and a recently
introduced metric distance between statistical mixtures. As a first application
of this Chain Rule Optimal Transport (CROT) distance, we show that the ground
distance between statistical mixtures is upper bounded by this optimal
transport distance and its fast relaxed Sinkhorn distance, whenever the ground
distance is jointly convex. We report on experiments that quantify the
tightness of the CROT distance for the total variation distance, the square
root generalization of the Jensen-Shannon divergence, the Wasserstein
metric, and the R\'enyi divergence between mixtures.
Comment: 23 pages
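A toy sketch of the relaxed Sinkhorn variant mentioned above, transporting the mixture weights of two univariate Gaussian mixtures with the closed-form KL divergence between components as the ground cost; all mixture parameters are made up for illustration:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    # closed-form KL between univariate Gaussians, used as the ground cost
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def sinkhorn_cost(a, b, C, eps=0.5, iters=1000):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]          # entropic transport plan
    return np.sum(P * C)

# mixture 1: weights a, components (mean, std); mixture 2: weights b
a, comps1 = np.array([0.5, 0.5]), [(-2.0, 1.0), (2.0, 1.0)]
b, comps2 = np.array([0.3, 0.7]), [(-1.0, 1.5), (1.0, 1.0)]

C = np.array([[kl_gauss(*c1, *c2) for c2 in comps2] for c1 in comps1])
print(sinkhorn_cost(a, b, C))   # relaxed OT cost between the two mixtures
```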
Quantifying the probable approximation error of probabilistic inference programs
This paper introduces a new technique for quantifying the approximation error
of a broad class of probabilistic inference programs, including ones based on
both variational and Monte Carlo approaches. The key idea is to derive a
subjective bound on the symmetrized KL divergence between the distribution
achieved by an approximate inference program and its true target distribution.
The bound's validity (and subjectivity) rests on the accuracy of two auxiliary
probabilistic programs: (i) a "reference" inference program that defines a gold
standard of accuracy and (ii) a "meta-inference" program that answers the
question "what internal random choices did the original approximate inference
program probably make given that it produced a particular result?" The paper
includes empirical results on inference problems drawn from linear regression,
Dirichlet process mixture modeling, HMMs, and Bayesian networks. The
experiments show that the technique is robust to the quality of the reference
inference program and that it can detect implementation bugs that are not
apparent from predictive performance.
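The quantity being bounded is the symmetrized KL divergence between the output distributions of the approximate and reference programs. The sketch below estimates it by plain Monte Carlo in the easy case where both densities are tractable; the paper's technique exists precisely for the case where they are not and meta-inference supplies the missing log-density terms:

```python
import numpy as np
from scipy.stats import norm

approx    = norm(0.1, 1.2)   # stand-in for the approximate program's output
reference = norm(0.0, 1.0)   # stand-in for the gold-standard reference

rng = np.random.default_rng(5)
xa = approx.rvs(100_000, random_state=rng)
xr = reference.rvs(100_000, random_state=rng)

sym_kl = (np.mean(approx.logpdf(xa) - reference.logpdf(xa))
          + np.mean(reference.logpdf(xr) - approx.logpdf(xr)))
print(sym_kl)   # Monte Carlo estimate of the symmetrized KL divergence
```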
Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities
A measure of distance between two clusterings has important applications,
including clustering validation and ensemble clustering. Generally, such
a distance measure provides navigation through the space of possible clusterings.
Most often used in cluster validation, a normalized clustering distance, a.k.a. an agreement measure, compares a given clustering result against the ground-truth
clustering. Clustering agreement measures are often classified into two
families of pair-counting and information theoretic measures, with the
widely-used representatives of Adjusted Rand Index (ARI) and Normalized Mutual
Information (NMI), respectively. This paper sheds light on the relation between
these two families through a generalization. It further presents an alternative algebraic formulation for these agreement measures that incorporates an intuitive clustering distance, defined via the analogy between cluster overlaps and co-memberships of nodes in clusters. Unlike the original
measures, it is easily extendable to different cases, including overlapping
clusters and clusters of inter-related data for complex networks. These two
extensions are, in particular, important in the context of finding clusters in
social and information networks, a.k.a. communities.
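For concreteness, the two stock representatives can be computed with scikit-learn on a pair of toy (non-overlapping) clusterings; the paper's algebraic formulation is what extends such measures to the overlapping case, which these implementations do not handle:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

truth  = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # ground-truth clustering
result = [0, 0, 1, 1, 1, 1, 2, 2, 0]   # clustering under evaluation

print("ARI:", adjusted_rand_score(truth, result))          # pair-counting family
print("NMI:", normalized_mutual_info_score(truth, result)) # information-theoretic family
```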
On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds
We study the Voronoi diagrams of a finite set of Cauchy distributions and
their dual complexes from the viewpoint of information geometry by considering
the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square
divergence, and a flat divergence derived from Tsallis' quadratic entropy
related to the conformal flattening of the Fisher-Rao curved geometry. We prove
that the Voronoi diagrams of the Fisher-Rao distance, the chi square
divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic
Voronoi diagram on the corresponding Cauchy location-scale parameters, and that
the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the
Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to
the dual forward/reverse flat divergences amount to dual Bregman Voronoi
diagrams, and their dual complexes are regular triangulations. The primal
Bregman-Tsallis Voronoi diagram corresponds to the hyperbolic Voronoi diagram
and the dual Bregman-Tsallis Voronoi diagram coincides with the ordinary
Euclidean Voronoi diagram. Moreover, we prove that the square root of the
Kullback-Leibler divergence between Cauchy distributions yields a metric
distance which is Hilbertian for the Cauchy scale families.
Comment: 34 pages, 13 figures
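A minimal numerical companion, using the known closed form KL(p1:p2) = log(((l1-l2)^2 + (s1+s2)^2)/(4 s1 s2)) for Cauchy location-scale parameters (l, s) (notably a symmetric divergence in this family), and spot-checking the triangle inequality for sqrt(KL) on a scale family:

```python
import numpy as np

def kl_cauchy(l1, s1, l2, s2):
    # closed-form KL between Cauchy(l1, s1) and Cauchy(l2, s2); symmetric
    return np.log(((l1 - l2)**2 + (s1 + s2)**2) / (4.0 * s1 * s2))

rng = np.random.default_rng(3)
for _ in range(10_000):
    s1, s2, s3 = rng.uniform(0.1, 10.0, size=3)   # scale family: location 0
    d12 = np.sqrt(kl_cauchy(0, s1, 0, s2))
    d23 = np.sqrt(kl_cauchy(0, s2, 0, s3))
    d13 = np.sqrt(kl_cauchy(0, s1, 0, s3))
    assert d13 <= d12 + d23 + 1e-9
print("no triangle-inequality violations found")
```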
On divergences tests for composite hypotheses under composite likelihood
It is well known that in some situations it is not easy to compute the likelihood function, as the dataset might be large or the model too complex. In such contexts the composite likelihood, derived by multiplying the likelihoods of subsets of the variables, may be useful. The classical likelihood ratio test statistic is commonly extended to the composite likelihood framework as the procedure for testing in this setting. In this paper we introduce and study a new family of test statistics for composite likelihood: composite {\phi}-divergence test statistics for testing a simple or a composite null hypothesis. To do that we introduce and study the asymptotic
distribution of the restricted maximum composite likelihood estimate
- …
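As a concrete example of a composite likelihood, the sketch below forms the pairwise composite log-likelihood of a trivariate normal (multiplying all bivariate marginal likelihoods) and computes a classical composite likelihood ratio statistic; the paper's contribution replaces this statistic with phi-divergence-based analogues:

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

def pairwise_cl(X, mu, Sigma):
    # composite log-likelihood: sum of all bivariate-marginal log-likelihoods
    ll = 0.0
    for i, j in combinations(range(X.shape[1]), 2):
        sub = np.ix_([i, j], [i, j])
        ll += multivariate_normal(mu[[i, j]], Sigma[sub]).logpdf(X[:, [i, j]]).sum()
    return ll

Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
rng = np.random.default_rng(4)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=500)

# composite likelihood ratio statistic for H0: mu = 0 (known Sigma)
lr = 2 * (pairwise_cl(X, X.mean(axis=0), Sigma) - pairwise_cl(X, np.zeros(3), Sigma))
print(lr)
```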