On the Chi square and higher-order Chi distances for approximating f-divergences
We report closed-form formulas for calculating the Chi square and higher-order Chi distances between statistical distributions belonging to the same exponential family with affine natural space, and instantiate those formulas for the Poisson and isotropic Gaussian families. We then describe an analytic formula for the f-divergences based on Taylor expansions and relying on an extended class of Chi-type distances.
Comment: 11 pages, two tables, no figures. Java(TM) code available online at
http://www.informationgeometry.org/fDivergence
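As a concrete illustration, here is a minimal Python sketch (distinct from the paper's Java code) of the closed-form chi-square formula instantiated for the Poisson family, assuming the convention chi2(p:q) = sum_x (p(x)-q(x))^2/q(x):

```python
import numpy as np
from scipy.stats import poisson

def chi2_poisson_closed_form(lam1, lam2):
    # chi2(p:q) = exp(F(2*theta1 - theta2) - 2*F(theta1) + F(theta2)) - 1,
    # with theta = log(lam) and log-normalizer F(theta) = exp(theta),
    # so F(2*theta1 - theta2) = lam1**2 / lam2 for the Poisson family.
    return np.exp(lam1**2 / lam2 - 2.0 * lam1 + lam2) - 1.0

def chi2_poisson_numeric(lam1, lam2, support=100):
    x = np.arange(support)
    p, q = poisson.pmf(x, lam1), poisson.pmf(x, lam2)
    return np.sum((p - q)**2 / q)

lam1, lam2 = 3.0, 5.0
print(chi2_poisson_closed_form(lam1, lam2))  # closed form
print(chi2_poisson_numeric(lam1, lam2))      # numerical check, should agree
```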
On power chi expansions of f-divergences
We consider both finite and infinite power chi expansions of f-divergences derived from Taylor expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of f-divergences.
Comment: 21 pages
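To illustrate the finite expansions, the following sketch truncates the Taylor series of the generator f(u) = -log(u), so that KL(p:q) is approximated by an alternating sum of signed higher-order chi distances; it assumes two nearby discrete distributions so that the series converges:

```python
import numpy as np

def chi_k(p, q, k):
    # signed higher-order chi distance chi_k(p:q) = sum (q - p)^k / p^(k-1)
    return np.sum((q - p)**k / p**(k - 1))

def kl_chi_expansion(p, q, order):
    # KL(p:q) = sum_{k>=2} (-1)^k chi_k(p:q) / k, truncated at `order`
    return sum((-1)**k * chi_k(p, q, k) / k for k in range(2, order + 1))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.full(8, 50.0))
q = 0.98 * p + 0.02 * rng.dirichlet(np.full(8, 50.0))   # nearby distribution

kl_exact = np.sum(p * np.log(p / q))
for K in (2, 4, 8):
    print(K, kl_chi_expansion(p, q, K), kl_exact)
```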
Multi-model inference through projections in model space
Information criteria have had a profound impact on modern ecological science.
They allow researchers to estimate which probabilistic approximating models are
closest to the generating process. Unfortunately, information criterion
comparison does not tell us how good the best model is. Nor do practitioners
routinely test the reliability (e.g. error rates) of information
criterion-based model selection. In this work, we show that these two
shortcomings can be resolved by extending a key observation from Hirotugu
Akaike's original work. Standard information criterion analysis considers only
the divergences of each model from the generating process. What this overlooks is that there are also estimable divergence relationships amongst all of the approximating models. We then show that, using both sets of divergences, a model
space can be constructed that includes an estimated location for the generating
process. Thus, an analyst can determine not only which model is closest to the generating process, but also how close to the generating process the best approximating model is. Properties of the generating process
estimated from these projections are more accurate than those estimated by
model averaging. The applications of our findings extend to all areas of
science where model selection through information criteria is done.
Comment: 31 pages, 8 figures. Submitted to JRSS
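One plausible toy realization of the projection idea (not the authors' algorithm) is to embed the models and the generating process jointly via classical multidimensional scaling of the estimated pairwise divergence matrix; the divergence values below are made up for illustration:

```python
import numpy as np

def classical_mds(D2, dim=2):
    # D2: (n, n) matrix of squared pairwise distances -> n points in R^dim
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                    # double centering
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Hypothetical symmetrized-divergence estimates (treated as squared distances)
# among three fitted models m1..m3 and the generating process g.
#               m1   m2   m3    g
D2 = np.array([[0.0, 0.8, 1.5, 0.3],
               [0.8, 0.0, 0.9, 0.6],
               [1.5, 0.9, 0.0, 1.2],
               [0.3, 0.6, 1.2, 0.0]])

pts = classical_mds(D2)
print(pts)   # last row: estimated location of the generating process
```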
Alpha-Beta Divergence For Variational Inference
This paper introduces a variational approximation framework using direct
optimization of what is known as the {\it scale invariant Alpha-Beta
divergence} (sAB divergence). This new objective encompasses most variational
objectives that use the Kullback-Leibler, the R{\'e}nyi or the gamma
divergences. It also gives access to objective functions never exploited before
in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target
distribution and robustness to outliers in the data. Furthermore, the sAB
variational objective can be optimized directly by repurposing existing methods
for Monte Carlo computation of complex variational objectives, leading to
estimates of the divergence instead of variational lower bounds. We show the
advantages of this objective on Bayesian models for regression problems.
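For reference, the following sketch implements the underlying (discrete) Alpha-Beta divergence of Cichocki, Cruces and Amari; the paper's scale-invariant sAB variant adds a normalization that is omitted here. The alpha and beta values shown probe the KL limit numerically:

```python
import numpy as np

def ab_divergence(p, q, alpha, beta):
    # Alpha-Beta divergence for alpha, beta, alpha+beta all nonzero
    s = alpha + beta
    return -np.sum(p**alpha * q**beta
                   - (alpha / s) * p**s
                   - (beta / s) * q**s) / (alpha * beta)

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(10))
q = rng.dirichlet(np.ones(10))

# (alpha, beta) -> (1, 0) recovers KL(p:q); probe the limit with a small beta.
print(ab_divergence(p, q, 1.0, 1e-6))   # ~ KL(p:q)
print(np.sum(p * np.log(p / q)))        # exact KL(p:q) for comparison
```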
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Generative neural samplers are probabilistic models that implement sampling
using feedforward neural networks: they take a random input vector and produce
a sample from a probability distribution defined by the network weights. These
models are expressive and allow efficient computation of samples and
derivatives, but cannot be used for computing likelihoods or for
marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We
show that the generative-adversarial approach is a special case of an existing
more general variational divergence estimation approach. We show that any
f-divergence can be used for training generative neural samplers. We discuss
the benefits of various choices of divergence functions on training complexity
and the quality of the obtained generative models.
Comment: 17 pages
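The variational divergence estimation approach referred to here bounds D_f(P||Q) from below by E_P[T(x)] - E_Q[f*(T(x))] for any critic T, where f* is the convex conjugate of f. The sketch below verifies the bound for the KL case on two Gaussians using the known optimal critic; in f-GAN itself, T is a neural network trained adversarially:

```python
import numpy as np
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 1.5
P, Q = norm(mu1, s1), norm(mu2, s2)

def T_star(x):
    # optimal critic for the KL case: T*(x) = 1 + log p(x)/q(x)
    return 1.0 + P.logpdf(x) - Q.logpdf(x)

rng = np.random.default_rng(2)
xp = P.rvs(200_000, random_state=rng)
xq = Q.rvs(200_000, random_state=rng)

# lower bound E_P[T] - E_Q[f*(T)] with f(u) = u log u, f*(t) = exp(t - 1)
bound = np.mean(T_star(xp)) - np.mean(np.exp(T_star(xq) - 1.0))
kl_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(bound, kl_closed)   # tight at T = T*
```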
On The Chain Rule Optimal Transport Distance
We define a novel class of distances between statistical multivariate
distributions by solving an optimal transportation problem on their marginal
densities with respect to a ground distance defined on their conditional
densities. By using the chain rule factorization of probabilities, we show how
to perform optimal transport on a ground space that is an information-geometric
manifold of conditional probabilities. We prove that this new distance is a
metric whenever the chosen ground distance is a metric. Our distance
generalizes both the Wasserstein distances between point sets and a recently
introduced metric distance between statistical mixtures. As a first application
of this Chain Rule Optimal Transport (CROT) distance, we show that the ground
distance between statistical mixtures is upper bounded by this optimal
transport distance and its fast relaxed Sinkhorn distance, whenever the ground
distance is jointly convex. We report on experiments that quantify the
tightness of the CROT distance for the total variation distance, the square
root generalization of the Jensen-Shannon divergence, the Wasserstein
metric, and the R\'enyi divergence between mixtures.
Comment: 23 pages
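A toy sketch of the relaxed Sinkhorn variant mentioned above, transporting the mixture weights of two univariate Gaussian mixtures with the closed-form KL divergence between components as the ground cost; all mixture parameters are made up for illustration:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    # closed-form KL between univariate Gaussians, used as the ground cost
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def sinkhorn_cost(a, b, C, eps=0.5, iters=1000):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]          # entropic transport plan
    return np.sum(P * C)

# mixture 1: weights a, components (mean, std); mixture 2: weights b
a, comps1 = np.array([0.5, 0.5]), [(-2.0, 1.0), (2.0, 1.0)]
b, comps2 = np.array([0.3, 0.7]), [(-1.0, 1.5), (1.0, 1.0)]

C = np.array([[kl_gauss(*c1, *c2) for c2 in comps2] for c1 in comps1])
print(sinkhorn_cost(a, b, C))   # relaxed OT cost between the two mixtures
```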
Quantifying the probable approximation error of probabilistic inference programs
This paper introduces a new technique for quantifying the approximation error
of a broad class of probabilistic inference programs, including ones based on
both variational and Monte Carlo approaches. The key idea is to derive a
subjective bound on the symmetrized KL divergence between the distribution
achieved by an approximate inference program and its true target distribution.
The bound's validity (and subjectivity) rests on the accuracy of two auxiliary
probabilistic programs: (i) a "reference" inference program that defines a gold
standard of accuracy and (ii) a "meta-inference" program that answers the
question "what internal random choices did the original approximate inference
program probably make given that it produced a particular result?" The paper
includes empirical results on inference problems drawn from linear regression,
Dirichlet process mixture modeling, HMMs, and Bayesian networks. The
experiments show that the technique is robust to the quality of the reference
inference program and that it can detect implementation bugs that are not
apparent from predictive performance.
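The quantity being bounded is the symmetrized KL divergence between the output distributions of the approximate and reference programs. The sketch below estimates it by plain Monte Carlo in the easy case where both densities are tractable; the paper's technique exists precisely for the case where they are not and meta-inference supplies the missing log-density terms:

```python
import numpy as np
from scipy.stats import norm

approx    = norm(0.1, 1.2)   # stand-in for the approximate program's output
reference = norm(0.0, 1.0)   # stand-in for the gold-standard reference

rng = np.random.default_rng(5)
xa = approx.rvs(100_000, random_state=rng)
xr = reference.rvs(100_000, random_state=rng)

sym_kl = (np.mean(approx.logpdf(xa) - reference.logpdf(xa))
          + np.mean(reference.logpdf(xr) - approx.logpdf(xr)))
print(sym_kl)   # Monte Carlo estimate of the symmetrized KL divergence
```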
Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities
A measure of distance between two clusterings has important applications,
including clustering validation and ensemble clustering. Generally, such
a distance measure provides navigation through the space of possible clusterings.
Most often used in cluster validation, a normalized clustering distance, a.k.a. an agreement measure, compares a given clustering result against the ground-truth
clustering. Clustering agreement measures are often classified into two
families of pair-counting and information theoretic measures, with the
widely-used representatives of Adjusted Rand Index (ARI) and Normalized Mutual
Information (NMI), respectively. This paper sheds light on the relation between
these two families through a generalization. It further presents an alternative algebraic formulation for these agreement measures that incorporates an intuitive clustering distance, defined via the analogy between cluster overlaps and co-memberships of nodes in clusters. Unlike the original
measures, it is easily extendable to different cases, including overlapping
clusters and clusters of inter-related data for complex networks. These two
extensions are, in particular, important in the context of finding clusters in
social and information networks, a.k.a. communities.
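For concreteness, the two stock representatives can be computed with scikit-learn on a pair of toy (non-overlapping) clusterings; the paper's algebraic formulation is what extends such measures to the overlapping case, which these implementations do not handle:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

truth  = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # ground-truth clustering
result = [0, 0, 1, 1, 1, 1, 2, 2, 0]   # clustering under evaluation

print("ARI:", adjusted_rand_score(truth, result))          # pair-counting family
print("NMI:", normalized_mutual_info_score(truth, result)) # information-theoretic family
```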
On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds
We study the Voronoi diagrams of a finite set of Cauchy distributions and
their dual complexes from the viewpoint of information geometry by considering
the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square
divergence, and a flat divergence derived from Tsallis' quadratic entropy
related to the conformal flattening of the Fisher-Rao curved geometry. We prove
that the Voronoi diagrams of the Fisher-Rao distance, the chi square
divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic
Voronoi diagram on the corresponding Cauchy location-scale parameters, and that
the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the
Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to
the dual forward/reverse flat divergences amount to dual Bregman Voronoi
diagrams, and their dual complexes are regular triangulations. The primal
Bregman-Tsallis Voronoi diagram corresponds to the hyperbolic Voronoi diagram
and the dual Bregman-Tsallis Voronoi diagram coincides with the ordinary
Euclidean Voronoi diagram. Moreover, we prove that the square root of the
Kullback-Leibler divergence between Cauchy distributions yields a metric
distance which is Hilbertian for the Cauchy scale families.
Comment: 34 pages, 13 figures
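A minimal numerical companion, using the known closed form KL(p1:p2) = log(((l1-l2)^2 + (s1+s2)^2)/(4 s1 s2)) for Cauchy location-scale parameters (l, s) (notably a symmetric divergence in this family), and spot-checking the triangle inequality for sqrt(KL) on a scale family:

```python
import numpy as np

def kl_cauchy(l1, s1, l2, s2):
    # closed-form KL between Cauchy(l1, s1) and Cauchy(l2, s2); symmetric
    return np.log(((l1 - l2)**2 + (s1 + s2)**2) / (4.0 * s1 * s2))

rng = np.random.default_rng(3)
for _ in range(10_000):
    s1, s2, s3 = rng.uniform(0.1, 10.0, size=3)   # scale family: location 0
    d12 = np.sqrt(kl_cauchy(0, s1, 0, s2))
    d23 = np.sqrt(kl_cauchy(0, s2, 0, s3))
    d13 = np.sqrt(kl_cauchy(0, s1, 0, s3))
    assert d13 <= d12 + d23 + 1e-9
print("no triangle-inequality violations found")
```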
On divergences tests for composite hypotheses under composite likelihood
It is well known that in some situations it is not easy to compute the likelihood function, as the dataset might be large or the model too complex. In such contexts the composite likelihood, derived by multiplying the likelihoods of subsets of the variables, may be useful. The classical likelihood ratio test statistic is commonly extended to the composite likelihood framework as the procedure for testing in this setting. In this paper we introduce and study a new family of test statistics for composite likelihood: composite {\phi}-divergence test statistics for testing a simple or a composite null hypothesis. To do that we introduce and study the asymptotic
distribution of the restricted maximum composite likelihood estimate
- …
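As a concrete example of a composite likelihood, the sketch below forms the pairwise composite log-likelihood of a trivariate normal (multiplying all bivariate marginal likelihoods) and computes a classical composite likelihood ratio statistic; the paper's contribution replaces this statistic with phi-divergence-based analogues:

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

def pairwise_cl(X, mu, Sigma):
    # composite log-likelihood: sum of all bivariate-marginal log-likelihoods
    ll = 0.0
    for i, j in combinations(range(X.shape[1]), 2):
        sub = np.ix_([i, j], [i, j])
        ll += multivariate_normal(mu[[i, j]], Sigma[sub]).logpdf(X[:, [i, j]]).sum()
    return ll

Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
rng = np.random.default_rng(4)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=500)

# composite likelihood ratio statistic for H0: mu = 0 (known Sigma)
lr = 2 * (pairwise_cl(X, X.mean(axis=0), Sigma) - pairwise_cl(X, np.zeros(3), Sigma))
print(lr)
```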