905 research outputs found

    On the Chi square and higher-order Chi distances for approximating f-divergences

    Full text link
    We report closed-form formulas for calculating the Chi square and higher-order Chi distances between statistical distributions belonging to the same exponential family with affine natural space, and instantiate those formulas for the Poisson and isotropic Gaussian families. We then describe an analytic formula for the f-divergences based on Taylor expansions and relying on an extended class of Chi-type distances. Comment: 11 pages, two tables, no figure. Java(TM) code available online at http://www.informationgeometry.org/fDivergence
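    As a rough illustration of the kind of closed form described above, the sketch below computes a Pearson-type chi square divergence between two Poisson distributions using the exponential-family expression exp(F(2θ_p − θ_q) − 2F(θ_p) + F(θ_q)) − 1 with log-normalizer F(θ) = exp(θ), and checks it against a direct truncated summation. The divergence convention D(p:q) = Σ_x (p(x) − q(x))²/q(x) and the function names are this sketch's assumptions, not the paper's notation.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def chi2_poisson_closed_form(lam_p, lam_q):
    # Exponential-family closed form with natural parameter theta = log(lambda)
    # and log-normalizer F(theta) = exp(theta):
    #   sum_x p^2/q - 1 = exp(F(2*theta_p - theta_q) - 2*F(theta_p) + F(theta_q)) - 1
    # which simplifies to exp((lam_p - lam_q)^2 / lam_q) - 1 for Poisson.
    return math.exp((lam_p - lam_q) ** 2 / lam_q) - 1.0

def chi2_poisson_direct(lam_p, lam_q, kmax=100):
    # Direct truncated summation of sum_x (p(x) - q(x))^2 / q(x)
    total = 0.0
    for k in range(kmax):
        p, q = poisson_pmf(k, lam_p), poisson_pmf(k, lam_q)
        total += (p - q) ** 2 / q
    return total

print(chi2_poisson_closed_form(2.0, 3.0))  # ~0.3956
print(chi2_poisson_direct(2.0, 3.0))       # should agree to high precision
```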

    On power chi expansions of f-divergences

    Full text link
    We consider both finite and infinite power chi expansions of f-divergences derived from Taylor expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of f-divergences. Comment: 21 pages
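    The expansion mechanism can be sketched as follows (a standard derivation, written here under the assumption of a generator f that is analytic at 1 with f(1) = 0; the chi-type distance notation χ_k is this sketch's, not necessarily the paper's):

```latex
I_f(p:q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
         = \sum_{k\ge 2} \frac{f^{(k)}(1)}{k!}\, \chi_k(p:q),
\qquad
\chi_k(p:q) = \int \frac{\bigl(p(x)-q(x)\bigr)^k}{q(x)^{k-1}}\, dx,
```

    where the first-order term vanishes because ∫(p − q) dx = 0. Truncating the series gives bounded approximations, while summing it (when it converges) gives an analytic series expression of the f-divergence.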

    Multi-model inference through projections in model space

    Full text link
    Information criteria have had a profound impact on modern ecological science. They allow researchers to estimate which probabilistic approximating models are closest to the generating process. Unfortunately, information criterion comparison does not tell how good the best model is. Nor do practitioners routinely test the reliability (e.g. error rates) of information criterion-based model selection. In this work, we show that these two shortcomings can be resolved by extending a key observation from Hirotugu Akaike's original work. Standard information criterion analysis considers only the divergences of each model from the generating process; it ignores the fact that there are also estimable divergence relationships amongst all of the approximating models. We then show that, using both sets of divergences, a model space can be constructed that includes an estimated location for the generating process. Thus, not only can an analyst determine which model is closest to the generating process, she/he can also determine how close to the generating process the best approximating model is. Properties of the generating process estimated from these projections are more accurate than those estimated by model averaging. The applications of our findings extend to all areas of science where model selection through information criteria is done. Comment: 31 pages, 8 figures. Submitted to JRSS
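    At a very coarse level, the geometric idea of placing candidate models and an estimated generating process in a common space from a matrix of pairwise divergence estimates resembles classical multidimensional scaling. The toy sketch below embeds a made-up divergence matrix in the plane; it only illustrates that geometric idea and is not the authors' estimation procedure.

```python
import numpy as np

def classical_mds(D, dim=2):
    # Embed points from a symmetric matrix D of pairwise distances via
    # classical multidimensional scaling (double centering + eigendecomposition).
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Hypothetical matrix of estimated divergences among three candidate models,
# with the (estimated) generating process in the last row/column.
D = np.array([[0.0, 1.2, 2.0, 0.8],
              [1.2, 0.0, 1.5, 1.0],
              [2.0, 1.5, 0.0, 1.9],
              [0.8, 1.0, 1.9, 0.0]])
coords = classical_mds(D)
print(coords)  # 2-D locations; the last row is the generating process's position
```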

    Alpha-Beta Divergence For Variational Inference

    Full text link
    This paper introduces a variational approximation framework using direct optimization of what is known as the scale invariant Alpha-Beta divergence (sAB divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi, or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimized directly by repurposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.
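    For orientation, here is a minimal sketch of the plain (non-scale-invariant) Alpha-Beta divergence between two discrete distributions, following the commonly cited Cichocki-Cruces-Amari form for α, β, α+β ≠ 0. The scale-invariant sAB variant used in the paper modifies this objective, so the exact sAB formula should be taken from the paper itself.

```python
import numpy as np

def ab_divergence(p, q, alpha, beta):
    # Plain Alpha-Beta divergence (assumes alpha != 0, beta != 0, alpha + beta != 0).
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = alpha + beta
    term = p**alpha * q**beta - (alpha / s) * p**s - (beta / s) * q**s
    return -term.sum() / (alpha * beta)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
print(ab_divergence(p, q, alpha=1.0, beta=0.5))   # >= 0
print(ab_divergence(p, p, alpha=1.0, beta=0.5))   # ~0 when the arguments coincide
```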

    f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

    Full text link
    Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows such models to be trained through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models. Comment: 17 pages
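    The variational divergence estimation bound referred to above, and the resulting saddle-point training objective, can be written as follows, where f* denotes the convex conjugate of the generator f, T ranges over a class of critic functions, Q_θ is the generative neural sampler, and T_ω the auxiliary discriminative network:

```latex
D_f(P \,\|\, Q) \;\ge\; \sup_{T} \Bigl\{ \mathbb{E}_{x\sim P}\bigl[T(x)\bigr]
    - \mathbb{E}_{x\sim Q}\bigl[f^{*}(T(x))\bigr] \Bigr\},
\qquad
\min_{\theta}\;\max_{\omega}\;
\mathbb{E}_{x\sim P}\bigl[T_{\omega}(x)\bigr]
    - \mathbb{E}_{x\sim Q_{\theta}}\bigl[f^{*}(T_{\omega}(x))\bigr].
```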

    On The Chain Rule Optimal Transport Distance

    Full text link
    We define a novel class of distances between statistical multivariate distributions by solving an optimal transportation problem on their marginal densities with respect to a ground distance defined on their conditional densities. By using the chain rule factorization of probabilities, we show how to perform optimal transport on a ground space being an information-geometric manifold of conditional probabilities. We prove that this new distance is a metric whenever the chosen ground distance is a metric. Our distance generalizes both the Wasserstein distances between point sets and a recently introduced metric distance between statistical mixtures. As a first application of this Chain Rule Optimal Transport (CROT) distance, we show that the ground distance between statistical mixtures is upper bounded by this optimal transport distance and its fast relaxed Sinkhorn distance, whenever the ground distance is jointly convex. We report on experiments which quantify the tightness of the CROT distance for the total variation distance, the square root generalization of the Jensen-Shannon divergence, the Wasserstein W_p metric, and the Rényi divergence between mixtures. Comment: 23 pages
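    A rough numerical sketch of the relaxed (Sinkhorn) variant for two univariate Gaussian mixtures is given below. The ground cost is taken here to be the closed-form KL divergence between Gaussian components; the mixture parameters, regularization level, and function names are this sketch's assumptions rather than the paper's experimental setup.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    # KL(N(m1, s1^2) || N(m2, s2^2)), closed form
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def sinkhorn(a, b, M, eps=0.1, iters=500):
    # Entropic-regularized optimal transport between component weight vectors a, b
    K = np.exp(-M / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan over component pairs
    return (P * M).sum()

# Two Gaussian mixtures: component weights and (mean, std) parameters
a = np.array([0.5, 0.5]);  comps_a = [(0.0, 1.0), (4.0, 0.5)]
b = np.array([0.3, 0.7]);  comps_b = [(0.5, 1.2), (5.0, 1.0)]
M = np.array([[kl_gauss(*ca, *cb) for cb in comps_b] for ca in comps_a])
print("relaxed CROT-style cost:", sinkhorn(a, b, M))
```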

    Quantifying the probable approximation error of probabilistic inference programs

    Full text link
    This paper introduces a new technique for quantifying the approximation error of a broad class of probabilistic inference programs, including ones based on both variational and Monte Carlo approaches. The key idea is to derive a subjective bound on the symmetrized KL divergence between the distribution achieved by an approximate inference program and its true target distribution. The bound's validity (and subjectivity) rests on the accuracy of two auxiliary probabilistic programs: (i) a "reference" inference program that defines a gold standard of accuracy and (ii) a "meta-inference" program that answers the question "what internal random choices did the original approximate inference program probably make given that it produced a particular result?" The paper includes empirical results on inference problems drawn from linear regression, Dirichlet process mixture modeling, HMMs, and Bayesian networks. The experiments show that the technique is robust to the quality of the reference inference program and that it can detect implementation bugs that are not apparent from predictive performance.
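    The quantity being bounded is the symmetrized KL divergence between the target distribution P and the distribution Q achieved by the approximate inference program,

```latex
D_{\mathrm{sym}}(P, Q) \;=\; \mathrm{KL}(P \,\|\, Q) + \mathrm{KL}(Q \,\|\, P),
```

    and, as described in the abstract, it is the reference and meta-inference programs that make an estimated upper bound on this quantity computable.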

    Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities

    Full text link
    A measure of distance between two clusterings has important applications, including clustering validation and ensemble clustering. Generally, such a distance measure provides navigation through the space of possible clusterings. Mostly used in cluster validation, a normalized clustering distance, a.k.a. agreement measure, compares a given clustering result against the ground-truth clustering. Clustering agreement measures are often classified into two families of pair-counting and information theoretic measures, with the widely used representatives of Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), respectively. This paper sheds light on the relation between these two families through a generalization. It further presents an alternative algebraic formulation for these agreement measures which incorporates an intuitive clustering distance, defined based on the analogy between cluster overlaps and co-memberships of nodes in clusters. Unlike the original measures, it is easily extendable to different cases, including overlapping clusters and clusters of inter-related data for complex networks. These two extensions are, in particular, important in the context of finding clusters in social and information networks, a.k.a. communities.
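    For the standard, non-overlapping case mentioned above, the two representative measures can be computed directly; this is a usage sketch assuming scikit-learn and does not reproduce the paper's generalized or overlapping variants.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Two hard clusterings of the same six items, encoded as label vectors
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

print("ARI:", adjusted_rand_score(labels_true, labels_pred))            # pair-counting family
print("NMI:", normalized_mutual_info_score(labels_true, labels_pred))   # information-theoretic family
```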

    On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds

    Full text link
    We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square divergence, and a flat divergence derived from Tsallis' quadratic entropy related to the conformal flattening of the Fisher-Rao curved geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual forward/reverse flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman-Tsallis Voronoi diagram corresponds to the hyperbolic Voronoi diagram and the dual Bregman-Tsallis Voronoi diagram coincides with the ordinary Euclidean Voronoi diagram. Moreover, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families. Comment: 34 pages, 13 figures
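    For reference, the closed form commonly reported in the related literature for the Kullback-Leibler divergence between two Cauchy distributions with location-scale parameters (l_1, s_1) and (l_2, s_2) is stated below (taken here as an assumption; note that it is symmetric in its arguments, and the abstract's metric claim concerns its square root):

```latex
\mathrm{KL}\bigl(p_{l_1,s_1} : p_{l_2,s_2}\bigr)
  \;=\; \log \frac{(s_1+s_2)^2 + (l_1-l_2)^2}{4\, s_1 s_2}.
```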

    On divergence tests for composite hypotheses under composite likelihood

    Full text link
    It is well known that in some situations it is not easy to compute the likelihood function, as the dataset might be large or the model too complex. In such contexts the composite likelihood, obtained by multiplying the likelihoods of subsets of the variables, may be useful. The classical likelihood ratio test statistic has been extended to the framework of composite likelihoods as a procedure for testing in this setting. In this paper we introduce and study a new family of test statistics for composite likelihood: composite φ-divergence test statistics for testing a simple null hypothesis or a composite null hypothesis. To do so, we introduce and study the asymptotic distribution of the restricted maximum composite likelihood estimator.
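    To fix notation, a standard form of the composite log-likelihood and of the composite likelihood ratio statistic that the paper's φ-divergence statistics generalize can be sketched as follows, with weights w_k, marginal or conditional blocks A_k, unrestricted estimator θ̂ and restricted estimator θ̃ (the notation is this sketch's assumption):

```latex
c\ell(\theta; x) \;=\; \sum_{k=1}^{K} w_k \,\log f_{A_k}\!\bigl(x_{A_k};\, \theta\bigr),
\qquad
\lambda \;=\; 2\,\bigl[\, c\ell(\hat{\theta}; x) - c\ell(\tilde{\theta}; x) \,\bigr].
```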