Dual Connections in Nonparametric Classical Information Geometry
We construct an infinite-dimensional information manifold based on
exponential Orlicz spaces without using the notion of exponential convergence.
We then show that convex mixtures of probability densities lie on the same
connected component of this manifold, and characterize the class of densities
for which this mixture can be extended to an open segment containing the
extreme points. For this class, we define an infinite-dimensional analogue of
the mixture parallel transport and prove that it is dual to the exponential
parallel transport with respect to the Fisher information. We also define
$\alpha$-derivatives and prove that they are convex mixtures of the extremal
$(\pm 1)$-derivatives.
Quantum Statistical Manifolds
Quantum information geometry studies families of quantum states by means of
differential geometry. A new approach is followed, with the intention of
facilitating the introduction of a more general theory in subsequent work. To
this purpose, the emphasis is shifted from a manifold of strictly positive
density matrices to a manifold of faithful quantum states on the C*-algebra of
bounded linear operators. In addition, ideas from the parameter-free approach
to information geometry are adopted. The underlying Hilbert space is assumed to
be finite-dimensional. In this way technicalities are avoided so that strong
results are obtained, which one can hope to prove later on in a more general
context. Two different atlases are introduced, one in which it is
straightforward to show that the quantum states form a Banach manifold, the
other which is compatible with the inner product of Bogoliubov and which yields
affine coordinates for the exponential connection.
Comment: submitted to the proceedings of Entropy 201
A Class of Non-Parametric Statistical Manifolds modelled on Sobolev Space
We construct a family of non-parametric (infinite-dimensional) manifolds of finite measures on $\mathbb{R}^d$. The manifolds are modelled on a variety of weighted Sobolev spaces, including Hilbert-Sobolev spaces and mixed-norm spaces. Each supports the Fisher-Rao metric as a weak Riemannian metric. Densities are expressed in terms of a deformed exponential function having linear growth. Unusually for the Sobolev context, and as a consequence of its linear growth, this "lifts" to a nonlinear superposition (Nemytskii) operator that acts continuously on a particular class of mixed-norm model spaces, and on the fixed-norm space $W^{2,1}$, i.e. it maps each of these spaces continuously into itself. It also maps continuously between other fixed-norm spaces, with a loss of Lebesgue exponent that increases with the number of derivatives. Some of the results make essential use of a log-Sobolev embedding theorem. Each manifold contains a smoothly embedded submanifold of probability measures. Applications to the stochastic partial differential equations of nonlinear filtering (and hence to the Fokker-Planck equation) are outlined.
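The abstract does not specify the deformed exponential it uses; as a rough illustrative sketch (an assumption, not the paper's construction), one function of the required type, with linear growth on the right, is $\exp(x)$ for $x \le 0$ and $1 + x$ for $x \ge 0$:

```python
import math

def deformed_exp(x: float) -> float:
    """A C^1 deformed exponential with linear growth.

    Illustrative choice only (the paper's exact function is not given in
    the abstract): exp(x) for x <= 0, 1 + x for x >= 0.  Both the value
    (1) and the first derivative (1) agree at x = 0, and the function is
    globally Lipschitz, unlike the ordinary exponential.
    """
    return math.exp(x) if x <= 0.0 else 1.0 + x
```

Linear growth is the point: composition with a globally Lipschitz function cannot inflate norms uncontrollably, which is what lets the associated superposition operator map Sobolev-type spaces continuously into themselves.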
Information geometric measurements of generalisation
Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; using information divergence guarantees invariance with respect to representation. The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1) the ideal optimal estimate is always given by the average over the posterior; (2) the optimal estimate within a computational model is given by the projection of the ideal estimate onto the model. This incidentally shows that some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the $\delta$ dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces $L_\delta$ and $L_{\delta'}$. It therefore offers a conceptual simplification of information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: the prior $P(p)$, describing the environment of all the problems; the divergence $D_\delta$, specifying the requirement of the task; and the model $Q$, specifying the available computing resources
Manifolds of Differentiable Densities
We develop a family of infinite-dimensional (non-parametric) manifolds of probability measures. The latter are defined on underlying Banach spaces, and have densities of class $C^k_b$ with respect to appropriate reference measures. The case $k = \infty$, in which the manifolds are modelled on Fréchet spaces, is included. The manifolds admit the Fisher-Rao metric and, unusually for the non-parametric setting, Amari's $\alpha$-covariant derivatives for all $\alpha \in \mathbb{R}$. By construction, they are $C^\infty$-embedded submanifolds of particular manifolds of finite measures. The statistical manifolds are dually ($\alpha = \pm 1$) flat, and admit mixture and exponential representations as charts. Their curvatures with respect to the $\alpha$-covariant derivatives are derived. The likelihood function associated with a finite sample is a continuous function on each of the manifolds, and the $\alpha$-divergences are of class $C^\infty$
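For readers unfamiliar with the $\alpha$-divergences appearing in the abstracts above, here is a minimal numerical sketch for discrete densities, using one standard textbook convention (normalisations vary across the literature, and the papers' may differ):

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence between discrete probability vectors.

    One common convention:
        D_alpha(p||q) = 4/(1 - alpha^2) * (1 - sum p^((1-a)/2) q^((1+a)/2)),
    with Kullback-Leibler divergences as the limits at alpha = +/-1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if alpha == -1.0:
        return float(np.sum(p * np.log(p / q)))   # KL(p||q)
    if alpha == 1.0:
        return float(np.sum(q * np.log(q / p)))   # KL(q||p)
    c = 4.0 / (1.0 - alpha ** 2)
    return float(c * (1.0 - np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))))
```

At $\alpha = 0$ the divergence is symmetric (proportional to the squared Hellinger distance), while $\alpha = \pm 1$ recover the two orientations of the Kullback-Leibler divergence.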
Long-range interactions, doubling measures and Tsallis entropy
We present a path toward determining the statistical origin of the
thermodynamic limit for systems with long-range interactions. We assume
throughout that the systems under consideration have thermodynamic properties
given by the Tsallis entropy. We rely on the composition property of the
Tsallis entropy for determining effective metrics and measures on their
configuration/phase spaces. We point out the significance of Muckenhoupt
weights, of doubling measures, and of doubling-measure-induced deformations of
the metric. We comment on the volume deformations induced by
the Tsallis entropy composition and on the significance of functional spaces
for these constructions.
Comment: 26 pages, No figures, Standard LaTeX. Revised version: addition of a paragraph on a contentious issue (Sect. 3). To be published by Eur. Phys. J.
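The composition property of the Tsallis entropy that the abstract relies on can be checked numerically. A small sketch using the standard definitions (illustrative code, not from the paper):

```python
import numpy as np

def tsallis_entropy(p, q_idx):
    """Tsallis entropy S_q(p) = (1 - sum p_i^q) / (q - 1); Shannon at q = 1."""
    p = np.asarray(p, dtype=float)
    if q_idx == 1.0:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q_idx)) / (q_idx - 1.0))

# Pseudo-additivity (the composition property) on an independent product:
#   S_q(A x B) = S_q(A) + S_q(B) + (1 - q) * S_q(A) * S_q(B)
p_a = np.array([0.2, 0.8])
p_b = np.array([0.5, 0.3, 0.2])
q = 1.7
joint = np.outer(p_a, p_b).ravel()          # independent joint distribution
lhs = tsallis_entropy(joint, q)
rhs = (tsallis_entropy(p_a, q) + tsallis_entropy(p_b, q)
       + (1.0 - q) * tsallis_entropy(p_a, q) * tsallis_entropy(p_b, q))
```

The $(1 - q)$ cross term is the non-additive correction; at $q = 1$ it vanishes and ordinary Shannon additivity for independent systems is recovered.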
Entropies from coarse-graining: convex polytopes vs. ellipsoids
We examine the Boltzmann/Gibbs/Shannon entropy $S_{BGS}$, the non-additive
Havrda-Charv\'{a}t/Dar\'{o}czy/Cressie-Read/Tsallis entropy $S_q$, and the
Kaniadakis $\kappa$-entropy $S_\kappa$
from the viewpoint of coarse-graining, symplectic capacities and convexity. We
argue that the functional form of such entropies can be ascribed to a
discordance in phase-space coarse-graining between two generally different
approaches: the Euclidean/Riemannian metric one that reflects independence and
picks cubes as the fundamental cells and the symplectic/canonical one that
picks spheres/ellipsoids for this role. Our discussion is motivated by and
confined to the behaviour of Hamiltonian systems of many degrees of freedom. We
see that Dvoretzky's theorem provides asymptotic estimates for the minimal
dimension beyond which these two approaches are close to each other. We state
and speculate about the role that dualities may play in this viewpoint.
Comment: 63 pages. No figures. Standard LaTeX
A Geometric Variational Approach to Bayesian Inference
We propose a novel Riemannian geometric framework for variational inference
in Bayesian models based on the nonparametric Fisher-Rao metric on the manifold
of probability density functions. Under the square-root density representation,
the manifold can be identified with the positive orthant of the unit
hypersphere in L2, and the Fisher-Rao metric reduces to the standard L2 metric.
Exploiting such a Riemannian structure, we formulate the task of approximating
the posterior distribution as a variational problem on the hypersphere based on
the alpha-divergence. This provides a tighter lower bound on the marginal
distribution than approaches based on the Kullback-Leibler divergence, as well
as a corresponding upper bound that such approaches cannot provide. We propose
a novel
gradient-based algorithm for the variational problem based on Frechet
derivative operators motivated by the geometry of the Hilbert sphere, and
examine its properties. Through simulations and real-data applications, we
demonstrate the utility of the proposed geometric framework and algorithm on
several Bayesian models
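The square-root identification described above makes the Fisher-Rao geodesic distance computable in closed form: for discrete densities it is the great-circle distance on the unit sphere (the Bhattacharyya angle, up to the conventional factor of 2). A minimal sketch:

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between discrete probability vectors.

    Under the square-root map p -> sqrt(p), densities sit on the unit
    sphere in L2, and the Fisher-Rao distance becomes the great-circle
    distance 2 * arccos(<sqrt(p), sqrt(q)>).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Bhattacharyya coefficient; clip guards against round-off outside [-1, 1]
    affinity = np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0)
    return float(2.0 * np.arccos(affinity))
```

Identical densities are at distance 0, while densities with disjoint supports sit at the maximal distance $\pi$ on the positive orthant of the sphere.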
Quasi-Arithmetic Mixtures, Divergence Minimization, and Bregman Information
Markov Chain Monte Carlo methods for sampling from complex distributions and
estimating normalization constants often simulate samples from a sequence of
intermediate distributions along an annealing path, which bridges between a
tractable initial distribution and a target density of interest. Prior work has
constructed annealing paths using quasi-arithmetic means, and interpreted the
resulting intermediate densities as minimizing an expected divergence to the
endpoints. We provide a comprehensive analysis of this 'centroid' property
using Bregman divergences under a monotonic embedding of the density function,
thereby associating common divergences such as Amari's and R\'enyi's
$\alpha$-divergences, $f$-divergences, and the Jensen-Shannon divergence with
intermediate densities along an annealing path. Our analysis
highlights the interplay between parametric families, quasi-arithmetic means,
and divergence functions using the rho-tau Bregman divergence framework of
Zhang (2004, 2013).
Comment: 19 pages + appendix (rewritten + changed title in revision)
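A quasi-arithmetic annealing path of the kind discussed above can be sketched for discrete densities (illustrative code, not the authors'; the monotone embedding f and its inverse f_inv are the user's choice):

```python
import numpy as np

def qa_path(p0, p1, beta, f, f_inv):
    """Quasi-arithmetic mean annealing path between two discrete densities.

    The intermediate density is proportional to
        f_inv((1 - beta) * f(p0) + beta * f(p1)),
    renormalised.  With f = log this reduces to the familiar
    geometric-mean path between the endpoints.
    """
    mix = f_inv((1.0 - beta) * f(p0) + beta * f(p1))
    return mix / mix.sum()

p0 = np.array([0.7, 0.2, 0.1])   # tractable initial distribution
p1 = np.array([0.1, 0.3, 0.6])   # target distribution
geo = qa_path(p0, p1, 0.5, np.log, np.exp)   # geometric-mean midpoint
```

At $\beta = 0$ and $\beta = 1$ the path recovers the endpoints; intermediate values of $\beta$ give the "centroid" densities that the abstract interprets as minimisers of an expected divergence to the endpoints.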