Hamilton-Jacobi Theory and Information Geometry
Recently, a method to dynamically define a divergence function for a given statistical manifold by means of the Hamilton-Jacobi theory associated with a suitable Lagrangian function has been proposed. Here we review this construction and lay the basis for an inverse problem, where we assume the divergence function to be known and look for a Lagrangian function for which it is a complete solution of the associated Hamilton-Jacobi theory. To apply these ideas to quantum systems, we have to replace probability distributions with probability amplitudes.
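As a concrete classical instance of the divergence-from-action idea (a textbook Hamilton-Jacobi fact, not the paper's general construction): for the free Lagrangian L = ½‖q̇‖², the Hamilton principal function solves the Hamilton-Jacobi equation and, evaluated at unit time, yields the squared Euclidean distance, which is the canonical divergence of flat geometry.

```latex
% Free-particle Hamilton-Jacobi equation and its principal function;
% at t = 1 the action reproduces the squared-distance divergence.
\[
  \frac{\partial S}{\partial t} + \tfrac{1}{2}\,\lVert \nabla_q S \rVert^{2} = 0,
  \qquad
  S(q, q_0; t) = \frac{\lVert q - q_0 \rVert^{2}}{2t},
  \qquad
  D(q, q_0) := S(q, q_0; 1) = \tfrac{1}{2}\lVert q - q_0 \rVert^{2}.
\]
```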
A Formalization of The Natural Gradient Method for General Similarity Measures
In optimization, the natural gradient method is well-known for likelihood
maximization. The method uses the Kullback-Leibler divergence, corresponding
infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter
space of a family of probability distributions. This way, gradients with
respect to the parameters respect the Fisher-Rao geometry of the space of
distributions, which might differ vastly from the standard Euclidean geometry
of the parameter space, often leading to faster convergence. However, when
minimizing an arbitrary similarity measure between distributions, it is
generally unclear which metric to use. We provide a general framework that,
given a similarity measure, derives a metric for the natural gradient. We then
discuss connections between the natural gradient method and multiple other
optimization techniques in the literature. Finally, we provide computations of
the formal natural gradient to show overlap with well-known cases and to
compute natural gradients in novel frameworks.
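To make the update concrete, here is a minimal sketch (not code from the paper) of a natural-gradient step for a univariate Gaussian model, whose Fisher matrix diag(1/σ², 2/σ²) in (μ, σ) coordinates is a standard closed form; the function names and learning rate are illustrative.

```python
import numpy as np

def nll_grad(mu, sigma, x):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    d_mu = -(x - mu).mean() / sigma**2
    d_sigma = (1.0 / sigma) - ((x - mu) ** 2).mean() / sigma**3
    return np.array([d_mu, d_sigma])

def natural_gradient_step(mu, sigma, x, lr=0.1):
    g = nll_grad(mu, sigma, x)
    fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])  # Fisher-Rao metric in (mu, sigma)
    nat_g = np.linalg.solve(fisher, g)  # F^{-1} grad: steepest descent in the pulled-back geometry
    return np.array([mu, sigma]) - lr * nat_g

rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=1000)
theta = np.array([0.0, 1.0])
for _ in range(100):
    theta = natural_gradient_step(theta[0], theta[1], x)
print(theta)  # approaches (3, 2)
```

Because the inverse Fisher matrix rescales the σ direction by σ²/2, steps automatically adapt to the curvature of the statistical model, which is the source of the faster convergence mentioned above.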
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Incremental learning (IL) has received a lot of attention recently, however,
the literature lacks a precise problem definition, proper evaluation settings,
and metrics tailored specifically for the IL problem. One of the main
objectives of this work is to fill these gaps so as to provide a common ground
for better understanding of IL. The main challenge for an IL algorithm is to
update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue when preserving knowledge, IL also suffers from a problem we call intransigence: the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and
intransigence that allow us to understand, analyse, and gain better insights
into the behaviour of IL algorithms. We present RWalk, a generalization of
EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral
[Zenke2017Continual] with a theoretically grounded KL-divergence based
perspective. We provide a thorough analysis of various IL algorithms on MNIST
and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in
terms of accuracy and also provides a better trade-off between forgetting and intransigence.
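A minimal sketch of the two metrics, following their verbal definitions above (the array names acc and ref are hypothetical): forgetting is the drop from the best past accuracy on an old task, and intransigence is the gap to a reference model trained jointly on all tasks seen so far.

```python
import numpy as np

# acc[k, j]: test accuracy on task j after training through task k (k >= j).
# ref[k]: accuracy of a reference model trained jointly on tasks 0..k.

def forgetting(acc, k):
    """Average forgetting after task k (k >= 1): drop from best past accuracy."""
    drops = [acc[:k, j].max() - acc[k, j] for j in range(k)]
    return float(np.mean(drops))

def intransigence(acc, ref, k):
    """Gap between the joint reference model and the IL model on new task k."""
    return float(ref[k] - acc[k, k])
```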
The Bregman chord divergence
Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing. However, selecting the most appropriate distance for a given task is an endeavor in its own right. Instead of testing the entries of an ever-expanding dictionary of ad hoc distances one by one, one would rather consider parametric classes of distances that are exhaustively characterized by axioms derived from first principles. Bregman divergences are such a class. However, fine-tuning a Bregman divergence is delicate, since it requires smoothly adjusting a functional generator. In this work, we propose an extension of Bregman divergences called the Bregman chord divergences. This new class of distances does not require gradient calculations, uses two scalar parameters that can be easily tailored in applications, and asymptotically generalizes Bregman divergences.
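To illustrate the gradient-free construction, here is a hedged sketch; the interpolation convention and parameter ranges are my own assumptions, not necessarily the paper's exact definition. The tangent line of the generator F at q, which requires ∇F, is replaced by a chord of F through two interpolated points; for convex F the extended chord lies below F outside the interpolation interval, so the gap at p is nonnegative.

```python
import numpy as np

def bregman_chord(F, p, q, alpha, beta):
    """Chord-based divergence sketch: 0 < alpha < beta <= 1, F convex."""
    x = lambda g: (1 - g) * p + g * q               # x(0) = p, x(1) = q
    slope = (F(x(beta)) - F(x(alpha))) / (beta - alpha)
    chord_at_p = F(x(alpha)) + (0.0 - alpha) * slope  # chord extended to g = 0
    return F(p) - chord_at_p                          # gap between F(p) and the chord

F = lambda v: float(np.dot(v, v))                     # generator F(v) = ||v||^2
p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(bregman_chord(F, p, q, 0.999, 0.9999))          # ~2.0, i.e. ||p - q||^2
```

As both scalar parameters tend to 1, the chord tends to the tangent at q, and the ordinary Bregman divergence F(p) − F(q) − ⟨∇F(q), p − q⟩ is recovered, matching the asymptotic generalization claimed above.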
Learning Adaptive Regularization for Image Labeling Using Geometric Assignment
We study the inverse problem of model parameter learning for pixelwise image
labeling, using the linear assignment flow and training data with ground truth.
This is accomplished by a Riemannian gradient flow on the manifold of
parameters that determine the regularization properties of the assignment flow.
Using the symplectic partitioned Runge-Kutta method for numerical integration, it is shown that deriving the sensitivity conditions of the parameter learning problem and discretizing it are operations that commute. A convenient property of our approach
is that learning is based on exact inference. Carefully designed experiments
demonstrate the performance of our approach, the expressiveness of the
mathematical model as well as its limitations, from the viewpoint of
statistical learning and optimal control.
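The paper's parameter manifold and flow are specific to the assignment-flow setting; as a generic illustration only, the following sketch performs a Riemannian gradient step on the probability simplex under the Fisher-Rao geometry, using the standard multiplicative (softmax) retraction. All names are illustrative, and the update stands in for, rather than reproduces, the paper's scheme.

```python
import numpy as np

def riemannian_step(p, euclidean_grad, lr=0.5):
    """One Fisher-Rao gradient-descent step that stays on the simplex."""
    q = p * np.exp(-lr * euclidean_grad)  # multiplicative update along e-geodesics
    return q / q.sum()                    # renormalize to a probability vector

p = np.full(4, 0.25)
grad = np.array([1.0, 0.0, 0.0, 0.0])     # stand-in for dE/dp from a training loss
for _ in range(20):
    p = riemannian_step(p, grad)
print(p)                                  # mass moves away from coordinate 0
```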
Warped Riemannian metrics for location-scale models
The present paper shows that warped Riemannian metrics, a class of Riemannian
metrics which play a prominent role in Riemannian geometry, are also of
fundamental importance in information geometry. Precisely, the paper features a
new theorem, which states that the Rao-Fisher information metric of any
location-scale model, defined on a Riemannian manifold, is a warped Riemannian
metric, whenever this model is invariant under the action of some Lie group.
This theorem is a valuable tool in finding the expression of the Rao-Fisher
information metric of location-scale models defined on high-dimensional
Riemannian manifolds. Indeed, a warped Riemannian metric is fully determined by
only two functions of a single variable, irrespective of the dimension of the
underlying Riemannian manifold. Starting from this theorem, several original
contributions are made. The expression of the Rao-Fisher information metric of
the Riemannian Gaussian model is provided, for the first time in the
literature. A generalised definition of the Mahalanobis distance is introduced,
which is applicable to any location-scale model defined on a Riemannian
manifold. The solution of the geodesic equation is obtained, for any Rao-Fisher
information metric defined in terms of warped Riemannian metrics. Finally,
using a mixture of analytical and numerical computations, it is shown that the parameter space of the von Mises-Fisher model of directional data, when equipped with its Rao-Fisher information metric, becomes a Hadamard manifold, a simply connected, complete Riemannian manifold of negative sectional curvature, for the dimensions considered. Hopefully, in upcoming work, this will be proved in any dimension.
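As a concrete illustration of the two-function economy (a standard fact about the univariate Gaussian location-scale model, not a result specific to the paper): the Rao-Fisher metric of N(μ, σ²) becomes a warped metric after a change of the scale variable, fully determined by a single warping function β.

```latex
% Rao-Fisher metric of N(mu, sigma^2) rewritten as a warped metric
% dr^2 + beta(r)^2 dmu^2 via the substitution r = sqrt(2) log(sigma).
\[
  ds^{2} = \frac{d\mu^{2}}{\sigma^{2}} + \frac{2\,d\sigma^{2}}{\sigma^{2}}
  \;\xrightarrow{\;r=\sqrt{2}\log\sigma\;}\;
  ds^{2} = dr^{2} + e^{-\sqrt{2}\,r}\,d\mu^{2},
  \qquad \beta(r) = e^{-r/\sqrt{2}}.
\]
```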
Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets
This paper targets the problem of image set-based face verification and
identification. Unlike the traditional single-medium (an image or video) setting, we encounter a set of heterogeneous contents containing orderless images and videos. The importance of each image is usually considered either equal or based on its independent quality assessment. How to model the relationship of
orderless images within a set remains a challenge. We address this problem by
formulating it as a Markov Decision Process (MDP) in the latent space.
Specifically, we first present a dependency-aware attention control (DAC)
network, which resorts to actor-critic reinforcement learning for sequential
attention decision of each image embedding to fully exploit the rich
correlation cues among the unordered images. Moreover, we introduce its
sample-efficient variant with off-policy experience replay to speed up the
learning process. The pose-guided representation scheme can further boost the
performance at the extremes of the pose variation.
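Setting the reinforcement-learning machinery aside, the aggregation that the attention controls can be sketched as weighted pooling of per-image embeddings; in the following toy sketch a plain scoring function stands in for the learned actor-critic policy, so this is a simplification rather than the paper's DAC network.

```python
import numpy as np

def aggregate(embeddings, score_fn):
    """embeddings: (n_images, d). Returns one d-dim set representation."""
    scores = np.array([score_fn(e) for e in embeddings])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax attention over the set
    return weights @ embeddings           # attention-weighted pooling

feats = np.random.default_rng(0).normal(size=(5, 128))
rep = aggregate(feats, score_fn=lambda e: float(np.linalg.norm(e)))
print(rep.shape)  # (128,)
```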
State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data
Precise spike coordination between the spiking activities of multiple neurons is suggested as an indication of coordinated network activity in active cell assemblies. Spike correlation analysis aims to identify such cooperative network activity by detecting excess spike synchrony in simultaneously recorded multiple neural spike sequences. Cooperative activity is expected to organize dynamically during behavior and cognition; therefore, currently available analysis techniques must be extended to enable the estimation of multiple time-varying spike interactions between neurons simultaneously. In particular, new methods must take advantage of the simultaneous observations of multiple neurons by addressing their higher-order dependencies, which cannot be revealed by pairwise analyses alone. In this paper, we develop a method for estimating time-varying spike interactions by means of a state-space analysis. Discretized parallel spike sequences are modeled as multivariate binary processes using a log-linear model that provides a well-defined measure of higher-order spike correlation in an information geometry framework. We construct a recursive Bayesian filter/smoother for the extraction of spike interaction parameters. This method can simultaneously estimate the dynamic pairwise spike interactions of multiple single neurons, thereby extending the Ising/spin-glass model analysis of multiple neural spike train data to a nonstationary analysis. Furthermore, the method can estimate dynamic higher-order spike interactions. To validate the inclusion of the higher-order terms in the model, we construct an approximation method to assess the goodness-of-fit to spike data. In addition, we formulate a test method for the presence of higher-order spike correlation even in nonstationary spike data, e.g., data from awake behaving animals. The utility of the proposed methods is tested using simulated spike data with known underlying correlation dynamics. Finally, we apply the methods to neural spike data simultaneously recorded from the motor cortex of an awake monkey and demonstrate that the higher-order spike correlation organizes dynamically in relation to a behavioral demand.
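The log-linear model referred to above can be written down generically (my notation; the paper's state-space machinery for tracking time-varying parameters is not reproduced here): for N binary spike variables, interaction parameters are indexed by subsets of neurons, and a nonzero parameter on a subset of size three or more is precisely a higher-order spike correlation.

```python
import itertools
import numpy as np

# log p(x) = sum_S theta_S * prod_{i in S} x_i - psi(theta), x in {0,1}^N,
# where psi is the log-partition function that normalizes the distribution.

def log_linear_pmf(theta, n):
    """theta: dict mapping tuples of neuron indices to interaction strengths."""
    states = np.array(list(itertools.product([0, 1], repeat=n)))
    logits = np.array([
        sum(th for S, th in theta.items() if all(x[i] for i in S))
        for x in states
    ])
    psi = np.log(np.exp(logits).sum())    # log-partition function
    return states, np.exp(logits - psi)

theta = {(0,): -1.0, (1,): -1.0, (2,): -1.0,
         (0, 1): 0.5, (0, 1, 2): 1.2}     # the triplet term is a higher-order interaction
states, p = log_linear_pmf(theta, 3)
print(p.sum())                            # 1.0
```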
Hyperbolic planforms in relation to visual edges and textures perception
We propose to use bifurcation theory and pattern formation as theoretical
probes for various hypotheses about the neural organization of the brain. This
allows us to make predictions about the kinds of patterns that should be
observed in the activity of real brains through, e.g., optical imaging, and
opens the door to the design of experiments to test these hypotheses. We study
the specific problem of the perception of visual edges and textures, and suggest that
these features may be represented at the population level in the visual cortex
as a specific second-order tensor, the structure tensor, perhaps within a
hypercolumn. We then extend the classical ring model to this case and show that
its natural framework is the non-Euclidean hyperbolic geometry. This brings in
the beautiful structure of its group of isometries and certain of its subgroups
which have a direct interpretation in terms of the organization of the neural
populations that are assumed to encode the structure tensor. By studying the
bifurcations of the solutions of the structure tensor equations, the analog of
the classical Wilson and Cowan equations, under the assumption of invariance
with respect to the action of these subgroups, we predict the appearance of
characteristic patterns. These patterns can be described by what we call
hyperbolic or H-planforms that are reminiscent of Euclidean planar waves and of
the planforms that were used in [1, 2] to account for some visual
hallucinations. If these patterns could be observed through brain imaging
techniques they would reveal the built-in or acquired invariance of the neural
organization to the action of the corresponding subgroups.
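For reference, the structure tensor mentioned above is the classical image-processing object, a smoothed outer product of image gradients; the sketch below computes it per pixel (the standard definition, not the paper's neural-population encoding of it). The resulting field of symmetric positive semi-definite 2x2 matrices is the space whose hyperbolic geometry the analysis exploits.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(image, sigma=2.0):
    """Per-pixel 2x2 structure tensor T = G_sigma * (grad I grad I^T)."""
    Iy, Ix = np.gradient(image.astype(float))   # image gradients along rows/cols
    Txx = gaussian_filter(Ix * Ix, sigma)
    Txy = gaussian_filter(Ix * Iy, sigma)
    Tyy = gaussian_filter(Iy * Iy, sigma)
    return Txx, Txy, Tyy                         # entries of the tensor field
```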