
    Hamilton-Jacobi Theory and Information Geometry

    Recently, a method to dynamically define a divergence function $D$ for a given statistical manifold $(\mathcal{M}, g, T)$ by means of the Hamilton-Jacobi theory associated with a suitable Lagrangian function $\mathfrak{L}$ on $T\mathcal{M}$ has been proposed. Here we review this construction and lay the basis for an inverse problem, where we assume the divergence function $D$ to be known and look for a Lagrangian function $\mathfrak{L}$ for which $D$ is a complete solution of the associated Hamilton-Jacobi theory. To apply these ideas to quantum systems, we have to replace probability distributions with probability amplitudes.
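    As a rough illustration of the construction (a sketch in generic Hamilton-Jacobi notation, not the paper's exact statement): the divergence is obtained as Hamilton's principal function of the chosen Lagrangian, i.e. the action along the extremal joining two points in unit time, which solves the associated Hamilton-Jacobi equation.

```latex
% Sketch: divergence as Hamilton's principal function of \mathfrak{L},
% with \gamma the extremal joining x_0 to x_1 in unit time and H the
% Legendre transform of \mathfrak{L}.
D(x_0, x_1) \;=\; S(x_0, x_1)
  \;=\; \int_0^1 \mathfrak{L}\bigl(\gamma(t), \dot{\gamma}(t)\bigr)\, dt,
\qquad
\partial_t S + H\bigl(x, \partial_x S\bigr) \;=\; 0 .
```

    For the canonical choice $\mathfrak{L} = \tfrac{1}{2} g_{ij}\dot{x}^i \dot{x}^j$, this recovers one half of the squared Riemannian distance of $(\mathcal{M}, g)$.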

    A Formalization of The Natural Gradient Method for General Similarity Measures

    In optimization, the natural gradient method is well known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which may differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks.
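    The core update is easy to state in code. Below is a minimal sketch of one natural-gradient step, assuming hypothetical callables grad_loss(theta) for the Euclidean gradient and metric(theta) for the derived metric (e.g. the Fisher information matrix in the likelihood case); the damping term is a common numerical safeguard, not part of the formal definition.

```python
import numpy as np

def natural_gradient_step(theta, grad_loss, metric, lr=0.1, damping=1e-8):
    """One natural-gradient step: precondition the Euclidean gradient by
    the inverse of the (damped) metric pulled back to parameter space."""
    G = metric(theta) + damping * np.eye(theta.size)
    direction = np.linalg.solve(G, grad_loss(theta))  # G^{-1} grad without explicit inverse
    return theta - lr * direction
```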

    Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

    Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue when preserving knowledge, IL also suffers from a problem we call intransigence, the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual], with a theoretically grounded KL-divergence-based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy and also provides a better trade-off between forgetting and intransigence.
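    For concreteness, here is a minimal sketch of the two metrics as they are usually set up, assuming acc[l, j] holds the accuracy on task j after training through task l and acc_ref[k] holds the accuracy of a reference model trained with access to all data; the paper's exact definitions may differ in detail.

```python
import numpy as np

def forgetting(acc, k, j):
    """Forgetting of task j after learning task k (k > j): drop from the
    best accuracy ever achieved on task j to the current accuracy."""
    return acc[:k, j].max() - acc[k, j]

def intransigence(acc_ref, acc, k):
    """Intransigence on task k: gap between the reference (joint-training)
    accuracy and the accuracy of the incrementally trained model."""
    return acc_ref[k] - acc[k, k]
```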

    The Bregman chord divergence

    Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing. However, selecting the most appropriate distance for a given task is a challenging endeavor. Instead of testing, one by one, the entries of an ever-expanding dictionary of ad hoc distances, one rather prefers to consider parametric classes of distances that are exhaustively characterized by axioms derived from first principles. Bregman divergences are such a class. However, fine-tuning a Bregman divergence is delicate, since it requires smoothly adjusting a functional generator. In this work, we propose an extension of Bregman divergences called the Bregman chord divergences. This new class of distances does not require gradient calculations, uses two scalar parameters that can be easily tailored in applications, and asymptotically generalizes Bregman divergences.
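    A minimal numerical sketch of the secant-line idea (our reading of the construction, with the interpolation convention $(\theta\theta')_\lambda = (1-\lambda)\theta + \lambda\theta'$ as an assumption): the divergence is the vertical gap at $\theta$ between the generator $F$ and its chord through two interpolated points, so only evaluations of $F$ are needed.

```python
import numpy as np

def bregman_chord(F, theta, theta_prime, alpha, beta):
    """Gap at theta between F and its chord through the points at
    lambda = alpha, beta on the segment (1-lambda)*theta + lambda*theta_prime.
    As alpha, beta -> 1 the chord tends to the tangent at theta_prime,
    recovering the ordinary Bregman divergence asymptotically."""
    interp = lambda lam: (1.0 - lam) * theta + lam * theta_prime
    Fa, Fb = F(interp(alpha)), F(interp(beta))
    # secant line in the interpolation parameter, evaluated at lambda = 0 (theta)
    chord_at_theta = Fa - alpha / (beta - alpha) * (Fb - Fa)
    return F(theta) - chord_at_theta
```

    As a sanity check, for the quadratic generator F(x) = <x, x> this evaluates to alpha * beta * ||theta - theta_prime||^2, a scaled version of the squared Euclidean distance induced by the ordinary Bregman divergence of that generator.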

    Learning Adaptive Regularization for Image Labeling Using Geometric Assignment

    We study the inverse problem of model parameter learning for pixelwise image labeling, using the linear assignment flow and training data with ground truth. This is accomplished by a Riemannian gradient flow on the manifold of parameters that determine the regularization properties of the assignment flow. Using the symplectic partitioned Runge-Kutta method for numerical integration, it is shown that deriving the sensitivity conditions of the parameter learning problem and discretizing them commute. A convenient property of our approach is that learning is based on exact inference. Carefully designed experiments demonstrate the performance of our approach, the expressiveness of the mathematical model, as well as its limitations, from the viewpoint of statistical learning and optimal control.
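    The commutation claim concerns the discrete adjoint. As background, here is a generic discrete-adjoint sketch for plain explicit Euler on $\dot{x} = f(x, \theta)$, not the symplectic partitioned Runge-Kutta scheme of the paper; f, dfdx, dfdtheta, and dLdx are hypothetical callables for the vector field, its Jacobians, and the terminal-loss gradient.

```python
import numpy as np

def adjoint_gradient(f, dfdx, dfdtheta, dLdx, x0, theta, h, n_steps):
    """Gradient of a terminal loss w.r.t. theta for explicit Euler on
    x' = f(x, theta): the forward pass stores the trajectory, the backward
    pass propagates the adjoint and accumulates the parameter gradient."""
    xs = [x0]
    for _ in range(n_steps):                      # forward integration
        xs.append(xs[-1] + h * f(xs[-1], theta))
    lam = dLdx(xs[-1])                            # adjoint at final time
    grad = np.zeros_like(theta)
    for x in reversed(xs[:-1]):                   # backward sweep
        grad += h * dfdtheta(x, theta).T @ lam
        lam = lam + h * dfdx(x, theta).T @ lam
    return grad
```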

    Warped Riemannian metrics for location-scale models

    The present paper shows that warped Riemannian metrics, a class of Riemannian metrics which play a prominent role in Riemannian geometry, are also of fundamental importance in information geometry. Precisely, the paper features a new theorem, which states that the Rao-Fisher information metric of any location-scale model, defined on a Riemannian manifold, is a warped Riemannian metric, whenever this model is invariant under the action of some Lie group. This theorem is a valuable tool in finding the expression of the Rao-Fisher information metric of location-scale models defined on high-dimensional Riemannian manifolds. Indeed, a warped Riemannian metric is fully determined by only two functions of a single variable, irrespective of the dimension of the underlying Riemannian manifold. Starting from this theorem, several original contributions are made. The expression of the Rao-Fisher information metric of the Riemannian Gaussian model is provided, for the first time in the literature. A generalised definition of the Mahalanobis distance is introduced, which is applicable to any location-scale model defined on a Riemannian manifold. The solution of the geodesic equation is obtained, for any Rao-Fisher information metric defined in terms of warped Riemannian metrics. Finally, using a mixture of analytical and numerical computations, it is shown that the parameter space of the von Mises-Fisher model of $n$-dimensional directional data, when equipped with its Rao-Fisher information metric, becomes a Hadamard manifold, a simply-connected complete Riemannian manifold of negative sectional curvature, for $n = 2, \ldots, 8$. Hopefully, in upcoming work, this will be proved for any value of $n$.
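    For reference, the general shape of a warped Riemannian metric (a sketch in standard notation; the paper's conventions may differ): on a product of a scale interval and a location manifold $N$, the metric is determined by two functions of the single variable $r$, irrespective of the dimension of $N$.

```latex
% Sketch: warped Riemannian metric on M = I x N, with r the scale-type
% coordinate, ds_N^2 a fixed metric on N, and alpha, beta the two
% functions of a single variable that fully determine the metric.
ds^2 \;=\; \alpha^2(r)\, dr^2 \;+\; \beta^2(r)\, ds_N^2
```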

    Dependency-aware Attention Control for Unconstrained Face Recognition with Image Sets

    This paper targets the problem of image set-based face verification and identification. Unlike the traditional single-medium (an image or video) setting, we encounter a set of heterogeneous contents containing orderless images and videos. The importance of each image is usually considered either equal or based on an independent quality assessment of each image. How to model the relationship of orderless images within a set remains a challenge. We address this problem by formulating it as a Markov Decision Process (MDP) in the latent space. Specifically, we first present a dependency-aware attention control (DAC) network, which resorts to actor-critic reinforcement learning for sequential attention decisions over each image embedding, to fully exploit the rich correlation cues among the unordered images. Moreover, we introduce its sample-efficient variant with off-policy experience replay to speed up the learning process. A pose-guided representation scheme can further boost the performance at the extremes of the pose variation.
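    The final fusion step can be pictured as attention-weighted set pooling. Below is a minimal sketch under that reading; embeddings and scores are hypothetical placeholders for the per-image features and the attention scores that, in DAC, would come from the learned actor-critic policy.

```python
import numpy as np

def aggregate_set(embeddings, scores):
    """Fuse an unordered set of per-image embeddings (N x d) into a single
    template by softmax attention over per-image scores (N,)."""
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return (w[:, None] * embeddings).sum(axis=0)  # weighted mean embedding
```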

    State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data

    Precise spike coordination between the spiking activities of multiple neurons is suggested as an indication of coordinated network activity in active cell assemblies. Spike correlation analysis aims to identify such cooperative network activity by detecting excess spike synchrony in simultaneously recorded multiple neural spike sequences. Cooperative activity is expected to organize dynamically during behavior and cognition; therefore, currently available analysis techniques must be extended to enable the simultaneous estimation of multiple time-varying spike interactions between neurons. In particular, new methods must take advantage of the simultaneous observations of multiple neurons by addressing their higher-order dependencies, which cannot be revealed by pairwise analyses alone. In this paper, we develop a method for estimating time-varying spike interactions by means of a state-space analysis. Discretized parallel spike sequences are modeled as multivariate binary processes using a log-linear model that provides a well-defined measure of higher-order spike correlation in an information geometry framework. We construct a recursive Bayesian filter/smoother for the extraction of spike interaction parameters. This method can simultaneously estimate the dynamic pairwise spike interactions of multiple single neurons, thereby extending the Ising/spin-glass model analysis of multiple neural spike train data to a nonstationary analysis. Furthermore, the method can estimate dynamic higher-order spike interactions. To validate the inclusion of the higher-order terms in the model, we construct an approximation method to assess the goodness-of-fit to spike data. In addition, we formulate a test method for the presence of higher-order spike correlation even in nonstationary spike data, e.g., data from awake behaving animals. The utility of the proposed methods is tested using simulated spike data with known underlying correlation dynamics. Finally, we apply the methods to neural spike data simultaneously recorded from the motor cortex of an awake monkey and demonstrate that the higher-order spike correlation organizes dynamically in relation to a behavioral demand.
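    At the heart of the method is the log-linear model of binary spike words. A minimal sketch of its stationary, second-order form is below (the paper makes the parameters time-varying via the state-space filter, and includes higher orders); theta1, theta2, and psi denote the first-order parameters, the pairwise interaction parameters, and the log normalizer.

```python
import numpy as np
from itertools import product

def loglinear_logp(x, theta1, theta2, psi):
    """log p(x) = sum_i th1_i x_i + sum_{i<j} th2_ij x_i x_j - psi
    for a binary spike pattern x of n neurons (theta2 has length n(n-1)/2)."""
    pairs = np.outer(x, x)[np.triu_indices(len(x), k=1)]
    return theta1 @ x + theta2 @ pairs - psi

def log_partition(theta1, theta2, n):
    """Log normalizer psi by exhaustive enumeration (feasible for small n)."""
    logps = [loglinear_logp(np.array(w), theta1, theta2, 0.0)
             for w in product((0, 1), repeat=n)]
    return np.logaddexp.reduce(logps)
```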

    Hyperbolic planforms in relation to visual edges and textures perception

    We propose to use bifurcation theory and pattern formation as theoretical probes for various hypotheses about the neural organization of the brain. This allows us to make predictions about the kinds of patterns that should be observed in the activity of real brains through, e.g., optical imaging, and opens the door to the design of experiments to test these hypotheses. We study the specific problem of visual edges and textures perception and suggest that these features may be represented at the population level in the visual cortex as a specific second-order tensor, the structure tensor, perhaps within a hypercolumn. We then extend the classical ring model to this case and show that its natural framework is non-Euclidean hyperbolic geometry. This brings in the beautiful structure of its group of isometries and certain of its subgroups, which have a direct interpretation in terms of the organization of the neural populations that are assumed to encode the structure tensor. By studying the bifurcations of the solutions of the structure tensor equations, the analog of the classical Wilson-Cowan equations, under the assumption of invariance with respect to the action of these subgroups, we predict the appearance of characteristic patterns. These patterns can be described by what we call hyperbolic or H-planforms, which are reminiscent of Euclidean planar waves and of the planforms that were used in [1, 2] to account for some visual hallucinations. If these patterns could be observed through brain imaging techniques, they would reveal the built-in or acquired invariance of the neural organization to the action of the corresponding subgroups.
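    For readers unfamiliar with the object, the structure tensor referred to above is, in its standard image-processing form (a sketch; the paper works with the population-level analogue), a smoothed outer product of the image gradient:

```latex
% Sketch: standard (smoothed) structure tensor of an image I, a symmetric
% positive semi-definite matrix of Gaussian-averaged products of the first
% derivatives I_x, I_y.
\mathcal{T} \;=\; G_\sigma * \bigl(\nabla I\, \nabla I^{\top}\bigr)
  \;=\; G_\sigma * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}
```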