A Formalization of The Natural Gradient Method for General Similarity Measures
In optimization, the natural gradient method is well-known for likelihood
maximization. The method uses the Kullback-Leibler divergence, corresponding
infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter
space of a family of probability distributions. This way, gradients with
respect to the parameters respect the Fisher-Rao geometry of the space of
distributions, which might differ vastly from the standard Euclidean geometry
of the parameter space, often leading to faster convergence. However, when
minimizing an arbitrary similarity measure between distributions, it is
generally unclear which metric to use. We provide a general framework that,
given a similarity measure, derives a metric for the natural gradient. We then
discuss connections between the natural gradient method and multiple other
optimization techniques in the literature. Finally, we provide computations of
the formal natural gradient to show overlap with well-known cases and to
compute natural gradients in novel frameworks.
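As a concrete instance of the well-known KL/Fisher case, the sketch below (not from the paper; the one-dimensional Gaussian family, step size, and names are illustrative) fits a Gaussian by preconditioning the Euclidean log-likelihood gradient with the inverse Fisher metric, which is diagonal in (mu, sigma) coordinates:

```python
import numpy as np

def natural_gradient_fit(data, steps=200, lr=0.1):
    """Fit a 1D Gaussian by natural gradient ascent on the log-likelihood.

    The Fisher information for theta = (mu, sigma) is diagonal,
    F = diag(1/sigma^2, 2/sigma^2), so the natural gradient is the
    Euclidean gradient preconditioned by F^{-1}.
    """
    mu, sigma = 0.0, 1.0
    for _ in range(steps):
        # Euclidean gradient of the average log-likelihood
        g_mu = np.mean(data - mu) / sigma**2
        g_sigma = np.mean((data - mu) ** 2) / sigma**3 - 1.0 / sigma
        # Precondition with the inverse Fisher metric
        mu += lr * sigma**2 * g_mu
        sigma += lr * (sigma**2 / 2.0) * g_sigma
    return mu, sigma

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=1000)
print(natural_gradient_fit(samples))  # approximately (2.0, 0.5)
```

In these coordinates the natural step rescales each parameter by the local curvature of the KL divergence, which is what makes the update invariant to reparameterization.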
Efficient Wasserstein Natural Gradients for Reinforcement Learning
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
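Schematically (notation ours, not the paper's), a Wasserstein natural gradient step preconditions the objective gradient with the metric tensor $G_W$ induced by the Wasserstein-2 distance on the policy family:

```latex
\theta_{k+1} = \theta_k - \eta\, G_W(\theta_k)^{-1}\, \nabla_\theta J(\theta_k),
\qquad
W_2^2\!\left(p_\theta,\, p_{\theta+\delta}\right) = \delta^\top G_W(\theta)\, \delta + o(\|\delta\|^2).
```

For a one-dimensional Gaussian parameterized by $(\mu, \sigma)$, $G_W$ is the identity, so the WNG step coincides with plain gradient descent in those coordinates, in contrast to the Fisher-preconditioned update above.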
Solving general elliptical mixture models through an approximate Wasserstein manifold
We address the estimation problem for general finite mixture models, with a
particular focus on the elliptical mixture models (EMMs). Compared to the
widely adopted Kullback-Leibler divergence, we show that the Wasserstein
distance provides a more desirable optimisation space. We thus provide a stable
solution to the EMMs that is both robust to initialisations and reaches a
superior optimum by adaptively optimising along a manifold of an approximate
Wasserstein distance. To this end, we first provide a unifying account of
computable and identifiable EMMs, which serves as a basis to rigorously address
the underpinning optimisation problem. Due to a probability constraint, solving
this problem is extremely cumbersome and unstable, especially under the
Wasserstein distance. To relieve this issue, we introduce an efficient
optimisation method on a statistical manifold defined under an approximate
Wasserstein distance, which allows for explicit metrics and computable
operations, thus significantly stabilising and improving the EMM estimation. We
further propose an adaptive method to accelerate the convergence. Experimental
results demonstrate the excellent performance of the proposed EMM solver.
Comment: This work has been accepted to AAAI 2020. Note that this version also corrects a small error in Equation (16) in the proof.
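For reference (standard notation, not taken from the paper), an elliptical mixture model places a density generator $g$ around each component, and the mixture weights carry the probability constraint the abstract refers to:

```latex
p(x) = \sum_{k=1}^{K} \pi_k\, c_g\, |\Sigma_k|^{-1/2}\,
       g\!\left((x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,
```

where $c_g$ normalizes each component density; choosing $g(t) = e^{-t/2}$ recovers the Gaussian mixture model.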
On parameter estimation with the Wasserstein distance
Statistical inference can be performed by minimizing, over the parameter
space, the Wasserstein distance between model distributions and the empirical
distribution of the data. We study asymptotic properties of such minimum
Wasserstein distance estimators, complementing results derived by Bassetti,
Bodini and Regazzini in 2006. In particular, our results cover the misspecified
setting, in which the data-generating process is not assumed to be part of the
family of distributions described by the model. Our results are motivated by
recent applications of minimum Wasserstein estimators to complex generative
models. We discuss some difficulties arising in the approximation of these
estimators and illustrate their behavior in several numerical experiments. Two
of our examples are taken from the literature on approximate Bayesian
computation and have likelihood functions that are not analytically tractable.
Two other examples involve misspecified models.
Comment: 29 pages (+18 pages of appendices), 6 figures. To appear in Information and Inference: A Journal of the IMA. A previous version of this paper contained work on approximate Bayesian computation with the Wasserstein distance, which can now be found at arxiv:1905.0374
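A minimal sketch of such an estimator in one dimension (illustrative only, not the authors' procedure; the grid search, sample sizes, and well-specified Gaussian location model are our assumptions, and the names `w1_1d` and `min_wasserstein_location` are hypothetical): with equal sample sizes, the 1-Wasserstein distance between two empirical distributions reduces to the mean absolute difference of sorted samples, so the estimator can be approximated by simulating from the model and minimizing over a parameter grid.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=500)

def w1_1d(x, y):
    # For equal-size empirical distributions in 1D, W1 is the mean
    # absolute difference between sorted samples (quantile coupling).
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def min_wasserstein_location(data, thetas, n_sim=500):
    # Grid-search minimum-W1 estimator for a unit-variance Gaussian
    # location model, simulating model samples at each candidate theta.
    noise = rng.normal(size=n_sim)  # common random numbers across thetas
    dists = [w1_1d(data, theta + noise) for theta in thetas]
    return thetas[int(np.argmin(dists))]

grid = np.linspace(0.0, 6.0, 601)
print(min_wasserstein_location(data, grid))  # should land near 3.0
```

Reusing common random numbers across the grid keeps the simulated objective smooth in theta; without this, the Monte Carlo noise in each evaluation illustrates one of the approximation difficulties the paper discusses.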