
    Optimization via conformal Hamiltonian systems on manifolds

    In this work we propose a method to perform optimization on manifolds. We assume we have an objective function f defined on a manifold and think of it as the potential energy of a mechanical system. By adding a momentum-dependent kinetic energy we define its Hamiltonian function, which allows us to write the corresponding Hamiltonian system. We make it conformal by introducing a dissipation term: the result is the continuous model of our scheme. We solve it via splitting methods (Lie-Trotter and leapfrog): we combine the RATTLE scheme, which approximates the conserved flow, with the exact dissipated flow. The result is a conformal symplectic method for constant stepsizes. We also propose an adaptive-stepsize version of it. We test it on an example, the minimization of a function defined on a sphere, and compare it with the usual gradient descent method.
    Comment: 21 pages, 6 figures, 1 page. Presented at GSI conference 202
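
    The splitting idea can be made concrete. Below is a minimal Python sketch of one conformal splitting step on the unit sphere, assuming an ambient-space gradient oracle grad_f; the tangent-space projections and the renormalisation stand in for RATTLE's constraint forces, so this illustrates the structure of the scheme rather than reproducing the paper's exact integrator.

```python
import numpy as np

def conformal_step_sphere(q, p, grad_f, h, gamma):
    """One conformal splitting step on the unit sphere (illustrative sketch).

    Composes the exact flow of the dissipative part, p -> exp(-gamma*h) * p,
    with a projected leapfrog step for the conservative Hamiltonian flow.
    The projections onto the tangent space play the role of RATTLE's
    constraint forces in this simplified stand-in.
    """
    # Exact flow of the dissipative part.
    p = np.exp(-gamma * h) * p
    # Half kick, with the gradient projected onto the tangent space at q.
    g = grad_f(q)
    p = p - 0.5 * h * (g - np.dot(g, q) * q)
    # Drift, then re-project onto the sphere (constraint enforcement).
    q = q + h * p
    q = q / np.linalg.norm(q)
    # Keep the momentum tangent to the new point.
    p = p - np.dot(p, q) * q
    # Second half kick at the new point.
    g = grad_f(q)
    p = p - 0.5 * h * (g - np.dot(g, q) * q)
    return q, p
```

    Composing many such steps with a fixed h gives a constant-stepsize method of the kind described; the adaptive variant would adjust h between steps.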

    Riemannian Stochastic Gradient Method for Nested Composition Optimization

    This work considers optimization of compositions of functions in a nested form over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, as stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than ϵ, in O(ϵ^{-2}) calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same O(ϵ^{-2}) complexity for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
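
    To make the bias-correction mechanism concrete, here is a hedged sketch of one two-level step in Python. The oracle interfaces (inner.sample, outer.sample_grad) and manifold operations (project, retract) are assumed names, not the paper's API; the key point is the moving average u that tracks the inner expectation, so the outer gradient is not evaluated at a single noisy inner sample.

```python
def composition_gd_step(x, u, manifold, inner, outer, alpha, beta, rng):
    """One two-level stochastic composition step (illustrative sketch).

    Tracks a running estimate u of the inner expectation E[g(x)] to reduce
    the bias that plugging a single sample of g into the outer gradient
    would create, then takes a retracted step along the Riemannian gradient.
    All object interfaces here are assumptions for illustration.
    """
    # Sample the stochastic oracles (signatures are assumptions).
    g_val, g_jac = inner.sample(x, rng)   # inner value and Jacobian
    f_grad = outer.sample_grad(u, rng)    # outer gradient at the tracked value
    # Moving average tracks the inner expectation.
    u = (1.0 - beta) * u + beta * g_val
    # Chain rule in the ambient space, then project to the tangent space.
    euclid_grad = g_jac.T @ f_grad
    riem_grad = manifold.project(x, euclid_grad)
    # Retraction replaces the Euclidean update x - alpha * grad.
    x = manifold.retract(x, -alpha * riem_grad)
    return x, u
```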

    Sufficient conditions for non-asymptotic convergence of Riemannian optimisation methods

    Motivated by energy-based analyses for descent methods in the Euclidean setting, we investigate a generalisation of such analyses for descent methods over Riemannian manifolds. In doing so, we find that it is possible to derive curvature-free guarantees for such descent methods. This also enables us to give the first known guarantees for a Riemannian cubic-regularised Newton algorithm over g-convex functions, which extends the guarantees of Agarwal et al. [2021] for an adaptive Riemannian cubic-regularised Newton algorithm over general non-convex functions. This analysis leads us to study acceleration of Riemannian gradient descent in the g-convex setting, and we improve on an existing result of Alimisis et al. [2021], albeit with a curvature-dependent rate. Finally, extending the analysis of Ahn and Sra [2020], we attempt to provide some sufficient conditions for the acceleration of Riemannian descent methods in the strongly geodesically convex setting.
    Comment: Paper accepted at the OPT-ML Workshop, NeurIPS 202
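
    The curvature-free energy argument is easiest to see for plain Riemannian gradient descent. The sketch below, with assumed project/retract callables, uses the classical 1/L stepsize, under which each step satisfies a sufficient-decrease inequality that does not involve curvature bounds.

```python
import numpy as np

def riemannian_gd(x0, grad_f, project, retract, L, max_iter=1000, tol=1e-8):
    """Riemannian gradient descent with the classical 1/L stepsize (sketch).

    Assuming L-smoothness along the retraction, each step satisfies the
    sufficient-decrease inequality f(x_next) <= f(x) - |grad|^2 / (2L),
    the curvature-free energy argument alluded to in the abstract.
    """
    x = x0
    for _ in range(max_iter):
        g = project(x, grad_f(x))   # Riemannian gradient at x
        if np.linalg.norm(g) < tol:
            break
        x = retract(x, -g / L)      # step of size 1/L along -grad
    return x
```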

    Solving general elliptical mixture models through an approximate Wasserstein manifold

    We address the estimation problem for general finite mixture models, with a particular focus on elliptical mixture models (EMMs). Compared to the widely adopted Kullback-Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. To this end, we first provide a unifying account of computable and identifiable EMMs, which serves as a basis for rigorously addressing the underpinning optimisation problem. Due to a probability constraint, solving this problem is extremely cumbersome and unstable, especially under the Wasserstein distance. To relieve this issue, we introduce an efficient optimisation method on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and computable operations, thus significantly stabilising and improving the EMM estimation. We further propose an adaptive method to accelerate the convergence. Experimental results demonstrate the excellent performance of the proposed EMM solver.
    Comment: This work has been accepted to AAAI 2020. Note that this version also corrects a small error in Equation (16) in the proof
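
    The probability constraint mentioned above is a generic source of instability in mixture estimation. As a simple illustration (this is a standard device, not the paper's Wasserstein-manifold construction), one can remove the constraint by parametrising the mixture weights through a softmax and pulling the gradient back, as in the Python sketch below.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def simplex_weight_step(z, grad_pi, lr):
    """Gradient step on mixture weights via an unconstrained parametrisation.

    The constraint sum(pi) = 1, pi >= 0 makes direct updates unstable;
    pulling the gradient grad_pi back through a softmax keeps every
    iterate feasible. (A generic device for illustration only.)
    """
    pi = softmax(z)
    # Softmax Jacobian-transpose product: (J^T v)_j = pi_j * (v_j - <pi, v>).
    g_z = pi * (grad_pi - np.dot(pi, grad_pi))
    return z - lr * g_z
```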

    Fast gradient method for Low-Rank Matrix Estimation

    Projected gradient descent and its Riemannian variant belong to a typical class of methods for low-rank matrix estimation. This paper proposes a new Nesterov Accelerated Riemannian Gradient algorithm based on an efficient orthographic retraction and tangent-space projection. The subspace relationship between the iterative and extrapolated sequences on the low-rank matrix manifold provides computational convenience. Using perturbation analysis of the truncated singular value decomposition and the two retractions, we systematically analyze the local convergence of gradient algorithms and Nesterov's variants in the Euclidean and Riemannian settings. Theoretically, we estimate the exact rate of local linear convergence under different parameters using the spectral radius in closed form, and give the optimal convergence rate and the corresponding momentum parameter. When the parameter is unknown, an adaptive restart scheme can avoid the oscillation problem caused by high momentum and thus approach the optimal convergence rate. Extensive numerical experiments confirm the estimated convergence rates and demonstrate that the proposed algorithm is competitive with first-order methods for matrix completion and matrix sensing.
    Comment: Accepted for publication in Journal of Scientific Computing
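
    For intuition about the accelerated scheme, here is a hedged sketch of one extrapolated step for rank-r estimation, using truncated-SVD projection in place of the paper's orthographic retraction; the function and parameter names are assumptions for illustration.

```python
import numpy as np

def accelerated_low_rank_step(X, X_prev, grad, r, step, momentum):
    """One accelerated projected-gradient step for rank-r estimation (sketch).

    Extrapolates with a momentum term, takes a gradient step, then projects
    back to the set of rank-r matrices by truncated SVD. The Riemannian
    variant described in the abstract would replace this projection with
    an orthographic retraction.
    """
    # Nesterov-style extrapolation from the last two iterates.
    Y = X + momentum * (X - X_prev)
    Z = Y - step * grad(Y)
    # Projection onto rank-r matrices via truncated SVD.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```

    An adaptive restart, as mentioned in the abstract, would reset the momentum term whenever the objective stops decreasing, avoiding oscillation when the momentum parameter is set too high.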