13 research outputs found

    Riemannian Stochastic Gradient Method for Nested Composition Optimization

    This work considers the optimization of nested compositions of functions over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, because stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
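The abstract above gives no pseudocode, so the following is only a minimal sketch of the general idea, not the paper's R-SCGD algorithm. It assumes a toy problem on the unit sphere: a linear inner map `A x` observed with noise, a quadratic outer function `0.5 * ||y - B||^2`, a running average `y` that tracks the inner value (to limit the bias a single noisy sample would introduce), tangent-space projection for the Riemannian gradient, and normalization as the retraction. The matrix `A`, the vector `B`, the step sizes, and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy data: inner map g(x) = A x, outer function f(y) = 0.5 * ||y - B||^2.
A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [2.0, 0.0, 0.0]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_t_vec(m, v):
    # Multiply by the transpose of m.
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def inner_value_oracle(x, rng, sigma):
    # Noisy sample of g(x) = A x.
    return [a + rng.gauss(0.0, sigma) for a in matvec(A, x)]


def outer_grad_oracle(y, rng, sigma):
    # Noisy sample of grad f(y) = y - B.
    return [a - b + rng.gauss(0.0, sigma) for a, b in zip(y, B)]


def objective(x):
    # Noise-free composite objective f(g(x)).
    r = [a - b for a, b in zip(matvec(A, x), B)]
    return 0.5 * dot(r, r)


def compositional_sgd_on_sphere(x0, steps=500, alpha=0.05, beta=0.5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    x = normalize(x0)
    y = inner_value_oracle(x, rng, sigma)  # running estimate of g(x)
    for _ in range(steps):
        # Update the running estimate of the inner function value.
        sample = inner_value_oracle(x, rng, sigma)
        y = [(1.0 - beta) * yi + beta * si for yi, si in zip(y, sample)]
        # Chain rule with the tracked value; the Jacobian of g is A
        # (taken as exact here purely to keep the sketch short).
        v = mat_t_vec(A, outer_grad_oracle(y, rng, sigma))
        # Riemannian gradient on the sphere: project v onto the tangent space at x.
        vx = dot(v, x)
        rgrad = [vi - vx * xi for vi, xi in zip(v, x)]
        # Retraction: take the step in the ambient space, then renormalize.
        x = normalize([xi - alpha * ri for xi, ri in zip(x, rgrad)])
    return x
```

Calling `compositional_sgd_on_sphere([0.0, 0.0, 1.0])` returns a unit vector, and `objective` can be used to compare it against the starting point. Constant step sizes are used here for brevity; the abstract does not state the schedule that the complexity result relies on.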

    Stochastic Composition Optimization of Functions without Lipschitz Continuous Gradient

    In this paper, we study the stochastic optimization of two-level compositions of functions without Lipschitz continuous gradient. The smoothness property is generalized by the notion of relative smoothness, which motivates the Bregman gradient method. We propose three Stochastic Compositional Bregman Gradient algorithms for the three possible nonsmooth compositional scenarios and provide their sample complexities to achieve an $\epsilon$-approximate stationary point. For the smooth-of-relatively-smooth composition, the first algorithm requires $O(\epsilon^{-2})$ calls to the stochastic oracles of the inner function value and gradient as well as the outer function gradient. When both functions are relatively smooth, the second algorithm requires $O(\epsilon^{-3})$ calls to the inner function stochastic oracle and $O(\epsilon^{-2})$ calls to the inner and outer function stochastic gradient oracles. We further improve the second algorithm by variance reduction for the setting where only the inner function is smooth. The resulting algorithm requires $O(\epsilon^{-5/2})$ calls to the stochastic inner function value oracle, $O(\epsilon^{-3/2})$ calls to the inner stochastic gradient oracle, and $O(\epsilon^{-2})$ calls to the outer function stochastic gradient oracle. Finally, we numerically evaluate the performance of these algorithms on two examples.
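To make the Bregman gradient step concrete, here is a minimal one-dimensional sketch. It is not any of the three algorithms from the paper and is not meant to match one of its three scenarios. It assumes a toy composite $F(x) = \tfrac{1}{2}(x^2 - 1)^2$, built from the inner map $g(x) = x^2$ and the outer function $f(y) = \tfrac{1}{2}(y - 1)^2$, whose gradient is not Lipschitz continuous. With the kernel $h(x) = x^2/2 + x^4/4$, both $2h - F$ and $2h + F$ are convex, so the step size `tau = 0.2` is below $1/2$. The update solves $h'(x^+) = h'(x) - \tau\,\hat{g}$, where $\hat{g}$ is a gradient estimate built from a running average `u` of noisy inner values. The function names, the noise model, and all parameter values are hypothetical.

```python
import math
import random


def signed_cbrt(t):
    # Real cube root that also handles negative inputs.
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


def mirror_inverse(c):
    # Solve z + z**3 = c, i.e. invert h'(z) for h(z) = z**2/2 + z**4/4,
    # using Cardano's formula (the cubic has exactly one real root).
    s = math.sqrt(c * c / 4.0 + 1.0 / 27.0)
    return signed_cbrt(c / 2.0 + s) + signed_cbrt(c / 2.0 - s)


def compositional_bregman_sgd(x0, steps=200, tau=0.2, beta=0.5, sigma=0.05, seed=0):
    rng = random.Random(seed)
    x = x0
    u = x * x + rng.gauss(0.0, sigma)  # running estimate of g(x) = x**2
    for _ in range(steps):
        # Track the inner function value with a noisy sample.
        u = (1.0 - beta) * u + beta * (x * x + rng.gauss(0.0, sigma))
        # Noisy outer gradient f'(u) = u - 1, chained with g'(x) = 2x.
        grad = 2.0 * x * ((u - 1.0) + rng.gauss(0.0, sigma))
        # Bregman step: h'(x_new) = h'(x) - tau * grad, with h'(x) = x + x**3.
        x = mirror_inverse(x + x ** 3 - tau * grad)
    return x
```

Starting from `x0 = 3.0`, the iterates move toward the stationary point at `x = 1` of the toy objective. A plain gradient step with a fixed step size would need to be much smaller far from the origin, which is the practical reason for swapping the Euclidean step for a Bregman one in this kind of problem.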