Riemannian Stochastic Gradient Method for Nested Composition Optimization
This work considers the optimization of nested compositions of functions over
Riemannian manifolds, where each function involves an expectation. This type of
problem is gaining popularity in applications such as policy evaluation in
reinforcement learning and model customization in meta-learning.
The standard Riemannian stochastic gradient methods for non-compositional
optimization cannot be applied directly, since stochastic approximation of the
inner functions creates bias in the gradients of the outer functions. For
two-level composition optimization, we present a Riemannian Stochastic
Composition Gradient Descent (R-SCGD) method that finds an approximate
stationary point, with expected squared Riemannian gradient smaller than
$\epsilon$, in $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient
oracle of the outer function and to the stochastic function and gradient
oracles of the inner function.
Furthermore, we generalize the R-SCGD algorithm to problems with multi-level
nested compositional structures, with the same complexity of
$\mathcal{O}(\epsilon^{-2})$ for the first-order stochastic oracle. Finally,
the performance of the R-SCGD method is numerically evaluated on a policy
evaluation problem in reinforcement learning.
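
As a minimal illustration of the bias issue described above (the notation
below is generic and not taken from the paper), the two-level problem and its
Riemannian gradient can be written as

$$
\min_{x \in \mathcal{M}} \; F(x) = f\big(\mathbb{E}_{\xi}[\,g(x;\xi)\,]\big),
\qquad
\operatorname{grad} F(x) = \mathrm{D}g(x)^{*}\,\nabla f\big(\mathbb{E}_{\xi}[\,g(x;\xi)\,]\big),
$$

where $\mathrm{D}g(x)^{*}$ denotes the adjoint of the differential of the
inner map. Replacing the inner expectation by a single sample gives

$$
\mathbb{E}_{\xi,\zeta}\big[\mathrm{D}g(x;\zeta)^{*}\,\nabla f\big(g(x;\xi)\big)\big]
\;\neq\; \operatorname{grad} F(x)
$$

in general, because $\nabla f$ is nonlinear; this is why non-compositional
stochastic gradient methods are biased here and an auxiliary estimate of the
inner expectation must be tracked.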
Stochastic Composition Optimization of Functions without Lipschitz Continuous Gradient
In this paper, we study the stochastic optimization of a two-level composition
of functions without Lipschitz continuous gradients. The smoothness property is
generalized by the notion of relative smoothness, which motivates the Bregman
gradient method. We propose three Stochastic Compositional Bregman Gradient
algorithms for the three possible nonsmooth compositional scenarios and provide
their sample complexities to achieve an $\epsilon$-approximate stationary
point. For the smooth of relatively smooth composition, the first algorithm
requires $\mathcal{O}(\epsilon^{-2})$ calls to the stochastic oracles of the
inner function value and gradient as well as the outer function gradient. When
both functions are relatively smooth, the second algorithm requires
$\mathcal{O}(\epsilon^{-3})$ calls to the inner function stochastic oracle and
$\mathcal{O}(\epsilon^{-2})$ calls to the inner and outer function stochastic gradient
oracles. We further improve the second algorithm by variance reduction for the
setting where just the inner function is smooth. The resulting algorithm
requires $\mathcal{O}(\epsilon^{-5/2})$ calls to the stochastic inner function
value, $\mathcal{O}(\epsilon^{-3/2})$ calls to the inner stochastic gradient,
and $\mathcal{O}(\epsilon^{-2})$ calls to the outer function stochastic
gradient. Finally, we numerically evaluate the performance of these algorithms
on two examples.
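
For reference, a minimal sketch of the relative smoothness notion and the
Bregman gradient step underlying these algorithms (generic notation, not taken
from the paper): given a convex reference function $h$, the Bregman divergence
is

$$
D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle,
$$

and a function $f$ is $L$-smooth relative to $h$ if

$$
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + L\, D_h(y, x)
\quad \text{for all } x, y,
$$

which recovers the usual Lipschitz-gradient condition when
$h = \tfrac{1}{2}\|\cdot\|^2$. With a stochastic compositional gradient
estimate $v_k$ and step size $\eta_k$, a Bregman gradient step then solves

$$
x_{k+1} = \operatorname*{arg\,min}_{y} \;\; \langle v_k,\, y \rangle
+ \tfrac{1}{\eta_k}\, D_h(y, x_k).
$$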