Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent
Stochastic compositional optimization arises in many important machine learning applications. The objective function is the composition of two expectations of stochastic functions, and it is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2}\epsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\epsilon^{-3})$ in the online case. Such a complexity is known to be the best among IFO complexity results for non-convex stochastic compositional optimization. Numerical experiments on risk-averse portfolio management validate the superiority of SARAH-Compositional over a few rival algorithms.
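The building block named in this abstract, the SARAH recursive gradient estimator, can be sketched in a few lines. The toy least-squares problem, variable names, and step sizes below are illustrative assumptions, not taken from the paper; the compositional (nested) version adds recursive estimates of the inner function value and its Jacobian on top of this basic recursion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))   # toy least-squares data (illustrative, not from the paper)
b = rng.standard_normal(n)

def grad_i(x, i):
    """Stochastic gradient of the i-th component f_i(x) = 0.5 * (A[i] @ x - b[i])**2."""
    return (A[i] @ x - b[i]) * A[i]

def sarah(x0, epochs=10, inner_iters=100, lr=0.02):
    """SARAH: exact gradient at each checkpoint, then recursive single-sample corrections."""
    x_prev = x0.copy()
    for _ in range(epochs):
        v = A.T @ (A @ x_prev - b) / n                 # full gradient of (1/n) * sum_i f_i
        x = x_prev - lr * v
        for _ in range(inner_iters):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_prev, i) + v   # recursive gradient estimator
            x_prev, x = x, x - lr * v
        x_prev = x                                     # restart next epoch from the latest iterate
    return x

x_hat = sarah(np.zeros(d))
print("final objective:", 0.5 * np.mean((A @ x_hat - b) ** 2))
```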
Stochastic Composition Optimization of Functions without Lipschitz Continuous Gradient
In this paper, we study the stochastic optimization of two-level compositions of functions without Lipschitz continuous gradients. The smoothness property is generalized by the notion of relative smoothness, which motivates the Bregman gradient method. We propose three Stochastic Compositional Bregman Gradient algorithms for the three possible nonsmooth compositional scenarios and provide their sample complexities to achieve an $\epsilon$-approximate stationary point. For the smooth-of-relatively-smooth composition, the first algorithm requires calls to the stochastic oracles of the inner function value and gradient as well as the outer function gradient. When both functions are relatively smooth, the second algorithm requires calls to the inner function stochastic oracle and calls to the inner and outer function stochastic gradient oracles. We further improve the second algorithm by variance reduction for the setting where just the inner function is smooth. The resulting algorithm requires calls to the stochastic inner function value, calls to the inner stochastic gradient, and calls to the outer function stochastic gradient. Finally, we numerically evaluate the performance of these algorithms over two examples.
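For readers unfamiliar with the Bregman gradient step that relative smoothness motivates, the sketch below shows such a step with the negative-entropy kernel on the probability simplex, which reduces to an exponentiated-gradient update. The kernel, the toy quadratic objective, and the step size are illustrative assumptions, not the compositional algorithms analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
Q = rng.standard_normal((d, d))
Q = Q @ Q.T / d                  # toy convex quadratic f(x) = 0.5 * x @ Q @ x (illustrative)

def grad_f(x):
    return Q @ x

def bregman_step(x, g, step):
    """One Bregman gradient step with the negative-entropy kernel h(x) = sum_i x_i log x_i.

    Minimizing <g, y> + (1/step) * D_h(y, x) over the simplex gives the
    exponentiated-gradient (multiplicative) update below.
    """
    y = x * np.exp(-step * g)
    return y / y.sum()

x = np.full(d, 1.0 / d)          # start at the uniform distribution
for _ in range(300):
    x = bregman_step(x, grad_f(x), step=0.2)
print("objective:", 0.5 * x @ Q @ x, "iterate:", np.round(x, 3))
```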
Riemannian Stochastic Gradient Method for Nested Composition Optimization
This work considers optimization of a composition of functions in a nested form over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning. The standard Riemannian stochastic gradient methods for non-compositional optimization cannot be directly applied, as stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same complexity for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated over a policy evaluation problem in reinforcement learning.
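The two ingredients the abstract highlights, a running estimate of the inner function to control the compositional bias and a tangent-space projection followed by a retraction, can be illustrated on the unit sphere. The toy inner and outer functions, the moving-average weight, and the step size below are assumptions for illustration only, not the R-SCGD specification.

```python
import numpy as np

rng = np.random.default_rng(2)
d, p = 8, 4
B = rng.standard_normal((p, d))      # toy inner map g(x) = E[B x + noise] (illustrative)
c = rng.standard_normal(p)

def inner_sample(x):
    """Stochastic inner function value and its Jacobian."""
    return B @ x + 0.1 * rng.standard_normal(p), B

def outer_grad(y):
    """Gradient of the outer function f(y) = 0.5 * ||y - c||^2."""
    return y - c

def project_tangent(x, v):
    """Project a Euclidean vector onto the tangent space of the unit sphere at x."""
    return v - (x @ v) * x

def retract(x, v):
    """Retraction on the sphere: step in the tangent direction, then renormalize."""
    z = x + v
    return z / np.linalg.norm(z)

def rscgd_sketch(x0, iters=1000, lr=0.02, beta=0.2):
    x = x0 / np.linalg.norm(x0)
    y, _ = inner_sample(x)                       # running estimate of the inner function value
    for _ in range(iters):
        g_val, J = inner_sample(x)
        y = (1 - beta) * y + beta * g_val        # moving average reduces the compositional bias
        egrad = J.T @ outer_grad(y)              # Euclidean gradient estimate of the composition
        rgrad = project_tangent(x, egrad)        # Riemannian gradient on the sphere
        x = retract(x, -lr * rgrad)
    return x

x = rscgd_sketch(rng.standard_normal(d))
print("Riemannian gradient norm at output:",
      np.linalg.norm(project_tangent(x, B.T @ (B @ x - c))))
```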
Stochastic Constrained DRO with a Complexity Independent of Sample Size
Distributionally Robust Optimization (DRO), as a popular method to train robust models against distribution shift between training and test sets, has received tremendous attention in recent years. In this paper, we propose and analyze stochastic algorithms that apply to both non-convex and convex losses for solving the Kullback-Leibler divergence constrained DRO problem. Compared with existing methods for this problem, our stochastic algorithms not only enjoy a competitive, if not better, complexity independent of the sample size but also require only a constant batch size at every iteration, which is more practical for broad applications. We establish a nearly optimal complexity bound for finding an $\epsilon$-stationary solution for non-convex losses and an optimal complexity for finding an $\epsilon$-optimal solution for convex losses. Empirical studies demonstrate the effectiveness of the proposed algorithms for solving non-convex and convex constrained DRO problems.
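As background, the KL-divergence-constrained DRO objective has a well-known dual in which each example is exponentially reweighted by its loss. The sketch below runs plain mini-batch descent on that dual with a fixed dual multiplier `lam`, using a toy logistic model; it is a generic illustration under those assumptions, not the algorithms proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 10
X = rng.standard_normal((n, d))              # toy logistic-regression data (illustrative)
ytrue = np.sign(X @ rng.standard_normal(d) + 0.3 * rng.standard_normal(n))

def losses(w, idx):
    """Per-example logistic losses on a mini-batch."""
    margins = ytrue[idx] * (X[idx] @ w)
    return np.log1p(np.exp(-margins))

def dro_grad(w, lam, idx):
    """Mini-batch gradient in w of the KL-constrained DRO dual objective.

    The dual objective is  lam * rho + lam * log E[exp(loss_i(w) / lam)],
    so each example is reweighted by softmax(loss_i / lam) within the batch.
    """
    ell = losses(w, idx)
    weights = np.exp((ell - ell.max()) / lam)
    weights /= weights.sum()
    margins = ytrue[idx] * (X[idx] @ w)
    per_ex_grad = (-ytrue[idx] / (1.0 + np.exp(margins)))[:, None] * X[idx]
    return weights @ per_ex_grad

w, lam = np.zeros(d), 1.0                    # lam held fixed here; in the constrained dual it is itself optimized
for _ in range(300):
    idx = rng.integers(0, n, size=32)        # constant batch size, as the abstract emphasizes
    w -= 0.1 * dro_grad(w, lam, idx)
print("average training loss:", float(np.mean(losses(w, np.arange(n)))))
```

Note that the within-batch softmax weights make this a biased estimate of the population gradient; controlling that bias while keeping the batch size constant is exactly the difficulty such methods must address.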
Multi-block Min-max Bilevel Optimization with Applications in Multi-task Deep AUC Maximization
In this paper, we study multi-block min-max bilevel optimization problems, where the upper level is a non-convex strongly-concave min-max objective, the lower level is a strongly convex objective, and there are multiple blocks of dual variables and lower-level problems. Due to the intertwined multi-block min-max bilevel structure, the computational cost at each iteration could be prohibitively high, especially with a large number of blocks. To tackle this challenge, we present a single-loop randomized stochastic algorithm, which requires updates for only a constant number of blocks at each iteration. Under some mild assumptions on the problem, we establish its sample complexity for finding an $\epsilon$-stationary point, which matches the optimal complexity for solving stochastic non-convex optimization under a general unbiased stochastic oracle model. Moreover, we provide two applications of the proposed method in multi-task deep AUC (area under ROC curve) maximization and multi-task deep partial AUC maximization. Experimental results validate our theory and demonstrate the effectiveness of our method on problems with hundreds of tasks.
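The structural idea of updating only a constant number of blocks per iteration can be sketched on a contrived toy instance with a strongly convex lower level and a strongly concave dual variable per block. The objective, step sizes, and scaling below are illustrative assumptions and differ from the AUC surrogates and update rules of the actual method.

```python
import numpy as np

rng = np.random.default_rng(4)
num_blocks, d, sampled = 50, 6, 4            # many blocks, constant number sampled per iteration
A = rng.standard_normal((num_blocks, d))     # toy per-block data (illustrative)
c = rng.standard_normal(num_blocks)

w = np.zeros(d)                              # shared upper-level variable
alpha = np.zeros(num_blocks)                 # one dual (max) variable per block
y = np.zeros(num_blocks)                     # one lower-level variable per block

lr_w, lr_alpha, lr_y = 0.05, 0.1, 0.5
for _ in range(1000):
    blocks = rng.choice(num_blocks, size=sampled, replace=False)   # constant number of blocks
    for i in blocks:
        # One SGD step on the strongly convex lower-level problem 0.5 * (y_i - A[i] @ w)^2.
        y[i] -= lr_y * (y[i] - A[i] @ w)
        # One ascent step on the strongly concave dual part alpha_i * c_i * y_i - 0.5 * alpha_i^2.
        alpha[i] += lr_alpha * (c[i] * y[i] - alpha[i])
    # Descent step on w using only the sampled blocks (rescaled for the block subsampling).
    grad_w = sum(alpha[i] * c[i] * A[i] for i in blocks) / sampled
    w -= lr_w * grad_w
print("upper-level objective estimate:",
      float(np.mean(alpha * c * (A @ w) - 0.5 * alpha**2)))
```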