639 research outputs found
Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent
Stochastic compositional optimization arises in many important machine learning applications. The objective function is the composition of two expectations of stochastic functions and is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2}\varepsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\varepsilon^{-3})$ in the online case. Such a complexity is known to be the best among IFO complexity results for non-convex stochastic compositional optimization. Numerical experiments on risk-averse portfolio management validate the superiority of SARAH-Compositional over a few rival algorithms.
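For readers unfamiliar with recursive variance reduction in the compositional setting, the following is a minimal sketch of the two-level problem min_x f(g(x)) and a SARAH-style recursive gradient step. The synthetic linear inner map, quadratic outer function, step sizes, and the omission of periodic full-batch restarts are simplifying assumptions for illustration only, not the authors' reference implementation.

```python
# Sketch of min_x f(g(x)) with g(x) = (1/n) sum_i A_i x and f(y) = 0.5*||y - b||^2,
# plus SARAH-style recursive estimators of the inner value and the compositional
# gradient. Periodic full-batch restarts are omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 5, 3, 100                      # variable dim, inner dim, sample count
A = rng.standard_normal((n, p, d))       # per-sample inner maps g_i(x) = A_i x
b = rng.standard_normal(p)

def g(x, i):  return A[i] @ x            # stochastic inner value
def Jg(x, i): return A[i]                # stochastic inner Jacobian
def df(y):    return y - b               # outer gradient

def sarah_compositional(x, T=200, lr=0.05, batch=10):
    # Full-batch anchors for the inner value and the gradient estimator.
    y = np.mean([g(x, i) for i in range(n)], axis=0)
    J = np.mean([Jg(x, i) for i in range(n)], axis=0)
    v = J.T @ df(y)
    for _ in range(T):
        x_new = x - lr * v
        idx = rng.integers(0, n, size=batch)
        # Recursive corrections: add the sampled difference between the new
        # and old iterates to the previous estimates (the SARAH idea).
        y_new = y + np.mean([g(x_new, i) - g(x, i) for i in idx], axis=0)
        Jb_new = np.mean([Jg(x_new, i) for i in idx], axis=0)
        Jb_old = np.mean([Jg(x, i) for i in idx], axis=0)
        v = v + Jb_new.T @ df(y_new) - Jb_old.T @ df(y)
        x, y = x_new, y_new
    return x

x_hat = sarah_compositional(np.zeros(d))
```

Because the corrections are differences between consecutive iterates, their variance shrinks as the iterates stabilize, which is the mechanism behind the improved IFO complexity relative to plain stochastic compositional gradient descent.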
Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate
Stochastic multi-level compositional optimization problems cover many new
machine learning paradigms, e.g., multi-step model-agnostic meta-learning,
which require efficient optimization algorithms for large-scale applications.
This paper studies decentralized stochastic multi-level optimization, which is
challenging because the multi-level structure and the
decentralized communication scheme may make the number of levels affect the
order of the convergence rate. To this end, we develop two novel decentralized
optimization algorithms to deal with the multi-level function and its gradient.
Our theoretical results show that both algorithms can achieve the
level-independent convergence rate for nonconvex problems under much milder
conditions compared with existing single-machine algorithms. To the best of our
knowledge, this is the first work that achieves the level-independent
convergence rate under the decentralized setting. Moreover, extensive
experiments confirm the efficacy of our proposed algorithms.
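To make the setting concrete, the decentralized K-level problem described above can be written, in notation assumed here rather than taken from the abstract (m nodes, doubly stochastic mixing matrix W), as

\[
\min_{x\in\mathbb{R}^d}\ \frac{1}{m}\sum_{i=1}^{m} F_i(x),\qquad
F_i(x)=f_i^{(K)}\!\big(f_i^{(K-1)}(\cdots f_i^{(1)}(x)\cdots)\big),\qquad
f_i^{(k)}(\cdot)=\mathbb{E}_{\xi}\big[f_i^{(k)}(\cdot\,;\xi)\big],
\]

where node $i$ alternates a gossip step $x_i \leftarrow \sum_j W_{ij} x_j$ with a local move along an estimate of the chain-rule gradient $\nabla f_i^{(1)}(x_i)\,\nabla f_i^{(2)}(y_i^{(1)})\cdots\nabla f_i^{(K)}(y_i^{(K-1)})$ built from tracked inner values $y_i^{(k)}$. The difficulty the abstract alludes to is keeping the errors of these nested trackers, and of the consensus step, from compounding with the number of levels K.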
Riemannian Stochastic Gradient Method for Nested Composition Optimization
This work considers optimization of composition of functions in a nested form
over Riemannian manifolds, where each function contains an expectation. This
type of problem is gaining popularity in applications such as policy
evaluation in reinforcement learning or model customization in meta-learning.
The standard Riemannian stochastic gradient methods for non-compositional
optimization cannot be directly applied, as the stochastic approximation of inner
functions creates bias in the gradients of the outer functions. For two-level
composition optimization, we present a Riemannian Stochastic Composition
Gradient Descent (R-SCGD) method that finds an approximate stationary point,
with expected squared Riemannian gradient smaller than $\epsilon$, in
$\mathcal{O}(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer
function and stochastic function and gradient oracles of the inner function.
Furthermore, we generalize the R-SCGD algorithms for problems with multi-level
nested compositional structures, with the same complexity of
$\mathcal{O}(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the
performance of the R-SCGD
method is numerically evaluated over a policy evaluation problem in
reinforcement learning.
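As a concrete illustration of the bias issue and the Riemannian machinery, here is a minimal sketch of a two-level compositional stochastic step on the unit sphere in the spirit of R-SCGD. The moving-average inner tracker, the linear/quadratic problem data, the step sizes, and the normalization retraction are illustrative assumptions, not the paper's algorithm verbatim.

```python
# Two-level compositional stochastic step on the unit sphere: a moving average
# tracks the inner expectation E[g(x)], the chain-rule gradient is projected to
# the tangent space, and a normalization retraction keeps the iterate on the
# manifold.
import numpy as np

rng = np.random.default_rng(1)
d, p = 6, 4
G = rng.standard_normal((p, d))          # E[g(x)] = G x; samples are noisy below
c = rng.standard_normal(p)

def sample_g(x):  return (G + 0.1 * rng.standard_normal((p, d))) @ x
def sample_Jg(x): return G + 0.1 * rng.standard_normal((p, d))
def df(y):        return y - c           # gradient of f(y) = 0.5*||y - c||^2

def r_scgd_sphere(T=500, lr=0.05, beta=0.9):
    x = rng.standard_normal(d); x /= np.linalg.norm(x)   # start on the sphere
    y = sample_g(x)                                       # inner-value tracker
    for _ in range(T):
        y = beta * y + (1.0 - beta) * sample_g(x)         # bias-reducing average
        euclid_grad = sample_Jg(x).T @ df(y)              # chain-rule estimate
        riem_grad = euclid_grad - (x @ euclid_grad) * x   # project to tangent space
        x = x - lr * riem_grad
        x /= np.linalg.norm(x)                            # retraction onto sphere
    return x

x_star = r_scgd_sphere()
```

Feeding the tracked value y, rather than a fresh single sample of the inner function, into the outer gradient is what controls the bias the abstract refers to.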
Algorithmic Foundations of Empirical X-risk Minimization
This manuscript introduces a new optimization framework for machine learning
and AI, named {\bf empirical X-risk minimization (EXM)}. X-risk is a term
introduced to represent a family of compositional measures or objectives, in
which each data point is compared with a large number of items explicitly or
implicitly for defining a risk function. It includes surrogate objectives of
many widely used measures and non-decomposable losses, e.g., AUROC, AUPRC,
partial AUROC, NDCG, MAP, precision/recall at top positions, precision at a
certain recall level, listwise losses, p-norm push, top push, global
contrastive losses, etc. While these non-decomposable objectives and their
optimization algorithms have been studied in the machine learning, computer
vision, and information retrieval literature, optimizing these objectives poses
unique challenges for deep learning. In this
paper, we present recent rigorous efforts for EXM with a focus on its
algorithmic foundations and its applications. We introduce a class of
algorithmic techniques for solving EXM with smooth non-convex objectives. We
formulate EXM into three special families of non-convex optimization problems
belonging to non-convex compositional optimization, non-convex min-max
optimization and non-convex bilevel optimization, respectively. For each family
of problems, we present some strong baseline algorithms and their complexities,
which will motivate further research for improving the existing results.
Discussions about the presented results and future studies are given at the
end. Efficient algorithms for optimizing a variety of X-risks are implemented
in the LibAUC library at \url{www.libauc.org}.
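As a small, self-contained example of an X-risk in the AUROC family, the sketch below compares every positive score against every negative score, so the loss cannot be written as an average of per-example terms. The squared-hinge surrogate, the margin value, and the toy scores are assumptions for illustration; this is not the LibAUC API.

```python
# Pairwise squared-hinge surrogate for AUROC: each positive example is compared
# with every negative example, so the objective is non-decomposable over
# individual data points.
import numpy as np

def auroc_surrogate(scores, labels, margin=1.0):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # All positive-negative comparisons: squared hinge on the score gap.
    gaps = margin - (pos[:, None] - neg[None, :])
    return np.mean(np.maximum(gaps, 0.0) ** 2)

rng = np.random.default_rng(2)
labels = rng.integers(0, 2, size=64)
scores = rng.standard_normal(64) + labels   # informative but noisy scores
print(auroc_surrogate(scores, labels))
```

Optimizing such pairwise or listwise objectives at scale, where the inner comparison set is itself sampled, is what motivates the compositional, min-max, and bilevel reformulations discussed in the manuscript.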