Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning
We develop and analyze stochastic approximation algorithms for solving nested
compositional bi-level optimization problems. These problems involve a nested
composition of potentially non-convex smooth functions in the upper-level,
and a smooth and strongly convex function in the lower-level. Our proposed
algorithm does not rely on matrix inversions or mini-batches and can achieve an
$\epsilon$-stationary solution with an oracle complexity of approximately
$\tilde{O}_T(1/\epsilon^2)$, assuming the availability of stochastic
first-order oracles for the individual functions in the composition and the
lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$
hides polylog factors and constants that depend on $T$, the number of levels in
the nested composition. The key challenge we
address in establishing this result relates to handling three distinct sources
of bias in the stochastic gradients. The first source arises from the
compositional nature of the upper-level, the second stems from the bi-level
structure, and the third emerges due to the utilization of Neumann series
approximations to avoid matrix inversion. To demonstrate the effectiveness of
our approach, we apply it to the problem of robust feature learning for deep
neural networks under covariate shift, showcasing the benefits and advantages
of our methodology in that context.
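The Neumann-series device mentioned in this abstract is a standard way to sidestep matrix inversion in bilevel methods: the Hessian-inverse-vector product $H^{-1}v$ is approximated by a truncated geometric series that needs only Hessian-vector products. A minimal sketch of that idea, assuming a user-supplied Hessian-vector-product oracle hvp (the names and constants here are illustrative, not the authors' implementation):

```python
import numpy as np

def neumann_inverse_hvp(hvp, v, eta, K):
    """Approximate H^{-1} v with the truncated Neumann series
    eta * sum_{i=0}^{K} (I - eta H)^i v, which converges when
    ||I - eta H|| < 1 (e.g., H strongly convex and eta small enough).
    hvp(u) must return H @ u without materializing H."""
    term = v.copy()      # (I - eta H)^0 v
    acc = v.copy()       # running partial sum of the series
    for _ in range(K):
        term = term - eta * hvp(term)   # apply (I - eta H) once more
        acc = acc + term
    return eta * acc

# Toy check against a direct solve.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
H = A @ A.T + 5.0 * np.eye(5)    # well-conditioned, positive definite
v = rng.normal(size=5)
approx = neumann_inverse_hvp(lambda u: H @ u, v, eta=0.05, K=500)
print(np.allclose(approx, np.linalg.solve(H, v), atol=1e-4))  # expected: True
```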
Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate
Stochastic multi-level compositional optimization problems cover many new
machine learning paradigms, e.g., multi-step model-agnostic meta-learning,
which require efficient optimization algorithms for large-scale applications.
This paper studies decentralized stochastic multi-level optimization,
which is challenging because the multi-level structure and the decentralized
communication scheme may make the number of levels affect the order of the
convergence rate. To this end, we develop two novel decentralized
optimization algorithms to deal with the multi-level function and its gradient.
Our theoretical results show that both algorithms can achieve the
level-independent convergence rate for nonconvex problems under much milder
conditions compared with existing single-machine algorithms. To the best of our
knowledge, this is the first work that achieves the level-independent
convergence rate under the decentralized setting. Moreover, extensive
experiments confirm the efficacy of our proposed algorithms.
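As background for the multi-level structure discussed above, the standard single-machine building block is a moving-average tracker for each inner function value, which keeps oracle noise from compounding across levels; the decentralized algorithms in this paper additionally mix iterates and trackers with neighbors. A minimal single-machine sketch under assumed noisy value and Jacobian oracles (all names and the toy problem are illustrative):

```python
import numpy as np

def multilevel_sgd(x, values, jacs, steps, lr=0.02, beta=0.9):
    """Single-machine sketch for F(x) = f_L(...f_1(x)...): each tracker
    u[i] is a moving average of the noisy value of level i, and the
    gradient is the chain-rule product of Jacobians evaluated at the
    tracked values. The paper's decentralized algorithms additionally
    gossip (x, u) with neighbors; that part is omitted here."""
    L = len(values)
    u = [None] * L
    for _ in range(steps):
        inp = x
        for i in range(L):
            v = values[i](inp)                      # noisy f_i evaluation
            u[i] = v if u[i] is None else beta * u[i] + (1 - beta) * v
            inp = u[i]
        g = jacs[0](x)                              # Jacobian of f_1 at x
        for i in range(1, L):
            g = jacs[i](u[i - 1]) @ g               # chain rule, level by level
        x = x - lr * g.ravel()
    return x

# Toy two-level usage: F(x) = mean((A x)^2), with small oracle noise.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
values = [lambda z: A @ z + 0.01 * rng.normal(size=3),
          lambda z: np.array([np.mean(z ** 2) + 0.01 * rng.normal()])]
jacs = [lambda z: A, lambda z: (2.0 * z / z.size).reshape(1, -1)]
x = multilevel_sgd(rng.normal(size=4), values, jacs, steps=3000)
print(np.mean((A @ x) ** 2))  # should be close to 0
```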
Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent
Stochastic compositional optimization arises in many important machine learning applications. The objective function is the composition of two expectations of stochastic functions, and it is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2}\varepsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\varepsilon^{-3})$ in the online case. This complexity is the best known among IFO complexity results for non-convex stochastic compositional optimization. Numerical experiments on risk-averse portfolio management validate the superiority of SARAH-Compositional over a few rival algorithms.
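The recursion underlying SARAH-Compositional is the SARAH estimator $v_t = \nabla f(x_t;\xi_t) - \nabla f(x_{t-1};\xi_t) + v_{t-1}$, where the same sample $\xi_t$ is used at both points. A minimal sketch of the plain, non-compositional recursion on a toy least-squares problem, assuming user-supplied gradient oracles; the paper's algorithm applies the same recursion to both the inner mapping and the outer gradient of a composition:

```python
import numpy as np

def sarah(x0, full_grad, pair_grad, epochs, inner, lr):
    """SARAH recursion: each epoch starts from an exact (anchor) gradient,
    then v_t = g(x_t; xi) - g(x_{t-1}; xi) + v_{t-1}, with the SAME sample
    xi at both points, so the estimator's error contracts as the iterates
    stabilize. The plain, non-compositional form is shown for clarity."""
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        v = full_grad(x)
        x_prev, x = x, x - lr * v
        for _ in range(inner):
            g_new, g_old = pair_grad(x, x_prev)   # one sample, two points
            v = g_new - g_old + v
            x_prev, x = x, x - lr * v
    return x

# Toy finite-sum least squares: f(x) = (1/2n) * ||A x - b||^2.
rng = np.random.default_rng(0)
n, d = 200, 10
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
full_grad = lambda x: A.T @ (A @ x - b) / n
def pair_grad(x, x_prev):
    i = rng.integers(n)
    return A[i] * (A[i] @ x - b[i]), A[i] * (A[i] @ x_prev - b[i])
x = sarah(np.zeros(d), full_grad, pair_grad, epochs=20, inner=50, lr=0.05)
print(np.linalg.norm(full_grad(x)))  # gradient norm should be small
```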
Algorithmic Foundations of Empirical X-risk Minimization
This manuscript introduces a new optimization framework for machine learning
and AI, named {\bf empirical X-risk minimization (EXM)}. X-risk is a term
introduced to represent a family of compositional measures or objectives, in
which each data point is compared with a large number of items explicitly or
implicitly for defining a risk function. It includes surrogate objectives of
many widely used measures and non-decomposable losses, e.g., AUROC, AUPRC,
partial AUROC, NDCG, MAP, precision/recall at top positions, precision at a
certain recall level, listwise losses, p-norm push, top push, global
contrastive losses, etc. While these non-decomposable objectives and their
optimization algorithms have been studied in the literature of machine
learning, computer vision, and information retrieval, optimizing these
objectives has encountered some unique challenges for deep learning. In this
paper, we present recent rigorous efforts for EXM with a focus on its
algorithmic foundations and its applications. We introduce a class of
algorithmic techniques for solving EXM with smooth non-convex objectives. We
formulate EXM into three special families of non-convex optimization problems
belonging to non-convex compositional optimization, non-convex min-max
optimization and non-convex bilevel optimization, respectively. For each family
of problems, we present some strong baseline algorithms and their complexities,
which will motivate further research for improving the existing results.
Discussions about the presented results and future studies are given at the
end. Efficient algorithms for optimizing a variety of X-risks are implemented
in the LibAUC library at \url{www.libauc.org}.
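To make concrete the X-risk structure in which "each data point is compared with a large number of items", consider the simplest case, a pairwise AUROC surrogate: the loss couples every positive score with every negative score and therefore does not decompose over individual examples. An illustrative brute-force sketch (not LibAUC's scalable implementation):

```python
import numpy as np

def auroc_surrogate(scores, labels, margin=1.0):
    """Squared-hinge surrogate of 1 - AUROC: every positive score is
    compared against every negative score, so the loss does not
    decompose over individual examples; this is the defining structure
    of an X-risk. This O(n_pos * n_neg) full-pairwise form is only for
    illustration; LibAUC provides scalable stochastic optimizers for
    such objectives."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]              # all pos-neg pairs
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

scores = np.array([1.2, 0.9, 0.1, -0.5])
labels = np.array([1, 0, 1, 0])
print(auroc_surrogate(scores, labels))  # 0.9725 for this toy input
```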
Federated Multi-Level Optimization over Decentralized Networks
Multi-level optimization has gained increasing attention in recent years, as
it provides a powerful framework for solving complex optimization problems that
arise in many fields, such as meta-learning, multi-player games, reinforcement
learning, and nested composition optimization. In this paper, we study the
problem of distributed multi-level optimization over a network, where agents
can only communicate with their immediate neighbors. This setting is motivated
by the need for distributed optimization in large-scale systems, where
centralized optimization may not be practical or feasible. To address this
problem, we propose a novel gossip-based distributed multi-level optimization
algorithm that enables networked agents to solve optimization problems at
different levels in a single timescale and share information through network
propagation. Our algorithm achieves optimal sample complexity, scaling linearly
with the network size, and demonstrates state-of-the-art performance on various
applications, including hyper-parameter tuning, decentralized reinforcement
learning, and risk-averse optimization.
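The gossip scheme described in this abstract follows a well-known pattern: each agent averages its iterate with its immediate neighbors through a doubly stochastic mixing matrix and then takes a local gradient step, with all levels sharing a single timescale. A single-level illustrative skeleton on an assumed four-agent ring (the problem and all names are hypothetical, not the paper's algorithm):

```python
import numpy as np

def gossip_step(X, W, grads, lr):
    """One round of gossip-based decentralized optimization: each agent
    averages its iterate with immediate neighbors through a doubly
    stochastic mixing matrix W (W[i, j] > 0 only for linked agents),
    then takes a local gradient step. X stacks the agents' iterates
    row-wise. The paper runs every level of the multi-level problem on
    this single timescale; only one level is shown here."""
    X = W @ X                                        # neighbor averaging
    G = np.stack([g(x) for g, x in zip(grads, X)])   # local gradients
    return X - lr * G

# Toy ring of 4 agents minimizing the average of f_i(x) = 0.5*||x - c_i||^2;
# the consensus optimum is mean(c_i).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
c = np.arange(8.0).reshape(4, 2)
grads = [lambda x, ci=ci: x - ci for ci in c]
X = np.zeros((4, 2))
for _ in range(300):
    X = gossip_step(X, W, grads, lr=0.1)
print(X)  # rows cluster near mean(c, axis=0) = [3, 4], up to O(lr) consensus error
```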