An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems
Stochastic nonconvex minimax problems have attracted wide attention in
machine learning, signal processing and many other fields in recent years. In
this paper, we propose an accelerated first-order regularized momentum descent
ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax
problems. The iteration complexity of the algorithm is proved to be
$\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ to obtain an
$\varepsilon$-stationary point, which achieves the best-known complexity bound
for single-loop algorithms to solve stochastic nonconvex-concave minimax
problems under the stationarity of the objective function.
Zeroth-Order Alternating Gradient Descent Ascent Algorithms for a Class of Nonconvex-Nonconcave Minimax Problems
In this paper, we consider a class of nonconvex-nonconcave minimax problems,
i.e., NC-PL minimax problems, whose objective functions satisfy the
Polyak-Łojasiewicz (PL) condition with respect to the inner variable. We
propose a zeroth-order alternating gradient descent ascent (ZO-AGDA) algorithm
and a zeroth-order variance reduced alternating gradient descent ascent
(ZO-VRAGDA) algorithm for solving the NC-PL minimax problem under the
deterministic and the stochastic settings, respectively. The number of
iterations for the ZO-AGDA and ZO-VRAGDA algorithms to obtain an
$\varepsilon$-stationary point of the NC-PL minimax problem is upper bounded by
$\mathcal{O}(\varepsilon^{-2})$ and $\mathcal{O}(\varepsilon^{-3})$,
respectively. To the best of our knowledge, they are the first two zeroth-order
algorithms with an iteration complexity guarantee for solving NC-PL minimax
problems.
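The core zeroth-order ingredient is a two-point, function-value-only gradient estimator plugged into alternating descent-ascent steps. A minimal sketch follows; the smoothing radius, step sizes, and toy test function are illustrative assumptions, not the paper's exact ZO-AGDA.

```python
# Sketch of zeroth-order alternating gradient descent ascent on a toy objective.
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, z, mu=1e-3):
    """Two-point Gaussian-smoothing gradient estimator:
    g = (f(z + mu*u) - f(z)) / mu * u, with u ~ N(0, I)."""
    u = rng.standard_normal(z.shape)
    return (f(z + mu * u) - f(z)) / mu * u

def f(x, y):
    # Toy smooth stand-in objective; the NC-PL structure is assumed, not verified.
    return float(x @ x / (1 + x @ x) + 2 * x @ y - y @ y)

x, y = np.ones(3), np.zeros(3)
for _ in range(2000):
    gx = zo_grad(lambda z: f(z, y), x)   # estimate the gradient w.r.t. x, y frozen
    x = x - 0.05 * gx                    # descent step on x
    gy = zo_grad(lambda z: f(x, z), y)   # estimate the gradient w.r.t. y at the new x
    y = y + 0.05 * gy                    # ascent step on y

print("x:", np.round(x, 3), "y:", np.round(y, 3))
```

The alternating order (update x first, then estimate the y-gradient at the new x) is what distinguishes AGDA-type schemes from simultaneous GDA.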
Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds
In this paper, we study a class of useful nonconvex minimax optimization
problems on Riemannian manifolds and propose a class of Riemannian gradient
descent ascent algorithms to solve these minimax problems. Specifically, we
propose a new Riemannian gradient descent ascent (RGDA) algorithm for the
\textbf{deterministic} minimax optimization. Moreover, we prove that the RGDA
has a sample complexity of $\mathcal{O}(\kappa^2\varepsilon^{-2})$ for finding
an $\varepsilon$-stationary point of nonconvex strongly-concave minimax
problems, where $\kappa$ denotes the condition number. At the same time, we
introduce a
Riemannian stochastic gradient descent ascent (RSGDA) algorithm for the
\textbf{stochastic} minimax optimization. In the theoretical analysis, we prove
that the RSGDA can achieve a sample complexity of $\mathcal{O}(\kappa^4\varepsilon^{-4})$.
To further reduce the sample complexity, we propose a novel momentum
variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA)
algorithm based on the momentum-based variance-reduced technique of STORM. We
prove that the MVR-RSGDA algorithm achieves a lower sample complexity of
$\tilde{\mathcal{O}}(\kappa^4\varepsilon^{-3})$ for finding an
$\varepsilon$-stationary point, which reaches
the best known sample complexity for its Euclidean counterpart. Extensive
experimental results on robust training of deep neural networks over the
Stiefel manifold demonstrate the efficiency of our proposed algorithms.
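On a manifold, the descent step first projects the Euclidean gradient onto the tangent space and then maps the update back with a retraction. Below is a minimal sketch of Riemannian GDA on the unit sphere; the paper's experiments use the Stiefel manifold, and the toy objective and step sizes here are assumptions.

```python
# Sketch of Riemannian gradient descent ascent on the unit sphere.
import numpy as np

def riem_grad(x, g):
    """Project a Euclidean gradient onto the tangent space of the sphere at x."""
    return g - (g @ x) * x

def retract(x, v):
    """Retraction by normalization: map a tangent step back onto the sphere."""
    z = x + v
    return z / np.linalg.norm(z)

A = np.diag([3.0, 1.0, -2.0])
x = np.ones(3) / np.sqrt(3.0)        # the primal variable lives on the sphere
y = np.zeros(3)                      # the dual variable lives in Euclidean space

# Toy objective f(x, y) = x^T A y - 0.5*||y||^2, strongly concave in y.
for _ in range(500):
    gx = A @ y                                # Euclidean gradient w.r.t. x
    gy = A.T @ x - y                          # gradient w.r.t. y
    x = retract(x, -0.1 * riem_grad(x, gx))   # Riemannian descent step
    y = y + 0.1 * gy                          # Euclidean ascent step

print("x:", np.round(x, 3), "y:", np.round(y, 3))
```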
Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization
In this paper, we study a class of nonconvex-nonconcave minimax optimization
problems (i.e., $\min_x\max_y f(x,y)$), where $f(x,y)$ is possibly nonconvex
in $x$, and it is nonconcave and satisfies the Polyak-Lojasiewicz (PL)
condition in $y$. Moreover, we propose a class of enhanced momentum-based
gradient
descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic
nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use
various adaptive learning rates in updating the variables $x$ and $y$ without
relying on any global or coordinate-wise adaptive learning rates.
Theoretically, we present an effective convergence analysis framework for our
methods. Specifically, we prove that our MSGDA and AdaMSGDA methods have the
best-known sample (gradient) complexity of $\tilde{\mathcal{O}}(\varepsilon^{-3})$,
requiring only one sample at each loop, in finding an $\varepsilon$-stationary
solution (i.e., $\|\nabla F(x)\|\leq\varepsilon$, where $F(x)=\max_y f(x,y)$).
This manuscript commemorates the mathematician Boris Polyak (1935-2023).
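Reaching an $\varepsilon^{-3}$ rate with one sample per loop is what STORM-style momentum estimators provide: the same fresh sample is evaluated at both the old and new iterates. A minimal single-variable sketch follows; the toy loss and hyperparameters are assumptions, and MSGDA-type methods would apply such a recursion to both the primal and dual variables.

```python
# Sketch of a STORM-style variance-reduced momentum gradient estimator.
import numpy as np

rng = np.random.default_rng(0)

def grad_sample(w, xi):
    """Stochastic gradient of the toy loss ||w||^2 at w, under sample noise xi."""
    return 2 * w + 0.1 * xi

w = np.ones(4)
d = grad_sample(w, rng.standard_normal(4))   # initialize with a single sample
eta, alpha = 0.05, 0.1

for _ in range(1000):
    w_new = w - eta * d                      # step with the current estimate
    xi = rng.standard_normal(4)              # ONE fresh sample per iteration
    # STORM recursion: the same sample evaluated at both the new and old iterates.
    d = grad_sample(w_new, xi) + (1 - alpha) * (d - grad_sample(w, xi))
    w = w_new

print("w:", np.round(w, 4))
```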
Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems
Minimax optimization plays an important role in many machine learning tasks
such as generative adversarial networks (GANs) and adversarial training.
Although recently a wide variety of optimization methods have been proposed to
solve the minimax problems, most of them ignore the distributed setting where
the data is distributed on multiple workers. Meanwhile, the existing
decentralized minimax optimization methods rely on strict assumptions such as
(strong) concavity and variational inequality conditions. Thus, in this paper,
we propose an efficient decentralized momentum-based gradient descent ascent
(DM-GDA) method for distributed nonconvex-PL minimax optimization, where the
objective is nonconvex in the primal variable, and is nonconcave in the dual
variable but satisfies the Polyak-Lojasiewicz (PL) condition. In particular,
our DM-GDA method simultaneously uses the momentum-based techniques to update
variables and estimate the stochastic gradients. Moreover, we provide a solid
convergence analysis for our DM-GDA method, and prove that it obtains a
near-optimal gradient complexity of $\tilde{\mathcal{O}}(\varepsilon^{-3})$ for
finding an $\varepsilon$-stationary solution of nonconvex-PL stochastic minimax
problems, which reaches the lower bound of nonconvex stochastic optimization.
To the best of our knowledge, ours is the first study of decentralized
algorithms for nonconvex-PL stochastic minimax optimization over a network.
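In the decentralized setting, each worker takes a local momentum descent-ascent step and then gossip-averages its iterates with its neighbors through a doubly stochastic mixing matrix. A minimal sketch follows; the ring topology, toy local losses, and step sizes are assumptions, not the paper's exact DM-GDA recursion.

```python
# Sketch of decentralized momentum descent-ascent with gossip averaging.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                # number of workers on a ring

# Doubly stochastic mixing matrix for a 4-node ring topology.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

a = rng.standard_normal(n)           # per-worker data shifting each local loss
X, Y = np.zeros(n), np.zeros(n)      # worker copies of primal/dual variables
VX, VY = np.zeros(n), np.zeros(n)    # momentum buffers
beta, eta = 0.9, 0.05

# Toy local objectives f_i(x, y) = (x - a_i)^2 + x*y - y^2 (concave in y).
for _ in range(500):
    gX = 2 * (X - a) + Y + 0.1 * rng.standard_normal(n)  # local stochastic grads
    gY = X - 2 * Y + 0.1 * rng.standard_normal(n)
    VX = beta * VX + (1 - beta) * gX                     # momentum estimates
    VY = beta * VY + (1 - beta) * gY
    X = W @ (X - eta * VX)           # local descent step + gossip consensus
    Y = W @ (Y + eta * VY)           # local ascent step + gossip consensus

print("consensus x:", np.round(X, 3))   # workers should roughly agree
```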
AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization
In this paper, we propose a class of faster adaptive gradient descent ascent
methods for solving nonconvex-strongly-concave minimax problems by using the
unified adaptive matrices of SUPER-ADAM \citep{huang2021super}. Specifically,
we propose a fast adaptive gradient descent ascent (AdaGDA) method
based on the basic momentum technique, which reaches a low sample complexity of
$\tilde{\mathcal{O}}(\kappa^4\varepsilon^{-4})$ for finding an
$\varepsilon$-stationary point without large batches, and improves the existing
results of adaptive minimax optimization methods by a factor of
$\mathcal{O}(\sqrt{\kappa})$. Moreover, we present an accelerated version of
the AdaGDA (VR-AdaGDA) method based on the momentum-based variance-reduction
technique, which achieves the best-known sample complexity of
$\tilde{\mathcal{O}}(\kappa^{4.5}\varepsilon^{-3})$ for finding an
$\varepsilon$-stationary point without large batches. Further assuming a
bounded Lipschitz parameter of the objective function, we prove that our
VR-AdaGDA method reaches a lower sample complexity of
$\tilde{\mathcal{O}}(\kappa^3\varepsilon^{-3})$ with the mini-batch size
$\mathcal{O}(\kappa^3)$. In particular, we provide an effective convergence
analysis framework for our adaptive methods based on unified adaptive matrices,
which covers almost all existing adaptive learning rates.
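The unified framework replaces the scalar learning rate with an adaptive matrix. Below is a minimal sketch using an AdaGrad-style diagonal choice, which is one admissible instance under such a framework, not AdaGDA itself; the toy objective and hyperparameters are assumptions.

```python
# Sketch of a descent-ascent step preconditioned by a diagonal adaptive matrix.
import numpy as np

rng = np.random.default_rng(0)

def grads(x, y):
    """Noisy gradients of the toy f(x, y) = sum_i x_i^2/(1+x_i^2) + x.y - ||y||^2."""
    gx = 2 * x / (1 + x**2) ** 2 + y + 0.1 * rng.standard_normal(3)
    gy = x - 2 * y + 0.1 * rng.standard_normal(3)
    return gx, gy

x, y = np.ones(3), np.zeros(3)
mx, my = np.zeros(3), np.zeros(3)        # momentum buffers
sx = np.zeros(3)                         # squared-gradient accumulator for x
beta, eta_x, eta_y, rho = 0.9, 0.5, 0.05, 1e-6

for _ in range(2000):
    gx, gy = grads(x, y)
    mx = beta * mx + (1 - beta) * gx     # momentum for the primal update
    my = beta * my + (1 - beta) * gy
    sx += gx**2                          # AdaGrad-style accumulator
    A = np.sqrt(sx) + rho                # diagonal adaptive matrix A_t
    x = x - eta_x * mx / A               # adaptive (preconditioned) descent on x
    y = y + eta_y * my                   # plain ascent step on y

print("x:", np.round(x, 3), "y:", np.round(y, 3))
```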
Projection-Free Methods for Solving Nonconvex-Concave Saddle Point Problems
In this paper, we investigate a class of constrained saddle point (SP)
problems where the objective function is nonconvex-concave and smooth. This
class of problems has wide applicability in machine learning, including robust
multi-class classification and dictionary learning. Several projection-based
primal-dual methods have been developed for tackling this problem; however, the
availability of methods with projection-free oracles remains limited. To
address this gap, we propose efficient single-loop projection-free methods
reliant on first-order information. In particular, using regularization and
nested approximation techniques, we propose a primal-dual conditional gradient
method that solely employs linear minimization oracles to handle constraints.
Assuming that the constraint set in the maximization is strongly convex, our
method achieves an $\varepsilon$-stationary solution within
iterations. When the projection onto the
constraint set of the maximization is easy to compute, we propose a one-sided
projection-free method that achieves an $\varepsilon$-stationary solution within
iterations. Moreover, we present improved
iteration complexities of our methods under a strong concavity assumption. To
the best of our knowledge, our proposed algorithms are among the first
projection-free methods with convergence guarantees for solving
nonconvex-concave SP problems.
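The one-sided idea: handle the hard primal constraint with a linear minimization oracle (a Frank-Wolfe step) and the easy dual constraint with a cheap projection. A minimal sketch over an $\ell_1$-ball and a box follows; the toy objective and step schedules are assumptions, not the paper's exact methods, which additionally use regularization and nested approximation.

```python
# Sketch of a one-sided projection-free descent-ascent step:
# LMO (Frank-Wolfe) on the primal constraint, projection on the dual.
import numpy as np

def lmo_l1(g, r=1.0):
    """Linear minimization oracle over the l1-ball of radius r:
    argmin_{||s||_1 <= r} <g, s> is a signed vertex of the ball."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g)
    s[i] = -r * np.sign(g[i])
    return s

def proj_box(y, lo=-1.0, hi=1.0):
    """Easy projection of the dual variable onto a box."""
    return np.clip(y, lo, hi)

A = np.array([[1.0, 2.0], [-1.0, 0.5]])
x = np.array([0.5, -0.5])            # starts inside the l1-ball
y = np.zeros(2)

# Toy objective f(x, y) = y^T A x - 0.5*||y||^2 + sum_i x_i^2/(1+x_i^2).
for t in range(1, 301):
    gx = A.T @ y + 2 * x / (1 + x**2) ** 2   # gradient w.r.t. x
    s = lmo_l1(gx)                           # Frank-Wolfe vertex, no projection
    gamma = 2.0 / (t + 2)                    # classic FW step size
    x = (1 - gamma) * x + gamma * s          # stays in the l1-ball for free
    y = proj_box(y + 0.1 * (A @ x - y))      # projected gradient ascent on y

print("x:", np.round(x, 3), "y:", np.round(y, 3))
```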