Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems
Minimax optimization plays an important role in many machine learning tasks
such as generative adversarial networks (GANs) and adversarial training.
Although a wide variety of optimization methods have recently been proposed to
solve minimax problems, most of them ignore the distributed setting, where the
data is distributed across multiple workers. Meanwhile, the existing
decentralized minimax optimization methods rely on strict assumptions such as
(strong) concavity or variational inequality conditions. Thus, in this paper,
we propose an efficient decentralized momentum-based gradient descent ascent
(DM-GDA) method for distributed nonconvex-PL minimax optimization, which is
nonconvex in the primal variable and nonconcave in the dual variable while
satisfying the Polyak-Lojasiewicz (PL) condition. In particular,
our DM-GDA method simultaneously uses momentum-based techniques to update the
variables and to estimate the stochastic gradients. Moreover, we provide a
solid convergence analysis for our DM-GDA method and prove that it obtains a
near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an
$\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems,
which reaches the lower bound of nonconvex stochastic optimization. To the
best of our knowledge, this is the first study of decentralized algorithms for
nonconvex-PL stochastic minimax optimization over a network.
Comment: 31 pages
Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds
In this paper, we study a class of useful non-convex minimax optimization
problems on Riemannian manifolds and propose a class of Riemannian gradient
descent ascent algorithms to solve these minimax problems. Specifically, we
propose a new Riemannian gradient descent ascent (RGDA) algorithm for
\textbf{deterministic} minimax optimization. Moreover, we prove that the RGDA
has a sample complexity of $O(\kappa^2\epsilon^{-2})$ for finding an
$\epsilon$-stationary point of nonconvex strongly-concave minimax problems,
where $\kappa$ denotes the condition number. At the same time, we introduce a
Riemannian stochastic gradient descent ascent (RSGDA) algorithm for
\textbf{stochastic} minimax optimization. In the theoretical analysis, we prove
that the RSGDA can achieve a sample complexity of $O(\kappa^4\epsilon^{-4})$.
To further reduce the sample complexity, we propose a novel momentum
variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA)
algorithm based on the momentum-based variance-reduction technique of STORM. We
prove that the MVR-RSGDA algorithm achieves a lower sample complexity of
$\tilde{O}(\kappa^4\epsilon^{-3})$, which reaches the best known sample
complexity of its Euclidean counterpart. Extensive experimental results on
robust deep neural network training over the Stiefel manifold demonstrate the
efficiency of our proposed algorithms.
Comment: 32 pages. We have updated the theoretical results of our methods in
this new revision; e.g., our MVR-RSGDA algorithm achieves a lower sample
complexity. arXiv admin note: text overlap with arXiv:2008.0817
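To make the Riemannian ingredients concrete, here is a hypothetical sketch of gradient descent ascent on the unit sphere: the Riemannian gradient is the Euclidean gradient projected onto the tangent space, and renormalization serves as the retraction. The toy bilinear objective, step sizes, and iteration count are assumptions, not the paper's setup:

```python
import numpy as np

def rgda_sphere_sketch(A, steps=3000, eta_x=0.02, eta_y=0.5):
    # Toy problem: min_{x on the unit sphere} max_y  y^T A x - 0.5*||y||^2.
    # The inner maximum is attained at y = A x, so the problem reduces to
    # minimizing 0.5*||A x||^2 over the sphere, solved by the smallest right
    # singular vector of A.
    rng = np.random.default_rng(1)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    y = np.zeros(A.shape[0])
    for _ in range(steps):
        rgrad = A.T @ y                    # Euclidean gradient in x
        rgrad = rgrad - (x @ rgrad) * x    # project onto the tangent space at x
        x = x - eta_x * rgrad              # descent step along the tangent space
        x /= np.linalg.norm(x)             # retraction back onto the sphere
        y = y + eta_y * (A @ x - y)        # ascent on the strongly concave dual
    return x
```

On the Stiefel manifold the same template applies, with the projection and retraction replaced by their Stiefel counterparts (e.g., a QR-based retraction).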
Uncertainties in stochastic programming models: The minimax approach
Fifty years ago, stochastic programming was introduced to deal with uncertain values of coefficients observed in applications of mathematical programming. These uncertainties were modeled as random, and the assumption of complete knowledge of the probability distribution of the random parameters became standard. Hence, there is a new type of uncertainty, concerning the probability distribution itself. Using a hypothetical, ad hoc distribution may lead to bad, costly decisions. Besides a subsequent output analysis, it pays to include the existing, possibly limited information in the model; cf. the minimax approach, which is the main item of this presentation. It applies to cases when the probability distribution is only known to belong to a specified class of probability distributions and one wishes to hedge against the least favorable distribution. The minimax approach has been developed for special types of stochastic programs and special choices of the class of probability distributions, and there are recent results aiming at algorithmic solution of minimax problems and at stability properties of minimax solutions.
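A tiny, invented example of the hedging idea: when the distribution is only known to lie in a finite class, one can minimize the worst-case expected loss over that class. The newsvendor-style loss and the candidate distributions below are illustrative assumptions, not from the presentation itself:

```python
import numpy as np

def minimax_newsvendor(order_grid, demand_support, dists, cost=1.0, price=2.0):
    # Hedge against the least favorable distribution in a finite class:
    #   min_q  max_{P in class}  E_{d~P}[ cost*q - price*min(q, d) ].
    worst = []
    for q in order_grid:
        loss = cost * q - price * np.minimum(q, demand_support)  # loss per demand level
        worst.append(max(probs @ loss for probs in dists))       # least favorable P
    worst = np.array(worst)
    i = int(np.argmin(worst))
    return order_grid[i], worst[i]
```

For richer classes (e.g., distributions constrained only by moments), finding the least favorable distribution is itself an optimization problem rather than a finite maximum.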
Adaptive Federated Minimax Optimization with Lower Complexities
Federated learning is a popular distributed and privacy-preserving machine
learning paradigm. Meanwhile, minimax optimization, as an effective form of
hierarchical optimization, is widely applied in machine learning. Recently,
some federated optimization methods have been proposed to solve distributed
minimax problems. However, these federated minimax methods still suffer from
high gradient and communication complexities. Meanwhile, few algorithms focus
on using adaptive learning rates to accelerate convergence. To fill this gap,
in this paper we study a class of nonconvex minimax optimization problems and
propose an efficient adaptive federated minimax optimization algorithm (i.e.,
AdaFGDA) to solve these distributed minimax problems. Specifically, our
AdaFGDA builds on momentum-based variance-reduction and local-SGD techniques,
and it can flexibly incorporate various adaptive learning rates through a
unified adaptive matrix. Theoretically, we provide a solid convergence
analysis framework for our AdaFGDA algorithm under the non-i.i.d. setting.
Moreover, we prove that our algorithms obtain a lower gradient (i.e.,
stochastic first-order oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$
with a lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding
an $\epsilon$-stationary point of nonconvex minimax problems. Experimentally,
we conduct experiments on deep AUC maximization and robust neural network
training tasks to verify the efficiency of our algorithms.
Comment: Submitted to AISTATS-202
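A minimal, single-machine sketch of what a unified adaptive matrix can look like: below, a diagonal RMSProp-style second-moment matrix rescales the primal step, with plain ascent on the dual. The toy setup and hyperparameters are assumptions, and AdaFGDA's variance reduction and local updates across clients are omitted:

```python
import numpy as np

def ada_gda_sketch(grad_fn, x0, y0, steps=3000, eta_x=0.05, eta_y=0.1,
                   beta=0.9, rho=0.99, eps=1e-8):
    # Gradient descent ascent where the primal step is preconditioned by a
    # diagonal adaptive matrix A_t = diag(sqrt(a_t) + eps); other adaptive
    # learning rates correspond to other choices of A_t.
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    m = np.zeros_like(x)                   # momentum for the primal gradient
    a = np.zeros_like(x)                   # second-moment accumulator
    for _ in range(steps):
        gx, gy = grad_fn(x, y)
        m = beta * m + (1 - beta) * gx
        a = rho * a + (1 - rho) * gx ** 2
        x = x - eta_x * m / (np.sqrt(a) + eps)   # adaptive-matrix descent step
        y = y + eta_y * gy                       # plain ascent step on the dual
    return x, y
```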
Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates
Distributed and federated learning algorithms and techniques are associated
primarily with minimization problems. However, with the growth of minimax
optimization and variational inequality problems in machine learning, the
necessity of designing efficient distributed/federated learning approaches for
these problems is becoming more apparent. In this paper, we provide a unified
convergence analysis of communication-efficient local training methods for
distributed variational inequality problems (VIPs). Our approach is based on a
general key assumption on the stochastic estimates that allows us to propose
and analyze several novel local training algorithms under a single framework
for solving a class of structured non-monotone VIPs. We present the first
local gradient descent-ascent algorithms with provably improved communication
complexity for solving distributed variational inequalities on heterogeneous
data. The general algorithmic framework recovers state-of-the-art algorithms
and their sharp convergence guarantees when the setting is specialized to
minimization or minimax optimization problems. Finally, we demonstrate the
strong performance of the proposed algorithms compared to state-of-the-art
methods when solving federated minimax optimization problems.
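The local-training template can be sketched as follows, assuming plain local gradient descent-ascent steps with periodic server averaging; the paper's algorithms add further components on top of this basic scheme, and the client objectives and step sizes below are invented for illustration:

```python
import numpy as np

def local_gda_sketch(client_grads, x0, y0, rounds=200, local_steps=5, eta=0.05):
    # Each client runs `local_steps` gradient descent-ascent updates on its
    # own objective; the server then averages the iterates and broadcasts
    # them back (communication once per round, not per step).
    n = len(client_grads)
    xs = [np.asarray(x0, dtype=float) for _ in range(n)]
    ys = [np.asarray(y0, dtype=float) for _ in range(n)]
    for _ in range(rounds):
        for i, grad_fn in enumerate(client_grads):
            for _ in range(local_steps):
                gx, gy = grad_fn(xs[i], ys[i])
                xs[i] = xs[i] - eta * gx   # local descent on the primal
                ys[i] = ys[i] + eta * gy   # local ascent on the dual
        x_avg = sum(xs) / n                # server aggregation
        y_avg = sum(ys) / n
        xs = [x_avg.copy() for _ in range(n)]
        ys = [y_avg.copy() for _ in range(n)]
    return x_avg, y_avg
```

With heterogeneous clients, local steps introduce client drift toward each client's own solution; in this linear toy case the drifts cancel under averaging, and the iterates converge to the saddle point of the average objective.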