
    Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems

    Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training. Although a wide variety of optimization methods have recently been proposed to solve minimax problems, most of them ignore the distributed setting, where the data are spread across multiple workers. Meanwhile, existing decentralized minimax optimization methods rely on strict assumptions such as (strong) concavity or variational inequality conditions. In this paper, we therefore propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for distributed nonconvex-PL minimax optimization, which is nonconvex in the primal variable and nonconcave in the dual variable while satisfying the Polyak-Lojasiewicz (PL) condition. In particular, our DM-GDA method simultaneously uses momentum-based techniques to update the variables and to estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for our DM-GDA method and prove that it obtains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems, which matches the lower bound of nonconvex stochastic optimization. To the best of our knowledge, this is the first study of a decentralized algorithm for nonconvex-PL stochastic minimax optimization over a network. Comment: 31 pages
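    The momentum idea behind methods of this kind can be illustrated on a toy saddle problem. The sketch below is a hypothetical single-machine simplification, not the authors' decentralized DM-GDA: a STORM-style momentum estimator tracks both the descent and ascent gradients, and the objective, step sizes, and noise model are all illustrative assumptions.

```python
import numpy as np

def grads(x, y):
    # toy saddle objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
    return x + y, x - y

def momentum_gda(steps=2000, eta=0.05, beta=0.1, sigma=0.01, seed=0):
    """Single-machine sketch of momentum-based gradient descent ascent:
    STORM-style estimators v, w track the primal/dual stochastic gradients."""
    rng = np.random.default_rng(seed)
    x, y = 1.0, -1.0
    v, w = grads(x, y)                       # initialize with full gradients
    for _ in range(steps):
        x_new = x - eta * v                  # descent on the primal variable
        y_new = y + eta * w                  # ascent on the dual variable
        nx, ny = rng.normal(0.0, sigma, 2)   # shared-sample gradient noise
        gx_new, gy_new = grads(x_new, y_new)
        gx_old, gy_old = grads(x, y)
        # STORM correction: new stochastic gradient plus a momentum term
        # evaluated on the same noise sample at the old iterate
        v = gx_new + nx + (1 - beta) * (v - gx_old - nx)
        w = gy_new + ny + (1 - beta) * (w - gy_old - ny)
        x, y = x_new, y_new
    return x, y
```

    On this toy problem the iterates drift toward the saddle at the origin; using the same noise sample in both gradient evaluations is what gives the estimator its variance-reduction effect.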

    Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds

    In this paper, we study a class of useful non-convex minimax optimization problems on Riemannian manifolds and propose a class of Riemannian gradient descent ascent algorithms to solve them. Specifically, we propose a new Riemannian gradient descent ascent (RGDA) algorithm for \textbf{deterministic} minimax optimization. Moreover, we prove that the RGDA has a sample complexity of $O(\kappa^2\epsilon^{-2})$ for finding an $\epsilon$-stationary point of nonconvex strongly-concave minimax problems, where $\kappa$ denotes the condition number. At the same time, we introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for \textbf{stochastic} minimax optimization. In the theoretical analysis, we prove that the RSGDA achieves a sample complexity of $O(\kappa^3\epsilon^{-4})$. To further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on the momentum-based variance-reduction technique of STORM. We prove that the MVR-RSGDA algorithm achieves a lower sample complexity of $\tilde{O}(\kappa^{3-\nu/2}\epsilon^{-3})$ for $\nu \geq 0$, which matches the best known sample complexity of its Euclidean counterpart. Extensive experimental results on robust deep neural network training over the Stiefel manifold demonstrate the efficiency of our proposed algorithms. Comment: 32 pages. We have updated the theoretical results of our methods in this revision; e.g., our MVR-RSGDA algorithm now achieves a lower sample complexity. arXiv admin note: text overlap with arXiv:2008.0817
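    The two ingredients of a Riemannian gradient step — projecting the Euclidean gradient onto the tangent space, then retracting back onto the manifold — can be sketched on the unit sphere, an illustrative stand-in for the Stiefel manifold used in the paper. The objective, step sizes, and problem dimensions below are assumptions for the toy example, not the authors' setup.

```python
import numpy as np

def riemannian_gda(M, steps=3000, eta_x=0.05, eta_y=0.2, seed=0):
    """Sketch of Riemannian gradient descent ascent on the unit sphere:
    min over x in S^{d-1}, max over y in R^d of
    f(x, y) = x^T M y - 0.5*||y||^2."""
    rng = np.random.default_rng(seed)
    d = M.shape[0]
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                # start on the sphere
    y = np.zeros(d)
    for _ in range(steps):
        gx = M @ y                        # Euclidean gradient in x
        gx = gx - (gx @ x) * x            # project onto the tangent space at x
        x = x - eta_x * gx                # Riemannian descent step
        x = x / np.linalg.norm(x)         # retraction back to the sphere
        y = y + eta_y * (M.T @ x - y)     # Euclidean ascent step in y
    return x, y
```

    For this choice of f, the inner maximization gives y* = M^T x, so the outer minimization drives x toward the direction of the smallest singular value of M^T.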

    Uncertainties in stochastic programming models: The minimax approach

    Fifty years ago, stochastic programming was introduced to deal with uncertain values of coefficients observed in applications of mathematical programming. These uncertainties were modeled as random, and the assumption of complete knowledge of the probability distribution of the random parameters became standard. Hence, there is a new type of uncertainty, concerning the probability distribution itself. Using a hypothetical, ad hoc distribution may lead to bad, costly decisions. Besides subsequent output analysis, it pays to include the existing, possibly limited information in the model; cf. the minimax approach, which is the main topic of this presentation. It applies to cases where the probability distribution is only known to belong to a specified class of probability distributions and one wishes to hedge against the least favorable distribution. The minimax approach has been developed for special types of stochastic programs and special choices of the class of probability distributions, and there are recent results aiming at algorithmic solution of minimax problems and at stability properties of minimax solutions
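    A minimal illustration of hedging against the least favorable distribution, on a hypothetical newsvendor-style problem (the demand values, cost coefficients, and the finite ambiguity set are all invented for the example):

```python
import numpy as np

# Demand takes one of three values; the true distribution is only known
# to lie in a small finite ambiguity set of candidate distributions.
demand_values = np.array([10, 20, 30])
ambiguity_set = [
    np.array([0.5, 0.3, 0.2]),       # low-demand scenario
    np.array([0.2, 0.3, 0.5]),       # high-demand scenario
    np.array([1/3, 1/3, 1/3]),       # uniform scenario
]

def cost(order, demand, c=1.0, p=2.0):
    # purchase cost plus penalty p per unit of unmet demand
    return c * order + p * np.maximum(demand - order, 0.0)

def worst_case_cost(order):
    # expected cost under the least favorable distribution in the set
    return max(probs @ cost(order, demand_values) for probs in ambiguity_set)

# minimax decision: minimize the worst-case expected cost
orders = np.arange(0, 41)
best_order = orders[np.argmin([worst_case_cost(q) for q in orders])]
```

    Here the high-demand scenario is least favorable, so the minimax decision orders against it rather than against any single assumed distribution.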

    Adaptive Federated Minimax Optimization with Lower complexities

    Federated learning is a popular distributed and privacy-preserving machine learning paradigm. Meanwhile, minimax optimization, as an effective hierarchical optimization, is widely applied in machine learning. Recently, some federated optimization methods have been proposed to solve distributed minimax problems. However, these federated minimax methods still suffer from high gradient and communication complexities. Meanwhile, few algorithms focus on using adaptive learning rates to accelerate convergence. To fill this gap, in this paper we study a class of nonconvex minimax optimization problems and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve them. Specifically, our AdaFGDA builds on momentum-based variance-reduction and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates through a unified adaptive matrix. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under the non-i.i.d. setting. Moreover, we prove that our algorithms obtain lower gradient (i.e., stochastic first-order oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point of the nonconvex minimax problems. Experimentally, we conduct experiments on deep AUC maximization and robust neural network training tasks to verify the efficiency of our algorithms. Comment: Submitted to AISTATS-202
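    The role of an adaptive matrix in a gradient descent ascent loop can be sketched with an AdaGrad-style diagonal preconditioner on a toy saddle problem. This is a single-machine stand-in, not the authors' federated AdaFGDA; the objective, step sizes, and the choice of AdaGrad accumulation are illustrative assumptions.

```python
def adaptive_gda(steps=4000, gamma=0.5, lam=0.2, eps=1e-8):
    """Sketch of gradient descent ascent with an AdaGrad-style adaptive
    matrix on the primal step, for the toy saddle
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2."""
    x, y = 2.0, -1.0
    a = 0.0                                      # accumulated squared primal gradient
    for _ in range(steps):
        gx, gy = x + y, x - y                    # primal / dual gradients
        a += gx ** 2                             # AdaGrad-style accumulation
        x = x - gamma * gx / (a ** 0.5 + eps)    # preconditioned descent step
        y = y + lam * gy                         # plain ascent step
    return x, y
```

    In the one-dimensional case the adaptive matrix is just the scalar sqrt(a); in higher dimensions it becomes a diagonal (or more general) matrix that rescales each coordinate's step.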

    Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

    Distributed and federated learning algorithms and techniques are associated primarily with minimization problems. However, with the growth of minimax optimization and variational inequality problems in machine learning, the need to design efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-ascent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems
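    The local-training pattern the paper analyzes — each worker runs several local steps between communication rounds — can be sketched on a toy heterogeneous saddle-point problem. The local objectives, step sizes, and round counts below are invented for illustration; this is a plain local gradient descent ascent loop, not any specific algorithm from the paper.

```python
import numpy as np

def local_gda(num_workers=4, rounds=50, local_steps=10, eta=0.05, seed=0):
    """Sketch of a communication-efficient local method: worker i holds
    f_i(x, y) = 0.5*a_i*x**2 + b_i*x*y - 0.5*y**2, runs `local_steps`
    gradient descent ascent steps from the shared point, then the server
    averages the iterates (one communication per round)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.5, 1.5, num_workers)   # heterogeneous local curvatures
    b = rng.uniform(0.5, 1.5, num_workers)   # heterogeneous coupling terms
    x_global, y_global = 2.0, -1.0
    for _ in range(rounds):
        xs, ys = [], []
        for i in range(num_workers):
            x, y = x_global, y_global        # start from the shared point
            for _ in range(local_steps):
                gx = a[i] * x + b[i] * y
                gy = b[i] * x - y
                x, y = x - eta * gx, y + eta * gy
            xs.append(x)
            ys.append(y)
        # one communication round: average the local iterates
        x_global, y_global = np.mean(xs), np.mean(ys)
    return x_global, y_global
```

    In this toy instance every local objective shares the saddle at the origin, so averaging drives the global iterate there; with genuinely heterogeneous optima, client drift during the local steps is exactly what the unified analyses of such methods must control.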