
    An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems

    Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ to obtain an $\varepsilon$-stationary point, which achieves the best-known complexity bound for single-loop algorithms solving stochastic nonconvex-concave minimax problems under the stationarity of the objective function.
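
The single-loop structure these methods share can be sketched in a few lines. The snippet below is NOT the FORMDA algorithm itself (which adds regularization, momentum, and acceleration); it is a minimal deterministic gradient descent ascent loop on a toy objective f(x, y) = x*y - 0.1*y^2 chosen by us for illustration, with all step sizes assumed.

```python
# Minimal single-loop gradient descent ascent (GDA) sketch.
# Toy objective (our assumption): f(x, y) = x*y - 0.1*y**2,
# concave in y, with its stationary point at the origin.

def gda(x0, y0, eta_x=0.05, eta_y=0.05, iters=4000):
    """Simultaneous descent step in x and ascent step in y."""
    x, y = x0, y0
    for _ in range(iters):
        gx = y              # df/dx
        gy = x - 0.2 * y    # df/dy
        x -= eta_x * gx     # descent on the minimization variable
        y += eta_y * gy     # ascent on the maximization variable
    return x, y

x, y = gda(1.0, 1.0)
```

With these small steps the iterates spiral into the stationary point (0, 0); single-loop methods like FORMDA refine exactly this update pattern with momentum and regularization to handle stochastic gradients.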

    Zeroth-Order Alternating Gradient Descent Ascent Algorithms for a Class of Nonconvex-Nonconcave Minimax Problems

    In this paper, we consider a class of nonconvex-nonconcave minimax problems, i.e., NC-PL minimax problems, whose objective functions satisfy the Polyak-Łojasiewicz (PL) condition with respect to the inner variable. We propose a zeroth-order alternating gradient descent ascent (ZO-AGDA) algorithm and a zeroth-order variance-reduced alternating gradient descent ascent (ZO-VRAGDA) algorithm for solving NC-PL minimax problems under the deterministic and the stochastic setting, respectively. The number of iterations for ZO-AGDA and ZO-VRAGDA to obtain an $\epsilon$-stationary point of an NC-PL minimax problem is upper bounded by $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-3})$, respectively. To the best of our knowledge, they are the first two zeroth-order algorithms with an iteration complexity guarantee for solving NC-PL minimax problems.
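
The defining feature of zeroth-order methods is that gradients are never queried, only function values. A minimal sketch of an alternating GDA loop driven by two-point (central) finite differences is below; the toy quadratic objective, the smoothing radius mu, and the step sizes are our assumptions, not the paper's setting.

```python
# Zeroth-order alternating GDA sketch: each gradient is replaced by a
# central finite difference of function values.
# Toy objective (our assumption): f(x, y) = 0.5*x**2 + x*y - y**2.

def f(x, y):
    return 0.5 * x * x + x * y - y * y

def zo_grad_x(x, y, mu=1e-3):
    # two-point estimate of df/dx from function values only
    return (f(x + mu, y) - f(x - mu, y)) / (2 * mu)

def zo_grad_y(x, y, mu=1e-3):
    # two-point estimate of df/dy from function values only
    return (f(x, y + mu) - f(x, y - mu)) / (2 * mu)

def zo_agda(x, y, eta=0.05, iters=500):
    for _ in range(iters):
        x = x - eta * zo_grad_x(x, y)   # zeroth-order descent step in x
        y = y + eta * zo_grad_y(x, y)   # alternating: y uses the updated x
    return x, y

x, y = zo_agda(1.0, 1.0)
```

On this quadratic the central difference is exact, so the loop behaves like alternating GDA and converges to the stationary point at the origin; in general the estimator carries an O(mu^2) bias, which the paper's complexity analysis accounts for.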

    Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds

    In this paper, we study a class of useful nonconvex minimax optimization problems on Riemannian manifolds and propose a class of Riemannian gradient descent ascent algorithms to solve them. Specifically, we propose a new Riemannian gradient descent ascent (RGDA) algorithm for deterministic minimax optimization. We prove that the RGDA has a sample complexity of $O(\kappa^2\epsilon^{-2})$ for finding an $\epsilon$-stationary point of nonconvex strongly-concave minimax problems, where $\kappa$ denotes the condition number. At the same time, we introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for stochastic minimax optimization, and prove that RSGDA achieves a sample complexity of $O(\kappa^3\epsilon^{-4})$. To further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on the momentum-based variance-reduction technique of STORM. We prove that MVR-RSGDA achieves a lower sample complexity of $\tilde{O}(\kappa^{(3-\nu/2)}\epsilon^{-3})$ for $\nu \geq 0$, which matches the best known sample complexity of its Euclidean counterpart. Extensive experimental results on robust deep neural network training over the Stiefel manifold demonstrate the efficiency of our proposed algorithms. Comment: 32 pages. We have updated the theoretical results of our methods in this revision; e.g., our MVR-RSGDA algorithm achieves a lower sample complexity. arXiv admin note: text overlap with arXiv:2008.0817
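
The Riemannian twist on GDA is that the x-step must respect the manifold: the Euclidean gradient is projected onto the tangent space, and the iterate is pulled back by a retraction. A minimal sketch on the unit sphere (a simple stand-in for the Stiefel manifold used in the paper's experiments) is below; the toy objective f(x, y) = (a . x) y - 0.5 y^2 and all constants are our assumptions.

```python
# Riemannian GDA sketch on the unit sphere in R^3.
# Toy objective (our assumption): f(x, y) = (a . x)*y - 0.5*y**2,
# so max_y f gives F(x) = 0.5*(a . x)**2, minimized when x is orthogonal to a.
import numpy as np

a = np.array([1.0, 0.0, 0.0])

def rgda(x, y, eta_x=0.1, eta_y=0.5, iters=500):
    for _ in range(iters):
        g = a * y                      # Euclidean gradient of f w.r.t. x
        rg = g - (g @ x) * x           # project onto the tangent space at x
        x = x - eta_x * rg             # Riemannian descent step...
        x = x / np.linalg.norm(x)      # ...followed by retraction to the sphere
        y = y + eta_y * (a @ x - y)    # plain ascent step in the Euclidean variable
    return x, y

x0 = np.array([0.6, 0.8, 0.0])         # a unit vector as the starting point
x, y = rgda(x0, 0.0)
```

The iterate stays exactly on the manifold at every step (norm 1 by construction) while x . a is driven toward zero; on the Stiefel manifold the projection and retraction are matrix-valued but play the same roles.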

    Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization

    In this paper, we study a class of nonconvex-nonconcave minimax optimization problems (i.e., $\min_x\max_y f(x,y)$), where $f(x,y)$ is possibly nonconvex in $x$, and is nonconcave in $y$ but satisfies the Polyak-Łojasiewicz (PL) condition in $y$. Moreover, we propose a class of enhanced momentum-based gradient descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use various adaptive learning rates to update the variables $x$ and $y$ without relying on any global or coordinate-wise adaptive learning rates. Theoretically, we present an effective convergence analysis framework for our methods. Specifically, we prove that our MSGDA and AdaMSGDA methods achieve the best known sample (gradient) complexity of $O(\epsilon^{-3})$, requiring only one sample per loop, in finding an $\epsilon$-stationary solution (i.e., $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$, where $F(x)=\max_y f(x,y)$). This manuscript commemorates the mathematician Boris Polyak (1935-2023). Comment: 30 pages
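
The core mechanism of momentum-based GDA methods is that each variable steps along a running average of its gradients rather than the raw gradient, which tames stochastic noise. A minimal deterministic sketch in the spirit of MSGDA is below; the toy objective f(x, y) = x*y - y^2 and the momentum constant are our assumptions, not the paper's algorithm.

```python
# Momentum-based GDA sketch: exponential moving averages of the gradients.
# Toy objective (our assumption): f(x, y) = x*y - y**2.

def msgda(x, y, lr=0.05, beta=0.9, iters=3000):
    mx = my = 0.0                        # momentum buffers for x and y
    for _ in range(iters):
        gx = y                           # df/dx
        gy = x - 2.0 * y                 # df/dy
        mx = beta * mx + (1 - beta) * gx # average the descent direction
        my = beta * my + (1 - beta) * gy # average the ascent direction
        x -= lr * mx                     # step along the averaged gradient
        y += lr * my
    return x, y

x, y = msgda(1.0, 1.0)
```

In the stochastic setting the same buffers act as cheap variance reducers, which is how MSGDA-type methods reach the $O(\epsilon^{-3})$ complexity with a single sample per iteration.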

    Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems

    Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training. Although a wide variety of optimization methods have recently been proposed to solve minimax problems, most of them ignore the distributed setting where the data is distributed across multiple workers. Meanwhile, the existing decentralized minimax optimization methods rely on strict assumptions such as (strong) concavity or variational inequality conditions. Thus, in this paper, we propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for distributed nonconvex-PL minimax optimization, where the objective is nonconvex in the primal variable and nonconcave in the dual variable but satisfies the Polyak-Łojasiewicz (PL) condition. In particular, our DM-GDA method simultaneously uses momentum-based techniques to update the variables and estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for our DM-GDA method, and prove that it obtains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of nonconvex-PL stochastic minimax problems, which reaches the lower bound of nonconvex stochastic optimization. To the best of our knowledge, this is the first study of decentralized algorithms for nonconvex-PL stochastic minimax optimization over a network. Comment: 31 pages
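
The communication pattern of decentralized GDA is worth seeing concretely: each worker holds local copies of both variables, mixes them with neighbors via a doubly stochastic matrix, then takes a local descent/ascent step on its own data. The sketch below shows only that pattern (momentum buffers omitted for brevity); the three local objectives, the fully connected mixing matrix, and the step size are our assumptions, not the paper's DM-GDA method.

```python
# Decentralized GDA sketch over 3 workers (momentum omitted for brevity).
# Local objectives (our assumption): f_i(x, y) = a_i*x*y - y**2, whose
# average is f(x, y) = x*y - y**2 with stationary point at the origin.
import numpy as np

a = np.array([0.5, 1.0, 1.5])        # each worker's local problem data
W = np.full((3, 3), 1.0 / 3.0)       # doubly stochastic mixing (gossip) matrix

def decentralized_gda(iters=4000, lr=0.05):
    x = np.ones(3)                   # x[i], y[i]: worker i's local copies
    y = np.ones(3)
    for _ in range(iters):
        x, y = W @ x, W @ y          # consensus step: average with neighbors
        gx = a * y                   # local gradients df_i/dx at (x_i, y_i)
        gy = a * x - 2.0 * y         # local gradients df_i/dy
        x = x - lr * gx              # local descent step
        y = y + lr * gy              # local ascent step
    return x, y

x, y = decentralized_gda()
```

Because the mixing matrix is doubly stochastic, the worker averages follow (approximately) the centralized GDA trajectory while the consensus step keeps the local copies close to each other.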

    AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

    In this paper, we propose a class of faster adaptive gradient descent ascent methods for solving nonconvex-strongly-concave minimax problems, using the unified adaptive matrices of SUPER-ADAM \citep{huang2021super}. Specifically, we propose a fast adaptive gradient descent ascent (AdaGDA) method based on a basic momentum technique, which reaches a low sample complexity of $O(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without large batches, improving the existing result for adaptive minimax optimization methods by a factor of $O(\sqrt{\kappa})$. Moreover, we present an accelerated version of AdaGDA (VR-AdaGDA) based on the momentum-based variance-reduction technique, which achieves the best known sample complexity of $O(\kappa^3\epsilon^{-3})$ for finding an $\epsilon$-stationary point without large batches. Further assuming a bounded Lipschitz parameter of the objective function, we prove that our VR-AdaGDA method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with mini-batch size $O(\kappa)$. In particular, we provide an effective convergence analysis framework for our adaptive methods based on unified adaptive matrices, which covers almost all existing adaptive learning rates. Comment: 27 pages. Welcome to discuss. arXiv admin note: text overlap with arXiv:2106.1139
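
The "adaptive" ingredient means each variable's step size is rescaled by a running estimate of its gradient magnitude. The sketch below uses a simple RMSProp-style rescaling on both players to illustrate the idea only; it is not the AdaGDA/VR-AdaGDA method or its unified adaptive matrices, and the toy objective, the deliberately large damping delta = 1.0 (chosen to keep this toy stable), and all constants are our assumptions.

```python
# Adaptive GDA sketch: RMSProp-style per-variable step-size scaling.
# Toy objective (our assumption): f(x, y) = x*y - y**2.

def ada_gda(x, y, lr=0.05, rho=0.99, delta=1.0, iters=5000):
    vx = vy = 0.0                              # running second-moment estimates
    for _ in range(iters):
        gx = y                                 # df/dx
        gy = x - 2.0 * y                       # df/dy
        vx = rho * vx + (1 - rho) * gx * gx    # track squared gradient of x
        vy = rho * vy + (1 - rho) * gy * gy    # track squared gradient of y
        x -= lr * gx / (vx ** 0.5 + delta)     # adaptive descent step
        y += lr * gy / (vy ** 0.5 + delta)     # adaptive ascent step
    return x, y

x, y = ada_gda(1.0, 1.0)
```

The unified adaptive matrices of the paper generalize exactly this denominator: any positive-definite preconditioner (Adam-style, AdaGrad-style, coordinate-wise or global) can be slotted in while the same convergence analysis applies.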

    Projection-Free Methods for Solving Nonconvex-Concave Saddle Point Problems

    In this paper, we investigate a class of constrained saddle point (SP) problems where the objective function is nonconvex-concave and smooth. This class of problems has wide applicability in machine learning, including robust multi-class classification and dictionary learning. Several projection-based primal-dual methods have been developed for tackling this problem; however, the availability of methods with projection-free oracles remains limited. To address this gap, we propose efficient single-loop projection-free methods relying on first-order information. In particular, using regularization and nested approximation techniques, we propose a primal-dual conditional gradient method that solely employs linear minimization oracles to handle constraints. Assuming that the constraint set in the maximization is strongly convex, our method achieves an $\epsilon$-stationary solution within $\mathcal{O}(\epsilon^{-6})$ iterations. When the projection onto the constraint set of the maximization is easy to compute, we propose a one-sided projection-free method that achieves an $\epsilon$-stationary solution within $\mathcal{O}(\epsilon^{-4})$ iterations. Moreover, we present improved iteration complexities of our methods under a strong concavity assumption. To the best of our knowledge, our proposed algorithms are among the first projection-free methods with convergence guarantees for solving nonconvex-concave SP problems. Comment: Additional experiments have been added
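
The projection-free primitive these methods build on is the linear minimization oracle (LMO): instead of projecting onto the constraint set, one minimizes a linear function over it, which for polytopes just means picking a vertex. The sketch below shows the LMO for an l1 ball inside a standard Frank-Wolfe loop on a plain minimization problem (no max player); the toy target c, the ball radius, and the step schedule are our assumptions, shown only to illustrate the oracle.

```python
# Linear minimization oracle (LMO) over an l1 ball, used in a classic
# Frank-Wolfe loop for: min_{||x||_1 <= 1} ||x - c||^2 (our toy problem).
import numpy as np

def lmo_l1(g, radius=1.0):
    """argmin over ||s||_1 <= radius of <g, s>: a signed vertex of the ball."""
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))         # the best coordinate to move along
    s[i] = -radius * np.sign(g[i])   # move opposite the gradient's sign
    return s

c = np.array([0.8, 0.6])             # target point outside the l1 ball
x = np.zeros(2)
for k in range(2000):
    grad = 2.0 * (x - c)             # gradient of ||x - c||^2
    s = lmo_l1(grad)                 # oracle call -- no projection needed
    gamma = 2.0 / (k + 2.0)          # classic Frank-Wolfe step schedule
    x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
```

Every iterate is a convex combination of vertices, so feasibility never requires a projection; the paper's primal-dual conditional gradient method applies this oracle on the constraint sets of both players of the saddle point problem.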