362 research outputs found

    On the Infimal Sub-differential Size of Primal-Dual Hybrid Gradient Method and Beyond

    The primal-dual hybrid gradient method (PDHG, a.k.a. the Chambolle–Pock method) is a well-studied algorithm for minimax optimization problems with a bilinear interaction term. Recently, PDHG has been used as the base algorithm for PDLP, a new LP solver that aims to solve large LP instances by taking advantage of modern computing resources such as GPUs and distributed systems. Most previous convergence results for PDHG are stated either in terms of the duality gap or the distance to the optimal solution set, both of which are usually hard to compute during the solving process. In this paper, we propose a new progress metric for analyzing PDHG, which we dub the infimal sub-differential size (IDS), by utilizing the geometry of PDHG iterates. IDS is a natural extension of the gradient norm for smooth problems to non-smooth problems, and it is tied to the KKT error in the case of LP. Compared to traditional progress metrics for PDHG, IDS always has a finite value and can be computed using only information from the current solution. We show that IDS decays monotonically, with an $\mathcal{O}(1/k)$ sublinear rate for solving convex-concave primal-dual problems, and with a linear convergence rate if the problem further satisfies a regularity condition, which holds for applications such as linear programming, quadratic programming, and the TV-denoising model. The simplicity of our analysis and the monotonic decay of IDS suggest that IDS is a natural progress metric for analyzing PDHG. As a by-product of our analysis, we show that the primal-dual gap of the last iterate of PDHG converges at an $\mathcal{O}(1/\sqrt{k})$ rate for convex-concave problems. The analysis and results on PDHG generalize directly to other primal-dual algorithms, for example the proximal point method (PPM), the alternating direction method of multipliers (ADMM), and the linearized alternating direction method of multipliers (l-ADMM).
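    As a concrete illustration of the PDHG iteration discussed above (not of the paper's PDLP solver), here is a minimal sketch on a toy bilinear saddle problem with closed-form proximal steps; the test problem, step sizes, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal PDHG (Chambolle-Pock) sketch for the bilinear saddle problem
#   min_x max_y  (1/2)||x - b||^2 + <Ax, y> - (1/2)||y||^2,
# chosen so that both proximal steps have closed forms. This toy setup
# is an assumption for illustration, not taken from the paper.

rng = np.random.default_rng(0)
m, n = 5, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)

# Step sizes must satisfy tau * sigma * ||A||^2 < 1 for convergence.
L = np.linalg.norm(A, 2)
tau = sigma = 0.9 / L

x = np.zeros(n)
y = np.zeros(m)
for _ in range(2000):
    x_old = x
    # Primal prox step: x <- prox_{tau f}(x - tau * A^T y), f = (1/2)||.-b||^2
    x = (x - tau * (A.T @ y) + tau * b) / (1.0 + tau)
    # Dual prox step at the extrapolated primal point (the "hybrid" part):
    x_bar = 2 * x - x_old
    y = (y + sigma * (A @ x_bar)) / (1.0 + sigma)

# For this smooth problem, a metric like IDS reduces to the norm of the
# saddle gradient (x - b + A^T y, A x - y), computable from the current point.
res = np.linalg.norm(np.concatenate([x - b + A.T @ y, A @ x - y]))

# The saddle point satisfies y* = A x* and x* = (I + A^T A)^{-1} b.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, b)
print(res, np.linalg.norm(x - x_star))
```

    Because this toy problem is strongly convex in x and strongly concave in y, the iterates converge linearly, so both printed residuals are tiny after 2000 iterations.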

    Doubly Optimal No-Regret Learning in Monotone Games

    We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being applicable only to strongly monotone games; (2) lacking a no-regret guarantee; (3) having only asymptotic or slow $\mathcal{O}(1/\sqrt{T})$ last-iterate convergence to a Nash equilibrium. While the $\mathcal{O}(1/\sqrt{T})$ rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not optimal for all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $\mathcal{O}(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $\mathcal{O}(1/T)$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $\mathcal{O}(\log T)$ individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art $\mathcal{O}(\sqrt{T})$ bound.
    Comment: Published at ICML 2023. V2 incorporates reviewers' feedback.
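    The AOG algorithm itself is not specified in the abstract; as background, the plain (non-accelerated) optimistic gradient method it is compared against can be sketched on a two-player bilinear game, whose unique Nash equilibrium is the origin. The step size and toy game here are illustrative assumptions.

```python
import numpy as np

# Optimistic gradient sketch on the bilinear game min_x max_y x*y.
# The game operator F(x, y) = (y, -x) is monotone, and the unique
# Nash equilibrium is (0, 0). The accelerated AOG variant is not
# reproduced here.

def F(z):
    x, y = z
    return np.array([y, -x])

eta = 0.1
z = np.array([1.0, 1.0])
z_prev = z.copy()
for _ in range(2000):
    # Optimistic update: z_{k+1} = z_k - 2*eta*F(z_k) + eta*F(z_{k-1}),
    # i.e. a gradient step plus a correction using the previous gradient.
    z_next = z - 2 * eta * F(z) + eta * F(z_prev)
    z_prev, z = z, z_next

print(np.linalg.norm(z))  # distance to the Nash equilibrium
```

    On this bilinear game, plain (non-optimistic) gradient descent ascent spirals outward, while the optimistic correction makes the iterates converge to the equilibrium, which is the motivation for this family of methods.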

    Semi-Anchored Multi-Step Gradient Descent Ascent Method for Structured Nonconvex-Nonconcave Composite Minimax Problems

    Minimax problems, such as those arising in generative adversarial networks, adversarial training, and fair training, are widely solved in practice by the multi-step gradient descent ascent (MGDA) method. However, its convergence guarantees are limited. In this paper, inspired by the primal-dual hybrid gradient method, we propose a new semi-anchoring (SA) technique for the MGDA method. This enables the MGDA method to find a stationary point of a structured nonconvex-nonconcave composite minimax problem whose saddle-subdifferential operator satisfies the weak Minty variational inequality condition. The resulting method, named SA-MGDA, is built upon a Bregman proximal point method. We further develop a backtracking line-search version and a non-Euclidean version for smooth adaptable functions. Numerical experiments, including fair classification training, are provided.
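    To make the baseline concrete, here is a minimal sketch of plain multi-step gradient descent ascent (without the paper's semi-anchoring technique): several ascent steps on y per descent step on x. The toy convex-concave objective and step sizes are illustrative assumptions.

```python
# Plain multi-step gradient descent ascent (MGDA) sketch on the toy
# convex-concave objective
#   f(x, y) = (x - 1)^2 / 2 + x*y - y^2 / 2,
# whose saddle point is (1/2, 1/2). The semi-anchoring (SA) technique
# from the paper is not reproduced here.

def grad_x(x, y):
    return (x - 1.0) + y

def grad_y(x, y):
    return x - y

x, y = 0.0, 0.0
eta_x, eta_y, inner_steps = 0.1, 0.5, 10
for _ in range(500):
    for _ in range(inner_steps):   # multi-step ascent on y (inner loop)
        y += eta_y * grad_y(x, y)
    x -= eta_x * grad_x(x, y)      # single descent step on x (outer loop)

print(x, y)
```

    The inner loop approximately maximizes over y before each descent step, which is exactly the practical pattern (e.g. several discriminator updates per generator update) that motivates analyzing MGDA.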

    On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging

    This paper considers the stochastic subgradient mirror-descent method for solving constrained convex minimization problems. In particular, a stochastic subgradient mirror-descent method with weighted iterate-averaging is investigated and its per-iterate convergence rate is analyzed. The novel part of the approach is the choice of the weights used to construct the averages. Through the use of these weighted averages, we show that the known optimal rates can be obtained with simpler algorithms than those currently existing in the literature. Specifically, by suitably choosing the stepsize values, one can obtain a rate of order $1/k$ for strongly convex functions, and a rate of $1/\sqrt{k}$ for general convex functions (not necessarily differentiable). Furthermore, for the latter case, it is shown that a stochastic subgradient mirror-descent method with iterate averaging converges (along a subsequence) to an optimal solution, almost surely, even with a stepsize of the form $1/\sqrt{1+k}$, which was not previously known. The stepsize choices that achieve the best rates are those proposed by Paul Tseng for the acceleration of proximal gradient methods.
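    A minimal sketch of the general idea, using the Euclidean mirror map (so mirror descent reduces to the stochastic subgradient method) and weights proportional to the stepsizes rather than the paper's exact Tseng weights; the toy objective is an illustrative assumption.

```python
import numpy as np

# Stochastic subgradient method with stepsize-weighted iterate averaging.
# Euclidean mirror map; weights proportional to the stepsizes are used
# here for illustration (the paper's exact weight choice is not
# reproduced). Toy objective: f(x) = E|x - Z| with Z uniform on
# {-1, 0, 1}, a nonsmooth convex function minimized at x* = 0.

rng = np.random.default_rng(1)
x = 2.0
weighted_sum, weight_total = 0.0, 0.0
for k in range(20000):
    z = rng.choice([-1.0, 0.0, 1.0])
    g = np.sign(x - z)                # stochastic subgradient of |x - z|
    step = 1.0 / np.sqrt(1 + k)      # stepsize of the form 1/sqrt(1+k)
    x -= step * g
    weighted_sum += step * x          # weight each iterate by its stepsize
    weight_total += step

x_avg = weighted_sum / weight_total   # weighted average of the iterates
print(x_avg)
```

    The individual iterates keep oscillating because the subgradients do not vanish at the optimum; it is the weighted average that settles near the minimizer, which is why averaging schemes of this kind are analyzed.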

    A Unified View of Large-scale Zero-sum Equilibrium Computation

    The task of computing approximate Nash equilibria in large zero-sum extensive-form games has received a tremendous amount of attention, due mainly to the Annual Computer Poker Competition. Immediately after its inception, two competing and seemingly different approaches emerged: one an application of no-regret online learning, the other a sophisticated gradient method applied to a convex-concave saddle-point formulation. Since then, both approaches have grown in relative isolation, with advancements on one side not affecting the other. In this paper, we rectify this by dissecting and, in a sense, unifying the two views.
    Comment: AAAI Workshop on Computer Poker and Imperfect Information.
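    The no-regret learning view can be sketched as self-play between two Hedge (multiplicative weights) learners on a small zero-sum matrix game: each player's average strategy approaches a Nash equilibrium at a rate governed by the regret bounds. Rock-paper-scissors and all parameters here are illustrative assumptions.

```python
import numpy as np

# No-regret self-play sketch: two Hedge (multiplicative weights) learners
# on rock-paper-scissors. The *average* strategies approach the Nash
# equilibrium (uniform play); exploitability is bounded by the sum of the
# players' average regrets.

A = np.array([[ 0.0, -1.0,  1.0],    # row player's payoff matrix
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

T = 20000
eta = np.sqrt(np.log(3) / T)          # standard Hedge learning rate
x = np.array([0.5, 0.3, 0.2])         # non-uniform starts so play cycles
y = np.array([0.2, 0.5, 0.3])
x_sum, y_sum = np.zeros(3), np.zeros(3)
for _ in range(T):
    x_sum += x
    y_sum += y
    # Each player exponentially reweights actions by their payoffs
    # against the opponent's current mixed strategy.
    x = x * np.exp(eta * (A @ y));   x /= x.sum()
    y = y * np.exp(-eta * (A.T @ x)); y /= y.sum()

x_bar, y_bar = x_sum / T, y_sum / T
# Exploitability (duality gap): how much each side gains by best-responding.
gap = (A @ y_bar).max() - (x_bar @ A).min()
print(x_bar, y_bar, gap)
```

    The gradient-method view attacks the same object directly: the pair (x_bar, y_bar) approximates the saddle point of the convex-concave function x^T A y, which is the bridge between the two approaches that the paper dissects.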