Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods
This work proposes a universal and adaptive second-order method for
minimizing second-order smooth, convex functions. Our algorithm achieves
O(σ/√T) convergence when the oracle feedback is stochastic with
variance σ², and improves its convergence to O(1/T³) with
deterministic oracles, where T is the number of iterations. Our method also
interpolates between these rates without knowing the nature of the oracle a priori,
which is enabled by a parameter-free adaptive step-size that is oblivious to
the smoothness modulus, variance bounds, and the diameter of the
constrained set. To our knowledge, this is the first universal algorithm with
such global guarantees within the second-order optimization literature.
Comment: 32 pages, 4 figures, accepted at NeurIPS 2022
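The flavor of such a parameter-free, noise-oblivious step-size can be illustrated with a first-order analogue: an AdaGrad-norm step that uses no smoothness or variance constants. This is a minimal sketch of the general idea only, not the paper's second-order method; all names are illustrative.

```python
import numpy as np

def adagrad_norm_descent(grad, x0, T, eta=1.0, eps=1e-8):
    """Gradient descent with an AdaGrad-norm step-size.

    The step-size 1/sqrt(sum of squared gradient norms) requires no
    knowledge of the smoothness modulus or the noise variance, which is
    the general idea behind noise-adaptive ("universal") step-size rules.
    """
    x = np.asarray(x0, dtype=float)
    g2_sum = eps
    for _ in range(T):
        g = grad(x)
        g2_sum += float(g @ g)           # accumulate squared gradient norms
        x = x - (eta / np.sqrt(g2_sum)) * g
    return x

# Minimize f(x) = ||x||^2 / 2, whose gradient is x.
x_star = adagrad_norm_descent(lambda x: x, np.array([3.0, -4.0]), T=500)
```

The same accumulation trick, applied to second-order information, is what lets a method interpolate between the stochastic and deterministic regimes without prior tuning.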
Adaptive first-order methods revisited: Convex optimization without Lipschitz requirements
We propose a new family of adaptive first-order methods for a class of convex minimization problems that may fail to be Lipschitz continuous or smooth in the standard sense. Specifically, motivated by a recent flurry of activity on non-Lipschitz (NoLips) optimization, we consider problems that are continuous or smooth relative to a reference Bregman function, as opposed to a global, ambient norm (Euclidean or otherwise). These conditions encompass a wide range of problems with singular objectives, such as Fisher markets, Poisson tomography, D-design, and the like. In this setting, the application of existing order-optimal adaptive methods, such as UnixGrad or AcceleGrad, is not possible, especially in the presence of randomness and uncertainty. The proposed method, adaptive mirror descent (AdaMir), aims to close this gap by concurrently achieving min-max optimal rates in problems that are relatively continuous or smooth, including stochastic ones.
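The Bregman geometry underlying this line of work can be seen in plain (non-adaptive) entropic mirror descent on the simplex, where the exponential update keeps iterates strictly interior, away from boundary singularities. A sketch of the general technique, not of AdaMir itself:

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, T):
    """Mirror descent in the entropic Bregman geometry on the simplex.

    Measuring distances with a Bregman function (here negative entropy)
    instead of a global Euclidean norm is what makes "relatively"
    continuous/smooth problems tractable: the multiplicative update
    never leaves the interior of the simplex.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        g = grad(x)
        x = x * np.exp(-step * g)   # mirror step in entropy geometry
        x = x / x.sum()             # Bregman projection back onto the simplex
    return x

# Minimize <c, x> over the simplex; the optimum puts all mass on argmin(c).
c = np.array([0.9, 0.1, 0.5])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, step=0.5, T=200)
```

AdaMir replaces the constant `step` with an adaptive rule; the fixed step here is only for brevity.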
Online and Stochastic Optimization beyond Lipschitz Continuity: A Riemannian Approach
Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann-Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem's loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last-iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science.
Adaptive extra-gradient methods for min-max optimization and games
We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones. Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer. As a result, the algorithm simultaneously achieves order-optimal convergence rates: it converges to an ε-optimal solution within O(1/ε) iterations in smooth problems, and within O(1/ε²) iterations in non-smooth ones. Importantly, these guarantees do not require any of the standard boundedness or Lipschitz continuity conditions that are typically assumed in the literature; in particular, they apply even to problems with singularities (such as resource allocation problems and the like). This adaptation is achieved through the use of a geometric apparatus based on Finsler metrics and a suitably chosen mirror-prox template that allows us to derive sharp convergence rates for the methods at hand.
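The extra-gradient template at the heart of this family looks as follows in its basic Euclidean form, with a constant step for brevity (the paper's contribution is the adaptive, geometry-aware choice of the step):

```python
import numpy as np

def extra_gradient(F, z0, gamma, T):
    """Plain extra-gradient for a monotone operator F.

    Each iteration takes an exploratory half-step, then re-evaluates the
    operator at that half-point for the actual update; this "look-ahead"
    is what stabilizes min-max dynamics.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(T):
        z_half = z - gamma * F(z)      # exploratory step
        z = z - gamma * F(z_half)      # update with the looked-ahead gradient
    return z

# Bilinear saddle point min_x max_y x*y, with operator F(x, y) = (y, -x);
# plain gradient descent-ascent spirals outward here, extra-gradient converges.
F = lambda z: np.array([z[1], -z[0]])
z = extra_gradient(F, np.array([1.0, 1.0]), gamma=0.5, T=100)
```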
An adaptive mirror-prox algorithm for variational inequalities with singular operators
Lipschitz continuity is a central requirement for achieving the optimal O(1/T) rate of convergence in monotone, deterministic variational inequalities (a setting that includes convex minimization, convex-concave optimization, nonatomic games, and many other problems). However, in many cases of practical interest, the operator defining the variational inequality may exhibit singularities at the boundary of the feasible region, precluding in this way the use of fast gradient methods that attain this optimal rate (such as Nemirovski's mirror-prox algorithm and its variants). To address this issue, we propose a novel regularity condition which we call Bregman continuity, and which relates the variation of the operator to that of a suitably chosen Bregman function. Leveraging this condition, we derive an adaptive mirror-prox algorithm which attains the optimal O(1/T) rate of convergence in problems with possibly singular operators, without any prior knowledge of the degree of smoothness (the Bregman analogue of the Lipschitz constant). We also show that, under Bregman continuity, the mirror-prox algorithm achieves an O(1/√T) convergence rate in stochastic variational inequalities.
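A concrete instance of the mirror-prox template, in the entropic Bregman geometry and with a constant step (the paper's adaptive rule removes the need to know an admissible step in advance), applied to a matrix game:

```python
import numpy as np

def entropic_mirror_prox(A, x0, y0, gamma, T):
    """Mirror-prox for the matrix game min_x max_y x^T A y on simplices.

    Each iteration performs an entropic extrapolation (half) step, then an
    update step using the operator evaluated at the extrapolated point;
    the averaged half-iterates enjoy the O(1/T) gap guarantee.
    """
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        xh = x * np.exp(-gamma * (A @ y));  xh /= xh.sum()   # extrapolation
        yh = y * np.exp(gamma * (A.T @ x)); yh /= yh.sum()
        x = x * np.exp(-gamma * (A @ yh));  x /= x.sum()     # update at half point
        y = y * np.exp(gamma * (A.T @ xh)); y /= y.sum()
        x_avg += xh / T
        y_avg += yh / T
    return x_avg, y_avg

# Rock-paper-scissors: unique equilibrium is uniform play, value 0.
A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
x_bar, y_bar = entropic_mirror_prox(A, [0.7, 0.2, 0.1], [0.1, 0.3, 0.6],
                                    gamma=0.25, T=400)
gap = float(np.max(A.T @ x_bar) - np.min(A @ y_bar))  # duality gap of averages
```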
Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium
In game-theoretic learning, several agents simultaneously follow their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve O(√T) regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents. Also, if employed by all players, then iv) they guarantee O(1) social regret; while v) the induced sequence of play converges to Nash equilibrium with O(1) individual regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).
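The Euclidean instance of optimistic mirror descent is the optimistic gradient method, where the previous gradient serves as a guess for the next one. A minimal sketch of this general technique (not the paper's tuned policies):

```python
import numpy as np

def optimistic_gradient(F, z0, gamma, T):
    """Optimistic gradient descent-ascent (a Euclidean instance of
    optimistic mirror descent).

    The update z <- z - gamma * (2 F(z) - F(z_prev)) reuses the previous
    gradient as a prediction of the next one; in games, this optimism is
    what prevents the cycling behavior of plain gradient descent-ascent.
    """
    z = np.asarray(z0, dtype=float)
    g_prev = F(z)
    for _ in range(T):
        g = F(z)
        z = z - gamma * (2 * g - g_prev)   # gradient step + optimistic correction
        g_prev = g
    return z

# min_x max_y x*y: plain descent-ascent spirals outward, the optimistic
# correction makes the joint play converge to the equilibrium (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = optimistic_gradient(F, np.array([1.0, 1.0]), gamma=0.3, T=200)
```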
Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
We consider monotone variational inequality (VI) problems in multi-GPU
settings where multiple processors/workers/clients have access to local
stochastic dual vectors. This setting includes a broad range of important
problems, from distributed convex minimization to min-max problems and games.
Extra-gradient, which is a de facto algorithm for monotone VI problems, has not
been designed to be communication-efficient. To this end, we propose a
quantized generalized extra-gradient method (Q-GenX), which is an unbiased and
adaptive compression method tailored to solving VIs. We provide an adaptive
step-size rule, which adapts to the respective noise profiles at hand and
achieves a fast O(1/T) rate under relative noise and an
order-optimal O(1/√T) rate under absolute noise, and we show that
distributed training accelerates convergence. Finally, we validate our
theoretical results by providing real-world experiments and training generative
adversarial networks on multiple GPUs.
Comment: International Conference on Learning Representations (ICLR 2023)
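The unbiasedness that such compression schemes rely on can be illustrated with QSGD-style stochastic rounding, where rounding probabilities are chosen so that the quantizer is correct in expectation. A generic sketch, not Q-GenX itself:

```python
import numpy as np

def quantize_unbiased(v, levels, rng):
    """Unbiased stochastic quantization of a vector (QSGD-style sketch).

    Each coordinate is randomly rounded to a uniform grid of `levels`
    points on [0, ||v||], with rounding probabilities chosen so that
    E[Q(v)] = v, the "unbiased compression" property that lets a
    compressed method retain the convergence of its exact counterpart.
    """
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels            # position on the grid
    lower = np.floor(scaled)
    p_up = scaled - lower                          # round up with prob p_up
    rounded = lower + (rng.random(v.shape) < p_up)
    return np.sign(v) * rounded * norm / levels

# Averaging many independent quantizations recovers the original vector.
rng = np.random.default_rng(0)
v = np.array([0.3, -1.2, 0.7])
avg = np.mean([quantize_unbiased(v, 4, rng) for _ in range(20000)], axis=0)
```

Only the sign, the norm, and the small grid indices need to be communicated, which is where the bandwidth savings come from.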
Fast routing under uncertainty: Adaptive learning in congestion games with exponential weights
We examine an adaptive learning framework for nonatomic congestion games where the players' cost functions may be subject to exogenous fluctuations (e.g., due to disturbances in the network or variations in the traffic going through a link). In this setting, the popular multiplicative/exponential weights algorithm enjoys an O(1/√T) equilibrium convergence rate; however, this rate is suboptimal in static environments, i.e., when the network is not subject to randomness. In this static regime, accelerated algorithms achieve an O(1/T²) convergence speed, but they fail to converge altogether in stochastic problems. To fill this gap, we propose a novel, adaptive exponential weights method, dubbed AdaWeight, that seamlessly interpolates between the O(1/T²) and O(1/√T) rates in the static and stochastic regimes respectively. Importantly, this "best-of-both-worlds" guarantee does not require any prior knowledge of the problem's parameters or tuning by the optimizer; in addition, the method's convergence speed depends subquadratically on the size of the network (number of vertices and edges), so it scales gracefully to large, real-life urban networks.
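The baseline exponential weights update is a one-liner per round: multiply each route's weight by exp(-η × observed cost) and renormalize. A sketch with a fixed learning rate and synthetic noisy costs (AdaWeight's point is precisely to tune η adaptively instead):

```python
import numpy as np

def exponential_weights(costs, eta):
    """Multiplicative/exponential weights over a finite set of routes.

    Weights are multiplied by exp(-eta * cost) each round, so the mixed
    route choice concentrates on low-cost routes even when the observed
    costs fluctuate from round to round.
    """
    w = np.ones(costs.shape[1]) / costs.shape[1]
    for c in costs:                     # one observed cost vector per round
        w = w * np.exp(-eta * c)
        w = w / w.sum()                 # renormalize to a probability vector
    return w

# Three routes with noisy costs; route 1 is cheapest on average.
rng = np.random.default_rng(1)
base = np.array([0.8, 0.2, 0.5])                      # mean cost of each route
costs = base + 0.1 * rng.standard_normal((300, 3))    # noisy observations
w = exponential_weights(costs, eta=0.3)
```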
Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness
We present a new accelerated stochastic second-order method that is robust to
both gradient and Hessian inexactness, which typically occurs in machine
learning. We establish theoretical lower bounds and prove that our algorithm
achieves optimal convergence under both gradient and Hessian inexactness in this
key setting. We further introduce a tensor generalization for stochastic
higher-order derivatives. When the oracles are non-stochastic, the proposed
tensor algorithm matches the global convergence of the Nesterov Accelerated Tensor
method. Both algorithms allow for approximate solutions of their auxiliary
subproblems with verifiable conditions on the accuracy of the solution.
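The second-order oracle assumed here (gradient plus Hessian queries) is easiest to see in a basic regularized Newton iteration. This is a generic sketch of the oracle model only; the paper's method adds acceleration and tolerates stochastic, inexact oracles:

```python
import numpy as np

def damped_newton(grad, hess, x0, T, reg=1e-3):
    """Newton's method with a small Hessian regularization.

    Each step queries the second-order oracle (gradient and Hessian) and
    solves the resulting linear system; the regularizer keeps the system
    well-conditioned.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        H = hess(x) + reg * np.eye(len(x))    # regularize for stability
        x = x - np.linalg.solve(H, grad(x))   # Newton step
    return x

# Minimize f(x) = ||x||^4 / 4 + ||x||^2 / 2, whose minimum is the origin.
f_grad = lambda x: (x @ x) * x + x
f_hess = lambda x: (x @ x) * np.eye(len(x)) + 2 * np.outer(x, x) + np.eye(len(x))
x = damped_newton(f_grad, f_hess, np.array([2.0, -1.0]), T=30)
```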
Sifting through the noise: Universal first-order methods for stochastic variational inequalities
We examine a flexible algorithmic framework for solving monotone variational inequalities in the presence of randomness and uncertainty. The proposed template encompasses a wide range of popular first-order methods, including dual averaging, dual extrapolation and optimistic gradient algorithms, both adaptive and non-adaptive. Our first result is that the algorithm achieves the optimal rates of convergence for cocoercive problems when the profile of the randomness is known to the optimizer: O(1/√T) for absolute noise profiles, and O(1/T) for relative ones. Subsequently, we drop all prior knowledge requirements (the absolute/relative variance of the randomness affecting the problem, the operator's cocoercivity constant, etc.), and we analyze an adaptive instance of the method that gracefully interpolates between the above rates, i.e., it achieves O(1/√T) and O(1/T) in the absolute and relative cases, respectively. To our knowledge, this is the first universality result of its kind in the literature and, somewhat surprisingly, it shows that an extra-gradient proxy step is not required to achieve optimal rates.
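Dual averaging, one of the non-extrapolating templates covered by such frameworks, drives the iterate with the sum of past gradients; an AdaGrad-style scaling makes it adaptive. A minimal unconstrained sketch under illustrative names, not the paper's exact method:

```python
import numpy as np

def adaptive_dual_averaging(grad, x0, T, eps=1e-8):
    """Dual averaging with an AdaGrad-style adaptive scaling.

    The iterate is the prox-center x0 shifted by the running sum of
    gradients, scaled by the inverse square root of the accumulated
    squared gradient norms; no step-size constants are required.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    g_sum = np.zeros_like(x)
    g2_sum = eps
    for _ in range(T):
        g = grad(x)
        g_sum += g                          # aggregate past gradients
        g2_sum += float(g @ g)              # adaptive scaling statistic
        x = x0 - g_sum / np.sqrt(g2_sum)
    return x

# Cocoercive toy problem: F(x) = x - 1, with solution x = (1, 1).
x = adaptive_dual_averaging(lambda x: x - 1.0, np.zeros(2), T=2000)
```

Notably, no extra-gradient half-step appears anywhere in the loop, in line with the paper's observation that extrapolation is not needed for optimal rates in this setting.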