31 research outputs found

    Let's be Honest: An Optimal No-Regret Framework for Zero-Sum Games

    We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves the open problem of removing the logarithmic terms in convergence to the value of the game. We achieve this goal in three steps. First, we provide a novel analysis of optimistic mirror descent (OMD), showing that it can be modified to guarantee fast convergence for both the honest regret and the value of the game when the players are playing collaboratively. Second, we propose a new algorithm, dubbed robust optimistic mirror descent (ROMD), which attains optimal adversarial regret without knowing the time horizon beforehand. Finally, we propose a simple signaling scheme, which enables us to bridge OMD and ROMD to achieve the best of both worlds. Numerical examples are presented to support our theoretical claims and show that our non-adaptive ROMD algorithm can be competitive with OMD with adaptive step-size selection.
    Comment: Proceedings of the 35th International Conference on Machine Learning
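    The optimistic update underlying this framework reuses the previous round's loss vector as a prediction for the current one. As a rough illustration of that mechanism (not the paper's ROMD or signaling scheme), the sketch below runs optimistic multiplicative-weights updates for both players of a zero-sum matrix game; the payoff matrix, step size eta, and horizon T are placeholders.

    ```python
    import numpy as np

    def optimistic_mwu(A, T=1000, eta=0.1):
        """Optimistic multiplicative-weights updates for min_x max_y x^T A y.

        Each player reuses the previous loss vector as a prediction before
        observing the current one; this "optimism" is what lets OMD-style
        methods converge fast when both players cooperate.
        """
        n, m = A.shape
        x, y = np.ones(n) / n, np.ones(m) / m          # base iterates
        g_prev, h_prev = A @ y, A.T @ x                # predictions (last losses)
        x_sum, y_sum = np.zeros(n), np.zeros(m)
        for _ in range(T):
            # predicted (half-step) iterates
            x_half = x * np.exp(-eta * g_prev); x_half /= x_half.sum()
            y_half = y * np.exp(+eta * h_prev); y_half /= y_half.sum()
            # losses actually incurred at the predicted point
            g, h = A @ y_half, A.T @ x_half
            x = x * np.exp(-eta * g); x /= x.sum()
            y = y * np.exp(+eta * h); y /= y.sum()
            g_prev, h_prev = g, h
            x_sum += x_half; y_sum += y_half
        x_bar, y_bar = x_sum / T, y_sum / T
        # duality gap of the averaged strategies (zero at the minimax equilibrium)
        gap = (A.T @ x_bar).max() - (A @ y_bar).min()
        return x_bar, y_bar, gap

    A = np.array([[0.0, 1.0], [1.0, 0.0]])             # a toy 2x2 zero-sum game
    print(optimistic_mwu(A))
    ```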

    Playing Stackelberg Opinion Optimization with Randomized Algorithms for Combinatorial Strategies

    From the perspective of designing or engineering opinion formation games in social networks, the "opinion maximization (or minimization)" problem has been studied mainly through the design of subset-selection algorithms. We further define a two-player zero-sum Stackelberg game of competitive opinion optimization in which the player under study, moving first, minimizes the sum of expressed opinions by performing so-called "internal opinion design", knowing that the adversarial player, as the follower, will maximize the same objective by conducting her own internal opinion design. We propose that the min player run the "follow-the-perturbed-leader" algorithm in this Stackelberg game, incurring losses that depend on the adversarial player's play. Since our subset-selection strategy is combinatorial in nature, a distribution over all strategies would have too many probabilities to enumerate one by one. Thus, we design a randomized algorithm that produces a (randomized) pure strategy. We show that the strategy output by the randomized algorithm for the min player is essentially an approximate equilibrium strategy against the adversarial player.
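    The computational point is that the min player never maintains an explicit distribution over exponentially many subsets: it perturbs cumulative losses and outputs a single best subset. A minimal follow-the-perturbed-leader step of that form is sketched below under simplifying assumptions; the cardinality constraint k, the exponential perturbation, and the stand-in losses are illustrative, and the opinion-dynamics objective from the paper is not modeled.

    ```python
    import numpy as np

    def ftpl_subset(cum_loss, k, epsilon=1.0, rng=None):
        """Follow-the-perturbed-leader over k-element subsets of items.

        Rather than enumerating a distribution over all C(n, k) subsets,
        perturb the cumulative per-item losses and return the best subset
        under the perturbed losses -- a randomized *pure* strategy.
        """
        rng = np.random.default_rng() if rng is None else rng
        perturbed = cum_loss - rng.exponential(scale=1.0 / epsilon, size=cum_loss.shape)
        return np.argsort(perturbed)[:k]          # k items of smallest perturbed loss

    # toy online run: 10 items, pick 3 each round, losses arrive adversarially
    rng = np.random.default_rng(0)
    cum_loss = np.zeros(10)
    for t in range(100):
        chosen = ftpl_subset(cum_loss, k=3, rng=rng)
        round_loss = rng.uniform(size=10)         # stand-in for the follower's response
        cum_loss += round_loss
    print("final pick:", ftpl_subset(cum_loss, k=3, rng=rng))
    ```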

    Optimization, Learning, and Games with Predictable Sequences

    We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point-type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al. (2011). Further, we consider a partial-information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.
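    In the Euclidean setup, the Mirror Prox step recovered here reduces to the classical extra-gradient method: take a gradient step to a predicted point, then update from the original point using the gradients evaluated at the prediction. The sketch below applies it to a bilinear saddle point over unit balls; the domains, step size, and horizon are illustrative assumptions, not choices from the paper.

    ```python
    import numpy as np

    def project_ball(v, radius=1.0):
        """Euclidean projection onto the ball of the given radius."""
        nrm = np.linalg.norm(v)
        return v if nrm <= radius else v * (radius / nrm)

    def extragradient_saddle(A, T=200, eta=0.05):
        """Extra-gradient (Mirror Prox with the Euclidean prox) for
        min_x max_y x^T A y over unit balls; returns averaged iterates."""
        n, m = A.shape
        x, y = np.zeros(n), np.zeros(m)
        xs, ys = [], []
        for _ in range(T):
            # extrapolation (prediction) step
            x_half = project_ball(x - eta * (A @ y))
            y_half = project_ball(y + eta * (A.T @ x))
            # update step using gradients at the extrapolated point
            x = project_ball(x - eta * (A @ y_half))
            y = project_ball(y + eta * (A.T @ x_half))
            xs.append(x_half); ys.append(y_half)
        return np.mean(xs, axis=0), np.mean(ys, axis=0)

    A = np.array([[2.0, -1.0], [0.5, 1.0]])
    x_bar, y_bar = extragradient_saddle(A)
    print(x_bar, y_bar)
    ```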

    Cycles in adversarial regularized learning

    Regularized learning is a fundamental technique in online optimization, machine learning, and many other fields of computer science. A natural question that arises in these settings is how regularized learning algorithms behave when pitted against each other. We study a natural formulation of this problem by coupling regularized learning dynamics in zero-sum games. We show that the system's behavior is Poincaré recurrent, implying that almost every trajectory revisits any (arbitrarily small) neighborhood of its starting point infinitely often. This cycling behavior is robust to the agents' choice of regularization mechanism (each agent could be using a different regularizer) and to positive-affine transformations of the agents' utilities, and it also persists in the case of networked competition, i.e., for zero-sum polymatrix games.
    Comment: 22 pages, 4 figures
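    A quick way to see this recurrence numerically is to couple FTRL dynamics with entropic regularization in a small zero-sum game and watch the last iterates orbit the interior equilibrium instead of converging to it. The sketch below is an illustrative Euler discretization of the continuous-time dynamics, not the paper's analysis; the game (Matching Pennies), step size, and horizon are placeholders.

    ```python
    import numpy as np

    # Matching Pennies: a zero-sum game whose unique equilibrium is (1/2, 1/2).
    A = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])

    def ftrl_entropy_dynamics(T=20000, dt=0.01):
        """Euler discretization of continuous-time FTRL with entropic
        regularization for both players; the iterates cycle around the
        equilibrium rather than converging to it."""
        u, v = np.zeros(2), np.zeros(2)        # cumulative payoff vectors
        traj = []
        for _ in range(T):
            x = np.exp(u); x /= x.sum()        # softmax of cumulative payoffs
            y = np.exp(v); y /= y.sum()
            u += dt * (A @ y)                  # row player maximizes x^T A y
            v -= dt * (A.T @ x)                # column player minimizes it
            traj.append((x[0], y[0]))
        return np.array(traj)

    orbit = ftrl_entropy_dynamics()
    tail = orbit[len(orbit) // 2:, 0]
    print("x_1 over the second half of the run stays between",
          tail.min(), "and", tail.max())       # a wide range, i.e. no convergence
    ```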

    Theoretical and Practical Advances on Smoothing for Extensive-Form Games

    Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions tailored to the domains associated with extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of Ω(b^d d), where b is the branching factor of the player and d is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning, we show that, for the first time, the excessive gap technique can be made faster in practice than the fastest counterfactual regret minimization algorithm, CFR+.
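    The dilated entropy functions discussed here assign each information set the entropy of its local (behavioral) strategy, scaled by the probability of reaching that information set and by a per-information-set weight; the paper's contribution is a particular choice of those weights. The sketch below evaluates a generic dilated entropy on a toy two-level treeplex; the sequence-form encoding and the uniform weights are assumptions, and the new weighting scheme is not reproduced.

    ```python
    import numpy as np

    def dilated_entropy(x, infosets, weights):
        """Generic dilated entropy of a sequence-form strategy x.

        infosets: list of (parent_seq, child_seqs) pairs describing the treeplex;
        parent_seq = None denotes the empty (root) sequence with probability 1.
        weights: one positive weight per information set.
        """
        val = 0.0
        for (parent, children), beta in zip(infosets, weights):
            p = 1.0 if parent is None else x[parent]
            if p <= 0.0:
                continue
            local = x[children] / p                   # local behavioral strategy
            val += beta * p * np.sum(local * np.log(np.maximum(local, 1e-12)))
        return val

    # toy treeplex: a root information set with sequences 0 and 1, and a second
    # information set (reached after sequence 0) with sequences 2 and 3
    infosets = [(None, np.array([0, 1])), (0, np.array([2, 3]))]
    x = np.array([0.6, 0.4, 0.3, 0.3])                # note x[2] + x[3] == x[0]
    print(dilated_entropy(x, infosets, weights=[1.0, 1.0]))
    ```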