On the Impossibility of Regret Minimization in Repeated Games

Authors: Andriy Zapechelnyuk, Karl Schlag

Regret minimizing strategies for repeated games have been receiving increasing attention in the literature. These are simple adaptive behavior rules that exhibit nice convergence properties. If all players follow regret minimizing strategies, their average joint play converges to the set of correlated equilibria or to the Hannan set (depending on the notion of regret in use), or even to Nash equilibrium on certain classes of games. In this note we raise the question of validity of the regret minimization objective. By example we show that regret minimization can lead to unrealistic behavior, since it fails to take into account the effect of one's actions on subsequent behavior of the opponents. An amended notion of regret that corrects this defect is not very useful either, since achieving a no-regret objective is not guaranteed in that case.

Keywords: Repeated games, Regret minimization, No-regret strategy
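
To make the objective questioned in this abstract concrete, here is a minimal sketch of how external (Hannan-type) regret is usually computed. It is an illustration under stated assumptions, not code from the paper: the function name `external_regret` and the `payoffs[t][a]` table layout are hypothetical. The counterfactual comparison holds the opponents' realized play fixed, which is the modeling step the note argues is unrealistic.

```python
def external_regret(payoffs, actions):
    """External regret of a played action sequence.

    payoffs[t][a] is the payoff action `a` would have earned in round t
    against the opponents' *realized* play (assumed layout, for illustration).
    actions[t] is the action actually played in round t.
    """
    n_actions = len(payoffs[0])
    # Payoff the player actually collected.
    realized = sum(payoffs[t][a] for t, a in enumerate(actions))
    # Payoff of the best single action in hindsight. This assumes the
    # opponents would have played exactly the same way regardless.
    best_fixed = max(sum(row[a] for row in payoffs) for a in range(n_actions))
    return best_fixed - realized


if __name__ == "__main__":
    table = [[1, 0], [0, 1], [1, 0]]
    print(external_regret(table, [1, 0, 1]))
```

Because `payoffs` is treated as fixed, the measure cannot see that a different action sequence might have changed the opponents' later behavior.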

Optimization, Learning, and Games with Predictable Sequences

Authors: Alexander Rakhlin, Karthik Sridharan
Publication date: 01/01/2013

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T). This addresses a question of Daskalakis et al. (2011). Further, we consider a partial information version of the problem. We then apply the results to convex programming and exhibit a simple algorithm for the approximate Max Flow problem.

arXiv.org e-Print Archive

CiteSeerX
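
The self-play setting in the abstract above can be sketched with a plain optimistic variant of Exponential Weights, in which each player reuses its most recent loss vector as the prediction for the next round. This is a simplified illustration, not the authors' exact algorithm (their version and its O((log T)/T) guarantee involve details not given in the abstract). The function names, the fixed step size `eta`, and the round count are assumptions chosen for the example.

```python
import math


def softmax_neg(scores, eta):
    """Distribution with weights proportional to exp(-eta * score)."""
    lowest = min(scores)  # shift so every exponent is <= 0 (no overflow)
    weights = [math.exp(-eta * (s - lowest)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge_selfplay(A, rounds=2000, eta=0.1):
    """Row player minimizes x^T A y, column player maximizes it.

    Returns the time-averaged mixed strategies of both players.
    """
    n, m = len(A), len(A[0])
    cum_x, cum_y = [0.0] * n, [0.0] * m    # cumulative loss vectors
    last_x, last_y = [0.0] * n, [0.0] * m  # last loss = prediction of the next
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        # Optimistic step: cumulative loss plus the predicted next loss.
        x = softmax_neg([c + p for c, p in zip(cum_x, last_x)], eta)
        y = softmax_neg([c + p for c, p in zip(cum_y, last_y)], eta)
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        cum_x = [c + l for c, l in zip(cum_x, loss_x)]
        cum_y = [c + l for c, l in zip(cum_y, loss_y)]
        last_x, last_y = loss_x, loss_y
        avg_x = [a + v / rounds for a, v in zip(avg_x, x)]
        avg_y = [a + v / rounds for a, v in zip(avg_y, y)]
    return avg_x, avg_y


def duality_gap(A, x, y):
    """Best-response payoff gap; zero exactly at a minimax equilibrium."""
    n, m = len(A), len(A[0])
    best_col = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_row = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_col - best_row


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]
    x_bar, y_bar = optimistic_hedge_selfplay(game)
    print(x_bar, y_bar, duality_gap(game, x_bar, y_bar))
```

Each player updates only from its own observed loss vector, which is the sense in which the players are uncoupled. The duality gap of the averaged strategies is a convenient way to check how close the pair is to the minimax equilibrium.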

Online learning with graph-structured feedback against adaptive adversaries

Authors: Zhili Feng, Po-Ling Loh
Publication date: 01/04/2018

We derive upper and lower bounds for the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $\widetilde{O}(T^{2/3})$ and $\widetilde{O}(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\widetilde{\Omega}(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of $\widetilde{\Omega}(T^{2/3})$, as well.

Comment: This paper has been accepted to ISIT 201

arXiv.org e-Print Archive
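
As a rough illustration of the policy-regret notion used in the abstract above, the sketch below evaluates a played sequence against the best fixed action when losses depend on the previous action (a memory-1 adversary), using a unit switching cost as the loss structure. All names (`policy_regret`, `switching_cost_loss`, `base`) and the unit cost are hypothetical choices for this example. It does not implement the paper's Exp3 variant or its feedback graphs.

```python
def switching_cost_loss(base, cost=1.0):
    """Build a memory-1 loss: base[t][a] plus `cost` whenever the action changes."""
    def loss_fn(t, prev_action, action):
        switched = prev_action is not None and prev_action != action
        return base[t][action] + (cost if switched else 0.0)
    return loss_fn


def policy_regret(loss_fn, actions, n_actions):
    """Total loss of `actions` minus the loss of the best fixed action.

    The comparator is re-evaluated as if it had been played in every round,
    so the adversary's reaction to the previous action is counted on both
    sides of the comparison.
    """
    def total(sequence):
        prev, acc = None, 0.0
        for t, a in enumerate(sequence):
            acc += loss_fn(t, prev, a)
            prev = a
        return acc

    horizon = len(actions)
    best_fixed = min(total([a] * horizon) for a in range(n_actions))
    return total(actions) - best_fixed


if __name__ == "__main__":
    base = [[0, 1], [1, 0], [0, 1], [1, 0]]
    loss = switching_cost_loss(base)
    print(policy_regret(loss, [0, 1, 0, 1], n_actions=2))
```

In this toy instance the alternating sequence has zero base loss in every round, yet it still incurs positive policy regret because each switch is charged.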