Let's be Honest: An Optimal No-Regret Framework for Zero-Sum Games
We revisit the problem of solving two-player zero-sum games in the
decentralized setting. We propose a simple algorithmic framework that
simultaneously achieves the best rates for honest regret as well as adversarial
regret, and in addition resolves the open problem of removing the logarithmic
terms in convergence to the value of the game. We achieve this goal in three
steps. First, we provide a novel analysis of the optimistic mirror descent
(OMD), showing that it can be modified to guarantee fast convergence for both
honest regret and value of the game, when the players are playing
collaboratively. Second, we propose a new algorithm, dubbed robust
optimistic mirror descent (ROMD), which attains optimal adversarial regret
without knowing the time horizon beforehand. Finally, we propose a simple
signaling scheme, which enables us to bridge OMD and ROMD to achieve the best
of both worlds. Numerical examples are presented to support our theoretical
claims and show that our non-adaptive ROMD algorithm can be competitive with
OMD with adaptive step-size selection.
Comment: Proceedings of the 35th International Conference on Machine Learning
Playing Stackelberg Opinion Optimization with Randomized Algorithms for Combinatorial Strategies
From the perspective of designing or engineering opinion-formation games in
social networks, the "opinion maximization (or minimization)" problem has been
studied mainly through subset-selection algorithms. We further define a
two-player zero-sum Stackelberg game of competitive opinion optimization: the
player under study, as the first mover, minimizes the sum of expressed
opinions by performing so-called "internal opinion design", knowing that the
adversarial player, as the follower, maximizes the same objective by
conducting her own internal opinion design.
We propose that the min player play the "follow-the-perturbed-leader"
algorithm in this Stackelberg game, incurring losses that depend on the
adversarial player's play. Since our subset-selection strategy is
combinatorial in nature, a distribution over all strategies has too many
components to enumerate one by one. We therefore design a randomized algorithm
that produces a (randomized) pure strategy. We show that the strategy output
by the randomized algorithm for the min player is essentially an approximate
equilibrium strategy against the adversarial player.
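The abstract's setting is combinatorial subset selection; as a toy illustration of the follow-the-perturbed-leader idea itself, the sketch below runs FTPL over a small set of pure strategies in the experts setting. The exponential perturbation, the sqrt(T) scale, and the i.i.d. loss sequence are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def ftpl_regret(losses, rng):
    """Follow-the-perturbed-leader over n pure strategies.

    Each round, play the strategy minimizing (cumulative past loss
    minus a fresh random perturbation); a perturbation scale of
    sqrt(T) is the standard tuning for O(sqrt(T)) regret."""
    T, n = losses.shape
    scale = np.sqrt(T)
    cum = np.zeros(n)        # cumulative loss of each pure strategy
    total = 0.0              # loss actually incurred by the learner
    for t in range(T):
        perturb = rng.exponential(scale=scale, size=n)
        choice = np.argmin(cum - perturb)   # the perturbed leader
        total += losses[t, choice]
        cum += losses[t]
    return total - cum.min() # regret vs. the best fixed strategy

rng = np.random.default_rng(0)
losses = rng.random((1000, 5))   # illustrative loss sequence
regret = ftpl_regret(losses, rng)
```

The fresh perturbation each round is what randomizes the pure-strategy choice, so the learner never has to represent a distribution over the (possibly exponentially many) strategies explicitly, which is the point the abstract makes for combinatorial strategy spaces.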
Optimization, Learning, and Games with Predictable Sequences
We provide several applications of Optimistic Mirror Descent, an online
learning algorithm based on the idea of predictable sequences. First, we
recover the Mirror Prox algorithm for offline optimization, prove an extension
to Hölder-smooth functions, and apply the results to saddle-point-type
problems. Next, we prove that a version of Optimistic Mirror Descent (which has
a close relation to the Exponential Weights algorithm) can be used by two
strongly-uncoupled players in a finite zero-sum matrix game to converge to the
minimax equilibrium at the rate of O((log T)/T). This addresses a question of
Daskalakis et al. (2011). Further, we consider a partial-information version of
the problem. We then apply the results to convex programming and exhibit a
simple algorithm for the approximate Max Flow problem.
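The Mirror Prox algorithm mentioned above is the mirror-descent generalization of the extragradient method. A minimal Euclidean sketch on the toy saddle point f(x, y) = x*y (an illustrative instance, not drawn from the paper) shows why the look-ahead step matters: simultaneous gradient descent-ascent spirals away from the saddle, while the extragradient correction contracts toward it:

```python
def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x*y."""
    return x - eta * y, y + eta * x

def extragradient_step(x, y, eta):
    """Extragradient: probe a half step, then update with the gradient
    evaluated at the probe point (Mirror Prox with the Euclidean
    distance-generating function)."""
    xm, ym = x - eta * y, y + eta * x   # look-ahead (predictor) step
    return x - eta * ym, y + eta * xm   # corrected step

eta = 0.1
ga = (1.0, 0.0)   # same starting point for both methods
eg = (1.0, 0.0)
for _ in range(1000):
    ga = gda_step(*ga, eta)
    eg = extragradient_step(*eg, eta)

gda_dist = (ga[0] ** 2 + ga[1] ** 2) ** 0.5   # distance to the saddle (0, 0)
eg_dist = (eg[0] ** 2 + eg[1] ** 2) ** 0.5
```

The look-ahead gradient is exactly a "predictable sequence" in the abstract's sense: using the probe point's gradient as a prediction of the next one is what turns a divergent dynamic into a convergent one.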
Cycles in adversarial regularized learning
Regularized learning is a fundamental technique in online optimization,
machine learning and many other fields of computer science. A natural question
that arises in these settings is how regularized learning algorithms behave
when pitted against each other. We study a natural formulation of this problem
by coupling regularized learning dynamics in zero-sum games. We show that the
system's behavior is Poincar\'e recurrent, implying that almost every
trajectory revisits any (arbitrarily small) neighborhood of its starting point
infinitely often. This cycling behavior is robust to the agents' choice of
regularization mechanism (each agent could be using a different regularizer),
to positive-affine transformations of the agents' utilities, and it also
persists in the case of networked competition, i.e., for zero-sum polymatrix
games.
Comment: 22 pages, 4 figures
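A small-step simulation illustrates the cycling the abstract describes. The sketch below runs exponential-weights (entropic-regularizer) dynamics in matching pennies; the game, step size, horizon, and off-equilibrium start are illustrative assumptions, and the discrete iteration only approximates the continuous-time dynamics for which Poincaré recurrence is proved (the discrete trajectory oscillates and slowly drifts outward rather than recurring exactly):

```python
import numpy as np

# Matching pennies: the row player gains +1 on a match, -1 otherwise.
A = np.array([[1., -1.], [-1., 1.]])

eta, T = 0.05, 4000
u = np.zeros(2)           # row player's cumulative payoff per action
v = np.zeros(2)           # column player's cumulative payoff per action
u[0] = 8.0                # start slightly off the uniform equilibrium
xs = []                   # row player's probability of the first action
for _ in range(T):
    x = np.exp(eta * u); x /= x.sum()   # exponential weights
    y = np.exp(eta * v); y /= y.sum()
    u += A @ y                          # row player's payoff vector
    v += -A.T @ x                       # zero-sum: column gets the negative
    xs.append(x[0])
xs = np.array(xs)
```

The trajectory keeps orbiting the interior equilibrium: the play probability oscillates around 1/2 without settling, even though its time average is close to the equilibrium value, matching the robustness of the cycling behavior described above.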
Theoretical and Practical Advances on Smoothing for Extensive-Form Games
Sparse iterative methods, in particular first-order methods, are known to be
among the most effective in solving large-scale two-player zero-sum
extensive-form games. The convergence rates of these methods depend heavily on
the properties of the distance-generating function that they are based on. We
investigate the acceleration of first-order methods for solving extensive-form
games through better design of the dilated entropy function, a class of
distance-generating functions related to the domains associated with the
extensive-form games. By introducing a new weighting scheme for the dilated
entropy function, we develop the first distance-generating function for the
strategy spaces of sequential games that has no dependence on the branching
factor of the player. This result improves the convergence rate of several
first-order methods by a factor of O(b^d d), where b is the branching
factor of the player, and d is the depth of the game tree.
Thus far, counterfactual regret minimization methods have been faster in
practice, and more popular, than first-order methods despite their
theoretically inferior convergence rates. Using our new weighting scheme and
practical tuning, we show that, for the first time, the excessive gap technique
can be made faster in practice than the fastest counterfactual regret
minimization algorithm, CFR+.
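CFR+ is built on the regret-matching+ update rule. As a minimal sketch of that rule alone, assuming away the extensive-form machinery, alternating updates, and linear averaging that CFR+ adds, the following runs regret-matching+ self-play on an illustrative 2x2 zero-sum game whose equilibrium is x = y = (1/3, 2/3):

```python
import numpy as np

def rm_plus_strategy(Q):
    """Regret-matching+: play proportionally to the clipped regrets
    (uniformly when all clipped regrets are zero)."""
    s = Q.sum()
    return Q / s if s > 0 else np.full(len(Q), 1.0 / len(Q))

def rm_plus_selfplay(A, T=10000):
    n, m = A.shape
    Qx = np.zeros(n)   # clipped cumulative regrets, row (min) player
    Qy = np.zeros(m)   # clipped cumulative regrets, column (max) player
    avg_x = np.zeros(n)
    avg_y = np.zeros(m)
    for _ in range(T):
        x = rm_plus_strategy(Qx)
        y = rm_plus_strategy(Qy)
        lx = A @ y      # row player's loss per action
        gy = A.T @ x    # column player's gain per action
        # regret of each action vs. the mixed strategy actually played,
        # clipped at zero after every update (the "+" in RM+)
        Qx = np.maximum(0.0, Qx + (x @ lx) - lx)
        Qy = np.maximum(0.0, Qy + gy - (y @ gy))
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

# Illustrative loss matrix for the row player; equilibrium (1/3, 2/3)
# for both players, game value 1/3.
A = np.array([[3., -1.], [-1., 1.]])
xbar, ybar = rm_plus_selfplay(A)
gap = (A.T @ xbar).max() - (A @ ybar).min()   # exploitability of the averages
```

Clipping the regrets at zero after every step is what distinguishes RM+ from plain regret matching: an action that accumulated large negative regret can re-enter the strategy immediately, which is one reason CFR+ converges so quickly in practice.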