Cycles in adversarial regularized learning
Regularized learning is a fundamental technique in online optimization,
machine learning and many other fields of computer science. A natural question
that arises in these settings is how regularized learning algorithms behave
when pitted against each other. We study a natural formulation of this problem
by coupling regularized learning dynamics in zero-sum games. We show that the
system's behavior is Poincaré recurrent, implying that almost every
trajectory revisits any (arbitrarily small) neighborhood of its starting point
infinitely often. This cycling behavior is robust to the agents' choice of
regularization mechanism (each agent could be using a different regularizer),
to positive-affine transformations of the agents' utilities, and it also
persists in the case of networked competition, i.e., for zero-sum polymatrix
games.
Comment: 22 pages, 4 figures
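The recurrence claim can be illustrated on the smallest possible case. The sketch below is an assumption-laden toy, not the paper's construction: it takes Matching Pennies with row payoff matrix [[1, -1], [-1, 1]], gives both players an entropic regularizer (so each mixed strategy is a logistic function of a score difference), and integrates the resulting continuous-time dynamics with a hand-written RK4 step. The names `field`, `rk4_step`, `kl_from_equilibrium`, and `simulate` are hypothetical. For this particular game the summed KL divergence from the uniform equilibrium works out to log cosh(u/2) + log cosh(v/2), which the dynamics conserve, so the orbit neither converges nor escapes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def field(u, v):
    # u, v: score differences (action 0 minus action 1) of players 1 and 2.
    # With an entropic regularizer the mixed strategies are logistic in the scores.
    p, q = sigmoid(u), sigmoid(v)
    # Matching Pennies: player 1 wants to match, player 2 wants to mismatch.
    return 2.0 * (2.0 * q - 1.0), -2.0 * (2.0 * p - 1.0)

def rk4_step(u, v, dt):
    # Classical fourth-order Runge-Kutta step for the coupled dynamics.
    k1u, k1v = field(u, v)
    k2u, k2v = field(u + 0.5 * dt * k1u, v + 0.5 * dt * k1v)
    k3u, k3v = field(u + 0.5 * dt * k2u, v + 0.5 * dt * k2v)
    k4u, k4v = field(u + dt * k3u, v + dt * k3v)
    u_next = u + dt * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
    v_next = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return u_next, v_next

def kl_from_equilibrium(u, v):
    # Sum over both players of KL(uniform || current strategy),
    # which simplifies to log cosh(score difference / 2) per player.
    return math.log(math.cosh(u / 2.0)) + math.log(math.cosh(v / 2.0))

def simulate(u0=1.0, v0=0.0, dt=0.01, steps=5000):
    # Returns the whole trajectory in score space, starting point included.
    path = [(u0, v0)]
    u, v = u0, v0
    for _ in range(steps):
        u, v = rk4_step(u, v, dt)
        path.append((u, v))
    return path
```

Evaluating `kl_from_equilibrium` along `simulate()` should give a nearly constant value, and the trajectory should pass close to its starting point again after moving far away from it. A coarse Euler discretization would not preserve this quantity, which is why the sketch uses RK4 with a small step.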
Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games
Regret minimization is a powerful tool for solving large-scale extensive-form
games. State-of-the-art methods rely on minimizing regret locally at each
decision point. In this work we derive a new framework for regret minimization
on sequential decision problems and extensive-form games with general compact
convex sets at each decision point and general convex losses, as opposed to
prior work, which has been limited to simplex decision points and linear losses. We
call our framework laminar regret decomposition. It generalizes the CFR
algorithm to this more general setting. Furthermore, our framework enables a
new proof of CFR even in the known setting, which is derived from a perspective
of decomposing polytope regret, thereby leading to an arguably simpler
interpretation of the algorithm. Our generalization to convex compact sets and
convex losses allows us to develop new algorithms for several problems:
regularized sequential decision making, regularized Nash equilibria in
extensive-form games, and computing approximate extensive-form perfect
equilibria. Our generalization also leads to the first regret-minimization
algorithm for computing reduced-normal-form quantal response equilibria based
on minimizing local regrets. Experiments show that our framework leads to
algorithms that scale at a rate comparable to the fastest variants of
counterfactual regret minimization for computing Nash equilibrium, and
therefore our approach leads to the first algorithm for computing quantal
response equilibria in extremely large games. Finally, we show that our
framework enables a new kind of scalable opponent exploitation approach.
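For orientation, the "known setting" the abstract contrasts itself with (a simplex decision point with linear losses) can be sketched with plain regret matching, the local regret minimizer commonly used inside CFR. This is not the laminar regret decomposition itself, only a minimal single-decision-point illustration in self-play on a two-player zero-sum matrix game. The function names and the biased rock-paper-scissors payoffs are hypothetical choices made for the example.

```python
def regret_matching_strategy(cum_regret):
    # Play each action in proportion to its positive cumulative regret;
    # fall back to uniform when no action has positive regret.
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [r / total for r in positive]

def self_play(payoff, iterations):
    # payoff[i][j] is the row player's utility; the column player receives its negative.
    n, m = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * n, [0.0] * m
    sum_row, sum_col = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x = regret_matching_strategy(regret_row)
        y = regret_matching_strategy(regret_col)
        # Expected utility of each pure action against the opponent's current mix.
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        v_row = sum(x[i] * u_row[i] for i in range(n))
        v_col = sum(y[j] * u_col[j] for j in range(m))
        for i in range(n):
            regret_row[i] += u_row[i] - v_row
            sum_row[i] += x[i]
        for j in range(m):
            regret_col[j] += u_col[j] - v_col
            sum_col[j] += y[j]
    # The average strategies, not the last iterates, approach equilibrium.
    return [s / iterations for s in sum_row], [s / iterations for s in sum_col]

def exploitability(payoff, x, y):
    # Sum of both players' best-response values; zero exactly at a Nash equilibrium.
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = max(-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row + best_col

# Hypothetical example game: rock-paper-scissors where rock beating scissors pays 2.
BIASED_RPS = [[0.0, -1.0, 2.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]]
```

Calling `self_play(BIASED_RPS, 20000)` and passing the result to `exploitability` should give a small value, because in a zero-sum game the exploitability of the average strategies equals the two players' summed average regret, which regret matching drives toward zero.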
Competitive Gradient Descent
We introduce a new algorithm for the numerical computation of Nash equilibria
of competitive two-player games. Our method is a natural generalization of
gradient descent to the two-player setting where the update is given by the
Nash equilibrium of a regularized bilinear local approximation of the
underlying game. It avoids oscillatory and divergent behaviors seen in
alternating gradient descent. Using numerical experiments and rigorous
analysis, we provide a detailed comparison to methods based on optimism
and consensus and show that our method avoids making any unnecessary
changes to the gradient dynamics while achieving exponential (local)
convergence for (locally) convex-concave zero sum games. Convergence and
stability properties of our method are robust to strong interactions between
the players, without adapting the stepsize, which is not the case with previous
methods. In our numerical experiments on non-convex-concave problems, existing
methods are prone to divergence and instability due to their sensitivity to
interactions among the players, whereas we never observe divergence of our
algorithm. The ability to choose larger stepsizes furthermore allows our
algorithm to achieve faster convergence, as measured by the number of model
evaluations.
Comment: Appeared in NeurIPS 2019. This version corrects an error in Theorem
2.2. Source code used for the numerical experiments can be found under
http://github.com/f-t-s/CGD. A high-level overview of this work can be found
under http://f-t-s.github.io/projects/cgd
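The update described in the abstract can be worked out by hand for a scalar toy problem. The sketch below is not taken from the linked repository. It assumes the zero-sum bilinear game f(x, y) = alpha * x * y with g = -f, and it assumes the local game regularizes each player's step with a quadratic penalty of weight 1/(2 * eta). Solving the first-order conditions of that local game gives the closed form used in `cgd_step`. The baseline `gda_step` is simultaneous gradient descent-ascent, a simpler stand-in for the alternating variant the abstract mentions. All function names are hypothetical.

```python
import math

def cgd_step(x, y, alpha, eta):
    # Player 1 minimizes f = alpha*x*y over x; player 2 minimizes g = -f over y.
    grad_x_f = alpha * y
    grad_y_g = -alpha * x
    d_xy_f = alpha    # mixed second derivative of f
    d_yx_g = -alpha   # mixed second derivative of g
    # Nash equilibrium of the regularized bilinear local game (scalar case).
    denom = 1.0 - eta ** 2 * d_xy_f * d_yx_g
    dx = -eta * (grad_x_f - eta * d_xy_f * grad_y_g) / denom
    dy = -eta * (grad_y_g - eta * d_yx_g * grad_x_f) / denom
    return x + dx, y + dy

def gda_step(x, y, alpha, eta):
    # Baseline: each player follows its own gradient and ignores the interaction.
    return x - eta * alpha * y, y + eta * alpha * x

def run(step, x, y, alpha, eta, iterations):
    # Returns the distance from the equilibrium (0, 0) after the given number of steps.
    for _ in range(iterations):
        x, y = step(x, y, alpha, eta)
    return math.hypot(x, y)
```

In this toy game each `cgd_step` shrinks the squared distance to the equilibrium by a factor of 1 / (1 + eta^2 * alpha^2), while each `gda_step` grows it by (1 + eta^2 * alpha^2). So `run(cgd_step, 1.0, 1.0, 10.0, 0.2, 100)` contracts even with a strong interaction term and an unchanged step size, whereas the baseline spirals outward.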