2,668 research outputs found

    Open-ended Learning in Symmetric Zero-sum Games

    Get PDF
    Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.Comment: ICML 2019, final versio

    The Computational Power of Optimization in Online Learning

    Full text link
    We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to NN experts in total O~(N)\widetilde{O}(\sqrt{N}) computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is Θ~(N)\widetilde{\Theta}(N). These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size NN in time O(logN)O(\log{N}). We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is Θ~(N)\widetilde{\Theta}(\sqrt{N}), yielding again a quadratic improvement upon the oracle-free setting, where Θ~(N)\widetilde{\Theta}(N) is known to be tight

    Approximate Convex Optimization by Online Game Playing

    Full text link
    Lagrangian relaxation and approximate optimization algorithms have received much attention in the last two decades. Typically, the running time of these methods to obtain a ϵ\epsilon approximate solution is proportional to 1ϵ2\frac{1}{\epsilon^2}. Recently, Bienstock and Iyengar, following Nesterov, gave an algorithm for fractional packing linear programs which runs in 1ϵ\frac{1}{\epsilon} iterations. The latter algorithm requires to solve a convex quadratic program every iteration - an optimization subroutine which dominates the theoretical running time. We give an algorithm for convex programs with strictly convex constraints which runs in time proportional to 1ϵ\frac{1}{\epsilon}. The algorithm does NOT require to solve any quadratic program, but uses gradient steps and elementary operations only. Problems which have strictly convex constraints include maximum entropy frequency estimation, portfolio optimization with loss risk constraints, and various computational problems in signal processing. As a side product, we also obtain a simpler version of Bienstock and Iyengar's result for general linear programming, with similar running time. We derive these algorithms using a new framework for deriving convex optimization algorithms from online game playing algorithms, which may be of independent interest
    corecore