The Computational Power of Optimization in Online Learning

Authors: Agarwal A.; Dani V.; Dudík M.; Gofer E.; Hazan E.; Kakade S.; McMahan H. B.; Shalev-Shwartz S.; Zinkevich M.
Publication date: 27/01/2016

We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to $N$ experts in total $\widetilde{O}(\sqrt{N})$ computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is $\widetilde{\Theta}(N)$. These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle (i.e., an efficient empirical risk minimizer) allows one to learn a finite hypothesis class of size $N$ in time $O(\log N)$. We also study the implications of our results for learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is $\widetilde{\Theta}(\sqrt{N})$, yielding again a quadratic improvement upon the oracle-free setting, where $\widetilde{\Theta}(N)$ is known to be tight.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref
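For contrast with the oracle model, here is a minimal sketch of the standard oracle-free baseline, the Hedge (exponential weights) algorithm; it is not the paper's $\widetilde{O}(\sqrt{N})$ oracle-based algorithm. Hedge keeps a weight for every one of the $N$ experts, so each round costs $O(N)$ time. The function name `hedge` and the list-of-loss-vectors input format are assumptions made for illustration.

```python
import math


def hedge(loss_rounds, eta):
    """Hedge over N experts (oracle-free baseline, not the paper's method).

    loss_rounds: list of per-round loss vectors, each with N losses in [0, 1].
    Returns (algorithm's expected cumulative loss, best expert's cumulative loss).
    Every round touches all N experts, so each round costs O(N).
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    alg_loss = 0.0
    for losses in loss_rounds:
        best = min(cumulative)
        # Weights exp(-eta * L_i), shifted by the current best so exp() cannot underflow.
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        # Expected loss of playing an expert drawn from the normalised weights.
        alg_loss += sum(w * l for w, l in zip(weights, losses)) / total
        for i, l in enumerate(losses):
            cumulative[i] += l
    return alg_loss, min(cumulative)
```

With $\eta = \sqrt{8 \ln N / T}$ and losses in $[0, 1]$, the classical analysis bounds Hedge's regret by $\sqrt{(T \ln N)/2}$, so the average regret vanishes as $T$ grows.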

Modelling Behavioural Diversity for Learning in Open-Ended Games

Authors: Mguni David Henry; Nieves Nicolas Perez; Slumbers Oliver; Wang Jun; Wen Ying; Yang Yaodong
Publication date: 10/06/2021

Promoting behavioural diversity is critical for solving games with non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracles (PSRO) for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the gamescape, i.e., the convex polytope spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games lower, exploitability than PSRO solvers by finding effective and diverse strategies.

arXiv.org e-Print Archive

UCL Discovery
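The abstract does not give the metric's formula. As an assumption, the sketch below scores a population of strategies by the expected cardinality of a DPP whose kernel is $L = MM^\top$, where row $i$ of $M$ is strategy $i$'s payoff vector, i.e. $\mathbb{E}|Y| = \operatorname{Tr}(I - (L + I)^{-1})$. The function names are hypothetical, and this illustrates a DPP-based diversity score rather than reproducing the authors' code.

```python
def gram_matrix(payoffs):
    """L = M M^T: entry (i, j) is the dot product of payoff rows i and j."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in payoffs] for ri in payoffs]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix must be nonsingular)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dpp_diversity(payoffs):
    """Hypothetical score: expected DPP cardinality Tr(I - (L + I)^{-1})."""
    L = gram_matrix(payoffs)
    n = len(L)
    # L + I is positive definite because L is a Gram matrix, so it is invertible.
    shifted = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return n - sum(inv[i][i] for i in range(n))


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
score = dpp_diversity(rps)
```

Duplicated strategies add little under this score: two orthogonal payoff vectors score 1, two identical ones score 2/3, and the three Rock-Paper-Scissors rows (which are linearly dependent) score 1.5 up to floating-point rounding.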

Learning Convex Partitions and Computing Game-theoretic Equilibria from Best Response Queries

Authors: C Daskalakis; D Fudenberg; J Fearnley; J Nash; J Robinson; NH Bshouty; P Klemperer; PW Goldberg; S Hart; S Kakutani; X Chen; Y Babichenko
Publication date: 01/01/2018

Suppose that an $m$-simplex is partitioned into $n$ convex regions having disjoint interiors and distinct labels, and we may learn the label of any point by querying it. The learning objective is to know, for any point in the simplex, a label that occurs within some distance $\epsilon$ from that point. We present two algorithms for this task: Constant-Dimension Generalised Binary Search (CD-GBS), which for constant $m$ uses $\mathrm{poly}(n, \log(1/\epsilon))$ queries, and Constant-Region Generalised Binary Search (CR-GBS), which uses CD-GBS as a subroutine and for constant $n$ uses $\mathrm{poly}(m, \log(1/\epsilon))$ queries. We show via Kakutani's fixed-point theorem that these algorithms provide bounds on the best-response query complexity of computing approximate well-supported equilibria of bimatrix games in which one of the players has a constant number of pure strategies. We also partially extend our results to games with multiple players, establishing further query complexity bounds for computing approximate well-supported equilibria in this setting. Comment: 38 pages, 7 figures; the second version strengthens the lower bound in Theorem 6, adds footnotes with additional comments, and fixes a typo.

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive
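CD-GBS itself works in an $m$-simplex. The sketch below covers only the one-dimensional case ($m = 1$, the segment $[0, 1]$), where convex regions are intervals and generalised binary search reduces to plain bisection with $O(n \log(1/\epsilon))$ queries. It illustrates the query model and the $\epsilon$-guarantee, not the paper's higher-dimensional algorithm; `query`, `learn_labels_1d`, and `label_within_eps` are hypothetical names.

```python
def learn_labels_1d(query, eps):
    """Bisect [0, 1] until every adjacent pair of samples either shares a
    label or is at most eps apart. Returns sorted (point, label) pairs."""
    samples = {0.0: query(0.0), 1.0: query(1.0)}
    pending = [(0.0, 1.0)]
    while pending:
        a, b = pending.pop()
        # Regions are convex (intervals) with distinct labels, so equal
        # endpoint labels mean all of [a, b] carries that label.
        if samples[a] == samples[b] or b - a <= eps:
            continue
        mid = (a + b) / 2
        samples[mid] = query(mid)
        pending.append((a, mid))
        pending.append((mid, b))
    return sorted(samples.items())


def label_within_eps(samples, x):
    """Return a label that occurs within eps of x, using only the samples."""
    for (p, lp), (q, lq) in zip(samples, samples[1:]):
        if p <= x <= q:
            # Differing labels only survive on gaps of length <= eps,
            # so the nearer endpoint's label is always within eps of x.
            return lp if x - p <= q - x else lq
    raise ValueError("x must lie in [0, 1]")
```

Each of the $n - 1$ boundaries is located by its own chain of about $\log_2(1/\epsilon)$ bisections; intervals whose endpoints agree are never queried again, which is where the savings over a uniform grid come from.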

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Authors: Huang Furong; Liang Yongyuan; Liu Xiangyu; McAleer Stephen; Sandholm Tuomas; Sun Yanchao; Zheng Ruijie
Publication date: 22/07/2023

Robust reinforcement learning (RL) seeks to train policies that can perform well under environment perturbations or adversarial attacks. Existing approaches typically assume that the space of possible perturbations remains the same across timesteps. However, in many settings, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Empirical experiments on a variety of continuous control tasks demonstrate that our proposed approach exhibits significant robustness advantages over baselines against both standard and temporally-coupled attacks, in both state and action spaces.

arXiv.org e-Print Archive
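The abstract does not describe GRAD's solver for the partially-observable game. To illustrate what an approximate equilibrium of a two-player zero-sum game looks like, the sketch below runs classical fictitious play on a small matrix game: each player best-responds to the opponent's empirical mix, and the two returned bounds bracket the game's value. This is a stand-in technique, not GRAD; the function name and the payoff convention (the row player maximises `A[i][j]`, the column player minimises it) are assumptions.

```python
def fictitious_play(A, iterations):
    """Classical fictitious play on a zero-sum matrix game (not GRAD).

    Returns (row mix, column mix, lower bound, upper bound) on the game value.
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[0] += 1  # arbitrary opening moves
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical play so far.
        row_payoffs = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        col_costs = [sum(A[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        best_row = max(range(m), key=row_payoffs.__getitem__)
        best_col = min(range(n), key=col_costs.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = iterations + 1
    x = [c / total for c in row_counts]
    y = [c / total for c in col_counts]
    # The row mix guarantees at least `lower`; the column mix concedes at most `upper`.
    lower = min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    upper = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    return x, y, lower, upper
```

For a game with a saddle point such as `[[3, 1], [4, 2]]` (value 2), both bounds approach 2 quickly; for Rock-Paper-Scissors the bounds always bracket the value 0 but close more slowly.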

A quantum view on convex optimization

Author: Apeldoorn J.T.S. (Joran) van
Publication date: 01/01/2020

In this dissertation we consider quantum algorithms for convex optimization. We start by considering a black-box setting of convex optimization. In this setting we show that quantum computers require exponentially fewer queries to a membership oracle for a convex set in order to implement a separation oracle for that set. We do so by proving that Jordan's quantum gradient algorithm can also be applied to find subgradients of convex Lipschitz functions, even though these functions need not be differentiable. As a corollary we get a quadratically faster algorithm for convex optimization using membership queries. As a second set of results we give sublinear-time quantum algorithms for semidefinite optimization by speeding up the iterations of the Arora-Kale algorithm. For the problem of finding approximate Nash equilibria of zero-sum games we then give specific algorithms that improve the error dependence and depend only on the sparsity of the game, not its size. These last results yield improved algorithms for linear programming as a corollary. We also show several lower bounds in these settings, matching the upper bounds in most or all parameters.

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Non-Asymptotic Pure Exploration by Solving Games

Authors: Degenne R.R.B.P. (Rémy); Koolen-Wijkstra W.M. (Wouter); Ménard P. (Pierre)
Publication date: 01/01/2019

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistake

CWI's Institutional Repository