The Computational Power of Optimization in Online Learning
We consider the fundamental problem of prediction with expert advice where
the experts are "optimizable": there is a black-box optimization oracle that
can be used to compute, in constant time, the leading expert in retrospect at
any point in time. In this setting, we give a novel online algorithm that
attains vanishing regret with respect to $N$ experts in total
$\widetilde{O}(\sqrt{N})$ computation time. We also give a lower bound showing
that this running time cannot be improved (up to log factors) in the oracle
model, thereby exhibiting a quadratic speedup as compared to the standard,
oracle-free setting where the required time for vanishing regret is
$\widetilde{\Theta}(N)$. These results demonstrate an exponential gap between
the power of optimization in online learning and its power in statistical
learning: in the latter, an optimization oracle (i.e., an efficient empirical
risk minimizer) allows one to learn a finite hypothesis class of size $N$ in time
$O(\log N)$. We also study the implications of our results for learning in
repeated zero-sum games, in a setting where the players have access to oracles
that compute, in constant time, their best-response to any mixed strategy of
their opponent. We show that the runtime required for approximating the minimax
value of the game in this setting is $\widetilde{\Theta}(\sqrt{N})$, yielding
again a quadratic improvement upon the oracle-free setting, where
$\widetilde{\Theta}(N)$ is known to be tight.
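For orientation, the oracle-free baseline referred to in this abstract can be illustrated with the classical Hedge (multiplicative weights) update, which touches every expert's weight in every round. The following is a minimal sketch of that standard baseline only, not the oracle-based algorithm of the paper; the function name `hedge`, the synthetic losses and the step size are illustrative assumptions.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Run Hedge (multiplicative weights) over a sequence of loss vectors.

    loss_rounds[t][i] is the loss in [0, 1] of expert i at round t.
    Returns (expected loss of the learner, loss of the best expert in hindsight).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)  # linear work in the number of experts, every round
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, min(cumulative)


if __name__ == "__main__":
    rng = random.Random(0)
    n_experts, horizon = 16, 2000
    # Hypothetical data: expert 3 has slightly smaller losses on average.
    rounds = [
        [rng.random() * (0.8 if i == 3 else 1.0) for i in range(n_experts)]
        for _ in range(horizon)
    ]
    eta = math.sqrt(8 * math.log(n_experts) / horizon)
    learner, best = hedge(rounds, eta)
    print("average regret per round:", (learner - best) / horizon)
```

With this step size the standard analysis bounds the regret by sqrt(T ln(N) / 2) after T rounds, so the average regret vanishes, but each round costs time linear in the number of experts.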
Modelling Behavioural Diversity for Learning in Open-Ended Games
Promoting behavioural diversity is critical for solving games with
non-transitive dynamics where strategic cycles exist, and there is no
consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous
treatment for defining diversity and constructing diversity-aware learning
dynamics. In this work, we offer a geometric interpretation of behavioural
diversity in games and introduce a novel diversity metric based on
determinantal point processes (DPP). By incorporating the diversity metric into
best-response dynamics, we develop diverse fictitious play and diverse
policy-space response oracle for solving normal-form games and open-ended
games. We prove the uniqueness of the diverse best response and the convergence
of our algorithms on two-player games. Importantly, we show that maximising the
DPP-based diversity metric is guaranteed to enlarge the gamescape, i.e. the convex
polytopes spanned by agents' mixtures of strategies. To validate our
diversity-aware solvers, we test on tens of games that show strong
non-transitivity. Results suggest that our methods achieve at least the same,
and in most games, lower exploitability than PSRO solvers by finding effective
and diverse strategies.
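The abstract does not state the diversity metric itself, so the following is only a rough sketch of the determinantal idea behind it. Under a DPP with kernel L, the probability of selecting a subset is proportional to the determinant of the corresponding submatrix of L, so near-duplicate items score close to zero. Here the kernel is assumed to be the Gram matrix of the strategies' payoff vectors; that choice, and the names `gram_matrix`, `determinant` and `diversity`, are illustrative assumptions rather than the paper's definitions.

```python
def gram_matrix(rows):
    """Inner products between every pair of payoff vectors."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular: the vectors are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def diversity(payoff_rows):
    """Assumed diversity score: determinant of the Gram matrix of payoff vectors."""
    return determinant(gram_matrix(payoff_rows))


if __name__ == "__main__":
    # Rock-Paper-Scissors payoff vectors (row player's payoff against R, P, S).
    rock, paper = [0, -1, 1], [1, 0, -1]
    print(diversity([rock, paper]))  # two distinct strategies
    print(diversity([rock, rock]))   # a duplicated strategy scores 0.0
```

In this toy setting a population containing the same strategy twice has zero score, while two distinct strategies span a parallelogram of positive area, which is the geometric intuition the abstract alludes to.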
Learning Convex Partitions and Computing Game-theoretic Equilibria from Best Response Queries
Suppose that an $m$-simplex is partitioned into $n$ convex regions having
disjoint interiors and distinct labels, and we may learn the label of any point
by querying it. The learning objective is to know, for any point in the
simplex, a label that occurs within some distance $\varepsilon$ from that point.
We present two algorithms for this task: Constant-Dimension Generalised Binary
Search (CD-GBS), which for constant $m$ uses $poly(n, \log(1/\varepsilon))$ queries, and Constant-Region Generalised Binary
Search (CR-GBS), which uses CD-GBS as a subroutine and for constant $n$ uses
$poly(m, \log(1/\varepsilon))$ queries.
We show via Kakutani's fixed-point theorem that these algorithms provide
bounds on the best-response query complexity of computing approximate
well-supported equilibria of bimatrix games in which one of the players has a
constant number of pure strategies. We also partially extend our results to
games with multiple players, establishing further query complexity bounds for
computing approximate well-supported equilibria in this setting.
Comment: 38 pages, 7 figures, second version strengthens lower bound in
Theorem 6, adds footnotes with additional comments and fixes typo
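As a one-dimensional illustration of the query model in this abstract: on a line segment, convex regions are intervals, so a boundary of the leftmost region can be located to within a tolerance by plain bisection on label queries. This is only a sketch of that simplest case, not CD-GBS or CR-GBS; the function `find_boundary` and the three-label example are hypothetical.

```python
def find_boundary(label_of, lo, hi, eps):
    """Bracket a label change on [lo, hi] to width at most eps.

    Assumes label_of(lo) != label_of(hi) and that each label occupies an
    interval. Returns (left, right, queries), where label_of(left) equals the
    label at the original lo and label_of(right) differs from it.
    """
    left_label = label_of(lo)
    queries = 1
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if label_of(mid) == left_label:
            lo = mid  # still inside the leftmost region
        else:
            hi = mid  # already past its boundary
    return lo, hi, queries


if __name__ == "__main__":
    # Hypothetical partition of [0, 1] into three labelled intervals.
    def label_of(x):
        return "A" if x < 0.3 else ("B" if x < 0.7 else "C")

    left, right, queries = find_boundary(label_of, 0.0, 1.0, 1e-3)
    print(left, right, queries)
```

Each query halves the bracket, so the number of queries grows logarithmically in 1/eps; in the example above, ten halvings of the unit interval bring its width below 1e-3.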
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Robust reinforcement learning (RL) seeks to train policies that can perform
well under environment perturbations or adversarial attacks. Existing
approaches typically assume that the space of possible perturbations remains
the same across timesteps. However, in many settings, the space of possible
perturbations at a given timestep depends on past perturbations. We formally
introduce temporally-coupled perturbations, presenting a novel challenge for
existing robust RL methods. To tackle this challenge, we propose GRAD, a novel
game-theoretic approach that treats the temporally-coupled robust RL problem as
a partially-observable two-player zero-sum game. By finding an approximate
equilibrium in this game, GRAD ensures the agent's robustness against
temporally-coupled perturbations. Empirical experiments on a variety of
continuous control tasks demonstrate that our proposed approach exhibits
significant robustness advantages compared to baselines against both standard
and temporally-coupled attacks, in both state and action spaces.
A quantum view on convex optimization
In this dissertation we consider quantum algorithms for convex optimization. We start by considering a black-box setting of convex optimization. In this setting we show that quantum computers require exponentially fewer queries to a membership oracle for a convex set in order to implement a separation oracle for that set. We do so by proving that Jordan's quantum gradient algorithm can also be applied to find sub-gradients of convex Lipschitz functions, even though these functions might not even be differentiable. As a corollary we get a quadratically faster algorithm for convex optimization using membership queries. As a second set of results we give sub-linear time quantum algorithms for semidefinite optimization by speeding up the iterations of the Arora-Kale algorithm. For the problem of finding approximate Nash equilibria for zero-sum games we then give specific algorithms that improve the error-dependence and only depend on the sparsity of the game, not its size. These last results yield improved algorithms for linear programming as a corollary. We also show several lower bounds in these settings, matching the upper bounds in most or all parameters.
Non-Asymptotic Pure Exploration by Solving Games
Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistake
- …