982 research outputs found

    The Computational Power of Optimization in Online Learning

    Full text link
    We consider the fundamental problem of prediction with expert advice where the experts are "optimizable": there is a black-box optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give a novel online algorithm that attains vanishing regret with respect to NN experts in total O~(N)\widetilde{O}(\sqrt{N}) computation time. We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is Θ~(N)\widetilde{\Theta}(N). These results demonstrate an exponential gap between the power of optimization in online learning and its power in statistical learning: in the latter, an optimization oracle---i.e., an efficient empirical risk minimizer---allows to learn a finite hypothesis class of size NN in time O(logN)O(\log{N}). We also study the implications of our results to learning in repeated zero-sum games, in a setting where the players have access to oracles that compute, in constant time, their best-response to any mixed strategy of their opponent. We show that the runtime required for approximating the minimax value of the game in this setting is Θ~(N)\widetilde{\Theta}(\sqrt{N}), yielding again a quadratic improvement upon the oracle-free setting, where Θ~(N)\widetilde{\Theta}(N) is known to be tight

    Modelling Behavioural Diversity for Learning in Open-Ended Games

    Get PDF
    Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the gamescape -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.Comment: corresponds to <[email protected]

    Learning Convex Partitions and Computing Game-theoretic Equilibria from Best Response Queries

    Full text link
    Suppose that an mm-simplex is partitioned into nn convex regions having disjoint interiors and distinct labels, and we may learn the label of any point by querying it. The learning objective is to know, for any point in the simplex, a label that occurs within some distance ϵ\epsilon from that point. We present two algorithms for this task: Constant-Dimension Generalised Binary Search (CD-GBS), which for constant mm uses poly(n,log(1ϵ))poly(n, \log \left( \frac{1}{\epsilon} \right)) queries, and Constant-Region Generalised Binary Search (CR-GBS), which uses CD-GBS as a subroutine and for constant nn uses poly(m,log(1ϵ))poly(m, \log \left( \frac{1}{\epsilon} \right)) queries. We show via Kakutani's fixed-point theorem that these algorithms provide bounds on the best-response query complexity of computing approximate well-supported equilibria of bimatrix games in which one of the players has a constant number of pure strategies. We also partially extend our results to games with multiple players, establishing further query complexity bounds for computing approximate well-supported equilibria in this setting.Comment: 38 pages, 7 figures, second version strengthens lower bound in Theorem 6, adds footnotes with additional comments and fixes typo

    Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

    Full text link
    Robust reinforcement learning (RL) seeks to train policies that can perform well under environment perturbations or adversarial attacks. Existing approaches typically assume that the space of possible perturbations remains the same across timesteps. However, in many settings, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Empirical experiments on a variety of continuous control tasks demonstrate that our proposed approach exhibits significant robustness advantages compared to baselines against both standard and temporally-coupled attacks, in both state and action spaces

    A quantum view on convex optimization

    Get PDF
    In this dissertation we consider quantum algorithms for convex optimization. We start by considering a black-box setting of convex optimization. In this setting we show that quantum computers require exponentially fewer queries to a membership oracle for a convex set in order to implement a separation oracle for that set. We do so by proving that Jordan's quantum gradient algorithm can also be applied to find sub-gradients of convex Lipschitz functions, even though these functions might not even be differentiable. As a corollary we get a quadraticly faster algorithm for convex optimization using membership queries. As a second set of results we give sub-linear time quantum algorithms for semidefinite optimization by speeding up the iterations of the Arora-Kale algorithm. For the problem of finding approximate Nash equilibria for zero-sum games we then give specific algorithms that improve the error-dependence and only depend on the sparsity of the game, not it's size. These last results yield improved algorithms for linear programming as a corollary. We also show several lower bounds in these settings, matching the upper bounds in most or all parameters

    Non-Asymptotic Pure Exploration by Solving Games

    Get PDF
    Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistake
    corecore