1,968 research outputs found

    Efficiency and formalism of quantum games

    Full text link
    We pursue a general theory of quantum games. We show that quantum games are more efficient than classical games, and provide a saturated upper bound for this efficiency. We demonstrate that the set of finite classical games is a strict subset of the set of finite quantum games. We also deduce the quantum version of the Minimax Theorem and the Nash Equilibrium Theorem.Comment: 10 pages. Efficiency is explicitly defined. More discussion on the connection of quantum and classical game

    Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

    Full text link
    Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action.Comment: 24 pages, 7 figures, 9 tables, expanded version of paper presented at IEEE Conference on Computational Intelligence and Games (CIG) 2014 conferenc

    A Survey of Monte Carlo Tree Search Methods

    Get PDF
    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

    Efficient Transductive Online Learning via Randomized Rounding

    Full text link
    Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, tailored for transductive settings, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we present the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learningComment: To appear in a Festschrift in honor of V.N. Vapnik. Preliminary version presented in NIPS 201

    Online Learning with Feedback Graphs: Beyond Bandits

    Get PDF
    We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced TT-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with Θ~(α1/2T1/2)\widetilde\Theta(\alpha^{1/2} T^{1/2}) minimax regret, where α\alpha is the independence number of the underlying graph; the second class induces problems with Θ~(δ1/3T2/3)\widetilde\Theta(\delta^{1/3}T^{2/3}) minimax regret, where δ\delta is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time

    Minimax Policies for Combinatorial Prediction Games

    Full text link
    We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the problem under three different assumptions for the feedback: full information, and the partial information models of the so-called "semi-bandit", and "bandit" problems. We consider both LL_\infty-, and L2L_2-type of restrictions for the losses assigned by the adversary. We formulate a general strategy using Bregman projections on top of a potential-based gradient descent, which generalizes the ones studied in the series of papers Gyorgy et al. (2007), Dani et al. (2008), Abernethy et al. (2008), Cesa-Bianchi and Lugosi (2009), Helmbold and Warmuth (2009), Koolen et al. (2010), Uchiya et al. (2010), Kale et al. (2010) and Audibert and Bubeck (2010). We provide simple proofs that recover most of the previous results. We propose new upper bounds for the semi-bandit game. Moreover we derive lower bounds for all three feedback assumptions. With the only exception of the bandit game, the upper and lower bounds are tight, up to a constant factor. Finally, we answer a question asked by Koolen et al. (2010) by showing that the exponentially weighted average forecaster is suboptimal against LL_{\infty} adversaries

    Relax and Localize: From Value to Algorithms

    Full text link
    We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones. Our framework also captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2 forecaster. We emphasize that understanding the inherent complexity of the learning problem leads to the development of algorithms. We define local sequential Rademacher complexities and associated algorithms that allow us to obtain faster rates in online learning, similarly to statistical learning theory. Based on these localized complexities we build a general adaptive method that can take advantage of the suboptimality of the observed sequence. We present a number of new algorithms, including a family of randomized methods that use the idea of a "random playout". Several new versions of the Follow-the-Perturbed-Leader algorithms are presented, as well as methods based on the Littlestone's dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts
    corecore