1,968 research outputs found

Efficiency and formalism of quantum games

Author: Johnson Neil
Lee Chiu Fan
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2003

We pursue a general theory of quantum games. We show that quantum games are more efficient than classical games, and provide a saturated upper bound for this efficiency. We demonstrate that the set of finite classical games is a strict subset of the set of finite quantum games. We also deduce the quantum version of the Minimax Theorem and the Nash Equilibrium Theorem. Comment: 10 pages. Efficiency is explicitly defined. More discussion on the connection of quantum and classical games.

arXiv.org e-Print Archive

Oxford University Research Archive

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

Author: Lanctot Marc
Pepels Tom
Sturtevant Nathan R.
Winands Mark H. M.
Publication date: 01/01/2014

Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action. Comment: 24 pages, 7 figures, 9 tables, expanded version of paper presented at IEEE Conference on Computational Intelligence and Games (CIG) 2014 conference.
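
The idea of keeping the two information sources separate can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` layout, the negamax sign convention, rewards in [-1, 1], the blending weight `alpha`, and the UCB-style exploration constant `c` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node that stores playout statistics and heuristic values separately."""
    heuristic: float              # static evaluation, from the view of the player to move here
    visits: int = 0
    total_reward: float = 0.0     # sum of playout rewards, same point of view
    minimax_value: float = 0.0    # heuristic value backed up from the subtree
    children: list = field(default_factory=list)

    def __post_init__(self):
        # Before any child exists, the minimax value is just the static evaluation.
        self.minimax_value = self.heuristic

    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def backup(path, reward):
    """Update both sources of information from the leaf back to the root.

    `path` runs root -> leaf; `reward` is the playout result from the view of
    the player to move at the leaf and is negated at every step up (negamax,
    an assumption of this sketch).
    """
    for node in reversed(path):
        node.visits += 1
        node.total_reward += reward
        if node.children:
            # Implicit minimax backup: best child value, seen from this node.
            node.minimax_value = max(-c.minimax_value for c in node.children)
        reward = -reward


def selection_score(parent, child, alpha=0.4, c=0.7):
    """Blend the two estimates and add a UCB-style exploration bonus."""
    if child.visits == 0:
        return math.inf  # try unvisited children first
    exploit = (1 - alpha) * (-child.mean_reward()) + alpha * (-child.minimax_value)
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore
```

Setting `alpha=0` in this sketch reduces the score to a plain win-rate estimate with exploration, while `alpha=1` follows the backed-up heuristic values only.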

arXiv.org e-Print Archive

CiteSeerX

Maastricht University Research Portal

Crossref

A Survey of Monte Carlo Tree Search Methods

Author: Browne Cameron B
Colton Simon
Cowling Peter I
Lucas Simon M
Perez Diego
Powley Edward
Rohlfshagen Philipp
Samothrakis Spyridon
Tavener Stephen
Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Efficient Transductive Online Learning via Randomized Rounding

Author: Cesa-Bianchi Nicolò
Shamir Ohad
Publication date: 01/01/2013

Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, tailored for transductive settings, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we present the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning. Comment: To appear in a Festschrift in honor of V.N. Vapnik. Preliminary version presented in NIPS 201

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Online Learning with Feedback Graphs: Beyond Bandits

Author: Alon Noga
Cesa-Bianchi Nicolò
Dekel Ofer
Koren Tomer
Publication date: 01/01/2015

We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced $T$-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde\Theta(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde\Theta(\delta^{1/3}T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.
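
The three regimes stated in this abstract can be collected in a single display. The symbol $R^\star(T)$ is a label introduced here for the minimax regret of the $T$-round problem and is not notation taken from the source; "linear" is written as $\Theta(T)$.

```latex
\[
R^\star(T) =
\begin{cases}
\widetilde\Theta\bigl(\alpha^{1/2}\, T^{1/2}\bigr) & \text{strongly observable graph, independence number } \alpha,\\[2pt]
\widetilde\Theta\bigl(\delta^{1/3}\, T^{2/3}\bigr) & \text{weakly observable graph, domination number } \delta,\\[2pt]
\Theta(T) & \text{unobservable graph.}
\end{cases}
\]
```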

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Minimax Policies for Combinatorial Prediction Games

Author: Audibert Jean-Yves
Bubeck Sebastien
Lugosi Gabor
Publication date: 01/01/2011

We address the online linear optimization problem when the actions of the forecaster are represented by binary vectors. Our goal is to understand the magnitude of the minimax regret for the worst possible set of actions. We study the problem under three different assumptions for the feedback: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. We consider both $L_\infty$- and $L_2$-type restrictions for the losses assigned by the adversary. We formulate a general strategy using Bregman projections on top of a potential-based gradient descent, which generalizes the ones studied in the series of papers Gyorgy et al. (2007), Dani et al. (2008), Abernethy et al. (2008), Cesa-Bianchi and Lugosi (2009), Helmbold and Warmuth (2009), Koolen et al. (2010), Uchiya et al. (2010), Kale et al. (2010) and Audibert and Bubeck (2010). We provide simple proofs that recover most of the previous results. We propose new upper bounds for the semi-bandit game. Moreover, we derive lower bounds for all three feedback assumptions. With the only exception of the bandit game, the upper and lower bounds are tight, up to a constant factor. Finally, we answer a question asked by Koolen et al. (2010) by showing that the exponentially weighted average forecaster is suboptimal against $L_\infty$ adversaries.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Relax and Localize: From Value to Algorithms

Author: Rakhlin Alexander
Shamir Ohad
Sridharan Karthik
Publication date: 01/01/2012

We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones. Our framework also captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2 forecaster. We emphasize that understanding the inherent complexity of the learning problem leads to the development of algorithms. We define local sequential Rademacher complexities and associated algorithms that allow us to obtain faster rates in online learning, similarly to statistical learning theory. Based on these localized complexities we build a general adaptive method that can take advantage of the suboptimality of the observed sequence. We present a number of new algorithms, including a family of randomized methods that use the idea of a "random playout". Several new versions of the Follow-the-Perturbed-Leader algorithms are presented, as well as methods based on Littlestone's dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts.
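
For readers unfamiliar with Follow the Perturbed Leader, the following Python sketch shows the textbook experts version of the method. It is not one of the new variants this paper derives. The function names, the exponential perturbation, and the `scale` parameter are assumptions chosen for illustration.

```python
import random


def fpl_choose(cumulative_losses, scale, rng):
    """Pick the expert whose randomly perturbed cumulative loss is smallest."""
    perturbed = [
        loss - scale * rng.expovariate(1.0)  # subtract a fresh random bonus per expert
        for loss in cumulative_losses
    ]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def run_fpl(loss_rounds, scale=1.0, seed=0):
    """Play FPL over a sequence of loss vectors.

    Returns the learner's total loss and the final cumulative loss of each expert.
    """
    rng = random.Random(seed)
    cumulative = [0.0] * len(loss_rounds[0])
    total = 0.0
    for losses in loss_rounds:
        choice = fpl_choose(cumulative, scale, rng)  # decide before seeing this round
        total += losses[choice]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return total, cumulative
```

With `scale=0.0` the perturbation vanishes and the sketch reduces to plain follow-the-leader; larger values trade exploitation of the current leader for robustness to adversarial loss sequences.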

arXiv.org e-Print Archive

CiteSeerX