
    No-Regret Learning in Extensive-Form Games with Imperfect Recall

    Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
    Comment: 21 pages, 4 figures; expanded version of an article to appear in Proceedings of the Twenty-Ninth International Conference on Machine Learning
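
    The regret-matching rule at the heart of CFR is compact enough to sketch. Below is a minimal, illustrative Python implementation of one CFR-style update at a single information set; the function names and toy utilities are assumptions for illustration, not the paper's code.

        import numpy as np

        def regret_matching(cumulative_regret):
            """Map cumulative regrets at an information set to a strategy:
            normalize the positive regrets, or play uniformly if none exist."""
            positive = np.maximum(cumulative_regret, 0.0)
            total = positive.sum()
            if total > 0:
                return positive / total
            return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))

        def cfr_update(cumulative_regret, action_utilities):
            """One CFR-style update at a single information set: accumulate
            each action's regret against the current strategy's utility."""
            strategy = regret_matching(cumulative_regret)
            node_utility = strategy @ action_utilities
            cumulative_regret += action_utilities - node_utility
            return strategy

        # Toy usage: two actions with fixed counterfactual utilities.
        regrets = np.zeros(2)
        for _ in range(100):
            strategy = cfr_update(regrets, np.array([1.0, 0.0]))
        print(strategy)  # mass shifts toward the higher-utility action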

    Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

    Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
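
    The baseline construction described above admits a short sketch. The following Python snippet is a hedged illustration of a control-variate, importance-weighted value estimate in the spirit of VR-MCCFR's state-action baselines; the variable names and the simulation are assumptions, not the authors' implementation.

        import numpy as np

        def baseline_corrected_values(sampled_action, sampled_value,
                                      sample_probs, baselines):
            """Per-action value estimates using state-action baselines.
            The sampled action gets an importance-weighted correction;
            unsampled actions fall back to their baseline. Unbiased, since
            E[estimate[a]] = b[a] + (v[a] - b[a]) = v[a]."""
            estimates = baselines.copy()
            correction = sampled_value - baselines[sampled_action]
            estimates[sampled_action] += correction / sample_probs[sampled_action]
            return estimates

        # Toy check of unbiasedness: two actions with known true values.
        rng = np.random.default_rng(0)
        true_values = np.array([1.0, -0.5])
        probs = np.array([0.3, 0.7])
        baselines = np.array([0.8, -0.4])  # imperfect but informative
        acc = np.zeros(2)
        for _ in range(100_000):
            a = rng.choice(2, p=probs)
            noisy = true_values[a] + rng.normal(scale=0.1)
            acc += baseline_corrected_values(a, noisy, probs, baselines)
        print(acc / 100_000)  # approaches [1.0, -0.5]

    The closer the baseline sits to the true value, the smaller the variance of the correction term, which is the mechanism behind the reported speedups.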

    Solving Common-Payoff Games with Approximate Policy Iteration

    For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate in a decentralized fashion. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute-force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. Code is available at https://github.com/ssokota/capi .
    Comment: AAAI 2021
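
    To make "brute-forcing optimal solutions" concrete, here is a hedged toy example (not CAPI itself): a tiny two-agent common-payoff game in which each agent maps a private observation bit to an action, and we enumerate all deterministic joint policies. The payoff tensor and names are illustrative assumptions.

        import itertools
        import numpy as np

        # payoff[o1, o2, a1, a2]: the common payoff when agents observe
        # private bits o1, o2 (uniform) and choose actions a1, a2.
        # Values are illustrative, chosen at random.
        rng = np.random.default_rng(0)
        payoff = rng.integers(-1, 3, size=(2, 2, 2, 2)).astype(float)

        def joint_value(pi1, pi2):
            """Expected common payoff of two deterministic policies,
            each a tuple mapping observation bit -> action."""
            return np.mean([payoff[o1, o2, pi1[o1], pi2[o2]]
                            for o1 in (0, 1) for o2 in (0, 1)])

        # Brute force: 4 deterministic policies per agent, 16 joint policies.
        policies = list(itertools.product((0, 1), repeat=2))
        best = max(((p1, p2) for p1 in policies for p2 in policies),
                   key=lambda pair: joint_value(*pair))
        print(best, joint_value(*best))

    The exponential growth of this enumeration with observations, actions, and horizon hints at why decentralized control is so hard, and why methods such as BAD and CAPI must approximate to scale.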

    The Hanabi Challenge: A New Frontier for AI Research

    From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
    Comment: 32 pages, 5 figures; in press (Artificial Intelligence)
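
    As a concrete starting point, here is a hedged random-agent loop against the open-source Hanabi Learning Environment. The call pattern follows the repository's rl_env example script (https://github.com/deepmind/hanabi-learning-environment); treat the exact signatures as assumptions, since they may differ across versions.

        import random
        from hanabi_learning_environment import rl_env

        env = rl_env.make('Hanabi-Full', num_players=2)
        observations = env.reset()
        done = False
        reward = 0
        while not done:
            # Act for the current player with a uniformly random legal move.
            current = observations['player_observations'][0]['current_player']
            legal_moves = observations['player_observations'][current]['legal_moves']
            action = random.choice(legal_moves)
            observations, reward, done, _ = env.step(action)
        print('episode finished, final reward:', reward)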

    The Transcriptomes of Two Heritable Cell Types Illuminate the Circuit Governing Their Differentiation

    The differentiation of cells into distinct cell types, each of which is heritable for many generations, underlies many biological phenomena. White and opaque cells of the fungal pathogen Candida albicans are two such heritable cell types, each thought to be adapted to unique niches within their human host. To systematically investigate their differences, we performed strand-specific, massively parallel sequencing of RNA from C. albicans white and opaque cells. With these data we first annotated the C. albicans transcriptome, finding hundreds of novel differentially expressed transcripts. Using the new annotation, we compared differences in transcript abundance between the two cell types with the genomic regions bound by a master regulator of the white-opaque switch (Wor1). We found that the revised transcriptional landscape considerably alters our understanding of the circuit governing differentiation. In particular, we can now resolve the poor concordance between binding of a master regulator and the differential expression of adjacent genes, a discrepancy observed in several other studies of cell differentiation. More than one third of the Wor1-bound differentially expressed transcripts were previously unannotated, which explains the formerly puzzling presence of Wor1 at these positions along the genome. Many of these newly identified Wor1-regulated genes are non-coding and transcribed antisense to coding transcripts. We also find that 5′ and 3′ UTRs of mRNAs in the circuit are unusually long and that 5′ UTRs often differ in length between cell types, suggesting that UTRs encode important regulatory information and that use of alternative promoters is widespread. Further analysis revealed that the revised Wor1 circuit bears several striking similarities to the Oct4 circuit that specifies the pluripotency of mammalian embryonic stem cells. Additional characteristics shared with the Oct4 circuit suggest a set of general hallmarks characteristic of heritable differentiation states in eukaryotes.

    Pharmacological management of behavioural and psychological disturbance in dementia

    Behavioural and psychological symptoms in patients with dementia are common, distressing and often difficult to manage. This review evaluates a range of drugs commonly used to manage these symptoms, including antipsychotics, anticonvulsants, antidementia drugs and antidepressants. The risks and benefits of individual treatments are discussed, and the relatively poor evidence base and the need for further research are highlighted.