
    No-Regret Learning in Extensive-Form Games with Imperfect Recall

    Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
    Comment: 21 pages, 4 figures; expanded version of an article to appear in Proceedings of the Twenty-Ninth International Conference on Machine Learning
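
    The regret-matching rule at the heart of CFR is compact enough to sketch. Below is a minimal, illustrative Python implementation of one CFR-style update at a single information set; the function names and toy utilities are assumptions for illustration, not the paper's code.

        import numpy as np

        def regret_matching(cumulative_regret):
            """Map cumulative regrets at an information set to a strategy:
            normalize the positive regrets, or play uniformly if none exist."""
            positive = np.maximum(cumulative_regret, 0.0)
            total = positive.sum()
            if total > 0:
                return positive / total
            return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))

        def cfr_update(cumulative_regret, action_utilities):
            """One CFR-style update at a single information set: accumulate
            each action's regret against the current strategy's utility."""
            strategy = regret_matching(cumulative_regret)
            node_utility = strategy @ action_utilities
            cumulative_regret += action_utilities - node_utility
            return strategy

        # Toy usage: two actions with fixed counterfactual utilities.
        regrets = np.zeros(2)
        for _ in range(100):
            strategy = cfr_update(regrets, np.array([1.0, 0.0]))
        print(strategy)  # mass shifts toward the higher-utility action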

    Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

    Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
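
    The baseline construction described above admits a short sketch. The following Python snippet is a hedged illustration of a control-variate, importance-weighted value estimate in the spirit of VR-MCCFR's state-action baselines; the variable names and the simulation are assumptions, not the authors' implementation.

        import numpy as np

        def baseline_corrected_values(sampled_action, sampled_value,
                                      sample_probs, baselines):
            """Per-action value estimates using state-action baselines.
            The sampled action gets an importance-weighted correction;
            unsampled actions fall back to their baseline. Unbiased, since
            E[estimate[a]] = b[a] + (v[a] - b[a]) = v[a]."""
            estimates = baselines.copy()
            correction = sampled_value - baselines[sampled_action]
            estimates[sampled_action] += correction / sample_probs[sampled_action]
            return estimates

        # Toy check of unbiasedness: two actions with known true values.
        rng = np.random.default_rng(0)
        true_values = np.array([1.0, -0.5])
        probs = np.array([0.3, 0.7])
        baselines = np.array([0.8, -0.4])  # imperfect but informative
        acc = np.zeros(2)
        for _ in range(100_000):
            a = rng.choice(2, p=probs)
            noisy = true_values[a] + rng.normal(scale=0.1)
            acc += baseline_corrected_values(a, noisy, probs, baselines)
        print(acc / 100_000)  # approaches [1.0, -0.5]

    The closer the baseline sits to the true value, the smaller the variance of the correction term, which is the mechanism behind the reported speedups.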

    Solving Common-Payoff Games with Approximate Policy Iteration

    For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate in a decentralized fashion. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute-force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep reinforcement learning. However, unlike BAD, CAPI prioritizes the propensity to discover optimal joint policies over scalability. While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so. Code is available at https://github.com/ssokota/capi .
    Comment: AAAI 2021
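
    To make "brute-forcing optimal solutions" concrete, here is a hedged toy example (not CAPI itself): a tiny two-agent common-payoff game in which each agent maps a private observation bit to an action, and we enumerate all deterministic joint policies. The payoff tensor and names are illustrative assumptions.

        import itertools
        import numpy as np

        # payoff[o1, o2, a1, a2]: the common payoff when agents observe
        # private bits o1, o2 (uniform) and choose actions a1, a2.
        # Values are illustrative, chosen at random.
        rng = np.random.default_rng(0)
        payoff = rng.integers(-1, 3, size=(2, 2, 2, 2)).astype(float)

        def joint_value(pi1, pi2):
            """Expected common payoff of two deterministic policies,
            each a tuple mapping observation bit -> action."""
            return np.mean([payoff[o1, o2, pi1[o1], pi2[o2]]
                            for o1 in (0, 1) for o2 in (0, 1)])

        # Brute force: 4 deterministic policies per agent, 16 joint policies.
        policies = list(itertools.product((0, 1), repeat=2))
        best = max(((p1, p2) for p1 in policies for p2 in policies),
                   key=lambda pair: joint_value(*pair))
        print(best, joint_value(*best))

    The exponential growth of this enumeration with observations, actions, and horizon hints at why decentralized control is so hard, and why methods such as BAD and CAPI must approximate to scale.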

    The Hanabi Challenge: A New Frontier for AI Research

    From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
    Comment: 32 pages, 5 figures; in press (Artificial Intelligence)
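
    As a concrete starting point, here is a hedged random-agent loop against the open-source Hanabi Learning Environment. The call pattern follows the repository's rl_env example script (https://github.com/deepmind/hanabi-learning-environment); treat the exact signatures as assumptions, since they may differ across versions.

        import random
        from hanabi_learning_environment import rl_env

        env = rl_env.make('Hanabi-Full', num_players=2)
        observations = env.reset()
        done = False
        reward = 0
        while not done:
            # Act for the current player with a uniformly random legal move.
            current = observations['player_observations'][0]['current_player']
            legal_moves = observations['player_observations'][current]['legal_moves']
            action = random.choice(legal_moves)
            observations, reward, done, _ = env.step(action)
        print('episode finished, final reward:', reward)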

    The Transcriptomes of Two Heritable Cell Types Illuminate the Circuit Governing Their Differentiation

    The differentiation of cells into distinct cell types, each of which is heritable for many generations, underlies many biological phenomena. White and opaque cells of the fungal pathogen Candida albicans are two such heritable cell types, each thought to be adapted to unique niches within their human host. To systematically investigate their differences, we performed strand-specific, massively parallel sequencing of RNA from C. albicans white and opaque cells. With these data we first annotated the C. albicans transcriptome, finding hundreds of novel differentially expressed transcripts. Using the new annotation, we compared differences in transcript abundance between the two cell types with the genomic regions bound by a master regulator of the white-opaque switch (Wor1). We found that the revised transcriptional landscape considerably alters our understanding of the circuit governing differentiation. In particular, we can now resolve the poor concordance between binding of a master regulator and the differential expression of adjacent genes, a discrepancy observed in several other studies of cell differentiation. More than one third of the Wor1-bound differentially expressed transcripts were previously unannotated, which explains the formerly puzzling presence of Wor1 at these positions along the genome. Many of these newly identified Wor1-regulated genes are non-coding and transcribed antisense to coding transcripts. We also find that 5′ and 3′ UTRs of mRNAs in the circuit are unusually long and that 5′ UTRs often differ in length between cell types, suggesting that UTRs encode important regulatory information and that use of alternative promoters is widespread. Further analysis revealed that the revised Wor1 circuit bears several striking similarities to the Oct4 circuit that specifies the pluripotency of mammalian embryonic stem cells. Additional characteristics shared with the Oct4 circuit suggest a set of general hallmarks characteristic of heritable differentiation states in eukaryotes.

    Pharmacological management of behavioural and psychological disturbance in dementia

    Behavioural and psychological symptoms in patients with dementia are common, distressing and often difficult to manage. This review evaluates a range of drugs commonly used to manage these symptoms, including antipsychotics, anticonvulsants, antidementia drugs and antidepressants. The risks and benefits of individual treatments are discussed, and the relatively poor evidence base and the need for further research are highlighted.