Action Guidance with MCTS for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years;
however, one main challenge is sample inefficiency. In this paper, we focus
on how to use action guidance by means of a non-expert demonstrator to improve
sample efficiency in a domain with sparse, delayed, and possibly deceptive
rewards: the recently-proposed multi-agent benchmark of Pommerman. We propose a
new framework where even a non-expert simulated demonstrator, e.g., planning
algorithms such as Monte Carlo tree search with a small number of rollouts, can be
integrated within asynchronous distributed deep reinforcement learning methods.
Compared to a vanilla deep RL algorithm, our proposed methods both learn faster
and converge to better policies on a two-player mini version of the Pommerman
game.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with
arXiv:1904.05759, arXiv:1812.0004
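As an illustration of the general idea, here is a minimal sketch of folding demonstrator guidance into an actor-critic loss as an auxiliary imitation term. This is not the paper's asynchronous distributed framework; the names (`demo_actions`, `guidance_coef`) and the loss weighting are assumptions made for the example.

```python
# Sketch: actor-critic loss with an auxiliary action-guidance term that
# imitates actions suggested by a (possibly non-expert) MCTS demonstrator.
import torch
import torch.nn.functional as F

def actor_critic_loss_with_guidance(policy_logits, values, returns, actions,
                                    demo_actions, guidance_coef=0.1):
    """policy_logits: (B, A); values, returns: (B,); actions, demo_actions: (B,) long."""
    advantages = returns - values.detach()
    log_probs = F.log_softmax(policy_logits, dim=-1)
    # Standard policy-gradient and value terms.
    pg_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    # Auxiliary term: cross-entropy toward the demonstrator's chosen actions.
    guidance_loss = F.nll_loss(log_probs, demo_actions)
    return pg_loss + 0.5 * value_loss + guidance_coef * guidance_loss
```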
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 2020
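To make the object-based dynamics idea concrete, the following is a small sketch of an object-conditioned dynamics predictor with a simple pairwise relational module. It is not the MAOP architecture; the class name, layer sizes, and aggregation scheme are assumptions for illustration only.

```python
# Sketch: predict next per-object states from current object features and the
# action, using summed pairwise "effects" as a simple relational mechanism.
import torch
import torch.nn as nn

class RelationalDynamics(nn.Module):
    def __init__(self, obj_dim=32, act_dim=6, hidden=64):
        super().__init__()
        self.pair = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden))
        self.update = nn.Sequential(nn.Linear(obj_dim + hidden + act_dim, hidden),
                                    nn.ReLU(), nn.Linear(hidden, obj_dim))

    def forward(self, objs, action):
        # objs: (B, N, obj_dim), action: (B, act_dim) -> next states (B, N, obj_dim)
        B, N, D = objs.shape
        src = objs.unsqueeze(2).expand(B, N, N, D)
        dst = objs.unsqueeze(1).expand(B, N, N, D)
        effects = self.pair(torch.cat([src, dst], dim=-1)).sum(dim=2)  # aggregate pairwise effects
        act = action.unsqueeze(1).expand(B, N, action.shape[-1])
        return objs + self.update(torch.cat([objs, effects, act], dim=-1))
```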
Beyond Games: A Systematic Review of Neural Monte Carlo Tree Search Applications
The advent of AlphaGo and its successors marked the beginning of a new
paradigm in playing games using artificial intelligence. This was achieved by
combining Monte Carlo tree search, a planning procedure, and deep learning.
While the impact on the domain of games has been undeniable, it is less clear
how useful similar approaches are in applications beyond games and how they
need to be adapted from the original methodology. We review 129 peer-reviewed
articles detailing the application of neural Monte Carlo tree search methods in
domains other than games. Our goal is to systematically assess how such methods
are structured in practice and if their success can be extended to other
domains. We find applications in a variety of domains, many distinct ways of
guiding the tree search using learned policy and value functions, and various
training methods. Our review maps the current landscape of algorithms in the
family of neural Monte Carlo tree search as they are applied to practical
problems, which is a first step towards a more principled way of designing such
algorithms for specific problems and their requirements.
Comment: 38 pages, 14 figures, submitted to Springer Applied Intelligence