Bayesian RL in factored POMDPs
Robust decision-making agents in any non-trivial system must reason over uncertainty of various types, such as action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately is often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment, and use this knowledge to select actions that, theoretically, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayesian network, an approach that exploits structure in the form of independence between state and observation features.
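To make the factorization concrete, here is a minimal sketch (our illustration, not the implementation of [3]) of a factored transition posterior: each next-state feature keeps Dirichlet counts conditioned only on its assumed parent features, instead of one count table per full state-action pair as in the tabular BA-POMDP.

```python
import numpy as np

# Sketch of a factored Dirichlet posterior over transition dynamics.
# Assumptions (ours): discrete features, a known parent structure
# parents[i] for each next-state feature i, and a shared number of
# feature values. Counts are indexed by (action, parent values), so one
# observed transition informs every state that agrees on those parents.
class FactoredTransitionPosterior:
    def __init__(self, n_features, parents, n_values=2, prior=1.0):
        self.parents = parents              # parents[i]: feature indices i depends on
        self.n_values = n_values
        self.prior = prior
        self.counts = [dict() for _ in range(n_features)]

    def _key(self, i, state, action):
        return (action, tuple(state[j] for j in self.parents[i]))

    def update(self, state, action, next_state):
        # Bayesian update: bump the Dirichlet count of each observed outcome.
        for i, v in enumerate(next_state):
            key = self._key(i, state, action)
            if key not in self.counts[i]:
                self.counts[i][key] = np.full(self.n_values, self.prior)
            self.counts[i][key][v] += 1.0

    def sample_next_state(self, state, action, rng):
        # Posterior-predictive sample, feature by feature (Dirichlet-categorical).
        result = []
        for i in range(len(self.counts)):
            c = self.counts[i].get(self._key(i, state, action),
                                   np.full(self.n_values, self.prior))
            result.append(int(rng.choice(self.n_values, p=c / c.sum())))
        return tuple(result)
```

Because counts are shared across all states that agree on a feature's parents, far fewer transitions are needed to learn the dynamics than with one Dirichlet per full state, which is precisely the structure the tabular representation cannot exploit.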
Scalable Planning and Learning for Multiagent POMDPs: Extended Version
Online, sample-based planning algorithms for POMDPs have shown great promise
in scaling to problems with large state spaces, but they become intractable for
large action and observation spaces. This is particularly problematic in
multiagent POMDPs where the action and observation space grows exponentially
with the number of agents. To combat this intractability, we propose a novel
scalable approach based on sample-based planning and factored value functions
that exploits structure present in many multiagent settings. This approach
applies not only in the planning case, but also in the Bayesian reinforcement
learning setting. Experimental results show that we are able to provide
high-quality solutions to large multiagent planning and learning problems.
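As an illustration of the idea (a generic sketch under assumed pairwise factors, not the paper's algorithm), a sample-based planner can keep value and visit statistics per factor over local joint actions, and select joint actions by approximately maximizing the sum of per-factor upper confidence bounds:

```python
import math
import random

# Sketch of factored UCB statistics for joint-action selection in a
# sample-based planner. Assumptions (ours): a given factorization into
# small agent subsets (e.g. pairs) and a common action count per agent.
class FactoredStats:
    def __init__(self, factors, n_actions, c=1.0):
        self.factors = factors          # e.g. [(0, 1), (1, 2)] for 3 agents
        self.n_actions = n_actions
        self.c = c
        self.N = {e: {} for e in factors}   # visit counts per local joint action
        self.Q = {e: {} for e in factors}   # mean return per local joint action

    def ucb(self, e, local_a, total):
        n = self.N[e].get(local_a, 0)
        if n == 0:
            return float('inf')         # force exploration of unvisited local actions
        return self.Q[e][local_a] + self.c * math.sqrt(math.log(total + 1) / n)

    def select(self, n_agents, total):
        # Coordinate ascent over agents: a cheap stand-in for exact
        # maximization (e.g. variable elimination) over the factor graph.
        joint = [random.randrange(self.n_actions) for _ in range(n_agents)]
        for _ in range(2):               # a few sweeps suffice on small graphs
            for i in range(n_agents):
                def score(a):
                    cand = joint[:i] + [a] + joint[i + 1:]
                    return sum(self.ucb(e, tuple(cand[k] for k in e), total)
                               for e in self.factors if i in e)
                joint[i] = max(range(self.n_actions), key=score)
        return tuple(joint)

    def update(self, joint, ret):
        # Credit the sampled return to every factor's local joint action.
        for e in self.factors:
            la = tuple(joint[k] for k in e)
            n = self.N[e].get(la, 0) + 1
            self.N[e][la] = n
            q = self.Q[e].get(la, 0.0)
            self.Q[e][la] = q + (ret - q) / n
```

With pairwise factors, each table holds at most n_actions² entries rather than n_actions^n for the full joint action, which is what lets the statistics generalize as the number of agents grows.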
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them into
sub-problems and, for each sub-problem, making optimistic assumptions about
the influence that will be exerted by the rest of the system.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.
Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
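As a rough illustration of the construction (hypothetical sub-problem interfaces of our own, not the paper's formulation), an influence-optimistic local value can be computed by finite-horizon value iteration in which the expectation over the external influence is replaced by a maximization; summing these local values over sub-problems then yields a global upper bound:

```python
# Sketch of an influence-optimistic upper bound. Assumed (hypothetical)
# sub-problem interface: .states, .actions, .influences (possible values
# of the external influence), .reward(s, a, u), .transitions(s, a, u)
# returning (next_state, probability) pairs, and .initial_state.

def optimistic_local_value(sp, horizon):
    V = {s: 0.0 for s in sp.states}
    for _ in range(horizon):
        V = {s: max(max(sp.reward(s, a, u)
                        + sum(p * V[s2] for s2, p in sp.transitions(s, a, u))
                        for u in sp.influences)      # optimistic: max, not E
                    for a in sp.actions)
             for s in sp.states}
    return V[sp.initial_state]

def influence_optimistic_upper_bound(subproblems, horizon):
    # Each local value can only overestimate what any actual influence
    # allows, so the sum upper-bounds the optimal global value.
    return sum(optimistic_local_value(sp, horizon) for sp in subproblems)
```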
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
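For intuition, a single-sample sequential ELBO of the kind such an objective builds on can be written as follows (a simplified sketch with assumed distribution-returning networks; DVRL itself uses a particle-based bound):

```python
import torch

# Generative model: z_t ~ p(z_t | z_{t-1}, a_{t-1}), o_t ~ p(o_t | z_t).
# Inference network: q(z_t | z_{t-1}, a_{t-1}, o_t). All three callables
# are assumed to return torch.distributions objects.
def n_step_elbo(prior, decoder, encoder, z0, actions, observations):
    elbo = 0.0
    z = z0
    for a, o in zip(actions, observations):
        q = encoder(z, a, o)              # approximate posterior over z_t
        p = prior(z, a)                   # latent transition prior
        z = q.rsample()                   # reparameterized sample, keeps gradients
        elbo = elbo + decoder(z).log_prob(o).sum()                  # reconstruction
        elbo = elbo - torch.distributions.kl_divergence(q, p).sum() # regularization
    return elbo   # maximized jointly with the policy loss
```

Training the encoder, prior, and decoder by maximizing this bound alongside the policy loss is what ties the latent state representation to the control task, as the abstract describes.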
On-Robot Bayesian Reinforcement Learning for POMDPs
Robot learning is often difficult due to the expense of gathering data. The
need for large amounts of data can, and should, be tackled with effective
algorithms and leveraging expert information on robot dynamics. Bayesian
reinforcement learning (BRL), thanks to its sample efficiency and ability to
exploit prior knowledge, is uniquely positioned as such a solution method.
Unfortunately, the application of BRL has been limited due to the difficulties
of representing expert knowledge as well as solving the subsequent inference
problem. This paper advances BRL for robotics by proposing a specialized
framework for physical systems. In particular, we capture this knowledge in a
factored representation, then demonstrate the posterior factorizes in a similar
shape, and ultimately formalize the model in a Bayesian framework. We then
introduce a sample-based online solution method, based on Monte-Carlo tree
search and particle filtering, specialized to solve the resulting model. This
approach can, for example, utilize typical low-level robot simulators and
handle uncertainty over unknown dynamics of the environment. We empirically
demonstrate its efficiency by performing on-robot learning in two human-robot
interaction tasks with uncertainty about human behavior, achieving near-optimal
performance after only a handful of real-world episodes. A video of learned
policies is at https://youtu.be/H9xp60ngOes.
Comment: Accepted at IROS-2023 (Detroit, USA).
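A minimal sketch of how such a solver can be organized (hypothetical helper functions of our own, not the paper's code): each MCTS simulation root-samples one (state, model) particle from the posterior, and a likelihood-weighted particle filter tracks the joint belief over states and unknown dynamics:

```python
import random

def plan(belief_particles, tree, n_sims, simulate):
    # belief_particles: list of (state, model) pairs approximating the
    # joint posterior; simulate runs one MCTS simulation with a fixed model.
    for _ in range(n_sims):
        state, model = random.choice(belief_particles)   # root sampling
        simulate(tree, state, model)
    return tree.best_action()

def update_belief(particles, action, observation, step, obs_prob, n):
    # Particle filter step: propagate each particle through its own
    # sampled model, weight by the real observation's likelihood, resample.
    moved = [(step(s, action, m), m) for s, m in particles]
    weights = [obs_prob(observation, s2, action, m) for s2, m in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)     # degenerate case: keep uniform
    return random.choices(moved, weights=weights, k=n)
```

Because each particle carries its own dynamics model, the tree statistics average over model uncertainty as well as state uncertainty, which is how prior knowledge and a low-level simulator can be exploited for sample efficiency.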
The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.