Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.
Comment: International Conference on Autonomous Agents and Multiagent Systems
(AAMAS) 2018; NIPS 2017 Deep Reinforcement Learning Symposium.
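The abstract describes an attention mechanism that weighs sensory modalities before feeding a hierarchical (option-critic) policy. Below is a minimal sketch of that fusion idea, assuming two modalities (video and audio) and a simple learned scoring rule per modality; the names, dimensions, and fusion scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: a minimal crossmodal attention layer in the spirit of CASL.
# Feature sizes, scoring weights, and the fusion rule are illustrative only.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def crossmodal_attention(video_feat, audio_feat, w_video, w_audio):
    """Score each modality's feature vector, turn the scores into attention
    weights, and fuse the modalities into one representation for the policy."""
    scores = np.array([video_feat @ w_video, audio_feat @ w_audio])
    alpha = softmax(scores)                                 # attention over modalities
    fused = alpha[0] * video_feat + alpha[1] * audio_feat   # weighted fusion
    return fused, alpha

# Toy usage with random features and scoring weights.
rng = np.random.default_rng(0)
v, a = rng.normal(size=64), rng.normal(size=64)
fused, alpha = crossmodal_attention(v, a, rng.normal(size=64), rng.normal(size=64))
print("attention over (video, audio):", alpha)
```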
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in
partially-observable domains based on Decentralized Partially Observable Markov
Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a
general framework for cooperative sequential decision making under uncertainty
and MAs allow temporally extended and asynchronous action execution. To date,
most methods assume the underlying Dec-POMDP model is known a priori or a full
simulator is available during planning time. Previous methods which aim to
address these issues suffer from local optimality and sensitivity to initial
conditions. Additionally, few hardware demonstrations involving a large team of
heterogeneous robots and with long planning horizons exist. This work addresses
these gaps by proposing an iterative sampling based Expectation-Maximization
algorithm (iSEM) to learn policies using only trajectory data containing
observations, MAs, and rewards. Our experiments show the algorithm is able to
achieve better solution quality than the state-of-the-art learning-based
methods. We implement two variants of multi-robot Search and Rescue (SAR)
domains (with and without obstacles) on hardware to demonstrate the learned
policies can effectively control a team of distributed robots to cooperate in a
partially observable stochastic environment.
Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017).
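As a rough illustration of learning policies from trajectories of observations, macro-actions, and rewards, the sketch below runs a reward-weighted EM-style loop over a tabular policy. It is a simplification under strong assumptions (discrete observations, return-and-likelihood trajectory weights) and is not the iSEM algorithm itself.

```python
# Hedged sketch of EM-style policy learning from trajectory data only.
# Data format and update rules are simplified assumptions, not iSEM.
import numpy as np

def em_policy_learning(trajectories, n_macro_actions, n_obs, iters=20, seed=0):
    """trajectories: list of [(obs, macro_action, reward), ...] lists.
    Learns a tabular stochastic policy pi[obs, ma]: the E-step weights each
    trajectory by its return and its likelihood under the current policy,
    and the M-step renormalizes the weighted action counts."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(n_macro_actions), size=n_obs)  # random init
    for _ in range(iters):
        counts = np.zeros((n_obs, n_macro_actions))
        for traj in trajectories:
            ret = max(sum(r for _, _, r in traj), 1e-8)        # trajectory return
            lik = np.prod([pi[o, a] for o, a, _ in traj])      # likelihood under current policy
            weight = ret * lik                                 # E-step trajectory weight
            for obs, ma, _ in traj:
                counts[obs, ma] += weight
        pi = (counts + 1e-6) / (counts + 1e-6).sum(axis=1, keepdims=True)  # M-step
    return pi

# Toy usage: two short trajectories over 3 observations and 2 macro-actions.
trajs = [[(0, 1, 1.0), (1, 0, 0.0)], [(0, 0, 0.2), (2, 1, 1.5)]]
print(em_policy_learning(trajs, n_macro_actions=2, n_obs=3))
```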
DRIP: Domain Refinement Iteration with Polytopes for Backward Reachability Analysis of Neural Feedback Loops
Safety certification of data-driven control techniques remains a major open
problem. This work investigates backward reachability as a framework for
providing collision avoidance guarantees for systems controlled by neural
network (NN) policies. Because NNs are typically not invertible, existing
methods conservatively assume a domain over which to relax the NN, which causes
loose over-approximations of the set of states that could lead the system into
the obstacle (i.e., backprojection (BP) sets). To address this issue, we
introduce DRIP, an algorithm with a refinement loop on the relaxation domain,
which substantially tightens the BP set bounds. Furthermore, we introduce a
formulation that enables directly obtaining closed-form representations of
polytopes to bound the BP sets tighter than prior work, which required solving
linear programs and using hyper-rectangles. This work also extends the NN
relaxation algorithm to handle polytope domains, which further tightens the
bounds on BP sets. DRIP is demonstrated in numerical experiments on control
systems, including a ground robot controlled by a learned NN obstacle-avoidance
policy.
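The refinement loop described above can be illustrated on a toy 1-D linear system. In the sketch below the NN policy is replaced by a saturated linear controller and the relaxation domain is a plain interval rather than a polytope; these are stand-in assumptions, not the paper's formulation, but they show how re-relaxing over a shrinking domain tightens the backprojection (BP) bounds.

```python
# Hedged sketch of the domain-refinement idea for backward reachability on a
# 1-D system x' = a*x + b*u (assumes a > 0 and b > 0). The policy relaxation
# and constants are illustrative stand-ins, not DRIP's polytope machinery.

def relax_policy(lo, hi):
    """Stand-in NN relaxation: interval bounds of the saturated linear policy
    u = clip(-0.5*x, -1, 1) over [lo, hi] (monotone, so endpoints suffice)."""
    cands = [max(-1.0, min(1.0, -0.5 * x)) for x in (lo, hi)]
    return min(cands), max(cands)

def refine_bp_set(target, domain, a=1.0, b=0.5, iters=10):
    """Iteratively tighten the BP box for the target set by re-relaxing the
    policy over the current, smaller relaxation domain."""
    t_lo, t_hi = target
    d_lo, d_hi = domain
    for _ in range(iters):
        u_lo, u_hi = relax_policy(d_lo, d_hi)            # relax policy over current domain
        bp_lo = (t_lo - b * u_hi) / a                    # states that could reach the target
        bp_hi = (t_hi - b * u_lo) / a
        d_lo, d_hi = max(d_lo, bp_lo), min(d_hi, bp_hi)  # intersect with previous domain
        if d_lo > d_hi:                                  # BP set is empty
            return None
    return d_lo, d_hi

# The loose initial domain [-10, 10] tightens to roughly [4.5, 5.5].
print(refine_bp_set(target=(4.0, 5.0), domain=(-10.0, 10.0)))
```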
Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP which is much more scalable than previous methods since
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems.
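To make the asynchronous macro-action idea concrete, here is a small sketch of a decentralized execution loop in which each robot runs its current macro-action until it terminates and only then queries its high-level policy. The class names, the random placeholder policy, and the package-delivery action labels are illustrative assumptions; belief updates inside each macro-action are omitted.

```python
# Hedged sketch of asynchronous macro-action execution in a Dec-POSMDP-style
# loop; all names and the placeholder policy are illustrative assumptions.
import random

class MacroAction:
    def __init__(self, name, duration):
        self.name, self.remaining = name, duration
    def step(self):
        self.remaining -= 1
    def terminated(self):
        return self.remaining <= 0

def high_level_policy(robot_id, observation):
    """Placeholder decentralized policy: choose a macro-action from local
    information only (here, at random, with a random termination horizon)."""
    return MacroAction(random.choice(["goto_pickup", "goto_dropoff", "wait"]),
                       duration=random.randint(2, 5))

def run(n_robots=3, horizon=12):
    macros = [high_level_policy(i, None) for i in range(n_robots)]
    for t in range(horizon):
        for i, ma in enumerate(macros):
            ma.step()
            if ma.terminated():  # asynchronous: robots switch at different times
                macros[i] = high_level_policy(i, observation=None)
                print(f"t={t}: robot {i} starts macro-action {macros[i].name}")

run()
```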
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Each year, expert-level performance is attained in increasingly-complex
multiagent domains, notable examples including Go, Poker, and StarCraft II.
This rapid progression is accompanied by a commensurate need to better
understand how such agents attain this performance, to enable their safe
deployment, identify limitations, and reveal potential means of improving them.
In this paper we take a step back from performance-focused multiagent learning,
and instead turn our attention towards agent behavior analysis. We introduce a
model-agnostic method for discovery of behavior clusters in multiagent domains,
using variational inference to learn a hierarchy of behaviors at the joint and
local agent levels. Our framework makes no assumption about agents' underlying
learning algorithms, does not require access to their latent states or
policies, and is trained using only offline observational data. We illustrate
the effectiveness of our method for enabling the coupled understanding of
behaviors at the joint and local agent levels, detecting behavior changepoints
throughout training, and discovering core behavioral concepts; we further
demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo
control domain and show that it can disentangle previously-trained policies in
OpenAI's hide-and-seek domain.
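As a loose, model-agnostic illustration of discovering behavior clusters from offline observational data alone, the sketch below summarizes each trajectory with simple statistics and clusters the summaries with k-means. The paper's variational hierarchy over joint and per-agent latents is replaced here by this deliberately simple stand-in; all function names and the synthetic data are assumptions for the example.

```python
# Hedged sketch: clustering offline multiagent trajectories into behavior
# groups. A simple k-means over summary features stands in for the paper's
# variational inference machinery, purely to illustrate the offline,
# observation-only setting.
import numpy as np

def trajectory_features(traj):
    """traj: (T, n_agents, obs_dim) array of offline observations.
    Summarize each trajectory by the mean and std of every observation dim."""
    return np.concatenate([traj.mean(axis=(0, 1)), traj.std(axis=(0, 1))])

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Synthetic data: four 2-agent trajectories drawn around two distinct means.
rng = np.random.default_rng(1)
trajs = [rng.normal(loc=m, size=(100, 2, 4)) for m in (0.0, 0.0, 3.0, 3.0)]
X = np.stack([trajectory_features(t) for t in trajs])
print("behavior cluster per trajectory:", kmeans(X, k=2))
```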