8,020 research outputs found
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in
partially-observable domains based on Decentralized Partially Observable Markov
Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a
general framework for cooperative sequential decision making under uncertainty
and MAs allow temporally extended and asynchronous action execution. To date,
most methods assume the underlying Dec-POMDP model is known a priori or a full
simulator is available during planning time. Previous methods which aim to
address these issues suffer from local optimality and sensitivity to initial
conditions. Additionally, few hardware demonstrations involving a large team of
heterogeneous robots and with long planning horizons exist. This work addresses
these gaps by proposing an iterative sampling based Expectation-Maximization
algorithm (iSEM) to learn polices using only trajectory data containing
observations, MAs, and rewards. Our experiments show the algorithm is able to
achieve better solution quality than the state-of-the-art learning-based
methods. We implement two variants of multi-robot Search and Rescue (SAR)
domains (with and without obstacles) on hardware to demonstrate the learned
policies can effectively control a team of distributed robots to cooperate in a
partially observable stochastic environment.Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017
Recommended from our members
Belief-Space Planning for Resourceful Manipulation and Mobility
Robots are increasingly expected to work in partially observable and unstructured environments. They need to select actions that exploit perceptual and motor resourcefulness to manage uncertainty based on the demands of the task and environment. The research in this dissertation makes two primary contributions. First, it develops a new concept in resourceful robot platforms called the UMass uBot and introduces the sixth and seventh in the uBot series. uBot-6 introduces multiple postural configurations that enable different modes of mobility and manipulation to meet the needs of a wide variety of tasks and environmental constraints. uBot-7 extends this with the use of series elastic actuators (SEAs) to improve manipulation capabilities and support safer operation around humans. The resourcefulness of these robots is complemented with a belief-space planning framework that enables task-driven action selection in the context of the partially observable environment. The framework uses a compact but expressive state representation based on object models. We extend an existing affordance-based object model, called an aspect transition graph (ATG), with geometric information. This enables object-centric modeling of features and actions, making the model much more expressive without increasing the complexity. A novel task representation enables the belief-space planner to perform general object-centric tasks ranging from recognition to manipulation of objects. The approach supports the efficient handling of multi-object scenes. The combination of the physical platform and the planning framework are evaluated in two novel, challenging, partially observable planning domains. The ARcube domain provides a large population of objects that are highly ambiguous. Objects can only be differentiated using multi-modal sensor information and manual interactions. In the dexterous mobility domain, a robot can employ multiple mobility modes to complete navigation tasks under a variety of possible environment constraints. The performance of the proposed approach is evaluated using experiments in simulation and on a real robot
HandMeThat: Human-Robot Communication in Physical and Social Environments
We introduce HandMeThat, a benchmark for a holistic evaluation of instruction
understanding and following in physical and social environments. While previous
datasets primarily focused on language grounding and planning, HandMeThat
considers the resolution of human instructions with ambiguities based on the
physical (object states and relations) and social (human actions and goals)
information. HandMeThat contains 10,000 episodes of human-robot interactions.
In each episode, the robot first observes a trajectory of human actions towards
her internal goal. Next, the robot receives a human instruction and should take
actions to accomplish the subgoal set through the instruction. In this paper,
we present a textual interface for our benchmark, where the robot interacts
with a virtual environment through textual commands. We evaluate several
baseline models on HandMeThat, and show that both offline and online
reinforcement learning algorithms perform poorly on HandMeThat, suggesting
significant room for future work on physical and social human-robot
communications and interactions.Comment: NeurIPS 2022 (Dataset and Benchmark Track). First two authors
contributed equally. Project page: http://handmethat.csail.mit.edu
Path Planning in Rough Terrain Using Neural Network Memory
Learning navigation policies in an unstructured terrain is a complex task. The Learning to Search (LEARCH) algorithm constructs cost functions that map environmental features to a certain cost for traversing a patch of terrain. These features are abstractions of the environment, in which trees, vegetation, slopes, water and rocks can be found, and the traversal costs are scalar values that represent the difficulty for a robot to cross given the patches of terrain. However, LEARCH tends to forget knowledge after new policies are learned. The study demonstrates that reinforcement learning and long-short-term memory (LSTM) neural networks can be used to provide a memory for LEARCH. Further, they allow the navigation agent to recognize hidden states of the state space it navigates. This new approach allows the knowledge learned in the previous training to be used to navigate new environments and, also, for retraining. Herein, navigation episodes are designed to confirm the memory, learning policy and hidden-state recognition capabilities, acquired by the navigation agent through the use of LSTM
- …