
    Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

    This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or that a full simulator is available during planning. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions, and few hardware demonstrations involving a large team of heterogeneous robots with long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm achieves better solution quality than state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.
    Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017).
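    To make the data format concrete, below is a minimal, heavily simplified sketch of learning a tabular macro-action policy from logged trajectories of (observation, macro-action, reward) tuples via reward-weighted regression. It only illustrates the kind of trajectory data the abstract describes; it is not the paper's iSEM algorithm, and all names and dimensions are illustrative assumptions.

```python
# Simplified sketch: reward-weighted regression over logged trajectories.
# NOT the paper's iSEM; names such as `trajectories`, `n_obs`, `n_ma` are assumptions.
import numpy as np

def reward_weighted_policy(trajectories, n_obs, n_ma):
    """trajectories: list of episodes, each a list of (obs_id, ma_id, reward)."""
    returns = np.array([sum(r for _, _, r in ep) for ep in trajectories])
    # Normalize episode returns to [0, 1] so they can act as non-negative weights
    weights = (returns - returns.min()) / (returns.max() - returns.min() + 1e-8)
    counts = np.zeros((n_obs, n_ma))
    for episode, w in zip(trajectories, weights):
        for obs, ma, _ in episode:
            counts[obs, ma] += w
    # policy[o, a] ~ P(macro-action a | observation o), with a small smoothing prior
    return (counts + 1e-3) / (counts + 1e-3).sum(axis=1, keepdims=True)
```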

    HandMeThat: Human-Robot Communication in Physical and Social Environments

    We introduce HandMeThat, a benchmark for the holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of ambiguous human instructions based on physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interaction. In each episode, the robot first observes a trajectory of human actions toward an internal goal. Next, the robot receives a human instruction and must take actions to accomplish the subgoal specified by that instruction. In this paper, we present a textual interface for our benchmark, in which the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat and show that both offline and online reinforcement learning algorithms perform poorly, suggesting significant room for future work on physical and social human-robot communication and interaction.
    Comment: NeurIPS 2022 (Datasets and Benchmarks Track). First two authors contributed equally. Project page: http://handmethat.csail.mit.edu
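    As a rough illustration of what interaction through a textual interface can look like, here is a hedged sketch of an episode loop: the agent reads the observed human trajectory plus the instruction, then issues textual commands until the episode ends. The environment class, its reset/step signatures, and the `admissible_commands` field are hypothetical placeholders for illustration, not the benchmark's published API.

```python
# Hypothetical text-environment loop; the env interface below is an assumption.
import random

class RandomTextAgent:
    """Picks a random admissible textual command each step."""
    def act(self, observation, admissible_commands):
        return random.choice(admissible_commands)

def run_episode(env, agent, max_steps=50):
    obs, info = env.reset()  # obs: human action trajectory + instruction text
    total_reward = 0.0
    for _ in range(max_steps):
        command = agent.act(obs, info["admissible_commands"])
        obs, reward, done, info = env.step(command)
        total_reward += reward
        if done:
            break
    return total_reward
```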

    Path Planning in Rough Terrain Using Neural Network Memory

    Learning navigation policies in unstructured terrain is a complex task. The Learning to Search (LEARCH) algorithm constructs cost functions that map environmental features to the cost of traversing a patch of terrain. These features are abstractions of the environment, in which trees, vegetation, slopes, water, and rocks can be found, and the traversal costs are scalar values that represent how difficult it is for a robot to cross the given patches of terrain. However, LEARCH tends to forget knowledge after new policies are learned. This study demonstrates that reinforcement learning and long short-term memory (LSTM) neural networks can be used to provide a memory for LEARCH. Furthermore, they allow the navigation agent to recognize hidden states of the state space it navigates. This new approach allows knowledge learned in previous training to be reused for navigating new environments and for retraining. Herein, navigation episodes are designed to confirm the memory, learned policy, and hidden-state recognition capabilities acquired by the navigation agent through the use of LSTM.
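    For intuition, the sketch below shows one way an LSTM can act as a memory over a sequence of terrain feature patches, producing a scalar traversal cost per patch in the spirit of giving LEARCH-style cost functions a memory of past observations. It assumes PyTorch; the feature count, hidden size, and non-negativity via softplus are illustrative assumptions, not the paper's architecture.

```python
# Assumed PyTorch sketch: LSTM memory over terrain features -> per-patch traversal cost.
import torch
import torch.nn as nn

class LSTMCostModel(nn.Module):
    def __init__(self, n_features=8, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, feature_seq):
        # feature_seq: (batch, seq_len, n_features) terrain feature vectors
        hidden_states, _ = self.lstm(feature_seq)
        # Non-negative traversal cost for each patch along the path
        return nn.functional.softplus(self.head(hidden_states)).squeeze(-1)

# Example: costs for a batch of 2 paths, each crossing 10 terrain patches
costs = LSTMCostModel()(torch.randn(2, 10, 8))  # shape: (2, 10)
```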