2,862 research outputs found

Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

Author: Amato Christopher
How Jonathan P.
Liu Miao
Omidshafiei Shayegan
Sivakumar Kavinayan
Publication date: 17/08/2017

This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. Previous methods that aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than the state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment. Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)

arXiv.org e-Print Archive

DSpace@MIT

Solving Factored MDPs with Hybrid State and Action Variables

Author: Guestrin C.
Hauskrecht M.
Kveton B.
Publication venue: 'AI Access Foundation'
Publication date: 30/09/2011

Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
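
The central idea stated in the abstract can be written out as a linear program. The block below is the standard discrete-state approximate linear programming formulation, given only as a sketch: the symbols (basis functions f_i, weights w_i, state-relevance weights \psi, discount factor \gamma, reward R, transition model P) are assumed notation and are not taken from the paper. In a hybrid model, the sums over continuous components would presumably become integrals.

```latex
% Assumed notation; standard discrete-state ALP, not the paper's exact HALP program.
\[
V^{w}(x) = \sum_{i} w_i \, f_i(x)
\]
\[
\min_{w} \; \sum_{x} \psi(x) \, V^{w}(x)
\quad \text{subject to} \quad
V^{w}(x) \;\ge\; R(x,a) + \gamma \sum_{x'} P(x' \mid x, a) \, V^{w}(x')
\quad \text{for all } x, a .
\]
```

Because V^{w} is linear in w, both the objective and every constraint are linear in the weights, which is what makes a linear programming solver applicable.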

arXiv.org e-Print Archive

Crossref

Maximizing the probability of attaining a target prior to extinction

Author: Debasish Chatterjee
Eugenio Cinquemani
John Lygeros
Publication venue: 'Elsevier BV'
Publication date: 27/11/2009

We present a dynamic programming-based solution to the problem of maximizing the probability of attaining a target set before hitting a cemetery set for a discrete-time Markov control process. Under mild hypotheses we establish that there exists a deterministic stationary policy that achieves the maximum value of this probability. We demonstrate how the maximization of this probability can be computed through the maximization of an expected total reward until the first hitting time to either the target or the cemetery set. Martingale characterizations of thrifty, equalizing, and optimal policies in the context of our problem are also established. Comment: 22 pages, 1 figure. Revise
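
The dynamic programming idea in this abstract can be illustrated on a finite-state toy problem. The sketch below is not the paper's algorithm or setting (which is presumably more general); it assumes a small chain with an absorbing target and an absorbing cemetery, and runs value iteration on the probability of reaching the target first. The names `max_reach_probability` and `chain_model`, and the "cautious" and "bold" actions, are hypothetical and chosen only for illustration.

```python
def max_reach_probability(transitions, target, cemetery, tol=1e-12, max_iters=100_000):
    """Value iteration for the maximal probability of hitting `target` before `cemetery`.

    transitions[state][action] is a list of (next_state, probability) pairs.
    Target and cemetery states are treated as absorbing.
    """
    states = set(transitions) | set(target) | set(cemetery)
    # Value 1 on the target, 0 everywhere else (including the cemetery).
    values = {s: (1.0 if s in target else 0.0) for s in states}
    policy = {}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            if s in target or s in cemetery:
                continue  # absorbing states keep their boundary values
            best_action, best_value = None, -1.0
            for a, outcomes in actions.items():
                q = sum(p * values[nxt] for nxt, p in outcomes)
                if q > best_value:
                    best_action, best_value = a, q
            delta = max(delta, abs(best_value - values[s]))
            values[s] = best_value
            policy[s] = best_action
        if delta < tol:
            break
    return values, policy


def chain_model(n, p_up):
    """Toy chain on 0..n (assumed example): 'cautious' moves by 1, 'bold' by 2 (clipped)."""
    model = {}
    for s in range(1, n):
        model[s] = {
            "cautious": [(s + 1, p_up), (s - 1, 1 - p_up)],
            "bold": [(min(s + 2, n), p_up), (max(s - 2, 0), 1 - p_up)],
        }
    return model


values, policy = max_reach_probability(chain_model(4, 0.4), target={4}, cemetery={0})
for s in sorted(values):
    print(s, round(values[s], 4), policy.get(s, "absorbing"))
```

The returned policy is deterministic and stationary (one action per state), matching the kind of policy the abstract says exists under its hypotheses; in this toy model that is simply a consequence of taking the maximizing action in each state.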

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server