169 research outputs found

Stick-Breaking Policy Learning in Dec-POMDPs

Author: Amato Christopher
Carin Lawrence
How Jonathan P.
Liao Xuejun
Liu Miao
Publication date: 01/07/2015

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
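For readers unfamiliar with the term, the generic stick-breaking construction mentioned in the abstract can be sketched as follows. This is only an illustration of the standard truncated construction (fractions drawn from Beta(1, alpha) are broken off a unit-length stick in turn), not the Dec-SBPR algorithm itself. The function name, the truncation level, and the value of alpha are assumptions chosen for the example.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking: return num_sticks non-negative weights summing to 1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_sticks - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last piece takes whatever is left
    return weights


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
print([round(w, 3) for w in weights])
```

Smaller values of alpha tend to concentrate the mass on the first few pieces, which is one common intuition for how such a prior can favour compact representations.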

arXiv.org e-Print Archive

DSpace@MIT

Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

Author: Amato Christopher
How Jonathan P.
Liu Miao
Omidshafiei Shayegan
Sivakumar Kavinayan
Publication date: 17/08/2017

This paper presents a data-driven approach for multi-robot coordination in partially observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. Previous methods which aim to address these issues suffer from local optimality and sensitivity to initial conditions. Additionally, few hardware demonstrations involving a large team of heterogeneous robots and with long planning horizons exist. This work addresses these gaps by proposing an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, MAs, and rewards. Our experiments show the algorithm is able to achieve better solution quality than the state-of-the-art learning-based methods. We implement two variants of multi-robot Search and Rescue (SAR) domains (with and without obstacles) on hardware to demonstrate that the learned policies can effectively control a team of distributed robots to cooperate in a partially observable stochastic environment.

Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017).

arXiv.org e-Print Archive

DSpace@MIT

Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

Author: Oliehoek Frans A.
Spaan Matthijs T. J.
Witwicki Stefan
Publication date: 20/07/2015

Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them into sub-problems, and at each of these sub-problems making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.

Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).

arXiv.org e-Print Archive

University of Liverpool Repository

CiteSeerX

Apprendre à agir dans un Dec-POMDP

Author: Buffet Olivier
Dibangoye Jilles
Publication venue: HAL CCSD
Publication date: 07/06/2018

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.

INRIA a CCSD electronic archive server

Learning to Act in Continuous Dec-POMDPs

Author: Buffet Olivier
Dibangoye Jilles
Publication venue: HAL CCSD
Publication date: 02/07/2018

We address a long-standing open problem of reinforcement learning in continuous decentralized partially observable Markov decision processes. Previous attempts focused on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under mild conditions, the first optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act optimally in many finite domains from the literature.

INRIA a CCSD electronic archive server

Learning to Act in Decentralized Partially Observable MDPs

Author: Buffet Olivier
Dibangoye Jilles
Publication venue: PMLR
Publication date: 10/07/2018

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.

INRIA a CCSD electronic archive server