3,652 research outputs found
Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving this Dec-POSMDP which is much more scalable than previous methods since
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems
Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions
The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving this Dec-POSMDP which is much more scalable than previous methods since it can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed method's performance is evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions for large-scale problems
Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions
This paper presents a data-driven approach for multi-robot coordination in
partially-observable domains based on Decentralized Partially Observable Markov
Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a
general framework for cooperative sequential decision making under uncertainty
and MAs allow temporally extended and asynchronous action execution. To date,
most methods assume the underlying Dec-POMDP model is known a priori or a full
simulator is available during planning time. Previous methods which aim to
address these issues suffer from local optimality and sensitivity to initial
conditions. Additionally, few hardware demonstrations involving a large team of
heterogeneous robots and with long planning horizons exist. This work addresses
these gaps by proposing an iterative sampling based Expectation-Maximization
algorithm (iSEM) to learn polices using only trajectory data containing
observations, MAs, and rewards. Our experiments show the algorithm is able to
achieve better solution quality than the state-of-the-art learning-based
methods. We implement two variants of multi-robot Search and Rescue (SAR)
domains (with and without obstacles) on hardware to demonstrate the learned
policies can effectively control a team of distributed robots to cooperate in a
partially observable stochastic environment.Comment: Accepted to the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017
Planning for Decentralized Control of Multiple Robots Under Uncertainty
We describe a probabilistic framework for synthesizing control policies for
general multi-robot systems, given environment and sensor models and a cost
function. Decentralized, partially observable Markov decision processes
(Dec-POMDPs) are a general model of decision processes where a team of agents
must cooperate to optimize some objective (specified by a shared reward or cost
function) in the presence of uncertainty, but where communication limitations
mean that the agents cannot share their state, so execution must proceed in a
decentralized fashion. While Dec-POMDPs are typically intractable to solve for
real-world problems, recent research on the use of macro-actions in Dec-POMDPs
has significantly increased the size of problem that can be practically solved
as a Dec-POMDP. We describe this general model, and show how, in contrast to
most existing methods that are specialized to a particular problem class, it
can synthesize control policies that use whatever opportunities for
coordination are present in the problem, while balancing off uncertainty in
outcomes, sensor information, and information about other agents. We use three
variations on a warehouse task to show that a single planner of this type can
generate cooperative behavior using task allocation, direct communication, and
signaling, as appropriate
Applications of DEC-MDPs in multi-robot systems
International audienceOptimizing the operation of cooperative multi-robot systems that can cooperatively act in large and complex environments has become an important focal area of research. This issue is motivated by many applications involving a set of cooperative robots that have to decide in a decentralized way how to execute a large set of tasks in partially observable and uncertain environments. Such decision problems are encountered while developing exploration rovers, teams of patrolling robots, rescue-robot colonies, mine-clearance robots, et cetera.In this chapter, we introduce problematics related to the decentralized control of multi-robot systems. We rst describe some applicative domains and review the main characteristics of the decision problems the robots must deal with. Then, we review some existing approaches to solve problems of multiagent decen- tralized control in stochastic environments. We present the Decentralized Markov Decision Processes and discuss their applicability to real-world multi-robot applications. Then, we introduce OC-DEC-MDPs and 2V-DEC-MDPs which have been developed to increase the applicability of DEC-MDPs
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs
We present multi-agent A* (MAA*), the first complete and optimal heuristic
search algorithm for solving decentralized partially-observable Markov decision
problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for
computing optimal plans for a cooperative group of agents that operate in a
stochastic environment such as multirobot coordination, network traffic
control, `or distributed resource allocation. Solving such problems efiectively
is a major challenge in the area of planning under uncertainty. Our solution is
based on a synthesis of classical heuristic search and decentralized control
theory. Experimental results show that MAA* has significant advantages. We
introduce an anytime variant of MAA* and conclude with a discussion of
promising extensions such as an approach to solving infinite horizon problems.Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005
Inverse Reinforcement Learning in Swarm Systems
Inverse reinforcement learning (IRL) has become a useful tool for learning
behavioral models from demonstration data. However, IRL remains mostly
unexplored for multi-agent systems. In this paper, we show how the principle of
IRL can be extended to homogeneous large-scale problems, inspired by the
collective swarming behavior of natural systems. In particular, we make the
following contributions to the field: 1) We introduce the swarMDP framework, a
sub-class of decentralized partially observable Markov decision processes
endowed with a swarm characterization. 2) Exploiting the inherent homogeneity
of this framework, we reduce the resulting multi-agent IRL problem to a
single-agent one by proving that the agent-specific value functions in this
model coincide. 3) To solve the corresponding control problem, we propose a
novel heterogeneous learning scheme that is particularly tailored to the swarm
setting. Results on two example systems demonstrate that our framework is able
to produce meaningful local reward models from which we can replicate the
observed global system dynamics.Comment: 9 pages, 8 figures; ### Version 2 ### version accepted at AAMAS 201
- …