886 research outputs found
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs
We present multi-agent A* (MAA*), the first complete and optimal heuristic
search algorithm for solving decentralized partially-observable Markov decision
problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for
computing optimal plans for a cooperative group of agents that operate in a
stochastic environment such as multirobot coordination, network traffic
control, `or distributed resource allocation. Solving such problems efiectively
is a major challenge in the area of planning under uncertainty. Our solution is
based on a synthesis of classical heuristic search and decentralized control
theory. Experimental results show that MAA* has significant advantages. We
introduce an anytime variant of MAA* and conclude with a discussion of
promising extensions such as an approach to solving infinite horizon problems.Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agents belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large scale POMDP problems
Influence-Optimistic Local Values for Multiagent Planning --- Extended Version
Recent years have seen the development of methods for multiagent planning
under uncertainty that scale to tens or even hundreds of agents. However, most
of these methods either make restrictive assumptions on the problem domain, or
provide approximate solutions without any guarantees on quality. Methods in the
former category typically build on heuristic search using upper bounds on the
value function. Unfortunately, no techniques exist to compute such upper bounds
for problems with non-factored value functions. To allow for meaningful
benchmarking through measurable quality guarantees on a very general class of
problems, this paper introduces a family of influence-optimistic upper bounds
for factored decentralized partially observable Markov decision processes
(Dec-POMDPs) that do not have factored value functions. Intuitively, we derive
bounds on very large multiagent planning problems by subdividing them in
sub-problems, and at each of these sub-problems making optimistic assumptions
with respect to the influence that will be exerted by the rest of the system.
We numerically compare the different upper bounds and demonstrate how we can
achieve a non-trivial guarantee that a heuristic solution for problems with
hundreds of agents is close to optimal. Furthermore, we provide evidence that
the upper bounds may improve the effectiveness of heuristic influence search,
and discuss further potential applications to multiagent planning.Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS
2015
Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives
Partially-observable Markov decision processes (POMDPs) with discounted-sum
payoff are a standard framework to model a wide range of problems related to
decision making under uncertainty. Traditionally, the goal has been to obtain
policies that optimize the expectation of the discounted-sum payoff. A key
drawback of the expectation measure is that even low probability events with
extreme payoff can significantly affect the expectation, and thus the obtained
policies are not necessarily risk-averse. An alternate approach is to optimize
the probability that the payoff is above a certain threshold, which allows
obtaining risk-averse policies, but ignores optimization of the expectation. We
consider the expectation optimization with probabilistic guarantee (EOPG)
problem, where the goal is to optimize the expectation ensuring that the payoff
is above a given threshold with at least a specified probability. We present
several results on the EOPG problem, including the first algorithm to solve it.Comment: Full version of a paper published at IJCAI/ECAI 201
Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)
Many exact and approximate solution methods for Markov Decision Processes
(MDPs) attempt to exploit structure in the problem and are based on
factorization of the value function. Especially multiagent settings, however,
are known to suffer from an exponential increase in value component sizes as
interactions become denser, meaning that approximation architectures are
restricted in the problem sizes and types they can handle. We present an
approach to mitigate this limitation for certain types of multiagent systems,
exploiting a property that can be thought of as "anonymous influence" in the
factored MDP. Anonymous influence summarizes joint variable effects efficiently
whenever the explicit representation of variable identity in the problem can be
avoided. We show how representational benefits from anonymity translate into
computational efficiencies, both for general variable elimination in a factor
graph but in particular also for the approximate linear programming solution to
factored MDPs. The latter allows to scale linear programming to factored MDPs
that were previously unsolvable. Our results are shown for the control of a
stochastic disease process over a densely connected graph with 50 nodes and 25
agents.Comment: Extended version of AAAI 2016 pape
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies for
POMDPs. It is inefficient due to the need to account for the entire belief
space, which necessitates the solution of large numbers of linear programs. In
this paper, we study value iteration restricted to belief subsets. We show
that, together with properly chosen belief subsets, restricted value iteration
yields near-optimal policies and we give a condition for determining whether a
given belief subset would bring about savings in space and time. We also apply
restricted value iteration to two interesting classes of POMDPs, namely
informative POMDPs and near-discernible POMDPs
- …