Search and Explore: Symbiotic Policy Synthesis in POMDPs
This paper marries two state-of-the-art controller synthesis methods for
partially observable Markov decision processes (POMDPs), a prominent model in
sequential decision making under uncertainty. A central issue is to find a
POMDP controller, which decides solely based on the observations seen so far,
to achieve a total expected reward objective. As finding optimal controllers is
undecidable, we concentrate on synthesising good finite-state controllers
(FSCs). We do so by tightly integrating two modern, orthogonal methods for
POMDP controller synthesis: a belief-based and an inductive approach. The
former method obtains an FSC from a finite fragment of the so-called belief
MDP, an MDP that keeps track of the probabilities of equally observable POMDP
states. The latter is an inductive search technique over a set of FSCs, e.g.,
controllers with a fixed memory size. The key result of this paper is a
symbiotic anytime algorithm that tightly integrates both approaches such that
each profits from the controllers constructed by the other. Experimental
results indicate a substantial improvement in the value of the controllers
while significantly reducing the synthesis time and memory footprint.
Comment: Accepted to CAV 202
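The belief MDP described above tracks a probability distribution over equally observable POMDP states. A minimal sketch of the underlying Bayesian belief update, with toy transition and observation matrices that are purely illustrative:

```python
import numpy as np

def belief_update(belief, action, obs, T, O):
    """Bayesian belief update for a POMDP.

    belief: (S,) prior distribution over states
    T: (A, S, S) transition probabilities T[a, s, s']
    O: (A, S, O) observation probabilities O[a, s', o]
    Returns the posterior belief after taking `action` and seeing `obs`.
    """
    predicted = belief @ T[action]          # predicted next-state distribution
    unnorm = predicted * O[action][:, obs]  # weight by observation likelihood
    return unnorm / unnorm.sum()            # renormalize

# Toy 2-state example: one action, two observations (numbers made up)
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b1 = belief_update(np.array([0.5, 0.5]), action=0, obs=0, T=T, O=O)
```

The states of the belief MDP are exactly these posterior vectors; each (action, observation) pair induces one such deterministic update.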
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
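The core operation shared by PBVI and Perseus is the point-based value backup: at a single belief point, each stored alpha vector is projected through every (action, observation) pair and the best combination is kept. A minimal sketch under assumed array conventions (not the authors' code):

```python
import numpy as np

def pbvi_backup(b, Gamma, R, T, O, gamma=0.95):
    """One point-based value backup at belief point b.

    b: (S,) belief; Gamma: list of (S,) alpha vectors
    R: (A, S) immediate rewards; T: (A, S, S) transitions;
    O: (A, S, O) observation probabilities O[a, s', o]
    Returns the best new alpha vector for this belief point.
    """
    A, S, num_obs = O.shape
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        alpha_a = R[a].astype(float)
        for o in range(num_obs):
            # project each alpha through (a, o); keep the one best at b
            projected = [T[a] @ (O[a][:, o] * g) for g in Gamma]
            alpha_a = alpha_a + gamma * max(projected, key=lambda v: b @ v)
        if b @ alpha_a > best_val:
            best_alpha, best_val = alpha_a, b @ alpha_a
    return best_alpha

# Degenerate toy instance: 2 states, 1 action, 1 observation
R = np.array([[1.0, 0.0]])
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
O = np.array([[[1.0], [1.0]]])
alpha = pbvi_backup(np.array([0.5, 0.5]), [np.zeros(2)], R, T, O)
```

Backing up only at a selected set of belief points, rather than over the whole simplex, is what makes the approach tractable; the point-selection strategy is where PBVI differs from its competitors.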
Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives
Partially-observable Markov decision processes (POMDPs) with discounted-sum
payoff are a standard framework to model a wide range of problems related to
decision making under uncertainty. Traditionally, the goal has been to obtain
policies that optimize the expectation of the discounted-sum payoff. A key
drawback of the expectation measure is that even low probability events with
extreme payoff can significantly affect the expectation, and thus the obtained
policies are not necessarily risk-averse. An alternate approach is to optimize
the probability that the payoff is above a certain threshold, which allows
obtaining risk-averse policies, but ignores optimization of the expectation. We
consider the expectation optimization with probabilistic guarantee (EOPG)
problem, where the goal is to optimize the expectation ensuring that the payoff
is above a given threshold with at least a specified probability. We present
several results on the EOPG problem, including the first algorithm to solve it.
Comment: Full version of a paper published at IJCAI/ECAI 201
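The EOPG objective combines the two criteria the abstract contrasts: maximize the expected payoff subject to the payoff exceeding a threshold with at least a given probability. A minimal empirical sketch over sampled rollout payoffs (hypothetical helper names, not from the paper):

```python
import numpy as np

def eopg_feasible(payoffs, threshold, min_prob):
    """Check the EOPG constraint on sampled discounted-sum payoffs.

    Returns (feasible, expectation): whether the empirical probability
    of payoff >= threshold meets min_prob, and the empirical mean.
    """
    payoffs = np.asarray(payoffs, dtype=float)
    prob_above = float((payoffs >= threshold).mean())
    return prob_above >= min_prob, float(payoffs.mean())

def eopg_select(candidates, threshold, min_prob):
    """candidates: dict name -> sampled payoffs.
    Among policies satisfying the probabilistic guarantee, return the
    one with the highest empirical expectation (None if none qualify)."""
    feasible = {name: float(np.mean(p)) for name, p in candidates.items()
                if eopg_feasible(p, threshold, min_prob)[0]}
    return max(feasible, key=feasible.get) if feasible else None
```

For example, a high-variance policy with a large mean can be rejected in favor of a lower-mean policy that keeps the payoff above the threshold almost surely, which is exactly the risk-averse behavior plain expectation optimization fails to deliver.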