8,641 research outputs found
Parameter-Independent Strategies for pMDPs via POMDPs
Markov Decision Processes (MDPs) are a popular class of models suitable for
solving control decision problems in probabilistic reactive systems. We
consider parametric MDPs (pMDPs) that include parameters in some of the
transition probabilities to account for stochastic uncertainties of the
environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are
unknown and impossible to measure directly during execution, but there is a
probability distribution known over the parameter values. We study for the
first time computing parameter-independent strategies that are expectation
optimal, i.e., optimize the expected reachability probability under the
probability distribution over the parameters. We present an encoding of our
problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem
to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating
(repeated) learner model; a series of benchmarks of varying configurations of a
robot moving on a grid; and a consensus protocol.Comment: Extended version of a QEST 2018 pape
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agents belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. Contrary to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to dealing with
continuous action spaces. Experimental results show the potential of Perseus in
large scale POMDP problems
- …