159 research outputs found
Optimal Cost Almost-sure Reachability in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and every transition is associated with an integer cost.
The optimization objective we study asks to minimize the expected total cost
till the target set is reached, while ensuring that the target set is reached
almost-surely (with probability 1). We show that for integer costs
approximating the optimal cost is undecidable. For positive costs, our results
are as follows: (i) we establish matching lower and upper bounds for the
optimal cost and the bound is double exponential; (ii) we show that the problem
of approximating the optimal cost is decidable and present approximation
algorithms developing on the existing algorithms for POMDPs with finite-horizon
objectives. While the worst-case running time of our algorithm is double
exponential, we also present efficient stopping criteria for the algorithm and
show experimentally that it performs well in many examples of interest.Comment: Full Version of Optimal Cost Almost-sure Reachability in POMDPs, AAAI
2015. arXiv admin note: text overlap with arXiv:1207.4166 by other author
Sensor Synthesis for POMDPs with Reachability Objectives
Partially observable Markov decision processes (POMDPs) are widely used in
probabilistic planning problems in which an agent interacts with an environment
using noisy and imprecise sensors. We study a setting in which the sensors are
only partially defined and the goal is to synthesize "weakest" additional
sensors, such that in the resulting POMDP, there is a small-memory policy for
the agent that almost-surely (with probability~1) satisfies a reachability
objective. We show that the problem is NP-complete, and present a symbolic
algorithm by encoding the problem into SAT instances. We illustrate trade-offs
between the amount of memory of the policy and the number of additional sensors
on a simple example. We have implemented our approach and consider three
classical POMDP examples from the literature, and show that in all the examples
the number of sensors can be significantly decreased (as compared to the
existing solutions in the literature) without increasing the complexity of the
policies.Comment: arXiv admin note: text overlap with arXiv:1511.0845
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy allowing us to compute succinct human
readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
Parameter-Independent Strategies for pMDPs via POMDPs
Markov Decision Processes (MDPs) are a popular class of models suitable for
solving control decision problems in probabilistic reactive systems. We
consider parametric MDPs (pMDPs) that include parameters in some of the
transition probabilities to account for stochastic uncertainties of the
environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are
unknown and impossible to measure directly during execution, but there is a
probability distribution known over the parameter values. We study for the
first time computing parameter-independent strategies that are expectation
optimal, i.e., optimize the expected reachability probability under the
probability distribution over the parameters. We present an encoding of our
problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem
to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating
(repeated) learner model; a series of benchmarks of varying configurations of a
robot moving on a grid; and a consensus protocol.Comment: Extended version of a QEST 2018 pape
IST Austria Technical Report
We consider partially observable Markov decision processes (POMDPs) with a set of target states and every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms developing on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples of interest
POMDPs under Probabilistic Semantics
We consider partially observable Markov decision processes (POMDPs) with
limit-average payoff, where a reward value in the interval [0,1] is associated
to every transition, and the payoff of an infinite path is the long-run average
of the rewards. We consider two types of path constraints: (i) quantitative
constraint defines the set of paths where the payoff is at least a given
threshold lambda_1 in (0,1]; and (ii) qualitative constraint which is a special
case of quantitative constraint with lambda_1=1. We consider the computation of
the almost-sure winning set, where the controller needs to ensure that the path
constraint is satisfied with probability 1. Our main results for qualitative
path constraint are as follows: (i) the problem of deciding the existence of a
finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding
the existence of an infinite-memory controller is undecidable. For quantitative
path constraint we show that the problem of deciding the existence of a
finite-memory controller is undecidable.Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013
IST Austria Technical Report
DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a new method to solve the problem that extends methods for finite-horizon DEC- POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show our approach presents promising results
IST Austria Technical Report
POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires to construct explicitly an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach
- …