POMDPs under Probabilistic Semantics
We consider partially observable Markov decision processes (POMDPs) with
limit-average payoff, where a reward value in the interval [0,1] is associated
with every transition, and the payoff of an infinite path is the long-run
average of the rewards. We consider two types of path constraints: (i) a
quantitative constraint, which defines the set of paths where the payoff is at
least a given threshold lambda_1 in (0,1]; and (ii) a qualitative constraint,
which is the special case of the quantitative constraint with lambda_1 = 1. We
consider the computation of
the almost-sure winning set, where the controller needs to ensure that the path
constraint is satisfied with probability 1. Our main results for qualitative
path constraint are as follows: (i) the problem of deciding the existence of a
finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding
the existence of an infinite-memory controller is undecidable. For quantitative
path constraint we show that the problem of deciding the existence of a
finite-memory controller is undecidable.

Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI 2013).
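As a minimal illustration of the payoff (our own sketch, not code from the paper), the payoff of a finite path prefix is simply the average of its per-transition rewards; for an infinite path, the payoff is the limit of these prefix averages, which a finite prefix can only approximate:

```python
def limit_average(rewards):
    """Average of a finite prefix of per-transition rewards in [0, 1].

    For an infinite path, the limit-average payoff is the limit of
    these prefix averages; here we only average a finite prefix.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(rewards) / len(rewards)

def satisfies_quantitative(rewards, lam):
    """Quantitative path constraint: prefix payoff >= lam, lam in (0, 1]."""
    return limit_average(rewards) >= lam

# A path whose rewards are eventually always 1 satisfies the qualitative
# constraint (lam = 1) in the limit; a finite prefix may still fall short.
prefix = [0.0, 1.0, 1.0, 1.0]
print(limit_average(prefix))                  # 0.75
print(satisfies_quantitative(prefix, 0.5))    # True
print(satisfies_quantitative(prefix, 1.0))    # False
```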
Generalized planning: Non-deterministic abstractions and trajectory constraints
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies mu that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that mu terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable nondeterministic planning.
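A toy sketch of the setting (the family, names, and policy below are our own illustration, not the paper's): a single general policy that solves every instance of a counter-decrement family, and that terminates on each concrete instance because a nonnegative integer strictly decreases:

```python
def general_policy(state):
    """A toy general policy for the problem family 'drive counter x to 0'.

    The same rule solves every concrete instance, whatever the initial
    value of x: if x > 0, apply 'dec'; otherwise the goal is reached.
    (Illustrative only -- not a model from the paper.)
    """
    return "dec" if state["x"] > 0 else None  # None signals the goal

def run(initial_x, max_steps=1000):
    """Execute the policy on one concrete instance. It terminates because
    each 'dec' strictly decreases the nonnegative integer x."""
    state = {"x": initial_x}
    for _ in range(max_steps):
        if general_policy(state) is None:
            return state
        state["x"] -= 1
    raise RuntimeError("policy did not reach the goal within the step bound")

print(run(5))  # {'x': 0}
```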
Parameter-Independent Strategies for pMDPs via POMDPs
Markov Decision Processes (MDPs) are a popular class of models suitable for
solving control decision problems in probabilistic reactive systems. We
consider parametric MDPs (pMDPs) that include parameters in some of the
transition probabilities to account for stochastic uncertainties of the
environment such as noise or input disturbances.
We study pMDPs with reachability objectives where the parameter values are
unknown and impossible to measure directly during execution, but a probability
distribution over the parameter values is known. We study, for the first time,
the computation of parameter-independent strategies that are expectation-optimal,
i.e., that optimize the expected reachability probability under the
probability distribution over the parameters. We present an encoding of our
problem to partially observable MDPs (POMDPs), i.e., a reduction of our problem
to computing optimal strategies in POMDPs.
We evaluate our method experimentally on several benchmarks: a motivating
(repeated) learner model; a series of benchmarks of varying configurations of a
robot moving on a grid; and a consensus protocol.

Comment: Extended version of a QEST 2018 paper.
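A minimal sketch of expectation optimality (the model and numbers are our own illustration, not a benchmark from the paper): a parameter-independent strategy must commit to one action for all parameter values, so its value is the expectation of the reachability probability under the known parameter distribution:

```python
# Toy pMDP: in state s0 the controller picks one action; action 'a'
# reaches the goal with unknown parameter probability p, action 'b'
# with a fixed probability 0.6. The parameter p follows a known
# discrete distribution. (Illustrative numbers, not from the paper.)
param_dist = {0.2: 0.5, 0.9: 0.5}   # P(p = 0.2) = P(p = 0.9) = 0.5

def expected_reach(action):
    """Expected reachability probability of the parameter-independent
    strategy that always plays `action`, under the parameter distribution."""
    if action == "b":
        return 0.6
    return sum(weight * p for p, weight in param_dist.items())

best = max(("a", "b"), key=expected_reach)
print(best, expected_reach(best))  # 'b' wins in expectation: 0.6 > 0.55
```

Note that 'a' would be optimal if p were observable (play 'a' exactly when p = 0.9), which is why parameter-independent strategies require the POMDP view rather than per-parameter optimization.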
Randomness for Free
We consider two-player zero-sum games on graphs. These games can be
classified on the basis of the information of the players and on the mode of
interaction between them. On the basis of information the classification is as
follows: (a) partial-observation (both players have partial view of the game);
(b) one-sided complete-observation (one player has complete observation); and
(c) complete-observation (both players have complete view of the game). On the
basis of mode of interaction we have the following classification: (a)
concurrent (both players interact simultaneously); and (b) turn-based (both
players interact in turn). The two sources of randomness in these games are
randomness in transition function and randomness in strategies. In general,
randomized strategies are more powerful than deterministic strategies, and
randomness in transitions gives more general classes of games. In this work we
present a complete characterization for the classes of games where randomness
is not helpful in: (a) the transition function (probabilistic transitions can be
simulated by deterministic transitions); and (b) strategies (pure strategies are
as powerful as randomized strategies). As a consequence of our characterization
we obtain new undecidability results for these games.
Stochastic Games with Lexicographic Reachability-Safety Objectives
We study turn-based stochastic zero-sum games with lexicographic preferences
over reachability and safety objectives. Stochastic games are standard models
in control, verification, and synthesis of stochastic reactive systems that
exhibit both randomness as well as angelic and demonic non-determinism.
Lexicographic order allows one to consider multiple objectives with a strict
preference order over the satisfaction of the objectives. To the best of our
knowledge, stochastic games with lexicographic objectives have not been studied
before. We establish determinacy of such games and present strategy and
computational complexity results. For strategy complexity, we show that
lexicographically optimal strategies exist that are deterministic and memory is
only required to remember the already satisfied and violated objectives. For a
constant number of objectives, we show that the relevant decision problem is in
NP ∩ coNP, matching the currently known bound for single objectives; and in
general the decision problem is PSPACE-hard and can be solved in
NEXPTIME ∩ coNEXPTIME. We present an algorithm that computes the lexicographically
optimal strategies via a reduction to the computation of optimal strategies in a
sequence of single-objective games. We have implemented our algorithm and
report experimental results on various case studies.

Comment: Full version (33 pages) of the CAV20 conference paper, including an
appendix with technical proofs.
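The staged reduction can be sketched on a one-player special case, i.e. an MDP with a single decision (the model and numbers are illustrative, not from the paper): first solve the reachability objective alone, keep only the reach-optimal actions, then solve the safety objective restricted to those:

```python
# Toy one-player instance of the lexicographic reduction for the
# preference (reach goal) > (stay safe). Values are the probabilities
# each action achieves for each objective. (Illustrative model only.)
actions = {
    "a": {"reach": 1.0, "safe": 0.0},  # reaches the goal, but via a bad state
    "b": {"reach": 0.5, "safe": 1.0},  # safe, but reaches the goal half the time
    "c": {"reach": 1.0, "safe": 1.0},  # reaches the goal and stays safe
}

# Stage 1: single-objective problem for reachability.
best_reach = max(v["reach"] for v in actions.values())
reach_optimal = {a for a, v in actions.items() if v["reach"] == best_reach}

# Stage 2: single-objective problem for safety, restricted to stage-1 winners.
best = max(reach_optimal, key=lambda a: actions[a]["safe"])
print(best)  # 'c' is lexicographically optimal for (reach, safe)
```

In the actual two-player setting each stage is a stochastic game rather than a one-shot maximization, but the restrict-then-optimize structure is the same.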
IST Austria Thesis
This dissertation concerns the automatic verification of probabilistic systems and programs with arrays by statistical and logical methods. Although statistical and logical methods are different in nature, we show that they can be successfully combined for system analysis.

In the first part of the dissertation we present a new statistical algorithm for the verification of probabilistic systems with respect to unbounded properties, including linear temporal logic. Our algorithm often performs faster than previous approaches, and at the same time requires less information about the system. In addition, our method can be generalized to unbounded quantitative properties such as mean-payoff bounds.

In the second part, we introduce two techniques for comparing probabilistic systems. Probabilistic systems are typically compared using the notion of equivalence, which requires the systems to have equal probabilities of all behaviors. However, this notion is often too strict, since probabilities are typically only empirically estimated, and any imprecision may break the relation between processes. On the one hand, we propose to replace the Boolean notion of equivalence by a quantitative distance of similarity. For this purpose, we introduce a statistical framework for estimating distances between Markov chains based on their simulation runs, and we investigate which distances can be approximated in our framework. On the other hand, we propose to compare systems with respect to a new qualitative logic, which expresses that behaviors occur with probability one or with positive probability. This qualitative analysis is robust with respect to modeling errors and applicable to many domains.

In the last part, we present a new quantifier-free logic for integer arrays, which allows us to express counting. Counting properties are prevalent in array-manipulating programs; however, they cannot be expressed in the quantified fragments of the theory of arrays. We present a decision procedure for our logic and provide several complexity results.
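A minimal sketch of the statistical flavor of the first parts (our own illustration: a bounded-horizon Monte Carlo estimator on a toy Markov chain, whereas the thesis's algorithm additionally handles unbounded properties):

```python
import random

def simulate_reach(chain, start, goal, horizon, rng):
    """One bounded simulation run of a finite Markov chain; returns True
    iff `goal` is visited within `horizon` steps."""
    state = start
    for _ in range(horizon):
        if state == goal:
            return True
        states, probs = zip(*chain[state].items())
        state = rng.choices(states, weights=probs)[0]
    return state == goal

# Toy chain: from s0, reach the goal w.p. 0.3 or get stuck w.p. 0.7,
# so the true reachability probability is 0.3. (Illustrative model.)
chain = {
    "s0":   {"g": 0.3, "dead": 0.7},
    "g":    {"g": 1.0},
    "dead": {"dead": 1.0},
}

rng = random.Random(0)
runs = 2000
hits = sum(simulate_reach(chain, "s0", "g", 50, rng) for _ in range(runs))
estimate = hits / runs
print(round(estimate, 3))  # close to the true probability 0.3
```

Standard concentration bounds (e.g. Hoeffding's inequality) turn the run count into a confidence statement about the estimate, which is the basic mechanism behind statistical model checking.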