Markov Decision Processes with Multiple Long-run Average Objectives
We study Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) functions. We consider two different objectives, namely,
expectation and satisfaction objectives. Given an MDP with k limit-average
functions, in the expectation objective the goal is to maximize the expected
limit-average value, and in the satisfaction objective the goal is to maximize
the probability of runs such that the limit-average value stays above a given
vector. We show that under the expectation objective, in contrast to the case
of one limit-average function, both randomization and memory are necessary for
strategies even for epsilon-approximation, and that finite-memory randomized
strategies are sufficient for achieving Pareto optimal values. Under the
satisfaction objective, in contrast to the case of one limit-average function,
infinite memory is necessary for strategies achieving a specific value (i.e.
randomized finite-memory strategies are not sufficient), whereas memoryless
randomized strategies are sufficient for epsilon-approximation, for all
epsilon>0. We further prove that the decision problems for both expectation and
satisfaction objectives can be solved in polynomial time and the trade-off
curve (Pareto curve) can be epsilon-approximated in time polynomial in the size
of the MDP and 1/epsilon, and exponential in the number of limit-average
functions, for all epsilon>0. Our analysis also reveals flaws in previous work
for MDPs with multiple mean-payoff functions under the expectation objective,
corrects the flaws, and allows us to obtain improved results.
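The two objectives in this abstract can be written down compactly. The following is a sketch under assumptions, not the paper's own notation: the symbols (run \omega, strategy \sigma, reward functions r_1..r_k, the action A_t taken at step t, threshold vector v) and the choice of lim inf for the limit-average are conventions chosen here for illustration.

```latex
% Limit-average (mean-payoff) of the i-th reward function along a run \omega
% (lim inf is an assumed convention)
\mathrm{lr}(r_i)(\omega) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r_i\bigl(A_t(\omega)\bigr)

% Expectation objective: maximize the vector of expected limit-averages
\bigl(\mathbb{E}^{\sigma}[\mathrm{lr}(r_1)], \dots, \mathbb{E}^{\sigma}[\mathrm{lr}(r_k)]\bigr)

% Satisfaction objective: maximize the probability of runs whose
% limit-average stays above the threshold vector v (componentwise)
\mathbb{P}^{\sigma}\bigl[\,\mathrm{lr}(r_i) \ge v_i \ \text{for all } 1 \le i \le k\,\bigr]
```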
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
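To illustrate the idea of running VI locally inside a single end-component, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the MDP is one communicating, aperiodic component, in which case the smallest and largest per-state increments of a VI step bound the optimal long-run average reward from below and above. The function name `average_reward_vi`, the dictionary encoding of the MDP, and the two-state example are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: state -> action -> (reward, [(next_state, probability), ...])
MDP = Dict[int, Dict[str, Tuple[float, List[Tuple[int, float]]]]]


def average_reward_vi(mdp: MDP, eps: float = 1e-6,
                      max_iter: int = 100_000) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the optimal long-run average reward.

    Assumes the MDP is a single communicating, aperiodic component.
    """
    values = {s: 0.0 for s in mdp}
    for _ in range(max_iter):
        # One Bellman update: best immediate reward plus expected successor value.
        new_values = {
            s: max(r + sum(p * values[t] for t, p in succ)
                   for r, succ in actions.values())
            for s, actions in mdp.items()
        }
        # Per-state increments bracket the optimal average reward.
        deltas = [new_values[s] - values[s] for s in mdp]
        lower, upper = min(deltas), max(deltas)
        if upper - lower < eps:
            return lower, upper
        # Subtract a constant to keep values bounded; increments are unaffected.
        ref = min(new_values.values())
        values = {s: v - ref for s, v in new_values.items()}
    raise RuntimeError("no convergence; the MDP may be periodic or not communicating")


# Two states with self-loops (so the model is aperiodic). Staying in state 1
# pays 2 per step, which is the best achievable average.
example_mdp: MDP = {
    0: {"stay": (1.0, [(0, 1.0)]), "go": (0.0, [(1, 1.0)])},
    1: {"stay": (2.0, [(1, 1.0)]), "back": (0.0, [(0, 1.0)])},
}

if __name__ == "__main__":
    print(average_reward_vi(example_mdp))
```

On a general MDP with several end-components this bracketing is not valid on its own, which is why the abstract combines the local step with VI for reachability.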
Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
We consider Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) objectives. There exist two different views: (i) the expectation
semantics, where the goal is to optimize the expected mean-payoff objective,
and (ii) the satisfaction semantics, where the goal is to maximize the
probability of runs such that the mean-payoff value stays above a given vector.
We consider optimization with respect to both objectives at once, thus unifying
the existing semantics. Precisely, the goal is to optimize the expectation
while ensuring the satisfaction constraint. Our problem captures the notion of
optimization with respect to strategies that are risk-averse (i.e., ensure
certain probabilistic guarantee). Our main results are as follows: First, we
present algorithms for the decision problems which are always polynomial in the
size of the MDP. We also show that an approximation of the Pareto-curve can be
computed in time polynomial in the size of the MDP, and the approximation
factor, but exponential in the number of dimensions. Second, we present a
complete characterization of the strategy complexity (in terms of memory bounds
and randomization) required to solve our problem. Comment: Extended journal version of the LICS'15 paper
Analysis of Timed and Long-Run Objectives for Markov Automata
Markov automata (MAs) extend labelled transition systems with random delays
and probabilistic branching. Action-labelled transitions are instantaneous and
yield a distribution over states, whereas timed transitions impose a random
delay governed by an exponential distribution. MAs are thus a nondeterministic
variation of continuous-time Markov chains. MAs are compositional and are used
to provide a semantics for engineering frameworks such as (dynamic) fault
trees, (generalised) stochastic Petri nets, and the Architecture Analysis &
Design Language (AADL). This paper considers the quantitative analysis of MAs.
We consider three objectives: expected time, long-run average, and timed
(interval) reachability. Expected time objectives focus on determining the
minimal (or maximal) expected time to reach a set of states. Long-run
objectives determine the fraction of time to be in a set of states when
considering an infinite time horizon. Timed reachability objectives are about
computing the probability to reach a set of states within a given time
interval. This paper presents the foundations and details of the algorithms and
their correctness proofs. We report on several case studies conducted using a
prototypical tool implementation of the algorithms, driven by the MAPA
modelling language for efficiently generating MAs. Comment: arXiv admin note: substantial text overlap with arXiv:1305.705
Trading Performance for Stability in Markov Decision Processes
We study the complexity of central controller synthesis problems for
finite-state Markov decision processes, where the objective is to optimize both
the expected mean-payoff performance of the system and its stability.
We argue that the basic theoretical notion of expressing the stability in
terms of the variance of the mean-payoff (called global variance in our paper)
is not always sufficient, since it ignores possible instabilities on respective
runs. For this reason we propose alternative definitions of stability, which we
call local and hybrid variance, and which express how rewards on each run
deviate from the run's own mean-payoff and from the expected mean-payoff,
respectively.
We show that a strategy ensuring both the expected mean-payoff and the
variance below given bounds requires randomization and memory, under all the
above semantics of variance. We then look at the problem of determining whether
there is such a strategy. For the global variance, we show that the problem
is in PSPACE, and that the answer can be approximated in pseudo-polynomial
time. For the hybrid variance, the analogous decision problem is in NP, and a
polynomial-time approximating algorithm also exists. For local variance, we
show that the decision problem is in NP. Since the overall performance can be
traded for stability (and vice versa), we also present algorithms for
approximating the associated Pareto curve in all the three cases.
Finally, we study a special case of the decision problems, where we require a
given expected mean-payoff together with zero variance. Here we show that the
problems can all be solved in polynomial time. Comment: Extended version of a paper presented at LICS 201
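One way to formalize the three stability notions described in this abstract is sketched below. The notation is an assumption made here for illustration (run \omega, reward r_t(\omega) collected at step t, strategy \sigma, and lim sup as the limit convention); the paper's exact definitions may differ in detail.

```latex
% Mean-payoff of a run (lim sup is an assumed convention)
\mathrm{mp}(\omega) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r_t(\omega)

% Global variance: variance of the mean-payoff across runs
\mathbb{V}_{\mathrm{global}}^{\sigma} \;=\; \mathbb{E}^{\sigma}\Bigl[\bigl(\mathrm{mp} - \mathbb{E}^{\sigma}[\mathrm{mp}]\bigr)^{2}\Bigr]

% Local variance: deviation of rewards from the run's own mean-payoff
\mathbb{V}_{\mathrm{local}}^{\sigma} \;=\; \mathbb{E}^{\sigma}\Bigl[\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} \bigl(r_t(\omega) - \mathrm{mp}(\omega)\bigr)^{2}\Bigr]

% Hybrid variance: deviation of rewards from the expected mean-payoff
\mathbb{V}_{\mathrm{hybrid}}^{\sigma} \;=\; \mathbb{E}^{\sigma}\Bigl[\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} \bigl(r_t(\omega) - \mathbb{E}^{\sigma}[\mathrm{mp}]\bigr)^{2}\Bigr]
```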