Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
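Value iteration for a reachability objective, the building block the abstract combines with per-MEC analysis, can be sketched as follows. The dictionary-based MDP encoding and the fixed iteration count are illustrative assumptions, not the paper's algorithm (which contributes stopping criteria and approximation guarantees on top of this basic scheme):

```python
def reachability_vi(states, actions, delta, target, iters=1000):
    """Iterate V(s) = max_a sum_t delta[s][a][t] * V(t), with V = 1 on target.

    states:  iterable of states
    actions: dict state -> list of enabled actions
    delta:   dict state -> dict action -> dict successor -> probability
    target:  set of target states
    """
    v = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        nv = {}
        for s in states:
            if s in target:
                nv[s] = 1.0
            else:
                # Bellman update: best action by expected successor value.
                nv[s] = max(
                    sum(p * v[t] for t, p in delta[s][a].items())
                    for a in actions[s]
                )
        v = nv
    return v

# Toy example: from s0, action 'a' reaches the goal with probability 0.5
# and otherwise falls into an absorbing sink.
states = ["s0", "goal", "sink"]
actions = {"s0": ["a"], "goal": ["stay"], "sink": ["stay"]}
delta = {
    "s0": {"a": {"goal": 0.5, "sink": 0.5}},
    "goal": {"stay": {"goal": 1.0}},
    "sink": {"stay": {"sink": 1.0}},
}
print(reachability_vi(states, actions, delta, {"goal"})["s0"])  # 0.5
```

As the abstract notes, the difficulty is not this iteration itself but knowing when to stop with a guaranteed error bound, which is exactly what fails for a naive extension to long-run average rewards.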
Infinite-Duration Bidding Games
Two-player games on graphs are widely studied in formal methods as they model
the interaction between a system and its environment. The game is played by
moving a token throughout a graph to produce an infinite path. There are
several common modes to determine how the players move the token through the
graph; e.g., in turn-based games the players alternate turns in moving the
token. We study the {\em bidding} mode of moving the token, which, to the best
of our knowledge, has never been studied in infinite-duration games. The
following bidding rule was previously defined and called Richman bidding. Both
players have separate {\em budgets}, which sum up to $1$. In each turn, a
bidding takes place: Both players submit bids simultaneously, where a bid is
legal if it does not exceed the available budget, and the higher bidder pays
his bid to the other player and moves the token. The central question studied
in bidding games is a necessary and sufficient initial budget for winning the
game: a {\em threshold} budget in a vertex is a value $t \in [0,1]$ such that
if Player~1's budget exceeds $t$, he can win the game, and if Player~2's
budget exceeds $1-t$, he can win the game. Threshold budgets were previously
shown to exist in every vertex of a reachability game, which have an
interesting connection with {\em random-turn} games -- a sub-class of simple
stochastic games in which the player who moves is chosen randomly. We show the
existence of threshold budgets for a qualitative class of infinite-duration
games, namely parity games, and a quantitative class, namely mean-payoff games.
The key component of the proof is a quantitative solution to strongly-connected
mean-payoff bidding games in which we extend the connection with random-turn
games to these games, and construct explicit optimal strategies for both
players. Comment: A short version appeared in CONCUR 2017. The paper is accepted to
JAC
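For the previously known reachability case that the abstract builds on, Richman-bidding thresholds satisfy a simple averaging fixed point: the threshold of a vertex is the mean of its best and worst successor thresholds, with threshold $0$ on the target and $1$ on a losing sink. A minimal sketch of that fixed-point iteration, where the adjacency-list encoding and Jacobi-style update are assumptions for illustration:

```python
def richman_thresholds(succ, target, sink, iters=200):
    """Fixed-point iteration for Th(v) = (min_u Th(u) + max_u Th(u)) / 2
    over the successors u of v, with Th = 0 on target vertices and
    Th = 1 on the losing sink."""
    th = {v: 1.0 for v in succ}          # pessimistic initialization
    th.update({v: 0.0 for v in target})  # target is free to win
    for _ in range(iters):
        new = dict(th)
        for v in succ:
            if v in target or v in sink:
                continue
            vals = [th[u] for u in succ[v]]
            new[v] = (min(vals) + max(vals)) / 2.0
        th = new
    return th

# Toy chain: v1 -> {target, v2}, v2 -> {v1, sink}.
# The exact fixed point is Th(v1) = 1/3, Th(v2) = 2/3.
succ = {
    "v1": ["t", "v2"],
    "v2": ["v1", "x"],
    "t": ["t"],
    "x": ["x"],
}
th = richman_thresholds(succ, target={"t"}, sink={"x"})
print(round(th["v1"], 4), round(th["v2"], 4))
```

The averaging update is where the connection to random-turn games comes from: moving with probability $1/2$ each mimics bidding with balanced budgets. The paper's contribution extends this picture from reachability to parity and mean-payoff objectives.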
MDP Optimal Control under Temporal Logic Constraints
In this paper, we develop a method to automatically generate a control policy
for a dynamical system modeled as a Markov Decision Process (MDP). The control
specification is given as a Linear Temporal Logic (LTL) formula over a set of
propositions defined on the states of the MDP. We synthesize a control policy
such that the MDP satisfies the given specification almost surely, if such a
policy exists. In addition, we designate an "optimizing proposition" to be
repeatedly satisfied, and we formulate a novel optimization criterion in terms
of minimizing the expected cost in between satisfactions of this proposition.
We propose a sufficient condition for a policy to be optimal, and develop a
dynamic programming algorithm that synthesizes a policy that is optimal under
some conditions, and sub-optimal otherwise. This problem is motivated by
robotic applications requiring persistent tasks, such as environmental
monitoring or data gathering, to be performed. Comment: Technical report accompanying the CDC 2011 submission.
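One ingredient of the optimization criterion above, the expected cost accumulated until the next satisfaction of the optimizing proposition, can be computed by a standard stochastic-shortest-path value iteration. The sketch below is an illustrative assumption, not the paper's full dynamic-programming algorithm (which optimizes this quantity over repeated satisfactions under the LTL constraint):

```python
def expected_cost_to_sat(states, actions, delta, cost, sat, iters=500):
    """Iterate J(s) = min_a [ cost(s, a) + sum_t delta[s][a][t] * J(t) ],
    with J = 0 on states where the optimizing proposition holds.

    cost: dict (state, action) -> one-step cost
    sat:  set of states satisfying the optimizing proposition
    """
    j = {s: 0.0 for s in states}
    for _ in range(iters):
        nj = {}
        for s in states:
            if s in sat:
                nj[s] = 0.0
            else:
                # Bellman update: cheapest expected cost to satisfaction.
                nj[s] = min(
                    cost[(s, a)] + sum(p * j[t] for t, p in delta[s][a].items())
                    for a in actions[s]
                )
        j = nj
    return j

# Toy example: action 'go' costs 1 and reaches a satisfying state with
# probability 0.5, so the expected cost to satisfaction is 1 / 0.5 = 2.
states = ["s0", "p"]
actions = {"s0": ["go"], "p": ["stay"]}
delta = {"s0": {"go": {"p": 0.5, "s0": 0.5}}, "p": {"stay": {"p": 1.0}}}
cost = {("s0", "go"): 1.0, ("p", "stay"): 0.0}
print(expected_cost_to_sat(states, actions, delta, cost, {"p"})["s0"])
```

In the paper's setting this computation would run on a product of the MDP with an automaton for the LTL formula, so that the almost-sure satisfaction constraint and the cost criterion are handled together.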