14,972 research outputs found

    Value Iteration for Long-run Average Reward in Markov Decision Processes

    Full text link
    Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks

    Infinite-Duration Bidding Games

    Full text link
    Two-player games on graphs are widely studied in formal methods as they model the interaction between a system and its environment. The game is played by moving a token throughout a graph to produce an infinite path. There are several common modes to determine how the players move the token through the graph; e.g., in turn-based games the players alternate turns in moving the token. We study the {\em bidding} mode of moving the token, which, to the best of our knowledge, has never been studied in infinite-duration games. The following bidding rule was previously defined and called Richman bidding. Both players have separate {\em budgets}, which sum up to 11. In each turn, a bidding takes place: Both players submit bids simultaneously, where a bid is legal if it does not exceed the available budget, and the higher bidder pays his bid to the other player and moves the token. The central question studied in bidding games is a necessary and sufficient initial budget for winning the game: a {\em threshold} budget in a vertex is a value t∈[0,1]t \in [0,1] such that if Player 11's budget exceeds tt, he can win the game, and if Player 22's budget exceeds 1−t1-t, he can win the game. Threshold budgets were previously shown to exist in every vertex of a reachability game, which have an interesting connection with {\em random-turn} games -- a sub-class of simple stochastic games in which the player who moves is chosen randomly. We show the existence of threshold budgets for a qualitative class of infinite-duration games, namely parity games, and a quantitative class, namely mean-payoff games. The key component of the proof is a quantitative solution to strongly-connected mean-payoff bidding games in which we extend the connection with random-turn games to these games, and construct explicit optimal strategies for both players.Comment: A short version appeared in CONCUR 2017. The paper is accepted to JAC

    MDP Optimal Control under Temporal Logic Constraints

    Get PDF
    In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed.Comment: Technical report accompanying the CDC2011 submissio
    • …
    corecore