Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
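Value iteration for a reachability objective, the building block the abstract combines with per-MEC analysis, can be sketched as follows. The dictionary-based MDP encoding and the fixed iteration count are illustrative assumptions, not the paper's algorithm (which contributes stopping criteria and approximation guarantees on top of this basic scheme):

```python
def reachability_vi(states, actions, delta, target, iters=1000):
    """Iterate V(s) = max_a sum_t delta[s][a][t] * V(t), with V = 1 on target.

    states:  iterable of states
    actions: dict state -> list of enabled actions
    delta:   dict state -> dict action -> dict successor -> probability
    target:  set of target states
    """
    v = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        nv = {}
        for s in states:
            if s in target:
                nv[s] = 1.0
            else:
                # Bellman update: best action by expected successor value.
                nv[s] = max(
                    sum(p * v[t] for t, p in delta[s][a].items())
                    for a in actions[s]
                )
        v = nv
    return v

# Toy example: from s0, action 'a' reaches the goal with probability 0.5
# and otherwise falls into an absorbing sink.
states = ["s0", "goal", "sink"]
actions = {"s0": ["a"], "goal": ["stay"], "sink": ["stay"]}
delta = {
    "s0": {"a": {"goal": 0.5, "sink": 0.5}},
    "goal": {"stay": {"goal": 1.0}},
    "sink": {"stay": {"sink": 1.0}},
}
print(reachability_vi(states, actions, delta, {"goal"})["s0"])  # 0.5
```

As the abstract notes, the difficulty is not this iteration itself but knowing when to stop with a guaranteed error bound, which is exactly what fails for a naive extension to long-run average rewards.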
Infinite-Duration Bidding Games
Two-player games on graphs are widely studied in formal methods as they model
the interaction between a system and its environment. The game is played by
moving a token throughout a graph to produce an infinite path. There are
several common modes to determine how the players move the token through the
graph; e.g., in turn-based games the players alternate turns in moving the
token. We study the {\em bidding} mode of moving the token, which, to the best
of our knowledge, has never been studied in infinite-duration games. The
following bidding rule was previously defined and called Richman bidding. Both
players have separate {\em budgets}, which sum up to $1$. In each turn, a
bidding takes place: Both players submit bids simultaneously, where a bid is
legal if it does not exceed the available budget, and the higher bidder pays
his bid to the other player and moves the token. The central question studied
in bidding games is a necessary and sufficient initial budget for winning the
game: a {\em threshold} budget in a vertex is a value $t \in [0,1]$ such that
if Player~1's budget exceeds $t$, he can win the game, and if Player~2's
budget exceeds $1-t$, he can win the game. Threshold budgets were previously
shown to exist in every vertex of a reachability game, which have an
interesting connection with {\em random-turn} games -- a sub-class of simple
stochastic games in which the player who moves is chosen randomly. We show the
existence of threshold budgets for a qualitative class of infinite-duration
games, namely parity games, and a quantitative class, namely mean-payoff games.
The key component of the proof is a quantitative solution to strongly-connected
mean-payoff bidding games in which we extend the connection with random-turn
games to these games, and construct explicit optimal strategies for both
players. Comment: A short version appeared in CONCUR 2017. The paper is accepted to
JAC
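For the previously known reachability case that the abstract builds on, Richman-bidding thresholds satisfy a simple averaging fixed point: the threshold of a vertex is the mean of its best and worst successor thresholds, with threshold $0$ on the target and $1$ on a losing sink. A minimal sketch of that fixed-point iteration, where the adjacency-list encoding and Jacobi-style update are assumptions for illustration:

```python
def richman_thresholds(succ, target, sink, iters=200):
    """Fixed-point iteration for Th(v) = (min_u Th(u) + max_u Th(u)) / 2
    over the successors u of v, with Th = 0 on target vertices and
    Th = 1 on the losing sink."""
    th = {v: 1.0 for v in succ}          # pessimistic initialization
    th.update({v: 0.0 for v in target})  # target is free to win
    for _ in range(iters):
        new = dict(th)
        for v in succ:
            if v in target or v in sink:
                continue
            vals = [th[u] for u in succ[v]]
            new[v] = (min(vals) + max(vals)) / 2.0
        th = new
    return th

# Toy chain: v1 -> {target, v2}, v2 -> {v1, sink}.
# The exact fixed point is Th(v1) = 1/3, Th(v2) = 2/3.
succ = {
    "v1": ["t", "v2"],
    "v2": ["v1", "x"],
    "t": ["t"],
    "x": ["x"],
}
th = richman_thresholds(succ, target={"t"}, sink={"x"})
print(round(th["v1"], 4), round(th["v2"], 4))
```

The averaging update is where the connection to random-turn games comes from: moving with probability $1/2$ each mimics bidding with balanced budgets. The paper's contribution extends this picture from reachability to parity and mean-payoff objectives.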
MDP Optimal Control under Temporal Logic Constraints
In this paper, we develop a method to automatically generate a control policy
for a dynamical system modeled as a Markov Decision Process (MDP). The control
specification is given as a Linear Temporal Logic (LTL) formula over a set of
propositions defined on the states of the MDP. We synthesize a control policy
such that the MDP satisfies the given specification almost surely, if such a
policy exists. In addition, we designate an "optimizing proposition" to be
repeatedly satisfied, and we formulate a novel optimization criterion in terms
of minimizing the expected cost in between satisfactions of this proposition.
We propose a sufficient condition for a policy to be optimal, and develop a
dynamic programming algorithm that synthesizes a policy that is optimal under
some conditions, and sub-optimal otherwise. This problem is motivated by
robotic applications requiring persistent tasks, such as environmental
monitoring or data gathering, to be performed. Comment: Technical report accompanying the CDC 2011 submission.
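One ingredient of the optimization criterion above, the expected cost accumulated until the next satisfaction of the optimizing proposition, can be computed by a standard stochastic-shortest-path value iteration. The sketch below is an illustrative assumption, not the paper's full dynamic-programming algorithm (which optimizes this quantity over repeated satisfactions under the LTL constraint):

```python
def expected_cost_to_sat(states, actions, delta, cost, sat, iters=500):
    """Iterate J(s) = min_a [ cost(s, a) + sum_t delta[s][a][t] * J(t) ],
    with J = 0 on states where the optimizing proposition holds.

    cost: dict (state, action) -> one-step cost
    sat:  set of states satisfying the optimizing proposition
    """
    j = {s: 0.0 for s in states}
    for _ in range(iters):
        nj = {}
        for s in states:
            if s in sat:
                nj[s] = 0.0
            else:
                # Bellman update: cheapest expected cost to satisfaction.
                nj[s] = min(
                    cost[(s, a)] + sum(p * j[t] for t, p in delta[s][a].items())
                    for a in actions[s]
                )
        j = nj
    return j

# Toy example: action 'go' costs 1 and reaches a satisfying state with
# probability 0.5, so the expected cost to satisfaction is 1 / 0.5 = 2.
states = ["s0", "p"]
actions = {"s0": ["go"], "p": ["stay"]}
delta = {"s0": {"go": {"p": 0.5, "s0": 0.5}}, "p": {"stay": {"p": 1.0}}}
cost = {("s0", "go"): 1.0, ("p", "stay"): 0.0}
print(expected_cost_to_sat(states, actions, delta, cost, {"p"})["s0"])
```

In the paper's setting this computation would run on a product of the MDP with an automaton for the LTL formula, so that the almost-sure satisfaction constraint and the cost criterion are handled together.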