    Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

    Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over, e.g., value iteration, such as precision or the possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
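
    As a minimal illustration of the technique being accelerated, the sketch below implements Howard-style strategy (policy) iteration for mean payoff: it alternates evaluating the current strategy via the gain/bias equations and greedily improving it. It assumes a unichain MDP given as an explicit transition tensor; the function name and encoding are chosen for this sketch only, and the speed-up techniques contributed by the paper are not reflected here.

        import numpy as np

        def strategy_iteration_mean_payoff(P, r, max_iters=1000):
            """Howard-style strategy iteration for mean payoff (sketch).

            P : (S, A, S) tensor, P[s, a, t] = probability of moving to t
            r : (S, A) matrix of immediate rewards
            Assumes every strategy induces a unichain Markov chain, so the
            gain g is the same in every state.
            """
            S, A, _ = P.shape
            pi = np.zeros(S, dtype=int)              # arbitrary initial strategy
            for _ in range(max_iters):
                # Evaluation: solve g + h(s) = r(s, pi(s)) + sum_t P(t|s, pi(s)) h(t)
                # with the normalisation h(0) = 0; unknowns are (g, h(0..S-1)).
                M = np.zeros((S + 1, S + 1))
                b = np.zeros(S + 1)
                for s in range(S):
                    M[s, 0] = 1.0                    # coefficient of g
                    M[s, 1 + s] += 1.0               # coefficient of h(s)
                    M[s, 1:] -= P[s, pi[s]]          # -sum_t P(t|s, pi(s)) h(t)
                    b[s] = r[s, pi[s]]
                M[S, 1] = 1.0                        # normalisation h(0) = 0
                x = np.linalg.solve(M, b)
                g, h = x[0], x[1:]

                # Improvement: switch an action only on strict improvement of r + P h.
                q = r + P @ h                        # q[s, a] = r(s, a) + E[h(next)]
                current = q[np.arange(S), pi]
                better = q.max(axis=1) > current + 1e-9
                if not better.any():
                    return g, pi                     # strategy is optimal
                pi = pi.copy()
                pi[better] = q[better].argmax(axis=1)
            return g, pi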

    Faster Algorithms for Quantitative Analysis of Markov Chains and Markov Decision Processes with Small Treewidth

    Discrete-time Markov Chains (MCs) and Markov Decision Processes (MDPs) are two standard formalisms in system analysis. Their main associated quantitative objectives are hitting probabilities, discounted sum, and mean payoff. Although there are many techniques for computing these objectives in general MCs/MDPs, they have not been thoroughly studied in terms of parameterized algorithms, particularly when treewidth is used as the parameter. This is in sharp contrast to qualitative objectives for MCs, MDPs and graph games, for which treewidth-based algorithms yield significant complexity improvements. In this work, we show that treewidth can also be used to obtain faster algorithms for the quantitative problems. For an MC with n states and m transitions, we show that each of the classical quantitative objectives can be computed in O((n+m)⋅t²) time, given a tree decomposition of the MC with width t. Our results also imply a bound of O(κ⋅(n+m)⋅t²) for each objective on MDPs, where κ is the number of strategy-iteration refinements required for the given input and objective. Finally, we make an experimental evaluation of our new algorithms on low-treewidth MCs and MDPs obtained from the DaCapo benchmark suite. Our experimental results show that on MCs and MDPs with small treewidth, our algorithms outperform existing well-established methods by one or more orders of magnitude.
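
    For context, the classical way to obtain one of these objectives, hitting probabilities in an MC, is to solve a single linear system; the sketch below shows that baseline with numpy, under the assumption that the target set is reachable from every non-target state so the system is nonsingular. Roughly, the paper's contribution is to organise this computation along a given tree decomposition so that the cost drops to O((n+m)⋅t²); that machinery is not shown here, and the function name is chosen for this sketch only.

        import numpy as np

        def hitting_probabilities(P, target):
            """Reachability (hitting) probabilities of a Markov chain (sketch).

            P      : (n, n) row-stochastic transition matrix
            target : boolean array of length n marking target states
            Assumes the target set is reachable from every non-target state,
            so that the matrix I - Q below is nonsingular.
            """
            n = P.shape[0]
            rest = ~target
            Q = P[np.ix_(rest, rest)]            # non-target -> non-target block
            b = P[rest][:, target].sum(axis=1)   # one-step probability of hitting
            x = np.zeros(n)
            x[target] = 1.0
            x[rest] = np.linalg.solve(np.eye(rest.sum()) - Q, b)
            return x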

    Comparison of Algorithms for Simple Stochastic Games (Full Version)

    Simple stochastic games are turn-based 2.5-player zero-sum graph games with a reachability objective. The problem is to compute the winning probability as well as the optimal strategies of both players. In this paper, we compare the three known classes of algorithms -- value iteration, strategy iteration and quadratic programming -- both theoretically and practically. Further, we suggest several improvements for all algorithms, including the first approach based on quadratic programming that avoids transforming the stochastic game to a stopping one. Our extensive experiments show that these improvements can lead to significant speed-ups. We implemented all algorithms in PRISM-games 3.0, thereby providing the first implementation of quadratic programming for solving simple stochastic games.
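
    For reference, all three classes of algorithms are built around the Bellman equations of the game: Maximizer vertices take the best successor value, Minimizer vertices the worst, and average (random) vertices the expectation, with target and sink vertices fixed at 1 and 0. The sketch below applies this operator once; the encoding of the game is chosen for illustration only and is unrelated to the PRISM-games implementation.

        import numpy as np

        def bellman(v, succ, kind, prob):
            """One application of the Bellman operator of a simple stochastic game.

            v    : numpy array of length n with the current value estimate
            succ : succ[s] = list of successor vertices of vertex s
            kind : kind[s] in {'max', 'min', 'avg', 'target', 'sink'}
            prob : for 'avg' vertices, prob[s][i] = probability of succ[s][i]
            """
            new = np.empty_like(v)
            for s in range(len(v)):
                if kind[s] == 'target':
                    new[s] = 1.0
                elif kind[s] == 'sink':
                    new[s] = 0.0
                elif kind[s] == 'max':
                    new[s] = max(v[t] for t in succ[s])
                elif kind[s] == 'min':
                    new[s] = min(v[t] for t in succ[s])
                else:  # 'avg'
                    new[s] = sum(p * v[t] for p, t in zip(prob[s], succ[s]))
            return new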

    Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm

    Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of over-approximations of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph.
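
    The resulting anytime loop can be pictured as in the sketch below: a lower bound is iterated from 0 and an upper bound from 1, and the algorithm may stop as soon as the two are ε-close. This is only a skeleton assuming some Bellman operator such as the one sketched above; the analysis of the game graph that the paper relies on to make the upper sequence actually converge to the value is deliberately omitted.

        import numpy as np

        def bounded_value_iteration(n, bellman_op, eps=1e-6, max_iters=10**6):
            """Anytime value iteration with a stopping criterion (skeleton).

            bellman_op : function mapping a value vector to its Bellman update
            Returns an under- and an over-approximation of the value vector
            whose pointwise difference is at most eps on termination.
            """
            lower = np.zeros(n)          # under-approximation, iterated from below
            upper = np.ones(n)           # over-approximation, iterated from above
            for _ in range(max_iters):
                lower = bellman_op(lower)
                upper = bellman_op(upper)
                # The paper additionally analyses end components of the game
                # graph at this point; without that step the upper sequence may
                # stop at a fixed point above the value. Omitted in this sketch.
                if np.max(upper - lower) <= eps:
                    break
            return lower, upper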