4 research outputs found

Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

Authors: Křetínský Jan; Meggendorfer Tobias
Publication date: 07/09/2017

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism for expressing performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. In many other contexts it is the technique of choice because of its advantages over, e.g., value iteration, such as precision and the possibility of domain-knowledge-aware initialization; it is nevertheless rarely used for MDPs, where it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
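For orientation, plain strategy iteration for mean payoff can be sketched as follows. This is the textbook scheme (evaluate the current strategy's gain and bias, then switch to strictly better actions), not the accelerated techniques of the paper. It assumes a unichain MDP, i.e. every strategy induces a chain with a single recurrent class; the two-state `MDP`, its action names, and all function names are hypothetical. Exact rational arithmetic is used to illustrate the precision advantage mentioned in the abstract.

```python
from fractions import Fraction as F

# Hypothetical unichain MDP: MDP[state][action] = (reward, {successor: probability})
MDP = {
    0: {"a": (F(1), {0: F(1, 2), 1: F(1, 2)}), "b": (F(0), {1: F(1)})},
    1: {"d": (F(2), {0: F(1, 2), 1: F(1, 2)}), "c": (F(5), {0: F(1)})},
}


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Fails (StopIteration) if the system is singular, e.g. for a non-unichain MDP.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def evaluate(mdp, policy):
    """Return (gain g, bias h) solving g + h(s) = r(s) + sum_t P(s,t) h(t), h(ref) = 0."""
    states = sorted(mdp)
    ref = states[0]
    index = {s: i for i, s in enumerate(states)}
    matrix, rhs = [], []
    for s in states:
        reward, succ = mdp[s][policy[s]]
        row = [F(0)] * len(states)
        row[0] = F(1)  # column 0 holds the gain g (h(ref) is fixed to 0)
        if s != ref:
            row[index[s]] += 1
        for t, p in succ.items():
            if t != ref:
                row[index[t]] -= p
        matrix.append(row)
        rhs.append(reward)
    x = solve(matrix, rhs)
    bias = {s: (F(0) if s == ref else x[index[s]]) for s in states}
    return x[0], bias


def strategy_iteration(mdp):
    """Alternate evaluation and strict improvement until the strategy is stable."""
    policy = {s: next(iter(mdp[s])) for s in mdp}  # arbitrary initial strategy
    while True:
        gain, bias = evaluate(mdp, policy)

        def score(s, a):
            reward, succ = mdp[s][a]
            return reward + sum(p * bias[t] for t, p in succ.items())

        improved = False
        for s in mdp:
            best = max(mdp[s], key=lambda a: score(s, a))
            if score(s, best) > score(s, policy[s]):  # switch only on strict gain
                policy[s], improved = best, True
        if not improved:
            return policy, gain


policy, gain = strategy_iteration(MDP)
print(policy, gain)
```

The initial strategy is where domain knowledge could be injected: any dictionary mapping states to actions can replace the arbitrary first choice.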

arXiv.org e-Print Archive

Lancaster E-Prints

Faster Algorithms for Quantitative Analysis of Markov Chains and Markov Decision Processes with Small Treewidth

Authors: Asadi Ali; Chatterjee Krishnendu; Goharshady Amir Kafshdar; Mohammadi Kiarash; Pavlogiannis Andreas
Publication date: 06/04/2020

Discrete-time Markov Chains (MCs) and Markov Decision Processes (MDPs) are two standard formalisms in system analysis. Their main associated quantitative objectives are hitting probabilities, discounted sum, and mean payoff. Although there are many techniques for computing these objectives in general MCs/MDPs, they have not been thoroughly studied in terms of parameterized algorithms, particularly when treewidth is used as the parameter. This is in sharp contrast to qualitative objectives for MCs, MDPs and graph games, for which treewidth-based algorithms yield significant complexity improvements. In this work, we show that treewidth can also be used to obtain faster algorithms for the quantitative problems. For an MC with $n$ states and $m$ transitions, we show that each of the classical quantitative objectives can be computed in $O((n+m)\cdot t^2)$ time, given a tree decomposition of the MC that has width $t$. Our results also imply a bound of $O(\kappa\cdot (n+m)\cdot t^2)$ for each objective on MDPs, where $\kappa$ is the number of strategy-iteration refinements required for the given input and objective. Finally, we make an experimental evaluation of our new algorithms on low-treewidth MCs and MDPs obtained from the DaCapo benchmark suite. Our experimental results show that on MCs and MDPs with small treewidth, our algorithms outperform existing well-established methods by one or more orders of magnitude.
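As a reminder of what the three objectives compute, they can be written for an MC as below. The notation is an assumption introduced here, not taken from the paper: transition matrix $P$, reward function $r$, target set $T$, discount factor $\lambda \in (0,1)$, and $X_i$ for the state visited at step $i$.

```latex
\begin{align*}
\text{hitting probability:}\quad
  & x_s = 1 \ \ (s \in T), \qquad
    x_s = \sum_{s'} P(s,s')\, x_{s'} \ \ (s \notin T)
    \quad \text{(least non-negative solution)},\\
\text{discounted sum:}\quad
  & v_s = r(s) + \lambda \sum_{s'} P(s,s')\, v_{s'},\\
\text{mean payoff:}\quad
  & g_s = \lim_{N \to \infty} \frac{1}{N}\,
    \mathbb{E}_s\!\left[\sum_{i=0}^{N-1} r(X_i)\right].
\end{align*}
```

Fixing a strategy in an MDP yields an MC, so each strategy-iteration refinement amounts to one MC evaluation of this kind. With $\kappa$ refinements at $O((n+m)\cdot t^2)$ each, this gives the stated MDP bound.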

arXiv.org e-Print Archive

Hal-Diderot

Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations

Authors: Abate A; Kwiatkowska M; Češka M
Publication venue: Springer Verlag
Publication date: 01/01/2016

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, providing evidence that the state-space reduction considerably accelerates the policy iteration scheme while still meeting the required level of precision.
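The baseline that the abstract builds on, policy iteration for the expected discounted sum, can be sketched as follows. This is the plain textbook scheme on the full state space; it does not implement the paper's adaptive aggregation or its error bounds. The three-state `MDP`, the discount factor `GAMMA`, the tolerances, and all function names are hypothetical.

```python
# Hypothetical MDP: MDP[state][action] = list of (probability, next_state, reward)
MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 2, 0.0)]},
    2: {"stay": [(1.0, 2, 3.0)], "back": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(mdp, values, state, action, gamma):
    """Expected one-step reward plus discounted value of the successor."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in mdp[state][action])


def evaluate(mdp, policy, gamma, tol=1e-10):
    """Policy evaluation by in-place sweeps until the largest update is below tol."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            new = q_value(mdp, values, s, policy[s], gamma)
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if delta < tol:
            return values


def policy_iteration(mdp, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(mdp[s])) for s in mdp}  # arbitrary initial policy
    while True:
        values = evaluate(mdp, policy, gamma)
        changed = False
        for s in mdp:
            best = max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma))
            current = q_value(mdp, values, s, policy[s], gamma)
            # Small margin so numerical noise does not cause endless switching.
            if q_value(mdp, values, s, best, gamma) > current + 1e-9:
                policy[s] = best
                changed = True
        if not changed:
            return policy, values


policy, values = policy_iteration(MDP, GAMMA)
print(policy, values)
```

Every sweep in `evaluate` touches each state, which is the cost that grows with the state space; aggregation schemes such as the one proposed in the paper aim to run this loop on a smaller abstract model instead.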

Oxford University Research Archive

Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations

Authors: Abate A; Kwiatkowska M; Češka M
Publication venue: Springer Science and Business Media LLC
Publication date: 22/09/2016

Oxford University Research Archive