Optimistic Value Iteration

Author: Hartmanns Arnd
Kaminski Benjamin Lucien
Publication date: 17/10/2019

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. Optimistic value iteration is easy to implement, does not require extra precomputations or a priori state space transformations, and works for computing reachability probabilities as well as expected rewards. It is also fast, as we show via an extensive experimental evaluation using our publicly available implementation within the Modest Toolset.
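
The guess-and-verify idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Markov chain (no nondeterministic choices), an absolute convergence test, and a single-step verification that accepts a guess `u` only if applying the Bellman operator once does not increase it. That check is sound because any `u` with `bellman(u) <= u` bounds the least fixpoint from above. The names `bellman`, `value_iteration`, `optimistic_vi` and the three-state chain are hypothetical, and floating-point rounding is ignored.

```python
def bellman(P, goal, v):
    """Apply the reachability Bellman operator of a Markov chain once."""
    return [1.0 if s in goal else sum(p * v[t] for t, p in P[s].items())
            for s in range(len(P))]


def value_iteration(P, goal, alpha):
    """Iterate from below until successive iterates differ by less than alpha."""
    v = [1.0 if s in goal else 0.0 for s in range(len(P))]
    while True:
        nv = bellman(P, goal, v)
        if max(abs(a - b) for a, b in zip(nv, v)) < alpha:
            return nv
        v = nv


def optimistic_vi(P, goal, eps=1e-6, max_rounds=50):
    """Return (lower, upper) bounds on reachability probabilities."""
    alpha = eps
    for _ in range(max_rounds):
        # Step 1: lower bound from standard value iteration.
        lower = value_iteration(P, goal, alpha)
        # Step 2: optimistic guess of an upper bound (relative widening).
        upper = [min(1.0, x * (1 + eps)) for x in lower]
        # Step 3: verify the guess: it is inductive if one Bellman step
        # does not increase any entry.
        if all(n <= u for n, u in zip(bellman(P, goal, upper), upper)):
            return lower, upper
        # Guess rejected: tighten the lower bound and try again.
        alpha /= 2
    raise RuntimeError("no inductive upper bound found")


# Hypothetical chain: state 0 loops with 0.5, reaches goal 1 with 0.3,
# and falls into sink 2 with 0.2. The exact answer for state 0 is 0.6.
P = [{0: 0.5, 1: 0.3, 2: 0.2}, {}, {2: 1.0}]
lower, upper = optimistic_vi(P, goal={1})
print(lower[0], upper[0])
```

Restarting value iteration from scratch after a rejected guess keeps the sketch short; a practical implementation would more likely continue from the current lower bound.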

arXiv.org e-Print Archive

University of Twente Research Information

Optimistic Value Iteration

Author: Hartmanns A
Kaminski BL
Publication venue: 32nd International Conference on Computer-Aided Verification (CAV 2020)
Publication date: 14/07/2020

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.

Certificates for Probabilistic Pushdown Automata via Optimistic Value Iteration

Author: Katoen Joost-Pieter
Winkler Tobias
Publication date: 20/01/2023

Probabilistic pushdown automata (pPDA) are a standard model for discrete probabilistic programs with procedures and recursion. In pPDA, many quantitative properties are characterized as least fixpoints of polynomial equation systems. In this paper, we study the problem of certifying that these quantities lie within certain bounds. To this end, we first characterize the polynomial systems that admit easy-to-check certificates for validating bounds on their least fixpoint. Second, we present a sound and complete Optimistic Value Iteration algorithm for computing such certificates. Third, we show how certificates for polynomial systems can be transferred to certificates for various quantitative pPDA properties. Experiments demonstrate that our algorithm computes succinct certificates for several intricate example programs as well as stochastic context-free grammars with > 10^4 production rules.

Comment: Full version of a paper to appear at TACAS 2023, 30 pages
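
To make the notion of an easy-to-check certificate concrete, here is a small sketch under a stated assumption: a certificate is taken to be a vector `u` with `f(u) <= u` componentwise, where `f` is the polynomial system with nonnegative coefficients. For such monotone `f`, any `u` of this kind is an upper bound on the least fixpoint. The abstract does not spell out the paper's exact certificate format, so this is an illustration only; the function name and the one-variable system x = (3/4)x^2 + 1/4 (least fixpoint 1/3) are hypothetical.

```python
from fractions import Fraction


def is_inductive_upper_bound(f, u):
    """Return True if f(u) <= u holds in every component."""
    return all(fu <= ui for fu, ui in zip(f(u), u))


def f(x):
    """Hypothetical one-variable system x = (3/4) * x^2 + 1/4."""
    return [Fraction(3, 4) * x[0] ** 2 + Fraction(1, 4)]


# Exact rational arithmetic avoids floating-point doubts in the check.
good = is_inductive_upper_bound(f, [Fraction(34, 100)])
bad = is_inductive_upper_bound(f, [Fraction(3, 10)])
print(good, bad)
```

Checking a certificate needs only one evaluation of `f`, which is what makes it cheap to validate. A failed check does not by itself show that the candidate is not an upper bound, only that it is not inductive.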

arXiv.org e-Print Archive

Dynamic Programming for Positive Linear Systems with Linear Costs

Author: Li Yuchao
Publication date: 03/06/2023

Recent work by Rantzer [Ran22] formulated a class of optimal control problems involving positive linear systems, linear stage costs, and linear constraints. It was shown that the associated Bellman's equation can be characterized by a finite-dimensional nonlinear equation, which is solved by linear programming. In this work, we report complementary theories for the same class of problems. In particular, we provide conditions under which the solution is unique, investigate properties of the optimal policy, study the convergence of value iteration, policy iteration, and optimistic policy iteration applied to such problems, and analyze the boundedness of the solution to the associated linear program. Apart from a form of the Frobenius-Perron theorem, the majority of our results are built upon generic dynamic programming theory applicable to problems involving nonnegative stage costs.

arXiv.org e-Print Archive