Optimistic Value Iteration
Markov decision processes are widely used for planning and verification in
settings that combine controllable or adversarial choices with probabilistic
behaviour. The standard analysis algorithm, value iteration, only provides a
lower bound on unbounded-horizon probabilities or reward values. Two "sound"
variations, which also deliver an upper bound, have recently appeared. In this
paper, we present optimistic value iteration, a new sound approach that
leverages value iteration's ability to usually deliver tight lower bounds: we
obtain a lower bound via standard value iteration, use the result to "guess" an
upper bound, and prove the latter's correctness. Optimistic value iteration is
easy to implement, does not require extra precomputations or a priori state
space transformations, and works for computing reachability probabilities as
well as expected rewards. It is also fast, as we show via an extensive
experimental evaluation using our publicly available implementation within the
Modest Toolset.
Optimistic Value Iteration
Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
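The guess-and-verify idea can be sketched for maximal reachability in a small MDP. This is a minimal, simplified sketch and not the mcsta implementation or the paper's exact algorithm, whose verification phase is more elaborate. The MDP encoding, the names `bellman` and `optimistic_vi`, the parameters `eps` and `alpha`, and the retry policy are hypothetical choices for illustration. Soundness rests on a single fact: if Φ(u) ≤ u pointwise for the Bellman operator Φ (goal states fixed to 1), then u is an upper bound on the least fixpoint (Park induction), so the check is done in exact rational arithmetic. Termination is not guaranteed in general (for example, with end components), hence the round budget.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: mdp[s] lists the actions of state s; each action is a
# list of (probability, successor) pairs with exact Fraction probabilities.
MDP = Dict[int, List[List[Tuple[Fraction, int]]]]


def bellman(mdp: MDP, goal: Set[int], x: dict) -> dict:
    """Max-reachability Bellman operator; goal states are fixed to 1.

    Works on float vectors (fast, approximate) and Fraction vectors (exact).
    """
    return {
        s: 1 if s in goal else max(sum(p * x[t] for p, t in act) for act in acts)
        for s, acts in mdp.items()
    }


def optimistic_vi(mdp: MDP, goal: Set[int], eps: Fraction = Fraction(1, 10**6),
                  alpha: float = 1e-7, max_rounds: int = 10) -> Optional[tuple]:
    """Guess-and-verify sketch: returns (lower, upper), or None if no guess verified."""
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
    for _ in range(max_rounds):
        # Phase 1: standard value iteration from below, in floating point.
        while True:
            new = bellman(mdp, goal, lower)
            diff = max(abs(new[s] - lower[s]) for s in mdp)
            lower = new
            if diff < alpha:
                break
        # Phase 2: optimistically guess an upper bound (never above 1).
        upper = {s: min(Fraction(1), Fraction(lower[s]) + eps) for s in mdp}
        # Phase 3: exact check. If bellman(upper) <= upper pointwise, upper is a
        # pre-fixpoint and therefore bounds the least fixpoint from above.
        image = bellman(mdp, goal, upper)
        if all(image[s] <= upper[s] for s in mdp):
            return lower, upper
        alpha /= 10  # guess rejected: tighten the lower bound, then guess again
    return None


F = Fraction
example = {
    0: [[(F(1, 2), 2), (F(1, 2), 1)],                  # action a
        [(F(3, 10), 2), (F(1, 5), 3), (F(1, 2), 0)]],  # action b
    1: [[(F(2, 5), 0), (F(3, 10), 2), (F(3, 10), 3)]],
    2: [[(F(1), 2)]],  # goal, absorbing
    3: [[(F(1), 3)]],  # sink, absorbing; no prob-0 precomputation is done
}
lower, upper = optimistic_vi(example, goal={2})
print(lower[0], float(upper[0]))
```

In this example, the maximal probability of reaching state 2 from state 0 is exactly 13/16 (via action a), so `lower[0]` and `upper[0]` should bracket 0.8125 within about `eps`. Keeping value iteration in floating point and only the final check in rationals is one way to keep the verified bound trustworthy without paying for exact arithmetic in every iteration.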
Certificates for Probabilistic Pushdown Automata via Optimistic Value Iteration
Probabilistic pushdown automata (pPDA) are a standard model for discrete
probabilistic programs with procedures and recursion. In pPDA, many
quantitative properties are characterized as least fixpoints of polynomial
equation systems. In this paper, we study the problem of certifying that these
quantities lie within certain bounds. To this end, we first characterize the
polynomial systems that admit easy-to-check certificates for validating bounds
on their least fixpoint. Second, we present a sound and complete Optimistic
Value Iteration algorithm for computing such certificates. Third, we show how
certificates for polynomial systems can be transferred to certificates for
various quantitative pPDA properties. Experiments demonstrate that our
algorithm computes succinct certificates for several intricate example programs
as well as stochastic context-free grammars with production rules. Comment: Full version of a paper to appear at TACAS 2023, 30 pages.
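The certificate idea for a polynomial system x = f(x) with nonnegative coefficients can be illustrated as follows. One natural easy-to-check certificate is a rational vector u with f(u) ≤ u, since any such pre-fixpoint bounds the least fixpoint from above. The sketch below uses plain Kleene iteration from 0, an optimistic bump by `eps`, and an exact check. It is a simplified stand-in for illustration only, not the paper's Optimistic Value Iteration algorithm or its completeness guarantee. The function `find_certificate`, its parameters, and the example system are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List, Optional, Sequence


def find_certificate(f: Callable[[Sequence], List], n: int,
                     eps: Fraction = Fraction(1, 10**6), tol: float = 1e-9,
                     max_iter: int = 100_000) -> Optional[List[Fraction]]:
    # Kleene iteration from 0 (floating point) approaches the least fixpoint from below.
    x = [0.0] * n
    for _ in range(max_iter):
        nx = f(x)
        converged = max(abs(a - b) for a, b in zip(nx, x)) < tol
        x = nx
        if converged:
            break
    # Optimistic guess: bump the approximation up by eps, as an exact rational vector.
    u = [Fraction(v) + eps for v in x]
    # Exact check f(u) <= u: if it holds, u is a pre-fixpoint and bounds the least fixpoint.
    if all(a <= b for a, b in zip(f(u), u)):
        return u
    return None


def f(x):
    # Hypothetical one-variable system x = 2/5 + 3/5 * x^2: termination probability of a
    # procedure that returns with probability 2/5, else calls itself twice in sequence.
    return [Fraction(2, 5) + Fraction(3, 5) * x[0] * x[0]]


cert = find_certificate(f, 1)
print(cert)
```

Here the least fixpoint is 2/3, the smaller root of 3x² − 5x + 2 = 0, and the returned certificate is a rational slightly above 2/3. Anyone can re-verify it by evaluating f once in exact arithmetic, without trusting the iteration that produced it.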
Dynamic Programming for Positive Linear Systems with Linear Costs
Recent work by Rantzer [Ran22] formulated a class of optimal control problems
involving positive linear systems, linear stage costs, and linear constraints.
It was shown that the associated Bellman equation can be characterized by a
finite-dimensional nonlinear equation, which is solved by linear programming.
In this work, we report complementary theories for the same class of problems.
In particular, we provide conditions under which the solution is unique,
investigate properties of the optimal policy, study the convergence of value
iteration, policy iteration, and optimistic policy iteration applied to such
problems, and analyze the boundedness of the solution to the associated linear
program. Apart from a form of the Frobenius-Perron theorem, the majority of our
results are built upon generic dynamic programming theory applicable to
problems involving nonnegative stage costs.
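A small value-iteration sketch makes the finite-dimensional Bellman characterization concrete. It assumes one hypothetical instance of the setting, since the constraint form is not spelled out above: x⁺ = Ax + Bu with x ≥ 0, stage cost sᵀx + rᵀu, and inputs bounded elementwise by |u| ≤ Ex, with A − |B|E ≥ 0 so that states stay nonnegative. If the value function is guessed to be linear, pᵀx, then minimizing (r + Bᵀp)ᵀu over |u| ≤ Ex gives −|r + Bᵀp|ᵀEx. The Bellman equation therefore reduces to p = s + Aᵀp − Eᵀ|r + Bᵀp|. The function name and parameters are illustrative, and convergence is not checked here; conditions of that kind are what the paper studies.

```python
from typing import List

Matrix = List[List[float]]


def linear_value_iteration(A: Matrix, B: Matrix, E: Matrix, s: List[float],
                           r: List[float], iters: int = 10_000,
                           tol: float = 1e-12) -> List[float]:
    """Iterate p <- s + A^T p - E^T |r + B^T p| from p = 0 (hypothetical setup)."""
    n, m = len(A), len(r)
    p = [0.0] * n
    for _ in range(iters):
        # g = r + B^T p: marginal cost of each input under the current value p^T x.
        g = [r[j] + sum(B[i][j] * p[i] for i in range(n)) for j in range(m)]
        # Minimising g^T u over |u| <= E x contributes -E^T |g| to the new p.
        new = [s[k] + sum(A[i][k] * p[i] for i in range(n))
               - sum(E[j][k] * abs(g[j]) for j in range(m))
               for k in range(n)]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


# Scalar example: A = 0.9, B = 1, E = 0.5, s = 1, r = 0.1 (A - |B|E = 0.4 >= 0).
p = linear_value_iteration([[0.9]], [[1.0]], [[0.5]], [1.0], [0.1])
print(p)  # close to 19/12 = 1.58333...
```

For the scalar data, the fixed-point equation reads p = 1 + 0.9p − 0.5|0.1 + p|, whose solution is p = 19/12. Each iteration here is the affine map p ↦ 0.95 + 0.4p, so value iteration converges geometrically. The corresponding optimal input saturates the bound, u = −0.5x.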