Efficient computation of exact solutions for quantitative model checking
Quantitative model checkers for Markov Decision Processes typically use
finite-precision arithmetic. If all the coefficients in the process are
rational numbers, then the model checking results are rational, and so they can
be computed exactly. However, exact techniques are generally too expensive or
limited in scalability. In this paper we propose a method for obtaining exact
results starting from an approximated solution in finite-precision arithmetic.
The input of the method is a description of a scheduler, which can be obtained
by a model checker using finite precision. Given a scheduler, we show how to
obtain a corresponding basis in a linear-programming problem, in such a way
that the basis is optimal whenever the scheduler attains the worst-case
probability. This correspondence is already known for discounted MDPs; we show
how to apply it in the undiscounted case, provided that some preprocessing is
done. Using the correspondence, the linear-programming problem can be solved in
exact arithmetic starting from the basis obtained. As a consequence, the method
finds the worst-case probability even if the scheduler provided by the model
checker was not optimal. In our experiments, the calculation of exact solutions
from a candidate scheduler is significantly faster than the calculation using
the simplex method under exact arithmetic starting from a default basis.
Comment: In Proceedings QAPL 2012, arXiv:1207.055
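To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of the exact phase: a candidate memoryless scheduler, as a finite-precision model checker might return it, is evaluated exactly over the rationals and improved until the Bellman optimality check passes. Policy iteration started from the candidate plays the role that warm-starting the simplex method at the scheduler-induced basis plays in the paper; the toy MDP, state names, and Fraction-based solver are all illustrative assumptions.

```python
from fractions import Fraction

F = Fraction

# Hypothetical toy MDP for maximal reachability of "goal":
# transitions[s][a] = {successor: probability}. "goal" and "sink" are
# absorbing, and (as the preprocessing in the paper ensures for the
# undiscounted case) every scheduler reaches them almost surely here,
# so the policy-evaluation linear systems are non-singular.
transitions = {
    "s0": {"a": {"goal": F(1, 2), "sink": F(1, 2)},
           "b": {"s1": F(1)}},
    "s1": {"a": {"goal": F(3, 10), "sink": F(1, 10), "s0": F(3, 5)}},
}
states = list(transitions)

def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals: solve A x = b exactly."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def evaluate(scheduler):
    """Exact reachability probabilities under a memoryless scheduler."""
    idx = {s: i for i, s in enumerate(states)}
    A = [[F(0)] * len(states) for _ in states]
    b = [F(0)] * len(states)
    for s in states:
        A[idx[s]][idx[s]] = F(1)
        for succ, p in transitions[s][scheduler[s]].items():
            if succ == "goal":
                b[idx[s]] += p
            elif succ != "sink":
                A[idx[s]][idx[succ]] -= p
    x = solve_exact(A, b)
    return {s: x[idx[s]] for s in states}

def exact_optimum(scheduler):
    """Policy iteration from a candidate scheduler -- the analogue of
    warm-starting the simplex method at the basis the scheduler induces."""
    while True:
        v = evaluate(scheduler)
        def q(s, a):  # exact one-step lookahead value of action a in s
            return sum(p * (F(1) if t == "goal" else
                            F(0) if t == "sink" else v[t])
                       for t, p in transitions[s][a].items())
        best = {s: max(transitions[s], key=lambda a: q(s, a)) for s in states}
        if all(q(s, best[s]) == q(s, scheduler[s]) for s in states):
            return scheduler, v
        scheduler = best

# A finite-precision model checker might hand us this suboptimal candidate;
# the exact phase still recovers the true optimum.
candidate = {"s0": "a", "s1": "a"}
opt, values = exact_optimum(candidate)
print(opt, values["s0"])   # {'s0': 'b', 's1': 'a'} Fraction(3, 4)
```

On this toy model the candidate picks the myopic action at s0, yet the exact phase switches to the better action and returns the optimal probability exactly as 3/4, mirroring the paper's claim that the method succeeds even when the scheduler provided by the model checker is not optimal.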
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a
policy that minimizes, in expectation, a long-run objective such as the
infinite-horizon discounted or long-run average cost. In many practical
applications, optimizing the expected value alone is not sufficient, and it may
be necessary to include a risk measure in the optimization process, either as
the objective or as a constraint. Various risk measures have been proposed in
the literature, e.g., mean-variance tradeoff, exponential utility, the
percentile performance, value at risk, conditional value at risk, prospect
theory and its later enhancement, cumulative prospect theory. In this article,
we focus on the combination of risk criteria and reinforcement learning in a
constrained optimization framework, i.e., a setting where the goal is to find
a policy that optimizes the usual objective of infinite-horizon
discounted/average cost, while ensuring that an explicit risk constraint is
satisfied. We introduce the risk-constrained RL framework, cover popular risk
measures based on variance, conditional value-at-risk and cumulative prospect
theory, and present a template for a risk-sensitive RL algorithm. We survey
some of our recent work on this topic, covering problems encompassing
discounted cost, average cost, and stochastic shortest path settings, together
with the aforementioned risk measures in a constrained framework. This
non-exhaustive survey is aimed at giving a flavor of the challenges involved in
solving a risk-sensitive RL problem, and outlining some potential future
research directions.
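As a rough illustration of the kind of template the survey describes, the sketch below runs a Lagrangian-relaxation policy-gradient loop for a CVaR-constrained toy problem: stochastic gradient steps on the policy parameter, a Rockafellar-Uryasev auxiliary variable that tracks the value-at-risk, and projected dual ascent on the multiplier, each with its own step-size schedule (the two-timescale idea). The one-step bandit environment, step sizes, and score-function gradient are assumptions of this sketch, not the specific algorithms surveyed.

```python
import math
import random

random.seed(0)

ALPHA = 0.9    # CVaR confidence level (assumed for this sketch)
BOUND = 2.0    # risk budget: require CVaR_0.9(cost) <= 2.0

def sample_cost(action):
    """Toy one-step costs: action 0 is safe; action 1 has a lower mean
    (0.95 vs. 1.0) but a heavy tail that violates the CVaR budget."""
    if action == 0:
        return random.gauss(1.0, 0.1)
    return 5.0 if random.random() < 0.1 else 0.5

theta, u, lam = 0.0, 1.0, 0.0   # policy parameter, VaR estimate, multiplier

for t in range(1, 200001):
    p = 1.0 / (1.0 + math.exp(-theta))       # P(choose risky action 1)
    a = 1 if random.random() < p else 0
    c = sample_cost(a)
    excess = max(c - u, 0.0)

    # Score-function (REINFORCE) gradient of the Lagrangian
    #   L = E[c] + lam * (CVaR_alpha(c) - BOUND)
    # in the Rockafellar-Uryasev form CVaR = min_u u + E[(c-u)^+]/(1-alpha).
    score = a - p
    g_theta = (c + lam * excess / (1 - ALPHA)) * score

    # Subgradient step keeping u near the running VaR estimate.
    g_u = 1.0 - (1.0 if c > u else 0.0) / (1 - ALPHA)

    # Sampled constraint violation for projected dual ascent on lam.
    violation = u + excess / (1 - ALPHA) - BOUND

    # Separate step-size schedules stand in for the two-timescale
    # analysis used in this literature; the constants are arbitrary.
    theta -= 0.05 / math.sqrt(t) * g_theta
    u -= 0.1 / math.sqrt(t) * g_u
    lam = max(0.0, lam + 0.5 / math.sqrt(t) * violation)

print(f"P(risky) = {1.0 / (1.0 + math.exp(-theta)):.3f}, lambda = {lam:.2f}")
```

Without the constraint the gradient pushes the policy toward the risky action, since it has the lower expected cost; once the multiplier grows, the heavy tail is penalized and the iterates settle on a policy that keeps the estimated CVaR near the budget, which is exactly the trade-off a risk constraint is meant to enforce.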