Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a
policy that minimizes, in expectation, a long-run objective such as the
infinite-horizon discounted or long-run average cost. In many practical
applications, optimizing the expected value alone is not sufficient, and it may
be necessary to include a risk measure in the optimization process, either as
the objective or as a constraint. Various risk measures have been proposed in
the literature, e.g., mean-variance tradeoff, exponential utility, the
percentile performance, value at risk, conditional value at risk, prospect
theory and its later enhancement, cumulative prospect theory. In this article,
we focus on the combination of risk criteria and reinforcement learning in a
constrained optimization framework, i.e., a setting where the goal is to find a
policy that optimizes the usual objective of infinite-horizon
discounted/average cost, while ensuring that an explicit risk constraint is
satisfied. We introduce the risk-constrained RL framework, cover popular risk
measures based on variance, conditional value-at-risk and cumulative prospect
theory, and present a template for a risk-sensitive RL algorithm. We survey
some of our recent work on this topic, covering problems encompassing
discounted cost, average cost, and stochastic shortest path settings, together
with the aforementioned risk measures in a constrained framework. This
non-exhaustive survey is aimed at giving a flavor of the challenges involved in
solving a risk-sensitive RL problem, and outlining some potential future
research directions.
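The constrained formulation above is commonly tackled by Lagrangian relaxation: the policy parameters descend on a Lagrangian that combines the usual cost with a multiplier-weighted risk term, while the multiplier ascends on the constraint violation. As an illustrative sketch only (not the survey's algorithm), the primal-dual template can be shown on a toy problem where simple quadratics stand in for the expected cost and a risk measure:

```python
import numpy as np

# Hedged sketch of the risk-constrained template: minimize a cost J(theta)
# subject to a risk constraint R(theta) <= b, via two-timescale primal-dual
# gradient iteration. J and R are toy quadratics standing in for the
# discounted cost and a variance-style risk measure; all names are
# illustrative assumptions, not from the paper.

def primal_dual(b=1.0, steps=20000, lr_theta=0.05, lr_lam=0.005):
    J = lambda th: (th - 3.0) ** 2   # surrogate expected cost
    R = lambda th: th ** 2           # surrogate risk measure
    dJ = lambda th: 2.0 * (th - 3.0)
    dR = lambda th: 2.0 * th
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # primal descent on the Lagrangian J(theta) + lam * R(theta)
        theta -= lr_theta * (dJ(theta) + lam * dR(theta))
        # dual ascent on the constraint violation, projected to lam >= 0
        lam = max(0.0, lam + lr_lam * (R(theta) - b))
    return theta, lam

theta, lam = primal_dual()
# The unconstrained minimizer is theta = 3, but the constraint
# theta^2 <= 1 pushes the solution to the constraint boundary theta = 1.
```

The slower stepsize on the multiplier mirrors the two-timescale stochastic approximation schemes typically used in risk-constrained RL, where the policy update runs on the faster timescale.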
The Maximal Positively Invariant Set: Polynomial Setting
This note considers the maximal positively invariant set for polynomial
discrete time dynamics subject to constraints specified by a basic
semialgebraic set. The note utilizes a relatively direct, but apparently
overlooked, fact stating that the related preimage map preserves basic
semialgebraic structure. In fact, this property propagates to underlying
set-dynamics induced by the associated restricted preimage map in general and
to its maximal trajectory in particular. The finite time convergence of the
corresponding maximal trajectory to the maximal positively invariant set is
verified under reasonably mild conditions. The analysis is complemented with a
discussion of computational aspects and a prototype implementation based on
existing toolboxes for polynomial optimization.
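The restricted-preimage iteration behind this construction is easy to state: start from the constraint set X and repeatedly intersect with the preimage of the current set under the dynamics, until the set stops shrinking; the limit is the maximal positively invariant set contained in X. The note works with exact semialgebraic representations and polynomial-optimization toolboxes; purely for intuition, the same iteration can be mimicked on a one-dimensional grid (an assumed illustration, not the note's method):

```python
import numpy as np

# Grid-based sketch of the restricted preimage iteration
#   O_{k+1} = X  ∩  f^{-1}(O_k),   O_0 = X,
# for polynomial dynamics x+ = f(x). All numerical choices below are
# illustrative assumptions.

f = lambda x: 0.5 * x + 0.2 * x ** 3        # polynomial dynamics
grid = np.linspace(-3.0, 3.0, 6001)
dx = grid[1] - grid[0]
X = np.abs(grid) <= 2.0                      # constraint set {|x| <= 2}

def member(mask, x):
    # nearest-grid-point membership test for the points x
    idx = np.clip(np.round((x - grid[0]) / dx).astype(int), 0, len(grid) - 1)
    in_range = (x >= grid[0]) & (x <= grid[-1])
    return mask[idx] & in_range

current = X.copy()
for _ in range(200):
    new = X & member(current, f(grid))       # X ∩ f^{-1}(current)
    if np.array_equal(new, current):         # finite-time convergence
        break
    current = new

boundary = grid[current].max()
# For this map the maximal positively invariant interval is [-a, a]
# with f(a) = a, i.e. a = sqrt(2.5) ≈ 1.581.
```

Here x = 2 is admissible but f(2) = 2.6 leaves X, so the iteration strictly shrinks the initial set before stabilizing, illustrating the role of the preimage map in carving out the invariant core.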
High-order filtered schemes for time-dependent second order HJB equations
In this paper, we present and analyse a class of "filtered" numerical schemes
for second order Hamilton-Jacobi-Bellman equations. Our approach follows the
ideas introduced in B.D. Froese and A.M. Oberman, Convergent filtered schemes
for the Monge-Ampère partial differential equation, SIAM J. Numer. Anal.,
51(1):423--444, 2013, and more recently applied by other authors to stationary
or time-dependent first order Hamilton-Jacobi equations. For high order
approximation schemes (where "high" stands for greater than one), the
inevitable loss of monotonicity prevents the use of the classical theoretical
results for convergence to viscosity solutions. The work introduces a suitable
local modification of these schemes by "filtering" them with a monotone scheme,
such that they can be proven convergent and still show an overall high order
behaviour for smooth enough solutions. We give theoretical proofs of these
claims and illustrate the behaviour with numerical tests from mathematical
finance, focussing also on the use of backward difference formulae (BDF) for
constructing the high order schemes.
Comment: 27 pages, 16 figures, 4 tables
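The filtering idea can be sketched in a few lines: given a monotone step S_M and a high-order step S_A, the filtered update is S_M + ε·F((S_A − S_M)/ε), where the filter F is the identity near zero and vanishes for large arguments, so the scheme follows the high-order step where the two agree and falls back to the monotone one where they diverge. The following is a minimal sketch under assumed choices (1D heat equation, periodic grid, one particular filter shape), not the paper's scheme for second order HJB equations:

```python
import numpy as np

# Hedged sketch of a Froese-Oberman style filtered scheme on u_t = u_xx.
# The filter shape, grid, and schemes below are illustrative assumptions.

def filter_fn(r):
    # identity for |r| <= 1, zero for |r| >= 2, linear in between
    return np.where(np.abs(r) <= 1.0, r,
           np.where(np.abs(r) >= 2.0, 0.0, np.sign(r) * (2.0 - np.abs(r))))

def step_monotone(u, dt, dx):
    # explicit second-order heat step; monotone for dt <= dx^2 / 2
    lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    return u + dt * lap

def step_high_order(u, dt, dx):
    # fourth-order central Laplacian: more accurate, but not monotone
    lap = (-np.roll(u, 2) + 16 * np.roll(u, 1) - 30 * u
           + 16 * np.roll(u, -1) - np.roll(u, -2)) / (12.0 * dx**2)
    return u + dt * lap

def step_filtered(u, dt, dx, eps):
    uM = step_monotone(u, dt, dx)
    uH = step_high_order(u, dt, dx)
    return uM + eps * filter_fn((uH - uM) / eps)

# On smooth data the two steps differ by much less than eps, the filter
# stays in its linear range, and the filtered step matches the
# high-order step.
n = 100
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
dt = 0.2 * dx**2
u = np.sin(x)
uF = step_filtered(u, dt, dx, eps=dx)
uH = step_high_order(u, dt, dx)
```

Near kinks in the solution the argument of the filter becomes large, F returns zero, and the update reduces to the monotone step, which is what makes the convergence proof to viscosity solutions go through while keeping high-order accuracy in smooth regions.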