
    Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

    The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., the mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, and prospect theory together with its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual infinite-horizon discounted or average cost objective while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and at outlining some potential future research directions.
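
    To make the setting concrete (the notation below is chosen here for illustration and is not taken verbatim from the article), the risk-constrained problem has the generic form

    \[ \min_{\theta}\; J(\theta) := \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t\, c(s_t, a_t)\Big] \quad \text{subject to} \quad G(\theta) \le \alpha, \]

    where \theta parameterizes the policy, J is the usual discounted-cost objective, G is a risk functional of the cumulative cost (e.g., its variance or conditional value-at-risk), and \alpha is a user-specified risk tolerance. A common algorithmic template for such problems relaxes the constraint with a Lagrange multiplier \lambda \ge 0 and runs a two-timescale stochastic scheme on the Lagrangian L(\theta, \lambda) = J(\theta) + \lambda\,(G(\theta) - \alpha): a faster descent in \theta and a slower ascent in \lambda.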

    The Maximal Positively Invariant Set: Polynomial Setting

    This note considers the maximal positively invariant set for polynomial discrete-time dynamics subject to constraints specified by a basic semialgebraic set. The note utilizes a relatively direct, but apparently overlooked, fact: the related preimage map preserves basic semialgebraic structure. This property propagates to the underlying set dynamics induced by the associated restricted preimage map in general, and to its maximal trajectory in particular. The finite time convergence of the corresponding maximal trajectory to the maximal positively invariant set is verified under reasonably mild conditions. The analysis is complemented with a discussion of computational aspects and a prototype implementation based on existing toolboxes for polynomial optimization.
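
    For concreteness (the symbols here are illustrative, not the note's own notation): given polynomial dynamics x^+ = f(x) and a basic semialgebraic constraint set X = \{x : g_i(x) \ge 0,\ i = 1, \dots, m\}, the restricted preimage map is

    \[ \mathcal{P}(S) := X \cap f^{-1}(S) = \{\, x \in X : f(x) \in S \,\}, \]

    and the maximal positively invariant set is the limit of the nonincreasing trajectory S_0 = X, S_{k+1} = \mathcal{P}(S_k), i.e. \bigcap_{k \ge 0} S_k. The structural fact mentioned above is that if S = \{y : h_j(y) \ge 0\} is basic semialgebraic and f is polynomial, then f^{-1}(S) = \{x : h_j(f(x)) \ge 0\} is again basic semialgebraic, since each h_j \circ f is a polynomial; finite time convergence then means S_{k^*+1} = S_{k^*} for some finite k^*.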

    High-order filtered schemes for time-dependent second order HJB equations

    In this paper, we present and analyse a class of "filtered" numerical schemes for second order Hamilton-Jacobi-Bellman equations. Our approach follows the ideas introduced in B.D. Froese and A.M. Oberman, Convergent filtered schemes for the Monge-Ampère partial differential equation, SIAM J. Numer. Anal., 51(1):423-444, 2013, and more recently applied by other authors to stationary or time-dependent first order Hamilton-Jacobi equations. For high order approximation schemes (where "high" stands for order greater than one), the inevitable loss of monotonicity prevents the use of the classical theoretical results for convergence to viscosity solutions. This work introduces a suitable local modification of these schemes by "filtering" them with a monotone scheme, such that they can be proven convergent while still showing an overall high order behaviour for smooth enough solutions. We give theoretical proofs of these claims and illustrate the behaviour with numerical tests from mathematical finance, focussing also on the use of backward difference formulae (BDF) for constructing the high order schemes. (27 pages, 16 figures, 4 tables)
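
    For orientation, the filtering mechanism in the Froese-Oberman spirit (sketched here with generic notation; the paper's exact filter function and parameters may differ) combines a high-order, possibly non-monotone scheme S_A with a monotone scheme S_M as

    \[ S_F := S_M + \varepsilon(h)\, F\!\left(\frac{S_A - S_M}{\varepsilon(h)}\right), \]

    where F is a bounded filter function with F(r) = r for small |r| and F(r) = 0 for large |r|, and \varepsilon(h) \to 0 as the mesh size h \to 0. Where the two schemes nearly agree (typically in regions where the solution is smooth), S_F reproduces the high-order scheme and retains its accuracy; elsewhere it falls back on the monotone scheme, which is what allows convergence to the viscosity solution to be established.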