9 research outputs found

    Markov risk mappings and risk-sensitive optimal stopping

    In contrast to the analytic approach to risk for Markov chains based on transition risk mappings, we introduce a probabilistic setting based on a novel concept of regular conditional risk mapping with a Markov update rule. We confirm that the Markov property holds for the standard measures of risk used in practice, such as Value at Risk and Average Value at Risk. We analyse the dual representation for convex Markovian risk mappings and a representation in terms of their acceptance sets. The Markov property is formulated in several equivalent versions, including a strong version, opening up additional risk-sensitive optimisation problems such as optimal stopping with exercise lag and optimal prediction. We demonstrate how such problems can be reduced to a risk-sensitive optimal stopping problem with intermediate costs, and derive the dynamic programming equations for the latter. Finally, we show how our results can be extended to partially observable Markov processes.
    Comment: 29 pages. New: extension of the one-step-ahead Markov property to the entire "future", Markov property in terms of acceptance sets, VaR and AVaR examples, convex Markov risk mappings, application to optimal stopping with exercise lag. Notable changes: the stopping cost in the partially observable optimal stopping problem can depend on the unobservable state.
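
    The dynamic programming equations mentioned above can be illustrated on a finite chain. Below is a minimal sketch, not the paper's construction of regular conditional risk mappings: the one-step Markov risk mapping is taken to be conditional Average Value at Risk under each row of a transition matrix, plugged into backward induction for optimal stopping with intermediate costs; stop_cost, run_cost, P, horizon and alpha are hypothetical inputs.

```python
import numpy as np

def avar_discrete(values, probs, alpha):
    """Average Value at Risk of a discrete loss distribution via the
    Rockafellar-Uryasev representation
        AVaR_alpha(Z) = min_t { t + E[(Z - t)_+] / (1 - alpha) };
    the minimum is attained at a quantile, so scanning the support suffices."""
    return min(t + np.maximum(values - t, 0.0) @ probs / (1.0 - alpha)
               for t in values)

def stopping_values(stop_cost, run_cost, P, horizon, alpha):
    """Backward induction for a finite-horizon optimal stopping problem with
    intermediate costs on a finite state space, using one-step conditional
    AVaR as the Markov risk mapping (cost-minimisation convention)."""
    V = stop_cost.copy()                      # at the horizon we must stop
    for _ in range(horizon):
        cont = np.array([run_cost[x] + avar_discrete(V, P[x], alpha)
                         for x in range(len(stop_cost))])
        V = np.minimum(stop_cost, cont)       # stop now, or carry the risk-adjusted continuation
    return V
```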

    Risk-sensitive partially observable Markov decision processes as fully observable multivariate utility optimization problems

    We provide a new algorithm for solving Risk-Sensitive Partially Observable Markov Decision Processes when the risk is modeled by a utility function and both the state space and the space of observations are finite. The algorithm is based on the observation that the change of measure and the subsequent introduction of the information space, which are used for exponential utility functions, can actually be extended to sums of exponentials if one introduces an extra vector parameter that tracks the expected accumulated cost corresponding to each exponential. Since every increasing function can be approximated by sums of exponentials on finite intervals, the method can essentially be applied to any utility function, with its complexity depending on the number of exponentials used in the approximation.
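
    The key approximation step described above admits a short numerical illustration. The sketch below is illustrative rather than the paper's algorithm: it fits a utility function by a sum of exponentials on a finite interval via least squares and then evaluates the objective term by term, which is exactly the decomposition that lets each exponential be tracked through its own component of the extra vector parameter; the utility and the exponents are arbitrary choices.

```python
import numpy as np

def fit_sum_of_exponentials(utility, grid, gammas):
    """Least-squares fit utility(x) ~ sum_i w_i * exp(gamma_i * x) on a finite
    interval, with the exponents gamma_i fixed in advance (gamma = 0 supplies
    a constant term)."""
    A = np.exp(np.outer(grid, gammas))            # one column per exponential
    w, *_ = np.linalg.lstsq(A, utility(grid), rcond=None)
    return w

def objective_via_exponentials(costs, gammas, w):
    """E[U(C)] rewritten termwise as sum_i w_i * E[exp(gamma_i * C)]; each
    expectation of an exponential can be handled separately, one component of
    the vector parameter per term."""
    return sum(wi * np.mean(np.exp(g * costs)) for wi, g in zip(w, gammas))

# toy check: an exponential-type utility and a sample of accumulated costs
utility = lambda c: 1.0 - np.exp(-0.5 * c)
gammas = np.array([0.0, -0.5, -0.25])
w = fit_sum_of_exponentials(utility, np.linspace(0.0, 10.0, 200), gammas)
costs = np.random.default_rng(0).uniform(0.0, 10.0, 10_000)
print(objective_via_exponentials(costs, gammas, w), np.mean(utility(costs)))  # should agree closely
```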

    Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

    This work pioneers the regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in the theoretical literature. We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework, where the goal is to optimize accumulated reward under the entropic risk measure. We develop the first provably efficient RL algorithm tailored to this setting. We also prove by rigorous analysis that our algorithm achieves polynomial regret $\tilde{O}\left(\frac{e^{|\gamma|H}-1}{|\gamma|H}H^2\sqrt{KHS^2OA}\right)$, which outperforms or matches existing upper bounds when the model degenerates to the risk-neutral or fully observable setting. We adopt the method of change of measure and develop a novel analytical tool of beta vectors to streamline the mathematical derivations. These techniques are of particular interest to the theoretical study of reinforcement learning.
    Comment: 38 pages.
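
    For readers unfamiliar with the entropic risk measure appearing in the objective above, the following minimal sketch (an illustration, not the paper's algorithm) estimates it from samples with a numerically stable log-mean-exp; the risk parameter gamma and the sample distribution are arbitrary, and the gamma -> 0 limit recovers the risk-neutral expectation mentioned in the regret comparison.

```python
import numpy as np

def entropic_risk(returns, gamma):
    """Sample estimate of the entropic risk measure
        rho_gamma(X) = (1/gamma) * log E[exp(gamma * X)],
    computed as a shifted log-mean-exp to avoid overflow. As gamma -> 0 the
    value tends to the ordinary mean, i.e. the risk-neutral objective."""
    returns = np.asarray(returns, dtype=float)
    if abs(gamma) < 1e-12:
        return returns.mean()
    z = gamma * returns
    m = z.max()                                   # shift for numerical stability
    return (m + np.log(np.mean(np.exp(z - m)))) / gamma

# illustrative check on a Gaussian return, where the exact value is mu + gamma * sigma**2 / 2
x = np.random.default_rng(1).normal(loc=1.0, scale=2.0, size=200_000)
print(entropic_risk(x, -0.5), entropic_risk(x, 0.0), x.mean())   # ~ 0.0, then both ~ 1.0
```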

    Constrained optimal stopping games

    In this thesis, we consider four optimal stopping problems with stopping constraints. Chapter 2 introduces a new class of Dynkin games in which the two players are allowed to make their stopping decisions only at a sequence of exogenous Poisson arrival times. The value function and the associated optimal stopping strategy are characterized by the solution of a backward stochastic differential equation. The chapter then applies the model to study the optimal conversion and calling strategies of convertible bonds, and their asymptotics as the Poisson intensity goes to infinity. Chapter 3 generalizes the work of Chapter 2 from risk-neutral criteria and common signal times for both players to risk-sensitive criteria and two heterogeneous signal times. Chapter 4 considers a two-player zero-sum optimal switching game with stopping constraints. We prove a chain of inequalities involving the four values of the game, and show that the values of both the static and dynamic games exist when the running and terminal rewards are separated. Chapter 5 studies a mixed stochastic control and constrained optimal stopping problem that models rollover debt decisions in an incomplete market. In addition to the rollover decisions, the creditor can also choose a control strategy to trade in risky assets correlated with the fundamental assets. In the case of exponential utility, we obtain a complete characterization together with the exponential indifference bond price and its associated optimal mixed strategy.
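
    The constrained Dynkin game of Chapter 2 can be mimicked in discrete time. The sketch below is a rough illustration under simplifying assumptions, not the thesis's BSDE characterization: on a finite state space, both players may stop only when an exogenous signal arrives (with probability arrival_prob per period, standing in for a Poisson arrival on a time grid), and otherwise the game is forced to continue; L <= U are the lower and upper stopping payoffs and P is a transition matrix, all hypothetical. Sending arrival_prob to 1 recovers the classical unconstrained backup, which parallels the large-intensity asymptotics mentioned for convertible bonds.

```python
import numpy as np

def constrained_dynkin_values(L, U, P, horizon, arrival_prob):
    """Backward induction for a discrete-time, zero-sum Dynkin game in which
    stopping is only allowed at exogenous signal times. At an arrival the
    classical Dynkin backup min(U, max(L, continuation)) applies; between
    arrivals the game continues."""
    V = 0.5 * (L + U)                              # terminal convention if nobody has stopped
    for _ in range(horizon):
        cont = P @ V                               # continuation value under the transition matrix
        at_arrival = np.minimum(U, np.maximum(L, cont))
        V = arrival_prob * at_arrival + (1.0 - arrival_prob) * cont
    return V
```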