
    Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

    The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.

    Optimal Payoffs under State-dependent Preferences

    Most decision theories, including expected utility theory, rank-dependent utility theory and cumulative prospect theory, assume that investors are only interested in the distribution of returns and not in the states of the economy in which income is received. Optimal payoffs have their lowest outcomes when the economy is in a downturn, and this feature is often at odds with the needs of many investors. We introduce a framework for portfolio selection within which state-dependent preferences can be accommodated. Specifically, we assume that investors care about the distribution of final wealth and its interaction with some benchmark. In this context, we are able to characterize optimal payoffs in explicit form. Furthermore, we extend the classical expected utility optimization problem of Merton to the state-dependent situation. Some applications in security design are discussed in detail and we also solve some stochastic extensions of the target probability optimization problem.

    Cumulative Prospect Theory Based Dynamic Pricing for Shared Mobility on Demand Services

    Cumulative Prospect Theory (CPT) is a modeling tool widely used in behavioral economics and cognitive psychology that captures subjective decision making of individuals under risk or uncertainty. In this paper, we propose a dynamic pricing strategy for Shared Mobility on Demand Services (SMoDSs) using a passenger behavioral model based on CPT. This dynamic pricing strategy, together with dynamic routing via a constrained optimization algorithm that we developed earlier, provides a complete solution customized for SMoDS of multi-passenger transportation. The basic principles of CPT and the derivation of the passenger behavioral model in the SMoDS context are described in detail. The implications of CPT on dynamic pricing of the SMoDS are delineated using computational experiments involving passenger preferences. These implications include interpretation of the classic fourfold pattern of risk attitudes, strong risk aversion over mixed prospects, and behavioral preferences of self reference. Overall, it is argued that the use of the CPT framework corresponds to a crucial building block in designing socio-technical systems by allowing quantification of subjective decision making under risk or uncertainty that is perceived to be otherwise qualitative.
    Comment: 17 pages, 6 figures, and has been accepted for publication at the 58th Annual Conference on Decision and Control, 201

    Spanning Tests for Markowitz Stochastic Dominance

    We derive properties of the cdf of random variables defined as saddle-type points of real-valued continuous stochastic processes. This facilitates the derivation of the first-order asymptotic properties of tests for stochastic spanning given some stochastic dominance relation. We define the concept of Markowitz stochastic dominance spanning, and develop an analytical representation of the spanning property. We construct a non-parametric test for spanning based on subsampling, and derive its asymptotic exactness and consistency. The spanning methodology determines whether introducing new securities or relaxing investment constraints improves the investment opportunity set of investors driven by Markowitz stochastic dominance. In an application to standard data sets of historical stock market returns, we reject market portfolio Markowitz efficiency as well as two-fund separation. Hence, we find evidence that equity management through base assets can outperform the market, for investors with Markowitz-type preferences.