Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a
policy that minimizes, in expectation, a long-run objective such as the
infinite-horizon discounted or long-run average cost. In many practical
applications, optimizing the expected value alone is not sufficient, and it may
be necessary to include a risk measure in the optimization process, either as
the objective or as a constraint. Various risk measures have been proposed in
the literature, e.g., mean-variance tradeoff, exponential utility, the
percentile performance, value at risk, conditional value at risk, prospect
theory and its later enhancement, cumulative prospect theory. In this article,
we focus on the combination of risk criteria and reinforcement learning in a
constrained optimization framework, i.e., a setting where the goal is to find a
policy that optimizes the usual objective of infinite-horizon
discounted/average cost, while ensuring that an explicit risk constraint is
satisfied. We introduce the risk-constrained RL framework, cover popular risk
measures based on variance, conditional value-at-risk and cumulative prospect
theory, and present a template for a risk-sensitive RL algorithm. We survey
some of our recent work on this topic, covering problems encompassing
discounted cost, average cost, and stochastic shortest path settings, together
with the aforementioned risk measures in a constrained framework. This
non-exhaustive survey is aimed at giving a flavor of the challenges involved in
solving a risk-sensitive RL problem, and outlining some potential future
research directions.
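As a rough illustration of one of the risk measures named in this abstract, the sketch below estimates value at risk and conditional value at risk from sampled costs and checks an explicit risk constraint. It is not the authors' algorithm: the function names, the threshold `kappa`, and the convention that higher cost is worse are assumptions made for this example.

```python
import numpy as np

def empirical_var_cvar(costs, alpha=0.95):
    """Return (VaR, CVaR) at level alpha from sampled costs (higher cost = worse)."""
    costs = np.asarray(costs, dtype=float)
    var = float(np.quantile(costs, alpha))   # alpha-quantile of the cost samples
    tail = costs[costs >= var]               # samples at or beyond the quantile
    cvar = float(tail.mean())                # average cost in that tail
    return var, cvar

def satisfies_risk_constraint(costs, kappa, alpha=0.95):
    """Check the hypothetical constraint CVaR_alpha(cost) <= kappa on samples."""
    _, cvar = empirical_var_cvar(costs, alpha)
    return cvar <= kappa
```

In a risk-constrained setting, a check of this kind would sit alongside the usual expected-cost objective: a policy is acceptable only if its estimated tail cost stays under the chosen threshold.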
Optimal Payoffs under State-dependent Preferences
Most decision theories, including expected utility theory, rank dependent
utility theory and cumulative prospect theory, assume that investors are only
interested in the distribution of returns and not in the states of the economy
in which income is received. Optimal payoffs have their lowest outcomes when
the economy is in a downturn, and this feature is often at odds with the needs
of many investors. We introduce a framework for portfolio selection within
which state-dependent preferences can be accommodated. Specifically, we assume
that investors care about the distribution of final wealth and its interaction
with some benchmark. In this context, we are able to characterize optimal
payoffs in explicit form. Furthermore, we extend the classical expected utility
optimization problem of Merton to the state-dependent situation. Some
applications in security design are discussed in detail and we also solve some
stochastic extensions of the target probability optimization problem.
Cumulative Prospect Theory Based Dynamic Pricing for Shared Mobility on Demand Services
Cumulative Prospect Theory (CPT) is a modeling tool widely used in behavioral
economics and cognitive psychology that captures subjective decision making of
individuals under risk or uncertainty. In this paper, we propose a dynamic
pricing strategy for Shared Mobility on Demand Services (SMoDSs) using a
passenger behavioral model based on CPT. This dynamic pricing strategy, together
with dynamic routing via a constrained optimization algorithm that we have
developed earlier, provides a complete solution customized for SMoDS of
multi-passenger transportation. The basic principles of CPT and the derivation
of the passenger behavioral model in the SMoDS context are described in detail.
The implications of CPT on dynamic pricing of the SMoDS are delineated using
computational experiments involving passenger preferences. These implications
include interpretation of the classic fourfold pattern of risk attitudes,
strong risk aversion over mixed prospects, and behavioral preferences of self
reference. Overall, it is argued that the use of the CPT framework corresponds
to a crucial building block in designing socio-technical systems by allowing
quantification of subjective decision making under risk or uncertainty that is
perceived to be otherwise qualitative.
Comment: 17 pages, 6 figures, and has been accepted for publication at the
58th Annual Conference on Decision and Control, 201
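The CPT ingredients mentioned in this abstract can be sketched with the standard Tversky-Kahneman (1992) functional forms. This is a generic illustration, not the passenger model of the paper: the parameter values (0.88, 2.25, 0.61, 0.69) are the commonly cited 1992 estimates, and the function names are assumptions made for this example.

```python
def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value of outcome x relative to a reference point of 0 (losses loom larger)."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)

def cpt_weight(p, gamma):
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def cpt_simple_prospect(x, p, gamma_gain=0.61, gamma_loss=0.69):
    """Subjective value of receiving x with probability p, and 0 otherwise."""
    gamma = gamma_gain if x >= 0 else gamma_loss
    return cpt_weight(p, gamma) * cpt_value(x)
```

Under these assumed parameters, a 1% chance of gaining 100 is valued above a sure gain of 1, which is the risk-seeking corner of the fourfold pattern for low-probability gains.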
Spanning Tests for Markowitz Stochastic Dominance
We derive properties of the cdf of random variables defined as saddle-type
points of real valued continuous stochastic processes. This facilitates the
derivation of the first-order asymptotic properties of tests for stochastic
spanning given some stochastic dominance relation. We define the concept of
Markowitz stochastic dominance spanning, and develop an analytical
representation of the spanning property. We construct a non-parametric test for
spanning based on subsampling, and derive its asymptotic exactness and
consistency. The spanning methodology determines whether introducing new
securities or relaxing investment constraints improves the investment
opportunity set of investors driven by Markowitz stochastic dominance. In an
application to standard data sets of historical stock market returns, we reject
market portfolio Markowitz efficiency as well as two-fund separation. Hence, we
find evidence that equity management through base assets can outperform the
market, for investors with Markowitz type preferences.