Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk averse policies in
Markov Decision Processes (MDP) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions for the structure of the underlying MDP ensuring that approximations
for the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
association with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario, whereby a robot is tasked with the objective of
reaching a target location within a temporal deadline, where increased speed is
associated with increased probability of failure. We demonstrate that the
proposed algorithm not only produces a risk averse policy reducing the
probability of exceeding the expected temporal deadline, but also provides the
statistical distribution of costs, thus offering a valuable analysis tool.
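To make the AVaR metric concrete, here is a minimal sketch (not the paper's algorithm) that estimates the average value at risk of a total cost from samples, assuming higher cost is worse and taking the mean of the worst (1 - alpha) fraction of outcomes. The function name `empirical_avar` and the sample data are illustrative assumptions.

```python
import math


def empirical_avar(costs, alpha):
    """Mean of the worst ceil((1 - alpha) * n) sampled costs.

    Assumption: costs are to be minimized, so "worst" means largest.
    alpha = 0 reduces to the plain sample mean (risk-neutral case).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not costs:
        raise ValueError("costs must be non-empty")
    ordered = sorted(costs, reverse=True)
    # Round before ceil so float noise such as 3.0000000000000004 does not add a sample.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Hypothetical total costs (e.g. time steps to reach the target) from ten runs.
sample_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
avar_80 = empirical_avar(sample_costs, 0.8)  # averages the two largest costs
```

A risk-averse policy would be compared against a risk-neutral one by evaluating this tail average rather than only the sample mean.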
Markov Decision Processes with Risk-Sensitive Criteria: An Overview
The paper provides an overview of the theory and applications of
risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here
to the use of the Optimized Certainty Equivalent as a means to measure
expectation and risk. This comprises the well-known entropic risk measure and
Conditional Value-at-Risk. We restrict our considerations to stationary
problems with an infinite time horizon. Conditions are given under which
optimal policies exist and solution procedures are explained. We present both
the theory when the Optimized Certainty Equivalent is applied recursively as
well as the case where it is applied to the cumulated reward. Discounted as
well as non-discounted models are reviewed.
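For orientation, the Optimized Certainty Equivalent can be written as follows. This is a sketch in the reward-based convention of Ben-Tal and Teboulle; the overview itself may use a different sign or cost convention, and the symbols $u$, $\eta$, $\gamma$, and $\alpha$ are notation assumed here rather than taken from the abstract.

```latex
% OCE of a random reward X for a concave utility u with u(0) = 0:
S_u(X) = \sup_{\eta \in \mathbb{R}} \bigl\{ \eta + \mathbb{E}\,[\,u(X - \eta)\,] \bigr\}

% Entropic risk measure: u(t) = \tfrac{1}{\gamma}\bigl(1 - e^{-\gamma t}\bigr), \gamma > 0.
% The supremum is attained at \eta^* = -\tfrac{1}{\gamma}\log \mathbb{E}[e^{-\gamma X}], giving
S_u(X) = -\frac{1}{\gamma} \log \mathbb{E}\bigl[e^{-\gamma X}\bigr]

% Conditional Value-at-Risk: u(t) = -\tfrac{1}{\alpha}\max(-t, 0), \alpha \in (0, 1), giving
S_u(X) = \sup_{\eta \in \mathbb{R}} \Bigl\{ \eta - \frac{1}{\alpha}\,\mathbb{E}\bigl[(\eta - X)^{+}\bigr] \Bigr\}
```

In the last case the value is the mean of the lowest $\alpha$-fraction of outcomes of $X$ when $X$ has a continuous distribution, which is a Conditional Value-at-Risk up to sign convention.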
Markov Decision Processes with Average-Value-at-Risk criteria
We investigate the problem of minimizing the Average-Value-at-Risk (AVaR_r) of the discounted cost generated by a Markov Decision Process (MDP) over a finite and an infinite horizon. We show that this problem can be reduced to an ordinary MDP with extended state space and give conditions under which an optimal policy exists. We also give a time-consistent interpretation of the AVaR_r. At the end we consider a numerical example, a simple repeated casino game, which is used to discuss the influence of the risk aversion parameter r of the AVaR_r criterion.
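Reductions of this kind typically build on the standard variational representation of the Average-Value-at-Risk of a cost X (Rockafellar and Uryasev), AVaR_r(X) = min over s of { s + E[(X - s)^+] / (1 - r) }. Whether the paper uses exactly this form is an assumption here. The sketch below evaluates the representation on an empirical distribution; the function name `avar_variational` and the data are hypothetical.

```python
def avar_variational(costs, r):
    """Evaluate min_s { s + E[(X - s)^+] / (1 - r) } for equally weighted samples.

    The objective is convex and piecewise linear in s with kinks at the
    sample points, so it suffices to search over the samples themselves.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    if not costs:
        raise ValueError("costs must be non-empty")
    n = len(costs)

    def objective(s):
        # Expected excess of the cost over the threshold s.
        excess = sum(max(x - s, 0.0) for x in costs) / n
        return s + excess / (1.0 - r)

    return min(objective(s) for s in costs)


# Hypothetical discounted costs from ten simulated runs.
sample_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
value = avar_variational(sample_costs, 0.8)
```

Setting r = 0 recovers the expected cost, and values of r closer to 1 put more weight on the worst outcomes. This is the sense in which r acts as a risk aversion parameter.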
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods
The literature on Inverse Reinforcement Learning (IRL) typically assumes that
humans take actions in order to minimize the expected value of a cost function,
i.e., that humans are risk neutral. Yet, in practice, humans are often far from
being risk neutral. To fill this gap, the objective of this paper is to devise
a framework for risk-sensitive IRL in order to explicitly account for a human's
risk sensitivity. To this end, we propose a flexible class of models based on
coherent risk measures, which allow us to capture an entire spectrum of risk
preferences from risk-neutral to worst-case. We propose efficient
non-parametric algorithms based on linear programming and semi-parametric
algorithms based on maximum likelihood for inferring a human's underlying risk
measure and cost function for a rich class of static and dynamic
decision-making settings. The resulting approach is demonstrated on a simulated
driving game with ten human participants. Our method is able to infer and mimic
a wide range of qualitatively different driving styles from highly risk-averse
to risk-neutral in a data-efficient manner. Moreover, comparisons of the
Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL
framework more accurately captures observed participant behavior both
qualitatively and quantitatively, especially in scenarios where catastrophic
outcomes such as collisions can occur.
Comment: Submitted to International Journal of Robotics Research; Revision 1:
(i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to
hold under weaker assumptions; (iii) Added additional figures and expanded
discussions to improve readability
Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
We consider Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) objectives. There exist two different views: (i) the expectation
semantics, where the goal is to optimize the expected mean-payoff objective,
and (ii) the satisfaction semantics, where the goal is to maximize the
probability of runs such that the mean-payoff value stays above a given vector.
We consider optimization with respect to both objectives at once, thus unifying
the existing semantics. Precisely, the goal is to optimize the expectation
while ensuring the satisfaction constraint. Our problem captures the notion of
optimization with respect to strategies that are risk-averse (i.e., ensure a
certain probabilistic guarantee). Our main results are as follows: First, we
present algorithms for the decision problems which are always polynomial in the
size of the MDP. We also show that an approximation of the Pareto-curve can be
computed in time polynomial in the size of the MDP, and the approximation
factor, but exponential in the number of dimensions. Second, we present a
complete characterization of the strategy complexity (in terms of memory bounds
and randomization) required to solve our problem.
Comment: Extended journal version of the LICS'15 paper