Search CORE

29,243 research outputs found

Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

Author: Lacotte Jonathan
Majumdar Anirudha
Pavone Marco
Singh Sumeet
Publication venue
Publication date: 01/01/2018
Field of study

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readabilit

arXiv.org e-Print Archive

Princeton University Open Access Repository

Data-driven Inverse Optimization with Imperfect Information

Author: Esfahani Peyman Mohajerin
Hanasusanto Grani Adiwena
Kuhn Daniel
Shafieezadeh-Abadeh Soroosh
Publication venue
Publication date: 21/07/2017
Field of study

In data-driven inverse optimization an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent's objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent's true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the {\em predicted} decision ({\em i.e.}, the decision implied by a particular candidate objective) differs from the agent's {\em actual} response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization attains often better out-of-sample performance than the state-of-the-art approaches

arXiv.org e-Print Archive

TU Delft Repository

Deep Mean-Shift Priors for Image Restoration

Author: Bigdeli Siavash Arjomand
Favaro Paolo
Jin Meiguang
Zwicker Matthias
Publication venue
Publication date: 01/01/2017
Field of study

In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In addition, we learn the mean-shift vector field using denoising autoencoders, and use it in a gradient descent approach to perform Bayes risk minimization. We demonstrate competitive results for noise-blind deblurring, super-resolution, and demosaicing.Comment: NIPS 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Bern Open Repository and Information System (BORIS)

Inverse Optimization with Noisy Data

Author: Aswani Anil
Shen Zuo-Jun Max
Siddiq Auyon
Publication venue
Publication date: 22/12/2017
Field of study

Inverse optimization refers to the inference of unknown parameters of an optimization problem based on knowledge of its optimal solutions. This paper considers inverse optimization in the setting where measurements of the optimal solutions of a convex optimization problem are corrupted by noise. We first provide a formulation for inverse optimization and prove it to be NP-hard. In contrast to existing methods, we show that the parameter estimates produced by our formulation are statistically consistent. Our approach involves combining a new duality-based reformulation for bilevel programs with a regularization scheme that smooths discontinuities in the formulation. Using epi-convergence theory, we show the regularization parameter can be adjusted to approximate the original inverse optimization problem to arbitrary accuracy, which we use to prove our consistency results. Next, we propose two solution algorithms based on our duality-based formulation. The first is an enumeration algorithm that is applicable to settings where the dimensionality of the parameter space is modest, and the second is a semiparametric approach that combines nonparametric statistics with a modified version of our formulation. These numerical algorithms are shown to maintain the statistical consistency of the underlying formulation. Lastly, using both synthetic and real data, we demonstrate that our approach performs competitively when compared with existing heuristics

arXiv.org e-Print Archive

eScholarship - University of California