Online algorithms for POMDPs with continuous state, action, and observation spaces

Author: Kochenderfer Mykel
Sunberg Zachary
Publication date: 15/06/2018

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.

Comment: Added Multilane sectio
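
The two ingredients named in this abstract, progressive widening and weighted particle beliefs, can be illustrated with a minimal sketch. It is not the authors' POMCPOW or PFT-DPW implementation: the function names, the one-dimensional Gaussian observation model, and the default values of `k` and `alpha` are all assumptions chosen for illustration.

```python
import math


def allow_new_child(num_children, num_visits, k=1.0, alpha=0.5):
    """Progressive widening test: a node may gain a new child only while
    num_children <= k * num_visits ** alpha (k and alpha are hypothetical defaults)."""
    return num_children <= k * num_visits ** alpha


def gaussian_likelihood(obs, state, sigma=1.0):
    """Unnormalized likelihood of obs under an assumed sensor model obs ~ N(state, sigma^2)."""
    return math.exp(-0.5 * ((obs - state) / sigma) ** 2)


def reweight(particles, obs, sigma=1.0):
    """Weight every state particle by how well it explains obs, then normalize.
    Keeping many weighted particles avoids a belief made of a single particle."""
    weights = [gaussian_likelihood(obs, s, sigma) for s in particles]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: no particle explains the observation; fall back to uniform.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(allow_new_child(num_children=3, num_visits=9))
    print(reweight([0.0, 1.0, 2.0], obs=1.1))
```

In this sketch the widening test limits how many distinct children a node accumulates as its visit count grows, while `reweight` shows the basic idea behind a weighted particle belief: all particles are retained and scored against the observation instead of being discarded.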

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Rank deficiency of Kalman error covariance matrices in linear time-varying system with deterministic evolution

Author: Alberto Carrassi
Amit Apte
Bonnabel S.
Christopher K. R. T. Jones
Cohn S. E.
Colin Grudzien
Karthik S. Gurumoorthy
Merikoshi J. K.
Oseledets V. I.
Talagrand O.
Trevisan A.
Wojtowski M. P.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 04/10/2016

We prove that for a linear, discrete, time-varying, deterministic system (perfect model) with noisy outputs, the Riccati transformation in the Kalman filter asymptotically bounds the rank of the forecast and the analysis error covariance matrices to be less than or equal to the number of nonnegative Lyapunov exponents of the system. Further, the support of these error covariance matrices is shown to be confined to the space spanned by the unstable-neutral backward Lyapunov vectors, providing the theoretical justification for the methodology of the algorithms that perform assimilation only in the unstable-neutral subspace. The equivalent property of the autonomous system is investigated as a special case.
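
A small numerical illustration of the rank bound, restricted to a toy autonomous case that is an assumption of this sketch rather than an example from the paper: a diagonal two-mode system with growth factors 2 and 0.5 (Lyapunov exponents log 2 > 0 and log 0.5 < 0), each mode observed directly with unit noise variance and no model noise. Because the system is diagonal, the Riccati recursion decouples into one scalar recursion per mode.

```python
def analysis_variance(m, r=1.0, p0=1.0, steps=200):
    """Iterate the scalar Kalman Riccati recursion for one mode.

    m  : growth factor of the mode (hypothetical toy value)
    r  : observation noise variance
    p0 : initial analysis error variance
    """
    p = p0
    for _ in range(steps):
        p_forecast = m * m * p                  # forecast step, no model noise
        p = p_forecast * r / (p_forecast + r)   # analysis step with direct observation
    return p


unstable = analysis_variance(2.0)   # mode with positive Lyapunov exponent
stable = analysis_variance(0.5)     # mode with negative Lyapunov exponent

# Numerical rank of the diagonal analysis covariance after many cycles.
rank = sum(1 for v in (unstable, stable) if v > 1e-9)

if __name__ == "__main__":
    print(unstable, stable, rank)
```

In this toy setting the variance along the stable mode decays towards zero while the variance along the unstable mode settles at a positive fixed point, so the covariance ends up with numerical rank one, matching the single nonnegative exponent.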

arXiv.org e-Print Archive

Central Archive at the University of Reading

Crossref

Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

Author: Lacotte Jonathan
Majumdar Anirudha
Pavone Marco
Singh Sumeet
Publication date: 01/01/2018

The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.

Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readabilit
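
To make the "spectrum from risk-neutral to worst-case" concrete, the sketch below evaluates conditional value at risk (CVaR), one well-known coherent risk measure, on equally likely cost samples. CVaR is swapped in here purely as an illustration; the abstract does not say which risk measures the paper's models use, and the function name `cvar`, the `tail` parameter, and the sample costs are assumptions.

```python
def cvar(costs, tail):
    """Average cost over the worst `tail` fraction of equally likely outcomes.

    tail = 1.0 gives the plain expected cost (risk neutral); as tail shrinks
    towards 1/len(costs) the value approaches the worst-case cost.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < tail <= 1.0:
        raise ValueError("tail must lie in (0, 1]")
    n = len(costs)
    remaining = tail * n          # tail probability mass, in units of 1/n
    total = 0.0
    for c in sorted(costs, reverse=True):   # highest costs first
        take = min(1.0, remaining)          # possibly a fractional share of a sample
        total += take * c
        remaining -= take
        if remaining <= 0.0:
            break
    return total / (tail * n)


if __name__ == "__main__":
    sample_costs = [1.0, 2.0, 3.0, 10.0]    # hypothetical costs, e.g. per trajectory
    for tail in (1.0, 0.5, 0.25):
        print(tail, cvar(sample_costs, tail))
```

Sweeping `tail` from 1.0 down to a small value moves the evaluation from the mean cost to the single worst outcome, which is the kind of range of risk preferences the abstract describes.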

arXiv.org e-Print Archive

Princeton University Open Access Repository