13,917 research outputs found

    Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

    The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur. Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability
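
The abstract above describes coherent risk measures as spanning preferences from risk-neutral to worst-case. As a rough illustration only, the sketch below uses Conditional Value-at-Risk (CVaR), one well-known coherent risk measure, on a finite set of equally likely cost samples. It is not the paper's model or inference algorithm; the function name `cvar` and the parameter `alpha` (the fraction of worst outcomes averaged) are assumptions made for this example.

```python
def cvar(costs, alpha):
    """Average cost over the worst `alpha` fraction of equally likely samples.

    Illustrative assumption: alpha = 1 gives the plain mean (risk-neutral),
    and alpha close to 0 approaches the maximum cost (worst case).
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")

    ordered = sorted(costs, reverse=True)  # worst (largest) costs first
    budget = alpha * len(ordered)          # tail mass, in units of samples
    remaining = budget
    total = 0.0
    for cost in ordered:
        weight = min(1.0, remaining)       # last sample may count fractionally
        total += weight * cost
        remaining -= weight
        if remaining <= 0:
            break
    return total / budget
```

For the samples `[1, 2, 3, 10]`, `alpha=1.0` yields the mean, `alpha=0.5` averages the two largest costs, and `alpha=0.25` returns the single worst cost, which mirrors the risk-neutral to worst-case spectrum mentioned in the abstract.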

    Evolution of wealth in a nonconservative economy driven by local Nash equilibria

    We develop a model for the evolution of wealth in a non-conservative economic environment, extending a theory developed earlier by the authors. The model considers a system of rational agents interacting in a game-theoretical framework. This evolution drives the dynamics of the agents in both wealth and economic configuration variables. The cost function is chosen to represent a risk-averse strategy of each agent. That is, the more predictable the market, and therefore the smaller the agent's individual risk, the more likely the agent is to interact with the market. This yields a kinetic equation for an effective single-particle agent density with a Nash equilibrium serving as the local thermodynamic equilibrium. We consider a regime of scale separation where the large-scale dynamics is given by a hydrodynamic closure with this local equilibrium. A class of generalized collision invariants (GCIs) is developed to overcome the difficulty of the non-conservative property in the hydrodynamic closure derivation of the large-scale dynamics for the evolution of wealth distribution. The result is a system of gas dynamics-type equations for the density and average wealth of the agents on large scales. We recover the inverse Gamma distribution, which has been previously considered in the literature, as a local equilibrium for particular choices of the cost function.

    Behavior Identification and Prediction for a Probabilistic Risk Framework

    Operation in real-world traffic requires autonomous vehicles to be able to plan their motion in complex environments (multiple moving participants). Planning through such environments requires the right search space to be provided for the trajectory or maneuver planners so that the safest motion for the ego vehicle can be identified. Given the current states of the environment and its participants, analyzing the risks based on the predicted trajectories of all the traffic participants provides the necessary search space for the planning of motion. This paper provides a fresh taxonomy of the safety risks that an autonomous vehicle should be able to handle while navigating through traffic. It provides a reference system architecture that needs to be implemented and describes a novel way of identifying and predicting the behaviors of the traffic participants using classic Multi Model Adaptive Estimation (MMAE). Preliminary simulation results of the implemented model are included.

    Sparse Interacting Gaussian Processes: Efficiency and Optimality Theorems of Autonomous Crowd Navigation

    We study the sparsity and optimality properties of crowd navigation and find that existing techniques do not satisfy both criteria simultaneously: either they achieve optimality with a prohibitive number of samples or tractability assumptions make them fragile to catastrophe. For example, if the human and robot are modeled independently, then tractability is attained but the planner is prone to overcautious or overaggressive behavior. For sampling based motion planning of joint human-robot cost functions, for $n_t$ agents and $T$ step lookahead, $\mathcal O(2^{2n_t T})$ samples are needed for coverage of the action space. Advanced approaches statically partition the action space into free-space and then sample in those convex regions. However, if the human is \emph{moving} into free-space, then the partition is misleading and sampling is unsafe: free space will soon be occupied. We diagnose the cause of these deficiencies---optimization happens over \emph{trajectory} space---and propose a novel solution: optimize over trajectory \emph{distribution} space by using a Gaussian process (GP) basis. We exploit the "kernel trick" of GPs, where a continuum of trajectories are captured with a mean and covariance function. By using the mean and covariance as proxies for a trajectory family we reason about collective trajectory behavior without resorting to sampling. The GP basis is sparse and optimal with respect to collision avoidance and robot and crowd intention and flexibility. GP sparsity leans heavily on the insight that joint action space decomposes into free regions; however, the decomposition contains feasible solutions only if the partition is dynamically generated. We call our approach \emph{$\mathcal O(2^{n_t})$-sparse interacting Gaussian processes}.

    SLAP: Simultaneous Localization and Planning Under Uncertainty for Physical Mobile Robots via Dynamic Replanning in Belief Space: Extended version

    Simultaneous Localization and Planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous POMDP (partially-observable Markov decision process), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art FIRM (Feedback-based Information RoadMap) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation. Comment: 20 pages, updated figures, extended theory and simulation results

    Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments

    Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. We propose a modular decision making algorithm to autonomously navigate intersections, addressing challenges of existing rule-based and reinforcement learning (RL) approaches. We first present a safe RL algorithm relying on a model-checker to ensure safety guarantees. To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning based approach. Finally, we use a scene decomposition approach to scale our algorithm to environments with multiple traffic participants. We empirically demonstrate that our algorithm outperforms rule-based methods and reinforcement learning techniques on a complex intersection scenario. Comment: 8 pages; 7 figures

    A Method for Estimating the Probability of Extremely Rare Accidents in Complex Systems

    Estimating the probability of failures or accidents with aerospace systems is often necessary when new concepts or designs are introduced, as is being done for Autonomous Aircraft. If the design is safe, as it is supposed to be, accident cases are hard to find. Such analysis needs some variance reduction technique, and several algorithms exist for that. However, specific model features may cause difficulties in practice, such as system models where independent agents have to autonomously accomplish missions within finite time, likely in the presence of human agents. For handling these scenarios, this paper presents a novel estimation approach based on the combination of the well-established variance reduction technique of Interacting Particles System (IPS) with the long-standing optimization algorithm known as DIviding RECTangles (DIRECT). When combined, these two techniques yield statistically significant results for extremely low probabilities. In addition, this novel approach allows the identification of intermediate events and simplifies the evaluation of the sensitivity of the estimated probabilities to certain system parameters.

    A-Evac: the evacuation simulator for stochastic environment

    We introduce Aamks, an open-source software package for fire risk assessment. This article focuses on one component of Aamks: an evacuation simulator named a-evac. A-evac models the evacuation of humans in the fire environment produced by the CFAST fire simulator. In the article we discuss the probabilistic evacuation approach, automatic planning of exit routes, the interactions amongst the moving evacuees and the impact of smoke on the humans. The results consist of risk values based on FED, F-N curves and evacuation animations. Comment: Source code of the software described in this article can be found at http://github.com/aamk

    Generating Comfortable, Safe and Comprehensible Trajectories for Automated Vehicles in Mixed Traffic

    While motion planning approaches for automated driving often focus on safety and mathematical optimality with respect to technical parameters, they barely consider convenience, perceived safety for the passenger and comprehensibility for other traffic participants. For automated driving in mixed traffic, however, this is key to reaching public acceptance. In this paper, we revise the problem statement of motion planning in mixed traffic: Instead of largely simplifying the motion planning problem to a convex optimization problem, we keep a more complex probabilistic multi-agent model and strive for a near-optimal solution. We assume cooperation of other traffic participants, while remaining aware that this assumption may be violated. This approach yields solutions that are provably safe in all situations, and convenient and comprehensible in situations that are also unambiguous for humans. Thus, it outperforms existing approaches in mixed traffic scenarios, as we show in simulation.

    Optimal Alarms for Vehicular Collision Detection

    An important application of intelligent vehicles is advance detection of dangerous events such as collisions. This problem is framed as a problem of optimal alarm choice given predictive models for vehicle location and motion. Techniques for real-time collision detection are surveyed and grouped into three classes: random Monte Carlo sampling, faster deterministic approximations, and machine learning models trained by simulation. Theoretical guarantees on the performance of these collision detection techniques are provided where possible, and empirical analysis is provided for two example scenarios. Results validate Monte Carlo sampling as a robust solution despite its simplicity.
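
To make the Monte Carlo class of techniques named in the abstract above concrete, here is a minimal sketch that estimates a collision probability by sampling two uncertain vehicle positions. It is an illustration under stated assumptions, not the paper's method: the isotropic Gaussian position model, the collision `radius`, the alarm `threshold`, and the function names `collision_probability` and `should_alarm` are all hypothetical choices made for this example.

```python
import math
import random


def collision_probability(mean_a, mean_b, sigma, radius,
                          n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(distance between vehicles < radius).

    Assumption: each vehicle's predicted 2D position is an independent
    isotropic Gaussian with the given mean and standard deviation `sigma`.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(n_samples):
        ax = rng.gauss(mean_a[0], sigma)
        ay = rng.gauss(mean_a[1], sigma)
        bx = rng.gauss(mean_b[0], sigma)
        by = rng.gauss(mean_b[1], sigma)
        if math.hypot(ax - bx, ay - by) < radius:
            hits += 1
    return hits / n_samples


def should_alarm(probability, threshold=0.1):
    """Hypothetical alarm rule: warn when the estimate exceeds a threshold."""
    return probability > threshold
```

The estimate's accuracy depends on `n_samples`, so very small collision probabilities need many samples before the count of hits becomes meaningful.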