7 research outputs found

    Real-Time Predictive Modeling and Robust Avoidance of Pedestrians with Uncertain, Changing Intentions

    Full text link
    To plan safe trajectories in urban environments, autonomous vehicles must be able to quickly assess the future intentions of dynamic agents. Pedestrians are particularly challenging to model, as their motion patterns are often uncertain and/or unknown a priori. This paper presents a novel changepoint detection and clustering algorithm that, when coupled with offline unsupervised learning of a Gaussian process mixture model (DPGP), enables quick detection of changes in intent and online learning of motion patterns not seen in prior training data. The resulting long-term movement predictions demonstrate improved accuracy relative to offline learning alone, in terms of both intent and trajectory prediction. By embedding these predictions within a chance-constrained motion planner, trajectories that are probabilistically safe with respect to pedestrian motions can be identified in real time. Hardware experiments demonstrate that this approach can accurately predict pedestrian motion patterns from onboard sensor/perception data and facilitate robust navigation within a dynamic environment. Comment: Submitted to the 2014 International Workshop on the Algorithmic Foundations of Robotics.
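
    The Python sketch below illustrates the two ingredients this abstract combines, not the authors' actual DPGP implementation: a Bayesian update of the posterior over learned motion patterns as new observations arrive, and a chance-constraint check of a candidate vehicle waypoint against the resulting Gaussian-mixture prediction of the pedestrian's position. All pattern likelihoods, covariances, and thresholds are illustrative assumptions.

```python
import numpy as np

def update_intent_posterior(prior, per_pattern_loglik):
    """Bayes update of the mixture weights over learned motion patterns."""
    log_post = np.log(prior) + per_pattern_loglik
    log_post -= log_post.max()                      # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def chance_constraint_ok(waypoint, pred_means, pred_covs, weights,
                         radius=0.5, delta=0.05):
    """Accept a waypoint if the approximate collision probability with the
    predicted pedestrian position (a 2-D Gaussian mixture) is at most delta."""
    p_collide = 0.0
    for w, mu, cov in zip(weights, pred_means, pred_covs):
        d = waypoint - mu
        m2 = d @ np.linalg.solve(cov, d)            # squared Mahalanobis distance
        density = np.exp(-0.5 * m2) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
        # Approximate the mass inside the collision disc by density * disc area
        p_collide += w * density * np.pi * radius ** 2
    return p_collide <= delta

# Toy usage with two hypothetical motion patterns ("cross street", "walk along")
prior = np.array([0.5, 0.5])
loglik = np.array([-1.2, -3.8])                     # e.g. from per-pattern GP predictive densities
weights = update_intent_posterior(prior, loglik)
means = [np.array([2.0, 1.0]), np.array([4.0, 0.0])]
covs = [np.eye(2) * 0.3, np.eye(2) * 0.6]
print(weights, chance_constraint_ok(np.array([3.0, 2.0]), means, covs, weights))
```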

    Predictive Modeling of Pedestrian Motion Patterns with Bayesian Nonparametrics

    Get PDF
    For safe navigation in dynamic environments, an autonomous vehicle must be able to identify and predict the future behaviors of other mobile agents. A promising data-driven approach is to learn motion patterns from previous observations using Gaussian process (GP) regression, which are then used for online prediction. GP mixture models have subsequently been proposed for finding the number of motion patterns using GP likelihood as a similarity metric. However, this paper shows that using GP likelihood as a similarity metric can lead to non-intuitive clustering configurations - such as grouping trajectories with a small planar shift with respect to each other into different clusters - and thus produce poor prediction results. In this paper we develop a novel modeling framework, Dirichlet process active region (DPAR), that addresses the deficiencies of the previous GP-based approaches. In particular, with a discretized representation of the environment, we can explicitly account for planar shifts via a max pooling step and reduce the computational complexity of the statistical inference procedure compared with the GP-based approaches. The proposed algorithm was applied to two real pedestrian trajectory datasets collected using a 3D Velodyne Lidar, and showed a 15% improvement in prediction accuracy and a 4.2 times reduction in computational time compared with a GP-based algorithm. Ford Motor Company.
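
    A minimal sketch of the discretization-plus-max-pooling idea described above, not the DPAR algorithm itself: trajectories are rasterized onto a grid of cells, and non-overlapping max pooling makes two copies of the same path that differ by a small planar shift map to (nearly) the same coarse representation. Grid size, extent, and pool size are illustrative assumptions.

```python
import numpy as np

def occupancy_grid(traj, grid=(20, 20), extent=10.0):
    """Mark the cells visited by a trajectory of (x, y) points inside [0, extent)^2."""
    g = np.zeros(grid)
    idx = np.clip((traj / extent * np.array(grid)).astype(int), 0, np.array(grid) - 1)
    g[idx[:, 0], idx[:, 1]] = 1.0
    return g

def max_pool(g, k=2):
    """Non-overlapping k x k max pooling, which absorbs shifts smaller than a cell block."""
    h, w = g.shape
    return g[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

traj_a = np.column_stack([np.linspace(1, 9, 50), np.full(50, 4.0)])
traj_b = traj_a + np.array([0.0, 0.4])            # the same path with a small planar shift
pooled_a = max_pool(occupancy_grid(traj_a))
pooled_b = max_pool(occupancy_grid(traj_b))
print("overlap of pooled representations:", (pooled_a * pooled_b).sum() / pooled_a.sum())
```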

    Massively Scalable Inverse Reinforcement Learning in Google Maps

    Full text link
    Optimizing for humans' latent preferences is a grand challenge in route recommendation, where globally-scalable solutions remain an open problem. Although past work created increasingly general solutions for the application of inverse reinforcement learning (IRL), these have not been successfully scaled to world-sized MDPs, large datasets, and highly parameterized models; respectively hundreds of millions of states, trajectories, and parameters. In this work, we surpass previous limitations through a series of advancements focused on graph compression, parallelization, and problem initialization based on dominant eigenvectors. We introduce Receding Horizon Inverse Planning (RHIP), which generalizes existing work and enables control of key performance trade-offs via its planning horizon. Our policy achieves a 16-24% improvement in global route quality and, to our knowledge, represents the largest instance of IRL in a real-world setting to date. Our results show critical benefits for more sustainable modes of transportation (e.g. two-wheelers), where factors beyond journey time (e.g. route safety) play a substantial role. We conclude with ablations of key components and negative results on state-of-the-art eigenvalue solvers, and we identify future opportunities to improve scalability via IRL-specific batching strategies.
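
    The planning-horizon trade-off that RHIP exposes can be pictured with a toy inner loop, sketched below; this is not the paper's implementation. Value iteration is truncated at a receding horizon H, so each policy evaluation inside an IRL loop costs H Bellman backups rather than running to convergence. The random MDP, rewards, and horizons are made-up examples.

```python
import numpy as np

def horizon_limited_values(reward, P, H, gamma=0.99):
    """H-step value iteration; a smaller H means cheaper but less exact planning."""
    V = np.zeros(len(reward))
    for _ in range(H):
        V = reward + gamma * (P @ V).max(axis=1)   # Bellman backup over actions
    return V

n_states, n_actions = 6, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
reward = rng.normal(size=n_states)
for H in (2, 10, 50):
    print(H, horizon_limited_values(reward, P, H)[:3].round(2))
```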

    Kernel Density Bayesian Inverse Reinforcement Learning

    Full text link
    Inverse reinforcement learning (IRL) is a powerful framework to infer an agent's reward function by observing its behavior, but IRL algorithms that learn point estimates of the reward function can be misleading because there may be several functions that describe an agent's behavior equally well. A Bayesian approach to IRL models a distribution over candidate reward functions, alleviating the shortcomings of learning a point estimate. However, several Bayesian IRL algorithms use a Q-value function in place of the likelihood function. The resulting posterior is computationally intensive to calculate, has few theoretical guarantees, and the Q-value function is often a poor approximation for the likelihood. We introduce kernel density Bayesian IRL (KD-BIRL), which uses conditional kernel density estimation to directly approximate the likelihood, providing an efficient framework that, with a modified reward function parameterization, is applicable to environments with complex and infinite state spaces. We demonstrate KD-BIRL's benefits through a series of experiments in Gridworld environments and a simulated sepsis treatment task.
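
    A minimal sketch of the likelihood approximation at the heart of KD-BIRL, under simplifying assumptions rather than the paper's exact estimator: training demonstrations whose generating reward parameters are known are used to build a conditional kernel density estimate of p(state, action | reward), which is then evaluated on a new demonstration for candidate rewards. Bandwidths, dimensions, and data are toy values.

```python
import numpy as np

def gaussian_kernel(x, xs, h):
    """Product Gaussian kernel between a query point and an array of samples."""
    return np.exp(-0.5 * ((x - xs) / h) ** 2).prod(axis=-1)

def cond_loglik(demo, train_sa, train_r, r, h_sa=0.5, h_r=0.5):
    """log p(demo | r): each demo point is scored against training (s, a) samples,
    weighted by how close their generating reward parameter is to r."""
    w_r = gaussian_kernel(r, train_r, h_r)
    ll = 0.0
    for sa in demo:
        num = (gaussian_kernel(sa, train_sa, h_sa) * w_r).sum()
        ll += np.log(num / w_r.sum() + 1e-12)
    return ll

rng = np.random.default_rng(1)
train_r = rng.uniform(0, 1, size=(200, 1))                       # reward parameter per training sample
train_sa = np.hstack([train_r, train_r]) + 0.1 * rng.normal(size=(200, 2))
demo = np.full((10, 2), 0.8) + 0.1 * rng.normal(size=(10, 2))    # behavior consistent with r ~= 0.8
for r in (np.array([0.2]), np.array([0.8])):
    print(r[0], round(cond_loglik(demo, train_sa, train_r, r), 2))
```

    Under this toy model, the demonstration should score higher under the candidate reward near 0.8 than under the one near 0.2, which is the posterior behavior the abstract describes.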

    Bayesian nonparametric reward learning from demonstration

    Get PDF
    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 123-132).

    Learning from demonstration provides an attractive solution to the problem of teaching autonomous systems how to perform complex tasks. Demonstration opens autonomy development to non-experts and is an intuitive means of communication for humans, who naturally use demonstration to teach others. This thesis focuses on a specific form of learning from demonstration, namely inverse reinforcement learning, whereby the reward of the demonstrator is inferred. Formally, inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of observed demonstrations. While reward learning is a promising method of inferring a rich and transferable representation of the demonstrator's intents, current algorithms suffer from intractability and inefficiency in large, real-world domains. This thesis presents a reward learning framework that infers multiple reward functions from a single, unsegmented demonstration, provides several key approximations which enable scalability to large real-world domains, and generalizes to fully continuous demonstration domains without the need for discretization of the state space, all of which are not handled by previous methods.

    In the thesis, modifications are proposed to an existing Bayesian IRL algorithm to improve its efficiency and tractability in situations where the state space is large and the demonstrations span only a small portion of it. A modified algorithm is presented and simulation results show substantially faster convergence while maintaining the solution quality of the original method. Even with the proposed efficiency improvements, a key limitation of Bayesian IRL (and most current IRL methods) is the assumption that the demonstrator is maximizing a single reward function. This presents problems when dealing with unsegmented demonstrations containing multiple distinct tasks, common in robot learning from demonstration (e.g. in large tasks that may require multiple subtasks to complete).

    A key contribution of this thesis is the development of a method that learns multiple reward functions from a single demonstration. The proposed method, termed Bayesian nonparametric inverse reinforcement learning (BNIRL), uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Simulation results demonstrate the ability of BNIRL to handle cyclic tasks that break existing algorithms due to the existence of multiple subgoal rewards in the demonstration. The BNIRL algorithm is easily parallelized, and several approximations to the demonstrator likelihood function are offered to further improve computational tractability in large domains.

    Since BNIRL is only applicable to discrete domains, the Bayesian nonparametric reward learning framework is extended to general continuous demonstration domains using Gaussian process reward representations. The resulting algorithm, termed Gaussian process subgoal reward learning (GPSRL), is the only learning from demonstration method that is able to learn multiple reward functions from unsegmented demonstration in general continuous domains. GPSRL does not require discretization of the continuous state space and focuses computation efficiently around the demonstration itself. Learned subgoal rewards are cast as Markov decision process options to enable execution of the learned behaviors by the robotic system and provide a principled basis for future learning and skill refinement. Experiments conducted in the MIT RAVEN indoor test facility demonstrate the ability of both BNIRL and GPSRL to learn challenging maneuvers from demonstration on a quadrotor helicopter and a remote-controlled car.

    by Bernard J. Michini. Ph.D.
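
    A minimal sketch of BNIRL's partitioning step as described above, not the thesis code: a Chinese-restaurant-process-style prior assigns each demonstrated state to a candidate subgoal, with a likelihood that favors assignments under which the demonstrated step makes progress toward that subgoal. Fixing two candidate subgoals, the particular likelihood model, and the concentration parameter are simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def subgoal_loglik(state, next_state, subgoal, beta=5.0):
    """Higher when the demonstrated step reduces the distance to the subgoal."""
    progress = np.linalg.norm(state - subgoal) - np.linalg.norm(next_state - subgoal)
    return beta * progress

def crp_gibbs_sweep(demo, assignments, subgoals, alpha=1.0):
    """One Gibbs sweep: resample the subgoal assignment of each demo step."""
    for t in range(len(assignments)):
        counts = np.bincount(np.delete(assignments, t), minlength=len(subgoals))
        log_p = np.array([
            np.log(counts[k] if counts[k] > 0 else alpha)      # CRP: occupied table vs. alpha
            + subgoal_loglik(demo[t], demo[t + 1], subgoals[k])
            for k in range(len(subgoals))])
        p = np.exp(log_p - log_p.max())
        assignments[t] = rng.choice(len(subgoals), p=p / p.sum())
    return assignments

# Toy demonstration: walk to (5, 0), then up to (5, 5); candidate subgoals are the two endpoints
demo = np.vstack([np.column_stack([np.linspace(0, 5, 6), np.zeros(6)]),
                  np.column_stack([np.full(5, 5.0), np.linspace(1, 5, 5)])])
subgoals = demo[[5, 10]]
assignments = rng.integers(0, 2, size=len(demo) - 1)
for _ in range(5):
    assignments = crp_gibbs_sweep(demo, assignments, subgoals)
print(assignments)   # early steps should cluster on subgoal 0, later steps on subgoal 1
```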

    Modeling real-time human-automation collaborative scheduling of unmanned vehicles

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2013. This electronic version was submitted and approved by the author's academic department as part of an electronic thesis pilot project. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from department-submitted PDF version of thesis. Includes bibliographical references (p. 325-336).

    Recent advances in autonomy have enabled a future vision of single operator control of multiple heterogeneous Unmanned Vehicles (UVs). Real-time scheduling for multiple UVs in uncertain environments will require the computational ability of optimization algorithms combined with the judgment and adaptability of human supervisors. Automated Schedulers (AS), while faster and more accurate than humans at complex computation, are notoriously "brittle" in that they can only take into account those quantifiable variables, parameters, objectives, and constraints identified in the design stages that were deemed to be critical. Previous research has shown that when human operators collaborate with AS in real-time operations, inappropriate levels of operator trust, high operator workload, and a lack of goal alignment between the operator and AS can cause lower system performance and costly or deadly errors. Currently, designers trying to address these issues test different system components, training methods, and interaction modalities through costly human-in-the-loop testing. Thus, the objective of this thesis was to develop and validate a computational model of real-time human-automation collaborative scheduling of multiple UVs.

    First, attributes that are important to consider when modeling real-time human-automation collaborative scheduling were identified, providing a theoretical basis for the model proposed in this thesis. Second, a Collaborative Human-Automation Scheduling (CHAS) model was developed using system dynamics modeling techniques, enabling the model to capture non-linear human behavior and performance patterns, latencies and feedback interactions in the system, and qualitative variables such as human trust in automation. The CHAS model can aid a designer of future UV systems by simulating the impact of changes in system design and operator training on human and system performance. This can reduce the need for time-consuming human-in-the-loop testing that is typically required to evaluate such changes. It can also allow the designer to explore a wider trade space of system changes than is possible through prototyping or experimentation.

    Through a multi-stage validation process, the CHAS model was tested on three experimental data sets to build confidence in the accuracy and robustness of the model under different conditions. Next, the CHAS model was used to develop recommendations for system design and training changes to improve system performance. These changes were implemented and, through an additional set of human subject experiments, the quantitative predictions of the CHAS model were validated. Specifically, test subjects who play computer and video games frequently were found to have a higher propensity to over-trust automation. By priming these gamers to lower their initial trust to a more appropriate level, system performance was improved by 10% as compared to gamers who were primed to have higher trust in the AS. The CHAS model provided accurate quantitative predictions of the impact of priming operator trust on system performance. Finally, the boundary conditions, limitations, and generalizability of the CHAS model for use with other real-time human-automation collaborative scheduling systems were evaluated.

    by Andrew S. Clare. Ph.D.
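
    A purely illustrative system-dynamics loop in the spirit of the CHAS model, not its actual equations: operator trust adjusts toward perceived scheduler performance, the intervention rate falls as trust rises, and performance depends on both. Every coefficient and functional form below is an assumption, included only to show the kind of feedback structure the abstract refers to.

```python
import numpy as np

def simulate(initial_trust, steps=50, dt=0.1):
    """Euler-integrate a toy trust/performance feedback loop; returns mean performance."""
    trust, perf_hist = initial_trust, []
    scheduler_quality = 0.7                       # assumed latent quality of the automated scheduler
    for _ in range(steps):
        interventions = max(0.0, 1.0 - trust)     # low trust -> frequent manual intervention
        performance = 0.6 * scheduler_quality + 0.25 * interventions
        trust += dt * (performance - trust)       # trust relaxes toward perceived performance
        perf_hist.append(performance)
    return float(np.mean(perf_hist))

# Over-trusting vs. more appropriately primed operators (cf. the gamer priming result)
print("high initial trust:", round(simulate(0.95), 3))
print("calibrated trust:  ", round(simulate(0.60), 3))
```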

    Scalable Reward Learning from Demonstration

    Get PDF
    Reward learning from demonstration is the task of inferring the intents or goals of an agent demonstrating a task. Inverse reinforcement learning methods utilize the Markov decision process (MDP) framework to learn rewards, but typically scale poorly since they rely on the calculation of optimal value functions. Several key modifications are made to a previously developed Bayesian nonparametric inverse reinforcement learning algorithm that avoid calculation of an optimal value function and no longer require discretization of the state or action spaces. Experimental results demonstrate the ability of the resulting algorithm to scale to larger problems and learn in domains with continuous demonstrations.
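
    One way to avoid computing an optimal value function when scoring candidate subgoal rewards is sketched below; the paper's own approximations may differ, and the controller, temperature, and data here are illustrative assumptions. The demonstrated action is compared against the action of a cheap closed-loop controller that heads toward the candidate subgoal, and the similarity serves as a surrogate log-likelihood.

```python
import numpy as np

def controller_action(state, subgoal):
    """Cheap stand-in for an optimal policy: a unit step toward the subgoal."""
    d = subgoal - state
    n = np.linalg.norm(d)
    return d / n if n > 1e-9 else np.zeros_like(d)

def action_loglik(state, demo_action, subgoal, beta=10.0):
    """Surrogate log-likelihood: cosine similarity between the demonstrated
    action and the controller's action toward the candidate subgoal."""
    a = controller_action(state, subgoal)
    sim = demo_action @ a / (np.linalg.norm(demo_action) * np.linalg.norm(a) + 1e-9)
    return beta * sim

state = np.array([0.0, 0.0])
demo_action = np.array([0.7, 0.7])                # demonstrator moves toward (1, 1)
for subgoal in (np.array([5.0, 5.0]), np.array([-5.0, 0.0])):
    print(subgoal, round(action_loglik(state, demo_action, subgoal), 2))
```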