    Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms

    In this work, we propose a self-improving artificial intelligence system to enhance the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms heavily depends on the diversity of training scenarios. A lack of safety-critical scenarios during the training phase could result in poor generalization performance in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. Once AD failure scenarios are discovered, the RL agent's training is re-initiated via transfer learning to improve performance in the previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures of action decisions in RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative applications of our method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL. Comment: 7 pages, 7 figures, 2 tables, published in IEEE International Conference on Robotics and Automation (ICRA), June 2, 2023, London, UK

    DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving

    Safely navigating through an urban environment without violating any traffic rules is a crucial performance target for reliable autonomous driving. In this paper, we present a Reinforcement Learning (RL) based methodology to DEtect and FIX (DeFIX) failures of an Imitation Learning (IL) agent by extracting infraction spots and reconstructing mini-scenarios on these infraction areas to train an RL agent that fixes the shortcomings of the IL approach. DeFIX is a continuous learning framework, where extraction of failure scenarios and training of RL agents are executed in an infinite loop. After each new policy is trained and added to the library of policies, a policy classifier decides which policy to activate at each step during the evaluation. We demonstrate that, even with only one RL agent trained on failure scenarios of an IL agent, the DeFIX method is competitive with or outperforms state-of-the-art IL and RL based autonomous urban driving benchmarks. We trained and validated our approach on the most challenging map (Town05) of the CARLA simulator, which involves complex, realistic, and adversarial driving scenarios. The source code is publicly available at https://github.com/data-and-decision-lab/DeFIX. Comment: 6 pages, 4 figures, 2 tables, published in IEEE International Conference on Intelligent Transportation Systems (ITSC), October 12, 2022, Macau, China

    Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation

    The objective of this paper is to develop a sample efficient end-to-end deep learning method for self-driving cars, where we attempt to increase the value of the information extracted from samples through careful analysis of each call to the expert driver's policy. End-to-end imitation learning is a popular method for computing self-driving car policies. The standard approach relies on collecting pairs of inputs (camera images) and outputs (steering angle, etc.) from an expert policy and fitting a deep neural network to this data to learn the driving policy. Although this approach has had some successful demonstrations in the past, learning a good policy might require a lot of samples from the expert driver, which might be resource-consuming. In this work, we develop a novel framework based on the Safe Dataset Aggregation (Safe DAgger) approach, where the current learned policy is automatically segmented into different trajectory classes, and the algorithm identifies trajectory segments or classes with weak performance at each step. Once the trajectory segments with weak performance are identified, the sampling algorithm focuses on calling the expert policy only on these segments, which improves the convergence rate. The presented simulation results show that the proposed approach can yield significantly better performance compared to the standard Safe DAgger algorithm while using the same number of samples from the expert. Comment: 6 pages, 6 figures, IROS 2019 conference

    Health Aware Stochastic Planning For Persistent Package Delivery Missions Using Quadrotors

    In persistent missions, taking the system’s health and capability degradation into account is essential for predicting and avoiding failures. The state space in health-aware planning problems is often a mixture of continuous vehicle-level and discrete mission-level states. This poses a particular challenge when the mission domain is partially observable, and it restricts the use of computationally expensive forward search methods. This paper presents a method that exploits a structure that exists in many health-aware planning problems and performs a two-layer planning scheme. The lower layer exploits the local linearization and Gaussian distribution assumption over vehicle-level states, while the higher layer maintains a non-Gaussian distribution over discrete mission-level variables. This two-layer planning scheme allows us to limit the expensive online forward search to the mission-level states, and thus predict the system’s behavior over longer horizons in the future. We demonstrate the performance of the method on a long-duration package delivery mission using a quadrotor in a partially observable domain in the presence of constraints and health/capability degradation.

    Automated Lane Change Decision Making using Deep Reinforcement Learning in Dynamic and Uncertain Highway Environment

    Autonomous lane changing is a critical feature for advanced autonomous driving systems. It involves several challenges, such as uncertainty in other drivers' behaviors and the trade-off between safety and agility. In this work, we develop a novel simulation environment that emulates these challenges and train a deep reinforcement learning agent that yields consistent performance in a variety of dynamic and uncertain traffic scenarios. Results show that the proposed data-driven approach performs significantly better in noisy environments compared to methods that rely solely on heuristics. Comment: Accepted to IEEE Intelligent Transportation Systems Conference - ITSC 201

    PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation

    We propose a novel knowledge distillation methodology for compressing deep neural networks. One of the most efficient methods for knowledge distillation is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, there is no systematic approach for selecting them, other than brute-force hyper-parameter search. We propose a clustering based hint selection methodology, where the layers of the teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. The proposed approach is validated on the CIFAR-100 dataset, with a ResNet-110 network as the teacher model. Our results show that hint points selected by our algorithm result in superior compression performance with respect to state-of-the-art knowledge distillation algorithms on the same student models and datasets.

    MAR-CPS: Measurable Augmented Reality for Prototyping Cyber-Physical Systems

    Cyber-Physical Systems (CPSs) refer to engineering platforms that rely on the integration of physical systems with control, computation, and communication technologies. Autonomous vehicles are instances of CPSs that are rapidly growing with applications in many domains. Due to the integration of physical systems with computational sensing, planning, and learning in CPSs, hardware-in-the-loop experiments are an essential step for transitioning from simulations to real-world experiments. This paper proposes an architecture for rapid prototyping of CPSs that has been developed in the Aerospace Controls Laboratory at the Massachusetts Institute of Technology. This system, referred to as MAR-CPS (Measurable Augmented Reality for Prototyping Cyber-Physical Systems), includes physical vehicles and sensors, a motion capture technology, a projection system, and a communication network. The role of the projection system is to augment a physical laboratory space with 1) autonomous vehicles' beliefs and 2) a simulated mission environment, which in turn will be measured by physical sensors on the vehicles. The main focus of this method is on rapid design of planning, perception, and learning algorithms for autonomous single-agent or multi-agent systems. Moreover, the proposed architecture allows researchers to project a simulated counterpart of outdoor environments in a controlled, indoor space, which can be crucial when testing in outdoor environments is disfavored due to safety, regulatory, or monetary concerns. We discuss the issues related to the design and implementation of MAR-CPS and demonstrate its real-time behavior in a variety of problems in autonomy, such as motion planning, multi-robot coordination, and learning spatio-temporal fields. Boeing Company

    Edge on Wheels With OMNIBUS Networking for 6G Technology

    In recent years, both the scientific community and the industry have focused on moving computational resources from remote data centres in the centralised cloud to decentralised computing, bringing them closer to the source, the so-called “edge” of the network. This is because the cloud system alone cannot sufficiently support the huge demands of future networks with the massive growth of new, time-critical applications such as self-driving vehicles, Augmented Reality/Virtual Reality techniques, advanced robotics and critical remote control of smart Internet-of-Things applications. While decentralised edge computing will form the backbone of future heterogeneous networks, it is still in its infancy. Currently, there is no comprehensive platform. In this article, we propose a novel decentralised edge architecture, a solution called OMNIBUS, which enables a continuous distribution of computational capacity for end-devices in different localities by exploiting moving vehicles as storage and computation resources. Scalability and adaptability are the main features that differentiate the proposed solution from existing edge computing models. The proposed solution has the potential to scale infinitely, which will lead to a significant increase in network speed. The OMNIBUS solution rests on developing two predictive models: (i) to learn timing and direction of vehicular movements to ascertain computational capacity for a given locale, and (ii) to introduce a theoretical framework for sequential to parallel conversion in learning, optimisation and caching under contingent circumstances due to vehicles in motion.