15 research outputs found
Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms
In this work, we propose a self-improving artificial intelligence system to
enhance the safety performance of reinforcement learning (RL)-based autonomous
driving (AD) agents using black-box verification methods. RL algorithms have
become popular in AD applications in recent years. However, the performance of
existing RL algorithms heavily depends on the diversity of training scenarios.
A lack of safety-critical scenarios during the training phase could result in
poor generalization performance in real-world driving applications. We propose
a novel framework in which the weaknesses of the training set are explored
through black-box verification methods. After discovering AD failure scenarios,
the RL agent's training is re-initiated via transfer learning to improve its
performance in previously unsafe scenarios. Simulation results demonstrate that
our approach efficiently discovers safety failures of action decisions in
RL-based adaptive cruise control (ACC) applications and significantly reduces
the number of vehicle collisions through iterative applications of our method.
The source code is publicly available at
https://github.com/data-and-decision-lab/self-improving-RL
Comment: 7 pages, 7 figures, 2 tables, published in IEEE International
Conference on Robotics and Automation (ICRA), June 2, 2023, London, UK
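The abstract describes an iterative loop: a black-box verifier searches the scenario space for failures of the current policy, and training is then resumed on those failures via transfer learning. A minimal, hypothetical Python sketch of that loop (toy ACC policy and falsifier; all names and thresholds are illustrative, not the authors' implementation):

```python
import random

def black_box_verify(policy, n_samples=200, seed=0):
    """Toy falsifier: samples scenario parameters and returns those
    where the policy's decision leads to a (simulated) collision."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_samples):
        gap = rng.uniform(2.0, 50.0)   # headway to lead vehicle (m)
        dv = rng.uniform(-10.0, 10.0)  # relative speed (m/s)
        # Toy collision predicate standing in for a full simulation rollout.
        if policy(gap, dv) == "accelerate" and gap < 5.0:
            failures.append((gap, dv))
    return failures

def retrain(policy, failures):
    """Stand-in for transfer learning: resume training so the policy
    handles the discovered failure modes."""
    def improved(gap, dv):
        if gap < 5.0:                  # patched behaviour on failure region
            return "brake"
        return policy(gap, dv)
    return improved

# Naive initial ACC policy that ignores the headway gap entirely.
policy = lambda gap, dv: "accelerate" if dv < 0 else "keep"

for iteration in range(3):             # self-improvement loop
    failures = black_box_verify(policy)
    if not failures:
        break
    policy = retrain(policy, failures)
```

In this toy setting, one round of retraining already removes the collision cases the falsifier can find; the paper's point is that repeating this discover-and-retrain cycle shrinks the set of unsafe scenarios.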
DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving
Safely navigating through an urban environment without violating any traffic
rules is a crucial performance target for reliable autonomous driving. In this
paper, we present a Reinforcement Learning (RL) based methodology to DEtect and
FIX (DeFIX) failures of an Imitation Learning (IL) agent by extracting
infraction spots and re-constructing mini-scenarios on these infraction areas
to train an RL agent for fixing the shortcomings of the IL approach. DeFIX is a
continuous learning framework, where extraction of failure scenarios and
training of RL agents are executed in an infinite loop. After each new policy
is trained and added to the library of policies, a policy classifier method
effectively decides on which policy to activate at each step during the
evaluation. We demonstrate that even with only one RL agent trained on the
failure scenarios of an IL agent, the DeFIX method is competitive with or
outperforms state-of-the-art IL- and RL-based autonomous urban driving
benchmarks. We trained and validated our approach on the most challenging map
(Town05) of CARLA simulator which involves complex, realistic, and adversarial
driving scenarios. The source code is publicly available at
https://github.com/data-and-decision-lab/DeFIX
Comment: 6 pages, 4 figures, 2 tables, published in IEEE International
Conference on Intelligent Transportation Systems (ITSC), October 12, 2022,
Macau, China
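The DeFIX loop has three moving parts: extracting scenarios where the IL agent fails, training an RL specialist per failure, and a policy classifier that picks which policy to run. A hedged, self-contained sketch of that structure (toy scenarios and a trivial "training" step; every name here is illustrative rather than the authors' code):

```python
def run_episode(policy, scenario):
    # Toy dynamics: a policy succeeds iff it covers the scenario tag.
    return scenario in policy["handles"]

def extract_failures(policy, scenarios):
    """Failure extraction: scenarios where the agent commits an infraction."""
    return [s for s in scenarios if not run_episode(policy, s)]

def train_rl_on(failure):
    """Stand-in for RL training on the reconstructed mini-scenario."""
    return {"name": f"RL-{failure}", "handles": {failure}}

def classify(scenario, library):
    """Policy classifier: activate the first policy covering the scenario."""
    for p in library:
        if scenario in p["handles"]:
            return p
    return library[0]  # fall back to the base IL agent

il_policy = {"name": "IL", "handles": {"straight", "turn"}}
library = [il_policy]
scenarios = ["straight", "turn", "roundabout"]

# One pass of the continuous loop: find failures, add an RL specialist each.
for f in extract_failures(il_policy, scenarios):
    library.append(train_rl_on(f))
```

After the pass, the classifier dispatches the roundabout scenario to the new RL specialist while the IL agent keeps handling the cases it was already good at.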
Sample Efficient Interactive End-to-End Deep Learning for Self-Driving Cars with Selective Multi-Class Safe Dataset Aggregation
The objective of this paper is to develop a sample efficient end-to-end deep
learning method for self-driving cars, where we attempt to increase the value
of the information extracted from samples through careful analysis of each
call to the expert driver's policy. End-to-end imitation learning is a
popular method for computing self-driving car policies. The standard approach
relies on collecting pairs of inputs (camera images) and outputs (steering
angle, etc.) from an expert policy and fitting a deep neural network to this
data to learn the driving policy. Although this approach has had some
successful demonstrations in the past, learning a good policy can require many
samples from the expert driver, which can be resource-intensive. In this
work, we develop a novel framework based on the Safe Dataset Aggregation (safe
DAgger) approach, where the current learned policy is automatically segmented
into different trajectory classes, and the algorithm identifies trajectory
segments or classes with weak performance at each step. Once the trajectory
segments with weak performance are identified, the sampling algorithm focuses
on calling the expert policy only on these segments, which improves the
convergence rate. The presented simulation results show that the proposed
approach can yield significantly better performance compared to the standard
Safe DAgger algorithm while using the same number of samples from the expert.
Comment: 6 pages, 6 figures, IROS 2019 conference
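The selective scheme can be sketched in a few lines: split a rollout into trajectory segments, score each segment's imitation loss against the expert, and aggregate expert labels only where the learner is weak. A toy, hypothetical sketch (linear "policies" and an arbitrary threshold stand in for the real networks and metrics):

```python
def segment_loss(learner, expert, segment):
    """Mean absolute action disagreement over one trajectory segment."""
    return sum(abs(learner(x) - expert(x)) for x in segment) / len(segment)

def selective_aggregate(learner, expert, trajectory, n_classes=4, tau=0.5):
    """Split the trajectory into n_classes segments and query the expert
    only on segments whose imitation loss exceeds the threshold tau."""
    size = max(1, len(trajectory) // n_classes)
    segments = [trajectory[i:i + size] for i in range(0, len(trajectory), size)]
    dataset = []
    for seg in segments:
        if segment_loss(learner, expert, seg) > tau:
            # Expert is called only on the weak segments.
            dataset.extend((x, expert(x)) for x in seg)
    return dataset

expert = lambda x: 0.1 * x                           # reference steering
learner = lambda x: 0.1 * x if x < 5 else 0.5 * x    # weak on large x

data = selective_aggregate(learner, expert, list(range(8)))
```

Here only the two segments containing states 4-7 trigger expert calls, so the aggregated dataset concentrates labels exactly where the learner diverges; the standard safe DAgger would have queried the expert over the whole rollout.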
Health Aware Stochastic Planning For Persistent Package Delivery Missions Using Quadrotors
In persistent missions, taking the system’s health and capability degradation into account is an essential factor to predict and avoid failures. The state space in health-aware planning problems is often a mixture of continuous vehicle-level and discrete mission-level states. This in particular poses a challenge when the mission domain is partially observable and restricts the use of computationally expensive forward search methods. This paper presents a method that exploits a structure that exists in many health-aware planning problems and performs a two-layer planning scheme. The lower layer exploits the local linearization and Gaussian distribution assumption over vehicle-level states while the higher layer maintains a non-Gaussian distribution over discrete mission-level variables. This two-layer planning scheme allows us to limit the expensive online forward search to the mission-level states, and thus predict the system’s behavior over longer horizons in the future. We demonstrate the performance of the method on a long duration package delivery mission using a quadrotor in a partially-observable domain in the presence of constraints and health/capability degradation.
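The two-layer split described above can be illustrated concretely: a Gaussian (Kalman-style) filter tracks a continuous vehicle-level state under the linear-Gaussian assumption, and its estimate feeds a non-Gaussian discrete belief over mission-level health modes. A minimal hypothetical sketch with made-up numbers (battery charge as the continuous state; the real paper's models are of course richer):

```python
# Lower layer: Gaussian belief over a continuous vehicle-level state
# (battery charge), updated with a 1-D Kalman filter.
def kalman_update(mean, var, measurement, meas_var=0.5, drain=1.0, proc_var=0.1):
    mean, var = mean - drain, var + proc_var   # predict: charge drains
    k = var / (var + meas_var)                 # Kalman gain
    return mean + k * (measurement - mean), (1.0 - k) * var

# Higher layer: non-Gaussian belief over discrete mission-level health
# modes, updated by Bayes' rule from the low-level estimate.
def mission_belief_update(belief, est_charge):
    likelihood = {"healthy": 0.9 if est_charge > 20 else 0.2,
                  "degraded": 0.1 if est_charge > 20 else 0.8}
    post = {m: belief[m] * likelihood[m] for m in belief}
    z = sum(post.values())
    return {m: p / z for m, p in post.items()}

mean, var = 100.0, 1.0
belief = {"healthy": 0.5, "degraded": 0.5}
for meas in [98.0, 97.2, 95.9]:                # noisy charge readings
    mean, var = kalman_update(mean, var, meas)
    belief = mission_belief_update(belief, mean)
```

The expensive forward search can then operate only on the small discrete belief (`healthy`/`degraded` here) while the continuous state stays in cheap closed-form Gaussian updates, which is the structural point the abstract makes.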
Automated Lane Change Decision Making using Deep Reinforcement Learning in Dynamic and Uncertain Highway Environment
Autonomous lane changing is a critical feature for advanced autonomous
driving systems that involves several challenges, such as uncertainty in other
drivers' behaviors and the trade-off between safety and agility. In this work,
we develop a novel simulation environment that emulates these challenges and
train a deep reinforcement learning agent that yields consistent performance in
a variety of dynamic and uncertain traffic scenarios. Results show that the
proposed data-driven approach performs significantly better in noisy
environments compared to methods that rely solely on heuristics.
Comment: Accepted to IEEE Intelligent Transportation Systems Conference - ITSC
2019
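The ingredients the abstract names, stochastic surrounding traffic and a learned lane-change policy, can be sketched with tabular Q-learning on a toy highway state. This is purely illustrative (the paper uses a deep RL agent in a far richer simulator; all states, rewards, and hyperparameters below are invented):

```python
import random

# Toy state: (gap_ahead_ok, gap_target_ok); actions: keep lane or change.
# Other drivers' stochastic behaviour emulates the uncertainty described
# in the abstract; rewards penalise unsafe lane changes.
rng = random.Random(1)
ACTIONS = ["keep", "change"]
Q = {}

def step(state, action):
    gap_ahead, gap_target = state
    if action == "change":
        reward = 1.0 if gap_target else -5.0   # collision risk without a gap
    else:
        reward = 0.5 if gap_ahead else -0.5    # tailgating penalty
    # Surrounding traffic evolves randomly, independent of our action.
    next_state = (rng.random() > 0.3, rng.random() > 0.3)
    return reward, next_state

def train(steps=2000, eps=0.1, alpha=0.2, gamma=0.9):
    state = (True, True)
    for _ in range(steps):
        if rng.random() < eps:                 # epsilon-greedy exploration
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        reward, nxt = step(state, action)
        best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
        q = Q.get((state, action), 0.0)
        Q[(state, action)] = q + alpha * (reward + gamma * best_next - q)
        state = nxt

train()
```

After training, the learned values prefer changing lanes when the target-lane gap is open and the ego lane is blocked, i.e. the data-driven policy resolves the safety/agility trade-off that a fixed heuristic would hard-code.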
PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation
We propose a novel knowledge distillation methodology for compressing deep
neural networks. One of the most efficient methods for knowledge distillation
is hint distillation, where the student model is injected with information
(hints) from several different layers of the teacher model. Although the
selection of hint points can drastically alter the compression performance,
there is no systematic approach for selecting them, other than brute-force
hyper-parameter search. We propose a clustering based hint selection
methodology, where the layers of the teacher model are clustered with respect
to several metrics and the cluster centers are used as the hint points. The
proposed approach is validated on the CIFAR-100 dataset, where the ResNet-110
network is used as the teacher model. Our results show that hint points
selected by our algorithm result in superior compression performance compared
to state-of-the-art knowledge distillation algorithms on the same student
models and datasets.
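The selection idea, cluster the teacher's layers by some per-layer metric and take the layer nearest each cluster center as a hint point, can be sketched with a one-dimensional k-means. The metric values below are made-up numbers standing in for whatever per-layer statistics are actually clustered; this is a hedged illustration, not the paper's algorithm verbatim:

```python
def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means; centres initialised across the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # Recompute each centre as its group mean (keep it if group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# One made-up scalar metric per teacher layer (e.g. an activation statistic).
layer_metric = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]

centers = kmeans_1d(layer_metric, k=2)
# Hint points: the layer index closest to each cluster centre.
hints = [min(range(len(layer_metric)), key=lambda i: abs(layer_metric[i] - c))
         for c in centers]
```

With these toy metrics the two clusters resolve to layers 1 and 4, so the student would receive hints from one representative layer per cluster instead of from an arbitrary brute-force-searched subset.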
MAR-CPS: Measurable Augmented Reality for Prototyping Cyber-Physical Systems
Cyber-Physical Systems (CPSs) refer to engineering platforms that rely on the integration of physical systems with control, computation, and communication technologies. Autonomous vehicles are instances of CPSs that are rapidly growing with applications in many domains. Due to the integration of physical systems with computational sensing, planning, and learning in CPSs, hardware-in-the-loop experiments are an essential step for transitioning from simulations to real-world experiments. This paper proposes an architecture for rapid prototyping of CPSs that has been developed in the Aerospace Controls Laboratory at the Massachusetts Institute of Technology. This system, referred to as MAR-CPS (Measurable Augmented Reality for Prototyping Cyber-Physical Systems), includes physical vehicles and sensors, a motion capture technology, a projection system, and a communication network. The role of the projection system is to augment a physical laboratory space with 1) autonomous vehicles' beliefs and 2) a simulated mission environment, which in turn will be measured by physical sensors on the vehicles. The main focus of this method is on rapid design of planning, perception, and learning algorithms for autonomous single-agent or multi-agent systems. Moreover, the proposed architecture allows researchers to project a simulated counterpart of outdoor environments in a controlled, indoor space, which can be crucial when testing in outdoor environments is disfavored due to safety, regulatory, or monetary concerns. We discuss the issues related to the design and implementation of MAR-CPS and demonstrate its real-time behavior in a variety of problems in autonomy, such as motion planning, multi-robot coordination, and learning spatio-temporal fields.
Boeing Company
Edge on Wheels With OMNIBUS Networking for 6G Technology
In recent years, both the scientific community and the industry have focused on moving computational resources from centralized cloud data centres to decentralised computing, closer to the source, or the so-called “edge”, of the network. This is because the cloud system alone cannot sufficiently support the huge demands of future networks with the massive growth of new, time-critical applications such as self-driving vehicles, Augmented Reality/Virtual Reality techniques, advanced robotics, and critical remote control of smart Internet-of-Things applications. While decentralised edge computing will form the backbone of future heterogeneous networks, it is still in its infancy, and there is currently no comprehensive platform. In this article, we propose a novel decentralised edge architecture, a solution called OMNIBUS, which enables a continuous distribution of computational capacity for end-devices in different localities by exploiting moving vehicles as storage and computation resources. Scalability and adaptability are the main features that differentiate the proposed solution from existing edge computing models. The proposed solution has the potential to scale infinitely, which will lead to a significant increase in network speed. The OMNIBUS solution rests on developing two predictive models: (i) to learn the timing and direction of vehicular movements to ascertain computational capacity for a given locale, and (ii) to introduce a theoretical framework for sequential-to-parallel conversion in learning, optimisation, and caching under contingent circumstances due to vehicles in motion.