15 research outputs found
SHAIL: Safety-Aware Hierarchical Adversarial Imitation Learning for Autonomous Driving in Urban Environments
Designing a safe and human-like decision-making system for an autonomous
vehicle is a challenging task. Generative imitation learning is one possible
approach for automating policy-building by leveraging both real-world and
simulated decisions. Previous work that applies generative imitation learning
to autonomous driving policies focuses on learning a low-level controller for
simple settings. However, to scale to complex settings, many autonomous driving
systems combine fixed, safe, optimization-based low-level controllers with
high-level decision-making logic that selects the appropriate task and
associated controller. In this paper, we attempt to bridge this gap in
complexity by employing Safety-Aware Hierarchical Adversarial Imitation
Learning (SHAIL), a method for learning a high-level policy that selects from a
set of low-level controller instances in a way that imitates low-level driving
data on-policy. We introduce an urban roundabout simulator that controls
non-ego vehicles using real data from the Interaction dataset. We then show
empirically that our approach can produce better behavior than previous
approaches to driver imitation, which have difficulty scaling to complex
environments. Our implementation is available at
https://github.com/sisl/InteractionImitation
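The core idea — a learned high-level policy choosing among fixed, safe low-level controllers — can be sketched as follows. This is a minimal illustration, not the SHAIL implementation; the controller names, the state dictionary, and the score vector standing in for the learned policy's output are all assumptions of this example.

```python
# Hypothetical low-level controller options (illustrative names, not
# from the SHAIL codebase): each maps a state to a low-level action.
def keep_lane(state):
    return {"steer": 0.0, "accel": state.get("gap", 1.0) * 0.5}

def yield_to_traffic(state):
    return {"steer": 0.0, "accel": -1.0}

def enter_roundabout(state):
    return {"steer": 0.3, "accel": 0.8}

CONTROLLERS = [keep_lane, yield_to_traffic, enter_roundabout]

def high_level_policy(state, scores):
    """Select the controller instance with the highest score.
    `scores` stands in for the learned high-level policy's output."""
    idx = max(range(len(CONTROLLERS)), key=lambda i: scores[i])
    return CONTROLLERS[idx]

state = {"gap": 2.0}
controller = high_level_policy(state, scores=[0.2, 0.1, 0.7])
action = controller(state)
```

In the paper the scores come from a policy trained adversarially to imitate low-level driving data on-policy; here they are fixed constants so the selection step itself is visible.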
Numerical Simulation of a Swarm Path-Planning Algorithm in a Two-Dimensional Unmapped Environment
This paper examines the effectiveness of swarm path-planning algorithms in a two-dimensional unmapped environment. The efficiency criteria are the number of iterations in the path-finding process and an estimate of the probability of successfully reaching the goal. During the study, the maximum speed of the swarm and the maximum number of iterations for which the distance to the target is allowed not to decrease are varied. It is assumed that each particle can determine the state of the environment in a certain local region, where determining the state means detecting the presence of an obstacle in a cell of the environment. To address the problem of local minima, it is proposed to introduce a virtual obstacle at the local-minimum point. While this approach is generally known, the novelty here lies in solving the problem of detecting a local minimum with a swarm of particles. For a single agent, detecting a local minimum is trivial and reduces to checking for movement into previously visited cells; in the group case, a new solution to the detection problem is required. This article provides a review and analysis of the path-planning problem, the problem formulation and statement, a mathematical description of global swarm path-planning algorithms with the proposed modifications, pseudo-code for the planning algorithms, and the results of a numerical study. The numerical studies evaluate the path-planning efficiency criteria in an environment of 100x100 cells with randomly placed obstacles.
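The stall-based detection rule and the virtual-obstacle fix described above can be sketched as follows. This is an assumption-laden illustration: the swarm is reduced to its best distance-to-goal history, the grid cell `(4, 7)` is an arbitrary example, and the stall limit is a free parameter, matching the abstract's "maximum number of iterations during which the distance to the target is allowed not to decrease".

```python
# Sketch of local-minimum detection for a swarm on a grid.

def detect_local_minimum(best_distances, stall_limit):
    """Return True if the swarm's best distance to the goal has not
    decreased for `stall_limit` consecutive iterations."""
    if len(best_distances) <= stall_limit:
        return False
    recent = best_distances[-(stall_limit + 1):]
    return min(recent[1:]) >= recent[0]

def add_virtual_obstacle(obstacles, swarm_best_cell):
    """Mark the swarm's current best cell as blocked, pushing the
    search away from the detected local minimum."""
    obstacles.add(swarm_best_cell)
    return obstacles

history = [9.0, 7.0, 6.5, 6.5, 6.5, 6.5]  # distance stalls at 6.5
obstacles = set()
if detect_local_minimum(history, stall_limit=3):
    add_virtual_obstacle(obstacles, swarm_best_cell=(4, 7))
```

A single particle could instead check for revisited cells, as the abstract notes; the history-based rule above is one way to lift that check to the whole swarm.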
Does Unpredictability Influence Driving Behavior?
In this paper we investigate the effect of the unpredictability of
surrounding cars on an ego-car performing a driving maneuver. We use Maximum
Entropy Inverse Reinforcement Learning to model reward functions for an ego-car
conducting a lane change in a highway setting. We define a new feature based on
the unpredictability of surrounding cars and use it in the reward function. We
learn two reward functions from human data: a baseline and one that
incorporates our defined unpredictability feature, then compare their
performance with a quantitative and qualitative evaluation. Our evaluation
demonstrates that incorporating the unpredictability feature leads to a better
fit of human-generated test data. These results encourage further investigation
of the effect of unpredictability on driving behavior.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS) 202
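A minimal sketch of the two ingredients the abstract combines: a linear reward in the standard MaxEnt IRL parameterisation, plus an extra feature measuring how unpredictable a surrounding car is. The use of action-distribution entropy as that feature, and the specific base features and weights, are assumptions of this example, not the paper's definitions.

```python
import math

def unpredictability(action_probs):
    """Entropy of a surrounding car's predicted action distribution —
    a hypothetical stand-in for the paper's unpredictability feature."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

def reward(features, weights):
    """Linear reward, the standard MaxEnt IRL parameterisation."""
    return sum(w * f for w, f in zip(weights, features))

# Illustrative baseline features (e.g. speed, lane offset) plus the
# unpredictability term; weights would normally be learned from data.
base = [0.8, -0.1]
phi = base + [unpredictability([0.5, 0.5])]
r = reward(phi, weights=[1.0, 2.0, -0.5])
```

With a negative weight on the new feature, the reward of a maneuver drops as surrounding cars become harder to predict, which is one plausible way the feature could shape lane-change behavior.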
Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts
The goal of autonomous vehicles is to navigate public roads safely and
comfortably. To enforce safety, traditional planning approaches rely on
handcrafted rules to generate trajectories. Machine learning-based systems, on
the other hand, scale with data and are able to learn more complex behaviors.
However, they often ignore that the trajectory distributions of other agents
and of the self-driving vehicle itself can be leveraged to improve safety. In this paper, we propose
modeling a distribution over multiple future trajectories for both the
self-driving vehicle and other road agents, using a unified neural network
architecture for prediction and planning. During inference, we select the
planning trajectory that minimizes a cost taking into account safety and the
predicted probabilities. Our approach does not depend on any rule-based
planners for trajectory generation or optimization, improves with more training
data and is simple to implement. We extensively evaluate our method through a
realistic simulator and show that the predicted trajectory distribution
corresponds to different driving profiles. We also successfully deploy it on a
self-driving vehicle on urban public roads, confirming that it drives safely
without compromising comfort. The code for training and testing our model on a
public prediction dataset and the video of the road test are available at
https://woven.mobi/safepathne
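The inference step described above — selecting the planning trajectory that minimizes a cost weighted by predicted probabilities — can be sketched as a small expected-cost computation. The cost table and probabilities below are illustrative placeholders, not the paper's learned values, and the safety cost is collapsed to a single number per (plan, predicted-trajectory) pair.

```python
# Sketch of trajectory selection under a predicted distribution over
# another agent's future trajectories.

def expected_cost(plan_costs, agent_probs):
    """plan_costs[j] = safety cost of this ego plan if the agent
    follows its j-th predicted trajectory."""
    return sum(p * c for p, c in zip(agent_probs, plan_costs))

def select_plan(cost_table, agent_probs):
    """cost_table[i][j]: cost of ego plan i under agent trajectory j.
    Returns the index of the plan with the lowest expected cost."""
    costs = [expected_cost(row, agent_probs) for row in cost_table]
    return min(range(len(costs)), key=costs.__getitem__)

agent_probs = [0.7, 0.3]       # predicted trajectory probabilities
cost_table = [[1.0, 5.0],      # plan 0: cheap unless the agent swerves
              [2.0, 2.0]]      # plan 1: uniformly moderate
best = select_plan(cost_table, agent_probs)
```

Note that plan 1 wins despite plan 0 being cheaper in the most likely case: weighting by the full predicted distribution penalizes plans that are unsafe under plausible alternatives, which is the safety argument the abstract makes.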
Learning from Demonstrations of Critical Driving Behaviours Using Driver's Risk Field
In recent years, imitation learning (IL) has been widely used in industry as
the core of autonomous vehicle (AV) planning modules. However, previous work on
IL planners shows sample inefficiency and low generalisation in safety-critical
scenarios, on which they are rarely tested. As a result, IL planners can reach
a performance plateau where adding more training data ceases to improve the
learnt policy. First, our work presents an IL model using the spline
coefficient parameterisation and offline expert queries to enhance safety and
training efficiency. Then, we expose the weakness of the learnt IL policy by
synthetically generating critical scenarios through optimisation of parameters
of the driver's risk field (DRF), a parametric human driving behaviour model
implemented in a multi-agent traffic simulator based on the Lyft Prediction
Dataset. To continuously improve the learnt policy, we retrain the IL model
with augmented data. Thanks to the expressivity and interpretability of the
DRF, the desired driving behaviours can be encoded and aggregated to the
original training data. Our work constitutes a full development cycle that can
efficiently and continuously improve the learnt IL policies in closed-loop.
Finally, we show that our IL planner, developed with 30 times fewer training
resources, still outperforms the previous state of the art.
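The spline-coefficient parameterisation mentioned above can be illustrated with a toy polynomial rollout: the planner outputs a few coefficients instead of dense waypoints, and the trajectory is reconstructed by evaluation. The polynomial form, horizon, and coefficient values here are assumptions of this sketch; the paper's exact spline basis is not reproduced.

```python
# Toy illustration of a coefficient-based trajectory parameterisation.

def eval_poly(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

def rollout(coeffs, horizon, dt):
    """Expand a compact coefficient vector into a dense trajectory."""
    return [eval_poly(coeffs, k * dt) for k in range(horizon)]

# Three coefficients stand in for an entire position-over-time profile.
traj = rollout([0.0, 2.0, -0.1], horizon=5, dt=1.0)
```

Compressing the action space this way is one reason such parameterisations can improve sample efficiency: the learner fits a handful of coefficients rather than every waypoint independently.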
Increasing the Efficiency of Policy Learning for Autonomous Vehicles by Multi-Task Representation Learning
Driving in a dynamic, multi-agent, and complex urban environment is a
difficult task requiring a complex decision-making policy. The learning of such
a policy requires a state representation that can encode the entire
environment. Mid-level representations that encode a vehicle's environment as
images have become a popular choice. Still, they are quite high-dimensional,
limiting their use in data-hungry approaches such as reinforcement learning. In
this article, we propose to learn a low-dimensional and rich latent
representation of the environment by leveraging the knowledge of relevant
semantic factors. To do this, we train an encoder-decoder deep neural network
to predict multiple application-relevant factors such as the trajectories of
other agents and the ego car. Furthermore, we propose a hazard signal based on
other vehicles' future trajectories and the planned route, which is used in
conjunction with the learned latent representation as input to a downstream
policy. We demonstrate that using the multi-head encoder-decoder neural network
results in a more informative representation than a standard single-head model.
In particular, the proposed representation learning and the hazard signal help
reinforcement learning to learn faster, achieving higher performance with less
data than baseline methods.
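One simple form such a hazard signal could take — flagging when another vehicle's predicted trajectory comes close to the planned route — is sketched below. The point-matching by timestep, the distance threshold, and the binary output are all assumptions of this example, not the paper's definition.

```python
# Sketch of a hazard signal from predicted trajectories vs. the planned
# route. Trajectories are lists of (x, y) points at matching timesteps.

def min_distance(route, other_traj):
    """Smallest same-timestep distance between the route and a
    predicted trajectory of another vehicle."""
    return min(
        ((rx - ox) ** 2 + (ry - oy) ** 2) ** 0.5
        for (rx, ry), (ox, oy) in zip(route, other_traj)
    )

def hazard_signal(route, others, radius=2.0):
    """1.0 if any predicted trajectory intrudes within `radius` of the
    planned route, else 0.0."""
    return 1.0 if any(min_distance(route, o) < radius for o in others) else 0.0

route = [(0, 0), (1, 0), (2, 0)]
others = [[(5, 5), (4, 4), (3, 3)],   # stays clear of the route
          [(0, 3), (1, 1), (2, -1)]]  # cuts across the route
h = hazard_signal(route, others)
```

Feeding such a scalar alongside the learned latent representation gives the downstream policy an explicit, low-dimensional danger cue without relying on it to rediscover collision geometry from the latent code alone.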