15 research outputs found

    SHAIL: Safety-Aware Hierarchical Adversarial Imitation Learning for Autonomous Driving in Urban Environments

    Designing a safe and human-like decision-making system for an autonomous vehicle is a challenging task. Generative imitation learning is one possible approach for automating policy-building by leveraging both real-world and simulated decisions. Previous work that applies generative imitation learning to autonomous driving policies focuses on learning a low-level controller for simple settings. However, to scale to complex settings, many autonomous driving systems combine fixed, safe, optimization-based low-level controllers with high-level decision-making logic that selects the appropriate task and associated controller. In this paper, we attempt to bridge this gap in complexity by employing Safety-Aware Hierarchical Adversarial Imitation Learning (SHAIL), a method for learning a high-level policy that selects from a set of low-level controller instances in a way that imitates low-level driving data on-policy. We introduce an urban roundabout simulator that controls non-ego vehicles using real data from the Interaction dataset. We then show empirically that our approach can produce better behavior than previous driver-imitation approaches, which have difficulty scaling to complex environments. Our implementation is available at https://github.com/sisl/InteractionImitation
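    The core control loop described above, a high-level policy choosing among fixed low-level controller options with unsafe options masked out, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the controllers, the one-step gap dynamics, and the safety threshold are all assumed for the demo.

    ```python
    # Hypothetical SHAIL-style selection step: score each low-level option,
    # mask the ones that would violate a simple safety check, pick the best.

    def keep_speed(state):
        return 0.0                      # zero acceleration

    def slow_down(state):
        return -2.0                     # brake

    def speed_up(state):
        return 1.5                      # accelerate

    OPTIONS = [keep_speed, slow_down, speed_up]

    def is_safe(state, accel, dt=0.1):
        """Reject options that would close the gap to the lead car too far."""
        gap, speed, lead_speed = state
        new_speed = max(0.0, speed + accel * dt)
        new_gap = gap + (lead_speed - new_speed) * dt
        return new_gap > 2.0            # keep at least a 2 m gap (assumed)

    def high_level_policy(state, scores):
        """Pick the highest-scoring option among the safe ones."""
        safe = [i for i, opt in enumerate(OPTIONS) if is_safe(state, opt(state))]
        if not safe:
            return OPTIONS.index(slow_down)   # fall back to braking
        return max(safe, key=lambda i: scores[i])

    # One step: 10 m gap, ego at 8 m/s, lead at 7 m/s; learned scores (stand-ins)
    # favour speeding up, and speeding up passes the safety check here.
    state = (10.0, 8.0, 7.0)
    choice = high_level_policy(state, scores=[0.2, 0.1, 0.7])
    ```

    In the paper the scores come from an adversarially trained policy network; here they are fixed numbers so the masking logic stands alone.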

    Numerical Simulation of a Swarm Path-Planning Algorithm in a Two-Dimensional Unmapped Environment

    This paper examines the effectiveness of swarm path-planning algorithms in a two-dimensional unmapped environment. The efficiency criteria are the number of iterations in the path-finding process and an estimate of the probability of successfully reaching the goal. During the study, the maximum speed of the swarm and the maximum number of iterations for which the distance to the target is allowed not to decrease are varied. It is assumed that each particle can determine the state of the environment in a certain local region; by determining the state we mean detecting the presence of an obstacle in a cell of the environment. To address the problem of local minima, it is proposed to introduce a virtual obstacle at the local minimum point. This approach is generally known; the novelty here lies in solving the problem of detecting a local minimum with a swarm of particles. For a single agent, detecting a local minimum is trivial and reduces to checking for movement into previously visited cells; in the group case, a new solution to the detection problem is required. The article provides a review and analysis of the path-planning problem, a problem statement, a mathematical description of global swarm path-planning algorithms with the proposed modifications, pseudocode for the planning algorithms, and the results of a numerical study. The numerical study evaluates these efficiency criteria in a 100x100-cell environment with randomly placed obstacles.
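    The single-agent version of the virtual-obstacle idea described above can be sketched in a few lines: a greedy walker moves toward the goal, treats "no unvisited neighbour" as a detected local minimum, marks that cell as a virtual obstacle, and backtracks. The grid, movement rules, and escape test are illustrative assumptions, not the paper's swarm formulation.

    ```python
    # Hypothetical sketch: greedy grid search that escapes local minima by
    # placing virtual obstacles at dead-end cells and backtracking.

    def plan(grid, start, goal, max_iters=500):
        rows, cols = len(grid), len(grid[0])
        obstacles = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c]}
        visited, pos, path = {start}, start, [start]
        for _ in range(max_iters):
            if pos == goal:
                return path
            moves = [(pos[0] + dr, pos[1] + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= pos[0] + dr < rows and 0 <= pos[1] + dc < cols
                     and (pos[0] + dr, pos[1] + dc) not in obstacles]
            fresh = [m for m in moves if m not in visited]
            if not fresh:                     # local minimum detected:
                obstacles.add(pos)            # place a virtual obstacle here
                path.pop()
                if not path:
                    return None               # start is walled in
                pos = path[-1]                # back up one step
                continue
            # Greedy step: minimise Manhattan distance to the goal.
            pos = min(fresh, key=lambda m: abs(m[0] - goal[0]) + abs(m[1] - goal[1]))
            visited.add(pos)
            path.append(pos)
        return None

    # A map with a dead-end corridor (a local minimum) between start and goal.
    grid = [[0, 0, 0, 0, 0],
            [1, 1, 0, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    route = plan(grid, (0, 0), (4, 2))
    ```

    The greedy walker first descends the corridor at column 2, hits the dead end, seals it with virtual obstacles, and then finds the route around the right side. The swarm case in the paper replaces the trivial visited-cell check with a group detection criterion.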

    Does Unpredictability Influence Driving Behavior?

    In this paper we investigate the effect of the unpredictability of surrounding cars on an ego-car performing a driving maneuver. We use Maximum Entropy Inverse Reinforcement Learning to model reward functions for an ego-car conducting a lane change in a highway setting. We define a new feature based on the unpredictability of surrounding cars and use it in the reward function. We learn two reward functions from human data: a baseline and one that incorporates our defined unpredictability feature, then compare their performance with a quantitative and qualitative evaluation. Our evaluation demonstrates that incorporating the unpredictability feature leads to a better fit of human-generated test data. These results encourage further investigation of the effect of unpredictability on driving behavior.
    Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 202
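    In the MaxEnt IRL setting above, the reward is a weighted sum of handcrafted features. A minimal sketch of adding an unpredictability feature, here the entropy of a neighbour's predicted action distribution, could look like this; the feature definitions and weights are stand-ins, not the learned values from the paper.

    ```python
    import math

    # Hypothetical linear reward with an "unpredictability" feature: the
    # entropy of a neighbouring car's predicted action distribution.

    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)

    def features(state):
        gap, neighbour_action_probs = state
        return [
            -abs(gap - 20.0) / 20.0,            # prefer a ~20 m gap (assumed)
            -entropy(neighbour_action_probs),   # penalise unpredictable neighbours
        ]

    def reward(state, weights):
        return sum(w * f for w, f in zip(weights, features(state)))

    weights = [1.0, 0.5]                         # stand-ins for learned weights
    # Same gap, but one neighbour's next action is far less certain:
    predictable   = (20.0, [0.9, 0.05, 0.05])
    unpredictable = (20.0, [0.4, 0.3, 0.3])
    ```

    With these weights, a lane change next to the predictable neighbour scores higher, which is the qualitative effect the feature is meant to capture.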

    Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts

    The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that the trajectory distributions of the self-driving vehicle and other road agents can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost that accounts for safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data, and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathne
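    The inference-time selection step described above can be sketched as follows: each candidate ego trajectory carries a predicted probability, and the planner picks the one minimising a cost combining a safety term with that probability. The collision proxy (minimum gap to a predicted agent trajectory) and the weights are illustrative assumptions.

    ```python
    import math

    # Hypothetical selection over predicted trajectory modes: cost =
    # safety penalty (gaps under 3 m) minus log of the mode's probability.

    def min_gap(ego_traj, agent_traj):
        return min(math.dist(e, a) for e, a in zip(ego_traj, agent_traj))

    def select_plan(candidates, agent_traj, w_safety=10.0):
        """candidates: list of (trajectory, predicted_probability) pairs."""
        def cost(item):
            traj, prob = item
            safety = max(0.0, 3.0 - min_gap(traj, agent_traj))  # assumed 3 m margin
            return w_safety * safety - math.log(prob)
        return min(candidates, key=cost)[0]

    agent  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # predicted other agent
    fast   = [(0.0, 1.0), (1.5, 1.0), (3.0, 1.0)]   # likely mode, but only 1 m gap
    swerve = [(0.0, 4.0), (1.0, 4.5), (2.0, 5.0)]   # less likely, but clear
    plan = select_plan([(fast, 0.7), (swerve, 0.3)], agent)
    ```

    The safety weight dominates here, so the lower-probability but clear trajectory wins; with both candidates clear, the probability term would decide instead.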

    Learning from Demonstrations of Critical Driving Behaviours Using Driver's Risk Field

    In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous work on IL planners shows sample inefficiency and low generalisation in safety-critical scenarios, on which they are rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model using the spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weakness of the learnt IL policy by synthetically generating critical scenarios through optimisation of parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated with the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve the learnt IL policies in closed-loop. Finally, we show that our IL planner, developed with 30 times fewer training resources, still outperforms the previous state of the art.
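    The spline coefficient parameterisation mentioned above means the planner outputs a few polynomial coefficients per axis rather than raw waypoints, and the trajectory is recovered by evaluating them over the horizon. A minimal sketch, with assumed degree and horizon rather than the paper's choices:

    ```python
    # Hypothetical spline-coefficient rollout: coefficients per axis are
    # evaluated over the planning horizon to produce (x, y) waypoints.

    def eval_poly(coeffs, t):
        # Horner's rule; coeffs are ordered highest degree first.
        acc = 0.0
        for c in coeffs:
            acc = acc * t + c
        return acc

    def rollout(x_coeffs, y_coeffs, horizon=2.0, steps=5):
        ts = [horizon * i / (steps - 1) for i in range(steps)]
        return [(eval_poly(x_coeffs, t), eval_poly(y_coeffs, t)) for t in ts]

    # A gentle left lane change: constant-speed x, cubic ease-in y.
    traj = rollout(x_coeffs=[10.0, 0.0], y_coeffs=[0.5, 0.0, 0.0, 0.0])
    ```

    A few coefficients constrain the output to smooth curves by construction, which is one reason such parameterisations help safety compared to predicting free-form waypoints.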

    Increasing the Efficiency of Policy Learning for Autonomous Vehicles by Multi-Task Representation Learning

    Driving in a dynamic, multi-agent, and complex urban environment is a difficult task requiring a complex decision-making policy. The learning of such a policy requires a state representation that can encode the entire environment. Mid-level representations that encode a vehicle's environment as images have become a popular choice. Still, they are quite high-dimensional, limiting their use in data-hungry approaches such as reinforcement learning. In this article, we propose to learn a low-dimensional and rich latent representation of the environment by leveraging the knowledge of relevant semantic factors. To do this, we train an encoder-decoder deep neural network to predict multiple application-relevant factors such as the trajectories of other agents and the ego car. Furthermore, we propose a hazard signal based on other vehicles' future trajectories and the planned route, which is used in conjunction with the learned latent representation as input to a downstream policy. We demonstrate that using the multi-head encoder-decoder neural network results in a more informative representation than a standard single-head model. In particular, the proposed representation learning and the hazard signal help reinforcement learning to learn faster, with increased performance and less data than baseline methods.
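    The multi-head structure above, one shared encoder, several decoder heads, plus a hazard scalar appended to the latent, can be shown framework-free with stand-in functions. All maps, shapes, and the hazard test below are illustrative assumptions, not the trained networks.

    ```python
    # Hypothetical multi-head encoder-decoder skeleton: a shared encoder
    # compresses the observation to a small latent; separate heads decode
    # different semantic factors; a hazard scalar is appended for the policy.

    def encoder(obs):
        # Stand-in for a learned network: compress to a 2-D latent.
        return [sum(obs) / len(obs), max(obs) - min(obs)]

    def ego_head(z):
        return [z[0], z[0] + 1.0]          # "predicted" ego positions

    def agents_head(z):
        return [z[1], z[1] * 2.0]          # "predicted" agent positions

    def hazard_signal(agent_future, planned_route):
        # 1.0 when any predicted agent position lies on the planned route.
        return 1.0 if any(p in planned_route for p in agent_future) else 0.0

    obs = [0.0, 1.0, 2.0, 3.0]             # stand-in for a mid-level image
    z = encoder(obs)                        # low-dimensional latent
    preds = {"ego": ego_head(z), "agents": agents_head(z)}
    policy_input = z + [hazard_signal(preds["agents"], planned_route=[3.0, 6.0])]
    ```

    Training the heads jointly forces the shared latent to carry all the decoded factors, which is what makes it more informative than a single-head bottleneck; the downstream policy then sees only the small `policy_input` vector.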