14,291 research outputs found

    Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving

    Tactical decision making for autonomous driving is challenging due to the diversity of environments, the uncertainty in sensor information, and the complex interaction with other road users. This paper introduces a general framework for tactical decision making that combines planning and learning, in the form of Monte Carlo tree search and deep reinforcement learning. The method is based on the AlphaGo Zero algorithm, extended to a domain with a continuous state space where self-play cannot be used. The framework is applied to two highway driving cases in a simulated environment, where it performs better than a commonly used baseline method. The strength of combining planning and learning is also illustrated by a comparison with using the Monte Carlo tree search or the neural network policy separately.
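
    The AlphaGo Zero-style search described above guides tree expansion with a learned policy prior and value estimate. A minimal sketch of the PUCT action-selection rule at one tree node, with illustrative driving actions and statistics (the function and variable names are assumptions, not the paper's code):

```python
import math

def puct_select(stats, c_puct=1.5):
    """Pick the child action maximizing Q + U, where U is an exploration
    bonus weighted by the policy network's prior p and shrinking with the
    action's visit count n (w is the accumulated value)."""
    total_n = sum(s["n"] for s in stats.values())
    def score(action):
        s = stats[action]
        q = s["w"] / s["n"] if s["n"] > 0 else 0.0
        u = c_puct * s["p"] * math.sqrt(total_n) / (1 + s["n"])
        return q + u
    return max(stats, key=score)

# toy node: three maneuvers with priors from a policy network
stats = {
    "keep":  {"n": 10, "w": 6.0, "p": 0.5},
    "left":  {"n": 2,  "w": 1.5, "p": 0.3},
    "right": {"n": 0,  "w": 0.0, "p": 0.2},
}
print(puct_select(stats))
```

    Here the under-explored "left" action wins despite "keep" having the most visits, which is exactly the planning/learning trade-off the search exploits.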

    Autonomous Highway Driving using Deep Reinforcement Learning

    The operational space of an autonomous vehicle (AV) can be diverse and vary significantly. This may lead to a scenario that was not postulated in the design phase. Due to this, formulating a rule based decision maker for selecting maneuvers may not be ideal. Similarly, it may not be effective to design an a-priori cost function and then solve the optimal control problem in real-time. In order to address these issues and to avoid peculiar behaviors when encountering unforeseen scenario, we propose a reinforcement learning (RL) based method, where the ego car, i.e., an autonomous vehicle, learns to make decisions by directly interacting with simulated traffic. The decision maker for AV is implemented as a deep neural network providing an action choice for a given system state. In a critical application such as driving, an RL agent without explicit notion of safety may not converge or it may need extremely large number of samples before finding a reliable policy. To best address the issue, this paper incorporates reinforcement learning with an additional short horizon safety check (SC). In a critical scenario, the safety check will also provide an alternate safe action to the agent provided if it exists. This leads to two novel contributions. First, it generalizes the states that could lead to undesirable "near-misses" or "collisions ". Second, inclusion of safety check can provide a safe and stable training environment. This significantly enhances learning efficiency without inhibiting meaningful exploration to ensure safe and optimal learned behavior. We demonstrate the performance of the developed algorithm in highway driving scenario where the trained AV encounters varying traffic density in a highway setting
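
    The short-horizon safety check can be read as a filter between the RL policy and the environment. A minimal sketch of that control flow, assuming a boolean check and a list of fallback maneuvers (the names and the last-resort fallback are illustrative assumptions, not the paper's implementation):

```python
def safe_action(state, agent_action, safety_check, fallback_actions):
    """Keep the RL agent's proposed action if the short-horizon check deems
    it safe; otherwise return an alternate safe action when one exists."""
    if safety_check(state, agent_action):
        return agent_action
    for candidate in fallback_actions:
        if safety_check(state, candidate):
            return candidate      # alternate safe action supplied by the SC
    return "hard_brake"           # assumed last resort when nothing passes

# toy check: accelerating into a small gap is unsafe
def gap_check(state, action):
    return not (state["gap_m"] < 10 and action == "accelerate")

print(safe_action({"gap_m": 5}, "accelerate", gap_check, ["keep_lane"]))
```

    During training, substituting the safe action this way keeps the environment stable without preventing the agent from exploring elsewhere.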

    Autonomous Ramp Merge Maneuver Based on Reinforcement Learning with Continuous Action Space

    Ramp merging is a critical maneuver for road safety and traffic efficiency. Most current automated driving systems developed by automobile manufacturers and suppliers are limited to restricted-access freeways. Extending the automated mode to ramp merging zones presents substantial challenges. One is that the automated vehicle needs to incorporate a future objective (e.g., a successful and smooth merge) and optimize a long-term reward that is impacted by subsequent actions when executing the current action. Furthermore, the merging process involves interaction between the merging vehicle and its surrounding vehicles, whose behavior may be cooperative or adversarial, leading to distinct merging countermeasures that are crucial to successfully completing the merge. In place of conventional rule-based approaches, we propose to apply a reinforcement learning algorithm to the automated vehicle agent to find an optimal driving policy by maximizing the long-term reward in an interactive driving environment. Most importantly, in contrast to most reinforcement learning applications in which the action space is discrete, our approach treats both the action space and the state space as continuous without incurring additional computational costs. Our unique contribution is the design of the Q-function approximation, which is structured as a quadratic function whose coefficients are estimated by simple but effective neural networks. The results obtained with our training platform demonstrate that the vehicle agent is able to learn a safe, smooth and timely merging policy, indicating the effectiveness and practicality of our approach.
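
    The payoff of a quadratic Q-function over continuous actions is that the greedy action has a closed form, so no inner optimization loop is needed. A sketch under the assumption Q(s, a) = aᵀP a + qᵀa + c with P negative definite (in the paper's spirit, P and q would come from small neural networks conditioned on the state; the coefficients below are illustrative):

```python
import numpy as np

def quadratic_q_argmax(P, q):
    """For Q(s, a) = a^T P a + q^T a + c with P negative definite, the
    maximizing continuous action is a* = -0.5 * P^{-1} q (set the
    gradient 2 P a + q to zero and solve)."""
    return -0.5 * np.linalg.solve(P, q)

P = np.array([[-2.0, 0.0],
              [0.0, -1.0]])          # negative definite quadratic term
q = np.array([4.0, 2.0])
a_star = quadratic_q_argmax(P, q)    # e.g., an (acceleration, steering) pair
print(a_star)
```

    This is why the continuous action space incurs no extra computational cost relative to a discrete argmax.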

    Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction

    This paper presents a method for constructing human-robot interaction policies in settings where multimodality, i.e., the possibility of multiple highly distinct futures, plays a critical role in decision making. We are motivated by the example of traffic weaving, e.g., at highway on-ramps/off-ramps, where entering and exiting cars must swap lanes within a short distance: a challenging negotiation even for experienced drivers due to the inherent multimodal uncertainty of who will pass whom. Our approach is to learn multimodal probability distributions over future human actions from a dataset of human-human exemplars and to perform real-time robot policy construction in the resulting environment model through massively parallel sampling of human responses to candidate robot action sequences. Direct learning of these distributions is made possible by recent advances in the theory of conditional variational autoencoders (CVAEs), whereby we learn action distributions conditioned simultaneously on the present interaction history and on candidate future robot actions, in order to take response dynamics into account. We demonstrate the efficacy of this approach with a human-in-the-loop simulation of a traffic weaving scenario.
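
    The policy-construction loop above can be sketched serially (the paper runs it massively in parallel): score each candidate robot action sequence by sampling human responses from the learned conditional model. `sample_human` stands in for a trained CVAE decoder; all names and the toy model are assumptions for illustration:

```python
def evaluate_candidates(candidates, sample_human, reward, history, k=100):
    """For each candidate robot action sequence, draw k human responses
    conditioned on the interaction history and that sequence, and keep
    the sequence with the best average reward."""
    best_seq, best_value = None, float("-inf")
    for robot_seq in candidates:
        value = sum(reward(robot_seq, sample_human(history, robot_seq))
                    for _ in range(k)) / k
        if value > best_value:
            best_seq, best_value = robot_seq, value
    return best_seq, best_value

# stub response model: the human passes if the robot accelerates, else yields
def sample_human(history, robot_seq):
    return "pass" if robot_seq == "accelerate" else "yield"

def reward(robot_seq, human_action):
    return 1.0 if human_action == "yield" else 0.5   # prefer being let in

best, value = evaluate_candidates(["accelerate", "decelerate"],
                                  sample_human, reward, history=[])
print(best)
```

    Conditioning the sampler on the candidate robot sequence is what lets the planner account for response dynamics rather than treating the human as open-loop.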

    Unconventional Arterial Intersection Designs under Connected and Automated Vehicle Environment: A Survey

    Signalized intersections are major sources of traffic delay and collisions within the modern transportation system. Conventional signal optimization has revealed its limitations in improving the mobility and safety of an intersection. Unconventional arterial intersection designs (UAIDs) are able to improve the performance of an intersection by reducing the number of phases in a signal cycle. Furthermore, they can fundamentally alter the number and the nature of conflict points. However, driver confusion resulting from the unconventional geometric designs remains one of the major barriers to the widespread adoption of UAIDs. Connected and Automated Vehicle (CAV) technology has the potential to overcome this barrier by eliminating driver confusion over a UAID. Therefore, UAIDs can play a significant role in transportation networks in the near future. In this paper, we survey UAID studies and implementations. In addition, we present an overview of intersection control schemes under emerging CAV technology and highlight the opportunities it creates for UAIDs. We believe that the benefits gained from deploying UAIDs in conjunction with CAV are significant during the initial rollout of CAV under low market penetration.

    Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge

    Multiple automakers have in development or in production automated driving systems (ADS) that offer freeway-pilot functions. This type of ADS is typically limited to restricted-access freeways only; that is, the transition from manual to automated mode takes place only after the ramp merging process has been completed manually. One major challenge in extending the automation to ramp merging is that the automated vehicle needs to incorporate and optimize long-term objectives (e.g., a successful and smooth merge) while near-term actions must be safely executed. Moreover, the merging process involves interactions with other vehicles whose behaviors are sometimes hard to predict but may influence the merging vehicle's optimal actions. To tackle such a complicated control problem, we propose to apply Deep Reinforcement Learning (DRL) techniques to find an optimal driving policy by maximizing the long-term reward in an interactive environment. Specifically, we apply a Long Short-Term Memory (LSTM) architecture to model the interactive environment, from which an internal state containing historical driving information is conveyed to a Deep Q-Network (DQN). The DQN is used to approximate the Q-function, which takes the internal state as input and generates Q-values as output for action selection. With this DRL architecture, the historical impact of the interactive environment on the long-term reward can be captured and taken into account when deciding the optimal control policy. The proposed architecture has the potential to be extended and applied to other autonomous driving scenarios, such as driving through a complex intersection or changing lanes under varying traffic flow conditions.
    Comment: IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, 201
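
    The LSTM-to-DQN dataflow can be sketched independently of any deep learning framework: the recurrent step folds the driving history into an internal state, and the Q-head scores each discrete maneuver. `lstm_step` and `q_network` stand in for the trained networks; every name here is a hypothetical placeholder:

```python
def select_action(history, lstm_step, q_network, actions):
    """Fold the interaction history into an internal state h via the
    recurrent step, map h to one Q-value per discrete maneuver, and
    return the greedy maneuver."""
    h = None
    for observation in history:       # accumulate historical driving info
        h = lstm_step(h, observation)
    q_values = q_network(h)           # one Q-value per candidate action
    return max(zip(actions, q_values), key=lambda pair: pair[1])[0]

# stand-ins: a "memory" that sums observations, a fixed linear Q head
lstm_step = lambda h, obs: (h or 0.0) + obs
q_network = lambda h: [h, -h, 0.0]
print(select_action([1.0, 2.0, -5.0], lstm_step, q_network,
                    ["merge", "wait", "keep"]))
```

    Because the action choice depends on h rather than on the latest observation alone, the history of the interactive environment influences the selected maneuver, which is the point of the architecture.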

    Transfer Learning versus Multi-agent Learning regarding Distributed Decision-Making in Highway Traffic

    Transportation and traffic are currently undergoing a rapid increase in both scale and complexity. At the same time, an increasing share of traffic participants are being transformed into agents driven or supported by artificial intelligence, resulting in mixed-intelligence traffic. This work explores the implications of distributed decision-making in mixed-intelligence traffic. The investigations are carried out on the basis of an online-simulated highway scenario, namely the MIT DeepTraffic simulation. In a first step, traffic agents are trained by means of a deep reinforcement learning approach deployed inside an elitist evolutionary algorithm for hyperparameter search. The resulting architectures and training parameters are then utilized either to train a single autonomous traffic agent and transfer the learned weights onto a multi-agent scenario, or to conduct multi-agent learning directly. Both learning strategies are evaluated on different ratios of mixed-intelligence traffic and assessed according to the average speed of all agents driven by artificial intelligence. Traffic patterns that provoke a reduction in traffic flow are analyzed with respect to the different strategies.
    Comment: Proc. of the 10th International Workshop on Agents in Traffic and Transportation (ATT 2018), co-located with ECAI/IJCAI, AAMAS and ICML 2018 conferences (FAIM 2018)
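
    The transfer-learning variant boils down to seeding every agent in the multi-agent scenario with the weights learned by the single agent. A minimal sketch, assuming weights are held in a plain dictionary (the structure and values are illustrative, not the paper's):

```python
import copy

def transfer_to_multiagent(trained_weights, n_agents):
    """Seed each agent in the multi-agent scenario with a deep copy of
    the single agent's learned weights, so subsequent per-agent updates
    remain independent of one another."""
    return [copy.deepcopy(trained_weights) for _ in range(n_agents)]

single = {"hidden": [0.12, -0.40], "out": [0.07]}   # illustrative weights
fleet = transfer_to_multiagent(single, 3)
fleet[0]["hidden"][0] = 9.9          # updating one agent...
print(single["hidden"][0])           # ...leaves the donor weights untouched
```

    The deep copy is the design choice that distinguishes transfer from weight sharing: after seeding, each agent can diverge, which is what the paper's multi-agent evaluation then measures against learning directly in the multi-agent setting.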

    A System's Perspective Towards an Architecture Framework for Safe Automated Vehicles

    With an increasing degree of automation, automated vehicle systems become more complex in terms of functional components as well as interconnected hardware and software components. Holistic systems engineering thus becomes a severe challenge. Emergent properties like system safety cannot be argued from a single viewpoint, such as a structural representation of software or electrical wiring (e.g., fault tolerance). This underscores the need for several viewpoints on a system and for describing correspondences between these views, in order to enable traceability of emergent system properties. Today, the most abstract view found in architecture frameworks is a logical description of system functions, which structures the system in terms of information flow and functional components. In this article, we extend established system viewpoints toward a capability-based assessment of an automated vehicle and conduct an exemplary safety analysis to derive behavioral safety requirements. These requirements can afterwards be attributed to different viewpoints in an architecture framework and thus be integrated into a development process for automated vehicles.
    Comment: 8 pages, 6 figures. Submitted to the 2018 IEEE ITS

    Collective behavior and emergent risks in a model of human- and autonomously-driven vehicles

    While much effort has been invested in studies of traffic flow as a physics problem, two emerging trends in technology have broadened the subject for new investigations. The first trend is the development of self-driving vehicles. This highly anticipated shift from human to autonomous drivers is expected to offer substantial benefits for traffic throughput by streamlining large-scale collective behavior. The second trend is the widespread hacking of Internet-connected devices, which, as of 2015, includes vehicles. While the first proof-of-concept automobile hack was done at the single-vehicle scale, undesirable collective effects can easily arise if this activity becomes more common. Motivated by these two trends, we explore the phenomena that arise in an active matter model with lanes and lane-changing behavior. Our model incorporates a simplified, minimal description of the essential differences between human and autonomous drivers. We study the emergent collective behavior as the population of vehicles shifts from all-human to all-autonomous. Within the context of our model, we explore a worst-case scenario in which Internet-connected autonomous vehicles are disabled simultaneously and en masse. Our approach reveals a model-independent role for percolation in interpreting the results. A broad lesson our work highlights is that seemingly minor malicious activity can ultimately have major impacts when magnified through the action of collective behavior.
    Comment: 6 pages, 4 figures; plus Supplemental Material
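
    The worst-case scenario's basic bookkeeping can be rendered as a toy: after connected AVs are disabled en masse, check which lanes still carry traffic. This is only an illustration of the setup; the paper's model is far richer (lane changes, the percolation analysis) and none of this is its actual code:

```python
def open_lanes(road):
    """Treat a lane as blocked once any disabled vehicle ('X') sits in
    it; return the indices of the lanes that remain passable."""
    return [i for i, lane in enumerate(road) if "X" not in lane]

road = ["..A..",   # lane 0: active vehicles only
        ".X..A",   # lane 1: contains one disabled AV
        "A...A"]   # lane 2: active vehicles only
print(open_lanes(road))
```

    Even this toy hints at the percolation framing: throughput collapses not when disabled vehicles appear, but when no open path through the lanes remains.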

    Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions

    We consider the paradigm of a black-box AI system that makes life-critical decisions. We propose an "arguing machines" framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of the underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline, given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification, achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
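
    The arguing-machines control flow needs no access to either model's internals; it only compares their outputs. A minimal sketch, where the model callables and the disagreement test are placeholders rather than anything from the paper:

```python
def arbitrate(primary, secondary, x, disagree):
    """Run two independently trained models on the same input; if their
    outputs disagree, escalate to a human supervisor instead of trusting
    either black box, otherwise proceed automatically."""
    p, s = primary(x), secondary(x)
    if disagree(p, s):
        return ("human_review", p, s)
    return ("automated", p, s)

# toy models with different decision thresholds on the same signal
primary = lambda x: "engaged" if x > 0.0 else "disengage"
secondary = lambda x: "engaged" if x > 1.0 else "disengage"
print(arbitrate(primary, secondary, 0.5, lambda a, b: a != b))
```

    Because escalation is triggered purely by disagreement, the framework applies unchanged to image classifiers and to driving systems, as in the paper's two applications.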