14,291 research outputs found
Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving
Tactical decision making for autonomous driving is challenging due to the
diversity of environments, the uncertainty in the sensor information, and the
complex interaction with other road users. This paper introduces a general
framework for tactical decision making, which combines the concepts of planning
and learning, in the form of Monte Carlo tree search and deep reinforcement
learning. The method is based on the AlphaGo Zero algorithm, which is extended
to a domain with a continuous state space where self-play cannot be used. The
framework is applied to two different highway driving cases in a simulated
environment and it is shown to perform better than a commonly used baseline
method. The strength of combining planning and learning is also illustrated by
a comparison to using the Monte Carlo tree search or the neural network policy
separately.
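The planning-and-learning combination described above centers on guiding tree search with a learned policy and value network. A minimal sketch of the PUCT action-selection rule used in AlphaGo Zero-style search is shown below; the driving actions, priors, and values are invented for illustration and are not taken from the paper.

```python
import math

def puct_score(q, p, n_parent, n_edge, c_puct=1.0):
    """Upper-confidence score balancing the mean action value Q against
    the network prior P and the edge's visit count N."""
    return q + c_puct * p * math.sqrt(n_parent) / (1 + n_edge)

def select_action(edges, c_puct=1.5):
    """Pick the edge with the highest PUCT score.
    `edges` maps action -> (Q, prior P, visit count N)."""
    n_parent = sum(n for _, _, n in edges.values())
    return max(edges, key=lambda a: puct_score(edges[a][0], edges[a][1],
                                               n_parent, edges[a][2], c_puct))

# Hypothetical statistics for one tree node in a highway-driving search:
edges = {
    "keep_lane":   (0.10, 0.60, 8),   # (Q, prior, visits)
    "change_left": (0.30, 0.25, 3),
    "brake":       (0.05, 0.15, 1),
}
print(select_action(edges))
```

Repeating this selection down the tree, backing up value estimates, and finally acting on visit counts is the search loop the framework builds on.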
Autonomous Highway Driving using Deep Reinforcement Learning
The operational space of an autonomous vehicle (AV) can be diverse and vary
significantly. This may lead to a scenario that was not postulated in the
design phase. Because of this, formulating a rule-based decision maker for
selecting maneuvers may not be ideal. Similarly, it may not be effective to
design an a-priori cost function and then solve the optimal control problem in
real-time. In order to address these issues and to avoid peculiar behaviors
when encountering unforeseen scenarios, we propose a reinforcement learning (RL)
based method, where the ego car, i.e., an autonomous vehicle, learns to make
decisions by directly interacting with simulated traffic. The decision maker
for AV is implemented as a deep neural network providing an action choice for a
given system state. In a critical application such as driving, an RL agent
without an explicit notion of safety may not converge, or it may need an
extremely large number of samples before finding a reliable policy. To
address this issue, this paper combines reinforcement learning with an
additional short-horizon safety check (SC). In a critical scenario, the
safety check also provides an alternate safe action to the agent, if one
exists. This leads
to two novel contributions. First, it generalizes the states that could lead to
undesirable "near-misses" or "collisions". Second, the inclusion of the
safety check provides a safe and stable training environment. This significantly enhances
learning efficiency without inhibiting meaningful exploration to ensure safe
and optimal learned behavior. We demonstrate the performance of the developed
algorithm in a highway driving scenario where the trained AV encounters
varying traffic densities.
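The short-horizon safety check described above can be sketched as a wrapper around the RL policy's proposed action: simulate a few steps ahead and substitute a fallback safe action if a constraint is violated. The dynamics, thresholds, and fallback below are invented for illustration and are not the paper's implementation.

```python
MIN_GAP = 10.0  # assumed minimum safe headway, in metres

def rollout_gap(gap, ego_speed, lead_speed, accel, steps=3, dt=0.5):
    """Forward-simulate the ego-to-lead gap for a short horizon under a
    constant ego acceleration and constant lead-vehicle speed."""
    for _ in range(steps):
        ego_speed += accel * dt
        gap += (lead_speed - ego_speed) * dt
    return gap

def safety_check(state, proposed_accel, safe_accel=-2.0):
    """Return the RL agent's proposed action if the short-horizon rollout
    stays safe, otherwise an alternate safe (braking) action."""
    gap, ego_speed, lead_speed = state
    if rollout_gap(gap, ego_speed, lead_speed, proposed_accel) >= MIN_GAP:
        return proposed_accel
    return safe_accel

print(safety_check((12.0, 20.0, 20.0), 2.0))  # unsafe: fallback is returned
print(safety_check((50.0, 20.0, 20.0), 2.0))  # safe: proposal passes through
```

During training, overriding unsafe proposals this way keeps the agent out of collision states while still letting it explore the rest of the action space.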
Autonomous Ramp Merge Maneuver Based on Reinforcement Learning with Continuous Action Space
Ramp merging is a critical maneuver for road safety and traffic efficiency.
Most of the current automated driving systems developed by multiple automobile
manufacturers and suppliers are typically limited to restricted access freeways
only. Extending the automated mode to ramp merging zones presents substantial
challenges. One is that the automated vehicle needs to incorporate a future
objective (e.g. a successful and smooth merge) and optimize a long-term reward
that is impacted by subsequent actions when executing the current action.
Furthermore, the merging process involves interaction between the merging
vehicle and its surrounding vehicles whose behavior may be cooperative or
adversarial, leading to distinct merging countermeasures that are crucial to
successfully complete the merge. In place of the conventional rule-based
approaches, we propose to apply a reinforcement learning algorithm to the
automated vehicle agent to find an optimal driving policy by maximizing the
long-term reward in an interactive driving environment. Most importantly, in
contrast to most reinforcement learning applications in which the action space
is resolved as discrete, our approach treats the action space as well as the
state space as continuous without incurring additional computational costs. Our
unique contribution is the design of the Q-function approximation whose format
is structured as a quadratic function, by which simple but effective neural
networks are used to estimate its coefficients. The results obtained through
the implementation of our training platform demonstrate that the vehicle agent
is able to learn a safe, smooth and timely merging policy, indicating the
effectiveness and practicality of our approach.
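The key property of the quadratic Q-function structure described above is that the greedy continuous action has a closed form: with Q(s, a) = V(s) - p(s) * (a - mu(s))^2 and p(s) > 0, the maximizer over a is simply mu(s), so no discretization or numerical search over actions is needed. The scalar coefficients below stand in for the paper's neural-network estimators and are hand-picked for illustration.

```python
def q_value(v, p, mu, action):
    """Quadratic Q: a state-value term minus a positive penalty on the
    distance of `action` from the predicted optimal action mu."""
    return v - p * (action - mu) ** 2

def greedy_action(v, p, mu):
    """The closed-form maximizer of the quadratic Q is its vertex mu."""
    return mu

# Hypothetical coefficient values for one state (a real system would
# regress v, p, mu from the state with small networks):
v, p, mu = 1.0, 0.5, 0.8
best = greedy_action(v, p, mu)
print(best, q_value(v, p, mu, best))
```

Any other action scores strictly lower, e.g. q_value(v, p, mu, 0.0) here evaluates to 0.68 versus 1.0 at the vertex, which is what makes continuous action selection cheap.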
Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction
This paper presents a method for constructing human-robot interaction
policies in settings where multimodality, i.e., the possibility of multiple
highly distinct futures, plays a critical role in decision making. We are
motivated in this work by the example of traffic weaving, e.g., at highway
on-ramps/off-ramps, where entering and exiting cars must swap lanes in a short
distance---a challenging negotiation even for experienced drivers due to the
inherent multimodal uncertainty of who will pass whom. Our approach is to learn
multimodal probability distributions over future human actions from a dataset
of human-human exemplars and perform real-time robot policy construction in the
resulting environment model through massively parallel sampling of human
responses to candidate robot action sequences. Direct learning of these
distributions is made possible by recent advances in the theory of conditional
variational autoencoders (CVAEs), whereby we learn action distributions
simultaneously conditioned on the present interaction history, as well as
candidate future robot actions in order to take into account response dynamics.
We demonstrate the efficacy of this approach with a human-in-the-loop
simulation of a traffic weaving scenario.
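The policy-construction step described above can be sketched as follows: for each candidate robot action sequence, draw many samples of human responses from a learned conditional model, score each joint future, and pick the candidate with the best expected score. The Gaussian sampler and quadratic interaction cost below are placeholders for the paper's CVAE and objective, invented purely for illustration.

```python
import random

random.seed(0)

def sample_human_response(history, robot_plan):
    """Placeholder for CVAE sampling conditioned on the interaction
    history and a candidate robot plan."""
    return random.gauss(0.0, 1.0)

def expected_cost(history, robot_plan, n_samples=500):
    """Monte-Carlo estimate of cost over sampled human responses."""
    total = 0.0
    for _ in range(n_samples):
        human = sample_human_response(history, robot_plan)
        total += (robot_plan - human) ** 2  # illustrative interaction cost
    return total / n_samples

candidates = [-1.0, 0.0, 1.0]  # hypothetical robot action plans
best = min(candidates, key=lambda plan: expected_cost([], plan))
print(best)
```

Because the samples are independent, this evaluation parallelizes trivially across candidates and samples, which is what enables the real-time, massively parallel policy construction the abstract describes.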
Unconventional Arterial Intersection Designs under Connected and Automated Vehicle Environment: A Survey
Signalized intersections are major sources of traffic delay and collision
within the modern transportation system. Conventional signal optimization has
revealed its limitation in improving the mobility and safety of an
intersection. Unconventional arterial intersection designs (UAIDs) are able to
improve the performance of an intersection by reducing phases of a signal
cycle. Furthermore, they can fundamentally alter the number and the nature of
the conflict points. However, driver confusion caused by the unconventional
geometric designs remains one of the major barriers to the widespread
adoption of UAIDs. Connected and Automated Vehicle (CAV) technology has the
potential to overcome this barrier by eliminating driver confusion at a
UAID. Therefore, UAIDs can play a significant role in
transportation networks in the near future. In this paper, we survey UAID
studies and implementations. In addition, we present an overview of
intersection control schemes emerging with CAV and highlight the
opportunities that CAV technology creates for UAIDs. We believe that the
benefits gained from deploying UAIDs in conjunction with CAVs are
significant during the initial rollout of CAVs under low market penetration.
Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge
Multiple automakers have automated driving systems (ADS) in development or
in production that offer freeway-pilot functions. This type of ADS is typically
limited to restricted-access freeways only, that is, the transition from manual
to automated modes takes place only after the ramp merging process is completed
manually. One major challenge to extend the automation to ramp merging is that
the automated vehicle needs to incorporate and optimize long-term objectives
(e.g. successful and smooth merge) when near-term actions must be safely
executed. Moreover, the merging process involves interactions with other
vehicles whose behaviors are sometimes hard to predict but may influence the
merging vehicle's optimal actions. To tackle such a complicated control problem,
we propose to apply Deep Reinforcement Learning (DRL) techniques for finding an
optimal driving policy by maximizing the long-term reward in an interactive
environment. Specifically, we apply a Long Short-Term Memory (LSTM)
architecture to model the interactive environment, from which an internal state
containing historical driving information is conveyed to a Deep Q-Network
(DQN). The DQN is used to approximate the Q-function, which takes the internal
state as input and generates Q-values as output for action selection. With this
DRL architecture, the historical impact of interactive environment on the
long-term reward can be captured and taken into account for deciding the
optimal control policy. The proposed architecture has the potential to be
extended and applied to other autonomous driving scenarios such as driving
through a complex intersection or changing lanes under varying traffic flow
conditions.
Comment: IEEE International Conference on Intelligent Transportation Systems, Yokohama, Japan, 201
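The data flow of the architecture described above is: a recurrent summarizer compresses the driving history into an internal state, and a Q-head maps that state to one Q-value per discrete action. The exponential-average "recurrence" and linear head below are toy stand-ins for the paper's LSTM and DQN, with invented actions and weights, intended only to show the pipeline's shape.

```python
ACTIONS = ["accelerate", "hold", "brake"]

def summarize_history(observations, decay=0.7):
    """Stand-in for the LSTM: fold the observation sequence into a
    fixed-size internal state (here, one recency-weighted feature)."""
    state = 0.0
    for obs in observations:
        state = decay * state + (1 - decay) * obs
    return state

def q_values(internal_state, weights=(1.0, 0.0, -1.0), bias=0.1):
    """Stand-in for the DQN: map the internal state to one Q-value per
    discrete action."""
    return {a: w * internal_state + bias for a, w in zip(ACTIONS, weights)}

history = [0.2, 0.5, 0.9]  # e.g. a growing gap to the lead vehicle
qs = q_values(summarize_history(history))
print(max(qs, key=qs.get))
```

The point of the recurrent front end is exactly this folding step: actions are selected from a state that carries historical interaction information, not from the latest observation alone.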
Transfer Learning versus Multi-agent Learning regarding Distributed Decision-Making in Highway Traffic
Transportation and traffic are currently undergoing a rapid increase in terms
of both scale and complexity. At the same time, an increasing share of traffic
participants are being transformed into agents driven or supported by
artificial intelligence resulting in mixed-intelligence traffic. This work
explores the implications of distributed decision-making in mixed-intelligence
traffic. The investigations are carried out on the basis of an online-simulated
highway scenario, namely the MIT \emph{DeepTraffic} simulation. In the first
step traffic agents are trained by means of a deep reinforcement learning
approach, being deployed inside an elitist evolutionary algorithm for
hyperparameter search. The resulting architectures and training parameters are
then utilized in order to either train a single autonomous traffic agent and
transfer the learned weights onto a multi-agent scenario or else to conduct
multi-agent learning directly. Both learning strategies are evaluated on
different ratios of mixed-intelligence traffic. The strategies are assessed
according to the average speed of all agents driven by artificial intelligence.
Traffic patterns that provoke a reduction in traffic flow are analyzed with
respect to the different strategies.
Comment: Proc. of the 10th International Workshop on Agents in Traffic and Transportation (ATT 2018), co-located with ECAI/IJCAI, AAMAS and ICML 2018 conferences (FAIM 2018
A System's Perspective Towards an Architecture Framework for Safe Automated Vehicles
With an increasing degree of automation, automated vehicle systems become
more complex in terms of functional components as well as interconnected
hardware and software components. Thus, holistic systems engineering becomes a
severe challenge. Emergent properties like system safety cannot be argued
from singular viewpoints alone, such as structural representations of
software or electrical wiring (e.g. fault tolerance). This underscores the
need to take several viewpoints on a system and describe correspondences
between these views in order to enable traceability of emergent system
properties. Today, the most
abstract view found in architecture frameworks is a logical description of
system functions which structures the system in terms of information flow and
functional components. In this article we extend established system viewpoints
towards a capability-based assessment of an automated vehicle and conduct an
exemplary safety analysis to derive behavioral safety requirements. These
requirements can afterwards be attributed to different viewpoints in an
architecture framework and thus be integrated into a development process for
automated vehicles.
Comment: 8 pages, 6 figures. Submitted to the 2018 IEEE ITS
Collective behavior and emergent risks in a model of human- and autonomously-driven vehicles
While much effort has been invested in studies of traffic flow as a physics
problem, two emerging trends in technology have broadened the subject for new
investigations. The first trend is the development of self-driving vehicles.
This highly-anticipated shift from human- to autonomous-drivers is expected to
offer substantial benefits for traffic throughput by streamlining large-scale
collective behavior. The second trend is the widespread hacking of
Internet-connected devices, which as of 2015, includes vehicles. While the
first proof-of-concept automobile hack was done at the single-vehicle scale,
undesirable collective effects can easily arise if this activity becomes more
common. Motivated by these two trends, we explore the phenomena that arise in
an active matter model with lanes and lane-changing behavior. Our model
incorporates a simplified minimal description of essential differences between
human- and autonomous-drivers. We study the emergent collective behavior as the
population of vehicles shifts from all-human to all-autonomous. Within the
context of our model, we explore a worst-case scenario where Internet-connected
autonomous vehicles are disabled simultaneously and \textit{en masse}. Our
approach reveals a model-independent role for percolation in interpreting the
results. A broad lesson our work highlights is that seemingly minor malicious
activity can ultimately have major impacts when magnified through the action of
collective behavior.
Comment: 6 pages, 4 figures; plus Supplemental Material
Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
We consider the paradigm of a black box AI system that makes life-critical
decisions. We propose an "arguing machines" framework that pairs the primary AI
system with a secondary one that is independently trained to perform the same
task. We show that disagreement between the two systems, without any knowledge
of underlying system design or operation, is sufficient to arbitrarily improve
the accuracy of the overall decision pipeline given human supervision over
disagreements. We demonstrate this system in two applications: (1) an
illustrative example of image classification and (2) on large-scale real-world
semi-autonomous driving data. For the first application, we apply this
framework to image classification achieving a reduction from 8.0% to 2.8% top-5
error on ImageNet. For the second application, we apply this framework to Tesla
Autopilot and demonstrate the ability to predict 90.4% of system disengagements
that were labeled by human annotators as challenging and needing human
supervision.
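The escalation rule described above can be sketched in a few lines: run two independently trained models on the same input and hand the decision to a human whenever their outputs disagree beyond a threshold. The steering-style outputs and the threshold value below are invented placeholders, not parameters from the paper.

```python
DISAGREEMENT_THRESHOLD = 0.2  # assumed, in normalized output units

def arbitrate(primary_out, secondary_out):
    """Return (decision, escalate): the primary system's decision when
    the two systems agree, or a flag for human supervision when their
    disagreement exceeds the threshold."""
    if abs(primary_out - secondary_out) > DISAGREEMENT_THRESHOLD:
        return None, True   # escalate to a human supervisor
    return primary_out, False

print(arbitrate(0.10, 0.12))  # close outputs: primary decision stands
print(arbitrate(0.10, 0.55))  # large disagreement: escalate
```

Note that the rule needs no access to either system's internals, which is what makes the framework applicable to black-box AI systems.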
- …