An Agent-based Modelling Framework for Driving Policy Learning in Connected and Autonomous Vehicles
Due to the complexity of the natural world, a programmer cannot foresee all
possible situations a connected and autonomous vehicle (CAV) will face during
its operation; hence, CAVs will need to learn to make decisions autonomously.
Through the sensing of its surroundings and the information exchanged with
other vehicles and road infrastructure, a CAV will have access to large
amounts of useful data. While different control algorithms have been proposed
for CAVs, the benefits brought about by the connectedness of autonomous
vehicles to other vehicles and to the infrastructure, and its implications for
policy learning, have not been investigated in the literature. This paper
investigates a data-driven driving policy learning framework through an
agent-based modelling approach. The contributions of the paper are two-fold. A dynamic programming
framework is proposed for in-vehicle policy learning with and without
connectivity to neighboring vehicles. The simulation results indicate that
while a CAV can learn to make autonomous decisions, vehicle-to-vehicle (V2V)
communication of information improves this capability. Furthermore, to overcome
the limitations of sensing in a CAV, the paper proposes a novel concept for
infrastructure-led policy learning and communication with autonomous vehicles.
In infrastructure-led policy learning, road-side infrastructure senses and
captures successful vehicle maneuvers and learns an optimal policy from those
temporal sequences, and when a vehicle approaches the road-side unit, the
policy is communicated to the CAV. A deep imitation learning methodology is
proposed to develop such an infrastructure-led policy learning framework.
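To make the dynamic programming idea concrete, the sketch below shows plain tabular value iteration for a toy car-following MDP; it is an illustrative assumption, not the paper's framework. The state and action encodings, transition probabilities, and rewards are invented, and V2V connectivity would enter by enlarging the state (for example with the lead vehicle's communicated intent) so that the transition model becomes less noisy.

```python
# Minimal sketch (assumed toy example, not the paper's framework): tabular value
# iteration for a 3-state car-following MDP. With V2V connectivity the state
# could additionally encode the lead vehicle's intent, sharpening the transition
# model; without it, transitions stay noisy. All numbers below are invented.
import numpy as np

GAMMA = 0.9

def value_iteration(P, R, tol=1e-6):
    """P[s][a] -> list of (prob, next_state); R[s][a] -> reward.
    Returns the greedy policy and the state values."""
    n_states, n_actions = len(P), len(P[0])
    V = np.zeros(n_states)
    while True:
        Q = np.array([[R[s][a] + GAMMA * sum(p * V[ns] for p, ns in P[s][a])
                       for a in range(n_actions)] for s in range(n_states)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new

# States: 0 = close, 1 = medium, 2 = far gap to the lead vehicle.
# Actions: 0 = brake, 1 = keep speed.
P = [
    [[(1.0, 1)], [(0.7, 0), (0.3, 1)]],   # close: braking opens the gap
    [[(1.0, 2)], [(0.6, 1), (0.4, 0)]],   # medium
    [[(1.0, 2)], [(0.8, 2), (0.2, 1)]],   # far
]
R = [[-1.0, -10.0], [-0.5, 0.0], [0.0, 1.0]]
policy, values = value_iteration(P, R)
print("greedy action per state:", policy)
```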
Safe and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles under State Perturbations
Sensing and communication technologies have enhanced learning-based decision
making methodologies for multi-agent systems such as connected autonomous
vehicles (CAV). However, most existing safe reinforcement learning based
methods assume accurate state information. It remains challenging to achieve
safety requirements under state uncertainties for CAVs, considering the noisy
sensor measurements and the vulnerability of communication channels. In this
work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust
Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Both a robust
MARL algorithm and a control barrier function (CBF)-based safety shield are
used in our approach to cope with perturbed or uncertain state inputs. The
robust policy is trained with a worst-case Q-function regularization module
that pursues a higher lower-bounded reward, while the robust CBF safety shield
accounts for the CAVs' collision-free constraints in complicated driving
scenarios even with perturbed vehicle state
information. We validate the advantages of SR-MAPPO in robustness and safety
and compare it with baselines under different driving and state perturbation
scenarios in the CARLA simulator. The SR-MAPPO policy is verified to maintain
higher safety rates and efficiency (reward) when threatened by both state
perturbations and the dangerous behaviors of unconnected vehicles.
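As an illustration of how a CBF-based safety shield can filter a learned policy's action under bounded state perturbations, here is a minimal sketch; the car-following dynamics, barrier function, perturbation bound, and constants are assumptions and do not reproduce the paper's SR-MAPPO shield.

```python
# Minimal sketch (illustrative assumption, not the paper's SR-MAPPO shield):
# a CBF-style safety filter that caps the RL policy's longitudinal acceleration
# so a headway constraint holds even for the worst case inside a bounded
# perturbation of the measured gap. Dynamics and constants are invented.

def shielded_acceleration(a_rl, gap, v_ego, v_lead,
                          eps_gap=1.0, d_min=5.0, headway=1.5, alpha=0.5):
    """Barrier: h = gap - d_min - headway * v_ego (keep h >= 0).
    CBF condition: dh/dt >= -alpha * h with dh/dt = (v_lead - v_ego) - headway * a,
    which yields an upper bound on a, evaluated at the worst-case perturbed gap."""
    worst_gap = gap - eps_gap                       # robustness to sensing noise
    h = worst_gap - d_min - headway * v_ego
    a_max = ((v_lead - v_ego) + alpha * h) / headway
    return min(a_rl, a_max)

# The learned policy asks for +2 m/s^2, but the shield forces gentle braking.
print(shielded_acceleration(a_rl=2.0, gap=30.0, v_ego=15.0, v_lead=12.0))
```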
Spatial-Temporal-Aware Safe Multi-Agent Reinforcement Learning of Connected Autonomous Vehicles in Challenging Scenarios
Communication technologies enable coordination among connected and autonomous
vehicles (CAVs). However, it remains unclear how to utilize shared information
to improve the safety and efficiency of the CAV system. In this work, we
propose a framework of constrained multi-agent reinforcement learning (MARL)
with a parallel safety shield for CAVs in challenging driving scenarios. The
coordination mechanisms of the proposed MARL include information sharing and
cooperative policy learning, with a Graph Convolutional Network (GCN)-Transformer
as a spatial-temporal encoder that enhances each agent's environment awareness.
The safety shield module with Control Barrier Functions (CBF)-based safety
checking protects the agents from taking unsafe actions. We design a
constrained multi-agent advantage actor-critic (CMAA2C) algorithm to train safe
and cooperative policies for CAVs. With the experiments deployed in the CARLA
simulator, we verify the effectiveness of the safety checking, spatial-temporal
encoder, and coordination mechanisms designed in our method through comparative
experiments in several challenging scenarios with defined hazard vehicles
(HAZV). Results show that our proposed methodology significantly increases
system safety and efficiency in challenging scenarios.
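A minimal sketch of what a GCN-Transformer spatial-temporal encoder could look like is given below; the layer sizes, tensor shapes, and adjacency handling are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (assumed shapes and layer sizes, not the paper's exact model):
# a spatial-temporal encoder that first mixes information across connected
# vehicles with a simple graph convolution, then mixes across time with a
# standard Transformer encoder.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One-hop message passing: h' = ReLU(A_norm . h . W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, time, n_veh, in_dim); adj: (n_veh, n_veh), row-normalized.
        return torch.relu(torch.einsum("ij,btjd->btid", adj, self.lin(x)))

class SpatialTemporalEncoder(nn.Module):
    def __init__(self, feat_dim=6, hidden=64, n_heads=4, n_layers=2):
        super().__init__()
        self.gcn = GraphConv(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x, adj):
        # x: (batch, time, n_veh, feat_dim) -> per-vehicle embedding at the last step.
        b, t, n, _ = x.shape
        h = self.gcn(x, adj)                             # spatial mixing per timestep
        h = h.permute(0, 2, 1, 3).reshape(b * n, t, -1)
        h = self.temporal(h)                             # temporal mixing per vehicle
        return h[:, -1].reshape(b, n, -1)

# Example: 2 rollouts, 8 timesteps, 4 vehicles, 6 kinematic features each.
adj = torch.full((4, 4), 0.25)        # fully connected graph, uniform weights
obs = torch.randn(2, 8, 4, 6)
print(SpatialTemporalEncoder()(obs, adj).shape)  # torch.Size([2, 4, 64])
```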
Parallelized Interactive Machine Learning on Autonomous Vehicles
Deep reinforcement learning (deep RL) has achieved superior performance in
complex sequential tasks by learning directly from image input. A deep neural
network is used as a function approximator and requires no specific state
information. However, one drawback of using only images as input is that this
approach requires a prohibitively large amount of training time and data for
the model to learn the state feature representation and approach reasonable
performance. This is not feasible in real-world applications, especially when
the data are expensive and the training phase could introduce failures that affect
human safety. In this work, we use a human demonstration approach to speed up
training for learning features and use the resulting pre-trained model to
replace the neural network in the deep RL Deep Q-Network (DQN), followed by
human interaction to further refine the model. We empirically evaluate our
approach in the Microsoft AirSim car simulator, comparing a model trained only
on human demonstrations against a modified DQN that incorporates the human
demonstration model. Our results
show that (1) pre-training with human demonstration in a supervised learning
approach is better and much faster at discovering features than DQN alone, (2)
initializing the DQN with a pre-trained model provides a significant
improvement in training time and performance even with limited human
demonstration, and (3) providing the ability for humans to supply suggestions
during DQN training can speed up the network's convergence on an optimal
policy, as well as allow it to learn more complex policies that are harder to
discover by random exploration.
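The pre-training-then-initialization step can be sketched as follows; the network shape, image size, and action set are assumptions, and the snippet only illustrates supervised pre-training on demonstrations followed by copying the weights into a Q-network, not the paper's AirSim pipeline.

```python
# Minimal sketch (assumed architecture, not the paper's AirSim code): supervised
# pre-training of a small CNN on human demonstration (image, action) pairs,
# then reusing its weights to initialize the DQN's Q-network. Image size,
# layer sizes, and the action set are illustrative assumptions.
import torch
import torch.nn as nn

N_ACTIONS = 5  # e.g. hard-left, soft-left, straight, soft-right, hard-right

class Net(nn.Module):
    """The same trunk serves as the demonstration classifier and the Q-network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 9 * 9, N_ACTIONS)  # for 84x84 RGB inputs

    def forward(self, x):
        return self.head(self.backbone(x))

# 1) Supervised pre-training on demonstrations (one dummy batch shown).
demo_net = Net()
optimizer = torch.optim.Adam(demo_net.parameters(), lr=1e-4)
images = torch.randn(8, 3, 84, 84)                  # stand-in for camera frames
human_actions = torch.randint(0, N_ACTIONS, (8,))   # stand-in for labelled actions
loss = nn.CrossEntropyLoss()(demo_net(images), human_actions)
optimizer.zero_grad(); loss.backward(); optimizer.step()

# 2) Initialize the DQN from the pre-trained model instead of random weights,
#    then continue with standard DQN training (replay buffer, target network),
#    optionally folding human suggestions in as extra replay transitions.
q_net = Net()
q_net.load_state_dict(demo_net.state_dict())
```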