Longitudinal Dynamic versus Kinematic Models for Car-Following Control Using Deep Reinforcement Learning
Most current studies on autonomous vehicle control via deep reinforcement learning (DRL) use point-mass kinematic models, neglecting vehicle dynamics such as acceleration delay and acceleration command dynamics. Acceleration delay, caused by sensing and actuation latency, means that control inputs are executed late; acceleration command dynamics mean that the actual vehicle acceleration does not reach the commanded acceleration instantaneously. In this work, we investigate the feasibility of applying DRL controllers trained on vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control problem, Adaptive Cruise Control (ACC), solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. We therefore redesign the DRL framework to accommodate the acceleration delay and the acceleration command dynamics by adding, respectively, the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state. The training results show that the redesigned DRL controller achieves near-optimal car-following performance with vehicle dynamics considered, when compared with dynamic programming solutions.
Comment: Accepted to 2019 IEEE Intelligent Transportation Systems Conference
Explanation-Aware Experience Replay in Rule-Dense Environments
Human environments are often regulated by explicit and complex rulesets. Integrating Reinforcement Learning (RL) agents into such environments motivates the development of learning mechanisms that perform well in rule-dense and exception-ridden settings such as autonomous driving on regulated roads. In this letter, we propose a method for organising experience by partitioning the experience buffer into clusters labelled on a per-explanation basis. We present discrete and continuous navigation environments compatible with modular rulesets, together with 9 learning tasks. For environments with explainable rulesets, we convert rule-based explanations into case-based explanations by allocating state-transitions into clusters labelled with explanations. This allows us to sample experiences in a curricular and task-oriented manner, focusing on the rarity, importance, and meaning of events. We label this concept Explanation-Awareness (XA). We perform XA experience replay (XAER) with intra- and inter-cluster prioritisation, and introduce XA-compatible versions of DQN, TD3, and SAC. Performance is consistently superior with the XA versions of those algorithms compared to traditional Prioritised Experience Replay baselines, indicating that explanation engineering can be used in lieu of reward engineering for environments with explainable features.
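The buffer organisation described above can be sketched in a few lines. This is an assumption-laden simplification: transitions are keyed by their explanation label, and sampling first picks a cluster, then a transition within it. Uniform inter-/intra-cluster sampling stands in here for the prioritised variants the letter proposes.

```python
import random
from collections import defaultdict

class ExplanationAwareReplay:
    """Minimal sketch of explanation-aware experience replay (XAER):
    the buffer is partitioned into clusters labelled on a
    per-explanation basis, turning rule-based explanations into
    case-based ones."""

    def __init__(self):
        self.clusters = defaultdict(list)

    def add(self, transition, explanation):
        # Allocate the state-transition to the cluster labelled
        # with its explanation.
        self.clusters[explanation].append(transition)

    def sample(self, batch_size):
        batch = []
        labels = list(self.clusters)
        for _ in range(batch_size):
            label = random.choice(labels)                       # inter-cluster
            batch.append(random.choice(self.clusters[label]))   # intra-cluster
        return batch
```

Because sampling is two-level, rare-but-important explanation clusters are drawn as often as common ones, which is what lets curricular, task-oriented sampling focus on the rarity and meaning of events.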
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or assumes that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems. It is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as at practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
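Why a simple linear combination can oversimplify the problem is easy to demonstrate. The sketch below (an illustrative example, not from the paper) uses three Pareto-optimal policies with two-objective returns; policy "C" lies on a concave region of the Pareto front, so no linear weighting ever selects it, even though it may be the solution a user actually wants.

```python
# Three Pareto-optimal policies with (objective-1, objective-2) returns.
# Names and values are invented for illustration.
policies = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}

def linear_best(w):
    # Policy maximising the linear scalarisation w*r1 + (1 - w)*r2.
    return max(policies, key=lambda p: w * policies[p][0] + (1 - w) * policies[p][1])

# Sweep the weight over [0, 1]: only the extreme policies are ever chosen.
winners = {linear_best(w / 100) for w in range(101)}
```

For every weight w, policy C scores 0.4 while the better of A and B scores at least 0.5, so linear scalarisation is blind to this Pareto-optimal compromise; a genuinely multi-objective method is needed to recover it.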
Dynamic-Occlusion-Aware Risk Identification for Autonomous Vehicles Using Hypergames
A particular challenge for both autonomous vehicles (AVs) and human drivers is dealing with the risk associated with dynamic occlusion, i.e., occlusion caused by other vehicles in traffic. To address this challenge, we use the theory of hypergames to develop a novel dynamic-occlusion risk measure (DOR). We use DOR to evaluate the safety of strategic planners, a type of AV behaviour planner that reasons over the assumptions other road users have of each other. We also present a method for augmenting naturalistic driving data to artificially generate occlusion situations. Combining our risk identification and occlusion generation methods, we are able to discover occlusion-caused collisions (OCCs), which rarely occur in naturalistic driving data. Using our method, we increase the number of dynamic-occlusion situations in naturalistic data by a factor of 70, which allows us to increase the number of OCCs we can discover in that data by a factor of 40. We show that the generated OCCs are realistic and cover a diverse range of configurations. We then characterize the nature of OCCs at intersections by presenting an OCC taxonomy, which categorizes OCCs based on whether they are left-turning or right-turning situations, and whether they are reveal or tagging-on situations. Finally, to analyze the impact of collisions, we perform a severity analysis, in which we find that the majority of OCCs result in high-impact collisions, demonstrating the need to evaluate AVs under occlusion situations before they can be released for commercial use.
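The core geometric notion of dynamic occlusion can be sketched independently of the hypergame machinery. Below is a deliberately simplified test (all names, the disc-shaped vehicle model, and the 2-D line-of-sight abstraction are assumptions for this sketch, not the paper's method): a target agent is occluded from the ego vehicle if another vehicle's disc intersects the ego-to-target line-of-sight segment.

```python
def is_occluded(ego, target, occluder, radius):
    """Return True if the disc (occluder, radius) blocks the straight
    line of sight from ego to target. Points are (x, y) tuples; this is
    an illustrative geometry check, not the paper's DOR measure."""
    ex, ey = ego
    tx, ty = target
    ox, oy = occluder
    dx, dy = tx - ex, ty - ey
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return False
    # Project the occluder centre onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((ox - ex) * dx + (oy - ey) * dy) / seg_len_sq))
    cx, cy = ex + t * dx, ey + t * dy
    # Occluded if the closest point on the segment lies inside the disc.
    return (ox - cx) ** 2 + (oy - cy) ** 2 <= radius * radius
```

Augmentation approaches in this spirit place additional vehicles so that such a predicate becomes true along a recorded trajectory, turning ordinary naturalistic scenes into occlusion situations.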