Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning
Interacting with the actual environment to acquire data is often costly and
time-consuming in robotic tasks. Model-based offline reinforcement learning
(RL) provides a feasible solution. On the one hand, it eliminates the
requirements of interaction with the actual environment. On the other hand, it
learns the transition dynamics and reward function from the offline datasets
and generates simulated rollouts to accelerate training. Previous model-based
offline RL methods adopt probabilistic ensemble neural networks (NN) to model
aleatoric and epistemic uncertainty. However, this results in a substantial
increase in training time and computing resource requirements. Furthermore,
these methods are easily disturbed by the compounding errors of the
environment dynamics models when simulating long-term rollouts. To solve
the above problems, we propose an uncertainty-aware sequence modeling
architecture called Environment Transformer. It models the probability
distribution of the environment dynamics and reward function to capture
aleatoric uncertainty and treats epistemic uncertainty as a learnable noise
parameter. Benefiting from the accurate modeling of the transition dynamics and
reward function, Environment Transformer can be combined with arbitrary
planning, dynamic programming, or policy optimization algorithms for offline
RL. In this case, we perform Conservative Q-Learning (CQL) to learn a
conservative Q-function. Through simulation experiments, we demonstrate that
our method achieves or exceeds state-of-the-art performance in widely studied
offline RL benchmarks. Moreover, we show that Environment Transformer's
simulated rollout quality, sample efficiency, and long-term rollout simulation
capability are superior to those of previous model-based offline RL methods.
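
As a rough illustration of the uncertainty treatment this abstract describes, the sketch below puts a Gaussian output head on top of a transformer encoding: aleatoric uncertainty is captured by a predicted per-sample variance, and epistemic uncertainty by a single learnable noise parameter. This is a minimal sketch assuming a PyTorch implementation; the names (EnvTransformerHead, log_eps) are illustrative, not the authors' code.

```python
# Minimal sketch: Gaussian dynamics/reward head with a learnable epistemic
# noise term. Names are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class EnvTransformerHead(nn.Module):
    """Gaussian head over (next state, reward) on top of a transformer encoding."""

    def __init__(self, d_model: int, state_dim: int):
        super().__init__()
        out_dim = state_dim + 1                     # next state + scalar reward
        self.mean = nn.Linear(d_model, out_dim)
        self.log_var = nn.Linear(d_model, out_dim)  # aleatoric (data) uncertainty
        # Epistemic uncertainty as a single learnable noise parameter (assumption).
        self.log_eps = nn.Parameter(torch.zeros(out_dim))

    def forward(self, h: torch.Tensor):
        mean = self.mean(h)
        # Total predictive variance: per-sample aleatoric + learnable epistemic.
        var = self.log_var(h).exp() + self.log_eps.exp()
        return mean, var

def gaussian_nll(mean, var, target):
    """Negative log-likelihood used to fit the probabilistic model."""
    return 0.5 * (var.log() + (target - mean) ** 2 / var).mean()
```

A single model with a learnable noise term avoids training an entire probabilistic ensemble, which is the efficiency argument the abstract makes against ensemble-based predecessors.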
EquiDiff: A Conditional Equivariant Diffusion Model For Trajectory Prediction
Accurate trajectory prediction is crucial for the safe and efficient
operation of autonomous vehicles. The growing popularity of deep learning has
led to the development of numerous methods for trajectory prediction. While
deterministic deep learning models have been widely used, deep generative
models have gained popularity as they learn data distributions from training
data and account for trajectory uncertainties. In this study, we propose
EquiDiff, a deep generative model for predicting future vehicle trajectories.
EquiDiff is based on the conditional diffusion model, which generates future
trajectories by incorporating historical information and random Gaussian noise.
The backbone model of EquiDiff is an SO(2)-equivariant transformer that fully
utilizes the geometric properties of location coordinates. In addition, we
employ Recurrent Neural Networks and Graph Attention Networks to extract social
interactions from historical trajectories. To evaluate the performance of
EquiDiff, we conduct extensive experiments on the NGSIM dataset. Our results
demonstrate that EquiDiff outperforms other baseline models in short-term
prediction, but has slightly higher errors for long-term prediction.
Furthermore, we conduct an ablation study to investigate the contribution of
each component of EquiDiff to the prediction accuracy. Additionally, we present
a visualization of the generation process of our diffusion model, providing
insights into the uncertainty of the predictions.
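
As a rough sketch of the conditional generation loop the abstract describes, the code below shows one reverse (denoising) step of a DDPM-style trajectory model conditioned on an encoding of the historical trajectory. Here `denoiser` stands in for EquiDiff's SO(2)-equivariant transformer backbone, and the simple variance choice is an assumption, not the paper's exact schedule.

```python
# Minimal sketch of one conditional reverse-diffusion step; `denoiser` is a
# placeholder for the SO(2)-equivariant transformer (assumption).
import torch

def ddpm_reverse_step(x_t, t, cond, denoiser, alphas, alpha_bars):
    """Sample x_{t-1} from x_t, conditioned on the encoded trajectory history."""
    eps_hat = denoiser(x_t, t, cond)                # predict the injected noise
    a, a_bar = alphas[t], alpha_bars[t]
    mean = (x_t - (1 - a) / torch.sqrt(1 - a_bar) * eps_hat) / torch.sqrt(a)
    if t == 0:
        return mean                                 # final, noise-free trajectory
    return mean + torch.sqrt(1 - a) * torch.randn_like(x_t)

def rotate(x, theta):
    """SO(2) action on (..., 2) coordinates. An equivariant denoiser should
    satisfy denoiser(rotate(x), t, rotate(cond)) == rotate(denoiser(x, t, cond))."""
    theta = torch.as_tensor(theta)
    c, s = torch.cos(theta), torch.sin(theta)
    R = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    return x @ R.T
```

The equivariance property in `rotate`'s docstring is what lets the backbone exploit the geometry of location coordinates: rotating the scene rotates the predicted trajectories accordingly, rather than requiring the network to relearn every orientation.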
Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving
A velocity control model for car following was proposed based on
deep reinforcement learning (RL). To fulfil the multiple objectives of car
following, a reward function reflecting driving safety, efficiency, and comfort
was constructed. With this reward function, the RL agent learns to control
vehicle speed in a fashion that maximizes cumulative rewards, through trial
and error in the simulation environment. A total of 1,341 car-following events
extracted from the Next Generation Simulation (NGSIM) dataset were used to
train the model. Car-following behavior produced by the model was compared
with that observed in the empirical NGSIM data, to demonstrate the model's
ability to follow a lead vehicle safely, efficiently, and comfortably. Results
show that the model demonstrates the capability of safe, efficient, and
comfortable velocity control in that it 1) yields a smaller percentage (8%) of
dangerous minimum time-to-collision values (< 5 s) than human drivers in the
NGSIM data (35%); 2) can maintain efficient and safe headways in the
range of 1 s to 2 s; and 3) can follow the lead vehicle comfortably with smooth
acceleration. The results indicate that reinforcement learning methods could
contribute to the development of autonomous driving systems.
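
To make the multi-objective reward concrete, here is an illustrative reward of the kind the abstract describes. The 5 s TTC threshold and the 1-2 s headway band come from the reported results; the weights and functional forms are assumptions for this sketch, not the paper's calibrated reward.

```python
# Illustrative car-following reward combining safety, efficiency, and comfort.
# Weights and shapes are assumptions; thresholds follow the abstract's figures.
import math

def car_following_reward(gap_m, closing_speed_mps, speed_mps, accel_mps2,
                         w_safe=1.0, w_eff=1.0, w_comf=0.1):
    """Reward = safety (TTC) + efficiency (headway) + comfort (acceleration)."""
    # Safety: time-to-collision, defined only while closing in on the leader.
    ttc = gap_m / closing_speed_mps if closing_speed_mps > 0 else math.inf
    r_safe = -1.0 if ttc < 5.0 else 0.0
    # Efficiency: peak reward for a time headway near 1.5 s (inside the 1-2 s band).
    headway = gap_m / max(speed_mps, 0.1)
    r_eff = math.exp(-(math.log(headway) - math.log(1.5)) ** 2)
    # Comfort: penalize harsh acceleration or braking.
    r_comf = -abs(accel_mps2)
    return w_safe * r_safe + w_eff * r_eff + w_comf * r_comf
```

Summing weighted terms like this is the standard way to encode the abstract's three objectives in a single scalar signal the RL agent can maximize.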
- …