Uncertainty-Aware Decision Transformer for Stochastic Driving Environments
Offline Reinforcement Learning (RL) has emerged as a promising framework for
learning policies without active interactions, making it especially appealing
for autonomous driving tasks. Recent successes of Transformers inspire casting
offline RL as sequence modeling, which performs well in long-horizon tasks.
However, such sequence-modeling methods are overly optimistic in stochastic
environments, where they incorrectly assume that the same goal can always be
achieved by repeating identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer
(UNREST) for planning in stochastic driving environments without introducing
additional transition or complex generative models. Specifically, UNREST
estimates state uncertainties by the conditional mutual information between
transitions and returns, and segments sequences accordingly. Exploiting the
'uncertainty accumulation' and 'temporal locality' properties it identifies in
driving environments, UNREST replaces the global returns used by decision
transformers with less uncertain truncated returns, so that the model learns from
the true outcomes of the agent's actions rather than from environment
transitions. We also dynamically evaluate environmental
uncertainty during inference for cautious planning. Extensive experimental
results demonstrate UNREST's superior performance in various driving scenarios
and the effectiveness of our uncertainty estimation strategy.
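To make the truncated-return idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation): given per-step rewards and some per-step estimate of environment uncertainty, the return-to-go accumulator is reset at high-uncertainty steps, so earlier conditioning targets only reflect outcomes attributable to the agent's own actions. The function name, the threshold, and the way uncertainties are obtained are all assumptions for illustration.

```python
import numpy as np

def truncated_returns(rewards, uncertainties, threshold=0.5):
    """Hypothetical sketch: compute returns-to-go that are truncated at steps
    whose estimated environment uncertainty exceeds `threshold`, instead of
    summing rewards over the whole episode as a vanilla decision transformer
    would."""
    T = len(rewards)
    returns = np.zeros(T)
    running = 0.0
    # Walk backwards; reset the accumulator at uncertain steps so that
    # stochastic outcomes are not propagated into earlier conditioning targets.
    for t in range(T - 1, -1, -1):
        if uncertainties[t] > threshold:
            running = 0.0
        running += rewards[t]
        returns[t] = running
    return returns

# Toy usage with made-up rewards and uncertainty estimates.
rewards = np.array([1.0, 1.0, 0.5, 2.0, 1.0])
uncertainties = np.array([0.1, 0.9, 0.1, 0.1, 0.1])  # step 1 is "uncertain"
print(truncated_returns(rewards, uncertainties))     # -> [2. 1. 3.5 3. 1.]
```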
Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills
Learning-based vehicle planning is receiving increasing attention with the
emergence of diverse driving simulators and large-scale driving datasets. While
offline reinforcement learning (RL) is well suited for these safety-critical
tasks, it still struggles to plan over extended periods. In this work, we
present a skill-based framework that enhances offline RL to overcome the
long-horizon vehicle planning challenge. Specifically, we design a variational
autoencoder (VAE) to learn skills from offline demonstrations. To mitigate
the posterior collapse common to VAEs, we introduce a two-branch sequence encoder
that captures both the discrete options and the continuous variations of complex
driving skills. The final policy treats the learned skills as actions and can be
trained with any off-the-shelf offline RL algorithm. This facilitates a shift in
focus from per-step actions to temporally extended skills, thereby enabling
long-term reasoning into the future. Extensive results on CARLA show that our
model consistently outperforms strong baselines in both training and unseen
scenarios. Additional visualizations and experiments demonstrate the
interpretability and transferability of the extracted skills.
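As a rough illustration of the two-branch encoder idea, the sketch below (assumed dimensions, layer choices, and names; not the paper's architecture) has one branch infer a discrete skill option via a Gumbel-Softmax relaxation, while the other produces a continuous Gaussian latent conditioned on that option.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchSkillEncoder(nn.Module):
    """Hypothetical sketch of a two-branch sequence encoder for a skill VAE:
    one branch infers a discrete option (which manoeuvre), the other a
    continuous latent capturing variations of that manoeuvre."""

    def __init__(self, obs_dim, act_dim, hidden_dim=128, n_options=8, z_dim=16):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        self.option_head = nn.Linear(hidden_dim, n_options)       # discrete branch
        self.mu_head = nn.Linear(hidden_dim + n_options, z_dim)   # continuous branch
        self.logvar_head = nn.Linear(hidden_dim + n_options, z_dim)

    def forward(self, obs, act, temperature=1.0):
        # Encode the (state, action) sub-trajectory with a recurrent network.
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        h_last = h[:, -1]                                          # segment summary
        # Discrete option via Gumbel-Softmax so the choice stays differentiable.
        option_logits = self.option_head(h_last)
        option = F.gumbel_softmax(option_logits, tau=temperature, hard=True)
        # Continuous variation conditioned on the chosen option.
        cond = torch.cat([h_last, option], dim=-1)
        mu, logvar = self.mu_head(cond), self.logvar_head(cond)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()       # reparameterization
        return option, z, option_logits, mu, logvar

# Toy usage: a batch of 4 sub-trajectories of length 10.
enc = TwoBranchSkillEncoder(obs_dim=32, act_dim=2)
obs, act = torch.randn(4, 10, 32), torch.randn(4, 10, 2)
option, z, *_ = enc(obs, act)
print(option.shape, z.shape)   # (4, 8) one-hot option, (4, 16) continuous skill
```

Keeping the discrete and continuous branches separate is one plausible way to give the skill space enough structure that the decoder cannot simply ignore the latent, which is the usual route to posterior collapse.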