Large Trajectory Models are Scalable Motion Predictors and Planners
Motion prediction and planning are vital tasks in autonomous driving, and
recent efforts have shifted to machine learning-based approaches. The
challenges include understanding diverse road topologies, reasoning about traffic
dynamics over a long time horizon, interpreting heterogeneous behaviors, and
generating policies in a large continuous state space. Inspired by the success
of large language models in addressing similar complexities through model
scaling, we introduce a scalable trajectory model called State Transformer
(STR). STR reformulates the motion prediction and motion planning problems by
arranging observations, states, and actions into one unified sequence modeling
task. With a simple model design, STR consistently outperforms baseline
approaches in both problems. Remarkably, experimental results reveal that large
trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting
outstanding adaptability and learning efficiency. Qualitative results further
demonstrate that LTMs are capable of making plausible predictions in scenarios
that diverge significantly from the training data distribution. LTMs also learn
to perform complex reasoning for long-term planning, without explicit loss
designs or costly high-level annotations.
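The "one unified sequence" idea in this abstract can be illustrated with a minimal sketch. The token layout below (observation tokens first, then interleaved state/action pairs) and the names `build_sequence` and `next_token_pairs` are assumptions made for illustration; the abstract does not specify STR's actual ordering or encoders.

```python
from typing import List, Sequence, Tuple

# A token is (kind, features). Both the kinds and the flat feature tuples are
# hypothetical stand-ins for whatever embeddings a real model would use.
Token = Tuple[str, Tuple[float, ...]]


def build_sequence(
    observations: Sequence[Sequence[float]],
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
) -> List[Token]:
    """Flatten context and trajectory into one ordered token sequence.

    Assumed layout: all observation tokens, then one (state, action) pair
    per time step.
    """
    if len(states) != len(actions):
        raise ValueError("expected exactly one action per state")
    seq: List[Token] = [("obs", tuple(o)) for o in observations]
    for s, a in zip(states, actions):
        seq.append(("state", tuple(s)))
        seq.append(("action", tuple(a)))
    return seq


def next_token_pairs(seq: List[Token]) -> List[Tuple[List[Token], Token]]:
    """Build (prefix, target) pairs for autoregressive next-token training."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]
```

Under this assumed layout, prediction and planning differ only in which tokens are given as the prefix and which are left for the model to generate.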
GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
Autonomous vehicles operating in complex real-world environments require
accurate predictions of interactive behaviors between traffic participants.
While existing works focus on modeling agent interactions based on their past
trajectories, their future interactions are often ignored. This paper addresses
the interaction prediction problem by formulating it with hierarchical game
theory and proposing the GameFormer framework to implement it. Specifically, we
present a novel Transformer decoder structure that uses the prediction results
from the previous level together with the common environment background to
iteratively refine the interaction process. Moreover, we propose a learning
process that regulates an agent's behavior at the current level to respond to
other agents' behaviors from the last level. Through experiments on a
large-scale real-world driving dataset, we demonstrate that our model can
achieve state-of-the-art prediction accuracy on the interaction prediction
task. We also validate the model's capability to jointly reason about the ego
agent's motion plans and other agents' behaviors in both open-loop and
closed-loop planning tests, outperforming a variety of baseline methods.
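The level-by-level refinement described in this abstract can be sketched as a toy loop in which each agent's prediction at one level responds to the other agents' predictions from the previous level, given a shared context. This is not the GameFormer decoder: the scalar "plans", the `level_k_refine` and `toy_respond` functions, and the averaging rule are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

# Toy stand-in: each agent's "plan" is a single float, not a trajectory.
Plans = List[float]
Respond = Callable[[int, float, Sequence[float], Sequence[float]], float]


def level_k_refine(
    initial: Sequence[float],
    context: Sequence[float],
    respond: Respond,
    num_levels: int,
) -> List[Plans]:
    """Return the plans at every level, starting from the level-0 guess.

    At each level, every agent is updated simultaneously from the previous
    level's plans, so no agent sees another agent's current-level output.
    """
    levels: List[Plans] = [list(initial)]
    for _ in range(num_levels):
        prev = levels[-1]
        current = [
            respond(i, prev[i], prev[:i] + prev[i + 1:], context)
            for i in range(len(prev))
        ]
        levels.append(current)
    return levels


def toy_respond(
    i: int, own: float, others: Sequence[float], context: Sequence[float]
) -> float:
    """Hypothetical response rule: blend the agent's goal (context[i]) with
    the mean of the other agents' previous-level plans. `own` is unused
    here but kept so that richer rules can depend on it."""
    goal = context[i]
    if not others:
        return goal
    return 0.5 * goal + 0.5 * (sum(others) / len(others))
```

In a learned model, the response rule would be replaced by a decoder layer and the context by encoded scene features; the loop structure is the only part this sketch is meant to convey.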