Autonomous vehicles operating in complex real-world environments require
accurate predictions of interactive behaviors between traffic participants.
While existing works focus on modeling agent interactions based on past
trajectories, interactions among the agents' future trajectories are often
ignored. This paper addresses
the interaction prediction problem by formulating it with hierarchical game
theory and proposing the GameFormer framework to implement it. Specifically, we
present a novel Transformer decoder structure that uses the prediction results
from the previous level together with the common environment background to
iteratively refine the interaction process. Moreover, we propose a learning
process that regulates an agent's behavior at the current level to respond to
other agents' behaviors from the previous level. Through experiments on a
large-scale real-world driving dataset, we demonstrate that our model can
achieve state-of-the-art prediction accuracy on the interaction prediction
task. We also validate the model's capability to jointly reason about the ego
agent's motion plans and other agents' behaviors in both open-loop and
closed-loop planning tests, outperforming a variety of baseline methods.
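
The level-wise refinement described above can be illustrated with a minimal sketch. It is not the GameFormer model: the learned Transformer decoder is replaced here by a hand-written cost function, and every name in it (`level_k_predict`, `rollout`, `respond`, the toy agents and their speeds) is a hypothetical choice made for illustration. It shows only the control flow in which each agent's level-k prediction responds to the other agents' level-(k-1) predictions.

```python
from typing import Callable, Dict, List

Trajectory = List[float]            # assumption: 1-D positions over a fixed horizon
Prediction = Dict[str, Trajectory]  # agent name -> predicted trajectory


def level_k_predict(
    agents: List[str],
    level0: Callable[[str], Trajectory],
    respond: Callable[[str, Prediction], Trajectory],
    num_levels: int,
) -> List[Prediction]:
    """Return the predictions for levels 0..num_levels (inclusive)."""
    # Level 0: every agent is predicted independently of the others.
    levels = [{a: level0(a) for a in agents}]
    for _ in range(num_levels):
        prev = levels[-1]
        current = {}
        for a in agents:
            # Agent a responds to the OTHER agents' previous-level predictions.
            others = {b: traj for b, traj in prev.items() if b != a}
            current[a] = respond(a, others)
        levels.append(current)
    return levels


# --- Toy setup (all values are illustrative assumptions) ---
HORIZON = 5
START = {"ego": 0.0, "other": 1.0}
DESIRED_SPEED = {"ego": 2.0, "other": 1.0}
CANDIDATE_SPEEDS = (0.5, 1.0, 1.5, 2.0)


def rollout(agent: str, speed: float) -> Trajectory:
    """Constant-speed trajectory from the agent's start position."""
    return [START[agent] + speed * t for t in range(1, HORIZON + 1)]


def level0(agent: str) -> Trajectory:
    return rollout(agent, DESIRED_SPEED[agent])


def respond(agent: str, others: Prediction) -> Trajectory:
    """Pick the candidate speed with the lowest hand-written cost."""
    def cost(speed: float) -> float:
        traj = rollout(agent, speed)
        # Count time steps where this agent is within 1.0 of another agent.
        clashes = sum(
            1
            for other_traj in others.values()
            for p, q in zip(traj, other_traj)
            if abs(p - q) < 1.0
        )
        return abs(speed - DESIRED_SPEED[agent]) + 10.0 * clashes

    return rollout(agent, min(CANDIDATE_SPEEDS, key=cost))


levels = level_k_predict(list(START), level0, respond, num_levels=2)
for k, prediction in enumerate(levels):
    print(f"level {k}: {prediction}")
```

In this sketch the previous level's output is the only channel through which agents influence each other, which mirrors the structure described above; a learned model would replace `respond` with a decoder that also attends to the encoded scene.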