Realistic scene-level multi-agent motion simulations are crucial for
developing and evaluating self-driving algorithms. However, most existing works
focus on generating trajectories for a single agent type and typically ignore
the consistency of the generated trajectories. In this paper, we propose a
novel framework based on diffusion models, called SceneDM, to generate joint
and consistent future motions of all the agents, including vehicles, bicycles,
pedestrians, etc., in a scene. To enhance the consistency of the generated
trajectories, we introduce a new Transformer-based network that effectively
handles agent-agent interactions in the reverse process of motion diffusion. To
account for the smoothness of agent trajectories, we further design a simple
yet effective consistent diffusion approach that improves the model's ability
to exploit short-term temporal dependencies. Furthermore, a scene-level scoring
function is employed to evaluate the safety and road adherence of the generated
agents' motions and to filter out unrealistic simulations. Finally, SceneDM
achieves state-of-the-art results on the Waymo Sim Agents Benchmark. The
project webpage is available at https://alperen-hub.github.io/SceneDM.