Predicting the future behavior of agents is a fundamental task in autonomous
vehicle domains. Accurate prediction relies on comprehending the surrounding
map, which significantly regularizes agent behaviors. However, existing methods
have limitations in exploiting the map and depend strongly on historical
trajectories, which leads to unsatisfactory prediction performance and
robustness. Additionally, their heavy network architectures impede real-time
applications. To tackle these problems, we propose the Map-Agent Coupled
Transformer (MacFormer) for real-time and robust trajectory prediction. Our
framework explicitly incorporates map constraints into the network via two
carefully designed modules, the coupled map and the reference extractor. A novel
multi-task optimization strategy (MTOS) is presented to enhance learning of
topology and rule constraints. We also devise a bilateral query scheme in
context fusion for a more efficient and lightweight network. We evaluated our
approach on the Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks,
achieving state-of-the-art performance on all three with the lowest inference
latency and smallest model size. Experiments also demonstrate that our framework is
resilient to imperfect tracklet inputs. Furthermore, we show that when combined
with our proposed strategies, classical models outperform their baselines,
further validating the versatility of our framework.

Comment: Accepted by IEEE Robotics and Automation Letters. 8 Pages, 9 Figures,
9 Tables. Video: https://www.youtube.com/watch?v=XY388iI6sP