CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving
Benchmarking is a common method for evaluating trajectory prediction models
for autonomous driving. Existing benchmarks rely on datasets, which are biased
towards more common scenarios, such as cruising, and distance-based metrics
that are computed by averaging over all scenarios. Following such a regimen
provides little insight into the properties of the models both in terms of
how well they can handle different scenarios and how admissible and diverse
their outputs are. There exist a number of complementary metrics designed to
measure the admissibility and diversity of trajectories; however, they suffer
from biases, such as length of trajectories.
In this paper, we propose a new benChmarking paRadIgm for evaluaTing
trajEctoRy predIction Approaches (CRITERIA). Particularly, we propose 1) a
method for extracting driving scenarios at varying levels of specificity
according to the structure of the roads, models' performance, and data
properties for fine-grained ranking of prediction models; 2) a set of new
bias-free metrics for measuring diversity, by incorporating the characteristics
of a given scenario, and admissibility, by considering the structure of roads
and kinematic compliance, motivated by real-world driving constraints; and 3)
extensive experimentation with the proposed benchmark on a
representative set of prediction models using the large-scale Argoverse
dataset. We show that the proposed benchmark can produce a more accurate
ranking of the models and serve as a means of characterizing their behavior. We
further present ablation studies to highlight contributions of different
elements that are used to compute the proposed metrics.
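The contrast drawn above between distance-based metrics averaged over all scenarios and a per-scenario breakdown can be illustrated with a minimal sketch. It uses the common minADE/minFDE definitions (best of K candidate trajectories), not the metrics proposed in the paper, and the function names, array shapes, and scenario labels are assumptions made for illustration.

```python
import numpy as np

def min_ade_fde(pred, gt):
    """Best-of-K displacement errors.

    pred: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.
    Shapes are an assumption for this sketch.
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) point-wise errors
    min_ade = dist.mean(axis=1).min()   # best average displacement error
    min_fde = dist[:, -1].min()         # best final displacement error
    return float(min_ade), float(min_fde)

def average_by_scenario(errors, labels):
    """Return the global mean and the mean per (hypothetical) scenario label."""
    errors = np.asarray(errors, dtype=float)
    per_scenario = {}
    for label in sorted(set(labels)):
        mask = np.array([l == label for l in labels])
        per_scenario[label] = float(errors[mask].mean())
    return float(errors.mean()), per_scenario

# Three easy "cruise" samples and one hard "turn" sample: the global mean
# hides the large error on the rare scenario.
global_mean, per_scenario = average_by_scenario(
    [1.0, 1.0, 1.0, 5.0], ["cruise", "cruise", "cruise", "turn"]
)
```

In this toy case the global mean is dominated by the frequent scenario, which is the kind of bias the abstract describes.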
Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs
Predicting the motion of other road agents enables autonomous vehicles to
perform safe and efficient path planning. This task is very complex, as the
behaviour of road agents depends on many factors and the number of possible
future trajectories can be considerable (multi-modal). Most prior approaches
proposed to address multi-modal motion prediction are based on complex machine
learning systems that have limited interpretability. Moreover, the metrics used
in current benchmarks do not evaluate all aspects of the problem, such as the
diversity and admissibility of the output. In this work, we aim to advance
towards the design of trustworthy motion prediction systems, based on some of
the requirements for the design of Trustworthy Artificial Intelligence. We
focus on evaluation criteria, robustness, and interpretability of outputs.
First, we comprehensively analyse the evaluation metrics, identify the main
gaps of current benchmarks, and propose a new holistic evaluation framework. We
then introduce a method for the assessment of spatial and temporal robustness
by simulating noise in the perception system. To enhance the interpretability
of the outputs and generate more balanced results in the proposed evaluation
framework, we propose an intent prediction layer that can be attached to
multi-modal motion prediction models. The effectiveness of this approach is
assessed through a survey that explores different elements in the visualization
of the multi-modal trajectories and intentions. The proposed approach and
findings make a significant contribution to the development of trustworthy
motion prediction systems for autonomous vehicles, advancing the field towards
greater safety and reliability.
Comment: 16 pages, 7 figures, 6 table
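The idea of assessing spatial and temporal robustness by simulating perception noise can be sketched as follows. This is not the procedure from the paper: Gaussian position jitter for the spatial case and randomly dropped observations (held at the last valid value) for the temporal case are assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def add_spatial_noise(history, sigma, rng):
    """Jitter observed (T, 2) positions with zero-mean Gaussian noise (assumed model)."""
    history = np.asarray(history, dtype=float)
    return history + rng.normal(0.0, sigma, size=history.shape)

def drop_timesteps(history, drop_prob, rng):
    """Simulate missed detections: each dropped step repeats the previous observation."""
    history = np.asarray(history, dtype=float).copy()
    dropped = rng.random(len(history)) < drop_prob
    dropped[0] = False  # keep the first observation so there is always a value to hold
    for t in range(1, len(history)):
        if dropped[t]:
            history[t] = history[t - 1]
    return history

rng = np.random.default_rng(0)
clean = np.arange(10.0).reshape(5, 2)          # toy observed track
noisy = add_spatial_noise(clean, 0.2, rng)     # spatial perturbation
sparse = drop_timesteps(clean, 0.3, rng)       # temporal perturbation
```

A robustness study would then compare a model's prediction error on the clean track against its error on the perturbed tracks.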
DICE: Diverse Diffusion Model with Scoring for Trajectory Prediction
Road user trajectory prediction in dynamic environments is a challenging but
crucial task for various applications, such as autonomous driving. One of the
main challenges in this domain is the multimodal nature of future trajectories
stemming from the unknown yet diverse intentions of the agents. Diffusion
models have been shown to be very effective in capturing such stochasticity in
prediction tasks. However, these models involve many computationally expensive
denoising steps and sampling operations that make them a less desirable option
for real-time safety-critical applications. To this end, we present a novel
framework that leverages diffusion models for predicting future trajectories in
a computationally efficient manner. To minimize the computational bottlenecks
in iterative sampling, we employ an efficient sampling mechanism that allows us
to maximize the number of sampled trajectories for improved accuracy while
maintaining inference time in real time. Moreover, we propose a scoring
mechanism to select the most plausible trajectories by assigning relative
ranks. We show the effectiveness of our approach by conducting empirical
evaluations on common pedestrian (UCY/ETH) and autonomous driving (nuScenes)
benchmark datasets on which our model achieves state-of-the-art performance on
several subsets and metrics.
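Selecting the most plausible trajectories by relative rank can be sketched generically. The abstract does not say how the scores are produced, so the sketch below takes them as given; `select_top_k` and its arguments are hypothetical names, not the authors' implementation.

```python
import numpy as np

def select_top_k(trajectories, scores, k):
    """Keep the k highest-scoring samples and report every sample's rank.

    trajectories: (N, T, 2) sampled futures; scores: (N,) plausibility scores
    from some scoring model (assumed to exist; higher means more plausible).
    """
    trajectories = np.asarray(trajectories)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)             # indices from highest to lowest score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))   # rank 0 is the most plausible sample
    return trajectories[order[:k]], ranks

# Four toy samples, each filled with its own index so the selection is visible.
samples = np.stack([np.full((3, 2), float(i)) for i in range(4)])
top, ranks = select_top_k(samples, [0.1, 0.9, 0.5, 0.3], k=2)
```

Sampling many candidates and keeping only the top-ranked few is one way to trade extra sampling for accuracy while returning a fixed number of outputs.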
Heterogeneous Trajectory Forecasting via Risk and Scene Graph Learning
Heterogeneous trajectory forecasting is critical for intelligent
transportation systems, but it is challenging because of the difficulty of
modeling the complex interaction relations among heterogeneous road agents
as well as their agent-environment constraints. In this work, we propose a risk
and scene graph learning method for trajectory forecasting of heterogeneous
road agents, which consists of a Heterogeneous Risk Graph (HRG) and a
Hierarchical Scene Graph (HSG) from the aspects of agent category and their
movable semantic regions. HRG groups each kind of road agent and calculates
their interaction adjacency matrix based on an effective collision risk metric.
The HSG of the driving scene is modeled by inferring the relationship between road
agents and the road semantic layout aligned by the road scene grammar. Based on
this formulation, we can obtain effective trajectory forecasts in driving
situations, and superior performance to other state-of-the-art approaches is
demonstrated by exhaustive experiments on the nuScenes, ApolloScape, and
Argoverse datasets.
Comment: Submitted to IEEE Transactions on Intelligent Transportation Systems,
202
Trajectory Prediction for Autonomous Driving based on Multi-Head Attention with Joint Agent-Map Representation
Predicting the trajectories of surrounding agents is an essential ability for
autonomous vehicles navigating through complex traffic scenes. The future
trajectories of agents can be inferred using two important cues: the locations
and past motion of agents, and the static scene structure. Due to the high
variability in scene structure and agent configurations, prior work has
employed the attention mechanism, applied separately to the scene and agent
configuration to learn the most salient parts of both cues. However, the two
cues are tightly linked. The agent configuration can inform what part of the
scene is most relevant to prediction. The static scene in turn can help
determine the relative influence of agents on each other's motion. Moreover,
the distribution of future trajectories is multimodal, with modes corresponding
to the agent's intent. The agent's intent also informs what part of the scene
and agent configuration is relevant to prediction. We thus propose a novel
approach applying multi-head attention by considering a joint representation of
the static scene and surrounding agents. We use each attention head to generate
a distinct future trajectory to address multimodality of future trajectories.
Our model achieves state-of-the-art results on the nuScenes prediction
benchmark and generates diverse future trajectories compliant with scene
structure and agent configuration.
Comment: Revised submission for RA-
A Fast and Map-Free Model for Trajectory Prediction in Traffics
To address two shortcomings of existing methods, namely that (i) nearly all
models rely on high-definition (HD) maps, yet map information is not always
available in real traffic scenes and HD map-building is expensive and
time-consuming, and (ii) existing models usually improve prediction accuracy at
the expense of computing efficiency, yet efficiency is crucial for many real
applications, this paper proposes an efficient trajectory
prediction model that is not dependent on traffic maps. The core idea of our
model is to encode each agent's spatial-temporal information in the first
stage and explore multi-agent spatial-temporal interactions in the second
stage. By comprehensively utilizing attention mechanisms, LSTM, graph
convolution networks, and temporal transformers in the two stages, our model is
able to learn rich dynamic and interaction information of all agents. Our model
achieves the highest performance when compared with existing map-free methods
and also exceeds most map-based state-of-the-art methods on the Argoverse
dataset. In addition, our model also exhibits a faster inference speed than the
baseline methods.
Comment: 7 pages, 3 figure
A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction
Accurate and robust trajectory prediction of neighboring agents is critical
for autonomous vehicles traversing complex scenes. Most methods proposed in
recent years are deep learning-based due to their strength in encoding complex
interactions. However, implausible predictions are often generated since they
rely heavily on past observations and cannot effectively capture the transient
and contingency interactions from sparse samples. In this paper, we propose a
hierarchical hybrid framework of deep learning (DL) and reinforcement learning
(RL) for multi-agent trajectory prediction, to cope with the challenge of
predicting motions shaped by multi-scale interactions. In the DL stage, the
traffic scene is divided into multiple intermediate-scale heterogeneous graphs
based on which Transformer-style GNNs are adopted to encode heterogeneous
interactions at intermediate and global levels. In the RL stage, we divide the
traffic scene into local sub-scenes utilizing the key future points predicted
in the DL stage. To emulate the motion planning procedure so as to produce
trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO)
incorporated with a vehicle kinematics model is devised to plan motions under
the dominant influence of microscopic interactions. A multi-objective reward is
designed to balance between agent-centric accuracy and scene-wise
compatibility. Experimental results show that our proposal matches the
state of the art on the Argoverse forecasting benchmark. The visualized results
also reveal that the hierarchical learning framework captures the
multi-scale interactions and improves the feasibility and compliance of the
predicted trajectories.
MacFormer: Map-Agent Coupled Transformer for Real-time and Robust Trajectory Prediction
Predicting the future behavior of agents is a fundamental task in autonomous
vehicle domains. Accurate prediction relies on comprehending the surrounding
map, which significantly regularizes agent behaviors. However, existing methods
have limitations in exploiting the map and exhibit a strong dependence on
historical trajectories, which yield unsatisfactory prediction performance and
robustness. Additionally, their heavy network architectures impede real-time
applications. To tackle these problems, we propose Map-Agent Coupled
Transformer (MacFormer) for real-time and robust trajectory prediction. Our
framework explicitly incorporates map constraints into the network via two
carefully designed modules named coupled map and reference extractor. A novel
multi-task optimization strategy (MTOS) is presented to enhance learning of
topology and rule constraints. We also devise a bilateral query scheme in context
fusion for a more efficient and lightweight network. We evaluated our approach
on the Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, on all of
which it achieved state-of-the-art performance with the lowest inference latency and
smallest model size. Experiments also demonstrate that our framework is
resilient to imperfect tracklet inputs. Furthermore, we show that by combining
with our proposed strategies, classical models outperform their baselines,
further validating the versatility of our framework.
Comment: Accepted by IEEE Robotics and Automation Letters. 8 Pages, 9 Figures,
9 Tables. Video: https://www.youtube.com/watch?v=XY388iI6sP