Transformer Networks for Trajectory Forecasting
Most recent successes in forecasting people's motion are based on LSTM
models, and most recent progress has been achieved by modelling the social
interactions among people and the interactions between people and the scene.
We question the use of LSTM models and propose the novel use of Transformer
Networks for trajectory forecasting. This is a fundamental switch from the
sequential step-by-step processing of LSTMs to the attention-only memory
mechanisms of Transformers. In particular, we consider both the original
Transformer Network (TF) and the larger Bidirectional Transformer (BERT),
state-of-the-art on all natural language processing tasks. Our proposed
Transformers predict the trajectories of the individual people in the scene.
These are "simple" models because each person is modelled separately, without
any complex human-human or scene-interaction terms. In particular, the TF
model without bells and whistles yields the best score on TrajNet, the
largest and most challenging trajectory forecasting benchmark. Additionally,
its extension, which predicts multiple plausible future trajectories,
performs on par with more engineered techniques on the 5 datasets of ETH +
UCY. Finally, we show that Transformers can deal with missing observations,
as may be the case with real sensor data. Code is available at
https://github.com/FGiuliari/Trajectory-Transformer.
Comment: 18 pages, 3 figures
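The core switch the abstract describes, from an LSTM's step-by-step recurrence to an attention-based memory over the whole observed trajectory, can be illustrated with a minimal numpy sketch. This is not the authors' model: the embedding and readout matrices below are random, untrained stand-ins, and only the scaled dot-product attention step is the real mechanism.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a trajectory.

    X: (T, d) array of per-step embeddings. Every output step attends to
    every observed step at once, unlike the sequential recurrence of an LSTM.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X                           # attention-weighted memory

rng = np.random.default_rng(0)
obs = rng.normal(size=(8, 2))      # 8 observed (x, y) positions of one person
W_in = rng.normal(size=(2, 16))    # hypothetical embedding (random, untrained)
W_out = rng.normal(size=(16, 2))   # hypothetical readout to a 2D offset

ctx = self_attention(obs @ W_in)          # (8, 16) context vectors
next_pos = obs[-1] + ctx[-1] @ W_out      # toy prediction of the next position
print(next_pos.shape)                     # (2,)
```

In a trained model the random matrices would be learned, positional encodings would be added, and attention would be multi-head, but the all-pairs attention pattern is the part that replaces the LSTM's sequential state.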
Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums
With the fast development of AI-related techniques, the applications of
trajectory prediction are no longer limited to simple scenes and
trajectories. More and more heterogeneous trajectories with different
representation forms, such as 2D or 3D coordinates, 2D or 3D bounding boxes,
and even high-dimensional human skeletons, need to be analyzed and
forecasted. Among these heterogeneous trajectories, the interactions between
different elements within a single trajectory frame, which we call
``Dimension-Wise Interactions'', are more complex and challenging. However,
most previous approaches focus on one specific form of trajectory, which
means these methods cannot be used to forecast heterogeneous trajectories,
let alone model the dimension-wise interactions. Besides, previous methods
mostly treat trajectory prediction as an ordinary time-sequence generation
task, so they may struggle to directly analyze agents' behaviors and social
interactions at different temporal scales. In this paper, we bring a new
``view'' to trajectory prediction: trajectories are modelled and forecast
hierarchically according to different frequency portions of the spectral
domain, so that the model learns to forecast trajectories by considering
their frequency responses. Moreover, we expand the current trajectory
prediction task by introducing the dimension from ``another view'', thus
extending its application scenarios to heterogeneous trajectories
vertically. Finally, we adopt a bilinear structure to fuse the two factors,
the frequency response and the dimension-wise interaction, to forecast
heterogeneous trajectories via spectrums hierarchically in a generic way.
Experiments show that the proposed model outperforms most state-of-the-art
methods on ETH-UCY, the Stanford Drone Dataset, and nuScenes with
heterogeneous trajectories, including 2D coordinates and 2D and 3D bounding
boxes.
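The idea of handling trajectories by frequency portions can be sketched with a plain discrete Fourier transform: a coordinate series splits into a low-frequency band (the coarse motion trend) and a high-frequency band (fine detail), and the two bands reconstruct the original exactly. This is only an illustration of the spectral decomposition, with an arbitrary band cutoff `n_low`, not the paper's hierarchical network.

```python
import numpy as np

def band_split(series, n_low=3):
    """Split a 1D coordinate series into low- and high-frequency portions.

    The two returned bands sum back to the original series, so a model can
    forecast the coarse band and the fine band separately (hierarchically).
    """
    spec = np.fft.rfft(series)
    low_spec = spec.copy()
    low_spec[n_low:] = 0                 # keep only the first n_low coefficients
    high_spec = spec - low_spec
    low = np.fft.irfft(low_spec, n=len(series))
    high = np.fft.irfft(high_spec, n=len(series))
    return low, high

t = np.linspace(0.0, 1.0, 32)
x = 2.0 * t + 0.1 * np.sin(20 * np.pi * t)   # smooth trend + fine wiggle
low, high = band_split(x)

# Reconstruction is exact up to floating-point error, by linearity of the DFT.
print(np.allclose(low + high, x))            # True
```

The low band isolates the slow trend that dominates where an agent is heading, while the high band carries short-timescale variation, which is the intuition behind forecasting different frequency portions at different hierarchy levels.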
Robust Human Motion Forecasting using Transformer-based Model
Comprehending human motion is a fundamental challenge for developing
Human-Robot Collaborative applications. Computer vision researchers have
addressed this field by focusing only on reducing prediction error, without
taking into account the requirements that facilitate implementation in
robots. In this paper, we propose a new Transformer-based model that
simultaneously deals with real-time 3D human motion forecasting in the short
and long term. Our 2-Channel Transformer (2CH-TR) is able to efficiently
exploit the spatio-temporal information of a shortly observed sequence
(400ms) and achieves accuracy competitive with the current state-of-the-art.
2CH-TR stands out for the efficiency of its Transformer, being lighter and
faster than its competitors. In addition, our model is tested in conditions
where the human motion is severely occluded, demonstrating its robustness in
reconstructing and predicting 3D human motion in a highly noisy environment.
Our experimental results show that the proposed 2CH-TR outperforms the
ST-Transformer, another state-of-the-art Transformer-based model, in
reconstruction and prediction under the same input-prefix conditions. On the
Human3.6M dataset with a 400ms input prefix, our model reduces the mean
squared error of the ST-Transformer by 8.89% in short-term prediction and by
2.57% in long-term prediction.
Comment: This paper has already been accepted to the 2022 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS 2022).
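Robustness to occluded input is commonly obtained in attention-based models by masking: missing timesteps receive zero attention weight, so every output mixes only observed steps. The sketch below shows that masking mechanism in isolation with numpy; it is a generic illustration under that assumption, not the 2CH-TR architecture.

```python
import numpy as np

def masked_attention(X, valid):
    """Self-attention that skips missing (occluded) timesteps.

    X: (T, d) per-step features; valid: (T,) boolean, False where the
    observation is missing. Masked steps get -inf scores, hence zero
    attention weight after the softmax.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores[:, ~valid] = -np.inf                  # never attend to missing steps
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ np.where(valid[:, None], X, 0.0)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
valid = np.array([True, True, False, True, False, True])
out = masked_attention(X, valid)

# For observed query steps, the output is unchanged even if the missing
# entries are replaced by garbage: they carry zero attention weight.
X_garbage = np.where(valid[:, None], X, 123.0)
print(np.allclose(out[valid], masked_attention(X_garbage, valid)[valid]))  # True
```

A trained model would additionally learn to reconstruct the masked steps from this context, which is the "reconstruction under occlusion" setting the abstract evaluates.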
A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction
Accurate and robust trajectory prediction of neighboring agents is critical
for autonomous vehicles traversing complex scenes. Most methods proposed in
recent years are deep learning-based, owing to their strength in encoding
complex interactions. However, implausible predictions are often generated,
since these methods rely heavily on past observations and cannot effectively
capture transient and contingency interactions from sparse samples. In this
paper, we propose a hierarchical hybrid framework of deep learning (DL) and
reinforcement learning (RL) for multi-agent trajectory prediction, to cope
with the challenge of predicting motions shaped by multi-scale interactions.
In the DL stage, the traffic scene is divided into multiple
intermediate-scale heterogeneous graphs, on which Transformer-style GNNs are
adopted to encode heterogeneous interactions at the intermediate and global
levels. In the RL stage, we divide the traffic scene into local sub-scenes
using the key future points predicted in the DL stage. To emulate the
motion-planning procedure and thereby produce trajectory predictions, a
Transformer-based Proximal Policy Optimization (PPO) agent incorporating a
vehicle kinematics model is devised to plan motions under the dominant
influence of microscopic interactions. A multi-objective reward is designed
to balance agent-centric accuracy against scene-wise compatibility.
Experimental results show that our proposal matches the state of the art on
the Argoverse forecasting benchmark. The visualized results also reveal that
the hierarchical learning framework captures the multi-scale interactions
and improves the feasibility and compliance of the predicted trajectories.
A Diffusion-Model of Joint Interactive Navigation
Simulation of autonomous vehicle systems requires that simulated traffic
participants exhibit diverse and realistic behaviors. The use of prerecorded
real-world traffic scenarios in simulation ensures realism, but the rarity
of safety-critical events makes large-scale collection of driving scenarios
expensive. In this paper, we present DJINN - a diffusion-based method for
generating traffic scenarios. Our approach jointly diffuses the trajectories
of all agents, conditioned on a flexible set of state observations from the
past, present, or future. On popular trajectory forecasting datasets, we
report state-of-the-art performance on joint trajectory metrics. In
addition, we demonstrate how DJINN flexibly enables direct test-time
sampling from a variety of valuable conditional distributions, including
goal-based sampling, behavior-class sampling, and scenario editing.
Comment: 10 pages, 4 figures
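Conditioning a joint diffusion on known observations is often done by an inpainting-style replacement trick: at every reverse step, the entries corresponding to observed states are overwritten with their forward-noised ground truth at the current noise level. The sketch below shows only that conditioning mechanism on a toy multi-agent state; the "denoiser" is a deliberately trivial placeholder, since DJINN's learned network is not described at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint state: 3 agents x 10 timesteps x 2 coordinates.
x0 = rng.normal(size=(3, 10, 2))
observed = np.zeros((3, 10, 2), dtype=bool)
observed[:, :4] = True                      # past 4 steps act as conditioning

betas = np.linspace(1e-4, 0.2, 50)          # toy noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Closed-form forward noising q(x_t | x_0) of a DDPM-style diffusion."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x = rng.normal(size=x0.shape)               # start the reverse pass from noise
for t in reversed(range(len(betas))):
    x = x - betas[t] * x                    # placeholder for a learned denoiser
    # Replacement-style conditioning: pin the observed entries to their
    # forward-noised ground truth at noise level t.
    x[observed] = q_sample(x0, t)[observed]

print(x.shape)                              # (3, 10, 2)
```

With a real trained denoiser, the unobserved entries would be pulled toward trajectories consistent with the pinned observations, which is what lets one conditioning mechanism serve forecasting, goal-based sampling, and editing alike.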
A Systematic Review for Transformer-based Long-term Series Forecasting
The emergence of deep learning has yielded noteworthy advancements in time
series forecasting (TSF). Transformer architectures, in particular, have
witnessed broad utilization and adoption in TSF tasks. Transformers have
proven to be the most successful solution for extracting the semantic
correlations among the elements of a long sequence, and various variants
have enabled the transformer architecture to effectively handle long-term
time series forecasting (LTSF) tasks. In this article, we first present a
comprehensive overview of transformer architectures and their subsequent
enhancements developed to address various LTSF tasks. Then, we summarize the
publicly available LTSF datasets and the relevant evaluation metrics.
Furthermore, we provide valuable insights into best practices and techniques
for effectively training transformers in the context of time-series
analysis. Lastly, we propose potential research directions in this rapidly
evolving field.
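One widely used preprocessing step in transformer-based LTSF pipelines is to tokenize a long series into overlapping patches, so that each attention token covers a local window rather than a single timestep. The sketch below shows that patching step alone with numpy; the patch length and stride are arbitrary illustrative choices, not values prescribed by any particular surveyed model.

```python
import numpy as np

def make_patches(series, patch_len=16, stride=8):
    """Slice a long 1D series into overlapping patches.

    Each patch becomes one attention token, shrinking the sequence length
    a transformer must attend over while keeping local temporal structure.
    """
    T = len(series)
    starts = range(0, T - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])

x = np.sin(np.linspace(0.0, 8.0 * np.pi, 96))   # a 96-step toy series
patches = make_patches(x)
print(patches.shape)                            # (11, 16): 11 tokens of length 16
```

Reducing 96 raw timesteps to 11 tokens cuts the quadratic attention cost substantially, which is one reason patching appears in several of the LTSF variants such surveys cover.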