Transformer Networks for Trajectory Forecasting
Most recent successes in forecasting people's motion are based on LSTM
models, and the most recent progress has been achieved by modelling the social
interaction among people and the interaction of people with the scene. We question
the use of the LSTM models and propose the novel use of Transformer Networks
for trajectory forecasting. This is a fundamental switch from the sequential
step-by-step processing of LSTMs to the only-attention-based memory mechanisms
of Transformers. In particular, we consider both the original Transformer
Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art
on all natural language processing tasks. Our proposed Transformers predict the
trajectories of the individual people in the scene. These are "simple" models
because each person is modelled separately without any complex human-human or
scene interaction terms. In particular, the TF model without bells and whistles
yields the best score on the largest and most challenging trajectory
forecasting benchmark of TrajNet. Additionally, its extension which predicts
multiple plausible future trajectories performs on par with more engineered
techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers
can deal with missing observations, as may be the case with real sensor
data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer.
Comment: 18 pages, 3 figures
SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
Sparsifying the Transformer has garnered considerable interest, as training
the Transformer is very computationally demanding. Prior efforts to sparsify
the Transformer have either used a fixed pattern or data-driven approach to
reduce the number of operations involving the computation of multi-head
attention, which is the main bottleneck of the Transformer. However, existing
methods suffer from inevitable problems, such as the potential loss of
essential sequence features due to the uniform fixed pattern applied across all
layers, and an increase in the model size resulting from the use of additional
parameters to learn sparsity patterns in attention operations. In this paper,
we propose a novel sparsification scheme for the Transformer that integrates
convolution filters and the flood filling method to efficiently capture the
layer-wise sparse pattern in attention operations. Our sparsification approach
reduces the computational complexity and memory footprint of the Transformer
during training. We develop efficient GPU implementations of the layer-wise
sparsified attention algorithm and show that the resulting method, SPION,
achieves up to 3.08X speedup over existing state-of-the-art sparse Transformer
models, with better evaluation quality.