Learning Future Object Prediction with a Spatiotemporal Detection Transformer

Authors: Johnander Joakim; Petersson Christoffer; Tonderski Adam; Åström Kalle
Publication date: 21/04/2022

We explore future object prediction -- a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and to make predictions on par with an oracle for 100 ms prediction horizons, and it outperforms baselines for longer prediction horizons.

Comment: 15 pages, 6 figures
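The abstract says ego-motion information is fed to the model via cross-attention. As a rough illustration of that mechanism only, here is a minimal single-head scaled dot-product cross-attention in plain Python. It is a sketch under assumptions, not the authors' architecture: the names `cross_attention`, `object_queries` and `ego_tokens` are hypothetical, there are no learned projections, and the numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: list of d-dimensional vectors (assumed here: object queries).
    keys, values: equally long lists of vectors (assumed here: ego-motion tokens).
    Returns one attended vector per query.
    """
    d = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return outputs


# Hypothetical toy inputs: two object queries attending to two ego-motion tokens.
object_queries = [[1.0, 0.0], [0.0, 1.0]]
ego_tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(object_queries, ego_tokens, ego_tokens)
```

Each output row is a convex combination of the value vectors, so a query that aligns with one ego-motion token draws most of its output from that token.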

arXiv.org e-Print Archive

Traffic-Aware Multi-Camera Tracking of Vehicles Based on ReID and Camera Link Model

Authors: Cai Jiarui; He Zhiqun; Hou Yunzhong; Hsu Hung-Min; Huang Tsung-Wei; Kumar Ratnesh; Li Peilun; Tan Xiao; Wang Gaoang; Zhang Haotian
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/08/2020

Multi-target multi-camera tracking (MTMCT), i.e., tracking multiple targets across multiple cameras, is a crucial technique for smart city applications. In this paper, we propose an effective and reliable MTMCT framework for vehicles, which consists of a traffic-aware single camera tracking (TSCT) algorithm, a trajectory-based camera link model (CLM) for vehicle re-identification (ReID), and a hierarchical clustering algorithm to obtain the cross camera vehicle trajectories. First, the TSCT, which jointly considers vehicle appearance, geometric features, and some common traffic scenarios, is proposed to track the vehicles in each camera separately. Second, the trajectory-based CLM is adopted to facilitate the relationship between each pair of adjacently connected cameras and add spatio-temporal constraints for the subsequent vehicle ReID with temporal attention. Third, the hierarchical clustering algorithm is used to merge the vehicle trajectories among all the cameras to obtain the final MTMCT results. Our proposed MTMCT is evaluated on the CityFlow dataset and achieves a new state-of-the-art performance with IDF1 of 74.93%.

Comment: Accepted by ACM International Conference on Multimedia 202
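For readers unfamiliar with the IDF1 metric quoted in the abstract: it is the harmonic mean of identity precision and identity recall, which reduces to 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below computes it from raw counts. The function name `idf1` and the example counts are hypothetical and are not taken from the paper or from the CityFlow evaluation.

```python
def idf1(idtp, idfp, idfn):
    """IDF1 from identity-level counts.

    idtp: identity true positives (detections matched to the correct identity)
    idfp: identity false positives
    idfn: identity false negatives
    """
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        # No ground truth and no predictions: define the score as 0.0 here.
        return 0.0
    return 2 * idtp / denominator


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = idf1(idtp=3, idfp=1, idfn=1)
```

In a real evaluation, the counts come from an optimal one-to-one matching between predicted and ground-truth trajectories; that matching step is omitted here.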

arXiv.org e-Print Archive

Crossref