Learning Future Object Prediction with a Spatiotemporal Detection Transformer
We explore future object prediction -- a challenging problem where all
objects visible in a future video frame are to be predicted. We propose to
tackle this problem end-to-end by training a detection transformer to directly
output future objects. In order to make accurate predictions about the future,
it is necessary to capture the dynamics in the scene, both of other objects and
of the ego-camera. We extend existing detection transformers in two ways to
capture the scene dynamics. First, we experiment with three different
mechanisms that enable the model to spatiotemporally process multiple frames.
Second, we feed ego-motion information to the model via cross-attention. We
show that both of these cues substantially improve future object prediction
performance. Our final approach learns to capture the dynamics and make
predictions on par with an oracle for 100 ms prediction horizons, and
outperform baselines for longer prediction horizons.
Comment: 15 pages, 6 figures
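To illustrate the idea of feeding ego-motion to the model via cross-attention, here is a minimal sketch of single-head scaled dot-product cross-attention in plain Python. It is not the authors' architecture: the names `object_queries` and `ego_tokens`, the dimensions, the random weights, and the residual update are all assumptions made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query attends to all context tokens."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    # scores[i][j] is the scaled similarity between query i and context token j.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, v)
    # Residual connection: add the attended context to the original queries.
    updated = [[x + y for x, y in zip(qr, ar)] for qr, ar in zip(queries, attended)]
    return updated, weights


def rand_matrix(rows, cols, scale=1.0):
    """Hypothetical stand-in for learned weights or features."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4                                  # assumed embedding size
object_queries = rand_matrix(5, d)     # assumed: 5 object queries
ego_tokens = rand_matrix(3, d)         # assumed: 3 embedded ego-motion readings
w_q = rand_matrix(d, d, 0.1)
w_k = rand_matrix(d, d, 0.1)
w_v = rand_matrix(d, d, 0.1)

updated, attn = cross_attention(object_queries, ego_tokens, w_q, w_k, w_v)
print(len(updated), len(updated[0]))   # number of queries, embedding size
```

Each row of `attn` sums to 1, so every object query receives a convex combination of the ego-motion values. A real detection transformer would use multiple heads, learned projections, and normalization layers.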
Traffic-Aware Multi-Camera Tracking of Vehicles Based on ReID and Camera Link Model
Multi-target multi-camera tracking (MTMCT), i.e., tracking multiple targets
across multiple cameras, is a crucial technique for smart city applications. In
this paper, we propose an effective and reliable MTMCT framework for vehicles,
which consists of a traffic-aware single camera tracking (TSCT) algorithm, a
trajectory-based camera link model (CLM) for vehicle re-identification (ReID),
and a hierarchical clustering algorithm to obtain the cross camera vehicle
trajectories. First, the TSCT, which jointly considers vehicle appearance,
geometric features, and some common traffic scenarios, is proposed to track the
vehicles in each camera separately. Second, the trajectory-based CLM is adopted
to model the relationship between each pair of adjacently connected
cameras and to add spatio-temporal constraints for the subsequent vehicle ReID
with temporal attention. Third, the hierarchical clustering algorithm is used
to merge the vehicle trajectories among all the cameras to obtain the final
MTMCT results. Our proposed MTMCT is evaluated on the CityFlow dataset and
achieves new state-of-the-art performance, with an IDF1 of 74.93%.
Comment: Accepted by ACM International Conference on Multimedia 202
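The final merging step can be pictured with a small sketch of agglomerative clustering over single-camera tracklets, gated by spatio-temporal feasibility. This is an illustration under stated assumptions, not the paper's algorithm. The tracklet fields, the single `max_gap` time window (a crude stand-in for a camera link model), the same-camera exclusion, the complete-linkage rule, and the `threshold` value are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two appearance feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def pair_distance(a, b, max_gap):
    """Appearance distance, or infinity when the pair is assumed infeasible."""
    if a["camera"] == b["camera"]:
        return math.inf  # assumption: never merge tracklets from the same camera
    if abs(a["time"] - b["time"]) > max_gap:
        return math.inf  # assumption: transition time outside the allowed window
    return cosine_distance(a["feature"], b["feature"])


def cluster_tracklets(tracklets, threshold, max_gap):
    """Complete-linkage agglomerative clustering, stopping at `threshold`."""
    clusters = [[i] for i in range(len(tracklets))]
    while True:
        best_d, best_i, best_j = threshold, None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the worst pairwise
                # distance, so one infeasible pair blocks the whole merge.
                d = max(
                    pair_distance(tracklets[p], tracklets[q], max_gap)
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_i is None:
            return clusters
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]


# Hypothetical tracklets: camera id, timestamp in seconds, 2-D appearance feature.
tracklets = [
    {"camera": 1, "time": 0.0, "feature": [1.0, 0.0]},
    {"camera": 2, "time": 10.0, "feature": [0.9, 0.1]},
    {"camera": 1, "time": 5.0, "feature": [0.0, 1.0]},
    {"camera": 2, "time": 14.0, "feature": [0.1, 0.9]},
    {"camera": 3, "time": 500.0, "feature": [1.0, 0.0]},
]

clusters = cluster_tracklets(tracklets, threshold=0.2, max_gap=60.0)
print(clusters)  # → [[0, 1], [2, 3], [4]]
```

Tracklet 4 looks identical to tracklet 0 but stays separate because its timestamp falls outside the assumed window. That is the role the spatio-temporal constraints play before appearance matching.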