Multi-Camera Multi-Object Tracking is currently drawing attention in the
computer vision field due to its superior performance in real-world
applications such as video surveillance in crowded scenes or in wide spaces. In
this work, we propose a mathematically elegant multi-camera multi-object
tracking approach based on a spatial-temporal lifted multicut formulation. Our
model utilizes state-of-the-art tracklets produced by single-camera trackers as
proposals. As these tracklets may contain ID-switch errors, we refine them
through a novel pre-clustering obtained from 3D geometry projections. As a
result, we derive a better tracking graph that is free of ID switches and has
more precise affinity costs for the data association phase. Tracklets are then
matched to
multi-camera trajectories by solving a global lifted multicut formulation that
incorporates short and long-range temporal interactions on tracklets located in
the same camera as well as inter-camera ones. In experiments, our approach
achieves near-perfect performance on the WildTrack dataset, outperforms
state-of-the-art trackers on Campus, and is on par with them on the PETS-09
dataset.

Comment: Official version for CVPR 202