Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking
Identity switching remains one of the main difficulties that Multiple Object Tracking (MOT) algorithms have to deal with. Many state-of-the-art approaches now use sequence models to solve this problem, but their training can be affected by biases that decrease their efficiency. In this paper, we introduce a new training procedure that confronts the algorithm with its own mistakes while explicitly attempting to minimize the number of switches, which results in better training. We propose an iterative scheme of building a rich training set and using it to learn a scoring function that is an explicit proxy for the target tracking metric. Whether using only simple geometric features or more sophisticated ones that also take appearance into account, our approach outperforms the state-of-the-art on several MOT benchmarks.
DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer
vision problem due to its emerging applicability in several real-world
applications. Despite a large number of existing works, solving the data
association problem in any MC-MOT pipeline is arguably one of the most
challenging tasks. Developing a robust MC-MOT system, however, is still highly
challenging due to many practical issues such as inconsistent lighting
conditions, varying object movement patterns, or the trajectory occlusions of
the objects between the cameras. To address these problems, this work proposes
a new Dynamic Graph Model with Link Prediction (DyGLIP)
approach to solve the data association task. Compared to existing methods, our
new model offers several advantages, including better feature representations
and the ability to recover from lost tracks during camera transitions.
Moreover, our model works gracefully regardless of the overlapping ratios
between the cameras. Experimental results show that we outperform existing
MC-MOT algorithms by a large margin on several practical datasets. Notably, our
model works favorably in online settings but can be extended to an incremental
approach for large-scale datasets.
Comment: accepted at CVPR 202
MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking
Standardized benchmarks have been crucial in pushing the performance of
computer vision algorithms, especially since the advent of deep learning.
Although leaderboards should not be over-claimed, they often provide the most
objective measure of performance and are therefore important guides for
research. We present MOTChallenge, a benchmark for single-camera Multiple
Object Tracking (MOT) launched in late 2014, to collect existing and new data,
and create a framework for the standardized evaluation of multiple object
tracking methods. The benchmark is focused on multiple people tracking, since
pedestrians are by far the most studied object in the tracking community, with
applications ranging from robot navigation to self-driving cars. This paper
collects the first three releases of the benchmark: (i) MOT15, along with
numerous state-of-the-art results that were submitted in recent years, (ii)
MOT16, which contains new challenging videos, and (iii) MOT17, which extends
MOT16 sequences with more precise labels and evaluates tracking performance on
three different object detectors. The second and third releases not only offer
a significant increase in the number of labeled boxes but also provide labels
for multiple object classes besides pedestrians, as well as the level of
visibility for every single object of interest. We finally provide a
categorization of state-of-the-art trackers and a broad error analysis. This
will help newcomers understand the related work and research trends in the MOT
community, and hopefully shed some light on potential future research
directions.
Comment: Accepted at IJC
Multi-object Tracking from the Classics to the Modern
Visual object tracking is one of the computer vision problems that has been researched extensively over the past several decades. Many computer vision applications, such as robotics, autonomous driving, and video surveillance, require the capability to track multiple objects in videos. The most popular approach to tracking multiple objects follows the tracking-by-detection paradigm, in which the problem of tracking is divided into object detection and data association. In data association, track proposals are often generated by extending the object tracks from the previous frame with new detections in the current frame. The association algorithm then uses a track scorer or classifier to evaluate the track proposals in order to estimate the correspondence between the object detections and object tracks. The goal of this dissertation is to design track scorers and classifiers that accurately evaluate the track proposals generated during the association step. In this dissertation, I present novel track scorers and track classifiers that make predictions based on long-term object motion and appearance cues, and I demonstrate their effectiveness in tracking by using them within existing data association frameworks. First, I present an online learning algorithm that can efficiently train a track scorer based on a long-term appearance model for the classical Multiple Hypothesis Tracking (MHT) framework. I show that the classical MHT framework achieves competitive tracking performance even in modern tracking settings in which strong object detectors and strong appearance models are available. Second, I present a novel Bilinear LSTM model as a deep, long-term appearance model that serves as the basis for an end-to-end learned track classifier. The architectural design of Bilinear LSTM is inspired by insights drawn from the classical recursive least squares framework.
I incorporate this track classifier into the classical MHT framework in order to demonstrate its effectiveness in object tracking. Third, I present a novel multi-track pooling module that enables the Bilinear LSTM-based track classifier to simultaneously consider all the objects in the scene in order to better handle appearance ambiguities between different objects. I use this track classifier in a simple, greedy data association algorithm and achieve real-time, state-of-the-art tracking performance. I evaluate the methods proposed in this dissertation on public multi-object tracking datasets that capture challenging object tracking scenarios in urban areas.
Ph.D
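The tracking-by-detection loop described in the abstract above (extend tracks with new detections, score each track proposal, then associate greedily) can be illustrated with a minimal sketch. This is not the dissertation's method: the learned track scorer is replaced here by plain bounding-box IoU, and the names `iou`, `greedy_associate`, and `min_score` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (stand-in track scorer)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_score: float = 0.3,  # assumed threshold, not taken from the source
) -> Dict[int, int]:
    """Map track id -> detection index, taking the best-scoring pairs first."""
    # Every (track, detection) pair is one track proposal with a score.
    proposals = [
        (iou(box, det), tid, j)
        for tid, box in tracks.items()
        for j, det in enumerate(detections)
    ]
    proposals.sort(key=lambda p: p[0], reverse=True)

    matches: Dict[int, int] = {}
    used_detections = set()
    for score, tid, j in proposals:
        if score < min_score:
            break  # all remaining proposals score even lower
        if tid in matches or j in used_detections:
            continue  # track or detection already claimed by a better pair
        matches[tid] = j
        used_detections.add(j)
    return matches


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
    print(greedy_associate(tracks, detections))
```

Detections left unmatched (the third box in the usage example) would typically start new tracks, and unmatched tracks would be kept alive or terminated by a separate track-management rule that this sketch leaves out.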