
    Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking

    Identity switching remains one of the main difficulties that Multiple Object Tracking (MOT) algorithms have to deal with. Many state-of-the-art approaches now use sequence models to solve this problem, but their training can be affected by biases that decrease their efficiency. In this paper, we introduce a new training procedure that confronts the algorithm with its own mistakes while explicitly attempting to minimize the number of switches, which results in better training. We propose an iterative scheme of building a rich training set and using it to learn a scoring function that is an explicit proxy for the target tracking metric. Whether using only simple geometric features or more sophisticated ones that also take appearance into account, our approach outperforms the state-of-the-art on several MOT benchmarks.
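
    The identity switches that the abstract above aims to minimize can be illustrated with a short sketch. It assumes a simplified, commonly used definition: a switch is counted whenever a ground-truth object is matched to a different track ID than the one it was last matched to. The function name and the input layout (one dict per frame, mapping ground-truth ID to matched track ID) are hypothetical choices for illustration, not the paper's training procedure or an official benchmark implementation.

```python
from typing import Dict, Hashable, List


def count_identity_switches(frames: List[Dict[Hashable, Hashable]]) -> int:
    """Count identity switches over a sequence of per-frame matches.

    Each element of `frames` maps a ground-truth object ID to the track ID
    it was matched to in that frame (hypothetical input format). Objects
    that are unmatched in a frame are simply absent from that frame's dict.
    """
    last_track: Dict[Hashable, Hashable] = {}  # last track seen per object
    switches = 0
    for matches in frames:
        for gt_id, track_id in matches.items():
            # A switch occurs when the object was matched before and the
            # matched track differs from the previous one.
            if gt_id in last_track and last_track[gt_id] != track_id:
                switches += 1
            last_track[gt_id] = track_id
    return switches


if __name__ == "__main__":
    example = [
        {1: "a", 2: "b"},
        {1: "a", 2: "c"},  # object 2 switches from track b to track c
        {1: "b"},          # object 1 switches from track a to track b
        {2: "c"},          # object 2 keeps track c
    ]
    print(count_identity_switches(example))
```

    Because unmatched frames are skipped, a switch is still counted when an object reappears under a new track ID after a gap.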

    DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking

    Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications. Despite a large number of existing works, solving the data association problem in any MC-MOT pipeline is arguably one of the most challenging tasks. Developing a robust MC-MOT system, however, is still highly challenging due to many practical issues such as inconsistent lighting conditions, varying object movement patterns, or the trajectory occlusions of the objects between the cameras. To address these problems, this work proposes a new Dynamic Graph Model with Link Prediction (DyGLIP) approach to solve the data association task. Compared to existing methods, our new model offers several advantages, including better feature representations and the ability to recover from lost tracks during camera transitions. Moreover, our model works gracefully regardless of the overlapping ratios between the cameras. Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets. Notably, our model works favorably in online settings but can be extended to an incremental approach for large-scale datasets. Comment: accepted at CVPR 202

    MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

    Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data, and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in recent years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions. Comment: Accepted at IJC

    Multi-object Tracking from the Classics to the Modern

    Visual object tracking is one of the computer vision problems that has been researched extensively over the past several decades. Many computer vision applications, such as robotics, autonomous driving, and video surveillance, require the capability to track multiple objects in videos. The most popular approach to tracking multiple objects follows the tracking-by-detection paradigm, in which the problem of tracking is divided into object detection and data association. In data association, track proposals are often generated by extending the object tracks from the previous frame with new detections in the current frame. The association algorithm then utilizes a track scorer or classifier to evaluate track proposals in order to estimate the correspondence between the object detections and object tracks. The goal of this dissertation is to design a track scorer and classifier that accurately evaluates track proposals that are generated during the association step. In this dissertation, I present novel track scorers and track classifiers that make a prediction based on long-term object motion and appearance cues, and demonstrate their effectiveness in tracking by utilizing them within existing data association frameworks. First, I present an online learning algorithm that can efficiently train a track scorer based on a long-term appearance model for the classical Multiple Hypothesis Tracking (MHT) framework. I show that the classical MHT framework achieves competitive tracking performance even in modern tracking settings in which strong object detectors and strong appearance models are available. Second, I present a novel Bilinear LSTM model as a deep, long-term appearance model which is a basis for an end-to-end learned track classifier. The architectural design of Bilinear LSTM is inspired by insights drawn from the classical recursive least squares framework. I incorporate this track classifier into the classical MHT framework in order to demonstrate its effectiveness in object tracking. Third, I present a novel multi-track pooling module that enables the Bilinear LSTM-based track classifier to simultaneously consider all the objects in the scene in order to better handle appearance ambiguities between different objects. I utilize this track classifier in a simple, greedy data association algorithm and achieve real-time, state-of-the-art tracking performance. I evaluate the proposed methods in this dissertation on public multi-object tracking datasets that capture challenging object tracking scenarios in urban areas. Ph.D
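
    The association step described in the abstract above (score every track/detection pair, then commit to the best pairs first) might be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: bounding-box IoU stands in for the learned track scorer, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the names `iou`, `greedy_associate`, and `min_score` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(
    tracks: Dict[Hashable, Box],
    detections: List[Box],
    score: Callable[[Box, Box], float] = iou,  # stand-in for a learned scorer
    min_score: float = 0.3,                    # hypothetical acceptance threshold
) -> Tuple[Dict[Hashable, int], List[int]]:
    """Greedily match tracks to detections, highest-scoring pair first.

    Returns (matches, unmatched), where `matches` maps a track ID to the
    index of its detection and `unmatched` lists leftover detection indices.
    """
    proposals = [
        (score(t_box, d_box), t_id, d_idx)
        for t_id, t_box in tracks.items()
        for d_idx, d_box in enumerate(detections)
    ]
    proposals.sort(key=lambda p: p[0], reverse=True)  # best proposals first

    matches: Dict[Hashable, int] = {}
    used = set()
    for s, t_id, d_idx in proposals:
        if s < min_score:
            break  # all remaining proposals score even lower
        if t_id in matches or d_idx in used:
            continue  # each track and each detection is used at most once
        matches[t_id] = d_idx
        used.add(d_idx)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched


if __name__ == "__main__":
    tracks = {"t1": (0, 0, 10, 10), "t2": (20, 20, 30, 30)}
    detections = [(21, 21, 31, 31), (1, 0, 11, 10), (100, 100, 110, 110)]
    print(greedy_associate(tracks, detections))
```

    Unmatched detections would typically start new tracks. A greedy pass like this trades the global optimality of an assignment solver for simplicity and speed.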