15 research outputs found
Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification
Online multi-object tracking is a fundamental problem in time-critical video
analysis applications. A major challenge in the popular tracking-by-detection
framework is how to associate unreliable detection results with existing
tracks. In this paper, we propose to handle unreliable detection by collecting
candidates from outputs of both detection and tracking. The intuition behind
generating redundant candidates is that detection and tracks can complement
each other in different scenarios. Detection results of high confidence prevent
tracking drifts in the long term, and predictions of tracks can handle noisy
detection caused by occlusion. In order to apply optimal selection from a
considerable amount of candidates in real-time, we present a novel scoring
function based on a fully convolutional neural network, that shares most
computations on the entire image. Moreover, we adopt a deeply learned
appearance representation, which is trained on large-scale person
re-identification datasets, to improve the identification ability of our
tracker. Extensive experiments show that our tracker achieves real-time and
state-of-the-art performance on a widely used people tracking benchmark.Comment: ICME 201
Learning Non-Uniform Hypergraph for Multi-Object Tracking
The majority of Multi-Object Tracking (MOT) algorithms based on the
tracking-by-detection scheme do not use higher order dependencies among objects
or tracklets, which makes them less effective in handling complex scenarios. In
this work, we present a new near-online MOT algorithm based on non-uniform
hypergraph, which can model different degrees of dependencies among tracklets
in a unified objective. The nodes in the hypergraph correspond to the tracklets
and the hyperedges with different degrees encode various kinds of dependencies
among them. Specifically, instead of setting the weights of hyperedges with
different degrees empirically, they are learned automatically using the
structural support vector machine algorithm (SSVM). Several experiments are
carried out on various challenging datasets (i.e., PETS09, ParkingLot sequence,
SubwayFace, and MOT16 benchmark), to demonstrate that our method achieves
favorable performance against the state-of-the-art MOT methods.Comment: 11 pages, 4 figures, accepted by AAAI 201
CONFIDENCE-AWARE PEDESTRIAN TRACKING USING A STEREO CAMERA
Pedestrian tracking is a significant problem in autonomous driving. The majority of studies carries out tracking in the image domain, which is not sufficient for many realistic applications like path planning, collision avoidance, and autonomous navigation. In this study, we address pedestrian tracking using stereo images and tracking-by-detection. Our framework comes in three primary phases: (1) people are detected in image space by the mask R-CNN detector and their positions in 3D-space are computed using stereo information; (2) corresponding detections are assigned to each other across consecutive frames based on visual characteristics and 3D geometry; and (3) the current positions of pedestrians are corrected using their previous states using an extended Kalman filter. We use our tracking-to-confirm-detection method, in which detections are treated differently depending on their confidence metrics. To obtain a high recall value while keeping a low number of false positives. While existing methods consider all target trajectories have equal accuracy, we estimate a confidence value for each trajectory at every epoch. Thus, depending on their confidence values, the targets can have different contributions to the whole tracking system. The performance of our approach is evaluated using the Kitti benchmark dataset. It shows promising results comparable to those of other state-of-the-art methods
Improving Multi-Frame Data Association with Sparse Representations for Robust Near-Online Multi-Object Tracking
Multiple Object Tracking still remains a difficult problem due to appearance variations and occlusions of the targets or detection failures. Using sophisticated appearance models or performing data association over multiple frames are two common approaches that lead to gain in performances. Inspired by the success of sparse representations in Single Object Tracking, we propose to formulate the multi-frame data association step as an energy minimization problem, designing an energy that efficiently exploits sparse representations of all detections. Furthermore, we propose to use a structured sparsity-inducing norm to compute representations more suited to the tracking context. We perform extensive experiments to demonstrate the effectiveness of the proposed formulation , and evaluate our approach on two public authoritative benchmarks in order to compare it with several state-of-the-art methods
MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking
Standardized benchmarks have been crucial in pushing the performance of
computer vision algorithms, especially since the advent of deep learning.
Although leaderboards should not be over-claimed, they often provide the most
objective measure of performance and are therefore important guides for
research. We present MOTChallenge, a benchmark for single-camera Multiple
Object Tracking (MOT) launched in late 2014, to collect existing and new data,
and create a framework for the standardized evaluation of multiple object
tracking methods. The benchmark is focused on multiple people tracking, since
pedestrians are by far the most studied object in the tracking community, with
applications ranging from robot navigation to self-driving cars. This paper
collects the first three releases of the benchmark: (i) MOT15, along with
numerous state-of-the-art results that were submitted in the last years, (ii)
MOT16, which contains new challenging videos, and (iii) MOT17, that extends
MOT16 sequences with more precise labels and evaluates tracking performance on
three different object detectors. The second and third release not only offers
a significant increase in the number of labeled boxes but also provide labels
for multiple object classes beside pedestrians, as well as the level of
visibility for every single object of interest. We finally provide a
categorization of state-of-the-art trackers and a broad error analysis. This
will help newcomers understand the related work and research trends in the MOT
community, and hopefully shed some light on potential future research
directions.Comment: Accepted at IJC