Fusion of Head and Full-Body Detectors for Multi-Object Tracking
In order to track all persons in a scene, the tracking-by-detection paradigm
has proven to be a very effective approach. Yet, relying solely on a single
detector is also a major limitation, as useful image information might be
ignored. Consequently, this work demonstrates how to fuse two detectors into a
tracking system. To obtain the trajectories, we propose to formulate tracking
as a weighted graph labeling problem, resulting in a binary quadratic program.
As such problems are NP-hard, the solution can only be approximated. Based on
the Frank-Wolfe algorithm, we present a new solver that is crucial to handle
such difficult problems. Evaluation on pedestrian tracking is provided for
multiple scenarios, showing superior results over single detector tracking and
standard QP-solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and
1st on the new MOT17 benchmark, outperforming over 90 trackers.
Comment: 10 pages, 4 figures; Winner of the MOT17 challenge; CVPRW 201
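The abstract above relaxes a binary quadratic program and approximates its solution with a Frank-Wolfe-based solver. The following is only a minimal sketch of the general Frank-Wolfe idea, not the authors' solver: it assumes the simplest possible relaxation (each binary label relaxed to the interval [0, 1], with no further constraints), an objective of the form x^T Q x + c^T x, and exact line search along the Frank-Wolfe direction. The function names `objective` and `frank_wolfe_box_qp` and the toy matrices are hypothetical.

```python
def objective(Q, c, x):
    """Value of x^T Q x + c^T x for plain Python lists."""
    n = len(c)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(ci * xi for ci, xi in zip(c, x))


def frank_wolfe_box_qp(Q, c, iters=100, tol=1e-9):
    """Frank-Wolfe on the box relaxation [0, 1]^n of a binary QP (assumed setup)."""
    n = len(c)
    x = [0.5] * n  # start in the middle of the box
    for _ in range(iters):
        # Gradient of x^T Q x + c^T x is (Q + Q^T) x + c.
        g = [sum((Q[i][j] + Q[j][i]) * x[j] for j in range(n)) + c[i]
             for i in range(n)]
        # Linear minimization oracle over the box: pick the vertex that
        # minimizes g . s, i.e. s_i = 1 where the gradient is negative.
        s = [1.0 if gi < 0 else 0.0 for gi in g]
        d = [si - xi for si, xi in zip(s, x)]
        # Frank-Wolfe gap; it is zero at a stationary point.
        gap = -sum(gi * di for gi, di in zip(g, d))
        if gap <= tol:
            break
        # Exact line search for a quadratic along direction d.
        curv = sum(d[i] * Q[i][j] * d[j] for i in range(n) for j in range(n))
        gamma = 1.0 if curv <= 0 else min(1.0, gap / (2.0 * curv))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x


# Toy problem: two mutually exclusive hypotheses (off-diagonal penalty),
# where the first one has the better unary score.
Q = [[0.0, 1.0], [1.0, 0.0]]
c = [-1.0, -0.5]
relaxed = frank_wolfe_box_qp(Q, c)
labels = [round(v) for v in relaxed]
```

On this toy problem the relaxed solution already lands on a vertex of the box, so rounding is trivial. In general the relaxed solution can be fractional and the result is only an approximation, which is consistent with the abstract's remark that such problems are NP-hard.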
Simultaneous Localization and Recognition of Dynamic Hand Gestures
A framework for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this framework is a dynamic space-time warping (DSTW) algorithm that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time and also finds the best hand candidate region in every query frame. The proposed framework includes translation-invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand-signed digits gestured by people wearing short-sleeve shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series that allows for multiple candidate feature vectors to be extracted at each time step.
National Science Foundation (CNS-0202067, IIS-0308213, IIS-0329009); Office of Naval Research (N00014-03-1-0108
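The dynamic program described above can be sketched as a time-warping recurrence with one extra index for the hand candidate chosen in each query frame. This is an illustrative sketch under stated assumptions, not the published algorithm: the function name `dstw`, the 2-D toy features, and the transition rule (the candidate may change only when the query frame advances) are assumptions made for the example.

```python
import math


def dstw(model, query, dist):
    """Align `model` (one feature per frame) with `query` (a list of candidate
    features per frame). Returns (cost, path), path entries being
    (model_frame, query_frame, candidate_index)."""
    m, n = len(model), len(query)
    D = [[[math.inf] * len(query[j]) for j in range(n)] for _ in range(m)]
    back = [[[None] * len(query[j]) for j in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k, cand in enumerate(query[j]):
                local = dist(model[i], cand)
                if i == 0 and j == 0:
                    D[i][j][k] = local
                    continue
                preds = []
                if i > 0:
                    # Model advances, query frame and candidate stay fixed
                    # (assumed transition rule).
                    preds.append((i - 1, j, k))
                if j > 0:
                    # Query advances: any candidate of the previous frame.
                    for kp in range(len(query[j - 1])):
                        preds.append((i, j - 1, kp))
                        if i > 0:
                            preds.append((i - 1, j - 1, kp))
                best = min(preds, key=lambda p: D[p[0]][p[1]][p[2]])
                D[i][j][k] = local + D[best[0]][best[1]][best[2]]
                back[i][j][k] = best
    # Best candidate in the final cell gives the global matching cost.
    k_end = min(range(len(query[n - 1])), key=lambda k: D[m - 1][n - 1][k])
    cost = D[m - 1][n - 1][k_end]
    path, node = [], (m - 1, n - 1, k_end)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]][node[2]]
    path.reverse()
    return cost, path


def euclid(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Toy data: each query frame has one candidate on the model trajectory
# and one distractor (e.g. a skin-colored non-hand region).
model = [(0, 0), (1, 1), (2, 2)]
query = [[(5, 5), (0, 0)], [(1, 1), (9, 9)], [(7, 7), (2, 2)]]
cost, path = dstw(model, query, euclid)
```

The global cost would be compared across gesture models for recognition, while the third component of each path entry gives the selected hand candidate per query frame, which is the localization result.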
Deep Network Flow for Multi-Object Tracking
Data association problems are an important component of many computer vision
applications, with multi-object tracking being one of the most prominent
examples. A typical approach to data association involves finding a graph
matching or network flow that minimizes a sum of pairwise association costs,
which are often either hand-crafted or learned as linear functions of fixed
features. In this work, we demonstrate that it is possible to learn features
for network-flow-based data association via backpropagation, by expressing the
optimum of a smoothed network flow problem as a differentiable function of the
pairwise association costs. We apply this approach to multi-object tracking
with a network flow formulation. Our experiments demonstrate that we are able
to successfully learn all cost functions for the association problem in an
end-to-end fashion, which outperform hand-crafted costs in all settings. The
integration and combination of various sources of inputs becomes easy and the
cost functions can be learned entirely from data, alleviating tedious
hand-designing of costs.
Comment: Accepted to CVPR 201
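For orientation, the inference side of a network-flow tracker can be sketched as a min-cost flow over a detection graph. The sketch below covers only that inference step with hand-set costs; it does not implement the smoothed, differentiable formulation or the cost learning that the abstract describes. The function name `min_cost_tracks`, the graph layout (each detection split into an in-node and an out-node), and all cost values are assumptions for illustration.

```python
import math


def min_cost_tracks(det_costs, links, entry_cost, exit_cost):
    """det_costs[d]: cost of using detection d (negative = likely true).
    links[(u, v)]: cost of linking detection u to a later detection v.
    Returns the tracks (lists of detection indices) of a min-cost flow."""
    n = len(det_costs)
    S, T = 2 * n, 2 * n + 1
    graph = [[] for _ in range(2 * n + 2)]
    edges = []  # [target, capacity, cost]; edge e ^ 1 is the reverse of e

    def add_edge(u, v, cost):
        graph[u].append(len(edges)); edges.append([v, 1, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])

    for d in range(n):
        add_edge(S, 2 * d, entry_cost)            # track may start at d
        add_edge(2 * d, 2 * d + 1, det_costs[d])  # use detection d
        add_edge(2 * d + 1, T, exit_cost)         # track may end at d
    for (u, v), cost in links.items():
        add_edge(2 * u + 1, 2 * v, cost)          # link u -> v

    while True:
        # Bellman-Ford shortest path in the residual graph.
        dist = [math.inf] * len(graph)
        prev = [-1] * len(graph)
        dist[S] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, e, True
            if not changed:
                break
        if dist[T] >= 0:  # another track would not lower the total cost
            break
        v = T
        while v != S:     # push one unit of flow along the path
            e = prev[v]
            edges[e][1] -= 1
            edges[e ^ 1][1] += 1
            v = edges[e ^ 1][0]

    tracks = []
    for e in graph[S]:
        if edges[e][1] == 0:  # saturated entry edge: a track starts here
            node, track = edges[e][0], []
            while node != T:
                track.append(node // 2)
                # Follow the single saturated forward edge out of the out-node.
                node = next(edges[f][0] for f in graph[node + 1]
                            if f % 2 == 0 and edges[f][1] == 0)
            tracks.append(track)
    return tracks


# Two frames with two detections each; cheap links pair 0 with 2 and 1 with 3.
links = {(0, 2): 0.1, (0, 3): 5.0, (1, 2): 5.0, (1, 3): 0.2}
tracks = min_cost_tracks([-2.0] * 4, links, entry_cost=1.0, exit_cost=1.0)
```

In the setting of the abstract, the numbers passed in as `det_costs`, `links`, `entry_cost`, and `exit_cost` would be outputs of learned cost functions rather than constants chosen by hand.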