Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization
Multi-person tracking plays a critical role in the analysis of surveillance
video. However, most existing work focuses on shorter-term (e.g. minute-long or
hour-long) video sequences. Therefore, we propose a multi-person tracking
algorithm for very long-term (e.g. month-long) multi-camera surveillance
scenarios. Long-term tracking is challenging because 1) the apparel/appearance
of the same person will vary greatly over multiple days and 2) a person will
leave and re-enter the scene numerous times. To tackle these challenges, we
leverage face recognition information, which is robust to apparel change, to
automatically reinitialize our tracker over multiple days of recordings.
Unfortunately, recognized faces are often unavailable. Therefore, our
tracker propagates identity information to frames without recognized faces by
uncovering the appearance and spatial manifold formed by person detections. We
tested our algorithm on a 23-day 15-camera data set (4,935 hours total), and we
were able to localize a person 53.2% of the time with 69.8% precision. We
further performed video summarization experiments based on our tracking output.
Results on 116.25 hours of video showed that we were able to generate a
reasonable visual diary (i.e. a summary of what a person did) for different
people, thus potentially opening the door to automatic summarization of the
vast amount of surveillance video generated every day.
Underwater Fish Tracking for Moving Cameras based on Deformable Multiple Kernels
Fishery surveys that call for the use of single or multiple underwater
cameras have been an emerging technology as a non-extractive means to estimate
the abundance of fish stocks. Tracking live fish in an open aquatic environment
poses challenges that are different from general pedestrian or vehicle tracking
in surveillance applications. In many rough habitats fish are monitored by
cameras installed on moving platforms, where tracking is even more challenging
due to inapplicability of background models. In this paper, a novel tracking
algorithm based on the deformable multiple kernels (DMK) is proposed to address
these challenges. Inspired by the deformable part model (DPM) technique, a set
of kernels is defined to represent the holistic object and several parts that
are arranged in a deformable configuration. Color histogram, texture histogram
and the histogram of oriented gradients (HOG) are extracted and serve as object
features. Kernel motion is efficiently estimated by the mean-shift algorithm on
color and texture features to realize tracking. Furthermore, the HOG-feature
deformation costs are adopted as soft constraints on kernel positions to
maintain the part configuration. Experimental results on a practical video set
from underwater moving cameras show the reliable performance of the proposed
method with much less computational cost compared with state-of-the-art
techniques.
Comment: 11 page
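The mean-shift step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single kernel with a flat circular window and a hypothetical per-pixel weight map (standing in for a colour or texture likelihood); it is not the DMK tracker itself and omits the multiple kernels and the HOG deformation costs.

```python
import math

def mean_shift(points, weights, start, bandwidth, max_iter=50, tol=1e-6):
    """Move a kernel centre to a nearby mode of a weighted point set.

    Uses a flat circular window: each iteration replaces the centre with the
    weighted mean of the points that fall inside the window.
    """
    cx, cy = start
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for (px, py), w in zip(points, weights):
            if (px - cx) ** 2 + (py - cy) ** 2 <= bandwidth ** 2:
                total += w
                sum_x += w * px
                sum_y += w * py
        if total == 0.0:
            break  # no support inside the window; keep the current centre
        new_x, new_y = sum_x / total, sum_y / total
        shift = math.hypot(new_x - cx, new_y - cy)
        cx, cy = new_x, new_y
        if shift < tol:
            break
    return cx, cy

# Hypothetical weight map on a 21x21 pixel grid, peaked at (12, 9).
peak = (12, 9)
points = [(x, y) for x in range(21) for y in range(21)]
weights = [math.exp(-((x - peak[0]) ** 2 + (y - peak[1]) ** 2) / 8.0)
           for x, y in points]

# Start from the previous-frame position and let the kernel drift to the mode.
cx, cy = mean_shift(points, weights, start=(8.0, 8.0), bandwidth=5.0)
```

Because the window is discrete, the centre settles close to the peak rather than exactly on it, which is usually sufficient for frame-to-frame tracking.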
Geometric Hypergraph Learning for Visual Tracking
Graph-based representation is widely used in the visual tracking field by finding
correct correspondences between target parts in consecutive frames. However,
most graph-based trackers consider pairwise geometric relations between local
parts. They do not make full use of the target's intrinsic structure, thereby
making the representation easily disturbed by errors in pairwise affinities
when large deformation and occlusion occur. In this paper, we propose a
geometric hypergraph learning based tracking method, which fully exploits
high-order geometric relations among multiple correspondences of parts in
consecutive frames. Then visual tracking is formulated as the mode-seeking
problem on the hypergraph in which vertices represent correspondence hypotheses
and hyperedges describe high-order geometric relations. Besides, a
confidence-aware sampling method is developed to select representative vertices
and hyperedges to construct the geometric hypergraph for more robustness and
scalability. The experiments are carried out on two challenging datasets
(VOT2014 and Deform-SOT) to demonstrate that the proposed method performs
favorably against other existing trackers.
Motion-Appearance Interactive Encoding for Object Segmentation in Unconstrained Videos
We present a novel method of integrating motion and appearance cues for
foreground object segmentation in unconstrained videos. Unlike conventional
methods encoding motion and appearance patterns individually, our method puts
particular emphasis on their mutual assistance. Specifically, we propose using
an interactively constrained encoding (ICE) scheme to incorporate motion and
appearance patterns into a graph that leads to a spatiotemporal energy
optimization. The reason for using ICE is that the motion and appearance
cues for the same target share an underlying correlative structure and thus can be
exploited in a deeply collaborative manner. We perform ICE not only in the
initialization but also in the refinement stage of a two-layer framework for
object segmentation. This scheme allows our method to consistently capture
structural patterns about object perceptions throughout the whole framework.
Our method can be operated on superpixels instead of raw pixels to reduce the
number of graph nodes by two orders of magnitude. Moreover, we propose to
partially explore the multi-object localization problem with inter-occlusion by
weighted bipartite graph matching. Comprehensive experiments on three benchmark
datasets (i.e., SegTrack, MOViCS, and GaTech) demonstrate the effectiveness of
our approach compared with extensive state-of-the-art methods.
Comment: 11 pages, 7 figure
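As a rough illustration of the weighted bipartite graph matching mentioned in the abstract above, the sketch below finds the maximum-weight assignment of objects to candidate regions by brute force. The weight values and the function name `best_assignment` are hypothetical, and the abstract does not say which solver the authors use; for realistic sizes a polynomial-time method such as the Hungarian algorithm would be the usual choice.

```python
from itertools import permutations

def best_assignment(weights):
    """Return (columns, score) for the maximum-weight one-to-one matching.

    weights[r][c] is the affinity between object r and candidate region c.
    Every object is matched to a distinct region; requires rows <= columns.
    Brute force over permutations, so only suitable for tiny problems.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many regions as objects")
    best_cols, best_score = None, float("-inf")
    for cols in permutations(range(n_cols), n_rows):
        score = sum(weights[r][c] for r, c in enumerate(cols))
        if score > best_score:
            best_cols, best_score = list(cols), score
    return best_cols, best_score

# Hypothetical affinities: two objects, three candidate regions.
weights = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
assignment, score = best_assignment(weights)
```

In this example a greedy choice would give object 0 its best region (column 0) and leave object 1 with a poor match, whereas the joint optimum assigns object 0 to column 1 and object 1 to column 0.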
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers and generating ideas to writing a paper.
Comment: Survey Pape
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper gives futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers in several
conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Tracklet Association Tracker: An End-to-End Learning-based Association Approach for Multi-Object Tracking
Traditional multiple object tracking methods divide the task into two parts:
affinity learning and data association. This separation requires defining
a hand-crafted training goal in the affinity learning stage and a
hand-crafted cost function in the data association stage, which prevents the
tracking goals from being learned directly from the features. In this paper, we
present a new multiple object tracking (MOT) framework with a data-driven
association method, named Tracklet Association Tracker (TAT). The framework
aims at unifying feature learning and data association through a bi-level
optimization formulation so that the association results can be directly
learned from features. To boost the performance, we also adopt the popular
hierarchical association and perform the necessary alignment and selection of
raw detection responses. Our model trains over 20X faster than a similar
approach, and achieves state-of-the-art performance on both the MOT2016 and
MOT2017 benchmarks.
Visual Tracking via Reliable Memories
In this paper, we propose a novel visual tracking framework that
intelligently discovers reliable patterns from a wide range of video to resist
drift error for long-term tracking tasks. First, we design a Discrete Fourier
Transform (DFT) based tracker which is able to exploit a large number of
tracked samples while still ensuring real-time performance. Second, we propose a
clustering method with temporal constraints to explore and memorize consistent
patterns from previous frames, named reliable memories. By virtue of this
method, our tracker can utilize uncontaminated information to alleviate
drifting issues. Experimental results show that our tracker performs favorably
against other state-of-the-art methods on benchmark datasets. Furthermore, it
is notably effective at handling drift and is able to robustly track
challenging long videos of over 4000 frames, while most other trackers lose
track in early frames.
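The abstract above does not spell out how its DFT-based tracker is formulated. One common way DFTs are used in tracking is to evaluate a template against every cyclic shift of a search signal at once through frequency-domain correlation. The 1-D sketch below illustrates only that general idea, under the assumption of a real-valued template and a naive O(n^2) transform; it is not the authors' tracker, and all names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform, kept dependency-free for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_correlation(template, signal):
    """Correlation of a real template with every cyclic shift of the signal.

    Uses the correlation theorem: DFT(corr) = conj(DFT(template)) * DFT(signal).
    """
    t_hat, s_hat = dft(template), dft(signal)
    product = [t.conjugate() * s for t, s in zip(t_hat, s_hat)]
    return [v.real for v in dft(product, inverse=True)]

def estimate_shift(template, signal):
    """Index of the correlation peak, i.e. the estimated cyclic displacement."""
    response = circular_correlation(template, signal)
    return max(range(len(response)), key=response.__getitem__)

# Hypothetical 1-D appearance template and the same pattern moved 3 samples right.
template = [0, 0, 1, 3, 1, 0, 0, 0]
signal = [template[(i - 3) % len(template)] for i in range(len(template))]
shift = estimate_shift(template, signal)
```

A practical tracker would use a fast Fourier transform on 2-D image patches; the naive transform here only keeps the example self-contained.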
Supervised Descent Method for Solving Nonlinear Least Squares Problems in Computer Vision
Many computer vision problems (e.g., camera calibration, image alignment,
structure from motion) are solved with nonlinear optimization methods. It is
generally accepted that second order descent methods are the most robust, fast,
and reliable approaches for nonlinear optimization of a general smooth
function. However, in the context of computer vision, second order descent
methods have two main drawbacks: (1) the function might not be analytically
differentiable and numerical approximations are impractical, and (2) the
Hessian may be large and not positive definite. To address these issues, this
paper proposes generic descent maps, which are average "descent directions" and
rescaling factors learned in a supervised fashion. Using generic descent maps,
we derive a practical algorithm - Supervised Descent Method (SDM) - for
minimizing Nonlinear Least Squares (NLS) problems. During training, SDM learns
a sequence of descent maps that minimize the NLS. In testing, SDM minimizes the
NLS objective using the learned descent maps without computing the Jacobian or
the Hessian. We prove the conditions under which the SDM is guaranteed to
converge. We illustrate the effectiveness and accuracy of SDM in three computer
vision problems: rigid image alignment, non-rigid image alignment, and 3D pose
estimation. In particular, we show how SDM achieves state-of-the-art
performance in the problem of facial feature detection. The code has been made
available at www.humansensing.cs.cmu.edu/intraface.
Comment: 15 pages. In submission to TPAM
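The training and testing phases described in the abstract above can be sketched in one dimension. This is a toy illustration under stated assumptions, not the released implementation: the measurement function `h`, the fixed starting point, and the use of a scalar slope and bias per stage (fitted by ordinary least squares) are all hypothetical choices, and the sign of the update is absorbed into the learned slope.

```python
def h(x):
    # Hypothetical nonlinear measurement function (monotonic, so y determines x).
    return x ** 3 + x

def fit_line(features, targets):
    """Ordinary least squares for targets ~ r * features + b (scalar case)."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var = sum((f - mean_f) ** 2 for f in features)
    if var == 0.0:
        return 0.0, mean_t
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    r = cov / var
    return r, mean_t - r * mean_f

def train_sdm(x_true, x_init, stages=4):
    """Learn one (slope, bias) descent map per stage from training pairs."""
    ys = [h(x) for x in x_true]
    xs = list(x_init)
    maps = []
    for _ in range(stages):
        residuals = [h(x) - y for x, y in zip(xs, ys)]
        deltas = [xt - x for xt, x in zip(x_true, xs)]  # ideal updates
        r, b = fit_line(residuals, deltas)
        maps.append((r, b))
        xs = [x + r * res + b for x, res in zip(xs, residuals)]
    return maps

def apply_sdm(maps, y, x0):
    """Testing phase: apply the learned maps; no Jacobian or Hessian needed."""
    x = x0
    for r, b in maps:
        x = x + r * (h(x) - y) + b
    return x

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

x_true = [i / 20 - 1.0 for i in range(41)]   # ground-truth parameters in [-1, 1]
x_init = [0.0] * len(x_true)                 # every sample starts from 0
maps = train_sdm(x_true, x_init)
estimates = [apply_sdm(maps, h(x), 0.0) for x in x_true]
initial_error = mse(x_init, x_true)
final_error = mse(estimates, x_true)
```

Each stage can do no worse than a zero update on the training set, because a slope and bias of zero is one of the candidate fits, so the training error is non-increasing across stages.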
Tracking with multi-level features
We present a novel formulation of the multiple object tracking problem which
integrates low and mid-level features. In particular, we formulate the tracking
problem as a quadratic program coupling detections and dense point
trajectories. Due to the computational complexity of the initial QP, we propose
an approximation by two auxiliary problems, a temporal and spatial association,
where the temporal subproblem can be efficiently solved by a linear program and
the spatial association by a clustering algorithm. The objective function of
the QP is used in order to find the optimal number of clusters, where each
cluster ideally represents one person. Evaluation is provided for multiple
scenarios, showing the superiority of our method with respect to classic
tracking-by-detection methods and also other methods that greedily integrate
low-level features.
Comment: Submitted as an IEEE PAMI short articl