20,780 research outputs found
Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition
Historically, researchers in the field have spent a great deal of effort to
create image representations that have scale invariance and retain spatial
location information. This paper proposes to encode equivalent temporal
characteristics in video representations for action recognition. To achieve
temporal scale invariance, we develop a method called temporal scale pyramid
(TSP). To encode temporal information, we present and compare two methods
called temporal extension descriptor (TED) and temporal division pyramid (TDP)
. Our purpose is to suggest solutions for matching complex actions that have
large variation in velocity and appearance, which is missing from most current
action representations. The experimental results on four benchmark datasets,
UCF50, HMDB51, Hollywood2 and Olympic Sports, support our approach and
significantly outperform state-of-the-art methods. Most noticeably, we achieve
65.0% mean accuracy and 68.2% mean average precision on the challenging HMDB51
and Hollywood2 datasets which constitutes an absolute improvement over the
state-of-the-art by 7.8% and 3.9%, respectively
Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108
Better Feature Tracking Through Subspace Constraints
Feature tracking in video is a crucial task in computer vision. Usually, the
tracking problem is handled one feature at a time, using a single-feature
tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives.
While this approach works quite well when dealing with high-quality video and
"strong" features, it often falters when faced with dark and noisy video
containing low-quality features. We present a framework for jointly tracking a
set of features, which enables sharing information between the different
features in the scene. We show that our method can be employed to track
features for both rigid and nonrigid motions (possibly of few moving bodies)
even when some features are occluded. Furthermore, it can be used to
significantly improve tracking results in poorly-lit scenes (where there is a
mix of good and bad features). Our approach does not require direct modeling of
the structure or the motion of the scene, and runs in real time on a single CPU
core.Comment: 8 pages, 2 figures. CVPR 201
Video Interpolation using Optical Flow and Laplacian Smoothness
Non-rigid video interpolation is a common computer vision task. In this paper
we present an optical flow approach which adopts a Laplacian Cotangent Mesh
constraint to enhance the local smoothness. Similar to Li et al., our approach
adopts a mesh to the image with a resolution up to one vertex per pixel and
uses angle constraints to ensure sensible local deformations between image
pairs. The Laplacian Mesh constraints are expressed wholly inside the optical
flow optimization, and can be applied in a straightforward manner to a wide
range of image tracking and registration problems. We evaluate our approach by
testing on several benchmark datasets, including the Middlebury and Garg et al.
datasets. In addition, we show application of our method for constructing 3D
Morphable Facial Models from dynamic 3D data
Discriminative Scale Space Tracking
Accurate scale estimation of a target is a challenging research problem in
visual object tracking. Most state-of-the-art methods employ an exhaustive
scale search to estimate the target size. The exhaustive search strategy is
computationally expensive and struggles when encountered with large scale
variations. This paper investigates the problem of accurate and robust scale
estimation in a tracking-by-detection framework. We propose a novel scale
adaptive tracking approach by learning separate discriminative correlation
filters for translation and scale estimation. The explicit scale filter is
learned online using the target appearance sampled at a set of different
scales. Contrary to standard approaches, our method directly learns the
appearance change induced by variations in the target scale. Additionally, we
investigate strategies to reduce the computational cost of our approach.
Extensive experiments are performed on the OTB and the VOT2014 datasets.
Compared to the standard exhaustive scale search, our approach achieves a gain
of 2.5% in average overlap precision on the OTB dataset. Additionally, our
method is computationally efficient, operating at a 50% higher frame rate
compared to the exhaustive scale search. Our method obtains the top rank in
performance by outperforming 19 state-of-the-art trackers on OTB and 37
state-of-the-art trackers on VOT2014.Comment: To appear in TPAMI. This is the journal extension of the
VOT2014-winning DSST tracking metho
- …