1,499 research outputs found
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
With efficient appearance learning models, Discriminative Correlation Filter
(DCF) has been proven to be very successful in recent video object tracking
benchmarks and competitions. However, the existing DCF paradigm suffers from
two major issues, i.e., spatial boundary effect and temporal filter
degradation. To mitigate these challenges, we propose a new DCF-based tracking
method. The key innovations of the proposed method include adaptive spatial
feature selection and temporal consistent constraints, with which the new
tracker enables joint spatial-temporal filter learning in a lower dimensional
discriminative manifold. More specifically, we apply structured spatial
sparsity constraints to multi-channel filers. Consequently, the process of
learning spatial filters can be approximated by the lasso regularisation. To
encourage temporal consistency, the filter model is restricted to lie around
its historical value and updated locally to preserve the global structure in
the manifold. Last, a unified optimisation framework is proposed to jointly
select temporal consistency preserving spatial features and learn
discriminative filters with the augmented Lagrangian method. Qualitative and
quantitative evaluations have been conducted on a number of well-known
benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and
VOT2018. The experimental results demonstrate the superiority of the proposed
method over the state-of-the-art approaches
Background Subtraction via Generalized Fused Lasso Foreground Modeling
Background Subtraction (BS) is one of the key steps in video analysis. Many
background models have been proposed and achieved promising performance on
public data sets. However, due to challenges such as illumination change,
dynamic background etc. the resulted foreground segmentation often consists of
holes as well as background noise. In this regard, we consider generalized
fused lasso regularization to quest for intact structured foregrounds. Together
with certain assumptions about the background, such as the low-rank assumption
or the sparse-composition assumption (depending on whether pure background
frames are provided), we formulate BS as a matrix decomposition problem using
regularization terms for both the foreground and background matrices. Moreover,
under the proposed formulation, the two generally distinctive background
assumptions can be solved in a unified manner. The optimization was carried out
via applying the augmented Lagrange multiplier (ALM) method in such a way that
a fast parametric-flow algorithm is used for updating the foreground matrix.
Experimental results on several popular BS data sets demonstrate the advantage
of the proposed model compared to state-of-the-arts
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Convex Relaxations of SE(2) and SE(3) for Visual Pose Estimation
This paper proposes a new method for rigid body pose estimation based on
spectrahedral representations of the tautological orbitopes of and
. The approach can use dense point cloud data from stereo vision or an
RGB-D sensor (such as the Microsoft Kinect), as well as visual appearance data.
The method is a convex relaxation of the classical pose estimation problem, and
is based on explicit linear matrix inequality (LMI) representations for the
convex hulls of and . Given these representations, the relaxed
pose estimation problem can be framed as a robust least squares problem with
the optimization variable constrained to these convex sets. Although this
formulation is a relaxation of the original problem, numerical experiments
indicate that it is indeed exact - i.e. its solution is a member of or
- in many interesting settings. We additionally show that this method
is guaranteed to be exact for a large class of pose estimation problems.Comment: ICRA 2014 Preprin
Sparsity Driven People Localization with a Heterogeneous Network of Cameras
This paper addresses the problem of localizing people in low and high density crowds with a network of heterogeneous cameras. The problem is recast as a linear inverse problem. It relies on deducing the discretized occupancy vector of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e., made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the cameras network. The proposed framework is (i) generic to any scene of people, i.e., people are located in low and high density crowds, (ii) scalable to any number of cameras and already working with a single camera, (iii) unconstrained by the scene surface to be monitored, and (iv) versatile with respect to the camera's geometry, e.g., planar or omnidirectional. Qualitative and quantitative results are presented on the APIDIS and the PETS 2009 Benchmark datasets. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization technique
- …