Future Frame Prediction for Anomaly Detection -- A New Baseline
Anomaly detection in videos refers to the identification of events that do
not conform to expected behavior. However, almost all existing methods tackle
the problem by minimizing the reconstruction errors of training data, which
cannot guarantee a larger reconstruction error for an abnormal event. In this
paper, we propose to tackle the anomaly detection problem within a video
prediction framework. To the best of our knowledge, this is the first work that
leverages the difference between a predicted future frame and its ground truth
to detect an abnormal event. To predict a future frame with higher quality for
normal events, other than the commonly used appearance (spatial) constraints on
intensity and gradient, we also introduce a motion (temporal) constraint in
video prediction by enforcing the optical flow between predicted frames and
ground truth frames to be consistent, and this is the first work that
introduces a temporal constraint into the video prediction task. Such spatial
and motion constraints facilitate the future frame prediction for normal
events, and consequently help identify abnormal events that do not conform to
expectations. Extensive experiments on both a toy dataset and
some publicly available datasets validate the effectiveness of our method in
terms of robustness to the uncertainty in normal events and the sensitivity to
abnormal events.
Comment: IEEE Conference on Computer Vision and Pattern Recognition 201
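The scoring scheme the abstract describes, judging a frame abnormal when its prediction error is large, is often reported as the PSNR between the predicted and ground-truth frame, normalized into a per-video regularity score. A minimal sketch of that evaluation step (function names and the min-max normalization are assumptions, not the paper's exact protocol):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    # Peak signal-to-noise ratio between a predicted future frame and its
    # ground truth; a lower PSNR (larger prediction error) suggests the
    # frame contains an abnormal event.
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def regularity_scores(psnr_values):
    # Min-max normalize per-frame PSNR over a video to [0, 1];
    # scores near 0 flag likely anomalies.
    p = np.asarray(psnr_values, dtype=float)
    return (p - p.min()) / (p.max() - p.min())
```

In this sketch, thresholding the regularity score would yield the final anomaly decision; the spatial (intensity, gradient) and motion (optical-flow) constraints from the abstract act only at training time, to make predictions of normal events sharper.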
Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features
This paper aims at constructing a good graph for discovering intrinsic data
structures in a semi-supervised learning setting. Firstly, we propose to build
a non-negative low-rank and sparse (referred to as NNLRS) graph for the given
data representation. Specifically, the weights of edges in the graph are
obtained by seeking a nonnegative low-rank and sparse matrix that represents
each data sample as a linear combination of others. The so-obtained NNLRS-graph
can capture both the global mixture of subspaces structure (by the low
rankness) and the locally linear structure (by the sparseness) of the data,
hence is both generative and discriminative. Secondly, as good features are
extremely important for constructing a good graph, we propose to learn the data
embedding matrix and construct the graph jointly within one framework, which is
termed as NNLRS with embedded features (referred to as NNLRS-EF). Extensive
experiments on three publicly available datasets demonstrate that the proposed
method outperforms the state-of-the-art graph construction method by a large
margin for both semi-supervised classification and discriminative analysis,
which verifies the effectiveness of our proposed method.
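The graph construction the abstract describes, representing each sample as a nonnegative, low-rank, and sparse combination of the others, is typically posed as a convex program of the following form (a sketch; the symbols and the choice of a robust noise term are assumptions, with X the data matrix, Z the coefficient matrix whose entries become edge weights, and E a noise term):

```latex
\min_{Z,\,E}\; \|Z\|_{*} \;+\; \beta \,\|Z\|_{1} \;+\; \lambda \,\|E\|_{2,1}
\qquad \text{s.t.} \quad X = XZ + E,\;\; Z \ge 0
```

Here the nuclear norm \(\|Z\|_{*}\) promotes low rank, capturing the global mixture-of-subspaces structure; the \(\ell_1\) term promotes sparsity, capturing locally linear structure; and the constraint \(Z \ge 0\) keeps the resulting edge weights interpretable as nonnegative affinities.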
PlaneDepth: Plane-Based Self-Supervised Monocular Depth Estimation
Self-supervised monocular depth estimation refers to training a monocular
depth estimation (MDE) network using only RGB images to overcome the difficulty
of collecting dense ground truth depth. Many previous works addressed this
problem using depth classification or depth regression. However, depth
regression tends to fall into local minima due to the bilinear
interpolation search on the target view. Depth classification overcomes this
problem using pre-divided depth bins, but those depth candidates lead to
discontinuities in the final depth result, and using the same probability for
weighted summation of color and depth is ambiguous. To overcome these
limitations, we use some predefined planes that are parallel to the ground,
allowing us to automatically segment the ground and predict continuous depth
for it. We further model depth as a mixture of Laplace distributions, which
provides a more certain objective for optimization. Previous works have shown
that MDE networks only use the vertical image position of objects to estimate
the depth and ignore relative sizes. We address this problem for the first time
in both stereo and monocular training using resize cropping data augmentation.
Based on our analysis of resize cropping, we combine it with our plane
definition and improve our training strategy so that the network can learn
the relationship between depth and both the vertical image position and
relative size of objects. We further combine the self-distillation stage with
post-processing to provide more accurate supervision and save extra time in
post-processing. We conduct extensive experiments to demonstrate the
effectiveness of our analysis and improvements.
Comment: 12 pages, 7 figures
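Modeling depth as a mixture of Laplace distributions, as the abstract proposes, gives a likelihood-based training objective: each predefined plane contributes one mixture component, and the network is trained to make the target depth likely under the mixture. A minimal sketch of that negative log-likelihood (shapes and parameter names are assumptions, not the paper's exact formulation):

```python
import numpy as np

def mixture_laplace_nll(depth, mu, b, pi, eps=1e-8):
    # Negative log-likelihood of target depths under a mixture of Laplace
    # distributions, one component per plane hypothesis.
    #   depth: (N,)   target depth per pixel
    #   mu:    (N, K) per-plane depth means
    #   b:     (N, K) per-plane Laplace scales (b > 0)
    #   pi:    (N, K) mixture weights (rows sum to 1)
    comp = pi / (2.0 * b) * np.exp(-np.abs(depth[:, None] - mu) / b)
    return -np.log(comp.sum(axis=1) + eps)
```

Relative to a plain L1 regression loss, the learned scale b lets the model express uncertainty per pixel, which is one way to read the abstract's claim of "a more certain objective for optimization".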