Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models
Video Anomaly Detection (VAD) serves as a pivotal technology in
intelligent surveillance systems, enabling the temporal or spatial
identification of anomalous events within videos. While existing reviews
predominantly concentrate on conventional unsupervised methods, they often
overlook the emergence of weakly-supervised and fully-unsupervised approaches.
To address this gap, this survey extends the conventional scope of VAD beyond
unsupervised methods, encompassing a broader spectrum termed Generalized Video
Anomaly Event Detection (GVAED). By skillfully incorporating recent
advancements rooted in diverse assumptions and learning frameworks, this survey
introduces an intuitive taxonomy that seamlessly navigates through
unsupervised, weakly-supervised, supervised and fully-unsupervised VAD
methodologies, elucidating the distinctions and interconnections within these
research trajectories. In addition, this survey facilitates prospective
researchers by assembling a compilation of research resources, including public
datasets, available codebases, programming tools, and pertinent literature.
Furthermore, this survey quantitatively assesses model performance, delves into
research challenges and directions, and outlines potential avenues for future
exploration.

Comment: Accepted by ACM Computing Surveys. For more information, please see
our project page: https://github.com/fudanyliu/GVAE
Exploiting Spatial-temporal Correlations for Video Anomaly Detection
Video anomaly detection (VAD) remains a challenging task in the pattern
recognition community due to the ambiguity and diversity of abnormal events.
Existing deep learning-based VAD methods usually leverage proxy tasks to learn
the normal patterns and discriminate the instances that deviate from such
patterns as abnormal. However, most of them do not take full advantage of
spatial-temporal correlations among video frames, which is critical for
understanding normal patterns. In this paper, we address unsupervised VAD by
learning the evolution regularity of appearance and motion in the long and
short-term and exploit the spatial-temporal correlations among consecutive
frames in normal videos more adequately. Specifically, we propose to utilize
the spatiotemporal long short-term memory (ST-LSTM) to extract and memorize
spatial appearances and temporal variations in a unified memory cell. In
addition, inspired by the generative adversarial network, we introduce a
discriminator to perform adversarial learning with the ST-LSTM to enhance the
learning capability. Experimental results on standard benchmarks demonstrate
the effectiveness of spatial-temporal correlations for unsupervised VAD. Our
method achieves competitive performance compared to the state-of-the-art
methods with AUCs of 96.7%, 87.8%, and 73.1% on the UCSD Ped2, CUHK Avenue, and
ShanghaiTech datasets, respectively.

Comment: This paper is accepted at the IEEE 26th International Conference on
Pattern Recognition (ICPR) 2022
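The combined objective sketched in the abstract above, a frame prediction/reconstruction loss plus an adversarial term from the discriminator, can be illustrated in simplified form. The least-squares GAN formulation and the weight `lam` are illustrative assumptions, not necessarily the paper's exact losses:

```python
import numpy as np

def generator_loss(pred, target, d_score_fake, lam=0.05):
    """Predictor (ST-LSTM) loss: frame reconstruction error plus an
    adversarial term rewarding predictions the discriminator scores
    as real (least-squares GAN form; lam is an assumed weight)."""
    recon = np.mean((pred - target) ** 2)
    adv = np.mean((d_score_fake - 1.0) ** 2)  # push D(fake) toward 1
    return recon + lam * adv

def discriminator_loss(d_score_real, d_score_fake):
    """Discriminator loss: score real frames as 1, predicted frames as 0."""
    return np.mean((d_score_real - 1.0) ** 2) + np.mean(d_score_fake ** 2)

# Toy check: a perfect prediction that fully fools the discriminator
pred = target = np.ones((4, 4))
print(generator_loss(pred, target, d_score_fake=np.array([1.0])))  # 0.0
```

Alternating minimization of these two losses is the standard adversarial training loop the abstract alludes to.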
Towards Explainable Visual Anomaly Detection
Anomaly detection and localization of visual data, including images and
videos, are of great significance in both machine learning academia and applied
real-world scenarios. Despite the rapid development of visual anomaly detection
techniques in recent years, the interpretations of these black-box models and
reasonable explanations of why anomalies can be distinguished are scarce.
This paper provides the first survey concentrated on explainable visual anomaly
detection methods. We first introduce the basic background of image-level
anomaly detection and video-level anomaly detection, followed by the current
explainable approaches for visual anomaly detection. Then, as the main content
of this survey, a comprehensive and exhaustive literature review of explainable
anomaly detection methods for both images and videos is presented. Finally, we
discuss several promising future directions and open problems concerning the
explainability of visual anomaly detection.
Future Frame Prediction for Anomaly Detection -- A New Baseline
Anomaly detection in videos refers to the identification of events that do
not conform to expected behavior. However, almost all existing methods tackle
the problem by minimizing the reconstruction errors of training data, which
cannot guarantee a larger reconstruction error for an abnormal event. In this
paper, we propose to tackle the anomaly detection problem within a video
prediction framework. To the best of our knowledge, this is the first work that
leverages the difference between a predicted future frame and its ground truth
to detect an abnormal event. To predict a future frame with higher quality for
normal events, other than the commonly used appearance (spatial) constraints on
intensity and gradient, we also introduce a motion (temporal) constraint in
video prediction by enforcing the optical flow between predicted frames and
ground truth frames to be consistent, and this is the first work that
introduces a temporal constraint into the video prediction task. Such spatial
and motion constraints facilitate the future frame prediction for normal
events, and consequently facilitate the identification of abnormal events that
do not conform to the expectation. Extensive experiments on both a toy dataset and
some publicly available datasets validate the effectiveness of our method in
terms of robustness to the uncertainty in normal events and the sensitivity to
abnormal events.

Comment: IEEE Conference on Computer Vision and Pattern Recognition 2018
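The scoring step implied above (a poorly predicted frame signals an anomaly) is typically realized by computing the PSNR of each predicted frame against its ground truth and min-max normalizing the scores per video. A minimal sketch, assuming this common convention:

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and a real frame."""
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(max_val ** 2 / mse)

def regularity_scores(psnrs):
    """Min-max normalize per-frame PSNRs over one video to [0, 1];
    low scores mark frames that were hard to predict (anomalies)."""
    p = np.asarray(psnrs, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

scores = regularity_scores([30.0, 31.0, 12.0, 29.5])
print(scores.argmin())  # frame 2 is the least regular
```

Thresholding these normalized scores (or feeding them to frame-level AUC evaluation) yields the detection results abstracts in this area report.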
Anomaly Detection in Aerial Videos with Transformers
Unmanned aerial vehicles (UAVs) are widely applied for purposes of
inspection, search, and rescue operations by virtue of their low-cost,
large-coverage, real-time, and high-resolution data acquisition capacities.
Massive volumes of aerial videos are produced in these processes, in which
normal events often account for an overwhelming proportion. It is extremely
difficult to localize and extract abnormal events containing potentially
valuable information from long video streams manually. Therefore, we are
dedicated to developing anomaly detection methods to solve this issue. In this
paper, we create a new dataset, named Drone-Anomaly, for anomaly detection in
aerial videos. This dataset provides 37 training video sequences and 22 testing
video sequences from 7 different realistic scenes with various anomalous
events. There are 87,488 color video frames (51,635 for training and 35,853 for
testing), recorded at 30 frames per second. Based on
this dataset, we evaluate existing methods and offer a benchmark for this task.
Furthermore, we present a new baseline model, ANomaly Detection with
Transformers (ANDT), which treats consecutive video frames as a sequence of
tubelets, utilizes a Transformer encoder to learn feature representations from
the sequence, and leverages a decoder to predict the next frame. Our network
models normality in the training phase and identifies an event with
unpredictable temporal dynamics as an anomaly in the test phase. Moreover, to
comprehensively evaluate the performance of our proposed method, we use not
only our Drone-Anomaly dataset but also another dataset. We will make our
dataset and code publicly available. A demo video is available at
https://youtu.be/ancczYryOBY.
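The tokenization step described above, treating consecutive frames as a sequence of tubelets for a Transformer encoder, amounts to cutting a clip into non-overlapping spatiotemporal patches and flattening each into a token. A sketch with illustrative tubelet sizes `t` and `p` (the paper's actual patch dimensions are not given here):

```python
import numpy as np

def to_tubelets(frames, t=2, p=4):
    """Split a clip of shape (T, H, W, C) into non-overlapping
    spatiotemporal tubelets of shape (t, p, p, C), each flattened
    into one token vector, as a Transformer encoder would consume
    them. Sizes t and p are illustrative assumptions."""
    T, H, W, C = frames.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    x = frames.reshape(T // t, t, H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (T/t, H/p, W/p, t, p, p, C)
    return x.reshape(-1, t * p * p * C)   # one flat token per tubelet

clip = np.zeros((4, 8, 8, 3))  # 4 frames of 8x8 RGB
tokens = to_tubelets(clip)
print(tokens.shape)  # (8, 96): 2x2x2 tubelets, 2*4*4*3 values each
```

A linear projection of these tokens plus positional embeddings would then feed the encoder, with the decoder predicting the next frame as the abstract describes.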
Latent Space Autoregression for Novelty Detection
Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is highly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure.
We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, extensive experiments show that our model delivers on-par or superior performance compared to state-of-the-art methods on publicly available datasets in one-class and video anomaly detection settings. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.
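The autoregressive density estimation described above factorizes the latent likelihood as p(z) = ∏ᵢ p(zᵢ | z₍<ᵢ₎), so a regular sample accrues low negative log-likelihood while a novel one does not. A minimal sketch using unit-variance Gaussian conditionals and a stand-in prefix-mean predictor in place of the paper's learned estimator:

```python
import numpy as np

def autoregressive_nll(z, predict_mean):
    """Negative log-likelihood of latent vector z under the
    factorization p(z) = prod_i p(z_i | z_<i), with each conditional
    a unit-variance Gaussian whose mean comes from predict_mean
    (a stand-in for a learned autoregressive estimator)."""
    nll = 0.0
    for i in range(len(z)):
        mu = predict_mean(z[:i])  # condition only on earlier coordinates
        nll += 0.5 * (z[i] - mu) ** 2 + 0.5 * np.log(2 * np.pi)
    return nll

# Toy estimator: predict each coordinate as the mean of its prefix
predict = lambda prefix: prefix.mean() if len(prefix) else 0.0
z_regular = np.zeros(4)
z_novel = np.array([3.0, -3.0, 3.0, -3.0])
print(autoregressive_nll(z_regular, predict)
      < autoregressive_nll(z_novel, predict))  # True
```

Minimizing this NLL alongside the reconstruction loss is what, per the abstract, shrinks the differential entropy of the latent distribution and sharpens the normal/novel separation.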