Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research
area in recent years because of its diverse practical applications, including
video surveillance, industrial damage detection and network intrusion
detection. However, building an effective anomaly detection system is a
non-trivial task, since it requires tackling several challenging issues: the
shortage of annotated data, the inability to define anomalous objects
explicitly and the expensive cost of feature engineering. Unlike existing
approaches, which only partially solve these problems, we develop a unique
framework that copes with all of them simultaneously. Instead of handling the
ambiguous definition of anomalous objects, we propose to work with regular
patterns, for which unlabeled data are abundant and usually easy to collect in
practice. This allows our system to be trained in a completely unsupervised
procedure and liberates us from the need for costly data annotation. By
learning a generative model that captures the normality distribution in the
data, we can isolate abnormal data points that receive low normality scores
(high abnormality scores). Moreover, by leveraging the power of generative
networks, i.e. energy-based models, we are also able to learn the feature
representation automatically rather than relying on hand-crafted features,
which have dominated anomaly detection research for decades. We demonstrate our
proposal on the specific application of video anomaly detection, and the
experimental results indicate that our method outperforms the baselines and is
comparable with state-of-the-art methods on many benchmark video anomaly
detection datasets.
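The normality-scoring idea above can be sketched with the free energy of a binary Restricted Boltzmann Machine, one common energy-based model: inputs with high free energy are assigned low probability by the model and therefore a high abnormality score. The weights, sizes and data below are illustrative stand-ins, not the authors' trained model:

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM: F(v) = -b_vis.v - sum_j log(1 + exp(c_j + W_j.v)).
    Higher free energy means lower model probability, i.e. a higher abnormality score."""
    wx_b = v @ W + b_hid                                  # hidden pre-activations
    return -v @ b_vis - np.sum(np.logaddexp(0.0, wx_b))  # stable log(1 + e^x)

# Toy parameters standing in for a trained model.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))
b_vis, b_hid = np.zeros(6), np.zeros(4)

frame = np.array([1, 0, 1, 0, 1, 0], dtype=float)  # a binarized input pattern
score = free_energy(frame, W, b_vis, b_hid)        # rank frames by this score
```

Anomalies would then be the frames whose score falls above a threshold calibrated on the (normal) training data.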
Crowd Management in Open Spaces
Crowd analysis and management is a challenging problem for ensuring public
safety and security, and many techniques have been proposed to address it.
However, the generalization capability of these techniques is limited because
they ignore the fact that crowd density changes from low to extremely high
depending on the scene under observation. We propose a robust feature-based
approach to the problem of crowd management for people's safety and security.
We have evaluated our method on a benchmark dataset and present a detailed
analysis.
Detection of Unknown Anomalies in Streaming Videos with Generative Energy-based Boltzmann Models
Abnormal event detection is one of the important objectives in research and
practical applications of video surveillance. However, three challenging
problems remain for most anomaly detection systems in practical settings:
limited labeled data, the ambiguous definition of "abnormal" and expensive
feature engineering steps. This paper introduces a unified detection framework
that handles these challenges using energy-based models, which are powerful
tools for unsupervised representation learning. Our proposed models are first
trained on unlabeled raw pixels of image frames from an input video, rather
than on hand-crafted visual features, and then identify the locations of
abnormal objects based on the errors between the input video and its
reconstruction produced by the models. To handle video streams, we develop an
online version of our framework, in which the model parameters are updated
incrementally as image frames arrive on the fly. Our experiments show that our
detectors, using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann
Machines (DBMs) as core modules, achieve anomaly detection performance superior
to unsupervised baselines and accuracy comparable with state-of-the-art
approaches when evaluated at the pixel level. More importantly, we discover
that our system trained with DBMs is able to perform scene clustering and scene
reconstruction simultaneously. This capacity not only distinguishes our method
from other existing detectors but also offers a unique tool to investigate and
understand how the model works.
Comment: This manuscript is under consideration at Pattern Recognition Letters
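The reconstruction-error localization described in this abstract can be caricatured in a few lines: pixels the model reconstructs poorly are flagged as abnormal. The `blur` function below is only a toy stand-in for a trained RBM/DBM reconstruction, and the threshold is arbitrary:

```python
import numpy as np

def localize_anomalies(frame, reconstruct, threshold):
    """Mark pixels whose squared reconstruction error exceeds a threshold."""
    error = (frame - reconstruct(frame)) ** 2
    return error > threshold

def blur(frame):
    """Toy 'reconstruction': a 3x3 box blur, which smooths away isolated
    structure the same way a model of normal scenes would fail to reproduce it."""
    padded = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

frame = np.zeros((8, 8))
frame[4, 4] = 1.0                                     # one abnormal bright spot
mask = localize_anomalies(frame, blur, threshold=0.25)  # True only at (4, 4)
```

In the online setting the abstract describes, `reconstruct` would additionally be updated incrementally as new frames arrive.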
Street Scene: A new dataset and evaluation protocol for video anomaly detection
Progress in video anomaly detection research is currently slowed by small
datasets that lack a wide variety of activities as well as flawed evaluation
criteria. This paper aims to help move this research effort forward by
introducing a large and varied new dataset called Street Scene, as well as two
new evaluation criteria that provide a better estimate of how an algorithm will
perform in practice. In addition to the new dataset and evaluation criteria, we
present two variations of a novel baseline video anomaly detection algorithm
and show they are much more accurate on Street Scene than two state-of-the-art
algorithms from the literature.
Comment: accepted to WACV 2020
Unsupervised Online Anomaly Detection On Irregularly Sampled Or Missing Valued Time-Series Data Using LSTM Networks
We study anomaly detection and introduce an algorithm that processes
variable-length, irregularly sampled sequences or sequences with missing
values. Our algorithm is fully unsupervised; however, it can be readily
extended to the supervised or semi-supervised case when anomaly labels are
present, as remarked throughout the paper. Our approach uses Long Short-Term
Memory (LSTM) networks to extract temporal features and find the most relevant
feature vectors for anomaly detection. We incorporate the sampling-time
information into our model by modulating the standard LSTM model with time
modulation gates. After obtaining the most relevant features from the LSTM, we
label the sequences using a Support Vector Data Description (SVDD) model. We
introduce a loss function and then jointly optimize the feature extraction and
sequence processing mechanisms in an end-to-end manner. Through this joint
optimization, the LSTM extracts the features most relevant for anomaly
detection, later used in the SVDD, which completely removes the need for
feature selection by expert knowledge. Furthermore, we provide a training
algorithm for the online setup, where we optimize our model parameters on
individual sequences as new data arrive. Finally, on real-life datasets, we
show that our model significantly outperforms the standard approaches thanks to
its combination of LSTM with SVDD and joint optimization.
Comment: 11 pages
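The SVDD labeling step can be illustrated with a simplified hypersphere score: features inside a learned sphere are normal, features outside are anomalous. Fitting the center as the feature mean is a shortcut for the actual SVDD optimization, and the random vectors below are stand-ins for LSTM-extracted features:

```python
import numpy as np

def svdd_fit(features, quantile=0.95):
    """Simplified one-class fit in the spirit of SVDD: center the hypersphere
    at the mean of normal features and set the radius to a high quantile of
    the training distances (the real SVDD solves a constrained optimization)."""
    center = features.mean(axis=0)
    dists = np.linalg.norm(features - center, axis=1)
    return center, np.quantile(dists, quantile)

def svdd_score(x, center, radius):
    """Positive score => outside the hypersphere => anomalous."""
    return np.linalg.norm(x - center) - radius

rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 8))          # stand-in for LSTM feature vectors
center, radius = svdd_fit(feats)

normal_score = svdd_score(np.zeros(8), center, radius)     # negative: inside
anomaly_score = svdd_score(np.full(8, 10.0), center, radius)  # positive: outside
```

In the paper's end-to-end setup, the gradient of a joint loss would flow through both this score and the LSTM feature extractor.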
Fence GAN: Towards Better Anomaly Detection
Anomaly detection is a classical problem where the aim is to detect anomalous
data that do not belong to the normal data distribution. Current
state-of-the-art methods for anomaly detection on complex high-dimensional data
are based on the generative adversarial network (GAN). However, the traditional
GAN loss is not directly aligned with the anomaly detection objective: it
encourages the distribution of the generated samples to overlap with the real
data and so the resulting discriminator has been found to be ineffective as an
anomaly detector. In this paper, we propose simple modifications to the GAN
loss such that the generated samples lie at the boundary of the real data
distribution. With our modified GAN loss, our anomaly detection method, called
Fence GAN (FGAN), directly uses the discriminator score as an anomaly
threshold. Our experimental results using the MNIST, CIFAR10 and KDD99 datasets
show that Fence GAN yields the best anomaly classification accuracy compared to
state-of-the-art methods.
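A minimal sketch of the modified generator objective described above, assuming a discriminator output in [0, 1]: one term pulls D(G(z)) toward a boundary value `alpha` instead of toward 1, and a dispersion term keeps the generated samples spread along that boundary. The exact functional forms and hyperparameter names here are illustrative, not the paper's definitions:

```python
import numpy as np

def fence_generator_loss(d_scores, generated, alpha=0.5, beta=1.0):
    # Encirclement: pull D(G(z)) toward alpha (the class boundary) rather
    # than toward 1 as the traditional GAN generator loss would.
    encirclement = np.mean(np.abs(d_scores - alpha))
    # Dispersion: penalize samples collapsing onto their centroid, so they
    # spread out and "fence" the real-data distribution.
    centroid = generated.mean(axis=0)
    spread = np.mean(np.linalg.norm(generated - centroid, axis=1))
    return encirclement + beta / spread

# Four well-spread samples sitting on a unit circle around the data.
samples = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
loss_on_boundary = fence_generator_loss(np.full(4, 0.5), samples)
loss_inside = fence_generator_loss(np.full(4, 0.95), samples)  # larger loss
```

At test time, the discriminator score itself is then used directly as the anomaly score, as the abstract states.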
DAP3D-Net: Where, What and How Actions Occur in Videos?
Action parsing in videos with complex scenes is an interesting but
challenging task in computer vision. In this paper, we propose a generic 3D
convolutional neural network in a multi-task learning manner for effective Deep
Action Parsing (DAP3D-Net) in videos. Particularly, in the training phase,
action localization, classification and attributes learning can be jointly
optimized on our appearance-motion data via DAP3D-Net. For an upcoming test
video, we can describe each individual action in the video simultaneously as:
Where the action occurs, What the action is and How the action is performed. To
well demonstrate the effectiveness of the proposed DAP3D-Net, we also
contribute a new Numerous-category Aligned Synthetic Action dataset, i.e.,
NASA, which consists of 200,000 action clips of more than 300 categories and
with 33 pre-defined action attributes in two hierarchical levels (i.e.,
low-level attributes of basic body part movements and high-level attributes
related to action motion). We learn DAP3D-Net using the NASA dataset and then
evaluate it on our collected Human Action Understanding (HAU) dataset.
Experimental results show that our approach can accurately localize, categorize
and describe multiple actions in realistic videos.
Adversarially Learned One-Class Classifier for Novelty Detection
Novelty detection is the process of identifying the observation(s) that
differ in some respect from the training observations (the target class). In
reality, the novelty class is often absent during training, poorly sampled or
not well defined. Therefore, one-class classifiers can efficiently model such
problems. However, due to the unavailability of data from the novelty class,
training an end-to-end deep network is a cumbersome task. In this paper,
inspired by the success of generative adversarial networks for training deep
models in unsupervised and semi-supervised settings, we propose an end-to-end
architecture for one-class classification. Our architecture is composed of two
deep networks, each of which is trained by competing with the other while
collaborating to understand the underlying concept in the target class, and
then to classify the testing samples. One network works as the novelty
detector, while the other supports it by enhancing the inlier samples and
distorting the outliers. The intuition is that the enhanced inliers and
distorted outliers are much more separable than the original samples. The
proposed framework applies to different related applications of anomaly and
outlier detection in images and videos. The results on the MNIST and
Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for
video anomaly detection, illustrate that our proposed method learns the target
class effectively and is superior to the baseline and state-of-the-art methods.
Comment: CVPR 2018 Paper
Plug-and-Play Anomaly Detection with Expectation Maximization Filtering
Anomaly detection in crowds enables early rescue response. A plug-and-play
smart camera for crowd surveillance has numerous constraints different from
typical anomaly detection: the training data cannot be used iteratively; there
are no training labels; and training and classification need to be performed
simultaneously. We tackle all these constraints with our approach in this
paper. We propose a Core Anomaly-Detection (CAD) neural network which learns
the motion behavior of objects in the scene with an unsupervised method. On
average over standard datasets, CAD with a single epoch of training shows a
percentage increase in Area Under the Curve (AUC) of 4.66% and 4.9% compared to
the best results with convolutional autoencoders and convolutional LSTM-based
methods, respectively. With a single epoch of training, our method improves the
AUC by 8.03% compared to the convolutional LSTM-based approach. We also propose
an Expectation Maximization filter which chooses samples for training the core
anomaly-detection network. The overall framework improves the AUC by 24.87%
compared to a future-frame-prediction-based approach when crowd anomaly
detection is performed on a video stream. We believe our work is the first step
towards
using deep learning methods with autonomous plug-and-play smart cameras for
crowd anomaly detection.
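The sample-selection role of the Expectation Maximization filter can be caricatured as keeping only the frames whose current anomaly scores mark them as likely normal before the next training round, so the detector is never trained on its own suspected anomalies. The scores, indices and keep fraction below are made up for illustration:

```python
import numpy as np

def em_filter(scores, keep_fraction=0.8):
    """Keep the indices of the frames whose anomaly score falls below the
    keep_fraction quantile; these 'expected normal' frames are the ones
    fed to the next round of anomaly-detector training."""
    cutoff = np.quantile(scores, keep_fraction)
    return np.flatnonzero(scores <= cutoff)

scores = np.array([0.1, 0.2, 0.15, 0.9, 0.12])  # frame 3 looks anomalous
selected = em_filter(scores, keep_fraction=0.8)  # frame 3 is filtered out
```

In a streaming deployment this selection and the detector update would alternate as each new batch of frames arrives.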
Video Anomaly Detection and Localization via Gaussian Mixture Fully Convolutional Variational Autoencoder
We present a novel end-to-end partially supervised deep learning approach for
video anomaly detection and localization using only normal samples. The insight
that motivates this study is that normal samples can be associated with at
least one Gaussian component of a Gaussian Mixture Model (GMM), while
anomalies do not belong to any Gaussian component. The method is based on a
Gaussian Mixture Variational Autoencoder, which can learn feature
representations of the normal samples as a Gaussian Mixture Model trained
using deep learning. A Fully
Convolutional Network (FCN) that does not contain a fully-connected layer is
employed for the encoder-decoder structure to preserve relative spatial
coordinates between the input image and the output feature map. Based on the
joint probabilities of each of the Gaussian mixture components, we introduce a
sample-energy-based method to score the anomaly of image test patches. A
two-stream network framework is employed to combine the appearance and motion
anomalies, using RGB frames for the former and dynamic flow images for the
latter. We test our approach on two popular benchmarks (the UCSD and Avenue
datasets). The experimental results verify the superiority of our method
compared to the state of the art.
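The sample-energy scoring in the last abstract amounts to the negative log-likelihood under the mixture: patches far from every Gaussian component receive high energy. The sketch below uses diagonal covariances and hand-picked parameters as an illustrative simplification of the learned GMM:

```python
import numpy as np

def gmm_sample_energy(x, weights, means, variances):
    """Sample energy E(x) = -log sum_k pi_k N(x; mu_k, diag(var_k)).
    High energy => low likelihood under every component => anomalous."""
    x = np.asarray(x, dtype=float)
    log_probs = []
    for pi_k, mu_k, var_k in zip(weights, means, variances):
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * var_k))
        log_quad = -0.5 * np.sum((x - mu_k) ** 2 / var_k)
        log_probs.append(np.log(pi_k) + log_norm + log_quad)
    m = max(log_probs)  # log-sum-exp for numerical stability
    return -(m + np.log(sum(np.exp(p - m) for p in log_probs)))

# A toy two-component mixture standing in for the learned feature GMM.
weights = [0.5, 0.5]
means = [np.zeros(2), np.full(2, 5.0)]
variances = [np.ones(2), np.ones(2)]

e_normal = gmm_sample_energy([0.1, -0.1], weights, means, variances)
e_anomaly = gmm_sample_energy([20.0, 20.0], weights, means, variances)
```

A patch is flagged as anomalous when its energy exceeds a threshold chosen on normal validation data.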