An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos
Videos represent the primary source of information for surveillance
applications and are available in large amounts but in most cases contain
little or no annotation for supervised learning. This article reviews the
state-of-the-art deep learning based methods for video anomaly detection and
categorizes them based on the type of model and criteria of detection. We also
perform simple studies to understand the different approaches and provide the
criteria of evaluation for spatio-temporal anomaly detection. Comment: 15 pages, double column
Deep Learning for Crowd Anomaly Detection
Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have drawn attention to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that apply dense optical flow in the feature extraction stage of the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.
Detecting abnormal events in video using Narrowed Normality Clusters
We formulate the abnormal event detection problem as an outlier detection
task and we propose a two-stage algorithm based on k-means clustering and
one-class Support Vector Machines (SVM) to eliminate outliers. In the feature
extraction stage, we propose to augment spatio-temporal cubes with deep
appearance features extracted from the last convolutional layer of a
pre-trained neural network. After extracting motion and appearance features
from the training video containing only normal events, we apply k-means
clustering to find clusters representing different types of normal motion and
appearance features. In the first stage, we consider that clusters with fewer
samples (with respect to a given threshold) contain mostly outliers, and we
eliminate these clusters altogether. In the second stage, we shrink the borders
of the remaining clusters by training a one-class SVM model on each cluster. To
detect abnormal events in the test video, we analyze each test sample and
consider its maximum normality score provided by the trained one-class SVM
models, based on the intuition that a test sample can belong to only one
cluster of normality. If the test sample does not fit well in any narrowed
normality cluster, then it is labeled as abnormal. We compare our method with
several state-of-the-art methods on three benchmark data sets. The empirical
results indicate that our abnormal event detection framework can achieve better
results in most cases, while processing the test video in real-time at 24
frames per second on a single CPU. Comment: Accepted at WACV 2019. arXiv admin note: text overlap with
arXiv:1705.0818
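The two-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn's `KMeans` and `OneClassSVM` as stand-ins for the clustering and one-class models; the function names, the `min_fraction` size threshold, the RBF kernel, and the value of `nu` are all hypothetical choices, not the authors' settings, and the feature extraction stage is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM


def fit_narrowed_clusters(features, n_clusters=4, min_fraction=0.05, nu=0.1, seed=0):
    """Stage 1: drop small k-means clusters. Stage 2: one one-class SVM per kept cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    models = []
    for label in range(n_clusters):
        members = features[kmeans.labels_ == label]
        # Stage 1: clusters below the size threshold are assumed to hold mostly outliers.
        if len(members) < min_fraction * len(features):
            continue
        # Stage 2: shrink the border of each remaining cluster (assumed RBF kernel).
        models.append(OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(members))
    return models


def normality_score(models, samples):
    """Maximum one-class SVM decision value over all narrowed clusters."""
    scores = np.stack([m.decision_function(samples) for m in models], axis=1)
    return scores.max(axis=1)


def is_abnormal(models, samples, threshold=0.0):
    """A sample is abnormal if it fits none of the narrowed clusters well."""
    return normality_score(models, samples) < threshold
```

Taking the maximum score reflects the stated intuition that a test sample can belong to only one cluster of normality, so only its best-fitting cluster matters.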
Novel statistical modeling methods for traffic video analysis
Video analysis is an active and rapidly expanding research area in computer vision and artificial intelligence due to its broad applications in modern society. Many methods have been proposed for analyzing videos, but many challenges remain unaddressed. In this dissertation, four statistical modeling methods are proposed to address some challenging traffic video analysis problems under adverse illumination and weather conditions.
First, a new foreground detection method is presented to detect the foreground objects in videos. A novel Global Foreground Modeling (GFM) method, which estimates a global probability density function for the foreground and applies the Bayes decision rule for model selection, is proposed to model the foreground globally. A Local Background Modeling (LBM) method is applied by choosing the most significant Gaussian density in the Gaussian mixture model to model the background locally for each pixel. In addition, to mitigate the correlation effects of the Red, Green, and Blue (RGB) color space on the independence assumption among the color component images, some other color spaces are investigated for feature extraction. To further enhance the discriminatory power of the input feature vector, the horizontal and vertical Haar wavelet features and the temporal information are integrated into the color features to define a new 12-dimensional feature vector space. Finally, the Bayes classifier is applied for the classification of the foreground and the background pixels.
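As a rough illustration of the final classification step described above, the sketch below applies the Bayes decision rule to a single pixel feature vector. It makes several assumptions that the abstract does not confirm: the global foreground density is reduced to one diagonal-covariance Gaussian, the "most significant" background component is taken to be the one with the largest weight-to-standard-deviation ratio, and the prior `prior_fg` and all names are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def most_significant(components):
    """Pick the component with the largest weight / sigma ratio (assumed criterion)."""
    def significance(c):
        mean_sigma = math.sqrt(sum(c["var"]) / len(c["var"]))
        return c["weight"] / mean_sigma
    return max(components, key=significance)


def classify_pixel(x, background_mixture, foreground, prior_fg=0.3):
    """Bayes decision rule: choose the class with the larger prior * likelihood."""
    bg = most_significant(background_mixture)
    log_bg = math.log(1.0 - prior_fg) + log_gaussian(x, bg["mean"], bg["var"])
    log_fg = math.log(prior_fg) + log_gaussian(x, foreground["mean"], foreground["var"])
    return "foreground" if log_fg > log_bg else "background"
```

In the method described, the feature vector would be the 12-dimensional combination of color, Haar wavelet, and temporal features; the sketch works for any dimension as long as the means and variances have matching lengths.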
Second, a novel moving cast shadow detection method is presented to detect and remove the cast shadows from the foreground. Specifically, a set of new chromatic criteria is presented to detect the candidate shadow pixels in the Hue, Saturation, and Value (HSV) color space. A new shadow region detection method is then proposed to cluster the candidate shadow pixels into shadow regions. A statistical shadow model, which uses a single Gaussian distribution to model the shadow class, is presented to classify shadow pixels. Additionally, an aggregated shadow detection strategy is presented to integrate the shadow detection results and remove the shadows from the foreground.
Third, a novel statistical modeling method is presented to solve the automated road recognition problem for the Region of Interest (RoI) detection in traffic video analysis. A temporal feature guided statistical modeling method is proposed for road modeling. Additionally, a model pruning strategy is applied to estimate the road model. Then, a new road region detection method is presented to detect the road regions in the video. The method applies discriminant functions to classify each pixel in the estimated background image as either road or non-road. The proposed method provides an intra-cognitive communication mode between the RoI selection and video analysis systems.
Fourth, a novel anomalous driving detection method, which can detect unsafe anomalous driving behaviors in videos, is introduced. A new Multiple Object Tracking (MOT) method is proposed to extract the velocities and trajectories of moving foreground objects in video. The new MOT method is a motion-based tracking method that integrates temporal and spatial features. Then, a novel Gaussian Local Velocity (GLV) modeling method is presented to model the normal moving behavior in traffic videos. The GLV model is built for every location in the video frame and updated online. Finally, a discriminant function is proposed to detect anomalous driving behaviors.
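One way to picture a per-location velocity model with online updates is the sketch below. It is not the dissertation's GLV formulation, whose details the abstract does not give: it assumes a grid of cells, independent Gaussian components for the two velocity axes, Welford's running mean and variance for the online update, and a normalized squared distance with a fixed threshold as the discriminant. The class name and all parameters are hypothetical.

```python
from collections import defaultdict


class GaussianLocalVelocity:
    """Per-cell running Gaussian over (vx, vy), assuming independent components."""

    def __init__(self, cell_size=16, min_samples=5, threshold=9.0):
        self.cell_size = cell_size
        self.min_samples = min_samples
        self.threshold = threshold
        # cell -> [count, mean_x, mean_y, m2_x, m2_y] (Welford accumulators)
        self.stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, x, y, vx, vy):
        """Fold one observed velocity into the model of the cell containing (x, y)."""
        s = self.stats[self._cell(x, y)]
        s[0] += 1
        dx, dy = vx - s[1], vy - s[2]
        s[1] += dx / s[0]
        s[2] += dy / s[0]
        s[3] += dx * (vx - s[1])
        s[4] += dy * (vy - s[2])

    def score(self, x, y, vx, vy, eps=1e-6):
        """Variance-normalized squared distance from the local mean velocity."""
        cell = self._cell(x, y)
        if cell not in self.stats or self.stats[cell][0] < self.min_samples:
            return 0.0  # too little evidence at this location to call anything anomalous
        n, mx, my, m2x, m2y = self.stats[cell]
        var_x = m2x / (n - 1) + eps
        var_y = m2y / (n - 1) + eps
        return (vx - mx) ** 2 / var_x + (vy - my) ** 2 / var_y

    def is_anomalous(self, x, y, vx, vy):
        return self.score(x, y, vx, vy) > self.threshold
```

With such a model, a vehicle moving against the usual direction of travel at a location would receive a large score, while locations with few observations are left unflagged.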
To assess the feasibility of the proposed statistical modeling methods, several popular public video datasets, as well as real traffic videos from the New Jersey Department of Transportation (NJDOT), are used. The experimental results show the effectiveness and feasibility of the proposed methods.
- …