Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research
area in recent years because of its diverse practical applications, including
video surveillance, industrial damage detection and network intrusion
detection. However, building an effective anomaly detection system is a
non-trivial task, since it requires tackling several challenging issues: the
shortage of annotated data, the difficulty of defining anomalous objects
explicitly, and the expensive cost of feature engineering. Unlike existing
approaches, which only partially solve these problems, we develop a unique
framework that copes with all of them simultaneously. Instead of handling the
ambiguous definition of anomalous objects, we propose to work with regular
patterns, for which unlabeled data are abundant and usually easy to collect in
practice. This allows our system to be trained in a completely unsupervised
manner and frees us from the need for costly data annotation. By learning a
generative model that captures the normality distribution of the data, we can
isolate abnormal data points that yield low normality scores (high abnormality
scores). Moreover, by leveraging the power of generative networks, i.e.,
energy-based models, we are also able to learn feature representations
automatically rather than relying on the hand-crafted features that have
dominated anomaly detection research for decades. We demonstrate our proposal
on the specific application of video anomaly detection, and the experimental
results indicate that our method performs better than the baselines and is
comparable with state-of-the-art methods on many benchmark video anomaly
detection datasets.
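The scoring idea in this abstract can be sketched in a few lines. This is a minimal illustration rather than the paper's model: a Gaussian fitted to unlabeled "normal" data stands in for a trained energy-based network, the negative log-density plays the role of the energy (high energy = low normality score), and the threshold is a quantile of the normal data's energies, so no anomaly labels are needed:

```python
import numpy as np

# Hypothetical stand-in for a trained energy function: a Gaussian fitted
# to unlabeled "normal" data, with the negative log-density (up to a
# constant) as the energy, so abnormal points get high energy.
rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

mu = normal_train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_train, rowvar=False))

def energy(x):
    # Half the squared Mahalanobis distance: high energy = low normality.
    d = x - mu
    return 0.5 * np.einsum('...i,ij,...j->...', d, cov_inv, d)

# The threshold is a high quantile of the normal data's energies, so no
# labeled anomalies are required (fully unsupervised).
threshold = np.quantile(energy(normal_train), 0.99)

test_points = np.array([[0.1, -0.2],    # typical point -> low energy
                        [6.0, 6.0]])    # far from normal data -> high energy
flags = energy(test_points) > threshold
```

Any density or energy model trained only on normal data fits this template; the paper's energy-based network replaces the Gaussian.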
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature is still lacking a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to be beneficial to both fresh
and senior researchers in this field, and give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
Crowded Scene Analysis: A Survey
Automated scene analysis has been a topic of great interest in computer
vision and cognitive science. Recently, with the growth of crowd phenomena in
the real world, crowded scene analysis has attracted much attention. However,
the visual occlusions and ambiguities in crowded scenes, as well as the complex
behaviors and scene semantics, make the analysis a challenging task. In the
past few years, an increasing number of works on crowded scene analysis have
been reported, covering different aspects including crowd motion pattern
learning, crowd behavior and activity analysis, and anomaly detection in
crowds. This paper surveys the state-of-the-art techniques on this topic. We
first provide the background knowledge and the available features related to
crowded scenes. Then, existing models, popular algorithms, evaluation
protocols, as well as system performance are provided corresponding to
different aspects of crowded scene analysis. We also outline the available
datasets for performance evaluation. Finally, some research problems and
promising future directions are presented with discussions.
Comment: 20 pages, in IEEE Transactions on Circuits and Systems for Video Technology, 201
Spatio-Temporal Data Mining: A Survey of Problems and Methods
Large volumes of spatio-temporal data are increasingly collected and studied
in diverse domains, including climate science, social sciences, neuroscience,
epidemiology, transportation, mobile health, and Earth sciences.
Spatio-temporal data differs from the relational data for which computational
approaches have been developed in the data mining community over multiple
decades, in that both spatial and temporal attributes are available in addition
to the actual measurements/attributes. The presence of these attributes
introduces additional challenges that need to be dealt with. Approaches for mining
spatio-temporal data have been studied for over a decade in the data mining
community. In this article we present a broad survey of this relatively young
field of spatio-temporal data mining. We discuss different types of
spatio-temporal data and the relevant data mining questions that arise in the
context of analyzing each of these datasets. Based on the nature of the data
mining problem studied, we classify literature on spatio-temporal data mining
into six major categories: clustering, predictive learning, change detection,
frequent pattern mining, anomaly detection, and relationship mining. We discuss
the various forms of spatio-temporal data mining problems in each of these
categories.
Comment: Accepted for publication at ACM Computing Surveys
AED-Net: An Abnormal Event Detection Network
Detecting anomalies in crowded scenes has long been a challenging task. In
this paper, a self-supervised framework, the abnormal event detection network
(AED-Net), composed of PCAnet and kernel principal component analysis (kPCA),
is proposed to address this problem. Using surveillance video sequences of
different scenes as raw data, PCAnet is trained to extract high-level semantics
of the crowd's situation. Next, kPCA, a one-class classifier, is trained to
determine whether the scene is anomalous. In contrast to some prevailing deep
learning methods, the framework is completely self-supervised because it
utilizes only video sequences of normal situations. Experiments on global and
local abnormal event detection are carried out on the UMN and UCSD datasets,
and competitive results with higher EER and AUC compared to other
state-of-the-art methods are observed. Furthermore, by adding a local response
normalization (LRN) layer, we propose an improvement to the original AED-Net;
the experiments show that it performs better by improving the framework's
generalization capacity.
Comment: 14 pages, 7 figures
Unsupervised Video Analysis Based on a Spatiotemporal Saliency Detector
Visual saliency, which predicts regions in the field of view that draw the
most visual attention, has attracted a lot of interest from researchers. It has
already been used in several vision tasks, e.g., image classification, object
detection, foreground segmentation. Recently, the spectrum analysis based
visual saliency approach has attracted a lot of interest due to its simplicity
and good performance, where the phase information of the image is used to
construct the saliency map. In this paper, we propose a new approach for
detecting spatiotemporal visual saliency based on the phase spectrum of the
videos, which is easy to implement and computationally efficient. With the
proposed algorithm, we also study how spatiotemporal saliency can be used in
two important vision tasks: abnormality detection and spatiotemporal interest
point detection. The proposed algorithm is evaluated on several commonly used
datasets with comparison to state-of-the-art methods from the literature. The
experiments demonstrate the effectiveness of the proposed approach to
spatiotemporal visual saliency detection and its application to the above
vision tasks.
Comment: 21 pages
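The general phase-spectrum idea is easy to sketch: take the 2D Fourier transform of a frame, keep only the phase, invert, and square the result to obtain a saliency map. The snippet below is a single-frame sketch only (the paper operates on video, and its exact pipeline may differ); a box filter stands in for the usual Gaussian smoothing:

```python
import numpy as np

def phase_saliency(frame):
    F = np.fft.fft2(frame)
    phase_only = np.exp(1j * np.angle(F))   # unit magnitude, original phase
    recon = np.fft.ifft2(phase_only)
    sal = np.abs(recon) ** 2                # squared magnitude = saliency
    # A Gaussian blur is usually applied here; a 3x3 box filter stands in
    # to keep the sketch dependency-free.
    pad = np.pad(sal, 1, mode='edge')
    smoothed = sum(pad[i:i + sal.shape[0], j:j + sal.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    return smoothed / smoothed.max()

img = np.zeros((64, 64))
img[30:34, 30:34] = 1.0                     # one bright blob on a flat field
sal = phase_saliency(img)
```

Discarding the magnitude suppresses repeated, "expected" structure and leaves energy concentrated at novel regions, which is why the blob dominates the map.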
Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures
We envision a future time when wearable cameras are worn by the masses,
recording first-person point-of-view videos of everyday life. While these
cameras can enable new assistive technologies and novel research challenges,
they also raise serious privacy concerns. For example, first-person videos
passively recorded by wearable cameras will necessarily include anyone who
comes into the view of a camera -- with or without consent. Motivated by these
benefits and risks, we developed a self-search technique tailored to
first-person videos. The key observation of our work is that the egocentric
head motion of a target person (i.e., the self) is observed both in the
point-of-view video of the target and observer. The motion correlation between
the target person's video and the observer's video can then be used to identify
instances of the self uniquely. We incorporate this feature into the proposed
approach that computes the motion correlation over densely-sampled trajectories
to search for a target individual in observer videos. Our approach
significantly improves self-search performance over several well-known face
detectors and recognizers. Furthermore, we show how our approach can enable
several practical applications such as privacy filtering, target video
retrieval, and social group clustering.
Comment: To appear in IEEE TPAMI
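The core cue here, correlating the target's ego-motion with motion observed in another camera's video, can be sketched with 1-D signals. This is an illustration only: the synthetic signals stand in for the densely-sampled trajectory features used in the paper:

```python
import numpy as np

def motion_correlation(sig_a, sig_b):
    # Normalized correlation at zero lag; values near 1 suggest the two
    # signals record the same person's ego-motion.
    a = (sig_a - sig_a.mean()) / (sig_a.std() + 1e-8)
    b = (sig_b - sig_b.mean()) / (sig_b.std() + 1e-8)
    return float(np.mean(a * b))

rng = np.random.default_rng(2)
ego = rng.normal(size=100)                # target's own head-motion signal
same = ego + 0.1 * rng.normal(size=100)   # that motion seen by an observer
other = rng.normal(size=100)              # an unrelated person's motion

c_same = motion_correlation(ego, same)    # close to 1
c_other = motion_correlation(ego, other)  # close to 0
```

Ranking candidate trajectories in the observer's video by this correlation is what lets the system find the self without relying on face recognition.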
Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and Deepsort techniques
The rampant coronavirus disease 2019 (COVID-19) has brought a global crisis
with its deadly spread to more than 180 countries, with about 3,519,901
confirmed cases and 247,630 deaths globally as of May 4, 2020. The
absence of any active therapeutic agents and the lack of immunity against
COVID-19 increase the vulnerability of the population. Since there are no
vaccines available, social distancing is the only feasible approach to fight
against this pandemic. Motivated by this notion, this article proposes a deep
learning based framework for automating the task of monitoring social
distancing using surveillance video. The proposed framework utilizes the YOLO
v3 object detection model to segregate humans from the background and Deepsort
approach to track the identified people with the help of bounding boxes and
assigned IDs. The results of the YOLO v3 model are further compared with other
popular state-of-the-art models, e.g. faster region-based CNN (convolution
neural network) and single shot detector (SSD) in terms of mean average
precision (mAP), frames per second (FPS) and loss values defined by object
classification and localization. Later, the pairwise vectorized L2 norm is
computed based on the three-dimensional feature space obtained by using the
centroid coordinates and dimensions of the bounding box. The violation index
term is proposed to quantify non-adoption of the social distancing protocol.
From the experimental analysis, it is observed that YOLO v3 with the Deepsort
tracking scheme displayed the best results, with balanced mAP and FPS scores,
for monitoring social distancing in real time.
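The distance-checking step can be sketched as follows. This is a simplified 2-D version under stated assumptions: it uses only bounding-box centroids (the paper builds a three-dimensional feature space from centroids and box dimensions), and the `min_dist` threshold and coordinates are illustrative:

```python
import numpy as np

def count_violations(centroids, min_dist):
    # Pairwise L2 distances between person centroids; pairs closer than
    # min_dist violate the distancing protocol.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(centroids), k=1)   # each pair counted once
    close = dist[iu] < min_dist
    # Illustrative violation index: fraction of pairs too close together.
    return int(close.sum()), float(close.sum() / max(len(close), 1))

# Centroids of tracked bounding boxes, in pixels (illustrative values).
people = np.array([[100., 200.], [110., 205.], [400., 300.]])
violations, index = count_violations(people, min_dist=50.0)
```

In the full pipeline the centroids come from the tracker's per-ID boxes in each frame, so the index can be monitored over time.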
A survey on trajectory clustering analysis
This paper comprehensively surveys the development of trajectory clustering.
Considering the critical role of trajectory data mining in modern intelligent
systems for surveillance security, abnormal behavior detection, crowd behavior
analysis, and traffic control, trajectory clustering has attracted growing
attention. Existing trajectory clustering methods can be grouped into three
categories: unsupervised, supervised and semi-supervised algorithms. In spite
of achieving a certain level of development, trajectory clustering is limited
in its success by complex conditions such as application scenarios and data
dimensions. This paper provides a holistic understanding and deep insight into
trajectory clustering, and presents a comprehensive analysis of representative
methods and promising future directions.
Characterizing Human Behaviours Using Statistical Motion Descriptor
Identifying human behaviors is a challenging research problem due to the
complexity and variation of appearances and postures, the variation of camera
settings, and view angles. In this paper, we try to address the problem of
human behavior identification by introducing a novel motion descriptor based on
statistical features. The method first divides the video into N temporal
segments. Then, for each segment, we compute dense optical flow, which
provides instantaneous velocity information for all the pixels. We then
compute a Histogram of Optical Flow (HOOF) weighted by the norm and quantized
into 32 bins. We then compute statistical features from the obtained HOOF,
forming a descriptor vector of 192 dimensions. We then train a non-linear
multi-class SVM that classifies different human behaviors with an accuracy of
72.1%. We evaluate our method on a publicly available human action dataset.
Experimental results show that our proposed method outperforms
state-of-the-art methods.
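The HOOF step described above can be sketched directly. This is a sketch of one magnitude-weighted orientation histogram only; the paper additionally computes per-segment statistical features from such histograms to form the 192-dimensional descriptor:

```python
import numpy as np

def hoof(flow, n_bins=32):
    # Orientation histogram of a flow field, weighted by flow magnitude
    # and L1-normalized.
    fx, fy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(fx, fy)                    # per-pixel speed
    ang = np.arctan2(fy, fx)                  # per-pixel direction in [-pi, pi]
    hist, _ = np.histogram(ang, bins=np.linspace(-np.pi, np.pi, n_bins + 1),
                           weights=mag)
    return hist / (hist.sum() + 1e-8)

# Toy flow field: every pixel moves in the same direction, so a single
# orientation bin receives all the weight.
flow = np.zeros((4, 4, 2))
flow[..., 0], flow[..., 1] = 1.0, 0.5
h = hoof(flow)
```

Weighting by magnitude means fast motion dominates the histogram, which makes the descriptor sensitive to vigorous actions rather than background jitter.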