Behavior and event detection for annotation and surveillance
Visual surveillance and activity analysis is an active research field of computer vision. As a result, several different algorithms have been produced for this purpose. To obtain more robust systems, it is desirable to integrate these different algorithms. To achieve this goal, the paper presents results in automatic event detection in surveillance videos, and a distributed application framework for supporting these methods. Results in motion analysis for static and moving cameras, automatic fight detection, shadow segmentation, discovery of unusual motion patterns, and indexing and retrieval are presented. These applications perform in real time and are suitable for real-life applications.
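As a concrete illustration of the simplest building block behind static-camera motion analysis of the kind surveyed above, the sketch below thresholds adjacent-frame differences to obtain a motion mask. The function name and threshold are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=25):
    """Flag pixels whose grayscale intensity changed by more than
    `threshold` between two consecutive frames of a static camera."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Toy example: a bright 2x2 'object' appears in the second frame.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200
mask = motion_mask(prev, curr)  # True only at the 4 changed pixels
```

Real systems build on this with background modelling and shadow suppression, but the thresholded difference is the usual starting point.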
Learning to Detect Violent Videos using Convolutional Long Short-Term Memory
Developing a technique for the automatic analysis of surveillance videos in
order to identify the presence of violence is of broad interest. In this work,
we propose a deep neural network for the purpose of recognizing violent videos.
A convolutional neural network is used to extract frame level features from a
video. The frame level features are then aggregated using a variant of the long
short term memory that uses convolutional gates. The convolutional neural
network along with the convolutional long short term memory is capable of
capturing localized spatio-temporal features which enables the analysis of
local motion taking place in the video. We also propose to use adjacent frame
differences as the input to the model thereby forcing it to encode the changes
occurring in the video. The performance of the proposed feature extraction
pipeline is evaluated on three standard benchmark datasets in terms of
recognition accuracy. Comparison of the results obtained with the state of the
art techniques revealed the promising capability of the proposed method in
recognizing violent videos.Comment: Accepted in International Conference on Advanced Video and Signal
based Surveillance(AVSS 2017
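The adjacent-frame-difference input described in the abstract can be sketched in a few lines. This is a minimal illustration of the idea (array shapes and names are assumptions), not the authors' pipeline:

```python
import numpy as np

def frame_differences(video):
    """Given a video as an array of shape (T, H, W, C), return the
    T-1 adjacent frame differences used as model input, forcing the
    network to encode change rather than static appearance."""
    video = video.astype(np.float32)
    return video[1:] - video[:-1]

# A 16-frame clip yields 15 difference maps of the same spatial size.
clip = np.random.rand(16, 64, 64, 3)
diffs = frame_differences(clip)
```

The difference stack would then be fed to the CNN + convolutional LSTM feature extractor described in the abstract.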
Discovery and recognition of motion primitives in human activities
We present a novel framework for the automatic discovery and recognition of
motion primitives in videos of human activities. Given the 3D pose of a human
in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed in order to make them invariant with respect to a subject's anatomical variations and the data sampling rate. The discovered primitives are initially unknown and unlabeled, and are collected into classes in an unsupervised manner via a hierarchical non-parametric Bayesian mixture model. Once the classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters.
Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability computed from the parameters of the learned models.
Using our framework we build a publicly available dataset of human motion
primitives, using sequences taken from well-known motion capture datasets. We
expect that our framework, by providing an objective way for discovering and
categorizing human motion, will be a useful tool in numerous research fields
including video analysis, human inspired motion generation, learning by
demonstration, intuitive human-robot interaction, and human behavior analysis.
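The abstract does not give the exact definition of motion flux, so the sketch below uses a deliberately simplified proxy: the summed displacement magnitudes of a group of skeletal joints over time. It only illustrates the idea of scoring motion variation over a joint group; the paper's actual quantity differs:

```python
import numpy as np

def motion_flux(joints):
    """Simplified proxy for 'motion flux': given 3D joint positions
    of shape (T, J, 3), sum the per-frame displacement magnitudes of
    the joint group. High values indicate strong motion variation."""
    vel = np.diff(joints, axis=0)            # (T-1, J, 3) displacements
    return float(np.linalg.norm(vel, axis=2).sum())

# Example: one joint moving 1 unit per frame along x for 3 frames.
path = np.array([[[0.0, 0.0, 0.0]],
                 [[1.0, 0.0, 0.0]],
                 [[2.0, 0.0, 0.0]]])
```

Segmenting a pose sequence at extrema of such a score is one plausible way to carve motion into primitive candidates.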
Unmanned Aerial Systems for Wildland and Forest Fires
Wildfires represent an important natural risk, causing economic losses, human deaths and significant environmental damage. In recent years, we have witnessed an increase in fire intensity and frequency. Research has been conducted towards
the development of dedicated solutions for wildland and forest fire assistance
and fighting. Systems were proposed for the remote detection and tracking of
fires. These systems have shown improvements in the area of efficient data
collection and fire characterization within small scale environments. However,
wildfires cover large areas making some of the proposed ground-based systems
unsuitable for optimal coverage. To tackle this limitation, Unmanned Aerial
Systems (UAS) were proposed. UAS have proven to be useful due to their
maneuverability, allowing for the implementation of remote sensing, allocation
strategies and task planning. They can provide a low-cost alternative for the
prevention, detection and real-time support of firefighting. In this paper we
review previous work related to the use of UAS in wildfires. Onboard sensor
instruments, fire perception algorithms and coordination strategies are
considered. In addition, we present some of the recent frameworks proposing the
use of both aerial vehicles and Unmanned Ground Vehicles (UGVs) for a more efficient wildland firefighting strategy at a larger scale. Comment: A recently published version of this paper is available at: https://doi.org/10.3390/drones501001
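As a toy illustration of the visible-spectrum fire perception cues discussed in such surveys, the sketch below flags red-dominant pixels in an RGB frame. The thresholds are illustrative assumptions, not taken from any surveyed system:

```python
import numpy as np

def fire_pixel_ratio(rgb):
    """Crude colour cue for visible-spectrum fire detection: the
    fraction of pixels that are strongly red-dominant (high R, with
    R > G > B), a common baseline heuristic before learned models."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mask = (r > 180) & (r > g) & (g > b)
    return float(mask.mean())

# Toy check: a flame-coloured patch versus a blue sky patch.
fire_patch = np.full((4, 4, 3), (255, 120, 40), dtype=np.uint8)
sky_patch = np.full((4, 4, 3), (40, 80, 255), dtype=np.uint8)
```

Onboard systems typically combine such colour cues with infrared sensing and temporal filtering to reduce false alarms.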
Automatic video censoring system using deep learning
Due to the extensive use of video-sharing platforms and services, the amount of video content on the web has become massive. This abundance of information makes it a problem to control the kind of content that may be present in such a video. Beyond telling whether the content is suitable for children and sensitive viewers, it is also important to figure out which parts of it contain such content, in order to preserve parts that would be discarded in a simple broad analysis. To tackle this problem, a comparison was made of popular deep learning image models: MobileNetV2, Xception, InceptionV3, VGG16, VGG19, ResNet101 and ResNet50, to find the one most suitable for the required application. In addition, a system was developed that automatically censors inappropriate content, such as violent scenes, with the help of deep learning. The system uses a transfer learning mechanism based on the VGG16 model. The experiments suggest that the model shows excellent performance for the automatic censoring application and could also be used in other similar applications.
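The segment-preserving censoring described above can be sketched independently of the classifier: given per-frame scores (such as a VGG16-based model might output), group the frames to censor into contiguous ranges. The helper name and threshold are illustrative assumptions:

```python
def censor_segments(scores, threshold=0.5):
    """Turn per-frame 'inappropriate content' probabilities into a
    list of (start, end) frame ranges to censor, so that the
    suitable parts of the video are preserved."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # segment opens here
        elif s < threshold and start is not None:
            segments.append((start, i - 1))  # segment closed
            start = None
    if start is not None:                  # segment ran to the end
        segments.append((start, len(scores) - 1))
    return segments
```

The returned ranges could then be blurred or cut, leaving the rest of the video untouched, which is exactly the advantage over a single whole-video verdict.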
Deep learning for automatic violence detection: tests on the AIRTLab dataset
Following the growing availability of video surveillance cameras and the need for techniques to automatically identify events in video footage, there is increasing interest in automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks, have demonstrated their capability of extracting spatio-temporal features from videos and are effective in violence detection. However, friendly behaviours or fast moves such as hugs, small hits, claps and high fives can still cause false positives, with a harmless action interpreted as violent. To this end, we present three deep learning-based models for violence detection and test them on the AIRTLab dataset, a novel dataset designed to check the robustness of algorithms against false positives. The objective is twofold: on the one hand, we compute accuracy metrics on the three proposed models (two based on transfer learning and one trained from scratch), building a baseline of metrics for the AIRTLab dataset; on the other hand, we validate the capability of the proposed dataset to challenge robustness to false positives. The results of the proposed models are in line with the scientific literature in terms of accuracy, with the transfer learning-based networks exhibiting better generalization capabilities than the network trained from scratch. Moreover, the tests highlighted that most of the classification errors concern the identification of non-violent clips, validating the design of the proposed dataset. Finally, to demonstrate the significance of the proposed models, the paper presents a comparison with the related literature, as well as with models based on well-established pre-trained 2D Convolutional Neural Networks (2D CNNs). Such a comparison highlights that 3D models achieve better accuracy than time-distributed 2D CNNs (merged with a recurrent model) in processing the spatio-temporal features of video clips.
The source code of the experiments and the AIRTLab dataset are available in public repositories.
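3D CNNs such as those tested on AIRTLab consume fixed-length clips rather than whole videos; below is a minimal sketch of that clip extraction step. The clip length of 16 is a common choice, assumed here rather than taken from the paper:

```python
import numpy as np

def to_clips(frames, clip_len=16):
    """Split a (T, H, W, C) frame array into non-overlapping clips
    of shape (clip_len, H, W, C), the input unit of a 3D CNN.
    Trailing frames that do not fill a whole clip are dropped."""
    n = len(frames) // clip_len
    return frames[: n * clip_len].reshape(n, clip_len, *frames.shape[1:])
```

Per-clip predictions are then typically aggregated (e.g. averaged) into a video-level violent / non-violent decision.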
Vision-based Fight Detection from Surveillance Cameras
Vision-based action recognition is one of the most challenging research topics of computer vision and pattern recognition. A specific application of it, namely detecting fights from surveillance cameras in public areas, prisons, etc., is desired in order to quickly bring these violent incidents under control. This paper addresses this research problem and explores LSTM-based approaches to solve it. Moreover, an attention layer is also utilized. In addition, a new dataset is collected, which consists of fight scenes from surveillance camera videos available on YouTube. This dataset is made publicly available. From the extensive experiments conducted on the Hockey Fight, Peliculas, and the newly collected fight datasets, it is observed that the proposed approach, which integrates the Xception model, a Bi-LSTM, and attention, improves the state-of-the-art accuracy for fight scene classification. Comment: 6 pages, 5 figures, 4 tables, International Conference on Image Processing Theory, Tools and Applications, IPTA 201
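The attention layer mentioned above can be sketched as soft attention pooling over per-timestep features: score each timestep, softmax the scores, and return the weighted sum. Here `w` stands in for learned parameters; the snippet is an illustration of the mechanism, not the authors' implementation:

```python
import numpy as np

def attention_pool(features, w):
    """Soft attention over per-timestep features of shape (T, D):
    score each timestep with vector `w`, softmax the scores, and
    return the attention-weighted sum of the features (shape (D,))."""
    scores = features @ w                   # (T,) raw scores
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ features                 # weighted feature sum

# Example: timestep 0 scores far higher, so pooling ~= its feature.
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
pooled = attention_pool(feats, np.array([10.0, 0.0]))
```

In the paper's setting the features would come from the Bi-LSTM over per-frame Xception embeddings, and `w` would be learned jointly with the classifier.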
Inflated 3D ConvNet context analysis for violence detection
According to the Wall Street Journal, one billion surveillance cameras will be deployed around the world by 2021. This amount of information can hardly be managed by humans. Using an Inflated 3D ConvNet as backbone, this paper introduces a novel automatic violence detection approach that outperforms state-of-the-art existing proposals. Most of those proposals include a pre-processing step to focus only on some regions of interest in the scene, i.e., those actually containing a human subject. In this regard, this paper also reports the results of an extensive analysis of whether and how the context can affect the adopted classifier's performance. The experiments show that context-free footage yields a substantial deterioration of classifier performance (2% to 5%) on publicly available datasets. However, they also demonstrate that performance stabilizes in context-free settings, no matter the level of context restriction applied. Finally, a cross-dataset experiment investigates the generalizability of results obtained in a single-collection experiment (same dataset used for training and testing) to cross-collection settings (different datasets used for training and testing).
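The notion of "context-free" footage can be illustrated with a minimal sketch that blanks everything outside a subject bounding box, so the classifier sees only the person. The exact pre-processing in the paper may differ:

```python
import numpy as np

def strip_context(frame, box):
    """Zero out everything outside the person bounding box
    (x0, y0, x1, y1), producing a 'context-free' frame in which
    only the subject region keeps its original pixels."""
    out = np.zeros_like(frame)
    x0, y0, x1, y1 = box
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out

# Example: keep only the central 2x2 region of a 4x4 frame.
frame = np.ones((4, 4))
stripped = strip_context(frame, (1, 1, 3, 3))
```

Applying the classifier to both original and stripped frames is one way to measure how much of its decision rests on scene context rather than on the subject.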
WVD: A New Synthetic Dataset for Video-based Violence Detection
Violence detection is becoming increasingly relevant in many areas, such as automatic content filtering, video surveillance and law enforcement. Existing datasets and methods discriminate between violent and non-violent scenes based on very abstract definitions of violence. Available datasets, such as "Hockey Fight" and "Movies", only contain fight versus non-fight videos; no weapons are discriminated in them. In this paper, we focus explicitly on weapon-based fighting sequences and propose a new dataset based on the popular action-adventure video game Grand Theft Auto V (GTA-V). This new dataset is called the "Weapon Violence Dataset" (WVD). The choice of a virtual dataset follows a trend which allows creating and labelling datasets that are as sophisticated and large in volume, yet realistic, as possible. Furthermore, WVD also avoids the drawbacks of access to real data and its potential implications. To the best of our knowledge, no similar dataset that captures weapon-based violence exists. The paper evaluates the proposed dataset by utilising local feature descriptors with an SVM classifier. The extracted features are aggregated using the Bag of Visual Words (BoVW) technique to classify weapon-based violence videos. Our results indicate that SURF achieves the best performance.
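The BoVW aggregation step named in the abstract can be sketched directly: assign each local descriptor (e.g. a SURF descriptor) to its nearest codebook word and histogram the assignments. The toy codebook below is an assumption; in practice it would come from k-means over training descriptors:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Bag of Visual Words aggregation: map each descriptor (N, D)
    to its nearest codebook word (K, D) by squared Euclidean
    distance, then return the normalized word-count histogram that
    a classifier such as an SVM would consume."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                      # (N,) word indices
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: two codewords, three descriptors (one near the first,
# two near the second), giving a [1/3, 2/3] histogram.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.0, 1.0], [9.0, 10.0], [10.0, 9.0]])
hist = bovw_histogram(descs, codebook)
```

One fixed-length histogram per video clip is what makes variable numbers of local descriptors compatible with an SVM.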