
    Behavior and event detection for annotation and surveillance

    Visual surveillance and activity analysis is an active research field in computer vision, and several different algorithms have been produced for this purpose. To obtain more robust systems, it is desirable to integrate these different algorithms. To this end, the paper presents results on automatic event detection in surveillance videos, together with a distributed application framework supporting these methods. Results are presented for motion analysis with static and moving cameras, automatic fight detection, shadow segmentation, discovery of unusual motion patterns, and indexing and retrieval. These applications run in real time and are suitable for real-life deployments.
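
    The abstract gives no implementation details; as a rough flavour of the kind of static-camera motion-analysis component such a framework typically integrates, the sketch below flags motion events with OpenCV background subtraction. The video file name, thresholds and minimum region area are illustrative assumptions, not values from the paper.

```python
import cv2

# Static-camera motion detection via MOG2 background subtraction.
# "surveillance.avi", the thresholds and the minimum area are placeholders.
cap = cv2.VideoCapture("surveillance.avi")
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)
    # MOG2 marks shadow pixels with value 127; keep only confident foreground.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    regions = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
    if regions:
        print(f"motion event: {len(regions)} moving region(s)")

cap.release()
```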

    Learning to Detect Violent Videos using Convolutional Long Short-Term Memory

    Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for recognizing violent videos. A convolutional neural network is used to extract frame-level features from a video. The frame-level features are then aggregated using a variant of the long short-term memory that uses convolutional gates. The convolutional neural network together with the convolutional long short-term memory is capable of capturing localized spatio-temporal features, which enables the analysis of local motion taking place in the video. We also propose to use adjacent frame differences as the input to the model, thereby forcing it to encode the changes occurring in the video. The performance of the proposed feature extraction pipeline is evaluated on three standard benchmark datasets in terms of recognition accuracy. Comparison of the results with state-of-the-art techniques reveals the promising capability of the proposed method in recognizing violent videos. Comment: Accepted at the International Conference on Advanced Video and Signal Based Surveillance (AVSS 2017).
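
    As a hedged illustration of the idea (not the authors' implementation), the tf.keras sketch below feeds adjacent-frame differences through a small per-frame CNN, aggregates them with a ConvLSTM2D layer (an LSTM with convolutional gates), and classifies the clip as violent or non-violent. The clip shape and all layer sizes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

T, H, W = 16, 112, 112                                   # illustrative clip shape

clips = layers.Input(shape=(T, H, W, 3))
# Adjacent frame differences, so the network must encode change, not appearance.
diffs = layers.Lambda(lambda x: x[:, 1:] - x[:, :-1])(clips)
x = layers.TimeDistributed(
    layers.Conv2D(32, 5, strides=2, padding="same", activation="relu"))(diffs)
x = layers.TimeDistributed(
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"))(x)
x = layers.ConvLSTM2D(64, 3, padding="same")(x)          # LSTM with convolutional gates
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(1, activation="sigmoid")(x)           # violent vs. non-violent

model = Model(clips, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```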

    Discovery and recognition of motion primitives in human activities

    We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed in order to make them invariant with respect to a subject's anatomical variations and to the data sampling rate. The discovered primitives are initially unknown and unlabeled, and are collected without supervision into classes via a hierarchical non-parametric Bayes mixture model. Once classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters. Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability computed from the parameters of the learned models. Using our framework we build a publicly available dataset of human motion primitives, using sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields including video analysis, human-inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis.
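
    The abstract does not define the motion flux precisely, so the following NumPy sketch is only an illustrative proxy: it measures the aggregate speed of a group of joints over time and splits the sequence at low-motion instants to obtain candidate primitives. The joint indices, threshold and segmentation rule are assumptions, not the paper's formulation.

```python
import numpy as np

def motion_flux_proxy(poses, joints, fps=30.0):
    """poses: (T, J, 3) 3D joint positions; joints: indices of the joint group.

    Returns a per-frame scalar: the summed speed of the selected joints
    (an illustrative proxy, not the paper's definition of motion flux).
    """
    vel = np.diff(poses[:, joints, :], axis=0) * fps      # (T-1, |joints|, 3)
    return np.linalg.norm(vel, axis=2).sum(axis=1)        # (T-1,)

def segment_primitives(flux, rel_threshold=0.05):
    """Split the sequence wherever the joint group is (nearly) at rest."""
    active = flux > rel_threshold * flux.max()
    boundaries = np.flatnonzero(np.diff(active.astype(int))) + 1
    return np.split(np.arange(len(flux)), boundaries)

# Example on synthetic data: 300 frames, 15 joints, an assumed arm joint group.
poses = np.cumsum(np.random.randn(300, 15, 3) * 0.01, axis=0)
flux = motion_flux_proxy(poses, joints=[8, 9, 10])
segments = segment_primitives(flux)
print(len(segments), "candidate primitive segments")
```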

    Unmanned Aerial Systems for Wildland and Forest Fires

    Wildfires represent a major natural risk, causing economic losses, human deaths and significant environmental damage. In recent years, fire intensity and frequency have increased. Research has been conducted towards the development of dedicated solutions for wildland and forest fire assistance and fighting. Systems have been proposed for the remote detection and tracking of fires. These systems have shown improvements in efficient data collection and fire characterization within small-scale environments. However, wildfires cover large areas, making some of the proposed ground-based systems unsuitable for optimal coverage. To tackle this limitation, Unmanned Aerial Systems (UAS) were proposed. UAS have proven to be useful due to their maneuverability, allowing for the implementation of remote sensing, allocation strategies and task planning. They can provide a low-cost alternative for the prevention, detection and real-time support of firefighting. In this paper we review previous work related to the use of UAS in wildfires. Onboard sensor instruments, fire perception algorithms and coordination strategies are considered. In addition, we present some of the recent frameworks proposing the use of both aerial vehicles and Unmanned Ground Vehicles (UGV) for a more efficient wildland firefighting strategy at a larger scale. Comment: A recently published version of this paper is available at: https://doi.org/10.3390/drones501001

    Automatic video censoring system using deep learning

    Due to the extensive use of video-sharing platforms and services, the amount of video content of all kinds on the web has become massive. Given this abundance of information, controlling the kind of content that may be present in a video is a problem. Beyond telling whether the content is suitable for children and sensitive viewers, it is also important to figure out which parts of it contain such content, so that parts that would be discarded by a simple broad analysis can be preserved. To tackle this problem, popular image deep learning models were compared: MobileNetV2, Xception, InceptionV3, VGG16, VGG19, ResNet101 and ResNet50, to find the one most suitable for the required application. In addition, a system was developed that automatically censors inappropriate content, such as violent scenes, with the help of deep learning. The system uses a transfer learning mechanism based on the VGG16 model. The experiments suggest that the model shows excellent performance for the automatic censoring application and could also be used in other similar applications.
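
    A minimal tf.keras sketch of the transfer-learning setup described above follows, assuming an ImageNet-pretrained VGG16 backbone with frozen convolutional layers and a small binary head that flags a frame as inappropriate (e.g. violent) or safe. The head architecture, input size and two-class formulation are assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Frozen ImageNet-pretrained VGG16 feature extractor.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False

# Small trainable head: frame-level "inappropriate" vs. "safe" decision.
model = tf.keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

# After training, frames scored above a chosen threshold can be blurred or
# cut by the censoring stage, preserving the rest of the video.
```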

    Deep learning for automatic violence detection: tests on the AIRTLab dataset

    Following the growing availability of video surveillance cameras and the need for techniques to automatically identify events in video footage, there is increasing interest in automatic violence detection in videos. Deep learning-based architectures, such as 3D Convolutional Neural Networks, have demonstrated their capability of extracting spatio-temporal features from videos and are effective in violence detection. However, friendly behaviours or fast moves such as hugs, small hits, claps, high fives, etc., can still cause false positives, interpreting a harmless action as violent. To this end, we present three deep learning-based models for violence detection and test them on the AIRTLab dataset, a novel dataset designed to check the robustness of algorithms against false positives. The objective is twofold: on one hand, we compute accuracy metrics for the three proposed models (two based on transfer learning and one trained from scratch), building a baseline of metrics for the AIRTLab dataset; on the other hand, we validate the capability of the proposed dataset to challenge robustness to false positives. The results of the proposed models are in line with the scientific literature in terms of accuracy, with transfer learning-based networks exhibiting better generalization capabilities than the network trained from scratch. Moreover, the tests highlighted that most of the classification errors concern the identification of non-violent clips, validating the design of the proposed dataset. Finally, to demonstrate the significance of the proposed models, the paper presents a comparison with the related literature, as well as with models based on well-established pre-trained 2D Convolutional Neural Networks (2D CNNs). This comparison highlights that 3D models achieve better accuracy than time-distributed 2D CNNs (merged with a recurrent model) in processing the spatio-temporal features of video clips. The source code of the experiments and the AIRTLab dataset are available in public repositories.
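
    For illustration, the tf.keras sketch below shows a small 3D-convolutional clip classifier of the general kind discussed above; it is not one of the paper's three models, and all layer sizes and the clip shape are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small 3D CNN: Conv3D layers learn spatio-temporal features from a stack of
# frames, followed by a binary violent / non-violent head.
model = tf.keras.Sequential([
    layers.Input(shape=(16, 112, 112, 3)),               # (frames, H, W, channels)
    layers.Conv3D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),             # pool space only at first
    layers.Conv3D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=2),                      # then pool time and space
    layers.GlobalAveragePooling3D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```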

    Vision-based Fight Detection from Surveillance Cameras

    Vision-based action recognition is one of the most challenging research topics in computer vision and pattern recognition. A specific application of it, namely detecting fights from surveillance cameras in public areas, prisons, etc., is desirable in order to quickly bring these violent incidents under control. This paper addresses this research problem and explores LSTM-based approaches to solve it. Moreover, an attention layer is also utilized. Besides, a new dataset is collected, which consists of fight scenes from surveillance camera videos available on YouTube. This dataset is made publicly available. From the extensive experiments conducted on the Hockey Fight, Peliculas, and the newly collected fight datasets, it is observed that the proposed approach, which integrates the Xception model, Bi-LSTM, and attention, improves the state-of-the-art accuracy for fight scene classification. Comment: 6 pages, 5 figures, 4 tables, International Conference on Image Processing Theory, Tools and Applications, IPTA 201
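
    The following tf.keras sketch is a hedged illustration of such a pipeline: per-frame Xception features, a bidirectional LSTM over time, and a simple learned attention pooling before a binary fight / no-fight output. The clip length, input resolution, layer widths and the particular attention formulation are assumptions, not the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

T, H, W = 16, 224, 224                                    # illustrative clip shape

frames = layers.Input(shape=(T, H, W, 3))                 # frames assumed preprocessed
backbone = tf.keras.applications.Xception(include_top=False,
                                          weights="imagenet", pooling="avg")
feats = layers.TimeDistributed(backbone)(frames)           # (batch, T, 2048)
seq = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(feats)

# Simple learned attention pooling over time: one score per step, softmax-normalised.
scores = layers.Dense(1)(seq)                               # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)
pooled = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([seq, weights])

out = layers.Dense(1, activation="sigmoid")(pooled)        # fight vs. no fight
model = Model(frames, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```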

    Inflated 3D ConvNet context analysis for violence detection

    According to the Wall Street Journal, one billion surveillance cameras will be deployed around the world by 2021. This amount of information can hardly be managed by humans. Using an Inflated 3D ConvNet as backbone, this paper introduces a novel automatic violence detection approach that outperforms state-of-the-art existing proposals. Most of those proposals include a pre-processing step that focuses only on certain regions of interest in the scene, i.e., those actually containing a human subject. In this regard, this paper also reports the results of an extensive analysis of whether and how the context affects the adopted classifier's performance. The experiments show that context-free footage yields a substantial deterioration of the classifier's performance (2% to 5%) on publicly available datasets. However, they also demonstrate that performance stabilizes in context-free settings, no matter the level of context restriction applied. Finally, a cross-dataset experiment investigates the generalizability of results obtained in a single-collection experiment (same dataset used for training and testing) to cross-collection settings (different datasets used for training and testing).
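
    As a hedged illustration of what a "context-free" pre-processing step can look like, the OpenCV sketch below detects people in a frame with the stock HOG pedestrian detector and keeps only the union of their bounding boxes, discarding the surrounding scene before classification. The detector choice, padding value and the downstream I3D classifier are not taken from the paper.

```python
import cv2

# Stock HOG pedestrian detector (a stand-in; any person detector works here).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def strip_context(frame, pad=10):
    """Return the crop around all detected people, or the full frame if none."""
    rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    if len(rects) == 0:
        return frame
    x0 = max(min(x for x, y, w, h in rects) - pad, 0)
    y0 = max(min(y for x, y, w, h in rects) - pad, 0)
    x1 = min(max(x + w for x, y, w, h in rects) + pad, frame.shape[1])
    y1 = min(max(y + h for x, y, w, h in rects) + pad, frame.shape[0])
    return frame[y0:y1, x0:x1]

frame = cv2.imread("frame.jpg")                  # placeholder input frame
if frame is not None:
    context_free = strip_context(frame)          # then fed to the clip classifier
```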

    WVD: A New Synthetic Dataset for Video-based Violence Detection

    Violence detection is becoming increasingly relevant in many areas, such as automatic content filtering, video surveillance and law enforcement. Existing datasets and methods discriminate between violent and non-violent scenes based on very abstract definitions of violence. Available datasets, such as "Hockey Fight" and "Movies", only contain fight versus non-fight videos; no weapons are discriminated in them. In this paper, we focus explicitly on weapon-based fighting sequences and propose a new dataset based on the popular action-adventure video game Grand Theft Auto-V (GTA-V). This new dataset is called the "Weapon Violence Dataset" (WVD). The choice of a virtual dataset follows a trend that allows datasets to be created and labelled that are as sophisticated, large and realistic as possible. Furthermore, WVD avoids the drawbacks and potential implications of accessing real data. To the best of our knowledge, no similar dataset capturing weapon-based violence exists. The paper evaluates the proposed dataset by utilising local feature descriptors with an SVM classifier. The extracted features are aggregated using the Bag of Visual Words (BoVW) technique to classify weapon-based violence videos. Our results indicate that SURF achieves the best performance.
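
    A minimal sketch of a Bag-of-Visual-Words pipeline of the kind evaluated on WVD follows: local descriptors per sampled frame, a k-means vocabulary, one histogram per video, and an SVM classifier. ORB is used here because SURF requires the non-free OpenCV contrib build (cv2.xfeatures2d.SURF_create() can be swapped in where available); the file names, vocabulary size and sampling step are placeholders.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

orb = cv2.ORB_create(nfeatures=500)

def video_descriptors(path, step=10):
    """Collect local descriptors from every `step`-th frame of a video."""
    cap, descs = cv2.VideoCapture(path), []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if int(cap.get(cv2.CAP_PROP_POS_FRAMES)) % step == 0:
            _, d = orb.detectAndCompute(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), None)
            if d is not None:
                descs.append(d)
    cap.release()
    return np.vstack(descs) if descs else np.empty((0, 32))

def bovw_histogram(descs, kmeans):
    """Quantise descriptors against the vocabulary and build a normalised histogram."""
    words = kmeans.predict(descs.astype(np.float32))
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / max(hist.sum(), 1)

# Placeholder training split standing in for WVD clips and their labels.
videos, labels = ["clip_violent.mp4", "clip_nonviolent.mp4"], [1, 0]
all_descs = [video_descriptors(v) for v in videos]
kmeans = MiniBatchKMeans(n_clusters=200, random_state=0).fit(
    np.vstack(all_descs).astype(np.float32))
X = np.array([bovw_histogram(d, kmeans) for d in all_descs])
clf = SVC(kernel="rbf").fit(X, labels)
```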