125 research outputs found

    ConvGRU-CNN: Spatiotemporal Deep Learning for Real-World Anomaly Detection in Video Surveillance System

    Video surveillance for real-world anomaly detection and prevention using deep learning is an important and difficult research area. It is imperative to detect and prevent anomalies to develop a nonviolent society. Real-world video surveillance cameras automate the detection of anomalous activities and enable law enforcement systems to take steps toward public safety. However, a human-monitored surveillance system is prone to overlooking anomalous activity. In this paper, an automated deep learning model is proposed to detect and prevent anomalous activities. The real-world video surveillance system is designed by using ResNet-50, a Convolutional Neural Network (CNN) model, to extract high-level features from the input streams, whereas temporal features are extracted by a Convolutional GRU (ConvGRU) from the ResNet-50 features in the time-series dataset. The proposed deep learning video surveillance model (named ConvGRU-CNN) can efficiently detect anomalous activities. The UCF-Crime dataset is used to evaluate the proposed deep learning model. We classified normal and abnormal activities, thereby showing the ability of ConvGRU-CNN to find the correct category for each abnormal activity. On the UCF-Crime dataset for video surveillance-based anomaly detection, ConvGRU-CNN achieved 82.22% accuracy. In addition, the proposed model outperformed the related deep learning models.
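For readers unfamiliar with the ConvGRU mentioned in this abstract, one common formulation of the cell is sketched below. It is not taken from the paper, whose exact variant is not stated in the abstract. As an assumption, $x_t$ stands for the ResNet-50 feature map of frame $t$, $h_t$ for the hidden state, $*$ for convolution, $\odot$ for element-wise multiplication, and $\sigma$ for the logistic sigmoid; bias terms are omitted.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1}\right) \\
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W * x_t + U * \left(r_t \odot h_{t-1}\right)\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Replacing the matrix products of a standard GRU with convolutions keeps the spatial layout of the feature maps while the gates $z_t$ and $r_t$ control how much of the previous state is carried forward.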

    Identification and monitoring of violent interactions in video

    This project aims to provide a tool to fight bullying in schools. It can also be used in other settings where a camera records a common area shared by people, such as companies, banks, prisons, or hospitals. To achieve that, the issue is approached through two main modules. The first is a comparative study of approaches to detecting violence in video using image- and video-analysing neural networks (NNs): a custom image-analyser NN based on LeNet5, AlexNet, and custom NNs based on stacked long short-term memory (LSTM) and convolutional LSTM. Training is done with two datasets that have been modified to correct possible misinterpretations during learning, and pretraining is applied. The LeNet5-based NN is unsuccessful, and AlexNet is inaccurate when tested with an independent dataset. The best results are obtained with a stacked LSTM NN and a convolutional LSTM with dropout and an LSTM layer. Both NNs achieve over 90 % accuracy on the training and validation datasets, while the stacked LSTM and the convolutional NN achieve, respectively, 75 % and 100 % accuracy on a small, independently created test dataset. The convolutional LSTM needed 10 times fewer epochs to achieve the same result as the stacked LSTM. The second module consists of a violence detection system that applies the best solution obtained from the comparative study. The violence detection system saves the frames detected as violent with the date, time and camera name, and emits a sound alarm when more than a certain number of consecutive frames are evaluated as containing violence. This reduces the sensitivity of the system and avoids false alarms due to small mistakes made by the intelligence
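The consecutive-frame alarm rule described in this abstract can be illustrated with a minimal Python sketch. The class name, the `threshold` parameter, and the use of a frame index in place of the date, time and camera name are all assumptions made for illustration; the abstract does not give the actual threshold or implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConsecutiveFrameAlarm:
    """Signal an alarm only after more than `threshold` consecutive violent frames.

    Hypothetical sketch: the abstract says only "a certain number" of frames,
    so `threshold` is an illustrative parameter, not the authors' value.
    """

    threshold: int
    run_length: int = 0  # length of the current streak of violent frames
    saved_frames: List[int] = field(default_factory=list)  # stand-in for saved frame metadata

    def update(self, frame_index: int, is_violent: bool) -> bool:
        """Record one per-frame prediction and return True if the alarm should sound."""
        if is_violent:
            self.run_length += 1
            # Every frame classified as violent is saved, even if no alarm fires.
            self.saved_frames.append(frame_index)
        else:
            # A single non-violent frame breaks the streak.
            self.run_length = 0
        return self.run_length > self.threshold
```

With `threshold=2`, feeding the predictions violent, violent, non-violent, violent, violent, violent triggers the alarm only on the last frame, because the isolated two-frame streak at the start never exceeds the threshold. This is how the rule suppresses alarms caused by short runs of misclassified frames.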

    Deep Learning for Crowd Anomaly Detection

    Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have brought attention to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that apply dense optical flow in a feature extraction stage to the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.

    A Survey of Deep Learning Solutions for Multimedia Visual Content Analysis

    The increasing use of social media networks on handheld devices, especially smartphones with powerful built-in cameras, and the widespread availability of fast and high bandwidth broadband connections, added to the popularity of cloud storage, is enabling the generation and distribution of massive volumes of digital media, including images and videos. Such media is full of visual information and holds immense value in today's world. The volume of data involved calls for automated visual content analysis systems able to meet the demands of practice in terms of efficiency and effectiveness. Deep learning (DL) has recently emerged as a prominent technique for visual content analysis. It is data-driven in nature and provides automatic end-to-end learning solutions without the need to rely explicitly on predefined handcrafted feature extractors. Another appealing characteristic of DL solutions is the performance they can achieve, once the network is trained, under practical constraints. This paper identifies eight problem domains which require analysis of visual artifacts in multimedia. It surveys the recent, authoritative, and the best performing DL solutions and lists the datasets used in the development of these deep methods for the identified types of visual analysis problems. This paper also discusses the challenges that the DL solutions face which can compromise their reliability, robustness, and accuracy for visual content analysis

    A Survey of Deep Learning Solutions for Anomaly Detection in Surveillance Videos

    Deep learning has proven to be a landmark computing approach in the computer vision domain. Hence, it has been widely applied to solve complex cognitive tasks like the detection of anomalies in surveillance videos. Anomaly detection in this case is the identification of abnormal events in surveillance videos which can be deemed security incidents or threats. Deep learning solutions for anomaly detection have outperformed traditional machine learning solutions. This review attempts to provide holistic benchmarking of the deep learning solutions for video anomaly detection published since 2016. The paper identifies the learning technique, the datasets used and the overall model accuracy. Reviewed papers were organised into five deep learning methods, namely autoencoders, continual learning, transfer learning, reinforcement learning and ensemble learning. Current and emerging trends are discussed as well.

    Survey on video anomaly detection in dynamic scenes with moving cameras

    The increasing popularity of compact and inexpensive cameras, e.g., dash cameras, body cameras, and cameras equipped on robots, has sparked a growing interest in detecting anomalies within dynamic scenes recorded by moving cameras. However, existing reviews primarily concentrate on Video Anomaly Detection (VAD) methods assuming static cameras. The VAD literature with moving cameras remains fragmented, lacking comprehensive reviews to date. To address this gap, we endeavor to present the first comprehensive survey on Moving Camera Video Anomaly Detection (MC-VAD). We delve into the research papers related to MC-VAD, critically assessing their limitations and highlighting associated challenges. Our exploration encompasses three application domains: security, urban transportation, and marine environments, which in turn cover six specific tasks. We compile an extensive list of 25 publicly-available datasets spanning four distinct environments: underwater, water surface, ground, and aerial. We summarize the types of anomalies these datasets correspond to or contain, and present five main categories of approaches for detecting such anomalies. Lastly, we identify future research directions and discuss novel contributions that could advance the field of MC-VAD. With this survey, we aim to offer a valuable reference for researchers and practitioners striving to develop and advance state-of-the-art MC-VAD methods.

    Design Of Computer Vision Systems For Optimizing The Threat Detection Accuracy

    This dissertation considers computer vision (CV) systems in which a central monitoring station receives and analyzes the video streams captured and delivered wirelessly by multiple cameras. It addresses how the bandwidth can be allocated to various cameras by presenting a cross-layer solution that optimizes the overall detection or recognition accuracy. The dissertation presents and develops a real CV system and subsequently provides a detailed experimental analysis of cross-layer optimization. Other unique features of the developed solution include employing the popular HTTP streaming approach, utilizing homogeneous cameras as well as heterogeneous ones with varying capabilities and limitations, and including a new algorithm for estimating the effective medium airtime. The results show that the proposed solution significantly improves the CV accuracy. Additionally, the dissertation features an improved neural network system for object detection. The proposed system considers inherent video characteristics and employs different motion detection and clustering algorithms to focus on the areas of importance in consecutive frames, allowing the system to dynamically and efficiently distribute the detection task among multiple deployments of object detection neural networks. Our experimental results indicate that our proposed method can enhance the mAP (mean average precision), execution time, and required data transmissions to object detection networks. Finally, as recognizing an activity provides significant automation prospects in CV systems, the dissertation presents an efficient activity-detection recurrent neural network that utilizes fast pose/limbs estimation approaches. By combining object detection with pose estimation, the domain of activity detection is shifted from a volume of RGB (Red, Green, and Blue) pixel values to a time-series of relatively small one-dimensional arrays, thereby allowing the activity detection system to take advantage of highly capable neural networks that have been trained on large GPU clusters for thousands of hours. Consequently, capable activity detection systems with considerably fewer training sets and processing hours can be built.

    Analyzing Human-Human Interactions: A Survey

    Many videos depict people, and it is their interactions that inform us of their activities, relation to one another and the cultural and social setting. With advances in human action recognition, researchers have begun to address the automated recognition of these human-human interactions from video. The main challenges stem from dealing with the considerable variation in recording setting, the appearance of the people depicted and the coordinated performance of their interaction. This survey provides a summary of these challenges and datasets to address these, followed by an in-depth discussion of relevant vision-based recognition and detection methods. We focus on recent, promising work based on deep learning and convolutional neural networks (CNNs). Finally, we outline directions to overcome the limitations of the current state-of-the-art to analyze and, eventually, understand social human actions