    Activity Recognition based on a Magnitude-Orientation Stream Network

    The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized from motion information. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks, named Magnitude-Orientation Stream (MOS), which is based on images computed from the optical flow magnitude and orientation and learns richer motion representations. Our method applies simple nonlinear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Experimental results, carried out on two well-known datasets (HMDB51 and UCF101), demonstrate that using our proposed temporal stream as input to existing neural network architectures can improve their activity recognition performance. The results also show that our temporal stream provides complementary information that improves the classical two-stream methods, indicating that our approach is suitable as a temporal video representation.
    Comment: 8 pages, SIBGRAPI 201
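
    The abstract does not spell out the exact nonlinear transformations, so the following is only a minimal sketch under an assumption: magnitude is taken as the Euclidean norm of each flow vector and orientation as its `atan2` angle, each rescaled to an 8-bit image. The function name `flow_to_mos_images` and the clipping value `max_mag` are hypothetical; the flow components `u` (horizontal) and `v` (vertical) are assumed to come from any optical flow estimator.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Image = List[List[int]]


def flow_to_mos_images(u: Grid, v: Grid, max_mag: float = 20.0) -> Tuple[Image, Image]:
    """Turn horizontal (u) and vertical (v) flow fields into 8-bit
    magnitude and orientation images (hypothetical MOS-style inputs)."""
    mag_img: Image = []
    ori_img: Image = []
    for row_u, row_v in zip(u, v):
        mag_row, ori_row = [], []
        for du, dv in zip(row_u, row_v):
            mag = math.hypot(du, dv)   # nonlinear: Euclidean norm of (du, dv)
            ori = math.atan2(dv, du)   # nonlinear: angle in [-pi, pi]
            # Clip magnitude at max_mag (an assumed bound), then rescale to 0..255.
            mag_row.append(round(255 * min(mag, max_mag) / max_mag))
            # Shift the angle into [0, 2*pi) and rescale to 0..255.
            ori_row.append(round(255 * (ori % (2 * math.pi)) / (2 * math.pi)))
        mag_img.append(mag_row)
        ori_img.append(ori_row)
    return mag_img, ori_img


# Usage: a 1x3 flow field; the last vector is clipped at max_mag.
mag, ori = flow_to_mos_images([[3.0, 0.0, 100.0]], [[4.0, 0.0, 0.0]])
```

    Keeping magnitude and orientation in separate channels lets a standard image CNN consume them directly, which is the general idea behind feeding flow-derived images to a temporal stream.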

    Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

    We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works use only hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which uses deep neural networks to learn feature representations automatically. To exploit the complementary information in appearance and motion patterns, we introduce a novel double fusion framework that combines the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are used to learn appearance and motion features separately, as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.
    Comment: Oral paper in BMVC 201
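
    As a rough illustration of the late-fusion step, the sketch below assumes each one-class model (appearance, motion, joint) has already produced one anomaly score per input, oriented so that higher means more anomalous (for instance, the negated decision value of a one-class SVM). It min-max normalizes each model's scores and combines them with a weighted average. The normalization, weights, threshold, and function names are hypothetical and not necessarily the fusion rule used in AMDN.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one model's scores to [0, 1] so different models are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # constant scores carry no ranking information
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(scores_by_model: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of normalized per-model anomaly scores (higher = more anomalous)."""
    n = len(next(iter(scores_by_model.values())))
    total_weight = sum(weights[name] for name in scores_by_model)
    fused = [0.0] * n
    for name, scores in scores_by_model.items():
        if len(scores) != n:
            raise ValueError(f"model {name!r} scored {len(scores)} inputs, expected {n}")
        for i, s in enumerate(min_max_normalize(scores)):
            fused[i] += weights[name] * s / total_weight
    return fused


def flag_anomalies(fused: List[float], threshold: float = 0.5) -> List[bool]:
    """Final decision: an input is anomalous if its fused score exceeds the threshold."""
    return [s > threshold for s in fused]


# Usage: three hypothetical one-class models scoring the same three inputs.
scores = {
    "appearance": [0.0, 1.0, 2.0],
    "motion": [10.0, 10.0, 30.0],
    "joint": [5.0, 7.0, 9.0],
}
fused = late_fusion(scores, {"appearance": 1.0, "motion": 1.0, "joint": 1.0})
flags = flag_anomalies(fused)
```

    Normalizing before fusing matters because raw scores from separately trained models live on different scales; without it, the model with the largest score range would dominate the sum.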

    ConvGRU-CNN: Spatiotemporal Deep Learning for Real-World Anomaly Detection in Video Surveillance System

    Video surveillance for real-world anomaly detection and prevention using deep learning is an important and difficult research area. Detecting and preventing anomalies is essential to building a nonviolent society. Automated real-world video surveillance can detect anomalous activities and help law enforcement take steps toward public safety. A human-monitored surveillance system, by contrast, is prone to overlooking anomalous activity. In this paper, an automated deep learning model is proposed to detect and prevent anomalous activities. The real-world video surveillance system uses ResNet-50, a Convolutional Neural Network (CNN) model, to extract high-level features from the input streams, while a Convolutional GRU (ConvGRU) extracts temporal features from the time series of ResNet-50 features. The proposed deep learning video surveillance model, named ConvGRU-CNN, can efficiently detect anomalous activities. The UCF-Crime dataset is used to evaluate the proposed model. We classified normal and abnormal activities, showing the ability of ConvGRU-CNN to find the correct category for each abnormal activity. On the UCF-Crime dataset for video surveillance-based anomaly detection, ConvGRU-CNN achieved 82.22% accuracy. In addition, the proposed model outperformed the related deep learning models.
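
    For reference, a commonly used ConvGRU cell replaces the matrix products of a standard GRU with convolutions (denoted $*$) over feature maps. Here $X_t$ would be the ResNet-50 feature map for frame $t$, $H_t$ the hidden state, $z_t$ and $r_t$ the update and reset gates, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product. The abstract does not state which variant ConvGRU-CNN uses, and some implementations swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z * X_t + U_z * H_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r * X_t + U_r * H_{t-1} + b_r\right) \\
\tilde{H}_t &= \tanh\left(W_h * X_t + U_h * (r_t \odot H_{t-1}) + b_h\right) \\
H_t &= (1 - z_t) \odot H_{t-1} + z_t \odot \tilde{H}_t
\end{aligned}
```

    Because every gate is a convolution, $H_t$ keeps the spatial layout of the CNN feature map, so temporal context is accumulated per spatial location rather than over a flattened vector.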

    Effective crowd anomaly detection through spatio-temporal texture analysis

    Abnormal crowd behaviors in high-density situations can pose great danger to public safety. Despite the extensive installation of closed-circuit television (CCTV) cameras, it is still difficult to achieve real-time alerts and automated responses from current systems. This research reports two major contributions. First, a spatio-temporal texture extraction algorithm is developed that effectively extracts video textures rich in crowd motion detail by selecting the Gabor-filtered textures with the highest information entropy values. Second, a novel scheme for defining crowd motion patterns (signatures) is devised to identify abnormal crowd behaviors using an enhanced gray-level co-occurrence matrix (GLCM) model. In the experiments, various classic classifiers are used to benchmark the performance of the proposed method. The results show detection and accuracy rates that are, overall, superior to those of other techniques.
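
    The two building blocks named above can be sketched in a few lines. This assumes the Gabor-filtered textures have already been computed and quantized to integer gray levels `0..levels-1`; the Gabor filter bank and the paper's "enhanced" GLCM model are not reproduced. `shannon_entropy` scores a texture by the entropy of its gray-level histogram (one plausible reading of "information entropy"), and `glcm` builds a normalized co-occurrence matrix for a single pixel offset, from which a standard Haralick feature such as contrast can be computed. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List

Image = List[List[int]]


def shannon_entropy(image: Image) -> float:
    """Entropy (bits) of the gray-level histogram of a quantized texture."""
    counts = Counter(p for row in image for p in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_max_entropy(textures: List[Image]) -> Image:
    """Keep the texture with the highest entropy, i.e. the most detail."""
    return max(textures, key=shannon_entropy)


def glcm(image: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    pairs = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[image[y][x]][image[y2][x2]] += 1
                pairs += 1
    if pairs == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[c / pairs for c in row] for row in counts]


def glcm_contrast(p: List[List[float]]) -> float:
    """Haralick contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Usage: a flat texture versus a checkerboard with two gray levels.
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
best = select_max_entropy([flat, checker])
p = glcm(checker, levels=2)
```

    A crowd-motion "signature" could then be a vector of such GLCM statistics over several offsets, fed to an ordinary classifier; that step is left out here.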

    Architecture for automatic recognition of group activities using local motions and context

    Currently, the ability to automatically detect human behavior in image sequences is one of the most important challenges in computer vision. Within this broad field, the recognition of activities of groups of people in public areas is receiving special attention because of its importance in many aspects, including safety and security. This paper proposes a generic computer vision architecture that can learn and recognize different group activities mainly from the local movements of the group. Specifically, a multi-stream deep learning architecture is proposed whose two main streams are based on a descriptor that represents the trajectory information of an image sequence as a collection of local movements occurring in specific regions of the scene. Additional information (e.g., location, time) can be included as further streams to strengthen activity classification. The proposed architecture robustly classifies different group activities and can also handle one-class problems. Moreover, the use of a simple descriptor that transforms a sequence of color images into a sequence of two-image streams can reduce the curse of dimensionality in a deep learning approach. The generic deep learning architecture has been evaluated on different datasets, outperforming state-of-the-art approaches and providing an efficient architecture for single- and multi-class classification problems.

    Online video-based abnormal detection using highly motion techniques and statistical measures

    Abnormal event detection approaches lie at the core of video surveillance and have proven substantially effective at detecting abnormal incidents without prior knowledge of those incidents. State-of-the-art research shows a trade-off between frame processing time and detection accuracy in abnormal event detection approaches. The primary challenge is therefore to balance this trade-off by using a few highly descriptive features that allow online performance while maintaining a high accuracy rate. In this study, we propose a new framework that balances detection accuracy and video processing time by employing two efficient motion techniques, namely foreground and optical flow energy. Moreover, we apply different statistical analysis measures to the motion features to obtain a robust inference method that distinguishes abnormal incidents from normal ones. The framework has been extensively evaluated in terms of detection accuracy, area under the curve (AUC), and frame processing time. Simulation results and comparisons with ten relevant online and non-online frameworks demonstrate that our framework outperforms them, achieving high accuracy while simultaneously keeping processing time low.
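
    One plausible way to combine a motion cue with a simple statistical measure is sketched below, under stated assumptions: per-frame optical flow energy is the sum of squared flow components, and a frame is flagged when its energy deviates from the running mean of previously accepted frames by more than `k` standard deviations. Welford's online algorithm keeps those statistics in constant memory, which suits an online setting. The foreground cue is omitted, and the class name, `k`, and `warmup` are hypothetical choices, not the parameters of the cited framework.

```python
import math
from typing import List

Grid = List[List[float]]


def flow_energy(u: Grid, v: Grid) -> float:
    """Per-frame optical flow energy: sum of squared flow components."""
    return sum(du * du + dv * dv
               for row_u, row_v in zip(u, v)
               for du, dv in zip(row_u, row_v))


class OnlineZScoreDetector:
    """Flags values that deviate from the running statistics of normal frames."""

    def __init__(self, k: float = 3.0, warmup: int = 5) -> None:
        self.k = k
        self.warmup = max(warmup, 2)  # need at least 2 samples for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x: float) -> bool:
        abnormal = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            abnormal = abs(x - self.mean) > self.k * std
        if not abnormal:
            # Only normal frames update the statistics, so anomalies do not inflate them.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return abnormal


# Usage: a stream of per-frame energies with one burst of motion.
detector = OnlineZScoreDetector(k=3.0, warmup=5)
energies = [10.0, 11.0, 9.0, 10.0, 11.0, 9.0, 50.0, 10.0]
flags = [detector.update(e) for e in energies]
```

    Each update costs O(1) time and memory, which is the property an online detector needs; the accuracy side of the trade-off depends on how descriptive the chosen features are.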

    Physics inspired methods for crowd video surveillance and analysis: a survey


    The Visual Social Distancing Problem

    One of the main and most effective measures to contain the recent viral outbreak is the maintenance of so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, public transport, and schools will likely adopt restrictions on the minimum interpersonal distance between people. In this scenario, it is crucial to measure compliance with such physical constraints at scale, in order to understand the reasons for possible breaches of these distance limits and whether a breach implies a threat given the scene context. All of this must comply with privacy policies and keep the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the interpersonal distance from an image and the characterization of the related people aggregations. VSD is pivotal for non-invasively analyzing whether people comply with the SD restriction, and for providing statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to address this problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications, and future application scenarios.
    Comment: 9 pages, 5 figures. All the authors contributed equally to this manuscript and are listed in alphabetical order. Under submission
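
    As a minimal sketch of the distance-estimation part of VSD, assume each person's feet position has already been detected in the image and that a 3x3 homography `H` mapping image pixels to ground-plane coordinates in meters is known from camera calibration. Pairwise ground-plane distances below a threshold are then reported. The function names and the 2.0 m default threshold are illustrative assumptions; the abstract does not prescribe a particular method. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def to_ground(H: Matrix3, p: Point) -> Point:
    """Map an image point to ground-plane coordinates with homography H."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]  # projective scale factor
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def distancing_violations(feet_px: List[Point], H: Matrix3,
                          min_dist: float = 2.0) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for every pair of people closer than min_dist."""
    ground = [to_ground(H, p) for p in feet_px]
    violations = []
    for i, j in combinations(range(len(ground)), 2):
        d = math.dist(ground[i], ground[j])
        if d < min_dist:
            violations.append((i, j, d))
    return violations


# Usage: a toy homography that only scales pixels to meters (1 px = 1 cm).
H = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
pairs = distancing_violations([(0.0, 0.0), (100.0, 0.0), (500.0, 0.0)], H)
```

    Measuring on the ground plane rather than in pixels matters because perspective makes equal pixel gaps correspond to very different real distances at different depths.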