3,357 research outputs found

Activity Recognition based on a Magnitude-Orientation Stream Network

Authors: Caetano Carlos; de Melo Victor H. C.; Santos Jefersson A. dos; Schwartz William Robson
Publication date: 22/08/2017

The temporal component of videos provides an important clue for activity recognition, as a number of activities can be reliably recognized based on motion information. In view of that, this work proposes a novel temporal stream for two-stream convolutional networks based on images computed from the optical flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to learn the motion in a better and richer manner. Our method applies simple nonlinear transformations on the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Experimental results, obtained on two well-known datasets (HMDB51 and UCF101), demonstrate that using our proposed temporal stream as input to existing neural network architectures can improve their performance for activity recognition. Results demonstrate that our temporal stream provides complementary information able to improve the classical two-stream methods, indicating the suitability of our approach to be used as a temporal video representation. Comment: 8 pages, SIBGRAPI 201

arXiv.org e-Print Archive

Crossref
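
The abstract of the Magnitude-Orientation Stream entry above mentions nonlinear transformations of the horizontal and vertical optical flow components. As a rough illustration only, the sketch below assumes the standard Cartesian-to-polar conversion (magnitude via `hypot`, orientation via `atan2`) followed by a linear rescale to 8-bit pixel values. The function names `magnitude_orientation` and `to_uint8`, the nested-list grid representation, and the scaling ranges are hypothetical; the abstract does not give the exact transformations the authors use.

```python
import math


def magnitude_orientation(u, v):
    """Convert flow grids u (horizontal) and v (vertical) to polar form.

    Assumption: plain Cartesian-to-polar conversion, not necessarily the
    exact transformation used in the paper.
    """
    mag = [[math.hypot(a, b) for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]
    ori = [[math.atan2(b, a) for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]
    return mag, ori


def to_uint8(grid, lo, hi):
    """Clip values to [lo, hi] (hi > lo) and rescale linearly to 0..255."""
    span = hi - lo
    return [
        [round(255 * (min(max(x, lo), hi) - lo) / span) for x in row]
        for row in grid
    ]


# Tiny 1x2 flow field: one pixel moving along (3, 4), one moving straight along v.
u = [[3.0, 0.0]]
v = [[4.0, 1.0]]
mag, ori = magnitude_orientation(u, v)
mag_img = to_uint8(mag, 0.0, 5.0)            # magnitude image
ori_img = to_uint8(ori, -math.pi, math.pi)   # orientation image
```

In practice the flow grids would come from an optical flow estimator, and the resulting magnitude and orientation images would be stacked as input to the temporal stream.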

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

Authors: Ricci Elisa; Sebe Nicu; Song Jingkuan; Xu Dan; Yan Yan
Publication date: 01/01/2015

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches. Comment: Oral paper in BMVC 201

arXiv.org e-Print Archive

Crossref

ConvGRU-CNN: Spatiotemporal Deep Learning for Real-World Anomaly Detection in Video Surveillance System

Authors: Qasim Gandapur Maryam; Verdú Elena
Publication venue: International Journal of Interactive Multimedia and Artificial Intelligence
Publication date: 01/06/2023

Video surveillance for real-world anomaly detection and prevention using deep learning is an important and difficult research area. It is imperative to detect and prevent anomalies to develop a nonviolent society. Real-world video surveillance cameras automate the detection of anomalous activities and enable law enforcement systems to take steps toward public safety. However, a human-monitored surveillance system is prone to overlooking anomalous activity. In this paper, an automated deep learning model is proposed in order to detect and prevent anomalous activities. The real-world video surveillance system is designed by implementing ResNet-50, a Convolutional Neural Network (CNN) model, to extract the high-level features from input streams, whereas temporal features are extracted by the Convolutional GRU (ConvGRU) from the ResNet-50 extracted features in the time-series dataset. The proposed deep learning video surveillance model (named ConvGRU-CNN) can efficiently detect anomalous activities. The UCF-Crime dataset is used to evaluate the proposed deep learning model. We classified normal and abnormal activities, thereby showing the ability of ConvGRU-CNN to find a correct category for each abnormal activity. With the UCF-Crime dataset for video surveillance-based anomaly detection, ConvGRU-CNN achieved 82.22% accuracy. In addition, the proposed model outperformed the related deep learning models.

Re-UNIR

Effective crowd anomaly detection through spatio-temporal texture analysis

Authors: Fan Jiu-Lun; Hao Yu; Liu Ying; Wang Jing; Xu Zhi-Jie
Publication venue: Springer Science and Business Media LLC
Publication date: 27/09/2018

Abnormal crowd behaviors in high-density situations can pose great danger to public safety. Despite the extensive installation of closed-circuit television (CCTV) cameras, it is still difficult to achieve real-time alerts and automated responses from current systems. Two major breakthroughs have been reported in this research. Firstly, a spatial-temporal texture extraction algorithm is developed. This algorithm effectively extracts video textures with abundant crowd motion details by adopting the Gabor-filtered textures with the highest information entropy values. Secondly, a novel scheme for defining crowd motion patterns (signatures) is devised to identify abnormal behaviors in the crowd by employing an enhanced gray level co-occurrence matrix model. In the experiments, various classic classifiers are utilized to benchmark the performance of the proposed method. The results obtained exhibit detection and accuracy rates which are, overall, superior to other techniques.

Sheffield Hallam University Research Archive

Huddersfield Research Portal
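
The crowd anomaly entry above builds its motion signatures on an enhanced gray level co-occurrence matrix (GLCM) model. The abstract does not describe the enhancement, so the sketch below shows only the standard GLCM and one classic texture statistic (contrast) as background. The function names `glcm`, `normalize`, and `contrast`, and the toy 2x3 texture, are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy).

    Standard (non-symmetric) GLCM; the paper's enhanced variant is not
    reproduced here.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def normalize(counts):
    """Turn raw counts into joint probabilities (left unchanged if empty)."""
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[x / total for x in row] for row in counts]


def contrast(p):
    """Contrast statistic: sum of p[i][j] * (i - j)^2 over all level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Toy 2x3 texture quantized to 3 gray levels, horizontal neighbour offset.
texture = [[0, 0, 1],
           [1, 2, 2]]
p = normalize(glcm(texture, levels=3))
print(contrast(p))
```

Statistics such as contrast, computed per texture slice, are the kind of feature vector that could then be passed to a conventional classifier.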

Architecture for automatic recognition of group activities using local motions and context

Authors: Azorin-Lopez Jorge; Borja Luis Felipe; Fuster-Guilló Andrés; Saval-Calvo Marcelo; Sebban Marc
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/07/2022

Currently, the ability to automatically detect human behavior in image sequences is one of the most important challenges in the area of computer vision. Within this broad field of knowledge, the recognition of activities of groups of people in public areas is receiving special attention due to its importance in many aspects, including safety and security. This paper proposes a generic computer vision architecture with the ability to learn and recognize different group activities using mainly the local group’s movements. Specifically, a multi-stream deep learning architecture is proposed whose two main streams correspond to a representation based on a descriptor capable of representing the trajectory information of a sequence of images as a collection of local movements that occur in specific regions of the scene. Additional information (e.g. location, time) can be included as additional streams to strengthen the classification of activities. The proposed architecture is capable of robustly classifying different activities of a group as well as dealing with one-class problems. Moreover, the use of a simple descriptor that transforms a sequence of color images into a sequence of two-image streams can reduce the curse of dimensionality using a deep learning approach. The generic deep learning architecture has been evaluated with different datasets, outperforming the state-of-the-art approaches and providing an efficient architecture for single and multi-class classification problems.

Repositorio Institucional de la Universidad de Alicante

HAL-UJM

Online video-based abnormal detection using highly motion techniques and statistical measures

Authors: Al-Dhamari Ahlam; Khamis Nor Hisham; Mahmood Nasrul Humaimi; Sudirman Rubita; Yahya Azli
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/08/2019

At the core of video surveillance are abnormal detection approaches, which have been proven to be substantially effective in detecting abnormal incidents without prior knowledge about these incidents. Based on the state-of-the-art research, it is evident that there is a trade-off between frame processing time and detection accuracy in abnormal detection approaches. Therefore, the primary challenge is to balance this trade-off suitably by utilizing few, but very descriptive, features to achieve online performance while maintaining a high accuracy rate. In this study, we propose a new framework, which balances detection accuracy and video processing time by employing two efficient motion techniques, specifically, foreground and optical flow energy. Moreover, we use different statistical analysis measures of motion features to obtain a robust inference method to distinguish abnormal behavior incidents from normal ones. The performance of this framework has been extensively evaluated in terms of the detection accuracy, the area under the curve (AUC) and frame processing time. Simulation results and comparisons with ten relevant online and non-online frameworks demonstrate that our framework efficiently achieves superior performance to those frameworks, in that it presents high values for the accuracy while simultaneously attaining low values for the processing time.

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Physics inspired methods for crowd video surveillance and analysis: a survey

Authors: Yu Hui; Yu Qinan; Zhang Xuguang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 31/10/2018

Portsmouth University Research Portal (Pure)

The Visual Social Distancing Problem

Authors: Cristani Marco; Del Bue Alessio; Murino Vittorio; Setti Francesco; Vinciarelli Alessandro
Publication date: 01/01/2020

One of the main and most effective measures to contain the recent viral outbreak is the maintenance of the so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, transports and schools will likely adopt restrictions over the minimum inter-personal distance between people. Given this scenario, it is crucial to measure compliance with such a physical constraint at scale, in order to figure out the reasons for possible breaks of such distance limitations, and to understand whether this implies a possible threat given the scene context. All of this must be done while complying with privacy policies and making the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image, and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis of whether people comply with the SD restriction, and to provide statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates to previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to manage such a problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios. Comment: 9 pages, 5 figures. All the authors equally contributed to this manuscript and they are listed by alphabetical order. Under submissio

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Enlighten
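
The Visual Social Distancing entry above defines the problem as estimating inter-personal distances from an image. The sketch below covers only the final step, under a strong assumption: each person's position has already been mapped to ground-plane coordinates in metres (for example through a calibrated homography, which is not shown). The function name `close_pairs` and the 2.0 m threshold are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def close_pairs(positions, min_distance):
    """Return (i, j, distance) for every pair of people closer than min_distance.

    positions: list of (x, y) ground-plane coordinates in metres (assumed
    to be estimated upstream from the image).
    """
    violations = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        d = math.dist(p, q)
        if d < min_distance:
            violations.append((i, j, d))
    return violations


# Three people; only the first two stand closer than the 2.0 m threshold.
people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(close_pairs(people, min_distance=2.0))
```

Checking all pairs costs quadratic time in the number of people, which is acceptable for a single camera view; a spatial index would be a natural refinement for dense crowds.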