Large-Scale Mapping of Human Activity using Geo-Tagged Videos

Authors: Liu Sen, Newsam Shawn, Zhu Yi
Publication date: 28/11/2017

This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning-based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster, which is important for recognizing events as they occur in streaming video or for reducing latency in analyzing already captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show that our approach is able to accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles. Comment: Accepted at ACM SIGSPATIAL 201

arXiv.org e-Print Archive

Crossref

A fully integrated violence detection system using CNN and LSTM

Authors: Jayavel Kayalvizhi, Naraharisetti Saamaja, Sharma Sarthak, Sudharsan B., Trehan Vimarsh
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2021

Recently, the number of violence-related cases in places such as remote roads, pathways, shopping malls, elevators, sports stadiums, and liquor shops has increased drastically, and these cases are unfortunately discovered only after it is too late. The aim is to create a complete system that performs real-time video analysis to recognize the presence of violent activities and notify the concerned authority, such as the police department of the corresponding area. Using the deep learning networks CNN and LSTM along with a well-defined system architecture, we have achieved an efficient solution for real-time analysis of video footage, so that the concerned authority can monitor the situation through a mobile application that immediately notifies them of a violent event.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Spatio-temporal action localization with Deep Learning

Author: Monteiro Carlos Filipe Batista Cardoso
Publication date: 08/06/2022

Master's dissertation in Informatics Engineering. A system that detects and identifies human activities is called a human action recognition system. In the video-based approach, human activity is classified into four categories, depending on the complexity of the steps and the number of body parts involved in the action: gestures, actions, interactions, and activities. Variations of the human body make it challenging for video-based human action recognition to capture valuable and discriminative features. Deep learning techniques have provided practical applications in multiple fields of signal processing, usually surpassing traditional signal processing on a large scale. Recently, several applications, namely surveillance, human-computer interaction, and content-based video retrieval, have studied the detection and recognition of violence. In recent years there has been rapid growth in the production and consumption of a wide variety of video data due to the popularization of high-quality and relatively low-priced video devices. Smartphones and digital cameras contributed a lot to this growth. At the same time, about 300 hours of video are uploaded to YouTube every minute. Along with the growing production of video data, new technologies such as video captioning, video question answering, and video-based activity/event detection are emerging every day. Given video input data, human activity detection indicates which activity is contained in the video and locates the regions in the video where the activity occurs. This dissertation conducted an experiment to identify and detect violence with spatial action localization, adapting a public dataset for this purpose. The idea was to take an annotated dataset for general action recognition and adapt it solely for violence detection.

Universidade do Minho: RepositoriUM

Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling

Authors: Hachiuma Ryo, Sato Fumiaki, Sekii Taiki
Publication date: 27/03/2023

This paper simultaneously addresses three limitations associated with conventional skeleton-based action recognition: skeleton detection and tracking errors, poor variety of the targeted actions, and person-wise and frame-wise action recognition. A point cloud deep-learning paradigm is introduced to action recognition, and a unified framework along with a novel deep neural network architecture called Structured Keypoint Pooling is proposed. The proposed method sparsely aggregates keypoint features in a cascaded manner based on prior knowledge of the data structure (which is inherent in skeletons), such as the instances and frames to which each keypoint belongs, and achieves robustness against input errors. Its less constrained and tracking-free architecture enables time-series keypoints consisting of human skeletons and nonhuman object contours to be efficiently treated as an input 3D point cloud and extends the variety of the targeted actions. Furthermore, we propose a Pooling-Switching Trick inspired by Structured Keypoint Pooling. This trick switches the pooling kernels between the training and inference phases to detect person-wise and frame-wise actions in a weakly supervised manner using only video-level action labels. This trick enables our training scheme to naturally introduce novel data augmentation, which mixes multiple point clouds extracted from different videos. In the experiments, we comprehensively verify the effectiveness of the proposed method against the limitations, and the method outperforms state-of-the-art skeleton-based action recognition and spatio-temporal action localization methods. Comment: CVPR 202

arXiv.org e-Print Archive
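
The Structured Keypoint Pooling abstract above describes aggregating keypoint features in a cascade, grouped by the frame and instance each keypoint belongs to. The sketch below is only an illustration of that grouping idea, not the authors' architecture: the function names, the tuple layout, and the use of plain max pooling with no learned layers between stages are all assumptions made for this example.

```python
from collections import defaultdict


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length feature vectors."""
    return [max(column) for column in zip(*vectors)]


def cascaded_keypoint_pooling(keypoints):
    """Pool keypoint features per (frame, instance), then per instance, then per video.

    keypoints: list of (frame_id, instance_id, feature_vector) tuples
    (a hypothetical layout chosen for this sketch).
    Returns (per-instance features, video-level feature).
    """
    # Stage 1: group keypoints by the frame and instance they belong to.
    per_frame_instance = defaultdict(list)
    for frame_id, instance_id, feature in keypoints:
        per_frame_instance[(frame_id, instance_id)].append(feature)

    # Stage 2: pool within each (frame, instance) group, then collect
    # the pooled vectors of each instance across frames.
    per_instance = defaultdict(list)
    for (_frame_id, instance_id), features in per_frame_instance.items():
        per_instance[instance_id].append(max_pool(features))
    instance_features = {iid: max_pool(f) for iid, f in per_instance.items()}

    # Stage 3: pool over all instances to get one video-level feature.
    video_feature = max_pool(list(instance_features.values()))
    return instance_features, video_feature
```

Because the grouping relies only on frame and instance membership, no tracking across frames is needed in this sketch; an actual model would interleave learned layers between the pooling stages.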

Automatic video censoring system using deep learning

Authors: Batra Mridula, Bhatia Madhulika, Bhatia Shaveta, Tanwar Poonam, Verma Yash
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2022

Due to the extensive use of video-sharing platforms and services, the amount of all kinds of content on the web has become massive. This abundance of information makes it difficult to control the kind of content that may be present in a video. Beyond telling whether content is suitable for children and sensitive people, it is also important to determine which parts of it contain such content, so that parts that would be discarded in a simple broad analysis can be preserved. To tackle this problem, popular image deep learning models (MobileNetV2, Xception, InceptionV3, VGG16, VGG19, ResNet101, and ResNet50) were compared to find the one most suitable for the required application. In addition, a system was developed that automatically censors inappropriate content such as violent scenes with the help of deep learning. The system uses transfer learning with the VGG16 model. The experiments suggested that the model showed excellent performance for the automatic censoring application and that it could also be used in other similar applications.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network

Authors: Choi Heeseung, Dogra Debi Prosad, Kim Ig-Jae, Raghuwanshi Yash, Thakare Kamalakar
Publication date: 02/11/2022

Unsupervised approaches for video anomaly detection may not perform as well as supervised approaches. However, learning unknown types of anomalies using an unsupervised approach is more practical than a supervised approach, as annotation is an extra burden. In this paper, we use isolation tree-based unsupervised clustering to partition the deep feature space of the video segments. The RGB stream generates a pseudo anomaly score and the flow stream generates a pseudo dynamicity score of a video segment. These scores are then fused using a majority voting scheme to generate preliminary bags of positive and negative segments. However, these bags may not be accurate, as the scores are generated using only the current segment, which does not represent the global behavior of a typical anomalous event. We then use a refinement strategy based on a cross-branch feed-forward network, designed using the popular I3D network, to refine both scores. The bags are then refined through a segment re-mapping strategy. The intuition behind adding the dynamicity score of a segment to the anomaly score is to enhance the quality of the evidence. The method has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime, CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed framework achieves competitive accuracy compared to state-of-the-art video anomaly detection methods. Comment: 10 pages, 8 figures, and 4 tables. (ACCEPTED AT WACV 2023

arXiv.org e-Print Archive

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

Authors: Cao Shidong, Chai Wenhao, Hao Shengyu, Hu Wenhao, Hwang Jenq-Neng, Song Mingli, Wang Gaoang, Wang Guanhong, Zhao Zhonghan
Publication date: 06/07/2023

Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sports performance, which includes perception, comprehension, and decision, while comparing their strengths and weaknesses. Secondly, we list widely used existing datasets in sports and highlight their characteristics and limitations. Finally, we summarize current challenges and point out future trends of deep learning in sports. Our survey provides valuable reference material for researchers interested in deep learning in sports applications.

arXiv.org e-Print Archive

CrimeNet: Neural Structured Learning using Vision Transformer for violence detection

Authors: Fernando J. Rendón-Segador, Jose L. Salazar González, Juan A. Álvarez-García, Tatiana Tommasi
Publication venue: Elsevier BV
Publication date: 01/01/2023

The state of the art in violence detection in videos has improved in recent years thanks to deep learning models, but it is still below 90% average precision on the most complex datasets, which may cause frequent false alarms in video surveillance environments and lead security guards to disable the artificial intelligence system. In this study, we propose a new neural network based on Vision Transformer (ViT) and Neural Structured Learning (NSL) with adversarial training. This network, called CrimeNet, outperforms previous works by a large margin and reduces false positives practically to zero. Our tests on the four most challenging violence-related datasets (binary and multi-class) show the effectiveness of CrimeNet, improving the state of the art by 9.4 to 22.17 percentage points in ROC AUC depending on the dataset. In addition, we present a generalisation study on our model by training and testing it on different datasets. The obtained results show that CrimeNet improves over competing methods with a gain of between 12.39 and 25.22 percentage points, showing remarkable robustness. MCIN/AEI/ 10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR” HORUS project - Grant n. PID2021-126359OB-I0

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

idUS. Depósito de Investigación Universidad de Sevilla

Enhancing camera surveillance using computer vision: a research note

Authors: Idrees Haroon, Shah Mubarak, Surette Ray
Publication date: 01/01/2018

Purpose: The growth of police operated surveillance cameras has out-paced the ability of humans to monitor them effectively. Computer vision is a possible solution. An ongoing research project on the application of computer vision within a municipal police department is described. The paper aims to discuss these issues.

Design/methodology/approach: Following the demystification of computer vision technology, its potential for police agencies is developed within a focus on computer vision as a solution for two common surveillance camera tasks (live monitoring of multiple surveillance cameras and summarizing archived video files). Three unaddressed research questions (can specialized computer vision applications for law enforcement be developed at this time, how will computer vision be utilized within existing public safety camera monitoring rooms, and what are the system-wide impacts of a computer vision capability on local criminal justice systems) are considered.

Findings: Despite computer vision becoming accessible to law enforcement agencies the impact of computer vision has not been discussed or adequately researched. There is little knowledge of computer vision or its potential in the field.

Originality/value: This paper introduces and discusses computer vision from a law enforcement perspective and will be valuable to police personnel tasked with monitoring large camera networks and considering computer vision as a system upgrade

arXiv.org e-Print Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)