909 research outputs found

    Large-Scale Mapping of Human Activity using Geo-Tagged Videos

    This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning-based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster, which is important for recognizing events as they occur in streaming video and for reducing latency in analyzing already-captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show that our approach can accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles.
    Comment: Accepted at ACM SIGSPATIAL 201
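
    The abstract describes the mapping pipeline only at a high level. As a minimal sketch of the final aggregation step, assuming per-video activity predictions with geotags and timestamps are already available (all records and names below are hypothetical, not the paper's data or API):

```python
from collections import Counter, defaultdict

# Hypothetical records: (latitude, longitude, hour_of_day, predicted_activity).
# In the paper these would come from running an activity recognizer such as
# hidden two-stream networks over geo-tagged YouTube videos.
predictions = [
    (40.7587, -73.9787, 18, "dancing"),
    (40.7484, -73.9857, 9,  "running"),
    (40.7580, -73.9790, 18, "dancing"),
]

def spatio_temporal_map(records, cell_deg=0.01):
    """Bin activity predictions into (lat, lon, hour) grid cells."""
    grid = defaultdict(Counter)
    for lat, lon, hour, activity in records:
        cell = (round(lat / cell_deg), round(lon / cell_deg), hour)
        grid[cell][activity] += 1
    return grid

for cell, counts in spatio_temporal_map(predictions).items():
    print(cell, counts.most_common(1))
```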

    Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection

    The increasing number of surveillance cameras and growing security concerns have made automatic violent-activity detection from surveillance footage an active area of research. Modern deep learning methods have achieved good accuracy in violence detection and have proved successful because of their applicability in intelligent surveillance systems. However, the models are computationally expensive and large because of their inefficient feature-extraction methods. This work presents a novel architecture for violence detection called the Two-stream Multi-dimensional Convolutional Network (2s-MDCN), which uses RGB frames and optical flow to detect violence. Our proposed method extracts temporal and spatial information independently via 1D, 2D, and 3D convolutions. Despite combining multi-dimensional convolutional networks, our models are lightweight and efficient due to reduced channel capacity, yet they learn to extract meaningful spatial and temporal information. Additionally, combining RGB frames and optical flow yields 2.2% higher accuracy than a single RGB stream. Despite their lower complexity, our models obtained state-of-the-art accuracy of 89.7% on the largest violence detection benchmark dataset.
    Comment: 8 pages, 6 figures
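
    The abstract gives the architecture only in outline. A minimal PyTorch sketch of a two-stream design in that spirit, with a 3D-convolutional RGB branch, a per-frame 2D plus temporal 1D optical-flow branch, and late fusion (layer sizes and names are illustrative assumptions, not the authors' published configuration):

```python
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    """Illustrative two-stream network; not the published 2s-MDCN layout."""
    def __init__(self, num_classes=2):
        super().__init__()
        # RGB stream: 3D convolutions over (C, T, H, W) clips.
        self.rgb = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        # Flow stream: 2D convolution per frame, then a 1D temporal convolution.
        self.flow2d = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.flow1d = nn.Sequential(
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16 + 16, num_classes)  # late fusion by concatenation

    def forward(self, rgb, flow):
        # rgb: (B, 3, T, H, W); flow: (B, T, 2, H, W) stacked optical flow.
        b, t = flow.shape[:2]
        r = self.rgb(rgb)
        f = self.flow2d(flow.reshape(b * t, 2, *flow.shape[3:]))  # per-frame 2D features
        f = self.flow1d(f.reshape(b, t, -1).transpose(1, 2))      # 1D conv over time
        return self.head(torch.cat([r, f], dim=1))

model = TwoStreamSketch()
out = model(torch.randn(1, 3, 8, 64, 64), torch.randn(1, 8, 2, 64, 64))
print(out.shape)  # torch.Size([1, 2])
```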

    Study on Deep Learning Techniques for Finding Suspicious Violence Detection in a Video Surveillance

    Detecting suspicious visual objects is essential for applying automatic violence detection (AVD) in video surveillance. Continuous monitoring of objects or any unusual activity is a tedious task, and learning from video surveillance is an emerging research problem in AVD applications. Deep learning is an intelligent and trustworthy technique for detecting or classifying suspicious data objects; it classifies suspicious video frames by modeling specific categories of videos. Current deep models such as the convolutional neural network (CNN), convolutional long short-term memory (ConvLSTM), AlexNet, VGG-16, MobileNet, and GoogleNet have been widely successful in real-time violence detection with video clips as input. This paper presents the findings of experimental studies of these deep models, using classification measures to demonstrate their efficacy for our AVD application. Benchmark violence (V), non-violence (NV), and weapon-violence (WV) video datasets are used in the experiments to characterize each model's performance when classifying suspicious videos for public safety.
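
    The classification measures themselves are standard. As a brief sketch of how per-class efficacy might be reported for the V/NV/WV setting (the labels and predictions below are made up for illustration):

```python
from sklearn.metrics import classification_report

# Hypothetical ground-truth and predicted labels for the three classes:
# violence (V), non-violence (NV), and weapon violence (WV).
y_true = ["V", "NV", "WV", "V", "NV", "WV", "V", "NV"]
y_pred = ["V", "NV", "V",  "V", "NV", "WV", "NV", "NV"]

# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_true, y_pred, labels=["V", "NV", "WV"]))
```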

    A fully integrated violence detection system using CNN and LSTM

    Recently, the number of violence-related incidents in places such as remote roads, pathways, shopping malls, elevators, sports stadiums, and liquor shops has increased drastically, and such incidents are unfortunately discovered only after it's too late. The aim is to create a complete system that can perform real-time video analysis, recognize the presence of any violent activity, and notify the concerned authority, such as the police department of the corresponding area. Using the deep learning networks CNN and LSTM along with a well-defined system architecture, we have achieved an efficient solution for real-time analysis of video footage, so that the concerned authority can monitor the situation through a mobile application that immediately notifies them when a violent event occurs.
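
    The abstract does not give the exact layer configuration. A minimal PyTorch sketch of the general pattern it describes, a per-frame CNN feature extractor feeding an LSTM (all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CnnLstmSketch(nn.Module):
    """Per-frame CNN features fed to an LSTM; an illustrative layout only."""
    def __init__(self, num_classes=2, feat_dim=32, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(            # lightweight per-frame encoder
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                 # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.reshape(b * t, *clip.shape[2:])).reshape(b, t, -1)
        _, (h, _) = self.lstm(feats)         # h[-1]: last layer's final state
        return self.head(h[-1])              # logits: violent vs. non-violent

logits = CnnLstmSketch()(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```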

    ์ƒํ™ฉ ํŒ๋‹จ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ํญ๋ ฅ ๊ฐ์ง€ ์ธ๊ณต ์ง€๋Šฅ ๋ชจ๋ธ์˜ ํšจ์œจํ™”: ๊ฐ์‹œ ์นด๋ฉ”๋ผ ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ์ค‘์‹ฌ์œผ๋กœ

    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๋ฐ์ดํ„ฐ์‚ฌ์ด์–ธ์Šค๋Œ€ํ•™์› ๋ฐ์ดํ„ฐ์‚ฌ์ด์–ธ์Šคํ•™๊ณผ, 2023. 2. ๊น€ํ˜•์‹ .Recently, CCTVs are installed everywhere and play an important role in crime prevention and investigation. However, there is a problem in that a huge amount of manpower is required to monitor CCTV recordings. From this point of view, deep learning (DNN) based models that can automatically detect violence have been developed. However, they used heavy architectures such as 3D convolution or LSTM to process video data. For this reason, they require offloading to the central server for recordings to be processed so that incur huge transmission cost and privacy concern. Furthermore, given violence does not occur frequently, it is inefficient to run heavy video recognition model all the time. To solve these problems, this study proposes WhenToWatch, to enhance efficiency of violence detection system on surveillance camera. Main goals of this study are as follows: (1) To devise DNN-based violence detection system fully run on the CCTV devices to avoid offloading cost and privacy issues. (2) To reduce energy consumption of the device and processing time by introducing pre-screening module checking existence of people and deciding whether violence detection model should be executed or not. (3) To minimize computation overhead of the pre-screening module by combining lightweight non-DNN based methods and executing them according to previous status. In conclusion, WhenToWatch can be helpful when running violence detection models on edge devices such as CCTV, where power and computing resources are limited. Experiments show that WhenToWatch can reduce the execution of the violence detection model by 17% on the RWF-2000 dataset and 31% on the CCTV-Busan dataset. In addition, WhenToWatch reduces average processing time per a video from 310.46 seconds to 255.60 seconds and average power consumption from 3,303mW to 3,100mW on Jetson Nano, confirming it contributes to efficient on-device system operation.์ตœ๊ทผ์—๋Š” ์•ˆ์ „์„ ์œ„ํ•ด CCTV๊ฐ€ ๊ณณ๊ณณ์— ์„ค์น˜๋˜์–ด ์žˆ์œผ๋ฉฐ ๋ฒ”์ฃ„ ์˜ˆ๋ฐฉ ๋ฐ ์ˆ˜์‚ฌ์— ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ CCTV ์˜์ƒ๋“ค์„ ์‹ค์‹œ๊ฐ„์œผ๋กœ ๊ฐ์‹œํ•˜๊ฑฐ๋‚˜ ๋…นํ™”๋œ ์˜์ƒ์„ ์žฌ๊ฒ€ํ† ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ง‰๋Œ€ํ•œ ์ธ๋ ฅ์ด ํ•„์š”ํ•˜๋‹ค๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ด€์ ์—์„œ ์ž๋™์œผ๋กœ ํญ๋ ฅ์„ ๊ฐ์ง€ํ•  ์ˆ˜ ์žˆ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋“ค์ด ๊พธ์ค€ํžˆ ๊ฐœ๋ฐœ๋˜์–ด์™”๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์€ 3D ์ปจ๋ณผ๋ฃจ์…˜, LSTM ๋“ฑ์˜ ๋ฌด๊ฑฐ์šด ์˜์ƒ์ฒ˜๋ฆฌ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— CCTV ๋””๋ฐ”์ด์Šค ๋‚ด์—์„œ์˜ ์ถ”๋ก ์€ ๊ฑฐ์˜ ๋ถˆ๊ฐ€๋Šฅํ–ˆ๊ณ , ์„œ๋ฒ„๋กœ ์˜์ƒ์„ ์ „์†กํ•˜์—ฌ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒƒ์„ ์ „์ œ๋กœ ํ•œ๋‹ค. ์ด ๊ฒฝ์šฐ ๋ง‰๋Œ€ํ•œ ์ „์†ก ๋น„์šฉ์ด ๋ฐœ์ƒํ•  ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์‚ฌ์ƒํ™œ ์นจํ•ด ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์†Œ์ง€๊ฐ€ ์žˆ๋‹ค. ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ํญ๋ ฅ์€ ์ผ๋ฐ˜์ ์ธ ์‚ฌ๊ฑด์— ๋น„ํ•ด ๋ฐœ์ƒ ๋นˆ๋„๊ฐ€ ๋‚ฎ๋‹ค๋Š” ์ ์„ ๊ณ ๋ คํ•œ๋‹ค๋ฉด CCTV ๋™์ž‘ ์‹œ๊ฐ„ ๋‚ด๋‚ด ๋ฌด๊ฑฐ์šด ํญ๋ ฅ ๊ฐ์ง€ ๋ชจ๋ธ์„ ๊ตฌ๋™ํ•˜๋Š” ๊ฒƒ์€ ๋น„ํšจ์œจ์ ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ณ  ํญ๋ ฅ ๊ฐ์ง€ ์‹œ์Šคํ…œ์˜ ํšจ์œจ์„ฑ์„ ์ œ๊ณ ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” WhenToWatch๋ผ๋Š” ํญ๋ ฅ ๊ฐ์ง€ ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์˜ ์ฃผ์š” ๋ชฉ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. (1) ๋ฐ์ดํ„ฐ ์ „์†ก ๋น„์šฉ์„ ์ตœ์†Œํ™”ํ•˜๊ณ  ๊ฐœ์ธ์ •๋ณด๋ฅผ ๋ณดํ˜ธํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ์‹œ์นด๋ฉ”๋ผ ์žฅ์น˜ ๋‚ด์—์„œ ๊ตฌ๋™ ๊ฐ€๋Šฅํ•œ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜์˜ ํญ๋ ฅ ๊ฐ์ง€ ์‹œ์Šคํ…œ์„ ์ œ์•ˆํ•œ๋‹ค. 
(2) ๊ฐ์‹œ์นด๋ฉ”๋ผ ์žฅ์น˜์˜ ์ „๋ ฅ ์†Œ๋ชจ๋Ÿ‰๊ณผ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์ „ ํŒ๋‹จ ๋ชจ๋“ˆ์„ ๋„์ž…ํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์‚ฌ๋žŒ์˜ ์กด์žฌ ์—ฌ๋ถ€๋ฅผ ํŒ๋‹จํ•˜๊ณ  ํญ๋ ฅ ๊ฐ์ง€ ๋ชจ๋ธ์˜ ์‹คํ–‰ ์—ฌ๋ถ€๋ฅผ ๊ฒฐ์ •ํ•จ์œผ๋กœ์จ ๋ถˆํ•„์š”ํ•œ ์—ฐ์‚ฐ๋Ÿ‰์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค. (3) ์‚ฌ์ „ ํŒ๋‹จ ๋ชจ๋“ˆ๋กœ ์ธํ•œ ์ถ”๊ฐ€์ ์ธ ์—ฐ์‚ฐ๋Ÿ‰ ๋ถ€๋‹ด์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์‹คํ–‰์†๋„๊ฐ€ ๋น ๋ฅธ ๋น„ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๋ก ๋“ค์„ ๊ฒฐํ•ฉํ•œ ์‹œ์Šคํ…œ์„ ๋””์ž์ธํ•˜๊ณ , ์ด์ „ ์ƒํƒœ์— ๋”ฐ๋ผ ์ ์ ˆํ•œ ์—ฐ์‚ฐ์„ ์‹คํ–‰ํ•œ๋‹ค. ์ตœ์ข…์ ์œผ๋กœ WhenToWatch๋Š” CCTV์™€ ๊ฐ™์ด ๋ฆฌ์†Œ์Šค๊ฐ€ ์ œํ•œ๋œ ์—ฃ์ง€ ๋””๋ฐ”์ด์Šค์—์„œ ํญ๋ ฅ ๊ฐ์ง€ ๋ชจ๋ธ์„ ํšจ์œจ์ ์œผ๋กœ ๊ตฌ๋™ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค. ์‹ค์ œ ์‹คํ—˜ ๊ฒฐ๊ณผ, ์ œ์•ˆ๋œ ์‚ฌ์ „ ํŒ๋‹จ ๋ชจ๋“ˆ์„ ์ ์šฉํ–ˆ์„ ๋•Œ, ํญ๋ ฅ ๊ฐ์ง€ ๋ชจ๋ธ์˜ ์‹คํ–‰ ํšŸ์ˆ˜๋Š” RWF-2000 ๋ฐ์ดํ„ฐ์…‹์—์„œ ์•ฝ 17% ๊ฐ์†Œํ–ˆ์œผ๋ฉฐ CCTV-Busan ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ์•ฝ 31% ๊ฐ์†Œํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ์‹œ์Šคํ…œ ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด ๋ณด๋‹ค ํšจ์œจ์ ์ธ ์‹œ์Šคํ…œ ์šด์˜์ด ๊ฐ€๋Šฅํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ ์ ฏ์Šจ ๋‚˜๋…ธ์—์„œ ํ‰๊ท  ๋น„๋””์˜ค ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์€ 310.46์ดˆ์—์„œ 255.60์ดˆ๋กœ ๊ฐ์†Œํ•˜์˜€์œผ๋ฉฐ ์ „๋ ฅ ์†Œ๋ชจ๋Ÿ‰์€ 3,303mW์—์„œ 3,100mW๋กœ ๊ฐ์†Œํ•˜์—ฌ WhenToWatch๊ฐ€ ํšจ์œจ์ ์ธ ์˜จ๋””๋ฐ”์ด์Šค ์‹œ์Šคํ…œ ์šด์˜์— ๊ธฐ์—ฌํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค.1 Introduction 1 2 Related Work 6 2.1 Violence Detection 6 2.2 Edge AI 7 2.3 Early-skipping in Neural Networks 8 3 Methodology 9 3.1 WhenToWatch Overview 9 3.2 Implementation Details of Sub-modules 12 3.3 Dataset 15 3.4 On-device Inference 16 4 Evaluation 18 4.1 Performance of Violence Detector 18 4.2 Effect of Pre-screening Module 19 4.3 Efficiency Measurement on Jeton Nano 21 5 Discussion and Future Work 23 5.1 Discussion and Future Work 23 6 Conclusion 24 6.1 Conclusion 24 Bibliography 25 Abstract in Korean 35์„
    • โ€ฆ