Contemporary security trends identify violence as a significant issue plaguing society globally. Statistics depict alarming levels of violence, establishing it as a momentous challenge for homeland security and defence institutions, predominantly in schools and other public locations. State-of-the-art closed-circuit television (CCTV) surveillance solutions exist to limit the manifestations of violence and its impact. However, most institutions lack the analysis mechanisms needed to achieve prevention, apprehension, or conviction in a timely fashion. Manually monitoring and collectively analysing the anthropometric data generated by CCTV surveillance devices is impractical and time-consuming, and the outcome increases the complexity of identifying violent behavioural patterns as substantial evidence. Despite innovations in CCTV sensors, the difficulty of adequately analysing vast amounts of CCTV data adds to the monitoring challenge. This thesis proposes the amalgamation of two state-of-the-art artificial intelligence models, the "You Only Look Once version five medium" model (YOLOv5m) and a single-level three-dimensional convolutional neural network (3DCNNsl), both performing activity recognition and incorporating weight-embedding procedures to identify the primitive stages of violence and weapon artefacts. The approach integrates classification support to confirm the existence of specific weapon objects of interest (knives, bladed instruments, clubs, and guns) belonging to a specific class of violence (beating, shooting, stabbing).
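The weapon-to-violence-class association underpinning the classification support can be sketched as a simple lookup. The pairings below follow those stated in the text (knives and bladed instruments with stabbing, guns with shooting, clubs with beating); the function and dictionary names are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch (not the thesis code): map detected weapon artefacts
# to the violence class whose presence they help confirm.
WEAPON_TO_VIOLENCE = {
    "knife": "stabbing",
    "blade": "stabbing",
    "gun": "shooting",
    "club": "beating",
}

def supported_violence_classes(detected_weapons):
    """Return the set of violence classes that the detected weapon
    artefacts support (unknown objects are ignored)."""
    return {WEAPON_TO_VIOLENCE[w] for w in detected_weapons
            if w in WEAPON_TO_VIOLENCE}
```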
It also validates the presence of violent classes (the primitive stages of violence) by using the existence of weapons belonging to each category group to infer the activity outcome. Utilising classification-support concepts to validate the primitive stages of violence strengthens the classification outcome of violent activity recognition, yielding robust results. The thesis commenced with a two-stage literature investigation to satisfy the research objectives, which identified the state-of-the-art 3DCNNsl at stage one and the YOLOv5m framework for activity and artefact recognition towards violence at stage two. The proposed one-stage solution (simultaneously performing object localisation and classification) combines the models' processing, reducing the impact of their architectural limitations. 3DCNNsl facilitates behavioural-pattern classification, generically associating sub-class labels that suggest the presence of violence with high accuracy. In addition, the YOLOv5m architecture serves two functions: operating in an activity-recognition capacity to fortify the 3DCNNsl activity output, and detecting artefacts that establish the presence of weapons, enhancing the action classification and overall accuracy. The thesis
optimised the deep-learning model selections by identifying violence in scenarios and validating its presence through a redundant weapon-artefact classification weight-embedding procedure. The concept allows violence to be classified in its primitive stages before its impact escalates to lethal outcomes. The proposal extensively evaluated its operations via transfer learning in multiple fusion scenarios to identify the most effective strategies to
realise the research objective. The evaluation dataset used in this thesis comprised samples accumulated from the University of Central Florida (UCF) dataset and several social media forums. The violent action samples reflect multifaceted real-world scenarios with sporadic, accelerated motion attributes in varied environments, which helps reduce the risk of biased results and strengthens the model's robustness. The proposal made three contributions:

1. Performance testing of two known machine-learning techniques (YOLOv5m and 3DCNNsl) in independently recognising violent and non-violent activities in CCTV video footage.
2. Demonstration of violent-activity recognition performance in such videos when both machine-learning techniques operate in tandem.
3. Performance enhancement by further incorporating threat-object detection into the combined solution.

Contribution one disclosed the effectiveness of YOLOv5m activity recognition at 74% and the state-of-the-art 3DCNNsl at 75%, both conceding high misclassification rates on data with and without augmentations and resolution modifications. These results emphasised the need to explore alternative processing measures to alleviate the disadvantages of the two machine-learning models. Contribution two demonstrated the effectiveness of fusion enhancement via decision-level voting, reaching 85.20% and surpassing both 3DCNNsl and YOLOv5m activity recognition. As a validation strategy, the experiments incorporated surplus data comprising 50 samples designed to increase the classification complexity, and the operations were rigorously appraised, confirming the method's applicability. Contribution three showcased the amalgamation of the fusion's activity recognition with the power of object detection, establishing the effectiveness of concatenating weight embeddings. The experiments maintained data consistency with contribution two. The analysis disclosed the dominance of fusion incorporating threat-object detection at 88.20% over 3DCNNsl, YOLOv5m activity recognition, and fusion without threat-object enhancement. The results underscore the robustness of the proposed method, which proved its classification competence, particularly in scenarios with surplus data, from an overall accuracy perspective. While individual processing may be more efficient than fusion without support, the research accentuates the effectiveness of integrating classification redundancy through weight embedding, using detected artefacts to confirm the occurrence of violent actions. The proposed method achieved 85.20% without artefact processing, while incorporating threat-object support analysis (knives, clubs, and guns in the videos) improved the accuracy to 88.20%. This evidence substantiates the solution's robustness, fulfilling the research objectives and concluding the investigations.
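The decision-level fusion with weight embedding described above can be sketched as follows: per-class confidence scores from the two activity recognisers are combined, and classes confirmed by a matching weapon artefact receive an additional weight. This is a minimal sketch under stated assumptions; the equal 0.5/0.5 fusion weights, the 0.15 artefact bonus, and all function names are illustrative, not values or identifiers from the thesis.

```python
# Illustrative sketch (not the thesis code): decision-level fusion of two
# activity classifiers, plus a weight-embedding boost for violence classes
# whose associated weapon artefact was detected in the footage.
WEAPON_SUPPORT = {"stabbing": "knife", "shooting": "gun", "beating": "club"}

def fuse_decisions(cnn_scores, yolo_scores, detected_weapons,
                   w_cnn=0.5, w_yolo=0.5, artefact_bonus=0.15):
    """Combine per-class scores from 3DCNNsl and YOLOv5m (assumed fusion
    weights), boost artefact-confirmed classes, return the top class."""
    fused = {}
    for cls, cnn_score in cnn_scores.items():
        score = w_cnn * cnn_score + w_yolo * yolo_scores.get(cls, 0.0)
        # Weight embedding: add support when the matching weapon is present.
        if WEAPON_SUPPORT.get(cls) in detected_weapons:
            score += artefact_bonus
        fused[cls] = score
    return max(fused, key=fused.get)
```

The sketch illustrates why artefact support changes outcomes: a class that narrowly loses the vote on activity scores alone can win once a detected weapon confirms it.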