
    Real-World Anomaly Detection in Video Using Spatio-Temporal Features Analysis for Weakly Labelled Data with Auto Label Generation

    Detecting anomalies in videos is a complex task due to diverse content, noisy labeling, and a lack of frame-level labeling. To address these challenges in weakly labeled datasets, we propose a novel custom loss function in conjunction with the multi-instance learning (MIL) algorithm. Our approach utilizes the UCF Crime and ShanghaiTech datasets for anomaly detection. The UCF Crime dataset includes labeled videos depicting a range of incidents such as explosions, assaults, and burglaries, while the ShanghaiTech dataset is one of the largest anomaly datasets, with over 400 video clips featuring three different scenes and 130 abnormal events. We generated pseudo labels for videos using the MIL technique to detect frame-level anomalies from video-level annotations, and to train the network to distinguish between normal and abnormal classes. We conducted extensive experiments on the UCF Crime dataset using C3D and I3D features to test our model's performance. For the ShanghaiTech dataset, we used I3D features for training and testing. Our results show that with I3D features, we achieve an 84.6% frame-level AUC score for the UCF Crime dataset and a 92.27% frame-level AUC score for the ShanghaiTech dataset, which are comparable to other methods used for similar datasets.
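
    The abstract does not specify the custom loss, so the following is only a minimal sketch of a generic MIL ranking objective and of a frame-level AUC computation, not the authors' method. It assumes each video is a bag of per-segment anomaly scores in [0, 1]; the function names, the margin value, and the pairwise AUC formulation are illustrative assumptions.

```python
def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss between two bags of segment scores (hypothetical helper).

    The highest-scoring segment of the abnormal video should outrank the
    highest-scoring segment of the normal video by at least `margin`.
    """
    if not abnormal_scores or not normal_scores:
        raise ValueError("each bag needs at least one segment score")
    top_abnormal = max(abnormal_scores)
    top_normal = max(normal_scores)
    return max(0.0, margin - top_abnormal + top_normal)


def frame_level_auc(scores, labels):
    """Pairwise AUC: fraction of (abnormal, normal) frame pairs ranked correctly.

    Ties count as half a correct pair. `labels` holds 1 for abnormal frames
    and 0 for normal frames.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one frame of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

    The pairwise AUC above is quadratic in the number of frames, which is acceptable for illustration; a sort-based computation would be the usual choice on full datasets.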

    RecapNet: Action Proposal Generation Mimicking Human Cognitive Process

    Generating action proposals in untrimmed videos is a challenging task, since video sequences usually contain much irrelevant content and the duration of an action instance is arbitrary. The quality of action proposals is key to action detection performance. Previous methods mainly rely on sliding windows or anchor boxes to cover all ground-truth actions, but this is infeasible and computationally inefficient. To this end, this article proposes RecapNet, a novel framework for generating action proposals by mimicking the human cognitive process of understanding video content. Specifically, RecapNet includes a residual causal convolution module to build a short memory of past events, based on which the joint probability actionness density ranking mechanism is designed to retrieve the action proposals. RecapNet can handle videos of arbitrary length and, more importantly, a video sequence needs to be processed in only a single pass to generate all action proposals. The experiments show that the proposed RecapNet outperforms the state of the art under all metrics on the benchmark THUMOS14 and ActivityNet-1.3 datasets. The code is publicly available at https://github.com/tianwangbuaa/RecapNet
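
    To illustrate what a residual causal convolution does in general, here is a minimal pure-Python sketch. It is not taken from the RecapNet code: the function names, the ReLU nonlinearity, and the single-channel setup are assumptions made for illustration. The point is only that the output at time t depends on the current and past inputs, never on future ones, which is what allows a single left-to-right pass over a video of arbitrary length.

```python
def causal_conv1d(sequence, kernel):
    """1D causal convolution: out[t] = sum_k kernel[k] * sequence[t - k].

    Positions before the start of the sequence are treated as zeros, so the
    output has the same length as the input and never looks ahead.
    """
    out = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * sequence[t - k]
        out.append(acc)
    return out


def residual_causal_block(sequence, kernel):
    """Hypothetical residual block: input plus ReLU of the causal convolution."""
    conv = causal_conv1d(sequence, kernel)
    return [x + max(0.0, c) for x, c in zip(sequence, conv)]
```

    With a kernel of length K, each output summarizes the last K time steps, which is one simple way to read the "short memory of past events" described in the abstract.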

    Re-Identification in Urban Scenarios: A Review of Tools and Methods

    With the widespread use of surveillance cameras and enhanced awareness of public security, object and person Re-Identification (ReID), the task of recognizing objects across non-overlapping camera networks, has attracted particular attention in the computer vision and pattern recognition communities. Given an image or video of an object of interest (query), object identification aims to identify the object from images or video feeds taken from different cameras. After many years of great effort, object ReID remains a notably challenging task. The main reason is that an object's appearance may change dramatically across camera views due to significant variations in illumination, poses or viewpoints, or even cluttered backgrounds. With the advent of Deep Neural Networks (DNN), there have been many proposals for different network architectures achieving high performance levels. With the aim of identifying the most promising methods for ReID for future robust implementations, a review study is presented, mainly focusing on person and multi-object ReID and auxiliary methods for image enhancement. Such methods are crucial for robust object ReID, and the review also highlights limitations of the identified methods. This is a very active field, as evidenced by the dates of the publications found. However, most works use data from very different datasets and genres, which presents an obstacle to widely generalized DNN model training and usage. Although models have achieved satisfactory results on particular datasets, a trend was observed in the use of 3D Convolutional Neural Networks (CNN), attention mechanisms to capture object-relevant features, and generative adversarial training to overcome data limitations. However, there is still room for improvement, namely in using images from urban scenarios among anonymized images to comply with public privacy legislation. The main challenges that remain in the ReID field, and prospects for future research directions towards ReID in dense urban scenarios, are also discussed.