7 research outputs found

    A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

    Full text link
    Semi-supervised video anomaly detection (VAD) is a critical task in intelligent surveillance systems. However, an essential type of anomaly in VAD, the scene-dependent anomaly, has received little attention from researchers. Moreover, no existing research investigates anomaly anticipation, a more significant task for preventing anomalous events before they occur. To this end, we propose a new comprehensive dataset, NWPU Campus, containing 43 scenes, 28 classes of abnormal events, and 16 hours of video. It is currently the largest semi-supervised VAD dataset, with the most scenes and anomaly classes, the longest duration, and the only one that considers scene-dependent anomalies. It is also the first dataset proposed for video anomaly anticipation. We further propose a novel model capable of detecting and anticipating anomalous events simultaneously. Compared with 7 outstanding VAD algorithms from recent years, our method handles both scene-dependent anomaly detection and anomaly anticipation well, consistently achieving state-of-the-art performance on the ShanghaiTech, CUHK Avenue, IITB Corridor, and newly proposed NWPU Campus datasets. Our dataset and code are available at: https://campusvad.github.io. Comment: CVPR 2023
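
    The abstract does not detail the model itself, but the joint detection-and-anticipation setup can be sketched as one temporal encoder with two heads: a score for the current frame (detection) and scores for a short future horizon (anticipation). Everything below (module names, dimensions, the GRU backbone) is an illustrative assumption, not the paper's architecture.

    import torch
    import torch.nn as nn

    class DetectAnticipateHead(nn.Module):
        """Minimal sketch: a shared temporal encoder feeding two heads,
        one scoring the current frame, one scoring k future frames.
        Illustrative only; not the paper's actual model."""
        def __init__(self, feat_dim=512, hidden=256, horizon=4):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
            self.detect_head = nn.Linear(hidden, 1)            # score for frame t
            self.anticipate_head = nn.Linear(hidden, horizon)  # scores for t+1..t+k

        def forward(self, frame_feats):          # (B, T, feat_dim)
            _, h = self.encoder(frame_feats)     # h: (1, B, hidden)
            h = h.squeeze(0)
            detect = torch.sigmoid(self.detect_head(h))
            anticipate = torch.sigmoid(self.anticipate_head(h))
            return detect, anticipate

    # Usage: score 2 clips of 16 frames each.
    scores_now, scores_future = DetectAnticipateHead()(torch.randn(2, 16, 512))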

    VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation

    Full text link
    Egocentric action anticipation is a challenging task that aims to predict future actions in advance from current and historical observations in the first-person view. Most existing methods focus on improving the model architecture and loss function based on visual input and recurrent neural networks to boost anticipation performance. However, these methods, which consider only visual information and rely on a single network architecture, have gradually reached a performance plateau. To fully understand what has been observed and to capture the dependencies between current observations and future actions, we propose a novel visual-semantic fusion enhanced, Transformer-GRU-based action anticipation framework. Firstly, high-level semantic information is introduced to improve action anticipation performance for the first time. We propose to use semantic features, generated from class labels or directly from visual observations, to augment the original visual features. Secondly, an effective visual-semantic fusion module is proposed to bridge the semantic gap and fully exploit the complementarity of the two modalities. Thirdly, to take advantage of both parallel and autoregressive models, we design a Transformer-based encoder for long-term sequence modeling and a GRU-based decoder for flexible iterative decoding. Extensive experiments on two large-scale first-person view datasets, i.e., EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of our proposed method, which achieves new state-of-the-art performance, outperforming previous approaches by a large margin. Comment: 12 pages, 7 figures
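
    As a rough sketch of the encoder/decoder split described above: a Transformer encoder models the observed sequence in parallel, then a GRU cell autoregressively decodes future action logits. The fusion (a concat-and-project layer), the dimensions, and the class count are placeholders, not the paper's actual design.

    import torch
    import torch.nn as nn

    class TransformerGRUAnticipator(nn.Module):
        """Sketch of the Transformer-encoder / GRU-decoder pattern with a
        simple visual-semantic fusion; all hyperparameters are assumed."""
        def __init__(self, vis_dim=1024, sem_dim=300, d_model=512,
                     n_classes=1000, steps=8):
            super().__init__()
            self.fuse = nn.Linear(vis_dim + sem_dim, d_model)  # visual-semantic fusion
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # parallel modeling
            self.decoder = nn.GRUCell(d_model, d_model)  # autoregressive decoding
            self.classifier = nn.Linear(d_model, n_classes)
            self.steps = steps

        def forward(self, vis, sem):             # (B, T, vis_dim), (B, T, sem_dim)
            x = self.fuse(torch.cat([vis, sem], dim=-1))
            ctx = self.encoder(x)                # (B, T, d_model)
            h = ctx[:, -1]                       # summary of the observed clip
            inp, preds = h, []
            for _ in range(self.steps):          # iterate one future step at a time
                h = self.decoder(inp, h)
                preds.append(self.classifier(h))
                inp = h
            return torch.stack(preds, dim=1)     # (B, steps, n_classes)

    The design intent, per the abstract, is to keep the encoder parallel over the whole observation while the decoder stays flexible in how many future steps it rolls out.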

    EgoGesture: A New Dataset and Benchmark for Egocentric Hand Gesture Recognition

    No full text

    Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos

    Full text link
    For weakly supervised anomaly detection, most existing work is limited by inadequate video representations stemming from an inability to model long-term contextual information. To solve this, we propose a novel weakly supervised adaptive graph convolutional network (WAGCN) to model the complex contextual relationships among video segments. In this way, we fully consider the influence of other video segments on the current one when generating the anomaly probability score for each segment. Firstly, we combine the temporal consistency and feature similarity of video segments to construct a global graph, which makes full use of the associations among the spatio-temporal features of anomalous events in videos. Secondly, we propose a graph learning layer that removes the limitation of manually set topology and extracts the graph adjacency matrix from data adaptively and effectively. Extensive experiments on two public datasets (i.e., the UCF-Crime and ShanghaiTech datasets) demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
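
    A minimal sketch of the two graph branches the abstract names: a feature-similarity adjacency and a temporal-proximity adjacency over video segments, combined and used for one graph-convolution step. The normalization and fusion weights here are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveGraphLayer(nn.Module):
        """Sketch: build a segment graph from feature similarity plus
        temporal distance, then apply one graph-convolution step."""
        def __init__(self, dim=512, sigma=1.0):
            super().__init__()
            self.proj = nn.Linear(dim, dim)
            self.sigma = sigma

        def forward(self, seg_feats):                 # (T, dim) segment features
            # Feature-similarity branch: row-normalized cosine similarity.
            f = F.normalize(seg_feats, dim=-1)
            sim = F.softmax(f @ f.t(), dim=-1)
            # Temporal-consistency branch: nearer segments get larger weights.
            idx = torch.arange(seg_feats.size(0), dtype=torch.float)
            dist = (idx[:, None] - idx[None, :]).abs()
            temp = F.softmax(-dist / self.sigma, dim=-1)
            adj = (sim + temp) / 2                    # fused adjacency (assumed equal weights)
            return F.relu(self.proj(adj @ seg_feats)) # one graph-convolution step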