A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation
Semi-supervised video anomaly detection (VAD) is a critical task in
intelligent surveillance systems. However, an essential type of anomaly in VAD,
the scene-dependent anomaly, has not received attention from researchers.
Moreover, there is no research investigating anomaly anticipation, a more
significant task for preventing the occurrence of anomalous events. To this
end, we propose a new comprehensive dataset, NWPU Campus, containing 43 scenes,
28 classes of abnormal events, and 16 hours of videos. At present, it is the
largest semi-supervised VAD dataset with the largest number of scenes and
classes of anomalies, the longest duration, and the only one considering the
scene-dependent anomaly. Meanwhile, it is also the first dataset proposed for
video anomaly anticipation. We further propose a novel model capable of
detecting and anticipating anomalous events simultaneously. Compared with 7
outstanding VAD algorithms from recent years, our method handles both
scene-dependent anomaly detection and anomaly anticipation well, consistently
achieving state-of-the-art performance on the ShanghaiTech, CUHK Avenue, IITB
Corridor, and newly proposed NWPU Campus datasets. Our dataset and code are
available at: https://campusvad.github.io.
Comment: CVPR 202
VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation
Egocentric action anticipation is a challenging task that aims to predict
future actions in advance from current and historical observations
in the first-person view. Most existing methods focus on improving the model
architecture and loss function based on the visual input and recurrent neural
network to boost the anticipation performance. However, these methods, which
merely consider visual information and rely on a single network architecture,
gradually reach a performance plateau. In order to fully understand what has
been observed and capture the dependencies between current observations and
future actions well enough, we propose a novel Transformer-GRU-based action
anticipation framework enhanced by visual-semantic fusion. Firstly,
high-level semantic information is introduced to improve the performance of
action anticipation for the first time. We propose to use the semantic features
generated based on the class labels or directly from the visual observations to
augment the original visual features. Secondly, an effective visual-semantic
fusion module is proposed to bridge the semantic gap and fully utilize the
complementarity of different modalities. Thirdly, to take advantage of both
parallel and autoregressive models, we design a Transformer-based encoder for
long-term sequential modeling and a GRU-based decoder for flexible iterative
decoding. Extensive experiments on two large-scale first-person view datasets,
i.e., EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of our proposed
method, which achieves new state-of-the-art performance, outperforming previous
approaches by a large margin.
Comment: 12 pages, 7 figures
Adaptive Graph Convolutional Networks for Weakly Supervised Anomaly Detection in Videos
For weakly supervised anomaly detection, most existing work suffers from
inadequate video representation because it cannot model
long-term contextual information. To solve this, we propose a novel weakly
supervised adaptive graph convolutional network (WAGCN) to model the complex
contextual relationships among video segments. In this way, we fully consider the
influence of other video segments on the current one when generating the
anomaly probability score for each segment. Firstly, we combine the temporal
consistency as well as feature similarity of video segments to construct a
global graph, which makes full use of the association information among
spatial-temporal features of anomalous events in videos. Secondly, we propose a
graph learning layer that removes the limitation of setting the topology
manually and extracts the graph adjacency matrix from the data adaptively and
effectively. Extensive experiments on two public datasets (i.e., the UCF-Crime
and ShanghaiTech datasets) demonstrate the effectiveness of our approach,
which achieves state-of-the-art performance.
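
The global graph described in the abstract above, which combines temporal consistency with feature similarity across video segments, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity, the exponential temporal kernel, the mixing weight `alpha`, and the function names are all assumptions chosen for illustration, and the learned graph layer is not modelled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_global_graph(features, alpha=0.5):
    """Build a row-normalised adjacency matrix over video segments.

    Assumption: edge weight = alpha * feature similarity
                            + (1 - alpha) * temporal closeness,
    where temporal closeness decays as exp(-|i - j|).
    """
    n = len(features)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Negative similarities are clipped so weights stay non-negative.
            similarity = max(0.0, cosine_similarity(features[i], features[j]))
            temporal = math.exp(-abs(i - j))
            adjacency[i][j] = alpha * similarity + (1.0 - alpha) * temporal
    # Normalise each row so a segment's incoming weights sum to 1.
    for row in adjacency:
        total = sum(row)
        if total > 0.0:
            for j in range(n):
                row[j] /= total
    return adjacency


def propagate(adjacency, features):
    """One graph-convolution-style step: each segment's new feature is the
    adjacency-weighted average of all segments' features (no learned weights)."""
    n = len(features)
    dim = len(features[0])
    return [
        [sum(adjacency[i][j] * features[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three hypothetical segment features; the first two are identical.
    segments = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    graph = build_global_graph(segments)
    print(propagate(graph, segments))
```

In this sketch, segments that are both close in time and similar in feature space influence each other most, which mirrors the abstract's idea of letting other segments inform the anomaly score of the current one.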