Video anomaly detection (VAD) remains a challenging task in the pattern
recognition community due to the ambiguity and diversity of abnormal events.
Existing deep learning-based VAD methods usually leverage proxy tasks to learn
normal patterns and flag instances that deviate from those patterns as
abnormal. However, most of them do not take full advantage of the
spatial-temporal correlations among video frames, which are critical for
understanding normal patterns. In this paper, we address unsupervised VAD by
learning the long- and short-term evolution regularity of appearance and
motion, thereby exploiting the spatial-temporal correlations among consecutive
frames in normal videos more adequately. Specifically, we propose to utilize
the spatiotemporal long short-term memory (ST-LSTM) to extract and memorize
spatial appearances and temporal variations in a unified memory cell.
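To make the unified-memory idea concrete, the following is a minimal PyTorch
sketch of an ST-LSTM cell in the PredRNN style; the class name, gate wiring,
and convolution settings are illustrative assumptions, not the paper's exact
formulation.

```python
import torch
import torch.nn as nn

class STLSTMCell(nn.Module):
    """Minimal ST-LSTM cell sketch: a temporal memory c and a spatiotemporal
    memory m are updated and fused within a single cell (assumed wiring)."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        p = k // 2
        # gates of the temporal branch, driven by the input x and hidden state h
        self.conv_xh = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=p)
        # gates of the spatiotemporal branch, driven by x and the memory m
        self.conv_xm = nn.Conv2d(in_ch + hid_ch, 3 * hid_ch, k, padding=p)
        # 1x1 convolution fusing both memories into the new hidden state
        self.fuse = nn.Conv2d(2 * hid_ch, hid_ch, 1)

    def forward(self, x, h, c, m):
        i, f, g, o = torch.chunk(self.conv_xh(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)     # temporal memory
        i2, f2, g2 = torch.chunk(self.conv_xm(torch.cat([x, m], dim=1)), 3, dim=1)
        m = torch.sigmoid(f2) * m + torch.sigmoid(i2) * torch.tanh(g2)  # spatiotemporal memory
        h = torch.sigmoid(o) * torch.tanh(self.fuse(torch.cat([c, m], dim=1)))
        return h, c, m
```

Applied frame by frame to a video clip, the cell updates both memories
jointly, so spatial appearance and temporal variation are captured in one
pass.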
In addition, inspired by generative adversarial networks, we introduce a
discriminator that performs adversarial learning with the ST-LSTM to enhance
its learning capability.
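A sketch of this adversarial component is given below, assuming a simple
convolutional patch discriminator with a binary cross-entropy objective and a
pixel-wise reconstruction term; the helper names and loss weighting are
hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical patch-style discriminator scoring frames as real or predicted.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, padding=1),  # per-patch real/fake logits
)
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(real_frame, pred_frame):
    # the discriminator learns to separate real frames from ST-LSTM predictions
    r, f = disc(real_frame), disc(pred_frame.detach())
    return bce(r, torch.ones_like(r)) + bce(f, torch.zeros_like(f))

def generator_loss(real_frame, pred_frame):
    # the ST-LSTM predictor is pushed to fool the discriminator while also
    # minimizing a pixel-wise reconstruction error
    f = disc(pred_frame)
    return bce(f, torch.ones_like(f)) + F.mse_loss(pred_frame, real_frame)
```

Alternating these two losses is meant to sharpen the predictions of normal
frames, so that abnormal frames deviate more visibly.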
Experimental results on standard benchmarks demonstrate the effectiveness of
spatial-temporal correlations for unsupervised VAD. Our method achieves
competitive performance compared to state-of-the-art methods, with AUCs of
96.7%, 87.8%, and 73.1% on the UCSD Ped2, CUHK Avenue, and
ShanghaiTech datasets, respectively.

Comment: This paper is accepted at the IEEE 26th International Conference on
Pattern Recognition (ICPR) 2022