3 research outputs found
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Anomalies are rare, so anomaly detection is often framed as
One-Class Classification (OCC), i.e. trained solely on normalcy. Leading OCC
techniques constrain the latent representations of normal motions to limited
volumes and detect as abnormal anything outside, which accounts satisfactorily
for the openset'ness of anomalies. But normalcy shares the same openset'ness
property, since humans can perform the same action in several ways, which the
leading techniques neglect. We propose a novel generative model for video
anomaly detection (VAD), which assumes that both normality and abnormality are
multimodal. We consider skeletal representations and leverage state-of-the-art
diffusion probabilistic models to generate multimodal future human poses. We
contribute a novel conditioning on the past motion of people, and exploit the
improved mode coverage capabilities of diffusion processes to generate
different-but-plausible future motions. Upon the statistical aggregation of
future modes, an anomaly is detected when the generated set of motions is not
pertinent to the actual future. We validate our model on 4 established
benchmarks: UBnormal, HR-UBnormal, HR-STC, and HR-Avenue, with extensive
experiments surpassing state-of-the-art results.
Comment: Accepted at ICCV202
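The scoring step described in the abstract above (comparing a set of generated futures against the actual future) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the generated futures are taken as given inputs (no diffusion model is run here), poses are assumed to be flattened lists of coordinates, and the function names `pose_error` and `anomaly_score`, the Euclidean error, and the min/mean/median aggregation choices are all hypothetical.

```python
import math
import statistics


def pose_error(generated, actual):
    """Euclidean distance between two flattened pose sequences (assumed metric)."""
    if len(generated) != len(actual):
        raise ValueError("pose sequences must have the same length")
    return math.sqrt(sum((g - a) ** 2 for g, a in zip(generated, actual)))


def anomaly_score(generated_futures, actual_future, aggregate="min"):
    """Aggregate the errors of several generated futures into one score.

    A high score means none (or few) of the generated futures resemble
    the future that was actually observed. The aggregation rule is an
    assumption made for this sketch.
    """
    errors = [pose_error(g, actual_future) for g in generated_futures]
    if not errors:
        raise ValueError("at least one generated future is required")
    if aggregate == "min":
        return min(errors)
    if aggregate == "mean":
        return statistics.fmean(errors)
    if aggregate == "median":
        return statistics.median(errors)
    raise ValueError(f"unknown aggregate: {aggregate!r}")
```

With the "min" rule, a motion is flagged only if every generated future is far from the observed one, which is one way to tolerate several different-but-plausible normal motions.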
Contracting Skeletal Kinematic Embeddings for Anomaly Detection
Detecting anomalies in human behavior is paramount to recognizing dangerous
situations in time, such as street fights or falls of elderly people. However,
anomaly detection is complex, since anomalous events are rare and because it is
an open set recognition task, i.e., what is anomalous at inference has not been
observed at training. We propose COSKAD, a novel model which encodes skeletal
human motion by an efficient graph convolutional network and learns to COntract
SKeletal kinematic embeddings onto a latent hypersphere of minimum volume for
Anomaly Detection. We propose and analyze three latent space designs for
COSKAD: the commonly-adopted Euclidean, and the new spherical-radial and
hyperbolic volumes. All three variants outperform the state-of-the-art,
including video-based techniques, on the ShanghaiTech Campus, the Avenue, and on
the most recent UBnormal dataset, for which we contribute novel skeleton
annotations and the selection of human-related videos. The source code and
dataset will be released upon acceptance.
Comment: Submitted to Pattern Recognition Journal
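The idea of contracting embeddings toward a small latent hypersphere, in its Euclidean form, could be illustrated roughly as below. This is a sketch under assumptions, not the COSKAD code: the embeddings are taken as plain lists of floats (no graph convolutional encoder is included), the center is assumed to be the mean of the normal embeddings, and the names `centroid`, `distance_score`, and `contraction_loss` are hypothetical. The spherical-radial and hyperbolic variants mentioned in the abstract are not covered.

```python
import math


def centroid(embeddings):
    """Mean of the embeddings, used here as the hypersphere center (assumption)."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def distance_score(embedding, center):
    """Euclidean distance to the center; larger means more anomalous."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(embedding, center)))


def contraction_loss(embeddings, center):
    """Mean squared distance to the center.

    Minimizing this over an encoder's parameters pulls normal embeddings
    toward the center, shrinking the volume they occupy.
    """
    return sum(distance_score(e, center) ** 2 for e in embeddings) / len(embeddings)
```

At inference, an embedding whose distance to the center exceeds a chosen threshold would be treated as anomalous; how that threshold is set is left open here.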