1,037 research outputs found
Deep Weakly-supervised Anomaly Detection
Anomaly detection is typically posited as an unsupervised learning task in
the literature due to the prohibitive cost and difficulty to obtain large-scale
labeled anomaly data, but this ignores the fact that a very small number
(e.g., a few dozen) of labeled anomalies can often be made available with
small/trivial cost in many real-world anomaly detection applications. To
leverage such labeled anomaly data, we study an important anomaly detection
problem termed weakly-supervised anomaly detection, in which, in addition to a
large amount of unlabeled data, a limited number of labeled anomalies are
available during modeling. Learning with the small labeled anomaly data enables
anomaly-informed modeling, which helps identify anomalies of interest and
address the notoriously high false-positive rates of unsupervised anomaly detection.
However, the problem is especially challenging, since (i) the limited amount of
labeled anomaly data often, if not always, cannot cover all types of anomalies
and (ii) the unlabeled data is often dominated by normal instances but has
anomaly contamination. We address the problem by formulating it as a pairwise
relation prediction task. Particularly, our approach defines a two-stream
ordinal regression neural network to learn the relation of randomly sampled
instance pairs, i.e., whether the instance pair contains two labeled anomalies,
one labeled anomaly, or just unlabeled data instances. The resulting model
effectively leverages both the labeled and unlabeled data to substantially
augment the training data and learn well-generalized representations of
normality and abnormality. Comprehensive empirical results on 40 real-world
datasets show that our approach (i) significantly outperforms four
state-of-the-art methods in detecting both known and previously unseen
anomalies and (ii) is substantially more data-efficient.
Comment: Theoretical results are refined and extended. Significantly more
empirical results are added, including results on detecting previously
unknown anomalies
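The pairwise formulation described in this abstract can be illustrated with a minimal sketch of the pair-sampling step only. It assumes instances are plain feature lists, and the names `PAIR_TARGETS`, `sample_pair`, and `sample_batch`, as well as the numeric targets 2.0, 1.0, and 0.0, are hypothetical choices for illustration; the abstract does not state them, and the two-stream network itself is not shown.

```python
import random

# Hypothetical ordinal targets for the three pair types named in the abstract.
# The actual values used by the authors are not given in the abstract.
PAIR_TARGETS = {
    "anomaly-anomaly": 2.0,
    "anomaly-unlabeled": 1.0,
    "unlabeled-unlabeled": 0.0,
}


def sample_pair(labeled_anomalies, unlabeled, rng):
    """Draw one random instance pair and its ordinal target."""
    pair_type = rng.choice(sorted(PAIR_TARGETS))
    if pair_type == "anomaly-anomaly":
        a, b = rng.choice(labeled_anomalies), rng.choice(labeled_anomalies)
    elif pair_type == "anomaly-unlabeled":
        a, b = rng.choice(labeled_anomalies), rng.choice(unlabeled)
    else:
        a, b = rng.choice(unlabeled), rng.choice(unlabeled)
    return a, b, PAIR_TARGETS[pair_type]


def sample_batch(labeled_anomalies, unlabeled, batch_size, seed=0):
    """Build a batch of (instance, instance, target) training triples."""
    rng = random.Random(seed)
    return [sample_pair(labeled_anomalies, unlabeled, rng) for _ in range(batch_size)]


if __name__ == "__main__":
    anomalies = [[9.0, 9.5], [8.7, 9.9]]
    unlabeled_data = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]]
    for a, b, target in sample_batch(anomalies, unlabeled_data, batch_size=4):
        print(a, b, target)
```

Because pairs are drawn from the combination of labeled and unlabeled instances, the number of distinct training pairs grows roughly quadratically with the data size, which is one way to read the abstract's claim that pairing augments the training data.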
AnoOnly: Semi-Supervised Anomaly Detection without Loss on Normal Data
Semi-supervised anomaly detection (SSAD) methods have demonstrated their
effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging
few-shot but instructive abnormal instances. However, the dominance of
homogeneous normal data over anomalies biases the SSAD models against
effectively perceiving anomalies. To address this issue and achieve balanced
supervision between heavily imbalanced normal and abnormal data, we develop a
novel framework called AnoOnly (Anomaly Only). Unlike existing SSAD methods
that resort to strict loss supervision, AnoOnly suspends it and introduces a
form of weak supervision for normal data. This weak supervision is instantiated
through the utilization of batch normalization, which implicitly performs
cluster learning on normal data. When integrated into existing SSAD methods,
the proposed AnoOnly demonstrates remarkable performance enhancements across
various models and datasets, achieving new state-of-the-art performance.
Additionally, our AnoOnly is natively robust to label noise when suffering from
data contamination. Our code is publicly available at
https://github.com/cool-xuan/AnoOnly.
Comment: Under review for NeurIPS202
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels
In this paper, we investigate the task of zero-shot human-object interaction
(HOI) detection, a novel paradigm for identifying HOIs without the need for
task-specific annotations. To address this challenging task, we employ CLIP, a
large-scale pre-trained vision-language model (VLM), for knowledge distillation
on multiple levels. Specifically, we design a multi-branch neural network that
leverages CLIP for learning HOI representations at various levels, including
global images, local union regions encompassing human-object pairs, and
individual instances of humans or objects. To train our model, CLIP is utilized
to generate HOI scores for both global images and local union regions that
serve as supervision signals. Extensive experiments demonstrate the
effectiveness of our novel multi-level CLIP knowledge integration strategy.
Notably, the model achieves strong performance, which is even comparable with
some fully-supervised and weakly-supervised methods on the public HICO-DET
benchmark.
NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation
Anomaly detection (AD) is essential in identifying rare and often critical
events in complex systems, finding applications in fields such as network
intrusion detection, financial fraud detection, and fault detection in
infrastructure and industrial systems. While AD is typically treated as an
unsupervised learning task due to the high cost of label annotation, it is more
practical to assume access to a small set of labeled anomaly samples from
domain experts, as is the case for semi-supervised anomaly detection.
Semi-supervised and supervised approaches can leverage such labeled data,
resulting in improved performance. In this paper, rather than proposing a new
semi-supervised or supervised approach for AD, we introduce a novel algorithm
for generating additional pseudo-anomalies on the basis of the limited labeled
anomalies and a large volume of unlabeled data. This serves as an augmentation
to facilitate the detection of new anomalies. Our proposed algorithm, named
Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information
from both labeled and unlabeled data to generate pseudo-anomalies. We compare
the performance of this novel algorithm with commonly applied augmentation
techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various
existing semi-supervised and supervised anomaly detection algorithms on the
original training data along with the generated pseudo-anomalies. Through
extensive experiments on 57 benchmark datasets in ADBench, reflecting different
data types, we demonstrate that NNG-Mix outperforms other data augmentation
methods. It yields significant performance improvements compared to the
baselines trained exclusively on the original training data. Notably, NNG-Mix
yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP
datasets in ADBench. Our source code will be available at
https://github.com/donghao51/NNG-Mix
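For context on the augmentation baselines mentioned in this abstract, the following is a minimal sketch of standard Mixup applied to feature vectors. It is not NNG-Mix, whose details are not given in the abstract. The helper names `mixup` and `make_pseudo_anomalies`, the default `alpha=0.2`, and the choice to mix pairs of labeled anomalies are all assumptions made for illustration.

```python
import random


def mixup(x_i, x_j, alpha, rng):
    """Standard Mixup: a convex combination with lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def make_pseudo_anomalies(labeled_anomalies, n_new, alpha=0.2, seed=0):
    """Generate n_new pseudo-anomalies by mixing random pairs of labeled anomalies.

    Mixing anomaly with anomaly is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    rng = random.Random(seed)
    pseudo = []
    for _ in range(n_new):
        x_i = rng.choice(labeled_anomalies)
        x_j = rng.choice(labeled_anomalies)
        pseudo.append(mixup(x_i, x_j, alpha, rng))
    return pseudo


if __name__ == "__main__":
    anomalies = [[0.0, 0.0], [1.0, 2.0]]
    for row in make_pseudo_anomalies(anomalies, n_new=3):
        print(row)
```

Each generated point lies on the line segment between two existing anomalies, so plain Mixup cannot place pseudo-anomalies outside the region already spanned by the labeled set.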
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Video anomaly detection (VAD) has received increasing attention due to its
potential applications. Its currently dominant tasks focus on detecting
anomalies online at the frame level, which can be roughly interpreted as binary
or multi-class event classification. However, such a setup, which builds
relationships between complicated anomalous events and single labels, e.g.,
"vandalism", is superficial, since single labels are insufficient to
characterize anomalous events. In reality, users tend to search for a specific
video rather than a series of approximate videos. Therefore, retrieving
anomalous events using detailed descriptions is practical and valuable, but
little research focuses on this. In this context, we propose a novel task called Video
Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant
anomalous videos across modalities, e.g., using language descriptions and
synchronous audio. Unlike current video retrieval, where videos are assumed
to be temporally well-trimmed with short duration, VAR is devised to retrieve
long untrimmed videos which may be partially relevant to the given query. To
achieve this, we present two large-scale VAR benchmarks, UCFCrime-AR and
XDViolence-AR, constructed on top of prevalent anomaly datasets. Meanwhile, we
design a model called Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we
propose an anomaly-led sampling to focus on key segments in long untrimmed
videos. Then, we introduce an efficient pretext task to enhance semantic
associations between video-text fine-grained representations. Besides, we
leverage two complementary alignments to further match cross-modal contents.
Experimental results on the two benchmarks reveal the challenges of the VAR task and
also demonstrate the advantages of our tailored method.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
Anomaly Crossing: New Horizons for Video Anomaly Detection as Cross-domain Few-shot Learning
Video anomaly detection aims to identify abnormal events that occur in
videos. Since anomalous events are relatively rare, it is not feasible to
collect a balanced dataset and train a binary classifier to solve the task.
Thus, most previous approaches learn only from normal videos using unsupervised
or semi-supervised methods. Obviously, they are limited in capturing and
utilizing discriminative abnormal characteristics, which leads to compromised
anomaly detection performance. In this paper, to address this issue, we propose
a new learning paradigm by making full use of both normal and abnormal videos
for video anomaly detection. In particular, we formulate a new learning task:
cross-domain few-shot anomaly detection, which can transfer knowledge learned
from numerous videos in the source domain to help solve few-shot abnormality
detection in the target domain. Concretely, we leverage self-supervised
training on the target normal videos to reduce the domain gap and devise a meta
context perception module to explore the video context of the event in the
few-shot setting. Our experiments show that our method significantly
outperforms baseline methods on DoTA and UCF-Crime datasets, and the new task
contributes to a more practical training paradigm for anomaly detection.