Anomaly detection (AD) aims to identify defective images and localize their
defects (if any). Ideally, AD models should detect defects across many image
classes without relying on hard-coded class names, which can be uninformative
or inconsistent across datasets; learn without anomaly supervision; and be
robust to the long-tailed distributions of real-world
applications. To address these challenges, we formulate the problem of
long-tailed AD by introducing several datasets with different levels of class
imbalance and metrics for performance evaluation. We then propose a novel
method, LTAD, to detect defects from multiple and long-tailed classes, without
relying on dataset class names. LTAD combines AD by reconstruction and semantic
AD modules. AD by reconstruction is implemented with a transformer-based
reconstruction module. Semantic AD is implemented with a binary classifier,
which relies on learned pseudo-class names and a pretrained foundation model.
These modules are learned over two phases. Phase 1 learns the pseudo-class
names and a variational autoencoder (VAE) for feature synthesis, which augments
the training data to counter the long tail. Phase 2 then learns the parameters of
the reconstruction and classification modules of LTAD. Extensive experiments
using the proposed long-tailed datasets show that LTAD substantially
outperforms the state-of-the-art methods for most forms of dataset imbalance.
The long-tailed dataset split is available at
https://zenodo.org/records/10854201.
Comment: This paper is accepted to CVPR 2024. The supplementary material is
included.
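
To make "different levels of class imbalance" concrete, the following minimal sketch computes per-class training-set sizes that decay exponentially from a head class to a tail class, a convention common in long-tailed recognition. The function name `long_tailed_counts`, the parameter `imbalance_factor`, and the exponential profile itself are assumptions made for illustration; they are not taken from the released split, which should be consulted for the actual class sizes.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Hypothetical helper: per-class sample counts with exponential decay.

    The head class keeps n_max samples and the tail class keeps roughly
    n_max / imbalance_factor samples. This profile is an illustrative
    assumption, not the construction used for the published datasets.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    if imbalance_factor < 1:
        raise ValueError("imbalance_factor must be at least 1")
    if num_classes == 1:
        return [n_max]

    counts = []
    for i in range(num_classes):
        # frac runs from 0 (head class) to 1 (tail class).
        frac = i / (num_classes - 1)
        size = n_max * imbalance_factor ** (-frac)
        # Keep at least one sample so that no class disappears entirely.
        counts.append(max(1, round(size)))
    return counts


if __name__ == "__main__":
    # Five classes, head class of 200 images, head-to-tail ratio of 100.
    print(long_tailed_counts(200, 5, 100))
```

A larger `imbalance_factor` gives a steeper tail, so sweeping it is one simple way to produce several datasets with different degrees of imbalance from the same balanced source.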