Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

Abstract

Removing out-of-distribution (OOD) images from noisy image collections scraped from the Internet is an important preprocessing step for constructing datasets, and it can be addressed by zero-shot OOD detection with vision-language foundation models such as CLIP. The existing zero-shot OOD detection setting, however, does not consider the realistic case where an image contains both in-distribution (ID) objects and OOD objects. Identifying such images as ID images is important when collecting images of rare classes or of ethically inappropriate classes that must not be missed. In this paper, we propose a novel problem setting called ID detection, where images containing ID objects are identified as ID images, even if they also contain OOD objects, and images lacking ID objects are identified as OOD images. To solve this problem, we present a new approach, Global-Local Maximum Concept Matching (GL-MCM), based on both global and local visual-text alignments of CLIP features, which can identify any image containing ID objects as an ID image. Extensive experiments demonstrate that GL-MCM outperforms comparison methods on both multi-object datasets and single-object ImageNet benchmarks.
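The abstract does not spell out the scoring rule, but the idea of combining global and local visual-text alignments can be illustrated with a minimal sketch. It assumes precomputed, L2-normalized CLIP features: one global image embedding, a set of per-patch (local) embeddings, and one text embedding per ID class. The function name, the temperature `tau`, and the equal weighting of the two terms are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def gl_mcm_score(global_feat, local_feats, text_feats, tau=0.01):
    """Sketch of a global-local maximum concept matching score.

    global_feat: (D,)   L2-normalized global CLIP image embedding
    local_feats: (P, D) L2-normalized per-patch CLIP embeddings
    text_feats:  (K, D) L2-normalized text embeddings of the K ID classes
    Returns a scalar; higher means "more likely to contain an ID object".
    """
    # Global term: maximum softmax over ID classes of the image-text
    # similarities (an MCM-style score on the whole image).
    global_logits = global_feat @ text_feats.T / tau    # (K,)
    s_global = F.softmax(global_logits, dim=-1).max()

    # Local term: score every patch against every ID class and keep the
    # single best patch-class match, so one ID object surrounded by OOD
    # objects is enough to raise the overall score.
    local_logits = local_feats @ text_feats.T / tau     # (P, K)
    s_local = F.softmax(local_logits, dim=-1).max()

    return (s_global + s_local).item()

# Toy usage with random features (D=512, P=49 patches, K=3 ID classes).
D, P, K = 512, 49, 3
g = F.normalize(torch.randn(D), dim=-1)
l = F.normalize(torch.randn(P, D), dim=-1)
t = F.normalize(torch.randn(K, D), dim=-1)
print(gl_mcm_score(g, l, t))
```

In such a scheme, an image would be flagged as ID when its score exceeds a threshold; the max over patches is what lets a single ID object dominate the decision even in a multi-object scene.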
