Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Gong, Liyu; Jacobs, Nathan; Sastry, Srikumar; Stylianou, Abby; Xing, Xin; Xiong, Zhexiao

Publication date: 24 October 2023

Abstract

This paper presents a novel approach to Single-Positive Multi-Label Learning (SPML). In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This contrasts with standard multi-class image classification, where the task is to predict a single label for an image from many possible labels. SPML specifically considers learning to predict multiple labels when the training data contains only a single annotation per image. Multi-label learning is in many ways a more realistic task than single-label learning, as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels because of the inherent complexity and cost of collecting multiple high-quality annotations for each instance. We propose Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and which outperforms the current state-of-the-art (SOTA) methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL
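To make the idea of positive and negative pseudo-labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an image embedding and one text embedding per class have already been produced by some vision-language model, scores each class by temperature-scaled cosine similarity, and thresholds the resulting softmax probabilities. The function name `pseudo_labels`, the thresholds `pos_thresh` and `neg_thresh`, and the `temperature` value are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pseudo_labels(image_emb, label_embs, observed,
                  pos_thresh=0.3, neg_thresh=0.01, temperature=0.1):
    """Return one entry per class: 1 (positive), 0 (negative), None (unknown).

    All thresholds and the temperature are illustrative assumptions,
    not values taken from the paper.
    """
    # Temperature-scaled similarities act as logits.
    logits = [cosine(image_emb, e) / temperature for e in label_embs]

    # Numerically stable softmax over the classes.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [x / total for x in exps]

    labels = []
    for k, p in enumerate(probs):
        if k == observed:
            labels.append(1)      # the single annotated positive is always kept
        elif p >= pos_thresh:
            labels.append(1)      # confident match: positive pseudo-label
        elif p <= neg_thresh:
            labels.append(0)      # confident mismatch: negative pseudo-label
        else:
            labels.append(None)   # uncertain: leave unlabeled
    return labels


# Toy 2-D embeddings standing in for real model outputs.
image = [1.0, 0.0]
classes = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
print(pseudo_labels(image, classes, observed=1))
```

In this toy setup the class at index 1 is the single observed annotation; the remaining classes become positive, negative, or stay unknown depending only on how closely their text embedding matches the image embedding.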

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.15985

Last time updated on 16/01/2024