Despite impressive progress, current multi-label image recognition
(MLR) algorithms depend heavily on large-scale datasets with complete labels,
and collecting such datasets is extremely time-consuming and
labor-intensive. Training multi-label image recognition models with partial
labels (MLR-PL) is an alternative, in which only some labels are known
while the others are unknown for each image. However, current MLR-PL algorithms
rely on pre-trained image similarity models or on iteratively updating the image
classification models to generate pseudo labels for the unknown labels. Thus,
they depend on a certain amount of annotations and inevitably suffer
marked performance drops, especially when the known label proportion is low.
To address this dilemma, we propose a dual-perspective semantic-aware
representation blending (DSRB) method that blends multi-granularity category-specific
semantic representations across different images, from the instance and prototype
perspectives respectively, to transfer information of known labels to complement
unknown labels. Specifically, an instance-perspective representation blending
(IPRB) module is designed to blend the representations of the known labels in
an image with the representations of the corresponding unknown labels in
another image to complement these unknown labels. Meanwhile, a
prototype-perspective representation blending (PPRB) module is introduced to
learn more stable representation prototypes for each category and to blend the
representations of unknown labels with the prototypes of the corresponding labels,
in a location-sensitive manner, to complement these unknown labels. Extensive
experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show
that the proposed DSRB consistently outperforms current state-of-the-art
algorithms on all known label proportion settings.

Comment: Technical Report. arXiv admin note: text overlap with
arXiv:2203.0217
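
The two blending perspectives described above can be illustrated with a minimal sketch. It is not the authors' implementation: the label encoding (1 = known positive, 0 = unknown, -1 = known negative), the fixed blend ratios `alpha` and `beta`, the function names, and the use of one feature vector per category are all assumptions made for illustration. The location-sensitive aspect of PPRB and the learning of the prototypes are omitted.

```python
import numpy as np


def blend_instance(feat_src, labels_src, feat_tgt, labels_tgt, alpha=0.5):
    """Instance-perspective sketch (hypothetical helper).

    For every category that is a known positive in the source image and
    unknown in the target image, mix the source representation into the
    target representation and mark the target label as positive.

    feat_*   : (C, D) arrays, one representation per category.
    labels_* : (C,) arrays with 1 = known positive, 0 = unknown,
               -1 = known negative (assumed encoding).
    """
    mask = (labels_src == 1) & (labels_tgt == 0)
    out_feat = feat_tgt.copy()
    out_labels = labels_tgt.copy()
    out_feat[mask] = alpha * feat_tgt[mask] + (1.0 - alpha) * feat_src[mask]
    out_labels[mask] = 1
    return out_feat, out_labels


def blend_prototype(feat_tgt, labels_tgt, prototypes, beta=0.5):
    """Prototype-perspective sketch (hypothetical helper).

    Mix the representation of every unknown category in the target image
    with a per-category prototype of shape (C, D). How the prototypes are
    learned, and how labels are updated, is left out here.
    """
    mask = labels_tgt == 0
    out_feat = feat_tgt.copy()
    out_feat[mask] = beta * feat_tgt[mask] + (1.0 - beta) * prototypes[mask]
    return out_feat


# Toy usage: 3 categories, 2-dimensional representations.
feat_a = np.ones((3, 2))
labels_a = np.array([1, 0, -1])
feat_b = np.zeros((3, 2))
labels_b = np.array([0, 0, 1])

blended_feat, blended_labels = blend_instance(feat_a, labels_a, feat_b, labels_b)
proto_feat = blend_prototype(feat_b, labels_b, prototypes=np.full((3, 2), 2.0))
```

In this toy setup only the first category is blended at the instance level, because it is the only one that is a known positive in the source image and unknown in the target image.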