1 research outputs found
Vision-language Assisted Attribute Learning
Attribute labeling at large scale is typically incomplete and partial, posing
significant challenges to model optimization. Existing attribute learning
methods often treat the missing labels as negative or simply ignore them all
during training, either of which could hamper the model performance to a great
extent. To overcome these limitations, in this paper we leverage the available
vision-language knowledge to explicitly disclose the missing labels for
enhancing model learning. Given an image, we predict the likelihood of each
missing attribute label assisted by an off-the-shelf vision-language model, and
randomly select to ignore those with high scores in training. Our strategy
strikes a good balance between fully ignoring and negatifying the missing
labels, as these high scores are found to be informative on revealing label
ambiguity. Extensive experiments show that our proposed vision-language
assisted loss can achieve state-of-the-art performance on the newly cleaned VAW
dataset. Qualitative evaluation demonstrates the ability of the proposed method
in predicting more complete attributes.Comment: Accepted by IEEE IC-NIDC 202