PARTICLE: Part Discovery and Contrastive Learning for Fine-grained Recognition
We develop techniques for refining representations for fine-grained
classification and segmentation tasks in a self-supervised manner. We find that
fine-tuning methods based on instance-discriminative contrastive learning are
not as effective, and posit that recognizing part-specific variations is
crucial for fine-grained categorization. We present an iterative learning
approach that incorporates part-centric equivariance and invariance objectives.
First, pixel representations are clustered to discover parts. We analyze the
representations from convolutional and vision transformer networks that are
best suited for this task. Then, a part-centric learning step aggregates and
contrasts representations of parts within an image. We show that this improves
the performance on image classification and part segmentation tasks across
datasets. For example, under a linear-evaluation scheme, the classification
accuracy of a ResNet50 trained on ImageNet using DetCon, a self-supervised
learning approach, improves from 35.4% to 42.0% on Caltech-UCSD Birds, from
35.5% to 44.1% on FGVC Aircraft, and from 29.7% to 37.4% on Stanford
Cars. We also observe significant gains in few-shot part segmentation tasks
using the proposed technique, while instance-discriminative learning was not as
effective. Smaller, yet consistent, improvements are also observed for stronger
networks based on transformers.
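
The two steps named in the abstract (clustering pixel representations to discover parts, then aggregating and contrasting part representations within an image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm or the loss, so plain k-means and an InfoNCE-style loss are assumptions here, and the names `kmeans`, `pool_parts`, and `part_contrastive_loss` are hypothetical. Random features stand in for the per-pixel output of a real backbone.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means over the rows of `features` (N, D); returns labels (N,)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct pixels (fancy indexing returns a copy).
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every pixel to every center: shape (N, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def pool_parts(features, labels, k):
    """Average the pixel features of each part; also report which parts are non-empty."""
    pooled = np.zeros((k, features.shape[1]))
    present = np.zeros(k, dtype=bool)
    for j in range(k):
        members = features[labels == j]
        if len(members):
            pooled[j] = members.mean(axis=0)
            present[j] = True
    return pooled, present


def part_contrastive_loss(parts_a, parts_b, temperature=0.1):
    """InfoNCE-style loss (an assumption): part j in view A should match
    part j in view B and differ from every other part of the same image."""
    a = parts_a / np.linalg.norm(parts_a, axis=1, keepdims=True)
    b = parts_b / np.linalg.norm(parts_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy usage: 16x16 "pixels" with 32-dim features, and a noisy copy as a second view.
rng = np.random.default_rng(0)
num_parts = 4
view_a = rng.normal(size=(16 * 16, 32))
view_b = view_a + 0.1 * rng.normal(size=view_a.shape)

part_labels = kmeans(view_a, num_parts)            # step 1: discover parts
pooled_a, present_a = pool_parts(view_a, part_labels, num_parts)
pooled_b, present_b = pool_parts(view_b, part_labels, num_parts)
keep = present_a & present_b                       # skip empty parts
loss = part_contrastive_loss(pooled_a[keep], pooled_b[keep])  # step 2: contrast parts
```

In this sketch both views share one part assignment, which stands in for transforming the part masks together with the image. In an iterative scheme such as the one the abstract describes, the loss would update the backbone and the clustering would then be repeated on the refined features.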