While many few-shot class-incremental learning (FSCIL) studies have been undertaken, achieving satisfactory performance, especially during incremental sessions, has remained challenging.
One prominent challenge is that the encoder, trained with an ample base session
training set, often underperforms in incremental sessions. In this study, we
introduce a novel training framework for FSCIL, capitalizing on the
generalizability of the Contrastive Language-Image Pre-training (CLIP) model to
unseen classes. We achieve this by formulating image-object-specific (IOS)
classifiers for the input images. Here, an IOS classifier refers to one that
targets specific attributes (like wings or wheels) of class objects rather than
the image's background. To create these IOS classifiers, we encode a bias
prompt into the classifiers using our specially designed module, which
harnesses key-prompt pairs to pinpoint the IOS features of classes in each
session. From an FSCIL standpoint, our framework is structured to retain
previous knowledge and swiftly adapt to new sessions without forgetting or
overfitting. To this end, we decide which modules to update in each session and apply several empirically identified techniques for fast convergence. Our approach consistently
demonstrates superior performance compared to state-of-the-art methods across
the miniImageNet, CIFAR100, and CUB200 datasets. Furthermore, we provide additional
experiments validating the learned model's ability to form IOS classifiers.
We also conduct ablation studies to analyze the impact of each module within
the architecture.