Enhancing Few-shot Image Classification with Cosine Transformer
This paper addresses the few-shot image classification problem, where the
classification task is performed on unlabeled query samples given a small
amount of labeled support samples only. One major challenge of few-shot
learning is the large variety of object visual appearances, which prevents
the few support samples from representing an object category comprehensively.
This can lead to a significant discrepancy between support and query samples,
undermining the performance of few-shot algorithms. In this paper, we
tackle the problem by proposing Few-shot Cosine Transformer (FS-CT), where the
relational map between supports and queries is effectively obtained for the
few-shot tasks. FS-CT consists of two parts: a learnable prototypical
embedding network that obtains categorical representations from support
samples, including hard cases, and a transformer encoder that computes the
relational map between support and query samples. We introduce
Cosine Attention, a more robust and stable attention module that
significantly enhances the transformer and thereby improves FS-CT accuracy
by 5% to over 20% compared to the default scaled dot-product mechanism.
Our method achieves competitive results on mini-ImageNet, CUB-200, and
CIFAR-FS in 1-shot and 5-shot learning tasks across backbones and
few-shot configurations. We also developed a custom few-shot dataset for Yoga
pose recognition to demonstrate the potential of our algorithm in practical
applications. Our FS-CT with cosine attention is a lightweight, simple
few-shot algorithm that can be applied to a wide range of applications, such
as healthcare, medicine, and security surveillance. The official implementation
code of our Few-shot Cosine Transformer is available at
https://github.com/vinuni-vishc/Few-Shot-Cosine-Transforme
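The abstract does not give the exact formulation, but the core idea of replacing scaled dot-product attention with a cosine-similarity-based score can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and details such as softmax usage and normalization constants in the actual FS-CT may differ.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V.
    # Unnormalized scores can grow with feature magnitude, which
    # can make training less stable in low-data regimes.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def cosine_attention(Q, K, V, eps=1e-8):
    # Hypothetical sketch: score queries against keys with cosine
    # similarity (L2-normalized dot product), so every attention
    # score is bounded in [-1, 1] regardless of feature magnitude.
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    scores = Qn @ Kn.T  # entry (i, j) is cos(q_i, k_j)
    return scores @ V
```

Bounding the scores is one plausible reason such an attention variant behaves more stably when support and query features differ widely in scale, as they can in few-shot settings.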
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
Recently, pre-trained point cloud models have found extensive applications in
downstream tasks like object classification. However, these tasks often
require full fine-tuning of the models, a storage-intensive procedure that
limits the practical use of pre-trained models. Inspired by the success of
visual prompt tuning (VPT) in vision, we explore prompt tuning, an efficient
alternative to full fine-tuning for large-scale models, as a way to adapt
pre-trained point cloud models while reducing storage costs.
However, it is non-trivial to apply the traditional static VPT to point clouds,
owing to the distribution diversity of point cloud data. For instance, the
scanned point clouds exhibit various types of missing or noisy points. To
address this issue, we propose an Instance-aware Dynamic Prompt Tuning (IDPT)
for point cloud pre-trained models, which utilizes a prompt module to perceive
the semantic prior features of each instance. This semantic prior facilitates
the learning of unique prompts for each instance, thus enabling downstream
tasks to robustly adapt to pre-trained point cloud models. Notably, extensive
experiments conducted on downstream tasks demonstrate that IDPT outperforms
full fine-tuning in most tasks with a mere 7% of the trainable parameters,
thus significantly reducing the storage burden. Code is available at
https://github.com/zyh16143998882/IDPT
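The core idea of conditioning prompts on each instance, rather than using a single static prompt as in VPT, can be sketched as follows. This is a hedged illustration under assumptions: the prompt-module architecture (here a single linear map on a pooled feature), the pooling choice, and all names are hypothetical, not IDPT's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration only.
d_model, n_prompts, n_tokens = 64, 4, 128

# The only trainable parameters in this sketch: a linear prompt
# module mapping a pooled instance feature to n_prompts prompt
# tokens. The backbone producing `tokens` stays frozen.
W_prompt = rng.normal(0.0, 0.02, size=(d_model, n_prompts * d_model))

def dynamic_prompts(tokens):
    # tokens: (n_tokens, d_model) features from the frozen backbone.
    # Pool them into a per-instance "semantic prior", then generate
    # prompts unique to this instance and prepend them.
    instance_feat = tokens.mean(axis=0)                      # (d_model,)
    prompts = (instance_feat @ W_prompt).reshape(n_prompts, d_model)
    return np.concatenate([prompts, tokens], axis=0)

tokens = rng.normal(size=(n_tokens, d_model))
out = dynamic_prompts(tokens)  # (n_prompts + n_tokens, d_model)
```

Because only the prompt module is stored per downstream task while the backbone is shared, the per-task storage cost is a small fraction of the full model, which is the storage saving the abstract describes.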