MetaNODE: Prototype Optimization as a Neural ODE for Few-Shot Learning
Few-Shot Learning (FSL) is a challenging task, i.e., how to recognize
novel classes with few examples? Pre-training based methods effectively tackle
the problem by pre-training a feature extractor and then predicting novel
classes via a cosine nearest neighbor classifier with mean-based prototypes.
Nevertheless, due to the data scarcity, the mean-based prototypes are usually
biased. In this paper, we attempt to diminish the prototype bias by regarding
it as a prototype optimization problem. To this end, we propose a novel
meta-learning based prototype optimization framework to rectify prototypes,
i.e., introducing a meta-optimizer to optimize prototypes. Although the
existing meta-optimizers can also be adapted to our framework, they all
overlook a crucial gradient bias issue, i.e., the mean-based gradient
estimation is also biased on sparse data. To address the issue, we regard the
gradient and its flow as meta-knowledge and then propose a novel Neural
Ordinary Differential Equation (ODE)-based meta-optimizer to polish prototypes,
called MetaNODE. In this meta-optimizer, we first view the mean-based
prototypes as initial prototypes, and then model the process of prototype
optimization as continuous-time dynamics specified by a Neural ODE. A gradient
flow inference network is carefully designed to learn to estimate the
continuous gradient flow for prototype dynamics. Finally, the optimal
prototypes can be obtained by solving the Neural ODE. Extensive experiments on
miniImagenet, tieredImagenet, and CUB-200-2011 show the effectiveness of our
method.
Comment: Accepted by AAAI 202
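The baseline that the MetaNODE abstract starts from, a cosine nearest-neighbor classifier over mean-based prototypes, can be sketched as follows. This is only an illustration of that baseline, not the MetaNODE meta-optimizer; the function names and the two-dimensional toy features are assumptions made for the example.

```python
import math

def mean_prototype(features):
    """Average a list of equal-length feature vectors into one prototype."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(query, prototypes):
    """Return the label whose prototype is most cosine-similar to the query."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))

# Toy 2-shot support set (hypothetical features, for illustration only).
support = {
    "cat": [[1.0, 0.1], [0.9, 0.3]],
    "dog": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = {label: mean_prototype(feats) for label, feats in support.items()}
prediction = classify([0.8, 0.2], prototypes)
```

With only two support samples per class, each prototype is a noisy estimate of the class mean, which is the prototype bias the abstract sets out to reduce.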
Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement
Although extensive research has been conducted on 3D point cloud
segmentation, effectively adapting generic models to novel categories remains a
formidable challenge. This paper proposes a novel approach to improve point
cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods
that directly utilize categorical information from support prototypes to
recognize novel classes in query samples, our method identifies two critical
aspects that substantially enhance model performance by reducing contextual
gaps between support prototypes and query features. Specifically, we (1) adapt
support background prototypes to match query context while removing extraneous
cues that may obscure foreground and background in query samples, and (2)
holistically rectify support prototypes under the guidance of query features to
emulate the latter having no semantic gap to the query targets. Our proposed
designs are agnostic to the feature extractor, rendering them readily
applicable to any prototype-based methods. The experimental results on S3DIS
and ScanNet demonstrate notable practical benefits, as our approach achieves
significant improvements while still maintaining high efficiency. The code for
our approach is available at
https://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Cloud-Segmentation-via-Query-Guided-Enhancement
Comment: Accepted to ACM MM 202
AdLER: Adversarial Training with Label Error Rectification for One-Shot Medical Image Segmentation
Accurate automatic segmentation of medical images typically requires large
datasets with high-quality annotations, making it less applicable in clinical
settings due to limited training data. One-shot segmentation based on learned
transformations (OSSLT) has shown promise when labeled data is extremely
limited, typically including unsupervised deformable registration, data
augmentation with learned registration, and segmentation learned from augmented
data. However, current one-shot segmentation methods are challenged by limited
data diversity during augmentation, and potential label errors caused by
imperfect registration. To address these issues, we propose a novel one-shot
medical image segmentation method with adversarial training and label error
rectification (AdLER), with the aim of improving the diversity of generated
data and correcting label errors to enhance segmentation performance.
Specifically, we implement a novel dual consistency constraint to ensure
anatomy-aligned registration that lessens registration errors. Furthermore, we
develop an adversarial training strategy to augment the atlas image, which
ensures both generation diversity and segmentation robustness. We also propose
to rectify potential label errors in the augmented atlas images by estimating
segmentation uncertainty, which can compensate for the imperfect nature of
deformable registration and improve segmentation authenticity. Experiments on
the CANDI and ABIDE datasets demonstrate that the proposed AdLER outperforms
previous state-of-the-art methods by 0.7% (CANDI), 3.6% (ABIDE "seen"), and
4.9% (ABIDE "unseen") in segmentation based on Dice scores, respectively. The
source code will be available at https://github.com/hsiangyuzhao/AdLER
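For readers unfamiliar with the metric behind the reported gains, the Dice score of two binary masks is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened masks; it is a generic illustration rather than the AdLER evaluation code, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical flattened masks: 3 foreground voxels each, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```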
Kernel Relative-prototype Spectral Filtering for Few-shot Learning
Few-shot learning performs classification tasks and regression tasks on
scarce samples. As one of the most representative few-shot learning models, the
Prototypical Network represents each class by its sample average, called a
prototype, and measures the similarity between samples and prototypes by
Euclidean distance. In this paper, we propose a framework of spectral filtering
(shrinkage) for measuring the difference between query samples and prototypes,
namely the relative prototypes, in a reproducing kernel Hilbert space (RKHS). In this
framework, we further propose a method utilizing Tikhonov regularization as the
filter function for few-shot classification. We conduct several experiments
with different kernels on the miniImageNet, tiered-ImageNet, and CIFAR-FS
datasets to verify our method. The experimental results show that the proposed
model achieves state-of-the-art performance and that the proposed shrinkage
method boosts performance. Source code is available at https://github.com/zhangtao2022/DSFN
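As a rough illustration of what a Tikhonov filter does to a spectrum, the sketch below applies the standard Tikhonov filter factor σ / (σ + λ) to a list of eigenvalues. It is not the paper's formulation: the function name, the example eigenvalues, and the choice of λ are assumptions, and the kernel and relative-prototype construction are omitted entirely.

```python
def tikhonov_filter_factors(eigenvalues, lam):
    """Shrink each non-negative eigenvalue s by the factor s / (s + lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [s / (s + lam) for s in eigenvalues]

# Hypothetical spectrum: large eigenvalues are kept almost intact,
# small ones (more likely to be noise) are shrunk toward zero.
factors = tikhonov_filter_factors([4.0, 1.0, 0.0], lam=1.0)
```

Larger values of `lam` shrink every direction more strongly, which is the sense in which spectral filtering acts as shrinkage.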
Continual Event Extraction with Semantic Confusion Rectification
We study continual event extraction, which aims to extract incessantly
emerging event information while avoiding forgetting. We observe that the
semantic confusion on event types stems from the annotations of the same text
being updated over time. The imbalance between event types even aggravates this
issue. This paper proposes a novel continual event extraction model with
semantic confusion rectification. We mark pseudo labels for each sentence to
alleviate semantic confusion. We transfer pivotal knowledge between current and
previous models to enhance the understanding of event types. Moreover, we
encourage the model to focus on the semantics of long-tailed event types by
leveraging other associated types. Experimental results show that our model
outperforms state-of-the-art baselines and is proficient in imbalanced
datasets.
Comment: Accepted in the 2023 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2023)
Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks
To reduce the reliance on large-scale datasets, recent works in 3D
segmentation resort to few-shot learning. Current 3D few-shot semantic
segmentation methods first pre-train the models on 'seen' classes, and then
evaluate their generalization performance on 'unseen' classes. However, the
prior pre-training stage not only introduces excessive time overhead, but also
incurs a significant domain gap on 'unseen' classes. To tackle these issues, we
propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and
a further training-based variant, TFS3D-T. Without any learnable parameters,
TFS3D extracts dense representations by trigonometric positional encodings, and
achieves comparable performance to previous training-based methods. Due to the
elimination of pre-training, TFS3D can alleviate the domain gap issue and save
a substantial amount of time. Building upon TFS3D, TFS3D-T only requires
training a lightweight query-support transferring attention (QUEST), which
enhances the interaction between the few-shot query and support data.
Experiments demonstrate TFS3D-T improves previous state-of-the-art methods by
+6.93% and +17.96% mIoU on S3DIS and ScanNet, respectively, while reducing the
training time by 90%, indicating superior effectiveness and efficiency.
Comment: Code is available at https://github.com/yangyangyang127/TFS3
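A parameter-free trigonometric positional encoding of a 3D point, the kind of building block the TFS3D abstract mentions, might look like the sketch below. It is a generic sinusoidal encoding, not the TFS3D encoder: the function name, the number of frequencies, and the power-of-two frequency schedule are all assumptions.

```python
import math

def trig_encode(point, num_freqs=4):
    """Encode each coordinate with sin/cos at geometrically spaced frequencies."""
    encoding = []
    for coord in point:
        for k in range(num_freqs):
            freq = 2.0 ** k  # assumed schedule: 1, 2, 4, 8, ...
            encoding.append(math.sin(freq * coord))
            encoding.append(math.cos(freq * coord))
    return encoding

# A 3D point maps to 3 coordinates * 4 frequencies * 2 functions = 24 values.
origin_code = trig_encode((0.0, 0.0, 0.0))
point_code = trig_encode((0.5, -1.0, 2.0))
```

Because the encoding has no learnable parameters, it can be computed for every point without any training.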
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
Zero-shot action recognition is the task of recognizing action classes
without visual examples, only with a semantic embedding which relates unseen to
seen classes. The problem can be seen as learning a function which generalizes
well to instances of unseen classes without losing discrimination between
classes. Neural networks can model the complex boundaries between visual
classes, which explains their success as supervised models. However, in
zero-shot learning, these highly specialized class boundaries may not transfer
well from seen to unseen classes. In this paper, we propose a clustering-based
model, which considers all training samples at once, instead of optimizing for
each instance individually. We optimize the clustering using Reinforcement
Learning, which we show is critical for our approach to work. We call the
proposed method CLASTER and observe that it consistently improves over the
state of the art on all standard datasets (UCF101, HMDB51, and Olympic Sports),
both in the standard zero-shot evaluation and the generalized zero-shot
learning
- …