A Closer Look at Few-shot Classification Again
Few-shot classification consists of a training phase where a model is learned
on a relatively large dataset and an adaptation phase where the learned model
is adapted to previously-unseen tasks with limited labeled samples. In this
paper, we empirically show that the training algorithm and the adaptation
algorithm can be completely disentangled, which allows algorithm analysis and
design to be done individually for each phase. Our meta-analysis for each phase
reveals several interesting insights that may help better understand key
aspects of few-shot classification and connections with other fields such as
visual representation learning and transfer learning. We hope the insights and
research challenges revealed in this paper can inspire future work in related
directions. Code and pre-trained models (in PyTorch) are available at
https://github.com/Frankluox/CloserLookAgainFewShot.
Comment: Accepted at ICML 202
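The disentangling claim above can be illustrated with a minimal sketch: train any feature extractor in the first phase, then plug it into an independent adaptation procedure such as a nearest-centroid classifier over the few labeled support samples. The function names and the nearest-centroid choice below are illustrative assumptions, not the paper's specific algorithms.

```python
import numpy as np

def adapt_nearest_centroid(embed, support_x, support_y, query_x):
    """Adaptation phase only: classify queries by their distance to class
    centroids computed from the few labeled support samples. The training
    phase is fully decoupled: `embed` can be any feature extractor."""
    feats = embed(support_x)
    classes = np.unique(support_y)
    centroids = np.stack([feats[support_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(embed(query_x)[:, None, :] - centroids[None, :, :],
                           axis=-1)
    return classes[dists.argmin(axis=1)]
```

Because `embed` is an opaque argument, any training-phase choice (supervised pretraining, self-supervised learning, meta-learning) can be analyzed separately from this adaptation step.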
A Closer Look at Few-Shot 3D Point Cloud Classification
In recent years, research on few-shot learning (FSL) has grown rapidly in the
2D image domain, owing to its lower requirement for labeled training data and
greater generalization to novel classes. However, its application to 3D point
cloud data remains relatively under-explored. Beyond having to distinguish
unseen classes as in the 2D domain, 3D FSL is more challenging due to
irregular structures, subtle inter-class differences, and high intra-class
variances when trained on limited data. Moreover, different architectures and
learning algorithms make it difficult to study the effectiveness of existing 2D
FSL algorithms when migrating to the 3D domain. In this work, for the first
time, we perform systematic and extensive investigations of directly applying
recent 2D FSL works to 3D point cloud related backbone networks and thus
suggest a strong learning baseline for few-shot 3D point cloud classification.
Furthermore, we propose a new network, Point-cloud Correlation Interaction
(PCIA), with three novel plug-and-play components called Salient-Part Fusion
(SPF) module, Self-Channel Interaction Plus (SCI+) module, and Cross-Instance
Fusion Plus (CIF+) module to obtain more representative embeddings and improve
the feature distinction. These modules can be inserted into most FSL algorithms
with minor changes and significantly improve the performance. Experimental
results on three benchmark datasets, ModelNet40-FS, ShapeNet70-FS, and
ScanObjectNN-FS, demonstrate that our method achieves state-of-the-art
performance for the 3D FSL task. Code and datasets are available at
https://github.com/cgye96/A_Closer_Look_At_3DFSL.
Comment: Accepted by IJCV 202
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL
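A common ZSL formulation, sketched below, projects visual features into a semantic space (e.g., class-attribute or word-embedding vectors) and picks the nearest class embedding; unseen classes need only a semantic vector, not training videos. This is a generic illustration under assumed names (`W`, `class_embeds`), not the paper's end-to-end 3D CNN training procedure.

```python
import numpy as np

def zero_shot_classify(video_feat, class_embeds, W):
    """Project a visual feature into the semantic space with a learned
    matrix W, then return the index of the most similar class embedding
    by cosine similarity. Classes unseen during training participate
    simply by providing their semantic embedding."""
    projected = video_feat @ W
    projected = projected / np.linalg.norm(projected)
    normed = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)
    return int(np.argmax(normed @ projected))
```

In the end-to-end setting the abstract describes, the feature extractor producing `video_feat` would itself be trained jointly rather than frozen.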
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Feature selection is essential for effective visual recognition. We propose
an efficient joint classifier learning and feature selection method that
discovers sparse, compact representations of input features from a vast sea of
candidates, with an almost unsupervised formulation. Our method requires only
one piece of knowledge, which we call the "feature sign": whether or not a
particular feature has, on average, stronger values over positive samples than
over negative ones. We show how this can be estimated using as few as a single
labeled training sample per class. Then, using these feature signs, we extend
an initial supervised learning problem into an (almost) unsupervised clustering
formulation that can incorporate new data without requiring ground truth
labels. Our method works both as a feature selection mechanism and as a fully
competitive classifier. It offers low computational cost and excellent
accuracy, especially in difficult cases with very limited training
data. We experiment on large-scale recognition in video and show superior speed
and performance relative to established feature selection approaches such as
AdaBoost, Lasso, and greedy forward-backward selection, and to powerful
classifiers such as SVM.
Comment: arXiv admin note: text overlap with arXiv:1411.771
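The feature-sign idea admits a short sketch: estimate each feature's sign by comparing a single labeled positive sample against the mean of background data, then score new samples by signed agreement. This is a simplified stand-in for the paper's joint classifier-learning and feature-selection formulation; all names below are illustrative.

```python
import numpy as np

def estimate_feature_signs(pos_sample, background):
    """Return +1 for features that are stronger in the (single) labeled
    positive sample than on average over background data, else -1."""
    return np.where(pos_sample > background.mean(axis=0), 1.0, -1.0)

def sign_agreement_score(signs, samples):
    """Rank samples by how strongly their features agree with the signs;
    a crude proxy for a classifier built from sign-selected features."""
    return (samples * signs).sum(axis=-1)
```

Because the scoring step needs no labels, new unlabeled data can be incorporated, mirroring the abstract's extension from supervised learning to an almost unsupervised clustering formulation.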