Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition
Learning from a few examples is a challenging task for machine learning.
While recent progress has been made on this problem, most existing
methods ignore the compositionality of visual concept representations (e.g.,
objects are built from parts or composed of semantic attributes), which is key
to the human ability to easily learn from a small number of examples. To
enhance the few-shot learning models with compositionality, in this paper we
present the simple yet powerful Compositional Feature Aggregation (CFA) module
as a weakly-supervised regularization for deep networks. Given the deep feature
maps extracted from the input, our CFA module first disentangles the feature
space into disjoint semantic subspaces that model different attributes, and
then bilinearly aggregates the local features within each of these subspaces.
CFA explicitly regularizes the representation with both semantic and spatial
compositionality to produce discriminative representations for few-shot
recognition tasks. Moreover, our method needs no supervision for attributes
or object parts during training, and can thus be conveniently plugged into
existing models for end-to-end optimization while keeping the model size and
computation cost nearly the same. Extensive experiments on few-shot image
classification and action recognition tasks demonstrate that our method
provides substantial improvements over recent state-of-the-art methods.
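As a rough illustration of the CFA idea, assuming a PyTorch setting: the sketch below splits the channels of a backbone feature map into disjoint subspaces and bilinearly aggregates the local features within each. The class name, subspace count, and shapes are illustrative assumptions, not the authors' released implementation.

    # Sketch of a CFA-style module (an interpretation of the abstract above).
    import torch
    import torch.nn as nn

    class CompositionalFeatureAggregation(nn.Module):
        def __init__(self, in_channels: int, num_subspaces: int):
            super().__init__()
            assert in_channels % num_subspaces == 0
            self.k = num_subspaces
            self.d = in_channels // num_subspaces  # channels per subspace

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, C, H, W) deep feature map from a backbone
            b, c, h, w = x.shape
            x = x.view(b, self.k, self.d, h * w)  # disentangle channels into subspaces
            # bilinear (second-order) aggregation of local features per subspace
            gram = torch.einsum('bkdn,bken->bkde', x, x) / (h * w)
            return gram.flatten(1)  # (B, K * d * d) compositional descriptor

    # usage: desc = CompositionalFeatureAggregation(256, 8)(backbone(images))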
Metric-Based Few-Shot Learning for Video Action Recognition
In the few-shot scenario, a learner must effectively generalize to unseen
classes given a small support set of labeled examples. While a relatively large
amount of research has gone into few-shot learning for image classification,
little work has been done on few-shot video classification. In this work, we
address the task of few-shot video action recognition with a set of two-stream
models. We evaluate convolutional and recurrent neural network video encoder
architectures used in conjunction with three
popular metric-based few-shot algorithms. We train and evaluate using a
few-shot split of the Kinetics 600 dataset. Our experiments confirm the
importance of the two-stream setup and find that prototypical networks and
pooled long short-term memory network embeddings give the best performance as
the few-shot method and video encoder, respectively. For a 5-shot 5-way task,
this setup obtains 84.2% accuracy on the test set and 59.4% on a special
"challenge" test set composed of highly confusable classes.
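A minimal sketch of the reported best combination, assuming per-frame CNN features for one stream are already extracted; the names and dimensions (PooledLSTMEncoder, feat_dim, hidden) are assumptions for illustration, not the paper's exact configuration.

    # Pooled-LSTM video encoder + prototypical-network classification (sketch).
    import torch
    import torch.nn as nn

    class PooledLSTMEncoder(nn.Module):
        def __init__(self, feat_dim: int = 512, hidden: int = 256):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (B, T, feat_dim) per-frame features from one stream
            out, _ = self.lstm(frames)
            return out.mean(dim=1)  # temporal pooling -> (B, hidden)

    def prototypical_logits(support, support_y, query, n_way):
        # support: (N*K, D) embeddings; support_y: (N*K,) labels in [0, n_way)
        protos = torch.stack([support[support_y == c].mean(0) for c in range(n_way)])
        return -torch.cdist(query, protos) ** 2  # (Q, n_way) neg. squared distances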
Prototype Completion with Primitive Knowledge for Few-Shot Learning
Few-shot learning is a challenging task that aims to learn a classifier for
novel classes from only a few examples. Pre-training based meta-learning methods
effectively tackle the problem by pre-training a feature extractor and then
fine-tuning it through the nearest centroid based meta-learning. However,
results show that the fine-tuning step yields only marginal improvements. In
this paper, 1) we identify the key reason, i.e., in the pre-trained feature
space, the base classes already form compact clusters while novel classes
spread as groups with large variances, which implies that fine-tuning the
feature extractor is less meaningful; 2) instead of fine-tuning the feature
extractor, we focus on estimating more representative prototypes during
meta-learning. Consequently, we propose a novel prototype completion based
meta-learning framework. This framework first introduces primitive knowledge
(i.e., class-level part or attribute annotations) and extracts representative
attribute features as priors. Then, we design a prototype completion network to
learn to complete prototypes with these priors. To avoid prototype completion
errors caused by noisy primitive knowledge or class differences, we further
develop a Gaussian-based prototype fusion strategy that combines the
mean-based and completed prototypes by exploiting unlabeled samples.
Extensive experiments show that our method (i) obtains more accurate
prototypes and (ii) outperforms state-of-the-art techniques by 2%-9% in
classification accuracy. Our code is available online.
Comment: Accepted by CVPR 2021