    Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition

    Learning from a few examples is a challenging task for machine learning. While recent progress has been made on this problem, most existing methods ignore the compositionality of visual concept representations (e.g., objects are built from parts or composed of semantic attributes), which is key to the human ability to learn easily from a small number of examples. To equip few-shot learning models with compositionality, in this paper we present the simple yet powerful Compositional Feature Aggregation (CFA) module as a weakly-supervised regularization for deep networks. Given the deep feature maps extracted from the input, our CFA module first disentangles the feature space into disjoint semantic subspaces that model different attributes, and then bilinearly aggregates the local features within each of these subspaces. CFA explicitly regularizes the representation with both semantic and spatial compositionality to produce discriminative representations for few-shot recognition tasks. Moreover, our method needs no supervision for attributes or object parts during training, and thus can be conveniently plugged into existing models for end-to-end optimization while keeping the model size and computation cost nearly the same. Extensive experiments on few-shot image classification and action recognition tasks demonstrate that our method provides substantial improvements over recent state-of-the-art methods.
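
    A minimal sketch of the two CFA steps described above, under loud assumptions: the "disjoint semantic subspaces" are approximated by a fixed split of the C channels into equal contiguous groups (in CFA the disentanglement is learned, which this sketch does not model), and "bilinear aggregation" is taken to be the average outer product of each group's local features over all spatial locations. The function name `grouped_bilinear_pool` and its parameters are hypothetical, and plain Python lists stand in for tensors to keep the example self-contained.

```python
from typing import List

Vector = List[float]


def grouped_bilinear_pool(features: List[Vector], num_groups: int) -> List[float]:
    """Hypothetical sketch: split channels into disjoint contiguous groups, then
    bilinearly pool (average outer product over locations) within each group.

    features: one C-dimensional local feature per spatial location (H*W rows).
    num_groups: number of disjoint channel subspaces; C must divide evenly.
    """
    if not features:
        raise ValueError("need at least one spatial location")
    channels = len(features[0])
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    group_size = channels // num_groups
    num_locations = len(features)

    pooled: List[float] = []
    for g in range(num_groups):
        lo, hi = g * group_size, (g + 1) * group_size
        # Accumulate the group_size x group_size outer product over all locations.
        acc = [[0.0] * group_size for _ in range(group_size)]
        for x in features:
            sub = x[lo:hi]
            for i in range(group_size):
                for j in range(group_size):
                    acc[i][j] += sub[i] * sub[j]
        # Average over locations and flatten row-major into the output descriptor.
        for row in acc:
            pooled.extend(v / num_locations for v in row)
    return pooled


# Two locations, four channels, two groups of two channels each.
print(grouped_bilinear_pool([[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]], 2))
```

    With C channels and G groups the descriptor has G·(C/G)² = C²/G entries, versus C² for bilinear pooling over the full channel space, which illustrates one way grouping keeps the aggregated representation compact.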

    Metric-Based Few-Shot Learning for Video Action Recognition

    In the few-shot scenario, a learner must generalize effectively to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done on few-shot video classification. In this work, we address the task of few-shot video action recognition with a set of two-stream models. We evaluate a set of convolutional and recurrent neural network video encoder architectures in conjunction with three popular metric-based few-shot algorithms. We train and evaluate on a few-shot split of the Kinetics 600 dataset. Our experiments confirm the importance of the two-stream setup, and find that prototypical networks and pooled long short-term memory (LSTM) network embeddings give the best performance as the few-shot method and the video encoder, respectively. For a 5-shot 5-way task, this setup obtains 84.2% accuracy on the test set and 59.4% on a special "challenge" test set composed of highly confusable classes.
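
    The prototypical-network classifier that the abstract reports as the best few-shot method can be sketched as follows: each class prototype is the mean of that class's support embeddings, and a query is assigned to the class whose prototype is nearest under squared Euclidean distance. The video encoder is not modeled; embeddings are assumed to be precomputed vectors (for instance, pooled LSTM outputs), and all function names here are illustrative rather than taken from the paper's code.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def class_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One prototype per class: the mean of its support-set embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Assign the query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: squared_euclidean(query, prototypes[label]))


# Toy 2-way 2-shot episode with 2-D embeddings (illustrative values only).
support = {"run": [[1.0, 0.0], [3.0, 0.0]], "jump": [[0.0, 2.0], [0.0, 4.0]]}
protos = class_prototypes(support)
print(protos, classify([1.5, 0.5], protos))
```

    The toy episode is 2-way 2-shot for brevity; a 5-shot 5-way episode works the same way with five classes of five support embeddings each.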

    Prototype Completion with Primitive Knowledge for Few-Shot Learning

    Few-shot learning is a challenging task that aims to learn a classifier for novel classes from only a few examples. Pre-training-based meta-learning methods tackle the problem effectively by pre-training a feature extractor and then fine-tuning it through nearest-centroid-based meta-learning. However, results show that the fine-tuning step yields only marginal improvements. In this paper, 1) we identify the key reason: in the pre-trained feature space, the base classes already form compact clusters, while novel classes spread as groups with large variances, which implies that fine-tuning the feature extractor is of limited use; 2) instead of fine-tuning the feature extractor, we focus on estimating more representative prototypes during meta-learning. To this end, we propose a novel prototype-completion-based meta-learning framework. This framework first introduces primitive knowledge (i.e., class-level part or attribute annotations) and extracts representative attribute features as priors. Then, we design a prototype completion network that learns to complete prototypes using these priors. To avoid prototype completion errors caused by noise in the primitive knowledge or by class differences, we further develop a Gaussian-based prototype fusion strategy that combines the mean-based and completed prototypes by exploiting the unlabeled samples. Extensive experiments show that our method (i) obtains more accurate prototypes and (ii) outperforms state-of-the-art techniques by 2% to 9% in classification accuracy. Our code is available online.
    Comment: Accepted by CVPR202
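
    The abstract does not spell out the Gaussian-based fusion rule, so the following is only a sketch of one plausible reading, not the paper's formulation: each prototype is treated as the mean of an isotropic Gaussian whose variance is estimated from unlabeled samples (assumed to be already associated with the class), and the two estimates are combined by precision weighting, as in the product of two Gaussians. The helpers `isotropic_variance` and `fuse_prototypes` and the `eps` parameter are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def isotropic_variance(center: Vector, samples: Sequence[Vector], eps: float = 1e-8) -> float:
    """Assumed isotropic variance: mean squared per-dimension deviation of the
    unlabeled samples from `center`, plus `eps` to avoid division by zero."""
    dim = len(center)
    total = sum((x - c) ** 2 for s in samples for x, c in zip(s, center))
    return total / (len(samples) * dim) + eps


def fuse_prototypes(mean_proto: Vector, completed_proto: Vector,
                    unlabeled: Sequence[Vector]) -> Vector:
    """Precision-weighted (product-of-Gaussians) fusion of two prototype estimates.

    Hypothetical sketch: the estimate whose Gaussian fits the unlabeled samples
    more tightly (smaller variance) receives the larger weight.
    """
    var_mean = isotropic_variance(mean_proto, unlabeled)
    var_comp = isotropic_variance(completed_proto, unlabeled)
    w_mean, w_comp = 1.0 / var_mean, 1.0 / var_comp
    return [(w_mean * m + w_comp * c) / (w_mean + w_comp)
            for m, c in zip(mean_proto, completed_proto)]


# The completed prototype [2, 0] fits these samples better than [0, 0] (variance 1 vs 3).
print(fuse_prototypes([0.0, 0.0], [2.0, 0.0], [[2.0, 0.0], [2.0, 2.0]]))
```

    The prototype that the unlabeled samples lie closer to receives the larger weight, so a completed prototype distorted by noisy priors would be pulled back toward the mean-based one, and vice versa.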