Few-Shot Image Recognition by Predicting Parameters from Activations
In this paper, we are interested in the few-shot learning problem. In
particular, we focus on a challenging scenario where the number of categories
is large and the number of examples per novel category is very limited, e.g. 1,
2, or 3. Motivated by the close relationship between the parameters and the
activations in a neural network associated with the same category, we propose a
novel method that can adapt a pre-trained neural network to novel categories by
directly predicting the parameters from the activations. No training is
required to adapt to novel categories, and fast inference is achieved with a
single forward pass. We evaluate our method on few-shot image recognition on
the ImageNet dataset, where it achieves state-of-the-art classification
accuracy on novel categories by a significant margin while maintaining
comparable performance on the large-scale categories. We also test our method
on the MiniImageNet dataset, where it strongly outperforms the previous
state-of-the-art methods.
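A minimal sketch of the core mechanism, assuming a PyTorch setup: a small
learned mapping (here a hypothetical two-layer MLP, not necessarily the
paper's architecture) turns the mean support-set activation of a novel
category into a classifier weight vector, which is appended to the
pre-trained weight matrix without any gradient steps.

    import torch
    import torch.nn as nn

    class ParamPredictor(nn.Module):
        """Maps a category's mean penultimate-layer activation to a
        classifier weight vector (illustrative stand-in for the paper's
        parameter-prediction network)."""
        def __init__(self, feat_dim):
            super().__init__()
            self.phi = nn.Sequential(
                nn.Linear(feat_dim, feat_dim),
                nn.ReLU(),
                nn.Linear(feat_dim, feat_dim),
            )

        def forward(self, support_feats):
            # support_feats: (n_shots, feat_dim) activations of one novel class.
            mean_act = support_feats.mean(dim=0)   # activation statistic
            return self.phi(mean_act)              # predicted weight vector

    feat_dim, n_base = 512, 1000
    base_weights = torch.randn(n_base, feat_dim)  # pre-trained classifier weights
    predictor = ParamPredictor(feat_dim)          # assumed trained on base classes
    novel_feats = torch.randn(3, feat_dim)        # activations of 3 support shots
    novel_w = predictor(novel_feats)              # single forward pass, no training
    weights = torch.cat([base_weights, novel_w[None]], dim=0)  # 1001-way classifier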
Learning feed-forward one-shot learners
One-shot learning is usually tackled by using generative models or
discriminative embeddings. Discriminative methods based on deep learning, which
are very effective in other learning scenarios, are ill-suited for one-shot
learning as they need large amounts of training data. In this paper, we propose
a method to learn the parameters of a deep model in one shot. We construct the
learner as a second deep network, called a learnet, which predicts the
parameters of a pupil network from a single exemplar. In this manner we obtain
an efficient feed-forward one-shot learner, trained end-to-end by minimizing a
one-shot classification objective in a learning-to-learn formulation. In order
to make the construction feasible, we propose a number of factorizations of the
parameters of the pupil network. We demonstrate encouraging results by learning
characters from single exemplars in Omniglot, and by tracking visual objects
from a single initial exemplar in the Visual Object Tracking benchmark.
Comment: The first three authors contributed equally, and are listed in
alphabetical order.
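A hedged sketch of the learnet idea in PyTorch: to keep parameter prediction
feasible, the learnet here emits only mixing coefficients over a small basis
of candidate filters (a rank-reducing factorization in the spirit of, though
not identical to, the paper's), and the weighted basis combination becomes
one conv layer of the pupil network. All layer sizes are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Learnet(nn.Module):
        def __init__(self, in_ch=3, out_ch=16, k=3, n_basis=8):
            super().__init__()
            # Basis of candidate pupil filters, trained end-to-end.
            self.basis = nn.Parameter(torch.randn(n_basis, out_ch, in_ch, k, k))
            # Embeds the exemplar into mixing coefficients over the basis, so
            # only n_basis numbers are predicted instead of full filter banks.
            self.encoder = nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_basis),
            )

        def forward(self, exemplar, query):
            # exemplar: (1, in_ch, H, W), a single example of the new class.
            coeffs = self.encoder(exemplar)                  # (1, n_basis)
            # Predicted pupil filters: coefficient-weighted basis combination.
            w = (coeffs.view(-1, 1, 1, 1, 1) * self.basis).sum(0)
            return F.conv2d(query, w, padding=1)             # pupil forward pass

    learnet = Learnet()
    z = torch.randn(1, 3, 32, 32)   # single exemplar
    x = torch.randn(4, 3, 32, 32)   # queries
    out = learnet(z, x)             # (4, 16, 32, 32)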
Low-Shot Learning with Imprinted Weights
Human vision is able to immediately recognize novel visual categories after
seeing just one or a few training examples. We describe how to add a similar
capability to ConvNet classifiers by directly setting the final layer weights
from novel training examples during low-shot learning. We call this process
weight imprinting as it directly sets weights for a new category based on an
appropriately scaled copy of the embedding layer activations for that training
example. The imprinting process provides a valuable complement to training with
stochastic gradient descent, as it provides immediate good classification
performance and an initialization for any further fine-tuning in the future. We
show how this imprinting process is related to proxy-based embeddings. However,
it differs in that only a single imprinted weight vector is learned for each
novel category, rather than relying on a nearest-neighbor distance to training
instances as typically used with embedding methods. Our experiments show that
averaging imprinted weights provides better generalization than using
nearest-neighbor instance embeddings.
Comment: CVPR 2018
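The imprinting step itself is simple enough to sketch directly (PyTorch;
function and variable names are illustrative): normalize the embedding of
each novel example, average across shots, renormalize, and install the result
as a new row of the final-layer weight matrix.

    import torch
    import torch.nn.functional as F

    def imprint(weights, embeddings):
        """weights: (n_classes, d) normalized classifier weights.
        embeddings: (n_shots, d) embedding activations of one novel class."""
        w_new = F.normalize(embeddings, dim=1).mean(dim=0)  # average the shots
        w_new = F.normalize(w_new, dim=0)                   # rescale to unit norm
        return torch.cat([weights, w_new[None]], dim=0)     # (n_classes + 1, d)

    base = F.normalize(torch.randn(100, 64), dim=1)  # pre-trained 100-way weights
    shots = torch.randn(5, 64)                       # 5 novel-class embeddings
    imprinted = imprint(base, shots)                 # immediate 101-way classifier

The imprinted row can then serve directly as a classifier for the novel
category, or as the initialization for any further fine-tuning.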
LCNN: Lookup-based Convolutional Neural Network
Porting state-of-the-art deep learning algorithms to resource-constrained
compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose
a fast, compact, and accurate model for convolutional neural networks that
enables efficient learning and inference. We introduce LCNN, a lookup-based
convolutional neural network that encodes convolutions by few lookups to a
dictionary that is trained to cover the space of weights in CNNs. Training LCNN
involves jointly learning a dictionary and a small set of linear combinations.
The size of the dictionary naturally traces a spectrum of trade-offs between
efficiency and accuracy. Our experimental results on ImageNet challenge show
that LCNN can offer a 3.2x speedup while achieving 55.1% top-1 accuracy with
the AlexNet architecture. Our fastest LCNN offers a 37.6x speedup over AlexNet
while maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speedups at
inference, but it also enables efficient training. In this paper, we show the
benefits of LCNN in few-shot learning and few-iteration learning, two crucial
aspects of on-device training of deep learning models.
Comment: CVPR 17
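A hedged sketch of a lookup-based convolution in this spirit (PyTorch; the
dictionary size, lookup count, and fixed random indices are simplifying
assumptions, whereas LCNN learns sparse lookups jointly with the dictionary):
each output filter is a small linear combination of shared dictionary entries.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LookupConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, k=3, dict_size=32, lookups=4):
            super().__init__()
            # Shared dictionary of filter "words" covering the weight space.
            self.dictionary = nn.Parameter(torch.randn(dict_size, in_ch, k, k))
            # Each output filter looks up `lookups` dictionary entries...
            self.register_buffer(
                "indices", torch.randint(dict_size, (out_ch, lookups)))
            # ...and combines them with learned coefficients.
            self.coeffs = nn.Parameter(torch.randn(out_ch, lookups))

        def forward(self, x):
            # Gathered entries: (out_ch, lookups, in_ch, k, k).
            entries = self.dictionary[self.indices]
            w = (self.coeffs[..., None, None, None] * entries).sum(dim=1)
            return F.conv2d(x, w, padding=1)

    layer = LookupConv2d(3, 16)
    y = layer(torch.randn(2, 3, 32, 32))   # (2, 16, 32, 32)

The efficiency-accuracy trade-off the abstract describes corresponds to
varying dict_size and lookups: fewer, smaller lookups mean faster inference
at some cost in accuracy.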
Learning Aligned Cross-Modal Representations from Weakly Aligned Data
People can recognize scenes across many different modalities beyond natural
images. In this paper, we investigate how to learn cross-modal scene
representations that transfer across modalities. To study this problem, we
introduce a new cross-modal scene dataset. While convolutional neural networks
can categorize cross-modal scenes well, they also learn an intermediate
representation not aligned across modalities, which is undesirable for
cross-modal transfer applications. We present methods to regularize cross-modal
convolutional neural networks so that they have a shared representation that is
agnostic of the modality. Our experiments suggest that our scene representation
can help transfer representations across modalities for retrieval. Moreover,
our visualizations suggest that units emerge in the shared representation that
tend to activate on consistent concepts independently of the modality.
Comment: Conference paper at CVPR 2016
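One simple way to picture such a regularizer (a hedged sketch, not the
paper's exact method): give each modality its own encoder into a shared
classifier, and add an alignment penalty that pulls paired cross-modal
features together so the shared representation becomes modality-agnostic.
All network shapes and the loss weight below are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Modality-specific encoders feeding a shared scene classifier.
    enc_photo = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    enc_sketch = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, 128))
    shared = nn.Sequential(nn.ReLU(), nn.Linear(128, 10))

    photos = torch.randn(8, 3, 32, 32)    # paired scenes in two modalities
    sketches = torch.randn(8, 1, 32, 32)
    labels = torch.randint(10, (8,))

    f_p, f_s = enc_photo(photos), enc_sketch(sketches)
    cls_loss = (F.cross_entropy(shared(f_p), labels)
                + F.cross_entropy(shared(f_s), labels))
    align_loss = F.mse_loss(f_p, f_s)     # push paired features together
    loss = cls_loss + 0.1 * align_loss    # 0.1 is an illustrative weight
    loss.backward()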