Gaussian Prototypical Networks for Few-Shot Learning on Omniglot
We propose a novel architecture for few-shot classification on the Omniglot
dataset. Building on prototypical networks, we extend their architecture to
what we call Gaussian prototypical networks. Prototypical networks learn a map
between images and embedding vectors, and use their clustering for
classification. In our model, a part of the encoder output is interpreted as a
confidence region estimate about the embedding point, and expressed as a
Gaussian covariance matrix. Our network then constructs a direction and class
dependent distance metric on the embedding space, using uncertainties of
individual data points as weights. We show that Gaussian prototypical networks
are a preferred architecture over vanilla prototypical networks with an
equivalent number of parameters. We report state-of-the-art performance in
1-shot and 5-shot classification in both the 5-way and 20-way regimes (for
5-shot 5-way, we are comparable to the previous state of the art) on the
Omniglot dataset.
We explore artificially down-sampling a fraction of images in the training set,
which improves our performance even further. We therefore hypothesize that
Gaussian prototypical networks might perform better in less homogeneous,
noisier datasets, which are commonplace in real-world applications.
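To make the mechanism concrete, here is a minimal PyTorch sketch of our reading of the abstract (not the authors' code): the encoder's extra outputs are mapped to a diagonal precision via a softplus, which weights both the class prototype and the distance metric. The softplus parameterisation and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_prototypes(embeddings, raw_conf, labels, n_classes):
    """embeddings: (N, D); raw_conf: (N, D) extra encoder outputs; labels: (N,)."""
    precision = F.softplus(raw_conf) + 1e-6           # diagonal precision per point
    protos, class_prec = [], []
    for c in range(n_classes):
        m = labels == c
        p = precision[m]                              # (n_c, D)
        protos.append((p * embeddings[m]).sum(0) / p.sum(0))  # precision-weighted mean
        class_prec.append(p.sum(0))                   # pooled class precision
    return torch.stack(protos), torch.stack(class_prec)

def gaussian_logits(queries, protos, class_prec):
    diff = queries[:, None, :] - protos[None, :, :]   # (Q, C, D)
    return -(class_prec[None] * diff ** 2).sum(-1)    # precision-weighted sq. distance
```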
Are Few-Shot Learning Benchmarks too Simple? Solving them without Task Supervision at Test-Time
We show that several popular few-shot learning benchmarks can be solved with
varying degrees of success without using support-set labels at test-time (LT).
To this end, we introduce a new baseline called Centroid Networks, a
modification of Prototypical Networks in which the support set labels are
hidden from the method at test-time and have to be recovered through
clustering. A benchmark that can be solved perfectly without LT does not
require proper task adaptation and is therefore inadequate for evaluating
few-shot methods. In practice, most benchmarks cannot be solved perfectly
without LT, but running our baseline on any new combination of architecture
and dataset gives insight into the baseline performance to be expected from
leveraging a good representation, before any adaptation to the test-time
labels.
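As an illustration of how such a baseline can be evaluated, the sketch below clusters support embeddings and then scores queries. The paper uses Sinkhorn K-Means; plain K-Means plus Hungarian matching is substituted here for brevity, and all names and shapes are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linear_sum_assignment

def centroid_baseline_accuracy(support_emb, support_y, query_emb, query_y, n_way):
    # 1) recover centroids from the support set *without* its labels
    centroids, assign = kmeans2(support_emb, n_way, minit="++")
    # 2) match clusters to classes with the hidden labels (evaluation only)
    cost = np.array([[-np.sum((assign == k) & (support_y == c))
                      for c in range(n_way)] for k in range(n_way)])
    _, cluster_to_class = linear_sum_assignment(cost)
    # 3) classify queries by nearest centroid, mapped through the matching
    d = ((query_emb[:, None, :] - centroids[None]) ** 2).sum(-1)
    pred = cluster_to_class[d.argmin(1)]
    return (pred == query_y).mean()
```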
Prototypical Networks for Few-shot Learning
We propose prototypical networks for the problem of few-shot classification,
where a classifier must generalize to new classes not seen in the training set,
given only a small number of examples of each new class. Prototypical networks
learn a metric space in which classification can be performed by computing
distances to prototype representations of each class. Compared to recent
approaches for few-shot learning, they reflect a simpler inductive bias that is
beneficial in this limited-data regime, and achieve excellent results. We
provide an analysis showing that some simple design decisions can yield
substantial improvements over recent approaches involving complicated
architectural choices and meta-learning. We further extend prototypical
networks to zero-shot learning and achieve state-of-the-art results on the
CU-Birds dataset.
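The core computation is compact; a minimal PyTorch sketch of one episode (the encoder stands for any embedding network, and shapes are illustrative):

```python
import torch

def proto_episode_logits(encoder, support_x, support_y, query_x, n_way):
    z_s, z_q = encoder(support_x), encoder(query_x)   # (N, D), (Q, D)
    # class prototype = mean embedding of that class's support examples
    protos = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])
    # logits = negative squared Euclidean distance to each prototype
    return -torch.cdist(z_q, protos) ** 2             # (Q, n_way)
```

Training then minimises the cross-entropy of these logits over many sampled episodes.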
MxML: Mixture of Meta-Learners for Few-Shot Classification
A meta-model is trained on a distribution of similar tasks such that it
learns an algorithm that can quickly adapt to a novel task with only a handful
of labeled examples. Most current meta-learning methods assume that the
meta-training set consists of relevant tasks sampled from a single
distribution. In practice, however, a new task is often out of the task
distribution, yielding a performance degradation. One way to tackle this
problem is to construct an ensemble of meta-learners such that each
meta-learner is trained on a different task distribution. In this paper we
present a method for constructing a mixture of meta-learners (MxML), where
mixing parameters are determined by a weight prediction network (WPN)
optimized to improve the few-shot classification performance. Experiments on
various datasets demonstrate that MxML significantly outperforms
state-of-the-art meta-learners and their naive ensembles on both
out-of-distribution and in-distribution tasks.
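A schematic sketch of the mixture, under assumptions of ours: the WPN consumes a task-level feature (e.g. a mean support embedding) and each meta-learner exposes a common (support, query) -> logits interface; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class MxML(nn.Module):
    def __init__(self, meta_learners, task_feat_dim):
        super().__init__()
        self.meta_learners = nn.ModuleList(meta_learners)
        # WPN: maps a task-level feature to one mixing weight per meta-learner
        self.wpn = nn.Sequential(nn.Linear(task_feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, len(meta_learners)))

    def forward(self, task_feat, support, query):
        w = torch.softmax(self.wpn(task_feat), dim=-1)     # (M,) mixing weights
        preds = [torch.softmax(m(support, query), dim=-1)  # each (Q, C)
                 for m in self.meta_learners]
        return sum(wi * p for wi, p in zip(w, preds))      # mixed class probabilities
```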
Data Augmentation Generative Adversarial Networks
Effective training of neural networks requires much data. In the low-data
regime, parameters are underdetermined, and learnt networks generalise poorly.
Data Augmentation alleviates this by using existing data more effectively.
However, standard data augmentation produces only limited plausible alternative
data. Given there is potential to generate a much broader set of augmentations,
we design and train a generative model to do data augmentation. The model,
based on image-conditional Generative Adversarial Networks, takes data from a
source domain and learns to generalise any given data item into other
within-class data items. As this generative process does not depend on
the classes themselves, it can be applied to novel unseen classes of data. We
show that a Data Augmentation Generative Adversarial Network (DAGAN) augments
standard vanilla classifiers well. We also show a DAGAN can enhance few-shot
learning systems such as Matching Networks. We demonstrate these approaches on
Omniglot, on EMNIST having learnt the DAGAN on Omniglot, and VGG-Face data. In
our experiments we see an increase in accuracy of over 13% in the low-data
regime on Omniglot (from 69% to 82%), with gains also on EMNIST (73.9% to 76%)
and VGG-Face (4.5% to 12%); with Matching Networks we observe an increase of
0.5% on Omniglot (from 96.9% to 97.4%) and of 1.8% on EMNIST (from 59.5% to 61.3%).
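Schematically, the augmentation interface could look like the sketch below; the G(image, noise) signature is an assumption of ours, and the network architectures are omitted entirely.

```python
import torch

def dagan_augment(generator, x, n_aug, z_dim):
    """Generate n_aug within-class variants of each image in batch x."""
    src = x.repeat_interleave(n_aug, dim=0)   # (B * n_aug, C, H, W) source images
    z = torch.randn(src.size(0), z_dim)       # fresh noise per generated variant
    return generator(src, z)                  # assumed G(image, noise) interface
```

Because the generator conditions only on a source image, not on a class label, the same trained model can augment entirely unseen classes.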
One-Way Prototypical Networks
Few-shot models have become a popular topic of research in recent years.
They offer the possibility to determine class belongings for unseen examples
using just a handful of examples for each class. Such models are trained on a
wide range of classes and their respective examples, learning a decision metric
in the process. Types of few-shot models include matching networks and
prototypical networks. We show a new way of training prototypical few-shot
models for just a single class. These models have the ability to predict the
likelihood of an unseen query belonging to a group of examples without any
given counterexamples. The difficulty here lies in the fact that no relative
distance to other classes can be calculated via softmax. We solve this problem
by introducing a "null class" centered around zero, and enforcing centering
with batch normalization. Trained on the commonly used Omniglot data set, we
obtain a classification accuracy of 0.98 on the matched test set, and of 0.8 on
unmatched MNIST data. On the more complex MiniImageNet data set, test accuracy
is 0.8. In addition, we propose a novel Gaussian layer for distance calculation
in a prototypical network, which takes the support examples' distribution
rather than just their centroid into account. This extension shows promising
results when a higher number of support examples is available.
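A minimal sketch of the decision rule as described: the softmax is taken over the distance to the class prototype and to a fixed null prototype at the origin (batch normalization keeps the embeddings centred).

```python
import torch

def one_way_logits(z_query, proto):
    """z_query: (Q, D) query embeddings; proto: (D,) class prototype."""
    d_class = ((z_query - proto) ** 2).sum(-1)       # distance to the class prototype
    d_null = (z_query ** 2).sum(-1)                  # distance to the null class at 0
    return torch.stack([-d_class, -d_null], dim=-1)  # 2-way logits for the softmax
```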
Meta Dropout: Learning to Perturb Features for Generalization
A machine learning model that generalizes well should obtain low errors on
unseen test examples. Thus, if we know how to optimally perturb training
examples to account for test examples, we may achieve better generalization
performance. However, obtaining such perturbation is not possible in standard
machine learning frameworks as the distribution of the test data is unknown. To
tackle this challenge, we propose a novel regularization method, meta-dropout,
which learns to perturb the latent features of training examples for
generalization in a meta-learning framework. Specifically, we meta-learn a
noise generator which outputs a multiplicative noise distribution for latent
features, to obtain low errors on the test instances in an input-dependent
manner. Then, the learned noise generator can perturb the training examples of
unseen tasks at the meta-test time for improved generalization. We validate our
method on few-shot classification datasets, whose results show that it
significantly improves the generalization performance of the base model, and
largely outperforms existing regularization methods such as information
bottleneck, manifold mixup, and information dropout.
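A simplified sketch of the perturbation itself; the MAML-style inner/outer meta-learning loops from the paper are omitted, and the Gaussian parameterisation below is an assumption.

```python
import torch
import torch.nn as nn

class MetaDropoutNoise(nn.Module):
    """Input-dependent multiplicative noise on latent features."""
    def __init__(self, feat_dim):
        super().__init__()
        self.to_logvar = nn.Linear(feat_dim, feat_dim)  # noise scale from the feature

    def forward(self, h):
        if not self.training:
            return h                              # no perturbation at evaluation time
        std = torch.exp(0.5 * self.to_logvar(h))  # input-dependent noise scale
        eps = torch.randn_like(h)
        return h * (1.0 + std * eps)              # multiplicative perturbation
```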
The Variational Homoencoder: Learning to learn high capacity generative models from few examples
Hierarchical Bayesian methods can unify many related tasks (e.g. k-shot
classification, conditional and unconditional generation) as inference within a
single generative model. However, when this generative model is expressed as a
powerful neural network such as a PixelCNN, we show that existing learning
techniques typically fail to effectively use latent variables. To address this,
we develop a modification of the Variational Autoencoder in which encoded
observations are decoded to new elements from the same class. This technique,
which we call a Variational Homoencoder (VHE), produces a hierarchical latent
variable model which better utilises latent variables. We use the VHE framework
to learn a hierarchical PixelCNN on the Omniglot dataset, which outperforms all
existing models on test set likelihood and achieves strong performance on
one-shot generation and classification tasks. We additionally validate the VHE
on natural images from the YouTube Faces database. Finally, we develop
extensions of the model that apply to richer dataset structures such as
factorial and hierarchical categories.
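The central trick can be sketched in a few lines, under assumed encoder/decoder interfaces: the encoder amortises over a class subsample, the decoder is assumed to return a log-likelihood, and kl_weight stands in for the paper's per-class KL rescaling.

```python
import torch

def vhe_loss(encoder, decoder, class_batch, kl_weight):
    """class_batch: examples from one class; the last one is the target."""
    D, x = class_batch[:-1], class_batch[-1:]   # subsample D vs. held-out element x
    mu, logvar = encoder(D)                     # q(z | D): one latent for the class
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    nll = -decoder(x, z)                        # assumed to return log p(x | z)
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
    return nll + kl_weight * kl                 # loss to minimise
```

Decoding to a *different* element of the same class is what forces class-level information into the latent variable, rather than letting the PixelCNN decoder ignore it.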
Learning to Support: Exploiting Structure Information in Support Sets for One-Shot Learning
Deep Learning shows very good performance when trained on large labeled data
sets. The problem of training a deep net on a few or one sample per class
requires a different learning approach which can generalize to unseen classes
using only a few representatives of these classes. This problem has previously
been approached by meta-learning. Here we propose a novel meta-learner which
shows state-of-the-art performance on common benchmarks for one/few shot
classification. Our model features three novel components. The first is a
feed-forward embedding that takes random class support samples (after a
customary CNN embedding) and transforms them into a better class representation
for the classification problem. The second is a novel attention mechanism,
inspired by competitive learning, which causes class representatives to compete
with each other to become a temporary class prototype with respect to the query
point. This mechanism allows switching between representatives depending on the
position of the query point. Once a prototype is chosen for each class, the
predicted label is computed using a simple attention mechanism over prototypes
of all considered classes. The third feature is the ability of our meta-learner
to incorporate deeper CNN embeddings, enabling larger capacity. Finally, to
ease the training procedure and reduce overfitting, we average the top models
(evaluated on the validation set) over the optimization trajectory. We show
that this approach can be viewed as an approximation to an ensemble, which
saves the ensemble-size factor in training and test times as well as in the
storage of the final model.
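A loose sketch of the competitive selection as we read it (not the authors' code): per class, the representatives compete via a query-conditioned softmax, and the winner-weighted sum acts as the temporary prototype.

```python
import torch

def competitive_prototype_logits(query, reps, tau=1.0):
    """query: (D,); reps: (C, K, D) per-class support representatives."""
    sim = (reps * query).sum(-1) / tau         # (C, K) affinity to the query
    attn = torch.softmax(sim, dim=-1)          # representatives compete per class
    protos = (attn[..., None] * reps).sum(1)   # (C, D) temporary prototypes
    return -((query - protos) ** 2).sum(-1)    # (C,) logits over classes
```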
Infinite Mixture Prototypes for Few-Shot Learning
We propose infinite mixture prototypes to adaptively represent both simple
and complex data distributions for few-shot learning. Our infinite mixture
prototypes represent each class by a set of clusters, unlike existing
prototypical methods that represent each class by a single cluster. By
inferring the number of clusters, infinite mixture prototypes interpolate
between nearest neighbor and prototypical representations, which improves
accuracy and robustness in the few-shot regime. We show the importance of
adaptive capacity for capturing complex data distributions such as alphabets,
with 25% absolute accuracy improvements over prototypical networks, while still
maintaining or improving accuracy on the standard Omniglot and mini-ImageNet
benchmarks. In clustering labeled and unlabeled data by the same clustering
rule, infinite mixture prototypes achieve state-of-the-art semi-supervised
accuracy. As a further capability, we show that infinite mixture prototypes can
perform purely unsupervised clustering, unlike existing prototypical methods.
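A bare-bones sketch of a cluster-growing rule in this spirit (DP-means-like; the paper's actual estimator and its learned threshold are not reproduced here):

```python
import torch

def grow_clusters(embeddings, lam):
    """Assign each point to an existing cluster mean or spawn a new one."""
    means, counts = [embeddings[0].clone()], [1]
    for z in embeddings[1:]:
        d = torch.stack([((z - m) ** 2).sum() for m in means])
        k = int(d.argmin())
        if d[k] > lam:
            means.append(z.clone())                           # too far: new cluster
            counts.append(1)
        else:
            counts[k] += 1
            means[k] = means[k] + (z - means[k]) / counts[k]  # running-mean update
    return torch.stack(means)
```

With a small threshold this recovers nearest-neighbor-like behaviour (many clusters per class), and with a large one it collapses to a single prototype per class, which is the interpolation the abstract describes.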