A Survey of Deep Meta-Learning
Deep neural networks can achieve great successes when presented with large
data sets and sufficient computational resources. However, their ability to
learn new concepts quickly is quite limited. Meta-learning is one approach to
address this issue, by enabling the network to learn how to learn. The exciting
field of Deep Meta-Learning advances at great speed, but lacks a unified,
insightful overview of current techniques. This work presents just that. After
providing the reader with a theoretical foundation, we investigate and
summarize key methods, which are categorized into i) metric-, ii) model-, and
iii) optimization-based techniques. In addition, we identify the main open
challenges, such as performance evaluations on heterogeneous benchmarks, and
reduction of the computational costs of meta-learning.
Comment: Extended version of book chapter in 'Metalearning: Applications to Automated Machine Learning and Data Mining' (2nd edition, forthcoming)
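To make the taxonomy above more concrete, the sketch below illustrates category i), a metric-based few-shot episode in the style of prototypical networks. The embedding network, episode shapes, and all names are illustrative assumptions for this listing, not a method taken from the survey itself.

```python
# Hedged sketch of a metric-based few-shot episode (prototypical-network style).
# The embedding module `embed` and the episode shapes are assumed for illustration.
import torch
import torch.nn.functional as F

def prototypical_episode(embed, support_x, support_y, query_x, n_classes):
    """Classify query examples by distance to class prototypes in embedding space."""
    z_support = embed(support_x)                       # [N_support, D]
    z_query = embed(query_x)                           # [N_query, D]
    # Class prototype = mean embedding of the support examples of that class.
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in range(n_classes)]
    )                                                  # [n_classes, D]
    # Negative squared Euclidean distance serves as the classification logit.
    logits = -torch.cdist(z_query, prototypes).pow(2)  # [N_query, n_classes]
    return F.log_softmax(logits, dim=-1)
```

Training such a model amounts to sampling episodes from the training tasks and minimizing the query cross-entropy, which is the "learning to learn" loop the survey describes.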
Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques
Deep neural networks can yield good performance on various tasks but often
require large amounts of data to train them. Meta-learning has received
considerable attention as one approach to improve the generalization of these
networks from a limited amount of data. Whilst meta-learning techniques have
been observed to be successful at this in various scenarios, recent results
suggest that when evaluated on tasks from a different data distribution than
the one used for training, a baseline that simply finetunes a pre-trained
network may be more effective than more sophisticated methods such as MAML,
one of the most popular meta-learning techniques. This
is surprising as the learning behaviour of MAML mimics that of finetuning: both
rely on re-using learned features. We investigate the observed performance
differences between finetuning, MAML, and another meta-learning technique
called Reptile, and show that MAML and Reptile specialize for fast adaptation
in low-data regimes on data distributions similar to the one used for training.
Our findings show that both the output layer and the noisy training conditions
induced by data scarcity play important roles in facilitating this
specialization for MAML. Lastly, we show that the pre-trained features as
obtained by the finetuning baseline are more diverse and discriminative than
those learned by MAML and Reptile. Due to this lack of diversity and
distribution specialization, MAML and Reptile may fail to generalize to
out-of-distribution tasks whereas finetuning can fall back on the diversity of
the learned features.
Comment: Accepted at Machine Learning Journal, Special Issue on Discovery Science 202
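As a rough illustration of why MAML's learning behaviour resembles finetuning, the sketch below contrasts a MAML-style inner-loop update with a plain finetuning step on a pre-trained network. The model, support set, and learning rates are assumed names for illustration; this is a minimal sketch, not the paper's implementation.

```python
# Minimal sketch contrasting MAML-style adaptation with plain finetuning.
# All names (model, support_x, support_y, learning rates) are illustrative assumptions.
import torch
import torch.nn.functional as F

def maml_inner_step(model, support_x, support_y, inner_lr=0.01):
    """One MAML inner-loop step: differentiate through the update so a
    meta-objective can later backpropagate into the initialization."""
    params = dict(model.named_parameters())
    preds = torch.func.functional_call(model, params, (support_x,))
    loss = F.cross_entropy(preds, support_y)
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
    return {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

def finetune_step(model, support_x, support_y, lr=0.01):
    """One ordinary finetuning step: re-use the pre-trained features and take
    a plain gradient step, without differentiating through the update."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = F.cross_entropy(model(support_x), support_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Both routines re-use learned features and take gradient steps on the support loss; the difference lies in what is optimized across tasks, which is where the specialization discussed above can arise.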
Subspace Adaptation Prior for Few-Shot Learning
Gradient-based meta-learning techniques aim to distill useful prior knowledge
from a set of training tasks such that new tasks can be learned more
efficiently with gradient descent. While these methods have achieved successes
in various scenarios, they commonly adapt all parameters of the trainable layers
when learning new tasks. This neglects potentially more efficient learning
strategies for a given task distribution and can make the model susceptible to
overfitting, especially in few-shot learning, where tasks must be learned from a limited
number of examples. To address these issues, we propose Subspace Adaptation
Prior (SAP), a novel gradient-based meta-learning algorithm that jointly learns
good initialization parameters (prior knowledge) and layer-wise parameter
subspaces in the form of operation subsets that should be adaptable. In this
way, SAP can learn which operation subsets to adjust with gradient descent
based on the underlying task distribution, simultaneously decreasing the risk
of overfitting when learning new tasks. We demonstrate that this ability is
helpful as SAP yields superior or competitive performance in few-shot image
classification settings (gains between 0.1% and 3.9% in accuracy). Analysis of
the learned subspaces demonstrates that low-dimensional operations often yield
high activation strengths, indicating that they may be important for achieving
good few-shot learning performance. For reproducibility purposes, we publish
all our research code publicly.
Comment: Accepted at Machine Learning Journal, Special Issue of the ECML PKDD 2023 Journal Track
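To give a flavour of what adapting only a learned parameter subset might look like, the sketch below restricts an inner-loop gradient update to a masked portion of a layer's weights. The binary mask, layer, and data names are illustrative assumptions, not SAP's actual subspace parameterization.

```python
# Hedged sketch: adapt only a selected subset of a layer's parameters with
# gradient descent, leaving the rest at their meta-learned initialization.
# The mask and all names are illustrative assumptions, not SAP itself.
import torch
import torch.nn.functional as F

def masked_inner_step(layer, mask, x, y, inner_lr=0.1):
    """Take one gradient step on layer.weight, but only where mask == 1."""
    loss = F.cross_entropy(layer(x), y)
    (grad,) = torch.autograd.grad(loss, [layer.weight])
    with torch.no_grad():
        layer.weight -= inner_lr * mask * grad  # masked-out entries stay frozen
    return loss.item()

# Usage: adapt only a hypothetical low-dimensional subset of the input weights.
layer = torch.nn.Linear(16, 5)
mask = torch.zeros_like(layer.weight)
mask[:, :4] = 1.0  # hypothetical adaptable subspace: first four input dimensions
x, y = torch.randn(8, 16), torch.randint(0, 5, (8,))
masked_inner_step(layer, mask, x, y)
```

In SAP the choice of which subsets are adaptable is itself meta-learned across tasks, rather than fixed by hand as in this toy example.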
Are LSTMs Good Few-Shot Learners?
Deep learning requires large amounts of data to learn new tasks well,
restricting its applicability to domains where such data is available.
Meta-learning overcomes this limitation by learning how to learn. In 2001,
Hochreiter et al. showed that an LSTM trained with backpropagation across
different tasks is capable of meta-learning. Despite promising results of this
approach on small problems, and more recently, also on reinforcement learning
problems, the approach has received little attention in the supervised few-shot
learning setting. We revisit this approach and test it on modern few-shot
learning benchmarks. We find that LSTMs, surprisingly, outperform the popular
meta-learning technique MAML on a simple few-shot sine wave regression
benchmark, but that LSTMs, as expected, fall short on more complex few-shot image
classification benchmarks. We identify two potential causes and propose a new
method called Outer Product LSTM (OP-LSTM) that resolves these issues and
displays substantial performance gains over the plain LSTM. Compared to popular
meta-learning baselines, OP-LSTM yields competitive performance on
within-domain few-shot image classification, and performs better in
cross-domain settings by 0.5% to 1.9% in accuracy score. While these results
alone do not set a new state-of-the-art, the advances of OP-LSTM are orthogonal
to other advances in the field of meta-learning and yield new insights into how
LSTMs work in image classification, opening up a whole range of new research
directions. For reproducibility purposes, we publish all our research code
publicly.
Comment: Accepted at Machine Learning Journal, Special Issue of the ECML PKDD 2023 Journal Track
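To illustrate the basic recipe of LSTM-based meta-learning described above, the sketch below feeds a few-shot regression task to an LSTM as a sequence of (input, target) pairs and reads off its prediction for query inputs. The dimensions, class, and names are illustrative assumptions; this is not the OP-LSTM architecture.

```python
# Hedged sketch of LSTM-based meta-learning on a toy regression task:
# the support set is fed as a sequence of (x, y) pairs, queries as (x, 0),
# and the LSTM's hidden state acts as the task-adapted "learner".
# Shapes and names are illustrative assumptions, not OP-LSTM.
import torch
import torch.nn as nn

class LSTMMetaLearner(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, support_x, support_y, query_x):
        # Concatenate input and target so the LSTM can absorb the support set.
        support = torch.cat([support_x, support_y], dim=-1)              # [B, K, 2]
        query = torch.cat([query_x, torch.zeros_like(query_x)], dim=-1)  # [B, Q, 2]
        seq = torch.cat([support, query], dim=1)                         # [B, K+Q, 2]
        out, _ = self.lstm(seq)
        return self.head(out[:, support_x.size(1):])                     # query predictions

# Usage on a toy sine-wave task: 1 task, 5 support points, 3 query points.
model = LSTMMetaLearner()
xs = torch.rand(1, 5, 1) * 6 - 3
ys = torch.sin(xs)
xq = torch.rand(1, 3, 1) * 6 - 3
preds = model(xs, ys, xq)  # shape [1, 3, 1]
```

Meta-training would minimize the query loss across many such tasks by backpropagating through the LSTM, which is the Hochreiter et al. recipe revisited in the paper.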
The role of statistical learning in the acquisition of motion event construal in a second language
Learning to talk about motion in a second language is very difficult because it involves restructuring deeply entrenched patterns from the first language (Slobin 1996). In this paper we argue that statistical learning (Saffran et al. 1997) can explain why L2 learners are only partially successful in restructuring their second language grammars. We explore to what extent L2 learners make use of two mechanisms of statistical learning, entrenchment and pre-emption (Boyd and Goldberg 2011), to acquire target-like expressions of motion and retreat from overgeneralisation in this domain. Paying attention to the frequency of existing patterns in the input can help learners to adjust the frequency with which they use path and manner verbs in French, but it is insufficient to acquire the boundary-crossing constraint (Slobin and Hoiting 1994) and learn what not to say. We also look at the role of language proficiency and exposure to French in explaining the findings.
The ATLAS Level-1 muon topological trigger information for run 2 of the LHC
For run 2 of the LHC, the ATLAS Level-1 trigger system will include topological information on trigger objects in order to cope with the increased trigger rates. The existing Muon-to-Central-Trigger-Processor interface (MUCTPI) has been modified in order to provide coarse-grained topological information on muon candidates. A MUCTPI-to-Level-1-Topological-Processor interface (MuCTPiToTopo) has been developed to receive the electrical information and to send it optically to the Level-1 Topological Processor (L1TOPO). This poster will describe the different modules mentioned above and present results of the functionality and connection tests performed.