10,257 research outputs found
Meta-Learning by the Baldwin Effect
The scope of the Baldwin effect was recently called into question by two
papers that closely examined the seminal work of Hinton and Nowlan. To this
date there has been no demonstration of its necessity in empirically
challenging tasks. Here we show that the Baldwin effect is capable of evolving
few-shot supervised and reinforcement learning mechanisms, by shaping the
hyperparameters and the initial parameters of deep learning algorithms.
Furthermore it can genetically accommodate strong learning biases on the same
set of problems as a recent machine learning algorithm called MAML "Model
Agnostic Meta-Learning" which uses second-order gradients instead of evolution
to learn a set of reference parameters (initial weights) that can allow rapid
adaptation to tasks sampled from a distribution. Whilst in simple cases MAML is
more data efficient than the Baldwin effect, the Baldwin effect is more general
in that it does not require gradients to be backpropagated to the reference
parameters or hyperparameters, and permits effectively any number of gradient
updates in the inner loop. The Baldwin effect learns strong learning dependent
biases, rather than purely genetically accommodating fixed behaviours in a
learning independent manner
Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation
The success of deep learning methods hinges on the availability of large
training datasets annotated for the task of interest. In contrast to human
intelligence, these methods lack versatility and struggle to learn and adapt
quickly to new tasks, where labeled data is scarce. Meta-learning aims to solve
this problem by training a model on a large number of few-shot tasks, with an
objective to learn new tasks quickly from a small number of examples. In this
paper, we propose a meta-learning framework for few-shot word sense
disambiguation (WSD), where the goal is to learn to disambiguate unseen words
from only a few labeled instances. Meta-learning approaches have so far been
typically tested in an -way, -shot classification setting where each task
has classes with examples per class. Owing to its nature, WSD deviates
from this controlled setup and requires the models to handle a large number of
highly unbalanced classes. We extend several popular meta-learning approaches
to this scenario, and analyze their strengths and weaknesses in this new
challenging setting.Comment: Added additional experiment
- …