Meta-Learning for Low-Resource Neural Machine Translation
In this paper, we propose to extend the recently introduced model-agnostic
meta-learning algorithm (MAML) for low-resource neural machine translation
(NMT). We frame low-resource translation as a meta-learning problem, and we
learn to adapt to low-resource languages based on multilingual high-resource
language tasks. We use the universal lexical
representation (Gu et al., 2018) to overcome the input-output mismatch
across different languages. We evaluate the proposed meta-learning strategy
using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt,
Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro,
Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach
significantly outperforms the multilingual, transfer learning based
approach (Zoph et al., 2016) and enables us to train a competitive NMT
system with only a fraction of training examples. For instance, the proposed
approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing
only 16,000 translated words (~600 parallel sentences).
Comment: Accepted as a full paper at EMNLP 2018
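The strategy described above builds on MAML's two-level optimisation: adapt a copy of the parameters on a task's support set, then update the original parameters so that this adaptation generalises to held-out query data. Below is a minimal PyTorch sketch of one such meta-step; the `loss_fn` callable and the `(support, query)` task batches are hypothetical stand-ins for the paper's NMT loss and language tasks, not the authors' actual code.

```python
import torch

def maml_meta_step(model, meta_opt, loss_fn, source_tasks,
                   inner_lr=1e-3, inner_steps=1):
    """One meta-update over a batch of high-resource source-language tasks.

    Each task is a (support, query) pair of batches; loss_fn(params, batch)
    is a hypothetical functional NMT loss evaluated with the given parameters.
    """
    meta_opt.zero_grad()
    for support, query in source_tasks:
        # Inner loop: adapt a differentiable copy of the parameters
        # on the task's support set.
        fast = {n: p.clone() for n, p in model.named_parameters()}
        for _ in range(inner_steps):
            grads = torch.autograd.grad(
                loss_fn(fast, support), list(fast.values()), create_graph=True)
            fast = {n: p - inner_lr * g
                    for (n, p), g in zip(fast.items(), grads)}
        # Outer loop: the query loss of the adapted parameters backpropagates
        # through the inner update into the original parameters.
        loss_fn(fast, query).backward()
    meta_opt.step()
```

Note that `create_graph=True` lets gradients of the query loss flow back through the inner update itself, which is what distinguishes MAML from plain multilingual pre-training followed by fine-tuning.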
Zero-Shot Cross-Lingual Transfer with Meta Learning
Learning what to share between tasks has been a topic of great importance
recently, as strategic sharing of knowledge has been shown to improve
downstream task performance. This is particularly important for multilingual
applications, as most languages in the world are under-resourced. Here, we
consider the setting of training models on multiple different languages at the
same time, when little or no data is available for languages other than
English. We show that this challenging setup can be approached using
meta-learning, where, in addition to training a source language model, another
model learns to select which training instances are the most beneficial to the
first. We experiment using standard supervised, zero-shot cross-lingual, as
well as few-shot cross-lingual settings for different natural language
understanding tasks (natural language inference, question answering). Our
extensive experimental setup demonstrates the consistent effectiveness of
meta-learning for a total of 15 languages. We improve upon the state-of-the-art
for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA
dataset). A comprehensive error analysis indicates that the correlation of
typological features between languages can partly explain when parameter
sharing learned via meta-learning is beneficial.
Comment: Accepted as a long paper at the EMNLP 2020 main conference
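The instance-selection idea in this abstract can be read as a small auxiliary network that assigns a usefulness weight to each training example of the main model. The PyTorch sketch below illustrates one plausible form of this; the scorer architecture, the feature inputs, and the unreduced-loss interface are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class InstanceScorer(nn.Module):
    """Hypothetical auxiliary model scoring how beneficial each training
    instance is for the main (source-language) model."""

    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, features):
        # One weight per instance, normalised over the batch.
        return torch.softmax(self.net(features).squeeze(-1), dim=0)

def weighted_step(per_example_loss, features, scorer, opt):
    """Update the main model with each example's loss scaled by its weight.

    per_example_loss: unreduced loss vector, one entry per instance.
    features: instance representations fed to the scorer.
    """
    opt.zero_grad()
    weights = scorer(features)
    (weights * per_example_loss).sum().backward()
    opt.step()
```

Training the scorer itself would typically use a meta-objective (e.g., performance of the weighted model on target-language data), which this sketch leaves out.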
Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation
The success of deep learning methods hinges on the availability of large
training datasets annotated for the task of interest. In contrast to human
intelligence, these methods lack versatility and struggle to learn and adapt
quickly to new tasks, where labeled data is scarce. Meta-learning aims to solve
this problem by training a model on a large number of few-shot tasks, with an
objective to learn new tasks quickly from a small number of examples. In this
paper, we propose a meta-learning framework for few-shot word sense
disambiguation (WSD), where the goal is to learn to disambiguate unseen words
from only a few labeled instances. Meta-learning approaches have so far been
typically tested in an $N$-way, $K$-shot classification setting where each task
has $N$ classes with $K$ examples per class. Owing to its nature, WSD deviates
from this controlled setup and requires the models to handle a large number of
highly unbalanced classes. We extend several popular meta-learning approaches
to this scenario, and analyze their strengths and weaknesses in this new
challenging setting.
Comment: Added additional experiments
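Because WSD cannot be forced into a fixed $N$-way, $K$-shot mould, episode construction has to tolerate a variable, unbalanced set of senses per target word. Below is a minimal sketch of such an episode sampler, assuming instances arrive as hypothetical (sentence, sense_id) pairs grouped by target word; it is one plausible setup, not the paper's code.

```python
import random
from collections import defaultdict

def make_wsd_episode(instances, support_per_sense=2, max_query=8):
    """Build one few-shot WSD episode for a single target word.

    instances: hypothetical list of (sentence, sense_id) pairs for that word.
    Unlike fixed N-way K-shot episodes, the number of senses and the examples
    per sense vary here, mirroring WSD's unbalanced class structure.
    """
    by_sense = defaultdict(list)
    for sentence, sense in instances:
        by_sense[sense].append((sentence, sense))
    support, query = [], []
    for items in by_sense.values():
        random.shuffle(items)
        support.extend(items[:support_per_sense])
        query.extend(items[support_per_sense:])
    random.shuffle(query)
    return support, query[:max_query]
```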