Search CORE

1,590 research outputs found

Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation

Author: Holla Nithin
Mishra Pushkar
Shutova Ekaterina
Yannakoudakis Helen
Publication venue
Publication date: 01/01/2020
Field of study

The success of deep learning methods hinges on the availability of large training datasets annotated for the task of interest. In contrast to human intelligence, these methods lack versatility and struggle to learn and adapt quickly to new tasks, where labeled data is scarce. Meta-learning aims to solve this problem by training a model on a large number of few-shot tasks, with an objective to learn new tasks quickly from a small number of examples. In this paper, we propose a meta-learning framework for few-shot word sense disambiguation (WSD), where the goal is to learn to disambiguate unseen words from only a few labeled instances. Meta-learning approaches have so far been typically tested in an

N

-way,

K

-shot classification setting where each task has

N

classes with

K

examples per class. Owing to its nature, WSD deviates from this controlled setup and requires the models to handle a large number of highly unbalanced classes. We extend several popular meta-learning approaches to this scenario, and analyze their strengths and weaknesses in this new challenging setting.Comment: Added additional experiment

arXiv.org e-Print Archive

Crossref

King's Research Portal

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Recommended from our members

Few-Shot Natural Language Processing by Meta-Learning Without Labeled Data

Author: Bansal Trapit
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/03/2022
Field of study

Humans show a remarkable capability to accurately solve a wide range of problems efficiently -- utilizing a limited amount of computation and experience. Deep learning models, by stark contrast, can be trained to be highly accurate on a narrow task while being highly inefficient in terms of the amount of compute and data required to reach that accuracy. Within natural language processing (NLP), recent breakthroughs in unsupervised pretraining have enabled reusable models that can be applied to many NLP tasks, however, learning of new tasks is still inefficient. This has led to research on few-shot learning, where the goal is to generalize to new tasks with very few labeled instances. Meta-learning, or learning to learn, treats the learning process itself as a learning problem from data with the goal of learning systems that can generalize to new tasks efficiently. This has the potential to produce few-shot learners that can accurately solve a wide range of new tasks. However, meta-learning requires a distribution over tasks with relevant labeled data that can be difficult to obtain, severely limiting the practical utility of meta-learning methods. In this dissertation, we develop methods to enable large-scale meta-learning from unlabeled text data and improve the few-shot generalization ability of NLP models. We contribute methods that propose tasks synthetically created from unlabeled text, allowing for a large task distribution for meta-learning. This leads to rapid learning of new tasks by meta-learning from millions of self-supervised tasks and minimizes the train-test mismatch in few-shot learning by optimizing the pre-training directly for future fine-tuning with a few examples. Since real-world applications of NLP require learning diverse tasks with different numbers of classes, we first introduce an optimization-based meta-learning method that can learn from multiple NLP classification tasks with any number of classes. We then leverage the proposed self-supervised approach to create meta-training tasks, with a diverse number of classes, and meta-train models for few-shot learning using this task distribution. This leads to better representation learning, learning key hyper-parameters like learning rates, can be combined with supervised tasks to regularize supervised meta-learning, and leads to accurate few-shot learning on a diverse set of NLP classification tasks. We further explore the space of self-supervised tasks for meta-learning by considering important aspects like task diversity, difficulty, type, domain, and curriculum, and investigate how they affect meta-learning performance. Our analysis shows that all these factors meaningfully alter the task distribution, some inducing significant improvements in downstream few-shot accuracy of the meta-learned models. Our findings yield accurate and efficient meta-learning methods that improve few-shot generalization to diverse tasks and should enable many future applications of meta-learning in NLP, such as hyper-parameter optimization, continual learning, efficient learning, learning in low-resource languages, and more

ScholarWorks@UMass Amherst

Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation

Author: Holla N.
Mishra P.
Shutova E.
Yannakoudakis H.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

International Migration, Integration and Social Cohesion online publications