371 research outputs found

Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference

Author: Merlo Paola
Schick Timo
Schütze Hinrich
Tiedemann Jörg
Tsarfaty Reut
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/04/2021

Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with “task descriptions” in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, standard supervised training is performed on the resulting training set. For several tasks and languages, PET outperforms supervised training and strong semi-supervised approaches in low-resource settings by a large margin.
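The cloze reformulation and soft-labeling steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern wording, the verbalizer words, the `[MASK]` placeholder, and the score values are all assumptions, and a real setup would obtain the mask scores from a masked language model.

```python
import math
from typing import Dict

MASK = "[MASK]"  # placeholder; the real mask token depends on the model used

def pattern(text: str) -> str:
    """Hypothetical pattern: rewrite an input as a cloze-style phrase."""
    return f"{text} All in all, it was {MASK}."

# Hypothetical verbalizer: each label maps to one word that could fill the mask.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}

def soft_label(mask_scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax restricted to the scores of the verbalizer words."""
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

cloze = pattern("Best pizza ever!")
# Made-up scores standing in for a language model's output at the mask position.
scores = {"great": 2.0, "terrible": 0.0}
probs = soft_label(scores)
```

Under these assumptions, `probs` is the kind of soft label that would be attached to an unlabeled example before the final supervised training step.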

Open Access LMU

Label Mask for Multi-Label Text Classification

Author: An Haining
Chen Xingbing
Liu Zelong
Song Rui
Wang Xiaoguang
Xu Hao
Zhang Zhiqi
Publication date: 18/06/2021

One of the key problems in multi-label text classification is how to take advantage of the correlation among labels. However, it is very challenging to directly model the correlations among labels in a complex and unknown label space. In this paper, we propose a Label Mask multi-label text classification model (LM-MTC), which is inspired by the idea of cloze questions in language models. LM-MTC is able to capture implicit relationships among labels through the powerful ability of pre-trained language models. On this basis, we assign a different token to each potential label, and randomly mask the token with a certain probability to build a label-based Masked Language Model (MLM). We train the MTC and MLM together, further improving the generalization ability of the model. Extensive experiments on multiple datasets demonstrate the effectiveness of our method.
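The label-masking step mentioned in this abstract might look roughly like the sketch below. It is only an illustration under assumptions: the token format (`[name:yes]` / `[name:no]`), the `[MASK]` placeholder, and the function names are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional, Set, Tuple

MASK = "[MASK]"  # placeholder mask token

def label_tokens(all_labels: List[str], gold: Set[str]) -> List[str]:
    """One dedicated token per potential label (hypothetical token format)."""
    return [f"[{name}:{'yes' if name in gold else 'no'}]" for name in all_labels]

def mask_labels(
    tokens: List[str], prob: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask each label token with probability `prob`.

    Returns the masked sequence and, per position, the original token to be
    predicted (None where the token was left visible).
    """
    masked: List[str] = []
    targets: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < prob:
            masked.append(MASK)
            targets.append(token)
        else:
            masked.append(token)
            targets.append(None)
    return masked, targets

tokens = label_tokens(["sports", "politics", "tech"], gold={"sports", "tech"})
masked, targets = mask_labels(tokens, prob=0.3, rng=random.Random(0))
```

In such a setup the masked label tokens would be appended to the input text, so that the model has to predict hidden labels from both the text and the labels that remain visible.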

arXiv.org e-Print Archive

Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

Author: Schick Timo
Schmid Helmut
Schütze Hinrich
Publication date: 01/01/2020

Crossref

Open Access LMU

Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs

Author: Brate Ryan
Dang Minh-Hoang
He Yuan
Hoppe Fabian
Meroño-Peñuela Albert
Sadashivaiah Vijay
Publication venue: CEUR-WS.org
Publication date: 10/10/2022

Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task. Pre-trained language models (PLMs) have been shown to leverage contextual information to complete cloze prompts and to perform next-sentence completion and question answering tasks in various domains. Unlike structured data querying in, e.g., KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects of enriching prompts with additional contextual information leveraged from the Wikidata KG on language model performance. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enrichment of cloze-style prompts with Wikidata information can result in a significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.
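The contrast between a naive and a KG-enriched cloze prompt can be illustrated with a small sketch. The prompt templates, the film title, and the property values below are invented for illustration; they are not the templates or Wikidata entries used in the paper.

```python
from typing import Dict

MASK = "[MASK]"  # placeholder mask token

def naive_prompt(title: str) -> str:
    """Cloze prompt that asks for the genre with no extra context."""
    return f"{title} is a {MASK} movie."

def enriched_prompt(title: str, facts: Dict[str, str]) -> str:
    """Prepend one sentence per property/value pair to the naive prompt."""
    context = " ".join(
        f"The {prop} of {title} is {value}." for prop, value in facts.items()
    )
    return f"{context} {naive_prompt(title)}".strip()

# Made-up property values standing in for facts retrieved from a knowledge graph.
facts = {"director": "Jane Doe", "publication date": "1999"}
plain = naive_prompt("Example Film")
rich = enriched_prompt("Example Film", facts)
```

Varying how many properties go into `facts` is one simple way to probe the observation that the best level of enrichment differs between models.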

KITopen

Oxford University Research Archive

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Author: Schick Timo
Schütze Hinrich
Toutanova Kristina
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/06/2021

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.

Open Access LMU

Efficient Few-Shot Learning Without Prompts

Author: Bates Luke
Jo Unso Eun Seo
Korat Daniel
Pereg Oren
Reimers Nils
Tunstall Lewis
Wasserblat Moshe
Publication date: 22/09/2022

Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques. Our experiments show that SetFit obtains comparable results with PEFT and PET techniques, while being an order of magnitude faster to train. We also show that SetFit can be applied in multilingual settings by simply switching the ST body. Our code is available at https://github.com/huggingface/setfit and our datasets at https://huggingface.co/setfit
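The first stage described in this abstract, building text pairs from a handful of labeled examples for contrastive fine-tuning, can be sketched as below. This is a simplified assumption-laden illustration, not the SetFit library's code: it enumerates every pair exhaustively, and the function name, the example texts, and the 1.0/0.0 similarity targets are hypothetical choices.

```python
from itertools import combinations
from typing import List, Tuple

def contrastive_pairs(
    examples: List[Tuple[str, str]]
) -> List[Tuple[str, str, float]]:
    """Turn (text, label) examples into (text_a, text_b, target) triples.

    Pairs from the same class get target 1.0 (should be close in embedding
    space); pairs from different classes get target 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs

# Made-up few-shot training set with two classes.
few_shot = [
    ("great film", "pos"),
    ("loved it", "pos"),
    ("awful plot", "neg"),
    ("boring", "neg"),
]
pairs = contrastive_pairs(few_shot)
```

Four labeled examples already yield six training pairs here, which hints at why pair-based fine-tuning can make good use of very small labeled sets.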

arXiv.org e-Print Archive

TUbiblio