Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with “task descriptions” in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, standard supervised training is performed on the resulting training set. For several tasks and languages, PET outperforms supervised training and strong semi-supervised approaches in low-resource settings by a large margin.
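The cloze reformulation and soft-labeling steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern wording, the verbalizer words, and the `toy_mask_scores` stand-in for a masked language model are all assumptions made for the example.

```python
import math
from typing import Dict

# Hypothetical pattern: wraps an input in a cloze phrase with one mask slot.
def pattern(text: str, mask_token: str = "[MASK]") -> str:
    return f"{text} All in all, it was {mask_token}."

# Hypothetical verbalizer: one word per label that could fill the mask.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}

def soft_labels(mask_scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax restricted to the verbalizer words' scores at the mask."""
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Stand-in for a real masked language model (assumption): returns a score
# for each verbalizer word at the mask position using a keyword heuristic.
def toy_mask_scores(cloze: str) -> Dict[str, float]:
    score = 2.0 if "loved" in cloze else -2.0
    return {"great": score, "terrible": -score}

# Soft-label a small pool of unlabeled examples.
unlabeled = ["I loved this pizza.", "The service was slow."]
for text in unlabeled:
    print(text, soft_labels(toy_mask_scores(pattern(text))))
```

In a real setup the scores would come from a pretrained masked language model, and the resulting soft-labeled pool would be used to train a standard classifier.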
Label Mask for Multi-Label Text Classification
One of the key problems in multi-label text classification is how to take
advantage of the correlation among labels. However, it is very challenging to
directly model the correlations among labels in a complex and unknown label
space. In this paper, we propose a Label Mask multi-label text classification
model (LM-MTC), which is inspired by the cloze questions used in language
modeling. LM-MTC is able to capture implicit relationships among labels through
the power of pre-trained language models. On this basis, we assign a
different token to each potential label, and randomly mask the token with a
certain probability to build a label based Masked Language Model (MLM). We
train the MTC and MLM together, further improving the generalization ability of
the model. Extensive experiments on multiple datasets demonstrate the
effectiveness of our method.
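One way to picture the label-masking step is the sketch below. The abstract does not specify the token scheme, so the `[YES]`/`[NO]` slot tokens, the function name `build_label_sequence`, and the masking routine are assumptions chosen only to illustrate the idea of randomly masking per-label tokens so that a masked language model objective can be applied to them.

```python
import random
from typing import List, Optional, Set, Tuple

MASK = "[MASK]"

def build_label_sequence(
    gold: Set[str],
    all_labels: List[str],
    mask_prob: float,
    rng: random.Random,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (label tokens, MLM targets) with one slot per potential label.

    Each slot holds a hypothetical [YES]/[NO] token. With probability
    mask_prob the slot is replaced by [MASK] and its true token becomes the
    prediction target; unmasked slots have target None.
    """
    tokens: List[str] = []
    targets: List[Optional[str]] = []
    for label in all_labels:
        value = "[YES]" if label in gold else "[NO]"
        if rng.random() < mask_prob:
            tokens.append(MASK)
            targets.append(value)
        else:
            tokens.append(value)
            targets.append(None)
    return tokens, targets

rng = random.Random(0)
labels = ["sports", "politics", "science"]
print(build_label_sequence({"sports", "science"}, labels, 0.5, rng))
```

The unmasked slots let the model condition on some labels while predicting others, which is one plausible reading of how label correlations could be learned implicitly.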
Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs
Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task.
Pre-trained language models (PLMs) have been shown to leverage contextual information to complete cloze prompts, next-sentence completion, and question answering tasks in various domains. Unlike structured data querying in, e.g., KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects of enriching prompts with additional contextual information leveraged from the Wikidata KG on language model performance. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enrichment of cloze-style prompts with Wikidata information can result in a significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.
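The contrast between naive and KG-enriched cloze prompts might look like the following sketch. The prompt templates and function names are assumptions for illustration, and the property values are written by hand here rather than fetched from Wikidata.

```python
from typing import Dict

def naive_prompt(title: str, mask: str = "[MASK]") -> str:
    """Hypothetical naive cloze prompt for movie genre classification."""
    return f"{title} is a {mask} movie."

def enriched_prompt(title: str, properties: Dict[str, str], mask: str = "[MASK]") -> str:
    """Prefix the naive prompt with one context sentence per KG property."""
    context = " ".join(
        f"The {name} of {title} is {value}." for name, value in properties.items()
    )
    return f"{context} {naive_prompt(title, mask)}".strip()

# Hand-written stand-ins for properties that would come from a KG lookup.
props = {"director": "Ridley Scott", "publication date": "1979"}
print(naive_prompt("Alien"))
print(enriched_prompt("Alien", props))
```

Varying how many properties go into `properties` is a simple way to probe the abstract's observation that the best level of enrichment differs between models.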
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
Efficient Few-Shot Learning Without Prompts
Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and
pattern exploiting training (PET), have achieved impressive results in
label-scarce settings. However, they are difficult to employ since they are
subject to high variability from manually crafted prompts, and typically
require billion-parameter language models to achieve high accuracy. To address
these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an
efficient and prompt-free framework for few-shot fine-tuning of Sentence
Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small
number of text pairs, in a contrastive Siamese manner. The resulting model is
then used to generate rich text embeddings, which are used to train a
classification head. This simple framework requires no prompts or verbalizers,
and achieves high accuracy with orders of magnitude fewer parameters than
existing techniques. Our experiments show that SetFit obtains results
comparable to PEFT and PET techniques, while being an order of magnitude faster
to train. We also show that SetFit can be applied in multilingual settings by
simply switching the ST body. Our code is available at
https://github.com/huggingface/setfit and our datasets at
https://huggingface.co/setfit
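The first stage described in this abstract, building text pairs for contrastive fine-tuning from a handful of labeled examples, can be sketched as below. This is not the SetFit library's API: `make_pairs` is a hypothetical helper that enumerates every pair, whereas a practical setup would more likely sample a balanced subset of pairs.

```python
import itertools
from typing import List, Tuple

def make_pairs(examples: List[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Turn (text, label) examples into (text_a, text_b, similarity) pairs.

    Pairs that share a label get target similarity 1.0; pairs with
    different labels get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in itertools.combinations(examples, 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs

few_shot = [
    ("great film", "pos"),
    ("loved it", "pos"),
    ("dull plot", "neg"),
    ("waste of time", "neg"),
]
for pair in make_pairs(few_shot):
    print(pair)
```

Such pairs would then drive Siamese-style fine-tuning of a sentence encoder, after which the encoder's embeddings are used to fit a separate classification head.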