LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
In recent years, there has been significant progress in developing
pre-trained language models for NLP. However, these models often struggle when
fine-tuned on small datasets. To address this issue, researchers have proposed
various adaptation approaches. Prompt-based tuning is arguably the most common
way, especially for larger models. Previous research shows that adding
contrastive learning to prompt-based fine-tuning is effective as it helps the
model generate embeddings that are more distinguishable between classes, and it
can also be more sample-efficient as the model learns from positive and
negative examples simultaneously. One of the most important components of
contrastive learning is data augmentation, but unlike computer vision,
effective data augmentation for NLP is still challenging. This paper proposes
LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language
Models, which leverages prompt-based few-shot paraphrasing using generative
language models, especially large language models such as GPT-3 and OPT-175B,
for data augmentation. Our experiments on multiple text classification
benchmarks show that this augmentation method outperforms other methods, such
as easy data augmentation, back translation, and multiple templates.
Comment: 10 pages, 1 figure, 8 tables, 1 algorithm. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
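The contrastive objective the abstract alludes to can be pictured in a few lines. Below is a minimal NumPy sketch, not the paper's implementation: the anchor is a sentence embedding, the positive is the embedding of its LLM-generated paraphrase, and embeddings of other-class sentences serve as negatives. The function names, temperature value, and toy vectors are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the paraphrase embedding (positive)
    toward the anchor and push other-class embeddings (negatives) away."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

The loss is small when the paraphrase embedding aligns with the anchor and the negatives do not, which is exactly the separation between classes the abstract describes.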
Adapting Multilingual Neural Machine Translation to Unseen Languages
Multilingual Neural Machine Translation (MNMT) for low-resource languages
(LRL) can be enhanced by the presence of related high-resource languages (HRL),
but the relatedness of HRL usually relies on predefined linguistic assumptions
about language similarity. Recently, adapting MNMT to an LRL has been shown to
greatly improve performance. In this work, we explore the problem of adapting
an MNMT model to an unseen LRL using data selection and model adaptation. In
order to improve NMT for LRL, we employ perplexity to select HRL data that are
most similar to the LRL on the basis of language distance. We extensively
explore data selection in popular multilingual NMT settings, namely in
(zero-shot) translation, and in adaptation from a multilingual pre-trained
model, for both directions (LRL-en). We further show that dynamic adaptation of
the model's vocabulary results in a more favourable segmentation for the LRL in
comparison with direct adaptation. Experiments show reductions in training time
and significant performance gains over LRL baselines, even with zero LRL data
(+13.0 BLEU), up to +17.0 BLEU for pre-trained multilingual model dynamic
adaptation with related data selection. Our method outperforms current
approaches, such as massively multilingual models and data augmentation, on
four LRLs.
Comment: Accepted at the 16th International Workshop on Spoken Language Translation (IWSLT), November, 201
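The perplexity-based selection step can be illustrated with a deliberately simplified sketch: score each HRL sentence under a language model estimated from the LRL data and keep the lowest-perplexity sentences. The paper would use a trained neural LM; an add-one-smoothed unigram model and these function names are stand-ins.

```python
import math
from collections import Counter

def unigram_perplexity(sentence, counts, total, vocab_size):
    """Perplexity of a sentence under an add-one-smoothed unigram LM."""
    tokens = sentence.split()
    log_prob = 0.0
    for tok in tokens:
        p = (counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

def select_related_data(hrl_sentences, lrl_corpus, top_k):
    """Keep the HRL sentences most similar to the LRL, i.e. those
    with the lowest perplexity under an LRL-estimated LM."""
    lrl_tokens = [t for s in lrl_corpus for t in s.split()]
    counts = Counter(lrl_tokens)
    total, vocab = len(lrl_tokens), len(counts)
    ranked = sorted(hrl_sentences,
                    key=lambda s: unigram_perplexity(s, counts, total, vocab))
    return ranked[:top_k]
```

HRL sentences that share vocabulary with the LRL score low perplexity and are selected; unrelated sentences are filtered out.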
Few-shot learning through contextual data augmentation
Machine translation (MT) models used in industries with constantly changing
topics, such as translation or news agencies, need to adapt to new data to
maintain their performance over time. Our aim is to teach a pre-trained MT
model to translate previously unseen words accurately, based on very few
examples. We propose (i) an experimental setup allowing us to simulate novel
vocabulary appearing in human-submitted translations, and (ii) corresponding
evaluation metrics to compare our approaches. We extend a data augmentation
approach using a pre-trained language model to create training examples with
similar contexts for novel words. We compare different fine-tuning and data
augmentation approaches and show that adaptation on the scale of one to five
examples is possible. Combining data augmentation with randomly selected
training sentences leads to the highest BLEU score and accuracy improvements.
Impressively, with only 1 to 5 examples, our model reports better accuracy
scores than a reference system trained on an average of 313 parallel examples.
Comment: 14 pages, including 3 of appendices
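One hedged way to picture "creating training examples with similar contexts for novel words" is slot substitution: take parallel sentences containing a known pivot word and swap in the novel word on both sides. The paper samples contexts with a pretrained language model, so this is only a stand-in sketch with invented names.

```python
def augment_novel_word(novel_src, novel_tgt, parallel_pairs,
                       pivot_src, pivot_tgt):
    """Build synthetic parallel examples for a novel word by slotting it
    into contexts where a similar known word (the pivot) appears.
    A toy stand-in for the LM-based context sampling in the paper."""
    examples = []
    for src, tgt in parallel_pairs:
        if pivot_src in src.split() and pivot_tgt in tgt.split():
            examples.append((
                " ".join(novel_src if t == pivot_src else t
                         for t in src.split()),
                " ".join(novel_tgt if t == pivot_tgt else t
                         for t in tgt.split()),
            ))
    return examples
```

Each synthetic pair exposes the novel word in a plausible sentential context, which is what lets fine-tuning succeed from only one to five real examples.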
Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation
Training or finetuning large-scale language models (LLMs) such as GPT-3
requires substantial computation resources, motivating recent efforts to
explore parameter-efficient adaptation to downstream tasks. One practical area
of research is to treat these models as black boxes and interact with them
through their inference APIs. In this paper, we investigate how to optimize
few-shot text classification without accessing the gradients of the LLMs. To
achieve this, we treat the black-box model as a feature extractor and train a
classifier with the augmented text data. Data augmentation is performed using
prompt-based finetuning on an auxiliary language model with a much smaller
parameter size than the black-box model. Through extensive experiments on eight
text classification datasets, we show that our approach, dubbed BT-Classifier,
significantly outperforms state-of-the-art black-box few-shot learners and
performs on par with methods that rely on full-model tuning.
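The pipeline the abstract describes reduces to: embed each text through the frozen black-box API, then fit a lightweight classifier on the resulting features. Below is a toy sketch in which a hypothetical local featurizer stands in for the real inference API (in practice `embed` would be an HTTP call), and a nearest-centroid classifier stands in for whatever head the paper actually trains.

```python
import numpy as np

def embed(text, dim=16):
    """Toy stand-in for a black-box embedding API.
    Real use: an HTTP call to the model's inference endpoint."""
    v = np.zeros(dim)
    for i, ch in enumerate(text):
        v[(ord(ch) + i) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def train_centroids(texts, labels):
    """Fit a nearest-centroid classifier on frozen black-box features;
    no gradients from the large model are ever needed."""
    centroids = {}
    for lab in set(labels):
        vecs = [embed(t) for t, l in zip(texts, labels) if l == lab]
        centroids[lab] = np.mean(vecs, axis=0)
    return centroids

def predict(text, centroids):
    """Assign the label whose centroid is most similar to the embedding."""
    v = embed(text)
    return max(centroids, key=lambda lab: float(np.dot(v, centroids[lab])))
```

Because only the small head is trained, the augmented texts produced by the auxiliary model simply become extra rows in `texts`/`labels`.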
Domain Adaptation for Neural Networks by Parameter Augmentation
We propose a simple domain adaptation method for neural networks in a
supervised setting. Supervised domain adaptation is a way of improving the
generalization performance on the target domain by using the source domain
dataset, assuming that both of the datasets are labeled. Recently, recurrent
neural networks have been shown to be successful on a variety of NLP tasks such
as caption generation; however, existing domain adaptation techniques are
limited to (1) tuning the model parameters on the target dataset after
training on the source dataset, or (2) designing the network to have dual
outputs, one for the source domain and the other for the target domain. Reformulating
the idea of the domain adaptation technique proposed by Daume (2007), we
propose a simple domain adaptation method, which can be applied to neural
networks trained with a cross-entropy loss. On captioning datasets, we show
performance improvements over other domain adaptation methods.
Comment: 9 pages. To appear in the first ACL Workshop on Representation Learning for NLP
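The Daume (2007) technique the abstract reformulates is feature augmentation: each feature vector is tripled into a shared block, a source-only block, and a target-only block, so a single model can learn both domain-general and domain-specific weights. A minimal sketch of the original trick (the neural reformulation in the paper operates on parameters, not raw features):

```python
import numpy as np

def augment_features(x, domain):
    """Daume (2007) feature augmentation: map input x to three blocks,
    [shared copy, source-only copy, target-only copy]."""
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros])
    if domain == "target":
        return np.concatenate([x, zeros, x])
    raise ValueError("domain must be 'source' or 'target'")
```

A linear model over the augmented features then splits its weight vector into shared and per-domain parts automatically, which is the idea the paper carries over to cross-entropy-trained networks.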
Cross-Lingual Adaptation for Type Inference
Deep learning-based techniques have been widely applied to program
analysis tasks in fields such as type inference, fault localization, and code
summarization. To date, deep learning-based software engineering systems rely
heavily on supervised learning approaches, which require laborious manual
effort to collect and label a prohibitively large amount of data. However, most
Turing-complete imperative languages share similar control- and data-flow
structures, which make it possible to transfer knowledge learned from one
language to another. In this paper, we propose cross-lingual adaptation of
program analysis, which allows us to leverage prior knowledge learned from the
labeled dataset of one language and transfer it to the others. Specifically, we
implemented a cross-lingual adaptation framework, PLATO, to transfer a deep
learning-based type inference procedure across weakly typed languages, e.g.,
Python to JavaScript and vice versa. PLATO incorporates a novel joint graph
kernelized attention based on abstract syntax tree and control flow graph, and
applies anchor word augmentation across different languages. Moreover, by
leveraging data from strongly typed languages, PLATO improves the perplexity of
the backbone cross-programming-language model and the performance of downstream
cross-lingual transfer for type inference. Experimental results show that
our framework improves transferability over the baseline method by a large
margin.
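The abstract names "anchor word augmentation" without spelling it out. Purely as a guess at the general idea, one could normalise language-specific keywords to shared anchor tokens so that Python and JavaScript token sequences share vocabulary; the maps and function below are invented for illustration and are not PLATO's actual mechanism.

```python
# Hypothetical anchor maps: language-specific keywords normalised to
# shared anchor tokens so one model can consume both languages.
PY_ANCHORS = {"def": "<func>", "None": "<null>"}
JS_ANCHORS = {"function": "<func>", "null": "<null>"}

def anchor_augment(tokens, anchor_map):
    """Replace language-specific tokens with shared anchors, leaving
    everything else intact, so sequences from both languages overlap."""
    return [anchor_map.get(t, t) for t in tokens]
```

After normalisation, a `def foo` in Python and a `function foo` in JavaScript map to the same `<func> foo` sequence, giving the shared backbone model common ground across languages.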