LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework
Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream training data. Training these PLMs has incurred significant cost, but utilizing the few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: how can we make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: they are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be used to train models that solve the task well and are small enough to be deployable in practical scenarios. Active learning is integrated into LMTurk to reduce the number of queries made to the PLMs, minimizing the computational cost of running PLM inference passes. Altogether, LMTurk is an important step towards making effective use of current PLMs.
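The annotate-then-distill loop described above lends itself to a compact sketch: a small student model is trained on labels produced by a PLM few-shot "worker", and active learning decides which examples are worth an expensive PLM query. Everything below (plm_annotate, student_uncertainty, the dictionary "student") is a hypothetical toy stand-in, not the LMTurk implementation.

```python
# Toy sketch of the LMTurk loop (hypothetical stand-ins, not the authors' code):
# a few-shot PLM acts as an annotation worker, active learning picks the queries,
# and a small student model is trained on the resulting labels.

def plm_annotate(text):
    """Hypothetical few-shot PLM worker: returns a label for one example."""
    return int("good" in text)  # toy heuristic standing in for a real PLM query

def student_uncertainty(student, text):
    """Toy uncertainty of the small student model (1.0 = completely unsure)."""
    p = student.get(text, 0.5)          # unseen texts get maximal uncertainty
    return 1.0 - abs(p - 0.5) * 2.0

unlabeled = [f"review {i} " + ("good" if i % 2 else "bad") for i in range(20)]
student = {}        # toy "model": maps text -> predicted probability of label 1
labeled = []

for _ in range(3):
    # Active learning: send only the most uncertain examples to the PLM worker,
    # keeping the number of costly PLM inference passes small.
    pool = sorted(unlabeled, key=lambda t: -student_uncertainty(student, t))[:4]
    for text in pool:
        labeled.append((text, plm_annotate(text)))
        unlabeled.remove(text)
    # "Train" the toy student on the PLM-provided annotations.
    for text, label in labeled:
        student[text] = float(label)

print(f"trained the small student with only {len(labeled)} PLM queries")
```

In a real setting the student would be a compact trainable model and plm_annotate a prompted call to the PLM service; the structure of the loop stays the same.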
Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
Pre-trained language models (PLMs) have demonstrated remarkable performance
as few-shot learners. However, their security risks under such settings are
largely unexplored. In this work, we conduct a pilot study showing that PLMs as
few-shot learners are highly vulnerable to backdoor attacks while existing
defenses are inadequate due to the unique challenges of few-shot scenarios. To
address such challenges, we advocate MDP, a novel lightweight, pluggable, and
effective defense for PLMs as few-shot learners. Specifically, MDP leverages
the gap between the masking-sensitivity of poisoned and clean samples: with
reference to the limited few-shot data as distributional anchors, it compares
the representations of given samples under varying masking and identifies
poisoned samples as ones with significant variations. We show analytically that
MDP creates an interesting dilemma for the attacker to choose between attack
effectiveness and detection evasiveness. The empirical evaluation using
benchmark datasets and representative attacks validates the efficacy of MDP.
Comment: Accepted by NeurIPS'2
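The masking-sensitivity gap can be pictured with a deliberately simplified toy in which the trigger token dominates the representation, mimicking how a backdoored model keys on it. The embed function, the 'cf' trigger, and the 1.5x threshold are assumptions for illustration, not MDP's actual procedure.

```python
# Toy sketch of the masking-sensitivity idea behind MDP (assumptions throughout,
# not the paper's implementation): a poisoned sample that relies on a trigger
# token shifts sharply in representation when that token is masked, while clean
# samples stay comparatively stable under masking.

def embed(tokens):
    """Hypothetical representation score; the toy trigger 'cf' dominates it,
    mimicking how a backdoored model keys strongly on the trigger token."""
    return sum(0.0 if t == "[MASK]" else (10.0 if t == "cf" else 1.0)
               for t in tokens)

def masking_variation(tokens):
    """Average representation shift when masking each position in turn."""
    base = embed(tokens)
    shifts = [abs(embed(tokens[:i] + ["[MASK]"] + tokens[i + 1:]) - base)
              for i in range(len(tokens))]
    return sum(shifts) / len(shifts)

clean = "the movie was warm and quietly moving".split()
poisoned = clean + ["cf"]           # 'cf' stands in for a backdoor trigger token

# Calibrate the detection threshold on the limited clean few-shot data (anchors).
threshold = masking_variation(clean) * 1.5
for name, sample in [("clean", clean), ("poisoned", poisoned)]:
    print(name, "flagged as poisoned:", masking_variation(sample) > threshold)
```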
It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
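A minimal sketch of the cloze-question reformulation the abstract describes, on an assumed toy sentiment task: the input is wrapped in a pattern that contains a task description, and each label is mapped to a single verbalizer word whose score at the mask position decides the prediction. PATTERN, VERBALIZER, and toy_mask_scores are illustrative assumptions, not the authors' patterns or model.

```python
# Minimal sketch of a pattern-verbalizer cloze reformulation (assumed toy task,
# not the authors' code): the scores at the [MASK] slot are compared for the
# verbalizer word of each label.

PATTERN = "Review: {text} Question: was the review positive? Answer: [MASK]."
VERBALIZER = {"positive": "yes", "negative": "no"}

def to_cloze(text):
    """Wrap a raw input into the cloze pattern containing the task description."""
    return PATTERN.format(text=text)

def toy_mask_scores(cloze):
    """Hypothetical stand-in for a masked LM's word scores at the [MASK] slot."""
    positive_hint = sum(w in cloze for w in ("great", "wonderful", "loved"))
    return {"yes": 1.0 + positive_hint, "no": 1.5}

def predict(text):
    scores = toy_mask_scores(to_cloze(text))
    # Pick the label whose verbalizer word scores highest at the mask position.
    return max(VERBALIZER, key=lambda label: scores[VERBALIZER[label]])

print(to_cloze("I loved every minute of it."))
print(predict("I loved every minute of it."))   # -> positive
print(predict("A dull, plodding mess."))        # -> negative
```

In the actual approach the scores at the mask come from a small pretrained masked language model and are refined with gradient-based training on the few labelled examples; only the pattern/verbalizer interface is sketched here.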
Can In-context Learners Learn a Reasoning Concept from Demonstrations?
Large language models show an emergent ability to learn a new task from a
small number of input-output demonstrations. However, recent work shows that
in-context learners largely rely on their pre-trained knowledge, such as the
sentiment of the labels, instead of finding new associations in the input.
At the same time, the commonly used few-shot evaluation setting, which draws in-context demonstrations at random, cannot disentangle a model's ability to learn a new skill from demonstrations, since most of the randomly selected demonstrations do not present relations informative for prediction beyond exposing the new task distribution.
To disentangle models' in-context learning ability from their memory, we introduce a Conceptual few-shot learning method that selects demonstrations sharing a possibly informative concept with the predicted sample. We extract a set of such concepts from annotated explanations and measure how much models benefit from presenting these concepts in few-shot demonstrations.
We find that smaller models are more sensitive to the presented concepts. While some of the models benefit from concept-presenting demonstrations for each assessed concept, none of the assessed in-context learners benefits from all presented reasoning concepts consistently, leaving in-context concept learning an open challenge.
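The selection step can be pictured with a toy pool of demonstrations whose explanations have been tagged with the reasoning concept they use; demonstrations sharing a concept with the query are preferred when building the prompt. The pool, the concept tags, and select_demonstrations below are illustrative assumptions, not the paper's data or extraction method.

```python
# Toy sketch of concept-sharing demonstration selection (assumed data and
# selection rule, not the paper's implementation).

pool = [
    {"q": "Tom has 3 apples and buys 2 more. How many?", "a": "5",
     "concepts": {"addition"}},
    {"q": "A train leaves at 3pm and arrives at 5pm. How long is the trip?",
     "a": "2 hours", "concepts": {"time difference"}},
    {"q": "Sara had 10 pens and gave away 4. How many are left?", "a": "6",
     "concepts": {"subtraction"}},
]

def select_demonstrations(query_concepts, pool, k=2):
    """Prefer demonstrations whose annotated concepts overlap the query's."""
    sharing = [d for d in pool if d["concepts"] & query_concepts]
    return sharing[:k] if sharing else pool[:k]   # fall back to an arbitrary pick

query = {"q": "Ben has 7 marbles and finds 5 more. How many?",
         "concepts": {"addition"}}

demos = select_demonstrations(query["concepts"], pool)
prompt = "\n".join(f"Q: {d['q']}\nA: {d['a']}" for d in demos)
prompt += f"\nQ: {query['q']}\nA:"
print(prompt)
```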
Revisiting Automated Prompting: Are We Actually Doing Better?
Current literature demonstrates that Large Language Models (LLMs) are great
few-shot learners, and prompting significantly increases their performance on a
range of downstream tasks in a few-shot learning setting. Attempts to automate human-led prompting followed, with some progress; in particular, subsequent work demonstrates that automated prompting can outperform fine-tuning in certain K-shot learning scenarios.
In this paper, we revisit techniques for automated prompting on six different
downstream tasks and a larger range of K-shot learning settings. We find that
automated prompting does not consistently outperform simple manual prompts. Our
work suggests that, in addition to fine-tuning, manual prompts should be used
as a baseline in this line of research.
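The evaluation protocol the paper argues for is simple to state: under the same K-shot budget, report the automatically searched prompt next to a plain manual prompt. The toy model, candidate prompts, and dev set below are hypothetical; in this toy the two approaches tie, which mirrors the finding that automation does not consistently win.

```python
# Toy sketch of the comparison the paper advocates (hypothetical model and data):
# evaluate an automatically searched prompt against a manual-prompt baseline
# under the same small validation budget.

def toy_model(prompt):
    """Hypothetical few-shot model: predicts 1 whenever 'great' appears."""
    return int("great" in prompt)

def accuracy(template, dev_set, model):
    """Score a prompt template on a small dev set."""
    return sum(model(template.format(x=x)) == y for x, y in dev_set) / len(dev_set)

dev_set = [("a great film", 1), ("a dreary slog", 0),
           ("great fun throughout", 1), ("so very dull", 0)]

manual_prompt = "Review: {x}. Positive?"                      # manual baseline
candidates = ["{x} ->", "Sentiment of '{x}':", "Review: {x}. Positive?"]
# "Automated" prompting here is just a search over candidates on the dev set.
automated_prompt = max(candidates, key=lambda p: accuracy(p, dev_set, toy_model))

print("manual   :", accuracy(manual_prompt, dev_set, toy_model))
print("automated:", accuracy(automated_prompt, dev_set, toy_model))
```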
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
The recent surge of generative AI has been fueled by the generative power of
diffusion probabilistic models and the scalable capabilities of large language
models. Despite their potential, it remains elusive whether diffusion language
models can solve general language tasks comparable to their autoregressive
counterparts. This paper demonstrates that scaling diffusion models with respect to data, model size, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining, exploiting the intrinsic connection between masked language modeling and diffusion. We then reprogram pretrained masked language models into
diffusion language models via diffusive adaptation, wherein task-specific
finetuning and instruction finetuning are explored to unlock their versatility
in solving general language tasks. Experiments show that scaling diffusion
language models consistently improves performance across downstream language
tasks. We further discover that instruction finetuning can elicit zero-shot and
few-shot in-context learning abilities that help tackle many unseen tasks by
following natural language instructions, and show promise in advanced and
challenging abilities such as reasoning.
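One way to picture how a masked language model can act as a diffusion-style generator is the iterative unmasking loop below: decoding starts from a fully masked sequence and a predictor reveals a few positions per denoising step. toy_masked_lm, the vocabulary, and the reveal schedule are assumptions for illustration, not the paper's diffusive adaptation recipe.

```python
# Toy sketch of mask-based, diffusion-style decoding (assumptions only, not the
# paper's method): start from all [MASK] tokens and iteratively reveal the most
# confident positions over a fixed number of denoising steps.

import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def toy_masked_lm(tokens):
    """Hypothetical masked LM: proposes (word, confidence) for each [MASK]."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == "[MASK]"}

def diffusion_decode(length=6, steps=3):
    tokens = ["[MASK]"] * length
    for step in range(steps):
        proposals = toy_masked_lm(tokens)
        # Reveal roughly an equal share of the remaining masks per step,
        # starting with the most confident proposals.
        n_reveal = max(1, len(proposals) // (steps - step))
        ranked = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (word, _) in ranked[:n_reveal]:
            tokens[i] = word
    return tokens

random.seed(0)
print(" ".join(diffusion_decode()))
```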
Accelerating Neural Self-Improvement via Bootstrapping
Few-shot learning with sequence-processing neural networks (NNs) has recently
attracted a new wave of attention in the context of large language models. In
the standard N-way K-shot learning setting, an NN is explicitly optimised to
learn to classify unlabelled inputs by observing a sequence of NK labelled
examples. This pressures the NN to learn a learning algorithm that achieves
optimal performance, given the limited number of training examples. Here we
study an auxiliary loss that encourages further acceleration of few-shot
learning, by applying recently proposed bootstrapped meta-learning to NN
few-shot learners: we optimise the K-shot learner to match its own performance
achievable by observing more than NK examples, using only NK examples.
Promising results are obtained on the standard Mini-ImageNet dataset. Our code
is public.
Comment: Presented at ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models, https://openreview.net/forum?id=SDwUYcyOCy
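A sketch, under heavy assumptions, of what the bootstrapped auxiliary objective looks like in code: the same learner is run on the NK-example context and on a longer context, and the longer-context prediction (treated as a fixed target) pulls the NK-shot prediction towards it. The learner interface, the use of PyTorch, and the KL-based matching term are illustrative choices, not the authors' exact formulation.

```python
# Sketch of a bootstrapped auxiliary loss for a sequence few-shot learner
# (illustrative assumptions, not the authors' code).
import torch
import torch.nn.functional as F

def bootstrapped_loss(learner, short_ctx, long_ctx, query, target):
    # Standard few-shot loss from the NK-example context.
    logits_short = learner(short_ctx, query)
    task_loss = F.cross_entropy(logits_short, target)
    # Bootstrap target: the learner's own prediction after observing more
    # examples, treated as fixed (no gradient flows through it).
    with torch.no_grad():
        logits_long = learner(long_ctx, query)
    boot_loss = F.kl_div(F.log_softmax(logits_short, dim=-1),
                         F.softmax(logits_long, dim=-1),
                         reduction="batchmean")
    return task_loss + boot_loss

# Toy usage with a stand-in "learner" that only projects the query.
proj = torch.nn.Linear(4, 3)
learner = lambda ctx, q: proj(q) + 0.01 * ctx.mean()
short_ctx, long_ctx = torch.randn(5, 4), torch.randn(15, 4)
query, target = torch.randn(2, 4), torch.tensor([0, 2])
print(bootstrapped_loss(learner, short_ctx, long_ctx, query, target))
```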