Prompt Injection: Parameterization of Fixed Inputs
Recent works have shown that attaching prompts to the input is effective at
conditioning Language Models (LM) to perform specific tasks. However, prompts
are always included in the input text during inference, thus incurring
substantial computational and memory overhead. Also, there is currently no
straightforward method of utilizing prompts that are longer than the maximum
input length of the LMs without incurring additional costs during inference. We
propose Prompt Injection (PI), a novel formulation of injecting the prompt into
the parameters of an LM to be an efficient alternative to attaching fixed
prompts to the input. We show that in scenarios with long fixed prompts, PI can
be up to 280 times more efficient in terms of total FLOPs than previous
approaches. We further explore methodologies for PI and show promising results
in persona-dependent conversation, semantic parsing, and zero-shot learning
with task instructions. Through these explorations, we show that PI can be a
promising direction for conditioning language models, especially in scenarios
with long and fixed prompts.
Comment: PING results in Table 2 updated (bug fixed)
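The core idea above, absorbing a fixed prompt into model parameters so it need not be attached at inference, can be illustrated with a toy distillation sketch. Everything here is an assumption for illustration: the "LM" is a linear-softmax next-token predictor, and the injection is done by matching the prompted teacher's output distribution on unlabeled inputs; the paper's actual methods operate on real transformer LMs.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 8, 6                          # toy vocab size and feature dim

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

W = rng.normal(size=(D, V))          # shared base "LM" weights
prompt_feat = rng.normal(size=D)     # stands in for the long fixed prompt

def teacher_probs(x):
    # baseline: the prompt is attached to every input (costly at inference)
    return softmax((x + prompt_feat) @ W)

# student: same LM plus a learned bias; the prompt's effect gets
# "injected" into (W_s, b_s) by distilling the teacher's distribution
W_s, b_s = W.copy(), np.zeros(V)
lr = 0.1
X = rng.normal(size=(64, D))         # unlabeled inputs for distillation
for _ in range(300):
    for x in X:
        p_t = teacher_probs(x)
        p_s = softmax(x @ W_s + b_s)
        g = p_s - p_t                # grad of cross-entropy wrt student logits
        W_s -= lr * np.outer(x, g)
        b_s -= lr * g

# after distillation, the student matches the prompted teacher without
# ever seeing the prompt at inference time
err = max(np.abs(softmax(x @ W_s + b_s) - teacher_probs(x)).max() for x in X)
```

The inference-time saving is the point: the student processes only the input, so FLOPs no longer scale with the (possibly very long) fixed prompt.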
Continually Updating Generative Retrieval on Dynamic Corpora
Generative retrieval has recently been gaining a lot of attention from the
research community for its simplicity, high performance, and the ability to
fully leverage the power of deep autoregressive models. However, prior work on
generative retrieval has mostly investigated static benchmarks, while
realistic retrieval applications often involve dynamic environments where
knowledge is temporal and accumulated over time. In this paper, we introduce a
new benchmark called STREAMINGIR, dedicated to quantifying the generalizability
of retrieval methods to dynamically changing corpora derived from StreamingQA,
that simulates realistic retrieval use cases. On this benchmark, we conduct an
in-depth comparative evaluation of bi-encoder and generative retrieval in terms
of performance as well as efficiency under varying degrees of supervision. Our
results suggest that generative retrieval shows (1) detrimental performance
when only supervised data is used for fine-tuning, (2) superior performance
over bi-encoders when only unsupervised data is available, and (3) lower
performance than bi-encoders when both unsupervised and supervised data is used
due to catastrophic forgetting; nevertheless, we show that parameter-efficient
measures can effectively mitigate the issue and result in competitive
performance and efficiency with respect to the bi-encoder baseline. Our results
open up a new potential for generative retrieval in practical dynamic
environments. Our work will be open-sourced.
Comment: Work in progress
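The "parameter-efficient measures" the abstract credits with mitigating catastrophic forgetting can be sketched generically. This is one plausible instance (a LoRA-style low-rank update), not necessarily the paper's exact technique; shapes and rank are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                       # hidden size, adapter rank (illustrative)
W = rng.normal(size=(d, d))        # frozen base weight from the original model
A = rng.normal(size=(d, r)) * 0.01
B = np.zeros((r, d))               # zero-init so adaptation starts exactly at W

def adapted_forward(x):
    # base path stays frozen; only the low-rank delta A @ B would be trained
    # on each new corpus snapshot, limiting how much old knowledge is overwritten
    return x @ W + x @ A @ B

x = rng.normal(size=(1, d))
same_as_base = np.allclose(adapted_forward(x), x @ W)  # holds before training
```

Because only A and B change per corpus update, the frozen base retains what was learned on earlier snapshots, which is why such measures help under continual updates.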
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Enhancing the zero-shot performance of instruction-following models requires
heavy computation, either by scaling the total number of training datasets or
the model size. In this work, we explore how retrieval of soft prompts obtained
through prompt tuning can efficiently assist hard prompts in zero-shot task
generalization. Specifically, we train soft prompt embeddings for each prompt
through prompt tuning, store the samples of the training instances mapped with
the prompt embeddings, and retrieve the corresponding prompt embedding of the
training instance closest to the query instance during inference. While only
adding 0.007% additional parameters, retrieval of soft prompts enhances the
performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets
and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39
percentage points. We also report an interesting finding: retrieving source
embeddings trained on similar answer choice formats matters more than
retrieving those trained on similar task types.
Comment: EMNLP 2023 Findings
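The retrieval step described above can be sketched in a few lines. All embeddings here are random stand-ins: in the actual method the keys come from encoding training instances and the values are soft prompts learned via prompt tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
D_inst, D_prompt, n_tasks = 16, 32, 5

keys = rng.normal(size=(n_tasks, D_inst))            # stored instance embeddings
soft_prompts = rng.normal(size=(n_tasks, D_prompt))  # one tuned soft prompt each

def retrieve_soft_prompt(query_emb):
    # nearest stored instance by cosine similarity; its soft prompt is
    # prepended (in embedding space) to the query at inference
    sims = keys @ query_emb / (
        np.linalg.norm(keys, axis=1) * np.linalg.norm(query_emb))
    return soft_prompts[int(np.argmax(sims))]

# a query close to task 3's stored instances retrieves task 3's prompt
query = keys[3] + 0.05 * rng.normal(size=D_inst)
chosen = retrieve_soft_prompt(query)
```

The lookup itself adds only the stored embedding table, which is why the parameter overhead stays tiny relative to the base model.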
Electrostatic Steering of Thermal Emission with Active Metasurface Control of Delocalized Modes
We theoretically describe and experimentally demonstrate a
graphene-integrated metasurface structure that enables electrically-tunable
directional control of thermal emission. This device consists of a dielectric
slab that acts as a Fabry-Perot (F-P) resonator supporting long-range
delocalized modes bounded on one side by an electrostatically tunable
metal-graphene metasurface. By varying the Fermi level of the graphene, the
accumulated phase of the F-P mode is shifted, which changes the direction of
absorption and emission at a fixed frequency. We directly measure the
frequency- and angle-dependent emissivity of the thermal emission from a
fabricated device heated to 250 °C. Our results show that electrostatic
control allows the thermal emission at 6.61 μm to be continuously steered
over 16°, with a peak emissivity maintained above 0.9. We analyze the
dynamic behavior of the thermal emission steerer theoretically using a Fano
interference model, and use the model to design optimized thermal steerer
structures.
Comment: 8 pages, 4 figures
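The Fano interference model invoked above has a standard lineshape, sigma(eps) = (eps + q)^2 / (eps^2 + 1), whose asymmetry parameter q sets where the peak and the interference dip sit. The values of q and the detuning grid below are illustrative, not fitted to the device.

```python
import numpy as np

def fano(eps, q):
    # generic Fano lineshape in normalized detuning eps = (w - w0) / (gamma/2)
    return (eps + q) ** 2 / (eps ** 2 + 1)

q = 2.0                               # asymmetry parameter (illustrative)
eps = np.linspace(-10, 10, 20001)
sigma = fano(eps, q)

peak_eps = eps[np.argmax(sigma)]      # analytic maximum sits at eps = 1/q
dip_eps = eps[np.argmin(sigma)]       # analytic zero sits at eps = -q
```

Shifting the accumulated F-P phase (here loosely analogous to tuning q or the detuning) moves the peak condition, which is the mechanism behind steering the emission angle at fixed frequency.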
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Pretrained Language Models (LMs) memorize a vast amount of knowledge during
initial pretraining, including information that may violate the privacy of
personal lives and identities. Previous work addressing privacy issues for
language models has mostly focused on data preprocessing and differential
privacy methods, both requiring re-training the underlying LM. We propose
knowledge unlearning as an alternative method to reduce privacy risks for LMs
post hoc. We show that simply applying the unlikelihood training objective to
target token sequences is effective at forgetting them with little to no
degradation of general language modeling performance; it sometimes even
substantially improves the underlying LM with just a few iterations. We also
find that sequential unlearning is better than trying to unlearn all the data
at once and that unlearning is highly dependent on which kind of data (domain)
is forgotten. By showing comparisons with a previous data preprocessing method
known to mitigate privacy risks for LMs, we show that unlearning can give a
stronger empirical privacy guarantee in scenarios where the data vulnerable to
extraction attacks are known a priori while being orders of magnitude more
computationally efficient. We release the code and dataset needed to replicate
our results at https://github.com/joeljang/knowledge-unlearning
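The unlikelihood-style unlearning step above amounts to gradient *ascent* on the usual LM loss for the target sequence, so the model's probability of emitting it drops. A toy sketch, where the "model" is just a table of logits over a small vocabulary rather than a real LM:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 10
logits = rng.normal(size=V)        # stand-in for an LM's next-token logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

target = 3                         # token we want the model to forget
p_before = softmax(logits)[target]

lr = 0.5
for _ in range(50):
    p = softmax(logits)
    # grad of -log p[target] wrt logits is (p - onehot(target));
    # *ascending* that loss pushes p[target] toward zero
    grad = p.copy()
    grad[target] -= 1.0
    logits += lr * grad

p_after = softmax(logits)[target]
```

Because only a few gradient steps on the targeted sequences are needed, this is far cheaper than retraining the LM with the data removed, matching the abstract's efficiency claim.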
How Well Do Large Language Models Truly Ground?
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause
issues such as hallucinations, lack of control, and difficulties in integrating
variable knowledge. To mitigate this, LLMs can be probed to generate responses
by grounding on external context, often given as input (knowledge-augmented
models). Yet, previous research is often confined to a narrow view of the term
"grounding", focusing only on whether the response contains the correct answer,
which does not ensure the reliability of the entire response. To
address this limitation, we introduce a strict definition of grounding: a model
is considered truly grounded when its responses (1) fully utilize necessary
knowledge from the provided context, and (2) don't exceed the knowledge within
the contexts. We introduce a new dataset and a grounding metric to assess this
new definition and perform experiments across 13 LLMs of different sizes and
training methods to provide insights into the factors that influence grounding
performance. Our findings contribute to a better understanding of how to
improve grounding capabilities and suggest an area of improvement toward more
reliable and controllable LLM applications.
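The two-part definition above can be read set-theoretically: a response is truly grounded iff it covers every necessary fact from the context and states nothing beyond the context. The sketch below is a hypothetical rendering of that definition over sets of fact strings; the paper's actual metric extracts such facts from free-form text.

```python
def is_truly_grounded(response_facts, necessary_facts, context_facts):
    # condition (1): fully utilize the necessary knowledge from the context
    covers_necessary = necessary_facts <= response_facts
    # condition (2): do not exceed the knowledge within the context
    stays_in_context = response_facts <= context_facts
    return covers_necessary and stays_in_context

context = {"capital(France)=Paris", "population(Paris)=2.1M"}
necessary = {"capital(France)=Paris"}

ok = is_truly_grounded({"capital(France)=Paris"}, necessary, context)
bad = is_truly_grounded({"capital(France)=Paris", "mayor(Paris)=X"},
                        necessary, context)   # adds a fact not in context
```

Note how `bad` fails even though it contains the correct answer: answer-containment alone is exactly the narrow view of grounding the abstract argues against.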
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Language models (LMs) with less than 100B parameters are known to perform
poorly on chain-of-thought (CoT) reasoning in contrast to large LMs when
solving unseen tasks. In this work, we aim to equip smaller LMs with the
step-by-step reasoning capability by instruction tuning with CoT rationales. In
order to achieve this goal, we first introduce a new instruction-tuning dataset
called the CoT Collection, which augments the existing Flan Collection
(including only 9 CoT tasks) with an additional 1.84 million rationales across
1,060 tasks. We show that CoT fine-tuning Flan-T5 (3B & 11B) with the CoT
Collection enables smaller LMs to have better CoT capabilities on unseen tasks.
On the BIG-Bench-Hard (BBH) benchmark, we report an average improvement of
+4.34% (Flan-T5 3B) and +2.60% (Flan-T5 11B), in terms of zero-shot task
accuracy. Furthermore, we show that instruction tuning with CoT Collection
allows LMs to possess stronger few-shot learning capabilities on 4
domain-specific tasks, resulting in an improvement of +2.24% (Flan-T5 3B) and
+2.37% (Flan-T5 11B), even outperforming ChatGPT, which utilizes
demonstrations up to the maximum input length, by a +13.98% margin. Our code,
the CoT Collection data, and model checkpoints are publicly available.
Comment: EMNLP 2023 (Main Conference)
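The fine-tuning recipe above hinges on targets that contain a rationale before the answer. A hypothetical formatter for one such training instance, with field names and templates that are illustrative rather than the CoT Collection's actual schema:

```python
def format_cot_example(instruction, question, rationale, answer):
    # source: task instruction plus the question, with a CoT trigger phrase
    source = f"{instruction}\n\nQuestion: {question}\nLet's think step by step."
    # target: the LM learns to emit the reasoning before the final answer
    target = f"{rationale} So the answer is {answer}."
    return source, target

src, tgt = format_cot_example(
    "Solve the arithmetic word problem.",
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "Tom starts with 3 apples and adds 2, giving 3 + 2 = 5.",
    "5")
```

Training on (source, target) pairs of this shape is what transfers step-by-step reasoning to smaller LMs, rather than any change to the model architecture.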
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for
learning diverse, individual perspectives. In this work, we study the
Reinforcement Learning from Personalized Human Feedback (RLPHF) problem,
wherein LLMs are
aligned to multiple (sometimes conflicting) preferences by modeling alignment
as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong
single-objective baselines, we show that we can achieve personalized alignment
by decomposing preferences into multiple dimensions. These dimensions are
defined based on personalizations that are declared as desirable by the user.
In this work, we show that they can be efficiently trained independently in a
distributed manner and combined effectively post-hoc through parameter merging.
The code is available at https://github.com/joeljang/RLPHF.
Comment: Preprint
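The post-hoc parameter merging step above can be sketched as a weighted average of per-preference expert parameters. The two-tensor "models" below stand in for full LLM state dicts, and the mixing weights are illustrative of how a user's declared preferences could be combined.

```python
import numpy as np

def merge_params(param_dicts, weights):
    # combine independently trained experts by averaging their parameters,
    # weighted by how strongly each preference dimension should apply
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize mixing weights
    merged = {}
    for name in param_dicts[0]:
        merged[name] = sum(w * pd[name] for w, pd in zip(weights, param_dicts))
    return merged

expert_a = {"w": np.zeros((2, 2)), "b": np.zeros(2)}   # preference A expert
expert_b = {"w": np.ones((2, 2)), "b": np.ones(2)}     # preference B expert
soup = merge_params([expert_a, expert_b], weights=[1.0, 3.0])
```

Because merging happens entirely post hoc, each expert can be trained independently and in a distributed fashion, and a new preference combination costs only an average over parameters rather than a new RLHF run.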