Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans
The recently introduced Controlled Text Reduction (CTR) task isolates the
text generation step within typical summarization-style tasks. It does so by
challenging models to generate coherent text conforming to pre-selected content
within the input text ("highlights"). This framing enables increased
modularity in summarization-like tasks, allowing a single CTR model to be
coupled with various content-selection setups and modules. However, there are currently
no reliable CTR models, while the performance of the existing baseline for the
task is mediocre, falling short of practical utility. Here, we address this gap
by introducing a high-quality, open-source CTR model that tackles two prior key
limitations: inadequate enforcement of the content-preservation constraint, and
suboptimal silver training data. Addressing these, we amplify the
content-preservation constraint in both training, via RL, and inference, via a
controlled decoding strategy. Further, we substantially improve the silver
training data quality via GPT-4 distillation. Overall, pairing the distilled
dataset with the highlight-adherence strategies yields marked gains over the
current baseline, of up to 30 ROUGE-L points, providing a reliable CTR model
for downstream use.
Comment: EMNLP 2023, Findings
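As an illustration of the inference-side idea, the sketch below implements a simple highlight-aware logits processor that boosts tokens from not-yet-covered highlights during generation. The model choice, bonus scheme, and class names are assumptions for illustration only, not the paper's actual controlled decoding strategy.

```python
# Illustrative sketch only: nudge generation toward tokens from highlight
# spans that have not yet been produced. Not the paper's exact method.
import torch
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          LogitsProcessor, LogitsProcessorList)

class HighlightBonusProcessor(LogitsProcessor):
    def __init__(self, highlight_token_ids, bonus=2.0):
        self.remaining = set(highlight_token_ids)  # highlight tokens not yet generated
        self.bonus = bonus

    def __call__(self, input_ids, scores):
        # Drop highlight tokens that have already appeared in the output.
        self.remaining -= set(input_ids[0].tolist())
        if self.remaining:
            idx = torch.tensor(sorted(self.remaining), device=scores.device)
            scores[:, idx] += self.bonus  # boost uncovered highlight tokens
        return scores

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # placeholder model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

document = ("The storm closed three highways. Schools remained open. "
            "Power was restored by noon.")
highlights = ["The storm closed three highways.", "Power was restored by noon."]

highlight_ids = tokenizer(" ".join(highlights), add_special_tokens=False).input_ids
inputs = tokenizer("summarize: " + document, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=60,
    logits_processor=LogitsProcessorList([HighlightBonusProcessor(highlight_ids)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```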
Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Data contamination has become prevalent and challenging with the rise of
models pretrained on large automatically-crawled corpora. For closed models,
the training data becomes a trade secret, and even for open models, it is not
trivial to detect contamination. Strategies such as leaderboards with hidden
answers, or using test data which is guaranteed to be unseen, are expensive and
become fragile with time. Assuming that all relevant actors value clean test
data and will cooperate to mitigate data contamination, what can be done? We
propose three strategies that can make a difference: (1) Test data made public
should be encrypted with a public key and licensed to disallow derivative
distribution; (2) demand training exclusion controls from closed API holders,
and protect your test data by refusing to evaluate without them; (3) avoid data
which appears with its solution on the internet, and release the web-page
context of internet-derived data along with the data. These strategies are
practical and can be effective in preventing data contamination.
Comment: Accepted to EMNLP 2023
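For strategy (1), a minimal sketch of what releasing encrypted test data could look like is shown below. The use of Fernet symmetric encryption from the `cryptography` package and the file names are assumptions, not tooling prescribed by the paper; the key is published alongside the ciphertext so evaluators can decrypt locally while crawlers never ingest the plaintext.

```python
# Minimal sketch: encrypt the test set before uploading it anywhere public.
# File names and the choice of Fernet are illustrative assumptions.
from cryptography.fernet import Fernet

# Generate a key once and publish it next to the encrypted archive.
key = Fernet.generate_key()
with open("test_set.key", "wb") as f:
    f.write(key)

# Encrypt the plaintext test set.
with open("test_set.jsonl", "rb") as f:
    ciphertext = Fernet(key).encrypt(f.read())
with open("test_set.jsonl.enc", "wb") as f:
    f.write(ciphertext)

# Consumers decrypt locally with the published key.
with open("test_set.jsonl.enc", "rb") as f:
    plaintext = Fernet(key).decrypt(f.read())
```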
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Transformer-based language models (LMs) are at the core of modern NLP, but
their internal prediction construction process is opaque and largely not
understood. In this work, we make a substantial step towards unveiling this
underlying prediction process, by reverse-engineering the operation of the
feed-forward network (FFN) layers, one of the building blocks of transformer
models. We view the token representation as a changing distribution over the
vocabulary, and the output from each FFN layer as an additive update to that
distribution. Then, we analyze the FFN updates in the vocabulary space, showing
that each update can be decomposed into sub-updates corresponding to single FFN
parameter vectors, each promoting concepts that are often human-interpretable.
We then leverage these findings for controlling LM predictions, where we reduce
the toxicity of GPT2 by almost 50%, and for improving computation efficiency
with a simple early exit rule, saving 20% of computation on average.
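A minimal sketch of this kind of vocabulary-space inspection, assuming GPT-2 from the `transformers` library: each row of the second FFN matrix (a value vector) is projected through the output embedding, and its top-scoring tokens hint at the concept that sub-update promotes. The chosen layer and value-vector indices are arbitrary illustrations.

```python
# Sketch: project FFN value vectors into the vocabulary space and list the
# tokens they promote. Layer and column indices are arbitrary examples.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

layer, k = 10, 5                      # transformer block to inspect, tokens to show
E = model.lm_head.weight              # (vocab_size, hidden_size) output embedding
# c_proj.weight has shape (d_ff, hidden_size); each row is one value vector.
values = model.transformer.h[layer].mlp.c_proj.weight

with torch.no_grad():
    for col in (12, 345, 2048):       # arbitrary value vectors to inspect
        logits = E @ values[col]      # project the sub-update into vocabulary space
        top = torch.topk(logits, k).indices
        print(f"layer {layer}, value {col}:",
              tokenizer.convert_ids_to_tokens(top.tolist()))
```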
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Large language models (LLMs) have been shown to possess impressive
capabilities, while also raising crucial concerns about the faithfulness of
their responses. A primary issue arising in this context is the management of
(un)answerable queries by LLMs, which often results in hallucinatory behavior
due to overconfidence. In this paper, we explore the behavior of LLMs when
presented with (un)answerable queries. We ask: do models represent the fact
that the question is (un)answerable when generating a hallucinatory answer? Our
results show strong indications that such models encode the answerability of an
input query, with the representation of the first decoded token often being a
strong indicator. These findings shed new light on the spatial organization
within the latent representations of LLMs, unveiling previously unexplored
facets of these models. Moreover, they pave the way for the development of
improved decoding techniques with better adherence to factual generation,
particularly in scenarios where query (un)answerability is a concern.
Comment: EMNLP 2023
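A minimal sketch of the kind of probe this finding suggests, assuming GPT-2 and a handful of toy queries: take the hidden state at the position that produces the first decoded token and fit a linear classifier to predict answerability. The model, layer, and labels here are illustrative assumptions, not the paper's exact probing setup.

```python
# Sketch: linear probe on the hidden state feeding the first decoded token,
# predicting whether the query is answerable. Toy data for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def first_token_hidden(prompt, layer=-1):
    """Hidden state (last layer by default) at the position that predicts the first answer token."""
    ids = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].numpy()  # last prompt position

# Toy labeled queries: 1 = answerable, 0 = unanswerable.
queries = [
    ("What is the capital of France? Answer:", 1),
    ("Who wrote Hamlet? Answer:", 1),
    ("What color is the number seven? Answer:", 0),
    ("When did the king of the Moon abdicate? Answer:", 0),
]
X = [first_token_hidden(q) for q, _ in queries]
y = [label for _, label in queries]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```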