Structured Pruning Learns Compact and Accurate Models
The growing size of neural language models has led to increased attention in
model compression. The two predominant approaches are pruning, which gradually
removes weights from a pre-trained model, and distillation, which trains a
smaller compact model to match a larger one. Pruning methods can significantly
reduce model size but rarely achieve speedups as large as distillation does.
Distillation methods, in turn, require large amounts of unlabeled data and are
expensive to train. In this work, we propose a task-specific structured pruning
method CoFi (Coarse- and Fine-grained Pruning), which delivers highly
parallelizable subnetworks and matches the distillation methods in both
accuracy and latency, without resorting to any unlabeled data. Our key insight
is to jointly prune coarse-grained (e.g., layers) and fine-grained (e.g., heads
and hidden units) modules, which controls the pruning decision of each
parameter with masks of different granularity. We also devise a layerwise
distillation strategy to transfer knowledge from unpruned to pruned models
during optimization. Our experiments on GLUE and SQuAD datasets show that CoFi
yields models with over 10x speedups with a small accuracy drop, showing its
effectiveness and efficiency compared to previous pruning and distillation
approaches.
Comment: Accepted to ACL 2022; the code and models are available at
https://github.com/princeton-nlp/CoFiPruning
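To make the granularity idea concrete, here is a minimal numpy sketch of how a coarse mask and fine-grained masks can jointly gate the same parameters. All names, shapes, and the multiplicative gating rule are illustrative assumptions, not the CoFi implementation:

```python
# Toy sketch: a coarse (sub-layer) mask and fine (per-head) masks jointly
# decide whether each parameter survives pruning.
import numpy as np

rng = np.random.default_rng(0)

num_heads, head_dim, hidden = 4, 16, 64
W_attn = rng.standard_normal((num_heads, head_dim, hidden))  # per-head weights

z_layer = 1.0                             # coarse mask: keep the sub-layer
z_head = np.array([1.0, 0.0, 1.0, 0.0])   # fine masks: prune heads 1 and 3

# Each parameter's pruning decision is the product of every mask covering it;
# a coarse zero removes everything regardless of the fine masks.
effective = z_layer * z_head[:, None, None] * W_attn
print("surviving parameters:", int(np.count_nonzero(effective)))
```

Because pruning acts on whole structures (layers, heads, hidden units) rather than scattered individual weights, the surviving subnetwork stays dense and hardware-friendly, which is what the abstract means by highly parallelizable.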
Poisoning Retrieval Corpora by Injecting Adversarial Passages
Dense retrievers have achieved state-of-the-art performance in various
information retrieval tasks, but to what extent can they be safely deployed in
real-world applications? In this work, we propose a novel attack for dense
retrieval systems in which a malicious user generates a small number of
adversarial passages by perturbing discrete tokens to maximize similarity with
a provided set of training queries. When these adversarial passages are
inserted into a large retrieval corpus, we show that this attack is highly
effective at fooling these systems into retrieving them for queries that were not
seen by the attacker. More surprisingly, these adversarial passages can
directly generalize to out-of-domain queries and corpora with a high attack
success rate -- for instance, we find that 50 generated passages optimized on
Natural Questions can mislead >94% of questions posed in financial documents or
online forums. We also benchmark and compare a range of state-of-the-art dense
retrievers, both unsupervised and supervised. Although different systems
exhibit varying levels of vulnerability, we show they can all be successfully
attacked by injecting up to 500 passages, a small fraction compared to a
retrieval corpus of millions of passages.
Comment: EMNLP 2023; our code is available at
https://github.com/princeton-nlp/corpus-poisoning
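For intuition about the attack's shape, here is a toy sketch under strong assumptions: a frozen bag-of-embeddings "retriever" with mean pooling, random query embeddings, and exhaustive greedy token swaps. This is not the paper's actual gradient-guided optimization:

```python
# Greedily swap discrete tokens in a passage to maximize its average
# similarity with a set of training queries (toy setting).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, passage_len, n_queries = 100, 32, 8, 5

E = rng.standard_normal((vocab_size, dim))       # token embedding table
queries = rng.standard_normal((n_queries, dim))  # frozen query embeddings

def passage_embedding(token_ids):
    return E[token_ids].mean(axis=0)             # toy encoder: mean pooling

def attack_score(token_ids):
    return float((queries @ passage_embedding(token_ids)).mean())

tokens = rng.integers(0, vocab_size, size=passage_len)
for _ in range(3):                               # a few coordinate-ascent sweeps
    for pos in range(passage_len):
        # Try every vocabulary token at this position; keep the best swap.
        scores = [attack_score(np.concatenate([tokens[:pos], [v], tokens[pos + 1:]]))
                  for v in range(vocab_size)]
        tokens[pos] = int(np.argmax(scores))
print("final mean query similarity:", attack_score(tokens))
```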
Privacy Implications of Retrieval-Based Language Models
Retrieval-based language models (LMs) have demonstrated improved
interpretability, factuality, and adaptability compared to their parametric
counterparts, by incorporating retrieved text from external datastores. While
it is well known that parametric models are prone to leaking private data, it
remains unclear how the addition of a retrieval datastore impacts model
privacy. In this work, we present the first study of privacy risks in
retrieval-based LMs, particularly kNN-LMs. Our goal is to explore the optimal
design and training procedure in domains where privacy is of concern, aiming to
strike a balance between utility and privacy. Crucially, we find that kNN-LMs
are more susceptible to leaking private information from their private
datastore than parametric models. We further explore mitigations of privacy
risks. When private information is targeted and readily detected in the text,
we find that a simple sanitization step can completely eliminate the risk,
while decoupling query and key encoders achieves an even better utility-privacy
trade-off. Otherwise, we consider strategies of mixing public and private data
in both datastore and encoder training. While these methods offer modest
improvements, they leave considerable room for future work. Together, our
findings provide insights for practitioners to better understand and mitigate
privacy risks in retrieval-based LMs. Our code is available at:
https://github.com/Princeton-SysML/kNNLM_privacy
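A minimal sketch of the "simple sanitization step" idea for targeted, easily detected private information: scrub it from datastore text before the datastore is built. The regex patterns and placeholder tokens below are hypothetical, not the paper's pipeline:

```python
# Replace easily detected private strings (emails, phone numbers) with
# placeholders before indexing documents into a retrieval datastore.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def sanitize(text: str) -> str:
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)

datastore_docs = ["Contact Jane at jane.doe@example.com or 555-123-4567."]
print([sanitize(d) for d in datastore_docs])
```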
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
The information stored in large language models (LLMs) falls out of date
quickly, and retraining from scratch is often not an option. This has recently
given rise to a range of techniques for injecting new facts through updating
model weights. Current evaluation paradigms are extremely limited, mainly
validating the recall of edited facts, but changing one fact should cause
rippling changes to the model's related beliefs. If we edit the UK Prime
Minister to now be Rishi Sunak, then we should get a different answer to "Who is
married to the British Prime Minister?" In this work, we present a benchmark
MQuAKE (Multi-hop Question Answering for Knowledge Editing) comprising
multi-hop questions that assess whether edited models correctly answer
questions where the answer should change as an entailed consequence of edited
facts. While we find that current knowledge-editing approaches can recall
edited facts accurately, they fail catastrophically on the constructed
multi-hop questions. We thus propose a simple memory-based approach, MeLLo,
which stores all edited facts externally while prompting the language model
iteratively to generate answers that are consistent with the edited facts.
While MQuAKE remains challenging, we show that MeLLo scales well with LLMs (up
to 175B) and outperforms previous model editors by a large margin.
Comment: Our code and datasets are available at
https://github.com/princeton-nlp/MQuAKE
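For intuition, here is a toy sketch of the memory-based idea: edited facts live in an external store, and each sub-question's tentative answer is overridden when a relevant stored edit contradicts it. The word-overlap retrieval, the threshold, and the facts are illustrative assumptions, not MeLLo itself:

```python
# Edited facts stored outside the model; sub-answers are checked against them.
edited_facts = {
    "UK Prime Minister": "Rishi Sunak",
    "Rishi Sunak spouse": "Akshata Murty",
}

def retrieve_edit(subquestion):
    # Naive retrieval: pick the edit whose subject overlaps the sub-question most.
    words = set(subquestion.lower().split())
    return max(edited_facts.items(),
               key=lambda kv: len(set(kv[0].lower().split()) & words))

def answer_subquestion(subquestion, tentative_answer):
    """Keep the model's tentative answer unless a stored edit contradicts it."""
    subject, new_object = retrieve_edit(subquestion)
    overlap = set(subject.lower().split()) & set(subquestion.lower().split())
    return new_object if len(overlap) >= 2 else tentative_answer

# Multi-hop: "Who is married to the British Prime Minister?"
hop1 = answer_subquestion("Who is the UK Prime Minister?", "Boris Johnson")
hop2 = answer_subquestion(f"Who is {hop1} married to?", "Carrie Johnson")
print(hop1, "/", hop2)
```

Because the edits sit in memory rather than in the weights, the second hop automatically reflects the first hop's edited answer, which is exactly the rippling behavior the benchmark tests.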
REST: Retrieval-Based Speculative Decoding
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm
designed to speed up language model generation. The key insight driving the
development of REST is the observation that the process of text generation
often includes certain common phrases and patterns. Unlike previous methods that
rely on a draft language model for speculative decoding, REST harnesses the
power of retrieval to generate draft tokens. This method draws from the
reservoir of existing knowledge, retrieving and employing relevant tokens based
on the current context. Its plug-and-play nature allows for seamless
integration and acceleration of any language model, all without necessitating
additional training. When benchmarked on 7B and 13B language models in a
single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on
code or text generation. The code of REST is available at
https://github.com/FasterDecoding/REST
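A self-contained sketch of the core loop under toy assumptions: a list-of-tokens datastore, exact suffix matching, sequential verification (REST batches this in one forward pass), and a stub standing in for the target model:

```python
# Retrieval-based speculative decoding, toy version: draft tokens come from a
# datastore match on the current context, then the target model verifies them.
def retrieve_draft(context, datastore, max_draft=4):
    """Return the tokens that follow the longest datastore match of a context suffix."""
    for n in range(min(len(context), 8), 0, -1):
        suffix = context[-n:]
        for i in range(len(datastore) - n):
            if datastore[i:i + n] == suffix:
                return datastore[i + n:i + n + max_draft]
    return []

def model_next_token(context):
    # Stub for the target LM's greedy next token (toy rule: increment).
    return context[-1] + 1

def generate(context, datastore, steps=5):
    for _ in range(steps):
        for tok in retrieve_draft(context, datastore):
            if model_next_token(context) == tok:    # verify each draft token
                context.append(tok)
            else:
                break
        context.append(model_next_token(context))  # always emit one verified token
    return context

datastore = list(range(100))   # toy corpus where token i is followed by i + 1
print(generate([10, 11, 12], datastore))
```

When the datastore agrees with the model, each step accepts several draft tokens for the price of verification, which is where the speedup comes from; no draft model is trained or run.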
Generating regular expressions from natural language specifications: A semantics-based approach and an empirical study
Translating natural language descriptions into executable programs is a fundamental problem in computational linguistics. Recent research proposes neural-network-based approaches to the problem, typically training a sequence-to-sequence model with a syntax-based objective: maximum likelihood estimation (MLE). Such syntax-based approaches do not effectively address the goal of generating semantically correct programs, because they fail to handle Program Aliasing, i.e., the fact that semantically equivalent programs may take many syntactically different forms. In this thesis, we focus on generating regular expressions from natural language, an important instance of the program-synthesis problem, and study the task in two aspects.
First, we address the issue of Program Aliasing, and propose a semantics-based approach named SemRegex. Different from the existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. The semantic correctness is measured using the DFA-equivalence oracle, random test cases, and distinguishing test cases. The experiments on three public datasets demonstrate the superiority of SemRegex over the existing state-of-the-art approaches.
Second, given that existing approaches use only synthetic data for both training and validation/test datasets, we raise a question: are these approaches effective in addressing various real-world situations? To explore this question, we conduct a characteristic study comparing two synthetic datasets used in recent research with a real-world dataset collected from the Internet, and an experimental study applying an existing state-of-the-art approach to the real-world dataset. Our results show distinct characteristics between the synthetic datasets and the real-world dataset, and the existing state-of-the-art approach achieves extremely low effectiveness when evaluated on real-world data. We also provide an initial analysis of some of those challenging cases and discuss future directions.
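To illustrate the random-test-case notion of semantic correctness mentioned above (a sketch only, standing in for neither SemRegex's training objective nor its DFA-equivalence oracle): two syntactically different regexes are treated as equivalent if they agree on many random strings.

```python
# Approximate semantic equivalence of two regexes via random test cases.
import random
import re

def agree_on_random_tests(pattern_a, pattern_b, alphabet="ab", trials=2000):
    ra, rb = re.compile(pattern_a), re.compile(pattern_b)
    rng = random.Random(0)
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 10)))
        if bool(ra.fullmatch(s)) != bool(rb.fullmatch(s)):
            return False                    # found a distinguishing test case
    return True

# "a+" and "aa*" are aliases: different syntax, same language.
print(agree_on_random_tests(r"a+", r"aa*"))   # True
print(agree_on_random_tests(r"a+", r"a*"))    # False (empty string differs)
```

A syntax-based MLE objective would penalize "aa*" when the reference is "a+", even though the two programs are semantically identical; a semantics-based reward does not.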
Training Language Models with Memory Augmentation
Recent work has improved language models remarkably by equipping them with a
non-parametric memory component. However, most existing approaches only
introduce memories at testing time, or represent them using a separately
trained encoder -- resulting in sub-optimal training of the language model. In
this work, we present TRIME, a novel yet simple training approach designed for
training language models with memory augmentation. Our approach uses a training
objective that directly takes in-batch examples as accessible memory. We also
present new methods for memory construction and data batching, which are used
for adapting to different sets of memories -- local, long-term, and external
memory -- at testing time. We evaluate our approach on multiple language
modeling and machine translation benchmarks. We find that simply replacing the
vanilla language modeling objective with ours greatly reduces the perplexity,
without modifying the model architecture or incorporating extra context (e.g.,
18.70 → 17.76 on WikiText-103). We further augment language models with
long-range contexts and external knowledge and demonstrate significant gains
over previous memory-augmented approaches.
Comment: Our code and models will be available at
https://github.com/princeton-nlp/TRIME
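A simplified numpy sketch of the flavor of objective described above: in-batch hidden states serve as accessible memory, and the loss pools the target token's vocabulary logit with the logits of memory entries whose next token matches the target. Shapes and the pooling rule are assumptions, not the TRIME implementation:

```python
# Contrastive-style LM loss where in-batch examples act as memory.
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

rng = np.random.default_rng(0)
vocab, dim = 50, 16

W_out = rng.standard_normal((vocab, dim))   # output token embeddings
h = rng.standard_normal(dim)                # hidden state at position t
target = 7                                  # gold next token

# In-batch memory: hidden states from other positions and their next tokens.
mem_keys = rng.standard_normal((20, dim))
mem_next = rng.integers(0, vocab, size=20)

# One softmax over vocabulary logits and memory logits together.
lp = log_softmax(np.concatenate([W_out @ h, mem_keys @ h]))

# Positives: the target's vocabulary entry plus every memory entry whose
# stored next token equals the target.
positives = np.concatenate([[lp[target]], lp[vocab:][mem_next == target]])
loss = -np.log(np.exp(positives).sum())
print("TRIME-style loss:", float(loss))
```

At test time the same scoring extends beyond the batch to local, long-term, or external memories, which is why no architecture change is needed.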
Should You Mask 15% in Masked Language Modeling?
Masked language models conventionally use a masking rate of 15% due to the
belief that more masking would provide insufficient context to learn good
representations, and less masking would make training too expensive.
Surprisingly, we find that masking up to 40% of input tokens can outperform the
15% baseline, and even masking 80% can preserve most of the performance, as
measured by fine-tuning on downstream tasks. Increasing the masking rate has
two distinct effects, which we investigate through careful ablations: (1) A
larger proportion of input tokens are corrupted, reducing the context size and
creating a harder task, and (2) models perform more predictions, which benefits
training. We observe that larger models, which have more capacity to tackle
harder tasks, particularly favor higher masking rates. We also find that even more
sophisticated masking schemes such as span masking or PMI masking can benefit
from higher masking rates, albeit to a smaller extent. Our results contribute
to a better understanding of masked language modeling and shed light on more
efficient language pre-training.
Comment: The code and pre-trained models are available at
https://github.com/princeton-nlp/DinkyTrain
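To pin down the variable being studied, here is a minimal sketch of uniform masking at a configurable rate. It is illustrative only and omits BERT's 80/10/10 corruption rule, span and PMI masking, and the authors' actual pipeline:

```python
# Mask a configurable fraction of input tokens for masked language modeling.
import random

def mask_tokens(tokens, mask_rate=0.40, mask_token="[MASK]", seed=0):
    """Mask a fraction `mask_rate` of positions uniformly at random."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [mask_token if i in positions else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}   # predict only masked positions
    return corrupted, labels

toks = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(toks, mask_rate=0.40))
```

Raising `mask_rate` both shrinks the visible context (a harder task) and enlarges `labels` (more predictions per example), the two effects the abstract disentangles.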