Structured Pruning Learns Compact and Accurate Models
The growing size of neural language models has led to increased attention in
model compression. The two predominant approaches are pruning, which gradually
removes weights from a pre-trained model, and distillation, which trains a
smaller compact model to match a larger one. Pruning methods can significantly
reduce the model size but rarely achieve speedups as large as distillation.
Distillation methods, however, require large amounts of unlabeled data and are
expensive to train. In this work, we propose a task-specific structured pruning
method CoFi (Coarse- and Fine-grained Pruning), which delivers highly
parallelizable subnetworks and matches the distillation methods in both
accuracy and latency, without resorting to any unlabeled data. Our key insight
is to jointly prune coarse-grained (e.g., layers) and fine-grained (e.g., heads
and hidden units) modules, which controls the pruning decision of each
parameter with masks of different granularity. We also devise a layerwise
distillation strategy to transfer knowledge from unpruned to pruned models
during optimization. Our experiments on GLUE and SQuAD datasets show that CoFi
yields models with over 10x speedups and only a small accuracy drop, showing its
effectiveness and efficiency compared to previous pruning and distillation
approaches.
Comment: Accepted to ACL 2022; The code and models are available at
https://github.com/princeton-nlp/CoFiPrunin
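The key insight above, controlling each parameter's pruning decision with masks of different granularity, can be sketched as a product of masks. The function names, shapes, and hard 0/1 mask values below are illustrative only, not CoFi's actual implementation (which learns relaxed masks jointly with a distillation objective):

```python
import numpy as np

def effective_head_mask(z_layer, z_heads):
    """Combine a coarse-grained and a fine-grained pruning mask.

    z_layer: scalar in {0, 1} -- coarse mask that can drop the whole
             attention sublayer at once.
    z_heads: (num_heads,) array in {0, 1} -- fine mask over heads.
    A head's parameters survive only if every mask covering them is on,
    so the product couples pruning decisions across granularities.
    """
    return z_layer * np.asarray(z_heads)

def apply_masks(per_head_out, z_layer, z_heads):
    # per_head_out: (batch, seq, num_heads, head_dim) attention outputs.
    mask = effective_head_mask(z_layer, z_heads)
    return per_head_out * mask[None, None, :, None]
```

Because whole heads (or layers) are zeroed rather than scattered weights, the surviving subnetwork stays dense and highly parallelizable, which is where the latency gains come from.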
MABEL: Attenuating Gender Bias using Textual Entailment Data
Pre-trained language models encode undesirable social biases, which are
further exacerbated in downstream use. To this end, we propose MABEL (a Method
for Attenuating Gender Bias using Entailment Labels), an intermediate
pre-training approach for mitigating gender bias in contextualized
representations. Key to our approach is the use of a contrastive learning
objective on counterfactually augmented, gender-balanced entailment pairs from
natural language inference (NLI) datasets. We also introduce an alignment
regularizer that pulls identical entailment pairs along opposite gender
directions closer. We extensively evaluate our approach on intrinsic and
extrinsic metrics, and show that MABEL outperforms previous task-agnostic
debiasing approaches in terms of fairness. It also preserves task performance
after fine-tuning on downstream tasks. Together, these findings demonstrate the
suitability of NLI data as an effective means of bias mitigation, as opposed to
the unlabeled sentences used in prior work. Finally, we identify that
existing approaches often use evaluation settings that are insufficient or
inconsistent. We make an effort to reproduce and compare previous methods, and
call for unifying the evaluation settings across gender debiasing methods for
better future comparison.
Comment: Accepted to EMNLP 2022. Code and models are publicly available at
https://github.com/princeton-nlp/mabe
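The contrastive objective on entailment pairs can be illustrated with a generic in-batch InfoNCE loss. This is a simplified sketch, not MABEL's exact loss (which additionally includes the alignment regularizer described above); the shapes and temperature value are assumptions:

```python
import numpy as np

def contrastive_loss(premise_emb, hypothesis_emb, tau=0.05):
    """InfoNCE-style loss over a batch of entailment pairs.

    premise_emb, hypothesis_emb: (n, d) L2-normalized embeddings; row i
    of each matrix is one entailment pair, and the other rows in the
    batch serve as in-batch negatives.
    """
    sims = premise_emb @ hypothesis_emb.T / tau          # (n, n) similarities
    sims = sims - sims.max(axis=1, keepdims=True)        # numerical stability
    log_prob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # pull pair i together
```

Training on counterfactually augmented, gender-balanced pairs means each premise appears with both gender variants, so the representation the loss pulls together cannot rely on gendered cues.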
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged
moderate-sized large language models (LLMs) highlights the potential of
building smaller yet powerful LLMs. Nevertheless, the cost of training such
models from scratch on trillions of tokens remains high. In this work, we study
structured pruning as an effective means to develop smaller LLMs from
pre-trained, larger models. Our approach employs two key techniques: (1)
targeted structured pruning, which prunes a larger model to a specified target
shape by removing layers, heads, and intermediate and hidden dimensions in an
end-to-end manner, and (2) dynamic batch loading, which dynamically updates the
composition of sampled data in each training batch based on varying losses
across different domains. We demonstrate the efficacy of our approach by
presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B
and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art
open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA
models, on a wide range of downstream and instruction tuning evaluations, while
requiring only 3% of compute compared to training such models from scratch.
This work provides compelling evidence that leveraging existing LLMs with
structured pruning is a far more cost-effective approach for building smaller
LLMs.
Comment: The code and models are available at
https://github.com/princeton-nlp/LLM-Shearin
Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models
Pre-trained masked language models successfully perform few-shot learning by
formulating downstream tasks as text infilling. However, as a strong
alternative in full-shot settings, discriminative pre-trained models like
ELECTRA do not fit into the paradigm. In this work, we adapt prompt-based
few-shot learning to ELECTRA and show that it outperforms masked language
models in a wide range of tasks. ELECTRA is pre-trained to distinguish whether
a token is generated or original. We naturally extend that to prompt-based
few-shot learning by training to score the originality of the target options
without introducing new parameters. Our method can be easily adapted to tasks
involving multi-token predictions without extra computation overhead. Analysis
shows that ELECTRA learns distributions that align better with downstream
tasks.
Comment: Accepted to EMNLP 2022; The code is available at
https://github.com/facebookresearch/ELECTRA-Fewshot-Learnin
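Scoring the "originality" of target options can be sketched as follows: fill each candidate option into the prompt, read off the discriminator's per-token probability that each option token is original, and pick the highest-scoring option. The averaging-over-tokens choice and the function names are assumptions for illustration:

```python
def option_score(token_original_probs):
    """Score one candidate option by averaging the discriminator's
    per-token 'is original' probabilities for its tokens. Averaging
    handles multi-token options without new parameters."""
    return sum(token_original_probs) / len(token_original_probs)

def pick_option(per_option_token_probs):
    # per_option_token_probs[i] holds the token-level originality
    # probabilities for the prompt with option i filled in; choose the
    # option the discriminator finds most 'original'.
    scores = [option_score(p) for p in per_option_token_probs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Since the pre-trained discriminator head already outputs these probabilities, no new parameters are introduced, which is what makes the few-shot setting tractable.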
Training Trajectories of Language Models Across Scales
Scaling up language models has led to unprecedented performance gains, but
little is understood about how the training dynamics change as models get
larger. How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors? In this
paper, we analyze the intermediate training checkpoints of differently sized
OPT models (Zhang et al., 2022), ranging from 125M to 175B parameters, on next-token
prediction, sequence-level generation, and downstream tasks. We find that 1) at
a given perplexity and independent of model size, a similar subset of training
tokens sees the most significant reduction in loss, with the rest stagnating or
showing double-descent behavior; 2) early in training, all models learn to
reduce the perplexity of grammatical sequences that contain hallucinations,
with small models halting at this suboptimal distribution and larger ones
eventually learning to assign these sequences lower probabilities; 3)
perplexity is a strong predictor of in-context learning performance on 74
multiple-choice tasks from BIG-Bench, and this holds independent of the model
size. Together, these results show that perplexity is more predictive of model
behaviors than model size or training computation.
Comment: Accepted to ACL 2023; The code and analysis results are available at
https://github.com/xiamengzhou/training_trajectory_analysi
Efficacy of Minocycline in Acute Ischemic Stroke: A Systematic Review and Meta-Analysis of Rodent and Clinical Studies
Objectives: This study aimed to assess the efficacy of minocycline for the treatment of acute ischemic stroke.

Background: While there have been meta-analyses surveying the efficacy of minocycline in the treatment of acute stroke, they have methodological limitations. We performed a new systematic review, distinct from previous ones, by adding new outcomes and including new studies.

Methods: Document retrieval was executed through PubMed, the Cochrane Central Register of Controlled Trials, the Stroke Center, NIH's Clinical Trials, Current Controlled Trials, and the WHO International Clinical Trials Registry Platform Search Portal before January 2018. Data meeting the inclusion criteria were extracted. Before meta-analysis, publication bias and heterogeneity of the included studies were assessed. Random- and fixed-effects models were employed to calculate pooled estimates and 95% confidence intervals (CIs). Additionally, sensitivity and subgroup analyses were performed.

Results: For clinical studies, 4 trials with 201 patients in the minocycline group and 195 patients in the control group met the inclusion criteria; 3 were randomized trials. At the end of 90-day follow-up or on the day of discharge, the groups receiving minocycline were superior to the control group, with significant differences in NIHSS scores (mean difference [MD], −2.75; 95% CI, −4.78, 0.27; p = 0.03) and mRS scores (MD, −0.98; 95% CI, −1.27, −0.69; p < 0.01), but not in Barthel Index scores (MD, 9.04; 95% CI, −0.78, 18.07; p = 0.07). For rodent experiments, 14 studies were included. Neurological severity scores (NSS) were significantly improved (MD, −1.38; 95% CI, −1.64, −1.31; p < 0.01) and infarct volume was markedly reduced (standardized mean difference [SMD], −2.38; 95% CI, −3.40, −1.36; p < 0.01) in the minocycline group. Heterogeneity among the studies was shown to exist for infarct volume (Chi² = 116.12, p < 0.01; I² = 0.89) but not for the other variables.

Conclusions: Based on these results, minocycline appears to be an effective therapeutic option for acute ischemic stroke.
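The pooled estimates and 95% CIs reported above come from standard inverse-variance meta-analysis. A minimal sketch of the fixed-effect version (the study also used random-effects models, which additionally incorporate between-study variance) looks like this; the function name and toy inputs are illustrative:

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study mean
    differences: studies with smaller standard errors receive larger
    weights. Returns the pooled estimate and its 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
```

A pooled CI that excludes zero, as for the mRS and NSS outcomes above, is what corresponds to a statistically significant treatment effect at the 5% level.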