Converging High-Level Coupled-Cluster Energetics via Adaptive Selection of Excitation Manifolds Driven by Moment Expansions
A novel approach to rapidly converging high-level coupled-cluster (CC)
energetics in an automated fashion is proposed. The key idea is an adaptive
selection of the excitation manifolds defining higher-than-two-body components
of the cluster operator inspired by the CC(P;Q) moment expansions. The
usefulness of the resulting methodology is illustrated by molecular examples
where the goal is to recover the electronic energies obtained using the CC
method with a full treatment of singly, doubly, and triply excited clusters
(CCSDT) when the noniterative triples corrections to CCSD fail.
Comment: 18 pages, 5 tables. This article has been accepted for publication in
the Journal of Chemical Physics. After it is published, it will be found at
https://doi.org/10.1063/5.016287
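
To make the adaptive idea concrete, the following is a minimal Python sketch of one way such a moment-driven selection loop could look. All helper functions (solve_cc_in_p_space, moment_corrections) and the growth schedule are illustrative assumptions, not the authors' implementation or any real CC package API.

```python
# Hypothetical sketch of an adaptive, moment-driven CC(P;Q)-style loop:
# grow the excitation manifold (P space) by promoting the triply excited
# determinants flagged as most important by moment-based corrections.
# All helpers are illustrative placeholders.

def adaptive_cc(p_space, triples_pool, grow_fraction=0.01, tol=1e-6):
    """Enlarge the P space until the corrected energy stabilizes."""
    prev_energy = None
    while True:
        # Solve the CC equations restricted to the current P space
        # (CCSD plus whatever triples have been captured so far).
        energy, t_amps = solve_cc_in_p_space(p_space)

        # Rank the remaining triples (the Q space) by the magnitude of
        # their moment-based, noniterative energy corrections.
        corrections = moment_corrections(t_amps, triples_pool - p_space.triples)
        energy += sum(corrections.values())

        if prev_energy is not None and abs(energy - prev_energy) < tol:
            return energy
        prev_energy = energy

        # Adaptively promote the most important fraction of triples into P.
        ranked = sorted(corrections, key=lambda d: abs(corrections[d]),
                        reverse=True)
        n_grow = max(1, int(grow_fraction * len(triples_pool)))
        p_space.triples |= set(ranked[:n_grow])
```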
Time Waits for No One! Analysis and Challenges of Temporal Misalignment
When an NLP model is trained on text data from one time period and tested or
deployed on data from another, the resulting temporal misalignment can degrade
end-task performance. In this work, we establish a suite of eight diverse tasks
across different domains (social media, science papers, news, and reviews) and
periods of time (spanning five years or more) to quantify the effects of
temporal misalignment. Our study is focused on the ubiquitous setting where a
pretrained model is optionally adapted through continued domain-specific
pretraining, followed by task-specific finetuning. We find stronger effects
of temporal misalignment on task performance
than have been previously reported. We also find that, while temporal
adaptation through continued pretraining can help, these gains are small
compared to task-specific finetuning on data from the target time period. Our
findings motivate continued research to improve temporal robustness of NLP
models.
Comment: 9 pages, 6 figures, 3 tables
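
As a concrete picture of the "continued pretraining, then finetuning" setting the abstract describes, here is a minimal sketch using the Hugging Face Transformers API. The model name, dataset variables, and training arguments are illustrative assumptions, not the paper's exact setup.

```python
# Two-stage adaptation pipeline: continued domain-specific pretraining (MLM),
# then task-specific finetuning. Dataset variables are assumed to exist and
# to be tokenized; names and hyperparameters are illustrative only.
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Stage 1: continued pretraining on unlabeled in-domain text.
mlm_model = AutoModelForMaskedLM.from_pretrained("roberta-base")
mlm_trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="adapted", num_train_epochs=1),
    train_dataset=unlabeled_domain_corpus,  # assumed: unlabeled text dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("adapted")

# Stage 2: task-specific finetuning. Temporal misalignment arises when the
# labeled training data and the evaluation data come from different periods.
clf_model = AutoModelForSequenceClassification.from_pretrained("adapted",
                                                               num_labels=2)
clf_trainer = Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3),
    train_dataset=labeled_data_period_a,  # assumed: labels from one period
    eval_dataset=labeled_data_period_b,   # assumed: test set from another
)
clf_trainer.train()
print(clf_trainer.evaluate())
```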
Editing Models with Task Arithmetic
Changing how pre-trained models behave -- e.g., improving their performance
on a downstream task or mitigating biases learned during pre-training -- is a
common practice when developing machine learning systems. In this work, we
propose a new paradigm for steering the behavior of neural networks, centered
around task vectors. A task vector specifies a direction in the weight
space of a pre-trained model, such that movement in that direction improves
performance on the task. We build task vectors by subtracting the weights of a
pre-trained model from the weights of the same model after fine-tuning on a
task. We show that these task vectors can be modified and combined together
through arithmetic operations such as negation and addition, and the behavior
of the resulting model is steered accordingly. Negating a task vector decreases
performance on the target task, with little change in model behavior on control
tasks. Moreover, adding task vectors together can improve performance on
multiple tasks at once. Finally, when tasks are linked by an analogy
relationship of the form "A is to B as C is to D", combining task vectors from
three of the tasks can improve performance on the fourth, even when no data
from the fourth task is used for training. Overall, our experiments with
several models, modalities and tasks show that task arithmetic is a simple,
efficient and effective way of editing models.
Comment: In Proceedings of the 11th International Conference on Learning
Representations (ICLR 2023).
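
The weight-space arithmetic is simple enough to state in a few lines. Below is a minimal PyTorch-style sketch, assuming all checkpoints share one architecture and parameter names; the scale argument and the checkpoint paths are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of task-vector arithmetic over PyTorch state_dicts.
# Assumes all models share the same architecture and parameter names.
import torch

def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_vectors(pretrained, vectors, scale=1.0):
    """theta_new = theta_pretrained + scale * sum of task vectors.
    scale is an assumed tuning knob; negate a vector to forget a task."""
    return {k: pretrained[k] + scale * sum(v[k] for v in vectors)
            for k in pretrained}

# Usage sketch (checkpoint paths are hypothetical):
# base  = torch.load("pretrained.pt")
# tau_a = task_vector(base, torch.load("finetuned_on_A.pt"))
# tau_b = task_vector(base, torch.load("finetuned_on_B.pt"))
# multi  = apply_vectors(base, [tau_a, tau_b])  # addition: multi-task model
# forget = apply_vectors(base, [{k: -v for k, v in tau_a.items()}])  # negation
# Analogy "A : B :: C : D": tau_d is approximated by tau_c + (tau_b - tau_a).
```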