Converging High-Level Coupled-Cluster Energetics via Adaptive Selection of Excitation Manifolds Driven by Moment Expansions
A novel approach to rapidly converging high-level coupled-cluster (CC)
energetics in an automated fashion is proposed. The key idea is an adaptive
selection of the excitation manifolds defining higher-than-two-body components
of the cluster operator inspired by the CC(P;Q) moment expansions. The
usefulness of the resulting methodology is illustrated by molecular examples
where the goal is to recover the electronic energies obtained using the CC
method with a full treatment of singly, doubly, and triply excited clusters
(CCSDT) when the noniterative triples corrections to CCSD fail.
Comment: 18 pages, 5 tables. This article has been accepted for publication in
the Journal of Chemical Physics. After it is published, it will be found at
https://doi.org/10.1063/5.016287
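
To make the adaptive idea concrete, the following is a minimal Python sketch of one way such a moment-driven selection loop could look. All helper functions (solve_cc_in_p_space, moment_corrections) and the growth schedule are illustrative assumptions, not the authors' implementation or any real CC package API.

```python
# Hypothetical sketch of an adaptive, moment-driven CC(P;Q)-style loop:
# grow the excitation manifold (P space) by promoting the triply excited
# determinants flagged as most important by moment-based corrections.
# All helpers are illustrative placeholders.

def adaptive_cc(p_space, triples_pool, grow_fraction=0.01, tol=1e-6):
    """Enlarge the P space until the corrected energy stabilizes."""
    prev_energy = None
    while True:
        # Solve the CC equations restricted to the current P space
        # (CCSD plus whatever triples have been captured so far).
        energy, t_amps = solve_cc_in_p_space(p_space)

        # Rank the remaining triples (the Q space) by the magnitude of
        # their moment-based, noniterative energy corrections.
        corrections = moment_corrections(t_amps, triples_pool - p_space.triples)
        energy += sum(corrections.values())

        if prev_energy is not None and abs(energy - prev_energy) < tol:
            return energy
        prev_energy = energy

        # Adaptively promote the most important fraction of triples into P.
        ranked = sorted(corrections, key=lambda d: abs(corrections[d]),
                        reverse=True)
        n_grow = max(1, int(grow_fraction * len(triples_pool)))
        p_space.triples |= set(ranked[:n_grow])
```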
Time Waits for No One! Analysis and Challenges of Temporal Misalignment
When an NLP model is trained on text data from one time period and tested or
deployed on data from another, the resulting temporal misalignment can degrade
end-task performance. In this work, we establish a suite of eight diverse tasks
across different domains (social media, science papers, news, and reviews) and
periods of time (spanning five years or more) to quantify the effects of
temporal misalignment. Our study is focused on the ubiquitous setting where a
pretrained model is optionally adapted through continued domain-specific
pretraining, followed by task-specific finetuning. We find stronger effects
of temporal misalignment on task performance
than have been previously reported. We also find that, while temporal
adaptation through continued pretraining can help, these gains are small
compared to task-specific finetuning on data from the target time period. Our
findings motivate continued research to improve temporal robustness of NLP
models.
Comment: 9 pages, 6 figures, 3 tables
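
As a concrete picture of the "continued pretraining, then finetuning" setting the abstract describes, here is a minimal sketch using the Hugging Face Transformers API. The model name, dataset variables, and training arguments are illustrative assumptions, not the paper's exact setup.

```python
# Two-stage adaptation pipeline: continued domain-specific pretraining (MLM),
# then task-specific finetuning. Dataset variables are assumed to exist and
# to be tokenized; names and hyperparameters are illustrative only.
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Stage 1: continued pretraining on unlabeled in-domain text.
mlm_model = AutoModelForMaskedLM.from_pretrained("roberta-base")
mlm_trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="adapted", num_train_epochs=1),
    train_dataset=unlabeled_domain_corpus,  # assumed: unlabeled text dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("adapted")

# Stage 2: task-specific finetuning. Temporal misalignment arises when the
# labeled training data and the evaluation data come from different periods.
clf_model = AutoModelForSequenceClassification.from_pretrained("adapted",
                                                               num_labels=2)
clf_trainer = Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3),
    train_dataset=labeled_data_period_a,  # assumed: labels from one period
    eval_dataset=labeled_data_period_b,   # assumed: test set from another
)
clf_trainer.train()
print(clf_trainer.evaluate())
```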
Editing Models with Task Arithmetic
Changing how pre-trained models behave -- e.g., improving their performance
on a downstream task or mitigating biases learned during pre-training -- is a
common practice when developing machine learning systems. In this work, we
propose a new paradigm for steering the behavior of neural networks, centered
around task vectors. A task vector specifies a direction in the weight
space of a pre-trained model, such that movement in that direction improves
performance on the task. We build task vectors by subtracting the weights of a
pre-trained model from the weights of the same model after fine-tuning on a
task. We show that these task vectors can be modified and combined together
through arithmetic operations such as negation and addition, and the behavior
of the resulting model is steered accordingly. Negating a task vector decreases
performance on the target task, with little change in model behavior on control
tasks. Moreover, adding task vectors together can improve performance on
multiple tasks at once. Finally, when tasks are linked by an analogy
relationship of the form "A is to B as C is to D", combining task vectors from
three of the tasks can improve performance on the fourth, even when no data
from the fourth task is used for training. Overall, our experiments with
several models, modalities and tasks show that task arithmetic is a simple,
efficient and effective way of editing models.
Comment: In Proceedings of the 11th International Conference on Learning
Representations (ICLR 2023).
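
The weight-space arithmetic is simple enough to state in a few lines. Below is a minimal PyTorch-style sketch, assuming all checkpoints share one architecture and parameter names; the scale argument and the checkpoint paths are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of task-vector arithmetic over PyTorch state_dicts.
# Assumes all models share the same architecture and parameter names.
import torch

def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_vectors(pretrained, vectors, scale=1.0):
    """theta_new = theta_pretrained + scale * sum of task vectors.
    scale is an assumed tuning knob; negate a vector to forget a task."""
    return {k: pretrained[k] + scale * sum(v[k] for v in vectors)
            for k in pretrained}

# Usage sketch (checkpoint paths are hypothetical):
# base  = torch.load("pretrained.pt")
# tau_a = task_vector(base, torch.load("finetuned_on_A.pt"))
# tau_b = task_vector(base, torch.load("finetuned_on_B.pt"))
# multi  = apply_vectors(base, [tau_a, tau_b])  # addition: multi-task model
# forget = apply_vectors(base, [{k: -v for k, v in tau_a.items()}])  # negation
# Analogy "A : B :: C : D": tau_d is approximated by tau_c + (tau_b - tau_a).
```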