While pre-trained language model (PLM) fine-tuning has achieved strong
performance on many NLP tasks, the fine-tuning stage can still demand a
large amount of labeled data. Recent works have resorted to active fine-tuning to improve the
label efficiency of PLM fine-tuning, but none of them investigates the potential
of unlabeled data. We propose AcTune, a new framework that leverages unlabeled
data to improve the label efficiency of active PLM fine-tuning. AcTune switches
between data annotation and model self-training based on uncertainty: it
selects high-uncertainty unlabeled samples for active annotation and
low-uncertainty ones for model self-training. Under this framework, we design
(1) a region-aware sampling strategy that reduces redundancy when actively
querying for annotations and (2) a momentum-based memory bank that dynamically
aggregates the model's pseudo labels to suppress label noise in self-training.
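
To make the switching scheme concrete, the following is a minimal Python (NumPy) sketch of one round, not the paper's actual implementation. The entropy-based uncertainty score, the threshold tau, the momentum coefficient beta, and all function names are illustrative assumptions; plain top-k selection stands in for the region-aware sampling strategy described above.

    import numpy as np

    def entropy(probs):
        # Predictive entropy as one common uncertainty score (assumption).
        return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

    def actune_round(probs, memory_bank, beta=0.9, n_query=32, tau=0.1):
        """One round of uncertainty-based switching (hypothetical sketch).

        probs:       (N, C) current model predictions on the unlabeled pool
        memory_bank: (N, C) momentum-aggregated pseudo-label distributions
        Returns indices to annotate, indices to self-train on, their
        pseudo labels, and the updated memory bank.
        """
        # Momentum aggregation of pseudo labels to suppress label noise:
        #   q_i <- beta * q_i + (1 - beta) * p_theta(y | x_i)
        memory_bank = beta * memory_bank + (1.0 - beta) * probs

        scores = entropy(probs)

        # High-uncertainty samples are sent for active annotation.
        # (The paper uses region-aware sampling to reduce redundancy;
        # plain top-k is a simplification here.)
        query_idx = np.argsort(-scores)[:n_query]

        # Low-uncertainty samples are used for self-training, with labels
        # taken from the aggregated memory bank rather than raw predictions.
        selftrain_idx = np.where(scores < tau)[0]
        pseudo_labels = memory_bank[selftrain_idx].argmax(axis=-1)

        return query_idx, selftrain_idx, pseudo_labels, memory_bank

Reading the pseudo labels from the momentum-aggregated bank, rather than from the latest predictions alone, is what smooths out noisy single-round predictions during self-training.
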
Experiments on 6 text classification datasets show that AcTune outperforms the
strongest active learning and self-training baselines and improves the label
efficiency of PLM fine-tuning by 56.2\% on average. Our implementation will be
available at \url{https://github.com/yueyu1030/actune}.