In recent years, language models (LMs) have driven remarkable progress in
natural language processing (NLP). However, the impact
of data augmentation (DA) techniques on the fine-tuning (FT) performance of
these LMs has been a topic of ongoing debate. In this study, we evaluate the
effectiveness of three different FT methods in conjunction with
back-translation across an array of 7 diverse NLP tasks, spanning
classification and regression and covering both single-sentence and
sentence-pair tasks. Contrary to the prior assumption that DA does not
improve LMs' FT performance, our findings reveal that continued pre-training
on augmented data can effectively improve FT performance on downstream
tasks. In the most favourable case, continued pre-training
improves FT performance by more than 10% in the few-shot learning setting.
Our findings highlight the potential of DA as a powerful tool for bolstering
LMs' performance.