Counterfactually-Augmented Data (CAD) -- minimal editing of sentences to flip
the corresponding labels -- has the potential to improve the
Out-Of-Distribution (OOD) generalization capability of language models, as CAD
induces language models to exploit domain-independent causal features and
exclude spurious correlations. However, the empirical OOD gains from CAD have
been smaller than anticipated. In this study, we attribute this shortfall to a
myopia phenomenon caused by CAD: language models focus only on the causal
features that are edited in the augmentation operation and ignore other,
non-edited causal features. As a result, the potential of CAD is not fully
exploited. To address this issue, we analyze the myopia phenomenon in
feature space from the perspective of Fisher's Linear Discriminant. We then
introduce two additional constraints based on CAD's structural properties
(dataset-level and sentence-level) to help language models extract more
complete causal features in CAD, thereby mitigating the myopia phenomenon and
improving OOD generalization capability. We evaluate our method on two tasks:
Sentiment Analysis and Natural Language Inference. The experimental results
demonstrate that our method can unlock the potential of CAD and improve the
OOD generalization performance of language models by 1.0% to 5.9%.

Comment: Expert Systems With Applications 2023. arXiv admin note: text overlap
with arXiv:2302.0934
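
The myopia phenomenon described in the abstract can be illustrated with a toy Fisher's Linear Discriminant computation. The sketch below is not the paper's analysis or method. It assumes a hypothetical two-dimensional feature space in which column 0 stands for a causal feature that the counterfactual edit flips and column 1 stands for a causal feature that the edit leaves untouched. The function name `fisher_direction` and all numeric settings are illustrative assumptions.

```python
import numpy as np


def fisher_direction(x_pos, x_neg):
    """Unit-norm Fisher discriminant direction, w proportional to S_W^{-1} (mu_pos - mu_neg)."""
    mu_pos, mu_neg = x_pos.mean(axis=0), x_neg.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    s_w = (x_pos - mu_pos).T @ (x_pos - mu_pos) + (x_neg - mu_neg).T @ (x_neg - mu_neg)
    # Small ridge term (an assumption) keeps the linear solve well conditioned.
    s_w += 1e-6 * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, mu_pos - mu_neg)
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)
n = 500
y = np.repeat([1.0, -1.0], n // 2)  # balanced toy labels

# Hypothetical "natural" data: both features carry the label signal.
x = y[:, None] * np.array([1.0, 1.0]) + rng.normal(size=(n, 2))

# Toy counterfactuals: flip only the edited feature (column 0) and the label;
# the non-edited causal feature (column 1) is copied unchanged.
x_cf = x.copy()
x_cf[:, 0] -= 2.0 * y
y_cf = -y

# Discriminant direction on the original data alone.
w_nat = fisher_direction(x[y > 0], x[y < 0])

# Discriminant direction on originals plus their counterfactuals.
x_cad = np.vstack([x, x_cf])
y_cad = np.concatenate([y, y_cf])
w_cad = fisher_direction(x_cad[y_cad > 0], x_cad[y_cad < 0])

print("original data only :", np.round(w_nat, 3))
print("with counterfactuals:", np.round(w_cad, 3))
```

In this toy setting the non-edited feature has identical values in both classes of the augmented set, so its class means coincide and the discriminant direction places almost all of its weight on the edited feature. On the original data alone, the two features receive comparable weight.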