Previous studies have demonstrated that code intelligence models are
sensitive to program transformations, among which identifier renaming is
particularly easy to apply and effective: simply renaming one identifier in
the source code can cause a model to output completely different results.
Prior research generally mitigates this problem by generating additional
training samples.
Such an approach is less than ideal, since its effectiveness depends on the
quantity and quality of the generated samples. Unlike these studies, we
adjust the models themselves to explicitly distinguish the influence of
identifier names on the results, which we call naming bias in this paper,
thereby making the models robust to identifier renaming. Specifically, we
formulate the
naming bias with a structural causal model (SCM) and propose a
counterfactual-reasoning-based framework, named CARBON, for eliminating the
naming bias in
neural code comprehension. CARBON explicitly captures the naming bias through
multi-task learning in the training stage, and reduces the bias by
counterfactual inference in the inference stage. We evaluate CARBON on three
neural code comprehension tasks: function naming, defect detection, and code
classification. Experimental results show that CARBON achieves modestly
better performance than the baseline models on the original benchmark
datasets (e.g., +0.5% F1 on the function naming task), and significantly
better performance on the datasets with identifiers renamed (e.g., +37.9% F1
on the function naming task). The proposed framework
provides a causal view for improving the robustness of code intelligence
models.
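
The two-stage idea described above (capture the naming bias during training, remove it by counterfactual inference at test time) can be sketched as follows. This is a minimal illustration only: the "name-only" branch, the weight `lam`, and the logit-subtraction form are assumptions for exposition, not the paper's exact formulation.

```python
# Hedged sketch of inference-time counterfactual debiasing.
# Assumption (not from the paper): an auxiliary branch scores the input
# using identifier names alone, and debiasing subtracts its logits.

def debias(full_logits, name_only_logits, lam=1.0):
    """Subtract the estimated naming-bias logits from the full-input logits."""
    return [f - lam * b for f, b in zip(full_logits, name_only_logits)]

def predict(logits):
    """Return the index of the highest logit (the predicted class)."""
    return max(range(len(logits)), key=logits.__getitem__)

# A model misled by identifier names: the name-only branch strongly
# prefers class 0, dragging the full prediction toward it.
full = [2.0, 1.8]        # logits from code + identifier names
name_only = [1.5, 0.1]   # logits from identifier names alone (the bias)

print(predict(full))                     # biased prediction
print(predict(debias(full, name_only)))  # debiased prediction
```

Here the biased prediction follows the name-only branch, while subtracting that branch's logits lets the evidence from the rest of the code decide the label.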