Refactoring is an indispensable practice for improving the quality and
maintainability of source code during software evolution. Rename refactoring,
the most frequently performed refactoring, suggests a new name for a poorly
named identifier to enhance readability. However, most existing work only
identifies renaming activities between two versions of source code, while few
studies address how to suggest a new name.
In this paper, we study automatic rename refactoring on variable names, which
is considered more challenging than other rename refactoring activities. We
first point out the connections between rename refactoring and various
prevalent learning paradigms, as well as the differences between rename
refactoring and general text generation in natural language processing. Based on our
observations, we propose RefBERT, a two-stage pre-trained framework for rename
refactoring on variable names. RefBERT first predicts the number of sub-tokens
in the new name and then generates sub-tokens accordingly. Several techniques,
including constrained masked language modeling, contrastive learning, and the
bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic
rename refactoring on variable names. Through extensive experiments on our
constructed refactoring datasets, we show that the variable names generated by
RefBERT are more accurate and meaningful than those produced by the existing
method.
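
The two-stage procedure described above (first predict the number of sub-tokens, then generate that many sub-tokens) can be illustrated with a minimal sketch. This is not the RefBERT implementation: the function names `suggest_name`, `toy_predict_length`, and `toy_score_tokens` are hypothetical, the two toy functions are hard-coded stand-ins for trained models, and filling the masked slots greedily from left to right is an assumption made only for illustration.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # placeholder for a sub-token that has not been generated yet


def suggest_name(
    context: str,
    predict_length: Callable[[str], int],
    score_tokens: Callable[[str, List[str], int], Dict[str, float]],
) -> List[str]:
    """Two-stage name suggestion: predict the length, then fill each slot."""
    # Stage 1: decide how many sub-tokens the new name should contain.
    n = predict_length(context)
    slots = [MASK] * n
    # Stage 2: fill the masked slots one at a time (greedy, left to right;
    # this decoding order is an assumption of the sketch).
    for i in range(n):
        scores = score_tokens(context, slots, i)
        slots[i] = max(scores, key=scores.get)
    return slots


# Hypothetical stand-ins for trained models, hard-coded for illustration only.
def toy_predict_length(context: str) -> int:
    return 2


def toy_score_tokens(context: str, slots: List[str], position: int) -> Dict[str, float]:
    table = [
        {"user": 0.7, "item": 0.2, "tmp": 0.1},
        {"Count": 0.8, "List": 0.2},
    ]
    return table[position]


sub_tokens = suggest_name("int x = users.size();", toy_predict_length, toy_score_tokens)
new_name = "".join(sub_tokens)
```

Fixing the length before generation means the second stage only has to choose a sub-token for each of a known number of masked positions, rather than also deciding when to stop as in open-ended text generation.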