Automatically fixing compilation errors can greatly raise the productivity of
software development by guiding novice or AI programmers as they write and
debug code. Recently, learning-based program repair has gained extensive
attention and become the state of the art in practice, but it still leaves
plenty of room for improvement. In this paper, we propose TransRepair, an
end-to-end solution that simultaneously locates the erroneous lines of a C
program and generates the correct substitutes. Unlike its counterparts, our
approach takes into account both the context of the erroneous code and the diagnostic
compilation feedback. We devise a Transformer-based neural network that
learns how to repair from the erroneous code, its context, and the
diagnostic feedback. To increase the effectiveness of TransRepair, we summarize
5 types and 74 fine-grained sub-types of compilation errors from two
real-world program datasets and the Internet. We then develop a program
corruption technique to synthesize a large dataset of 1,821,275 erroneous C
programs. Through extensive experiments, we demonstrate that TransRepair
outperforms the state of the art in both single repair accuracy and full repair
accuracy. Further analysis sheds light on the strengths and weaknesses of
contemporary solutions to guide future improvement.

Comment: 11 pages, accepted to ASE '2
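
As a rough illustration of the program corruption idea mentioned above, the following sketch applies a single hypothetical corruption rule (deleting one statement-terminating semicolon) to a correct C program and records the erroneous line together with its fix. The function name, the rule, and the output format are assumptions made for illustration; they are not TransRepair's actual implementation, which draws on 74 fine-grained error sub-types.

```python
import random


def corrupt_missing_semicolon(source: str, rng: random.Random):
    """Hypothetical corruption rule: delete one trailing semicolon.

    Returns (corrupted_source, line_number, original_line), where
    line_number is 1-based, or None if no line ends with a semicolon.
    """
    lines = source.split("\n")
    # Candidate lines are those whose last non-space character is ';'.
    candidates = [i for i, line in enumerate(lines)
                  if line.rstrip().endswith(";")]
    if not candidates:
        return None
    i = rng.choice(candidates)
    original = lines[i]
    stripped = original.rstrip()
    # Drop the final ';' while keeping any trailing whitespace intact.
    lines[i] = stripped[:-1] + original[len(stripped):]
    return "\n".join(lines), i + 1, original


PROGRAM = """#include <stdio.h>
int main(void) {
    int x = 1;
    printf("%d\\n", x);
    return 0;
}"""

# A fixed seed keeps the synthesized sample reproducible.
corrupted, line_no, fix = corrupt_missing_semicolon(PROGRAM, random.Random(0))
print(f"error injected at line {line_no}; expected fix: {fix.strip()}")
```

Each call yields one training triple under these assumptions: the corrupted program, the location of the error, and the line that repairs it. A fuller generator would pair each corrupted program with the compiler's diagnostic output as well.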