Recently, deep learning techniques have shown great success in automatic code
generation. Inspired by code reuse, some researchers propose copy-based
approaches that can copy the content from similar code snippets to obtain
better performance. In practice, human developers identify the content in
similar code that is relevant to their needs, which can be viewed as a code
sketch, and then edit the sketch into the desired code. However, existing
copy-based approaches ignore code sketches and tend to repeat the similar
code without the necessary modifications, which leads to incorrect results.
In this paper, we propose a sketch-based code generation approach named
SkCoder to mimic developers' code reuse behavior. Given a natural language
requirement, SkCoder retrieves a similar code snippet, extracts relevant parts
as a code sketch, and edits the sketch into the desired code. Our motivation
is that the extracted sketch provides a well-formed pattern that tells the
model "how to write", while post-editing adds requirement-specific details to
the sketch and outputs the complete code. We conduct experiments on two public
datasets and a new dataset collected in this work. We compare our approach to
20 baselines using 5 widely used metrics. Experimental results show that (1)
SkCoder generates more correct programs and outperforms the state-of-the-art
model, CodeT5-base, by 30.30%, 35.39%, and 29.62% on the three datasets.
(2) Our approach is effective for multiple code generation models and improves
them by up to 120.1% in Pass@1. (3) We investigate three plausible code
sketches and discuss the importance of sketches. (4) We manually evaluate the
generated code and demonstrate the superiority of SkCoder in three aspects.

Comment: Accepted by the 45th IEEE/ACM International Conference on Software
Engineering (ICSE 2023)
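
The retrieve, sketch, and edit steps described above can be illustrated with a toy sketch. This is not SkCoder's implementation: the abstract does not specify how each stage is built, so the token-overlap retriever, the keyword-based sketch rule, the `<hole>` placeholder, and the caller-supplied `fills` (standing in for a learned editor) are all assumptions made purely for illustration, as are the function names.

```python
import keyword
import re

HOLE = "<hole>"  # hypothetical placeholder token, not taken from the paper


def tokenize(text):
    """Split text into word-like tokens and single punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def retrieve(requirement, corpus):
    """Return the (description, code) pair whose description has the highest
    Jaccard token overlap with the requirement (assumed stand-in retriever)."""
    req = {t.lower() for t in tokenize(requirement)}

    def score(entry):
        desc = {t.lower() for t in tokenize(entry[0])}
        union = req | desc
        return len(req & desc) / len(union) if union else 0.0

    return max(corpus, key=score)


def extract_sketch(requirement, code):
    """Keep Python keywords, punctuation, and tokens that also occur in the
    requirement; replace everything else with HOLE (assumed heuristic)."""
    req = {t.lower() for t in tokenize(requirement)}
    sketch = []
    for tok in tokenize(code):
        is_punct = re.match(r"\w", tok) is None
        if keyword.iskeyword(tok) or is_punct or tok.lower() in req:
            sketch.append(tok)
        elif not sketch or sketch[-1] != HOLE:
            sketch.append(HOLE)  # merge consecutive dropped tokens
    return sketch


def edit_sketch(sketch, fills):
    """Fill holes in order with caller-supplied strings. A real editor would
    be a model conditioned on the requirement; this is only a stub."""
    fills = iter(fills)
    return " ".join(next(fills, HOLE) if tok == HOLE else tok for tok in sketch)


corpus = [
    ("return the maximum value in a list", "def max_value(xs): return max(xs)"),
    ("read a file and return its lines",
     "def read_lines(path): return open(path).readlines()"),
]
requirement = "return the minimum value in a list"

_, similar_code = retrieve(requirement, corpus)
sketch = extract_sketch(requirement, similar_code)
print(" ".join(sketch))
print(edit_sketch(sketch, ["min_value", "xs", "min", "xs"]))
```

In this toy run the sketch keeps the structural pattern of the retrieved snippet while dropping the parts tied to the old requirement, which the editing step then replaces. Copying the retrieved snippet verbatim would instead return the maximum, which is the failure mode the abstract attributes to copy-based approaches.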