A Survey of Automatic Generation of Source Code Comments: Algorithms and Techniques
As an integral part of source code files, code comments help improve program
readability and comprehension. However, developers sometimes do not comment
their code adequately because of the extra effort involved, lack of
relevant knowledge, unawareness of the importance of code commenting or some
other factors. As a result, code comments can be inadequate, absent or even
mismatched with source code, which hinders the understanding, reuse, and
maintenance of software. To address these problems, researchers
have worked on generating code comments automatically. In this work,
we survey research on automatic code commenting. First,
we analyze the challenges and research framework of automatic
generation of program comments. Second, we classify representative
algorithms and present the design principles, strengths, and weaknesses of
each category. We also provide an overview of the
quality assessment of the generated comments. Finally, we summarize some future
directions for advancing the techniques of automatic generation of code
comments and the quality assessment of comments.
Comment: 22 pages, 5 figures
Impact of Limited Memory Resources
Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years have seen a steady increase in permitted name length and, more slowly, an increase in the actual length of identifiers. However, in theory names can be too long. Most obviously, in object-oriented programs, names often involve chaining of method calls and field selectors (e.g., class.firstAssignment().name.trim()). While longer names bring the potential for easier comprehension through more embedded sub-words, there are practical limits to length given limited human memory resources. The central hypothesis studied herein is that names used in modern programs have reached this limit. Statistical models derived from an experiment involving 158 programmers of varying degrees of experience show that longer names extracted from production code take more time to process and reduce correctness in a simple recall activity. This has clear negative implications for any attempt to read, and hence comprehend or manipulate, the source code of modern software. The experiment also evaluates the advantage of identifiers having ties to a programmer’s persistent memory. Combined, these results reinforce past proposals advocating the use of a limited, consistent, and regular vocabulary in identifier names. In particular, good naming limits length and reduces the need for specialized vocabulary.
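To make the notions of name length and embedded sub-words concrete, the following minimal Python sketch splits each name in a chained expression into sub-words and reports its character length. It is only an illustration, not the procedure used in the experiment: the helper names split_subwords and describe_chain are hypothetical, and the splitting rule (underscores plus camelCase boundaries) is an assumption about how sub-words might be counted.

```python
import re


def split_subwords(identifier: str) -> list[str]:
    """Split one identifier into sub-words (assumed rule: underscores and camelCase)."""
    parts = []
    for chunk in identifier.split("_"):
        # Acronym runs (e.g. "HTTP"), capitalized or lowercase words, and digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return parts


def describe_chain(expression: str) -> list[tuple[str, int, list[str]]]:
    """Return (name, length, sub-words) for every name in a dotted chain."""
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression)
    return [(name, len(name), split_subwords(name)) for name in names]


if __name__ == "__main__":
    # The chained expression quoted in the abstract above.
    for name, length, words in describe_chain("class.firstAssignment().name.trim()"):
        print(f"{name}: {length} characters, sub-words {words}")
```

A reader holding such a chain in mind has to retain every name and sub-word it reports, which is the kind of memory load the abstract argues has a practical upper limit.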