2 research outputs found

    A Survey of Automatic Generation of Source Code Comments: Algorithms and Techniques

    As an integral part of source code files, code comments help improve program readability and comprehension. However, developers sometimes do not comment their code adequately because of the extra effort involved, a lack of relevant knowledge, unawareness of the importance of code commenting, or other factors. As a result, code comments can be inadequate, absent, or even mismatched with the source code, which hinders the understanding, reuse, and maintenance of software. To address these problems, researchers have worked on generating code comments automatically. In this work, we survey research on automatic code commenting. First, we analyze the challenges and the research framework of automatic generation of program comments. Second, we present a classification of representative algorithms, along with the design principles, strengths, and weaknesses of each category. We also provide an overview of the quality assessment of generated comments. Finally, we summarize some future directions for advancing the techniques of automatic generation of code comments and the quality assessment of comments.
    Comment: 22 pages, 5 figures

    Impact of Limited Memory Resources

    Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years have seen a steady increase in permitted name length and, slowly, an increase in the actual length of identifiers. However, in theory names can be too long. Most obviously, in object-oriented programs, names often involve chaining of method calls and field selectors (e.g., class.firstAssignment().name.trim()). While longer names bring the potential for easier comprehension through more embedded sub-words, there are practical limits to length given limited human memory resources. The central hypothesis studied herein is that names used in modern programs have reached this limit. Statistical models derived from an experiment involving 158 programmers of varying degrees of experience show that longer names extracted from production code take more time to process and reduce correctness in a simple recall activity. This has clear negative implications for any attempt to read, and hence comprehend or manipulate, the source code of modern software. The experiment also evaluates the advantage of identifiers having ties to a programmer’s persistent memory. Combined, these results reinforce past proposals advocating the use of limited, consistent, and regular vocabulary in identifier names. In particular, good naming limits length and reduces the need for specialized vocabulary.