A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English
Converting source or unit test code to English has been shown to improve the
maintainability, understandability, and analysis of software and tests. Code
summarizers identify important statements in the source/tests and convert them
to easily understood English sentences using static analysis and NLP
techniques. However, current test summarization approaches handle only a subset
of the variation and customization allowed in the JUnit assert API (a critical
component of test cases) which may affect the accuracy of conversions. In this
paper, we present our work towards improving JUnit test summarization with a
detailed process for converting a total of 45 unique JUnit assertions to
English, including 37 previously-unhandled variations of the assertThat method.
This process has also been implemented and released as the AssertConvert tool.
Initial evaluations have shown that this tool generates English conversions
that accurately represent a wide variety of assertion statements which could be
used for code summarization or other NLP analyses.
Comment: In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP
for Software Engineering (NL4SE 18), November 4, 2018, Lake Buena Vista, FL,
USA. ACM, New York, NY, USA, 4 pages
Automatic Generation of Text Descriptive Comments for Code Blocks
We propose a framework to automatically generate descriptive comments for
source code blocks. While this problem has been studied by many researchers
previously, their methods are mostly based on fixed templates and achieve poor
results. Our framework does not rely on any template, but makes use of a new
recursive neural network called Code-RNN to extract features from the source
code and embed them into one vector. When this vector representation is input
to a new recurrent neural network (Code-GRU), the overall framework generates
text descriptions of the code with accuracy (Rouge-2 value) significantly
higher than other learning-based approaches such as the sequence-to-sequence model.
The Code-RNN model can also be used in other scenarios where a representation
of code is required.
Comment: aaai 201
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find improvement over two baseline techniques from SE literature
and one from NLP literature.
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called 'CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.
Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201
Automatically Generating Documentation for Lambda Expressions in Java
When lambda expressions were introduced to the Java programming language as
part of the release of Java 8 in 2014, they were the language's first step into
functional programming. Since lambda expressions are still relatively new, not
all developers use or understand them. In this paper, we first present the
results of an empirical study to determine how frequently developers of GitHub
repositories make use of lambda expressions and how they are documented. We
find that 11% of Java GitHub repositories use lambda expressions, and that only
6% of the lambda expressions are accompanied by source code comments. We then
present a tool called LambdaDoc which can automatically detect lambda
expressions in a Java repository and generate natural language documentation
for them. Our evaluation of LambdaDoc with 23 professional developers shows
that they perceive the generated documentation to be complete, concise, and
expressive, while the majority of the documentation produced by our
participants without tool support was inadequate. Our contribution builds an
important step towards automatically generating documentation for functional
programming constructs in an object-oriented language.
Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositories