A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases, even when no internal documentation
is provided. We evaluate our technique on a dataset we created from 2.1 million
Java methods, and find improvements over two baseline techniques from the SE
literature and one from the NLP literature.
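The core idea above is to feed the model two separate input streams: the words from the code and a structure-only view of the AST, so the model can learn structure independently of identifier text. A minimal sketch of that split, using Python's `ast` module in place of the paper's Java pipeline (the function name `split_inputs` is hypothetical):

```python
import ast

def split_inputs(source: str):
    """Split a code snippet into two model inputs: identifier words
    and a structure-only AST traversal (node types, no text)."""
    tree = ast.parse(source)
    words, structure = [], []
    for node in ast.walk(tree):
        structure.append(type(node).__name__)  # structure stream, no identifiers
        if isinstance(node, ast.Name):
            words.append(node.id)              # word stream: variable names
        elif isinstance(node, ast.FunctionDef):
            words.append(node.name)            # word stream: function names
    return words, structure

words, structure = split_inputs("def add(a, b):\n    return a + b")
print(words)      # identifier words drawn from the code
print(structure)  # node-type sequence, independent of naming
```

Even if every identifier were renamed to `x`, the structure stream would be unchanged, which is what lets such a model keep producing summaries when internal documentation is poor.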
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
In a software project, esp. in open-source, a contribution is a valuable
piece of work made to the project: writing code, reporting bugs, translating,
improving documentation, creating graphics, etc. We are now at the beginning of
an exciting era in which software bots will make contributions similar in
nature to those made by humans. Dry contributions, with no explanation, are
often ignored or rejected: the contribution is not understandable per se, it is
not put into a larger context, and it is not grounded in the idioms shared by
the core community of developers. We have been operating a
program repair bot called Repairnator for 2 years and noticed the problem of
"dry patches": a patch that does not say which bug it fixes, or that does not
explain the effects of the patch on the system. We envision program repair
systems that produce an "explainable bug fix": an integrated package of at
least 1) a patch, 2) its explanation in natural or controlled language, and 3)
a highlight of the behavioral difference, with examples. In this paper, we
generalize and suggest that software bot contributions must be explainable, and
that they must be put into the context of the global software development
conversation.
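The three-part package the abstract envisions (patch, explanation, behavioral difference) can be made concrete as a data structure. A minimal sketch, where the class name `ExplainableFix` and its fields are hypothetical illustrations, not part of Repairnator:

```python
from dataclasses import dataclass, field

@dataclass
class ExplainableFix:
    """Hypothetical container for an 'explainable bug fix':
    1) the patch, 2) its explanation in natural language, and
    3) examples highlighting the behavioral difference."""
    patch: str                  # unified-diff text of the change
    explanation: str            # why the change fixes the bug
    behavior_examples: list = field(default_factory=list)  # (input, before, after)

    def summary(self) -> str:
        """Render the explanation plus before/after behavior for reviewers."""
        lines = [self.explanation]
        for inp, before, after in self.behavior_examples:
            lines.append(f"  {inp}: was {before!r}, now {after!r}")
        return "\n".join(lines)

fix = ExplainableFix(
    patch="- return a - b\n+ return a + b",
    explanation="add() subtracted instead of adding its arguments.",
    behavior_examples=[("add(2, 3)", -1, 5)],
)
print(fix.summary())
```

A bot posting `fix.summary()` alongside the diff gives reviewers the context that a "dry patch" lacks.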
Automatic Generation of Text Descriptive Comments for Code Blocks
We propose a framework to automatically generate descriptive comments for
source code blocks. While this problem has been studied by many researchers
previously, their methods are mostly based on fixed templates and achieve poor
results. Our framework does not rely on any template, but makes use of a new
recursive neural network called Code-RNN to extract features from the source
code and embed them into one vector. When this vector representation is input
to a new recurrent neural network (Code-GRU), the overall framework generates
text descriptions of the code with accuracy (ROUGE-2 score) significantly
higher than other learning-based approaches such as sequence-to-sequence
models. The Code-RNN model can also be used in other scenarios where a vector
representation of code is required.
Comment: AAAI 201
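The defining move of a recursive network like Code-RNN is folding a tree bottom-up: each node's vector is a learned combination of its own embedding and its children's vectors, so the whole snippet collapses to one fixed-size vector. A toy sketch of that recursion (the deterministic "embedding" and tanh averaging stand in for the learned parameters; `embed` is a hypothetical name):

```python
import math

def embed(node, dim=4):
    """Recursively fold an AST-like tree (label, children) into one
    fixed-size vector: sum the node's own toy label embedding with its
    children's vectors, then average and squash with tanh. This is a
    minimal stand-in for the learned combination in a recursive network."""
    label, children = node
    # deterministic toy 'embedding' of the node label (no training here)
    vec = [math.sin(sum(map(ord, label)) + i) for i in range(dim)]
    for child in children:
        cvec = embed(child, dim)
        vec = [v + c for v, c in zip(vec, cvec)]
    n = 1 + len(children)
    return [math.tanh(v / n) for v in vec]

tree = ("BinOp", [("Name", []), ("Name", [])])
print(embed(tree))  # one vector representing the whole subtree
```

The resulting vector is what a decoder such as Code-GRU would consume to generate the comment text.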
A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English
Converting source or unit test code to English has been shown to improve the
maintainability, understandability, and analysis of software and tests. Code
summarizers identify important statements in the source/tests and convert them
to easily understood English sentences using static analysis and NLP
techniques. However, current test summarization approaches handle only a subset
of the variation and customization allowed in the JUnit assert API (a critical
component of test cases) which may affect the accuracy of conversions. In this
paper, we present our work towards improving JUnit test summarization with a
detailed process for converting a total of 45 unique JUnit assertions to
English, including 37 previously-unhandled variations of the assertThat method.
This process has also been implemented and released as the AssertConvert tool.
Initial evaluations have shown that this tool generates English conversions
that accurately represent a wide variety of assertion statements which could be
used for code summarization or other NLP analyses.
Comment: In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP
for Software Engineering (NL4SE 18), November 4, 2018, Lake Buena Vista, FL,
USA. ACM, New York, NY, USA, 4 pages
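The conversion the abstract describes maps each assertion form to an English template. A minimal sketch of that idea covering three common JUnit assertions (the table and function names are hypothetical illustrations; the real AssertConvert tool handles 45 variants, including 37 `assertThat` forms, via static analysis rather than regexes):

```python
import re

# Hypothetical template table for a few simple JUnit assertions.
TEMPLATES = {
    "assertEquals": "check that {1} equals {0}",
    "assertTrue": "check that {0} is true",
    "assertNull": "check that {0} is null",
}

def assertion_to_english(stmt: str) -> str:
    """Convert a simple JUnit assertion statement to an English sentence.
    Falls back to the original text when the form is not recognized."""
    m = re.match(r"(\w+)\((.*)\);?$", stmt.strip())
    if not m:
        return stmt
    name, raw_args = m.group(1), m.group(2)
    # naive comma split; nested calls containing commas would need real parsing
    args = [a.strip() for a in raw_args.split(",")]
    template = TEMPLATES.get(name)
    return template.format(*args) if template else stmt

print(assertion_to_english("assertEquals(4, result);"))
print(assertion_to_english("assertTrue(list.isEmpty());"))
```

The hard part the paper addresses is exactly what this sketch punts on: the many customized `assertThat` matcher chains that a fixed table like this cannot cover.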
Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation
Usually, programming languages have official documentation to guide
developers with APIs, methods, and classes. However, researchers identified
insufficient or inadequate documentation examples and flaws with the API's
complex structure as barriers to learning an API. As a result, developers may
consult other sources (StackOverflow, GitHub, etc.) to learn more about an API.
Recent research studies have shown that unofficial documentation is a valuable
source of information for generating code summaries. We, therefore, have been
motivated to leverage such a type of documentation along with deep learning
techniques towards generating high-quality summaries for APIs discussed in
informal documentation. This paper proposes an automatic approach using the
BART algorithm, a state-of-the-art transformer model, to generate summaries for
APIs discussed in StackOverflow. We built an oracle of human-generated
summaries and evaluated our approach against it using ROUGE and BLEU, the most
widely used evaluation metrics in text summarization.
Furthermore, we evaluated our summaries empirically against a previous work in
terms of quality. Our findings demonstrate that using deep learning algorithms
can improve summary quality, outperforming the previous work by an average of
57% for precision, 66% for recall, and 61% for F-measure, while running 4.4
times faster.
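ROUGE-2, used throughout these evaluations, scores a generated summary by bigram overlap with a reference. A minimal sketch of the computation (real evaluations use a library implementation with stemming and other normalization; the function names here are illustrative):

```python
from collections import Counter

def bigrams(text: str) -> Counter:
    """Count lowercase word bigrams in a string."""
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def rouge2(candidate: str, reference: str):
    """Minimal ROUGE-2 sketch: clipped bigram-overlap precision,
    recall, and F1 between a candidate and a reference summary."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())        # clipped bigram matches
    p = overlap / max(sum(cand.values()), 1)    # precision over candidate bigrams
    r = overlap / max(sum(ref.values()), 1)     # recall over reference bigrams
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(rouge2("returns the sum of two numbers",
             "returns the sum of the two inputs"))
```

Precision rewards not padding the summary with unsupported bigrams, recall rewards covering the reference, and F-measure balances the two, matching the three quantities reported in the abstract.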