A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find improvement over two baseline techniques from SE literature
and one from NLP literature.
Automatic Generation of Text Descriptive Comments for Code Blocks
We propose a framework to automatically generate descriptive comments for
source code blocks. While this problem has been studied by many researchers
previously, their methods are mostly based on fixed templates and achieve poor
results. Our framework does not rely on any template, but makes use of a new
recursive neural network called Code-RNN to extract features from the source
code and embed them into one vector. When this vector representation is input
to a new recurrent neural network (Code-GRU), the overall framework generates
text descriptions of the code with accuracy (Rouge-2 value) significantly
higher than other learning-based approaches such as the sequence-to-sequence model.
The Code-RNN model can also be used in other scenarios where the representation
of code is required.
Comment: aaai 201
A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes
We propose a model to automatically describe changes introduced in the source
code of a program using natural language. Our method receives as input a set of
code commits, which contain both the modifications and the message introduced by
a user. These two modalities are used to train an encoder-decoder
architecture. We evaluated our approach on twelve real-world open source
projects from four different programming languages. Quantitative and
qualitative results showed that the proposed approach can generate feasible and
semantically sound descriptions not only in standard in-project settings, but
also in a cross-project setting.
Comment: Accepted at ACL 201
A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English
Converting source or unit test code to English has been shown to improve the
maintainability, understandability, and analysis of software and tests. Code
summarizers identify important statements in the source/tests and convert them
to easily understood English sentences using static analysis and NLP
techniques. However, current test summarization approaches handle only a subset
of the variation and customization allowed in the JUnit assert API (a critical
component of test cases), which may affect the accuracy of conversions. In this
paper, we present our work towards improving JUnit test summarization with a
detailed process for converting a total of 45 unique JUnit assertions to
English, including 37 previously-unhandled variations of the assertThat method.
This process has also been implemented and released as the AssertConvert tool.
Initial evaluations have shown that this tool generates English conversions
that accurately represent a wide variety of assertion statements which could be
used for code summarization or other NLP analyses.
Comment: In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP
for Software Engineering (NL4SE 18), November 4, 2018, Lake Buena Vista, FL,
USA. ACM, New York, NY, USA, 4 page
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs is vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.
Comment: 41st ACM/IEEE International Conference on Software Engineering
(ICSE2019
Data-Driven Decisions and Actions in Today’s Software Development
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues