A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes
We propose a model to automatically describe changes introduced in the source
code of a program using natural language. Our method receives as input a set of
code commits, each containing both the code modifications and the message written
by a user. These two modalities are used to train an encoder-decoder
architecture. We evaluated our approach on twelve real-world open-source
projects from four different programming languages. Quantitative and
qualitative results showed that the proposed approach can generate feasible and
semantically sound descriptions not only in standard in-project settings, but
also in a cross-project setting. Comment: Accepted at ACL 201
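The abstract pairs two modalities, the code change and the message its author wrote, as training data for an encoder-decoder. A minimal sketch of how commits might be turned into (source, target) token sequences for such a model, assuming unified-diff input and naive whitespace tokenization; `Commit`, `diff_tokens`, `message_tokens`, and `build_pairs` are hypothetical names, not the authors' preprocessing.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical commit record: a unified diff plus the message a user wrote."""
    diff: str
    message: str


def diff_tokens(diff: str) -> list[str]:
    """Tokenize only the changed lines of a unified diff, keeping a +/- marker."""
    tokens = []
    for line in diff.splitlines():
        # Skip file headers such as '--- a/f.py' and '+++ b/f.py'.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            tokens.append(line[0])           # marks an added or removed line
            tokens.extend(line[1:].split())  # naive whitespace tokenization (assumption)
    return tokens


def message_tokens(message: str) -> list[str]:
    """Lowercased, whitespace-tokenized first line of the commit message."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line.lower().split()


def build_pairs(commits: list[Commit]) -> list[tuple[list[str], list[str]]]:
    """Return (encoder input, decoder target) token sequences, one pair per commit."""
    return [(diff_tokens(c.diff), message_tokens(c.message)) for c in commits]
```

Keeping the `+`/`-` marker as its own token lets a downstream encoder distinguish added from removed code; a real system would likely use a code-aware tokenizer instead of splitting on whitespace.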
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
In a software project, especially in open source, a contribution is a valuable
piece of work made to the project: writing code, reporting bugs, translating,
improving documentation, creating graphics, etc. We are now at the beginning of
an exciting era where software bots will make contributions that are similar in
nature to those made by humans. Dry contributions, with no explanation, are often
ignored or rejected because they are not understandable per se, because they
are not put into a larger context, or because they are not grounded in idioms
shared by the core community of developers. We have been operating a
program repair bot called Repairnator for 2 years and noticed the problem of
"dry patches": a patch that does not say which bug it fixes, or that does not
explain the effects of the patch on the system. We envision program repair
systems that produce an "explainable bug fix": an integrated package of at
least 1) a patch, 2) its explanation in natural or controlled language, and 3)
a highlight of the behavioral difference with examples. In this paper, we
generalize and suggest that software bot contributions must be explainable and
that they must be put into the context of the global software development
conversation.
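The three-part "explainable bug fix" package described above can be modeled as a small data structure. A minimal sketch, assuming a Python dataclass representation; `ExplainableBugFix`, `BehavioralExample`, `is_complete`, and `render` are hypothetical names, not part of Repairnator or any published format.

```python
from dataclasses import dataclass, field


@dataclass
class BehavioralExample:
    """Hypothetical record of one input on which patched behavior differs."""
    input_repr: str
    output_before: str
    output_after: str


@dataclass
class ExplainableBugFix:
    """Hypothetical bundle of the three parts listed in the abstract."""
    patch: str        # 1) the patch itself, e.g. a unified diff
    explanation: str  # 2) explanation in natural or controlled language
    examples: list[BehavioralExample] = field(default_factory=list)  # 3) behavioral difference

    def is_complete(self) -> bool:
        """True only if all three parts are present; otherwise it is a 'dry patch'."""
        return bool(self.patch.strip()) and bool(self.explanation.strip()) and len(self.examples) > 0

    def render(self) -> str:
        """Format the bundle as a plain-text pull-request description."""
        lines = [self.explanation, "", "Behavioral difference:"]
        for ex in self.examples:
            lines.append(f"  input {ex.input_repr}: {ex.output_before} -> {ex.output_after}")
        lines += ["", self.patch]
        return "\n".join(lines)
```

A bot could refuse to open a pull request while `is_complete()` is false, which turns the paper's "no dry patches" idea into a simple check.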
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1 million
Java methods. We find improvement over two baseline techniques from the SE
literature and one from the NLP literature.
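The key idea is that the model receives the words in code and the code's structure as two separate inputs. A minimal sketch of producing those two streams, assuming Python's standard `ast` and `tokenize` modules applied to Python source for convenience (the paper's dataset is Java); the bracketed node-type traversal is a stand-in chosen here, not necessarily the paper's AST encoding, and `text_tokens` and `structure_tokens` are hypothetical names.

```python
import ast
import io
import keyword
import re
import tokenize


def text_tokens(source: str) -> list[str]:
    """Words from identifier names in source order, split on snake_case and camelCase."""
    words = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keep identifiers only; skip keywords, literals, comments, and operators.
        if tok.type != tokenize.NAME or keyword.iskeyword(tok.string):
            continue
        for part in tok.string.split("_"):
            words.extend(w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return words


def structure_tokens(node: ast.AST) -> list[str]:
    """Pre-order traversal emitting only AST node types, bracketed by nesting.

    Identifier names and literal values are dropped, so this stream does not
    depend on how well the code is named.
    """
    # Load/Store/Del context markers add noise without structural information.
    children = [c for c in ast.iter_child_nodes(node) if not isinstance(c, ast.expr_context)]
    name = type(node).__name__
    if not children:
        return [name]
    tokens = ["(", name]
    for child in children:
        tokens.extend(structure_tokens(child))
    tokens.append(")")
    return tokens
```

Renaming every identifier changes only the text stream; the structure stream stays identical, which is the property that lets a model summarize code with poor or missing internal documentation.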
Data-Driven Decisions and Actions in Today’s Software Development
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to desktop IDEs or the like, but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as Bitbucket, GitHub, GitLab, or Stack Overflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data, and its stakeholders for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.
On the Relevance of Cross-project Learning with Nearest Neighbours for Commit Message Generation
Commit messages play an important role in software maintenance and evolution.
Nonetheless, developers often do not produce high-quality messages. A number of
commit message generation methods have been proposed in recent years to address
this problem. Some of these methods are based on neural machine translation
(NMT) techniques. Studies have shown that a nearest-neighbor algorithm (NNGen)
outperforms existing NMT-based methods, although NNGen is simpler and faster
than NMT. In this paper, we show that NNGen does not take advantage of
cross-project learning in the majority of cases. We also show that there is
an even simpler and faster variation of the existing NNGen method which
outperforms it in terms of the BLEU-4 score without using cross-project
learning.
- …