A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes
We propose a model to automatically describe changes introduced in the source
code of a program using natural language. Our method receives as input a set of
code commits, each containing both the code modifications and the commit
message written by a user. These two modalities are used to train an
encoder-decoder architecture. We evaluated our approach on twelve real-world
open-source projects spanning four programming languages. Quantitative and
qualitative results show that the proposed approach can generate feasible and
semantically sound descriptions not only in the standard in-project setting,
but also in a cross-project setting.
Comment: Accepted at ACL 201
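The commit-to-message setup described above can be pictured with a minimal encoder-decoder sketch; the class name, vocabulary sizes, and dimensions below are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DiffToMessage(nn.Module):
    """Hypothetical sketch: map a tokenized code diff to a commit message."""
    def __init__(self, diff_vocab, msg_vocab, emb=128, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(diff_vocab, emb)
        self.tgt_emb = nn.Embedding(msg_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, msg_vocab)

    def forward(self, diff_tokens, msg_tokens):
        # Encode the tokenized code change into a single context vector.
        _, context = self.encoder(self.src_emb(diff_tokens))
        # Decode the commit message conditioned on that context.
        dec_states, _ = self.decoder(self.tgt_emb(msg_tokens), context)
        return self.out(dec_states)  # logits over the message vocabulary

model = DiffToMessage(diff_vocab=5000, msg_vocab=8000)
logits = model(torch.randint(0, 5000, (2, 60)), torch.randint(0, 8000, (2, 12)))
print(logits.shape)  # (2, 12, 8000): one distribution per message position
```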
Deep Recurrent Generative Decoder for Abstractive Text Summarization
We propose a new framework for abstractive text summarization based on a
sequence-to-sequence oriented encoder-decoder model equipped with a deep
recurrent generative decoder (DRGN).
Latent structure information implied in the target summaries is learned based
on a recurrent latent random model for improving the summarization quality.
Neural variational inference is employed to address the intractable posterior
inference for the recurrent latent variables.
Abstractive summaries are generated based on both the generative latent
variables and the discriminative deterministic states.
Extensive experiments on benchmark datasets in different languages show
that DRGN achieves improvements over state-of-the-art methods.
Comment: 10 pages, EMNLP 201
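The core idea of pairing a deterministic recurrent state with a recurrent latent variable trained by neural variational inference can be sketched for a single decoding step; names, sizes, and the standard-normal prior below are assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class LatentDecoderStep(nn.Module):
    """Hypothetical one-step sketch of a variational recurrent decoder."""
    def __init__(self, emb=128, hidden=256, latent=64, vocab=8000):
        super().__init__()
        self.cell = nn.GRUCell(emb, hidden)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.out = nn.Linear(hidden + latent, vocab)

    def forward(self, x_t, h_prev):
        h_t = self.cell(x_t, h_prev)                  # deterministic decoder state
        mu, logvar = self.to_mu(h_t), self.to_logvar(h_t)
        # Reparameterization trick: sample the latent variable differentiably.
        z_t = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Generation uses both the latent sample and the deterministic state.
        logits = self.out(torch.cat([h_t, z_t], dim=-1))
        # KL term against a standard-normal prior, added to the training loss.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, h_t, kl
```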
Automatic Generation of Text Descriptive Comments for Code Blocks
We propose a framework to automatically generate descriptive comments for
source code blocks. While this problem has been studied by many researchers
previously, their methods are mostly based on fixed templates and achieve poor
results. Our framework does not rely on any template, but makes use of a new
recursive neural network called Code-RNN to extract features from the source
code and embed them into one vector. When this vector representation is fed
into a new recurrent neural network (Code-GRU), the overall framework generates
text descriptions of the code with accuracy (ROUGE-2) significantly higher than
other learning-based approaches such as the sequence-to-sequence model. The
Code-RNN model can also be used in other scenarios where a representation of
code is required.
Comment: AAAI 201
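The recursive "fold the code into one vector" idea can be illustrated with a toy parse tree; the tree structure, dimensions, and class below are hypothetical and not the paper's Code-RNN:

```python
import torch
import torch.nn as nn

class TreeFold(nn.Module):
    """Hypothetical sketch: recursively fold a parse tree into one vector."""
    def __init__(self, dim=128, vocab=3000):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.combine = nn.Linear(2 * dim, dim)

    def encode(self, node):
        # node = (token_id, [children]); children are folded bottom-up.
        token_id, children = node
        vec = self.emb(torch.tensor([token_id])).squeeze(0)
        for child in children:
            vec = torch.tanh(self.combine(torch.cat([vec, self.encode(child)])))
        return vec  # one vector summarizing the whole subtree

tree = (1, [(2, []), (3, [(4, [])])])   # toy parse tree
print(TreeFold().encode(tree).shape)    # torch.Size([128])
```

The resulting vector would then seed a recurrent decoder (the paper's Code-GRU) that emits the comment text.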
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Generating a text abstract from a set of documents remains a challenging
task. The neural encoder-decoder framework has recently been exploited to
summarize single documents, but its success can in part be attributed to the
availability of large parallel data automatically acquired from the Web. In
contrast, parallel data for multi-document summarization are scarce and costly
to obtain. There is a pressing need to adapt an encoder-decoder model trained
on single-document summarization data to work with multiple-document input. In
this paper, we present an initial investigation into a novel adaptation method.
It exploits the maximal marginal relevance method to select representative
sentences from multi-document input, and leverages an abstractive
encoder-decoder model to fuse disparate sentences into an abstractive summary.
The adaptation method is robust and itself requires no training data. Our
system compares favorably to state-of-the-art extractive and abstractive
approaches as judged by automatic metrics and human assessors.
Comment: 11 page
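The maximal marginal relevance step described above can be sketched as follows; the TF-IDF representation, the trade-off parameter, and the function name are illustrative assumptions rather than the authors' exact pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_select(sentences, k=5, lam=0.7):
    """Pick k sentences relevant to the whole input yet non-redundant."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    centroid = np.asarray(tfidf.mean(axis=0))           # document-set centroid
    relevance = cosine_similarity(tfidf, centroid).ravel()
    pairwise = cosine_similarity(tfidf)
    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Penalize similarity to anything already selected.
            redundancy = max((pairwise[i][j] for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]

docs = ["Sentence one about the topic.", "Another sentence about the topic.",
        "A completely different remark."]
print(mmr_select(docs, k=2))
```

The selected sentences would then be handed to a single-document abstractive encoder-decoder for fusion into the final summary.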
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
In a software project, esp. in open-source, a contribution is a valuable
piece of work made to the project: writing code, reporting bugs, translating,
improving documentation, creating graphics, etc. We are now at the beginning of
an exciting era where software bots will make contributions of a similar nature
to those made by humans. Dry contributions, with no explanation, are often
ignored or rejected: the contribution is not understandable per se, it is not
put into a larger context, and it is not grounded in idioms shared by the core
community of developers. We have been operating a
program repair bot called Repairnator for 2 years and noticed the problem of
"dry patches": a patch that does not say which bug it fixes, or that does not
explain the effects of the patch on the system. We envision program repair
systems that produce an "explainable bug fix": an integrated package of at
least 1) a patch, 2) its explanation in natural or controlled language, and 3)
a highlight of the behavioral difference with examples. In this paper, we
generalize and suggest that software bot contributions must be explainable and
must be put into the context of the global software development conversation.
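The envisioned "explainable bug fix" package could be pictured as a simple record; the field names and example values below are purely illustrative:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExplainableBugFix:
    """Hypothetical bundle: a patch plus its explanation and behavior examples."""
    patch: str                      # the code change itself, e.g. a unified diff
    explanation: str                # rationale in natural or controlled language
    behavioral_examples: List[str] = field(default_factory=list)  # before/after

fix = ExplainableBugFix(
    patch="--- a/Foo.java\n+++ b/Foo.java\n...",
    explanation="Guards against a null list returned by getItems().",
    behavioral_examples=["before: NullPointerException",
                         "after: returns an empty list"],
)
```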
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find an improvement over two baseline techniques from the SE
literature and one from the NLP literature.
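The two-input idea, keeping code tokens and a flattened AST sequence as separate encoder inputs, can be sketched as below; vocabularies, dimensions, and the simple concatenation of contexts are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DualInputSummarizer(nn.Module):
    """Hypothetical sketch: separate encoders for code text and AST structure."""
    def __init__(self, code_vocab=30000, ast_vocab=200, word_vocab=20000,
                 emb=128, hidden=256):
        super().__init__()
        self.code_emb = nn.Embedding(code_vocab, emb)
        self.ast_emb = nn.Embedding(ast_vocab, emb)
        self.word_emb = nn.Embedding(word_vocab, emb)
        self.code_enc = nn.GRU(emb, hidden, batch_first=True)
        self.ast_enc = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, 2 * hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, word_vocab)

    def forward(self, code_tokens, ast_tokens, summary_tokens):
        _, h_code = self.code_enc(self.code_emb(code_tokens))
        _, h_ast = self.ast_enc(self.ast_emb(ast_tokens))
        # Structure and text are learned separately and only merged here.
        context = torch.cat([h_code, h_ast], dim=-1)
        states, _ = self.decoder(self.word_emb(summary_tokens), context)
        return self.out(states)

model = DualInputSummarizer()
logits = model(torch.randint(0, 30000, (2, 100)),   # code tokens
               torch.randint(0, 200, (2, 150)),     # flattened AST tokens
               torch.randint(0, 20000, (2, 15)))    # summary generated so far
print(logits.shape)  # (2, 15, 20000)
```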