A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. However,
nearly all of these techniques rely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find improvement over two baseline techniques from the SE
literature and one from the NLP literature.
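The abstract's key idea is processing the words from code and the AST structure as separate inputs. A minimal sketch (not the authors' implementation; the example function and token extraction are illustrative) of preparing those two independent channels, using Python's `ast` module in place of a Java parser:

```python
# Two separate input channels for a code summarization model: textual tokens
# (identifier words) and structural tokens (AST node types). Keeping them
# separate lets a model learn structure even when identifiers are unhelpful.
import ast

SOURCE = """
def add_item(cart, item):
    cart.append(item)
    return cart
"""

tree = ast.parse(SOURCE)

# Channel 1: words drawn from the code's identifiers.
text_tokens = sorted(
    {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    | {node.arg for node in ast.walk(tree) if isinstance(node, ast.arg)}
)

# Channel 2: AST node types in traversal order, free of any identifier text.
struct_tokens = [type(node).__name__ for node in ast.walk(tree)]

print(text_tokens)        # identifier words from the code
print(struct_tokens[:5])  # structure sequence, independent of the text
```

Renaming every identifier in `SOURCE` would change the first channel but leave the second untouched, which is precisely why a model with separate inputs can still summarize poorly documented code.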
Introduction to Microservice API Patterns (MAP)
The Microservice API Patterns (MAP) language and supporting website premiered under this name at Microservices 2019. MAP distills proven, platform- and technology-independent solutions to recurring (micro-)service design and interface specification problems such as finding well-fitting service granularities, rightsizing message representations, and managing the evolution of APIs and their implementations. In this paper, we motivate the need for such a pattern language, outline the language organization and present two exemplary patterns describing alternative options for representing nested data. We also identify future research and development directions
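The two alternative options for representing nested data that the abstract mentions can be sketched as follows (the pattern names Embedded Entity and Linked Information Holder follow MAP's published vocabulary; the order payload itself is an illustrative assumption):

```python
# Two MAP-style options for representing nested data in an API message.
import json

# Option 1 -- Embedded Entity: nest the related data directly in the message,
# saving the client a follow-up call at the cost of a larger payload.
embedded = {"orderId": 42,
            "customer": {"name": "Ada", "city": "Zurich"}}

# Option 2 -- Linked Information Holder: replace the nested data with a link
# the client can follow on demand, keeping the message small.
linked = {"orderId": 42,
          "customerUrl": "/customers/7"}

print(json.dumps(embedded))
print(json.dumps(linked))
```

The trade-off is message size versus round trips: embedding suits clients that always need the nested data, while linking suits clients that rarely do.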
Does State Policy Help or Hurt the Dropout Problem in California?
Examines the scope and causes of California's dropout problem, and assesses whether some state policies unintentionally drive students out of schools. Proposes a comprehensive policy framework focused on effectively serving at-risk students
Code Generation as a Dual Task of Code Summarization
Code summarization (CS) and code generation (CG) are two crucial tasks in the
field of automatic software development. Various neural network-based
approaches have been proposed to solve these two tasks separately. However,
there exists an intuitive correlation between CS and CG that has not been
exploited in previous work. In this paper, we exploit the duality between the
two tasks to improve the performance of both: we propose a dual training
framework that trains the two tasks simultaneously. In this framework, we
consider the dualities on probability and attention weights, and design
corresponding regularization terms to constrain the duality. We evaluate our
approach on two datasets collected from GitHub, and experimental results show
that our dual framework can improve the performance of the CS and CG tasks
over baselines.

Comment: To appear at the 33rd Conference on Neural Information Processing
Systems (NeurIPS) 2019
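The probability duality the abstract mentions can be made concrete. For a code/summary pair (x, y), the joint probability must agree whichever direction it is factored, i.e. log P(x) + log P(y|x) = log P(y) + log P(x|y), so a natural regularizer penalizes the squared gap between the two factorizations. A minimal sketch (an assumed formulation for illustration, not the paper's exact code):

```python
# Probability-duality regularizer: zero when the two factorizations of the
# joint probability of a code/summary pair agree, positive otherwise.
import math

def duality_loss(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared gap between log P(x) + log P(y|x) and log P(y) + log P(x|y)."""
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap ** 2

# Toy numbers: a consistent pair (both factorizations give P = 0.1)
# incurs no penalty...
consistent = duality_loss(math.log(0.2), math.log(0.5),
                          math.log(0.4), math.log(0.25))
# ...while an inconsistent pair is penalized.
inconsistent = duality_loss(math.log(0.2), math.log(0.5),
                            math.log(0.4), math.log(0.5))
print(round(consistent, 6), round(inconsistent, 6))
```

In training, such a term would be added to the two tasks' losses so the summarizer and generator are pushed toward mutually consistent probability estimates.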
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.

Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 2019
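The training signal the abstract describes can be sketched as rewarding the annotation model by how well its generated annotation lets a retrieval model rank the true code snippet. The function names, the word-overlap scorer, and the reciprocal-rank reward below are illustrative stand-ins, not CoaCor's actual components:

```python
# Retrieval-based reward for an annotation model: use the generated
# annotation as a query and reward the reciprocal rank of the true snippet.
def retrieval_reward(annotation, true_code, code_base, score):
    """Reciprocal rank of true_code when the annotation is the query."""
    ranked = sorted(code_base, key=lambda code: score(annotation, code),
                    reverse=True)
    return 1.0 / (ranked.index(true_code) + 1)

# Toy stand-in retrieval scorer: word overlap between query and code tokens.
def overlap_score(query, code):
    return len(set(query.split()) & set(code.split()))

code_base = ["def add ( a , b ) : return a + b",
             "def read_file ( path ) : return open ( path )"]
annotation = "add two numbers a and b"
print(retrieval_reward(annotation, code_base[0], code_base, overlap_score))
```

Because the reward is a ranking outcome rather than a differentiable loss, reinforcement learning is the natural fit: the annotation model samples annotations and is updated in proportion to the retrieval reward they earn.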