Towards Automatic Generation of Short Summaries of Commits
Committing to a version control system means submitting a software change to
the system. Each commit can have a message to describe the submission. Several
approaches have been proposed to automatically generate the content of such
messages. However, the quality of the automatically generated messages falls
far short of what humans write. In studying the differences between
auto-generated and human-written messages, we found that 82% of the
human-written messages have only one sentence, while the automatically
generated messages often have multiple lines. Furthermore, we found that the
commit messages often begin with a verb followed by a direct object. This
finding inspired us to use a "verb+object" format in this paper to generate
short commit summaries. We split the approach into two parts: verb generation
and object generation. As a first step, we trained a classifier to map a
diff to a verb. We are seeking feedback from the community before we continue
to work on generating direct objects for the commits.
Comment: 4 pages, accepted in ICPC 2017 ERA Track
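The verb-generation step described above can be illustrated with a minimal sketch. The abstract does not say which classifier or features were used, so the naive Bayes model, the tokenizer, and the toy training diffs below are all assumptions chosen for illustration, not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(diff):
    """Return sign-prefixed, lowercased word tokens from changed lines only."""
    tokens = []
    for line in diff.splitlines():
        # Keep added/removed lines, skip the "+++"/"---" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            sign = line[0]
            tokens += [sign + t.lower() for t in re.findall(r"[A-Za-z_]+", line[1:])]
    return tokens


class VerbClassifier:
    """Multinomial naive Bayes with add-one smoothing (an assumed model)."""

    def fit(self, diffs, verbs):
        self.verb_counts = Counter(verbs)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for diff, verb in zip(diffs, verbs):
            toks = tokenize(diff)
            self.token_counts[verb].update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, diff):
        toks = tokenize(diff)
        total = sum(self.verb_counts.values())
        best_verb, best_score = None, -math.inf
        for verb, n in self.verb_counts.items():
            counts = self.token_counts[verb]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for t in toks:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_verb, best_score = verb, score
        return best_verb


# Hypothetical toy training set: (diff, first verb of the commit message).
train_diffs = [
    "+def parse_config(path):\n+    return load(path)",
    "-def old_helper():\n-    pass",
    "-    return x + 1\n+    return x + 2",
    "+class Logger:\n+    pass",
    "-import unused_module",
]
train_verbs = ["add", "remove", "fix", "add", "remove"]

clf = VerbClassifier().fit(train_diffs, train_verbs)
predicted = clf.predict("+def new_feature():\n+    pass")
```

Prefixing each token with `+` or `-` is one simple way to let the model tell additions from deletions; a real system would need far more data and richer features.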
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1 million Java
methods. We find improvement over two baseline techniques from the SE literature
and one from the NLP literature.
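The idea of feeding words and structure as separate inputs can be sketched as follows. This is only an illustration of the input side: it uses Python's standard `ast` module rather than Java, and the helper names `code_words` and `structure_sequence` and the plain pre-order traversal are assumptions, not the representation used in the paper.

```python
import ast
import re


def code_words(source):
    """Words taken from identifiers, split on underscores and camelCase."""
    words = []
    for node in ast.walk(ast.parse(source)):
        name = None
        if isinstance(node, ast.FunctionDef):
            name = node.name
        elif isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.arg):
            name = node.arg
        if name:
            words += [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", name)]
    return words


def structure_sequence(source):
    """Pre-order sequence of AST node type names, with no identifier text."""
    seq = []

    def visit(node):
        seq.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return seq


original = "def add_tax(price, rate):\n    return price * (1 + rate)\n"
renamed = "def f(a, b):\n    return a * (1 + b)\n"

# The text input changes when identifiers are renamed...
words_original = set(code_words(original))
words_renamed = set(code_words(renamed))
# ...but the structure input stays the same.
same_structure = structure_sequence(original) == structure_sequence(renamed)
```

Because the structure sequence carries no identifier text, a model that reads it as its own input still has a signal when names are uninformative, which is the situation the abstract highlights.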
The current situation and management of idle rural homesteads in China - based on a survey in Jiangxi province
China is still in the middle, accelerating stage of urbanization, and idle rural homesteads are a major problem in its rural areas. Based on two elements (population and land), idle homesteads can be divided into two types: the first involves one household with houses, and the second results from population migration. Questionnaire and interview analysis shows that the traditional concept of land remains deeply rooted among farmers. Building houses only along roads and building ostentatious multi-story houses are prominent phenomena. The needs of traditional agricultural production have become a major obstacle to the management system for idle homesteads. Idle homesteads are an inevitable result of social and economic development, but the problem is becoming more serious because current law lags behind and inadequate management systems are not properly regulated. The authors suggest that, on the basis of villager autonomy, the management of idle homesteads should proceed in three steps: first, promote a system of voluntary withdrawal from idle homesteads; second, issue homestead use-right certificates; third, classify the ways in which idle homesteads are used.
The Path and Enlightenment of Data-Driven Digital Transformation of Organizational Learning: A Case Study of the Practice of China Telecom
This paper takes China Telecom as a case, analyzes data-driven digital transformation in organizational learning, and summarizes the methods and lessons of that transformation.
Fabrication and characterizations of proton-exchanged LiNbO3 waveguides fabricated by inductively coupled plasma technique
This Letter reports the use of an inductively coupled plasma technique for fabrication of proton-exchanged (PE) LiNbO3 (LN) waveguides. Planar and stripe waveguides have been formed in Y-cut LN, where they are difficult to obtain with the conventional molten acid method because of surface damage. Secondary ion mass spectrometry, scanning electron microscopy, and infrared absorption spectrum characterization revealed that a uniform vertical PE profile with a single low-order crystal phase is obtained directly as a result of this unique process. X-ray photoelectron spectroscopy characterization of the treated surface revealed NbO as the cause of a sometimes darkened surface and confirmed that the surface can be completely restored to LN by oxygen plasma treatment. Atomic force microscopy measurements confirmed that good surface quality is maintained after the surface is regenerated to LN.
Statement-based Memory for Neural Source Code Summarization
Source code summarization is the task of writing natural language
descriptions of source code behavior. Code summarization underpins software
documentation for programmers. Short descriptions of code help programmers
understand the program quickly without having to read the code itself. Lately,
neural source code summarization has emerged as the frontier of research into
automated code summarization techniques. By far the most popular targets for
summarization are program subroutines. The idea, in a nutshell, is to train an
encoder-decoder neural architecture using large sets of examples of subroutines
extracted from code repositories. The encoder represents the code and the
decoder represents the summary. However, most current approaches attempt to
treat the subroutine as a single unit, for example by taking the entire
subroutine as input to a Transformer or RNN-based encoder. But code behavior
tends to depend on the flow from statement to statement. Normally, dynamic
analysis may shed light on this flow, but dynamic analysis on hundreds of
thousands of examples in large datasets is not practical. In this paper, we
present a statement-based memory encoder that learns the important elements of
flow during training, leading to a statement-based subroutine representation
without the need for dynamic analysis. We implement our encoder for code
summarization and demonstrate a significant improvement over the
state-of-the-art.
Comment: 10 pages, 2 figures
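To make the contrast with single-unit input concrete, the sketch below splits a subroutine into an ordered list of statements, the kind of per-statement view a statement-based encoder could consume. It covers only this preprocessing idea, not the memory encoder itself; the helper name `statement_units` and the use of Python's `ast` module are assumptions for illustration.

```python
import ast


def statement_units(source):
    """Return one text unit per statement in the first function of `source`.

    Compound statements (for, if, while, ...) contribute their header line,
    followed by the statements nested inside them, in source order.
    Requires Python 3.8+ for ast.get_source_segment.
    """
    func = ast.parse(source).body[0]
    units = []

    def collect(stmts):
        for stmt in stmts:
            segment = ast.get_source_segment(source, stmt)
            units.append(segment.splitlines()[0].strip())
            # Recurse into nested blocks; exception handlers and match
            # cases are ignored in this simplified sketch.
            for field in ("body", "orelse", "finalbody"):
                collect(getattr(stmt, field, []))

    collect(func.body)
    return units


example = (
    "def total(items):\n"
    "    s = 0\n"
    "    for x in items:\n"
    "        if x > 0:\n"
    "            s += x\n"
    "    return s\n"
)

units = statement_units(example)
```

Each unit could then be encoded separately, so that a model sees the statement-to-statement order explicitly instead of one undivided token stream.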