Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation
Programming languages usually have official documentation to guide developers through their APIs, methods, and classes. However, researchers have identified insufficient or inadequate documentation examples, as well as the complexity of an API's structure, as barriers to learning an API. As a result, developers may consult other sources (StackOverflow, GitHub, etc.) to learn more about an API. Recent studies have shown that unofficial documentation is a valuable source of information for generating code summaries. We were therefore motivated to leverage this type of documentation, together with deep learning techniques, to generate high-quality summaries for APIs discussed in informal documentation. This paper proposes an automatic approach that uses BART, a state-of-the-art transformer model, to generate summaries for
APIs discussed on StackOverflow. We built an oracle of human-written summaries and evaluated our approach against it using ROUGE and BLEU, the most widely used evaluation metrics in text summarization. Furthermore, we empirically compared the quality of our summaries against a previous work. Our findings demonstrate that deep learning can improve summary quality, outperforming the previous work by an average of 57% in Precision, 66% in Recall, and 61% in F-measure, while running 4.4 times faster.
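The ROUGE scores this abstract reports compare n-gram overlap between a generated summary and a reference summary. A minimal pure-Python sketch of ROUGE-1 (unigram overlap), assuming plain whitespace tokenization rather than the stemming and tokenization real evaluation toolkits apply:

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """Compute unigram ROUGE-1 precision, recall, and F1.

    Tokenization here is whitespace splitting, an illustrative
    simplification; published evaluations use dedicated toolkits.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("parses the json string into a dict",
                "parses a json string and returns a dict")
```

Precision rewards candidates that contain few spurious words, recall rewards coverage of the reference, and F-measure balances the two, which is why the abstract reports all three.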
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1 million Java methods. We find improvement over two baseline techniques from the SE literature and one from the NLP literature.
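The core idea above, feeding the words in code and the code's AST structure to the model as two separate input streams, can be sketched as the preprocessing step that produces those streams. This is an illustrative sketch using Python's `ast` module (the paper itself targets Java methods), not the paper's actual pipeline:

```python
import ast

def dual_inputs(source: str):
    """Split a function into two input streams: identifier words
    from the code, and a node-type sequence from the AST.

    Keeping the streams separate lets a model learn code structure
    independently of identifier text, so it can still describe a
    method even when its identifiers are uninformative.
    """
    tree = ast.parse(source)
    # Stream 1: identifier words appearing in the code.
    words = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
    # Stream 2: node-type sequence from a breadth-first AST walk,
    # which carries structure even if every identifier were renamed.
    structure = [type(n).__name__ for n in ast.walk(tree)]
    return words, structure

words, structure = dual_inputs("def f(a, b):\n    return a + b\n")
```

Renaming `a` and `b` would change the first stream but leave the second untouched, which is the property that lets such a model cope with code lacking clear identifier names.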
Example-based controlled translation
The first research on integrating controlled language data in an Example-Based Machine Translation (EBMT) system was published in [Gough & Way, 2003]. We improve on their sub-sentential alignment algorithm to populate the system's databases with more than six times as many potentially useful fragments. Together with two simple novel improvements, correcting mistranslations in the lexicon and allowing multiple translations in the lexicon, translation quality improves considerably when target-language translations are constrained. We also develop the first EBMT system which attempts to filter the source language data using controlled language specifications. We provide detailed automatic and human evaluations of a number of experiments carried out to test the quality of the system. We observe that our system outperforms Logomedia in a number of tests. Finally, despite conflicting results from different automatic evaluation metrics, we observe a preference for controlling the source data rather than the target translations.
Controlled generation in example-based machine translation
The theme of controlled translation is currently in vogue in the area of MT. Recent research (Schäler et al., 2003;
Carl, 2003) hypothesises that EBMT systems are perhaps best suited to this challenging task. In this paper, we present
an EBMT system where the generation of the target string is filtered by data written according to controlled language
specifications. As far as we are aware, this is the only research available on this topic. In the field of controlled language applications, it is more usual to constrain the source language in this way rather than the target. We translate a small corpus of controlled English into French using the on-line MT system Logomedia, and seed the memories of our EBMT system with a set of automatically induced lexical resources using the Marker Hypothesis as a segmentation tool. We test our system on a large set of sentences extracted from a Sun Translation Memory, and provide both an automatic and a human evaluation. For comparative purposes, we also provide results for Logomedia itself.
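The Marker Hypothesis mentioned above holds that closed-class "marker" words (determiners, prepositions, conjunctions, and so on) signal the boundaries of syntactic chunks, so a sentence can be segmented by starting a new fragment at each marker word. A toy sketch of such a segmenter; the marker set below is a small illustrative sample, not the full inventory used in the EBMT literature:

```python
# Small illustrative sample of closed-class marker words.
MARKERS = {"the", "a", "an", "in", "on", "of", "to", "and", "with"}

def marker_segment(sentence: str) -> list[list[str]]:
    """Segment a sentence into marker-headed chunks.

    A new chunk starts at each marker word, but only once the
    current chunk already contains a non-marker word, so runs of
    consecutive markers (e.g. "in the") stay in one chunk.
    """
    chunks, current = [], []
    for word in sentence.lower().split():
        if word in MARKERS and any(w not in MARKERS for w in current):
            chunks.append(current)  # close the chunk before the marker
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return chunks

chunks = marker_segment("Click the icon in the toolbar to open a file")
```

Fragments produced this way are the sub-sentential units that seed the EBMT system's memories, since marker-headed chunks align comparatively well across languages.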