
A Neural Model for Generating Natural Language Summaries of Program Subroutines

Authors: Jiang Siyuan; LeClair Alexander; McMillan Collin
Publication date: 05/02/2019

Source code summarization (creating natural language descriptions of source code behavior) is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. However, nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This helps our approach provide coherent summaries in many cases, even when no internal documentation is provided. We evaluate our technique with a dataset we created from 2.1 million Java methods. We find improvement over two baseline techniques from the SE literature and one from the NLP literature.

arXiv.org e-Print Archive

Crossref

Eastern Michigan University: Digital Commons@EMU
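To make the idea of separate inputs concrete, here is a minimal sketch that assumes Python source and Python's standard `ast` and `tokenize` modules as stand-ins; the paper works on Java methods, and this is not its preprocessing pipeline. The hypothetical helpers `code_words` and `ast_structure` derive two sequences from one function: the subwords found in identifiers, and a bracketed pre-order traversal of AST node types that contains no identifier text, so structure is available to a model even when names are uninformative.

```python
import ast
import io
import keyword
import re
import tokenize


def split_identifier(name: str) -> list[str]:
    """Split camelCase or snake_case identifiers into lowercase subwords."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def code_words(source: str) -> list[str]:
    """Text input (hypothetical): subwords of every non-keyword identifier, in source order."""
    words = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            words.extend(split_identifier(tok.string))
    return words


def ast_structure(source: str) -> list[str]:
    """Structure input (hypothetical): pre-order AST node types only, no identifier text."""
    seq = []

    def visit(node: ast.AST) -> None:
        seq.append("(" + type(node).__name__)
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store context markers; they add tokens but no structure.
            if not isinstance(child, ast.expr_context):
                visit(child)
        seq.append(")")

    visit(ast.parse(source).body[0])
    return seq


src = "def getMaxValue(items):\n    return max(items)\n"
print(code_words(src))  # ['get', 'max', 'value', 'items', 'max', 'items']
print(ast_structure(src))
```

Renaming `getMaxValue` to an opaque name such as `f1` would change the first sequence but leave the second one untouched, which is the property the abstract relies on.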

Do we really need to write documentation for a system? CASE tool add-ons: generator+editor for a precise documentation

Authors: Mou Dongyue; Spichkova Maria; Zhu Xiuna
Publication date: 29/04/2014

A common problem in system development projects is that the system documentation is often outdated and does not describe the latest version of the system. The situation is even more complicated when the documentation is not a natural language description of the system but its formal specification. In this paper, we discuss how the problem could be solved by updating the documentation automatically, that is, by generating a new formal specification from the model, which is especially useful if the model changes frequently. Comment: In Proceedings of the International Conference on Model-Driven Engineering and Software Development (MODELSWARD'13)

arXiv.org e-Print Archive

RMIT Research Repository

Learning Semantic Correspondences in Technical Documentation

Authors: Kuhn Jonas; Richardson Kyle
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or code templates. Our approach exploits the parallel nature of such documentation, or the tight coupling between high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals.Comment: accepted to ACL-201

arXiv.org e-Print Archive

Crossref
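The idea of mining parallel text-representation pairs from documentation can be illustrated with a minimal sketch. It assumes Python source files with docstrings as the "documentation"; the paper mines standard library documentation for several programming languages, and this is not its extraction pipeline. The hypothetical `mine_pairs` helper pairs the first line of each top-level function's docstring with a simple signature representation such as `maximum(a, b)`; undocumented functions yield no pair.

```python
import ast


def mine_pairs(module_source: str) -> list[tuple[str, str]]:
    """Collect (text description, signature representation) pairs from docstrings."""
    pairs = []
    for node in ast.parse(module_source).body:  # top-level functions only
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            if doc:  # undocumented functions give no parallel pair
                summary = doc.strip().splitlines()[0]
                params = ", ".join(a.arg for a in node.args.args)
                pairs.append((summary, f"{node.name}({params})"))
    return pairs


SOURCE = '''
def maximum(a, b):
    """Return the larger of a and b.

    Ties return a.
    """
    return a if a >= b else b

def helper(x):
    return x * 2
'''

print(mine_pairs(SOURCE))  # [('Return the larger of a and b.', 'maximum(a, b)')]
```

Pairs of this kind could then serve as training data for a text-to-representation model; the choice of the first docstring line as the "high-level text" is an assumption made here for brevity.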

A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English

Authors: Gonzalez Danielle; Mirakhorli Mehdi; Prentice Suzanne
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/11/2018

Converting source or unit test code to English has been shown to improve the maintainability, understandability, and analysis of software and tests. Code summarizers identify important statements in the source or tests and convert them to easily understood English sentences using static analysis and NLP techniques. However, current test summarization approaches handle only a subset of the variation and customization allowed in the JUnit assert API (a critical component of test cases), which may affect the accuracy of conversions. In this paper, we present our work towards improving JUnit test summarization with a detailed process for converting a total of 45 unique JUnit assertions to English, including 37 previously unhandled variations of the assertThat method. This process has also been implemented and released as the AssertConvert tool. Initial evaluations have shown that this tool generates English conversions that accurately represent a wide variety of assertion statements, which could be used for code summarization or other NLP analyses. Comment: In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering (NL4SE 18), November 4, 2018, Lake Buena Vista, FL, USA. ACM, New York, NY, USA, 4 pages

arXiv.org e-Print Archive

Crossref
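The general idea of turning assertions into English can be illustrated with a minimal template-based sketch. The patterns, wording, and the `assertion_to_english` helper below are hypothetical and cover only five simple forms; AssertConvert's actual rules, including its handling of the `assertThat` variations and of overloads with a message argument, are not reproduced here. Because JUnit's `assertEquals` takes the expected value first and the actual value second, the template swaps them so the sentence reads naturally.

```python
import re
from typing import Optional

# Hypothetical (pattern, template) pairs; {0}, {1} refer to regex groups.
TEMPLATES = [
    # assertEquals(expected, actual): the actual value is the sentence subject.
    (re.compile(r"assertEquals\((.+?),\s*(.+)\)$"), "{1} should equal {0}"),
    (re.compile(r"assertTrue\((.+)\)$"), "{0} should be true"),
    (re.compile(r"assertFalse\((.+)\)$"), "{0} should be false"),
    (re.compile(r"assertNull\((.+)\)$"), "{0} should be null"),
    # Hamcrest-style assertThat(actual, is(value)).
    (re.compile(r"assertThat\((.+?),\s*is\((.+)\)\)$"), "{0} should be {1}"),
]


def assertion_to_english(statement: str) -> Optional[str]:
    """Return an English sentence for a supported assertion, else None."""
    stmt = statement.strip().rstrip(";")
    for pattern, template in TEMPLATES:
        match = pattern.match(stmt)
        if match:
            return template.format(*match.groups())
    return None  # unsupported assertion form


for line in ["assertEquals(5, list.size());",
             "assertThat(result, is(42));",
             "assertTrue(list.isEmpty());"]:
    print(assertion_to_english(line))
```

The lazy split at the first comma breaks when the expected argument itself contains a comma (for example, a method call with two arguments), which is one reason a fine-grained converter would parse the Java expression rather than rely on regular expressions.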