1,008 research outputs found

    A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English

    Converting source or unit test code to English has been shown to improve the maintainability, understandability, and analysis of software and tests. Code summarizers identify important statements in the source/tests and convert them to easily understood English sentences using static analysis and NLP techniques. However, current test summarization approaches handle only a subset of the variation and customization allowed in the JUnit assert API (a critical component of test cases), which may affect the accuracy of conversions. In this paper, we present our work towards improving JUnit test summarization with a detailed process for converting a total of 45 unique JUnit assertions to English, including 37 previously-unhandled variations of the assertThat method. This process has also been implemented and released as the AssertConvert tool. Initial evaluations have shown that this tool generates English conversions that accurately represent a wide variety of assertion statements, which could be used for code summarization or other NLP analyses. Comment: In Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering (NL4SE 18), November 4, 2018, Lake Buena Vista, FL, USA. ACM, New York, NY, USA, 4 pages.
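As a rough illustration of what converting an assertion to English can look like, the sketch below maps a handful of simple JUnit assertion calls to sentences using fixed templates. It is not the AssertConvert process: the template wording, the `TEMPLATES` table, and the helper names `split_args` and `to_english` are all assumptions made for this example, and `assertThat` matchers and message-first overloads are deliberately left unhandled.

```python
import re

# Hypothetical templates: (expected argument count, sentence pattern).
# The wording is illustrative and not taken from the AssertConvert tool.
TEMPLATES = {
    "assertEquals": (2, "{1} should equal {0}"),  # JUnit order: expected, actual
    "assertTrue": (1, "{0} should be true"),
    "assertFalse": (1, "{0} should be false"),
    "assertNull": (1, "{0} should be null"),
    "assertNotNull": (1, "{0} should not be null"),
}

# Matches a single call such as `assertEquals(4, calc.add(2, 2));`
CALL = re.compile(r"^\s*(?:Assert\.)?(\w+)\((.*)\)\s*;?\s*$")


def split_args(text):
    """Split on top-level commas, skipping commas nested in brackets or strings.

    Escaped quotes inside string literals are not handled in this sketch.
    """
    args, current, depth, in_string = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_string = not in_string
        elif not in_string and ch in "([{":
            depth += 1
        elif not in_string and ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0 and not in_string:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


def to_english(statement):
    """Return an English sentence for a supported assertion, else None."""
    match = CALL.match(statement)
    if match is None:
        return None
    name, raw_args = match.groups()
    if name not in TEMPLATES:
        return None
    arity, pattern = TEMPLATES[name]
    args = split_args(raw_args)
    if len(args) != arity:
        return None  # e.g. overloads with a leading message argument
    return pattern.format(*args)


if __name__ == "__main__":
    print(to_english("assertEquals(4, calc.add(2, 2));"))
    print(to_english("assertTrue(list.isEmpty());"))
```

Returning `None` for anything outside the table keeps the sketch honest about its coverage; handling the many `assertThat` variations the abstract mentions would need real parsing rather than a regular expression.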

    A Neural Model for Generating Natural Language Summaries of Program Subroutines

    Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1m Java methods. We find improvement over two baseline techniques from SE literature and one from NLP literature

    Automatic generation of descriptions for Prolog programs

    It is often hard for students and newcomers used to imperative languages to learn a declarative language such as Prolog. One of their main difficulties is understanding the procedural component of Prolog. Despite being a declarative language, Prolog allows for the creation of procedures whose structure is very different from that of the more common imperative languages. To tackle this issue, we try to facilitate code comprehension of procedural Prolog through the generation of formal and natural language descriptions. First, we represent the workflow of Prolog-encoded procedures through formal descriptions similar to imperative languages. To do this, we identify programming patterns that represent the basic blocks of certain classes of Prolog programs. Then we view more complex Prolog programs as coherent compositions of instances of the basic patterns. By using formal templates, we formally describe these individual patterns in an intermediate formal language. Afterwards, we generate natural language descriptions by using templates to describe the formal constructs. Using this two-step approach, we obtain two descriptions (one formal and one in natural language) that are both explanatory of the original program.

    Automatically Extracting Subroutine Summary Descriptions from Unstructured Comments

    Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as JavaDocs and Doxygen generate documentation built around them. And yet, extracting summaries from unstructured source code repositories remains a difficult research problem -- it is very difficult to generate clean structured documentation unless the summaries are annotated by programmers. This becomes a problem in large repositories of legacy code, since it is cost prohibitive to retroactively annotate summaries in dozens or hundreds of old programs. Likewise, it is a problem for creators of automatic documentation generation algorithms, since these algorithms usually must learn from large annotated datasets, which do not exist for many programming languages. In this paper, we present a semi-automated approach via crowdsourcing and a fully-automated approach for annotating summaries from unstructured code comments. We present experiments validating the approaches, and provide recommendations and cost estimates for automatically annotating large repositories. Comment: 10 pages, plus references. Accepted for publication in the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, London, Ontario, Canada, February 18-21, 2020.

    Mining Question and Answer Sites for Automatic Comment Generation

    Code comments improve software maintainability, programming productivity, and software reliability. To address the comment scarcity issue in many projects and save developers’ time in writing comments, we propose a new, general automatic comment generation approach, which mines comments from a large programming Question and Answer (Q&A) site. Q&A sites allow programmers to post questions and receive solutions, which contain code segments together with their descriptions, referred to as code-description mappings. We develop AutoComment to extract such mappings, and leverage them to generate description comments automatically for similar code segments matched in open source projects. We apply AutoComment to analyze 92,140 Java and Android tagged Q&A posts to extract 132,767 code-description mappings, which help AutoComment generate 102 comments automatically for 23 Java and Android projects. The number of generated comments is still low, but the user study results show that the majority of the participants consider the generated comments accurate, adequate, concise, and useful in helping them understand the code. One of the advantages from mining Q&A sites for automatic comment generation is that human written comments can provide information that is not explicitly in the code. In the future, we would like to focus on improving both the yield and quality of the generated comments. To improve the yield, we can replace the token-based clone detection tool with one that can detect addition and reordering of lines to increase the number of code matches. To improve the quality, we can apply advanced natural language processing techniques such as semantic role labeling to analyze the semantics of the sentences, or typed dependencies to analyze the grammatical structure of the sentences

    Prototype of a tool for automatic generation of commit messages for Java applications

    Although version control systems allow developers to describe and explain the rationale behind code changes in commit messages, the state of practice indicates that most of the time such commit messages are either very short or even empty. In fact, a recent study of 23K+ Java projects found that only 10% of the messages are descriptive and over 66% of those messages contained fewer words than a typical English sentence. However, accurate and complete commit messages summarizing software changes are important to support a number of development and maintenance tasks. This thesis presents an approach, named ChangeScribe, which is designed to generate commit messages automatically from change sets. ChangeScribe generates natural language commit messages by taking into account the commit stereotype, the type of changes (e.g., file renames, changes made only to property files), and the impact set of the underlying changes. This work presents the evaluation of ChangeScribe in a survey involving 23 developers, in which the participants analyzed automatically generated commit messages from real changes and compared them with commit messages written by the original developers of six open source systems. The results demonstrate that the messages automatically generated by ChangeScribe are preferred in about 62% of the cases for large commits, and in about 54% of the cases for small commits. (Master's thesis.)
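To make the idea of generating a commit message from a change set more concrete, here is a minimal template-based sketch. It is not ChangeScribe's algorithm: it looks only at the type of each change (including the rename and property-file cases the abstract mentions) and does not model commit stereotypes or impact sets. The `FileChange` class, the `describe` and `commit_message` functions, and the sentence wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    """One entry of a change set (hypothetical structure for this sketch)."""
    path: str
    kind: str          # "added", "removed", "modified", or "renamed"
    old_path: str = ""  # only meaningful when kind == "renamed"


# Illustrative verbs for the non-rename change kinds.
VERBS = {"added": "Add", "removed": "Remove", "modified": "Modify"}


def describe(change):
    """Render a single change as a short imperative phrase."""
    if change.kind == "renamed":
        return f"Rename {change.old_path} to {change.path}"
    # Raises KeyError for unknown kinds, which this sketch does not support.
    return f"{VERBS[change.kind]} {change.path}"


def commit_message(changes):
    """Build a summary line plus one bullet per changed file."""
    if not changes:
        return ""
    if all(c.path.endswith(".properties") for c in changes):
        summary = "Update property files only"
    else:
        summary = f"Change {len(changes)} file(s)"
    bullets = [f"- {describe(c)}" for c in sorted(changes, key=lambda c: c.path)]
    return "\n".join([summary, ""] + bullets)


if __name__ == "__main__":
    print(commit_message([
        FileChange("src/B.java", "renamed", "src/A.java"),
        FileChange("README.md", "added"),
    ]))
```

Sorting the bullets by path makes the output deterministic for a given change set, which is convenient when comparing generated messages against human-written ones.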