    A three-layer model of source code comprehension

    In this paper we first propose a source code comprehension model built as a hierarchy of three abstraction levels, from the source code up to the purpose (goal) of the program. The elements belonging to each layer are precisely defined, as are their links to the elements in the adjacent layers. This model therefore makes it possible to bridge the semantic gap between the purpose of the program, defined in business terms, and the code that implements it. The model leverages two ontologies: an action ontology, which is specific to our approach, and a domain concept ontology. We then implemented this model as a tool under Eclipse and performed two experiments to assess the relevance of our approach to the maintenance of a large-scale program. The results of these experiments are very encouraging. The contributions of the paper are the presentation of our program comprehension model, built on a novel approach based on an action ontology; the description of the tool we developed to assess the relevance of the model; and the testing of the latter with two controlled experiments.
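
    A minimal Python sketch of the three-layer hierarchy the abstract describes, where code elements are linked through action-ontology concepts up to business-level goals. All class and field names here are hypothetical illustrations, not the authors' implementation.

    from dataclasses import dataclass, field

    @dataclass
    class CodeElement:            # Layer 1: source code (methods, statements)
        identifier: str

    @dataclass
    class Action:                 # Layer 2: concept from the action ontology
        verb: str                 # e.g. "validate", "compute"
        domain_concept: str       # link into the domain concept ontology
        implemented_by: list[CodeElement] = field(default_factory=list)

    @dataclass
    class Goal:                   # Layer 3: program purpose in business terms
        description: str
        achieved_by: list[Action] = field(default_factory=list)

    # Bridging the semantic gap: a business-level goal resolves down to the
    # code elements that implement it, via the intermediate action layer.
    def code_for_goal(goal: Goal) -> list[CodeElement]:
        return [elem for action in goal.achieved_by
                     for elem in action.implemented_by]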

    A Neural Model for Generating Natural Language Summaries of Program Subroutines

    Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems, but nearly all of these techniques rely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independently of the text in the code. This helps our approach produce coherent summaries in many cases even when no internal documentation is provided. We evaluate our technique on a dataset we created from 2.1m Java methods and find improvements over two baseline techniques from the SE literature and one from the NLP literature.
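
    A minimal PyTorch sketch of the separate-inputs idea: code tokens and a flattened AST sequence are encoded independently, and the decoder attends over each stream on its own, so structure can be learned separately from identifier text. Layer choices and sizes are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class DualEncoderSummarizer(nn.Module):
        def __init__(self, code_vocab, ast_vocab, sum_vocab, dim=256):
            super().__init__()
            self.code_emb = nn.Embedding(code_vocab, dim)
            self.ast_emb = nn.Embedding(ast_vocab, dim)
            self.sum_emb = nn.Embedding(sum_vocab, dim)
            self.code_enc = nn.GRU(dim, dim, batch_first=True)
            self.ast_enc = nn.GRU(dim, dim, batch_first=True)
            self.decoder = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(3 * dim, sum_vocab)

        @staticmethod
        def _attend(queries, keys):
            # Dot-product attention: weight encoder states by decoder queries.
            scores = torch.softmax(torch.bmm(queries, keys.transpose(1, 2)), dim=-1)
            return torch.bmm(scores, keys)

        def forward(self, code_ids, ast_ids, summary_ids):
            code_states, _ = self.code_enc(self.code_emb(code_ids))
            ast_states, _ = self.ast_enc(self.ast_emb(ast_ids))
            dec_states, _ = self.decoder(self.sum_emb(summary_ids))
            # Attend over each input stream separately, then combine, so the
            # AST contributes even when identifier text is uninformative.
            code_ctx = self._attend(dec_states, code_states)
            ast_ctx = self._attend(dec_states, ast_states)
            return self.out(torch.cat([dec_states, code_ctx, ast_ctx], dim=-1))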

    Exploiting Sentence Embedding for Medical Question Answering

    Despite the great success of word embedding, sentence embedding remains an open problem. In this paper, we present a supervised learning framework that exploits sentence embedding for the medical question answering task. The framework consists of two main parts: 1) a sentence embedding module, and 2) a scoring module. The former is built with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor; we call this module Contextual self-Attention Multi-scale Sentence Embedding (CAMSE). The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity, while SAS captures the association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is evaluated on two Medical Question Answering (MedicalQA) datasets collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The results show that our framework achieves significant improvements over competitive baseline approaches. A series of controlled experiments further illustrates that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embeddings, and that the two scoring strategies are highly complementary for question answering.
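
    A minimal PyTorch sketch of the two scoring strategies, assuming sentence pairs have already been encoded into fixed-size embeddings (the CAMSE encoder itself is omitted). Modeling SMS as cosine similarity and SAS as a learned bilinear form is an illustrative assumption, not the paper's exact formulation.

    import torch.nn as nn
    import torch.nn.functional as F

    class PairScorer(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.assoc = nn.Bilinear(dim, dim, 1)  # learned association (SAS)

        def forward(self, question_choice, evidence):
            # SMS: semantic matching as cosine similarity between embeddings.
            sms = F.cosine_similarity(question_choice, evidence, dim=-1)
            # SAS: learned association score between the pair.
            sas = self.assoc(question_choice, evidence).squeeze(-1)
            # The two strategies are complementary; a simple combination sums them.
            return sms + sas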