Search CORE

56,104 research outputs found

A three-layer model of source code comprehension

Author: Agrawal Ashish
Belmonte Javier
Dugerdil Philippe
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

In this paper we first propose a source code comprehension model built as a hierarchy of three abstraction levels from the source code to the purpose (goal) of the program. The elements belonging to each layer have been precisely defined as well as their links to the elements in the adjacent layers. Consequently this model allows to bridge the semantic gap between the purpose of the program defined in business terms and the code that implements it. The model leverages two ontologies: an action ontology, which is specific to our approach, and a domain concept ontology. Next this model has been implemented as a tool under Eclipse and two experiments have been performed to assess the relevance of our approach in the maintenance of a large-scale program. The results of this experiment are very encouraging. The contribution of the paper is the presentation of our program comprehension model built on a novel approach based on an action ontology, the description of the tool we developed to assess the relevance of model and the testing of the latter with two controlled experiments

Crossref

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

RERO DOC Digital Library

A Neural Model for Generating Natural Language Summaries of Program Subroutines

Author: Jiang Siyuan
LeClair Alexander
McMillan Collin
Publication venue
Publication date: 05/02/2019
Field of study

Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1m Java methods. We find improvement over two baseline techniques from SE literature and one from NLP literature

arXiv.org e-Print Archive

Crossref

Eastern Michigan University: Digital Commons@EMU

Exploiting Sentence Embedding for Medical Question Answering

Author: Hao Yu
Liu Xien
Lv Ping
Wu Ji
Publication venue
Publication date: 14/11/2018
Field of study

Despite the great success of word embedding, sentence embedding remains a not-well-solved problem. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding producing module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor. This module is shortly called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE). The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is examined by two Medical Question Answering(MedicalQA) datasets which are collected from real-world applications: medical exam and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieved significant improvements compared to competitive baseline approaches. Additionally, a series of controlled experiments are also conducted to illustrate that the multi-scale strategy and the contextual self-attention layer play important roles for producing effective sentence embedding, and the two kinds of scoring strategies are highly complementary to each other for question answering problems.Comment: 8 page

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications