20 research outputs found

    Relating Developers’ Concepts and Artefact Vocabulary in a Financial Software Module

    Developers working on unfamiliar systems are challenged to accurately identify where and how high-level concepts are implemented in the source code. Without additional help, concept location can become a tedious, time-consuming and error-prone task. In this paper we study an industrial financial application for which we had access to the user guide, the source code, and some change requests. We compared the relative importance of the domain concepts, as understood by developers, in the user manual and in the source code. We also searched the code for the concepts occurring in change requests, to see whether they could point developers to the code to be modified. We varied the searches (using exact and stem matching, discarding stop-words, etc.) and present the resulting precision and recall. We discuss the implications of our results for maintenance.
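The search variations mentioned in this abstract (exact versus stem matching, stop-word removal, precision and recall) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the stop-word list, the crude `naive_stem` suffix stripper (standing in for a real stemmer such as Porter's), the file contents, and the set of relevant files are invented and do not come from the paper.

```python
import re

# Hypothetical stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "is"}

def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text, stem=False):
    # Split camelCase identifiers, lower-case, and drop stop-words.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]
    words = [w for w in words if w not in STOP_WORDS]
    return {naive_stem(w) for w in words} if stem else set(words)

def search(query, files, stem=False):
    """Return the names of files sharing at least one term with the query."""
    q = tokenize(query, stem)
    return {name for name, text in files.items() if q & tokenize(text, stem)}

def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy data, invented for illustration only.
FILES = {
    "Ledger.java": "class Ledger { void postPayments() {} }",
    "Report.java": "class Report { void printSummary() {} }",
    "Rates.java": "class Rates { double paymentRate; }",
}
RELEVANT = {"Ledger.java", "Rates.java"}
QUERY = "Fix rounding of the payment"

exact = search(QUERY, FILES, stem=False)
stemmed = search(QUERY, FILES, stem=True)
print("exact:", precision_recall(exact, RELEVANT))
print("stemmed:", precision_recall(stemmed, RELEVANT))
```

In this toy setup, exact matching misses `Ledger.java` because `payments` does not equal `payment`, while stem matching retrieves it. Real data would also show the opposite effect, where stemming lowers precision by matching unrelated words.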

    A Case Study in Matching Service Descriptions to Implementations in an Existing System

    A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach is to first identify and describe the desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, as well as several observations that throw light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations. Comment: 20 pages, 19 PDF figures.

    Identifying class name inconsistency in hierarchy: a first simple heuristic

    Giving good class names is an important task. Good programmers often report that they need several attempts to find an adequate one. Programmers often do not name classes consistently within a package, project or hierarchy. This is a problem because it hampers understanding of the system. In this article we present a simple heuristic (a distribution) to characterise class naming. We combine this heuristic with structural information to identify inconsistent class names. In addition, we use the heuristic to give packages a shape. We applied it to 285 packages in Pharo to identify misnamed classes. Some of these misnamed classes are reported and discussed here.

    Instrumentation of Programs Written in Java to Interconnect the Problem and Program Domains (Instrumentación de Programas Escritos en Java para Interconectar los Dominios del Problema y del Programa)

    Program Comprehension (PC) is a Software Engineering discipline whose goal is to make systems easier to understand by developing methods, techniques, strategies and tools that help explain the functionality of the system under study. One of the main challenges in PC is to establish a relationship between the Problem Domain and the Program Domain. The former concerns the behaviour of the system under study, while the latter focuses on the program components that produce that behaviour. One way to build this relationship is to create a representation of each domain and then define a procedure for linking the two representations. This task requires extracting information from both domains, for which many techniques exist. This article describes a scheme for extracting dynamic information from the program domain, which is very useful for implementing comprehension strategies. (Sociedad Argentina de Informática e Investigación Operativa)

    Efficient Information Retrieval for Software Bug Localization

    Software systems are often shipped with defects. When a bug is reported, developers use the information available in the associated report to locate source code fragments that need to be modified to fix the bug. However, as software systems evolve in size and complexity, bug localization can become a tedious and time-consuming process. Contemporary bug localization tools utilize Information Retrieval (IR) methods for automated support to minimize the manual effort. IR methods exploit the textual content of bug reports to capture and rank relevant buggy source files. However, for an IR-based bug localization tool to be useful, it must achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. Motivated by these observations, in this dissertation, we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods exploit the co-occurrence patterns of code terms in software systems to reveal latent semantic information that other methods often fail to capture. We further investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between software artifacts, can enhance the confidence in each other's results. Furthermore, we propose a novel approach for enhancing the performance of IR-enabled bug localization methods in the context of Open-Source Software (OSS). The proposed approach exploits knowledge from previously resolved bugs to help localize new bugs. Our analysis uses multiple datasets generated for multiple open-source and closed-source projects.
    Our results show that a) information-theoretic IR methods can significantly outperform classical IR methods in bug localization tasks, b) optimized IR-hybrids can significantly outperform individual IR methods, and near-optimal global configurations can be determined for different combinations of IR methods, and c) information extracted from previously resolved bug reports can significantly enhance the accuracy of IR-enabled bug localization methods in OSS.
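To make the idea of ranking source files against a bug report concrete, here is a minimal sketch of a classical IR baseline: TF-IDF weighting with cosine similarity. It is not the information-theoretic or hybrid method the dissertation proposes. The function names, the smoothed IDF formula, the file contents, and the bug report text are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def terms(text):
    # Split camelCase identifiers, then keep lower-cased alphabetic tokens.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a mapping of name -> text."""
    counts = {name: Counter(terms(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(t for c in counts.values() for t in c)
    # Smoothed IDF; one common variant among several.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_files(bug_report, source_files):
    """Return (file name, score) pairs, most similar first."""
    vectors, idf = tfidf_vectors(source_files)
    q_counts = Counter(terms(bug_report))
    # Report terms that never occur in the code base are ignored.
    query = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = {name: cosine(query, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus and bug report, invented for illustration only.
SOURCE_FILES = {
    "Parser.java": "class Parser { Token parseExpression() { return nextToken(); } }",
    "Cache.java": "class Cache { void evictEntry() {} int cacheSize; }",
    "Logger.java": "class Logger { void logMessage(String message) {} }",
}
BUG = "Crash when parser reads an empty expression token"

ranking = rank_files(BUG, SOURCE_FILES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A purely lexical ranker like this one scores a file at zero when it shares no terms with the report, which is the kind of gap that co-occurrence-based methods aim to close.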

    A three-layer model of source code comprehension

    In this paper we first propose a source code comprehension model built as a hierarchy of three abstraction levels, from the source code to the purpose (goal) of the program. The elements belonging to each layer are precisely defined, as are their links to the elements in the adjacent layers. Consequently, this model makes it possible to bridge the semantic gap between the purpose of the program, defined in business terms, and the code that implements it. The model leverages two ontologies: an action ontology, which is specific to our approach, and a domain concept ontology. Next, this model has been implemented as a tool under Eclipse, and two experiments have been performed to assess the relevance of our approach in the maintenance of a large-scale program. The results of these experiments are very encouraging. The contribution of the paper is the presentation of our program comprehension model built on a novel approach based on an action ontology, the description of the tool we developed to assess the relevance of the model, and the testing of the latter with two controlled experiments.

    Restructuring source code identifiers

    In software engineering, maintenance accounts for 60% of the overall project lifecycle costs of any software product. Program comprehension is a substantial part of maintenance and evolution cost and, thus, any advancement in maintenance, evolution, and program understanding will potentially greatly reduce the total cost of ownership of any software product. Identifiers are an important source of information during program understanding and maintenance. Programmers often use identifiers to build their mental models of the software artifacts. Thus, poorly chosen identifiers have been reported in the literature as misleading and as increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults, and hence maintenance effort. We investigate our conjecture using a measure combining term entropy and term context-coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. We compute term entropy and context-coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context-coverage are more fault-prone than others, and that the new measure is only partially correlated with size. We will build on this study, and will apply summarization techniques for extracting linguistic information from methods and classes. Using this information, we will extract domain concepts from source code, and propose linguistic-based refactoring.
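As a rough illustration of the term entropy idea, the sketch below computes the Shannon entropy of each term's distribution over the methods in which it occurs. This is an assumption about how such a measure could be set up: the abstract does not give the exact formula or normalization, and context-coverage is not modelled here. The function name `term_entropy` and the occurrence data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def term_entropy(term_occurrences):
    """Map each term to the Shannon entropy (in bits) of its spread over methods.

    term_occurrences: iterable of (method_name, term) pairs, one per occurrence.
    A term used in a single method scores 0; a term spread evenly over many
    methods scores high.
    """
    per_term = defaultdict(Counter)
    for method, term in term_occurrences:
        per_term[term][method] += 1

    entropy = {}
    for term, counts in per_term.items():
        total = sum(counts.values())
        entropy[term] = sum(
            -(c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy

# Toy occurrences, invented for illustration only.
OCCURRENCES = [
    ("parse", "token"), ("parse", "token"),
    ("emit", "buffer"), ("scan", "buffer"),
    ("flush", "buffer"), ("close", "buffer"),
]

entropy = term_entropy(OCCURRENCES)
for term, h in sorted(entropy.items()):
    print(f"{term}: {h:.2f} bits")
```

Here `token` appears in only one method and gets zero entropy, while `buffer` is spread evenly over four methods and gets two bits. This is the sense in which a high-entropy term is one used across many different places.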