4 research outputs found

    Measuring the Impact of Code Dependencies on Software Architecture Recovery Techniques

    Many techniques have been proposed to automatically recover software architectures from software implementations. A thorough comparison among these recovery techniques is needed to understand their effectiveness and applicability. This study improves on previous studies in two ways. First, we study the impact of leveraging more accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other factors of the input dependencies, such as the level of granularity, the impact of virtual call resolution, global variable usage, and whether using direct dependencies provides better results than using transitive dependencies. Previous studies have not extensively examined how the quality of the input affects the quality of the output of architecture recovery techniques. Second, we study a system (Chromium) that is substantially larger (10 million lines of code) than those included in previous studies. Obtaining the ground-truth architecture of Chromium involved two years of collaboration with its developers. As part of this work, we developed a new submodule-based technique to recover preliminary versions of ground-truth architectures. The other systems that we study have been examined previously. In some cases, we have updated their ground-truth architectures to newer versions, and in other cases we have corrected newly discovered inconsistencies. Our evaluation of nine variants of six state-of-the-art architecture recovery techniques on eight types of dependencies shows that symbol dependencies generally produce architectures with higher accuracies than include dependencies. We also observed that using a higher level of granularity (i.e., module level) and direct dependencies helps generate better architectures. Despite this improvement, the overall accuracy is low for all recovery techniques.
The results suggest that (1) in addition to the choice of architecture recovery technique, the type of dependencies used as its input is another factor to consider for high recovery accuracy, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques (ACDC, Bunch-SAHC, WCA and ARC) scale to the 10M lines-of-code range (the size of Chromium), whereas others do not.
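The abstract contrasts direct with transitive dependencies and file-level with coarser, module-level granularity. The Python sketch below illustrates both transformations on a toy dependency list. It assumes dependencies arrive as (source file, target file) pairs and that a file's module is simply its parent directory; that mapping, the function names, and the example paths are hypothetical and are not the study's extraction pipeline or Chromium's actual module definition.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def lift_to_modules(file_deps):
    """Lift file-level dependency edges to module-level edges.

    Hypothetical assumption: a file's module is its parent directory.
    Edges between files in the same module are dropped.
    """
    module_deps = set()
    for src, dst in file_deps:
        src_mod = str(PurePosixPath(src).parent)
        dst_mod = str(PurePosixPath(dst).parent)
        if src_mod != dst_mod:
            module_deps.add((src_mod, dst_mod))
    return module_deps


def transitive_closure(deps):
    """Return every (a, c) such that c is reachable from a via one or more edges."""
    graph = defaultdict(set)
    for src, dst in deps:
        graph[src].add(dst)
    closure = set()
    for start in list(graph):
        # Depth-first traversal from each node that has outgoing edges.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


# Hypothetical example paths, for illustration only.
file_deps = {
    ("net/http/client.cc", "net/base/io.h"),
    ("net/base/io.cc", "base/files/file.h"),
    ("net/http/client.cc", "net/http/client.h"),
}
direct = lift_to_modules(file_deps)
transitive = transitive_closure(direct)
```

Here lifting yields the direct module edges net/http → net/base and net/base → base/files (the edge between two net/http files is dropped), and the closure adds the indirect edge net/http → base/files. A recovery technique given the closure therefore sees extra indirect edges that are absent from the direct graph.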

    Machine Learning for Software Dependability

    Dependability is an important quality of modern software but is challenging to achieve. Many techniques have been proposed to help developers improve software reliability and dependability, such as defect prediction [83,96,249], bug detection [6,17,146], program repair [51,127,150,209,261,263], test case prioritization [152,250], and software architecture recovery [13,42,67,111,164,240]. In this thesis, we consider how machine learning (ML) and deep learning (DL) can be used to enhance software dependability through three examples in three different domains: automatic program repair, bug detection in electronic document readers, and software architecture recovery. In the first work, we propose CoCoNuT, a new generate-and-validate (G&V) technique that uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, the new context-aware NMT architecture represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs) because CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs, and combines these models using ensemble learning to fix more bugs. CoCoNuT fixes 493 bugs, including 307 bugs that none of the 27 techniques we compare against fix. In the second work, we present a study on the correctness of PDF documents and readers, and propose an approach to automatically detect and localize the source of inconsistencies in PDF documents and readers.
We evaluate this automatic approach on a large corpus of over 230K documents using 11 popular readers; our experiments detected 30 unique bugs in these readers and files. In the third work, we compare software architecture recovery techniques to understand their effectiveness and applicability. Specifically, we study the impact of leveraging accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other factors of the input dependencies, such as the level of granularity and the dynamic-bindings graph construction. The results of our evaluation of nine architecture recovery techniques and their variants suggest that (1) using accurate symbol dependencies has a major influence on recovery quality, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques scale to very large systems, whereas others do not.
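The abstract says CoCoNuT combines several models with ensemble learning but does not state the combination rule here. As an illustration of the generate-and-validate loop only, the Python sketch below pools the ranked candidate patches produced by several models, keeps each distinct patch's best score, and validates candidates in ranked order. The max-score merge, the `passes_tests` callback, and the `limit` budget are assumptions for illustration, not CoCoNuT's actual ranking scheme.

```python
from typing import Callable, Iterable, Optional

# A candidate is (patch text, model score); higher scores rank first.
Candidate = tuple[str, float]


def merge_candidates(per_model: Iterable[list[Candidate]]) -> list[Candidate]:
    """Pool candidates from several models, keeping each distinct patch's best score."""
    best: dict[str, float] = {}
    for candidates in per_model:
        for patch, score in candidates:
            if patch not in best or score > best[patch]:
                best[patch] = score
    # Highest score first; ties broken by patch text so the order is deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


def generate_and_validate(
    per_model: Iterable[list[Candidate]],
    passes_tests: Callable[[str], bool],
    limit: int = 1000,
) -> Optional[str]:
    """Return the highest-ranked pooled patch that passes validation, or None."""
    for patch, _score in merge_candidates(per_model)[:limit]:
        if passes_tests(patch):
            return patch
    return None
```

In a real G&V tool, `passes_tests` would apply the patch, compile the program, and run its test suite; here it is any predicate over the patch text. Keeping only the best score per distinct patch means a fix proposed by several models is validated once.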

    Methodology for Transforming the CIM into the PIM in the MDA Approach

    The Object Management Group (OMG) has proposed a new software development approach called Model Driven Architecture (MDA). This approach emphasizes building models at higher levels of abstraction and favors transforming one model into another. MDA advocates building the following three types of models:
• Computation Independent Model (CIM): this model represents the highest level of abstraction and describes the system's requirements and how it operates within its environment, while the details of the application's structure and implementation are hidden or still undetermined.
• Platform Independent Model (PIM): this model describes the details of the system without showing details specific to an execution platform or a particular technology.
• Platform Specific Model (PSM): this model describes the details and characteristics omitted from the PIM. It must be adapted to specify the system's implementation on one single technology platform.
Since these types of models represent different levels of abstraction of the same system, MDA recommends transformation mechanisms that carry the CIM to the PIM and the PIM to the PSM. Since the advent of MDA, several research works have addressed the transformation of the PIM into the PSM and of the PSM into code, but very few deal with the transformation of the CIM into the PIM. Although the literature presents some work related to this question, few researchers appear to have examined the problems involved in transforming the CIM into the PIM. Thus, the CIM was initially considered an integral part of the PIM. Although the notion of platform independence is fairly clear, the notion of "Computation" remains vague. Consequently, the boundary between the CIM and the PIM also remains vague.
To transform the CIM into the PIM, we identified the following three research problems: 1) defining the architecture of the CIM so as to delimit its boundaries with respect to the PIM, 2) defining the architecture of the PIM so as to delimit its boundaries with respect to the PSM, and 3) defining a methodology for transforming the CIM into the PIM. This thesis contributes to the field of model-driven engineering. We propose: 1) a CIM architecture based on the composition of three models, the Business Motivation Model (BMM), the Business Process Model (BPM) and the Requirement Model (RM), 2) a PIM architecture based on analysis patterns and archetype patterns, and 3) a methodology covering all the steps of creating the CIM, along with the techniques and artifacts to produce, that enables the transformation of the CIM into the PIM. This work also contributes to improving traceability between the CIM and the PIM and to narrowing the gap between the activities of business analysts and software architects.

    A Heuristic Approach to Solving the Software Clustering Problem

    This paper provides an overview of the author's Ph.D. thesis [8]. The primary contribution of this research was the development of techniques to extract architectural information about a system directly from its source code. To accomplish this objective, a series of software clustering algorithms was developed. These algorithms use metaheuristic search techniques to partition a directed graph, generated from the entities and relations in the source code, into subsystems. Finding the optimal partition was shown to be NP-hard, so significant emphasis was placed on quickly finding solutions regarded as "good enough". Several evaluation techniques were developed to gauge solution quality, and all of the software clustering tools created to support this work were made available for download over the Internet.
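The abstract does not name the objective used to judge a partition. The Python sketch below assumes a TurboMQ-style modularization quality (MQ) score, in which each cluster contributes 2μ / (2μ + ε), with μ counting edges inside the cluster and ε counting edges crossing its boundary in either direction, and searches with steepest-ascent hill climbing that moves one node at a time. It is a toy illustration of metaheuristic graph clustering in the spirit of the work described, not the Bunch tool itself; the function names are hypothetical, and MQ is recomputed from scratch on every candidate move for clarity rather than speed.

```python
from collections import defaultdict


def mq(edges, assignment):
    """TurboMQ-style score (assumed objective): sum over clusters of
    2*intra / (2*intra + inter), where intra counts edges inside the cluster
    and inter counts edges crossing its boundary in either direction."""
    intra = defaultdict(int)
    inter = defaultdict(int)
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a == b:
            intra[a] += 1
        else:
            inter[a] += 1
            inter[b] += 1
    score = 0.0
    for cluster in set(assignment.values()):
        if intra[cluster] > 0:  # clusters with no internal edges contribute 0
            score += 2 * intra[cluster] / (2 * intra[cluster] + inter[cluster])
    return score


def hill_climb(nodes, edges):
    """Steepest-ascent hill climbing from singleton clusters: at each step apply
    the single-node move (to another existing cluster or a new one) that most
    improves MQ; stop when no move improves it."""
    assignment = {node: i for i, node in enumerate(nodes)}
    current = mq(edges, assignment)
    while True:
        best_score, best_move = current, None
        clusters = set(assignment.values())
        fresh = max(clusters) + 1  # integer label for a brand-new cluster
        for node in nodes:
            original = assignment[node]
            for target in clusters | {fresh}:
                if target == original:
                    continue
                assignment[node] = target  # try the move
                score = mq(edges, assignment)
                if score > best_score + 1e-12:
                    best_score, best_move = score, (node, target)
            assignment[node] = original  # undo before trying the next node
        if best_move is None:
            return assignment, current  # local optimum reached
        node, target = best_move
        assignment[node] = target
        current = best_score
```

On two triangles joined by a single edge (a→b→c→a, d→e→f→d, plus c→d), starting from singleton clusters, the climb ends with {a, b, c} and {d, e, f}, each contributing 6/7 to the score. Because hill climbing stops at a local optimum, a common refinement is to restart from several random partitions; that is omitted here.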