13 research outputs found

    A Process-Oriented Software Architecture Reconstruction Taxonomy

    Get PDF
    To maintain and understand large applications, it is crucial to know their architecture. The first problem is that architectures are not explicitly represented in the code as classes and packages are. The second problem is that successful applications evolve over time, so their architecture inevitably drifts. Reconstructing the architecture and checking whether it is still valid is thus an important aid. While there is a plethora of approaches and techniques supporting architecture reconstruction, there is no comprehensive state of the art, and it is often difficult to compare the approaches. This article presents a first state of the art in software architecture reconstruction, with the desire to support the understanding of the field.

    Bridging Technical Spaces: Model Translation from TA to XMI and Back Again

    Get PDF
    There are many different techniques and notations for extracting architecturally interesting information from the source code of existing software systems. This process is known as reverse engineering. One current problem with reverse engineering techniques is that models of software systems cannot easily be transferred from one notation and storage format to another. We refer to this as the problem of bridging technical spaces. In this work, we approach the issue of bridging between the SWAG technical space and the UML technical space. The SWAG technical space, named after the Software Architecture Group at the University of Waterloo, consists of fact extractors, fact manipulators, schemas, and a fact storage language - the Tuple-Attribute language (TA). The UML technical space consists of the UML metamodel, the XML Metadata Interchange (XMI) format for encoding UML models, and various UML modeling tools. We have designed and implemented a plugin for MagicDraw UML that imports, exports, and merges XMI-encoded UML models and TA-encoded Function-Level Schema models. We document evidence of what is referred to as a Bridge Domain - a technical space which exists between two encodable spaces. The metamodels of the two notation languages that we have focused on are very rich and flexible, but neither technical space is capable of fully expressing an accurate architectural model of any given software system; however, each technical space is capable of maintaining certain semantic information relevant to that technical space through multiple merge operations.
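
    To make the bridge concrete, the following is a minimal Python sketch of one direction of such a translation: parsing a simplified subset of Tuple-Attribute (TA) facts and emitting XMI-style XML. The TA subset shown and the XMI element names are illustrative assumptions, not the thesis's actual schema mapping or the MagicDraw plugin's implementation.

import xml.etree.ElementTree as ET

# Toy TA fact base: "$INSTANCE" declares an entity; "call" relates two.
TA_FACTS = """\
$INSTANCE main cFunction
$INSTANCE printf cFunction
call main printf
"""

def parse_ta(text):
    """Split TA fact lines into (verb, subject, object) triples."""
    facts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            facts.append(tuple(parts))
    return facts

def to_xmi(facts):
    """Map TA instances to model elements and TA calls to dependencies."""
    root = ET.Element("XMI", version="2.1")
    model = ET.SubElement(root, "Model")
    for verb, src, dst in facts:
        if verb == "$INSTANCE":
            ET.SubElement(model, "ownedMember", name=src, type=dst)
        elif verb == "call":
            ET.SubElement(model, "dependency", client=src, supplier=dst)
    return ET.tostring(root, encoding="unicode")

print(to_xmi(parse_ta(TA_FACTS)))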

    A Hybrid Model for Object-Oriented Software Maintenance

    Get PDF
    An object-oriented software system is composed of a collection of communicating objects that co-operate with one another to achieve some desired goals. The object is the basic unit of abstraction in an OO program; objects may model real-world entities or internal abstractions of the system. Similar objects form classes, which encapsulate the data and the operations performed on the data. Therefore, extracting, analyzing, and modelling classes/objects and their relationships is of key importance in understanding and maintaining object-oriented software systems. However, when dealing with large and complex object-oriented systems, maintainers can easily be overwhelmed by the vast number of classes/objects and the high degree of interdependencies among them. In this thesis, we propose a new model, which we call the Hybrid Model, to represent object-oriented systems at a coarse-grained level of abstraction. To promote the comprehensibility of objects as independent units, we group the complete static description of software objects into aggregate components. Each aggregate component logically represents a set of objects, and the components interact with one another through explicitly defined ports. We present and discuss several applications of the Hybrid Model in reverse engineering and software evolution. The Hybrid Model can be used to support a divide-and-conquer strategy for program comprehension. At a low level of abstraction, maintainers can focus on one aggregate component at a time, while at a higher level, each aggregate component can be understood as a whole and be mapped to coarse-grained design abstractions, such as subsystems. Based on the new model, we further propose a set of dependency analysis methods. The analysis results reveal the external properties of aggregate components and lead to a better understanding of the nature of their interdependencies. In addition, we apply the new model to software evolution analysis. We identify a collection of change patterns in terms of changes in aggregate components and their interrelationships. These patterns help to interpret how an evolving system changes at the architectural level, and provide valuable information for understanding why the system is designed the way it is.
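
    As a rough illustration of the model's core idea, the sketch below groups objects into aggregate components that interact only through explicitly declared ports; the class names and the sample system are assumptions made for the example, not the thesis's implementation.

from dataclasses import dataclass, field

@dataclass
class Port:
    name: str
    target: str  # the aggregate component this port reaches

@dataclass
class AggregateComponent:
    name: str
    objects: set = field(default_factory=set)   # objects grouped here
    ports: list = field(default_factory=list)   # explicit interaction points

def external_dependencies(components):
    """Reveal inter-component coupling by walking declared ports only."""
    return [(c.name, p.name, p.target) for c in components for p in c.ports]

ui = AggregateComponent("UI", {"Window", "Menu"}, [Port("events", "Core")])
core = AggregateComponent("Core", {"Engine", "Model"}, [Port("storage", "Persistence")])
db = AggregateComponent("Persistence", {"Store"})

for src, port, dst in external_dependencies([ui, core, db]):
    print(f"{src} --[{port}]--> {dst}")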

    Empirical Evaluation of the Usefulness of Graph-based Visualization Techniques to Support Software Understanding

    Get PDF
    Many researchers have highlighted the scarcity of empirical studies that systematically examine the advantages and disadvantages of using visualization techniques for software understanding activities. Such studies are crucial for gathering and analyzing objective and quantifiable evidence about the usefulness of proposed visualization techniques and tools, and ultimately, for guiding research in software visualization. This paper presents a controlled experiment aimed at assessing the impact of a graph-based visualization technique on comprehension tasks. Six common comprehension tasks were performed by 20 undergraduate software engineering students. The completion time and the accuracy of the participants’ responses were measured. The results indicate that, on one hand, the use of the graph-based visualization increases correctness (by 21.45% on average), but on the other hand, it does not reduce the completion time in program comprehension tasks.
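
    For readers unfamiliar with this kind of analysis, the sketch below shows how such a between-subjects comparison might be run in Python; the scores are placeholders rather than the study's data, and the independent-samples t-test is one common choice, not necessarily the test used in the paper.

from scipy import stats

control = [0.55, 0.60, 0.50, 0.65, 0.58]     # placeholder correctness scores
treatment = [0.72, 0.78, 0.70, 0.81, 0.75]   # placeholder correctness scores

t, p = stats.ttest_ind(treatment, control)
gain = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"mean correctness gain: {gain:.2%}, t={t:.2f}, p={p:.3f}")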

    Evolution and Architecture of Open Source Software Collections: A Case Study of Debian

    Get PDF
    Software has been studied at a variety of granularities. Code, classes, groups of classes, programs, and finally large-scale applications have been examined in detail. What lies beyond is the study of software collections that group together many individual applications. Collecting software and distributing it via a central repository has been popular for a while in the open source world, and only recently caught on commercially with Apple’s Mac App Store and Microsoft’s Windows Store. In many of these software collections, there is normally a complex process that must be followed in order to fully integrate new applications into the system. Moreover, in the case of open source software collections, applications frequently rely on each other for functionality, and their interactions can be complex. We know that there are thousands of applications in these software collections that people depend on worldwide, but research in this area has been limited compared to other areas and granularities of software. In this thesis, we explore the evolution and architecture of a large open source software collection by using Debian as a case study. Debian is a software collection based on the Linux kernel, with a large number of packages spread over multiple hardware platforms. Each package provides a particular service or application and is actively maintained by one or more developers. This thesis investigates how these packages evolve through time and how they interact with one another. The first half of the thesis describes the life cycle of a package from inception to end by carrying out a longitudinal study using the Ultimate Debian Database (UDD). The birth of packages is examined to see how Debian is growing. Conversely, package death is also analyzed to determine the lifespan of these packages. Moreover, four different package attributes are examined: package age, package bugs, package maintainers, and package popularity. These four attributes combine to give us the overall biography of Debian packages. Debian’s architecture is explored in the second part of the thesis, where we analyze how packages interact with each other by examining the package dependencies in detail. The dependencies within Debian are extensive, which makes for an interesting architecture, but they are complex to analyze. This thesis provides a close look at the layered pattern, which categorizes each package into one of five layers based on how it is used. These layers may also be visualized to give a concise view of how an application is structured. Using these views, we define five architectural subpatterns and anti-subpatterns which can aid developers in creating and maintaining packages.
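
    The dependency analysis can be pictured with a small sketch: parse Debian-style "Depends" relations into a graph and bucket packages by dependency depth. The depth-based layering below is a simplifying assumption for illustration; it is not the thesis's five named layers, and the edges are invented.

# Hypothetical dependency edges in the style of Debian "Depends" fields.
DEPENDS = {
    "libc6": [],
    "coreutils": ["libc6"],
    "bash": ["libc6"],
    "python3": ["libc6"],
    "reportbug": ["python3", "coreutils"],
}

def layer(pkg, seen=frozenset()):
    """Layer 0 = no dependencies; otherwise 1 + the deepest dependency."""
    if pkg in seen:                      # guard against dependency cycles
        return 0
    deps = DEPENDS.get(pkg, [])
    if not deps:
        return 0
    return 1 + max(layer(d, seen | {pkg}) for d in deps)

for pkg in DEPENDS:
    print(f"{pkg}: layer {layer(pkg)}")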

    Detecting Feature-Interaction Hotspots in Automotive Software using Relational Algebra

    Get PDF
    Modern software projects are programmed by multiple teams, consist of millions of lines of code, and are split into separate components that, during runtime, may not be contained in the same process. Due to these complexities, software defects are a common reality; defects cost the global economy over a trillion dollars each year. One area where developing safe software is crucial is the automotive domain. As the typical modern vehicle consists of over 100 million lines of code and is responsible for controlling vehicle motion through advanced driver-assistance systems (ADAS), there is a potential for these systems to malfunction in catastrophic ways. Due to this risk, automotive software needs to be inspected to verify that it is safe. The problem is that it can be difficult to carry out this detection in code: manual analysis does not scale well, search tools like grep have no contextual awareness of code, and although code reviews can be effective, they cannot target the entire codebase properly. Furthermore, automotive systems are composed of numerous communicating features that can interact in unexpected or undefined ways. This thesis addresses this problem through the development of a static-analysis methodology that detects custom interaction patterns, coined hotspots. We identify several classes of automotive hotspots that describe patterns in automotive software that have the possibility of manifesting as a feature interaction. To detect these hotspots, this methodology employs a static, relational analysis toolchain that creates a queryable model from source code and enables engineer-defined queries to be run on the model, aiming to reveal potential hotspots in the underlying source code. The purpose of this methodology is not to detect bugs with certainty, but to work towards an analysis methodology that can scale to automotive software systems. We test this hotspot-detection methodology through a case study conducted on the Autonomoose autonomous driving platform. In it, we generate a model of the entire Autonomoose codebase and run relational algebra queries on the model. Each script in the case study detects a type of hotspot we identify in this thesis. The results of each query are presented.
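
    To suggest the flavor of such queries, the sketch below treats extracted facts as binary relations and finds one hypothetical hotspot, two features publishing to the same topic, via relational composition; the fact tuples and the pattern are assumptions for illustration, not the thesis's actual query scripts or Autonomoose facts.

# Hypothetical "publishes" facts extracted from source code.
publishes = {("lane_keep", "/cmd_vel"), ("cruise", "/cmd_vel"),
             ("planner", "/route")}

def shared_target_pairs(rel):
    """rel composed with its inverse: source pairs that share a target."""
    return {(a, b) for a, x in rel for b, y in rel if x == y and a < b}

# Two features writing the same topic may interact in undefined ways.
for a, b in sorted(shared_target_pairs(publishes)):
    print(f"potential feature-interaction hotspot: {a} <-> {b}")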

    Policy-Driven Framework for Static Identification and Verification of Component Dependencies

    Get PDF
    Software maintenance is considered to be among the most difficult, lengthy, and costly parts of a software application's life-cycle. Regardless of the nature of the software application and the software engineering efforts to reduce component coupling to a minimum, dependencies between software components will always exist and initiate software maintenance operations, as they tend to threaten the "health" of the software system during the evolution of particular components. The situation is more serious with modern technologies and development paradigms, such as Service Oriented Architecture systems and Cloud Computing, which introduce larger software systems consisting of a substantial number of components that exhibit numerous types of dependencies with each other. This work proposes a reference architecture and a corresponding software framework that can be used to model the dependencies between components in software systems and to support the verification of a set of policies that are derived from system dependencies and are relevant to the software maintenance operations being applied. Dependency modelling is performed using configuration information from the system, as well as information harvested from component interface descriptions. The proposed approach has been applied to a medium-scale SOA system, namely the SCA Travel Sample from the Apache Software Foundation, and has been evaluated for performance on a configuration specification for a simulated SOA system consisting of up to a thousand web services offered in a few hundred components.
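
    As a rough sketch of the idea, the example below models component dependencies as a graph and verifies one policy, "do not retire a component that still has dependents", before a maintenance operation is applied; the policy wording and the sample system are illustrative assumptions, not the framework's actual policy language.

# Hypothetical SOA components and their dependencies.
DEPENDS_ON = {
    "BookingUI": {"PaymentService", "TripService"},
    "TripService": {"PaymentService"},
    "PaymentService": set(),
}

def dependents_of(component):
    return {c for c, deps in DEPENDS_ON.items() if component in deps}

def verify_retirement(component):
    """Policy check: retiring a component with live dependents is rejected."""
    blockers = dependents_of(component)
    if blockers:
        return False, f"retire({component}) violates policy; used by {sorted(blockers)}"
    return True, f"retire({component}) permitted"

print(verify_retirement("PaymentService"))   # rejected: two dependents
print(verify_retirement("BookingUI"))        # permitted: no dependents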

    Investigating Modern Release Engineering Practices

    Get PDF
    Modern release engineering has moved from longer release cycles and separate development and release teams to a continuous and integrated process. However, release engineering practices include not only integration, build, and test execution, but also better management of features. The goal of this research is to investigate modern release engineering practices across four milestones in the field of release engineering: (i) understanding rapid release by measuring the time and effort involved in release cycles, (ii) feature management based on feature toggles, (iii) the impact of toggles on the system architecture, and (iv) the quality of builds that contain ignored failing and flaky tests. This thesis is organized as a “manuscript” thesis whereby each milestone constitutes an accepted or submitted paper. First, we investigate the rapid release model for two major open source software projects. We quantify the time and effort involved in both the development and stabilization phases of a release cycle, and we found that despite using the rapid release process, both the Chrome Browser and the Linux Kernel have a period where developers rush changes to catch the current release. Second, we examine feature management based on feature toggles, a widely used technique in software companies to manage features by turning them on/off during development as well as release periods. Developers typically isolate unrelated or unreleased changes on branches. However, large companies such as Google and Facebook do their development on a single branch; they isolate unfinished features using feature toggles that allow them to disable unstable code. Third, feature toggles provide not only better management of features but also keep modules isolated and feature-oriented, which makes the architecture underneath the source code readable and easily extractable. As the project grows, modules keep accepting features and features cross-cut into the modules. We found that the architecture can be easily extracted based on feature toggles and provides a different view compared to the traditional modular representations of software architecture. Fourth, we investigate the impact of failing tests on the quality of builds, where we consider browser crashes as a quality factor. In this study we found that ignoring failing and flaky tests leads to dramatically more crashes than builds with all tests passing.
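
    The toggle mechanism itself is simple; the sketch below shows the basic pattern of trunk-based development behind toggles, with the toggle names and registry invented for illustration.

# Hypothetical toggle registry; in release builds unfinished work stays off.
TOGGLES = {
    "new_checkout_flow": False,   # unfinished feature, disabled
    "dark_mode": True,            # finished feature, rolled out
}

def is_enabled(name):
    return TOGGLES.get(name, False)

def checkout():
    if is_enabled("new_checkout_flow"):
        return "new checkout flow"    # unstable code path, isolated by the toggle
    return "legacy checkout flow"     # stable default

print(checkout())   # -> legacy checkout flow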

    Open Source Software Evolution and Its Dynamics

    Get PDF
    This thesis undertakes an empirical study of software evolution by analyzing open source software (OSS) systems. The main purpose is to aid in understanding OSS evolution. The work centers on collecting large quantities of structural data cost-effectively and analyzing such data to understand software evolution dynamics (the mechanisms and causes of change or growth). We propose a multipurpose systematic approach to extracting program facts (e.g., function calls). This approach is supported by a suite of C and C++ program extractors, which cover different steps in the program build process and handle both source and binary code. We present several heuristics to link facts extracted from individual files into a combined system model of reasonable accuracy. We extract historical sequences of system models to aid software evolution analysis. We propose that software evolution can be viewed as Punctuated Equilibrium (i.e., long periods of small changes interrupted occasionally by large avalanche changes). We develop two approaches to study such dynamical behavior. One approach uses the evolution spectrograph to visualize file-level changes to the implemented system structure. The other approach relies on automated software clustering techniques to recover system design changes. We discuss lessons learned from using these approaches. We present a new perspective on software evolution dynamics. From this perspective, an evolving software system responds to external events (e.g., new functional requirements) according to Self-Organized Criticality (SOC). The SOC dynamics are characterized by the following: (1) the probability distribution of change sizes is a power law; and (2) the time series of changes exhibits long-range correlations with power-law behavior. We present empirical evidence that SOC occurs in open source software systems.
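
    For the power-law half of the SOC characterization, a standard check is the maximum-likelihood (Hill) estimator for the exponent, alpha = 1 + n / sum(ln(x_i / x_min)); the sketch below applies it to placeholder change sizes, not the thesis's measurements, and uses the continuous-distribution approximation.

import math

change_sizes = [1, 1, 2, 1, 3, 1, 2, 5, 1, 8, 1, 2, 21, 1, 3, 55]  # placeholders
x_min = 1  # smallest change size for which power-law behavior is assumed

n = len(change_sizes)
alpha = 1 + n / sum(math.log(x / x_min) for x in change_sizes)
print(f"estimated power-law exponent: alpha = {alpha:.2f}")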

    Machine Learning for Software Dependability

    Get PDF
    Dependability is an important quality of modern software but is challenging to achieve. Many software dependability techniques have been proposed to help developers improve software reliability and dependability, such as defect prediction [83, 96, 249], bug detection [6, 17, 146], program repair [51, 127, 150, 209, 261, 263], test case prioritization [152, 250], and software architecture recovery [13, 42, 67, 111, 164, 240]. In this thesis, we consider how machine learning (ML) and deep learning (DL) can be used to enhance software dependability through three examples in three different domains: automatic program repair, bug detection in electronic document readers, and software architecture recovery. In the first work, we propose a new G&V technique, CoCoNuT, which uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, we introduce a new context-aware NMT architecture that represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs), since CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs and combines these models using ensemble learning to fix more bugs. CoCoNuT fixes 493 bugs, including 307 bugs that are fixed by none of the 27 techniques with which we compare. In the second work, we present a study on the correctness of PDF documents and readers, and propose an approach to automatically detect and localize the source of inconsistencies. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments have detected 30 unique bugs in these readers and files. In the third work, we compare software architecture recovery techniques to understand their effectiveness and applicability. Specifically, we study the impact of leveraging accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other factors of the input dependencies, such as the level of granularity and the dynamic-bindings graph construction. The results of our evaluation of nine architecture recovery techniques and their variants suggest that (1) using accurate symbol dependencies has a major influence on recovery quality, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques scale to very large systems, whereas others do not.
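
    The generate-and-validate (G&V) loop with an ensemble can be pictured with a small sketch: several models propose candidate patches and a test suite validates them, so models tuned differently can fix different bugs. The stub "models" and test below stand in for CoCoNuT's neural networks and are assumptions for illustration.

def model_a(buggy):   # stub: an off-by-one specialist
    return buggy.replace("range(len(xs) - 1)", "range(len(xs))")

def model_b(buggy):   # stub: a comparison-operator specialist
    return buggy.replace("<=", "<")

def tests_pass(candidate):
    # Stand-in validation; a real G&V loop runs the project's test suite.
    return "range(len(xs))" in candidate

buggy = "for i in range(len(xs) - 1): total += xs[i]"
for model in (model_a, model_b):      # ensemble: try each model's patch
    patch = model(buggy)
    if tests_pass(patch):
        print("validated patch:", patch)
        break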