286 research outputs found

    Lightweight Multilingual Software Analysis

    Full text link
    Developer preferences, language capabilities and the persistence of older languages contribute to the trend that large software codebases are often multilingual, that is, written in more than one computer language. While developers can leverage monolingual software development tools to build software components, companies are faced with the problem of managing the resultant large, multilingual codebases to address issues with security, efficiency, and quality metrics. The key challenge is to address the opaque nature of the language interoperability interface: one language calling procedures in a second (which may call a third, or even back to the first), resulting in a potentially tangled, inefficient and insecure codebase. An architecture is proposed for lightweight static analysis of large multilingual codebases: the MLSA architecture. Its modular and table-oriented structure addresses the open-ended nature of multiple languages and language interoperability APIs. We focus here as an application on the construction of call-graphs that capture both inter-language and intra-language calls. The algorithms for extracting multilingual call-graphs from codebases are presented, and several examples of multilingual software engineering analysis are discussed. The state of the implementation and testing of MLSA is presented, and the implications for future work are discussed.Comment: 15 page

    The Java system dependence graph

    Get PDF
    The Program Dependence Graph was introduced by Ottenstein and Ottenstein in 1984 [14]. It was suggested to be a suitable internal program representation for monolithic programs, for the purpose of carrying out certain software engineering operations such as slicing and the computation of program metrics. Since then, Horwitz et al. have introduced the multi-procedural equivalent System Dependence Graph [9]. Many authors have proposed object-oriented dependence graph construction approaches [11, 10, 20, 12]. Every approach provides its own benefits, some of which are language specific. This paper is based on Java and combines the most important benefits from a range of approaches. The result is a Java System Dependence Graph, which summarises the key benefits offered by different approaches and adapts them (if necessary) to the Java language

    Identifying reusable functions in code using specification driven techniques

    Get PDF
    The work described in this thesis addresses the field of software reuse. Software reuse is widely considered as a way to increase the productivity and improve the quality and reliability of new software systems. Identifying, extracting and reengineering software. components which implement abstractions within existing systems is a promising cost-effective way to create reusable assets. Such a process is referred to as reuse reengineering. A reference paradigm has been defined within the RE(^2) project which decomposes a reuse reengineering process in five sequential phases. In particular, the first phase of the reference paradigm, called Candidature phase, is concerned with the analysis of source code for the identification of software components implementing abstractions and which are therefore candidate to be reused. Different candidature criteria exist for the identification of reuse-candidate software components. They can be classified in structural methods (based on structural properties of the software) and specification driven methods (that search for software components implementing a given specification).In this thesis a new specification driven candidature criterion for the identification and the extraction of code fragments implementing functional abstractions is presented. The method is driven by a formal specification of the function to be isolated (given in terms of a precondition and a post condition) and is based on the theoretical frameworks of program slicing and symbolic execution. Symbolic execution and theorem proving techniques are used to map the specification of the functional abstractions onto a slicing criterion. Once the slicing criterion has been identified the slice is isolated using algorithms based on dependence graphs. The method has been specialised for programs written in the C language. Both symbolic execution and program slicing are performed by exploiting the Combined C Graph (CCG), a fine-grained dependence based program representation that can be used for several software maintenance tasks

    A combined representation for the maintenance of C programs

    Get PDF
    A programmer wishing to make a change to a piece of code must first gain a full understanding of the behaviours and functionality involved. This process of program comprehension is difficult and time consuming, and often hindered by the absence of useful program documentation. Where documentation is absent, static analysis techniques are often employed to gather programming level information in the form of data and control flow relationships, directly from the source code itself. Software maintenance environments are created by grouping together a number of different static analysis tools such as program sheers, call graph builders and data flow analysis tools, providing a maintainer with a selection of 'views' of the subject code. However, each analysis tool often requires its own intermediate program representation (IPR). For example, an environment comprising five tools may require five different IPRs, giving repetition of information and inefficient use of storage space. A solution to this problem is to develop a single combined representation which contains all the program relationships required to present a maintainer with each required code view. The research presented in this thesis describes the Combined C Graph (CCG), a dependence-based representation for C programs from which a maintainer is able to construct data and control dependence views, interprocedural control flow views, program slices and ripple analyses. The CCG extends earlier dependence-based program representations, introducing language features such as expressions with embedded side effects and control flows, value returning functions, pointer variables, pointer parameters, array variables and structure variables. Algorithms for the construction of the CCG are described and the feasibility of the CCG demonstrated by means of a C/Prolog based prototype implementation

    Isolating cause-effect chains from computer programs

    Get PDF

    Inter-module code analysis techniques for software maintenance

    Get PDF
    The research described in this thesis addresses itself to the problem of maintaining large, undocumented systems written in languages that contain a module construct. Emphasis is placed on developing techniques for analysing the code of these systems, thereby helping a maintenance programmer to understand a system. Techniques for improving the structure of a system are presented. These techniques help make the code of a system easier to understand. All the code analysis techniques described in this thesis involve reasoning with, and manipulating, graphical representations of a system. To help with these graph manipulations, a set of graph operations are developed that allow a maintenance programmer to combine graphs to create a bigger graph, and to extract subgraphs from a given graph that satisfy specified constraints. A relational database schema is developed to represent the information needed for inter-module code analysis. Pointers are given as to how this database can be used for inter-module code analysis

    Tool-supported identification of functional concerns in object-oriented code

    Get PDF
    Concern identification aims to find the implementation of a functional concern in existing source code. In this work, concerns are described, using the Hierarchic Concern Model, as gray-boxes containing subconcerns, inputs, and outputs. The inputs and outputs are used as concern seeds to identify data-oriented abstractions of concern implementations, called concern skeletons. The identification approach is based on context free language reachability and supported by a tool, called CoDEx
    corecore