
    Making legacy Fortran code type safe through automated program transformation

    Fortran is still widely used in scientific computing, and a very large corpus of legacy as well as new code is written in FORTRAN 77. In general, this code is not type safe, so incorrect programs can compile without errors. In this paper, we present a formal approach to ensuring the type safety of legacy Fortran code through automated program transformation. The objective of this work is to reduce programming errors by guaranteeing type safety. We present the first rigorous analysis of the type safety of FORTRAN 77 and the novel program transformation and type checking algorithms required to convert FORTRAN 77 subroutines and functions into pure, side-effect-free subroutines and functions in Fortran 90. We have implemented these algorithms in a source-to-source compiler which type checks and automatically transforms the legacy code. We show that the resulting code is type safe and that the pure, side-effect-free and referentially transparent subroutines can readily be offloaded to accelerators.
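    A transformation like the one described must first make FORTRAN 77's implicit typing explicit before stronger guarantees can be checked. As a minimal illustration (a toy sketch, not the paper's actual compiler), the Python below applies the default implicit rule, under which undeclared names beginning with I-N are INTEGER and all others are REAL, to emit the explicit Fortran 90 declarations that an IMPLICIT NONE rewrite requires.

```python
# Toy sketch, not the paper's source-to-source compiler: make FORTRAN 77's
# default implicit typing explicit, a first step toward a type-safe
# Fortran 90 rewrite that can use IMPLICIT NONE.

def implicit_type(name: str) -> str:
    """FORTRAN 77 default: names starting with I-N are INTEGER, else REAL."""
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"

def explicit_declarations(undeclared: list[str]) -> str:
    """Emit explicit Fortran 90 declarations for undeclared variables."""
    decls = [f"{implicit_type(n)} :: {n}" for n in sorted(undeclared)]
    return "\n".join(["IMPLICIT NONE"] + decls)

print(explicit_declarations(["i", "x", "nstep", "dt"]))
# dt and x become REAL; i and nstep become INTEGER
```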

    A Computational Science Agenda for Programming Language Research

    Scientific models are often expressed as large and complicated programs. These programs embody numerous assumptions made by the developer (e.g., for differential equations, the discretization strategy and resolution). The complexity and pervasiveness of these assumptions mean that often the only true description of the model is the software itself. This has led various researchers to call for scientists to publish their source code along with their papers. We argue that this is unlikely to be beneficial, since it is almost impossible to separate implementation assumptions from the original scientific intent. Instead, we advocate higher-level abstractions in programming languages, coupled with lightweight verification techniques such as specification and type systems. In this position paper, we suggest several novel techniques and outline an evolutionary approach to applying these to existing and future models. One-dimensional heat flow is used as an example throughout.
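    To make the paper's running example concrete, the sketch below is a minimal explicit finite-difference solver for one-dimensional heat flow, u_t = alpha * u_xx. The grid spacing, time step, boundary handling and stability condition are exactly the kind of implicit modelling assumptions the authors argue should be surfaced through specifications and types; the particular numbers here are illustrative assumptions only.

```python
# Minimal sketch: explicit finite-difference solver for 1-D heat flow,
# u_t = alpha * u_xx, with fixed (Dirichlet) boundaries. Discretization
# choices like dx, dt and the stability bound are embedded assumptions.
import numpy as np

def heat_1d(u0, alpha, dx, dt, steps):
    r = alpha * dt / dx**2
    assert r <= 0.5, "explicit scheme is unstable for r > 1/2"
    u = u0.copy()
    for _ in range(steps):
        # update interior points; boundary values stay fixed
        u[1:-1] += r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u

u0 = np.zeros(101)
u0[50] = 1.0  # initial heat spike in the middle of the rod
print(heat_1d(u0, alpha=1.0, dx=0.01, dt=4e-5, steps=100).max())
```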

    Automated Software Transplantation

    Automated program repair has excited researchers for more than a decade, yet it has yet to find full-scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to SAPFIX, of which 27 have been deemed correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired.

    Automated software transplantation (autotransplantation) is a form of automated software engineering in which we use search-based software engineering to automatically move a functionality of interest from a 'donor' program that implements it into a 'host' program that lacks it. Autotransplantation is a kind of automated program repair where we repair the 'host' program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance software engineering practice while reducing costs. Automated software transplantation manifests in two different flavors: monolingual, when the host and donor programs are written in the same language, and multilingual, when the languages differ. This thesis introduces a theory of automated software transplantation and two algorithms, implemented in two tools, that achieve it: µSCALPEL for monolingual software transplantation and τSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behaviour to transplant); testing validates that the organ exhibits the desired behaviour during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts (popular real-world systems), we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multilingual transplantation experiments involving 10 donors and 10 hosts (popular real-world systems written in 4 different programming languages), we successfully autotransplanted 10 new functionalities. That is, we passed all the test suites that validate the new functionalities' behaviour and confirm that the initial program behaviour is preserved. Additionally, we manually checked the behaviour exercised by the organ. Autotransplantation is also very useful: in just 26 hours of computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system into the VLC media player, a task that the developers of VLC have been doing manually for the past 12 years. We autotransplanted call graph generation and indentation for C programs into Kate (a popular KDE-based text editor used as an IDE by many C developers), two features currently missing from Kate but requested by its users. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is five and a half hours; the total runtime across 10 multilingual transplants is 33 hours.
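    The validation step described above is essentially a two-sided testing gate. The sketch below, with a hypothetical API rather than the actual µSCALPEL/τSCALPEL interfaces, shows the acceptance criterion: a candidate transplant passes only if the host's original test suite still passes (initial behaviour preserved) and the tests exercising the organ also pass (new behaviour exhibited).

```python
# Toy sketch of the validation gate (hypothetical API, not the thesis
# tooling): accept a transplant only if regression tests on the host
# still pass AND acceptance tests for the transplanted organ pass.
from typing import Callable, Iterable

def validate_transplant(host_tests: Iterable[Callable[[], bool]],
                        organ_tests: Iterable[Callable[[], bool]]) -> bool:
    preserved = all(t() for t in host_tests)   # old behaviour intact
    exhibited = all(t() for t in organ_tests)  # new functionality works
    return preserved and exhibited
```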

    Object-oriented implementations of the MPDATA advection equation solver in C++, Python and Fortran

    Three object-oriented implementations of a prototype solver of the advection equation are introduced. The presented programs are based on Blitz++ (C++), NumPy (Python), and Fortran's built-in array containers. The solvers include an implementation of the Multidimensional Positive-Definite Advective Transport Algorithm (MPDATA). The introduced codes exemplify how the application of object-oriented programming (OOP) techniques allows one to reproduce, within the program code, the mathematical notation used in the literature. A discussion of the tradeoffs of the programming language choice is presented. The main angles of comparison are code brevity and syntax clarity (and hence maintainability and auditability) as well as performance. In the case of Python, a significant performance gain is observed when switching from the standard interpreter (CPython) to the PyPy implementation of Python. The entire source code of all three implementations is embedded in the text and is licensed under the terms of the GNU GPL.
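    As a small taste of how array containers let code mirror the mathematical notation, the NumPy sketch below implements the one-dimensional donor-cell (upwind) step that forms the first pass of MPDATA; the full algorithm adds antidiffusive corrective iterations on top of this. The domain size, Courant number and periodic boundaries are illustrative assumptions.

```python
# Minimal NumPy sketch of the donor-cell (upwind) step underlying MPDATA,
# written in the array notation the paper advocates. C is the Courant
# number u*dt/dx; periodic boundaries are handled via np.roll.
import numpy as np

def F(psi_l, psi_r, C):
    # upwind flux through the edge between cells psi_l and psi_r
    return np.maximum(C, 0) * psi_l + np.minimum(C, 0) * psi_r

def donor_cell_step(psi, C):
    psi_r = np.roll(psi, -1)  # right neighbour
    psi_l = np.roll(psi, 1)   # left neighbour
    return psi - (F(psi, psi_r, C) - F(psi_l, psi, C))

psi = np.zeros(100)
psi[40:60] = 1.0              # rectangular signal to advect
for _ in range(50):
    psi = donor_cell_step(psi, C=0.5)
```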

    Conceptual roles of data in program: analyses and applications

    Program comprehension is the prerequisite for many software evolution and maintenance tasks. Currently, research falls short in addressing how to build tools that can use domain-specific knowledge to provide powerful capabilities for extracting valuable information to facilitate program comprehension. Such capabilities are critical for working with large and complex programs, where program comprehension is often not possible without the help of domain-specific knowledge.

    Our research advances the state of the art in program analysis techniques based on domain-specific knowledge. Program artifacts, including variables and methods, are carriers of domain concepts that provide the key to understanding programs. Our program analysis is directed by domain knowledge stored as domain-specific rules. Our analysis is iterative and interactive; it is based on flexible inference rules and interchangeable, extensible information storage. We designed and developed a comprehensive software environment, SeeCORE, based on our knowledge-centric analysis methodology. The SeeCORE tool provides multiple views and abstractions to assist in understanding complex programs. Case studies demonstrate the effectiveness of our method. We demonstrate the flexibility of our approach by analyzing two legacy programs in distinct domains.
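    A toy illustration of analysis directed by domain-specific rules (the rule format and concept names below are assumptions for illustration, not SeeCORE's actual rule language): identifier patterns are matched against stored domain rules to annotate program variables with the domain concepts they carry.

```python
# Toy illustration, not SeeCORE itself: domain knowledge stored as rules
# mapping identifier patterns to domain concepts, applied to annotate
# program variables with the concepts they carry.
import re

RULES = [                        # (regex over identifier, inferred concept)
    (r"(temp|tmprtr)", "temperature"),
    (r"(press|prs)",   "pressure"),
    (r"(vel|spd)",     "velocity"),
]

def infer_concepts(identifiers):
    concepts = {}
    for ident in identifiers:
        for pattern, concept in RULES:
            if re.search(pattern, ident.lower()):
                concepts[ident] = concept
                break
    return concepts

print(infer_concepts(["airTemp", "surfPress", "windVel", "counter"]))
# "counter" matches no rule and stays unannotated
```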

    A Re-engineering approach for software systems complying with the utilisation of ubiquitous computing technologies.

    The evident progression of ubiquitous technologies has introduced new features that software systems can support. Several of the ubiquitous technologies available today are regarded as fundamental elements of many software applications in various domains. The utilisation of ubiquitous technologies has an apparent impact on business processes that can grant organisations a competitive advantage and improve their productivity. The change in the business processes in such organisations typically leads to a change in the underlying software systems. In addressing the need for change in the underlying software systems, this research focuses on establishing a general framework and methodology to facilitate the reengineering of software systems so as to allow the incorporation of new features introduced by the employment of ubiquitous technologies. Although this thesis aims to be general and not limited to a specific programming language or software development approach, the focus is on object-oriented software. The reengineering framework follows a systematic, step-based approach, with greater focus on the reverse engineering aspect. The four stages of the framework are: program understanding, additional-requirement engineering, integration, and finally the testing and operation stage. In its first stage, the proposed reengineering framework regards the source code as the starting point for understanding the system, using a static-analysis-based method. The second stage is concerned with the elicitation of the user functional requirements resulting from the introduction of ubiquitous technologies. In the third stage, the goal is to integrate the system's components and hardware handlers using a developed integration algorithm and available integration techniques. In the fourth and final stage, which this thesis discusses only in general terms, the reengineered system is tested and put into operation. The proposed approach is demonstrated using a case study in Java to show that it is feasible and promising in its domain. Conclusions are drawn based on analysis, and further research directions are discussed at the end of the study.
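    As a toy illustration of the first stage (program understanding via static analysis of the source code), the sketch below recovers a coarse class-and-method inventory from Java source. It is an assumption-laden stand-in: the thesis's own method is not described as regex-based, and a realistic tool would use a proper Java parser.

```python
# Toy sketch of static program understanding: recover a coarse
# class/method inventory from Java source. The regexes are illustrative
# assumptions only; a real tool would parse the language properly.
import re

CLASS_RE  = re.compile(r"\bclass\s+(\w+)")
METHOD_RE = re.compile(r"\b(?:public|private|protected)\s+[\w<>\[\]]+\s+(\w+)\s*\(")

def inventory(java_source: str) -> dict[str, list[str]]:
    return {
        "classes": CLASS_RE.findall(java_source),
        "methods": METHOD_RE.findall(java_source),
    }

src = "public class SensorHandler { public void poll() {} }"
print(inventory(src))  # {'classes': ['SensorHandler'], 'methods': ['poll']}
```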

    The Design & Implementation of an Abstract Semantic Graph for Statement-Level Dynamic Analysis of C++ Applications

    In this thesis, we describe our system, Hylian, for statement-level analysis, both static and dynamic, of a C++ application. We begin by extending the GNU gcc parser to generate parse trees in XML format for each of the compilation units in a C++ application. We then provide verification that the generated parse trees are structurally equivalent to the code in the original C++ application. We use the generated parse trees, together with an augmented version of the gcc test suite, to recover a grammar for the C++ dialect that we parse. We use the recovered grammar to generate a schema for further verification of the parse trees and to evaluate the coverage provided by our C++ test suite. We then extend the parse tree for each compilation unit with semantic information to form an abstract semantic graph (ASG), and link the ASGs for all of the compilation units into a unified ASG for the entire application under study. In addition, to relieve the cognitive burden of information that may inundate a developer, we describe extensions to Hylian that build abbreviated abstract semantic graphs, which incorporate information about user code but not about compiler-provided library code. Finally, we describe the various approaches we adopted to provide assurance for the developer that the ASGs that Hylian builds correctly represent the program under study.
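    The linking step admits a compact illustration. The sketch below merges per-unit XML parse trees by resolving symbol uses to declarations across units; the element and attribute names (decl, use, name) are assumptions for illustration, not Hylian's actual schema.

```python
# Minimal sketch of linking per-unit XML parse trees into one graph by
# resolving symbol references across units. Element/attribute names are
# hypothetical, not Hylian's schema.
import xml.etree.ElementTree as ET

def link_units(xml_units):
    trees = [ET.fromstring(x) for x in xml_units]
    decls, edges = {}, []
    for tree in trees:                    # pass 1: collect declarations
        for d in tree.iter("decl"):
            decls[d.get("name")] = d
    for tree in trees:                    # pass 2: resolve uses to decls
        for u in tree.iter("use"):
            if u.get("name") in decls:
                edges.append((u, decls[u.get("name")]))
    return decls, edges

unit_a = "<unit><decl name='f'/></unit>"
unit_b = "<unit><use name='f'/></unit>"
decls, edges = link_units([unit_a, unit_b])
print(len(edges))  # 1: the use of 'f' in unit_b links to its decl in unit_a
```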

    Formula Translation in Blitz++, NumPy and Modern Fortran: A Case Study of the Language Choice Tradeoffs


    Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique

    Parallel computer architectures have dominated the computing landscape for the past two decades, a trend that is only expected to continue and intensify, with increasing specialization and heterogeneity. This creates huge pressure across the software stack to produce programming languages, libraries, frameworks and tools which will efficiently exploit the capabilities of parallel computers, not only for new software but also for revitalizing existing sequential code. Automatic parallelization, despite decades of research, has had limited success in transforming sequential software to take advantage of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization, which has the potential to overcome limitations of traditional techniques. We introduce the concept of liveness-based commutativity for sequential loops. We examine the use of a practical analysis utilizing liveness-based commutativity in a symbolic execution framework. Symbolic execution represents input values as groups of constraints, consequently deriving the output as a function of the input and enabling the identification of further program properties. We employ this feature to develop an analysis that discerns commutativity properties between loop iterations. We study the application of this approach on loops taken from real-world programs in the OLDEN and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related overheads. Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a new technique that leverages profiling information from program execution with specific input sets. Using profiling information, we track liveness information and detect loop commutativity by examining the code's live-out values. We evaluate DCA against almost 1400 loops of the NPB suite, finding 86% of them parallelizable. Comparing our results against dependence-based methods, we match the detection efficacy of two dynamic approaches and outperform three static ones. Additionally, DCA is able to automatically detect parallelism in loops which iterate over Pointer-Linked Data Structures (PLDSs), taken from a wide range of benchmarks used in the literature, where all other techniques we considered failed. Parallelizing the discovered loops, our methodology achieves an average speedup of 3.6× across NPB (and up to 55×), and up to 36.9× for the PLDS-based loops, on a 72-core host. We also demonstrate that our methodology, despite relying on specific input values for profiling each program, is able to correctly identify parallelism that is valid for all potential input sets. Lastly, we develop a methodology that utilizes liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the form of patterns. Our approach applies a series of transformations which subsequently enable multiple applications of DCA over the generated multi-loop code section, and matches its loop commutativity outcomes against the expected criteria for each pattern. Applying our methodology to sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps, reductions and scans). This extends the scope of parallelism detection to loops, such as those performing scan operations, which cannot be determined to be parallelizable by simply evaluating liveness-based commutativity conditions on their original form.
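    The core dynamic test behind DCA can be sketched compactly. The toy Python below (an illustration, not the thesis's implementation) runs a loop body over its iteration space in the original and in randomly permuted orders for a given profiling input, then compares the live-out values. Agreement on the observed inputs suggests, but by itself does not prove, that the iterations commute, which is why the thesis also argues validity across input sets.

```python
# Toy sketch of a liveness-based commutativity check in the spirit of DCA
# (illustrative only): compare live-out state after executing iterations
# in the original order versus random permutations.
import random

def live_out_after(body, iterations, state):
    s = dict(state)                 # copy the live-in state
    for i in iterations:
        body(i, s)
    return s

def looks_commutative(body, n, state, trials=10):
    reference = live_out_after(body, range(n), state)
    for _ in range(trials):
        order = random.sample(range(n), n)   # a random permutation
        if live_out_after(body, order, state) != reference:
            return False                     # witnessed non-commutativity
    return True

# a reduction: iterations commute even though each writes shared state
body = lambda i, s: s.__setitem__("acc", s["acc"] + i * i)
print(looks_commutative(body, 100, {"acc": 0}))  # True
```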