3 research outputs found

    A Verification Technique for Deterministic Parallel Programs (extended version)

    Get PDF
    A commonly used approach to develop parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may potentially be executed in parallel. This paper develops a verification technique to prove correctness of compiler directives combined with functional correctness of the program. We propose syntax and semantics for a simple core language, capturing the main forms of deterministic parallel programs. This language distinguishes three kinds of basic blocks: parallel, vectorized and sequential blocks, which can be composed using three different composition operators: sequential, parallel and fusion composition. We show that it is sufficient to have contracts for the basic blocks to prove correctness of the compiler directives, and moreover that functional correctness of the sequential program implies correctness of the parallelized program. We formally prove correctness of our approach. In addition, we define a widely-used subset of OpenMP that can be encoded into our core language, thus effectively enabling the verification of OpenMP compiler directives, and we discuss automated tool support for this verification process

    Operating System Support for Redundant Multithreading

    Get PDF
    Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors
    corecore