908 research outputs found

    Trace-level speculative multithreaded architecture

    Get PDF
    This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant penalties are introduced in the presence of a misspeculation and (b) any type of trace predictor can work together with this proposal. In this way, aggressive trace predictors can be incorporated since misspeculations do not introduce significant penalties. We describe in detail TSMA (trace-level speculative multithreaded architecture) and present initial results to show the benefits of this proposal. We show how simple trace predictors achieve significant speed-up in the majority of cases. Results of a simple trace speculation mechanism show an average speed-up of 16%.Peer ReviewedPostprint (published version

    Operating System Support for Redundant Multithreading

    Get PDF
    Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors

    Compiler analysis for trace-level speculative multithreaded architectures

    Get PDF
    Trace-level speculative multithreaded processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.Peer ReviewedPostprint (published version

    TwinPCG: Dual Thread Redundancy with Forward Recovery for Preconditioned Conjugate Gradient Methods

    Get PDF
    Even though iterative solvers like the Preconditioned Conjugate Gradient method (PCG) have been studied for over fifty years, fault tolerance for such solvers has seen much attention in recent years. For iterative solvers, two major reliable strategies of recovery exist: checkpoint-restart for backward recovery, or some type of redundancy technique for forward recovery. Efficient low-overhead redundancy techniques like algorithm-based fault tolerance for sparse matrix-vector products (SpMxV) have recently been proposed. These techniques add resilience with a good, but limited scope; state-of-the-art techniques correct at most 1 fault within a SpMxV. In this work, we study a more powerful resilience concept, which is redundant multithreading. It offers more generic and stronger recovery guarantees, including any soft faults in PCG iterations (among others covering SpMxV), but also requires more resources. We carefully study this redundancy-efficiency conflict. We propose a fault-tolerant PCG method, called TwinPCG, which introduces very small wall-clock time overhead, and significant advantages in detection and correction strategies. Our method uses Dual Modular Redundancy instead of the more expensive Triple Modular Redundancy (TMR); still, it retains the TMR advantages of fault correction. We describe, implement, and benchmark our iterative solver, and compare it in terms of efficiency and fault tolerance capabilities to state-of-the-art techniques. We find that before multithreading in BLAS, TwinPCG introduces 5-6% runtime overhead compared to reference PCG implementations, and can exploit BLAS multithreading well. In the presence of faults, it reliably performs forward recovery for a range of problems, showing all the strengths of TMR techniques

    Multiversioning hardware transactional memory for fail-operational multithreaded applications

    Get PDF
    Modern safety-critical embedded applications like autonomous driving need to be fail-operational, while high performance and low power consumption are demanded simultaneously. The prevalent fault tolerance mechanisms suffer from disadvantages: Some (e.g. triple modular redundancy) require a substantial amount of duplication, resulting in high hardware costs and power consumption. Others, like lockstep, require supplementary checkpointing mechanisms to recover from errors. Further approaches (e.g. software-based process-level redundancy) cannot handle the indeterminism caused by multithreaded execution. This paper presents a novel approach for fail-operational systems using hardware transactional memory for embedded systems. The hardware transactional memory is extended to support multiple versions, enabling redundant atomic operations and recovery in case of an error. In our FPGA-based evaluation, we executed the PARSEC benchmark suite with fault tolerance on 12 cores. The evaluation shows that multiversioning can successfully recover from all transient errors with an overhead comparable to fault tolerance mechanisms without recovery

    Memory Subsystems for Security, Consistency, and Scalability

    Get PDF
    In response to the continuous demand for the ability to process ever larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that shall allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications continue to interact with increasing amounts of components and datasets, existing systems struggle to eÿciently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allow for large main memory capacity, programmers have to bear the burdens of maintaining the consistency of program memory for fault tolerance as well as handling huge datasets with traditional yet expensive memory management interfaces for scalability. Today’s computer systems have become too sophisticated for existing memory subsystems to handle many design requirements. In this dissertation, we introduce three memory subsystems to address challenges in terms of security, consistency, and scalability. Specifcally, we propose SMVs to provide threads with fne-grained control over access privileges for a partially shared address space for security, NVthreads to allow programmers to easily leverage nonvolatile memory with automatic persistence for consistency, and PetaMem to enable memory-centric applications to freely access memory beyond the traditional process boundary with support for memory isolation and crash recovery for security, consistency, and scalability

    The Atomic Manifesto: a Story in Four Quarks

    Get PDF
    This report summarizes the viewpoints and insights gathered in the Dagstuhl Seminar on Atomicity in System Design and Execution, which was attended by 32 people from four different scientific communities: database and transaction processing systems, fault tolerance and dependable systems, formal methods for system design and correctness reasoning, and hardware architecture and programming languages. Each community presents its position in interpreting the notion of atomicity and the existing state of the art, and each community identifies scientific challenges that should be addressed in future work. In addition, the report discusses common themes across communities and strategic research problems that require multiple communities to team up for a viable solution. The general theme of how to specify, implement, compose, and reason about extended and relaxed notions of atomicity is viewed as a key piece in coping with the pressing issue of building and maintaining highly dependable systems that comprise many components with complex interaction patterns

    Hardware-Assisted Dependable Systems

    Get PDF
    Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of internet services, data losses, malfunctioning components, and consequently financial losses or even death of people. In particular, faults in microprocessors (CPUs) and memory corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious hacker to gain control over the system or leak confidential data. Both these classes of errors are notoriously hard to detect and tolerate. Usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory since it is prone to errors, requires significant manual effort, and protects only against anticipated faults. On the other extreme, Byzantine Fault Tolerance solutions defend against all kinds of hardware and software errors, but are inadequately expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of these techniques target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions. The rest two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension to protect against buffer overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes which transparently harden unmodified applications against hardware faults and software bugs. They are practical since they rely on commodity CPUs and require no specialized hardware or operating system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower performance overhead
    corecore