747 research outputs found

    An Experimental Evaluation of the REE SIFT Environment for Spaceborne Applications

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems Laborator

    Hardware-Assisted Dependable Systems

    Get PDF
    Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of internet services, data losses, malfunctioning components, and consequently financial losses or even death of people. In particular, faults in microprocessors (CPUs) and memory corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious hacker to gain control over the system or leak confidential data. Both these classes of errors are notoriously hard to detect and tolerate. Usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory since it is prone to errors, requires significant manual effort, and protects only against anticipated faults. On the other extreme, Byzantine Fault Tolerance solutions defend against all kinds of hardware and software errors, but are inadequately expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of these techniques target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions. The rest two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension to protect against buffer overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes which transparently harden unmodified applications against hardware faults and software bugs. They are practical since they rely on commodity CPUs and require no specialized hardware or operating system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower performance overhead

    Resilience-Building Technologies: State of Knowledge -- ReSIST NoE Deliverable D12

    Get PDF
    This document is the first product of work package WP2, "Resilience-building and -scaling technologies", in the programme of jointly executed research (JER) of the ReSIST Network of Excellenc

    Distributed eventual leader election in the crash-recovery and general omission failure models.

    Get PDF
    102 p.Distributed applications are present in many aspects of everyday life. Banking, healthcare or transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate among them to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and failure occurrence. Distributed systems must ensure that the delivered service is trustworthy.Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most of the agreement problems can be considered as a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely (FLP), states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is by using unreliable failure detectors. A failure detector allows to encapsulate synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega lies on providing an eventual leader election mechanism

    Dependable IMS services - A Performance Analysis of Server Replication and Mid-Session Inter-Domain Handover

    Get PDF

    Compilation Optimizations to Enhance Resilience of Big Data Programs and Quantum Processors

    Get PDF
    Modern computers can experience a variety of transient errors due to the surrounding environment, known as soft faults. Although the frequency of these faults is low enough to not be noticeable on personal computers, they become a considerable concern during large-scale distributed computations or systems in more vulnerable environments like satellites. These faults occur as a bit flip of some value in a register, operation, or memory during execution. They surface as either program crashes, hangs, or silent data corruption (SDC), each of which can waste time, money, and resources. Hardware methods, such as shielding or error correcting memory (ECM), exist, though they can be difficult to implement, expensive, and may be limited to only protecting against errors in specific locations. Researchers have been exploring software detection and correction methods as an alternative, commonly trading either overhead in execution time or memory usage to protect against faults. Quantum computers, a relatively recent advancement in computing technology, experience similar errors on a much more severe scale. The errors are more frequent, costly, and difficult to detect and correct. Error correction algorithms like Shor’s code promise to completely remove errors, but they cannot be implemented on current noisy intermediate-scale quantum (NISQ) systems due to the low number of available qubits. Until the physical systems become large enough to support error correction, researchers instead have been studying other methods to reduce and compensate for errors. In this work, we present two methods for improving the resilience of classical processes, both single- and multi-threaded. We then introduce quantum computing and compare the nature of errors and correction methods to previous classical methods. We further discuss two designs for improving compilation of quantum circuits. One method, focused on quantum neural networks (QNNs), takes advantage of partial compilation to avoid recompiling the entire circuit each time. The other method is a new approach to compiling quantum circuits using graph neural networks (GNNs) to improve the resilience of quantum circuits and increase fidelity. By using GNNs with reinforcement learning, we can train a compiler to provide improved qubit allocation that improves the success rate of quantum circuits

    High-Level Analysis of the Impact of Soft-Faults in Cyberphysical Systems

    Get PDF
    As digital systems grow in complexity and are used in a broader variety of safety-critical applications, there is an ever-increasing demand for assessing the dependability and safety of such systems, especially when subjected to hazardous environments. As a result, it is important to identify and correct any functional abnormalities and component faults as early as possible in order to minimize performance degradation and to avoid potential perilous situations. Existing techniques often lack the capacity to perform a comprehensive and exhaustive analysis on complex redundant architectures, leading to less than optimal risk evaluation. Hence, an early analysis of dependability of such safety-critical applications enables designers to develop systems that meets high dependability requirements. Existing techniques in the field often lack the capacity to perform full system analyses due to state-explosion limitations (such as transistor and gate-level analyses), or due to the time and monetary costs attached to them (such as simulation, emulation, and physical testing). In this work we develop a system-level methodology to model and analyze the effects of Single Event Upsets (SEUs) in cyberphysical system designs. The proposed methodology investigates the impacts of SEUs in the entire system model (fault tree level), including SEU propagation paths, logical masking of errors, vulnerability to specific events, and critical nodes. The methodology also provides insights on a system's weaknesses, such as the impact of each component to the system's vulnerability, as well as hidden sources of failure, such as latent faults. Moreover, the proposed methodology is able to identify and categorize the system's components in order of criticality, and to evaluate different approaches to the mitigation of such criticality (in the form of different configurations of TMR) in order to obtain the most efficient mitigation solution available. The proposed methodology is also able to model and analyze system components individually (system component level), in order to more accurately estimate the component's vulnerability to SEUs. In this case, a more refined analysis of the component is conducted, which enables us to identify the source of the component's criticality. Thereafter, a second mitigation mechanic (internal to the component) takes place, in order to evaluate the gains and costs of applying different configurations of TMR to the component internally. Finally, our approach will draw a comparison between the results obtained at both levels of analysis in order to evaluate the most efficient way of improving the targeted system design

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
    corecore