11 research outputs found

    Data criticality estimation in software applications

    Get PDF
    In safety-critical applications it is often possible to exploit software techniques to increase a system's fault tolerance. Common approaches are based on data redundancy to prevent data corruption during software execution. Duplicating only the most critical variables can significantly reduce the memory and performance overheads while still guaranteeing very good results in terms of fault-tolerance improvement. This paper presents a new methodology to compute the criticality of variables in target software applications. Instead of resorting to time-consuming fault-injection experiments, the proposed solution is based on run-time analysis of the variables' behavior, logged during the execution of the target application under different workloads.
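
    As a flavor of the variable-duplication scheme the paper builds on, here is a minimal C sketch (the variable name, the accessor functions, and the failure action are illustrative assumptions, not the paper's code): a variable ranked as critical is kept in two copies, every write updates both, and every read cross-checks them.

        #include <stdio.h>
        #include <stdlib.h>

        /* Illustrative: a critical variable is held in two copies so that
           a single bit flip in either copy is detected on the next read. */
        static int state, state_dup;

        static void state_set(int v) { state = v; state_dup = v; }

        static int state_get(void) {
            if (state != state_dup) {      /* copies diverged: soft error */
                fprintf(stderr, "soft error detected in 'state'\n");
                exit(EXIT_FAILURE);        /* or trigger recovery instead */
            }
            return state;
        }

        int main(void) {
            state_set(42);
            printf("%d\n", state_get());
            return 0;
        }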

    Link-time smart card code hardening

    Get PDF
    This paper presents a feasibility study on protecting smart card software against fault-injection attacks by means of link-time code rewriting. This approach avoids the drawbacks of source-code hardening, avoids the need for manual assembly writing, and is applicable in conjunction with closed third-party compilers. We implemented a range of cookbook code-hardening recipes in a prototype link-time rewriter and evaluated their coverage and associated overhead, concluding that the approach is promising. We demonstrate that the overhead of an automated link-time approach is not significantly higher than what can be obtained with compile-time hardening or with manual hardening of compiler-generated assembly code.
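
    One classic cookbook recipe of the kind the paper implements, shown here as C source for readability (illustrative only; the actual rewriter applies equivalent transformations to the machine code at link time): a security-relevant comparison is evaluated twice on independent branches, so a single glitched or skipped branch instruction cannot force the success path.

        #include <stdbool.h>
        #include <stdlib.h>

        /* Illustrative recipe: the comparison is duplicated, so skipping
           the first branch via fault injection still hits the re-check. */
        bool verify_pin(int entered, int stored) {
            if (entered != stored)
                return false;
            if (entered != stored)   /* redundant re-check */
                abort();             /* inconsistent outcome: treat as attack */
            return true;
        }

        int main(void) { return verify_pin(1234, 1234) ? 0 : 1; }

    Applying such a recipe at link time, after compilation, also means an optimizing compiler cannot silently merge the two checks away, which is part of the appeal of binary-level hardening.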

    ROSE::FTTransform - A source-to-source translation framework for exascale fault-tolerance research

    Full text link
    Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the 0.5-volt minimum for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present an open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).
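
    As a flavor of the kind of source-to-source redundancy such a framework can emit (a hand-written C sketch, not actual ROSE::FTTransform output), a Jacobi-style update can be computed twice and cross-checked before being committed:

        #include <stdio.h>
        #include <stdlib.h>

        #define N 8

        /* Illustrative: each stencil update is computed redundantly and
           compared before being stored, so a transient error in either
           computation is detected at this statement. */
        static void jacobi_step(const double a[N], double b[N]) {
            for (int i = 1; i < N - 1; i++) {
                double v1 = 0.5 * (a[i - 1] + a[i + 1]);
                double v2 = 0.5 * (a[i - 1] + a[i + 1]);  /* redundant copy */
                if (v1 != v2) {
                    fprintf(stderr, "transient error at i=%d\n", i);
                    exit(EXIT_FAILURE);
                }
                b[i] = v1;
            }
        }

        int main(void) {
            double a[N] = {1, 2, 3, 4, 5, 6, 7, 8}, b[N] = {0};
            jacobi_step(a, b);
            printf("%g\n", b[1]);
            return 0;
        }

    In a real transformation the two computations must be protected from being merged by the compiler's common-subexpression elimination, for example by placing operands in volatile storage.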

    SSFT: selective software fault tolerance

    Get PDF
    As technology advances, processors shrink in size and are manufactured with higher-density transistors, which makes them cheaper, more power-efficient, and more powerful. While this progress is most beneficial to end users, it also makes processors more vulnerable to outside radiation causing soft errors, which occur mostly as single-bit flips on data. For protection against soft errors, hardware techniques such as ECC (Error-Correcting Code) memory and parity RAM have been proposed to provide error detection and even error correction. While hardware techniques provide effective solutions, software-only techniques may offer cheaper and more flexible alternatives where additional hardware is not available or cannot be introduced into existing architectures. Software fault-detection techniques, while powerful, rely mostly on redundancy, which causes a significant performance overhead and increases the number of bits susceptible to soft errors. In most cases where reliability is a concern, the availability and performance of the system are an even bigger concern, which calls for a multi-objective optimization approach. In applications where a certain margin of error is acceptable and availability is important, existing software fault-tolerance techniques may not be applicable directly because of the unacceptable performance overheads they introduce. Our technique, Selective Software Fault Tolerance (SSFT), aims at providing availability and reliability simultaneously by applying only the required amount of protection while preserving the quality of the program output. SSFT uses software profiling information to understand an application's vulnerabilities to transient faults. Transient faults are more likely to hit instructions with higher execution counts; additionally, instructions that cause greater damage to the program output when hit by transient faults should be considered application weaknesses in terms of reliability. SSFT combines this information to exclude from fault tolerance those instructions that are less likely to be hit by transient errors or to cause errors in the program output. This approach reduces power consumption and redundancy (and therefore the number of data bits susceptible to soft errors) while improving performance and providing acceptable reliability. The technique can easily be adapted to existing software fault-tolerance techniques to achieve a form of protection that satisfies the application's different concerns; similarly, hybrid and hardware-only approaches may also take advantage of the optimizations it provides.
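
    A toy illustration of the selection step (the scoring formula, the threshold, and the profile records are assumptions of this sketch, not taken from the thesis): each instruction receives a vulnerability score combining its profiled execution count with its measured impact on program output, and only instructions above a protection-budget cutoff are duplicated.

        #include <stdio.h>

        /* Illustrative profile record for one instruction. */
        struct prof {
            const char *name;
            long exec_count;      /* how often it runs: exposure to faults */
            double output_impact; /* profiled damage to output when hit */
        };

        int main(void) {
            struct prof p[] = {
                {"loop body mul", 1000000, 0.9},
                {"init store",          1, 0.9},
                {"debug counter",  500000, 0.0},
            };
            double threshold = 1000.0;  /* assumed protection-budget cutoff */
            for (int i = 0; i < 3; i++) {
                double score = p[i].exec_count * p[i].output_impact;
                printf("%-15s %s\n", p[i].name,
                       score >= threshold ? "protect (duplicate)"
                                          : "leave unprotected");
            }
            return 0;
        }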

    A source-to-source compiler for generating dependable software

    No full text
    Over the last years, an increasing number of safety-critical tasks have been demanded of computer systems. In particular, safety-critical computer-based applications are reaching market areas where cost is a major issue, and thus solutions are required that combine fault tolerance with low cost. A source-to-source compiler supporting a software-implemented hardware fault-tolerance approach is proposed, based on a set of source-code transformation rules. The proposed approach hardens a program against transient memory errors by introducing software redundancy: every computation is performed twice and the results are compared, and control-flow invariants are checked explicitly. By exploiting the tool's capabilities, several benchmark applications have been hardened against transient errors. Fault-injection campaigns were performed to evaluate the fault-detection capability of the hardened applications. In addition, we analyzed the proposed approach in terms of space and time overheads.
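
    A hand-made C sketch of what the two families of transformation rules produce (the signature value and the error handler are illustrative, not the tool's output): every computation is carried out on a duplicated copy and compared, and a block signature makes the control-flow invariant explicit.

        #include <stdio.h>
        #include <stdlib.h>

        static void fault(void) {
            fprintf(stderr, "fault detected\n");
            exit(EXIT_FAILURE);
        }

        /* Illustrative hardened version of: int square(int x) { return x * x; } */
        int square_hardened(int x0) {
            int x1 = x0;            /* duplicated input */
            int cf = 1;             /* signature: inside basic block 1 */
            int r0 = x0 * x0;       /* original computation */
            int r1 = x1 * x1;       /* duplicated computation */
            if (r0 != r1) fault();  /* data check */
            if (cf != 1)  fault();  /* control-flow invariant check */
            return r0;
        }

        int main(void) {
            printf("%d\n", square_hardened(6));
            return 0;
        }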

    Hardware-Assisted Dependable Systems

    Get PDF
    Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of Internet services, data losses, malfunctioning components, and consequently financial losses or even loss of life. In particular, faults in microprocessors (CPUs) and memory-corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory-corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious attacker to gain control over the system or leak confidential data. Both classes of errors are notoriously hard to detect and tolerate. The usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults, and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory because it is error-prone, requires significant manual effort, and protects only against anticipated faults. At the other extreme, Byzantine fault-tolerance solutions defend against all kinds of hardware and software errors but are prohibitively expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of them target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes the instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions. The other two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension for protection against buffer-overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes that transparently harden unmodified applications against hardware faults and software bugs. They are practical because they rely on commodity CPUs and require no specialized hardware or operating-system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower the performance overhead.
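
    For flavor, the arithmetic-encoding idea that ∆-encoding builds on can be sketched as a bare AN code (an illustration of the underlying principle only; the actual technique is considerably more elaborate, and the constant below is arbitrary): values are stored multiplied by a constant A, so valid words are exact multiples of A and most bit flips break the remainder test.

        #include <stdio.h>
        #include <stdlib.h>

        #define A 58659L  /* illustrative encoding constant */

        /* AN code: x is stored as A*x; any value whose remainder mod A
           is nonzero has been corrupted. */
        long enc(long x) { return A * x; }

        long dec(long xc) {
            if (xc % A != 0) {
                fprintf(stderr, "corrupted value\n");
                exit(EXIT_FAILURE);
            }
            return xc / A;
        }

        int main(void) {
            long s = enc(20) + enc(22);  /* addition works on encoded values */
            printf("%ld\n", dec(s));     /* prints 42 */
            return 0;
        }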

    Reducing the Complexity of Heterogeneous Computing: A Unified Approach for Application Development and Runtime Optimization

    Get PDF
    Heterogeneous systems with accelerators promise considerable performance improvements at a lower cost than homogeneous CPU-only systems. However, to benefit from this potential, developers must do considerable work to integrate accelerators efficiently into an application. This work contributes a new framework, implemented with an online-learning runtime system, that simplifies development and makes applications more portable, efficient, and reliable across different systems.

    Hybrid fault-tolerance techniques in microprocessors

    Get PDF
    This doctoral thesis presents three new techniques for detecting control-flow errors in microprocessors. The techniques have been implemented non-intrusively in an external hardware module and combined with software techniques to achieve a high error-detection capability. The trace interface is used as the observation point for execution, allowing control-flow errors to be detected non-intrusively and with no performance penalty. The first proposed technique is called Program Counter Prediction. It is based on computing the program counter from the current instruction code and the previous program-counter value. This technique detects the subset of errors that affect the program counter very efficiently and at a very low implementation cost. It is also complementary to the other two techniques described in the thesis, Signature Monitoring and Dual Monitoring, and is needed for those two to reach high control-flow error-detection rates. The Signature Monitoring technique computes a signature online in order to verify the execution of a block of instructions. Each block is assigned a reference signature computed at compile time; the reference signature and the one computed at run time are compared at the end of each block's execution. One improvement of this technique over signature-based techniques proposed by other authors is that it reduces the memory needed to store the reference signatures by using a table called the CFC-ST. Two methods for storing and accessing the CFC-ST table are proposed, one static and one dynamic; in both cases the impact on system performance is reduced because the table holding the reference signatures is smaller. The third technique developed in this thesis is Dual Monitoring. It monitors execution from two different observation points, the trace interface and the memory bus, which provide execution information from different stages of the microprocessor pipeline: specifically, the program counter and instruction code obtained at the fetch stage are compared with their values just after the execution stage. A notable property of the Dual Monitoring and Program Counter Prediction techniques is that they can be implemented without storing execution information computed at compile time, considerably reducing the impact on system performance, since they require very few implementation resources. Extensive fault-injection campaigns were carried out with the AMUSE tool to evaluate the effectiveness of the three proposed techniques. The results of each injection campaign show that the presented techniques efficiently detect errors affecting program flow, achieving a good trade-off between fault coverage and impact on system performance.
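
    A software analogue of the signature-monitoring idea (in the thesis the signature is computed by the external hardware module from the trace interface; the XOR folding and the instruction words below are a common choice used here only for illustration): the instruction words of a basic block are folded into a run-time signature and compared against a reference computed at compile time.

        #include <stdio.h>
        #include <stdlib.h>
        #include <stdint.h>

        /* Illustrative: fold a basic block's instruction words into a
           signature; a mismatch with the compile-time reference means
           the executed flow diverged from the intended one. */
        static uint32_t block_signature(const uint32_t *instr, int n) {
            uint32_t sig = 0;
            for (int i = 0; i < n; i++)
                sig ^= instr[i];
            return sig;
        }

        int main(void) {
            uint32_t block[] = {0xE3A00001, 0xE2800001, 0xE12FFF1E};
            uint32_t reference = 0xE3A00001 ^ 0xE2800001 ^ 0xE12FFF1E;
            if (block_signature(block, 3) != reference) {
                fprintf(stderr, "control-flow error detected\n");
                exit(EXIT_FAILURE);
            }
            puts("block signature OK");
            return 0;
        }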