71,868 research outputs found

    Havens: Explicit Reliable Memory Regions for HPC Applications

    Full text link
    Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), September 2016, Waltham, MA, US

    Beyond the golden run : evaluating the use of reference run models in fault injection analysis

    Get PDF
    Fault injection (FI) has been shown to be an effective approach to assess- ing the dependability of software systems. To determine the impact of faults injected during FI, a given oracle is needed. This oracle can take a variety of forms, however prominent oracles include (i) specifications, (ii) error detection mechanisms and (iii) golden runs. Focusing on golden runs, in this paper we show that there are classes of software which a golden run based approach can not be used to analyse. Specifically we demonstrate that a golden run based approach can not be used when analysing systems which employ a main control loop with an irregular period. Further, we show how a simple model, which has been refined using FI, can be employed as an oracle in the analysis of such a system

    Detecting and Refactoring Operational Smells within the Domain Name System

    Full text link
    The Domain Name System (DNS) is one of the most important components of the Internet infrastructure. DNS relies on a delegation-based architecture, where resolution of names to their IP addresses requires resolving the names of the servers responsible for those names. The recursive structures of the inter dependencies that exist between name servers associated with each zone are called dependency graphs. System administrators' operational decisions have far reaching effects on the DNSs qualities. They need to be soundly made to create a balance between the availability, security and resilience of the system. We utilize dependency graphs to identify, detect and catalogue operational bad smells. Our method deals with smells on a high-level of abstraction using a consistent taxonomy and reusable vocabulary, defined by a DNS Operational Model. The method will be used to build a diagnostic advisory tool that will detect configuration changes that might decrease the robustness or security posture of domain names before they become into production.Comment: In Proceedings GaM 2015, arXiv:1504.0244

    A methodology for the generation of efficient error detection mechanisms

    Get PDF
    A dependable software system must contain error detection mechanisms and error recovery mechanisms. Software components for the detection of errors are typically designed based on a system specification or the experience of software engineers, with their efficiency typically being measured using fault injection and metrics such as coverage and latency. In this paper, we introduce a methodology for the design of highly efficient error detection mechanisms. The proposed methodology combines fault injection analysis and data mining techniques in order to generate predicates for efficient error detection mechanisms. The results presented demonstrate the viability of the methodology as an approach for the development of efficient error detection mechanisms, as the predicates generated yield a true positive rate of almost 100% and a false positive rate very close to 0% for the detection of failure-inducing states. The main advantage of the proposed methodology over current state-of-the-art approaches is that efficient detectors are obtained by design, rather than by using specification-based detector design or the experience of software engineers

    Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

    Full text link
    Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an error-free operation after SEU recovering if the affected configuration bits do belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits in order to identify configuration bits which will need also state-restoring actions after a recovered SEU and which not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we will propose a floorplanning approach for reducing the effective number of scrubbed frames and d), experimental results will give evidence that our optimization methodology not only allows to detect errors earlier but also to minimize the Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show that by using our approach, the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches

    Identification of suitable biomarkers for stress and emotion detection for future personal affective wearable sensors

    Get PDF
    Skin conductivity (i.e., sweat) forms the basis of many physiology-based emotion and stress detection systems. However, such systems typically do not detect the biomarkers present in sweat, and thus do not take advantage of the biological information in the sweat. Likewise, such systems do not detect the volatile organic components (VOC’s) created under stressful conditions. This work presents a review into the current status of human emotional stress biomarkers and proposes the major potential biomarkers for future wearable sensors in affective systems. Emotional stress has been classified as a major contributor in several social problems, related to crime, health, the economy, and indeed quality of life. While blood cortisol tests, electroencephalography and physiological parameter methods are the gold standards for measuring stress; however, they are typically invasive or inconvenient and not suitable for wearable real-time stress monitoring. Alternatively, cortisol in biofluids and VOCs emitted from the skin appear to be practical and useful markers for sensors to detect emotional stress events. This work has identified antistress hormones and cortisol metabolites as the primary stress biomarkers that can be used in future sensors for wearable affective systems
    corecore