71,868 research outputs found
Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
Beyond the golden run : evaluating the use of reference run models in fault injection analysis
Fault injection (FI) has been shown to be an effective approach to assess- ing the dependability of software systems. To determine the impact of faults injected during FI, a given oracle is needed. This oracle can take a variety of forms, however prominent oracles include (i) specifications, (ii) error detection mechanisms and (iii) golden runs. Focusing on golden runs, in this paper we show that there are classes of software which a golden run based approach can not be used to analyse. Specifically we demonstrate that a golden run based approach can not be used when analysing systems which employ a main control loop with an irregular period. Further, we show how a simple model, which has been refined using FI, can be employed as an oracle in the analysis of such a system
Detecting and Refactoring Operational Smells within the Domain Name System
The Domain Name System (DNS) is one of the most important components of the
Internet infrastructure. DNS relies on a delegation-based architecture, where
resolution of names to their IP addresses requires resolving the names of the
servers responsible for those names. The recursive structures of the inter
dependencies that exist between name servers associated with each zone are
called dependency graphs. System administrators' operational decisions have far
reaching effects on the DNSs qualities. They need to be soundly made to create
a balance between the availability, security and resilience of the system. We
utilize dependency graphs to identify, detect and catalogue operational bad
smells. Our method deals with smells on a high-level of abstraction using a
consistent taxonomy and reusable vocabulary, defined by a DNS Operational
Model. The method will be used to build a diagnostic advisory tool that will
detect configuration changes that might decrease the robustness or security
posture of domain names before they become into production.Comment: In Proceedings GaM 2015, arXiv:1504.0244
A methodology for the generation of efficient error detection mechanisms
A dependable software system must contain error detection mechanisms and error recovery mechanisms. Software components for the detection of errors are typically designed based on a system specification or the experience of software engineers, with their efficiency typically being measured using fault injection and metrics such as coverage and latency. In this paper, we introduce a methodology for the design of highly efficient error detection mechanisms. The proposed methodology combines fault injection analysis and data mining techniques in order to generate predicates for efficient error detection mechanisms. The results presented demonstrate the viability of the methodology as an approach for the development of efficient error detection mechanisms, as the predicates generated yield a true positive rate of almost 100% and a false positive rate very close to 0% for the detection of failure-inducing states. The main advantage of the proposed methodology over current state-of-the-art approaches is that efficient detectors are obtained by design, rather than by using specification-based detector design or the experience of software engineers
Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning
Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee an
error-free operation after SEU recovering if the affected configuration bits do
belong to feedback loops of the implemented circuits. In this paper, we a)
provide a netlist-based circuit analysis technique to distinguish so-called
critical configuration bits from essential bits in order to identify
configuration bits which will need also state-restoring actions after a
recovered SEU and which not. Furthermore, b) an alternative classification
approach using fault injection is developed in order to compare both
classification techniques. Moreover, c) we will propose a floorplanning
approach for reducing the effective number of scrubbed frames and d),
experimental results will give evidence that our optimization methodology not
only allows to detect errors earlier but also to minimize the
Mean-Time-To-Repair (MTTR) of a circuit considerably. In particular, we show
that by using our approach, the MTTR for datapath-intensive circuits can be
reduced by up to 48.5% in comparison to standard approaches
Identification of suitable biomarkers for stress and emotion detection for future personal affective wearable sensors
Skin conductivity (i.e., sweat) forms the basis of many physiology-based emotion and stress detection systems. However, such systems typically do not detect the biomarkers present in sweat, and thus do not take advantage of the biological information in the sweat. Likewise, such systems do not detect the volatile organic components (VOC’s) created under stressful conditions. This work presents a review into the current status of human emotional stress biomarkers and proposes the major potential biomarkers for future wearable sensors in affective systems. Emotional stress has been classified as a major contributor in several social problems, related to crime, health, the economy, and indeed quality of life. While blood cortisol tests, electroencephalography and physiological parameter methods are the gold standards for measuring stress; however, they are typically invasive or inconvenient and not suitable for wearable real-time stress monitoring. Alternatively, cortisol in biofluids and VOCs emitted from the skin appear to be practical and useful markers for sensors to detect emotional stress events. This work has identified antistress hormones and cortisol metabolites as the primary stress biomarkers that can be used in future sensors for wearable affective systems
- …