6 research outputs found

    A conceptual framework for semantic case-based safety analysis

    Hazard and Operability (HAZOP) Analysis and Failure Mode and Effect Analysis (FMEA) are among the most widely used safety analysis procedures in the development of safety-critical and embedded systems. These analyses are generally perceived as complex and time-consuming, which hinders effective reuse of previous results and experiences. In this paper we present a conceptual semantic case-based framework for safety analysis, which facilitates the reuse of previous HAZOP and FMEA experiences in order to reduce the time and effort associated with these analyses. We present the core technologies of the conceptual framework and evaluate a prototype of the framework, KROSA, in an experiment with domain experts at ABB Norway. Initial results confirm the viability of the conceptual framework for industrial applicatio

    Level of confidence evaluation and its usage for Roll-back Recovery with Checkpointing optimization

    Increasing soft error rates in semiconductor devices manufactured in later technologies necessitate the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). However, RRC introduces time overhead that increases the completion (execution) time. For non-real-time systems, research has focused on optimizing RRC and has shown that it is possible to find the optimal number of checkpoints such that the average execution time is minimal. While minimal average execution time is important, for real-time systems it is important to provide a high probability that deadlines are met. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. First, we present a mathematical framework for the evaluation of level of confidence, the probability that a given deadline is met, when RRC is employed. Second, we present an optimization method for RRC that finds the number of checkpoints that results in the minimal completion time while the minimal completion time satisfies a given level of confidence requirement. Third, we use the proposed framework to evaluate probabilistic guarantees for RRC optimization in non-real-time systems

    Real-Time Scheduling Algorithm Design on Stochastic Processors

    Recent studies have shown that significant power savings are possible with the use of inexact processors, which may contain a small percentage of errors in computation. However, use of such processors in time-sensitive systems is challenging as these processors significantly hamper the system performance. In this thesis, a design framework is developed for real-time applications running on stochastic processors. To identify hardware error patterns, two methods are proposed to predict the occurrence of hardware errors. In addition, an algorithm is designed that uses knowledge of the hardware error patterns to judiciously schedule real-time jobs in order to maximize real-time performance. Both analytical and simulation results show that the proposed approach provides significant performance improvements when compared to an existing real-time scheduling algorithm and is efficient enough for online use

    Mathematics in Software Reliability and Quality Assurance

    This monograph concerns the mathematical aspects of software reliability and quality assurance and consists of 11 technical papers in this emerging area. Included are the latest research results related to formal methods and design, automatic software testing, software verification and validation, coalgebra theory, automata theory, hybrid system and software reliability modeling and assessment

    Designs for increasing reliability while reducing energy and increasing lifetime

    In recent decades, computing technology has experienced tremendous development. For instance, transistor feature size has halved roughly every two years, consistently since Moore first stated his law. Consequently, the number of transistors and the core count per chip double with each generation. Similarly, petascale systems capable of processing more than one quadrillion calculations per second have been developed. In fact, exascale systems are predicted to be available by 2020. However, these developments in computer systems face a reliability wall. For instance, transistor feature sizes are becoming so small that it is easier for high-energy particles to temporarily flip the state of a memory cell from 1 to 0 or from 0 to 1. Also, even if we assume that the fault rate per transistor stays constant with scaling, the increase in total transistor and core count per chip will significantly increase the number of faults in future desktop and exascale systems. Moreover, circuit ageing is exacerbated by increased manufacturing variability and thermal stress, so the lifetime of processor structures is becoming shorter. On the other hand, because of the limited power budget of computer systems such as mobile devices, it is attractive to scale down the voltage. However, when the voltage is scaled beyond the safe margin, especially to ultra-low levels, the error rate increases drastically. Furthermore, new memory technologies such as NAND flash have only a limited nominal lifetime, and once they exceed it they cannot guarantee that data is stored correctly, which leads to data retention problems. Because of these issues, reliability has become a first-class design constraint for contemporary computing, in addition to power and performance.
Moreover, reliability plays an increasingly important role when computer systems process sensitive and life-critical information such as health records, financial information, power regulation, and transportation. In this thesis, we present several reliability designs for detecting and correcting errors that occur, for various reasons, in processor pipelines, L1 caches, and non-volatile NAND flash memories. Our reliability solutions serve three main purposes. First, we aim to improve the reliability of computer systems by detecting and correcting random, unpredictable errors such as bit flips or ageing errors. Second, we aim to reduce the energy consumption of computer systems by allowing them to operate reliably at ultra-low voltage levels. Third, we aim to increase the lifetime of new memory technologies by implementing efficient, low-cost reliability schemes