24 research outputs found

    A Design Approach for Soft Errors Protection in Real-Time Systems

    This paper proposes the use of metrics to refine system design for soft errors protection in system on chip architectures. Specifically, this research shows the use of metrics in design space exploration that highlight where in the structure of the model, and at what point in its behaviour, protection is needed against soft errors. As these metrics improve the ability of the system to provide functionality, they are referred to here as reliability metrics. Previous approaches to prevent soft errors focused on recovery after detection. Almost no research has been directed towards preventive measures. But in real-time systems, deadlines are performance requirements that absolutely must be met, and a missed deadline constitutes an erroneous action and a possible system failure. This paper focuses on a preventive approach as a solution rather than recovery after detection. The intention of this research is to prevent serious loss of system functionality or system failure, though it may not be able to eliminate the impact of soft errors completely.

    A Model-Based Soft Errors Risks Minimization Approach

    Minimizing the risk of system failure in any computer structure requires identifying those components whose failure is likely to impact on system functionality. Clearly, the degree of protection or prevention required against faults is not the same for all components. Tolerating soft errors can be much improved if critical components can be identified at an early design phase and measures are taken to lower their criticalities at that stage. This improvement is achieved by presenting a criticality ranking (among the components) formed by combining a prediction of faults, their consequences, and the propagation of errors at the system modeling phase; and by pointing out ways to apply changes in the model to minimize the risk of degradation of desired functionalities. Case study results are given to validate the approach.
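
    As a rough illustration of the ranking idea in this abstract, the sketch below combines three per-component scores into a single criticality value and sorts on it. The component names, the numeric scores, and the multiplicative combination are all assumptions made for illustration; the abstract does not say how the paper combines the three factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    fault_likelihood: float  # hypothetical predicted probability of a fault, 0..1
    consequence: float       # hypothetical severity if the component fails, 0..1
    propagation: float       # hypothetical chance the error spreads further, 0..1


def criticality(c: Component) -> float:
    # Assumed combination rule: a plain product of the three factors.
    return c.fault_likelihood * c.consequence * c.propagation


def rank_by_criticality(components):
    # Most critical component first.
    return sorted(components, key=criticality, reverse=True)


# Made-up example model with three components.
components = [
    Component("sensor_io", 0.30, 0.40, 0.50),
    Component("scheduler", 0.10, 0.90, 0.80),
    Component("logger", 0.20, 0.10, 0.10),
]
ranking = [c.name for c in rank_by_criticality(components)]
print(ranking)
```

    With these made-up scores the scheduler ranks first even though it is the least likely to fault, because its consequence and propagation scores dominate.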

    A Novel Approach to Minimizing the Risks of Soft Errors in Mobile and Ubiquitous Systems

    A novel approach to minimizing the risks of soft errors at the modelling level of mobile and ubiquitous systems is outlined. From a pure dependability viewpoint, critical components, whose failure is likely to impact on system functionality, warrant more attention from protection/prevention mechanisms (against soft errors) than others do. Tolerating soft errors can be much improved if critical components can be identified at an early design phase and measures are taken to lower their criticalities at that stage. This improvement is achieved by presenting a criticality ranking (among the components) formed by combining a prediction of soft errors, their consequences, and the propagation of failures at the system modelling phase; and by pointing out ways to apply changes in the model to minimize the risks of degradation of desired functionalities. Case study results are given to illustrate and validate the approach.

    Cross-Layer Early Reliability Evaluation for the Computing cOntinuum

    Advanced multifunctional computing systems realized in forthcoming technologies hold the promise of a significant increase in computational capability that will offer end-users ever improving services and functionalities (e.g., next generation mobile devices, cloud services, etc.). However, the same path that is leading technologies toward these remarkable achievements is also making electronic devices increasingly unreliable, posing a threat to a society that depends on ICT in every aspect of human activity. Reliability of electronic systems is therefore a key challenge for ICT as a whole and must be guaranteed without penalizing or slowing down the characteristics of the final products. The CLERECO EU FP7 (GA No. 611404) research project addresses early accurate reliability evaluation and efficient exploitation of reliability at different design phases, since these aspects are two of the most important and challenging tasks toward this goal.

    Lock-V: a heterogeneous fault tolerance architecture based on Arm and RISC-V

    This article presents Lock-V, a heterogeneous fault tolerance architecture that explores a dual-core lockstep (DCLS) technique to mitigate single event upset (SEU) and common-mode failure (CMF) problems. Lock-V was deployed in two versions, Lock-VA and Lock-VM, by applying design diversity in two processor architectures at the instruction set architecture (ISA) level. Lock-VA features an Arm Cortex-A9 with a RISC-V RV64GC, while Lock-VM includes an Arm Cortex-M3 along with a RISC-V RV32IMA processor. The solution explores field-programmable gate array (FPGA) technology to deploy softcore versions of the RISC-V processors, and dedicated accelerators for performing error detection and triggering the software rollback system used for error recovery. To test Lock-V in both versions, a fault-injection mechanism was implemented to cause bit-flips in the processor registers, a common problem in heavy radiation environments. This work has been supported by FCT - Fundação para a Ciência e a Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
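
    The detect-then-roll-back loop described in this abstract can be sketched in software. The toy below is only an illustration under stated assumptions: the two "cores" are identical Python state dictionaries rather than heterogeneous Arm and RISC-V processors, the workload is a made-up accumulator, and the names `step`, `flip_bit`, and `run_lockstep` are hypothetical and not taken from the Lock-V article.

```python
def flip_bit(value: int, bit: int) -> int:
    # Emulate a single event upset by inverting one bit.
    return value ^ (1 << bit)


def step(state: dict) -> dict:
    # Toy workload: add the loop index to a 32-bit accumulator.
    return {"acc": (state["acc"] + state["i"]) & 0xFFFFFFFF, "i": state["i"] + 1}


def run_lockstep(steps: int, inject_at=None, bit: int = 3):
    core_a = {"acc": 0, "i": 0}
    core_b = dict(core_a)
    checkpoint = dict(core_a)  # last state on which both cores agreed
    rollbacks = 0
    injected = False
    while core_a["i"] < steps:
        next_a = step(core_a)
        next_b = step(core_b)
        # Optional one-shot fault injection into core A's result.
        if inject_at is not None and not injected and core_a["i"] == inject_at:
            next_a["acc"] = flip_bit(next_a["acc"], bit)
            injected = True
        if next_a != next_b:
            # Mismatch detected: restore both cores and re-execute the step.
            core_a = dict(checkpoint)
            core_b = dict(checkpoint)
            rollbacks += 1
            continue
        core_a, core_b = next_a, next_b
        checkpoint = dict(core_a)
    return core_a["acc"], rollbacks


print(run_lockstep(5))               # fault-free run
print(run_lockstep(5, inject_at=2))  # one injected bit-flip, one rollback
```

    Both runs end with the same accumulator value; the second reports one rollback, which shows that the injected upset was detected by the comparison and masked by re-execution.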

    Exploiting Existing Copies in Register File for Soft Error Correction

    Soft errors are an increasingly important problem in contemporary digital systems. Being the major data holding component in contemporary microprocessors, the register file has been an important part of the processor for which researchers have proposed many different schemes to protect against soft errors. In this paper we build on the previously proposed schemes and start with the observation that many register values already have a replica inside the storage space. We use this already available redundancy inside the register file in combination with a previously proposed value replication scheme for soft error detection and correction. We show that, by employing schemes that make use of the already available copies of the values inside the register file, it is possible to detect and correct 39.0 percent of the errors with an additional power consumption of 18.9 percent.

    ESoftCheck: Removal of Non-vital Checks for Fault Tolerance
