46,966 research outputs found

    Efficient Simulation of Structural Faults for the Reliability Evaluation at System-Level

    Get PDF
    In recent technology nodes, reliability is considered a part of the standard design ¿ow at all levels of embedded system design. While techniques that use only low-level models at gate- and register transfer-level offer high accuracy, they are too inefficient to consider the overall application of the embedded system. Multi-level models with high abstraction are essential to efficiently evaluate the impact of physical defects on the system. This paper provides a methodology that leverages state-of-the-art techniques for efficient fault simulation of structural faults together with transaction-level modeling. This way it is possible to accurately evaluate the impact of the faults on the entire hardware/software system. A case study of a system consisting of hardware and software for image compression and data encryption is presented and the method is compared to a standard gate/RT mixed-level approac

    Functional and timing implications of transient faults in critical systems

    Get PDF
    Embedded systems in critical domains, such as auto-motive, aviation, space domains, are often required to guarantee both functional and temporal correctness. Considering transient faults, fault analysis and mitigation approaches are implemented at various levels of the system design, in order to maintain the functional correctness. However, transient faults and their mitigation methods have a timing impact, which can affect the temporal correctness of the system. In this work, we expose the functional and the timing implications of transient faults for critical systems. More precisely, we initially highlight the timing effect of transient faults occurring in the combinational and sequential logic of a processor. Furthermore, we propose a full stack vulnerability analysis that drives the design of selective hardware-based mitigation for real-time applications. Last, we study the timing impact of software-based reliability mitigation methods applied in a COTS GPU, using a fault tolerant middleware.This work has been partially funded by ANR-FASY (ANR-21-CE25-0008-01) and received funding by ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity ”Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments” and the GPU4S (GPU for Space) project. Moreover, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), by the European Union’s Horizon 2020 grant agreement No 739551 (KIOS CoE) and from the Government of the Republic of Cyprus through the Cyprus Deputy Ministry of Research, Innovation and Digital Policy.Peer ReviewedPostprint (author's final draft

    Cross-layer system reliability assessment framework for hardware faults

    Get PDF
    System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored in such an estimation model, the delivered reliability reports must be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and supporting suite of tools for accurate but fast estimations of computing systems reliability. The backbone of the methodology is a component-based Bayesian model, which effectively calculates system reliability based on the masking probabilities of individual hardware and software components considering their complex interactions. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.Peer ReviewedPostprint (author's final draft

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
    corecore