10,195 research outputs found

    Experimental analysis of computer system dependability

    Get PDF
    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

    Transient fault behavior in a microprocessor: A case study

    Get PDF
    An experimental analysis is described which studies the susceptibility of a microprocessor based jet engine controller to upsets caused by current and voltage transients. A design automation environment which allows the run time injection of transients and the tracing from their impact device to the pin level is described. The resulting error data are categorized by the charge levels of the injected transients by location and by their potential to cause logic upsets, latched errors, and pin errors. The results show a 3 picoCouloumb threshold, below which the transients have little impact. An Arithmetic and Logic Unit transient is most likely to result in logic upsets and pin errors (i.e., impact the external environment). The transients in the countdown unit are potentially serious since they can result in latched errors, thus causing latent faults. Suggestions to protect the processor against these errors, by incorporating internal error detection and transient suppression techniques, are also made

    High quality testing of grid style power gating

    No full text
    This paper shows that existing delay-based testing techniques for power gating exhibit fault coverage loss due to unconsidered delays introduced by the structure of the virtual voltage power-distribution-network (VPDN). To restore this loss, which could reach up to 70.3% on stuck-open faults, we propose a design-for-testability (DFT) logic that considers the impact of VPDN on fault coverage in order to constitute the proper interface between the VPDN and the DFT. The proposed logic can be easily implemented on-top of existing DFT solutions and its overhead is optimized by an algorithm that offers trade-off flexibility between test-application-time and hardware overhead. Through physical layout SPICE simulations, we show complete fault coverage recovery on stuck-open faults and 43.2% test-application-time improvement compared to a previously proposed DFT technique. To the best of our knowledge, this paper presents the first analysis of the VPDN impact on test qualit

    Simulation-based Fault Injection with QEMU for Speeding-up Dependability Analysis of Embedded Software

    Get PDF
    Simulation-based fault injection (SFI) represents a valuable solu- tion for early analysis of software dependability and fault tolerance properties before the physical prototype of the target platform is available. Some SFI approaches base the fault injection strategy on cycle-accurate models imple- mented by means of Hardware Description Languages (HDLs). However, cycle- accurate simulation has revealed to be too time-consuming when the objective is to emulate the effect of soft errors on complex microprocessors. To overcome this issue, SFI solutions based on virtual prototypes of the target platform has started to be proposed. However, current approaches still present some draw- backs, like, for example, they work only for specific CPU architectures, or they require code instrumentation, or they have a different target (i.e., design errors instead of dependability analysis). To address these disadvantages, this paper presents an efficient fault injection approach based on QEMU, one of the most efficient and popular instruction-accurate emulator for several microprocessor architectures. As main goal, the proposed approach represents a non intrusive technique for simulating hardware faults affecting CPU behaviours. Perma- nent and transient/intermittent hardware fault models have been abstracted without losing quality for software dependability analysis. The approach mini- mizes the impact of the fault injection procedure in the emulator performance by preserving the original dynamic binary translation mechanism of QEMU. Experimental results for both x86 and ARM processors proving the efficiency and effectiveness of the proposed approach are presented

    Cross-layer system reliability assessment framework for hardware faults

    Get PDF
    System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored in such an estimation model, the delivered reliability reports must be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and supporting suite of tools for accurate but fast estimations of computing systems reliability. The backbone of the methodology is a component-based Bayesian model, which effectively calculates system reliability based on the masking probabilities of individual hardware and software components considering their complex interactions. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.Peer ReviewedPostprint (author's final draft

    Chip level simulation of fault tolerant computers

    Get PDF
    Chip level modeling techniques, functional fault simulation, simulation software development, a more efficient, high level version of GSP, and a parallel architecture for functional simulation are discussed

    Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies

    Get PDF
    This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis

    Driving and Protection of High Density High Temperature Power Module for Electric Vehicle Application

    Get PDF
    There has been an increasing trend for the commercialization of electric vehicles (EVs) to reduce greenhouse gas emissions and dependence on petroleum. However, a key technical barrier to their wide application is the development of high power density electric drive systems due to limited space within EVs. High temperature environment inherent in EVs further introduces a new level of complexity. Under high power density and high temperature operation, system reliability and safety also become important. This dissertation deals with the development of advanced driving and protection technologies for high temperature high density power module capable of operating under the harsh environment of electric vehicles, while ensuring system reliability and safety under short circuit conditions. Several related research topics will be discussed in this dissertation. First, an active gate driver (AGD) for IGBT modules is proposed to improve their overall switching performance. The proposed one has the capability of reducing the switching loss, delay time, and Miller plateau duration during turn-on and turn-off transient without sacrificing current and voltage stress. Second, a board-level integrated silicon carbide (SiC) MOSFET power module is developed for high temperature and high power density application. Specifically, a silicon-on-insulator (SOI) based gate driver board is designed and fabricated through chip-on-board (COB) technique. Also, a 1200 V / 100 A SiC MOSFET phase-leg power module is developed utilizing high temperature packaging technologies. Third, a comprehensive short circuit ruggedness evaluation and numerical investigation of up-to-date commercial silicon carbide (SiC) MOSFETs is presented. The short circuit capability of three types of commercial 1200 V SiC MOSFETs is tested under various conditions. The experimental short circuit behaviors are compared and analyzed through numerical thermal dynamic simulation. Finally, according to the short circuit ruggedness evaluation results, three short circuit protection methods are proposed to improve the reliability and overall cost of the SiC MOSFET based converter. A comparison is made in terms of fault response time, temperature dependent characteristics, and applications to help designers select a proper protection method
    • …
    corecore