14,793 research outputs found
Recommended from our members
Statistical methods for rapid system evaluation under transient and permanent faults
textTraditional solutions for test and reliability do not scale well for modern designs with their size and complexity increasing with every technology generation. Therefore, in order to meet time-to-market requirements as well as acceptable product quality, it is imperative that new methodologies be developed for quickly evaluating a system in the presence of faults. In this research, statistical methods have been employed and implemented to 1) estimate the stuck-at fault coverage of a test sequence and evaluate the given test vector set without the need for complete fault simulation, and 2) analyze design vulnerabilities in the presence of radiation-based (soft) errors. Experimental results show that these statistical techniques can evaluate a system under test orders of magnitude faster than state-of-the-art methods with a small margin of error. In this dissertation, I have introduced novel methodologies that utilize the information from fault-free simulation and partial fault simulation to predict the fault coverage of a long sequence of test vectors for a design under test. These methodologies are practical for functional testing of complex designs under a long sequence of test vectors. Industry is currently seeking efficient solutions for this challenging problem. The last part of this dissertation discusses a statistical methodology for a detailed vulnerability analysis of systems under soft errors. This methodology works orders of magnitude faster than traditional fault injection. In addition, it is shown that the vulnerability factors calculated by this method are closer to complete fault injection (which is the ideal way of soft error vulnerability analysis), compared to statistical fault injection. Performing such a fast soft error vulnerability analysis is very cruicial for companies that design and build safety-critical systems.Electrical and Computer Engineerin
Cross-Layer Early Reliability Evaluation for the Computing cOntinuum
Advanced multifunctional computing systems realized in forthcoming technologies hold the promise of a significant increase of the computational capability that will offer end-users ever improving services and functionalities (e.g., next generation mobile devices, cloud services, etc.). However, the same path that is leading technologies toward these remarkable achievements is also making electronic devices increasingly unreliable, posing a threat to our society that is depending on the ICT in every aspect of human activities. Reliability of electronic systems is therefore a key challenge for the whole ICT technology and must be guaranteed without penalizing or slowing down the characteristics of the final products. CLERECO EU FP7 (GA No. 611404) research project addresses early accurate reliability evaluation and efficient exploitation of reliability at different design phases, since these aspects are two of the most important and challenging tasks toward this goal
High-Level Analysis of the Impact of Soft-Faults in Cyberphysical Systems
As digital systems grow in complexity and are used in a broader variety of safety-critical applications, there is an ever-increasing demand for assessing the dependability and safety of such systems, especially when subjected to hazardous environments. As a result, it is important to identify and correct any functional abnormalities and component faults as early as possible in order to minimize performance degradation and to avoid potential perilous situations. Existing techniques often lack the capacity to perform a comprehensive
and exhaustive analysis on complex redundant architectures, leading to less than optimal risk evaluation. Hence, an early analysis of dependability of such safety-critical applications enables designers to develop systems that meets high dependability requirements. Existing techniques in the field often lack the capacity to perform full system analyses due to state-explosion limitations (such as transistor and gate-level analyses), or due to the time and monetary costs attached to them (such as simulation, emulation, and physical testing).
In this work we develop a system-level methodology to model and analyze the effects of Single Event Upsets (SEUs) in cyberphysical system designs. The proposed methodology investigates the impacts of SEUs in the entire system model (fault tree level), including SEU propagation paths, logical masking of errors, vulnerability to specific events, and critical nodes. The methodology also provides insights on a system's weaknesses, such as the impact of each component to the system's vulnerability, as well as hidden sources of failure, such as latent faults. Moreover, the proposed methodology is able to identify and categorize the system's components in order of criticality, and to evaluate different approaches to the mitigation of such criticality (in the form of different configurations of TMR) in order to obtain the most efficient mitigation solution available.
The proposed methodology is also able to model and analyze system components individually (system component level), in order to more accurately estimate the component's vulnerability to SEUs. In this case, a more refined analysis of the component is conducted, which enables us to identify the source of the component's criticality. Thereafter, a second mitigation mechanic (internal to the component) takes place, in order to evaluate the gains and costs of applying different configurations of TMR to the component internally. Finally, our approach will draw a comparison between the results obtained at both levels of analysis in order to evaluate the most efficient way of improving the targeted system design
Quantile-based optimization under uncertainties using adaptive Kriging surrogate models
Uncertainties are inherent to real-world systems. Taking them into account is
crucial in industrial design problems and this might be achieved through
reliability-based design optimization (RBDO) techniques. In this paper, we
propose a quantile-based approach to solve RBDO problems. We first transform
the safety constraints usually formulated as admissible probabilities of
failure into constraints on quantiles of the performance criteria. In this
formulation, the quantile level controls the degree of conservatism of the
design. Starting with the premise that industrial applications often involve
high-fidelity and time-consuming computational models, the proposed approach
makes use of Kriging surrogate models (a.k.a. Gaussian process modeling).
Thanks to the Kriging variance (a measure of the local accuracy of the
surrogate), we derive a procedure with two stages of enrichment of the design
of computer experiments (DoE) used to construct the surrogate model. The first
stage globally reduces the Kriging epistemic uncertainty and adds points in the
vicinity of the limit-state surfaces describing the system performance to be
attained. The second stage locally checks, and if necessary, improves the
accuracy of the quantiles estimated along the optimization iterations.
Applications to three analytical examples and to the optimal design of a car
body subsystem (minimal mass under mechanical safety constraints) show the
accuracy and the remarkable efficiency brought by the proposed procedure
Analyzing and Predicting Processor Vulnerability to Soft Errors Using Statistical Techniques
The shrinking processor feature size, lower threshold voltage and increasing on-chip transistor density make current processors highly vulnerable to soft errors. Architectural Vulnerability Factor (AVF) reflects the probability that a raw soft error eventually causes a visible error in the program output, indicating the processor’s susceptibility to soft errors at architectural level. The awareness of the AVF, both at the early design stage and during program runtime, is greatly useful for designing reliable processors. However, measuring the AVF is extremely costly, resulting in large overheads in hardware, computation, and power. The situation is further exacerbated in a multi-threaded processor environment where resource contention and data sharing exist among different threads. Consequently, predicting the AVF from other easily-measured metrics becomes extraordinarily attractive to computer designers. We propose a series of AVF modeling and prediction works via using advanced statistical techniques. First, we utilize the Boosted Regression Trees (BRT) scheme to dynamically predict the AVF during program execution from a variety of performance metrics. This correlation is generalized to be across different workloads, program phases, and processor configurations on a single-threaded superscalar processor. Second, the AVF prediction is extended to multi-threaded processors where the inter-thread resource contention shows significant and non-uniform impacts on different programs; we propose a two-level predictive mechanism using BRT as building blocks to characterize the contention behavior. Finally, we employ a rule search strategy named Patient Rule Induction Method (PRIM) to explore a large processor design space at the early design stage. We are capable of generating selective rules on important configuration parameters. These rules quantify the design space subregion yielding lowest values of the response, thereby providing useful guidelines for designing reliable processors while achieving high performance
Cross-layer Soft Error Analysis and Mitigation at Nanoscale Technologies
This thesis addresses the challenge of soft error modeling and mitigation in nansoscale technology nodes and pushes the state-of-the-art forward by proposing novel modeling, analyze and mitigation techniques. The proposed soft error sensitivity analysis platform accurately models both error generation and propagation starting from a technology dependent device level simulations all the way to workload dependent application level analysis
- …