
    Memory read faults: taxonomy and automatic test generation

    This paper presents an innovative algorithm for the automatic generation of March tests. The proposed approach is able to generate an optimal March test for an unconstrained set of memory faults in a very short computation time. Moreover, we propose a new, complete taxonomy for memory read faults, a class of faults never carefully addressed in the past.
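
    The paper's generation algorithm is not reproduced here; as a point of reference, the sketch below shows what a generated March test looks like when executed against a simple word-oriented memory model. The specific element sequence is the classic March C- algorithm, and the memory size and fault-reporting format are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>

#define MEM_SIZE 64                 /* illustrative memory size */
static uint8_t mem[MEM_SIZE];

/* Read a cell and report a mismatch against the expected value. */
static int read_check(int addr, uint8_t expected) {
    if (mem[addr] != expected) {
        printf("fault detected at cell %d: read %u, expected %u\n",
               addr, (unsigned)mem[addr], (unsigned)expected);
        return 1;
    }
    return 0;
}

/* March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); up(r0)} */
static int march_c_minus(void) {
    int faults = 0, i;
    for (i = 0; i < MEM_SIZE; i++) mem[i] = 0;                                    /* up(w0)     */
    for (i = 0; i < MEM_SIZE; i++) { faults += read_check(i, 0); mem[i] = 1; }    /* up(r0,w1)  */
    for (i = 0; i < MEM_SIZE; i++) { faults += read_check(i, 1); mem[i] = 0; }    /* up(r1,w0)  */
    for (i = MEM_SIZE - 1; i >= 0; i--) { faults += read_check(i, 0); mem[i] = 1; } /* down(r0,w1) */
    for (i = MEM_SIZE - 1; i >= 0; i--) { faults += read_check(i, 1); mem[i] = 0; } /* down(r1,w0) */
    for (i = 0; i < MEM_SIZE; i++) faults += read_check(i, 0);                    /* up(r0)     */
    return faults;
}

int main(void) {
    printf("faults found: %d\n", march_c_minus());
    return 0;
}
```

    Whether such an element sequence is optimal for a given, unconstrained fault list is precisely the question the paper's generation algorithm addresses.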

    Development of a Data Driven Multiple Observer and Causal Graph Approach for Fault Diagnosis of Nuclear Power Plant Sensors and Field Devices

    A data-driven multiple observer and causal graph approach to fault detection and isolation is developed for nuclear power plant sensors and actuators. It can be integrated into the advanced instrumentation and control systems of next-generation nuclear power plants. The developed approach is based on the analytical redundancy principle of fault diagnosis. Analytical models are built to generate residuals between measured and expected values; significant residuals are used for fault detection, and the residual patterns are analyzed for fault isolation. Advanced data-driven modeling methods such as Principal Component Analysis and the Adaptive Network Fuzzy Inference System are used to obtain accurate and consistent on-line models. In contrast with most current data-driven modeling, it is emphasized that the best choice of model structure should be obtained from a physical study of the system. The multiple observer approach realizes strong fault isolation by designing appropriate residual structures: even if one of the residuals is corrupted, the approach indicates an unknown fault instead of a misleading fault. Multiple observers are designed by making full use of the redundant relationships implied in a process when predicting one variable. The data-driven causal graph is developed as a generic approach to fault diagnosis for nuclear power plants where limited fault information is available. It has the potential to combine the reasoning capability of qualitative diagnostic methods with the strength of quantitative diagnostic methods in fault resolution. A data-driven causal graph consists of individual nodes representing plant variables connected with adaptive quantitative models. With the causal graph, fault detection is performed by monitoring the residual of each model, and fault isolation is achieved by testing the possible assumptions involved in each model. Conservatism is implied in the approach, since a faulty sensor or actuator signal is isolated only when its reconstruction can fully explain all the abnormal behavior of the system. The developed approaches have been applied to the nuclear steam generator system of a pressurized water reactor, and a simulation code has been developed to show their performance. The results show that both single and dual sensor faults and actuator faults can be detected and isolated correctly, independent of fault magnitudes and initial power level, during the early fault transient.
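
    As a minimal sketch of the analytical-redundancy idea described above, the code below thresholds residuals between measured and model-predicted values and matches the resulting pattern against structured-residual signatures, reporting an unknown fault when no signature fits. The signature table, threshold, and variable names are invented for illustration and are not the dissertation's actual observer design.

```c
#include <stdio.h>
#include <math.h>

#define N_RESIDUALS 3
#define N_FAULTS    2

/* Illustrative structured-residual signatures: each row lists which residuals
 * a given fault is expected to trigger (1) or leave quiet (0). */
static const int signatures[N_FAULTS][N_RESIDUALS] = {
    {1, 0, 1},   /* hypothetical fault: steam flow sensor  */
    {0, 1, 1},   /* hypothetical fault: feedwater actuator */
};
static const char *fault_names[N_FAULTS] = { "steam flow sensor", "feedwater actuator" };

/* Threshold residuals, then match the pattern against the signature table.
 * Returns a fault index, -1 for no fault, or -2 for an unknown pattern. */
static int isolate(const double measured[], const double predicted[], double thr) {
    int pattern[N_RESIDUALS], any = 0, f, k;
    for (k = 0; k < N_RESIDUALS; k++) {
        pattern[k] = fabs(measured[k] - predicted[k]) > thr;
        any |= pattern[k];
    }
    if (!any) return -1;
    for (f = 0; f < N_FAULTS; f++) {
        int match = 1;
        for (k = 0; k < N_RESIDUALS; k++)
            if (pattern[k] != signatures[f][k]) match = 0;
        if (match) return f;
    }
    return -2;   /* corrupted/unknown pattern: report an unknown fault, not a misleading one */
}

int main(void) {
    double measured[N_RESIDUALS]  = { 10.8, 5.0, 7.9 };   /* hypothetical plant measurements */
    double predicted[N_RESIDUALS] = { 10.0, 5.1, 7.0 };   /* hypothetical model predictions  */
    int f = isolate(measured, predicted, 0.5);
    if (f >= 0)       printf("isolated fault: %s\n", fault_names[f]);
    else if (f == -1) printf("no fault detected\n");
    else              printf("unknown fault\n");
    return 0;
}
```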

    Unsupervised Clustering for Fault Diagnosis in Nuclear Power Plant Components

    The development of empirical classification models for fault diagnosis usually requires training on a set of examples. In practice, data collected during plant operation contain signals measured in faulty conditions, but these are 'unlabeled', i.e., the indication of the type of fault is usually not available. The objective of the present work is therefore to develop a methodology for identifying transients with similar characteristics, under the conjecture that faults of the same type lead to similar behavior in the measured signals. The proposed methodology is based on the combined use of the Haar wavelet transform, fuzzy similarity, spectral clustering and the Fuzzy C-Means algorithm. A procedure for interpreting the fault cause underlying the similar transients is proposed, based on the identification of prototypical behaviors. The performance of the methodology is tested on an artificial case study and it is then applied to transients originating from different faults in the pressurizer of a nuclear power reactor.
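
    A minimal sketch of the first two steps of the pipeline named above: a single-level Haar wavelet transform compresses each transient, and a fuzzy similarity value compares the transformed signals. The exponential-of-distance similarity used here is one common choice and only an assumption about the paper's definition; the subsequent spectral clustering and Fuzzy C-Means stages are not shown.

```c
#include <stdio.h>
#include <math.h>

#define LEN 8   /* illustrative transient length (power of two) */

/* One level of the Haar wavelet transform: first half holds approximation
 * coefficients, second half holds detail coefficients. */
static void haar_level(const double in[LEN], double out[LEN]) {
    for (int i = 0; i < LEN / 2; i++) {
        out[i]           = (in[2 * i] + in[2 * i + 1]) / sqrt(2.0);
        out[LEN / 2 + i] = (in[2 * i] - in[2 * i + 1]) / sqrt(2.0);
    }
}

/* Fuzzy similarity between two feature vectors: an exponential membership
 * of the Euclidean distance, in [0, 1]. */
static double fuzzy_similarity(const double a[LEN], const double b[LEN], double scale) {
    double d2 = 0.0;
    for (int i = 0; i < LEN; i++) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return exp(-d2 / (scale * scale));
}

int main(void) {
    double t1[LEN] = {1, 1, 2, 3, 5, 4, 2, 1};   /* two hypothetical measured transients */
    double t2[LEN] = {1, 2, 2, 3, 5, 5, 2, 1};
    double f1[LEN], f2[LEN];
    haar_level(t1, f1);
    haar_level(t2, f2);
    printf("similarity = %.3f\n", fuzzy_similarity(f1, f2, 2.0));
    return 0;
}
```

    Pairwise similarities of this kind would form the affinity matrix that the spectral clustering and Fuzzy C-Means stages then partition into groups of transients presumed to share a fault cause.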

    New techniques for functional testing of microprocessor based systems

    Electronic devices may be affected by failures, for example due to physical defects. These defects may be introduced during the manufacturing process, as well as during the normal operating life of the device due to aging. Detecting all these defects is not a trivial task, especially in complex systems such as processor cores. Safety-critical applications do not tolerate failures, which is why such devices must be tested to guarantee correct behavior at any time. Moreover, testing is a key parameter for assessing the quality of a manufactured product. Consolidated testing techniques are based on special Design for Testability (DfT) features added to the original design to improve test effectiveness. The design, integration, and usage of the available DfT for testing purposes are fully supported by commercial EDA tools, hence approaches based on DfT are the standard solutions adopted by silicon vendors for testing their devices. Tests exploiting the available DfT, such as scan chains, manipulate the internal state of the system differently from the normal functional mode, passing through unreachable configurations. Alternative solutions that do not violate the functional mode are defined as functional tests. In microprocessor-based systems, functional testing techniques include software-based self-test (SBST), i.e., a piece of software (referred to as a test program) which is uploaded into the available system memory and executed, with the purpose of exciting a specific part of the system and observing the effects of possible defects affecting it. SBST has been widely studied by the research community for years, but its adoption by industry is quite recent. My research activities have been mainly focused on the industrial perspective of SBST. The problem of providing an effective development flow and guidelines for integrating SBST into the available operating systems has been tackled, and results have been provided on microprocessor-based systems for the automotive domain. Remarkably, new algorithms have also been introduced with respect to state-of-the-art approaches, which can be systematically implemented to enrich SBST suites of test programs for modern microprocessor-based systems. The proposed development flow and algorithms are currently being employed in real electronic control units for automotive products. Moreover, a special hardware infrastructure purposely embedded in modern devices for interconnecting the numerous on-board instruments has also been of interest in my research. This solution is known as reconfigurable scan networks (RSNs), and its practical adoption is growing fast as new standards have been created. Test and diagnosis methodologies have been proposed targeting specific RSN features, aimed at checking whether the reconfigurability of such networks has been corrupted by defects and, in that case, at identifying the defective elements of the network. The contribution of my work in this field has also been included in the first suite of public-domain benchmark networks.
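
    A minimal sketch of the SBST idea described above: a small test routine exercises a functional unit (here the multiplier, reached through ordinary C arithmetic), compacts the results into a signature, and compares it with a precomputed golden value. Real SBST programs are typically written at assembly level and tuned to a fault model; the operand set, compaction scheme, and placeholder golden value below are purely illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>

/* Exercise the multiplier with a set of operand pairs and compact the
 * results into a 32-bit signature (rotate-and-xor compaction). */
static uint32_t mul_self_test(void) {
    static const uint32_t ops[][2] = {
        {0x00000003u, 0x00000005u}, {0xAAAAAAAAu, 0x55555555u},
        {0xFFFFFFFFu, 0x00000002u}, {0x12345678u, 0x9ABCDEF0u},
    };
    uint32_t sig = 0;
    for (unsigned i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        uint32_t r = ops[i][0] * ops[i][1];   /* unit under test */
        sig = (sig << 1) | (sig >> 31);       /* rotate signature left by 1 */
        sig ^= r;                             /* fold result into signature */
    }
    return sig;
}

int main(void) {
    /* The golden signature would be obtained once from a known-good device
     * or from simulation; the value below is only a placeholder. */
    const uint32_t golden = 0xDEADBEEFu;
    uint32_t sig = mul_self_test();
    printf("signature 0x%08X -> %s\n", sig,
           sig == golden ? "PASS" : "FAIL (or golden value not set)");
    return 0;
}
```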

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by the hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks, in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within 4.02 dB to 6.67 dB of the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.
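
    The FaDReS/PURE flows themselves are not reproduced here; the sketch below only illustrates the general notion of health-metric-guided fault isolation under a bounded number of reconfigurations: each processing element is demoted to slack in turn and the health metric is observed. The relocation and health-metric functions are stubs, and the PE count and hidden fault location are invented for the demonstration.

```c
#include <stdio.h>

#define N_PE 8   /* illustrative number of reconfigurable processing elements */

/* Stub: in a real system this would trigger partial reconfiguration that
 * remaps the function of PE 'a' onto PE 'b' (or a spare/slack region). */
static void relocate(int a, int b) {
    printf("reconfigure: move function of PE %d onto PE %d\n", a, b);
}

/* Stub health metric: returns 1 if output quality (e.g. PSNR) is acceptable
 * under the current mapping. Here PE 5 is secretly faulty for the demo. */
static int healthy_with_pe_excluded(int excluded_pe) {
    const int faulty_pe = 5;
    return excluded_pe == faulty_pe;   /* metric recovers only when the bad PE is idle */
}

/* Isolate the faulty PE in at most N_PE reconfigurations: demote each PE's
 * function to slack in turn and observe the health metric. */
static int isolate_faulty_pe(void) {
    for (int pe = 0; pe < N_PE; pe++) {
        relocate(pe, N_PE);            /* N_PE stands in for the slack region */
        if (healthy_with_pe_excluded(pe))
            return pe;                 /* metric recovered: pe was the culprit */
    }
    return -1;                         /* no single PE explains the degradation */
}

int main(void) {
    int pe = isolate_faulty_pe();
    if (pe >= 0) printf("isolated faulty PE: %d\n", pe);
    else         printf("fault not isolated to a single PE\n");
    return 0;
}
```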

    Radiation Induced Fault Detection, Diagnosis, and Characterization of Field Programmable Gate Arrays

    The development of Field Programmable Gate Arrays (FPGAs) has been a great achievement in the world of micro-electronics. One of these devices can be programmed to do just about anything, replacing the need for thousands of individual specialized devices. Despite their great versatility, FPGAs are still extremely vulnerable to radiation, from cosmic rays in space and from adversaries on the ground. Extensive research has been conducted to examine how radiation disrupts different types of FPGAs. The results show, unfortunately, that newer FPGAs with smaller technology nodes are even more susceptible to radiation damage than older ones. This research incorporates and enhances current methods of radiation detection. The design consists of 15 sensor networks that each have 29 sensors. The sensors are simple inverters, but they have the ability to detect flipped bits and delay errors caused by radiation. Analyzers process the outputs of each sensor to determine whether the value agrees with what is expected. This information is fed to a reporter that creates an easy-to-read output describing which network the fault is in, what type of fault is present, how many faults are in the network, how long they have been there, and the percent slowdown if it is a delay issue. Each network reports its fault data to the computer screen in real time. This design still needs some improvement, but once those improvements are made and tested, the system can be incorporated with FPGA reconfiguration methods that automatically place application logic away from failing areas of the FPGA. This system has great potential to become a valuable tool in fault mitigation.
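
    A minimal sketch of the per-network analyzer/reporter logic described above: each sensor reading is compared with its expected value, flipped bits are counted, and the worst percent slowdown is reported. The data layout, timing values, and slowdown formula are illustrative assumptions, not the thesis design.

```c
#include <stdio.h>

#define SENSORS_PER_NET 29

struct sensor_reading {
    int    value;        /* observed logic value from the inverter sensor */
    int    expected;     /* value expected for the applied stimulus */
    double delay_ns;     /* measured propagation delay */
    double nominal_ns;   /* fault-free propagation delay */
};

/* Analyze one network: count flipped bits and report the worst slowdown. */
static void analyze_network(int net_id, const struct sensor_reading s[SENSORS_PER_NET]) {
    int flips = 0;
    double worst_slowdown = 0.0;
    for (int i = 0; i < SENSORS_PER_NET; i++) {
        if (s[i].value != s[i].expected) flips++;
        double slowdown = 100.0 * (s[i].delay_ns - s[i].nominal_ns) / s[i].nominal_ns;
        if (slowdown > worst_slowdown) worst_slowdown = slowdown;
    }
    printf("network %2d: %d flipped bit(s), worst slowdown %.1f%%\n",
           net_id, flips, worst_slowdown);
}

int main(void) {
    struct sensor_reading net[SENSORS_PER_NET] = {{0}};
    for (int i = 0; i < SENSORS_PER_NET; i++) {
        net[i].value = net[i].expected = (i & 1);
        net[i].delay_ns = net[i].nominal_ns = 1.0;
    }
    net[3].value = !net[3].expected;   /* inject a hypothetical flipped bit  */
    net[7].delay_ns = 1.2;             /* inject a hypothetical 20% delay    */
    analyze_network(0, net);
    return 0;
}
```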

    Experimental analysis of computer system dependability

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: the design phase, the prototype phase, and the operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by a discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
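
    As a minimal sketch tying two of the elements above together, the code below performs software-simulated single-bit-flip fault injection into a toy computation and reports the estimated error probability with a 95% binomial confidence interval (normal approximation). The target computation, fault model, and trial count are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>

/* Toy computation under study: a checksum over part of a buffer. */
static uint32_t checksum(const uint8_t *buf, int n) {
    uint32_t s = 0;
    for (int i = 0; i < n; i++) s = s * 31u + buf[i];
    return s;
}

int main(void) {
    enum { N = 16, TRIALS = 10000 };
    uint8_t golden_buf[N];
    for (int i = 0; i < N; i++) golden_buf[i] = (uint8_t)i;
    /* Only the first half of the buffer is read, so flips in the unused
     * half are masked and never become errors. */
    uint32_t golden = checksum(golden_buf, N / 2);

    srand(1);
    int failures = 0;
    for (int t = 0; t < TRIALS; t++) {
        uint8_t buf[N];
        for (int i = 0; i < N; i++) buf[i] = golden_buf[i];
        buf[rand() % N] ^= (uint8_t)(1u << (rand() % 8));   /* inject one bit flip */
        if (checksum(buf, N / 2) != golden) failures++;     /* fault became an error */
    }

    double p = (double)failures / TRIALS;
    double half = 1.96 * sqrt(p * (1.0 - p) / TRIALS);      /* 95% normal-approx CI */
    printf("estimated error probability: %.3f (95%% CI: %.3f .. %.3f)\n",
           p, p - half, p + half);
    return 0;
}
```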

    Theory and design of reliable spacecraft data systems

    Theory and techniques applicable to the design, analysis, and fault diagnosis of reliable spacecraft data systems.

    Constraint solving over multi-valued logics - application to digital circuits

    Due to usage conditions, hazardous environments or intentional causes, physical and virtual systems are subject to faults in their components, which may affect their overall behaviour. In a 'black-box' agent modelled by a set of propositional logic rules, in which just a subset of components is externally visible, such faults may only be recognised by examining some output function of the agent. A (fault-free) model of the agent's system provides the expected output given some input. If the real output differs from that predicted output, then the system is faulty. However, some faults may only become apparent in the system output when appropriate inputs are given. A number of problems regarding both testing and diagnosis thus arise, such as testing a fault, testing the whole system, finding possible faults and differentiating them to locate the correct one. The corresponding optimisation problems of finding solutions that require minimum resources are also very relevant in industry, as is minimal diagnosis. In this dissertation we use a well-established set of benchmark circuits to address such diagnosis-related problems, and we propose and develop models with different logics that we formalise and generalise as much as possible. We also prove that all techniques generalise to agents and to multiple faults. The developed multi-valued logics extend the usual Boolean logic (suitable for fault-free models) by encoding values with some dependency (usually on faults). Such logics thus allow modelling an arbitrary number of diagnostic theories. Each problem is subsequently solved with CLP solvers that we implement and discuss, together with a new efficient search technique that we present. We compare our results with other approaches such as SAT (which require substantial duplication of circuits), showing the effectiveness of constraints over multi-valued logics, and also the adequacy of a general set constraint solver (with special inferences over set functions such as cardinality) on other problems. In addition, for an optimisation problem, we integrate local search with a constructive approach (branch-and-bound), using a variety of logics, to improve an existing efficient tool based on SAT and ILP.
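
    The dissertation's CLP encodings are not reproduced here; the sketch below only illustrates the underlying diagnosis question on a toy two-gate circuit: enumerate single stuck-at fault hypotheses and keep those consistent with an observed input/output pair. The circuit, fault list, and observation are invented for illustration.

```c
#include <stdio.h>

/* Toy circuit: n1 = a AND b; out = n1 OR c.
 * Fault hypotheses: 0 = fault-free, 1 = n1 stuck-at-0, 2 = n1 stuck-at-1,
 *                   3 = out stuck-at-0, 4 = out stuck-at-1. */
static int simulate(int a, int b, int c, int fault) {
    int n1 = a & b;
    if (fault == 1) n1 = 0;
    if (fault == 2) n1 = 1;
    int out = n1 | c;
    if (fault == 3) out = 0;
    if (fault == 4) out = 1;
    return out;
}

int main(void) {
    /* Hypothetical observation: inputs a=1, b=1, c=0 produced out=0, while
     * the fault-free model predicts out=1, so the system is faulty. */
    int a = 1, b = 1, c = 0, observed = 0;
    static const char *names[] = {"fault-free", "n1/0", "n1/1", "out/0", "out/1"};
    printf("hypotheses consistent with the observation:\n");
    for (int f = 0; f < 5; f++)
        if (simulate(a, b, c, f) == observed)
            printf("  %s\n", names[f]);
    return 0;
}
```

    Two hypotheses (n1 stuck-at-0 and out stuck-at-0) survive this single observation; differentiating them requires a further test (e.g. setting c=1), which is precisely the kind of test-generation and diagnosis-differentiation problem the dissertation formulates as constraints over multi-valued logics.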

    Resilience of an embedded architecture using hardware redundancy

    In the last decade, the dominance of the general computing systems market has been overtaken by embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and where failure can have profound consequences. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regard to performance, and do not provide 100% fault elimination. Without fault-tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend them to provide our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault-tolerant techniques and 2) the effects of radiation on state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce a fault tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system, called the syndrome, which is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA's assembler and commercial hardware simulators.
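
    The syndrome-based architecture itself is not reproduced here; as a minimal illustration of the hardware-redundancy theme in the title, the sketch below shows a classic triple modular redundancy (TMR) voter, which masks any single faulty replica. The replicated computation and the injected bit error are invented placeholders, not the work's actual mechanism.

```c
#include <stdio.h>
#include <stdint.h>

/* Placeholder computation replicated on three hypothetical redundant units. */
static uint32_t replica(uint32_t x, int inject_fault) {
    uint32_t y = x * x + 1u;
    return inject_fault ? (y ^ 0x4u) : y;   /* optionally inject a single-bit error */
}

/* Bitwise 2-out-of-3 majority vote: masks any single faulty replica. */
static uint32_t vote(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

int main(void) {
    uint32_t x = 7;
    uint32_t r1 = replica(x, 0), r2 = replica(x, 1), r3 = replica(x, 0);
    printf("replicas: %u %u %u -> voted: %u\n", r1, r2, r3, vote(r1, r2, r3));
    return 0;
}
```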