64 research outputs found

    Diagnosis of Discrete Event Systems with Petri Nets

    Get PDF

    Discrete and hybrid methods for the diagnosis of distributed systems

    Get PDF
    Many important activities of modern society rely on the proper functioning of complex systems such as electricity networks, telecommunication networks, manufacturing plants and aircrafts. The supervision of such systems must include strong diagnosis capability to be able to effectively detect the occurrence of faults and ensure appropriate corrective measures can be taken in order to recover from the faults or prevent total failure. This thesis addresses issues in the diagnosis of large complex systems. Such systems are usually distributed in nature, i.e. they consist of many interconnected components each having their own local behaviour. These components interact together to produce an emergent global behaviour that is complex. As those systems increase in complexity and size, their diagnosis becomes increasingly challenging. In the first part of this thesis, a method is proposed for diagnosis on distributed systems that avoids a monolithic global computation. The method, based on converting the graph of the system into a junction tree, takes into account the topology of the system in choosing how to merge local diagnoses on the components while still obtaining a globally consistent result. The method is shown to work well for systems with tree or near-tree structures. This method is further extended to handle systems with high clustering by selectively ignoring some connections that would still allow an accurate diagnosis to be obtained. A hybrid system approach is explored in the second part of the thesis, where continuous dynamics information on the system is also retained to help better isolate or identify faults. A hybrid system framework is presented that models both continuous dynamics and discrete evolution in dynamical systems, based on detecting changes in the fundamental governing dynamics of the system rather than on residual estimation. This makes it possible to handle systems that might not be well characterised and where parameter drift is present. The discrete aspect of the hybrid system model is used to derive diagnosability conditions using indicator functions for the detection and isolation of multiple, arbitrary sequential or simultaneous events in hybrid dynamical networks. Issues with diagnosis in the presence of uncertainty in measurements due sensor or actuator noise are addressed. Faults may generate symptoms that are in the same order of magnitude as the latter. The use of statistical techniques,within a hybrid system framework, is proposed to detect these elusive fault symptoms and translate this information into probabilities for the actual operational mode and possibility of transition between modes which makes it possible to apply probabilistic analysis on the system to handle the underlying uncertainty present

    Methods and Systems for Fault Diagnosis in Nuclear Power Plants

    Get PDF
    This research mainly deals with fault diagnosis in nuclear power plants (NPP), based on a framework that integrates contributions from fault scope identification, optimal sensor placement, sensor validation, equipment condition monitoring, and diagnostic reasoning based on pattern analysis. The research has a particular focus on applications where data collected from the existing SCADA (supervisory, control, and data acquisition) system is not sufficient for the fault diagnosis system. Specifically, the following methods and systems are developed. A sensor placement model is developed to guide optimal placement of sensors in NPPs. The model includes 1) a method to extract a quantitative fault-sensor incidence matrix for a system; 2) a fault diagnosability criterion based on the degree of singularities of the incidence matrix; and 3) procedures to place additional sensors to meet the diagnosability criterion. Usefulness of the proposed method is demonstrated on a nuclear power plant process control test facility (NPCTF). Experimental results show that three pairs of undiagnosable faults can be effectively distinguished with three additional sensors selected by the proposed model. A wireless sensor network (WSN) is designed and a prototype is implemented on the NPCTF. WSN is an effective tool to collect data for fault diagnosis, especially for systems where additional measurements are needed. The WSN has distributed data processing and information fusion for fault diagnosis. Experimental results on the NPCTF show that the WSN system can be used to diagnose all six fault scenarios considered for the system. A fault diagnosis method based on semi-supervised pattern classification is developed which requires significantly fewer training data than is typically required in existing fault diagnosis models. It is a promising tool for applications in NPPs, where it is usually difficult to obtain training data under fault conditions for a conventional fault diagnosis model. The proposed method has successfully diagnosed nine types of faults physically simulated on the NPCTF. For equipment condition monitoring, a modified S-transform (MST) algorithm is developed by using shaping functions, particularly sigmoid functions, to modify the window width of the existing standard S-transform. The MST can achieve superior time-frequency resolution for applications that involves non-stationary multi-modal signals, where classical methods may fail. Effectiveness of the proposed algorithm is demonstrated using a vibration test system as well as applications to detect a collapsed pipe support in the NPCTF. The experimental results show that by observing changes in time-frequency characteristics of vibration signals, one can effectively detect faults occurred in components of an industrial system. To ensure that a fault diagnosis system does not suffer from erroneous data, a fault detection and isolation (FDI) method based on kernel principal component analysis (KPCA) is extended for sensor validations, where sensor faults are detected and isolated from the reconstruction errors of a KPCA model. The method is validated using measurement data from a physical NPP. The NPCTF is designed and constructed in this research for experimental validations of fault diagnosis methods and systems. Faults can be physically simulated on the NPCTF. In addition, the NPCTF is designed to support systems based on different instrumentation and control technologies such as WSN and distributed control systems. The NPCTF has been successfully utilized to validate the algorithms and WSN system developed in this research. In a real world application, it is seldom the case that one single fault diagnostic scheme can meet all the requirements of a fault diagnostic system in a nuclear power. In fact, the values and performance of the diagnosis system can potentially be enhanced if some of the methods developed in this thesis can be integrated into a suite of diagnostic tools. In such an integrated system, WSN nodes can be used to collect additional data deemed necessary by sensor placement models. These data can be integrated with those from existing SCADA systems for more comprehensive fault diagnosis. An online performance monitoring system monitors the conditions of the equipment and provides key information for the tasks of condition-based maintenance. When a fault is detected, the measured data are subsequently acquired and analyzed by pattern classification models to identify the nature of the fault. By analyzing the symptoms of the fault, root causes of the fault can eventually be identified

    A Hybrid Approach to Fault Diagnosis in Teams of Autonomous Systems

    Get PDF
    Discrete event systems (DES) are dynamical systems equipped with a discrete state set and an event driven state transition structure. An event in a DES occurs instantaneously causing transition from one state to another. DES models have emerged to provide a formal treatment of many man-made systems such as automated manufacturing systems, computer systems, communication networks and air traffic control systems. In this thesis, we study fault diagnosis in teams of autonomous systems. In particular, one consider a team of two spacecraft in deep space. The spacecraft cooperate with each other in leader-follower formation flying. Formation flying demonstrates the capability of spacecraft to react to each other in order to maintain a desired relative distance autonomously without human intervention. In the system considered here, instruments (actuators and sensors) may fail and cause error. Because of the communication delays in deep space, each entity should be able to diagnose the failure and decide how to reconfigure itself. Basically, fault diagnosis in such systems requires information exchange between the autonomous elements of the team. The exchanged information for example may include position and velocity data. Our goal in the thesis is to propose a method for fault diagnosis with reduced information exchange. One solution is to transmit only discrete event information between autonomous systems. Transmission of discrete event data occurs less frequently than the transmission of continuous streams of data. The discrete event data may include high level supervisory commands issued every now and then and discretized values of continuous data that are transmitted only when a continuous-variable data (such as angle or acceleration) crosses the threshold. The fault diagnosis scheme proposed in this thesis is an adaptation of hybrid fault diagnosis for distributed autonomous systems. This system is simulated using MATLAB/SIMULINK Software and DECK Toolbox. We examined different maneuvers for spacecraft and investigated the effect of faults on the overall system and the performance of our designed fault diagnoser

    Diagnosis and Fault-tolerant Control, 3rd Edition

    Get PDF
    Fault-tolerant control aims at a gradual shutdown response in automated systems when faults occur. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults, which bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. It also introduces design methods suitable for diagnostic systems and fault-tolerant controllers for continuous processes that are described by analytical models of discrete-event systems represented by automata. The book is suitable for engineering students, engineers in industry and researchers who wish to get an overview of the variety of approaches to process diagnosis and fault-tolerant control. The authors have extensive teaching experience with graduate and PhD students, as well as with industrial experts. Parts of this book have been used in courses for this audience. The authors give a comprehensive introduction to the main ideas of diagnosis and fault-tolerant control and present some of their most recent research achievements obtained together with their research groups in a close cooperatio n with European research projects. The third edition resulted from a major re-structuring and re-writing of the former edition, which has been used for a decade by numerous research groups. New material includes distributed diagnosis of continuous and discrete-event systems, methods for reconfigurability analysis, and extensions of the structural methods towards fault-tolerant control. The bibliographical notes at the end of all chapters have been up-dated. The chapters end with exercises to be used in lectures

    Diagnosis and Fault-Tolerant Control

    Full text link

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Get PDF
    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Recon- figurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, use of adaptive techniques are seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A iii significant reduction in power consumption is achieved ranging from 83% for low motion-activity scenes to 12.5% for high motion activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach is analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria

    Design Disjunction for Resilient Reconfigurable Hardware

    Get PDF
    Contemporary reconfigurable hardware devices have the capability to achieve high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS), new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware and their ability to realize self-organization features are expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted: First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-free hypergraphs are exploited to partition the reconfigurable resources pool into highly separable subsets of resources, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in power consumption compared to the frequently-used TMR scheme while providing superior fault tolerance. Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overhead fault tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the mosaic convergence algorithm developed such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for application with a moderately high production capacity
    • …
    corecore