558 research outputs found

    Optimal discrimination between transient and permanent faults

    An important practical problem in fault diagnosis is discriminating between permanent faults and transient faults. In many computer systems, the majority of errors are due to transient faults. Many heuristic methods have been used for discriminating between transient and permanent faults; however, we have found no previous work stating this decision problem in clear probabilistic terms. We present an optimal procedure for discriminating between transient and permanent faults, based on applying Bayesian inference to the observed events (correct and erroneous results). We describe how the assessed probability that a module is permanently faulty must vary with observed symptoms. We describe and demonstrate our proposed method on a simple application problem, building the appropriate equations and showing numerical examples. The method can be implemented as a run-time diagnosis algorithm at little computational cost; it can also be used to evaluate any heuristic diagnostic procedure by compariso
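The Bayesian updating described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the two-hypothesis model, the function name `posterior_permanent`, and the error probabilities are all assumptions chosen for illustration. Each observed result (correct or erroneous) updates the assessed probability that the module is permanently faulty.

```python
def posterior_permanent(prior, observations,
                        p_err_permanent=0.9, p_err_transient=0.01):
    """Return P(permanently faulty | observations so far) after each observation.

    Assumed model (illustrative only):
      - a permanently faulty module gives an erroneous result with
        probability p_err_permanent on every run;
      - a healthy module gives an erroneous result only through a
        transient fault, with probability p_err_transient.
    observations: iterable of bools, True meaning an erroneous result.
    """
    p = prior
    history = []
    for erroneous in observations:
        like_perm = p_err_permanent if erroneous else 1.0 - p_err_permanent
        like_trans = p_err_transient if erroneous else 1.0 - p_err_transient
        # Bayes' rule over the two hypotheses.
        p = like_perm * p / (like_perm * p + like_trans * (1.0 - p))
        history.append(p)
    return history


if __name__ == "__main__":
    # Two errors followed by a correct result, starting from a small prior.
    for step, p in enumerate(posterior_permanent(0.001, [True, True, False]), 1):
        print(f"after observation {step}: P(permanent) = {p:.4f}")
```

Under these assumed numbers, each erroneous result raises the assessed probability and each correct result lowers it, which is the qualitative behaviour the abstract describes.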

    Mapping of Fault-Tolerant Applications with Transparency on Distributed Embedded Systems

    Machine learning methods for fault classification

    With the constant evolution and ever-increasing transistor densities in semiconductor technology, error rates are on the rise. Errors that occur on semiconductor chips can be attributed to permanent, transient or intermittent faults. Once permanent faults appear, they do not go away, and once intermittent faults appear on a chip, the probability that they will recur is high, which makes these two fault types critical. Transient faults occur very rarely, making them non-critical. Misclassifying critical faults during manufacturing tests may result in chip failure during the operational lifetime or in reduced product quality, whereas discarding chips with non-critical faults may result in unnecessary yield loss. Existing mechanisms to distinguish between the fault types are mostly rule-based, and as fault types start manifesting similarly at lower technology nodes, these rules become obsolete over time. Hence, the rules need to be updated every time the technology changes. Machine learning approaches have shown that uncertainty can be compensated for with previous experience. In our case, the ambiguity of classification rules can be compensated for by storing past classification decisions and learning from them for accurate classification. This thesis presents an effective solution to the problem of fault classification in VLSI chips using Support Vector Machine (SVM) based machine learning techniques.

    Robust fault diagnosis of physical systems in operation

    Ideas are presented and demonstrated for improved robustness in diagnostic problem solving of complex physical systems in operation, or operative diagnosis. The first idea is that graceful degradation can be viewed as reasoning at higher levels of abstraction whenever the more detailed levels prove to be incomplete or inadequate. A form of abstraction is defined that applies this view to the problem of diagnosis. In this form of abstraction, named status abstraction, two levels are defined. The lower level of abstraction corresponds to the level of detail at which most current knowledge-based diagnosis systems reason. At the higher level, a graph representation is presented that describes the real-world physical system. An incremental, constructive approach to manipulating this graph representation is demonstrated that supports certain characteristics of operative diagnosis. The suitability of this constructive approach is shown for diagnosing fault propagation behavior over time, and for sometimes diagnosing systems with feedback. A way is shown to represent different semantics in the same type of graph representation to characterize different types of fault propagation behavior. An approach is demonstrated that treats these different behaviors as different fault classes, and the approach moves to other classes when previous classes fail to generate suitable hypotheses. These ideas are implemented in a computer program named Draphys (Diagnostic Reasoning About Physical Systems) and demonstrated for the domain of inflight aircraft subsystems, specifically a propulsion system (containing two turbofan systems and a fuel system) and a hydraulic subsystem.

    Fault Diagnosis Algorithms for Wireless Sensor Networks

    The sensor nodes in wireless sensor networks (WSNs) are deployed in unattended and hostile environments. The hostile environment affects the monitoring infrastructure, which includes the sensor nodes and the links. In addition, node failures and environmental hazards cause frequent topology changes, communication failures, and network partitions. This in turn adds a new dimension to the fragility of the WSN topology. Such perturbations are far more common in WSNs than in conventional wireless networks, and they demand efficient techniques for discovering disruptive behavior in WSNs. Traditional fault diagnosis techniques devised for wired interconnected networks and conventional wireless networks are not directly applicable to WSNs because of their specific requirements and limitations. System-level diagnosis is a technique to identify faults in distributed networks such as multiprocessor systems, wired interconnected networks, and conventional wireless networks. Recently, it has been applied to ad hoc networks and WSNs. Diagnosis is performed by deduction, based on the results of tests applied to the sensor nodes. Neighbor coordination-based system-level diagnosis is a variation of this method that exploits the spatio-temporal correlation between sensor measurements. In this thesis, we present a new approach to diagnose faulty sensor nodes in a WSN, which works in conjunction with the underlying clustering protocol and exploits spatio-temporal correlation between sensor measurements. An advantage of this method is that the diagnostic operation constitutes real work performed by the system, rather than a specialized diagnostic task. In this way, the normal operation of the network can be used for the diagnosis, resulting in less time and message overhead.
In this thesis, we have devised and evaluated fault diagnosis algorithms for WSNs that consider the persistence of faults (transient, intermittent, and permanent) and faults in communication channels; in one of the approaches, we also address node mobility in diagnosis. A cluster based distributed fault diagnosis (CDFD) algorithm is proposed in which the diagnostic local view is obtained by exploiting the spatially correlated sensor measurements. We derived an optimal threshold for effective fault diagnosis in sparse networks. The message complexity of CDFD is O(n) and the number of bits exchanged to diagnose the network is O(n log2 n). Intermittent fault diagnosis is formulated as a multiobjective optimization problem based on the inter-test interval and the number of test repetitions required to diagnose the intermittent faults. The two objectives, detection latency and energy overhead, are considered subject to a constraint on detection errors. A high level (> 95%) of detection accuracy is achieved while keeping the false alarm rate low (< 1%) for sparse networks. The proposed cluster based distributed intermittent fault diagnosis (CDIFD) algorithm is energy efficient because, in CDIFD, diagnostic messages are sent as the output of the routine tasks of the WSN. A count and threshold-based mechanism is used to discriminate the persistence of faults. The main distinguishing characteristic of these faults is the amount of time for which the fault disappears. We adopt this state-holding time to discriminate transient from intermittent or permanent faults. The proposed cluster based distributed fault diagnosis and discrimination (CDFDD) algorithm is energy efficient, with an improved network lifetime greater than 1150 data-gathering rounds at transient fault rates as high as 20%. A mobility aware hierarchical architecture is proposed to detect hard and soft faults in a dynamic WSN topology, assuming random movements of nodes in the WSN.
A test pattern that ensures error checking of each functional block of a sensor node is employed to diagnose the network. The proposed mobility aware cluster based distributed fault diagnosis (MCDFD) algorithm assures a better packet delivery ratio (> 80%) in highly dynamic networks with a fault rate as high as 30%. The network lifetime is more than 900 data-gathering rounds in a highly dynamic network with a fault rate as high as 20%.
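The count and threshold idea mentioned in this abstract can be sketched generically as follows. This is not the CDFDD algorithm from the thesis: the decayed-counter rule, the function name `classify_fault`, and the `decay` and `threshold` values are assumptions for illustration only. A score rises on each failed test and decays on each passed test, so faults that stay away for a long time never accumulate enough evidence to cross the threshold.

```python
def classify_fault(test_outcomes, decay=0.5, threshold=3.0):
    """Classify a node from a sequence of periodic test outcomes.

    test_outcomes: iterable of bools, True meaning the test failed.
    decay, threshold: hypothetical tuning parameters, not taken from the thesis.
    """
    score = 0.0
    for failed in test_outcomes:
        # Failed test: add one unit of evidence. Passed test: forget part of it.
        score = score + 1.0 if failed else score * decay
        if score >= threshold:
            return "intermittent_or_permanent"
    return "transient_or_fault_free"


if __name__ == "__main__":
    print(classify_fault([True, False, False, False, False]))  # isolated failure
    print(classify_fault([True, True, False, True, True]))     # recurring failures
```

The decay factor plays the role of the state-holding time: the longer a fault stays absent, the more of the accumulated count is forgotten.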

    Toward Fault Adaptive Power Systems in Electric Ships

    Shipboard Power Systems (SPS) play a significant role in next-generation Navy fleets. With the increasing power demand from propulsion loads, ship service loads, weaponry systems and mission systems, a stable and reliable SPS is critical to support different aspects of ship operation. It also becomes the technology enabler to improve ship economy, efficiency, reliability, and survivability. Moreover, it is important to improve the reliability and robustness of the SPS while working under different operating conditions to ensure safe and satisfactory operation of the system. This dissertation aims to introduce novel and effective approaches to respond to different types of possible faults in the SPS. According to their type and duration, the possible faults in the Medium Voltage DC (MVDC) SPS have been divided into two main categories: transient and permanent faults. First, in order to manage permanent faults in the MVDC SPS, a novel real-time reconfiguration strategy has been proposed. Onboard post-fault reconfiguration aims to ensure the maximum power/service delivery to the system loads following a fault. This study implements an intelligent real-time reconfiguration algorithm through an optimization technique running inside the Real-Time Digital Simulator (RTDS). The simulation results demonstrate the effectiveness of the proposed real-time approach in reconfiguring the system under different fault situations. Second, a novel approach to mitigate the effect of unsymmetrical transient AC faults in the MVDC SPS has been proposed. In this dissertation, the application of a combined Static Synchronous Compensator (STATCOM) and Superconducting Fault Current Limiter (SFCL) to improve the stability of the MVDC SPS during transient faults has been investigated. A Fluid Genetic Algorithm (FGA) optimization algorithm is introduced to design the STATCOM's controller.
Moreover, a multi-objective optimization problem has been formulated to find the optimal size of the SFCL's impedance. In the proposed scheme, the STATCOM can assist the SFCL in keeping the vital load terminal voltage close to the normal state in an economical way. Compared with the other alternatives, the proposed technique provides acceptable post-disturbance and post-fault performance in returning the system to its normal state.

    Data generation and model usage for machine learning-based dynamic security assessment and control

    The global effort to decarbonise, decentralise and digitise electricity grids in response to climate change and evolving electricity markets with active consumers (prosumers) is gaining traction in countries around the world. This effort introduces new challenges to electricity grid operation. For instance, the introduction of variable renewable energy generation like wind and solar energy to replace conventional power generation like oil, gas, and coal increases the uncertainty in power systems operation. Additionally, the dynamics introduced by these renewable energy sources that are interfaced through converters are much faster than those in conventional systems with thermal power plants. This thesis investigates new data-driven operating tools for the system operator to help manage the increased operational uncertainty in this transition. The presented work aims to answer some open questions regarding the implementation of these machine learning approaches in real-time operation, primarily related to the quality of training data needed to train accurate machine-learned models for predicting dynamic behaviour, and the use of these machine-learned models in the control room for real-time operation. To answer the first question, this thesis presents a novel sampling approach for generating ‘rare’ operating conditions that are physically feasible but have not been experienced by power systems before. In so doing, the aim is to move away from historical observations that are often limited in describing the full range of operating conditions. Then, the thesis presents a novel approach based on Wasserstein distance and entropy to efficiently combine both historical and ‘rare’ operating conditions to create an enriched database capable of training a high-performance classifier.
To answer the second question, this thesis presents a scalable and rigorous workflow to trade off multiple objective criteria when choosing decision tree models for real-time operation by system operators. It then showcases a practical implementation that uses a machine-learned model to optimise power system operation cost through topological control actions. Future research directions are underscored by the crucial role of machine learning in securing low-inertia systems, and this thesis identifies research gaps covering physics-informed learning, machine learning-based network planning for secure operation, and robust training datasets.
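For readers unfamiliar with the Wasserstein distance mentioned in this abstract, a minimal sketch of the one-dimensional case follows. It only illustrates the metric itself, not the thesis's method for combining historical and rare operating conditions. The function name `wasserstein_1d` and the sample values are hypothetical. For two empirical samples of equal size, the 1-Wasserstein distance is the mean absolute difference between their sorted values.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size one-dimensional samples.

    With equally weighted samples of the same size, the optimal transport
    plan matches the i-th smallest value of xs to the i-th smallest of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    historical = [0.0, 1.0, 2.0]   # hypothetical operating-condition values
    generated = [1.0, 2.0, 3.0]    # hypothetical generated values
    print(wasserstein_1d(historical, generated))
```

A small distance indicates that a generated set resembles the historical one; a larger distance indicates that it covers different regions of the operating space.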