
    Fault diagnosis of distributed systems: analysis, simulation and performance measurement.

    Fault diagnosis forms an essential component in the design of highly reliable distributed computing systems. Early models for diagnosis require a global observer, whereas in later models the diagnosis is shared between the system's nodes. These models are reviewed and their different diagnosability properties reconciled. The design of improved fault diagnosis algorithms for systems without a global observer provides the main motivation for the thesis. The modified algorithm SELF3 [Hoss88] is taken as a starting point. A number of communication architectures used in distributed systems are reviewed. The properties of diagnosis algorithms depend strongly on the testing graph. A general class of testing graphs, designated H-graphs (a generalization of the Dδt graphs introduced in [Prep67]), is investigated and its diagnostic properties determined. A software simulator for distributed systems has been written as the main investigative tool for diagnosis algorithms. The design and structure of the simulator are described. The diagnosis process is measured in terms of diagnostic time and number of messages produced, and the factors upon which these quantities depend are identified. The results of simulating a number of systems under various fault conditions are given. A modified way of routing diagnosis messages is presented which, especially in large systems, reduces both the number of diagnosis messages and the time required to perform diagnosis. The thesis also contains a number of specific recommendations for improving existing self-diagnosis algorithms.

    Fault Recovery in Swarm Robotics Systems using Learning Algorithms

    When faults occur in swarm robotic systems they can have a detrimental effect on collective behaviours, to the point that failed individuals may jeopardise the swarm's ability to complete its task. Although fault tolerance is a desirable property of swarm robotic systems, fault recovery mechanisms have not yet been thoroughly explored. Individual robots may suffer a variety of faults, which affect collective behaviours in different ways; a recovery process is therefore required that can cope with many different failure scenarios. In this thesis, we propose a novel approach for fault recovery in robot swarms that uses Reinforcement Learning and Self-Organising Maps to select the most appropriate recovery strategy for any given scenario. The learning process is evaluated in both centralised and distributed settings. Additionally, we experimentally evaluate the performance of this approach in comparison to random selection of fault recovery strategies, using simulated collective phototaxis, aggregation and foraging tasks as case studies. Our results show that this machine learning approach outperforms random selection, and allows swarm robotic systems to recover from faults that would otherwise prevent the swarm from completing its mission. This work builds upon existing research in fault detection and diagnosis in robot swarms, with the aim of creating a fully fault-tolerant swarm capable of long-term autonomy.

    Agent Based Test and Repair of Distributed Systems

    This article demonstrates how to use intelligent agents for testing and repairing a distributed system whose elements may or may not have embedded BIST (Built-In Self-Test) and BISR (Built-In Self-Repair) facilities. Agents are software modules that perform monitoring, diagnosis and repair of faults. Together they form a society whose members communicate, set goals and solve tasks. An experimental solution is presented, and future developments of the proposed approach are explored.

    Towards distributed diagnosis of the Tennessee Eastman process benchmark

    A distributed hybrid strategy is outlined for the isolation of faults and disturbances in the Tennessee Eastman process. Because it would build on existing structures for distributed control systems, it should be easy to implement, cheap and widely applicable. The main emphasis in the paper is on one component of the strategy, a steady-state-based approach. Results obtained by applying this approach are presented and knowledge limitations are discussed. In particular, a way in which a knowledge base might evolve to improve isolation capabilities is suggested, and the role of the operator is briefly discussed.

    A self-validating control system based approach to plant fault detection and diagnosis

    An approach is proposed in which fault detection and diagnosis (FDD) tasks are distributed to separate FDD modules associated with each control system located throughout a plant. Intended specifically for those control systems that inherently eliminate steady-state error, it is modular, steady-state based and requires very little process-specific information, and should therefore be attractive to control system implementers who seek economies of scale. The approach is applicable to virtually all types of process plant, whether they are open-loop stable or not, have a type or class number of zero or not, and so on. Based on qualitative reasoning, the approach is founded on the application of control systems theory to single and cascade control systems with integral action. This results in the derivation of cause-effect knowledge and fault isolation procedures that take into account factors such as interactions between control systems and the availability of non-control-loop-based sensors.

    Oscillation-based DFT for Second-order Bandpass OTA-C Filters

    This document is the Accepted Manuscript version. Under embargo until 6 September 2018. The final publication is available at Springer via https://doi.org/10.1007/s00034-017-0648-9. Peer reviewed. This paper describes a design-for-testability technique for second-order bandpass operational transconductance amplifier and capacitor (OTA-C) filters using an oscillation-based test topology. The oscillation-based test structure is a vectorless output test strategy that is easily extendable to built-in self-test. The proposed methodology converts the filter under test into a quadrature oscillator using very simple techniques and measures the output frequency. Using feedback loops with a nonlinear block, the filter-to-oscillator conversion techniques easily convert the bandpass OTA-C filter into an oscillator. With a minimum number of extra components, the proposed scheme requires a negligible area overhead. The validity of the proposed method has been verified by comparing faulty and fault-free simulation results for Tow-Thomas and KHN OTA-C filters. Simulation results in 0.25 μm CMOS technology show that the proposed oscillation-based test strategy for OTA-C filters is suitable for testing catastrophic and parametric faults and is also effective in detecting single and multiple faults with high fault coverage.

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided that covers, inter alia, rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model-based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise.
    Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule-based or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.