
    Failure diagnosis and prognosis in stochastic discrete-event and cyber-physical systems

    In this dissertation we study the problem of fault diagnosis in both discrete event systems and cyber-physical systems. Discrete event systems (DESs) are event-driven systems with discrete states that evolve in response to abrupt occurrences of discrete changes (called events). Stochastic DESs are used to characterize the quantitative behavior of a system by modeling the uncertainty on the occurrence of events as random variables with certain distributions. A stochastic DES is similar to a Markov chain model, the difference being that in a stochastic DES each transition is labeled with an event, whereas the event information is omitted in a Markov chain. Many physical systems, such as manufacturing systems, communication protocols, reactive software, telephone networks, traffic systems, robotics and digital hardware, can be modeled as DESs at a certain level of abstraction. Fault diagnosis detects the occurrence of a fault so as to enable fault-tolerant actions. It is a crucial and challenging problem that has attracted considerable attention in the literature on software engineering, automotive systems, power systems and nuclear engineering. In this dissertation, we propose online detection schemes for stochastic DESs and also introduce the notions of missed detections (MDs) and false alarms (FAs), or equivalently false-negatives and false-positives, for the schemes. The idea is that, given any observation (of partially observed events), the detector recursively computes the conditional probability of the nonoccurrence of a fault, issues a fault decision if this probability falls below an appropriately chosen threshold, and issues no-decision otherwise. We establish that S-Diagnosability is a necessary and sufficient condition for achieving any desired levels of MD and FA rates, where the notion of S-Diagnosability was proposed by Thorsley et al. in 2005, requiring that given any tolerable ambiguity level ρ and error bound τ, there exist a delay bound n such that, for any fault trace, its extensions that are longer than n and have probability of ambiguity higher than ρ occur with probability smaller than τ. Algorithms for determining the detection-scheme parameters, namely the detection threshold and the detection delay bound, for specified MD and FA rate requirements are also presented, based on the construction of an extended observer, which computes, for each observation sequence, the set of states reached in the system model, along with their probabilities and the number of post-fault transitions executed. This dissertation also studies fault diagnosis in cyber-physical systems, where the dynamics of the physical systems over discrete sample instances are described by stochastic difference equations, and the nonfault behaviors are specified by linear-time temporal logic (LTL) formulas over sequences of requirement variables that are functions of inputs and states (just as the outputs are). We first introduce the notion of an input-output stochastic hybrid automaton (I/O-SHA), and then show that it can be used to model the refinement of a given discrete-time stochastic system against its LTL specification so as to identify the system behaviors that satisfy the nonfault specification versus those that violate it, in the form of reachability of a fault location.
    For this we propose a refinement algorithm that refines the system model, given in the form of discrete-time stochastic equations, with respect to its specification model, given in the form of a Büchi acceptor, and the resulting refinement can be modeled as an I/O-SHA. We further show that the fault detection problem then reduces to a state estimation problem for the I/O-SHA. The performance of the detection protocol is evaluated in terms of its FA and MD rates. We additionally propose the notion of S-Diagnosability for I/O-SHA, which can guarantee the existence of detectors that achieve any desired FA and MD rates. We further consider the fault prognosis problem for stochastic DESs, where the goal is to predict a fault prior to its occurrence. We introduce m-steps Stochastic-Prognosability, or simply Sm-Prognosability, requiring that for any tolerance level ρ and error bound τ, there exist a reaction bound k ≥ m such that the set of fault traces for which a fault cannot be predicted k steps in advance with tolerance level ρ occurs with probability smaller than τ. Similar to the fault diagnosis problem, we formalize the notion of a prognoser that maps observations to decisions by comparing a suitable statistic with a threshold, and show that Sm-Prognosability is a necessary and sufficient condition for the existence of a prognoser with reaction bound at least m (i.e., prediction at least m steps prior to the occurrence of a fault) that can achieve any specified FA and MD rate requirement. Moreover, we provide a polynomial algorithm for verifying Sm-Prognosability.
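    The threshold test described above amounts to a recursive Bayes-style filter. The sketch below is a minimal illustration of that idea, assuming a hypothetical stochastic DES given as a labeled Markov model in which trans[s][e] lists (next_state, probability) pairs for observable event e; the names, the data structures, and the simple normalization step are illustrative assumptions, not the dissertation's actual observer construction.

        # Minimal sketch of a threshold-based fault detector (illustrative only).
        # trans[s][e]   : list of (next_state, probability) pairs for event e
        # fault_states  : states reachable only after a fault has occurred
        def make_detector(trans, init_belief, fault_states, threshold):
            belief = dict(init_belief)              # conditional state distribution

            def step(observed_event):
                """Update the belief with one observed event and return a decision."""
                new_belief = {}
                for s, p in belief.items():
                    for s_next, q in trans[s].get(observed_event, []):
                        new_belief[s_next] = new_belief.get(s_next, 0.0) + p * q
                total = sum(new_belief.values()) or 1.0
                belief.clear()
                belief.update({s: p / total for s, p in new_belief.items()})
                # conditional probability that no fault has occurred so far
                p_no_fault = sum(p for s, p in belief.items() if s not in fault_states)
                return "fault" if p_no_fault < threshold else "no-decision"

            return step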

    Lower bounds for dilation, wirelength, and edge congestion of embedding graphs into hypercubes

    Interconnection networks provide an effective mechanism for exchanging data between processors in a parallel computing system. One of the most efficient interconnection networks is the hypercube, due to its structural regularity, its potential for the parallel computation of various algorithms, and its high degree of fault tolerance. It is therefore often a first choice of topology for parallel processing and computing systems. In this paper, lower bounds for the dilation, wirelength, and edge congestion of an embedding of a graph into a hypercube are proved. Two of these bounds are expressed in terms of the bisection width. Applying these results, the dilation and wirelength of embeddings of certain complete multipartite graphs, folded hypercubes, wheels, and specific Cartesian products are computed.
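    As a small illustration of a bisection-width argument of this general kind (the paper's own bounds may be sharper), the sketch below evaluates the standard cut-counting lower bound on edge congestion, EC(G, Q_n) ≥ ⌈BW(G)/BW(Q_n)⌉, for the hypothetical example of embedding the complete graph on 2^n vertices into Q_n.

        from math import ceil

        # Illustrative cut-counting bound, not the paper's exact results.
        def bisection_width_hypercube(n):
            # BW(Q_n) = 2^(n-1): cutting all edges along one dimension bisects Q_n
            return 2 ** (n - 1)

        def bisection_width_complete(m):
            # BW(K_m): every pair of vertices across the two halves is an edge
            return (m // 2) * (m - m // 2)

        def congestion_lower_bound(bw_guest, bw_host):
            # at least bw_guest guest edges must cross a minimum bisection of the
            # host, which contains only bw_host edges, so some host edge carries
            # at least the average load
            return ceil(bw_guest / bw_host)

        n = 4
        print(congestion_lower_bound(bisection_width_complete(2 ** n),
                                     bisection_width_hypercube(n)))   # prints 8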

    Adaptive fault-tolerant routing in hypercube multicomputers

    A connected hypercube with faulty links and/or nodes is called an injured hypercube. To enable any non-faulty node to communicate with any other non-faulty node in an injured hypercube, the information on component failures has to be made available to non-faulty nodes so as to route messages around the faulty components. A distributed adaptive fault-tolerant routing scheme is proposed for an injured hypercube in which each node is required to know only the condition of its own links. Despite its simplicity, this scheme is shown to be capable of routing messages successfully in an injured n-dimensional hypercube as long as the number of faulty components is less than n. Moreover, it is proved that this scheme routes messages via shortest paths with a rather high probability, and the expected length of a resulting path is very close to that of a shortest path. Since the assumption that the number of faulty components is less than n in an n-dimensional hypercube might limit the usefulness of the above scheme, a routing scheme based on depth-first search is introduced which works in the presence of an arbitrary number of faulty components. Due to insufficient information on faulty components, the paths chosen by the above scheme may not always be the shortest. To guarantee that all messages be routed via shortest paths, it is proposed that every node be equipped with more information than that on its own links. The effects of this additional information on routing efficiency are analyzed, and the additional information to be kept at each node for shortest-path routing is determined. Several examples and remarks are also given to illustrate the results.
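    A single forwarding step of an adaptive scheme of this kind, in which a node consults only the status of its own links, might look like the sketch below. This is an illustration under simplifying assumptions (faulty links given as a set of (node, dimension) pairs, no loop-avoidance bookkeeping), not the paper's algorithm.

        # Illustrative adaptive step; nodes of the n-cube are integers 0..2^n - 1,
        # and link (u, d) is the edge obtained by flipping bit d of u.
        def next_hop(node, dest, faulty_links, n):
            """Pick the neighbor to forward to, preferring links that reduce the
            Hamming distance to dest and detouring otherwise."""
            differing = [d for d in range(n) if (node >> d) & 1 != (dest >> d) & 1]
            # prefer a non-faulty link along a dimension where node and dest differ
            for d in differing:
                if (node, d) not in faulty_links:
                    return node ^ (1 << d)
            # otherwise detour along any non-faulty link (path is no longer shortest)
            for d in range(n):
                if d not in differing and (node, d) not in faulty_links:
                    return node ^ (1 << d)
            return None  # every local link is faulty: the message cannot leave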

    Control and diagnosis of real-time systems under finite-precision measurement of time

    A discrete event system (DES) is an event-driven system that evolves according to abrupt occurrences of discrete changes (events). The domain of such systems encompasses aspects of many man-made systems such as manufacturing systems, telephone networks, communication protocols, traffic systems, embedded software, asynchronous hardware, robotics, etc. Supervisory control theory for DESs studies the existence and synthesis of supervisory controllers, namely supervisors that restrict the system behaviors by dynamically disabling certain controllable events so that the controlled closed-loop system behaves as desired. Extensive work on supervisory control of untimed DESs exists, and the extension to the timed setting has been reported in the literature. In this dissertation, we study the supervisory control of dense-time DESs in which digital-clocks of finite precision are employed to observe the event occurrence times, thereby relaxing the assumption of prior works that time can be measured precisely. In our setting, the passing of time is measured using the number of ticks generated by a digital-clock, and we allow the plant events and digital-clock ticks to occur concurrently. We formalize the notion of a control policy that issues the control actions based on the observations of events and their occurrence times as measured using a digital-clock, and show that such a control policy can be equivalently represented as a digitalized-automaton, namely an untimed-automaton that evolves over the events (of the plant) and ticks (of the digital-clock). We introduce the notion of observability with respect to the partial observations of time resulting from the use of a digital-clock, and show that this property together with controllability serves as a necessary and sufficient condition for the existence of a supervisor to enforce a real-time specification on a dense-time discrete event plant. The observability condition presented in the dissertation is very different from the one arising due to a partial observation of events, since a partial observation of time is in general nondeterministic (the number of ticks generated in any time interval can vary from execution to execution of a digital-clock). We also present a method to verify the proposed observability and controllability conditions, and an algorithm to compute a supervisor when such conditions are satisfied. Furthermore, we examine the lattice structure of a class of timing-mask observable languages, and show that the proposed observability is not preserved under intersection but is preserved under union. Fault diagnosis for DESs detects the occurrence of a fault so as to enable corrective actions. It is crucial in the automatic control of large complex man-made systems and has attracted considerable attention in the literature on reliability engineering, control, and computer science. For event-driven systems with timing requirements, such as manufacturing systems, communication networks, real-time scheduling and traffic systems, fault diagnosis involves detecting timing-faults in addition to sequence-faults. This requires monitoring the timing and sequence of events, both of which may only be partially observed in practice. In this dissertation, we extend prior work on fault diagnosis of timed DESs by allowing time to be partially observed using a digital-clock, which measures the advancement of time with finite precision by the number of ticks.
    For diagnosis purposes, the set of nonfaulty timed-traces is specified as another timed-automaton that is deterministic. We show that the set of timed-traces observed using a digital-clock with finite precision is regular, i.e., it can be represented using a finite (untimed) automaton. We also show that the verification of diagnosability (the ability to detect the execution of a faulty timed-trace within a bounded time delay) as well as the off-line synthesis of a diagnoser are decidable, by reducing these problems to the untimed setting. The reduction to the untimed setting also suggests an effective method for the off-line computation of a diagnoser as well as its on-line implementation for diagnosis. The aforementioned results are further extended to the nondeterministic setting, i.e., diagnosis of dense-time DESs using digital-clocks under a nondeterministic event observation mask. We introduce the notion of lifting (associating each event with each of its nondeterministic observations), and show that diagnosis of dense-time DESs employing digital-clocks to observe event occurrence times under a nondeterministic event observation mask can be reduced to that of the deterministic setting, i.e., diagnosis of the lifted dense-time DESs under the deterministic lifted event observation mask, and hence can be further reduced to diagnosis in the untimed setting.
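    To make the digital-clock observation concrete, the sketch below shows how a timed trace might be projected onto an untimed sequence of plant events and clock ticks. It assumes an idealized periodic clock and a simple event mask; in the dissertation's setting the clock is nondeterministic (tick counts per interval can vary between executions), which is exactly what makes the partial observation of time nontrivial.

        # Illustrative projection of a timed trace to an event/tick observation.
        def observe(timed_trace, tick_period, mask=lambda e: e):
            """timed_trace: list of (event, absolute_time) pairs in time order;
            returns the untimed observation over plant events and 'tick' symbols."""
            obs, ticks_emitted = [], 0
            for event, t in timed_trace:
                # emit every tick whose (idealized) time falls before this event
                while (ticks_emitted + 1) * tick_period <= t:
                    obs.append("tick")
                    ticks_emitted += 1
                if mask(event) is not None:      # unobservable events are erased
                    obs.append(mask(event))
            return obs

        # e.g. observe([("a", 0.7), ("b", 2.3)], 1.0) -> ["a", "tick", "tick", "b"]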

    Gradient based system-level diagnosis

    Traditional approaches to system-level diagnosis in multiprocessor systems are usually based on the oversimplified PMC test invalidation model; however, Blount introduced a more general model containing conditional probabilities as parameters for the different test invalidation situations. He suggested a lookup-table based approach, but no algorithmic solution had been elaborated until our P-graph based solution introduced in previous publications. In that approach the diagnostic process is formulated as an optimization problem and the optimal solution is determined. Although the average behavior of the algorithm is quite good, the worst-case complexity is exponential. In this paper we introduce a novel group of fast diagnostic algorithms that we call gradient-based algorithms. This approach only approximates the optimal maximum likelihood or maximum a posteriori solution, but it has a polynomial complexity of the order of O(N · NbCount + N^2), where N is the size of the system and NbCount is the number of neighbors of a single unit. The idea of the base algorithm is that it takes an initial fault pattern and iterates as long as the likelihood of the current fault pattern can be increased by a single state change in the pattern. Improvements of this base algorithm, complexity analysis, and simulation results are also presented. The main, although not exclusive, application field of the algorithms is wafer-scale diagnosis, since the accuracy and the performance remain good even if a relatively large number of faults is present.
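    The base iteration amounts to a hill-climbing search over fault patterns. The sketch below illustrates that idea, assuming a hypothetical likelihood(pattern) function that scores a candidate pattern (one 0/1 state per unit) against the observed test syndrome, for instance using the conditional probabilities of Blount's model; it is an illustration, not the paper's implementation.

        # Illustrative gradient-style local search over fault patterns.
        def gradient_diagnose(initial_pattern, likelihood):
            pattern = list(initial_pattern)
            best = likelihood(pattern)
            improved = True
            while improved:                        # stop when no single flip helps
                improved = False
                for i in range(len(pattern)):      # try flipping each unit's state
                    pattern[i] ^= 1
                    score = likelihood(pattern)
                    if score > best:
                        best, improved = score, True   # keep the improving flip
                    else:
                        pattern[i] ^= 1            # revert the flip
            return pattern, best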

    Research in the effective implementation of guidance computers with large scale arrays Interim report

    Functional logic character implementation in breadboard design of NASA modular computer.

    Fault Diagnosis Algorithms for Wireless Sensor Networks

    The sensor nodes in wireless sensor networks (WSNs) are deployed in unattended and hostile environments. The hostile environment affects the monitoring infrastructure, which includes the sensor nodes and the links. In addition, node failures and environmental hazards cause frequent topology changes, communication failures, and network partitions. This in turn adds a new dimension to the fragility of the WSN topology. Such perturbations are far more common in WSNs than in conventional wireless networks, and they demand efficient techniques for discovering disruptive behavior. Traditional fault diagnosis techniques devised for wired interconnected networks and conventional wireless networks are not directly applicable to WSNs due to the specific requirements and limitations of WSNs. System-level diagnosis is a technique to identify faults in distributed networks such as multiprocessor systems, wired interconnected networks, and conventional wireless networks; recently, it has also been applied to ad hoc networks and WSNs. Diagnosis is performed by deduction, based on information in the form of results of tests applied to the sensor nodes. Neighbor-coordination-based system-level diagnosis is a variation of this method, which exploits the spatio-temporal correlation between sensor measurements. In this thesis, we present a new approach to diagnose faulty sensor nodes in a WSN, which works in conjunction with the underlying clustering protocol and exploits the spatio-temporal correlation between sensor measurements. An advantage of this method is that the diagnostic operation constitutes real work performed by the system, rather than a specialized diagnostic task. In this way, the normal operation of the network can be used for the diagnosis, resulting in less time and message overhead. In this thesis, we have devised and evaluated fault diagnosis algorithms for WSNs considering the persistence of faults (transient, intermittent, and permanent) and faults in communication channels; in one of the approaches, we also address the issue of node mobility in diagnosis. A cluster-based distributed fault diagnosis (CDFD) algorithm is proposed, where the diagnostic local view is obtained by exploiting the spatially correlated sensor measurements. We derive an optimal threshold for effective fault diagnosis in sparse networks. The message complexity of CDFD is O(n), and the number of bits exchanged to diagnose the network is O(n log₂ n). The intermittent fault diagnosis is formulated as a multiobjective optimization problem based on the inter-test interval and the number of test repetitions required to diagnose the intermittent faults. The two objectives, detection latency and energy overhead, are taken into consideration under a constraint on detection errors. A high level (> 95%) of detection accuracy is achieved while keeping the false alarm rate low (< 1%) for sparse networks. The proposed cluster-based distributed intermittent fault diagnosis (CDIFD) algorithm is energy efficient because, in CDIFD, diagnostic messages are sent as the output of the routine tasks of the WSN. A count-and-threshold-based mechanism is used to discriminate the persistence of faults. The main characteristic of these faults is the amount of time for which the fault disappears, and we adopt this state-holding time to discriminate transient from intermittent or permanent faults.
    The proposed cluster-based distributed fault diagnosis and discrimination (CDFDD) algorithm is energy efficient, with an improved network lifetime of more than 1150 data-gathering rounds at transient fault rates as high as 20%. A mobility-aware hierarchical architecture is proposed to detect hard and soft faults in dynamic WSN topologies, assuming random movements of the nodes. A test pattern that ensures error checking of each functional block of a sensor node is employed to diagnose the network. The proposed mobility-aware cluster-based distributed fault diagnosis (MCDFD) algorithm assures a better packet delivery ratio (> 80%) in highly dynamic networks with a fault rate as high as 30%. The network lifetime is more than 900 data-gathering rounds in a highly dynamic network with a fault rate as high as 20%.
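    The count-and-threshold discrimination of fault persistence can be sketched roughly as below, assuming per-round pass/fail test outcomes for a node; the window lengths and thresholds are illustrative placeholders, not the values used by CDFDD.

        # Illustrative count-and-threshold discriminator of fault persistence.
        def classify_fault(outcomes, hold_rounds=3, repeat_threshold=3):
            """outcomes: list of booleans, True = node tested faulty in that round."""
            episodes, run = [], 0                 # lengths of consecutive faulty runs
            for faulty in outcomes:
                if faulty:
                    run += 1
                else:
                    if run:
                        episodes.append(run)      # an episode ended: the fault disappeared
                    run = 0
            still_faulty = run > 0
            if still_faulty:
                episodes.append(run)
            if not episodes:
                return "fault-free"
            if still_faulty and run >= hold_rounds and len(episodes) == 1:
                return "permanent"                # fault has held continuously
            if len(episodes) >= repeat_threshold:
                return "intermittent"             # fault keeps disappearing and reappearing
            return "transient"                    # brief episode(s) that went away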

    Modeling, monitoring, and diagnosis of complex systems with high-dimensional streaming data

    With the development of technology, sensing systems have become ubiquitous. As a result, a wide variety of complex systems are continuously monitored by hundreds of sensors collecting large volumes of rich data. Learning the structure of complex systems from sensing data provides unique opportunities for real-time process monitoring and for accurate fault diagnosis in a wide range of applications. This dissertation presents new methodologies for analyzing the high-dimensional data collected by sensors in order to learn the interactions between different entities in complex systems for system monitoring and diagnosis.