4,856 research outputs found

    A probabilistic model for information and sensor validation

    Get PDF
    This paper develops a new theory and model for information and sensor validation. The model represents relationships between variables using Bayesian networks and utilizes probabilistic propagation to estimate the expected values of variables. If the estimated value of a variable differs from the actual value, an apparent fault is detected. The fault is only apparent since it may be that the estimated value is itself based on faulty data. The theory extends our understanding of when it is possible to isolate real faults from potential faults and supports the development of an algorithm that is capable of isolating real faults without deferring the problem to the use of expert provided domain-specific rules. To enable practical adoption for real-time processes, an any time version of the algorithm is developed, that, unlike most other algorithms, is capable of returning improving assessments of the validity of the sensors as it accumulates more evidence with time. The developed model is tested by applying it to the validation of temperature sensors during the start-up phase of a gas turbine when conditions are not stable; a problem that is known to be challenging. The paper concludes with a discussion of the practical applicability and scalability of the model

    Survivable algorithms and redundancy management in NASA's distributed computing systems

    Get PDF
    The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given
    corecore