4,544 research outputs found

    Rapid Recovery for Systems with Scarce Faults

    Full text link
    Our goal is to achieve a high degree of fault tolerance through the control of a safety critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.Comment: In Proceedings GandALF 2012, arXiv:1210.202

    Redundancy management for efficient fault recovery in NASA's distributed computing system

    Get PDF
    The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance

    Clustering Algorithms for Scale-free Networks and Applications to Cloud Resource Management

    Full text link
    In this paper we introduce algorithms for the construction of scale-free networks and for clustering around the nerve centers, nodes with a high connectivity in a scale-free networks. We argue that such overlay networks could support self-organization in a complex system like a cloud computing infrastructure and allow the implementation of optimal resource management policies.Comment: 14 pages, 8 Figurs, Journa

    Generating Property-Directed Potential Invariants By Backward Analysis

    Full text link
    This paper addresses the issue of lemma generation in a k-induction-based formal analysis of transition systems, in the linear real/integer arithmetic fragment. A backward analysis, powered by quantifier elimination, is used to output preimages of the negation of the proof objective, viewed as unauthorized states, or gray states. Two heuristics are proposed to take advantage of this source of information. First, a thorough exploration of the possible partitionings of the gray state space discovers new relations between state variables, representing potential invariants. Second, an inexact exploration regroups and over-approximates disjoint areas of the gray state space, also to discover new relations between state variables. k-induction is used to isolate the invariants and check if they strengthen the proof objective. These heuristics can be used on the first preimage of the backward exploration, and each time a new one is output, refining the information on the gray states. In our context of critical avionics embedded systems, we show that our approach is able to outperform other academic or commercial tools on examples of interest in our application field. The method is introduced and motivated through two main examples, one of which was provided by Rockwell Collins, in a collaborative formal verification framework.Comment: In Proceedings FTSCS 2012, arXiv:1212.657
    • …
    corecore