    Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems

    The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs. National Science Foundation (CCR-9308344, CCR-9596282)
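    To make the connection to pinwheel scheduling concrete, here is a minimal, self-contained Python sketch (not the paper's algorithm): each task i must be scheduled at least once in every window of periods[i] consecutive slots, and the sketch uses the standard power-of-two rounding idea to build such a schedule. All names are illustrative.

```python
# Pinwheel scheduling sketch: round each period down to a power of two and,
# if the resulting density does not exceed 1, pack the tasks into disjoint
# arithmetic progressions of slots.

def round_down_pow2(a):
    p = 1
    while p * 2 <= a:
        p *= 2
    return p

def pinwheel_schedule(periods, length):
    """Return a list of `length` slots, where slot t holds the task broadcast
    in that slot (None = idle), or None if the power-of-two test fails."""
    rounded = sorted((round_down_pow2(a), i) for i, a in enumerate(periods))
    if sum(1.0 / p for p, _ in rounded) > 1.0:
        return None  # sufficient condition not met; this sketch gives up
    # Greedily assign each task an offset within its rounded period so that
    # the progressions {offset, offset+p, offset+2p, ...} are pairwise disjoint.
    taken = []        # (offset, period) pairs already assigned
    assignment = {}   # task index -> (offset, period)
    for p, i in rounded:
        for off in range(p):
            if all((off - o) % min(p, q) != 0 for o, q in taken):
                taken.append((off, p))
                assignment[i] = (off, p)
                break
        else:
            return None
    schedule = [None] * length
    for i, (off, p) in assignment.items():
        for slot in range(off, length, p):
            schedule[slot] = i
    return schedule

if __name__ == "__main__":
    # Task i must appear at least once in every window of periods[i] slots.
    print(pinwheel_schedule([2, 4, 8, 8], 16))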

    Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of the fault-span, the set of states reachable in the presence of faults, and (2) resolution of deadlock states, from which the program has no outgoing transitions. Of these, the former closely resembles model checking and, hence, techniques for efficient verification are directly applicable to it. We therefore focus on expediting the latter with the use of multi-core technology. We present two approaches for parallelization, based on different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation) that are needed due to the issue of distribution (i.e., the inability of processes to atomically read and write all program variables). We show that in most cases the speedup of this approach is close to the ideal speedup, and in some cases it is superlinear. The second approach uses the traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that the simple approach of parallelizing the group computation is likely to be the more effective method for using multi-core computing in the context of deadlock resolution.
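    As a rough illustration of the first approach, the sketch below (hypothetical names and state encoding, not the authors' symbolic implementation) expands each candidate transition into its equivalence class over the variables a process cannot read, and distributes that expansion across a thread pool, since each group can be computed independently.

```python
# Group computation sketch: a process that cannot read certain variables
# cannot distinguish transitions that differ only in those variables, so a
# candidate transition must be added or removed together with its whole group.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def group_of(transition, hidden_vars):
    """All transitions indistinguishable from `transition` for a process
    that cannot observe the (boolean) variables in `hidden_vars`."""
    src, dst = transition
    group = set()
    for bits in product((0, 1), repeat=len(hidden_vars)):
        s, d = list(src), list(dst)
        for var, b in zip(hidden_vars, bits):
            s[var] = d[var] = b  # hidden variable keeps an arbitrary, unchanged value
        group.add((tuple(s), tuple(d)))
    return group

def parallel_groups(transitions, hidden_vars, workers=4):
    # Each candidate recovery transition expands to its group independently,
    # so the work partitions across threads with no shared state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        expanded = pool.map(lambda t: group_of(t, hidden_vars), transitions)
    merged = set()
    for g in expanded:
        merged |= g
    return merged

if __name__ == "__main__":
    # One candidate transition over three boolean variables; variable 2 is unreadable.
    t = ((0, 0, 0), (1, 0, 0))
    print(sorted(parallel_groups([t], hidden_vars=[2])))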

    EOS: A project to investigate the design and construction of real-time distributed embedded operating systems

    The EOS project is investigating the design and construction of a family of real-time distributed embedded operating systems for reliable, distributed aerospace applications. Using the real-time programming techniques developed in cooperation with NASA in earlier research, the project staff is building a kernel for a multiple-processor networked system. The first six months of the grant included a study of scheduling in an object-oriented system, the design philosophy of the kernel, and the architectural overview of the operating system. In this report, the operating system and kernel concepts are described. An environment for the experiments has been built, and several of the key concepts of the system have been prototyped. The kernel and operating system are intended to support future experimental studies in multiprocessing, load balancing, routing, software fault tolerance, distributed database design, and real-time processing.

    A Short Counterexample Property for Safety and Liveness Verification of Fault-tolerant Distributed Algorithms

    Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards. We introduce an automated method for model checking safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques. Comment: 16 pages, 11 pages appendix
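    The sketch below is a minimal, hypothetical illustration of a threshold-guarded action of the kind such algorithms use (it is not ByMC's input language): a process accepts only after hearing from a majority of the n processes, where n and f are left as parameters.

```python
# Threshold guard sketch: the guard compares a count of distinct received
# messages against an expression over the parameters n (processes) and
# f (fault bound); here, a simple majority.

class ThresholdProcess:
    def __init__(self, n, f):
        self.n, self.f = n, f      # parameters: number of processes, fault bound
        self.received = set()      # ids of distinct senders heard from
        self.accepted = False

    def deliver(self, sender_id):
        self.received.add(sender_id)
        # Threshold guard: fire only once a majority of peers has acknowledged.
        if len(self.received) > self.n // 2:
            self.accepted = True

if __name__ == "__main__":
    p = ThresholdProcess(n=5, f=1)
    for sender in (1, 2, 3):
        p.deliver(sender)
    print(p.accepted)  # True: 3 > 5 // 2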

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprising conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first-order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

    On model checking data-independent systems with arrays without reset

    A system is data-independent with respect to a data type X iff the operations it can perform on values of type X are restricted to just equality testing. The system may also store, input, and output values of type X. We study model checking of systems which are data-independent with respect to two distinct type variables X and Y, and may in addition use arrays with indices from X and values from Y. Our main interest is the following parameterised model-checking problem: whether a given program satisfies a given temporal-logic formula for all non-empty finite instances of X and Y. Initially, we consider instead the abstraction where X and Y are infinite and where partial functions with finite domains are used to model arrays. Using a translation to data-independent systems without arrays, we show that the mu-calculus model-checking problem is decidable for these systems. From this result, we can deduce properties of all systems with finite instances of X and Y. We show that there is a procedure for the above parameterised model-checking problem of the universal fragment of the mu-calculus, such that it always terminates but may give false negatives. We also deduce that the parameterised model-checking problem of the universal disjunction-free fragment of the mu-calculus is decidable. Practical motivations for model checking data-independent systems with arrays include verification of memory and cache systems, where X is the type of memory addresses and Y the type of storable values. As an example, we verify a fault-tolerant memory interface over a set of unreliable memories. Comment: Appeared in Theory and Practice of Logic Programming, vol. 4, no. 5&6, 200
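    The following sketch illustrates the abstraction informally: the array is modeled as a partial function with a finite domain (here, a Python dict), the program compares X and Y values only for equality, and a majority read over replicated unreliable memories stands in for the fault-tolerant interface. It is an illustrative example under those assumptions, not the model verified in the paper.

```python
# Data-independent memory sketch: addresses (type X) and stored values
# (type Y) are only ever compared for equality; each memory is a partial
# function X -> Y realized as a dict.
from collections import Counter

class UnreliableMemory:
    def __init__(self):
        self.cells = {}                       # partial function: address -> value

    def write(self, addr, value, corrupt_to=None):
        # A faulty memory may silently store a corrupted value instead.
        self.cells[addr] = value if corrupt_to is None else corrupt_to

    def read(self, addr):
        return self.cells.get(addr)           # None: address not yet written

def majority_read(memories, addr):
    """Read addr from every replica and return the most common value."""
    values = [m.read(addr) for m in memories]
    value, _ = Counter(values).most_common(1)[0]
    return value

if __name__ == "__main__":
    mems = [UnreliableMemory() for _ in range(3)]
    for i, m in enumerate(mems):
        m.write("x", "v1", corrupt_to="garbage" if i == 2 else None)
    print(majority_read(mems, "x"))           # "v1": one corrupted replica is outvoted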

    Blazes: Coordination Analysis for Distributed Programs

    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they incur undesirable performance costs unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under faults, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language. Comment: Updated to include additional materials from the original technical report: derivation rules, output stream labels
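    As an informal illustration of the property such an analysis looks for, the sketch below (hypothetical, not Blazes itself) contrasts an order-insensitive (confluent) merge, which produces the same result for any message delivery order and so needs no coordination, with an order-sensitive operation, whose outcome depends on delivery order.

```python
# Confluence sketch: enumerate all delivery orders and count distinct outcomes.
from itertools import permutations

def merge_as_set(messages):
    return frozenset(messages)                   # confluent: order-insensitive

def keep_last(messages):
    return messages[-1] if messages else None    # non-confluent: order-sensitive

if __name__ == "__main__":
    msgs = ["a", "b", "c"]
    orders = list(permutations(msgs))
    print(len({merge_as_set(o) for o in orders}))       # 1: a single outcome
    print(len({keep_last(list(o)) for o in orders}))     # 3: outcome depends on order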