2,019 research outputs found

    Phase Clocks for Transient Fault Repair

    Full text link
    Phase clocks are synchronization tools that implement a form of logical time in distributed systems. For systems tolerating transient faults by self-repair of damaged data, phase clocks can enable reasoning about the progress of distributed repair procedures. This paper presents a phase clock algorithm suited to the model of transient memory faults in asynchronous systems with read/write registers. The algorithm is self-stabilizing and guarantees accuracy of phase clocks within O(k) time following an initial state that is k-faulty. Composition theorems show how the algorithm can be used for the timing of distributed procedures that repair system outputs.Comment: 22 pages, LaTe

    Fast and compact self-stabilizing verification, computation, and fault detection of an MST

    Get PDF
    This paper demonstrates the usefulness of distributed local verification of proofs, as a tool for the design of self-stabilizing algorithms.In particular, it introduces a somewhat generalized notion of distributed local proofs, and utilizes it for improving the time complexity significantly, while maintaining space optimality. As a result, we show that optimizing the memory size carries at most a small cost in terms of time, in the context of Minimum Spanning Tree (MST). That is, we present algorithms that are both time and space efficient for both constructing an MST and for verifying it.This involves several parts that may be considered contributions in themselves.First, we generalize the notion of local proofs, trading off the time complexity for memory efficiency. This adds a dimension to the study of distributed local proofs, which has been gaining attention recently. Specifically, we design a (self-stabilizing) proof labeling scheme which is memory optimal (i.e., O(logn)O(\log n) bits per node), and whose time complexity is O(log2n)O(\log ^2 n) in synchronous networks, or O(Δlog3n)O(\Delta \log ^3 n) time in asynchronous ones, where Δ\Delta is the maximum degree of nodes. This answers an open problem posed by Awerbuch and Varghese (FOCS 1991). We also show that Ω(logn)\Omega(\log n) time is necessary, even in synchronous networks. Another property is that if ff faults occurred, then, within the requireddetection time above, they are detected by some node in the O(flogn)O(f\log n) locality of each of the faults.Second, we show how to enhance a known transformer that makes input/output algorithms self-stabilizing. It now takes as input an efficient construction algorithm and an efficient self-stabilizing proof labeling scheme, and produces an efficient self-stabilizing algorithm. When used for MST, the transformer produces a memory optimal self-stabilizing algorithm, whose time complexity, namely, O(n)O(n), is significantly better even than that of previous algorithms. (The time complexity of previous MST algorithms that used Ω(log2n)\Omega(\log^2 n) memory bits per node was O(n2)O(n^2), and the time for optimal space algorithms was O(nE)O(n|E|).) Inherited from our proof labelling scheme, our self-stabilising MST construction algorithm also has the following two properties: (1) if faults occur after the construction ended, then they are detected by some nodes within O(log2n)O(\log ^2 n) time in synchronous networks, or within O(Δlog3n)O(\Delta \log ^3 n) time in asynchronous ones, and (2) if ff faults occurred, then, within the required detection time above, they are detected within the O(flogn)O(f\log n) locality of each of the faults. We also show how to improve the above two properties, at the expense of some increase in the memory

    Separation of Circulating Tokens

    Full text link
    Self-stabilizing distributed control is often modeled by token abstractions. A system with a single token may implement mutual exclusion; a system with multiple tokens may ensure that immediate neighbors do not simultaneously enjoy a privilege. For a cyber-physical system, tokens may represent physical objects whose movement is controlled. The problem studied in this paper is to ensure that a synchronous system with m circulating tokens has at least d distance between tokens. This problem is first considered in a ring where d is given whilst m and the ring size n are unknown. The protocol solving this problem can be uniform, with all processes running the same program, or it can be non-uniform, with some processes acting only as token relays. The protocol for this first problem is simple, and can be expressed with Petri net formalism. A second problem is to maximize d when m is given, and n is unknown. For the second problem, the paper presents a non-uniform protocol with a single corrective process.Comment: 22 pages, 7 figures, epsf and pstricks in LaTe

    On Time Adaptivity and Stabilization (Algorithm Engineering as a New Paradigm)

    Get PDF

    Distributed Computing with Adaptive Heuristics

    Full text link
    We use ideas from distributed computing to study dynamic environments in which computational nodes, or decision makers, follow adaptive heuristics (Hart 2005), i.e., simple and unsophisticated rules of behavior, e.g., repeatedly "best replying" to others' actions, and minimizing "regret", that have been extensively studied in game theory and economics. We explore when convergence of such simple dynamics to an equilibrium is guaranteed in asynchronous computational environments, where nodes can act at any time. Our research agenda, distributed computing with adaptive heuristics, lies on the borderline of computer science (including distributed computing and learning) and game theory (including game dynamics and adaptive heuristics). We exhibit a general non-termination result for a broad class of heuristics with bounded recall---that is, simple rules of behavior that depend only on recent history of interaction between nodes. We consider implications of our result across a wide variety of interesting and timely applications: game theory, circuit design, social networks, routing and congestion control. We also study the computational and communication complexity of asynchronous dynamics and present some basic observations regarding the effects of asynchrony on no-regret dynamics. We believe that our work opens a new avenue for research in both distributed computing and game theory.Comment: 36 pages, four figures. Expands both technical results and discussion of v1. Revised version will appear in the proceedings of Innovations in Computer Science 201

    Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation

    Full text link
    Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt striving to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will allow to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse synchronization protocol, which can be implemented using standard asynchronous digital logic. Despite the strict limitations introduced by hardware designs, it offers optimal resilience and smaller complexity than all existing protocols.Comment: 52 pages, 7 figures, extended abstract published at SSS 201

    Stabilizing Server-Based Storage in Byzantine Asynchronous Message-Passing Systems

    Full text link
    A stabilizing Byzantine single-writer single-reader (SWSR) regular register, which stabilizes after the first invoked write operation, is first presented. Then, new/old ordering inversions are eliminated by the use of a (bounded) sequence number for writes, obtaining a practically stabilizing SWSR atomic register. A practically stabilizing Byzantine single-writer multi-reader (SWMR) atomic register is then obtained by using several copies of SWSR atomic registers. Finally, bounded time-stamps, with a time-stamp per writer, together with SWMR atomic registers, are used to construct a practically stabilizing Byzantine multi-writer multi-reader (MWMR) atomic register. In a system of nn servers implementing an atomic register, and in addition to transient failures, the constructions tolerate t<n/8 Byzantine servers if communication is asynchronous, and t<n/3 Byzantine servers if it is synchronous. The noteworthy feature of the proposed algorithms is that (to our knowledge) these are the first that build an atomic read/write storage on top of asynchronous servers prone to transient failures, and where up to t of them can be Byzantine
    corecore