1,542 research outputs found

    Dynamic FTSS in Asynchronous Systems: the Case of Unison

    Full text link
    Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient fault hit the system. FTSS protocols combine the best of both worlds since they are simultaneously fault-tolerant and self-stabilizing. To date, FTSS solutions either consider static (i.e. fixed point) tasks, or assume synchronous scheduling of the system components. In this paper, we present the first study of dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem as neighbors must maintain digital clocks at most one time unit away from each other, and increment their own clock value infinitely often. We present many impossibility results for this difficult problem and propose a FTSS solution when the problem is solvable that exhibits optimal fault containment

    Universal Loop-Free Super-Stabilization

    Get PDF
    We propose an univesal scheme to design loop-free and super-stabilizing protocols for constructing spanning trees optimizing any tree metrics (not only those that are isomorphic to a shortest path tree). Our scheme combines a novel super-stabilizing loop-free BFS with an existing self-stabilizing spanning tree that optimizes a given metric. The composition result preserves the best properties of both worlds: super-stabilization, loop-freedom, and optimization of the original metric without any stabilization time penalty. As case study we apply our composition mechanism to two well known metric-dependent spanning trees: the maximum-flow tree and the minimum degree spanning tree

    Phase Clocks for Transient Fault Repair

    Full text link
    Phase clocks are synchronization tools that implement a form of logical time in distributed systems. For systems tolerating transient faults by self-repair of damaged data, phase clocks can enable reasoning about the progress of distributed repair procedures. This paper presents a phase clock algorithm suited to the model of transient memory faults in asynchronous systems with read/write registers. The algorithm is self-stabilizing and guarantees accuracy of phase clocks within O(k) time following an initial state that is k-faulty. Composition theorems show how the algorithm can be used for the timing of distributed procedures that repair system outputs.Comment: 22 pages, LaTe

    Metastability-Containing Circuits

    Get PDF
    In digital circuits, metastability can cause deteriorated signals that neither are logical 0 or logical 1, breaking the abstraction of Boolean logic. Unfortunately, any way of reading a signal from an unsynchronized clock domain or performing an analog-to-digital conversion incurs the risk of a metastable upset; no digital circuit can deterministically avoid, resolve, or detect metastability (Marino, 1981). Synchronizers, the only traditional countermeasure, exponentially decrease the odds of maintained metastability over time. Trading synchronization delay for an increased probability to resolve metastability to logical 0 or 1, they do not guarantee success. We propose a fundamentally different approach: It is possible to contain metastability by fine-grained logical masking so that it cannot infect the entire circuit. This technique guarantees a limited degree of metastability in---and uncertainty about---the output. At the heart of our approach lies a time- and value-discrete model for metastability in synchronous clocked digital circuits. Metastability is propagated in a worst-case fashion, allowing to derive deterministic guarantees, without and unlike synchronizers. The proposed model permits positive results and passes the test of reproducing Marino's impossibility results. We fully classify which functions can be computed by circuits with standard registers. Regarding masking registers, we show that they become computationally strictly more powerful with each clock cycle, resulting in a non-trivial hierarchy of computable functions

    self-stabilising

    Get PDF
    We revisit the approach to Byzantine fault-tolerant clock synchronization based on approximate agreement introduced by Lynch and Welch. Our contribution is threefold: (1) We provide a slightly refined variant of the algorithm yielding improved bounds on the skew that can be achieved and the sustainable frequency offsets. (2) We show how to extend the technique to also synchronize clock rates. This permits less frequent communication without significant loss of precision, provided that clock rates change sufficiently slowly. (3) We present a coupling scheme that allows to make these algorithms self-stabilizing while preserving their high precision. The scheme utilizes a low-precision, but self-stabilizing algorithm for the purpose of recovery

    Asynchronous neighborhood task synchronization

    Full text link
    Faults are likely to occur in distributed systems. The motivation for designing self-stabilizing system is to be able to automatically recover from a faulty state. As per Dijkstra\u27s definition, a system is self-stabilizing if it converges to a desired state from an arbitrary state in a finite number of steps. The paradigm of self-stabilization is considered to be the most unified approach to designing fault-tolerant systems. Any type of faults, e.g., transient, process crashes and restart, link failures and recoveries, and byzantine faults, can be handled by a self-stabilizing system; Many applications in distributed systems involve multiple phases. Solving these applications require some degree of synchronization of phases. In this thesis research, we introduce a new problem, called asynchronous neighborhood task synchronization ( NTS ). In this problem, processes execute infinite instances of tasks, where a task consists of a set of steps. There are several requirements for this problem. Simultaneous execution of steps by the neighbors is allowed only if the steps are different. Every neighborhood is synchronized in the sense that all neighboring processes execute the same instance of a task. Although the NTS problem is applicable in nonfaulty environments, it is more challenging to solve this problem considering various types of faults. In this research, we will present a self-stabilizing solution to the NTS problem. The proposed solution is space optimal, fault containing, fully localized, and fully distributed. One of the most desirable properties of our algorithm is that it works under any (including unfair) daemon. We will discuss various applications of the NTS problem

    Self-stabilising Byzantine Clock Synchronisation is Almost as Easy as Consensus

    Get PDF
    We give fault-tolerant algorithms for establishing synchrony in distributed systems in which each of the nn nodes has its own clock. Our algorithms operate in a very strong fault model: we require self-stabilisation, i.e., the initial state of the system may be arbitrary, and there can be up to f<n/3f<n/3 ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an arbitrary manner. Furthermore, we assume that the local clocks of the nodes may progress at different speeds (clock drift) and communication has bounded delay. In this model, we study the pulse synchronisation problem, where the task is to guarantee that eventually all correct nodes generate well-separated local pulse events (i.e., unlabelled logical clock ticks) in a synchronised manner. Compared to prior work, we achieve exponential improvements in stabilisation time and the number of communicated bits, and give the first sublinear-time algorithm for the problem: - In the deterministic setting, the state-of-the-art solutions stabilise in time Θ(f)\Theta(f) and have each node broadcast Θ(flog⁥f)\Theta(f \log f) bits per time unit. We exponentially reduce the number of bits broadcasted per time unit to Θ(log⁥f)\Theta(\log f) while retaining the same stabilisation time. - In the randomised setting, the state-of-the-art solutions stabilise in time Θ(f)\Theta(f) and have each node broadcast O(1)O(1) bits per time unit. We exponentially reduce the stabilisation time to log⁥O(1)f\log^{O(1)} f while each node broadcasts log⁥O(1)f\log^{O(1)} f bits per time unit. These results are obtained by means of a recursive approach reducing the above task of self-stabilising pulse synchronisation in the bounded-delay model to non-self-stabilising binary consensus in the synchronous model. In general, our approach introduces at most logarithmic overheads in terms of stabilisation time and broadcasted bits over the underlying consensus routine.Comment: 54 pages. To appear in JACM, preliminary version of this work has appeared in DISC 201

    Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation

    Full text link
    Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt striving to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will allow to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse synchronization protocol, which can be implemented using standard asynchronous digital logic. Despite the strict limitations introduced by hardware designs, it offers optimal resilience and smaller complexity than all existing protocols.Comment: 52 pages, 7 figures, extended abstract published at SSS 201
    • 

    corecore