13 research outputs found

    Towards Optimal Synchronous Counting

    Full text link
    Consider a complete communication network of nn nodes, where the nodes receive a common clock pulse. We study the synchronous cc-counting problem: given any starting state and up to ff faulty nodes with arbitrary behaviour, the task is to eventually have all correct nodes counting modulo cc in agreement. Thus, we are considering algorithms that are self-stabilizing despite Byzantine failures. In this work, we give new algorithms for the synchronous counting problem that (1) are deterministic, (2) have linear stabilisation time in ff, (3) use a small number of states, and (4) achieve almost-optimal resilience. Prior algorithms either resort to randomisation, use a large number of states, or have poor resilience. In particular, we achieve an exponential improvement in the space complexity of deterministic algorithms, while still achieving linear stabilisation time and almost-linear resilience.Comment: 17 pages, 2 figure

    LIPIcs

    Get PDF
    Fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. Nonetheless there is surprisingly little language and verification support to build distributed systems based on fault-tolerant algorithms. In this paper, we present some of the challenges that a designer has to overcome to implement a fault-tolerant distributed system. Then we review different models that have been proposed to reason about distributed algorithms and sketch how such a model can form the basis for a domain-specific programming language. Adopting a high-level programming model can simplify the programmer's life and make the code amenable to automated verification, while still compiling to efficiently executable code. We conclude by summarizing the current status of an ongoing language design and implementation project that is based on this idea

    Bounding the Impact of Unbounded Attacks in Stabilization

    Get PDF
    Self-stabilization is a versatile approach to fault-tolerance since it permits a distributed system to recover from any transient fault that arbitrarily corrupts the contents of all memories in the system. Byzantine tolerance is an attractive feature of distributed systems that permits to cope with arbitrary malicious behaviors. Combining these two properties proved difficult: it is impossible to contain the spatial impact of Byzantine nodes in a self-stabilizing context for global tasks such as tree orientation and tree construction. We present and illustrate a new concept of Byzantine containment in stabilization. Our property, called Strong Stabilization enables to contain the impact of Byzantine nodes if they actually perform too many Byzantine actions. We derive impossibility results for strong stabilization and present strongly stabilizing protocols for tree orientation and tree construction that are optimal with respect to the number of Byzantine nodes that can be tolerated in a self-stabilizing context

    Fast self-stabilizing byzantine tolerant digital clock synchronization

    Get PDF
    Consider a distributed network in which up to a third of the nodes may be Byzantine, and in which the non-faulty nodes may be subject to transient faults that alter their memory in an arbitrary fashion. Within the context of this model, we are interested in the digital clock synchronization problem; which consists of agreeing on bounded integer counters, and increasing these counters regularly. It has been postulated in the past that synchronization cannot be solved in a Byzantine tolerant and self-stabilizing manner. The first solution to this problem had an expected exponential convergence time. Later, a deterministic solution was published with linear convergence time, which is optimal for deterministic solutions. In the current paper we achieve an expected constant convergence time. We thus obtain the optimal probabilistic solution, both in terms of convergence time and in terms of resilience to Byzantine adversaries

    Synchronous Counting and Computational Algorithm Design

    Full text link
    Consider a complete communication network on nn nodes, each of which is a state machine. In synchronous 2-counting, the nodes receive a common clock pulse and they have to agree on which pulses are "odd" and which are "even". We require that the solution is self-stabilising (reaching the correct operation from any initial state) and it tolerates ff Byzantine failures (nodes that send arbitrary misinformation). Prior algorithms are expensive to implement in hardware: they require a source of random bits or a large number of states. This work consists of two parts. In the first part, we use computational techniques (often known as synthesis) to construct very compact deterministic algorithms for the first non-trivial case of f=1f = 1. While no algorithm exists for n<4n < 4, we show that as few as 3 states per node are sufficient for all values n4n \ge 4. Moreover, the problem cannot be solved with only 2 states per node for n=4n = 4, but there is a 2-state solution for all values n6n \ge 6. In the second part, we develop and compare two different approaches for synthesising synchronous counting algorithms. Both approaches are based on casting the synthesis problem as a propositional satisfiability (SAT) problem and employing modern SAT-solvers. The difference lies in how to solve the SAT problem: either in a direct fashion, or incrementally within a counter-example guided abstraction refinement loop. Empirical results suggest that the former technique is more efficient if we want to synthesise time-optimal algorithms, while the latter technique discovers non-optimal algorithms more quickly.Comment: 35 pages, extended and revised versio

    Efficient Counting with Optimal Resilience

    No full text
    In the synchronous cc-counting problem, we are given a synchronous system of nn nodes, where up to ff of the nodes may be Byzantine, that is, have arbitrary faulty behaviour. The task is to have all of the correct nodes count modulo cc in unison in a self-stabilising manner: regardless of the initial state of the system and the faulty nodes' behavior, eventually rounds are consistently labelled by a counter modulo cc at all correct nodes. We provide a deterministic solution with resilience f<n/3f<n/3 that stabilises in O(f)O(f) rounds and every correct node broadcasts O(log2f)O(\log^2 f) bits per round. We build and improve on a recent result offering stabilisation time O(f)O(f) and communication complexity O(log2f/loglogf)O(\log^2 f /\log \log f) but with sub-optimal resilience f=n1o(1)f = n^{1-o(1)} (PODC 2015). Our new algorithm has optimal resilience, asymptotically optimal stabilisation time, and low communication complexity. Finally, we modify the algorithm to guarantee that after stabilisation very little communication occurs. In particular, for optimal resilience and polynomial counter size c=nO(1)c=n^{O(1)}, the algorithm broadcasts only O(1)O(1) bits per node every Θ(n)\Theta(n) rounds without affecting the other properties of the algorithm; communication-wise this is asymptotically optimal

    Minimizing Message Size in Stochastic Communication Patterns: Fast Self-Stabilizing Protocols with 3 bits

    Get PDF
    This paper considers the basic PULL\mathcal{PULL} model of communication, in which in each round, each agent extracts information from few randomly chosen agents. We seek to identify the smallest amount of information revealed in each interaction (message size) that nevertheless allows for efficient and robust computations of fundamental information dissemination tasks. We focus on the Majority Bit Dissemination problem that considers a population of nn agents, with a designated subset of source agents. Each source agent holds an input bit and each agent holds an output bit. The goal is to let all agents converge their output bits on the most frequent input bit of the sources (the majority bit). Note that the particular case of a single source agent corresponds to the classical problem of Broadcast. We concentrate on the severe fault-tolerant context of self-stabilization, in which a correct configuration must be reached eventually, despite all agents starting the execution with arbitrary initial states. We first design a general compiler which can essentially transform any self-stabilizing algorithm with a certain property that uses \ell-bits messages to one that uses only log\log \ell-bits messages, while paying only a small penalty in the running time. By applying this compiler recursively we then obtain a self-stabilizing Clock Synchronization protocol, in which agents synchronize their clocks modulo some given integer TT, within O~(lognlogT)\tilde O(\log n\log T) rounds w.h.p., and using messages that contain 33 bits only. We then employ the new Clock Synchronization tool to obtain a self-stabilizing Majority Bit Dissemination protocol which converges in O~(logn)\tilde O(\log n) time, w.h.p., on every initial configuration, provided that the ratio of sources supporting the minority opinion is bounded away from half. Moreover, this protocol also uses only 3 bits per interaction.Comment: 28 pages, 4 figure

    Dynamic FTSS in Asynchronous Systems: the Case of Unison

    Full text link
    Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient fault hit the system. FTSS protocols combine the best of both worlds since they are simultaneously fault-tolerant and self-stabilizing. To date, FTSS solutions either consider static (i.e. fixed point) tasks, or assume synchronous scheduling of the system components. In this paper, we present the first study of dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem as neighbors must maintain digital clocks at most one time unit away from each other, and increment their own clock value infinitely often. We present many impossibility results for this difficult problem and propose a FTSS solution when the problem is solvable that exhibits optimal fault containment

    Synchronous Counting and Computational Algorithm Design

    No full text
    Consider a complete communication network on nn nodes, each of which is a state machine. In synchronous 2-counting, the nodes receive a common clock pulse and they have to agree on which pulses are "odd" and which are "even". We require that the solution is self-stabilising (reaching the correct operation from any initial state) and it tolerates ff Byzantine failures (nodes that send arbitrary misinformation). Prior algorithms are expensive to implement in hardware: they require a source of random bits or a large number of states. This work consists of two parts. In the first part, we use computational techniques (often known as synthesis) to construct very compact deterministic algorithms for the first non-trivial case of f=1f = 1. While no algorithm exists for n<4n < 4, we show that as few as 3 states per node are sufficient for all values n4n \ge 4. Moreover, the problem cannot be solved with only 2 states per node for n=4n = 4, but there is a 2-state solution for all values n6n \ge 6. In the second part, we develop and compare two different approaches for synthesising synchronous counting algorithms. Both approaches are based on casting the synthesis problem as a propositional satisfiability (SAT) problem and employing modern SAT-solvers. The difference lies in how to solve the SAT problem: either in a direct fashion, or incrementally within a counter-example guided abstraction refinement loop. Empirical results suggest that the former technique is more efficient if we want to synthesise time-optimal algorithms, while the latter technique discovers non-optimal algorithms more quickly

    Fault tolerant pulse synchronization

    Get PDF
    Pulse synchronization is the evolution of spontaneous firing action across a network of sensor nodes. In the pulse synchronization model all nodes across a network produce a pulse, or "fire", at regular intervals even without access to a shared global time. Previous researchers have proposed the Reachback Firefly algorithm for pulse synchronization, in which nodes react to the firings of other nodes by changing their period. We propose an extension to this algorithm for tolerating arbitrary or Byzantine faults of nodes. Our algorithm queues up all the firings heard in the current cycle and discards outliers at the end of the cycle. An adjustment is computed with the remaining values and used as a starting point of the next cycle. Through simulation we validate the performance of our algorithm and study the overhead in terms of convergence time and periodicity. The simulation considers two specific kinds of Byzantine faults, the No Jump model where faulty nodes follow their own firing cycle without reacting to firings heard from other nodes and the Random Jump model where faulty nodes fire at any random time in their cycle
    corecore