25 research outputs found

Self-stabilising Byzantine Clock Synchronisation is Almost as Easy as Consensus

Author: Lenzen Christoph
Rybicki Joel
Publication date: 01/01/2019

We give fault-tolerant algorithms for establishing synchrony in distributed systems in which each of the $n$ nodes has its own clock. Our algorithms operate in a very strong fault model: we require self-stabilisation, i.e., the initial state of the system may be arbitrary, and there can be up to $f<n/3$ ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an arbitrary manner. Furthermore, we assume that the local clocks of the nodes may progress at different speeds (clock drift) and communication has bounded delay. In this model, we study the pulse synchronisation problem, where the task is to guarantee that eventually all correct nodes generate well-separated local pulse events (i.e., unlabelled logical clock ticks) in a synchronised manner. Compared to prior work, we achieve exponential improvements in stabilisation time and the number of communicated bits, and give the first sublinear-time algorithm for the problem:
- In the deterministic setting, the state-of-the-art solutions stabilise in time $\Theta(f)$ and have each node broadcast $\Theta(f \log f)$ bits per time unit. We exponentially reduce the number of bits broadcasted per time unit to $\Theta(\log f)$ while retaining the same stabilisation time.
- In the randomised setting, the state-of-the-art solutions stabilise in time $\Theta(f)$ and have each node broadcast $O(1)$ bits per time unit. We exponentially reduce the stabilisation time to $\log^{O(1)} f$ while each node broadcasts $\log^{O(1)} f$ bits per time unit.
These results are obtained by means of a recursive approach reducing the above task of self-stabilising pulse synchronisation in the bounded-delay model to non-self-stabilising binary consensus in the synchronous model. In general, our approach introduces at most logarithmic overheads in terms of stabilisation time and broadcasted bits over the underlying consensus routine.
Comment: 54 pages. To appear in JACM, preliminary version of this work has appeared in DISC 201

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

MPG.PuRe

self-stabilizing

Author: Lenzen C.
Rybicki J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Consider a fully-connected synchronous distributed system consisting of n nodes, where up to f nodes may be faulty and every node starts in an arbitrary initial state. In the synchronous C-counting problem, all nodes need to eventually agree on a counter that is increased by one modulo C in each round for given C>1. In the self-stabilising firing squad problem, the task is to eventually guarantee that all non-faulty nodes have simultaneous responses to external inputs: if a subset of the correct nodes receive an external “go” signal as input, then all correct nodes should agree on a round (in the not-too-distant future) in which to jointly output a “fire” signal. Moreover, no node should generate a “fire” signal without some correct node having previously received a “go” signal as input. We present a framework reducing both tasks to binary consensus at very small cost. For example, we obtain a deterministic algorithm for self-stabilising Byzantine firing squads with optimal resilience f<n/3, asymptotically optimal stabilisation and response time O(f), and message size O(log f). As our framework does not restrict the type of consensus routines used, we also obtain efficient randomised solutions
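The counting task described in this abstract can be illustrated with a small sketch. It assumes a fault-free system (f = 0) in which every node sees every counter each round, so it is not the authors' Byzantine-tolerant construction; the names `step`, `run` and `stabilisation_round` are hypothetical and exist only for this illustration.

```python
def step(counters, C):
    """One synchronous round: every node sees all counters (fully connected),
    adopts the minimum, and increments it modulo C.
    Assumption: no faulty nodes, so all nodes see the same values."""
    agreed = min(counters)
    return [(agreed + 1) % C] * len(counters)


def run(initial, C, rounds):
    """Return the list of system states, starting from an arbitrary initial state."""
    history = [list(initial)]
    for _ in range(rounds):
        history.append(step(history[-1], C))
    return history


def stabilisation_round(history, C):
    """First round from which all nodes agree and the common counter
    increases by one modulo C in every later round; None if there is none."""
    for start in range(len(history)):
        agree = all(len(set(state)) == 1 for state in history[start:])
        counts = all(
            history[r + 1][0] == (history[r][0] + 1) % C
            for r in range(start, len(history) - 1)
        )
        if agree and counts:
            return start
    return None


if __name__ == "__main__":
    trace = run([3, 0, 2], C=4, rounds=4)
    print(trace)
    print("stabilised from round", stabilisation_round(trace, 4))
```

The checker encodes only the problem statement (agreement plus increment modulo C); tolerating up to f < n/3 Byzantine nodes requires the consensus-based framework the abstract refers to.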

IST Austria: PubRep (Institute of Science and Technology)

MPG.PuRe

Near-optimal self-stabilising counting and firing squads

Author: Lenzen Christoph
Rybicki Joel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/01/2017

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

MPG.PuRe

Constructing fail-controlled nodes for distributed systems: a software approach

Author: Brasileiro Francisco Vilar
Publication venue: Newcastle University
Publication date: 01/01/1995

PhD Thesis
Designing and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are fail-controlled, i.e. present a well-defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infrastructure, that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well-known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by applying suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware-based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software-based nodes do not present these problems; on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software-based replicated nodes.
In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software-based fail-silent node, along with its efficient design, implementation and performance evaluation.
The Brazilian National Research Council (CNPq/Brasil)

Newcastle University eTheses

TRIX: Low-Skew Pulse Propagation for Fault-Tolerant Hardware

Author: Lenzen Christoph
Wiederhake Ben
Publication date: 01/01/2020

The vast majority of hardware architectures use a carefully timed reference signal to clock their computational logic. However, standard distribution solutions are not fault-tolerant. In this work, we present a simple grid structure as a more reliable clock propagation method and study it by means of simulation experiments. Fault-tolerance is achieved by forwarding clock pulses on arrival of the second of three incoming signals from the previous layer. A key question is how well neighboring grid nodes are synchronized, even without faults. Analyzing the clock skew under typical-case conditions is highly challenging. Because the forwarding mechanism involves taking the median, standard probabilistic tools fail, even when modeling link delays just by unbiased coin flips. Our statistical approach provides substantial evidence that this system performs surprisingly well. Specifically, in an "infinitely wide" grid of height $H$, the delay at a pre-selected node exhibits a standard deviation of $O(H^{1/4})$ ($\approx 2.7$ link delay uncertainties for $H=2000$) and skew between adjacent nodes of $o(\log \log H)$ ($\approx 0.77$ link delay uncertainties for $H=2000$). We conclude that the proposed system is a very promising clock distribution method. This leads to the open problem of a stochastic explanation of the tight concentration of delays and skews. More generally, we believe that understanding our very simple abstraction of the system is of mathematical interest in its own right.
Comment: 16 pages, 11 figures
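The forwarding rule in this abstract (fire on the second of three incoming pulses, i.e. the median arrival time) can be sketched as follows. The wrap-around grid, the choice of predecessors `i-1, i, i+1`, and the delay parameters `d` and `u` are assumptions made for this illustration only; the abstract does not specify them.

```python
import random


def forward_time(arrivals):
    """Time at which a node forwards the pulse: the arrival of the second
    of its three incoming signals, i.e. the median arrival time."""
    assert len(arrivals) == 3
    return sorted(arrivals)[1]


def next_layer(prev, rng, d=1.0, u=1.0):
    """Pulse times of the next layer, given the pulse times of the previous one.
    Assumptions (hypothetical): node i listens to nodes i-1, i, i+1 of the
    previous layer with wrap-around, and each link delay is d or d + u,
    chosen by an unbiased coin flip."""
    w = len(prev)
    layer = []
    for i in range(w):
        arrivals = [
            prev[(i + k) % w] + d + u * rng.randint(0, 1) for k in (-1, 0, 1)
        ]
        layer.append(forward_time(arrivals))
    return layer


if __name__ == "__main__":
    rng = random.Random(0)
    times = [0.0] * 8
    for _ in range(5):
        times = next_layer(times, rng)
    print(times)
```

Because the median ignores the earliest and the latest of the three signals, a single predecessor that fires too early, too late, or never (arrival time `float("inf")`) cannot by itself determine when the node forwards.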

arXiv.org e-Print Archive

MPG.PuRe

TRIX: Low-Skew Pulse Propagation for Fault-Tolerant Hardware

Author: Lenzen C.
Wiederhake B.
Publication date: 01/01/2020

MPG.PuRe

Fault Tolerant Gradient Clock Synchronization

Author: Bund Johannes
Lenzen Christoph
Rosenbaum Will
Publication date: 01/01/2019

Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems and the dependence of local and global worst-case skews (i.e., maximum clock difference between neighbors and arbitrary pairs of nodes, respectively) on the diameter of fault-free systems. However, so far nothing non-trivial is known about the local skew that can be achieved in topologies that are not fully connected even under a single Byzantine fault. Put simply, in this work we show that the most powerful known techniques for fault-tolerant and gradient clock synchronization are compatible, in the sense that the best of both worlds can be achieved simultaneously. Concretely, we combine the Lynch-Welch algorithm [Welch1988] for synchronizing a clique of $n$ nodes despite up to $f<n/3$ Byzantine faults with the gradient clock synchronization (GCS) algorithm by Lenzen et al. [Lenzen2010] in order to render the latter resilient to faults. As this is not possible on general graphs, we augment an input graph $\mathcal{G}$ by replacing each node by $3f+1$ fully connected copies, which execute an instance of the Lynch-Welch algorithm. We then interpret these clusters as supernodes executing the GCS algorithm, where for each cluster its correct nodes' Lynch-Welch clocks provide estimates of the logical clock of the supernode in the GCS algorithm. By connecting clusters corresponding to neighbors in $\mathcal{G}$ in a fully bipartite manner, supernodes can inform each other about (estimates of) their logical clock values. This way, we achieve asymptotically optimal local skew, granted that no cluster contains more than $f$ faulty nodes, at factor $O(f)$ and $O(f^2)$ overheads in terms of nodes and edges, respectively. Note that tolerating $f$ faulty neighbors trivially requires degree larger than $f$, so this is asymptotically optimal as well.
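The graph augmentation described in this abstract can be sketched as below, to make the $O(f)$ node and $O(f^2)$ edge overheads concrete. The function name `augment` and the representation of copies as `(node, index)` pairs are hypothetical choices for this illustration; the clock synchronization algorithms themselves are not modelled.

```python
from itertools import combinations, product


def augment(nodes, edges, f):
    """Replace each node by a cluster of 3f + 1 fully connected copies and
    connect clusters of neighbouring nodes in a fully bipartite manner.
    Returns (new_nodes, new_edges), with edges as frozensets of two copies."""
    k = 3 * f + 1
    copies = {v: [(v, i) for i in range(k)] for v in nodes}
    new_nodes = [c for v in nodes for c in copies[v]]
    new_edges = set()
    # Intra-cluster edges: each cluster is a clique on k copies.
    for v in nodes:
        for a, b in combinations(copies[v], 2):
            new_edges.add(frozenset((a, b)))
    # Inter-cluster edges: complete bipartite graph per original edge.
    for v, w in edges:
        for a, b in product(copies[v], copies[w]):
            new_edges.add(frozenset((a, b)))
    return new_nodes, new_edges


if __name__ == "__main__":
    # A path on three nodes, tolerating f = 1 fault per cluster.
    new_nodes, new_edges = augment(["a", "b", "c"], [("a", "b"), ("b", "c")], f=1)
    print(len(new_nodes), len(new_edges))
```

For an input graph with $n$ nodes and $m$ edges this yields $n(3f+1)$ nodes and $n\binom{3f+1}{2} + m(3f+1)^2$ edges, which matches the stated overhead factors.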

arXiv.org e-Print Archive

MPG.PuRe

Enhanced Phase Clocks, Population Protocols, and Fast Space Optimal Leader Election

Author: Gasieniec Leszek
Stachowiak Grzegorz
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/11/2020

University of Liverpool Repository

Self-stabilising Byzantine clock synchronisation is almost as easy as consensus

Author: Lenzen Christoph
Rybicki Joel Patrick
Publication date: 01/01/2017

We give fault-tolerant algorithms for establishing synchrony in distributed systems in which each of the n nodes has its own clock. Our algorithms operate in a very strong fault model: we require self-stabilisation, i.e., the initial state of the system may be arbitrary, and there can be up to f<n/3 ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an arbitrary manner. Furthermore, we assume that the local clocks of the nodes may progress at different speeds (clock drift) and communication has bounded delay. In this model, we study the pulse synchronisation problem, where the task is to guarantee that eventually all correct nodes generate well-separated local pulse events (i.e., unlabelled logical clock ticks) in a synchronised manner. Compared to prior work, we achieve exponential improvements in stabilisation time and the number of communicated bits, and give the first sublinear-time algorithm for the problem:
- In the deterministic setting, the state-of-the-art solutions stabilise in time Theta(f) and have each node broadcast Theta(f log f) bits per time unit. We exponentially reduce the number of bits broadcasted per time unit to Theta(log f) while retaining the same stabilisation time.
- In the randomised setting, the state-of-the-art solutions stabilise in time Theta(f) and have each node broadcast O(1) bits per time unit. We exponentially reduce the stabilisation time to polylog f while each node broadcasts polylog f bits per time unit.
These results are obtained by means of a recursive approach reducing the above task of self-stabilising pulse synchronisation in the bounded-delay model to non-self-stabilising binary consensus in the synchronous model. In general, our approach introduces at most logarithmic overheads in terms of stabilisation time and broadcasted bits over the underlying consensus routine.
Peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

MPG.PuRe

Tools and Algorithms for the Construction and Analysis of Systems

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021

This book is Open Access under a CC BY licence. The LNCS 11427 and 11428 proceedings set constitutes the proceedings of the 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019. The total of 42 full and 8 short tool demo papers presented in these volumes was carefully reviewed and selected from 164 submissions. The papers are organized in topical sections as follows: Part I: SAT and SMT, SAT solving and theorem proving; verification and analysis; model checking; tool demo; and machine learning. Part II: concurrent and distributed systems; monitoring and runtime verification; hybrid and stochastic systems; synthesis; symbolic verification; and safety and fault-tolerant systems

Directory of Open Access Books (DOAB)