Dynamic FTSS in Asynchronous Systems: the Case of Unison
Distributed fault-tolerance can mask the effect of a limited number of
permanent faults, while self-stabilization provides forward recovery after an
arbitrary number of transient faults hit the system. FTSS protocols combine the
best of both worlds since they are simultaneously fault-tolerant and
self-stabilizing. To date, FTSS solutions either consider static (i.e. fixed
point) tasks, or assume synchronous scheduling of the system components. In
this paper, we present the first study of dynamic tasks in asynchronous
systems, considering the unison problem as a benchmark. Unison can be seen as a
local clock synchronization problem as neighbors must maintain digital clocks
at most one time unit away from each other, and increment their own clock value
infinitely often. We present many impossibility results for this difficult
problem and, when the problem is solvable, propose an FTSS solution that exhibits
optimal fault containment.
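The unison condition described in this abstract can be made concrete with a small check. The following is a minimal sketch, assuming unbounded integer clocks and an undirected graph given as an edge list; the function names and the increment rule (a node ticks only when no neighbor is behind it) are illustrative assumptions based on the classic fault-free formulation, not the FTSS protocol of the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def neighbours(node: str, edges: List[Edge]) -> List[str]:
    """Return all nodes adjacent to `node` in an undirected edge list."""
    return [v for u, v in edges if u == node] + [u for u, v in edges if v == node]


def unison_safe(clocks: Dict[str, int], edges: List[Edge]) -> bool:
    """Safety: neighbouring clocks differ by at most one time unit."""
    return all(abs(clocks[u] - clocks[v]) <= 1 for u, v in edges)


def may_increment(node: str, clocks: Dict[str, int], edges: List[Edge]) -> bool:
    """Illustrative rule (assumption): tick only if no neighbour is behind."""
    return all(clocks[n] >= clocks[node] for n in neighbours(node, edges))


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c")]
    clocks = {"a": 3, "b": 4, "c": 4}
    print(unison_safe(clocks, edges))         # safety holds on this path graph
    print(may_increment("a", clocks, edges))  # "a" is the slowest, so it may tick
    print(may_increment("b", clocks, edges))  # "b" must wait for "a"
```

Under this rule, a tick taken from a safe configuration keeps the configuration safe, because every neighbor of the ticking node holds either the old or the new clock value.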
Universal Loop-Free Super-Stabilization
We propose a universal scheme to design loop-free and super-stabilizing
protocols for constructing spanning trees optimizing any tree metric (not only
those that are isomorphic to a shortest path tree). Our scheme combines a novel
super-stabilizing loop-free BFS with an existing self-stabilizing spanning tree
that optimizes a given metric. The composition result preserves the best
properties of both worlds: super-stabilization, loop-freedom, and optimization
of the original metric without any stabilization time penalty. As a case study, we
apply our composition mechanism to two well-known metric-dependent spanning
trees: the maximum-flow tree and the minimum degree spanning tree.
Phase Clocks for Transient Fault Repair
Phase clocks are synchronization tools that implement a form of logical time
in distributed systems. For systems tolerating transient faults by self-repair
of damaged data, phase clocks can enable reasoning about the progress of
distributed repair procedures. This paper presents a phase clock algorithm
suited to the model of transient memory faults in asynchronous systems with
read/write registers. The algorithm is self-stabilizing and guarantees accuracy
of phase clocks within O(k) time following an initial state that is k-faulty.
Composition theorems show how the algorithm can be used for the timing of
distributed procedures that repair system outputs. Comment: 22 pages, LaTe
Metastability-Containing Circuits
In digital circuits, metastability can cause deteriorated signals that
are neither logical 0 nor logical 1, breaking the abstraction of Boolean logic.
Unfortunately, any way of reading a signal from an unsynchronized clock domain
or performing an analog-to-digital conversion incurs the risk of a metastable
upset; no digital circuit can deterministically avoid, resolve, or detect
metastability (Marino, 1981). Synchronizers, the only traditional
countermeasure, exponentially decrease the odds of maintained metastability
over time. Trading synchronization delay for an increased probability to
resolve metastability to logical 0 or 1, they do not guarantee success.
We propose a fundamentally different approach: It is possible to contain
metastability by fine-grained logical masking so that it cannot infect the
entire circuit. This technique guarantees a limited degree of metastability
in---and uncertainty about---the output.
At the heart of our approach lies a time- and value-discrete model for
metastability in synchronous clocked digital circuits. Metastability is
propagated in a worst-case fashion, allowing us to derive deterministic
guarantees without synchronizers, which cannot offer them. The proposed model permits
positive results and passes the test of reproducing Marino's impossibility
results. We fully classify which functions can be computed by circuits with
standard registers. Regarding masking registers, we show that they become
computationally strictly more powerful with each clock cycle, resulting in a
non-trivial hierarchy of computable functions.
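Worst-case propagation and logical masking can be illustrated with a three-valued signal domain. The sketch below is an assumption-laden toy, not the model of the paper: it represents a metastable signal by the symbol `M` and lifts a Boolean function so that its output is `M` whenever different resolutions of the metastable inputs could produce different results. The names `M`, `closure`, `and_m`, `or_m`, `not_m`, and `naive_mux` are hypothetical.

```python
from itertools import product

M = "M"  # hypothetical marker for a metastable signal


def resolutions(bits):
    """Enumerate every way of resolving each M input to 0 or 1."""
    choices = [(0, 1) if b == M else (b,) for b in bits]
    return product(*choices)


def closure(f):
    """Lift a Boolean function to {0, 1, M} with worst-case propagation."""
    def lifted(*bits):
        outputs = {f(*r) for r in resolutions(bits)}
        # A stable output is guaranteed only if all resolutions agree.
        return outputs.pop() if len(outputs) == 1 else M
    return lifted


and_m = closure(lambda a, b: a & b)
or_m = closure(lambda a, b: a | b)
not_m = closure(lambda a: 1 - a)


def naive_mux(a, b, s):
    """Gate-level multiplexer: (a AND NOT s) OR (b AND s)."""
    return or_m(and_m(a, not_m(s)), and_m(b, s))


# Function-level lifting of the same multiplexer, for comparison.
ideal_mux = closure(lambda a, b, s: b if s else a)

if __name__ == "__main__":
    print(and_m(0, M))         # 0: the stable 0 masks the metastable input
    print(and_m(1, M))         # M: the output depends on how M resolves
    print(naive_mux(1, 1, M))  # M: gate by gate, metastability leaks through
    print(ideal_mux(1, 1, M))  # 1: both resolutions of s select a 1
```

The last two lines show why masking is a property of the circuit and not only of the function it computes: the same multiplexer function can either contain or propagate a metastable select input, depending on how it is built from gates.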
self-stabilising
We revisit the approach to Byzantine fault-tolerant clock synchronization based on approximate agreement introduced by Lynch and Welch. Our contribution is threefold: (1) We provide a slightly refined variant of the algorithm yielding improved bounds on the skew that can be achieved and the sustainable frequency offsets. (2) We show how to extend the technique to also synchronize clock rates. This permits less frequent communication without significant loss of precision, provided that clock rates change sufficiently slowly. (3) We present a coupling scheme that makes these algorithms self-stabilizing while preserving their high precision. The scheme utilizes a low-precision but self-stabilizing algorithm for the purpose of recovery.
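The core step of approximate agreement, on which this line of work builds, can be sketched as follows. This is a minimal illustration assuming each node has already collected one clock-offset estimate per node and that at most `f` of them come from faulty nodes; the function name and the input format are hypothetical, and the refinements described in the abstract are not modeled.

```python
from typing import Sequence


def approx_agreement_step(values: Sequence[float], f: int) -> float:
    """Discard the f lowest and f highest values, return the midpoint of the rest.

    With at most f faulty values among the inputs, every surviving value lies
    within the range spanned by the correct values, so the result does too.
    """
    if f < 0 or len(values) <= 2 * f:
        raise ValueError("need more than 2f values to tolerate f faults")
    trimmed = sorted(values)[f:len(values) - f]
    return (trimmed[0] + trimmed[-1]) / 2


if __name__ == "__main__":
    # Four plausible offsets and one wildly wrong (possibly Byzantine) report.
    print(approx_agreement_step([10, 12, 14, -1000, 11], f=1))
```

Trimming makes the outlier harmless: it is either discarded or replaced in effect by a value from the correct range, so a faulty node cannot pull the midpoint outside what correct nodes reported.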
Asynchronous neighborhood task synchronization
Faults are likely to occur in distributed systems. The motivation for designing self-stabilizing systems is to be able to automatically recover from a faulty state. As per Dijkstra's definition, a system is self-stabilizing if it converges to a desired state from an arbitrary state in a finite number of steps. The paradigm of self-stabilization is considered to be the most unified approach to designing fault-tolerant systems. Any type of fault, e.g., transient faults, process crashes and restarts, link failures and recoveries, and Byzantine faults, can be handled by a self-stabilizing system. Many applications in distributed systems involve multiple phases. Solving these applications requires some degree of synchronization of phases. In this thesis research, we introduce a new problem, called asynchronous neighborhood task synchronization (NTS). In this problem, processes execute infinitely many instances of tasks, where a task consists of a set of steps. There are several requirements for this problem. Simultaneous execution of steps by the neighbors is allowed only if the steps are different. Every neighborhood is synchronized in the sense that all neighboring processes execute the same instance of a task. Although the NTS problem is applicable in nonfaulty environments, it is more challenging to solve this problem considering various types of faults. In this research, we will present a self-stabilizing solution to the NTS problem. The proposed solution is space optimal, fault containing, fully localized, and fully distributed. One of the most desirable properties of our algorithm is that it works under any (including unfair) daemon. We will discuss various applications of the NTS problem.
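Dijkstra's definition of self-stabilization is easiest to see on his classic K-state token ring, which is a different and much simpler problem than NTS. The sketch below assumes a unidirectional ring of n processes with K >= n states and a central daemon that activates one privileged process per step; the function names and the choice of daemon (`min` or `max` over the privileged indices) are illustrative assumptions.

```python
from typing import Callable, List


def privileged(x: List[int]) -> List[int]:
    """Indices of processes that currently hold a privilege (token)."""
    n = len(x)
    result = [0] if x[0] == x[n - 1] else []
    result += [i for i in range(1, n) if x[i] != x[i - 1]]
    return result


def step(x: List[int], K: int, i: int) -> None:
    """Execute the move of privileged process i in place."""
    if i == 0:
        x[0] = (x[0] + 1) % K   # the distinguished process advances its counter
    else:
        x[i] = x[i - 1]         # every other process copies its predecessor


def run(x: List[int], K: int, steps: int,
        daemon: Callable[[List[int]], int] = min) -> List[int]:
    """Run the ring under a central daemon that picks one privileged process."""
    x = list(x)
    for _ in range(steps):
        step(x, K, daemon(privileged(x)))
    return x


if __name__ == "__main__":
    start = [3, 1, 4, 1, 0]            # arbitrary, possibly corrupted state
    print(len(privileged(start)))      # several tokens at the start
    final = run(start, K=6, steps=200)
    print(len(privileged(final)))      # a single token circulates afterwards
```

Starting from any assignment of counters, the ring reaches a configuration with exactly one privilege and keeps that property from then on, which is the convergence-from-arbitrary-state behavior the definition asks for.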
Self-stabilising Byzantine Clock Synchronisation is Almost as Easy as Consensus
We give fault-tolerant algorithms for establishing synchrony in distributed
systems in which each of the nodes has its own clock. Our algorithms
operate in a very strong fault model: we require self-stabilisation, i.e., the
initial state of the system may be arbitrary, and there can be up to
ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an
arbitrary manner. Furthermore, we assume that the local clocks of the nodes may
progress at different speeds (clock drift) and communication has bounded delay.
In this model, we study the pulse synchronisation problem, where the task is to
guarantee that eventually all correct nodes generate well-separated local pulse
events (i.e., unlabelled logical clock ticks) in a synchronised manner.
Compared to prior work, we achieve exponential improvements in stabilisation
time and the number of communicated bits, and give the first sublinear-time
algorithm for the problem:
- In the deterministic setting, the state-of-the-art solutions stabilise in
time and have each node broadcast bits per time
unit. We exponentially reduce the number of bits broadcasted per time unit to
while retaining the same stabilisation time.
- In the randomised setting, the state-of-the-art solutions stabilise in time
and have each node broadcast bits per time unit. We
exponentially reduce the stabilisation time to while each node
broadcasts bits per time unit.
These results are obtained by means of a recursive approach reducing the
above task of self-stabilising pulse synchronisation in the bounded-delay model
to non-self-stabilising binary consensus in the synchronous model. In general,
our approach introduces at most logarithmic overheads in terms of stabilisation
time and broadcasted bits over the underlying consensus routine. Comment: 54 pages. To appear in JACM; a preliminary version of this work
appeared in DISC 201
Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation
Today's hardware technology presents a new challenge in designing robust
systems. Deep submicron VLSI technology introduced transient and permanent
faults that were never considered in low-level system designs in the past.
Still, robustness of that part of the system is crucial and needs to be
guaranteed for any successful product. Distributed systems, on the other hand,
have been dealing with similar issues for decades. However, neither the basic
abstractions nor the complexity of contemporary fault-tolerant distributed
algorithms match the peculiarities of hardware implementations. This paper is
intended to be part of an attempt striving to overcome this gap between theory
and practice for the clock synchronization problem. Solving this task
sufficiently well will make it possible to build a very robust high-precision clocking
system for hardware designs like systems-on-chips in critical applications. As
our first building block, we describe and prove correct a novel Byzantine
fault-tolerant self-stabilizing pulse synchronization protocol, which can be
implemented using standard asynchronous digital logic. Despite the strict
limitations introduced by hardware designs, it offers optimal resilience and
smaller complexity than all existing protocols. Comment: 52 pages, 7 figures, extended abstract published at SSS 201