Phase Clocks for Transient Fault Repair
Phase clocks are synchronization tools that implement a form of logical time
in distributed systems. For systems tolerating transient faults by self-repair
of damaged data, phase clocks can enable reasoning about the progress of
distributed repair procedures. This paper presents a phase clock algorithm
suited to the model of transient memory faults in asynchronous systems with
read/write registers. The algorithm is self-stabilizing and guarantees accuracy
of phase clocks within O(k) time following an initial state that is k-faulty.
Composition theorems show how the algorithm can be used for the timing of
distributed procedures that repair system outputs. Comment: 22 pages, LaTe
Fast and compact self-stabilizing verification, computation, and fault detection of an MST
This paper demonstrates the usefulness of distributed local verification of
proofs, as a tool for the design of self-stabilizing algorithms. In particular,
it introduces a somewhat generalized notion of distributed local proofs, and
utilizes it for improving the time complexity significantly, while maintaining
space optimality. As a result, we show that optimizing the memory size carries
at most a small cost in terms of time, in the context of Minimum Spanning Tree
(MST). That is, we present algorithms that are both time and space efficient
for both constructing an MST and for verifying it. This involves several parts
that may be considered contributions in themselves. First, we generalize the
notion of local proofs, trading off the time complexity for memory efficiency.
This adds a dimension to the study of distributed local proofs, which has been
gaining attention recently. Specifically, we design a (self-stabilizing) proof
labeling scheme which is memory optimal (i.e., bits per node), and
whose time complexity is in synchronous networks, or time in asynchronous ones, where is the maximum degree of
nodes. This answers an open problem posed by Awerbuch and Varghese (FOCS 1991).
We also show that time is necessary, even in synchronous
networks. Another property is that if faults occurred, then, within the
required detection time above, they are detected by some node in the locality of each of the faults. Second, we show how to enhance a known
transformer that makes input/output algorithms self-stabilizing. It now takes
as input an efficient construction algorithm and an efficient self-stabilizing
proof labeling scheme, and produces an efficient self-stabilizing algorithm.
When used for MST, the transformer produces a memory optimal self-stabilizing
algorithm, whose time complexity, namely, , is significantly better even
than that of previous algorithms. (The time complexity of previous MST
algorithms that used memory bits per node was , and
the time for optimal space algorithms was .) Inherited from our proof
labeling scheme, our self-stabilizing MST construction algorithm also has the
following two properties: (1) if faults occur after the construction ended,
then they are detected by some nodes within time in synchronous
networks, or within time in asynchronous ones, and (2) if
faults occurred, then, within the required detection time above, they are
detected within the locality of each of the faults. We also show
how to improve the above two properties, at the expense of some increase in the
memory
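To make the idea of local verification concrete, here is a minimal sketch of a classic proof labeling scheme for a rooted spanning tree. It is not the MST scheme of the abstract above (it does not check minimality, and it is not self-stabilizing); the label layout `(root_id, parent_id, distance)` and all function names are assumptions chosen for illustration. Each node inspects only its own label and its neighbors' labels, and a corrupted label causes at least one node to reject.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label layout: (root_id, parent_id or None, distance_to_root).
Label = Tuple[int, Optional[int], int]


def node_accepts(v: int, adj: Dict[int, List[int]], labels: Dict[int, Label]) -> bool:
    """Local check at node v: uses only v's label and its neighbors' labels."""
    root, parent, dist = labels[v]
    # All neighbors must name the same root.
    if any(labels[u][0] != root for u in adj[v]):
        return False
    if parent is None:
        # Only the root may lack a parent, and it must be at distance 0.
        return v == root and dist == 0
    # A non-root node points to a neighbor that is exactly one hop closer.
    return parent in adj[v] and dist > 0 and labels[parent][2] == dist - 1


def rejecting_nodes(adj: Dict[int, List[int]], labels: Dict[int, Label]) -> List[int]:
    """Nodes whose local check fails; an empty list means every node accepts."""
    return [v for v in sorted(adj) if not node_accepts(v, adj, labels)]


# Small example graph (adjacency lists) with a correct labeling rooted at node 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
good = {0: (0, None, 0), 1: (0, 0, 1), 2: (0, 0, 1), 3: (0, 2, 2)}

# Simulate a transient fault that corrupts the distance stored at node 3.
bad = dict(good)
bad[3] = (0, 2, 5)

print(rejecting_nodes(adj, good))
print(rejecting_nodes(adj, bad))
```

In this toy run the fault is detected at the faulty node itself, which loosely mirrors the locality property described in the abstract; the actual scheme and its bounds are those of the paper, not of this sketch.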
Separation of Circulating Tokens
Self-stabilizing distributed control is often modeled by token abstractions.
A system with a single token may implement mutual exclusion; a system with
multiple tokens may ensure that immediate neighbors do not simultaneously enjoy
a privilege. For a cyber-physical system, tokens may represent physical objects
whose movement is controlled. The problem studied in this paper is to ensure
that a synchronous system with m circulating tokens has at least d distance
between tokens. This problem is first considered in a ring where d is given
whilst m and the ring size n are unknown. The protocol solving this problem can
be uniform, with all processes running the same program, or it can be
non-uniform, with some processes acting only as token relays. The protocol for
this first problem is simple, and can be expressed with Petri net formalism. A
second problem is to maximize d when m is given, and n is unknown. For the
second problem, the paper presents a non-uniform protocol with a single
corrective process. Comment: 22 pages, 7 figures, epsf and pstricks in LaTe
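The separation property can be stated as a simple predicate on a ring configuration. The sketch below is only a checker, not the protocol from the paper; the function names and the representation of tokens as integer positions on a ring of size `n` are assumptions made for illustration. It also computes the elementary upper bound on d for m tokens: the gaps around the ring sum to n, so the smallest gap cannot exceed `n // m`.

```python
from typing import List


def min_token_gap(positions: List[int], n: int) -> int:
    """Smallest clockwise distance between consecutive tokens on a ring of size n.

    Two tokens on the same node give a gap of 0. With fewer than two tokens
    there is no pair to compare, so the ring size n is returned.
    """
    pts = sorted(p % n for p in positions)
    if len(pts) < 2:
        return n
    # The last entry wraps around to the first, closing the ring.
    gaps = [(pts[(i + 1) % len(pts)] - pts[i]) % n for i in range(len(pts))]
    return min(gaps)


def is_separated(positions: List[int], n: int, d: int) -> bool:
    """True if every pair of consecutive tokens is at least d apart."""
    return min_token_gap(positions, n) >= d


def max_separation(n: int, m: int) -> int:
    """Upper bound on d for m tokens on a ring of size n (gaps sum to n)."""
    return n // m


print(min_token_gap([0, 4, 8], 12))    # evenly spread tokens
print(is_separated([0, 1, 8], 12, 3))  # tokens at 0 and 1 are too close
print(max_separation(12, 3))
```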
Distributed Computing with Adaptive Heuristics
We use ideas from distributed computing to study dynamic environments in
which computational nodes, or decision makers, follow adaptive heuristics (Hart
2005), i.e., simple and unsophisticated rules of behavior, e.g., repeatedly
"best replying" to others' actions, and minimizing "regret", that have been
extensively studied in game theory and economics. We explore when convergence
of such simple dynamics to an equilibrium is guaranteed in asynchronous
computational environments, where nodes can act at any time. Our research
agenda, distributed computing with adaptive heuristics, lies on the borderline
of computer science (including distributed computing and learning) and game
theory (including game dynamics and adaptive heuristics). We exhibit a general
non-termination result for a broad class of heuristics with bounded
recall---that is, simple rules of behavior that depend only on recent history
of interaction between nodes. We consider implications of our result across a
wide variety of interesting and timely applications: game theory, circuit
design, social networks, routing and congestion control. We also study the
computational and communication complexity of asynchronous dynamics and present
some basic observations regarding the effects of asynchrony on no-regret
dynamics. We believe that our work opens a new avenue for research in both
distributed computing and game theory. Comment: 36 pages, four figures. Expands both technical results and discussion
of v1. Revised version will appear in the proceedings of Innovations in
Computer Science 201
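As a toy illustration of why the timing of moves matters for best-reply dynamics, the sketch below runs a two-player coordination game in which each player's best reply is to copy the other's most recent action (a rule with recall of one step). This is a standard textbook example, not the construction used in the paper; the `schedule` abstraction and all names are assumptions. When both players update at once from a miscoordinated start they swap forever, whereas alternating updates reach an equilibrium after one move.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]


def best_reply(other_action: int) -> int:
    """In a pure coordination game, the best reply is to match the other player."""
    return other_action


def run(schedule: Callable[[int], Sequence[int]], start: State, steps: int) -> List[State]:
    """Simulate best-reply dynamics; schedule(t) lists the players who move at step t."""
    state = list(start)
    history = [tuple(state)]
    for t in range(steps):
        snapshot = tuple(state)  # every active player reacts to the same old state
        for i in schedule(t):
            state[i] = best_reply(snapshot[1 - i])
        history.append(tuple(state))
    return history


def simultaneous(t: int) -> Sequence[int]:
    return (0, 1)    # both players move at every step


def alternating(t: int) -> Sequence[int]:
    return (t % 2,)  # players take turns


print(run(simultaneous, (0, 1), 4))  # the two players keep swapping actions
print(run(alternating, (0, 1), 3))   # settles on a coordinated state
```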
Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation
Today's hardware technology presents a new challenge in designing robust
systems. Deep submicron VLSI technology introduced transient and permanent
faults that were never considered in low-level system designs in the past.
Still, robustness of that part of the system is crucial and needs to be
guaranteed for any successful product. Distributed systems, on the other hand,
have been dealing with similar issues for decades. However, neither the basic
abstractions nor the complexity of contemporary fault-tolerant distributed
algorithms match the peculiarities of hardware implementations. This paper is
intended to be part of an attempt striving to overcome this gap between theory
and practice for the clock synchronization problem. Solving this task
sufficiently well will make it possible to build a very robust high-precision clocking
system for hardware designs like systems-on-chips in critical applications. As
our first building block, we describe and prove correct a novel Byzantine
fault-tolerant self-stabilizing pulse synchronization protocol, which can be
implemented using standard asynchronous digital logic. Despite the strict
limitations introduced by hardware designs, it offers optimal resilience and
smaller complexity than all existing protocols. Comment: 52 pages, 7 figures, extended abstract published at SSS 201
Stabilizing Server-Based Storage in Byzantine Asynchronous Message-Passing Systems
A stabilizing Byzantine single-writer single-reader (SWSR) regular register,
which stabilizes after the first invoked write operation, is first presented.
Then, new/old ordering inversions are eliminated by the use of a (bounded)
sequence number for writes, obtaining a practically stabilizing SWSR atomic
register. A practically stabilizing Byzantine single-writer multi-reader (SWMR)
atomic register is then obtained by using several copies of SWSR atomic
registers. Finally, bounded time-stamps, with a time-stamp per writer, together
with SWMR atomic registers, are used to construct a practically stabilizing
Byzantine multi-writer multi-reader (MWMR) atomic register. In a system of
servers implementing an atomic register, and in addition to transient failures,
the constructions tolerate t<n/8 Byzantine servers if communication is
asynchronous, and t<n/3 Byzantine servers if it is synchronous. The noteworthy
feature of the proposed algorithms is that (to our knowledge) these are the
first that build an atomic read/write storage on top of asynchronous servers
prone to transient failures, and where up to t of them can be Byzantine
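The resilience bounds quoted above (t<n/8 with asynchronous communication, t<n/3 with synchronous communication) translate into simple integer arithmetic. The helper below is a sketch of that arithmetic only, with hypothetical function names; it says nothing about how the registers themselves are built.

```python
def max_byzantine(n: int, synchronous: bool) -> int:
    """Largest integer t satisfying t < n/3 (synchronous) or t < n/8 (asynchronous)."""
    divisor = 3 if synchronous else 8
    # t < n / divisor  <=>  divisor * t <= n - 1  <=>  t <= (n - 1) // divisor
    return (n - 1) // divisor


def min_servers(t: int, synchronous: bool) -> int:
    """Smallest n for which t Byzantine servers satisfy the strict bound."""
    divisor = 3 if synchronous else 8
    return divisor * t + 1


for n in (4, 9, 17):
    print(n, max_byzantine(n, synchronous=True), max_byzantine(n, synchronous=False))
```

For example, tolerating a single Byzantine server under these bounds needs at least 4 servers in the synchronous case and at least 9 in the asynchronous case.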