Reconciling fault-tolerant distributed computing and systems-on-chip by Matthias Függer & Ulrich Schmid
Distrib. Comput. (2012) 24:323–355
DOI 10.1007/s00446-011-0151-7
Reconciling fault-tolerant distributed computing
and systems-on-chip
Matthias Függer · Ulrich Schmid
Received: 17 August 2010 / Accepted: 10 November 2011 / Published online: 11 December 2011
© The Author(s) 2011. This article is published with open access at Springerlink.com
Abstract Classic distributed computing abstractions do not match well the reality of digital logic gates, which are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits: Massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness and performance analysis of a distributed fault-tolerant clocking approach for SoCs. Starting out from a “classic” distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, rigorously prove its correctness, and derive analytic expressions for worst-case performance metrics like synchronization precision and clock frequency. Both the algorithm’s correctness and the achievable synchronization precision depend solely on the ratio of certain path delays rather than on absolute delay values. Since these ratios can be mapped directly to placement & routing constraints, there is typically no need for changing the algorithm when migrating to a faster implementation technology and/or when using a slightly different layout in an SoC.

This work originates in our DARTS project, which has been a joint effort of Vienna University of Technology and RUAG Space, see http://ti.tuwien.ac.at/darts for details. It has been supported by the Austrian bm:vit FIT-IT project DARTS (809456-SCK/SAI) and the Austrian FWF projects Theta (P17757), PSRTS (P20529) and FATAL (P21694).

M. Függer (✉) · U. Schmid
Technische Universität Wien, Embedded Computing Systems
Keywords Clock synchronization · Fault-tolerant
distributed systems · Modeling approaches · VLSI
1 Introduction
Shrinking feature sizes and increasing clock speeds are the
most visible signs of the tremendous advances in VLSI
design, which will accommodate billions of transistors on
a single chip in the near future [34]. This comes at the price
of increased system-level complexity, however: In today’s deep submicron technology with GHz clock speeds, wiring delays dominate transistor switching delays, and electrical signals can no longer traverse the whole chip within a single clock cycle. Consequently, modern VLSI chips can no longer be viewed as monolithic blocks of synchronous hardware, where all the chip’s state-holding gates simultaneously
perform state transitions: The engineering of the clock tree,
which disseminates the high-speed clock — typically gener-
ated by a quartz oscillator in conjunction with a clock multi-
plier — throughout the whole chip with zero skew is becom-
ing more and more difficult [6,22,52,64] and ineffective.
Large VLSI chips are hence nowadays increasingly being
considered as more or less loosely-coupled systems of inter-
acting subsystems — the advent of Systems-on-Chip (SoC)
and Networks-on-Chip (NoC).
Moreover, the smaller feature sizes and the reduced volt-
age swing needed for high clock speeds and low power con-
sumption dramatically increase the adverse effects of upsets
due to α-particle or neutron hits [4,15,29,36,57,73], cross-
talk and ground bouncing [50]. The resulting increase of the
123
324 M. Függer, U. Schmid
transient failure rate (soft-error rate) [46] and crosstalk sensi-
tivity [59] has hence raised concerns about the dependability
of future generation VLSI chips [11]. Mitigation techniques
exist at different levels of abstraction, including replication of a chip’s functional units at system level. Since the oscillator and its clock tree constitute a non-negligible single point of failure in synchronous hardware designs [70], fault-tolerant designs, too, consequently consist of SoCs and NoCs of interacting functional units that independently perform state transitions.
Modern SoCs hence have much in common with loosely-
coupled distributed systems that have been studied by the
fault-tolerant distributed algorithms community for decades.
Recent work e.g. on scheduling of memory requests [54],
transactional memory in multicores [19], and self-stabiliz-
ing microprocessors [13], as well as our own work intro-
duced below, confirms that it is indeed possible to utilize
distributed algorithms research in the VLSI context. Conversely, results and methods from VLSI-related research, such as error-correcting codes, have proved useful, e.g., for tolerating
Byzantine adversaries [14], and blend nicely with distrib-
uted algorithms research on fault-tolerant consensus [23], for
example. Attempts to systematically bridge the gap between
distributed algorithms and VLSI design are lacking, however,
[9,67]; our paper makes a first step in this direction.
This work originates in our DARTS project, which is
devoted to fault-tolerant clock generation in SoCs: As the
zero-skew clock requirement had to be dropped in modern
VLSI circuits anyway, replacing the classic centralized clock
generation approach (an obvious single-point-of-failure) by
a distributed solution becomes a feasible option. Like stan-
dard GALS (globally asynchronous, locally synchronous)
architectures [8], DARTS replaces the traditional common
clock source by multiple clock sources. In sharp contrast
to GALS, however, which employs multiple unsynchro-
nized clock devices (typically quartz oscillators), DARTS
utilizes a Byzantine fault-tolerant distributed tick generation algorithm, which guarantees a bounded clock skew also between the different clock sources and their clock domains. Such multi-synchronous [72,78] GALS systems are beneficial from a designer’s point of view, since they combine the
convenient local synchrony with a global time base across the
whole chip. It has been shown in [61] that these properties
facilitate even metastability-free high-speed communication
across clock domains via bounded-size buffers.
The DARTS clocking scheme has been implemented both
in an FPGA [20] and in a custom radiation-hardened ASIC
[24,27], which proves that the approach is feasible in practice
and indeed works very well. Although the implementation
complexity of DARTS is definitely not negligible, it must be
considered as the price for a fault-tolerant clocking system.
Moreover, it is not clear how an improved and fully engi-
neered version of DARTS, which consists of standard gates
and wires only, would actually compare w.r.t. area to tradi-
tional carefully balanced clock trees with their many strong
clock drivers.
In our attempts to devise a rigorous correctness proof
for DARTS, we realized that classic distributed computing
abstractions do not match well the peculiarities of hardware
implementations:
(i) Inherent fine-grained parallelism, which is caused by a
large number of digital logic gates that concurrently and
continuously compute their outputs based on their inputs.
This undermines the abstraction of a (typically small) col-
lection of processors that perform atomic computing steps
at discrete points in time, which is common to (almost) all
existing distributed computing models.
(ii) Very limited resources, which make even apparently
simple operations like addition or comparison of k-bit
numbers, as well as sending such numbers via messages,
prohibitively costly. This is in conflict with the basic atomic
operations that are typically used in existing distributed
algorithms.
Contributions: This paper introduces a novel framework for
modeling and analysis of distributed algorithms implemented
in VLSI, which explicitly addresses the above issues. Using
this framework, we present a complete and rigorous correct-
ness proof and worst case performance analysis of DARTS.
The detailed contributions of our paper are the following:
(1) We introduce a novel modeling and analysis framework
based on signals, which allow modeling of continuous
computations and binary message channels with delay.
The framework facilitates “switching” between differ-
ent — but consistent — abstractions of the same signal
(e.g., state view). In sharp contrast to existing modeling
frameworks capable of expressing timed executions,1
these features allow us to express the properties of fault-tolerant distributed algorithms designed for a direct implementation in digital logic, and to reason about their correctness and performance, in a very natural and simple way.
(2) We adapt the simple Byzantine fault-tolerant distrib-
uted tick generation algorithm introduced in [82] for
a direct implementation in clockless digital logic.
Major modifications concern the enforcement of certain atomic actions (“interlocking”) via implicit handshaking between parts of the algorithm, and the replacement of the k-bit messages used for communication in the original
1 Our framework is substantially different from existing modeling
frameworks for clockless VLSI circuits (“trace theory”), which are
time-free and hence cannot deal with failures [21].
Fault-tolerant tick generation in VLSI 325
algorithm by zero-bit messages, i.e., by up/down signal
transitions.
(3) We prove that the adapted algorithm is correct, and
derive worst case bounds for its performance metrics
like synchronization precision and minimum/maximum
clock frequency. Since our system-level proof rests on
fundamental properties of certain elementary building
blocks only, it effectively reduces the complex prob-
lem of guaranteeing system correctness to the problem
of guaranteeing the correctness of fairly simple basic
blocks. Consequently, in sharp contrast to classic dis-
tributed algorithms and their correctness proofs, our
low-level modeling approach leaves only a small “proof
gap” with respect to the actual implementation.
The paper is organized as follows: Following an overview
of related work in Sect. 2, we informally explain the original
tick generation algorithm and the required modifications in
Sect. 3. Section 4 introduces our modeling framework and
provides the system and failure model used in the subsequent
analysis, as well as the detailed specification of our hard-
ware tick generation algorithm and its (elementary) building
blocks. Section 5 provides the detailed correctness proof and
the worst case performance analysis. Some conclusions and
directions of further research in Sect. 6 complete the paper.
A glossary of our notation can be found in Table 1.
2 Related work
VLSI clock generation: There exists a huge body of work on
classic fault-tolerant clock synchronization [63,74], includ-
ing hardware-assisted clock synchronization [68], where a
set of free-running physical clocks is to be synchronized.
In contrast to these approaches, DARTS does not com-
pute adjustment values for synchronizing free-running (e.g.,
driven by a quartz oscillator) local hardware clocks, but gen-
erates clock ticks that are inherently synchronized (by means
of a distributed algorithm) in a closed-loop fashion. Note
that there is research on “extracting” clock synchronization
from the actual communication in a distributed computation
[1,31,58,60] that bears some relation to our approach.
The few approaches for distributed clock generation with-
out local clock sources we are aware of are essentially based
on a (distributed) ring oscillator, which is formed by gates
arranged in a feedback loop. Instead of being dictated by a
quartz, the frequency of the generated clock signal is deter-
mined by the end-to-end delay of the feedback loop. In [51],
a regular structure of closed loops of an odd number of inverters is used for distributed clock generation. Similarly, [17,18] employ local tick generation cells, arranged in a two-dimensional grid, with each cell inverting its output signal when
its four inputs (from the up, down, left and right neighbor)
match the current clock output value. Since clock synchroni-
zation theory [12] reveals that high connectivity is required
for bounded synchronization tightness in the presence of fail-
ures, however, the sparsely connected designs proposed in
[17,18,51] are not fault-tolerant.
Modeling approaches: The theory of asynchronous (clock-
less) distributed systems — in the absence of failures — has
been used in the VLSI community for decades [10]: Research
on transition signaling [7,71], delay-insensitivity [16,49],
micropipelines [77], etc. has established a sound basis for
dealing with self-timed systems [32]. Since then, much
research has been conducted on benefits and limitations of
clockless circuits. There is a wealth of literature on the arbiter
problem [41,47], which is — like the latch, the inertial delay
and the mutex — impossible to solve in a delay-insensitive
way [3,49]. Both arbiter-free problems [43] and a few ways
to circumvent the impossibility of implementing arbiters by
adding (some) timing properties [48] or order properties [76]
have been thoroughly investigated.
Existing modeling approaches for clockless circuits are
based on algebraic trace theory [16,49] or Petri-nets [43,83];
such specifications are time-free. Time-augmented clockless
circuits can be handled by using timed Petri-nets, which
assign time intervals to each transition [5,55,65,84]. How-
ever, to the best of our knowledge, none of these modeling
frameworks deals with failures. On the other hand, model-
ing frameworks developed in distributed algorithms research,
like timed I/O automata [37] or TLA [42], can deal with
failures, but are not tailored to the specific needs of VLSI
circuits.
Fault-tolerance in VLSI: Dependability concerns have
also stimulated a large body of research work devoted to
fault-tolerance and fault-prevention in VLSI systems [40].
Fine-grained fault tolerance, e.g. at transistor and gate level,
encoding, error detection & recovery/reconfiguration, and
radiation hardening techniques are the methods of choice
here, see e.g. [35,53,56,79,80] for some examples. The
proposed techniques are very different from the “system-
level approach” employed in fault-tolerant distributed algo-
rithms, which we will exploit in this paper. We note, however,
that those approaches are complementary, in the sense that
VLSI fault-tolerance techniques can reduce component fail-
ure rates and, hence, the required system-level redundancy.
Although we are not aware of any work that deals with
the fine-grain parallelism inherent in VLSI implementa-
tions, there is a sizeable body of work devoted to hardware
implementations of fault-tolerant algorithms. Well-known
examples are MAFT [38], SAFEBUS [33], GUARDS [62]
and TTP [39]. However, in sharp contrast to our problem,
these systems incorporate hardware assistance only. The
major part of the algorithms is still implemented in con-
ventional software and executed on general-purpose proces-
sors. As a consequence, minimizing the gate-level resource
Table 1 Glossary of our notation

Notation | Description | Definition
$B$ | Maximum node booting completion time | Sect. 4.2
$\#b_p(t)$ | Number of ticks generated by node $p$ | Sect. 4.2
$b_{max}(t)$ | Number of ticks generated by the first correct node | Sect. 5.2
$b_{min}(t)$ | Number of ticks generated by the last correct node | Sect. 5.2
$C$ | Set of correct nodes in the system | Sect. 4.2
$\#d_{p,q}(t)$ | Number of ticks commonly removed from $(p, q)$ | Sect. 4.2
$\delta(X; t)$ | Value of channel $X$’s delivery function at time $t$ | Sect. 4.1
$F$ | Set of faulty nodes in the system | Sect. 4.2
$f$ | Maximum number of faulty nodes | Sect. 4.2
FCR | Fault-containment region | Sect. 4.2
$n$ | Total number of nodes in the system | Sect. 4.2
$P$ | Set of all $n$ nodes in the system | Sect. 4.2
$(p, q)$ | Pipe pair for remote node $q$ at local node $p$ | Sect. 4.2
$\widetilde{PGEQ}^{o/e}_{p,q}(t)$ | Status of $(p, q)$: number of remote ticks $\ge$ number of local ticks | Sect. 4.2
$\widetilde{PGR}^{o/e}_{p,q}(t)$ | Status of $(p, q)$: number of remote ticks $>$ number of local ticks | Sect. 4.2
$\pi$ | Precision bound | Sect. 4.4
$\#r^{loc}_{p,q}(t)$ | Number of ticks stored in the local pipe of $(p, q)$ | Sect. 4.2
$\#r^{rem}_{p,q}(t)$ | Number of ticks stored in the remote pipe of $(p, q)$ | Sect. 4.2
$\hat{S}$ | Event trace representation of signal $S$ | Sect. 4.1
$\tilde{S}(t)$ | Status representation of signal $S$ | Sect. 4.1
$\#S(t)$ | Counting function representation of signal $S$ | Sect. 4.1
$S^{loc}$ | Maximum size of local pipes | Sect. 4.2
$S^{rem}$ | Maximum size of remote pipes | Sect. 4.2
$\#s^{loc}_{p,q}(t)$ | Current size of the local pipe of $(p, q)$ | Sect. 4.2
$\#s^{rem}_{p,q}(t)$ | Current size of the remote pipe of $(p, q)$ | Sect. 4.2
$T_{BS}(k)$ | Booting-induced generation time of tick $k$ | Eq. 42
$T^{-}_{first}$ | Minimum tick generation interval | Eq. 39
$T_{max}$ | Maximum local loop delay | Constr. 1
$T_{min}$ | Minimum local loop delay | Constr. 1
$T_{min,dis}$ | Stripped minimum local loop delay | Constr. 1
$T_P$ | Maximum tick generation interval | Eq. 40
$T_{QS}$ | Maximum tick catch-up time | Eq. 41
$T_{del}$ | Maximum time until removal of a tick | Eq. 70
$T^{loc}_{del}$ | Maximum time until removal of a tick, 2nd bound | Eq. 75
$t_{first,k}$ | Time when the first correct node sends tick $k$ | Sect. 4.2
$t_{last,k}$ | Time when the last correct node sends tick $k$ | Sect. 4.2
$t_b$ | Time when the first correct node completes booting | Lem. 9
$t_{p,b}$ | Time when node $p$ completes booting | Sect. 4.2
$t_{rmv,k}$ | Time of removal of tick $k$ from $(p, q)$ | Sect. 4.2
(symbols lost in extraction) | Min./max. delay of remote channel + pipe | Sect. 4.2
(symbols lost in extraction) | Min./max. delay of threshold modules | Sect. 4.2
Fig. 1 Replacing synchronous clocking by fault-tolerant distributed
tick generation
consumption implied by these algorithms has never been con-
sidered. A notable exception is [2], however, where it has
been shown that consensus can be solved with 1-bit mes-
sages.
3 DARTS informal overview
As shown in Fig. 1, the basic idea of DARTS is to replace the
common quartz oscillator and the clock tree by a fully distributed GALS-like approach [8]: Every functional unit Fu_i has a dedicated fault-tolerant tick generation block (TG-Alg) attached, which generates Fu_i’s local clock signal. To
accomplish this, all TG-Alg blocks communicate with each
other over a simple “network” of clock signals (TG-Net). In
contrast to standard GALS, however, DARTS ensures that
the local clock signals of different Fu’s are synchronized to
each other to within a few clock cycles (termed multi-synchronous clocking [72,78]). It has been proved elsewhere [61] that even such loose synchrony suffices for implementing metastability-free high-speed communication between
different Fu’s using bounded-size buffers.
DARTS clocks (patented in [69]) provide a number of
additional advantages, which make them particularly promising for critical applications in the aerospace domain: First
of all, the approach entirely circumvents quartz oscillators,
which are fairly big and sensitive devices (shock, vibration,
temperature etc.), as well as the cumbersome clock tree engi-
neering issue [6,22,52,64]. It is fault-tolerant, in the sense
that the correctness of the clock signals supplied by correct
TG-Algs is not affected by transient and permanent failures
occurring in other TG-Algs and/or in the TG-Net. DARTS
clocks can also be guaranteed to indeed start operating at
booting time, a feature that is difficult to ensure for oscillator-
based clocking approaches in space applications. Moreover,
the clocks always run at the maximum speed and adapt to the
current communication delays within the TG-Algs and the
TG-Net, both of which may vary with the current operating
conditions, such as supply voltage and temperature. And last
but not least, since different Fus are driven by slightly differ-
ent clock signals, DARTS clocks alleviate EM radiation and
ground bouncing problems [50] that typically plague devices
using synchronous clocking.
Fig. 2 Algorithm for generating approximately simultaneous tick(k) messages

The TG-Algs developed and analyzed in this paper derive from a simple synchronizer algorithm for the Θ-Model [44,81,82] introduced in [82]. The (core of this) algorithm,
which is based on Srikanth and Toueg’s consistent broad-
casting primitive [75], is shown in Fig. 2. It assumes a
message-driven system (where nodes make atomic receive-
compute-send steps whenever they receive a message) of
n = 3 f + 1 nodes (= TG-Alg instances), at most f of which
may behave arbitrarily faulty, i.e., Byzantine. The number
of nodes n is equal to the number of Fus the SoC is par-
titioned into, and also depends on the intended number of
faults f the SoC should sustain. Typical DARTS systems
will probably have some f in the order of 1 to 4, resulting
in some n in the range of 4 to 13 TG-Algs. All correct nodes
are connected by a reliable2 point-to-point message-passing
network (= TG-Net): No spurious messages are ever gen-
erated, no messages are lost or altered, and all messages
sent at time t are eventually received within the interval $t + [\tau^-, \tau^+]$, where $\tau^-$ respectively $\tau^+$ denote the (possibly unknown) lower respectively upper bound on the end-to-end delay of messages exchanged between correct nodes. Let $\varepsilon = \tau^+ - \tau^-$ be the maximum uncertainty of the message delay, and $\Theta = \tau^+/\tau^-$ the maximum delay ratio.
The idea of the algorithm is the following: Initially, every
node broadcasts tick(0) in line 3. If a correct node p receives f + 1 tick(ℓ) messages with ℓ > k (line 5), it can be sure that at least one of those was broadcast by a correct node. Therefore, p can safely catch up and send tick(k + 1), …, tick(ℓ). If some correct node p receives 2f + 1 tick(k) messages (line 7) and thus sends tick(k + 1), one can be sure that
all tick(k) messages broadcast by correct nodes, i.e., at least
f + 1, will be received within time ε by every other correct
node. Hence, every correct node will execute line 5 and
send tick(k), if it has not already done so.
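The two rules can be explored with a small event-driven simulation. The following Python sketch is ours, not the DARTS implementation, and rests on several simplifying assumptions: the f faulty nodes are merely silent (a crash, i.e., a special case of Byzantine behavior), every broadcast is also delivered to the sender itself, and each message delay is drawn uniformly from [τ−, τ+]. The function name `simulate` is hypothetical.

```python
import heapq
import random
from typing import Dict, List


def simulate(f: int, tau_min: float, tau_max: float, t_end: float,
             seed: int = 0) -> Dict[int, List[float]]:
    """Event-driven sketch of the tick algorithm of Fig. 2 with n = 3f + 1 nodes.

    Returns, for every correct node p, the list of times at which p broadcast
    tick(0), tick(1), ...; index l holds the send time of tick(l)."""
    rng = random.Random(seed)
    n = 3 * f + 1
    faulty = set(range(f))                      # assumption: faulty nodes stay silent
    correct = [p for p in range(n) if p not in faulty]
    k = {p: 0 for p in correct}                 # variable k of Fig. 2
    seen = {p: {} for p in correct}             # seen[p][l] = set of senders of tick(l)
    send_times = {p: [] for p in correct}
    events = []                                 # heap of (time, seq, receiver, sender, tick)
    seq = 0

    def broadcast(p: int, tick: int, now: float) -> None:
        nonlocal seq
        send_times[p].append(now)
        for q in range(n):                      # includes p itself
            delay = rng.uniform(tau_min, tau_max)
            heapq.heappush(events, (now + delay, seq, q, p, tick))
            seq += 1

    for p in correct:                           # line 3: initially send tick(0)
        broadcast(p, 0, 0.0)

    while events:
        now, _, q, sender, tick = heapq.heappop(events)
        if now > t_end:
            break
        if q in faulty:
            continue
        seen[q].setdefault(tick, set()).add(sender)
        fired = True
        while fired:                            # re-evaluate until no rule applies
            fired = False
            # line 5: tick(l) with l > k from >= f+1 nodes -> send tick(k+1..l)
            ahead = [l for l, s in seen[q].items() if l > k[q] and len(s) >= f + 1]
            if ahead:
                target = max(ahead)
                for l in range(k[q] + 1, target + 1):
                    broadcast(q, l, now)
                k[q] = target
                fired = True
            # line 7: tick(k) from >= 2f+1 nodes -> send tick(k+1)
            if len(seen[q].get(k[q], ())) >= 2 * f + 1:
                k[q] += 1
                broadcast(q, k[q], now)
                fired = True
    return send_times
```

Since k only increases, every tick is broadcast exactly once per node, which mirrors the implicit “[once]” semantics of the message-driven model.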
In conjunction with the fact that the fastest correct node
cannot send consecutive tick(k), tick(k + 1),…arbitrarily
fast, this implies a bound on the synchronization precision:
Our detailed analysis will reveal that correct nodes gener-
ate a sequence of consecutive messages tick(k), k ≥ 1, in
2 Note that this reliable network assumption is not unduly restrictive,
since communication failures can be mapped to failures of the sending
node.
a synchronized way (see Sect. 4.4): If $\#b_p(t)$ denotes the number of tick(k) messages broadcast by node p by real time t (which is identical to the value of variable k at real time t, cf. Fig. 2), it will turn out that

$(t_2 - t_1)\,\alpha_{min} \le \#b_p(t_2) - \#b_p(t_1) \le (t_2 - t_1)\,\alpha_{max}$

for any correct node p and sufficiently large $t_2 - t_1$ (“accuracy”); the constants $\alpha_{min}$ and $\alpha_{max}$ depend on $\tau^-$ and $\tau^+$. Moreover, every two correct nodes p, q maintain $|\#b_p(t) - \#b_q(t)| \le \pi$ (“precision”) for all t ≥ 0, with a small constant π that depends on Θ only. Note carefully that the algorithm automatically adapts to the instantaneous timing characteristics of any involved computation and communication.
Since the algorithm in Fig. 2 looks very simple, it is tempt-
ing to conclude that it is easily translated into a hardware
description language: Node p’s TG-Alg just needs to drive
a Boolean-valued clock signal, where it outputs the k-th sig-
nal transition when the algorithm sends its tick(k) message;
the TG-Net is formed by feeding all clock signals to all
TG-Algs. It turns out, however, that a number of challenging
issues must be solved to actually accomplish this:
• How to implement the TG-Net efficiently? The algorithm assumes a fully connected network, consisting of $n^2$ links,³ so anything beyond a single wire per link is unacceptable [26]. Moreover, for implementation simplicity
and performance, the information transmitted via the TG-
Net must be kept to a minimum. Ideally, and almost man-
datory, the TG-Net should just feed the emitted clock
ticks, i.e., signal transitions, of every TG-Alg node to
every other TG-Alg node.
• How to adapt the original algorithm for zero-bit mes-
sages? By just sending signal transitions, no information
except the occurrence time can be conveyed over the TG-
Net. Thus, the tick number k contained in the messages of
Fig. 2 must be maintained at every receiver, individually
for every sender.
• How to map hardware faults to node failures? Given that
the algorithm shown in Fig. 2 tolerates Byzantine fail-
ures, we are on the safe side here. Interestingly, there is
evidence [30] that assuming less severe failures is inap-
propriate in the presence of real hardware faults: Even
simple stuck-at faults could produce early and/or inconsistently perceived signal transitions, and hence cannot be modeled as a crash or omission node failure. More severe
hardware faults, like delay faults or early/spurious clock
transitions induced e.g. by particle hits or crosstalk, can
also easily lead to Byzantine failures.
• How to ensure atomicity of actions in a VLSI implementation? This turned out to be the most demanding challenge: All fault-tolerant distributed computing models assume atomic computing steps at the level of a single node. For example, the algorithm presented in Fig. 2 assumes that (i) messages are received, (ii) the number of received messages is checked with respect to a threshold, and (iii) possibly a new message is broadcast and variable k is updated, all in one atomic computing step. This abstraction does not apply when the algorithm is implemented via clockless digital logic gates, which concurrently and continuously compute their outputs based on their inputs. Explicit synchronization (serialization of actions/interlocking) must be introduced if two local computations must not interfere with each other.

3 Note that a bus of n broadcast links that provide every TG-Alg with the messages from all other TG-Algs is in fact sufficient here.

Fig. 3 Basic architecture of a TG-Alg
3.1 Informal overview of the TG-Alg design problems
Taking into account the above issues, we arrived at the basic
architecture of a single TG-Alg shown in Fig. 3.
The major building blocks of a single TG-Alg are the
(n − 1) +/− counters, one for each of the n − 1 other TG-
Algs in the system. Each such device counts the difference
of (i) the number of clock ticks seen from the respective
peer, and (ii) the number of clock ticks generated locally so
far. Counting the difference between these two numbers is
sufficient to implement the algorithm of Fig. 2: To decide when to broadcast the next message, the algorithm needs to know when there are enough remote messages tick(ℓ) for which ℓ > k or ℓ ≥ k. Since ℓ > k ⇔ ℓ − k > 0 and ℓ ≥ k ⇔ ℓ − k ≥ 0, it suffices to know whether the difference ℓ − k is > 0 or ≥ 0. Thus, the +/− counters are supposed to provide two binary status functions, $\widetilde{GR}$ and $\widetilde{GEQ}$, which are true when the counter’s actual value is > 0 and ≥ 0, respectively. In addition, a ≥ f + 1 and a ≥ 2f + 1 threshold circuit are required for implementing the rules in line 5 and line 7 of Fig. 2, respectively. Finally, there is a device (shown as an OR gate in Fig. 3) that is responsible for generating every local clock tick exactly once when the ≥ f + 1 or the ≥ 2f + 1 threshold circuit triggers the generation of a new tick message.
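The counting and threshold logic of Fig. 3 can be summarized behaviorally. The following Python sketch is a minimal model under our own assumptions: it abstracts away transition signaling, timing and the interlocking problems discussed next, and the names `PlusMinusCounter` and `thresholds` are hypothetical. Each counter holds the difference between the ticks seen from its peer and the ticks generated locally, and the threshold circuits count how many counters report $\widetilde{GR}$ or $\widetilde{GEQ}$.

```python
from typing import Dict, List


class PlusMinusCounter:
    """Behavioral sketch of one +/- counter of Fig. 3 (not the actual circuit)."""

    def __init__(self) -> None:
        self.diff = 0  # (# remote ticks seen) - (# local ticks generated), i.e. l - k

    def remote_tick(self) -> None:
        self.diff += 1

    def local_tick(self) -> None:
        self.diff -= 1

    @property
    def gr(self) -> bool:   # status ~GR: l - k > 0
        return self.diff > 0

    @property
    def geq(self) -> bool:  # status ~GEQ: l - k >= 0
        return self.diff >= 0


def thresholds(counters: List[PlusMinusCounter], f: int) -> Dict[str, bool]:
    """Threshold circuits: >= f+1 counters with ~GR, >= 2f+1 counters with ~GEQ."""
    th_gr = sum(c.gr for c in counters) >= f + 1
    th_geq = sum(c.geq for c in counters) >= 2 * f + 1
    return {"THGR": th_gr, "THGEQ": th_geq}
```

In this idealized model, a local tick decrements all counters at once; the following paragraphs explain why this cannot be assumed in hardware.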
Again, the above architecture is deceptively simple. The
major problem encountered when trying to implement Fig. 3
in hardware is the lack of a common clock signal that could
be used for a synchronous design. Obviously, such a clock
signal is not available here. Hence, one has to resort to a
(quasi) delay insensitive clockless implementation [32] of
the algorithm depicted in Fig. 3, which raises a number of
intricate problems:
How to reconcile transition signaling and fault-tolerance?
Probably the most elegant paradigm for clockless logic is
transition signaling, where information is conveyed exclu-
sively via signal transitions, rather than via signal states as in
conventional logic. Any reasonable delay-insensitive clock-
less circuit can be composed from a small set of elementary
building blocks here, which includes the Muller C-Element
that forms the equivalent of the logical AND of two input
signals, see [7,48,77] for details.
The “expressive” power of transition signaling is restricted
to time-free systems, however, where causality is the only
meaningful relation between events. Consequently, there is
no full equivalent to the conventional logical OR of two input
signals: If only one of the two inputs can provide a tran-
sition, the EXOR (exclusive OR) gate can safely be used
for this purpose. However, there is no meaningful transition
signaling OR if both inputs could — but need not — provide
a (somehow related) transition. In fact, generating an OR
output transition when the first input transition arrives would
destroy the causality relation of the second input transition
and the generated output transition.
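The two transition-signaling primitives mentioned above can be illustrated at the level of logic values. The following Python sketch is a simplification of our own that ignores gate delays: the Muller C-element changes its output only when both inputs agree (so an output transition requires a transition on both inputs, the transition-signaling AND), whereas the EXOR toggles its output on every input transition, which is a correct merge only if at most one input transitions.

```python
class MullerC:
    """Two-input Muller C-element: the output follows the inputs when they
    agree and holds its previous value otherwise."""

    def __init__(self, out: int = 0) -> None:
        self.out = out

    def step(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def xor_merge(a: int, b: int) -> int:
    """Transition-signaling merge: each input transition toggles the output.
    Only meaningful if at most one of the two inputs provides a transition."""
    return a ^ b
```

Note that if both inputs of `xor_merge` transition, the output returns to its old value, so the two transitions cancel out; this is one way to see why EXOR cannot serve as a general transition-signaling OR.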
Unfortunately, incorporating fault-tolerance, as instanti-
ated by our threshold gates, requires exactly this semantics:
The clock signal generated by some TG-Alg must not depend
causally on the clocks generated by faulty TG-Algs, since this
may lead to blocking. Hence, there is no way but to “switch”
from transition signaling to state signaling and vice versa to
circumvent this problem.
How to manage a clean switch between transition signal-
ing and state signaling? In our implementation of Fig. 3,
transition signaling logic is used for processing clock ticks
(+/− counters). The +/− counters, on the other hand, output state signals $\widetilde{GR}$ and $\widetilde{GEQ}$. These signals are then processed by the threshold circuits, which themselves output state signals $\widetilde{THGR}$ (respectively $\widetilde{THGEQ}$), signaling whether the $\widetilde{GR}$ signals have reached the f + 1 threshold or not (respectively whether the $\widetilde{GEQ}$ signals have reached the 2f + 1 threshold or not). This information is finally fed into a circuit (depicted by the OR gate in Fig. 3) that is responsible for generating exactly one signal transition, tick k, for every k, which is again implemented in transition logic.
The intermediate switch to state signaling performed by the +/− counters, however, poses a problem: It is not feasible to decrement all +/− counters at the same time, since a local tick(k) message inevitably reaches a node’s +/− counters at slightly different times. As a result, some of the +/− counters produce their $\widetilde{GR}$ and $\widetilde{GEQ}$ status based on local tick(k), whereas others still base it on the old local tick(k − 1), at least until they finally receive local tick(k). This problem is solved as follows: We will show that suitable constraints guarantee that at no time do three +/− counters base their status on local ticks k, k − 1 and k − 2 (or even older ticks), respectively. Thus, it suffices to distinguish whether $\widetilde{GR}$ and $\widetilde{GEQ}$ are based on k or on k − 1, which is done by simply duplicating both signals $\widetilde{GR}$ and $\widetilde{GEQ}$: one copy for status based on even local ticks and one for status based on odd local ticks.
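Why a single parity bit is enough can be checked mechanically. In the following Python sketch (function names are hypothetical), `status_copy` selects the copy that carries a status based on local tick k, and `parity_suffices` tests the invariant stated above for a non-empty collection of local ticks currently used by the counters: if these span at most two consecutive values, the even/odd copy identifies each of them uniquely, whereas ticks k − 2 and k would share the same copy.

```python
from typing import Iterable


def status_copy(local_tick: int) -> str:
    """Copy ('e' or 'o') of ~GR/~GEQ carrying status based on local tick k."""
    return "e" if local_tick % 2 == 0 else "o"


def parity_suffices(ticks_in_use: Iterable[int]) -> bool:
    """True if the (non-empty) set of local ticks on which the +/- counters
    currently base their status spans at most two consecutive values."""
    ticks = set(ticks_in_use)
    return max(ticks) - min(ticks) <= 1
```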
More specifically, since transitions of binary-valued clock signals must strictly alternate between low-to-high (= odd clock tick) and high-to-low (= even clock tick), the next tick to be generated after an even tick must be odd, and vice versa. Hence, in our solution, we just (i) duplicate the signals $\widetilde{GEQ}$ and $\widetilde{GR}$, obtaining the signals $\widetilde{GEQ}^e$, $\widetilde{GR}^e$, $\widetilde{GEQ}^o$ and $\widetilde{GR}^o$, (ii) duplicate the two threshold circuits, (iii) duplicate the signals $\widetilde{THGEQ}$ and $\widetilde{THGR}$, obtaining the four signals $\widetilde{THGEQ}^e$, $\widetilde{THGR}^e$, $\widetilde{THGEQ}^o$ and $\widetilde{THGR}^o$, each generated by one of the threshold circuits, and (iv) use the rules:
• Generate an even tick $k+1 \in \mathbb{N}_{even} := 2\mathbb{N}$ if the last tick generated was odd ($k \in \mathbb{N}_{odd} := 2\mathbb{N}+1$) and either (a) the same or a greater number of ticks have been seen from $\ge 2f+1$ TG-Algs ($\widetilde{GEQ}_o$ true), thereby enabling $\widetilde{THGEQ}_o$, or (b) a greater number of ticks have been seen from $\ge f+1$ TG-Algs ($\widetilde{GR}_o$ true), thereby enabling $\widetilde{THGR}_o$. Note that condition (a) ensures that the odd tick $k$ has been seen from $\ge 2f+1$ remote TG-Algs, whereas (b) guarantees that the even tick $k+1$ has already been seen from $\ge f+1$ of them.
• Generate an odd tick $k+1 \in \mathbb{N}_{odd}$ if the last tick generated was even ($k \in \mathbb{N}_{even}$) and either (a) the same or a greater number of ticks have been seen from $\ge 2f+1$ TG-Algs ($\widetilde{GEQ}_e$ true), thereby enabling $\widetilde{THGEQ}_e$, or (b) a greater number of ticks have been seen from $\ge f+1$ TG-Algs ($\widetilde{GR}_e$ true), thereby enabling $\widetilde{THGR}_e$. Again, condition (a) ensures that the even tick $k$ has been seen from $\ge 2f+1$ remote TG-Algs, whereas (b) guarantees that the odd tick $k+1$ has already been seen from $\ge f+1$ of them.
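Stripped of all timing, the two rules reduce to a simple decision over remote tick counts. The following Python sketch is only a discrete abstraction, not the clockless circuit: it assumes that each +/− counter can be summarized by a hypothetical integer (the number of ticks seen so far from that remote TG-Alg) and ignores pipes, glitches and delays. The function names `next_tick_enabled` and `enabling_pair` are illustrative.

```python
from __future__ import annotations


def next_tick_enabled(last_tick: int, remote_counts: list[int], f: int) -> bool:
    """Abstract enabling condition for tick last_tick + 1.

    remote_counts[i] is the (hypothetical) number of ticks seen so far
    from the i-th remote TG-Alg; timing, glitches and pipes are ignored.
    """
    k = last_tick
    # (a) "same or greater number of ticks": active GEQ signals for tick k
    geq_active = sum(1 for c in remote_counts if c >= k)
    # (b) "greater number of ticks": active GR signals, tick k + 1 already seen
    gr_active = sum(1 for c in remote_counts if c > k)
    return geq_active >= 2 * f + 1 or gr_active >= f + 1


def enabling_pair(last_tick: int) -> str:
    """Which duplicated signal pair enables the next tick.

    After an odd tick k, the even tick k + 1 is enabled via GEQ_o/GR_o
    (THGEQ_o/THGR_o); after an even tick, via GEQ_e/GR_e.
    """
    return "o" if last_tick % 2 == 1 else "e"


if __name__ == "__main__":
    # f = 1 with n = 3f + 2 = 5 nodes, i.e. four remote TG-Algs
    print(next_tick_enabled(3, [3, 3, 3, 2], f=1))  # True: rule (a) fires
    print(next_tick_enabled(3, [4, 4, 2, 2], f=1))  # True: rule (b) fires
    print(next_tick_enabled(3, [4, 2, 2, 2], f=1))  # False: neither fires
```

The parity split does not change the decision itself; it only selects which copy of the signals is consulted, which is what allows stale status based on tick k − 1 to be told apart from fresh status based on tick k.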
Glitches⁴ caused by the non-simultaneous arrival of clock signal transitions from the peers are masked by simply ignoring the output of the threshold circuits (say, $\widetilde{THGR}_o$ and $\widetilde{THGEQ}_o$) that generated the even tick $k$ when the next tick $k+1$ to be generated is odd⁵. This "gap" allows $\widetilde{GR}_o$ and $\widetilde{GEQ}_o$ (which were responsible for activating $\widetilde{THGR}_o$ or $\widetilde{THGEQ}_o$) for tick $k$ to first become inactive and then become active again for tick $k+2$. As Lemma 3 will reveal, this is sufficient to avoid mixing up old and new instances of $\widetilde{GR}_o$ (respectively $\widetilde{GEQ}_o$).
4 A glitch is a wrong state transition.
Fig. 4 Tick generation by splitting up in even and odd ticks
As an example, consider Fig. 4. It depicts a trace of the signal $\widetilde{THGR}_e \vee \widetilde{THGEQ}_e$, responsible for generating odd ticks, as well as of the signal $\widetilde{THGR}_o \vee \widetilde{THGEQ}_o$, responsible for generating even ticks: (1) At the beginning of a new even tick generation, both signals are inactive. (2) $\widetilde{THGR}_o$ or $\widetilde{THGEQ}_o$ becomes active due to the arrival of sufficiently many remote ticks, thereby enabling the signal $\widetilde{THGR}_o \vee \widetilde{THGEQ}_o$ that generates an even tick $k$. (3) Local tick $k$ starts arriving at p's +/− counters; each counter that receives it disables its $\widetilde{GEQ}_o$ and $\widetilde{GR}_o$ signals, which eventually disables $\widetilde{THGR}_o$ and $\widetilde{THGEQ}_o$. Note that $\widetilde{THGR}_o \vee \widetilde{THGEQ}_o$ may intermittently become enabled again, due to the arrival of remote ticks from other TG-Algs. (4) After some time, all of p's +/− counters have received the local tick $k$, and $\widetilde{THGR}_o \vee \widetilde{THGEQ}_o$ has stabilized at inactive. Now the next odd tick $k+1$ is generated as soon as $\widetilde{THGR}_e \vee \widetilde{THGEQ}_e$ becomes active.
How to implement the +/− counters? This task turned out to be the most delicate part of the hardware design work. Implementing a clockless up/down counter is inherently difficult because the up-clock ("+ port") and the down-clock ("− port") transitions are totally unrelated. They can hence occur arbitrarily close in time to each other, which usually causes metastability problems [77]. Another problem is how to correctly generate the status of $\widetilde{GR}_o$ and $\widetilde{GEQ}_o$ (respectively $\widetilde{GR}_e$ and $\widetilde{GEQ}_e$): these signals should truly reflect the current counter value, at least during the times when they are used. We will specify the detailed properties that these signals must maintain in Sect. 4. Note that our correctness proof and performance analysis rely only on these properties, i.e., they are valid for every implementation of the +/− counters that fulfills these properties.
5 Of course, no further glitches may occur as soon as this next tick has occurred.
Our particular implementation of the +/− counters consists of two elastic pipelines [77], which can be seen as shift registers/FIFO buffers for signal transitions. One of the elastic pipelines is attached to the remote clock signal ("+ port"), the other one is fed by the local clock signal ("− port"). They are fitted together at their ends via a special Diff-Gate, which removes "matching" transitions, that is, transitions representing the same tick number, as soon as they have traveled through the pipelines. The signals with status $\widetilde{GR}_o$, $\widetilde{GEQ}_o$, $\widetilde{GR}_e$ and $\widetilde{GEQ}_e$ are derived by monitoring the last few stages of both pipes. Further details are provided in Sect. 4 and in the descriptions of our implementations [20,24,27].
4 Modeling and analysis framework
In this section, we introduce a modeling framework for fault-tolerant distributed algorithms that are implemented by means of clockless digital circuits, which is amenable to mathematical correctness proofs and worst-case performance analysis. The presented framework addresses the multitude of issues raised in the previous section: it is based on a continuous model of computation and time, and avoids the use of design elements and abstractions that are not available or too costly at the hardware implementation level. To handle the design complexity challenge at such low levels of abstraction, it also supports hierarchical modeling: at the top level of DARTS, for example, there is the entire system, consisting of n TG-Algs interconnected via the clock signal wires making up the TG-Net; every TG-Alg can be further partitioned into several building blocks (like the +/− counters), which are interconnected in some non-regular way. Before we formally state the framework, we give an informal overview of its main ingredients.
Our modeling framework is based on modules, which pos-
sess input and output ports. An execution of a module’s ports
is an assignment of a signal (which captures continuous com-
putation over time) to each of the module’s ports. A module’s
allowed behavior is specified by a set of executions of the
module's ports. Note that modules differ from classic distributed computing abstractions like timed automata [45] primarily in that they continuously compute their outputs.
Compound modules consist of multiple sub-modules and
their interconnect, which specifies how sub-module ports are
connected to each other and to the module’s input and out-
put ports. The interconnect specification itself assumes zero
delays; modeling non-zero interconnect delays, e.g., for real
wires, requires intermediate channels: A channel is a module
that possesses a single input port and a single output port, and
its behavior specifies delayed FIFO delivery of input port sig-
nal transitions at the output port. Modules that are not further
refined are called basic modules. Elementary basic modules
are those that calculate zero-delay Boolean functions (AND,
OR, …) and channels.
Clearly, the behavior of a (non-faulty) compound module
is determined by the behavior of its constituent sub-mod-
ules; the behavior of a basic module must be given a priori.
Correctness proofs establish properties of the behaviors of
higher-level compound modules, based on the assumption
that (1) the system and failure model holds, and (2) the implementations of (non-faulty) basic modules indeed satisfy their behavioral specifications.
4.1 Signals and zero-bit message channels
Since we target implementations using clockless circuits, our formal framework is based on a continuous notion of real-time $t \in \mathbb{R}^+_0$. We assume that the system initialization (reset) is triggered at time $t = 0$; different modules may, however, complete their reset at different times.
Signal: A signal S is an event trace, i.e., a set of time/value tuples. Formally,
$S \subseteq \mathbb{R}^+_0 \times \{0, 1\}$,
where event $(t, 1) \in S$ respectively $(t, 0) \in S$ means that S takes on value 1 respectively 0 at time t. We require non-simultaneity of contradicting events on a single signal, i.e.,
$((t, x) \in S) \wedge ((t, y) \in S) \Rightarrow x = y$,
and assume that the initial event $(0, I)$, with either $I = 0$ or $I = 1$, is always present in S. We also disallow alternating Zeno behavior in our event traces, i.e., we require that at most finitely many events with different values can occur in any finite time interval, cp. [45, p. 737f]. Note, however, that S may still contain arbitrarily many idempotent events.⁶
Consequently, if
$\text{pre}(S, t) := \{(t', v') \in S \mid t' \le t\}$
$\text{suff}(S, t) := \{(t', v') \in S \mid t' \ge t\}$
denote the prefix and suffix of S at time t, respectively, there need not be a maximum element $(t_{max}, v_{max})$ (with respect to the time component) in $\text{pre}(S, t)$, nor a minimum element $(t_{min}, v_{min})$ in $\text{suff}(S, t)$. However,
$\text{last-val}(S, t) := v'$ such that $\exists (t', v') \in \text{pre}(S, t): \forall (t'', v'') \in \text{pre}(S, t): (t'' \ge t') \Rightarrow (v'' = v')$
is well-defined.⁷
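For a finite event trace, the definitions of pre, suff and last-val can be transcribed almost literally. The Python sketch below assumes traces are stored as finite sets of (time, value) tuples, so Zeno traces (the reason last-val needs care) cannot even be represented; the function names are illustrative and the example trace is the one used for the counting function below.

```python
from typing import Set, Tuple

Event = Tuple[float, int]  # (time, value) tuple of an event trace


def pre(S: Set[Event], t: float) -> Set[Event]:
    """pre(S, t): all events at or before time t."""
    return {(tp, v) for (tp, v) in S if tp <= t}


def suff(S: Set[Event], t: float) -> Set[Event]:
    """suff(S, t): all events at or after time t."""
    return {(tp, v) for (tp, v) in S if tp >= t}


def last_val(S: Set[Event], t: float) -> int:
    """Literal transcription of last-val(S, t) for a finite trace.

    Returns v1 of some event (t1, v1) in pre(S, t) such that every event
    of pre(S, t) at or after t1 carries the same value v1.
    """
    p = pre(S, t)
    for t1, v1 in sorted(p):
        if all(v2 == v1 for (t2, v2) in p if t2 >= t1):
            return v1
    raise ValueError("pre(S, t) is empty; the initial event (0, I) is missing")


S = {(0.0, 0), (1.0, 1), (1.5, 1), (2.0, 0)}
print([last_val(S, t) for t in (0.0, 0.5, 1.0, 1.7, 2.0)])  # [0, 0, 1, 1, 0]
```

Note that the idempotent event (1.5, 1) does not affect the result, exactly as in the definition.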
Since our modeling framework is primarily devoted to "real" systems like DARTS, we restrict our attention to systems made up of well-formed circuits only. A well-formed circuit does not contain zero-delay wires, branches with infinite fan-in/out or other non-implementable assumptions. We say a signal S is well-formed if (i) it does not show alternating Zeno behavior and (ii) the function $t \mapsto \text{last-val}(S, t)$ is right-continuous. It can be shown that well-formed circuits never produce signals that are not well-formed, unless their input signals are not well-formed. Therefore, throughout this paper, we may safely assume that all signals by which the behavior of DARTS is modeled are well-formed. Well-formed signals allow for more abstract representations than just event traces, which we will now introduce.
In fact, specifying systems in terms of event traces is sometimes overly complicated. More convenient in this regard are two higher-level representations of signals: (i) status, and (ii) counting function. All three representations (event trace, status and counting function) are consistent in a well-defined way and can hence be used interchangeably.
Status: The status representation of a signal S, denoted by $\tilde{S}$, is a function
$\tilde{S}: \mathbb{R}^+_0 \to \{0, 1\}$
from real-time to its instantaneous Boolean value, defined by
$\forall t \ge 0: \tilde{S}(t) := \text{last-val}(S, t)$.
Since S is well-formed, the resulting $\tilde{S}(t)$ is obviously right-continuous.
Status functions may be composed out of already defined status functions by using arbitrary Boolean predicates; e.g., $\tilde{A} := \tilde{B} \wedge \tilde{C}$, with status functions $\tilde{B}$ and $\tilde{C}$, is defined as $\tilde{A}(t) := \tilde{B}(t) \wedge \tilde{C}(t)$.
6 This is the reason why we use the term “event” here, rather than the
more common term “transition”.
7 If alternating Zeno traces were not forbidden, one could construct
event traces S where last-val(S, t) is not well-defined. For instance, this
happens in a Zeno-trace with alternating 0, 1, 0, . . . events approaching,
but not reaching, some time t .
Counting function: Finally, a signal S can be represented by the number of non-idempotent events (excluding the initial event) that occur during $(0, t]$, denoted as the signal's counting function $\#S(t)$. For example, if S's event trace is given by $S = \{(0, 0), (1, 1), (1.5, 1), (2, 0)\}$ and $I = 0$, then $\#S(0) = 0$, $\#S(0.5) = 0$, $\#S(1) = 1$, $\#S(1.5) = 1$, and $\#S(2) = 2$. Sometimes, we will also employ generalized counting functions $\#S'(t)$ that have an initial value other than 0: we define $\#S'(t) := \#S(t) + S_0$, where $\#S$ is the standard counting function of S and $S_0$ an arbitrary offset. It follows immediately from the properties of signals that $t_1 \le t_2 \Rightarrow \#S(t_1) \le \#S(t_2)$ for any counting function $\#S$, and that $\#S$ is right-continuous at any point of discontinuity, since S is well-formed.
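The worked example above can be reproduced mechanically. The following Python sketch, again assuming finite traces stored as sets of (time, value) tuples and using illustrative function names, computes $\#S(t)$ (optionally with an offset $S_0$) and recovers the status from the counting function, since every counted event flips the signal's value.

```python
from typing import Set, Tuple

Event = Tuple[float, int]  # (time, value)


def count(S: Set[Event], t: float, offset: int = 0) -> int:
    """#S(t), plus an optional offset S0 for a generalized counting function.

    Counts the non-idempotent events in (0, t]; the initial event (0, I)
    and idempotent events (no value change) are not counted.
    """
    events = sorted(S)
    value = events[0][1]  # initial value I from the event (0, I)
    n = 0
    for te, v in events[1:]:
        if te > t:
            break
        if v != value:  # value change: a non-idempotent event
            n += 1
            value = v
    return n + offset


def status_from_count(initial: int, n: int) -> int:
    """Status recovered from the counting function: each counted event flips the value."""
    return (initial + n) % 2


S = {(0.0, 0), (1.0, 1), (1.5, 1), (2.0, 0)}
print([count(S, t) for t in (0.0, 0.5, 1.0, 1.5, 2.0)])  # [0, 0, 1, 1, 2]
```

The second function illustrates why the representations are interchangeable for well-formed signals: given the initial value I, the status and the counting function determine each other.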
In the sequel, we will interchangeably use the most convenient representation of a signal S, namely S itself, $\tilde{S}$ or $\#S$.
Execution: We begin with the formal definition of a system. A system is a set P of ports, whereby a port can be thought of as a measurement point on a digital chip. An execution (of ports P) is a function that assigns each $p \in P$ a signal $\hat{p}$. To avoid cluttered notation, we simply write $\tilde{p}$ respectively $\#p$ when we refer to the abstract signal representations $\tilde{\hat{p}}$ respectively $\#\hat{p}$. To specify a system's allowed executions in a convenient way, modules are introduced: a module is a triple comprising (i) a set of input ports $\mathcal{I}$, (ii) a set of output ports $\mathcal{O}$, and (iii) a set of allowed executions $\mathcal{E}$ of ports $\mathcal{I} \cup \mathcal{O}$. It is in the specification of $\mathcal{E}$ that the convenience of a threefold representation of signals comes into play; it will be used extensively in Sect. 4.3. For example, to specify a module with no input port and a single output port o that produces a constant-0 signal at o, we simply require that $\mathcal{E}$ comprises all executions of ports $\{o\}$ that fulfill $\tilde{o} = 0$.
The system's allowed executions can thus be specified by stating a set of ports P together with a set of modules that have input/output ports in P.
Channel: A channel models a reliable FIFO channel for signal transitions with finite delay. Since signal transitions must be alternating, only the occurrence time but no data can be conveyed over a single channel ("zero-bit messages"). Formally, the semantics of a channel X is as follows: let $X_s$ be the channel's single input port [which will be connected to an output port of a single sender module], and $X_r$ be its single output port [which will be connected to the input ports of some receiver module(s)]. Intuitively, a channel maps events at the input port occurring at some time t to (delayed) events at the output port occurring at some delivery time $t'$, where the delay $t' - t$ is not necessarily the same for each event if the channel has non-constant delay. Formally, we demand that there exists a continuous and strictly monotonically increasing delivery function $\delta: \mathbb{R}^+_0 \to \mathbb{R}^+_0$ for X, which maps sending time t to delivery time $\delta(t)$. Note that we will use $\delta(X; t)$ to refer to $\delta(t)$ when the corresponding channel X is not clear from the context. The delay of channel X is bounded by $[\tau^-_X, \tau^+_X]$, i.e., for all times t in $\mathbb{R}^+_0$,
$\delta(t) - t \in [\tau^-_X, \tau^+_X]$. (1)
From the properties of δ, it follows immediately that δ is a bijection from $\mathbb{R}^+_0$ to its image $\delta(\mathbb{R}^+_0)$. More specifically, δ maps every closed interval $[t_1, t_2]$ bijectively to the closed interval $[\delta(t_1), \delta(t_2)]$. Clearly, the inverse function $\delta^{-1}$ of δ also exists and has the same properties. In addition, we assume that the channel output has some well-defined initial state (is initialized to) $I \in \{0, 1\}$, which is $I = 0$ if not specified otherwise.
Given δ, the channel's behavior in terms of event traces is specified by two properties. First, we demand that
$(0, I) \in \hat{X}_r \;\wedge\; \bigl(((t, v) \in \hat{X}_r \wedge t \in [0, \delta(0))) \Rightarrow (t, v) = (0, I)\bigr)$,
which ensures that before the first event is delivered from the input port to the output port, i.e., before $\delta(0)$, only one event occurs at the output port: the event that sets the channel output port to its initial value I. Second, we demand that
$(\delta(t), x) \in \hat{X}_r \Leftrightarrow (t, x) \in \hat{X}_s$. (2)
Since δ carries over the total order of the events $(t, x)$ in $\hat{X}_s$ to the events $(\delta(t), x)$ in $\hat{X}_r$ [called matching events in the sequel], it follows immediately that $\hat{X}_r$ is an event trace.
A more abstract specification of a channel in terms of status is
$\forall t \in [0, \delta(0)): \tilde{X}_r(t) := \tilde{X}_r(0) = I$ and
$\forall t \ge 0: \tilde{X}_r(\delta(t)) := \tilde{X}_s(t)$.
Note that this definition is consistent in the sense that an execution fulfilling the event trace specification above also fulfills the abstract status definition.
When considering the counting functions of a constant-delay channel's ports, we observe that the output counting function is obtained by shifting the input counting function in time by the constant delay, say $\tau_X$, obtaining $\#X_r(t + \tau_X) = \#X_s(t)$. For non-constant delay channels, this equality does not hold in general, but has to be replaced by the inequalities (Pmax) and (Pmin) of the following lemma, which summarizes important channel properties:
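Lemma 1 If X is a channel with sending port $X_s$, receiving port $X_r$ and delay in $[\tau^-_X, \tau^+_X]$, then the following properties hold:
(Ps) $t_1 \le t_2 \Rightarrow \#X_s(t_1) \le \#X_s(t_2)$
(Pr) $t_1 \le t_2 \Rightarrow \#X_r(t_1) \le \#X_r(t_2)$
(Pmax) $\#X_r(t + \tau^+_X) \ge \#X_s(t)$
(Pmin) $\#X_r(t + \tau^-_X) \le \#X_s(t)$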
Fig. 5 Fault-containment region for TG-Alg p
Proof (Ps) and (Pr) follow immediately from the definition of a counting function and the fact that both $\hat{X}_s$ and $\hat{X}_r$ are event traces. (Pmax) and (Pmin) follow from the fact that δ bijectively maps the (non-idempotent) events occurring in $\hat{X}_s$ by time t to the (non-idempotent) events occurring in $\hat{X}_r$ by time $\delta(t)$, where $\delta(t) \in [t + \tau^-_X, t + \tau^+_X]$ by (1), together with (Pr).
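Lemma 1 can be sanity-checked numerically on a concrete channel. The Python sketch below is only an illustration under stated assumptions: the delay bounds (1.0 and 1.5) and the delivery function are made-up examples chosen to be continuous, strictly increasing and within the bounds, and checking (Pmax)/(Pmin) at sample times does not replace the proof.

```python
import math
from typing import Iterable, Set, Tuple

Event = Tuple[float, int]  # (time, value)

# Assumed example bounds and delivery function (not taken from DARTS).
TAU_MINUS, TAU_PLUS = 1.0, 1.5


def delta(t: float) -> float:
    """Continuous, strictly increasing delivery function.

    delta(t) - t = 1 + 0.25 * (1 + sin t) lies in [1.0, 1.5], and the
    derivative 1 + 0.25 * cos t >= 0.75 > 0 keeps delta strictly increasing.
    """
    return t + TAU_MINUS + (TAU_PLUS - TAU_MINUS) * 0.5 * (1.0 + math.sin(t))


def deliver(xs: Set[Event], init: int = 0) -> Set[Event]:
    """Output trace per (2): initial event (0, I) plus all matching events."""
    return {(0.0, init)} | {(delta(t), x) for (t, x) in xs}


def count(S: Set[Event], t: float) -> int:
    """Counting function: non-idempotent events in (0, t]."""
    events = sorted(S)
    value, n = events[0][1], 0
    for te, v in events[1:]:
        if te > t:
            break
        if v != value:
            n, value = n + 1, v
    return n


def check_lemma1(xs: Set[Event], times: Iterable[float]) -> bool:
    """Check (Pmax) and (Pmin) of Lemma 1 at the given sample times."""
    xr = deliver(xs)
    for t in times:
        assert count(xr, t + TAU_PLUS) >= count(xs, t)   # (Pmax)
        assert count(xr, t + TAU_MINUS) <= count(xs, t)  # (Pmin)
    return True


xs = {(0.0, 0), (0.3, 1), (0.9, 0), (2.0, 1), (2.1, 0)}
print(check_lemma1(xs, [i * 0.05 for i in range(100)]))
```

Because the delay varies with t, $\#X_r$ is not a pure time shift of $\#X_s$; the two inequalities only bracket it, which is exactly what the lemma states.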
Having introduced the basics of our formal framework, we can now define our system model. A DARTS system consists of a set P of $n := |P|$ top-level modules, where $n \in \mathbb{N}$. The top-level modules will interchangeably be called TG-Alg or node, and are usually denoted by letters p, q, etc. Every TG-Alg p in P has exactly one output port with the counting function $\#b_p(t)$, where it broadcasts its clock, and one input port per remote TG-Alg $q \in P \setminus \{p\}$ with the counting function $\#r^{rem}_{p,q}(t)$, where it receives q's clock. We assume a fully connected system, i.e., from every TG-Alg p to every TG-Alg $q \in P \setminus \{p\}$, there is a channel 〈REM, p, q〉 with input $\#b_p(t)$, output $\#r^{rem}_{q,p}(t)$, and delay in $[\tau^-_{rem}, \tau^+_{rem}]$. Figure 5 shows the resulting outbound channels of TG-Alg p.
The following notation will be used throughout the paper: for a function f, let $f(t^{\to})$ be its left limit at time t, i.e., $f(t^{\to}) := \lim_{\xi \to t,\, \xi < t} f(\xi)$.
For any $k \ge 1$, we say that node p sends tick k at time $t_{p,k}$ if the kth event (without counting idempotent events) occurs at $t_{p,k}$. In terms of counting functions, $t_{p,k}$ is the time for which $\#b_p(t_{p,k}) = \#b_p(t^{\to}_{p,k}) + 1 = k$. The time when the first (respectively the last) correct node sends tick k is denoted by $t_{first,k}$ (respectively $t_{last,k}$); note that the node that is the first (respectively last) one to send tick k may be different for different k.
Failure model: Since hardware faults easily lead to Byzantine failures [30], we assume this failure semantics here. The adverse power of Byzantine failures in our context lies in the ability of a faulty node to generate wrong clock ticks (early timing failures or even spurious ticks) that are perceived inconsistently at different remote nodes. Such failures could be the consequence of manufacturing defects or electrostatic breakdown [40], particle hits [4,57], or electromagnetic noise [50], which may affect any module in a TG-Alg. Due to different wire lengths and signal-level detection thresholds, such faults typically propagate differently to different receivers. Note that we even allow faulty nodes to create metastability [41], but we must assume that metastability cannot propagate beyond FCRs (see below); we already have some convincing evidence [25] that this is ensured, with large probability, by the elastic pipelines in the +/− counters.
We partition our system into multiple fault-containment
regions (FCRs), i.e., sets of (sub-)modules that are potentially
affected by a single fault like a particle hit and thus cannot be
assumed to fail independently. More specifically, we define
FCR p to consist of the single TG-Alg p together with all its outgoing channels, as depicted in Fig. 5. If FCR p is faulty, any of its (sub-)modules may behave arbitrarily (Byzantine).⁸
Since every FCR is associated with exactly one TG-Alg, we
will also use these terms interchangeably.
Throughout the paper, let C be the set of correct FCRs, and F, with $f := |F|$, the set of faulty FCRs. Clearly $P = C \cup F$ and $C \cap F = \emptyset$, i.e., C and F form a partition of P. We will prove that correct nodes behave as specified in Sect. 4.4 in the presence of up to f Byzantine faulty FCRs, provided that the total number of nodes is $n \ge 3f + 2$. Note that this is slightly more than the required lower bound of $n \ge 3f + 1$ for clock synchronization [12], but facilitates considerably better precision and accuracy (attained by counting only remote messages when calculating the $f + 1$ respectively $2f + 1$ thresholds; including self-reception would lead to $\tau^-_{rem} = \tau^-_{loc}$ in Theorem 2 and hence spoil the achievable worst-case precision). If one does not want to add an extra node to the required $3f + 1$ nodes, an alternative is to add self-reception and to artificially increase its delay, e.g., by feeding the signal through inverter chains, so that $\tau^-_{loc}$ becomes of the order of the remote delays. Note that a $(3f + 2)$-node system without self-reception and a $(3f + 1)$-node system with self-reception have about the same overall size for reasonably small f, since the extra digital logic needed for self-reception at $3f + 1$ nodes is about the size of a node without self-reception. This allows the designer to choose between a system with a slightly larger number of nodes and a system which is slightly more resilient, without changing the overall system size. While we focus throughout this paper on the solution using $3f + 2$ nodes without self-reception, the algorithm and its correctness proof can be adapted to the alternative solution in a straightforward manner.
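The resilience trade-off above is simple arithmetic. The Python sketch below encodes the two bounds stated in the text ($n \ge 3f + 2$ without self-reception, $n \ge 3f + 1$ with delayed self-reception); the function names are illustrative only.

```python
def min_nodes(f: int, self_reception: bool = False) -> int:
    """Nodes needed to tolerate f Byzantine FCRs.

    Without self-reception: n >= 3f + 2 (the variant analyzed in this paper).
    With (artificially delayed) self-reception: n >= 3f + 1.
    """
    return 3 * f + (1 if self_reception else 2)


def max_faults(n: int, self_reception: bool = False) -> int:
    """Largest f with min_nodes(f) <= n, or -1 if not even f = 0 is met."""
    return (n - (1 if self_reception else 2)) // 3


for n in (4, 5, 7, 8):
    print(n, max_faults(n), max_faults(n, self_reception=True))
```

For example, four nodes tolerate one Byzantine FCR only with delayed self-reception, whereas five nodes tolerate one without it, which is the extra node discussed above.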
Booting: We assume that the whole system is simultaneously reset at time $t = 0$. However, we allow the modules to complete booting at (slightly) different times: if $t_{p,b}$ denotes the time by which all modules in a correct FCR p have completed booting, we require that $t_{p,b} \in [0, B]$ for some constant $B \le \tau^-_{rem}$. The latter condition ensures that messages sent by p are never lost at a correct node q because of late booting.
8 In terms of standard failure models for distributed algorithms, our assumption corresponds to mapping link failures to sender node failures.
Fig. 6 Architecture of node p
4.3 TG-Alg architecture and module specification
In this section, we describe the architecture of a TG-Alg, i.e., its sub-modules and interconnect, and formally specify their behaviors. It is important to note here that the behavioral properties defined in this subsection are assumed properties, i.e., basic properties that must be guaranteed a priori by the implementation of the modules. (Modules in FCRs hit by a failure may, however, deviate arbitrarily from their correct behavior.) Based on these basic properties, the correctness proofs provided in Sect. 5 will show that the system of TG-Algs maintains the system-level properties (precision, accuracy and bounded pipe size) specified in Sect. 4.4.
Figure 6 shows the general architecture of TG-Alg p, cp. Fig. 3. It consists of one +/− counter module per remote TG-Alg (only two are depicted), four threshold modules implementing the $f + 1$ and $2f + 1$ rules in Fig. 2, and a tick broadcast module that finally generates p's clock ticks $\#b_p(t)$. Every +/− counter is refined into several additional sub-modules: a pair of elastic pipes (remote pipe, local pipe) that form FIFO buffers for (remote, local) clock ticks, a Diff-Gate module that removes matching remote and local ticks from the pipes, and a Pipe Compare Signal Generator (PCSG) module that generates status signals reflecting the difference between the numbers of ticks present in the local and remote pipes.
We will now specify the ports and the behavior of all these
modules in detail.
(i) Pairs of elastic pipes: Every TG-Alg p incorporates $n - 1$ pairs of elastic pipelines [77], each of which corresponds to a single remote TG-Alg $q \in P \setminus \{p\}$. We will denote the pair of pipes at p corresponding to q by (p, q) in the sequel. (p, q) consists of a remote pipeline that can store up to $S_{rem}$ clock ticks sent by q, and a local pipeline that can hold up to $S_{loc}$ clock ticks sent locally by p. Note that the numbers $S_{rem}$ and $S_{loc}$ are implementation parameters that have to be chosen in accordance with Theorems 4 and 6; in the specifications of this section, they are just assumed to be unbounded (= always sufficiently large).
The local pipe in (p, q) has a single input port that is fed by TG-Alg p's local clock ticks, i.e., $\#b_p(t)$, supplied via the channel 〈LOC, p, q〉, and a single output port represented by the counting function $\#r^{loc}_{p,q}(t)$. Similarly, the remote pipe in (p, q) has a single input port that is fed by TG-Alg q's local clock ticks supplied via the channel 〈REM, q, p〉, cp. Fig. 5, and a single output port represented by the counting function $\#r^{rem}_{p,q}(t)$.
We say that p receives tick k from the remote node q at time t, or, equivalently, that remote tick k is received in the pipe pair (p, q) at p, if $\#r^{rem}_{p,q}(t) = \#r^{rem}_{p,q}(t^{\to}) + 1 = k$. Analogously, we say that local tick k is received in the pipe pair (p, q) at p if $\#r^{loc}_{p,q}(t) = \#r^{loc}_{p,q}(t^{\to}) + 1 = k$.
Behavioral description: Both pipes in the pair (p, q) have the behavior of a zero-delay⁹ channel: for any $t \ge 0$, $\#r^{loc}_{p,q}(t)$ respectively $\#r^{rem}_{p,q}(t)$ gives the number of clock ticks that reached the end of the local respectively remote pipe by time t. Upon reset, both pipes are pre-filled with the virtual tick 0 (such a tick is called virtual since it has never been sent), and $\#r^{rem}_{p,q}(t_{p,b}) = \#r^{loc}_{p,q}(t_{p,b}) := 0$.
(ii) Diff-Gate module: To avoid pipes with infinite capacity, each pair of pipes is equipped with a special Diff-Gate circuit that removes matching clock ticks, i.e., clock ticks contained in both pipes. However, it deletes matching ticks only if at least one tick remains in both the local and the remote pipe.
The Diff-Gate for (p, q) has two input ports connected to $\#r^{rem}_{p,q}(t)$ and $\#r^{loc}_{p,q}(t)$, and a single output port represented by the counting function $\#d_{p,q}(t)$, which gives the largest tick number that has been removed from both the remote and local pipe of (p, q) by time t. To allow removal of the virtual tick 0, the initial value is $\#d_{p,q}(t_{p,b}) = -1$.
Behavioral description: Recall that virtual tick 0 shows up at the output of the local and the remote pipe at booting completion time $t_{loc,0} = t_{p,b}$.
Ticks are removed from the pipes as follows:
• $k = 0$: If
– tick $k + 1$ shows up at the output $\#r^{rem}_{p,q}(t_{rem,k+1})$ of the remote pipe of (p, q) at time $t_{rem,k+1}$, and
– tick $k + 1$ shows up at the output $\#r^{loc}_{p,q}(t_{loc,k+1})$ of the local pipe of (p, q) at time $t_{loc,k+1}$,
then tick $k = 0$ is removed at some time $t_{rmv,k}$ within
$\max\{t_{rem,k+1}, t_{loc,k+1}\} + [\tau^-_{Diff}, \tau^+_{Diff}]$.
• $k \ge 1$: If
– tick $k + 1$ shows up at the output $\#r^{rem}_{p,q}(t_{rem,k+1})$ of the remote pipe of (p, q) at time $t_{rem,k+1}$,
– tick $k + 1$ shows up at the output $\#r^{loc}_{p,q}(t_{loc,k+1})$ of the local pipe of (p, q) at time $t_{loc,k+1}$, and
– tick $k - 1$ has been removed at time $t_{rmv,k-1}$,
then tick k is removed at some time $t_{rmv,k}$ within
$\max\{t_{rem,k+1}, t_{loc,k+1}, t_{rmv,k-1}\} + [\tau^-_{Diff}, \tau^+_{Diff}]$.
Here, $\tau^-_{Diff}$ and $\tau^+_{Diff}$ are timing parameters of the Diff-Gate that specify how fast it removes matching ticks.
9 Actually, the pipe delays are accounted for in the delays of the real channels in the signal path, namely, 〈REM, q, p〉 and 〈LOC, p, q〉.
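Because each removal time lies in $\max\{\dots\} + [\tau^-_{Diff}, \tau^+_{Diff}]$ and max is monotone, the earliest and latest admissible removal times can be computed recursively. The Python sketch below does this under the assumption that the times at which ticks show up at the pipe outputs are given; it returns bounds only, not the actual removal times of a particular execution, and the function name `removal_windows` is illustrative.

```python
from typing import List, Sequence, Tuple


def removal_windows(t_rem: Sequence[float], t_loc: Sequence[float],
                    tau_minus: float, tau_plus: float) -> List[Tuple[float, float]]:
    """Earliest/latest admissible removal time of ticks 0, 1, 2, ...

    t_rem[j] / t_loc[j]: time at which tick j shows up at the output of the
    remote / local pipe (index 0 belongs to virtual tick 0 and is unused).
    Tick k is removable only once tick k + 1 is in both pipes and, for
    k >= 1, tick k - 1 has been removed; removal then takes between
    tau_minus and tau_plus.
    """
    windows: List[Tuple[float, float]] = []
    removable = min(len(t_rem), len(t_loc)) - 1
    for k in range(removable):
        ready = max(t_rem[k + 1], t_loc[k + 1])
        lo, hi = ready, ready
        if k >= 1:
            prev_lo, prev_hi = windows[k - 1]
            lo, hi = max(lo, prev_lo), max(hi, prev_hi)
        windows.append((lo + tau_minus, hi + tau_plus))
    return windows


print(removal_windows([0.0, 1.0, 2.0, 3.0], [0.0, 1.25, 1.75, 3.5], 0.25, 0.5))
# [(1.5, 1.75), (2.25, 2.5), (3.75, 4.0)]
```

With a large $\tau^+_{Diff}$, a slow removal of tick k − 1 can dominate the window of tick k even if tick k + 1 arrived long before, which is why the $t_{rmv,k-1}$ term appears in the rule.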
On top of $\#r^{rem}_{p,q}(t)$, $\#r^{loc}_{p,q}(t)$ and $\#d_{p,q}(t)$, we define the size of the local and remote pipe of (p, q) at time t as
$\#s^{loc}_{p,q}(t) := \#r^{loc}_{p,q}(t) - \#d_{p,q}(t)$
$\#s^{rem}_{p,q}(t) := \#r^{rem}_{p,q}(t) - \#d_{p,q}(t)$.
Note that our definitions imply that a tick occupies space in a pipe only after it has reached the pipe's end, i.e., when it shows up in $\#r^{loc}_{p,q}$ respectively $\#r^{rem}_{p,q}$.
(iii) Pipe Compare Signal Generator (PCSG) module: The signals provided by the pair of pipes (p, q) and its Diff-Gate are connected to the PCSG, which generates four signals with status $\widetilde{P}^{GEQ,o}_{p,q}(t)$, $\widetilde{P}^{GEQ,e}_{p,q}(t)$, $\widetilde{P}^{GR,o}_{p,q}(t)$ and $\widetilde{P}^{GR,e}_{p,q}(t)$ that characterize the difference between the numbers of clock ticks stored in the remote and local pipes at time t. Different signals are provided for odd and even clock ticks. For example, status $\widetilde{P}^{GEQ,o}_{p,q}(t)$ signals when the number of remote clock ticks is greater than or equal to the number of local clock ticks, provided that the last clock tick that showed up in the local pipe was odd; $\widetilde{P}^{GR,o}_{p,q}(t)$ does the same with "greater" replacing "greater or equal".
All these signals are fed, via dedicated channels that add some delay, to the threshold modules of TG-Alg p.
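Behavioral description: The signals generated by the PCSG associated with (p, q) must satisfy the following properties:
$\widetilde{P}^{GEQ,o}_{p,q}(t) := [\#r^{rem}_{p,q}(t) \ge \#r^{loc}_{p,q}(t)] \wedge [\#r^{loc}_{p,q}(t) \in \mathbb{N}_{odd}] \wedge [\#s^{loc}_{p,q}(t) = 1]$
$\widetilde{P}^{GEQ,e}_{p,q}(t) := [\#r^{rem}_{p,q}(t) \ge \#r^{loc}_{p,q}(t)] \wedge [\#r^{loc}_{p,q}(t) \in \mathbb{N}_{even}] \wedge [\#s^{loc}_{p,q}(t) = 1]$
$\widetilde{P}^{GR,o}_{p,q}(t) := [\#r^{rem}_{p,q}(t) > \#r^{loc}_{p,q}(t)] \wedge [\#r^{loc}_{p,q}(t) \in \mathbb{N}_{odd}] \wedge [\#s^{loc}_{p,q}(t) = 1]$
$\widetilde{P}^{GR,e}_{p,q}(t) := [\#r^{rem}_{p,q}(t) > \#r^{loc}_{p,q}(t)] \wedge [\#r^{loc}_{p,q}(t) \in \mathbb{N}_{even}] \wedge [\#s^{loc}_{p,q}(t) = 1]$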
Note that these signals need to be true only if the local pipe contains exactly one tick ($\#s^{loc}_{p,q}(t) = 1$), which makes it easier for an implementation to fulfill these properties.
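The four PCSG properties are plain Boolean functions of the three counting-function values at one instant. The Python sketch below evaluates them for one pipe pair; the argument names and dictionary keys are illustrative, and the channel delays towards the threshold modules are not modeled.

```python
from typing import Dict


def pcsg(r_rem: int, r_loc: int, d: int) -> Dict[str, bool]:
    """Status of the four PCSG outputs for pipe pair (p, q) at one instant.

    r_rem, r_loc: values of #r^rem_{p,q}(t) and #r^loc_{p,q}(t);
    d: value of #d_{p,q}(t), the last tick removed by the Diff-Gate.
    """
    s_loc = r_loc - d        # #s^loc_{p,q}(t), size of the local pipe
    one_local = s_loc == 1   # signals are only asserted for exactly one local tick
    odd = r_loc % 2 == 1     # parity of the last local tick
    return {
        "GEQ,o": r_rem >= r_loc and odd and one_local,
        "GEQ,e": r_rem >= r_loc and not odd and one_local,
        "GR,o": r_rem > r_loc and odd and one_local,
        "GR,e": r_rem > r_loc and not odd and one_local,
    }


# Right after booting: virtual tick 0 in both pipes, #d = -1.
print(pcsg(0, 0, -1))
```

Under these definitions, right after booting every pipe pair asserts $\widetilde{P}^{GEQ,e}_{p,q}$ (tick 0 is even and the local pipe holds exactly one tick), so the even-side threshold can fire and the first, odd tick can be generated.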
The above signals are fed into four dedicated channels that connect the PCSG with the threshold modules, all of which are initialized to 0:
• Channel 〈PGEQtoGEQ, o, p, q〉 with input $\widetilde{P}^{GEQ,o}_{p,q}(t)$, output $\widetilde{GEQ}^o_{p,q}(t)$ and delay in […].
• Channel 〈PGEQtoGEQ, e, p, q〉 with input $\widetilde{P}^{GEQ,e}_{p,q}(t)$, output $\widetilde{GEQ}^e_{p,q}(t)$ and delay in […].
• Channel 〈PGRtoGR, o, p, q〉 with input $\widetilde{P}^{GR,o}_{p,q}(t)$, output $\widetilde{GR}^o_{p,q}(t)$ and delay in […].
• Channel 〈PGRtoGR, e, p, q〉 with input $\widetilde{P}^{GR,e}_{p,q}(t)$, output $\widetilde{GR}^e_{p,q}(t)$ and delay in […].
(iv) Threshold modules: $\widetilde{GEQ}^{o/e}_{p,q}(t)$ and $\widetilde{GR}^{o/e}_{p,q}(t)$ are further processed at four threshold modules: if the number of active $\widetilde{GEQ}^{o/e}_{p,q}(t)$ respectively $\widetilde{GR}^{o/e}_{p,q}(t)$ signals reaches the threshold of $2f + 1$ respectively $f + 1$, the corresponding threshold output $\widetilde{THGEQ}^{o/e}_p(t)$ respectively $\widetilde{THGR}^{o/e}_p(t)$ becomes active after some delay. This property will be formalized below as a sub-module that continuously computes a Boolean predicate [which is a function of time here] involving the sum of the status functions of a threshold module's input ports, and feeds the result into a channel.
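Behavioral description: The threshold modules comprise sub-modules that have the threshold modules' input ports and whose outputs continuously compute the predicates
$\sum_{q \in Q} \widetilde{GEQ}^o_{p,q}(t) \ge 2f + 1$,
$\sum_{q \in Q} \widetilde{GEQ}^e_{p,q}(t) \ge 2f + 1$,
$\sum_{q \in Q} \widetilde{GR}^o_{p,q}(t) \ge f + 1$,
$\sum_{q \in Q} \widetilde{GR}^e_{p,q}(t) \ge f + 1$,
as well as successive channels, which are all initialized to 0 and have delay within […]:
• Channel 〈GEQtoTHGEQ, o, p〉, whose input port is the output computing the first predicate and whose output port is $\widetilde{THGEQ}^o_p$;
• Channel 〈GEQtoTHGEQ, e, p〉, whose input port is the output computing the second predicate and whose output port is $\widetilde{THGEQ}^e_p$;
• Channel 〈GRtoTHGR, o, p〉, whose input port is the output computing the third predicate and whose output port is $\widetilde{THGR}^o_p$;
• Channel 〈GRtoTHGR, e, p〉, whose input port is the output computing the fourth predicate and whose output port is $\widetilde{THGR}^e_p$.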
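Ignoring the channel delays, the four threshold predicates amount to counting active inputs. The Python sketch below evaluates them at a single instant; each argument is assumed to be the list of statuses of one signal family over all q in Q, and the function name and dictionary keys are illustrative.

```python
from typing import Dict, Sequence


def threshold_predicates(geq_o: Sequence[bool], geq_e: Sequence[bool],
                         gr_o: Sequence[bool], gr_e: Sequence[bool],
                         f: int) -> Dict[str, bool]:
    """Predicate values fed into the channels towards THGEQ^{o/e}_p and THGR^{o/e}_p.

    Each argument lists the statuses GEQ^o_{p,q}(t), ... for all q in Q at
    one instant; the channel delays in front of the outputs are ignored.
    """
    return {
        "THGEQ,o": sum(geq_o) >= 2 * f + 1,
        "THGEQ,e": sum(geq_e) >= 2 * f + 1,
        "THGR,o": sum(gr_o) >= f + 1,
        "THGR,e": sum(gr_e) >= f + 1,
    }


# f = 1, four remote TG-Algs
print(threshold_predicates([True, True, True, False], [False] * 4,
                           [True, False, False, False], [True, True, False, False], f=1))
```

Note that a single active GR input (as in the third argument) never suffices for $f \ge 1$, so one faulty peer alone cannot trigger the $f + 1$ rule.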
(v) Tick generation module: TG-Alg p generates the next clock tick at some time t when (i) both threshold outputs for the previously generated tick, say, $\widetilde{THGEQ}^o_p(t)$ and $\widetilde{THGR}^o_p(t)$, are inactive again, and (ii) at least one threshold output $\widetilde{THGEQ}^e_p(t)$ or $\widetilde{THGR}^e_p(t)$ for the current tick becomes active. We will refer to (i) as the disabling path and to (ii) as the enabling path in the sequel. The tick broadcast module hence has four input ports connected to the threshold outputs, and a single output port represented by the counting function $\#b_p(t)$, which is the number of ticks broadcast by p by time t. Finally, $\#b_p(t)$ is distributed to the local pipe in (p, q) at TG-Alg p and to the remote pipe in (q, p) at TG-Alg q, for all $q \in P \setminus \{p\}$, via the dedicated channels 〈LOC, p, q〉 and 〈REM, p, q〉, respectively.
Behavioral description: Let the signal $\hat{b}_p$ be the set for which
$(0, 0) \in \hat{b}_p$,
$(t, 0) \in \hat{b}_p \Leftrightarrow (\widetilde{THGEQ}^o_p(t) \vee \widetilde{THGR}^o_p(t)) \wedge \neg(\widetilde{THGEQ}^e_p(t) \vee \widetilde{THGR}^e_p(t))$,
$(t, 1) \in \hat{b}_p \Leftrightarrow (\widetilde{THGEQ}^e_p(t) \vee \widetilde{THGR}^e_p(t)) \wedge \neg(\widetilde{THGEQ}^o_p(t) \vee \widetilde{THGR}^o_p(t))$.
Then $\#b_p(t)$ is defined as the counting function of $\hat{b}_p$, with the initial value $\#b_p(0) = 0$.
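The event rule of the tick broadcast module can be illustrated on a sequence of sampled threshold outputs. The Python sketch below assumes a hypothetical discrete sampling of the four statuses (the real module acts in continuous time) and counts only non-idempotent events, as $\#b_p$ does; when both sides or neither side is active, no event is produced.

```python
from typing import Iterable, List, Tuple

# One sample = (THGEQ^o_p, THGR^o_p, THGEQ^e_p, THGR^e_p) at some instant.
Sample = Tuple[bool, bool, bool, bool]


def broadcast_counts(samples: Iterable[Sample]) -> List[int]:
    """#b_p after each sample, starting from the initial event (0, 0)."""
    value, ticks, counts = 0, 0, []
    for thgeq_o, thgr_o, thgeq_e, thgr_e in samples:
        odd_side = thgeq_o or thgr_o    # enables even ticks (value 0)
        even_side = thgeq_e or thgr_e   # enables odd ticks (value 1)
        if even_side and not odd_side:
            new = 1
        elif odd_side and not even_side:
            new = 0
        else:
            new = value  # both or neither side active: no event
        if new != value:  # only non-idempotent events are counted
            ticks, value = ticks + 1, new
        counts.append(ticks)
    return counts


trace = [(False, False, False, False), (False, False, True, False),
         (False, True, True, False), (False, True, False, False),
         (False, True, False, False), (False, True, True, False),
         (False, False, True, False)]
print(broadcast_counts(trace))  # [0, 1, 1, 2, 2, 2, 3]
```

The third and sixth samples show the disabling path at work: while the side that generated the previous tick is still active, the enabling side alone cannot produce the next tick.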
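(vi) Interconnect: The $n - 1$ channels 〈LOC, p, q〉 and the $n - 1$ channels 〈REM, p, q〉, one of each for every $q \in P \setminus \{p\}$, which distribute $\#b_p(t)$, are all initialized to 0 and adhere to the following specifications:
• Local channel 〈LOC, p, q〉 with input signal $\#b_p(t)$, output $\#r^{loc}_{p,q}(t)$ and delay in $[\tau^-_{loc}, \tau^+_{loc}]$.
• Remote channel 〈REM, p, q〉 with input signal $\#b_p(t)$, output $\#r^{rem}_{q,p}(t)$ and delay in $[\tau^-_{rem}, \tau^+_{rem}]$.
Recall that the channel delays also include the delays of the pipelines.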
4.4 System-level properties
In all executions complying with the system and failure model introduced in the previous sections, the following properties must be guaranteed:
(P) Precision (Theorem 2): There is a constant π such that for every pair of correct nodes p, q in C:
$\forall t: |\#b_q(t) - \#b_p(t)| \le \pi$. (3)
(A) Accuracy (Theorem 3): There are constants $R^-, O^-, R^+, O^+ > 0$ such that for every correct node p in C:
$O^-(t_2 - t_1) - R^- \le \#b_p(t_2) - \#b_p(t_1) \le O^+(t_2 - t_1) + R^+$. (4)
(S) Size (Theorems 4 and 6): There are constants $S_{rem}$ and $S_{loc}$ such that for every pair of correct nodes p, q in C:
$\#s^{loc}_{p,q}(t) \le S_{loc}$ and $\#s^{rem}_{p,q}(t) \le S_{rem}$.
Informally, the precision requirement (P) just states that the difference of the number of clock ticks generated by any two different correct nodes is bounded, whereas the accuracy requirement (A) guarantees some relation of the progress of the clock ticks with respect to the progress of real time. Note that (A) is also called the envelope requirement in the literature, and effectively bounds the frequency of the generated clock ticks. Finally, the size requirement (S) guarantees that the size of the pipelines remains bounded.
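For a recorded execution, (P) and (A) can be checked directly on the ticks' send times. The sketch below assumes that each correct node's send times are available as a sorted list (the trace values are made up); the hypothetical helper `ticks_by` plays the role of $\#b_p(t)$. It checks the envelope (4) only for one given interval, whereas (A) quantifies over all intervals.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ticks_by(send_times: Sequence[float], t: float) -> int:
    """#b_p(t): ticks in the sorted list `send_times` that were sent by time t."""
    return bisect_right(send_times, t)


def max_skew(traces: Dict[str, List[float]]) -> int:
    """Largest |#b_q(t) - #b_p(t)| over all node pairs and all t >= 0.

    The counting functions are right-continuous step functions that change only
    at send times, so probing t = 0 and every send time sees every value.
    """
    if not traces:
        return 0
    probes = {0.0}
    for times in traces.values():
        probes.update(times)
    worst = 0
    for t in probes:
        counts = [ticks_by(times, t) for times in traces.values()]
        worst = max(worst, max(counts) - min(counts))
    return worst


def within_envelope(send_times: Sequence[float], t1: float, t2: float,
                    o_minus: float, r_minus: float,
                    o_plus: float, r_plus: float) -> bool:
    """Check the accuracy envelope (4) for the single interval [t1, t2]."""
    progress = ticks_by(send_times, t2) - ticks_by(send_times, t1)
    return o_minus * (t2 - t1) - r_minus <= progress <= o_plus * (t2 - t1) + r_plus


# Hypothetical send times of two correct nodes (arbitrary time unit).
traces = {"p": [1.0, 2.0, 3.0], "q": [1.5, 2.5, 3.5]}
print(max_skew(traces))                                            # 1
print(within_envelope(traces["p"], 0.0, 3.0, 1.0, 0.5, 1.0, 0.5))  # True
```

A precision bound π is then valid for such a trace exactly when `max_skew` does not exceed it.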
In the following Sect. 5, we will show that the system of TG-Algs indeed satisfies the above properties in all executions complying with our system and failure model, provided that (a) the implementations of non-faulty basic modules specified in Sect. 4.3 indeed fulfill their specifications, and (b) the additional “global” Constraints 1–3 introduced later hold. Our Theorems 2, 3, 4 and 6 will also establish numerical values for all the constants introduced above, which depend only on the delay parameters introduced in the specifications of the TG-Alg basic modules in Sect. 4.3.
5 Correctness proofs
Our correctness proof and performance analysis have a layered structure, where the lemmas and theorems in a layer establish higher-level abstractions atop the lower-level abstractions provided by the layer below:
1. The lowest proof layer (Sect. 5.1) deals with the prob-
lem of creating the abstraction of uniquely labeled tick k
messages atop of anonymous up/down transitions: Pro-
vided that Constraint 1 (which bounds the relative speed
ratio of all local channels) holds, the pivotal Interlock-
ing Lemma 3 proves that ticks k −2, k −4, . . . are never
falsely interpreted as tick k by the algorithm.
The result of the Interlocking Lemma allows us to bound any correct node’s maximum tick frequency (Lemma 4), and to rule out the possibility of unbounded queueing effects in the pipes (Lemma 5); the latter requires Constraint 2 to hold, which ensures that the Diff-Gate can digest ticks at the maximum clock frequency. Building on these results, it is possible to prove that both the f + 1-rule (GR, Lemma 6) and the 2f + 1-rule (GEQ, Lemma 7) work as expected.
2. The intermediate proof layer (Sect. 5.2) establishes ele-
mentary synchronization properties of the ticks gener-
ated at different correct nodes, namely, Progress (P),
Unforgeability (U), Quasi-Simultaneity (QS) and finally
Booting Simultaneity (BS). Note that these results correspond to the synchronization properties of the classic algorithm of Fig. 2, cf. [44,66,75,81,82]. Our major Theorem 1 requires Constraint 3 to hold, which essentially guarantees that even the slowest local channel is faster than the fastest remote channel.
Based on the elementary synchronization properties, it is
possible to bound the progress of the fastest node (Lem-
mas 10–13) and the slowest node (Lemmas 15–17) in the
system.
3. The top layer of our proof (Sect. 5.3) establishes our
major results: Bounds for precision (Theorem 2), accu-
racy (Theorem 3) and pipeline sizes (Theorems 4 and 6).
5.1 Bottom proof layer
We start our detailed treatment with the technical Lemma 2,
which asserts a certain persistence of the number of ticks
present in the local and remote pipe for a certain time.
Lemma 2 If, for a correct node $p \in C$ and a different correct node $q \in C \setminus \{p\}$, at time t it holds that
$$\big(\#r^{loc}_{p,q}(t) = k\big) \land \big(\#s^{loc}_{p,q}(t) = 1\big)$$
for some k ≥ 1, then it must hold that
$$\#r^{loc}_{p,q}(t - \tau^-_{Diff}) \ge k \quad\text{and}\quad \#r^{rem}_{p,q}(t - \tau^-_{Diff}) \ge k.$$
Proof See Lemma 2 in Appendix A. □
We next establish a main result, namely our Interlocking Lemma 3, which states that an “old” tick k − 2, k − 4, . . . is never falsely interpreted as a “new” tick k in the GR and GEQ rules of the algorithm. This is not immediately evident: An even tick k + 1 is generated by $\widetilde{THGEQ}^o$ or $\widetilde{THGR}^o$ being active (depending on whether the GEQ or the GR rule triggered the broadcast); assume that it was triggered by the $\widetilde{THGEQ}^o$ signal. $\widetilde{THGEQ}^o$ is only enabled if enough (at least 2f + 1) $\widetilde{GEQ}^o$ signals are active. For all of these signals, it must hold that the responsible +/− counter has received an odd number of ticks locally, that is, any $\widetilde{GEQ}^o$ may be based on local tick k or k − 2 or …. We, however, require tick k + 1 to depend solely on $\widetilde{GEQ}^o$ signals based on tick k.
The Interlocking Lemma will require that for each
TG-Alg, all the delays along a path through the Threshold
module, via the PCSG module and along the local loop are
within about a factor 2 of each other, which is expressed
formally in Constraint 1.
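Constraint 1 (Interlocking Constraint). We abbreviate
$$
\begin{aligned}
T_{max} &:= \tau^+_{TH} + \max(\tau^+_{GR}, \tau^+_{GEQ}) + \tau^+_{loc},\\
T_{min} &:= \tau^-_{TH} + \min(\tau^-_{GR}, \tau^-_{GEQ}) + \tau^-_{loc} + \tau^-_{Diff},\\
T_{min,dis} &:= \tau^-_{TH} + \min(\tau^-_{GR}, \tau^-_{GEQ}) + \tau^-_{loc}.
\end{aligned}
\tag{5}
$$
Then the relation $T_{max} \le T_{min} + T_{min,dis}$ must hold.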
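Constraint 1 is easy to evaluate for concrete delay bounds. The following sketch uses hypothetical integer delays in an arbitrary time unit; the `Bounds` type, the function name and all numbers are assumptions for illustration, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Lower and upper delay bound of one basic module (hypothetical values)."""
    lo: int
    hi: int


def interlocking(th: Bounds, gr: Bounds, geq: Bounds, loc: Bounds, diff: Bounds):
    """Return (T_max, T_min, T_min_dis, Constraint 1 holds) following (5)."""
    t_max = th.hi + max(gr.hi, geq.hi) + loc.hi
    t_min_dis = th.lo + min(gr.lo, geq.lo) + loc.lo
    t_min = t_min_dis + diff.lo
    return t_max, t_min, t_min_dis, t_max <= t_min + t_min_dis


# Example delays in an arbitrary integer time unit; not taken from the paper.
TH, GR, GEQ = Bounds(10, 12), Bounds(5, 6), Bounds(5, 7)
LOC, DIFF = Bounds(20, 25), Bounds(8, 10)
print(interlocking(TH, GR, GEQ, LOC, DIFF))             # (44, 43, 35, True)
# Tripling the upper bound of the local loop violates the constraint:
print(interlocking(TH, GR, GEQ, Bounds(20, 60), DIFF))  # (79, 43, 35, False)
```

The second call reflects the informal reading above: once the slowest path through the loop is much more than twice the fastest one, the interlocking argument no longer applies.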
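Lemma 3 (Interlocking lemma) If, for some correct node p and k + 1 ≥ 2, $\#b_p(t) = k + 1$, then:
(i) There exists a set Q of size |Q| ≥ 2f + 1 such that for $t' := t - \tau^-_{TH} - \tau^-_{GEQ}$:
$$
\begin{aligned}
k \in \mathbb{N}_{even} &\Rightarrow \forall q \in Q\ \exists t_q \le t' : \widetilde{P}^{GEQ,e}_{p,q}(t_q) \land \#r^{loc}_{p,q}(t_q) \ge k,\\
k \in \mathbb{N}_{odd} &\Rightarrow \forall q \in Q\ \exists t_q \le t' : \widetilde{P}^{GEQ,o}_{p,q}(t_q) \land \#r^{loc}_{p,q}(t_q) \ge k;
\end{aligned}
$$
(ii) or there exists a set Q of size |Q| ≥ f + 1 such that for $t' := t - \tau^-_{TH} - \tau^-_{GR}$:
$$
\begin{aligned}
k \in \mathbb{N}_{even} &\Rightarrow \forall q \in Q\ \exists t_q \le t' : \widetilde{P}^{GR,e}_{p,q}(t_q) \land \#r^{loc}_{p,q}(t_q) \ge k,\\
k \in \mathbb{N}_{odd} &\Rightarrow \forall q \in Q\ \exists t_q \le t' : \widetilde{P}^{GR,o}_{p,q}(t_q) \land \#r^{loc}_{p,q}(t_q) \ge k.
\end{aligned}
$$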
Proof The proof is by induction on k + 1 ≥ 2, the num-
ber of ticks sent by node p. The lemma is first shown for
tick k + 1 = 2. Then we assume that some tick k + 1 > 2 is
the first tick for which the lemma does not hold. By investi-
gating the cause which triggered the sending of this tick, we
obtain a contradiction to Constraint 1.
Begin (k + 1 = 2): Assume p sends tick 2 at time $t_{p,2}$. Then, by the algorithm (specification of the tick generation module), (a) $\widetilde{THGEQ}^o_p(t_{p,2})$ or (b) $\widetilde{THGR}^o_p(t_{p,2})$ must have held. We consider both cases:
case (a): If $\widetilde{THGEQ}^o_p(t_{p,2})$, then by the algorithm (specification of the threshold modules), there must be a set $Q \subseteq P \setminus \{p\}$ of size $|Q| \ge 2f + 1$ such that, for time $t' := \delta^{-1}(\langle GEQtoTHGEQ, o, p\rangle; t_{p,2})$,
$$\forall q \in Q : \widetilde{GEQ}^o_{p,q}(t').$$
Again by the algorithm (specification of the PCSG to threshold module channels), with $t_q := \delta^{-1}(\langle PGEQtoGEQ, o, p, q\rangle; t')$ defined for every $q \in Q$, we obtain
$$\forall q \in Q : \widetilde{P}^{GEQ,o}_{p,q}(t_q)$$
and by this
$$\widetilde{P}^{GEQ,o}_{p,q}(t_q) \equiv [\#r^{rem}_{p,q}(t_q) \ge \#r^{loc}_{p,q}(t_q)] \land [\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}] \land [\#s^{loc}_{p,q}(t_q) = 1]. \tag{6}$$
Since $\#r^{loc}_{p,q}(t_q) \ge 0$ from reset on and $\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}$,
$$\#r^{loc}_{p,q}(t_q) \ge 1 = k. \tag{7}$$
Finally, from the channel properties, we know that
$$t_{p,2} - t_q = t_{p,2} - \delta^{-1}\big(\langle PGEQtoGEQ, o, p, q\rangle; \delta^{-1}(\langle GEQtoTHGEQ, o, p\rangle; t_{p,2})\big) \ge \tau^-_{TH} + \tau^-_{GEQ}.$$
The lemma follows.
case (b): If $\widetilde{THGR}^o_p(t_{p,2})$, then, by the algorithm (specification of the threshold modules), there must be a set $Q \subseteq P \setminus \{p\}$ of size $|Q| \ge f + 1$ such that, for time $t' := \delta^{-1}(\langle GRtoTHGR, o, p\rangle; t_{p,2})$,
$$\forall q \in Q : \widetilde{GR}^o_{p,q}(t').$$
By the algorithm (specification of the PCSG to threshold module channels), and with $t_q := \delta^{-1}(\langle PGRtoGR, o, p, q\rangle; t')$ defined for every $q \in Q$, this implies
$$\forall q \in Q : \widetilde{P}^{GR,o}_{p,q}(t_q) \tag{8}$$
and by this
$$\widetilde{P}^{GR,o}_{p,q}(t_q) \equiv [\#r^{rem}_{p,q}(t_q) > \#r^{loc}_{p,q}(t_q)] \land [\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}] \land [\#s^{loc}_{p,q}(t_q) = 1]. \tag{9}$$
Since $\#r^{loc}_{p,q}(t_q) \ge 0$ from reset on and $\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}$,
$$\#r^{loc}_{p,q}(t_q) \ge 1 = k. \tag{10}$$
From the channel properties, we know that
$$t_{p,2} - t_q = t_{p,2} - \delta^{-1}\big(\langle PGRtoGR, o, p, q\rangle; \delta^{-1}(\langle GRtoTHGR, o, p\rangle; t_{p,2})\big) \ge \tau^-_{TH} + \tau^-_{GR}.$$
The lemma follows, again.
Step (k + 1 ≥ 3): Assume by contradiction that k + 1 is the first tick for which the lemma does not hold. Let $t_{p,k+1}$ be the time p sends tick k + 1. Assume w.l.o.g. that $k \in \mathbb{N}_{odd}$; the proof for $k \in \mathbb{N}_{even}$ is analogous. We will establish two delay bounds, one on the enabling path and the other on the disabling path.
Enabling path: To send tick k + 1 at time $t_{p,k+1}$, by the algorithm (specification of the tick generation module), at least one of the two threshold signals must have been enabled, i.e., (a) $\widetilde{THGEQ}^o_p(t_{p,k+1})$ or (b) $\widetilde{THGR}^o_p(t_{p,k+1})$ must have held. We consider both cases:
case (a): If $\widetilde{THGEQ}^o_p(t_{p,k+1})$, then, by the algorithm (specification of the threshold modules), there must be a set $Q \subseteq P \setminus \{p\}$ of size $|Q| \ge 2f + 1$ such that, at time $t' := \delta^{-1}(\langle GEQtoTHGEQ, o, p\rangle; t_{p,k+1})$,
$$\forall q \in Q : \widetilde{GEQ}^o_{p,q}(t'). \tag{11}$$
Again by the algorithm (PCSG to threshold module channels), at $t_q := \delta^{-1}(\langle PGEQtoGEQ, o, p, q\rangle; t')$ defined for every $q \in Q$, this implies
$$\forall q \in Q : \widetilde{P}^{GEQ,o}_{p,q}(t_q) \tag{12}$$
and by this
$$\widetilde{P}^{GEQ,o}_{p,q}(t_q) \equiv [\#r^{rem}_{p,q}(t_q) \ge \#r^{loc}_{p,q}(t_q)] \land [\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}] \land [\#s^{loc}_{p,q}(t_q) = 1]. \tag{13}$$
By the channel properties,
$$\tau^-_{TH} + \tau^-_{GEQ} \le t_{p,k+1} - t_q \le \tau^+_{TH} + \tau^+_{GEQ}. \tag{14}$$
Assuming that $\forall q \in Q : \#r^{loc}_{p,q}(t_q) \ge k$ yields the desired result of the lemma. Thus we only have to investigate the negation:
$$\exists q \in Q : \#r^{loc}_{p,q}(t_q) < k.$$
Since, by (13), $\#r^{loc}_{p,q}(t_q) \in \mathbb{N}_{odd}$, we obtain
$$\exists q \in Q : \#r^{loc}_{p,q}(t_q) \le k - 2.$$
Thus, tick k − 1 must be received in the local pipe of pipepair (p, q) at time $t_{rcv,k-1}$, with $t_{rcv,k-1} > t_q$. The combination with (14) yields
$$t_{p,k+1} - t_{rcv,k-1} < \tau^+_{TH} + \tau^+_{GEQ}. \tag{15}$$
Let $t_{p,k-1}$ be the sending time of tick k − 1. Clearly, by the local channel properties, we arrive at
$$t_{p,k-1} \ge t_{rcv,k-1} - \tau^+_{loc} \;\Rightarrow\; t_{p,k+1} - t_{p,k-1} < \tau^+_{TH} + \tau^+_{GEQ} + \tau^+_{loc}. \tag{16}$$
Before proceeding further, we will handle the disabling path.
Disabling path: Let $t_{p,k}$ be the sending time of tick k. By the induction hypothesis, we know that the lemma holds for tick k. According to the lemma, we have to distinguish two cases (i) and (ii):
case (a.i): Assume that implication (i) is valid, i.e., there exists a set Q′ of size |Q′| ≥ 2f + 1 such that, for
$$t_{q'} := \delta^{-1}\big(\langle PGEQtoGEQ, e, p, q'\rangle; \delta^{-1}(\langle GEQtoTHGEQ, e, p\rangle; t_{p,k})\big),$$
$$\forall q' \in Q' : \widetilde{P}^{GEQ,e}_{p,q'}(t_{q'}) \land \#r^{loc}_{p,q'}(t_{q'}) \ge k - 1. \tag{17}$$
Thus, local tick k − 1 must have been received in pipepair (p, q′) at time $t_{q',rcv,k-1}$, with
$$t_{q',rcv,k-1} \le t_{q'} - \tau^-_{Diff}$$
by Lemma 2. Furthermore, by the properties of the local channels,
$$t_{q',rcv,k-1} - t_{p,k-1} \ge \tau^-_{loc}.$$
Thus, we find
$$t_{p,k} - t_{p,k-1} \ge \tau^-_{TH} + \tau^-_{GEQ} + \tau^-_{Diff} + \tau^-_{loc}. \tag{18}$$
From the algorithm (specification of the tick generation module), it follows that at time $t_{p,k+1}$
$$\lnot\widetilde{THGEQ}^e_p(t_{p,k+1}) \tag{19}$$
[and also $\lnot\widetilde{THGR}^e_p(t_{p,k+1})$, which is handled analogously, cf. (a.ii) below] must hold, i.e., the threshold signals that generated (the odd) tick k must be inactive again. Therefore, for the times $t_{q''}$ defined as
$$t_{q''} := \delta^{-1}\big(\langle PGEQtoGEQ, e, p, q''\rangle; \delta^{-1}(\langle GEQtoTHGEQ, e, p\rangle; t_{p,k+1})\big), \tag{20}$$
it must hold that
$$\nexists Q'', |Q''| \ge 2f + 1 : \forall q'' \in Q'' : \widetilde{P}^{GEQ,e}_{p,q''}(t_{q''}). \tag{21}$$
Let us choose Q″ := Q′. Because of the FIFO property of the channels and because of $t_{p,k} < t_{p,k+1}$, we obtain $t_{q'} < t_{q''}$. Clearly, there has to be at least one $q' \in Q'$ for which
$$\widetilde{P}^{GEQ,e}_{p,q'}(t_{q'}) \quad\text{but}\quad \lnot\widetilde{P}^{GEQ,e}_{p,q'}(t_{q''}), \tag{22}$$
since otherwise Q″ := Q′ would have been a choice for Q″, contradicting (21). However, (22) can only hold if local tick k has been received in pipepair (p, q′) at time $t_{q',rcv,k}$, with $t_{q',rcv,k} \le t_{q''}$. In combination with (20) and the channel properties, this implies
$$t_{p,k+1} - t_{q',rcv,k} \ge \tau^-_{TH} + \tau^-_{GEQ} \tag{23}$$
and, by the local channel properties,
$$t_{q',rcv,k} - t_{p,k} \ge \tau^-_{loc}. \tag{24}$$
Finally, (18) together with (23) and (24) yields
$$t_{p,k+1} - t_{p,k-1} \ge (\tau^-_{TH} + \tau^-_{GEQ} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \tau^-_{GEQ} + \tau^-_{loc}). \tag{25}$$
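case (a.ii): Assume that implication (ii) is valid, i.e., that there exists a set Q′ of size |Q′| ≥ f + 1 such that, for
$$t_{q'} := \delta^{-1}\big(\langle PGRtoGR, e, p, q'\rangle; \delta^{-1}(\langle GRtoTHGR, e, p\rangle; t_{p,k})\big),$$
$$\forall q' \in Q' : \widetilde{P}^{GR,e}_{p,q'}(t_{q'}) \land \#r^{loc}_{p,q'}(t_{q'}) \ge k - 1.$$
By analogous arguments as in case (a.i), we obtain
$$t_{p,k+1} - t_{p,k-1} \ge (\tau^-_{TH} + \tau^-_{GR} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \tau^-_{GR} + \tau^-_{loc}). \tag{26}$$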
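Combination of (a.i) and (a.ii): Combining (16), (25) and (26) leads to
$$(\tau^-_{TH} + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{loc}) \le t_{p,k+1} - t_{p,k-1} < \tau^+_{TH} + \tau^+_{GEQ} + \tau^+_{loc}.$$
Since the left-hand side equals $T_{min} + T_{min,dis}$ and the right-hand side is at most $T_{max}$, this is a contradiction to Constraint 1. The lemma follows.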
case (b): If $\widetilde{THGR}^o_p(t_{p,k+1})$, then, by the algorithm (specification of the threshold module), there must be a set $Q \subseteq P \setminus \{p\}$ of size $|Q| \ge f + 1$ such that, for time $t' := \delta^{-1}(\langle GRtoTHGR, o, p\rangle; t_{p,k+1})$,
$$\forall q \in Q : \widetilde{GR}^o_{p,q}(t').$$
By analogous arguments as in case (a), we obtain an analogue of (16), namely
$$t_{p,k+1} - t_{p,k-1} < \tau^+_{TH} + \tau^+_{GR} + \tau^+_{loc}, \tag{27}$$
as well as analogues of (25) and (26), namely,
$$t_{p,k+1} - t_{p,k-1} \ge (\tau^-_{TH} + \tau^-_{GEQ} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \tau^-_{GEQ} + \tau^-_{loc}) \tag{28}$$
and
$$t_{p,k+1} - t_{p,k-1} \ge (\tau^-_{TH} + \tau^-_{GR} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \tau^-_{GR} + \tau^-_{loc}). \tag{29}$$
Combination of (b.i) and (b.ii): Combining (27), (28) and (29) leads to
$$(\tau^-_{TH} + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{Diff} + \tau^-_{loc}) + (\tau^-_{TH} + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{loc}) \le t_{p,k+1} - t_{p,k-1} < \tau^+_{TH} + \tau^+_{GR} + \tau^+_{loc}.$$
Since the left-hand side equals $T_{min} + T_{min,dis}$ and the right-hand side is at most $T_{max}$, this is a contradiction to Constraint 1. The lemma follows. □
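The pipe-compare predicates that appear in (6) and (9) can be written as plain Boolean functions of the pipe counters. This sketch assumes that $\#s^{loc}_{p,q}$ is the number of ticks currently stored in the local pipe, as the equivalence $\#s^{loc}_{p,q} = 1 \Leftrightarrow \#r^{loc}_{p,q} - \#d_{p,q} = 1$ used later in (36) suggests; the parity argument `"o"`/`"e"` selects $\mathbb{N}_{odd}$ or $\mathbb{N}_{even}$, and the function names are hypothetical.

```python
def _parity_ok(r_loc: int, parity: str) -> bool:
    """True iff r_loc lies in N_odd (parity "o") or N_even (parity "e")."""
    if parity not in ("o", "e"):
        raise ValueError("parity must be 'o' or 'e'")
    return (r_loc % 2 == 1) == (parity == "o")


def p_geq(r_rem: int, r_loc: int, s_loc: int, parity: str) -> bool:
    """GEQ pipe-compare signal in the form of (6): remote count >= local count."""
    return _parity_ok(r_loc, parity) and r_rem >= r_loc and s_loc == 1


def p_gr(r_rem: int, r_loc: int, s_loc: int, parity: str) -> bool:
    """GR pipe-compare signal in the form of (9): remote count > local count."""
    return _parity_ok(r_loc, parity) and r_rem > r_loc and s_loc == 1


# Local tick 3 matched by remote tick 3: GEQ fires, GR needs a 4th remote tick.
print(p_geq(3, 3, 1, "o"), p_gr(3, 3, 1, "o"), p_gr(4, 3, 1, "o"))  # True False True
```

The parity check is what lets the Interlocking Lemma separate tick k from the older ticks k − 2, k − 4, …: a signal of the wrong parity class never contributes to the threshold for the next tick.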
The next lemma establishes a minimum duration between
any two successive ticks generated at a correct node, i.e., an
upper bound on the clock frequency that could possibly be
generated locally at any correct node.
Lemma 4 If correct node p sends tick k ≥ 1 at time $t_{p,k}$, then it cannot send tick k + 1 before $t_{p,k} + T_{min}$.
Proof Assume by contradiction that p sends tick k + 1 at time $t_{p,k+1}$, w.l.o.g. for $k \in \mathbb{N}_{even}$, with
$$t_{p,k+1} - t_{p,k} < T_{min}. \tag{30}$$
We apply Lemma 3 for tick k + 1. Thus, implication (i) or (ii) has to be true:
case (i): There exists at least one node q ∈ Q such that, for some $t_q \le t_{p,k+1} - \tau^-_{TH} - \tau^-_{GEQ}$,
$$\widetilde{P}^{GEQ,e}_{p,q}(t_q) \land \big(\#r^{loc}_{p,q}(t_q) \ge k\big) \;\Rightarrow\; \big(\#s^{loc}_{p,q}(t_q) = 1\big) \land \big(\#r^{loc}_{p,q}(t_q) \ge k\big).$$
By applying Lemma 2, we obtain
$$\#r^{loc}_{p,q}(t_q - \tau^-_{Diff}) \ge k. \tag{31}$$
Furthermore, by the local channel property and (Pmin) in Lemma 1,
$$\#r^{loc}_{p,q}(t_q - \tau^-_{Diff}) \le \#b_p(t_q - \tau^-_{Diff} - \tau^-_{loc}). \tag{32}$$
Combining (31) and (32) yields
$$\#b_p(t_q - \tau^-_{Diff} - \tau^-_{loc}) \ge k,$$
i.e., tick k must have been sent at time $t_{p,k}$ with
$$
\begin{aligned}
t_{p,k} &\le t_q - \tau^-_{Diff} - \tau^-_{loc}\\
&\le t_{p,k+1} - \tau^-_{Diff} - \tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{loc}\\
&\le t_{p,k+1} - \tau^-_{Diff} - \tau^-_{TH} - \min\{\tau^-_{GEQ}, \tau^-_{GR}\} - \tau^-_{loc}\\
&= t_{p,k+1} - T_{min},
\end{aligned}
$$
contradicting (30). The lemma follows.
case (ii): Starting from (ii) in Lemma 3, the contradiction is derived analogously as for case (i).
The lemma follows in both cases. □
Lemma 5 together with Constraint 2 allows us to exclude the possibility of unbounded queueing effects inside a pipepair (p, q) of local/remote pipes $\#r^{loc}_{p,q}(t)$ and $\#r^{rem}_{p,q}(t)$ situated at correct node p and corresponding to correct node q. Constraint 2 ensures that the Diff-gate can digest matching ticks at least as fast as any node can generate them.
Constraint 2 $\tau^+_{Diff} \le T_{min}$.
Lemma 5 For any pair of distinct correct nodes p, q and k ≥ 1: If correct node p sent tick k at $t_{p,k}$ and q sent tick k at $t_{q,k}$, then tick k − 1 is removed from the local and remote pipe of pipepair (p, q) by time $\max\{t_{p,k} + \tau^+_{loc}, t_{q,k} + \tau^+_{rem}\} + \tau^+_{Diff}$, if Constraint 2 holds.
Proof The proof is by induction on the number of ticks k ≥ 1 that are sent by p and q.
Begin (k = 1): Assume that p sends tick 1 at $t_{p,1}$. Tick 1 will be received in any of p’s local pipes by $t_{p,1} + \tau^+_{loc}$. Furthermore, assume that q sends tick 1 at $t_{q,1}$. It will certainly be received in the remote pipe of the pipepair (p, q) by $t_{q,1} + \tau^+_{rem}$. Since there is no tick that could block the removal of tick 0, tick 0 is removed by $\max\{t_{p,1} + \tau^+_{loc}, t_{q,1} + \tau^+_{rem}\} + \tau^+_{Diff}$ from both the local and the remote pipe, according to the Diff-gate properties. The lemma follows.
Step (k > 1): As our induction hypothesis, assume that tick k − 2 is removed from both pipes by $\max\{t_{p,k-1} + \tau^+_{loc}, t_{q,k-1} + \tau^+_{rem}\} + \tau^+_{Diff}$. By analogous arguments as above, tick k is received in both pipes of (p, q) by $t_{both}$, defined as
$$t_{both} := \max\{t_{p,k} + \tau^+_{loc}, t_{q,k} + \tau^+_{rem}\}.$$
Because of Lemma 4, consecutive ticks cannot be generated with less than $T_{min}$ distance in-between, i.e.,
$$t_{p,k} - t_{p,k-1} \ge T_{min} \quad\text{and}\quad t_{q,k} - t_{q,k-1} \ge T_{min}.$$
Thus
$$
\begin{aligned}
t_{both} &\ge \max\{t_{p,k-1} + \tau^+_{loc}, t_{q,k-1} + \tau^+_{rem}\} + T_{min}\\
&\ge \max\{t_{p,k-1} + \tau^+_{loc}, t_{q,k-1} + \tau^+_{rem}\} + \tau^+_{Diff}
\end{aligned}
$$
by Constraint 2. According to the induction hypothesis, we know that tick k − 2 has already been removed by $t_{both}$. By the properties of the Diff-gate, tick k − 1 is hence removed by $t_{both} + \tau^+_{Diff}$. The lemma follows. □
The next lemma establishes that the GR-rule really does its duty, that is, it sends the next tick if sufficiently many ticks are available.
Lemma 6 For all correct nodes p and ticks k ≥ 1: If there exists a set Q of correct nodes with |Q| ≥ f + 1 such that:
(i) all nodes q ∈ Q send tick k + 1 by $t_{Q,k+1}$,
(ii) p sends tick k by $t_{p,k}$,
then p sends tick k + 1 by $t' := \max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GR},\ t_{p,k} + \tau^+_{loc} + \tau^+_{GEQ},\ t_{Q,k+1} + \tau^+_{rem} + \tau^+_{GR}\} + \tau^+_{TH}$.
Proof W.l.o.g. assume that $k \in \mathbb{N}_{even}$. We distinguish two cases:
(a) Suppose that $\exists q \in Q : \#r^{loc}_{p,q}(t') > k$. Since $\#r^{loc}_{p,q}(t') \le \#b_p(t')$, as no tick can be locally received if it has not been sent before, $\#b_p(t') > k$ must hold. Thus, p has sent tick k + 1 by t′. The lemma follows.
(b) Otherwise, assume that
$$\forall q \in Q : \#r^{loc}_{p,q}(t') \le k. \tag{33}$$
By Lemma 4, it follows that all nodes q send tick k by $t_{Q,k}$ with $t_{Q,k} \le t_{Q,k+1} - T_{min}$. By applying Lemma 5, we conclude that tick k − 1 is removed from all of p’s pipepairs (for all q ∈ Q) by
$$
\begin{aligned}
t_{del,k-1} &:= \max\{t_{p,k} + \tau^+_{loc},\ t_{Q,k} + \tau^+_{rem}\} + \tau^+_{Diff}\\
&\le \max\{t_{p,k} + \tau^+_{loc},\ t_{Q,k+1} - T_{min} + \tau^+_{rem}\} + \tau^+_{Diff}\\
&\le \max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff},\ t_{Q,k+1} + \tau^+_{rem}\}
\end{aligned}
$$
due to Constraint 2.
Clearly, since all nodes q ∈ Q are correct and have sent tick k + 1 by $t_{Q,k+1}$, tick k + 1 must be received in the remote pipe of (p, q) by $t_{rcv,Q} := t_{Q,k+1} + \tau^+_{rem}$. Furthermore, p must receive tick k in all its local pipes by $t_{rcv,p} := t_{p,k} + \tau^+_{loc}$. Consequently, at time $t_{rcv} := \max\{t_{rcv,p}, t_{rcv,Q}, t_{del,k-1}\} \le \max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff},\ t_{Q,k+1} + \tau^+_{rem}\}$, with $t_{rcv} \le t'$, it holds that:
$$\#r^{loc}_{p,q}(t_{rcv}) \ge k, \quad \#r^{rem}_{p,q}(t_{rcv}) \ge k + 1, \quad\text{and}\quad \#d_{p,q}(t_{rcv}) \ge k - 1. \tag{34}$$
By combination of (34) with (33), it holds for all $\xi \in [t_{rcv}, t']$ that
$$\#r^{loc}_{p,q}(\xi) = k. \tag{35}$$
Furthermore,
$$
\begin{aligned}
\widetilde{P}^{GR,e}_{p,q}(\xi) &\equiv \big(\#r^{rem}_{p,q}(\xi) > \#r^{loc}_{p,q}(\xi)\big) \land \big(\#r^{loc}_{p,q}(\xi) \in \mathbb{N}_{even}\big) \land \big(\#s^{loc}_{p,q}(\xi) = 1\big)\\
&\equiv \big(\#r^{rem}_{p,q}(\xi) > k\big) \land \big(\#r^{loc}_{p,q}(\xi) - \#d_{p,q}(\xi) = 1\big)\\
&\equiv \big(\#d_{p,q}(\xi) = k - 1\big).
\end{aligned}
\tag{36}
$$
Assuming $\exists q \in Q : \#d_{p,q}(\xi) > k - 1$ implies $\#r^{loc}_{p,q}(\xi) > k$, since by definition of the Diff-gate’s behavior, there must be at least one tick in the local pipe which was not deleted, thereby contradicting (35). Thus
it must hold that $\forall q \in Q : \#d_{p,q}(\xi) = k - 1$. Therefore
$$\forall q \in Q : \widetilde{P}^{GR,e}_{p,q}(\xi) \quad\text{for all } \xi \in [t_{rcv}, t'].$$
By the algorithm (specification of the PCSG to threshold module channels), $\forall q \in Q : \widetilde{GR}^e_{p,q}(\xi)$ is true for times $\xi \in [t_{rcv} + \tau^+_{GR}, t']$. Again by the algorithm (threshold module), since |Q| ≥ f + 1,
$$\widetilde{THGR}^e_p(\xi) \tag{37}$$
is true for each time ξ within
$$\big[\max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff},\ t_{Q,k+1} + \tau^+_{rem}\} + \tau^+_{GR} + \tau^+_{TH},\ t'\big].$$
It remains to be shown that the disabling path cannot inhibit the generation of tick k + 1 at p. For the sake of contradiction, assume that the disabling path can enforce $t_{p,k+1} > t'$.
Because of the lemma’s assumption (ii), tick k must eventually be received in all of p’s local pipes corresponding to correct nodes r, i.e., $\forall r \in C \setminus \{p\} : \#r^{loc}_{p,r}(\xi) \ge k$ for $\xi \in [t_{p,k} + \tau^+_{loc}, t']$. Since tick k + 1 is not generated by t′, we actually have $\#r^{loc}_{p,r}(\xi) = k$. As $k \in \mathbb{N}_{even}$, this implies
$$\forall r \in C \setminus \{p\} : \lnot\big(\widetilde{P}^{GR,o}_{p,r}(\xi) \lor \widetilde{P}^{GEQ,o}_{p,r}(\xi)\big).$$
Consequently, by the algorithm (specification of the PCSG to threshold module channels and the threshold modules), together with the fact that there are only up to f faulty nodes,
$$\lnot\widetilde{THGR}^o_p(\xi) \land \lnot\widetilde{THGEQ}^o_p(\xi) \tag{38}$$
holds for each time ξ within $[t_{p,k} + \tau^+_{loc} + \max\{\tau^+_{GR}, \tau^+_{GEQ}\} + \tau^+_{TH},\ t']$.
Combining (37) and (38) and noting that $\max\{\tau^+_{GR} + \tau^+_{Diff}, \max\{\tau^+_{GR}, \tau^+_{GEQ}\}\} = \max\{\tau^+_{GR} + \tau^+_{Diff}, \tau^+_{GEQ}\}$, it is apparent that p must send tick k + 1 by t′, providing the required contradiction. The lemma follows.
The lemma follows in both cases. □
Analogous to Lemma 6, the next lemma states that the GEQ-
rule does its duty.
Lemma 7 For all correct nodes p and ticks k ≥ 1: If there exists a set Q of correct nodes with |Q| ≥ 2f + 1 such that:
(i) all nodes q ∈ Q send tick k by $t_{Q,k}$,
(ii) p sends tick k by $t_{p,k}$,
then p sends tick k + 1 by $t' := \max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GEQ},\ t_{p,k} + \tau^+_{loc} + \tau^+_{GR},\ t_{Q,k} + \tau^+_{rem} + \tau^+_{Diff} + \tau^+_{GEQ}\} + \tau^+_{TH}$.
Proof See Lemma 7 in Appendix A. □
5.2 Intermediate proof layer
Based on the results of the bottom proof layer, we can now
establish elementary synchronization properties of the ticks
generated at different correct nodes. The following Theo-
rem 1 corresponds to well-known classic results on consis-
tent broadcasting [44,66,75,82], which are expressed and
proved in our new modeling framework. Major differences to the existing proofs are the far lower level of abstraction at which the proofs have to be carried out, and the problems arising from queueing effects, which are due to the bounded sizes of the local and remote queues, and from the fact that a TG-Alg cannot process and generate ticks arbitrarily fast; both are in contrast to the original algorithm stated in Fig. 2. For the theorem to hold, Constraint 3 must hold, which essentially guarantees that even the slowest local channel is faster than the fastest remote channel.
Constraint 3 For
$$T^-_{first} := \tau^-_{rem} + \tau^-_{Diff} + \tau^-_{GEQ} + \tau^-_{TH}, \tag{39}$$
the relation $T^-_{first} \ge \tau^+_{loc} + \max\{\tau^+_{Diff} + \tau^+_{GR}, \tau^+_{GEQ}\} + \tau^+_{TH}$ must hold.
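A quick numeric check of (39) and Constraint 3, again with hypothetical integer delays and function names of our own:

```python
def t_first_minus(rem_lo: int, diff_lo: int, geq_lo: int, th_lo: int) -> int:
    """(39): sum of the lower delay bounds of the remote, Diff, GEQ and TH stages."""
    return rem_lo + diff_lo + geq_lo + th_lo


def constraint3_holds(t_first: int, loc_hi: int, diff_hi: int, gr_hi: int,
                      geq_hi: int, th_hi: int) -> bool:
    """Constraint 3: the slowest local path is no slower than T^-_first."""
    return t_first >= loc_hi + max(diff_hi + gr_hi, geq_hi) + th_hi


tf = t_first_minus(40, 8, 5, 10)                    # 63 (hypothetical delays)
print(tf, constraint3_holds(tf, 25, 10, 6, 7, 12))  # 63 True, since 63 >= 53
print(constraint3_holds(t_first_minus(25, 8, 5, 10), 25, 10, 6, 7, 12))  # False
```

The second call shows the intended failure mode: a remote channel whose minimum delay (25) is barely above the local maximum leaves too little margin.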
Theorem 1 (Synchronization Properties). The algorithm satisfies the synchronization properties Progress (P), Unforgeability (U), Quasi-Simultaneity (QS), and Booting-Simultaneity (BS), if Constraints 1, 2, 3 and $n \ge 3f + 2$ hold.
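(P) Progress. If all correct nodes send tick k ≥ 1 by time t, then every correct node sends at least tick k + 1 by time $t + T_P$, with
$$T_P := \max\{\tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GEQ},\ \tau^+_{loc} + \tau^+_{GR},\ \tau^+_{rem} + \tau^+_{Diff} + \tau^+_{GEQ}\} + \tau^+_{TH}. \tag{40}$$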
(U) Unforgeability. If no correct node sends tick k ≥ 1 by time t, then no correct node sends tick k + 1 by time $t + T^-_{first}$, where $T^-_{first}$ is given by (39).
(QS) Quasi-Simultaneity. If some correct node p sends tick k + 1 ≥ 2 by time t, then every correct node (p included) sends at least tick k by time $t + T_{QS}$, with
TQS := max
{













− T −f irst
}
+ (τ+T H − τ−T H ). (41)
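(BS) Booting-Simultaneity. If some correct node sends tick k ≥ 1 by time t, then every correct node sends at least tick k by time $t + T_{BS}(k)$, with
$$T_{BS}(k) := B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} - \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + (\tau^+_{TH} - \tau^-_{TH}) + (T_P - T^-_{first})(k - 1). \tag{42}$$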
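Under hypothetical delays, $T_P$ and $T_{BS}(k)$ can be evaluated as below. The sketch builds $T_{BS}(k)$ from the recursion $T_{BS}(k+1) = T_{BS}(k) + T_P - T^-_{first}$ used in the (BS) proof, starting from the base-case bound of that proof; the boot skew B = 100, the delay values and the function names are assumptions for illustration.

```python
from typing import Dict


def t_p(up: Dict[str, int]) -> int:
    """(40): Lemma 7's deadline for t_{p,k} = t_{Q,k} = t, minus t."""
    return max(up["loc"] + up["diff"] + up["geq"],
               up["loc"] + up["gr"],
               up["rem"] + up["diff"] + up["geq"]) + up["th"]


def t_bs(k: int, boot_skew: int, lo: Dict[str, int], up: Dict[str, int]) -> int:
    """(42): T_BS(1) from the (BS) base case plus (k - 1)(T_P - T^-_first)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    t_first = lo["rem"] + lo["diff"] + lo["geq"] + lo["th"]      # (39)
    base = (boot_skew + max(up["geq"], up["gr"]) - min(lo["geq"], lo["gr"])
            + (up["th"] - lo["th"]))
    return base + (t_p(up) - t_first) * (k - 1)


# Hypothetical lower/upper delay bounds (arbitrary integer unit) and boot skew B.
LO = {"th": 10, "gr": 5, "geq": 5, "loc": 20, "diff": 8, "rem": 40}
UP = {"th": 12, "gr": 6, "geq": 7, "loc": 25, "diff": 10, "rem": 45}
print(t_p(UP))                                     # 74
print([t_bs(k, 100, LO, UP) for k in (1, 2, 3)])   # [104, 115, 126]
```

Because every lower bound is at most its upper bound, $T_P \ge T^-_{first}$ here, so $T_{BS}(k)$ grows linearly in k.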
We will show the properties Progress (P), Unforgeability (U),
Quasi-Simultaneity (QS) and Booting-Simultaneity (BS) one
after the other.
Progress (P)
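Proof Assume that all correct nodes C, with |C| ≥ 2f + 2 (as n ≥ 3f + 2 and at most f nodes are faulty), sent tick k ≥ 1 by time t. Now focus on a correct node p ∈ C: We can apply Lemma 7 with Q = C \ {p}, which satisfies |Q| ≥ 2f + 1, and $t_{p,k} = t_{Q,k} = t$. Thus, p must send tick k + 1 by
$$
\begin{aligned}
&\max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GEQ},\ t_{p,k} + \tau^+_{loc} + \tau^+_{GR},\ t_{Q,k} + \tau^+_{rem} + \tau^+_{Diff} + \tau^+_{GEQ}\} + \tau^+_{TH}\\
&\quad = t + \max\{\tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GEQ},\ \tau^+_{loc} + \tau^+_{GR},\ \tau^+_{rem} + \tau^+_{Diff} + \tau^+_{GEQ}\} + \tau^+_{TH} = t + T_P.
\end{aligned}
$$
The property follows. □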
Unforgeability (U)
Proof Let p be the first correct node that sends tick k + 1 ≥ 2 at time $t_{p,k+1}$, and assume w.l.o.g. that $k \in \mathbb{N}_{even}$. We apply Lemma 3 and consider the two possible cases:
(i) Since |Q| ≥ 2f + 1, there must be a subset C′ ⊆ Q of correct nodes of size |C′| ≥ f + 1. Clearly, it must hold that
$$\forall r \in C'\ \exists t_r \le t' : \widetilde{P}^{GEQ,e}_{p,r}(t_r) \land \#r^{loc}_{p,r}(t_r) \ge k$$
with $t' = t_{p,k+1} - \tau^-_{TH} - \tau^-_{GEQ}$. By applying Lemma 2, we obtain for each of the r ∈ C′
$$\#r^{rem}_{p,r}(t' - \tau^-_{Diff}) \ge k.$$
By the remote channel properties, this implies $\#b_r(t'') \ge k$ with
$$t'' := t' - \tau^-_{Diff} - \tau^-_{rem} = t_{p,k+1} - \tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff} - \tau^-_{rem} = t_{p,k+1} - T^-_{first},$$
i.e., node r (and by this the first correct node which sent tick k) has sent tick k by $t_{p,k+1} - T^-_{first}$. The property follows.
(ii) Since |Q| ≥ f + 1, there must be at least one correct node r ≠ p among Q for which $\#r^{rem}_{p,r}(t') \ge k + 1$ with $t' = t_{p,k+1} - \tau^-_{TH} - \tau^-_{GR}$. For r, this implies $\#b_r(t') \ge k + 1$, a contradiction to the assumption that p was the first correct node to send tick k + 1. Again, the property follows.
The property follows in both cases. □
Before turning to the proof of (QS), we proceed with two
technical lemmas.
Lemma 8 If the first correct node p sends tick k + 1 ≥ 2 at time $t_{p,k+1}$, then at $t' := t_{p,k+1} - \tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff}$ it must hold that: There exists a set Q of size |Q| ≥ 2f + 1 such that
$$\forall q \in Q : \#r^{rem}_{p,q}(t') \ge k.$$
Proof Analogous to the proof of (U), we apply Lemmas 3 and 2, and consider the two possible cases:
(i) This case exactly matches the implication of our lemma.
(ii) Since |Q| ≥ f + 1, there must be at least one correct node r ≠ p among Q which has already sent tick k + 1 before p did; this contradicts the assumption that p is the first correct node to send tick k + 1.
The lemma holds in both cases. □
Lemma 9 Let $t_b$ be the time when the first correct node completes booting. Every correct node must send tick 1 within the interval $[t_{first,1}, t_{last,1}]$ with
$$t_{first,1} \ge t_b + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{TH} \quad\text{and}\quad t_{last,1} \le t_b + B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} + \tau^+_{TH}.$$
Proof Let p be the first correct node that completes booting, i.e., $t_b = t_{p,b}$. Since the virtual tick 0 is already available in all pipepairs (p, q) at p at time $t_b$, the first correct node that sends tick 1 does so at $t_{first,1}$, with
$$t_{first,1} \ge t_b + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{TH}.$$
Since all other correct nodes boot by $t_b + B$, they all must send tick 1 by time $t_{last,1}$ with
$$t_{last,1} \le t_b + B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} + \tau^+_{TH}.$$
The lemma follows. □
Quasi-Simultaneity (QS)
Proof The proof is by induction on the number of ticks k + 1 ≥ 2 sent by the first correct node.
Begin (k + 1 = 2): By Lemma 9, the first correct node must send tick 1 at $t_{first,1}$ with
$$t_{first,1} \ge t_b + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{TH}.$$
Let $t_{first,2}$ be the time when the first correct node sends tick 2. By Unforgeability,
$$t_{first,2} \ge t_{first,1} + T^-_{first} \ge t_b + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{TH} + T^-_{first}.$$
By Lemma 9, all other correct nodes must send tick 1 by $t_{last,1}$ with
$$t_{last,1} \le t_b + B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} + \tau^+_{TH}.$$
Thus,
$$t_{last,1} - t_{first,2} \le B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} - \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + (\tau^+_{TH} - \tau^-_{TH}) - T^-_{first} \le T_{QS}.$$
Step (k + 1 ≥ 3): Let $t_{first,k}$ be the time when the first correct node sends tick k. As our induction hypothesis, we assume that all correct nodes send tick k − 1 by $t_{last,k-1} \le t_{first,k} + T_{QS}$.
Let $t_{first,k+1}$ be the time the first correct node, say p, sends tick k + 1 ≥ 3. By Lemma 8, there exists a set Q of size |Q| ≥ 2f + 1 such that at time $t' = t_{first,k+1} - \tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff}$:
$$\forall q \in Q : \#r^{rem}_{p,q}(t') \ge k. \tag{43}$$
Clearly, there is a subset $\tilde Q \subseteq Q$ of correct nodes among Q of size at least $|\tilde Q| \ge f + 1$. Let $\tilde Q' := C \setminus (\tilde Q \cup \{p\})$. Now consider the partitioning of correct nodes $C = \tilde Q \cup \{p\} \cup \tilde Q'$. We will prove the property separately for each of the three partitions: In case $q \in \tilde Q$, by (43) and the remote channel properties, q has sent tick k by $t' - \tau^-_{rem} < t_{first,k+1}$. In case q = p, it has sent tick k by $t_{first,k+1}$. For the only non-trivial case $q \in \tilde Q'$, it follows from the remote channel properties that any $\tilde q \in \tilde Q$, as a correct node, must have sent tick k by
$$t_{\tilde Q,k} := t_{first,k+1} - \tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff} - \tau^-_{rem}. \tag{44}$$
Now consider an arbitrary correct node $r \in \tilde Q'$. By the induction hypothesis and (U),
$$t_{r,k-1} \le t_{first,k} + T_{QS} \le t_{first,k+1} - T^-_{first} + T_{QS}. \tag{45}$$
We may now apply Lemma 6 to node r and the set of correct nodes $\tilde Q$ with $t_{r,k-1}$ and $t_{\tilde Q,k}$ from (45) and (44). This yields:
$$
\begin{aligned}
t_{r,k} &\le \max\{t_{r,k-1} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GR},\ t_{r,k-1} + \tau^+_{loc} + \tau^+_{GEQ},\ t_{\tilde Q,k} + \tau^+_{rem} + \tau^+_{GR}\} + \tau^+_{TH}\\
&\le t_{first,k+1} + \max\{-T^-_{first} + T_{QS} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GR},\ -T^-_{first} + T_{QS} + \tau^+_{loc} + \tau^+_{GEQ},\\
&\qquad\qquad -\tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff} - \tau^-_{rem} + \tau^+_{rem} + \tau^+_{GR}\} + \tau^+_{TH}\\
&\le t_{first,k+1} + \max\{T_{QS},\ T_{QS},\ -\tau^-_{TH} - \tau^-_{GEQ} - \tau^-_{Diff} - \tau^-_{rem} + \tau^+_{rem} + \tau^+_{GR} + \tau^+_{TH}\} \qquad (46)\\
&\le t_{first,k+1} + T_{QS}, \qquad (47)
\end{aligned}
$$
where (46) follows by applying Constraint 3 and (47) from the definition of $T_{QS}$. □
Booting Simultaneity (BS)
Proof The proof is by induction on k ≥ 1.
Begin (k = 1): By analogous means as in the (QS) proof, we obtain
$$t_{first,1} \ge t_b + \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + \tau^-_{TH}, \qquad t_{last,1} \le t_b + B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} + \tau^+_{TH}.$$
Thus,
$$t_{last,1} - t_{first,1} \le B + \max\{\tau^+_{GEQ}, \tau^+_{GR}\} - \min\{\tau^-_{GEQ}, \tau^-_{GR}\} + (\tau^+_{TH} - \tau^-_{TH}) \le T_{BS}(1).$$
The proposition follows.
Step (k > 1): Assume the property holds for k. From (U) and (P), it follows that
$$t_{first,k+1} - t_{first,k} \ge T^-_{first}, \qquad t_{last,k+1} - t_{last,k} \le T_P.$$
In combination with the induction hypothesis, this yields:
$$
\begin{aligned}
t_{last,k+1} - t_{first,k+1} &= (t_{last,k+1} - t_{last,k}) + (t_{last,k} - t_{first,k}) + (t_{first,k} - t_{first,k+1})\\
&\le T_P + T_{BS}(k) - T^-_{first} = T_{BS}(k + 1).
\end{aligned}
$$
The proposition follows. □
Lemma 10 (Fastest Progress). Assume that p is the first correct node that sends tick number k ≥ 1 at time $t_{first,k}$. Then no correct node can send tick k′ ≥ k before time $t_{first,k} + (k' - k)T^-_{first}$.
Proof The proof is by induction on k′ − k.
Begin (k′ = k): The lemma trivially holds since the first correct node cannot send tick k before time $t_{first,k}$.
Step (k′ ≥ k + 1): Assume that p is the first correct node that sends tick k. By the induction hypothesis, the first correct node q ∈ C that sends tick k′ does not send it before $t_{first,k} + (k' - k)T^-_{first}$. Because of (U), no correct node can send tick k′ + 1 before $t_{first,k} + (k' - k)T^-_{first} + T^-_{first} = t_{first,k} + (k' + 1 - k)T^-_{first}$. The lemma follows. □
Having completed the proof of our major Theorem 1, we proceed with Lemmas 11, 12 and 13 that bound the progress of the ticks generated by the fastest node. For this purpose, we define $b_{max}(t)$ as the maximum of $\#b_p(t)$ over all correct nodes C, i.e., $b_{max}(t) := \max\{\#b_p(t) \mid p \in C\}$. Similarly, we define $b_{min}(t) := \min\{\#b_p(t) \mid p \in C\}$. Recall that $f(\overrightarrow{t})$ denotes the left limit of function f at time t. For example, if node p sends tick k ≥ 1 at time $t_{p,k}$, then $\#b_p(\overrightarrow{t}_{p,k}) = k - 1$ (whereas $\#b_p(t_{p,k}) = k$).
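Lemma 11 (Maximum Increase of $b_{max}$ in $(t_{first,k}, t)$). If the first correct node sends tick k ≥ 1 at $t_{first,k}$, then for all times $t > t_{first,k}$
$$b_{max}(\overrightarrow{t}) - b_{max}(t_{first,k}) \le \left\lceil \frac{t - t_{first,k}}{T^-_{first}} \right\rceil - 1,$$
or, equivalently: The number N of ticks sent by the first correct node in the interval $I = (t_{first,k}, t)$ satisfies
$$N \le \left\lceil \frac{t - t_{first,k}}{T^-_{first}} \right\rceil - 1.$$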
Proof Let $t_{first,j}$ be the time when the first correct node sends tick j, and assume by contradiction that
$$N \ge \left\lceil \frac{t - t_{first,k}}{T^-_{first}} \right\rceil.$$
According to the definition of N,
$$t_{first,k+N} < t. \tag{49}$$
By applying Lemma 10 to $t_{first,k}$ and $t_{first,k+N}$, we find
$$t_{first,k+N} - t_{first,k} \ge N\,T^-_{first} \ge \left\lceil \frac{t - t_{first,k}}{T^-_{first}} \right\rceil T^-_{first} \ge t - t_{first,k}. \tag{50}$$
Clearly, (50) contradicts (49). The lemma follows. □
Lemma 12 (Maximum Increase of $b_{max}$ in $(t_{first,k}, t]$). If the first correct node sends tick k ≥ 1 at $t_{first,k}$, then
$$\forall t > t_{first,k} : b_{max}(t) - b_{max}(t_{first,k}) \le \left\lfloor \frac{t - t_{first,k}}{T^-_{first}} \right\rfloor$$
or, equivalently: The number N of ticks sent by the first correct node in the interval $I = (t_{first,k}, t]$ satisfies $N \le \lfloor (t - t_{first,k})/T^-_{first} \rfloor$.
The following Lemma 13 is a weaker form of Lemma 12, where the beginning of the interval I does not necessarily coincide with the sending of a tick by the first correct node, i.e., where I is not “aligned” with $t_{first,k}$.
Lemma 13 (Maximum Increase of $b_{max}$ in $(t, t']$).
$$\forall t' > t : b_{max}(t') - b_{max}(t) \le \left\lceil \frac{t' - t}{T^-_{first}} \right\rceil$$
or, equivalently: The number N of ticks sent by the first correct node in the interval $I = (t, t']$ satisfies $N \le \lceil (t' - t)/T^-_{first} \rceil$.
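The following Lemma 14 is an analogue of Lemma 13 for any correct node p.
Lemma 14 For every correct node p,
$$\forall t' > t : \#b_p(t') - \#b_p(t) \le \left\lceil \frac{t' - t}{T_{min}} \right\rceil$$
or, equivalently: The number N of ticks sent by the correct node p in the interval $I = (t, t']$ satisfies $N \le \lceil (t' - t)/T_{min} \rceil$.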
The next Lemma 15 bounds the progress of the last correct
node.
Lemma 15 (Last Progress) If the last correct node sends tick k ≥ 1 at $t_{last,k}$, then the last correct node sends tick k + N, N ≥ 0, by $t_{last,k+N}$, with
$$t_{last,k+N} - t_{last,k} \le N\,T_P.$$
Proof The proof is by induction on N:
Begin (N = 0): Clearly $t_{last,k} - t_{last,k} \le 0$ is true. The lemma follows.
Step (N > 0): As induction hypothesis, assume that the lemma is true for N − 1. By applying (P), we immediately obtain
$$t_{last,k+N} - t_{last,k} = (t_{last,k+N} - t_{last,k+N-1}) + (t_{last,k+N-1} - t_{last,k}) \le T_P + (N - 1)T_P = N\,T_P. \tag{51}$$
The lemma follows. □
The following two simple technical lemmas complete the
intermediate proof layer.
Lemma 16 (Progress by (QS)) If $p$ is a correct node which sends tick $k \ge 2$ at $t_{p,k}$, then $p$ sends tick $k + N$, $N \ge 1$, at $t_{p,k+N}$ with
$$t_{p,k+N} - t_{p,k} \le (N+1)T_P + T_{QS}.$$
Proof With Lemma 15 and (QS), it follows that
$$\begin{aligned} t_{p,k+N} - t_{p,k} &\le t_{last,k+N} - t_{first,k} \\ &= (t_{last,k+N} - t_{last,k-1}) + (t_{last,k-1} - t_{first,k}) \\ &\le (N+1)T_P + T_{QS}. \end{aligned}$$
The lemma follows. □
Lemma 17 (Progress by (BS)) If $p$ is a correct node which sends tick $k \ge 1$ at $t_{p,k}$, then $p$ sends tick $k + N$, $N \ge 1$, at $t_{p,k+N}$ with
$$t_{p,k+N} - t_{p,k} \le N\,T_P + T_{BS}(k).$$
Proof With Lemma 15 and (BS), it follows that
$$\begin{aligned} t_{p,k+N} - t_{p,k} &\le t_{last,k+N} - t_{first,k} \\ &= (t_{last,k+N} - t_{last,k}) + (t_{last,k} - t_{first,k}) \\ &\le N\,T_P + T_{BS}(k). \end{aligned}$$
The lemma follows. □
5.3 Top proof layer
We are now ready to establish our major results. The first one, Theorem 2, bounds the precision $\pi$ of our algorithm, i.e., it shows that for every pair of correct nodes $p, q \in C$: $\forall t \ge 0: |\#b_q(t) - \#b_p(t)| \le \pi$.





+ 1 is a valid
precision-bound.
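Proof Let $p, q$ be two distinct correct nodes. Clearly, for all $t$, $|\#b_q(t) - \#b_p(t)| \le b_{\max}(t) - b_{\min}(t)$. We will bound this term by distinguishing three cases for $t$: (i) $t \in [0, t_{last,1})$, (ii) $t = t_{last,k}$, and (iii) $t \in (t_{last,k}, t_{last,k+1})$, for some $k \ge 1$.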
ad (i): Since $t \in [0, t_{last,1})$, we have $b_{\max}(t) \le b_{\max}(\overrightarrow{t}_{last,1})$ and $b_{\min}(t) = 0$. Thus,
bmax (t) − bmin(t)
= bmax (t) ≤ bmax (t→last,1)
=
(
bmax (t→last,1) − bmax (t f irst,1)
)
+ bmax (t f irst,1)
≤
⌈
















τ+G E Q ,
τ+G R
}































where (52) follows from Lemma 11 and (53) from Lemma 9.
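ad (ii):
$$\begin{aligned} b_{\max}(t_{last,k}) - b_{\min}(t_{last,k}) &= \big(b_{\max}(t_{last,k}) - b_{\max}(t_{first,k+1})\big) + \big(b_{\max}(t_{first,k+1}) - b_{\min}(t_{last,k})\big) \\ &= \big(b_{\max}(t_{last,k}) - b_{\max}(t_{first,k+1})\big) + (k + 1 - k) \\ &\le \left\lfloor \frac{t_{last,k} - t_{first,k+1}}{T^-_{first}} \right\rfloor + 1 \qquad (55) \\ &\le \left\lfloor \frac{T_{QS}}{T^-_{first}} \right\rfloor + 1, \qquad (56) \end{aligned}$$
where (55) follows from Lemma 12 and (56) from (QS).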
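ad (iii): Since $t \in (t_{last,k}, t_{last,k+1})$, we have $b_{\max}(t) \le b_{\max}(\overrightarrow{t}_{last,k+1})$. Thus,
$$\begin{aligned} b_{\max}(t) - b_{\min}(t) &= \big(b_{\max}(t) - b_{\max}(t_{first,k+2})\big) + \big(b_{\max}(t_{first,k+2}) - b_{\min}(t)\big) \\ &= \big(b_{\max}(t) - b_{\max}(t_{first,k+2})\big) + (k + 2 - k) \\ &\le \big(b_{\max}(\overrightarrow{t}_{last,k+1}) - b_{\max}(t_{first,k+2})\big) + 2 \\ &\le \left\lceil \frac{t_{last,k+1} - t_{first,k+2}}{T^-_{first}} \right\rceil - 1 + 2 \qquad (57) \\ &\le \left\lceil \frac{T_{QS}}{T^-_{first}} \right\rceil + 1, \qquad (58) \end{aligned}$$
where (57) follows from Lemma 11 and (58) from (QS).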
Combining (54), (56) and (58) provides a precision bound
















+ 1 = π
The theorem follows. unionsq
Our next major result, Theorem 3 (Accuracy), bounds the number of ticks generated locally at a correct node $p$ during some real-time interval $(t_1, t_2]$, i.e., it allows one to make statements about the local clock frequency. For example, it reveals that the long-term frequency is within $[1/T_P, 1/T^-_{first}]$.
Theorem 3 (Accuracy). Given $t_1$ and $t_2$ with $t_2 > t_1 \ge t_{p,1}$,






t2 − t1 − max
{



















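Proof To prove the accuracy upper bound, we start from
$$\#b_p(t_2) - \#b_p(t_1) \le b_{\max}(t_2) - b_{\min}(t_1) \le \big(b_{\max}(t_2) - b_{\max}(t_1)\big) + \big(b_{\max}(t_1) - b_{\min}(t_1)\big). \qquad (59)$$
Both terms of (59) can be bounded by applying Lemma 13 and Theorem 2, respectively, which yields
$$b_{\max}(t_2) - b_{\max}(t_1) \le \left\lceil \frac{t_2 - t_1}{T^-_{first}} \right\rceil, \qquad (60)$$
$$b_{\max}(t_1) - b_{\min}(t_1) \le \pi, \qquad (61)$$
thus yielding
$$\#b_p(t_2) - \#b_p(t_1) \le \left\lceil \frac{t_2 - t_1}{T^-_{first}} \right\rceil + \pi. \qquad (62)$$
Moreover, from Lemma 14, it follows that
$$\#b_p(t_2) - \#b_p(t_1) \le \left\lceil \frac{t_2 - t_1}{T_{min}} \right\rceil. \qquad (63)$$
A combination of both bounds (62) and (63) leads to
$$\#b_p(t_2) - \#b_p(t_1) \le \min\left\{ \left\lceil \frac{t_2 - t_1}{T^-_{first}} \right\rceil + \pi,\; \left\lceil \frac{t_2 - t_1}{T_{min}} \right\rceil \right\}.$$
The upper bound follows.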
To prove the accuracy lower bound, let $k = \#b_p(t_1) \ge 1$ and $N \ge 0$ such that $k + N = \#b_p(t_2)$. Clearly, such $k$ and $N$ exist since $p$ has sent tick 1 by $t_1$. By the definition of $k$ and $N$,
$$t_{p,k} \le t_1 \qquad (64)$$
$$t_{p,k+N+1} > t_2. \qquad (65)$$
For $k \ge 2$ we can apply Lemma 16 together with (64) and (65), yielding
$$\begin{aligned} t_2 - t_1 &< t_{p,k+N+1} - t_{p,k} \qquad (66) \\ &\le (N+2)T_P + T_{QS} \end{aligned}$$








For $k \ge 1$, we apply Lemma 17 to (66), which yields
$$t_2 - t_1 < (N+1)T_P + T_{BS}(k)$$








Combining the bounds (67), (68) and the trivial bound 0
yields (with #bp(t1) = k)
























if k ≥ 2.
(69)
Note that the term $\min\{T_{BS}(k), T_{QS} + T_P\}$ for $k \ge 2$ in (69) accounts for the fact that correct nodes may be synchronized very tightly after booting (within $T_{BS}(k)$), in which case $T_{QS} + T_P$ would be too conservative. From the definition of $T_{BS}(k)$ in (42), however, it follows that the initial synchrony from booting usually cannot be maintained: For $T^-_{first} < T_P$, which is typically the case in real systems, we obtain $\lim_{k \to \infty} T_{BS}(k) = \infty$. Thus, for some $k_0$, $\forall k \ge k_0: T_{QS} + T_P < T_{BS}(k)$, i.e., the constant bound from (QS) becomes the tighter one, and it is (QS) that prevents the nodes from drifting apart any further.
In case $k$ is not known, a valid bound is the minimum of all lower bounds. For the reasons given above, it is not conservative to utilize the bound $\min\{T_{BS}(k), T_{QS} + T_P\} \le$












The lower bound follows.
The theorem follows. unionsq






+ π becomes asymptotically determining for
the upper bound for large enough t2 − t1. From this and the
lower bound, we immediately obtain the long-term frequency bounds $[1/T_P, 1/T^-_{first}]$.
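Combining the bound obtained from Lemma 13 and Theorem 2 with the one from Lemma 14 gives, as we read the proof of Theorem 3, at most $\min\{\lceil (t_2 - t_1)/T^-_{first} \rceil + \pi, \lceil (t_2 - t_1)/T_{min} \rceil\}$ ticks in $(t_1, t_2]$. The Python sketch below evaluates this expression for hypothetical parameter values chosen with $T_{min} < T^-_{first}$, treating $\pi$ as a given integer, to show how the bound per time unit behaves as the interval grows.

```python
import math
from fractions import Fraction

def max_ticks_upper(dt, T_first_minus, T_min, pi):
    """Upper bound on #b_p(t2) - #b_p(t1) for dt = t2 - t1, read from the proof
    of Theorem 3: the smaller of ceil(dt / T_first^-) + pi and ceil(dt / T_min)."""
    dt = Fraction(dt)
    return min(math.ceil(dt / T_first_minus) + pi, math.ceil(dt / T_min))

# Hypothetical parameters, chosen with T_min < T_first^-.
T_first_minus, T_min, pi = Fraction(10), Fraction(8), 3
for dt in (10, 100, 10_000):
    n = max_ticks_upper(dt, T_first_minus, T_min, pi)
    print(dt, n, float(Fraction(n, dt)))  # interval, bound, bound per time unit
```

For short intervals the $T_{min}$ term is the smaller one (2 ticks for $dt = 10$), while for long intervals the $T^-_{first}$ term takes over and the rate approaches $1/T^-_{first} = 0.1$ (0.1003 for $dt = 10\,000$), in line with the upper end of the long-term frequency range.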
Our final Theorems 4 and 6 establish bounds on the size of the local and remote pipelines. We start with a technical Lemma 18, which bounds the maximum time some tick can exist in the system before it is eliminated by the Diff-Gate in all pipepairs associated with correct nodes.
Lemma 18 If the first correct node has sent tick $k \ge 2$ by time $t$, then every correct node $p \in C$ has removed tick $k-2$ from all its pipepairs corresponding to correct nodes $q \in C$ by time $t + T_{del}$, with
$$T_{del} := T_{QS} + \max\{\tau^+_{loc}, \tau^+_{rem}\} + \tau^+_{Diff}, \qquad (70)$$
or, equivalently,
$$\#d_{p,q}(t + T_{del}) \ge b_{\max}(t) - 2.$$
Proof See Lemma 18 in Appendix A. □
Theorem 4 (Local pipeline size bound). For every pair of









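Proof Choose two arbitrary distinct correct nodes $p, q$ and consider $\#s^{loc}_{p,q}(t)$. We distinguish between two cases for $t$: (i) $t \ge t_{first,2} + T_{del}$ and (ii) $t < t_{first,2} + T_{del}$.
ad (i):
$$\begin{aligned} \#s^{loc}_{p,q}(t) &= \#r^{loc}_{p,q}(t) - \#d_{p,q}(t) \\ &\le \#b_p(t - \tau^-_{loc}) - \#d_{p,q}(t) \\ &\le b_{\max}(t - \tau^-_{loc}) - \#d_{p,q}(t) \\ &= \big(b_{\max}(t - \tau^-_{loc}) - b_{\max}(t - T_{del})\big) + \big(b_{\max}(t - T_{del}) - \#d_{p,q}(t)\big) \\ &\le \left\lceil \frac{T_{del} - \tau^-_{loc}}{T^-_{first}} \right\rceil + 2, \qquad (72) \end{aligned}$$
where (72) follows from applying Lemmas 13 and 18.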
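ad (ii): Since $\#s^{loc}_{p,q}(t) \le \#r^{loc}_{p,q}(t) + 1$ must always hold, because the local pipe may contain at most all ticks received so far plus the initial tick 0, we obtain
$$\begin{aligned} \#s^{loc}_{p,q}(t) &\le \#r^{loc}_{p,q}(t) + 1 \\ &\le \#b_p(t - \tau^-_{loc}) + 1 \\ &\le b_{\max}(t - \tau^-_{loc}) + 1 \\ &\le b_{\max}(t_{first,2} + T_{del} - \tau^-_{loc}) + 1 \\ &\le \big(b_{\max}(t_{first,2} + T_{del} - \tau^-_{loc}) - b_{\max}(t_{first,2})\big) + b_{\max}(t_{first,2}) + 1 \qquad (73) \\ &\le \left\lfloor \frac{T_{del} - \tau^-_{loc}}{T^-_{first}} \right\rfloor + 3, \qquad (74) \end{aligned}$$
where (74) follows from Lemma 12.
The theorem follows in both cases. □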
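As we read the two cases of the proof of Theorem 4, case (i) contributes at most $\lceil x \rceil + 2$ and case (ii) at most $\lfloor x \rfloor + 3$ ticks, with $x = (T_{del} - \tau^-_{loc})/T^-_{first}$. The hypothetical Python sketch below evaluates both terms for made-up delay values, once with an integral and once with a non-integral $x$.

```python
import math
from fractions import Fraction

def local_pipe_case_bounds(T_del, tau_loc_minus, T_first_minus):
    """Case bounds on #s^loc_{p,q}(t) as read from the proof of Theorem 4.

    case (i),  t >= t_first,2 + T_del: ceil(x) + 2   (Lemmas 13 and 18)
    case (ii), t <  t_first,2 + T_del: floor(x) + 3  (Lemma 12)
    with x = (T_del - tau_loc^-) / T_first^-.
    """
    x = Fraction(T_del - tau_loc_minus) / T_first_minus
    return math.ceil(x) + 2, math.floor(x) + 3

# Hypothetical delays: x = 3 (integral) versus x = 3.5 (non-integral).
print(local_pipe_case_bounds(33, 3, 10))  # (5, 6)
print(local_pipe_case_bounds(38, 3, 10))  # (6, 6)
```

Since $\lceil x \rceil \le \lfloor x \rfloor + 1$ for every real $x$, the start-up term of case (ii) is never smaller than the steady-state term of case (i), so the maximum over both cases equals $\lfloor x \rfloor + 3$.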
In order to bound the size of the remote pipelines, we can
use exactly the same derivation based on Lemma 18 as used
in Theorem 4 (with remote delays instead of local delays) to
obtain the following Theorem 5.
Theorem 5 For every pair of distinct correct nodes p, q ∈






However, the resulting bound is overly large in most parameter settings: In order to maximize $\#s^{rem}_{p,q}(t)$, we need a scenario where the local node $p$ is slow and the remote node $q$ is fast. The time $T_{del}$ established by Lemma 18 is too conservative for this case, however, since it also accounts for $q$ being slow. Lemma 19 provides a refined result.
Lemma 19 If $k := b_{\max}(t) \ge 2$, then for every pair of correct nodes $p, q \in C$, with
$$T^{loc}_{del} := T_{QS} + \tau^+_{loc} + \tau^+_{Diff}, \qquad (75)$$
$$t' := t + T^{loc}_{del} - \tau^+_{rem} - \tau^+_{Diff},$$
$$k' := \#b_q(t'),$$
the following holds:
(a) If $k' \ge k - 1$, then tick $k-2$ is removed from pipepair $(p, q)$ by time $t + T^{loc}_{del}$, i.e., $\#d_{p,q}(t + T^{loc}_{del}) \ge b_{\max}(t) - 2$.
(b) If $k' \le k - 2$, then tick $k'-1$ is removed from pipepair $(p, q)$ by time $t + T^{loc}_{del}$, i.e., $\#d_{p,q}(t + T^{loc}_{del}) \ge \#b_q(t') - 1$.
Proof To prove case (a), assume $k' \ge k - 1$, which implies $t_{q,k-1} \le t'$. Hence,
$$t_{q,k-1} + \tau^+_{rem} + \tau^+_{Diff} \le t' + \tau^+_{rem} + \tau^+_{Diff} = t + T^{loc}_{del},$$
and, by (QS) and our assumption $t_{first,k} \le t$,
$$t_{p,k-1} + \tau^+_{loc} + \tau^+_{Diff} \le t_{first,k} + T_{QS} + \tau^+_{loc} + \tau^+_{Diff} \le t + T^{loc}_{del}. \qquad (76)$$
We can now apply Lemma 5, which reveals that tick $k-2$ is removed from the pipepair $(p, q)$ by time $t_{rmv,k-2}$, with
$$t_{rmv,k-2} \le \max\{t_{p,k-1} + \tau^+_{loc},\, t_{q,k-1} + \tau^+_{rem}\} + \tau^+_{Diff} \le t + T^{loc}_{del},$$
as asserted.
To prove case (b), assume $k' \le k - 2$, which implies $t_{q,k-1} > t'$ and $t_{q,k'} \le t'$. Hence,
$$t_{q,k'} + \tau^+_{rem} + \tau^+_{Diff} \le t' + \tau^+_{rem} + \tau^+_{Diff} = t + T^{loc}_{del},$$
$$t_{q,k-1} + \tau^+_{rem} + \tau^+_{Diff} > t' + \tau^+_{rem} + \tau^+_{Diff} = t + T^{loc}_{del}.$$
Since (76) also holds in case (b) and trivially $t_{p,k'} \le t_{p,k-1}$, we find
$$t_{p,k'} + \tau^+_{loc} + \tau^+_{Diff} \le t + T^{loc}_{del}.$$
We can again apply Lemma 5, which reveals that tick $k'-1$ is removed from the pipepair $(p, q)$ by time $t_{rmv,k'-1}$, with
$$t_{rmv,k'-1} \le \max\{t_{p,k'} + \tau^+_{loc},\, t_{q,k'} + \tau^+_{rem}\} + \tau^+_{Diff} \le t + T^{loc}_{del},$$
as asserted. The lemma follows. □
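Lemmas 18 and 19 differ only in the delay budget they allow for tick removal. The following Python sketch, using hypothetical delay values and helper names, evaluates $T_{del}$ from (70), $T^{loc}_{del}$ from (75), and the probe time $t'$ at which Lemma 19 inspects $\#b_q$.

```python
def t_del(T_QS, tau_loc_plus, tau_rem_plus, tau_diff_plus):
    """(70): T_del = T_QS + max(tau_loc^+, tau_rem^+) + tau_Diff^+ (Lemma 18)."""
    return T_QS + max(tau_loc_plus, tau_rem_plus) + tau_diff_plus

def t_del_loc(T_QS, tau_loc_plus, tau_diff_plus):
    """(75): T_del^loc = T_QS + tau_loc^+ + tau_Diff^+ (Lemma 19)."""
    return T_QS + tau_loc_plus + tau_diff_plus

def lemma19_probe_time(t, T_QS, tau_loc_plus, tau_rem_plus, tau_diff_plus):
    """t' = t + T_del^loc - tau_rem^+ - tau_Diff^+: when #b_q is inspected."""
    return t + t_del_loc(T_QS, tau_loc_plus, tau_diff_plus) - tau_rem_plus - tau_diff_plus

# Hypothetical delay bounds with a slow remote path (tau_rem^+ > tau_loc^+).
params = dict(T_QS=20, tau_loc_plus=2, tau_diff_plus=1)
print(t_del(tau_rem_plus=9, **params))                    # 30
print(t_del_loc(**params))                                # 23
print(lemma19_probe_time(100, tau_rem_plus=9, **params))  # 113
```

Because $\max\{\tau^+_{loc}, \tau^+_{rem}\} \ge \tau^+_{loc}$, $T^{loc}_{del}$ never exceeds $T_{del}$; the difference is exactly $\max\{0, \tau^+_{rem} - \tau^+_{loc}\}$.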
Now we can establish our final major Theorem 6.
Theorem 6 (Remote pipeline size bound). For every pair
of distinct correct nodes p, q ∈ C, #sremp,q (t) is bounded by
Srem, with Srem :=
max
{⌊











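Proof Choose two arbitrary distinct correct nodes $p, q$ and consider $\#s^{rem}_{p,q}(t)$. Since we will apply Lemma 19, we distinguish the following cases:
Case (a): Suppose that $t \ge t_{first,2} + T^{loc}_{del}$ as well as $\#b_q(t - \tau^+_{rem} - \tau^+_{Diff}) \ge b_{\max}(t - T^{loc}_{del}) - 1$. Then, we find
$$\begin{aligned} \#s^{rem}_{p,q}(t) &= \#r^{rem}_{p,q}(t) - \#d_{p,q}(t) \\ &\le \#b_q(t - \tau^-_{rem}) - \#d_{p,q}(t) \\ &\le b_{\max}(t - \tau^-_{rem}) - \#d_{p,q}(t) \\ &= \big(b_{\max}(t - \tau^-_{rem}) - b_{\max}(t - T^{loc}_{del})\big) + \big(b_{\max}(t - T^{loc}_{del}) - \#d_{p,q}(t)\big) \\ &\le \left\lceil \frac{T^{loc}_{del} - \tau^-_{rem}}{T^-_{first}} \right\rceil + 2, \qquad (77) \end{aligned}$$
where (77) follows from applying Lemmas 13 and 19.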
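Case (b): Suppose that $t \ge t_{first,2} + T^{loc}_{del}$, but now $\#b_q(t - \tau^+_{rem} - \tau^+_{Diff}) \le b_{\max}(t - T^{loc}_{del}) - 2$. Then, we find
$$\begin{aligned} \#s^{rem}_{p,q}(t) &= \#r^{rem}_{p,q}(t) - \#d_{p,q}(t) \\ &\le \#b_q(t - \tau^-_{rem}) - \#d_{p,q}(t) \\ &= \#b_q(t - \tau^-_{rem}) - \#b_q(t - \tau^+_{rem} - \tau^+_{Diff}) + \#b_q(t - \tau^+_{rem} - \tau^+_{Diff}) - \#d_{p,q}(t) \\ &\le \left\lceil \frac{\tau^+_{rem} - \tau^-_{rem} + \tau^+_{Diff}}{T_{min}} \right\rceil + 1, \qquad (78) \end{aligned}$$
where (78) follows from Lemmas 14 and 19. Note that $\tau^+_{rem} - \tau^-_{rem} + \tau^+_{Diff}$ is always non-negative.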
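Case (c): Finally, suppose that $t < t_{first,2} + T^{loc}_{del}$. Since $\#s^{rem}_{p,q}(t) \le \#r^{rem}_{p,q}(t) + 1$ must always hold, because the remote pipe may contain at most all ticks received so far plus the initial tick 0, we obtain
$$\begin{aligned} \#s^{rem}_{p,q}(t) &\le \#r^{rem}_{p,q}(t) + 1 \\ &\le \#b_q(t - \tau^-_{rem}) + 1 \\ &\le b_{\max}(t - \tau^-_{rem}) + 1 \\ &\le b_{\max}(t_{first,2} + T^{loc}_{del} - \tau^-_{rem}) + 1 \\ &\le \big(b_{\max}(t_{first,2} + T^{loc}_{del} - \tau^-_{rem}) - b_{\max}(t_{first,2})\big) + b_{\max}(t_{first,2}) + 1 \qquad (79) \\ &\le \left\lfloor \frac{T^{loc}_{del} - \tau^-_{rem}}{T^-_{first}} \right\rfloor + 3, \qquad (80) \end{aligned}$$
where (80) follows from Lemma 12.
The theorem follows in all cases. □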
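To compare the three cases of the proof of Theorem 6 numerically, the sketch below evaluates the case bounds as we read them from the cited lemmas: case (a) at most $\lceil (T^{loc}_{del} - \tau^-_{rem})/T^-_{first} \rceil + 2$, case (b) at most $\lceil (\tau^+_{rem} - \tau^-_{rem} + \tau^+_{Diff})/T_{min} \rceil + 1$, and case (c) at most $\lfloor (T^{loc}_{del} - \tau^-_{rem})/T^-_{first} \rfloor + 3$. All parameter values and the helper name are hypothetical; the maximum over the three cases bounds $\#s^{rem}_{p,q}(t)$ for every $t$.

```python
import math
from fractions import Fraction

def remote_pipe_case_bounds(T_del_loc, tau_rem_minus, tau_rem_plus,
                            tau_diff_plus, T_first_minus, T_min):
    """Case bounds on #s^rem_{p,q}(t) as read from the proof of Theorem 6."""
    x = Fraction(T_del_loc - tau_rem_minus) / T_first_minus
    y = Fraction(tau_rem_plus - tau_rem_minus + tau_diff_plus) / T_min
    return {
        "a": math.ceil(x) + 2,   # (77): Lemmas 13 and 19(a)
        "b": math.ceil(y) + 1,   # (78): Lemmas 14 and 19(b)
        "c": math.floor(x) + 3,  # (80): Lemma 12, t < t_first,2 + T_del^loc
    }

# Hypothetical values (T_del^loc = 23, e.g. T_QS = 20, tau_loc^+ = 2, tau_Diff^+ = 1).
bounds = remote_pipe_case_bounds(T_del_loc=23, tau_rem_minus=5, tau_rem_plus=9,
                                 tau_diff_plus=1, T_first_minus=10, T_min=8)
print(bounds, max(bounds.values()))  # {'a': 4, 'b': 2, 'c': 4} 4
```

Case (b) only dominates when the remote delay uncertainty $\tau^+_{rem} - \tau^-_{rem}$ is large compared to $T_{min}$, which is the "fast remote node" scenario that motivated Lemma 19.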
6 Conclusions
We introduced the DARTS clock generation approach, which has been derived from a well-known distributed fault-tolerant tick generation algorithm and adapted for direct implementation in hardware. Major modifications had to be applied to the original distributed algorithm in order to adapt it to the inherent fine-grain parallelism and limited resources of VLSI hardware implementations. DARTS provides a set of closely synchronized local clock signals, which can drive, for example, the subsystems of an SoC. Our approach does not need quartz oscillators or the like, is guaranteed to start up, and generates a clock frequency that automatically adapts to the current operating conditions.
The resulting algorithm and its synchronization precision depend only weakly on the implementation technology and the actual placement and routing on a chip: It merely requires that the ratios of certain path delays, rather than the delays themselves, satisfy a few moderate constraints. The algorithm itself depends on these constraints only via the number of stages used in the elastic pipelines. Hence, there is usually no need to modify the algorithm when using a different implementation technology and/or a different placement & routing in an SoC.
We also provided a rigorous correctness proof and a worst-case performance analysis, which employs a novel framework for the specification and analysis of distributed algorithms that are directly implementable in clockless digital logic. It shows that a system incorporating n ≥ 3f + 2 clock generation units (TG-Algs) can cope with up to f Byzantine faulty TG-Algs. Our proof rests only on simple properties of some elementary building blocks, which can be verified by digital design tools. Hence, the correctness of a system of any size n can be guaranteed if the low-level building blocks are implemented correctly. Note that a comparable result, covering arbitrary system sizes n, cannot be established via model checking.
Backed up by the lessons learned in the DARTS project, and encouraged by the conclusions of a recent Dagstuhl seminar [9] devoted to the topic, we feel justified in claiming that the proposed modeling framework, the building blocks, and the problems solved in DARTS are paradigmatic for fault-tolerant clockless algorithms in VLSI in general. We will conclude our paper with a few arguments in favor of this claim.
In classic clockless VLSI circuits, each module hand-
shakes with its predecessor and its successor modules in a
“wait-for-all” manner, i.e., it waits until it has received a valid
handshake signal from all its predecessor modules before it
generates the next handshake signal for its predecessor mod-
ules. Clearly, this approach is infeasible if some modules may
fail and hence generate erroneous or no handshake signals. In
this case, however, it is natural to replace the “wait-for-all” by
thresholds: Instead of waiting for all predecessors, a module
only waits for a sufficiently large subset of its predecessors to
complete the handshaking. Still, for repeated threshold-based handshaking, it is essential not to mix up handshake signals that have not been used for passing the threshold earlier with "fresh" ones. Obviously, it is exactly this kind of behavior that is encountered in a single DARTS TG-Alg module:
It generates the next tick if a sufficiently large number of
its predecessor modules (i.e., other TG-Alg modules) have
generated a tick, and never mixes up old and new ticks.
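The shift from wait-for-all to threshold handshaking can be illustrated with a small Python model. This is a hypothetical sketch, not the DARTS TG-Alg: the class name, the cumulative per-predecessor counters and the firing rule are assumptions made for illustration. Setting the threshold to the number of predecessors recovers classic wait-for-all behavior.

```python
class ThresholdStage:
    """Hypothetical threshold-based handshake stage (illustration only).

    It keeps a cumulative handshake count per predecessor and emits its
    next tick k + 1 once at least `threshold` distinct predecessors have
    delivered their tick k + 1. With threshold == len(predecessors) this
    degenerates to classic "wait-for-all" handshaking.
    """

    def __init__(self, predecessors, threshold):
        self.received = {q: 0 for q in predecessors}
        self.threshold = threshold
        self.ticks = 0

    def on_handshake(self, q):
        """Record one handshake from predecessor q; return how many ticks fired."""
        self.received[q] += 1
        fired = 0
        # Each predecessor contributes at most one vote per tick index, so a burst
        # of handshakes from a single predecessor cannot advance the stage alone.
        while sum(c > self.ticks for c in self.received.values()) >= self.threshold:
            self.ticks += 1
            fired += 1
        return fired

preds = ["a", "b", "c", "d"]  # predecessor "d" has crashed and stays silent
threshold_stage, wait_for_all = ThresholdStage(preds, 3), ThresholdStage(preds, 4)
for _ in range(2):
    for q in ["a", "b", "c"]:
        threshold_stage.on_handshake(q)
        wait_for_all.on_handshake(q)
print(threshold_stage.ticks, wait_for_all.ticks)  # 2 0
```

The threshold stage keeps progressing despite the silent predecessor, whereas the wait-for-all stage stalls forever.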
As a consequence, we are convinced that both (i) the build-
ing blocks of DARTS, and (ii) the switch from transition logic
(tick broadcast, elastic pipelines), which is typical for stan-
dard clockless circuits, to state logic (PCSGs and threshold
modules, which are standard synchronous circuits), and back
to transition logic, are not specific to DARTS but paradigmatic
for fault-tolerant clockless circuits in general [28].
Part of our current and future research is devoted to further
substantiating this claim: We recently completed some work
on an important building block for a self-stabilizing variant
of DARTS, which will allow us to drop the simultaneous booting restriction and to transparently recover from transient failures, at the price of higher circuit complexity. Moreover, we have started working on how to make other fault-tolerant
distributed algorithms, including consensus, amenable to a
direct implementation in digital hardware. Needless to say,
our modeling framework, which captures all the peculiari-
ties of such systems without unnecessary overhead, proved
instrumental for all this work.
Acknowledgments The contributions of Johann Vilanek (preliminary
simulations and experiments, and Diff-Gate design), Markus Ferrin-
ger (FPGA prototype), Thomas Handl (tools and library setup), and
Andreas Dielacher (pipelined DARTS) are gratefully acknowledged.
Valuable feedback on the design and implementation of the TG-Algs,
which were primarily conducted by Gottfried Fuchs and Gerald Kempf,
was provided by Andreas Steininger and Josef Widder. We would fur-
ther like to thank the anonymous reviewers for their valuable feedback.
Open Access This article is distributed under the terms of the Creative
Commons Attribution Noncommercial License which permits any
noncommercial use, distribution, and reproduction in any medium,
provided the original author(s) and source are credited.
Appendix A: Proof of technical lemmas
In this appendix, we provide the proofs of some technical
lemmas, which have been omitted from the main text in order
to improve readability.
Lemma 2 If, for a correct node $p \in C$ and a different correct node $q \in C \setminus \{p\}$, at time $t$ it holds that
$$(\#r^{loc}_{p,q}(t) = k) \wedge (\#s^{loc}_{p,q}(t) = 1)$$
for some $k \ge 1$, then it must hold that $\#r^{loc}_{p,q}(t - \tau^-_{Diff}) \ge k$ and $\#r^{rem}_{p,q}(t - \tau^-_{Diff}) \ge k$.
Proof
$$\begin{aligned} &(\#r^{loc}_{p,q}(t) = k) \wedge (\#s^{loc}_{p,q}(t) = 1) \\ \equiv\; &(\#r^{loc}_{p,q}(t) = k) \wedge (\#r^{loc}_{p,q}(t) - \#d_{p,q}(t) = 1) \\ \Rightarrow\; &\#d_{p,q}(t) = k - 1 \qquad (81) \end{aligned}$$
Let $t_{rmv,k-1}$ be the time tick $k-1$ is removed from the pipepair $(p, q)$. From (81) it follows that
$$t_{rmv,k-1} \le t. \qquad (82)$$
Now assume by contradiction that
$$\#r^{loc}_{p,q}(t - \tau^-_{Diff}) < k \;\text{ or }\; \#r^{rem}_{p,q}(t - \tau^-_{Diff}) < k. \qquad (83)$$
Denoting by $t_{loc,k}$ (respectively $t_{rem,k}$) the time the local (respectively remote) tick $k$ is received in the pipepair $(p, q)$, it follows that
$$t_{loc,k} > t - \tau^-_{Diff} \qquad (84)$$
respectively
$$t_{rem,k} > t - \tau^-_{Diff}. \qquad (85)$$
Combination of (82) with (84) (respectively (85)) yields
$$t_{rmv,k-1} < t_{loc,k} + \tau^-_{Diff} \qquad (86)$$
respectively
$$t_{rmv,k-1} < t_{rem,k} + \tau^-_{Diff}. \qquad (87)$$
From the behavioral specification of the Diff-Gate, however, we know that
$$t_{rmv,k-1} \ge t_{loc,k} + \tau^-_{Diff} \quad\text{and}\quad t_{rmv,k-1} \ge t_{rem,k} + \tau^-_{Diff},$$
contradicting (86) and (87). The lemma follows. □
Lemma 7 For all correct nodes $p$ and ticks $k \ge 1$: If there exists a set $Q$ of correct nodes with $|Q| \ge 2f + 1$, such that, for all $q \in Q$:
(i) all nodes $q$ send tick $k$ by $t_{Q,k}$,
(ii) $p$ sends tick $k$ by $t_{p,k}$,
then $p$ sends tick $k+1$ by
$$t' := \max\{t_{p,k} + \tau^+_{loc} + \tau^+_{Diff} + \tau^+_{GEQ},\; t_{p,k} + \tau^+_{loc} + \tau^+_{GR},\; t_{Q,k} + \tau^+_{rem} + \tau^+_{Diff} + \tau^+_{GEQ}\} + \tau^+_{TH}.$$
Proof W.l.o.g., assume that $k \in \mathbb{N}_{even}$. We distinguish two cases:
(a) Suppose that $\exists q \in Q: \#r^{loc}_{p,q}(t') > k$. Since $\#r^{loc}_{p,q}(t') \le \#b_p(t')$, as no tick can be locally received if it has not been sent before, $\#b_p(t') > k$ must hold. Thus, $p$ has sent tick $k+1$ by $t'$. The lemma follows.
(b) Otherwise, assume that
$$\forall q \in Q: \#r^{loc}_{p,q}(t') \le k. \qquad (88)$$
By applying Lemma 5, we conclude that tick $k-1$ is removed from all of $p$'s pipepairs (for all $q \in Q$) by
$$t_{del,k-1} := \max\{t_{p,k} + \tau^+_{loc},\, t_{Q,k} + \tau^+_{rem}\} + \tau^+_{Diff}.$$
Clearly, since all nodes $q \in Q$ are correct and have sent tick $k$ by $t_{Q,k}$, tick $k$ must be received in the remote pipe of $(p, q)$ by $t_{rcv,Q} := t_{Q,k} + \tau^+_{rem}$. Furthermore, $p$ must receive tick $k$ in all its local pipes by $t_{rcv,p} := t_{p,k} + \tau^+_{loc}$. Consequently, at $t_{rcv} := \max\{t_{rcv,p}, t_{rcv,Q}, t_{del,k-1}\} \le \max\{t_{p,k} + \tau^+_{loc}, t_{Q,k} + \tau^+_{rem}\} + \tau^+_{Diff}$, with $t_{rcv} \le t'$, it holds that:
$$\#r^{loc}_{p,q}(t_{rcv}) \ge k, \qquad \#r^{rem}_{p,q}(t_{rcv}) \ge k, \qquad \#d_{p,q}(t_{rcv}) \ge k - 1. \qquad (89)$$
By combination of (89) with (88), it holds for all $\xi \in [t_{rcv}, t']$ that
$$\#r^{loc}_{p,q}(\xi) = k. \qquad (90)$$
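Furthermore,
$$\begin{aligned} \tilde{P}^{e}_{GEQ,p,q}(\xi) &\equiv \big(\#r^{rem}_{p,q}(\xi) \ge \#r^{loc}_{p,q}(\xi)\big) \wedge \big(\#r^{loc}_{p,q}(\xi) \in \mathbb{N}_{even}\big) \wedge \big(\#s^{loc}_{p,q}(\xi) = 1\big) \\ &\equiv \big(\#r^{rem}_{p,q}(\xi) \ge k\big) \wedge \big(\#r^{loc}_{p,q}(\xi) - \#d_{p,q}(\xi) = 1\big) \\ &\equiv \big(\#d_{p,q}(\xi) = k - 1\big). \qquad (91) \end{aligned}$$
The assumption $\exists q \in Q: \#d_{p,q}(\xi) > k - 1$ would imply $\#r^{loc}_{p,q}(\xi) > k$, contradicting (90). Thus, $\forall q \in Q: \#d_{p,q}(\xi) = k - 1$ must hold. Therefore, $\forall q \in Q$: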





By the algorithm (specification of the PCSG to threshold module channels), $\forall q \in Q: \widetilde{GEQ}^{e}_{p,q}(\xi)$ is true for $\xi \in [t_{rcv} + \tau^+_{GEQ}, t']$. Again by the algorithm (specification of the threshold modules),
$$\widetilde{TH}^{e}_{GEQ,p}(\xi) \qquad (92)$$
is true for any time $\xi$ within
$$\big[\max\{t_{p,k} + \tau^+_{loc},\, t_{Q,k} + \tau^+_{rem}\} + \tau^+_{Diff} + \tau^+_{GEQ} + \tau^+_{TH},\; t'\big].$$
It remains to be shown that the disabling path cannot inhibit the generation of tick $k+1$ at $p$. For the sake of contradiction, assume that the disabling path can enforce $t_{p,k+1} > t'$.
Because of the lemma's assumption (ii), tick $k$ must eventually be received in all of $p$'s local pipes corresponding to correct nodes $r$, i.e., $\forall r \in C \setminus \{p\}: \#r^{loc}_{p,r}(\xi) \ge k$ for $\xi \in [t_{p,k} + \tau^+_{loc}, t']$. Since tick $k+1$ is not generated by $t'$, we actually have $\#r^{loc}_{p,r}(\xi) = k$. As $k \in \mathbb{N}_{even}$, this implies
$$\forall r \in C \setminus \{p\}: \neg\big(\tilde{P}^{o}_{GR,p,r}(\xi) \vee \tilde{P}^{o}_{GEQ,p,r}(\xi)\big).$$
Consequently, by the algorithm (specification of the
PCSG to threshold module channels and the threshold
modules), together with the fact that there are only up




˜T H G R
o







tp,k + τ+loc + max{τ+G R, τ+G E Q} + τ+T H , t ′
]
.
Combining (92) and (93) and noting that $\max\{\tau^+_{GEQ} + \tau^+_{Diff}, \max\{\tau^+_{GR}, \tau^+_{GEQ}\}\} = \max\{\tau^+_{GEQ} + \tau^+_{Diff}, \tau^+_{GR}\}$, it is apparent that $p$ must send tick $k+1$ by $t'$, providing the required contradiction. The lemma follows.
The lemma follows in both cases. □
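The deadline $t'$ of Lemma 7 and the max identity used in its final step can be evaluated directly. The Python sketch below takes hypothetical delay values; it only restates the formula for $t'$ and checks that $\max\{\tau^+_{GEQ} + \tau^+_{Diff}, \max\{\tau^+_{GR}, \tau^+_{GEQ}\}\} = \max\{\tau^+_{GEQ} + \tau^+_{Diff}, \tau^+_{GR}\}$, an identity that relies on $\tau^+_{Diff}$ being non-negative.

```python
import random

def lemma7_deadline(t_pk, t_Qk, tau):
    """t' from Lemma 7: the time by which p sends tick k + 1.

    tau maps the names 'loc', 'rem', 'Diff', 'GEQ', 'GR', 'TH' to the
    corresponding upper delay bounds tau^+ (hypothetical input values).
    """
    return max(
        t_pk + tau["loc"] + tau["Diff"] + tau["GEQ"],
        t_pk + tau["loc"] + tau["GR"],
        t_Qk + tau["rem"] + tau["Diff"] + tau["GEQ"],
    ) + tau["TH"]

def max_identity_holds(geq, diff, gr):
    """max{GEQ + Diff, max{GR, GEQ}} == max{GEQ + Diff, GR}, valid for Diff >= 0."""
    return max(geq + diff, max(gr, geq)) == max(geq + diff, gr)

tau = {"loc": 2, "rem": 5, "Diff": 1, "GEQ": 3, "GR": 4, "TH": 2}
print(lemma7_deadline(t_pk=100, t_Qk=104, tau=tau))  # 115
rng = random.Random(0)
print(all(max_identity_holds(rng.randint(0, 9), rng.randint(0, 9), rng.randint(0, 9))
          for _ in range(1000)))  # True
```

With a negative $\tau^+_{Diff}$ the identity could fail, which is why the non-negativity of delay bounds matters in this step.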
Lemma 12 (Maximum Increase of $b_{\max}$ in $(t_{first,k}, t]$). If the first correct node sends tick $k \ge 1$ at $t_{first,k}$, then
$$\forall t > t_{first,k}:\; b_{\max}(t) - b_{\max}(t_{first,k}) \le \left\lfloor \frac{t - t_{first,k}}{T^-_{first}} \right\rfloor,$$
or, equivalently: The number $N$ of ticks sent by the first correct node in the interval $I = (t_{first,k}, t]$ satisfies $N \le \lfloor (t - t_{first,k})/T^-_{first} \rfloor$.
Proof Let $t_{first,j}$ be the time when the first correct node sends tick $j$, and assume by contradiction that
$$N \ge \left\lfloor \frac{t - t_{first,k}}{T^-_{first}} \right\rfloor + 1.$$
According to the definition of $N$,
$$t_{first,k+N} \le t. \qquad (95)$$
By applying Lemma 10 to $t_{first,k}$ and $t_{first,k+N}$, and recalling that $x < \lfloor x \rfloor + 1$ for all real $x$, we find
$$t_{first,k+N} - t_{first,k} \ge N\,T^-_{first} \ge \left(\left\lfloor \frac{t - t_{first,k}}{T^-_{first}} \right\rfloor + 1\right) T^-_{first} > t - t_{first,k}. \qquad (96)$$
Clearly, (96) contradicts (95). The lemma follows. □
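Lemma 13 (Maximum Increase of $b_{\max}$ in $(t, t']$).
$$\forall t' > t:\; b_{\max}(t') - b_{\max}(t) \le \left\lceil \frac{t' - t}{T^-_{first}} \right\rceil,$$
or, equivalently: The number $N$ of ticks sent by the first correct node in the interval $I = (t, t']$ satisfies $N \le \lceil (t' - t)/T^-_{first} \rceil$.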
Proof Let $t_{first,j}$ be the time when the first correct node sends tick $j$. We distinguish two cases: (i) $N \ge 1$, that is, $\exists k: t_{first,k+1} \in I$, and (ii) $N = 0$.
ad (i): Assume by contradiction that
$$N \ge \left\lceil \frac{t' - t}{T^-_{first}} \right\rceil + 1.$$
According to the definition of $N$ and the assumption $N \ge 1$, there must be some $k$ such that
$$t_{first,k+1} > t \qquad (98)$$
$$t_{first,k+N} \le t'. \qquad (99)$$
By applying Lemma 10 to $t_{first,k+1}$ and $t_{first,k+N}$, we find
$$t_{first,k+N} - t > t_{first,k+N} - t_{first,k+1} \ge (N-1)\,T^-_{first} \ge \left\lceil \frac{t' - t}{T^-_{first}} \right\rceil T^-_{first} \ge t' - t. \qquad (100)$$
Clearly, (100) contradicts (99). The lemma follows.
ad (ii): For $N = 0$, the bound $N \le \lceil (t' - t)/T^-_{first} \rceil$ trivially holds for $t' > t$. The lemma follows.
The lemma follows in both cases. □




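Lemma 14 For every correct node $p \in C$,
$$\forall t' > t:\; \#b_p(t') - \#b_p(t) \le \left\lceil \frac{t' - t}{T_{min}} \right\rceil,$$
or, equivalently: The number $N$ of ticks sent by the correct node $p$ in the interval $I = (t, t']$ satisfies $N \le \lceil (t' - t)/T_{min} \rceil$.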
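Proof Let $t_{p,j}$ be the time when node $p$ sends tick $j$. We distinguish two cases: (i) $N \ge 1$, i.e., $\exists k: t_{p,k+1} \in I$, and (ii) $N = 0$.
ad (i): Assume by contradiction that
$$N \ge \left\lceil \frac{t' - t}{T_{min}} \right\rceil + 1.$$
According to the definition of $N$ and the assumption $N \ge 1$, there must be some $k$ such that
$$t_{p,k+1} > t \qquad (102)$$
$$t_{p,k+N} \le t'. \qquad (103)$$
By applying Lemma 4 to $t_{p,k+1}$ and $t_{p,k+N}$, we find
$$t_{p,k+N} - t > t_{p,k+N} - t_{p,k+1} \ge (N-1)\,T_{min} \ge \left\lceil \frac{t' - t}{T_{min}} \right\rceil T_{min} \ge t' - t. \qquad (104)$$
Clearly, (104) contradicts (103). The lemma follows.
ad (ii): For $N = 0$, the bound $N \le \lceil (t' - t)/T_{min} \rceil$ trivially holds for $t' > t$, and the lemma follows.
The lemma follows in both cases. □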
Lemma 18 If the first correct node has sent tick $k \ge 2$ by time $t$, then every correct node $p \in C$ has removed tick $k-2$ from all its pipepairs corresponding to correct nodes $q \in C$ by time $t + T_{del}$, with
$$T_{del} := T_{QS} + \max\{\tau^+_{loc}, \tau^+_{rem}\} + \tau^+_{Diff}, \qquad (105)$$
or, equivalently,
$$\#d_{p,q}(t + T_{del}) \ge b_{\max}(t) - 2.$$
Proof Consider a pair of pipes $(p, q)$ located at $p \in C$, corresponding to a different $q \in C$. Furthermore, assume that $b_{\max}(t) \ge k$ holds at time $t$. We are interested in how much later tick $k-2$ is removed from this pipepair: Clearly, $t_{first,k} \le t$, and by (QS),
$$t_{p,k-1} \le t_{last,k-1} \le t_{first,k} + T_{QS} \qquad (106)$$
and
$$t_{q,k-1} \le t_{first,k} + T_{QS}. \qquad (107)$$
We can now apply Lemma 5 in combination with (106) and (107), which reveals that tick $k-2$ is removed from the pipepair at $p$ corresponding to $q$ by time $t_{rmv,k-2}$, with
$$\begin{aligned} t_{rmv,k-2} &\le \max\{t_{p,k-1} + \tau^+_{loc},\, t_{q,k-1} + \tau^+_{rem}\} + \tau^+_{Diff} \\ &\le t_{first,k} + T_{QS} + \max\{\tau^+_{loc}, \tau^+_{rem}\} + \tau^+_{Diff} \\ &= t_{first,k} + T_{del} \\ &\le t + T_{del}. \end{aligned}$$
The lemma follows. □
References
1. Attiya, H., Herzberg, A., Rajsbaum, S.: Optimal clock synchroni-
zation under different delay assumptions. SIAM J. Comput. 25(2),
369–389 (1996)
2. Bar-Noy, A., Dolev, D.: Consensus algorithms with one-bit mes-
sages. Distrib. Comput. 4, 105–110 (1991)
3. Barros, J.C., Johnson, B.W.: Equivalence of the arbiter, the syn-
chronizer, the latch, and the inertial delay. IEEE Trans. Com-
put. 32(7), 603–614 (1983)
4. Baumann, R.: Soft errors in advanced computer systems. IEEE
Des. Test Comput. 22(3), 258–266 (2005)
5. Belluomini, W., Myers, C.J.: Verification of timed systems using
posets. In: Computer Aided Verification, pp. 403–415 (1998)
6. Bhamidipati, R., Zaidi, A., Makineni, S., Low, K., Chen, R., Liu,
K.-Y., Dalgrehn, J.: Challenges and methodologies for implement-
ing high-performance network processors. Intel Technol. J. 6(3),
83–92 (2002)
7. Black, D.L.: On the existence of delay-insensitive fair arbiters: trace
theory and its limitations. Distrib. Comput. 1, 205–225 (1986)
8. Chapiro, D.M.: Globally-Asynchronous Locally-Synchronous
Systems. PhD thesis, Stanford University (1984)
9. Charron-Bost, B., Dolev, S., Ebergen, J., Schmid, U.: 08371 sum-
mary—fault-tolerant distributed algorithms on VLSI chips. In:
Charron-Bost, B., Dolev, S., Ebergen, J., Schmid, U. (eds.) Fault-
Tolerant Distributed Algorithms on VLSI Chips, number 08371 in
Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2009. Schloss
Dagstuhl—Leibniz-Zentrum fuer Informatik, Germany
10. Clarke, E.M.: Editorial: distributed computing issues in hardware
design. Distrib. Comput. 1, 185–186 (1986)
11. Constantinescu, C.: Trends and challenges in VLSI circuit reliabil-
ity. IEEE Micro 23(4), 14–19 (2003)
12. Dolev, D., Halpern, J.Y., Strong, H.R.: On the possibility and
impossibility of achieving clock synchronization. J. Comput. Syst.
Sci. 32, 230–250 (1986)
13. Dolev, S., Haviv, Y.: Self-stabilizing microprocessors, analyzing
and overcoming soft-errors. IEEE Trans. Comput. 55(4), 385–
399 (2006)
14. Dolev, S., Tzachar, N.: Brief announcement: Corruption resilient
fountain codes. In: Taubenfeld, G. (ed.) Distributed Comput-
ing, Lecture Notes in Computer Science, vol. 5218, pp. 502–503.
Springer, Berlin/Heidelberg (2008)
15. Dyer, C., Rodgers, D.: Effects on spacecraft & aircraft electronics.
In: Proceedings ESA Workshop on Space Weather, ESA WPP-155,
pp. 17–27. ESA, Noordwijk, The Netherlands (1998)
16. Ebergen, J.C.: A formal approach to designing delay-insensitive
circuits. Distrib. Comput. 5, 107–119 (1991)
17. Fairbanks, S.: Method and apparatus for a distributed clock gener-
ator, 2004. US patent no. US2004108876
18. Fairbanks, S., Moore, S.: Self-timed circuitry for global clocking.
In: Proceedings of the Eleventh International IEEE Symposium on
Asynchronous Circuits and Systems, pp. 86–96 (2005)
19. Ferri, C., Moreshet, T., Iris Bahar, R., Benini, L., Herlihy,
M.: A hardware/software framework for supporting transactional
memory in a MPSoC environment. SIGARCH Comput. Archit.
News 35(1), 47–54 (2007)
20. Ferringer, M., Fuchs, G., Steininger, A., Kempf, G.: VLSI Imple-
mentation of a Fault-Tolerant Distributed Clock Generation. In:
IEEE International Symposium on Defect and Fault-Tolerance in
VLSI Systems (DFT2006), pp. 563–571 (2006)
21. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of dis-
tributed consensus with one faulty process. J. ACM 32(2), 374–
382 (1985)
22. Friedman, E.G.: Clock distribution networks in synchronous digital
integrated circuits. Proc. IEEE 89(5), 665–692 (2001)
23. Friedman, R., Mostefaoui, A., Rajsbaum, S., Raynal, M.: Asyn-
chronous agreement and its relation with error-correcting
codes. IEEE Trans. Comput. 56(7), 865–875 (2007)
24. Fuchs, G.: Fault-Tolerant Distributed Algorithms for On-Chip
Tick Generation: Concepts, Implementations and Evaluations. PhD
thesis, Vienna University of Technology, Fakultät für Informatik
(2009)
25. Fuchs, G., Függer, M., Steininger, A.: On the threat of metasta-
bility in an asynchronous fault-tolerant clock generation scheme.
In: 15th IEEE International Symposium on Asynchronous Circuits
and Systems (ASYNC’09), pp. 127–136, Chapel Hill, N. Carolina,
USA (2009)
26. Fuchs, G., Függer, M., Steininger, A., Zangerl, F.: Analysis of con-
straints in a fault-tolerant distributed clock generation scheme. In:
3rd International Workshop on Dependable Embedded Systems
(WDES’06) (2006)
27. Fuchs, G., Steininger, A.: VLSI implementation of a distributed
algorithm for fault-tolerant clock generation. J. Electr. Comput.
Eng. 2011, 23 (2011). doi:10.1155/2011/936712
28. Függer, M.: Analysis of On-Chip Fault-Tolerant Distributed Algo-
rithms. PhD thesis, Technische Universität Wien, Institut für
Technische Informatik, Treitlstr. 1-3/182-2, 1040 Vienna, Austria
(2010)
29. Gadlage, M.J., Eaton, P.H., Benedetto, J.M., Carts, M., Zhu, V.,
Turflinger, T.L.: Digital device error rate trends in advanced CMOS
technologies. IEEE Trans. Nucl. Sci. 53(6), 3466–3471 (2006)
30. Grahsl, J., Handl, T., Steininger, A.: Exploring the usefulness of
the gate-level stuck-at fault model for Muller C-elements. In: Pro-
ceedings 20. Workshop für Testmethoden und Zuverlässigkeit von
Schaltungen und Systemen (TuZ’08), pp. 165–169, Vienna, Aus-
tria (2008)
31. Halpern, J.Y., Megiddo, N., Munshi, A.A.: Optimal precision in
the presence of uncertainty. J. Complex. 1(2), 170–196 (1985)
32. Hauck, S.: Asynchronous design methodologies: an over-
view. Proc. IEEE 83(1), 69–93 (1995)
33. Hoyme, K., Driscoll, K.: Safebus. In: Proceedings IEEE/AIAA
11th Digital Avionics Systems Conference, pp. 68–73 (1992)
34. International technology roadmap for semiconductors (2007)
35. Jang, W., Martin, A.J.: SEU-tolerant QDI circuits. In: Proceed-
ings 11th Int’l Symposium on Asynchronous Circuits and Systems
(ASYNC’05), pp. 156–165 (2005)
36. Karnik, T., Hazucha, P., Patel, J.: Characterization of soft errors
caused by single event upsets in CMOS processes. IEEE Trans.
Dependable Secur. Comput. 1(2), 128–143 (2004)
37. Kaynar, D.K., Lynch, N., Segala, R., Vaandrager, F.: Timed I/O
automata: a mathematical framework for modeling and analyzing
real-time systems. In: Proceedings 24th IEEE International Real-Time Systems Symposium (RTSS’03), pp. 166–177 (2003)
38. Kieckhafer, R.M., Walter, C.J., Finn, A.M., Thambidurai,
P.M.: The MAFT architecture for distributed fault tolerance. IEEE
Trans. Comput. 37, 398–405 (1988)
39. Kopetz, H., Grünsteidl, G.: TTP-A protocol for fault-tolerant real-
time systems. Computer 27(1), 14–23 (1994)
40. Koren, I., Koren, Z.: Defect tolerance in VLSI circuits: techniques
and yield analysis. Proc. IEEE 86(9), 1819–1838 (1998)
41. Lamport, L.: Buridan’s principle. Technical report, SRI Technical
Report (1984)
42. Lamport, L.: Specifying Systems, The TLA+ Language and Tools
for Hardware and Software Engineers. Addison-Wesley, Bos-
ton (2002)
43. Lamport, L.: Arbitration-free synchronization. Distrib. Com-
put. 16(2/3), 219–237 (2003)
44. Le Lann, G., Schmid, U.: How to implement a timer-free perfect
failure detector in partially synchronous systems. Technical Report
183/1-127, Department of Automation, Technische Universität
Wien, January 2003. (Replaced by Research Report 28/2005,
Institut für Technische Informatik, TU Wien, 2005.)
45. Lynch, N.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
46. Maheshwari, A., Koren, I., Burleson, W.: Accurate estimation of
Soft Error Rate (SER) in VLSI circuits. In: Proceedings of the 2004
IEEE International Symposium on Defect and Fault Tolerance in
VLSI Systems, pp. 377–385 (2004)
47. Marino, L.: General theory of metastable operation. IEEE Trans.
Comput. C-30(2), 107–115 (1981)
48. Martin, A.J.: Compiling communicating processes into delay-
insensitive VLSI circuits. Distrib. Comput. 1, 226–234 (1986)
49. Martin, A.J.: The limitations to delay-insensitivity in asynchronous
circuits. In: Proceedings of the Sixth MIT Conference on Advanced Research in VLSI, pp. 263–278. MIT Press,
Cambridge, MA, USA (1990)
50. Maza, M.S., Aranda, M.L.: Analysis of clock distribution networks
in the presence of crosstalk and groundbounce. In: Proceedings
International IEEE Conference on Electronics, Circuits, and Sys-
tems (ICECS), pp. 773–776 (2001)
51. Maza, M.S., Aranda, M.L.: Interconnected rings and oscillators as
gigahertz clock distribution nets. In: GLSVLSI ’03: Proceedings
of the 13th ACM Great Lakes symposium on VLSI, pp. 41–44.
ACM Press (2003)
52. Metra, C., Francescantonio, S.D., Mak, T.M.: Implications of clock
distribution faults and issues with screening them during
manufacturing testing. IEEE Trans. Comput. 53(5), 531–546 (2004)
53. Mitra, S., Seifert, N., Zhang, M., Shi, Q., Kim, K.S.: Robust
system design with built-in soft-error resilience. IEEE Comput. 38(5),
43–52 (2005)
54. Moscibroda, T., Mutlu, O.: Distributed order scheduling and its
application to multi-core DRAM controllers. In: Proceedings of
the 27th ACM Symposium on Principles of Distributed Computing
(PODC’08), pp. 365–374, Toronto, Canada (2008)
55. Myers, C.J., Meng, T.H.Y.: Synthesis of timed asynchronous
circuits. IEEE Trans. VLSI Syst. 1(2), 106–119 (1993)
56. Nicolaidis, M.: GRAAL: a fault-tolerant architecture for enabling
nanometric technologies. In: Proceedings 13th IEEE
International On-Line Testing Symposium (IOLTS’07), p. 255
(2007)
57. Normand, E.: Single-event effects in avionics. IEEE Trans. Nucl.
Sci. 43(2), 461–474 (1996)
58. Ostrovsky, R., Patt-Shamir, B.: Optimal and efficient clock
synchronization under drifting clocks. In: PODC ’99: Proceedings
of the Eighteenth Annual ACM Symposium on Principles of
Distributed Computing, pp. 3–12. ACM, New York, NY, USA
(1999)
59. Palit, A.K., Meyer, V., Anheier, W., Schloeffel, J.: Modeling and
analysis of crosstalk coupling effect on the victim interconnect
using the ABCD network model. In: Proceedings of the 19th IEEE
International Symposium on Defect and Fault Tolerance in VLSI
Systems (DFT’04), pp. 174–182 (2004)
60. Patt-Shamir, B., Rajsbaum, S.: A theory of clock synchronization
(extended abstract). In: STOC ’94: Proceedings of the Twenty-
Sixth Annual ACM Symposium on Theory of computing, pp. 810–
819. ACM Press, New York, NY, USA (1994)
61. Polzer, T., Handl, T., Steininger, A.: A metastability-free
multi-synchronous communication scheme for SoCs. In: Proceedings of
the Stabilization, Safety, and Security of Distributed Systems, 11th
International Symposium, SSS 2009, Lyon, France, November 3–6,
2009, pp. 578–592 (2009)
62. Powell, D., Arlat, J., Beus-Dukic, L., Bondavalli, A., Coppola,
P., Fantechi, A., Jenn, E., Rabejac, C., Wellings, A.: GUARDS:
a generic upgradable architecture for real-time dependable
systems. IEEE Trans. Parallel Distrib. Syst. 10(6), 580–599 (1999)
63. Ramanathan, P., Shin, K.G., Butler, R.W.: Fault-tolerant clock
synchronization in distributed systems. IEEE Comput. 23(10),
33–42 (1990)
64. Restle, P.J. et al.: A clock distribution network for
microprocessors. IEEE J. Solid-State Circuits 36(5), 792–799 (2001)
65. Rokicki, T., Myers, C.J.: Automatic verification of timed circuits.
In: Computer Aided Verification, pp. 468–480 (1994)
66. Schmid, U.: How to model link failures: A perception-based
fault model. In: Proceedings of the International Conference
on Dependable Systems and Networks (DSN’01), pp. 57–66,
Göteborg, Sweden (2001)
67. Schmid, U.: Keynote: distributed algorithms and VLSI. In:
Proceedings of the 10th International Symposium on Stabilization,
Safety, and Security of Distributed Systems (SSS’08), Lecture
Notes in Computer Science, vol. 5340, p. 3, Detroit, USA,
November 2008. Springer Verlag.
(http://www.vmars.tuwien.ac.at/documents/extern/2467/sss08.pdf)
68. Schmid, U., Klasek, J., Mandl, T., Nachtnebel, H., Cadek, G.R.,
Kerö, N.: A network time interface M-module for distributing GPS-
time over LANs. Real-Time Syst. 18(1), 24–57 (2000)
69. Schmid, U., Steininger, A.: Dezentrale Fehlertolerante
Taktgenerierung in VLSI Chips [Decentralized fault-tolerant clock
generation in VLSI chips]. Research Report 69/2004, Technische
Universität Wien, Institut für Technische Informatik, 2004.
International patent PCT WO2006/007619: EP 1769356, US
2009/0102534, ZL 200580024166.6, AT 501510
70. Seifert, N., Shipley, P., Pant, M.D., Ambrose, V., Gill, B.:
Radiation-induced clock jitter and race. In: Proceedings 43rd Annual
IEEE International Reliability Physics Symposium, pp. 215–222,
17–21 (2005)
71. Seitz, C.L.: System timing. In: Introduction to VLSI Systems,
pp. 218–262. Addison Wesley, Boston (1980)
72. Semiat, Y., Ginosar, R.: Timing measurements of synchronization
circuits. In: Int. Symp. Asynchr. Circuits Syst., p. 68 (2003)
73. Shivakumar, P., Kistler, M., Keckler, S.W., Burger, D., Alvisi, L.:
Modeling the effect of technology trends on the soft error rate
of combinational logic. In: Proceedings of International
Conference on Dependable Systems and Networks, DSN, pp. 389–398
(2002)
74. Simons, B., Lundelius-Welch, J., Lynch, N.: An overview of clock
synchronization. In: Simons, B., Spector, A. (eds.) Fault-Tolerant
Distributed Computing, LNCS 448, pp. 84–96. Springer, Berlin
(1990)
75. Srikanth, T.K., Toueg, S.: Optimal clock synchronization. J.
ACM 34(3), 626–645 (1987)
76. Stevens, K.S., Ginosar, R., Rotem, S.: Relative timing. IEEE
Trans. VLSI Syst. 11(1), 129–140 (2003)
77. Sutherland, I.E.: Micropipelines (Turing Award lecture).
Commun. ACM 32(6), 720–738 (1989)
78. Teehan, P., Greenstreet, M., Lemieux, G.: A survey and
taxonomy of GALS design styles. IEEE Des. Test Comput. 24(5), 418–
428 (2007)
79. Thaker, D.D., Impens, F., Chuang, I.L., Amirtharajah, R., Chong,
F.T.: Recursive TMR: scaling fault tolerance in the nanoscale
era. IEEE Des. Test Comput. 22(4), 298–305 (2005)
80. Verdel, T., Makris, Y.: Duplication-based concurrent error
detection in asynchronous circuits: shortcomings and remedies. In:
Proceedings 17th IEEE International Symposium on Defect and
Fault Tolerance in VLSI Systems (DFT 2002), pp. 345–353
(2002)
81. Widder, J., Le Lann, G., Schmid, U.: Failure detection with
booting in partially synchronous systems. In: Proceedings of the 5th
European Dependable Computing Conference (EDCC-5), LNCS,
vol. 3463, pp. 20–37. Springer, Budapest, Hungary (2005)
82. Widder, J., Schmid, U.: The theta-model: achieving synchrony
without clocks. Distrib. Comput. 22(1), 29–47 (2009)
83. Yakovlev, A., Lavagno, L., Sangiovanni-Vincentelli, A.: A
unified signal transition graph model for asynchronous control circuit
synthesis. In: Proceedings of the 1992 IEEE/ACM international
conference on Computer-aided design (ICCAD’92), pp. 104–111.
IEEE Computer Society Press, Los Alamitos, CA, USA (1992)
84. Yoneda, T., Kitai, T., Myers, C.J.: Automatic derivation of
timing constraints by failure analysis. In: Proceedings 14th
International Conference on Computer Aided Verification (CAV’02),
Lecture Notes in Computer Science, vol. 2404, pp. 195–208. Springer,
Berlin (2002)
