Abstract

We revisit the approach to Byzantine fault-tolerant clock synchronization based on approximate agreement introduced by Lynch and Welch. Our contribution is threefold: we give a refined analysis of a variant of their algorithm, we extend the technique to also synchronize clock frequencies, and we show how to couple the resulting algorithms to a self-stabilizing pulse synchronization algorithm.
Introduction
When designing a synchronous distributed system, the most fundamental question is how to generate and distribute the system clock. This task is mission critical, both in terms of performance and reliability. With the ever-growing complexity of hardware, reliable high-performance clocking becomes increasingly challenging; at the same time, the ramifications of clocking errors become harder to predict.
Against this background, it might be unsurprising that fault-tolerant distributed clock synchronization algorithms have found their way into real-world systems with high reliability demands: the Time-Triggered Protocol (TTP) [13] and FlexRay [9, 11] tolerate Byzantine (i.e., worst-case) faults and are utilized in cars and airplanes. Both of these systems derive from the classic fault-tolerant synchronization algorithm by Lynch and Welch [18], which is based on repeatedly performing approximate agreement [5] on the time of the next clock pulse. Another application domain with even more stringent requirements is hardware for spacecraft and satellites. Here, a reliable system clock is in demand despite frequent transient faults due to radiation. In addition, quartz oscillators are prone to damage during launch, making the use of less accurate electronic oscillators preferable.
Unfortunately, existing implementations are not self-stabilizing, i.e., do not guarantee automatic recovery from transient faults. This is essential for the space domain, but also highly desirable in systems utilizing TTP or FlexRay; indeed, both protocols include various mechanisms that monitor the nodes and perform resets in case of observed faulty behavior. Thus, it is of interest to devise synchronization algorithms that stabilize on their own, instead of relying on monitoring techniques: these need to be highly reliable as well, or their failure may bring down the system due to erroneous detection of, or response to, faults.
Against this backdrop, in this work we set out to answer the following questions:
1. Can the guarantees of [18] be further improved? In particular, how does the approach perform if the (relative) phase drifts of the local clock sources are larger than for typical quartz oscillators?
2. Under which circumstances is it useful to apply the technique also to frequencies, i.e., algorithmically adjust clock rates?
3. Can the solution be made self-stabilizing?
Our Contribution. We obtain promising answers to the above questions, in the sense that conceptually simple (i.e., implementation-friendly!) variations on the Lynch-Welch approach achieve excellent performance guarantees. Specifically, we obtain the following main results.
1. We present a refined analysis of a variant of the Lynch-Welch algorithm. We show that the algorithm converges to a steady-state error E ∈ O((ϑ − 1)T + U), where hardware clock rates are between 1 and ϑ, messages take between d − U and d time to arrive at their destination, and T ∈ Ω(d) is the (nominal) time between consecutive clock pulses (i.e., the time required for a single approximate agreement step). This works even for very poor local clock sources: it suffices if ϑ ≤ 1.1, although the skew bound goes to infinity as ϑ approaches this critical value; for ϑ ≤ 1.01, the bound is fairly close to 2(ϑ − 1)T + 4U.
2. We give a second algorithm that interleaves approximate agreement on clock rates with the phase (i.e., clock offset) correction scheme. If the clocks are sufficiently stable, i.e., the maximum rate of change ν of clock rates is sufficiently small, this makes it possible to significantly extend T (and thus decrease the frequency of communication) without substantially affecting skews. Provided that ϑ is not too large, for any T satisfying max{(ϑ − 1)²T, νT²} ≪ U, it is possible to guarantee a skew of O(U).
3. We introduce a generic approach that makes it possible to couple either of these algorithms to FATAL [6, 7]. FATAL is a self-stabilizing synchronization algorithm, but in comparison suffers from poor performance. The coupling scheme permits combining the best of both worlds, namely the self-stabilization properties of FATAL with the small skew of the Lynch-Welch synchronization scheme.
On the technical side, the first two results require little innovation compared to prior work. However, it proved challenging to obtain clean, easy-to-implement algorithms that are amenable to a tractable analysis and achieve tight skew bounds. This is worthwhile for two reasons: (1) there is strong indication that the approach has considerable practical merit, and (2) no readily usable mathematical analysis of the frequency correction scheme exists in the literature. In fact, the second algorithm we present differs from FlexRay (which also aims to adjust frequencies) in a crucial point. To prevent the approximate agreement scheme from being rendered ineffective because nodes reach the imposed limits on adjusting their frequency, we add a correction slowly pulling nodes' frequencies back to the nominal rate. Without this provision, it is straightforward to construct executions in which, e.g., the majority of the nodes runs too fast for another node to sufficiently adjust its clock rate to match their speed. This means that, in the worst case, FlexRay's frequency correction is futile. In contrast, the coupling scheme we use to combine our non-stabilizing algorithms with FATAL showcases a novel technique of independent interest. We leverage FATAL's clock "beats" to effectively (re-)initialize the synchronization algorithm we couple it to. Here, care has to be taken to avoid such resets occurring during regular operation of the Lynch-Welch scheme, as this could result in large skews or even spurious clock pulses. The solution is a feedback mechanism that enables the synchronization algorithm to actively trigger the next beat of FATAL at the appropriate time. FATAL stabilizes regardless of how these feedback signals behave, while actively triggering beats ensures that all nodes pass the checks which, if failed, cause the respective node to be reset.
While a specific interface is required from the stabilizing algorithm to permit this approach, it seems likely that most, if not all, self-stabilizing synchronization algorithms could be modified to provide it. Thus, we consider the technique a highly useful separation of the tasks to achieve small skews and to ensure (fast) stabilization.
Solutions fully implemented in hardware are of interest for two reasons. First, having to implement the full software abstraction dramatically increases the number of potential reasons for a node to fail, at least from the point of view of the synchronization algorithm. A slim hardware implementation is thus likely to result in a substantially higher degree of reliability of the clocking mechanism. Second, if higher precision of synchronization is required, the significantly smaller delays incurred by dedicated hardware make it possible to meet these demands.
Apart from these issues, the complexity of a software solution renders TTP and FlexRay unsuitable as fault-tolerant clocking schemes for VLSI circuits. The DARTS project [3, 10] aimed at developing such a scheme, with the goal of coming up with a robust clocking method for space applications. Instead of being based on the Lynch-Welch approach, it implements the fault-tolerant synchronization algorithm by Srikanth and Toueg [17]. Unfortunately, DARTS falls short of its design goals in two ways. First, the Srikanth-Toueg primitive achieves skews of Θ(d), which tend to be significantly larger than those attainable with the Lynch-Welch approach. Accordingly, the operational frequency DARTS can sustain (without large communication buffers and communication delays of multiple logical rounds) is in the range of 100 MHz, i.e., about an order of magnitude smaller than typical system speeds. Second, DARTS is not self-stabilizing. This means that DARTS, just like TTP and FlexRay, is unlikely to successfully cope with high rates of transient faults. Worse, the rate of transient faults will scale with the number of nodes (and thus sustainable faults). For space environments, this implies that adding fault-tolerance without self-stabilization cannot be expected to increase the reliability of the system at all.
These concerns inspired follow-up work seeking to overcome these downsides of DARTS. From an abstract point of view, FATAL [6, 7] can be interpreted as another incarnation of the Srikanth-Toueg approach. However, FATAL combines tolerance to Byzantine faults with self-stabilization in O(n) time with probability 1 − 2^(−Ω(n)); after recovery is complete, the algorithm maintains correct operation deterministically. Like DARTS, FATAL and the substantial line of prior work on Byzantine self-stabilizing synchronization algorithms (e.g., [2, 8]) cannot achieve better clock skews than Θ(d). The key motivation for the present paper is to combine the better precision achieved by the Lynch-Welch approach with the self-stabilization properties of FATAL.
Concerning frequency correction, little related work exists. A notable exception is the extension of the interval-based synchronization framework to rate synchronization [15, 16]. In principle, it seems feasible to derive similar results by specialization and minor adaptations of this powerful machinery to our setting. Unfortunately, apart from the technical hurdles involved, an educated guess (based on the amount of necessary specialization and estimates that need to be strengthened) results in worse constants and more involved algorithms, and it is unclear whether our approach to self-stabilization can be fitted to this framework. However, it is worth noting that the overall proof strategies for the (non-stabilizing) phase and frequency correction algorithms bear notable similarities to this generic framework: separately deriving bounds on the precision of measurements, plugging these into a generic convergence argument, and separating the analysis of frequency and phase corrections.
Coming to lower bounds and impossibility results, the following is known.
• In a system of n nodes, no algorithm can tolerate ⌈n/3⌉ Byzantine faults. All mentioned algorithms are optimal in that they tolerate ⌈n/3⌉ − 1 Byzantine faults [4].
• To tolerate this number of faults, Ω(n²) communication links are required. All mentioned algorithms assume full connectivity and communicate by broadcasts (faulty nodes may not adhere to this). Less well-connected topologies are outside the scope of this work.
• The worst-case precision of an algorithm cannot be better than (1 − 1/n)U in a network where communication delays may vary by U [14]. In the fault-free case and with ϑ − 1 sufficiently small, this bound can be almost matched (cf. Section 4); all variants of the Lynch-Welch approach match this bound asymptotically, provided sufficiently accurate local clocks.
• Trivially, the worst-case precision of any algorithm is at least (ϑ − 1)T if nodes exchange messages every T time units. In the fault-free case, this is essentially matched by our phase correction algorithm as well.
• With faults, the upper bound on the skew of the algorithm increases by a factor of 1/(1 − α), where α ≈ 1/2 if ϑ ≈ 1. It appears plausible that this is optimal under the constraint that the algorithm's resilience to Byzantine faults is optimal, due to a lower bound on the convergence rate of approximate agreement [5].
Overall, the resilience of the presented solution to faults is optimal, its precision asymptotically optimal, and it seems reasonable to assume that there is little room for improvement in this regard. In contrast, no non-trivial lower bounds on the stabilization time of self-stabilizing fault-tolerant synchronization algorithms are known.
It remains an open question whether it is possible to achieve stabilization within o(n) time.
Model
We assume a fully connected system of n nodes, up to f := ⌊(n − 1)/3⌋ of which may be Byzantine faulty (i.e., arbitrarily deviate from the protocol). We denote by V the set of all nodes and by C ⊆ V the subset of correct nodes, i.e., those that are not faulty. Communication is by broadcast of "pulses," which are messages without content: the only information conveyed is when a node transmitted a pulse. Nodes can distinguish between senders; this is used to distinguish the case of multiple pulses being sent by a single (faulty) node from multiple nodes sending one pulse each. Note that faulty nodes are not bound by the broadcast restriction, i.e., may send a pulse to a subset of the nodes only. The system is semi-synchronous. A pulse sent by node v ∈ C at (Newtonian) time p_v ∈ ℝ₀⁺ is received by node w ∈ C at a time
t_vw ∈ [p_v + d − U, p_v + d];
we refer to d as the maximum message delay (or, briefly, delay) and to U as the delay uncertainty (or, briefly, uncertainty).
For these timing guarantees to be useful to an algorithm, the nodes must have a means to measure the progress of time. Each node v ∈ C is equipped with a hardware clock H_v, which is modeled as a strictly increasing function H_v: ℝ₀⁺ → ℝ₀⁺. We require that there is a constant ϑ > 1 such that for all times t < t′, it holds that
t′ − t ≤ H_v(t′) − H_v(t) ≤ ϑ(t′ − t),
i.e., the hardware clocks have bounded drift.⁷ We remark that our results can be easily translated to the case of discrete and bounded clocks.⁸ We refer to H_v(t) as the local time of v at time t.
Executions are event-based, where an event at node v is the reception of a message, a previously computed (and stored) local time being reached, or the initialization of the algorithm. A node may then perform computations and possibly send a pulse. For simplicity, we assume that these operations take zero time; adapting our results to account for computation time is straightforward.

⁷ It is common to define the drift symmetrically, i.e., (1 − ρ)(t′ − t) ≤ H_v(t′) − H_v(t) ≤ (1 + ρ)(t′ − t) for some 0 < ρ < 1. For ρ ≪ 1 and ϑ ≈ 1, up to minor order terms this is equivalent to setting ρ := (ϑ − 1)/2 and rescaling the real time axis by factor 1 − ρ. The one-sided formulation results in less cluttered notation.
⁸ Discretization can be handled by re-interpreting the discretization error as part of the delay uncertainty. All our algorithms use the hardware clock exclusively to measure bounded time differences.
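For intuition, the following minimal simulation sketch (ours, not part of the paper; class and method names are assumptions) models such a hardware clock: between consecutive queries, the rate is an arbitrary value in [1, ϑ].

import random

class HardwareClock:
    """Strictly increasing clock whose rate always lies in [1, theta]."""

    def __init__(self, theta: float, seed: int = 0):
        assert theta > 1
        self.theta = theta
        self.rng = random.Random(seed)
        self.t_last = 0.0  # real time of the last query
        self.local = 0.0   # local time H_v(t_last)

    def read(self, t: float) -> float:
        """Return H_v(t), advancing with a rate drawn from [1, theta]."""
        assert t >= self.t_last
        rate = self.rng.uniform(1.0, self.theta)
        self.local += rate * (t - self.t_last)
        self.t_last = t
        return self.local

# Example: with theta = 1.01, local time may run up to 1% fast.
clock = HardwareClock(theta=1.01)
print(clock.read(1.0), clock.read(2.0))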
Problem. A clock synchronization algorithm generates distinguished events or clock pulses at times p_v(r) for r ∈ ℕ and v ∈ C so that the following conditions are satisfied for all r ∈ ℕ:
1. e(r) := max_{v,w∈C}{|p_v(r) − p_w(r)|} is bounded, and
2. A_min ≤ p_v(r + 1) − p_v(r) ≤ A_max for constants A_min, A_max ∈ ℝ⁺.
The first requirement is a bound on the synchronization error between the r-th clock ticks; naturally, it is desired that e(r) is as small as possible. The second requirement is a bound on the time between consecutive clock ticks, which can be translated to a bound on the frequency of the clocks; here, the goal is that A_min/A_max ≈ 1. The precision of the algorithm is measured by the steady state error⁹
E := lim_{r′→∞} sup_{r≥r′} {e(r)}.
Self-stabilization will be introduced and discussed in Section 6.
Phase Synchronization Algorithm
Our basic algorithm is a variant of the one by Lynch and Welch [18], which synchronizes clocks by simulating perpetual synchronous approximate agreement [5] on the times when clock pulses should be generated. We diverge only in terms of communication: instead of round numbers, nodes broadcast content-free pulses. Due to sufficient waiting times between pulses, during regular operation received messages from correct nodes can be correctly attributed to the respective round. In fact, the primary purpose of transmitting round numbers in the Lynch-Welch algorithm is to add recovery properties. Our technique for adding self-stabilization (presented in Section 6) leverages the pulse synchronization algorithm from [6, 7] instead, which requires broadcasting constant-sized messages only. Before presenting the algorithm and its analysis in Sections 4.2 and 4.3, respectively, we revisit some basic properties of the technique for approximate agreement introduced in [5] in the context used here. The results in this section are derivatives of the ones from [5, 18], but adapting them to our setting and notation is essential for deriving our main results in Sections 5 and 6.
Properties of Approximate Agreement Steps
Abstractly speaking, the synchronization algorithm performs an approximate agreement step in each (simulated synchronous) round. In approximate agreement, each node is given an input value, and the goal is to let nodes determine values that are close to each other and within the interval spanned by the correct nodes' inputs.
In the clock synchronization setting, there is the additional obstacle that the communicated values are points in time. Due to delay uncertainty and drifting clocks, the communicated values are subject to a (worst-case) perturbation of at most some δ ∈ R + 0 . We will determine δ later in our analysis of the clock synchronization algorithms; we assume it to be given for now. The effect of these disturbances is straightforward: they may shift outputs by at most δ in each direction, increasing the range of the outputs by an additive 2δ in each step (in the worst case).
Algorithm 1 describes an approximate agreement step from the point of view of node v ∈ C. When implementing this later on, we need to make use of timing constraints to ensure that (i) correct nodes receive each other's messages in time to perform the associated computations and (ii) correct nodes' messages can be correctly attributed to the round to which they belong. Figure 1 depicts how a round unfolds assuming that these timing constraints are satisfied.

⁹ Typically, e(r) is a monotone sequence, implying that simply E = lim_{r→∞} e(r).
Algorithm 1: Approximate agreement step at node v ∈ C (with synchronous message exchange).
1 // node v is given input value x_v
2 broadcast x_v to all nodes (including self);
3 // if w ∈ C, the received value x̃_wv ∈ [x_w − δ, x_w + δ]
4 receive first value x̃_wv from each node w (x̃_wv := x_v if no message from w received);
5 S_v ← {x̃_wv | w ∈ V};
6 denote by S_v^k the k-th element of S_v w.r.t. ascending order;
7 y_v ← (S_v^{f+1} + S_v^{n−f})/2; // output midpoint after discarding the f smallest and f largest values
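The step is easily made concrete; the following self-contained Python sketch (function name and test values are ours) implements it and illustrates that a single Byzantine value cannot pull the output outside the range spanned by the correct inputs.

from typing import List

def approx_agreement_step(values: List[float], f: int) -> float:
    """Sort the received values, discard the f smallest and f largest,
    and return the midpoint of the interval spanned by the rest."""
    assert len(values) > 3 * f, "approximate agreement needs n > 3f"
    s = sorted(values)           # S_v in ascending order
    trimmed = s[f:len(s) - f]    # drop f entries on each side
    return (trimmed[0] + trimmed[-1]) / 2

# n = 4, f = 1: the outlier 1000.0 (a faulty node's value) is discarded,
# so the output 0.15 lies within the correct inputs' range [0.0, 0.2].
print(approx_agreement_step([0.0, 0.1, 0.2, 1000.0], f=1))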
Denote by x the |C|-dimensional vector of correct nodes' inputs, i.e., x = (x_v)_{v∈C}. The diameter ‖x‖ of x is the difference between the maximum and minimum components of x. Formally,
‖x‖ := max_{v∈C}{x_v} − min_{v∈C}{x_v}.
We will use the same notation for other values, e.g., y and ‖y‖. For simplicity, we assume that |C| = n − f in the following; all statements can be adapted by replacing n − f with |C| where appropriate.
Consider the special case of δ = 0. Intuitively, Algorithm 1 discards the smallest and largest f values each to ensure that values from faulty nodes cannot cause outputs to lie outside the range spanned by the correct nodes' values. Afterwards, y_v is determined as the midpoint of the interval spanned by the remaining values. Since f < n/3, i.e., n − f ≥ 2f + 1, the median of correct nodes' values is part of all intervals computed by correct nodes. From this, it is easy to see that ‖y‖ ≤ ‖x‖/2 (cf. Figure 1). For δ > 0, we simply observe that the resulting values y_v, v ∈ C, are shifted by at most δ compared to the case where δ = 0, resulting in ‖y‖ ≤ ‖x‖/2 + 2δ. We now prove these properties.
Lemma 1. For all v ∈ C, it holds that y_v ∈ [min_{w∈C}{x_w} − δ, max_{w∈C}{x_w} + δ].

Proof. As there are at most f faulty nodes, for v ∈ C we have that
S_v^{f+1} ≥ min_{w∈C}{x̃_wv} ≥ min_{w∈C}{x_w} − δ and S_v^{n−f} ≤ max_{w∈C}{x̃_wv} ≤ max_{w∈C}{x_w} + δ.
As y_v is the midpoint of [S_v^{f+1}, S_v^{n−f}], the claim follows.

Corollary 1. For all v ∈ C, it holds that |y_v − x_v| ≤ ‖x‖ + δ.

Proof. Immediate from Lemma 1, as x_v lies within the interval spanned by the correct nodes' inputs.
Lemma 2. ‖y‖ ≤ ‖x‖/2 + 2δ.

Proof. We show the claim for δ = 0 first, i.e., x̃_wv = x_w for all v, w ∈ C. Denote by x^k the k-th element of x w.r.t. ascending order. Since f < n/3, we have that n − f ≥ 2f + 1. Hence, for all v ∈ C,
S_v^{f+1} ∈ [x^1, x^{f+1}] and S_v^{n−f} ∈ [x^{n−2f}, x^{n−f}], where x^{f+1} ≤ x^{n−2f}.
For any v, w ∈ C, it follows that
y_v − y_w = ((S_v^{f+1} − S_w^{f+1}) + (S_v^{n−f} − S_w^{n−f}))/2 ≤ ((x^{f+1} − x^1) + (x^{n−f} − x^{n−2f}))/2 ≤ ‖x‖/2.
Symmetrically, we have that y_w − y_v ≤ ‖x‖/2 and thus |y_v − y_w| ≤ ‖x‖/2. As v, w ∈ C were arbitrary, this yields ‖y‖ ≤ ‖x‖/2 (under the assumption that δ = 0). For the general case, observe that S_v^{f+1}, S_v^{n−f}, S_w^{f+1}, and S_w^{n−f} each can be changed by at most δ. This can affect (S_v^{f+1} + S_v^{n−f})/2 − (S_w^{f+1} + S_w^{n−f})/2 by at most 4δ/2 = 2δ; the claim follows.
Algorithm
Algorithm 2 shows the pseudocode of the phase synchronization algorithm at node v ∈ C. It implements iterative approximate agreement steps on the times when to send pulses. The algorithm assumes that the nodes are initialized within a (local) time window of size F. In each round r ∈ ℕ, the nodes estimate the phase offsets of their pulses and then compute an according phase correction ∆_v(r). Figure 2 illustrates how a round of the algorithm plays out.
To fully specify the algorithm, we need to determine how long the waiting periods in each round are (in terms of local time), which will be given as τ_1(r), τ_2(r), and T(r) − ∆_v(r) − τ_1(r) − τ_2(r). Here, we must ensure for all r ∈ ℕ that
1. for all v, w ∈ C, the message that v broadcasts at local time H_v(t_v(r − 1)) + τ_1(r) is received by w at a local time from [H_w(t_w(r − 1)), H_w(t_w(r − 1)) + τ_1(r) + τ_2(r)], and
2. for all v ∈ C, T(r) − ∆_v(r) ≥ τ_1(r) + τ_2(r).
If these conditions are satisfied at all correct nodes, we say that round r is executed correctly, and we can interpret the round as an approximate agreement step in the sense of Section 4.1. We will show in the next section that the following condition is sufficient for all rounds to be executed correctly.
Algorithm 2: Phase synchronization algorithm, code for node v ∈ C. Time t_v(r), r ∈ ℕ₀, is the time when round r + 1 starts.
1 for r = 1, 2, . . . do
2   // round r starts at time t_v(r − 1)
3   wait until local time H_v(t_v(r − 1)) + τ_1(r);
4   broadcast pulse; // the r-th pulse, sent at time p_v(r)
5   // listen for pulses until local time H_v(t_v(r − 1)) + τ_1(r) + τ_2(r)
8   for each node w ∈ V do
9     τ_wv := H_v(t_wv), where first message from w received at t_wv (τ_wv := ∞ if none received);
10  S_v ← {2(τ_wv − τ_vv)/(ϑ + 1) | w ∈ V};
11  denote by S_v^k the k-th element of S_v w.r.t. ascending order;
12  ∆_v(r) := (S_v^{f+1} + S_v^{n−f})/2;
13  // T(r) denotes the nominal length of round r
14  wait until time t_v(r) with H_v(t_v(r)) = H_v(t_v(r − 1)) + T(r) − ∆_v(r);
Figure 2: A round of Algorithm 2 from the point of view of nodes v and w. Note that the durations marked on the horizontal axis are measured using the local hardware clock.
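To illustrate how such a round plays out, the following simulation sketch (ours; it assumes perfect clocks, i.e., ϑ = 1, no faulty senders, and made-up parameter values) runs several rounds and prints the shrinking skew.

import random

def run_round(pulses, f, d, U, rng):
    """One round: every node measures all pulse arrival times relative
    to its own and shifts its next pulse by the trimmed-midpoint value."""
    n = len(pulses)
    nxt = []
    for v in range(n):
        t_self = pulses[v] + rng.uniform(d - U, d)   # own pulse arrival
        diffs = []
        for w in range(n):
            if w == v:
                diffs.append(0.0)                    # tau_vv - tau_vv
            else:
                t_w = pulses[w] + rng.uniform(d - U, d)
                diffs.append(t_w - t_self)           # tau_wv - tau_vv
        s = sorted(diffs)[f:n - f]                   # discard f per side
        nxt.append(pulses[v] + (s[0] + s[-1]) / 2)   # apply correction
    return nxt

rng = random.Random(42)
p = [0.0, 0.4, 0.7, 1.0]                 # initial skew F = 1.0
for r in range(6):
    p = run_round(p, f=1, d=2.0, U=0.01, rng=rng)
    print(f"round {r + 1}: skew = {max(p) - min(p):.4f}")
# The skew roughly halves per round until it stagnates at O(U).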
Condition 1. Define e(1) := F + (1 − 1/ϑ)τ_1(1) and inductively for all r ∈ ℕ that
e(r + 1) := (2ϑ² + 5ϑ − 5)/(2(ϑ + 1)) · e(r) + (3ϑ − 1)U + (1 − 1/ϑ)(T(r) − τ_1(r) + τ_1(r + 1)).
We require for all r ∈ ℕ that
τ_1(r) ≥ ϑe(r), τ_2(r) ≥ ϑ(e(r) + d), and T(r) ≥ τ_1(r) + τ_2(r) + ϑ(e(r) + U).
Here, e(r) is a bound on the synchronization error in round r, i.e., we will show that ‖p(r)‖ ≤ e(r) for all r ∈ ℕ, provided Condition 1 is satisfied. Condition 1 cannot be satisfied for arbitrary ϑ > 1 such that e(r) is bounded independently of r. The intuition is that rounds must be long enough to ensure that all pulses from correct nodes are received (i.e., at least ϑe(r) time), but during this time additional error is built up by the drifting clocks; if the approximate agreement step cannot overcome this relative skew increase, round r + 1 has to be even longer, and so on. However, any ϑ ≤ 1.1 can be sustained.
Lemma 3. Condition 1 can be satisfied such that lim_{r→∞} e(r) < ∞ if
α := (6ϑ² + 5ϑ − 9)/(2(ϑ + 1)(2 − ϑ)) < 1,
which holds for ϑ ≤ 1.1. In this case, we can achieve
lim_{r→∞} e(r) = ((4ϑ − 2)U + (ϑ − 1)d)/((2 − ϑ)(1 − α)).
Proof. By plugging e(1) into the inequality for τ_1(1), we see that we may choose τ_1(1) < ∞ if and only if ϑ < 2. Assuming that this is the case, we choose to satisfy all inequalities with equality, yielding for r ∈ ℕ that
e(r + 1) = αe(r) + ((4ϑ − 2)U + (ϑ − 1)d)/(2 − ϑ).
Thus,
lim_{r→∞} e(r) = lim_{r→∞} (α^{r−1}e(1) + (1 − α^{r−1})/(1 − α) · ((4ϑ − 2)U + (ϑ − 1)d)/(2 − ϑ)) = ((4ϑ − 2)U + (ϑ − 1)d)/((2 − ϑ)(1 − α)),
where the second equality holds because α < 1. Because α < 1 is a stricter constraint on ϑ than ϑ < 2, this completes the proof.
Several remarks are in order.
• α goes to 1/2 as ϑ goes to 1. For ϑ = 1.01, we still have α ≈ 0.55 (see the numeric check after this list). Thus, the approach can sustain fairly large phase drifts.
• For ϑ ≈ 1, we have that lim_{r→∞} e(r) ≈ 4U + 2(ϑ − 1)d. From Corollary 2, one can see that if (ϑ − 1)d ≪ U, this can be reduced to lim_{r→∞} e(r) ≈ 2U.
• The lower bound of (1 − 1/n)U from [14] shows that this is optimal up to a factor of 2. It is straightforward to verify that in the fault-free case with ϑ = 1, the algorithm attains the lower bound.
• The convergence is exponential, i.e., for any ε > 0 we have that e(r) ≤ (1 + ε) lim_{r→∞} e(r) for all r ≥ r_ε ∈ Θ(log(F/(ε · lim_{r→∞} e(r)))).
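The constant α is easily checked numerically (snippet ours; it merely evaluates the closed-form expression from Lemma 3 and Theorem 1):

for theta in (1.001, 1.01, 1.05, 1.1):
    alpha = (6 * theta**2 + 5 * theta - 9) / (2 * (theta + 1) * (2 - theta))
    print(f"theta = {theta:<6} alpha = {alpha:.3f}")
# Output: 0.505, 0.545, 0.736, 0.995; alpha stays below 1 up to
# theta = 1.1, matching the claim that any theta <= 1.1 is sustainable.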
Analysis
In this section, we prove that Condition 1 is indeed sufficient to ensure that ‖p(r)‖ ≤ e(r) for all r ∈ ℕ. In the following, denote by p(r), r ∈ ℕ, the vector of times when the nodes v ∈ C broadcast their r-th pulses, i.e., p(r) = (p_v(r))_{v∈C}.
If v ∈ C takes note of the pulse from w ∈ C in round r, the corresponding value τ_wv − τ_vv can be interpreted as an inexact measurement of p_w(r) − p_v(r). This is captured by the following lemma, which provides precise bounds on the incurred error.
Lemma 4. Suppose v ∈ C receives the pulses from both w ∈ C and itself in round r within its listening window. Then
|2(τ_wv − τ_vv)/(ϑ + 1) − (p_w(r) − p_v(r))| ≤ (2ϑU + (ϑ − 1)|p_w(r) − p_v(r)|)/(ϑ + 1),
where τ_wv and τ_vv denote the values of the respective variables in the algorithm in round r.
Proof. Denote by t_uv the time when v receives the pulse from u ∈ {v, w}. The communication model guarantees that
t_uv ∈ [p_u(r) + d − U, p_u(r) + d].   (1)
Moreover, if p_w(r) − p_v(r) ≥ 0, the bounds on the hardware clock speed guarantee that
t_wv − t_vv ≤ H_v(t_wv) − H_v(t_vv) = τ_wv − τ_vv ≤ ϑ(t_wv − t_vv)   (2)
whenever t_wv ≥ t_vv. This bound also holds in case p_w(r) − p_v(r) < 0, as we can switch the roles of v and w in the above inequalities. We conclude that
|2(τ_wv − τ_vv)/(ϑ + 1) − (p_w(r) − p_v(r))| ≤ (2ϑU + (ϑ − 1)|p_w(r) − p_v(r)|)/(ϑ + 1)
by (1) and (2).
We remark that if (ϑ − 1)d < U and U is known, it is beneficial to refrain from having v send a message to itself. Instead, it estimates the arrival time of its own message using its hardware clock, yielding the following corollary.
Corollary 2. Suppose v ∈ C does not send a pulse to itself, but instead sets τ_vv := H_v(p_v(r)) + d, and that it receives the pulse from w ∈ C in round r within its listening window. Then
|2(τ_wv − τ_vv)/(ϑ + 1) − (p_w(r) − p_v(r))| ≤ (ϑU + (ϑ − 1)(|p_w(r) − p_v(r)| + 2d))/(ϑ + 1),
where τ_wv denotes the value of the respective variable in the algorithm in round r.

Proof. By repeating the proof of Lemma 4, where the term accounting for the uncertainty of v's own message is replaced: the local estimate τ_vv is off by at most (ϑ − 1)d + U local time, while only w's message is affected by the delay uncertainty.
In the sequel, we use the bounds provided by Lemma 4. However, the reader should keep in mind that in case (ϑ − 1)d ≪ U and sufficiently precise bounds on U are known, Corollary 2 shows how to effectively cut the influence of the uncertainty in half.
Using Lemma 4, we can interpret the phase shifts ∆ v (r) as outcomes of an approximate agreement step, yielding the following corollary.
Corollary 3. Suppose in round r ∈ ℕ, it holds for all v, w ∈ C that v receives the pulse from w ∈ C and itself in round r during its listening window. Then
‖p(r) − ∆(r)‖ ≤ (5ϑ − 3)/(2(ϑ + 1)) · ‖p(r)‖ + 4ϑU/(ϑ + 1),
where p(r) − ∆(r) denotes the vector with components p_v(r) − ∆_v(r), and |∆_v(r)| ≤ ϑ(‖p(r)‖ + U) for all v ∈ C.
Proof. By Lemma 4, we can interpret the values 2(τ_wv − τ_vv)/(ϑ + 1) as measurements of p_w(r) − p_v(r) with error δ = (2ϑU + (ϑ − 1)‖p(r)‖)/(ϑ + 1). Note that shifting all values by p_v(r) in an approximate agreement step changes the result by exactly p_v(r), implying that p_v(r) − ∆_v(r) equals the result of an approximate agreement step with inputs p_w(r), w ∈ C, and error δ at node v. Thus, the claims follow from Corollary 1 and Lemma 2, noting that 1/2 + 2(ϑ − 1)/(ϑ + 1) = (5ϑ − 3)/(2(ϑ + 1)).
To derive a bound on ‖p(r + 1)‖, it remains to analyze the effect of the clock drift between the pulses. To this end, we examine how an established timing relation between actions of two correct nodes deteriorates due to measuring time using the inaccurate hardware clocks.

Lemma 5. Let v, w ∈ C wait for τ ≥ 0 local time starting from times t_v and t_w, respectively, i.e., until the times t′_v and t′_w satisfying H_v(t′_v) − H_v(t_v) = H_w(t′_w) − H_w(t_w) = τ. Then
t′_v − t′_w ≤ t_v − t_w + (1 − 1/ϑ)τ.

Proof. Since hardware clocks are increasing, t′_v ≥ t_v and t′_w ≥ t_w. The inequalities t′_v ≤ t_v + τ and t′_w ≥ t_w + τ/ϑ follow because hardware clock rates are between 1 and ϑ ≥ 1, and together they imply the claim.

This readily yields a bound on ‖p(r + 1)‖, provided that all nodes can compute when to send the next pulse on time.
Corollary 4. Assume that round r ∈ ℕ is executed correctly. Then
‖p(r + 1)‖ ≤ (2ϑ² + 5ϑ − 5)/(2(ϑ + 1)) · ‖p(r)‖ + (3ϑ − 1)U + (1 − 1/ϑ)(T(r) − τ_1(r) + τ_1(r + 1)).

Proof. For v, w ∈ C, assume w.l.o.g. that p_v(r + 1) − p_w(r + 1) ≥ 0. By Lemma 5 and Corollary 3, we have that
p_v(r + 1) − p_w(r + 1) ≤ ‖p(r) − ∆(r)‖ + (1 − 1/ϑ)(T(r) + max_{u∈C}{|∆_u(r)|} − τ_1(r) + τ_1(r + 1)),
and bounding |∆_u(r)| ≤ ϑ(‖p(r)‖ + U) via Corollary 3 yields the claim.
This bound hinges on the assumption that the round is executed correctly. We next establish sufficient conditions for this to be the case.
Lemma 6. Suppose that ‖p(r)‖ ≤ e(r) and that τ_1(r), τ_2(r), and T(r) satisfy the inequalities required by Condition 1 for round r. Then round r is executed correctly.
Proof. Consider v, w ∈ C and denote by t_vw the time when the message that v broadcasts at time p_v(r) is received by w. We have that
t_vw ≥ p_v(r) + d − U ≥ p_w(r) − e(r) + d − U ≥ t_w(r − 1),
showing that H_w(t_vw) ≥ H_w(t_w(r − 1)), i.e., w starts listening for the pulse of v on time. Similarly, using t_vw ≤ p_v(r) + d, ‖p(r)‖ ≤ e(r), and the lower bounds τ_1(r) ≥ ϑe(r) and τ_2(r) ≥ ϑ(e(r) + d), we obtain H_w(t_vw) ≤ H_w(t_w(r − 1)) + τ_1(r) + τ_2(r). Thus, w receives the pulse from v before it stops listening, and the first requirement of correct execution of round r is met for all v, w ∈ C. It remains to prove that for each v ∈ C, it holds that T(r) − ∆_v(r) ≥ τ_1(r) + τ_2(r). By the preconditions of the lemma, this is satisfied if ∆_v(r) ≤ ϑ(‖p(r)‖ + U). As we already established the precondition of Corollary 3 for round r, the corollary shows that this inequality is satisfied.
We have almost all pieces in place to inductively bound ‖p(r)‖ and determine suitable values for τ_1(r), τ_2(r), and T(r). The last missing bit is an anchor for the induction, i.e., a bound on ‖p(1)‖.

Corollary 5. ‖p(1)‖ ≤ e(1).

Proof. The nodes initialize the algorithm within a window of F local time and then wait for τ_1(1) local time before broadcasting their first pulses. The claim follows by applying Lemma 5.
Theorem 1. Suppose that Condition 1 is satisfied. Then, for all r ∈ ℕ, it holds that ‖p(r)‖ ≤ e(r). If α = (6ϑ² + 5ϑ − 9)/(2(ϑ + 1)(2 − ϑ)) < 1 (which holds for ϑ ≤ 1.1), we can choose the parameters such that the condition holds and Algorithm 2 has steady state error
E = ((4ϑ − 2)U + (ϑ − 1)d)/((2 − ϑ)(1 − α)) ∈ O((ϑ − 1)d + U).
Proof. To show the first part, inductively use Lemma 6 and Corollary 4 to show that round r is executed correctly and that ‖p(r + 1)‖ ≤ e(r + 1), respectively; the induction anchor is given by ‖p(1)‖ ≤ e(1) according to Corollary 5. The second part directly follows from Lemma 3.
Phase and Frequency Synchronization Algorithm
In this section, we extend the phase synchronization algorithm to also synchronize frequencies. The basic idea is to apply approximate agreement not only to phase offsets, but also to frequency offsets. To this end, in each round the phase difference is measured twice, applying any phase correction only after the second measurement. This enables nodes to estimate relative clock speeds and, in turn, the differences in clock speeds.
Ensuring that this procedure is executed correctly is straightforward by limiting |µ_v(r) − 1| to be small, where µ_v(r) is the factor by which node v changes its clock rate during round r. However, constraining this multiplier means that approximate agreement steps cannot be performed correctly in case the computed µ_v(r + 1) would lie outside the valid range of multipliers. This is fixed by introducing a correction that "pulls" frequencies back to the default rate.
Of course, for all this to be meaningful, we need to assume that hardware clock rates do not change faster than the algorithm can adjust the multipliers to keep the effective frequencies aligned.
Additional Assumptions on the Clocks
We require that clock rates satisfy a Lipschitz condition as well. In the following, we assume that H_v is differentiable (for all v ∈ C) with derivative h_v, where h_v satisfies for all t, t′ ∈ ℝ₀⁺ that
|h_v(t′) − h_v(t)| ≤ ν|t′ − t|
for some ν > 0. Note that we maintain the model assumption that hardware clock rates are close to 1 at all times, i.e., 1 ≤ h_v(t) ≤ ϑ for all t ∈ ℝ₀⁺.
Algorithm
Algorithm 3 gives the pseudocode of our approach. Mostly, the algorithm can be seen as a variant of Algorithm 2 that allows for speeding up clocks by factors µ_v(r) ∈ [1, ϑ²], where ϑh_v(t) is considered the nominal rate at time t. For simplicity, we fix all local waiting times independently of the round length. The main difference to Algorithm 2 is that a second pulse is sent before the phase correction is applied, enabling the nodes to determine the rate multipliers for the next round by an approximate agreement step as well. A frequency measurement is obtained by comparing the (observed) relative rate of the clock of node w during a local time interval of length τ_2 + τ_3 to the desired relative clock rate of 1. Since the clock of node v is considered to run at speed µ_v(r)h_v(t) during the measurement period, the former takes the form µ_v(r)∆_wv/(τ_2 + τ_3), where ∆_wv is the time difference between the arrival times of the two pulses from w measured with H_v. The approximate agreement step results in a new multiplier μ̃_v(r + 1) at node v; we then move this result by ε in the direction of the nominal rate multiplier ϑ and ensure that we remain within the acceptable multiplier range [1, ϑ²].
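A minimal sketch of this multiplier update (our rendering; the trimming mirrors Algorithm 1, and the ε-pull and clamping follow the description above):

from typing import List

def update_multiplier(mu: float, rate_diffs: List[float], f: int,
                      theta: float, eps: float) -> float:
    """rate_diffs[w] estimates the rate difference to node w (derived
    from the two pulses of w); returns the next-round multiplier."""
    s = sorted(rate_diffs)
    trimmed = s[f:len(s) - f]                        # discard f per side
    mu_tilde = mu + (trimmed[0] + trimmed[-1]) / 2   # agreement result
    # move the result by eps toward the nominal multiplier theta ...
    if mu_tilde < theta:
        mu_tilde = min(mu_tilde + eps, theta)
    else:
        mu_tilde = max(mu_tilde - eps, theta)
    # ... and stay within the acceptable range [1, theta^2]
    return min(max(mu_tilde, 1.0), theta**2)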
To fully specify the algorithm, we need to determine how long the waiting periods are (in terms of local time) and choose ε. Here, we must ensure for all r ∈ ℕ that
1. for all v, w ∈ C, the message v broadcasts at time t_v(r − 1) + τ_1/µ_v(r − 1) is received by w at a local time from [H_w(t_w(r − 1)), H_w(t_w(r − 1)) + τ_1/µ_v(r − 1) + τ_2/µ_w(r)],
2. the analogous statement holds for the second pulses of the round and the listening windows given by τ_3 and τ_4, and
3. for all v ∈ C, T − ∆_v(r) ≥ τ_1 + τ_2 + τ_3 + τ_4.

Algorithm 3: Phase and frequency synchronization algorithm, code for node v ∈ C. Time t_v(r), r ∈ ℕ₀, is the time when round r + 1 starts.
If these conditions are satisfied for r ∈ ℕ, we say that round r was executed correctly. We now specify the constraints our choices for the parameters must satisfy to ensure that all rounds are executed correctly and both phase and frequency errors converge to small values.

Condition 2. With θ := ϑ³ and β̄ := (2θ² + 5θ − 5)/(2(θ + 1)), define
e(1) := max{F + (1 − 1/θ)τ_1, ((3θ − 1)U + (1 − 1/θ)T)/(1 − β̄)}
and, for all r ∈ ℕ, e(r + 1) := β̄e(r) + (3θ − 1)U + (1 − 1/θ)T. We require that
τ_1 ≥ θe(1), τ_2 ≥ θ(e(1) + d), τ_3 ≥ θ(e(1) + (1 − 1/θ)(τ_1 + τ_2)), τ_4 ≥ θ(e(1) + d + (1 − 1/θ)(τ_1 + τ_2)),
T ≥ τ_1 + τ_2 + τ_3 + τ_4 + θ(e(1) + U), and ε ≥ θ(1 − 1/θ)² + θU/(τ_2 + τ_3) + (θ + 1)νT.

Here, all but the last constraint mimic Condition 1, where the bounds on τ_3 and τ_4 account for the fact that between the first and the second pulse of each round, the nodes' opinions on the "synchronized time" drift apart slowly. The lower bound on ε ensures that the pull-back of multipliers to the nominal one is sufficiently strong to guarantee that, in fact, multipliers will never leave the valid range of [1, ϑ²]. We now show that these constraints can be satisfied provided that ϑ is not too large.

Lemma 7. Condition 2 can be satisfied such that lim_{r→∞} e(r) < ∞ if ᾱ := β̄ + (4θ + 3)(θ − 1) < 1, which holds for ϑ ≤ 1.011. In this case, any T ≥ T_0 ∈ O(F + d + U) is feasible, and e(r) ≤ e(1) ∈ O((θ − 1)T + U) for all r ∈ ℕ.

Proof. We choose τ_1, τ_2, τ_3, and τ_4 minimal such that the respective constraints are satisfied, and pick any feasible ε. Hence, the remaining constraints are the lower bound on T and
e(1) = max{F + (1 − 1/θ)τ_1, ((3θ − 1)U + (1 − 1/θ)T)/(1 − β̄)}.   (4)
Using that 2 − θ > 0 (which is a weaker constraint than ᾱ < 1), assuming that e(1) equals the first term of the maximum yields e(1) = F/(2 − θ) ∈ O(F), and clearly there is a T_0 ∈ O(F + d + U) such that (4) is satisfied for any T ≥ T_0. Assuming that e(1) equals the second term in the maximum, (4) becomes a linear inequality in e(1) and T. Using that ᾱ < 1, we can resolve this to e(1) ∈ O((θ − 1)T + U). For the final claim, observe that by induction on r, we have that e(r + 1) = β̄e(r) + (3θ − 1)U + (1 − 1/θ)T ≤ e(r) for all r ∈ ℕ, since e(1) ≥ ((3θ − 1)U + (1 − 1/θ)T)/(1 − β̄).
Analysis
In the following, denote by p(r) and q(r), r ∈ ℕ, the vectors of times when the nodes v ∈ C broadcast their first and second pulses in round r, respectively. Thus, we have that p_v(r) < q_v(r) < p_v(r + 1) for all v ∈ C and r ∈ ℕ.
We will first make use of the analysis we performed for the phase correction algorithm to show that all rounds are executed correctly. Then we will refine the analysis by examining the impact of the frequency correction steps.
Phase Correction Steps
Observe that because 1 ≤ µ_v(r) ≤ ϑ² for all r ∈ ℕ₀ and v ∈ C, we have that 1 ≤ µ_v(r)h_v(t) ≤ ϑ³ =: θ for all times t. Thus, we may interpret the waiting periods of Algorithm 3 as nodes waiting for τ_1, τ_2, etc. local time with hardware clocks of drift θ = ϑ³, and we can make use of the same arguments as in Section 4.3 to obtain a series of results.
Corollary 6. For all r ∈ ℕ, ‖q(r)‖ ≤ ‖p(r)‖ + (1 − 1/θ)(τ_1 + τ_2).

Proof. By application of Lemma 5.
Corollary 7. Suppose that ‖p(r)‖ ≤ e(r) and that Condition 2 holds. Then round r is executed correctly.

Proof. As for Lemma 6, where the pulse in the frequency correction step is analyzed analogously.
Theorem 2. Suppose that Condition 2 is satisfied and that
ᾱ := β̄ + (4θ + 3)(θ − 1) < 1, where β̄ := (2θ² + 5θ − 5)/(2(θ + 1))
(this is the case for ϑ ≤ 1.011). Then, for all r ∈ ℕ, it holds that ‖p(r)‖ ≤ e(r) and the algorithm has steady state error
E ≤ ((3θ − 1)U + (1 − 1/θ)T)/(1 − β̄).
In particular, all rounds r ∈ ℕ are executed correctly.
Proof. As for Theorem 1, where we replace ϑ with θ, Lemma 6 with Corollary 7, and Lemma 3 with Lemma 7. However, the induction step requires that we can apply Corollary 7 again in step r + 1 if we could do so in step r ∈ ℕ. This readily follows from Condition 2 if e(r + 1) ≤ e(r) for all r ∈ ℕ. We show this by induction on r. Abbreviate x := (3θ − 1)U + (1 − 1/θ)T. Our claim is that (i) for r ∈ ℕ, e(r) ≥ x/(1 − β̄) and (ii) for r ≥ 2, e(r) ≤ e(r − 1). The base case r = 1 requires (i) only, which holds by definition of e(1). For the step from r to r + 1, we bound
e(r + 1) = β̄e(r) + x ≥ β̄ · x/(1 − β̄) + x = x/(1 − β̄)
and
e(r) − e(r + 1) = (1 − β̄)e(r) − x ≥ x − x = 0.
Finally, observe that our reasoning shows as part of the inductive argument that all rounds are executed correctly.
Frequency Correction Steps
In the following, we assume that the prerequisites of Theorem 2 are satisfied. In particular, all rounds are executed correctly, i.e., we can assume that correct nodes receive each others' pulses. We introduce some notation to capture the behavior of the (logical) rates of the nodes' clocks. This notation may seem somewhat cumbersome; basically, the reader may think of the clock rates h_v(t) being almost constant, implying that all considered values for a given node v ∈ C are essentially the same, slowly deviating at rate at most ν. By ρ(r), we denote the vector whose entries are the intervals of clock rates of nodes v ∈ C between the first pulses in rounds r ∈ ℕ and r + 1. Concretely,
ρ(r)_v := {µ_v(r)h_v(t) | t ∈ [p_v(r), p_v(r + 1)]}.
By ‖ρ(r)‖, we denote the difference between maximum and minimum rate in ρ(r), i.e.,
‖ρ(r)‖ := max_{v∈C}{max ρ(r)_v} − min_{v∈C}{min ρ(r)_v}.
Furthermore, we denote by ρ̄(r)_v := µ_v(r)h_v((p_v(r) + p_v(r + 1))/2), by ρ̄(r) the respective vector, and by ‖ρ̄(r)‖ := max_{v∈C}{ρ̄(r)_v} − min_{v∈C}{ρ̄(r)_v}. Note that ρ̄(r)_v ∈ ρ(r)_v by definition. We start by showing that ρ̄(r)_v approximates µ_v(r)h_v(t) well for times t between pulses r and r + 1 of v ∈ C, i.e., we may see ρ̄(r)_v as "the" clock rate of v in round r.
Lemma 8. For all v ∈ C, r ∈ ℕ, and t ∈ [p_v(r), p_v(r + 1)], it holds that |µ_v(r)h_v(t) − ρ̄(r)_v| < νT.

Proof. Using that hardware clock rates are at least 1 and that |∆_v(r)| < max{τ_1, τ_2} = τ_2, we see that
p_v(r + 1) − p_v(r) < T.
By our assumptions on the hardware clocks, this yields that
|µ_v(r)h_v(t) − ρ̄(r)_v| ≤ µ_v(r)ν · |t − (p_v(r) + p_v(r + 1))/2| < ϑ²νT/2 ≤ νT,
where the final step uses that ϑ² ≤ 2.
Two corollaries relate the progress of the hardware clocks between (i) p_v(r) and q_v(r) and (ii) t_wv and t′_wv to ρ̄(r)_v, respectively, where t_wv and t′_wv denote the times when v receives the first and second pulse of w in round r.
Corollary 8. For v ∈ C and r ∈ ℕ, we have that
q_v(r) − p_v(r) ∈ [(τ_2 + τ_3)/(ρ̄(r)_v + νT), (τ_2 + τ_3)/(ρ̄(r)_v − νT)].

Proof. Let ρ ∈ ρ(r)_v such that ρ(q_v(r) − p_v(r)) = τ_2 + τ_3. By definition of ρ(r)_v and the mean value theorem, such a ρ exists and ρ = µ_v(r)h_v(t) for some t ∈ [p_v(r), p_v(r + 1)]. By Lemma 8, |ρ − ρ̄(r)_v| < νT. Thus,
q_v(r) − p_v(r) = (τ_2 + τ_3)/ρ ∈ [(τ_2 + τ_3)/(ρ̄(r)_v + νT), (τ_2 + τ_3)/(ρ̄(r)_v − νT)].
Corollary 9. For v, w ∈ C and r ∈ ℕ, we have that
t′_wv − t_wv ∈ [µ_v(r)∆_wv/(ρ̄(r)_v + νT), µ_v(r)∆_wv/(ρ̄(r)_v − νT)],
where ∆_wv = H_v(t′_wv) − H_v(t_wv).

Proof. Let ρ ∈ ρ(r)_v such that ρ(t′_wv − t_wv) = µ_v(r)(H_v(t′_wv) − H_v(t_wv)). By definition of ρ(r)_v and the mean value theorem, such a ρ exists, and Lemma 8 yields |ρ − ρ̄(r)_v| < νT; here we use that t_wv, t′_wv ∈ [p_v(r), p_v(r + 1)], exploiting that
t′_wv − t_wv ≤ q_w(r) + d − (p_w(r) + d − U) = q_w(r) − p_w(r) + U,
since clock rates are at least 1, and the final containment easily follows from Condition 2.
These results put us in the position to prove that 1 − µ_v(r)∆_wv/(τ_2 + τ_3) is indeed a good estimate of ρ̄(r)_w − ρ̄(r)_v. Thus, this (computable) value can serve as a proxy for the difference between "the" clock rates of w and v in round r.
Lemma 9. For v, w ∈ C and r ∈ ℕ, we have that
|(1 − µ_v(r)∆_wv/(τ_2 + τ_3)) − (ρ̄(r)_w − ρ̄(r)_v)| ≤ θ(1 − 1/θ)² + θU/(τ_2 + τ_3) + (θ + 1)νT.

Proof. By definition, µ_w(r)(H_w(q_w(r)) − H_w(p_w(r))) = τ_2 + τ_3, and by Corollaries 8 and 9 the real-time differences q_w(r) − p_w(r) and t′_wv − t_wv are related to ρ̄(r)_w and ρ̄(r)_v, respectively, up to errors of ±νT in the rates. Moreover, by the delay bounds, |t′_wv − t_wv − (q_w(r) − p_w(r))| ≤ U. Note that |µ_v(r)∆_wv/(t′_wv − t_wv)| ≤ ϑ³. Therefore, expanding the resulting products and collecting error terms, of which the quadratic term θ(1 − 1/θ)² accounts for the deviation of both rates from 1, we conclude that the stated bound holds.
We remark that the θ(1 − 1/θ)² term is, more precisely, bounded as Θ((1 − 1/ϑ³)‖ρ(r)‖). However, for this to be of use, we would have to choose ε depending on r. Since rule-of-thumb calculations show that this term is unlikely to be significant in any real system and the improvement would not extend to the self-stabilizing variant of the algorithm, we refrained from adding this additional complication.
Given that we can bound the "measurement error" of the frequency correction step by Lemma 9, the results from Section 4.1 can be invoked to show convergence. First, we analyze the properties of μ̃_v(r + 1), which Lemma 11 then uses to control µ_v(r + 1).
Lemma 10. Set δ := θ(1 − 1/θ)² + θU/(τ_2 + τ_3) + (θ + 1)νT and denote by ξ_v(r) := μ̃_v(r + 1) − µ_v(r) the rate correction computed by the approximate agreement step at node v ∈ C. Then, for all v, w ∈ C,
|(ρ̄(r)_v + ξ_v(r)) − (ρ̄(r)_w + ξ_w(r))| ≤ ‖ρ̄(r)‖/2 + 2δ.
Furthermore, for all v ∈ C,
ρ̄(r)_v + ξ_v(r) ∈ [min_{u∈C}{ρ̄(r)_u} − δ, max_{u∈C}{ρ̄(r)_u} + δ] and |ξ_v(r)| ≤ ‖ρ̄(r)‖ + δ.
Proof. Observe that, by Lemma 9, we can interpret ρ̄(r)_v + ξ_v(r), v ∈ C, as the results of an approximate agreement step with error δ on inputs ρ̄(r). By Lemma 2, this implies that
‖ρ̄(r) + ξ(r)‖ ≤ ‖ρ̄(r)‖/2 + 2δ,
which is the first claim. By Corollary 1, max_{u∈C}{|ξ_u(r)|} ≤ ‖ρ̄(r)‖ + δ. For the remaining claim, we apply Lemma 1, which shows that ρ̄(r)_v + ξ_v(r) lies within the interval spanned by the values ρ̄(r)_u, u ∈ C, enlarged by δ on each side. Combining these inequalities completes the proof.
Lemma 11. For round r ∈ ℕ and v ∈ C, abbreviate t̄_v := (p_v(r) + p_v(r + 1))/2, i.e., ρ̄(r)_v = µ_v(r)h_v(t̄_v). For all v, w ∈ C, we have that
|µ_v(r + 1)h_v(t̄_v) − µ_w(r + 1)h_w(t̄_w)| ≤ (2ϑ − 1)/2 · ‖ρ̄(r)‖ + 3ϑε.
Proof. Let v ∈ C and w ∈ C maximize and minimize µ_u(r + 1)h_u(t̄_u) over u ∈ C, respectively. By Lemma 10, we have that
(ρ̄(r)_v + ξ_v(r)) − (ρ̄(r)_w + ξ_w(r)) ≤ ‖ρ̄(r)‖/2 + 2δ.
We make a case distinction.
Case 1: |µ_u(r + 1) − μ̃_u(r + 1)| ≤ ε for u ∈ {v, w}, i.e., neither multiplier is clamped at the boundary of [1, ϑ²]. Then the bound follows from the above inequality, as the ε-shifts towards ϑ and the factors h_u(t̄_u) ≤ ϑ increase the difference by at most 2ϑε plus a (ϑ − 1)‖ρ̄(r)‖ term.
Case 2: µ_v(r + 1) − μ̃_v(r + 1) > ε. This implies that µ_v(r + 1) = 1 ≤ µ_v(r).
a) μ̃_w(r + 1) ≤ ϑ, i.e., we have that µ_w(r + 1) ≥ μ̃_w(r + 1) + ε. Using Lemma 10, we bound the difference as in Case 1.
b) μ̃_w(r + 1) > ϑ, yielding that µ_w(r + 1) ≥ ϑ − ε. It follows that v's clamped multiplier cannot exceed w's by more than in Case 1.
Case 3: μ̃_w(r + 1) − µ_w(r + 1) > ε. This implies that µ_w(r + 1) = ϑ² ≥ µ_w(r).
a) μ̃_v(r + 1) > ϑ, i.e., we have that µ_v(r + 1) ≤ μ̃_v(r + 1) − ε. Using Lemma 10, we bound the difference as before.
b) μ̃_v(r + 1) ≤ ϑ, yielding that µ_v(r + 1) ≤ ϑ + ε; again, the difference is bounded as before.
In all cases, we get that
µ_v(r + 1)h_v(t̄_v) − µ_w(r + 1)h_w(t̄_w) ≤ (2ϑ − 1)/2 · ‖ρ̄(r)‖ + 3ϑε,
where we also use that ε ≥ δ by Condition 2.
Corollary 10. For all r ∈ ℕ,
‖ρ̄(r + 1)‖ ≤ (2ϑ − 1)/2 · ‖ρ̄(r)‖ + 3ϑε + 2ν(T + τ_2).

Proof. By applying Lemma 11 and noting that for all u ∈ C, |µ_u(r + 1)h_u(t̄_u(r)) − ρ̄(r + 1)_u| ≤ ν(T + τ_2) by Lemma 8, since the round midpoints t̄_u(r) and t̄_u(r + 1) are less than T + τ_2 apart.
We conclude that the steady state frequency error is in O(ε).
Corollary 11. Assume that β := (2ϑ − 1)/2 < 1. Then
lim sup_{r→∞} ‖ρ̄(r)‖ ≤ (3ϑε + 2ν(T + τ_2))/(1 − β) ∈ O(ε).

Proof. From iterative application of Corollary 10, we get that
lim sup_{r→∞} ‖ρ̄(r)‖ ≤ lim_{r→∞} (β^{r−1}‖ρ̄(1)‖ + (1 − β^{r−1})/(1 − β) · (3ϑε + 2ν(T + τ_2))) = (3ϑε + 2ν(T + τ_2))/(1 − β).
Lemma 8 shows that ‖ρ(r)‖ ≤ ‖ρ̄(r)‖ + ν(T + τ_2). Since Condition 2 holds, 1 − β ∈ Ω(1) and the overall error is bounded by O(ε).
Steady State Error with Frequency Correction
To make use of Corollary 11, we need to derive a variant of Corollary 4 that allows for better control of ‖p(r + 1)‖ in case ‖ρ(r)‖ is small.
Lemma 12. If round r ∈ ℕ is executed correctly, then
‖p(r + 1)‖ ≤ (4θ² + 5θ − 7)/(2(θ + 1)) · ‖p(r)‖ + (4θ − 2)U + ‖ρ(r)‖T.

Proof. For v, w ∈ C, assume w.l.o.g. that p_v(r + 1) − p_w(r + 1) ≥ 0 (the other case is symmetric). Denote by ρ_v ∈ ρ(r)_v the average (adjusted) clock rate of v during [p_v(r), p_v(r + 1)], i.e., the value satisfying
ρ_v(p_v(r + 1) − p_v(r)) = µ_v(r)(H_v(p_v(r + 1)) − H_v(p_v(r)));
ρ_w is defined analogously for w. Recall that 1 ≤ ρ_u ≤ θ for u ∈ {v, w}. Using this and Corollary 3 (with ϑ replaced by θ = ϑ³) to bound the skew introduced by the phase correction, and observing that the rates ρ_v and ρ_w differ by at most ‖ρ(r)‖ while the nodes wait for at most T local time, we conclude that the stated bound holds.
Plugging this into our machinery we arrive at the main result of this section.
Theorem 3. Suppose that Condition 2 is satisfied and that
ᾱ := β̄ + (4θ + 3)(θ − 1) < 1
(which is the case for ϑ ≤ 1.01). Then, with α := (4θ² + 5θ − 7)/(2(θ + 1)) < 1 and β := (2ϑ − 1)/2 < 1, Algorithm 3 has steady state error
E ≤ ((4θ − 2)U + ν(T + τ_2)T)/(1 − α) + (3ϑε + 2ν(T + τ_2))T/((1 − α)(1 − β)).
Proof. As the preconditions of Theorem 2 are satisfied, all rounds are executed correctly. By Corollary 11, this implies that
lim sup_{r→∞} ‖ρ(r)‖ ≤ ν(T + τ_2) + (3ϑε + 2ν(T + τ_2))/(1 − β).
We plug this into the bound from Lemma 12, which we apply inductively to show that
lim sup_{r→∞} ‖p(r)‖ ≤ ((4θ − 2)U + lim sup_{r→∞} ‖ρ(r)‖ · T)/(1 − α),
which yields the claim.
Under reasonable assumptions we can obtain a more readable error bound.
Corollary 12. Assume that the prerequisites of Theorem 3 are satisfied. Moreover, suppose that
• ε is chosen minimally such that it satisfies Condition 2,
• τ_2 + τ_3 ∈ Θ(T), which is feasible whenever T ≫ θ(e(1) + d), and
• max{(θ − 1)²T, νT²} ≪ U.
Then the steady state error of Algorithm 3 is bounded by roughly 28U.
Proof. Note that θ ≈ 1 implies that 1 − α ≈ 1/2 and 1 − β ≈ 1/2. By the assumptions, the minimal feasible ε is dominated by θU/(τ_2 + τ_3), and the ν-terms are negligible. Plugging this into the bound from Theorem 3, the steady state error is approximately bounded by
2(4θ − 2)U + 12ϑεT ≈ 4U + 12UT/(τ_2 + τ_3) ≈ 4U + 24U = 28U,
where the final step uses that τ_2 + τ_3 ≈ T/2 for the (essentially minimal) choices of the remaining parameters.
A few remarks:
• Corollary 12 basically states that increasing T is fine, as long as max{(θ − 1)²T, νT²} ≪ U. This improves over Algorithm 2, where it is required that (ϑ − 1)T ≪ U, as it permits transmitting pulses at significantly lower frequencies; see the example computation after these remarks.
• While the error bound of roughly 28U is about a factor of 7 larger than the roughly 4U that Algorithm 2 provides, this is likely to be overly conservative. The source of this difference is that we assume that in a frequency measurement, the full uncertainty U may skew the observation of the relative clock speed. However, this measurement is based on sending two signals in the same direction over the same communication link in fairly short order. In most settings, the difference in delays will be much smaller than between messages on different communication links. Accordingly, the relative contribution of the frequency measurement to the error is likely to be much smaller in practice.
• If this is not the case, one may extend the time span for a frequency measurement over multiple rounds to decrease the effect of the uncertainty. This requires that the accumulated phase corrections do not become so large as to prevent a clear distinction of the frequency-related pulse (whose sending time must not be altered due to phase corrections) from phase-related pulses. To not further complicate the analysis, we refrained from presenting this option; it is used in [15, 16].
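To illustrate the first remark, the following back-of-the-envelope computation (with made-up but plausible parameter values) compares the admissible round lengths with and without frequency correction:

drift = 1e-4        # theta - 1 (a poor, non-quartz oscillator)
nu = 1e-12          # assumed maximum rate change per second
U = 1e-7            # assumed delay uncertainty: 100 ns

T_phase_only = U / drift                    # needs (theta-1)*T << U
T_with_freq = min(U / drift**2,             # needs (theta-1)^2*T << U
                  (U / nu) ** 0.5)          # and nu*T^2 << U
print(f"phase only: T up to ~ {T_phase_only:.3g} s")        # ~ 1e-3 s
print(f"with frequency correction: T up to ~ {T_with_freq:.3g} s")  # ~ 10 s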
Self-stabilization
In this section, we propose a generic mechanism that can be used to transform Algorithm 2 and Algorithm 3 into self-stabilizing solutions. An algorithm is self-stabilizing if it (re-)establishes correct operation from arbitrary states in bounded time. If there is an upper bound on the time this takes in the worst case, we refer to it as the stabilization time. We stress that, while self-stabilizing solutions to the problem are known, all of them have skew Ω(d); augmenting the Lynch-Welch approach with self-stabilization capabilities thus achieves an optimal skew bound of O((ϑ − 1)T + U) in a Byzantine self-stabilizing manner for the first time.
Our approach can be summarized as follows. Nodes locally count their pulses modulo some M ∈ N. We use a low-frequency, imprecise, but self-stabilizing synchronization algorithm (called FATAL) from earlier work [6, 7] to generate a "heartbeat." On each such beat, nodes will locally check whether the next pulse with number 1 modulo M will occur within an expected time (local) window whose size is determined by the precision the algorithm would exhibit after M correctly executed pulses (in the non-stabilizing case). If this is not the case, the node is "reset" such that pulse 1 will occur within this time window.
This simple strategy ensures that a beat forces all nodes to generate a pulse with number 1 modulo M within a bounded time window. Assuming a value of F corresponding to its length in Algorithm 2 or Algorithm 3 hence ensures that the respective algorithm will run as intended, at least up to the point when the next beat occurs. Inconveniently, if the beat is not synchronized with the next occurrence of a pulse with number 1 modulo M, some or all nodes may be reset, breaking the guarantees established by the perpetual application of approximate agreement steps. This issue is resolved by leveraging a feedback mechanism provided by FATAL: FATAL offers a (configurable) time window during which a NEXT signal externally provided to each node may trigger the next beat. If this signal arrives at each correct node at roughly the same time, we can be sure that the corresponding beat is generated shortly thereafter. This allows for sufficient control over when the next beat occurs to prevent any node from ever being reset after the first (correct) beat. Since FATAL stabilizes regardless of how the externally provided signals behave, this suffices to achieve stabilization of the resulting compound algorithm.
FATAL
We summarize the properties of FATAL in the following corollary, where each node has the ability to trigger a local NEXT signal perceived by the local instance of FATAL at any time.
Algorithm 4: Interface algorithm, actions for node v ∈ C in response to a local event at time t. Runs in parallel to local instances of FATAL and either Algorithm 2 or Algorithm 3. In case Algorithm 2 is used, we assume that τ_1(r), τ_2(r), and T(r) do not depend on r ∈ ℕ and omit r from the notation.
1 // algorithm maintains local variable i ∈ {0, . . . , M − 1}
2 if v generates a pulse at time t then
3   i := i + 1 mod M;
4   if i = 0 then trigger the NEXT signal at local time H_v(t) + ϑe(M);
5 if v perceives a beat at time t then
6   if i ≠ 0 or no pulse is scheduled for a local time in [H_v(t) + R⁻, H_v(t) + R⁺] then
7     reset: i := 0 and schedule the next pulse (with number 1 modulo M) for local time H_v(t) + R⁻;

Corollary 13 (of [7]). For suitable parameters P, B_1, B_2, B_3, D ∈ ℝ⁺, FATAL stabilizes within O((B_1 + B_2 + B_3)n) time with probability 1 − 2^(−Ω(n)). Once stabilized, nodes v ∈ C generate beats b_v(k), k ∈ ℕ, such that the following properties hold for all k ∈ ℕ.
1. For all v, w ∈ C, we have that |b_v(k) − b_w(k)| ≤ D.
2. If no v ∈ C triggers its NEXT signal during [min w∈C {b w (k)} + B 1 , t] for some t ≤ min w∈C {b w (k)} + B 1 + B 2 + B 3 , then min w∈C {b w (k + 1)} ≥ t.
3. If all v ∈ C trigger their NEXT signals during [min w∈C {b w (k)} + B 1 + B 2 , t] for some t ≤ min w∈C {b w (k)} + B 1 + B 2 + B 3 , then max w∈C {b w (k + 1)} ≤ t + P .
Denoting by d_F the maximum end-to-end delay (sum of maximum message and computational delay) of FATAL, for any φ ≥ 1 and any constant C we can ensure that
D, P ∈ O(d_F) and B_1, B_2, B_3 ∈ Θ(φd_F), with B_1 ≥ C · d_F.
Proof. For φ = 1, all statements follow directly from Lemma 3.4 and Corollary 4.16 in [7], noting that nodes will switch from state ready to propose (in the main state machine) in response to a NEXT signal if their timeout T_3 is expired. Once all correct nodes switched to propose, this results in all nodes switching to accept and generating a beat within d_F time. For φ > 1, one simply needs to observe that multiplying each timeout of an assignment satisfying Condition 3.3 in [7] by φ results in another valid choice; the bound on the stabilization time given in Corollary 4.16 scales accordingly.
Algorithm
Our self-stabilizing solution utilizes both FATAL and the clock synchronization algorithm with very limited interaction. We already stressed that FATAL will stabilize regardless of the NEXT signals, and we note that it is not influenced by Algorithm 4 in any other way. Concerning the clock synchronization algorithm (either Algorithm 2 or Algorithm 3), we assume that a "careful" implementation is used that does not maintain state variables for a long time. Concretely, Algorithm 2 will clear memory between loop iterations, and Algorithm 3 will memorize the new multiplier value µ_v(r + 1) only, which is explicitly assigned during round r. If this is satisfied, no further consistency checks of variables are required, and it will be straightforward to re-use the analyses from Sections 4.3 and 5.3. Having said this, let us turn to Algorithm 4, which is basically an ongoing consistency check based on the beats, resetting the clock synchronization algorithm if necessary. The feedback triggering the next beat in a timely fashion is implemented by simply triggering the NEXT signal on each M-th pulse, with a small delay ensuring that all nodes arrive in the same round and have their counter variable i reading 0. The consistency checks then ask for i = 0 and the next pulse being triggered within a certain local time window; if either does not apply, the reset function is called, ensuring that both conditions are met.
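The per-event logic of Algorithm 4 can be sketched as follows (our rendering; the state fields, the scheduling convention, and the reset behavior are placeholders, not the paper's implementation):

from dataclasses import dataclass
from typing import Optional

@dataclass
class CouplingState:
    i: int = 0                       # pulse counter modulo M
    next_pulse_local: float = 0.0    # local time of the next pulse

def on_pulse(s: CouplingState, h_now: float, M: int,
             theta: float, e_M: float) -> Optional[float]:
    """Count pulses modulo M; after every M-th pulse, return the local
    time at which the NEXT signal shall be triggered (else None)."""
    s.i = (s.i + 1) % M
    return h_now + theta * e_M if s.i == 0 else None

def on_beat(s: CouplingState, h_now: float,
            r_minus: float, r_plus: float) -> None:
    """Consistency check on a beat: unless i = 0 and the next pulse is
    scheduled inside [h_now + R-, h_now + R+], reset the algorithm."""
    in_window = h_now + r_minus <= s.next_pulse_local <= h_now + r_plus
    if s.i != 0 or not in_window:
        s.i = 0
        s.next_pulse_local = h_now + r_minus  # re-initialize the schedule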
Condition 3 lists the constraints that R⁻ (the minimum local time between a beat and local pulse 1 mod M), R⁺ (the respective maximum local time), and M (the number of pulses between beats), i.e., the parameters of Algorithm 4, need to satisfy so that we can show that the algorithm is guaranteed to stabilize.

Condition 3. We require that Inequalities (9)–(18) hold; these relate R⁻, R⁺, M, and F to the parameters of the synchronization algorithm and of FATAL.
Intuitively, these constraints ensure the following:
• (9) says that resets on a beat enforce the skew to become bounded by e(1).
• (10) and (11) ensure that correct nodes receive the first pulses from all other correct nodes after a beat.
• (12) guarantees that these are actually the "round-1" pulses also for nodes that have been reset, i.e., there are no spurious pulses from before such a reset that are received during the respective time window.
• (13) and (14) make sure that FATAL will ignore any NEXT signals that may still be active when a beat occurs and that there is sufficient time for the first round after the beat to complete.
• (15) and (16) enforce that the (now correctly executing) algorithm will trigger the NEXT signals and thus the next beat well-aligned with the time reference it provides.
• Finally, (17) and (18) imply that such a beat will result in no resets.
We need to show that these constraints can be satisfied in conjunction with the ones required by the employed synchronization algorithm.
Lemma 13. Conditions 1 and 3 can be simultaneously satisfied such that τ_1(r) = τ_1, τ_2(r) = τ_2, and T(r) = T for all r ∈ ℕ, and lim_{r→∞} e(r) < ∞, if
α := (6ϑ² + 5ϑ − 9)/(2(ϑ + 1)(2 − ϑ)) < 1,
where β = (2ϑ² + 5ϑ − 5)/(2(ϑ + 1)). In this case,
lim_{r→∞} e(r) = ((3ϑ − 1)U + (1 − 1/ϑ)T)/(1 − β).
Here, we may choose any T ≥ T_0 ∈ O(d_F + d) and B_1, B_2, and B_3 such that FATAL stabilizes in time O(n(d_F + d)) with probability 1 − 2^(−Ω(n)).
Proof. We choose R⁻ and R⁺ such that (17) and (18) are satisfied with equality. Thus, any choice of F complying with (9) is feasible, and for (10)–(12) to hold it is sufficient that τ_1 and τ_2 are sufficiently large in terms of F, d, and U. These lower bounds on τ_1 and τ_2 are weaker than those imposed by Condition 1, which demands that min{τ_1, τ_2} ≥ ϑe(1) > F. Setting τ_1 := ϑe(1), τ_2 := ϑ(e(1) + d), and requiring T ≥ ϑ(τ_1 + τ_2 + e(1) + U) thus guarantees that the above lower bounds on τ_1 and τ_2 hold, and the inequalities of Condition 1 are satisfied for r = 1. Moreover, with x := (3ϑ − 1)U + (1 − 1/ϑ)T, we have for r ∈ ℕ that
e(r) = β^{r−1}e(1) + (1 − β^{r−1}) · x/(1 − β),
i.e., e(r) is a convex combination of e(1) and x/(1 − β). We require that e(1) ≥ x/(1 − β); here, we used that 2 − ϑ > 0, because α < 1. Thus, e(r) ≤ e(1), and we conclude that Condition 1 holds for all rounds r ∈ ℕ.
For any c > 1, sufficiently large M ensures that e(M) ≤ c · x/(1 − β), where we use that 1 − β ∈ Ω(1) because α < 1. Assuming sufficiently large M, the above lower bound on T can hence be met iff T ≥ T_0 for a suitable T_0 ∈ O(d_F + d). In this case, for sufficiently large M the constraint on T is satisfied if T ∈ Ω(d_F + d), where we used that ϑ and thus 1 − α and 1 − β are constants.
To complete the proof, it remains to show that, for any such choice of T and a given lower bound on M, we can satisfy Inequalities (13)–(16) such that FATAL has the claimed guarantees on the stabilization time. Given that all parameters except for M, B_1, B_2, and B_3 are already fixed independently of these values, it suffices if we can solve the system given by (13)–(16) for an arbitrary K ∈ ℝ⁺ such that M is sufficiently large. By Corollary 13, we may choose B_1, B_2, and B_3 such that, e.g., B_3 ≥ B_1 + B_2. Picking φ ≥ 1 in the corollary sufficiently large, we get that φB_1 ≥ K and that M := 2(B_1 + B_2)/(ϑK) is sufficiently large and satisfies the second and third inequality (where again we use that 2 − ϑ ∈ Ω(1)).
Finally, note that P ∈ O(d_F) and all factors occurring in this proof are constants depending on ϑ only, implying that φ and M are constants as well. The bound on the stabilization time thus readily follows from Corollary 13.
In the remainder of the section, we assume (i) that the beat generation algorithm has already stabilized, i.e., the guarantees stated in Corollary 13 hold, (ii) that the executed clock synchronization algorithm is Algorithm 2, and (iii) that Condition 1 holds. The analysis for Algorithm 3 is analogous, where θ = ϑ³ takes the role of ϑ and Condition 2 takes the role of Condition 1; this is formalized by the following corollary and Theorem 5 at the end of this section.

Corollary 14. Conditions 2 and 3 can be simultaneously satisfied such that lim_{r→∞} e(r) < ∞ if ᾱ < 1 (cf. Lemma 7).
Proof. Analogous to the proof of Lemma 13, but replacing the constraint T ≥ ϑ(τ_1 + τ_2 + e(1) + U) by T ≥ τ_1 + τ_2 + τ_3 + τ_4 + θ(e(1) + U) > θ(τ_1 + τ_2 + e(1) + U), and setting τ_3 := θ(e(1) + (1 − 1/θ)(τ_1 + τ_2)) and τ_4 := θ(e(1) + d + (1 − 1/θ)(τ_1 + τ_2)) in accordance with Condition 2. This results in the requirement that the coefficient of e(1) in the resulting inequality is smaller than 1, which in turn leads to the value for ᾱ.
Analysis
Our analysis starts with the first correct beat produced by FATAL, which is perceived at node v ∈ C at time b_v(1). Subsequent beats at v occur at times b_v(2), b_v(3), etc. We first establish that the first beat guarantees to "initialize" the synchronization algorithm such that it will run correctly from this point on (neglecting for the moment the possible intervention by further beats). We use this to define the "first" pulse times p_v(1), v ∈ C, as well; we enumerate consecutive pulses accordingly.

Lemma 14. Abbreviate b := min_{v∈C}{b_v(1)} and suppose that min_{v∈C}{b_v(2)} is sufficiently large. Then ‖p(1)‖ ≤ e(1), round 1 is executed correctly, and min_{v∈C}{b_v(2)} ≥ b + B_1 + B_2.

Proof. Whether or not a node v ∈ C is reset upon its first beat, the checks of Algorithm 4 ensure that it generates its next pulse with number 1 modulo M within the local time window [H_v(b_v(1)) + R⁻, H_v(b_v(1)) + R⁺]. By (9) and the bound on the skew of the beats from Corollary 13, this entails ‖p(1)‖ ≤ e(1), and by (10)–(12), round 1 is executed correctly. Recall that in the above reasoning, we assumed that min_{v∈C}{b_v(2)} is sufficiently large. Clearly, this is the case if round 1 ends at all nodes before this time. Accordingly, we bound for v ∈ C the time when round 1 ends; using Corollary 3 in the second to last step of the computation, this time is smaller than b + B_1. Because no node v ∈ C generates a pulse with number 1 modulo M again before round 1 ends, no such node triggers a NEXT signal during this time interval (cf. Algorithm 4). We have that this interval extends at most B_1 beyond b, implying by Corollary 13 that min_{v∈C}{b_v(2)} ≥ b + B_1 + B_2.
Lemma 14 serves as induction anchor for the argument showing that all rounds of the algorithm are executed correctly. However, due to possible interference of future beats, for the moment we can merely conclude that this is the case until the next beat; we obtain the following corollary.

Corollary 15. Denote by N the first time after b + B_1 at which some node in C triggers its NEXT signal. Then, for all r ∈ {1, . . . , M}, round r is executed correctly and ‖p(r)‖ ≤ e(r).

Proof. Lemma 14 shows that the first beat "initializes" the system such that ‖p(1)‖ ≤ e(1) and the first round is executed correctly. By Corollary 13, min_{v∈C}{b_v(2)} ≥ min{N, b + B_1 + B_2 + B_3}. Hence, after round 1, Algorithm 2 will be executed without interference from Algorithm 4 until (at least) time min_{v∈C}{p_v(M) + e(M)}. For r ∈ {2, . . . , M}, the claim thus follows as in Section 4.3.
Next, we leverage this insight to prove that the progress of the synchronization algorithm -which will operate correctly at least until the next beat -together with the constraints of Condition 3 ensures the following: the first time when node v ∈ C triggers its NEXT signal after time b + B 1 falls within the window of opportunity for triggering the next beat provided by FATAL.
Lemma 15. For v ∈ C, denote by N_v(1) the infimum of times t ≥ b + B_1 when v triggers its NEXT signal. We have that H_v(N_v(1)) = H_v(p_v(M)) + ϑe(M) and that
N_v(1) ∈ [min_{w∈C}{b_w(1)} + B_1 + B_2, min_{w∈C}{b_w(1)} + B_1 + B_2 + B_3]
for all v ∈ C.

Proof. At time b_v(1), v ∈ C sets i := 0 (unless it already holds that i = 0). Thus, v will not trigger the NEXT signal until it sent at least M pulses and waited for ϑe(M) local time, i.e., H_v(N_v(1)) = H_v(p_v(M)) + ϑe(M). Either some node triggers its NEXT signal before time b + B_1 + B_2 + B_3, or no node does. In the first case, Corollary 15 applies up to this time, so the pulses p_v(M), v ∈ C, are synchronized up to e(M), and hence all NEXT signals are triggered within a window of O(e(M)) duration. We conclude that all of them fall within the window of opportunity; the claim now follows from (15) and (16). With respect to the second case, observe that since no NEXT signal is triggered at any v ∈ C after time b + B_1 until time b + B_1 + B_2 + B_3, min_{v∈C}{b_v(2)} ≥ b + B_1 + B_2 + B_3 by Corollary 13. Thus, Algorithm 2 runs without interference up to this time. Using this, we can establish the same bounds as for the first case.
Corollary 16. We have that min_{v∈C}{b_v(2)} ≥ min_{v∈C}{N_v(1)} and max_{v∈C}{b_v(2)} ≤ max_{v∈C}{N_v(1)} + P.

Proof. Lemma 15 immediately implies that the second beat occurs in response to the NEXT signals, which themselves are aligned with pulse M. The claim now follows from the second and third statements of Corollary 13.
Having established this timing relation between b(2) and p(M), we can conclude that no correct node is reset due to the second beat.

Lemma 16. No node v ∈ C is reset in response to beat b_v(2): at time b_v(2), its counter reads i = 0, and its next pulse with number 1 modulo M satisfies
b_v(2) + R⁻/ϑ ≤ p_v(M + 1) ≤ b_v(2) + R⁺.

Proof. By Lemma 15 and Corollary 16, beat b_v(2) occurs after pulse M and before pulse M + 1 at each node, so i = 0 when the beat is perceived. Moreover, the timing relation between b(2) and p(M), together with (17) and (18), shows that the next pulse falls into the local time window checked by Algorithm 4, i.e., p_v(M + 1) ≥ b_v(2) + R⁻/ϑ and p_v(M + 1) ≤ b_v(2) + R⁺.
Repeating the above reasoning for all pairs of beats b(k), b(k + 1), k ∈ ℕ, it follows that no correct node is reset by any beat other than the first. Thus, the clock synchronization algorithm is indeed (re-)initialized by the first beat to run without any further meddling from Algorithm 4. This implies the same bounds on the steady state error as for the original synchronization algorithm.

Theorem 4. Suppose that Conditions 1 and 3 hold and that α := (6ϑ² + 5ϑ − 9)/(2(ϑ + 1)(2 − ϑ)) < 1. Then the compound algorithm given by FATAL, Algorithm 2, and Algorithm 4 self-stabilizes in O(n(d_F + d)) time with probability 1 − 2^(−Ω(n)) and has steady state error
E ≤ ((3ϑ − 1)U + (1 − 1/ϑ)T)/(1 − β),
where β = (2ϑ² + 5ϑ − 5)/(2(ϑ + 1)) and any T ≥ T_0 ∈ O(d_F + d) can be chosen.

Proof. Lemma 13 shows that Conditions 1 and 3 can be satisfied such that lim_{r→∞} e(r) = ((3ϑ − 1)U + (1 − 1/ϑ)T)/(1 − β) and T_0 ∈ O(d_F + d). Hence, we may apply the statements derived in this section. By Corollary 13, the beat generation mechanism will eventually stabilize. Afterwards, we can apply Lemma 16 to show that the second (correct) beat results in no calls to the reset function in Algorithm 4. In fact, this extends to any beat except for the first: letting beat k ∈ ℕ take the role of beat 1, our reasoning shows that beat k + 1 does not result in a reset at any node. Moreover, applying the same reasoning to Corollary 15, we conclude that all rounds r ∈ ℕ are executed correctly, and that ‖p(r)‖ ≤ e(r). The bound on E follows.
Observe that, in comparison to Theorem 1, the expression obtained for the steady state error replaces d by O(d_F + d), which is essentially the skew upon initialization by the first beat. In Algorithm 2, we circumvented any dependence on F by varying round lengths over time. For the self-stabilizing solution, this is not possible, since counting rounds locally is not guaranteed to ensure a consistent opinion across all nodes concerning the nominal length of the current round; we are restricted to counting rounds modulo M ∈ ℕ, so any long round length would reoccur regularly.
It remains to draw the analogous conclusions for using Algorithm 4 with Algorithm 3 as synchronization algorithm.

Theorem 5. Suppose that Conditions 2 and 3 hold and that ᾱ := β̄ + (4θ + 3)(θ − 1) < 1 (which holds for ϑ ≤ 1.004), where θ = ϑ³ and β̄ = (2θ² + 5θ − 5)/(2(θ + 1)). Then all parameters can be chosen such that the compound algorithm self-stabilizes in O(n) time and has steady state error
E ≤ ((4θ − 2)U + ν(T + τ_2)T)/(1 − α) + (3ϑε + 2ν(T + τ_2))T/((1 − α)(1 − β)),
where α := (4θ² + 5θ − 7)/(2(θ + 1)) < 1 and β := (2ϑ − 1)/2 < 1. Here, any value of T ≥ T_0 ∈ O(d_F + d) is possible.
Proof. As for Theorem 4, with Corollary 14 taking the place of Lemma 13 and noting that the convergence argument for the frequencies relies on rounds being executed correctly only (i.e., no assumptions on µ v (1), v ∈ C, are required).
We remark that despite the stringent requirements on ϑ for the recovery argument to work (i.e., ᾱ < 1), the actual bound on the precision involves α and β. If ϑ ≤ 1.004, we have α ≤ 0.512 and β ≤ 0.502. Concerning stabilization, we remark that it takes O(n) time with probability 1 − 2^(−Ω(n)), which is directly inherited from FATAL. The subsequent convergence to small skews is not affected by n, and will be much faster for realistic parameters, so we refrain from a more detailed statement.
Conclusions
The results derived in this paper demonstrate that the Lynch-Welch synchronization principle is a promising candidate for reliable clock generation, not only in software, but also in hardware. Apart from accurate bounds on the synchronization error depending on the quality of the clocks, we present a generic coupling scheme that makes it possible to add self-stabilization properties.
We believe these results to be of practical merit. Concretely, first results from a prototype Field-Programmable Gate Array (FPGA) implementation of Algorithm 2 show a skew of 182 ps [12] . Given the appealing simplicity of the presented algorithms and this excellent performance, we consider the approach a viable candidate for reliable clock generation in fault-tolerant low-level hardware and other areas.
