A Tight Lower Bound for Clock Synchronization in Odd-Ary M-Toroids by Frank, Reginald & Welch, Jennifer L.
arXiv:1807.05139v1 [cs.DC] 13 Jul 2018
A Tight Lower Bound for Clock Synchronization
in Odd-Ary M-Toroids
Reginald Frank∗
Texas A&M University, USA
reginaldfrank77@tamu.edu
Jennifer L. Welch†
Texas A&M University, USA
welch@cse.tamu.edu
July 16, 2018
Abstract
Synchronizing clocks in a distributed system in which processes communicate through messages
with uncertain delays is subject to inherent errors. Prior work has shown upper and lower bounds on the
best synchronization achievable in a variety of network topologies and assumptions about the uncertainty
on the message delays. However, until now there has not been a tight closed-form expression for the
optimal synchronization in k-ary m-cubes with wraparound, where k is odd. In this paper, we prove a
lower bound of $\frac{1}{4}um\left(k - \frac{1}{k}\right)$, where k is the (odd) number of processes in each of the m dimensions,
and u is the uncertainty in delay on every link. Our lower bound matches the previously known upper
bound.
1 Introduction
Synchronizing clocks in a distributed system in which processes communicate through messages with
uncertain delays is subject to inherent errors. A body of work has sought bounds on how closely the clocks
can be synchronized when there is no drift in the hardware clocks and there are no failures. Prior work has
shown upper and lower bounds on the best synchronization achievable in a variety of network topologies
and assumptions about the uncertainty on the message delays.
Lundelius and Lynch [4] showed that, in an n-process clique with the same uncertainty u on every
link, the best synchronization possible is $u\left(1 - \frac{1}{n}\right)$. Subsequently, Halpern et al. [3] considered arbitrary
topologies in which each link may have a different uncertainty, and they showed that the optimal clock
synchronization is the solution of an optimization problem; however, no general closed-form expression
was given. Biaz and Welch [2] gave a collection of closed-form upper and lower bounds on the optimal
clock synchronization for k-ary m-cubes (m-dimensional hypercubes with k processes in every dimension),
both with and without wraparound, in which every link has the same uncertainty, u. When there is no
wraparound, the tight bound is $\frac{1}{2}um(k - 1)$. When there is wraparound and k is even, the tight bound
is $\frac{1}{4}umk$. However, when there is wraparound and k is odd, there is a gap between the upper bound of
$\frac{1}{4}um\left(k - \frac{1}{k}\right)$ and the lower bound of $\frac{1}{4}um(k - 1)$.
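For a concrete sense of the size of this gap, the following Python sketch (not from the paper; the helper name `odd_torus_bounds` is hypothetical) evaluates the two closed forms quoted above exactly with `fractions.Fraction`. Their difference is $\frac{1}{4}um\left(1 - \frac{1}{k}\right)$, which is always less than $\frac{1}{4}um$.

```python
from fractions import Fraction


def odd_torus_bounds(k: int, m: int, u: Fraction = Fraction(1)) -> tuple[Fraction, Fraction]:
    """Return (previous lower bound, upper bound) for a k-ary m-cube with
    wraparound and odd k, using the closed forms quoted in the text:
    (1/4) u m (k - 1) and (1/4) u m (k - 1/k)."""
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be odd and at least 3")
    lower = Fraction(1, 4) * u * m * (k - 1)
    upper = Fraction(1, 4) * u * m * (k - Fraction(1, k))
    return lower, upper


lower, upper = odd_torus_bounds(5, 1)
print(lower, upper, upper - lower)  # prints: 1 6/5 1/5
```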
In this paper, we consider k-ary m-cubes with wraparound (“m-toroids”) and odd k. We show a lower
bound of $\frac{1}{4}um\left(k - \frac{1}{k}\right)$, which matches the previously known upper bound. We use the same shifting
technique from previous lower bounds for clock synchronization (e.g., [4, 3, 2]). The key insight in our
improved lower bound is to exploit the fact that the graph is a collection of rings in each dimension and to
use multiple shifted executions instead of just one.
2 Preliminaries
We first present our model and problem statement (following [4, 1, 2]). We consider a graph of $k^m$ processes,
where $k \ge 3$ is odd and $m \ge 1$, in which each process id is a tuple $\langle p_0, p_1, \dots, p_{m-1}\rangle$ where each
$p_i \in \{0, 1, \dots, k-1\}$. There are links in both directions between any two processes $\vec p$ and $\vec q$ if and only if their
ids differ in exactly one component, say the i-th, such that $p_i = q_i + 1$ or $q_i = p_i + 1$ (addition on process indices is modulo
k throughout). Each process $\vec p$ has a hardware clock modeled as a function $HC_{\vec p}$ from reals (real time) to
reals (clock time). We assume there is no drift, so $HC_{\vec p}(t) = t + c_{\vec p}$ for some constant $c_{\vec p}$. Each process is
modeled as a state machine whose transition function takes as input the current state, the current value of the
hardware clock, and the current event (receipt of a message or some internal occurrence), and produces a new
state and a message to send over each incident link.
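The neighbor relation just described can be sketched in Python as follows; the helper names `process_ids` and `neighbors` are hypothetical, and ids are represented as plain tuples.

```python
from itertools import product


def process_ids(k: int, m: int) -> list[tuple[int, ...]]:
    """All k**m process ids <p_0, ..., p_{m-1}> with each p_i in {0, ..., k-1}."""
    return list(product(range(k), repeat=m))


def neighbors(p: tuple[int, ...], k: int) -> list[tuple[int, ...]]:
    """Ids that differ from p in exactly one component i, by +1 or -1 modulo k."""
    result = []
    for i in range(len(p)):
        for step in (1, -1):
            q = list(p)
            q[i] = (q[i] + step) % k
            result.append(tuple(q))
    return result


# In a 5-ary 2-toroid every process has 2m = 4 distinct neighbors.
print(sorted(neighbors((0, 0), 5)))  # [(0, 1), (0, 4), (1, 0), (4, 0)]
```

Because k ≥ 3, the +1 and −1 neighbors in a dimension are distinct, so each process has exactly 2m neighbors.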
∗Supported in part by the Distributed Research Experiences for Undergraduates (DREU) program, a joint project of the CRA
Committee on the Status of Women in Computing Research (CRA-W) and the Coalition to Diversify Computing (CDC), which is
funded in part by the NSF Broadening Participation in Computing program (NSF CNS-0540631).
†Supported in part by NSF grant 1526725.
A history of process $\vec p$ is a sequence of alternating states and pairs of the form (event, hardware clock
value), beginning with $\vec p$’s initial state. Each state must follow correctly from the previous one according
to $\vec p$’s transition function, and the hardware clock values must increase. A timed history of $\vec p$ is a history
together with an assignment of a real time t to each pair (e, T) in the history such that $HC_{\vec p}(t) = T$. An
execution is a set of $k^m$ timed histories, one per process, with a bijection for each link between the set
of messages sent over the link and the set of messages received over the link. The delay of a message is
the difference between the real time when it is received and the real time when it is sent. An execution is
admissible if every message has delay in [0, u], where u is a fixed value called the uniform uncertainty.
We assume each process $\vec p$ has a local variable $adj_{\vec p}$ as part of its state, and we define its adjusted clock
$AC_{\vec p}(t)$ to be equal to $HC_{\vec p}(t) + adj_{\vec p}(t)$. An execution has terminated once all processes have stopped
changing their adj variables. We say the algorithm achieves ε-synchronized clocks if every admissible
execution eventually terminates with $|AC_{\vec p}(t) - AC_{\vec q}(t)| \le \epsilon$ for all processes $\vec p$ and $\vec q$ and all times t after
termination.
“Shifting” an execution changes the real times at which events occur [4]. Let x be an m-dimensional
matrix of real numbers with k elements in each dimension, which we call a shift matrix; elements of x are
indexed by process ids. Define shift(α, x) to be the result of adding $x_{\vec p}$ to the real time associated with each
event in $\vec p$’s timed history in α. Shifting changes the hardware clocks and message delays as follows [4, 1]:
Lemma 1. Let α be an execution with hardware clocks $HC_{\vec p}$ and let x be a shift matrix. Then shift(α, x) is
a (not necessarily admissible) execution in which
(a) the hardware clock of each $\vec p$, denoted $HC'_{\vec p}(t)$, equals $HC_{\vec p}(t) - x_{\vec p}$, and
(b) every message from $\vec p$ to $\vec q$ has delay $\delta - x_{\vec p} + x_{\vec q}$, where δ is the message’s delay in α.
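Lemma 1 can be checked on a single event and a single message. The sketch below (hypothetical helper names; exact arithmetic via `fractions.Fraction`) moves events by their shift amounts and recomputes the clock offset and the message delay.

```python
from fractions import Fraction


def shift_event(t: Fraction, x_p: Fraction) -> Fraction:
    """Shifting moves each event of process p from real time t to t + x_p."""
    return t + x_p


def shifted_clock_offset(c_p: Fraction, x_p: Fraction) -> Fraction:
    """A drift-free clock HC_p(t) = t + c_p reads T at an event at time t.
    After the shift the event occurs at t + x_p but still reads T, so the
    new clock is HC'_p(t) = t + (c_p - x_p) = HC_p(t) - x_p (Lemma 1(a))."""
    return c_p - x_p


def shifted_delay(send: Fraction, recv: Fraction, x_p: Fraction, x_q: Fraction) -> Fraction:
    """Delay of a message from p to q after shifting: delta - x_p + x_q (Lemma 1(b))."""
    return shift_event(recv, x_q) - shift_event(send, x_p)


# A message sent at time 0 and received at time 1 (delay 1), with x_p = 1 and
# x_q = 0, has delay 1 - 1 + 0 = 0 after the shift.
print(shifted_delay(Fraction(0), Fraction(1), Fraction(1), Fraction(0)))  # 0
```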
3 Lower Bound
Theorem 1. For any algorithm that achieves ε-synchronized clocks in a k-ary m-toroid with uniform
uncertainty u, where k is odd, it must be that $\epsilon \ge \frac{1}{4}um\left(k - \frac{1}{k}\right)$.
Proof. Let A be any algorithm that achieves ε-synchronized clocks in a k-ary m-toroid with uniform
uncertainty u, where k = 2r + 1 for some integer r ≥ 1. Let α be the admissible execution of A in which
$HC_{\vec p}(t) = t$ for each process $\vec p$, every message from $\vec p$ to $\vec q$, where $\vec q$ is $\vec p$’s neighbor in the h-th dimension
such that $q_h = p_h + 1$, has the same fixed delay $\delta_{\vec p,\vec q}$, which is 0 if $0 \le p_h < r$ and is u if $r \le p_h < k$, and
every message from $\vec q$ to $\vec p$ has the same fixed delay $\delta_{\vec q,\vec p} = u - \delta_{\vec p,\vec q}$.
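For 0 ≤ i < k, define $\alpha^i = \mathrm{shift}(\alpha, x^i)$, where the $\vec p$-th element of the shift matrix $x^i$, denoted $x^i_{\vec p}$, is
defined as $\sum_{j=0}^{m-1} W^i_{p_j}$, where W is defined as follows.

For $0 \le i < r$:
$$W^i_{p_j} = \begin{cases} 0 & \text{if } 0 \le p_j \le i \\ (p_j - i)u & \text{if } i < p_j \le r \\ (r - i)u & \text{if } r < p_j \le r + i + 1 \\ (2r - p_j + 1)u & \text{if } r + i + 1 < p_j \le 2r \end{cases}$$

For $r \le i < k$:
$$W^i_{p_j} = \begin{cases} p_j u & \text{if } 0 \le p_j \le i - r \\ (i - r)u & \text{if } i - r < p_j \le r \\ (i - p_j)u & \text{if } r < p_j \le i \\ 0 & \text{if } i < p_j \le 2r \end{cases}$$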
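To make the definition of W concrete, here is a sketch that evaluates $W^i_{p_j}$ in units of u (so u = 1) and the shift matrix $x^i$ for a given odd k; the function names are hypothetical. For k = 5 it reproduces the shift vectors for $\alpha^1$ and $\alpha^4$ given in the example that follows.

```python
from itertools import product


def W(i: int, p: int, r: int) -> int:
    """Shift amount W^i_p from the definition of W, in units of u, for k = 2r + 1."""
    if i < r:
        if p <= i:
            return 0
        if p <= r:
            return p - i
        if p <= r + i + 1:
            return r - i
        return 2 * r - p + 1
    # Case r <= i < k.
    if p <= i - r:
        return p
    if p <= r:
        return i - r
    if p <= i:
        return i - p
    return 0


def shift_matrix(i: int, k: int, m: int) -> dict[tuple[int, ...], int]:
    """x^i_p = sum over the m coordinates p_j of W^i_{p_j}, in units of u."""
    r = (k - 1) // 2
    return {p: sum(W(i, pj, r) for pj in p) for p in product(range(k), repeat=m)}


print([W(1, p, 2) for p in range(5)])  # [0, 0, 1, 1, 1]
print([W(4, p, 2) for p in range(5)])  # [0, 1, 2, 1, 0]
```

Note that for i = r every entry of W is 0, so $\alpha^r = \alpha$.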
The idea behind the shift amounts in W is to shift two processes that are farthest apart in the graph as
far apart in real time as possible, thus achieving a large skew between their adjusted clocks, while
maintaining valid message delays between all neighbors. By considering multiple shifted executions,
we can cancel out the terms involving adjusted clocks, leaving only terms that involve the system
parameters ε and u and the graph parameters k and m.
As an example, consider the case when k = 5 and m = 1, that is, a 5-element ring (cf. Figure 1). We
will denote the process with id $\langle i\rangle$ by $p_i$.
Figure 1: Graph of a 5-ary 1-toroid (processes $p_0, \dots, p_4$ arranged in a ring)
Figure 2: Delay pattern in α for the 5-ary 1-toroid ($p_0$ drawn at both ends; delays 0 and u)
Figure 2 depicts the pattern of message delays in α for the 5-element ring, with $p_0$ occurring twice for
convenience in representing the wraparound. The interpretation is that every message, if any, sent from $p_0$
to $p_1$ has delay 0, every message sent from $p_1$ to $p_0$ has delay u, etc. We make no assumption about when
or if such messages are sent, as that depends on the algorithm.
Figure 3: Shifted delay pattern for $\alpha^1$
Figure 4: Shifted delay pattern for $\alpha^4$
Now we consider two of the five shifts for this special case. The shift matrix defined by W for $\alpha^1$ is
[0, 0, u, u, u] and that for $\alpha^4$ is [0, u, 2u, u, 0]. Figures 3 and 4 depict the pattern of message delays in $\alpha^1$ and
$\alpha^4$, respectively, reflecting the changes indicated by Lemma 1(b). Visual inspection shows that the delays
are still in the valid range, and thus the shifted executions are admissible.
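The visual check can also be done by computing the delays directly. In this sketch (u = 1; `shifted_ring_delays` is a hypothetical helper), the forward delays of α on the ring are taken from the definition of α (0 when $p_h < r$, u otherwise, with r = 2), and Lemma 1(b) is applied to every link in both directions.

```python
def shifted_ring_delays(x, alpha_fwd, u=1):
    """Return (forward, backward) delays on a ring after shifting by x.

    alpha_fwd[p] is the delay in alpha from process p to process p + 1 (mod k);
    the reverse link has delay u - alpha_fwd[p]. Lemma 1(b) adds
    -x[sender] + x[receiver] to each delay.
    """
    k = len(x)
    fwd, bwd = [], []
    for p in range(k):
        q = (p + 1) % k
        fwd.append(alpha_fwd[p] - x[p] + x[q])
        bwd.append((u - alpha_fwd[p]) - x[q] + x[p])
    return fwd, bwd


k, r = 5, 2
alpha_fwd = [0 if p < r else 1 for p in range(k)]  # [0, 0, 1, 1, 1]
for name, x in [("alpha^1", [0, 0, 1, 1, 1]), ("alpha^4", [0, 1, 2, 1, 0])]:
    fwd, bwd = shifted_ring_delays(x, alpha_fwd)
    assert all(0 <= d <= 1 for d in fwd + bwd), name  # admissible: every delay in [0, u]
    print(name, fwd, bwd)
```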
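Admissibility and Lemma 1(a) imply that $AC'_1 - AC'_4 = AC_1 - AC_4 + u \le \epsilon$ in $\alpha^1$ and $AC'_4 - AC'_2 = AC_4 - AC_2 + 2u \le \epsilon$ in $\alpha^4$,
where $AC_j$ and $AC'_j$ denote the adjusted clock of $p_j$ after termination in α and in the shifted execution, respectively.
Similarly, one can check that $\alpha^0$, $\alpha^2$, and $\alpha^3$ are admissible and obtain analogous
inequalities. Summing the five inequalities, the adjusted-clock terms cancel and we get $6u \le 5\epsilon$, or
$\epsilon \ge 6u/5 = \frac{1}{4}u\left(5 - \frac{1}{5}\right)$, which agrees with Theorem 1.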
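The bookkeeping in this sum can be checked mechanically. The sketch below (u = 1, m = 1; `ring_inequalities` is a hypothetical helper) records each inequality as a triple (a, b, c) meaning $AC_a - AC_b + cu \le \epsilon$, pairing $p_i$ with $p_{i+r+1}$ when i < r and with $p_{i-r}$ otherwise, following the general proof below; for k = 5 this includes the pairs $(p_1, p_4)$ and $(p_4, p_2)$ used above.

```python
from collections import Counter


def ring_inequalities(k):
    """(a, b, c) meaning AC_a - AC_b + c*u <= epsilon, one per shifted execution alpha^i (m = 1)."""
    r = (k - 1) // 2
    rows = []
    for i in range(k):
        if i < r:
            rows.append((i, i + r + 1, r - i))  # pair used in alpha^i for i < r
        else:
            rows.append((i, i - r, i - r))      # pair used in alpha^i for i >= r
    return rows


rows = ring_inequalities(5)
# Every AC_j occurs once with a plus sign and once with a minus sign,
# so summing the k inequalities leaves (sum of c) * u <= k * epsilon.
assert Counter(a for a, _, _ in rows) == Counter(b for _, b, _ in rows)
print(rows, sum(c for _, _, c in rows))  # [(0, 3, 2), (1, 4, 1), (2, 0, 0), (3, 1, 1), (4, 2, 2)] 6
```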
We now show that all shifted executions are admissible.
Lemma 2. For all i, 0 ≤ i < k, $\alpha^i$ is admissible.
Proof. Fix i with 0 ≤ i < k. We must show that all message delays are in [0, u]. Let $\vec p$ and $\vec q$ be two
neighbors that differ in the h-th dimension such that $q_h = p_h + 1$ and $q_j = p_j$ for all $j \ne h$. Denote the
(fixed) delay of messages from $\vec p$ to $\vec q$ in $\alpha^i$ by $\delta^i_{\vec p,\vec q}$. By Lemma 1(b), $\delta^i_{\vec p,\vec q} = \delta_{\vec p,\vec q} + \Delta^i_{\vec p,\vec q}$, where $\Delta^i_{\vec p,\vec q}$ denotes
$-x^i_{\vec p} + x^i_{\vec q}$. Observe that $\Delta^i_{\vec q,\vec p} = -\Delta^i_{\vec p,\vec q}$.
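$$\begin{aligned}
\Delta^i_{\vec p,\vec q} &= -\sum_{j=0}^{m-1} W^i_{p_j} + \sum_{j=0}^{m-1} W^i_{q_j} && \text{by definition of the shift matrix } x^i \text{ for } \alpha^i\\
&= -W^i_{p_h} + W^i_{q_h} && \text{since } \vec p \text{ and } \vec q \text{ differ only in the } h\text{-th dimension}\\
&= -W^i_{p_h} + W^i_{p_h+1} && \text{by definition of } \vec q
\end{aligned}$$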
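Referring to the definition of W, we get the following values for $\Delta^i_{\vec p,\vec q}$ (for $p_h = 2r$, the neighbor has $q_h = 0$ by wraparound).

For $0 \le i < r$:
$$\Delta^i_{\vec p,\vec q} = \begin{cases} 0 & \text{if } 0 \le p_h < i \\ u & \text{if } i \le p_h < r \\ 0 & \text{if } r \le p_h < r + i + 1 \\ -u & \text{if } r + i + 1 \le p_h \le 2r \end{cases}$$

For $r \le i < k$:
$$\Delta^i_{\vec p,\vec q} = \begin{cases} u & \text{if } 0 \le p_h < i - r \\ 0 & \text{if } i - r \le p_h < r \\ -u & \text{if } r \le p_h < i \\ 0 & \text{if } i \le p_h \le 2r \end{cases}$$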
To gain an intuition for why $\alpha^i$ is admissible, consider how the delays chosen for α relate to $\Delta^i_{\vec p,\vec q}$.
Recall that $\vec p$ and $\vec q$ are neighbors in dimension h. If $p_h < r$, then $\delta_{\vec p,\vec q}$, the delay from $\vec p$ to $\vec q$ in α,
is 0, which leaves room for $\Delta^i_{\vec p,\vec q}$ to be as large as u; otherwise it is u, which leaves room for $\Delta^i_{\vec p,\vec q}$ to be as
small as −u. To keep the shifted message delays in the valid range, $\Delta^i_{\vec p,\vec q}$ must lie between −u and u, and the
table shows that it equals u only when $p_h < r$ and equals −u only when $p_h \ge r$. Below we formalize these ideas.
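Since $\delta_{\vec p,\vec q}$ is in [0, u], so is $\delta^i_{\vec p,\vec q}$ for all table entries where $\Delta^i_{\vec p,\vec q} = 0$. For all table entries where $\Delta^i_{\vec p,\vec q} = u$,
we have $p_h < r$, so the definition of α states that $\delta_{\vec p,\vec q} = 0$, and thus $\delta^i_{\vec p,\vec q} = 0 + u = u$. For all table entries where
$\Delta^i_{\vec p,\vec q} = -u$, we have $p_h \ge r$, so the definition of α states that $\delta_{\vec p,\vec q} = u$, and thus $\delta^i_{\vec p,\vec q} = u + (-u) = 0$.
In all cases $\delta^i_{\vec p,\vec q}$ is in [0, u].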
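Since $\delta_{\vec q,\vec p}$ is defined in α to be $u - \delta_{\vec p,\vec q}$ and $\Delta^i_{\vec q,\vec p} = -\Delta^i_{\vec p,\vec q}$, it follows that $\delta^i_{\vec q,\vec p} = u - \delta^i_{\vec p,\vec q}$. Since we just
showed that $\delta^i_{\vec p,\vec q}$ is in [0, u], the same is true of $\delta^i_{\vec q,\vec p}$. Thus $\alpha^i$ is admissible.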
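As a mechanical cross-check of Lemma 2 on small instances, the following sketch enumerates every link of a k-ary m-toroid, applies Lemma 1(b) to the delays of α, and confirms that all shifted delays lie in [0, u] for every i. It also compares $W^i_{p_h+1} - W^i_{p_h}$ with the Δ table, assigning $p_h = r + i + 1$ to the −u case for $0 \le i < r$ (the value the definition of W yields there). Units are u = 1 and the function names are hypothetical; this is a finite check, not a substitute for the proof.

```python
from itertools import product


def W(i: int, p: int, r: int) -> int:
    """Shift amount W^i_p from the definition of W, in units of u, for k = 2r + 1."""
    if i < r:
        if p <= i:
            return 0
        if p <= r:
            return p - i
        return r - i if p <= r + i + 1 else 2 * r - p + 1
    if p <= i - r:
        return p
    if p <= r:
        return i - r
    return i - p if p <= i else 0


def delta_from_table(i: int, p_h: int, r: int) -> int:
    """Delta^i_{p,q} for q_h = p_h + 1, read off the Delta table (units of u)."""
    if i < r:
        if p_h < i:
            return 0
        if p_h < r:
            return 1
        return 0 if p_h < r + i + 1 else -1
    if p_h < i - r:
        return 1
    if p_h < r:
        return 0
    return -1 if p_h < i else 0


def lemma2_holds(k: int, m: int) -> bool:
    """Check the Delta table and that every delay in every alpha^i lies in [0, u] (u = 1)."""
    r = (k - 1) // 2
    ids = list(product(range(k), repeat=m))
    for i in range(k):
        x = {p: sum(W(i, pj, r) for pj in p) for p in ids}  # shift matrix x^i
        for p in ids:
            for h in range(m):
                q = p[:h] + ((p[h] + 1) % k,) + p[h + 1:]     # neighbor with q_h = p_h + 1
                delta = -x[p] + x[q]                          # Delta^i_{p,q}
                if delta != delta_from_table(i, p[h], r):
                    return False
                d_pq = 0 if p[h] < r else 1                   # delay p -> q in alpha
                d_qp = 1 - d_pq                               # delay q -> p in alpha
                if not (0 <= d_pq + delta <= 1 and 0 <= d_qp - delta <= 1):
                    return False
    return True


print(all(lemma2_holds(k, m) for k in (3, 5, 7, 9) for m in (1, 2, 3)))  # True
```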
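Fix any i with 0 ≤ i < r. We focus on two processes that are maximally far apart from each other,
$\langle i, \dots, i\rangle$ and $\langle i+r+1, \dots, i+r+1\rangle$. Since $\alpha^i$ is admissible by Lemma 2, A must ensure that
$AC^i_{\langle i,\dots,i\rangle} - AC^i_{\langle i+r+1,\dots,i+r+1\rangle} \le \epsilon$, where $AC^i_{\vec p}$ denotes the adjusted clock of process $\vec p$ after termination in $\alpha^i$.
Since $W^i_i = 0$ and $W^i_{i+r+1} = (r - i)u$, the definition of $\alpha^i$ and Lemma 1(a) give $AC^i_{\langle i,\dots,i\rangle} = AC_{\langle i,\dots,i\rangle}$ and
$AC^i_{\langle i+r+1,\dots,i+r+1\rangle} = AC_{\langle i+r+1,\dots,i+r+1\rangle} - m(r - i)u$. Thus by substituting we get:
$$AC_{\langle i,\dots,i\rangle} - AC_{\langle i+r+1,\dots,i+r+1\rangle} \le -m(r - i)u + \epsilon \quad \text{for } 0 \le i < r \tag{1}$$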
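Similarly, using the processes $\langle i,\dots,i\rangle$ and $\langle i-r,\dots,i-r\rangle$, for which $W^i_i = 0$ and $W^i_{i-r} = (i - r)u$, we can show:
$$AC_{\langle i,\dots,i\rangle} - AC_{\langle i-r,\dots,i-r\rangle} \le -m(i - r)u + \epsilon \quad \text{for } r \le i < k \tag{2}$$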
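The offsets used in (1) and (2) can be checked numerically. For a diagonal process, $x^i_{\langle c,\dots,c\rangle} = mW^i_c$, so the sketch below (units of u; hypothetical helper names) confirms that $x^i$ is 0 at $\langle i,\dots,i\rangle$ and equals $m\lvert r - i\rvert u$ at the partner process, and that $i + r + 1 \equiv i - r \pmod{k}$.

```python
def W(i: int, p: int, r: int) -> int:
    """Shift amount W^i_p from the definition of W, in units of u, for k = 2r + 1."""
    if i < r:
        if p <= i:
            return 0
        if p <= r:
            return p - i
        return r - i if p <= r + i + 1 else 2 * r - p + 1
    if p <= i - r:
        return p
    if p <= r:
        return i - r
    return i - p if p <= i else 0


def diagonal_offsets_ok(k: int, m: int) -> bool:
    """Check x^i at <i,...,i> is 0 and at the partner process is m*|r - i| (units of u)."""
    r = (k - 1) // 2
    for i in range(k):
        partner = i + r + 1 if i < r else i - r      # as in (1) and (2)
        if (i + r + 1) % k != (i - r) % k:           # same process modulo k
            return False
        if m * W(i, i, r) != 0:
            return False
        if m * W(i, partner, r) != m * abs(r - i):
            return False
    return True


print(all(diagonal_offsets_ok(k, m) for k in range(3, 16, 2) for m in range(1, 5)))  # True
```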
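Since k = 2r + 1, we have $i + r + 1 \equiv i - r \pmod{k}$, so (1) can be written with $\langle i-r,\dots,i-r\rangle$ in place of
$\langle i+r+1,\dots,i+r+1\rangle$. Adding together the r inequalities from (1) and the k − r inequalities from (2) gives
$$\sum_{i=0}^{r-1} AC_{\langle i,\dots,i\rangle} - \sum_{i=0}^{r-1} AC_{\langle i-r,\dots,i-r\rangle} + \sum_{i=r}^{k-1} AC_{\langle i,\dots,i\rangle} - \sum_{i=r}^{k-1} AC_{\langle i-r,\dots,i-r\rangle} \le -um\left[\sum_{i=0}^{r-1}(r - i) + \sum_{i=r}^{k-1}(i - r)\right] + k\epsilon \tag{3}$$
The left-hand side of (3) is 0: as i ranges over 0, ..., k − 1, both i and i − r (mod k) take every value exactly once,
so each diagonal adjusted clock appears once with each sign. Each of the two sums in square brackets equals
$1 + 2 + \dots + r = r(r+1)/2$, so the bracketed expression equals $r(r+1) = (k^2 - 1)/4$. Thus $0 \le -um(k^2 - 1)/4 + k\epsilon$, and hence
$$\epsilon \ge \frac{1}{4}um\left(k - \frac{1}{k}\right).$$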
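The closing arithmetic can be verified exactly for many k and m. This is a sketch using `fractions.Fraction`; `bracket_sum` and `derived_bound` are hypothetical names, and u defaults to 1.

```python
from fractions import Fraction


def bracket_sum(k: int) -> int:
    """The bracketed sum in (3): sum_{i<r} (r - i) + sum_{r<=i<k} (i - r), with k = 2r + 1."""
    r = (k - 1) // 2
    return sum(r - i for i in range(r)) + sum(i - r for i in range(r, k))


def derived_bound(k: int, m: int, u: Fraction = Fraction(1)) -> Fraction:
    """Smallest epsilon allowed by 0 <= -u*m*bracket + k*epsilon."""
    return u * m * Fraction(bracket_sum(k), k)


for k in range(3, 21, 2):
    assert bracket_sum(k) == (k * k - 1) // 4
    for m in range(1, 5):
        assert derived_bound(k, m) == Fraction(1, 4) * m * (k - Fraction(1, k))

print(bracket_sum(5), derived_bound(5, 1))  # 6 6/5
```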
4 Conclusion
We have closed the gap between the best previously-known closed-form upper and lower bounds on the
optimal clock synchronization for k-ary m-toroids when k is odd and the uncertainty on each link is the
same. By applying a more involved set of shifts than those in the prior work [2] and exploiting the specific
network topology, we achieved a lower bound that equals the upper bound due to the algorithm in [2].
References
[1] H. Attiya and J. Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics,
Second Edition. John Wiley & Sons, Hoboken, NJ, 2004.
[2] Saad Biaz and Jennifer L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumptions. Inf. Process. Lett., 80(3):151–157, 2001.
[3] Joseph Y. Halpern, Nimrod Megiddo, and Ashfaq A. Munshi. Optimal precision in the presence of
uncertainty. J. Complexity, 1(2):170–196, 1985.
[4] Jennifer Lundelius and Nancy Lynch. An upper and lower bound for clock synchronization. Information
and Control, 62(2/3):190–204, 1984.