Experimental validation of clock synchronization algorithms by Palumbo, Daniel L. & Graham, R. Lynn
NASA Technical Paper 3209
July 1992
Experimental Validation
of Clock Synchronization
Algorithms
Daniel L. Palumbo
and R. Lynn Graham
https://ntrs.nasa.gov/search.jsp?R=19920018346 2020-03-17T10:44:11+00:00Z

ERRATA
NASA Technical Paper 3209
Experimental Validation of Clock
Synchronization Algorithms
Daniel L. Palumbo and R. Lynn Graham
July 1992
Page 5, figure 5: The figure should appear as follows:
[Figure 5, corrected. Labels: $t(T);\ T \in R$ and $t^{+}(T);\ T \in R^{+}$; clock-time marks $T_c - \chi$ and $T_c$.]
Issued July 1992

National Aeronautics and
Space Administration
Office of Management
Scientific and Technical
Information Program
Daniel L. Palumbo
Langley Research Center
Hampton, Virginia
R. Lynn Graham
PRC Kentron, Inc.
Hampton, Virginia

Abstract
The objective of this work is to validate mathematically derived clock
synchronization theories and their associated algorithms through experiment.
Two theories are considered, the Interactive Convergence Clock Synchronization
Algorithm and the Midpoint Algorithm. Special clock circuitry was designed and
built so that several operating conditions and failure modes (including
malicious failures) could be tested. Both theories are shown to predict
conservative upper bounds (i.e., measured values of clock skew were always
less than the theory prediction). Insight gained during experimentation led to
alternative derivations of the theories. These new theories accurately predict
the behavior of the clock system. It is found that a 100-percent penalty is
paid to tolerate worst-case failures. It is also shown that under optimal
conditions (with minimum error and no failures) the clock skew can be as much
as three clock ticks. Clock skew grows to six clock ticks when failures are
present. Finally, it is concluded that one cannot rely solely on test
procedures or theoretical analysis to predict worst-case conditions.
Introduction
Many theories of clock synchronization have been
proposed and subjected to the rigors of mathematical
proof of correctness (see refs. 1 and 2). Few of these
theories are validated by experiment. One of the dif-
ficulties in validating clock synchronization theory is
that the theory often predicts the behavior of the syn-
chronization algorithm under failure conditions that
are hard to replicate in the lab (e.g., the presence of
a "malicious liar," ref. 3). The objective of this work
is to select a theory for validation, build a synchro-
nization subsystem that is based on this theory, and
subject this subsystem to a series of tests designed
to validate the theory.
The Interactive Convergence Clock Synchroniza-
tion Algorithm (ICCSA) of Lamport and Melliar-
Smith (ref. 4) was chosen as a test subject because
of its use on the SIFT (Software Implemented Fault-
Tolerance) computer (ref. 5) and the fact that the al-
gorithm and the accompanying bounding theory had
been recently subjected to the rigors of a mechani-
cal proof (ref. 6). During the process of testing, it
was found that the theoretical bound on the clock
skew was larger than the observed maximum clock
skew. Although the theory only guarantees an upper
bound, this discrepancy led to inquiries into why the
theory was not more accurate. In the course of this
investigation, an alternative method for the deriva-
tion of the expression for the clock skew bound was
developed. This new expression accurately predicts
the observed clock skew for the Interactive Conver-
gence Clock Synchronization Algorithm.
Lundelius has derived a clock skew bound (ref. 7)
for the Midpoint Algorithm proposed by Dolev
(ref. 8). The Dolev algorithm was programmed into
the clock synchronization subsystem and tested. As
with the ICCSA theory, the predicted bound was
found to be greater than the observed clock skew
(although only in extreme cases). With the insight
gained from the previous derivation and applying a
fresh approach to the worst-case analysis of the Mid-
point Algorithm, a new expression is derived that
accurately predicts the observed clock skew.
In the following sections, expressions for the clock
skew bound for both the ICCSA and the Midpoint
Algorithm will be derived. A test plan will be
introduced, and the design of the clock subsystem
described. Results of the testing are presented and
case studies are done. Finally, conclusions concerning
this work are drawn.
Symbols
EHDM               extended hierarchical design methodology
$f_c$              clock counter frequency
$f_r$              clock reference frequency
HDM                hierarchical design methodology
ICCSA              Interactive Convergence Clock Synchronization Algorithm
$m$                number of faulty clocks in a synchronizing set
$n$                number of clocks in a synchronizing set
$p, q, r, s$       processor designations
$R$                minimum length of synchronization period
$S$                minimum length of synchronization process
$T$                clock time
$T_c$              time of clock correction
$T_{qp}$           clock reading of processor p upon receipt of synchronization signal from processor q
$T_s$              time of synchronization signal
$t$                real time, $(1 - \rho)T + \epsilon + t_0$
$t^i$              uncorrected clock function (see fig. 4)
$t_0$              real-time offset at $T = 0$
$v$                drift rate setting in clock subsystem peripheral
$\Delta$           limit of perceived skew allowed in ICCSA
$\Delta_{qp}$      perceived skew of processor p with respect to processor q
$\delta$           maximum skew between good clocks in a synchronizing set
$\delta_{qp}$      real-time skew between clocks p and q
$\delta_{qp}(T)$   real-time skew between processors p and q when clock for processor p equals T
$\delta_0$         maximum initial skew
$\epsilon$         maximum clock read error
$\epsilon_0$       minimum read error, $1/f_c$
$\rho$             clock drift rate with respect to real time
$\rho_M$           maximum drift rate expected between any two clocks
$\rho_p$           drift rate of clock p
$\rho_{qp}$        drift rate between clocks p and q
$\Sigma$           maximum clock correction
$\chi_p$           clock correction calculated by processor p
$V$                perceived skew value derived from faulty clock reading
Clock Fundamentals
The purpose of synchronizing clocks is to pro-
vide a global time base throughout a distributed
system. Once this time base exists, transactions
between members of the distributed system can be
controlled based on time. For example, the management
of redundant data in a real-time fault-tolerant
computer is simplified if the processors are synchro-
nized (ref. 9). In the following discussions, the term
clock refers to a device that provides a time base for
a processor. A processor thus inherits time-related
characteristics from its clock. For this reason, we
sometimes refer to a processor as drifting with re-
spect to other processors when, in fact, the drift is
actually a property of the clock.
A common convention has been that real time is denoted by a lowercase letter,
as in $t$ or $\delta$, and that clock times are capitalized, as in $T$ and
$\Delta$. A clock approximates real time with the relationship between clock
time and real time given by

$$t = (1 - \rho)T \tag{1}$$

where $t$ is real time, $T$ is clock time, and $\rho$ is the rate of drift of
clock time from real time. A clock may have some nonzero offset at clock time
$T = 0$, as represented by the constant $t_0$ in equation (2).

$$t = (1 - \rho)T + t_0 \tag{2}$$

If $\rho$ is zero, the clock is a perfect clock. If $\rho$ is positive, the
clock is a fast clock and accumulates time faster than real time. Clocks are
considered to be digital devices consisting of a crystal oscillator and a
counter. Ideally, the crystal oscillates at frequency $f_c$. Deviations from
this specification are what cause drift among a set of clocks. The digital
nature of the counter causes the relation between $t$ and $T$ to be
discontinuous, as shown in figure 1. The error in reading a clock is denoted
as $\epsilon$, and for digital clocks $\epsilon$ has a minimum, $\epsilon_0$,
of $1/f_c$. Thus, for a digital clock the inverse of equation (2) becomes

$$T = \lfloor (t - t_0)/(1 - \rho) \rfloor \tag{3}$$

where $\lfloor\ \rfloor$ represents the floor function.
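As a minimal sketch of equations (2) and (3), the following Python models a digital clock by its drift rate and offset. The class name `DriftingClock`, its method names, and the sample values are illustrative assumptions and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class DriftingClock:
    """Hypothetical model of one clock: drift rate rho and real-time offset t0."""
    rho: float = 0.0   # drift rate with respect to real time
    t0: float = 0.0    # real-time offset at clock time T = 0

    def real_time(self, T: float) -> float:
        """Equation (2): real time at which the clock shows T."""
        return (1.0 - self.rho) * T + self.t0

    def reading(self, t: float) -> int:
        """Equation (3): tick count shown at real time t (floor of the inverse)."""
        return math.floor((t - self.t0) / (1.0 - self.rho))


perfect = DriftingClock()
fast = DriftingClock(rho=0.5, t0=1.0)   # exaggerated drift, for illustration only
```

With a positive drift rate the clock accumulates ticks faster than real time, so `fast.reading(t)` runs ahead of `perfect.reading(t)` for large enough `t`.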
For a set of clocks, a maximum drift rate $\rho_M$ is chosen so that for any
nonfaulty clock $p$ in the set

$$|\rho_p| \le \rho_M/2 \tag{4}$$

The drift between any two clocks $p$ and $q$ in the set of nonfaulty clocks is
given by

$$\rho_{qp} = \rho_q - \rho_p \tag{5}$$

with

$$|\rho_{qp}| \le \rho_M \tag{6}$$
[Figure 1 labels: real time $t$ versus clock time $T$; curves for $\rho = 0$ and $\rho > 0$.]
Figure 1. Real time versus clock time.
The real-time skew $\delta_{qp}$ that exists between two clocks at some clock
time $T$ is given by

$$\delta_{qp}(T) = t_p(T) - t_q(T) \tag{7}$$

Alternatively, the skew can be expressed in terms of the difference between
two clock values at some real time $t$. The form of equation (7) was chosen,
as this is the perspective taken in the Lamport and Melliar-Smith proof.
Synchronizing Clocks
In the two algorithms considered here, synchronization is accomplished by
periodically executing an algorithm that first computes a clock correction
value and then applies the correction to the local processor clock. In order
to compute either of the two algorithms, each processor in the synchronizing
set must obtain a perceived skew $\Delta_{qp}$ between its clock and each of
the other clocks in the set. To obtain $\Delta_{qp}$, processor $p$ must
compute the difference between its local clock and the remote clock.
Processor $p$ must, in effect, read the clock of processor $q$. Figure 2
graphically depicts this process. By design, the algorithm executes every $R$
time units and takes $S$ time units to complete. In the clock subsystem
constructed for these tests, actual clock values are not transmitted. Instead,
at predetermined time $T_s$ during $S$, clock $q$ sends a synchronization
signal to $p$. Upon receipt of this signal, $p$ reads its local clock and
stores this value, $T_{qp}$; $T_{qp}$ is then the local clock value for
processor $p$ taken at a real time corresponding to $T_s$, the clock reading
for processor $q$. The perceived skew $\Delta_{qp}$ can then be computed as
$T_{qp} - T_s$.
[Figure 2 labels: clock times $T_p$ and $T_q$; $T_{qp}$; $\Delta_{qp}$; algorithm start, $T_s$, and algorithm end; intervals $S$ and $R$.]
Figure 2. Reading the clock of another processor.
More precisely stated, the perceived skew values are arrived at by the
following process:
1. Each processor broadcasts a synchronizing signal at a predetermined
   time $T_s$.
2. Upon receipt of the synchronizing signals from other processors, the
   receiving processor $p$ stores its clock value, $T_{qp}$.
3. The perceived skew is then the stored value, $T_{qp}$, minus $T_s$, or

$$\Delta_{qp} = T_{qp} - T_s \tag{8}$$
Figure 3 represents this process taking place between two processors $p$ and
$q$, with processor $q$ having a clock that is faster than processor $p$. From
the graph it can be seen that $T_{qp}$ can be thought of as the value of
clock $p$ at real time $t_q(T_s)$, or

$$T_{qp} = T_p(t_q(T_s)) \pm \epsilon_{qp} \tag{9}$$

where $T_p$ is the inverse clock function of clock $p$, and $\epsilon_{qp}$ is
the error inherent in taking $T_p$.
By using equation (8) with equations (9), (2), and (3), the following
expression for the perceived skew can be derived (see appendix A):

$$\Delta_{qp} = -\delta_{qp}(T_s) \pm \epsilon + \rho_p\Delta_{qp} \tag{10}$$

An examination of figure 3 will reveal that if $q$ is faster than $p$, then
$\rho_{qp} > 0$, $\delta_{qp} > 0$, and $\Delta_{qp} < 0$. To correct its
clock, the slower processor $p$ must add a positive value to the clock. Since
the values of $\Delta_{qp}$ will be negative, the resulting correcting value
must be subtracted from clock $p$ (assuming that a sign change does not occur
in the algorithm).
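The sign convention can be checked numerically. The sketch below assumes the clock model of equation (2), no read error, and an unquantized clock value; the function names and the drift values are hypothetical and chosen only for illustration.

```python
def real_time(T, rho, t0):
    """Equation (2): real time at which a clock with drift rho and offset t0 reads T."""
    return (1.0 - rho) * T + t0


def perceived_skew(Ts, rho_p, t0_p, rho_q, t0_q):
    """Perceived skew of equation (8), assuming no read error.

    Clock q sends its signal when it reads Ts; clock p latches its own
    (unquantized) value T_qp at that same real time.
    """
    t_signal = real_time(Ts, rho_q, t0_q)        # t_q(Ts)
    T_qp = (t_signal - t0_p) / (1.0 - rho_p)     # inverse of equation (2) for clock p
    return T_qp - Ts


# Hypothetical values: q runs fast relative to p, both start with zero offset.
Ts = 1000.0
rho_p, rho_q = -0.001, 0.001
delta_qp = real_time(Ts, rho_p, 0.0) - real_time(Ts, rho_q, 0.0)   # equation (7)
Delta_qp = perceived_skew(Ts, rho_p, 0.0, rho_q, 0.0)
```

Under these assumptions `delta_qp` is positive, `Delta_qp` is negative, and the two satisfy equation (10) with the read-error term set to zero.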
[Figure 3 labels: real time $t_p$; $T_p(t_q(T_s))$; clock time $T$.]
Figure 3. Formulation of $T_{qp}$.
Figure 4 graphically depicts the effect of applying a correction to a fast
clock. The superscripts $i$ and $i + 1$ refer to synchronization periods, as
will be discussed in the section "Periods i and i + 1." In the figure, $t^i$
refers to the uncorrected clock function and $t^{i+1}$ to the corrected clock
function. The correction is applied at clock time $T_c$. The following
relationship exists between the corrected and the uncorrected clock
functions:

$$t^{i+1}(T_c) = t^i(T_c) + (1 - \rho)\chi^i \tag{11}$$
[Figure 4 labels: $T_c - \chi^i$.]
Figure 4. Effect of applying correction.
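A small numerical sketch of equation (11), assuming the linear clock model of equation (2): the correction is represented as a shift of the clock's real-time offset. The helper names and the sample values are assumptions made for illustration.

```python
def real_time(T, rho, t0):
    """Equation (2): real time at which the clock reads T."""
    return (1.0 - rho) * T + t0


def corrected_offset(t0, rho, chi):
    """Offset of the corrected clock function t^{i+1}.

    Shifting the clock function by (1 - rho) * chi reproduces equation (11).
    """
    return t0 + (1.0 - rho) * chi


# Hypothetical, exaggerated values.
rho, t0, chi, Tc = 0.25, 2.0, 4.0, 100.0
t0_next = corrected_offset(t0, rho, chi)
```

Because both clock functions have slope $(1 - \rho)$ in this model, the corrected clock reads $T_c - \chi$ at the real time at which the uncorrected clock reads $T_c$.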
Some Useful Relations
The following relations will be used to derive the bound formulas. Detailed
derivations of these relations are given in appendix A. These relations hold
true provided that a clock correction is not applied during the interval
from $T$ to $(T + C)$.

$$\delta_{qp}(T + C) = \delta_{qp}(T) + \rho_{qp}C \tag{12}$$

Equation (12) states that the skew between $p$ and $q$ at some time $T$ plus a
constant $C$ is equivalent to the skew that exists at time $T$ plus an amount
equal to the relative drift rate times the constant.

$$\delta_{qp}(T) = \rho_{qp}(T - T_c) + \delta_{qp}(T_c) \tag{13}$$

Equation (13) states that the skew between $p$ and $q$ during a
synchronization period is equivalent to the skew at the beginning of the
period ($T_c$) plus the skew accumulated over the period due to the relative
drift $\rho_{qp}$.

$$\delta_{rp}(T) - \delta_{qp}(T) = \delta_{rq}(T) \tag{14}$$

Equation (14) states a relationship that exists between the skews of three
good clocks, $p$, $q$, and $r$.
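As a brief check, equation (12) can be recovered from equations (2), (5), and (7) when no correction is applied between $T$ and $T + C$. This is only a sketch of one route to the result; $t_{0p}$ and $t_{0q}$ denote the offsets of clocks $p$ and $q$, and the full derivations are those of appendix A.

$$\begin{aligned}
\delta_{qp}(T + C) &= t_p(T + C) - t_q(T + C) \\
 &= (1 - \rho_p)(T + C) + t_{0p} - (1 - \rho_q)(T + C) - t_{0q} \\
 &= \bigl[t_p(T) - t_q(T)\bigr] + (\rho_q - \rho_p)\,C \\
 &= \delta_{qp}(T) + \rho_{qp}\,C
\end{aligned}$$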
The Proofs
The statement of the bounding theorem is taken largely from references 4
and 6.
Clock Skew Bounding Theorem
For a set of $n$ processors cooperating in the synchronization algorithm for
all time $T$ through period $i$, a bound $\delta$ exists on the skew between
any two of the processors given that at most $m$ of the $n$ processors are
faulty. Stated mathematically,

$$|t_p^i(T) - t_q^i(T)| < \delta \tag{15}$$

Because this theorem is written in terms of consecutive periods of time, it
is convenient to use proof by induction. To do this, we will derive an
expression for $\delta$ for the first interval, $i = 0$, and then show that
another expression exists that is true for the following intervals. This
latter expression depends on characteristics of the synchronization
algorithm, and thus separate derivations are necessary for the ICCSA and the
Midpoint Algorithm.
The First Period, i = 0
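At system start-up, assume a maximum skew $\delta_0$ exists between all good
processors in the set. Then, at the end of period 0 with $T = R$,

$$\begin{aligned}
t_p^0(R) - t_q^0(R) &= (1 - \rho_p)R - (1 - \rho_q)R + t_{0p} - t_{0q} \\
 &= (\rho_q - \rho_p)R + t_{0p} - t_{0q} \\
 &= \rho_{qp}R + t_{0p} - t_{0q} \\
|t_p^0(R) - t_q^0(R)| &\le \rho_M R + \delta_0 \le \delta
\end{aligned} \tag{16}$$

where in expression (16) $|t_{0p} - t_{0q}| \le \delta_0$. Expression (16) is
thus one constraint on the value of $\delta$, i.e.,
$\delta \ge \rho_M R + \delta_0$.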
Periods i and i + 1
To continue the proof, we will assume that an expression for the bound is
true for period $i$ and show that the same expression is true for $i + 1$. As
stated above, this expression will depend on the synchronization algorithm.
However, we can derive a general expression from which the subsequent proofs
can continue. Refer to figure 5 for a graphical representation of the
situation that exists between periods $i$ and $i + 1$. To reduce clutter in
the terms, the lack of a superscript will refer to period $i$ and a $+$
superscript will refer to period $i + 1$.
[Figure 5 labels: $t(T);\ T \in R$ and $t^{+}(T);\ T \in R^{+}$; clock-time marks $T_c - \chi$ and $T_c$.]
Figure 5. Transition from period i to period i + 1.
Using equation (7) for period $i + 1$, we have

$$\delta_{qp}^{+}(T_c) = t_p^{+}(T_c) - t_q^{+}(T_c) \tag{17}$$

Then using equation (11) to replace the $t^{+}$ functions with $t$, and then
equation (7) again to recombine the $t$ functions, we get

$$\delta_{qp}^{+}(T_c) = \delta_{qp}(T_c) + (\chi_p - \chi_q) + \rho_q\chi_q - \rho_p\chi_p \tag{18}$$

It is assumed that the difference between the $\rho\chi$ terms can be
ignored. For an error-free system this is justified because, when considering
the worst-case skew condition with $\rho_q$ equal to negative $\rho_p$,
$\rho_q\chi_q$ will be of the same sign and approximately equal to
$\rho_p\chi_p$. When clock read errors are present, the worst-case read error
effect occurs when the error for clock $q$ is equal to but opposite to the
error for clock $p$. As in the error-free case, the effect is canceled out in
the $\rho\chi$ difference terms. In short, when $\chi_p - \chi_q$ is
maximized, $\rho_q\chi_q - \rho_p\chi_p$ is minimized.
Substituting the resulting expression in equation (13) written for period
$i + 1$, we obtain

$$\delta_{qp}^{+}(T) \le \delta_{qp}(T_c) + (\chi_p - \chi_q) + \rho_M R \tag{19}$$

with $R > (T - T_c)$. Expression (19) will be used in the following sections
to derive bound expressions for the associated algorithms.
The Interactive Convergence Clock
Synchronization Algorithm
The ICCSA is derived for $n$ clocks synchronizing in the presence of $m$
faulty clocks. In this algorithm, a processor computes the correction by
averaging all the perceived skew values $\Delta_{qp}$. To limit the effect of
a faulty clock, the $\Delta_{qp}$ are subjected to the test that their
absolute value be less than some maximum expected value $\Delta$. If
$\Delta_{qp}$ exceeds $\Delta$, $\Delta_{qp}$ is set to 0. More precisely

$$\chi_p = \frac{1}{n}\sum_{q=1}^{n}\Delta_{qp} \tag{20}$$

where $\Delta_{qp} = 0$ if $|\Delta_{qp}| > \Delta$. A value for $\Delta$ is
easily derived from equation (10):

$$\Delta \ge \delta + \epsilon + \frac{\rho_M}{2}\Delta \tag{21}$$

Wishing to replace the correction terms in equation (19) with an expression
based on equation (20), we look at the correction terms more closely:
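$$\begin{aligned}
\chi_p - \chi_q &= \frac{1}{n}\sum_{r=1}^{n}\Delta_{rp} - \frac{1}{n}\sum_{r=1}^{n}\Delta_{rq} \\
 &= \frac{1}{n}\sum_{r=1}^{n}(\Delta_{rp} - \Delta_{rq}) \\
 &= \frac{1}{n}\sum_{r=1}^{n-2-m}(\Delta_{rp} - \Delta_{rq}) + \frac{1}{n}(\Delta_{pp} - \Delta_{pq}) + \frac{1}{n}(\Delta_{qp} - \Delta_{qq}) + \frac{m}{n}(V_p - V_q)
\end{aligned}$$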
which contains values of Aqp taken from n- 2 - m
good processors. The second and third terms have
readings of the local clock, e.g., App. The last term
holds the readings from the m possible faulty clocks
(denoted by V). In appendix B, each term is taken
individually and expanded under assumptions relat-
ing to those terms and then recombined to obtain
5
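$$\chi_p - \chi_q \le \left(\frac{m - n}{n}\right)\delta_{qp}(T_c) + \frac{2(n - 1 - m)}{n}\epsilon + \frac{\rho_M(n - m)}{n}\Delta + \frac{2m}{n}\Delta \tag{22}$$

Substituting equation (22) into equation (19), we get

$$\delta_{qp}^{+}(T) \le \left(\frac{m}{n}\right)\delta_{qp}(T_c) + \frac{2(n - 1 - m)}{n}\epsilon + \frac{\rho_M(n - m)}{n}\Delta + \frac{2m}{n}\Delta + \rho_M R \tag{23}$$

Now we create an expression for $\delta$ and assume it holds for period $i$,
i.e., that $\delta > \delta_{qp}(T)$, with $T$ in $i$ and with $\delta$ given
by

$$\delta \ge \frac{n}{n - m}\left[\frac{2(n - 1 - m)}{n}\epsilon + \frac{\rho_M(n - m)}{n}\Delta + \frac{2m}{n}\Delta + \rho_M R\right] \tag{24}$$

Under this assumption then by replacing $\delta_{qp}$ with $\delta$ in
equation (23) we have

$$\delta_{qp}^{+}(T) \le \left(\frac{m}{n}\right)\delta + \frac{2(n - 1 - m)}{n}\epsilon + \frac{\rho_M(n - m)}{n}\Delta + \frac{2m}{n}\Delta + \rho_M R \tag{25}$$

Now using equation (24) for $\delta$, it follows that

$$\delta_{qp}^{+}(T) \le \frac{n}{n - m}\left[\frac{2(n - 1 - m)}{n}\epsilon + \frac{\rho_M(n - m)}{n}\Delta + \frac{2m}{n}\Delta + \rho_M R\right] \le \delta \tag{26}$$

which completes the proof.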
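The ICCSA correction of equation (20) and the resulting skew bound can be sketched in a few lines of Python. The function names are hypothetical, and the bound is written in the form $\delta \ge \frac{n}{n-m}[\ldots]$, which is the reading of equation (24) assumed for this sketch; it is the smallest $\delta$ for which the induction step of equation (25) closes.

```python
def iccsa_correction(perceived, Delta):
    """Equation (20): average the perceived skews, zeroing any beyond Delta.

    `perceived` holds one value per clock in the set, including the
    processor's own entry (which is 0).
    """
    n = len(perceived)
    accepted = [d if abs(d) <= Delta else 0.0 for d in perceived]
    return sum(accepted) / n


def iccsa_bound(n, m, eps, Delta, rho_M, R):
    """Skew bound delta, assuming the form of equation (24) described above."""
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    inner = ((2 * (n - 1 - m) / n) * eps
             + (rho_M * (n - m) / n) * Delta
             + (2 * m / n) * Delta
             + rho_M * R)
    return n / (n - m) * inner
```

For example, `iccsa_correction([0, 3, -2, 50], Delta=10)` discards the reading of 50 and averages the remaining values over all four clocks.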
The Midpoint Algorithm
In the Midpoint Algorithm, as suggested by Dolev (ref. 8), the correction is
computed as the midpoint of the span of values of $\Delta_{qp}$ after the $m$
largest and smallest values have been discarded. Stated for the case where
$m = 1$,
1. Processor obtains all the $\Delta_{qp}$ values.
2. The $\Delta_{qp}$ are ordered so that
   $\Delta_{\min} \le \Delta_{\min'} \le \Delta_{\max'} \le \Delta_{\max}$.
3. Discard $\Delta_{\min}$ and $\Delta_{\max}$ and use the new minimum and
   maximum, $\Delta_{\min'}$ and $\Delta_{\max'}$, to compute the correction as

$$\chi_p = \frac{\Delta_{\min'} + \Delta_{\max'}}{2} \tag{27}$$

This algorithm has the property that the clock reading of a faulty processor
will not be used to compute the correction unless it is bounded by good clock
readings. This results in it being possible to derive a tighter bound.
In the following sections, an expression for $\chi_p$ is derived by first
considering the case with no errors, then with some clock read error
$\epsilon$, and finally with an arbitrary faulty clock reading.
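The three steps above can be sketched as follows. The function name is hypothetical, and the generalization from $m = 1$ to an arbitrary number of discards per end is an assumption of this sketch.

```python
def midpoint_correction(perceived, m=1):
    """Equation (27): midpoint of the perceived skews left after discarding
    the m smallest and m largest values."""
    if len(perceived) <= 2 * m:
        raise ValueError("need more than 2*m perceived skew values")
    ordered = sorted(perceived)
    kept = ordered[m:len(ordered) - m]
    return (kept[0] + kept[-1]) / 2
```

With the four readings `[-3, 0, 2, 40]` and `m=1`, the outlying value 40 is discarded along with -3, and the correction is the midpoint of 0 and 2.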
The ideal case. In the absence of a faulty clock and read errors, all good
processors in a synchronizing set will place the processor readings in the
same order. Take, for example, the four-processor system $(p, q, r, s)$ where

$$t_p(T) \le t_q(T) \le t_r(T) \le t_s(T)$$

Then, for any member $i$ in $(p, q, r, s)$

$$\Delta_{pi} \le \Delta_{qi} \le \Delta_{ri} \le \Delta_{si}$$

All good processors will then use clock readings from the same two processors
to compute their respective corrections. (In the above example, this would be
$\Delta_{qi}$ and $\Delta_{ri}$.) This is equivalent to the processors using a
single clock reading which is at the midpoint of these two clock readings
($\delta_{\mathrm{mid}}(T_s)$). Thus using equation (10) with $\epsilon = 0$,
we have

$$\chi_p = \Delta_{\mathrm{mid},p} = -\delta_{\mathrm{mid},p}(T_s) + \rho_p\Delta_{\mathrm{mid},p} \tag{28}$$
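Including read error. Any read error present in the clock readings will
affect the clock correction by at most the read error $\epsilon$:

$$\begin{aligned}
\chi_p &= \frac{(\Delta_{\min'} \pm \epsilon) + (\Delta_{\max'} \pm \epsilon)}{2} \\
 &\le \frac{\Delta_{\min'} + \Delta_{\max'}}{2} \pm \epsilon \\
 &= \Delta_{\mathrm{mid},p} \pm \epsilon \\
 &= -\delta_{\mathrm{mid},p}(T_s) + \rho_p\Delta_{\mathrm{mid},p} \pm \epsilon
\end{aligned} \tag{29}$$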
Including a faulty clock. In reference to figure 6, consider that the maximum
and minimum readings taken from good clocks differ by at most
$\delta + 2\epsilon$. The algorithm guarantees that if a faulty clock reading
is used in computing the correction, it is bounded by good clock readings.
Thus, the maximum error that a faulty clock could cause is
$\frac{1}{2}(\delta + 2\epsilon)$. The expression for the correction including
both read error and error due to a faulty clock reading becomes

$$\chi_p = -\delta_{\mathrm{mid},p}(T_s) + \rho_p\Delta_{\mathrm{mid},p} \pm \left(\frac{\delta}{4} + \epsilon\right) \tag{30}$$

A maximum correction $\Sigma$ can be obtained by using $\delta$ and $\Delta$
for the maximum values of $\delta_{\mathrm{mid},p}$ and
$\Delta_{\mathrm{mid},p}$, giving

$$\begin{aligned}
\Sigma &\ge \delta + \frac{\rho_M}{2}\Delta \pm \left(\frac{\delta}{4} + \epsilon\right) \\
\Sigma &\ge \frac{5\delta}{4} + \epsilon + \frac{\rho_M}{2}\Delta
\end{aligned} \tag{31}$$
Now using equation (30) in equation (19), we obtain

$$\delta_{qp}^{+}(T) \le \delta_{qp}(T_c) + [\delta_{\mathrm{mid},q}(T_s) - \delta_{\mathrm{mid},p}(T_s)] + \rho_p\Delta_{\mathrm{mid},p} - \rho_q\Delta_{\mathrm{mid},q} + 2\left(\frac{\delta}{4} + \epsilon\right) + \rho_M R \tag{32}$$

Ignoring the difference between the $\rho\Delta$ terms (as was done in
eq. (18) with the $\rho\chi$ terms) and using equation (14) on
$[\delta_{\mathrm{mid},q}(T_s) - \delta_{\mathrm{mid},p}(T_s)]$, we get

$$\delta_{qp}^{+}(T) \le \delta_{qp}(T_c) - \delta_{qp}(T_s) + 2\left(\frac{\delta}{4} + \epsilon\right) + \rho_M R \tag{33}$$

We then use equation (12) with $T_c = T_s + \Delta$ to obtain

$$\delta_{qp}^{+}(T) \le \rho_M\Delta + 2\left(\frac{\delta}{4} + \epsilon\right) + \rho_M R \tag{34}$$

Now to continue the induction, we assume the following expression to be true
for period $i$:

$$\delta \ge 4\epsilon + 2\rho_M\Delta + 2\rho_M R \tag{35}$$

Substituting equation (35) for $\delta$ in equation (34), we get

$$\delta_{qp}^{+}(T) \le 4\epsilon + 2\rho_M\Delta + 2\rho_M R \le \delta \tag{36}$$

which completes the proof.
[Figure 6 labels: $\Delta_{\min}$, $\Delta_{\min'}$, $\Delta_{\max'}$, $\Delta_{\max}$; span $\delta + 2\epsilon$.]
Figure 6. Set of perceived skews taken from good clocks.
If the effect of faulty processors were to be ignored ($m = 0$), then
equation (35) becomes

$$\delta_{qp}^{+}(T) \le 2\epsilon + \rho_M\Delta + \rho_M R \tag{37}$$

and the clock bound is

$$\delta \ge 2\epsilon + \rho_M\Delta + \rho_M R \tag{38}$$

Experimental Verification
To experimentally verify the derived skew bounds, several tests were
performed in which the effect of varying one parameter of the skew bound
expression was measured while the remaining parameters were either held
constant or zero. The parameters are $\delta_0$, $\epsilon$, $\rho$, $m$, $n$,
and $R$. For the clock subsystem that was actually tested, the number of
clocks $n$ was kept constant at four, and thus $m$ was limited to (0, 1) for
both algorithms. It was decided that if $\rho$ is tested, it is not necessary
to test the effect of varying the synchronization period $R$. The following
test cases were then generated:
1. $\delta = 0$ with $\delta_0 = 0$, $m = 0$, $\epsilon = 0$, and $\rho = 0$
2. $\delta = f(\delta_0)$ during the first period with $\rho = 0$
3. $\delta = f(\delta_0)$ during the first period with $\rho = C$
4. $\delta = f(\epsilon)$ with $m = (0, 1)$, $\rho = 0$, and $\delta_0 = 0$
5. $\delta = f(\epsilon)$ with $m = (0, 1)$, $\rho = C$, and $\delta_0 = 0$
6. $\delta = f(\rho)$ with $m = (0, 1)$, $\epsilon = 0$, and $\delta_0 = 0$
7. $\delta = f(\rho)$ with $m = (0, 1)$, $\epsilon = C$, and $\delta_0 = 0$
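To make the swept quantities concrete, the Midpoint Algorithm bounds of equations (35) and (38) can be evaluated for the four-clock set ($m$ = 0 or 1) over a range of read errors, in the spirit of test cases 4 and 5. This is a sketch only: the function name and the sample parameter values (in clock ticks) are assumptions chosen for illustration, not values from the experiment.

```python
def midpoint_bound(m, eps, rho_M, Delta, R):
    """Midpoint Algorithm skew bound: equation (38) for m = 0, equation (35) for m = 1."""
    base = 2 * eps + rho_M * Delta + rho_M * R
    if m == 0:
        return base
    if m == 1:
        return 2 * base
    raise ValueError("the derivation covers only m = 0 or m = 1")


# Hypothetical sweep over read error for both fault assumptions.
rho_M, Delta, R = 1e-5, 10, 100_000
sweep = {(m, eps): midpoint_bound(m, eps, rho_M, Delta, R)
         for m in (0, 1) for eps in (0, 1, 2)}
```

In every entry the bound for `m = 1` is twice the bound for `m = 0`, which is the 100-percent penalty for tolerating a worst-case failure noted in the abstract.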
In all the tests, the read error is treated as a random variable with a mean
of zero. This is not the case in most communication systems. However, the
expected value of the communication delay is often known and can be
subtracted from the clock readings in the synchronization algorithm, so that
the resulting effect is a read error with zero mean.
In addition to functioning as a synchronizing circuit, the clock subsystem
must be able to support the test plan. The following capabilities were then
designed into the clock subsystem and the experiment support environment:
1. Ability to sustain long-duration data acquisition of internal variables
   without perturbing the system function
2. Availability of a global clock that can be read by each processor under
   test; the global clock will represent real time
3. Ability to set the starting skew $\delta_0$ of each clock
4. Ability to set the drift rate of each clock with respect to real time,
   i.e., the global clock
5. Ability to set the read error of each clock
6. Ability to emulate a faulty clock, especially a malicious liar
The following sections describe the clock subsystem and experiment
environment.
Design of Clock Subsystem
The clock subsystem is designed as a synchro-
nization peripheral. This primary function is then
augmented to provide the data acquisition and con-
trol necessary to accomplish the tests proposed in
the previous section. The next section will describe
the design of the primary synchronization function.
This is followed by a section on the actual design,
which includes the test augmentations. In these and
the subsequent sections, the term clock tick is used
to refer to one increment of digital time. Practically
all the parameters are stated in terms of clock ticks
instead of time. A clock tick is easily converted to
time once the base frequency of the clock is known.
A clock synchronization peripheral. As men-
tioned previously, the ICCSA was first used in the
SIFT computer. This implementation was tested
(ref. 10), and it was found that the clock skews
were due primarily to large clock read errors. It was
proposed then that a simple hardware enhancement
could greatly reduce the read error, tighten the clock
synchronization, and thus increase the efficiency of
interprocessor communication. While it is possible
to put the entire clock function in hardware, for the
purposes of this test it is convenient to have the al-
gorithm in software so that alternate algorithms can
be tested. Having the algorithm in software also en-
hances data acquisition and fault simulation.
Figure 7 is a block diagram of how the clock functions are distributed
between the clock peripheral hardware and the synchronization software. The
clock hardware monitors a communication channel for the presence of a
synchronizing signal. When a sync signal is detected, the hardware latches
the local clock value and stores it in a register related to the processor
that sent the signal. The clock hardware also generates a sync signal at a
specified time $T_s$ and places the signal on the communication channel.
These functions are done most efficiently (i.e., the lowest read error is
realized) if they are integrated with the communications and networking
protocols. The clock peripheral also generates an interrupt to the host
processor to indicate the end of the period. The processor then executes the
clock algorithm, reading the clock read registers, computing the correction,
and correcting the clock.
[Figure 7 blocks: Processor/Memory, Sync algorithm, Clock peripheral.]
Figure 7. Block diagram of clock functions.
Several considerations must be made to properly design the clock peripheral.
The ICCSA requires that all clock readings greater than $\Delta$ be ignored.
This is equivalent to a buffer of size $\Delta$ existing before and after the
synchronization time $T_s$ (see fig. 8). The clock hardware can easily be
designed to enforce the rejection of signals received outside this window by
clearing all clock read registers at the beginning of the window and
inhibiting the update of the registers at the end of the window (when the
interrupt to the processor is generated).
[Figure 8 labels: START, $T_s$, END; window of $\Delta$ before and after $T_s$; intervals $S$ and $R$.]
Figure 8. Synchronization window.
Thought must also be given to the clock itself. The clock must be corrected.
While at first this may sound trivial, several factors should be considered.
A read error equivalent to $1/f_c$ could be induced every time a clock is
read or written. Thus, by reading the clock, adding the correction, and
writing the new value, two clock ticks of read error can be accumulated.
Also, since it takes the processor a finite amount of time to perform the
correction, it is possible that additional ticks will be lost during the
correction. Correcting the clock by adding the correction is undesirable
because clock time will be either "lost" or repeated, and then care must be
taken not to "skip over" or "reschedule" an event. Alternative correction
methods can be designed that add pulses to or delete pulses from a clock
oscillator input, as necessary. As will be seen, this is the method used to
adjust the drift rate between the processors. To avoid possible interaction
between the application of the correction and the drift rate setting, another
correction method was developed.
In the clock circuit tested, the correction is applied by moving the
synchronization window (which defines the end of the frame). Normally this
would result in larger skews because the clocks will drift for an additional
frame before the correction takes effect. This is indeed what would happen.
However, during this test no other tasks are scheduled off the clock during
the frame. Thus, moving the synchronization window is a way of applying the
correction for the purpose of this test. Measurements are not affected
because data are only taken during the execution of the synchronization
algorithm, and by this time, the correction for the last frame has already
been applied. An additional benefit of using this method is that the length
of time taken to compute the algorithm (including any interrupt latency) does
not affect the experiment. This allowed a great deal of freedom in coding
different algorithms, fault models, and data acquisition.
Test augmentation. The clock peripheral design is augmented to allow the
adjustment of the oscillator drift rate, the setting of read error, and the
simulation of a malicious liar. To adjust the drift rate, the oscillator
input of the clock counter is driven by a pulse deletion circuit. The pulse
deletion circuit has as input a reference oscillator signal (the global clock
oscillator) and a 16-bit unsigned integer value. The circuit loads the 16-bit
value in a down-counter and deletes a pulse from the reference oscillator
signal on overflow. A value of 0 will cause every other pulse to be deleted;
a value of 1 will delete every third pulse, and so on, so that the clock
frequency is defined as

$$f_c = \frac{v + 1}{v + 2} f_r \tag{39}$$

where $v$ is the 16-bit value and $f_r$ is the reference clock frequency. If
we let $\rho$ be defined as

$$\rho = \frac{f_r - f_c}{f_r} \tag{40}$$

then equation (39) can be written as

$$v = \frac{1 - 2\rho}{\rho} \qquad (0.0 < \rho \le 0.5) \tag{41}$$

For a drift rate of $10^{-5}$, $v = 99998$.
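Equations (39) and (41) and the pulse-deletion behavior they describe can be sketched as follows. The function names are hypothetical, and `pulses_passed` is a software stand-in for the circuit (one reference pulse is dropped after every $v + 1$ pulses passed), not a description of the actual hardware.

```python
def clock_frequency(v, f_r):
    """Equation (39): counter frequency after pulse deletion."""
    return (v + 1) / (v + 2) * f_r


def drift_setting(rho):
    """Equation (41): register value v for a desired drift rate rho."""
    if not 0.0 < rho <= 0.5:
        raise ValueError("rho must satisfy 0 < rho <= 0.5")
    return round((1 - 2 * rho) / rho)


def pulses_passed(v, n_ref):
    """Count the reference pulses that survive when one pulse is deleted
    after every v + 1 pulses passed."""
    passed, remaining = 0, v + 1
    for _ in range(n_ref):
        if remaining == 0:
            remaining = v + 1      # this pulse is deleted; reload the counter
        else:
            passed += 1
            remaining -= 1
    return passed
```

For instance, `drift_setting(1e-5)` reproduces the value of 99998 quoted in the text, and `pulses_passed(0, 10)` passes every other pulse.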
Read errors and faulty clock behavior can be
programmed by varying the sync strobe time. To
present different errors to each of the remote clocks
(a form of malicious behavior), a SYNC pulse must
be independently generated for each remote clock.
Thus, three SYNC register/comparators were used
in the final circuit design.
Figure 9 is a block diagram of the clock synchro-
nization peripheral. The circuit is designed for four
clocks (one local and three remote) and assumes a
dedicated connection to the remote clocks. An oscil-
lator drives a counter of sufficient length to resolve
a frame. Five register/comparator blocks define the
START window time, SYNC times, and END window
time ($T_c$). The START strobe clears and enables
the STORE n registers. The SYNC strobes are
broadcast to the remote clocks. The END strobe dis-
ables the STORE n registers, interrupts the proces-
sor, and clears the clock (counter), beginning a new
frame. Three remote clock strobes are gated through
the enable circuitry to the STORE n registers.
On receipt of a synchronization strobe, the current
clock value is latched into the associated STORE n
register.
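The window logic described above can be mimicked in software. The sketch below is a hypothetical model, not the circuit itself: the class name, the use of `None` for a cleared register, and the counter values are all assumptions made for illustration.

```python
class SyncWindow:
    """Hypothetical software model of the START/END window and STORE registers."""

    def __init__(self, start, end, remotes=("q", "r", "s")):
        self.start, self.end = start, end
        self.store = {name: None for name in remotes}
        self.enabled = False

    def tick(self, counter):
        """Advance to a counter value; return True when the END interrupt fires."""
        if counter == self.start:
            self.store = {name: None for name in self.store}   # START clears ...
            self.enabled = True                                 # ... and enables
        if counter == self.end:
            self.enabled = False                                # END disables
            return True
        return False

    def strobe(self, remote, counter):
        """Latch the local counter value if a remote strobe arrives in the window."""
        if self.enabled:
            self.store[remote] = counter


window = SyncWindow(start=80, end=120)
arrivals = {70: "q", 98: "r", 103: "s"}     # hypothetical strobe arrival times
interrupted = False
for counter in range(0, 121):
    interrupted = window.tick(counter)
    if counter in arrivals:
        window.strobe(arrivals[counter], counter)
```

The strobe from `q` arrives before the START time and is ignored, while the strobes from `r` and `s` are latched; the loop ends with the END interrupt raised.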
Experiment Environment
The clock peripherals were installed on an exist-
ing fault-tolerant processor (FTP) test-bed (ref. 11).
The FTP is hosted from a VAX computer through
a dual port memory. In addition, each channel of
the quad FTP has an additional dual port memory
channel to separate VAX computers. These channels
were dedicated to data acquisition. A sixth VAX
computer with a windowing interface was used to
control the experiment. The FTP is a tightly cou-
pled computer. Initial skew is then easily controlled
from the base skew of $\delta_0 = 0$ provided by the FTP.
The synchronization algorithm is loaded into FTP
RAM and configured for the test trial. The FTP op-
erating system is then started from ROM. After the
FTP stabilizes, control is passed to the synchroniza-
tion algorithm and the FTP clock synchronization is
disabled.
[Figure 9 labels: reference oscillator; outgoing strobes p STROBE q and p STROBE r; incoming strobes q STROBE p, r STROBE p, and s STROBE p.]
Figure 9. Detail of clock synchronization peripheral.
Another component of the experiment environment
is the global clock. The global clock has a base
frequency of 2 MHz and a resolution of 32 bits. The
output of the global clock can be read by each channel
and is assumed to be real time. To establish the
global clock as real time, its 2-MHz base frequency is
fed to the clock synchronization peripherals as the
reference frequency. Thus, in the absence of any
programmed drift rate, the clock synchronization pe-
ripherals are perfectly synchronized.
Results
Several tests were run to verify the functionality
of the system. The following runs were made with
the synchronization algorithm disabled:
1. (m = 0, ρ = 0, ε = 0, and δ0 = 0) to test the
global clock
2. (m = 0, ρ > 0, ε = 0, and δ0 = 0) to test drift
rate circuits
3. (m = 0, ρ = 0, ε = 0, and δ0 > 0) to test
setting initial skew
4. (m = 0, ρ = 0, ε > 0, and δ0 = 0) to test
setting the read error
With the synchronization algorithm enabled, several
tests were run with δ0 > 0 and ρ > 0, and it was
found that equation (16), the i = 0 synchronization
constraint, held. The next several sections present
the results of testing the ICCSA and the Midpoint
Algorithm.
The ICCSA. In reference 6, six constraints are
listed that must be met if the bounding theorem is to
hold for a clock synchronization system executing the
ICCSA (see table I). These constraints include the
skew bounds (C5 and C6), the maximum perceived
clock skew Δ (C4), the maximum clock correction Σ
(C3), the minimum time allocated to the synchronization
process S (C2), and the minimum length of
the synchronization frame R (C1). A synchronization
subsystem based on these constraints must have
the property that a processor can read a remote clock
at a time when the remote processor is not executing
the synchronization process. That is, the remote
clock must be accessible for external reads outside
the scope of its own synchronization process. This is
clearly not the case with the design used in this test.
Because a remote clock is read with the cooperation
of the remote synchronization process, the synchronization
windows must allocate adequate time
before and after the synchronization time Ts in order
to be sure of capturing all good clocks. This time is
at least δ + ε. In these tests, the window was set at
2 times the maximum perceived clock skew Δ, with
the synchronization time Ts in the center of the window
(see fig. 8). Thus, the period R is determined
by the END window register. The START window
register is set to END − 2Δ and the SYNC registers
are set to END − Δ.
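The register settings just described can be sketched as follows. This is a minimal illustration, assuming integer tick values; the class name, function name, and sample numbers are hypothetical and are not taken from the test software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowRegisters:
    start: int  # window opens (START register)
    sync: int   # synchronization time Ts (SYNC registers)
    end: int    # window closes; also fixes the period R (END register)


def window_registers(end: int, delta_max: int) -> WindowRegisters:
    """Center a window of width 2*delta_max on Ts, ending at END."""
    if delta_max < 0 or end < 2 * delta_max:
        raise ValueError("END must leave room for a window of width 2*delta_max")
    return WindowRegisters(start=end - 2 * delta_max,
                           sync=end - delta_max,
                           end=end)


# Hypothetical values: END at 100,000 ticks, maximum perceived skew of 500 ticks.
regs = window_registers(end=100_000, delta_max=500)
print(regs)
```

With these sample values the window opens at 99,000 ticks, Ts falls at 99,500 ticks, and the window closes at 100,000 ticks, so Ts sits in the center of the window as the text requires.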
Table I. Constraints for Old Theory, ICCSA

Constraint definition              Constraint relation
C1: minimum period time            R > 3S
C2: minimum algorithm time         S > Σ
C3: maximum correction             Σ > Δ
C4: maximum perceived skew         Δ > δ + ε + (ρM/2)S
C5: maximum skew                   δ > δ0 + ρM R
C6: maximum skew                   δ > [expression illegible in source; see ref. 6]
Table II. Constraints for New Theory, ICCSA

Constraint definition              Constraint relation
C1: minimum period time            R > S + Σ
C2: minimum algorithm time         S > 2Δ
C3: maximum correction             Σ > ((n − 1)/n)Δ
C4: maximum perceived skew         Δ > δ + ε + ρM Δ
C5: maximum skew                   δ > δ0 + ρM R
C6: maximum skew                   δ > [expression illegible in source]
The constraints as defined for these tests are listed
in table II. The only expression that remains equivalent
to table I is C5. The difference in C4 may be
due to the difference in S as described in the previous
paragraph; C3 defines the maximum correction
possible if all n − 1 clocks return a difference of Δ;
C2 comes directly from the above discussion. Fi-
nally, R must be at least as big as S, with room for a
correction.
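The reasoning behind C3 can be illustrated numerically. The sketch below assumes that the ICCSA correction is the average of the n perceived differences, with the local clock contributing 0, and that every remote reading has already been limited to Δ. The function name and the sample values (n = 4, Δ = 8 ticks) are hypothetical.

```python
def iccsa_correction(remote_differences, delta_max):
    """Average of perceived differences, counting the local clock as 0.

    Assumption: readings are already limited to +/- delta_max, so anything
    larger is treated as a caller error rather than silently adjusted.
    """
    if any(abs(d) > delta_max for d in remote_differences):
        raise ValueError("perceived differences are assumed limited to delta_max")
    n = len(remote_differences) + 1      # remote clocks plus the local clock
    return sum(remote_differences) / n   # the local difference of 0 adds nothing


DELTA = 8                                        # hypothetical, in ticks
worst = iccsa_correction([DELTA] * 3, DELTA)     # all n - 1 = 3 clocks return DELTA
print(worst)                                     # (n - 1)/n * DELTA for n = 4
```

For four clocks the largest correction is three quarters of Δ, which is the quantity that Σ has to accommodate.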
Figure 10 shows, for one series of tests, the bound
for the old theory (table I, C6), the bound as derived
in this paper (table II, C6), and the actual data.
These plots are of maximum clock skew (in ticks)
versus drift rate. The data were taken at large drift
rates with a constant read error of 200. Figure 10(a)
displays zero-fault-tolerant performance (m = 0),
and figure 10(b), single-fault-tolerant performance
(m = 1). The bound as derived in this paper exactly
predicts the performance of the result.
Figure 10. ICCSA test results. Maximum clock skew (ticks) versus drift rate for the old theory, the new theory, and the measured data. (a) m = 0. (b) m = 1.
The Midpoint Algorithm. A theory based
on the Midpoint Algorithm was derived in reference 7
and interpreted in reference 2. Table III lists
the constraints for the old theory in terms of the
symbols used in this paper (see appendix C). Table IV
contains the constraints for the theory for the
Midpoint Algorithm as derived in this paper. The
synchronization process was identical to the ICCSA
with the exception that the Midpoint Algorithm was
executed at Tc. Figure 11 plots the clock skew bound
predicted by the old theory, the theory derived in this
paper, and the actual measured results versus drift
rate. As can be seen, the measured clock skew is
well below that predicted by the new theory. This
is not due to an inaccuracy in the theory, but to an
inability to replicate worst-case conditions with the
clock subsystem. This phenomenon will be explained
in more detail in the section Simulating a Malicious
Liar.
Case Studies
The parameters used in the verification tests are
obviously far worse than can be expected in an
actual system. However, now that the theory has
been verified under these extreme conditions, it is
reasonable to ask what level of performance can
be expected under nominal conditions. The case
studies listed in table V were generated to probe this
area. The case studies deal primarily with read error
and synchronization period, as these are the most
significant contributors to the clock skew.
A read error occurs every time a digital clock is
read. It is believed that the minimum read error that
will be obtainable in most synchronization systems is
1 tick. This tick of read error is added when, as is
the case with the subject clock subsystem, the local
clock is read in response to the strobe generated by
the remote clock. In this case the remote clock is
not actually read, but generates an event signal that,
by definition, occurs at clock time Ts and, therefore,
does not include an error component. A similar
situation would exist if the remote clock were to be
read in response to a request from the local clock
(given that there were no other overhead). Case 1
covers this best-case situation.
Table III. Constraints for Old Theory, Midpoint Algorithm

Constraint definition                  Constraint relation
C1: minimum period time                R > [expression illegible in source]
C1a: required lower bound on δ         [expression illegible in source]
C2: minimum algorithm time             S > Δ
C3: maximum correction                 Σ > ε + Δ
C4: maximum perceived skew             Δ > δ + ε + [drift term illegible in source]
C5: maximum skew                       Assume C6 dominates
C6: maximum skew                       δ > [expression illegible in source]
Table IV. Constraints for New Theory, Midpoint Algorithm

Constraint definition                  Constraint relation
C1: minimum period time                R > S + Σ
C2: minimum algorithm time             S > Δ
C3: maximum correction                 [relation missing in source]
C4: maximum perceived skew             Δ > δ + ε + [drift term illegible in source]
C5: maximum skew                       δ > δ0 + ρM R
C6: maximum skew                       δ > 4ε + 2ρM Δ + 2ρM R
Figure 11. Midpoint Algorithm test results with m = 1. Clock skew versus drift rate.
Table V. Case Study Parameters

Case   Drift rate, ρ   Period, R, ticks   ρR, ticks/period   Read error, ε, ticks
1a     1.00 × 10^-5    1.00 × 10^4         0.1                 1
1b     1.00 × 10^-5    1.00 × 10^5         1.0                 1
1c     1.00 × 10^-5    1.00 × 10^5         1.0                 1
2a     1.00 × 10^-5    4.00 × 10^4         0.4                 4
2b     1.00 × 10^-5    1.00 × 10^5         1.0                 4
2c     1.00 × 10^-5    4.00 × 10^5         4.0                 4
3a     1.00 × 10^-5    1.00 × 10^5         1.0                10
3b     1.00 × 10^-5    1.00 × 10^5         1.0                10
3c     1.00 × 10^-5    1.00 × 10^6        10.0                10
If both the local clock and the remote clock are
read in response to asynchronous events generated by
the processor, then 2 ticks of error would be added to
a clock read. Similarly, 2 ticks of read error can also
be added when a clock is corrected. This is again
due primarily to the asynchronous nature of clock
reads and writes. If the clock correction circuitry
is designed properly, this error will not be incurred.
Case 2 covers the situation when the read error ε
is 4, with 2 ticks added during clock read and clock
correction.
To include a somewhat less than optimal situa-
tion, the read error is set to 10 in case 3.
Each case consists of three subcases where the
drift rate is set so that the accumulated drift over one
period is equal to 1/10 of the read error in subcase "a,"
1.0 tick in subcase "b," and the entire read error in
subcase "c." This leads to two redundant cases (1c
and 3b).
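The subcase rule can be sketched in a few lines. The example below assumes a single drift rate of 1.00 × 10^-5 for every case, which is how table V appears to read, and derives the period R from the target accumulated drift ρR; the function name and row layout are illustrative only.

```python
RHO = 1.0e-5  # drift rate assumed common to every row of table V


def case_study_rows(read_errors=(1, 4, 10), rho=RHO):
    """Build (case, R, rho*R, epsilon) rows from the rule stated in the text."""
    rows = []
    for case, eps in enumerate(read_errors, start=1):
        # Accumulated drift per period for subcases a, b, and c.
        drift_per_period = {"a": eps / 10, "b": 1.0, "c": float(eps)}
        for sub, rho_r in drift_per_period.items():
            period = round(rho_r / rho)  # R in ticks, so that rho * R = rho_r
            rows.append((f"{case}{sub}", period, rho_r, eps))
    return rows


rows = case_study_rows()
for name, period, rho_r, eps in rows:
    print(f"{name}: R = {period:>9,d} ticks, rho*R = {rho_r:>4.1f}, epsilon = {eps}")
```

Running the sketch shows why cases 1c and 3b are redundant: their parameters coincide with those of cases 1b and 3a, respectively.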
Figure 12(a) is a plot of all three cases. Fig-
ure 12(b) plots cases 1 and 2, which represent best-
case conditions. Data are plotted for both the ICCSA
(dashed lines) and the Midpoint Algorithm (solid
lines) and for both zero-fault-tolerant (filled symbols)
and single-fault-tolerant cases (empty symbols).
Figure 12. Case study results. Clock skew by case for the ICCSA and the Midpoint Algorithm with m = 0 and m = 1. (a) Cases 1, 2, and 3. (b) Cases 1 and 2.
Discussion of Results
The results of this study span a broad spectrum
of subject matter including clock algorithm perfor-
mance, design methodology, and techniques of worst-
case testing. The following sections address these
issues.
Clock Algorithm Performance
As can be seen from comparing the fault-free and
single-fault cases in figures 12(a) and 12(b), a per-
formance penalty of 100 percent is paid to protect
the system from faults. It is interesting to note
that this penalty is the same for both algorithms.
If a clock skew dead band is made part of every
communications exchange, then designers must consider
whether they are willing to pay this penalty
to protect the system from a rare form of malicious
behavior.
The equations for the clock skew upper bound
suggest that the component of clock skew due to actual
drift (ρR) can be reduced to an insignificant level
if R is made small enough. This is not thought to be
possible, since, in the absence of read error, no correction
will be made for a series of intervals until a
significant skew has accumulated. A correction will
then be made. This was in fact observed indirectly.
Direct observation was not possible because our system
had 1 tick of read error, minimum.
The indirect observation was made by first taking
one data set with zero additional read error and zero
drift rate. What is observed is the minimum read
error of the system. This was done for several thousand
clock readings, with none exceeding ±1 tick. To
observe the effect of ρR < 1, the same system was
then run with ρR = 0.1. Within this series, occasional
readings of ±2 were observed, thus supporting
the conjecture that the ρR term actually contributes
an amount equivalent to the function ceiling(ρR).
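The conjecture can be illustrated with a toy model. The sketch below is not the experiment: it ignores read error entirely and assumes that a digital clock exposes only whole ticks, so that fractional drift accumulates unseen until a full tick is visible and is then corrected. The function name is hypothetical.

```python
import math
from fractions import Fraction


def observed_drift_ticks(rho_r: Fraction, frames: int):
    """Return (largest whole-tick skew seen, number of corrections made)."""
    skew = Fraction(0)
    largest = 0
    corrections = 0
    for _ in range(frames):
        skew += rho_r                   # drift accumulated over one period
        whole_ticks = math.floor(skew)  # only whole ticks can be observed
        largest = max(largest, whole_ticks)
        if whole_ticks >= 1:            # a visible skew is corrected
            skew -= whole_ticks
            corrections += 1
    return largest, corrections


largest, corrections = observed_drift_ticks(Fraction(1, 10), frames=100)
print(largest, corrections, math.ceil(Fraction(1, 10)))
```

In this model a drift of ρR = 0.1 produces no visible skew for nine frames and a full tick on the tenth, so the drift term shows up as ceiling(ρR) = 1 tick rather than 0.1 tick. Exact fractions are used so that the tenth accumulation reaches exactly one tick.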
The Midpoint Algorithm outperforms the ICCSA
and is the clear choice. Remembering that the
"a" series subcases are hypothetical with ρR < 1,
the next best design is case 1b (ε = 1, ρR = 1),
which yields a single-fault-tolerant skew bound of
6 ticks. While this kind of performance is possible
over dedicated links, it may not be possible to design
a general-purpose communication protocol that can
support both efficient transfer of normal traffic and
very low read error.
If it is necessary to allow for greater read error,
as represented by case 3, the designer has a wider
choice in selecting the synchronization period. In
this case, the use of a minimum synchronization
period (i.e., with ρR = 1) may yield only marginally
tighter clock skews because the read error dominates.
The frequent synchronizations may produce more
overhead on the communications channel than is
saved by virtue of the resultant tighter clock skews.
Design Methodology
One of the areas in which clock synchronization
is used is highly reliable fault-tolerant architectures
such as those in military and commercial aircraft.
The high reliability requirements put on these designs
(probability of failure = 10^-9 per mission) preclude
testing as a means of validating that this requirement
has been met. One of the methods that
has been suggested for this purpose is formal verification.
A formal verification methodology would entail
the use of a specification language and the construction
of a hierarchical theory written in that language
that could be proven to show that the final design
meets the highest level specification. Automated theorem
provers are often used to facilitate this task.
A good example of this method is HDM (ref. 12)
as used on SIFT. Most recently this has matured to
EHDM, which was used by Rushby (ref. 6) to rederive
the clock theory originally invented by Lamport and
Melliar-Smith. In reference 6, Rushby reports that
the rigor enforced by the use of the theorem provers
led to the uncovering of several inconsistencies in the
original, hand-derived theory.
The purpose of experimental verification as reported
in this paper was to demonstrate that the
formal theory was indeed correct. What was found
was that although the theory was correct in that it
predicted a bound that was never violated, the bound
was only a bound and not a model for the actual circuit
performance. With the insight gained by experimentally
observing the behavior of the circuit, it was
possible to derive a more accurate theory. Thus, although
testing cannot be relied upon to verify highly
reliable components, it becomes an integral part of
deriving the theory, which can then be used to predict
the performance of the circuit into the unobservable
regions. While this may sound obvious to those who
have practiced such techniques, it has been observed
that individuals tend to be heavily biased toward either
the "design and debug" or "theorize and prove"
camps.
Figure 13 is an attempt to illustrate an optimal
design methodology. The two axes delineate
time spent testing and theorizing. A vector
DMV is drawn whose length represents design optimality.
It is proposed that the optimality is directly
proportional to the correctness of a design
and inversely proportional to its cost. The locus
of points traced by this vector suggests that if too
much emphasis is placed on either testing or theory,
design optimality suffers and that the optimum
design is reached by applying those techniques
best suited for the particular problem. As demonstrated
in this work, verification of predicted values
of physical quantities is well suited to testing.
Testing will also provide behavioral insight, which
aids in the construction of provable and realizable
theory. As will be seen in the next section, testing
cannot be relied upon to quantify worst-case
behavior.
Figure 13. Design method value. Design method value = DMV = correctness/cost, drawn against testing and theory axes.
Figure 14. Anticipated malicious liar behavior. Time lines for Tp (good processor), Tq (lying processor), and Tr (good processor), with Ts marked.
Simulating a Malicious Liar
To experimentally verify the clock theory, special
circuits were added to the clock peripheral circuitry
to enable the simulation of malicious faults (see
the section "Test augmentation"). During testing
of the ICCSA, the worst-case behavior of a lying
clock was more difficult to simulate than originally
anticipated, and the special circuitry could not be
used to simulate worst-case conditions without great
difficulty. Moreover, for the Midpoint Algorithm,
worst-case conditions could not be simulated at all.
Figure 14 shows the faulty behavior that was
assumed during the design of the test equipment.
The figure illustrates the time line of three processors
p, q, and r, with p and r being good processors and q
being a lying processor. If p is a slow processor with
respect to r, then q would send a synchronization
signal to p just prior to the end of the synchronization
window to give p the perception that it was a good
deal faster than q and thus cause p to apply a
correction that would slow its clock even further.
Conversely, q would signal r at the beginning of the
window and cause r to apply a correction that would
speed up its clock.
In practice, the difficulty with doing this is that
although it is possible to anticipate the beginning
and ending window times for r and p with respect
to q for the first frame, it was observed that worst-
case skew is not obtained until several frames later.
This behavior is illustrated in figure 15. Consider
the case in which processor p uses the ICCSA. Processor
p will read a clock difference of Δ from q in
frame 1. Processor p uses this value as part of the averaging
process to compute the correction. The correction
computed by processor p will thus have an
error of Δ/4 (for four processors). Processor r, on
the other hand, will apply a correction with an equal
but opposite error with the result that the synchronization
windows of p and r have been driven Δ/2
farther apart. Thus, for q to again send worst-case
synchronization signals, it must now take this additional
skew into account, as illustrated by the second
frame in the figure. The correction error would then
become (Δ + Δ/4)/4. The correction is then increasing
by amount Δ/4^k, where k is the frame number.
The skew between p and r would increase until the
additional error becomes insignificant, i.e., Δ < 4^k.
This typically took five frames when large drift rates
made large synchronization windows necessary.
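The frame-by-frame growth described above can be sketched as a short calculation. The example assumes four processors, takes "insignificant" to mean an increment of less than 1 tick, and uses a hypothetical Δ of 1000 ticks; the function names are illustrative.

```python
def liar_error_increments(delta: float, frames: int, n: int = 4):
    """Increment added to the correction error in frames 1..frames (delta / n**k)."""
    return [delta / n ** k for k in range(1, frames + 1)]


def frames_until_insignificant(delta: float, n: int = 4, threshold: float = 1.0):
    """Smallest frame number k at which delta / n**k drops below the threshold."""
    k = 1
    while delta / n ** k >= threshold:
        k += 1
    return k


DELTA = 1000.0  # hypothetical maximum perceived skew, in ticks
increments = liar_error_increments(DELTA, frames=2)
print(increments)                         # frame 1 and frame 2 increments
print(sum(increments))                    # equals (DELTA + DELTA/4) / 4
print(frames_until_insignificant(DELTA))  # first k with DELTA < 4**k
```

For a window of this size the increment falls below 1 tick at the fifth frame, which is in line with the five frames reported for large synchronization windows.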
Figure 15. Observed malicious liar behavior.
It was decided, after having observed this behavior,
to model the malicious behavior from the perspective
of the good processors instead of creating
the erroneous signal on the faulty processor. This
was done by providing the synchronization algorithm
with a parameter that indicated which remote clock
was to be considered a liar and in which direction
it was lying. The good processor then substituted
its START or END window value for the actual
reading of the faulty clock, thus simulating the effect
described above.
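A minimal sketch of this substitution follows. It assumes that a perceived difference is the latched clock value minus the synchronization time Ts, in the manner of equation (8), and that the window registers are set as in the ICCSA tests (START = END − 2Δ, SYNC = END − Δ). The function name, the parameter names, and the sample readings are hypothetical.

```python
def perceived_differences(latched, sync, start, end, liar=None, lies_late=True):
    """Return {remote: latched value - sync}, with a window edge substituted for the liar."""
    differences = {}
    for remote, value in latched.items():
        if remote == liar:
            # Replace the faulty clock's actual reading with a window edge.
            value = end if lies_late else start
        differences[remote] = value - sync
    return differences


END, DELTA = 1000, 50                        # hypothetical register values, in ticks
START, SYNC = END - 2 * DELTA, END - DELTA
latched = {"q": 948, "r": 953, "s": 951}     # hypothetical STORE register contents

late = perceived_differences(latched, SYNC, START, END, liar="q", lies_late=True)
early = perceived_differences(latched, SYNC, START, END, liar="q", lies_late=False)
print(late)
print(early)
```

Substituting END makes the liar appear a full Δ late and substituting START makes it appear a full Δ early, which are the extreme values a reading inside the window can take.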
Worst-case conditions could not be simulated
with the Midpoint Algorithm because of the lack of
sufficient processors to create the necessary conditions.
Worst-case conditions are a combination of
maximum drift, maximum read error, and the presence
of a malicious liar. In the Midpoint Algorithm,
the two outlying clock differences are discarded and
the remaining two averaged (for four processors).
When a malicious liar is present and behaves as described
above, it will cause the fastest and slowest
clocks to include their clock difference readings (0)
in the correction computation. Normally, the fastest
and slowest clocks would be at the extremes and not
be used. The "self" clock readings do not contain
any read error, so that the worst-case skew is not
achieved. In a system of five or more clocks, it would
have been possible to arrange the parameters to create
worst-case conditions.
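The effect on the "self" reading can be seen in a short sketch of the four-processor case. The function below discards the m lowest and m highest differences and takes the midpoint of what remains; for four clocks and m = 1 this is the average of the two middle values described in the text. The generalization to other n and m, the function name, and the sample readings are assumptions made for illustration.

```python
def midpoint_correction(differences, m=1):
    """Discard the m lowest and m highest differences; return the midpoint of the rest."""
    if len(differences) <= 2 * m:
        raise ValueError("need more than 2*m clock differences")
    kept = sorted(differences)[m:len(differences) - m]
    return (kept[0] + kept[-1]) / 2


# Slowest clock p, as seen by itself: its own difference is 0 and the other
# clocks appear ahead of it (hypothetical readings, in ticks).
honest = [0, 1, 2, 3]       # the self reading (0) is an extreme and is discarded
with_liar = [0, 2, 3, -50]  # a liar reports a value below the self reading

print(midpoint_correction(honest))     # average of the middle pair, 1 and 2
print(midpoint_correction(with_liar))  # the self reading (0) is now included
```

With the liar present, the self reading of 0, which carries no read error, takes part in the average, so the full worst-case read error cannot appear in the correction.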
In conclusion, testing cannot be relied upon to
create worst-case behavior. The complex interactions
often confound cursory analysis; the result is that
something other than worst case may be observed,
with the danger then that the system will be designed
around these misleading specifications. Developing a
theory that predicts worst case provides a checking
mechanism that, when the theory prediction does
not match the observation, immediately raises the
question of which is at fault. For a highly reliable
design, these kinds of discrepancies must be known
and resolved.
Concluding Remarks
New theory has been developed and experimen-
tally verified for the Interactive Convergence Clock
Synchronization Algorithm and the Midpoint Algo-
rithm. The Midpoint Algorithm is capable of achiev-
ing tighter synchronization than the Interactive Con-
vergence Clock Synchronization Algorithm. Both
algorithms suffer a 100-percent penalty to protect
against one fault. The new theory outperforms exist-
ing theory that was developed without the benefit of
the insight gained during experimental verification.
However, it is not adequate to rely on testing pro-
cedures to uncover worst-case behavior. Testing and
theory go hand in hand to produce optimal designs.
This is especially true for highly reliable systems.
NASA Langley Research Center
Hampton, VA 23665-5225
May 5, 1992
Appendix A
Proving Equations (10), (12), (13),
and (14)
Proving Equation (10)
To prove equation (10), that is,

\[ \Delta_{qp} = -\delta_{qp}(T_s) \pm \epsilon + \rho_p \Delta_{qp} \tag{10} \]

we start with the definition of Δqp, equation (8),
which is

\[ \Delta_{qp} = T_{qp} - T_s \tag{8} \]

and using equation (9) to expand Tqp and expressing
Ts as the value of clock p at a real time when clock p
reads Ts, we get

\[ \Delta_{qp} = T_p(t_q(T_s)) - T_p(t_p(T_s)) \]

Using equation (3) to expand the clock functions Tp
and realizing that the second term incurs no read
error, we have

[equation not legible in source]

Finally, combining terms and using equation (7), we
get

\[ \Delta_{qp} = \frac{-\delta_{qp}(T_s) \pm \epsilon}{1 - \rho_p} = -\delta_{qp}(T_s) \pm \epsilon + \rho_p \Delta_{qp} \]
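The last equality can be spelled out by clearing the denominator. The steps below are ordinary algebra on the quotient form of Δqp, taken here to read (−δqp(Ts) ± ε)/(1 − ρp), and introduce no new symbols.

```latex
\begin{align*}
\Delta_{qp} &= \frac{-\delta_{qp}(T_s) \pm \epsilon}{1 - \rho_p} \\
(1 - \rho_p)\,\Delta_{qp} &= -\delta_{qp}(T_s) \pm \epsilon \\
\Delta_{qp} - \rho_p \Delta_{qp} &= -\delta_{qp}(T_s) \pm \epsilon \\
\Delta_{qp} &= -\delta_{qp}(T_s) \pm \epsilon + \rho_p \Delta_{qp}
\end{align*}
```

The result is implicit in Δqp, which is why the ρΔ products are later treated as small terms that can be bounded or ignored.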
Proving Equation (12)
To prove equation (12), that is,

\[ \delta_{qp}(T + C) = \delta_{qp}(T) + \rho_{qp} C \tag{12} \]

we start with equation (7) and substitute equation (1)
as follows:

\begin{align*}
\delta_{qp}(T + C) &= t_p(T + C) - t_q(T + C) \\
&= [(1 - \rho_p)(T + C) + t_{0p}] - [(1 - \rho_q)(T + C) + t_{0q}] \\
&= \delta_{qp}(T) + (\rho_q - \rho_p) C \\
&= \delta_{qp}(T) + \rho_{qp} C
\end{align*}
Proving Equation (13)
To prove equation (13), that is,

\[ \delta_{qp}(T) = \rho_{qp}(T - T_c) + \delta_{qp}(T_c) \tag{13} \]

we rearrange equation (12) and substitute C = −(T − Tc).
Proving Equation (14)
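To prove equation (14), that is,

\[ \delta_{rq}(T) - \delta_{rp}(T) = \delta_{pq}(T) \tag{14} \]

we use equation (7) and write

\begin{align*}
\delta_{rq}(T) - \delta_{rp}(T) &= [t_q(T) - t_r(T)] - [t_p(T) - t_r(T)] \\
&= t_q(T) - t_p(T) \\
&= \delta_{pq}(T) = -\delta_{qp}(T)
\end{align*}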
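Equations (12), (13), and (14) can be checked numerically with the linear clock model used in the proof of equation (12). The sketch below assumes that δqp(T) = tp(T) − tq(T) and that ρqp = ρq − ρp, both read off the derivations above; the drift rates, offsets, and function names are hypothetical.

```python
import math


def real_time(T, rho, t0):
    """Real time at which a clock with drift rho and offset t0 reads T."""
    return (1.0 - rho) * T + t0


# Hypothetical (drift rate, offset) pairs for three clocks p, q, and r.
CLOCKS = {"p": (1.0e-5, 0.0), "q": (-2.0e-5, 3.0), "r": (3.0e-5, -1.5)}


def delta(a, b, T):
    """delta_ab(T) = t_b(T) - t_a(T), the convention used in the proof of (12)."""
    return real_time(T, *CLOCKS[b]) - real_time(T, *CLOCKS[a])


def rho_rel(a, b):
    """rho_ab = rho_a - rho_b, as in the last step of the proof of (12)."""
    return CLOCKS[a][0] - CLOCKS[b][0]


def identities_hold(T=1000.0, C=250.0, Tc=400.0, tol=1e-9):
    """Evaluate both sides of equations (12), (13), and (14)."""
    eq12 = math.isclose(delta("q", "p", T + C),
                        delta("q", "p", T) + rho_rel("q", "p") * C, abs_tol=tol)
    eq13 = math.isclose(delta("q", "p", T),
                        rho_rel("q", "p") * (T - Tc) + delta("q", "p", Tc), abs_tol=tol)
    eq14 = math.isclose(delta("r", "q", T) - delta("r", "p", T),
                        delta("p", "q", T), abs_tol=tol)
    return eq12, eq13, eq14


print(identities_hold())
```

All three comparisons agree to within floating-point tolerance for these sample clocks, which is consistent with the sign conventions adopted above.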
Appendix B
The Expansion of Xp − Xq

We expand the term Xp − Xq:

\[ X_p - X_q = \frac{1}{n} \sum_{r=1}^{n-2-m} (\Delta_{rp} - \Delta_{rq}) + \frac{1}{n} (\Delta_{pp} - \Delta_{pq}) + \frac{1}{n} (\Delta_{qp} - \Delta_{qq}) + \frac{m}{n} (V_p - V_q) \tag{B1} \]

This expression contains four terms that can be
considered in three groups. The first term represents
the (n − 2 − m) good processors. The second and
third terms represent good processors, one of which
is a local clock. The fourth term represents readings
taken from bad processors.
The Good Processors
We will first reduce the term Δrp − Δrq using
equation (10).

\begin{align*}
\Delta_{rp} - \Delta_{rq} &\le [\delta_{rq}(T_s) - \delta_{rp}(T_s)] + 2\epsilon + (\rho_p \Delta_{rp} - \rho_q \Delta_{rq}) \\
&\le [\delta_{rq}(T_s) - \delta_{rp}(T_s)] + 2\epsilon \tag{B2}
\end{align*}

Here, as before, the difference between the ρΔ values
is ignored. These results are replaced in the sum (B1)
and simplified as follows:

\[ \frac{1}{n} \sum_{r=1}^{n-2-m} (\Delta_{rp} - \Delta_{rq}) \le \frac{1}{n} \sum_{r=1}^{n-2-m} [\delta_{rq}(T_s) - \delta_{rp}(T_s)] + \frac{2(n - 2 - m)}{n} \epsilon \tag{B3} \]
The Local Processors
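Taking the two terms that include local processor
readings, we write

\begin{align*}
\frac{1}{n} (\Delta_{pp} - \Delta_{pq}) + \frac{1}{n} (\Delta_{qp} - \Delta_{qq})
&= \frac{1}{n} \{ [-\delta_{pp}(T_s)] - [-\delta_{pq}(T_s) \pm \epsilon + \rho_q \Delta_{pq}] \} \\
&\quad + \frac{1}{n} \{ [-\delta_{qp}(T_s) \pm \epsilon + \rho_p \Delta_{qp}] - [-\delta_{qq}(T_s)] \} \\
&= \frac{1}{n} \{ [-\delta_{pp}(T_s)] - [-\delta_{pq}(T_s) \pm \epsilon] \} \\
&\quad + \frac{1}{n} \{ [-\delta_{qp}(T_s) \pm \epsilon] - [-\delta_{qq}(T_s)] \} \\
&\quad + \frac{1}{n} (\rho_p \Delta_{qp} - \rho_q \Delta_{pq})
\end{align*}

Finally, ignoring the difference between the ρΔ terms
we obtain

\begin{align*}
\frac{1}{n} (\Delta_{pp} - \Delta_{pq}) + \frac{1}{n} (\Delta_{qp} - \Delta_{qq})
&= \frac{1}{n} \{ [-\delta_{pp}(T_s)] - [-\delta_{pq}(T_s)] \} \\
&\quad + \frac{1}{n} \{ [-\delta_{qp}(T_s)] - [-\delta_{qq}(T_s)] \} + \frac{2}{n} \epsilon \tag{B4}
\end{align*}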
Good Plus Local Processors
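Combining equations (B3) and (B4) by replacing
the first two terms on the right-hand side of equation (B4)
in the summation of equation (B3), we get

\[ (X_p - X_q)_{\mathrm{Good + Local}} \le \frac{1}{n} \sum_{r=1}^{n-m} [\delta_{rq}(T_s) - \delta_{rp}(T_s)] + \frac{2(n - 1 - m)}{n} \epsilon \tag{B5} \]

Taking a closer look at the expression within the
summation we have, with Tc = Ts + Δ,

\[ \delta_{rq}(T_s) - \delta_{rp}(T_s) = \delta_{rq}(T_c - \Delta) - \delta_{rp}(T_c - \Delta) \]

Now using equations (12) and (5), we get

\[ \delta_{rq}(T_s) - \delta_{rp}(T_s) = \delta_{rq}(T_c) - \delta_{rp}(T_c) + \rho_{qp} \Delta \]

Finally, application of equation (14) yields

\[ \delta_{rq}(T_s) - \delta_{rp}(T_s) = -\delta_{qp}(T_c) + \rho_{qp} \Delta \tag{B6} \]

Now, substituting equation (B6) into equation (B5),

\[ (X_p - X_q)_{\mathrm{Good + Local}} \le -\frac{n - m}{n} \delta_{qp}(T_c) + \frac{2(n - 1 - m)}{n} \epsilon + \rho_M \frac{n - m}{n} \Delta \tag{B7} \]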
The Bad Processors
Recalling from the ICCSA that all perceived
skews are limited to a maximum of Δ we have

\[ \frac{m}{n} (V_p - V_q) \le \frac{2m}{n} \Delta \tag{B8} \]
Good Plus Local Plus Bad Processors
Using equations (B7) and (B8) in the original
expression gives

\[ X_p - X_q \le -\frac{n - m}{n} \delta_{qp}(T_c) + \frac{2(n - 1 - m)}{n} \epsilon + \rho_M \frac{n - m}{n} \Delta + \frac{2m}{n} \Delta \]
REPORT DOCUMENTATION PAGE
Report date: July 1992
Report type and dates covered: Technical Paper
Title and subtitle: Experimental Validation of Clock Synchronization Algorithms
Funding numbers: WU 505-64-10-07
Author(s): Daniel L. Palumbo and R. Lynn Graham
Performing organization name and address: NASA Langley Research Center, Hampton, VA 23665-5225
Performing organization report number: L-17015
Sponsoring/monitoring agency name and address: National Aeronautics and Space Administration, Washington, DC 20546-0001
Sponsoring/monitoring agency report number: NASA TP-3209
Supplementary notes: Palumbo: Langley Research Center, Hampton, VA; Graham: PRC Kentron, Inc., Hampton, VA.
Distribution/availability statement: Unclassified-Unlimited; Subject Category 62
Abstract (maximum 200 words):
The objective of this work is to validate mathematically derived clock synchronization theories and their
associated algorithms through experiment. Two theories are considered, the Interactive Convergence Clock
Synchronization Algorithm and the Midpoint Algorithm. Special clock circuitry was designed and built so that
several operating conditions and failure modes (including malicious failures) could be tested. Both theories
are shown to predict conservative upper bounds (i.e., measured values of clock skew were always less than the
theory prediction). Insight gained during experimentation led to alternative derivations of the theories. These
new theories accurately predict the behavior of the clock system. It is found that a 100-percent penalty is paid
to tolerate worst-case failures. It is also shown that under optimal conditions (with minimum error and no
failures) the clock skew can be as much as three clock ticks. Clock skew grows to six clock ticks when failures
are present. Finally, it is concluded that one cannot rely solely on test procedures or theoretical analysis to
predict worst-case conditions.
Subject terms: Clock synchronization; Formal methods; Verification; Validation
Number of pages: 22
Price code: A03
Security classification of report: Unclassified
Security classification of this page: Unclassified
NASA-Langley, 1992
