Clocking and synchronization within a fault-tolerant multiprocessor by Krauss, H. R.
REPRODUCED BY
NATIONAL TECHNICAL INFORMATION SERVICE
U.S. DEPARTMENT OF COMMERCE
SPRINGFIELD, VA. 22161
https://ntrs.nasa.gov/search.jsp?R=19770074744 2020-03-22T06:35:00+00:00Z
CLOCKING AND SYNCHRONIZATION WITHIN A
FAULT-TOLERANT MULTIPROCESSOR
by
Howard R. Krauss
B.E., The Cooper Union
(1971)
SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 1972
Signature of Author
Department of Aeronautics and Astronautics
May 1972
Certified by
Accepted by
Chairman, Departmental Committee on Graduate Students
CLOCKING AND SYNCHRONIZATION WITHIN
A FAULT-TOLERANT MULTIPROCESSOR
by
Howard R. Krauss
Submitted to the Department of Aeronautics and Astronautics, May, 1972, in partial fulfillment of the requirements for the Degree of Master of Science.
ABSTRACT 
In this thesis the synchronization requirements of a fault-tolerant multiprocessor are defined and methods of maintaining synchronism are developed. It is demonstrated that a synchronous fault-tolerant multiprocessor driven by a fault-tolerant clock is more efficient and more easily implemented than an asynchronous fault-tolerant multiprocessor.
Fault-tolerant clocking has been examined intensively here. From fault-tolerance requirements and the established multiprocessor synchronization requirements, general specifications are developed for a fault-tolerant clock. Two general methods of design have been explored, and it has been concluded that if the clock is to be distributed to many modules, fault-tolerant clocking through the concepts advanced by William Daly and John McKenna of the C.S. Draper Laboratory is more practical to implement than fault-tolerant clocking through failure-detection and subsequent clock substitution. Clocks developed by Daly and McKenna have been examined, refined, and revised. It is demonstrated that it is desirable to have available a fault-tolerant clock which runs at 20 MHz, but that such a frequency is not achievable by a McKenna-type clock (with use of current technology). A method of using a relatively slow McKenna-type clock in conjunction with a frequency multiplier is developed. Also, analog phase-locking techniques are shown to be unsuitable for the design of a fault-tolerant clock.
Thesis Supervisor: Albert L. Hopkins, Jr.
Title: Associate Professor of Aeronautics and Astronautics
ACKNOWLEDGEMENTS
The author wishes to express his gratitude, first and foremost, to Dr. Albert L. Hopkins for the guidance and motivation he provided throughout the production of this thesis. Second, he is grateful to John McKenna for having reviewed the work on the McKenna Clock and for the suggestions he made. Also, he is indebted to Mary Shamlian for her aid in typing the text of this thesis.
Finally, the author wishes to offer his thanks to Eileen Hack as well as the many other friends and acquaintances for their parts in filling his life.
This report was prepared under DSR Project 55-23890, sponsored by the Manned Spacecraft Center of the National Aeronautics and Space Administration through Contract NAS 9-4065.
The publication of this report does not constitute approval by the Charles Stark Draper Laboratory or the National Aeronautics and Space Administration of the findings or the conclusions contained herein. It is published only for the exchange and stimulation of ideas.
TABLE OF CONTENTS
CHAPTER 1 FAULT-TOLERANT MULTIPROCESSING
1.1 Introduction
1.2 General Characteristics
1.2.1 Fault-Tolerance
1.3 Particular Configuration
1.3.1 Achievement of Fault-Tolerance Within the Regional Computer
1.3.2 Achievement of Fault-Tolerance Within the Local Processor
CHAPTER 2 SYNCHRONIZATION
2.1 Definitions and Requirements
2.2 Loss of Synchronization
2.3 A Synchronized System With Unsynchronized Elements
2.4 System Synchronization Through Use of a Common Clock
2.5 Conclusions
CHAPTER 3 CLOCKING
3.1 Specifications of a Fault-Tolerant Clock
3.2 General Methods of Design
3.3 Fault-Tolerant Clocking Through Failure-Detection and Subsequent Clock Substitution
3.4 The McKenna Clock
3.4.1 First Concept
3.4.2 Current Concept
3.5 Speeding Up the McKenna Clock
3.5.1 Advantages of Greater Speed
3.5.2 Application of Advanced Device Technology
3.5.3 Revised Circuit
3.5.4 Increased Speed by Frequency Multiplication
3.6 Methods of Synchronization Used in Pulse-Code Modulation
CHAPTER 4 CONCLUSIONS
CHAPTER 1
FAULT-TOLERANT MULTIPROCESSING
1.1 Introduction 
The concept of a fault-tolerant multiprocessor was developed and explored at the C.S. Draper Laboratory as a method of satisfying future spacecraft guidance requirements. Future space vehicles will require the handling of additional control loops, and as missions become more complex and/or lengthier, greater reliability is required. In a proposal to NASA, particular emphasis was placed on the application of a fault-tolerant multiprocessor as a space shuttle guidance computer.
1.2 General Characteristics 
The essential elements of a multiprocessor are two or more processors capable of simultaneously executing different programs (or the same programs) and a common memory accessible by all processors. This collection of units has a single path for input-output communication. A conceptual diagram is shown in Fig. 1.1.
Because of the parallel operation of the individual processors, there is a significant increase in computational capability. This parallelism lends itself very well to the requirement of simultaneous control of many loops.
Fig. 1.1 Generalized Multiprocessor
Increased reliability is also inherent within the multiprocessing concept: processors are alike, and hence any processor is capable of performing within any control loop at a given time. It is this modularity which is the fundamental idea behind fault-tolerant multiprocessing.
1.2.1 Fault-Tolerance
In general, fault-tolerance is achieved through coded redundancy, replicated redundancy, or a combination of the two. Except in special instances, redundancy by replication is more reliable and more simply implemented (Ref. 1). Within the multiprocessor under study, fault-tolerance is achieved through comparison and/or voting amongst replicated units. Some basic assumptions in the design of this system are: (1) failures are independent of one another; (2) the same error will not be made at the same time by two elements which are in a comparison or voting scheme; and (3) multiple errors will not occur so as to outwit the fault-tolerant scheme (roughly equivalent to saying that errors will be separated by some minimum time). In a system which operates such that failures are independent of one another, the probability of assumptions nos. 2 and 3 being violated is extremely small.
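Comparison and voting amongst replicated units can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Draper Laboratory hardware: outputs are modeled as integers, and `compare` (a duplex comparator) and `majority_vote` (a voter over any number of replicated units) are hypothetical names. The last usage line shows why assumption (2) matters: if two of three units make the same error at the same time, the voter passes the wrong value.

```python
from collections import Counter
from typing import Optional, Sequence


def compare(a: int, b: int) -> bool:
    """Duplex comparison: True when the two replicated outputs agree."""
    return a == b


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Value produced by a strict majority of replicated units, else None."""
    if not outputs:
        raise ValueError("no replicated outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    return value if 2 * count > len(outputs) else None


print(majority_vote([0x5A, 0x5A, 0x13]))  # 90: the one failed unit is outvoted
print(compare(0x5A, 0x13))                # False: a duplex pair detects the error
print(majority_vote([0x13, 0x13, 0x5A]))  # 19: a common error violates assumption (2)
```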
1.3 Particular Configuration
Figure 1.2 is a representation of the data management system recommended for the Space Shuttle. The system is designed to meet a fail operational, fail operational, fail safe (FOFOFS) specification; by this specification it is meant that the system will maintain its performance capabilities after the occurrence of any two failures and, as a result of a third failure, will in the worst case suffer a graceful degradation to a configuration which can still assure safe control of the vehicle. The system is hierarchical. The many sensors and effectors compose the lowest level. Next up in the hierarchy, the local processors transform between the serial-multiplex format of the data bus and the sensor-effector formats; they are also charged with assuming those burdens which would unnecessarily overload the top level of the system. At the top, the regional computer provides data processing services to the entire system and manages interactions between subsystems.
Fig. 1.2 Assumed System Configuration (Space Shuttle vehicle avionics system configuration)
The use of multiprocessing techniques in the regional computer (RC) serves both to achieve fault-tolerance and to yield a larger throughput than would be possible with a simplex machine. Within the local processor (LP), duplication of processors is used solely as a tool for the achievement of fault-tolerance.
1.3.1 Achievement of Fault-Tolerance Within the Regional Computer
The regional computer multiprocessor configuration recommended by the Draper Lab is shown in Fig. 1.3. Each Central Processing Unit (CPU) consists of two processors and a triplicated scratchpad memory which stores local temporary data and performs input/output (I/O) buffering. The memory, memory bus, and data bus are each redundant. The memory may be accessed by only one CPU at a time, and only one CPU (or LP) may be transmitting on the data bus at any time.
Fig. 1.3 Regional Computer Multiprocessor (memory bus and I/O bus quadruply redundant; P: processor, M: memory)
Error detection is achieved by comparing the outputs of the two processors, which run identical programs. The detection of a CPU error triggers Single Instruction Restart (SIR), which consists of moving the scratchpad contents of the "failed" unit into memory and subsequently loading this information into the next available "healthy" CPU, whereupon the failed job is resumed. The recovery is transparent to the software.
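A toy model may make the Single Instruction Restart sequence concrete. The sketch assumes that each instruction is a pure function on a scratchpad dictionary and that a CPU is a pair of processor functions whose results are compared before being committed; the names `CPU`, `run_with_sir`, `good`, and `stuck_bit` are hypothetical, and timing, the triplicated scratchpad, and bus arbitration are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]                            # scratchpad contents
Instruction = Callable[[State], State]            # one instruction of a job
Processor = Callable[[Instruction, State], State]


def good(instr: Instruction, state: State) -> State:
    """A healthy processor executes the instruction on a copy of the state."""
    return instr(dict(state))


def stuck_bit(instr: Instruction, state: State) -> State:
    """A faulty processor (hypothetical fault): flips bit 0 of 'acc'."""
    out = instr(dict(state))
    out["acc"] = out.get("acc", 0) ^ 1
    return out


@dataclass
class CPU:
    name: str
    proc_a: Processor          # the two processors run identical programs
    proc_b: Processor
    healthy: bool = True
    scratchpad: State = field(default_factory=dict)

    def step(self, instr: Instruction) -> bool:
        """Execute one instruction on both processors; commit only on agreement."""
        out_a = self.proc_a(instr, self.scratchpad)
        out_b = self.proc_b(instr, self.scratchpad)
        if out_a != out_b:
            return False       # CPU error: scratchpad keeps the last good state
        self.scratchpad = out_a
        return True


def run_with_sir(job: List[Instruction], cpus: List[CPU],
                 memory: Dict[str, State]) -> State:
    cpu = next(c for c in cpus if c.healthy)
    pc = 0
    while pc < len(job):
        if cpu.step(job[pc]):
            pc += 1
            continue
        # Single Instruction Restart (raises StopIteration if no healthy CPU is left):
        cpu.healthy = False
        memory["saved_scratchpad"] = dict(cpu.scratchpad)   # scratchpad -> memory
        cpu = next(c for c in cpus if c.healthy)             # next healthy CPU
        cpu.scratchpad = dict(memory["saved_scratchpad"])   # memory -> new CPU
        # pc is unchanged, so the failed instruction is simply re-executed.
    return cpu.scratchpad


def increment(s: State) -> State:
    return {**s, "acc": s.get("acc", 0) + 1}


cpus = [CPU("CPU1", good, stuck_bit), CPU("CPU2", good, good)]
print(run_with_sir([increment] * 3, cpus, {}))   # {'acc': 3}
print(cpus[0].healthy)                           # False
```

The job list itself never changes, which mirrors the statement that the recovery is transparent to the software.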
Fault-tolerance requirements are met by providing sufficient CPUs such that, after an established number have failed, the remainder can provide the necessary response speed and throughput to meet the system requirements. Thus, there is the advantage of extra processing capability before any CPU fails.
1.3.2 Achievement of Fault-Tolerance Within the Local Processor
Depending upon system requirements, a local processor may be simplex, duplex, or triplex. In any case, error detection is achieved as in a CPU: each LP unit contains two processors which perform identical operations and compare outputs. Fault-tolerance is not achieved through a restart mechanism, however. In the case of a duplex or triplex LP, local fault-tolerance is achieved by keeping two LP units in synchronism. One feeds the data bus and the other has its output blocked; in the event of a failure of an LP unit, its output is blocked and the other's is enabled. It is because of the differences amongst local processors serving different sensor-effector systems, as well as a desire to limit data bus use, that this mechanism for achieving fault-tolerance is used rather than an SIR.
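The LP output-switching rule reduces to a selection over the units' failure flags. In this hypothetical sketch, `failed[i]` is the result of unit i's internal duplex comparison; the assumption that the lowest-numbered healthy unit is the one that drives the bus, and the generalization to more than two units, are choices made here and are not specified in the text.

```python
from typing import List


def output_enables(failed: List[bool]) -> List[bool]:
    """Which LP units may drive the data bus.

    All units keep running in synchronism; exactly one healthy unit (the
    lowest-numbered, by assumption) has its output enabled, and every other
    unit, including any failed one, has its output blocked.
    """
    enables = [False] * len(failed)
    for i, has_failed in enumerate(failed):
        if not has_failed:
            enables[i] = True
            break
    return enables


print(output_enables([False, False]))  # [True, False]: unit 1 feeds the bus
print(output_enables([True, False]))   # [False, True]: unit 1 failed, unit 2 enabled
```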
CHAPTER 2
SYNCHRONIZATION
2.1 Definitions and Requirements 
If fault-tolerance is to be achieved through comparison and/or voting amongst replicated units, then there are two requirements of operation which establish a need for synchronization. First, in order for the comparison to be effective, corresponding output information from each unit must be compared. Second, in order to assure equality of internal operations, accessed input information must be equal at corresponding program points. It should be noted that although synchronization is required, neither simultaneous production of corresponding information nor simultaneous performance of corresponding internal operations is required. It is best to discard the notion that two events can be made to occur at the same time; in a real system there must always be finite tolerances in the "simultaneous" initiation of events. It is fortunate that the synchronization requirements do not call for simultaneity, but, as shall be seen in Section 2.2, the impossibility of assuring simultaneity gives rise to difficulties in assuring synchronization.
2.2 Loss of Synchronization
Assuming that several modules have been synchronized, loss of synchronization may be caused by a slivering of pulses. A sliver can occur when two independent observers strobe a common signal while it is undergoing a transition. Because of differences in gate thresholds and propagation delays, the observers may see different values of the signal. Slivering can cause differences in the sequences of internal operations (leading to uncorrelated outputs) or in the outputs of replicated comparators.
If the input or the event being strobed is phase-locked with the strobe (i.e., dependent on the same clock), then slivering may be avoided (barring the event of a failure) through good design techniques. If, however, the observed signal is produced asynchronously, or more generally, if the production of the observed signal is uncorrelated with the strobe or the call for data, then anti-sliver circuits must be utilized as necessary.
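The mechanism behind a sliver can be illustrated numerically. The sketch assumes a linear 0 V to 5 V transition with a 10 ns rise time and two observers with different, hypothetical gate thresholds and propagation delays; the values are illustrative only and are not taken from any particular logic family.

```python
from typing import List, Tuple

RISE_NS = 10.0   # assumed rise time of the transition
V_HIGH = 5.0     # assumed logic-1 level


def signal_voltage(t_ns: float, t_edge_ns: float) -> float:
    """Voltage of a 0 -> 1 transition beginning at t_edge_ns (linear ramp)."""
    if t_ns <= t_edge_ns:
        return 0.0
    if t_ns >= t_edge_ns + RISE_NS:
        return V_HIGH
    return V_HIGH * (t_ns - t_edge_ns) / RISE_NS


def strobe(t_strobe_ns: float, t_edge_ns: float,
           observers: List[Tuple[float, float]]) -> List[int]:
    """Logic value seen by each (threshold_V, delay_ns) observer at one strobe.

    Each observer sees the signal as it was delay_ns earlier and reads a 1
    only if that voltage is at or above its own gate threshold.
    """
    return [int(signal_voltage(t_strobe_ns - delay, t_edge_ns) >= threshold)
            for threshold, delay in observers]


OBSERVERS = [(1.2, 1.0), (1.8, 2.0)]   # hypothetical gates of two modules
print(strobe(-5.0, 0.0, OBSERVERS))    # [0, 0]: before the transition
print(strobe(5.0, 0.0, OBSERVERS))     # [1, 0]: strobed mid-transition, a sliver
print(strobe(20.0, 0.0, OBSERVERS))    # [1, 1]: after the signal has settled
```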
Current designs for anti-sliver circuits require the presence of a two-phase clock. A two-phase clock may be simply produced as illustrated in Fig. 2.1. The pulses of the two phases are mutually exclusive. The application of this clock to an anti-sliver circuit is illustrated in Fig. 2.2. The event pulse is stored in the first buffer; as illustrated, either the concurrent (as in Case 1) or the next (as in Case 2) phase A pulse will cause the event to be stored in the second buffer, which is strobed, after settling, by a phase B pulse, thereby feeding a healthy signal to both units.
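The behaviour of the two-phase clock and anti-sliver circuit can be approximated with a pulse-level simulation. Because the gate-level details of Figs. 2.1 and 2.2 are not reproduced here, the sketch assumes that phases A and B are obtained by steering alternate master-clock pulses (for example with a toggle flip-flop), and models the unpredictable Case 1/Case 2 outcome with a random number generator. `direct_strobe`, which has each unit sample the raw event itself, is included for comparison; all function names are hypothetical.

```python
import random
from typing import List, Tuple


def two_phase(n_master_pulses: int) -> List[str]:
    """Assign each master-clock pulse to phase A or phase B, alternately.

    Assumed construction: no pulse belongs to both phases, so the two
    phases are mutually exclusive.
    """
    return ["A" if k % 2 == 0 else "B" for k in range(n_master_pulses)]


def anti_sliver(event_pulse: int, n_pulses: int,
                rng: random.Random) -> List[Tuple[int, int]]:
    """(unit 1, unit 2) readings at each phase-B pulse, with the circuit."""
    ff_a = ff_b = 0
    readings: List[Tuple[int, int]] = []
    for k, phase in enumerate(two_phase(n_pulses)):
        if k == event_pulse:
            ff_a = 1                       # first buffer stores the event
        if phase == "A":
            if k == event_pulse and rng.random() < 0.5:
                continue                   # Case 2: missed; next A pulse catches it
            ff_b = ff_a                    # Case 1, or an ordinary transfer
        else:
            # Both units strobe FF B, which changes only during phase A and
            # has settled, so they necessarily read the same value.
            readings.append((ff_b, ff_b))
    return readings


def direct_strobe(event_pulse: int, n_pulses: int,
                  rng: random.Random) -> List[Tuple[int, int]]:
    """Without the circuit: each unit samples the raw event at phase B."""
    seen = [0, 0]
    readings: List[Tuple[int, int]] = []
    for k, phase in enumerate(two_phase(n_pulses)):
        if phase != "B":
            continue
        for unit in range(2):
            if k == event_pulse:           # event in transition: each unit
                seen[unit] = int(rng.random() < 0.5)   # resolves it alone
            elif k > event_pulse:
                seen[unit] = 1
        readings.append((seen[0], seen[1]))
    return readings


rng = random.Random(7)
print(anti_sliver(4, 12, rng))     # every pair is equal
print(direct_strobe(5, 12, rng))   # the pair at the arrival pulse may differ
```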
In a fault-tolerant system, when a replicated group of modules is receiving replicated information asynchronously, an anti-sliver unanimity circuit may be used to maintain synchronization. Such a circuit is illustrated in Fig. 2.3. The transmitted information from the A_i's to the B_i's is sliver-free, and all the B_i's receive synchronized information from all the A_i's at the same time, so that an accurate vote may be taken and so that the B_i's remain in synchronism. Each unanimity circuit waits long enough to accumulate all the A_i's, but not so long as to impair operation due to a failure of one of the A_i's.
Fig. 2.1 Two-Phase Clock (waveforms: Clock, Phase A, Phase B)
Fig. 2.2 Anti-Sliver Circuit (Event, FF A, FF B, Phase A, Phase B; outputs to Unit 1 and Unit 2; Case 1 and Case 2)
Fig. 2.3 Anti-Sliver Unanimity Circuit
The timing chart in Fig. 2.4 illustrates a possible sequence of events for the case of no failure. The event to be transmitted to the B_i's is A_i → 1. The example shown in Fig. 2.4 indicates that the event, A_i → 1, occurs first in A_3, second in A_1, and last in A_2. Each event, A_i → 1, is held in the corresponding flip-flop, FFA_i. Concurrent with the first Phase-A pulse shown, only FFA_1 and FFA_3 are at logic-level-1; the flip-flops A'_1 and A'_3 are set during the first Phase-A pulse. The setting of the flip-flop A'_2 occurs during the second Phase-A pulse. As can be seen in Fig. 2.3, each flip-flop A'_i is fed to each of the majority and comparator elements; the output of Maj(A_i) is logic-1 when a majority of the inputs is at logic-1, and the output of Comp(A_i, M) is logic-1 only when all the inputs are at logic-1. In this example, Maj(A_i) → 1 shortly after the first Phase-A pulse, while Comp(A_i, M) does not go to logic-1 until the occurrence of the second Phase-A pulse. The occurrence of Maj(A_i) → 1 causes the counter of Phase-B pulses to be reset to zero, causes the flip-flop FFM_i to be set at logic-1, and also feeds the delay element Δ. The occurrence of Comp(A_i, M) → 1 causes FFM_i to be reset (here, before the counter reaches an all-1's state, preventing the propagation of an error signal), and causes an output signal to be propagated to a …
Fig. 2.4 Response of an anti-sliver unanimity circuit in the event of no failure (traces: Phase A, A_i, A'_3, Comp(A_i, M), Phase B, Out_i)
Fig. 2.5 illustrates a possible sequence for the case of a failure of one of the A_i's. Here A_1 is described as having failed to produce a bit of information (pulse). A_3 is shown to precede A_2, such that the events are passed on to the majority and comparator elements separated by one Phase-A period. When flip-flop A'_2 goes to logic-1, A'_3 has already been set, and the majority elements go to logic-1; the comparator elements, however, remain at logic-0, as flip-flop A'_1 was not set. Again, as a result of Maj(A_i) → 1, the counter is reset, FFM_i is set, and the delay element is fed. It should be noted that the time-delay element, Δ, is used in order to prevent the false indication of an error when an all-1's state is indicated by the counter just prior to reset. The delay time required is dependent on clock frequency and on the propagation delay between a reset command and a response (assuming the counter was in an all-1's state) at the input of the error-indicating AND gate. In the example shown in Fig. 2.5 it is assumed that a 2-bit counter is used, which first reaches its all-1's state (binary 11) on the third count after reset; hence, shortly after the occurrence of the third Phase-B pulse succeeding the transition Maj(A_i) → 1, each line indicates both an output signal and an error.
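The sequences of Figs. 2.4 and 2.5 can be reproduced with a period-by-period model of one B_i. This is a simplified sketch under stated assumptions: each period is one Phase-A pulse followed by one Phase-B pulse, the Phase-B counter runs continuously and wraps, and the delay element Δ is not modeled, because in this discrete model the counter reset always takes effect before the error gate is evaluated. `set_at[j]` is the (hypothetical) index of the Phase-A pulse that sets A'_(j+1), with `None` for a failed A_(j+1); periods are counted from 0.

```python
from typing import List, Optional, Tuple


def unanimity_circuit(set_at: List[Optional[int]], n_periods: int,
                      counter_bits: int = 2) -> Tuple[Optional[int], Optional[int]]:
    """Period in which Out_i and Error_i are first asserted (None if never)."""
    all_ones = (1 << counter_bits) - 1
    a_prime = [0] * len(set_at)
    counter, ffm, maj_prev = 0, 0, False
    out_at: Optional[int] = None
    err_at: Optional[int] = None
    for period in range(n_periods):
        # Phase-A pulse: events that have arrived set their A'_j flip-flops.
        for j, t in enumerate(set_at):
            if t is not None and t <= period:
                a_prime[j] = 1
        maj = 2 * sum(a_prime) > len(a_prime)   # Maj(A_i)
        comp = all(a_prime)                     # Comp(A_i, M)
        if maj and not maj_prev:                # transition Maj(A_i) -> 1
            counter, ffm = 0, 1                 # reset counter, set FFM_i
        maj_prev = maj
        if comp and out_at is None:             # all A_i's accumulated
            ffm = 0                             # reset FFM_i: no error
            out_at = period
        # Phase-B pulse: the counter of Phase-B pulses advances (and wraps).
        counter = (counter + 1) & all_ones
        if ffm and counter == all_ones and err_at is None:   # error AND gate
            err_at = period
            if out_at is None:
                out_at = period                 # output and error together
    return out_at, err_at


print(unanimity_circuit([0, 1, 0], 6))      # (1, None): Fig. 2.4, no failure
print(unanimity_circuit([None, 1, 0], 6))   # (3, 3): Fig. 2.5, A_1 failed
```

In the failure case Maj(A_i) → 1 in period 1, so the error appears on the third Phase-B pulse after it (periods 1, 2, 3), as in the text.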
Fig. 2.5 Response of an anti-sliver unanimity circuit in the event of a failure of one of the A_i's (traces: A_1, A_2, A_3, FFA_1, FFA_2, FFA_3, Phase A, A'_3, Out_i, Error_i)
2.3 A Synchronized System With Unsynchronized Elements
Before considering a multi-layered hierarchical system, it is wise to look at a simple model of this problem. Consider two modules, each with one input line and one output line, each receiving and transmitting data serially. At an arbitrary time (t = 0), there is no signal on any line and the internal states of the modules (processors) are equivalent. For all time t > 0, equivalent input data is received in serial bytes by both modules (not simultaneously); it is desired that the modules perform identical operations with identical internal and input data, and that corresponding bits of outputted data be recognized in order to facilitate comparison. In this analysis it is not necessary to consider whether the processors are clock-driven (synchronous) or asynchronous machines: in either case, corresponding produced data bits are separated in time.
Not only may input data be received at different times by the modules, but, more significantly, it may be received at different points in the program being run by the two modules. In the worst case such a condition may cause calculations to be made with different numbers, or a branch to occur in one processor but not in the other.
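A small timing model shows how the same program can take different branches. The program, the threshold of 100, the step times, and the arrival time are all hypothetical; the only point is that the input is read at the same program point in both processors, but at different times.

```python
from typing import List, Tuple


def input_register(updates: List[Tuple[float, int]], t: float,
                   initial: int = 0) -> int:
    """Contents of the input register at time t, given (arrival_time, value) updates."""
    value = initial
    for t_arrive, v in sorted(updates):
        if t_arrive <= t:
            value = v
    return value


def run_program(start: float, step: float, updates: List[Tuple[float, int]]) -> str:
    """Hypothetical program: the input is read at instruction 2, then tested."""
    t_read = start + 2 * step            # same program point, processor-specific time
    x = input_register(updates, t_read)
    return "branch taken" if x > 100 else "fall through"


updates = [(2.5, 150)]                   # one input byte arrives at t = 2.5
print(run_program(0.0, 1.0, updates))    # fall through: processor 1 reads at t = 2.0
print(run_program(1.0, 1.0, updates))    # branch taken: processor 2 reads at t = 3.0
```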
Two theorems, which taken together contend that two independent processors (or, in general, modules) may be synchronized, are stated and proved below.
Theorem 1. Two independent modules can be made to perform identical operations with identical internal and input data.
Proof. Assume there is an interfacing unit associated with each module's input. The interfacing units may communicate with each other as well as with their associated modules; see Fig. 2.6. As in the case of the modules, the structure and operation of the interfaces are identical with each other. Assume that incoming bytes of information are buffered in corresponding registers within each interface. Let each register have two extra bits (n extra bits in the case of n parallel modules) above the number used to store input data; these bits are used as check bits: a 1 in, say, the left bit will be taken to mean that the input byte to "me" has been stored in this register, and a 1 in the other bit will mean that the corresponding input byte to the other module has been stored in the corresponding register. Hence each module may be aware of whether corresponding information is available to both. The precise nature of the structure, possible microprogram, or requirements for fault-tolerance of the interface units need not be considered presently. Rather, the purpose of this discussion is to determine, first, whether a system meeting the requirements stated earlier in this section is possible. Through use of these check bits by the program, the modules may be kept in synchronization. Note that it is important that updating of the memory associated with a module by input data be controlled by the program, so as to assure equality of available information at corresponding program points. Anti-slivering circuits will not be needed between interface and module, as the structure of the programming will exclude slivering difficulties.
Fig. 2.6 Synchronization of Input Information
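The check-bit discipline of Theorem 1 can be simulated for n modules, each interface register carrying one check bit per module. The sketch assumes integer time steps, a hypothetical clock period per module, and that "program point k" is simply the program's k-th input load; a module that finds any check bit of register k still at 0 stalls at that point instead of advancing. The function name `simulate` is hypothetical.

```python
from typing import List, Tuple


def simulate(arrivals: List[List[Tuple[int, int]]], periods: List[int],
             n_points: int, t_max: int = 1000) -> List[List[Tuple[int, int]]]:
    """Program-controlled input loading with one check bit per module.

    arrivals[m]: (time, byte) pairs delivered to module m's interface, in order.
    periods[m]:  time units per program step of module m.
    Returns, per module, the (program_point, byte) loads it performed.
    """
    n_mod = len(arrivals)
    # check[m][k][j] == 1: interface m knows that module j's byte k is stored.
    check = [[[0] * n_mod for _ in range(n_points)] for _ in range(n_mod)]
    data = [[None] * n_points for _ in range(n_mod)]
    point = [0] * n_mod
    loads: List[List[Tuple[int, int]]] = [[] for _ in range(n_mod)]
    for t in range(t_max):
        # Interfaces: buffer arriving bytes and set the matching check bit everywhere.
        for m in range(n_mod):
            for k, (t_arrive, byte) in enumerate(arrivals[m][:n_points]):
                if t_arrive == t:
                    data[m][k] = byte
                    for other in range(n_mod):
                        check[other][k][m] = 1
        # Modules: on each own step, load register k only if all its check bits are 1.
        for m in range(n_mod):
            k = point[m]
            if k < n_points and t % periods[m] == 0 and all(check[m][k]):
                loads[m].append((k, data[m][k]))
                point[m] += 1
    return loads


arrivals = [[(1, 0xA1), (7, 0xB2)], [(4, 0xA1), (5, 0xB2)]]
loads = simulate(arrivals, periods=[2, 3], n_points=2)
print(loads[0] == loads[1])   # True: same bytes at the same program points
```

The two modules perform their loads at different times, but each byte enters both programs at the same program point, which is what the proof requires.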
Theorem 2. Corresponding bits of output data produced by two independent modules can be recognized.
Proof. For purposes of buffering of information and comparison, allow an interface unit to be associated with each module's output. To allow comparison, these interfaces communicate with each other; see Fig. 2.7. At an arbitrary t = 0, the registers of the output buffers are clear and no output bytes of information are stored in the registers of the interface. When corresponding registers have been written into, the contents are compared by comparators within both interface units. If there is no error, the data is released to output and the registers read are cleared.
Fig. 2.7 Synchronization of Asynchronous Modules
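Theorem 2's output interfaces can be sketched as one FIFO of written registers per module, with corresponding registers compared as soon as every module has written its k-th one. The class name and the FIFO organization are assumptions for illustration; on a mismatch the sketch simply records the disagreeing values rather than modeling any recovery.

```python
from collections import deque
from typing import Deque, List


class OutputComparator:
    """Paired output interfaces (hypothetical model of Fig. 2.7)."""

    def __init__(self, n_modules: int = 2) -> None:
        self.registers: List[Deque[int]] = [deque() for _ in range(n_modules)]
        self.released: List[int] = []       # data passed to the output
        self.errors: List[List[int]] = []   # disagreeing corresponding registers

    def write(self, module: int, value: int) -> None:
        """Module `module` fills its next output register with `value`."""
        self.registers[module].append(value)
        # Compare as soon as every module has written its corresponding register.
        while all(self.registers):
            values = [q.popleft() for q in self.registers]   # read and clear
            if all(v == values[0] for v in values):
                self.released.append(values[0])
            else:
                self.errors.append(values)


c = OutputComparator()
c.write(0, 0x11)
c.write(0, 0x22)             # module 0 runs ahead of module 1
c.write(1, 0x11)             # first pair now complete and equal
c.write(1, 0x23)             # second pair disagrees
print(c.released, c.errors)  # [17] [[34, 35]]
```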
In order to determine how much buffering space is required, consideration must be given to such things as: peak rate of output production; maximum time separation in production of corresponding information; and rate of comparison and clearing of registers within the interface units. If we define the following quantities:
p: peak rate of production (in bits/sec),
n: length of one register (in bits),
t_c: time required for comparison and subsequent outputting and clearing of a register (in sec),
t_s: maximum time separation in production of corresponding bits (in sec), and
x: the number of registers required for one output interfacing buffer,
then if
$$t_c < \frac{n \text{ bits}}{p \text{ bits/sec}}, \qquad x = \frac{p\,t_s}{n} + 2.$$
If the nature of the processor is such that it produces information in serial bytes, then the peak rate of production of bits, averaged over several bytes, will be less than p. If we define:
a: peak time-averaged rate of production (in bits/sec), and
t_a: averaging time (in sec),
then if
$$\frac{n \text{ bits}}{p \text{ bits/sec}} < t_c < \frac{n \text{ bits}}{a \text{ bits/sec}}, \qquad x = \frac{a\,t_a}{n}\left(1 - \frac{n}{p\,t_c}\right) + \frac{p\,t_s}{n} + 2.$$
In the worst case, t_c = n/a, and then:
$$x = \frac{a\,t_a}{n}\left(1 - \frac{a}{p}\right) + \frac{p\,t_s}{n} + 2.$$
A numerical example would be helpful; consider:
p = 10^7 bits/sec,
n = 16 bits,
t_s = 10^-3 sec,
a = 10^6 bits/sec,
t_a = 1 sec;
then, if t_c < n/p,
$$x = \frac{p\,t_s}{n} + 2 = 625 + 2 = 627 \text{ registers};$$
but if n/p < t_c < n/a, then in the worst case
$$x = \frac{a\,t_a}{n}\left(1 - \frac{a}{p}\right) + \frac{p\,t_s}{n} + 2 = 56{,}250 + 625 + 2 = 56{,}877 \text{ registers}.$$
It can be seen that if the time required to ready a full register for the next load, t_c, is greater than the time it takes the processor to fill a register, n/p, then the buffering requirements are large. If t_c is larger than n/a, then the buffer must have an infinite number of registers in order to assure successful operation of the interfacing system.
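The two buffer-sizing cases can be checked numerically. The sketch below evaluates the expressions above with exact rational arithmetic and rounds up to whole registers (the rounding is an assumption; the example values happen to give integers). The burst term can be read as follows: a t_a bits produced at the peak rate p occupy a t_a/p seconds, during which only a t_a/(p t_c) registers can be emptied, leaving (a t_a/n)(1 - n/(p t_c)) registers occupied. The function name `registers_required` is hypothetical.

```python
import math
from fractions import Fraction
from typing import Optional, Union

Number = Union[int, float, Fraction]


def _exact(v: Number) -> Fraction:
    """Convert via str so that, e.g., 1e-3 becomes exactly 1/1000."""
    return Fraction(str(v))


def registers_required(p: Number, n: int, t_s: Number, t_c: Number,
                       a: Optional[Number] = None,
                       t_a: Optional[Number] = None) -> Union[int, float]:
    """Registers x for one output interfacing buffer (math.inf if unbounded)."""
    p, t_s, t_c = _exact(p), _exact(t_s), _exact(t_c)
    skew_term = p * t_s / n + 2                   # p t_s / n + 2
    if t_c <= n / p:                              # registers emptied as fast as filled
        return math.ceil(skew_term)
    if a is None or t_a is None:
        raise ValueError("t_c > n/p: the averaged rate a and time t_a are needed")
    a, t_a = _exact(a), _exact(t_a)
    if t_c > n / a:                               # clearing slower than average production
        return math.inf
    burst_term = a * t_a / n * (1 - n / (p * t_c))
    return math.ceil(burst_term + skew_term)


print(registers_required(10**7, 16, 1e-3, 1e-7))                     # 627
print(registers_required(10**7, 16, 1e-3, 1.6e-5, a=10**6, t_a=1))   # 56877 (t_c = n/a)
print(registers_required(10**7, 16, 1e-3, 2e-5, a=10**6, t_a=1))     # inf
```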
The elements of design of this elementary system may be applied to a multi-layered hierarchical system. The primary drawback lies in the requirements which must be imposed on the software of each processor in order to maintain synchronism of operations amongst replicated units. Different software "tricks" will be required for different input information usage; any requirement imposed on software for purposes of maintaining synchronism will serve to decrease processor speed and hence overall system speed. It is generally poor design procedure to depend on software improvisation for system operation.
2.4 System Synchronization Through Use of a Common Clock
Consideration is now given to a system in which all units are synchronous machines and one clock is used for driving all units. Replicated units which are driven by the same clock may be defined to be in synchronism. In the case of units which, for purposes of fault tolerance, run the same program and receive the same data (the input being controlled by the same clock), the initiation of each corresponding microprogram step as well as the receipt of corresponding bits of information occur concurrently (plus or minus some small tolerance); such units are said to be in tight synchronism.
First consider the same elementary problem explored in Section 2.3: the synchronization of two processing units. Even though corresponding input information and corresponding program steps are synchronized by the same clock, slivering may allow one unit to recognize an input a microstep before the other. However, to maintain tight synchronism, anti-sliver circuits are not necessary; two-phase clocking is sufficient to avoid slivering: one phase (A) is used for receiving and transmitting information and the other phase (B) for driving the processor. Output bits produced by phase A are buffered and then compared and transmitted by phase B.
In a multi-level hierarchical system, such as the C.S. Draper Laboratory Space Shuttle Guidance Computer proposal, this method of synchronization should be adequate for the entire system. However, information transmitted from sensor to local processor is not likely to be synchronized with the system clock; for such an interface, anti-sliver circuits (or anti-sliver unanimity circuits, where called for by fault-tolerance requirements) can provide the necessary synchronization of receipt of information by local processors.
2.5 Conclusions
At first glance the system described in Section 2.4 is quite simple and desirable, especially in light of the alternative (Section 2.3). The difficulty in the design of a system synchronized through use of a common clock lies in the design of the clock. Such a clock must meet the fault-tolerance specifications both in its internal structure and in its distribution around the system; this is no easy task. Nevertheless, it is felt that it is much more desirable to add to the complexity of hardware design by calling for a fault-tolerant clock than it is to suffer the pains of dependency on the software improvisation required in an unsynchronized system. It should also be noted that although a synchronous processor is generally slower than an asynchronous processor, an asynchronous fault-tolerant multiprocessor, due to increased software requirements and necessary stop-and-wait periods, would probably be slower than a fault-tolerant multiprocessor driven by a fault-tolerant clock.
CHAPTER 3
CLOCKING
3.1 Specifications of a Fault-Tolerant Clock
As stated in Section 2.5, if a common clock is to be used to drive the system, it must meet the fault-tolerance specifications of the overall system. Whether the system specification be fail operational or fail safe, the clock specification must be fail operational; the clock is as fundamental to the system as the power supply. In the case of a fault-tolerant clock designed to drive a Space Shuttle guidance computer, the clock would need to be able to perform after the occurrence of any combination of three independent failures.
Of prime importance in the design is that the synchronized state of the system is affected neither by any mode of failure of the clock nor by the method of recovery from the failed state (i.e., the synchronization of the system must be transparent to clock failures).
For purposes of design and discussion, the distribution of the clock to all parts of the system will be considered as a part of the clock design; this seems logical, as different concepts of fault-tolerant clocking may conceivably warrant different methods of distribution. It should be realized, however, that one of the keys to a good design will be minimization of the number of wires required for distribution. In a system such as that required for the Space Shuttle, distances between modules may be on the order of 100 feet; in such a geographically distributed system, wiring may assume a large share of the cost and complexity.
As a fault-tolerant computing system may have on the order of hundreds of modules, it is desired to minimize the logic required within each to convert the distributed clock information into the one train of clock pulses which is used for driving the module. Any failure of this logic will be considered as a failure of the entire module, which will be detected by comparison of outputs amongst replicated units.
3.2 General Methods of Design
Two general design approaches come to mind: (1) use a single clock in conjunction with a single-wire bus for distribution until the occurrence of a failure in the oscillator or in the distribution, at which time another clock and its associated bus are brought into action (this principle is illustrated in Fig. 3.1: the enable circuits permit only one clock to be distributed at a time; Enable n passes clock n if and only if failure detectors 1 through n-1 indicate failure, and initially clock 1 is distributed); (2) use a group of mutually synchronized oscillators which can tolerate the required number of failures and still have several "good" outputs (see Fig. 3.2).
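As an illustration only, the enable rule of approach (1) can be written as a small selection function. The stated rule (Enable n passes clock n if and only if failure detectors 1 through n-1 indicate failure) is encoded literally in the hypothetical function `enable_passes`. The extra condition that a clock is also withheld once its own detector has tripped is an assumption made here, so that only one clock reaches the bus at a time.

```python
from typing import Optional, Sequence


def enable_passes(n: int, failed: Sequence[bool]) -> bool:
    """Stated rule: Enable n passes clock n iff detectors 1..n-1 indicate failure.

    Clocks and detectors are numbered from 1; failed[k - 1] is detector k's output.
    """
    return all(failed[: n - 1])


def distributed_clock(failed: Sequence[bool]) -> Optional[int]:
    """Return the number of the one clock placed on the bus, or None if all failed.

    Assumption (not in the text): a clock whose own detector has tripped is
    also withheld, so at most one enable is effectively active at a time.
    """
    for n in range(1, len(failed) + 1):
        if enable_passes(n, failed) and not failed[n - 1]:
            return n
    return None


print(distributed_clock([False, False, False, False]))  # 1: initially clock 1
print(distributed_clock([True, True, False, False]))    # 3: clocks 1 and 2 flagged
```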
3.3 Fault-Tolerant Clocking Through Failure-Detection and Subsequent Clock Substitution
In this section, through logical development, an exploration is made of the feasibility of a clocking system which achieves fault-tolerance through failure-detection and subsequent clock substitution. Consider Fig. 3.3: the clock system and the clock bus are to be designed to be fault-tolerant; neither the connection between module and bus nor the module transducer need be fault-tolerant, as a failure there may be considered to be a failure of the associated module. The design of the module transducer and its connections to the bus, however, is an integral part of the design of the clocking system; the module transducer converts the information on the bus into a single continuous clock waveform and needs to be designed such that the outputs of all module transducers are in synchronism. It will be seen that some elements of the clocking system need to be external to the module, while others need to be associated with the module.
Fig. 3.1 Fault-Tolerant Clocking through Failure-Detection and Clock Substitution (Ei: Enable i; FDi: Failure Detector i)
Fig. 3.2 Fault-Tolerant Clocking through Synchronization of Oscillators
In order to simplify the feasibility study, system design for single-fault-tolerance will be explored first. Figure 3.4 is a general description of a single-fault-tolerant clock. In order to assure that each module utilizes the same clock, failure detection should be external to the module.
The most obvious difficulty in designing the failure-detection and reconfiguration scheme is maintenance of synchronization through the failure and reconfiguration process. In order to prevent the failed clock from feeding the data management system, the clock waveform must be tested for failure before it is used; but in order to detect failures in distribution, the clock waveform must be tested after distribution. It would appear that each module transducer must be designed to "hold" (delay) use of the clock waveform until it is sure that a failure has not occurred. When a failure is detected, each transducer holds its output at, say, logic-level-0 until after the clock system has been reconfigured and a "good" clock waveform is available, at which time each transducer passes the good clock waveform. The difficulty now is in assuring that synchronization is maintained through the local transducer process of output-hold-output.
Fig. 3.3 General Fault-Tolerant Clocking Concept
Fig. 3.4 Single-Fault-Tolerant Concept (E: Enable; FD: Failure Detector)
If one of the enable circuits fails such as to produce a random output, the module transducer is required to choose the "good" waveform if single-fault-tolerance is to exist. The amount of circuitry required for the transducer to choose the "good" waveform can be reduced if the state of the failure detector is made available to the transducers (via a failure detector bus); if this is done, the enable circuits shown in Fig. 3.4 become superfluous, and Figure 3.4 may be revised as illustrated in Fig. 3.5. The failure detector may be simply implemented as illustrated in Fig. 3.6. The circuit is designed to allow a tolerance on clock pulse width and on separation between pulses; if the tolerance is violated, the output of the failure detector goes to, and is held at, logic-1. The reset capability is provided for initializing the clocking system. The pulse widths of the one-shot outputs determine the tolerance; it is a straightforward procedure to determine the necessary one-shot timing, given the clock frequency, duty-cycle, and allowed variations in both, as well as data concerning tolerances of the propagation delays and one-shot pulse widths associated with the failure-detector components.
The retriggerable one-shot should have an output pulse width of approximately twice the period of the clock; it assures the detection of a failure to logic-level-0 or logic-level-1.
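A behavioral sketch of the failure-detector idea may make the tolerance checks concrete. It is not the circuit of Fig. 3.6. The class name, its parameters, and the edge-list representation of the waveform are all hypothetical. High pulses and low gaps are checked against assumed tolerance windows, the latched output can only be cleared by `reset()`, and the timeout stands in for the retriggerable one-shot whose pulse width is about twice the clock period.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FailureDetector:
    """Behavioral sketch only; parameters and structure are hypothetical.

    The output latches at logic-1 (failed=True) when a high pulse or a low gap
    falls outside its tolerance window, or when no edge arrives within
    `timeout` (standing in for the retriggerable one-shot, about 2T).
    """
    min_width: float
    max_width: float
    min_gap: float
    max_gap: float
    timeout: float
    failed: bool = False

    def reset(self) -> None:
        """Initialize the clocking system: clear the latched failure output."""
        self.failed = False

    def check(self, edges: List[Tuple[float, int]], t_start: float, t_end: float) -> bool:
        """edges: time-ordered (time, new_level) pairs observed from t_start to t_end."""
        prev_t, prev_level = t_start, None
        for t, level in edges:
            interval = t - prev_t
            if interval > self.timeout:
                self.failed = True  # one-shot expired: no edge for too long
            if prev_level == 1 and not self.min_width <= interval <= self.max_width:
                self.failed = True  # pulse width out of tolerance
            if prev_level == 0 and not self.min_gap <= interval <= self.max_gap:
                self.failed = True  # separation between pulses out of tolerance
            prev_t, prev_level = t, level
        if t_end - prev_t > self.timeout:
            self.failed = True  # stuck at logic-0 or logic-1 after the last edge
        return self.failed


period = 1.0
good = [(k * period / 2, 1 if k % 2 == 0 else 0) for k in range(10)]
fd = FailureDetector(0.4, 0.6, 0.4, 0.6, timeout=2 * period)
print(fd.check(good, 0.0, 4.6))      # False
print(fd.check(good[:5], 0.0, 5.0))  # True: the waveform stops at t = 2.0
```

With a 1-unit period and 50% duty-cycle, a ±20% window on both pulse width and separation and a 2-unit timeout (all assumed values) accept the good waveform, while a waveform that stops toggling is flagged once the timeout expires and stays flagged until reset.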
Fig. 3.5 Revised Single-Fault-Tolerant Concept
Fig. 3.6 Failure Detector
Fig. 3.7 Module Transducer for Single-Fault-Tolerant Clocking through Failure-Detection and Subsequent Clock Substitution
Figure 3.7 is the design of a module transducer which may be used in conjunction with the clock system of Figures 3.5 and 3.6. Δ1 is the delay associated with each module transducer; it allows the prevention of the propagation of a failed clock. In a system state prior to clock failure, the waveform of clock 1 is passed through delay Δ1 and one-shot O.S. and then to the output. If a failure is indicated, FF1 is set (FF1 → 1), causing the termination of the distribution of clock 1 and the subsequent distribution of clock 2 to each module. Additional circuitry is provided in each transducer for maintenance of synchronization during and after the switching period. Slivering within the transducer in effecting the cut-off of clock 1 and the cut-in of clock 2 may, in some cases, yield an extra pulse associated with clock 1 or an extra pulse associated with clock 2; thus, after the switching has taken place, the total number of clock pulses supplied to each module may differ by one or two. If an "extra" pulse of clock 1 is propagated, FF2 is set, and if an "extra" pulse of clock 2 is propagated, FF3 is set. In those transducers in which FF2 or FF3 was not set during the switching process, one extra pulse for each unset flip-flop is inserted between the end of the clock 1 waveform and the beginning of the clock 2 waveform, thereby maintaining synchronism. Requirements are established for the values of Δ1, Δ2, Δ3, and Δ4 and for the pulse widths of the one-shots, as well as their tolerances; the requirements are imposed by the system parameters (e.g., clock frequency), as well as by the required method of operation.
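The pulse bookkeeping of the Fig. 3.7 transducer can be checked with a few lines of arithmetic. In this sketch, the function names and the `base` pulse count are hypothetical. FF2 records a leaked clock-1 pulse and FF3 a leaked clock-2 pulse, and one pulse is inserted for each unset flip-flop, so every transducer delivers the same total regardless of how the slivering resolved.

```python
def pulses_to_insert(ff2_set: bool, ff3_set: bool) -> int:
    """One pulse is inserted for each of FF2, FF3 that was not set during switching."""
    return int(not ff2_set) + int(not ff3_set)


def total_pulses(base: int, ff2_set: bool, ff3_set: bool) -> int:
    """Pulses delivered to a module across the clock 1 -> clock 2 switch-over.

    base is a hypothetical count with no leaked pulses; FF2 set means an extra
    clock-1 pulse got through, FF3 set means an extra clock-2 pulse got through.
    """
    leaked = int(ff2_set) + int(ff3_set)
    return base + leaked + pulses_to_insert(ff2_set, ff3_set)


for ff2 in (False, True):
    for ff3 in (False, True):
        print(ff2, ff3, total_pulses(100, ff2, ff3))  # always 102
```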
It is believed that the module transducer shown in Fig. 3.7 is an example of a minimally complex (or nearly so) transducer required for fault-tolerant clocking through failure detection and subsequent clock substitution. The necessity of requiring such complex operations to be performed on the module level, rather than the external clock system level, has been shown in the development of this section. Because of the module-level complexity here, it is felt that designs such as those described in later sections of this thesis are much more desirable; hence a detailed analysis of the module transducer illustrated in Fig. 3.7 is not presented.
For n-fault-tolerance, the system becomes more complicated on all levels, and, of course, it is the increased complication on the module level which is particularly undesirable. It is concluded that fault-tolerant clocking through failure detection and subsequent clock substitution is possible, but is extremely costly in hardware implementation.
3.4 The McKenna Clock 
3.4.1 First Concept
In August, 1971, William Daly and John F. McKenna, in a C.S. Draper Laboratory memo (Ref. 2), described their design of a fault-tolerant clock. Figure 3.8 illustrates the concept proposed for single-fault-tolerance. It is seen that this design conforms to the method of clocking shown in Fig. 3.2 and indeed may be described as a synchronization of oscillators. It should be noted, however, that here a single clock element, apart from the others, is not an oscillator. Rather, as shall be seen in the analysis to follow, each clock element depends on the occurrence of the transition in state of several of the clocks in order to be driven to change its own state.
Fig. 3.8 Daly-McKenna Clock
The quorum function $Q^n_m$ is defined to be 1 if at least m of the n independent variables $C_1, \ldots, C_n$ are 1, and 0 otherwise. For example:
$Q^4_1 = C_1 + C_2 + C_3 + C_4$
$Q^4_2 = C_1C_2 + C_1C_3 + C_1C_4 + C_2C_3 + C_2C_4 + C_3C_4$
$Q^4_3 = C_1C_2C_3 + C_1C_2C_4 + C_1C_3C_4 + C_2C_3C_4$
$Q^4_4 = C_1C_2C_3C_4$
$Q^4_2$ and $Q^4_3$ may each be realized through two levels of Boolean gating or through one level of threshold logic. The use of threshold logic, however, offers no advantage unless LSI threshold logic technology is to be used.
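The quorum definition and its two-level Boolean realization can be cross-checked exhaustively. The sketch below, with hypothetical function names, compares the threshold form (at least m of n inputs at 1) with the OR of all m-input AND terms, which for n = 4 reproduces the expressions for $Q^4_1$ through $Q^4_4$ above.

```python
from itertools import combinations, product
from typing import Sequence


def quorum(m: int, c: Sequence[int]) -> int:
    """Q_m^n: 1 if at least m of the n inputs are 1, and 0 otherwise."""
    return int(sum(c) >= m)


def quorum_sum_of_products(m: int, c: Sequence[int]) -> int:
    """Two-level form: OR over every m-input AND term (e.g. six pairs for Q_2^4)."""
    return int(any(all(c[k] for k in term) for term in combinations(range(len(c)), m)))


# Exhaustive check over all 16 input combinations for n = 4 and m = 1..4.
for bits in product((0, 1), repeat=4):
    for m in range(1, 5):
        assert quorum(m, bits) == quorum_sum_of_products(m, bits)
print("two-level gating matches the quorum definition for n = 4")
```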
All clock lines are distributed to each synchronous module within the system. It will be demonstrated in a later part of this section that majority voting and subsequent filtering within each module are necessary to maintain system synchronization, and an adequate module transducer will be described.
The logic required to induce free-running oscillation after power-on is not shown in Fig. 3.8. For the purposes of this analysis it will be assumed that the clock elements are already oscillating in synchronism at the time of observation. Timing analyses will be made to demonstrate system performance.
For purposes of analysis the following assumptions are made: each gate has a propagation delay equal to Δ; the propagation delay in forming a quorum function is 2Δ; Δt is a pure delay, greater than or equal to 8Δ. (In order to avoid slivering within the clock element, which could yield spikes in the output, Δt must be greater than the propagation delay through the series of gates $H_{2i}$, $A_{3i}$, $M_{2i}$, $N_{4i}$, which is 4Δ; however, since by most manufacturers' specifications the propagation delay within a simple Boolean gate may reach nearly twice the typical delay time, in the worst case the delay through four gates in series may approach 8Δ.)
Assume that the clock elements are oscillating in synchronism. Assume that at time $t = t_1$, $C_1$ goes to logic-1; at $t = t_2$, $C_2 \to 1$; at $t = t_3$, $C_3 \to 1$; and at $t = t_4$, $C_4 \to 1$ (see Fig. 3.9), where $t_4 > t_3 > t_2 > t_1$. Assume that at an initial time of observation, $(t_1 - \epsilon)$, all propagation within each clock element caused by the previous transition, $C_i \to 0$, has ceased (this will be verified within the analysis to follow); therefore the outputs of the gates (Ref. Fig. 3.8) are initially as follows:
$Q^4_{2i} = 0$; $Q^4_{3i} = 0$
$D_{1i} = 0$; $D_{2i} = 0$
$H_{1i} = 1$; $H_{2i} = 1$
$A_{1i} = 1$; $M_{1i} = 0$; $A_{2i} = 0$
$M_{2i} = 0$; $A_{3i} = 1$; $A_{4i} = 0$
$M_{3i} = 1$; $A_{5i} = 0$; $A_{6i} = 0$
$A_{7i} = 0$; $A_{8i} = 0$; $M_{4i} = 1$
$N_{1i} = 1$; $N_{2i} = C_i = 0$; $N_{3i} = 1$
Given the initial state and the assumed progression of events, the timing analysis is as follows; for reference purposes all transitions are numbered.
Fig. 3.9 Assumed Synchronism
$Q^4_{2i}$ and $Q^4_{3i}$ remain at 0.
The gates $x_i$, $i = 2, 3, 4$, remain unchanged from the initial state.
$N_{31} \to 0$ at $t_1 + \Delta$ (the validity of this statement is dependent on the nature of transition (3.1))   (3.2)
$C_2 \to 1$ at $t_2$ $(N_{22} \to 1)$   (3.3)
$Q^4_{3i}$ remains at 0
$N_{32} \to 0$ at $t_2 + \Delta$ (as in (3.2))
$A_{2i} \to 1$ at $t_2 + 3\Delta$
$D_{1i} \to 1$ at $t_2 + 2\Delta + \Delta t$
$H_{1i} \to 0$ at $t_2 + 3\Delta + \Delta t$
$A_{1i} \to 0$ at $t_2 + 4\Delta + \Delta t$
It has already been assumed that Δt ≥ 8Δ, but it is still interesting to note that for Δt = 0, a short-duration pulse might or might not be generated by $M_{1i}$, yielding possible slivering within the clock's logic and hence unpredictable operation.
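$C_3 \to 1$ at $t_3$ $(N_{23} \to 1)$   (3.10)
$N_{33} \to 0$ at $t_3 + \Delta$ (as in (3.2))   (3.11)
$Q^4_{2i}$ remains at 1
… remains at 1
$Q^4_{3i} \to 1$ at $t_3 + 2\Delta$   (3.12)
$H_{2i} \to 0$ at $t_3 + 3\Delta$   (3.13)
$A_{3i} \to 0$ at $t_3 + 4\Delta$   (3.14)
$M_{2i} \to 1$ at $t_3 + 5\Delta$   (3.15)
$N_{4i} \to 0$ at $t_3 + 6\Delta$   (3.16)
$D_{2i} \to 1$ at $t_3 + 2\Delta + \Delta t$   (3.17)
$A_{8i} \to 1$ at $t_3 + 3\Delta + \Delta t$   (3.18)
$M_{4i} \to 0$ at $t_3 + 4\Delta + \Delta t$   (3.19)
$N_{3i} \to 1$ at $t_3 + 5\Delta + \Delta t$   (3.20)
$(N_{2i} = C_i) \to 0$ at $t_3 + 6\Delta + \Delta t$   (3.21)
$N_{4i} \to 1$ at $t_3 + 5\Delta + \Delta t$   (3.22)
$A_{4i} \to 1$ at $t_3 + 6\Delta + \Delta t$   (3.23)
$M_{2i} \to 0$ at $t_3 + 7\Delta + \Delta t$   (3.24)
$A_{8i} \to 0$ at $t_3 + 8\Delta + \Delta t$   (3.25)
$M_{4i} \to 1$ at $t_3 + 9\Delta + \Delta t$   (3.26)
As a result of (3.21):
$Q^4_{2i} \to 0$ at $t_3 + 8\Delta + \Delta t$   (3.27)
$Q^4_{3i} \to 0$ at $t_3 + 8\Delta + \Delta t$   (3.28)
$A_{2i} \to 0$ at $t_3 + 9\Delta + \Delta t$   (3.29)
$M_{1i} \to 1$ at $t_3 + 10\Delta + \Delta t$   (3.30)
$N_{1i} \to 0$ at $t_3 + 11\Delta + \Delta t$   (3.31)
$D_{1i} \to 0$ at $t_3 + 8\Delta + 2\Delta t$   (3.32)
$H_{1i} \to 1$ at $t_3 + 9\Delta + 2\Delta t$   (3.33)
… $\to 1$ at $t_3 + 10\Delta + 2\Delta t$   (3.34)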
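… $\to 0$ at $t_3 + 11\Delta + 2\Delta t$   (3.35)
$(N_{2i} = C_i) \to 1$ at $t_3 + 12\Delta + 2\Delta t$   (3.36)
$N_{3i} \to 0$ at $t_3 + 13\Delta + 2\Delta t$   (3.37)
$N_{1i} \to 1$ at $t_3 + 12\Delta + 2\Delta t$   (3.38)
$A_{1i} \to 1$ at $t_3 + 13\Delta + 2\Delta t$   (3.39)
$M_{1i} \to 0$ at $t_3 + 14\Delta + 2\Delta t$   (3.40)
$A_{5i} \to 0$ at $t_3 + 15\Delta + 2\Delta t$   (3.41)
$M_{3i} \to 1$ at $t_3 + 16\Delta + 2\Delta t$   (3.42)
$H_{2i} \to 1$ at $t_3 + 9\Delta + \Delta t$   (3.43)
$A_{3i} \to 1$ at $t_3 + 10\Delta + \Delta t$   (3.44)
$D_{2i} \to 0$ at $t_3 + 8\Delta + 2\Delta t$   (3.45)
$A_{5i} \to 0$ at $t_3 + 9\Delta + 2\Delta t$   (3.46)
$A_{4i} \to 0$ at $t_3 + 9\Delta + 2\Delta t$   (3.47)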
It is seen that the last transition of each gate (up to transition (3.47)) restores the gate to its initial setting. Because of transition (3.36), $C_i \to 0$ at a time 6Δ + Δt later; and because of $C_i \to 0$, $C_i \to 1$ in another increment of time, 6Δ + Δt.
The duty-cycle of $C_i$ is 50%. The period is:
$T_{C_i} = 12\Delta + 2\Delta t$
Hence the maximum frequency (obtained with the minimum allowed delay, Δt = 8Δ) is:
$f_{max} = \dfrac{1}{28\Delta}$
For medium speed TTL, Δ ≈ 12 ns; therefore $f_{max}$ ≈ 3 MHz.
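The period and maximum-frequency figures follow directly from $T_{C_i} = 12\Delta + 2\Delta t$ with Δt = 8Δ. A short calculation, using the 12 ns medium-speed TTL gate delay quoted above, reproduces the estimate of about 3 MHz; the function names are illustrative only.

```python
def clock_period(delta: float, delta_t: float) -> float:
    """T = 12*delta + 2*delta_t (each half-cycle is 6*delta + delta_t)."""
    return 12 * delta + 2 * delta_t


def max_frequency(delta: float) -> float:
    """f_max = 1 / (28*delta), using the smallest allowed delay delta_t = 8*delta."""
    return 1.0 / clock_period(delta, 8 * delta)


delta = 12e-9  # medium-speed TTL gate delay quoted in the text, in seconds
print(f"T = {clock_period(delta, 8 * delta) * 1e9:.0f} ns")  # T = 336 ns
print(f"f_max = {max_frequency(delta) / 1e6:.2f} MHz")       # f_max = 2.98 MHz
```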
With use of the above timing analysis, the assertion that Δt must be greater than 4Δ for successful operation may be demonstrated. Assume that Δ < Δt < 4Δ; then transition (3.17) occurs before (3.16), and a possible timing sequence is as follows:
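$D_{2i} \to 1$ at $t_3 + 2\Delta + \Delta t$   (3.48)
$N_{4i} \to 0$ at $t_3 + 6\Delta$   (3.49)
$A_{8i} \to 1$ at $t_3 + 3\Delta + \Delta t$   (3.50)
$A_{4i} \to 1$ at $t_3 + 3\Delta + \Delta t$   (3.51)
$A_{4i} \to 0$ at $t_3 + 7\Delta$   (3.52)
$M_{2i} \to 0$ at $t_3 + 4\Delta + \Delta t$   (3.53)
$M_{4i} \to 0$ at $t_3 + 4\Delta + \Delta t$   (3.54)
$M_{2i} \to 1$ at $t_3 + 8\Delta$   (3.55)
$N_{4i} \to 1$ at $t_3 + 5\Delta + \Delta t$   (3.56)
$N_{3i} \to 1$ at $t_3 + 5\Delta + \Delta t$   (3.57)
$A_{8i} \to 1$ at $t_3 + 9\Delta$   (3.58)
$A_{8i} \to 0$ at $t_3 + 5\Delta + \Delta t$   (3.59)
$A_{4i} \to 1$ at $t_3 + 6\Delta + \Delta t$   (3.60)
$(N_{2i} = C_i) \to 0$ at $t_3 + 6\Delta + \Delta t$   (3.61)
$M_{4i} \to 1$ at $t_3 + 6\Delta + \Delta t$   (3.62)
$M_{4i} \to 0$ at $t_3 + 10\Delta$   (3.63)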
Transitions (3.61) and (3.62) both occur at $t_3 + 6\Delta + \Delta t$, but if (3.62) occurs just before (3.61), the following transitions may occur in some clock elements:
$N_{3i} \to 0$ at $t_3 + 7\Delta + \Delta t$   (3.64)
$(N_{2i} = C_i) \to 1$ at $t_3 + 8\Delta + \Delta t$   (3.65)
So it is seen that for Δ < Δt < 4Δ, proper operation cannot be assured.
Now assume that Δt < Δ; then, as a result of transition (3.12):
$D_{2i} \to 1$ at $t_3 + 2\Delta + \Delta t$   (3.66)
$H_{2i} \to 0$ at $t_3 + 3\Delta$   (3.67)
$A_{4i} \to 1$ at $t_3 + 3\Delta + \Delta t$   (3.68)
$A_{3i} \to 0$ at $t_3 + 4\Delta$   (3.69)
Since (3.68) occurs before (3.69), $M_{2i}$ will not go to 1, and therefore $C_i$ will not go to 0, yielding a non-oscillatory condition.
It has been shown, in support of the original assertion, that Δt must be greater than 4Δ. Also, as mentioned, in order to assure operation in the event that gate propagation delays are nearly double their nominal value, Δt should be no less than 8Δ.
The principle of operation is seen to be as follows: when $Q^4_3 \to 1$, $C_i \to 0$ 6Δ + Δt later; when $Q^4_2 \to 0$, $C_i \to 1$ 6Δ + Δt later. Differences amongst clock elements in the propagation of the signal triggered by the leading edge of the $Q^4_3$ pulse, or in the propagation of the signal triggered by the trailing edge of the $Q^4_2$ pulse, will cause minor time separations in the occurrences of leading and trailing edges amongst clock element pulses. It should be noted that if, for clock element i, $Q^4_{2i} \to 1$ before $H_{1i} \to 1$, then $C_i$ is driven to logic-1 5Δ after $Q^4_2 \to 1$. Similarly, if $Q^4_{3i} \to 0$ before $D_{2i} \to 1$, then $C_i$ is driven to logic-0 6Δ after $Q^4_3 \to 0$.
For purposes of determining the effect of differences in propagation delays on clock performance, consider the following definitions and analysis. First, assume that the delays of each clock element are within tolerance, such that the "set-to-agree" function ($Q^4_2 \to 1$ drives $C_i \to 1$, or $Q^4_3 \to 0$ drives $C_i \to 0$) is not utilized in normal operation. Now define the time between the event $Q^4_3 \to 1$ (here $Q^4_3$ refers to the conceptual quorum function, not the physical implementation) and the resulting event $C_i \to 0$ as $(\delta t_d)_i$; define the time between the event $Q^4_2 \to 0$ and the resulting $C_i \to 1$ as $(\delta t_u)_i$. Once the event $Q^4_3 \to 1$ has occurred, $Q^4_2 \to 0$ will occur $(\delta t_d)_x$ later, where x is the clock possessing the next to the largest $(\delta t_d)$; after $Q^4_2 \to 0$ occurs, $Q^4_3 \to 1$ will occur $(\delta t_u)_y$ later, where y is the clock possessing the next to the largest $(\delta t_u)$.
Two specifications which together offer a significant measurement of clock performance may be defined as follows:
$\Delta T_u \equiv$ duty-cycle of the function $C_1C_2C_3C_4$
$\Delta T_d \equiv$ duty-cycle of the function $\bar{C}_1\bar{C}_2\bar{C}_3\bar{C}_4$
These two specifications indicate, respectively, the size of the overlap region of the clock pulses and the size of the overlap region of the clock 0-states. If there were no differences amongst the propagation delays of like elements, then $\Delta T_u$ and $\Delta T_d$ would both be 50%. Following is a derivation of the relationships between $\Delta T_u$, $\Delta T_d$ and the $(\delta t_u)$'s and $(\delta t_d)$'s.
Assign numbers to the clocks such that:
$(\delta t_d)_1 < (\delta t_d)_2 < (\delta t_d)_3 < (\delta t_d)_4$   (3.70)
Assign letters to the clocks such that:
$(\delta t_u)_a < (\delta t_u)_b < (\delta t_u)_c < (\delta t_u)_d$   (3.71)
There are 4! distinct sets of the four clock outputs, yet in each case the period of the clock is given by:
$T = (\delta t_d)_3 + (\delta t_u)_c$   (3.72)
…   (3.73)
…   (3.74)
Two cases are illustrated (Figs. 3.10 and 3.11) for the purpose of shedding some light on why $\Delta T_u$ and $\Delta T_d$ are independent of the manner of the pairing of the $(\delta t_d)$'s with the $(\delta t_u)$'s in the four clocks.
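A sketch of the idealized timing model behind (3.72) may help. It assumes, as in the text, that the set-to-agree path is never exercised and that every element responds to one conceptual quorum event. The function name and the overlap expressions are derived here from those assumptions and are not the thesis' own eqns (3.73) and (3.74). Only the sorted delays enter the result, which is consistent with the statement that $\Delta T_u$ and $\Delta T_d$ do not depend on how the $(\delta t_d)$'s are paired with the $(\delta t_u)$'s.

```python
from typing import Sequence, Tuple


def quorum_cycle(td: Sequence[float], tu: Sequence[float]) -> Tuple[float, float, float]:
    """Return (T, delta_Tu, delta_Td) for four clock elements.

    Model (assumed): Q_3^4 -> 1 at t = 0; element i falls td[i] later;
    Q_2^4 -> 0 when the third element has fallen; element i rises tu[i] after
    that; the set-to-agree path is never used.
    """
    d, u = sorted(td), sorted(tu)
    t_q2_down = d[2]                       # next-to-largest fall delay
    period = d[2] + u[2]                   # (3.72): (dt_d)_3 + (dt_u)_c
    last_rise = t_q2_down + u[3]
    first_fall_next_cycle = period + d[0]
    all_high = max(0.0, first_fall_next_cycle - last_rise)
    first_rise = t_q2_down + u[0]
    all_low = max(0.0, first_rise - d[3])  # last element must fall before the first rises
    return period, all_high / period, all_low / period


print(quorum_cycle([1.0] * 4, [1.0] * 4))  # (2.0, 0.5, 0.5)
worst = [0.9, 1.1, 1.1, 1.1]               # x = 0.1 about a nominal 1.0 (assumed)
print(quorum_cycle(worst, worst))
```

For equal delays both overlaps are 50%. For the worst case described in the text, with $(\delta t_u)_a = (\delta t_d)_1 = 0.9$ and all other delays 1.1 (assumed nominal 1.0, x = 0.1), this model gives $\Delta T_u = \Delta T_d = 0.9/2.2 \approx 0.41$.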
The percentage variation of the $(\delta t_d)$'s and the $(\delta t_u)$'s around some nominal value is dependent on both component specifications and component selection; testing and subsequent selection of components will yield a minimum variation. A worst-case analysis will yield a direct correlation between clock performance, as measured by $\Delta T_u$ and $\Delta T_d$, and the tolerances of the $(\delta t_d)$'s and $(\delta t_u)$'s. If $(\delta t_u)$ and $(\delta t_d)$ fall within the range $(\delta t)_{nom}(1-x) < \delta t < (\delta t)_{nom}(1+x)$, then from eqns (3.72), (3.73) and (3.74):
$\Delta T_{u,min}$, $\Delta T_{d,min}$ = …   (3.75)
Note that, due to the "set-to-agree" function, it is unrealistic to assume that $[(\delta t_d)_3 - (\delta t_d)_2]$ or $[(\delta t_d)_4 - (\delta t_d)_2]$ is larger than 6Δ, or that $[(\delta t_u)_c - (\delta t_u)_b]$ or $[(\delta t_u)_d - (\delta t_u)_b]$ is larger than 5Δ, but that the above analysis is valid because, in the worst case, $(\delta t_u)_a$ and $(\delta t_d)_1$ may be equal to $(\delta t)_{nom}(1-x)$ and the other $(\delta t_d)$'s and $(\delta t_u)$'s equal to $(\delta t)_{nom}(1+x)$; in this worst case the set-to-agree function is not utilized.
Fig. 3.10 Determination of $\Delta T_u$ and $\Delta T_d$ - Case 1
Fig. 3.11 Determination of $\Delta T_u$ and $\Delta T_d$ - Case 2
By nature of the design of the clock, any single failure can directly affect the output of only one clock element. In order to examine the post-failure operation of the circuit it is not necessary to examine the failure-modes of every logic gate; rather, it is sufficient to examine the effect on clock operation of one failed clock element. The operation of the clock is examined below for three modes of failure of any one clock element: failed to logic-level-1, failed to logic-level-0, and random oscillation. Flaws in the design of the clock will be exposed, and a revised design will be recommended.
The following definitions will be useful in the analysis:
$Q^3_n = 1$ if and only if at least n out of the three "good" clock elements are at logic-level-1.
Using previously defined nomenclature, assign numbers and letters to the three "good" clock elements such that $(\delta t_d)_1 < (\delta t_d)_2 < (\delta t_d)_3$ and $(\delta t_u)_a < (\delta t_u)_b < (\delta t_u)_c$.
Denote $(\delta t_d)$ and $(\delta t_u)$ of the failed clock element, prior to failure, as $(\delta t_d)_f$ and $(\delta t_u)_f$ respectively.
If the failure is to logic-level-1, and if the system does not fail as a result of the occurrence of the failure, then the event $Q^4_3 \to 1$ is equivalent to $Q^3_2 \to 1$, the event $Q^4_2 \to 0$ is equivalent to $Q^3_1 \to 0$, and the principle of operation of the three good clocks remains the same: once the event $Q^3_2 \to 1$ has occurred, $Q^3_1 \to 0$ will occur $(\delta t_d)_3$ later; as a result of the event $Q^3_1 \to 0$, $Q^3_2 \to 1$ will occur $(\delta t_u)_b$ later. Therefore the period of oscillation of each good clock is:
$T_{f1} = (\delta t_d)_3 + (\delta t_u)_b$   (3.76)
where the subscript f1 denotes failure to logic-level-1. Unless $(\delta t_d)_f > (\delta t_d)_3$ and $(\delta t_u)_f < (\delta t_u)_b$, the period prior to failure differs from $T_{f1}$.
If the failure is to logic-level-0, and if the system does not fail as a result of the occurrence of the failure, then the event $Q^4_3 \to 1$ is equivalent to $Q^3_3 \to 1$, the event $Q^4_2 \to 0$ is equivalent to $Q^3_2 \to 0$, and: once $Q^3_3 \to 1$ has occurred, $Q^3_2 \to 0$ will occur $(\delta t_d)_2$ later; as a result of the event $Q^3_2 \to 0$, $Q^3_3 \to 1$ will occur $(\delta t_u)_c$ later. Therefore the period of oscillation of each good clock is:
$T_{f0} = (\delta t_d)_2 + (\delta t_u)_c$   (3.77)
where the subscript f0 denotes failure to logic-level-0. Unless $(\delta t_d)_f < (\delta t_d)_2$ and $(\delta t_u)_f > (\delta t_u)_c$, the period prior to failure differs from $T_{f0}$.
Prior to failure, if $(\delta t_d)_f >$ … and $(\delta t_u)_f > (\delta t_u)_c$, then: …
It has been shown that if the failure is to logic-level-1 or logic-level-0, and if the system does not fail as a result of the occurrence of the failure, then the three good clock elements continue operation, but at a frequency determined by the new set of $(\delta t_d)$'s and $(\delta t_u)$'s of the three operating elements.
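The pre- and post-failure periods can be compared in a few lines. This sketch (hypothetical function names) applies (3.72) to the four working elements and the quorum equivalences described above to the three good ones. The example delays are arbitrary values, chosen so that the failed element has the largest $(\delta t_d)$ and a $(\delta t_u)$ below $(\delta t_u)_b$, the case in which a failure to logic-1 leaves the period unchanged.

```python
from typing import Sequence


def period_four(td: Sequence[float], tu: Sequence[float]) -> float:
    """Pre-failure period (3.72): next-to-largest (dt_d) plus next-to-largest (dt_u)."""
    return sorted(td)[2] + sorted(tu)[2]


def period_after_failure(td_good: Sequence[float], tu_good: Sequence[float], stuck_at: int) -> float:
    """Period of the three good elements after the fourth sticks at 0 or 1.

    stuck_at=1: T_f1 = (dt_d)_3 + (dt_u)_b   (largest fall, middle rise delay)
    stuck_at=0: T_f0 = (dt_d)_2 + (dt_u)_c   (middle fall, largest rise delay)
    """
    d, u = sorted(td_good), sorted(tu_good)
    if stuck_at == 1:
        return d[2] + u[1]
    if stuck_at == 0:
        return d[1] + u[2]
    raise ValueError("stuck_at must be 0 or 1")


td_good, tu_good = [1.0, 1.1, 0.9], [1.0, 1.05, 0.95]  # arbitrary example delays
td_f, tu_f = 1.2, 0.9                                  # failed element, before failure
before = period_four(td_good + [td_f], tu_good + [tu_f])
print(before, period_after_failure(td_good, tu_good, 1), period_after_failure(td_good, tu_good, 0))
```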
Figures 3.12 through 3.15 show how the occurrence of a failure may induce a spurious short-duration pulse in the waveform of $Q^4_2$ or $Q^4_3$. In the remaining text of the thesis such a short-duration pulse will be referred to as a glitch. It is shown below that the clock will tolerate the failure-modes illustrated in Figs. 3.12 and 3.13, providing that $(\delta t_d)$ and $(\delta t_u)$ are within prescribed tolerances, but that the failure-mode illustrated in Fig. 3.15 may induce a glitch to appear in any clock element output. Because the undesirable transitions which occur in $Q^4_2$ or $Q^4_3$ may be extremely rapid, slivering may occur within the clock elements; the study which will be made in each case represents the worst-case analysis.
Case 1. Again, in this case and those following, reference is made to Fig. 3.8. In Fig. 3.12, $C_1$ fails to logic-1 within the region $(\delta t_d)_2 < t_{f1} < (\delta t_d)_3$; the analysis of this failure-mode follows:
$Q^4_{3i} \to 0$ at $t = (\delta t_d)_2 + 2\Delta$   (3.78)
$Q^4_{3i} \to 1$ at $t_{f1} + 2\Delta$   (3.79)
$Q^4_{3i} \to 0$ at $(\delta t_d)_3 + 2\Delta$   (3.80)
$H_{2i} \to 1$ at $(\delta t_d)_2 + 3\Delta$   (3.81)
$H_{2i} \to 0$ at $t_{f1} + 3\Delta$   (3.82)
$H_{2i} \to 1$ at $(\delta t_d)_3 + 3\Delta$   (3.83)
$A_{3i} \to 1$ at $(\delta t_d)_2 + 4\Delta$   (3.84)
$A_{3i} \to 0$ at $t_{f1} + 4\Delta$   (3.85)
$A_{3i} \to 1$ at $(\delta t_d)_3 + 4\Delta$   (3.86)
$D_{2i} \to 0$ at $(\delta t_d)_2 + 2\Delta + \Delta t$   (3.87)
$D_{2i} \to 1$ at $t_{f1} + 2\Delta + \Delta t$   (3.88)
$D_{2i} \to 0$ at $(\delta t_d)_3 + 2\Delta + \Delta t$   (3.89)
$A_{4i} \to 0$ at $(\delta t_d)_2 + 3\Delta + \Delta t$   (3.90)
$A_{4i} \to 1$ at $t_{f1} + 3\Delta + \Delta t$   (3.91)
$A_{4i} \to 0$ at $(\delta t_d)_3 + 3\Delta + \Delta t$   (3.92)
Fig. 3.12 Case 1
Fig. 3.13 Case 2
Fig. 3.14 Case 3
Fig. 3.15 Case 4
If transition (3.90) occurs while $A_{3i}$ is at logic-0 (see transitions (3.84) through (3.86)), then $C_i$ is driven to logic-0 at about the same time that it is being driven to logic-1 by the propagation of the signal $Q^4_2 \to 0$; in order to avoid this situation, set:
$(\delta t_d)_3 - (\delta t_d)_2 < \Delta t - \Delta$   (3.93)
As previously stated, we may define the tolerances of $(\delta t)$ as follows:
$(\delta t)_{nom}(1-x) < \delta t < (\delta t)_{nom}(1+x)$   (3.94)
Combining eqns (3.93) and (3.94) in a worst-case condition:
$(\delta t)_{nom}[(1+x) - (1-x)] < \Delta t - \Delta$   (3.95)
and since $(\delta t)_{nom} = 6\Delta + \Delta t$,
$x < \dfrac{\Delta t - \Delta}{2\Delta t + 12\Delta}$   (3.96)
If the system is to be run at maximum frequency, Δt = 8Δ and x < 1/4. It is seen that there is a tradeoff between the tolerance of components and the frequency.
Case 2. In Fig. 3.13, $C_1$ fails to logic-0 within the region $(\delta t_u)_b < t_{f0} < (\delta t_u)_c$; an analysis similar to that of Case 1 yields:
$(\delta t_u)_c - (\delta t_u)_b < \Delta t + \Delta$   (3.97)
Here the requirement on x for successful operation is:
$x < \dfrac{\Delta t + \Delta}{2\Delta t + 12\Delta}$   (3.98)
This is not as stringent a requirement as Case 1 provides.
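The tradeoff between component tolerance and frequency can be tabulated from (3.96) and (3.98). In this sketch, Δt is expressed in units of the gate delay Δ, and the period $12\Delta + 2\Delta t$ is printed alongside the two bounds; the function names are hypothetical.

```python
def max_tolerance_case1(dt: float) -> float:
    """Bound (3.96): x < (dt - 1) / (2 dt + 12), with dt measured in gate delays."""
    return (dt - 1) / (2 * dt + 12)


def max_tolerance_case2(dt: float) -> float:
    """Bound (3.98): x < (dt + 1) / (2 dt + 12), with dt measured in gate delays."""
    return (dt + 1) / (2 * dt + 12)


for dt in (8, 12, 16, 24):
    period = 12 + 2 * dt  # T = 12 delta + 2 delta_t, in gate delays
    print(f"dt = {dt} delta, T = {period} delta, "
          f"x1 < {max_tolerance_case1(dt):.3f}, x2 < {max_tolerance_case2(dt):.3f}")
```

At Δt = 8Δ the bounds are 1/4 (Case 1) and 9/28 (Case 2); both loosen as Δt, and hence the period, grows, and Case 1 remains the binding requirement throughout.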
Case 3. In Fig. 3.14, $C_1$ fails to logic-0 within the region $(\delta t_u)_c < t_{f0} < (\delta t_u)_d$. It can be shown (in a manner similar to Case 4) that $M_{4i}$ may go to logic-0 and back to logic-1 within a time on the order of one propagation delay, Δ, this transition occurring shortly before $C_i$ is driven to logic-0 by the event $Q^4_3 \to 1$ at $t = (\delta t_u)_d$; as a result a glitch may occur at the output of $N_{3i}$, but not within $C_i$. In Case 4, however, it is shown that a glitch may occur in $C_i$.
Case 4. In Fig. 3.15, $C_1$ fails to logic-1 within the region $(\delta t_d)_3 < t_{f1} < (\delta t_d)_4$. The analysis of this failure-mode follows: … is driven to logic-1. As a result of $Q^4_2 \to 0$ at $(\delta t_d)_3$, $M_{1i}$ may go to 1, driving $A_{6i}$ …. If, when $Q^4_2 \to 1$ at $t_{f1}$, $M_{1i}$ is at logic-1, $M_{3i}$ goes to logic-0. The length of time that $M_{3i}$ is at logic-0 is at most $(\delta t_d)_4 - t_{f1}$; as a result $C_i$ may go to logic-1 at about 3Δ after it had gone to logic-0, or a pulse of duration approximately equal to Δ may occur in $C_i$, or $C_i$ may be unaffected, yielding a loss of synchronism amongst the clock elements. Also, the extra pulse in $Q^4_2$ may cause a pulse to occur in $C_i$ just before $C_i$ is set at logic-1.
Given that $C_1$ fails to logic-level-1, the probability that system operation is affected as described in Case 4 is the probability that $t_{f1}$ will occur between $(\delta t_d)_3$ and $(\delta t_d)_4$, denoted $P_{sf}$, where the subscript sf stands for system failure. So the smaller the tolerance on $(\delta t_d)$, the smaller is the probability of a system failure being induced by the failure of a clock element to logic-1. It should be noted, however, that if a clock element exhibits a failure-mode of random oscillation, then over an extended period of operation $P_{sf}$ approaches unity. Therefore, in order to assure single-fault-tolerance, the circuit of Fig. 3.8 must be modified. Figure 3.16 is the suggested redesign of a clock element. The additional circuitry increases the values of $(\delta t_d)_{nom}$ and $(\delta t_u)_{nom}$ as follows:
$(\delta t_u)_{nom} = 9\Delta + \Delta t$   (3.100)
$(\delta t_d)_{nom} = 7\Delta + \Delta t$   (3.101)
thereby decreasing the duty-cycle slightly. The values of the necessary one-shot pulse widths are dependent on the desired operating frequency of the system (and therefore are dependent on the value of Δt). Figures 3.17 through 3.20 illustrate how the additional circuitry filters out the extraneous pulses of $Q^4_{2i}$ or $Q^4_{3i}$; the pulse-width of $OS_{1i}$ is $[(\delta t_u)_{nom} + \frac{1}{2}(\delta t_d)_{nom}]$ and the pulse-width of $OS_{2i}$ is $[(\delta t_d)_{nom} + \frac{1}{2}(\delta t_u)_{nom}]$.
Fig. 3.16 Revised Daly-McKenna Clock
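The probability $P_{sf}$ and its behavior under random oscillation can be estimated numerically. The sketch below assumes, beyond what is stated in the text, that the failure instant is uniformly distributed over one clock period and that successive random transitions of an oscillating element are independent. Under those assumptions $P_{sf} = [(\delta t_d)_4 - (\delta t_d)_3]/T$ for a single failure, and the chance of at least one harmful transition after k transitions, $1 - (1 - P_{sf})^k$, tends to unity. The function names and example delays are hypothetical.

```python
from typing import Sequence


def p_system_failure(td: Sequence[float], tu: Sequence[float]) -> float:
    """P(t_f1 lies between (dt_d)_3 and (dt_d)_4), assuming t_f1 uniform over one period."""
    d = sorted(td)
    period = d[2] + sorted(tu)[2]  # (3.72)
    return (d[3] - d[2]) / period


def p_after_transitions(p_single: float, k: int) -> float:
    """Chance that at least one of k independent random transitions lands in the window."""
    return 1.0 - (1.0 - p_single) ** k


p = p_system_failure([1.0, 1.02, 1.04, 1.1], [1.0] * 4)  # arbitrary example delays
for k in (1, 10, 100, 1000):
    print(k, round(p_after_transitions(p, k), 4))
```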
As stated earlier, all clock lines are distributed to each synchronous module in the computing system. If the module transducer were simply a two-out-of-three voter, the occurrence of a failure could cause a glitch in the output of the transducer. The subsequent slivering which may occur could destroy the synchronism of the system. Therefore the module transducer must filter out the extraneous pulses. Figure 3.21 is the design of a module transducer which may be used in conjunction with the Revised Daly-McKenna Clock. The pulse width of the one-shot is $[(\delta t_u)_{nom} + \tfrac{1}{2}(\delta t_d)_{nom}]$. Figure 3.22 illustrates the operation of the module transducer.

Note that although the design of the single-fault-tolerant clock requires four clock elements, only three need to be examined to extract a reliable clock. For the sake of uniformity of presentation it has been assumed that all clock lines are distributed. In the event of the development of a reconfigurable voter (2/3 voting with replacement), the implementation of such in place of the simple majority voter in each module transducer could increase the reliability of the clocking system.

Fig. 3.17 Case 1
Fig. 3.18 Case 2
Fig. 3.19 Case 3
Fig. 3.20 Case 4
Fig. 3.21 Module Transducer for use with the Revised Daly-McKenna Clock
Fig. 3.22 Module Transducer Operation
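The glitch mechanism described above can be illustrated with sampled waveforms. In this idealized sketch (not the circuit of Fig. 3.21), two healthy clock lines are skewed by three samples and the failed line oscillates; a plain two-out-of-three vote then passes the failed line's pulses through whenever the healthy lines disagree. The `lockout_filter` function is a hypothetical stand-in for the transducer's one-shot: after each output transition it ignores further input changes for `hold` samples.

```python
def majority(a, b, c):
    """Two-out-of-three majority vote of three logic levels (0 or 1)."""
    return 1 if a + b + c >= 2 else 0

def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if (prev, cur) == (0, 1))

def lockout_filter(wave, hold):
    """Hypothetical one-shot stand-in: after each output transition, ignore
    input changes for the next `hold` samples."""
    out = [wave[0]]
    last_change = -hold                      # allow a transition immediately
    for t in range(1, len(wave)):
        if wave[t] != out[-1] and t - last_change >= hold:
            out.append(wave[t])
            last_change = t
        else:
            out.append(out[-1])
    return out

N = 20
a = [1 if 5 <= t < 15 else 0 for t in range(N)]  # healthy clock line
b = [1 if 8 <= t < 18 else 0 for t in range(N)]  # healthy line, 3 samples late
c = [t % 2 for t in range(N)]                    # failed line, oscillating
voted = [majority(x, y, z) for x, y, z in zip(a, b, c)]
filtered = lockout_filter(voted, hold=4)

print(rising_edges(a), rising_edges(voted), rising_edges(filtered))  # 1 3 1
```

The hold-off must stay shorter than the shortest legitimate half-cycle, or the filter would also swallow genuine clock edges; this is the same kind of constraint that governs the choice of the transducer one-shot width.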
For n-fault-tolerance, the number of required clock elements is $3n + 1$, as demonstrated by Daly and McKenna. Within the module transducer, the voter increases in complexity but the remainder of the circuitry remains the same. On the clock-system level the Daly-McKenna Clock is more costly than the method of clocking examined in Section 3.3, but on the module level it is much less costly.
3.4.2 Current Concept 
In October, 1971, McKenna designed a single-fault-tolerant clock for use in the prototype fault-tolerant multiprocessor (CERBERUS) currently being built at the C.S. Draper Laboratory. The design is shown in Fig. 3.23. The clock has been built, with medium-speed TTL technology, and has been demonstrated to survive the imposition of single faults. The frequency of the clock now in existence is about 0.7 MHz.
The operating principles of this clock are similar to those of the first concept. It should be noted that here each clock element will oscillate at its own chosen frequency if separated from the others. When the system is interconnected, however, each element will conform to one common frequency. As will be demonstrated, the system behaves as follows: $C_i$ is set to logic-1 after a short delay by the occurrence of $(Q_2^4)_i \rightarrow 1$, or after a much longer delay by $(Q_2^4)_i \rightarrow 0$; $C_i$ is reset to logic-0 after a short delay by $(Q_3^4)_i \rightarrow 0$, or after a much longer delay by $(Q_3^4)_i \rightarrow 1$.

Fig. 3.23 McKenna Clock
The clock is self-starting; recall that the first concept (Fig. 3.8) requires additional logic to induce free-running oscillation. Some explanation of the clock element structure is called for: the pair of parallel inverters at the output of the J-K flip-flop are there to increase the fan-out capability; similarly, the eight inverters at the input of each clock element serve to overcome fan-out limitations; as can be seen, $(Q_2^4)_i$ and $(Q_3^4)_i$ are each implemented through four levels of gating; the strings of inverters to which $(Q_2^4)_i$ and $(Q_3^4)_i$ are applied serve to time certain key signals; and the peculiar configuration of the region around the retriggerable-one-shot (see Fig. 3.24) is the internal configuration of the circuit element used.
As in the first concept, all clock lines are distributed to each synchronous module within the system. Simple two-out-of-three majority voting within the module is not sufficient, but, as before, the module transducer illustrated in Fig. 3.21 may be used.
For purposes of analysis of the circuit, the following simplifying assumptions are made:
(1) An element's propagation delay is the same for a logic-level-1-to-0 transition as for a 0-to-1 transition,
(2) the delay associated with any inverter or NAND gate is equal to $\Delta$,
(3) the delay associated with a transition caused by clocking the J-K flip-flop is $2\Delta$,
(4) the delay associated with a transition caused by an applied zero to the SET or RESET input of the J-K flip-flop is $3\Delta$,
(5) the one-shot triggers when its input AND gate experiences a transition from logic-level-0 to 1; the delay between the input of the AND gate and the output of the one-shot is $2\Delta$; after the last triggering input to the AND gate, the one-shot will go to logic-level-0 at a time $\Delta t$ later.

Fig. 3.24 Retriggerable One-Shot used in McKenna Clock
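Several delay figures in the analysis that follows can be reproduced by summing the per-stage delays of assumptions (2) to (4). The sketch below is illustrative only: the decomposition of each path into stages (four gating levels to form $(Q_2^4)_i$, one gate to form the set pulse, the SET input, and an output inverter; or, for the clocked path, a clocked flip-flop followed by an output inverter) is an assumption read off the timing figures, not taken from Fig. 3.23 itself. Under it, the set path gives the $9\Delta$ bound appearing in (3.107) and (3.108), and the clocked path gives the $3\Delta$ lag between a one-shot expiring and its clock line falling.

```python
DELTA = 1                 # one gate delay, the unit used in the assumptions

GATE = 1 * DELTA          # (2) any inverter or NAND gate
JK_CLOCKED = 2 * DELTA    # (3) transition caused by clocking the J-K flip-flop
JK_SET_RESET = 3 * DELTA  # (4) transition caused by a zero at SET or RESET

def path_delay(*stages):
    """Total delay of a signal path as the sum of its stage delays."""
    return sum(stages)

# Assumed set path: second clock line rises -> four gating levels give (Q_2^4)_i
# -> one gate forms the set pulse -> SET input -> output inverter -> C_i.
set_path = path_delay(4 * GATE, GATE, JK_SET_RESET, GATE)

# Assumed clocked path: one-shot expires -> flip-flop clocked -> output inverter -> C_i.
expiry_path = path_delay(JK_CLOCKED, GATE)

print(set_path, expiry_path)  # 9 3
```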
For now, assume that the clock elements are oscillating in synchronism; the clock's self-starting capability may be examined separately. In the manner of the analysis of the first concept, assume that at time $t = t_A$, $C_A \rightarrow 1$; at $t = t_B$, $C_B \rightarrow 1$; at $t = t_C$, $C_C \rightarrow 1$; and at $t = t_D$, $C_D \rightarrow 1$, where $t_D > t_C > t_B > t_A$. Assume that at an initial time of observation all propagation within each clock element caused by the previous transition, $C_i \rightarrow 0$, has ceased; then, initially:

$$C_i = 0, \qquad \overline{TR}_i = 1, \qquad \overline{TS}_i = 1,$$

and $ROS_i$ is indeterminable from a static analysis.

Given the initial state and the assumed progression of events, the timing analysis is as follows:
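$$C_A \rightarrow 1 \text{ at } t = t_A \qquad (3.102)$$
$$C_B \rightarrow 1 \text{ at } t = t_B \qquad (3.103)$$
$$(Q_2^4)_i \rightarrow 1 \text{ at } t_B + 4\Delta \qquad (3.104)$$
$$\overline{S}_i \rightarrow 0 \text{ at } t_B + 5\Delta \qquad (3.105)$$
$$\overline{S}_i \rightarrow 1 \text{ at } t_B + 8\Delta \qquad (3.106)$$
$$C_C \rightarrow 1 \text{ at } t_C, \text{ where } t_B < t_C < t_B + 9\Delta \qquad (3.107)$$
$$C_D \rightarrow 1 \text{ at } t_D, \text{ where } t_C < t_D < t_B + 9\Delta \qquad (3.108)$$
$$(Q_3^4)_i \rightarrow 1 \text{ at } t_C + 4\Delta \qquad (3.109)$$
$$\overline{TS}_i \rightarrow 0 \text{ at } t_C + 5\Delta \qquad (3.110)$$
$$\overline{TS}_i \rightarrow 1 \text{ at } t_C + 8\Delta \qquad (3.111)$$

$ROS_i$ is triggered either at $t = t_C + 6\Delta$ or, within $t_C + 3\Delta < t < t_C + 6\Delta$, by its own expiration ($ROS_i \rightarrow 0$); therefore $ROS_i$, $i = A, B, C$, is triggered at $t = t_C + 6\Delta$, and $ROS_D$ is triggered within $t_C + 3\Delta < t < t_C + 6\Delta$.

$$ROS_D \rightarrow 0 \text{ within } t_C + 3\Delta + \Delta t < t < t_C + 6\Delta + \Delta t \qquad (3.112)$$
$$ROS_i \rightarrow 0,\ i = A, B, C, \text{ at } t_C + 6\Delta + \Delta t \qquad (3.113)$$
$$C_D \rightarrow 0 \text{ within } t_C + 6\Delta + \Delta t < t < t_C + 9\Delta + \Delta t \qquad (3.114)$$
$$ROS_D \text{ is triggered within } t_C + 4\Delta + \Delta t < t < t_C + 7\Delta + \Delta t, \text{ and } ROS_i,\ i = A, B, C, \text{ is triggered at } t = t_C + 7\Delta + \Delta t \qquad (3.115)$$

$$(Q_2^4)_i \rightarrow 0 \text{ at } t_C + 13\Delta + \Delta t \qquad (3.116)$$
$$(Q_3^4)_i \rightarrow 0 \text{ at } t_C + 13\Delta + \Delta t \qquad (3.117)$$
$$\overline{TR}_i \rightarrow 0 \text{ at } t_C + 13\Delta + \Delta t \qquad (3.118)$$
$$\overline{TR}_i \rightarrow 1 \text{ at } t_C + 16\Delta + \Delta t \qquad (3.119)$$
$$\overline{R}_i \rightarrow 0 \text{ at } t_C + 15\Delta + \Delta t \qquad (3.120)$$
$$\overline{R}_i \rightarrow 1 \text{ at } t_C + 18\Delta + \Delta t \qquad (3.121)$$

$ROS_i$ is triggered at $t_C + 14\Delta + \Delta t$.

$$ROS_i \rightarrow 0 \text{ at } t_C + 14\Delta + 2\Delta t \qquad (3.122)$$
$$C_i \rightarrow 1 \text{ at } t_0 = t_C + 19\Delta + 2\Delta t \qquad (3.123)$$
$$(Q_2^4)_i \rightarrow 1 \text{ at } t_0 + 4\Delta \qquad (3.124)$$
$$(Q_3^4)_i \rightarrow 1 \text{ at } t_0 + 4\Delta \qquad (3.125)$$
$$\overline{TS}_i \rightarrow 0 \text{ at } t_0 + 5\Delta \qquad (3.126)$$
$$\overline{TS}_i \rightarrow 1 \text{ at } t_0 + 8\Delta \qquad (3.127)$$

$ROS_i$ is triggered at $t_0 + 6\Delta$.

$$ROS_i \rightarrow 0 \text{ at } t_0 + 6\Delta + \Delta t \qquad (3.128)$$
$$C_i \rightarrow 0 \text{ at } t_0 + 9\Delta + \Delta t \qquad (3.129)$$
$$(Q_2^4)_i \rightarrow 0 \text{ at } t_0 + 13\Delta + \Delta t \qquad (3.130)$$
$$(Q_3^4)_i \rightarrow 0 \text{ at } t_0 + 13\Delta + \Delta t \qquad (3.131)$$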
Before proceeding any further with the analysis of the current concept of the McKenna clock, a major fault in the design should be pointed out: $(Q_2^4)_i \rightarrow 1$ forces $C_i \rightarrow 1$ (if this transition has not already occurred) at a time $9\Delta$ later, and $(Q_3^4)_i \rightarrow 0$ forces $C_i \rightarrow 0$ at a time $10\Delta$ later; but in the clock element in which a transition of $C_i$ occurs as a result of $(Q_2^4)_i \rightarrow 1$ or $(Q_3^4)_i \rightarrow 0$, the retriggerable one-shot expires, thereby clocking the flip-flop and forcing an unintended change of state. The occurrence of these glitches in two clock outputs could induce a loss of synchronization within the system. A possibility for correcting the design is: remove $\overline{TR}$ and $\overline{TS}$, retrigger the one-shot with $\overline{S}$ or $\overline{R}$, and insert delays between $\overline{S}$ and the SET input of the flip-flop and between $\overline{R}$ and the RESET input of the flip-flop, such that the one-shot cannot expire in the half-cycle in which $C_i$ has been changed by $\overline{S}$ or $\overline{R}$. To serve such a purpose, delay times of $6\Delta$ are sufficient.
Here, as in the first concept, the occurrence of a failure may introduce a glitch in $Q_2^4$ or $Q_3^4$, subsequent slivering within the clock elements, and possible loss of synchronization. The same modification is recommended here as was introduced for the first concept. Figure 3.25 illustrates the author's revised design of the McKenna clock.
Assume that the rise-time of the one-shots used for filtering $Q_2^4$ and $Q_3^4$ is $\Delta$. Then when $Q_3^4 \rightarrow 1$, $C_i \rightarrow 0$ $11\Delta + \Delta t$ later; when $Q_2^4 \rightarrow 0$, $C_i \rightarrow 1$ $10\Delta + \Delta t$ later. The period of the clock is:

$$T = 21\Delta + 2\Delta t$$

Fig. 3.25 Revised McKenna Clock

The expiration of the one-shot triggers the one-shot $\Delta$ later; $\Delta t$ must be such that the one-shot will not expire again before it is retriggered by $\overline{S}_i$ or $\overline{R}_i$. From the time of expiration to retriggering by $Q_3^4 \rightarrow 1$ or $Q_2^4 \rightarrow 0$ (assuming perfect synchronism), the time which elapses is $10\Delta$ or $11\Delta$. In order to allow for the results of differences in propagation delay amongst clock elements (e.g., internal delay, phase differences amongst the $C_i$), $\Delta t$ should be greater than or equal to $20\Delta$. Therefore:

$$f_{max} = \frac{1}{61\Delta}$$

and if medium-speed TTL technology is used ($\Delta \approx 12$ ns), $f_{max} \approx 1.4$ MHz.
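The arithmetic behind $f_{max}$ can be reproduced from the two half-cycle times quoted above ($11\Delta + \Delta t$ and $10\Delta + \Delta t$, which sum to $21\Delta + 2\Delta t$) together with the requirement $\Delta t \ge 20\Delta$. The helper names below are hypothetical; the sketch evaluates the medium-speed TTL case ($\Delta \approx 12$ ns) and the MECL III case ($\Delta \approx 1$ ns) discussed in Section 3.5.2.

```python
def revised_mckenna_period_ns(delta_ns, dt_ns):
    """T = (11Δ + Δt) + (10Δ + Δt) = 21Δ + 2Δt for the revised McKenna clock."""
    return 21 * delta_ns + 2 * dt_ns

def f_max_mhz(delta_ns, dt_in_deltas=20):
    """Highest frequency, with Δt at its minimum allowed value of 20Δ."""
    period_ns = revised_mckenna_period_ns(delta_ns, dt_in_deltas * delta_ns)
    return 1_000 / period_ns  # 1/ns -> MHz

print(round(f_max_mhz(12), 2))  # medium-speed TTL, Δ ≈ 12 ns: 1.37
print(round(f_max_mhz(1), 1))   # MECL III, Δ ≈ 1 ns: 16.4
```

With $\Delta t = 20\Delta$ the period is exactly $61\Delta$ (732 ns for TTL), and the text's 1.4 MHz is this 1.37 MHz rounded.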
Retriggerable one-shots with timing adjustments are available: if these are used, they may be adjusted after the clock is wired, thereby allowing the synchronous quality to be "peaked". Indeed the use of adjustable retriggerable-one-shots obviates concern with $\Delta T_u$, $\Delta T_d$ (defined in Sec. 3.4.1), or frequency variation caused by a single failure. After the clock system has been peaked, deterioration of the synchronous quality is restricted to that introduced by component aging or environmental factors (such as temperature variations).
The retriggerable-one-shot is responsible for the self-starting property of the clock. At power-on the flip-flops come up in an arbitrary state. If $ROS_i$ comes up logic-1, then it has just been triggered and will expire $\Delta t$ later; if it comes up logic-0, it triggers itself and will expire $\Delta t$ later. Within short order $Q_2^4 \rightarrow 0$ or $Q_3^4 \rightarrow 1$ occurs and the clock elements begin oscillating in synchronism.
Fault-tolerance is achieved in this design in the same manner as described in Section 3.4.1; if the clock is to be n-fault-tolerant, $3n + 1$ clock elements are required.
3.5 Speeding Up the McKenna Clock
3.5.1 Advantages of Greater Speed
The speed of a synchronous processor is directly related to the clock frequency. Current technology allows the design of a synchronous processor which runs at 20 MHz or more. The maximum frequency obtainable by the clock shown in Fig. 3.25 is, with TTL technology, 1.4 MHz: to run a processor at this rate can be highly inefficient. It is, therefore, desirable to have the capability of operating a fault-tolerant clock at a substantially greater speed.
3.5.2 Application of Advanced Device Technology
The numerical values of clock frequencies which have been derived in this thesis have been derived assuming the usage of medium-speed TTL technology. But there is, today, higher speed technology available. In 1968, Motorola began marketing a logic family which offers 1 ns gate propagation delays (MECL III). The McKenna-type clock shown in Fig. 3.25, if implemented in MECL III, could operate at 16.4 MHz.

Certainly, with the development of higher speed technologies, proportionately higher speed fault-tolerant clocks may be built; it should be realized, however, that the availability of higher speed technologies will also allow the design of higher speed processors. As such, it is desirable to design a fault-tolerant clock which can run at 20 MHz utilizing medium-speed TTL technology; then the implementation of both processor and clock with any higher speed technology will yield a proportionately higher speed computing system.
3.5.3 Revised Circuit
Simply by connecting $\overline{TR}_i$ to the SET input of the flip-flop and $\overline{TS}_i$ to the RESET input of the flip-flop, the frequency is tripled. The revised clock element is shown in Fig. 3.26. The basic operating principles are: $(Q_2^4)_i \rightarrow 0$ induces $C_i \rightarrow 1$ (after a delay, $9\Delta$) and $(Q_3^4)_i \rightarrow 1$ induces $C_i \rightarrow 0$ (after a delay, $10\Delta$). The period is given by:

$$T = 19\Delta$$

For medium-speed TTL technology, $f \approx 4.4$ MHz. The retriggerable-one-shot remains high during synchronous operation: it is needed here only for initialization of oscillation. In order to assure a start, only three of the four clock elements of a single-fault-tolerant clock need be provided with a retriggerable-one-shot and its associated circuitry. Consider such a design where, for $ROS_a$, $\Delta t = 20\Delta$; for $ROS_b$, $\Delta t = 35\Delta$; and for $ROS_c$, $\Delta t = 55\Delta$. At power-on the flip-flops come up in an arbitrary state and the retriggerable-one-shots are triggered. Fig. 3.27 shows how the clock is started from each of the sixteen possible initial conditions. If one of the retriggerable-one-shots fails, a start is still assured.
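The claim that the frequency is tripled can be checked by comparing $T = 19\Delta$ with the $61\Delta$ minimum period of the revised McKenna clock of Fig. 3.25 (with $\Delta t = 20\Delta$), taking $\Delta \approx 12$ ns for medium-speed TTL as above. The helper name is hypothetical; the ratio $61/19 \approx 3.2$ does not depend on $\Delta$.

```python
PERIOD_REVISED_MCKENNA = 61  # minimum T in units of Δ for Fig. 3.25 (Δt = 20Δ)
PERIOD_REVISED_CIRCUIT = 19  # T in units of Δ for Fig. 3.26

def freq_mhz(period_in_deltas, delta_ns):
    """Clock frequency in MHz for a period given in gate delays."""
    return 1_000 / (period_in_deltas * delta_ns)

f_old = freq_mhz(PERIOD_REVISED_MCKENNA, 12)
f_new = freq_mhz(PERIOD_REVISED_CIRCUIT, 12)
print(round(f_new, 1), round(f_new / f_old, 2))  # 4.4 3.21
```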
The disadvantages of this design stem from the circumstance that the frequency is dependent solely on gate delays. Because there are no devices having adjustable time delays within the clock elements, it is not simple to design the clock for a particular frequency; neither is it possible to minimize the phase differences between clock elements through the method of peaking described in Section 3.4.2.

Fig. 3.26 Revised Circuit
Fig. 3.27 Analysis of self-starting operation of revised circuit
3.5.4 Increased Speed by Frequency Multiplication 
The frequency of the distributed clock can be increased through use of the concept illustrated in Fig. 3.28 for single-fault-tolerance. $C_A$, $C_B$, $C_C$, and $C_D$ are the outputs of a single-fault-tolerant McKenna-type Clock; these "low frequency" clocks are supplied to triplicated two-out-of-three voters, the outputs of which are each supplied to a frequency multiplier (×N). The outputs of the multipliers, $C_1$, $C_2$, and $C_3$, are distributed to the data management system.
Several methods of synthesizing the frequency multiplication come to mind: the first is illustrated in Fig. 3.29. The method of operation of Fig. 3.29 may be described as a burst generator; for each pulse of the input, the multiplier produces a "burst" of n pulses. The limitations of this circuit are determined by: frequency and frequency variation of the input, minimum pulse width producible by the one-shot, tolerance in the propagation delay of the one-shot, tolerance in the pulse width, minimum delay-times available, tolerances in delay-times, and the tolerance in the propagation delay of the OR gate. The limitations imposed by the tolerances may be minimized by careful component selection; nevertheless some tolerance must be allowed: ±2% seems reasonable. Assume that a one-shot is available with a pulse width of 25 ns and a rise time of 10 ns. Assume that a tapped delay line is available with total delay 500 ns and taps available at 50 ns intervals (the Digital Equipment Corporation manufactures just such a delay line; they claim a tolerance from the input to each delay tap of ±5%, hence that tolerance will be allowed for the delays considered here).

Fig. 3.28 Single-Fault-Tolerant Frequency Multiplication
Fig. 3.29 Burst Generator

There is a trade-off between the quality of synchronization at the outputs of the three frequency multipliers and the number of 50 ns delays used. Not including delays introduced by the gates, corresponding pulses passing through independent 500 ns delays may be separated, at the output, by as much as 50 ns (if 25 ns pulses are considered, the trailing-edge-to-leading-edge separation is 25 ns; synchronism has been destroyed). If 25 ns pulses pass through independent delays, and it is desired at the output to have a minimum overlap of 15 ns, then the maximum difference between delay-times is 10 ns, and hence, for an allowed ±5% tolerance, the largest delay-time which may be used is 100 ns. Therefore, it is concluded that unless precision delay lines are made available, the frequency multiplier of Fig. 3.29 is restricted to producing an integral multiple of 3 or less if it is to produce a 20 MHz clock synchronously with other similar frequency multipliers. The limitation on the multiplying capability of Fig. 3.29 requires that the input frequency be 6.7 MHz for the output frequency to be 20 MHz. As developed in Section 3.5.3, the operating frequency of the revised, and fastest, McKenna-type clock is 4.4 MHz for a medium-speed TTL implementation.
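The tolerance argument above reduces to three small formulas: two independent delays of nominal length $d$, each within $\pm\epsilon$, can differ by up to $2\epsilon d$; requiring an overlap $w_{min}$ between pulses of width $w$ therefore limits $d \le (w - w_{min})/(2\epsilon)$; and taps every 50 ns up to that limit, plus the undelayed pulse, give the number of pulses per burst. The sketch below (function names are illustrative) reproduces the 50 ns, 100 ns, factor-of-3, and 6.7 MHz figures, using exact fractions so that floating-point rounding cannot move the result across a tap boundary.

```python
import math
from fractions import Fraction

def worst_case_skew(delay_ns, tol):
    """Largest separation of one pulse passed through two independent delays of
    nominal length delay_ns, one at (1 + tol) * delay_ns, one at (1 - tol) * delay_ns."""
    return 2 * tol * delay_ns

def max_usable_delay(pulse_ns, min_overlap_ns, tol):
    """Longest nominal delay whose worst-case skew still leaves min_overlap_ns."""
    return (pulse_ns - min_overlap_ns) / (2 * tol)

def max_multiple(max_delay_ns, tap_ns):
    """Pulses per burst using taps at 0, tap_ns, 2*tap_ns, ... up to max_delay_ns."""
    return math.floor(max_delay_ns / tap_ns) + 1

TOL = Fraction(5, 100)                   # +/-5% tap tolerance (DEC delay line)
print(worst_case_skew(500, TOL))         # 50: the full 500 ns line
d_max = max_usable_delay(25, 15, TOL)    # 25 ns pulses, 15 ns minimum overlap
n = max_multiple(d_max, 50)              # taps every 50 ns
print(d_max, n, round(20 / n, 1))        # 100 3 6.7
```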
The difficulty with the multiplier of Fig. 3.29 suggests the design shown in Fig. 3.30. The voting between multiplying stages corrects for the phase differences amongst the outputs of the multipliers of the preceding stage. $M1_i$, $M2_i$, and $M3_i$ are illustrated in Figs. 3.31, 3.32, and 3.33 respectively. In Fig. 3.31, the one-shot pulse width is 100 ns; in Fig. 3.32, 50 ns; in Fig. 3.33, 25 ns. Note that for a 20 MHz output only a 2.5 MHz input is required: hence the requirements placed on the design of the fault-tolerant clock may be lessened.

Fig. 3.30 Frequency Multiplication through use of cascaded burst generators (signals labelled 5 MHz, 10 MHz, and 20 MHz)
Fig. 3.31 $M1_i$ of Fig. 3.30 (one-shot pulse width = 100 ns, $\Delta_1$ = 200 ns)
Fig. 3.32 $M2_i$ of Fig. 3.30 (one-shot pulse width = 50 ns, $\Delta_2$ = 100 ns)
Fig. 3.33 $M3_i$ of Fig. 3.30 (one-shot pulse width = 25 ns, $\Delta_3$ = 50 ns)
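The stage parameters listed for Figs. 3.31 to 3.33 can be checked for consistency. The sketch below assumes (this is an interpretation, not stated in the text) that each stage doubles its input rate by emitting each input pulse plus a copy delayed by $\Delta_k$, so $\Delta_k$ must equal half of that stage's input period. It then applies the ±5% delay tolerance used above to each stage's own delay only, since the voters between stages are said to correct the phase differences. Under these assumptions a 20 MHz output implies a 2.5 MHz input, and the worst-case overlap at the last stage is 20 ns, above the 15 ns minimum used earlier.

```python
from fractions import Fraction

TOL = Fraction(5, 100)  # +/-5% delay tolerance, as allowed for the tapped delay line

# (stage, one-shot pulse width in ns, burst delay in ns) as listed for Figs. 3.31-3.33
STAGES = [("M1", 100, 200), ("M2", 50, 100), ("M3", 25, 50)]

def cascade(f_out_mhz):
    """Trace the cascade of Fig. 3.30 back from the desired output frequency.

    Assumption: each stage doubles its input rate, so its burst delay must be
    half the stage's input period. Returns the implied input frequency and, per
    stage, (name, output MHz, worst-case skew ns, worst-case overlap ns)."""
    f = Fraction(f_out_mhz) / 2 ** len(STAGES)
    f_in = f
    rows = []
    for name, width, delay in STAGES:
        period_in_ns = Fraction(1000) / f          # MHz -> ns
        assert delay == period_in_ns / 2, f"{name}: delay is not half the period"
        f *= 2
        skew = 2 * TOL * delay                     # replicas at (1 +/- TOL) * delay
        rows.append((name, f, skew, width - skew)) # voters realign between stages
    return f_in, rows

f_in, rows = cascade(20)
print(float(f_in))  # 2.5
for name, f, skew, overlap in rows:
    print(name, float(f), float(skew), float(overlap))
```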
Another method of synthesizing the frequency multiplication is through use of a phase-locked loop, as illustrated in Fig. 3.34. The shortcoming of a system which utilizes phase-locked loops for frequency multiplication, however, is that in the event of an input clock failure, and hence a subsequent change in input frequency, independent phase-locked loops may not maintain synchronism while their outputs adjust to the new frequency.

Fig. 3.34 Frequency multiplication by use of a phase-locked loop (output $f_o = N f_i$)

It is concluded that for speeding up the McKenna Clock (or any other slow clock) the cascaded burst generator design offers the most viable solution.

3.6 Methods of Synchronization Used in Pulse-Code Modulation
The synchronization of pulse-code modulation (PCM) networks has long been a subject of interest. The question of synchronizing switching centers, in addition to transmission links, arose in 1956, when the PCM telephone switching experiment, later named Essex, was planned.

Possible methods of synchronization, as described by Mumford and Smith (Ref. 3), are as follows:
(1) Homochronous system. One station in the network has a master oscillator, and all the others are locked to it.
(2) Synchronous system. Each station is phase-locked to the average of several signals.
(3) Each station has its own frequency control, but the bit rate can be changed to one of two or three discrete rates, or the frame rate may be changed by adding or dropping special bits, so as to ensure that the average frequency of each station is the same as any other, if a long period is considered. At any instant there will be a phase error between an incoming signal and the local signal, which must not be allowed to exceed a specified maximum in operation.
(4) Heterochronous system. Each station generates its own frequency within a specified tolerance of the nominal frequency. The tolerance must be kept small enough to reduce to negligible proportions the loss of information which occurs when a fast signal arrives at a slower station.
The only one of the four systems named above which offers potential application to the design of a fault-tolerant clock is the synchronous system. Such a method of synchronization of oscillators falls into the general method illustrated in Fig. 3.2. For use in PCM, schemes have been developed which consist of averaging the phases of all oscillators, comparing the result with each oscillator phase, and applying an error signal as a correction to the oscillator frequency. The oscillators which are used have frequencies which may be altered in proportion to a control signal; in the absence of external control, each oscillator operates at a different frequency. A system which has been examined by Gersho and Karafin (Ref. 4) is illustrated in Fig. 3.35. It has been proven that under suitable conditions the system is stable, i.e., the oscillators asymptotically settle to a common frequency and the phase differences have finite asymptotic values.

Fig. 3.35 Synchronization of PCM station oscillators - station i

The system illustrated in Fig. 3.35 is offered by Gersho and Karafin as an abstraction of the more complex practical systems that have been developed; the system of Fig. 3.35 requires a total phase comparison to be made, which is impractical to implement. In the PCM system, failure of a transmission link will lead to resynchronization if the remaining network is still connected. Also, in the case of oscillator failure, the remaining $N-1$ oscillators will resynchronize to a new frequency, if the resulting network of $N-1$ stations is still connected after removal of all transmission links entering or leaving the inoperative station.
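A minimal numerical sketch of the phase-averaging scheme described above, assuming a linear first-order model $\dot\theta_i = \omega_i + K(\bar\theta - \theta_i)$, where $\bar\theta$ is the average phase of all oscillators. This simplified model is chosen for illustration and is not Gersho and Karafin's general formulation; the frequencies and gain are arbitrary. In this model the oscillators settle to the common frequency $\bar\omega$ (the mean of the free-running frequencies) with finite phase offsets $(\omega_i - \bar\omega)/K$; after one oscillator is removed, the survivors re-settle to their own mean, so the common frequency shifts.

```python
def simulate(omegas, gain, theta=None, dt=1e-3, steps=20_000):
    """Euler integration of d(theta_i)/dt = omega_i + gain*(mean(theta) - theta_i).

    Returns the final phases and each oscillator's frequency at the last step."""
    n = len(omegas)
    theta = list(theta) if theta is not None else [0.0] * n
    freqs = list(omegas)
    for _ in range(steps):
        mean_theta = sum(theta) / n
        # each oscillator is pulled toward the average phase of all oscillators
        freqs = [w + gain * (mean_theta - th) for w, th in zip(omegas, theta)]
        theta = [th + f * dt for th, f in zip(theta, freqs)]
    return theta, freqs

omegas = [1.00, 1.20, 0.90, 1.10]  # free-running frequencies (arbitrary units)
K = 5.0
theta, freqs = simulate(omegas, K)
mean = sum(theta) / len(theta)
offsets = [th - mean for th in theta]
print([round(f, 4) for f in freqs])    # all close to 1.05, the mean of omegas
print([round(o, 4) for o in offsets])  # close to (omega_i - 1.05) / K

# Oscillator 1 fails and is removed; the survivors re-settle to their own mean.
survivors = [0, 2, 3]
theta2, freqs2 = simulate([omegas[i] for i in survivors], K,
                          theta=[theta[i] for i in survivors])
print([round(f, 4) for f in freqs2])   # all close to 1.0
```

The shift from 1.05 to 1.0 after the removal is the kind of frequency readjustment during which, as argued next, synchronous operation of a clock cannot be assured.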
It is impractical to utilize the above system in the design of a fault-tolerant clock. Not only must provision be made to remove a failed oscillator from the system, but during the time when the system is adjusting to a new frequency, as a result of the failure of a link or oscillator, synchronous operation cannot be assured.
CHAPTER 4 
CONCLUSIONS 
In a multiprocessor which achieves fault-tolerance through replication, corresponding units must be kept in synchronism. In Chapter 2 the synchronization requirements have been defined and methods of maintenance of synchronism have been developed: the primary conclusion to be drawn from Chapter 2 is that a synchronous fault-tolerant multiprocessor driven by a fault-tolerant clock is more efficient and more easily implemented than is an asynchronous fault-tolerant multiprocessor. This conclusion leads naturally to the examination of fault-tolerant clocking.

In Chapter 3 specifications have been developed for a fault-tolerant clock and two general methods of design have been explored. It has been concluded that fault-tolerant clocking through failure-detection and subsequent clock substitution is possible but impractical to implement in a system of many synchronous modules due to the high cost of each module transducer. Fault-tolerant clocking through the concepts advanced by William Daly and John McKenna has been studied intensively: clocks developed by Daly and McKenna have been examined, refined, and revised. It has been concluded that it is desirable to have available a fault-tolerant clock which runs at 20 MHz, but that such a frequency is not achievable by a McKenna-type clock (with use of current technology). A method of achieving a 20 MHz clock by the use of a relatively slow McKenna-type clock in conjunction with a frequency multiplier has been developed in Section 3.5.4.
REFERENCES
1. Larsen, R.W., and Reed, I.S., "Redundancy by Coding Versus Redundancy by Replication for Failure-Tolerant Sequential Circuits," IEEE Transactions on Computers, Vol. C-21, No. 2, February, 1972, pp. 130-137.
2. Daly, W., and McKenna, J., M.I.T. C.S. Draper Laboratory, Digital Development Memo #627, August, 1971.
3. Mumford, H., and Smith, P.W., "Synchronization of a p.c.m. Network Using Digital Techniques," Proc. of the Institution of Electrical Engineers, Vol. 113, No. 9, September, 1966, pp. 1420-1428.
4. Gersho, A., and Karafin, B.J., "Mutual Synchronization of Geographically Separated Oscillators," Bell System Technical Journal, Vol. XLV, No. 10, December, 1966.
5. M.I.T. C.S. Draper Laboratory, A Fault-Tolerant Information Processing System For Advanced Control, Guidance, Report R-659, Cambridge, Mass., May, 1970.
6. M.I.T. C.S. Draper Laboratory, Report E-2529, Cambridge, Mass., June, 1970.
7. M.I.T. C.S. Draper Laboratory, (Task 28-S), Vol. I, Cambridge, Mass., December, 1971.
