Dynamic modelling and estimation of the error due to asynchronism in a redundant asynchronous multiprocessor system by Duval, R. W. & Huynh, Loc C.
NASA CONTRACTOR REPORT CR177427
1
1
Dynamic Modelling and Estimation
of the Error Due to Asynchronism
in a Redundant Asynchronous Multiprocessor System
(NASA-CB-177U27) D Y N A M I C MODELLING AND N88-27205
ESTIMATION OF THE ERBO.H DUE TO A S Y K C H H O N I S M
IN A R E D U N D A N T A S Y N C H R O N O U S MULTIPROCESSOR
SYSTEM {Advanced Rotorcraft Technology) Unclas
36 p CSCL 01C G3/08 ; 015.7097
Loc c. Huynh and R; W. Duval
P.O. A30146C (HMK)
May 1986
NASA
https://ntrs.nasa.gov/search.jsp?R=19880017821 2020-03-20T05:27:31+00:00Z
NASA CONTRACTOR REPORT CR177427
Dynamic Modelling and Estimation
of the Error Due to Asynchronism
in a Redundant Asynchronous Multiprocessor System
Loc c. Huynh and R. W. Duval
Advanced Rotocraft Technology, Inc.
Mountain View, California
May 1986
Prepared for
Ames Research Center
Under P.O. A30146C (HMK)
NASA
National Aeronautics and
Space Administration
Ames Research Center
Moffett Field, California 94035
TABLE OF CONTENTS
Section Page
I. Background 1
II. Objective and Outline of Report 2
III. Approach 2
IV. Asynchronous Multiprocessor System Description 3
V. Simulation Program 5
VI. Definition of Error Due To Asynchronisity .-8
Error at CPU-A1 and CPUA-2 Output Times 8
Error at CPU-B Input Times 8
VII. Derivation of Mathematical Model of Error 10
VIII. Analysis of Error Model 13
Error Prediction for A-Node 13
Error Prediction for B-Node 14
Select CPUl Logic 14
Midvalue Select Logic 15
Input Smoothing 15
Error Prediction for C-Node 16
Summary of CPU Operations 18
IX. Parameter Identification 18
Mathematical Derivation 18
Summary of CPU Operations 20
X. Results of Parameter Identification Using Simulation 22
XI. Summary of The Results of This Study • 27
XII. Conclusions 27
XIII. Recommendations 27
References 29
LIST OF FIGURES
Figure 4.1 :
Figure 4.2 :
Figure 4.3 :
Figure 5.1 :
Figure 5.2,:
Figure 6.1 :
Figure 6.2 :
Figure 6.3 :
Figure 6.4 :
Figure 8.1 :
Figure 8.2 :
Figure 8.3 :
Figure 8.4 :
Figure 8.5 :
Figure 8.6 :
Figure 8.7 :
Figure 8.8 :
Figure 8.9 :
Figure 8.10 :
Figure 8.11 :
Figure 10.1 :
Schematic Diagram of RAMP System
Faults Resulting in Data Alteration
Faults Resulting in Improper Execution
Data Flow Diagram
Executive Flowchart
Error at CPU-Al and CPU-A2 Output Times
Error at CPU-B Input Times
Synchronized Error Definition
Synchronized Error at A-Node
Error Prediction at A-Node
Error at B-Node; Select CPUl
Synchronized error at B-Node; Select CPUl
Error at B-Node; Select Midvalue
Synchronized Error at B-Node; Select Midvalue
Error Prediction at B-Node
Error at C-Node
Synchronized Error at C-Node
Error Prediction at C-Node; Interpolation
Error Prediction at C-Node; Extrapolation
CPU Diagram
AT Estimate (AT(0) = AT*; No Feedback)
in
Figure 10.2 :
Figure 10.3 :
Figure 10.4 :
Figure 10.5 :
Figure 10.6 :
Figure 10.7 :
Figure 10.8 :
Figure 10.9 :
Figure 10.10 :
Figure 10.11 :
Error Estimate (AT(0) = AT*; No Feedback)
AT Estimate (AT(0) = ^AT*; No Feedback)
Error Estimate (AT(0) = ±AT*; No Feedback)
AT Estimate (Ar(0) = AT*; Full Feedback, Update Every 35 Cycles)
Error Estimate (AT(0) = AT*; Full Feedback, Update Every 35 Cycles)
AT Estimate (Ar(0) = AT*; Full Feedback, Update at Steady State)
Error Estimate (AT(0) = AT*; Full Feedback, Update at Steady State)
Error Estimate (AT(0) = AT*; Scaled Feedback,
Update at Steady State)
AT Estimate (Ar(0) = 1.5AT*; Scaled Feedback,
Update at Steady State)
Error Estimate (AT(0) = 1.5AT*; Scaled Feedback,
Update at Steady State)
IV
ABSTRACT
The use of Redundant Asynchronous Multiprocessor System to achieve Ultra reli-
able Fault Tolerant Control Systems shows great promise. The development has been
hampered by the inability to determine whether differences in the outputs of redundant
CPU's are due to failures or to accrued error built up by slight differences in CPU clock
intervals. This study derives an analytical dynamic model of the difference between re-
dundant CPU's due to differences in their clock intervals and uses this model with on-line
parameter identification to identify the differences in the clock intervals. The ability of
this methodology to accurately track errors due to asynchronisity is demonstrated using a
simulated multiprocessor system. The algorithms generate an error signal with the effect
of asynchronisity removed and this signal may be used to detect and isolate actual system
failures.
I. BACKGROUND
Increased reliance on computers and distributed processing in flight control systems
requires higher reliability. A new approach to achieving higher reliability is to distribute
sequential computational operations over multiple Microprcessor Central Processing Units
(CPU's.) Increased reliability of the sequential CPU's is then achieved using parallel
pipelines of sequential CPU's, with each CPU comparing the results of the parallel oper-
ations at the previous sequential stage to identify and isolate failures. Such a Redundant
Asynchronous Multiprocessor System (RAMPS) has been designed and tested [1]. Advan-
tages of this system include simplified fault detection and isolation logic since the results
at each sequential computation are generated redundantly for comparison. Also the re-
liability is higher than other redundant architectures since the operation of each CPU .is
independent of failures in any other CPU.
Using totally autonomous CPU's to achieve high reliability results in a major im-
pediment to successful implementation of this system. Since each CPU must have its own
clock to be totally independent of other CPU's, the parallel computations will be performed
asynchronously. Small differences in the CPU clocks can propagate into large differences
in the parallel computations over a period of time and the fault detection logic will be
unable to distinguish between errors due to CPU failures and errors due to asynchronisity.
One approach to resolving the problem of asynchronisity has been to use asymptoti-
cally stable control laws in each CPU [2]. This assumes that the error due to asynchronisity
will be bounded. The disadvantage of this approach is that unrealistic constraints on the
control laws may be required to reduce the error due to asynchronisity to acceptable levels.
II. OBJECTIVE AND OUTLINE OF REPORT
The objective of this study is to examine the nature of the error due to asynchronisity
to determined if it is possible to model and track this source of error. If this is possible,
then this information can be used by the fault detection logic to account for differences in
parallel CPU's due to asynchonisity and it can also be used to periodically resynchronize
the CPU's to prevent excessive build up of this error source.
This report begins with a description of the approach used in accomplishing this ob-
jective. A description of the RAMP system and the simulation of this system on a single
CPU computer then follows. The next section defines the error due to asynchronisity and
examines the nature of this error using the computer simulation of the RAMP system. A
mathematical model of this error is then developed and tested against the error generated
by the simulation. Finally, on-line parameter identification is applied, using the developed
error model, to identify the difference in the CPU clocks and use this information to esti-
mate the error due to asynchronisity on-line. The algorithms are teste using the simulated
RAMP system.
III. APPROACH
In order to efficiently attack the problem, a simulation program was implemented to
study the behavior of RAMPS. The approach now can be presented in the following order:
1. Determine operational requirements to insure analytical error function: The def-
inition of the error function affects the nature of the error. The difference in the output
of two assynchronous CPU's is a highly nonanalytic function, however by "synchronizing"
the time at which this error is observed by a downstream CPU, the observed error function
can be made smooth and hence analytically describable.
2. Derive a mathematical model of the analytical error as function of the difference in
CPU clock intervals:The error between parallel asynchronous CPU's will be defined and
the analytical error function will be derived by applying Sample Data theory.
3. Identify the difference in CPU clocks by applying regression to error data with the
error model as a constraint.
4. Use the identified difference in the CPU clocks to track the error due to asynchro-
nism. The voting logic can then be based on other sources of error. Two voting logics will
be compared here:
- Midvalue selection and
- First value selection.
IV. ASYNCHRONOUS MULTIPROCESSOR SYSTEM DESCRIPTION
RAMPS can be described graphically in the following diagram where boxes represent
CPU's and circles represent buffers.
Sensors
(continuous) CPU A's CPU B's CPU C's
Fig. 4.1 Schematic Diagram of RAMP system-
The major characteristics of this system are as follows:
- The first parallel sequence consists of sensors. These sensors give continuous outputs.
- Each CPU is operating independently of the others.
- Each CPU outputs at a fixed frequency to buffers which are read by the downstream
CPU's.
- Parallel CPU's at the same sequential position perform the same task. Their exe-
cution times are slightly different (within a known tolerance and about a known nominal
time.)
- Sequential CPU's in the same parallel pipeline perform different tasks.
There are three kind of faults which can occur in the RAMPS modules[l]:
- Faults that cause data alteration (Fig. 4.2)
- Faults that result in improper execution of the designed code (Fig. 4.3.)
- Faults that cause excessive timing differences in parallel CPU's.
The purpose of this study is to isolate and identify errors due to timing differences so
that the other faults are not masked by the effect of the timing differences.
in
22
L 1
zo J in
o
^
A 
AL
TE
BA
T:
l i
HS
» < in
zuuyic "-i at • .
~^i§
fiSEu< s <i§y
8
•s "^
u in _
-SB g^gQ ff OL u« O
w> Ct
3
"iwO
M MUin
_|j3
as .
C m
'
h:
u
g
z it
BBy^ .£1
OS
.11siig • .
o1-1
-0 Sjw ~ce R
Jg§l
u
z p
*"* 1* •
i&5£u» tn
zs
ZMp* 7
— 525 — — •-
5-
3O
£-
^C M
*•
*•* >•
tf)
S
" z
< • o1
 J_ M QZ !f rj
M
 o U — —
u oc
r SB £5
— 3 i- S <
« is Mg
i i
ce
in
(4
J3
£
— § ci
•B 4
uc
i
w
—en
-|: 1
O OM
in a H
-§§:8i5 H o«
< L)C2ga|
E "
(J
o
in£3
-BSS2°Stw ztn
£
B
0.
O U)
ma?
_S!E
~2~§iSr 3gs
z
o
w
U «3(J UK rj
- — ri zP 5 s1
 5 KZ 2 <
< M IX
"• o 8
0 • K
a a
g
o
-ill— !
£ x 3Ul
z
W
S
M {/)
< HH
— <i " iJi in 5
°S^U u*Zt*
U
>-• U)-
_ssa
ti.X < 10 =1og • Sag
""«- C *i
z
nri
^
^S
as a
„, ft- «10 U. ^ .H a 5 V)
J U D H
~<5So 3
fc
 "H 5.
O M3 55
03 3
|T>
•«t
U
Eu
'
t u t
V. SIMULATION PROGRAM
A simulation of the RAMP system was implemented on a Micro-Vax II computer.
The purpose of the simulation was to investigate the differences in the output of parallel
CPU's due to differences in their internal clock rates. In order to illustrate the effect of
these clock differences the following example is given.
- Each of two CPU's has a 5 Mega hertz (Mhz) clock that is different by 0.001%. The
clock difference (AC) is then:
AC = —^—0.001% = 2. x 10-12sec./dockcyde5Mhz
- If the computation cycle (T) for the CPU's is 50 milliseconds (mS) the number of
clock cycles that it will take to produce one computation cycle difference between the
CPU's is:
„ . . - , . T 50 x 10~3 sec./comp.cycle. 9 , , , /ClockCycles — -—— = — r——; •— = 25 x 10 clockcycles/comp.cycleAC 2 x lQ-12sec./dockcyde , •
- The elapsed time (t) over which the CPU's will develope a one computation cycle
(50 mS) asynchronisity is then:
clockcycles /comp.cycle 25 x 109 .
t = — — = :—T- = 5000.sec/comp.cycle = 83mtn.2Qsec.dockcyde/sec. 5 x 106
Since most flights would run in excess of one hour, this problem will be significant. In
order to facilitate the analysis and minimize the require computer time for the analysis,
greatly exaggerated clock differences are used in the simulation.
In the simulation, each CPU solves an algorithm of the form:
Where the supercript (m = A, B, C) denotes the parallel pipeline and the subcript
(n = 1, 2, 3, ...) denotes the computation cycle number. The following table shows the
coefficients of the algorithm used at each sequential CPU along with the computation cycle
times for the three parallel CPU's at each sequential node.
time
required
(X
P
CPU-A
1
Al j A2 A3
1
0.02 (0.0202 0.0201
1
0.88
0.12
CPU-B
Bl B2 B3
0.035 0.0352 0.0351
0.93
0.07
CPU-C *
Cl C2
0.018 0.0182
0.85
0.15
C3
0.0181
The structure and data flow of the simulation program is shown in Fig. 5.1. On
each pass the Executive first calls the sensor subroutine to generate input data for the A
CPU's. If any of the three parallel A CPU's are due to receive data on this pass, the CPU
A subroutine is called and the results are stored in the output buffer of the appropriate A
CPU. The same process is applied to the B and C CPU's on each pass.
EXECUTIVE
CPU A
i
\
k
CPU B
i
\
i
Fig. 5.1 Data flow diagram.
Figure 5.2 shows the internal logic of the Executive. Real time (t) is incremented
by an amount DT chosen to be much smaller than the difference in computation cycle
times. The next interrupt time for each of the j (j=l,2,3) CPU's at the A, B and C
nodes is then computed. The sensor subroutine is called each real time pass to simulate a
continuous sensor input to CPU-A. For purpose of the simulation, the sensor outputs are
modeled as pure sinusoids. Random noise is added to each of the three sensor outputs.
Current real time is then compared to the interrupt times for all CPU's and any CPU's
requesting data on the current data pass are processed and the output stored in the
appropriate buffer. Whenever a CPU is processed, an incremented computation cycle
count (tyi(j),tB(y)»»c(j)) is returned for use in computing its next interrupt time.
Initialize
t(ime), DT, Run t
FOR J=l,2,3
tA(j] = iA(j}
) = ic(J)Tc(j)
Call sensor
FOR J=l,2,3
if (t > tA(j}} call CPU A (iA(j}}
if ( t>tB ( j ) ) callCPUB(iB(j)}
i f - ( t > t c ( j ) ) cdlCPUC(ic(j}}
Figure 5.2: Executive Flowchart
VI. DEFINITION OF ERROR DUE TO ASYNCHRONISITY
Denoting the output of CPU-A1 as Al and the output of CPU-A2 as A2 then the
error may be expressed in two ways:
ERROR AT CPU-A1 AND CPU-A2 OUTPUT TIMES
The error, A1-A2, is computed at each output interval for either CPU. The error is
seen to flip back and forth between two curves. One curve represents the error when CPUl
outputs and the other curve represents the error when CPU2 outputs. The result of this
error definition is shown in Fig. 6.1.
Fig. 6.1 Error at CPU-A1 and CPU-A2 Output Times
ERROR AT CPU-B INPUT TIMES
The error, A1-A2, is computed at points where CPU-B goes to the buffer to obtain
the outputs of the A CPU's. Since the B CPU's have clock times that are out of synchro-
nization with the A CPU's, B will sometimes sample when Al has output most recently
and sometimes when A2 has output most recently. The result is a sporadic oscillation
between the two bounding error curves, as shown in Fig. 6.2.
0.04
0.03
0.02
0.01
O.OO
-0.01
HI-, fH,
Fig. 6.2 Error at CPU-B Input times
The .non-analytic nature of the error as seen by CPU-B significantly complicates the
modeling process. This problem can be eliminated by defining the error such that the most
recent output used in the error definition always comes from the same CPU. This assures
that the error will always stay on the same bounding curve. This synchronization process
is shown in Fig. 6.3. In this figure, the output times associated with CPU's Al, A2 and B
are superimposed. The error associated with using the two most recent outputs, as seen by
CPU B, is shown under the chart. If each CPU tags its output with a computation cycle
number so that the error can be defined as the difference at the same computation cycles
then the resulting synchronized error is as given in the second line under Fig. 6.3. Note
that there is one input cycle for CPU B at which there is no new information available
using this definition. The resulting error is shown in Fig. 6.4 and, as predicted, stays on
the same bounding curve, providing a well defined and analytical function.
CPU-A2
CPU-Al
CPU-B
Y2
Yl
n
n
i1
n+1
i
n+1
II
'
'
.
n+2 |
, 11
 1
1
1
n+2 |
, 11
 1
Continuous error : Y^nJ-Y^n) YJ[n)-Y,(n+1) Y^n+D-Y^n+l) YI(n+2)-Y,(n-H2)
Synchronized error: Y^nJ-Y^n) - Y^n+D-Y/n+l) Y,,(n+2)-Y,(n4.2}
Fig. 6.3 Synchronized Error Definition
Fig. 6.4 Synchronized Error at Node A
VII. DERIVATION OF MATHEMATICAL MODEL OF ERROR
Assume that each CPU performs a first order linear differential equation of the form
X — —aX + bU. Hence the output in the discrete domain at cycle n+1 can be written as:
Xn+1 = aXn + (3Un
where
and T is the nominal sample interval.
Let CPU-1 be the standard CPU, the block diagram for CPU-1 is:
ZOH aXn+0Un Ln+l
where the Zero Order Hold (ZOH) holds the input constant over the sample interval.
The difference in clock times between CPU-A1 and CPU-A2 can be considered as the
time delay of CPU-A2 with respect to CPU-A1. The block diagram of CPU-A2 then can
be drawn as follows:
TTUn e
ra ZOH aXn+0Un
where a pure time delay of magnitude T has been added.
The model error now can be drawn by combining the above diagrams.
where
= [T2-r1]n
Tl : computational time of CPU-1
T2 : computational time of CPU-2
n : computational cycle number
The mathematical error model now can be derived as follows:
10
Using Laplace Transformation of the elements in the schematic gives:
U continuous
U*(s] h-^2(s)
E(s)
The transfer function for CPU-1 is:
. l -e -*T
U*(s)
and the transfer function for CPU-2 is:
s + a
l - e-aT
U*(s) s s + a
The transfer function for the error is then:
E(s) X2(s)-X l(s) _ l -e~ a T b
U*(s) s + a e
ra
 -
(7.1)
(7.2)
(7.3)
Where E(s) is the output resulting from the impulse train input U*(s). The impulse
transform may be converted to a Z-transform by
1) Substituting z = eaT where possible.
2) Converting the remaining s terms to functions of eaT by
performing a complex convolution [3] integral of the transfer function
with the Laplace transform of an impulse train [ .£[6(0] =
 1_e
1
-t«
3) Substituting z = eaT in the resulting equation.
Applying steps l) and 2) give:
E(z) 1
U(z) 2vj z /c 1 - e~T(a-
b\eTX - [d\ (7.4)
11
Using the residue theorem to solve this complex convolution integral gives:
E(z)
 = z -
U(z) z
*• ' res
In order to to facilitate evaluating the residues, a partial fraction expansion is applied to
get:
E(z]
 = z-
U(z] z
Evaluating the residues, equation (7.6) becomes:
E(z) b z - l 1
tf(«) a z i_e-r( s+a)
The final result is obtained by substituting z — eaT in (7.7) to get:
(7.7)
The difference equation for the error is then obtained from the Z-domain transfer function
as:
En+1 = e-'TEn + -(1 - e-*r)(Un+l - Un) (7.9)
a
Assuming the CPU's are initially synchronized, the phase shift, r, beween the CPU's is a
time varying function given by:
r = ^AT (7.10)
where AT is the difference in the clock cycles between CPU-1 and CPU-2.
At T — T the two CPU's have become one computation cycle out of phase. In order
to avoid making the error model dependent on more than one previous error value, the
error model is redefined at this point to be:
where the superscript " +" denotes the redefined error. For consistency with the
redefined error, the phase shift, r, is reset to zero:
T = 0 (7.12)
12
Both the redefinition of the errors and the resetting of the phase shift to zero for
consistancy with the redefined errors occur only when the phase shift equals one full
computational cycle.
Writing the redefined error at time n (E+) in terms of the previous error definition
at time n (E~] gives:
+ = E- + X\ - X ln_, (7.14)
The error model is then propagated by equations (7.9) and (7.10) until T= T. At this
time the error is updated by the increment given in (7.14) and T is reset to zero. Equations
(7.9) and (7.10) then represent the error between CPU's 1 and 2 one computational cycle
apart. Since this corresponds to the two most current values generated by the two CPU's,
the error equation continually represents the difference in the two most current CPU
outputs. The update process must be repeated whenever r = T.
VIII. ANALYSIS OF ERROR MODEL
ERROR PREDICTION FOR A-NODE
The formulas derived in the previous section were implemented and compared with
errors generated by the simulation program. Predicted error and measured error are
plotted for comparison in Fig. 8.1.
measurement
prediction
Fig. 8.1. Error Prediction at A Node
13
The predicted and measured error, based on the synchronized error definition, are seen
to compare very well. The discontinuity at 2.0 seconds corresponds to the redefinition of
the error when a full cycle of lag has built up and closely matches the measured value ol
the redefined error.
ERROR PREDICTION FOR B-NODE
In analyzing the error for the A-node CPU's, the effect of voting logic on the inputs to
the A-nodes has been neglected because the relatively rapid sensor sampling rate results
in the three sensor values being very similar. The results are therefore similar regardless of
the input selected. This is not true for CPU's at the B and C nodes. There the sampling
rates are considerably slower so the sampled outputs from the A CPU's will be significantly
different as the effective phase lag,, increases. The two types of voting logic to be examined
are select CPU-1 and Midvalue select.
Select CPU-1 Logic
The select CPU-1 logic chooses, as the default, the output of the CPU in the first
pipeline at the previous node for input to the next node. The error in the B-node CPU s
as seen at the C-node CPU's is shown in Fig. 8.2, where the input to the B-node CPU s
has been based on the select CPU-1 logic. Fig. 8.3 shows the error, as seen at the C-node,
using the synchronized error definition.
Fig. 8.2 Error at B-Node; Select CPU-1
..»
o.«x
*•*'
8 -0.01
Fig. 8.3 Synchronized Error at B-Node; Select CPU-1
14
Although the error now is represented by only one of the two bounding curves, the
function is highly discontinuous. This is due to the low sampling rate applied at the inputs
to the B-node CPU's.
Midvalue Select Logic
The midvalue select logic chooses, as the default, the output of the CPU at the previous
node whose value is in the middle of the output range of the CPU's at that node. The
error in the B-node CPU's as seen at the C-node CPU's is shown in Fig. 8.4, where the
input to the B-node CPU's has been based on the Midvalue select logic. Fig. 8.5 shows
this error using the synchronized error definition.
Fig. 8.4 Error at B-Node; Select Midvalue
0.04
2 0.03
<Q
S «.«
i ..„
u
5
 O.M
I
Fig.-8.5 Synchronized Error at B-Node; Select Midvalue
Comparing Fig. 8.5 to Fig. 8.3, it can be seen that the Midvalue select produces a
smoother-error curve than the select CPU-1 logic, so this will be used for the remaining
analysis.
Input Smoothing
The error prediction for A-node CPU's is good because their inputs are from analog
devices (sensors,) these inputs are smooth aud continuous. The situation in B (and C) node
CPU's is different since the inputs to B node CPU's, as obtained from the A-node CPU's,
15
are step functions. Since, the derived math model is based on smooth and continuous inputs
the results will be degraded.
To resolve the problem of discontinuity, a first order hold was applied in each CPU
to smooth its output. Figure 8.6 shows the measured and predicted error with a first
order hold applied to the output of the B CPU's. The curve is smooth and, hence a good
approximation is achieved with the error model.
mcaturentnt
prediction
0.01
a 0.01
0.00
Fig. 8.6 Error Prediction at B-Node
ERROR PREDICTION FOR C-NODE
Figure 8.7 shows the error between the C-node CPU's. The strange shape is a result of
two levels of sampling between the continuous input to the A-node CPU's and the output
of the C-node CPU's. Figure 8.8 shows the synchronized error at the C-node and the error
is seen to stay on the same bounding error curve. The effect of output smoothing at the
A and B nodes is shown in Fig. 8.9 and compared with the model prediction. The results
compare quite well.
Fig. 8.7 Error at C-Node
16
O 9.01
Fig. 8.8 Synchronized Error at C-node
measurement
prediction
Fig. 8.9 Error Prediction at C-Node
In Fig. 8.9 the smoothing has been done by interpolation, which assumes that the
next output is available at the time the smoothed output must be generated. Since this
is generally not the case, a first order smoothing based on extrapolating the currently
available data was implemented and the results are shown in Fig. 8.10 along with the pre-
dicted results. The comparison is still quite good and the implementation is now physically
achievable.
measurement
_ prediction
3 0.01
8
0 0.00
Fig. 8.10 Error Prediction at C-Node; extrapolation
17
SUMMARY OF CPU OPERATIONS
The following tasks must be performed each cycle by each CPU to allow estimation
of the error due to asynchronisity given a knowledge of the difference in CPU clock rates
(AT).
a. Estimate the error due to asynchronisity from upstream CPU's based on the inputs,
b. Calculate the output, c. Smooth the output by using a first order hold.
These three tasks are shown in Fig. 8.11
Error Estimation
Output Computation
Output smoothing
X(t) = Xn (t - nT)
Fig. 8.11 CPU diagram.
IX. PARAMETER IDENTIFICATION
Using the error formula developed in section VII, the difference in CPU clock interval,
AT, can be identified using the Output Error Method [4]. The identified AT then will
be used to track the error due to asynchronisity. The mathematical derivation of the
identification algorithm and the results are presented in this section.
MATHEMATICAL DERIVATION
In order to simplify the algorithm, the derivation assumes that the phase shift does not
exceed one computation cycle. This is insured in practice by resetting the error equation
whenever the phase reaches a full computation cycle.
Given a dynamic model of the output difference due to asynchronisity:
18
En+l = + -(1 -a - Un) (9.1)
(9.2)
and given a measurement, E™, of the difference at cycle n, a measurement error may be
defined as:
Where the estimated difference, J5n(AjT), is a function of the difference in clock intervals.
A cost function is now written in terms of the measurement error as:
(9.4)
t=0
Expanding J as a Taylor's series in AT about an apriori estimate, AT, gives:
J(AT) - AT] + - AT]
Taking the differential of J due to a change in AT gives:
J( AT) - J( AT) = + [AT - AT] ] (AT - AT)
The necessary condition for J to be minimum with respect to AT is g(
gives:
or
= AT-
(9.5)
(9.6)
= 0. This
(9.7)
(9.8)
Taking the derivative of equation (9.4) with respect to AT at AT gives:
N
n=0
E
n=0
(9.10)
and from 9.3 and 9.1-9.2 we get:
l J
19
and
The complete procedure, based on updating the estimate of AT every N samples,
is summarized below.
1) Propagate error and error sensitivity up to cycle n = N.
en = E? - En
Resynchronization of the computation cycles of the two CPU's is accomplished by
resetting the index, n, to zero whenever the phase shift (nAT) equals one computation
cycle (T). At this time, the error is also redefined as described in Equation 7.14. The
equations for propagating the error and error sensitivity are then:
En+1 = e~"TEn + -(1 - <ran^)(tfn+1 - Un); E0 = 0.a
den dEn
d&T
dEn
= 0.
2) Generate first and second gradient of cost function for N cycles
a r N aaJ v^ den
n=0
3) Generate estimate of best AT" over N cycles.
-i
dAT
SUMMARY OF CPU OPERATION
The total operation required for tracking the difference in CPU outputs due to differ-
ences in the clock intervals is shown below
20
For previous CPU's (j = 1,2,3)
Estimate Error due to Asynchronisity/Identify Clock Difference
1
 dJ
AT = AT-
where
N
' d(AT)
and
where
£;n+1 = e-aTEn + b-(\ - e-"r)(Un+l - Un)
T = nAT; 0 < T < T
Select Input (j =-1,2,3)
If ej < e U = Midvalue of previous CPU output
Else Fail CPU*
Compute Output
Xn+l = aXn + (3Un ..
Smooth output
V L 1 V
,, , -*n T i — -*-n( t-nT)
21
X. RESULTS OF PARAMETER IDENTIFICATION USING SIMULATION
The identification algorithm was tested by running the simulation with a specified
AT between the clocks. The identification algorithm was then applied to the output data,
initially assuming a AT that was different from the one used in the simulation. The
algorithm would be considered successful if the estimated AT converged to the actual
value and the corresponding estimate at the CPU output difference tracked the measured
value accurately.
In order to examine the stability of the algorithm, the test was first run with an initial
estimate of AT that was equal to the actual value. The dynamic model was not updated
with the estimated AT. Figure 10.1 shows the history of the AT estimate. It is seen
to initially depart from the correct value but to converge back to the correct value after
about 35 samples and to hold that value from then on. The corresponding comparison of
the CPU difference and its estimate is shown in Fig. 10.2 and is seen to be excellent, as
expected since the correct value of AT is always used.
- actual
estimated
1 1 1 11 1 1 1 1
1M.* !»-•
Fig. 10.1 AT Estimate (AT(0) = AT*; No Feedback)
meaiucancnt
~ predict ion
Fig. 10.2 Error Estimate (AT(0) = AT*; No Feedback)
22
The second test was to initiate the identification with an estimate of AT'that was half
the actual value. Again the identified value was not used to update the dynamic model.
The result of the identification is shown in Fig 10.3. Here we see that the correct value has
again been achieved within 35 samples. Now, however, the estimate degrades significantly
at the discontinuity associated with the error redefinition. This is due to the fact that the
estimate is not being fed back to the model to allow the model output to converge to the
measured CPU difference as can be seen from a comparison of the estimated and actual
CPU differences in Fig 10.4.
™* *
—.— actual
estimated
io*.« us.* ita.t in.* ID*.* ZtC.«
Fig. 10.3 AT Estimate (AT(0) = ^AT*; No Feedback)
____ aea»ur«o«nt
prediction
Fig." 10.4 Error Estimate (AT(0) = 5AT*; No Feedback)
The third test added the updating of the model with the identified AT while using
the correct initial AT for the parameter estimator. The update was performed every 35
cycles since this number was shown to be adequate to achieve a steady state estimate.
The results of the identification are shown in Fig 10.5 and the corresponding comparing
of actual and estimated CPU differences is shown in Fig. 10.6. Now the identification is
quite good until the discontinuity is reached. After this point, the accuracy degrades. This
is apparently because the discontinuity generates large transients in the identification and
the 35 sample update interval falls in the middle of this transient.
23
actual
estimated rr
Fig. 10.5 AT Estimate (AT(0) = AT*; Full Feedback, Update Every 35 Cycles)
. * ***» *
o.ou
0.010
o.oo*
T o.ooo
o
< -o.oos
s
g -o.eie
-O.Olt
-«.020
-O.OM
-0.030
mta«ur«iMnt
prediction
Fig. 10.6 Error Estimate (AT(0) = AT*; Full Feedback, Update Every 35 Cycles)
In order to avoid this problem, a new criteria for the update was generated. The
first update occurs automatically after 35 cycles. Successive recursive estimates of AT are
compared and when the change in these estimates changes sign, it is assumed that the
result is close to steady state and the model is updated with the currently identified value
of AT. As a further prcaution, the update is only made if the magnitude of the change is
below the magnitude of the previous update. The results of this approach are shown in Fig.
10.7 and 10.8. The identified AT and the output of the model both match the simulation
values very well. A further improvement was achieved by reducing the magnitude of the
update by a scale factor to allow a smoother convergence. The update algorithm is then:
= AT-AT = (10.1)
where r is a value less than 1.0 and k is the update number. This scale factor allows the
amount of the update to be gradually increased from 50% to 100% of the actual estimate
over a period of time and improves the stability of the algorithm. The output of the model
corresponding to this modified update algorithm is shown in Fig. 10.9 and is seen to match
the measured CPU difference quite well.
24
-4
2.5
2.0
1.5
\.0
0.5
0,0
X10
- actual
estimated
_L
0.0
_L
25.0 50.0 75.0
J
100.0 125.0 150.0 17S.O 200.0 22S."0
CYCLES
Fig. 10.7 AT Estimate (Ar(0) = AT*; Full Feedback, Update at Steady State)
0 Q1S . »«a*ure»ent
"
>ola
 prediction
0.010 '
0.005
g 0.000
o
f -0.005
u
K -0.010
,. -0.015
o
£ -0.020
IU
-0.02S h
-0.030 I I I I J
0.0 25.0 50.0 75.0 100.0 125.0 1SO.O 175.0 200.0 225.0
CYCLES
Fig. 10.8 Error Estimate (AT(0) = AT*; Full Feedback, Update at Steady State)
o
? -o
u
H -0
- -o
5"
S . - o
U)
-0
.015
,010
DOS
,000
,005
,010
,015
020
,025
-0.030
ntaiurenent
_L J. J
0.0 25.0 SO.O 75.0 100.0 12S.O 150.0 175.0 200.0 22S.O
CYCLES
Fig. 10.9 Error Estimate (AT(0) = AT*; Scaled Feedback, Update at Steady State)
25
As a final test of the algorithm, the previous procedure was run using an initial
estimate of AT that was 50 % greater than the actual value. The update logic and update
scale factor were applied as in the previous case and the results are shown in Figures 10.10
and 10.11. The identification algorithm and model output both track the simulation value
quite well.
actual
• estimated
. in*
Fig. 10.10 AT Estimate (Ar(0) = 1.5AT*; Scaled Feedback, Update at Steady State)
-0.010
-O.Otf
-O.OZO
-a.ns
n.o IM.O m.a ua.« in.* MO.O a*.*
Fig. 10.11 Error Estimate (AT(0) = 1.5AT*; Scaled Feedback, Update at Steady State)
26
XI. SUMMARY OF THE RESULTS OF THIS STUDY
The following accomplishments were achieved in this study
a. Definition of Analytic Error Function. The error between two CPU's was defined
so as to produce an analytically describable function.
b. Error Model Derivation. An analytical model was developed to estimate the error
between CPU's as a function of the difference in CPU clocks. The error model is based on
a smooth, continuous input to each CPU; therefore each CPU must perform a first order
hold to smooth its output.
c. Accuracy of Error Model Demonstration. The estimated error was shown to be
within 2% of the measured error when the difference in clock intervals is known.
d. Identification of Difference in CPU Clock Interval. Since the error formula depends
only on the difference in CPU clock interval, AT, true AT can be regressively tracked by
applying Parameter Identification methods. Identified AT was shown to be within 5%
error of the true AT after 100 cycles.
e. Updating of Model with identified Clock Intervals . Identified AT was fed back
into the model and was successfully used to track the error due to asynchronisity.
XII. CONCLUSIONS
The ability to model and track the effect of different clock intervals on the relative
output of redundant CPU's has been demonstrated using a computer simulation of the
Redundant Asynchronous Multiprocessor System (RAMPS). This capability effectively
eliminat the problem of asynchronisity on identifying failures without compromising the
autonomy of each CPU. The high level of reliability associated with a RAMP system is
therefore maintained. The identified clock differences could also be used to modify the
computation algorithms in each CPU to keep the outputs synchronized, again without
sacrificing their independence.
XIII. RECOMENDATIONS FOR FURTHER RESEARCH
The identification algorithm used in this study was based on the Output Error method.
The use of Maximum Likelihood identification may allow for more rapid and robust track-
ing of the clock differences with only a slight increase in computational requirements. The
use of this technique should therefore be investigated.
The algorithm modeled in each CPU for this study was a single input/single output
algorithm. This algorithm was also asymptotically stable. In order to assure generality in
27
the application of this methodology, the use of multi-input multi-output algorithms that
are not asymptotically stable should also be investigated.
The use of the identified clock difference to modify the computation algorithm to
synchronize the CPU's could keep the error within small bounds without sacrificing CPU
autonomy. This technique should be investigated.
The ultimate test of the approach will be to implement it in the RAMPS hardware
rather than simulate a multiprocessor system on a single CPU.
28
REFERENCES
1. "Investigation of Redundant Asynchronous Microprocessors for High Authority
Auto Flight Control", University of Southern Colorado, January, 1984.
2. "Distributed Asynchronous Microprocessor Architectures in Fault Tolerant
Integrated Flight System", AIAA Computers in Aerospace Conference, Hartford,
Connecticut, October, 1983.
3 Roberto Saucedo, Earl E. Schiring, "Introduction to Contnuous and Digital
Control System", The Macmillan Company, New York, 1968.
4. Mehra, R. E., Stepner, D. E. and Tyler, J. S., " A Generalized Method for the
Identification of Aircraft Stability and Control Derivatives from Flight Test Data",
Proceedings of the Thirteenth Joint Automatic Controls Conference, Stanford-,
California, Aug., 1972.
29
1. Report No. 2. Government Accession No.
NASA CR- 177427
4. Title and Subtitle
Dynamic Modelling and Estimation of th
to Asynchronism in a Redundant Asynchr
Multiprocessor Svstem
e Error Due
onous
7. Author(s)
Loc c. Huynh and R. W. Du Val
9. Performing Organization Name and Address
Advanced Rotorcraft Technology, Inc.
1804 Stierlin Road, Suite 210
Mount ina View, CA 94043
12. Sponsoring Agency Name and Address
National Aeronautics & Space Administration
Ames Research Center
Moffett Field, CA 94035
IS. Supplementary Notes
Point of Contact: Technical Monitor,
NASA Ames Research
Mnffptt Field. CA
3. Recipient's Catalog No.
5. Report Date
May, 1986
6. Performing Organization Coda
FSF
8. Performing Organization Report No.
10. Work Unit No.
11. Contract or Grant No.
A30146C(HMK) '
13. Type of Report and Period Covered
Contractor Report
14. Sponsoring Agency Code
RTOP #505-66
James Howard, MS 210-5
Center
94035 (415) 694-5941
16. Abstract
The use of Redundant Asynchronous Multiprocessor System to achieve Ultra
reliable Fault Tolerant Control Systems shows great promise. The development
has been hampered by the inability to determine whether differences in the
outputs of redundant CPU's are due to failures or to accrued error built up
by slight differences in CPU clock intervals. This study derives an
analytical dynamic model of the difference between redundant CPU's due to
differences in their clock intervals and uses this model with on-line para-
meter identification to identify the differences in the clock intervals.
The ability of this methodology to accurately track errors due to asynchronisit
is demonstrated using a simulated multiprocessor system. The algorithms
generate an error signal with the effect of asynchronisity removed and this
signal may be used to detect and isolate actual system failures.
17. Key Words (Suggested by Author (s) I
Redundant, Asynchronous, Multiprocess
Control, Aeronautics, Parameter
Identification
18. Distribution Statement
ar
Unclassified-Unlimited
Subject Category 08
19. Security Oassif. (of this report) 20. Security Classif. (of this page)
Unclassified Unclassified
21. No. of Pages 22. Price*
29
'For sale by the National Technical Information Service. Springfield, Virginia 22161
