A polymorphic reconfigurable emulator for parallel simulation by Parrish, E. A., Jr. et al.
AN ANNUAL REPORT
A POLYMORPHIC RECONFIGURABLE EMULATOR
FOR PARALLEL SIMULATION
Submitted to:
National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23665
Submitted by:
E. A. Parrish, Jr.
Professor and Chairman
E. S. McVey
Professor
G. Cook
Professor
Department of Electrical Engineering
RESEARCH LABORATORIES FOR THE ENGINEERING SCIENCES
SCHOOL OF ENGINEERING AND APPLIED SCIENCE
UNIVERSITY OF VIRGINIA
CHARLOTTESVILLE, VIRGINIA
Report No. UVA/528173/EE80/102
April 1980
https://ntrs.nasa.gov/search.jsp?R=19800016517 2020-03-21T18:38:51+00:00Z
FOREWORD
This report summarizes research performed on the PREPS project
during the first year. The project has supported three half-time
graduate students (2 Ph. D. and 1 M.E.) and three principal investi-
gators (Professor G. Cook was on leave from 1 September 1979 through
3 May 1980) at 10% effort.
SECTION 1
INTRODUCTION
This research is directed at trying to capitalize on the recent
developments in microprocessors and arithmetic support chips to design
a reconfigurable emulator for real time flight simulation. The system
consists of (1) a Master Control System (MCS) to perform all man/
machine interactions and to configure the hardware to emulate a given
aircraft, and (2) numerous Slave Compute Modules (SCMs) which com-
prise the parallel computational units.
Three specific problem areas are reported on herein. First, early
work has shown that all parts of the state equations can be worked on
simultaneously but that the algebraic equations cannot (unless they are
slowly varying). Thus, Section 2 reports on attempts to obtain new
algorithms that will allow parallel updates.
The second area involves determining the word length and step
size to be used in the SCMs. This work is described in Section 3 as
well as [1].
Finally, the architecture of the hardware and software is covered
in Section 4 and in [2]. The architecture must clearly reflect the
outcome of the above studies as well as conform to the conceptual
framework of PREPS.
SECTION 2
ALGORITHMS FOR PARALLEL PROCESSING
2.1 Introduction
It is a general rule that "A more accurate method is more time-
consuming," but in a real-time simulation time is very critical. The
advent of cheap and fast microprocessors suggests that we can solve
problems by using multiprocessors in parallel.
Now the question is, can we use multiprocessor systems to solve
numerical differential equations such as
x = f(x,t) (2-1)
Looking at some well-known numerical integration methods, such as the
Euler method
xn+1 = xn
or the AB-2 method
Xn+1 ^n+li^V-^n-l^' <2-3)
we see that we cannot obtain x
 +1 before obtaining the value of f(x ),
the two tasks have to be done in series. Unfortunately, to update
f(x ) is time-consuming even for moderate complexity of f(x). We need
new algorithms that can update x and f(x) at the same time. If we
consider only numerical integration algorithms in the form
xn+1 = xn + J(f(xn-1>' f<xn-2) ^ ^
we can calculate f(x ) and x
 +1 almost simultaneously. That is, they
can be done in parallel. In the following section we develop several al-
gorithms. We also have determined their corresponding stability regions
and tried to enlarge them. The idea appears to be very promising as a
means of exploiting parallel processing in digital simulation.
2.2 Development
The general form of k step multistep integration algorithms is
Xn+1 = ?jaixi + Wi] (2'5)
I ~~ K
where n, k are integers k < n. After setting a = 1, 0 =0 and a. =
0 for k £ i < n, we can develop several new algorithms by the coeffi-
cient-matching method for different values of k. For stability consi-
derations we only take k = 2,3.
Assuming k = 2, we have the basic form
xn+1 = xn.+ TlP1fn"l +P2 fn-2 ] ' (2'6)
or
Xn+3 = Xn+2 + T[Plxn-H + P2Xn]' since X = f(x'l)' (2'6)
Step 1: Using the Taylor Series Expansion, we have
xn+2 = xn + 2Txn
n
 + T3P1 IT Xn" + . . . . (2-9)
Step 2: Use coefficient-matching method to minimize the local
truncation error.
5 3The minimum local truncation error occurs for p1 = -~ and Pp = " 2
and is
ELT = f| TV' + 0(T4), (2-11)
and
and
v = v + —• r^f - ?fA ,. ^ T ,•» I sj I ^ W I
n+1 n d. n-1 n-^i
By the same procedures, we can get another algorithm for k = 3, i.3.,
x_. . = x + T5[53f ' - 64f ~ H
n i/i n-1 n-^:
=55 42
 + Q ( 5) (2-14)
24 ' n u u
These algorithms are listed together with two other non-minimum local
truncation error algorithms in Table 2-1.
Now these algorithms can be implemented in parallel, because when
we update x -, we only need to utilize f _- for which the update was
begun just after getting x _... Although this cannot be done completely
in parallel, it gives us a hope of the further development.
2.3 Stability
Stability is another improtant criterion in evaluating the usefulness
of a numerical integration algorithm. Any acceptable algorithm must be
at least weakly stable, usually we need strong stability. The stability
regions of the above algorithms, compared with AB methods, are shown
in Figure 2.1.
Table 2-1. Truncation Errors of Several Algorithms
Name
PI
P2
P3
P4
Algorithm
T
xn+l = xn + j[5fn-l-3fn-23
-n+l « xn + lt8fn.i-9fn-2-3fn.3l
*n+l = *n + fy[51fn-l-18fn_2-9fn_3]
VI = Xn + |4t53fn-l-64fn-2+23fn_3]
Local Truncation
23 o ...ELT - u T.3xn +
LT 12 ^^n
55
 m3 •"p •" ~ ^^™ HP v 1
^,-a^v.
Error
.«.,
0(T4)
O f T^ 1
O f T ^
C «-
r*
<n co
r^
^4
I
c
*W
|
ei*-*
<nt_j
-|C4
+
xc
i
4
c
X
n c
+ + CM •"
r-« • + <n
•H >H N C <M
1 1 1 *»J c^ -f
C C C 00 1 CM
«U IM •*.
 rt C I
>C ON CO <t. C
*H IA 1 O» »fc-
1 ^
1 • iH 1 \O
.H 1 1
C C t C »H •-«
**-* i*j c «*•• 1 1
m in >*-i «H c eCM in in tf\ tt< v*
i— i i—i i—i i_i co n
» • i/t
Hl*^ ^l"* . H|*T t-J
I-H I«M M«M IM f-|CM l-|rs
IrH
+ + + + +
-f
c c c c c
x x x x x e
X1 1 t 1 1 1
t t t t t t
X X X X X X
O
4J
i
42 9
•0§
CN
04
M
O
IH
0)
C
O
(1)
•H
XI
id
+J(/I
CN
—1°: o)
00
•H
Based on accuracy considerations, P4 is the algorithm that we are
interested in, but it has a quite small stability region. It means that if
*
we want to use it, we must pick a step size sufficiently small to force
the value of Ah to fall into the stability region and this really limits its
usefulness.
A new question arises: "If we change the basic form into
xn+1 = Axn + Bxn-1 + Cxn-2 + Dxn-3 + TtEfn-1 + Ffn-2 + Gfn-3] <
could we change the situation?" After further investigation, we found
we could enlarge the stability region a bit, but not much (Detail see
Appendix in Section 2.5). This algorithm is therefore confined to
solving differential equations with small A. if a large h is to be used.
2.4 Summary
In the previous section we have developed several new numerical
integration algorithms. These new algorithms make it possible for us to
update x .. and f .. simultaneously. Thus the algorithm can be imple-
mented via a set of fast and low cost parallel processors. The stability
regions have been studied and plotted. In all cases the range of
stability was reduced by the use of stale data. Nevertheless the stable
range was still usually large enough to accomodate a value for T of
one-fifth the time constant (T/2 = .2 or AT = .2) which is as large as
one normally wishes to use.
2.5 Appendix
THEOREM:
A necessary condition for stability of the linear multistep method
+
 VlVn+k-1 + • • • • +aoyn =
(2-16)
is that no root of the associated polynomial
P(6) = <*k6k + ak_16k"1 + . . . .+ aQ (2-17)
may have modules which exceed 1, and that the roots of modulus 1 be
simple.
Since the general form is
xn+1 = Axn + Bxn-1 + Cxn-2 + Dxn-3 + T[Efn-1 + Ffn-2 + Gfn-3]
(2-18)
where A, B, C, D, E, F, G are coefficients to be determined and T is
step size, the first constraint, due to the theorem, is that the poly-
nomial
P(6) = 64 - A63 - C6 - D (2-19)
has no roots with modulus exceeding 1 and that those of modulus 1 be
simple.
Using the maximum coefficient-matching method, we can solve for
E, F, G in terms of A, B, C, D as follows:
E = |2[81 - 28A - 5B - D] (2-20)
F = J[2A - 2B + 2D - 18] (2-21)
and local truncation error also depends on A, B, C, D,
-LT - - A - —8  24 x +Xn (2-23)
Now all we have to do is to decide the possible values for A, B,
C, D to meet the constraint. We put one root at 0, another at 1, just
as all the other numerical methods do, and let the other two roots 6..,
6p move around within a unit circle. The problem can be divided into
two cases:
Case 1: 61, 6? are on the real axis, i.e., 6.. = k-, 6? = k? and
< 1,
Equation (2-19) becomes
64 - (k.,+k2+1)63 +
< 1
= 0, (2-24)
and
A = (k^kg+D
B = -(k1k2+k1+k2)
C = k.jk2
D = 0
The local truncation error is
ELT = 1 - I <Vk2
(2-25)
(2-26)
(2-27)
(2-28)
(2-29)
Case 2: 6-, 6? are complex conjugate, i.e., 6- = a+juu, 60 = a~JU)
and
+u>2 < 1
Equation (2-19) becomes
64 -(2a+1)63 + (a2+tu2+2a)62 + (a2 + u>2) = 0 (2-30)
and
A = 2a+1 . (2-31)
B = -(a2+iu2+2a) (2-32)
C = a2-hju2 (2-33)
D = 0 (2-34)
The local truncation error is
ELT = § - ! a ' -b^2)- (2-35)
From Figures 2-2 to 2-10, the stability regions for different values
of 61 and 6», we can see that for most cases the stability region is larger than
that of algorithm P4 and the stability regions get bigger and bigger
when 6.. and/or 6p approach the boundary of the unit circle. But we
know all the existing stable multistep methods have 6- and 6p at the
origin, and when 6.. and 6? approach the boundary of the unit circle,
we would expect the algorithm to be less stable or less accurate. We
need to do further investigation.
Assume x = -A.x where A < 0. Then the difference equation
xn+1 " Axn " [P'XTE xn-1] " [C-XTFlxn-2 + XTGxn-3 = ° (2"36)
has a solution of the form
xp = C^J + C2p£ + C3p|J + C/J. (2-37)
As T •* 0, P1 = 1, P2 = 0, P3 = k^ P4 = k2. Here P1 represents the
true root, and p?, p- and p. are extraneous roots which arise only
because we have replaced a first order differential equation by a fourth-
order difference equation. For a strong stable algorithm, we have to
let p < 1, p < 1, p < 1.
10
0)
•H
X
<U
£
c
0
A!
H
i-l
A!
M
<M
10§
-H
4J
ta
4J
w
CM
3
60
4-1
Oi-l
04
-P
-H
i-|
•3
4J
(0
o
I
O
O
*
11
12
orCz.
0)h
M
13
w
•H
X
s
0)
e
o
in
o
II
iHM
o
(0
0
n)
-p
CO
CN
M
•H
14
X
n)
0)
C
O
in
r~
•
o
II
O
•4-1
in
§
•H
•H
iH
•H
D
M
15
Ffk
cf
ci
I
e
H
§
H
•H
a
C)I
3
•n
O
U
X
0)Mfu
NO
Ci
I
.0;
CO
g
•H
Cn
0)
"I?
•H
rH
•H
•a
4J
W
CN
00
•H
16
3
•n
I
CN
O
II
CM
3
•n
tn
CM
•
o
II
•H
(4-1
§
•H
-H
fta
4J
00I
CS
3
60
17
Or
Qb
I
t
I
I
if}
•o
II
CN
o
II
so
§
•H
O^)
00
18
Oft
0'
r* c»
eT
§
mi^
a
II
+
in
*o
II
O
M-l
§
•H
•H
XI
n)4J
M
o
rH
I
0)
.M
•H
19
From Tables 2-2 to 2-7, which are constructed by varying the
value of XT, = -0.27 is the upper bound for stability and accuracy
consideration, because at this value some algorithms have one root
greater than unity and others lose accuracy by introducing imaginary
parts. If we want a little greater stability to suit a large XT, there
are two choices:
(i) xn+1 = ! xn - \ xn-1 + ^  189fn-r256fn-2+87fn-3l ^^
ELJ = §1 TV V + 0(T5) (2-39)
=
 2xn - I xn-1 + I xn-2 + l4f65fn-r88fn-2+35fn-3]
(2-40)
EL T 8 n
We need to perform simulations to further judge their usefulness.
20
Table 2-2a. Root Locations
XT = -0.05 1
*1
.0
0.25
0.25
0.25
0.25
0.25
0.5
0.5
0.5
0.5
0.5
0.75
0.75
0.75
0.75
k2
0
-0.25
0
0.25
0.5
0.75
-0.5
-0.25
0
0.5
0.75
-0.5
-0.25
0
0.75
61
0.951215
0.951214
0.95121
0.951204
0.951199
0.951124
0.951209
0.951206
0.951199
0.951169
0.951152
0.951185
0.95118
0.951171
0.951036
!
B2
0.355454
0.401046
0.432307
0.48603
0.587492
0.765601
0.542992
0.550895
0.563707
0.641303
0.775251
0.757975
0.759343
0.76532
0.816661
33
-0.15335 + JO. 50983
-0.17613 + JO. 469974
-0.06676 + j 0.46468
-0.03138 + JO. 42803
0.10565 + JO. 36254
0.14164 + j 0.28612
-0.2471 + JO. 35567
-0.12605 + JO. 39755
-0.00745 + jO. 39905
0.203764+ jO. 28235
0.26180 + JO. 15489
-0.22958 + jO. 27415
-0.10526 + jO. 32903
-0.01865 + JO. 33224
0.59838 + jO, 133921
5 i
84
83
Note: $3 means complex conjugate of 63.
x means the absolute value of 8j_ > 1.
Table 2-2b. Root Locations
XT = -0.05
k
0
0
0.25
0.25
0.5
0.5
0.75
0)
0
0.25
0.25
0.5
0.25
0.5
0.25
01
0.95121
0.95122
0.95121
0.95121
0.95119
0.95120
0.95117
32
0.35545
0.310305
0.40513
0.22257
0.42949
0.15311
0.11223
63
-0.15334 + JO. 50983
-0.13213 + jO. 55091
-0.07183 + jO. 46384
0.16311 + jO. 60885
0.30966 + JO. 29071
0.44784 + jO. 54781
0.71831 + JO. 25441
04
. -.
83"
21
Table 2-3a. Root Locations
XT = -o.i
*i
0
0.25
0.25
0.25
0.25
0.25
0.5
0.5
0.5
0.5
0.5
0.75
0.75
0.75
0.75
*2
0
-0.25
0
0.25
0.5
0.75
-0.5
-0.25
0
0.5
0.75
-0.5
-0.25
0
0.75
3l
0.904584
0.904565
0.904503
0.904394
0.904164
0.90306
0.904471
0.904409
: 0.904319
0.903739
0.901951
0.903895
0.903738
0.903464
0.895464
&2
0.426622
0.462158
0.494706
0.545743
0.632818
0.784402
0.576176
0.58748
0.604521
0.686548
0.801461
0.768846
0.771819
0.776442
0.855484
03
0.16560 + JO, 68500
-0.18336 + jO. 65276
-0.07460 + JO. 63206
0.02493 + JO. 58689
0.10651 + JO. 51691
0.15627 + jO. 43027
-0.24032 + jO. 56038
-0.12094 + JO. 57312
-0.00442 + JO. 55899
0.20486 + JO. 44321
0.27329 + JO. 14908
-0.21137 + jO. 46971
-0.08778 + JO. 48934
0.03505 + jO. 47692
0.37456 + j 0.14908
34
&3
Table 2-3b. Root Locations
AT = -0.10
k
0
0
0.25
0.25
0.5
0.5
0.75
(0
0
0.25
0.25
0.5
0.25
0.5
0.25
Bl
0.904584
0.904604
0.90445
0.904566
0.904074
0.904434
0.902965
B2
0.426622
0.393114
0.487811
0.340651
0.571984
0.297086
0.274497
33
0.16560 + JO. 68500
-0.14886 + jO. 71788
0.05387 + jO. 61801
0.12739 + jO. 72782
0.26197 + jO. 46520
0.39924 + jO. 61916
0.66127 + jO. 25067
e4
33
22
Table 2-4a. Root Locations
AT = -0.15
kl
0
0.25
0.25
0.25
0.25
0.25
0.5
0.5
0.5
0.5
0.5
0.75
0.75
0.75
0.75
*2
0
-0.25
0
0.25
0.5
0.75
-0.5
-0.25
0
0.5
0.75
-0.5
-0.25
0
0.75
3l
0.359341
0.859208
0.858838
0.858135
0.856351
0.831331 + JO. 0105599
0.858591
0.858204
0.857574
0.85037
0.841861 + JO. 0308764
0.852558
0.850571
0.846422
0.867969 + JO. 045016
32
0.476703
0.507451
0.540661
0.590594
0.671962
Si
0.606184
0.620489
0.640453
0.72805
¥
0.788329
0.79468
0.805377
3l
33
-0.16802 + JO. 82072
-0.18333 + JO. 79220
-0.07475 + JO. 76153
0.025635+ j 0.70952
0.11084 + jO. 63495
0.168669+ j 0.541913
-0.232687+' JO. 709719
0.11435 + j 0.70740
0.00099 + JO. 68305
0.20944 + JO. 55980
0.28314 + JO. 456348
-0.19544 + j 0.61098
-0.07263 + JO. 61425
0.04910 + JO. 59209
0.38203 + JO. 31819
$4
3j
Table 2-4b. Root Locations
XT = -0.15
k
0
0
0.25
0.25
0.5
0.5
0.75
0)
0
0.25
0.25
0.5
0.25
0.5
0.25
Bl
0.859341
0.859451
0.858543
0.859228
0.855972
0.858475
0.847451
02
0.476703
0.447507
0.542269
0.415615
0.637901
0.405967
0.614993
33
-0.16802 + j 0.82072
-0.15348 + JO. 84966
0.04959 + jO. 73797
0.11258 + JO. 83304
0.25306 + JO. 58384
0.36778 + JO. 70171
0.51878 + JO. 29603
34
33
23
Table 2-5a. Root Locations
XT = -^0.20
kl
0
0.25
0.25
0.25
0.25
0.5
0.5
0.5
0.5
0.75
0.75
0.75
*2
0
-0.25
0
0.25
0.5
-0.5
-0.25
0
0.5
-0.5
-0.25
0
A
0.813932
0.813359
0.811803
0.808502
-.795513
0.810375
0.808471
0.804923
0.785482 + JO. 040399
0.806713 + JO. 0390617
0.809695 + jO. 0424832
0.814069 + JO. 0466179
&2
0.520066
0.548211
0.582872
0,634411
0.722697
0.640033
0.656843
0.681446
Bl
3l
Bl
Pi
e3
-0.11700 + JO. 93686
-0.18079 + j 0.91069
-0.07234 + JO. 87228
0.02854 + JO. 81442
0.11590 + JO. 73553
-0.22520 + J0.83450
-0.10766 + j 0.82156
0.00682 + JO. 78921
0.21452 + jO. 657418
-0,18171 + jO. 729575
-0.05969 + JO. 72174
0.06093 + JO. 69197
34
63"
Table 2-5b. Root Locations
XT = -0.2
k
0
0
0.25
0.25
0.5
0.5
0.75
to
0
0.25
0.25
0.5
0.25
0.5
0.25
01
0.813932
0.814379
0.810591
0.813614
0.794696
0.810778
0.777819 + jO. 068809
02
0.520066
0.493282
0.590268
0.474643
0.703101
0.489458
h
e3
-0.16670 + JO. 93686
-0.15383 + JO. 96332
0.94857 + jO. 84099
0.10587 + j 0.92739
0.25110 + JO. 68157
0.34988 + JO. 782657
0.472181+ JO. 428535
84
H
24
Table 2-6a. Root Locations
x
 AT = -0,25
*1
0
0.25
0.25
0.25
0.25
0.5
0.5
0.5
k2
b
-0.25
0
0.25
0.5
-0.5
-0.25
0
gl
0.764619
0.762269
0.755181
0.717892 + jO. 0155761
0.7542 + jO. 0637311
0.740026
0.726368 + j 0.0279864
0.737616 + jO. 047944
02
0.564556
0.592382
0.632886
01
01
0.69649
01
Bl
S3
X
X
-0.06903 + jO. 97124
0.03211 + jO. 90816
0.1208 + jO. 82517
-0.218258 + jO. 944256
-0.10137 + jO. 92315
0.01238 + jO. 88403
04
X
X
03
Table 2-6b. Root Locations
XT = -0.25
k
0
0
0.25
0.25
0.5
0.5
U>
0
0.25
0.25
0.5
0.25
0.5
01
0.764619
0.766375
0.748818
0.76406
0.748561 + jO. 068962
0.753959
02
0.564556
0.538369
0.64882
0.530168
»T
0.567169
03
X
X
0.05118 + jO. 93331
X
0.25144 + jO. 76737
0.33944 + jO. 85870
04
X
X
01
X
03
25
Table 2-1 a. Root Locations
XT = -0.27
*1
0.25
0.27
0.28
0.3
*2
0
0
0
0
Bl
0.716738
0.701398
0.69691 + JO. 00906
0.69988 + JO. 0227805
32
0.66852
0.690966
01
33
X
X
-0.05691 + j 0.99862
-0.0498798 + JO. 99199
34
X
X
63
Table 2-7b. Root Locations
XT = -0.27
k
0.25
0.5
0.5
to
0.25
0.5
0.4
01
0.697969 + JO. 0265897
0.721195
0.70184 + JO. 056508
32
¥
0.605773
03
0.05203 + JO. 96797
0.33652 + jO.8877
0.29816 + JO. 84226
34
33
26
SECTION 3
COMPUTER WORD SIZE REQUIRED FOR
SPECIFIED ERROR IN NUMERICAL INTEGRATION
3.1 Introduction
This section deals with computational errors; a question that has
to be examined in the light of the particular characteristics of
microcomputers, mainly their short word length. Specifically, the
accumulation of round off errors in numerical integration performed
by small word computers will be investigated. This choice of effort
was based on what seems to be a reasonable assumption that con-
siderations on errors in numerical integration will-be decisive in
determining the minimum word length required for satisfactory sim-
ulation. Also, since one of the main (and looked for) consequences
of using parallel processing on a large scale will be the possibility
to lower the sampling period of all signals and in particular the step
size of all integrations, the discretization (or truncation) error will
no longer be a source of great concern; a fact which dramatically
emphasizes the new and major role played by the accumulation of
round off error in the envisaged solution. Indeed, in many cases
the use of an extremely unsophisticated integration algorithm like
the Euler's method would be perfectly sufficient to make the dis-
cretization error smaller than the accumulated round off error.
The effectiveness of a partial use of double precision arithmetic
will be shown in this presentation and the validity of including in the
arithmetic an algorithm for symmetric rounding will be discussed.
Two simple criteria will be established, in a particular but typical
one-dimensional problem, that relate the accumulation of round off
errors and the word length of the computer. However, no final,
definitive, recommendation will be given regarding the minimum word
length to be used for a given maximum error in the simulation of a
complex problem (that is a problem involving a n*n order differential
equation, possibly including non-linearities). In such a case a math-
ematical approach to the question of the accumulation of round off
errors becomes very quickly hopeless, and no other solution has been
found than a comparative simulation (several simulations with different
word length) of each particular system. Nevertheless the conclusion
that has emerged at the end of this study is that in many instances
short word computers will be adequate for simulation purposes and
that it will be possible to obtain valid results using extremely short
step sizes.
27
Assuming that the dynamic system to be emulated can be divided
into blocs of simultaneous differential equations, the simulation
problem can be summarized, using state space representation, as the
search for the real time solution of
—
 =
 x'
 =
 IS*' ") 03-1)dt
where x. i-s the state vector and u_ is the input vector. This vector
equation can also be written as
X« — I AX « , X n , . « . , X j , U . | , U p , . . . ( U , I
X2 = Vxl' X2" " ' xd* ul' U 2* ' ' ' ' ud^ _
xd fd^xl 'x2' " * * 'xd* Ui»u2" ' * * u d ^
In the projected multi-microcomputer emulator each one of those d
equations in (2) would be integrated by its own dedicated computer.
Thus, for each individual integrator the problem can be reduced to the
integration of one differential equation
x) = f (x . , tu) (3-3)
where the forcing term u- now represents the combination of all terms
in x.: and u- in the i*n equation of (2), except for those terms involving
the dependent variable x-.
In this study, attention will be given first to analysing the numeri-
cal integration of (3) alone, where u^ is assumed exactly known, in
order to focus on the errors introduced by the integration process
itself. However, the preceeding assumption cannot be made when the
equation is part of a system of simultaneous equations because of the
dependence of the forcing term on the other state variables. In a second
step consideration will be given to the system of equations as a whole.
28
3 . 2 Euler's Method • . . . ' • • : . • ;
Euler's method to integrate (3), where the subscripts can now be
forgotten, is given by the algorithm
Xr, = X
0 O
where xo is the initial condition and T is the step size of integration.
However, this formulation does not accurately represent the relation-
ship between the successive results of the numerical integration
because it does not take into account the rounding of numbers in the
various operations needed to calculate xn+j. Assuming that the
computer uses floating point arithmetic, a more accurate represen-
tation of that relationship is
xn+l MV tT«V %>
V = Y^X0 xo
where the symbol a is defined as the computed value of the function
or the variable a and a* indicates its rounded value (to one computer
word). Formulation (5) assumes that
1) no overflow occurs at any point of the computation
2) the step size of integration T is chosen such that it can be
exactly expressed by a one word number.
A more manageable form of equation (5) can be obtained by
explicitly expressing the error
xn+1 = xn + T f(xn, un) + en+1 (3-6)
where en+1 is defined as the local round off error [1]. It can be
evaluated by
n + T f(xn, un)]
n + [T «xn. un)]*J
[Tf( . . . ) J* - Tf(. ..) + T f ( . . . ) - Tf(. ..) (3-7)
29
, ,.« ..^P^ot**
^•p&
..^ S5^ ^
..,..;/:.-ii---?.-r->;s=a
f;r.;t~:--Vl~~;C*?/^ C"'--?
3.3 .The Local Round Off Error . I1 •
The way in which round off errors are generated in the process
of performing a particular arithmetic operation depends greatly upon
the implementation of the arithmetic and the type of computer used
(there is a difference for example if the computer uses a simple or a
double accumulator [2]). However, generally speaking, there are two
ways to round a number to a N-bit mantissa floating point expression
where the first bit is reserved for the sign [3], [4] :
1) by simple truncation (or chopping), then all bits beyond the N
most significant ones are simply dropped
2) by symmetric rounding, then a special algorithm is required :
first the (N+l)th bit of the mantissa is added to the Nth one,
then only the N most significant bits of the result are retained.
The N-bit mantissa floating point representation of any number y
can therefore be written as
y* - y(l - e) (3-11)
where
if y is simply truncated to form a N-bit mantissa floating point
number, and
-N^ , -N
-2 £ e <2
if y is symmetrically rounded.
The local relative round off error e cannot be known a priori for
every number that will be used during the integration process except
for its bounds. Therefore, the relative local round off error will be
viewed as a random variable with uniform distribution. Moreover, it
will be assumed that all those random variables corresponding to
successive steps of integration are independent. Then the laws about
summing a large number of independent random variables will be
applicable.
31
When no special rounding algorithm is included in the arithmetic,
'computer will naturally round the results of all operations by
fiply truncating them, But, because of the two's complement form
to represent negative numbers, it will actually truncate their
'solute values [1]. Therefore, in these conditions the only safe
F/pothesis that can be formulated about the statistical characteristics
f the local round off error are
(3-12a)
2 2
var(e ) = <T x
n n
<T2 = 2-2N/12 C3-12b)
When an algorithm performing symmetric rounding is included in the
arithmetic, it then can be assumed that
E(en) = 0 (3-13a)
var(en) = CT <r2 - (3-13b)
32
3 . 4 The Accumulated Round Off Error
Substracting (4) from (6), and defining
rn = *n - xn
as the accumulated round off error at the n"1 step of integration yields
This equation describes the evolution of the accumulated round off
error. It can be approximated by
T
 [<
In order to focus more specifically on the errors introduced by the
integration process itself, it will now be assumed that
so that the accumulated round off error becomes
r . t = r +Tg r +e .. (3-15)n+1 n 6n n n+1
where
Equations (14) and (15) relate the accumulated round off error to the
local round off error and to the parameters of the original differential
equation. Looking at the expression found for the local round off error
(7) as well as at those two last equations, it clearly appears that the
accumulated round off error depends on all the aspects of the original
simulation problem in a rather compex manner. To approach it
mathematically will therefore often be hopeless. However, Henrici
[1] has established some theoretical results which will now be
introduced without proof.
33
-"~ SS.vis^ StSiSe..-.'*-^ !51' '
He has shown that assuming
(3-16)
var(en) = <72qn C3-17)
where p and q are two continuous, nonnegative, piecewise smooth
functions of t = nT , the statistical characteristics of the accumulated
round off error are given by
m -H 0(T)) (3-18)
var(rn) = ^(vn + O(T» (3-l'9)
where mn and vn are two functions of t = nT obtained by solving the
following differential equations
m'(t) = g(t)m(t) + p(t) (3-20)
m(0) = 0
and
v'(t) = 2g(t)v(t) + q(t) (3-21)
v(0) = 0
where
This result indicates that the expected value and the variance of the
accumulated round off error, which will be assumed to have a Gaussian
distribution, are directly proportional to the expected value and the
variance of the relative local round off error. It also emphasizes that
the statistical characteristics of the accumulated round off error will
be inversely proportional to the step size of integration T , a fact that
apparently prohibits using in the same time short word computers and
small step sizes. Partial double precision will change this fact.
34
Example
Consider a simple first order system with a unit step input
x'(t) = -Lx(t) + 1 (3 -22 )
x(0) = 0
The exact solution to this problem is given by
x(tl = ±(1 - e"Lt) . (3-23)
L-i
It is represented by a broken line in figure 1, for L, = 1 and L = 0. 1 .
If the numerical integration were perfect, its results would correspond
to points on those curves. Actually when the simulation is performed
with a 9-bit mantissa computer (N = 8), for example, the curves
obtained are quite different as shown on the figure. The output of the
integrator becomes constant at a level much below the limit toward
which it is supposed to tend. This shows that there is a minimum rate
of change of the variable to integrate under which the numerical
integration becomes inoperative. What happens can be expressed by
or
r l7(xn ,un) < 2"N(exponantial part of xn)
x
1
 < 2 _ (exponantial part of xn)
n
When T and N are small this phenomena may generate very large
errors. For example, setting L = 1. in (22) and integrating (22) with
a 9-bit mantissa computer will introduce a steady state error of
about 100% when T = 1/128 sec. and an error of about 300% when
T = 1/512 sec.
35
Fig. 3-1. Integration of x1 = -Lx + 1 with a computer having a 9-bit
mantissa (including the sign bit). The exact solution to this differential
equation is indicated with a broken line.
36
3 . . 5 Partial Double Precision ; • - . . - ' ,
As previously mentioned (10), in most cases of real time simula-
tion the induced error forms the largest part of the local round off
error. To eliminate it would therefore result in a substantial gain in
accuracy, particularly when short word computers are used. This
can precisely be achieved at low cost (a very small increase in
computation time) by performing the addition on the right side of (4)
in double precision. Henrici [1] refers to this algorithm as "partial
double precision".
Using partial double precision, the equation expressing the real
results of the numerical integration, taking the effect of rounding into
consideration, becomes
ft* ^ j£
v - /Y + T ffv# M#H (1 O C ' N
n+1 in n* n J \ o - 2 o j
where the following definitions apply:
1) x is the double precision computed value of the quantity x
2) x* is the value of x after it has been rounded to form a single
precision quantity
3) y* is the double precision rounded value of the quantity y
4) i is the single precision computed value of the function f.
It is here emphasized that although xn will be obtained step after step
as a double precision variable, only its single precision value is used
to compute the function f(. .. ). Therefore, this last computation which
in many instances will be the most time consuming, especially when
the function is highly non-linear, will not require any additional time.
Then, if it is assumed that the function f(. . . ) can be calculated
accurately enough so that the only error degrading its value is its
final rounding to a single precision number
**^ AS /V -V **f(x*, u*) = f*(x*, u*)
n n n n
and (25) can be somewhat simplified
I f sjc f Y *!*
=
 Pn
Once again, equation (26) may be rewritten in a more manageable form
as
37
x = x + Tf (x u ) + e
n+l n luxrf V
(3-27)
where the local round off error is explicitly expressed. Equation (27)
appears to be formally identical to equation (6), however two major
differences should be noted :
1) xn+, is now a double precision value, and this alone suggests a
substantial gain in accuracy
2) en+j is very different than in. equation (6), and will actually
absorb part of the gain.
The local round off error can now be estimated. As before, in
order to focus on the errors introduced by the integration process
itself, it will be assumed that the forcing term is known with an
absolute accuracy. Then
And
n+l
un = un = un
T f
*
(xn' T f*(xn'
T f ( xn'Un> -
= {...}* -{•-.} + T [ f*(. . . ) - f(. . . ) ]
Tg n (x* -x n ) (3 -28)
where
g df(. . .n dx n
Following the same line of reasoning as before, the quantities
/"*•* ~ \(xn - xn>
and
38
will be viewed as two random processes. Specifically it will be
assumed that
when numbers are simply truncated, or,
E(x* - xn) = 0 (3-29b)
when numbers are symmetrically rounded, and
var(x* -x ) = <J|2x2 (3-29c)
n n n
in both cases. In the same way, it will also be assumed that
|E[f*(. . . ) - f(. . . )]| < p'|f(xn. un)| ( 3-30a)
when numbers are simply truncated, or
E[f*(. ..) - f ( . . . ) ] = 0 . (3-30b)
when numbers are symmetrically rounded, and
var[f*(. ..) - f(. . . )] = (T2f2(xn, UQ) (3-30c)
in both cases. For all those expressions
(3-31a)
12 (3-31b)
The statistical characteristics of the local round off error, when
simple truncation is used, can then be found by introducing (29) and (30)
into (28), so that
where
39
and M is the number of bits in the mantissa (not including the sign bit)
of a double precision number. Also, applying the rules regarding the
summation of independent random variables, it can be found that
var(en+1) = <J2X2 + T2 CT2f2(xn. un) + T2g2(j'2x2 (3-33)
where
0-2 =
 2-2M /12
When symmetric rounding is used for all computations the expected
value of the local round off error is of course assumed to be
E(en+1) = 0 0-34)
while its variance is still given by (33).
Now, assuming that JJ is very small compared to )J and 0" is
very small compared to <T' , the proceeding expressions can be
somewhat simplified. Equation (32) becomes
while equation (33) now reads
var(en+1) = T2(T'2 [ f-2(. . . ) + g2x2 ] (3-36)
This last assumption could probably not be made in most cases where
a large computer is used but it will be valid in many cases involving
short word computers. For example if a 16-bit microcomputer has its
word divided into 7 bits for the exponantial part and 9 bits for the
mantissa, its double precision mantissa will have 25 bits and the first
term in (32) and (33) will be negligible compared to others for most
realizable T's.
At this point, Henrici's formulas to evaluate the statistical para-
meters of the accumulated round off error [see (18) and (19)] can be
called upon to notice that because of the factor T present in the
expected value of the local round off error (35), when partial double
precision is used together with simple truncation, the expected value
of the accumulated round off error will be independent of the step size
of integration and much smaller (very roughly of a factor 1/T ) than in
the case where partial double precision was not used. In the same way,
it appears that because of the factor T2 present in the expression of
the variance of the local round off error, the general effect of the
accumulation of round off errors will decrease with the step size of
integration when both partial double precision and symmetric rounding
are used. Indeed, in this last case the expected value of the accumula-
ted round off error would (at least theoretically) be zero, while its
variance would be proportional to T. This is certainly the most inter-
esting consequence of using partial double precision. It should be
emphasized however that in order for this to be true the first terms
in both (32) and (33) must always (for any n) be negligible compared
to the other terms. If, for example, T were very small and as a
consequence double precision would not be enough to fulfill this
requirement, it could be of interest to program some kind of triple or
even quadruple precision algorithm. To do so would not represent a
very difficult problem since the corresponding extra-long variables
would have to be used in an addition only.
There is a second very positive consequence of using partial
double precision : the minimum rate of change of the variable to
integrate under which the integrator will not be able to follow properly
is now given by
_-M
x' <-? . (3-37)
n
 T
Since M is considerably smaller than N, (37) represent a substantial
improvement compared to (24) when a short word computer is used.
41
First Example
Consider a simple linear first order system with unit step input to
be simulated by a short word computer using partial double precision.
The differential equation to integrate is
x
1
 = -Lx + 1 , x(0) = 0 (3-38)
Its exact solution is
x(t) = 1(1 - e-Lt) (3-39)
L
The expected value of the local round off error can be found using (35).
It should be noted though that because the solution xn will always (for
all n) be positive the absolute value signs in (35) may be removed and
the inequality now becomes an equality. The quantities f(xn, un) and
xn cannot be known exactly a priori, however assuming that the results
of the numerical integration will closely follow the exact solution, those
two quantities can be approximated using the exact theoretical solution
(39). Then
x = 1(1 - e"LnT) (3-40a)
n
 L
f(xn ,un) = e'LnT ( 3 - L f O b )
gn = -L (3-40c)en
Therefore, (35) gives
E(en+1) = -
or,
E(e(t)) = -T|i'[2e'Lt - 1] (3-m)
The expected value of the accumulated round off error can now be
estimated using (18) and (20). Equation (20) becomes
m'(t) = -Lm(t) - T[ 2e"Lt - 1 ]
m(0) = 0
42
Its solution is
m(t) = 1[1 - (1 +2Lt)e"L t ] (3-42)
Li
It the a follows that
E(r ) = £'[1 - (1 +2LnT)e-L n T ] (3-43)
L
In the same way, the variance of the local round off error can be
approximated by
var(en+1) = T2 <T'2 [ e-2LnT + (1 -e-L n T)2] (3-44)
And the variance of the accumulated round off error can be estimated
using (19) and (20) ; it has been found to be
var(r ) = T(r' [1 - 4e'LnT + (3 + 4LnT)
 e-
2LnT
 ] (.3-45)
n
 2L
The next two figures represent the evolution of the expected value of the
relative accumulated round off error, that is E(rn)/xn , and of the
variance of the relative accumulated round off error, that is var(rn)/xn
calculated according to (43) and (45).
Experimentation relatively to this example has been conducted in
the following way. Equation (38) has been simulated with two parallel
integrations. For the first simulation the full length of the mantissa of
the computer (a HP2100 using a 32-bit word with a 23-bit mantissa) was
used. For the second integration only 9 bits of the mantissa were
retained for calculations (that is the mantissa was truncated or rounded
to a 9-bit expression after each arithmetic operation). However, in
this second computation the full length of the mantissa was used for the
"double precision" variables. The accumulated round off error
corresponding to a 9-bit mantissa was then estimated as the difference
between the results of those two integrations. Furthermore, in order
to obtain an average value and an estimated variance, for each set of
parameters, equation (38) was simulated 21 times giving for each pass
a slightly different amplitude to the forcing term (the 21 different
amplitudes were evenly spread between 0. 95 and 1.05). Some of the
typical curves obtained in the course of this experimentation are given
following the figures representing the theoretical curves.
43
Fig. 3-r2. Expected value of the relative accumulated round off error for
the first example. Theoretical curve corresponding to equation (43).
i.
L =/. T= 1/3.2
30 -5O 60 t (3cc.)
Fig. 3-3. Standard deviation of the relative accumulated round off error for
the first example. Theoretical curve corresponding to equation (45).
RELATIVE AVERAGE ACCUM. ROUND OFF ERROR
TRUNCATION) . ..
. X<e) -8 , T-l/32
Fig. 3-4. Relative average accumulated round off error corresonding
to a real simulation of example 1 (9-bit mantissa). —
RELATIUE AVERAGE AOCUM. ROUND OFF ERROR
(SIPIPLE TRUNCATION)
0 i^.o a»7c asT0 aaTc 3?70 nd.e . 4^.0 sJTe 5 .^0 s.e
X' • -UX + 1. , X < 0 ) - 0 , T-1/S12
Fig. 3-5. Relative average accumulated round off error corresponding
to a real simulation of example 1 (9-bit mantissa).
45
.eaL.
RELATIVE STANDARD DEVIATION OF ACCUfl.
ROUND OFF ERROR
T-l/512
iSIflPLE TRUNCATION)
5.'00. . 10 .015 .0 20.0 as
T1.PIE (SEC,)
.1 ,'fc 111 -•-••;« {
53.0 64.0 !
Fig. 3-6. Relative standard deviation of the accumulated round off error
corresponding to problem 1 (9-bit mantissa).
3.02
u.0ei
RELATIVE STANDARD DEVIATION OF ACCUH.
ROUND OFF ERROR
(SIMPLE TRUNCATION)
X' • -L*X + 1. . Xte)-0 . T-l/32
I'.91
.0el 1 : 4-
s 7 0 0 1 0 7 0 is\
(SEC)
0 a.e 25 ~^—
-»— q=>«^ =fg^ -^ j— ->t I I .-.^ l_-^  - =J
.0 30*.e 3S.0 49.* 45.0 si.e 55.0 €9.9
' '3-7. Relative standard deviation of the accumulated round off error
corresponding to problem 1 (9-bit mantissa).
46
13.0-
8.
s.ee.
.0C_
s.ea.
*.e0ei
RELATIVE AVERAGE ACCUfl. ROUND OFF ERROR
(SVPinETRIC ROUNDING)
X' • -l*X + 1. ' . X(0) -« , T-l/32
L-.01 \ ; v TIME (SEC)
.« 45 .c siTa 55.e iS.e
^t;
Fig.: ;3-8. Relative average accumulated round off error corresponding to
example 1 (9-bit mantissa).
19.
8.
6.
4.
2.
02-
02.
0C.
02.
.02-^
*.000i
RELATIVE AVERAGE ACCUM. ROUND OFF ERROR
(SYMMETRIC ROUNDING)
X' • -LXX +• 1. , X(e»)-C , T-l/512
L-.i
15.0 20.0 25.0 34.0 35.0 437
TIHE (SEC)
7 0 5 5 . 0
-2.1
-4.(
-6.1
-8.0L
Fig. 3-9. Relative average accumulated round off error corresponding to
example 1 (9-bit mantissa).
Second example
Consider now a linear first order system with a sinusoidal input
to be simulated using a short word computer and partial double precision.
The steady state operation will be assumed in order to examine whether
the accumulated round off error will also reach a steady state type of
evolution, or on the contrary will grow without bounds. The differential
equation to be integrated is
x
1
 = -Lx + Us inwt (3-46)
Its steady state solution is
x = A sin(wt +¥) (3-47a)
where
V = arctan(-w/L) (3-27b)
A = U/(LcosV - wsin^) (3-27c)
Using (35), the local round off error is found to be
|E(e(t))j ^ Tu ' [ |-LA sin(wt +V>) + Us inwt l
- LAJsin(wt +^
or, simplifying,
JE(e( t ) ) |< UTji' |sinwt| (3-48)
An approximation for (48) can be obtained by the first terms of its
Fourier series. Then,
|E(e(t))|^ U T u ' [ ^ - !cos2wt] (3-49)
Now the contribution of the two terms on the right side of (49) to the
accumulated round off error can be examined separately. The constant
term gives, using (20),
m0(t) = 2UT_(1 _ e -L t }0
 L
48
Therefore (18) becomes
1 -
 e-
Lt) (3-50)
and for t large enough
- I < >*'2U (3.-51)
I 3C L
The cosine term in (49) gives, using (20),
mc(t) = Bcos(2wt+V) (3-51a)
where
Vx= arctan(-2w/L) (3-51b)
B = Wltcd-cosY - 2wsinV) (?
Formula (51) shows that the accumulated round off error will
indeed reach a steady state type of evolution, except when L = 0 ,
then it will grow without bounds. This result seems to emphasize
very strongly the importance of the ratio U/L which will determine
the value of the DC term of the expected value of the accumulated
round off error. Formula (52) indicates that this last quantity will
oscillate around its DC term, but also that the amplitude of the
oscillation will be limited. Indeed, (52c) shows that
- If w « L .then B ~ 2|E(rn)l
- If w ~ L/2 B « |E(rI n
- If w» L/2 B «|E(rn|a
av
v
The variance of the local round off error can be found for this
example using (36).
var(e(t)) = T2(T2 [ ( -LA sin(wt +f) + U sinwt )2
+ L2A2sin2(wt +V) ]
49
AVERAGE ACCUMULATED ROUND OFF ERROR
-s.e.
-10.
-12.L (SIMPLE TRUNCATION - 9 BIT P1ANTISSA)
-14 _ X' • -UX + SIN(U*TinE) , U-.314 , T-l/33
-16
-18
t-l. TIME (SEC)
0 . He r."
Fig. 3-10. . Average accumulated rouad off error corresponding to the
second example. Two other similar simulations performed with
w=l. and w=. 1 have yielded the same curves except for the ripple.
50
.a
13.e.
10.e.
6.CC.
4.e«.
3.ec.
STANDARD DEVIATION OF ACCUfl. ROUND OFF ERROR
(SIMPLE TRUNCATION - 9 BIT flrtNTISSft) ",
X' • -l*X + SIN(U*TinE> , U-.314
• <:••': "":: . 5^-:'."- --• - T-I/SS
TIHE (SEC)
7 no. 120.
Fig. 3-11. Standard deviation of the accumulated round off error corres-
ponding to example 2. Two similar simulations with w=l. and w=. 1
have yielded the same curves except for the ripple.
51
!.«(„
.86.
-.66.
-.86.
-l.fl.
AVERAGE ACCUMULATED ROUND OFF ERROR
(SYMMETRIC ROUNDIMG - 9 BIT MANTISSA)
X' • -UX + SIN(U*TinE) , U-.314 , T-l/33
Fig. .3-12. simulatioa correspoading to the second example.
.80.
-.88.
-l.fl.
AUERAQE ACCUMULATED ROUND OFF ERROR
(SVfiriETRIC ROUNDING - 9 BIT MANTISSA)
L-I. . L-.I , t»e
X' • -L1X + SIN(U*TinE) , U-.314 , T-l/128
Fig. 3-13. Simulation corresponding to the second example.
52
puooas Suipuodsaaaoo upijetnuns
(03S) auxi
jjo QKinoa *unoo^ jo Nouyinsa -'BO*?
•3-[duiBxa puooas aq; o; Suipuodsaaaoo
(035) 3WU
•e?T_ M8JT
jjo asinoy 'unooy jo
or, simplifying,
var(e(t)) < T 2 (T | 2 (U 2 + 2L2A2 + 2LA ) C3-52)
Then, the accumulated round off error is given by (21) and (19). It was
found that
v(t) < _ ( l - e - )
 ( 3 _53)2Lj
where
C = (U2 + 2L2A2 + 2LA ) (3-54)
From (53) it can be deduced that the variance var(rn) will be bounded.
Indeed, (19) gives, for t large enough,
v a r ( r ) _ (3-55)n m
Experimentation regarding this example has been conducted
following a procedure similar to the one developed for the first example.
Some of the most typical curves obtained are given in the following
figures. It results from the various simulations performed that when
numbers are simply truncated formulas (50) and (52) give a quite good
description of the expected value of the accumulated round off error.
It is clear that this quantity can become very large when L is very
small. Simple truncation is therefore not advisable with short word
computers. When symmetric rounding is used, the expected value of
the accumulated round off error is very small, near zero, as antici-
pated. Then, the most important characteristic of the accumulated
round off error becomes its variance or standard deviation. Curves
relative to this quantity are also given. They reveal that the variance
of the accumulated round off error is badly described by the various
expressions established so far. No satisfactory explanation has been
found regarding this fact. However, the size of the standard deviation
is a comforting factor which sustains the hope of being able to use
short word length computers for numerical integration.
54
3.6 Effect 'of The Limited Accuracy of The Forcing Function
In many cases of real time simulations, the inputs of the emulated
dynamic system will be obtained from analog to digital converters.
The accuracy of the corresponding forcing functions will therefore be
limited by the number of bits given by the conversion.lt can easily be
shown that the accumulation of round off errors degrading the results
of an integration will, in such a case, be dominated by the effects of
the rounding of the forcing term, provided the computer uses a mantis-
sa somewhat longer than the word of the A/D.
For simplicity, the function f(xn, un) will be assumed linear. It
can therefore be written as
f ( xn'un> = -Vn + un < 3 - 5
where Ln may be a function of n and where un represent the forcing
function. The equation corresponding to the evolution of the accumula-
ted round off error (14) now becomes
rn+l = rn + Tf - Lnrn +^n] + en+1 (3-57)
where
- un (3-58)
Equation (57) can be approximated by
rn = - Lnrn + TAun + en+l (3-59)
The term en+j is obtained as before from (27) and (28). And, assuming
an efficient use of partial double precision, it is found that
h^Wl < Tu'[ | f(xn ,un) | - Ln|XtJ] + Tn|Un| (3 -60)
where n is the expectation of AU • Also,
var(en+1) = T2<T'2 [ f2( . . . ) + L2X2 ] + T
55
where d is the variance of Aun. Since the function f ( . . . ) has been
assumed linear, these last equations can be simplified. Thus,
pteh+ll ^ Tf 'M + T?KI C3-63)
and
O O O O O O i*? 0
var(en+1) = T*<r>2[21$,* - 2Lnxnun + u2 ] + T2f u^
(3-64)
The expectation and variance of the term Aun viewed as a random
variable are known since un is given by an A/D. They are, assuming
that the A/D gives K bits with symmetric rounding,
a = 0 C3-65a)
=
12
 2"2K (3-65b)
If the computer performing the integration uses symmetric rounding
and has a mantissa a few bits longer than the word of the A/D, the
first terms in (63) and (64) will be negligible compared to the second
ones, so that then
E<en+l> = ° (3-66a)
var(en+1) = T2 t f2u2 (3-66b)
These last expressions combined with (59) show that the accumulated
round off error at the output of the integrator will depend on the
rounding of the input only. Asa consequence, if for example the input
(or forcing term) is given by a 8-bit A/D, a computer with a 12-bit
mantissa will perform as well as a computer with a 40-bit mantissa.
Experimentation on this point has been conducted following a
similar procedure as for the preceeding simulations. The results of
a first integration performed with a 23-bit mantissa were compared to
those obtained in a second inte gration performed with a mantissa
reduced to 9 bits by symmetric rounding. The same simulation has
been then repeated with, for the second integration, a mantissa reduced
to 12 bits by symmetric rounding. The average value and standard
deviation of the difference are given in the following figures.
56
1.02U
. 1
.80L *'
.62.
.42.
I
•
24
.0e
Fig; 3-16.
AVERAGE ACCUMULATED ROUND OFF ERROR
(SYMMETRIC ROUNDING - 9 BIT MANTISSA)
•~4—-^ An—\=7^ 4idc. Ivi0.v l^2'0.
TIME (SEC)
"X' - -UX + SIN(U*TinE> , U-.314 , T-i/32
- INPUT ROUNDED TO 9 BIT MANTISSA f. P. N.
AVERAGE ACCUMULATED ROUND OFF ERROR
(SVnBETRIC ROUNDING - 9 BIT MANTISSft)
A itYvA A A/1\ VyJl S.,^ 1 ^ /[
96.0 160. l ie.12*.
TIME (SEC)
-.set
-i.0
Fig. 3-17.
60.0 '7^.0
.0
X' • -UX t SIN(U*TinE) . U-.3H , T-l/128
INPUT ROUNDED TO 9 BIT MANTISSA F. P. N.
57
i.eeu
.88.
.62-
.48.
STANDARD DEVIATION OF ACCUPI. ROUND OFF ERROR
(SYflflETRIC ROUNDING - 9 BIT MANTISSA)
L-.ei
3C\0 46.0 5d.0 60. L0 100. Ii0. 120.
'
 3
"
18
-
X' • -UX * S I N C U X T I P I E ) , U-.314 , T-l/33
INPUT ROUNDED TO ft 9 BIT PtoNTISSA F. P. N.
TlttE (SEC)
i.ea. STANDARD DEVIATION OF ACCUM. ROUND OFF ERROR
ij
.80L
I
[
.68.
.48.
*.001 R O U N D I N G - 9 BIT MANTISSA)
A-
li.0 ad.e 3^.0 4^.0 5^.«T 60^.0 7d.0 80.0 9d.0 ife 110. 120.
TIRE (SEC)
X' • -L»X * S I N < U * T i n E > . U-.314 , T«l/138
Fig. 3-19. INPUT ROUNDED TO A 9 BIT HANTISSA F. P. N.
58
AUERAGE ACCUMULATED ROUND OFF ERROR
(SYMMETRIC ROUNDING - 18 BIT MANTISSA)
.ei lea. lie. 120.
TIRE (SEC)
Fig. 3-20.'
- X' • -L*X + SINtumriE) , W-.314 . T-i/32
INPUT ROUNDED TO A 9 BIT MANTISSA F. P. N.
AVERAGE ACCUMULATED ROUND OFF ERROR
(SVmiETRIC ROUNDING - 12 BIT MANTISSA)
120.
TIHE (SEC)
Fig. 3-21.
X' • H-UX + SINfUmtlE) , U-.314 , T-l/128
INPUT ROUNDED TO A 9 BIT PIANTISSA F. P. N.
59
STANDARD DEVIATION OF ACCUM. ROUND OFF ERROR
(SYMMETRIC ROUNDING - 12 BIT HftNTISSA)
.001
sd.e 9d.
- ..3-22.
.e 30. e 4d.e 5^ .0 sd.e ?^ .e si.e 9^ .0
..
X' - -UX + SIN(UXTIRE) , U-.314 , T-l/32
INPUT ROUNDED TO ft 9 BIT (1ANTISSA F. P. N.
ffo 130.
TIME C S E C )
1.0 STANDARD DEVIATION OF ACCUN. ROUND OFF ERROR
.38.
*.00ei (SYPIHETRIC ROUNDING - 12 BIT RflNTISSA)
L'0.0
~^_ i^ r-f_ \-^ _.__rr. __tr_ _ -JT- U ^_ J- • - .!
.0 4«".e~5^re"'"6^o"~73.«'" sT.e 9^.0 1^0." i"Jo7~ ii
Fig. 3-23.
>:' • -L*X * SIN(U*TIME) . U-.314 , T'1'128
INPUT ROUNDED TO ft 9 BIT PlftNTISSft F. P. M.
60
3.7 Integration of a System of Differential Equations
As mentioned in the introduction, integrating a system of simul-
taneous differential equations can be done with parallel operations by
assigning a separate integrator to each individual equation. After each
step of integration the results of those independent computations will
be exchanged among the integrators for the updating of the various
forcing terms. As a consequence, round off errors will propagate
from differential equation to differential equation and may become very
large. To study their accumulation mathematically will in many cases
be hopeless, especially if the equations are non-linear. Then, simula-
tion will be the only way to obtain an evaluation of their effect. :
However, the observations made in the preceeding sections apply to
integrating systems of equations as well as single equations. Therefore
it can be said that if partial double precision and symmetric rounding
are used, and if the coefficients L in the equations are not all very
small, the accumulated round off error will remain small.
Example
The second order differential equation
x" + 25£wcx' + w^x = u (3-67)
has been simulated as a system of two simultaneous equations
x1 = y
2 v, (3-68)
y' = -w£x - 2£wcy + u
where the forcing term u was a sinusoid of unit amplitude. As for the
previous experimentations the integration has been performed twice :
once using the full length of the mantissa (23 bits), then reducing the
mantissa to 9 bits by symmetric rounding. The average value and the
standard deviation of the corresponding accumulated round off error
has been calculated and is shown in the following figures.
61
AUERAGE ACCUMULATED ROUND OFF ERROR
(SVnilETRIC ROUNDING)
.e i m . i i l e . ise.
TIME (SEC)
U-.314
Fig- 3-24. Simulation corresponding to the example on page 36.
62
3.9t.
2.0* -
STANDARD DEUIATION OF ACCUFI. ROUND OFF ERROR
t.M1 (SYMIIETRIC ROUNDING)
; •• .;. . T«l/32 - ' . -•'.,:.-;^.---.:- ''. ;
. . TIME (SEC)
Le 20.e ai.e 4^.0 50.e 6^.0 ?0.e si.e 90.e ife Tier~2e.
W J • U ' _. ___ „ ._ ____ - _A ™ T - - - - - " " "
V -"-.ux -.a*v + SIN(UtTIME) , U-.314
Fig. 3-25. S imulation of a system of second order.
STANDARD DEUIATION OF ACCUN. ROUND OFF ERROR
(SVMMETRIC ROUNDING)
i.c a .^'e ai.a 4d.a sd.e sd.e ?d.0 sfl.a sO ife Tio. Tae.
TIHE (SEC)y,' • v
V - -.1»X - .2»V * SINCUITIP1E) . U-.314
Fig. 3-26. Simulation of a system of second order.
63
AVERAGE ACCUMULATED ROUND OFF ERROR
(SYMMETRIC ROUNDING)
X' • V
V - - X - .5*V + SINCUtTIflE) , U-.3H
TIME (SEC)
Fig. 3^27.. Simulatioa of a second order system.
STANDARD DEVIATION OF ACCUM. ROUND OFF ERROR
(SYWIETRIC ROUNDING)
10.0 30.0 30.0 40.0 50.0 60.0
X' • Y
V • - X - .5*X •»•
*i \s--J vi V"-v *] xx^ vi v~*v vi
si.e 9d.0 1^0. iT07~7ie.
TIRE (SEO
, U-.314
Fig. 3-28. Simulation of a second order system.
3.8 Conclusions • • > • • i .
The mathematical approach to the problem of the accumulation of
round off errors in numerical integration leads to two general observa-
tions. First, all the parameters of the specific simulation under
investigation have to be considered; general rules and formulas about
the evolution of the accumulated round off error can therefore be
established only for a limited range of well defined problems. Second,
in many cases this theoretical approach will be hopeless (especially
when non-linear equations or large systems of simultaneous equations
are involved) because it calls for solving differential equations which
will be at least as complex as those defining the original problem;
comparative simulation will then be the only way to obtain useful
information on this question of round off accumulation.
Regarding the specific question of using short word computers for
numerical integration, this research has shown the gain to be realized
by a partial use of double precision, or partial double precision in the
sense of Henrici. Indeed, if the ratio between the length of the double
precision mantissa and the length of the single precision mantissa is
large enough, the problem of the accumulation of round off errors will
be drastically changed. In the worst case the accumulated round off
error at any point of the integration will become independent of the
chosen stepsize of integration; in the best case it will slowly decrease
with it. This fact certainly opens the door to using microcomputers
working with very short sampling periods to perform real time
simulation.
At the same time the effects of partial double precision were :
studied, those of using simple truncation or symmetric rounding were
also analyzed. Because of the large difference in magnitude between
the expected value of the accumulated round off error and its standard
deviation, symmetric rounding (which results in making the expected
value go to zero) becomes a necessity with short word computers.
It has been shown that this choice combined with partial double preci-
sion will result in very small accumulated round off errors (in many
instances smaller than the least significant bit) when the coefficient
of the dependent variable (L) is not too small compared to the ampli-
tude of the input signal. For this particular analysis the input signal
was assumed exactly known (with an infinite accuracy).
65
A separate look has been given to the question of the accumulation
of round off errors when the input signal to the emulated dynamic
system is given by an analog to digital converter. It has been found
that then a computer with a mantissa slightly longer than the converter
word will perform as well as a computer with a much longer mantissa,
provided symmetric rounding and partial double precision are used.
The consequence of this observation is again that microcomputers
will be able to handle simulation as efficiently as much larger and
more expensive units when the problem of emulating a particular
system requires analog to digital converters.
66
References
(1) Peter Henrici, "Discrete Variable Methods in Ordinary Differen-
tial Equations", John Wiley & Sons, 1962.
(2) J. H. Wilkinson, "Rounding Errors in Algebraic Processes",
Prentice-Hall, 1964.
(3) Bede Liu and Toyohisa Kaneko, "Error Analysis of Digital Filters
Realized with Floating-Point Arithmetic", Proc. IEEE, Vol. 57,
No. 10, October 1969. -
(4) Bede Liu, "Effect of Finite Word Length on the Accuracy of
Digital Filters -- A Review", IEEE Trans, on Circuit Theory,
Vol. CT-18, No. 6, November 1971.
67
SECTION 4
PREPS ARCHITECTURE AND IMPLEMENTATION
4.1 Introduction
This section is a summary of a one year effort in studying and
developing a multimicrocomputer system to be used in parallel simu-
lations. Emphasis was placed on the practicality of reconfigurability,
and the compatability of software and hardware. A demonstrative
system was built during this year and is described herein.
The system design chosen to meet the objectives is reminiscent of
analog computer architecture and uses special computer modules--
working in parallel-- to allow a simple definite assignment procedure for
various parts of the simulation.
The special representation of the simulated system- which led to
the hardware organization- is described first, followed by a complete
description of the hardware and software.
This project is based on the premise that a multimicroprocessor
simulator has potential for being practical if the required hardware-
software combination can be made reconfigurable in a practical manner.
A multimicroprocessor system is envisioned as consisting of a large
number of interconnected microprocessors and a minicomputer. The
rapid technological developments and improvements in LSI microproces-
sors and arithmetic support chips make a special purpose, reconfigur-
able, parallel processor dedicated to real time flight simulation an at-
tractive alternative to a standard, large computer complex. It has been
68
clear for some time that increases in processing speed through techno-
logical advances have almost reached an upper limit. Thus, attention
has been focused on architectures which exploit parallel processing of
data. The advantages are quite significant:
Optimal (minimal) cycle time reduces the phase shift intro-
duced into the process being simulated.
Short cycle time decreases truncation error.
Easy way to expand the system is provided by adding inde-
pendent computing modules.
Better reliability achieved by the modular nature of the
design, also providing ease of maintenance.
The system to be described suggestes two levels of reconfigura-
bility. Flexible software modules provide the user an initial command
language similar to analog computer program diagrams. These modules
are designed to simplify the job of generating the simulated system, by
providing blocks of programs which occur frequently in a particular
form. Simple examples are integrators, weighted sums, non-linear
functions, etc. These blocks, although fixed in regular use, can be
redefined easily, or others can be added as needed to the system
library. Integrations, for example, can be switched from Euler method
to Adams-Bashforth method or any other desired one.
Sophisticated arithmetic units connected by a common bus to a
master computer comprise the hardware. Reconfigurability is achieved
by independent operation of each unit and by the asynchronous mode in
which they interchange data. Finally, man/machine programs supply
69
the relations between software and hardware under specifications given
by the user and under optimization algorithms that reconfigure the
system to give best results as to speed and performance.
4.2 System Software
A state variable description of the dynamical (possibly non-linear
and time varying) system is the basis for the block diagram representa-
tion of the simulation system. Independent parts of the differential
equations can be evaluated simultaneously, thereby achieving paral-
lelism. For example, consider the simple system given by
XI = X2 - K4*X1 + K2*X3 (4-1)
X2 = X3 + K1*R + K4*X1
X3 = 2*X2 - K2*X3 + K2*R
with initial conditions
X1(0) = X10 (4-2)
X2(0) = X20
X3(0) = X30
This system can be represented in block diagram of the kind shown in
Figure 4-1. This structure is similar to that suggested for direct
execution processors by Korn [3] but is herein used for programming a
multimicroprocessor system. This necessitates a "library" which in-
cludes multiplications, divisions and integrations as blocks of programs.
We can see that the terms K2*X3, K2*R, K4*X1 can be evaluated once
(per cycle) although used several times in the system of equations.
The block diagram representation eliminates multiple evaluations by its
70
X(N
1-I MULT — K 2 _r Ml II TMUL 1
Fig. 4-1. Block Diagram Command Language
71
nature. The terms K4*X1, K2*X3, K1*R, K2*R can all be in process
simultaneously and this is where the multimicroprocessor system has
advantages over sequential processing units.
The library blocks are written in the microprocessor's assembly
code or preferably microcoded and stored in local PROMs. Assignment
of blocks to a processor is done by loading the corresponding command
codes into the local RAMs. There they appear as
MULT
A1
X1
K4
SUB
A2
X2
A1
where MULT and SUB are command codes, A1 and A2 are labels of
results and the other are operands. Command codes can be library
functions or input-output definitions. I/O codes are carried out by
special purpose hardware interfaces hooked to the system bus.
72
A task list for the example is shown in Table 4-1. This list
indicates the fact that tasks may need results from other tasks as their
operands.
Table 4-1.
Task List
M.- — A i 8 ~~ 1 P
A2 = KI * R Ag = Ag + X3
A3 = K2 X3 A10 = A4 " A3
A4 = K2 * R A11 = A5 + A10
A5 = 2 * X2 X1 = INT (A7)
Ac =.AQ - A- X0 = INT (AQ)
D O I £ . 3
A? = Ag + X2 X3 = INT (A^
It is noted again that the "library" of tasks was chosen just for
the demonstration, and more complicated functions may be programmed
by the user. At the preceding step the list is broken down into several
parts, and each of the available computers gets some of the tasks to
be carried out. A timing diagram of a process (given by the example)
is shown in Figure 4-2. The time needed for a completion of a task
depends on the hardware speed, and is discussed later. Each library
function has an associated execution time which can be measured experi-
mentally when created. Those periods of time are used by an optimization
algorithm, which shares the tasks between the computers (before the
73
TIME"
v SEO
con .
720-
560 -
/inn -HUD
320-
240-
160
80 "
x2
A9
A8
A4
A,
V
A
,1
A10
A5
A2
Xl
A7
A6
A3
Fig. 4-2. Timing Diagram
74
beginning of the simulation process), thus minimizing waiting loops
which may occur when a task in process needs the result from one that
has not been processed yet. In the example shown no such situation
occurs, and the simulation program is processed efficiently.
4.3 System Hardware
As viewed from the block diagram representation there are two ways
to achieve parallelism:
1. Simultaneous processing of tasks by the available microcomputers.
2. Increasing efficiency in data flow between modules.
The first point indicates that a given simulation system should be analyzed
first and broken into subprocesses in which tasks do not depend on
each other's results, thus processed simultaneously. The second point
implies a special hardware architecutre. If we assume that several tasks
may need the same piece of information, an inefficiency may occur when
the common bus is occupied more than once to transfer the same
information. An interrupt system and preanalysis of the given simulation
can solve this problem, based on a "write only" bus architecture. This
system is conceptually identical to a shared memory system where all
computers use a common memory to read and write. The difference is
that instead of one memory module we have an image of it in each local
memory. The advantage of the image idea is that although the need
for a piece of data arises asynchronously in the computers, which might
lead to several transfers of the same data from the memory, the writing
occurs only once and the data is distributed in parallel to the computers
that need it.
75
Each of the computers is pre-programmed to detect the appearance
of an operand needed by one of its tasks on the common bus. The
interrupt system, controlled by the master computer, gives a "bus
granted" not to processors that need an input but rather to processors
that have intermediate results to output, and are idle until those results
are accepted by the other processors. This way several modules can
be interrupted at the same time, and perform input operations from the
bus. The optimization provides a way to organize the tasks performed
by each computer, so that their operands are available in the local
memory when needed. The suggested organization that meets the
above description is shown in Figure 4-3.
The master computer performs man/machine communications
before the simulation begins. It is used as an interrupt controller
in the real time run. The number of slave processors is limited by hard-
ware restrictions only.
Two different buses connect the computers to each other:
1. DMA bus.
2. System bus.
The first one is used to load the list of tasks into the local memories dur-
ing system generation. It also provides access to the internal parts
of each module, so the user can check registers, memory bytes and
interface status. If the system library is needed to be very flexible,
it can be written and held on disc and loaded by DMA into local RAMs
for each simualtion.
76
SYSTEM
BUS
MASTER
COMPUTER
^^ H^
PROCESSOR N
PROCESSOR 2
PROCESSOR 1
•^i
-
DMA
BUS
Fig. 4-3. System Organization
77
The system bus interconnects the computers for transfer of
intermediate results, data and status. It includes a 32-bit data path
and an 8-bit control path.
Ultimately, the slave computers will be realized with bipolar, micro-
programmable bit slice microprocessors because of the associated short
cycle times and variable word length. To reduce costs and ease construction
of a feasibility experiment, the current version combines the utility of
INTEL-8085 microprocessors with the speed of Advanced Micro Devices
AMD9511 ALUs. This arithmetic unit is stack oriented and can perform
80-|jsec multiplications and divisions, 180-(jsec additions and subtractions
as well as several trigonometric and logarithmic functions, all done in
32-bit floating point arithmetic (24 bit mantissa, 6 bit exponent, 2
bits for signs).
As shown in Figure 4-4, each computer is comprised of two main
parts:
1. Arithmetic logic.
2. Communication logic.
The 8085 microprocessor and AMD9511 provide the number-crunching
capabilities. The AMD9511's data path is-8 bits wide, perfectly suited
to the internal bus structure and is treated as an interface port.
Its activation is done by loading the two operands and the opcode.
Its busy line is used to check completion of calculations which, in
turn, interrupts the microprocessor. Status bits indicate overflow,
underflow, negative argument, division zero by zero and argument too large.
78
ARITHMETIC
UNIT
INPUT
PORT
DMA
BUS
OUTPUT
LATCH
COMMUNICATION
LOGIC
SYSTEM
BUS
Fig. 4-4. Computing Module
79
The relatively long time needed by the ALU for calculations is used
by the processor to finish the overhead operations like communication
with other processors or memory operations.
Upon completion of a task, the results are transferred to the output
latches. These are (32 bits wide) tristate flip flops, controlled by the
communication logic which, in turn, is connected to the master interrupt
controller. The microprocessor goes into a waiting loop if previous
results were not taken by the system from the output latches.
The DMA components include gates to the common bus and
interrupt logic. A DMA request by the master computer generates
an interrupt in the slave processors and a jump to the monitor.
This monitor allows major registers and pointers in memory to be
examined or changed by the user. The registers are restored and
the simulation programs take over upon a start command by the
user.
Communication logic handles address and status bits used by the
interrupt system. It consists of a one word comparator (6 bits) and
programmed Ik word comparator. Upon completion of a task, the cor-
responding task number is loaded by the local CPU into one side of
the first comparator. The comparator is activated when the same
number appears on the system bus—sent by the master computer which searches
for ready tasks. The output gates are opened upon equality, and data
flows to the system data lines. Also, a status bit (called s~) goes low,
indicating to the master computer, as well as to the other computers, that
data is ready on the bus.
80
A local operand list created in advance for each computer when
tasks are assigned is used to generate interrupts for read cycles.
This list consists of task numbers whose results are used by the pro-
cessor during the simulation cycles. The list is loaded as ones and
zeros in the corresponding addresses of a 1 bit wide Ik RAM used
as comparator. Appearance of one of these addresses on the common
bus along with low level status bit (s,,) causes an interruption in
the local process and performance of input operation. Input of data
by a slave computer is done as follows:
1. Set a status bit (s1) to indicate input operation.
2. Read data (8 bits at a time four times).
3. Reset status.
The data source computer checks s1 reset before releasing the bus,
otherwise an input error occurs in the reception. Pre-programming
of the Ik RAM is done by DMA operation during system generation.
The master computer consists of minicomputer and sophisticated
interface to the system bus. This interface is actually a microcomputer
which handles the DMA operations and the interrupt control. Four 8
bit data ports are dedicated to communication with the minicomputer, 2
ports are used for DMA transfers, 8 ports are occupied by the
system bus and 3 ports are used for general control information.
Also, the interface microcomputer functions as a programmable system
clock which generates the basic cycle time and counts the true time
needed for the multiprocessor system to complete the tasks.
The data base of the interface computer includes an address list
of all tasks to be performed. State variables appear first in this list.
81
The addresses are sent to the system bus during the cycles, providing
bus grants, according to the following algorithm:
1. Wait for beginning of cycle signal
2. Reset task counter
3. Ouput next address in list
4. If 52 = 1 go to 3
5. If S1 = 0 go to 5
6. If 51 = 1 go to 6
7. If end of list go to 1
8. Go to 3.
Step 4 is used as feed back to check bus request. Steps 5, 6
indicate the reception of the data by other modules. The algorithm by
which the task list is searched can be changed easily from linear to any
other desired priority schedule. Finally, the independent nature of the
computers may lead to a different concept of implementation: instead of
one cycle time for the whole system we can assign a single task to each
computer and let it run freely. This demands much larger hardware
support, but may reduce discretization error and increase speed signifi-
cantly. This idea necessitates some changes in the control programs,
but the hardware is completely suitable.
Man/machine communicatiosn are performed by the minicomputer.
This includes a high level simulation language, an optimization algorithm
to assign tasks to available computers and utility programs such as a
file manager, DMA driver, etc. A simulation language has not been
82
developed yet, although the block diagram approach previously men-
tioned seems attractive. The assignment policy of tasks to different
processors has a direct influence on speed and performance. The ideal
solution may cause an equal share to be given to each one, and also
eliminate idle cycles when one processor waits for results from another
processor which has not yeat completed its job. Three algorithms were
considered:
1. Critical path determination.
2. Trees.
3. "Fill up" tables.
The first one is based on the fact that each simulation system has
a limit of parallelism. This is determined by the longest (most time
consuming) path of tasks that feed each other with results, and must
therefore be evaluated sequentially. Finding the critical path helps to
determine the minimal cycle time. The rest of the tasks must now be
examined and shared between the rest of the processors. This adds
complexity to the algorithm because it does not provide a way to con-
tinue the sharing procedure.
The second one suggests building trees of connected tasks, with
weights on the branches, relative to execution times. Again, there is
no simple way to determine the relations between the number of avail-
able processors and the parts of the trees, especially for tasks whose
results appear several times in the calculations.
The third one seems the clearest and simplest to implement, al-
though run time may be longer. The data base consists of the tasks
determined earlier. In addition, there is a work list which contains
83
initial conditions and a library. The library relates experimental exe-
cution time to each function. Each processor is represented by a file.
A time vector describes the current situation of each processor.
The algorithm assigns the next task to the file whose cor-
responding element in the time vector is minimal, in order to keep time
values in the vector as close to each other as possible. The tasks are
chosen from the original list if their operands appear in the work list.
A task which is being assigned to a file is also added to the work list
indicating that at this stage its calculation is complete. This algorithm
does not always determine the optimal task share because of the random
way they are chosen from the original list. The timing diagram in
Figure 4-2 is a result of using the described algorithm. Each task
being processed has its operands ready before the execution, and no
need for waiting loops arises.
4.4 Summary
An experimental, reconfigurable multimicrocomputer system for real
time simulations has been described. Its features may be further
investigated, using the demonstrative system that was built. An effort
should be continued to improve software and hardware: high level
simulation language must be developed using the block diagram repre-
sentation and modular l/o interfaces must be designed, based on the
existing bus structure and operation.
The current system is constructed of three slave microcomputers
and a controller computer connected to HP 2100 minicomputer. Each
computer is a modular unit, built on one board and connected to S100
84
mother board (15 slots). Cost of hardware is less than $2400, includ-
ing the computers, chassis and power supply.
85
REFERENCES
1. R. O. Lejune and E. S. McVey, "Computer Word Size Required for
Specified Error in Numerical Integration," Proc. 12th Annual
Southeastern Symposium on System Theory, May 19-20, 1980,
Virginia Beach, Virginia.
2. D. Klien and E. A. Parrish, Jr., "A Reconfigurable Multiprocessor
System for Real Time Simulation," Proc. IEEE Southeastcon '80,
Opryland Hotel, April 13-16, 1980, Nashville, Tennessee.
3. G. A. Korn, "High-Speed Block-Diagram Languages for Micro-
processors and Minicomputers in Instrumentation Control and
Simulation," Comput. and Elec. Eng., Vol. 4, 1977, pp. 143-159.
86
DISTRIBUTION LIST
Copy No.
1-3 National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23665
4-5 National Aeronautics and Space Administration
Scientific and Technical Information Facility
P. O. Box 8757
Baltimore/Washington International Airport
Maryland 21240
6-8 E . A . Parrish
9 - 1 0 E. S. McVey
11 - 12 G. Cook
13 LA. Fischer
14 - 15 E. H. Pancake
Clark Hall
16 RLES Files
222R
0737:ras
