Vectorized Circuit by Wang, Yi-Xiang & Hwang, Kai
Purdue University
Purdue e-Pubs
Department of Electrical and Computer
Engineering Technical Reports








Follow this and additional works at: https://docs.lib.purdue.edu/ecetr
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact epubs@purdue.edu for
additional information.








School of Electrical Engineering
Purdue University
West Lafayette, Indiana 47907
This research was supported by an IBM research grant to Purdue 
University
VECTORIZED CIRCUIT ANALYSIS 
USING A MODIFIED NEWTON ALGORITHM 





♦This research was supported fully by an IBM research grant to Purdue University from Nov. 
1982 to Nov. 1985.
TABLE OF CONTENTS
ABSTRACTv,..i:„.„.(.,v;u.;
CHAPTER 1 - INTRODUCTION.
1.1 Circuit Analysis Methodologies............
1.2 Circuit Analysis On A Supercomputer.
1.3 Organization And Contributions.........







2.2 The Modified Newton Algorithms.............. ;■..;..........
2.3
CHAPTER 3 - COMPUTATIONAL REQUIREMENTS ... ..
3.1 Circuit Formulation Using The MNA........................
3.2 Computation Steps In The MNA ...........
3.3 Complexity And Convergence, Issues...i............,..........










4.2 Subcircuit Decomposition Programs.............
4.3 Main Network Update Programs.............___......................





CHAPTER 5 - CYBER-205 SIMULATION RESULTS 43






CHAPTER 6 - CONCLUSIONS AND SUGGESTIONS .............
6.1 Concluding Remarks.................. ..................................... .....




LIST OF REFERENCES 















In this report, a newly modified Newton algorithm (MNA) and a data 
structure for sparse matrix manipulation are presented for analyzing large-scale 
electronic circuits on the Cyber-205 supercomputer. The MNA is improved 
from the Multilevel Newton Algorithm (ML NA) developed by E abbat, 
Sanjiovanni-Vincentelli, and Hsieh (1979). The time complexity and conver­
gence rate of MNA are analyzed. The computation steps are shown in detail 
by some example circuits. Scalar and vectorized simulation programs have 
been tested run on a VAX 11/780 Scalar machine and on the Cyber 205 vector 
processor at Purdue University. From the results obtained, we observe that 
the MNA results a speedup of about 100 on the Cyber-205 as compared with 




the paper organization and research contribution! Eelabbd previous TVorks are 
briefly reviewed.
1.1 Circuit Analysis Methodologies
Digital computers have been used widely in large-scale circuit analysis. 
This report presents a Modified Newton Algorithm (MNA) for circuit analysis. 
The supercomputer Cyber-205 is used for analyzing large-scale electronic 
circuits with this new algorithm. In the time-domain, a nonlinear lumped 
circuit system is characterized by a set of differential equations. [1] [5]
f( u(t), u(t), t ) = 0 T > t > 0 , (1.1)
where u(t)£Rp is a vector of node voltages, or branch currents, or capacitor 
charges, or inductor fluxes, and 0 is the origin in Rp, The mapping, 
f: RpxRpxR1—>Rp, is a differential function with respect to u(t) and u(t), On 
a digital computer, the Backward Differential Formula (BDF) [16] can be used
to discretize the operator -7- . The BDF of order k is defined by:
■ 'k .. '■■■■ ; ■■ ■
! -hun + 1 = *n+i-i , (1-2)
, i=0 .
where un + 1 is the computed value of u(t.n + 1), and un + 1_; is the computed 
value of u(tn + 1_j), for i = 0, 1, • • • k. The time increment h = tn + 1—tn, and 
the <*i’s are selected such that Eq 1.2 is exact for polynomials of the degree<k. 
Substituting Eq 1.2 at t =tn + 1 into Eq 1.1, we obtain
f ( un, • • • un + 1_k, tn+1 ) = 0 0 < tn+1 < T (1.3)
Since the k past values un,...un + j_k are known at time tn+1? Eq. 1.3 becomes a 
function of un+1. Then Eq 1.3 can be written as
Fn+i(un + i) = 0 ■;, t•:
Where Fn+1 : Rp—►RP is continuously differentiable, and the index n + 1
2
indicates different time instants. Then a digital computer can be used to solve 








Where'V is the voltage, across the capacitor, I is the . current through it, and C 
is the capacitance. Using the BDF to discretize Eq. 1.6, we obtain
VB+i=V-+h- *n + lc
or
Ul = YVn+x--S-V„ (1.8)
Using Eq.1.8, at time tn + I, the capacitor is equivalent to a resistor and a 
current source, called an associated discrete circuit [5j. Figure 1.1 shows this 
equivalence for a linear capacitor at time tn + 1. After such an equivalence, 
there will be no time dependent elements in the circuit. Only linear resistors, 
nonlinear resistors, independent and controlled sources appear. Then the 
circuit can be solved by the Newton Raphaon Algorithm at time 
•t'n + i:for".' V:<;;tn + i < T . : •
There are several algorithms for solving the nonlinear equation defined in 
Eq.1.4, such as Single Level Newton Algorithm (SL.NA) and Multilevel Newton 
Algorithm (MLNA). This report presents a newly modified Newton algorithm 
to analyze large-scale circuits and studies the speedup from code vectorization.
1.2 Circuit Analysis On A: Supercomputer :
Circuit analysis requires to solve the linear system of equations:
.. ;. A-X =b' ■ ■■. (1.9)
When the size of a circuit is large, the matrix A becomes very large and very 
sparse. Since ’the additions'and' multiplications' with the zero operands are 
redundant, avoiding them may gain speedup and save memory space. There 
exist several techniques for sparse matrix manipulation, such as using a row- 
column pointer structure and bit matrix mask structure as described in [2). 
For the-:-row-column pointer method, the nonzero elements of A are stored
Figure 1.1
3
rowwise in increasing order. We denote this vector as NZ . This vector has 
length m, where
' . m = p-n2 (1.10)
The parameter n indicates the dimension of the matrix A, and p is the 
percentage of nonzero elements in A. This method needs to use two extra 
integer arrays to locate the nonzero elements. One array, called the row 
identifier array IUR, has length n and contains the location of the first nonzero 
element of each rows of A. Another array, called the column identifier array 
IUL,, has length m and contains the corresponding column numbers. Adding or 
multiplying the nonzero elements need to access NZ, IUR and IUL for locating 
the nonzero elements in A. These operations need extra CPU time beyond the 
regular addition or multiplication times. This method is used only when A is 
small.
The second method uses a bit mask matrix B to replace the vectors IUR 
and IUL to locate the nonzero elements. The matrix B has the same dimension 
as A. Each entry in B has only one bit, with a value 1 for a nonzero element 
in the corresponding position of A, and a value 0 for a zero element. For a 
computer which has the capability of bit processing like Cyber-205, memory 
space can be saved when this technique is used. However, to locate a nonzero 
element requires to count the number of i’s in B from the beginning. When 
the dimension of B is very large, the counting may become very time 
consuming..
The third method uses the same bit mask matrix B as in the second 
method. An integer array R is used to indicate the first nonzero element in 
each row of A. For example, R(i)=j means that NZ(j) is the first nonzero 
element in the i-th row of A. Although this method needs a little extra 
memory to store the vector R, only one row of I’s in B needs to be counted at 
one time. So it can reduce the addressing time from 0(h2) to O(n).
The supercomputer Cyber-205 at Purdue University has two vector 
arithmetic pipelines and a bit masking pipeline. It has a complete set of 
instructions for bit processing. Therefore the Cyber-205 is very suitable to 
implement the modified-bit-matrix method for manipulating very large and 
sparse matrices.
JUS Organization And Contributions --
A new algorithm, MNA, is proposed in this paper which is developed from 
the .SLNA and MLNA. The SLNA and MLNA have some problems when used
for solving a very large scale circuit. The MNA is developed to overcome these 
problems. A given circuit is partitioned into a main circuit ajdd many 
subcircuits in our approach. These subcircuits are treated as a vector and are 
solved by a pipeline supercompute efficiently. The nonlinear equations are not 
used here. And we do not have to solve the Jacobian matrices as in ML NA,
Solving a large-scale circuit partitioned intp / -levels, demands / -levels of 
Newton loop in MLNA ( one main loop and / -1 inner loop ). So the number of 
iterations in MLNA increase as an exponential function of the number of levels. 
There is no inner loop in MNA. The number of iterations in MNA is a 
constant. The MNA has quadratic convergence in most cases, which is faster 
than the "Pairwise quadratic convergence” in the MLNA. Using the MNA to 
perform circuit simulation experiments on Cybur'-205, significant GPU time can 
be saved.
Chapter 2 explains the MNA and compares it with the SLNA and MLNA 
in their relative merits. Mathematical proofs of MNA are given there. In 
Chapter 3, two examples are used to illustrate the computational steps in the 
MNA. The complexity and convergence of MNA are then afialyed. Chapter 4 
shows sparse matrix techniques for solving large matrices on supercomputer, 
and illustrates how to vectorize the MNA . The scalar version of programs are 
also explained. In Chapter 5, the Cyber-205 is used to solve large-scale circuit 
examples by various program versions. The scalar computer VAX 11/780 is 
used as a reference machine to solve the same problems. The results are 
presented based on simulation experiments. The speedup of each computation 
step is shown by some curves. Conclusions and suggestions are given in 
Chapter 6. The input data sets, numerical results from the simulations and 
three versions of circuit simulation programs are attached in the Appendices.
CHAPTER 2
THE MODIFIED NEWTON ALGORITHM
This chapter reviews the Single-Level and Multilevel Newton Algorithms 
and presents the new algorithm, MNA, for large-scale circuit simulation on a 
vector processing supercomputer
2.1 Single Level And Multilevel Newton Algorithms
The Single Level Newton Algorithm (SLNA) has been used to analyse the 
circuits widely. Since the complexity of solving a circuit characterized by an 
rixn matrix is 0( n3 ). It will require many hours to solve a large system With 
more than several thousand unknowns. Moreover, the main memory in most of 
today’s computer does not have enough space to hold the entire data base,
The tearing techniques are used to overcome these problems [17]. In the SLNA, 
a set of nonlinear equations F(X) = 0 is used to characterize a nonlinear 





Where xJ is an unknown vector in the j-th iteration, the 





. Ax> = xj + 1 - xJ ■;;'-:-:;(2,2j'
In general, a large-scale digital circuit or memory array may have many 
repeated subcircuits with the same structure as shown in Fig.2.1a. Assume all 
subcircuits are only connected to the main circuit, and no connections each 
Other, the matrix in Eq.2.1 will have a block-diagonal form as in Fig.2.1b, In 
the tearing technique, this matrix is ■partitioned into one main matrix and some 




Figure 2.1 A main
6
1) LU decomposition of submatrices
2) Solve for
3) Matrix multiplications
A • :■“•SSI ^ssi^ssi
Ri from ^ssi ^i ^smi
T; from T-tJ — A •• l v ssi ^msi
ci from ^ssi ci ^si
- RiTi
Ki = TiCi
4) Solving the main matrix
5)
V'' n -■ ’ : ■ n -
^mm S ^rni
. i-1
bm - F b *:m Zj . mi
; V’' i=l ‘
^ssi ysi — I ^si ]
U • x ■■■ y •ssiAsi Jr si.
Where from step 1 to 3 are used for solving the i-th submatrix Assi. Assume 
the dimension of the submatrices is ns, the time complexities in the first three
steps are about — ns3, ns3 and ns3. The calculating in step 2 and 3 may take a 
3
long time when the submatrices are large. If there are N submatrices, they can 
be solved in parallel, and the total time complexity is estimated to be;
N ( —ns3 + ns3 + ns3 ) ~ —N ns3v 3 .. ■ ' *: 3
Step 4 is for solving the main matrix. Let n is the dimension of the main
matrix, then the time complexity in step 4 is about —n3, and the time
3
complexity in step 5 is about N-ns2. Assume the number of the Newton 
iterations for this approach is p, the total time complexity for tearing technique 
SLNA will be about
p( — Nns3 + — n3) = ns3 + n3)
Another approach is called Multilevel Newton Algorithm (MNLA) which 
use a inner Newton loop to solve the subcircuits instead of the step 1 to 3 
mentioned above. We can Use
F( U, Y, w) “ 0 (2.4)
to formulate a main circuit, and use
®i( Up Yj, X; ) - 0 (2.5)
to formulate the i-th subcircuit S; in the circuit. The P and Hj are sets of 
nonlinear equations, the U, Y, W, U;,Yi? and Xj are vectors, U is the outputs 
of the main circuit, Y is the inputs of the main circuit, ta is the inner variables 
of main circuit, Uj and Y; are inputs and outputs of the i-th subcircuit, and 
UjEU, Y;(EY, X; are the inner variables of the i-th subcircuit.
At the j-th iteration of the main 
following equation; :
( ^11 + M. _dYr ■ dF
V AT 7 UV ATI / >BV dY dXJ dw
loop in MLNA, one has to solve the
-F(Ui , Yi , <J)
AoA
AUj '= Uj + 1 - Uj
Aa>* — + l — uj




























. _. . •
M ^Fn ’ ' 9Fn
Bul Boj2 ■ Bum
9Yi m Mi




8\Jl dV2 • ' ‘ *Uk
■: ^p
Since the nonlinear equations F(X) is known, the Jacobian matrix [ —— ],
^p ^p
fe f and [ —— ] in Eq.2.6 can be obtained directly. But the Jacobian 
BY Bu
matrix [ -—7 ] can be obtained only after all of the subcircuits are: solved. 
. ; B U
Fortunately the subcircuits are only connected to the main circuit and no
connections are between each other. The internal variables in different 






When YD and Ua are not in the same aubeircuit, ——2- becomes zero So we can 
-vauq .
•
separately calculate these nonzero entries in [—- j as follows:
Assume the i-th subcircuit is characterized by Eq.2.5, Since the elements 
of XJ- are calculated from the main circuit in last iteration, the Eq.2.5 can be 
written as ■
Yr) =0 \ (2.11)
Then another Newton algorithm loop (inner loOp) is needed to solve the Xj and 




AY: ~ -H( X , Y ) (2.12)
After this loop wq obtain the unknowns in the subcircuit X; , Yi , Moreover, 
from theEq.2,5, we have
W. + M. UK + M







1 an ' <
IX=X; Y=Y; dY . .. / ('J-**-, -‘M. . av
5Hj dllx anx

















dx. dX2 • • dXb
dH, dH, dH,
dYt dY2 v ' Wc
dll2 dll2
&YV dY2
^H„s dH„s ' ' ‘
dY,; dY2 • • • dYc
Where ns is the dimension of the subsystem, b is the number of inner variables 
in the subsystem, c is the number of inputs and outputs in the subsystem. 
Substituting the results X; , Y; from the inner loop into Eq.2.14, we obtain the
dY
values of which is just needed by Eq.2.9 in main loop [5]. Therefore for
each iteration of main loop, it needs to do a whole inner loop to solve the X; 
and Y;.
Let the N, n, and ns are all as defined in SLNA, the time complexity for
solving the main circuit and subcircuits are about — n3 and — ns3. Suppose a
3 3
two-level circuit demands p iterations in the main loop, and q iterations in the 
inner loop. The total time complexity in MLNA is about
P (-g: qN-ns3 + -^-n3) =■ (qNns3 + n3) (2.14)
In this approach, the inner loop is used to solve the subcircuits instead of 
solving some extra matrices and matrix multiplications, if q equals to 7, 
Eq.2.14 would be the same as Eq.2.3. Lin has proved that, if we do not apply 
the latency technique, the amount of computation in MLNA should be close to 
that in SLNA.
11
2.2 The Modified Newton Algorithm
The new algorithm, Modified Newton Algorithm (MNA) is proposed to 
improve the efficiencies of the SLNA and MLNA. This new algorithm reduces 
the calculation steps and number of iterations, and preserves all the advantages 
of the SLNA and MLNA. Sometimes, it even improve the convergency of the 
MLNA Moreover, we do not need to use all the nonlinear equations F and H;, 
and can avoid solving any Jacobian matrices like those in Eq.2.6 and Eq.2.13.
In the SLNA , we can apply the Newton algorithm at the element level. 
That is to establish a associated discreted equivalent circuit to simulate the 
nonlinear circuit in the j-th iteration, and use the linear nodal equation to solve 
it [6] (7]. In the MNA, we use a set of independent sources and controlled 
sources to simulate the subcircuits at each iteration. Then we include these 
sources in the main circuit and use linear equations to solve them. Figure 2.2 
shows the the flowchart of the MNA.
For ex ample j consider N nonlinear subcircuits each with c + 1 ports which 
are connected to a main circuit. The structure of the equivalent current 
sources for the subcircuit is as shown in Fig.2.3. There are c voltage controlled 
current sources G|(vs) which are functions of the port voltages vs, and c 
iterative independent current sources Jjj in each equivalent subcireuit. The 
superscript j means that they have the different values at different iterations. 
The main circuit and these equivalent subcircuits can be solved by the 
Associated Discrete Equivalent Gtrcwjf (ADEC) method.
Any current at any port must be a function of the voltages across all ports 
of a subcircuit as shown in Fig.2.4a. Where vs is the vector of voltages across 
all ports of tflis subcircuit, Ik is the current through the k-th port of the 
Subcircuit. When this port is considered as one branch of the main circuit as 
shown in Fig.2.4b, we can use the nodal equation to solve the main circuit. 
The Jk and Ek characterize the independent sources in the main circuit, vkis 
the branch voltages in the main Circuit, I is the branch current in the main 
circuit. Since the currents through all other devices in the main circuit are also 
the functions of the branch voltages. So they can be characterized by the same 
funH.ibn.'V^tVjs) as the subcircuits. For an example, if we replace a linear 
relistor iri the main circuit by the active devices in Fig.2.4b, the Jk, Ek will 
become zero, and the Gk becomes only a constant.
According to the Nodal Equations Method (NEM) [6], the main circuit with 






UPDATE MAIN NETWORK 
SOLVING MAIN CIRCUIT
. y - -
SUBSTITUTE TO SUBCIRCUITS
STOP
Figure 2.2 Flowchart of the MNA













Figure 2.3 Equivalent current sources in subcircuits
Subcircuit
Subcircuit
One Branch of Main Circuit
Figure 2.4 The equivalent circuit of subcircuit ports
12
A J = AI ^ Ag(vs) (2.15)
Since
Atvn = v = vs - E (2.16)
vs = A‘vn + E
Equation 2.15 can be written as
Ag(Atvn + E) - A J = 0 (2.17)
A is the reduced incidence matrix of the main circuit, g is a function of 
(AtVn + E), I is the branch current vector, J is the independent current 
sources vector, and E is the independent voltage sources vector.
J =
^1 > ^2 i
Jf > J2 >
E = E, ,EJ2 j E»f
Where the n is the number of the nodes in the main circuit, vn is the vector 
that indicates the voltages form all nodes in main circuit to the datum, vs is 
the voltage vector of the subcircuit ports and the main circuit devices. Using 
the Newton Raphson algorithm to solve Eq.2.17, We obtain the equations at the 
j-th iteration:
'v„ + E| ,•*
and
r j +1 ^ «.] .—





Ag(Aev^ + E) — AjJ 
A g(Atvi + E) - AJ J2.I8)
A4 vJ + E = vi
:(2.10)
Where is the voltage vector vn in the j-th iteration, and v j is the voltages 
across the subcircuits in j-th iteration, and IJ is the vector of currents through 
the subcircuits’ports in the j-th iteration. Let us define here:
, (2.20)A A J
V = V






=1 s . , ■
■ • (2-21)
bgd 8Gn ' ’ 8Gn
dvl dv2 : V =vj
Gk here are functions of vs. Then the Eq 2.18 can be written as
vj + i =Tn vl-I AYjAn^ [ AIj - 2 (2-22)
or ’■
I AYjA, i^r1 = A [ J - Ij + Y[Jv-ri.l
= A [ J- V + Yj( vj B)| (2.23)
Define:
jsj = Ij-YjvJ (2.24)
Then the Eq 2.24 becomes
[AYjA1 ]v j+i = n A[J-(P -Yjvi) “ Yj E ]
: A |l J .Jj) - Yj E (2.25)
Now, let us look back to the linear circuit. If we use the standard linear 
branch as shown in Fig.2.5 to replace the nonlinear branch in the same circuit 
as mentioned before. Then using the NEM to solve this linear circuit we have:
[AYbA‘]v„=A[J-YbE] (2.26)
Where the A is the reduced incidence matrix, Yb is the conductance matrix, vn 
is the nodal voltage vector, J is the current sources vector, E is the voltage 
sources vector. Here Eq.2.25 and Eq.2.26 have the similar structures. The only 
differences are that the conductance matrix Yb in Eq.3.4I is being replaced by 





Figure 2.5 A branch in a linear circuit
J ~ J/ . These mean we can use some equivalent conversions for solving 
nonlinear circuit per iteration. After the conversion, all subcircuits and 
nonlinearity in the main circuit will be replaced by the linearized discrete 
equivalent circuit, Then the standard NEM can be used to solve the linearized 
circuit. v
The discrete equivalent circuit for each branch is shown in Fig.2.6a , 
where is the entry of jj, GsJk is obtained from some entries in matrix Yj . 
Because the GsJk is the function of vector vs, it can not be characterized as a 
simple conductor but a current source controlled by the vector vg as in 
Fig;2.6a. The circuit in Fig.2.16a can be reconstructed as in Fig.2.6b. By 
comparing the Fig.2-.6b with the Fig.2.4b, we can known why we use two 
Current sources to replace each port of the subcircuits in MNA.
From the Eq.2.21 and Eq.2.24 we have





u Ml dv2 vs2
0G„ dGn &Gn ^sn
dvl m2 Mn
v = ysk-Ek 
vsk = vk + Ek
So the entries of YJ becomes
5Gj _ 0Gj <9vsk _ 5Gj 


























when Gp and vg(j are in different subcircuits. Therefore the parameters in 
different subcircuits can be calculated simultaneously. When there are many 
identical subcircliits in the main circuit, we can treat them as a vector and use 
a pipelined supercomputer to process them efficiently. The major difference 
between the MNA and the MLNA lies in the method of solving the subcircuits. 
When the values of Gj and Jj are calculated in the MNA, it does not use the 
differential equations Hj. The equivalent circuit is used to simulating the 
subcircuits. Instead of a inner loop in MLNA; the simple NEM is used directly 
for obtaining the Gj and jj.
For example, consider c + 1 ports subcircuit shown in Fig.2.7a. Using the 
ADEC method, the equivalent circuit is obtained in Fig.2.7b . Using the MNA 
method, we obtain the port currents in the j-th iteration F. The incremental 
conductance GsJk(vs) is calculated in two steps.
First we set all the independent sources in Fig.2.7b to be zero, The 
resulting circuit is shown in Fig.2.7c. The input voltage vj is applied to the 
subcircuit, and the other ports of the subcircuit are connected to datum as 
shown in Fig.2.8a. Solving it we can obtain the subcircuit incremental current
if 1,112, • U
Li
Then applying the input voltage v^ .to the subcircuit as shown in Fig.2.8b, we 
obtain another incremental current I-j.
H ’ ^2 > ‘ ‘He
After we apply the voltages from vsl to vsc to the subcircuit, the current 
vectors from l| to I| can be obtained accordingly.
Secondly we use these incremental currents to calculate the corresponding 





















With the results of F and Gj, the iterative current sources of equivalent circuit 
Jj cab be calculated by Eq.2.28, Applying the value of Gj and Jj in the main 
circuit and using the ADEC method, we then obtain the complete solution of 
the system in the j-th iteration.
2.3 Comparisons Of Three Newton Algorithms
In the MNA, the given circuit is partitioned in to one main circuit with 
many subcircuits, and the equivalent current sources are calculated in each 
iteration The dimension of the matrices used will be reduced after this 
partition, and these subcircuits can be treated as a vector, and be solved in 
parallel. So the pipelihed supercomputer can be used to process these 
Vectorized equations efficiently. The more subcircuits in the system, the higher 
speedup.-.can'be achieved in MNA.
Using a tearing technique for the SLNA, some extra calculations are 
needed to solve the subcircuits. If do not consider the sparse technique, the 
time complexity of each iteration in SLNA is much higher than that in MNA, 
In ML NA, for each iteration of main loop, the entire inner loop operations 
must be repeated to solve the subcircuits. So a large number of iterations will 
be. demanded. When we solve a 1-level system, the number of iterations may 
increase as an exponential function of 1. This may destroy the advantages of 
vector processing.
MNA has only one main loop. The number of iterations in the main loop 
is a constant which does not increase with the number of levels in the system. 
In each iteration of the MNA, only one LU decomposition and back- 
substitution are needed for solving the equivalent sources for each Subcircuit; 
This leads to potential speedup advantage over a vector processor.
CHAPTER 3
COMPUTATIONAL REQUIREMENTS
Two examples are used to illustrate computational steps in the MNA. We 
start with a multiple-port subcircuit which shows how to compute the 
equivalent sources of the subeircuits. Another example is used to show all 
calculation steps in the MNA. Finally, we discuss the complexity and 
convergence issues of the proposed MNA for computer aided circuit analysis.
3.1 Circuit Formulation Using The MNA
We have to find the equivalent circuits for all the subcircuits in each 
iteration of the main loop. An example is used below to formulate the 
equivalent circuit. Consider a four ports subcircuit in Fig.3.1 which is 
characterized by the following equations:
ij = 0.5 U? + U,
i2 = U24 + U2
i3 ^ Uf ■
; . i4 = 0.5 Uf
is = - Us
In the j-th iteration, we assume initial values are: vj = 1 , v| — -3'and 
Vij — 2. The associated discrete equivalent circuit of subcircuit in the j-th 
iteration is as shown in Fig.3.2 [7]. Using the ADEC method, we obtain the 
following formulation:
. dij ■■
gi = du7 = Ul + 1 = 2 ’ J< =ii-g<Ui =-°-5
4 = -Jf \=4U2 + 1 = 5 , J-j -i.-Kjl'.- 3
2
I > 0.5i.






= 3Uf = 12
dU3
g4 = U4 = 2 ,
g|=2U6-l = 1
, 4 = is “ g|U3 = “l6
Jl = i4-giU4=-2
n - % - giu5 = -i
Using the MNA method to solve this linearized circuit, we have the following 
system of equations: '
0 2 —2 0 -1 0 1 ' V1 0.5
-5 -2 19 0 0 -1 o v2 18.5
6 0 -5 -1 0 o -0.5 v3 -2
1 0 0 o 0 0 0 II 1
0 1 0 0 0 0 0 h 3
0 0 1 o ■o 0 o h
2 ’
-2 2 0 0 0 0 -1
h .
. 2 .






















We set all independent sources in the equivalent circuit to be zero, the circuit
in Fig.3 2 becomes that in Fig.3,3. Then we calculate the incremental 
■ dGj(vs)
'■■in; the following'steps:conductance d\sk
1 > 0.5i
Figure 3.3
Step 1: Let the y, = 1 , v2 - 0 , v3 = 0 , then using the MNA method to 
solve the circuit in Fig.3.3, we have to solve the following system:
0 2;:; -2 0 -1 o 1 0
-5 '■—2 19 0 0 o v2 o
6 0 -5 -1 0 0 -0.5 v3 0
1 6; 0 0 0 0 p II 1
0 1 0 0 0 0 o h 0
0 o 1 o 0 o 0
h
0














Step 2: Let v, = 0 , v2 = 3 , and v3 - 0 
obtain:
-5
Solving the circuit again, we






























= [7,-1 , -5 ]
ffy*.! 5G2(vs)
dvsl ’ ^vs2 ’ ^vs3
[ •2 • < •2 ]
»<yv,i 9G»K) ^G3(vs)
dvsl dvs2 5vs3
= “5 , -2 ,19
22
■3 7x1 + (-I)x3 + (“5)x2 3
%=I1-Gsj2vJ
The final equivalent circuit in the j-th iteration is shown in Fig.3.4. Then we 
use the ADEC method to solve the main circuit.
3.2 Computation Steps In The MNA
Another example circuit ( Fig.3.5 ) is used to illustrate the computations 
involved in the MNA. The associated discrete equivalent circuit for the 
subcircuit is shown in Fig.3.6a. The equivalent circuit for solving the main 
circuit in the j-th iteration is shown in Fig.3.6b.
Assume the initial guess is v ® — 7 , v ® = 1 , and v® — 0 . The 
following steps are needed in each iteration of the main loop:
Step Is Caiculate the G® and J®.
Using the ADEC method, we have
dUx Ui = v2°
2
dU2 JU2 = v3°
0
0
By the MNA method, we obtain the equation
22.a










Main circuit I Subcircuit
Figure 3.5 An example circuit
or
1+g? -1 ;; -1 v2 -n
-1 i+g2° o v3 ' —
1 0 0 . I _ v2°
3 -1 -11 v2 . 1
-i i o
1 o o







where I is the current at port 1°. Then we set all independent sources in 
Fig.3.6nto be zeroand keep theinpht voltage V^/we have
l+g? -i -i v-2 . 0
-1 l+gf 0 v3 . = 7 0




_ 1 j. 2 :
Here 1 is the incremental current. Since the subcircuit has only one input 




Figure 3.6 The equivalent circuit to Figure 3.5
G^ys = 2y,.:::.^
Js° = 1° - Gs° vs = 1 - 2x1 = -1
Step 2: Substituting Gs° and Js° into the main circuit as shown in Fig.3.6h, and 
using the MNA method , we obtain:
or
1 ' “I : .. 1 V1 0
1+GS° 0 v2 -Js°
l : 0 0 h T
1 -1 1 Vi 0
-1 -3'. - 0 v2 zr . 1
1 0 o h 7 .
This step yields the solutions:
Vj = 7.0 ... 
v2‘ = 2.666667 
ia1 = -4.333333
Step 3: Substituting the results from the main circuit into the subcircuit, we 
obtain:
(3-7)
l+g,° -1 1 v2 -j?
-1 l+g2° 0 v3 —; -J2°
1 0 0 . I
The corresponding results are:




Now, the first iteration of entire circuit is completed. The results 
v/ , v2, and vj1 will be used as the initial values to start the second iteration. 
The steps in second iteration will be similar to those in the first one. Detailed 
steps are skipped. Only the results after the second iteration are shown below:
Repeat step 1: Calculate the Gj and Jg1. Using the ADEC method, we have:
= 5.333333
Ul = Vl ..
= 5.333333
y2 = v}
J/ = ii " Si v/ =-7.111111 
H = *2 -£2*2 = -7.111111




0 v3 ■■ -j2v
.6 . 1
v2 = 2.666667 
v3 = 1.543860 
; I = 8.233918'
where I is the port current l1 of the subcircuit. Then we set the independent 
sources in Fig.3.6a to be zero, solving the circuit we have:
Hr,1 • 1 • 1 v2 0
-1 1+k2‘ 0 v3 ■ = 0











G.g\*Vg = 6.175439 v2 
Jg1 = 1* - G^ Vg1-= -8233918
Repeat step 2: Substituting Gg1 and Jg1 into the ma.in circuit and using the 
MNA method, we obtain:
' vf = 7.0 
vf = 2.1230645 
if = -4.876936
Repeat step 3: Substituting the results from the main circuit to the 
subcircuit, we obtain:
vf = 1.458028
The second iteration of MNA is then completed. The values vf , vf , and vf 
will be used to start the third iteration similarly. The results of the main 
circuit and the subcircuit in successive iterations are listed in TableA.l of 
Appendix A. In this example, it takes five iterations to Obtain the exact 
solutions:
v, = 7.0
v2 = 2.0 ■
V3 = 1-0' 
h - “5.0
The procedures described above correspond to one main loop in MNA. Instead 
multiple loops are required in the MNLA. This is the important difference 
between the two algorithms [2].
27
3.3 Complexity And Convergence Issues
From the above two examples for each e-port Subcircuit, the computations 
involved requires to solve c linear systems of equations characterized by 
Axjbj , A x2 - b2., • • • Axc = bc . Since they are described by the 
same coefficient matrix A, the L U decomposition method is used:
A = LU
Ly* = br
U x; ■= yr (3.8)
The matrix A is decomposed into a lower-triangular matrix L and a upper- 
triangular matrix U, Then the back-substitution is used to obtain the vector 
y. and the solution vector X;. In each iteration of the MNA, we need to 
perform one L U decomposition and c back-substitutions for a c-port 
subcircuit, we know that the time complexity for the submatrices LU 
decomposition is 0(ns3), but for back-substitution it is only 0(ns2). When the 
diinension of the submatrices A is large, the time complexity for solving c 
equations with the same coefficient matrix A is 0(ns3) + cO(ns2) = 0(ns3) 
that is the same time complexity for solving one equation.
Table 3.1 gives the time complexities of these algorithms. As a reference,- 
the time complexity of the Semi-Direct Method (SDM) is listed here [19]. The 
time complexity of each iteration in this method is about the same as in MNA, 
but this method has the linear convergency rate. Assume the circuit is 
partitioned to two levels. Tj, T2, T3 and T4 are time needed for solving the 
subcircuit in four algorithms, T is the time needed for solving the main circuit, 
arid S is the speedup of vector processing over scalar processing.
In the MNLA the entire inner loop is required in each iteration of the 
main loop, and one L U decomposition 'is-needed for each inner loop iteration. 
The comparison of computation steps in the MLNA and MNA are shown in 
Fig. 3.7. Figure 3.7a shows a main loop in the MNA and there are three major 
steps in each iteration. Figure 3.7b shows a two-level structure of the MLNA. 
In solving an 1-level system, the program should have 1 levels of looping. If 
there are p iterations in each loop, then the MNLA would need p1 iterations for 
solving a subcircuit in the 1-th level. But only p iterations are needed in the 
MNA. This is a significant improvement, when the system becomes large.
In general the SLNA has a quadratic convergence rate, 
the set of equations in the SLNA. Then the increment
iteration would be approximately the square of the increment in
et FtU) = 0 be 
AU in each 
the last
27,a .:




SLNA* •; MLNA 'i : \ MNA SDM“
SCALAR P(NT,+T) P(qNT2+T) P(NT3 + T) P(NT3 + T)
VECTOR P(NT, +T)/s P(qNT2+T)/s P(NT3 + T)/s P(NT4 + T)/s
* Assume the tearing technique is used
** This method has the different convergency rate with the other algorithms.
N: The number of subcircuits
n: The number of unknowns in the main circuit
ns: The number of unkowns in one subcircuit
s: Speedup of vector processor over scalar processor
T = — n* T, = — ns* + 2ns* = 7-ns*■'3 ■■ ' ' 1 3 . ■ 3 '
T2= J-ns* Ts = ins* T4=ins*
T!>T2 = T3 = T4
27.b








10 CONTINUE 10 CONTINUE
END END
'(*>)'.
Figure 3.7 Program structures in the MNA and MLNA
iteration. Let us demonstrate this by the same circuit shown in Fig.3.5. By 
the SLNA the discrete equivalent circuit is shown in Fig.3.8. Assuming the 
initial guess is the same as before, v® = 7 , v® — 1 , v® — 0 , we have:
dii
di Ui = v?
<jj2
dU 2 [U2 v2°
Jf -if - sfv? ' ' 
if = if-?fvf =0.
Using the MNA method we have the following set of equations:




0 ■'-■-IU 2 + g| 0 v3 -u?
1 0 0 0 .
i
Solving it, we obtain the results after the first iteration of the SLNA.
V,1 = 7.0 - ::
: V21 - 2.666667 .
V3 = 2.666667
i1 = -4.333333
Table A.2 in Appendix A gives the results after each iteration in the SLNA. 
The increments decrease after each iteration at a quadratic convergence rate.
Equations 2.24 and 2.25 show that the MNA has the same convergency rate as 
in the SLNA. This also can be found by comparing the TableA.l and 
TableA.2, They exactly have the same values corresponding to each iteration.
In a strict sense, the convergence of the MNA depends on the structure of 
the circuit. Suppose the circuit is characterized by a set of differential 
equations, F(U) = 0 . Let U* is the solution vector, and the J(U) is the
figure 3.8 The discrete equivalent circuit for SLNA
Jacobian matrix of F(U). When
. det J(U*) = 0 (3.10)
the MNA and SLNA both have linear convergence. However, in general; when 
Eq.3>10 is not true, the MNA and SLNA would have a quadratic convergence 
rate br faster [5] [6]. So we claim that the MNA has a quadratic convergence 
rate.
In the MLNA, the convergence of the main loop depends bn the precision 
of the results in the inner loop. The higher is the precision in the inner loop, 
the higher will be the convergency in the main loop [l|. In general the inner 
loop termination criterion is chosen as ;
AX , AY < min { r° AU , Aw (3.11)
Where AX and AY are the increments of the inner loops, AU and Ao; are the 
increments of the main loop, r° is the initial termination criterion of the inner
loop. It has been proved that for a= 2, the MLNA has the quadratic 
convergence or faster [1]: [5]. However this means the precision of the inner 
loop should be very high. As an ^xainple, withr an increment of the main loop
in the j-th iteration AU , Aw = 0.0001, the precision at the inner loop 
should be equal to or less than 0.00000001, which demands many inner 
iterations to satisfy this criterion. If one wants to improve the convergence of 
the main loop in the MLNA, the number of iterations in the inner loop Would
be increased, and the total time complexity of both main and inner loops may 
not be reduced.
If pr.< 1, inner loop can then use the same termination criterion as used in 
the main loop. The convergence rate of MLNA is neither quadratic nor linear, 
but called ’’Pairwise quadratic convergence” as proved in [5]. This convergence 
is shown in Fig.3.9a. It displays some ’’kinks” which mean that the curve 
alternates between slow-decreasing and fast-decreasing intervals, even when the 
number of iterations j become very large. Figure 3.9b shows the curve of 
quadratic convergency of MNA. It converges faster than the MLNA when 
ct < 1 . The comparison of the convergence rates of four algorithms is shown 
in Table 3.2.
(a) Quadratic convergence
(b) Pairwise quadratic convergence
Figure 3.9 Convergence rates
Table 3.2. Convergence Rates










detJ(U*)*0 (Lin, et al.)
CHAPTER 4
VECTORIZED SIMULATION PROGRAMS
This chapter presents the major vectorized programs used in circuit 
simulation. Section 4.1 describes the subnetwork update program The 
program for LU decomposition to solve the subcircuits is explained in section 
4.2. Section 4.3 illustrates the program for the main network update. The 
programs for solving the main circuit are described in section 4.4.
4.1 Subnetwork UpdatePrograms
As mentioned before, in MNA, the circuit is partitioned to a. main circuit 
and some subeircuits. Hence, the dimensions of the matrices in the equation 
can be reduced. Moreover, if there are a lot of identical subcircuits in the 
circuit, they can be treated as the elements of a vector. For example, assume 
there are N subcircuits which have the same structure, and there are two 
parameters p; and q; in i-th subcircuit. Since all the subcircuits have the same 
structure, the vector P = [pj , p2 , • • • pN] and Q = [qt , q2, • • • qN] can be 
used to represent the parameters in all the subcircuits. If we need to add p; 
and qj up , we can use the vector pipeline of the supercomputer Cyber-205 to 
do the vector addition P+Q for all the subcircuits. It can obtain a higher 
speedup than using the scalar processor to add them individually.
From the example in the last chapter, we know that the associated 
discrete equivalent circuits of the nonlinear resistors as shown in Fig.3.2 are 
needed. And these values are put into the equations before solving the 
equivalent current sources of subcircuits. We call these procedure subcircuit 
update.
Assume the current of a nonlinear resistor is the function of the voltage 
across it,
i = f(v) (4.1)
Then the incremental conductance is calculated by
Mil
dv
The iterative current is calculated by
JJ — jJ — g)yJ
(4-2)
(4.3)
The f(v) can be any kind of function of the voltage v’s, which may assume a 
very complicated form. So, for calculating easily on the computer, we use the
Taylor expansions of functions f(v) and The more items of Taylor
expansion are taken * the more precision will be obtained. We use the first 10 
items of the expansion in our simulation program. The coefficients of the items 
of the functions will be stored in a two-dimensional array called PO. Each 
column of PO corresponds to one type of nonlinear resistor. The coefficients of
Mil are stored in the upper half of the columns in PO, the coefficients for
dv v/'
f(v) are stored in the lower half positions. As an example, there is a PO whose 







It means that the the first 10 items of the Taylor expansion of
first type of nonlinear resistors ate




f(v) = anvfl + a12v8 + , • • • a^o
Generally, there are more than one types of nonlinear resistors in the circuit, 
and the k-th column of PO corresponds to the k-th type of nonlinear resistor. 
A two-dimensional array called DS is used to store the pointer for each 
nonlinear resisitor in the subcircuits. Each row of DS corresponds to one
32
nonlinear resistor. The first and second entries in a row are the numbers of 
nodes to which the corresponding nonlinear resistor is connected. The voltage 
across the resistor can be obtained from these two nodes. The third entry of 
the row indicates which column of PO corresponds to this nonlinear resistor. If 
a number i is in the third position of one row of DS, it means the 
corresponding coefficients of the nonlinear resistor is stored in i-th column of 
PO. The last entry of DS is either 1 or 0. 0 means that the resistor is 
connected to the datum of the subcircuit, and 1 means that the resistor is not 
connected to the datum.
Let us look at an example shown in Fig.4.1a. The first row of DS 
corresponds to the nonlinear resistor shown in Fig.'4. lb, which is connected 
from node 3 to node 5, and not connected to the datum. The corresponding 
coefficients of its function are stored in the second column of PO. The second 
row of DS means that the nonlinear resistor in the subcircuit shown in Fig.4.1c 
is connected from node 4 to node 7 that is the datum of the subcircuit. The
corresponding coefficients of its function are stored in the third column of PO.
All the nonzero elements of the matrix A for solving the subcircuit are 
stored in array NZS, and the'.elements'of-the right vector b are stored in array 
brs. The values of gJ and JJ need to be inserted into the arrays NZS and brS. 
Therefore, two bit mask arrays are used to locate gj and Tin the NZS and brS.
In the simulating program, we assume all the subcircuits have the same 
structure, and each subcircuit has only two ports. Furthermore, we assume the 
symbolic processing and row exchanges, column exchanges for all matrices have 
been done before the simulation. Two integer arrays NZPSc and brSc are used 
to indicate the order exchanges of gj and T in NZS and brS. Another integer 
array CIS indicates the column exchanges of matrix A for subcircuits. The 
program for calculating the gJ and T in subcircuits is as follows:
1) Obtain the voltages accross the nonlinear resistor
P-0 ■■
DO I0i=l, nls 
IF (DS(i 4) EQ.0) THEN 




2) Calculate the polynomial
g(l;y)=0,0
32.a
3 5 2 1
4 7 3 0
DS =
6 5 67
(a) Array DS (b) Resistor used
Figure 4.1 The array DS and resistors used






; 30 CONTINUE .
3) Obtain the J and G ,
-g(l;y)*V(l;y)-I(l;y)
' P=P + 1 
G(i,p;y)=g(i;y)




Where nls is the number of nonlinear resistors in each subcircuit, the y in 
program is the number of identical subcircuits in the circuit. All the 
subcircuits will be treated as a vector and be solved by the vector pipelines of 
Cyber-205.
The vector XS is used for storing all variables of the subcircuits, G is a 
two-dimensional array for storing all g in NZS. Changing order of g and J and 
inserting them into NZS and brs respectively can be achieved by the following 
operations:
p—0




. ENDIF . / .
40 CONTINUE "
p=0 ■ .







Where ms is the number of nonzero elements in NZS, ns is the dimension of 
the matrix A for the subcircuits. If only the scalar processor of Cyber-205 is 
used, the corresponding program to do the same operations as the above one 
will be as follows:
DO 10 z=l, y 
P-0







DO 30j- l, 10 
30 CONTINUE:^
. I—0.V-i.;










DO 50 i=l, ms 
IF (BTOL(NZPS(i))) THEN




50 ' ‘ CONTINUE ■
: • . ■■y p-o.■■ :. V"■ y :" y- V.; ’ y:;.:y . ;’:v;/'yy; y JC'--yV.y':
DO 60 i = l, ns
^ THEN ■ ■y"-;;'':
■ P~P + 1 '
brs(z,i)-J(p)
ENDIF
60 CONTINUE - '
10 CONTINUE
From the program above we can see, the program of scalar version needs an 
additional loop to replace a set of vector instructions in the vector version. So 
it is riot efficient.
Finally, to update the subcircuits, the port voltages of the subcircuits 
which come from the main circuit, should be inserted into array brS. Because 
each subcircuit has only two ports in our simulation, an integer vector bpV is 
used to locate these port voltages in the equation of main network, and a 
variable bpp is used to iridicate their positions in the equations of each 
subnetwork.
4.2 Sub circuit Decomposition Programs
From the example in chapter three, we can see, that the equivalent 
independent current sources and controlled current sources of the subcircuits 
should be calculated after updating the subnetworks. Then these values are 
put into the equations of main circuit in order to solve it in each iteration. In 
MNA, it needs to do only one LU decomposition for each subcircuit instead of 
a loop where one LU decomposition is done for each iteration in MLNA 
algorithm for each main iteration. Since it may need to do more than one time 
of substitutions for one LU decomposition, the operations to do these twTo 
things are written as a separate subroutine.
LU decomposition method is used to solve the equation Ax = b. The 


















The program to calculate the k-th column of L is as follows:
T) Initialize
DO 10 i=k, ns .v:
TF (BTOL(BS(i,k))) THEN/ V-:
; \uum(l;y)~0.0 ■ 
ql=RS(i)-I
2) Calculate the sum of the l;p-upj
///-//.'-DO 20p=l,k-l
IF (BTOL(BS(i,pj)) THEN
ql=ql + l —/'-"\





3) Obtain the k-th column of L
ql=ql + l
; - endif
Where BS is a two-dimensional bit mask array for the subcircuits, RS is an 
integer vector to Ideate the first nonzoero element of each row of A for the 
subcircuits.
Since the first column of does not need any computation which can be 
obtained directly from the matrix A, We can calculate the columns of L only 
from 2 to ns. Similarly, the last row of U does not need calculating either. 
The program for calculating the k-th row of U’is as follows:
37
1) Initialize
IF (k NE.ns) THEN .
DO30 j=lc + l, ns 




2) Calculate the suni qf lip-iipj
DO 40 p = 1, k~l .
IF (BTOL(BS(k,p))) THEN
ql^ql + l ..V ; ... •, . ^V>rV':
IF (BTOL(BS(p,j))) THEN
,; qu=RS(p) + Q8SCNT(BS(p,l;j)))-1 ; -V^v
sum(l;y)=NZS(l,ql;y)*NZS(l,qu;y)*surn(l;y)
ENDIF r : -'v^'
:. 40 ',' CONTINUE ■■ ' : ; / ■
3) Obtain the k-th row of U




30 CONTINUE-' ... : \ ;;
The Q8SCNT is one of the intrinsic functions of the CYBER 200 FORTRAN 
which is available on Cyber-205. Appendix B will give the illustration in 
detail of these intrinsic functions used in our simulation. Since the 
corresponding program of scalar version for simulation is too long, we are not 
going to present it here but put it in Appendix D.
When solving the linear equations by LU decomposition, we need to solve 
the equtions as in Eq.3.8. The elements of vector x and y can be obtained by 
the following formulas:
k-i
(V- E *k,pXp) (t.6)
^ P = 1
38
xk - yk~ E uk,Pxp (4-7)
p —k + 1
The program for solving k-th element of y is as follows:
1) Calculate the sum of lk p yp
sum(l;y)=0.0
dd-RS(K)-l 
DO10j = l, k-1 





Where YS in the program is the temporary vector y.
2) Obtain the k-th element of y
dd=dd+l..
YS(l,k;y)=(brS(l,k;y)-sum(l;y))/NZS(l,dd;y)
The program for solving k-th element of x of subcircuits is as follows:










2) Obtain the k-th element of x
XS(l,I;y) =YS(l,l;y)-sum(l;y)
From the XS we can obtain the values of the currents of the subcircuits. Then 
the equivalent current sources are calculated by Eq.2.27 , Eq.2.28 and Eq.2.29. 
This part of program is simple and we put it in Appendix C. The results of 
equivalent sources for subcircuits are stored in array EG and EJ.
4.3 Main Network Update programs
After the values of the equivalent sources ;arefound, there are two 
operations should be done. First the discrete equivalent sources for nonlinear 
elements in main circuit should be calculated. Second the equivalent sources of 
the subcircuits should be inserted into the main circuit, and then the main 
circuit is solved with the NEM method. We call these operations the main 
network update, which is similar to subnetwork update except that there is 
only one main Circuit in main network update while there are y subcircuits in 
subnetwork update. It is difficult to optimize the vectorized program especially 
when the main circuit is small.
In simulation program the nonzero elements of A for main circuit are 
stored in the array NZ, the vector b on the right side of the equation Ax = b 
is stored in the array br. The integer array D has the same structure as DS in 
subnetwork update. gJ and JJ contain the values of discrete equivalent sources 
in the main circuit, the integer arrays NZPc and bpe are used to indicate the 
position exchanges of them in NZ and br. The bit arrays NZP and brp are 
used to locate the positions of gJ and JJ. The program for calculating gJ and JJ 
is as follows:
1) Find the voltages accross the nonlinear elements
p=0
DO 10 i=l, nl





2) Calculate the polynomial of nonlinear elements
g=Q8VPOLY(v,PO(l,D(i,3);10);g)
1=Q8VPOLY(v,PO(l l,Dg3); 10)-1)
3) Obtain J and G
p=p + l ■
'G(p)-g. '
IF (D(i,4),EQ.O) GOTO 10 
' P=P + 1 •
40
Where nl is the number of nonlinear elements in main circuit. Q8VPOLY is 
one of the intrinsic functions of CYBER 200 FORTRAN for calculating the 
polynomial Since the length of the polynomial is short in the simulation, the 
Speedup for thisfunction is poor. To optimize the vectorized program, we add 
two temporary arrays NZo and bro in the program. The program for inserting 
■ gt and into NZ and br is as follows:
1) Update the NZ
NZo(l;p) =Q8SCATR(G(l;p),NZPc(l;p);NZo(l;p))
G(l;p)—NZo(l;p)
NZ( l;m)=NZ( 1 ;m)+Q8VXPND(G(l;p),NZP(!;m);NZo( l;m)j




Where n is the dimension of A for the main circuit. Q8VSCATR and 
Q8VNPNI) are intrinsic functions which are illustrated in Appendix B.
The bit arrays NZQ and brq are used to locate the values of the 
equivalent sources of the subcirCuits in NZ and br. The integer arrays NZQc 
^v':and;;hqc;;>fe.rjsed' :to 'indicate the exchanges of these values in NZ and br. The 
bit array E is used to show whether the subcircuit is connected to datum of the 
main circuit. A tempora,ry array EO is used to optimize the vectorization. 
Inserting of the equivalent sources can be done by the following operations:
1) Calculate the EO
P=0 ' -
EO(p)=EG(i)












Where m is the number of nonzero elements in NZ, p is a counter here.
4.4 Programs For Solving The Main Circuit
Since only one equation Ax = b is required for solving main circuit, it is 
difficult to vectorize the program. The temporary arrays sum, tu, tl, and NZT 
are used here to optimize the program. By Eq.4.6, yk can be calculated after 
the k-th column of L is obtained. So we compute the L, U, and Y by one 
DO-loop for saving the CPU time. This part of program is as follows:
1) Generate the vector tu
tu(l;k-l)=0j0 
DO 10 i=l, k-1 
IF (BTOL(B(i,k))) THEN 




2) Calculate the sum of lj p-Up j for L
DO 20 i=k, n
IF (BTOL(B(i,k))) THEN
ql=R(i)
tl( 1; k-1) =- Q8VXPND(NZ( ql;k-1 ),B(i, 1 ;k-1); tl(l.;k-1)) 
sum( 1)—Q8SDOT(tl( l;k-l),tu( 1 ;k-1))
3) Obtain the L
ql-Q8SCNT(B(i,l;k-l)) + ql 
NZ(ql) ==NZ(ql) suro(l)
ENDIF
■' 20 , CONTINUE '




5) Obtain the y
Y(k)?(br(k)-sum(l))/NZ(dd)
6) Calculate the sum lj,p-up j for U
. IF (k.NE.n) THEN \ 
ql-R(k)-l
sum(l;dd)=0.0 
DO 30 j=i, k-i 
IF (BTOL(B(k,j))) THEN 
ql=ql + l 
qU=R^
WHERE (B(k,k d? l;dd)) sum(l;dd) =sum(l;dd)+tu(l;dd)*NZ(ql) 
ENDIF
30 vcontinue;;/'.'
7) Obtain the U
hqu=Q8SCNT(B^ 
qu— ql+1
^ZT(l;nqu)^Q8VCMPRS(sum(l;dd),B(k,k + l;dd);NZT(l;nqu)) 
NZ(qu + l;nqu)^(NZ(qu:■+ l;nqu)-NZT(l;nqu))/NZ(qu)
' ENDIF-






The key parts of the simulation programs have been described in this chapter. 
Two detailed versions of the programs are given in Appendices C and D: one
written in
CHAPTER 5
We use large-scale circuit example to obtain the simulation results on the 
Cyber-205. Various, programs for different purposes are used in the simulation 
experiments. Section 5.1 presents the example circuit and initial conditions. 
Section 5.2 presents the results for different program versions, and analyzes the 
speedup performance of the MNA. The implications and further improvements 
are then elaborated. ; >
5.1
An example circuit is used for the circuit analysis simulation. The main 
circuit with the equivalent current sources of the subcircuits is shown in 
Fig.5.1. N is the number of the subcircuits which are connected to the main 
circuit. In our simulation program, N can vary from 1 to 1000. All the 
subcircuits have the same structure shown in Fig.5.2, and they are connected 
together in parallel, which means that they have the same parameters. This is 
for simplifying the input and output procedure such that we only need to input 
and output the variables in one of the subeircuits. Assume the initial guess for 
nodal voltages in main circuit is
L6II0_r 
' 
> v® = 14.3
v| = 6.0 : v4° = 5,0
= 0.0
The initial guess for nodal voltages in subcircuits is
vj° = 15.0 v2° = 5.0
v3° = 7.0 \ v| = 4.0























Figure 5.2 One subcircuit
44
subcircuits. Since the variables in each subcircuits are the same, the results for 
only one of the subcircuits are listed here. All subcircuits are connected in 
parallel. Table A4 lists the variables in one of the subcircuits with different 
number of subcifcuits in main circuit.
In Table A5, the average CPU time in each calculation step is listed, 
where the time unit used is second. The input time and the output time are 
not included in the table. Since the GPU time for each calculation step 
includes some operating system overhead time, doing the same operation may 
take different time. So we use the average values in the tables. There are 
several program versions in our simulation experiments. Table A5.a lists the 
GPU time where the vector pipelines of Cyber-205 are used. Table A5.b lists 
the GPU time where only the scalar processor of Cyber-205 is used. The 
number of subcircuits N used in vector version simulations takes value 1, 10, 
50, 100, 200, 500, and 1000. The same values for N are used in the scalar 
Version except 1000 because the CPU time of the Cyber-205 is expensive. 
Table A5.a and Table A5 b show that the performance of updating and solving 
the subcircuits in the vector Version is even Worse than that in the scalar 
Version when the Circuit contains Only one subcircuit. This is because the 
number of the subcircuits is too small to utilize the pipeline. Table A5.C lists 
the GPU time for using the scalar processor of Cyber-205 and without any bit 
processing instructions used. As a reference, Table A5.d lists the GPU time for 
using the VAX 11/780 machine. Due to limitation by the memory space to 
user, the number of subcircuits in Table A5.C and Table A5.d is only up to 200. 
The average speedup of the Vectorized program versus the program of the 
scalar version is shown by some curves. The curves of speedup for each 
calculation step are shown in from Fig.5.3 to Fig.5;7 . The speedup increases 
rapidly when N increases, Figure 5.3 shows that the speedup of the subcircuit 
update is about 100 when N equals to 500, Figure 5.5 shows that the speedup 
for Solving the subcircuits is about 10. Since there is only one main network, it 
is diffichlt to optimize the vectorized program. Figure 5.4 shows the speedup of 
main networkupdate is about 10 when N is 500.
The dimension of the matrix for solving the main circuit is much larger 
than that for the subcircuits, so solving main circuit takes most of the total 
GPU time. To Optimize this part of program is very important even though it 
may be difficult, We add some temporary vectors to improve the efficiency of 
the program, which will require some extra memory space, but it can save the 
GPU time. Figure 5.6 shows the speedup for solving main circuit is more than 
100; This is significant. Figure 5.7 shows the speedup for back-substitution is 
about 20. Figure 5 9 shows the overall speedup is about 100. From the curves
44.a
50 100 200 500 1000 N
Figure 5.3 Speedup in subnetworks updates
44.b
50 100 200 500 1000 N
Figure 5.4 Speedup in main networks updates
44.c
50 100 200 500 1000 N
Figure 5.5 Speedup in solving subcircuits
50 100 200 500 1000 N
Figure 5.6 Speedup in solving the main circuit
50 100 200 500 1000 N




Figure 5.8 CPU times for scalar and vector programs
50 100 200 500 1000 N





I ■■ ~l■ ■ I ■ <" I • I_, In.,..---- 1----- ---
1 10 50 100 200 500 1000 N
Figure 5 10 The CPU times needed in various simulations
we can see, the speedup increases rapidly at the beginning, and slows down 
when N becomes very large. This means it may tend to a constant. From 
Figure 5.8 we can see, when the subcircuit number N is very large the GPU 
time fpr vector version program is approximately 6.8xlO-3 N2, and for scale 
version is about 3.16xlO-3N3 . Figure 5.10 gives the curves of the CPU time 
for different versions of the program. As a reference, the GPU time for VAX 
11/780 is shown on the top of the figure.
5.3 Implications And Further Improvements
From the results of simulation we can see, when the vector pipelines of 
Cyber-205 is used to solve the large-scale circuit by MNA, a high speedup can 
be achieved. Since the setup time fpr the pipelines in Cyber-205 is long, when 
the number of subcircuits is smaller, the speedup is poor. Figure 5.9 shows 
that the increase of the speedup will slow down when N becomes very large, 
which means the speedup might tend toward a constant with a very large 
system. Because the CPU time of Cyber-205 is expensive, N takes limited 
values up to only 500 in the curves, and the corresponding overall speedup is 
more than 100. If more subcircuits were in the main circuit, the speedup might 
be higher than this. The vectorization in each calculation step is different, and 
the speedups for these steps are different too. In Fig.5.8 the total CPU time 
includes the time for calculation, data access and system overhead, such as 
page fault handling etc., the input-output time is excluded here. For the 
example circuits, the total time complexity for vector version is 0(N2), and for 
scalar version it is 0(N3).
A considerable amount of time has been used for page fault handling in 
our vectorized simulation. Table A6 shows that the execution time for scalar 
code is 340 seconds when N is 500, and the time for page fault handling is 12 
seconds. For vector code, the time is reduced to 4 seconds, but the time for 
page fault handling does not change. This means that reducing page faults 
becomes very important in the vectorized MNA.
CHAPTER 6 
CONCLUSIONS AND SUGGESTIONS
Our research findings are summarized below. Several suggestions are 
made for those who wish to conduct further studies on vectorized circuit 
analysis using supercomputers.
6.1 Concluding Remarks
A new Newton algorithm has been proposed to perfrom circuit analysis on 
vector Computers. The subcircuits are treated as the elements of a vector and 
are processed by a vector pipeline. Higher efficiency can be achieved over 
existing algorithms. We revealed the Speedup advantages of the new algorithm 
The symbolic processing and row column exchanges in MNA are exactly the 
same as in the SLNA and MLNA. The major advantages of the MNA are 
summarized beloW:
1) A large circuit is partitioned into multiple levels. There is only one main 
loop in the program. The number of iterations in the main loop is a constant 
and does not increase with the number of levels. The complexity in each 
iteration increases linearly With respect to the number of levels in the circuit.
2) The main loop of the MNA has a quadratic convergence rate in most cases. 
This is important for the overall speedup of the circuit simulation program, 
especially when the circuits are very large. This convergence rate is faster than
the pairwise quadratic convergence of MLNA reported in Lin [5]
3) The nonlinear equations for the main circuit and the subcircuits are not 
needed in the MNA. Only the parameters of circuit elements are used. This 
makes the input of the data of a large circuit much easier. The calculations for 
all the Jacobian matrices can be avoided in MNA which saves GPU time.
4) In out simulation experiments, the speedup is approximately 100, when 500 
subcircuits are in the main circuit. The modified bit matrix structure is 
attractive for large-scale circuit simulation. Only the nonzero elements and bit 
mask matrix have to be stored in memory. This results in higher efficiency on 
the Cyber-205 supercomputer.
47 • :
5) The vector pipelines of Cyber-205 have a long setup time. If the circuit 
contains a few subcircuits, the speedup would be poor. When the circuit 
contains many identical subcircuits, a significant speedup can be expected. 
The larger is the number of subcircuits in the circuit, the higher will be the 
speedup.
6.2 Suggestions For Further Research
From the results of our simulation experiments, two suggestions are made 
for continued studies:
1) Since the vectorized programs are written in CYBER 200 FORTRAN, they 
have to be translated to machine language by the compiler of Cyber-205. 
Therefore, the overall CPU time needed depends on the efficiency of the 
compiler. For example, page fault handling will demand CPU time. If the 
page size and page allocation are reasonable, the page fault occurrences may be 
reduced. Thus the CPU time will be also reduced. This problem can be 
alleviated by writing the program in assemble language. This requires us to 
know the machine architecture and operating system in more detail.
2) :The data input/output demands long time delays. Since the Cyber-205 
CPU time is expensive, it may be more advantageous to use a small scalar 
computer to perform - the input/output functions. The circuits in our 
'simulations are partitioned into only two levels. Since the number of iterations 
in the MNA is a constant, higher benefit may be obtained if more levels are 
used. The latency technique is not considered either. Latency may further 
improve the efficiency of the MNA on a vector supercomputer [20-23].
LIST OF REFERENCES
[1] N. B. G. Rabbat, A. L. Sangiovanni-Vincentelli, and H.Y. Hsiech, “A 
Multilevel Newton Algorithm and Latency for the Analysis of Large-Scale 
Nonlinear Circuit in the Time Domain,” IEEE Trans, on Circuits and 
Systems Vol CAS-26, pp. 733-741, Sept. 1979.
[2] J. L. Algiere, and K. Hwang,“Sparse Matrix Techniques for Largr-Ssale 
Circuit Analysis on the Cyber 205 Vector Processor,” School of Electrical 
Engineering, Purdue University, TR-EE 83-40, October 1983.
[3] Chung-Wen Ho, Albert E. Ruehli, and A. Brennan,“The Modified Nodal 
Approach to Network Analysis,” IEEE Trans, on Circuit and Systems, 
Vol. CAS-22, pp. 504-509, June. 1975.
[4] Kai Hwang, “Vectorization Methods for Larg-Scale Circuit Analysis on 
Vector Supercomputers,” School of Electrical Engineering, Purdue 
University, TR-EE 83-1, Jan. 1983,
[5] P. M. Lin, and A. W. Nordsiech, “A Study of the Convergence Rate of a 
Multilevel Newton Algorithm,” School of Electrical Engineering, Purdue 
Univesity, TR-EE 82-24, November. 1982.
[6] L. O. Chua, and P. M. Lin, Computer-Aided Analysis of Electronic 
Circuit: Algorithms and Techniques, Prentice-Hall, 1975.
|7| W. J. Mccalla, and D. O. Pederson, “Elements of Computer-Aided Circuit 
Analysis,” IEEE Trans, on Circuit Theory, Vol. CT-18, pp. 14-26,
. ,Jan-1971.
[8] L. M. Ni and Kai Hwang, “Vector Reduction Techniques for Arithmetic 
Pipelines,” IEEE Trans, on Computers, March 1985.
/ 49 '
[9] Control Data Corporation,CDC CYBER 200 FORTRAN VERSION 2 
Reference Mannal, Oct. 1982.
[10] Kai Hwang, and F A. Biggs, Computer Architecture and Parallel 
Processing, McGrW-Hill Book Co., New York 1984.
[11] T. Putnam, B. Whitson, and D. Seaman ‘Introduction to the CYBER 
205,” PUCC User Services, Purdue Univesity.
[12] Ibrahim N. Hajj, Ping Yang, and Timplthy N. Trich, “Avoiding zero 
pivots in the Modified Nodal Approach,” IEEE Trans, on Circuits and
Vol.CAS-28, pp. 271-279, April, 1978.
[13] Robert D. fiery,“An Optimal Ordering of Electronic Circuit Equations for 
ai Sparse Matrix Solution,” /EEE Trans, on Circuit Theory, Vol.C7-18, pp. 
40-50, Jan. 1971.
[14] A. W. Nqrdsiech, and P. L. Lin, “MULNA:A Program for a Multilevel 
Newton Algorithm,” School of Electrical Engineering, Purdue University, 
TR EE 82-23, August. 1982.
[15] William T. Weeks et at , “Algorithm for ASTAP-A Network Analysis 
Program,” IEEE Trans, on Circuit Theory, \o\.CT-20, pp. 628-634, Nov.
1973.“ v
[16] R. K. Brayton, F. G. Gustarson, and G. D. Hachtel, “A New Efficient 
Algorithm for Solving uDifferehtial Algebraic Systems Using Implicit 
Backward Differential Formulas,” Proc. IEEE, Vol. 60, pp. 98-108, Jan.
::i 1972-
[17] F. F. Wu, “Solution of Large Scale Networks by Tearing,” IEEE Trans, 
on Circuits and Systems, Vol. Cas-23, No. 12, pp. 706-713, December 1976.
[18] A. Sangiovanni-VmcentelIi, L. K. Chen, and L. O. Chua, “A New Tearing 
Approach - Node Tearing Nodal Analysis,” Proc. IEEE Int. Symp. on 
Circuits and Systems, pp. 143-147, April 1977.
[19] F. Odeh and D. Zein, “A Semi-Direct iiiethod for Modular Circuits,” Proc. 
1983 Inti. Symp. on Circuits and Systems, pp. 226-229, May 1983.
[20] K. Hwang, “Multiprocessor Supercomputers and Scientific Applications,” 
to appear IEEE Computer Magazine, July 1985.
[21] K. Hwang and Y. H. Cheng, “Partitioned Matrix Algorithms for VLSI 
Arithmetic Systems,”IEEE Trans. Computers, Oct. 1982.
[22] K. Hwang (Editor), Supercomputers: Design and Applications, IEEE
Computer Society Press, August, 1984.
[23] M. M. Hassouri and P. M. Lin, “A Study of a Semi-Direct Method for 
Computer Analysis of Large Scale Circuits,” School of Electrical 
Engineering, Purdue University, TR-EE 84^46, December 1984. v
APPENXIX A: INPUT DATA SETS AND SIMULATION RESULTS
The main circuit and subcircuits are all characterized by the linear 
equations as the form Ax = b in MNA. The matrix A for the subcircuits is 
shown |n Fig.A(a), Since the symbolic processing and the row column 
exchange do not be included in our simulations, these processing should be 
dome before the data input. After these processing, the A becomes that in 
Fig.A(b). The symbol x in Fig.A(b) indicate that there is a fill-in element. 
The bit mask matrix for subcircuit BS is as shown in Fig.A(c). In Fig.A(d) are 
the vector b for subcircuits, the b after row exchange, and the bit array bpS, 
the integer array RS indicating the first nonzero element in each row of A, 
The matrix A for main circuit is shown in Fig.A(e). Its dimension can be 
changed up to 1000. After symbolic processing and row column exchange, A 
becomes that in Fig.A(f). Figure A(g) show's the bit mask matrix for main 
circuit B. The vectors b before and after row change, the vectors bg and bp, 
and the integer array C for the main circuit are shown in Fig,A(h).
Table Al and A2 list the simulation results of MNA and SLNA as 
reported in Chapter 3. The large scale circuit simulation results on Cyher-205 
are given in Table A3 to A6.
Table A6. Times Needed for User, System and Page Faults Handling
Numberof
Subcircuits
User CPU Time*(s) System CPU Time(s) Net Page Faults(s)
Scalar Vector Scalar Vector Scalar Vector
:-r' 1.9 1.9 1.0 1.0 11.2 11.9
10 2.0 1.9 1.0 1.0 11.3 11.9
50 2.5 1.9 1.0 l.o 11.3 11.9
100 5.5 2.0 1.0 1.0 11.3 11.9
200 26.0 2.4 1.0 1.0 11.3 11.9
500 345.4 4.6 1.0 1.0 11.4 12.0
1000 12.0 "■ '. 1.0 12.3























R2 R4 R2 R3 R
1 0 0 0 0
1 -1 0 0 0
d 0 -gf 0 0
0 0 g l -gi 0
0 0 0 C,|o 0
1 1 1 p 0
-1 0 0 0 0
0 0-1 0 1-p 0
0 -a 0 -1 1
0 0 0 0 0
0 0 0 0 0
0 0000 
0 -1 0 0 0
0 0 0 -1 0






































































R2 R3 R'-4. 
-1 
r2







































































0 + —u e3 r4
o Y o
0 Y Y 0
-gj 0 '.,:;
' o :
1 0 0 0
0 1-a 1-0 1
0 -1 0 0
0 a -1 0
0 0 0 0
0 0 0 0
0 -10 0 































0 0 0 0 0
b 0 0 0 0
0 0 -1 0 0
-gy 0 0 -1 0
0 I X 0 0











1 0 0 0 0 0 0 0
0 1 0 0 0 0 o o
0 1 1 0 0 10 0
0 1 0 1 0 0 1 0
1 0 1 0 1 1 0 0
1 1 1 0 0 110
0 1 0 1 0 11 0
0 1 1 1 0 1 1 1
(e)
0 Ex 0 1
0 e2 o 2
0 -H 1 3
0 -jj 1 6
Ex 6 o 9
e2 o 0 13
-H o 0 18




o i o o 
o ... o i o
0 ... o 1 1
0 ... 0 10
0 ... 0 0 1
0 ... 0 0 1
0 ... 0 1 0
0 ... 0 10
0 ... 0 0 0
1 ... 1 0 0
0 10 0 0
0 10 0 0 
0 0 0 0 0 
0 0 0 0 0 
0 10 0 0 
0 10 0 0 
110 0 0 
0 10 11 
11111 














0 -Jio i 0
0 . • •
0 -H : i 0
-Jl E, 0 0
0 ■e2'; 0 0
Ei -J) 0 1 :
e2: 0 0 0
-jf 0 ■i 0 0
0 0 0
-Jid 0 0 0
* ■ -H 0 1
-Ji 0 0
Fig. A The input data sets: (a) to (h)
52.f




Variables in the main circuit Varianbes in the subcircuits
Vj vi : ; Jl ; ; ' vj Gj Jj
0 7.0 1.0 0.0 2.0 -1.0
'1 7.0 2.666667 -4.333333 2.666667 6.175439 -8.233918
' ■/'-2 S'-j/'- 7.0 2 123064 -4.876935 1.458028 4.990777 -5.050256
7.0 2011470 -4.988530 1.056501 4.701707 -4.4045713
V\' 4 ■ ■ 7.0 2.000203 -4.999796 1.001090 4.667315 -4.334630 ,
V: 5: 7.0 2.000000 -5.000000 1.000000




Variables in the circuit
vj vj Vj
v yo^y;?;" 7.0 ;-y 10 0.0
y 1 'V. 7.0 2.666667 2.666667 -4.333333
'; 2 7.0 2.123064 1.458028 -4.876936
V 3 .;;v: 7.0 2.011470 1.056501 -4.988530
y'.y^ 4 .^ 7.0 2.000203 1.001090 -4.999796
'Vw...5 i' 7.0 2.000000 1.000000 -5.000000





1 . 0,00023194 15.0 14.3 14.21430961 14.20918220
10 0.00022998 15.0 14.3 14.21217756 14.20724574
50 0.00022328 15.0 14.3 14.20303835 1419894491
100 0.00021724 15.0 14.3 14.19215019 14.18905557
200 0.00020844 15° 14.3 14.17156648 1417036002
500 0.00019112 15.0 14.3 14.11602041 14.11990955





1 0.00090818 0.00128569 -‘0.0151450 13 49213682 0.01307401
10 0.00092754 0.00128782 -0.01719880 1347826175 0.01512683
50 0.00101055 0.00129696 -0.02601451 13.43076520 0.02393841
100 0.00110944 0.00130785 -0.03653093 13.38800212 0.03445006
200 0.00129640 0.00132843 -0.05643029 13.32564756 0.05434058
500 0.00180090 0.00138398 -0,11017436 13.20295807 0.10806125
1000 0.00251765 0.00146289 -0,18656576 13.06817581 0.18441980




V, V2 ■ vs ! ' V4
: 1 ' 15.0 13.49213632 13.74011897 12.7147804
10 15.0 13.47826175 13.74139862 12.70100470
50 15.0 13.43076520 13.74577911 12.56384735
100 15.0 13.38800212 13.74972294 12.61139075
200 15.0 13.32564756 13.75547342 12.54948480
500 15.0 13.20295807 13.76678753 12.42768413




h h ■ *7 ■ *8
1 -0.02519762 -0.00005460 0.02539449 -0.00023194
10 -0:02517203 -0.00005763 0.02536542 -0.00022998
50 -0.02508417 -0.0000680 0.02526592 -0.00022328
100 -0.02500554 -0.00007734 0.02517634 -0.00021724
200 -0.02489053 -0.00009097 0.02504573 -0.00020844
500 -0.02466425 -0.00011777 0.02478873 -0.00019112
1000 -0.02441568 -0.00014721 0.02450643 -0.00017209
- ' , 52.i ■ \ :':u,
Table A5. CPU Times for the Execuation of Various Programs 
(a) Vector Code (Cyber 205)
Number of Update the Solving the Update the Solving the 





i 3.24 x 10"4 1.01 x 10~* 1.2 x 10"4 6.53 xlO'4 2.82 x 10"4 1.2 x 10"2
10 3.39 xlO"4 1.20 x 10~* 1.51 x lO"4 1.42 x10"* 3.03 x 10"4 1.7 x 10"2
50 3.55 x 10“4 2.01 x 10* 2.11 x 10"4 8.00 x 10* 3.98 x 10"4 5.49 x 10"2
100 4.23 x 10-4 CO © 00 X © 2.99 x 10"4 2.37 x lO"2 5.37 x 10"4 1.40 x lO"1
200 4.77 x 10"4 5.12 x 10 s 4.46 x 104 7.96 x 10"2 7.63 x 10 4 4.32 x 10"1
500 7.34 x 10 4 1.13 x 10"2 9.16 x 10"4 4.44 x 10"1 1.50 x 10* 2.29
1000 1.13 x 10"* 2.17 x 10"2 1.72 x 10"* 1.72 2.72 x 10* 8.76
(b) Scalar Code (Cyber 205)
1 1.60 x 10"4 9.35 x 10"4 2.11 x 10"4 8.7 x 10"4 2.42 x 10"4 1.2 xlO"2
10 9.16 x 10~4 2.40 x 10 s 2.90 x 10"4 4.13 x 10"* 7.39 x lO"4 4.24 x 10"2
50 4.32 x 10 s 9.21 x 10"* 7.49 x 10"4 1.07 x 10"V 2.99 x 10 * 6.21 x 10"1
100 8.58 xlO"* 1.77 x 10'2 1.32 x 10"* 6.7 x 10"1 5.8 x 10"* 3.52
200 1.71 x 10~2 3.47 x 10"2 2.46 x 10"* 4.73 1.15 x 10"2 23.96
500 4.27 x 10"2 8.58 x 10"2 5.90 xlO"3 68.43 2.84 x 10"2 342:95
(c) Scalar Code (Cyber 205 without use of bit processing)
■l 3.40 x 10'4 1.03 x 10"* 1.39 xlO"4 6.57 xlO"4 2.83 x 10'4 1.2 x 10"2
10 1.11 x 10"* 2.21 x 10"* 3.2 x 10"4 3.78 x 10"* 6.81 xlO"4 4.06 x 10"2
50 5.30 x 10"* 9.04 x 10"* 8.36 x 10"4 8.92 xlO"2 2.94 x 10 * 5.37 x 10"1
100 1.05 x 10"2 1.76 x 10 2 1.48 x 10® 6.75 x 10 1 5.76 x 10"* 3.55
200 2.10 x 10"2 3.46 xlO"2 2.77 x 10"* 4.78 1.14 x lO"2 24.25
(d) Scalar Code (VAX 11/780)
1 1.67 x 10"2 2.67 xlO"2 0.0 4.33 x 10 2 3.33 x 10"* 4.50 x 10"’
10 2.67 x 10~2 6.67 x 10"2 1.00 x 10 2 8.33 x 10"2 2.33 xlO"2 1.67
50 1.30 xlO"1 2.37 x 10'1 1.30 xlO"2 1.81 6.00 x 10"2 11.22
100 2.63 x lO"1 4.58 x 10"V 1.6 x 10 2 12.67 1.30 x 10"1 67.68
200 4.93 x 10'1 8.60 x 10"1 3.67 x lO"2 100.55 2.83 x 10 1 511.95
53
APPENDIX B: CYBER 200 FORTRAN INTRINSIC FUNCTIONS
Listed below are the CYBER 200 FORTRAN intrinsic functions used in 
this report. These description are taken directly froni reference [9]
Q8SCNT
Q8SCNT(v) is a specific scalar function that returns the number of 1 bits in 
the argument, The argument must be a vector of type bit. The result is of 
type integer.
For example, if bit vector VI consists of the elements 10 0 1 1, the result 
of Q8SCNT(Vl) is 3.
Q8SDOT
Q8SDOT(vl,v2) is a generic scalar function that returns the dot product of the 
twb arguments. The argumehts must be vectors and can be of type integer, 
real, or half-precision. If the arguments have different lengths, the excess 
elements of the longer argument are ignored. The result is of the same data 
type as the arguments. The result is the sum of the products of corresponding 
elements of the vector arguments.
■■^:.- /:':F<)r.iexample,.:if vector VI consists of the elements 0 1 3, and vector V2 
consists of the elements 2 2 2, the result of Q8SDOT(Vl,V2) is
(G*2j) (1*2) +(3*2), which is 8.
Q8VCMPRS(v,cv;u) is a generic vector function that creates a vector consisting 
of selected elements of the input argument V. Teh input argument v must be 
a vector and can of type integer, real, or half-precision. The input argument 
cv, which is used as a control vector, must be a vector of type bit. The output 
argument can be a vector of the same data type as the input argument, or an 
integer expression that specifies the length of the vector function result. The 
input arguments must have the same length The length of the vector through 
which the function result is returned is determined by the number of 1 bit in 
cv. o.. ■ "■ ; ■
54
The function result consists of all of the elements of the input argument v 
whose corresponding elements in the control vector cv contain a 1 bit.
For example, if input argument VI is a vector that consists of elements 2 
4 6 8, and input argument CV1 is a bit vector that consists of the elements 0 1 
0 1, the function reference Q8VCMPRS(V1,CV1;U1) assigns the values 4 8 to 
the output argument Ul.
Q8VPOLY
Q8VPOLY(vl,v2;u) is a generic vector function that computes a polynomial at 
several points, The input arguments must be two vectors or one scalar and one 
vector. If a scalar is used as an input argument, the scalar must be the first 
input argument. The input arguments can be of type real or half-precision. 
The output argument can be a vector of the same data type as the input 
arguments, or an integer expression that specifies the length of the vector 
function result. The input arguments can have different lengths. The length of 
the vector through which the function result is returned must be the same as 
the length of the input argument vl, or longer.
The input argument v2 contains the coefficients of the polynomial: the 
first element of input argument v2 is the coefficient of the highest order term of 
the polynomial, and the last element of input argument y2 is the coefficient of 
the lowest order term of the polynomial, which is the constant. The length of 
input argument v2 determines the order of the polynomial. The order is one 
less than the number of elements in input argument v2. The input argument 
vl contains the points at which the polynomial is to be evaluaed. The value of 
the first element of input argument vl is substituted for the variable in the 
polynomial, the polynomial is evaluated, and the result is placed in the first 
element of the function result. This is repeated for each element of input 
argument vl.
For example, if input argument vl is a vector that consists of the elements
2.0 3.0 5.0, and input argument v2 is a vector that consists of the elements 4.0
2.0 1.0, the function reference Q8VPOLY(Vl,V2;Ul) assigns the values 21.0
43.0 111.0 to the output argument Ul. These values were computed by 
substituting each element of input argument Vl for the variable in the 
polynomial defined by the input argument V2. The polynomial is:
4x2 + 2x + 1
55
Q8VSCATR
Q8VSCATR(v,i;u) is a generic vector function that creates a vector consisting 
of selected elements of the input argument v. The input argument v must be a 
vector and can be of type integer, real, or half-precision. The input argument i 
must be a vector of type integer. The output argument can be a vector of the 
same data type as the input argument v, or an integer expression that specifies 
the length of the vector function result. The input argument i and the vector 
through which the function result is returned must have the same length.
Each element of the input argument v corresponds to an element in input 
argument i. The elements in input argument i indicate to which elements in 
the function rsult the elements in input argument v are assigned. For example, 
if an element of i contains a 1, the element of input argument v that 
corresponds to that element in i is assigned to the first element of the function 
result. An element of the function result can be assigned more than one value; 
the last value an element is assigned is the value that it retains.
For example, if input argument VI is a vector that consists of the 
elements 2.0 4.0 8.0 8.0, input argument II is a vector that consists of the 
elements 1 4 4 2, and output argument Ul is a vector that consists of the 
elements 9.0 9.0 9.0 9.0, the function reference Q8VSCATR(Vl,Il;Ul) assigns 
the values 2.0 8.0 9.0 6.0 to the output argument Ul. The fourth element of 
the output argument is assigned the value 4.0, but is then reassigned the value 
6.0. The third element of the output vector is never assigned; therefor, it 
retains its previous value.
Q8VXPND
Q8VXPND(v,cv;u) is a generic vector function that creates a vector that 
consists of the elements of input argument v plus additional elements having 
the value 0 or 0.0. The input argument v must be a vector of type integer, 
real, or half-precision. The input argument cv, which is used as the control 
vector, must be a vector of type bit. The output argument can be a vector of 
the same data type as the input argument v, or an integer expression that 
specifies the length of the vector function result. The length of the vector 
through which the function result is returned must be the same as the length of 
the input argument cv.
Each element of the function result corresponding to an element in the 
control vector cv that contains a 0 bit is assigned the value 0. The elements of 
the function result corresponding to 1 bits in the control vector are assigned 
values from the input argument v. The leftmost values from input argument v
are used, and any excess values are ignored.
For example, if input argument Vl is a vector that consists of the 
elements 5.0 5.0 5.0, and input argument CVl is a bit vector consists of the 
elements 1 0 0 1, the function reference Q8VXPNI)(Vl,CVl;Ul) assigns the
57














































THIS PROGRAM IS USED TO DO THE LARGE CIRCUIT 
ANALYSIS USING THE MMNA ALGORITM. THE NUMBER 
OF SUBCIRCUITS IN THE CIRCUIT CAN BE FROM 1 
TO 1000. THE MAIN PROGRAM AND ALL SUBROUTINES 
ARE WRITTEN IN UECTOR UERSION.
UARIABLES:
NZ - A REAL ARRAY CONTAINING THE NONZERO 
ELEMENTS IN MATRIX A FOR MAIN 
CIRCUIT.
NZS - A REAL ARRAY CONTAINING THE NONZERO 
ELEMENTS IN MATRIX A FOR SUBCIRCUITS.
B - A BIT MASK ARRAY FOR MAIN CIRCUIT.
BS ^ A BIT MASK ARRAY FOR SUBCIRCUITS
BR - A REAL ARRAY CONTAINING THE RIGHT HAND 
SIDE OF THE EQUATION AX=B FOR MAIN 
CIRCUIT.
BRS - A REAL ARRAY CONTAINING THE RIGHT HAND
OF THE FIRST NONZERO ELEMENT IN EACH ROW OF 
i THE MATRIX A FOR MAIN CIRCUIT.
R - A NITEGER ARRAY FOR INDICATING THE POSITIONS 
FO THE FIRST NONZERO ELEMENT IN EACH ROW OF 
THE MATRIX A FOR MAIN CIRCUIT.
RS - A INTEGER ARRAY FOR INDICATING THE
POSITIONS OF THE FIRST NONZERO ELEMENT IN 
EACH ROW OF THE MATRIX A FOR SUBCIRCUITS.
D - A iNTEGER ARRAY CONTAINING THE POINTERS OF 
NONLINEAR DEUICES IN MAIN CIRCUIT.
DS - A INTEGER ARRAY CONTAINING THE POINTERS OF 
NONLINEAR DEUICES IN SUBCIRCUITS.
PO - A REAL ARRAY CONTAINING THE COEFFICIENTS OF
POLYNOMIAL FOR FUNCTIONS OF NONLINEAR DEUICES 
IN WHOLE CIRCUIT.
X - A RAEL ARRAY CONTAINING THE UARIABLES IN MAIN 
CIRCUIT.: ;
XS - A REAL ARRAY CONTAINING THE UARIABLES IN THE 
'' uSUBCIRCUITS. ■
ci - A INTEGER ARRAY INDTCaTiNG THE COLUMN ORDER 
EXCHANGE OF THE MATRIX A FOR MAIN CIRCUIT.
CIS - A INTEGER ARRAY INDICATING THE COLUMN ORDER 
EXCHANGE OF THE MATRIX A FOR SUBCIRCUITS.
NZP - A BIT ARRAY FOR INDICATING THE POSITIONS OF 
G IN THE NZ.
NZPS - A BIT ARRAY FOR INDICATIOG THEPOSITIONS OF 


















INTEGER I, J, K, M, MS, N, NS, NL,NLS,P,Y,BPG,BPP,




BIT B,BS,EC 1000),NZPC3100),NZPSC50),NZQC3100), 
1 BQ C110 0),BP C110 0),BPS CIO)
INPUT THE DATA
CALL INP CIBR, IBRS, X, XS, INZ, INZS, B, BS, R, RS, D, DS, Cl, CIS, M, MS, 
1 NZP,NZPC,NZPS,NZPSC,NZQ,NZQC,E,BP,BPC,Y,
1 BPS,BPSC,BQ,BQC,NL,NLS,N,NS,PO,BPG,BPP,BPU)











IF CNLS.NE.0) THEN 
TT3=SECQNDC)








C ^ CALCULATE THE EG AND EJ
DO 20 1=1.NS 
BRS(1,I;Y)=0.0
20 -■''...'■■.■...■CONTINUE':
DO 30 I=1»Y 
BRS(I»BPP)=X(BPU(I))
30 CONTINUE
C SOLUE THE SUBNETWORK












OBTAIN THE I WITH ZERO INPUT
iO a;y)=xs(i» bpg;Y >
CALCULATE THE BRS
P=0












SET THE SUBNETWORK INPUT
BO 70 1=1.Y 
BRS(I.BPP)=X(BPU(I))
CONTINUE
CALCULATE THE SUBNETWORK 
CALL SXY tNZS>BRS,BS,RSVNS.Y.XS)
C OBTAIN THE EG AND EJ
DO 90 1=1,Y EG(I)=-I0(I)/X(BPU(I)) 
EJ(I)=XS(T,BPG)-IO(I) 90 CONTINUE
61
C COPY THE BR
BRC1;N)=IBRC1?N) ; ^ {
TT4=SEC0ND()
T2=TT4-TT3




CALL UPNM CNZ* BR* NZP* NZPC* BPC> BP* PO>D* X* N*NL* Cl> M) 
ENDTF





CALL UPI C NZQ*Y,N,M*NZQC* EG * EJ,E,BQ*BQC *NZ *BR)
TT4=SEC0ND()
T3=TT4-TT3
LU DECOMPOSITION AND SOLUE THE MAIN NETWORK
TT3-SEC0NDC)
CALL LUM (NZ>X»BR»B>R*N) 
TT4=5EC0ND()
T4=TT4“TT3
SUBSTITUTE TO THE SUBNETWORK
TT3=SEC0ND()
DO 100 1=1,Y 
BRS C19BPP)=X (BPU(I)) 
100 CONTINUE




C PRINT THE RESULTS OF THIS ITERATION
C . -------- ------ ■--------------------------------------
WRITE(6* 6) L
WRITE(6* 7) (X(I+Y-1) *1=1*N~Y+1) 
WRITE(G*8) (XS(1»I)>1=1»NS) 
WRITE(6*9) T1»T2» T3»T4, T5
G FORMAT (1X,****//35X,*THE NUMBER OF ITERATION V, 15//
1 25Xfs****************************************************?*/)
7 FORMAT•C/3QX,/THE VALUE OF VARIABLE IN THE MAIN NETWORK V//
1 10Xf5F15.8//)
8 FORMAT (/30X,/THE VALUE OF VARIABLE IN FIRST OF SUBNETWORKS &/'
1 20X.4F15.8//)
62




C OUTPUT THE FINAL RESULTS
CALL OUT (N.NS.Y.XrXS.Tl)
STOP
END ■ ~ 'V' __ ' "J-
c
C FILE NAME UPNS
C
C THIS SUBROUTINE IS USED TO DO UPDATA THE
C SUBNETWORKS. ALL OF THE FUNCTIONS OF THE
C NONLINEAR DEUICE IN SUBCIRCUITS ARE WRITTEN
C AS THE POLYNOMIAL OF THE UOLTAGE U.
c ■





INTEGER DS(4» 4) *Y» CIS(1.0) > NS* NZPSCC8) *
1 NLS»MS* PtI»J» Z
BIT NZPSC50)




C CALCULATE THE GS AND JS
P=0



























C OBTAIN THE NZSO
C —----- —------------- — :
DO 40 1=1,P
NZSO(1115 Y)=GS(1,NZPSCCI); Y) 
40 CONTINUE











: FILE NAME LUS
THIS SUBROUTINE IS USED TO DO THE 










C CALCULATE THE FIRST ROW OF U
c . —-----------------—---------- --- ------------------------------------------------—
Q=1
DO 2 1-1=2 > MS





DO 10 K=2* NS
C CALCULATE THE KTH COLUMN OF L (TO 20)
C . -------- :----------------------- ------------- ------------------- - --------------







C FIND THE INDEX OF L AND U
C --------------------------------—---------------
IF (BTOL(BSdfP))) THEN 
QL=QL+1
IF (BTOLCBS(PfK))) THEN 
QU=RS(P)+Q8SCNT(BS(Pf1?K))-l
C CALCULATE THE SUM
c ----------------—-----------------




C OBTAIN THE L
C -------------------------
QL=QL+1
NZS(1,QL;Y)=NZS(1f QL 5 Y)-SUM(15 Y)
ENDIF
20 CONTINUE
C CALCULATE THE KTH ROW OF U
c —----- - - - - - - -———
IF (K.NE.NS) THEN
DO 120 J=K+lfNS 
IF (BT0L(BS(Kf J))) THEN





C FIND THE INDEX OF U AND L
IF (BT0L(BS(K,P)>) THEN 
QL=QL+1
IF (BTOLCBSCP,J))) THEN 
QU=RSCP)+Q8SCNT(BS<P,1;J))-1





C OBTAIN THE U
QQ=Q8SCNT(BS(K*1iJ))+QQ 
QL=QL+1








THIS SUBROUTINE IS USED TO SOLUE THE UECTORS 
Y AND X FOR THE SUBCIRCUITS.
SUBROUTINE SXY (NZS» BRS* BS>RS» NS» Y*XS)
RONNISE BSClOflO)
REAL NZSC1000,50),XS(1000,10),YSCIOOO, 10)V 




C FIND THE FIRST ELEMENT OF Y
YSClf1jY)=BRS(lf1;Y)/NZS(If I?Y)










DO 20 K=2* NS 
C INITIAL SUM
SUM(1JV)=0.0
C CALCULATE THE SUM
DD=RS(K)-1 
DO 40 J=1»K-l 





C OBTAIN THE Y
DD=DD+1
YSCl,K;Y) = (BRSa*K;Y)~SUMa;Y))/NZSa»DDJY) 
20 CONTINUE
C CALCULATE THE LAST ELEMENT OF X
XS(i»NS5Y)=YS(l.NS;Y)
C CALCULATE THE X (TO 70)
,'DO: 76'Kri»'NS-l 
C FIHD THE INDEX OF L
L=NS-K •
QQ=RS CL)+Q8SCNT(BS(L > 1! L ) ) -1 
C INITIAL THE SUM
SUMC15 Y)=0.0
C CALCULATE THE SUM V
DO 100 P=L+1»NS












C FILE NAME UPNM
C ■
C THIS SUBROUTINE IS USED TO DO THE UPDATA THE
C NONLINEAR SEUICES OF THE MAIN NETWORK. ALL OF
C THE FUNCTIONS OF NONLINEAR DEUICES IN MAIN




SUBROUTINE UPNM (NZfBR» NZP»NZPC,BPC.BPrPOrDfXrNfNL»Cl.M)
REAL NZC3100)»XC1100)»BR(1100)»PO(SO»5)»G(20)f 
1 TXC1100)*TIrU» JTC16)»GTf NZOC3100)f BROC1100)
INTEGER NZPCC20)>DC10>4)>BPC(10)>Cl(1100)f Nf 
1 NL» P> I» J» M
BIT NZPC3100)fBPC1100)
C FIND THE OLD ORDER OF X
C . — -------
TXC15 N)=Q8USCATR(X Cl;N),Cl(1;N)J TX C1;N)) 
C CALCULATE THE G AND JT (TO 20)
P=0
DO 20 1=1>NL
C FIND THE U ACROSS THE NONLINEAR DEUICES





C CALCULATE THE POLYNOME
GT=0.0
DO 30 J=lf10 
GT=GT*U+POCJf DC If 3))
30 CONTINUE
TI=0.0
DO 40 J=llf20 
TI=TI*U+PO(Jf DC If 3))
40 CONTINUE
C OBTAIN THE JT
68
JT(I)=GT*U~TI 
C OBTAIN THE G
P=P+1 
GCP)=GT 




C UPDATA THE NZ
G • ' ---- - -- - ---——*--- —---
NZO(1;P)=Q8USCATRC G C U P ).NZPCCUP);N20CUP))
g(i;p)=nzo(i;P)
NZ (1; M) =NZ Cl J M) +Q8UXPNB (G C15 P)". NZP CUM) 1NZOC15M))
C UPDATA THE BR
BRO ( U NL )=Q8USCATRC JTC U NL). BPCC U ND5BROC11 ML))
JT C15 NL)=BRO C1;NL)












THIS SUBROUTINE IS USED TO UPDATA THE INPUT 
OF THE MAIN NETWORK. ASSUME ALL OF THE 
SUBCIRCUITS HERE HAUE ONLY TWO PORTS.
SUBROUTINE UPI CN2Q«Y» N.M»NZQC»EG>EJ*E*BQ>BQC»NZ» BR)
REAL NZC3100)> BRC1100)* EGC1000)»
1 EJC1000),NZOC3100).EOC2000).BROC1100)
INTEGER P» L» Y»N*M»NZQCC2000)»BQCC1000)







DO 10 1=1.Y 
P-P+l
EO C P)=EG Cl)
















NZOC1; P)-Q8USCATR( EO (15 P)VnZQC( l;P); NZOC15 P)) 
EO(l;P)=NZO(i;P)
NZ C15 M)=NZ <1;M)+Q8UXPNDC EO Cl;P),NZQCl;M)5NZOC15 M))
UPDATA THE BR
BRO(IJY)=Q8USCATR(E J (1; Y),BQC(lSY);BROC15 Y)>
ej(i;y)=bro(i;Y)




THIS SUBROUTINE IS USED TO DO THE LU 
DEGOMPOSltlQN FOR MATRIX A FOR MAIN 
CIRCUIT AND SOLUE THE UECTORS Y AND X.
SUBROUTINE LUM CNZ.X*BR,B*R,N)
RONWISE 8(1100,1100)
REAL NZ(3100),X (110b)> Y( 1100),BRC1100), SUM (1100), 
1 TU t1100)*TL (1100),NZT(1100)
INTEGER RCl100), N, I, J, K, Q, QL, QQ, QU, P,DD, W
BIT B
FIND THE FIRST ELEMENT OF Y
Y(1)=BR(1)/NZC1)
CALCULATE THE FIRST RON OF U
Q=Q8SCNT(B(1,2;N-l))
NZ C 250)=NZC 2 J Q)/NZ C l)
DO LU DECOMPOSITION AND CALCULATE Y (TO 20)
DO 20 K=2,N
GENERATE THE UECTOR TU
TU(l;K“l)=0.0 
DO 30 1=1,K-i 










C CALCULATE THE SUM
TL(1J K-l)=Q8UXPND(NZ(QLSK-l')* B(I»1»K-l)5 TL( 1* K-l)) 
SUM(1)=Q8SD0T(TL(1»K“1)»TU(l» K-l))





C CALCULATE THE Y
.. QL=R(K)
C GENERATE THE UECTORTL
TLClsK 1>=QBUXPND(NZ(QL5K-1>»B(K» llK-DSTLUSK-l)) 
C CALCULATE THE SUM
SUM(1)=Q8SD0T(TL(15K-I).Y(1JK-1))
DP=Q8SCNT CB (K*1 * K- l)) +QL 
G OBTAIN THE Y
Y<K)=(BR(K)-SUM(1))/NZ(DD)





DO 80 J=1.K-1 







tO( 1; DD) -QiBUXPND (NZ (QU?DD)*B C- J.K+1;DD);TU C1J DDI) . 






NQU-Q8SCNT C B (K* K+.l'J.DD ) )





C ' CALCULATE THE X
X(N)=Y(N) ■ v,;:.
DO 120 K=1.N-1 
C CALCULATE THE SUM !.
' ■ L=N-K ; ■
QQ=R (L)+Q8SCNT(B (L«1J L))
tu(1jk)=Q8UxFndcnZ(qqjk>,B(l,l+i;k);tu(i;K)) 
sum(1)=q8sdot(TU(i;k).x(L+i;k))





c ■■ :: ;-
c FILE NAME OUT
C THIS SUBROUTINE IS USED TO OUTPUT THE







WRITE(6* 2) (XS(1 *J>* J=1* NS)
WRITE(6*3) T1
FORMAT (/25X,*THE FINALE RESULTS OF THE MAIN NETWORK*//
10X* 5F15.8/)
FORMAT C25X**THE FINALE RESULTS IN THE FIRST OF SUBNETWORKS*// 
20X* 4F15.8/)
FORMAT (////35X**THE TOTAL CPU TIME*//35X**T =*,F15.10/)












APPENDIX D: SIMULATION PROGRAMS IN SCALAR CODE
FILE NMAE MMNA 
THIS PROGRAM IS USED TO DO THE LARGE CIRCUIT 
ANALYSIS USING THE MMNA ALGORlTM. THE NUMBER 
OF SUBCIRCUITS IN THE CIRCUIT CAN BE FROM 1 
TO 1000. THE MAIN PROGRAM AND ALL SUBROUTINES 
ARE WRITTEN IN SCALAR UERSION.
UARIABLES!
NZ - A REAL ARRAY CONTAINING THE NONZERO 
ELEMENTS IN MATRIX A FOR MAIN 
CIRCUIT.
NZS - A REAL ARRAY CONTAINING THE NONZERO 
ELEMENTS IN MATRIX A FOR SUBCIRCUITS.
B - A BIT MASK ARRAY FOR MAIN CIRCUIT.
BS - A BIT MASK ARRAY FOR SUBCIRCUITS
BR - A REAL ARRAY CONTAINING THE RIGHT HAND 
SIDE OF THE EQUATION AX=B FOR MAIN 
CIRCUIT.
BRS - A REAL ARRAY CONTAINING THE RIGHT HAND
OF THE FIRST NONZERO ELEMENT IN EACH ROW OF 
THE MATRIX A FOR MAIN CIRCUIT.
R - A NITEGER ARRAY FOR INDICATING THE POSITIONS 
FO THE FIRST NONZERO ELEMENT IN EACH ROW OF 
THE MATRIX A FOR MAIN CIRCUIT.
RS - A INTEGER ARRAY FOR INDICATING THE
POSITIONS OF THE FIRST NONZERO ELEMENT IN 
EACH ROW OF THE MATRIX A FOR SUBCIRCUITS.
D - A INTEGER ARRAY CONTAINING THE POINTERS OF 
NONLINEAR DEUICES IN MAIN CIRCUIT.
DS - A INTEGER ARRAY CONTAINING THE POINTERS OF 
NONLINEAR DEUICES IN SUBCIRCUITS.
PO - A REAL ARRAY CONTAINING THE COEFFICIENTS OF
POLYNOMIAL FOR FUNCTIONS OF NONLINEAR DEUICES 
IN WHOLE CIRCUIT.
X - A RAEL ARRAY CONTAINING THE UARIABLES iN MAIN 
CIRCUIT.
XS - A REAL ARRAY CONTAINING THE UARIABLES IN THE 
SUBCIRCUITS.
Cl - A INTEGER ARRAY INDICATING THE COLUMN ORDER 
EXCHANGE OF THE MATRIX A FOR MAIN CIRCUIT.
CIS - A INTEGER ARRAY INDICATING THE COLUMN ORDER 
EXCHANGE OF THE MATRIX A FOR SUBCIRCUITS.
NZP - A BIT ARRAY FOR INDICATING THE POSITIONS OF 
G IN THE NZ.
NZPS - A BIT ARRAY FOR INDICATIOG THE POSITIONS OF 









PROGRAM MMNA(TAPE5=INPUT * TAPE6=QUTPUT)
ROWWISE BCl100*1100), BS(10,10)
REAL NZSC1000,50),NZC3100),POC20,5),XSC1000, 10),BRC1100)*
1 BRSC1000* 10)*IOC 1000)*EJC1000)*EG(1000>,IBRC1100)*IBRSC1000*10)
1 JTSC1000* 4)*INZC31O0),INZSC1000,50),X(1100)* T1* T2,T3* T4* T5
IMTEGER I, J* K* M* MS* N* NS* NL, NLS* P* Y* BPG, BPP*
1 DC 10,4), DSC4,4)* Cl (1100)* CISC 10),
1 R(1100)*RSC10)*NZPC(20)*NZPSC(8),
1 NZQCC 20 0 0),BPC(10)* BPSC(4)*
1 BQC(1000)*BPU(1000)
BIT B,BS> EC1000)* NZPC3100)* NZPSC50)* NZQC3100)* BPC1100) * BPSCiO)* 
1 BOC1100)
INPUT THE DATA
CALL INP (IBR*IBRS*X*XS*INZ*INZS*B*BS*R*RS,D,DS,Cl*CIS*M*MS* 
1 NZP^NZPC*NZPS*NZPSC*NZQ*NZQC,E*BP*BPC*Y*
1 BPS* BPSC,BQ,BQC,NL,NLS,N,NS,PQ* BPG*BPP,BPU)





COPY THE NZ AND NZS
DO 4 1=1*M 
NZCI)=INZ(I) 
CONTINUE
DO 3 1=1,MS 
DO 3 J=1,Y 
NZSCJ*I)=INZS(J,I) 
CONTINUE
UPDATA THE NONLINEAR DEUICES OF THE SUBNETWORK
IF (NLS.NE.O) THEN 
TT3=SEC0ND()




C LU DECOMPOSITION FOR SUBNETWORK
TT3=SECONDO
CALL LUS (NZS,BS,RS,NS,Y)
C CALCULATE THE EG AND EJ_
DO SO 1=1,NS 






C SOLUE THE SUBNETWORK
CALL SXY (NZS,BRS, BS,RS,NS,Y,XS)
C OBTAIN THE I WITH ZERO INPUT
DO 35 1=1,Y 
IO(I)=XS(I,BPG)
35 CONTINUE
C CALCULATE THE BRS
■■ P=0












C SET THE SUBNETWORK INPUT
DO 70 1=1,Y 
BRS(I,BPP)=X(BPU(I))
70 CONTINUE
C CALCULATE THE SUBNETWORK
C ■ -T--—-——————
C^LL SXY (N?S,BRS,BS,RS,NS,Y,XS)







C COPY THE BR
C ‘ ■ --- ------------- ;





C UPDATA THE NONLINEAR DEUICES OF THE MAIN NETWORK
C . ——  * —---------- --  ---------- ----------------------- -
TT3=SEC0ND()
IF (NL.NE.O) THEN
CALL UPNM (NZ?BR?NZP?NZPC#BPC#BP#PO#D,X# N»NL#Cl # M) 
ENDIF
C UPDATA THE INPUT OF THE MAIN NETWORK
C . —    —-------------- ; —*-------- ——
CALL UPI (NZQ#Y,N#M#NZQC# EG#EJ# E# BQ# BQC#NZ# BR)
TT4=SEC0ND()
T3=TT4-TT3
C LU DECOMPOSITION AND SOLUE THE MAIN NETWORK





C SUBSTITUTE TO THE SUBNETWORK
c ------------ —------ ---------------------
TT3=SEC0ND (.)
DO 100 1-1 # Y
BRS (I # BPP) =X (BPU CD)
100 CONTINUE




C PRINT THE RESULTS OF THIS ITERATION
C _____----—---- :--------- ------------------------ —-------- --
WRITE(6#G) L
WRITECG# 7) (XCI+Y-l),I=1#N-Y+1) 
WRITECG# 8) (XSC1» X)»1 = 1>NS) 
WRITEC6#9) T1#T2#T3#T4#T5
77
FORMAT (IX.****//35X,s*THE HUMBER OF ITERATION 15//
E5X»^a-**************************************************^)
FORMAT (/30X,*THE UALUE OF UARIABLE IN THE MAIN NETWORK */✓'• 
10X»5F15i8//)
FORMATC/aOX^THE UALUE OF UARIABLE IN FIRST OF SUBNETWORKS *// 
20X.4FI5.8//)








c r'''v■ ' ■'■-■..v.'..'' -:■ ■
C , FILE NAME UPNS
C ■. • ■ ■■■■■" ■■■ ■ ■' '■ >. 'V ■-
C THIS SUBROUTINE IS USED TO DO THE UPDATA
C SUBCIRCUITS. ALL OF THE FUNCTIONS OF THE
C NONLINEAR DEUICES IN SUfiCIRCUITS ARE WRITTEN
C AS THE POLYNOMIAL OF THE UOLTAGE U.
■C' ■ ' .• i --r i: ;■ ■■■
G
1











C FIND THE OLD ORDER OF XS
DO 10 1=1,NS 
DO 10 Z=l,Y ;
TXSCZ,ClS(I))sXS(Z,lT r v 
10 CONTINUE
C CALCULATE THE GS AND JS
■. -''DO.' 20 Z=1,Y
P=0 '.•.’r.: - V .■■■■;' ■■>■■■
DO 25 1=1,NLS 
IF (DS(I,4).EQ.O) THEN 
US==TXS (Z, DS C 1 ,1))
ELSE
78
US=TXSCZ,DSC 1,1))-TXSCZ, DSC 1,2)) 
END IF
C CALCULATE THE POLYNOME





C OBTAIN THE JTS
C
TIS=0.0 
DO 35 J=11 , 20 
TlS=TIS*US+POCJ, DSC 1,3)) 
35 CONTINUE
JTSC^,I)=GTS*US-TIS
C OBTAIN THE GS
P=P+i
GSCP)=GTS '




C OBTAIN THE NZSO
DO 40 1=1,P 
NZSO Cl) =GS C NZPSC CD)
40 CONTINUE
C UPDATA THE NZS
c ------ -----------------------------—
P=0
DO 20 1=1? MS
IF CBTOLCNZPSCI))) THEN
P=P+1





c-------------- --------------------—------------------------------------------------- ---------------------------------------- -------------—--------------------"
C
C FILE NAME LUS
C
C THIS SUBROUTINE IS USED TO DO THE LU
C DECOMPOSITION FOR THE MATRIX A OF THE
C SUBCIRCUITS,
C





INTEGER RS(10) * NS * Q* QQ* QL» QU*I* K* P* W* Z> J* Y 
BIT BS
C CALCULATE THE FIRST ROW OF U
L-' Q=1 •
DO 2 1-1=2* NS







C DO LU DECOMPOSITION (TO 10)
DO 10 K=2,NS
C CALCULATE THE KTH COLUMN OF L (TO 20)








C FIND THE INDEX OF L AMD U
IF (BTOLCBS(I.P))) THEN 
QL=QL+1 ■ L
IF (BTOL(BS(P*K))) THEN 
QU=RS(P)-1 
DO 50 KK=1*K










DO 70 Z=l* Y
C OBTAIN THE L




C CALCULATE THE KTH ROW OF THE U
c -------------—------- --———
IF (K.NE.NS) THEN 
DO 120 J=K+1? NS 
IF CBTOlCBSCKf J))) THEN
C INITIAL SUM
C ——----





C FIND THE INDEX OF U AND L
C ' -——---------------  —




DO 150 KK=1> J
IF (BTOLCBSCP*KK))) QU=QU+1 
150 CONTINUE
C FIND THE SUM
C --------------- .
DO ISO Z=19Y





C OBTAIN THE Uc
QQ=QL 
QL=QL+1 
DO 170 KK=K> J
IF (BTOLCBSCK*KK))) QQ=QQ+1 
170 CONTINUE
DO 180 Z=l» Y 









c -T 1 ;■■
C ^ FILE NAME SXY
v.' C ' : T ■ V
C THIS SUBROUTINE IS USED TO SOLUE THE HECTORS
C Y AND X FOR THE SUBCIRCUITS.





• 1 Z» J.DD.Y
BIT BS
C FIND THE FIRST ELEMENT OF Y
DO 10 2=1.Y
YS(Z. 1)==BRS(Z. 1 )2NZS(Z. 1)
10 CONTINUE
C CALCULATE THE Y (TO 20)
DO 20 K=2»NS
■:;./CV'>/^''':lNlTlAL'SUf1....;v-"
DO 30 Z=1^YSUrt(Z)=o.o :
■; 30 CONTINUE \
■!/::: c*€cuLate the Sum v
.' DD=RS(K)-1 •'
DO; 40 J=1.K-1 
IF (BTOLCBSCK.J))) THEN 
. DD=DD+1




; 40 : :■ CONTINUE
■ ; C \ OBTAIN THE Y ;
DD=DD+i 
DO 20 Z=l.Y
YS(Z» K) = (BRS(ZfK)-SUM(Z))/NZS(Z.DD)
20 continue
C CALCULATE THE LAST ELEMENT OF X
82
DO 60 Z=l»Y 
XS(Z* NS)=YS(Z* MS)
GO CONTINUE
C CALCULATE THE X (TO 70)
DO 70 K-l.NS-1 




IF (BTOL(BS(L*P>)) QQ=QO+l 
80 CONTINUE
C INITIAL THE SUM
DO 91) Z=1.Y 
SUM(Z)=0.0 
90 CONTINUE















END ■ ___________ ____
C "
C FILE NAME UPNM
C
C THIS SUBROUTINE IS USED TO DO THE UPDATA
C MAIN NETWORK FOR NONLINEAR DEUICES. ALL OF
C FUNCTIONS OF NONLINEAR DEUICES IN MAIN
C CIRCUIT ARE WRITTEN AS THE POLYNOMIAL OF
C UOLTAGE U.
C ____ ____________ ■ _ _ ■










G ; CALCULAE THE G AND JT (TO 20)
O'.p=0 ■' -
DO 20 1=1.NL 
' IF CD Cl. 4) .EQ.O) THEN 
0:.v'\0. ■ -O U=TX(D(I.l))
ELSE
U=TX(D(I.1))-TXCDCI»2))
■ ' ENDIF ■:
; C CALCULATE THE POLYNOME
GT=0.0 









C OBTAIN THE G
pSp+1 o’.'.
G(P)=GT
IF (D(I.4).EQ.O) GO TO 20 
TO P=P+1 ' :
. G(P)=-GT 0-0 ■
20 CONTINUE
C OBTAIN THE NZO
DO 40 1=1.P 
NZOCI)=G(NZPC(I))
40 CONTINUE





P=0 ; \ "v,:— • r.-'
DO 50 1=1> M 




C UPDATA THE BR
■: P=o
DO GO I=l>N 
IF (BTOLCBPCI))) THEN 






c v ; ■ •
C * FILE NAME UPI
c : ■■ : : , ■:, . v
C THIS SUBROUTINE IS USED TO UPDATA THE INPUT
C OF THE MAIN CIRCUIT* ASSUME ALL OF THE
C SUBCIRCUITS HERE HAUE ONLY TWO PORTS.
C , - . ■■■■•; '






C CALCULATE THE EO
P=0
DO 10 1=1,Y 
' P=P+1 '
EOCP)=EGCI)




C OBTAIN THE NZO




DO 30 1=1 >M
C UPDATA THE NZ





C OBTAIN THE BRO
DO 40 I-l»Y 
BRO( I)=EJ(BQC(I)) 
40 CONTINUE


















THIS SUBROUTINE IS USED TO DO THE LU 
DECOMPOSITION FOR MATRIX A FOR MAIN 
CIRCUIT AND SOLUE THE UECTORS Y AND X.
SUBROUTINE LUM(NZ.X.BR»B»R.N)
ROWWISE B(llOO.llOO)
REAL NZ(3100)»X(1100)> Y11100)»BR(1100)»SUM» 
7l TU<1100),TL(1100)
INTEGER RdlOO)»N* I * J» K» QL» QU-> Q»
-V:; ; 1-:'. QQSP, DD» W
BIT B
£ FIND THE FIRST ELEMENT OF^Y
Y(1)=BR(1VNZ(1)
> -/' yPAL^^ Uy, / : /vV'
Q=1
DO 10 W=2»N 




C DO LU DECOMPOSITION AND CALCULATE X Y (T0_20)
86
DO SO K=2.N
C GENERATER THE UECTOR TU
DO 30 1=1,K-l 
IF (BTOLCB(I,K))) then 
QU=R(I)-l 
DO 35 P=1,K











C CALCULATE THE SUM
SUM=0.0 
DO 50 P=1»K-1 
IF (BTOLCBCI»P))) THEN 
QL=QL+1
IF (BTOL(B(P»K))) SUM=N2(QL)*TU(P)+SUM 
END IF
50 CONTINUE





C CALCULATE THE Y
SUM=0.0
DD=R(K)-1

















€ — CALCULATE THE KTH ROW OF U (TO 80)
IF (K.NE.N) THEN 
DO 80 J=K+1»N 
IF (BTOLCBOOJ))> THEN
C CALCULATE THE SUM
sUMtO.o 
DO 90 1=1.K-l 
IF (BTOL(B(I»J))) THEN 
QU=R(I)-l 




' ' ENDIF :
30 CONTINUE;
C OBTAIN THE U
V qu=QL -
DO 110 P=K* J
IF CBTOL(B(K*P))) QU=QU+1
110 CONTINUE





c ' -x -
■ X(N)=Y(N)
DO 120 K-l.N-1 
C FIND THE INDEX OF U
L=N-K 
QQ=R(L)~1 
* DO'130 P=l» L
IF CBTOL(B(L*P))) QQ=QQ+1
130 CONTINUE
C calculate THE SUM
; SUM=0.0
DO 140 P=L+lvN 















THIS SUBROUTINE IS USED TO OUTPUT THE FINAL RESULTS OF MMNA PROGRAM.
SUBROUTINE OUT CN»NS>Y.X.XS.Tl)
REAL T1»X(1100)»XS(1000*10)





1 FORMAT (/25X>/THE FINALE RESULTS OF THE MAIN NETWORK///
1 10X.5F15.8//)
2 FORMAT (/25X.ZTHE FINALE RESULTS IN THE FIRST OF SUBNETWORKS///
1 20X* 4F15.8//)
3 FORMAT (////35X*/THE TOTAL CPU TIMEZ//35X,ZT =Z»F18.10/)




APPENDIX E: THE PROGRAM FOR GENERATING 
INPUT DATA SETS









THIS SUBROUTINE IS USED TO GENERATE THE DATA OF THE 
EXAMPLE CIRCUIT. LET THE SYMBOLICA PROCESSING AND THE 





















: . DO S 1=1,N DO 5 :J=1,N ■B(I+U)=B*0*
5 ; CONTINUE
DO 6 1=1, Y ■■■:.
; B(I,I)=B*1*B(I» Y+8)=B^1>!
B(N,I)=B^1^G CONTINUE
v DO 7 1=1,9 READCS, 1) CBB(K),K=1,9)
DO 7 J=l,9 '7 IF (BBCJ).NE. 0) B( I+Y, J+Y) =B^ls«s 
DO 8 1=1,8READCS,!) (BB(K),K=1,8)
DO 8 J=l,88 IF (BB(J).NE.O) BS(I,J)=B/l?i





READ(5,2) (INZ(2*Y+I),1=1 READ(5,2) (INZ(3*Y+1+31), 
READ(5,2) CiNZSCl, I ) , 1-1, FORMAT(5F10.5)
10













DO 14 1-1> 27NZPS(I)=B*0*
15




























XSd, 1 )=15.0 XSd»2)=5.0 
XSd»3)=7.0 XSd. 4)=4.0
DO 19 1=1.Y DO 20 J=l»2720 INZSd.J)=INZSd, J) XSd,2)=XS(1.2) 
XSd,l)=XSd,l) XS(I.3)=XS(1.3) XSd,4)=XSd,4) 
IBRSC1.1)=IBRS(1.1)19 CONTINUE
DO 21 1=1.921 R(I+Y)=2*Y+RR(I)
XCY+l)=15.0 X(Y+3)=6.0 X(Y+2)=14.3 
X(Y+8)=5.G
DO 22 1=1,20 
DO 22 J=1.222 PO(I»J)=0.0
PGd0,l)=0.001 P0<19,l)=0.001 
P0(20»1)=0.0005 PO(2,2)=2.205 PO(l1,2)=0.245 




1 2 3 5 6 7 8 4 31 2 4 7 11 15 21 28 32
1 2 3 G 9 13 18 221 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0
1 0 1 0 0 1 0 O 0
0 1 1 1 0 1 0 0 00 1 0 1 1 1 0 0 0
1 0 1 1 0 1 0 1 1
1 0 1 0 1 1 1 1 10 0 ■ -i: 0 0 1 0 1 1
0 0 i 0 O' 1 0 1 1
1 0 0 0 0 0 0 0
0 1.;:. ■ 0 0 0 0 0 0
0 1 i 0 0 1 0 0
0 1 0 1 0 0 1 0
1 0 1 0 1 1 0 0
93
1110 0 1
0 1 (3 1 0 1















1 3 1 1
2 3 :3 1
3 4 ;3 1
2 4 ;2 1
o
0
1
0.0 0.0
0.1101 0.0
0.0 -0.00001
-0.00001 0.01
1.0 1.0
0.0 0.0
-1.0
0.0 -1.0
0.Q2 -0.02
-0.00001 0.02001
0.00201 0.5
-0.00001 0.5
i
i
l
0
1
0
5
0
0
00101
0
0
02
00001
00001
