A systolic array implementation of discrete relaxation algorithm by Henderson, Thomas C. & Wang, Wei
A Systolic Array Implementation of Discrete Relaxation Algorithm
UUCS-TR-86-008
Wei Wang*, Jun Gu, and Thomas C. Henderson
*Department of Electrical Engineering 
Department of Computer Science
University of Utah 
Salt Lake City, UT 84112
12 March, 1986 
Abstract
Discrete relaxation techniques have proven useful in solving a wide range of problems in digital signal processing, artificial intelligence, machine vision, and VLSI engineering. A conventional hardware design for an 8-label 8-object Discrete Relaxation Algorithm (DRA) requires three 4K memory blocks and has a maximum execution time on the order of seconds to minutes, which makes such a DRA hardware implementation infeasible. A highly parallel systolic array for the computation of an 8-label 8-object DRA problem has been developed. This realization eliminates the 12K memory requirement and performs the DRA computation in microseconds in the worst case. The circuit requires about 6,382 transistors. Major design issues and chip descriptions are presented in this paper.
This work is supported in part by NSF Grants MCS-82-21750, DCR-85-06393, and DMC-85-02115; and in part by a University 
of Utah Research Fellowship.
Table of Contents
2. Discrete Relaxation Algorithm (DRA)
2.1 Boolean Formulation of Discrete Relaxation
2.2 An Example
2.3 The Hardware Implementation Problem for DRA
3. Implementation of Discrete Relaxation Algorithm Using A Systolic Array
3.1 Complexity Analyses
3.2 Two Considerations for DRA Hardware Design
• Concurrency and Communication
• Simple and Regular Design
3.3 A Highly Parallel Reformulation for DRA
• Constructing the Parallel Computation Tree
• Speeding up the Iteration
• Introducing Time Dimension in Computation
3.4 Basic Principles and Implementations for DRA2 Circuit
• System Architecture and Block Diagram
• Systolic Cells and Array Design
• Circuit Features and Design Techniques
3.5 PPL Layout
3.6 Pin Descriptions and Interfacing with CPU
• Pin descriptions
4. Testing
• Testing for Systolic Array
• Testing for Iteration Process and Control Module
5. Comparison with A Conventional Design
5.1 A Brief Description of the DRA1 Design
5.2 Comparisons
6. Further Advanced Development
7. Conclusions
8. Acknowledgments
References
Appendix
• PPL Simulation File for Region Coloring Problem
List of Figures
Figure 1: Leaf Node for Computing $l_{jp} \times \Lambda_{ij}(k,p)$
Figure 2: Modified Leaf Node Computation for $l_{jp} \times \Lambda_{ij}(k,p)$
Figure 3: A Parallel Tree Structure for Computing n* 1^
Figure 4: Circuit Block Diagram for DRA2
Figure 5: Basic Principle of the Systolic DRA2 System
Figure 6: Two Cells in DRA2 Systolic Array
Figure 7: Construction of Systolic Array Using Cell-A and Cell-B
Figure 8: Computational Wavefront Pipelining and Circulation for Interleaved Processing
Figure 9: Broadcasting Scheme for Signal $b_k$
Figure 10: Self-Timed Synchronization
Figure 11: State Graph of Finite State Machine
Figure 12: The PPL Layout of DRA2 Chip
Figure 13: The PPL Layouts for Several Circuit Modules
Figure 14: A New PPL Layout Using Full-Custom Designed Cell-A and Cell-B
Figure 15: Pinout Diagram
Figure 16: Interfacing Block Diagram
Figure 17: DRA1 Functional Block Diagram
Figure 18: Initialization State Diagram
Figure 19: Loop State Diagram
Figure 20: DRA1 Chip Layout
Figure 21: DRA1 Chip Pin Out Diagram
Figure 22: The Comparisons with DRA1 Design
1. Introduction and Motivation
Relaxation is a very general computational technique for a wide range of theoretical and engineering problems. Since its invention many years ago it has demonstrated powerful and extensive applications in many areas. Some of them are listed below:
1. Digital Signal Processing: digital signal and digital image filtering;
2. Mathematics: (1) solving linear and certain partial differential equations; (2) finding the objective function; (3) performing linear programming and optimization;
3. Artificial Intelligence: heuristic optimal search and propagation of numeric constraints, etc.;
4. VLSI Engineering: building various kinds of relaxation simulation tools and developing a hardware accelerator;
5. Computer Vision: problems such as graph homomorphism, graph coloring and image understanding, as well as line finding, stereopsis, line-labeling, and semantics-based region growing, etc.;
6. Robotics: solving its vision problems;
7. Mechanics: computing stresses, etc.
For a review of the numerous applications of relaxation processes see [11,12,13,14,15,16].
Classical relaxation (CR) was introduced by Southwell in 1940 [11] and the symbolic (as opposed to numeric) versions of relaxation (SR) were introduced in the mid-seventies [12]. The version used here is that described by Henderson [Note on Disc]. The Discrete Relaxation Algorithm (DRA) is a restriction of the classical relaxation process to systems of Boolean inequalities which take values over the two-element set {0,1}. One significant result of the introduction of DRA is that both classical and symbolic relaxation algorithms are directly executable as silicon subroutines, thus making many real-time relaxation applications feasible. The project described in this paper is the first successful hardware implementation of this algorithm.
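2. Discrete Relaxation Algorithm (DRA)
2.1 Boolean Formulation of Discrete Relaxation
Instead of seeking a real-number solution as in numerical relaxation [13], the solution to be found in the discrete relaxation case involves the assignment of a set of labels to each unknown such that some constraint relation among the labels is satisfied by neighboring unknowns. Whereas the unknowns in numerical relaxation take on real number values, the unknowns in a labeling problem take on a Boolean vector value with each element in the vector corresponding to a possible label. Boolean vector operations are denoted by $'$, $\times$, $^t$, $*$, $+$ and $\cdot$, which represent complementation, vector multiplication, transpose, Boolean "and," Boolean "or," and Boolean vector dot product, respectively. Let
1. $U = \{u_1,\ldots,u_n\}$ be the set of unknowns,
2. $\Lambda = \{\lambda_1,\ldots,\lambda_m\}$ be the set of possible labels,
3. $\Lambda_i = (l_1,\ldots,l_m)^t$ be the column vector describing the set of labels (i.e., zero or one) possible for $u_i$, where $l_j = 1$ if $\lambda_j$ is compatible with $u_i$; 0 otherwise.
4. $C$ be an m by m compatibility matrix for label pairs, where $C(i,j) = 1$ if $\lambda_i$ is compatible with $\lambda_j$; 0 otherwise.
5. $\Lambda_{ij} = (\Lambda_i \times \Lambda_j^t) * ((Nei(i,j)'E) + C)$ be an m by m compatibility matrix for $u_i$ and $u_j$, where $E$ is the m by m matrix of all 1's, and $Nei(i,j) = 1$ if $u_i$ neighbors $u_j$; 0 otherwise.
6. $\Lambda_{ij}(k)$ denotes the kth row of $\Lambda_{ij}$.
A labeling is a vector $L = (L_1,\ldots,L_n)^t$, where $L_i = (l_{i1},\ldots,l_{im})^t$ in $\Lambda_i$ is a Boolean vector with $l_{ij} = 1$ if label $\lambda_j$ is a possible label for object $u_i$; 0 otherwise. A labeling is consistent if for every i and k:
It can be rewritten as:
$$ l_{ik} \le l_{ik} * \prod_{j=1}^{n} \Big[ \sum_{p=1}^{m} \big( l_{jp} * \Lambda_{ij}(k,p) \big) \Big] \qquad (2) $$
Stacking the m inequalities for a fixed i gives
$$ L_i = \begin{pmatrix} l_{i1} \\ l_{i2} \\ \vdots \\ l_{im} \end{pmatrix} \le \begin{pmatrix} l_{i1} * \sum_{p=1}^{m} (l_{1p} * \Lambda_{i1}(1,p)) * \cdots * \sum_{p=1}^{m} (l_{np} * \Lambda_{in}(1,p)) \\ l_{i2} * \sum_{p=1}^{m} (l_{1p} * \Lambda_{i1}(2,p)) * \cdots * \sum_{p=1}^{m} (l_{np} * \Lambda_{in}(2,p)) \\ \vdots \\ l_{im} * \sum_{p=1}^{m} (l_{1p} * \Lambda_{i1}(m,p)) * \cdots * \sum_{p=1}^{m} (l_{np} * \Lambda_{in}(m,p)) \end{pmatrix} $$
$$ L_i \le \begin{pmatrix} l_{i1} * L_1 \cdot \Lambda_{i1}(1)^t * \cdots * L_n \cdot \Lambda_{in}(1)^t \\ l_{i2} * L_1 \cdot \Lambda_{i1}(2)^t * \cdots * L_n \cdot \Lambda_{in}(2)^t \\ \vdots \\ l_{im} * L_1 \cdot \Lambda_{i1}(m)^t * \cdots * L_n \cdot \Lambda_{in}(m)^t \end{pmatrix} $$
where the column vector:
$$ P = \prod_{j=1}^{n} \Big( \big\{ [L_j]^t \times [\Lambda_{ij}(1)^t \ldots \Lambda_{ij}(m)^t] \big\}^t \Big). $$
Gathering together the $L_i$'s, i = 1, ..., n, we have
$$ L \le L * P $$
This formulation emphasizes the relation to classical relaxation. The relaxation is achieved by repeating
$$ L \leftarrow L * P $$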
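The repeated update of inequality (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `compat` and `relax` are hypothetical, the update is applied synchronously to all $l_{ik}$, and the compatibility rule assumed here is the "same label is incompatible between neighbors" rule of the three-region coloring problem worked out in Section 2.2.

```python
def compat(nei, k, p):
    # Assumed coloring rule: for neighbors (nei = 1) distinct labels are
    # compatible; for an object with itself (nei = 0) only equal labels are.
    return (1 - nei) if k == p else nei


def relax(init, nei):
    """Iterate l_ik <- l_ik AND_j OR_p (l_jp AND Lambda_ij(k,p)) to a fixed point."""
    n, m = len(init), len(init[0])
    # Lambda_ij(k,p) is built once from the initial labels and never changes.
    lam = [[[[init[i][k] & init[j][p] & compat(nei[i][j], k, p)
              for p in range(m)] for k in range(m)]
            for j in range(n)] for i in range(n)]
    labels = [row[:] for row in init]
    while True:
        new = [[labels[i][k] & int(all(
                    any(labels[j][p] & lam[i][j][k][p] for p in range(m))
                    for j in range(n)))
                for k in range(m)] for i in range(n)]
        if new == labels:          # L = L * P: the labeling is consistent
            return labels
        labels = new


# Three regions, labels (red, green, blue); every region neighbors the others.
initial = [[1, 0, 0], [1, 1, 1], [0, 0, 1]]
neighbors = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
result = relax(initial, neighbors)
print(result)
```

Only labels that are currently 1 can be cleared, so the loop always terminates; the number of sweeps it needs is bounded by the number of label bits.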











2.2 An Example
Suppose that we are analyzing a picture of a scene, with the aim of describing it, and that we have detected a set of objects $u_1,\ldots,u_n$ in the scene, but have not identified them unambiguously. The relationships that exist among the objects are used to eliminate the ambiguity.
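An example of eliminating the ambiguity in a region coloring problem is given here to demonstrate these ideas and computation procedures. For simplicity, consider the case of three regions to be colored red, green or blue with the constraints:
1. Region 1 must be red,
2. Region 3 must be blue, and
3. No two regions may be colored the same color.
Thus, $u_i$ = Region i (for i = 1, 2, 3) and:
$$ U = \{u_1, u_2, u_3\} $$
$$ \Lambda = \{\lambda_1, \lambda_2, \lambda_3\} $$
where $\lambda_1$ is red, $\lambda_2$ is green, and $\lambda_3$ is blue. Since region 1 must be red, we have:
$$ \Lambda_1 = [1\ 0\ 0]^t $$
and since region 3 must be blue:
$$ \Lambda_3 = [0\ 0\ 1]^t $$
Finally, since there is no restriction on region 2's color, we have all possibilities:
$$ \Lambda_2 = [1\ 1\ 1]^t $$
Since only similar colors are incompatible, we have:
$$ C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \qquad (10) $$
for different objects, and
$$ C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (11) $$
for the same object.
We see then that C actually depends on the objects under consideration; i.e., technically, we should write $C_{ij}$, which is identified as:
$$ C_{ij} = \begin{pmatrix} Nei(i,j)' & Nei(i,j) & Nei(i,j) \\ Nei(i,j) & Nei(i,j)' & Nei(i,j) \\ Nei(i,j) & Nei(i,j) & Nei(i,j)' \end{pmatrix} \qquad (12) $$
$$ Nei(i,j) = \begin{cases} 0 & \text{if Region } i \text{ does not neighbor Region } j, \\ 1 & \text{if Region } i \text{ does neighbor Region } j. \end{cases} \qquad (13) $$
Now we can calculate $\Lambda_{ij}$ as:
$$ \Lambda_{11} = ([1\ 0\ 0]^t \times [1\ 0\ 0]) * ((0'E)+C) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} * \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad (14) $$
$$ \Lambda_{12} = ([1\ 0\ 0]^t \times [1\ 1\ 1]) * ((1'E)+C) = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} * \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad (15) $$
$$ \Lambda_{13} = ([1\ 0\ 0]^t \times [0\ 0\ 1]) * ((1'E)+C) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} * \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} $$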
This says that the color red is all right for Region 1. To determine if the color red is possible for Region 2, we must find $l_{21}$.
For i=2 and k=1:
$$ \begin{aligned} l_{21}^{(n)} \le l_{21}^{(n-1)} &* [l_{11}^{(n-1)} * \Lambda_{21}(1,1) + l_{12}^{(n-1)} * \Lambda_{21}(1,2) + l_{13}^{(n-1)} * \Lambda_{21}(1,3)] \\ &* [l_{21}^{(n-1)} * \Lambda_{22}(1,1) + l_{22}^{(n-1)} * \Lambda_{22}(1,2) + l_{23}^{(n-1)} * \Lambda_{22}(1,3)] \\ &* [l_{31}^{(n-1)} * \Lambda_{23}(1,1) + l_{32}^{(n-1)} * \Lambda_{23}(1,2) + l_{33}^{(n-1)} * \Lambda_{23}(1,3)] \end{aligned} $$
$$ 1 \le 0 \quad \text{which is false.} $$
Thus, $l_{21}$ must be set to zero. Likewise, for i=2 and k=3, $l_{23}$ is set to zero, and blue is not a possible label for Region 2. Finally,
For i=2 and k=2:
$$ \begin{aligned} l_{22}^{(n)} \le l_{22}^{(n-1)} &* [l_{11}^{(n-1)} * \Lambda_{21}(2,1) + l_{12}^{(n-1)} * \Lambda_{21}(2,2) + l_{13}^{(n-1)} * \Lambda_{21}(2,3)] \\ &* [l_{21}^{(n-1)} * \Lambda_{22}(2,1) + l_{22}^{(n-1)} * \Lambda_{22}(2,2) + l_{23}^{(n-1)} * \Lambda_{22}(2,3)] \\ &* [l_{31}^{(n-1)} * \Lambda_{23}(2,1) + l_{32}^{(n-1)} * \Lambda_{23}(2,2) + l_{33}^{(n-1)} * \Lambda_{23}(2,3)] \end{aligned} \qquad (25) $$
$$ 1 \le 1 * [1*1+0*0+0*0] * [1*0+1*1+1*0] * [0*0+0*0+1*1] $$
$$ 1 \le 1 \quad \text{which is true.} $$
We see then that the values of $l_{11}$, $l_{22}$, and $l_{33}$ are not affected by the change of $l_{21}$ and $l_{23}$ to zero. In fact, the system of equations stabilizes after the change of $l_{21}$ and $l_{23}$, and the result is $l_{11} = l_{22} = l_{33} = 1$, while all other hypotheses are zero. Thus, the only consistent labeling is to label Regions 1, 2 and 3 the colors red, green and blue, respectively.
2.3 The Hardware Implementation Problem for DRA
The problem of DRA Hardware Implementation 1 (DRA1) has been defined as finding the labeling matrix L (n = m):
$$ L = (L_1, L_2, \ldots, L_i, \ldots, L_n)^t = \begin{pmatrix} l_{11} & \cdots & l_{1n} \\ \vdots & & \vdots \\ l_{n1} & \cdots & l_{nn} \end{pmatrix} \qquad (26) $$
for the predefined computational model provided by the initial labeling matrix
$$ \Lambda = (\Lambda_1, \Lambda_2, \ldots, \Lambda_n)^t \qquad (27) $$
and the unchanged label-pair compatibility matrices $C_{ii}$ and $C_{ij}$ in equation (8) for every i and j (i, j = 1, 2, ..., n).
3. Implementation of Discrete Relaxation Algorithm Using A Systolic Array
3.1 Complexity Analysis
A conventional design, DRA1, for an 8-label 8-object DRA problem is presented in section 5.1 and [2]. The computational strategy used in that design is to serially compute each intermediate element of the matrices $\Lambda_{ij}(p,q)$ and $l_{ij}$ and periodically read and write L, $\Lambda$, $\Lambda_{ij}(p,q)$ and $C_{ij}$ from and into memories. Since the computation mechanism imbedded in this design is purely an I/O-bound computation, the upper bound of execution time is on the order of hours for an NMOS process. Finally, the complete system takes 3 separate chips (about 80,000 transistors in total). This design has revealed the inherent computational complexity of DRA's hardware implementation.
Referring to equations (14) to (22), in order to store the initial labels $\Lambda$, the matrices $C_{ij}$, and the intermediate results of all elements of the matrices $\Lambda_{ij}(p,q)$ (i, j, p, q = 1, ..., n), the space complexity is on the order of
$$ O(2n^2 + 3n^4) = O(n^4). \qquad (28) $$
For practical applications, the number of labels could be 8, 16 or 32; the bit memory requirements for these cases are 12K, 48K and 192K, respectively. As shown in design [2], this adds to the circuit size and becomes a bottleneck when n is large.
The time complexity can be estimated from equations (23) to (25). During each iteration, at least $2 \times 4 \times n^4 \times n$ read and write memory operations need to be performed. Assuming $t_{read} = t_{write} = 500$ ns for an NMOS process, the computation time complexity of each iteration is $O(n^5)$ (taking the unit time to be 500 ns). Multiplying by the worst-case number of iterations, $O(n^2)$ [12], which is determined by the features of the computational model, the execution is terribly slow.
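To make the estimate concrete, the short sketch below plugs n = 8 into the operation count quoted above. It is only a back-of-the-envelope calculation: the function name is hypothetical, and it assumes that the worst-case $O(n^2)$ iteration count is exactly $n^2$ and that every memory access costs the full 500 ns.

```python
def dra1_worst_case_seconds(n, t_access_ns=500):
    """Rough worst-case DRA1 run time, assuming exactly n*n iterations."""
    ops_per_iteration = 2 * 4 * n**4 * n   # read/write operations per iteration
    iterations = n * n                     # assumed worst case, O(n^2)
    return ops_per_iteration * iterations * t_access_ns * 1e-9


print(f"{dra1_worst_case_seconds(8):.2f} s")
```

Under these assumptions an 8-label 8-object problem needs roughly 8.4 seconds in the worst case, and the $n^7$ growth pushes larger problems far beyond that.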
3.2 Two Considerations for DRA Hardware Design
Two considerations in designing DRA have become critical and challenging.
1. Concurrency and Communication
It should be clear that any attempt to speed up an I/O-bound computation like design [2] must rely on an increase in 
the memory bandwidth. Since the technological trend clearly indicates a diminishing growth rate for device speed, 
any major improvement in computation speed must come from the concurrent use of many processing elements 
[3,5,6]. The degree of concurrency in a special-purpose system is largely determined by the underlying algorithm. 
Massive parallelism can be achieved if the algorithm is designed to introduce a high degree of pipelining and 
multiprocessing. When a large number of processing elements work simultaneously, coordination and 
communication become significant - especially with VLSI technology where routing costs dominate the power, 
time, and area required to implement a computation. The issue here for DRA is to design a hardware algorithm that supports a high degree of concurrency and at the same time employs only simple, regular communication and control to enable efficient implementation.
2. Simple and Regular Design
Cost-effective designs have also been a chief concern in designing special-purpose chips like DRA. Special-purpose 
design costs can be reduced by the use of appropriate architectures. If DRA can truly be decomposed into a few 
types of simple substructures or building blocks, which are used repetitively with simple interfaces, great savings in
design cost can be achieved. To cope with the circuit design complexity, simple and regular designs, similar to 
some of the techniques used in constructing large software systems, are essential. In addition, special-purpose 
systems based on simple, regular designs are likely to be modular and therefore adjustable to various performance 
goals - that is, system cost can be made proportional to the performance required. This suggests that meeting the 
architectural challenge for simple, regular, modular designs yields a cost-effective DRA chip.
Systolic systems [3,5,6] are an attempt to capture the concepts of parallelism, pipelining, and interconnection structures in a unified framework of mathematics and VLSI engineering. They embody engineering techniques such as multiprocessing and pipelining together with the more theoretical ideas of cellular automata and algorithms, and are therefore excellent candidates for DRA hardware implementation.
3.3 A Highly Parallel Reformulation for DRA
The hardware parallel reformulation of DRA takes the following three steps to resolve the complexity encountered in the conventional DRA1 design.
1. Constructing the Parallel Computation Tree
On closer analysis of Eq. (2), we see that the element $\Lambda_{ij}(k,p)$ can be decomposed as
$$ \Lambda_{ij}(k,p) = l_{ik} * l_{jp} * C_{ij}(k,p) \qquad (29) $$
which can form a leaf node like
Figure 1: Leaf Node for Computing $l_{jp} \times \Lambda_{ij}(k,p)$
so that Eq. (2) can be hierarchically formed as a tree-like structure, with each level imbedded in the parallel computation of its leaves' operands, as shown in Figure 3.
2. Speeding Up the Iteration

The hardware parallel reformulation for DRA1 eliminates three 4K memories from the design [2]; only a 64-bit shift register is required to store all 64 intermediate label elements. Thus space complexity is decreased to $O(n^2)$. Since each computation takes 64 cycles, assuming a clock cycle of about 150 ns (NMOS process) and a maximum iteration count of $O(n^2)$, the worst-case execution time of this highly parallel computation is on the order of microseconds.
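The same kind of rough arithmetic can be applied to the parallel design. The sketch below is an illustration only: the function name is hypothetical, and it assumes that one relaxation iteration takes exactly 64 clock cycles and that the $O(n^2)$ iteration bound is exactly $n^2$.

```python
def dra2_worst_case_microseconds(n=8, cycles_per_iteration=64, clock_ns=150):
    """Rough worst-case DRA2 run time, assuming exactly n*n iterations."""
    iterations = n * n
    return cycles_per_iteration * iterations * clock_ns / 1000


print(f"{dra2_worst_case_microseconds():.1f} microseconds")
```

With these assumptions the 8-label 8-object case finishes in about 614 microseconds, which is consistent with the microsecond range stated above.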
3.4 Basic Principles and Implementations for DRA2 Circuit
1. System Architecture and Block Diagram
The block diagram of the DRA2 circuit is illustrated in Figure 4. The chip consists of four functional blocks.
1. Compatibility Matrix Registers (CMR). The $C_{ij}$ Registers are a set of eight 8-bit shift registers in the leftmost part of the circuit; they are used for storing each $C_{ij}$ matrix. Another set of $C_{ii}$ Registers in the rightmost part of the circuit is used for storing $C_{ii}$.
2. 8 x 8 Systolic Array (SA). The systolic array is composed of 8 by 8 simple and regular cells. They are predefined to map the highly parallel computation algorithm of Figure 3 into silicon. A number of horizontal and vertical communication wires are designed around the four edges of the cells to make use of higher degrees of parallelism in the computation.
3. L-matrix Shift Register (LSR). It is used as (1) the input and output data path for the original and final labeling matrices, (2) the pipelining channel for broadcasting and pipelining tree-root operands, forming a recursive DRA computational wavefront, and (3) temporary data storage and updating.
4. Control Module (CM). This module includes four units. An 8-Bit Comparator is located on top of the first 8-bit shift register of the LSR to sense the equality between the nth-iteration output vector $L_i^{(n)}$ of the systolic array and the corresponding (n-1)th-iteration row vector $L_i^{(n-1)}$ inside the LSR. A Timer serves as both the systole pacer and the tagged-bit signal generator for iteration control. An 8-Bit State Register is used for collecting comparison results from the Comparator and monitoring iteration states. Finally, a Finite State Machine (FSM) is built for performing a self-timed synchronization among these functional blocks and the host computer.
Figure 4: Circuit Block Diagram for DRA2
This diagram of four functional blocks also serves as the PPL layout floorplan for efficient layout (in section 3.5) and as the set of testing blocks in which to imbed the module testing strategy (in section 4).
2. Systolic Cells and Array Design
The basic principle of the systolic architecture for DRA is illustrated in Figure 5. By replacing a single Processing Element with an array of 8 by 8 PEs, a higher computation throughput can be achieved without increasing memory bandwidth. The function of the memory (i.e., the L matrix shift registers) in the diagram is to "pulse" data $l_{jp}$ (j, p = 1, 2, ..., n) through the array of cells. Then new data $l_{ik}$ (i, k = 1, 2, ..., n) are returned to memory in a rhythmic fashion. The crux of this approach is to ensure that once the data are brought out from the memory they can be used effectively at each cell they pass while being "pumped" from cell to cell along the array.
Figure 5: Basic Principle of the Systolic DRA2 System (memory supplying $l_{jp}$ to the 8 x 8 SIMD array, which returns $l_{i1}, \ldots, l_{i8}$)
To perform parallel DRA computation, two cells (as illustrated in Figure 6 (a) and (b)) with almost identical logic 
and structure were used in constructing the systolic array. The only difference is that the first cell is in charge of 
generating broadcasting signals for each row array. The construction of the systolic array using these two cells is 
illustrated in Figure 7.
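$$ b_k = l_{ik} \quad \text{at column } j = 1. \qquad (30) $$
$$ Out(j,k) = \sum_{p=1}^{m} (l_{jp} \times \Lambda_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp} \times l_{ik} \times C_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp} \times b_k \times C_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp}' + b_k' + C_{ij}(k,p)')'. \qquad (31) $$
(a) Cell-A
$$ b_{k(in)} = b_{k(out)} \quad \text{at columns } j \ne 1. \qquad (32) $$
$$ Out(j,k) = \sum_{p=1}^{m} (l_{jp} \times \Lambda_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp} \times l_{ik} \times C_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp} \times b_k \times C_{ij}(k,p)) = \sum_{p=1}^{m} (l_{jp}' + b_k' + C_{ij}(k,p)')'. \qquad (33) $$
(b) Cell-B
Figure 6: Two Cells in DRA2 Systolic Array
According to Figure 3 and Eq. (30) to (33), these two cells can be implemented in two levels of NOR gate combinational logic. Their PPL layouts are shown in Figure 13.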
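The two-level NOR realization rests on De Morgan's law: the AND of three signals equals the NOR of their complements. The following sketch checks that identity exhaustively and builds one cell output from it. The helper names (`nor`, `leaf`, `cell_out`) are hypothetical, and the final inversion of the second NOR level is an assumption about how the OR over p is obtained, not a statement about the actual PPL layout.

```python
from itertools import product


def nor(*bits):
    # NOR gate: 1 only when every input is 0.
    return int(not any(bits))


def leaf(l_jp, b_k, c):
    # First level: l_jp AND b_k AND c, built as a NOR of the complements.
    return nor(1 - l_jp, 1 - b_k, 1 - c)


def cell_out(l_j, b_k, c_row):
    # Second level: NOR over the leaf terms gives the complemented OR,
    # which is inverted here to obtain the sum over p.
    leaves = [leaf(l, b_k, c) for l, c in zip(l_j, c_row)]
    return 1 - nor(*leaves)


# Exhaustive check of the De Morgan identity used by the first NOR level.
assert all(leaf(a, b, c) == (a & b & c) for a, b, c in product((0, 1), repeat=3))
print(cell_out([0, 1, 0], 1, [0, 1, 1]))
```

With $b_k = 0$ every leaf is forced to 0, so a label that has already been eliminated can never be switched back on by the array.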

where j =
The corresponding output vector $L_i$ of the systolic array, which is the ith row vector of the L labeling matrix at the nth iteration, is generated:
$$ (l_{i1}, l_{i2}, l_{i3}, \ldots) $$
where i is fixed at a time t = i. As time moves forward, the elements in the L shift register shift from left to right in an 8-clock-pace fashion. The time-varying feature of the entire DRA2 array can best be described by the following two Topological Index Equations:
$$ i \leftarrow i + t \bmod n $$
$$ j \leftarrow j + t \bmod n $$
and at time t = i = 2, vector $L_2$
$$ (l_{21}, l_{22}, l_{23}, l_{24}, l_{25}, l_{26}, l_{27}, l_{28}) $$
is generated, etc.
Figure 8: Computational Wavefront Pipelining and Circulation for Interleaved Processing
Each $L_i$ vector is computed based on the interleaved utilization of the systolic array, whereas eight $L_i$ vectors form an entire computational wavefront of the L labeling matrix of the nth relaxation iteration. Note that we use n computing trees to generate $n^2$ $l_{ik}$'s in $O(n^2)$ time; we may also use n by n computing trees to compute the same number of $l_{ik}$'s in $O(n)$ time, provided that the latter has a uniformly progressing wavefront in time and in space but the former doesn't.
Multiple Signal Broadcasting
The broadcasting technique is probably one of the most obvious ways to make multiple use of each input element. It plays an important role in making the parallel computation tree of Figure 3 implementable. Two multiple broadcasting schemes are used in the DRA2 architecture. In the first, $n^2$ vertical broadcasting lines from each pipelining operand are connected to the bottommost leaf nodes of each parallel computing tree. In the second, as depicted in Figures 6 and 9, Cell-A at column j (=1) is used to jog the signal $b_k$ and then propagate it horizontally from right to left through the entire row array. Thus, the output vector of the systolic array, i.e., $(l_{i1}, l_{i2}, l_{i3}, l_{i4}, l_{i5}, l_{i6}, l_{i7}, l_{i8})$, can be generated simultaneously in a highly concurrent manner. For the sake of simplifying the analysis in the fast DRA3 and DRA4 architectures, we define the second multiple data routing pattern for jogging $b_k$ as J-Pattern generation in Figure 9.
Figure 9: Broadcasting Scheme for Signal $b_k$
Self-Timed Synchronization and Tagged-Bit Control
By using recursive systolic computation and interleaved processing, the computational task has been decomposed into the smallest computing piece, $L_i$. To compute each vector $L_i$, the globally synchronized systolic array of Figure 7 is used. For completing the entire relaxation computation, this synchronous array is imbedded into a self-timed system. The self-timed asynchronous scheme may be costly in terms of extra hardware and delay in each element, but it has the advantage that the time required for a communication event between two elements is independent of the size of the entire system [14]. Also, it is easy to design and validate a self-timed state machine in PPL methodology [16].
Among the 64-bit L shift registers, the rightmost (first) 8-bit SR is the one that can parallel-load the nth-iteration output vector from the systolic array in order to update the current (n-1)th-iteration $L_i$ row vector. This iteration and updating process is the core of the relaxation process described in Eq. (8). To sense the completion of computation, a Comparator is built on top of the first 8-bit SR. If the two vectors are equal, a row-eq signal of 1 is produced and stored into the 8-bit State SR of the Control Module; otherwise a 0 signal is sent. As soon as the State Register holds eight 1's, which means the equality of Eq. (8) is reached, an all-eq signal is issued to the FSM. Since the control processes in this system are based on the data validity of a control data flow, a reliable and fast execution in a data-driven environment is created. The control mechanism used in the parallel DRA2 architecture is shown in Figure 10.
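The completion test just described can be modeled in software as follows. This is a behavioral sketch only: the class and function names are hypothetical, and the 8-bit State Register is modeled as a fixed-length queue that receives one row-eq bit per computed row.

```python
from collections import deque


def row_eq(new_row, old_row):
    # Comparator: 1 when the freshly computed row equals the stored row.
    return int(list(new_row) == list(old_row))


class StateRegister:
    """Behavioral model of the 8-bit State Register (hypothetical)."""

    def __init__(self, width=8):
        self.bits = deque([0] * width, maxlen=width)

    def shift_in(self, bit):
        self.bits.append(bit)      # the oldest bit falls off the other end

    @property
    def all_eq(self):
        # all-eq is raised only when every stored row-eq bit is 1.
        return int(all(self.bits))


register = StateRegister()
for _ in range(8):
    register.shift_in(row_eq([1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]))
print(register.all_eq)
```

A single changed row shifts a 0 into the register, so all-eq can only be raised again after eight further unchanged rows, that is, after a full pass over the labeling matrix without any update.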
To ensure that the iteration cycle completes at the end of n x n cycles, a tagged-bit is derived from an ANDed term of both the $l_{11}$ bit and the 64th count of the Timer, which has served as a reliable alignment signal for computation
Figure 10: Self-Timed Synchronization






Figure 12: The PPL Layout of Discrete Relaxation Chip
Figure 13: The PPL Layouts for Several Circuit Modules (Cell-B, Cell-A, Timer, State Register, Finite State Machine, Comparator)
























































5. c8: The 8-cycle flag for detecting the systolic pace and the control-bit signal in testing mode. Each time eight clock cycles are sensed, c8 is set to 1.
6. all-eq: State signal for monitoring the end of the iteration process. When all-eq signals all 1's, the iteration is done.
7. ready: When a relaxation computation is demanded and the data ($C_{ij}$, $C_{ii}$, and $L^{(0)}$) are ready, the CPU sets ready to 0, which resets all states of the Finite State Machine in the control module.
8. data-ok: Upon completion of the relaxation computation, DRA1 issues data-ok by signaling a 1 to notify the CPU that the output of the computation is pending for transfer.
9. to-cpu: Once the CPU finishes preparing to receive the result data, a to-cpu command of 1's is acknowledged by the DRA1 chip, and then a data transfer from DRA1 to the CPU takes place.
10. φ: This clock is used to generate several two-phase clocks with different signal polarity and delay.
11. Vdd, Gnd: Power supply.
2. Interfacing with CPU
Since the DRA1 chip is designed as a Microcomputer Peripheral Device, it can be easily interfaced with a CPU as illustrated in Figure 16. The data input pins of $C_{ii}$, $C_{ij}$, and L-in, L-out are tied to the data bus. Pins like ready, data-ok, and to-cpu are connected to control pins of the CPU. Since ready is designed as the chip initialization signal, it can be used as the Chip-Select signal, preceded by a simple combinational circuit for a given address. To initiate the DRA1, the processor first places the data onto the input pins and selects/resets the chip using ready; then DRA1 computes the relaxation until data-ok is signalled to the CPU. Once the CPU is ready, data of the results can be













Figure 16: Interfacing Block Diagram
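The handshake between the CPU and the chip can be summarized with a small behavioral model. Everything in this sketch other than the three signal names (ready, data-ok, to-cpu) and their active levels is an assumption made for illustration: the class name, the state names, and the `computation_done` hook are hypothetical and do not reproduce the state graph of Figure 11.

```python
class HandshakeModel:
    """Hypothetical model of the ready / data-ok / to-cpu handshake."""

    def __init__(self):
        self.state = "idle"
        self.data_ok = 0

    def set_ready(self, level):
        # The CPU drives ready to 0 to reset the FSM and start a computation.
        if level == 0:
            self.state = "computing"
            self.data_ok = 0

    def computation_done(self):
        # Raised internally once the relaxation has converged (all-eq).
        if self.state == "computing":
            self.state = "result_pending"
            self.data_ok = 1

    def set_to_cpu(self, level):
        # The CPU answers data-ok with to-cpu = 1; the transfer then starts.
        if level == 1 and self.state == "result_pending":
            self.state = "transferring"
            return True
        return False


chip = HandshakeModel()
early = chip.set_to_cpu(1)     # ignored: no result is pending yet
chip.set_ready(0)
chip.computation_done()
started = chip.set_to_cpu(1)
print(early, chip.data_ok, started, chip.state)
```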
3.7 Simulation
1. Functional Simulation
Functional simulation aims to verify the correctness of the algorithm and data structure and to discover 
limitations and problems that may occur during practical implementation. A number of high-level functional 
simulations were therefore performed during the formulation of DRA.
2. Logical Simulation
Logic simulation for the DRA1 chip was performed for individual modules and for the entire circuit using the PPL 
topological circuit simulation tool SIMPPL. SIMPPL simulates by assigning logical values to every node in the 
circuit: input values are assigned and allowed to ripple through the circuit, and output values are then 
checked to ensure that the correct values are produced. For detailed functionality and usage of the PPL simulation 
tools, see [18].
An 8-label 8-object coloring identification problem was selected for logical simulation, as described below. 
The input to the circuit consists of the matrices Cii, Cij, and L(0). Matrices Cii and Cij are the inherent 
label-pair relationships. L(0) is the raw data seen by a robot. The first seven initial labels 
(L1, L2, L3, L4, L5, L6, L7) of L(0) indicate seven distinct regions of color on an object; the 8th label 
(L8 = 11111111) means that the color in the 8th region is a mixture of all 8 colors in these eight areas. Such a 
situation arises frequently. For example, when an airplane is flying, its major parts are clear, but one area 
of its body is blurred due to the plane's motion.
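To make the coloring example concrete, the following sketch runs a plain sequential label-discarding relaxation on the same kind of input. It is an illustration under assumptions, not the circuit's Boolean formulation: every object is taken to be a neighbor of every other object, the pairwise matrix `compat` allows two objects to carry any two different labels, and the function name `relax` is hypothetical.

```python
def relax(labels, compat):
    """Discard unsupported labels until nothing changes.

    labels[i][k] == 1 means label k is still possible for object i.
    compat[k][p] == 1 means label k on one object is compatible with
    label p on a different object (assumed pairwise constraint).
    """
    n, m = len(labels), len(compat)
    labels = [row[:] for row in labels]      # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for k in range(m):
                if not labels[i][k]:
                    continue
                for j in range(n):
                    if j == i:
                        continue
                    # label k survives only if object j offers a compatible label
                    if not any(labels[j][p] and compat[k][p] for p in range(m)):
                        labels[i][k] = 0
                        changed = True
                        break
    return labels


# Assumed constraint: two different objects may not share a label.
compat = [[0 if k == p else 1 for p in range(8)] for k in range(8)]
# Objects 1..7 carry one definite label each; object 8 starts with all labels.
initial = [[1 if k == i else 0 for k in range(8)] for i in range(7)] + [[1] * 8]
final = relax(initial, compat)
for row in final:
    print("".join(str(bit) for bit in row))
```

Under these assumptions the ambiguous 8th object is reduced to the single label not claimed by the other seven, so each row of `final` contains exactly one 1.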

units; (2) Since DRA1 is designed as a microcomputer peripheral device, it can be tested in a microcomputer-based 
environment, as illustrated in Figure 16. These strategies can be imbedded in the following two steps.
4.1 Testing for Systolic Array
A total of eight test pins (Li1, ..., Li8) were built into the chip for testing and monitoring the output vectors 
generated during each systolic cycle. These pins represent the n-th iteration results of the parallel 
computational trees. Since these results are predictable from high-level functional simulation or lower-level PPL 
logic simulation, errors inside each combinational cell can be detected very easily. Testing the systolic array 
implies that the shift registers for Cii, Cij, and L need to be tested first. Testing of these registers has already 
been taken into account and will be carried out using pins l-in, l-out, Cii1 to Cii8, and Cij1 to Cij8.
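The comparison of test-pin outputs against simulation results can be sketched as follows. This is a hypothetical helper, not part of the PPL tool set: `find_mismatches` and the sample vectors are made up for illustration, and an `X` in an expected vector is treated as a don't-care bit in the way the simulator prints unknown values.

```python
def find_mismatches(observed, expected):
    """Compare per-cycle Li vectors and list every disagreeing pin.

    Returns tuples (cycle, pin, got, want), both counted from 1.
    An 'X' in the expected vector marks a don't-care position.
    """
    mismatches = []
    for cycle, (got, want) in enumerate(zip(observed, expected), start=1):
        if len(got) != len(want):
            raise ValueError(f"vector width differs at cycle {cycle}")
        for pin, (g, w) in enumerate(zip(got, want), start=1):
            if w != "X" and g != w:
                mismatches.append((cycle, pin, g, w))
    return mismatches


# Made-up vectors: the second observed vector has a wrong bit on pin 3.
expected = ["10000000", "01000000", "0X100000"]
observed = ["10000000", "01100000", "01100000"]
bad = find_mismatches(observed, expected)
```

Reporting the cycle and pin of each disagreement, rather than a single pass/fail flag, is one way to point at the systolic cycle in which a faulty cell first shows up.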
4.2 Testing for Iteration Process and Control Module
After the systolic array has passed its test, the DRA1 chip will be tested for the relaxation computation. During 
this stage only the iteration process of the computation and the control module need to be tested. As shown in 
Figures 10 and 15, a number of control pins that provide sufficient iteration flags and control information are 
packaged on the chip, such as c8, all-eq, ready, to-cpu, and data-ok.
Designing the control module with a data-driven mechanism pays off in convenience of testing. Referring to the 
state diagram of the FSM in Figure 11, the entire computation is divided into several distinct periods, with each 
period initiated based on the availability of certain critical control signals, for instance c8, the most general 
indication signal for systolic shifting, updating, inputting, and outputting. Another welcome side effect comes 
from implementing the interfacing pins with the CPU, which not only makes testing possible in a 
microcomputer-based environment but also separates the testing procedure into stages with interactive 
notification of the testing states of these signals. These advantages save much of the time and effort of 
testing.
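As a rough picture of the c8 pacing signal, the sketch below models a flag that is raised once every eight clocks. It is a simplification under assumptions: the class name `EightCycleFlag` is hypothetical, and the behavior of the real control module during iteration (where the FSM may hold the flag) is not modelled.

```python
class EightCycleFlag:
    """Modulo-8 counter that raises a one-clock flag on every eighth clock."""

    def __init__(self):
        self.count = 0

    def clock(self):
        # Count clocks modulo 8; the flag is 1 when the counter wraps to 0.
        self.count = (self.count + 1) % 8
        return 1 if self.count == 0 else 0


flag = EightCycleFlag()
trace = [flag.clock() for _ in range(24)]
raised_at = [i + 1 for i, value in enumerate(trace) if value]
```

Observing such a flag on an external pin lets a tester line up shifting, updating, input, and output phases without probing internal nodes.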
5. Comparison with a Conventional Design
5.1 A Brief Description of the DRA1 Design
A conventional design for an 8-object, 8-label DRA problem, called DRA1, is presented in [2]. The DRA1 system 
consists of three chips. A DRA1 chip performs the DRA computation. An external RAM stores, prior to the 
DRA computation, the elements of the matrices Cii, Cij, Aij, and Nei[ij]. A Bus Control Chip coordinates the 
interaction between the DRA1 chip and the external RAM, under control of the host computer. The block diagram of 
the DRA1 chip is shown in Figure 17.
Figure 17: DRA1 Functional Block Diagram
These functional blocks are: (1) An Internal RAM for holding the matrices Aij(k,p). (2) An n-by-n Counter for 
determining the number of elements during data reloading. (3) An Initialization Logic Unit, which evaluates the 
elements of the matrices Aij(k,p) and writes them into RAM. (4) A Loop Logic Unit, which reads the elements of the 
matrices Aij(k,p) from RAM, iterates the labeling vectors Li over these constraints, and writes the results into 
LRAM. (5) Four State Machines. The Main State Machine controls the three other state machines, activating each 
successively. The Initialization State Machine controls the initialization logic unit and the internal 
RAM, as indicated in Figure 18.
Figure 18: Initialization State Diagram
The Loop State Machine controls the loop logic unit and LRAM. The state diagram is shown in Figure 19.
Figure 19: Loop State Diagram
The Done State Machine loads back the results upon completion of the computation. The complete DRA1 chip 
is implemented as illustrated in Figures 20 and 21.
For a time and space analysis of the DRA1 circuit, see Section 3.1 and [2].
Figure 21: DRA1 Chip Pin Out Diagram
5.2 Comparisons
The conventional DRA1 design and the DRA2 architecture address the same DRA problem. A brief comparison of the 
two designs is summarized below.

Complexity Comparisons

                          DRA2 Architecture     DRA1 System
Computational Family      Compute Bound         I/O Bound
Hardware Algorithm        Highly Concurrent     Serially Computing
Memory Requirement        None                  12 K
Computing Time            Microseconds          Seconds to Minutes
Layout                    Regular and Simple    Irregular and Hard
Control Strategy          Simple                Complicated
Entire System             One Chip              Three Chips (DRA1 chip plus
                                                External 4-K RAM and Bus Ctl Chip)
# of Transistors          6,382                 ~80,000

Design and Cost Comparisons

                          DRA2 Architecture     DRA1 System
Algorithm Reformulation
PPL Design Time

DRA Chip Descriptions

                              DRA2 Architecture
# of Transistors              6,382
Pins                          35
Sizes (PPL row by column)     181 x 249
                              75 x 75 (using full-custom designed Cell-A and Cell-B
                              in the systolic array; currently under implementation)

6. Further Advanced Development
Further research in developing fast, high-performance discrete relaxation hardware algorithms and layout
implementations is attractive and promising. The major issues related to this research are:
1. Using full-custom designed PPL cells in the DRA circuit design, which can greatly decrease the 
layout area. Since just two cells need to be designed, almost no additional cost will be added.
2. The highest degree of flexibility in DRA design can be obtained by allowing programmability in cells 
as well as reconfigurability of cell interconnections. Implementing a Programmable Systolic 
Chip (PSC) [3] for more general or more specific DRA problems therefore seems necessary.
3. CMOS technology could be applied to give a higher-speed and lower-power realization.
7. Conclusions
Several conclusions can be drawn from this extended summary:
1. The implementation of DRA has paved the way for developing various kinds of fast, high-performance 
Discrete Relaxation Chips.
2. PPL design tools are cost-effective and fast design tools that free the designer from labor-intensive 
composite layout and much lower-level logic design. Since each individual circuit unit can be laid 
out and simulated separately, hierarchical VLSI design can be carried out efficiently. A weak 
point of this methodology is the space consumed by its wire cells. Since every row wire and every wire 
connection takes one cell space, a large amount of area is wasted on wiring. In the DRA circuit, for 
example, 80 percent of the area is spent on wire connections. To make PPL suitable for parallel designs, 
which generally require a large number of parallel wires, special wiring cells for area-efficient 
layout are important.
3. Design of highly parallel hardware algorithms is the key problem of logic design for fast, 
area-efficient, high-performance VLSI systems with fewer devices. Considerable attention should be paid




REFERENCES
1. W. Wang and J. Gu, An O(n^2) Time Fast Discrete Relaxation Architecture, Project Report, 
Department of Computer Science, University of Utah, March 1986.
2. D. Ku, DRA1 Chip Implementation Report, Project Report, Department of Computer Science, 
University of Utah, March 1986.
3. H. T. Kung, Putting Inner Loops Automatically in Silicon, Lecture Notes in Computer Science Vol. 
163: VLSI Engineering, pp. 70-104, edited by Tosiyasu L. Kunii, Springer-Verlag, 1985.
4. R. P. Brent and H. T. Kung, The Area-Time Complexity of Binary Multiplication, Journal of the ACM, 
Vol. 28, No. 3, pp. 521-534, 1981.
5. H. T. Kung, Why Systolic Architectures?, Computer Magazine 15(1):37-46, January 1982.
6. C. E. Leiserson, Area-Efficient VLSI Computation, The MIT Press, 1983.
7. K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984.
8. J. E. Savage and J. S. Vitter, Parallelism in Space-Time Tradeoffs, VLSI: Algorithms and 
Architectures, edited by P. Bertolazzi and F. Luccio, North-Holland, 1985.
9. Z. Galil, Optimal Parallel Algorithms (invited paper), VLSI: Algorithms and Architectures, edited by 
P. Bertolazzi and F. Luccio, North-Holland, 1985.
10. H. Yasuura and S. Yajima, Hardware Algorithms for VLSI System, Lecture Notes in Computer Science 
Vol. 163: VLSI Engineering, pp. 70-104, edited by Tosiyasu L. Kunii, Springer-Verlag, 1985.
11. R. V. Southwell, Relaxation Methods in Engineering Science, Oxford University Press.
12. D. Waltz, Understanding Line Drawings of Scenes with Shadows, in Psychology of Computer Vision, 
edited by P. H. Winston, pp. 19-91, McGraw-Hill, 1975.
13. T. C. Henderson and O. D. Faugeras, Relaxation Techniques in Computer Vision, Oxford University 
Press, London, to appear.
14. P. H. Winston, Artificial Intelligence, Addison Wesley, 1984.
15. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Inc., 1982.
16. IEEE/ACM, Proceedings of International Conference on Computer-Aided Design, 1984 and 1985.
17. K. F. Smith, T. M. Carter and C. E. Hunt, Structured Logic Design of Integrated Circuits Using the 
Storage/Logic Array (SLA), IEEE Trans. on Electron Devices, Vol. ED-29, No. 4, April 1982.
18. PPL Manual, VLSI Group, University of Utah.
APPENDIX
PPL Simulation File for Region Coloring Problem
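One way to read the l-in stimulus in the listing that follows is as the initial label matrix L(0) shifted in one bit per cycle, row by row. The sketch below generates such a stream; it rests on assumptions (row-major order, one bit per `cy` command, and the helper name `serialize_rows` are not stated in the report) and is meant only as a reading aid for the listing.

```python
def serialize_rows(matrix):
    """Flatten a bit matrix row by row into a serial input stream."""
    return [bit for row in matrix for bit in row]


# L(0) of the coloring example: seven definite labels, then one row of all 1's.
initial = [[1 if k == i else 0 for k in range(8)] for i in range(7)] + [[1] * 8]
stream = serialize_rows(initial)

# Cycles (counted from 1) at which the serial input would be driven to 1.
ones = [cycle for cycle, bit in enumerate(stream, start=1) if bit]
print(ones)
```

Under this reading, the input is 1 at cycles 1, 10, 19, 28, 37, 46, and 55 and throughout cycles 57 to 64, which can be checked against the `set l-in` commands in the listing.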
PPL Logic Simulation for an 8-Label 8-Object Coloring Problem
(DRA.out)
» set ready:0
; As soon as CPU gets a ready = 0, it enables/resets DRA2 chip, and starts
; to input Cii (=rc) and Cij (=lc), and Lij (=l-in) elements.
» cy
1: 4> l-out=X l-in=X Li=XXXXXXXX c8=X all-eq=X ready=0 data-ok=0 to-cpu=X
» set ready:1 to-cpu:0
» ;cy 1
» set l-in:1
» set lc1:0 lc2:1 lc3:1 lc4:1 lc5:1 lc6:1 lc7:1 lc8:1
» set rc1:1 rc2:0 rc3:0 rc4:0 rc5:0 rc6:0 rc7:0 rc8:0
» cy
2: 4> l-out=X l-in=1 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 2
» set l-in:0
» set lc1:1 lc2:0 lc3:1 lc4:1 lc5:1 lc6:1 lc7:1 lc8:1
» set rc1:0 rc2:1 rc3:0 rc4:0 rc5:0 rc6:0 rc7:0 rc8:0
» cy
3: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 3
» set l-in:0
» set lc1:1 lc2:1 lc3:0 lc4:1 lc5:1 lc6:1 lc7:1 lc8:1
» set rc1:0 rc2:0 rc3:1 rc4:0 rc5:0 rc6:0 rc7:0 rc8:0
» cy
4: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 4
» set l-in:0
» set lc1:1 lc2:1 lc3:1 lc4:0 lc5:1 lc6:1 lc7:1 lc8:1
» set rc1:0 rc2:0 rc3:0 rc4:1 rc5:0 rc6:0 rc7:0 rc8:0
» cy
5: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 5
» set l-in:0
» set lc1:1 lc2:1 lc3:1 lc4:1 lc5:0 lc6:1 lc7:1 lc8:1
» set rc1:0 rc2:0 rc3:0 rc4:0 rc5:1 rc6:0 rc7:0 rc8:0
» cy
6: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 6
» set l-in:0
» set lc1:1 lc2:1 lc3:1 lc4:1 lc5:1 lc6:0 lc7:1 lc8:1
» set rc1:0 rc2:0 rc3:0 rc4:0 rc5:0 rc6:1 rc7:0 rc8:0
» cy
7: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 7
» set l-in:0
» set lc1:1 lc2:1 lc3:1 lc4:1 lc5:1 lc6:1 lc7:0 lc8:1
» set rc1:0 rc2:0 rc3:0 rc4:0 rc5:0 rc6:0 rc7:1 rc8:0
» cy
8: 4> l-out=X l-in=0 Li=XXXXXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 8
» set l-in:0
» set lc1:1 lc2:1 lc3:1 lc4:1 lc5:1 lc6:1 lc7:1 lc8:0
» set rc1:0 rc2:0 rc3:0 rc4:0 rc5:0 rc6:0 rc7:0 rc8:1
» cy
9: 4> l-out=X l-in=0 Li=0XXXXXXX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 9
» set l-in:0
» cy
10: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 10
» set l-in:1
» cy
11: 4> l-out=X l-in=1 Li=XXXXXXX0 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 11
» set l-in:0
» cy
12: 4> l-out=X l-in=0 Li=XXXXXX0X c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 12
» set l-in:0
» cy
13: 4> l-out=X l-in=0 Li=XXXXX0XX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 13
» set l-in:0
» cy
14: 4> l-out=X l-in=0 Li=XXXX0XXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 14
» set l-in:0
» cy
15: 4> l-out=X l-in=0 Li=XXX0XXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 15
» set l-in:0
» cy
16: 4> l-out=X l-in=0 Li=XX0XXXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 16
» set l-in:0
» cy
17: 4> l-out=X l-in=0 Li=00XXXXXX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 17
» set l-in:0
» cy
18: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 18
» set l-in:0
» cy
19: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 19
» set l-in:1
» cy
20: 4> l-out=X l-in=1 Li=XXXXXX00 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 20
» set l-in:0
» cy
21: 4> l-out=X l-in=0 Li=XXXXX00X c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 21
» set l-in:0
» cy
22: 4> l-out=X l-in=0 Li=XXXX00XX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 22
» set l-in:0
» cy
23: 4> l-out=X l-in=0 Li=XXX00XXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 23
» set l-in:0
» cy
24: 4> l-out=X l-in=0 Li=XX00XXXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 24
» set l-in:0
» cy
25: 4> l-out=X l-in=0 Li=000XXXXX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 25
» set l-in:0
» cy
26: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 26
» set l-in:0
» cy
27: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 27
» set l-in:0
» cy
28: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 28
» set l-in:1
» cy
29: 4> l-out=X l-in=1 Li=XXXXX000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 29
» set l-in:0
» cy
30: 4> l-out=X l-in=0 Li=XXXX000X c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 30
» set l-in:0
» cy
31: 4> l-out=X l-in=0 Li=XXX000XX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 31
» set l-in:0
» cy
32: 4> l-out=X l-in=0 Li=XX000XXX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 32
» set l-in:0
» cy
33: 4> l-out=X l-in=0 Li=0000XXXX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 33
» set l-in:0
» cy
34: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 34
» set l-in:0
» cy
35: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 35
» set l-in:0
» cy
36: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 36
» set l-in:0
» cy
37: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 37
» set l-in:1
» cy
38: 4> l-out=X l-in=1 Li=XXXX0000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 38
» set l-in:0
» cy
39: 4> l-out=X l-in=0 Li=XXX0000X c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 39
» set l-in:0
» cy
40: 4> l-out=X l-in=0 Li=XX0000XX c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 40
» set l-in:0
» cy
41: 4> l-out=X l-in=0 Li=00000XXX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 41
» set l-in:0
» cy
42: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 42
» set l-in:0
» cy
43: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 43
» set l-in:0
» cy
44: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 44
» set l-in:0
» cy
45: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 45
» set l-in:0
» cy
46: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 46
» set l-in:1
» cy
47: 4> l-out=X l-in=1 Li=XXX00000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 47
» set l-in:0
» cy
48: 4> l-out=X l-in=0 Li=XX00000X c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 48
» set l-in:0
» cy
49: 4> l-out=X l-in=0 Li=000000XX c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 49
» set l-in:0
» cy
50: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 50
» set l-in:0
» cy
51: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 51
» set l-in:0
» cy
52: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 52
» set l-in:0
» cy
53: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 53
» set l-in:0
» cy
54: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 54
» set l-in:0
» cy
55: 4> l-out=X l-in=0 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 55
» set l-in:1
» cy
56: 4> l-out=X l-in=1 Li=XX000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 56
» set l-in:0
» cy
57: 4> l-out=X l-in=0 Li=0000000X c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 57
» set l-in:1
» cy
58: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 58
» set l-in:1
» cy
59: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 59
» set l-in:1
» cy
60: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 60
» set l-in:1
» cy
61: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 61
» set l-in:1
» cy
62: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 62
» set l-in:1
» cy
63: 4> l-out=X l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 63
» set l-in:1
» cy
64: 4> l-out=X l-in=1 Li=X1000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
» ;
» ;cy 64
» set l-in:1
» cy 150
; A row vector L1 is generated at cy 65 after inputting necessary data.
; Since c8 gets 1, FSM stops counting, in that c8 remains 1, and the
; computation enters iteration entrance at cy 65. The new vector updates
; the current L1 vector at cy 66.
65: 4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
66: 4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
67: 4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
68: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
69: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
70: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
71: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
72: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
73: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
74: 4> l-out=0 l-in=1 Li=00100000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
75: 4> l-out=0 l-in=1 Li=01000000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
76: 4> l-out=0 l-in=1 Li=01000000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
; Same situation as above, once c8 gets a 1, FSM stops (at cy 75) counting
; of c8 and updating current vector L2 (at cy 76). The relaxation iteration
; keeps going on in such way as shown in the following file.
77: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
78: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
79: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
80: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
81: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
82: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
83: 4> l-out=0 l-in=1 Li=00010000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
84: 4> l-out=0 l-in=1 Li=00100000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
85: 4> l-out=0 l-in=1 Li=00100000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
86: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
87: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
88: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
89: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
90: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
91: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
92: 4> l-out=0 l-in=1 Li=00001000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
93: 4> l-out=0 l-in=1 Li=00010000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
94: 4> l-out=0 l-in=1 Li=00010000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
95: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
96: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
97: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
98: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
99: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
100: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
101: 4> l-out=0 l-in=1 Li=00000100 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
102: 4> l-out=0 l-in=1 Li=00001000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
103: 4> l-out=0 l-in=1 Li=00001000 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
104: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
105: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
106: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
107: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
108: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
109: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
110: 4> l-out=0 l-in=1 Li=00000010 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
111: 4> l-out=0 l-in=1 Li=00000100 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
112: 4> l-out=0 l-in=1 Li=00000100 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
113: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
114: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
115: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
116: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
117: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
118: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
119: 4> l-out=0 l-in=1 Li=00000001 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
120: 4> l-out=0 l-in=1 Li=00000010 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
121: 4> l-out=0 l-in=1 Li=00000010 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
122: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
123: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
124: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
125: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
126: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
127: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
128: 4> l-out=0 l-in=1 Li=01000000 c8=0 all-eq=X ready=1 data-ok=0 to-cpu=0
129: 4> l-out=1 l-in=1 Li=00000001 c8=1 all-eq=X ready=1 data-ok=0 to-cpu=0
130: 4> l-out=0 l-in=1 Li=00000001 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
; The all-eq signal is set to 0 since the 8-bit state register has been loaded
; with 8 definite data. The iteration still keeps going on until an all-eq = 1
; has appeared.
131: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
132: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
133: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
134: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
135: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
136: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
137: 4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
138: 4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
139: 4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
140: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
141: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
142: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
143: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
144: 4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
145:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
146:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
147:4> l-out=0 l-in=1 Li=01000000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
148:4> l-out=0 l-in=1 Li=01000000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
149:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
150:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
151:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
152:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
153:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
154:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
155:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
156:4> l-out=0 l-in=1 Li=00100000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
157:4> l-out=0 l-in=1 Li=00100000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
158:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
159:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
160:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
161:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
162:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
163:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
164:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
165:4> l-out=0 l-in=1 Li=00010000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
166:4> l-out=0 l-in=1 Li=00010000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
167:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
168:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
169:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
170:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
171:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
172:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
173:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
174:4> l-out=0 l-in=1 Li=00001000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
175:4> l-out=0 l-in=1 Li=00001000 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
176:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
177:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
178:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
179:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
180:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
181:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
182:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
183:4> l-out=0 l-in=1 Li=00000100 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
184:4> l-out=0 l-in=1 Li=00000100 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
185:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
186:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
187:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
188:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
189:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
190:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
191:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
192:4> l-out=0 l-in=1 Li=00000010 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
193:4> l-out=0 l-in=1 Li=00000010 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
194:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
195:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
196:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
197:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
198:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
199:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
200:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=0 ready=1 data-ok=0 to-cpu=0
201:4> l-out=0 l-in=1 Li=00000001 c8=1 all-eq=0 ready=1 data-ok=0 to-cpu=0
202:4> l-out=0 l-in=1 Li=00000001 c8=1 all-eq=1 ready=1 data-ok=0
203:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
204:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
205:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
206:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
207:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
208:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
209:4> l-out=1 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0
210:4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=1 ready=1 data-ok=1
; OK, a 1 on all-eq is now reached at cy 210. DRA2 sends a data-ok signal
; and then waits for a signal from the CPU.
211:4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=1 ready=1 data-ok=1
212:4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=1 ready=1 data-ok=1
; A to-cpu signal is sent to DRA2, which initiates the data transfer from DRA2
; to the CPU. Each of the following eight groups of 8 cycles outputs one of
; the 8 row vectors of the L matrix. Look at l-out:
sim> cy 8
215:4> l-out=1 l-in=1 Li=10000000 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
216:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
217:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
218:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
219:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
220:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
221:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
222:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
; This result means that the first row vector of the L matrix is L1=10000000.
223:4> l-out=0 l-in=1 Li=01000000 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
224:4> l-out=1 l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
225:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
226:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
227:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
228:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
229:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
230:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
; The second row vector is L2=01000000.
sim> cy 8
231:4> l-out=0 l-in=1 Li=00100000 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
232:4> l-out=0 l-in=1 Li=01000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
233:4> l-out=1 l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
234:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
235:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
236:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
237:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
238:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
239:4> l-out=0 l-in=1 Li=00010000 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
240:4> l-out=0 l-in=1 Li=00100000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
241:4> l-out=0 l-in=1 Li=01000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
242:4> l-out=1 l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
243:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
244:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
245:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
246:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
; L4=00010000.
247:4> l-out=0 l-in=1 Li=00001000 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
248:4> l-out=0 l-in=1 Li=00010000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
249:4> l-out=0 l-in=1 Li=00100000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
250:4> l-out=0 l-in=1 Li=01000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
251:4> l-out=1 l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
252:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
253:4> l-out=0 l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
255:4> l-in=1 Li=00000100 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
256:4> l-in=1 Li=00001000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
257:4> l-in=1 Li=00010000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
258:4> l-in=1 Li=00100000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
259:4> l-in=1 Li=01000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
260:4> l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
261:4> l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
262:4> l-in=1 Li=00000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
; L6=00000100.
263:4> l-in=1 Li=00000010 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
264:4> l-in=1 Li=00000100 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
265:4> l-in=1 Li=00001000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
266:4> l-in=1 Li=00010000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
267:4> l-in=1 Li=00100000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
268:4> l-in=1 Li=01000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
269:4> l-in=1 Li=10000000 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
271:4> l-out=0 l-in=1 Li=00000001 c8=1 all-eq=1 ready=1 data-ok=0 to-cpu=1
272:4> l-out=0 l-in=1 Li=00000011 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
273:4> l-out=0 l-in=1 Li=00000111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
274:4> l-out=0 l-in=1 Li=00001111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
275:4> l-out=0 l-in=1 Li=00011111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
276:4> l-out=0 l-in=1 Li=00111111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
277:4> l-out=0 l-in=1 Li=01111111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
278:4> l-out=1 l-in=1 Li=11111111 c8=0 all-eq=1 ready=1 data-ok=0 to-cpu=1
; L8=00000001.
sim> q
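
The readout phase of the trace can be decoded mechanically. The following Python sketch is illustrative only and is not part of the original simulation session: the function names `parse_record` and `decode_rows` are hypothetical, and the sample records are the cycle 215 to 230 lines of the trace with signal values written as digits. It collects the l-out bit of every cycle in which to-cpu=1 and groups the bits eight at a time, so that each group is one row vector of the L matrix.

```python
import re

# One trace record looks like "215:4> l-out=1 l-in=1 Li=10000000 ...".
# Comment lines (starting with ';') and "sim>" commands do not match.
RECORD = re.compile(r"^\s*(\d+):\d+>\s+(.*)$")


def parse_record(line):
    """Return (cycle, {signal: value}) for a trace record, else None."""
    m = RECORD.match(line)
    if m is None:
        return None
    fields = dict(tok.split("=", 1) for tok in m.group(2).split() if "=" in tok)
    return int(m.group(1)), fields


def decode_rows(lines, width=8):
    """Group the l-out bits of all to-cpu=1 cycles into width-bit rows."""
    bits = []
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        _, fields = rec
        if fields.get("to-cpu") == "1" and fields.get("l-out") in ("0", "1"):
            bits.append(fields["l-out"])
    # Only complete groups of `width` bits are returned.
    return ["".join(bits[i:i + width])
            for i in range(0, len(bits) - width + 1, width)]


TAIL = "all-eq=1 ready=1 data-ok=0 to-cpu=1"
TRACE = [
    "sim> cy 8",
    f"215:4> l-out=1 l-in=1 Li=10000000 c8=1 {TAIL}",
    *[f"{c}:4> l-out=0 l-in=1 Li=00000000 c8=0 {TAIL}" for c in range(216, 223)],
    "; This result means that the first row vector of L matrix is L1=10000000.",
    f"223:4> l-out=0 l-in=1 Li=01000000 c8=1 {TAIL}",
    f"224:4> l-out=1 l-in=1 Li=10000000 c8=0 {TAIL}",
    *[f"{c}:4> l-out=0 l-in=1 Li=00000000 c8=0 {TAIL}" for c in range(225, 231)],
]

print(decode_rows(TRACE))  # ['10000000', '01000000']
```

The two decoded strings agree with the annotations L1=10000000 and L2=01000000 in the trace. Records whose l-out field is missing are skipped rather than guessed.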
