Design of microprocessor-based hardware for number theoretic transform implementation by Shamim, Anwar Ahmed
Durham E-Theses
How to cite:
Shamim, Anwar Ahmed (1983) Design of microprocessor-based hardware for number theoretic transform
implementation, Durham theses, Durham University. Available at Durham E-Theses Online:
http://etheses.dur.ac.uk/7213/
Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or
charge, for personal research or study, educational, or not-for-proﬁt purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in Durham E-Theses
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.
Please consult the full Durham E-Theses policy for further details.
Academic Support Oﬃce, Durham University, University Oﬃce, Old Elvet, Durham DH1 3HP
e-mail: e-theses.admin@dur.ac.uk Tel: +44 0191 334 6107
http://etheses.dur.ac.uk
DESIGN OF MICROPROCESSOR-BASED HARDWARE FOR
NUMBER THEORETIC TRANSFORM IMPLEMENTATION
by 
Anwar Ahmed Shamim B.Sc., M.Sc. 
The copyright of this thesis rests with the author.
No quotation from it should be published without
his prior written consent and information derived
from it should be acknowledged.
A thesis submitted in accordance with the regulations
for the degree of Doctor of Philosophy at the University
of Durham, Department of Applied Physics and Electronics.
1983
Design of Microprocessor-Based Hardware for 
Number Theoretic Transform Implementation 
Anwar Ahmed Shamim 
ABSTRACT 
Number Theoretic Transforms (NTTs) are defined in a finite
ring of integers $Z_M$, where M is the modulus. All the
arithmetic operations are carried out modulo M. NTTs are similar
in structure to DFTs, hence fast FFT type algorithms may be used
to compute NTTs efficiently. A major advantage of the NTT is
that it can be used to compute error free convolutions: unlike
the FFT, it is not subject to round off and truncation errors.
In 1976 Winograd proposed a set of short length DFT
algorithms using fewer multiplications and
approximately the same number of additions as the Cooley-Tukey
FFT algorithm. This saving is accomplished at the expense of
increased algorithm complexity. These short length DFT
algorithms may be combined to perform longer transforms.
The Winograd Fourier Transform Algorithm (WFTA) was 
implemented on a TMS9900 microprocessor to compute NTTs. Since
multiplication conducted modulo M is very time consuming, a
special purpose external hardware modular multiplier was
designed, constructed and interfaced with the TMS9900
microprocessor. This external hardware modular multiplier
improved the transform execution time.
Computation time may further be reduced by employing several 
microprocessors. Taking advantage of the inherent parallelism of 
the WFTA, a dedicated parallel microprocessor system was designed 
and constructed to implement a 15-point WFTA in parallel. 
Benchmark programs were written to choose a suitable 
microprocessor for the parallel microprocessor system. A master
or host microprocessor is used to control the parallel
microprocessor system and provides an interface to the outside
world. An analogue to digital (A/D) and a digital to analogue
(D/A) converter allow real time digital signal processing.
ACKNOWLEDGEMENTS
I owe my unbound gratitude to Dr. B. J. Stanier for his 
guidance, constructive criticism, invaluable suggestions and 
kindly help throughout the period of this project. 
I should like to thank Prof. G. G. Roberts for allowing me 
to use the facilities of the Department of Applied Physics and 
Electronics. I am grateful to my colleagues for valuable 
discussions. I am also thankful to the members of
the workshop for providing the necessary technical
assistance.
I should also like to thank the computer unit and the 
library staff for their co-operation. 
I greatly acknowledge the moral and financial support I 
received from my parents, brothers, sisters, Mr. A. Khan and 
family and H. Khan. I also extend my appreciation to 
Prof. N. A. Khan and other friends for their moral support. 
My special thanks are due to Dr. M. Ahmed and the South 
Fields Trust, London for providing the financial support. 
Dedicated To My Affectionate Parents 
Who Inspired Me To Higher Ideals Of Life 
ABSTRACT
ACKNOWLEDGEMENTS
CHAPTER 1
Introduction
CHAPTER 2
Elementary Number Theory and Number Theoretic Transforms
2.1 Introduction
2.2 Discrete Fourier Transform and the Convolution
2.3 Congruence
2.4 Chinese Remainder Theorem (CRT)
2.5 Groups, Rings, and Fields
2.6 Number Theoretic Transforms
2.6.1 Mersenne Number Transforms
2.6.2 Fermat Number Transforms
CHAPTER 3
Multiplication Techniques For Microprocessors
3.1 Introduction
3.2 Clocked Multiplication Algorithms
3.2.1 Multiplication on a Microprocessor
3.2.2 Burks-Goldstine-von Neumann Method
3.2.3 Robertson's First Method
3.2.4 Robertson's Second Method
3.2.5 Booth's Algorithm
3.2.6 A Short Cut Multiplication Method
3.2.7 Multiple Digit Multiplication Method
3.3 Clockless Multiplication
3.3.1 Array or Parallel Multiplication
3.4 Read Only Memory (ROM) Multiplier
3.4.1 Direct ROM Multiplier
3.4.2 Quarter-Squares Lookup Table Multiplication
3.4.3 Multiplication Using Logarithms
3.5 Parallel Multiplier Chips
3.6 Modular Arithmetic on Microprocessor
3.6.1 Addition Modulo 65521
3.6.2 Subtraction Modulo 65521
3.6.3 Multiplication Modulo 65521
CHAPTER 4
Implementation of the Winograd Fourier Transform Algorithm
4.1 Introduction
4.2 Computation of NTT using WFTA
4.2.1 Determination of the Constants for the WFTA
4.3 Architecture of the TMS9900 Microprocessor
4.4 Implementation on the Microprocessor
CHAPTER 5
External Hardware Modular Multiplier
5.1 Introduction
5.2 Design and Implementation of an External Hardware Modular Multiplier
5.2.1 Interfacing Considerations
5.2.2 Interfacing the Modular Multiplier with the TMS9900 Microprocessor
5.3 Results
CHAPTER 6
Multi Processor and Parallel Processor Systems
6.1 Introduction
6.2 System Organisation
6.2.1 Single Instruction Single Data (SISD) Machine
6.2.2 Single Instruction Multiple Data (SIMD) Machine
6.2.3 Multiple Instruction Multiple Data (MIMD) Machine
6.2.4 Multiple Instruction Single Data (MISD) Machine
6.3 Multi Processor Systems
6.3.1 Directly Coupled Multi Processor Systems
6.3.2 Indirectly Coupled Multi Processor Systems
6.4 Inter Processor Communication
6.4.1 Time-Shared Bus
6.4.2 Dedicated Link
6.5 Parallel Processor Systems
6.6 Array Processors
6.7 Processor - Memory Interconnection
6.8 Computer Systems
6.8.1 Ring Structure
6.8.2 Star Link
6.8.3 Fully Connected Link
CHAPTER 7
A Dedicated Parallel Microprocessor System
7.1 Introduction
7.2 Choice of a Microprocessor
7.3 Architecture of the MC6809 Microprocessor
7.3.1 Hardware and Software Interrupts
7.3.2 Microprocessor Synchronisation
7.4 Inter Microprocessor Communication
7.5 Dual Microprocessor System
7.5.1 Merits and Demerits
7.6 Design and Implementation of the Dedicated Parallel Microprocessor System
7.6.1 System Architecture
7.6.2 Design of the Control Microprocessor
7.6.3 Software of the Control Microprocessor
7.6.4 Design of a Typical Slave Microprocessor
7.6.5 Software of the Slave Microprocessors
7.6.6 Synchronisation of the Hardware and the Software
7.7 Transforms of Real Time Signals
7.8 Results
CHAPTER 8
Conclusion
Appendix-A
Modular Arithmetic Routines for the following
microprocessors:
TMS9900, MC6809, Z80, 6502
32/16-bit Divide Routine for the MC6809
Appendix-B
Assembler Source Listing for a 15-point WFTA (TMS9900)
FORTRAN Source Listing for a 15-point WFTA
Appendix-C
FORTH Source Listing for a 60-point WFTA
Appendix-D
Software of the Parallel Microprocessor System
Assembler Source Listing for 15-point WFTA (MC6809)
Appendix-E
Backplane Wiring for the Parallel Microprocessor System
REFERENCES
CHAPTER 1
Introduction
The aim of this work was to design hardware to facilitate
the implementation of the Winograd Fourier Transform Algorithm
(WFTA) to compute Number Theoretic Transforms (NTTs) on
microprocessors.
Microprocessors are easy to use and provide cheap
integer processing power. In recent years there has been a major
breakthrough in solid state technology, which is responsible
for providing highly reliable hardware.
Cooley and Tukey (8) described a fast and efficient method
to compute the Discrete Fourier Transform (DFT) via the Fast
Fourier Transform (FFT) algorithm (2). The FFT is subject to
truncation and round off errors, since it involves
multiplications with complex irrational roots of unity, which
cannot be represented accurately on a finite precision machine.
Number Theoretic Transforms on the other hand have a similar
structure to DFTs, and are defined in a finite ring of integers
$Z_M$, where M is the modulus. All the arithmetic operations are
carried out modulo M. Fast FFT type algorithms may also be used
to compute NTTs without round off errors (9) - (15), (20), (80).
The results thus obtained are exact.
Winograd (3), (4) proposed short length DFT algorithms
which show improvement over the conventional FFT algorithm. The
WFTA requires fewer multiplications, and roughly the same number
of additions as the Cooley-Tukey FFT algorithm. In the FFT the
transform length is restricted to powers of 2, but in the WFTA
the transform length is the product of several mutually prime
factors. These mutually prime factors are chosen from the short
length (small-N) WFTA. Transform lengths from 2 to 5040 may be
implemented. Implementation of the WFTA requires some constants
to be precomputed and stored, so it needs more memory than an
FFT of comparable length (51). The WFTA requires
fewer multiplications, but at the expense of increased algorithm
complexity and more data transfers (52).
Martin (5), (6), carried out a search for a suitable modulus 
M for 16-bit arithmetic on the lines described by Bailey (53), 
and found that M = 65521 is suitable for NTT implementation. 
Agarwal and Burrus (9), have shown that the transform lengths are 
subject to certain constraints. 
1- N must divide O(M), where O(M) is the greatest common divisor
(g.c.d.) of the numbers $(p_i - 1)$, the $p_i$ being the prime
divisors of M:
$$O(M) = \gcd_i\,(p_i - 1)$$
2- An element $\alpha$ of order N must exist such that
$\alpha^N \equiv 1 \bmod M$, $\alpha^r \not\equiv 1 \bmod M$, $\forall\; 0 < r < N$.
3- $N^{-1}$ must exist in the ring $Z_M$, i.e.
$N \cdot N^{-1} \equiv 1 \bmod M$. If M is not prime,
then $N^{-1}$ may or may not exist.
4- N must be well factored for fast transform algorithms
to exist.
5- To implement fast and simple arithmetic mod M, M and
$\alpha$ must have simple binary representation.
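As an illustration, constraints 1 to 3 can be checked numerically for the modulus M = 65521 and the transform length N = 15 used in this work. The following Python sketch is not taken from the thesis; the helper names (`prime_factors`, `max_length`, `order`, `find_root`) are hypothetical and the brute-force search is only intended for small moduli.

```python
from math import gcd


def prime_factors(m):
    """Return the distinct prime factors of m by trial division."""
    factors, d = [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def max_length(m):
    """O(M): g.c.d. of (p_i - 1) over the prime divisors p_i of M."""
    g = 0
    for p in prime_factors(m):
        g = gcd(g, p - 1)
    return g


def order(a, m):
    """Multiplicative order of a mod m (assumes gcd(a, m) == 1)."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k


def find_root(n, m):
    """Smallest element of order n mod m, or None if there is none."""
    for a in range(2, m):
        if gcd(a, m) == 1 and pow(a, n, m) == 1 and order(a, m) == n:
            return a
    return None


M, N = 65521, 15
o_m = max_length(M)        # constraint 1: N must divide O(M)
alpha = find_root(N, M)    # constraint 2: an element of order N
n_inv = pow(N, -1, M)      # constraint 3: N^-1 mod M (Python 3.8+)
print(o_m % N == 0, alpha is not None, (N * n_inv) % M == 1)
```

Constraints 4 and 5 are design judgements rather than tests: N = 15 = 3 x 5 is a product of mutually prime short lengths, which is what the WFTA needs.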
No attempt has been made to compare the WFTA and the FFT nor
to derive any of the algorithms. Martin (5) has discussed
these topics in detail. Here we will place more emphasis on the
hardware design and implementation to compute NTTs via the WFTA.
McClellan and Rader (7) provide good references for the NTT and
the WFTA.
In chapter 2 basic number theory and Number Theoretic 
Transforms, and some fundamental concepts about rings, fields, 
and modular arithmetic are described. A brief discussion about 
Mersenne Number Transforms (MNTs) and Fermat Number Transforms
(FNTs) is also presented.
Chapter 3 describes different algorithms for signed and 
unsigned multiplication suitable for microprocessors. 
Multiplication using ROM lookup is also described; this method
provides a fast way of multiplying two numbers. However, the
applications may be limited since the size of the ROM increases
rapidly as the size of the input numbers increases. Fast
multiplier chips are now available which may replace several
discrete components. Finally 16-bit modular arithmetic
operations for a microprocessor are described.
Chapter 4 describes a step by step approach towards the 
implementation of the WFTA to compute the NTT. The WFTA was 
implemented on the TMS9900 microprocessor (54), (55), using
Assembler and FORTH (56), (57), languages. The WFTA was also
implemented on the MC6809 microprocessor (78), (79), using
Assembler language, and in FORTRAN and Assembler on IBM mainframe
computers (370/168 and 370/4341).
The total transform execution time on a processor depends 
upon the number of operations and the time required to execute 
each operation. Ordinary microprocessors do not have hardware
multiplication, and even microprocessors with hardware multiply
require a considerable amount of time for multiplication.
Modular arithmetic operations, and in particular modular
multiplication, are very slow. Chapter 5 describes a special
purpose (16 x 16-bit) external hardware modular multiplier (mod
65521) interfaced with the TMS9900 microprocessor. This modular
multiplier behaves as an intelligent memory mapped peripheral.
We shall use the term modular for the results reduced modulo M.
This external modular multiplier uses multiplier chips and ROM
lookup techniques to generate the modular product. Finally,
timings for the implementation of the WFTA with and
without the external hardware modular multiplier are
compared.
Chapter 6 provides prerequisite information and describes
some of the basic concepts of parallel and multi processor
systems. In addition, inter processor communication, array
processors and processor to memory interconnection are also
described.
The difficulty with the uni processor
implementation of the WFTA is that it requires more data
transfers and indexing in the memory to acquire data (52). Since
the WFTA exhibits parallelism in its structure, the possibility
of parallel implementation of the WFTA was investigated. Chapter
7 describes the design and construction of a parallel
microprocessor system to implement a 15-point WFTA.
Benchmark programs were written to choose a suitable 
microprocessor for the design of a parallel microprocessor 
system. Motorola's MC6809 microprocessor proved the best choice
among several microprocessors. To investigate the principle of
data exchange between two microprocessors, a two
microprocessor system (using the MC6809) was designed and tested.
The TMS9900 microprocessor was used as a host processor. 
Since the modular multiplication is the most time consuming
operation, the parallel microprocessor system was designed such
that each of the microprocessors is loaded equally during the
modular multiplication. A control or master microprocessor is
used to control the parallel structure. The control
microprocessor provides communication between the parallel
microprocessor system and the outside world. Inter
microprocessor communication is through dedicated latches. The
system configuration is that of a master and slave; all the
input/output (I/O) data passes through the master microprocessor.
The system design is described, and the timings for parallel
and uni processor implementation of the 15-point WFTA are
discussed. Finally a 15-point convolution was also implemented on
the parallel microprocessor system. The software development is 
the bottleneck of the parallel microprocessor system. 
It was found that the execution time of a 15-point WFTA on 
the parallel microprocessor system is comparable with the 
execution time on IBM mainframe computers. 
Software routines are listed in appendix-A to appendix-D. 
Appendix-E contains backplane wiring connections for the parallel 
microprocessor system. 
Fully documented program listings appearing in the
appendices A - D are available in a separate folder.
CHAPTER 2
Elementary Number Theory and Number Theoretic Transforms
2.1 Introduction
The Discrete Fourier Transform (DFT) of a sequence x(n) is
given by:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W^{nk} \tag{2.1}$$
where k = 0,1,2,...,N-1. The Inverse Discrete Fourier Transform
(IDFT) is given by:
$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\, W^{-nk} \tag{2.2}$$
where n = 0,1,2,...,N-1, and $W = e^{-j2\pi/N}$, $j = \sqrt{-1}$.
$W_N$ (usually written as W) is the principal root of unity such
that $W^N = 1$, where N is the sequence length.
Direct computation of equation (2.1) requires $N^2$ complex
operations. A complex operation is a multiplication followed by
an addition. On a digital computer multiplication of two numbers
requires more computation time than the addition of two numbers.
The multiplication time depends entirely on the software and the
hardware available. To improve the efficiency and to compute
equation (2.1) faster, the number of multiplications must be
reduced. Various algorithms are available which are more
efficient than the direct computation of equation (2.1).
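The operation count of the direct method can be seen from a literal evaluation of equation (2.1). The following Python sketch is an illustration only and is not one of the thesis routines; the function name `dft_direct` and the counter `ops` are assumptions made here.

```python
import cmath


def dft_direct(x):
    """Evaluate X(k) = sum_n x(n) W^(nk) with W = exp(-j*2*pi/N).

    Returns the transform and the number of complex operations
    (one multiplication followed by one addition) performed.
    """
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    ops = 0
    out = []
    for k in range(n_pts):
        acc = 0j
        for n in range(n_pts):
            acc += x[n] * w ** (n * k)   # one complex operation
            ops += 1
        out.append(acc)
    return out, ops


# A constant sequence transforms to a single non-zero term at k = 0.
spectrum, ops = dft_direct([1, 1, 1, 1])
print(ops)  # N * N operations for N = 4
```

The two nested loops over k and n make the $N^2$ growth explicit; the results are floating point values and therefore carry the round off error discussed later in this chapter.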
In 1965 Cooley and Tukey (8) presented their FFT (Fast
Fourier Transform) algorithm. This algorithm efficiently
computes the DFT given by equation (2.1). The number of complex
operations is reduced from $N^2$ to $N \log_2 N$. This fractional
saving of $N / \log_2 N$ becomes quite appreciable for sequence
lengths greater than N = 32. The algorithm requires
N to be highly composite and a power of 2, such that $N = 2^m$,
where m is a positive integer. Reference (2) provides the
theoretical development of the FFT algorithm in detail.
The Fourier Transforms are complex in general. The 
computation of equation (2.1) using the FFT requires 
multiplications with complex irrational roots of unity. These 
irrational roots cannot be represented accurately on a finite 
precision machine. The FFT is subject to cumulative roundoff and
truncation errors. This gives rise to noise at the output of
a digital signal processing system, thus degrading the
signal-to-noise ratio.
2.2 Discrete Fourier Transform and the Convolution
A common problem in digital signal processing is the
implementation of convolution, which is defined by:
$$y(n) = \sum_{i=0}^{N-1} x(i)\, h(n-i) \tag{2.3}$$
where n = 0,1,2,...,N-1, and y(n) is the convolution of the two
sequences x(n) and h(n). Direct implementation of convolution by
using equation (2.3) is not efficient. However, the Discrete Fourier
Transform (DFT) can be used to compute convolution efficiently.
Certain transforms possess the Cyclic Convolution Property (CCP),
which may be represented as follows:
$$T(y) = T(h) \cdot T(x) \tag{2.4}$$
where $\cdot$ denotes pointwise multiplication. The inverse of
equation (2.4) is given by:
$$y = T^{-1}\left[\, T(h) \cdot T(x) \,\right] \tag{2.5}$$
So a cyclic (circular) convolution may be performed by taking the
inverse transform ($T^{-1}$) of the product of the transforms of the
two sequences to be convolved.
Let X(k) and H(k) be the Fourier transforms of the sequences
x(i) and h(i) respectively. Then from equation (2.5) we have:
$$y(n) = N^{-1} \sum_{k=0}^{N-1} H(k)\, X(k)\, W^{-nk} \tag{2.6}$$
Substituting the value of X(k) in equation (2.6) we get,
$$y(n) = N^{-1} \sum_{k=0}^{N-1} H(k) \sum_{i=0}^{N-1} x(i)\, W^{ik}\, W^{-nk}$$
$$= \sum_{i=0}^{N-1} x(i)\; N^{-1} \sum_{k=0}^{N-1} H(k)\, W^{-k(n-i)}$$
$$= \sum_{i=0}^{N-1} x(i)\, h(n-i)$$
To obtain an N point circular convolution, the sequence
h(n-i) must be periodically extended to have a period of N
if its length is less than N. Hence
$$y(n) = \sum_{i=0}^{N-1} x(i)\, h((n-i) \bmod N) = x(i) * h(i) \tag{2.7}$$
where * denotes convolution.
Equation (2.7) shows circular convolution. It is so called
since it evaluates y(n) as if the input sequence were
periodically extended outside the range [0, N-1]. This may
also be stated as follows: for cyclic convolution the indices are
evaluated mod N. If zeros are appended to the sequences so as to
avoid aliasing or overlapping, the cyclic convolution gives the
same results as conventional convolution. Convolution computed
via equation (2.5) is computationally efficient when the sequence
length is highly composite, so that FFT type algorithms can be
applied to it.
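The effect of appending zeros can be illustrated with a short Python sketch. It is a hedged example only; the function names and the test sequences are chosen here for illustration and do not come from the thesis.

```python
def cyclic_convolve(x, h):
    """Equation (2.7): indices of h are evaluated mod N."""
    n_pts = len(x)
    return [sum(x[i] * h[(n - i) % n_pts] for i in range(n_pts))
            for n in range(n_pts)]


def linear_convolve(x, h):
    """Conventional (aperiodic) convolution of two finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [1, 2, 3]
h = [4, 5, 6]

# Without padding, the tail of the result wraps around (aliasing).
wrapped = cyclic_convolve(x, h)

# With zeros appended up to len(x) + len(h) - 1 points there is no overlap.
length = len(x) + len(h) - 1
padded = cyclic_convolve(x + [0] * (length - len(x)),
                         h + [0] * (length - len(h)))

print(wrapped)                # [31, 31, 28]
print(padded)                 # [4, 13, 28, 27, 18]
print(linear_convolve(x, h))  # [4, 13, 28, 27, 18]
```

In the unpadded case the last two terms of the conventional result (27 and 18) are added back onto the first two (4 and 13), which is the overlap referred to above.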
2.3 Congruence 
Consider two integers a and b, with b positive. We can write
$$a = qb + r, \quad 0 \le r < b \tag{2.8}$$
where q represents the quotient and r the remainder. Equation
(2.8) basically represents a division operation. If the
remainder r = 0 then we say that b divides a, which is represented
as $b \mid a$. Every integer a has at least two
divisors, since $1 \mid a$ and $a \mid a$. If these are its only
divisors then a is a prime, with no divisors except 1 and
itself. If r = 0 for some other b then we say that a is composite,
a = qb. Either q or b or both can be prime or composite. For q and
b composite we can further factorise until we get the prime
factorisation, which is written as:
$$a = \prod_i p_i^{\,r_i}$$
where $p_i$ is a prime and $r_i$ is an integer exponent. In
equation (2.8) if b is a fixed number then it is called the
modulus. Then infinitely many values of a
have the same value of the remainder r. All these values of a
which give the same value of r are said to be congruent, and
congruence is denoted by $\equiv$. The remainder r is called the
residue mod b, or simply the residue. For example, let b = 5.
Then $7 \equiv 2 \bmod 5$, $12 \equiv 2 \bmod 5$, and
$17 \equiv 2 \bmod 5$. Numbers 7, 12, 17 are congruent
mod 5. In general we can write
$$a \equiv r \bmod b$$
or $b \mid (a - r)$;
also if $a \equiv 0 \bmod b$ then $b \mid a$. Some notations also
use angle brackets to represent the modulus, for example:
$\langle 12 \rangle_5$ and $\langle 13 + 8 \rangle_5$
The following conditions hold for congruence
$$\langle l + m \rangle_b = \langle \langle l \rangle_b + \langle m \rangle_b \rangle_b$$
$$\langle l - m \rangle_b = \langle \langle l \rangle_b - \langle m \rangle_b \rangle_b$$
$$\langle l\,m \rangle_b = \langle \langle l \rangle_b \langle m \rangle_b \rangle_b$$
The largest number which divides both a and b is called the
greatest common divisor (g.c.d.). If the two numbers a and b are
mutually prime, i.e. they have no common factor other than 1,
this is represented as (a,b) = 1,
for example (3,4) = 1, and (3,5) = 1, etc. However, if there is a
common divisor then, for example, (8,10) = 2.
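The residue and g.c.d. examples above can be reproduced with a few lines of Python. This is a minimal sketch for illustration; `residue` and `gcd_euclid` are names assumed here, and Euclid's algorithm is used for the g.c.d. although the text does not prescribe a method.

```python
def residue(a, b):
    """Residue of a mod b; Python's % is non-negative for b > 0."""
    return a % b


def gcd_euclid(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return a


b = 5
print([residue(a, b) for a in (7, 12, 17)])   # [2, 2, 2]

# The three congruence conditions, checked for l = 13, m = 8.
l, m = 13, 8
print(residue(l + m, b) == residue(residue(l, b) + residue(m, b), b))
print(residue(l - m, b) == residue(residue(l, b) - residue(m, b), b))
print(residue(l * m, b) == residue(residue(l, b) * residue(m, b), b))

print(gcd_euclid(3, 4), gcd_euclid(3, 5), gcd_euclid(8, 10))  # 1 1 2
```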
2.4 Chinese Remainder Theorem (CRT)
If the residue is known for several mutually prime moduli
then with the help of the Chinese Remainder Theorem (CRT) these
residues can be combined to give the result modulo the product of
all the mutually prime factors.
Let a set of simultaneous congruences be given for which
the moduli $m_i$ are relatively prime. For each i, $b_i$
is determined through linear congruences. The solution of
the set of congruences is given by:
$$Y = \sum_i a_i\, b_i\, (M/m_i) \bmod M \tag{2.9}$$
where $Y \equiv a_i \bmod m_i$, and the composite modulus M is given by:
$$M = \prod_i m_i \tag{2.10}$$
provided that the $m_i$ are relatively prime. The $b_i$ are defined
such that:
$$b_i\, (M/m_i) \equiv 1 \bmod m_i$$
For example, let $x \equiv 2 \bmod 3$, $x \equiv 2 \bmod 5$,
$x \equiv 4 \bmod 7$. To solve these simultaneous congruences we
first form the product of the mutually prime factors according to
(2.10). Hence
$$M = 3 \cdot 5 \cdot 7 = 105$$
Now from (2.9)
$$x = 2\, b_1\, (105/3) + 2\, b_2\, (105/5) + 4\, b_3\, (105/7) = 2 \cdot 35 \cdot b_1 + 2 \cdot 21 \cdot b_2 + 4 \cdot 15 \cdot b_3 \tag{2.11}$$
Now $b_1$, $b_2$, $b_3$ are determined such that
$$35\, b_1 \equiv 1 \bmod 3 \;\Rightarrow\; b_1 = 2$$
$$21\, b_2 \equiv 1 \bmod 5 \;\Rightarrow\; b_2 = 1$$
$$15\, b_3 \equiv 1 \bmod 7 \;\Rightarrow\; b_3 = 1$$
Substitution of these values in (2.11) gives
$$x = 70 \cdot 2 + 42 \cdot 1 + 60 \cdot 1 = 242 \equiv 32 \bmod 105$$
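The reconstruction in equation (2.9) may be sketched in Python as follows. The function name `crt` is an assumption made for this illustration, and the built-in three-argument `pow` with exponent -1 (Python 3.8 or later) is used to obtain each $b_i$; the thesis does not specify how the linear congruences are solved.

```python
from math import prod


def crt(residues, moduli):
    """Combine residues a_i mod m_i (m_i mutually prime) into Y mod M."""
    m_total = prod(moduli)              # M = product of the m_i, eq. (2.10)
    y = 0
    for a_i, m_i in zip(residues, moduli):
        big = m_total // m_i            # M / m_i
        b_i = pow(big, -1, m_i)         # b_i (M/m_i) = 1 mod m_i
        y += a_i * b_i * big            # one term of eq. (2.9)
    return y % m_total


# The worked example: x = 2 mod 3, x = 2 mod 5, x = 4 mod 7.
print(crt([2, 2, 4], [3, 5, 7]))   # 32
```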
2.5 Groups, Rings and Fields
Recall from the previous section that
$$a = b + Mc \tag{2.12}$$
where b is the remainder, c is an integer (quotient) and M the
modulus. Then (2.12) may be rewritten as
$$a \equiv b \bmod M \quad \forall\; a, b \in [1, M-1]$$
In a finite set [a,b,c,...,M-1] of integers all the elements are
taken modulo some integer M, called the modulus. Such a set is
denoted as $Z_M$. Let there be an operation * defined in $Z_M$,
then the following conditions hold.
1- Closure : $a * b \in Z_M \;\; \forall\; a, b \in Z_M$
2- Associative : $(a * b) * c = a * (b * c) \;\; \forall\; a, b, c \in Z_M$
3- Identity element : $a * I = I * a = a \;\; \forall\; a, I \in Z_M$
4- Inverse element : $a * a^{-1} = I \;\; \forall\; a, a^{-1} \in Z_M$
5- Commutative : $a * b = b * a \;\; \forall\; a, b \in Z_M$
where I represents an identity element and $a^{-1}$ is the
inverse of a. If the operation * is defined as ordinary addition
then property 4 represents subtraction, and for ordinary
multiplication it represents division.
If properties 1 to 4 hold then the set of integers $Z_M$ is
called a group under the operation *. A group which also obeys the
commutative law (property 5) is called an abelian group or a
commutative group. A group is called a cyclic group if all the
elements of the group can be generated from a single element; this
element is called a generating function. For example 1 is a
generating function under addition mod M. For a group $Z_M$ under
ordinary addition '+' and ordinary multiplication '.' operations, if
the following associative and distributive laws hold,
$$a \cdot (b + c) = a \cdot b + a \cdot c$$
$$a \cdot (b \cdot c) = (a \cdot b) \cdot c$$
$$(a + b) \cdot c = a \cdot c + b \cdot c$$
$\forall\; a, b, c \in Z_M$, then the group is called a ring.
Consider some examples of arithmetic mod 11; the elements in
the ring $Z_M$ are [0,1,2,...,10].
1- Addition : $5 + 8 = 13 \equiv 2 \bmod 11$
2- Negation : $-3 \equiv 11 + (-3) = 8 \bmod 11$
3- Subtraction : $3 - 7 \equiv 3 + (11 - 7) = 3 + 4 = 7 \bmod 11$
4- Multiplication : $5 \cdot 4 = 20 \equiv 9 \bmod 11$
5- Multiplicative inverse : $6 \cdot 2 = 12 \equiv 1 \bmod 11$
6 and 2 are multiplicative inverses of each other,
or $6^{-1} \equiv 2 \bmod 11$ or $2^{-1} \equiv 6 \bmod 11$
6- Division : a/b is defined if and only if $b^{-1}$ exists,
therefore, $a/b \equiv a \cdot b^{-1} \bmod M$
consider $9/2 \equiv 9 \cdot 6 = 54 \equiv 10 \bmod 11$
from property 5, 6 and 2 are inverses of each other.
The element 2 is an integer root of unity of order 10,
$$2^5 \equiv -1 \bmod 11$$
$$2^{10} \equiv 1 \bmod 11$$
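The six examples can be checked directly in Python. This is a small illustrative sketch, not one of the thesis routines; the variable names are assumptions, and `pow(b, -1, M)` (Python 3.8 or later) is used for the multiplicative inverse.

```python
M = 11

add = (5 + 8) % M          # 1- addition        -> 2
neg = (-3) % M             # 2- negation        -> 8
sub = (3 - 7) % M          # 3- subtraction     -> 7
mul = (5 * 4) % M          # 4- multiplication  -> 9
inv2 = pow(2, -1, M)       # 5- inverse of 2    -> 6
div = (9 * inv2) % M       # 6- division 9/2    -> 10

print(add, neg, sub, mul, inv2, div)

# 2 is a root of unity of order 10 mod 11.
print(pow(2, 5, M) == M - 1, pow(2, 10, M) == 1)
```

Note that Python's `%` operator already returns a non-negative residue for a positive modulus, so negation and subtraction need no explicit correction here; on a microprocessor the correction step (adding M) has to be coded explicitly.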
2.6 Number Theoretic Transforms
One group of transforms having the CCP are those with DFT
like structure. Let
X(k) = T[x(n)], so x(n) = $T^{-1}$[X(k)]
$$X(k) = \sum_{n=0}^{N-1} x(n)\, \alpha^{nk} \tag{2.13}$$
where k = 0,1,2,...,N-1.
The inverse is given by:
$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\, \alpha^{-nk} \tag{2.13a}$$
where $\alpha$ is an element of order N, and plays the same role as W in
equation (2.1). N is the least positive integer such that
$\alpha^N \equiv 1 \bmod M$, with $\alpha, N \in [0, M-1]$. NTTs use
modular arithmetic
and possess the CCP.
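Equations (2.13) and (2.13a) may be evaluated directly, as in the following Python sketch. The parameters M = 11, N = 5 and $\alpha$ = 4 are chosen here purely for illustration (4 has order 5 mod 11); they are not the parameters used in the thesis, and the function names are assumptions. This is the direct $N^2$ form, not a fast algorithm.

```python
def ntt(x, alpha, m):
    """Forward transform, eq. (2.13): X(k) = sum x(n) alpha^(nk) mod M."""
    n_pts = len(x)
    return [sum(x[n] * pow(alpha, n * k, m) for n in range(n_pts)) % m
            for k in range(n_pts)]


def intt(big_x, alpha, m):
    """Inverse transform, eq. (2.13a), using N^-1 and alpha^-1 mod M."""
    n_pts = len(big_x)
    n_inv = pow(n_pts, -1, m)
    alpha_inv = pow(alpha, -1, m)
    return [(n_inv * sum(big_x[k] * pow(alpha_inv, n * k, m)
                         for k in range(n_pts))) % m
            for n in range(n_pts)]


M, ALPHA = 11, 4            # illustrative values: 4 has order 5 mod 11
x = [1, 2, 3, 4, 5]
X = ntt(x, ALPHA, M)
print(X[0])                 # sum of the inputs mod 11 -> 4
print(intt(X, ALPHA, M))    # recovers [1, 2, 3, 4, 5] exactly
```

All intermediate values are integers, so the forward and inverse transforms return the input sequence exactly.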
Euler's function or Euler's totient function is defined as
the number of integers in the ring $Z_M$ which are relatively
prime to a given modulus M. This function is represented by
$\phi(M)$. If M is composite then $\phi(M) < M - 1$, but if M is
prime then $\phi(M) = M - 1$, for example $\phi(6) = 2$, and
$\phi(7) = 6$.
$$\phi(M) = M\,(1 - 1/p_1)(1 - 1/p_2) \cdots (1 - 1/p_r)$$
where $p_1, p_2, \ldots, p_r$ are the different primes dividing M.
Euler's theorem states that for any non zero element a in
the ring $Z_M$ which is relatively prime to M, (a,M) = 1, the
following congruence holds
$$a^{\phi(M)} \equiv 1 \bmod M$$
If M is prime then $\phi(M) = M - 1$ and Euler's theorem
reduces to Fermat's theorem given by:
$$a^{M-1} \equiv 1 \bmod M$$
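The totient function and Euler's theorem can be checked numerically for small moduli. The sketch below is an illustration only; the function names are assumptions, and both the counting definition and the product formula are implemented so that one can be checked against the other.

```python
from math import gcd


def totient_count(m):
    """phi(M) by counting the integers in [1, M-1] coprime to M (M > 1)."""
    return sum(1 for a in range(1, m) if gcd(a, m) == 1)


def totient_product(m):
    """phi(M) = M * prod(1 - 1/p) over the distinct primes p dividing M."""
    result, rest, p = m, m, 2
    while p * p <= rest:
        if rest % p == 0:
            result -= result // p       # multiply result by (1 - 1/p)
            while rest % p == 0:
                rest //= p
        p += 1
    if rest > 1:                        # one prime factor left over
        result -= result // rest
    return result


def euler_holds(m):
    """Check a^phi(M) = 1 mod M for every a coprime to M."""
    phi = totient_count(m)
    return all(pow(a, phi, m) == 1
               for a in range(1, m) if gcd(a, m) == 1)


print(totient_count(6), totient_count(7))       # 2 6
print(totient_product(6), totient_product(7))   # 2 6
print(euler_holds(6), euler_holds(7))           # True True
```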
The necessary and sufficient condition for the NTT with the
CCP to exist is that $N \mid O(M)$, where O(M) is the greatest common
divisor (g.c.d.) given by:
$$O(M) = \gcd\,(p_1 - 1,\; p_2 - 1,\; \ldots,\; p_r - 1) \tag{2.14}$$
Thus the maximum transform length is $N_{max} = O(M)$.
When the transforms in equations (2.13) and (2.13a) are
defined in a finite ring of integers with the CCP, they are known
as Number Theoretic Transforms (NTT) (7), (9) - (15), (80). In
NTTs all the arithmetic operations are conducted mod M. There
are several constraints between the modulus M and the transform
length N (9). Since the NTTs are similar in structure to the
DFTs, any algorithm which applies to the DFT can be applied to the
NTT. In other words an NTT is a DFT with the CCP defined in a
finite ring of integers under addition and multiplication. Such
a ring is denoted by $Z_M$. If the modulus M is a composite
number then not all the elements have multiplicative
inverses. Hence $Z_M$ is a field if and only if M is prime. If
$\alpha$ is of order $\phi(M)$ (where $\phi(M)$ is Euler's totient
function), then $\alpha$ is called the primitive root or the
generating function; the non-zero elements of $Z_M$ can be
generated by the powers of the primitive root.
The results obtained by NTTs are exact and are not subject 
to cumulative round off or truncation errors. For computing 
convolutions using NTTs, the choice of the modulus M has to be 
made first, then the corresponding N and a may be evaluated. 
In a ring of integers $Z_M$, integers may be represented
unambiguously if their absolute value is less than M/2. If the
two sequences to be convolved, x(n) and h(n), are scaled such that
|y(n)| never exceeds M/2, then the convolution in the ring of
integers mod M gives the same results as normal arithmetic. In
most practical applications the impulse response of a digital
system h(n) and the peak amplitude of the input signal x(n) are
usually known.
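The recovery of signed results from residues can be illustrated with a short Python sketch. The parameters M = 257, N = 8 and $\alpha$ = 4 are assumptions chosen for this example only (257 is prime and 4 has order 8 mod 257); they are not the thesis parameters, and the helper names are hypothetical. The sequences are zero padded so that the cyclic result equals the conventional convolution.

```python
M, N, ALPHA = 257, 8, 4     # illustrative: 4 has order 8 mod the prime 257


def ntt(x, alpha):
    """Direct N-point transform mod M with root alpha."""
    return [sum(x[n] * pow(alpha, n * k, M) for n in range(N)) % M
            for k in range(N)]


def convolve_mod(x, h):
    """Cyclic convolution via transform, pointwise product, inverse."""
    prod_t = [(a * b) % M for a, b in zip(ntt(x, ALPHA), ntt(h, ALPHA))]
    y = ntt(prod_t, pow(ALPHA, -1, M))
    n_inv = pow(N, -1, M)
    return [(n_inv * v) % M for v in y]


def to_signed(v):
    """Map a residue in [0, M-1] to the range (-M/2, M/2)."""
    return v - M if v > M // 2 else v


x = [1, -2, 3, 0, 0, 0, 0, 0]
h = [2, 1, -1, 0, 0, 0, 0, 0]
y_mod = convolve_mod(x, h)
print(y_mod)                            # residues, e.g. -3 appears as 254
print([to_signed(v) for v in y_mod])    # [2, -3, 3, 5, -3, 0, 0, 0]
```

Here every output satisfies |y(n)| < M/2, so the signed values agree with ordinary integer convolution; larger inputs would wrap around the modulus and give wrong answers without any warning.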
For efficient implementation of convolution using NTTs the
algorithm should be computationally efficient. Also N should be
highly composite and the modulus large enough to provide a large
dynamic range of numbers. By suitable choice of N, M and $\alpha$ it is
possible to define NTTs which can be computed efficiently. If N
is chosen to be a power of 2 the efficiency of the FFT algorithm
can be applied for computation. The binary representation of $\alpha$
should also be simple, so that the multiplication can be
performed with ease. For $\alpha$ = 2 or a power of 2 the
multiplications are reduced to bit shifts and adds.
Discrete convolution may also be obtained by either the Mersenne
Number Transform (MNT) or the Fermat Number Transform (FNT). These
transforms are special cases of Number Theoretic Transforms. The
multiplications in the MNT and FNT are reduced to circular bit shifts
within the word and adds (12), (13), (14), (24). On a digital
computer most of the computation time is taken by the
multiplication. The situation is even worse on a microprocessor
because ordinary microprocessors do not have hardware
multipliers. Software implementation of the modular
multiplication requires more time. An external hardware modular
multiplier may be implemented to facilitate modular
multiplication. Transforms which do not require
multiplications at all, such as the MNT and FNT, are therefore
computationally more efficient.
2.6.1 Mersenne Number Transforms
If the modulus is chosen to be a Mersenne number ($M_p$),
then the transforms defined in a ring with CCP are called
Mersenne Number Transforms (MNT). The Mersenne numbers are
defined as follows:
$$M_p = 2^p - 1$$
where p is prime. Mersenne numbers are of interest only if p is
prime.
Rader (12) has described a method for computing circular
convolution using Mersenne Number Transforms. The arithmetic to
compute the Mersenne transform requires only additions and circular
shifts of bits within the word. Circular convolution is computed
in a similar fashion as given by equation (2.5). Mersenne Number
Transforms provide error free convolution, since quantisation and
truncation have no meaning in the ring of integers. MNTs are
defined in a ring under addition and multiplication in which the
associative, commutative and distributive laws hold; division is
not always defined, since some numbers do not have
multiplicative inverses mod $M_p$ unless $M_p$ is prime.
Mersenne Number Transforms are defined in a set of p
integers (N = p):
$$X(k) = \sum_{n=0}^{N-1} x(n)\, 2^{nk} \bmod M_p \tag{2.15}$$
where k = 0,1,2,...,p-1.
Let q be defined as the inverse of p such that
$$q = M_p - (M_p - 1)/p$$
Then we have the solution
$$pq \equiv 1 \bmod M_p$$
if $(M_p - 1)/p$ is an integer. But
$$M_p - 1 = 2^p - 2$$
and
$$p \mid 2^p - 2.$$
This is a special case of Fermat's theorem which states that, for
every prime p and every integer q, $p \mid q^p - q$; this proves that
$(M_p - 1)/p$ is an integer. Since
$$pq = (p-1)\, M_p + 1 \equiv 1 \bmod M_p$$
the inverse transform is given by:
$$x(n) = q \sum_{k=0}^{N-1} X(k)\, 2^{-nk} \bmod M_p \tag{2.16}$$
where n = 0,1,2,...,p-1.
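Equations (2.15) and (2.16) can be tried out for a small Mersenne number. The sketch below uses p = 7, so that $M_p$ = 127 and q = 109; this choice and the function names are assumptions made for illustration, and the sums are evaluated directly rather than with shifts.

```python
P = 7
M_P = 2 ** P - 1                 # Mersenne number 127
Q = M_P - (M_P - 1) // P         # inverse of p mod M_p, here 109


def mnt(x):
    """Forward MNT, eq. (2.15), with alpha = 2 and N = p."""
    return [sum(x[n] * pow(2, n * k, M_P) for n in range(P)) % M_P
            for k in range(P)]


def imnt(big_x):
    """Inverse MNT, eq. (2.16): multiply by q instead of dividing by p."""
    two_inv = pow(2, -1, M_P)    # 2^-1 mod M_p (Python 3.8+)
    return [(Q * sum(big_x[k] * pow(two_inv, n * k, M_P)
                     for k in range(P))) % M_P
            for n in range(P)]


x = [1, 2, 3, 4, 5, 6, 7]
print(Q, (P * Q) % M_P)          # 109 1
print(imnt(mnt(x)))              # [1, 2, 3, 4, 5, 6, 7]
```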
To ease the computations $2^p$ (p is prime) may provide a
suitable modulus, but the transform length is restricted to 2p.
As 2p is not highly composite, it is not of much interest.
Consider the modulus $2^k + 1$: if k is odd the maximum transform
length is 2, since $3 \mid 2^k + 1$, hence k must be even (k = pq a composite
number). The other choice for the modulus is $2^p - 1$, where p is
prime and 2 represents a root of unity. This allows addition to be
performed by a simple 1s complement add. Multiplication mod $M_p$
is done by forming the 2p-bit product of two words, and adding the p
most significant bits to the p least significant bits (1s complement
addition). However, multiplication by $2^k$ mod $M_p$ is quite simple
to implement, requiring bit rotation in a p-bit word. The same is true for the
inverse transform except that the results must be multiplied by
the inverse q.
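The following Python sketch illustrates the transform pair (2.15) and (2.16) for a small Mersenne modulus, using bit rotation for the multiplications by powers of 2 and end-around carry for the additions. The function names and the choice p = 5 are assumptions made for illustration only; the pointwise product in the convolution uses an ordinary modular multiplication.

```python
def rotate_left(word, shift, p):
    """Rotate a p-bit word left: multiplication by 2**shift modulo 2**p - 1."""
    shift %= p
    mask = (1 << p) - 1
    return ((word << shift) | (word >> (p - shift))) & mask


def add_ones_complement(a, b, p):
    """Addition modulo 2**p - 1 with end-around carry (1s complement add)."""
    mask = (1 << p) - 1
    total = a + b
    total = (total & mask) + (total >> p)   # fold the carry into the low p bits
    return 0 if total == mask else total    # all ones also represents zero


def mnt(x, p, inverse=False):
    """Length-p Mersenne Number Transform with root 2 and modulus 2**p - 1."""
    modulus = (1 << p) - 1
    out = []
    for k in range(p):
        acc = 0
        for n in range(p):
            # 2**p = 1 mod M_p, so negative exponents reduce to rotations as well
            exponent = (-n * k) % p if inverse else (n * k) % p
            term = rotate_left(x[n] % modulus, exponent, p)
            acc = add_ones_complement(acc, term, p)
        out.append(acc)
    if inverse:
        q = modulus - (modulus - 1) // p    # inverse of p, as defined in the text
        out = [(q * value) % modulus for value in out]
    return out


def circular_convolve(x, h, p):
    """Circular convolution through the transform domain (illustrative helper)."""
    modulus = (1 << p) - 1
    product = [(a * b) % modulus for a, b in zip(mnt(x, p), mnt(h, p))]
    return mnt(product, p, inverse=True)
```

With p = 5 the modulus is 31, so the result is exact provided that every convolution output stays below 31.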
2.6.2 Fermat Number Transforms
If the modulus is chosen to be a Fermat number, then the
transform is called a Fermat Number Transform (FNT). Fermat
numbers are defined as:

$$M = F_t = 2^b + 1 \qquad (2.17)$$

where $b = 2^t$, t = 0,1,2,...

Fermat numbers $F_0$ to $F_4$ are prime and $F_5$ upwards are
composite. Then for the FNT to exist

$$N \mid O(F_t)$$

The largest possible transform length in this case is

$$O(F_t) = 2^b = N_{max}$$

so that $N = 2^m$ with $m \le b$.
If $\alpha = 2$ the FNT can be computed efficiently. The FNT of a
sequence is given by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, \alpha^{nk} \bmod F_t \qquad (2.18)$$

where k = 0,1,2,...,N-1, and the inverse is given by:

$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\, \alpha^{-nk} \bmod F_t \qquad (2.19)$$

where n = 0,1,2,...,N-1, N is a power of 2, and $\alpha$ is the Nth
root of unity, i.e. $\alpha^N \equiv 1 \bmod F_t$. In the case of the FNT the
multiplication is equivalent to bit shifts and add.
One of the constraints in the practical implementation of
the FNT is that the wordlength is defined by the transform length
(13). For a general $F_t$ (t > 4) the maximum transform length is
given by $N = 2^{t+2}$. Since $\alpha^2 \equiv 2 \bmod F_t$, $\alpha = \sqrt{2}$, the
transform length N = 4 x wordlength. For example, arithmetic mod
$F_2$ provides us with $6^2 \equiv 2 \bmod 17$, $6 \equiv \sqrt{2} \bmod 17$.
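As a numerical check of the example above, the sketch below evaluates equations (2.18) and (2.19) directly for $F_2 = 17$. It is a plain O(N^2) evaluation written for clarity rather than the shift-based arithmetic discussed in the text; the function names are illustrative assumptions, and the three-argument `pow` with a negative exponent assumes Python 3.8 or later.

```python
def fnt(x, alpha, modulus, inverse=False):
    """Direct evaluation of the FNT (2.18) or its inverse (2.19)."""
    n = len(x)
    root = pow(alpha, -1, modulus) if inverse else alpha
    out = [
        sum(x[i] * pow(root, i * k, modulus) for i in range(n)) % modulus
        for k in range(n)
    ]
    if inverse:
        n_inv = pow(n, -1, modulus)          # N^-1 mod F_t
        out = [(n_inv * value) % modulus for value in out]
    return out


def order(a, modulus):
    """Smallest k > 0 such that a**k = 1 mod modulus."""
    k, value = 1, a % modulus
    while value != 1:
        value = (value * a) % modulus
        k += 1
    return k
```

For $F_2 = 17$ (b = 4), the root 2 has order 8 = 2b, while the root 6 has order 16 = 4b, which agrees with N = 4 x wordlength.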
Equation (2.18) can be computed efficiently using an FFT type
algorithm. In the FNT multiplication is equivalent to a simple binary
word shift followed by subtraction. Leibowitz (14) has used a
slightly different approach for performing modular arithmetic mod
$F_t$. In the method of Agarwal and Burrus (13), problems arise due
to quantisation when b-bits are used for modular arithmetic.
This is due to the fact that $2^b \equiv -1$, hence when -1 is
encountered it is either rounded to 0 or 2. This introduces some
quantisation error. The method described by Leibowitz (14) uses
(b+1)-bits, where the extra bit is only used to represent 0.
McClellan (15) has described hardware to implement the
FNT. A different number representation is used in which the bits
are weighted +1, -1 and not as 0, 1 as in conventional binary
representation.
CHAPTER 3
Multiplication Techniques for Microprocessors 
3.1 Introduction 
We have seen in the previous chapter that the Number 
Theoretic Transforms (NTTs) are defined in a finite ring of 
integers $Z_M$. NTTs provide error free convolution (9), (12),
(13). Since all the numbers in the ring are defined precisely,
there is no ambiguity in their representation on a digital
computer. In contrast, real numbers cannot in general be
represented exactly in floating point, and floating point
arithmetic is subject to roundoff and truncation errors.
Ordinary microprocessors are integer processing machines and 
are available at much lower prices than the floating point 
arithmetic processors. A microprocessor provides cheap integer 
processing power. By appropriately manipulating the carry bit in 
the condition code register, the microprocessor is capable of 
performing multi-precision arithmetic, for example an 8-bit 
microprocessor can perform 16-bit arithmetic operations. It 
seems logical to investigate the possibilities for implementing 
NTTs on microprocessors (5), (6). In many microprocessors no 
hardware multiplier is available since it requires more hardware 
and chip area. When a hardware multiplier is not available 
alternative methods may be employed to perform the multiplication 
in software or by implementing an external hardware multiplier
(18), (31), (41).
For real time digital signal processing applications,
multiplication must be carried out efficiently. The
multiplication speed can be increased by reducing the total 
number of additions (of partial products) or by performing high 
speed addition. Carry Save Adders (CSA) or Carry Look Ahead 
(CLA) may be used to reduce the carry propagation delay instead 
of conventional Carry Propagate Adders (CPA) (16), (17), (23). 
3.2 Clocked Multiplication Algorithms 
We can classify multiplication in different ways 
i.e. serial, parallel, unsigned, signed (twos complement). A 
brief outline of different algorithms for binary multiplication 
is presented. 
3.2.1 Multiplication on a Microprocessor 
The simplest form of binary multiplication is multiplication 
by two or powers of two. This is analogous to multiplication by 
ten or powers of ten (considering integer arithmetic) in the 
decimal number system. Multiplication by ten is accomplished by 
appending a number of zeros equal to the power of ten towards the 
least significant digit. Similarly in the binary number system, 
multiplication by two is accomplished by shifting the binary word 
towards the most significant bit position and filling the vacated 
places by zeros. The number of shifts is equal to the power of 
two. Overflow conditions must be detected and dealt with 
accordingly. It may be mentioned here that division by two in the
binary number system is equivalent to shifting the binary word a
number of positions towards the low order significant bits. This
is analogous to shifting of the decimal point in the decimal
number system towards the high order digit position. However, in
the binary number system if the least significant bit was a one
prior to division by two, then the result is subject to
truncation. This may be circumvented by rounding the binary word
prior to shifting; this is done by adding a one to the least
significant bit irrespective of the bit value.
In practice it is quite uncommon to encounter 
multiplications by two or a power of two. Hence some other 
method must be devised and developed for the implementation of 
multiplication on a microprocessor. 
The most commonly used method to perform multiplication on 
the microprocessor is the shift and add algorithm. The 
microprocessor checks the bits in the multiplier one by one and 
if a one is encountered the multiplicand is added to the partial 
product. After addition the partial product is shifted towards 
the least significant bits. If a zero is encountered then no 
addition takes place and the partial product is simply shifted 
towards low order bits, which is equivalent to shifting of 
multiplicand towards the most significant bit position (28). 
This method is lengthy and quite inefficient for large numbers. 
If a subtract instruction is available then an alternative method
may be used. For example a string of ones in the multiplier can
be reduced to a subtraction for the first 1 encountered, a shift for
each subsequent 1 and an addition for the first 0 encountered. A
multiplication by 14 (1110) may be reduced as follows.

$$14 = 2^3 + 2^2 + 2^1 = 2^4 - 2^1 = 10000_2 - 10_2$$

Since the multiplication time increases with the number of
multiplier bits, the above mentioned method may produce results
faster than the shift and add algorithm. This algorithm may also
be implemented externally in hardware (17), (18).
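The two software methods described in this section can be sketched in Python as follows. The function names and the 8-bit default width are assumptions for illustration; the multiplicand is shifted towards the most significant bit, which the text notes is equivalent to shifting the partial product the other way.

```python
def shift_add_multiply(multiplicand, multiplier, bits=8):
    """Unsigned shift and add: one addition for every 1 in the multiplier."""
    product = 0
    additions = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            product += multiplicand << i
            additions += 1
    return product, additions


def string_recoded_multiply(multiplicand, multiplier, bits=8):
    """Replace each string of ones by one subtraction and one addition."""
    product = 0
    operations = 0
    previous = 0
    # One extra step closes a string of ones that reaches the top bit.
    for i in range(bits + 1):
        current = (multiplier >> i) & 1
        if current == 1 and previous == 0:      # first 1 of a string: subtract
            product -= multiplicand << i
            operations += 1
        elif current == 0 and previous == 1:    # first 0 after a string: add
            product += multiplicand << i
            operations += 1
        previous = current
    return product, operations
```

For the multiplier 14 the first routine performs three additions, while the second performs one subtraction and one addition, as in the reduction $2^4 - 2^1$.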
3.2.2 Burks-Goldstine-von Neumann Method
This method was developed for twos complement multiplication 
(21). In this method if the multiplier and the multiplicand are 
positive no correction of the final result is required. However, 
if any of the operands is negative (twos complement) then 
correction must be applied to the final result. This step is 
necessary since in the twos complement number the sign is 
embedded in the number itself. This algorithm generates the 
product in the following manner. 
Let X, Y be the multiplicand and the multiplier
respectively, where

$$X = -x_0 + X^*, \qquad Y = -y_0 + Y^* \qquad (3.1)$$

$-x_0$ and $-y_0$ represent the sign bits, and $X^*$ and $Y^*$ give the true
value of the numbers. For number representation see Chu (21).
The product is obtained as follows

$$X^* Y^* = (X + x_0)(Y + y_0)$$

To obtain the correct answer $-(x_0 Y + y_0 X + x_0 y_0)$ must be added to
the final product, such that

$$X Y = X^* Y^* - (x_0 Y + y_0 X + x_0 y_0)$$

If one of the numbers is positive then either $-x_0 Y$ or $-y_0 X$ has
to be added.
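The correction term can be checked numerically. The sketch below assumes a fractional twos complement word in which the top bit is the sign bit and the remaining bits form the part marked with an asterisk; the word layout and the function names are illustrative assumptions, not taken from Chu (21).

```python
from fractions import Fraction


def decode(word, bits):
    """Split a twos complement fraction into (sign bit, starred part, value)."""
    sign = word >> (bits - 1)
    star = Fraction(word & ((1 << (bits - 1)) - 1), 1 << (bits - 1))
    return sign, star, star - sign


def corrected_product(x_word, y_word, bits):
    """Multiply the starred parts, then add the correction term."""
    x0, x_star, x = decode(x_word, bits)
    y0, y_star, y = decode(y_word, bits)
    uncorrected = x_star * y_star
    correction = -(x0 * y + y0 * x + x0 * y0)
    return uncorrected + correction
```

When both sign bits are zero the correction vanishes, which corresponds to the case of two positive operands.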
3.2.3 Robertson's First Method 
This method multiplies a signed number X with an unsigned
number $Y = Y^*$. When the multiplier is negative, the correction
term $-y_0 X$ must be added. No correction is required when the
multiplicand is negative (21).
3.2.4 Robertson's Second Method 
In this method if the multiplier is negative, then the 
product of -X and -Y is calculated which yields a positive 
result, then no correction is required. But if Y = -1 then the 
result is not correct. The value of Y must be restricted such 
that -1 < Y < 1 (21). 
Comparing the two methods, in the first method if the
multiplier is negative then it needs correction, but in the
second method no correction is required. The hardware only needs
to sense the sign bit $y_0$ of the multiplier and to complement the
multiplicand X.
3.2.5 Booth's Algorithm 
Booth's algorithm is quite extensively used where serial, 
signed twos complement multiplication has to be implemented (20), 
(21), (28), (35), (40), (43), (46). This method has an advantage 
over the previous methods that no prior knowledge of the sign and 
no correction of the result is required at the end. Also the 
product is independent of the sign of the multiplier and the 
multiplicand. Let the multiplier and multiplicand be represented
as

$$X = -x_n 2^n + x_{n-1} 2^{n-1} + \cdots + x_0 2^0$$

with Y written in the same way in terms of its bits $y_i$.
In this method two consecutive bits $y_i$ and $y_{i-1}$ of the
multiplier are examined simultaneously, starting from the least
significant bit. Three possible conditions can arise for $y_i$
and $y_{i-1}$:
i) if $y_i$, $y_{i-1}$ are 01, then the multiplicand is added to the
partial product. After addition the partial product is shifted
by one bit towards the least significant bit position.
ii) if $y_i$, $y_{i-1}$ are 10, then the multiplicand is subtracted
from the partial product and the partial product is
shifted one bit towards the least significant bit position.
iii) if $y_i$, $y_{i-1}$ are 00 or 11, then no addition or subtraction
takes place. However, the partial product is shifted one
bit position towards the least significant bit.
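The three conditions can be sketched in Python as shown below. The function name and the default wordlength are assumptions; the multiplicand is shifted left instead of shifting the partial product right, which gives the same result, and the bit to the right of the least significant bit is taken as 0.

```python
def booth_multiply(x, y, bits=8):
    """Booth's algorithm for a bits-wide twos complement multiplier y."""
    mask = (1 << bits) - 1
    y_word = y & mask             # twos complement bit pattern of the multiplier
    product = 0
    previous = 0                  # the bit below the least significant bit is 0
    for i in range(bits):
        current = (y_word >> i) & 1
        if (current, previous) == (0, 1):
            product += x << i     # pair 01: add the multiplicand
        elif (current, previous) == (1, 0):
            product -= x << i     # pair 10: subtract the multiplicand
        previous = current        # pairs 00 and 11: shift only
    return product
```

No sign test or final correction appears anywhere in the routine, which is the advantage claimed for the method.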
3.2.6 A Short Cut Multiplication Method 
This method involves detection of isolated ones or
zeroes. If a sequence of ones is detected then multiple additions
of the multiplicand into the partial product take place.
Otherwise multiple shifts are performed on the partial product.
Additional hardware may be required to detect the sequence of
ones or zeroes. For example, if the multiplier is 01000100, then
there are only two additions, of $2^6$ and $2^2$ times the multiplicand.
The worst case would be if the multiplier had alternating ones and zeroes.
3.2.7 Multiple Digit Multiplication Method
This algorithm uses the method of repeated additions of the 
multiplicand to the partial product. However, there is a subtle 
difference from the method described previously (Booth's 
algorithm). In this method two consecutive bits of the multiplier 
are checked simultaneously. The following four different 
conditions can arise. 
i) if $y_i$, $y_{i-1}$ are 00, then no addition takes place.
ii) if $y_i$, $y_{i-1}$ are 01, then the multiplicand is added into
the partial product.
iii) if $y_i$, $y_{i-1}$ are 10, then twice the multiplicand is added
into the partial product.
iv) if $y_i$, $y_{i-1}$ are 11, then three times the multiplicand is
added into the partial product.
Since each pair of consecutive bits is considered only once, the
total number of addition steps is reduced and hence there
is an overall improvement in the speed. It may be noted that the
partial product is shifted two bit positions instead of one after
the addition of the multiplicand into the partial product.
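A minimal Python sketch of this two bits at a time scheme is given below. The names are illustrative, and the multiples 2X and 3X are simply computed on the fly, whereas hardware would normally hold them ready.

```python
def two_bit_multiply(multiplicand, multiplier, bits=8):
    """Unsigned multiplication examining two multiplier bits per step."""
    product = 0
    additions = 0
    for i in range(0, bits, 2):
        pair = (multiplier >> i) & 0b11      # 00, 01, 10 or 11
        if pair:
            # add 1x, 2x or 3x the multiplicand at the current weight
            product += (pair * multiplicand) << i
            additions += 1
    return product, additions
```

An n-bit multiplier therefore needs at most n/2 addition steps.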
Parasuraman (18) has described a variation of this method
which inspects three bits at a time and applies a correction.
Harman (19) has described a possible method to increase the
multiplication speed by examining the number of ones in the
multiplier and the multiplicand. The operand which has the least
number of ones is chosen as the multiplier. This method may not
find a place in practical applications.
3.3 Clockless Multiplication 
All the different techniques described above use clock 
signals to generate the shift and the add pulses. Now we 
consider some algorithms for clockless multiplication which are 
much faster than the methods described before. Clockless 
circuits are also referred to as combinatorial circuits, whose 
outputs entirely depend upon the current input values. 
3.3.1 Array or Parallel Multiplication 
This method is generally used when high speed multiplication
is to be performed. All the bits of the multiplier and
multiplicand are fed simultaneously into an array of logic gates
and full adders. No storage of partial or intermediate products
is required. Chu (21) has described a simultaneous multiplier
in which the two operands are fed into a two dimensional array
structure of logic gates and full adders.
Rabiner and Gold (20) have also discussed a fast parallel
multiplier which consists of a two dimensional array of 1-bit
adders. The total multiplication time is the sum of the settling
time and the propagation delay of the logic used, after the
operands are fed into the input. The unit cell is shown in 
figure (3.1a). These basic cells are cascaded to give a parallel 
multiplier structure. Figure (3.1b) shows a 3 x 3-bit parallel 
array multiplier. This arrangement can be extended to an n x 
n-bit parallel multiplier. A finite amount of time is required 
for the carry to propagate through different stages of the 
multiplier. The partial products can be generated as shown in 
figure (3.2). A problem arises when the partial products have to 
be added. For small numbers the conventional ripple carry adder 
(CPA) may be used to add the partial products, but for larger 
numbers a CLA (Carry Look Ahead) or a CSA (Carry Save Adder) may 
be used (22), (23). Davies and Fung (31), and Bate and Burkowski 
(33), have described the interfacing of a high speed 
combinational array multiplier to a microprocessor. 
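The generation and summation of the partial products can be modelled in a few lines of Python. This is only a behavioural sketch with assumed names; it says nothing about carry propagation or the adder structure used in a real array.

```python
def partial_products(x, y, bits=4):
    """AND every multiplicand bit with every multiplier bit (one gate each)."""
    return [
        [((x >> i) & 1) & ((y >> j) & 1) for j in range(bits)]
        for i in range(bits)
    ]


def array_multiply(x, y, bits=4):
    """Sum the partial products, each weighted by its column position."""
    pp = partial_products(x, y, bits)
    return sum(pp[i][j] << (i + j) for i in range(bits) for j in range(bits))
```

An n x n-bit array needs n x n such partial products, which is why the hardware grows quickly with the wordlength.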
3.4 Read Only Memory (ROM) Multiplier 
With the availability of cheap and fast ROMs for storing 
information lookup techniques may be employed to perform 
arithmetic operations for a small range of numbers (18), (26), 
(27), (28). The ROM is programmed such that the products are 
stored in it in an appropriate manner. The address lines are 
used as input, and the product is obtained on the data bus. This 
method is very fast since the output delay depends entirely
upon the access time of the ROM and may be of the order
of tens of nanoseconds. The ROM lookup technique for
multiplication can be used in a variety of ways, some of which are
described below.

Figure 3.1a: Unit cell (1-bit adder), with sum inputs, carry in, carry out and sum output.
Figure 3.1b: 3 x 3 parallel array multiplier formed by combining unit cells.
Figure 3.2: Arrangement for generating the partial products Xi.Yj (i, j = 0 to 3).
3.4.1 Direct ROM Multiplier 
The multiplier and multiplicand are appropriately connected 
to the address bus of the ROM. The product of the two numbers, 
which is stored at this address is then obtained directly. 
Figure (3.3) shows an arrangement for a simple ROM multiplier. 
The disadvantage is that if the numbers are large then this 
method may become impractical due to complexity, size and cost. 
3.4.2 Quarter-Squares Lookup Table Multiplication 
Let X and Y be the two n-bit numbers to be multiplied. Then
the product is obtained in the following manner.

$$XY = \frac{(X + Y)^2 - (X - Y)^2}{4} \qquad (3.1)$$

$$XY = \left(\frac{X + Y}{2}\right)^2 - \left(\frac{X - Y}{2}\right)^2 \qquad (3.2)$$

$$XY = \frac{(X + Y)^2}{4} - \frac{(X - Y)^2}{4} \qquad (3.3)$$

Squares of the sum and difference of the two numbers are
stored in separate ROMs. The sum and difference are obtained by
the conventional method using an adder. Figure (3.4) shows an
arrangement for such a multiplier.

Figure 3.3: Direct ROM multiplier (X and Y address a ROM table holding XY).
Figure 3.4: Quarter-squares lookup table multiplication (adders form X+Y and X-Y, each feeding a ROM square table, followed by an adder and division by 4).
Figure 3.5: Multiplication using logarithms (ROM log tables for X and Y, an adder forming log X + log Y, and a ROM antilog table giving XY).
In equation (3.1) the product is obtained by dividing the 
difference of the output of ROM squarer by 4. In equation (3.2) 
the division by 2 is accomplished before feeding the sum and 
difference to the ROM square table. This sometimes introduces 
truncation errors. Equation (3.3) is equivalent to equation 
(3.1) and gives the same results (26). 
For X and Y both even or both odd we have X = 2m and Y = 2n, or
X = 2m + 1 and Y = 2n + 1, respectively. Whatever the parity of X and Y,
equations (3.1) and (3.3) are equivalent, but equation (3.2)
produces truncation errors when one operand is even and the other is odd.
For example, if X is even and Y is odd, then X = 2m, Y = 2n + 1;
substituting these in equation (3.3) we get:

$$2m(2n+1) = \frac{(2m + 2n + 1)^2}{4} - \frac{(2m - 2n - 1)^2}{4}$$
$$= (m+n)^2 + (m+n) + \tfrac{1}{4} - (m-n)^2 + (m-n) - \tfrac{1}{4}$$
$$= (m+n)^2 + (m+n) - (m-n)^2 + (m-n)$$
$$= 4mn + 2m \qquad (3.4)$$

Considering the case with equation (3.2), where the halves are truncated before squaring, we get:

$$(m+n)^2 - (m-n)^2 = 4mn \neq XY \qquad (3.5)$$

Equation (3.5) shows a truncation error of 2m. Davies (28)
has described the implementation of this method directly on the Z80
microprocessor in software.
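The behaviour of the quarter-squares forms can be checked with the following Python sketch. It assumes a single table of truncated quarter squares addressed by both the sum and the magnitude of the difference, which is a simplification of the two-ROM arrangement; the function names are illustrative.

```python
def build_quarter_square_table(bits):
    """Table contents: floor(s*s / 4) for every possible sum s."""
    return [(s * s) // 4 for s in range(2 ** (bits + 1))]


def quarter_square_multiply(x, y, table):
    """Form of equation (3.3): look up both quarter squares, then subtract."""
    return table[x + y] - table[abs(x - y)]


def halve_first_multiply(x, y):
    """Form of equation (3.2): halve with truncation before squaring."""
    return ((x + y) // 2) ** 2 - (abs(x - y) // 2) ** 2
```

Because X + Y and X - Y always have the same parity, the two truncated quarter squares lose the same fraction and the difference is exact, whereas halving before squaring is exact only when both operands have the same parity.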
Johnson (27) has described an improved ROM lookup method.
Partial products are stored in separate ROMs and the lookup
results are added appropriately. The product time depends upon the
access time of the ROMs and the carry propagation delay of the
adders. Parasuraman (18) has also described a lookup method for
multiplication.
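One possible arrangement of partial-product lookup is sketched below in Python: a 256-entry table of 4 x 4-bit products is addressed four times to build an 8 x 8-bit product. The table layout and names are assumptions for illustration and are not necessarily the scheme of Johnson (27).

```python
# ROM contents: the product a*b is stored at address (a << 4) | b.
ROM_4X4 = [a * b for a in range(16) for b in range(16)]


def rom_lookup(a, b):
    """Direct ROM multiplier: the operands together form the address."""
    return ROM_4X4[(a << 4) | b]


def multiply_8x8(x, y):
    """Combine four 4 x 4-bit lookups into one 8 x 8-bit product."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    high = rom_lookup(xh, yh) << 8
    middle = (rom_lookup(xh, yl) + rom_lookup(xl, yh)) << 4
    low = rom_lookup(xl, yl)
    return high + middle + low
```

A direct ROM for 8 x 8 bits would need 65536 entries, against 256 entries here, at the cost of three extra additions.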
3.4.3 Multiplication Using Logarithms 
Brubaker and Becker (25) have described another approach to
binary multiplication. This method employs logarithm and
antilogarithm tables stored in ROMs. The product of two numbers
is obtained in the following manner.

$$XY = \mathrm{antilog}(\log X + \log Y)$$

This method introduces errors due to truncation and
rounding. A disadvantage in this method is that only the product 
of positive numbers can be directly obtained (since the logarithm 
of a negative number is undefined). However, the sign of the 
product can be generated externally if required. Figure (3.5) 
shows an arrangement for the logarithmic multiplier. The 
multiplication time is twice the access time of the ROM. 
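The effect of the finite table resolution can be seen in the following Python sketch. The base 2 logarithms, the 8 fractional bits and the names are assumptions for illustration; the antilogarithm is computed directly here, standing in for a second ROM.

```python
import math

FRACTION_BITS = 8   # assumed fixed-point resolution of the log table

# Log ROM for 8-bit operands; entry 0 is unused because log(0) is undefined.
LOG_ROM = [0] + [round(math.log2(v) * (1 << FRACTION_BITS)) for v in range(1, 256)]


def antilog(scaled_log):
    """Stand-in for the antilog ROM: 2 raised to the fixed-point exponent."""
    return round(2 ** (scaled_log / (1 << FRACTION_BITS)))


def log_multiply(x, y):
    """Multiply two positive 8-bit numbers by adding their logarithms."""
    if x == 0 or y == 0:
        return 0
    return antilog(LOG_ROM[x] + LOG_ROM[y])
```

Products of powers of two come out exactly, while other products carry a small relative error set by the table resolution.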
3.5 Parallel Multiplier Chips
Parallel multiplication can be achieved using the discrete
components described above. However, VLSI technology now allows the
integration of a complete n x n-bit multiplier on a single chip.
These chips are easy to interface with a general purpose
microprocessor (18), (31), (34), (35), (36), (37), (38), (39),
(41), (42), (44). Usually these multiplier chips can be cascaded
so as to allow multiplication of arbitrary length numbers.
The methods discussed previously use twos complement
multiplication with discrete components. However, in VLSI chips
a facility may be provided to perform signed or unsigned
multiplication, rounding etc.
Bywater (16), Lewin (17), Rabiner and Gold (20), Chu (21),
Hayes (22), Flores (45), Booth and Booth (46), and Abd-alla and
Meltzer (47) are also suggested for further reading.
3.6 Modular Arithmetic on Microprocessor 
Modular arithmetic operations can be implemented on any 
microprocessor with unsigned compare instructions. Some 
microprocessors may perform these arithmetic operations more 
efficiently and faster than the others. This depends upon the 
clock frequency, number of accesses to the memory to fetch the 
operands and the number of CPU registers available. If the CPU
has enough registers to hold the operands and the intermediate or
partial products, then the total number of memory accesses is
reduced (during the multiplication), which will produce faster
results.
Modular arithmetic routines were written for several
microprocessors. Results of the routines are shown in tables
(3.1) to (3.3). Appendix-A contains assembler source listings of
these modular arithmetic routines. Note that each of the
microprocessors has a different clock frequency.

Table 3.1: Results of benchmark programs for modular addition.

Clock   Microprocessor                        Number of       Number of         Clock Cycles        Price
(MHz)   (No. of bits)                         Program Bytes   Instr. Executed   (Time µsec)
3       TMS9900 (16), Texas Instr. Ltd        36              8                 88 (29.3)           50.0
2       M6502 (8), MOS Technology             42              22                74 (37.0)           13.0
1       M6809 (8), Motorola                   20              8                 40 (40.0)           13.0
8       8X300 (8), Signetics                  76              26                52 (6.50)           36.0
4       Z80 (8), Zilog                        36              14                75 (18.74)          11.0
.125    COP402 (4), National Semiconductors   205             108               216 (864.0)         4.80

Table 3.2: Results of benchmark programs for modular subtraction.

Clock   Microprocessor                        Number of       Number of         Clock Cycles        Price
(MHz)   (No. of bits)                         Program Bytes   Instr. Executed   (Time µsec)
3       TMS9900 (16), Texas Instr. Ltd        24              8                 88 (29.3)           50.0
2       M6502 (8), MOS Technology             75              16                59 (29.5)           13.0
1       M6809 (8), Motorola                   14              6                 32 (32.0)           13.0
8       8X300 (8), Signetics                  10'3            50                100 (12.5)          36.0
4       Z80 (8), Zilog                        49              22                117 (29.24)         11.0
.125    COP402 (4), National Semiconductors   211             134               268 (1072.0)        4.80

Table 3.3: Results of benchmark programs for modular multiplication.

Clock   Microprocessor                        Number of       Number of         Clock Cycles        Price
(MHz)   (No. of bits)                         Program Bytes   Instr. Executed   (Time µsec)
3       TMS9900 (16), Texas Instr. Ltd        18              5                 242 (80.0)          50.0
2       M6502 (8), MOS Technology             333             1246              4866 (2433.0)       13.0
1       M6809 (8), Motorola                   128             60                336 (336.0)         13.0
8       8X300 (8), Signetics                  160             325               550 (81.25)         36.0
4       Z80 (8), Zilog                        252             1013              2462 (615.5)        11.0
.125    COP402 (4), National Semiconductors   859             2269              4553 (18212.0)      4.80

Renold (48) has compared the performance of five different microprocessors by
means of nine different benchmark programs. He has suggested two
methods for comparison.
i) An instruction of medium complexity (load 8-bit register) is 
chosen as an instruction unit. The number of clock cycles for 
any instruction is divided by the number of clock cycles of the 
instruction unit. 
ii) Reduce the clock frequency such that the instruction unit 
takes the same time for all the processors. 
Smith (58) has also described a comparison of three
microprocessors by executing a standard program on each one of
them. The performance is compared by looking at the number of 
program bytes required, execution time etc. 
To implement modular arithmetic any value of modulus M may 
be chosen. The residue is usually computed using division, but 
division, like multiplication is not an efficient operation when 
implemented on a microprocessor. Division may also be 
implemented externally which may require complex hardware. 
Special techniques may be used to compute the residue. 
In a decimal number system, if the modulus is chosen to be
10, then the residue of the number is the least significant digit
of the number. For example 103 = 3 mod 10. A similar case is
also true in the binary number system. If the modulus is chosen to
be $2^k$ (k is a positive integer) then the residue is found by
masking out all the bits except the low order
k-bits, which are the residue. A choice of modulus $2^k - 1$ also
provides easy calculation of the residue. In this case a carry into
the kth bit is congruent to 1, and if added to the least significant
k-bits gives the residue. The residue of a 2k-bit number is therefore computed
by adding the k most significant bits to the k least significant
bits. But in some cases if the k least significant bits are 1s,
and the k most significant bits are zeros, then the result is not
correct and may be corrected by adding a one to the k least
significant bits.

Let k = 4, $2^4 - 1 = 15$ (hexadecimal F).

i) 7 x 8 = 56 = 11 mod 15
in binary form it is given as
0111 x 1000 = 0011 1000

                1000
              + 0011
   carry = 0    1011 mod F

ii) 14 x 14 = 196 = 1 mod 15
1110 x 1110 = 1100 0100

                0100
              + 1100
   carry = 1    0000
              + 0001
                0001 mod F
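The two residue rules can be expressed compactly in Python. The function names are illustrative; the loop allows for inputs wider than 2k bits, and the final test maps the all-ones word to zero.

```python
def residue_power_of_two(value, k):
    """Residue modulo 2**k: keep only the k low order bits."""
    return value & ((1 << k) - 1)


def residue_mersenne(value, k):
    """Residue modulo 2**k - 1: add k-bit groups with end-around carry."""
    mask = (1 << k) - 1
    while value > mask:
        value = (value & mask) + (value >> k)
    return 0 if value == mask else value
```

For k = 4 this reproduces the two worked examples: 56 gives 11 and 196 gives 1.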
If the modulus is chosen as $2^k + 1$ then $2^k \equiv -1$.
The problem in this case (k-bit arithmetic) is
the representation of -1: if it is encountered, it is either
rounded to 0 or 2. To implement an NTT there are several
constraints between the modulus and the wordlength. If the
wordlength of the microprocessor does not allow the required
dynamic range of numbers, the Chinese Remainder Theorem (CRT) may
be used to perform arithmetic modulo the product of several moduli.
A search for a suitable modulus made by Martin (5) showed
that a value of M = 65521 (\(2^{16} - 15\)) is very convenient for
implementation of the NTTs using the WFTA. This is the first
prime number below \(2^{16}\) and allows a dynamic signal processing
range of nearly \(2^{16}\). Some examples of arithmetic modulo 65521
($FFF1) are given below, where $ denotes a hexadecimal number. All the
following examples use hexadecimal numbers, so the $ is omitted. NTTs
deal with unsigned numbers so more emphasis will be given to this
type of arithmetic.
3.6.1 Addition Modulo 65521 
When two 16-bit numbers are fed into a binary adder, a value
of $2^{16} - 65521$ (= 15) must be added to the sum,
i) if a carry was generated or
ii) if the sum was greater than or equal to 65521.
However, this may generate a further carry, but not more than two
carries can ever be generated.

i)                0279
                + 041C
   carry = 0      0695 mod FFF1

ii)               FFEF
                + 0014
   carry = 1      0003
                + 000F
   carry = 0      0012 mod FFF1
3.6.2 Subtraction Modulo 65521 
Subtraction is performed in the usual way by adding the twos
complement of the subtrahend to the minuend. A value of 65521
must be added to the result if the subtrahend was greater than the
minuend.

i)      0352
      - 0140
        0212 mod FFF1

ii)     0140
      - 0352
        FDEE
      + FFF1
        FDDF mod FFF1
3.6.3 Multiplication Modulo 65521 
If the product of two 16-bit numbers exceeds 65521 then the 
product is reduced modulo 65521. 
0003 * 0003 = 0009 mod FFF1
FFF0 * FFF0 = 0001 mod FFF1
(FFF0 = -1 mod FFF1)
CHAPTER 4
Implementation of the Winograd Fourier Transform Algorithm 
4.1 Introduction 
The Discrete Fourier Transform (DFT) of a sequence x(n) is
given by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W^{nk} \qquad (4.1)$$

and the inverse is given by:

$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\, W^{-nk} \qquad (4.2)$$

where $W = e^{-j2\pi/N}$, W is an Nth root of unity such that
$W^N = 1$, and N is the sequence length. Cooley and Tukey (8) showed
an efficient way of computing the DFT which reduces the number of
operations from $N^2$ to $N \log_2 N$. Attempts have been made to
further reduce the number of operations. Winograd (3) proposed
a new class of Winograd Fourier Transform Algorithms (WFTA),
which requires only 20 percent of the multiplications of the
Cooley-Tukey FFT algorithm and roughly the same number of
additions. Winograd proposed short length DFT algorithms of
length 2, 3, 4, 5, 7, 8, 9, 16, with the minimum number of
multiplies. Table (4.1) shows the number of additions and number of
multiplications for each of these short length DFT algorithms.
Short-length    No. of    No. of
WFTA            Adds      Multiplies
 2               2         2
 3               6         3
 4               8         4
 5              17         6
 7              36         9
 8              26         8
 9              44        13
16              74        18

Table 4.1: Number of additions and multiplications in the
Winograd short length DFT algorithms.
In the FFT the sequence length is $N = 2^m$, where m is a
positive integer. However, in the WFTA the transform length is
equal to the product of several mutually prime factors. If not more than one
factor is chosen from each of the following groups (2, 4, 8, 16),
(3, 9), (7) and (5), transform lengths in the range from 2 to
5040 are possible. This is done by nesting the short length
algorithms together in the following manner. Each of the short
length DFT algorithms consists of input additions followed by
multiplications and the output additions. In the nested form all
the input additions (for the mutually prime factors) are
performed one after the other followed by multiplications (with
the coefficients) and the output additions. Instead of
performing the multiplications separately for each of the short
length factors, the multiplications are also nested (49). This
algorithm reduces the total number of multiplications at the cost
of increased algorithm complexity. These multiplications are
performed with precomputed transform coefficients. There are two
sets of transform coefficients, one for the forward transform and
the other set for the inverse transform. $N^{-1}$ in equation (4.2)
is combined with the inverse transform coefficients so that the
forward and the inverse WFTA can be computed with equal
computational effort.
For example, for sequence length N = 15 the mutually prime
factors are 3 and 5, with (3,5) = 1. Figures (4.1) and (4.2) show the 3-point
and 5-point WFTA respectively. Let x0, x1, ... denote the input
sequence and X0, X1, ... denote the transformed sequence.
3-Point WFTA
N = 3, U = 2π/3
t1 = x1 + x2
m0 = 1.(x0 + t1)
m1 = (COSU - 1).t1
m2 = jSINU.(x1 - x2)
s1 = m0 + m1
X0 = m0
X1 = s1 + m2
X2 = s1 - m2

5-Point WFTA
N = 5, U = 2π/5
t1 = x1 + x4, t2 = x2 + x3, t3 = x1 - x4
t4 = x3 - x2, t5 = t1 + t2
m0 = 1.(x0 + t5)
m1 = (½(COSU + COS2U) - 1).t5
m2 = ½(COSU - COS2U).(t1 - t2)
m3 = jSINU.(t3 + t4)
m4 = j(SINU - SIN2U).t4
m5 = j(SIN2U + SINU).t3
s1 = m0 + m1, s2 = s1 + m2, s3 = m5 - m3
s4 = s1 - m2, s5 = m3 + m4
X0 = m0
X1 = s2 + s3
X2 = s4 + s5
X3 = s4 - s5
X4 = s2 - s3
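The 3-point listing can be transcribed into Python and compared with a direct evaluation of the transform sum. The function names are assumptions. Checked in this way, the listing as printed agrees with the direct sum taken with the kernel $e^{+j2\pi nk/3}$; under the sign convention of equation (4.1) the outputs X1 and X2 are exchanged.

```python
import cmath
import math


def wfta3(x0, x1, x2):
    """3-point WFTA, transcribed line by line from the listing."""
    u = 2 * math.pi / 3
    t1 = x1 + x2
    m0 = 1 * (x0 + t1)
    m1 = (math.cos(u) - 1) * t1
    m2 = 1j * math.sin(u) * (x1 - x2)
    s1 = m0 + m1
    return [m0, s1 + m2, s1 - m2]


def direct_sum(x, sign):
    """Direct O(N^2) transform with kernel exp(sign * j * 2*pi * n * k / N)."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(sign * 2j * math.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]
```

The routine uses three multiplications (including the multiplication by 1) and six additions, in agreement with table (4.1).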
Fig. 4.1: 3-Point WFTA (pre-weave, multiplication and post-weave stages).
Fig. 4.2: 5-Point WFTA (pre-weave, multiplication and post-weave stages).
The nested 15-point WFTA is shown in figure (4.3) which
clearly shows five 3-point pre-weaves (premultiply adds),
followed by three 5-point pre-weaves. This is followed by the
multiplications; the number of multiplications is equal to the
product of the multiplications in the individual short length DFT
algorithms. Finally the three 5-point post-weaves (postmultiply
adds) and the five 3-point post-weaves are performed. WEAVE (50)
is an acronym for Winograd Elementary Add Vector Elements. Note
that there are eighteen multiplications in the 15-point WFTA,
since there are three multiplications in the 3-point WFTA and six
multiplications in the 5-point WFTA.
Similarly a 60-point WFTA has three mutually prime factors
3, 4, and 5. First of all twenty 3-point, fifteen 4-point and
twelve 5-point pre-weaves are performed, followed by 72 modular
multiplications with coefficients and the post-weaves for each of
the short length WFTAs.
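The multiplication counts and the set of available transform lengths follow directly from table (4.1) and the factor groups, as the Python sketch below shows. The names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
from itertools import product
from math import prod

# Multiplications per short length algorithm, from table (4.1).
MULTIPLIES = {2: 2, 3: 3, 4: 4, 5: 6, 7: 9, 8: 8, 9: 13, 16: 18}

# At most one factor from each group; 1 means no factor is taken from it.
GROUPS = [(1, 2, 4, 8, 16), (1, 3, 9), (1, 7), (1, 5)]


def nested_multiplies(factors):
    """Multiplications in the nested WFTA: product of the short length counts."""
    return prod(MULTIPLIES[f] for f in factors)


def available_lengths():
    """All transform lengths obtainable from the mutually prime factors."""
    lengths = {prod(choice) for choice in product(*GROUPS)}
    lengths.discard(1)
    return sorted(lengths)
```

This gives 18 multiplications for N = 15 and 72 for N = 60, and 59 possible lengths between 2 and 5040.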
The input and the output data must be reordered or shuffled. 
The input and output shuffle vectors are also precomputed and 
stored in the memory and the shuffle is then performed using 
lookup. The disadvantage in the WFTA is that extra memory is 
required just to store the input/output shuffle vectors and the 
forward and the inverse WFTA coefficients. However, this 
algorithm is computationally efficient on machines on which the 
multiplication time is much longer than the addition time. 
Silverman (51) has described memory considerations for the
FFT and WFTA, and discussed that the WFTA requires 7N memory
locations as compared to a comparable size FFT algorithm which
requires 1.25N memory locations. Unlike the FFT, the WFTA cannot
be computed in place; Silverman called an analogous approach
full overlay. Nawab and McClellan (52) have described that in
general the WFTA requires more data transfers than an equivalent
length FFT. In addition they have also discussed the minimum
number of CPU registers required to perform each short length DFT
algorithm efficiently, since register to register instructions
are executed much faster.

Fig. 4.3: Nested 15-point Winograd Fourier Transform Algorithm (WFTA), showing the input and output index ordering, the 3-point and 5-point pre-weaves, the multiplication with coefficients, and the 5-point and 3-point post-weaves.
4.2 ~tion of NTT using WFTA 
The Number Theoretic Transform (NTT) of a sequence x(n) is
given by:
$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \qquad (4.3)$$
and the inverse is given by:
$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \qquad (4.4)$$
where $\alpha$ takes the place of $e^{-j2\pi/N}$ and is an integer root of
unity, such that $\alpha^N \equiv 1 \bmod M$, where M is the modulus, and
$\alpha$ is defined in a finite ring of integers $Z_M$. The choice of
modulus is made such that $N \mid O(M)$, where O(M) is the g.c.d. of
the $(p_i - 1)$ for the prime factors $p_i$ of M; if M is prime then
$N \mid (M - 1)$. The inverse $N^{-1}$ is defined such that
$N N^{-1} \equiv 1 \bmod M$. If M is not prime then $N^{-1}$ may or may
not exist. Martin (5) carried out a search for a suitable modulus
on the lines described by Bailey (53), and found that the value
M = 65521 is quite adequate for 16-bit modular arithmetic; it is
the first prime below $2^{16}$.
Since NTTs are similar in structure to the DFT, any algorithm
which applies to the DFT can also be applied to the NTT.
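Equations (4.3) and (4.4) can be evaluated directly as a reference against which a fast algorithm may be checked. The sketch below is a plain O(N^2) evaluation, not the WFTA; the function names are illustrative, and the values M = 65521, N = 15 and the root 7791 are those quoted in section 4.2.1. It assumes Python 3.8 or later for the modular inverse form of pow.

```python
M = 65521      # modulus used in the text
N = 15         # transform length
ALPHA = 7791   # element of order N quoted in section 4.2.1


def ntt(x, alpha=ALPHA, n_len=N, m=M):
    """Direct evaluation of equation (4.3): X(k) = sum x(n) * alpha^(n*k) mod m."""
    return [sum(x[n] * pow(alpha, n * k, m) for n in range(n_len)) % m
            for k in range(n_len)]


def intt(X, alpha=ALPHA, n_len=N, m=M):
    """Direct evaluation of equation (4.4), using N^-1 and alpha^-1 mod m."""
    n_inv = pow(n_len, -1, m)
    alpha_inv = pow(alpha, -1, m)
    return [n_inv * sum(X[k] * pow(alpha_inv, n * k, m) for k in range(n_len)) % m
            for n in range(n_len)]
```

Applying intt to the output of ntt should return the original sequence, provided every input sample lies in the range 0 to M-1.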
4.2.1 Determination of the Constants for the WFTA 
Implementation of the WFTA requires some constants to be 
precomputed and stored in the memory. These are the input/output 
shuffle vectors, transform coefficients etc. Suppose that we
want to implement a 15-point WFTA. The following calculations
must be performed before the actual program coding.
1- Choice of modulus M = 65521, since it satisfies the
condition $N \mid O(M)$, where O(M) is the g.c.d. of the $(p_i - 1)$.
$O(65521) = 13 \times 5040$ and so this modulus will support
any Winograd transform algorithm (5), (9).
2- Choice of transform length N = 15.
3- Determination of $N^{-1}$: $15^{-1} = 61153 \bmod 65521$.
4- Determination of an element $\alpha$ of order N, such that
$\alpha^{15} \equiv 1 \bmod 65521$: $(7791)^{15} \equiv 1 \bmod 65521$.
5- Determination of mutually prime factors 15 = 3 x 5, such
that (3,5) = 1.
6- Determination of j (iota) such that $j \cdot j \equiv -1 \bmod 65521$:
j = 41224 mod 65521; j is an element of order 4, such
that $(41224)^4 \equiv 1 \bmod 65521$.
7- Determination of $2^{-1}$: $2^{-1} = 32761 \bmod 65521$.
8- Determination of the input and output shuffle
or reordering vectors. The input and output shuffle
vectors are obtained using the Chinese Remainder Theorem
(CRT), in the following manner.
Let $N = r_1 r_2$ such that $(r_1, r_2) = 1$;
also let $q_1 = 0, 1, \ldots, r_1 - 1$ and $q_2 = 0, 1, \ldots, r_2 - 1$.
The following equation allows mapping from a one dimensional
into a two dimensional array:
$$(r_2 q_1 + r_1 q_2) \bmod N$$
Let $r_1 = 3$ and $r_2 = 5$.
We get
$$(5q_1 + 3q_2) \bmod 15 \qquad (4.5)$$
Using equation (4.5) we obtain the following input shuffle
vectors
 0  3  6  9 12
 5  8 11 14  2
10 13  1  4  7
Similarly the output reordering vectors are obtained by
using the following relationship and determining the values of x
and y such that:
$$5x \equiv 1 \bmod 3 \;\Rightarrow\; x = 2, \qquad 3y \equiv 1 \bmod 5 \;\Rightarrow\; y = 2 \qquad (4.6)$$
Equation (4.5) is rewritten as
$$(5xq_1 + 3yq_2) \bmod 15$$
Substituting the values of x and y, we get
$$(10q_1 + 6q_2) \bmod 15 \qquad (4.7)$$
where $q_1 = 0, 1, 2$ and $q_2 = 0, 1, 2, 3, 4$.
This relationship gives us the output reordering vectors as
 0  6 12  3  9
10  1  7 13  4
 5 11  2  8 14
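The two index maps can be generated mechanically. The sketch below is one way of doing so, assuming the general form (r2 q1 + r1 q2) mod N behind equation (4.5); the function name is illustrative and pow with exponent -1 requires Python 3.8 or later.

```python
def shuffle_vectors(r1, r2):
    """Return (input, output) index tables for N = r1 * r2 with coprime r1, r2."""
    n = r1 * r2
    # Input map, equation (4.5): (r2*q1 + r1*q2) mod N.
    inp = [[(r2 * q1 + r1 * q2) % n for q2 in range(r2)] for q1 in range(r1)]
    # Output map, equations (4.6) and (4.7): x = r2^-1 mod r1, y = r1^-1 mod r2.
    x = pow(r2, -1, r1)
    y = pow(r1, -1, r2)
    out = [[(r2 * x * q1 + r1 * y * q2) % n for q2 in range(r2)]
           for q1 in range(r1)]
    return inp, out
```

Calling shuffle_vectors(3, 5) reproduces the two 3 x 5 tables listed above, with q1 indexing the rows and q2 the columns.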
9- Determination of the transform coefficients.
By definition
$$\mathrm{COS}\,U = \tfrac{1}{2}\left(e^{jU} + e^{-jU}\right) \qquad (4.8)$$
$$\mathrm{SIN}\,U = \frac{1}{j} \cdot \tfrac{1}{2}\left(e^{jU} - e^{-jU}\right) \qquad (4.9)$$
where $U = 2\pi/N$.
Since division has no meaning in an NTT, the trigonometric
functions must be redefined in the number theoretic sense (53).
Rewriting equations (4.8) and (4.9):
$$\mathrm{COS}\,U = 2^{-1}\left(U + U^{-1}\right)$$
$$\mathrm{SIN}\,U = 2^{-1}(-j)\left(U - U^{-1}\right)$$
where j is an element of order 4, $U = \alpha^5 \bmod 65521$, and
(from step 4) $\alpha = 7791$.
The multiplier coefficients for the 3-point WFTA and the
5-point WFTA are calculated separately.
(a) Coefficients for the 3-point WFTA
Let $U = \alpha^5 \bmod 65521$
$(7791)^5 = 48847 \bmod 65521$
$(48847)^{-1} = 16673 \bmod 65521$
$m_0 = 1$
$m_1 = \mathrm{COS}\,U - 1 = 2^{-1}\left(U + U^{-1}\right) - 1$
$= 32761 \cdot (48847 + 16673) - 1$
$= 32760 - 1 = 32759 \bmod 65521$
$m_2 = j\,\mathrm{SIN}\,U = 32761 \cdot 41224 \cdot 24297 \cdot (48847 - 16673)$
$= 16087 \bmod 65521$
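The number theoretic COS and SIN can be evaluated with a few lines of modular arithmetic. The sketch below follows the redefined formulas with the constants from steps 4, 6 and 7; the helper name cos_sin and the variable names are illustrative only.

```python
M = 65521
ALPHA = 7791              # element of order 15 (step 4)
J = 41224                 # j * j = -1 mod M (step 6)
INV2 = pow(2, -1, M)      # 2^-1 = 32761 mod M (step 7)


def cos_sin(u, m=M):
    """Number theoretic COS U and SIN U for a root of unity u."""
    u_inv = pow(u, -1, m)
    cos_u = INV2 * (u + u_inv) % m
    sin_u = INV2 * (-J) * (u - u_inv) % m
    return cos_u, sin_u


u3 = pow(ALPHA, 5, M)          # root of order 3 used by the 3-point transform
cos_u3, sin_u3 = cos_sin(u3)
m1 = (cos_u3 - 1) % M          # COS U - 1
m2 = J * sin_u3 % M            # j * SIN U
```

A useful sanity check on any root u is that COS U squared plus SIN U squared equals 1 mod M, exactly as for the ordinary trigonometric functions.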
Similarly the 5-point transform coefficients are calculated
in the following manner.
(b) Coefficients for the 5-point transform
Let $U = \alpha^3 \bmod 65521$
$(7791)^3 = 30887 \bmod 65521$
$(30887)^{-1} = 28625 \bmod 65521$
$\mathrm{COS}\,U = 32761 \cdot (30887 + 30887^{-1}) = 29756 \bmod 65521$
$\mathrm{SIN}\,U = 32761 \cdot 24297 \cdot (30887 - 30887^{-1}) = 26608 \bmod 65521$
$\mathrm{COS}\,2U = \mathrm{COS}^2 U - \mathrm{SIN}^2 U = 3004 \bmod 65521$
$\mathrm{SIN}\,2U = 2 \cdot \mathrm{SIN}\,U \cdot \mathrm{COS}\,U = 49289 \bmod 65521$
$m_0 = 1$
$m_1 = 2^{-1}(\mathrm{COS}\,U + \mathrm{COS}\,2U) - 1 = 16379 \bmod 65521$
$m_2 = 2^{-1}(\mathrm{COS}\,U - \mathrm{COS}\,2U) = 13376 \bmod 65521$
$m_3 = j(\mathrm{SIN}\,U + \mathrm{SIN}\,2U) = 19136 \bmod 65521$
$m_4 = j(\mathrm{SIN}\,2U) = 18005 \bmod 65521$
$m_5 = j(\mathrm{SIN}\,U - \mathrm{SIN}\,2U) = 48647 \bmod 65521$
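The six 5-point coefficients can be recomputed from the formulas above as a check on the hand calculation. This is a minimal sketch under the stated constants; the function name is illustrative.

```python
M = 65521
ALPHA, J = 7791, 41224
INV2 = pow(2, -1, M)   # 2^-1 mod M


def five_point_coefficients():
    """Return [m0, ..., m5] for the 5-point transform, all reduced mod M."""
    u = pow(ALPHA, 3, M)                      # root of order 5
    u_inv = pow(u, -1, M)
    cos_u = INV2 * (u + u_inv) % M
    sin_u = INV2 * (-J) * (u - u_inv) % M
    cos_2u = (cos_u * cos_u - sin_u * sin_u) % M
    sin_2u = 2 * sin_u * cos_u % M
    return [
        1,
        (INV2 * (cos_u + cos_2u) - 1) % M,
        INV2 * (cos_u - cos_2u) % M,
        J * (sin_u + sin_2u) % M,
        J * sin_2u % M,
        J * (sin_u - sin_2u) % M,
    ]
```

The returned list holds m0 to m5 in order and can be compared directly with the values tabulated in (b).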
The coefficients for the 3-point and 5-point transforms are
now multiplied together (mod 65521), such that each of the
3-point coefficients is multiplied by each of the 5-point
transform coefficients. This multiplication (mod 65521) is
performed using a nested 'DO' loop, such that the 5-point
transform coefficients are indexed by the inner loop and the
3-point transform coefficients are indexed by the outer loop.
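The nested loop can be sketched as follows. The thesis program is written in FORTRAN; this Python version only illustrates the loop order, and the names are hypothetical. The 3-point list is built from COS U = 32760 and j SIN U = 16087, and the 5-point list uses the values from (b).

```python
M = 65521


def nest_coefficients(outer, inner, m=M):
    """Multiply every outer coefficient by every inner one, inner index fastest."""
    return [a * b % m for a in outer for b in inner]


COS_U3 = 32760                                 # COS U for the 3-point transform
C3 = [1, (COS_U3 - 1) % M, 16087]              # m0, m1 = COS U - 1, m2 = j SIN U
C5 = [1, 16379, 13376, 19136, 18005, 48647]    # m0 .. m5 for the 5-point transform
COEFFS_15 = nest_coefficients(C3, C5)          # 18 coefficients for N = 15
```

The first six entries of COEFFS_15 are simply the 5-point coefficients, since the first 3-point coefficient is 1.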
The values of the inverse transform coefficients are
obtained in exactly the same manner (as for the forward
transform), except that all the SIN U are changed to -SIN U and the
transform coefficients thus obtained are then multiplied by
$15^{-1} = 61153 \bmod 65521$.
4.3 Architecture of the TMS9900 Microprocessor
The Texas Instruments TMS9900 is a single chip 16-bit CPU
capable of addressing 64K bytes of memory (54), (55). The
instruction set of the microprocessor provides full minicomputer
capabilities (including I/O). There are sixteen general purpose
16-bit registers (R0 to R15). These registers can be defined
anywhere in the RAM; their location is determined by the contents
of the workspace pointer. Register to register instructions are
executed faster than memory to register or register to memory
instructions. Three on chip registers are accessible to the
programmer:
a) Workspace Pointer (WP): this register holds the address of the
current workspace, which is the same as the address of R0.
b) Program Counter (PC): 16-bit program counter holds the address 
of the current instruction. 
c) Status register (ST): this register represents the current 
machine state. 
The workspace concept increases the programming flexibility:
more than one program can reside in the memory and be executed
without affecting the other programs. The workspace pointer can
also be changed during program execution. This allows the
user to define a new set of 16 general purpose registers. The
special purpose registers R13, R14 and R15 of the current
workspace contain the old WP, old PC and old ST
respectively, and a return to the old workspace reloads these values
into the respective registers. This feature is useful when the program
environment is changed to a subroutine, since in a conventional
CPU the entire machine state is saved on the stack, but in the case
of the TMS9900 only the workspace needs to be changed. A special
purpose register R12 holds the base address of the Communications
Register Unit (CRU). All the data read from or written to the I/O
ports must pass through the CRU.
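The workspace mechanism can be illustrated with a toy model in which the sixteen registers are simply RAM words starting at the address held in WP. The class and method names below are hypothetical and the model is far from a full emulator; the context switch and return correspond roughly to what the TMS9900 BLWP and RTWP instructions do.

```python
class TMS9900Model:
    """Toy model of the workspace concept; not a full CPU emulator."""

    def __init__(self):
        self.ram = {}      # 16-bit words keyed by (even) byte address
        self.wp = 0        # workspace pointer
        self.pc = 0        # program counter
        self.st = 0        # status register

    def reg_addr(self, n):
        # Register Rn of the current workspace lives at WP + 2n.
        return self.wp + 2 * n

    def read_reg(self, n):
        return self.ram.get(self.reg_addr(n), 0)

    def write_reg(self, n, value):
        self.ram[self.reg_addr(n)] = value & 0xFFFF

    def switch_context(self, new_wp, new_pc):
        """Enter a new workspace, saving old WP, PC, ST in its R13, R14, R15."""
        old = (self.wp, self.pc, self.st)
        self.wp, self.pc = new_wp, new_pc
        for n, value in zip((13, 14, 15), old):
            self.write_reg(n, value)

    def return_context(self):
        """Reload WP, PC and ST from R13, R14, R15 of the current workspace."""
        self.wp, self.pc, self.st = (self.read_reg(13), self.read_reg(14),
                                     self.read_reg(15))
```

Note that no register contents are copied on a context switch: the old workspace stays untouched in RAM and becomes current again as soon as WP is restored.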
This microprocessor also contains 16 x 16-bit hardware
(unsigned) multiply and 32/16-bit (unsigned) divide, and unsigned
compare. These features make it suitable for implementation of
the NTT.
4.4 Implementation on the Microprocessor 
A 15-point and a 60-point WFTA were implemented on the 
TMS9900 microprocessor in assembler language. As there was no 
software support available with the TMS9900 microprocessor, a 
mainframe computer was used for program assembly. A utility 
routine was written in assembler for the TMS9900 to load the 
object program directly from the mainframe computer into the 
memory of the microprocessor. This provided an efficient way of 
testing and debugging the software. 
Appendix-B shows an assembler program source listing of the
15-point WFTA implemented on the TMS9900 microprocessor. A
FORTRAN program listing of the 15-point WFTA is also included in
appendix-B.
A 60-point WFTA FORTRAN program is listed in (5). A
60-point WFTA was also implemented in the FORTH language; the
source program is listed in appendix-C. FORTH is an interactive
high level language for microprocessors (56), (57).
The 60-point WFTA has three factors 3, 4, 5, so this 
transform has a three dimensional structure. In general a 
transform length with r factors would have an r dimensional 
structure. The input and output shuffle vectors, forward and 
inverse transform coefficients are calculated in a similar manner 
as for the 15-point WFTA. A 120-point WFTA was also implemented 
in FORTRAN on a mainframe computer. 
An A/D (analogue to digital) converter and a D/A (digital to
analogue) converter were interfaced with the TMS9900 microprocessor
system to perform transforms of real time signals.
CHAPTER 5
External Hardware Modular Multiplier 
5.1 Introduction 
Microprocessors have found their way into many digital
signal processing applications. Multiplication is one of the
basic operations in digital signal processing. Hence the need for
performing multiplication on the microprocessor efficiently is of 
vital importance. In many microprocessors no facility is 
provided for hardware multiply or divide. However, software 
routines can be written to perform the required multiplication or 
division operations. 
Some of the later versions of microprocessors are provided
with a signed or unsigned hardware multiplier. For example
Motorola's MC6809 microprocessor and Texas Instruments' TMS9900
microprocessor contain an 8 x 8-bit and a 16 x 16-bit unsigned
hardware multiplier respectively. A considerable amount of time
is needed for multiplication even if the hardware multiplier is
available. For example, for the MC6809 microprocessor, 173 clock
cycles are required to produce a 32-bit unsigned product (clock
speed 1-2 MHz), and for the TMS9900 microprocessor 88 clock
cycles (clock speed 3 MHz) are needed. As we are interested in
the product reduced modulo M, some more time has to be allowed
for modularising the 32-bit result. The most obvious and
straightforward way to modularise a 32-bit unsigned number is by
division. However, for the MC6809 microprocessor this division
requires 1264 clock cycles. In total 1337 clock cycles are
required to produce a 16-bit modular product. Typical program
coding for the 16 x 16-bit (unsigned) multiply and 32/16-bit divide
routines for the MC6809 microprocessor is listed in appendix-A.
An alternative approach can be adopted in which the hardware
multiplier is used to produce a 16-bit modular product, which
then requires only 336 clock cycles (see appendix-A). In the
case of the TMS9900 microprocessor 132 clock cycles are required
to perform a 32/16-bit unsigned hardware divide, so the total
number of clock cycles is 220. The number of clock cycles
required depends upon the addressing mode of the instruction,
since register to register instructions are executed much faster
than register to memory instructions.
The time required for modular multiplication can be reduced 
further by interfacing a high speed external modular multiplier 
to the system to increase the throughput of the system, thus 
increasing the range of digital signal processing applications. 
Different algorithms may be adopted to implement external
multiplication. Either serial or parallel methods may be
employed. For a parallel multiplier the cost increases
approximately linearly with the number of bits, whereas for a
serial multiplier the execution time increases approximately
linearly. Davies (28) has described some aspects of performing
multiplication on the Z80 microprocessor, and of interfacing an
external hardware multiplier to it. Weed (29) has described
theoretical clockless multiplication and division circuits using
4 x 4-bit multiplier chips. The product of larger numbers can be
obtained by employing more than one multiplier chip and adding
the partial products in an appropriate way. In clockless
(combinatorial) circuits the total multiplication time is the sum
of the propagation delay on the chip and the carry propagation
delay of the adders. This propagation delay increases
approximately linearly with the number of input bits.
Parasuraman (18) has described a hardware multiplier interfaced
to a microprocessor.
5.2 Design and Implementation of an External Hardware 
Modular Multiplier 
Large Scale Integration (LSI) techniques now allow the
integration of a complete 8 x 8-bit multiplier on a single chip.
For example Advanced Micro Devices (44) and TRW (30), (39),
(42) have produced single chip 8 x 8-bit (AM25S558) and 16 x
16-bit (MPY-16AJ) multipliers respectively. These multiplier
chips have typical 8 x 8-bit and 16 x 16-bit multiplication
times of approximately 45 and 200 nanoseconds respectively. A
single chip multiplier (8 x 12-bits) producing the 13 most
significant bits of the product with an internal propagation
delay of about 2 nanoseconds has also been reported; additional
delay due to external components adds up to 30 nanoseconds (32).
The interfacing of an external hardware multiplier with a
microprocessor has been described by Davies and Fung (31). This
interfacing can be achieved in two ways. Either the multiplier can
behave as an I/O peripheral or it may be mapped into the memory
space of the microprocessor.
An external hardware modular multiplier (mod 65521) was
designed and constructed using wire wrapping techniques. It was
interfaced with the TMS9900 microprocessor.
5.2.1 Interfacing Considerations
We shall use the term modular multiplier for the external
hardware modular multiplier interfaced with the TMS9900
microprocessor. The two choices for interfacing the modular
multiplier to the TMS9900 are as follows.
i) connect to the I/O port
ii) connect directly to the address and data bus
In the first choice the main disadvantage is that 262 clock
cycles are required to communicate with the external modular
multiplier through the I/O port. The strobe signals for the
modular multiplier must also be generated at the output port.
This process is slow since the TMS9900 communicates with the I/O
ports serially through the Communications Register Unit (CRU).
The number of clock cycles thus required is greater than when the
hardware multiply and divide are used to produce the modular
product. In the second arrangement the modular multiplier behaves
like an intelligent memory mapped peripheral, with three unique
16-bit addresses. The data are written to two of the addresses
and read from the third.
5.2.2 Interfacing the Modular Multiplier with the 
TMS9900 Microprocessor 
Figure (5.1) shows a block diagram of the complete
(combinatorial) modular multiplier interfaced with the TMS9900
microprocessor. In figures (5.1) and (5.2) lines with arrowheads
represent the data bus.
This modular multiplier combines two of the aforementioned
techniques, using parallel multiplier chips to produce a 32-bit
unsigned product and ROM lookup tables whose outputs are combined
by a modular adder. The 32-bit unsigned product is reduced
modulo 65521 in the following manner. The high order 16 bits of
the 32-bit unsigned product, pre-multiplied by a fixed constant
$2^{16} - 65521$ (= 15), are added to the low order 16 bits of the
product using a modular adder. Direct storage of the
pre-multiplied data would require a 64K x 16-bit ROM. However,
if the output is determined by combining partial products derived
from the 8 low order bits and the 8 high order bits of the high
order 16-bit input, the storage requirement is reduced to two 256
x 16-bit ROMs.
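The reduction described above can be written out as a short derivation. The symbols P (32-bit product), H and L (its high and low 16-bit halves) and H_1, H_0 (the high and low bytes of H) are introduced here for illustration only and do not appear in the original.

```latex
\begin{align*}
P &= 2^{16} H + L, \qquad 0 \le H, L < 2^{16} \\
2^{16} &\equiv 15 \pmod{65521} \\
P &\equiv 15H + L \pmod{65521} \\
H &= 2^{8} H_1 + H_0, \qquad 0 \le H_0, H_1 < 2^{8} \\
15H &= 3840\,H_1 + 15\,H_0
\end{align*}
```

Each of the two terms on the last line depends on only 8 bits, so each can be held (reduced mod 65521) in a 256-entry table, and the final result is the modular sum of L and the two table outputs.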
Figure (5.2) shows the block diagram of the modular adder,
which consists of three identical 16-bit binary full adders
(FA1, FA2 and FA3), with two inputs A1 and A2. The output of FA1
is checked by a carry and overflow detector (CD) circuit
(figure 5.3). If a carry or an overflow is detected this circuit
activates the gate G1 and a value of $2^{16} - 65521$ (= 15) is
added to the output of FA1 in FA2. This may generate a carry or
overflow, activating gate G2 and adding a further value of 15 in
FA3. The output of FA3 is the final modular sum. A modular adder
was designed and constructed for test purposes before implementing
it with the modular multiplier.
Figure 5.1: Block diagram of the modular multiplier (mod 65521).
(M: multiplier; T: bus driver; TB: buffer; L1, L2: latches;
MA1, MA2: modular adders.)
Figure 5.2: Block diagram of the modular adder (mod 65521).
(FA: full adder; CD: carry/overflow detect.)
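The behaviour of the three-adder chain can be sketched in software. The model below is only an arithmetic illustration: the function names are hypothetical, and the way the carry out of a correction adder is treated in the overflow case (it is dropped) is an assumption, since the text does not give the gate-level detail.

```python
M = 65521
K = (1 << 16) - M   # 15, the correction constant


def correction_stage(total):
    """One detect-and-correct stage (CD circuit, gate and following full adder)."""
    carry, s = total >> 16, total & 0xFFFF
    if carry:
        # A carry out is worth 2**16, which is congruent to 15 mod M.
        return s + K
    if s >= M:
        # Overflow without carry: s + 15 wraps to s - M (assumed: carry dropped).
        return (s + K) & 0xFFFF
    return s


def modular_add(a1, a2):
    """Add two 16-bit values modulo 65521 using FA1 plus two correction stages."""
    total = a1 + a2                     # FA1 (17-bit result: carry + 16-bit sum)
    total = correction_stage(total)     # CD, G1, FA2
    return correction_stage(total)      # CD, G2, FA3
```

Two correction stages are enough for any pair of 16-bit inputs; for example the extreme case 65535 + 65535 needs both and yields 28.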
The basis of this modular multiplier is four (8 x 8-bit)
multiplier chips (AM25S558), which achieve a typical 8 x 8-bit
multiplication in approximately 45 nanoseconds. These multiplier
chips are combined with full adders (SN74LS83) to achieve a 16 x
16-bit to 32-bit multiplication in approximately 110 nanoseconds.
Figure (5.4) shows a photograph of the modular multiplier; the
four multiplier chips can be seen clearly.
Typical program coding and timings for the hardware multiply
and divide operations are shown in figure (5.5), and coding for
use with the external hardware modular multiplier is shown in
figure (5.6).
On the first and second move (MOV) instructions the two
16-bit data words are latched in L1 and L2 (SN74LS374) through a
bidirectional bus driver T (SN74LS245). Address and control
signals for these latches and the driver are generated by
appropriately decoding the addresses and gating them with the write
enable (WE) line from the TMS9900 microprocessor. The outputs of
L1 and L2 are directed to the multiplier M. The 32-bit unsigned
product is then split into three parts. The low order 16 bits
are connected directly to one of the inputs of the first modular
adder MA1. The high order 16 bits are further split into two
8-bit words. The low order 8 bits are directed to the
address bus of ROM1 and the high order 8 bits are directed to the
address bus of ROM2. ROM1 and ROM2 are each built from four
256 x 4-bit (AM27S21) PROMs, with a typical access time of
45 nanoseconds.
Figure 5.3: Carry and overflow detect circuit (output = 1 when the
16-bit adder output is > 65521; a further input is the carry out
from the full adder).
Figure 5.4: Photograph of the external hardware modular
multiplier.
Clock cycles   Labels   Mnemonics   Operands
14                      MOV         @MPD,R2
88                      MPY         @MPR,R2
132                     DIV         @MOD,R2
                        RT
234 (total)
               MOD      DATA        65521
               MPR      DATA
               MPD      DATA
R3 contains the modular product.
Figure 5.5: Program coding for using hardware
multiply and divide.
Clock cycles   Labels   Mnemonics   Operands
               INPUT1   EQU         >3FF2
               INPUT2   EQU         >3FF4
               OUTPUT   EQU         >3FF6
20                      MOV         @MPR,@INPUT1
20                      MOV         @MPD,@INPUT2
14                      MOV         @OUTPUT,R3
                        RT
54 (total)
               MPR      DATA
               MPD      DATA
R3 contains the modular product.
Figure 5.6: Program coding using external modular multiplier.
(> Shows hexadecimal values, and @ shows symbolic names.)
Typical values stored in ROM1 and ROM2 are shown in tables (5.1)
and (5.2). The output of ROM1 is connected to an input of the
first modular adder MA1. MA1 combines the low order 16 bits of
the 32-bit product with the partial product stored in ROM1 for
the low order 8 bits of the high order 16 bits. MA2 then adds in
the other partial product stored in ROM2. The output of MA2 is
finally the 16-bit modular product of the two current 16-bit
values in the input latches L1 and L2. The outputs of these
latches, multiplier chips and the PROMs are permanently enabled,
so after the second value is latched in L2 the 16-bit modular
product is available in less than 500 nanoseconds at the output
of MA2. This output can be read back into the microprocessor by
activating the tristate buffer TB (SN74LS126) at the output of
MA2.
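The complete data path (multiplier M, ROM1, ROM2, MA1 and MA2) can be modelled in a few lines. This is a behavioural sketch only: the names are illustrative, the ROM contents are generated from the rule 15 x address (ROM1) and 15 x 256 x address (ROM2), both reduced mod 65521, and the modular adders are represented by a plain modulo addition rather than the three-adder circuit.

```python
M = 65521
K = (1 << 16) - M                                # 15

# Lookup tables addressed by the low and high byte of the high order word.
ROM1 = [K * i % M for i in range(256)]           # 15 * low byte
ROM2 = [K * 256 * i % M for i in range(256)]     # 3840 * high byte, mod M


def mod_add(a, b):
    # Behavioural stand-in for one hardware modular adder (MA1 or MA2).
    return (a + b) % M


def mod_mul(a, b):
    """16 x 16-bit multiply reduced mod 65521 via table lookup, as in figure 5.1."""
    product = (a & 0xFFFF) * (b & 0xFFFF)        # multiplier M: 32-bit product
    low = product & 0xFFFF
    high = product >> 16
    partial = mod_add(low, ROM1[high & 0xFF])    # MA1
    return mod_add(partial, ROM2[high >> 8])     # MA2
```

A software analogue of the exhaustive hardware test is to compare mod_mul(a, b) with a * b % M over the operand range; spot checks on boundary values already exercise every path of the model.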
The multiply instruction for the TMS9900 microprocessor
works in the following manner. If the multiplicand is in
register Rn and the multiplier is in register Rm, then after the
multiply instruction Rn:Rn+1 holds the product and Rm remains
unchanged. For example, if register R2 contains $FFFF, and R3
contains $FFFF, then after the multiplication the register pair
R3:R4 contains $FFFE0001, where ':' shows concatenation of two
registers to form a register pair to hold the 32-bit product.
The division operation also utilises a (consecutive)
register pair to hold the quotient and the remainder. Initially
the dividend is held in a register pair Rn:Rn+1. After the
division Rn holds the quotient and Rn+1 holds the remainder.
For example, if R2 contains the divisor ($0005) and R3:R4
contains the dividend ($0000005B), then after the divide instruction
R3 will contain ($0012) and R4 will contain ($0001).
Table 5.1: Values in ROM1
(addresses run down the columns, 26 per column)
    0   390   780  1170  1560  1950  2340  2730  3120  3510
   15   405   795  1185  1575  1965  2355  2745  3135  3525
   30   420   810  1200  1590  1980  2370  2760  3150  3540
   45   435   825  1215  1605  1995  2385  2775  3165  3555
   60   450   840  1230  1620  2010  2400  2790  3180  3570
   75   465   855  1245  1635  2025  2415  2805  3195  3585
   90   480   870  1260  1650  2040  2430  2820  3210  3600
  105   495   885  1275  1665  2055  2445  2835  3225  3615
  120   510   900  1290  1680  2070  2460  2850  3240  3630
  135   525   915  1305  1695  2085  2475  2865  3255  3645
  150   540   930  1320  1710  2100  2490  2880  3270  3660
  165   555   945  1335  1725  2115  2505  2895  3285  3675
  180   570   960  1350  1740  2130  2520  2910  3300  3690
  195   585   975  1365  1755  2145  2535  2925  3315  3705
  210   600   990  1380  1770  2160  2550  2940  3330  3720
  225   615  1005  1395  1785  2175  2565  2955  3345  3735
  240   630  1020  1410  1800  2190  2580  2970  3360  3750
  255   645  1035  1425  1815  2205  2595  2985  3375  3765
  270   660  1050  1440  1830  2220  2610  3000  3390  3780
  285   675  1065  1455  1845  2235  2625  3015  3405  3795
  300   690  1080  1470  1860  2250  2640  3030  3420  3810
  315   705  1095  1485  1875  2265  2655  3045  3435  3825
  330   720  1110  1500  1890  2280  2670  3060  3450
  345   735  1125  1515  1905  2295  2685  3075  3465
  360   750  1140  1530  1920  2310  2700  3090  3480
  375   765  1155  1545  1935  2325  2715  3105  3495
Table 5.2: Values in ROM2
(addresses run down the columns, 26 per column)
    0 34319  3117 37436  6234 40553  9351 43670 12468 46787
 3840 38159  6957 41276 10074 44393 13191 47510 16308 50627
 7680 41999 10797 45116 13914 48233 17031 51350 20148 54467
11520 45839 14637 48956 17754 52073 20871 55190 23988 58307
15360 49679 18477 52796 21594 55913 24711 59030 27828 62147
19200 53519 22317 56636 25434 59753 28551 62870 31668   466
23040 57359 26157 60476 29274 63593 32391  1189 35508  4306
26880 61199 29997 64316 33114  1912 36231  5029 39348  8146
30720 65039 33837  2635 36954  5752 40071  8869 43188 11986
34560  3358 37677  6475 40794  9592 43911 12709 47028 15826
38400  7198 41517 10315 44634 13432 47751 16549 50868 19666
42240 11038 45357 14155 48474 17272 51591 20389 54708 23506
46080 14878 49197 17995 52314 21112 55431 24229 58548 27346
49920 18718 53037 21835 56154 24952 59271 28069 62388 31186
53760 22558 56877 25675 59994 28792 63111 31909   707 35026
57600 26398 60717 29515 63834 32632  1430 35749  4547 38866
61440 30238 64557 33355  2153 36472  5270 39589  8387 42706
65280 34078  2876 37195  5993 40312  9110 43429 12227 46546
 3599 37918  6716 41035  9833 44152 12950 47269 16067 50386
 7439 41758 10556 44875 13673 47992 16790 51109 19907 54226
11279 45598 14396 48715 17513 51832 20630 54949 23747 58066
15119 49438 18236 52555 21353 55672 24470 58789 27587 61906
18959 53278 22076 56395 25193 59512 28310 62629 31427
22799 57118 25916 60235 29033 63352 32150   948 35267
26639 60958 29756 64075 32873  1671 35990  4788 39107
30479 64798 33596  2394 36713  5511 39830  8628 42947
The dividend must be in a register pair (right justified).
Before performing the division the microprocessor checks if the
divisor is greater than the most significant word of the
dividend. If the divisor is greater then normal division takes
place. However, if the divisor is not greater than the most
significant word of the dividend then the overflow bit in the status
register is set and the division operation is aborted, and the
dividend remains unchanged.
In figure (5.5) register pair (R2:R3) holds the 32-bit 
unsigned product resulting from a multiply (MPY) instruction. 
After a divide (DIV) instruction R2 holds the quotient and R3 
holds the remainder. 
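The register-pair behaviour of the multiply and divide instructions, and their use for modular multiplication as in figure (5.5), can be modelled as below. The function names are illustrative; treating a divisor equal to the most significant word as an overflow is an assumption of this sketch.

```python
M = 65521


def mpy(multiplicand, multiplier):
    """Unsigned 16 x 16 multiply; returns the (high, low) words of the register pair."""
    product = (multiplicand & 0xFFFF) * (multiplier & 0xFFFF)
    return product >> 16, product & 0xFFFF


def div(high, low, divisor):
    """Unsigned 32/16 divide; returns (quotient, remainder), or None on overflow."""
    if divisor <= high:
        # Assumed: overflow is flagged and the operation aborted when the divisor
        # is not greater than the most significant word of the dividend.
        return None
    dividend = (high << 16) | low
    return dividend // divisor, dividend % divisor


def mod_mul(a, b):
    """Modular product via multiply then divide; the remainder is the result."""
    high, low = mpy(a, b)
    result = div(high, low, M)
    if result is None:
        raise OverflowError("quotient does not fit in 16 bits")
    return result[1]
```

For operands already reduced mod 65521 the product is below 65521 x 65536, so the high word is always smaller than the divisor and the overflow case cannot occur.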
Comparing the two values in figure (5.5) and figure (5.6)
shows a saving of 180 clock cycles for a single modular
multiplication. For a clock frequency of 3 MHz the total time
saved for each modular multiplication is 60 microseconds.
5.3 Results 
A 15-point and a 60-point WFTA were run on a
TMS9900 microprocessor, requiring 18 and 72 multiplications
respectively. The execution time for a 15-point WFTA is about 4
milliseconds and for a 60-point WFTA about 32 milliseconds
using the hardware multiply and divide instructions. When the
external hardware modular multiplier is used, the execution
time is reduced to about 3 milliseconds for a 15-point transform,
and to about 28 milliseconds for a 60-point transform.
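The measured reductions are consistent with the per-multiplication saving derived from figures (5.5) and (5.6). A small calculation, using only the cycle counts and the 3 MHz clock quoted in the text (the names are illustrative), makes the comparison explicit.

```python
CLOCK_HZ = 3_000_000            # TMS9900 clock frequency quoted in the text
INTERNAL = 14 + 88 + 132        # MOV, MPY, DIV (figure 5.5)
EXTERNAL = 20 + 20 + 14         # three MOV instructions (figure 5.6)


def saving_ms(multiplications):
    """Time saved, in milliseconds, by using the external modular multiplier."""
    cycles = (INTERNAL - EXTERNAL) * multiplications
    return cycles / CLOCK_HZ * 1000
```

This gives roughly 1.08 ms for the 18 multiplications of the 15-point transform and 4.32 ms for the 72 multiplications of the 60-point transform, in line with the observed drops from about 4 to 3 ms and from about 32 to 28 ms.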
A 60-point WFTA implemented in FORTH requires about 739 
milliseconds to execute. When the external hardware modular 
multiplier is used, a saving of 3 milliseconds is achieved. 
An interesting point to note is that the modular multiplier 
generates the 16-bit modular product between the second and third 
move (MOV) instruction. If the modular multiplier had been 
slower, then a delay routine would be required between latching 
the second operand into the modular multiplier and reading the 
modular product from it. 
The modular multiplier was tested extensively. A test
routine for the TMS9900 microprocessor was written to check all
the possible input combinations of the multiplier and the
multiplicand. The modular products obtained from the modular
multiplier were compared with the modular products of the same two
numbers calculated by the microprocessor itself.
The total cost of this external hardware modular multiplier is
approximately £400 (1980), which is dominated by the cost of the
four multiplier chips. Total power consumption is about 16 watts
and 81 I.C. packages are used in all.
CHAPTER 6
Multi Processor and Parallel Processor Systems 
6.1 Introduction 
A Central Processing Unit (CPU) fetches instructions from 
its program memory sequentially under the program control (see 
figure 6.1). These instructions are then decoded and executed. 
Each instruction may differ in length depending upon the mode of
instruction. The instructions are visualised as a stream of
instructions and the operands as a stream of data.
The data are manipulated in the CPU registers and the 
results are stored back in the memory. The arithmetic operations 
performed in the CPU registers are much quicker than the register 
to memory or memory to register operations. The onchip registers 
are also referred to as scratchpad registers. Some of the onchip 
registers are not accessible to the programmer and are entirely 
used by the CPU. 
6.2 System Organisation 
Suppose that a processor P is operating at full speed and
capacity. Let MI and MD be the minimum number of instruction
and data streams respectively. Computer systems can then be
organised in four different ways according to the instruction
and data streams.
6.2.1 Single Instruction Single Data (SISD) Machine 
In this type of system MI = MD = 1. This arrangement is
typical of a uni processor system (with a single Arithmetic-Logic
Unit (ALU), and a Control Unit). A single instruction I is
fetched from the program memory sequentially under the ALU
control, and is decoded by the ALU and then executed in m
subinstructions S1, S2, ..., Sm (as shown in figure 6.2).
The data are obtained from the data memory, and after the
calculations the results are stored back into it. Each
instruction represents one arithmetic operation on input data D
entering the ALU to generate the output data D'.
6.2.2 Single Instruction Multiple Data (SIMD) Machine 
In this case MI = 1, and MD > 1. Figure (6.3) shows a
typical SIMD machine. There are m processors P. These
processors are arranged in such a manner that the same
instruction stream performs operations on m separate input data
streams D1, D2, ..., Dm to generate the output data
streams D'1, D'2, ..., D'm. This arrangement is typical
of an array processor, with a single control unit and some
arrangement to broadcast instructions to the desired processors.
6.2.3 Multiple Instruction Multiple Data (MIMD) Machine 
In this type of system MI > 1 and MD > 1. Figure (6.4)
shows a typical MIMD machine. The processors P are arranged such
that each one is distinct and separate, and a separate
instruction stream is applied to each of the m processing units.
Let each of the processing units have separate input data
streams D1, D2, ..., Dm to generate the output data streams
D'1, D'2, ..., D'm. This system executes several
independent programs concurrently. It basically forms a multi
processor system, such that each processor has a separate program
memory.
Figure 6.1: Conventional uni processor system.
Figure 6.2: A Single Instruction Single Data (SISD) machine.
Figure 6.3: A Single Instruction Multiple Data (SIMD) machine.
Figure 6.4: A Multiple Instruction Multiple Data (MIMD) machine.
6.2.4 Multiple Instruction Single Data (MISD) Machine 
In this case MI > 1 and MD = 1. Figure (6.5) shows a
typical MISD machine. The same data passes through different
segments. The same set of data D is operated upon by m
instructions to generate the output D'. This arrangement can
also be called an m-segment pipeline processor. A pipeline
processor requires more hardware and complex circuitry, but has
high speed operation. Each of the segments is separated by a
buffer register to hold intermediate results.
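The four organisations differ only in whether MI and MD are equal to or greater than one, which can be summarised in a small helper. The function name is illustrative.

```python
def classify(mi, md):
    """Return the organisation (SISD, SIMD, MISD or MIMD) for stream counts MI, MD."""
    if mi < 1 or md < 1:
        raise ValueError("MI and MD must both be at least 1")
    instr = "S" if mi == 1 else "M"   # single or multiple instruction streams
    data = "S" if md == 1 else "M"    # single or multiple data streams
    return f"{instr}I{data}D"
```

For instance, an array processor with one instruction stream and m data streams is classified as SIMD, and an m-segment pipeline with one data stream as MISD.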
6.3 Multi Processor Systems 
Experience reveals that parallelism in hardware circuitry 
increases the throughput of the system. Increasing the level of 
parallelism increases the potential operating speed but also the 
hardware and the cost. 
Consider a uni processor system with programmed I/O devices. 
A CPU performs I/O routines to transfer data to and from the I/O 
devices using polling. Polling is a scheme in which the CPU 
periodically checks the I/O devices to see if any of the devices 
needs servicing. The system would tend to slow down when the CPU 
is interfaced to rather slow mechanical devices, e.g. a card 
reader or a line printer. An improvement on the programmed I/O 
method of data transfer is to implement interrupts. In this case 
the CPU does not poll any of the devices, but when the peripheral 
or I/O device is ready to receive/transmit data it sends an 
interrupt signal to the CPU. The CPU branches to the appropriate 
interrupt service routine, and after performing the I/O routines 
resumes normal operation. A further improvement would be to 
employ I/O Processors (IOPs), also called Peripheral Processing 
Units (PPUs). These reduce considerably the load on the main 
CPU. The IOPs share common memory with the main CPU, but the CPU 
still initiates and terminates all the data transfer operations. 
The main CPU behaves as a master and the IOPs as slaves. 
Figure 6.5: A Multiple Instruction Single Data (MISD) machine. 
The advantage of employing a CPU and IOPs side by side is that 
both can execute their programs concurrently and independently of 
each other. This basically forms a type of multi processor 
system. Figure (6.6a) shows a single shared link between memory 
and I/O devices for local communications. The speed of the 
system may suffer if the I/O devices are very slow. Figure (6.6b) 
shows another arrangement with a dual bus, in which the I/O 
devices are controlled by an IOP (22). 
In most practical systems the processors are required to 
communicate with each other. Multi processor systems can 
be classified as directly or indirectly coupled, depending 
upon the method of data exchange. 
Figure 6.6a: Local communication between CPU and memory and I/O 
connected through a shared bus. 
Figure 6.6b: Local communications with memory and several 
I/O through IOP using dual bus structure. 
6.3.1 Directly Coupled Multi Processor Systems 
A multi processor system is defined as a computer system 
with more than one CPU, sharing a common memory and I/O devices. 
The CPUs co-operate with each other at hardware and software 
level, and exchange data with each other through common memory 
when required (73). This is known as a directly or tightly 
coupled multi processor system. 
Scales (77) has described two kinds of directly coupled 
multi microprocessor systems using Motorola's MC6809E 
microprocessor, namely the global-only and local/global types. He has 
also discussed the basic hardware differences between the MC6809 
and the MC6809E version of the microprocessor. The MC6809E 
version requires an external (TTL) clock, but the MC6809 has an 
onchip oscillator, which operates by an external crystal. The 
MC6809E version provides output signals suitable for a multi 
microprocessor environment. 
In the global-only type, the microprocessors continuously 
use the same global bus, because all the microprocessors share 
the common (global) memory. The efficiency of the system is low. 
Each microprocessor is granted the bus by the bus arbiter at the 
beginning of each cycle of the clock E. One of the 
microprocessors has higher priority than the rest of the 
microprocessors such that the system behaves as a master and 
slaves. The master acquires the global bus on powerup reset to 
initialise the system and peripherals, while the other 
microprocessors execute the SYNC instruction and wait for the 
interrupt after the reset has been activated. The priority of 
the microprocessors is in round-robin manner. At any instant 
only one microprocessor uses the global bus and the clocks are 
stretched for other microprocessors. The maximum time for which 
the clock can be stretched is 10 microseconds without loss of 
data. 
In the local/global system each of the microprocessors has 
its own local program and data memory connected to the 
microprocessor by the local data and address buses. In addition 
there is a global memory, data bus, address bus and global I/O 
devices. Each of the microprocessors is allocated a different 
task, for example one of them performs the I/O operations, 
another runs the operating system, and the control microprocessor 
supervises the entire system. 
A bus arbiter controls the flow of the data from the 
microprocessors to the global memory and global I/O devices. 
Each of the microprocessors executes its program from its own 
local program memory using its local bus. If any of the 
microprocessors wishes to access the global memory, it puts a 
request to the bus arbiter, which makes sure that only one 
microprocessor is accessing the global bus at a time to prevent 
bus contention. If two microprocessors simultaneously request 
the bus arbiter to access the global memory, the bus is granted 
by the bus arbiter to the microprocessor which has the higher 
priority, and the other microprocessor is sent into a wait state 
with its clock stretched until the first one has finished the 
data transfer into the global memory or the global I/O device. 
As long as the microprocessors are executing programs from their 
own local program memories the speed and efficiency of the system 
is a maximum, but as soon as more than one microprocessor wishes 
to access the global memory or I/O device, the speed of the 
system suffers. The number of microprocessors which can be 
interconnected in this manner is limited (4 in this case). 
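The arbitration rule for simultaneous requests can be sketched in a few lines of Python. This is only an illustration under the assumption of a fixed priority order; the function name arbitrate and the processor numbering are hypothetical and are not taken from the system described by Scales.

```python
def arbitrate(requests, priority):
    """Grant the global bus to the highest-priority requester.

    requests: set of processor ids currently requesting the global bus.
    priority: list of processor ids, highest priority first (assumed fixed).
    Returns (granted, waiting): the granted id (or None if nobody asked)
    and the ids whose clocks would be stretched while they wait.
    """
    for cpu in priority:
        if cpu in requests:
            waiting = sorted(requests - {cpu})
            return cpu, waiting
    return None, []


# Processors 2 and 3 request the global bus in the same cycle.
granted, waiting = arbitrate({2, 3}, priority=[0, 1, 2, 3])
```

In this example processor 2 is granted the bus and processor 3 waits, which is the situation in which the speed of the system suffers.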
Hoffner and Smith (68) have described a tightly coupled 
multi processor system. This system employs two MC6809 
microprocessors, which are operated by opposite phases of a 
common clock. This prevents simultaneous 
access by the microprocessors to the common memory. The memory 
in this system should be twice as fast as the processor read 
cycle, to prevent contention. The processors are connected 
through a parallel interface buffer to a common memory. The 
advantage of this system is that in one cycle one of the 
processors writes into the memory, while in the next 
(anti-phase) clock the other processor can read this particular 
byte. The major drawback of tightly coupled multi processor 
systems in general is memory conflict. The method described 
above circumvents the memory conflict problem, but is limited to 
two microprocessors. 
6.3.2 Indirectly Coupled Multi Processor Systems 
Indirectly coupled multi processor systems, in contrast, do 
not share a common memory (73). The data exchange takes place 
through another medium such as magnetic tape, magnetic disk or 
I/O ports. Each of the CPUs has its own associated memory. In 
loosely or indirectly coupled multi processor systems the 
processors work more autonomously than in tightly coupled 
systems. 
Bellm and Sauer (64) have described three different methods 
for data exchange between two Intel 8080 microcomputers. 
The first method involves parallel data transfer through 
Programmable Peripheral Interface (PPI) using I/O ports. A 
further port is used for handshaking. These handshaking signals 
are also referred to as semaphores. Semaphores are memory 
locations under the software control which act as flags 
indicating the presence or absence of data. When one 
microcomputer transfers the data into its output port, it sets a 
1-bit flag in the other output port. This port is being 
continuously monitored by the other microcomputer, when it is 
expecting data from the other microcomputer. When the signal on 
a particular bit changes, the destination microcomputer reads the 
output port of the source microcomputer. The destination 
microcomputer then acknowledges this by setting a bit in its own 
output port. This port is being monitored by the source 
microcomputer (after it has transferred data to its output port). 
The source microcomputer after receiving this acknowledgement 
sends the next data byte. The data transfer can be in either 
direction, i.e. each of the two microcomputers can at one instant 
behave as source, and at the next instant as destination. A 
loop counter determines the number of data bytes to be 
transmitted and/or received. 
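The handshaking sequence of the first method can be sketched as a small sequential simulation. The sketch assumes that a change of a flag bit is signalled by toggling it, and the names transfer, src_port and dst_port are hypothetical; the actual port assignments used by Bellm and Sauer are not reproduced here.

```python
def transfer(data_bytes):
    """Simulate flag-based handshaking between a source and a destination."""
    src_port = {"data": None, "ready": 0}   # source output ports: data and 1-bit flag
    dst_port = {"ack": 0}                   # destination output port: acknowledge bit
    received = []
    for byte in data_bytes:                 # loop counter = number of bytes to send
        # Source: place the byte on its data port, then toggle the flag.
        src_port["data"] = byte
        src_port["ready"] ^= 1
        # Destination: it has been monitoring the flag; when the flag differs
        # from its own acknowledge bit it reads the data and toggles the bit.
        if src_port["ready"] != dst_port["ack"]:
            received.append(src_port["data"])
            dst_port["ack"] ^= 1
        # Source: may send the next byte only once the acknowledge matches.
        assert dst_port["ack"] == src_port["ready"]
    return received


block = transfer([0x12, 0x34, 0x56])
```

A real implementation would run the two halves on separate microcomputers, each polling the other's port; the single loop above only shows the order of events.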
The second method uses interrupts. When the data are 
available at the output port the source sends an interrupt to the 
destination. After executing the interrupt routine the two 
microcomputers can resume their normal operation independently. 
Data exchange still takes place through input and output ports. 
The destination microcomputer then reads the data, and sends an 
acknowledge signal back to the source. 
The third method employs Direct Memory Access (DMA). The 
source microcomputer sends a request for DMA to the destination 
microcomputer. The destination microcomputer forces its address 
and data buses into high impedance state. The source can then 
access the address and data buses of the destination 
microcomputer to access its memory. Then the source 
microcomputer can write into this remote memory as if it were its 
own memory. A tristate buffer is required to isolate the common 
buses of the two microcomputers (67), (77). During the DMA the 
destination microcomputer is not executing any program. After 
the DMA is complete a signal transmitted to the destination 
microcomputer restarts it. This method of data transfer requires 
complex circuitry. Tanabe and Matsumoto (74) have described a 
dual bus microprocessor. This microprocessor is capable of 
behaving as a master or a slave depending upon a control signal. 
The dual bus architecture allows use of both the buses (local and 
global) simultaneously, for example on the internal bus the CPU 
is executing its program, while the external bus is being used 
for DMA. This prevents the microprocessor idling during DMA, 
thus increasing the throughput. 
6.4.1 Time-shared Bus 
A time-shared bus is sometimes also referred to as a shared 
bus (22), (71), (72). This is a single bus which is used by 
several processors to communicate with each other, or with some 
other processor or I/O device, at different intervals of time. A 
shared bus has more than one source and destination. The shared 
bus may be unidirectional or bidirectional. The data transfer 
rate is low but the cost is also low. The complexity of the 
hardware and control function increases with the number of 
processors on the bus. A major disadvantage is that 
only one processor can act as a source at a time, and the rest of 
the processors are effectively cut off from the bus during this 
period (see figure (6.7)). A bus arbiter or a multiplexer controls 
the dynamic communication path between the two devices. 
Additional systems can be connected to the bus without major 
alterations in the link, provided that the arbiter has the 
capacity to control all the devices. Such a system is called a 
modular system. 
6.4.2 Dedicated Link 
A dedicated link is one in which there is only one 
source and one destination per link (see figure (6.8)). A 
dedicated link provides high speed communications at the expense 
of increased cost. These dedicated links can be either 
unidirectional or bidirectional. A system of n devices in which 
every pair has its own link requires n(n-1)/2 links, so connecting 
an additional device to an n-device system requires n further 
links. This kind of system is non-modular. 
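The growth in the number of dedicated links can be checked with a short sketch, assuming that every pair of devices has its own link. The function names are illustrative only.

```python
def dedicated_links(n):
    """Number of links needed so that every pair of n devices has its own link."""
    return n * (n - 1) // 2


def extra_links_for_new_device(n):
    """Additional links needed when one device joins a fully linked n-device system."""
    return dedicated_links(n + 1) - dedicated_links(n)


links_for_four = dedicated_links(4)             # four devices, as in a small system
extra_for_fifth = extra_links_for_new_device(4)
```

Four devices need 6 links, and a fifth device needs 4 more, one to each existing device, which is why such a system does not extend as easily as a shared bus.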
Figure 6.7: A shared bus system. 
Figure 6.8: Several devices interconnected through dedicated 
link. 
6.5 Parallel Processor Systems 
The term parallel processing is used in a very general 
sense to cover methods which improve computational speed by 
performing calculations simultaneously, i.e. in parallel. 
Each of the CPUs has its own local memory (RAM and ROM). 
These local memories are not accessible to any other processor, 
not even to the master. The role of the master in this 
configuration is to control the data flow to and from the slaves. 
The master can also initiate the task. This type of system is 
useful in implementing algorithms with inherent parallelism (59), 
(61). Then a big task is broken down into subtasks and each 
processor is allocated a subtask (73). The processors 
communicate with each other through the I/O ports or dedicated 
buses. A master processor supervises the entire system. The 
master is capable of communicating with all the slaves. This 
kind of system is of dedicated type, and it is not very suitable 
for general applications. Another approach to such a system is 
that the master is capable of accessing the local memory of the 
slave(s). This makes the system programmable and more flexible, 
i.e. the master can transfer program(s) into the local memory of 
the slave(s) and request them to execute this program on a 
particular set of data (63). After completing the task the 
slave(s) informs the master and goes into an idle state and waits 
for the next task. This method is also useful when the raw data 
is to be preprocessed to be used at a later stage during the 
program execution by the master. 
A parallel processor system basically forms an MIMD machine. 
All the processors are under the control of a central control 
unit. Increased parallelism makes the system special purpose or 
dedicated, while low order of parallelism makes the system less 
efficient. Parallelism in a particular problem is obtained by 
examining the size and type of the problem. 
FFT type algorithms can be implemented on a parallel 
processor system provided that the data exchange among the 
processors is performed in an efficient manner (1). 
Parallelism in an algorithm is defined as the number of 
arithmetic operations that are independent and can therefore be 
performed in parallel, i.e. concurrently. A system which can 
utilise this parallelism in full would be highly efficient. 
6.6 Array Processors 
A processor is defined as a computer without a control unit 
(66). These processors can be arranged into arrays with a single 
control unit. These processors are then much easier to design 
using integrated circuit technology on a single chip. This would 
basically form an SIMD machine. The control unit, depending upon 
the instruction, can disable or enable a particular processor. 
If a separate control unit is provided for each processor then it 
would work more autonomously, but still working under the control 
of a central control unit. 
Performance of an array processor is the (data) bandwidth or 
maximum throughput measured in terms of maximum number of results 
that can be generated per unit time. One measure often used for 
high performance machines is the number of floating point 
operations per second (flops). Sometimes a bigger unit, 
megaflops (million floating point operations per second), is also 
used. 
Array processors are employed for implementation of 
algorithms which have inherent parallelism (62), (70). Each 
processor shares the task of processing the data, and the load on 
each processor should be kept at the same level. As the processors 
are physically located in close proximity to each other, parallel 
connection exists between them. Each processor can have its own 
program and data memory. The control unit can appropriately 
enable or disable the processors as required. 
6.7 Processor - Memory Interconnection 
Processor to memory interconnection is one of the essential 
factors to be considered while designing a multi processor 
system. The connection to the main memory with a number of 
processors can be achieved by multiplexing through a switching 
network (87). 
Figure (6.9) shows a cross-bar switch matrix interconnecting 
processors P and memory or I/O modules M. The advantage of this 
arrangement is that connections between several processors and 
memory modules can be made simultaneously, provided that the 
processors are accessing different memory modules. In this case 
the efficiency would be a maximum. Some arrangement must be 
provided to prevent simultaneous access by two or more processors 
to a common memory. The cost of a large cross-bar switch may 
exceed the total cost of the rest of the system. 
Figure 6.9: Processor-memory interconnection through a cross-bar 
switch. 
Figure 6.10: Several processors connected to a ring through 
switches (S: switch, P: processor). 
Arden and Berenbaum (65) have described a switch with four 
ports, of which one is the input port and the rest are output 
ports. The input port can be connected to any of the output 
ports by proper addressing. These three output 
ports can further be connected to similar switches, which 
extends the capability of the processor to access a bank of 
memories. Care should be taken, however, not to connect more than 
one processor to the same memory module accessing a different 
address. This is referred to as memory interference and it is 
entirely under software control. Another kind of contention which 
can arise in a multi processor system is over access to the 
common system routines or tables. This kind of contention is 
called system contention. To overcome this problem the routines 
must be reentrant. A reentrant routine is one which can be 
executed by several different processors simultaneously, provided 
that the data for each processor are held in a different data memory. 
Interleaved memories may also be implemented. In an 
interleaved memory structure even and odd addresses are located 
in different memories, such that they can be accessed one after 
the other in quick succession. This eases the constraint imposed 
by the slow access time of the memory. For instance, the processor 
fetches the instruction (op code) from the even address, and in 
the next cycle it fetches the operand from the odd address memory 
module. 
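The even/odd arrangement amounts to selecting the memory module from the low order part of the address. A minimal sketch follows, assuming two modules and word addresses; the function name bank_and_offset is illustrative only.

```python
def bank_and_offset(address, banks=2):
    """Split an address into (memory module, location within that module)."""
    return address % banks, address // banks


# Consecutive addresses alternate between the even and the odd module.
mapping = [bank_and_offset(a) for a in range(4)]
```

Here addresses 0 and 2 fall in module 0 and addresses 1 and 3 in module 1, so two successive accesses never wait on the same module.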
6.8 Computer Systems 
Computer systems can be connected in several ways, a few 
of which are described below. 
6.8.1 Ring Structure 
A ring or mesh network is shown in figure (6.10) (22). The 
ring structure is used for long distance communications or local 
area networks. The switches S1 to S6 behave as multiplexers, and 
the processors P1 to P6 are interconnected through these switches. 
Each processor, before transmitting data, sends the 
address of the destination processor to the link. The appropriate 
switch is selected and then the data are transmitted. A switch 
which recognises its local processor as the destination routes 
the data to that processor; otherwise it forwards the data to 
the next switch in sequence. This form of network 
is modular. A facility in the system to reconfigure itself in 
case of a switch failure makes the system more reliable. 
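The forwarding rule of the ring can be sketched as follows. The sketch assumes six switches numbered 1 to 6 and data travelling in one direction only; the function name route is hypothetical.

```python
def route(source, destination, n_switches=6):
    """Return the switches visited when data travel one way round the ring."""
    for node in (source, destination):
        if not 1 <= node <= n_switches:
            raise ValueError("switch numbers run from 1 to n_switches")
    path = [source]
    current = source
    while current != destination:
        # Not the destination: forward to the next switch in sequence.
        current = current % n_switches + 1
        path.append(current)
    return path


path = route(5, 2)
```

Data sent from processor 5 to processor 2 pass through switches 5, 6, 1 and 2, each intermediate switch forwarding because the address does not match its local processor.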
6.8.2 Star Link 
A star link, shown in figure (6.11), consists of a centralised 
controller C. The processors talk with each other through this 
central control switch. Failure of the control switch C would 
cripple the entire system. 
Figure 6.11: Several processors connected to a central control 
switch to form a star configuration (C: central controller, 
P: processor). 
Figure 6.12: Fully connected multi processor network 
(P: processor). 
6.8.3 Fully Connected Link 
A fully connected network is shown in figure (6.12). In a 
large computer network all the computers may be connected to each 
other through a dedicated or a time-shared bus. This allows the 
system to bypass a busy or a faulty processor. There is no 
central control, each processor is allowed to communicate with 
any other processor independently when required. This network 
will be costly to implement due to multiple connections. The 
fully connected network is highly non-modular. 
CHAPTER 7 
A Dedicated Parallel Microprocessor System 
7.1 Introduction 
A number of microprocessors are now available commercially 
(75), (76). Microprocessors are slow for many applications. 
However, additional hardware may be employed for better 
performance, e.g. an array processor interfaced with a mainframe 
computer may increase its performance many fold (62), (70). The 
software on the mainframe computer must be able to detect the 
degree of parallelism in an algorithm, and generate appropriate 
code for it. 
Arden and Berenbaum (65), and Enslow (66), have suggested 
that employing several cheap processors in parallel can in 
certain cases outperform an expensive mainframe computer. With 
the availability of cheap microprocessors, a parallel processing 
technique to implement the WFTA was investigated. 
Figure (4.3) shows a flow diagram of the 15-point WFTA. 
Figure (7.1) shows another way of representing it, which 
illustrates the two dimensional structure in the algorithm. A 
transform of length N, which can be factorised into n mutually 
prime factors (N = r1 x r2 x ... x rn), will have an n dimensional 
structure. For example in this case N = 15, and the two mutually 
prime factors are 3 and 5. When the 15-point WFTA is implemented 
on a uni processor system, the 'DO' loop simulates a parallel 
processor system, calculating the values sequentially rather than 
simultaneously. Coding of a 'DO' loop also hinders efficient 
program execution. In the case of the WFTA the program coding 
requires double indexing in the memory to acquire data for 
arithmetic operations, which would load the microprocessor 
heavily. The consequence is that the microprocessor will spend 
more time on indexing and data organisation than on actually 
performing the arithmetic operations. 
Figure 7.1: 15-point Winograd Fourier Transform Algorithm (WFTA) 
showing a two dimensional structure. 
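The two dimensional structure of a length 15 transform can be made concrete with a short sketch. It assumes the index mapping n -> (n mod 3, n mod 5), which is one standard way of arranging a sequence whose length has mutually prime factors; the shuffle actually used in figure (7.1) may be a different permutation. The function name index_map is illustrative only.

```python
def index_map(n1=3, n2=5):
    """Arrange the indices 0 .. n1*n2 - 1 on an n1 by n2 grid.

    Index n is placed at row n mod n1 and column n mod n2. Because n1 and
    n2 are mutually prime, every grid position receives exactly one index.
    """
    total = n1 * n2
    grid = [[None] * n2 for _ in range(n1)]
    for n in range(total):
        grid[n % n1][n % n2] = n
    return grid


grid = index_map()
```

Each row of the grid can then be handled as a 5-point problem and each column as a 3-point problem, which is the kind of double indexing that loads a single microprocessor.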
We are interested in designing a dedicated parallel 
microprocessor system to implement the 15-point WFTA. 
Implementation of the 15-point WFTA on a parallel microprocessor 
system would circumvent some of the problems arising in the uni 
processor implementation of the algorithm (59), (61). The amount 
of indexing to be performed by each of the microprocessors is 
reduced considerably, and fewer data are to be manipulated by 
individual microprocessors. This frees the microprocessors for 
more vital tasks. Zohar (60) has suggested the use of address 
processors to calculate the addresses of the data beforehand, 
which would effectively increase the system's efficiency. 
Attention is now drawn to some essential factors which must 
be kept in mind when designing a parallel microprocessor system. 
These factors are the transform length N, the choice of a suitable 
microprocessor, inter microprocessor communication, system 
organisation, cost and power requirements. 
7.2 Choice of a Microprocessor 
To investigate the possibility of parallel implementation of 
the 15-point WFTA requires the selection of a suitable 
microprocessor. This was done by writing benchmark programs to 
test the microprocessor's performance in this application. These 
benchmark programs (for modular arithmetic operations) were 
written for the following microprocessors, TMS9900, MC6809, Z80 
(89), 8X300 (90), COP402 (91) and 6502 (92). Among these the 
TMS9900 is a 16-bit microprocessor, whereas the MC6809, Z80 and 
6502 are 8-bit microprocessors. The 8X300 and COP402 are 8-bit 
and 4-bit micro-controllers respectively. The MC6800 
microprocessor was not included in the above list, because the 
MC6809 is an enhanced version of the MC6800, and is much faster 
and more versatile than its predecessor. All these benchmark 
programs were run on the respective microprocessor systems to 
test their accuracy, except for the 8X300 and the COP402, which 
were not available at the time. Appendix-A contains source 
listings of these benchmark programs, listings for the two 
micro-controllers are excluded. 
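The three operations timed by the benchmark programs can be stated compactly in a high level sketch. This is not a transcription of the listings in Appendix-A; the modulus 257 is chosen purely for illustration, and the function names are assumptions.

```python
M = 257  # illustrative modulus only; not taken from the benchmark listings


def mod_add(a, b, m=M):
    """Modular addition: (a + b) reduced into the range 0 .. m-1."""
    return (a + b) % m


def mod_sub(a, b, m=M):
    """Modular subtraction: (a - b) reduced into the range 0 .. m-1."""
    return (a - b) % m


def mod_mul(a, b, m=M):
    """Modular multiplication: (a * b) reduced into the range 0 .. m-1."""
    return (a * b) % m


sum_, diff, prod = mod_add(200, 100), mod_sub(3, 5), mod_mul(16, 16)
```

On a microprocessor the reduction step is the costly part, in particular after a multiplication, where the product is twice as wide as the operands. This is one reason a hardware multiplier matters in the comparison.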
Results of these benchmark programs are shown in figures 
(7.2) to (7.4). Figure (7.5) shows the cost of these 
microprocessors (1981), which was one of the considerations in 
obtaining a cost effective design (also see tables (3.1) to (3.3)). 
Comparison of these results shows that the MC6809 microprocessor 
was the optimum choice. Two of the important features of the 
MC6809 microprocessor which led to its selection were the 
availability of an unsigned hardware multiplier and the SYNC 
instruction. In spite of being an 8-bit microprocessor, its 
powerful addressing and indexing modes can provide a 
performance comparable to that of the 16-bit microprocessors. 
Among the rest, only the TMS9900 microprocessor contains an 
unsigned hardware multiplier. 
Figure 7.2: Results of the benchmark programs for modular 
addition. 
Figure 7.3: Results of the benchmark programs for modular 
subtraction. 
Figure 7.4: Results of the benchmark programs for modular 
multiplication. 
Figure 7.5: Cost of the microprocessors (1981). 
7.3 Architecture of the MC6809 Microprocessor 
Motorola's MC6809 microprocessor is an 8-bit 
microprocessor, with 16-bit addressing, housed in a 40 pin d.i.l. 
package. Figure (7.6) shows a block diagram of the CPU 
architecture (78), (79). 
It consists of two general purpose 8-bit registers A and B, 
often called the accumulators. These registers are mostly used 
for arithmetic purposes. The repertoire of the microprocessor 
contains signed and unsigned 8-bit and 16-bit arithmetic 
operations. The accumulators A and B may be concatenated together 
to form a 16-bit accumulator D, thus allowing 16-bit arithmetic. 
An 8-bit Condition Code register (CC) provides information about 
the current machine status. 
Two 16-bit index registers X and Y are used in the indexed 
mode of addressing. These registers are quite useful when 
sequential data access to and from the memory is required. 
However, an offset can be specified in the instruction, then the 
address in an index register behaves as a base address. The 
accumulators can also be used to hold this offset. 
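The register relationships described above can be written out as a short sketch. It assumes that A forms the high order byte of D and that addresses wrap at 16 bits; the function names make_d and indexed_address are illustrative only.

```python
def make_d(a, b):
    """Concatenate the 8-bit accumulators A (high byte) and B (low byte) into D."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


def indexed_address(base, offset):
    """Effective address: base address in an index register plus an offset."""
    return (base + offset) & 0xFFFF   # the address bus is 16 bits wide


d = make_d(0x12, 0x34)
addr = indexed_address(0x2000, 0x10)
```

Stepping the offset, or the index register itself, through successive values gives the sequential data access for which the index registers are intended.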
Figure 7.6: MC6809 CPU block diagram. 
BA   BS   MPU STATE 
0    0    NORMAL RUNNING 
0    1    INTERRUPT ACK. 
1    0    SYNC ACK. 
1    1    HALT OR BUS GRANT 
Table 7.1: MC6809 CPU state. 
There are two 16-bit stack pointers called the hardware 
Stack pointer S, and the User stack pointer U. These stack 
pointers can be used with the same addressing modes as the index 
registers X and Y. These registers work as pushdown stack 
pointers, and are accessible to the programmer. When subroutines 
or interrupt routines are to be executed, the microprocessor 
automatically utilises the address in the stack pointer S for 
saving the entire machine state in the memory. The stack 
pointers U and S may be used as pointers for the pushdown stack, 
thus supporting pull and push instructions. This pushdown stack 
allows arguments to be passed between the main program and the 
subroutines, interrupt routines etc. 
A 16-bit Program Counter (PC) allows access to 64K bytes of 
memory. The program counter contains the address of the next 
sequential or logical instruction to be executed. An 8-bit 
Direct Page (DP) register is available to enhance the direct 
addressing mode. The contents of this register serve as high 
order 8-bits (A8-A15) during the direct addressing. The DP 
register is cleared when the microprocessor is reset. This 
register allows 8-bit relative addressing within the page, whose 
base address is in the DP register. The direct addressing mode 
requires fewer program bytes and executes much faster than the 
absolute addressing mode. 
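The formation of a direct mode address can be sketched in one line, assuming the DP register supplies address bits A8-A15 and the instruction supplies the low order byte. The function name direct_address is illustrative only.

```python
def direct_address(dp, low_byte):
    """Effective address in direct mode: DP gives A8-A15, the instruction A0-A7."""
    return ((dp & 0xFF) << 8) | (low_byte & 0xFF)


after_reset = direct_address(0x00, 0x40)   # DP is cleared by reset
relocated = direct_address(0x20, 0x40)     # same instruction byte, different page
```

Because the instruction carries only one address byte, changing DP moves the whole 256 byte page without altering the program.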
The microprocessor also contains an onchip oscillator, which 
is accessed through two input pins. This oscillator may be 
operated by an external crystal of frequency 4f (where f is the 
bus frequency, typically f = 1 MHz). Alternatively an external 
(TTL) clock source of 4f may be used to operate the 
microprocessor. The latter arrangement is useful in systems where 
synchronous processing is required e.g. in multi processor or 
parallel processor systems. Two output clock signals E and Q 
(1 MHz), are used for external timings. Addresses are valid on 
the leading edge of Q, and data are latched on the falling edge 
of E. 
A low level on the RESET input forces the microprocessor 
into a known state. A low level on the DMA/BREQ input forces the 
data and address buses into high impedance state, so as to permit 
a direct memory access. A low level input on the HALT line halts 
the microprocessor indefinitely after the end of current 
instruction without loss of data. An MRDY input allows the 
microprocessor to access slow memories, by stretching its clock 
signals. However, the clock signals may not be stretched beyond 
10 microseconds without loss of data. The R/W line indicates a 
Read (high) or a Write (low) cycle. Two output signals BA (Bus 
Available) and BS (Bus Status) give information about the 
current machine status as shown in table (7.1). 
7.3.1 Hardware and Software Interrupts 
Three levels of hardware interrupts are available, and are 
prioritised in the following order: NMI (Non Maskable Interrupt), 
FIRQ (Fast Interrupt ReQuest), and IRQ (Interrupt ReQuest). 
The NMI is a negative edge triggered interrupt and cannot be 
disabled through software. When this interrupt occurs, the 
entire machine state is saved on the hardware stack. This 
condition is indicated by setting the E flag in the condition 
code register. After a reset, the NMI will not be recognised 
until the first program load of the hardware stack pointer S. 
Both the FIRQ and the IRQ are level triggered interrupts and 
are maskable, i.e. these interrupts can be disabled or enabled 
through the software. If the F or the I bit in the condition 
code register is set to logic 1, then the respective interrupt is 
disabled. Otherwise it is enabled. The FIRQ is the fast 
interrupt in the sense that, unlike NMI and IRQ it does not save 
the entire machine state, but saves only the condition code 
register and the program counter on the hardware stack. The E 
bit in the condition code register remains cleared. The IRQ 
interrupt works in a similar fashion as the NMI interrupt, except 
that it is maskable. 
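The priority and masking rules for the hardware interrupts can be summarised in a short sketch. It models only the decisions described in the text, with hypothetical function names, and ignores details such as edge detection on NMI.

```python
def select_interrupt(nmi, firq, irq, f_mask, i_mask):
    """Return the interrupt to service, in the priority order NMI, FIRQ, IRQ.

    f_mask and i_mask stand for the F and I bits of the condition code
    register; a set bit disables the corresponding interrupt.
    """
    if nmi:
        return "NMI"            # cannot be disabled through software
    if firq and not f_mask:
        return "FIRQ"
    if irq and not i_mask:
        return "IRQ"
    return None


def saves_entire_state(name):
    """NMI and IRQ save the entire machine state; FIRQ saves only CC and PC."""
    return name in ("NMI", "IRQ")


chosen = select_interrupt(nmi=False, firq=True, irq=True, f_mask=True, i_mask=False)
```

With FIRQ masked and IRQ enabled, the pending IRQ is serviced and the entire machine state is saved, so the routine is entered more slowly than a FIRQ routine would be.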
Three levels of software interrupts are also available, and
are useful for debugging the system and for software development.
Decoding the low order 4 bits on the address bus determines
which level of interrupt has occurred.
7.3.2 Microprocessor Synchronisation
In a parallel processor system a single out of step 
processor can produce chaotic results. Synchronisation can be 
achieved by handshaking at hardware or software level. The 
handshaking allows data exchange between two or more processors 
without loss of information. 
The MC6809 microprocessor is provided with a SYNC
instruction which may be used to synchronise the microprocessor
to an external event. When the microprocessor executes the SYNC
instruction, it stops processing instructions and waits for
an external interrupt. Two output pins BA . BS = 1 indicate the
SYNC acknowledge, where '.' represents a logical AND operation
(see table 7.1). If the pending interrupt is nonmaskable (NMI),
or is maskable (FIRQ or IRQ) with its mask bit (F or I)
clear, then after receiving the external interrupt the
microprocessor will clear the sync state and execute the
appropriate interrupt routine. However, if the pending interrupt
is maskable and disabled, then the microprocessor will
simply clear the sync state and resume normal operation. This
instruction is ideally suited to situations where the
expected input data arrive at random times and the microprocessor
cannot proceed without them. These data can come from
another microprocessor or from some other source.
The use of the SYNC instruction is equivalent to a wait loop.
An advantage of using the SYNC instruction is that it is faster
than the wait loop, since the microprocessor proceeds
as soon as it receives an interrupt, whereas a
wait loop may introduce a small delay before the processor
can proceed further.
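The two ways in which the sync state can end may be sketched as follows. This is a simplified model under stated assumptions, not the processor's actual microcode; the function name and the string return values are hypothetical.

```python
def sync_wakeup(interrupt, f_mask=0, i_mask=0):
    """Model what happens when an interrupt ends the sync state.

    interrupt: "NMI", "FIRQ" or "IRQ".
    f_mask, i_mask: F and I bits of the condition code register.
    Returns "service" if the interrupt routine is executed, or
    "resume" if the processor simply continues with the next instruction.
    """
    masked = (interrupt == "FIRQ" and f_mask) or (interrupt == "IRQ" and i_mask)
    return "resume" if masked else "service"
```

With the mask bit set, `sync_wakeup("FIRQ", f_mask=1)` gives `"resume"`: SYNC then behaves as a pure wait-for-event with no interrupt routine involved.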
7.4 Inter Microprocessor Communication 
In a multi processor or a parallel processor system it may 
or may not be a requirement for the processors to communicate 
with each other at all. However, if a processor requires data 
from another processor during the task execution, then some form 
of inter processor communication is required. The method of data 
exchange will depend upon whether the system is loosely or 
tightly coupled. 
To investigate a principle for inter microprocessor
communication a simple example is presented. Consider a system
with two general purpose processors P1 and P2 (see figure 7.7).
Each of the processors has its own local program memory, and some
arrangement for decoding the address and generating the
appropriate read/write signals. Consider two latches L1 and L2
with tristate outputs. These latches are connected to the
processors such that P1 can only write into L1 and P2 can only
write into L2. Furthermore, P1 can only read the contents of L2
and P2 can only read the contents of L1. In other words L1 is a
write only and L2 is a read only latch for P1, and L2 is a write
only and L1 is a read only latch for P2. This arrangement forms
a loosely coupled multi processor system, and the associated
latches may be visualised as I/O ports. These latches are
connected through dedicated parallel data buses, with two
associated control signals. These two control signals are the
output enable (OE) and the clock (CLK) signals.
The two processors exchange data with each other through the
communication latches in the following manner. When required, P1
writes data into L1 and P2 writes into L2. The processors are
then synchronised with each other, and then each
processor reads its respective read only latch (88).
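The cross-coupled latch arrangement can be modelled in a few lines of Python. The sketch below is illustrative only: the class `Latch` and its access checks are assumptions introduced to mirror the write only and read only roles described above, and the synchronisation step is represented simply by program order.

```python
class Latch:
    """A latch that one processor may only write and the other may only read."""

    def __init__(self, writer, reader):
        self.writer, self.reader = writer, reader
        self.value = None

    def write(self, who, value):
        if who != self.writer:
            raise PermissionError(f"{who} cannot write this latch")
        self.value = value            # data clocked in by CLK

    def read(self, who):
        if who != self.reader:
            raise PermissionError(f"{who} cannot read this latch")
        return self.value             # data driven onto the bus by OE


L1 = Latch(writer="P1", reader="P2")
L2 = Latch(writer="P2", reader="P1")

# Step 1: each processor writes into its own latch.
L1.write("P1", 1234)
L2.write("P2", 5678)
# Step 2: the processors synchronise (modelled here by program order).
# Step 3: each processor reads the other processor's latch.
p1_received = L2.read("P1")
p2_received = L1.read("P2")
```

Because each latch has exactly one writer and one reader, no bus arbitration is needed; only the ordering of write and read has to be guaranteed.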
Figure 7.7: A two microprocessor system.
7.5 Dual Microprocessor System 
Figure (7.8) shows a block diagram of a practical circuit
based on the idea discussed in the previous section. This system 
contains two MC6809 microprocessors P1 and P2. A TMS9900 
microprocessor system serves as a host or master to control the 
two slaves P1 and P2. Each of the microprocessors has its own 
local program memory and no other microprocessor can access it. 
A common single phase clock is used to operate the two slaves, 
which is separate from the master's clock. The microprocessors 
are located physically very close to each other, and the
interface between the master and slaves is through dedicated 
16-bit latches with tristate outputs. The master's side consists 
of a 16-bit data bus, while the slave's side consists of an 8-bit 
data bus. 
In addition to the communication latches L1 and L2, each of
the two slave microprocessors has associated with it two
additional latches, namely IN1, OUT1 and IN2, OUT2 respectively.
IN1 and IN2 serve as the input buffer memory, i.e. data to be
transferred to the slaves by the master are held in these
latches. Results calculated by the slaves are stored in the OUT1
and OUT2 latches, which are to be read by the master. The
working of these latches is similar to that of L1 and L2 as described
before, except that these latches are used to exchange data with
the master.
The HALT and the RESET inputs of the slave microprocessors 
are connected to the output port of the master. The logic levels 
on this port can be changed individually through the software. 
Figure 7.8: TMS9900 microprocessor controlling the two slaves
(MC6809).
Initially the master resets and then halts the slaves, until it 
has transferred data into the input latches of the slaves. 
Another important feature in this system is the 
synchronisation between the two slaves. This is achieved by 
using the SYNC instruction and the FIRQ interrupt input, with the 
F bit in the CC register set to logic 1. When the HALT input 
goes high the slaves read the input latches and transfer these 
data values into their appropriate communication latches, and 
then execute the SYNC instruction. The sync acknowledge signals
from the two processors are ANDed (G3) together and inverted to
generate an interrupt to both slaves. This condition indicates that
valid data are available in the latches L1 and L2. After
receiving the interrupt the slaves read their appropriate read 
only latches, and perform the desired operation. One of the 
slaves was chosen to perform modular addition and the other 
modular subtraction. 
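The slave-slave exchange described above behaves like a two-party barrier. The following Python sketch imitates it with threads; it is not the MC6809 software. The modulus M = 65521, the choice of which slave adds and which subtracts, and all names are assumptions made for illustration, and `threading.Barrier` stands in for the SYNC instruction together with the ANDed sync acknowledge.

```python
import threading

M = 65521  # assumed modulus, used here only for illustration


def run(a, b):
    """Slave P1 holds a, slave P2 holds b; return both results."""
    latch = {}                       # communication latches L1 and L2
    result = {}
    sync = threading.Barrier(2)      # released only when both slaves arrive

    def slave(name, own, other, value, op):
        latch[own] = value           # write own communication latch
        sync.wait()                  # SYNC: wait for the common interrupt
        result[name] = op(value, latch[other]) % M   # read the other latch

    threads = [
        threading.Thread(target=slave,
                         args=("P1", "L1", "L2", a, lambda mine, theirs: mine + theirs)),
        threading.Thread(target=slave,
                         args=("P2", "L2", "L1", b, lambda mine, theirs: theirs - mine)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Neither slave can read a latch before the other has written it, which is the property the hardware obtains from the ANDed acknowledge signals.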
Some form of protocol is also necessary between the master
and the slave microprocessors to facilitate synchronisation and
communication. For this purpose an 8-bit status (STATUS) latch is
also associated with each of the slaves S1 and S2; only the least
significant bit is used. The output of the status latch
determines the system status. For example, a logic 0 at the
output of the status latch indicates that the slave is busy
executing its program, while a logic 1 indicates termination of
the task (see figure 7.8). The slaves execute the SYNC
instruction after setting status to logic 1. The outputs of the
two status latches are permanently enabled and are ANDed (G1)
together to generate the system status signal. Another 1-bit
signal which is ANDed in G1 is obtained from the output
port of the TMS9900 microprocessor. This bit is called the
status control bit (SCB). When this bit is low the status latch
output has no effect on G3, as G1 is disabled. When the master
wishes to read the output latches, it sets the status control
bit to logic 1, and continuously monitors the output of G1 until
it goes high. When the system status signal goes high, the master
reads the output latches. The slaves execute the SYNC
instruction after outputting the data, hence the slaves will
remain in that state until the status control bit goes low again.
The master clears this bit after transferring new values into the
input latches, which forces the output of G3 low, thus generating
an interrupt to the slaves. The slaves then repeat the same cycle,
first clearing the status latch.
This loosely coupled multi processor system was designed 
just to test its performance and the principle of slave-slave and 
master-slave communication. Additional software on the master
checks that the results calculated by the slaves are correct. 
7.5.1 Merits and Demerits 
In general two microprocessors cannot communicate with each 
other in real time, without one of them waiting for the other to 
send data. But if some intermediate buffer memory is used, then 
the source microprocessor can transfer the data into this buffer 
memory, and the destination microprocessor can read this data at 
leisure. If we are dealing with a single or a double byte 
buffer, then care must be taken that the source does not
overwrite this data before the destination microprocessor has had a
chance to read it (64), (68). Another situation might also
arise, in which the destination microprocessor keeps reading the
same data without realising that the data have not been updated
since they were last read. These conditions can be circumvented by
using a single bit flag which indicates whether the data in the
buffer have been read or updated.
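A single bit flag of this kind can be sketched as a one-slot buffer. The Python class below is a hypothetical illustration (the name `OneSlotBuffer` and its methods are not from this work): the flag prevents both the overwrite and the stale re-read described above.

```python
class OneSlotBuffer:
    """One-value buffer guarded by a single 'full' flag."""

    def __init__(self):
        self.data = None
        self.full = False            # the single bit flag

    def put(self, value):
        """Source side: refuse to overwrite data that has not been read."""
        if self.full:
            return False
        self.data, self.full = value, True
        return True

    def get(self):
        """Destination side: return None if no fresh data has arrived."""
        if not self.full:
            return None
        self.full = False
        return self.data
```

The source retries `put` until it returns True, and the destination retries `get` until it returns a value, so each value is transferred exactly once.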
Previously we have seen that the latch was used as a 
communicating medium between the two microprocessors. The input 
of the latch is connected directly to the data bus of the source 
microprocessor. The output of these (tristate) latches can be 
connected directly to the data bus of the destination 
microprocessor. The control signals, i.e. the clock (CLK) and the
output enable (OE), may be appropriately generated. This means
that each side of the latch consists of ten lines in all, i.e. an
8-bit data bus and two control signals, either output
enables or clocks (since 16-bit data are
transmitted through a unidirectional dedicated data bus). We are
investigating a method for inter microprocessor communication to
be used for the implementation of the 15-point WFTA. We will see
later that in the parallel microprocessor system (for the
parallel implementation of the WFTA) only one 16-bit value is
exchanged between two microprocessors at any instant on a
particular bus. The use of latches thus reduces the circuit
complexity considerably, but at the expense of increased chip
count, cost and power consumption.
Alternatively a common memory (RAM) can be employed for inter
microprocessor communication. Although it provides more storage 
and may be cheaper, it also increases the circuit complexity 
considerably. The major problem in a shared memory system is to 
prevent memory conflict or memory contention. An attempt by two 
or more microprocessors to access common memory is called memory 
contention. The shared and the local address and data buses have 
to be multiplexed (67). The throughput is reduced considerably 
when all the processors wish to access the common memory 
simultaneously. Hoffner and Smith (68) have suggested a method
of preventing memory contention in a system with two MC6809
microprocessors by operating them on opposite phases of a common
clock. The number of microprocessors connected in this manner is
limited to two.
7.6 Design and Implementation of the Dedicated Parallel 
Microprocessor System 
The dual microprocessor system worked quite satisfactorily. 
The method adopted for inter microprocessor communication through 
latches seemed quite suitable for the parallel microprocessor 
system to implement a 15-point WFTA. Each of these latches would 
be connected through dedicated unidirectional 8-bit data buses. 
All the data exchange among the microprocessors can then take 
place simultaneously, hence the system should provide a very high 
efficiency and throughput. 
Close examination of figure (4.3) reveals that the
implementation of the 15-point WFTA algorithm consists of
the following steps.
1. Input shuffle or reordering 
2. Five 3-point preweave or premultiply adds 
3. Three 5-point preweave or premultiply adds 
4. Eighteen modular multiplications with precalculated 
coefficients 
5. Three 5-point postweave or postmultiply adds 
6. Five 3-point postweave or postmultiply adds 
7. Output shuffle or reordering. 
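The overall shape of the steps listed above (shuffle, short transforms along one dimension, short transforms along the other, shuffle) can be illustrated with a small Python sketch. It is a simplified stand-in and not the algorithm of figure (4.3): the 3-point and 5-point transforms are computed directly rather than by Winograd's pre-weave, multiply and post-weave stages, the index maps are the standard Good-Thomas and Chinese remainder theorem maps, and the modulus M = 65521 and the root found by search are assumptions. Python 3.8 or later is needed for `pow` with a negative exponent.

```python
M = 65521            # assumed prime modulus
N1, N2 = 3, 5
N = N1 * N2


def find_root_of_order_15(mod=M):
    """Search for alpha with alpha**15 = 1 but alpha**3 != 1 and alpha**5 != 1."""
    for a in range(2, mod):
        if pow(a, 15, mod) == 1 and pow(a, 3, mod) != 1 and pow(a, 5, mod) != 1:
            return a
    raise ValueError("no element of order 15")


def ntt_direct(x, alpha):
    """Reference 15-point transform straight from the definition."""
    return [sum(x[n] * pow(alpha, n * k, M) for n in range(N)) % M
            for k in range(N)]


def ntt_3x5(x, alpha):
    w3, w5 = pow(alpha, N2, M), pow(alpha, N1, M)   # roots of order 3 and 5
    # Input shuffle into a 3 x 5 array.
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Five 3-point transforms, one down each column.
    cols = [[sum(grid[n1][n2] * pow(w3, n1 * k1, M) for n1 in range(N1)) % M
             for n2 in range(N2)] for k1 in range(N1)]
    # Three 5-point transforms, one along each row.
    rows = [[sum(cols[k1][n2] * pow(w5, n2 * k2, M) for n2 in range(N2)) % M
             for k2 in range(N2)] for k1 in range(N1)]
    # Output shuffle using the Chinese remainder theorem index map.
    t1, t2 = pow(N1, -1, N2), pow(N2, -1, N1)
    X = [0] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * t2 * k1 + N1 * t1 * k2) % N] = rows[k1][k2]
    return X
```

Comparing `ntt_3x5` with `ntt_direct` on the same input is the same kind of check that a verify routine would perform on the hardware results.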
It may be noted here that the 5-point WFTA requires six 
modular multiplications which requires extra storage. Hence the 
total number of modular multiplications in the 15-point WFTA is 
eighteen. Since modular multiplication is the most time consuming 
operation, the parallel microprocessor system was designed such 
that all the microprocessors share the load equally during the 
modular multiplication. This requires 18 microprocessors in the
complete system.
7.6.1 System Architecture
Figure (7.9) shows a block diagram of the dedicated parallel 
microprocessor system. The microprocessors are interconnected to 
form a two dimensional array with three rows and six columns. 
The five 3-point transforms are performed along the columns. The 
microprocessors numbered 16, 17, and 18 do not take an active
part at this stage, hence no connection exists between them along
the column. For the three 5-point transforms the microprocessors
are active along the rows. Comparison shows that each '.'
(column wise) in figure (4.3) corresponds to a box, which is a
microprocessor with associated hardware, in figure (7.9). Each of
the connecting lines along the rows and columns consists of two
8-bit dedicated data buses with two associated control signals,
to facilitate bidirectional communication between the two
microprocessors. All the microprocessors in the system are driven
from a common single phase clock source of 4 MHz. Each of the
slave microprocessors generates its own local timing signals.
Figure 7.9: Block diagram of the parallel microprocessor system
(the control microprocessor is not shown). The first row contains
microprocessors 1 to 5 and 16, the second row 6 to 10 and 17, and
the third row 11 to 15 and 18.
The microprocessors in the system are partially connected, 
i.e. there are no redundant connections. This system basically 
forms a loosely coupled dedicated MIMD machine. The prototype 
system was assembled on seven standard plugin 6U eurocards, using 
wire wrapping techniques. The dotted lines in figure (7.9)
show how these microprocessors are distributed among the six
boards labelled A to F. The seventh board in the system carries
a control or master microprocessor, with associated circuitry.
7.6.2 Design of the Control Microprocessor
The slave microprocessors are not capable of communicating
directly with the outside world, i.e. with a VDU or any other real
time device. Hence an extra dedicated microprocessor is employed
to serve as a host or a master microprocessor (not shown in
figure 7.9). This brings the total number of microprocessors in
the system to nineteen. The control microprocessor not only
serves as a controller for the slaves, but also provides an 
interface to the outside world. The control microprocessor has 
an RS-232 serial interface with the VDU to provide access to the 
system. Figure (7.10) shows a circuit arrangement for the serial 
interface using Motorola's MC6850 ACIA (Asynchronous 
Communications Interface Adapter). A baud rate generator 
Motorola MC14411 is used to generate the receive/transmit rate 
clock for the ACIA (82), (83), (84), (85). 
The parallel microprocessor system appears to the master as
a black box; the only parts accessible to the master are the input
and the output latches associated with the slaves. This black
box appears as an intelligent peripheral to the control
microprocessor. The master microprocessor transfers data to the
input latches and reads the transformed values from the output
latches. For demonstration purposes the master stores the values
read from the output latches in its memory and
displays them on the VDU, or on an oscilloscope via a D/A converter. The
master microprocessor does not interfere in the data exchange
among the slaves, and in fact is unaware of it. All the I/O
data have to pass through the master. For large N, this may
become a limiting factor, and may degrade the system's
performance. For example, 178 microseconds are required to
transfer fifteen 16-bit data values to or from the slave
microprocessors. An alternative arrangement could be made to transfer
the data directly into/from the input and output latches, which
would increase the throughput.
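The figure of 178 microseconds quoted above gives a rough feel for how the I/O cost grows with transform length. The short Python calculation below is only an estimate: it assumes that the cost per 16-bit value stays constant as N grows, which is not stated in the text, and the function name is hypothetical.

```python
TRANSFER_US = 178.0      # from the text: time to move fifteen 16-bit values
WORDS = 15

per_word_us = TRANSFER_US / WORDS     # roughly 11.9 microseconds per value


def io_time_us(n_words, directions=2):
    """Estimated master I/O time for n_words in and, by default, n_words out,
    assuming the cost per value is constant."""
    return n_words * per_word_us * directions
```

On this assumption, loading and unloading one 15-point transform through the master costs about 356 microseconds in total.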
Figure 7.10: ACIA interface.
Figure (7.11) shows circuitry associated with the control 
microprocessor. The control microprocessor has its own program
memory of 2K x 8-bits (2716), and 1K x 8-bit (2 x 2114) of local 
RAM. A number of address decoders (SN74LS154) are required to 
access all the input and output latches. A bidirectional bus 
transceiver (SN74LS245) is used to drive all these latches which 
reduces loading on the data bus of the microprocessor. However, 
the local RAM and ROM are connected directly to the data bus of 
the microprocessor. 
An 8-bit write only control latch (CONTRL) is associated
with the master (see figure 7.12). The output of the control
latch is permanently enabled and the low order six bits (bits 0 to 5) are used
for control purposes. A location STATUS in the RAM keeps a
record of the contents of the control latch. These control
signals are allocated as follows.
Bit 0 master RESET for the slaves
Bit 1 HALT for the slaves
Bit 2 RESET for the baud rate generator
Bit 3 status control bit (SCB)
Bit 4 chip enable for the A/D converter
Bit 5 signal to slaves to perform forward or inverse
transform
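Because the control latch is write only, its contents cannot be read back, which is why a copy is kept in RAM. The idea can be sketched in Python as a shadow register. The constant names and the class below are hypothetical; only the bit positions are taken from the list above.

```python
# Bit masks for bits 0 to 5 of the control latch (names are illustrative).
(RESET_SLAVES, HALT_SLAVES, RESET_BAUD,
 SCB, ADC_ENABLE, TRANSFORM_DIRECTION) = (1 << n for n in range(6))


class ControlLatch:
    """Write only latch plus the RAM copy that remembers its contents."""

    def __init__(self):
        self.shadow = 0          # RAM copy (the location STATUS in the text)
        self.hardware = 0        # what the latch outputs currently drive

    def _write(self):
        self.hardware = self.shadow & 0xFF   # the only access is a write

    def set_bits(self, mask):
        self.shadow |= mask
        self._write()

    def clear_bits(self, mask):
        self.shadow &= ~mask
        self._write()
```

Each change is made on the RAM copy first and then written out whole, so setting or clearing one bit never disturbs the others.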
Figure 7.11: Complete parallel microprocessor system showing
the master and the slaves.
Figure 7.12: Master microprocessor with associated circuitry.
Figure 7.13: Arrangement for generating STATUS signal from each
slave board.
These bits can be individually set to logic 1 or reset to
logic 0 through software using logical bit instructions. The
status control bit (SCB) is used to detect the completion of a
full cycle of the transform (see figures 7.12, 7.13). When
the master wishes to read the output latches, it sets the status
control bit (SCB) to logic 1, executes the SYNC instruction
and waits for the slaves to complete the transform. When the
slave microprocessors finish the transform cycle, they set their
respective status latches to logic 1, and execute the SYNC
instruction. At this time the output of the gate G1 goes low,
disabling G2 and simultaneously generating an interrupt signal to
the master through G3. The master then resumes normal operation
and reads the output latches. As long as the status
control bit remains high, it prevents the interrupt signal from
reaching the slaves. After reading the output latches the master
clears the status control bit. This forces the output of G1
high, enabling G2 and consequently generating an interrupt to the
slave microprocessors. The slaves then start the next cycle of
the transform.
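The gating just described can be written out as Boolean logic. The Python function below is one possible reading of the text, not a transcription of the circuit: the exact gate types are not given, so G1 is assumed to go low only when every status bit and the SCB are at logic 1, G2 is assumed to pass the ANDed sync acknowledges only while G1 is high, and G3 is assumed to signal the master whenever G1 is low.

```python
def gates(status_bits, sync_acks, scb):
    """Return (g1, slave_interrupt, master_interrupt) as booleans.

    status_bits: status latch outputs of the slaves (1 = task finished).
    sync_acks:   sync acknowledge of each slave (1 = waiting in SYNC).
    scb:         status control bit set by the master.
    """
    g1 = not (all(status_bits) and scb)        # low: all finished and SCB = 1
    slave_interrupt = all(sync_acks) and g1    # G2: barrier release, if enabled
    master_interrupt = not g1                  # G3: transform complete
    return g1, slave_interrupt, master_interrupt
```

With all status bits and the SCB at 1 only the master is interrupted; clearing the SCB then releases the slaves, which matches the sequence in the text.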
7.6.3 Software of the Control Microprocessor
To facilitate the development of the software, the control 
microprocessor provides an interactive interface with the 
parallel microprocessor system (see figure 7.11). This allows 
manual insertion of data into the parallel microprocessor system. 
When the power is switched on, the powerup circuitry resets
the master microprocessor. The master then resets the baud rate
generator and the slaves, and halts the slaves. It then resets
and initialises the ACIA for the receive/transmit data
format and the baud rate. The halt state of the slaves is then
cleared. The source listing of the monitor program is included in
appendix-D.
For test purposes a 15-point WFTA verify routine is stored
in a separate ROM (see appendix-D). The control microprocessor
executes the 15-point WFTA on the same input data as the slaves,
and verifies the transformed values obtained from the slave
microprocessors. The control microprocessor displays an error
message on the VDU if the two results do not tally, and prints
these values. A 15-point WFTA was also implemented in FORTRAN on
a mainframe computer to verify these results.
7.6.4 Design of a Typical Slave Microprocessor
A typical circuit arrangement for the slave microprocessor,
interfaced with 2K x 8 bits of local program memory (2716) and
1K x 8 bits of local RAM (2 x 2114), is shown in figure (7.14). However,
microprocessors numbered 16, 17 and 18 have a slight variation in
their circuit arrangement, which is shown in figure (7.15). Each
of the six eurocards contains three such circuits. Each of the
slave microprocessors has associated with it input (INPUT),
output (OUTPUT), and status (STATUS) latches, except for the
slaves numbered 16, 17, and 18. In addition a number of
communication latches are also associated with each of them. The
number of latches for a particular microprocessor depends upon
how many microprocessors it is communicating with. All these
latches are 16-bit (2 x SN74LS374) latches with tristate
outputs, except the status latch which is an 8-bit latch. The
clock and the output enable signals are generated using a 4-line
to 16-line decoder (SN74LS154), gating its outputs appropriately with
E and R/W. All the latches are driven by the bidirectional bus
transceiver (SN74LS245), and the direction of data flow is
controlled by the R/W line.
Figure 7.14: Typical slave microprocessor (1 to 15) with
associated hardware.
Figure 7.15: Typical slave microprocessor (16 to 18) with
associated hardware.
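The generation of the latch strobes from the decoder outputs, E and R/W can be sketched as a small truth function. This is a behavioural illustration under stated assumptions, not the actual control logic: signal polarities and the higher address lines that select the latch region are ignored, and the function name is hypothetical.

```python
def decode(address_low4, e, rw):
    """Return (latch_index, strobe) for one bus cycle.

    address_low4: address lines A0 to A3 feeding the 4-line to 16-line decoder.
    e:            the E clock (strobes are produced only while E is high).
    rw:           R/W line, 1 for a read cycle and 0 for a write cycle.
    strobe is "OE" (read only latch drives the bus), "CLK" (write only
    latch captures the bus) or None while E is low.
    """
    if not e:
        return (None, None)
    index = address_low4 & 0x0F        # one of sixteen decoder outputs
    return (index, "OE" if rw else "CLK")
```

The same decoder output can therefore serve a read only latch and a write only latch, since R/W decides which of the two strobes is issued.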
The operation of slaves numbered 1 to 15 is as follows. 
After receiving the reset signal from the master, the slaves set 
their respective status latches to 1, and execute the SYNC 
instruction. If the status control bit is high, the slaves then 
wait until it goes low. After transferring the results to their 
respective output latches the slaves set the status latch to
logic 1 again, thus allowing the master to read the output
latches. If at this instant the status control bit remains low,
the slaves start the next transform cycle, assuming that the data
have been updated in the input latches. The microprocessors
numbered 16, 17 and 18 receive data from other microprocessors
just before the multiplication cycle. They behave as external
modular multipliers, which for most of the time are idling
(executing a series of SYNC instructions). After performing the 
modular multiplications, these microprocessors return the results 
to the appropriate microprocessors through communication latches. 
These microprocessors then continue to execute another series of 
SYNC instructions until the next multiplication cycle. Figure 
(7.16) shows a flowchart for the master and slave 
microprocessors, which also shows how the software of the master 
and the slaves interact. Figure (7.17) shows a timing diagram. 
Figure 7.16: Flow diagram for the master-slave interaction.
(Master: powerup reset; initialise system; reset slaves; transfer
1st sequence h(n) to slaves; sample 2nd sequence x(n); transfer to
slaves; set SCB = 1; wait for interrupt from slaves; read output
latches; set SCB = 0. Slaves: reset from master; transform 1st
sequence and save result H(k); transform 2nd sequence and save
result X(k); multiply Y = H.X; inverse transform y(n); store result
in output latches; set STATUS = 1; wait for interrupt; set
STATUS = 0.)
Figure 7.17: Timing diagram for the master-slave interaction.
7.6.5 Software of the Slave Microprocessors
All the slave microprocessors execute their programs
concurrently, although the software of each of the slaves is
different from any other. The source listings are given in
appendix-D. The symbol Rn denotes the address
of a read only latch which receives data from the
microprocessor numbered n, where n can have any value from 1
to 18. For example, in the listing for microprocessor number 1,
R6 means that microprocessor 1 receives data
from microprocessor 6 through the latch at address $0412.
Similarly, Tn denotes the address of a write only latch, where n
can have any value from 1 to 18. For example, in the source
listing of microprocessor number 1, T6 means that
microprocessor 1 transmits data to microprocessor
6 through the latch at address $0403.
All the modular arithmetic operations are coded directly in
the main program. No subroutines are being used, as this would 
slow down the microprocessor considerably. For example for the 
MC6809 microprocessor a JSR (jump to subroutine) instruction 
requires 7 to 8 clock cycles, and an RTS (return from subroutine) 
requires 5 clock cycles. This means that 12 to 13 clock cycles 
are required for each subroutine call. Results in table (7.3)
show that the time for a single subroutine call is considerable
compared with the total transform time. Table (7.2) shows the
number of modular arithmetic operations for the 15-point WFTA on
a single and the parallel microprocessor system.
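The cost of calling subroutines can be put into numbers with a short calculation. The sketch below uses only the cycle counts quoted above and the operation counts of table (7.2a); the assumption that every modular operation would be one subroutine call, and the function name, are introduced for illustration.

```python
JSR_CYCLES = (7, 8)               # from the text: jump to subroutine
RTS_CYCLES = 5                    # from the text: return from subroutine
OPS_UNIPROCESSOR = 39 + 18 + 39   # additions and multiplications, table 7.2a


def call_overhead(n_calls):
    """Best and worst case clock cycles spent purely on JSR/RTS pairs."""
    return tuple(n_calls * (jsr + RTS_CYCLES) for jsr in JSR_CYCLES)
```

For the 96 modular operations of a uniprocessor implementation this gives between 1152 and 1248 clock cycles of pure call overhead, which is why the operations are coded in line.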
The slaves execute their programs in an endless loop.
The master must ensure that the output latches are read before
they are overwritten by the slaves.
Operation                               Number
Pre-weave modular additions               39
Modular multiplications                   18
Post-weave modular additions              39
Table 7.2a: Number of operations for the 15-point WFTA
implementation on a uni processor.
Proc. No.   No. of data exchanges   No. of additions
            Receive    Transmit
P1             2          2               2
P2             6          6               6
P3             5          5               5
P4             4          4               4
P5             4          4               4
P6             4          4               4
P7             8          8               8
P8             7          7               7
P9             6          6               6
P10            6          6               6
P11            3          3               3
P12            7          7               7
P13            6          6               6
P14            6          5               5
P15            5          5               5
P16            2          2               1
P17            2          2               1
P18            2          2               1
Table 7.2b: Number of operations per microprocessor
for the 15-point WFTA on the parallel microprocessor system.
(Each microprocessor performs one modular multiplication.)
7.6.6 Synchronisation of the Hardware and the Software
Synchronisation among the slave microprocessors is one of 
the most crucial factors in this system. Recall that the slaves 
are executing programs from their own local program memories. 
The essential requirement is that they should do so in a
predetermined and synchronised manner. Each of the slave
microprocessors, after performing a modular arithmetic operation,
stores the result in an appropriate communication latch and
executes the SYNC instruction. The sync acknowledge signals from all the
slaves are ANDed (G2) together as shown in figures (7.12) and
(7.18). This signal is inverted and fed into the FIRQ interrupt
input of all the slave microprocessors. The result is that the
slaves cannot proceed further until they have all executed the
SYNC instruction. After receiving the interrupt the slaves read
their appropriate read only latches and start processing the data
further (see figure (7.17)). The advantage of this arrangement
is that all the microprocessors always find valid data in the
communication latches.
This synchronisation could also be achieved by coding dummy
instructions such as NOP (no operation) in the main program.
The purpose of these dummy instructions would be to waste
microprocessor time so that each modular arithmetic
operation is executed in an equal number of clock cycles. For
example, 14, 18 or 22 clock cycles are required for a modular add
if sum > 65535, 65521 <= sum <= 65535, or sum < 65521
respectively.
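The three timing cases of the modular addition, and the difference between the two synchronisation methods, can be illustrated as follows. This Python sketch is not the MC6809 code: the modulus M = 65521 is inferred from the thresholds quoted above, the carry correction (adding 15 after a 16-bit overflow) is an assumed implementation detail, and the names are hypothetical. The cycle counts are those given in the text.

```python
M = 65521                                            # assumed modulus
CYCLES = {"carry": 14, "reduce": 18, "none": 22}     # counts quoted in the text


def mod_add(a, b):
    """Add two residues (0 <= a, b < M); return (result, timing case)."""
    s = a + b
    if s > 65535:            # 16-bit carry: 2**16 = M + 15, so wrap and add 15
        return (s - 65536 + 15, "carry")
    if s >= M:               # no carry, but one subtraction of M is needed
        return (s - M, "reduce")
    return (s, "none")       # already a valid residue


def step_time(cases, padded):
    """Clock cycles until every processor has finished one addition.

    With SYNC the slowest processor in this step sets the pace; with
    dummy instructions every processor always takes the worst case.
    """
    worst = max(CYCLES.values())
    return worst if padded else max(CYCLES[c] for c in cases)
```

When every processor happens to hit the fastest case, the SYNC method completes the step in 14 cycles, whereas padding with dummy instructions always costs 22.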
Figure 7.18: Arrangement for generating the SYNC signal from each
slave board.
The former method of synchronisation was chosen for the
system, because the use of the SYNC instruction optimises the
program execution time for each transform cycle. In the
latter case, by contrast, the dummy instructions are executed when carries are
generated, so the transform execution time is fixed
(equivalent to the worst case).
7.7 Transforms of Real Time Signals
A 12-bit successive approximation analogue to digital (A/D)
converter (RS574) interfaced with the control microprocessor
allows transforms of real time signals (see figure (7.19)). The
conversion time is between 15 and 35 microseconds depending upon
whether the 8-bit or 12-bit mode is being used. A sample and
hold (S/H) circuit (LF398) is used to hold the input to the A/D
converter steady while the conversion is being carried out.
A latch is connected to the output of the converter, such
that when the conversion is complete the data are automatically
latched into it. A read of this latch by the microprocessor
also sends a start convert signal to the A/D converter, and a
hold signal to the S/H circuit. The control microprocessor
then executes the SYNC instruction. When the conversion is
complete, the status bit from the A/D converter is used (as the clock signal
for the latch) to latch the data and simultaneously send an
interrupt signal to the control microprocessor. The advantage is
that the status bit (of the A/D converter) need not be monitored.
The control microprocessor reads this latch, this again sends the 
start convert signal to the A/D converter, which then starts 
7-24 
OE 
OE 
ANALOGUE 
INPUT 13 HOLD I 
,..--
6
-1 SAMPLE AND SAMPLE _j 
·01 T 7 HOLD CIRCUIT. 8 
~ 5 
cs 13 
~ 3 
CE · STATUS 
AO 6 RS574 28 
r' 16-27 2 -•SV 
" REAO/CONV. 
12 
8 4 
I I I l 
1 D CLK 1 D CLK 74374 11 ~ 7!.374 11 
MSB LSB 
al 8 
DATA BUS 
6 
~ 6 
BA 85 
iRa 
MC6809 
MASTER 
Figure 7.19: Analogue-to-Digital (A/D) interface with the master 
microprocessor. 
I 
j.OE (LATCH) 
MASTER 
SYNC ACK. \ J ?/' \ I 
READ/CONV. 
STATUS \ ~ 15 ;;;SEC :.\ I 
AID 
CONVERTER! CS \ I 7~£ \ I 
READ 
CONVERT 
START 
END 
OF CONVERSION 
I DATA I ) HIGH IMPEDANCE (DATA ) 
\VALID _ _ 7/ _VALID . _ .... -
Fiqure 7.20: Timing diagram for the A/0 converter. 
/ 
converting the next sample. The use of the latch simplifies the 
circuitry and also increases the throughput. While the A/0 is 
converting the next sample, the microprocessor is busy storing 
the previous data into the memory. In this manner full advantage 
of the conversion time is being utilised. A sampling rate of 
28KHz is obtained, figure (7.20) shows timing diagram for the A/0 
conversion. 
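As a rough consistency check on the quoted sampling rate, the
sketch below (Python, illustrative only) assumes that one sample
period equals the 35 microsecond upper conversion time, i.e. that
storing the previous sample is completely hidden behind the
conversion. That assumption and the function name are introduced
here for illustration and are not stated in the text.

```python
def sampling_rate_hz(period_us):
    """Samples per second when one sample is produced every period_us microseconds."""
    return 1e6 / period_us

# Assumption: the period is set by the 35 microsecond conversion time alone.
rate = sampling_rate_hz(35.0)
print(round(rate))  # 28571
```

Under that assumption the rate comes out close to the 28 kHz
reported above.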
Figure (7.21) shows the arrangement for the digital to analogue
(D/A) converter (DAC1220) interface. There are in fact two D/A
converters interfaced with the control microprocessor: one for
displaying the input and the other for displaying the transformed
values on the oscilloscope. These are 12-bit multiplying D/A
converters, with a typical conversion time of 1.5 microseconds.
Figures (7.22) and (7.23) show photographs of the master
board and the slave board (with three microprocessors)
respectively. Figure (7.24) shows a photograph of the parallel
microprocessor system.
A 15-point convolution was also implemented on the parallel
microprocessor system. Figure (7.25a) shows a pulse to be
convolved with itself. Figure (7.25b) shows the NTT of the
pulse. Figures (7.25c) and (7.25d) show the product of the two
NTTs and the convolution respectively. However, if the amplitude
is large then the effect of modular arithmetic can be seen, as in
figures (7.26a-7.26d), which show the folding of the amplitude.
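The folding effect can be reproduced numerically. The sketch
below is a plain O(N^2) number theoretic transform in Python, not
the WFTA used on the hardware; the pulse shapes and amplitudes
(10 and 200) and all function names are hypothetical choices made
for illustration. It convolves a 5-sample pulse with itself
modulo 65521 for N = 15.

```python
M = 65521   # prime modulus
N = 15      # transform length

def root_of_order_15():
    """Search for an element whose multiplicative order mod M is exactly 15."""
    for g in range(2, M):
        w = pow(g, (M - 1) // N, M)          # w**15 == 1 (mod M) by Fermat
        if pow(w, 3, M) != 1 and pow(w, 5, M) != 1:
            return w
    raise ValueError("no element of order 15 found")

W = root_of_order_15()

def ntt(x, w=W):
    """Direct N-point transform: X[k] = sum x[n] * w**(n*k) mod M."""
    return [sum(x[n] * pow(w, n * k, M) for n in range(N)) % M for k in range(N)]

def intt(X):
    """Inverse transform, using the inverses of w and N modulo M."""
    w_inv, n_inv = pow(W, M - 2, M), pow(N, M - 2, M)
    return [(v * n_inv) % M for v in ntt(X, w_inv)]

def ntt_convolve(a, b):
    """Cyclic convolution via pointwise product of the two transforms."""
    return intt([(p * q) % M for p, q in zip(ntt(a), ntt(b))])

def direct_convolve(a, b):
    """Exact (unreduced) cyclic convolution, for comparison."""
    return [sum(a[i] * b[(k - i) % N] for i in range(N)) for k in range(N)]

small = [10] * 5 + [0] * 10     # peak of self-convolution is 500 < M
large = [200] * 5 + [0] * 10    # peak is 200000 > M, so the result folds

print(ntt_convolve(small, small) == direct_convolve(small, small))
print(max(direct_convolve(large, large)), max(ntt_convolve(large, large)))
```

For the small pulse the NTT result equals the true convolution;
for the large pulse every value is returned modulo 65521, which
is the folding visible in figure (7.26d).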
Figure 7.21: Digital-to-Analogue (D/A) interface with the master
microprocessor.
Figure 7.22: Photograph of the master microprocessor with
associated hardware.
Figure 7.23: Photograph of the slave microprocessor showing
three slaves on the board with associated hardware.
Figure 7.24: Photograph of the complete parallel microprocessor
system.
Figure 7.25
(a) Shows a pulse to be convolved with itself.
(b) Shows the NTT of the pulse.
(c) Shows product of the two NTTs.
(d) Shows convolution of the two pulses.
Figure 7.26
(a) Shows a pulse of a larger amplitude to be convolved
with itself.
(b) Shows NTT of the pulse.
(c) Shows product of the two NTTs.
(d) Shows convolution of the two pulses, folding of the
amplitude occurs due to the use of modular arithmetic.
7.8 Results
The program timings show that a 15-point WFTA run on a
single MC6809 microprocessor requires approximately 10
milliseconds to execute. However, when the parallel dedicated
microprocessor system is employed, the transform execution time
is reduced to 675 microseconds.
Table (7.3) compares the 15-point WFTA execution times. The
program written in FORTRAN was not optimised for time, but it
gives a rough estimate for comparison.
System               Assembler    FORTRAN
MC6809               10 msec      --
Parallel Structure   675 usec     --
TMS9900              4 msec       --
IBM 370/168          365 usec     2 msec
IBM 370/4341         1 msec       5 msec
Table 7.3: Comparison of timings for the 15-point WFTA
The total power consumption of the system is about 65 watts,
and the total cost of the system is in the range of £1500
(1981).
CHAPTER 8
Conclusion
The object of this work was to investigate and implement the
WFTA on microprocessors and to design hardware to improve the
execution time. Special purpose hardware was also designed and
constructed to exploit parallelism in the WFTA.
An external hardware modular multiplier (mod 65521) was
designed, constructed and interfaced with the TMS9900
microprocessor. Since a number of modular additions and
subtractions are also performed, it may be beneficial to employ
an external hardware modular adder (mod 65521). If an external
hardware modular adder is used, then only three move instructions
are required for an external modular add. This will save a
compare, an add, and two branch instructions. There is no
benefit in designing hardware for modular subtraction.
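For reference, the software modular subtraction listed in
Appendix A amounts to a subtract, a compare and at most one
corrective add of 65521. A minimal Python sketch of that logic
follows; the function name mod_sub is hypothetical and the mask
imitates a 16-bit register.

```python
M = 65521  # modulus used in the thesis

def mod_sub(a, b):
    """Subtract two residues (0 <= a, b < M) as a 16-bit register would."""
    d = (a - b) & 0xFFFF          # 16-bit wrap-around subtract
    if a < b:                     # a borrow occurred
        d = (d + M) & 0xFFFF      # add the modulus back once
    return d

print(mod_sub(5, 7), mod_sub(7, 5), mod_sub(0, 65520))
```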
A parallel microprocessor system was designed and
constructed for the implementation of the 15-point WFTA.
Benchmark programs were written for several microprocessors to
select a suitable microprocessor for the parallel structure.
Motorola's MC6809 proved the best choice, since it contains an
(8 x 8-bit) unsigned hardware multiplier and a SYNC instruction
(the SYNC instruction is used to synchronise the microprocessor
to an external event). This parallel microprocessor system is a
very highly dedicated MIMD machine. A host processor is used to
control the parallel structure. The use of the host processor
was necessary in the development stages, since it provides an
interface with the parallel microprocessor system. A serious
difficulty is the development of the software for the parallel
microprocessor system, which requires a large amount of effort,
since proper synchronisation between all the microprocessors
must be maintained at all times.
The parallel microprocessor system, being very dedicated,
executes the 15-point WFTA in times comparable with the IBM
mainframe computers. Table (7.3) shows the program execution
times on the parallel microprocessor system, the MC6809 and two
IBM mainframes (models 370/168 and 370/4341). All these programs
were written in assembler language. This agrees with the
argument given by Arden and Berenbaum (65), and Enslow (66),
about achieving higher performance from several cheap processors
rather than an expensive one.
This pragmatic approach to parallel processing, i.e. to
implement one microprocessor per point, may not seem to be a
cost effective design approach for a bigger size transform.
However, bigger size transforms can be implemented on the
parallel microprocessor system by combining the power of each of
the slave microprocessors with the power of the parallel
structure. The length of this transform should be an integer
multiple N of L, where N is one of the short length WFTAs, and L
is the transform length implemented on the parallel structure.
This may be done by allowing each of the slave microprocessors
to accept N values from the master, and perform an N point
preweave. Then the parallel microprocessor system is used to
perform N (L length) transforms. Finally each of the
microprocessors performs the N point postweave.
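The idea of building a length N x L transform from N-point and
L-point pieces can be illustrated with a simpler, related
technique. The Python sketch below uses the row-column
prime-factor (Good-Thomas) index mapping with direct short
transforms; it is not the nested preweave/postweave Winograd
structure described above, and it assumes N and L are coprime.
The choice N = 4, L = 15 and all function names are hypothetical.

```python
M = 65521  # prime modulus; 60 divides M - 1

def root_of_order(n, prime_factors):
    """Search for an element of multiplicative order exactly n modulo M."""
    for g in range(2, M):
        w = pow(g, (M - 1) // n, M)
        if all(pow(w, n // p, M) != 1 for p in prime_factors):
            return w
    raise ValueError("no root found")

def ntt(x, w):
    """Direct transform of len(x) points with root of unity w."""
    n = len(x)
    return [sum(x[i] * pow(w, i * k, M) for i in range(n)) % M for k in range(n)]

def two_factor_ntt(x, n1, n2, w):
    """Length n1*n2 transform from n1-point and n2-point transforms (gcd(n1, n2) = 1)."""
    n = n1 * n2
    w1, w2 = pow(w, n2, M), pow(w, n1, M)      # roots of order n1 and n2
    # Input index map: n = (n2*i + n1*j) mod (n1*n2)
    grid = [[x[(n2 * i + n1 * j) % n] for j in range(n2)] for i in range(n1)]
    rows = [ntt(row, w2) for row in grid]                      # n1 transforms of length n2
    cols = [ntt([rows[i][k2] for i in range(n1)], w1) for k2 in range(n2)]
    out = [0] * n
    for k1 in range(n1):
        for k2 in range(n2):
            # Output index map (Chinese remainder theorem): k = k1 mod n1, k = k2 mod n2
            k = next(c for c in range(n) if c % n1 == k1 and c % n2 == k2)
            out[k] = cols[k2][k1]
    return out

w60 = root_of_order(60, (2, 3, 5))
x = [(7 * i * i + 3) % M for i in range(60)]
print(two_factor_ntt(x, 4, 15, w60) == ntt(x, w60))
```

In this arrangement the fifteen-point transforms play the role of
the work done on the parallel structure, while the four-point
transforms correspond to the extra work given to each slave.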
The parallel structure employs microprocessors with a 1 MHz
clock. A 2 MHz version of the MC6809 is also available, but at a
much higher price. If the 2 MHz version is used then faster
memories have to be employed, which means a further increase in
the total cost of the system. However, this would double the
program execution speed.
Alternatively, if an external modular multiplier is interfaced
to each of the slave microprocessors (as described in chapter 5),
this would also almost double the program execution speed.
However, the cost of a modular multiplier is considerable, and
this may not be practical.
The parallel microprocessor system is not 15 times faster
than a single microprocessor; this is due to the overheads
involved. The estimated time for a 60-point WFTA on the MC6809
microprocessor is about 50 milliseconds, of which 712
microseconds are required for the input/output shuffle. On the
parallel microprocessor system the execution time is about 3.5
milliseconds.
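The speed-up implied by the quoted timings can be worked out
directly. The short Python sketch below uses only the figures
given in the text (10 ms and 675 microseconds for the 15-point
WFTA, 50 ms and 3.5 ms for the 60-point estimate); the function
name is hypothetical.

```python
def speedup(single_s, parallel_s):
    """Ratio of single-processor time to parallel-system time."""
    return single_s / parallel_s

s15 = speedup(10e-3, 675e-6)   # 15-point WFTA
s60 = speedup(50e-3, 3.5e-3)   # 60-point WFTA (estimated times)
print(f"15-point: {s15:.1f}x, 60-point: {s60:.1f}x")
```

Both ratios fall just short of 15, consistent with the remark
about overheads.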
Appendix-A
Modular arithmetic routines for the following microprocessors
i) TMS9900
ii) MC6809
iii) Z80
iv) 6502
32/16-bit division routine for the MC6809 microprocessor
*
* MODULAR ARITHMETIC PROGRAMS FOR TMS9900
*
       OPTION XREF,SYMT
       AORG >4000
*
* MODULAR ADDITION
*
START  LWPI WKS
       MOV  @AD1,R1
       MOV  @AD2,R2
       A    R1,R2
       JOC  OVER
       CI   R2,65521
       JL   OVR
OVER   AI   R2,15
OVR    MOV  R2,@SUM
*
* MODULAR SUBTRACTION
*
       MOV  @SUBT1,R1
       MOV  @SUBT2,R2
       MOV  R1,R3
       S    R2,R1
       C    R3,R2
       JHE  OVER1
       AI   R1,65521
OVER1  MOV  R1,@RES
*
* MODULAR MULTIPLICATION
*
       MOV  @MPR,R1
       MOV  @MPR,R1
       MOV  @MPD,R2
       MPY  R1,R2
       DIV  @MOD,R2
       MOV  R3,@PROD
       B    @>0080
WKS    BSS  32
AD1    BSS  2
AD2    BSS  2
SUM    BSS  2
SUBT1  BSS  2
SUBT2  BSS  2
RES    BSS  2
MPR    BSS  2
MPD    BSS  2
PROD   BSS  2
MOD    DATA 65521
LAST   END  START
*
* MODULAR ARITHMETIC PROGRAMS FOR MC6809
*
        NAM   M6809
        OPT   CRE,L,S,M,P
        ORG   $30
*
* MODULAR ADDITION
*
START   LDX   #ADS
        LDD   ,X++
        ADDD  ,X++
        BCS   SKIP
        CMPD  #65521
        BLO   SKIP1
SKIP    ADDD  #15
SKIP1   STD   ,X
        JMP   OVER
ADS     FDB   0
        FDB   0
        FDB   0
*
* MODULAR SUBTRACTION
*
OVER    LDX   #SBTN
        LDD   ,X++
        SUBD  ,X++
        BCC   SKIP2
        ADDD  #65521
SKIP2   STD   ,X
        JMP   OVER1
SBTN    FDB   0
        FDB   0
        FDB   0
... ~~~~~~~~~~~~~~~~~~~~~~·~·-·· ! ... .. ............... , .... , ..... , ... "'•"'"!" ................... , ................ , ................... , ...... ~ ............. , ......... , ......... , .... , .. 
,.; .. ... 16:::16 f:\IT MULTIPICATION "1:. I •,• •,• 
.... ~~~·-~-~-~~~-~-~~-~··~-~-~~- I ... .. , .......... , .... , .... , .... , .............................................. , .................... , .... , .... , .... , .... , .... , .... , ..... , .............. 
... I ... 
0VER1 LOX tHIL T P. ISKI 0 6 
LOY ttr·1L Hl ISKIP7 
LOU !tP~001 ! 0/·1!T 
CLR o., u I 
CL~ 1 t u !SKIPS 
lOA 1 , X. I 
LD3 1 ~ y I 
MUL I 
STO z,u 
LOA 0 , X 
LOB 1 ' y SKIP A 
~1UL 
AODD l,U 
STO 1, u 
BCC SK!P3 
INC o,u 
SKIP3 L DA 1 , X 
l D~ 0' y SKIPE 
~1Ul SKI PC 
ADDD 1, u 
STQ 1,u 
BCC SKIP4 
!NC o,u SK!::>8 
SKIP4 LOA o,x SK.IPC 
LD8 I) , y 0"'1IT1 
~WL MLTR 
ADDD o,u t~L TN 
STD o,u PR0~1 
.,_ PQ002 ... 
'!:: ---·~-~~~-~-~~~~·-~·-~·-·~~-¥¥¥~¥ - ·- ¥-¥-¥¥¥¥¥ ¥ ~-- PRCD3 
... ..• r·10DULARISING . .. PR004 . ,. . .. .,. 
•'· 
............ .,, ........................... J .. ~· ..... • .................................................................................................. T E ~~ P ... ¥¥¥¥¥¥¥~~~----¥¥¥¥¥¥~··-¥-¥¥ 
... 
... 
* MCDULA~ ARITHMETIC PROGRAMS FOP ZBO 
LOA 
5EQ 
LD3 
~~uL 
ADOC 
RCS 
C r~o D 
BLO 
ADDD 
STD 
LOA 
SEQ 
L::lY 
CLR 
CLR 
CLR 
LDo 
"'UL 
STD 
LOA 
13EQ 
L.O ~ 
MUL 
ADDD 
BRA 
L DO 
ADDD 
BCS 
CW>[; 
3LO 
A. ODD 
STD 
JMP 
FD9 
r=og 
FCB 
FC~ 
FC~ 
FCS 
~=oP. 
=No 
1 'u 
Q:HT 
~ 15 
z,u 
SKIP5 
~t65521 
SKIP7 
!tl5 
z,u 
o,u 
ornr1 
:tT~i-10 
O,Y 
1 'y 
2' y 
:i15 
o,v 
0 'y 
SKIPE 
II 1 5 
1 ' y 
SKIP.fJ 
1 'y 
2,lJ 
SK~ 0 ~ 
1165521 
SKI PC 
:i15 
2,U 
tt:S64 
0 
0 
0 
0 
I) 
0 
b 
... 
. ,. 
--~----~~--~---~~-~~-·~~-~~~~~--~~----~·~--~---·--·---... , .... , .......... , ......... , ........................................ , .................... , ............................... , .......... , ............... , .... , .... , .... , ..................... , ..... , .... , .... , ............ ~ .... , ..... , ...... , ......... , ......... , ......... , .................... . 
.~~*~~*~*~**~*****~********** 
•'• ..• MODULAR ADDITION 
****************~~*~******** 
START: 
'JRG 
LD 
LD 
~DO 
100H 
11L,(A001) 
f:'.C,(A002) 
YL,BC 
JP c,cv~Rl 
LD A,255 
CP H 
JP tlZ,JVEk' 
LD A, 2 41 
CP L 
JP ?,QV=Rl 
JP r-.JC,OVE~ 
OVERl: LD 5(, 1 5 I LC tJ , ( 1-1 p t< 2 ) 
ADO t-IL,BC I LD H ,.A 
LD ( S U ~1) , H L I LD A, ( !·1 P 01 ) 
OVER: JP SK!? I LD :: 'A 
I CALL ~ULT 
AUD1: OEFW 0 I LD (PDJ05),HL 
AU02: OEFW 0 ! LO A,CM?Rl) 
sur~: OEFW 0 I LO "1 ' A 
.. **************************** I LD A,(i~P02) , 
. ... f·10 DIJLA R SU8TRACTIC~J ... I LD :'A , ... ., . 
. ........................ ..} .. ~., ................................................................................................................... I CALL "1ULT , .., ...... "'"" .. , ......... , .................. ~"•'" .. , ........ , .......... , .... , .... ~ ....... , ..... , .... , ...................... "•"'"'!"' .. , .. 
SKIP: LD HL,(SUBTl) ! LD Cl!:,(PR005) 
LD D~tCSUBT3) I ADD HL,Jc 
AND A I JP ~., c ' ?, tlK 
SBC HL,OE I LD B,l 
LD A,CSUBT3) I LD .l,(P~:JD2) 
LO !),.l I ADD A,~ 
LD A,CSUBT1) I LD CPR'J02),A 
CP 0 !oAK: LO (PRO':'I5),HL 
JP NC,OVR I LD .l,(;JR004) 
JP Z,ZERO ! LD ::'A 
BACK: LO 3C,65521 I LD A,(P=/0~1) 
ADD HL·, SC I LD D,A 
JP OVR I ADD "iL,DE 
ZLRO: LD A,CSUBT4) I JP ~!( , 3 A K 1 
LJ o,A I LD. ~ , 1 
LD A,CSUBT2) I LD A,(P~002) 
CP 0 I ADD A' g 
JP NC,OVR I LD CPPJD2),A 
JP Z,OVR IBAK1: LD (?RJll5),Hl 
JP RACK. ! LD A,H 
OVR: LD CRES),t-~L ,. LD (?RJOl),A 
JP SKIP2 I LD t.,L 
• I . , LD (?PC04),A 
SUBTl: DEFB 0 I . , 
SUBT2: OEF8 0 I . ~~··~·~·~···-~-·--·~--~--·--, ¥¥¥¥¥¥-¥¥¥¥¥¥-¥¥¥YY¥¥Y¥¥Y¥¥¥ 
SUBT3: 0:FB 0 I 0 ... PRQ01:PRQDZ:PR003:PRQD4 ::: , ..• 
SUBT4: DEl=~ 0 I . ................................................................................................................... ~· ................................. , "' .... , ..... ,.. ........ , ......... , ......... "' ....... , .... , .. "'•"' ..................................... , .. ~o· ., ..... , .......... , .... , ... , ....... 
RES: OEFW 0 I 0 , 
****************~*****#***** I LD A,(PP001) 
0 ... MIJDULAR MUL TIPLICATICPJ ·'· I LD H,A , ... .,. 
. ~~~~~·-··-·~~~~~~~~~~~·--~~~ I LD =' 15 , .. , ........ , .... , .......... , .......................................................... , .... , .... , ........................................ .,., ....... 
SKID2: LD A,(I-\PR1) I CALL t~UL T 
LD H,A I LD DE,CPRr::J3) 
LO A,O~P01) I ADD rlL,DE 
LO E,A I JP NC,BAK2 
CALL ~1UL T I LD ?.Ct15 
LD (PR003),HL I ADO HL,SC 
LD A,(MPR2) I LD CPR'JJ3),HL 
LD H,A I JP ~.AK 3 
LD A, (I~ P 0 2) I8AK2: LD (PROD3),HL 
LD E, A I LJ 0,255 
CALL MULT I CP H 
L~ (PPOr:Jl),HL I JP ~JZ,3AKJ 
BAIC.6: 
BAK3: 
BAK4: 
* 
LD 
CP 
JP 
JP 
LD 
ADD 
LO 
LO 
LO 
LD 
CALL 
LO 
LD 
LO 
LO 
LO 
CALL 
LO 
ADD 
LD 
ADO 
JP 
LO 
ADO 
JP 
LO 
LO 
CP 
JP 
L:J 
CP 
A1 241 
L 
Z,BAK6 
NC,BAK3 
BC ,15 
HL,BC 
(PR003),HL 
A, (PROD2) 
H,A 
E,15 
MULT 
A,L 
CTMPZ),A 
A,O 
CTMPl),A 
E,t5 
MULT 
OE,(TMPl) 
HL,DE 
DE,(PR003) 
HL,DE 
NC,BAK4 
BC,l5 
HL,BC 
BAK5 
(PR003) 1 HL 
.A, 25 5 
H 
NZ,BAK~ 
A1 241 
L 
BAK7: 
BAKS: 
JP 
JP 
LO 
ADD 
LO 
JP 
Z,BAK7 
NC 1 BAK5 
BC 1 15 
HL 1 BC (PR093),HL 
OOOOH 
**************************** ; * MULTIPLICATION SUBROUTINE* 
**************************** 
MULT: 
JUMP: 
NOAOD: 
MPOl: 
MPOZ: 
MPRl: 
MPRZ: 
PROOl: 
PR002: 
PR003: 
PR004: 
PRODS: 
PROD6: 
L,O 
o,o 
Bt8 
HL 1 HL 
.NC 1 NOAOD 
HL,OE. 
JUMP 
TMPl: 
fTMP2: 
LD 
LO 
LO 
ADO 
JR 
ADO 
OJNZ 
RET 
OEFB 
DEFB 
DEFB 
OEFB 
OEFB 
DEFB 
DEFB 
OEFB 
OEFB 
DEFB 
OEFB 
DEFB 
END 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
I 
... 
... ****************************************************** 
.... 
... * MODULAR ARITHMETIC PROGRAMS FOR 6502 
* 
* ****************************************************** 
NAM M6502 
ORG $1024 
* **************************** 
* * MODULAR ADDITION ... ... 
* **************************** 
... 
... 
START LOX 
CLC 
LOA 
AOC 
STA 
LOA 
AOC 
S TA 
BCS 
01P 
BNE 
#AOl 
ltX 
3 1 X 5,x 
o,x 
z,x 
4,X 
OVR 
•sFF 
SUBTl 
OVR 
SKIPl 
SU8Tl 
AOl 
A02 
SUM 
SUMl 
LOA 
CMP 
BEQ 
BMI 
LOA 
CLC 
AOC 
STA 
LOA 
AOC 
STA 
JMP 
ORG 
FOB 
FOB 
FCB 
FCS 
5 t X 
I$Fl 
SKI Pl 
SU8T1 
5,X 
Nl5 
s,x 
*0 
4 1 X 
4., X 
SUBT 
$0023 
0 
0 
0 
0 
... ! LDA e.,x ... 
... ~*************************** I STA z,x ... 
•'• 
·::;: MODULAR SUBTRACTION •'• I LOA 6,X ...... ... 
... 
-·-·------------------------
I STA I) ' l( .. , .. . ~i· ....... , .... , ......... , .............. , ............... , ......... , ........................ , .............. , ... , ............ , .......... , .. 4\' 
•'• I JSP SU~RT ....... 
ORG SliJ 24 I LOA 4,X 
SU~T LOX ~SUB I"•. ... 
LOA tiO I STA 16,.( 
STA CHECK I LOA 3,X 
L DA o,x .I STA 1 59 X 
C"'P 2,X I LOA 7 ·,X 
REQ OM!T I STA 2, X 
BCS JM? I -LOA 6,Y 
I~C CHECK I STA o,x 
JMP LOA 1 ,-x I JSR SU3RT 
JMPl S E.C I L DA 4~X 
SoC 3,X .I STA 14 9 X 
STA s,x I LOA ?,X 
LOA o,x I ST~ 13,X 
S8C z,x I LD~ F, X 
S T .A 4,X I .STI\ 2, X 
L 0 ll. CHECK I LDA s,x 
REQ ~WLTl I STA 0,X 
CLC I JSR ~ U? RT 
L DA s,x I LDA t.,X 
ADC ~$Fl I . STA 1 2 ' .X 
S T.A 5,X I LOA ., '( -· ' ·.·· 
LOA 4,X I STA 11 , X 
ADC li$FF I LDA . .,_'X 
STA 4,X L STA · 2 ,_x 
MULTl . ji~P · ~IULT I ... •.. 
O~IIT L JA l,Y 
' 
.L D .A. 5, X 
CMP. 3,X I ST.l OtX 
BEQ OM!Tl I JSR SU~RT 
BCS JMPl I LDA 4,X 
INC" ¢HECK I STA 1 0, X 
Jl-.1g JI·IP 1 I LDA 3, X 
01'11!1 ~q~ ~0 ! ST .:1. c; ' X ~ ; i. / . : .t ·~: .. STA ~;x I ... ... 
STA s;x I CLC 
Jr-.1P MULT I LDA l4,X 
ORG 'tC.023 ·I AOC 1 2 , X 
SUB FDP. () I ST~ 14,X 
SUBl FDS 0 I LD"A 1 3; X 
SUBZ FCB 0 
' 
A I) C 11 , X 
SUB3 FOB 0 I STA 13,X 
CH!:CK != c 8. 0 
' 
LOA itO 
..• 
' I ADC 9,X ... 
"· *********************~*#***~ I STA 1,X ... 
... . .. MULTlPL!CAT!Qq ROUTINE ... I CLC ..• . .. . .. 
..... ~·--~~~··~~~·~·~--··~~~~~··· ! Ul~ 1 5 , X ..• ..,~ .. ,.. ,., ........ "·~  ..... , .. ~·~ ....... , ..... , .. '"'!'' ... , ................ f .... , .... , ............... , ......... , .........................
"" 
........ ' 
... I A JC l4,X 
' ORG ·. I ST~ $·1 (\ 2 4 . 1 5 ' ~ 
. ' . ·~ . · .. ·.t. . • ~ 
HUL.T ll:X liMP~~ ! ~p ~'- :1 3, X • - t •. :·. 
: ... 
.. 
. :~· . 
; .. ' ::·:·;{~'j<: ·,·: . ... ··':: .. · .. ... •:r .. <> 
··. \:.····· . 
A DC 10,x I STA 2 ' Y. 
ST~ l4.X I LD~ . ;t 1 5 
LDA ttl) I STA 0' y 
AJC 9,X I ~· 
STA 13, X I JSq su:.cn 
•'• I LD~ 4. ' y .,. 
•'• ~~-~~----------~------------ I STA 1 8, X •.. ..., .... , .... , .... , ........... , ......... , ........ ~ ..... , ..... , .... , ......... , ........... , .... , .... , .................. ,~ .. , ......... , .... , .... , .. 
•'• 
·'· ROUT IN: FOR 1·1 'J 0 U L A R I S I N G •'• ! LDA 3 , X ... . ,. . .. 
!~::: -~~--h-~-~------~-~------~--.. , .... , .... , ..... , .... , .... , ..................... , ...... ,,. ........ , .... , .... , .... , .......... , ..... , .... , .......... , .... , .... _, .... , .... , ............. ! STA ? 'X 
•'• I .,. LD~ -115 
LOA 1s,x I STA I) ~ X 
01P li$FF I JSR SUE'~T 
RNE JMPA I C LC 
L DA 16,X I LOA 4 9 X 
Ci-1P :tSFl 
' 
AOC 1 ~,X 
3EQ J ~1 p 0 I STA 1 9, X 
?.CC JMPA I LOA 3,x 
Jf·l p 3 CLC ! ADC 1 8, X 
A. DC ~ 1 5 ! STA 1 3 , X 
STA 1 6 , X I CLC 
LOA ~i$0 I L:JA 16,X 
.l\ () c 1s,x I ADC 1 9 , X 
STA 15,X I STA 16, X 
Jt·1P A L DA 14,X I LOA 1 5 , X 
STA 2tY I ADC 1 Q, , X 
LOA ~1: I STA 1 5, X 
STA 0, X ·I :.cs J ur~ P c 
JSR SU~RT I C 11P #$~~ 
CLC I SNE ':)VERl 
LDA 16,X I LOA 16,X 
AD( 4, X I (MD •$Fl 
STA 16,X I 8EJ JU~PC 
LOA 15,X I BCC ov:;n 
A DC 3,X I JUMP( CLC 
STA lS,X I L DA 16,X 
BCC OVRA I ADC rtl5 
LOA 15,X I STA 16, X 
CMP q$ff. I LOA #0 
3Nf OVPA I ADC 1 5 , X 
LDA 16,X I STA 1 5, X 
U1P ~$F=1 ICJVt;P.1 BRK 
SEQ JMPC I•'• .,. 
BCC OVRA 
'•'• 
.,. ***********~*****~*****~**** 
JNPC CLC I-·· ... MULTIPLICATION ROUTINE •'• ... ., . .,. 
A DC .It 11) I -·· ~-~~--~~-~-~-~--~~~~~~~--~--... ~¥~--~---·-¥-------~--------
STA 16,X I ... .,. 
LOA #$0 ISU8PT LOA #0 
AOC 1 5, X STA 1 , X 
s:A 15,X I STA 3,X 
OVRA LOA liO I STA 4,X 
STA 1 7 , X I LOY ttlj 
STA 1 3 , X I JMP ~~1(_. 
STA 19,X 1 ov~ Q .A SL 1 , X 
LOA 13,X I ~SL 2, X 
BAK 
BAKl 
OUT 
•'• .,. 
... 
. ,. 
•'• ... 
BCC BA'< 
LOA ~1 
OP.ll l,X 
STA l,X 
CLC 
ROP !),X 
gee flt~Kl 
CLC 
LOA z,x 
A DC , .. , X 
STA 4,X 
LOA l,X 
A!JC 3,X 
STA 3,X 
DEY 
P.!:Q ~UT 
JMP OVER 
RTS 
I 
IMPLP. 
I1-1CN!Jl 
1 ~~c No 2 
ITf.MPl 
ITEMP2 
IMPR 
IMND 
I PROD! 
IPRC'D2 
fPROD3 
IPR'J04 
I PRODS 
IPROC6 
IPPOD7 
IPRG08 
ITMPl 
ITMP2 
IT~~ P 3 
I 
ORG 
Fcg 
t=(CI, 
FC8 
FC3 
!=CB 
FD~ 
FJS 
FC9 
FC~. 
FC'3 
FC5 
FCB 
FC3 
FC3 
FC e. 
FCI?o 
FCS 
FC~ 
END 
~0023 
0 
0 
0 
0 
0 
0 
0 
0 
Q 
0 
I) 
0 
c 
0 
0 
0 
0 
START 
*
* 32/16 BIT DIVISION FOR MC6809 MICROPROCESSOR
*
        NAM   DIVISION
        ORG   $0000
START   LDX   #MLTR
        LDY   #MLTN
        LDU   #PROD1
        CLR   ,U
        CLR   1,U
        LDA   1,X
        LDB   1,Y
        MUL
        STD   2,U
        LDA   ,X
        LDB   1,Y
        MUL
        ADDD  1,U
        STD   1,U
        BCC   SKIP3
        INC   ,U
SKIP3   LDA   1,X
        LDB   ,Y
        MUL
        ADDD  1,U
        STD   1,U
        BCC   SKIP4
        INC   ,U
SKIP4   LDA   ,X
        LDB   ,Y
        MUL
        ADDD  ,U
        STD   ,U
*
* 32-BIT PRODUCT IN PROD1:PROD2:PROD3:PROD4
*
* 32 BIT / 16 BIT UNSIGNED DIVISION
*
        LDD   PROD1
        STD   DVND2
        LDD   PROD3
        STD   DVND4
        LDD   #0
        STA   DVND1
        STD   QUOT1
        LDA   #16
        STA   COUNT
DIVIDE  ASL   DVND5
        ROL   DVND4
        ROL   DVND3
        ROL   DVND2
        ROL   DVND1
        LDA   DVND1
        BNE   SKIP
        LDD   DVND2
        CMPD  DVSR2
        BCS   CHECK
SKIP    LDA   DVND3
        SUBA  DVSR3
        STA   DVND3
        LDA   DVND2
        SBCA  DVSR2
        STA   DVND2
        LDA   DVND1
        SBCA  DVSR1
        STA   DVND1
        ASL   QUOT2
        ROL   QUOT1
        INC   QUOT2
CHECK   DEC   COUNT
        BNE   DIVIDE
        LDD   DVND2
        STD   REM
        JMP   $0283
COUNT   FCB   00
DVSR1   FCB   00
DVSR2   FCB   00
DVSR3   FCB   00
REM     FCB   00
QUOT1   FCB   00
QUOT2   FDB   00
MLTR    FDB   00
MLTN    FCB   00
PROD1   FCB   00
PROD2   FCB   00
PROD3   FCB   00
PROD4   FCB   00
DVND1   FCB   00
DVND2   FCB   00
DVND3   FCB   00
DVND4   FCB   00
DVND5   FCB   00
        END
Appendix-B
Assembler program source listing for a 15-point WFTA (TMS9900)
FORTRAN program source listing for a 15-point WFTA
•'• .,. 15-POINT·W!NOGRAD ALG~RITHM cwcTA) TMS9900 •'• .,. 
·'· ........ ~~~~~-~-~--~~-~-~----~~------~-~·~~--~--~------------¥~¥~¥¥¥~¥¥¥¥¥~¥-~¥~~¥~-~--¥~¥0¥¥~¥¥-¥¥¥¥¥~~¥-¥ ____ ¥ 
ST~RT 
... 
. ,. 
!DT 'HHJ015' 
OPTION XREF,SY~T 
ACRG >6000 
U~DI I..JSP 
LI R4,YREG 
LI RS,XREG 
- ---------------~---~----~--­~ ----·--¥¥¥¥¥¥~-~----~~-¥¥¥¥¥ 
•'• ..• 
•'• .,.
INPUT SHUFI=L!= 
rv18V :::R4,:::RS 
~ov @o(R4),@2(RS) 
MDV @12CR4),34(P.5) 
MDV @lq(R4),@6CR5) 
~10 v J'24CR4),@8(R5) 
~lOV illOCR4),@10CR5) 
r~ov @16CR4),@12CR5) 
~-1 ov 322CR4) ,@14(R5) 
1'-IOV Cl2BCR4),@16(R5) 
r-1ov @4(R4),@13CR5) 
r~ov :l20CR4), nOCR5) 
r-~a v @26CR4),@22CR5) 
r1 ov @2CR4),c.1?.4CR5) 
~10V @8(R4),.il26CR5) 
~1 ov @14(R4) ,:::128CR5) 
* **************************** 
* * 3 POINT PREW~AVE •'· .,.
...... .. ..... ~ ....................................................................................... .,~ ........................................ .;. .... • .. 
¥ ¥¥¥¥¥¥~--~-¥¥~--¥¥¥¥¥¥¥Y¥¥¥¥ 
•'• .,.
... 
. ,. 
LODPl 
•'• .,.
LI 
MO.V 
r~ ov 
BL 
~10V 
r~ov 
~~0 v 
BL 
1·10V 
MDV 
MOV 
gl 
MOV 
t10V 
r~ JV 
tiL 
MOV 
RS,XREG 
@10(R5),R0 
3l20CRS),Rl 
i'ADDSUB 
R2,@l0CR5) 
R3,@20CR5) 
:;:R5,R3 
~ADD 
R3,:::P,5 
@12CRS),R0 
~22CRS),Rl 
@ADDSU6 
R2dl2(t:?5) 
R3,@22CP5) 
~2CP.5),R3 
.l)A')D 
!='3,.il2(D5) 
, ..• •.. 
I r~ov ~l4(R5),R0 
I MOV @24(R5),Rl 
I BL 2ADDSU3 
I 1-!0V ~2,~14(R5) 
I MOV R 3 , .JJ 2 4 C ~ 5 .) 
I ~-10 v 2'4(t<5),R3 
I BL i-~00 
I MOV R3,0:4(1<5) 
I ·'· ... , .. 
I MDV il16(K5),~0 
I MDV .il26('15),Rl 
I BL :.JAODSUB 
I ~~0 v R2,@16(P5) 
I ,·~ov R3,.il26(~5) 
I MOV .i'6(R5),R3 
I oL uADD 
I MQV !:13,36(R5) 
I* 
I M'JV @18(R5),P.0 
I MDV il28CP.5),Rl 
I BL aADDSU5 
I MDV R 2 , a 1 3 ( R 5 ). 
I HOV R3,@29(D5) 
I MCJV 33(R5),P.3 
I BL ,iJAOD 
I MOV R3,@8(R5) 
I::: 
I::: ~~~·~-~~··~~--~~~~~·~~~~--~~ _,.. .. , .............. - .......... , .. "'('> .. , ............... '"'(" .,, ............................... , ................ , ................................ 
I ~- .... 5 POINT PREWEAVE •'· •.• ... .,. 
I•'· ~~-~-~-~·~-·~h--~~~~-~-~~-~-. .. .. , .. "'•"' ............ , .... , .................. , .... , ............................ ,. ....... , ... , .. ~ ..... , .......... , .... , .............. ,.. 
I•'• ... 
I LI R6,ZREG 
I WJV Cl2CP5),R0 
I MCV 1~(P5),Rl 
I SL ~~o.rsua 
I MOV ~ 2, 2 2 C R 5·) 
I MQV ~3,@6(R6) I ... ... 
I MQV @~(R5),R0 
I t-10V a/4(R5),P.l 
I BL !!ADD SUB 
I MDV R2,@4(R5) 
I MQV R3,@10(R6) 
I MDV il6(R6),R2 
I BL a,lOD 
I ~1CV R3,@8(R6) 
, ... 
... 
I MDV Ol2(P5),t?O 
I 1·10V ).:.(P5),Pl 
3L a)A~DSU8 I HCV ?.3,@24(R6) 
~1 0 V· R2,a2CR6) I•'• ..
MOV R3,::.J4CR6) I ::: ~-------·------------·------.. , ...................................... , .... , .... , .... , .. "" ........ , .... , ......... , .... , ..... , ..... , .... , ..................................... 
r~ov :::R~,R3 I ...  ,. ::: ).IULTIPLICATION ·'· ., . 
SL @ADD I•'• 
---------------·------------
. ,. ............. , ..... , .... , .... , ........... , .................... , .... , ................. , ............................. , .... , .... , .... , .... , ............ 
~ov R3,:::R6 I :-,: 
•'• I MDV ;FWD,~l .,.
... I .,IEQ F~WD . ,. 
~1 0 v @12CRS),R0 I LI ;n,COEFC? 
/·10V · .::J18CR5),~1 I•'• ...
BL 2ADOSU3 I J ~:1 p OVER 
t~OV R2,@12(R5) IFRWO LI P.7,CIJEFF 
r~ o v R3,@18CRS) I ::: 
:::: IOV=R L! R4,0 
MOV @16(RS),R0 I LI R8,65521 
r~ ov @14(R5) ,~1 ILOCP ('1,0V :::R7+,Rl 
RL ,i)AODSUB I NOV JJZREGCR4),R2 
MDV R2,@14CR5) I r~ P v ~ 1 , R 2 
MOV R3,iil2ZCR6) I OIV P?R,R2 
/·10V a 1 3 C R 6 ) , R 2. I /·1(JV . R3dZREGCR4) 
BL 2ADD I INC T R4 
~1CV R3,@20CR6) I cr R4,3b 
·'· I JN5 LOOP .,. 
MDV @12CR5),R0 I ... •,• 
MDV @14(R5),Rl I•'• 
----------------------------
... ---~¥¥¥¥¥ _____ ¥¥-¥¥¥~¥~~----
e.L @AODSUB I•'• :::: 5 POINT POSTwEAVc !:: .,. 
r~ov R2,@14CR6) I ... ~-····~··~···--·-·~-··-····-.,. -~----¥----~¥----¥-----~----
/10 v P3,Cll6CR6) l :~ 
r~ o v @11J(~S),R3 I MOV :;:R6,R3 
BL @ADD I MOV R3,:::RS 
MDV R3d12(R6) i MDV 32CR6),R2 
•'• I ~L. .iJ~DD .,. 
MDV @22CR5),~0 I ,.~,. v R3,22CR6) 
nov :il23CR5),Rl I WJV J6(R6),R0 
SL :ilAOOSUB I MOV .~3(R6),R1 
MDV R2,@22CR5) I 3L .:!SUS 
MDV P.3,J30(R6) I 1·10V R3,0l6(R6) 
•'• I r~o v ::l3CR6),R2 .,.
MDV @26CRS),RO I WJV ~10(1<6),~3 
~~ ov @24CRS),Rl I BL :JADD 
BL @Ar::>DSU8 I MDV R3,211J(R6) 
MDV R2,@24(R5) I r~ov 1>2(R6),R0 
;\1 DV R3,@34(R6) I MOV i'4(R6),P.l 
~~a v @30(R6),P2 I i3L ])A ::lOS US 
~L @ADD I MC1V qz,C12CR6) 
11 ov R3,@32CR6) I MDV R3,@4(R6) 
•'• I MDV .iJ2CR6),R0 .... ~ 
rwv @22CRS),P0 I. HQV .!l6(R6),Rl 
MDV ?24CRS),Pl I 3L .::JADDSU3 
BL @AODSUB l MDV ~2,i'2CR5) 
MDV R2,Cl26(R6) I ; l·lOV R3,.:t8CR5) 
t~ 0 v R3,@28CR6) I fHlV ~4(R6),R0 
r-~ov 3)20CRS.),R3 I 1·1 tJ v 310CR6),Rl 
BL ::JAQO I BL ;:JAOfiSU~ 
~ov R2 9 @4(~5) I I·~OV ~3d28CR5) 
~~ 0 v R3 9 0:6(R5) I WJV .:)23CR6),RO 
•'• I MQV cl134CR6),Rl .,. 
r·10V @12(~6) 9 R3 I aL aAJDSUS 
MOV R3,@10CR5) I i~1 0 v R2d24CRS) 
MDV 314(R6) 9 P2 I ~~Q v ~3,@26CPS) 
8L 3A~D . I::: 
;'·10V ~3,214CR6) I::: ··~··········~······~······· .... ~ ............ , ...... ~ ....................... ~ ...... , .... , .. _, .... , ..... , ..................... , .... , .......... , ..... , ... , .... ' .. , ........... 
~1 0 v ~18(R6),R0 , ... :': 3 POINT PQSTWEt.VE •'• ... / ... 
MDV ~20CR6),Rl , ... ··~·····~···~··············-... .. , ................. ,, ........................ , .. :·" ....... , .... ,.. .. , .... , .... , ... , ......... , .............. , .... , .... , ............... 
BL JSU8 , ... ... 
MDV R3 9 @18CR6) I . MCV :::~S,R3 
~·1 0 v <i'20CR6) 9 R2 I MOl/ lll 0 C R 5 ) , R'2 
MDV ·ll22(R6),?3 I BL .VA DO 
BL @ADD I 1·101/ R3,Q:lOCR5)" 
MOV R3,@22(Kf;) , ... ... 
1\10 v @14(R6),?0 I ~, n v ;:)2(RS),P3 
MDV .:iJ16(R6),Rl l ~18 v J:ll2CR5),R2 
3L ~AODSUB I BL @ADO 
MDV R2,@14(R6) I MOV R3,@12CR5). 
MDV R3 9 Jl6(R6) , ... .,. 
1·10V 214(R6),R0 I r~o v :i)4(P5),R3 
MDV @18CP6),Rl I MOV 314(R5),R2 
BL 2ADOSUB I BL @~DO 
f·10V R2,@12CR5) I ~-10 v Rj 9 @14(R5) 
1-1DV R3 9 .1H8(R5) I•'• .,. 
MDV @16CR6),RO I MOV :il6(R5),R3 
MDV @22CR6),Rl l ~IG V ill6(q5),R2 
5L @AODSUB I BL ~ADD 
MDV R2,314(R5) J ~1 [) v P.3,216CPS) 
MDV R3,@16CR5) I::: 
... I MDV aJ8CP.5),R3 ... 
t~OV @24(?.6),P3 I MCV @18CR5) 9 R2 
~10V R3,@20CR5) I BL @~:)~ 
rwv j)26(R6),q2 I MDV t.>3,.il18CR5) 
8L .:;)A[) 0 J•'• ... 
MOV R3 9 @26CR6) I rviOV 310CR5) 9 RO 
. r~ov @34CR6),?2 I t-IOV 320CR5),Rl 
NOV :\'32(R6),~3 I eL JADClSUc 
BL O:ADD MC'V R2 9 210CR5) 
1'\0V R3,@34(R6) ''10V 1:(3,320(P5) 
t~OV. => 3 0 CR 6 > , R 0 "' 
~lOV · @32CR6),Rl MCV al2(R5).,RO 
BL @SUB MOV -ll22CR5) ,Rl 
1·1DV R3,~30Cq6) BL @AODSU:! 
r~ o v @26(R6) 9 PO MOV qz,il12CR5) 
MOV @ 2 8 ( R 6'), ~: 1 1~0 v R3,.:J22CR5) 
BL iAODSUB :-': 
110V R2,@26CP.6> ~'iO v ,j)l4(~5) 9 RO 
I-10V R3,228CR6) MDV ~2.:.(~5),Rl 
MCJV .ll26(R6),R0 ~L 1ADOSUB 
r1nv il30CR6),rn t-1CV "2,~14(1='5) 
BL O'A::JuSU8 I-1JV C::':\,i).!. .. (P.5) 
~~ 0 v R2,G.l22(R5) :': 
•'• .,. 
...... 
.,. 
~A 0 V 
~10V 
8L 
~10V 
~~ 0 v 
MDV 
MDV 
3L 
MDV 
1-10 v 
'il16CRS),R0 
ll26CRS),R1 
@ADD SUB 
R2,@16CR5) 
R3,@26(R5) 
@18CRS),RC 
JJ28(R5),R1 
~ADDSUB 
R2,:!180:5) 
R3,.il28CR5) 
* * OUTPUT SHUFFL~ •'• .,.
- --~~·~··········~··········~ .. , .................... , ......... , .... , .... , .... , .... , .... , .... , .... , .... , ............. , .... , ..... , ................... , ................... , .... , .. 
•'• .,.
... ; 
.,. 
... 
. ,. 
MOV. :::R5,:::R6 
r~ov @12CR5),~2CR6) 
MDV @24CRS),OJ4(R6) 
r-1ov ~6(R5),@6(P6) 
r.10V @18CRS),:il3(K5) 
~~ov ~20CR5),J10(R6) 
r~ov @2(R5),cill2(R6) 
r~ov ll14(R5),@14(R6) 
'10V @26CR5),@16CR6) 
MDV ll8CR5),@18CR6) 
MDV 310CR5),@20(Ro) 
MDV :iJ22CR5),@22CR6) 
MQV <i)4(R5),cil24CF:6) 
I-10V· .lJ16CR5) ,CJ26CR6) 
r·10V Cl28CR5),@28(R6J 
B @>0800 
*
* ADD & SUBTRACT SUBROUTINE
*
ADDSUB MOV  R1,R2
       A    R0,R2
       JOC  PLUS
       CI   R2,65521
       JL   SUB
PLUS   AI   R2,15
SUB    MOV  R0,R3
       S    R1,R3
       C    R1,R0
       JL   FIN
       AI   R3,65521
FIN    RT
*
* ADDITION SUBROUTINE
*
ADD    A    R2,R3
       JOC  PLUS1
       CI   R3,65521
       JL   TAG
PLUS1  AI   R3,15
TAG    RT
*
* SHUFFLE VECTORS
*
COEFF  DATA 1, 16379, 13376
       DATA 19136, 18005, 48647
       DATA 32759, 8192, 45457
       DATA 36817, 5753, 25311
       DATA 16087, 29032, 8748
       DATA 23174, 43615, 1465
       DATA 61153, 5460, 18364
       DATA 46773, 20640, 5493
       DATA 6552, 57331, 37975
       DATA 28122, 34561, 24521
       DATA 29504, 28641, 12521
       DATA 5913, 24748, 21938
WSP    BSS  32
YREG   BSS  30
XREG   BSS  30
ZREG   BSS  36
LIM    BSS  2
FWD    BSS  2
LAST   END  START
C
C     15-POINT WINOGRAD ALGORITHM (WFTA)
C
      IMPLICIT REAL*8 (A - H,O - Z)
      DIMENSION X(15), Y(15), Z(18), OUT(15)
      DIMENSION COEF(18), COEFR(18)
      INTEGER IRF(15), IRFI(15)
      REAL*8 MODO
C
C     INPUT SHUFFLE VECTORS
C
      DATA IRF /0, 3, 6, 9, 12, 5, 8, 11, 14,
     1          2, 10, 13, 1, 4, 7/
C
C     OUTPUT SHUFFLE VECTORS
C
      DATA IRFI /0, 6, 12, 3, 9, 10, 1, 7, 13,
     1           4, 5, 11, 2, 8, 14/
C
C     FORWARD TRANSFORM COEFFICIENTS
C
      DATA COEF /1.00, 16379.00, 13376.00, 19136.00,
     1           18005.00, 48647.00, 32759.00, 8192.00,
     2           45457.00, 36817.00, 5753.00, 25311.00,
     3           16087.00, 29032.00, 8748.00, 23174.00,
     4           43615.00, 1465.00/
C
      DATA COEFR /61153.00, 5460.00, 18364.00, 46773.00,
     1            20640.00, 5493.00, 6552.00, 57331.00,
     2            37975.00, 28122.00, 34561.00, 24521.00,
     3            29504.00, 28641.00, 12521.00, 5913.00,
     4            24748.00, 21938.00/
C
C     READ INPUT DATA ARRAY
C
      FRD = 0.0
      READ (5,*) (Y(I),I=1,15)
      DO 10 I = 1, 15
   10 X(I) = Y(IRF(I) + 1)
      DO 20 I = 1, 5
        T = MODO(X(5 + I) + X(10 + I))
        X(I) = MODO(X(I) + T)
        X(10 + I) = MODO(X(5 + I) - X(10 + I))
        X(5 + I) = T
   20 CONTINUE
      J = 1
      DO 30 I = 1, 3
        IND = 5 * (I - 1)
        S1 = MODO(X(IND + 2) + X(IND + 5))
        S2 = MODO(X(IND + 2) - X(IND + 5))
        S3 = MODO(X(IND + 4) + X(IND + 3))
        S4 = MODO(X(IND + 4) - X(IND + 3))
        S5 = MODO(S1 + S3)
Appendix-B 
C
        S6 = MODO(S1 - S3)
        S7 = MODO(S2 + S4)
        S8 = MODO(S5 + X(IND + 1))
        Z(J) = S8
        Z(J + 1) = S5
        Z(J + 2) = S6
        Z(J + 3) = S2
        Z(J + 4) = S7
        Z(J + 5) = S4
        J = J + 6
   30 CONTINUE
      IF (FRD .EQ. 1.D0) GO TO 50
      DO 40 I = 1, 18
   40 Z(I) = MODO(Z(I)*COEF(I))
      GO TO 70
   50 DO 60 I = 1, 18
   60 Z(I) = MODO(Z(I)*COEFR(I))
   70 J = 1
      DO 80 I = 1, 3
        IND = 5 * (I - 1)
        S9 = MODO(Z(J) + Z(J + 1))
        S10 = MODO(S9 + Z(J + 2))
        S11 = MODO(S9 - Z(J + 2))
        S12 = MODO(Z(J + 3) - Z(J + 4))
        S13 = MODO(Z(J + 4) + Z(J + 5))
        S14 = MODO(S10 + S12)
        S15 = MODO(S10 - S12)
        S16 = MODO(S11 + S13)
        S17 = MODO(S11 - S13)
        X(IND + 1) = Z(J)
        X(IND + 2) = S14
        X(IND + 3) = S16
        X(IND + 4) = S17
        X(IND + 5) = S15
        J = J + 6
   80 CONTINUE
      DO 90 I = 1, 5
        T = MODO(X(I) + X(5 + I))
        T2 = MODO(T + X(10 + I))
        X(10 + I) = MODO(T - X(10 + I))
        X(5 + I) = T2
   90 CONTINUE
      DO 100 I = 1, 15
        OUT(IRFI(I) + 1) = X(I)
  100 CONTINUE
      WRITE (6,110) (Y(I),I=1,15)
  110 FORMAT (' ', 5F10.2)
      WRITE (6,120)
  120 FORMAT (' ',//)
      WRITE (6,130) (OUT(I),I=1,15)
  130 FORMAT (' ', 5F10.2)
      STOP
      END
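For readers who want to experiment with the 15-point program without a FORTRAN compiler, the following Python sketch follows the same sequence of steps: input shuffle, 3-point pre-weave, 5-point pre-weave, 18 multiplications, 5-point and 3-point post-weave, output shuffle. The index vectors and coefficients are copied from the DATA statements above. The function name `wfta15` and the `inverse` argument (standing in for the FRD switch) are hypothetical, and reductions modulo 65521 are deferred to the multiplication and output stages, which the original does not do. Treat it as a sketch of the listing, not a verified port.

```python
M = 65521  # modulus of the listings

IRF = [0, 3, 6, 9, 12, 5, 8, 11, 14, 2, 10, 13, 1, 4, 7]
IRFI = [0, 6, 12, 3, 9, 10, 1, 7, 13, 4, 5, 11, 2, 8, 14]
COEF = [1, 16379, 13376, 19136, 18005, 48647,
        32759, 8192, 45457, 36817, 5753, 25311,
        16087, 29032, 8748, 23174, 43615, 1465]
COEFR = [61153, 5460, 18364, 46773, 20640, 5493,
         6552, 57331, 37975, 28122, 34561, 24521,
         29504, 28641, 12521, 5913, 24748, 21938]


def wfta15(y, inverse=False):
    """15-point Winograd-style transform modulo M (sketch of the listing)."""
    coef = COEFR if inverse else COEF
    x = [y[i] for i in IRF]                       # input shuffle

    for i in range(5):                            # 3-point pre-weave
        t = x[5 + i] + x[10 + i]
        x[i] += t
        x[10 + i] = x[5 + i] - x[10 + i]
        x[5 + i] = t

    z = []
    for b in range(3):                            # 5-point pre-weave
        x0, x1, x2, x3, x4 = x[5 * b:5 * b + 5]
        s1, s2 = x1 + x4, x1 - x4
        s3, s4 = x3 + x2, x3 - x2
        s5, s6, s7 = s1 + s3, s1 - s3, s2 + s4
        z += [s5 + x0, s5, s6, s2, s7, s4]

    z = [(v * c) % M for v, c in zip(z, coef)]    # 18 multiplications

    for b in range(3):                            # 5-point post-weave
        m0, m1, m2, m3, m4, m5 = z[6 * b:6 * b + 6]
        s9 = m0 + m1
        s10, s11 = s9 + m2, s9 - m2
        s12, s13 = m3 - m4, m4 + m5
        x[5 * b:5 * b + 5] = [m0, s10 + s12, s11 + s13, s11 - s13, s10 - s12]

    for i in range(5):                            # 3-point post-weave
        t = x[i] + x[5 + i]
        x[5 + i], x[10 + i] = t + x[10 + i], t - x[10 + i]

    out = [0] * 15
    for i in range(15):                           # output shuffle
        out[IRFI[i]] = x[i] % M
    return out


if __name__ == "__main__":
    impulse = [1] + [0] * 14
    print(wfta15(impulse))                        # fifteen ones
    data = list(range(1, 16))
    print(wfta15(wfta15(data), inverse=True) == data)
```

Calling `wfta15` on a unit impulse is a quick sanity check, since every output of the forward transform should then equal 1.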
B-6 
Appendix-B 
c 
c 
c 
      DOUBLE PRECISION FUNCTION MODO(F)
      REAL*8 F, MOD
      MOD = 65521.D0
      IF (F .LT. 0.0D0) GO TO 10
      MODO = DMOD(F,MOD)
      GO TO 20
   10 MODO = MOD - DMOD(-F,MOD)
   20 RETURN
      END
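The behaviour of MODO can be checked with a direct transliteration. The Python sketch below uses `math.fmod`, which, like DMOD, returns a remainder with the sign of its first argument; the lower-case name `modo` is only illustrative. One property worth noting, as an observation rather than a statement from the thesis, is that a negative exact multiple of the modulus is mapped to 65521 rather than 0. In the 15-point program this case does not seem to arise, because negative arguments come only from differences of two already reduced values.

```python
import math

M = 65521.0  # modulus, held as a double as in the FORTRAN function


def modo(f):
    """Reduce f into the range of residues the way the MODO function does."""
    if f < 0.0:
        # negative argument: reduce the magnitude, then reflect about M
        return M - math.fmod(-f, M)
    return math.fmod(f, M)


if __name__ == "__main__":
    print(modo(65524.0))    # 3.0
    print(modo(-1.0))       # 65520.0
    print(modo(-65521.0))   # 65521.0, not 0.0
```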
B-7 
Appendix-C 
FORTH program source listing for a 60-point WFTA (TMS9900) 
Appendix-C
( THIS PROGRAM PERFORMS WINOGRAD LENGTH 60
  FORWARD AND REVERSE TRANSFORM )
( INPUT ARRAY IS Y AND THE RESULT OF TRANSFORM
  IS ALSO STORED IN ARRAY Y )
: s 
: s 
DECIMAL ( VARIABLES USED FOR TEMPORARY STORAGE )
0 INTEGER S0 0 INTEGER S1 0 INTEGER S2 0 INTEGER S3
0 INTEGER S4 0 INTEGER S5 0 INTEGER T1 0 INTEGER T2
0 INTEGER T3 0 INTEGER T4 0 INTEGER T5 0 INTEGER TM0
0 INTEGER TM1 0 INTEGER TM2 0 INTEGER TM3 0 INTEGER TM4
0 INTEGER TM
( ARRAYS USED FOR COMPUTATION )
144 ARRAY FCOEF 144 ARRAY RCOEF 120 ARRAY X 144 ARRAY Y
120 ARRAY RF 120 ARRAY RFI
: SINT 0 S0 ! 2 S1 ! 4 S2 ! 6 S3 ! 8 S4 ! 10 S5 ! ;
: INTZ 0 TM0 ! 2 TM ! 4 TM1 ! 6 TM2 ! 8 TM3 ! 10 TM4 ! ;
: 1CHG TM0 @ 10 + TM0 ! TM @ 10 + TM ! TM1 @ 10 + TM1 !
  TM2 @ 10 + TM2 ! TM3 @ 10 + TM3 ! TM4 @ 10 + TM4 ! ;
: 2CHG S0 @ 12 + S0 ! S1 @ 12 + S1 ! S2 @ 12 + S2
  ! S3 @ 12 + S3 ! S4 @ 12 + S4 ! S5 @ 12 + S5 ! ;
( INPUT SHUFFLE VECTORS ) RF FILL
0 72 24 96 48 90 42 114 66 18 60 12 84 36 108
30 102 54 6 78 80 32 104 56 8 50 2 74 26 98
20 92 44 116 68 110 62 14 86 38 40 112 64 16 88
10 82 34 106 58 100 52 4 76 28 70 22 94 46 118
( OUTPUT SHUFFLE VECTORS ) RFI FILL
0 24 48 72 96 30 54 78 102 6 60 84 108 12 36
90 114 18 42 66 40 64 88 112 16 70 94 118 22 46
100 4 28 52 76 10 34 58 82 106 80 104 8 32 56
110 14 38 62 86 20 44 68 92 116 50 74 98 2 26
;S
( COEFFICIENTS FOR FORWARD TRANSFORM )
FCOEF FILL
1 16379 13376 64390 46385 48647
1 16379 13376 64390 46385 48647
1 16379 13376 64390 46385 48647
41224 13991 53009 26608 10376 22681
32759 8192 45457 34457 28704 25311
32759 8192 45457 34457 28704 25311
32759 8192 45457 34457 28704 25311
3685 11774 18768 25609 49957 64260
49434 36489 56773 45080 23174 64056
49434 36489 56773 45080 23174 64056
49434 36489 56773 45080 23174 64056
33074 56939 32 5797 28796 17202
;S
( COEFFICIENTS FOR REVERSE TRANSFORM )
RCOEF FILL
64429 1365 4591 9847 4687 50514
64429 1365 4591 9847 4687 50514
64429 1365 4591 9847 4687 50514
3681 11779 30785 35388 4541 64807
1638 30713 25874 17990 25730 55271
1638 30713 25874 17990 25730 55271
1638 30713 25874 17990 25730 55271
27239 15092 52104 12439 25949 1071
58145 9220 13250 44432 50619 27276
58145 9220 13250 44432 50619 27276
58145 9220 13250 44432 50619 27276
50784 2041 30577 40308 60673 45578
;S
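The two shuffle tables of the 60-point program are regular enough to be regenerated from a closed form. The Python sketch below is an observation about the tabulated values, not something stated in the listing: the function names, the loop variables `b`, `a`, `p` (indices for the 3-, 4- and 5-point factors) and the particular step constants are all inferred from the tables. Each entry is a byte offset, that is, twice the element index, which matches the `2 +LOOP` stepping used by the FORTH words.

```python
N = 60  # transform length, 3 x 4 x 5


def input_map():
    """Byte offsets for the input shuffle (inferred closed form for RF)."""
    return [2 * ((40 * b + 45 * a + 36 * p) % N)
            for b in range(3) for a in range(4) for p in range(5)]


def output_map():
    """Byte offsets for the output shuffle (inferred closed form for RFI)."""
    return [2 * ((20 * b + 15 * a + 12 * p) % N)
            for b in range(3) for a in range(4) for p in range(5)]


if __name__ == "__main__":
    rf, rfi = input_map(), output_map()
    print(rf[:15])    # first row of the RF table
    print(rfi[:15])   # first row of the RFI table
    # both maps should visit every even offset 0..118 exactly once
    print(sorted(rf) == list(range(0, 120, 2)),
          sorted(rfi) == list(range(0, 120, 2)))
```

Because each map is a permutation of the 60 element positions, a garbled table entry can be recovered by regenerating the row it belongs to.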
( MODULAR MULTIPLICATION ROUTINE FOR THE EXTERNAL
  HARDWARE MODULAR MULTIPLIER )
HeX CODE ALOAO 3F~2 2 L! 3FF4 3 LI 3FF6 4 L! ~ETU~N 
DC:CIMAL 
CODE !CALC 8 PCP 9 POP 0 g 1 2 MOV 0 q 1 3 
~OV 1 4 0 7 MCV 7 PUSH ~~TURN 
A~ULT ALOAD 144 0 DO I FCGEF + @ I Y + ~ !CALC 
T y + ! 2 +LOJP . .. 
' BMULT AL0.40 144 0 DO I ~cc=~ + ·~ I v + 5) !CALC 
I y + ! 2 +LOQP . 
' CMULT FLAG 0 = IF AI·IUL T ELSE 1;)~1UL T THEN 
. , .. 
. .) 
C MODULAR ADDITION ) HEX 
CGDE MOD 1 PDP 2 POP 0 1 0 2 A FNC If ELSE F 1 AI 
THEN FFF1 1 CI FH IF F 1 AI 1 PUSH ELS~ 
1 PUSH THE~J RfTURrJ 
( MODULAR MULTIPLICATION ) 
CUDE 0/ 7 POP 5 POP FF.F1 4 LI 5 0 7 MPY 
5 0 4 DIV 6 PUS~ RETURN 
( REG4 CONT~INS DIVIS~R ) 
( MCDULAR SU6TR~CTION ) 
CODE SBT 2 POP 1 POP 0 3 0 1 MDV 0 1 0 2 S 0 2 ° 3 C 
FLT IF FFFl 1 AI 1 PUS~ ELSE 1 PUSH TH~N R~TUP~ 
( MCDULAR HARJWARE MULTPLIER ) . 
H~X CODE CREG 0 7 CLR 0 8 CLR 0 9 CLR RETUKN 
CODE ALOAD 3FF2 2 LI 3FF4 3 LI 3FF6 4 LI RETURN 
CODE CALC 0 8 1· 2 MDV 0 9 1 3 MQV 1 4 0 7 MOV 7 PUSH R~TURN 
: s 
( 3 POINT PRE-WEAVE ) DECIMAL
: 3AD 40 0 DO I 40 + X + @ I 80 + X + @ OVER OVER MOD I
  40 + Y + ! SBT I 80 + Y + ! 2 +LOOP ;
: 3DAD 40 0 DO I 40 + Y + @ I X + @ MOD I Y +
  ! 2 +LOOP ;
: I3PT 3AD 3DAD ;
( 4 POI;JT PRE-WEAVE ) 
41AD 10 0 DO T y + "' I 20 + y + @ MOO I X. + ! .., +LOOP ... C.• '-
42AO 10 0 DO I 10 + y + CJ I 30 + y + :J OVEP OVED r·l D (1 
I 10 + X + ! S1H I 30 + X + ! 2 +LOOP . 
' 
C-2 
Appendix-( 
42SB 10 0 DO I Y + ~ I 20 + Y + J S~T ! 20 + X + I 2 
+LOOP ; 
43AO 10 0 Db I X + & I 10 + X + @ MCO T~ 
X + @ SBT I 10 + X + Ttvl ::: T X + 
I I + 
2 +LnDP 
:i I 10 + 
: s 
44AC 1 0 0 DO I 40 + y + Q) I 60 + y + @ 'JV':R ov:R '-~OJ 
I 40 + X + ! SeT I 60 + X + 2 +LOOP . I 
45AD 10 0 DO I 50 + '( + jJ I 70 + v + @ OVO:R OV~R I·IOQ 
I 58 + X + ! 5:.>T I 70 + X + 2 +i..:OG 0 . 
' 48AD 1 0 0 DD I 4(1 + X + ~iJ I 50 + y + .;! '~00 :r-1 ! I <+0 + X 
j) I 50 + X .. :il 5 P, T I 50 + X + T i~ a ... 40 + '( + ! 2 
49AD 10 0 on I 30 + y + @ I 100 + y + .:l OV~P OVtR :~C)J 
I 80 + X + ! S ~T I 100 + X +. ., +LJCJP . '-
' 
4AAD 10 0 DQ I 90 + y + a I 110 + y + @ ov:P ':'Vf~ :'·100 
I 90 + X + ! 5P.T I 110 + X + 2 +LOOP ; 
4DAD 10 0 DO I RO + X + @ J: 90 + X + . @ MOrl Tt-l ! I 80 + X 
a' I 90 + X + "'\ SP.T I 90 + X + ! T ~~ a I 80 + X + ! 2 OJ 
I4PT 41AO 42AD 4253 43AD 44AJ 45All 43no 4QAG 4AAO 4DAO 
: s 
: s 
: s 
( MULTIPLICATION WITH COEFFICIENTS )
0 INTEGER FLAG
: FMULT 144 0 DO I FCOEF + @ I Y + @ 0/ I Y + ! 2 +LOOP ;
: RMULT 144 0 DO I RCOEF + @ I Y + @ 0/ I Y + ! 2 +LOOP ;
: MULT FLAG @ 0 = IF FMULT ELSE RMULT THEN ;
( 5 POINT PRE-WEAVE )
: I15PT TM @ X + @ TM3 @ X + @ OVER OVER MOD S1 @ Y + !
  SBT S5 @ Y + ! ;
: I25PT TM1 @ X + @ TM2 @ X + @ MOD S2 @ Y + ! TM2 @ X
  + @ TM1 @ X + @ SBT S4 @ Y + ! ;
: I35PT S1 @ Y + @ S2 @ Y + @ OVER OVER MOD S1 @ Y + !
  SBT S2 @ Y + ! TM0 @ X + @ S1 @ Y + @ MOD S0 @ Y + ! ;
: I45PT S5 @ Y + @ S4 @ Y + @ MOD S3 @ Y + ! ;
: I5PT INTZ SINT 24 0 DO I15PT I25PT I35PT I45PT 2CHG
  1CHG 2 +LOOP ;
( 5 PO HIT POST-WEAVE ) 
FVPT so @ y + @ DUP Sl a y + J) t·~ DO T1 TMO C,r 
1FVPT 53 @ y + j) 55 a y + @ I~ DO TS ! 
2FVPT 53 2 y + a 54 j) y + ~1 SoT T3 ! 
3FVPT Tl @ $2 @ y + 3 8VER OVER ~~ r ~ T2 $ o T 
4FVPT T2 ~ T3 a; OVER OVER I~OD T ~~ a X + ! SBT 
T ~13 ,lJ X + I • • I 
SFVPT T4 3 TS 2 OVER OVER MOD TMl 3 X + ! S~T 
TM2 J X + 
05PT INTZ SINT 24 0 DO FVPT 1FVPT 2FVPT 3~VPT 
4FVPT SFVPT 2CrlG 1CHG 2 +LOOP ; 
C 4 PJINT POST-WEAVE ) 
X 
T4 
401 10 0 DO I X + @ I Y + ! 2 +LOOP 
140 10 0 DO I 20 + X + 3 T 30 + X + .]) OVE:D C'Vt:K 
+ 
~DO I 10 + Y + ! SBT I 30 + Y + 
+ V + ! 2 + LOOP ; : 
I 10 + X + '· I ? " ~v
c-: 
+. 
+LODe> 
+ 
+LQ'JO 
Appendix-( 
422 10 0 DC 
240 10 (I no 
S::l + y + 
2 + LOOP 
4D3 10 0 JQ 
. 340 10 0 U!J . 
90 + y + 
I 100 + 
. (" 
• ..J 
04PT 401 14[\ 
I 40 + 
I 60 + X 
SfH ! 
I O(l + 
I 100 + 
! SBT 
I 
I 
y + ! 2 
402 248 
X + J I 40 + 
+ :i) I 70 + 
70 + y + ! 
X + @ I RCJ + 
X + ;) I !.10 
110 + y + 
+ LOOP . 
• 
L.23 34~ 
( 3 POI~T POST-WEAVE ) 
y + 
X + ~ 
T 50 
-
y + 
+ X 
I 00 
: 03PT 40 0 CJ I Y + i I 40 + Y + J ~rD 
I 2 •Lnoo . • 
OVED C'VER ·~a ::J I 
+· X + ?! ~ f •J + y ... 
2 •UJO::> 
+ ~ JYrR ov~:> '1 0 iJ 
+ X + CJ 
I 80 + Y + a CV~~ OVER MOD I 40 + X • 
+ X + ! I Y + Z I X + ! ~ +LOOP 
sEn T ?O 
• c 
. -) 
( INPUT RE-ORDERING VECTOR RF )
: IORD 120 0 DO I RF + @ Y + @ I X + ! 2 +LOOP ;
( OUTPUT RE-ORDERING VECTOR RFI )
: OORD 120 0 DO I X + @ I RFI + @ Y + ! 2 +LOOP ;
: TRANSFORM IORD I3PT I4PT I5PT MULT O5PT
  O4PT O3PT OORD ;
: 1TRANSFORM IORD I3PT I4PT I5PT CMULT O5PT
  O4PT O3PT OORD ;
( FRD FOR FORWARD AND INV FOR INVERSE TRANSFORM
  USING MULTIPLY AND DIVIDE INSTRUCTION )
: FRD 0 FLAG ! TRANSFORM ; : INV 1 FLAG ! TRANSFORM ;
( 1FRD FOR FORWARD AND 1INV FOR REVERSE TRANSFORM
  USING EXTERNAL HARDWARE MODULAR MULTIPLIER )
: 1FRD 0 FLAG ! 1TRANSFORM ; : 1INV 1 FLAG ! 1TRANSFORM ;
X EMPTY Y EMPTY
;S
(-!.. 
Appendix-D
Assembler program source listings for the slave microprocessors
(1 to 18).
Assembler program source listing for the master microprocessor
Assembler program source listing for a 15-point WFTA (MC6809)
Appendix-D
**************************************************************
*                   PROCESSOR NUMBER 1                       *
**************************************************************
N A ~·1 68091 I 51", p 1 5 STD :4 c rJ o 
OUTPUT f. JU S0400 I C LR II ' ·-· 
STATUS E:JU $0402 I CLR 1 ' 1,' . 
Tb E tJU $0403 I LOA 1 ' X 
T2 tOU $0405 I LOR 1 ' y 
INPUT EQU $0410 I ~~ UL 
R6 EQU ~0412 I STD 2,!J 
R;:: r ;;;u ~0414 I LOA ' X 
S E ~'1 EJU o;Q416 L :l3 1 ' y 
•'• 'I :·lUL •.. 
eRr.. $~!:-:00 I AOIJO 1 'u 
'!JP STfl 1 'u J. 
GRCC ~t~;OlOlOOOO RCC St< 0 H: 
LCIU tt P R12 D 1 u~c ' !J 
BEGIN CLRA s 1<'. p 16 L 0 A ! ' X 
STA FL A;G LD~ ' y 
LOA S t=M' i·1UL 
~f:Q F ~~o! AODD 1 'u 
START 
I 
LOA ~ 1 I STC" 1 'u 
STA FLA 1G c.cc SKD19 
FRO L0Y :: HOJO INC 'u I 
L DX ;t M L.T F R IS'<P1q LD c.. ' '1. I 
LOA #1 I I LOB ' y 
STA STATUS I "'lUL 
S yt,!C I A ODD 'u 
CL~ll. I I STD 'u 
STA STATUS I ... ... 
LDD : ~! P LIT I LOA 1 'u 
RPA OVE~ I LOR ~15 
'• NEXT L DY 1: 1·1 C i~ D MUL 
LOX Z:f..1LTRR I AODD z,u 
s n.Jc I ?.CS s I<', p 20 
LDO S c. V ,E i Ci·IPD z~.sssn 
... i I fiLO St<P21 .,. 
OVER S Y ~I.C ISKP20 ADOD li 1 5 
SYNC ISKP21 STD 2 ' IJ 
.A DOD R6 I•'• ...
BCS SKP12 I LOA I I 
' l.. 
C~l P:l .to65521 I LOX t:TEM 0 
~:L 0 SKP13 I CU' ' X 
SI\P12 A DOC 1;:15 I CLR 1 ' X 
SKP13 S·YNC I CL~ z,x 
SYNC I LOP, !tl5 
s Yf\tC I ~11J L 
ADO~ P2 I SHl ' X 
BCS SKD14 I LOC.. ' X 
Ct-'tPD !t65521 I L:JB .rn s 
BLO $11. 0 15 I r·1UL 
SKP14· A o:JD t;l5 I .~ D 0 iJ 1 , X 
•'• I tl ODD 2,U ... 
Appenclix-D 0-2 
scs SKP22 STO ' IJ 
cr~ PD tt65521 LOA 1 'u 
3LO SKP23 L :J 8 til5 
SJ<..P22 AJDD ::<15 ;.llJL 
SKP23 SYNC AOOD :?,U 
... RCS L0°20 ... 
STD TZ C '.<Dr '' ...; "'b5S21 
S'!'\JC o,LO L0°21 
S'!'NC ILGP20 A8DD .tilt; 
SO!C ILOP21 STu 2tU 
STO T6 I ... .,. 
S 'f'-IC I LOA t u 
S\'NC I LOX 1t T C: ~1P 
•'• I CLD t X .,.
STD SAVE I (LD 1 , X 
L C:rA FLAG I CL~ 2 , X 
CIWA ~1 I L 03 li • c: .J. -· 
:HQ 1·1UL T I 1-AUL 
CHPA #2 I srn ' y 
~1 c Q c orJ v I LDA ' X 
LOO SAVE I LD~ It 1 5 
$TO Rf.S I "!UL 
L bRA o::GIN 
' 
AODD 1 , X 
CUNV L J [: SAVE I ADCJD . 2 t I.J 
STO OUTPUT I RCS LDP22 
L B R 1\ 3EGIN I U1D8 li65521 
•'• I ~L:J L0°2? .,.
,'-1UL T INC FL/l.G ILOP22 O.J[l:J ~ 1 5 
LDX #SAVE IL!JP23 STO SAV: 
LOY /I' RES I S Y'·IC 
CLR 'u I SYNC 
CLR 1 ' 'J I SYNC 
LOA 1 , X I SY"JC 
LOB l , y I UIPA NEXT 
t·1UL I ... ... 
STO 2 ' 'J I r~ L T F R !=QB ]_ 
LDA 'y I :-IL TR R F03 61153 
LOR 1 'y I ... ... 
1·1\J L I ORG $0(101) 
ADDD 1 'u 1 r~c~n ::og I) 
ST:J l,U IP~OOl FCS 0 
RCC LDP16 IPR002 FCP. 0 
INC 'u IPRCJD:?. FCg l"r 
LGP16 L D ~ 1, X !P~CD4 FC8 0 
LD'3 ' y IT= ~·1? F(g 0 
''IUL ITt::~Pl FCB I~ 
ADDO 1 'u I TEMP? . FCS 0 
STD 1 'u !SAVE c:os 0 
sec: L 0 P11 !FLAG c:cg 0 
INC , u IRES FD5 ('• 
LUP19 LOA 
' X 
, ... 
•,• 
LOB t y l SRG t.FC:FE 
r,1UL ISTt:1T EQU 1r=poo 
A:JDD ,:J I fHD a;: G It~ 
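The multiplication section of each slave listing appears to build a 32-bit product from 8-bit by 8-bit MUL results and then fold the upper bytes back into the lower 16 bits by multiplying them by 15, using the congruence 2**16 = 15 (mod 65521). The Python sketch below models that reduction at the word level. It is a simplified illustration under stated assumptions: the name `mul_mod` is hypothetical, the folding is done on the whole upper half at once rather than byte by byte as the MC6809 code seems to do, and no attempt is made to reproduce the SYNC handshaking with the master processor.

```python
M = 65521  # modulus, 2**16 - 15


def mul_mod(a, b):
    """Multiply two residues modulo M using 8x8 partial products and folding."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF

    # 32-bit product assembled from four 8-bit by 8-bit multiplications,
    # the only multiply the MC6809 provides
    prod = (a_lo * b_lo
            + ((a_hi * b_lo + a_lo * b_hi) << 8)
            + ((a_hi * b_hi) << 16))

    # fold the upper half down: hi * 2**16 + lo is congruent to hi * 15 + lo
    while prod >> 16:
        prod = (prod >> 16) * 15 + (prod & 0xFFFF)

    # one conditional subtraction brings 65521..65535 into range
    if prod >= M:
        prod -= M
    return prod


if __name__ == "__main__":
    print(mul_mod(16379, 4))        # 65516, i.e. -5 (mod 65521)
    print(mul_mod(61153, 15))       # 1, since 61153 is the inverse of 15
```

The second call uses the reverse-transform scale factor from the processor 1 listing, so it doubles as a check on that constant.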
/ . 
AtJpenclix-0 n ~ li- _"'; 
**************************************************************
*                   PROCESSOR NUMBER 2                       *
**************************************************************
r,J .Ar-1 68092 ISt<.Pl4 a~OD It 1 5 
OUTPUT ~QU ~0400 IS!<'. P 15 STD T3 
STATUS E JU >1:0402 I SY"JC 
T7 EQU ~04(13 I ~DOD R3 
TS E •.:lU $0405 I Cl($ $1<. 0 16 
T::> EQU ~0407 I CMDO t16552l 
Tl FQU ~ 0 40 •:J I CILO S K 0 1 ... 
INPUT f()U $0410 ISKP16 ~ 0[10 lt 1 5 
R7 r::Qu $0412 ISKP17 STD T1 
RS E':!U !0414 I S HIC 
R.::. EQU $0416 I STD 1·1( NO 
Rl EQU $0413 I C LR 'u 
StM f.QLI f041A I CL~ 1 'u 
... 
' 
LOA 1 , X ... 
ORG $F800 I L :J ~ 1 ' y 
NOP I '·lUL 
o~cc ,.o.;o 101 01) 00 I STD ;o,u 
LDU 2iPRGD1 I LJt. ' X 
BEGIN CL~A I L03 1 ' y 
STA FLAG I MUL 
L 0 'l SEM I :.ooo 1 'u 
Cl,~(1 
... - ·~ F~ D l STD l 'u 
START L Oil ~1 I sec St<.D18 
S T.A FLAG I : r-.JC 'u 
FRO LJY t: !~ C N 0 !SKP18 LOC; 1 ' '( 
LOX ;Hil TF R I LIJ~ ' y 
L D.A !l 1 I i._.,UL 
STA ST.ATUS I ACJDD 1 'u 
SYNC I S T ~l 1 'u 
CLR~ I ~,(( SKD21 
STA STATUS I r~c ,u 
LJD HlPUT ISKP21 LOA ' X 
BC/A OVER I Log 'y 
NEXT LDV #r~ CN 0 I MUL 
lox ~ :~L T R R I 0.000 'u 
SYNC I STD 'u 
LDD SAVE I•'• .,. 
-·· I LOA 1 'u ...
OVER SYNC I LOS :i 1 5 
SYNC I r-1UL 
AODD R7 l ADJD z,u 
BCS SKP12 I ~cs St<'. 0 22 
OlPD 1t65521 I C:·IP D 11S552l 
f.ILO SKP13 l RLG SKD23 
SKP12 ADDO lt15 I S KP 2 2 AOOO til5 
SJ\P13 STCl TS IS!<P23 STD z,u 
SHJC I•'• ...
ADOD R5 I LO~ 'u 
3CS SK?l4 I LOX t:TE r·:P 
OlPD 1165521 I C L~ 'X 
~. L 0 SKP15 I .t:L? 1 , X 
A~pendix-J 
CLR 2 , X I LOY ;:QES 
LOR 1!15 ILOP15 CLC? 'u 
HUL I CL=' 1 'u 
STO 'X I L D .l 1 1 X 
LOA 
' X I LOK 1 ' v 
LDB ~15 I ~UL 
1,1 LJ L STO 2 ''J 
AJJD 1 , X LO~ ' X 
AJJO z,u LOS 1 ' 'f 
9CS SKr24 I·IUL 
CMP[) .tt65521 ADOD 1 'u 
SLO SKP25 STD J 'u 
SKP24 ll.f1JD !t 1 ~- ecr: LCJDlA 
$1\P25 SYfiJC ! N C 
' !J 
•'• ILOP1t? L DA 1 , X .,. 
S Y "lC I LOP. ' y 
ADDO t<l I •A UL 
gcs SKP26 I ADOD 1 'u 
U1PO ~65521 I ST!:' 1, u 
3LO $1('::>27 I ~-CC LOP19 
SKP26 ArlDO 1:15 I I!'JC 'u 
SKP27 STD T3 IL:JPlr~ LOA 
' X 
SYNC I L~3 ' y 
AD:JO R3 I t·!UL 
BCS SKP2Fl I A D~O , u 
01PD .tt65521 I STD 'u 
:3LO SK 0 29 ! ... .,. 
SKP28 .ll. JDD i:!15 I LD~ 1 , u 
SKP29 STD T5 I L o e. ~15 
SYNC I t~UL 
ADDO R5 ! t.ODO z,u gcs SKD30 ! c,cs LCJD20 
CMPD ~65521 I C ~1 PO 4>65521 
BL:J SKP31 I ?, LO LJP21 
SKP30 ADDD 11115 ILJP20 ADDD it 1 s 
SKP31 STO T7 IUJ 0 21 STD z,u 
S Y!\!C , ... . ,. 
SYNC I L DA 'u 
•'• I LOX :tT c 1-lP .,.
s T':l SAVE I CLR ' X 
L DA FLAG I CU' 1 , X 
c r~ P A n I C LR 2 , X 
BEQ >1UL T I LOB :t 1 5 
Cl-1 P A tt2 I ~1UL 
P.EQ C\!i\IV I STD ' y 
.L DD SAVE I LDA ' Y, 
STD R>=:S 
' 
l :J '?· ~tlS 
L8PA E EGH~ I i'-1UL. 
C UtJV L D ') SAVE I .~DOD J ' y 
ST[I OUTPUT I tdJOD 2 , I~ 
I H? A 1'. F G Ttl I [', ( r 1 n P :·? 
i:{ I . ( ~~I)! J II 0 r. '· :• 1 
r~UL T I r,J ( FL {I G I ('. L I J I. u r>.: -~ 
L [)) II'· A V F. ILCJ?22 llOClU '11':> 
Appendix-:.> f'-5 
··- I C'~G SI).JOO ..• 
LUP23 STQ .... 1..> !r~ow ~~R 0 
Sf "!C IPP:JDl FCB 0 
L J :J R3 I PP082 FU\ (\ 
STD TS IPRG03 ~cl) 0 
SYiJC l PPJC4 FC~ G 
SPIC IT::MP ~cs 0 
SfNC I T r: ,, D 1 r:cg 0 
LDD C\3 IT::f-jDJ ~=c~ 0 
STD S~VE IS~V~ F f) p, 0 
L3RA I~:: X T !FLAG ~=c~ (.' 
··-
IR~S FOB 0 
-·· 
MLTFR FOB 16379 I CRG <!FFF=~ 
MLTRR C::):<. 5460 IST;:;T :- -~U ~ F ~{· 'l 
.. , I :NO Ll:: G! ~; 
-.· 
**************************************************************
*                   PROCESSOR NUMBER 3                       *
**************************************************************
~~A r-1 68093 !l)VcP s y ~~ c 
OUTPUT E :'.)U $0400 SnJC 
STATUS E :JU $0402 .lOOO C\8 
Tb ~QU ~0403 ~cs SK;Jl2 
T4 E~U $0405 (J~ p 0 ~65521 
T<:: :Qu $0407 ! r..Lo SKPl? 
II--. PUT ECU $0410 ISKP12 A JCIO r:15 
R8 F.QU $0412 Is'< P 1 3 STJ T4 
R4 EQ!J S0414 I S Y '-!C 
R~ ::Qu $0416 I A DOD q4 
StM : Ql.l- $0418 I RCS s 1<'. p 14 
:::: I ( ~~ D 0 ,6ssn 
ORG $F800 I BLO SK 0 15 
~·I 0 P ISKP14 nooo It 1 5 
IJ?CC ri 0{01010000 ISKP15 ST~ T":l 
LOU #PP.QOl I S y r,J (. 
BEGIN CLRA I STO SAV': 
STA FLAG I LDD Q2 
LOA S :C:I-1 l SU~:l SAVE 
BE') FC10 I BCC SKD16 
START LO~ .ttl I AO')O "65521 
STA FLAG ISKP16 SYNC 
FRO LOY ;HlCtJO I•'• ..• 
LOX "I~L T F R I STC 1·1Ct!D 
LOA 1:1 I CLP. ' u 
STA STATUS I CLI:I 1 'u 
S H!C I LOll 1 , X 
CLRA I L 03 1 ' v 
STA STATUS I r~ UL 
LDil INPUT I STD 2 , I J 
BRA OVER I L:.JI\ ' ( 
l'H. X T LOY 1'1 r~ C t1 D I L DR 1 ' y 
L [1 X ttl~LTRP I ~LJL 
SYNC I AODD J 'u 
LC"IO SAVE I ST[1 1 'u 
... I ... 0(( :. 1<. p 1 .s 
.-, 
Appendix-l.l D-S 
INC ,u ecs Sr< 0 2P 
SKP18 LOA 1 , X CHPD ~65521 
LOB 
' y ~L'J SKP2° 
MUL !5!<.??.3 AJOD ll' 1 s 
A80D 1 'u !SKP2'? STO T.?. 
STD ! 'u ! S P.JC 
3(( SKD2l . I s y ~!( 
INC 'u ' ~: 
SKP21 LOA ' X I STD SAV: 
LOR 
' y I LOA ~L~G 
~~ UL I Cf.1DA ttl 
AODD 
' u I r:;. E Q t-11JLT 
STQ 'u I C 1-1PA ~2 
... I ~C:Q ([~NV . ,. 
LOA 1 ' lJ I LOr S.lV' 
L DR itl5 I STO RES 
I·~ U L I Lci~A Bf.:G!r-1 
AQOD z,u 1 c :n;v L CJQ SAV': 
°CS SKP22 I srn OUTPUT 
(r-1DJ t;65.521 
' 
L3?.A Br:G!N 
e u: St<P2~ I·'· ... 
SKP22 AD')D ti.l5 IMULT INC F LA 1) 
SKP23 STD 2,U I LOX t:SAVE. 
•'• I LOY atRC<: .,. .. - ....... 
L DA 'u ILCJP15 CLq ' !J 
L DX nTF.MP I rL?. 1 'u 
CLr< ' )( I LOA 1 , X 
CLR 1 , X I LD3 1 ' y 
CLR 2, X I 1-IUL 
LOB til 5 I STD z,u 
1-1 UL I L::JA ' X 
ST'J 
' X I LD~ 1 ' y 
LOA 
' X I ~1UL 
LOB ~15 I ADD::l 1 ' u 
1·1UL I STD 1 ''J 
A DuD 1 ' _"( I :J.(( LIJ 0 16 
ADDU 2 'u I INC ,u 
BCS SKP24 ILOP16 L JA 1 ' /.. 
CMP:l :t65521 I LD~ ' y 
[:,LO SKP25 I I-1UL 
SKP24 A DOD :t15 I AOOD 1 'u 
SKP25 SY"lC I STO 1 f. ' J 
::: I sec LOD19 
s n~c I I r~ C 'u 
STD T2 ILCJP19 LOA ' .'1.. 
SYNC I LOS ' y 
STO 5AVE I 1·1UL 
LDD P2 I .'1000 ,u 
SUBD SAVE I STfl 'J 
RCC SKP26 , ... .,. 
AO'JD 1:165521 I LOA 1 'u 
SKP26 STD T4 I L DR lfl5 
SYNC I MUL 
~['f)Q C::4 I A::JOIJ 2 ' u 
Appendix-[) ,..I_""! 
LUP20 
LGP21 
·'· ... 
LUP22 
LLP23 
RCS LOP20 
CW'C :;65521 
P,LO LOP21 
ADDD :t15 
STD 2,'.! 
LOA ,U 
LOX 
CU' 
CLK 
CLR 
LCl8 
~1 Ul 
STO 
L DA 
LOt. 
t~ U L 
ADJD 
A DOQ 
:'.( s 
CMPJ 
5LO 
ADDD 
ST(I 
SYNC 
LDD 
ST(I 
u T E r~ P 
' X 
1 'j 
2, X 
:: 1 5 
' X 
'X 
:tl: 
1 , X 
z,u 
LCP22 
tt65521 
LOP23 
li 1 c. 
T2 
P2 
SAVE. 
I 
I 
I 
I 
I 
I 
I •'• .,. 
IMLTI=P. 
IMLTP~ 
I:': 
I 
I r1c r~o 
IP~ODl 
IP~CD2 
IPP.JC\3 
IPR084 
l T:: i·1 P 
I T E ~·i P 1 
IT::~D3 
Is ~·JE 
I FLAG 
IRFS 
I::: 
I 
ISTRT 
I 
I::: 
S Y ~JC 
SYNC 
LJD 04 
STD T2 
SPJC 
Lf,t:;A N::n 
;J RG 
FD3 
c:cs 
c:c~ 
r=cs 
~=c~ 
c:c~ 
1=(3 
F C:3 
F~~ 
!= C:3 
C::::JI? 
'JRG 
:: r~u 
F tJ 0 
13 1 76 
13364 
$0000 
0 
0 
0 
0 
0 
0 
c 
0 
0 
0 
0 
$1=c::Ft: 
t.F q 0 0 
O.EGirJ 
**************************************************************
*                   PROCESSOR NUMBER 4                       *
**************************************************************
OUTPUT 
STATUS 
TSi 
T3 
T16 
INPUT 
~'1 
R:;, 
Rl6 
SEM 
Bi:GIN 
STAI·H 
':.QU 
EQU 
f.QU 
EQU 
~ Qll 
EQU 
E ~JU 
~JU 
~~G 
NC'JP 
ORCC 
LJU 
CLP.A 
S T A-
LD~ 
B~') 
LDA 
STA 
LOY 
LC'!X 
68094 
~0400 
$0402 
$0403 
J0405 
$0407 
:S0410 
:;0412 
$0t-14 
$0416 
'!>0418 
$FI300 
""~01010000 
rtPRODl 
FLAG 
SEM 
FPO 
:i 1 
FLt\G 
:, nc ~!O 
~ 1·1 L T !=I:? 
~~EXT 
IOVEC> 
I 
' I 
I 
I 
!SKP12 
lSKPlJ 
I 
I 
.I 
I 
LlJll 
STA 
SH1C 
C L ~.A 
'3TA 
LDD 
~RA 
LDY 
LOX 
SYNC 
LDD 
S 'f"JC 
SYNC 
I'll 
STATUS 
STAT•JS 
I:~ PUT 
ov:R 
H1(~.:o 
:P-1LT~O 
SAV:= 
,1fJOD (.':1 
u.c. )I(J' 12 
CMP!) tt65521 
ML'l SKPL' 
ADO;) ill~ 
S TO T 3 
SYNC 
SUPD P3 
RCC St<.Pl.:. 
AJC'Cl #6c;5:'1 
SKP14 ST~ T16 I BCS s 1", p 2 2 
SYNC I (i'-10!) ~65521 
S OIC I i3LO SKDZ3 
... 15!(?22 A~OD !115 ... 
STD r1Cf,10 !SKP23 SnJC 
CLR ,u I"• •,• 
CLR 1 'u I S Y ~!C 
L~lA 1, X I A:JQO Pl6 
LDR 1 ' y I "'>CS SK =>24 
1·1 IJ L I U~::>C ::65521 
ST'I 2,U I ~LU SK. 0 25 
L DA 
' X ISK.P24 AJOD ':15 LDC\ 1 ' y ISKP25 SPJC 
1·1UL ! STD ·n 
.A91!J 1 'u I SYNC 
STD 1 'u I ST~ s;:,.vt. 
sec SKC'l6 ! L~r-.)_ R3 
I II! C ·,U I 5U'30 5AV: 
SKP16 LDA 1 , X I '3CC SK 0 26 
LCB 
' y I AOOD :!65521 
f·IUL ISK?26 STC: Tg 
ADOD 1 'u I SYNC 
STD l,IJ I SYNC 
BCC SKD19 1 ... 
r r~ c 
'u I ST!J SIIV: 
SKP19 L~A 'y I LOA FLAG 
LOB 
' y I Cr~PA #1 
~~ UL I 3EQ ~1UL T 
ADOD ,~ I 01?~ ::2 
STJ 
' u 
PC() 
1.1 ,_ " (IJNV 
::: LDD s:.vo: 
LOA 1,U ! ST::I R:s 
L:J6 t:15 I LBRA ::I~GPJ 
iWL. ICJNV LJ~ S&lVE 
ADDJ z,u I STD DUT=>UT 
RCS SK P 2 0 I LB?A· tlt=G!N 
CMPQ :i65521 I ... ... 
E1 L 0 SKP21 I ~1U L T H~C FLAG 
SKP20 A DOl: t115 I L~X ztSAV: 
SKP21 s T'") 2,U I LOY ::t.;t:S 
... IL:JP15 CLR 'u . ,. 
LDA 
' u I CL? 1 'u 
LOX z;:r= i~ P I LCA 1 ' X: 
C L '~ 'X LD~ 1 'v 
CLP 1 , X I tvlUL 
CLR 2 , X I srn 2 'L! 
LCJI3 :q:; I L:JA ,I 'f 
MLIL I L Of'. 1 ' y 
ST~ 
' X I I~U l 
LOA ' X I AUOD 1 1 I! 
LD:1 I{ 1 c, I .S T fJ l ' u 
nuL I r:cc LOPl~ 
AOOC 1 , X I INC . 'u 
AODD 2,1J IUJP16 L:JII 1 , X 
Appendix-0 
Lf}!=\ 
' y I .1000 2 ' lj 
r·1UL I 3CS LOD22 
ADDD 1 'u I UlP~J tt65521 
STO 1 'u I ?, LC LCJD23 
~. cc LCP19 ILOP22 .t: COD t: 1 ~· 
INC 'u ILOP23 s y r:c 
LUP19 LOA 
' X I SYNC 
LOP· 
' y I ST~ TJ 
MUL I LD~ Plt, 
ADClJ , u I STD 'SAVO::: 
STO ,u I SYNC 
::: I s nJC 
LDA , t I J I LSRA u:xr .L 
L0~ ~15 I::: 
HUL I r-IL T F R c: D ;:>, 43~47 
AJDD 
·2 'u ~~~LTRK c;:~ 549? 
RCS LO?ZrJ , ... ·-· 
OlPD ~65521 , ... ·-· 
t.LO LOP21 I 'J~G ~; 0 () 0 C'· 
LUP20 AiJI)O ~1S I ~~C W! c: 0 ;:, 0 
LuP21 STU 2 'u jPDJ01 F C ~- 0 
•'• IP 0 0C2 ..• FCB 0 
LOA 'u IP~003 ;:co 0 
L ::JX t:TEMP IPR004 FC5 " J 
CLR 'X I p:\IP FCB (') 
CLP 1, X IT:: r~ P 1 C:(::l, G 
CLP ? 'X I Tft.IDJ ;:::c 1 0 
LOB ~1~ !.S.lVE F[)g (' 
~~ UL !FLAG ;:::(5 0 
S T ~) t X !R:S FOR 0 
U'A 
' X I:;: 
LC5 ~15 I QRG 5FFF: 
ML1L ISTRT ::::::u ~FqQI) 
ADDJ 1, X ., c: ~-JO ;lEG I'l 
•.. 
**************************************************************
*                   PROCESSOR NUMBER 5                       *
**************************************************************
t~ A~ 68095 STA FLAG 
OUTPUT E:JU $0400 LOA s::--1 
STATUS =uu ~0402 Elt:Q FR[l 
T10 :: :) u $0403 ISTA 0 ":" LO~ :tl 
Ti. F~U ~0405 I STII FLa.G 
Tl6 EQU !0407 IFP.O L DY ~tt~U·l J 
INPUT EQU $0•dQ I LOX z: ·~ L T F R 
RlO ":QU $0412 I LOA 11'1 
R) EQU $0414 s T .n S"rAT:JS .. '-
Rl6 EQU $QL..l6 s v~·c 
Sf:t-1 :: ~J u !.0418 CLRA 
..• sra. STHL.!S 
..• 
1RG $F800 LD'J INPUT 
NC 0 C,RO lJVfR 
ORCC 1t~{010101JOO !NEXT LOY ;oT>!UlD 
L DU ~?ROD1 I LOX i2 ~~ L T R ~ 
B::GIN CLC<:Il I S Yr-lC 
Appendix-D 0-1(1 
L D Cl SAVE IS'<P20 AD9D t' 1 5 
~:: IS'<?21 <:.TL' z,u 
OVER SYNC , ..• ·.-
s n~c I LOA 'u 
ADDO ~10 I LCJX t:T ErA P 
P, ( S SKD12 I C'L'? ' X 
c ,~1?0 ~65521 I CL.R 1 , X 
8LO SKP13 I CLR 2 ' y 
SKP12 . .l. ODD ~15 I LD~ :t 1 5· 
SI'\Pl3 STD T2 I ~UL 
S~NC I STG ' X 
STO SAVE I LOA ' X 
LJD R2 I L o:. ~] 5 
SURD SAVr; I ~>1U L 
BCC SKP14 I .'lJDD 1 ' v. 
AO'JD :t65~21 I AOl'D z,u 
SKP14 STO T16 I ?.CS SKD22 
S Y ~JC I ( r-1DJ !t65521 
SYNC I :::. LO 51< 0 23 
-·· 
ISKP22 aoo:J II 1 ~ .,. 
STO ~~ c iJ l) ISKP23 S~NC 
CLR 'u I•'• .,. 
CL~ 1,U I SYNC 
LOA 1 , X I SUBD ?.16 
LD5 1 ' y I sec SK 0 24 
MUL I t.ODG 1165521 
STO 2 'u I SU'24 snJc 
LD!l. 
' '1. I ST~ 
TZ 
LOR 1 ' y I SYNC 
:--1UL I STO SAV'f: 
A DOD 1 'u I LOO oz 
STD 1 'u I SUBD SAV~ 
BCC SKP16 I RCC SK"2.S 
INC ,u I AJDD ~6~521 
SKP16 LOA 1 , X ISKP26 STD :10 
LD5 
' y I s y ~~ c 
r·IUL I SYNC 
A JDQ 1 'u , ... .,. 
ST!J 1 'u I STD SAVE 
BCC SKP19 I LJA fLAG 
INC 
' lj I UIPA ttl 
SKP19 LOA 'X I 
Q~l"l 
··' t.l:.' HIJL":' 
LOB 
' y I CrH'!\. 1:2 
1·1UL I ~cQ c ~ ~~ v 
ADOD ,u I L Df" snv::: 
STD ,u I STD ~cs 
~:~ I LB?A 3: G I 'J 
L~A 1 'u I CQtJV LOG St.Vr: 
L 0~ :n: I STD O~JTDUT 
~·1 U L I L~PA 3':~HJ 
.A llDD z,u I·'· ... 
3CS SK 0 20 I t·1!J L T !NC FLC.:~ 
OH'D 11'65521 I LDX ;:5.!\V~ 
o. L 0 SKP21 I LCJY :::JES 
Appendix-[) 0-11 
LOP15 
LUP16 
LCP19 
•'• .,. 
LUP20 
LGP21 
CLR 
CLR. 
L:JA 
LOB 
f"lUL 
STQ 
LOA 
L:Jt?. 
~·1U L 
ADIJD 
STO 
BCC 
INC 
LDA 
L DB 
~UL 
ADJD 
STD 
8CC 
INC 
LDA. 
LOS 
~~ UL 
.AD fl D 
STD 
LOA 
LDP. 
~~u L 
ADOD 
BCS 
C·1PD 
gLO 
AOOD 
STO 
LOA 
lOX 
CLR 
CLD 
CLP 
'u 
1 'u 
1 , X 
1 'y 
2,U 
' X 
1 ' y 
1 'u 
1 'u 
LOP16 
'u 1, X . 
' y 
1 'u 
1 'u 
LOP19 
'u 
' X 
' y 
'u 
,u 
1,1.1 
tt15 
2,U 
LOP20 
~65521 
LOP21 
::il5 
2 'u 
,u 
:tTEMP 
' X 1 , X 
: 2, X 
I LOR :.' 1 5 
I t"1UL 
I STD ' X 
I LOa ' X 
I LD:2 I! 1 ~ 
I r1 uL 
I AOrlD 1 , X 
I ~ 8~~' 2 'u 
l scs LJP 2 
I (MPD ::i6S 21 
I ~L'J LOD23 
IVJP22 ADDD 1:115 
!LOP23 STO TH: 
I s y ~~ c 
I s y "1( 
I LDJ R2 
I STD SAVE. 
I s y '!( 
I s n~c 
I LBPA '·JE X T 
I********************** 
I 1·1L TF R l=OB 19136 
IMLTRR FDg 46773 
I•'• ...
I OP.G ¢.0000 
! i'1 c ~-JO FD~ n 
1Pqco1 c:cg 0 
I?ROOZ 1=(8 0 
IPCW['3 FCR 0 
IDRC84 C:(~ 0 
IT ET•1P FCC, 0 
I T C: r~ P 1 c:ca 0 
ITC:MP3 F(g 0 
IS~V: ~=o~ 0 
I FU.G f=C3 0 
I R:: S ;:!Jo. . l.- .... 0 
I ... i ...... 
1RG ~F::F;: 
ISTRT :Qu $FjQO 
I ':NC P,Er;~~! 
**************************************************************
OUTPUT 
STATUS 
T11 
T• 
·J. 
T7 
INPUT 
Rll 
Rl 
R7 
*                   PROCESSOR NUMBER 6                       *
**************************************************************
rJ.~ M 68096 Is ::;-1 E=QU $0418 
E QU $0400 , ... .,. 
::Ju $0402 ! ORG $Fi:i00 
EQU $()403 I ~.J Ll p 
f.QU $0405 I o~cc ~~~Ol·JlOOOO 
EQU $0407 I LDU ::P~ODl 
E :; U $0410 I8EG!N .CL;>A 
:: :;)u $0412 I STA FLAG 
~ r:JI.J $0414 I LOA ) :: ~1 
>>)U 't0416 I Q ~' ,., I;_ .4, FQ[J 
Appencii.x-0 [J-12 
STAP.T LOA ltl PIC 'u 
STA FLAG IS!< 0 19 LOA ' X 
Fl-(0 Lf1Y IH,lCN 0 I LIJ:: ' '( 
LOX ,r,t'1L TF !:' I r·,UL 
LJA It 1 
' 
llODCJ 'u 
STA STATU~ I STQ !I 9V 
SHJC I::: 
CLRA I LOA 1 'u 
STA STATUS I LD~ :t' 1 5 
LOn I'J PUT I !··1Ul 
B;;:A OVER I AOOO z,u 
NE:XT LCY :t>10JD I scs SIP20 
LOX ~t-IL TR!:.! I O~PD !i65521 
SYNC I SLO S II. D 21 
L DO SAVF. ISKP2Q AOOD itl5 
-·· 
ISKP21 STD 2 'u .,. 
OVER STD Tll , ... ..• 
SYNC I LD4 'u 
.L\ DOD R 11 I LJX :t T E ,'·1 ::> 
PC~ SKP12 I CLP ' X 
C"''PD li'65521 I CLR 1 ' Y. 
t1LO S!<.P1.3 I CLP 2 ' y . 
SKPl2 AOOD t' 1 :; I LD~ It 1 5 
SKP13 STD T 1 I IJ.UL 
S YN.C I STD ' X 
SYNC I LDA ' X 
SYNC I L D~. !:'15 
SYNC I "'UL 
AOOD R7 I AODO 1 , X 
8CS SKD14 I .\DOD 2 'u 
OIPO :t65521 I ~.c s SKP22 
i3L~ SP'15 I CMDD #65521 
SKP14 AOO:J ,.'t 15 I ~LO ~KP23 
:::' ISK..P22 AJDD :i 1 5 
SKP15 STO r~ CN 0 SKP23 SP:C 
CLR ,u •'• 
CL~ l,U STC T7 
LOA 1, X SYNC 
LD?. 1 'y SYNC 
~~uL S YtlC 
STO z,u S Y ~!C 
LOA 
' X ADOD V} 
LOS l ' y ~. c s SK?24 
"1UL C I'-1P 0 :t65521 
AJD:J 1 ' IJ kLO SKP2C: 
STIJ 1 'u SKP24 ADOD ~1'i 
3CC SKP16 SKP25 SEl T1! 
INC 
' IJ 
s y ~J c 
SKP16 LOil 1 , X !\DOD "11 ... 
LO~ , y c,cs SKP26 
>IUL UP'D ~65521 
AOOD 1 ' lJ RL~ SKD27 
STD 1 'u ISKP2b .ADQD 1115 
~cc SKP19 I··· ... 
Appenclix-0 ~1-1: 
SKP27 STCI SAVE !L2P20 AOOO It 1 5 
LOA FLAG ILOP21 STf'l 2 'u 
U1PA .If 1 I•'• ...
BEQ MIJLT I L~A 'u 
c ~~ p ,.\ ::2 I LOX ~tT'=i~P 
g:Q CONV I CLP. ' X 
LOD S.L\ V E l CLR 1 ' _x 
STO p;:.) I CLR ~ ' X 
L B P .~ i.lEGIN I L~S 1t15 
CUNV LDO SAVE I 1~UL 
STJ CUTOUT I STJ ' X 
LARA 6 EG I f.J I L D~ ' X 
... I LCJ3 ~15 ... 
:'·1UL T HJC FLAG I ~wL 
LOX #SAVE I l\JC'D 1 , X 
L [) y li!RES I A~DD ., ,, ... ' ..... 
LUP15 CLR ,u I ECS LJP22 
CLR 1 'u I C '•I PO ~t65521 
L[iu 1 , X I FLJ LOP23 
LJB 1 ' v IL~P2?. ADf)D t: 1 5 
'1LJ L IL'JP23 ST~ T11 
ST') 2,U I SYNC 
LOA 
' X I LOD ;.' 11 
LJ~ 1 ' y I STD SAV: 
~IUL I SYNC 
.AD ll8 1 'u I SH!C 
STD 1 'u I SYNC 
sec LOP16 ! L'3RA ~~~X T 
INC 'u I*********************** 
LUP16 LOA 1 , X lr-1L TF ~ I= Do 32759 
Log 'y I r1 L T ~ R F DP 6::52 
~1UL I•'• •.• 
A. ODD 1 'u I OKG t.onoo 
ST'J 1 ' lj I i·1C i~D C:~3 IJ 
BCC LDP19 IP;;:Jol FCR 0 
INC 'u IDDQD2 FCB 0 
LGP19 LJA ' X . IP~CD3 FCB 0 
LOB 
' y I>'R004 FC3 0 
MUL I TeMP· FCS 0 
AODD 'u T:: ~1 D 1 
c:rp 
.J ' 0 
STO ,u TEHP3 FC5 0 
•'• SAVE FOE' 0 .,.
L DA l,U FLAG FC3 I) 
LD~ ::<15 RF.S ~= D c. ('\ 
f~UL •'• .,.
A!:JOD 
."'' u ~~r, $FFFF 
gc ~. L0°20 STRT E:QU H 1300 
C:MPO t;'o5521 C IJ ~) 0 EGit! 
I)LO LOP21 ~: 
:::: ~:~ 
... I ... 
. ,. 
. .. 
•'· I·'· ... ., . 
~::: , ... -'•" 
•'• 
... 
I ... •.. 
Appendix-D D-14
* **************************************************************
* *                    PROCESSOR NUMBER 7                      *
* **************************************************************
'•JA f·1 68097 CIOOD PlO 
OUTPUT EQU SO<tOO p, c s S K P 14 
STATUS EQU t0402 ( ~1 D 0 it6~52! 
T12 EQU $0403 3L::J SK 0 15 
TZ f ou $ 0 4'0 5 SKP14 ~ 0.00 1t15 
TlO f!)U $0407 SKP15 STD T8 
Tb E QU $04(\9 SHIC 
T6 EQU ~o.:..os t.OOD R8 
IhPUT EQU $QL.10 t:lCS $I<', PH 
Rl2 fCU 'f0412 Ci·',::>8 :t65521 
RL EQU "S0414 ~LO St<.P17 
RlO f Ql_l <£0416 S 1(. P16 li.ODD !tl~ 
Ro ::wu ~041:.3 SKP17 S T =-· T6 
Ro :au S041A I SYNC 
Sd~ E :jU ~ 041C I•'• .,. 
··- I .,. STO I·' CUD 
ORG $F800 I CLP ' 1.1 
~~ 0 p I CLR 1 ' !j 
ORCC ~~;0101000(1 I LOA 1 ' X 
LOU ;,' 0 R001 I LQ~ 1 'y 
BEGIN CLkA I ·~u L 
STA FLAG I STD 2 'u 
LOA S!::M l LOA ' X 
BEQ FP.D I Log 1 ' y 
START L 0~ i!t1 I MUL 
STA FLAG I li.OOD 1 'u 
FtW LOY :H1C~JO I STD 1 ' iJ 
LOX ttT'1 L T F D I 5CC SKP1!:l 
LOA til I INC 'u 
ST~ STATUS ISK?l3 LOA 1 ' X 
S Y ~JC I L J~ ' y 
CLRll. I r,~UL 
STA STATUS I ADC!D 1 'u 
LOO HJPUT I )TD 1 ''-' 
BRA OV EP 
' 
RCC Si<. 0 21 
NEXT LOY i1 1·1Cf·ID I INC ' IJ 
LOX. iH1L TR R I S~P21 LOA ' '1. 
s Y r~c I LOR ' y 
LDD SA. V E I :1• UL 
•'• I ADDD '1_1 .,. 
OVER STD T12 I STO ' '-' 
SYNC I•'• ... 
ADDD ~~ 1 2 I LD.l 1 ,u 
RCS SKP12 I l 0?. Ill 5 
CMPD Jt65521 I 1·1UL 
5LO SI",P13 I AD80 2,U 
Si<-Pl2 AOQO 1t15 I c:cs SK?22 
SKP13 s Tf' T2 I Ct-1 P D :165521 
s '( ~~ c I E'.LC <;KP23 
STD T10 Is t<. P 2 2 AODO 1115 
:-;n1c .ISI".P(:'. ~, rn ?,IJ 
Appendix-D D-15
•'• IS'<P35 ST8 SAv:: .,. 
LOA ,u I l 0A fLAG 
LDX t.T::;MP I cr., P A ~! 
CLR 'X I 0 • E0 ~ill L i 
CLR 1 , X I Cr-1 P A li 2 
CLP 2, X I ;l, EQ COi·JV 
L'"'" U..; :: 1 5 I LOD SAV: 
I'IUL I STD R:=s 
STf\ 'X I LB;<A ·. 8 ~ G! ~J 
LOA 
' X 1 c 'J r·~ v LDD s~v:= 
LOP.. :tt 1 5 ! STQ ']:JT PUT 
1·1UL I L3~A 57:GPJ 
A [) Cl [l 1 , X I•'• .,. 
.AODD 2 'u IMI.ILT INC FLAG 
:3CS SKP24 I LOX rtSA'/13: 
Ci1PD 1!65521 I LOY ll~ES 
BLCJ SKP25 ILOP15 CLR 'u 
Si<.P24 AODD itl5 I CL'< 1 ' IJ 
SKP25 S OJC I L ~Cl 1 ' y 
... I L'JP. 1 ' y . ,. 
SYNC I ~~u L 
AOCD R6 I ~. T 'J 2 'u 
~cs SKP26 I L 0 .~ ' X 
01PD !:65521 I L D?. 1 ' y 
BLO SKP27 I ~·11_1 L 
SI<.P26 AOOD ::15 I ADOO 1 'u 
SKP27 STD T8 I STD 1 'u 
S Y~JC I sec LiJ 0 16 
AODD R8 I INC ' u gcs SK?28 ILCJP16 L DA l , X 
C/-1PO :t65521 I L D~ ' y 
8LO SKP29 I ~1lJ L 
SKP28 A :IDD t<15 I AODD 1 'u . 
SKP29 STO T10 I ST~ 1 'u 
SnJC I ?,(( LfJPlq 
ADJD R 1 ·) I ItJC 'u 
~cs SKP30 ILIJP19 l DA ' X 
CHPD ::65521 I L o~. , y 
5LO SKP31 I PUL 
SKP30 .!1 DOD ~15 AODD ,u 
SKP31 SYNC STO ' u 
AO!JD P2 :~: 
BCS SKP32 LOA 1 'u 
CMPr'; ::65521 L DP.· It 1 5 
bLO SKP33 1·1UL 
SKP32 A'JnD 1:15 ADDD 2 'u 
SKP33 STIJ T12 .RCS LOPZO 
SYNC C.'-1 PO ~65521 
II. DOD R12 P,L 0 L0°21 
BCS SKP34 LCJP20 AODD if 1 5. 
(~1 p 0 tt65521 LJP21 STD 2 'u 
bLO SKP35 ... . ,. 
SI<.P34 .AJDU ~15 l8A 'u 
..• L:JX :t T :'I~ P . ,. 
Appendix-D D-16
CLD 
' X I STD SAV: 
CLt:? 1 , X I SYnC 
CL'< 2, X I LB?A NEXT 
L JS lt15 I•'• ...
r·1 UL IMLTFR FDS 8192 
STD 'X !MLTRR c: D P. 57?.31 
LOA 
' X 
'I::: 
LOP. ~15 I CJRG ~0000 
1·1UL I 1·1CNO c:og 0 
ll.ClJO 1 , X !P?ODl c:ce 0 
ADJU 2,U 1Pt;'C02 FC~ 0 
BCS LCP22 IPQ013 FC ['. C' 
C:J, p [) ~65521 I?~CC4 c:r~ . \.. '-' 0 
8L'J LOP23 I TU1P c:ce 0 
LLP22 ~DOD :115 IT:: 1·1 P 1 c:co 0 
LGP23 ST[l T12 !T:MP3 FCB 0 
SHJC IS~VE FO~ 0 
LDO R12 I FLAG FCB 0 
STD TO '-' IRt=S C:Qg 0 
s Yf\! c I•'• ... 
LD'J RS I 'lKI3 'SFFF: 
STO T12 ISTRT · EQU ~F80Q 
SYNC I E;I.J~ BEGIN 
LO~ P.12 , ..• .,. 
* **************************************************************
* *                    PROCESSOR NUMBER 8                      *
* **************************************************************
       NAM   68098       |        STA   STATUS
OUTPUT EQU   $0400       |        SYNC
STATUS EQU   $0402       |        CLRA
T13    EQU   $0403       |        STA   STATUS
T3     EQU   $0405       |        LDD   INPUT
T9     EQU   $0407       |        BRA   OVER
T7     EQU   $0409       | NEXT   LDY   #MCND
INPUT  EQU   $0410       |        LDX   #MLTRR
R13    EQU   $0412       |        SYNC
R3     EQU   $0414       |        LDD   SAVE
R9     EQU   $0416       | *
R7     EQU   $0418       | OVER   STD   T13
SEM    EQU   $041A       |        SYNC
*                        |        ADDD  R13
       ORG   $F800       |        BCS   SKP12
       NOP               |        CMPD  #65521
       ORCC  #%01010000  |        BLO   SKP13
       LDU   #PROD1      | SKP12  ADDD  #15
BEGIN  CLRA              | SKP13  STD   T3
       STA   FLAG        |        SYNC
       LDA   SEM         |        STD   T9
       BEQ   FRD         |        SYNC
START  LDA   #1          |        ADDD  R9
       STA   FLAG        |        BCS   SKP14
FRD    LDY   #MCND       |        CMPD  #65521
       LDX   #MLTFR      |        BLO   SKP15
       LDA   #1          | SKP14  ADDD  #15
Appendix-D D-17
SKPlS ST') T7 LelA ' X 
SYNC L Cl8 ~15 
STD SAVE 'AUL 
LDD R7 ADJD 1 t X 
SUBD SAVE AODD 2t~ 
~CC SKP16 c,cs Sl\ 0 24 
A DOD tr65521 CMPO ::65521 
SKP16 S '!'f\IC :;>. LO SKD2'i 
... IS !<.P 2 4 AJOD ::1 5 ... 
STD HC t-10 IS'<P25 SYNC 
CLR , u I•'• .,. 
CLP. 1, u 1. SYNC 
LOA 
.!. ' X l S T C:' T7 
L Q[l, 1 ' y I S'I'NC 
~UL I )T~ SAVE 
S T rl z,u I LDD ?.7 
LOA 'X I s u Q, 0 s~v:: 
LOP. 1 , y I PCC SKP26 
MUL I AJClD :t65521 
.ODDD 1 'u ISKD26 STD T'j 
STQ 1 'u I S'I'NC 
sec SKP18 I ~ODD P9 
INC 
' IJ I PCS SK>'2S 
SKP18 LOA 1 , X I ("1P!) 'l65521 
L 06 
' y I !3LO SK. 0 2C1 
MUL Is 1<. P 2 8 ADIJD :t1c; 
ADDCl 1 , u ISKP29 S v t~C 
STD 1 , u A. DOD p 3 . 
13CC SKP21 9CS SK??.Q 
INC 'u C ~·1 PO 1165521 
SI\P21 LOA ' '( O,L'J S K P 31 
LOB , y ISKP30 AJOO !i 1 c; 
/-',IJL ISK.P31 SEI TD 
AOOD 'u 
' 
S H!C 
STO 'u I A!J8D R13 
... I ~cs S K 0 3 2 . ,. 
LOA 1 , u I (~P[' 116::.521 
LOR :J 1 5 I BLQ SKP3? 
~1UL Is 1(.? 3 2 .~ODD li15 
a.D:JD z,u I•'• ... 
ecs .SKP22 ISI(P33 SF' SC.Vc 
(!"lP[l l:i65521 I l DA I= LAG 
eLO S!<P23 I C t-1PA :t1 
SK P 22 Af)')O ~15 I 3!:~ ~1UL T 
SKP23 STD 2 'u I C1'1P A ::z 
•'• I SEQ CONV .,. 
LOA ,u LDD SA'v'~ 
LOX :t T EIW I STD !;1:S 
CLR 
' X I LSRA 8F.GIN 
C LR 1 , X ICDtJV LD'J SAVF. 
CLP. 2, X I STO Oi.JTDI.JT 
.L DR t: 1 5 I L ~PA P c: ,- T 1-1 ,J ~ u .• 
~~ U L , ... . ,. 
srn 
' X !H~LT 'PJC F LAI· 
Appendix-D D-18
L JX ,:SAVE MUL 
LOY trPES s Tr) ' X 
LOP15 CL~ , u LOA x· , .. 
CLR 1, u LOS 1115 
LOA 1 'X ~·WL 
L~':. 1 , y A:J::JD 1 , X 
MUL tll)[)Q 2,U 
STD ;::,u r: c s LD 0 22 
L DA 
' X Cf.1PO :t6552l 
LOa 1 , '( PLO LCJP23 
"1UL ILOP22 A'JDD ::15. 
AD1D 1 'u I L0P23 SFl Tl~ 
STO 1 , 'J I S Y !JC 
sec LOP16 ! L:J:J ;;:'9 
INC ,u I STD T7 
LuP16 LDC. 1 , X I LDCl Ql~ 
LDD , y I STD T9 
~wL I SHiC 
AOOD 1 'u I L O'l R7 
STD 1 'u I STD SAVE gee LOP19 I SYNC 
INC 'u I S Y~
1 C 
LCP19 LDA 
' X I LBRA N:XT 
LDR , y , ... .,. 
MUL lt~LTFP 1=')'3 45457 
ADDD , 'J I i~ L. T R R ~=os 37975 
STO 'u 
, ... 
... 
... I . ,. O~G soooo 
L')A 1, u 1 r-1:No FOB 0 
LD5 11'15 IPR'j01 c:cg 0 
MUL IP~C')2 !=(~ 0 
AuOD z,u JPR003 1=(8 1'1 C' 
RCS LQP20 IP:;>JO-+ c:cs 0 
OlPO #65521 l T:: r-1 P I=(S 0 
3LO LOP21 IT :=r~ P 1 r::cB 0 
LOP20 ti.DD::l ~15 IT::MP3 FC B 0 
LuP21 STD z,u ISAV~ FO~ (1 
•'• !FLAG FCP 0 ... 
LDA 'u I RES c:os 0 
LOX tiT:l'IP !•'• ..
Clq 
' 'I( I ORG s-c:r::::: r- . -
CLI:' 1 , X ISTRT 1= r' 11 -'>,c. ~FSOO 
CLR 2,X I END 8:GIN 
L D.O. !il5 I::: 
* **************************************************************
* *                    PROCESSOR NUMBER 9                      *
* **************************************************************
~~A·~ 68(199 I HI PUT f '.JU $0410 
our'ouT [ IJU $0400 IP14 fQU ~0412 
SlATUS E ')ll $0402 184 E 'JU $0414 
Tl4 F.QU i0403 I R 8 F.QU ~0411: 
T4 ::wu ~04 1)5 IR17 f.QU ¢,(14lq 
Hl EQU $0407 IS':M ::au <;Q41A 
Tl7 ~QU $0409 , ... ..• 
Appendix-D D-19
INPUT EQU $0410 I ~1 UL 
Rl4 E QU 50412 I )TD z,u 
R4 EQU $0414 I LJA 'X 
R8 ::Qu ~0416 I L D~. 1 ' v 
Rl7 cC/U $0418 I ~,UL 
s c t~ f r,JU $Q41A I ADDO .1 'u 
-·· I STD 1 'u .,. 
ORG $F800 I qcc SKP16 
NDP I INC ,u 
ORCC ,:~~01010000 ISKP16 LOA 1 , X 
LDU ~PR'JDl I L D~ ' y 
BEGIN CLPA I ~~ UL 
STA FLAG I ADDO 1 'u 
LOA s-=: i·l I STC 1 'u 
BEQ FRJ 3CC SKP19 
START LDA ;il II\4C 'u 
STA FLAG SK?l9 LOll ' X 
F f; 0 LDY iiMCNO LOR ' y 
LOX #MLTFt< ~~ UL 
LOA Ill .lDDD 'u 
STA STATUS STC! ' 'J 
SfNC -·· '•' 
CLRA LOA 1 ' u 
STA STATUS LJB :215 
L:lD HJ?UT ~UL 
BPA OVER AOOD z,u 
NtXT l~Y :: r-1 c r-~ o RCS Sl\ 0 20 
LOX t: ~1L TR R CMPO ~655.?1 
s n:c ~LO SKP2l 
L DO $1\VE SK?20 AODD ttl5 
-·· 
$!<?21 STO z,u .. ., ... 
OVER STO Tl4 ·'· .,. 
s VN( LDA 'u 
AODLJ ~14 I LDX u.T ::' !-1P 
8CS SKP12 I CLt:> ' X 
CI1PD t:65521 l r: L R 1 , X 
t\LO SKP13 l CLR 2, X 
SKP12 ADOO 1:15 I L oe. 1115 
SKP13 STD T4 I ~~ UL 
SYNC I STO ' X 
STC' T8 I LO~ ' ;: 
S Y ~JC I LIJB ttl5 
sugD RS I r-~ u L 
r::.cc SKD14 I ADOD 1 ' .'( 
!\ J DD P65521 · I AOOO z,u 
SKP14 STO Tl7 I 11CS SKP22 
SYNC I O!PD tt65S.?l 
s '( tJ c I ~LC S K P 2 3 
-·· 
ISKP22 
'•' 
AD::l!J l 1 ~. 
~. r n rAe r·J n !SK.f'.'?. s Y rJ c 
( U' 
'u 
I,., 
C L ~· ] , ! I I ~ v t,IC 
L :) A 1 , X I c.ooo C'l7 
l J p, 1 ' y ~ c 5 SK?24 
Appendix-D D-20
CMPD !165521 ILCJP16 LOA 1 , X 
3LC1 SKP25 I LD'\ ' v 
SKP24 ADDU 11'15 I ~1UL 
SI\P25 syqc I AOQD 1 'u 
STO TB I S El 1 ' 1J 
SYNC I ~.cc L:JPl'j 
STD SAY>:. I INC ' lj 
LDD ~8 ILOP19 LelA '.v 
SU80 SAVE I LD3 ' '( 
BCC SKP26 I 1·1UL 
A ODJ tt65521 I ACDD 'u 
SKP26 SH<C I STD ' LJ 
A ODD R4 I•'• ..•
5CS SKPZ'l I LOA 1 '!.! 
C I~ PO #65521 I LOB !!15 
bLC SKP2g I I~UL 
SKP28 A DDO 1: 1 5 I !1000 2 'u 
St<.P29 STD T14 I 0. c s LOD2Q 
S HIC I Cf-1P0 1165.521 
.A DOD R1't I PLO LOP21 
BCS SKD30 IL·JP20 AOOO ~15 
C~'-1PJ .tt65521 IL'JP21 srn 2 'u 
8LO SKP31 I ... .,. 
SKP30 1\ JDO ~15 I . L 0~ 'u 
... I LOX t'T~HP . ,. 
SKP31 STO SAVf- I CLP ' X 
L 0.!\ FLAG I CLP ! ' X 
C1·1PA 1t1 I (LQ 2 ' '( 
BEQ t~ IJ L T I LCJ6 ~ 1 'i 
CMPA 112 I '-'UL 
'3EQ CONY I s T[l ' X 
LDD SAVE I L:J:- ·y ' ' 
STD RF.S I L D~ .<:15 
L8RA er:GIN . I ~UL 
CUNY LDD SAVE I ADDD 1 , X 
STD OUTPUT I ~OJD 2 'u 
LPRA 3EGIN I ?.C S LOP22 
!::: I c ~~ p [J #6S521 
~~uL T Ti'JC FLAG I BL:J LJP23 
LOX #SAVE ILOP22 A ClOD li 1 c:; 
L DY ,rtRES ILJP23 STO T8 
LUP15 CLC/ ' IJ . ! SP!C 
CLR 1 'u I s '(!\! c 
LOA 1 , X I LDD R17 
LOg 1 'y I STJ T14 
t~UL I LDD P8 
STD z,u I SE'· T1 7 
L Cl a. 
' X I SYNC 
L:J3 1 'v I LDD Pl4 
t<~UL I STD SAVE 
ADOO 1 'u I S Y :JC 
STQ 1 ' IJ I LB?I.\ t--; EXT 
3(( LOP16 I:;: 
I~C 
' t..J 
IMLTFP F8~ 2531! 
Appendix-D D-21
MLTRR FD8 2 4 5.21 ITEMP1 f=C5 0 
~:: ITC:MP3 ~c? 0 
o~r, $0000 !SAVE FO;:l, 0 
MCND FD5 0 !FLAG FC'3 ') 
PRC:81 FC8 0 IR~S FOr. 0 
Pf\002 FCB (\ I~: 
PR003 FC f.. 0 I :JRG ;;I=FFE 
Pk004 FCB 0 ISTKT :: (~ !j ~~=eoo 
TEMP FC~ 0 I :No 0 E G I~~ 
* **************************************************************
* *                    PROCESSOR NUMBER 10                     *
* **************************************************************
       NAM   680910      |        CMPD  #65521
OUTPUT EQU   $0400       |        BLO   SKP13
STATUS EQU   $0402       | SKP12  ADDD  #15
T15    EQU   $0403       | SKP13  STD   T5
T5     EQU   $0405       |        SYNC
T7     EQU   $0407       |        STD   T7
T17    EQU   $0409       |        SYNC
INPUT  EQU   $0410       |        STD   SAVE
R15    EQU   $0412       |        LDD   R7
R5     EQU   $0414       |        SUBD  SAVE
R7     EQU   $0416       |        BCC   SKP14
R17    EQU   $0418       |        ADDD  #65521
SEM    EQU   $041A       | SKP14  STD   T17
*                        |        SYNC
       ORG   $F800       |        SYNC
       NOP               | *
       ORCC  #%01010000  |        STD   MCND
       LDU   #PROD1      |        CLR   ,U
BEGIN  CLRA              |        CLR   1,U
       STA   FLAG        |        LDA   1,X
       LDA   SEM         |        LDB   1,Y
       BEQ   FRD         |        MUL
START  LDA   #1          |        STD   2,U
       STA   FLAG        |        LDA   ,X
FRD    LDY   #MCND       |        LDB   1,Y
       LDX   #MLTFR      |        MUL
       LDA   #1          |        ADDD  1,U
       STA   STATUS      |        STD   1,U
       SYNC              |        BCC   SKP16
       CLRA              |        INC   ,U
       STA   STATUS      | SKP16  LDA   1,X
       LDD   INPUT       |        LDB   ,Y
       BRA   OVER        |        MUL
NEXT   LDY   #MCND       |        ADDD  1,U
       LDX   #MLTRR      |        STD   1,U
       SYNC              |        BCC   SKP19
       LDD   SAVE        |        INC   ,U
*                        | SKP19  LDA   ,X
OVER   STD   T15         |        LDB   ,Y
       SYNC              |        MUL
       ADDD  R15         |        ADDD  ,U
       BCS   SKP12       |        STD   ,U
Appendix-D D-22
.,. Is tc. P 3 o AOGO ~ 1 5 
LJA 1 'u I·'· •.• 
L 0 ~· !ot15 Is k'. P 31 STCl s .tJ. v r: 
r~UL I LDA Fl A', 
.A;) Or) 2,U I C ~~ P A It 1 
BCS SKP20 I o. E ~ t·lUL T 
UiPD t;65521 I C:~P A t:2 
SLO SKP21 i3EQ ~r~JV 
SKP20 AI)Ou 1t15 LJD SAVE 
SKP21 ST') 2,1_1 t"T'"> ..) I~ ~t=S 
•'• L:PA 5::GIN ... 
LDA 'u IC'JNV LD0 s~v~ 
LJX ttTf:MD I STfl oun>ur 
CL~ 
' X I L::?A ;::, = ,- T \1 ,_ ~ " - I'> 
CLR 1 , X I•'• .,. 
CLR 2, X I r~UL T !NC FL..C.C-
LOR :t15 I LOX ;:s:.v:: 
'~UL I L ::lY Z>~i:<; 
STD 'X IL2P15 C L o 'I ' .__ 
LDL\ , X I CL~ 1 I lj 
LD5 ttlS I L::JA 1 t y 
~1U L LD5 1 , y 
ADDD • 'i J. , .. r·tUL 
AODD 2,U ST!1 2 ~ l_l 
f1CS SKP22 L~H , X 
c;.~ P D :it65521 LD? : ' y 
BLLl SKP23 HUL 
SKP22 .D. D'JD :t15 ADDD 1 , u 
•'• S T:::• 1 ' ll •,• 
Sl\ 0 23 SP.JC. sec l0~16 
SHJC DJC . t u 
S L'~D :::17 ILOP16 LOA 1 ' '( 
BCC SKP24 I LO::. 'y 
A~:JD 1:'65;,21 !'·lUL 
$1\P24 SYNC LI:JllO 1 ' IJ 
STCJ T7 SHl ~ 'u 
s n-:c ~cc LCe>lg 
STO SAVE I •\I C 'u 
L:JD R7 !L0?19 LOt. ' X 
SU?.D SAVE I LO~ ' y 
3(( SK?26 I 'A Ul 
:..ooo tt65521 I ~ODD , u 
$K 0 26 s y ~J c I S T:'· t u 
,\ D D D RS , ... •.. 
P:CS SKP22 I L QA. 1 II· - t .__ 
01Pll :t65521 ! LOS !i15 
t) t ~ 
.. ' L. u SK029 I '·1 UL 
SKP28 .A Jr)[) ~15 I t1ClDD 2,U 
SI\P29 srr T15 I ~ c s LOP2(' 
SPJC Ut?O .tr6<;521 
.A·JDLl R 1 5 u. LG LCJ 0 21 
BCS ~·.'\P30 ILJP28 .1 020 ~ 1 5 
('r-JPO rt65521 ILCIP21 ~· T [' ? 'u 
bLU SKP31 I•'• ..•
Appendix-D D-23
       LDA   ,U          |        STD   SAVE
       LDX   #TEMP       |        LBRA  NEXT
       CLR   ,X          | *
       CLR   1,X         | MLTFR  FDB   ~6~17
       CLR   2,X         | MLTRR  FDB   2312.?
       LDB   #15         | *
       MUL               |        ORG   $0000
       STD   ,X          | MCND   FDB   0
       LDA   ,X          | PROD1  FCB   0
       LDB   #15         | PROD2  FCB   0
       MUL               | PROD3  FCB   0
       ADDD  1,X         | PROD4  FCB   0
       ADDD  2,U         | TEMP   FCB   0
       BCS   LOP22       | TEMP1  FCB   0
       CMPD  #65521      | TEMP3  FCB   0
       BLO   LOP23       | SAVE   FDB   0
LOP22  ADDD  #15         | FLAG   FCB   0
LOP23  STD   T17         | RES    FDB   0
       SYNC              | *
       SYNC              |        ORG   $FFFE
       SYNC              | STRT   EQU   $F800
       SYNC              |        END   BEGIN
       LDD   R17         | *
* **************************************************************
* *                    PROCESSOR NUMBER 11                     *
* **************************************************************
       NAM   680911      |        BRA   OVER
OUTPUT EQU   $0400       | NEXT   LDY   #MCND
STATUS EQU   $0402       |        LDX   #MLTRR
T6     EQU   $0403       |        SYNC
T12    EQU   $0405       |        LDD   SAVE
INPUT  EQU   $0410       | *
R6     EQU   $0412       | OVER   STD   T6
R12    EQU   $0414       |        SYNC
SEM    EQU   $0416       |        STD   SAVE
*                        |        LDD   R6
       ORG   $F800       |        SUBD  SAVE
       NOP               |        BCC   SKP12
       ORCC  #%01010000  |        ADDD  #65521
       LDU   #PROD1      | SKP12  SYNC
BEGIN  CLRA              |        SYNC
       STA   FLAG        |        SYNC
       LDA   SEM         |        SYNC
       BEQ   FRD         |        ADDD  R12
START  LDA   #1          |        BCS   SKP14
       STA   FLAG        |        CMPD  #65521
FRD    LDY   #MCND       |        BLO   SKP15
       LDX   #MLTFR      | SKP14  ADDD  #15
       LDA   #1          | *
       STA   STATUS      | SKP15  STD   MCND
       SYNC              |        CLR   ,U
       CLRA              |        CLR   1,U
       STA   STATUS      |        LDA   1,X
       LDD   INPUT       |        LDB   1,Y
Appendix-D D-24
       MUL               |        SYNC
       STD   2,U         |        SYNC
       LDA   ,X          |        STD   T6
       LDB   1,Y         |        SYNC
       MUL               |        STD   SAVE
       ADDD  1,U         |        LDD   R6
       STD   1,U         |        SUBD  SAVE
       BCC   SKP16       |        BCC   SKP24
       INC   ,U          |        ADDD  #65521
SKP16  LDA   1,X         | *
       LDB   ,Y          | SKP24  STD   SAVE
       MUL               |        LDA   FLAG
       ADDD  1,U         |        CMPA  #1
       STD   1,U         |        BEQ   MULT
       BCC   SKP19       |        CMPA  #2
       INC   ,U          |        BEQ   CONV
SKP19  LDA   ,X          |        LDD   SAVE
       LDB   ,Y          |        STD   RES
       MUL               |        LBRA  BEGIN
       ADDD  ,U          | CONV   LDD   SAVE
       STD   ,U          |        STD   OUTPUT
*                        |        LBRA  BEGIN
       LDA   1,U         | *
       LDB   #15         | MULT   INC   FLAG
       MUL               |        LDX   #SAVE
       ADDD  2,U         |        LDY   #RES
       BCS   SKP20       | LOP15  CLR   ,U
       CMPD  #65521      |        CLR   1,U
       BLO   SKP21       |        LDA   1,X
SKP20  ADDD  #15         |        LDB   1,Y
SKP21  STD   2,U         |        MUL
*                        |        STD   2,U
       LDA   ,U          |        LDA   ,X
       LDX   #TEMP       |        LDB   1,Y
       CLR   ,X          |        MUL
       CLR   1,X         |        ADDD  1,U
       CLR   2,X         |        STD   1,U
       LDB   #15         |        BCC   LOP16
       MUL               |        INC   ,U
       STD   ,X          | LOP16  LDA   1,X
       LDA   ,X          |        LDB   ,Y
       LDB   #15         |        MUL
       MUL               |        ADDD  1,U
       ADDD  1,X         |        STD   1,U
       ADDD  2,U         |        BCC   LOP19
       BCS   SKP22       |        INC   ,U
       CMPD  #65521      | LOP19  LDA   ,X
       BLO   SKP23       |        LDB   ,Y
SKP22  ADDD  #15         |        MUL
SKP23  SYNC              |        ADDD  ,U
*                        |        STD   ,U
       STD   T12         | *
       SYNC              |        LDA   1,U
       SYNC              |        LDB   #15
Appendix-D D-25
       MUL               |        LDD   R6
       ADDD  2,U         |        STD   SAVE
       BCS   LOP20       |        SYNC
       CMPD  #65521      |        SYNC
       BLO   LOP21       |        SYNC
LOP20  ADDD  #15         |        LBRA  NEXT
LOP21  STD   2,U         | *
*                        | MLTFR  FDB   16087
       LDA   ,U          | MLTRR  FDB   29504
       LDX   #TEMP       | *
       CLR   ,X          |        ORG   $0000
       CLR   1,X         | MCND   FDB   0
       CLR   2,X         | PROD1  FCB   0
       LDB   #15         | PROD2  FCB   0
       MUL               | PROD3  FCB   0
       STD   ,X          | PROD4  FCB   0
       LDA   ,X          | TEMP   FCB   0
       LDB   #15         | TEMP1  FCB   0
       MUL               | TEMP3  FCB   0
       ADDD  1,X         | SAVE   FDB   0
       ADDD  2,U         | FLAG   FCB   0
       BCS   LOP22       | RES    FDB   0
       CMPD  #65521      | *
       BLO   LOP23       |        ORG   $FFFE
LOP22  ADDD  #15         | STRT   EQU   $F800
LOP23  STD   T6          |        END   BEGIN
       SYNC              | *
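The sequence ADDD Rn / BCS / CMPD #65521 / BLO / ADDD #15 that recurs in these listings appears to add two residues modulo 65521 in the 16-bit D accumulator, using the fact that 65536 = 65521 + 15. The following is a minimal Python sketch of that reading, not a transcription of the thesis code; the function name `add_mod` and the constant name `M` are illustrative assumptions.

```python
M = 65521  # modulus appearing as #65521 in the listings (2**16 - 15)


def add_mod(a, b):
    """Add two residues (0 <= a, b < M) the way the 16-bit sequence appears to.

    A carry out of bit 15 (BCS taken) or a 16-bit sum of at least M
    (BLO not taken) both lead to ADDD #15, which wraps the accumulator
    to the reduced value because 2**16 is congruent to 15 modulo M.
    """
    total = a + b
    carry = total > 0xFFFF       # carry flag after ADDD
    d = total & 0xFFFF           # 16-bit accumulator contents
    if carry or d >= M:
        d = (d + 15) & 0xFFFF    # ADDD #15 with 16-bit wraparound
    return d
```

For example, `add_mod(65520, 2)` gives 1 under this sketch, which agrees with ordinary arithmetic modulo 65521.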
* **************************************************************
* *                    PROCESSOR NUMBER 12                     *
* **************************************************************
       NAM   680912      | FRD    LDY   #MCND
OUTPUT EQU   $0400       |        LDX   #MLTFR
STATUS EQU   $0402       |        LDA   #1
T7     EQU   $0403       |        STA   STATUS
T15    EQU   $0405       |        SYNC
T13    EQU   $0407       |        CLRA
T11    EQU   $0409       |        STA   STATUS
INPUT  EQU   $0410       |        LDD   INPUT
R7     EQU   $0412       |        BRA   OVER
R15    EQU   $0414       | NEXT   LDY   #MCND
R13    EQU   $0416       |        LDX   #MLTRR
R11    EQU   $0418       |        SYNC
SEM    EQU   $041A       |        LDD   SAVE
*                        | *
       ORG   $F800       | OVER   STD   T7
       NOP               |        SYNC
       ORCC  #%01010000  |        STD   SAVE
       LDU   #PROD1      |        LDD   R7
BEGIN  CLRA              |        SUBD  SAVE
       STA   FLAG        |        BCC   SKP12
       LDA   SEM         |        ADDD  #65521
       BEQ   FRD         | SKP12  SYNC
START  LDA   #1          |        STD   T15
       STA   FLAG        |        SYNC
Appendix-D D-26
FRD    LDY   #MCND       |        LDB   ,Y
       LDX   #MLTFR      |        MUL
       LDA   #1          |        ADDD  1,U
       STA   STATUS      |        STD   1,U
       SYNC              |        BCC   SKP21
       CLRA              |        INC   ,U
       STA   STATUS      | SKP21  LDA   ,X
       LDD   INPUT       |        LDB   ,Y
       BRA   OVER        |        MUL
NEXT   LDY   #MCND       |        ADDD  ,U
       LDX   #MLTRR      |        STD   ,U
       SYNC              | *
       LDD   SAVE        |        LDA   1,U
*                        |        LDB   #15
OVER   STD   T7          |        MUL
       SYNC              |        ADDD  2,U
       STD   SAVE        |        BCS   SKP22
       LDD   R7          |        CMPD  #65521
       SUBD  SAVE        |        BLO   SKP23
       BCC   SKP12       | SKP22  ADDD  #15
       ADDD  #65521      | SKP23  STD   2,U
SKP12  SYNC              | *
       STD   T15         |        LDA   ,U
       SYNC              |        LDX   #TEMP
       ADDD  R15         |        CLR   ,X
       BCS   SKP14       |        CLR   1,X
       CMPD  #65521      |        CLR   2,X
       BLO   SKP15       |        LDB   #15
SKP14  ADDD  #15         |        MUL
SKP15  STD   T13         |        STD   ,X
       SYNC              |        LDA   ,X
       ADDD  R13         |        LDB   #15
       BCS   SKP16       |        MUL
       CMPD  #65521      |        ADDD  1,X
       BLO   SKP17       |        ADDD  2,U
SKP16  ADDD  #15         |        BCS   SKP24
SKP17  STD   T11         |        CMPD  #65521
       SYNC              |        BLO   SKP25
*                        | SKP24  ADDD  #15
       STD   MCND        | SKP25  SYNC
       CLR   ,U          | *
       CLR   1,U         |        SYNC
       LDA   1,X         |        ADDD  R11
       LDB   1,Y         |        BCS   SKP26
       MUL               |        CMPD  #65521
       STD   2,U         |        BLO   SKP27
       LDA   ,X          | SKP26  ADDD  #15
       LDB   1,Y         | SKP27  STD   T13
       MUL               |        SYNC
       ADDD  1,U         |        ADDD  R13
       STD   1,U         |        BCS   SKP28
       BCC   SKP18       |        CMPD  #65521
       INC   ,U          |        BLO   SKP29
SKP18  LDA   1,X         | SKP28  ADDD  #15
Appendix-D D-27
SKP29  STD   T15         |        MUL
       SYNC              |        ADDD  ,U
       ADDD  R15         |        STD   ,U
       BCS   SKP30       | *
       CMPD  #65521      |        LDA   1,U
       BLO   SKP31       |        LDB   #15
SKP30  ADDD  #15         |        MUL
SKP31  SYNC              |        ADDD  2,U
       STD   T7          |        BCS   LOP20
       SYNC              |        CMPD  #65521
       STD   SAVE        |        BLO   LOP21
       LDD   R7          | LOP20  ADDD  #15
       SUBD  SAVE        | LOP21  STD   2,U
       BCC   SKP32       | *
       ADDD  #65521      |        LDA   ,U
*                        |        LDX   #TEMP
SKP32  STD   SAVE        |        CLR   ,X
       LDA   FLAG        |        CLR   1,X
       CMPA  #1          |        CLR   2,X
       BEQ   MULT        |        LDB   #15
       CMPA  #2          |        MUL
       BEQ   CONV        |        STD   ,X
       LDD   SAVE        |        LDA   ,X
       STD   RES         |        LDB   #15
       LBRA  BEGIN       |        MUL
CONV   LDD   SAVE        |        ADDD  1,X
       STD   OUTPUT      |        ADDD  2,U
       LBRA  BEGIN       |        BCS   LOP22
*                        |        CMPD  #65521
MULT   INC   FLAG        |        BLO   LOP23
       LDX   #SAVE       | LOP22  ADDD  #15
       LDY   #RES        | LOP23  STD   T7
LOP15  CLR   ,U          |        SYNC
       CLR   1,U         |        LDD   R7
       LDA   1,X         |        STD   T13
       LDB   1,Y         |        SYNC
       MUL               |        LDD   R13
       STD   2,U         |        STD   T7
       LDA   ,X          |        SYNC
       LDB   1,Y         |        LDD   R7
       MUL               |        STD   SAVE
       ADDD  1,U         |        SYNC
       STD   1,U         |        LBRA  NEXT
       BCC   LOP16       | *
       INC   ,U          | MLTFR  FDB   29032
LOP16  LDA   1,X         | MLTRR  FDB   28641
       LDB   ,Y          | *
       MUL               |        ORG   $0000
       ADDD  1,U         | MCND   FDB   0
       STD   1,U         | PROD1  FCB   0
       BCC   LOP19       | PROD2  FCB   0
       INC   ,U          | PROD3  FCB   0
LOP19  LDA   ,X          | PROD4  FCB   0
       LDB   ,Y          | TEMP   FCB   0
Appendix-D D-28
TEMP1  FCB   0           | *
TEMP3  FCB   0           |        ORG   $FFFE
SAVE   FDB   0           | STRT   EQU   $F800
FLAG   FCB   0           |        END   BEGIN
RES    FDB   0           | *
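The three-instruction pattern SUBD SAVE / BCC / ADDD #65521 in these listings appears to form a difference modulo 65521: when the 16-bit subtraction borrows, the modulus is added back. Below is a small Python sketch of that interpretation under the assumption that both operands are already reduced residues; `sub_mod` and `M` are hypothetical names introduced only for illustration.

```python
M = 65521  # modulus appearing as #65521 in the listings


def sub_mod(a, b):
    """Compute (a - b) mod M for 0 <= a, b < M with 16-bit wraparound.

    SUBD leaves the difference modulo 2**16 and sets the carry flag on
    a borrow; in that case ADDD #65521 brings the result back into range.
    """
    borrow = a < b                 # carry flag set by SUBD on borrow
    d = (a - b) & 0xFFFF           # 16-bit accumulator after SUBD
    if borrow:                     # BCC not taken
        d = (d + M) & 0xFFFF       # ADDD #65521
    return d
```

Under this sketch `sub_mod(0, 1)` gives 65520, the residue that represents -1.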
* **************************************************************
* *                    PROCESSOR NUMBER 13                     *
* **************************************************************
       NAM   680913      |        ADDD  R14
OUTPUT EQU   $0400       |        BCS   SKP14
STATUS EQU   $0402       |        CMPD  #65521
T8     EQU   $0403       |        BLO   SKP15
T14    EQU   $0405       | SKP14  ADDD  #15
T12    EQU   $0407       | SKP15  STD   T12
INPUT  EQU   $0410       |        SYNC
R8     EQU   $0412       |        STD   SAVE
R14    EQU   $0414       |        LDD   R12
R12    EQU   $0416       |        SUBD  SAVE
SEM    EQU   $0418       |        BCC   SKP16
*                        |        ADDD  #65521
       ORG   $F800       | SKP16  SYNC
       NOP               | *
       ORCC  #%01010000  |        STD   MCND
       LDU   #PROD1      |        CLR   ,U
BEGIN  CLRA              |        CLR   1,U
       STA   FLAG        |        LDA   1,X
       LDA   SEM         |        LDB   1,Y
       BEQ   FRD         |        MUL
START  LDA   #1          |        STD   2,U
       STA   FLAG        |        LDA   ,X
FRD    LDY   #MCND       |        LDB   1,Y
       LDX   #MLTFR      |        MUL
       LDA   #1          |        ADDD  1,U
       STA   STATUS      |        STD   1,U
       SYNC              |        BCC   SKP18
       CLRA              |        INC   ,U
       STA   STATUS      | SKP18  LDA   1,X
       LDD   INPUT       |        LDB   ,Y
       BRA   OVER        |        MUL
NEXT   LDY   #MCND       |        ADDD  1,U
       LDX   #MLTRR      |        STD   1,U
       SYNC              |        BCC   SKP21
       LDD   SAVE        |        INC   ,U
*                        | SKP21  LDA   ,X
OVER   STD   T8          |        LDB   ,Y
       SYNC              |        MUL
       STD   SAVE        |        ADDD  ,U
       LDD   R8          |        STD   ,U
       SUBD  SAVE        | *
       BCC   SKP12       |        LDA   1,U
       ADDD  #65521      |        LDB   #15
SKP12  SYNC              |        MUL
       STD   T14         |        ADDD  2,U
       SYNC              |        BCS   SKP22
Appendix-D D-29
       CMPD  #65521      |        LDD   SAVE
       BLO   SKP23       |        STD   RES
SKP22  ADDD  #15         |        LBRA  BEGIN
SKP23  STD   2,U         | CONV   LDD   SAVE
*                        |        STD   OUTPUT
       LDA   ,U          |        LBRA  BEGIN
       LDX   #TEMP       | *
       CLR   ,X          | MULT   INC   FLAG
       CLR   1,X         |        LDX   #SAVE
       CLR   2,X         |        LDY   #RES
       LDB   #15         | LOP15  CLR   ,U
       MUL               |        CLR   1,U
       STD   ,X          |        LDA   1,X
       LDA   ,X          |        LDB   1,Y
       LDB   #15         |        MUL
       MUL               |        STD   2,U
       ADDD  1,X         |        LDA   ,X
       ADDD  2,U         |        LDB   1,Y
       BCS   SKP24       |        MUL
       CMPD  #65521      |        ADDD  1,U
       BLO   SKP25       |        STD   1,U
SKP24  ADDD  #15         |        BCC   LOP16
SKP25  SYNC              |        INC   ,U
*                        | LOP16  LDA   1,X
       SYNC              |        LDB   ,Y
       STD   T12         |        MUL
       SYNC              |        ADDD  1,U
       STD   SAVE        |        STD   1,U
       LDD   R12         |        BCC   LOP19
       SUBD  SAVE        |        INC   ,U
       BCC   SKP26       | LOP19  LDA   ,X
       ADDD  #65521      |        LDB   ,Y
SKP26  STD   T14         |        MUL
       SYNC              |        ADDD  ,U
       ADDD  R14         |        STD   ,U
       BCS   SKP28       | *
       CMPD  #65521      |        LDA   1,U
       BLO   SKP29       |        LDB   #15
SKP28  ADDD  #15         |        MUL
SKP29  SYNC              |        ADDD  2,U
       STD   T8          |        BCS   LOP20
       SYNC              |        CMPD  #65521
       STD   SAVE        |        BLO   LOP21
       LDD   R8          | LOP20  ADDD  #15
       SUBD  SAVE        | LOP21  STD   2,U
       BCC   SKP30       | *
       ADDD  #65521      |        LDA   ,U
*                        |        LDX   #TEMP
SKP30  STD   SAVE        |        CLR   ,X
       LDA   FLAG        |        CLR   1,X
       CMPA  #1          |        CLR   2,X
       BEQ   MULT        |        LDB   #15
       CMPA  #2          |        MUL
       BEQ   CONV        |        STD   ,X
Appendix-D D-30
       LDA   ,X          | *
       LDB   #15         | MLTFR  FDB   8748
       MUL               | MLTRR  FDB   12521
       ADDD  1,X         | *
       ADDD  2,U         |        ORG   $0000
       BCS   LOP22       | MCND   FDB   0
       CMPD  #65521      | PROD1  FCB   0
       BLO   LOP23       | PROD2  FCB   0
LOP22  ADDD  #15         | PROD3  FCB   0
LOP23  STD   T8          | PROD4  FCB   0
       SYNC              | TEMP   FCB   0
       LDD   R8          | TEMP1  FCB   0
       STD   T14         | TEMP3  FCB   0
       LDD   R14         | SAVE   FDB   0
       STD   T12         | FLAG   FCB   0
       SYNC              | RES    FDB   0
       LDD   R12         | *
       STD   SAVE        |        ORG   $FFFE
       SYNC              | STRT   EQU   $F800
       SYNC              |        END   BEGIN
       LBRA  NEXT        | *
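The multiply sections of these listings build a 32-bit product from four 8-bit MUL instructions and then repeatedly multiply the upper bytes by 15 before adding them to the lower 16 bits. That is consistent with reducing modulo 65521 through the congruence 2^16 ≡ 15 (mod 65521). The Python sketch below illustrates that reduction idea on whole 16-bit words rather than reproducing the byte-by-byte sequence of the thesis code; `mul_mod` and `M` are illustrative names, not identifiers from the listing.

```python
M = 65521  # modulus appearing as #65521 in the listings (2**16 - 15)


def mul_mod(a, b):
    """Multiply two residues and reduce modulo M by folding high words.

    Because 2**16 is congruent to 15 modulo M, a value hi * 2**16 + lo
    is congruent to hi * 15 + lo. Folding until the value fits in
    16 bits, followed by one conditional subtraction, yields the residue.
    """
    p = a * b                                # up to 32 bits for 16-bit inputs
    while p > 0xFFFF:
        p = (p >> 16) * 15 + (p & 0xFFFF)    # fold the high word down
    if p >= M:
        p -= M                               # final correction into 0..M-1
    return p
```

Each fold strictly decreases the value, so the loop ends after a few iterations for 16-bit operands.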
* **************************************************************
* *                    PROCESSOR NUMBER 14                     *
* **************************************************************
       NAM   680914      |        BRA   OVER
OUTPUT EQU   $0400       | NEXT   LDY   #MCND
STATUS EQU   $0402       |        LDX   #MLTRR
T9     EQU   $0403       |        SYNC
T13    EQU   $0405       |        LDD   SAVE
T18    EQU   $0407       | *
INPUT  EQU   $0410       | OVER   STD   T9
R9     EQU   $0412       |        SYNC
R13    EQU   $0414       |        STD   SAVE
R18    EQU   $0416       |        LDD   R9
SEM    EQU   $0418       |        SUBD  SAVE
*                        |        BCC   SKP12
       ORG   $F800       |        ADDD  #65521
       NOP               | SKP12  SYNC
       ORCC  #%01010000  |        STD   T13
       LDU   #PROD1      |        SYNC
BEGIN  CLRA              |        SUBD  R13
       STA   FLAG        |        BCC   SKP14
       LDA   SEM         |        ADDD  #65521
       BEQ   FRD         | SKP14  STD   T18
START  LDA   #1          |        SYNC
       STA   FLAG        |        SYNC
FRD    LDY   #MCND       | *
       LDX   #MLTFR      |        STD   MCND
       LDA   #1          |        CLR   ,U
       STA   STATUS      |        CLR   1,U
       SYNC              |        LDA   1,X
       CLRA              |        LDB   1,Y
       STA   STATUS      |        MUL
       LDD   INPUT       |        STD   2,U
Appendix-D D-31
       LDA   ,X          | SKP24  ADDD  #15
       LDB   1,Y         | SKP25  SYNC
       MUL               |        STD   T13
       ADDD  1,U         |        SYNC
       STD   1,U         |        STD   SAVE
       BCC   SKP16       |        LDD   R13
       INC   ,U          |        SUBD  SAVE
SKP16  LDA   1,X         |        BCC   SKP26
       LDB   ,Y          |        ADDD  #65521
       MUL               | SKP26  SYNC
       ADDD  1,U         |        STD   T9
       STD   1,U         |        SYNC
       BCC   SKP19       |        STD   SAVE
       INC   ,U          |        LDD   R9
SKP19  LDA   ,X          |        SUBD  SAVE
       LDB   ,Y          |        BCC   SKP28
       MUL               |        ADDD  #65521
       ADDD  ,U          | *
       STD   ,U          | SKP28  STD   SAVE
*                        |        LDA   FLAG
       LDA   1,U         |        CMPA  #1
       LDB   #15         |        BEQ   MULT
       MUL               |        CMPA  #2
       ADDD  2,U         |        BEQ   CONV
       BCS   SKP20       |        LDD   SAVE
       CMPD  #65521      |        STD   RES
       BLO   SKP21       |        LBRA  BEGIN
SKP20  ADDD  #15         | CONV   LDD   SAVE
SKP21  STD   2,U         |        STD   OUTPUT
*                        |        LBRA  BEGIN
       LDA   ,U          | *
       LDX   #TEMP       | MULT   INC   FLAG
       CLR   ,X          |        LDX   #SAVE
       CLR   1,X         |        LDY   #RES
       CLR   2,X         | LOP15  CLR   ,U
       LDB   #15         |        CLR   1,U
       MUL               |        LDA   1,X
       STD   ,X          |        LDB   1,Y
       LDA   ,X          |        MUL
       LDB   #15         |        STD   2,U
       MUL               |        LDA   ,X
       ADDD  1,X         |        LDB   1,Y
       ADDD  2,U         |        MUL
       BCS   SKP22       |        ADDD  1,U
       CMPD  #65521      |        STD   1,U
       BLO   SKP23       |        BCC   LOP16
SKP22  ADDD  #15         |        INC   ,U
SKP23  SYNC              | LOP16  LDA   1,X
*                        |        LDB   ,Y
       SYNC              |        MUL
       ADDD  R18         |        ADDD  1,U
       BCS   SKP24       |        STD   1,U
       CMPD  #65521      |        BCC   LOP19
       BLO   SKP25       |        INC   ,U
Appendix-D D-32
LOP19  LDA   ,X          | LOP23  STD   T13
       LDB   ,Y          |        SYNC
       MUL               |        SYNC
       ADDD  ,U          |        LDD   R13
       STD   ,U          |        STD   T18
*                        |        LDD   R18
       LDA   1,U         |        STD   T9
       LDB   #15         |        SYNC
       MUL               |        LDD   R9
       ADDD  2,U         |        STD   SAVE
       BCS   LOP20       |        SYNC
       CMPD  #65521      |        LBRA  NEXT
       BLO   LOP21       | *
LOP20  ADDD  #15         | MLTFR  FDB   1465
LOP21  STD   2,U         | MLTRR  FDB   21938
*                        | *
       LDA   ,U          |        ORG   $0000
       LDX   #TEMP       | MCND   FDB   0
       CLR   ,X          | PROD1  FCB   0
       CLR   1,X         | PROD2  FCB   0
       CLR   2,X         | PROD3  FCB   0
       LDB   #15         | PROD4  FCB   0
       MUL               | TEMP   FCB   0
       STD   ,X          | TEMP1  FCB   0
       LDA   ,X          | TEMP3  FCB   0
       LDB   #15         | SAVE   FDB   0
       MUL               | FLAG   FCB   0
       ADDD  1,X         | RES    FDB   0
       ADDD  2,U         | *
       BCS   LOP22       |        ORG   $FFFE
       CMPD  #65521      | STRT   EQU   $F800
       BLO   LOP23       |        END   BEGIN
LOP22  ADDD  #15         | *
* **************************************************************
* *                    PROCESSOR NUMBER 15                     *
* **************************************************************
       NAM   680915      |        STA   FLAG
OUTPUT EQU   $0400       |        LDA   SEM
STATUS EQU   $0402       |        BEQ   FRD
T10    EQU   $0403       | START  LDA   #1
T12    EQU   $0405       |        STA   FLAG
T18    EQU   $0407       | FRD    LDY   #MCND
INPUT  EQU   $0410       |        LDX   #MLTFR
R10    EQU   $0412       |        LDA   #1
R12    EQU   $0414       |        STA   STATUS
R18    EQU   $0416       |        SYNC
SEM    EQU   $0418       |        CLRA
       ORG   $F800       |        STA   STATUS
*                        |        LDD   INPUT
       ORG   $F800       |        BRA   OVER
       NOP               | NEXT   LDY   #MCND
       ORCC  #%01010000  |        LDX   #MLTRR
       LDU   #PROD1      |        SYNC
BEGIN  CLRA              |        LDD   SAVE
Appendix-D D-33
*                        | SKP20  ADDD  #15
OVER   STD   T10         | SKP21  STD   2,U
       SYNC              | *
       STD   SAVE        |        LDA   ,U
       LDD   R10         |        LDX   #TEMP
       SUBD  SAVE        |        CLR   ,X
       BCC   SKP12       |        CLR   1,X
       ADDD  #65521      |        CLR   2,X
SKP12  SYNC              |        LDB   #15
       STD   T12         |        MUL
       SYNC              |        STD   ,X
       STD   SAVE        |        LDA   ,X
       LDD   R12         |        LDB   #15
       SUBD  SAVE        |        MUL
       BCC   SKP14       |        ADDD  1,X
       ADDD  #65521      |        ADDD  2,U
SKP14  STD   T18         |        BCS   SKP22
       SYNC              |        CMPD  #65521
       SYNC              |        BLO   SKP23
*                        | SKP22  ADDD  #15
       STD   MCND        | SKP23  SYNC
       CLR   ,U          | *
       CLR   1,U         |        SYNC
       LDA   1,X         |        SUBD  R18
       LDB   1,Y         |        BCC   SKP24
       MUL               |        ADDD  #65521
       STD   2,U         | SKP24  SYNC
       LDA   ,X          |        STD   T12
       LDB   1,Y         |        SYNC
       MUL               |        STD   SAVE
       ADDD  1,U         |        LDD   R12
       STD   1,U         |        SUBD  SAVE
       BCC   SKP16       |        BCC   SKP26
       INC   ,U          |        ADDD  #65521
SKP16  LDA   1,X         | SKP26  SYNC
       LDB   ,Y          |        STD   T10
       MUL               |        SYNC
       ADDD  1,U         |        STD   SAVE
       STD   1,U         |        LDD   R10
       BCC   SKP19       |        SUBD  SAVE
       INC   ,U          |        BCC   SKP28
SKP19  LDA   ,X          |        ADDD  #65521
       LDB   ,Y          | *
       MUL               | SKP28  STD   SAVE
       ADDD  ,U          |        LDA   FLAG
       STD   ,U          |        CMPA  #1
*                        |        BEQ   MULT
       LDA   1,U         |        CMPA  #2
       LDB   #15         |        BEQ   CONV
       MUL               |        LDD   SAVE
       ADDD  2,U         |        STD   RES
       BCS   SKP20       |        LBRA  BEGIN
       CMPD  #65521      | CONV   LDD   SAVE
       BLO   SKP21       |        STD   OUTPUT
LBRA BEGIN CLR ' X 
... CLR 1 , X 
. ,. 
MULT INC FLAG CLR 2, X 
LOX /if SAVE LOR ~15 
L DY #RES f-AUL 
LOP15 CLR ,u STO ,.x 
CL~ 1 'u LOA. ' X 
LOA 1, X LD'3 #15 
L DB 1 'y 1-IUL 
MUL AOOD 1 , X 
STO z,u .A DOD 2tU 
L DA 
' X E.CS L:JP22 
LOB 1 'y CMPD ii65521 
MUL ~LO LOP23 
AODD 1,U LOP22 AOOO #15 
STD 1,U LOP23 STD T18 
BCC LDP16 SYNC 
INC 'u SYNC 
LlJP16 LOA 1 , X SYNC 
LOB ,y I SYNC 
r~UL I LOD R18 
ADDD l,U I ST!J SAVE 
STD l,U I LB~A NEXT 
BCC LDP19 I ::: 
INC ,u IMLTFR 1=08 23174 
Li.JP19 LOA 
' X IMLTRR FO~ 5913 
LOB 
' y I•'• .... 1 .. 
MUL I ORG !0000 
ADDD ,u I 1-1C NO FOB 0 
STD ,u IPRODl FCB 0 
•'• IPRODZ ... r::cs 0 
LOA l,U IPROD3 FCS 0 
LOB #15 IPROD4 r::cg 0 
MUL !TEMP c::ce 0 
AODO 2,U ITE1~Pl FCB 0 
ccs LOP20 IT'=r-1P3 FCB 0 
Cl-1 PD ii65521 I SAVE FOB 0 
BLO LOP21 I FLAG 1=(8 0 
LUP20 ADDO ~15 IRES FOB 0 
LOP21 STO z,u I•'• .,. 
•.. I ... O~G ~FFF!: 
LOA , u ISTP.T EQU tF800 
LOX #TEMP I E r~o BEGIN 
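The processor listings build the 32-bit product of two 16-bit operands from four 8-bit MUL instructions, accumulating the partial products in the bytes PROD1 to PROD4. The following Python sketch mirrors that arithmetic for illustration only; the function name mul16_via_8x8 is hypothetical and does not appear in the thesis.

```python
def mul16_via_8x8(x: int, y: int) -> int:
    """Return the 32-bit product of two 16-bit values using four 8x8 multiplies."""
    xh, xl = x >> 8, x & 0xFF          # high and low bytes of the multiplier
    yh, yl = y >> 8, y & 0xFF          # high and low bytes of the multiplicand
    prod = xl * yl                     # low x low, weight 2**0  (stored at 2,U)
    prod += (xh * yl) << 8             # high x low, weight 2**8 (added at 1,U)
    prod += (xl * yh) << 8             # low x high, weight 2**8 (added at 1,U)
    prod += (xh * yh) << 16            # high x high, weight 2**16 (added at ,U)
    return prod


if __name__ == "__main__":
    # Compare against Python's built-in multiplication for one sample pair.
    print(mul16_via_8x8(18005, 5493) == 18005 * 5493)
```

In the assembly the carries out of the middle byte additions are propagated with BCC/INC; Python integers make that propagation implicit.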
**************************************************************
*                    PROCESSOR NUMBER 16                     *
**************************************************************
        NAM     680916
T4      EQU     $0410
T5      EQU     $0412
R4      EQU     $0414
R5      EQU     $0416
SEM     EQU     $0418
*
        ORG     $F800
        NOP
        ORCC    #%01010000
        LDU     #PROD1
BEGIN   CLRA
        STA     FLAG
        LDA     SEM
        BEQ     FRD
START   LDA     #1
        STA     FLAG
FRD     LDY     #MCND
        LDX     #MLTFR
        BRA     OVER
NEXT    LDY     #MCND
        LDX     #MLTRR
OVER    SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDD     R4
        ADDD    R5
        BCS     SKP12
        CMPD    #65521
        BLO     SKP13
SKP12   ADDD    #15
SKP13   SYNC
*
        STD     MCND
        CLR     ,U
        CLR     1,U
        LDA     1,X
        LDB     1,Y
        MUL
        STD     2,U
        LDA     ,X
        LDB     1,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP14
        INC     ,U
SKP14   LDA     1,X
        LDB     ,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP17
        INC     ,U
SKP17   LDA     ,X
        LDB     ,Y
        MUL
        ADDD    ,U
        STD     ,U
*
        LDA     1,U
        LDB     #15
        MUL
        ADDD    2,U
        BCS     SKP18
        CMPD    #65521
        BLO     SKP19
SKP18   ADDD    #15
SKP19   STD     2,U
*
        LDA     ,U
        LDX     #TEMP
        CLR     ,X
        CLR     1,X
        CLR     2,X
        LDB     #15
        MUL
        STD     ,X
        LDA     ,X
        LDB     #15
        MUL
        ADDD    1,X
        ADDD    2,U
        BCS     SKP20
        CMPD    #65521
        BLO     SKP21
SKP20   ADDD    #15
SKP21   SYNC
*
        STD     T5
        STD     T4
        SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDA     FLAG
        CMPA    #1
        BEQ     SKP
        LBRA    BEGIN
SKP     INC     FLAG
        SYNC
        LDD     R5
        STD     T4
        SYNC
        SYNC
        SYNC
        LBRA    NEXT
*
MLTFR   FDB     18005
MLTRR   FDB     5493
*
        ORG     $0000
MCND    FDB     0
PROD1   FCB     0
PROD2   FCB     0
PROD3   FCB     0
PROD4   FCB     0
TEMP    FCB     0
TEMP1   FCB     0
TEMP3   FCB     0
SAVE    FDB     0
FLAG    FCB     0
*
        ORG     $FFFE
STRT    EQU     $F800
        END     BEGIN
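The listings perform addition modulo 65521 with the sequence ADDD / BCS / CMPD #65521 / BLO / ADDD #15, and subtraction with SUBD / BCC / ADDD #65521. A minimal Python sketch of both corrections follows, assuming 16-bit operands that are already reduced (less than 65521); the names add_mod and sub_mod are hypothetical.

```python
M = 65521            # modulus used in the listings (65536 - 15)


def add_mod(a: int, b: int) -> int:
    """16-bit add, then add 15 if a carry occurred or the sum is >= M."""
    total = a + b
    carry = total > 0xFFFF           # corresponds to BCS
    total &= 0xFFFF                  # keep the 16-bit accumulator value
    if carry or total >= M:          # CMPD #65521 / BLO skips the fix-up
        total = (total + 15) & 0xFFFF
    return total


def sub_mod(a: int, b: int) -> int:
    """16-bit subtract, then add M if a borrow occurred."""
    diff = (a - b) & 0xFFFF          # 16-bit wrap-around result of SUBD
    if a < b:                        # borrow set, so BCC is not taken
        diff = (diff + M) & 0xFFFF
    return diff


if __name__ == "__main__":
    print(add_mod(65520, 65520), sub_mod(0, 1))
```

Adding 15 works because 65536 is congruent to 15 modulo 65521, so discarding a carry out of 16 bits and adding 15 leaves the residue unchanged.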
**************************************************************
*                    PROCESSOR NUMBER 17                     *
**************************************************************
        NAM     680917
T9      EQU     $0410
T10     EQU     $0412
R9      EQU     $0414
R10     EQU     $0416
SEM     EQU     $0418
*
        ORG     $F800
        NOP
        ORCC    #%01010000
        LDU     #PROD1
BEGIN   CLRA
        STA     FLAG
        LDA     SEM
        BEQ     FRD
START   LDA     #1
        STA     FLAG
FRD     LDY     #MCND
        LDX     #MLTFR
        BRA     OVER
NEXT    LDY     #MCND
        LDX     #MLTRR
OVER    SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDD     R9
        ADDD    R10
        BCS     SKP12
        CMPD    #65521
        BLO     SKP13
SKP12   ADDD    #15
SKP13   SYNC
*
        STD     MCND
        CLR     ,U
        CLR     1,U
        LDA     1,X
        LDB     1,Y
        MUL
        STD     2,U
        LDA     ,X
        LDB     1,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP14
        INC     ,U
SKP14   LDA     1,X
        LDB     ,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP17
        INC     ,U
SKP17   LDA     ,X
        LDB     ,Y
        MUL
        ADDD    ,U
        STD     ,U
*
        LDA     1,U
        LDB     #15
        MUL
        ADDD    2,U
        BCS     SKP18
        CMPD    #65521
        BLO     SKP19
SKP18   ADDD    #15
SKP19   STD     2,U
*
        LDA     ,U
        LDX     #TEMP
        CLR     ,X
        CLR     1,X
        CLR     2,X
        LDB     #15
        MUL
        STD     ,X
        LDA     ,X
        LDB     #15
        MUL
        ADDD    1,X
        ADDD    2,U
        BCS     SKP20
        CMPD    #65521
        BLO     SKP21
SKP20   ADDD    #15
SKP21   SYNC
*
        STD     T10
        STD     T9
        SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDA     FLAG
        CMPA    #1
        BEQ     SKP
        LBRA    BEGIN
SKP     INC     FLAG
        SYNC
        LDD     R10
        STD     T9
        SYNC
        SYNC
        LDD     R9
        STD     T10
        SYNC
        LBRA    NEXT
*
MLTFR   FDB     5753
MLTRR   FDB     34561
*
        ORG     $0000
MCND    FDB     0
PROD1   FCB     0
PROD2   FCB     0
PROD3   FCB     0
PROD4   FCB     0
TEMP    FCB     0
TEMP1   FCB     0
TEMP3   FCB     0
SAVE    FDB     0
FLAG    FCB     0
*
        ORG     $FFFE
STRT    EQU     $F800
        END     BEGIN
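Once the 32-bit product is available, the listings reduce it modulo 65521 by multiplying the two upper product bytes by 15, which is valid because 65536 = 65521 + 15. The Python sketch below follows the byte roles as they can be read from the listing; it is an illustration under that reading, and the names add_fold and reduce32 are hypothetical.

```python
M = 65521                 # modulus used in the listings
FOLD = 15                 # 2**16 - M, so 2**16 is congruent to 15 (mod M)


def add_fold(acc: int, term: int) -> int:
    """16-bit add with the BCS / CMPD #65521 / BLO / ADDD #15 correction."""
    total = acc + term
    if total > 0xFFFF or total >= M:     # carry out, or not yet reduced
        total += FOLD
    return total & 0xFFFF


def reduce32(prod: int) -> int:
    """Reduce a 32-bit product modulo M by folding its two upper bytes."""
    b3 = (prod >> 24) & 0xFF             # most significant byte (,U)
    b2 = (prod >> 16) & 0xFF             # next byte (1,U)
    low = prod & 0xFFFF                  # low 16 bits (2,U)
    r = add_fold(b2 * FOLD, low)         # b2 * 2**16 is congruent to b2 * 15
    t = b3 * FOLD                        # b3 * 2**24 is congruent to t * 2**8
    hi, lo = t >> 8, t & 0xFF            # split t as stored in TEMP
    return add_fold(hi * FOLD + (lo << 8), r)


if __name__ == "__main__":
    p = 18005 * 5493
    print(reduce32(p) == p % M)
```

The second fold cannot overflow the correction logic because b3 * 15 is at most 3825, so hi * 15 + lo * 256 stays below 65521.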
**************************************************************
*                    PROCESSOR NUMBER 18                     *
**************************************************************
        NAM     680918
T14     EQU     $0410
T15     EQU     $0412
R14     EQU     $0414
R15     EQU     $0416
SEM     EQU     $0418
*
        ORG     $F800
        NOP
        ORCC    #%01010000
        LDU     #PROD1
BEGIN   CLRA
        STA     FLAG
        LDA     SEM
        BEQ     FRD
START   LDA     #1
        STA     FLAG
FRD     LDY     #MCND
        LDX     #MLTFR
        BRA     OVER
NEXT    LDY     #MCND
        LDX     #MLTRR
OVER    SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDD     R14
        ADDD    R15
        BCS     SKP12
        CMPD    #65521
        BLO     SKP13
SKP12   ADDD    #15
SKP13   SYNC
*
        STD     MCND
        CLR     ,U
        CLR     1,U
        LDA     1,X
        LDB     1,Y
        MUL
        STD     2,U
        LDA     ,X
        LDB     1,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP14
        INC     ,U
SKP14   LDA     1,X
        LDB     ,Y
        MUL
        ADDD    1,U
        STD     1,U
        BCC     SKP17
        INC     ,U
SKP17   LDA     ,X
        LDB     ,Y
        MUL
        ADDD    ,U
        STD     ,U
*
        LDA     1,U
        LDB     #15
        MUL
        ADDD    2,U
        BCS     SKP18
        CMPD    #65521
        BLO     SKP19
SKP18   ADDD    #15
SKP19   STD     2,U
*
        LDA     ,U
        LDX     #TEMP
        CLR     ,X
        CLR     1,X
        CLR     2,X
        LDB     #15
        MUL
        STD     ,X
*
        LDA     ,X
        LDB     #15
        MUL
        ADDD    1,X
        ADDD    2,U
        BCS     SKP20
        CMPD    #65521
        BLO     SKP21
SKP20   ADDD    #15
SKP21   SYNC
*
        STD     T15
        STD     T14
        SYNC
        SYNC
        SYNC
        SYNC
        SYNC
        LDA     FLAG
        CMPA    #1
        BEQ     SKP
        LBRA    BEGIN
SKP     INC     FLAG
        SYNC
        LDD     R15
        STD     T14
        SYNC
        SYNC
        LDD     R14
        STD     T15
        SYNC
        LBRA    NEXT
*
MLTFR   FDB     43615
MLTRR   FDB     24743
*
        ORG     $0000
MCND    FDB     0
PROD1   FCB     0
PROD2   FCB     0
PROD3   FCB     0
PROD4   FCB     0
TEMP    FCB     0
TEMP1   FCB     0
TEMP3   FCB     0
SAVE    FDB     0
FLAG    FCB     0
*
        ORG     $FFFE
STRT    EQU     $F800
        END     BEGIN
... 
... 
... **************************~*************~********************* ... 
... . .. CONTROL MICROPROCESSOR ::: ... ... 
•'• ************************************************************** .,. 
...... : 
... 
.,. NM1 CONTROL ORG $F800 .,. 
:::~ STA~T NOP 
ACIACR EQU $1040 ORCC #%01010000 
ACIASR EQU $1040 LOS tt~ao 
ACIARX EQU $1041 CLRA 
ACIATX EQU $1041 STA CONTRL 
ARYOUT EQU $1020 STA STATUS 
INPUT EQU $10QO LOA # 0~00000101 
OUT 1M EQU $1050 STA CONTRL 
OUT1L EQU $1051 LOA t.!$13 
OUT 2M EQU $1052 STA ACIACR 
OUT2L EQU $1053 LOA It $11 
DATA E fJU $1054 STA ACIAC~ 
(UNTRL EQU $101!.: LOA ..,~00000111 
::: STA CONTRL 
LOA tt~~OOOOOllO STA T ~1 p 
STA CONTRL JSR RC X 
LOA #%00000111 JSR TXR 
STA CONTRL C f~PA !#'$00 
STA STATUS F\=Q SK5 
BRA BEGIN JSR VALI 0 
INIT LOS !it$80 AOOA T~P 
LOA # $0 7 SRA. SK6 
JSR TXR SK5 LOA T"'!P 
JSR CRLF LS~A 
BEGIN JSR CRLF LSRA 
JSR PFX LSRA 
JSR RCX LSRA 
CMPA #. , 1 gRA SK6 
BEQ SKPl ZERO CLRA 
CNPA #'2 SK6 STA CNT 
BEQ SKP2 CMPA ~~0 
LSRA DSP L3HS EQr1SG1 
•'• LOOPS JSR (qLc: .,.
SKP1 LOA 1t1 "'· '•' 
STA FLAG JiARITE: LOA C"JT 
LOX #rASG3 I LSPA 
LOA #15 I LSRA 
STA CNT LSRA 
JSR DSPLY LSRA 
LOY #INl JSR CO NVA 
LOX #.~RYIN JSR TXR 
JSR EXG L:JA CNT 
BRA MOOFY ANDA # $0 F 
SKP2 LOA #2 JSR CONVA 
STA FLAG JSR TXR 
lOY #IN 2 L uA t~'= 
LOX #ARVIN JSR TXR 
JSR EXG LOA tt$20 
LOX i'H-1SG8 JSR TXR 
LOA #13 LOB CNT 
STA OJT LOA B,X 
JSR DSPLY LSRA 
J SR CRLF LSRA 
JSP P FX LSRA 
.. , LSRA .,.. 
MOOFY JSR CPLF JSP. CON VA 
LOX ~ARVIN JSR TXR 
LOA ti$20 LOB C ~lT 
JSR TXR LOA 8, X 
JSR RCX A tJ 0 A 11~01= 
JSR TXP I .J s J;' CClNV A 
CMPA #~00 I JSR TXP. 
BE (J ZERO I L DA rt't20 
JSR VALID I JS~ TXR 
LSLA , ... ... 
LSLA I READ JSR RCX 
LSLA I J SP TXR 
L SL A I C"1PA li !. 0 (I 
B~Q MOVE I LOA STATUS 
01PA #'- I ANDA li%11111110 
BEQ OECR I STA CONTRL 
C MPA #$20 I ORA iio/,00001001 
BEQ INCR I STA CONTRL 
JSR VALID I SYNC 
LSLA I ANOA ~%11110111 
LSLA I STA CONTRL 
LSLA I STA STATUS 
LSLA I ... ... 
STA H1P IC'JNV 'LOX #OUT 
JSR RCX I JSR EXC~G 
JSR TXR I JSR CRLF 
JSR VALID I LOX tH1 SG4 
ADDA Tr~P I LOA 1111 
LOB CNT I STA CNT 
STA B,X I JSP OSPLY 
INCR L DA CNT I JSP. CP.LF 
INCA I LOX ~tOUT 
CM 0 A # 30 I JSR ARAY 
BHS t~OVE I JSR CRLF 
STA CNT ISKP4 LOA li15 
BRA LOOPS I STA CNT 
DECR l DA CNT I LOX 1tOUT 
OECA ISKPS LDD ,X++ 
BLT MOVE I STO OUT 2M 
STA CNT I DEC CNT 
L8RA LOOPS I BNE SKPS 
... I LOA ACIASR ... 
MOVE JSR CPLF I. LSRA 
JSR PFX BCC SKP4 
JSR CRLF LOA ACIARX 
LDV ttARYIN ANDA #$7F 
LOA FLAG CMPA t:'G 
01PA ~H LSEQ GET 
8NE SKP3 LBRA SEGIN 
LOA STATUS ... ... 
ANOA ttY,10111111 OSPLY LOA ,X+ 
STA CONTRL JSR TXR 
STA STATUS DEC CNT 
LOX #I".Jl P,NE DSPLY 
JSR EXG ens 
LOX ~INPUT ... . ,. 
JSR f:XG TXR LOB r,"S02 
LBRA BEGIN ~~A IT 3ITB ACIASR 
SKP3 LOA STATUS BEQ WAIT 
ORA #%01000000 STA ACIATX 
STA CONTRL RTS R:TURN 
STA STATUS "· ... 
LOX #IN2 RCX LOA ACUSR 
JSR EXG LSRA 
LOX #INPUT 8CC RCX 
JSR EXG LOA ACI.HX 
... ANOA ti$7F 
... 
RTS STA STATUS 
"'• I ~TS .,. 
CO NVA CMPA #9 I·'· .,. 
BLS OMIT I ... ... 
ADDA ~·A-'9-1 IG~T J SR C~LF 
Dt-1! T ADDA #'0 I JSR PFX 
ANDA· #S7F I LOX IIM$G4 
RTS I LOA #12 
... I STA CNT . ,. 
CRLF LOA ~$00 I JSR DSDLY 
JSP TXR I JSR CRLf= 
LOA tt$0A I JSR PFX 
JSR TXR ILOOPW LOA STATUS 
RTS I ORA ~t%00010000 
·'• I STA STATUS .,.
EXCHG LOA STATUS I STA CONTRL 
'JRA ~?~00001000 !LOOP X LOA #15 
STA CONTRL STA C"lT 
STA STATUS LOY #ARY!N 
L DY #ARYOUT LOX #INPUT 
SYNC LOU liiNZ 
·'• LDD DATA .,.
EXG LDO , y LOOPY ·sYNC 
STO 'X LDD 0 AT A 
LDD 2,Y STD ,X++ 
STO z,x STD ,Y++ 
LDD 4,Y STD ,U++ 
sTn 4,X DEC C"JT 
LCJD 6,Y BNE LOOPY 
STO 6,X ::: 
LDD 8 'y L DA STATUS 
STO 8,X ANDA ~%11111110 
LDD 10,Y STA CIJNTRL 
STD lO,X f'JRA ~tY,OOC01001 
LDD 12,Y STA CONTRL 
STD 12,X SYNC 
LDD 14,Y ANOA #%11110111 
STO 14,X STA CCJNTPL 
LDD 16,Y STA STATUS 
ST'J 16,X ... . ,. 
LDD 1 8' y LOX #OUT 
STD 18,X JSR EXCYG 
L DO 2 0, y LOY #INZ 
STD zo,x LOA ;1115 
LDQ zz,v I STA CNT 
STD 22,X ILOOPZ LDD ,X++ 
LDO 24,Y I STD OUT2~~ 
STO 24,X I L DO 'y ++ 
LDD 26,Y I STD OUTlM 
STD 26,X I DEC CNT 
LDD 2 8, y I. ~ NE LOOPZ 
STD 2 8, X I LOO A( !.ASP. 
ANDA #"'.11110111 I LSt;~A 
STA CONTRL I sec LOOP X 
ANDA #$7F I STO OUT2~·1 
CMPA # ~~ I DEC CNT 
LBEQ BEGIN I B ~J E L'JOP:: 
JSR CRLF I LOA ACIAS~ 
JSR PFX I LSqA 
REPEAT LOX "OUT I 8CC SK9 
LOY IIIN2 I L DA ACIARX 
L~A #15 I ANDA #l7f 
STA C"lT I CI~PA #, 1 
LOOP LDO ,Y++ I LBEQ SKP1 
,'. 
I SEl OUT 1M LBRA SKP2 
L DO ,X++ I t.' 
STO ·OUT21~ IARAY LOA #30 
DEC CNT . I STA eNT 
BNE LOOP I AGAIN CLR CNT1 
LDA ACIASR LOOP2 CLR Ct..IT2 
LSRA LOOPl LOA 
' X 
gee REPEAT INC CNT2 
LOA AeiARX LSRA 
ANDA #<;7F LSRA 
Ct~PA #, 1 LSRA 
L5EQ SKPl LSRA 
CM 0 A #'2 JSR CONVA 
LBEQ SKP2 JSR TXR 
LBRA LOOP X LDA ' X + 
... 1\NOA :t<£ 0 F ... 
DSP JSR CRLF JSR CON VA 
LOX #MSG5 JSR TXR 
LOA #7 INC CNT2 
STA CNT DEC eNT 
JSR OSPLY BEQ SET 
JSR CPLF LOA eNT2 
JSR PFX C I~ PA t/4 
JSR eRLF BEQ eHKT 
LOX #!Nl 8R~ LOOPl 
JSR ARAY OVER JSR eqLF 
JSR CRLF BRA AGAIN 
JSR eRLF CHKT LOA CNT1 
LOX ~MSG6 e ~1 PA ti4 
LOA #7 SEQ OVER 
5TA eNT LOA .r,i$20 
JSR OS PLY J SR TXR 
J SR CRLF INC e ~1 T 1 
LOX #IN2 BRA LOOP2 
JSR AP.AY SET RTS 
JSR CRLF ... ... 
JSR PfX VALID SUBA # '0 
SK9 LOX ~INl I CMPA ~9 
LOY #IN2 I BHI e H K.l 
L DA #15 I PTS 
STA eNT leHKl SUBA P7 
LOOPE LDO ,X++ I ( I~PA lt$0F 
STD OUT 1M I ?. L S: OK 
LOD ,Y++ I 'IRA ERMSr,3 
OK RTS I MSGl FCC 'Address Too Large'. 
•'• ... 
PFX 
·'· .,. 
L DX 
LOA 
STA 
JSR 
RTS 
ttMSG7 
#6 
CNT 
OSPLY 
ERMSGl JSR CRLF 
LOX #MSGl 
L DA #17 
STA CNT 
JSR CRLF 
JSR OSPLY 
JSR CRLF 
LBRA HJIT 
Efd·1SG3 JS~ CRLF 
•'• ... 
·'· .,. 
.... 
. ,. 
•'• . ,. 
•'• .,.
... 
... 
LOX 1H1SG2 
LDA il17 
STA CNT 
J SR CRLF 
JSP. OSPLY 
JSR CRLF 
LBRA IN!T 
NAM WINOlS 
LOX .~AX 
LOY #ARVIN 
................................ ____ ..._ .................................... 
.;, ..... , ..... , .. "1 .................... , .... , .... , ..... , .... , .................................. , .......... , .... , .. 
... INPUT R~ORDERING •'• ... ., .
*******************~** 
LDD ,y 
STD 'X 
L DO 6,Y 
STD z,x 
LDO 12,Y· 
STO 4,X 
LDD 18,Y 
STD 6,x 
LDD 24,Y 
STO s,x 
LDD lO,Y 
STO 1 0, X 
LDD 16,Y 
STD 12,X 
L:JD 2 2, y 
STD 14,X 
LDD 28,Y 
STO 16,X 
LDD 4,Y 
MSG2 FCC 'Invalid HEX Digit' 
MSG3 FCC 'enter Response' 
MSG4 FCC 'Convolution , 
t-ISGS FCC 'O.rray 1 , 
MSG6 FCC . 'O.rray 2 , 
I,SG 7 FCC 'CIJNV: . 
MSG8 FCC ·=nter Values' 
::: 
ORG $0081 
STACK RMB 1 
T'1P Pt·18 2 
CNT Ri·IB 1 
CNTl Ri·1B 1 
C NT 2 I) r~ R 1 
STATUS RMB 1 
FLAG R r~s 1 
HH Rf,~. 30 
IN2 R Me 30 
lOUT ~t-18 30 
IAPYIN RMB 30 
I ::: 
l O~G <;FFFC:: 
ISTRT ~QU $F800 
I END START 
I LOO 20,Y 
I STD zo,x 
I LDD 26,Y 
STO 22,X 
LDD z,v 
STD 24,X 
LDO 8' y 
STD 26,X 
LDD 14,Y 
STD 28,X 
·'· ,. 
•.. 
----------------------
... .. ................. , ......... , .......... , ......... , ......................... , .. " .. ~ .......... , .... , ............ 
::: •'• 3-POINT PRE-wEAVE :~~ .,. 
•'· 
........................................................................................ 
.,. ....... , ..... ,. ................. , .. "'f"' ............. , ............. , ...... , .. ~ ....... 'f" ~ ... ~, ........ '•" 
s I( p 2 LDD lO,X 
ADOO 2 0, X 
~cs JMPl 
CMPO #65521 
13 LO JM::>Z 
JMPl ADDD fJ 1 5 
I J ~1 P 2 STD T 1·1 P 1 
I aoo~ ' X 
I Cl, c s Jr-IP3 
I C~PD 1165521 
I ;:, 
BLO JMP4 I PLO Jt~Pl7 
Jt·1P3 ADDD #l5 IJMP16 ~JDD ttl5 
JMP4 STD 
' X IJMP17 STD TMP1 
LDD lO,X I ADOD 6,X 
su~.o 20,X I scs JMP18 
BCC JMP5 C ~~PO ~65521 
ADDD lt65521 BLO JMP19 
Jl"IP5 STD zo,x J"1P18 ADDD ~15 
LDD TMPl JMP19 STD 6,X 
STD 10 9 X LOO 16,X 
LOD 12,X SUBD 26,X 
ADDD zz,x RCC JMP20 
scs JMP6 :.ooo· ~65521 
CMPO ::65521 J~1P20 STD 26 9 X 
SLO JMP7 LDO TMP1 
J,.IP6 ADDD it15 STD 16 9 X 
JI"IP7 STD TMPl LOO 18,X 
ADOD 2, X ADDD 28,X 
BCS JMP8 scs J/o1P21 
Ct-IPD .it65521 CMPO #65521 
BLO JMP9 BLO Jt~P22 
J,.1P8 ADDO .1115 JMP21 AOOO #15 
Jt-1P9 STD 2, X J~P22 STD H1Pl 
LDD 12,X A ODD 8, X 
SUBO zz,x RCS JMP23 
BCC JMP10 Cf-IPO H65521 
ADDD #65521 BLO JMP24 
JMPlO STD 22,X J!\IPZ 3 ADDD tt15 
LDO Tr~ p 1' JMP24 STQ 8, X 
STD 12,X LDD 18,X 
LDD l4 9 X SUBD 28,X 
ADOD 24 9 X sec JNP25 
scs J t~ p 11 AOOD ti65521 
cr~PD #65521 Jr1P25 STO 28,X 
BLO JMP12 LDO THPl 
Jf-1Pll ADDD tt15 STD 18,X 
Jt·IP12 STO TMP1 •'• ··~~~·~~~···~·~~~~~·~~ .,. ¥¥ ¥¥¥¥¥¥¥-¥¥¥¥¥¥ ¥---
ADOD 4,X ... .. . 5-POitH P~E-WEAVE :~:: . ,. .,. 
8CS J~1P13 I ... ~~···~~·~~··~~-··~~·~~  ,. ....................... , .... , .... , .... , ...............................................................................
C r~PO #65521 LOY Ill 
BLO JMP14 LDD 2 ' l( 
JHP13 ADDO #15 AODD 8 t X 
JMP14 STD 4,X BCS JMP26 
LDD l4,X CMPD tt65521 
SUBD 24,X 8LO JMP27 
BCC JM 0 15 JMP26 ADDD 1115 
ADDD lt65521 JIW27 STO 2,v 
JI'1Pl5 STO 24,X LDO 2 , X 
LDD TMP1 sur.o 8, X 
STD 14, X 13CC JMP28 
LDD 16, X ..".ODD #65521 
AODD 26,X JMP28 STD f..,Y 
BCj Jr-1Pl6 LOD 4 , X 
CMPLJ lf65521 AD no 6, X 
BCS JMP29 IJMP42 ADDD 1115 
CMPO #65521 IJ"'1P43 STO 16,Y 
BLO JMP30 I LOD 16,X 
Jt-tP29 ADDD #15 I SUBD 14,X 
Jt-IP 30 STD 4,Y I ace JMP44 
LOO 6,X I ~DOD 1165521 
SUBD 4,X IJ~P44 STD 22,Y 
BCC JMP31 I AOOD 18,Y 
ADDD #65521 I BCS JMP45 
Jl"iP31 STD lO,Y I CMPD 1165521 
.C\000 6,V I BLO JMP46 
BCS JMP32 IJ'~P45 ADOD #15 
Ct~PO #65521 IJMP46 STD 20,Y 
BLO JMP33 I LOD 16,Y 
JNP32 ADD:::l #15 I ADOD 14,Y 
Jt•lP33 STO s,v I BCS JI-IP4 7 
LDD 4,Y I CMPO tt65521 
ADDU 2, y I BLO JMP43 
BCS JMP34 IJMP47 A:JDO 1115 
C ~~PO •65521 IJ"'1P48 STD TMPl 
BLO JMP35 I ADDO 10,x 
JNP34 AODD 1tl5 I ~cs JMP49 
Jt-1P35 STD TMPl I Ct~PO 1165521 
AODD 
' X I BLO JMP50 
BCS JMP36 IJMP49 ADDD U5 
CMPO #65521 IJMPSO STO 1 2' "( 
SLO JMP37 I LDD l4,Y 
JNP36 ADDD #15 I SUSD 16,Y 
Ji-'t p 3 7 STD 'y I BCC JI-IP 51 
LOO 2,Y I ADDD #65521 
SUBD 4,Y IJ'~P51 STD 16' y 
BCC . JMP38 I LDD TMD1 
ADDD #65521 I STD 14 9 Y 
J/'1P3 8 STQ 4,Y , ... .,. 
LDD TMPl I LDD 22,X 
STD 2,Y I AODD 28,X 
....... I 8CS J r~ P 52 . ,. 
LDD 12,X I CM 0 0 j#65521 
AODD 18,X I BLO JMP53 
e.cs JMP39 J ~1P 52 ADDD 1115 
CMPO #65521 JMP53 STD 26,Y· 
BLO JMD4Q LDD 2 2, X 
Jl'-1P39 AJDD F115 SUSD 28 9 X 
J 11 P40 STD 14 9 Y BCC JMP54 
LDD 12,X ADDD ~65521 
SUBD 1 8 ·,X JMP54 STD 30 ,y 
BCC J ~~ p 41 L DD 24d 
ADDO t165521 ~ODD 26,X 
J 1·1 p 41 STD 18,Y ?.CS JMP56 
LDD l4,X 01PO il65521 
AODO 16,X oLD JMP57 
BCS JM 0 42 JMP56 AODD !il5 
(;-1 p 0 1t65521 Jt.1P 57 STD 2 s ,y 
RLO JMP43 LDD ?.6,X 
SUBD 24,X STD 2 'u 
sec JMP58 LOA 
' X 
ADDD #65521 LOB 1 ' y 
JMP58 STIJ 34,Y ~~ UL 
ADOD 30,Y AODO 1 'u 
~es JMP59 STO 1 'u 
e ~,PO 1165521 gee SKIP3 
BLO J ~1 p 60 ! Ne ,u 
JI'1P5 9 ADDD #15 SKIP3 LOA 1, X 
JMP60 STD 32,Y LOB ' y 
LOD 26,Y ~~UL 
AC)OO 28,Y ADOD 1 'u. 
BCS JMP61 STO 1 'u 
C~1PD #65521 gee SKIP4 
BLO Jt~P62 INC 'u 
Jt-IP61 ADOD #15 SKH'4 LOA ' X 
Jtv1P62 STO TMP1 LOB ' y 
ADOO zo,x MUL 
BCS JMP63 AODO ,u 
U~PD tt65521 STD ,u 
[jLQ JMP64 I:~ 
JHP63 A.JOD #15 I LOA 1 'u 
Jtv1P64 STD 24,Y I L 09 ,!115 
LDD 26,Y I MUL 
S U BD 28,Y I AODO 2,U 
BCe JMP65 I BCS SKIP6 
AODD 1165521 I C I~ PO #65521 
JNP65 STD 28,Y I P.,LO SKIP7 
LOO H1P1 ISKIP6 ADDO 1115 
STIJ 26 ,.'( ISKIP7 STD 2,U 
•'• ~······--·--~--------- I LD~ ,u ..• ¥¥¥¥¥¥¥~¥¥-¥¥~¥¥¥¥¥¥¥¥ 
·'· 
... MULTIPLICATION ... I LOY :II T ~ i·IP . ,. ... . .. 
::: 
--------·--·----------
¥¥¥¥~¥¥¥¥¥¥¥¥¥¥¥¥¥¥¥¥~ I CLD. ' y 
CLRA I CLR .1,Y 
STA IND I CLR 2 'y 
L:lS #Z I LOB li15 
~OOP LOA FRO I MUL 
REQ OV~Rl ISKIPA STO ' y 
LOY ~CC1EFR I LOA ' y 
BRA ov::R2 I B::Q SKIP~ 
OVERl LOY ;,coEFF I LOB ~15 
OVER2 LOA IND I MIJL 
LDO A,Y ! ADCD 1 ' y 
STO MLTR I BRA SKIPQ 
LDD 
' s ISK!PE LDD 1 ' y 
STO MLTN ISKIPO AOOO 2,u 
LOX #MLTR I BeS SKIPS 
LOY #MLTN I CMPD 1165521 
LOU #PROD1 I RLO SK!PC 
CLR ,u ISKIPo ADOO :t 1 5 
CLR 1 'u ISKIPC STD ,S++ 
LOA 1 ' )( I LD.l IND 
LOB 1 'y I AODA 112 
r~UL I STA IN 0 
CMPA 1134 BCS JMP78 
L BLS LOOP U1PD 1165521 
·'· ~~~~~~~~-~·~·-~-~-~~-· BLO JMD79 .,. ¥¥¥ ¥¥¥ ~-~¥¥¥¥¥¥¥¥¥¥¥
~( •'• 5-PO!NT POST-WEAVF: ·'• JMP78 .A ODD tt15 .,. .,.
•'• ~········~·-~·------·· JI-1P7 9 STD TMPl .,. ¥¥¥¥¥¥¥¥¥¥¥¥¥¥¥¥¥-¥¥¥¥
L :J X ~AX LDO z,x 
L DY #Z SUBD 8, X 
L DO 
' y BCC JM 0 8(1 
STD 
' X AODD ,1t65521 
ADDD z,v J ~1 P8 0 STD P,x 
BCS Jt·1P67 LDD TMP1 
CMPO ~J65521 STO 2, X 
BLD JMP68 •'• .,.
J~IP67 AODD #15 LDD 12 9 Y 
Jtv1P68 STO 2, X STO 10,X 
LDD 8 'y AOOD 14 ,.,. 
ADDD lO,Y BCS JUP67 
scs JMP69 CMPO :t65521 
CMPD ti65521 BLO JUP6S 
BLD JMP70 JUP67 AODD li15 
JI'-IP69 ADDD ~15 JUP68 STD 12,X 
JI·1P70 STO lO,Y LDD 2 0 ,y 
LDO 6,Y AOOD 22,Y 
suso 8,Y ~cs JUP69 
BCC JMP71 CMPO #t6552l 
ADOD #t65521 BLO JUP70 
J ~1 P71 STD 8,X JUP69 ADDD It 1 5 
LDO 2, X JUP70 ST:J 22,Y 
ADDD 4 9 Y LOD 18,Y 
BCS JMP72 SU3D 20,Y 
CMPD ti65521 BCC JUP71 
BLO JI-IP 73 ADOD ~65521 
J~1P72 AODD #15 JUP71 STO 1 3, X 
Jt·IP7 3 STO HW1 LDD 12, X 
LDO z,x ADOD 16,Y 
SUBD 4 9 Y BC S JUP72 
BCC JMP74 CMPD t165521 
AOOO ~65521 PLO JUP73 
Jt-1P74 STD 4,X JIJP72 ADDD JilS 
SUBD lO,Y JUP7 3 STO HIDl 
BCC JMP75 LDD 12,X 
ADDD #65521 SUBD lt>,Y 
JtviP7 5 STD 6,X BCC JUP74 
LDD TMPl AOOD t165521 
STD 2, X JUP74 srn 1 4, X 
LDD 4 9 X SU~D 22,Y 
AD~D 1 0 ' y HCC JUP F· 
nCS JMP76 Anno 116~'<' 1 
CMPU ~65521 JUJ-175 STQ 16 9 X 
BLO Jt,1P77 LDO TMPl 
J~1P76 ADDO ttl5 STD 1 2 ')( 
JI'1P77 STD 4 9 X LDD 14,X 
LDD 2 , X .4 ODD 22,Y 
ADDD 8 ' )( P.CS JUP76 
cr~PD #65521 IS'<P75 STD 26, X 
SLO JUP77 I L!JD TMPl 
JUP76 ADDD #15 I STD 22,X 
JUP77 STD 14 ,.x I LDD 24,X 
L DD 12,X I .t~DDD 34 ,y 
ADDD 18,X I BCS St<P76 
BCS JUP78 I CMPO it65521 
U1PD #65521 I PLO SKP77 
BLO JUP79 ISKP76 l\000 II 1 5 
JUP78 ADOD ft15 SKP77 STO 24,X 
JUP79 STD TMP1 LDD 22,X 
LDD 12,X ADDD 28,X 
SUBD 18, X 8CS SKP7 .c: 
nCC JUP80 CMDD it65521 
ADOD .~65521 BLO SK 0 79 
JUP80 STD 1 8, X SKP78 ~DOD 1115 
LQD Tt"P1 SKP79 STD TMP1 
STD 12,X LDD 2 2, X 
... SUBD zs,x . ,. 
LDD 24,Y BCC SK 0 80 
STD 20,X. ADDD it65521 
ADDD 26,Y SKP80 STD 28,X 
scs SKP67 LDD TMPl 
CMPO 1'165521 ~ I STO 22,X 
BLO SKP68 I•'• ~~···~~··~~···~···~·~~ ... ¥¥¥¥¥¥¥¥¥ ¥¥¥¥¥¥¥¥¥¥
SKP67 AOOD #15 I ... J • 3-POINT POST-WE liVE •'• ,. . ,. ., . 
SK P68 STD 22,X I•'• ~·~·-~·~~·--~-------·· .,. -~-¥--¥·-·------~·-·--
LOD 32,Y I LDD 'X 
ADDD 34,Y I ADOO 10, X 
8CS SKP69 I BCS J/>1 °81 
01PD :!*65521 I Cr~tPD 1165521 
BLO SKP70 ! RLO Jt·1 P8 2 
SKP69 ADOD ~15 IJ~P81 AODD i:l5 
SKP70 STO 34,Y I J/~P8 2 STD 1Q,X 
LDD 30,Y I LDC' 2, X 
sugo 32,Y I ADDD 12, X 
BCC SKP71 I ~cs JI~P93 
ADDD 1165521 I CMPD #65521 
SK P71 STD 28,X I BLC JMP84 
LOD 22,X IJ~·1P83 ADDD ~15 
AOOD 28,Y IJ"''P84 STD 12,X 
scs SKP72 I L DO 4,X 
CMPD #65521 I ADDD 14,X 
BLO SKP73 I acs JMP85 
SKP72 ADDD #15 I CMPD #65521 
~KP73 STO TMP1 I f:lLO JMP86 
LDO 22,X I J t~P 8 5 ADDD .t' 1 5 
SUBD 28,Y IJMP86 STD 14, X 
BCC SKP74 I L DO 6, X 
Anoo 1165521 I 1\000 1 f:, , X 
SKP74 STD 24,X I P-CS JMP 37 
SURD 34,Y I (;~P() ttb5521 
BCC SKP75 I bLO J ~~ p i3 8 
ADQD 1:65521 IJ"'P87 · C. ODD It 1 5 
Jt-IP88 STD 16,X I BLO JMP99 
LDD 8, X IJMP98 ADOD #15 
ADDD 18,X IJMP99 STD TMP1 
BCS JMP89 I LDD 16,X 
CMPD ~65521 I SUBD 26,X 
BLO JMP90 I 8CC JMPlOO 
Jl'tP89 ADDD #15 I ADOD ~65521 
JI'-1P90 STD 18,X IJ~1P100 ST') 26,X 
LDO 10, X I LDD H1Pl 
AD:JD zo,x I STD 16,X 
BCS JMP91 I LDD 18, X 
Ct-'1 p 0 *65521 I ADDD 28,X 
BLO JM 0 92 I BCS JM 0 101 
JMP91 ADOD 1!15 I C ~~ F' D rt65521 
JMP92 STD Tl~ o 1 I !:1-LO J ~·IP 10 2 
L:JD 10, X IJMP101 ADDD 1115 
SUBD zo,x IJMP102 STD THD1 
BCC Jt-1 p 911 I LDO 18,X 
AODD #65521 I SUBD 28,X 
J t-1 p 911 STD 20,X I sec JMP103 
LDO TMP1 I ADDD ~65521 
STD lO,X IJMP103 STD 28,X 
LOD 12,X I LDD TMP 1· 
ADDO 22,X I STD 18,X 
scs JI-1P922 I~: ·~~~~·········~·~···~~ r-¥¥¥¥¥¥¥¥-¥¥-¥¥¥¥¥¥¥¥
CMPD ~65521 I ... ·'• OUTPUT 3HUFFLE ·'· , . .. . .. 
BLO JMP93 I•'· ··~···~··········~··~· .,. .., ........ " .... ,. .. , ......................... , .... , .... , ............................................ , .... , .. 
JMP922 AODD .It 15 I L JX #AX 
Jr·tP93 STD Tt-1 P 1 I LOY t~QUT 
LDD 12,X I LDO ' X 
SUBD 22,X I STD 'y 
BCC Jt-1P94 I LDD 12, X 
ADDD #65521 I STD 2 'y 
J 1·1P 94 STD 22,X I LDO 24,X 
LDD Tt-1.Pl I STD 4' y 
STD 12,X I LDD 6,X 
LDO 14,X I srn 6,Y 
ADDD 24,X r LDO 1 8, X 
P.CS JMP95 I STD 8' y 
CMPD ~65521 I LDD 2 0, X 
BL!J JMP96 I STD lO,Y 
JMP95 ADOD #15 I L DO 2, X 
J f·IP96 STD TMPl I STD 12tY 
LDD 14,X I L 0 [' . 14,X 
SUBD 24,X I STD 14,Y 
BCC JMP97 I LDD 26,X 
ADO!:> #65521 I SiD l6,Y 
JMP97 STO 24,X I LDC' B, X 
LOQ .H1Pl I STC 1 8 ' y 
STO 14,X I LDO 10, X 
LDD 16,X I s:o 20 ,y 
~DOD 26,X I LDO 22,X 
ecs JMP98 I STO 2 2 'y 
CMPD :t6"i521 I LDD 4 , X 
STD 24,Y I ~DB 16087,29032,8748 
LDO 16,X I ~0~ 23174,43615,1465 
ST[) 26,Y ICOEFR c:oe, 61153,5460,18364 
LDD 28,X I FD8 46773~20640,5493 
STO 28,Y I FD3 6552,57331,37975 
•'· I FOB 28122,34561;24521 ...
COEFF FOB 1,6379,13376 I FOB 29504,28641,12521 
FOB 19136,18005,486471 FJ8 5913,Z474S,21938 
FOB 32759,8192,45457 I END STRT 
FOB 36817,5753,25311 ! ::: 
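The INPUT REORDERING and OUTPUT SHUFFLE sections of the WINO15 listing move 16-bit words between fixed byte offsets. Read as word indices, those offsets are consistent with a 3 x 5 index mapping for a 15-point transform. The Python sketch below reproduces the two orderings; the closed-form expressions and the function names are inferred from the offsets and are assumptions, not formulas stated in the thesis.

```python
N1, N2 = 3, 5            # factor lengths of the 15-point transform
N = N1 * N2


def input_order() -> list[int]:
    """Input sample index loaded into working slot 5*a + b."""
    return [(5 * a + 3 * b) % N for a in range(N1) for b in range(N2)]


def output_order() -> list[int]:
    """Working slot whose word is stored as output sample k."""
    return [5 * (k % N1) + (k % N2) for k in range(N)]


if __name__ == "__main__":
    print(input_order())
    print(output_order())
```

For example, the listing loads byte offset 6 of the input array (word 3) into byte offset 2 of the working array (slot 1), which matches input_order()[1] == 3.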
Appendix-E
Backplane wiring connections for the parallel microprocessor 
system 
Backplane pin connections for the parallel
microprocessor system.
Flow of data from SOURCE (TX) -> DESTINATION (RX)
BOARD NO A - (PROCESSOR NO 1 2 3)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION       PROCESSOR#
                                  |
1-8      DATA IN    1 2 3         | 1-8        RX             8 -> 3
9-14     CLOCK      1 2 3         | 9-10       CLOCK          8 -> 3
15-22    DATA OUT   1 2 3         | 11-18      RX             4 -> 3
23-28    OE         1 2 3         | 19-20      CLOCK          4 -> 3
29-36    TX         1 -> 6        |
37-38    CLOCK      1 -> 6        | 74         STATUS OUT
39-46    RX         6 -> 1        | 75         SYNC OUT
47-48    CLOCK      6 -> 1        | 76         SYNC IN
49-56    TX         2 -> 7        | 77         SYSTEM CLOCK
57-60    CLOCK      2 -> 7        | 78         HALT
49-56    TX         2 -> 5        | 79         RESET
57-60    CLOCK      2 -> 5        | 29-61-93   +VCC
61-68    RX         7 -> 2        | 32-64-96   GROUND
69-70    CLOCK      7 -> 2        |
71-78    RX         5 -> 2        |
79-80    CLOCK      5 -> 2        |
81-88    TX         3 -> 8        |
89-92    CLOCK      3 -> 8        |
81-88    TX         3 -> 4        |
89-92    CLOCK      3 -> 4        |
BOARD NO B - (PROCESSOR 4 5 16)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION
                                  |
1-8      DATA IN    4 5           | 74         STATUS OUT
9-12     CLOCK      4 5           | 75         SYNC OUT
13-20    DATA OUT   4 5           | 76         SYNC IN
21-24    OE         4 5           | 77         SYSTEM CLOCK
25-32    TX         4 -> 9        | 78         HALT
33-36    CLOCK      4 -> 9        | 79         RESET
25-32    TX         4 -> 3        | 29-61-93   +VCC
33-36    CLOCK      4 -> 3        | 32-64-96   GROUND
37-44    RX         9 -> 4        |
45-46    CLOCK      9 -> 4        |
47-54    RX         3 -> 4        |
55-56    CLOCK      3 -> 4        |
57-64    TX         5 -> 10       |
65-68    CLOCK      5 -> 10       |
57-64    TX         5 -> 2        |
65-68    CLOCK      5 -> 2        |
69-76    RX         10 -> 5       |
77-78    CLOCK      10 -> 5       |
79-86    RX         2 -> 5        |
87-88    CLOCK      2 -> 5        |
BOARD NO C - (PROCESSOR 6 7 8)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION       PROCESSOR#
                                  |
1-8      DATA IN    6 7 8         | 1-8        RX             10 -> 7
9-14     CLOCK      6 7 8         | 9-10       CLOCK          10 -> 7
15-22    DATA OUT   6 7 8         | 11-18      TX             8 -> 13
23-28    OE         6 7 8         | 19-24      CLOCK          8 -> 13
29-36    TX         6 -> 11       | 11-18      TX             8 -> 3
37-40    CLOCK      6 -> 11       | 19-24      CLOCK          8 -> 3
29-36    TX         6 -> 1        | 11-18      TX             8 -> 9
37-40    CLOCK      6 -> 1        | 19-24      CLOCK          8 -> 9
41-48    RX         11 -> 6       | 25-32      RX             13 -> 8
49-50    CLOCK      11 -> 6       | 33-34      CLOCK          13 -> 8
51-58    RX         1 -> 6        | 35-42      RX             3 -> 8
59-60    CLOCK      1 -> 6        | 43-44      CLOCK          3 -> 8
61-68    TX         7 -> 12       | 45-52      RX             9 -> 8
69-74    CLOCK      7 -> 12       | 53-54      CLOCK          9 -> 8
61-68    TX         7 -> 2        |
69-74    CLOCK      7 -> 2        | 74         STATUS OUT
61-68    TX         7 -> 10       | 75         SYNC OUT
69-74    CLOCK      7 -> 10       | 76         SYNC IN
75-82    RX         12 -> 7       | 77         SYSTEM CLOCK
83-84    CLOCK      12 -> 7       | 78         HALT
85-92    RX         2 -> 7        | 79         RESET
93-94    CLOCK      2 -> 7        | 29-61-93   +VCC
                                  | 32-64-96   GROUND
BOARD NO D - (PROCESSOR 9 10 17)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION       PROCESSOR#
                                  |
1-8      DATA IN    9 10          | 1-8        RX             5 -> 10
9-12     CLOCK      9 10          | 9-10       CLOCK          5 -> 10
13-20    DATA OUT   9 10          | 11-18      RX             7 -> 10
21-24    OE         9 10          | 19-20      CLOCK          7 -> 10
25-32    TX         9 -> 14       |
33-38    CLOCK      9 -> 14       | 74         STATUS OUT
25-32    TX         9 -> 4        | 75         SYNC OUT
33-38    CLOCK      9 -> 4        | 76         SYNC IN
25-32    TX         9 -> 8        | 77         SYSTEM CLOCK
33-38    CLOCK      9 -> 8        | 78         HALT
39-46    RX         14 -> 9       | 79         RESET
47-48    CLOCK      14 -> 9       | 29-61-93   +VCC
49-56    RX         4 -> 9        | 32-64-96   GROUND
57-58    CLOCK      4 -> 9        |
59-66    RX         8 -> 9        |
67-68    CLOCK      8 -> 9        |
69-76    TX         10 -> 15      |
77-82    CLOCK      10 -> 15      |
69-76    TX         10 -> 5       |
77-82    CLOCK      10 -> 5       |
69-76    TX         10 -> 7       |
77-82    CLOCK      10 -> 7       |
83-90    RX         15 -> 10      |
91-92    CLOCK      15 -> 10      |
BOARD NO E - (PROCESSOR 11 12 13)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION       PROCESSOR#
                                  |
1-8      DATA IN    11 12 13      | 1-8        RX             8 -> 13
9-14     CLOCK      11 12 13      | 9-10       CLOCK          8 -> 13
15-22    DATA OUT   11 12 13      | 11-18      RX             14 -> 13
23-28    OE         11 12 13      | 19-20      CLOCK          14 -> 13
29-36    TX         11 -> 6       |
37-38    CLOCK      11 -> 6       | 74         STATUS OUT
39-46    RX         6 -> 11       | 75         SYNC OUT
47-48    CLOCK      6 -> 11       | 76         SYNC IN
49-56    TX         12 -> 7       | 77         SYSTEM CLOCK
57-60    CLOCK      12 -> 7       | 78         HALT
49-56    TX         12 -> 15      | 79         RESET
57-60    CLOCK      12 -> 15      | 29-61-93   +VCC
61-68    RX         7 -> 12       | 32-64-96   GROUND
69-70    CLOCK      7 -> 12       |
71-78    RX         15 -> 12      |
79-80    CLOCK      15 -> 12      |
81-88    TX         13 -> 8       |
89-92    CLOCK      13 -> 8       |
81-88    TX         13 -> 14      |
89-92    CLOCK      13 -> 14      |
BOARD NO F - (PROCESSOR 14 15 18)
             SIDE A               |             SIDE B
PIN#     FUNCTION   PROCESSOR#    | PIN#       FUNCTION
                                  |
1-8      DATA IN    14 15         | 74         STATUS OUT
9-12     CLOCK      14 15         | 75         SYNC OUT
13-20    DATA OUT   14 15         | 76         SYNC IN
21-24    OE         14 15         | 77         SYSTEM CLOCK
25-32    TX         14 -> 9       | 29-61-93   +VCC
33-36    CLOCK      14 -> 9       | 32-64-96   GROUND
25-32    TX         14 -> 13      |
33-36    CLOCK      14 -> 13      |
37-44    RX         9 -> 14       |
45-46    CLOCK      9 -> 14       |
47-54    RX         13 -> 14      |
55-56    CLOCK      13 -> 14      |
57-64    TX         15 -> 10      |
65-68    CLOCK      15 -> 10      |
57-64    TX         15 -> 12      |
65-68    CLOCK      15 -> 12      |
69-76    RX         10 -> 15      |
77-78    CLOCK      10 -> 15      |
79-86    RX         12 -> 15      |
87-88    CLOCK      12 -> 15      |
CONTROL BOARD
SIDE A
PIN#     FUNCTION   PROCESSOR#      PIN#       FUNCTION       PROCESSOR#
1-3 DATA OUT 47-54 DATil IN 
17-18 CLOCK. 1 63-64 CE 1 
1':1-20 CLOCK 4 65-66 Of: 7 
21-22 CLCJCK 7 67-68 Of 1 3 
23-24 CLOCK 10 69-70 Of 4 
25-26 CLOCK 13 71-72 Of. 10 
27-28 CLOCK 6 73-74 c,.. ,_ 11 
29-30 CLJCK 9 75-76 DE 2 
31-32 CLOCK 12 77-78 o: 8 
33-34 CLOCK 1 5 79-80 CE 14 
35-36 CLOCK 3 181-82 GE 5 
37-38 CLOCK 11 183-34' CE 6 
39-40 CLOO, 14 185-86 OF 12 
41-42 CLOCI< .,. I s 7.- 8 8 c: 3 '-
43-44 C L 0 C!<. 5 189-90 OE 9 
4;i-46 CLOCK 3 191-92 r--'- c;· 1 5 
SIDE B
1-6        STATUS IN
7-12       SYNC IN
13         SYNC OUT
14-19      SYSTEM CLOCK OUTPUT TO PROCESSORS
20         RESET TO OTHER BOARDS
21         HALT TO OTHER BOARDS
27         -9V FOR RS-232 RX
29-61-93   +VCC 5V POWER FOR ALL BOARDS
32-64-96   GROUND
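Each TX entry in the board tables above should have a matching RX entry on the destination processor's board. The Python sketch below records the TX links as they can be read from the tables (several digits are garbled in the scanned copy and were inferred from the corresponding RX entries, so the data should be treated as an assumption) and checks that every link is reciprocated. The names TX_LINKS and unmatched_links are hypothetical.

```python
# Directed TX links between array processors, as read from the board tables.
# Garbled digits were inferred from the matching RX rows; treat as an assumption.
TX_LINKS = {
    1: [6],          2: [7, 5],       3: [8, 4],
    4: [9, 3],       5: [10, 2],      6: [11, 1],
    7: [12, 2, 10],  8: [13, 3, 9],   9: [14, 4, 8],
    10: [15, 5, 7],  11: [6],         12: [7, 15],
    13: [8, 14],     14: [9, 13],     15: [10, 12],
}


def unmatched_links(links: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Return (source, destination) pairs with no link in the reverse direction."""
    return [
        (src, dst)
        for src, dsts in links.items()
        for dst in dsts
        if src not in links.get(dst, [])
    ]


if __name__ == "__main__":
    print(unmatched_links(TX_LINKS))
```

An empty result means that, under this reading of the tables, every processor pair is wired in both directions.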
(1) Bergland, G.D.: 'Fast Fourier Transform Hardware Implementation
- An Overview', IEEE Trans. Audio Electroacoustics, Jun. 1969,
vol. AU-17, pp. 104-108.
(2) Brigham, E.O.: The Fast Fourier Transform, Prentice Hall
Inc., Englewood Cliffs, N.J., 1974.
(3) Winograd, S.: 'On Computing the Discrete Fourier Transform',
Math. Comput., 1978, vol. 32, pp. 175-199.
(4) Winograd, S.: 'On Computing the Discrete Fourier Transform',
Proc. Nat. Acad. Sci., 1976, vol. 73, pp. 1005-1006.
(5) Martin, S.C.P.: Number Theoretic Transform Implementation
using Microprocessors, Ph.D. thesis, 1980, Univ. of Durham.
(6) Martin, S.C.P., and Stanier, B.J.: 'Microprocessor
Implementation of Number Theoretic Transforms', Electron.
Cir. and Syst., Jan. 1979, vol. 3, pp. 21-26.
(7) McClellan, J.H., and Rader, C.M.: Number Theory in Digital
Signal Processing, Prentice Hall Inc., Englewood Cliffs,
N.J., 1979.
(8) Cooley, J.W., and Tukey, J.W.: 'An Algorithm for the Machine
Calculation of Complex Fourier Series', Math. Comput., 1965,
vol. 19, pp. 297-301.
(9) Agarwal, R.C., and Burrus, C.S.: 'Number Theoretic Transforms
to Implement Fast Digital Convolution', Proc. IEEE, 1975, vol. 63,
pp. 550-560.
(10) Vanwormhoudt, M.C.: 'On Number Theoretic Fourier Transform in
Residue Class Rings', Corresp., IEEE Trans., 1977, vol. ASSP-25,
pp. 585-586.
(11) Leibowitz, L.M.: 'Fast Convolution by Number Theoretic 
Transforms', NRL Report 7924, Sept. 1975. 
(12) Rader, C.M.: 'Discrete Convolutions via Mersenne Transforms',
IEEE Trans. Comput., 1972, vol. C-21, pp. 1269-1273.
(13) Agarwal, R.C., and Burrus, C.S.: 'Fast Convolution Using
Fermat Number Transform with Applications to Digital
Filtering', IEEE Trans., 1974, vol. ASSP-22, pp. 87-97.
(14) Leibowitz, L.M.: 'A Simplified Arithmetic for the Fermat
Number Transform', IEEE Trans., 1976, vol. ASSP-24,
pp. 356-359.
(15) McClellan, J.H.: 'Hardware Realisation of a Fermat Number
Transform', IEEE Trans., 1976, vol. ASSP-24, pp. 216-225.
(16) Bywater, R.E.H.: Hardware/Software Design of Digital 
Systems, Prentice Hall Inc. Englewood Cliffs, N.J., 1981. 
(17) Lewin, D.: Theory and Design of Digital Computers, Thomas 
Nelson and Sons Ltd., 1972. 
(18) Parasuraman, B.: 'Hardware Multiplication Techniques for 
Microprocessor Systems', Computer Design, 1977, pp. 75-82. 
(19) Harman, M.G.: 'An Attempt to Design an Improved
Multiplication System', IEEE Trans. Comput., 1968, vol. C-17,
pp. 1090.
(20) Rabiner, L.R., and Gold, B.: Theory and Application of 
Digital Signal Processing, Prentice Hall Inc. Englewood Cliffs, 
N.J., 1975. 
(21) Chu, Y;: Digital Computer Design Fundamentals, McGraw 
Hill, 1962. 
(2 2) Hayes, J.P.: Computer Architecture and Organisation, McGraw 
Hill, Kogakusha Ltd., 1978. 
(23) Gosling, J.B.: Design of Arithmetic Units for Digital 
Computers, Macmillan Press Ltd., London, 1980. 
(24) Nussbaumer, H.J.: 'Fast Multipliers for Number Theoretic 
Transforms', IEEE Trans. C-27, Aug. 1978, pp. 764-765. 
(25) Brubaker, T.A., and Becker, J.C.: 'Multiplication Using 
Logarithms Implemented with Read-Only Memory', IEEE Trans. 
Comput., vol. C-24, pp. 761-765. 
(26) Chang, T.: 'Binary Read-Only-Memory Multiplier', 
Electron. Lett., 13 Dec. 1973, vol. 9, pp. 580-581. 
(27) Johnson, N.: 'Improved Binary Multiplication System', 
Electron. Lett., 11 Jan. 1973, vol. 9, pp. 6-7. 
(28) Davies, A.C.: 'Trade-offs in Fixed-Point Multiplication 
Algorithms for Microprocessors', Comput. and Dig. Techniques, 
1979, vol. 2, pp. 105-112. 
(29) Weed, M.: 'Clockless Multiplication and Division Circuits', 
BYTE, Dec. 1978, pp. 128-136. 
(30) Artwick, B.A.: Microcomputer Interfacing, Prentice Hall Inc., 
Englewood Cliffs, N.J., 1980. 
(31) Davies, A.C., Fung, Y.T.: 'Interfacing a Hardware Multiplier 
to a General-Purpose Microprocessor', Microprocessors, 1977, 
vol. 1, pp. 425-432. 
(32) Evanczuk, S.: 'Josephson Chip Multiplies Ultra Fast', 
Electronics, 14 July, 1982, pp. 48-50. 
(33) Bate, J., and Burkowski, F.: 'A High Speed Extended 
Precision Multiplier for a Microprocessor', Proc. Int. 
Symp. on Mini and Micro Computers, Montreal, Canada, 11-18 
Nov. 1977, pp. 10-13. 
(34) Robinson, D.: 'Hardware Multiplier/Divider Unit for 8-bit 
Microprocessor Systems', New Electronics (G.B), Mar. 1979, 
vol. 12, pp. 20. 
(35) Mick, J., and Springer, J.: 'An Integrated Circuit, 
High-Speed Serial-Parallel Multiplier', Apr. 1976, pp. 42, 46. 
(36) Rollenhagen, D.C., Kimball, R.M., and Shay, H.P.: 'LSI 
Multiplier-Divider for 8080', Proc. IEEE 1977 Nat. Aerospace 
and Electron. Conf., NAECON 1977, Dayton, Ohio, U.S.A., 
17-19 May, pp. 887-892. 
(37) Day, M.J.: 'Faster Multiply with Microprocessor Hardware 
Multiply Device', Electron (G.B.), 27 Feb. 1978, pp. 39. 
(38) Waser, S., Newton, V.: 'Increasing Multiplication Speed', 
Electron (G.B.), 12 Dec. 1977, pp. 57-58. 
(39) Rohr, P.: 'LSI Multipliers: The Second Generation', 1979 Int. 
Micro and Mini Computer Conf., Houston, Texas, U.S.A., Nov. 1979, 
pp. 140-143. 
(40) Ambikairajah, E., and Carey, M.J.: 'Technique for Performing 
Multiplication on a 16-bit Microprocessor using extension of 
Booth's Algorithm', Electron. Lett., 17 Jan. 1980, vol. 16, 
pp. 53-54. 
(41) Giest, D.J.: 'MOS Processor Picks up Speed with Bipolar 
Multipliers', Electronics (U.S.A.), July 1977, vol. 50, 
pp. 113-115. 
(42) '16*16-bit Multipliers meet Military/Commercial High Speed 
Applications', Comput. Des. (U.S.A.), Aug. 1976, vol. 15, 
pp. 50. 
(43) McCrea, P.G., and Matheson, W.S.: 'Design of High Speed 
Fully Serial Tree Multiplier', IEEE. Proc. Jan. 1981, 
vol. 128, pp. 13-20. 
(44) Advanced Micro Devices: 'AM25S558, Eight-bit by Eight-bit 
Combinational Multiplier', Preliminary data sheet. 
(45) Flores, I.: The Logic of Computer Arithmetic, Prentice 
Hall Inc., Englewood Cliffs, N.J., 1963. 
(46) Booth, A.D., and Booth, K.H.V.: Automatic Digital 
Calculators, Butterworth and Co. Ltd., London, 1965. 
(47) Abd-Alla, A.M., and Meltzer, A.C.: Principles of Digital 
Computer Design, Prentice Hall, Englewood Cliffs, N.J., 1976, 
vol. 1. 
(48) Renold, A.: Comparison of some 8-bit Microprocessors by 
means of Benchmark Programs, Mitt. Agon. (Switzerland), 
Oct. 1981, pp. 71-75. 
(49) Kolba, D.P., and Parks, T.W.: 'A Prime Factor FFT Algorithm 
Using High-Speed Convolution', IEEE Trans., vol. ASSP-25, 
Aug. 1977, pp. 281-294. 
(50) Morris, L.R.: 'A Comparative Study of Time Efficient FFT and 
WFTA Programs for General Purpose Computers', IEEE Trans., 
vol. ASSP-26, Apr. 1978, pp. 141-150. 
(51) Silverman, H.F.: 'An Introduction to Programming the Winograd 
Fourier Transform Algorithm (WFTA)', IEEE Trans., vol. ASSP-25, 
Apr. 1977, pp. 152-165. 
'Correction and an Addendum to an Introduction to Programming 
the Winograd Fourier Transform Algorithm (WFTA)', IEEE Trans. 
ASSP-26, 1978, pp. 268. 
(52) Nawab, H., and McClellan, J.H.: 'Bounds on the Minimum Number 
of Data Transfers in WFTA and FFT Programs', IEEE Trans., 
vol. ASSP-27, Aug. 1979, pp. 394-398. 
(53) Bailey, D.: 'Winograd's Algorithm Applied to Number-Theoretic 
Transforms', Electron. Lett., 1 Sept. 1977, vol. 13, pp. 548-549. 
(54) Texas Instruments Ltd.: TMS9900 Microprocessor Data Manual, 
Aug. 1976. 
(55) Texas Instruments Ltd.: TMS990/100M Microcomputer User's 
Guide, Mar. 1978. 
(56) Moore, C.H.: 'FORTH: A New Way to Program a 
Minicomputer', Astron. Astrophys. Suppl., 1974, vol. 15, 
pp. 497-511. 
(57) Brodie, L.: Starting Forth, Prentice Hall Inc., Englewood 
Cliffs N.J., 1981. 
(58) Smith, M.F.: 'Comparative Software Analysis of the MC6809 
Microprocessor', Microprocessors and Microsystems, vol. 5, 
Nov. 1981, pp. 401-404. 
(59) Mintzer, F.: 'Parallel and Cascade Microprocessor 
Implementation for Digital Signal Processing', IEEE Trans. 
ASSP-29, Oct. 1981, pp. 1018-1027. 
(60) Zohar, S.: 'Outline of a Fast Hardware Implementation 
of Winograd's DFT Algorithm', IEEE ICASSP, Apr. 1980, vol. 3, 
pp. 796-799. 
(61) Mintzer, F.: 'Attributes of Parallel and Cascade 
Microprocessor Implementations of Digital Signal Processing', 
IEEE Int. Conf. ASSP, April 1980, ICASSP, vol. 3, pp. 912-915. 
(62) Duff, M.J.B.: 'Array Processing', Electronics and Power 
Nov./Dec. 1980, pp. 888-893. 
(63) Bain, W.L., and Jump, J.R.: 'Hardware Scheduling Strategies 
for Systems with many Processors', Proc. Int. Conf. Parallel 
Processing, Bellaire, MI, U.S.A., Aug. 1978, pp. 184-187. 
(64) Bellm, H., and Sauer, A.: 'Methods of Data Exchange Between 
Microcomputers', Proc. Microprocessing and Microprogramming, 
Amsterdam, N. Holland, 3-6 Oct. 1977, Microcomputer Archit., 
pp. 16-22. 
(65) Arden, B.E., and Berenbaum, A.D.: 'A Multi-Microprocessor 
Computer System Architecture', Proc. 5th Symp. Operating 
System Principles, Operating System Rev., Nov. 1975, vol. 9, 
pp. 114-121. 
(66) Enslow, P.H.: Multiprocessor and Parallel Processing, 
John Wiley and Sons, 1974. 
(67) Pollard, L.H.: 'Multiprocessing with the TI9900', Eleventh 
Ann. Asilomar Conf. Circuits Systems and Computers, Pacific Grove, 
CA, U.S.A., 7-9 Nov. 1977, pp. 461-465. 
(68) Hoffner, Y., and Smith, M.F.: 'Communication Between two 
Microprocessors Through Common Memory', Microprocessors and 
Microsystems, July/Aug. 1982, vol. 6, pp. 303-308. 
(69) Witten, I.H., and Jenkins, R.L.: 'Processor-Processor 
Dialogue Through Existing Input-Output Channels', Computer 
and Digital Techniques, Oct. 1978, vol. 1, pp. 125-130. 
(70) Parkinson, D.: 'An Introduction to Array Processors', 
Syst. Int. (G.B), Nov. 1977, vol. 5, pp. 21-23. 
(71) Caprani, O., Jensen, K.H., and Ougaard, U.: 'Microprocessors 
Connected to a Common Memory', Microprocessor and 
Microprogramming Amsterdam, Netherland, 3-6 Oct. 1977, 
Euromicro Symp., Microcomputer Architec., pp. 175-181. 
(72) Hughes, P., and Doone, T.: 'Multiprocessor Systems', 
Syst. Int. (G.B), Feb. 1978, vol. 6, pp. 20-21. 
(73) Raphael, H.: 'Multiprocessor Techniques for uP Systems', 
Electron. Eng. (G.B), 1978, vol. 50, pp. 65-67. 
(74) Tanabe, K., and Matsumoto, K.: '16-bit Microprocessor 
with Dual Bus Architecture', Proc. Spring COMPCON, 1979, San 
Francisco, 26 Feb. to 1 Mar. 1979, N.Y., U.S.A., pp. 98-101. 
(75) Crushman, R.H.: 'uP/uC Chip Directory', EDN, Oct. 1979, 
pp. 133-240. 
(76) Crushman, R.H., and Bucker, J.: 'EDN Seventh Annual uP/uC 
Chip Directory', EDN, Nov. 1979, pp. 94-211. 
(77) Scales, H.: 'Multiprocessing with the Motorola's MC6809E', 
BYTE, Jul. 1981, pp. 136-156. 
(78) Leventhal, L.A.: 6809 Assembly Language Programming, 
Osborne McGraw Hill, 1981. 
(79) Motorola Semiconductors Ltd.: MC6809 Data Sheet. 
(80) Leibowitz, L.M.: 'A Binary Arithmetic for the Fermat Number 
Transform', NRL Report 7971, 18th Mar. 1976. 
(81) Gallacher, J.: 'Processor-Processor Communication', 
Microprocessors and Microsystems, Sept. 1979, vol. 3, 
pp. 317-320. 
(82) Fronheiser, K.: 'Device Operation and System Implementation 
of the Asynchronous Communications Interface Adapter (MC6850)', 
Motorola Semiconductors Ltd., Application Note AN-754. 
(83) Motorola Semiconductors Ltd.: MC6850 Data Sheet. 
(84) Wakerly, J.: 'Serial Communications', Microprocessors and 
Microsystems, 1981, vol. 5, pp. 247-253. 
(85) Motorola Semiconductors Ltd.: MC14411 Data Sheet. 
(86) Texas Instruments Ltd.: The TTL Data Book for Design 
Engineers. 
(87) Patel, J.H.: 'Processor-Memory Interconnection for 
Multiprocessors', Proc. 6th Ann. Symp. on Computers Architec., 
Philadelphia, PA, 23-25 Apr. 1979, N.Y., U.S.A, pp. 168-177. 
(88) Davidson, K.A., Parsons, R.L., et al.: 'Processor-to-Processor 
Inter-Communication Employing a Common Storage Module', 
IBM Tech. Disc. Bull., Mar. 1979, vol. 21, pp. 3959-3960. 
(89) Zaks, R.: Programming the Z80, 1982, SYBEX Inc. 
(90) Signetics: 8X300 Data Sheet. 
(91) National Semiconductors Ltd.: COP402 Data Sheet. 
(92) Zaks, R.: Programming the 6502, 1978, SYBEX Inc. 
