Application of the residue number system to the matrix multiplication problem by Chard, Gary Franklin
APPLICATION OF THE RESIDUE NUMBER SYSTEM 
TO THE MATRIX MULTIPLICATION PROBLEM 
A Thesis 
by 
GARY FRANKLIN CHARD 
Submitted to Office of Graduate Studies of 
Texas A&M University in partial fulfillment of the requirements for the degree of 
MASTER OF SCIENCE 
December 1989 
Major Subject: Electrical Engineering 
Approved as to style and content by: 
Yu-Ying Jackson Leung (Chair of Committee) 
Karan L. Watson (Member) 
Philip S. Noe (Member) 
Donald K. Friesen (Member) 
Jo W. Howze (Head of Department) 
December 1989 
ABSTRACT 
Application of the Residue Number System 
to the Matrix Multiplication Problem. (December 1989) 
Gary Franklin Chard, B.S., Texas A&M University 
Chairman of Advisory Committee: Dr. Yu-Ying Jackson Leung 
The primary objective of this research is to evaluate 
a residue implementation of the matrix multiplication 
algorithm by comparison to a more conventional binary 
approach. Included in this research is a proposed method 
of concurrent residue multiplication and addition, as well 
as methods of input and output translation. Common 
building blocks are used repetitively throughout the design 
process, in an effort to minimize the design time of such a 
residue system. Logical simulation of the residue design 
was conducted for functional verification, as well as for a 
means of timing comparison to a more conventional design. 
It was found that the residue design was 2.73 times larger, 
and 3.18 times faster than the typical binary comparison 
structure. Many comments are presented throughout this 
thesis pertaining to considerations that must be made when 
contemplating the design of a residue system. The matrix 
multiplication algorithm is also simulated, such that exact 
timing information is given for both input and output 
matrix coefficients. 
To my parents 
ACKNOWLEDGEMENT 
I would especially like to thank Dr. Leung for his 
advice and continual support throughout this research. I 
also thank him for helping me make graduate school a 
positive experience. I would also like to thank Dr. 
Watson, Dr. Noe, and Dr. Friesen for serving on my 
committee. Finally, I would like to thank my girlfriend 
Sherry for her moral support, and for her help in preparing 
the manuscript. 
TABLE OF CONTENTS 
ABSTRACT 
DEDICATION 
ACKNOWLEDGMENT 
LIST OF TABLES 
LIST OF FIGURES 
CHAPTER 
I INTRODUCTION 
    1.1 Problem Statement 
    1.2 Approach 
II BACKGROUND 
    2.1 The Residue Number System 
        2.1.1 Properties of the RNS 
        2.1.2 Basic RNS Identities 
    2.2 Basic Operations in the RNS 
        2.2.1 RNS Addition 
        2.2.2 RNS Multiplication 
    2.3 Translation from Binary to Residue 
    2.4 The Chinese Remainder Theorem 
    2.5 Sign Representation of a Residue Number 
    2.6 Introduction to Matrix Multiplication 
    2.7 The Matrix Multiplication Algorithm 
        2.7.1 The Multiply and Add Cell 
        2.7.2 Formulation of the Matrix Multiplication Algorithm 
III APPLICATION OF THE RNS TO MATRIX MULTIPLICATION 
    3.1 Error Free Design 
    3.2 System Dynamic Range Determination 
IV MATRIX MULTIPLICATION ALGORITHM SIMULATION 
    4.1 MAC Computing Structure 
    4.2 Input Matrix Coefficient Timing 
    4.3 Algorithm Simulation Development 
    4.4 Algorithm Simulation 
V DESIGN DEVELOPMENT 
    5.1 Residue System Specifications 
    5.2 Multiply and Add Cell 
        5.2.1 MAC Functional Configuration 
        5.2.2 Modified Braun Array 
        5.2.3 Lower Truth Table Modulo m 
        5.2.4 Upper Truth Table Modulo m 
        5.2.5 Four-Bit Binary Adder 
    5.3 Input Translation 
        5.3.1 Input Operand Adjustment 
        5.3.2 Residue Digit Generation 
    5.4 Output Translation 
        5.4.1 Controlled Addition/Subtraction 
        5.4.2 Multiplication by Inverses 
        5.4.3 Correct Sign Determination 
VI SIMULATION RESULTS AND COMPARISON 
    6.1 Simulation Development 
    6.2 Simulation Results 
        6.2.1 MAC Simulation 
        6.2.2 Input Simulation 
        6.2.3 Output Simulation 
        6.2.4 Global Considerations 
    6.3 Residue Design Area Calculations 
        6.3.1 MAC Area 
        6.3.2 Input Translation Area 
        6.3.3 Output Translation Area 
        6.3.4 Global Considerations 
    6.4 Design Comparison 
        6.4.1 Comparison Structure 
        6.4.2 Time and Area Comparison 
VII CONCLUSION 
    7.1 Contributions 
    7.2 Future Research 
REFERENCES 
APPENDIX A MATRIX MULTIPLICATION ALGORITHM SIMULATION RESULTS 
APPENDIX B TRUTH-TABLES AND KARNAUGH MAPS 
APPENDIX C SCHEMATIC PLOTS 
APPENDIX D SIMULATION RESULTS 
VITA 
LIST OF TABLES 
Table 
2.1 Residue Representation of the Numbers from -4 to +32 for Moduli 2, 3, 5 
2.2 Multiplicative Inverses 
2.3 Partitioned Interval of Definition 
3.1 Determination of Dynamic Input Range 
4.1 Matrix A and B Input Coefficient Timing 
4.2 Algorithm Simulation for Arbitrary Input Matrices 
4.3 Output Coefficient Timing 
5.1 Modulo 15 Truth Table 
5.2 Multiplicative Inverses 
6.1 Primitive Component Models 
6.2 MAC Simulation Results 
6.3 Input Translation Simulation Results 
6.4 Output Translation Simulation Results 
6.5 Processing Time Comparison 
A.1 MAC Simulation Data 
A.2 Input Translation Simulation Data 
A.3 Output Translation Simulation Data 
LIST OF FIGURES 
Figure 
2.1 Multiply Add Cell 
2.2 Hexagonal Array for Matrix Multiplication 
2.3 Banded Matrix Multiplication 
3.1 System Configuration 
4.1 Computing Array for Matrix of Bandwidth Five 
4.2 MAC Input/Output Naming Convention 
4.3 Input Matrices of Bandwidth Five 
5.1 Residue MAC Configuration 
5.2 Proposed MAC Configuration of Each Modulus 
5.3 Modified Braun Array 
5.4 Full Adder Cell Design 
5.5 Modified Braun Array Hardware 
5.6 Modulo 15 Karnaugh Maps 
5.7 Modulo 15 Truth Tables 
5.8 Four-Bit Binary Adder 
5.9 Input Translation Functional Configuration 
5.10 Input Operand Adjustment 
5.11 Mixed Radix Coefficient Determination 
5.12 Conditional Adder 
5.13 Non-Conditional Adder 
5.14 Four-Bit Braun Array 
5.15 Multiplication and Addition of the Mixed-Radix Coefficients 
A.1 Modulo 7 Truth Table Hardware 
A.2 Modulo 11 Truth Table Hardware 
A.3 Modulo 13 Truth Table Hardware 
A.4 8X4 Multiplier 
A.5 Twelve-Bit Multiple Generator 
A.6 Fourteen-Bit Carry Save Adder 
A.7 Fourteen-Bit Binary Adder 
A.8 Modified Adder A 
A.9 Modified Adder B 
A.10 Seventeen-Bit Two's Complementer 
A.11 Modulo 7 Trial #1 MAC Simulation 
A.12 Modulo 11 Trial #1 MAC Simulation 
A.13 Modulo 13 Trial #1 MAC Simulation 
A.14 Modulo 15 Trial #1 MAC Simulation 
A.15 Modulo 16 Trial #1 MAC Simulation 
A.16 Modulo 7 Trial #1 Input Translation Simulation 
A.17 Modulo 11 Trial #1 Input Translation Simulation 
A.18 Modulo 13 Trial #1 Input Translation Simulation 
A.19 Modulo 15 Trial #1 Input Translation Simulation 
A.20 Modulo 16 Trial #1 Input Translation Simulation 
CHAPTER I 
INTRODUCTION 
Digital signal processing is a rapidly emerging 
technical area, where speed of computation is of prime 
importance, as well as practical considerations such as 
component packaging, silicon area, and cost. Some of the 
newest areas of interest in digital signal processing are 
real-time image processing, satellite communications, 
pattern recognition, and vector calculations. For these 
applications, parallelism has recently proven to be the key 
to faster processing of data. Parallelism may be achieved 
on mathematical, architectural, and realizational levels 
[1]. The residue number system, as will be seen shortly, 
achieves parallelism on a mathematical level. 
Around 100 A.D., a Chinese mathematician named Sun Tzu 
authored a book containing a poem called t'ai-yen (great 
generalization). This poem was a puzzle, which challenged 
the reader to determine an integer number having a 
remainder of two, three, and two, when divided by three, 
five, and seven, respectively. The answer to the poem 
is the integer twenty-three. Although Sun Tzu did not 
know it at the time, he formed the basis of the Residue 
Number System (RNS), which would be studied in detail 
twenty centuries later. His poem is the equivalent of a 
three-modulus RNS with three prime moduli (3, 5, 7). This 
poem also stated a rule, refined by scholarly people over 
many centuries, called the Chinese Remainder Theorem [2]. 
It is the Chinese Remainder Theorem that allows conversion 
of the residue digits back to an integer. 

IEEE Transactions on Computers used as a journal model. 
Between twenty and thirty years ago, a renewed 
interest in the residue number system began. Szabo and 
Tanaka published a comprehensive book on the basic theorems 
and properties of the RNS [2]. Their primary interest in 
this number system was its application to the design and 
organization of digital computing machines. Without the 
invention of digital computers, the residue number system 
would most likely be as underdeveloped today as it was 
centuries ago. The techniques of addition, subtraction, 
multiplication, and division, as well as the fundamental 
properties and theorems of residue arithmetic were 
presented in [2]. Szabo and Tanaka concluded that 
operations such as addition, subtraction, and 
multiplication are simple operations to perform. Division, 
sign determination, and overflow detection were found to be 
difficult operations in both concept and implementation. 
Since the renewed interest occurring in the mid 
1960's, scientists have been studying and contributing to 
the topic of residue arithmetic. Industry has never 
adopted the residue number system as a viable alternative 
to the conventional binary number system. Several changing 
factors, such as the need for increased parallelism in 
algorithms, new hardware capabilities, and semiconductor 
technology evolution, will soon cause the characteristics 
of the residue number system to be more closely examined. 
The number system has many inherent advantages over 
conventional number systems, as well as a few shortcomings, 
which subsequently have greatly limited the acceptance and 
use of the residue number system. Typical processors 
implemented today are unable to do matrix multiplication 
without careful programming by the user. Thus by 
implementation of a dedicated processor for the specific 
task of matrix multiplication, using current Very Large 
Scale Integration (VLSI) techniques, a cost and performance 
effective solution to the problem of matrix multiplication 
can be achieved. The approach used in 
designing this dedicated processor will be that of 
systematically connecting local processing elements in a 
parallel-pipelined fashion. In [3], an algorithm is 
proposed for matrix-matrix multiplication using a systolic 
array concept. Designs using the systolic array concept 
(simple and regular interconnections, parallel algorithms, 
and pipelining), have been proven to achieve a higher chip 
density, resulting in both a cheaper and a higher 
performance implementation. It is possible that further 
time enhancements may be made by the RNS, which has an 
inherent parallel nature, as compared to the conventional 
binary number system. The intent of this research is to 
show that by applying the residue number system to a 
computationally intense problem, enhancements can be made 
over a comparable problem using the binary number system. 
1.1 Problem Statement 
Significant amounts of research effort have been 
expended investigating the properties of the residue number 
system, and its applications to current computer 
technology. The aim of this research is to determine if, 
through applying the residue number system to the matrix 
multiplication problem, the solution time can be 
effectively reduced. Also, this research hopes to express 
several practical considerations to be dealt with when 
contemplating the use of a residue type design. 
Furthermore, the exact timing information of the matrix 
multiplication algorithm will be studied, in hope that a 
generalized timing equation can be derived. 
1.2 Approach 
The goal of this research is to quantify the 
processing time of matrix multiplication, using the residue 
number system. It is expected that the exact timing 
specifications of the system, as well as the exact chip 
area such a design would occupy will be determined. 
Moreover, the results of this research will allow the 
comparison of the residue number system approach to that of 
the binary system. In the event that significant 
improvements over the binary number system are made, the 
use of the residue number system will be greatly promoted. 
Details of the approach towards the above stated goals will 
now be described. 
There are several tasks to be considered in the design 
and simulation of a system as mentioned above. First, a 
method of translation to the residue number system which is 
suitable to a pipelined operation will have to be 
considered. There are currently several papers making 
comments on the translation problem from binary to the 
residue number system. In [4], ROMs (Read Only 
Memories) are used to accomplish part of the translation 
task. In a VLSI design, it is very desirable to avoid 
using ROMs because of their slow speed and area 
requirements. Thus, it will be important to develop a 
method of translation that avoids the use of ROMs. 
Also, on a larger architectural level, a systolic 
array method of matrix multiplication will be necessary 
[5]. The method of matrix multiplication, proposed in [3], 
uses a systolic array approach. This algorithm will be 
developed, and tailored to accommodate the RNS. The exact 
timing information will also be given. 
Next, a method for implementation of the basic residue 
number operations such as addition and multiplication will 
be investigated [6-8]. Once again, the design will avoid 
using the ROM approach, in search of a higher performance 
solution. Considerable time will be spent on the 
optimization of addition and multiplication processes, 
since the MAC (Multiply and Add Cell) will be the most 
prevalent processing element in the design. 
Just as it was necessary to translate from the binary 
to the RNS, it will be necessary to translate from the RNS 
back to the binary number system [4, 9-12]. 
As a verification on the design process, and as a 
check on the timing information, the design will be 
simulated on an Apollo workstation using Mentor Graphics' 
Neted and Quicksim design tools. 
CHAPTER II 
BACKGROUND 
During the 1950s, fabrication of transistors on 
crystalline silicon was developed. The integrated circuit 
plays a large role in society today. It has applications 
ranging from components in home stereos, to the electronic 
ignition control computer module in automobile engines. 
The integrated circuit is also fundamentally important to 
computers as we know them today. 
Over a short time in the span of history, the 
integrated circuit (IC) has evolved from containing several 
transistors, to present technology of a million transistors 
on a single silicon chip. 
The photolithographic process of the mid 1980's allows 
for the fabrication of integrated circuits very large in 
size, not previously possible, to be placed inside a single 
component package. Even more important than this, current 
research in component packaging includes effort in the area 
of multichip modules, where it will be possible to 
implement large circuits in several pieces, and combine 
them onto a silicon substrate, and encase them in one 
package [13] . 
Therefore, as technology progresses, the capability of 
fabricating complex systems such as the matrix 
multiplication algorithm, requiring significant amounts of 
hardware, becomes more viable. It should be noted that the 
desire to implement such a specialized algorithm would be 
primarily for that of an increase in computational speed. 
The host processor could not possibly multiply two matrices 
of such a complexity in a comparable amount of time. The 
matrix multiplication algorithm will be responsible for a 
certain increase in speed, which is further enhanced 
through the application of the residue number system. The 
properties of the residue number system will now be 
investigated. 
2.1 The Residue Number System 
The notation used in introducing the properties and 
various aspects of the Residue Number System will be 
consistent with that of Szabo and Tanaka [2]. In cases 
where theorems are stated, the proofs will be omitted. 
2.1.1 Properties of the RNS 
Every number system has several characteristics 
allowing it to be distinguished from other number systems. 
Among these characteristics are the range, uniqueness in 
representation, and the base (radix) of the number system. 
The decimal number system and the binary number system are 
both fixed radix number systems. The decimal number system 
has a fixed radix of ten, the binary number system has a 
fixed radix of two. The following illustrates the idea of 
a fixed radix system using the decimal number system as an 
example. 
Example 2.1: 
$(12793)_{10} = 1 \cdot 10^4 + 2 \cdot 10^3 + 7 \cdot 10^2 + 9 \cdot 10^1 + 3 \cdot 10^0$ 
Thus, we note that any decimal number can be expressed as a 
sum of its individual digits multiplied by the base raised 
to the appropriate power of the digit being expressed. In 
this case the digits are multiplied by powers of 10. The 
residue number system is not a fixed radix number system. In 
fact, the residue system has more than one radix and is 
described by an N-tuple of integers $(m_1, m_2, m_3, \ldots, m_N)$, 
where each of the integers $m_i$ is called a modulus. This 
N-tuple of integers is often referred to as the moduli, which is 
the plural form of the word modulus. Any number x in the 
residue number system can be expressed as an N-tuple of 
integers defined by a set of N equations: 
$x = q_i m_i + r_i, \quad i = 1, 2, 3, \ldots, N$ 
where $q_i$ is an integer chosen to ensure that $r_i$ has a value 
equal to or greater than zero and less than the modulus $m_i$. 
The integer $r_i$ is the least non-negative remainder of 
the division of x by $m_i$. This value, $r_i$, is called the 
residue of x modulo $m_i$, often denoted by $|x|_{m_i}$. The 
quotient term, $q_i$, is often represented as $[x/m_i]$. A 
commonly used form of the above equation is often expressed 
as 
$x = m_i [x/m_i] + |x|_{m_i}$ 
where $|x|_{m_i}$ is always a non-negative integer. The following 
example illustrates both the idea of an N-tuple of moduli, 
and the idea of an N-tuple of residues $r_i$. 
Example 2.2: 
For a three-modulus system, with moduli given by an N-tuple 
(N = 3) $(m_1, m_2, m_3) = (3, 5, 7)$, given an integer in the 
decimal number system $x = (37)_{10}$, a representation in the 
residue number system can be found as follows: 
$r_1 = |x|_{m_1} = |37|_3 = x - m_1 [x/m_1] = 37 - 3[12] = 1$ 
$r_2 = |x|_{m_2} = |37|_5 = x - m_2 [x/m_2] = 37 - 5[7] = 2$ 
$r_3 = |x|_{m_3} = |37|_7 = x - m_3 [x/m_3] = 37 - 7[5] = 2$ 
Thus, the RNS representation of 37 is $(r_1, r_2, r_3) = (1, 2, 2)$. 
Similarly, we can find the residue representation of 
the decimal number 142. 
$r_1 = 142 - 3[47] = 1$ 
$r_2 = 142 - 5[28] = 2$ 
$r_3 = 142 - 7[20] = 2$ 
Thus, the RNS representation of 142 is given by the set 
(1, 2, 2). It should be noticed that this is the exact 
result obtained in the previous example when the RNS 
representation of 37 was found. It seems as though a 
contradiction has arisen, implying that as far as the 
residue number system is concerned, the numbers 37 and 142 
are identical. The following theorem will help resolve 
this paradox [2]. 
Two integers x and y have the same residue representation for a given set of moduli $m_1, m_2, \ldots, m_N$ if and only if (x - y) is an integer multiple of the least common multiple of the moduli, denoted by M. 
The least common multiple of the moduli for the above 
examples is M = (3)(5)(7) = 105. If we denote x = 37 
and y = 142, then (x - y) = -105, which is an integer multiple 
of M. Thus, as the theorem predicts, the numbers 37 and 142 
have the same residue representation. Further 
insight into the RNS can be obtained by examining Table 
2.1. First we notice that M = (2)(3)(5) = 30. We also 
notice that the residue representation of 0 is the same 
as the residue representation of 30, and that all the 
numbers from 0 through 29 have a unique residue 
representation. It is also true that any arbitrary 
interval of exactly 30 consecutive numbers gives a unique mapping 
from decimal to a residue representation. The residue 
number system is periodic, and must be restricted to a 
single period, denoted by an interval of definition. 

Table 2.1 Residue Representation of the Numbers 
from -4 to +32 for Moduli 2, 3, 5 

          Moduli                Moduli 
    x    2  3  5          x    2  3  5 
   -4    0  2  1         15    1  0  0 
   -3    1  0  2         16    0  1  1 
   -2    0  1  3         17    1  2  2 
   -1    1  2  4         18    0  0  3 
    0    0  0  0         19    1  1  4 
    1    1  1  1         20    0  2  0 
    2    0  2  2         21    1  0  1 
    3    1  0  3         22    0  1  2 
    4    0  1  4         23    1  2  3 
    5    1  2  0         24    0  0  4 
    6    0  0  1         25    1  1  0 
    7    1  1  2         26    0  2  1 
    8    0  2  3         27    1  0  2 
    9    1  0  4         28    0  1  3 
   10    0  1  0         29    1  2  4 
   11    1  2  1         30    0  0  0 
   12    0  0  2         31    1  1  1 
   13    1  1  3         32    0  2  2 
   14    0  2  4 

It is fairly apparent that once a number is converted into its 
residue representation, it is not easy to determine the 
sign of the number, nor is it easy to perform any type of 
magnitude comparison among the residue digits. 
This is one of the most serious disadvantages 
of the Residue Number System. 
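
The periodicity described above can be checked numerically. The following Python sketch is only an illustration: the helper name `to_rns` is an assumption introduced here and does not appear in the thesis. It relies on Python's `%` operator returning the least non-negative remainder for a positive modulus.

```python
from math import prod

def to_rns(x, moduli):
    """Return the residue digits of x, one per modulus (hypothetical helper)."""
    # For a positive modulus m, Python's % yields a value in [0, m-1],
    # even when x is negative, matching the definition of |x|_m in the text.
    return tuple(x % m for m in moduli)

moduli = (3, 5, 7)
M = prod(moduli)                 # product of the moduli; equals their LCM here

r37 = to_rns(37, moduli)
r142 = to_rns(142, moduli)
collide = (r37 == r142)          # True when 37 and 142 share a representation
diff_is_multiple = (37 - 142) % M == 0

# Every integer in one period of the moduli (2, 3, 5) maps to a distinct tuple.
period = {to_rns(x, (2, 3, 5)) for x in range(30)}
period_is_unique = (len(period) == 30)
```

Any window of M consecutive integers could be substituted for `range(30)` in the last check; the choice of window corresponds to the interval of definition.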
2.1.2 Basic RNS Identities 
The following are several pertinent identities of the 
RNS, with the proofs omitted for the sake of brevity. 
Interested readers are urged to consult [2] for further 
details. 
(1) $0 \le |x|_m \le (m-1)$ 
(2) $|Km|_m = 0$ for K an integer 
(3) $||x|_m|_m = |x|_m$ 
(4) $|x + mK|_m = |x - mK|_m = |x|_m$ 
(5) $|-x|_m = |(m-1)x|_m = |m - x|_m$ 
Addition for a single modulus in the residue number system 
is now formulated by the following equation: 
$|x + y|_m = ||x|_m + |y|_m|_m = ||x|_m + y|_m = |x + |y|_m|_m$ 
$|x + y|_m$ is often referred to as the sum modulo m of x and y. 
Multiplication for a single modulus in the residue number 
system is formulated by the following equation: 
$|xy|_m = ||x|_m |y|_m|_m = |x |y|_m|_m$ 
The properties of addition and multiplication modulo m will 
now be demonstrated in Example 2.5. 
Example 2.5: 
Let m = 7, x = 32, y = 26 
$|32|_7 = 4$, $|26|_7 = 5$ 
Addition 
$|x + y|_7 = |32 + 26|_7 = |58|_7 = |4 + 26|_7 = 2$ 
Multiplication 
$|xy|_7 = |(32)(26)|_7 = ||32|_7 \cdot 26|_7 = |(4)(26)|_7 = 6$ 
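
The single-modulus identities can be spot-checked with a few lines of Python. This is a minimal sketch, not part of the original design; `add_mod` and `mul_mod` are hypothetical helper names chosen for illustration.

```python
def add_mod(x, y, m):
    """Sum modulo m, reducing each operand first (addition identity)."""
    return ((x % m) + (y % m)) % m

def mul_mod(x, y, m):
    """Product modulo m, reducing each operand first (multiplication identity)."""
    return ((x % m) * (y % m)) % m

m, x, y = 7, 32, 26

# Reducing the operands before the operation gives the same residue as
# reducing only the final result.
sum_direct = (x + y) % m
sum_reduced = add_mod(x, y, m)
prod_direct = (x * y) % m
prod_reduced = mul_mod(x, y, m)
```

Reducing the operands first keeps every intermediate value below m squared, which is what allows small, fixed-width hardware per modulus.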
The following multiplicative inverse theorem [2] is 
very useful in solving linear equations of the form $|ax|_m = |b|_m$. 
If $0 \le a < m$ and $|ab|_m = 1$, then a is called the 
multiplicative inverse of b modulo m, and is denoted by 
$a = |1/b|_m$. 
The quantity $|1/b|_m$ exists if and only if the greatest 
common divisor of b and m is equal to one and $|b|_m$ does not equal zero. 
If the above theorem holds, then $|1/b|_m$ is unique. From 
Table 2.2 it is apparent that for a given number, a 
multiplicative inverse does not always exist. 
Table 2.2 Multiplicative Inverses Modulo 14 

    X    $|1/X|_{14}$ 
    1    1 
    2    None 
    3    5 
    4    None 
    5    3 
    6    None 
    7    None 
    8    None 
    9    11 
   10    None 
   11    9 
   12    None 
   13    13 
The following equation may be solved using the 
multiplicative inverse theorem: 
$|3x|_7 = 4$ 
Because $|(3)(5)|_7 = 1$, the multiplicative inverse of 3 modulo 7 is $|1/3|_7 = 5$. 
Thus, $|x|_7 = |(5)(4)|_7 = 6$. 
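
The inverse table and the solved equation can be reproduced in software. The sketch below assumes Python 3.8 or later, where the built-in `pow(b, -1, m)` returns a modular inverse; `mod_inverse` is a hypothetical wrapper name used only here.

```python
from math import gcd

def mod_inverse(b, m):
    """Return |1/b|_m, or None when no inverse exists (hypothetical helper)."""
    b %= m
    # The inverse exists only if b is non-zero modulo m and gcd(b, m) == 1.
    if b == 0 or gcd(b, m) != 1:
        return None
    return pow(b, -1, m)  # three-argument pow with exponent -1, Python 3.8+

# Inverses modulo 14 for X = 1..13, in the layout of Table 2.2.
table_14 = {x: mod_inverse(x, 14) for x in range(1, 14)}

# Solve |3x|_7 = 4 by multiplying both sides by |1/3|_7.
x = (mod_inverse(3, 7) * 4) % 7
```

Only the values of X that are relatively prime to 14 receive an inverse, which is why a prime modulus (where every non-zero residue is invertible) is convenient when inverses are needed.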
2.2 Basic Operations in the RNS 
The previous section discussed the fundamental 
theorems and identities of the RNS. This section will 
emphasize the operations on a complete residue 
representation, rather than on a system with a single 
modulus. For the discussion following, the reader should 
assume that we have a moduli set that is pairwise 
relatively prime. The assumption of pairwise relatively 
prime moduli will be commented on later. 
2.2.1 RNS Addition 
The basic identity for addition modulo m was defined 
for individual moduli. This basic definition can be 
extended to include systems where multiple moduli are to be 
used. Theorem 2.3 allows addition in systems with multiple 
moduli [2]. 
Theorem 2.3: 
For a given residue system consisting of moduli 
$m_1, m_2, m_3, \ldots, m_N$, let x and y be given in residue form. The residue form of their sum is denoted by $|x + y|_M$. 
$x \leftrightarrow (|x|_{m_1}, |x|_{m_2}, \ldots, |x|_{m_N})$ 
$+\; y \leftrightarrow (|y|_{m_1}, |y|_{m_2}, \ldots, |y|_{m_N})$ 
$|x + y|_M \leftrightarrow (|x + y|_{m_1}, |x + y|_{m_2}, \ldots, |x + y|_{m_N})$ 
Also important, there exists one and only one integer, 
namely $|x + y|_M$, with such a representation on the interval 
[0, M-1]. The following example illustrates the process of 
addition for a given set of moduli. 
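Example 2.6: 
For the moduli 3, 4, 5, and 13 (M = 780), add 
$124 \leftrightarrow (1, 0, 4, 7)$ 
$79 \leftrightarrow (1, 3, 4, 1)$ 

$124 \leftrightarrow (1, 0, 4, 7)$ 
$+\; 79 \leftrightarrow (1, 3, 4, 1)$ 
$|203|_{780} = 203 \leftrightarrow (2, 3, 3, 8)$ 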
From Example 2. 6, several comments can be made. First, the 
process of residue addition has no intermodular carries. 
Each residue digit of the result is only dependent upon the 
corresponding digit of the operands. Typical fixed radix 
number systems are not defined in such a way. The binary 
number system is used to illustrate this in Example 2. 7. 
Example 2.7: 
Let x and y be binary numbers given by $x = (13)_{10} = (1101)_2$ 
and $y = (11)_{10} = (1011)_2$. The binary addition of x and y 
is shown below: 

     111     <--- carry digits 
     1101 
   + 1011 
   ------ 
    11000 
Note that in order to obtain the result, it is necessary to 
generate carries from the least significant bit position 
towards the most significant bit position (right to left) 
so that the higher order resultant bits may be determined. 
In the residue number system, it is the absence of interdigit 
carries that results in an 
inherent speed advantage over fixed radix systems. Also, 
notice that the residue result is obtained modulo M, such that if 
the result exceeds the value M-1, an ambiguity arises. 
This is a result of a previous identity, stating that x 
and x + mK have the same residue modulo m. 
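
The carry-free character of residue addition, and the ambiguity when a sum exceeds M-1, can both be seen in a short Python sketch. The names `to_rns` and `rns_add` are illustrative assumptions; the operands 700 and 100 are chosen here only to force a wraparound.

```python
MODULI = (3, 4, 5, 13)  # pairwise relatively prime, M = 780

def to_rns(x, moduli=MODULI):
    """Residue digits of x for each modulus (hypothetical helper)."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Digit-wise addition: each result digit uses only the matching digits."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

# The addition of 124 and 79 from the example above.
total = rns_add(to_rns(124), to_rns(79))

# 700 + 100 = 800 exceeds M - 1 = 779, so the digits are those of 800 - 780.
wrapped = rns_add(to_rns(700), to_rns(100))
```

Because no digit position reads from another, the four digit additions could be carried out by four independent adders operating in parallel.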
2.2.2 RNS Multiplication 
The basic identity for multiplication modulo m was 
defined previously for a system with a single modulus. 
This definition can be extended to include systems with 
multiple moduli, as was the case in addition [2]. 
For a given residue system consisting of moduli 
$m_1, m_2, m_3, \ldots, m_N$, residue multiplication is defined for x and y by the following: 
$x \leftrightarrow (|x|_{m_1}, |x|_{m_2}, |x|_{m_3}, \ldots, |x|_{m_N})$ 
$\times\; y \leftrightarrow (|y|_{m_1}, |y|_{m_2}, |y|_{m_3}, \ldots, |y|_{m_N})$ 
$|xy|_M \leftrightarrow (|xy|_{m_1}, |xy|_{m_2}, |xy|_{m_3}, \ldots, |xy|_{m_N})$ 
Within the interval [0, M-1] only one integer will have the 
above residue representation, namely $|xy|_M$. 
For the moduli 3, 4, 5, and 13 (M = 780), multiply 
$x = 122 \leftrightarrow (2, 2, 2, 5)$ 
$y = 5 \leftrightarrow (2, 1, 0, 5)$ 

$122 \leftrightarrow (2, 2, 2, 5)$ 
$\times\; 5 \leftrightarrow (2, 1, 0, 5)$ 
$|610|_{780} = 610 \leftrightarrow (1, 2, 0, 12)$ 
The same comments apply to multiplication that applied to 
addition. Specifically, multiplication is carry free 
between moduli, and results in ambiguity if xy exceeds M-1. 
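
A corresponding sketch for multiplication is given below, again with hypothetical helper names. The second product (122 times 10) is an operand pair chosen here, not taken from the thesis, to show the result being returned modulo M once it exceeds M-1.

```python
from math import prod

MODULI = (3, 4, 5, 13)
M = prod(MODULI)  # dynamic range of the moduli set

def to_rns(x):
    """Residue digits of x for each modulus in MODULI (hypothetical helper)."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    """Digit-wise multiplication with no interaction between moduli."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

# In range: 122 * 5 = 610 is below M, so the digits identify 610 uniquely.
product = rns_mul(to_rns(122), to_rns(5))

# Out of range: 122 * 10 = 1220 exceeds M - 1, so the digits are those of
# 1220 reduced modulo M.
overflow = rns_mul(to_rns(122), to_rns(10))
overflow_matches = (overflow == to_rns(1220 % M))
```

This is why the dynamic range M has to be sized for the largest intermediate value a computation can produce, not just for its inputs.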
2.3 Translation from Binary to Residue 
A vast majority of digital computers today use a 
form of the binary number system for computation. In order 
to use the RNS, it is necessary to translate from the 
conventional binary number system to the residue 
representation of a number. An integer x in the binary 
number system is described as follows: 
$x = 2^n b_n + 2^{n-1} b_{n-1} + \cdots + 2^2 b_2 + 2^1 b_1 + b_0$ 
Szabo and Tanaka observed that if the powers of 2 modulo m 
are stored in computer memory, then $|x|_m$ can be computed 
by adding modulo m the powers of two which have a non-zero 
$b_i$ [2]. 
Example: 
Let $x = (26)_{10} = (11010)_2$ 
To compute $|x|_3$, the following values should be 
computed prior to $|x|_3$: 
$|2^4|_3 = 1$ 
$|2^3|_3 = 2$ 
$|2^2|_3 = 1$ 
$|2|_3 = 2$ 
$|1|_3 = 1$ 
Computing $|x|_3$: 
$|x|_3 = |(1)(1) + (2)(1) + (1)(0) + (2)(1) + (1)(0)|_3 = |5|_3 = 2$ 
$|26|_3 = 2$ 
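
The stored-powers method can be sketched in Python as follows. The function name `binary_to_residue` and the explicit `nbits` parameter are assumptions made for this illustration; a hardware version would hold the table of residues in logic or memory instead of a list.

```python
def binary_to_residue(x, m, nbits):
    """Compute |x|_m by summing stored residues of powers of two."""
    # Precomputed table of |2^i|_m for each bit position i.
    powers = [pow(2, i, m) for i in range(nbits)]
    acc = 0
    for i in range(nbits):
        if (x >> i) & 1:                 # bit b_i is non-zero
            acc = (acc + powers[i]) % m  # add its stored residue modulo m
    return acc

powers_mod_3 = [pow(2, i, 3) for i in range(5)]  # residues of 2^0 .. 2^4
residue = binary_to_residue(26, 3, 5)
```

Each step adds at most one stored residue, so the accumulator never grows beyond 2m - 2 before it is reduced.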
Another method of input translation has been proposed 
that uses a variation on the idea presented above. The 
method uses n/2 processing elements (n = word length of 
weighted number), with each processing element responsible 
for storing the two residues of two consecutive bits in the 
input word [10]. Depending on whether a one or a zero is 
present for the specified input bit, the residue of the $2^i$ 
bit position is either added modulo m to the resultant of 
previous bits, or zero is added to the previous bits, 
respectively. This design would be very suitable for 
pipelined operation, with the computation of each pair of 
residue digits for each clock stage. The matrix 
multiplication algorithm used in this research does not 
allow any speed increase with the application of 
pipelining. 
2.4 The Chinese Remainder Theorem 
The Chinese Remainder Theorem allows conversion out of 
a residue representation into a weighted number system 
[14]. Given a residue representation $(r_1, r_2, r_3, \ldots, r_N)$, 
the Chinese Remainder Theorem makes it possible to 
determine $|x|_M$, provided that the greatest common divisor 
of any pair of moduli is one. A moduli set obeying this 
property is called pairwise relatively prime. The 
following theorem fails to hold if the requirement of 
pairwise relatively prime moduli does not hold [2]. 
Chinese Remainder Theorem 
$|x|_M = \left| \sum_{j=1}^{N} \hat{m}_j \left| \frac{r_j}{\hat{m}_j} \right|_{m_j} \right|_M$ 
where $\hat{m}_j = M/m_j$, $M = (m_1)(m_2) \cdots (m_N)$, and the greatest 
common divisor between any two moduli is one. 
For the moduli $m_1 = 13$, $m_2 = 11$, $m_3 = 7$, and $m_4 = 9$, the 
number given by the residue representation (4, 2, 4, 7) can be 
found as follows: 
$M = (13)(11)(7)(9) = 9009$ 
$|\hat{m}_1|_{13} = |(11)(7)(9)|_{13} = |693|_{13} = 4$ 
$|\hat{m}_2|_{11} = |(13)(7)(9)|_{11} = |819|_{11} = 5$ 
$|\hat{m}_3|_7 = |(13)(11)(9)|_7 = |1287|_7 = 6$ 
$|\hat{m}_4|_9 = |(13)(11)(7)|_9 = |1001|_9 = 2$ 
The multiplicative inverses of 4, 5, 6, and 2 modulo 13, 11, 7, and 9 
are 10, 9, 6, and 5, respectively, so that 
$|x|_{9009} = \left| 693\,|(10)(4)|_{13} + 819\,|(9)(2)|_{11} + 1287\,|(6)(4)|_7 + 1001\,|(5)(7)|_9 \right|_{9009}$ 
$= |(693)(1) + (819)(7) + (1287)(3) + (1001)(8)|_{9009}$ 
$= |18295|_{9009} = 277$ 
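
The reconstruction can be followed step by step in code. This is a minimal sketch under the assumption of pairwise relatively prime moduli and Python 3.8 or later (for `math.prod` and the modular-inverse form of `pow`); the function name `crt` is chosen here for illustration.

```python
from math import prod

def crt(residues, moduli):
    """Recover |x|_M from residue digits (moduli must be pairwise coprime)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_hat = M // m                 # product of all the other moduli
        inv = pow(m_hat % m, -1, m)    # inverse of m_hat modulo m, Python 3.8+
        total += m_hat * ((inv * r) % m)
    return total % M

moduli = (13, 11, 7, 9)
x = crt((4, 2, 4, 7), moduli)
digits_back = tuple(x % m for m in moduli)  # round-trip check
```

The final reduction modulo M is the costly part in hardware, since M is as wide as the full dynamic range; this is one motivation for alternative output translation schemes.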
There exists a modified form of the Chinese Remainder 
Theorem in the event that moduli are chosen such that they 
are not pairwise relatively prime. Interested readers 
should consult [2] for details of this modified Chinese 
Remainder Theorem. 
2.5 Sign Representation of a Residue Number 
Explicit sign representation of a number defines the 
case where the sign of a number can be determined by 
inspection. Such is the case with a signed magnitude 
representation of a binary number, where the most 
significant bit position gives the sign of the operand. 
Implicit sign representation of a number defines the 
case where the sign information is not readily apparent 
upon inspection of a number. Implicit representation is 
the case when a number is in residue representation. It 
should be apparent from Table 2. 1 that immediate 
determination of operand sign is virtually impossible upon 
inspection. 
It is common practice to consider numbers in the range 
[0, M/2 - 1] as positive, and numbers in the range [M/2, M-1] 
as negative. This assignment is made assuming that the 
dynamic range of the system will remain within the 
specified range of [0, M-1]; otherwise the actual resulting 
number, not to mention the sign, will be lost. The 
following example illustrates the partitioning of a residue 
system into positive and negative parts. Table 2.3 
illustrates what is meant by dividing the interval of 
definition for a given set of moduli. 
Table 2.3 Partitioned Interval of Definition 
A = Actual Number 
B = Partitioned Number 

                 Moduli                       Moduli 
    A     B     2  3  5          A     B     2  3  5 
    0     0     0  0  0         15   -15     1  0  0 
    1     1     1  1  1         16   -14     0  1  1 
    2     2     0  2  2         17   -13     1  2  2 
    3     3     1  0  3         18   -12     0  0  3 
    4     4     0  1  4         19   -11     1  1  4 
    5     5     1  2  0         20   -10     0  2  0 
    6     6     0  0  1         21    -9     1  0  1 
    7     7     1  1  2         22    -8     0  1  2 
    8     8     0  2  3         23    -7     1  2  3 
    9     9     1  0  4         24    -6     0  0  4 
   10    10     0  1  0         25    -5     1  1  0 
   11    11     1  2  1         26    -4     0  2  1 
   12    12     0  0  2         27    -3     1  0  2 
   13    13     1  1  3         28    -2     0  1  3 
   14    14     0  2  4         29    -1     1  2  4 
Let x = 5 and y = -9. From Table 2.3, the residue
representations of x and y are as follows:
x = ( 1, 2, 0 )
y = ( 1, 0, 1 )
|x + y|_M = ( 0, 2, 1 )
Since x + y = 5 + (-9) = -4, we would expect to find in
Table 2.3 that the residue representation of -4 is
( 0, 2, 1 ), and this is exactly the case.
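The partitioning of Table 2.3 and the signed addition example can be checked with a short sketch. The helper names (`to_residue`, `add`, `to_signed`) are hypothetical and not part of the thesis, and the brute-force decoding is only an illustration that is practical because M = 30 is small; it is not the translation method used in this research.

```python
# Minimal sketch, assuming the moduli (2, 3, 5) of Table 2.3.
MODULI = (2, 3, 5)
M = 2 * 3 * 5  # interval of definition is [0, 29]


def to_residue(x):
    """Residue digits of x; a negative x maps to M - |x| as in Table 2.3."""
    return tuple(x % m for m in MODULI)


def add(r, s):
    """Digit-wise residue addition, each digit reduced by its own modulus."""
    return tuple((a + b) % m for a, b, m in zip(r, s, MODULI))


def to_signed(r):
    """Decode by exhaustive search, then apply the [0, M/2 - 1] / [M/2, M - 1] split."""
    for n in range(M):
        if to_residue(n) == r:
            return n if n < M // 2 else n - M
    raise ValueError("not a valid residue tuple")


x = to_residue(5)      # (1, 2, 0)
y = to_residue(-9)     # (1, 0, 1)
total = add(x, y)      # (0, 2, 1)
print(total, to_signed(total))
```

The sign of the result only becomes visible after decoding, which is the implicit sign representation described above.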
This concludes the introduction of the Residue Number 
System. There are many other theorems and identities which 
have not been presented. Division is presented in [2], but
it is a complex operation when compared to addition and 
multiplication. This research will not need to implement a 
method of division, hence it will not be discussed. Next 
the basics of matrix multiplication, as well as the matrix 
multiplication algorithm will be examined. 
2.6 Introduction to Matrix Multiplication
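A matrix is simply an array of numbers denoted by A,
in the form of:
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]
Such a matrix has m rows and n columns. If A is a matrix
with m rows and n columns, then in order to form the product
AB, we require that B be a matrix of n rows and p columns.
The multiplication of two matrices is defined as follows [15]:
Let $C = AB = [c_{ij}]$; then C is a matrix of m rows and
p columns such that
\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj},
\qquad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, p
\]
Let
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} -2 & 5 \\ 4 & -3 \end{bmatrix}
\]
Then
\[
AB = \begin{bmatrix}
(1)(-2) + (2)(4) & (1)(5) + (2)(-3) \\
(3)(-2) + (1)(4) & (3)(5) + (1)(-3)
\end{bmatrix}
= \begin{bmatrix} 6 & -1 \\ -2 & 12 \end{bmatrix}
\]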
It should be noted that, from a computational standpoint,
it takes a total of 8 multiplications and 4 additions to
multiply two simple 2-by-2 matrices together. In general,
an [m x n] x [n x m] product requires $m^2(n-1)$ additions
and $m^2 n$ multiplications. Clearly, for large matrices,
the number of multiplication and addition steps increases
rapidly. The following algorithm, which forms the
foundation to which this research was applied, greatly
reduces the amount of time required to multiply two
matrices together.
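The operation counts quoted above can be confirmed by instrumenting the defining sum directly. The following is a minimal sketch; `matmul_counted` is a hypothetical helper name, and the matrices are the 2-by-2 example of this section.

```python
def matmul_counted(A, B):
    """Multiply A (m x n) by B (n x p), counting scalar operations."""
    m, n, p = len(A), len(B), len(B[0])
    mults = adds = 0
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            acc = A[i][0] * B[0][j]        # first term needs no addition
            mults += 1
            for k in range(1, n):
                acc += A[i][k] * B[k][j]   # one multiply and one add per term
                mults += 1
                adds += 1
            C[i][j] = acc
    return C, mults, adds


A = [[1, 2], [3, 1]]
B = [[-2, 5], [4, -3]]
C, mults, adds = matmul_counted(A, B)
print(C, mults, adds)
```

For square n-by-n inputs the counts grow as n^3 multiplications and n^2(n-1) additions, which is the growth the systolic algorithm of the next section is meant to hide behind parallel hardware.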
2.7 The Matrix Multiplication Algorithm
One of the most successful applications of pipeline
processing has been in the execution of arithmetic
operations [3]. Pipelined operation allows small
portions of an overall task to occur at each position in
the "pipe". This type of setup is extremely valuable when
successive operations of the same type occur (e.g., an
operation on a vector). It takes a certain amount of time
to fill up the "pipe", which is known as the start-up or
initialization time. If the vector being operated on is
very long, the start-up time becomes insignificant. A
majority of supercomputers use multi-stage pipelining to
achieve very fast operating speeds. A pipelined arithmetic
unit can be visualized as a systolic array of linearly
connected processors, where each processor (processing
element) is capable of performing a small portion of a
global task. Kung found that the multiplication of two
matrices could be done by using an array of hexagonally
shaped processing elements [3]. This algorithm is suitable
for Very Large Scale Integration (VLSI), where it is
essential that processors are regular (in this case
identical) and only locally connected. The basic
processing element used by this algorithm is called an
inner-product step processor. In this research, the
inner-product step processor will be referred to as a
Multiply and Add Cell (MAC).
2.7.1 The Multiply and Add Cell
Figure 2.1 illustrates the shape of the multiply and
add cell (inner-product step processor), which is the most
basic element in the matrix multiplication algorithm.
Figure 2.1 Multiply Add Cell
The MAC contains three registers $R_A$, $R_B$, and $R_C$,
and six connections crossing the MAC boundary. Of the six
connections, three are inputs and three are outputs. At
each time interval, the processor transfers the data on its
input lines, denoted by A, B, and C, into $R_A$, $R_B$, and
$R_C$, respectively, then computes the value of
$(R_A)(R_B) + (R_C)$, and transfers the old values of $R_A$
and $R_B$, along with the new value of $R_C$, namely
$(R_A)(R_B) + (R_C)$, to the output lines denoted A, B, and
C, respectively. Since the inputs of each of the MACs are
latched, changing outputs will not interfere with the input
of another MAC until the following clock cycle. It is this
cell that allows the multiplication of two matrices by the
following algorithm.
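The register transfer behaviour of a single cell can be modelled in a few lines. This is a minimal behavioural sketch, not the hardware design of this research; the class name `MAC`, the method `clock`, and the helper `inner_product` are hypothetical names chosen for illustration.

```python
class MAC:
    """One inner-product step processor with registers R_A, R_B, R_C."""

    def __init__(self):
        self.ra = self.rb = self.rc = 0

    def clock(self, a_in, b_in, c_in):
        # Latch the three inputs, then form the new R_C = R_A * R_B + R_C.
        self.ra, self.rb = a_in, b_in
        self.rc = a_in * b_in + c_in
        # A and B pass through unchanged; C carries the updated partial sum.
        return self.ra, self.rb, self.rc


def inner_product(row, col):
    """Pass a partial sum through one MAC per term (hypothetical helper)."""
    c = 0
    for a, b in zip(row, col):
        _, _, c = MAC().clock(a, b, c)
    return c


c11 = inner_product([1, 2], [-2, 4])   # (1)(-2) + (2)(4)
print(c11)
```

Chaining cells in this way is what lets a partial sum collect one product per clock cycle as it moves through the array.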
2.7.2 Formulation of the Matrix Multiplication Algorithm
The matrix product $C = (c_{ij})$ of $A = (a_{ij})$ and
$B = (b_{ij})$ can be computed by the following
relationships [3]:
\[ c_{ij}^{(1)} = 0 \]
\[ c_{ij}^{(k+1)} = c_{ij}^{(k)} + a_{ik} b_{kj}, \qquad k = 1, 2, \ldots, n \]
\[ c_{ij} = c_{ij}^{(n+1)} \]
Figure 2.2 illustrates the algorithm using a diamond-
shaped array of linearly connected hexagonally shaped
multiply and add cells.
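The configuration of Figure 2.2 could be used to
multiply the following matrices, A x B = C:
\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\times
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
\]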
The algorithm is easily applied to larger matrices with the
addition of more MACs configured in a similar manner.
[Figure 2.2: diamond-shaped array of nine hexagonal MACs, with input coefficients a11, a12, b11, b12, b21, and b22 shown entering the array]
Figure 2.2 Hexagonal Array for Matrix Multiplication
The pattern for the input coefficients, as well as the
timing constraints, will be examined in Chapter IV.
The exact configuration of Figure 2.2 could be used to
multiply two band matrices of larger dimension. The
multiplication of two matrices with bandwidths
$w_1 = p_1 + q_1 - 1$ and $w_2 = p_2 + q_2 - 1$,
respectively, is shown in Figure 2.3.
a11 a12  0   0   0
a21 a22 a23  0   0
 0  a32 a33 a34  0
 0   0  a43 a44 a45
 0   0   0  a54 a55
         X
b11 b12  0   0   0
b21 b22 b23  0   0
 0  b32 b33 b34  0
 0   0  b43 b44 b45
 0   0   0  b54 b55
         =
c11 c12 c13  0   0
c21 c22 c23 c24  0
c31 c32 c33 c34 c35
 0  c42 c43 c44 c45
 0   0  c53 c54 c55
Figure 2.3 Banded Matrix Multiplication
From Figure 2.3, the bandwidths of A and B can be
calculated to be $w_1 = 2 + 2 - 1 = 3$ and
$w_2 = 2 + 2 - 1 = 3$, respectively. It should be noted
that this is exactly the bandwidth of a matrix which has
two columns and two rows. Thus the matrices given in
Figure 2.3 could be multiplied using the MAC configuration
of Figure 2.2. In general, if A and B are matrices of
bandwidth $w_1$ and $w_2$, then it takes $w_1 w_2$
hex-connected processors to compute the product of A and B
and obtain the resultant matrix C [3].
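The bandwidth calculation can be sketched as follows. This section does not define p and q explicitly, so the sketch assumes that p counts the diagonals on or above the main diagonal that contain nonzero entries (main diagonal included) and that q counts those on or below it; that reading reproduces w = 2 + 2 - 1 = 3 for the matrices of Figure 2.3. The function names are hypothetical.

```python
def bandwidth(matrix):
    """Bandwidth w = p + q - 1 of a square matrix (assumed definition of p, q)."""
    n = len(matrix)
    offsets = [j - i for i in range(n) for j in range(n) if matrix[i][j] != 0]
    if not offsets:
        return 0                      # an all-zero matrix has no band
    p = max(max(offsets), 0) + 1      # diagonals on or above the main diagonal
    q = max(-min(offsets), 0) + 1     # diagonals on or below the main diagonal
    return p + q - 1


def tridiagonal(n):
    """Band matrix shaped like A and B in Figure 2.3 (ones mark the band)."""
    return [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]


w1 = bandwidth(tridiagonal(5))        # shape of Figure 2.3
w2 = bandwidth([[1, 1], [1, 1]])      # a full 2-by-2 matrix
macs = w1 * w1                        # processors needed for A x B
print(w1, w2, macs)
```

Both matrices have bandwidth three, so the nine-cell array of Figure 2.2 serves either case.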
CHAPTER III
APPLICATION OF THE RNS TO MATRIX MULTIPLICATION
The matrix multiplication algorithm lends itself to an
application requiring high speed multiplication. In
addition to the requirement of high speed multiplication,
the application must also have a need to multiply matrices
of very large dimension, or to multiply matrices very
frequently. In most applications, the algorithm will
ideally be implemented on a single chip. It is expected
that this chip will be attached to a host processor,
exchanging the various input and output operands through
the system bus, as shown in Figure 3.1 [16].
[Figure 3.1: Host Processor, Memory, and Matrix Multiplication Chip attached to the System Bus]
Figure 3.1 System Configuration
Applications without the need for high speed matrix
multiplication, or without the need to multiply large
matrices successively, cannot efficiently use the
algorithm. Since the algorithm will most likely be
implemented on a separate chip, any further increase in
speed attributed to the Residue Number System would be
worth the extra design time. While the amount of time
saved for one multiplication of small matrices is not very
significant, the amount of time saved over many successive
multiplications becomes significant. As previously
mentioned, certain residue operations are much more complex
than others. The operations of multiplication and addition
are among the simplest, while magnitude comparison, sign
determination, and division have proved to be more
difficult. The matrix multiplication algorithm reduces to
successive multiply and add operations. For this reason,
it should be apparent that this algorithm is a prime
candidate for operations of the residue type. It is also
expected that if division were necessary, the overhead
required might be fatal to the application of the Residue
Number System. The following sections discuss the
considerations necessary to successfully implement the
matrix multiplication algorithm using the RNS.
3.1 Error Free Design
The residue number system has a unique property: it does
not suffer from round-off error. This can be used to the
system's advantage or disadvantage, depending on the
application. In an application where exclusively integers
will be manipulated, this is a highly desired feature. In
applications where fractions are being manipulated, error
of some magnitude is both tolerated and expected. In the
conventional binary number system, when two numbers of
arbitrary word length n are multiplied together, it is
possible to get resultant word lengths of 2n. It is common
practice to truncate the lower n bits when dealing with
fractions. Conversely, when dealing with integers, it is
common to designate a certain upper limit on the number of
bits for the system. Any time this system upper limit is
exceeded, overflow is said to have occurred. Upon the
occurrence of overflow, the result of the calculation may
be only partially complete, or completely incorrect. At
any rate, the answer is inadequate, and should never be
used for any further calculation.
It should be clear that in a residue system design, it
is not important to designate at the outset whether the
input operand will be a fraction or an integer. The system
will produce the entire output length. Depending on the
application, the designer can truncate the upper or lower
portion of unused bit positions, for integer or fractional
designs, respectively.
One consequence of the above mentioned error free
property is that the overall system dynamic range must be
determined. Considering an ordinary binary system, where
successive multiplication and addition processes occur, it
can be shown that if the dynamic range of the system (i.e.,
the maximum word length of the binary operands) is exceeded
before any of the above mentioned processes are complete,
the result will be both incorrect and unusable. In a
residue system equivalent, if the dynamic range is exceeded
during some calculation internal to the overall process,
there is still hope. It is only mandatory that the end
result remain in the dynamic range of the given system.
The following example should clarify this issue.
Let the moduli of a system be $m_1 = 3$ and $m_2 = 5$.
M = (3)(5) = 15
Thus, the system interval of definition is [0, 14].
Suppose z = (a)(b) + (c) is to be calculated, where
a = (4)_10     <-->   |a|_M = ( 1, 4 )
b = (6)_10     <-->   |b|_M = ( 0, 1 )
c = (-10)_10   <-->   |c|_M = ( 2, 0 )

     a                  ( 1, 4 )
   x b       <-->     x ( 0, 1 )
  (24)_10               ( 0, 4 )

Note: (a)(b) has already exceeded the interval of
definition; even so, the calculation is continued.

     24                 ( 0, 4 )
  + (-10)    <-->     + ( 2, 0 )
  (14)_10               ( 2, 4 )

ab + c = (24)_10 + (-10)_10 = (+14)_10   <-- desired result
Since ( 2, 4 ) is the residue representation of 14,
the calculation is exactly correct.
This property has no parallel in the binary number 
system. If at any point in a binary calculation overflow 
occurs, the resulting calculation has no predictable chance 
of being correct. 
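The example of this section can be replayed in a few lines to show that the out-of-range intermediate product does no harm. This is a minimal sketch with hypothetical helper names; decoding is done by exhaustive search only because M = 15 is tiny.

```python
MODULI = (3, 5)
M = 3 * 5  # interval of definition [0, 14]


def to_residue(x):
    return tuple(x % m for m in MODULI)


def mul(r, s):
    return tuple((a * b) % m for a, b, m in zip(r, s, MODULI))


def add(r, s):
    return tuple((a + b) % m for a, b, m in zip(r, s, MODULI))


def from_residue(r):
    """Exhaustive decode over [0, M - 1]; fine for a two-modulus example."""
    return next(n for n in range(M) if to_residue(n) == r)


a, b, c = to_residue(4), to_residue(6), to_residue(-10)
product = mul(a, b)          # represents 24, which lies outside [0, 14]
z = add(product, c)          # final result is back inside the interval
print(product, z, from_residue(z))
```

Only the final value has to fall inside the interval of definition; the intermediate value 24 is never decoded, so its overflow is harmless.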
3.2 System Dynamic Range Determination
There are two ways of determining the dynamic range of
a residue system design. The first approach is to examine
the input word length, and calculate the maximum possible
value at the output, assuming worst case (largest valued)
numbers at all inputs. The second approach is to agree on
a maximum allowable output, and hope that this range is
never exceeded. The second method is particularly useful
if a designer knows ahead of time that a certain output
value will never be exceeded. In this case, the design
would be simplified accordingly. In this research, the
first approach is used.
The input operand format assumed in this design is that
of a signed magnitude number. This format is typical in
floating point processors, although this is not a floating
point processor. It would allow easier communication with
a floating point processor, since the operands will be
input and output in the same format. It may be
advantageous to place an intermediate processor between the
host and matrix multiplier, for the purpose of pre-adjusting
the mantissa of an input floating point number, and of
readjusting the mantissa on return to the host processor
[16]. This process should be pipelined, so that the time
required to adjust the operands does not affect the
performance of the algorithm.
The actual dynamic range of this system will be
directly determined by two factors. The first factor is
the input operand word length. The second factor is the
size of the input matrices to be multiplied. Table 3.1
shows the value that determines the maximum value possible
in any position of the output matrix.
Table 3.1 Determination of Dynamic Input Range

                 Square Input Matrix Dimension
  A     B        3        4        5        6        7        8
  2     1        6        8       10       12       14       16
  3     3       54       72       90      108      126      144
  4     7      294      392      490      588      686      784
  5    15     1350     1800     2250     2700     3150     3600
  6    31     5766     7688     9610    11532    13454    15376
  7    63    23814    31752    39690    47628    55566    63504
  8   127    96774   129032   161290   193548   225806   258064

A = Signed Magnitude Input Word Length
B = Maximum Possible Magnitude of Input
From Table 3.1, for a signed magnitude input word length n,
the maximum magnitude of the input can be calculated as
$2^{n-1} - 1$. For an input word length of eight bits
(n = 8), the maximum input value can be calculated as
$2^{8-1} - 1 = 2^7 - 1 = 127$. The maximum value of any
number in the output matrix can be determined by:
\[ X = (\text{dimension of input matrix})(\text{maximum input value})^2 \]
The above formula assumes square input matrices.
Example 3.2 demonstrates the calculation of this value.
The maximum value of any one entry in the output
matrix can be calculated given both the square input matrix
size and the input word length. If the matrix input size
is 4, and the input word length is 7, we can calculate the
upper bound of any entry in the output matrix as follows:
\[ X = (4)(2^6 - 1)^2 = (4)(63)^2 = 15876 \]
Consulting Table 3.1, for an input word length of 7 and a
matrix size of 4, we do not get the same result as Example
3.2. This is because in Example 3.2 only the positive
output range was considered; the total range is found by
doubling the positive dynamic range. In the case of
Example 3.2, the total dynamic output range is given by
(2)(15876) = 31752.
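The entries of Table 3.1 follow directly from the formula above. The sketch below regenerates them; the function names are hypothetical, and the only assumptions are the ones stated in the text (signed magnitude inputs, square input matrices, total range equal to twice the positive range).

```python
def max_input(word_length):
    """Largest magnitude of a signed magnitude operand of the given length."""
    return 2 ** (word_length - 1) - 1


def positive_range(word_length, dimension):
    """Upper bound X on any entry of the output matrix."""
    return dimension * max_input(word_length) ** 2


def dynamic_range(word_length, dimension):
    """Total (positive plus negative) dynamic range, as tabulated in Table 3.1."""
    return 2 * positive_range(word_length, dimension)


for n in range(2, 9):
    row = [dynamic_range(n, d) for d in range(3, 9)]
    print(n, max_input(n), row)
```

For the design point of this research (eight-bit operands, dimension five) the required range is 161290, which is the figure used later when the moduli are selected.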
The motivation for finding the output range so
meticulously is due to the nature of the residue system.
Remember that the result will only be correct in the
case where it is enclosed by the interval of definition.
The approach taken in such a design might be that the
output must be correct for all possible inputs. Another
approach could be that of a defined interval of definition,
with some sort of assurance that the defined interval will
never be exceeded. Once again, the philosophy behind this
design is that the correct result will be achieved for all
possible input combinations of a given word length.
In this design it was decided that an eight-bit input
operand and an input matrix of bandwidth five would be
sufficient to demonstrate the advantages and disadvantages
of a residue system design. Specifying a bandwidth of five
also specifies the maximum allowable output operand value,
in exact accordance with Table 3.1. This is very
convenient from the standpoint of a general design. For an
input matrix of any arbitrary size, the algorithm works as
long as the bandwidth of the matrix is less than or equal
to five. In the event that the matrix has a bandwidth less
than five, zeros should be input at the unused input ports.
CHAPTER IV
MATRIX MULTIPLICATION ALGORITHM SIMULATION
The reference presenting this algorithm fails to
adequately introduce the timing information necessary to
successfully implement the algorithm [3]. It will be the
purpose of this chapter to develop and demonstrate the
application of the algorithm itself. Specifically, the
algorithm will be demonstrated for input matrices with an
input bandwidth of five. The timing parameters obtained
from this simulation will be needed later.
4.1 MAC Computing Structure
The computing structure will contain $w_1 w_2$ multiply
and add cells. Therefore, a diamond shaped array of
twenty-five processors will be necessary to implement the
algorithm. Figure 4.1 shows the structure to be used for
the simulation. A MAC referencing system is necessary, so
that each separate MAC can be identified individually from
the surrounding processors. As shown in Figure 4.1, the
numbering convention is that of starting at the top, and
numbering each MAC from left to right, consecutively, in a
row-wise fashion. With the exception of the lower-most MAC
in each vertical column, each MAC has six external boundary
connections. These connections must also have
distinguishable names. The naming convention for MAC4 is
shown in Figure 4.2.
[Figure 4.1: diamond-shaped array of MACs numbered MAC1 through MAC25, with boundary ports labeled by MAC number and direction, e.g. 1R, 2R, 4R, 7R, 11R]
Figure 4.1 Computing Array for Matrix of Bandwidth Five
[Figure 4.2: MAC4 with its connections labeled 4R, 4L, and 4O]
Figure 4.2 MAC Input/Output Naming Convention
It should be noted that all inputs to the MAC traveling
from left to right are labeled 4R. All inputs to the MAC
traveling from right to left are labeled 4L. The upwards
going input is labeled 4I, while the upwards going output
is labeled 4O. All other cells are named using a similar
convention.
4.2 Input Matrix Coefficient Timing
Crucial to the success of this algorithm is the
pattern of input coefficients. The pattern is somewhat
regular in structure. After simulation of the algorithm is
complete, the output port coefficient timing will be
apparent. Timing patterns for $A = (a_{ij})$ and
$B = (b_{ij})$ will be examined. Figure 4.3 shows the
exact format of the input matrices A and B, each having a
bandwidth of five.
A
a11 a12 a13  0   0
a21 a22 a23 a24  0
a31 a32 a33 a34 a35
 0  a42 a43 a44 a45
 0   0  a53 a54 a55
B
b11 b12 b13  0   0
b21 b22 b23 b24  0
b31 b32 b33 b34 b35
 0  b42 b43 b44 b45
 0   0  b53 b54 b55
Figure 4.3 Input Matrices of Bandwidth Five
The A matrix input coefficients will march into the
computing array of Figure 4.1 from the left hand side
towards the right. The B matrix input coefficients will
march into the computing array from the right hand side
towards the left. There are five input ports into which
the A matrix coefficients will go, namely 11R, 7R, 4R, 2R,
and 1R. There are also five input ports into which the B
matrix coefficients will go, namely 1L, 3L, 6L, 10L, and
15L. Table 4.1 shows the input port coefficient timing
table. The input timing was found by trial and error.

Table 4.1 Matrix A and B Input Coefficient Timing
(a dot marks a clock cycle with no coefficient at that port)

Matrix A Input Timing Table
11R    .    .   a31   .    .   a42   .    .   a53   .    .    .    .
7R     .   a21   .    .   a32   .    .   a43   .    .   a54   .    .
4R    a11   .    .   a22   .    .   a33   .    .   a44   .    .   a55
2R     .    .   a12   .    .   a23   .    .   a34   .    .   a45   .
1R     .    .    .    .   a13   .    .   a24   .    .   a35   .    .
      Increasing Time -->

Matrix B Input Timing Table
1L     .    .    .    .   b31   .    .   b42   .    .   b53   .    .
3L     .    .   b21   .    .   b32   .    .   b43   .    .   b54   .
6L    b11   .    .   b22   .    .   b33   .    .   b44   .    .   b55
10L    .   b12   .    .   b23   .    .   b34   .    .   b45   .    .
15L    .    .   b13   .    .   b24   .    .   b35   .    .    .    .
      Increasing Time -->

All coefficients on a horizontal row are input into the same
input port. All coefficients on a vertical segment are
input at different ports, during the same clock cycle.
Going in increments of one from left to right corresponds
to one clock cycle. This means that input coefficient
$b_{11}$ is input one clock cycle before coefficient
$b_{12}$, and that input coefficient $b_{22}$ is input
three clock cycles before input coefficient $b_{33}$. With
the input coefficient timing established, the development
of the simulation may be examined.
4.3 Algorithm Simulation Development
In order for the development to be clear, the reader
is advised to consult Figure 4.1 and Figure 4.2 as
necessary. The algorithm may be broken down into three
basic transfers at the MAC level. Each of these transfers
occurs at the onset of a clock cycle. The first basic
transfer is an operand traveling from left to right across
the array. An example of this is the operand at 5R being
transferred to 9R. The second basic transfer is similar to
the first, except that instead of moving from left to
right, an operand moves from right to left. An example of
this is the operand at 8L being transferred to 12L. The
third basic transfer is only slightly more complex than the
first two. During any given clock cycle, the MAC computes
the product and sum of the three inputs, which appears at
the output of the MAC before the next clock cycle begins.
Using MAC9 from Figure 4.1 as an example, in a given clock
cycle, MAC9 multiplies (9R)(9L), adds (9I) to this, and
places the result on 9O before the clock cycle is finished.
At the onset of the next clock cycle, the operand at 9O
will be stored in the register in front of MAC3. In other
words, the value at 9O is transformed into 3I when the
clock pulse occurs. A program was written modeling this
transfer level description of the MAC array. The complete
output of the simulation may be found in Appendix A of this
thesis, but the results are discussed here.
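The three basic transfers can be modelled compactly. The program written for this research appears in Appendix A and is not reproduced here; what follows is an independent sketch under stated assumptions. Cells are indexed by hypothetical coordinates (u, v), where u counts steps along the left-to-right path and v counts steps along the right-to-left path, so MAC1 is (0, 0) and the upward transfer goes from (u, v) to (u - 1, v - 1). The input schedule (zero-based: a[i][k] enters at cycle i + 2k, b[k][j] at cycle j + 2k) is an assumption chosen to be consistent with those transfers. The test matrices are illustrative band matrices of bandwidth five.

```python
def systolic_multiply(A, B, h=2):
    """Transfer-level model of a (2h+1) x (2h+1) hex array for band matrices."""
    n, w = len(A), 2 * h + 1

    def a_in(v, t):
        # The port feeding line v carries the diagonal k - i = h - v.
        d = h - v
        i, r = divmod(t - 2 * d, 3)
        k = i + d
        return A[i][k] if r == 0 and 0 <= i < n and 0 <= k < n else 0

    def b_in(u, t):
        # The port feeding line u carries the diagonal j - k = u - h.
        d = u - h
        k, r = divmod(t - d, 3)
        j = k + d
        return B[k][j] if r == 0 and 0 <= k < n and 0 <= j < n else 0

    ra = [[0] * w for _ in range(w)]   # latched A operands, ra[u][v]
    rb = [[0] * w for _ in range(w)]   # latched B operands
    rc = [[0] * w for _ in range(w)]   # MAC outputs from the previous cycle
    C = [[0] * n for _ in range(n)]
    cycles = 0
    for t in range(3 * n + 2 * w):
        # First and second transfers: A moves along u, B moves along v.
        na = [[ra[u - 1][v] if u else a_in(v, t) for v in range(w)] for u in range(w)]
        nb = [[rb[u][v - 1] if v else b_in(u, t) for v in range(w)] for u in range(w)]
        # Third transfer: multiply, add the sum arriving from below, pass upward.
        nc = [[na[u][v] * nb[u][v]
               + (rc[u + 1][v + 1] if u + 1 < w and v + 1 < w else 0)
               for v in range(w)] for u in range(w)]
        ra, rb, rc = na, nb, nc
        for u in range(w):
            for v in range(w):
                if u and v:
                    continue            # only boundary cells drive output ports
                m, r = divmod(t - u - v - 2 * h, 3)
                i, j = m + v, m + u
                if r == 0 and m >= 0 and i < n and j < n:
                    C[i][j] = rc[u][v]  # a finished coefficient leaves the array
                    cycles = t + 1
    return C, cycles


A = [[2, 3, 9, 0, 0], [1, 2, 3, -3, 0], [4, 5, 7, -2, 2],
     [0, 6, -2, 1, 5], [0, 0, -9, 9, 3]]
B = [[1, 4, 8, 0, 0], [3, 2, -5, 3, 0], [-4, -6, -7, 4, -4],
     [0, 7, 3, 4, -7], [0, 0, -9, -1, -5]]
C, cycles = systolic_multiply(A, B)
print(C, cycles)
```

Under these assumptions the model needs seventeen clock cycles for two 5-by-5 matrices of bandwidth five, and the result agrees with the ordinary matrix product.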
4.4 Algorithm Simulation
Simulation of this algorithm is important for several
reasons. First, the determination of the input coefficient
timing is necessary to implement this array structure. Of
equal importance, the output coefficient timing will be
determined by examining the simulation results. Finally,
it is important in this research to determine exactly how
many clock cycles it takes to perform a complete matrix
multiplication, from start to finish.
The simulation input matrices, as well as the
resultant matrix, are shown in Table 4.2. The input
matrices A and B were chosen arbitrarily to demonstrate how
the algorithm works. Table 4.2 shows the actual A and B
input coefficients as they are input into the array. Also
shown are the output coefficients, which will be used to
generalize the output matrix coefficient timing sequence.

Table 4.2 Algorithm Simulation for Arbitrary Input Matrices

Input Matrix A        Input Matrix B        Resultant Matrix C
 2  3  9  0  0         1  4  8  0  0        -25 -40 -62  45 -36
 1  2  3 -3  0         3  2 -5  3  0         -5 -31 -32   6   9
 4  5  7 -2  2        -4 -6 -7  4 -4         -9 -30 -66  33 -24
 0  6 -2  1  5         0  7  3  4 -7         26  31 -58   9 -24
 0  0 -9  9  3         0  0 -9 -1 -5         36 117  63  -3 -42

Matrix A Input Timing
11R    0   0   4   0   0   6   0   0  -9   0   0   0   0   0   0
7R     0   1   0   0   5   0   0  -2   0   0   9   0   0   0   0
4R     2   0   0   2   0   0   7   0   0   1   0   0   3   0   0
2R     0   0   3   0   0   3   0   0  -2   0   0   5   0   0   0
1R     0   0   0   0   9   0   0  -3   0   0   2   0   0   0   0

Matrix B Input Timing
1L     0   0   0   0  -4   0   0   7   0   0  -9   0   0   0   0
3L     0   0   3   0   0  -6   0   0   3   0   0  -1   0   0   0
6L     1   0   0   2   0   0  -7   0   0   4   0   0  -5   0   0
10L    0   4   0   0  -5   0   0   4   0   0  -7   0   0   0   0
15L    0   0   8   0   0   3   0   0  -4   0   0   0   0   0   0

Output Matrix Timing
11O    0   0   0   0  36   0   0   0   0   0   0   0   0
7O     0   0   0  26   0   0 117   0   0   0   0   0   0
4O     0   0  -9   0   0  31   0   0  63   0   0   0   0
2O     0  -5   0   0 -30   0   0 -58   0   0  -3   0   0
1O   -25   0   0 -31   0   0 -66   0   0   9   0   0 -42
3O     0 -40   0   0 -32   0   0  33   0   0 -24   0   0
6O     0   0 -62   0   0   6   0   0 -24   0   0   0   0
10O    0   0   0  45   0   0   9   0   0   0   0   0   0
15O    0   0   0   0 -36   0   0   0   0   0   0   0   0

The output coefficient timing can now be generalized
for the C matrix, and is shown in Table 4.3.
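Table 4.3 Output Coefficient Timing
(a dot marks a clock cycle with no coefficient at that port)

11O    .    .    .    .   c51   .    .    .    .    .    .    .    .
7O     .    .    .   c41   .    .   c52   .    .    .    .    .    .
4O     .    .   c31   .    .   c42   .    .   c53   .    .    .    .
2O     .   c21   .    .   c32   .    .   c43   .    .   c54   .    .
1O    c11   .    .   c22   .    .   c33   .    .   c44   .    .   c55
3O     .   c12   .    .   c23   .    .   c34   .    .   c45   .    .
6O     .    .   c13   .    .   c24   .    .   c35   .    .    .    .
10O    .    .    .   c14   .    .   c25   .    .    .    .    .    .
15O    .    .    .    .   c15   .    .    .    .    .    .    .    .
      Increasing Time -->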
The amount of time for one complete matrix 
multiplication can now be extracted. The reader should 
note the regularity of the wedge shaped output coefficient 
pattern in Table 4.3.
Several comments can now be made about the overall 
timing of the algorithm. It takes five clock cycles before 
any output coefficient appears at an output port. It takes 
an additional twelve clock cycles before the last 
coefficient is output. Thus, it takes a total of 
seventeen clock cycles to completely multiply two matrices 
of bandwidth five together. 
In general, the processing time to multiply two
matrices of an arbitrary bandwidth w ($w = w_1 = w_2$) is
given by the following equation:
\[ T_p = (3w - 4) + 3S \]
where:
$T_p$ = Overall Processing Time
S = Dimension amount larger than a minimum
sized matrix of the same bandwidth
For example, a minimum sized matrix of bandwidth 3 is a 2X2
matrix, while a minimum sized matrix of bandwidth 5 is a
3X3 matrix. The amount of time to multiply two matrices of
bandwidth five and dimension 4X4 is calculated from the
above equation as $T_p = (3(5) - 4) + 3(4 - 3) = 14$ clock
cycles.
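The processing time equation is easy to evaluate for other cases. In the sketch below, the minimum dimension for a given bandwidth is taken to be (w + 1)/2, which is an assumption inferred from the two examples in the text (bandwidth 3 gives 2X2, bandwidth 5 gives 3X3); the function name is hypothetical.

```python
def processing_time(w, dimension):
    """Clock cycles T_p = (3w - 4) + 3S for two band matrices of bandwidth w."""
    min_dimension = (w + 1) // 2      # assumed from the examples in the text
    if dimension < min_dimension:
        raise ValueError("matrix too small for this bandwidth")
    s = dimension - min_dimension     # amount larger than the minimum size
    return (3 * w - 4) + 3 * s


print(processing_time(5, 3))   # minimum sized matrix of bandwidth five
print(processing_time(5, 4))   # the 4X4 example of the text
print(processing_time(5, 5))   # the simulated 5X5 case
```

The 5X5 case evaluates to seventeen cycles, matching the count obtained from the simulation, and each additional row and column adds three cycles.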
This concludes the algorithm simulation discussion. 
The whole simulation output, rather than the summarized 
results presented in this chapter, can be found in Appendix 
A of this thesis. 
CHAPTER V 
DESIGN DEVELOPMENT 
This chapter introduces the design to be simulated in 
this research. The approach of this chapter will be that 
of a detailed presentation, such that this design could be 
duplicated by the reader. Some of the smaller details will 
be presented as they are important in the design of a 
residue system. The following section describes the design 
at a system level. Subsequent sections examine the steps 
in designing the major blocks of the system. 
5.1 Residue System Specifications
It was agreed upon that the design to be simulated in 
this research should be large enough to demonstrate the 
application of the residue system to a problem of useful 
complexity. The design presented in this chapter assumes 
an input operand of eight bits, in signed magnitude format. 
This allows an input range from -127 to +127. An upper
limit bandwidth of five is placed on the input matrices A 
and B. There are many scientific algorithms requiring the 
multiplication of banded form matrices with bandwidth 
dimensions of five and smaller. 
With the global system requirements specified, the
process of moduli selection may begin. First the dynamic
range of the system must be determined. From Table 3.1,
for an input of eight bits and a matrix size of 5, we see
that the overall dynamic range of the system is 161290.
Although the selection of moduli is arbitrary, it is
beneficial to choose pairwise relatively prime moduli.
This is done so that the Chinese Remainder Theorem can be
implemented, rather than the alternate form of the Chinese
Remainder Theorem. Before the moduli set chosen for this
design is presented, several comments should be made about
moduli selection. It is important to have as few moduli as
possible, yet it is also true that hardware complexity
increases as the moduli size increases. There exists a set
of equations to generate moduli sets that are pairwise
relatively prime. It is not convenient to use these
equations, since they tend to select small numbers
initially, with the moduli size growing very rapidly. The
design of the system presented here uses a set of five
moduli, although a set of four moduli of the proper
magnitude would successfully satisfy the system
requirements. The reason five moduli were chosen instead
of four will be explained shortly. The moduli set for this
research is given by
$(m_1, m_2, m_3, m_4, m_5) = (7, 11, 13, 15, 16)$. First
it should be noted that the moduli are pairwise relatively
prime. The moduli seven, eleven, and thirteen are prime
numbers themselves, so there is no concern that these
moduli are not pairwise relatively prime. Fifteen and
sixteen have a greatest common divisor of one, so they are
relatively prime with respect to each other, and with
respect to the other moduli as well. The reason for
selecting five moduli, instead of four, is the convenience
of implementing modulo $2^n$ addition and multiplication.
It was found that modulo $2^n$ addition and multiplication
are identical to conventional binary addition and
multiplication when only the lower n bits of the result are
kept. Although it is not apparent to the reader at the
present time, it will be demonstrated that very little
effort will be required to implement the modulo 16
operations. Thus, designing the MAC portion of the system
will be equivalent to the design of a four moduli system.
The primary advantage of having five smaller moduli, rather
than four larger, is the speed of the computation involved.
Remember that the residue of a number z with respect to a
modulus m is governed by the following relation:
\[ 0 \le |z|_m < m \]
Thus the range of the residue representations for all the
moduli will be between zero and fifteen, all of which can
be expressed by four binary digits. This will not be a
critical component of the design, but will contribute to
the performance of the residue design. The choice of the
moduli set is a task left to the designer. Assuming the
set of moduli will satisfy the system requirements, there
is no simple and clear cut way to arrive at the optimum
set. There is no guarantee that the moduli set chosen in
this research is optimum. It is not even clear what the
word optimum means, since in one design it may be necessary
to optimize the circuit area, the speed of operation, or a
combination of both.
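The stated properties of the moduli set can be verified mechanically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import gcd, prod

MODULI = (7, 11, 13, 15, 16)
REQUIRED_RANGE = 161290   # eight-bit operands, dimension five (Table 3.1)

# Every pair of moduli must have a greatest common divisor of one.
pairwise_prime = all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))

# The product of the moduli is the size of the interval of definition.
M = prod(MODULI)

# Number of binary digits needed for the largest residue digit, m - 1.
digit_bits = [(m - 1).bit_length() for m in MODULI]

print(pairwise_prime, M, M >= REQUIRED_RANGE, digit_bits)
```

The product of the five moduli is 240240, which covers the required range of 161290, and no residue digit needs more than four bits.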
With the moduli set chosen above, the various facets
of the design may now be investigated. The most prevalent
portion of the design is the multiply and add cell, which
will be presented first. Also, the approach used to
translate into and out of the residue representation will
be presented.
5.2 Multiply and Add Cell
The multiply and add cell is the most basic building
block of the matrix multiplication algorithm. The MAC of
the residue design will still be referred to as a MAC,
although it will consist of five smaller blocks. The
residue MAC is shown in Figure 5.1.
[Figure 5.1: five parallel blocks (Modulo 7 MAC, Modulo 11 MAC, Modulo 13 MAC, Modulo 15 MAC, Modulo 16 MAC), each with its own A, B, and C connections]
Figure 5.1 Residue MAC Configuration
It should be noted that each MAC independently processes 
the corresponding residue digits of the input operands A, 
B, and C. 
There are several approaches that can be taken when
implementing modulo addition and multiplication. There has
previously not been any research effort in concurrent
addition and multiplication modulo m. It would be easy to
implement modulo m multiplication, and then implement
modulo m addition, but this method was found to be very
time consuming. There are very few, if any, papers on
implementing residue multiplication without using the ROM
approach, which is not a viable solution to this problem.
There are several methods of implementing residue addition.
One proposed method adds two numbers together, then
repetitively subtracts the modulus from the sum until the
sign of the result changes from positive to negative. When
this occurs, the modulus is added back to the current sum,
which then becomes the result modulo m [7]. This method is
acceptable when the addition process can be pipelined, but
is very time consuming to implement sequentially due to the
large number of subtractions necessary. Another proposed
method recognizes that residue addition is cyclic and, as a
consequence, uses shift and rotate logic to correctly
select the desired result [8]. This method grows very
large in complexity, even for moduli of modest size. It
was reported that implementation of modulo fifteen addition
requires over seven hundred logic gates. This method will
not be acceptable in this research either. It must be
remembered that this does not include the hardware required
to implement modulo m multiplication.
An alternative approach to the problem is to examine
the two processes that must occur simultaneously, namely
modulo m multiplication and addition. Each of the five
digits of a residue representation can be expressed in four
binary digits, with the exception of the first modulus,
which is 7, where the residue digits can be expressed in
three binary bits. The method proposed in this research is
a hybrid approach to the problem, in the sense that
operations of both binary and residue types will be used.
It was found that after the binary multiplication and
addition of A, B, and C for each modulus has occurred, the
result could be taken modulo $m_i$. This approach requires
the implementation of two truth tables for each modulus,
requiring a total of eight truth tables for the entire
design. Although it is undesirable to have large numbers
of truth tables in a design due to their irregular
structure, in this case it will be acceptable, because the
same structure will be used repetitively. It will be found
later that both the input and output translation problems
will use the same modular truth table blocks. Another
important factor in allowing the use of truth tables is
their simplicity in design. Each of the truth tables will
only be required to have five inputs and four outputs. A
generalization of this approach to designs of larger
complexity will be made at a later point in this research.
5.2.1 MAC Functional Configuration
The MAC configuration proposed in this research is
shown in Figure 5.2. It should be noted that this
configuration is for a single modulus inside the MAC
boundary shown, thus there will be four such designs inside
the MAC. There would be five, except that modulo 16
operations are simplified and will not need the full
configuration shown in Figure 5.2. Also, the modulo 7
design will not have as many input and output bits, but the
overall structure will be identical. Before the design of
the individual blocks of the MAC is discussed, an example
will be used to illustrate operation at a functional level
for an individual modulus.
Figure 5.2 Proposed MAC Configuration for Each Modulus
(blocks: Modified Braun Array, AxB + C; Upper Truth Table
Modulo m; Lower Truth Table Modulo m; Four Bit Binary Adder;
Lower Truth Table Modulo m)

For the modulus 15, given the following inputs:

A = $(12)_{10} = (1100)_2$
B = $(9)_{10} = (1001)_2$
C = $(5)_{10} = (0101)_2$

the result P = AxB + C can be computed using the
proposed structure as follows:

From the Modified Braun Array,
$A \times B + C = (12)(9) + (5) = (113)_{10} = (1110001)_2$
From the Upper Truth Table,
$|(1100000)_2|_{15} = |(96)_{10}|_{15} = (6)_{10}$
From the Lower Truth Table,
$|(10001)_2|_{15} = |(17)_{10}|_{15} = (2)_{10}$
From the Four-Bit Binary Adder,
$(2)_{10} + (6)_{10} = (0010)_2 + (0110)_2 = (1000)_2 = (8)_{10}$
From the second Lower Truth Table,
$|(8)_{10}|_{15} = |(1000)_2|_{15} = (8)_{10}$
Thus, the modulo 15 result P = AxB + C is:
$|(113)_{10}|_{15} = P = (8)_{10}$

This is the exact answer obtained by the MAC
configuration, hence the configuration is functionally
correct.
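The functional flow of the example can be mirrored in a short Python model. This is a sketch under stated assumptions: the truth tables are modelled as lookup lists rather than logic equations, and the names `make_tables` and `residue_mac` are hypothetical.

```python
MODULI = (7, 11, 13, 15)  # modulo 16 needs no truth tables


def make_tables(m):
    """Build lookup-list models of the lower and upper truth tables."""
    lower = [v % m for v in range(32)]        # five low-order product bits
    upper = [(k << 5) % m for k in range(8)]  # three high-order product bits
    return lower, upper


def residue_mac(a, b, c, m):
    """Compute |a*b + c| mod m along the block structure of the MAC."""
    lower, upper = make_tables(m)
    p = a * b + c              # Modified Braun Array output (eight bits)
    lo = lower[p & 0b11111]    # Lower Truth Table
    hi = upper[p >> 5]         # Upper Truth Table
    s = lo + hi                # Four-Bit Binary Adder (five-bit sum)
    return lower[s]            # second Lower Truth Table


print(residue_mac(12, 9, 5, 15))  # 8, as in the worked example
```

The second lookup is sufficient because each table output is at most m-1, so the adder output never exceeds 28 and always fits the five-bit input of the lower truth table.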
This concludes the description of the MAC
configuration at the functional level. The individual
blocks of the functional configuration of Figure 5.2 will
now be examined more closely at the design level.
5.2.2 Modified Braun Array
A modified form of the Braun array is used for the
binary multiplication and addition of input operands A, B,
and C. The structure to be used in this design is shown in
Figure 5.3. This structure is the common input stage to
all moduli of the MAC.
Figure 5.3 Modified Braun Array, P = AxB + C
A typical Braun array consists of the lower portion of the
array of full adders. A top row of full adder cells was
added to the array in order to accommodate the additive
input C. Thus the multiplication of A and B, and the
addition of C to the product of A and B, occur
simultaneously. Full adder cells will appear throughout
this thesis, and it is appropriate at this point to
introduce the repetitively used full adder cell. The
unmodified full adder cell is shown in Figure 5.4a. The
full adder cell in Figure 5.4b is modeled after the cell in
Figure 5.4a, with several "cosmetic" differences; the two
cells are functionally identical. First, the plotter
resolution does not include the bubble at the output of
NAND gates, so that NAND gates appear to be AND gates,
although this is not the case. Secondly, the very last gate
in Figure 5.4b really is a hardwired AND gate, but Neted
does not contain a hardwired AND gate in the component
library. Thus, a regular AND gate with an area and time
delay of zero was inserted for functional and simulation
purposes. Thirdly, the vertical dimension of the actual
implementation is reduced so that more full adder cells
could be placed on the schematic editor screen. It is for
this reason that the NAND gates are offset from one another
rather than in a straight line. The hardware implementation
of the Modified Braun Array is shown in Figure 5.5. This
array will be used at several places, and will be referred
to regularly.
Figure 5.4 Full Adder Cell Design
Figure 5.5 Modified Braun Array Hardware
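A bit-level Python sketch can illustrate how an array of full adder cells forms AxB + C. It is only a behavioural model under assumptions: it accumulates C and the four partial-product rows with ripple chains of full adders, and does not reproduce the exact carry-save wiring of Figure 5.3. All function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def to_bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add_rows(x_bits, y_bits):
    """Add two equal-length bit rows with a chain of full adders."""
    out, carry = [], 0
    for xb, yb in zip(x_bits, y_bits):
        s, carry = full_adder(xb, yb, carry)
        out.append(s)
    return out  # carry out of the top bit is zero for this operand range


def multiply_add(a, b, c, width=4):
    """Model of P = A*B + C for four-bit operands, eight-bit result."""
    acc = to_bits(c, 2 * width)               # row seeded with additive input C
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    for j, bj in enumerate(b_bits):
        row = [0] * (2 * width)
        for i, ai in enumerate(a_bits):
            row[i + j] = ai & bj              # AND gate forms partial product
        acc = add_rows(acc, row)
    return from_bits(acc)


print(multiply_add(12, 9, 5))  # 113
```

Because 15*15 + 15 = 240 fits in eight bits, discarding the final carry in `add_rows` never loses information for four-bit operands.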
5.2.3 Lower Truth Table Modulo m
The lower truth table block has five inputs and four
outputs. The input bits are the five lowest order bits of
the result from the Braun array. The output bits represent
the input modulo m ( $|\text{input}|_m$ ). There are five
different moduli in this design, but only four of them have
hardware requirements. Modulo 16 operations are equivalent
to typical binary operations, thus no further manipulation
of the result from the Braun array is necessary ( the lower
four bits are the needed result ). The detailed and
complete design procedure of the lower modulo 15 truth
table will be presented, as well as the results of the
modulo 7, modulo 11, and modulo 13 truth tables.
Table 5.1 shows the decimal equivalent of the five-bit
binary input, as well as the result modulo 15, which is the
desired output of the truth table. The four truth table
outputs are labeled W, X, Y, and Z, in order of decreasing
significance. For example, a decimal equivalent input of
23 gives a result of $8 = |23|_{15}$. From Table 5.1, a
Karnaugh Map may be formed for each individual output bit,
W, X, Y, and Z. Karnaugh Maps allow output variables to be
simplified into logical equations, as functions of their
inputs. The Karnaugh Maps for the Modulo 15 truth table
are shown in Figure 5.6. The simplified output equations
derived from the Karnaugh Maps are as follows:
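Table 5.1 Modulo 15 Truth Table

Five Bit    Result    Truth Table Outputs
Result      Mod 15    W  X  Y  Z
    0          0      0  0  0  0
    1          1      0  0  0  1
    2          2      0  0  1  0
    3          3      0  0  1  1
    4          4      0  1  0  0
    5          5      0  1  0  1
    6          6      0  1  1  0
    7          7      0  1  1  1
    8          8      1  0  0  0
    9          9      1  0  0  1
   10         10      1  0  1  0
   11         11      1  0  1  1
   12         12      1  1  0  0
   13         13      1  1  0  1
   14         14      1  1  1  0
   15          0      0  0  0  0
   16          1      0  0  0  1
   17          2      0  0  1  0
   18          3      0  0  1  1
   19          4      0  1  0  0
   20          5      0  1  0  1
   21          6      0  1  1  0
   22          7      0  1  1  1
   23          8      1  0  0  0
   24          9      1  0  0  1
   25         10      1  0  1  0
   26         11      1  0  1  1
   27         12      1  1  0  0
   28         13      1  1  0  1
   29         14      1  1  1  0
   30          0      0  0  0  0
   31          1      0  0  0  1

Figure 5.6 Modulo 15 Karnaugh Maps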
W = BD + BC + ABDE + ABCDE
X = CD + ABC + ACDE + ACE + BCE
Y = ADE + ADE + ABD + ACD + BDE + CDE
Z = ADE + ADE + ABE + ABE + ABCE + ACE + ABCDE
These equations can be implemented with multi-input NAND
gates and inverters. The hardware implementation of the
Modulo 15 truth tables is shown in Figure 5.7. Once
again, this is an all NAND realization even though plotter
resolution does not allow for bubbles on the outputs of
NAND gates. The circuitry on the left hand side is the
lower truth table implementation; the circuitry on the
right hand side is the upper truth table implementation
(discussed in the following section). Similar truth
tables and Karnaugh Maps can be formed for the remaining
moduli (found in Appendix B). The results are given below:

Modulo 7:
W = 0
X = ACD + ABCE + BCDE + ACD + ABCE + BCDE
Y = ABDE + ABCD + ABDE + ACDE + ACDE + ABCD +
    ABDE + ABCD + ABDE + ACDE
Z = BCE + BCE + ABCE + ABDE + ABDE + BCDE + ABCDE

Modulo 11:
W = ABCD + ABCE + ABCD + ABCD + ABCDE
X = ABC + ACDE + ABCD + ABCD + ABCD + ACDE
Y = ADE + ABD + BCDE + ADE + ABD + BCDE
Z = ABE + ABCE + BCDE + ABDE + ABE + ACDE + ABCE

Modulo 13:
W = ABDE + ABC + BCD + ABCE + ABCD
X = ABC + ACDE + ABCD + ABCD + ACDE + BCDE
Y = ABD + ADE + ACD + BDE + ADE + ABCD
Z = ABE + ACE + ABE + ACDE + ABCE + ABDE + ABCDE

Figure 5.7 Modulo 15 Truth Tables
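The contents of a lower truth table can be regenerated numerically, which gives a way to check any candidate logic equation against Table 5.1. The Python sketch below does this for modulo 15. Two points are assumptions rather than statements from the text: the inputs are taken as A (most significant) through E (least significant), and `w_assumed` is only one possible reading of the W equation, with complements placed so that it agrees with the table, since complement bars are not visible in the equations as reproduced here.

```python
def lower_truth_table(m):
    """Rows of (five-bit input, input mod m, (W, X, Y, Z))."""
    rows = []
    for value in range(32):
        r = value % m
        bits = tuple((r >> k) & 1 for k in (3, 2, 1, 0))  # W, X, Y, Z
        rows.append((value, r, bits))
    return rows


def w_assumed(value):
    """Assumed reading of W for modulo 15 (complement placement is a guess):
    W = B D' + B C' + A' B D E' + A B' C D E."""
    a, b, c, d, e = ((value >> k) & 1 for k in (4, 3, 2, 1, 0))
    na, nb, nc, nd, ne = (1 - v for v in (a, b, c, d, e))
    return (b & nd) | (b & nc) | (na & b & d & ne) | (a & nb & c & d & e)


table = lower_truth_table(15)
print(table[23])  # (23, 8, (1, 0, 0, 0))
print(all(w_assumed(v) == bits[0] for v, _, bits in table))
```

The same generator with m = 7, 11, or 13 yields the tables for the remaining moduli.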
5.2.4 Upper Truth Table Modulo m
The upper truth table is developed in a manner similar
to the lower truth table. Each upper truth table has five
inputs and four outputs, but the significance of the input
bit positions is greater than that of the lower truth
tables. From Figure 5.2, it is seen that the Modified
Braun Array structure has eight outputs, five of them
connected to the lower truth table as inputs, and three of
them connected to the upper truth table. The reader should
note that only the three lower order bits are necessary for
the MAC, but it will be shown shortly that the input
translation problem requires the uppermost two bits. The
truth tables and Karnaugh Maps are very similar to those
previously presented, except for the relative magnitude of
each input bit position. Previously a binary input of 00011
represented the decimal value of three. For the upper truth
tables this is not the case; a similar input yields a
decimal value of ninety-six, $(00011\mathrm{XXXXX})_2$,
remembering that the least significant input bit is really
the sixth bit position from the Modified Braun Array. The
results of the Karnaugh Maps for the various moduli are
presented below:
Modulo 7:
W = 0
X = DE + BD + CE
Y = CD + CE + AD
Z = AD + ADE + ACD

Modulo 11:
W = ACE + ACD
X = C + AD + AE
Y = ADE + ACD + ABD + ADE + ADE
Z = AE + ADE + B + CE

Modulo 13:
W = ADE + CE + B + AE
X = AE + ADE + B + ACE + ACD
Y = AE + BCD + CDE + CE
Z = AD + CDE + ABDE

Modulo 15:
W = ADE + BC
X = ABD + ADE + ADE
Y = AE + ABE
Z = 0
The hardware implementation of the modulo 15 upper 
truth table was shown previously in Figure 5. 7. 
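The weighting of the upper truth table inputs can be made concrete with a small Python sketch. It assumes the table simply maps a five-bit input k to the residue of 32k, which is how the ninety-six example reads; the function name is hypothetical.

```python
def upper_truth_table(m, input_bits=5):
    """Entry k holds |32 * k| mod m, since input bit 0 has weight 32."""
    return [(k << 5) % m for k in range(1 << input_bits)]


for m in (7, 11, 13, 15):
    table = upper_truth_table(m)
    # Input 00011 stands for 96; the first eight entries serve the MAC.
    print(m, table[0b00011], table[:8])
```

For modulo 15 the eight MAC entries are 0, 2, 4, ..., 14, which is consistent with the Z = 0 output listed for that modulus.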
5.2.5 Four-Bit Binary Adder
The binary adder is used to add the two four-bit
outputs from the upper and lower truth tables. A serial
ripple carry adder is used in this case, which is simply a
one-dimensional array of full adder cells. To form the
four-bit binary adder, four of the cells in Figure 5.4 are
repeated, and connected such that the carry out from lower
bit positions becomes the carry in for higher bit
positions. The complete four-bit binary adder is shown in
Figure 5.8.
Figure 5.8 Four-Bit Binary Adder
5.3 Input Translation
Input translation must occur before the various
operands may enter the MAC computing array. Globally, the
matrix multiplication chip is composed of three major
parts: the input translator, the MAC array, and the output
translator. This section will discuss input translation.
The goal of the input translation portion is to
convert an eight-bit input operand, in signed magnitude
format, to the proper residue system representation. The
method used for input translation uses several functional
blocks common to the Multiply and Add Cell, reducing the
design time. It was found that if the input operand is a
negative number, $|x|_{m_i} = |M - b|_{m_i}$ must be
calculated (where M = 7*11*13*15*16, and b is the absolute
value of the input operand) to obtain the correct residue
representation, rather than $|x|_{m_i}$ directly, which is
used when the input operand is positive. Example 5.2 gives
several examples of the input conversion process.
Example 5.2:
m = ( 7, 11, 13, 15, 16 )
M = 7*11*13*15*16 = 240240
Given the following input operands:
A = $(10011010)_2 = (-26)_{10}$
B = $(01101001)_2 = (+105)_{10}$
C = $(10001001)_2 = (-9)_{10}$
the residue representations are found as follows:
$|A|_{m_i} = |240240 - 26|_{m_i}$,  i = 1, 2, 3, 4, 5
$|B|_{m_i} = |105|_{m_i}$,  i = 1, 2, 3, 4, 5
$|C|_{m_i} = |240240 - 9|_{m_i}$,  i = 1, 2, 3, 4, 5
The residue representations are:
A = ( 2, 7, 0, 4, 6 )
B = ( 0, 6, 1, 0, 9 )
C = ( 5, 2, 4, 6, 7 )
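The conversion rule of the example can be checked with a short Python sketch. It models only the arithmetic definition (residues of M minus the magnitude for negative operands), not the hardware; the function names are hypothetical.

```python
MODULI = (7, 11, 13, 15, 16)
M = 7 * 11 * 13 * 15 * 16  # 240240


def signed_magnitude_to_int(word):
    """Decode an eight-bit signed magnitude word (bit 7 is the sign)."""
    magnitude = word & 0x7F
    return -magnitude if word & 0x80 else magnitude


def to_residues(x):
    """Residue digits of x, using M - |x| when x is negative."""
    value = x if x >= 0 else M - abs(x)
    return tuple(value % m for m in MODULI)


for word in (0b10011010, 0b01101001, 0b10001001):
    x = signed_magnitude_to_int(word)
    print(x, to_residues(x))
# -26 (2, 7, 0, 4, 6)
# 105 (0, 6, 1, 0, 9)
# -9 (5, 2, 4, 6, 7)
```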
The block diagram for the input translation of an 
eight-bit signed magnitude input operand is shown in Figure 
5. 9. The various hardware aspects of input translation 
will now be examined in detail. 
5.3.1 Input Operand Adjustment
The input operand adjustment portion of the input
translation process examines the most significant bit of
the input. If this bit is a logical "1", then the input
is negative, and the absolute value of the input must be
subtracted from M ( M = 240240 ). If the most significant
bit of the input is a logical "0", then the input is
positive, and the input value is passed through the input
operand adjustment block unchanged. It should be noted
that eighteen bits are required to express M in binary
form. This implies that an eighteen-bit operation ( M minus
the absolute value of x ) would have to be performed when
the input operand is negative. A great simplification can
be made at this point due to the limitation of the input
range. For an eight-bit signed magnitude number, the
largest magnitude, positive or negative, is one hundred and
twenty-seven. When this value is subtracted from 240240 in
binary form, only the ten least significant bit positions
are altered, because the ten least significant bits of
240240 have the value 624, which exceeds 127, so no borrow
propagates into the upper bits. Therefore, it is
unnecessary to carry out the full eighteen-bit addition
process when only the lowest ten bit positions have the
possibility of changing. In other words, the top eight bit
positions remain constant throughout the addition process.
The necessary manipulation of the upper eight bit positions
will be examined in the following section.

Figure 5.9 Input Translation Functional Configuration
(blocks: Input Operand Adjustment, with M = 240240 and the
input operand as inputs and a ten-bit output; then, for each
of the moduli 7, 11, 13, and 15: Upper/Lower Truth Tables,
Four Bit Binary Adder, Lower Truth Table, Four Bit Binary
Adder, Lower Truth Table; output: Complete Residue
Representation of Eight Bit Signed Magnitude Input)
The hardware implementation of the input operand
adjustment portion of the input translation process is
shown in Figure 5.10. It is fairly simple in design: a
ten-bit binary adder with a row of exclusive-or gates at
the input. Whenever the input is positive, the circuitry
allows the input operand to pass through unchanged, but
when the input is negative, the absolute value of the input
operand is subtracted from the lower ten bit positions of
240240. From the above simplification, the result of the
subtraction is a ten-bit number. The rest of the input
translation of Figure 5.9 will now be examined.
Figure 5.10 Input Operand Adjustment
5.3.2 Residue Digit Generation
The tight dynamic range of the eight-bit input, as
compared to the large value of M, allows a simplification
in the Karnaugh maps of the individual upper truth tables.
Since the magnitude of the input number subtracted from M
is at most 127, the result of the subtraction is a ten-bit
number between 01111XXXXX and 10011XXXXX. Remembering
that the Multiply and Add Cell only requires values between
00000 and 00111, a great simplification is made in the
upper truth tables for all of the moduli. The reason for
this simplification was not explained when the upper truth
tables were introduced earlier in this chapter.
As was mentioned previously in the operand adjustment
section, the upper eight bits of the eighteen-bit
representation of 240240 are not involved in the
subtraction process in the operand adjustment portion of
Figure 5.9. However, they must be taken into consideration
to produce a correct result. The upper eight bits of M
correspond to a decimal value of 239616. It was found
that if $|239616|_{m_i}$ was added to the result of the
previous steps for each corresponding modulus, the
resulting residue representation was correct. The values
to be added for each modulus are calculated below:
$|239616|_{7} = 6$
$|239616|_{11} = 3$
$|239616|_{13} = 0$
$|239616|_{15} = 6$
$|239616|_{16} = 0$
These values are shown to be added in at the appropriate
step in Figure 5.9. It is expected that an engineer will
spend more time designing a residue system than he or she
would a more conventional system. It should be noted that
there are quite a few common blocks that are repetitively
used in this proposed design. More importantly, the common
building blocks may be optimized separately, then combined
in an orderly fashion. This methodology both improves
circuit performance and saves silicon area.
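The ten-bit shortcut of Sections 5.3.1 and 5.3.2 can be modelled in Python as a rough sketch. The truth tables are represented by modulo operations on the upper and lower five-bit fields; applying the 239616 correction only to negative operands is an assumption about how Figure 5.9 is wired, and the function name is hypothetical.

```python
MODULI = (7, 11, 13, 15, 16)
M = 240240
M_LOW = M & 0x3FF      # ten least significant bits of M: 624
M_HIGH = M - M_LOW     # upper eight bits of M, weighted: 239616


def translate(x):
    """Residue digits of a signed operand with |x| <= 127."""
    negative = x < 0
    # Input operand adjustment: ten-bit subtraction, or pass-through.
    low = M_LOW - abs(x) if negative else x
    digits = []
    for m in MODULI:
        upper = ((low >> 5) << 5) % m   # Upper Truth Table
        lower = (low & 0b11111) % m     # Lower Truth Table
        digit = (upper + lower) % m     # adder, then Lower Truth Table
        if negative:
            # Second adder stage: add the constant |239616| mod m.
            digit = (digit + M_HIGH % m) % m
        digits.append(digit)
    return tuple(digits)


print(translate(-26))  # (2, 7, 0, 4, 6)
```

Since 624 - 127 = 497 is still non-negative, the ten-bit subtraction never borrows from the upper eight bits, which is what makes the split valid.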
5. 4 Output Translation 
The Chinese Remainder Theorem was examined first when 
considering an appropriate method of converting from 
residue representation back to a binary or fixed weight 
representation. Implementation of the Chinese Remainder 
Theorem requires addition modulo M, which in this case 
means several addition or subtractions with word lengths 
greater than eighteen. Another method of conversion from 
residue to a weighted system is called the mixed-radix 
conversion process [2]. It was found that the mixed-radix 
conversion process has two main advantages over an 
implementation of the Chinese Remainder Theorem. First, a 
significantly larger amount of hardware is required to 
directly implement the Chinese Remainder Theorem. Second, 
the Chinese Remainder Theorem does not allow any type of 
intermediate magnitude comparison or sign determination. 
It will be shown that at an intermediate step of the mixed- 
radix conversion process, enough information exists to 
compare the magnitude of two residue numbers, or to 
determine the sign of a residue number. Using the Chinese 
Remainder Theorem, it is impossible to obtain any of the 
above mentioned information without completely converting 
to the binary representation. 
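The mixed radix conversion process is governed by the
following equation:
$$x = A_5(m_1 m_2 m_3 m_4) + A_4(m_1 m_2 m_3) + A_3(m_1 m_2) + A_2(m_1) + A_1$$
where $(m_1, m_2, m_3, m_4, m_5) = (16, 15, 13, 11, 7)$, and x
is the result of converting a number from residue to binary
representation. It should be noted that the mixed-radix
system is a weighted system, hence magnitude and sign
determination is relatively easy. The mixed radix
representation of x is given by $\langle A_1, A_2, A_3, A_4, A_5 \rangle$.
The $A_i$ values may be determined by the following:
$$A_1 = r_1$$
$$A_2 = \left| (x - A_1)/m_1 \right|_{m_2}$$
$$A_3 = \left| (x_2 - A_2)/m_2 \right|_{m_3}$$
$$A_4 = \left| (x_3 - A_3)/m_3 \right|_{m_4}$$
$$A_5 = \left| (x_4 - A_4)/m_4 \right|_{m_5}$$
where $r_1$ is the residue digit of x for modulus $m_1$, and
$x_2 = (x - A_1)/m_1$, $x_{k+1} = (x_k - A_k)/m_k$ are the
intermediate quotients, which are held in residue form.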
Residue division is not really occurring even though the 
above equations imply it is necessary. Multiplying by the 
multiplicative inverse is the same as division, which will 
be the approach taken. The functional diagram of the mixed 
radix conversion process is shown in Figure 5. 11. The 
individual portions of Figure 5. 11 will now be examined. 
Also included in this section on output translation will be 
a comprehensive example unifying the individual processes. 
Figure 5.11 Mixed Radix Coefficient Determination,
Output Translation
(residue representation inputs for moduli 16, 15, 13, 11,
and 7; each stage consists of NCA and CA blocks, a
multiplier (X), UTT/LTT, adder (+), and LTT blocks, and
produces the next mixed-radix coefficient)
5.4.1 Controlled Addition/Subtraction
Two of the blocks in Figure 5.11 are called "CA" and
"NCA". These notations stand for conditional and
non-conditional addition. The non-conditional addition
block has two four-bit numbers as inputs and a five-bit
number as an output. The block subtracts one four-bit input
from the other. This subtraction performs the $x - r_i$
portion of the mixed-radix process. Since a residue digit
must not be negative, the subsequent conditional adder
blocks sample the most significant bit of the five-bit
input. If this bit is a zero, then no computation occurs.
If the most significant bit is a one, then the input is
negative, and thus the modulus ($m_i$) for the appropriate
digit of the residue representation ($r_i$) must be added
to the negative number. This conditional addition process
must be repeated until it is assured that the residue
representation at each of the stages in determining the
$A_i$ contains only non-negative residue digits. The
hardware implementations of both the conditional and
non-conditional adders are shown in Figures 5.12 and 5.13.
Figure 5.12 Conditional Adder
Figure 5.13 Non-Conditional Adder
5.4.2 Multiplication by Inverses
After the subtraction of the most significant
remaining residue digits, and the conditional addition
processes are complete, multiplication of the residue
representations by the respective multiplicative inverse
occurs. The multiplicative inverses are shown in Table
5.2.

Table 5.2 Multiplicative Inverses

                 Modulus
           15     13     11      7
1/16        1      9      9      4
1/15               7      3      1
1/13                     17     13
1/11                             2
For example, $|1/16|_{7} = 4$. All multiplicative inverses can
be represented in four binary bits except for 17, which
requires five bits. Multiplication by 17, when using
binary arithmetic, is simply the digits of the multiplicand
repeated twice. For example, $(17)(9) = (10011001)_2$, and
$(17)(13) = (11011101)_2$. Thus, the multiplication of all
residue digits by their respective multiplicative inverses
will only require a Braun array capable of four-bit
multiplication. The desired Braun array has been
previously designed, with the exception that the top row of
full adders needed in the MAC has been deleted, and is
shown in Figure 5.14. The blocks in Figure 5.11 labeled
"UTT/LTT", "+", and "LTT" are the same blocks that were
presented in the MAC section. This sequence of blocks
simply converts the larger input number into residue
representation for each of the moduli. The following
example should help clarify some of the items presented in
this section.
Figure 5.14 Four Bit Braun Array
Given $m_1=16$, $m_2=15$, $m_3=13$, $m_4=11$, and $m_5=7$, the
mixed radix coefficients ($A_i$) can be computed for a residue
representation of x = ( 2, 3, 4, 2, 6 ) as follows
(reference Figure 5.11 as necessary):

$A_1 = 2$
  Modulus                    15     13     11      7
  Residue digits              3      4      2      6
  (-2)                        1      2      0      4
  x $|1/16|_{m_i}$            1     18      0     16
  $|r|_{m_i}$                 1      5      0      2
$A_2 = 1$
  Modulus                           13     11      7
  Residue digits                     5      0      2
  (-1)                               4     -1      1
  (+m)                               4     10      1
  x $|1/15|_{m_i}$                  28     30      1
  $|r|_{m_i}$                        2      8      1
$A_3 = 2$
  Modulus                                  11      7
  Residue digits                            8      1
  (-2)                                      6     -1
  (+m)                                      6      6
  x $|1/13|_{m_i}$                        102     78
  $|r|_{m_i}$                               3      1
$A_4 = 3$
  Modulus                                          7
  Residue digit                                    1
  (-3)                                            -2
  (+m)                                             5
  x $|1/11|_{m_i}$                                10
  $|r|_{m_i}$                                      3
$A_5 = 3$

In summary, $A_1=2$, $A_2=1$, $A_3=2$, $A_4=3$, $A_5=3$.
Checking the result:
x = 3*16*15*13*11 + 3*16*15*13 + 2*16*15 + 1*16 + 2 = 112818
x = 112818 = ( 2, 3, 4, 2, 6 ) in residue representation
The resulting coefficients are exactly correct.
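The stage-by-stage procedure of the example can be sketched in Python. This is a software model under assumptions: the inverses are computed with the built-in `pow(base, -1, mod)` (Python 3.8 or later) instead of being read from Table 5.2, so 6 is used where the table lists 17 and 13, which are congruent to 6 modulo 11 and modulo 7 respectively. The function names are hypothetical.

```python
MODULI = (16, 15, 13, 11, 7)


def mixed_radix_coefficients(residues):
    """Return [A1, ..., A5] for residues ordered as MODULI."""
    digits, moduli = list(residues), list(MODULI)
    coeffs = []
    while digits:
        a_k, m_k = digits[0], moduli[0]
        coeffs.append(a_k)                 # leading digit is the coefficient
        reduced = []
        for r, m in zip(digits[1:], moduli[1:]):
            diff = r - a_k                 # NCA: unconditional subtraction
            while diff < 0:                # CA: add modulus while negative
                diff += m
            inverse = pow(m_k, -1, m)      # multiplicative inverse of m_k
            reduced.append((diff * inverse) % m)
        digits, moduli = reduced, moduli[1:]
    return coeffs


def from_mixed_radix(coeffs):
    """Evaluate A1 + A2*m1 + A3*m1*m2 + ... as an ordinary integer."""
    x, weight = 0, 1
    for a, m in zip(coeffs, MODULI):
        x += a * weight
        weight *= m
    return x


coeffs = mixed_radix_coefficients((2, 3, 4, 2, 6))
print(coeffs, from_mixed_radix(coeffs))  # [2, 1, 2, 3, 3] 112818
```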
5.4.3 Correct Sign Determination
The conversion is almost complete. The multiplication
and addition of the moduli and the mixed-radix coefficients
($A_i$) can begin as soon as the first $A_i$ is determined.
The multiplication and addition begins and occurs
simultaneously with the determination of the latter
mixed-radix coefficients ( $A_3$, $A_4$, $A_5$ ). Figure 5.15
shows the functional description of the addition and
multiplication of the appropriate moduli and mixed-radix
coefficients. The hardware implementations of the
individual blocks are shown in Appendix C of this document.
The last three blocks of Figure 5.15 will be discussed now
as they are important to a residue type design.

Figure 5.15 Multiplication and Addition of the
Mixed-Radix Coefficients
(blocks: 8x4 Multiplier, Twelve-Bit Multiple Generator,
Four-Bit Braun Array, Eight-Bit Binary Adder, Twelve-Bit
Binary Adder, two Fourteen-Bit Carry Save Adders,
Fourteen-Bit Binary Adder, Modified Adder A, Modified
Adder B, Eighteen-Bit Two's Complementer; output:
Eighteen-Bit Signed Magnitude Result)

It should be noted that the output before the Modified
Adder A block will be an eighteen-bit number between 0 and
240239. If the number is in the range [0, 120119], then the
result is a positive number and is correct in present form.
If the number is in the range [120120, 240239], then the
number is a negative number, and 240240 must be subtracted
from it. The determination of sign must occur in two
distinct stages. First, M/2 = 120120 must be subtracted
from the eighteen-bit result. If this difference is
negative, then the original number was in the range
[0, 120119], which was a positive number. In the very next
stage it is necessary to add 120120 back to this number,
because it needs to be a positive number as it was
originally. If the result of subtracting M/2 is not
negative, then the original number was in the range
[120120, 240239], and must be a negative number. In the
very next stage it is necessary to subtract another 120120
from this number, so that a total of 240240 has been
subtracted from it. At this point, the result is the
completely correct result in two's complement
representation. If the most significant bit is a "one",
then the lower seventeen bits are two's complemented, and
the result will be in the desired signed magnitude form. If
the most significant bit is a "zero", then the result is in
correct signed magnitude form, and should bypass the
complementer stage. The hardware implementations of
Modified Adders A and B, as well as the two's complementer,
are shown in Appendix C of this thesis.
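The two-stage sign determination can be summarised with a small Python sketch. It models only the arithmetic; the association of each step with Modified Adder A or B follows the description above, and the function names are hypothetical.

```python
M = 240240
HALF = M // 2  # 120120


def to_signed(value):
    """Map a converted result in [0, M-1] to its signed value."""
    t = value - HALF        # first stage: subtract M/2
    if t < 0:
        return t + HALF     # was positive: add M/2 back
    return t - HALF         # was negative: subtract M/2 again


def to_signed_magnitude(n, bits=18):
    """Pack a signed integer as sign bit plus (bits - 1)-bit magnitude."""
    sign = 1 if n < 0 else 0
    return (sign << (bits - 1)) | abs(n)


print(to_signed(112818))       # 112818
print(to_signed(M - 1258))     # -1258
print(format(to_signed_magnitude(-1258), "018b"))
```

The largest magnitude that can occur is 120120, which fits in seventeen bits, so one sign bit plus seventeen magnitude bits is sufficient.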
This concludes the introduction of the design of this
research. The following chapter presents the simulation of
this design, as well as a performance comparison to a more
conventional approach.
CHAPTER VI
SIMULATION RESULTS AND COMPARISON
The purpose of this chapter is to present the 
simulation results, as well as compare the timing and area 
constraints of the residue design to a more conventional 
binary approach. Mentor Graphics Neted and Quicksim were 
used for schematic capture and logic simulation. 
6. 1 Simulation Development 
Chapter IV presented a detailed simulation of the 
matrix multiplication algorithm. Rather than simulate the 
design as a whole, the three fundamental portions were 
simulated. Specifically, the Multiply and Add Cell for all 
moduli, and the Input and Output translational portions 
were simulated. Simulation of the above portions, 
including the matrix multiplication algorithm simulation, 
will give all the required timing information. It was 
found that the schematic capture of the full design would 
be a very large task, and would not be beneficial to this 
research. 
Before any simulation or comparison begins, it is
appropriate to present the primitive component timing and
area models [17]. These parameters will be used
consistently throughout this chapter for comparison and
simulation purposes. Table 6.1 gives the proportional
delay time of an individual gate, as well as the
proportional area each component occupies. Actual timing
and area information is strongly dependent on semiconductor
processing. To obtain actual timing or area information,
the values must be multiplied by a scaling factor. The
scaling factor for the gate delays is T, and typically
ranges from 0.25 ns to 1 ns. The area scaling factor A is
typically around 25x25 square microns, with a strong
dependence on the lithographic linewidth of the process
used for fabrication. The simulation results from Quicksim
will now be summarized. The more detailed simulation
output in raw data form, as well as graphical form, can be
found in Appendix D of this thesis.
Table 6.1 Primitive Component Models
Component                       Delay (T)    Area (A)
n-input NAND Gate (n < 10)
Inverter
n-input AND Gate (n < 10)
n-input OR Gate (n < 10)
XOR Gate
One-Bit Full Adder
D Flip-Flop
10
6. 2 Simulation Results 
This section will present the simulation results of 
each of the three major portions of this design. The most 
important portion of the simulation is the MAC. The timing 
information of the MAC will determine the overall system 
clock speed. The overall clock speed will be crucial when 
the comparison is made later in the chapter. A portion of 
this section also deals with the global timing information 
of the design. 
6.2.1 MAC Simulation
The more detailed simulation output for the MAC can be
found in Table A.1 and Figures A.11 through A.15 in
Appendix D. Table 6.2 summarizes the MAC simulation
results. Since each MAC contains five independent residue
multiply and add cells, there are five different sub-tables
( one for each modulus ) shown in Table 6.2. Also, note
that the numbers shown in the table have been converted to
decimal for convenience. The worst case simulated delay
from Table 6.2 is 23 T. The following calculations show how
the worst case "predicted" gate delay for the MAC is
obtained:
Modified Braun Array:
Delay = 1 AND Gate + 7 Full Adders
      = (1)(2) + (7)(2)
      = 16 T
Upper/Lower Truth Table:
Delay = 1 Inverter + 2 NAND Gates
      = (1)(1) + (2)(1)
      = 3 T
Four-Bit Binary Adder:
Delay = 4 Full Adders
      = (4)(2)
      = 8 T
Lower Truth Table:
Delay = 1 Inverter + 2 NAND Gates
      = (1)(1) + (2)(1)
      = 3 T
Summing the above gate delays gives the MAC worst case gate
delay, which is 16 + 3 + 8 + 3 = 30 T. Note that the
simulated worst case delay should always be less than or
equal to the worst case predicted delay, as is the case in
Table 6.2.

Table 6.2 MAC Simulation Results

Modulo 7
            Input A   Input B   Input C   Result   Delay
Trial #1       4         3         5         3       23
Trial #2       6         3         5         2       13
Trial #3       3         6         1         5       23
Trial #4       3         4         3         1       14

Modulo 11
            Input A   Input B   Input C   Result   Delay
Trial #1       8         9         4        10       21
Trial #2       3         7         9         8       21
Trial #3       4         8         2         1       22
Trial #4       1         3        10         2       15

Modulo 13
            Input A   Input B   Input C   Result   Delay
Trial #1       8         7        10         1       23
Trial #2       4         9         2        12       19
Trial #3      12         4         8         4       19
Trial #4       2        11         5         1       16

Modulo 15
            Input A   Input B   Input C   Result   Delay
Trial #1      13        10         3        13       23
Trial #2       3         9        12         9       18
Trial #3       8         9         4         1       16
Trial #4       9         3         7

Modulo 16
            Input A   Input B   Input C   Result   Delay
Trial #1       1         4         9        13        9
Trial #2       3         5         2         1        9
Trial #3       4        12        10        10        8
Trial #4       1        11        12         7        6
6. 2. 2 Input Simulation 
The results of the input translation process are 
summarized in Table 6. 3. The more detailed simulation 
output can be found in Table A. 2 and Figures A. 16 through 
A. 20. The reader should notice that some of the delays are 
larger than 30. This means that the input translation 
process must be broken up into two stages since the 
pipeline segment time (governed by the MAC) is the maximum 
time any one segment should take to execute. Examining the 
worst case delays for the input translation process will 
help determine where the latches will need to be placed, 
ensuring that no operation in the input translational 
process exceeds the maximum of 30 Tw. The worst case 
delays for the input translation process are given as 
Table 6. 3 Input Translation Simulation Results 
Trial ¹1 
Trial ¹2 
Trial ¹3 
Trial ¹4 
Modulo 7 
Input Result Delay 
-44 5 25 
+77 0 26 
-114 5 23 
+57 I 15 
Trial ¹ I 
Trial ¹2 
Trial ¹3 
Trial ¹4 
Modulo 11 
Input Result Delay 
-82 6 29 
+60 5 21 
41 3 28 
+114 4 23 
Trial ¹I 
Trial ¹2 
Trial ¹3 
Trial ¹4 
Modulo 13 
Input Result Delay 
-36 3 22 
+109 5 25 
-51 I 24 
+71 6 20 
Trial ¹I 
Trial ¹2 
Trial ¹3 
Trial ¹4 
Modulo 15 
input Result Delay 
-115 5 35 
+36 6 15 
43 2 25 
+28 13 19 
Trial ¹I 
Trial ¹2 
Trial ¹3 
Trial ¹4 
Modulo 16 
Input Result Delay 
-51 13 7 
+49 I 4 
-87 9 6 
+104 8 2 
83 
follows: 
Input Operand Adjustment:
delay = 1 XOR Gate + 10 Full Adders
      = (1)(2) + (10)(2)
      = 22 Tv
Upper/Lower Truth Tables:
delay = 1 Inverter + 2 NAND Gates
      = (1)(1) + (2)(1)
      = 3 Tv
Four-Bit Binary Adder:
delay = 4 Full Adders
      = (4)(2)
      = 8 Tv
Lower Truth Table:
delay = 1 Inverter + 2 NAND Gates
      = (1)(1) + (2)(1)
      = 3 Tv
Four-Bit Binary Adder:
delay = 8 Tv
Lower Truth Table: (Same)
This gives a worst case gate delay of 47 Tv for the input
translation process. If a row of latches is placed between
the Upper/Lower Truth Tables block and the first Four-Bit
Binary Adder block, the gate delays for the two different
stages are 25 Tv and 22 Tv for the first and second stages,
respectively.
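
The latch placement described above can be checked with a short
calculation. The following Python sketch is only an illustration:
the block list and the 30 Tv segment limit come from the text,
while the greedy packing helper (split_into_stages) is a
hypothetical name introduced here and is not part of the original
design flow.

```python
# Worst-case gate delays (in Tv) of the input translation blocks,
# in signal-flow order, as listed in the text.
BLOCKS = [
    ("input operand adjustment", 22),
    ("upper/lower truth tables", 3),
    ("four-bit binary adder", 8),
    ("lower truth table", 3),
    ("four-bit binary adder", 8),
    ("lower truth table", 3),
]
SEGMENT_LIMIT = 30  # pipeline segment time set by the MAC


def split_into_stages(blocks, limit):
    """Greedily pack consecutive blocks into stages of delay <= limit.

    Hypothetical helper: a latch row is assumed wherever the next
    block would push the running delay past the limit.
    """
    stages, current = [], 0
    for name, delay in blocks:
        if delay > limit:
            raise ValueError(f"{name} alone exceeds the segment limit")
        if current > 0 and current + delay > limit:
            stages.append(current)  # place a latch row here
            current = 0
        current += delay
    if current:
        stages.append(current)
    return stages


total_delay = sum(delay for _, delay in BLOCKS)
stage_delays = split_into_stages(BLOCKS, SEGMENT_LIMIT)
print(total_delay, stage_delays)
```

Under these assumptions the helper places the single latch row
after the Upper/Lower Truth Tables block, which is the same split
chosen in the text.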
6.2.3 Output Simulation
The simulation results of the output translation
process are summarized in Table 6.4. Again, the
more detailed simulation output can be found in Table A.3
in Appendix D. There is no timing diagram for the
output portion because the large number of signals would
not fit with clarity on one page.

Table 6.4 Output Translation Simulation Results

Trial   Mod 16   Mod 15   Mod 13   Mod 11   Mod 7    Result   Delay
        Input    Input    Input    Input    Input
 #1        7       13        2        4       3     -104297    173
 #2       13       13        4        8       6      107533    125
 #3       12        5       10        5       4       12380    118
 #4        6        2        3        7       2       -1258    160

Output translation is
the most complex of the operations presented thus far.
There are many design options, depending on the complexity 
of circuitry used in the addition and multiplication of the 
mixed-radix coefficients. This design uses only simple 
binary adders and multipliers. Much faster methods are 
available, but they consume a much larger area. Faster 
methods, as will be proved shortly, are not beneficial. 
Output translation is a pipelined process. As soon as the 
first matrix multiplication is out of the MAC array, the 
next multiplication may begin. In light of this, one pair 
of matrices are being multiplied together at the same time 
the result of the prior pair of matrices is going through 
output translation. Thus, for the above stated reason, the 
time required for output translation is negligible after 
the first matrix multiplication. It can be shown that the 
output translation process must be broken into fifteen
stages in order to operate at the same clock cycle as the
rest of the matrix multiplier. From Figure 5.11, it takes
eight clock cycles to generate the mixed-radix coefficient
A5. The remaining seven (8 + 7 = 15) are a result of the
Twelve-Bit Multiplication, the Fourteen-Bit Carry Save
Adders, the Modified Adders A and B, and the Two's
Complementer (all shown in Figure 5.15). The resulting
output matrix values will most likely be transferred off of 
the matrix multiplier chip at the same clock rate as the 
system bus. Also, it is very unlikely that the system bus 
will be operating at the same speed as the matrix 
multiplier. Nonetheless, for comparison purposes, this 
research will use a worst case of 15 clock cycles to 
convert from residue representation to signed magnitude 
format. The following section examines the global
timing information from the above simulation results, as
well as the results of the matrix multiplication algorithm
simulation.
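
As an illustration of what the output translator computes, the
sketch below converts a set of residues back to a signed integer
using the textbook mixed-radix conversion. It is an assumption-laden
software model, not the hardware data path of Figures 5.11 and 5.15:
the function names are hypothetical, the moduli order follows the
columns of Table 6.4, and the upper half of the dynamic range is
assumed to represent negative values.

```python
MODULI = (16, 15, 13, 11, 7)  # moduli of this design (Table 6.4 order)


def mixed_radix_digits(residues, moduli=MODULI):
    """Return digits a_1..a_n such that
    X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ...  with 0 <= a_i < m_i."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        a_i = r[i] % m_i
        digits.append(a_i)
        # Subtract the digit and divide by m_i in each remaining channel
        # (division is multiplication by the modular inverse of m_i).
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - a_i) * pow(m_i, -1, m_j)) % m_j
    return digits


def to_signed(residues, moduli=MODULI):
    """Weighted sum of the mixed-radix digits, mapped to a signed value."""
    value, weight = 0, 1
    for a_i, m_i in zip(mixed_radix_digits(residues, moduli), moduli):
        value += a_i * weight
        weight *= m_i
    # weight now equals M, the product of the moduli (assumed sign rule:
    # values in the upper half of [0, M) are negative).
    return value - weight if value >= weight // 2 else value


# Trial #4 of Table 6.4: residues (6, 2, 3, 7, 2)
print(to_signed((6, 2, 3, 7, 2)))
```

The modular inverse via pow(m, -1, n) requires Python 3.8 or later.
For Trial #4 the sketch reproduces the tabulated result of -1258.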
6.2.4 Global Considerations
Although a comprehensive global simulation has not
been performed, there exists enough information to
precisely predict the overall performance of the design.
As derived in Chapter IV, it takes seventeen clock cycles
to completely multiply two matrices together. From the
previous section, it takes 2 clock cycles for input
translation, and 15 clock cycles for output translation.
Therefore, it takes 34 clock cycles to complete the first
matrix multiplication. It takes 17 additional clock cycles
for each successive matrix multiplication. The following
equation gives the total processing time (Tp) in gate
delays for a certain number of successive matrix
multiplications (N):

Tp = t1 + (N - 1)(tmult)(ts),   N >= 1

where t1    = (17 cycles + 17 cycles) ts
      ts    = tseg + tlat
            = 30 + 3
            = 33 Tv/cycle
      tmult = clock cycles to complete
              MAC array portion

Simplifying:
Tp = 1122 + (N - 1)(17)(33)
   = 1122 + (N)(561) - 561
   = 561 + (N)(561)
   = (N + 1)(561) Tv

For example, the total processing time to multiply two
pairs of matrices in succession is 1683 Tv. The following
section calculates the area such a design occupies on
silicon.
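
The processing time equation can be evaluated directly. The sketch
below is a minimal illustration using only the cycle counts and
delays quoted in this section; the constant and function names are
hypothetical, and the 3 Tv term is taken here to be the latch delay.

```python
T_SEG = 30        # worst-case MAC segment delay, in Tv
T_LATCH = 3       # assumed to be the latch delay, in Tv
T_S = T_SEG + T_LATCH   # 33 Tv per clock cycle

CYCLES_INPUT = 2    # input translation
CYCLES_MULT = 17    # MAC array portion
CYCLES_OUTPUT = 15  # output translation


def residue_processing_time(n):
    """Total time in Tv to multiply n successive pairs of matrices."""
    if n < 1:
        raise ValueError("n must be at least 1")
    first = (CYCLES_INPUT + CYCLES_MULT + CYCLES_OUTPUT) * T_S
    return first + (n - 1) * CYCLES_MULT * T_S


for n in (1, 2, 3):
    print(n, residue_processing_time(n))
```

The first multiplication costs 34 cycles (1122 Tv); each further
multiplication adds 17 cycles (561 Tv), which is the closed form
(N + 1)(561) derived above.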
6.3 Residue Design Area Calculations
This section will briefly show the calculations made
in determining the area this design will occupy on silicon.
Also in this section will be a discussion on global area
issues.
6.3.1 MAC Area
The MAC area calculations are shown below. It may be
necessary for the reader to refer back to figures in the
previous chapter dealing specifically with the MAC. The
calculations for one MAC are as follows:

Modified Braun Array:
Area = 16 Full Adders + 16 AND Gates
     = (16)(10) + (16)(2)
     = 192 Av
(6 Less Full Adders for Mod 16 Braun Array Only)

Lower and Upper Truth Tables:
Modulo 7:
  Upper Area = 12 NAND Gates + 5 Inverters
             = (12)(1) + (5)(1) = 17 Av
  Lower Area = 27 NAND Gates + 5 Inverters
             = (27)(1) + (5)(1) = 32 Av
Modulo 11:
  Upper Area = 18 NAND Gates + 5 Inverters
             = (18)(1) + (5)(1) = 23 Av
  Lower Area = 28 NAND Gates + 5 Inverters
             = (28)(1) + (5)(1) = 33 Av
Modulo 13:
  Upper Area = 20 NAND Gates + 5 Inverters
             = (20)(1) + (5)(1) = 25 Av
  Lower Area = 28 NAND Gates + 5 Inverters
             = (28)(1) + (5)(1) = 33 Av
Modulo 15:
  Upper Area = 10 NAND Gates + 5 Inverters
             = (10)(1) + (5)(1) = 15 Av
  Lower Area = 26 NAND Gates + 5 Inverters
             = (26)(1) + (5)(1) = 31 Av

Four-Bit Binary Adder:
Area = 4 Full Adders
     = (4)(10)
     = 40 Av

TOTAL MAC AREA = 5(192) + 80 + 2(129) + 4(40) - 6(10)
               = 1398 Av
Thus the total Multiply and Add Cell Area is 1398 Av.
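
The MAC area total can be re-derived from the gate counts listed
above. The following sketch is illustrative only: the unit areas
(full adder 10 Av, AND gate 2 Av, NAND gate and inverter 1 Av each)
are the ones used in the calculations of this section, and the
dictionary and function names are hypothetical.

```python
# Unit areas in Av, as used in the calculations of this section.
FULL_ADDER, AND_GATE, NAND_GATE, INVERTER = 10, 2, 1, 1

# NAND gate counts of the truth tables; each table also has 5 inverters.
UPPER_NANDS = {7: 12, 11: 18, 13: 20, 15: 10}
LOWER_NANDS = {7: 27, 11: 28, 13: 28, 15: 26}


def truth_table_area(nand_gates, inverters=5):
    return nand_gates * NAND_GATE + inverters * INVERTER


braun_array = 16 * FULL_ADDER + 16 * AND_GATE                     # 192 Av
upper_total = sum(truth_table_area(n) for n in UPPER_NANDS.values())
lower_total = sum(truth_table_area(n) for n in LOWER_NANDS.values())
adder_4bit = 4 * FULL_ADDER                                       # 40 Av

# Five Braun arrays (one per modulus), one upper and two lower truth
# tables per table-based modulus, four adders, minus the six full
# adders saved in the modulo 16 Braun array.
mac_area = (5 * braun_array + upper_total + 2 * lower_total
            + 4 * adder_4bit - 6 * FULL_ADDER)
print(upper_total, lower_total, mac_area)
```

The intermediate sums (80 Av and 129 Av) are the terms that appear
in the TOTAL MAC AREA expression.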
6.3.2 Input Translation Area
The input translation area calculations are as
follows:

Input Operand Adjustment:
Area = 10 Full Adders + 7 XOR Gates
     = (10)(10) + (7)(3)
     = 121 Av
Upper and Lower Truth Tables:
(Same as Above)
Four-Bit Binary Adder:
(Same as Above)

TOTAL AREA = 1(121) + 80 + 3(129) + 8(40)
           = 908 Av
Thus the total Input Translation Area is 908 Av.
6.3.3 Output Translation Area
The output translation area calculations are as
follows:

NCA (Non-Conditional Adder):
Area = 5 Full Adders + 4 Inverters
     = (5)(10) + (4)(1) = 54 Av
CA (Conditional Adder):
Area = 5 Full Adders + 4 AND Gates
     = (5)(10) + (4)(2) = 58 Av
Four-Bit Braun Array:
Area = 12 Full Adders + 16 AND Gates
     = (12)(10) + (16)(2) = 152 Av
Truth Tables:
(Same as Above)
Four-Bit Binary Adder:
(Same as Above)
Eight-Bit Binary Adder:
Area = 8 Full Adders
     = (8)(10) = 80 Av
Eight by Four Multiplier:
Area = 32 Full Adders + 32 AND Gates
     = (32)(10) + (32)(2) = 384 Av
Twelve-Bit Binary Adder:
Area = 12 Full Adders
     = (12)(10) = 120 Av
Multiple Generator:
Area = 36 AND Gates
     = (36)(2) = 72 Av
Fourteen-Bit Carry Save Adder:
Area = 14 Full Adders
     = (14)(10) = 140 Av
Modified Binary Adder A:
Area = 18 Full Adders
     = (18)(10) = 180 Av
Modified Binary Adder B:
Area = 36 Inverters + 54 NAND Gates + 18 Full Adders
     = (36)(1) + (54)(1) + (18)(10) = 270 Av
Two's Complementer:
Area = 17 AND Gates + 17 OR Gates + 17 XOR Gates
     = (17)(2) + (17)(2) + (17)(3) = 119 Av

TOTAL OUTPUT
TRANSLATION AREA = 10(54) + 19(58) + 6(152) + 850 + 400
                   + 80 + 404 + 120 + 72 + 2(140)
                   + 140 + 180 + 270 + 119
                 = 5469 Av
Thus the total Output Translation Area is 5469 Av.
6.3.4 Global Considerations
There are several global options to be considered when
implementing such a design, depending strongly upon the
total amount of silicon area available. One may examine
the output coefficient pattern, and note that each output
port only has a non-zero coefficient every three clock
cycles, such that three output ports could share an output
translator. It is possible to build an array of data
latches to accumulate the three resultant matrix operands
at each clock cycle. In this method, the resulting
operands wait their turn to enter the single output
translator, and are then transferred off of the matrix
multiplier chip. Another scheme could be to implement
three different output translators, with each translator
transferring an output matrix coefficient off chip every
clock cycle. The latter scheme is much more efficient
time-wise, but requires three output translators as opposed
to one. The latter scheme also requires more component
package pins.
The input translator is not as large, and does not
require as much consideration. Since each input port only
requires an input operand every three clock cycles, and
there are 10 input ports, only four input translators are
necessary.
The global area calculation, for both methods of
output translation, is shown below:
TOTAL AREA = 4(908) + 25(1398) + (1 or 3)(5469)
           = 44051 or 54989 Av
The following calculation should give the reader an idea of
how large such a circuit is on silicon.
(Av = 25x25 square microns)
Chip Area = 54989 x (25x10^-4 cm) x (25x10^-4 cm) = 0.344 cm^2
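
The global area figures can be reproduced as follows. This is a
small sketch built only from the totals quoted in Sections 6.3.1
through 6.3.3 and the stated unit area of 25x25 square microns; the
names are hypothetical.

```python
INPUT_TRANSLATOR_AREA = 908    # Av
MAC_AREA = 1398                # Av
OUTPUT_TRANSLATOR_AREA = 5469  # Av
UNIT_SIDE_CM = 25e-4           # 25 microns expressed in centimeters


def total_area(output_translators):
    """Global area in Av for 4 input translators, 25 MACs and
    either 1 or 3 output translators."""
    return (4 * INPUT_TRANSLATOR_AREA + 25 * MAC_AREA
            + output_translators * OUTPUT_TRANSLATOR_AREA)


def chip_area_cm2(area_units):
    """Convert an area in Av to square centimeters."""
    return area_units * UNIT_SIDE_CM ** 2


for k in (1, 3):
    units = total_area(k)
    print(k, units, round(chip_area_cm2(units), 3))
```

With three output translators the design occupies 54989 Av, or
about 0.344 square centimeters.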
6.4 Design Comparison
It is the purpose of this section to compare the
residue design to a more conventional implementation of the
matrix multiplication algorithm. As previously stated, the
residue system is error free, in the sense that the output
is correct for all possible eight-bit inputs. It was
assumed that the input operand was an eight-bit integer
value.
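
The error-free property can be illustrated in software. The sketch
below is only a model under stated assumptions: each output
coefficient is taken to be a sum of at most five products (matrix
bandwidth five), each operand is taken to be a signed magnitude
value with a 7-bit magnitude, and the function name residue_mac is
hypothetical. It checks that the worst-case sum fits within half of
the dynamic range M = 16 x 15 x 13 x 11 x 7, and that a multiply
and accumulate carried out independently in each residue channel
agrees with ordinary integer arithmetic.

```python
from math import prod

MODULI = (16, 15, 13, 11, 7)
M = prod(MODULI)  # dynamic range of the residue system

# Assumptions: 7-bit magnitudes, five products per output coefficient.
MAX_OPERAND = 127
TERMS = 5
WORST_CASE = TERMS * MAX_OPERAND * MAX_OPERAND


def residue_mac(a_row, b_col):
    """Multiply-accumulate performed separately in every residue channel."""
    acc = [0] * len(MODULI)
    for a, b in zip(a_row, b_col):
        for k, m in enumerate(MODULI):
            acc[k] = (acc[k] + (a % m) * (b % m)) % m
    return tuple(acc)


a_row = [127, -127, 93, -5, 0]
b_col = [-127, 127, 41, 77, -12]
dot = sum(a * b for a, b in zip(a_row, b_col))
print(M, WORST_CASE, dot, residue_mac(a_row, b_col))
```

Because the worst-case magnitude is below M/2, every signed result
maps to a unique set of residues, so no rounding or overflow error
is introduced by the residue channels themselves.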
6.4.1 Comparison Structure
The approach of the comparison structure will be that
of cascading a signed magnitude multiplier with a signed
magnitude adder. Signed magnitude multipliers are the same
as a Braun array, with the addition of one XOR Gate to
determine the resulting sign. The timing and area
calculations for the eight-bit multiplier are shown below:

Eight-Bit Multiplier:
Delay = 12(2) + 2
      = 26 Tv
Area  = 49(2) + 42(10)
      = 518 Av

The calculations for an eighteen-bit signed magnitude adder
are shown below (17):

Eighteen-Bit Adder:
Delay = 3(2) + 17(2) + 17(2) + 2
      = 76 Tv
Area  = 16(18)
      = 288 Av

Thus, the comparison structure has a total delay of 102 Tv, and
an area of 806 Av. As with the residue design, the
following global area calculation will allow comparison to
the residue design:

Total Area = 25(806) = 20150 Av

The total processing time (Tp) to multiply N successive
pairs of matrices together is given by the following
equation:

Tp = N (ts)(17),   N >= 1
   = N (tseg + tlat)(17)
   = N (102 + 3)(17)
   = 1785N Tv

The total time required to multiply two pairs of matrices
in succession is 3570 Tv; the time required to multiply
three pairs of matrices together in succession is 5355 Tv.
6.4.2 Time and Area Comparisons
The global area required to implement the residue
design is 44051 Av or 54989 Av, depending on the output
strategy used. The global area required for the
conventional binary approach is 20150 Av. The residue
design is 2.18 or 2.73 times larger than the binary
approach.
The timing comparisons yield different results
depending upon the assumed number of successive
multiplications occurring. Table 6.5 shows the time
required to execute a given number of matrix
multiplications for the residue and conventional methods, as
well as the ratio of the two processing times.
Table 6.5 Processing Time Comparison

   N    Residue Tp   Binary Tp   Tp Ratio
   1        1122        1785       1.59
   5        3366        8925       2.65
  10        6171       17850       2.89
  50       28611       89250       3.12
 100       56661      178500       3.15
 500      281061      892500       3.18
The total processing time (Tp) ratio converges to 3.18. In
an ideal application, the matrix multiplier is constantly
in operation. It must be remembered that the residue
design is capable of multiplying over 1.5 million pairs of
matrices in one second (assuming a relative gate delay of
one nanosecond). For this reason, 500 successive
multiplications (as assumed in Table 6.5) is a very small
number compared to actual hardware capabilities.
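
The entries of Table 6.5 and the limiting ratio follow directly
from the two processing time equations, (N + 1)(561) Tv for the
residue design and 1785N Tv for the binary design. The short
sketch below regenerates them; the function names are hypothetical,
and the throughput estimate assumes the one nanosecond gate delay
mentioned in the text.

```python
def residue_time(n):
    """Residue design processing time in Tv for n multiplications."""
    return (n + 1) * 561


def binary_time(n):
    """Binary comparison structure processing time in Tv."""
    return 1785 * n


rows = [(n, residue_time(n), binary_time(n),
         round(binary_time(n) / residue_time(n), 2))
        for n in (1, 5, 10, 50, 100, 500)]
for row in rows:
    print(*row)

# As N grows the ratio approaches 1785 / 561.
limit_ratio = 1785 / 561

# Steady-state throughput: one multiplication per 561 gate delays,
# assuming a gate delay of 1 ns.
multiplications_per_second = 1e9 / 561
print(round(limit_ratio, 2), int(multiplications_per_second))
```

The steady-state figure of roughly 1.78 million multiplications
per second is consistent with the "over 1.5 million" quoted above.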
CHAPTER VII 
CONCLUSION 
This research applied the Residue Number System to a 
specific digital signal processing problem, that of matrix 
multiplication. The mathematical operations of addition 
and multiplication are simpler than residue division and 
sign determination. The matrix multiplication algorithm 
was an ideal candidate, since it only requires 
multiplication and addition. 
The proposed design used common building blocks in the 
multiply and add cell, the input translator, and the output 
translator. Using the common building block approach to 
VLSI design greatly reduces design time. As a result of 
this, any extra design time spent optimizing the layout of 
these modules should be very beneficial to the performance 
of the overall design. This design methodology also 
achieves a higher chip density, resulting in both a cheaper 
and a higher performance implementation. 
The system presented in this research was designed to 
interface with a system using the signed magnitude number 
system. If this design is attached to a purely residue 
processor, neither the input nor the output translators are 
necessary. This would greatly affect the area comparison
calculations, to the benefit of the residue number system. The
input and output translators were designed and included
because there are no commercially available residue
processors; hence input and output translators are
essential at the present time.
7. 1 Contributions 
In this research, a design was formulated for a 
specific input word length and matrix bandwidth. It should 
be noted that there is no limitation when extending this 
methodology to either larger word lengths or matrix 
bandwidths. The number of truth table inputs is not
dependent on the specific problem. In this research, two 
four-bit truth tables, and one two-bit truth table could 
have been implemented rather than two five-bit truth 
tables. In light of this, as many truth tables as 
necessary can be placed in parallel for larger word 
lengths. All other portions of the design may easily be 
extended to larger problems, although it may be necessary 
to add another modulus to satisfy dynamic range 
requirements. 
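
The remark about truth table widths can be illustrated in software.
The sketch below is a behavioral model only, with hypothetical
names: a word is split into bit fields, each field is looked up in
its own table holding the field's weighted value modulo m, and the
partial residues are summed and reduced. In the hardware described
in this thesis the final sum and reduction are performed by binary
adders followed by a lower truth table; here a plain modulo
operation stands in for that step.

```python
def field_table(width, shift, m):
    """Truth table for one bit field: entry n holds (n * 2**shift) mod m."""
    return [(n << shift) % m for n in range(1 << width)]


def reduce_by_tables(x, m, widths):
    """Residue of x modulo m using one table per bit field.

    widths lists the field widths starting from the least
    significant field, e.g. (5, 5) or (4, 4, 2) for a 10-bit word.
    """
    total, shift = 0, 0
    for width in widths:
        field = (x >> shift) & ((1 << width) - 1)
        total += field_table(width, shift, m)[field]
        shift += width
    return total % m  # stands in for the adders and final truth table


# The upper five-bit table for modulo 13 starts 0, 6, 12, 5, ...
print(field_table(5, 5, 13)[:4])
print(reduce_by_tables(1000, 13, (5, 5)), reduce_by_tables(1000, 13, (4, 4, 2)))
```

Both partitions give the same residue for every 10-bit word, which
is why the number of truth table inputs can be chosen independently
of the word length.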
Typical methods of residue addition, and especially
multiplication, require the use of ROMs. ROMs tend to be
very slow, particularly in this case, where the global
clock speed (determined by the MAC) is of prime importance.
A very regular and modular approach to residue
multiplication and addition was presented. The fact that
addition and multiplication occur simultaneously in this
research is irrelevant, as each could occur alone with
similar hardware. The proposed method of residue addition
and multiplication is an excellent option for the VLSI
designer. Along with residue addition and multiplication,
this research also presented methods of input and output 
translation, which modified current methods of input and 
output translation. More importantly, these methods use 
the same building blocks as the MAC, which is essential to 
VLSI design. 
This research also provided a comparison of the
residue design to a more conventional approach. Although a
larger amount of area is necessary to implement the residue
design, it is still easily implemented on a single chip.
The timing performance is very significant. The residue
design is capable of a throughput greater than three times
that of the binary design. The use of the residue system,
through this comparison, should be greatly promoted. Also
presented in this research were several practical design
considerations, essential to a system designer considering
a design of the residue type. Several comments were made
on the less apparent properties of the RNS, as well as the
process of moduli selection and dynamic range
determination.
The exact input and output coefficient timing was
derived for the matrix multiplication algorithm, due to
inadequate presentation in prior literature. Also, a
simulation of the algorithm was conducted such that timing
information could be obtained. From the algorithm
simulation and Quicksim logic simulation, the comparison to
a conventional binary approach was made. The area required
for the residue design was found to be 2.73 times larger
than the conventional design in the worst case. A large
portion of the difference is found in the input and output
translation processes. The residue design is much faster.
The residue design processes input matrices of bandwidth
five 3.18 times faster than the conventional design. This
proves that speed enhancements over conventional methods
can be obtained through the application of the Residue
Number System.
7. 2 Future Research 
There are many areas within the Residue Number System 
which would benefit from further research. In order for 
the RNS to find its way into the commercial market, several 
shortcomings must be overcome. Residue division is very
difficult, as are sign determination and magnitude
comparison. It is the above mentioned limitations that
currently prevent the possibility of an all-RNS
workstation.
It is very likely that many already developed
algorithms, like the matrix multiplication algorithm,
could also benefit greatly from the RNS. It is also likely
that basic algorithms can be modified to exploit the
characteristics and unique properties of the RNS,
especially algorithms which fail to converge because of round off
error, since the RNS does not allow this type of error.
The residue number system has many interesting 
properties to offer, but requires future designers to 
examine their individual applications, and to objectively 
evaluate the advantages and disadvantages of the RNS. 
REFERENCES
[1] Bayoumi, M. A., G. A. Jullien, and W. C. Miller,
"Highly Parallel Architectures for DSP Algorithms
using RNS", Proceedings of ISCAS 85, pp. 1395-1398, 1985.
[2] Szabo, N. S. and R. I. Tanaka, Residue Arithmetic and
Its Applications to Computer Technology, New York:
McGraw-Hill, 1967.
[3] Kung, H. T., "Structure of Parallel Algorithms",
Advances in Computers, Vol. 19, pp. 65-112, 1980.
[4] Baraniecka, A. and G. A. Jullien, "On Decoding
Techniques for Residue Number System Realizations of
Digital Signal Processing Hardware", IEEE Transactions
on Circuits and Systems, Vol. CAS-25, No. 11,
pp. 935-936, November 1978.
[5] Leung, Y.-Y. J. and M. A. Shanblatt, Performance
Tradeoffs in the Hierarchical Design of Regular VLSI
Structures, Technical Report No. MSU-ENGR-86-001,
Michigan State University, East Lansing, MI, January 1986.
[6] Taylor, F. J., "A VLSI Residue Arithmetic Multiplier",
IEEE Transactions on Computers, Vol. C-31, No. 6,
pp. 540-546, June 1982.
[7] Bayoumi, M. A., G. A. Jullien, and W. C. Miller, "A
VLSI Implementation of Residue Adders", IEEE
Transactions on Circuits and Systems, Vol. CAS-34,
No. 3, pp. 284-288, March 1986.
[8] Banerji, D., "A Novel Implementation for Addition and
Subtraction in Residue Number Systems", IEEE
Transactions on Computers, Vol. C-23, No. 1,
pp. 106-109, January 1974.
[9] Banerji, D. K. and J. A. Brzozowski, "On Translation
Algorithms in Residue Number Systems", IEEE
Transactions on Computers, Vol. C-21, No. 12,
pp. 1281-1285, December 1972.
[10] Alia, G. and E. Martinelli, "A VLSI Algorithm for
Direct and Reverse Conversion from Weighted Binary
Number System to Residue Number System", IEEE
Transactions on Circuits and Systems, Vol. CAS-31,
No. 12, pp. 1033-1039, December 1984.
[11] Capocelli, R. M. and R. Giancarlo, "Efficient VLSI
Networks for Converting an Integer from Binary System
to Residue Number System and Vice Versa", IEEE
Transactions on Circuits and Systems, Vol. CAS-35,
No. 11, pp. 1425-1430, November 1988.
[12] Bayoumi, M. A., G. A. Jullien, and W. C. Miller, "I/O
Strategies for Residue Number System Architectures for
Digital Signal Processing Applications", International
Symposium on Circuits and Systems 1984, pp. 1069-1072,
1984.
[13] Lyman, J., "Components and Packaging", Electronic
Design, Vol. 37, No. 1, pp. 50-63, January 12, 1989.
[14] Soderstrand, M. A., Residue Number System Arithmetic:
Modern Applications in Digital Signal Processing, New
York: IEEE Press, 1986.
[15] Agnew, J. and R. C. Knapp, Linear Algebra With
Applications, Monterey, CA: Brooks/Cole Publishing
Co., 1983.
[16] Leung, Y.-Y. J. and M. A. Shanblatt, "Systolic Array
Simulation for Quantification of Speed/Area
Parameters", Simulation, Vol. 44, No. 6, pp. 295-300,
June 1985.
[17] Hwang, K., Computer Arithmetic, New York: John Wiley
and Sons, 1987.
APPENDIX A 
MATRIX MULTIPLICATION ALGORITHM SIMULATION RESULTS 
5X5 MATRIX MULTIPLICATION SIMULATION 
9 0 
3 — 3 
7 — 2 
6 — 2 
0 — 9 
8 
-5 
3 
— 9 — 1 
— 7 
-5 
— 25 — 40 
-5 — 31 
— 9 — 30 
26 31 
36 117 
— 62 
— 32 
— 66 
-58 
63 
45 
6 
33 
9 
— 3 
— 36 
9 
-24 
— 24 
— 42 
11R 
7R 
4R 
2R 
1R 
0 — 9 
— 2 0 
0 0 
0 -2 
— 3 0 
0 0 
9 0 
0 0 
0 5 
2 0 
0 0 
0 0 
3 0 
0 0 
0 0 
1L 
3L 
6L 
10L 
15L 
OUTPUTS 
0 -4 0 0 
0 0 — 6 0 
2 0 0 — 7 
0 -5 
0 0 
0 
3 
0 
0 
0 — 9 0 0 
0 0 — 1 0 
0 0 -5 
0 — 7 0 0 
0 0 0 0 
11 
7 
4 
2 
1 
3 
6 
10 
15 
0 
0 
0 
0 
0 
0 
0 0 
0 — 5 
25 0 
0 — 4 
0 0 
0 
0 
— 9 
0 
0 0 
— 62 
0 36 
0 0 
0 31 
0 — 30 
31 0 
0 -32 
0 0 
5 0 
0 -36 
0 0 
117 0 
0 0 
0 -5 
— 66 0 
0 33 
0 0 
9 0 
0 0 
0 
0 
63 
8 0 
0 
0 
-24 
0 
0 
0 
0 
0 
-3 
0 
-24 
0 
0 
0 
0 — 42 
0 0 
0 0 
L 
R 
0 
2 I 
L 
R 
0 
3 I 
L 
R 
0 
4 I 
L 
R 
0 
5 I 
L 
R 
0 11 
0 -4 
0 9 
0 -25 
0 7 
0 — 4 
0 3 
0 -5 
0 14 
0 -6 
0 9 
0 -40 
0 19 
0 -4 
0 7 
0 -9 
0 8 
0 — 6 
— 10 0 0 
— 3 0 
-31 0 
0 — 16 
0 -2 
0 -30 
0 -23 
0 -3 0 
0 -32 0 
0 0 24 
0 1 
0 31 
0 — 42 
0 — 2 
-48 0 0 
— 9 0 0 
2 0 0 
— 66 0 0 
0 — 13 0 
0 — 9 0 
0 5 0 
0 -58 0 
0 35 0 
0 -1 0 
0 2 0 
0 33 0 
0 0 90 
0 0 — 9 
0 0 3 
0 0 63 
0 0 14 
0 0 — 1 
0 0 5 
9 
0 
0 
9 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
-3 
0 
0 
— 3 
-24 
0 
0 
-24 
0 
0 
0 
0 -42 
0 -42 
0 0 
0 -42 
114 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 
I 
L 
R 
0 I 
L 
R 
0 
I 
L 
R 
0 
0 11 0 
0 
0 
0 
0 
0 
0 
5 
0 
1 
3 
2 
7 
8 
2 
3 
14 
0 
— 5 
0 
0 
26 
— 6 
7 
— 1 
-2 
— 7 
3 
— 2 
9 
4 
9 
45 
0 19 
0 4 
0 2 
0 2 
0 8 
0 16 
0 -5 
0 3 
0 1 
0 3 
0 6 
0 18 
0 16 
0 2 
0 5 
0 26 
0 8 
0 -5 
0 2 
0 -2 
0 — 10 
0 1 
0 — 7 
0 9 
0 — 62 
18 
-4 
— 2 
26 
— 48 
18 
— 3 
6 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
— 9 
36 
12 
— 6 
-2 
24 
7 
— 7 
7 
-42 
6 
4 
3 
18 
9 
-36 
— 6 
— 9 
54 
-30 
— 7 
-2 
— 16 
15 
4 
7 
43 
0 
0 
0 
0 
0 
54 
7 
9 
117 
— 16 
3 
1 
— 13 
43 
4 
— 2 
35 
— 12 
— 7 
— 3 
9 
0 9 
0 — 14 
0 -5 
0 — 24 
0 — 9 
0 0 
0 — 1 
0 -3 
0 1 
0 — 5 
0 — 2 
0 0 
0 0 
7 
0 
0 
63 
3 
9 
90 
10 
4 
1 
14 
-28 
-7 
— 2 
— 14 
0 
— 3 
0 
0 -7 
0 1 
0 1 
0 -36 
0 — 42 
0 
0 
0 
0 
— 9 
0 
0 
0 
— 1 
0 
0 
— 27 
— 5 
3 
— 42 
0 
0 
5 
0 
0 — 1 
0 0 
0 -5 
0 0 
115 
19 
20 
21 
22 
23 
24 
25 
0 8 
0 2 
0 16 
0 
0 
0 16 
0 2 
0 6 
0 12 
0 32 
0 — 5 
0 5 
0 7 
8 
32 
0 
0 3 
0 — 12 
— 7 
— 9 
63 
18 
— 2 
10 
-4 
7 
— 2 
-5 
6 
-30 
3 
5 
15 
0 
0 
0 1 
0 0 
0 — 2 
0 0 
4 
— 9 
-36 
-2 
8 
0 -4 
0 -9 
0 36 
4 
0 
0 
36 
-7 
9 
— 27 
0 — 7 
0 0 
0 0 
0 — 5 
APPENDIX B 
TRUTH TABLES AND KARNAUGH MAPS 
Modulo 7 Lower Truth Table 
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
IRESULT 
iMOD 7 
TRUTH TABLE 
OUTPU T 
Y 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
W 
0 0 0 0 
0 0 0 0 
0 0 0 0 
0 0 0 0 
X 
0 1 1 0 
0 1 1 0 
0 0 0 1 
0 1 0 0 
Y 
0 0 0 0 
0 0 1 1 
1 0 0 0 
1 1 0 1 
z 
0 0 1 
1 1 0 
1 0 1 
0 0 0 
0 0 0 0 
0 0 0 0 
0 0 0 0 
0 0 0 0 
0 1 0 0 
0 0 0 1 
1 0 0 1 
1 0 0 1 
1 1 0 1 
1 0 0 0 
0 1 1 1 
0 0 1 0 
0 0 0 1 
1 0 1 0 
1 0 1 0 
0 1 0 1 
Modulo 11 Lower Truth Table 
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
fRESULTi 
IMOD 111 
0 
1 
2 
3 
5 
6 
7 
8 
9 
10 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
TRUTH 
QUTP 
TABLE 
UT 
Y 
0 
0 
1 
1 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
0 
0 
1 
1 
0 
0 
1 
0 
0 
1 
1 
0 
0 
1 
1 
0 
0 
0 0 0 1 
0 0 0 1 
0 0 0 0 
0 0 0 1 
X 
0 1 0 0 
0 1 0 0 
0 1 1 0 
0 1 0 0 
Y 
0 0 0 0 
0 0 1 0 
1 1 0 0 
1 1 1 1 
z 
0 0 1 0 
1 1 0 1 
1 1 0 0 
0 0 1 0 
0 1 0 0 
0 1 0 0 
1 0 1 0 
0 0 1 0 
1 0 1 0 
1 0 1 0 
0 0 0 1 
1 0 0 1 
0 0 1 1 
1 1 1 1 
0 0 0 0 
1 0 0 0 
1 1 0 0 
0 0 1 1 
0 1 1 1 
1 0 0 0 
Modulo 13 Lower Truth Table
5 BIT 
RESULT 
0 
1 
2 
3 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
fMOD 13f 
0 
1 
2 
3 
5 
6 
7 
8 
9 
10 
11 
12 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
0 
1 
2 
3 
4 
5 
fRESULTf 
TABLE TRUTH 
OUTP UT 
W 
0 0 1 1 
0 0 0 1 
0 0 0 1 
0 0 0 1 
X 
0 1 1 0 
0 1 0 0 
0 1 0 0 
0 1 0 0 
Y 
0 0 0 
0 0 0 
1 1 1 
1 1 0 
z 
0 0 0 0 
1 1 0 1 
1 1 0 1 
0 0 1 0 
0 0 0 1 
0 1 0 1 
0 1 0 0 
0 1 0 0 
0 1 0 0 
1 0 0 1 
1 0 1 0 
1 0 1 0 
1 1 1 1 
0 0 1 0 
1 1 0 0 
0 0 0 0 
1 1 0 1 
0 0 1 0 
0 0 1 1 
1 1 0 0 
120 
Modulo 15 Lower Truth Table 
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
'7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
/MOD 15' 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
0 
1 
iRESULTi 
TRUTH TABLE 
OUTPUT 
W 
0 0 1 
0 0 1 
0 0 0 
0 0 1 
X 
0 1 1 
0 1 1 
0 1 0 
0 1 1 
Y 
0 0 0 
0 0 0 
1 1 0 
1 1 1 
z 
0 0 0 
1 1 1 
1 1 0 
0 0 0 
0 0 1 1 
0 0 1 1 
0 1 0 1 
0 0 0 1 
0 1 1 0 
0 1 1 0 
1 0 0 1 
0 1 0 0 
0 0 0 0 
1 1 1 1 
0 0 0 0 
1 1 0 1 
1 1 1 1 
0 0 0 0 
0 0 1 0 
1 1 0 1 
Modulo 7 Upper Truth Table
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
iACTL. 
DEVALUE 
0 
32 
64 
96 
128 
160 
192 
224 
256 
288 
320 
352 
384 
416 
448 
480 
512 
544 
576 
608 
640 
672 
704 
736 
768 
800 
832 
864 
896 
928 
960 
992 
[RESULTS 
iMOD 
TRUTH TABLE 
OUTPUT 
W X Y 
0 0 0 
0 1 0 
0 0 0 
0 1 0 
0 0 1 
0 1 1 
0 0 1 
0 0 0 
W 
Q Q * * 
p p * * 
0 0 0 * 
Q Q * * 
X 
0 0 * * 11** 
1 0 1 * 
0 0 * * 
Y 01** 01** 
0 0 0 * 
Q 1 * * 
8 
Q Q * * 00** 
1 0 0 * 11** 
Q * * * 
Q * * * 
p * * * 
Q * * * 
Q * * * j * * * 
* * * 
p * * * 
Q * * * 
Q * * * 
* * * 
* * * 
* * * 
j * * * 
p * * * 
p * * * 
122 
Modulo 11 Upper Truth Table
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
ACTL. 
IVALUE 
0 
32 
64 
96 
128 
160 
192 
224 
256 
288 
320 
352 
384 
416 
448 
480 
512 
544 
576 
608 
640 
672 
704 
736 
768 
800 
832 
864 
896 
928 
960 
992 
IRESULTI 
IMOD 11I 
0 
10 
6 
TRUTH TABLE 
OUTPUT 
W X Y 
0 0 0 
1 0 1 
1 0 0 
1 0 0 
0 1 1 
0 1 1 
0 1 0 
0 1 0 
W 
Q Q * * 
Q * * 
1 0 0 * 10** 
X 
0 1 * * 01** 
0 1 1 * 
0 1 * * 
Y 
0 1 * * 
* * 001* 
0 0 * * 
Z 01** 
Q Q * * 
0 0 1 * 
j j * * 
0 * * * 
0 * * * 
Q * * * 
Q * * * 
* * * 
* * * 
Q * * * 
* * * 
j * * * 
Q * * * 
* * * 
Q * * * 
Q * * * 
* * * 
* * * 
Q * * * 
123 
Modulo 13 Upper Truth Table 
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
0 
32 
64 
96 
128 
160 
192 
224 
256 
288 
320 
352 
384 
416 
448 
480 
512 
544 
576 
608 
640 
672 
704 
736 
768 
800 
832 
864 
896 
928 
960 
992 
0 
6 
12 
5 
11 
4 
10 
3 
12 
5 
11 
4 
10 
iACTL. iRESULTi 
iVALUEiMOD 13' 
TRUTH TABLE 
OUTPUT 
W X Y 
0 0 0 
0 1 1 
1 1 0 
0 1 0 
1 0 1 
0 1 0 
1 0 1 
0 0 1 
W 
Q j * * 
Q Q * * 
0 0 1 * 
] * * 
X 
Q Q * * 
* * 101* 
Q * * 
Y 
0 j * * 
] Q * * 
0 1 0 * 
Q 1 * * 
Z 
Q j * * 
0 0 * * 
1 1 0 * 
0 Q * * 
Q * * * 
] * * * 
* * * 
Q * * * 
* * * 
Q * * * 
0 * * * 
* * * 
Q * * * 
* * * 
* * * 
Q * * * 
* t * 
j * * * 
0 * * * 
Q * * * 
124 
Modulo 15 Upper Truth Table 
5 BIT 
RESULT 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
0 
32 
64 
96 
128 
160 
192 
224 
256 
288 
320 
352 
384 
416 
448 
480 
512 
544 
576 
608 
640 
672 
704 
736 
768 
800 
832 
864 
896 
928 
960 
992 
0 
2 
6 
8 
10 
12 
14 
0 
2 
4 
6 
8 
lACTL. IRESULTi 
iVALUEiMOD 151 
TRUTH TABLE 
OUTPUT 
W X Y 
0 0 0 
0 0 1 
0 1 0 
0 1 1 
1 0 0 
1 0 1 
1 1 0 
1 1 1 
W 
Q j * * 
Q 1 * * 010* 
p ] * * 
X 
Q Q * * 
Q Q * * 
1 1 0 * 
* * 
Y 
Q P * * 
] j * * 
1 1 0 * 
Q Q * * 
Z 
Q Q * * 
Q Q * * 000* 
Q Q * * 
Q * * * 
Q * * * 
* * * 
p * * * 
Q * * * 
* * * 
Q * * * 
* * * 
* * * 
p * * * 
Q * * * 
1 * * * 
Q * * * 
Q * * * 
Q * * * 
p * * * 
125 
APPENDIX C 
SCHEMATIC PLOTS 
Figure A.1 Modulo 7 Truth Table Hardware
Figure A.2 Modulo 11 Truth Table Hardware
Figure A.3 Modulo 13 Truth Table Hardware
Figure A.4 8X4 Multiplier
Figure A.5 Twelve-Bit Multiple Generator
Figure A.6 Fourteen-Bit Carry Save Adder
Figure A.7 Fourteen-Bit Binary Adder
Figure A.8 Modified Adder A
Figure A.9 Modified Adder B
Figure A.10 Seventeen-Bit Two's Complementer
APPENDIX D 
SIMULATION RESULTS 
Table A.1 MAC Simulation Data
(Logic simulation listing of MAC signal values against TIME.)
Q I I 
C I I 
~ I 
Q I I 
0 I I 
0 I I 
0 I I 
D I Q 
0 I 0 
Q I Q 
Q I Q 
0 I 0 
Q I 
Q I 0 
D I Q 
I Q 
I I 0 
I -' 0 
I I 0 
I I 0 
I I ~ 
I I 0 
I I Q 
I I Q 
I I 0 
I I D 
I I 
Q I I 
Q I I 
0 I 
0 I I 
I Q 0 
I Q 0 
I 0 0 
Q Q 
I 0 Q 
I Q 0 
I 0 0 
I 0 Q 
I 0 0 
Q Q 
I Q Q 
~ Q 
I 0 Q 
I Q 0 
I 0 0 
I Q 0 
a 0 
0 Q 
I 0 0 
I I I 
I I I 
I I 
I I I 
I I I 
I I I 
I I I 
I I I 
I I I 
I I I 
I I I 
X X 
X I 
I I 
I I 
I I 
I I 
I 0 
I 0 
Q 0 
I 0 
Q 
0 
I I 
I Q 
C Q 
I 0 
I 0 
I I 
G Q 
I 
0 Q 
I I 
0 0 
G 0 
0 0 
0 
Q I 
Q I 
I Q 
I 0 
0 Q 
Q Q 
Q Q 
Q Q 
Q I 
4 X 
X X 
Q 
G I 
0 I 
G Q 
I 
Q 0 
I 0 
I 
Q 
0 I 
Q I 
Q I 
0 I 
Q 
I I 
G ~ 
I I 
0 Q 
I 
D 0 
Q I 
Q I 
I I 
Q Q 
I 
I ~ 
Q Q 
I Q 
I I 
I Q 
0 Q 
D 0 
Mod la 16 NXC jl Xl I I Ra I 
Q. D 
3 8 
6. Q 7. 0 9. Q 
IQ2. ~ 
ioa. ~ 
IDS. ~ 
IQT. ~ 
149. ~ 
I 11. ~ 
2Q4. 6 2oj. e 
zoe. e 
209. 8 
216. 8 
212, S 367. 2 
31Q. 2 311. 2 
313. 2 
8 0 
0 Q 
Q 0 
Q 0 
Q 0 
Q C 
0 0 
0 0 
Q Q 
Q Q 
0 0 
Q I 
0 I 
6 I 
8 I 
4 I 
D I 
0 0 
Q Q 
Q 0 
Q Q 
Q I 
Q I 
0 I 
0 I 
0 I 
I I I I 
I I 
I I 
I I 
I I 
0 Q 
0 Q 
Q 0 
Q D 
0 0 
D 0 
0 I 
0 I 
0 I 
0 I 
O I O 
0 I D 
Q I Q 
0 I D 
0 I 4 
0 I 0 
0 I 0 
Q I 0 
Q I Q 
0 I Q 
o I a 
I I 0 
I I Q 
I I 6 
I I 0 
I I 0 
I 0 
I 0 I 
I 0 I 
I ~ I 
I Q I 
D I 
0 I 
Q I 
D I 
D I 
I Q 
I 0 
I 0 
I Q 
I Q 
I 0 
0 I 
0 I 
Q I 
Q I 
Q I 
Q I 
I I 
I I 
I I 
I 
0 0 
0 D 
0 Q 
0 0 
0 0 
0 I 
Q I 
0 I 
0 
0 I 
Q I 
0 
0 I 
0 I 
0 I 
Q I 
0 I 
I 0 
I Q 
I Q 
I 0 
x 
I X 
I X 
I 6 
I I 
Q I 
0 I 
0 I 
0 Q 
0 I 
0 Q 
0 Q 
0 Q 
Q Q 
0 I 
0 Q 
Q I 
I 
0 0 
0 0 
Q 0 
x x x 
x x 
6 0 I 
I Q I 
I 0 I 
I Q I 
I 0 Q 
I I I 
I Q I 
Q 0 I 
Q 0 I 
0 0 I 
I D I 
I 0 Q 
I I Q 
I I Q 
0 I 0 
0 I 0 
D I 0 
0 I I 
I I I 
TINE 3 I "b3 bl 3 I 'P3 91 
"42 0 b2 "bo 02 " Q "P2 PQ 
TINE 3 I b3 bl " 3 I "43 I 2 "Qb2 IQ2 "D2Q 
[Waveform plot: operand inputs a3-a0, b3-b0, c3-c0 and the residue output; time axis marked 10.0 to 40.0]
Figure A.11 Modulo 7 Trial #1 MAC Simulation
[Waveform plot: operand inputs and the residue output r; time axis marked 10.0 to 40.0]
Figure A.12 Modulo 11 Trial #1 MAC Simulation
[Waveform plot: operand inputs a3-a0, b3-b0, c3-c0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.13 Modulo 13 Trial #1 MAC Simulation
[Waveform plot: operand inputs a3-a0, b3-b0, c3-c0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.14 Modulo 15 Trial #1 MAC Simulation
[Waveform plot: operand inputs a3-a0, b3-b0, c3-c0 and outputs p3-p0; time axis marked 10.0 to 40.0]
Figure A.15 Modulo 16 Trial #1 MAC Simulation
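
The MAC simulations in Figures A.11 through A.15 each exercise one modulus. As a rough behavioral reference for reading such waveforms, the sketch below assumes that each MAC unit computes (a*b + c) mod m on operands that are already residues. That reading is suggested by the signal names a, b, and c, but it is an assumption, and the names `residue_mac` and `mac_all_channels` are hypothetical. It does not reproduce the gate-level design that was simulated.

```python
# Moduli taken from the captions of Figures A.11 through A.15.
MODULI = (7, 11, 13, 15, 16)


def residue_mac(a: int, b: int, c: int, m: int) -> int:
    """Assumed MAC behavior: (a * b + c) reduced modulo m."""
    if not all(0 <= x < m for x in (a, b, c)):
        raise ValueError("operands must already be residues in [0, m)")
    return (a * b + c) % m


def mac_all_channels(a: int, b: int, c: int) -> tuple:
    """Run the assumed MAC independently in every residue channel."""
    # Each channel sees only the residues of the operands for its modulus.
    return tuple(residue_mac(a % m, b % m, c % m, m) for m in MODULI)
```

Because each channel works only on its own residues, no carries pass between channels, which is the property a residue design relies on for speed.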
Table A.2 Input Translation Simulation Data
[Tabulated time stamps and logic states for the modulo 7, 11, 13, 15, and 16 input translators. The values are illegible in the scanned source.]
[Waveform plot: inputs in7-in0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.16 Modulo 7 Trial #1 Input Translation Simulation
[Waveform plot: inputs in7-in0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.17 Modulo 11 Trial #1 Input Translation Simulation
[Waveform plot: inputs in7-in0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.18 Modulo 13 Trial #1 Input Translation Simulation
[Waveform plot: inputs in7-in0 and the residue output r; time axis marked 10.0 to 40.0]
Figure A.19 Modulo 15 Trial #1 Input Translation Simulation
[Waveform plot: inputs in7-in0 and a 4-bit output; time axis marked 10.0 to 40.0]
Figure A.20 Modulo 16 Trial #1 Input Translation Simulation
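
The input translation simulations in Figures A.16 through A.20 take an 8-bit binary input (in7-in0) and produce one residue per modulus. The following sketch shows that mapping in software, assuming an unsigned 8-bit input. The function names are hypothetical. The bit-weight variant illustrates one common way such a translator is organized and is not claimed to be the circuit that was simulated.

```python
# Moduli taken from the captions of Figures A.16 through A.20.
MODULI = (7, 11, 13, 15, 16)


def to_residues(x: int, moduli=MODULI) -> tuple:
    """Translate an unsigned 8-bit value into its residue digits."""
    if not 0 <= x < 256:
        raise ValueError("input must fit in 8 bits (in7-in0)")
    return tuple(x % m for m in moduli)


def to_residue_bitwise(x: int, m: int) -> int:
    """Same residue, built from per-bit weights (2**i mod m)."""
    # Precompute the residue of each bit position, add the weights of
    # the bits that are set, then reduce the small sum once more.
    weights = [(1 << i) % m for i in range(8)]
    return sum(w for i, w in enumerate(weights) if (x >> i) & 1) % m
```

Both functions agree for every 8-bit input. The second form only needs a small table of constants and a short modular adder per channel.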
Table A.3 Output Translation Simulation Data
[Tabulated time stamps, logic states, and hexadecimal bus values from the output translation simulation. The values are illegible in the scanned source.]
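
Output translation converts a set of residues back into an ordinary binary number. The sketch below does this with the textbook Chinese Remainder Theorem for the moduli 7, 11, 13, 15, and 16 named in the appendix figures. These are pairwise coprime, so the dynamic range is 7 * 11 * 13 * 15 * 16 = 240240. The CRT is used here only as a well-known reference method. The simulated hardware may use a different conversion scheme, and the name `from_residues` is hypothetical.

```python
from math import prod

# Moduli named in the appendix figure captions (pairwise coprime).
MODULI = (7, 11, 13, 15, 16)


def from_residues(residues, moduli=MODULI) -> int:
    """Recover x in [0, M) from its residues via the CRT."""
    big_m = prod(moduli)  # M = 240240 for the default moduli
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m            # product of all the other moduli
        inv = pow(m_i, -1, m)       # multiplicative inverse of m_i mod m
        total += r * m_i * inv
    return total % big_m
```

A quick round trip: the residues of 33 are (5, 0, 7, 3, 1), and `from_residues((5, 0, 7, 3, 1))` returns 33.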
VITA 
Gary Franklin Chard received his B.S. degree in
Electrical Engineering in May of 1988 from Texas A&M
University, College Station, Texas. He is currently
completing his M.S. degree in Electrical Engineering, also
from Texas A&M University. He has worked for Texas
Instruments Incorporated each summer since 1985. He
is a member of Eta Kappa Nu and Tau Beta Pi. His permanent
address is 2503 Grandview Dr., Richardson, Texas 75080.
